WD-42 | 21 hours ago
Is this the project announced a week or two ago by an AI company claiming they had built a browser, which turned out to be a crappy wrapper around Servo that didn't even build? Or is this another one? I thought it was Anthropic, but this says Cursor.
thunderbong | 21 hours ago
> Last week Cursor published Scaling long-running autonomous coding, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was FastRender, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is now available on YouTube. I’ve included some of the highlights below.
comex | 21 hours ago
It is the same project, but my impression is that HN exaggerated many of the issues with it.
For example:
- They did eventually get it to build. Unknown to me: were the agents working on it able to build it, or were they blindly writing code? The codebase can't have been _that_ broken since it didn't take long for them to get it buildable, and they'd produced demo screenshots before that.
- It had a dependency on QuickJS, but also a homegrown JS implementation; apparently (according to this post) QuickJS was intended as a placeholder. I have no idea which, if either, ended up getting used, though I suspect it may not even matter for the static screenshots they were showing off (the sites may not have required JS to show that).
- Some of the dependencies (like Skia and HarfBuzz) are libraries that other browsers also depend on and are not part of browser projects themselves.
- Other dependencies probably shouldn't have been used, but they only represent a fraction of what a browser has to do.
However…
What I don't know, and seemingly nobody else knows, is how functional the rest of the codebase is. It's apparently very slow and fails to render most websites. But is this more like "lots of bugs, but a solid basis", or is it more like "cargo-culted slop; even the stuff that works only works by chance"? I hope someone investigates.
simonw | 20 hours ago
> were the agents working on it able to build it, or were they blindly writing code?
The project was able to build the whole time, and the agents were constantly compiling it using the Rust compiler and fixing any compile errors as they occurred.
The GitHub CI builds were failing, and when they first opened the repo people incorrectly assumed that meant the code didn't compile at all.
The biggest problem with the repo when they first released it was that there were no build instructions for end-users, so it was hard to try out. They fixed that within 24 hours of the initial release.
> What I don't know, and seemingly nobody else knows, is how functional the rest of the codebase is.
It's functional enough to render web pages - you can build it and run it yourself to see that, I have some screenshots from trying it out here: https://simonwillison.net/2026/Jan/19/scaling-long-running-a...
That said, it's very much intended as a research project into running parallel coding agents as opposed to a serious browser project that's intended for end users. At the end of my post I compare it to "hello world" - I think "build a browser" may be the "hello world" of massively parallel coding agent systems, which I find quite amusing.
felipeerias | 13 hours ago
beepbooptheory | 9 hours ago
I hear you, but I still struggle with what we are supposed to take from this. If I worked for McDonald's and came up with a way to make 1000 bad hamburgers in the time it currently takes them to make 10 good ones, no one would be that impressed.
"Hello world" is self-justifying, you know it when you see it, and it is what it is because it shows something unambiguous and impossible to mistake.
simonw | 9 hours ago
The thing I took from this is that you can arrange a set of coding agents in a tree of planners and workers and have them churn away on much larger projects than if you use a single coding agent.
This is a new capability - this likely would not have worked prior to GPT-5.1 and Opus 4.5, so we've had models that can do this for less than three months.
It's extremely new: the patterns that work are just starting to be figured out. Wilson had an effectively unlimited token budget from Cursor and got to run experiments that most teams would not be able to afford.
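To make the shape of that concrete, here's a minimal Python sketch of the planner/worker tree idea - purely illustrative stubs, not Cursor's actual orchestrator, and a real system would run the workers concurrently against a shared repository:

```python
# Purely illustrative sketch of a planner/worker tree; the "agents" are stubs,
# not Cursor's orchestrator. A real system would make LLM calls here and run
# the workers in parallel.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    depth: int = 0

def planner(task: Task) -> list[Task]:
    """Stub planner: split a task into subtasks until it is 'small enough'."""
    if task.depth >= 2:
        return []  # leaf task, hand it to a worker
    return [Task(f"{task.description} / part {i}", task.depth + 1) for i in range(3)]

def worker(task: Task) -> str:
    """Stub worker: pretend to produce code for a leaf task."""
    return f"// code for: {task.description}"

def execute(task: Task) -> list[str]:
    """Recursively plan, fan out to workers, and collect their output."""
    subtasks = planner(task)
    if not subtasks:
        return [worker(task)]
    results: list[str] = []
    for sub in subtasks:  # a real orchestrator would run these concurrently
        results.extend(execute(sub))
    return results  # and merge the workers' branches here

print(len(execute(Task("Build an HTML tokenizer"))), "leaf tasks completed")
```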
polyglotfacto | 9 hours ago
The fact that it is new is meaningless: the output is useless even as a proof of concept web engine and should be discarded, alongside the agent engineering pattern that produced it.
simonw | 8 hours ago
polyglotfacto | 8 hours ago
I wrote: "useless even as a proof of concept". It doesn't have to be perfect; it just needs to show a clear path forward.
simonw | 7 hours ago
If I was trying to build a new browser and I got to that point within a few weeks of starting I would be ecstatic.
polyglotfacto | 6 hours ago
A good place to look for how one could do this is https://github.com/DioxusLabs/blitz/tree/main
That project I consider a proper POC of a web engine, even though it doesn't even run javascript. Why? Because it has a nice architecture built around a clear idea--radical modularity--which could scale up to a full web engine one day, despite major challenges remaining.
I think that with AI assistance, if you had some idea, you could reshuffle components of Blitz and have your own thing rendering to the screen within a day.
Let's say you had a more ambitious goal, like taking Blitz and adding a JS engine like Boa. Well, if you had some clear idea of how to do it, you could get a nice little POC in a week or two.
Basically what I'm saying is that yes, the AI would save you a ton of typing and you'd be able to iterate on your idea. There are plenty of layout/graphics/JS components out there to choose from, so you could keep the POC relatively small and clean.
Someone doing that, with or without AI but with a good idea, would impress me.
FastRender on the other hand is just this humongous pile of spaghetti, and my guess is it still is entirely dependent on existing libraries for actually showing something to the screen.
So that's the clear failure of the agents in my opinion: why produce so much code when it could be done so easily otherwise? Also, why BS yourself with these architecture docs and pretend you are following the specs when in fact you are not?
Every time I try to browse the code I give up, mostly because when I look at something to try to understand how it fits into the whole, I end up realizing it's only used in some unit test.
For a quick comparison:
- https://github.com/DioxusLabs/blitz/blob/f828015b26d32b0bed3...
- https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...
I believe the two are more or less doing the same thing, but one is 30x the size of the other.
I can't begin to understand the render loop of Fastrender from the code.
On the other hand, here is the one from Blitz shell (the default Blitz app putting together various modular components):
- Window runs in winit loop: https://github.com/DioxusLabs/blitz/blob/f828015b26d32b0bed3...
- Redraw is at https://github.com/DioxusLabs/blitz/blob/f828015b26d32b0bed3...
- Calls into `paint_scene`, using the generic scene from the generic renderer: https://github.com/DioxusLabs/blitz/blob/f828015b26d32b0bed3...
Simple as that, and with a nice idea in terms of modularity.
That's a POC web engine.
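For what it's worth, the shape of that loop in pseudocode looks something like this (not Blitz's actual API - just an illustration of the pluggable document/renderer split):

```python
# Illustration of the "shell drives a modular document + renderer" shape.
# These classes are hypothetical; they do not correspond to Blitz's real types.
class Document:
    def build_scene(self) -> list[str]:
        # A real engine would run style and layout here and emit draw commands.
        return ["fill_rect 0 0 800 600 #ffffff", "draw_text 10 20 'Hello'"]

class Renderer:
    def paint(self, scene: list[str]) -> None:
        for command in scene:
            print("paint:", command)

class Shell:
    """Owns the window loop; on each redraw it asks the document for a scene."""
    def __init__(self, document: Document, renderer: Renderer) -> None:
        self.document = document
        self.renderer = renderer

    def on_redraw_requested(self) -> None:
        # The windowing loop calls this whenever the OS asks for a repaint.
        self.renderer.paint(self.document.build_scene())

Shell(Document(), Renderer()).on_redraw_requested()
```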
polyglotfacto | 9 hours ago
If so then the failure of the experiment should be acknowledged.
Failure described among others at: https://news.ycombinator.com/item?id=46705625
> It's functional enough to render web pages
> FastRender may not be a production-ready browser, but it represents over a million lines of Rust code, written in a few weeks, that can already render real web pages to a usable degree.
This is something that can be done in much less than a million lines of code. There must be a core somewhere in FastRender--probably just a few thousand lines--which puts together existing layout and graphics libraries and makes it render something to the screen.
Doing that in a few weeks isn't impressive, especially not when buried in a million lines of spaghetti code.
If you want an example of a real prototype web engine built along radical design choices, head over to https://github.com/DioxusLabs/blitz
I'm pretty sure it renders far better than FastRender (the edits the agents made to Taffy are probably nonsense), and I'm guessing it is at most 50k lines.
Conclusion:
In light of the efforts to paper over failures, I'm calling FastRender not a research project but propaganda.
simonw | 9 hours ago
They wanted to figure out patterns to have thousands of agents work on millions of lines together in parallel without stepping on each other's toes. They achieved that. Looks like a success to me.
Implementing a browser was just the demo for that. I called it their "hello world" at the end of my post.
polyglotfacto | 9 hours ago
How is spaghetti code that does not implement the spec (web standards, in this case) a success?
You are one of the creators of Django, so let me try to give you an analogy: if someone ran thousands of agents in parallel to produce a web framework, and the code ended up being able to connect to a database and render a template using existing libraries, while the rest was total nonsense and otherwise useless to web devs, would you call that a success?
Success in software requires something that works as intended and is maintainable.
simonw | 8 hours ago
Their success criterion was "can we run thousands of agents at once and have them work towards a goal?". By that criterion it was a success.
So yes, in your hypothetical I would call it a success IF their goal was parallel agent research. I'd call it a failure if they told me they were trying to build a production-quality alternative to Django.
polyglotfacto | 8 hours ago
I understand this is not meant as production-level quality, but as a web engineer I was expecting at least a decent POC with some interesting design ideas; not total spaghetti that even gets the spec wrong (despite the good idea of checking the spec into the repo).
They may have solved a problem related to agent coordination, like the one you discussed in your interview about conflicts and allowing edits to merge without always compiling.
But at the end of the day, a novelty like this is only useful insofar as it produces good code; I don't see how coding agents are of any help otherwise.
So the failure of the pattern should be acknowledged, so we can move on and figure out what does work.
I speculate that what does work is actually quite similar to managing an open source project: don't merge if it doesn't pass CI, and get a review from a human (the question is at what level of granularity). You also need humans in the project to decide on ways of doing things, so that the AI is relegated to its strength: applying existing patterns.
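As a rough sketch of the kind of gating I mean (hypothetical policy code, not any real CI system's API):

```python
# Hypothetical merge-gating policy for agent-produced changes - illustration only.
from dataclasses import dataclass

@dataclass
class Change:
    summary: str
    lines_changed: int
    ci_passed: bool

REVIEW_THRESHOLD = 200  # made-up cut-off: bigger changes get a human reviewer

def decide(change: Change) -> str:
    """CI gate first, then human review above some granularity."""
    if not change.ci_passed:
        return "reject: fails CI, send back to the agent"
    if change.lines_changed > REVIEW_THRESHOLD:
        return "hold: queue for human review"
    return "merge: small green change lands automatically"

for c in [
    Change("fix CSS cascade edge case", 40, True),
    Change("rewrite the layout engine", 5000, True),
    Change("add a paint cache", 300, False),
]:
    print(f"{c.summary}: {decide(c)}")
```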
In all seriousness, you can tell Wilson to get in touch with me. With even only one person with domain knowledge involved in such an effort, and with some architectural choices made ahead of unleashing the herd, I think one could do amazing stuff.
simonw | 8 hours ago
I was actually thinking of your earlier comments about this from the perspective of a Servo engineer when I asked Wilson how much of his human-level effort on this project related to browser architecture as opposed to parallel agents research.
The answer I got made it clear to me that this wasn't a browser project - he outsourced almost all of the browser design thinking to the agents and focused on his research area, which was parallel agent coordination.
I'm certain having someone in the human driving seat who understands browsers and was actively seeking to build the best possible browser architecture would produce very different results!
polyglotfacto | 7 hours ago
With the scope of the experiment in mind, I think we can deduce from it that AI is just not able to produce good software unsupervised. It's an important lesson.
To make a wider point, let's look at another of your predictions: that in 2026 the quality of AI code output will be undeniable. I actually think we've already reached that point. Since those agents came around I've never encountered a case where the AI wasn't able to code what I instructed it to. But that's not the same thing as software engineering, and in fact, I have never been impressed by the AI solving real problems for me.
It simply sucks at high quality software architecture. And I don't think this is due to a lack of computing power but that, rather, only humans can figure out what makes sense for them. And this matters, because if the software doesn't make sense, beyond very simple things you can test manually, it becomes impossible to know whether it works as intended.
A Web engine is a great example, because despite the extensive shared suites of tests and specifications, implementing one remains a challenge. You can write code and pass 90% of some sub test suite, and then figure out that your architecture for that Web API is all wrong and you'll never get to the last 10% and in fact your code is fundamentally broken. Unleashing AI without supervision makes this problem worse I think. Solving it requires human judgement and creativity.
simonw | 7 hours ago
Current coding agents are not up to the task of producing a production-quality web browser.
I genuinely can't imagine a better example of a sophisticated software project than a browser. Chrome, Firefox and WebKit each represent a billion-plus dollars of engineering salaries paid out to expert developers over multiple decades.
Browsers have to deal with a bewildering array of specifications - and handle websites that don't conform fully to those specifications.
They have extreme performance targets.
They operate in the most hostile environment imaginable - executing arbitrary code from the web! - so security is diabolically difficult as well.
I expect there's not a human on earth who could single-handedly architect a production-grade new browser; it requires too much specialist knowledge across too many different disciplines.
On that basis, I should reconsider my 2029 prediction. I was imagining something that looked like FastRender but maybe a little more advanced - working JavaScript, for example.
A much more interesting question is when we might see a production-grade browser built mostly by coding agents.
I do think we could see that by 2029, but if we did it would be a very different shape from the FastRender project:
- a team of human experts driving the coding agents
- in development for at least a year
- built with the goal of "production ready Chrome competitor", not as a research project
The question then becomes who would fund such a thing - even a small team of experts doesn't come cheap, and I expect that the LLM costs for this in 2029 will still measure in the tens or hundreds of thousands of dollars, if not more.
Hard for me to definitively predict that someone will step up to fund such a project - great open source browser engines exist already, so why would someone fund one from scratch?
Ronsenshi | 21 hours ago
"Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."
I'm curious what the energy/environmental/financial impact is of this "research" effort of cobbling together a browser based on an AI model that had been trained on the freely available source code of existing browsers.
I can't imagine this browser being used outside of tinkering or as a curiosity toy - so the purpose of the research is just to see whether you can run an absurd number of agents simultaneously and produce something that somewhat works?
sealeck | 21 hours ago
sebzim4500 | 13 hours ago
Yes but this is a very interesting question IMO
benatkin | 21 hours ago
> Any sufficiently complicated AI orchestration system contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Gas Town.
polotics | 18 hours ago
danpalmer | 17 hours ago
nurettin | 19 hours ago
trhway | 15 hours ago
> I can't imagine this browser being used outside of tinkering or as a curiosity toy - so the purpose of the research is just to see whether you can run an absurd number of agents simultaneously and produce something that somewhat works?
It took 2M years for the monkeys to produce typewriters and Shakespeare. Now the task is to make monkeys which can do the same in orders of magnitude less time.
terabytest | 17 hours ago
> FastRender may not be a production-ready browser, but it represents over a million lines of Rust code, written in a few weeks, that can already render real web pages to a usable degree.
I feel that we continue to miss the forest for the trees. Writing (or generating) a million lines of code in Rust should not count as an achievement in and of itself. What matters is whether those lines build, function as expected (especially in edge cases) and perform decently. As far as I can tell, AI has not been demonstrated to be useful yet at those three things.
ksynwa | 17 hours ago
agumonkey | 16 hours ago
electroglyph | 12 hours ago
mejutoco | 17 hours ago
Company X does not have a production-ready product, but they have thousands of employees.
I guess it could be a strange flex about funding but in general it would be a bad signal.
bflesch | 13 hours ago
SLOC was a bad indicator 20 years ago and it still is today. Don't tell them - once they realize it's a red flag for us they will use some other metric, because they fight for our attention.
embedding-shape | 12 hours ago
Not only that, they straight up pay people to just share and write about their thing: https://i.imgur.com/JkvEjkT.png
Most of us probably knew this already - the internet has had paid content for as long as I can remember - but I (naively perhaps) thought that software developers, and especially Hacker News, were more resilient to it. I think all of us have to get better at not trusting what we read unless it's actually substantiated.
simonw | 10 hours ago
I don't understand, what does that screenshot show? That there exists at least one anonymous Chinese company that has offered someone $200 to post about them on HN? Why is that relevant to a conversation about Cursor?
Who are the "they" in "they straight up pay people"?
embedding-shape | 9 hours ago
Read the parent comment first then mine, if you haven't, and it should make sense. Otherwise: "them" here refers to "AI companies wanting to market their products". The screenshot shows one such attempt by a company wanting to pay someone on HN to talk about and share their product in return for compensation. Proof that "they" aren't just "fighting for our attention" in the commonly understood way; they're also literally paying money to have people talk about them.
azornathogron | 12 hours ago
I think some of these people need to be reminded of the Bill Gates quote about lines of code:
“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”
signatoremo | 7 hours ago
To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files [...]
Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.
The point is that the agents can comprehend the huge amount of code generated and continue to meaningfully contribute to the goal of the project. We didn't know if that was possible. They wanted to find out. Now we have a data point.
Also, a popular opinion on any vibecoding discussion is that AI can help, but only on greenfield, toy, personal projects. This experiment shows that AI agents can work together on a very complex codebase with ambitious goals. Looks like there was a human plus 2,000 agents, in two months. How much progress do you think a project with 2,000 engineers can achieve in the first two months?
> What matters is whether those lines build, function as expected (especially in edge cases) and perform decently. As far as I can tell, AI has not been demonstrated to be useful yet at those three things.
They did build. You can give it a try. They did function as expected. How many edge cases would you like it to pass? Perform decently? How could you tell if you didn't try?
joduplessis | 15 hours ago
embedding-shape | 13 hours ago
simonw, I find it almost shocking that you had the chance to talk directly with the engineer who built this, and even when he directly says things that contradict what Cursor's own CEO said, you didn't push back a single iota.
Is the takeaway here that it's fine for a CEO to claim "it even has a custom JS VM!" on Twitter/X, then afterwards the engineer explains: "The JavaScript engine isn’t working yet" and "the agents decided to pause it", and this is all OK? Not a single pushback about this very obvious contradiction? This is just one example of many, and again, since it seems to be repeated: no, no one thinks this was supposed to rival Chrome, what a trite way of trying to change the narrative.
I understand you don't want to spook future potential interviewees, but damn if it didn't feel like you are suddenly trying to defend Cursor here, instead of being curious about what actually happened. It doesn't feel curious; it feels like we're all giving up the fight against unneeded hype, exaggeration and degradation of quality.
What happened to balanced perspectives, where we don't just take people at their word, and when we notice something is off, we bring it up?
On a separate note, I actually emailed Wilson Lin too, asking if I could ask questions about it. While he initially accepted, I never actually received any answers. I'm glad you were able to get someone from Cursor to clarify a bit at least, even though we're still just scratching the surface. I just wish we had a bit more integrity in the ecosystem and community, I guess.
sebzim4500 | 13 hours ago
embedding-shape | 13 hours ago
Either of them. If the CEO refuses to answer, you ask others. If you get a chance to talk with them, you ask them about it. Just ignoring the elephant in the room and hoping that the unclear details get forgotten helps no one except Cursor here.
simonw | 10 hours ago
Honestly, grilling him about what the CEO had tweeted didn't even cross my mind.
I wanted to get to the truth of what had actually been built and how. If that contradicts what the CEO said then great, the truth is now out there - anyone is free to call that out and use my video as ammunition.
I just had a look to see what Michael Truell had said about the project, here it is: https://x.com/mntruell/status/2011562190286045552
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
This doesn't strike me as the world's most dishonest tweet, though it exaggerates what was achieved. There IS a JS VM in there, but it's feature-flagged off. The "from scratch" claim is misleading because there are libraries handling certain aspects - most notably Taffy - which we discussed in the interview.
I just ran "cloc" and to my surprise it counted 3,036,403 lines (I had thought the 3M was an exaggeration), though only 1,658,651 of those were Rust.
"It kind of works" is a fair assessment IMO!
I don't think "Let's talk about your CEO exaggerating what you built on Twitter" would have added much to the interview.
I did make sure to go over the controversies I thought were material to the project, which is why I dug into the dependencies and talked about QuickJS and Taffy.
embedding-shape | 10 hours ago
> Honestly, grilling him about what the CEO had tweeted didn't even cross my mind.
That's not the full meaning of what I meant either. I'm assuming you also read the initial blog post they published? It also has a bunch of similar inaccurate statements.
> I wanted to get to the truth of what had actually been built and how.
It's a shame that you seem to have reviewed that from the point after a human stepped in to fix the codebase, which happened well after they first published the blog post. Maybe now it compiles and builds, but how does that answer the fact that it didn't at the time of publishing?
There is a "hole" of two days without commits, presumably when the engineer was busy writing the blog post, and that's the point they "sold" as "this is what was produced by the experiment". To then let them spend more human engineering time to patch the codebase, and review it from after the human fixed it, seems like completely missing the point.
> I don't think "Let's talk about your CEO exaggerating what you built on Twitter" would have added much to the interview.
What would have added a whole lot more to the ecosystem's understanding of how feasible this sort of thing actually is would have been to talk about what that same person you interviewed originally wrote in the blog post, and what turned out to be real at the time they published it.
simonw | 9 hours ago
> To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub.
> Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.
The commits that knocked the project into shape so other people could build the code were handled by agents as well.
I really don't think there's a notable scandal here.
embedding-shape | 9 hours ago
I don't think there is a "scandal" here either; companies lie and exaggerate all the time and it's becoming normalized. With that said, I think it's important to record when it happens and exactly how it happens, because not only does it help people in the future know what to look out for, it also serves as a historical record to refer to when you start to see repeating patterns.
Agree to disagree about "all of them accurate"; I've already made my case elsewhere, and it doesn't really help anyone to reiterate here what's already public.
simonw | 9 hours ago
Is this just about the usage of the term "from scratch"?
signatoremo | 7 hours ago
https://github.com/wilsonzlin/fastrender/commits/main/
signatoremo | 6 hours ago
1) The CEO said there was a JS engine, but it didn't work.
2) It didn't build when they published the blog post.
Therefore it lacks integrity! Except that it did build (I took Simon's word for it), and building a browser is beside the point; there are a few other big projects listed (Java LSP, Windows 7 emulator, Excel, etc.).
The blog stated:
"Our goal is to understand how far we can push the frontier of agentic coding for projects that typically take human teams months to complete.
This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens."
They didn't set the goal of building a browser. It's an experiment in coordinating AI agents within the context of a complex software project, yet you complained they were exaggerating about a JS engine?
The blog post itself is one of the first that describes a large-scale experiment with agents - what works, what doesn't. There is very little hype. They didn't say it's game-changing or that Cursor is the best AI tool.
lifis | 7 hours ago
simonw | 5 hours ago