gonz | a day ago
For what it's worth, here's what Jarred Sumner said on HN about this:

> This whole thread is an overreaction. 302 comments about code that does not work. We haven't committed to rewriting. There's a very high chance all this code gets thrown out completely.

> I'm curious to see what a working version of this looks like, what it feels like, how it performs and if/how hard it'd be to get it to pass Bun's test suite and be maintainable. I'd like to be able to compare a viable Rust version and a Zig version side by side.
flyingsnake | 22 hours ago
"A lie can travel halfway around the world while the truth is still putting on its shoes."
Most pop-culture takes on tech are simply based on knee-jerk reactions.
jkaye | 20 hours ago
I mean, sure, but is a Rust version with 1000 global static muts really a "viable" Rust version in any way that matters?
I don't think that Jarred is lying here or anything like that, but I think that people have wildly different definitions for what viable means and what is reasonable to expect from an experiment like this. The result of this port is going to be so far away from reasonable Rust that I don't think anyone is going to be able to get much information out of it regardless of what test suites it passes.
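For context, here's a minimal hypothetical sketch (not code from the Bun branch, names made up for illustration) of why a pile of `static mut` globals is a red flag: a Zig-style mutable global ported literally as `static mut` forces `unsafe` onto every access and gives up the compiler's data-race checking, while the idiomatic equivalent keeps both:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Literal port of a mutable global: every access needs `unsafe`,
// and the compiler can no longer rule out data races.
static mut REQUEST_COUNT_RAW: u64 = 0;

fn bump_raw() -> u64 {
    unsafe {
        REQUEST_COUNT_RAW += 1;
        REQUEST_COUNT_RAW
    }
}

// Idiomatic equivalent: safe to call, and sound under concurrent access.
static REQUEST_COUNT: AtomicU64 = AtomicU64::new(0);

fn bump() -> u64 {
    REQUEST_COUNT.fetch_add(1, Ordering::Relaxed) + 1
}

fn main() {
    println!("raw: {}, atomic: {}", bump_raw(), bump());
}
```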
strongly-typed | 20 hours ago
I think the point is that it becomes a stepping stone to a more reasonable rust implementation.
jkaye | 15 hours ago
Possibly. Given the current state of the ported code, I don't really see how that could be the case, but I could very easily be wrong about that.
ThinkChaos | 7 hours ago
I'm dealing with something similar at work: a vibecoded k8s operator POC we have to complete and ship (in 2 months total). It's a very shaky foundation, most of the code would not pass review, it's very longwinded and roundabout. I resent being made to prioritize shipping something half broken.
I can't speak for the Bun branch, but my experience comparing to other projects is starting from scratch is a slower start but worth it medium term in knowledge, maintainability, and stability.
pyfisch | a day ago
The regular pull requests for bun are wild too: https://github.com/oven-sh/bun/pulls?q=is%3Apr+
Most are created autonomously by @robobun, checked for duplicates with a GitHub action (powered by Claude), reviewed by @coderabbitai and @claude. Meanwhile the CI is broken and @robobun finally closes a portion of its own PRs because they duplicate other PRs it has written. (Merging into main is still done by a human.)
[OP] pscanf | a day ago
Oh, I did not expect this. I remember Bun being praised for Jarred's "obsession" with performance. I wouldn't have thought they'd let LLMs run wild on the codebase.
gonz | a day ago
One of his biggest obsessions for a while has been LLMs and maximizing his usage of them, as far as I can see. It 100% tracks with his interests, and it tracks with his views on open source development.
kolja | a day ago
What a ride that links to… These people have no idea what a community is, do they?
kingmob | a day ago
To be fair, I've met a lot of human FOSS maintainers that have no idea or interest in community, either.
I'm deeply uninterested in Zig as a tech, but I do give them kudos for saying they want to build community (as part of why they reject AI-assisted PRs).
kolja | 21 hours ago
Sure, I know of those, and I think that's a valid way to lead a project.
But FOSS in general is kind of like a community, and a world where most or all of FOSS is just a BDFL-led repo being worked on by the BDFL's favorite LLM is not that. Without any required interaction, it'll just devolve into silos of fiefdoms that publish binary artifacts at some interval.
Ah well, it's not like I expect people espousing such views to be able to recognize the difference between a social movement and a readable source repository.
gerikson | 21 hours ago
I'm weirdly committed to the idea that FLOSS is just the license and the software, not any source of community. There should be zero expectation that the creator of the software has to accept any sort of input from other people.
You can argue that doing so makes the software better and development more effective, but that's not really anything inherent in the license.
(forking the project is always an option)
edited to remove a restatement of your point.
Corbin | 19 hours ago
The license implies a community of lawyers. The software implies hardware, which in turn implies a community of electrical engineers. It is a curious epistemic helplessness to insist that there is not a community of software engineers who author the software, particularly in light of Conway's Law, which bridges the software to the community via homomorphism.
wrs | 15 hours ago
There are plenty of software engineer communities. But putting a FLOSS license on something isn’t a membership card. I can own a motorcycle yet not be assumed to hang out at biker bars.
intelfx | 15 hours ago
Yes, there should be zero expectation of a community, but there is a culture of doing so.
Just like in many other things in life, if something is legal, it does not necessarily mean that it is welcome.
kolja | an hour ago
I fully agree. I'm kind of antisocial-tinker-on-my-own-stuff as well, and could not expect people to behave against their own preferences in their own happy place.
But I would argue three things:

1. The FLOSS community is (maybe incidentally) mostly built on people, not projects, even with many behaving like you describe.
2. There would be no FLOSS without the community (in the sense that what exists would not have formed if everything FLOSS was read-only).
3. Arguing for the total absence of shared work between people (what I read Sumner as doing) is antithetical to the whole FLOSS thing.
alper | 55 minutes ago
The guy who called an entire segment of software engineers "monkeys" wants to build community??
peter-leonov | 15 hours ago
This is what AI hit hard first, way before software 😕 I also miss visiting my local library and seeing some other freak giving back a stack of books they've read, reminding me to keep reading too.
Cloudef | a day ago
> Unlike humans, they'll conform to your style guide and implement your every nit in minutes. They'll write extensive tests & documentation. They'll research alternative implementations from competing projects. They'll grind relentlessly until all 3rd party conformance tests pass.

That's crazy. It's like admitting you just want an efficient way to steal work written by actual humans.
Internet_Janitor | 21 hours ago
Endorsement of LLMs is equivalent to endorsing plagiarism, just with some extra steps.
kornel | 10 hours ago
There's certainly something sinister in corporations taking freely available works to turn them into commercial machines that devalue the work and replace the creators who made it possible in the first place.
The copyright laws have failed to anticipate and prevent the extreme case of "fair use" turning into something deeply unfair.
But at the same time, the rejection of LLMs gets uncomfortably close to rejection of fair use and remixing. Before LLMs, these were the good things! We were supposed to build upon, transform, and improve existing works.
I wish we had better ways of raging about the brazen exploitation by resource-gobbling AI companies, the unfairness and externalities of LLMs, without taking a stance that even a trace amount of someone else's work in yours is a crime or moral failing.
What LLMs do has gone beyond simple reproduction of copyrighted material. You can make them create blatant ripoffs if you ask, but you can also make them create a mashup that wouldn't be called plagiarism if it was done by a human. So the damage and exploitation of LLMs is something else, something new, more than simply plagiarism.
Internet_Janitor | 9 hours ago
I think "fair use" is an entirely flawed framing. A given LLM output is diffusely, stochastically dependent upon tiny fractions of countless documents from the training corpus, but the model itself is built on an enormous number of entire documents. With the right techniques (the right prompts, fine-tuning, model ablation, etc.), a significant quantity of the training corpus is recoverable from such a model verbatim, and a great deal of material is recoverable to a lossy approximation: the model is a compressed representation of creative material divorced from its attribution.
"Remixing" is well and good with the consent of the authors or for works in the public domain, but LLM vendors very explicitly ignore the wishes of the authors of all the writing they slurp up from the public web, divorced from attribution or any licensing terms laid out by the author. Without the non-consensual harvesting of non-public domain works at massive scale, willfully obliterating the authorship of the training material, none of the commercially available LLMs could exist.
We don't need to invent a new term for the misrepresentation and erasure of authorship simply because a bunch of sociopathic billionaires have developed a method of ripping off millions of people at once. You can choose to split hairs about whether or not the person turning the crank to use the model is directly plagiarizing or simply benefitting from plagiarism committed by the creators of the model, but either way they're endorsing the act of creating the model and by extension endorsing how the sausage is invariably made!
kornel | 7 hours ago
I do think we need separate treatment of LLMs, because if what you're saying was applied to other technologies, we'd have a copyright-extremist dystopia.
A Web Search index and the Internet Archive contain an enormous corpus of entire documents. With the right prompts they output entire documents verbatim. The copyrighted documents generally have been collected without explicit consent of the authors, and often without knowledge about licensing terms (possibly violating them).

These sites have best-effort attribution, which is a decent thing to do, but it's meaningless from a licensing perspective. Licenses that grant rights under condition of attribution tend to require specific formats that may not be met by default. These sites don't do any due diligence to correctly attribute individual works. They rely on copyright exceptions that allow them to ignore all terms set by the copyright owners.

Copyright law does split hairs on who turns the crank and how. Reproduction of copyrighted material against the wishes of copyright owners is allowed for a web search index, but doesn't automatically extend to users clicking Save As on its pages.
Speech Recognition can be built from slurped public content, wilfully obliterating authorship of the training material with no ability to attribute specific sources.
Fanfic and parodies are often created against the wishes of the authors of the original works. Reviews and critiques reproduce copyrighted material, often verbatim.
There are also grayer areas. Machine Translation has shafted professional translators like LLMs shafted everyone else, but it slipped through without causing such outrage. People focus on how bad it can be rather than on the fact that every translated word is a femto-plagiarism of some translator.
Music samples and collages are created from fragments of copyrighted works of other authors. Licensing and attribution are a gray area there, often done just in case and to avoid bad PR rather than as a strict copyright requirement. There have been entire music genres (and big commercial successes) created from ripped samples.
We also already have cases where strict copyright kills projects like software emulators, even in cases where the emulator is independently developed and you want to emulate a game you own, because e.g. the hardware and games perform a handshake that checks for presence of a specific copyrighted and trademarked logo.
tonyarkles | 15 hours ago
I don’t really disagree with you, but with a few pieces of nuance that I think are worth considering:
Training corpus + my series of prompts can result in code that does not exist in the training corpus. Code that you could search the entire internet for and discover that it is in fact novel, not just by using different variable names but because it’s doing something new.
Even before LLMs, there’s been a blurry plagiarism line. Concrete example: If I’m writing “pseudo-object-oriented” code in C and base it all off of structs full of function pointers that all take an explicit self parameter as their first argument… am I plagiarizing something? I’ve seen that pattern in open source code before. I think, possibly, the first place I personally encountered it was writing Quake 2 mods in the late 90s. If I’m writing a short story and start it with “It was a dark and stormy night…” am I plagiarizing someone? If I put together a todo list application and the majority of the code looks a lot like an Intro to React tutorial, am I plagiarizing?
intelfx | 15 hours ago
"Free software" implies a freedom to study and reuse ideas. If you think that studying code and reusing ideas is "stealing work", what are you doing in the free software community in the first place?
Cloudef | 14 hours ago
Sure. But what I'm saying is that if you are simply using software that scans everyone's projects and autocompletes the code with zero effort (licensing be damned), and you benefit from it financially, you are kind of an asshole.
intelfx | 14 hours ago
Nobody said that “copying code” has occurred, let alone “licensing be damned”. For your information, copyright only protects expression, not ideas. Ideas and concepts can be studied, learned and borrowed without regard to any licensing.
kornel | 9 hours ago
It is generally legal to be an asshole.
Copyright is not a law of nature, but a construct meant to balance authors' ability to benefit and control their works with broader benefits to society and access to culture in which the works exist.
LLMs have changed the situation dramatically. They didn't just upset the balance, they don't fit copyright's model of the world. We have a paradoxical machine that used and benefitted from everyone's works, arguably collectively harmed all authors, but can produce endless derivative works that by copyright's standards don't harm any author in particular.
muvlon | 3 hours ago
> For your information, copyright only protects expression, not ideas. Ideas and concepts can be studied, learned and borrowed without regard to any licensing.

The first part is true, the second not really. Ideas and concepts are not the subject of copyright law, but they can be patented.
intelfx | 3 hours ago
Well, yes, in the context of the whole discussion we are talking about copyright licensing, not patent licensing.
mort | 3 hours ago
What are you talking about? Who is studying and re-using ideas here?
nickmonad | a day ago
This is nuts. Looking at the "Idiom map", it's absolutely littered with `unsafe` and is going to produce a ton of non-idiomatic Rust code. The mapping for `@fieldParentPtr("field", ptr)` is particularly wild. Although I guess this "phase A" is just about the (basically) line-for-line translation, and they will attempt to prompt-refactor into more idiomatic and maintainable Rust. The issue is, of course, language design can and does push implementations into a particular direction, and there will be some gnarly things to unravel. I suspect to the point where a good ol' fashioned rewrite would have been the better path.
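For readers unfamiliar with the builtin: Zig's `@fieldParentPtr` recovers a pointer to the enclosing struct from a pointer to one of its fields (like C's `container_of`), so any literal Rust mapping ends up as raw-pointer arithmetic. A hypothetical sketch of that shape (made-up types, not the branch's actual idiom map):

```rust
struct Node {
    id: u32,
    link: Link, // intrusive hook embedded in the parent struct
}

struct Link {
    next: *mut Link,
}

/// container_of-style recovery of the enclosing `Node` from a `*mut Link`.
/// Safety: `link` must point at the `link` field of a live `Node`; this is
/// exactly the kind of aliasing-sensitive pointer math Rust cannot check.
unsafe fn node_from_link(link: *mut Link) -> *mut Node {
    unsafe {
        link.byte_sub(std::mem::offset_of!(Node, link)).cast::<Node>()
    }
}

fn main() {
    let mut n = Node { id: 7, link: Link { next: std::ptr::null_mut() } };
    let link_ptr = std::ptr::addr_of_mut!(n.link);
    let parent = unsafe { node_from_link(link_ptr) };
    assert_eq!(unsafe { (*parent).id }, 7);
}
```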
denys_seguret | a day ago
Making it idiomatic and safe can probably be done in a following step by the LLM. The convenient feature of AIs, especially when you have the funding of Anthropic, is that they don't get tired of refactoring the same code.
peter-leonov | a day ago
Exactly. It's like eating your broccolis first.
Manual or automated, first massage the original code into a portable shape. Remove as much "magic" as possible.
Then port the code as is to a new framework, new language version, or even a new language. Ideally, file and line accurate to be able to quickly refer to the source material in case of doubt.
Find the mismatches between the src and dst set of hidden WTFs, and iron those out. Make the tests green, run some shadow prod traffic, fuzz it, what have you.
Now, you're in a safe place to cast it into the shape you've originally wanted and started the whole process for 😋
TypeScript native port is a great example.
[OP] pscanf | a day ago
Apart from the migration itself, I find the approach that they're taking particularly interesting.
The `docs/PORTING.md` file I've linked to contains a set of ~300 rules for the port, which I would think is way too many for any LLM to "keep in mind" and follow faithfully. Since Anthropic owns Bun, they probably have an infinite token budget for this port, so maybe they can spawn `number of files * number of rules` agents to "ensure" they're all followed.

Also, they're splitting the port into two phases: A -> crudely port each file in isolation, expect compilation to be broken; B -> wire everything up so it compiles. For phase A, they're telling the agent that the code "does not need to compile", and to score each ported file with low/medium/high according to the output quality (low = logic is wrong; medium = logic is right but doesn't compile; high = logic is right and should compile).¹
This is the opposite of how I understand and use coding agents, so I'm curious to see how it plays out. (In my mind, telling the agent that the code does not need to compile, and thus not giving it a clear "end goal", would yield very unpredictable results, and would leave me in phase B having to review a mountain of code that I have no confidence in.)
¹ Running a quick grep, out of 1279 such output-quality scores ~3% are low, ~80% are medium, and ~17% are high.
kingmob | a day ago
300 rules seems ok to me at the time of writing in May 2026. That PORTING.md is ~16k tokens, and they focus on the primary task. Not too bad, and probably ideal for a fresh subagent.
[OP] pscanf | a day ago
In the sense that a single model has that much attention budget? I actually never tried to push the limits to see how many "things at the same time" a model could pay attention to, but my intuition would have given a figure one order of magnitude lower.
simonw | a day ago
A year ago there's no way models would have usefully followed a 16,000 token porting guide like this.
Basically every frontier model released since then has boasted "improved instruction following". My intuition is that they can probably handle this level of complexity reasonably well now. It's going to be interesting to see how well this experimental branch works.
kingmob | a day ago
My experience based on the last 2 months is that this is perfectly usable.
Part of the problem with LLM-assisted coding is how fast our experiences get out of date.
alurm | a day ago
I'm also very curious about how this actually works. Is there some reliable data about this?
jaredkrinke | a day ago
Do they actually want to transition to Rust or is this a test of Anthropic's LLMs?
jrgtt | a day ago
The conspiracy theorist inside of me says they are doing this because of Zig's recent ban on AI-assisted contributions and Matthew Lugg's roast of Bun's fork of the language. It would be a big flex from Anthropic to move Bun, one of the most popular projects written in Zig, away to another language in such a vulgar display of power.
I don't fully believe this drama-fuelled theory myself; it might just be Anthropic's deep pockets trying to showcase their products.
thesadsre | a day ago
Were Matthew's comments a roast? I think he gave an excellent explanation of the whys and it did not feel like he was dunking on anyone's efforts.
gonz | a day ago
I don't think there were any real efforts by a human being to dunk on.
thesadsre | a day ago
The linked thread was discussing bun's upstream attempts of zig's compiler optimisations, nothing to do with bun's rust rewrite.
gonz | 21 hours ago
I don't think humans were involved in the Zig compiler optimizations either is what I mean.
mitsuhiko | a day ago
If I were to guess they want to test the waters of what a Rust version of Bun might look like. Since they are now owned by anthropic they have tokens to spend and it might make exploring an alternative path viable. I doubt this has anything to do with Zig's stance on AI contributions.
I will however say that as someone who is using agents to program both Rust and Zig, I find it significantly easier to get an agent to work well on a Rust codebase than a Zig codebase and it makes me wonder if that is a related motivation.
alexandria | 21 hours ago
> Matthew Lugg's roast of Bun's fork of the language.

Fascinating, I hadn't seen this before. From the link:

> The rewritten type resolution semantics were designed to avoid these issues, but Bun's Zig fork does not incorporate the changes (and has not otherwise solved the design problems), which means their parallelized semantic analysis implementation will exhibit non-deterministic behavior. That's pretty much a non-starter for most serious developers: you don't want your compilation to randomly fail with a nonsense error 30% of the time.
I think that perhaps the prevalence of LLMs in the industry implies that people are perfectly fine with heisenbugs and something that fails 30% of the time, unfortunately
kolja | a day ago
Remember the kind of people in charge of these companies. My money is on your first bet. "You are breaking up with me? Not if I'm breaking up with you first!"
abnercoimbre | 19 hours ago
Folks should read Careless People, as it touches on figures outside Meta too. Personal grievances rule over these folks' psyches.
skade | a day ago
Is the Rust side of things that much LLM-friendlier?
wezm | a day ago
We'll have to wait and see.
peter-leonov | a day ago
Looks real interesting.
The only issue I see is that the set of rules to handle at once is impossible to verify without hammering the output through a ton of code review cycles. I get it, they've got tokens, but after such a transformation it's going to be virtually impossible to verify the code. Tests have to go through the same, so what's left from the ground truth then?
Still, an amazing experiment by all means. Good luck!
kraxen72 | 21 hours ago
let's see how the nodejs killers are doing:
deno failed to take the world by storm, offering marginal improvements like the permission system, typescript ootb etc., which are all being backported to nodejs as well.
from my experience with it, the dx is not meaningfully better, and sometimes worse due to awesome tools like pnpm. sure, it may be faster, but i haven't really noticed any issues with nodejs's perf in my usage over the last 5-6 years.
jsr also failed to take the world by storm, with the community instead building npmx which now offers a much better experience than jsr.
i will say i do use deno's std library, it's pretty nice, e.g. @std/collections
bun's huge scope raised many red flags for me, but i was cautiously optimistic about it, since it seemed that jarred is committed to really making it the best, be-all end-all js runtime even if it'll take years. however, seeing how it got acquired by anthropic, and how much vibecoding/agentic development jarred now does (+ this news of a vibe-port, even if it's an experimental branch), really cemented it for me as a project i won't be using.
so, nodejs reigns supreme! (seriously, the built-in typescript stuff and sqlite is nuts!)
finally, i'm not fully anti-ai and i think vibecoding a little python script/webapp to solve a super specific problem you have is fun (and useful!), time and time again i am being convinced that large-scale "agentic development" in any complex project with many moving parts and users is a net negative, which temporarily speeds up feature development, but ultimately causes the project to destabilize and bloat up (e.g. vscode, cursor, mise, perplexity webui (seriously, how difficult is it to make a dropdown with js that doesn't lag when scrolling?)...). i hope more big tools/libraries/projects i rely on recognize this and adopt a strict(er) anti-ai policy.
carlana | 20 hours ago
I am glad Deno and Bun existed because they forced Node to be better (Node deliberately didn't use to implement btoa, for crying out loud!), but I think they will ultimately just be footnotes.
kraxen72 | 15 hours ago
yeah, i agree. should've made that more explicit in my comment.
pmarreck | 23 hours ago
literally the wrong direction LOL
I haven't met a single Rust maintainer of a medium-size (hundred-thousand-line+) project who is happy with how it scales up on larger codebases
matklad | 22 hours ago
Nice to meet you!
I do think Rust is exceptionally great for larger code bases, see:
It does take a bit of knowing what you are doing to wield it efficiently though:
No informed opinion how Zig works relative to Rust in these contexts (TigerBeetle is quite special in how it is engineered).
emk | 21 hours ago
Background: I have maintained, oh, a half dozen Rust code bases in between 10,000 and 100,000 lines, but nothing over 100,000 lines. And I know of a few cases where a 20,000-line Rust program implements the essential features of a 300,000 line Go program. So my situation isn't exactly what you described.
But in that 10,000-100,000 line range? Oh, my, I am super happy with Rust as a maintainer. That's small enough for a program to be tightly focused, but large enough to implement one core idea thoroughly.
In this size range, both the language itself and the tooling are great assets, at least in my personal experience.
sunshowers | 17 hours ago
cargo-nextest is currently 84k non-blank/comment lines of Rust code, and I don't think it's feasible for me as a solo maintainer to write it to my personal standard of excellence in any other language.
paulocuambe | 4 hours ago
can you elaborate?
joshka | an hour ago
I've maintained a crate that's about 50K LoC, and a product that is around 700K LoC (of rust code - much larger when you add in other languages).
Both are lovely to work in. The latter has some compile time issues, which are being sorted out. I'd hate to see it in go/zig/ts because rust syntax is really good once you get to know it, and the other languages are pretty meh on conveniences / affordances in comparison.
dhruvp | a day ago
Maybe because of this - https://github.com/oven-sh/bun/issues/28001.
invlpg | a day ago
Explain how exactly Rust would have caught this? This isn't a temporal memory safety issue.