st3fan | a month ago
I think it is dangerous that people like Antirez, whom we look up to, write these kinds of articles without a clear disclaimer at the top: "this is an opinion, I am not a copyright expert or lawyer".
Because, and this is a fact, most of this is opinion and wishful thinking about a topic whose legal status is largely unclear right now. There is barely any jurisprudence to refer to. The story is also incomplete because copyright law is very complicated: it is not just copying; there are also derivative works, context, ownership, and a whole thing that happens on the side of the LLM providers, etc. Nothing about this is simple or remotely similar to what we did decades ago when GNU was born.
timthelion | a month ago
Given how poor a track record actual lawyers have at predicting the outcomes of copyright cases, maybe this is just gatekeeping.
antirez | a month ago
Hi st3fan. What you write is not how jurisprudence works. For things that have never happened before, there is a grey area, like: is it fair use or not to train an LLM on XYZ? But this is different. Unless a new law is made, the old copyright law applies perfectly well when you create code with LLMs. Does it violate copyright law? Then it is a problem. Otherwise it is not. If you invent a new gun, the old laws still apply if you kill somebody: this is a trivialization, but that's how it works. The grey areas are for new things. Copyright law describes perfectly well whether some code is in violation of some other code.
kingmob | a month ago
I think the broader point is still that you aren't a lawyer (most of us here aren't), and whether it's well-trodden territory or novel, our legal opinions are less informed and less useful because of that.
Domain expertise doesn't generalize to other domains.
jrwren | a month ago
We aren't lawyers, but I see no reason why one creative work, e.g. an image, is different from another creative work, e.g. code. SCOTUS declined to look at Thaler v. Perlmutter. LLM generated work is not copyrightable. Vibed code is in the public domain.
simonw | a month ago
That's a big claim you're making there. Have you seen any commentary from genuine legal experts that agrees with your claim there?
Given that billions of dollars' worth of software has been created by serious (brand-name) companies using AI-assisted programming tools over the past 24 months, I would expect there to be way more credible commentary on this than I've seen so far.
dvogel | a month ago
That just resolved last week. Legal professionals tend to work on much longer timelines. Their "commentary" is usually in the form of law review articles. Those journals are usually published monthly or quarterly and edited very slowly.
quasi_qua_quasi | a month ago
A huge difference is that in Thaler v. Perlmutter, Thaler listed the AI itself as the work's sole author. He later tried to claim that he was the actual author because he made the AI, but the court said "no, that's not what you said on the application, you don't get to change your mind now".
kornel | a month ago
Rewriting proprietary software to make a copyleft version is good. Rewriting a copyleft version to bypass rights of users is bad.
The article's whole gotcha is based on a misunderstanding of GNU. They don't care about copyright per se; they care about people having the freedom to control their software. Licenses are merely a tool, and one that has evidently stopped working.
gcupc | a month ago
Yes, that's the thing that's being missed here. No matter what the law says, or how it ends up being interpreted, some things are good, and some are bad. Saying bad things are legal is merely an indictment of the law.
awal | a month ago
Thank you for writing the obvious. And I don't think it's a misunderstanding - it's classic post-hoc rationalization, either impulsive or deliberate. It seems a bit unlikely to me that a large portion of the rich and vocal tech community would suddenly forget the fundamental ideas behind FOSS.
dzwdz | a month ago
> Moreover, this time the imbalance of force is in the right direction: big corporations always had the ability to spend obscene amounts of money in order to copy systems,
But now they can do it for a negligible amount of money. Whereas previously buying a commercial license for otherwise copylefted software was a reasonable choice to avoid spending said "obscene amounts of money", now we've killed that business model. The FOSS funding situation was already horrible pre-AI; yet maintainers gotta eat nonetheless.
> Now, small groups of individuals can do the same to big companies' software systems:
I assume most maintainers would prefer to be paid in money, which can be exchanged for food and shelter, rather than getting paid in the ability to do "clean-room" rewrites of proprietary software.
dzwdz | a month ago
by the way, I got clickbaited :( when I saw this on the IRC channel I thought this would be an actual response from the GNU project. oh well
Picnoir | a month ago
> Moreover, this time the imbalance of force is in the right direction:
> (...)
> Now, small groups of individuals can do the same to big companies' software systems: they can compete on ideas now that a synthetic workforce is cheaper for many.
Well, this omits the elephant in the room: no small group of individuals has created one of these frontier LLMs so far. Only big corporations with access to truckloads of data and hardware are currently creating and operating them. These corporations have total control over these new tools. The bleeding-edge programming tools are not open source anymore.
If you set out to build one of those yourself, never mind the hardware and energy access, I'd bet you'd end up in jail long before you finished collecting the copyrighted corpus these big corporations collected.
I'd argue the power imbalance is even worse now than before.
jmtd | a month ago
> Tanenbaum protested about the architecture (in the famous exchange), not about copyright infringement. So, we could reasonably assume Tanenbaum considered rewrites fair
Only if you take as given that Tanenbaum believed Linux to be a rewrite of Minix. And I'm pretty sure he did not.
eminence32 | a month ago
The other thing that is relevant here, in my opinion, is the slow erosion in the popularity of the GPL in favor of less restrictive licenses like Apache/MIT. This is relevant because as tools are reimplemented (either the old-fashioned way or the new-fashioned way), they tend to not adopt the GPL. This is troubling for anyone who is a proponent of the GPL because the GPL is only relevant when there is a certain critical mass of GPL-licensed products.
gerikson | a month ago
GPL licenses started losing mindshare long before LLMs appeared. There's always been a robust counternarrative (chiefly from the BSD camp) against the views of Stallman/FSF on how to best organize non-restrictive software licensing.
I don't know the exact numbers, but I would wager that before the widespread adoption of LLMs, the share of non-GPL licenses on places like GitHub was maybe 80%. Of course, some projects are more "foundational" than others, so just looking at raw project counts is misleading.
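For what it's worth, here is a rough sketch of how one could check that wager. It assumes the GitHub search API supports the `license:` qualifier and returns a `total_count` field (and ignores rate limits and authentication); the example numbers at the bottom are made up.

```python
# Rough sketch of estimating the GPL vs non-GPL split among repos.
# fetch_count assumes GitHub's repository search API and its `license:`
# qualifier; rate limits and auth tokens are ignored in this sketch.
import json
import urllib.request

def gpl_share(counts):
    """Fraction of repos under any GPL variant, given {license_key: count}."""
    gpl = sum(n for lic, n in counts.items() if lic.startswith("gpl"))
    return gpl / sum(counts.values())

def fetch_count(license_key):
    """Assumed endpoint: repo search filtered by license, total_count field."""
    url = f"https://api.github.com/search/repositories?q=license:{license_key}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["total_count"]

# Made-up counts for illustration; real numbers would come from fetch_count.
sample = {"gpl-2.0": 10, "gpl-3.0": 10, "mit": 60, "apache-2.0": 20}
print(gpl_share(sample))  # 0.2
```

As the comment says, raw counts like this would still overweight throwaway repos relative to "foundational" projects.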
conartist6 | a month ago
I'm just not sure the difference between MIT and GPL matters that much here. Whatever the license and copyright was before, now it's just a sticker that you can peel off with an AI. Even as an MIT user I was very much relying on having a way to use copyright to claim real estate for free software!
gerikson | a month ago
The GPL has an entity that more or less actively polices the license (indeed, I believe this is the primary source of funding for the FSF). MIT/Apache doesn't have that. If someone doesn't attribute your authorship when using your code, you're basically reduced to trying to sue for damages yourself.
pointlessone | a month ago
Is the GPL relevant if any software can be generated on the spot? The point of the GPL is to democratise software, to take away exclusivity from Big Corpos. Do we still need the GPL if pretty much any software can be made at any time? And under a permissive license at that. That is, while not GPL, the code is still available if you want it for some reason, and you can still use it.
gerikson | a month ago
> Do we still need GPL if pretty much any software can be made at any time?
It's easy to imagine a future where the purveyors of GenAI gate access to certain features behind higher prices.
Generate a cute flyer for a birthday party? Free.
Generate fan-art? $10/month for 100 pieces.
Generate software? $200/month, because you need to pay it to keep up with the competition.
Don't mistake the current all-you-can-eat buffet as anything other than VC-funded companies and "Big Corpos" loss-leading to entrench GenAI in all parts of society, in expectation of collecting rent once they succeed.
eminence32 | a month ago
I think the GPL is even more relevant, not less. The point of the GPL is to protect the source code in such a way that users can continue to access it if it gets modified and distributed. In a world where AI code generators are prevalent and the barrier to making code changes plummets, I think a GPL proponent would be even more concerned that code changes remain accessible to everyone.
zetashift | a month ago
This was/is one of my first thoughts as well. I would like to use strong copyleft licenses more, because putting something out there for people to enjoy/experience and having it ripped off by an LLM to serve to users for a subscription, without attribution, feels bad.
toastal | a month ago
Then I think we need to adapt the licenses, as the current GPL isn't built for this.
gcupc | a month ago
Maybe, but if it turns out that a slop-clone is not legally considered a derived work, there's not really anything a license can do about it. Any recourse you might have is at best outside the realm of copyright law and copyright licenses.
st3fan | a month ago
I think software, specifically source code, is in general becoming less relevant. It is now an option to roll your own instead of using open source or closed source or a proprietary OS-provided library, business-licensed software, etc. You can ask an AI to build you software, either something unique or something following a specification.
Crazy story as an example:
I was looking at https://github.com/openai/symphony which is a project OpenAI did in Elixir. Their README literally says: "if you do not like our implementation in Elixir then run our 2100 line SPEC.md past your coding agent and ask it to implement this project in your preferred language"
This just blows my mind. We've gone very rapidly to a situation where software can be a natural-language specification that you feed into a program, and a program rolls out of it as a result. Do I now own that? Can I put any license on that? Is it now 100% mine without worries? (IANAL, but the answer is most likely yes?)
Software as we know it is not dead yet, but you can see where things are heading.
jrwren | a month ago
Not mentioning Thaler v. Perlmutter misses the most important part of the story: vibe-coded source code has no copyright in the USA.
satvikberi | a month ago
The Thaler vs. Perlmutter opinion is very readable, I encourage people to read the first few pages: https://media.cadc.uscourts.gov/opinions/docs/2025/03/23-5233.pdf
It addresses a much narrower claim: whether AI can be legally considered the sole author of a work of art for the purpose of copyright. It explicitly does not opine on whether Thaler would have been granted the copyright if he listed himself.
st3fan | a month ago
I find it a bit difficult to point at that case because it is very different. It is a case where someone tried to copyright a completely new work of art (a picture). That copyright application was rejected by the Copyright Office, which basically said "AI cannot be an author under copyright law". (This is what Thaler tried to do: he tried to make his program, the Creativity Machine, the owner of that copyright.) And the Perlmutter in this case is the person representing the Copyright Office, not the owner of the original work. AFAIK there is no pre-existing original work being copied in that specific case, just the AI-generated image itself (the "A Recent Entrance to Paradise" visual artwork).
It is of course relevant for GenAI in general, but I think that is where the similarities end. "Reimplementations of software" is about software, where there is an original and a reimplementation. And a reimplementation is usually not a completely new original work: there may be API similarities for compatibility (as with chardet), or pieces of code that map 1:1 to another work.
I'm sure Thaler v. Perlmutter is relevant in some way, but we haven't had a real case about software yet, so it is really unclear what would happen there.
cultpony | a month ago
I think focusing on Thaler v. Perlmutter is a bit of a false goal to hunt down, because ultimately the appellate courts and the Supreme Court both basically said the guidance of the Copyright Office was correct. If we presume this extends to their entire guidance on AI works (found here: https://www.copyright.gov/ai/ai_policy_guidance.pdf ), then AI works are only copyrighted when the human behind the AI was the actual author of the work and used the AI to give it form. If the AI is merely following an automatic process, there is no copyright.
So if we simply tell an AI to implement a cleanroom rewrite (with two separate teams of AI agents for the two sides of the cleanroom), it would be a purely mechanical process that forms no basis for copyright.
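To make the "two teams" idea concrete, here is a purely illustrative sketch of the shape such a pipeline would take. `spec_agent` and `impl_agent` are hypothetical stand-ins for real LLM calls, not an actual API:

```python
# Illustrative sketch of a two-team AI "cleanroom": team 1 only writes a
# behavioral spec, team 2 only ever sees that spec. The agent functions
# below are hypothetical stand-ins for real LLM calls.
def spec_agent(source_code):
    """Team 1: reads the original code and emits only a behavioral spec."""
    return "expose add(a, b) returning the sum of its two arguments"

def impl_agent(spec):
    """Team 2: implements from the spec alone, never seeing the original."""
    return "def add(a, b):\n    return a + b\n"

def cleanroom_rewrite(source_code):
    spec = spec_agent(source_code)
    # The firewall: the spec must not carry the original text across.
    assert source_code not in spec
    return impl_agent(spec)
```

Whether a court would treat the output of such a purely mechanical process as uncopyrightable, as infringing, or as neither is exactly the open question.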
SamRW | a month ago
I personally think that taking code that somebody wrote, feeding it to Claude Code, and asking it to create a rewrite of it for the purpose of avoiding a copyleft license is tempting, but a bad idea.
I think it's unlikely the code will function in the same way, i.e., it won't be compatible in any reasonable way, and it will have new bugs that are not present in the original implementation. If we base expectations on the performance of existing vibe-coded software, this will have atrocious performance and security implications (e.g., "How vulnerable are vibe-coded apps?").
You will have to maintain this code yourself. Depending on what code you're trying to steal, this may be a very large codebase with a lot of dependencies that themselves need, e.g., security updates. Maintaining vibe-coded software is notoriously painful, based on the experience of some people I know.
IANAL, but I don't think you can legally grab an open source library, feed it to an LLM, and take whatever output comes out as clean code. There have been plenty of examples of AI-generated images that still carried the watermark of the original author, for example. So the regenerated code may contain copyrightable material. You would need to review it carefully.
This is not the same as grabbing the Java API and reimplementing it. Even cases like that are not black and white; Google LLC v. Oracle America, Inc. took years of litigation. If I were running a business I'd rather not take on this legal risk.
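If someone attempts such a rewrite anyway, at minimum the output should be checked for verbatim overlap with the original. A minimal sketch using stdlib difflib; the snippets and thresholds are invented for illustration, and this is no substitute for an actual legal review:

```python
# Flag verbatim overlap between original and regenerated code using
# stdlib difflib. Snippets and thresholds are invented for illustration.
import difflib

original = (
    "def parse_header(data):\n"
    "    if data[:4] != b'GIF8':\n"
    "        raise ValueError('not a gif')\n"
    "    return data[4:]\n"
)
regenerated = (
    "def read_header(buf):\n"
    "    if buf[:4] != b'GIF8':\n"
    "        raise ValueError('not a gif')\n"
    "    return buf[4:]\n"
)

matcher = difflib.SequenceMatcher(None, original, regenerated)
match = matcher.find_longest_match(0, len(original), 0, len(regenerated))
ratio = matcher.ratio()
if ratio > 0.8 or match.size > 40:  # arbitrary cutoffs for the sketch
    print(f"suspicious overlap: ratio={ratio:.2f}, longest run={match.size} chars")
```

Renaming identifiers fools neither this check nor, presumably, a court looking at substantial similarity.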
dsr | a month ago
What's right is not the same as what's legal. What's legal always trails behind technology changes.
The correct thing to do is to decide what's right, and then work to implement that. To do otherwise is to be an enemy of civilization.
(Have I just named Meta, Google and all the AI companies as enemies of civilization? So be it.)
bitshift | a month ago
The founding fathers didn't get everything right, but I think about this bit from the US Constitution all the time:
> The Congress shall have Power [...] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;
Specifically the bit about promoting progress.
I have my own opinions on how we should "fix copyright" or how LLMs ought to be used, but I'm not ideologically committed to (say) ten-year copyright terms—I'd happily support any change to copyright if I thought it would make the field of computing more vibrant for the next generation.