> Yet, this shift made me re-evaluate the open source code publishing. Prior to that, I have been positive about free and open software, and considered this to be the default mode for work such as kefir. I did not require any justifications from myself to publish something. Now, however, I feel more and more that the main beneficiaries of my unpaid work are companies scraping the internet to train large language models. Currently accepted status quo in this area goes against my own intentions in licensing this work under GNU GPLv3. Publication has ceased to be the "null hypothesis" for me, and requires explicit mental justification which I am not able to provide.
I feel this pain, one of my small donation driven sites has been destroyed by crawlers who just ignore robots.txt and burn the site into the ground.
Sort of jokingly I proposed an update to the "spam fax" law:
Really hate to say it, but I’ve stopped publishing my work too for this reason. I spend most of my time now building my own little software ark, and I aspire to no longer think of programming in the next few years. I feel like the creative economy in general will be unrecognizable in the near future, maybe nonexistent. I wonder what modes of collaboration on ideas might form in the next few years.
Here is what the purveyors of AI don't seem to realise. You can bend copyright law all you want in order to train your models on whatever you can grab, but in the absence of genuine protection of their creative work authors are simply not going to be publishing at all.
Without any material or immaterial benefits? And with one's work being ground up and turned into weights for the next version of the machine that's threatening one's employment?
I think they see it all too well. They still think they can make bank today while it lasts, whatever comes after is some other shareholder's problem. And if we're talking about open source, killing it might be a positive side effect, they'll be ready to sell you a closed source alternative when you no longer have options.
Furthermore, if people not only stop publishing, but also take down already published works, it will create a moat around already existing Language Models
And the more they DDOS small websites — instead of respectfully scraping once — the more realistic my conspiracy theory looks.
The sad thing is I feel trapped on all sides of the debate. I wrote a book about LLMs and human creativity (spoiler: humans win for a long time). I was going to do it as a blog series, but instead I published https://www.amazon.com/dp/B0GXCSY4W8 because I felt I might at least get a bit back for the literally hundreds of hours of my life I poured into the book, and for my editor and the friends who read it and provided reviews.
And I push a lot of open source code, including a ton for the SWGEmu project, but now I’m of mixed mind about whether to stop pushing anything public. I can’t decide. Am I talking out of both sides of my mouth? It’s a confusing time to navigate, for sure.
Indeed sad, congrats on publishing your book though. I’ve certainly felt a bit of that same angst myself.
I think SWGEmu (cool project, just learned of it from you!) does represent some optimism though. Maybe these sorts of passion projects will take over the space?
People who break the social contract are the ones responsible for breaking the social contract, not the ones who take steps in response to social contract being broken.
People who take steps in response to social contract being broken are the ones responsible for the steps they've taken, not the ones who break the social contract.
Are you asking how AI coding agents, the companies selling them and the individuals using them break the FOSS social contract (copyleft, attribution, upstreaming), or are you disputing that they do?
There seems to be an implicit premise here that any work generated by an LLM whose training data includes a particular bit of code itself constitutes a redistribution of that code. I've yet to encounter any strong arguments substantiating this premise as a general principle, and my own suspicion is that it is not valid as a general principle, given the nature of how LLMs operate.
It's certainly possible that specific instances of LLMs lazily copy-pasting code from public repos may exist, and the extent to which this is happening is something that can be substantiated by empirical examples, so if you have any to point to, I'd be interested in looking at them. However, where this is happening, it ought to be regarded as a failure modality of LLMs, and not something that implicates the underlying nature of LLMs, given that their intended purpose is to function as stochastic generators that do not merely copy-paste input data.
My initial feeling here is that using open-source code to train LLMs is not per se a violation of the generally accepted FOSS social contract, but rather that attempting to restrict specific use cases of FOSS-licensed code on the basis of normative opinions unrelated to the license terms is a violation, or at least a rejection, of that social contract. I'm not fully committed to this position, though, and would welcome well-reasoned arguments to the contrary.
Yes but my answer would be different. It can be either about what coding agents do (and you'll see that it breaks the social contract), or it can be about what the FOSS social contract is (and you'll argue that coding agents don't break it.) Lo and behold, it was the latter.
> There seems to be an implicit premise here that any work generated by an LLM whose training data includes a particular bit of code itself constitutes a redistribution of that code.
Not any work. But if a specific work was generated based on a specific open source work, then according to the social contract that binds non-AI code generators such as transpilers, the output is derivative and should follow the license of that open source work.
There's also the question of whether the model itself is a redistribution. For every other lossy compression algorithm in history, the answer is a resounding yes. Is a model meaningfully different from a hypercompressed corpus of its learning data?
The social contract of open source (not to be confused with the legal contract of GPL, MIT etc.) is that developers give users software that they can use and modify in any way they want, and in exchange the users give the developer recognition and help with development and maintenance, as well as give each other the assurance that the software will remain available to them and any future users.
AI gives the user all the benefits of using open source software with none of the obligations that come from using open source software. The developer gains nothing from going open source. It makes no sense for any developer to go open source. The social contract breaks down, and it's all because AI users didn't hold up their half of the bargain.
The contract behind open source was something like (GPL):
"If you copy my work, you should share your work too."
or at minimum (MIT):
"If you copy my work, you should credit me."
I think it is no longer under dispute that the legal contract is satisfied by LLMs. The AI companies won and will continue to win.
But we are talking about a social contract, which is not quite the same thing. The social contract is what leads some devs who previously enjoyed publishing their work openly to no longer feel the same way. What did the authors mean by "copy"? Did they mean literally CTRL+C, CTRL+V or something broader?
This is a matter of opinion which only each individual creator can answer. For me, copying meant something like:
"To reproduce the function of my work, dependent on my having published it, without effort nor understanding of your own"
Ten years ago this basically required doing a CTRL+C, CTRL+V so there was no need to be more specific. Anybody who did enough work to, say, rewrite in another language (with that language's idioms), met the bar of clause 3. Now AI enables a form of "copying" that matches my definition, without the user even being aware of whose works they are copying. It perfectly launders the origins of its output. It can write an FFmpeg clone in Rust for you that would appear to be a novel work.
Of course, I cannot say that my own little bits and pieces of open source code would make a scratch in AI's capability, were it removed.
But I do strongly believe that if all the code that was published by authors with the same mindset was unavailable, Claude would be a far weaker developer.
> But we are talking about a social contract, which is not quite the same thing. The social contract is what leads some devs who previously enjoyed publishing their work openly to no longer feel the same way.
Perhaps this illustrates a fissure that was always lurking under the surface, then. The social contract that I've personally always attributed to FOSS communities was that attempting to restrict how people downstream of you use code is illegitimate, and that licenses like the GPL were meant to use copyright law to achieve something that resembles the state of affairs that might exist if copyright didn't exist in the first place. That's what the whole concept of "copyleft" always seemed to imply.
Now we have a new class of technologies that is admittedly fraught with a wide range of risks and pitfalls, but also a lot of promise to enable people to actually put the "four freedoms" into practice in ways they couldn't before, and we're seeing people who have normative opinions about AI derived from other, unrelated principles trying to circle the wagons and exclude those use cases. That is what seems like a breach of the social contract as I've always understood it.
> Did they mean literally CTRL+C, CTRL+V or something broader?
Given that FOSS licenses were always constructed to function within applicable copyright law, I don't see how they could mean anything else. "Literal CTRL+C, CTRL+V" is the only thing copyright has ever applied to, and the whole point of "copyleft" was to lessen the restrictions on even that.
DDOSing websites seems to be an unrelated problem, and one that has traditionally been solved through response throttling and IP blocking.
Attribution is often required even on MIT or BSD licenses where code is being redistributed, either in original or modified versions, but that would relate to this discussion only to the extent that one regards using LLMs whose training data included a certain bit of code as itself constituting redistribution of that specific code -- but that in turn is a very debatable premise which really ought to be argued for, and not merely argued upon as though it is already generally recognized as true.
Are you now layering the old and tired "copyright infringement = stealing" argument on top of the still unsubstantiated premise that all LLM training is copyright infringement?
Seems to me LLMs have changed some things. I'm not sure how it's best put, but it used to be:
- Code (or a blogpost or whatever) was the result of effort where thought had gone into it. The writer paid effort so the reader didn't have to.
- There'd be some level of attachment to what you've put effort into.
With LLMs, that's undermined: it's easy to produce thoughtless imitations. Code or comments where thought didn't go into it. So, seeing some result isn't an indication of skill, but also not even an indication thought went into it.
I guess there's still something lost if someone isn't going to share code they've put thought into. -- But on the other hand, if it's just for me & I don't have to share it with a wider audience, getting LLMs to write out code isn't so expensive.. so code itself isn't necessarily something to value so much.
But LLMs don’t seem particularly good at inventing new ways to code (or write, or…). It’s literally all derivative. So what happens in 10 years? Are we headed for a great stagnation?
> But LLMs don’t seem particularly good at inventing new ways to code (or write, or…). It’s literally all derivative.
I think the key part is how much thought goes into something.
Optimistically, LLMs are good at taking unstructured input, and (probably) producing the intended output from that. -- This allows for an interesting new way of coding: a set of instructions doesn't need to be as rigorous as a shell script, but can be natural language.
That part surely extends creativity. An LLM will be familiar with domain ideas I'm not, even if an LLM is completely disinterested in doing things.
Pessimistically, I think it's still not clear what the right way of interacting online with all of this is (other than clear expectations of "no AI")... in some sense LLM output is worthless to share, in the sense that I'm just as capable of asking the LLM to output something as anyone else is.
I don't know... I've been writing code for a good twenty years (15 professionally).
First, I think it's the best time to write software since so much boring stuff can be automated. I can put my thoughts into what I'm trying to achieve instead of how. To put it otherwise, I think about big picture much more than about mundane details like dealing with particularities of a programming language.
Second, most people were using SO to solve just about any issue they had. The number of developers producing truly original code was minimal even 10 years ago.
What a well-rounded, nicely written announcement that touches on all parts of the argument without any rage baiting or flexing. It would be easy to just ramble against AI and how it's the end of the world, but the author focused on a point that's not even related to the use or misuse of AI in software, but rather to how we have made it acceptable that large corporations can skirt copyright without any issue and make rivers of money from it. This problem extends not only to coding but to other industries as well.
People taking your work and not giving anything back was ALWAYS the risk you took when writing free software. LLM training doesn't change that much. That the US military is no doubt using GCC to compile embedded software for their ICBMs surely irks the GNU people. But you can't have it any other way. "You can only use my software for good things" just is not consistent with "free software".
There's an almost intergalactic level of irony in the extent to which open source has benefited giant corporations and the military at the expense of individuals, and ultimately contributed to the commercialised enclosure of software IP.
I suppose you could argue it also indirectly led to the empowerment of non-developers to create their own vibe coded solutions. But we're not quite there yet.
And the AI IP that makes that possible is still enclosed rather than open.
> There's an almost intergalactic level of irony in the extent to which open source has benefited giant corporations and the military at the expense of individuals, and ultimately contributed to the commercialised enclosure of software IP.
Could you perhaps explain that irony a bit more explicitly?
Can you provide any examples of "commercialized enclosure of software IP" somehow backwashing into the FOSS ecosystem and closing things up that are already open?
Sure, Free Software hasn't been the vehicle for societal change that RMS and others certainly hoped. I remember being flamed out in a user group for suggesting that our conference shouldn't be held in a "non-free" country such as Morocco, Turkey, or China because it's counter-productive to freedom. Very few people actually got it. But it's orthogonal to LLM trainers also using free software in "non-approved" ways.
Yeah, I really can't comprehend these sentiments as anything other than an "I don't like AI" argument. FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.
I see a lot of risks involved in people surrendering their own decision-making to LLMs, but that's a question of how they're used, not how they're trained. The idea that using FOSS software to train LLMs is somehow a violation of FOSS norms just doesn't seem valid.
Sure, but I guess I'm not seeing the relevance here. Are we seeing some greater-than-normal wave of people redistributing FOSS code without attribution, or creating derivative works without adhering to the license terms? LLM training doesn't seem to be either of these things.
Can you point to some specific examples of products shipped by the companies I assume you're referring to here that are in fact unattributed derivative works of GPL-licensed software?
Or are you saying that you think anything generated by an LLM qualifies as a derivative work of anything included in its training data?
> It's a tool, if using data is necessary to make the tool work, then its output derives from the data.
That's simply not correct within the applicable meaning of "derives" as understood in copyright law. In fact, data per se is not even within the scope of copyright protection in the first place: specific published works are copyrighted, but the underlying ideas and facts that they convey are not.
Even creating works that merely draw on a single source of data, but express the ideas drawn from that in a new or transformative way, are not considered derivative works (see the ruling in Google v. Oracle, for example), let alone works based on patterns extrapolated by relating together ideas sourced from many distinct works, which is what LLMs are principally doing.
If you applied the principle you're proposing here to human developers, you'd conclude that any code written by someone who learned to program by studying techniques used in FOSS software would in turn be a derivative work of that software. No one has ever regarded this to be the case.
Before LLMs, you could use the GNU GPL or other copyleft licenses to protect your code from being used to develop non-free software. Unfortunately, the courts have decided that LLMs are free to ignore licenses.
I'm also very hesitant to release any new works (code, artworks, etc.) to the public. I usually release code under the GPL or AGPL, but I don't think any of those choices are properly respected by the AI crawlers, and subsequent "mixing into" those models.
Multiple times I got partially broken "citations" of GPL-licensed code out of the models as answers to basic research questions (aka prompts) without any mention of the original license applied to the code. Just adding some random bugs every 10th line doesn't make it not a direct derivative. Image generators happily generated Sonics or Bart Simpsons (without being directly prompted for that, either). No mention that those are copyrighted characters, either.
One of the very few small compilers which passes the full GCC torture tests. But for me kefir is good enough as the reference small compiler. Not as fast as tcc, but more correct.
One person show. Effectively, it is dead since now it became the proprietary toy of its author. The author is entitled to do what he wants with his own creation, however.
This project in particular has been unconcerned with new coding practices so far, primarily, because I derive pleasure from hand-written implementations of my ideas, and believe that overcoming challenges the hard way is the main value I get from it.
This is 100% the same for me. Outside of work, where speed is more important than quality and I work with people that use AI, I don't use AI at all on my own projects. It poisons the mind and the soul. OK, that sounds dramatic, but I felt down up until the point where I started hand-writing everything again. Software engineering is still fun and powerful, and to hell with where the world is going.
That a function which is at its core literally trained to be as close to its input as possible is not (yet, court cases are still pending) IP theft is one of the great mysteries of our time.
Worse, because the sometimes valuable real time answers are generated by scraping the web and rewriting the IP in plain sight.
A couple of academic psychopaths who write horrible academic code themselves steal all valuable human knowledge right before our eyes and market it as "tech".
There should be a new civil war against these modern plantation owners and slave holders.
turtleyacht | 7 hours ago
kator | 6 hours ago
I feel this pain, one of my small donation driven sites has been destroyed by crawlers who just ignore robots.txt and burn the site into the ground.
Sort of jokingly I proposed an update to the "spam fax" law:
https://www.karlbunch.com/random/website-protection-act/
jagged-chisel | 6 hours ago
You have a hole here. Your web server is sending the response and the bot is receiving.
Fix that and … profit? :-)
wizzwizz4 | 4 hours ago
> The initiator of the communication pays, not the server operator.
kator | 3 hours ago
malwrar | 5 hours ago
irdc | 5 hours ago
dzhiurgis | 4 hours ago
egypturnash | 3 hours ago
irdc | 3 hours ago
buran77 | 2 hours ago
lesostep | 2 hours ago
irdc | an hour ago
kator | 3 hours ago
malwrar | an hour ago
account42 | 5 hours ago
Gormo | 3 hours ago
Xirdus | 2 hours ago
dlev_pika | 2 hours ago
Xirdus | an hour ago
Gormo | 2 hours ago
Xirdus | an hour ago
Gormo | an hour ago
Xirdus | 43 minutes ago
rspeele | 53 minutes ago
"If you copy my work, you should share your work too."
Gormo | 37 minutes ago
hilariously | 2 hours ago
Gormo | an hour ago
hilariously | an hour ago
Gormo | an hour ago
oooyay | 2 hours ago
Max-Ganz-II | 6 hours ago
krystalgamer | 6 hours ago
Xirdus | 5 hours ago
rgoulter | 6 hours ago
irdc | 5 hours ago
dzhiurgis | 4 hours ago
asibahi | 4 hours ago
VladVladikoff | 3 hours ago
There was a relatively big shift in riding style right around the same time of the first mass production of vehicles.
irdc | 4 hours ago
rgoulter | 3 hours ago
multjoy | an hour ago
f6v | 2 hours ago
altmanaltman | 6 hours ago
snarfy | 3 hours ago
bjourne | 5 hours ago
TheOtherHobbes | 5 hours ago
fragmede | 4 hours ago
Judging from the number of projects I've seen from people who aren't software developers, we're there enough.
nine_k | 3 hours ago
Gormo | 3 hours ago
bjourne | an hour ago
Gormo | 3 hours ago
xigoi | 3 hours ago
Not true. Most FOSS licenses require attribution and many require derivatives to be released under the same license.
Gormo | 2 hours ago
xigoi | 2 hours ago
Gormo | 2 hours ago
DexesTTP | an hour ago
It's a tool, if using data is necessary to make the tool work, then its output derives from the data.
If the LLM generation is not derivative of its training data, then why would it need the training data in the first place?
Gormo | 48 minutes ago
xigoi | 3 hours ago
bjourne | an hour ago
binaryturtle | 4 hours ago
rurban | 4 hours ago
paufernandez | 3 hours ago
RetroTechie | 4 hours ago
If a one-person show, closing it up would effectively kill it? Or (re?)turn it into a hobby project developed at snail pace.
If some community exists: fork coming up?
tocariimaa | 2 hours ago
keyle | 3 hours ago
34aSHGAS | 3 hours ago
genxy | 2 hours ago
fithisux | an hour ago