That's... refreshingly normal? Surely something most people acting in good faith can get behind.
> Signed-Off ...
> The human submitter is responsible for:
> Reviewing all AI-generated code
> Ensuring compliance with licensing requirements
> Adding their own Signed-off-by tag to certify the DCO
> Taking full responsibility for the contribution
> Attribution: ... Contributions should include an Assisted-by tag in the following format:
Responsibility assigned where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*. I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.
Hopefully this will set the trend and provide definitive guidance for the many devs who saw not only the utility of AI assistance but also the acrimony from some quarters, and were left fence-sitting.
This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.
It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.
How can you guarantee that will happen when AI has been trained on a world full of code under multiple licenses, and even on closed-source material used without the copyright owners' permission? I confirmed that with several AIs just now.
However, the gotcha here seems to be that the developer has to say that the code is compatible with the GPL, which seems an impossible ask, since the AI models have presumably been trained on all the code they can find on the internet regardless of licensing, and we know they are capable of "regenerating" (regurgitating) stuff they were trained on with high fidelity.
I just don't think that's realistically achievable. Unless the models themselves can introspect on the code and detect any potential license violations.
If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.
LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which has licenses almost always requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].
The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which
1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies
2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.
I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking (the ELIZA effect[2]) even among people who understand how the models operate, let alone the general public.
Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.
[0]: https://news.ycombinator.com/item?id=47356000
[1]: http://prize.hutter1.net/
[2]: https://en.wikipedia.org/wiki/ELIZA_effect
[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...
How long until people get exhausted with the new volume of code review and start "trusting" the LLMs more without sufficient review, I wonder?
I don't envy Linus in his position... hopefully this approach will work out well for the team.
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
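For concreteness, a commit message following that format might look like the sketch below. The subject line, agent name, model version, and tool names here are all made up for illustration; the real values depend on what you actually used:

```
mm/slub: fix off-by-one in partial list accounting

Explain what the patch does and why, as usual.

Assisted-by: ExampleAgent:example-model-1.0 [compiler] [linter]
Signed-off-by: Jane Developer <jane@example.com>
```

Note the human's Signed-off-by still goes last, since the policy puts DCO responsibility on the submitter regardless of how the code was produced.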
I feel like a lot of people will have an ideological opposition to AI, but that would lead to people sometimes submitting AI-generated code with no attribution and just lying about it. At the same time, I feel bad for all the people who have to deal with low-quality AI slop submissions, in any project out there.
The rules for projects that allow AI submissions might as well state: "You need to spend at least ~10 iterations of model X review agents and 10 USD of tokens on reviewing AI changes before they are allowed to be considered for inclusion."
(I realize that sounds insane, but in my experience iterated review even by the same Opus model can help catch bugs in the code. Next-token prediction in and of itself is quite error-prone; in other words, even Opus "writes" code with bugs that its own review iterations catch.)
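As a toy illustration of why iterating helps: if each review pass surfaces a (possibly overlapping) set of findings, you can just loop until a pass turns up nothing new. This is only a sketch; `review` stands in for whatever actual LLM call and prompt you'd use:

```python
def iterated_review(diff, review, max_iters=10):
    """Run `review` repeatedly on `diff`, accumulating findings
    until an iteration adds nothing new or the budget runs out."""
    findings = set()
    for _ in range(max_iters):
        new = review(diff) - findings
        if not new:          # converged: this pass found nothing new
            break
        findings |= new
    return findings

# usage with a deterministic stand-in for the model
fake = iter([{"bug A"}, {"bug A", "bug B"}, {"bug A", "bug B"}])
result = iterated_review("some diff", lambda d: next(fake))
# result == {"bug A", "bug B"}
```

In practice the stopping condition is fuzzier (an LLM rarely returns the same findings verbatim twice), but the budget-capped loop is the point.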
Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?
The typical trailer for this is "AI-assistant:".
Are there other popular repos with effectively this policy stated as neatly that I’ve missed?
You don't get to bang on a screw and blame the hammer.
So what's preventing lawyers/companies having a batch of people they use as scapegoats, should something go wrong?
Side note, I'm not sure why I feel weird about having the string "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]" in the kernel docs source :D. Mostly joking. But if the Linux kernel has it now, I guess it's the inflection point for...something.
Or, to put it another way, in the old days in order to be a 3k-LoC PR wielding psychopath intent on making your colleagues miserable with churny aggro diffs from hell you at least had to be good at coding.
Nowadays, you only need to do the psychopath part — Claude will happily fill in the PR for you.
Humans for humans!
Don't let skynet win!!!
We've been using Co-Developed-By: <email> for our AI annotations.