Hacker News

No right to relicense this project

442 points by robin_reala

by antirez

10 subcomments

I believe that Pilgrim here does not understand very well how copyright works:
> Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code
This is simply not true. The "clean room" concept exists precisely because the law recognizes that independent implementations ARE possible. The "clean room" thing is a trick to make litigation simpler; it is NOT required that you were never exposed to the original code. For instance, Linux was implemented even though Linus and other devs were well aware of Unix internals. What the law actually asks is: does the new code copy something that was in the original one? The clean room trick makes it easier to argue that copying was impossible, so any similarity must be accidental. But it is NOT a requirement.

by dathinab

7 subcomments

The argument that a rewrite is a copyright violation because they are familiar with the code base is not fully sound.
"Insider knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.
Otherwise an artist who had seen a picture of a sunset over an empty ocean wouldn't be allowed to paint another sunset over an empty ocean, as people could claim copyright violation.
What is a violation, though, is placing the code side by side and trying to circumvent copyright law by just rephrasing the exact same code.
This also means that if you give an AI access to a code base and tell it to produce a new code base doing the same (or similar) thing, it will most likely be ruled a copyright violation, as it's pretty much a side-by-side rewrite.
But you very much can rewrite a project under a new license even if you have in-depth knowledge, IFF you don't have the old project open or look at it while doing so. Rewrite it from scratch. And don't just rewrite the same code from memory; instead, write fully new code producing the same or similar outputs.
Though doing so is not per se illegal, it is legally very attackable, as you will have a hard time defending such a rewrite against copyright claims (unless it's internally so completely different that it defeats any claim of "being a copy", e.g. you use completely different algorithms, architecture, etc. to produce the same results in a different way).
In the end, while technically "legally hard to defend" != "illegal", for companies it's usually best to treat them the same.

by markthered

3 subcomments

The copyright argument is a sidetrack, both in the PR comment thread and here. The issue claims the new code is based on the old code, is therefore derivative, and therefore must be offered as a modified version of the source code under the previous license, the LGPL. The complaint is that the maintainers violated the terms of the LGPL, and that they must prove no derivation from the original code before they can legally release this as a new version without the LGPL. The claim is that if they or Claude read the old code (or, of course, directly used any of it), it is a license violation: “… in the release 7.0.0, the maintainers claim to have the right to “relicense” the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation).” By this reasoning, I am genuinely asking (I’m not a license expert) whether a valid clean room rewrite is even possible, because at a minimum you would need a spec describing all behavior, which seems to require ample exposure to the original to be sufficiently precise.

by Roritharr

15 subcomments

As part of my consulting, I've stumbled upon this issue in a commercial context. A SaaS company whose platform's mobile apps are open source approached me with the following concern.
One of their engineers was able to recreate their platform by letting Claude Code reverse engineer their Apps and the Web-Frontend, creating an API-compatible backend that is functionally identical.
Took him a week after work. It's not as stable, the unit-tests need more work, the code has some unnecessary duplication, hosting isn't fully figured out, but the end-to-end test-harness is even more stable than their own.
"How do we protect ourselves against a competitor doing this?"
Noodling on this at the moment.

by amtamt

0 subcomment

Maintainers must not be able to change the license that the original author chose, and under which contributors made their contributions. Whoever stepped up to be maintainer took on a trustee role, not an owner role.
It should be perfectly OK (for the maintainer or anyone else, for that matter) to be inspired by a community project and build something from scratch, hand-crafted or AI-slopped, as long as the imitation is given a new name/identity.
What rubbed me the wrong way personally was the maintainer saying "pin your dependencies to version 6.0.0 or 5.x.x", as if the maintainer owns the project. The maintainer role is more akin to serving the community, not ruling it.
If it is completely new, why not start a new project with new name? No one will object. And of course leave the old project behind to whoever is willing to maintain it. And if the new name project is better, people will follow.

by kreco

1 subcomments

A comment from 2021:
> Unfortunately, because the code that chardet was originally based on was LGPL, we don't really have a way to relicense it. Believe me, if we could, I would. There was talk of chardet being added to the standard library, and that was deemed impossible because of being unable to change the license.
So the person that did the rewrite knew this was a dive into dangerous water. That's so disrespectful.

by scosman

5 subcomments

Sounds like they didn’t build a proper clean room setup: the agent writing the code could see the original code.
Question: if they had built one using AI teams in both “rooms”, one writing a spec the other implementing, would that be fine? You’d need to verify spec doesn’t include source code, but that’s easy enough.
It seems to mostly follow the IBM-era precedent. However, since the model probably had the original code in its training data, maybe not? Maybe valid for closed source project but not open-source? Interesting question.
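A minimal sketch of the "verify the spec doesn't include source code" step, assuming a deliberately crude heuristic: flag any run of n consecutive tokens that appears verbatim in both the spec and the original source. The names `token_windows` and `leaked_windows` and the default window size of 8 are hypothetical choices for illustration; an empty result only rules out literal copying, not paraphrased or structural copying.

```python
import re

# Hypothetical tokenizer: identifiers, integers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def token_windows(text: str, n: int) -> set[tuple[str, ...]]:
    """Return every run of n consecutive tokens in text (empty if too short)."""
    tokens = TOKEN_RE.findall(text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def leaked_windows(spec: str, source: str, n: int = 8) -> set[tuple[str, ...]]:
    """Token runs that occur verbatim in both the spec and the original source.

    An empty result means no literal n-token overlap was found; it does not
    prove the spec is free of copied expression.
    """
    return token_windows(spec, n) & token_windows(source, n)


if __name__ == "__main__":
    original = "def detect(buf):\n    return probe(buf, threshold=0.2)\n"
    clean_spec = "Given a byte buffer, report the most likely text encoding."
    leaky_spec = "The function should return probe(buf, threshold=0.2) for each buffer."
    print(len(leaked_windows(clean_spec, original)))  # 0
    print(len(leaked_windows(leaky_spec, original)))  # 4
```

A real clean-room process would put a human (or a separate review step) between the two "rooms"; a check like this only catches the most obvious leaks.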

by kermatt

1 subcomments

On a side note, it is interesting to see Mark Pilgrim rise from the "dead": https://en.wikipedia.org/wiki/Mark_Pilgrim#%22Disappearance%...
His Python books, although a bit dated, are something I still recommend to new Python programmers.

by QuadmasterXLII

1 subcomments

“Mr Teacher, how many words do I have to change after copy-pasting Wikipedia so it's not plagiarism?” has grown up and entered the workforce.
Pin your dependency versions, people! With hashes at this point; you can't trust anybody out here.
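Pinning with hashes means recording the SHA-256 of each artifact you vetted and refusing anything else; pip supports this in requirements files through per-requirement `--hash=sha256:...` entries and the `--require-hashes` option. The sketch below only illustrates the underlying check with the standard library; `sha256_hex` and `matches_pin` are hypothetical helper names, not pip internals.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, the same form used in `--hash=sha256:...` pins."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(data: bytes, pinned_hex: str) -> bool:
    """True if the artifact's digest equals the pinned one (constant-time compare)."""
    return hmac.compare_digest(sha256_hex(data), pinned_hex.strip().lower())


if __name__ == "__main__":
    wheel_bytes = b"stand-in for a downloaded wheel"  # hypothetical artifact
    pin = sha256_hex(wheel_bytes)  # recorded when the dependency was vetted
    print(matches_pin(wheel_bytes, pin))  # True
    print(matches_pin(b"tampered artifact", pin))  # False
```

A version pin alone keeps you off an unexpected major release; the hash additionally guarantees you get the exact bytes you reviewed.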

by p0w3n3d

4 subcomments

Wow that's hot. I was not aware that you need to be "untainted" by the original LGPL code. This could mean that...
All AI generated code is tainted with GPL/LGPL because the LLMs might have been taught with it

by hu3

1 subcomments

I'm torn on where the line should be drawn.
If the code is different but API-compatible, the Google vs Oracle Java case shows that if the implementation is different enough, it can be considered a new implementation, clean room or not.

by darkwater

2 subcomments

It's not clear at all why the current maintainers wanted/needed this relicensing. I guess their employer, Monarch Money, wants to use derivative work in its application without releasing the changes? It was already LGPL, which is perfect for a library, not GPL.

by starkparker

0 subcomment

Everyone armchair debating the licensing side of this frustrates discussion, because the only licensing discussion that matters is the one in front of a court. Until and unless one happens, this is just a boring hobby.
Otherwise all this rewrite accomplishes is a 2.3% accuracy improvement and some performance gains that might not be relevant in production, in exchange for a broken test suite, breaking changes, and unnecessary legal and ethical risks pushed out as an update to what was already a stable project.
If it's truly a sufficiently separate project that it can be relicensed from LGPL, then it could've just been _a fully separate project with a new identity_, and the license change would've been at least harder to challenge. Instead, we're here.

by oxag3n

0 subcomment

AI "translation" kills any incentive to open source code.
What's worse, disassembler + AI is good enough to "translate" a binary into working source code, probably in a different programming language than the original.

by Ardren

1 subcomments

Huh, 7e25bf4 was a big commit.
```
  2,305 files changed
  +0 -546871 lines changed
```
https://github.com/chardet/chardet/commit/7e25bf40bb4ae68848...

by malklera

0 subcomment

You have to look at this from two sides. Moral: what is right or wrong? If they wanted to change the license, they could have made another project with another name, and nobody would care, but they wanted the reputation of the project.
Legal: how much are you willing to spend on litigation? The only real "protection" copyright offers is in court.

by geenat

1 subcomments

FastAPI's underlying library, Starlette, has been going through licensing shenanigans too lately: https://github.com/Kludex/starlette/issues/3042
Be really careful who you give your projects keys to, folks!

by PaulDavisThe1st

1 subcomments

I am confused. In the USA, there has been a clear rule that machine-generated code cannot be copyrighted. If the "new implementation" was in fact created by Claude (which is my impression), then nobody holds any copyright to the code, and it cannot be licensed under any license at all.
I am sure I am missing something ... what is it?

by binaryturtle

3 subcomments

Isn't the real issue here that tons of projects that depend on "chardet" now drag in some crappy, still-unverified AI slop? AI forgery poisoning, IMHO.
Why did this new project need to replace the original in such a dishonourable way? The proper way would have been to create a proper new project.
Note: even Python's own pip seems to drag this in as a dependency (hopefully they'll stick to a proper version).

by jrochkind1

3 subcomments

Does anyone understand the intent behind the changed license on the package, why are the current maintainers trying to do it in the first place? What's actually going on?

by duckerude

0 subcomment

Perhaps notable: years ago the original original chardet was rewritten with a different license: https://github.com/hsivonen/chardetng
AFAIK this was not a clean room reimplementation. But since it was rewritten by hand, into a different language, with not just a different internal design but a different API, I could easily buy that chardetng doesn't infringe while Python chardet 7 does.

by pmarreck

1 subcomments

I have successfully reproduced a few projects with LLM assistance via strict cleanroom rules and only working off public specifications.

by red_admiral

0 subcomment

The law is what a court says it is; there is precedent for decisions on human rewrites but LLM (assisted) code might still be fairly uncharted territory.

by nailer

0 subcomment

> chardet 7.0 is a ground-up, MIT-licensed rewrite of chardet. Same package name, same public API
Licensing aside, morally you don't rewrite someone else's project with the same package name.

by mytailorisrich

6 subcomments

> Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation).
I don't think that the second sentence is a valid claim per se, it depends on what this "rewritten code" actually looks like (IANAL).
Edit: my understanding of a "clean room implementation" is that it is a good defence against a copyright infringement claim, because there cannot be infringement if you don't know the original work. However, that does not mean a non-clean-room implementation implies infringement; it is just potentially harder to defend against a claim if the original work was known.

by nilsbunger

1 subcomments

Does using the old version’s tests to create a new version make it a derivative work? That’s certainly some pretty tight coupling.

by kreco

0 subcomment

This is not unprecedented: TCC relicensed part of its code with the approval of all its authors:
https://repo.or.cz/tinycc.git/blob/3d963aebcd533da278f086a3e...
The interesting part is that the original author is against it, but some people claim it could be a rewrite and not a derivative work.
I don't know the full legal basis, but it's definitely not morally correct toward the original author.

by oytis

1 subcomments

I wonder if LLMs will push the industry towards protecting their IP with patents like the other branches of engineering rather than copyright. If you patent a general idea of how your software works then no rewrite will be able to lift this protection.

by 0sdi

1 subcomments

surely this can also be used to turn proprietary software into free software

by noosphr

0 subcomment

If the code is written by an AI, they can't copyright it. It is all public domain.

by dminor

1 subcomments

Another tangent that I didn't see in the thread: the Supreme Court just confirmed a ruling that LLM-created art isn't copyrightable, since the author must be human for copyright to apply.
If the new code was generated entirely by an LLM, can it be licensed at all? Or is it automatically in the public domain?

by bachmeier

0 subcomment

Another day in the post-copyright world. Surely someone somewhere is already using this to test the effect of copyright laws, should we decide to go back to that world.

by Dunedan

0 subcomment

The even more concerning news to me is that chardet 7.0 is now vibe coded AI slop, as documented in the PR for the rewrite [1]:
> I put this together using Claude Code with Opus 4.6 with the amazing https://github.com/obra/superpowers plugin in less than a week. It took a fair amount of iteration to get it dialed in quite like I wanted, but it took a project I had been putting off for many years and made it take ~4 days.
Given the sheer number of changes, I seriously doubt that this reimplementation has been properly reviewed, and I wonder how it is going to be maintainable going forward.
[1]: https://news.ycombinator.com/item?id=47259177

by charcircuit

0 subcomment

Clean room implementations are not necessary to avoid copyright infringement.

by myrmidon

0 subcomment

I think Mark Pilgrim misrepresents the legal situation somewhat: The AI rewrite does not legally need to be a clean room implementation (whatever exactly that would even mean here).
That is just the easiest way to disambiguate the legal situation (i.e. the most reliable approach to prevent it from being considered a derivative work by a court).
I'm curious how this is gonna go.

by soulofmischief

2 subcomments

The README has clearly been touched by an LLM. Count the idiosyncrasies:
“chardet 7.0 is a ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x”
Do people not write anymore?

by q3k

2 subcomments

> 12-stage detection pipeline
What is this recent (clanker-fueled?) obsession to give everything fancy computer-y names with high numbers?
It's not a '12 stage pipeline', it's just an algorithm.

by TZubiri

0 subcomment

Interesting.
While I am obviously Team GPL and not team "I 'rewrote' this with AI so now it's mine", I'm team anti-fork, and definitely not team 'Chardet'.
Forking should be a last resort; a better option is to yeet the thing entirely.
And chardet lends itself perfectly to this: using chardet is a sign of an issue and of low craftsmanship, either by the developer using chardet or by the developer who failed to signal the encoding of their text. (See Joel Spolsky's "The absolute minimum every developer should know about character encoding".) (And let's be honest, it's probably the developer's problem; not everything is someone else's fault.)
Just uninstall this thing where you can, and always avoid installing it, because you always can.
You know I'm right. I will not be replying to copium
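The argument that the producer of the text should declare its encoding, rather than leaving consumers to guess, can be illustrated with a standard-library-only sketch: read the `charset` parameter from a `Content-Type` value via `email.message.Message.get_content_charset`, honour a UTF-8 byte order mark, and otherwise fall back to a fixed default instead of statistical detection. `decode_declared` and the UTF-8 default are assumptions for illustration; this is not how chardet or any particular HTTP client behaves.

```python
import codecs
from email.message import Message


def decode_declared(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode bytes using the charset declared in a Content-Type value.

    Falls back to `default` when no charset is declared or the name is not a
    codec Python knows, instead of guessing from the byte contents.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default  # lower-cased param, or None
    try:
        name = codecs.lookup(charset).name  # normalizes aliases like "utf8"
    except LookupError:
        name = codecs.lookup(default).name
    # A UTF-8 byte order mark is an explicit signal too; "utf-8-sig" strips it.
    if name == "utf-8" and raw.startswith(codecs.BOM_UTF8):
        name = "utf-8-sig"
    return raw.decode(name, errors="replace")


if __name__ == "__main__":
    print(decode_declared("café".encode("latin-1"), "text/plain; charset=ISO-8859-1"))
    print(decode_declared(codecs.BOM_UTF8 + "café".encode("utf-8"), "text/plain"))
```

Undecodable bytes become U+FFFD rather than raising, which makes a wrong declaration visible without crashing; whether that is the right policy depends on the application.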

by AlexandrB

1 subcomments

Setting aside the legal questions, what a nasty thing to do. I would expect to see this kind of move from some big corpo, not an OSS maintainer. This feels like it runs counter to the whole open source ethos and undermines the idea that authorship means anything anymore.

by imcritic

3 subcomments

Licenses are cancer and the enemy of opensource.

by skeledrew

3 subcomments

I feel like the author is missing a huge point here by fighting this. The entire reason the GPL and other copyleft licenses exist in the first place is to ensure that a user's rights to modify, etc., a work can never be taken away. Before, relicensing as MIT (or under any other fully permissive license) would have opened the door to applying restrictions going forward, but with AI this is now a non-issue. Code is now very cheap. So the way I see it, anyone who is for copyleft should be embracing AI-created things as not being copyrightable (or a rewrite being relicensable), hard.

by raggi

2 subcomments

Look, forget the details, step back and consider the implications of the principle.
Someone should not be able to write a semi-common core utility, provide it as a public good, abandon it for over a decade, and yet continue to hold the rest of the world hostage just because of provenance. That’s a trap and it’s not in any public interest.
The true value of these things only comes from use. The extreme positions for ideals might be nice at times, but for example we still don’t have public access to printer firmware. Most of this ideology has failed in key originating goals and continues to cause headaches.
If we’re going to share, share. If you don’t want to share, don’t. But let’s not set up terminal traps; no one benefits from that.
If we flip this back around, though, shouldn’t this all be MPL and Netscape Communications? (Edit: turns out they had an argument about that in the past on their own issue tracker: https://github.com/chardet/chardet/issues/36)