> Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code
This is simply not true. The "clean room" concept exists precisely because the law recognizes that independent implementations ARE possible. The "clean room" approach is a trick to make litigation simpler; it is NOT a requirement that you were never exposed to the original code. For instance, Linux was implemented even though Linus and the other devs were well aware of Unix internals. What the law really asks is: does the new code copy something that was in the original? The clean room trick makes it simpler to argue that copying was impossible, so any similarities must be accidental. But it is NOT a requirement.
"Insider knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.
Otherwise an artist who had seen a picture of a sunset over an empty ocean wouldn't be allowed to paint another sunset over an empty ocean, as people could claim copyright violation.
What is a violation, though, is placing the code side by side and trying to circumvent copyright law by just rephrasing the exact same code.
This also means that if you give an AI access to a code base and tell it to produce a new code base doing the same (or similar) thing, it will most likely be ruled a copyright violation, as it's pretty much a side-by-side rewrite.
But you very much can rewrite a project under a new license even if you have in-depth knowledge of it, IFF you don't have the old project open or look at it while doing so. Rewrite it from scratch. And don't just rewrite the same code from memory; instead, write fully new code that produces the same or similar outputs.
Though doing so is not per se illegal, it is legally very attackable, as you will have a hard time defending such a rewrite against copyright claims (unless it's internally so completely different that it defeats any claim of "being a copy", e.g. you use completely different algorithms, architecture, etc. to produce the same results in a different way).
In the end, while technically "legally hard to defend" != "illegal", for companies it's usually best to treat them the same.
One of their engineers was able to recreate their platform by letting Claude Code reverse-engineer their apps and web frontend, creating an API-compatible backend that is functionally identical.
It took him a week after work. It's not as stable, the unit tests need more work, the code has some unnecessary duplication, and hosting isn't fully figured out, but the end-to-end test harness is even more stable than their own.
"How do we protect ourselves against a competitor doing this?"
Noodling on this at the moment.
It should be perfectly OK (for the maintainer or anyone else, for that matter) to be inspired by a community project and build something from scratch, hand-crafted or AI-slopped, as long as the imitation is given a new name/identity.
What rubbed me the wrong way personally was the maintainer saying "pin your dependencies to version 6.0.0 or 5.x.x", as if the maintainer owns the project. The maintainer role is more akin to serving the community, not ruling it.
If it is completely new, why not start a new project with a new name? No one will object. And of course leave the old project behind to whoever is willing to maintain it. And if the newly named project is better, people will follow.
> Unfortunately, because the code that chardet was originally based on was LGPL, we don't really have a way to relicense it. Believe me, if we could, I would. There was talk of chardet being added to the standard library, and that was deemed impossible because of being unable to change the license.
So the person that did the rewrite knew this was a dive into dangerous water. That's so disrespectful.
Question: if they had built one using AI teams in both “rooms”, one writing a spec the other implementing, would that be fine? You’d need to verify spec doesn’t include source code, but that’s easy enough.
It seems to mostly follow the IBM-era precedent. However, since the model probably had the original code in its training data, maybe not? Maybe valid for closed source project but not open-source? Interesting question.
His Python books, although a bit dated, are something I still recommend to new Python programmers.
Pin your dependency versions, people! With hashes at this point; can't trust anybody out here.
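As a rough illustration of what hash pinning buys you: pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) refuses to install any artifact whose digest differs from the pinned one. The hypothetical helpers `sha256_of` and `verify_pinned` below are a minimal sketch of that same check in plain Python, not pip's actual internals.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_pinned(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned one.

    Hypothetical helper mirroring the idea of pip's --require-hashes mode.
    """
    actual = sha256_of(path)
    # constant-time comparison of the two hex strings
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise ValueError(
            f"hash mismatch for {path.name}: expected {expected_hex}, got {actual}"
        )
```

The point of pinning the digest rather than just the version number is that a re-uploaded or swapped artifact under the same version string fails the check.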
All AI-generated code is tainted with GPL/LGPL because the LLMs might have been trained on it.
If the code is different but API-compatible, the Google vs. Oracle (Java) case shows that if the implementation is different enough, it can be considered a new implementation, clean room or not.
Otherwise all this rewrite accomplishes is a 2.3% accuracy improvement and some performance gains that might not be relevant in production, in exchange for a broken test suite, breaking changes, and unnecessary legal and ethical risks pushed out as an update to what was already a stable project.
If it's truly a sufficiently separate project that it can be relicensed from LGPL, then it could've just been _a fully separate project with a new identity_, and the license change would've been at least harder to challenge. Instead, we're here.
What's worse: a disassembler plus AI is good enough to "translate" a binary into working source code, probably in a different programming language than the original.
2,305 files changed
+0 -546871 lines changed
https://github.com/chardet/chardet/commit/7e25bf40bb4ae68848...
Legal: How much are you willing to spend on litigation? The only real "protection" copyright offers is in court.
Be really careful who you give your projects keys to, folks!
I am sure I am missing something ... what is it?
Why did this new project need to replace the original in such a dishonourable way? The right way would have been to create a proper new project.
Note: even Python's own pip drags this in as a dependency, it seems (hopefully they'll stick to a proper version).
AFAIK this was not a clean room reimplementation. But since it was rewritten by hand, into a different language, with not just a different internal design but a different API, I could easily buy that chardetng doesn't infringe while Python chardet 7 does.
Licensing aside, morally you don't rewrite someone else's project with the same package name.
I don't think that the second sentence is a valid claim per se, it depends on what this "rewritten code" actually looks like (IANAL).
Edit: my understanding of "clean room implementation" is that it is a good defence against a copyright infringement claim, because there cannot be infringement if you don't know the original work. However, that does not mean a non-clean-room implementation implies infringement; it's just potentially harder to defend against a claim if the original work was known.
https://repo.or.cz/tinycc.git/blob/3d963aebcd533da278f086a3e...
The interesting part is that the original author is against it, but some people claim it could be a rewrite and not a derivative work.
I don't know the legal basis for all of this, but it's definitely not morally correct toward the original author.
If the new code was generated entirely by an LLM, can it be licensed at all? Or is it automatically in the public domain?
> I put this together using Claude Code with Opus 4.6 with the amazing https://github.com/obra/superpowers plugin in less than a week. It took a fair amount of iteration to get it dialed in quite like I wanted, but it took a project I had been putting off for many years and made it take ~4 days.
Given the amount of changes I seriously doubt that this re-implementation has been reviewed properly and I wonder how this is going to be maintainable going forward.
That is just the easiest way to disambiguate the legal situation (i.e. the most reliable approach to prevent it from being considered a derivative work by a court).
I'm curious how this is gonna go.
“chardet 7.0 is a ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x”
Do people not write anymore?
What is this recent (clanker-fueled?) obsession to give everything fancy computer-y names with high numbers?
It's not a '12 stage pipeline', it's just an algorithm.
While I am obviously Team GPL and not team "I 'rewrote' this with AI so now it's mine", I'm team anti-fork, and definitely not team 'Chardet'.
Forking should be a last resort, one better option is to yeet the thing entirely.
And chardet lends itself perfectly to this: using chardet is a sign of an issue and low craftsmanship, either by the developer using chardet or by the developer who failed to signal the encoding of their text. (See Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".) (And let's be honest, it's probably the developer's problem; not everything is someone else's fault.)
Just uninstall this thing where you can, and always avoid installing it, because you always can.
You know I'm right. I will not be replying to copium
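To make the "signal the encoding" point concrete: when the producer declares a charset (for example in an HTTP `Content-Type` header), the consumer can simply honor it instead of guessing. The hypothetical `decode_declared` below is a minimal sketch assuming a MIME-style header string and UTF-8 as the agreed default; it parses the header with the standard library's `email.message.Message.get_content_charset()` and decodes strictly, so a wrong declaration fails loudly instead of being silently "detected".

```python
from email.message import Message
from typing import Optional


def decode_declared(body: bytes, content_type: Optional[str], default: str = "utf-8") -> str:
    """Decode bytes using the charset the sender declared; never guess.

    Hypothetical helper: falls back to `default` (assumed UTF-8) when no
    charset parameter is present.
    """
    charset = default
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        # get_content_charset() returns the lowercased charset param, or the fallback
        charset = msg.get_content_charset(default)
    # errors="strict": mismatched bytes raise UnicodeDecodeError instead of mojibake
    return body.decode(charset, errors="strict")
```

For example, `decode_declared("café".encode("latin-1"), "text/plain; charset=ISO-8859-1")` yields `"café"`, while the same bytes with no declared charset raise `UnicodeDecodeError` under the UTF-8 default.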
Someone should not be able to write a semi-common core utility, provide it as a public good, abandon it for over a decade, and yet continue to hold the rest of the world hostage just because of provenance. That’s a trap and it’s not in any public interest.
The true value of these things only comes from use. The extreme positions for ideals might be nice at times, but for example we still don’t have public access to printer firmware. Most of this ideology has failed in key originating goals and continues to cause headaches.
If we’re going to share, share. If you don’t want to share, don’t. But let’s not setup terminal traps, no one benefits from that.
If we flip this back around though, shouldn’t this all be MPL and Netscape Communications? (Edit: turns out they had an argument about that in the past on their own issue tracker: https://github.com/chardet/chardet/issues/36)