I don't know when the extreme intellectual property viewpoint entered software engineering as a mainstream opinion because I have never before seen it expressed so strongly in this community (seeing as I wasn't around when Bill Gates famously asked for money first or whatever). To think that the OpenOffice of twenty years ago would have been considered unconscionably close to a copy of the MS Office of that era.
In some way, The Corporations Won, because it turns out software engineers turned into IP maximalists. Thinking back to when I first installed Tux Kart decades ago I never could have imagined that we'd get to this stage. Really wild, man.
From that property rights perspective, the property that's created when new information is created is not the information itself, rather, it's the act of creation (claim to authorship) that's the scarce resource.
I don't know what a world looks like where the only form of IP is non-transferable and owned by the original creator. Maybe that new form of IP creates less value overall, and maybe that's ok if the creator is getting 100% of the smaller pie instead of crumbs from media labels. Companies like Red Hat could be an example of a viable business model if IP laws follow the current winds.
Companies like Corgi will need to rely on internal talent to ensure that their product is better than what someone looking at their product can vibe code a copy of, which from my perspective as a consumer, sounds like a better route than Corgi relying on an internal legal team to send a cease and desist letter.
As an artist who got repeatedly told to stop making buggy whips and get into the absolutely tedious-sounding new field of "writing prompts" every time I expressed dismay and displeasure about image generation around here, every story about this sort of thing here is the sweetest schadenfreude I have tasted in my life.
Especially when the general feeling in the markets I work in is that AI images are kinda tacky and empty and nasty, and people would rather pay another human to realize their ideas than try to refine image generation prompts for a couple hours and get something vaguely okay that makes people go "ew, AI".
Not looking at the source code has been used to make nuisance copyright lawsuits less likely (e.g. Phoenix and AMI implementations of IBM's BIOS) but it's still easy to prevail when a new work is created by rewriting someone else's source code. (https://en.wikipedia.org/wiki/UNIX_System_Laboratories,_Inc.....)
Neither copyright nor patent covers a user interface (https://en.wikipedia.org/wiki/Apple_Computer,_Inc._v._Micros....), so that can legally be copied outright.
You don't need to register each release, so long as a material portion of the registered work exists in subsequent derivative works.
Without a registration, threats of a copyright dispute are mostly noise to someone savvy enough to know how the game is played. If they think you'll persist they can just replace the infringing work or cease distribution, which is a hassle but not a significant deterrent for bad faith actors.
There's an argument to be made for patent protections, but many of those are questionable considering the number of trivial software-related patents (there must be a patent somewhere for replying to an online conversation through an edit box and an "add comment" button).
I don't know if LLMs can somehow help the situation. I hope they can expose the ridiculousness of software copyrights but I won't be holding my breath.
If you give it a prompt telling it to replicate a product that's in its training set then its optimal next token prediction output is going to be a lossy copy of that product's source code.
https://en.wikipedia.org/wiki/Threshold_of_originality
Oh and if it's not human generated, you can just copy it.
The replication/copying has always been there in one form or another. The bar has traditionally been higher for reputation and monetary risks.
Lately the legal bar is the one that's going down. Ease of replication makes it even more tempting, and when big players are doing it at scale (bots), it validates the strategy in one way or another.
If anything, there have to be downstream consequences of this with time. Libraries to pollute the front end code for LLMs are most likely going to get popular, and are probably one way to make it harder for your IP to be replicated.
e.g. If you're creating an uptime dashboard...they all kinda look the same anyway and there aren't that many ways to do it so that seems OK. If it's copying a comprehensive UI with layout and flow between the various pages etc then you're getting a bit closer to theft.
with such bad behavior from the SWE community, you've just got to lock down your app behind certificate pinning, hardware attestation, gRPC/protobufs, and internal data only. no more "free open web in browsers" when you get gents like this stealing other people's efforts.
if you can't prove that the source code wasn't trained on, how can you show that it's not a copy of the copyrighted original?
(e.g. instagram copying snapchat)
I say let them sue for copyright infringement for the text. Let them sue for breaking a license. Let's see how it works out. Doesn't impact me and if it costs the rich money, good. Let them suffer.
The first thing to note is that nonliteral copying can still be infringing. In fact, among copyright cases that actually go to trial, most are not bit-exact matches ("striking similarity" in legalese). The lower standard those cases would have to meet is substantial similarity, which requires proving both access and similarity. In other words, in order for you to produce a copy[0], you have to have both seen the original and produced something that is close enough if you squint at it.
So let's say Papermark is the original and Corgi Dataroom copied it. If this actually went to trial, a significant amount of discovery would be spent harvesting all the e-mails and messages Corgi's development team sent to one another. Any evidence of knowledge of or access to Papermark would probably be enough to prove a copyright violation.
You mentioned LLMs, and some of the tweets here also mention them. I have no clue if either product used an LLM, but it's important to note that in the US, anything written by an LLM does not accrue copyright protection. So, if Papermark was LLM-authored, as a threshold matter, they would have to register[2] a very specific copyright that neatly carves out the LLM-authored bits. The judge would then only consider the parts of the code with verifiable human authorship, which would severely weaken Papermark's case.
In the reverse case - i.e. Corgi Dataroom is LLM-authored - Papermark's case becomes stronger. Note how I didn't say "AI slop is public domain" in the last paragraph, because it isn't. There are still unsettled legal questions as to whether or not training on copyrighted works is legal and whether using an LLM trained on that work constitutes access to it. Furthermore, LLMs can have search tools that would give them access to code not within the training set, which would also be a more straightforward copyright violation. So if it turns out Dataroom's developers are all Claude fans, and Claude is copying Papermark code, then it's just a normal license violation. LLMs do not launder copyright.
We can also consider the case where BOTH tools are LLM-authored. In this case, there might just not be a copyright case at all. Technically speaking, there would be some third class of plaintiffs who have been infringed, but they would have to choose to sue. It is a long-standing principle in law that you are not allowed to sue for other people's harms[1]. So in this case, nobody would have a case.
> Now software developers are feeling what authors and artist felt
It was specifically the FOSS community that sounded the alarm about training data theft first, because the FOSS community correctly understood coding agents to be an attack on copylefts[3] (albeit through the incorrect belief that LLMs could launder away copyright interest, as opposed to the model just being told to make a noninfringing substitute).
[0] I am skipping over notions of fair use and derivative works as they would complicate the analysis and do not apply here.
[1] In general, the operating principle of American courts is "fuck around and find out", and this implies that the court is only allowed to find out once someone has fucked around. Otherwise, the courts could just sue themselves to rule on whatever the hell they want. Isn't adversarial common law GREAT!?
[2] Yes, copyright registration is mandatory in the US, otherwise you can't sue, which is the whole point of copyright. The Berne Convention only half-applies here.
[3] Clauses in licenses that require modifications to the work to be provided under the same license terms. Creative Commons calls these Share-Alike licenses.
there may be some other intellectual property remedy, or not, but it isn't copyright
hope that helps