The earlier infringement case against Anthropic held that while training an AI was transformative and not itself infringement, pirating works for that purpose was still infringement in its own right. The settlement was $1.5bn, or about $3k for each of the roughly 500k works they pirated, so if Zuckerberg pirated "millions" (plural), his settlement could quite plausibly reach $6bn.
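For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch. It assumes the per-work rate from the Anthropic settlement would carry over unchanged, and it reads "millions" as the smallest plural, 2 million works; both are assumptions for illustration, not anything a court has said. The function name is made up.

```python
# Figures from the comment above: $1.5bn settlement, ~500k pirated works.
SETTLEMENT_USD = 1_500_000_000
WORKS_COVERED = 500_000

# Implied payout per work (about $3k).
per_work_usd = SETTLEMENT_USD / WORKS_COVERED


def projected_settlement(num_works: int, rate_usd: float = per_work_usd) -> float:
    """Scale the per-work rate to a different number of works.

    Hypothetical helper: assumes the same rate applies, which is
    purely an illustrative assumption.
    """
    return num_works * rate_usd


# "Millions" (plural) taken as at least 2 million works (assumption).
meta_estimate_usd = projected_settlement(2_000_000)
print(f"per work: ${per_work_usd:,.0f}")
print(f"projected: ${meta_estimate_usd:,.0f}")
```

Any larger count scales linearly, so the $6bn figure is a floor under these assumptions rather than a prediction.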
I've wondered what the legalese justification is for letting liability evaporate, as it does so often with corps. So far the reasons I'm left with are 'shrugs' and 'the relevant provision (seemingly? apparently?) simply doesn't apply', neither of which is any good.
I was going to make a joke about how we should attach magnets to Aaron Swartz' corpse, since that'd make for a pretty potent energy source, given how fast he must be spinning. But honestly, I think he would have seen this sort of thing coming, given how his case was handled and how things really haven't gotten any better.
All the Aaron Swartzes of the future could freely share scientific papers with the world.
The question to answer is: did it happen, and if so, is it copyright infringement (not covered by fair use)? Not which company official authorized it.
But a multi-billion dollar corporation downloading millions of copyrighted creative works so that they can reshape the entire labor market by training a new type of artificial intelligence model on that data set? Meh, sounds like Silicon Valley disruption, give the man a medal!
Like the fine means nothing to Meta, and they'll still be the beneficiaries of their infringement.
In this current state, you really just need to have enough money to bypass this lawsuit and be on your way.
RICO specifically cites "criminal infringement of a copyright" as laid out in 18 U.S. Code § 2319. If the CEO tells his employees to download hundreds of thousands of works illegally in order to carry out his money-making scheme, how is that not organized crime even if (dubiously) LLM training on the material is fair use?
-----
RICO: https://www.law.cornell.edu/uscode/text/18/part-I/chapter-96
Definitions: https://www.law.cornell.edu/uscode/text/18/1961
> As used in this chapter — (1) “racketeering activity” means (A)[...]; (B) any act which is indictable under any of the following provisions of title 18, United States Code: [...], section 2319 (relating to criminal infringement of a copyright),[...]
18 U.S. Code § 2319 - Criminal infringement of a copyright: https://www.law.cornell.edu/uscode/text/18/2319
-----
edit:
> 18 U.S. Code § 1962 - Prohibited activities
> (c) It shall be unlawful for any person employed by or associated with any enterprise engaged in, or the activities of which affect, interstate or foreign commerce, to conduct or participate, directly or indirectly, in the conduct of such enterprise’s affairs through a pattern of racketeering activity[...].
https://www.law.cornell.edu/uscode/text/18/1962
From the lawsuit:
“Meta — at Zuckerberg’s direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama,” the suit says. “Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”
> Authors have sued AI companies for copyright infringement before - and lost.
So, basically nothing will come out of this
- Mark Zuckerberg
Tired of the double standard where CEOs get away with it when bad things happen (because they can't be everywhere all the time) but get all the benefits when the company makes a great profit (because they're personally driving results!).
Who gave permission to Anthropic/ClosedAI to scan hundreds of thousands of books to feed to their systems, which they sell commercially? Why is this the new normal? Even GitHub a month ago was like, if you don't opt out, we will read even your private repos for AI training.
Tech is turning into next level BS, I don't know if it always was like this but this has pierced even the very bottom.
> "81.7TB"
https://en.wikipedia.org/wiki/United_States_v._Swartz
> "approximately 70 gigabytes"
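To put those two quoted figures side by side, a rough sketch of the ratio. It assumes decimal units (1 TB = 1000 GB) and that the two numbers are comparable measures of downloaded data, which the quotes alone don't establish.

```python
# Quoted figures: "81.7TB" vs. "approximately 70 gigabytes".
GB_PER_TB = 1000  # decimal units, an assumption

larger_gb = 81.7 * GB_PER_TB
swartz_gb = 70

# How many times bigger the first figure is than the second.
ratio = larger_gb / swartz_gb
print(f"roughly {ratio:,.0f}x larger")
```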
I know there's a complaint that AI can verbatim repeat that work. But so can human savants. No one is suing human savants for reading their books.
Producing copyrighted material, of course. Training on copyrighted material... I just don't see it.
EDIT: Making a perfectly valid point, but it's unpopular, so down I go.
How are these fruits "stolen" if they still have what was allegedly stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that phonorecords of copyrighted musical compositions sold without authorization are not goods "stolen, converted or taken by fraud" under the National Stolen Property Act.
And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries".
And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum it's transformative, i.e. fair use.
Royalties are owed and continuously owed as these models are deployed and doing inference. How is it any different to paying a small pittance to someone every time a song is played?