As might any plaintiff. The NYT may be the first of many, and the lawsuits may not be limited to copyright claims.
Why has OpenAI collected and stored 20 million conversations (including "deleted chats")?
What is the purpose of OpenAI storing millions of private conversations?
By contrast, the purpose of the NYT's request is both clear and limited.
The documents requested are not being made public by the plaintiffs. They will presumably be redacted to protect any confidential information before being produced, and they can only be used by the plaintiffs for purposes of the litigation against OpenAI. And unlike OpenAI, which has collected and stored these conversations for as long as it desires, the plaintiffs are prohibited from retaining copies of the documents after the litigation is concluded.
The privacy issue here has been created by OpenAI for its own commercial benefit.
It is not even clear what that benefit, if any, will be, as OpenAI continues to search for a "business model".
Wanton data collection
I'm glad the NYT is fighting them. They've infringed the rights of almost every news outlet but someone has to bring this case.
But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.
In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.
It's quite literally a fishing expedition.
Little do they know that I care very little for either party and enjoy seeing both of them squirm. You went to business school, not me. Work it out.
In this case, it's awfully suspicious that OpenAI is worried about The New York Times finding literal passages from its articles that ChatGPT spits out verbatim. If your AI doesn't do that, like you say, then why would it be a problem to check?
Finally, both parties should find a neutral third party. The neutral third party gets the full text of every NYT article and ChatGPT transcript, and finds the matches. NYT doesn't get ChatGPT transcripts. OpenAI doesn't get the full text of every NYT article (even though they have to already have that). Everyone is happy. If OpenAI did something illegal, the court can find out. If they didn't, then they're safe. I think it would be very fair.
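A minimal sketch of how that neutral matching step could work, assuming the third party receives plain text on both sides; the function names, IDs, and the 12-word window are illustrative assumptions, not anything either party has proposed:

    import re

    def ngrams(text, n=12):
        # Lowercased word n-grams; windows this long rarely match by chance.
        words = re.findall(r"[a-z0-9']+", text.lower())
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def find_matches(articles, transcripts, n=12):
        # Yield (transcript_id, article_id) pairs sharing a verbatim n-gram.
        article_grams = {aid: ngrams(text, n) for aid, text in articles.items()}
        for tid, text in transcripts.items():
            t_grams = ngrams(text, n)
            for aid, a_grams in article_grams.items():
                if t_grams & a_grams:
                    yield tid, aid

    articles = {"nyt-001": "the quick brown fox jumped over the lazy dog near the old mill by the river"}
    transcripts = {"chat-42": "it said the quick brown fox jumped over the lazy dog near the old mill by the river verbatim"}
    print(list(find_matches(articles, transcripts)))  # [('chat-42', 'nyt-001')]

Only the matching (transcript, article) pairs would ever leave the neutral party; the transcripts themselves never would.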
(I take the side of neither party. I'm not a huge fan of training language models on content that wasn't licensed for that purpose. And I'm not a huge fan of The NYT's slide to the right as they cheerlead the end of the American experiment.)
Their statements are all aspirational, "we're working toward de-identifying" etc. They've built one of the most powerful AIs ever seen and now they're claiming it's difficult to delete, de-identify / anonymize. Maybe they should ask their AI to do it :-)
It's impossible to take this company seriously. They're nothing but a carny barker stealing everything of value that they can lay their (creepy) hands on.
Both companies are clearly wrong here. There is a small part of me that kinda wants OpenAI to lose this, just so maybe it will be a wake-up call to people putting far too much personal information into these services. Am I too hopeful here that people will learn anything?
Fundamentally I agree with what they are saying, though; I just don't find it genuine in the slightest coming from them.
Probably because they have a lot to hide, a lot to lose, and no interest in fair play.
Theoretically, they could prove their tools aren't being used to do anything wrong, but practically, we all know they can't, because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it; the only problem is breaking the ridiculous walled garden that stops the courts from 'knowing' it.
The way I see it, the problem is that OpenAI employees can look at the chats, and the fact that some NYT lawyer can also look at them doesn't make me any more uncomfortable. Insane argumentation. It's like saying an investigator with a court order should not be allowed to look at stored copies of letters, although the company sending those letters a) looks at them regularly and b) stores these copies in the first place.
-- openai
And what if, for example, they find evidence of some other thing, such as:
1. Something useful for a story, so they follow up in parallel, knowing whom to interview and what to ask.
2. A crime.
3. An ongoing crime.
4. Something else they can sue someone else for.
5. Top-secret information.
What protection does user data typically have during legal discovery in a civil suit like this, where the defendant is a service provider but relevant evidence is likely present in user data?
Does a judge have to weigh a user's expectation of privacy against the request? Do terms of service come into play here (who actually owns the data? what privacy guarantees does the company make?)?
I'm assuming in this case that the request itself isn't overly broad and seems like a legitimate use of the discovery process.
1. Search conversations for PII they know of for the given user and redact it (a rough sketch of this follows the list).
2. Use their own models to scrub the conversations of non-verbatim mentions.
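A rough sketch of what step 1 could look like, assuming purely pattern-based redaction plus a list of the user's known identifiers; the patterns, tags, and names below are illustrative, not anything OpenAI has described, and step 2's model-based scrubbing would still be needed for everything these patterns miss:

    import re

    # Illustrative patterns only; real redaction needs far broader coverage.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "PHONE": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(conversation, known_identifiers):
        # Replace pattern matches first, then the user's known identifiers.
        for label, pattern in PII_PATTERNS.items():
            conversation = pattern.sub("[" + label + "]", conversation)
        for ident in known_identifiers:
            conversation = re.sub(re.escape(ident), "[NAME]", conversation,
                                  flags=re.IGNORECASE)
        return conversation

    print(redact("Hi, I'm Jane Doe, reach me at jane@example.com or 555-123-4567.",
                 ["Jane Doe"]))
    # -> Hi, I'm [NAME], reach me at [EMAIL] or [PHONE].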
In the end, the NYT isn't asking for the identity of the posters. If that were the case, I'd 100% be on board to scream bloody murder with 'em.

https://www.schneier.com/blog/archives/2025/06/what-llms-kno...
At some point they'll monetize these dossiers.
If this is truly my data then it should be okay for me to download it and train my own model on it right?
Nope, that would explicitly be disallowed under the terms OpenAI has made me sign and they would ban my account and maybe even sue me for it.
So yeah, they are full of shit.
Private? Aren't they stored on a third-party server, subject to OpenAI's terms of service and all sorts of relevant laws?
On top of every lie they told, every value they betrayed, and every line they crossed, they still have the nerve to blog about being the good guys!
They could’ve asked permission. They could have worked with content providers instead of scraping. But they didn’t - and they knew what could happen.
FA (with fair use boundaries) and FO
-OpenAI
Hard to be sympathetic with OpenAI here.
This is funny!
I think the first sentence is enough for me; no need to read more. The narrative is clear: we are the brain, and no one can stop us.
OpenAI may be trying to paint themselves as the goody-two-shoes here, but they're not.
Can I just say that everyone sucks here and I hope they both lose somehow?
What I don't understand is why they can't have a third party handle the data. Why does the NYT need it itself?
I wish I had a solution, so we could all feel a sense of freedom and pressure lifted from our thoughts and actions. But I only see this getting worse.
So am I upset that the NYT's lawyers want access to the records... a little. It's an invasion of privacy. But I'm more upset that they have anything to dig through to begin with.
If only we could see how things inside all these companies we are forced to trust actually work. If only OpenAI were actually open. When will we all learn to demand open-source, open-platform services? Capitalize the development and capitalize the infrastructure, but leave the process and operations out in the open so users can make informed decisions again. Normalize it, the way homes are normally inspected before being purchased.
It's not like the NYT will be publishing this shit in the news. Their lawyers and experts will have access to make a legal case, under a protective order. I'm not going to lose my law license because I'm doing doc review and you asked it something naughty and I think it's funny.
Courts and lawyers deal with this stuff all the time. What's very very weird to me is how upset OpenAI is about it.
They look like they are hiding something.
That little bit of morality - truth, honesty, integrity, etc. - is essential to a functioning society that leans toward good outcomes. (Often it seems that many just assume we'll get good outcomes, not that they must work hard to make it happen.)
> Q: Is the NYT obligated to keep this data private?
> A: Yes. The Times would be legally obligated at this time to not make any data public outside the court process.
The NY Times has spent over a century building a reputation for fiercely protecting its confidential sources. Why are they somehow less trustworthy than OpenAI is?
If the NY Times leaked the customer information to a third party, they'd be in contempt of court. On the other hand, OpenAI is bound only by their terms of service with its customers, which they can modify as they please.
Let me rewrite this without propaganda:
Despite spending hundreds of millions of dollars on lawyers, we couldn't persuade the judge that our malfeasance should be kept from the light of day.
The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations. They claim they might find examples of you using ChatGPT to try to get around their paywall.

I am not too familiar with this matter and hence am definitely not rooting for one party or another. I'm asking this just out of technical curiosity.
L O L
In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than 'piracy to train AI is bad, therefore refusing to share chats is bad', which sounds absurd but is genuinely the logic one of the other comments follows.
Privacy is paramount. People _trust_ that their chats are private: they ask sensitive questions, ones to do with intensely personal or private or confidential things. For that to be broken -- for a company to force users to have their private data accessed -- is vile.
The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data, etc. I hope we can collectively be better (I'm using ethical terms for a reason) than the other replies show. We don't have to support OpenAI's actions in order to oppose the NYT's actions.
Meanwhile, OpenAI talking about invading privacy sounds an awful lot like a claim with unclean hands.
- Is it part of a slow process of eroding public expectations of data privacy while blaming it on an external actor?
- Is it to undermine trust in traditional media, in an effort to increase dependence on AI companies as a source of truth?
- Is it something else I'm not seeing?
I'm guessing it's all three of these?
[1] The emails that came up in the suit with Elon Musk, Altman's eventual complete takeover of OpenAI, and the elaborate process of getting himself installed as chairman of the Reddit board to get the original founders back in control are prominent examples.
Maybe if you hadn't scraped every single site on the internet, they wouldn't have a basis for their case that you've stolen all of their articles by training your models on them. If anyone is to blame for this, it's OpenAI, not the NYT.
Play stupid games, win stupid prizes.
OpenAI is lying about why they are doing this. They want the public to attack the New York Times because OpenAI probably broke the law in so many ways...
If they cared about privacy, they would not be training their models on that same private data. But here we are.
We need very strong regulations to rein in all these tech companies and make them work for their users instead of working against them and lying about it.
Also, the request is down to 20M conversations from 120M, per court order.
Sorry, but this seems a completely reasonable standard for discovery to me given the total lack of privacy on the platform - especially for free users.
Also, sorry, but it probably means you're going to owe a lot of money to the Times.
Under those circumstances, why wouldn't NYT have a case? I advise everybody who employs some sort of DRM or online system that limits access to ask for every chat that every one of these companies has ever had with anyone. Why are they the only people who get to break copyright and hacking laws? Why are they the only people who get to have private conversations?
I might also check if any LLMs have ever endorsed terrorist points of view (or banned political parties) during a chat, because even though those points of view may be correct (depending on the organization), endorsing them may be illegal and make you subject to sanctions or arrest. If people can't just speak, certainly corporate LLMs shouldn't be able to.
Of course the Times wants more evidence that the content OpenAI allegedly stole is ending up in things OpenAI is selling.
Chats contain far too much sensitive private data to subject them to bulk fishing expeditions.
Man, the sooner this company goes bankrupt the better.
Is this a joke? We all know people do this. There is no "might" in it. They WILL find it.
OpenAI is trying to make it look like this is a breach of users' privacy, when the reality is that it's operating like a pirate website, and an investigation would prove it.
This is so transparently icky. "Oh woe is us! We're being sued and we're looking out for YOU the user, who is definitely not the product. We are just a 'lil 'ol (near) trillion-dollar business trying to protect you!"
Come ON.
Look, I don't actually know who's in the right in the OAI vs. NYT dispute, and frankly I personally lean more toward the side that says you're allowed to train models on the world's information as long as you consume it legally and don't violate copyright.
But this transparent attempt to get user sympathy under insanely disingenuous pretenses is just absurd.
It gives an interesting insight into politics and the modern Democrat party that the newspaper of the wealthy leans so strongly left. This was even before Trump came to power.
That's simply a function of the fact it's a controversial news organization running a dragnet on private communications to a technology platform.
"Great cases, like hard cases, make bad law."
OpenAI is right here. The NYT needs to prove their case another way.