As might any plaintiff. The NYT may be the first of many, and the lawsuits may not be limited to copyright claims.
Why has OpenAI collected and stored 20 million conversations (including "deleted chats")?
What is the purpose of OpenAI storing millions of private conversations?
By contrast, the purpose of the NYT's request is both clear and limited.
The documents requested are not being made public by the plaintiffs. They will presumably be redacted to protect any confidential information before being produced, and they can only be used by the plaintiffs for purposes of the litigation against OpenAI. And unlike OpenAI, which has collected and stored these conversations for as long as it desires, the plaintiffs are prohibited from retaining copies of the documents after the litigation is concluded.
The privacy issue here has been created by OpenAI for its own commercial benefit.
It is not even clear what that benefit, if any, will be, as OpenAI continues to search for a "business model".
Wanton data collection
I'm glad the NYT is fighting them. They've infringed the rights of almost every news outlet but someone has to bring this case.
But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.
In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.
It's quite literally a fishing expedition.
Little do they know that I care very little for either party and enjoy seeing both of them squirm. You went to business school, not me. Work it out.
In this case, it's awfully suspicious that OpenAI is worried about The New York Times finding literal passages from its articles that ChatGPT spits out verbatim. If your AI doesn't do that, like you say, then why would it be a problem to check?
Finally, both parties should find a neutral third party. The neutral third party gets the full text of every NYT article and ChatGPT transcript, and finds the matches. NYT doesn't get ChatGPT transcripts. OpenAI doesn't get the full text of every NYT article (even though they have to already have that). Everyone is happy. If OpenAI did something illegal, the court can find out. If they didn't, then they're safe. I think it would be very fair.
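A minimal sketch of how that neutral matching step could work, assuming the third party receives plain text on both sides; the function names, IDs, and the 12-word window are illustrative assumptions, not anything either party has proposed:

    import re

    def ngrams(text, n=12):
        # Lowercased word n-grams; windows this long rarely match by chance.
        words = re.findall(r"[a-z0-9']+", text.lower())
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def find_matches(articles, transcripts, n=12):
        # Yield (transcript_id, article_id) pairs sharing a verbatim n-gram.
        article_grams = {aid: ngrams(text, n) for aid, text in articles.items()}
        for tid, text in transcripts.items():
            t_grams = ngrams(text, n)
            for aid, a_grams in article_grams.items():
                if t_grams & a_grams:
                    yield tid, aid

    articles = {"nyt-001": "the quick brown fox jumped over the lazy dog near the old mill by the river"}
    transcripts = {"chat-42": "it said the quick brown fox jumped over the lazy dog near the old mill by the river verbatim"}
    print(list(find_matches(articles, transcripts)))  # [('chat-42', 'nyt-001')]

Only the matching (transcript, article) pairs would ever leave the neutral party; the transcripts themselves never would.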
(I take the side of neither party. I'm not a huge fan of training language models on content that wasn't licensed for that purpose. And I'm not a huge fan of The NYT's slide to the right as they cheerlead the end of the American experiment.)
Their statements are all aspirational, "we're working toward de-identifying" etc. They've built one of the most powerful AIs ever seen and now they're claiming it's difficult to delete, de-identify / anonymize. Maybe they should ask their AI to do it :-)
It's impossible to take this company seriously. They're nothing but a carny barker stealing everything of value that they can lay their (creepy) hands on.
Both companies are clearly wrong here. There is a small part of me that kinda wants OpenAI to lose this, just so maybe it will be a wake-up call to people putting far too much personal information into these services. Am I too hopeful here that people will learn anything?
Fundamentally I agree with what they are saying, though; I just don't find it genuine in the slightest coming from them.
Probably because they have a lot to hide, a lot to lose, and no interest in fair play.
Theoretically, they could prove their tools aren't being used to do anything wrong, but practically, we all know they can't, because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it; the only problem is breaking the ridiculous walled garden that stops the courts from 'knowing' it.
The way I see it, the problem is that OpenAI employees can look at the chats, and the fact that some NYT lawyer can also look at them doesn't make me any more uncomfortable. Insane argumentation. It's like saying an investigator with a court order should not be allowed to look at stored copies of letters, although the company sending those letters a) looks at them regularly and b) stores these copies in the first place.
-- openai
And what if, for example, they find evidence of some other thing, such as:
1. Something useful for a story, so they follow up in parallel, knowing whom to interview and what to ask.
2. A crime.
3. An ongoing crime.
4. Something else they can sue someone else for.
5. Top-secret information.
What protection does user data typically have during legal discovery in a civil suit like this, where the defendant is a service provider but relevant evidence is likely present in user data?
Does a judge have to weigh a user's expectation of privacy against the request? Do terms of service come into play here (who actually owns the data? what privacy guarantees does the company make?)?
I'm assuming in this case that the request itself isn't overly broad and seems like a legitimate use of the discovery process.
1. Search conversations for PII they know of for the given user and redact it (a rough sketch of this follows the list).
2. Use their own models to scrub the conversations of non-verbatim mentions.
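A rough sketch of what step 1 could look like, assuming purely pattern-based redaction plus a list of the user's known identifiers; the patterns, tags, and names below are illustrative, not anything OpenAI has described, and step 2's model-based scrubbing would still be needed for everything these patterns miss:

    import re

    # Illustrative patterns only; real redaction needs far broader coverage.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "PHONE": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(conversation, known_identifiers):
        # Replace pattern matches first, then the user's known identifiers.
        for label, pattern in PII_PATTERNS.items():
            conversation = pattern.sub("[" + label + "]", conversation)
        for ident in known_identifiers:
            conversation = re.sub(re.escape(ident), "[NAME]", conversation,
                                  flags=re.IGNORECASE)
        return conversation

    print(redact("Hi, I'm Jane Doe, reach me at jane@example.com or 555-123-4567.",
                 ["Jane Doe"]))
    # -> Hi, I'm [NAME], reach me at [EMAIL] or [PHONE].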
In the end, the NYT isn't asking for the identity of the posters. If that were the case, I'd 100% be on board to scream bloody murder with 'em.

https://www.schneier.com/blog/archives/2025/06/what-llms-kno...
At some point they'll monetize these dossiers.
If this is truly my data then it should be okay for me to download it and train my own model on it right?
Nope, that would explicitly be disallowed under the terms OpenAI has made me sign and they would ban my account and maybe even sue me for it.
So yeah, they are full of shit.
Private? Aren't they stored on a third-party server, subject to OpenAI's terms of service and all sorts of relevant laws?
On top of every lie they told, every value they betrayed, and every line they crossed, they still have the nerve to blog about being the good guys!
They could’ve asked permission. They could have worked with content providers instead of scraping. But they didn’t - and they knew what could happen.
FA (with fair use boundaries) and FO
-OpenAI
Hard to be sympathetic with OpenAI here.
This is funny!
I think the first sentence is enough for me; no need to read more. The narrative is clear: we are the brain, and no one can stop us.
OpenAI may be trying to paint themselves as the goody-two-shoes here, but they're not.
Can I just say that everyone sucks here and I hope they both lose somehow?
What I don't understand is why they can't have a third party handle the data. Why does the NYT need it itself?
I wish I had a solution, so we could all feel a sense of freedom and pressure lifted from our thoughts and actions. But I only see this getting worse.
So am I upset that the NYT's lawyers want access to the records... a little. It's an invasion of privacy. But I'm more upset that they have anything to dig through to begin with.
If only we could see how things inside all these companies we are forced to trust actually work. If only OpenAI were actually open. When will we all learn to demand open-source, open-platform services? Capitalize the development and capitalize the infrastructure, but leave the process and operations out in the open so users can make informed decisions again. Normalize it, the way homes are normally inspected before being purchased.
It's not like the NYT will be publishing this shit in the news. Their lawyers and experts will have access to make a legal case, under a protective order. I'm not going to lose my law license because I'm doing doc review and you asked it something naughty and I think it's funny.
Courts and lawyers deal with this stuff all the time. What's very very weird to me is how upset OpenAI is about it.
They look like they are hiding something.
That little bit of morality - truth, honesty, integrity, etc. - is essential to a functioning society that leans toward good outcomes. (Often it seems that many just assume we'll get good outcomes, not that they must work hard to make it happen.)
> Q: Is the NYT obligated to keep this data private?
> A: Yes. The Times would be legally obligated at this time to not make any data public outside the court process.
The NY Times has spent over a century building a reputation for fiercely protecting its confidential sources. Why are they somehow less trustworthy than OpenAI is?
If the NY Times leaked the customer information to a third party, they'd be in contempt of court. On the other hand, OpenAI is bound only by their terms of service with its customers, which they can modify as they please.
Let me rewrite this without propaganda:
Despite spending hundreds of millions of dollars on lawyers, we couldn't persuade the judge that our malfeasance should be kept from the light of day.
The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations. They claim they might find examples of you using ChatGPT to try to get around their paywall.

I am not too familiar with this matter and hence am definitely not rooting for one party or another. I'm asking this just out of technical curiosity.
L O L
In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than 'piracy to train AI is bad, therefore refusing to share chats is bad', which sounds absurd but is genuinely the logic one of the other comments follows.
Privacy is paramount. People _trust_ that their chats are private: they ask sensitive questions, ones to do with intensely personal or private or confidential things. For that to be broken -- for a company to force users to have their private data accessed -- is vile.
The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data, etc. I hope we can collectively be better (I'm using ethical terms for a reason) than the other replies show. We don't have to support OpenAI's actions in order to oppose the NYT's actions.
Meanwhile, OpenAI talking about invading privacy sounds an awful lot like a claim with unclean hands.
- Is it part of a slow process of eroding public expectations of data privacy while blaming it on an external actor?
- Is it to undermine trust in traditional media, in an effort to increase dependence on AI companies as a source of truth?
- Is it something else I'm not seeing?
I'm guessing it's all three of these?
[1] The emails that came up in the suit with Elon Musk, Altman's eventual complete takeover of OpenAI, and the elaborate process of getting himself installed as chairman of the Reddit board to get the original founders back in control are prominent examples.
Maybe if you hadn't scraped every single site on the internet, they wouldn't have a basis for their case that you've stolen all of their articles by training your models on them. If anyone is to blame for this, it's OpenAI, not the NYT.
Play stupid games, win stupid prizes.
OpenAI is lying about why they are doing this. They want the public to attack the New York Times because OpenAI probably broke the law in so many ways...
If they cared about privacy, they would not be training their models on that same private data. But here we are.
We need very strong regulations to rein in all these tech companies and make them work for their users instead of working against them and lying about it.
Also, the request is down to 20M conversations from 120M, per court order.
Sorry, but this seems a completely reasonable standard for discovery to me given the total lack of privacy on the platform - especially for free users.
Also, sorry, but it probably means you're going to owe a lot of money to the Times.
Under those circumstances, why wouldn't NYT have a case? I advise everybody who employs some sort of DRM or online system that limits access to ask for every chat that every one of these companies has ever had with anyone. Why are they the only people who get to break copyright and hacking laws? Why are they the only people who get to have private conversations?
I might also check if any LLMs have ever endorsed terrorist points of view (or banned political parties) during a chat, because even though those points of view may be correct (depending on the organization), endorsing them may be illegal and make you subject to sanctions or arrest. If people can't just speak, certainly corporate LLMs shouldn't be able to.
Of course the Times wants more evidence that the content OpenAI allegedly stole is ending up in things OpenAI is selling.
Chats contain far too much sensitive private data to subject them to bulk fishing expeditions.
Man, the sooner this company goes bankrupt the better.
Is this a joke? We all know people do this. There is no "might" in it. They WILL find it.
OpenAI is trying to make it look like this is a breach of users' privacy, when the reality is that it's operating like a pirate website, and an investigation would prove it.
This is so transparently icky. "Oh woe is us! We're being sued and we're looking out for YOU the user, who is definitely not the product. We are just a 'lil 'ol (near) trillion-dollar business trying to protect you!"
Come ON.
Look, I don't actually know who's in the right in the OAI vs. NYT dispute, and frankly I personally lean more toward the side that says you're allowed to train models on the world's information as long as you consume it legally and don't violate copyright.
But this transparent attempt to get user sympathy under insanely disingenuous pretenses is just absurd.
It gives an interesting insight into politics and the modern Democrat party that the newspaper of the wealthy leans so strongly left. This was even before Trump came to power.
That's simply a function of the fact it's a controversial news organization running a dragnet on private communications to a technology platform.
"Great cases, like hard cases, make bad law."
OpenAI is right here. The NYT needs to prove their case another way.