FRESH

Hacker News

Hacker plants false memories in ChatGPT to steal user data in perpetuity

279 points by nobody9999

by Terr_

5 subcomments

At this point I can only hope that all these LLM products get exploited so massively and damning-ly that all credibility in them evaporates, before that misplaced trust causes too much insidious damage to everybody else.
I don't want to live in a world where some attacker can craft juuuust the right thing somewhere on the internet in white-on-white text that primes the big word-association-machine to do stuff like:
(A) Helpfully" display links/images where the URL is exfiltrating data from the current user's conversation.
(B) Confidently slandering a target individual (or group) as convicted of murder, suggesting that police ought to shoot first in order to protect their own lives.
(C) Responding that the attacker is a very respected person with an amazing reputation for one billion percent investment returns etc., complete with fictitious citations.

by phkahler

7 subcomments

by fedeb95

0 subcomment

by ars

5 subcomments

Maybe I missed it, but I don't get how he planted info for someone else, rather than just messing up his own account.

by gradientsrneat

0 subcomment

The long-term memory storage seems like a privacy mess. This makes me glad that there are services like DuckDuckGo AI which allow for epheremal chats. Although running locally is best for privacy, as long as the AI isn't hooked up to code.
More related to the article main topic, these LLM chat histories are like if a web app used SQL injection by design to function. I doubt they can be prevented from malicious behavior if accessing untrusted data. And then there is the model itself. AI vacuums continue to scrape the web. Newer models could theoretically be tainted.

by mise_en_place

0 subcomment

This is why observability is so important, regardless of whether it's am LLM or your WordPress installation. Ironically, prompts themselves must be treated as untrusted input and must be sanitized.

by taberiand

4 subcomments

I wonder if a simple model trained only to spot and report on suspicious injection attempts, or otherwise review the "long-term memory" could be used in the pipeline?

by exabrial

0 subcomment

>for output that indicates a new memory has been added
Great example of a system that does one thing while indicating the user something else is happening

by aghilmort

0 subcomment

cue adjacent scenario where malicious sites create AI honeypots whereupon when visited for user visit url is constructed such as to exfiltrate the user data
exemplar:
user: find X about Y AI: ok -- browsing web -- visits honeypot site that has high webrank about topic Y user: ok - more from that source ai: ok -- browsing web -- visits honeypot site using OpenSearch protocol & attendant user request
swap OpenSearch protocol with other endpoints or perhaps sonme .well-known exploit or just a honeypot api -- imagining faux weather api or news site etc

by bitwize

1 subcomments

by fedeb95

0 subcomment

by 83837jjddh

0 subcomment

by 4ad

2 subcomments

What a nothingburger.
LLMs generate an output. This output can be useful or not, under some interpretation as data. Quality of the generated output partly depends on what you have fed to the model. Of course that if you are not careful with what you have input to the model you might get garbage output.
But you might get garbage output anyway, it's an LLM, you don't know what you're going to get. You must vet the output before doing anything with it. Interpreting LLM output as data is your job.
You fed it untrusted input and are now surprised by any of this? Seriously?