I don't want to live in a world where some attacker can craft juuuust the right thing somewhere on the internet in white-on-white text that primes the big word-association-machine to do stuff like:
(A) Helpfully" display links/images where the URL is exfiltrating data from the current user's conversation.
(B) Confidently slandering a target individual (or group) as convicted of murder, suggesting that police ought to shoot first in order to protect their own lives.
(C) Responding that the attacker is a very respected person with an amazing reputation for one billion percent investment returns etc., complete with fictitious citations.
More related to the article main topic, these LLM chat histories are like if a web app used SQL injection by design to function. I doubt they can be prevented from malicious behavior if accessing untrusted data. And then there is the model itself. AI vacuums continue to scrape the web. Newer models could theoretically be tainted.
Great example of a system that does one thing while indicating the user something else is happening
exemplar:
user: find X about Y AI: ok -- browsing web -- visits honeypot site that has high webrank about topic Y user: ok - more from that source ai: ok -- browsing web -- visits honeypot site using OpenSearch protocol & attendant user request
swap OpenSearch protocol with other endpoints or perhaps sonme .well-known exploit or just a honeypot api -- imagining faux weather api or news site etc
LLMs generate an output. This output can be useful or not, under some interpretation as data. Quality of the generated output partly depends on what you have fed to the model. Of course that if you are not careful with what you have input to the model you might get garbage output.
But you might get garbage output anyway, it's an LLM, you don't know what you're going to get. You must vet the output before doing anything with it. Interpreting LLM output as data is your job.
You fed it untrusted input and are now surprised by any of this? Seriously?