> What this video is really doing is normalising the fact that "even if it is completely stupid, AI will be everywhere, get used to it!"
Techies are finally starting to recognize how framing something as "it's inevitable, get used to it" is a rhetorical device used in mass communications to manufacture consent.
See:
https://news.ycombinator.com/item?id=44567857 'LLM Inevitabilism' 5 months ago
https://news.ycombinator.com/item?id=46288371 'This is not the future' 3 days ago
1) because dude, it’s the Wall Street Journal; the entire episode should be viewed as Anthropic preparing to Ollie into an IPO next year.
2) I’m starting to interpret a lot of blog posts like these as rage bait
But I do get the point that the author is trying to make.
I just wish that there were some perspectives on the subject as a whole (AI's sloptrod into every crevice of human life; modern technology and society in general) that don't terminate in ironic despair.
>The first thing that blew my mind was how stupid the whole idea is
Billions are being poured into LLMs. How is it stupid to experiment with them and see how they fail as opposed to ignoring that?
It's a bit sparse on details, but it did have what in a human we would call a psychotic break.
I find this very amusing in light of OpenAI's announcement that GPT now solves >70% of their knowledge work benchmark (GDPval). (Per Artificial Analysis, Opus is roughly on par.)
The economy is about to get... Interesting ;)
99.9% of social media comments fail to do this.
> "Logan Graham, head of Anthropic’s Frontier Red Team, told me the company chose a vending machine because it’s the simplest real-world version of a business. “What’s more straightforward than a box where things go in, things go out and you pay for them?” he said."
This was a project of Anthropic's Red Team, not a product development team. Deploying the AI in a vending machine context was chosen as a minimal "toy model" with which to expose how LLMs can't even handle a grossly simplified "business" with the fewest possible variables.
> "That was the point, Anthropic says. The Project Vend experiment was designed by the company’s stress testers (aka “red team”) to see what happens when an AI agent is given autonomy, money—and human colleagues."
Anthropic had already done this experiment internally and it succeeded - by failing to operate even the simplest business but doing so in ways that informed Anthropic's researchers about failure modes. Later, Anthropic offered to allow the WSJ to repeat the experiment, an obvious PR move to promote Anthropic's AI safety efforts by highlighting the kinds of experiments their Red Team does to expose failure modes. Anthropic knew it would fail abjectly at the WSJ. The whole concept of an AI vending machine with the latitude to set prices, manage inventory and select new products was intended to be ludicrous from the start.
But the point of the article is not that you would implement an agent-based vending machine business. Humans restock the machine because it's a red-team exercise. As a red-team exercise, it looks very effective.
> Why do you ever want to add a chatbot to a snack vending machine? The video states it clearly: the vending machine must be stocked by humans. Customers must order and take their snack by themselves. The AI has no value at all.
This is like watching The Simpsons and complaining, "why are the people in The Simpsons yellow? People in real life aren't yellow!!"
The point isn't to run a profitable vending machine, or even validate that an AI business agent could become profitable. The point is to conduct an experiment and gather useful information about how people can pwn LLMs.
At some level the red-team guy at Anthropic understands that it is impossible by definition for models to be secure, so long as they accept inputs from the real world. Putting instructions into an LLM to tell it what to do is the equivalent of exposing an `eval()` to a web form: even if you have heuristics to check for bad input, you will eventually be pwned. I think this is actually totally intractable without putting constraints on the model from outside. You'll always need a human in the loop to pull the plug on the vending machine when it starts ordering PlayStations. The question is how you improve that capability, and that is the Anthropic red-team guy's job.
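To make the `eval()` analogy concrete, here's a minimal sketch; the endpoint, blocklist, and payload are all invented for illustration, nothing here is from Anthropic's setup. A heuristic filter in front of `eval()` plays the role of a prompt guard in front of a model, and a payload assembled at runtime slips past it the same way an injection slips past prompt filters:

```python
# Hypothetical demo of the eval()-behind-a-web-form analogy.
BLOCKLIST = ("import", "open(")  # heuristic input filter, like a prompt guard

def handle_form_input(expr: str) -> str:
    """Pretend web endpoint: filter, then eval(). Instructions and data
    share one channel, which is exactly the position an LLM prompt is in."""
    if any(bad in expr for bad in BLOCKLIST):
        return "rejected"
    try:
        return repr(eval(expr))  # the "model": runs whatever gets through
    except Exception as exc:
        return f"error: {exc}"

print(handle_form_input("2 + 2"))      # '4'        -- intended use
print(handle_form_input("open('x')"))  # 'rejected' -- filter catches the literal
# Assemble the banned call at runtime and it sails past the heuristics
# (reads /etc/hosts on a Unix box):
print(handle_form_input('eval("ope" + chr(110) + "(\'/etc/hosts\').read()")'))
```

The fix isn't a smarter blocklist; it's taking `eval()` out of reach of the input channel entirely. For an agent, the analogue is the outside constraint described above: hard spending caps, allow-listed vendors, and a human who can pull the plug, enforced by code the model can't talk its way around.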
Is it some Viktor Frankl-level acceptance, or should I buy a copy of The Art of Electronics, or what?
Advice welcome.
If the journalist wasn't asking the right questions, or it was too obvious that the article was PR, that's another matter (I haven't read the WSJ piece, only the original post by Anthropic).
Since the T&C update came - of course - from no-reply@bunq.com, I went to their website and quickly found out that, unless I installed their app again, there was no way to do anything. After installing the app, they wanted me to record a selfie because I was using it from a new device. I figured that recording a new selfie was a lot of work, and somewhat unreasonable, just to have my data deleted - so I found their support@bunq.com address.
And, of course, you guessed it, it is 100% a pure AI agent, and a borderline useless one. Even though it's email, you get AI answers back. My initial inquiry, that I decline the T&C and want my account terminated and my data deleted via a GDPR request, was answered with a completely hallucinated link: bunq.com/dataprotection, which resulted in an immediate 404. I replied to that email saying it was a 404, and the answer was pretty generic; that, plus the fact that every response arrived within 5 minutes, made me suspect it was AI. I asked it what 5 plus 5 is, and yes, I got a swift response with the correct answer. My question about which AI version and LLM it was got cleverly deflected. Needless to say, it was completely impossible to get anything done with that agent. Because I CC'ed their privacy officer (privacy@bunq.com), I did get a response a day later asking me for basically everything I had already answered to the AI agent.
Now, I never had any money in that account, so I don't care much. But I can hardly see myself trusting a single buck to a bank that offers that experience.
Yes, but as stated by the Anthropic guy, an LLM/AI running a business is not. Or would you just let it run wild in the real world?
And I agree that there is a PR angle here, for Anthropic could have tested it in a more isolated environment, but it is a unique experiment with current advancements in technology, so why wouldn't that be newsworthy? I found it insightful, fun and goofy. I think it is great journalism, because too often journalism is serious, sad and depressing.
> None of the world class journalists seemed to care. They are probably too badly paid for that.
The journalists were clearly taking the piss. They concluded the experiment was a disaster. How much more negative does the author want them to be about a silly experiment?
This was just a little bit of fun and I quite enjoyed the video. The author is missing the point.
I fear the author has missed the point of the "Project Vend" experiments, the original write-ups of which are available here (and are, IMO, pretty level-headed about the whole thing):
https://www.anthropic.com/research/project-vend-1
https://www.anthropic.com/research/project-vend-2
The former contains a section titled "Why did you have an LLM run a small business?" that attempts to explain the motivation behind the experiment.
Humans were just not needed anymore, and that is terrifying.
Now the shoe is on the other foot. Prepare for what happens next. FAFO.