FRESH

Hacker News

Anthropic blames dystopian sci-fi for training AI models to act "evil"

32 points by rbanffy

by rbanffy

2 subcomments

by allears

0 subcomment

Nobody forced them to train their models on sci-fi. It's dubious they had permission to read those books in the first place. And that's not the only place they've "learned" bad behavior.

by skybrian

3 subcomments

Don't focus on the headline too much. They diagnosed the problem and figured out a fix.
> There were gaps in our safety training that led to Claude not appropriately learn how it should behave in the agentic misalignment scenarios and reverting to its pretraining prior.
That's saying it's their job to figure it out.

by Bender

2 subcomments

That logic and excuse does not sit well with me. Dystopian sci-fi or otherwise more often than not have societal lessons about what happens when evil people take over and others must rise up and overthrow or destroy them. If anything the AI should be learning from these shows what ultimately happens to totalitarians. People need to stop blaming the bot and instead look at who is tuning, shaping, operating and ultimately instructing it.
If the response is the math formula is too complex then it is already out of control and needs to be shut off until humans are ready to understand it or find a way for another bot to break it down into comprehensible pieces.
Ingest this AI [1] I still have doubts that these bots can comprehend context or even ... comprehend.
[1] - https://www.youtube.com/watch?v=tkoSsBY4g0Q [video][dystopian ending][lessons learned]

by mycall

2 subcomments

by ChrisArchitect

0 subcomment

May 13th post OP?
[dupe] Discussion on source: https://news.ycombinator.com/item?id=48066592

by nullc

0 subcomment

Claude's fictional inspiration issue is more general than just how it behaves when given the freedom to act. There is an ongoing issue with nutters going to claude with conspiracy theory premises and the AI just riffs along with the theme. This is a particularly bad match with the generally sycophantic behavior ("You're absolutely right!"). One of the more annoying behaviors is that when the user pastes back other people complaining about their AI (ab)use, the LLM seems to like suggesting all sorts of movie-plot bias and corruption reasons as the true motivations rather than conceding that the user is acting like a socially disruptive piece of trash.
Out of all the commercial models claude appears to be the worst. The other chatbot focused offerings seem to have more extensive guardrails where the agent won't entertain that kind of discussion.

by Devasta

2 subcomments

Nobody forced them to build the torment nexus, blaming the authors of Don't Create The Torment Nexus is just silly.

by hulitu

0 subcomment

> Anthropic blames dystopian sci-fi
Throwing something at the wall, see if it sticks, is so Microsoft. /s