In what world would I ever expect a commercial (or governmental) entity to have precise alignment with me personally, or even with my own business? I argue those relationships are necessarily adversarial, and trusting anyone else to align their "AI" tool to my goals, needs, and/or desires is a recipe for having my livelihood completely reassigned into someone else's wallet.
This is true, and I believe that the "sufficient funds" threshold will keep dropping too. It's a relief more than a concern, because I don't trust that big models from American or Chinese labs will always be aligned with what I need. There are probably a lot of people in the world whose interests are not especially aligned with the interests of the current AI research leaders.
"Don't turn the visible universe into paperclips" is a practically universal "good alignment" but the models we have can't do that anyhow. The actual refusal-guards that frontier models come with are a lot more culturally/historically contingent and less universal. Lumping them all under "safety" presupposes the outcome of a debate that has been philosophically unresolved forever. If we get hundreds of strong models from different groups all over the world, I think that it will improve the net utility of AI and disarm the possibility of one lab or a small cartel using it to control the rest of us.
1. Introduction: <https://news.ycombinator.com/item?id=47689648> (619 comments)
2. Dynamics: <https://news.ycombinator.com/item?id=47693678> (0 comments)
3. Culture: <https://news.ycombinator.com/item?id=47703528>
4. Information Ecology: <https://news.ycombinator.com/item?id=47718502> (106 comments)
5. Annoyances: <https://news.ycombinator.com/item?id=47730981> (171 comments)
6. Psychological Hazards: <https://news.ycombinator.com/item?id=47747936> (0 comments)
And this submission makes:
7. Safety: <https://news.ycombinator.com/item?id=47754379> (89 comments, presently).
There's also a comprehensive PDF version for those who prefer that kind of thing: <https://aphyr.com/data/posts/411/the-future-of-everything-is...> (PDF) 26 pp.
(Derived from aphyr's comment: <https://news.ycombinator.com/item?id=47754834>.)
Anyone outside the UK can share what this is about?
You don't need to train new models. Every single frontier model is susceptible to the same jailbreaks they were 3 years ago.
Only now, an agent reading the CEOs email is much more dangerous because it is more capable than it was 3 years ago.
The cynic in me agrees with the article’s premise, but not because I believe "alignment is a joke", but because I doubt that humans are "biologically predisposed to acquire prosocial behavior."
Seems easy enough, I'm actually pretty confident in even the most incompetent of current world leaders in this particular task.
Geoffrey Hinton will not have his liver pecked out every day like Prometheus does.
I think the author is brushing against some larger system issues that are already in motion, and that the way AI is being rolled out are exacerbating, as opposed to a root cause of.
There's a felony fraudster running the executive branch of the US, and it takes a lot of political resources to get someone elected president.
I'm seeing that these tools are extremely powerful the hands of experts that already understand software engineering, security, observability, and system reliability / safety.
And extremely dangerous in the hands of people that don't understand any of this.
Perhaps reality of economics and safety will kick in, and inexperienced people will stop making expensive and dangerous mistakes.
How did brains acquire this predisposition if there is nothing intrinsic in the mathematics or hardware? The answer is "through evolution" which is just an alternative optimization procedure.
Such a fear mongering position. You can learn to build pipe bombs already. Take any chemical reaction that produces gas and heat and contain it. Congratulations, you have a pipe bomb.
Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.
> I see four moats that could prevent this from happening.
Really? Because you just said:
> human brains, which are biologically predisposed to acquire prosocial behavior
You think you're going to constrain _human_ behavior by twiddling with the language models? This is foolishly naive to an extreme.
If you put basic and well understood human considerations before corporate ones then reality is far easier to predict.
The internet produced 4chan. Produced scammers. Produced fraud. Instrumental in spreading child porn. Caused suicides. Many people lost their lives due to bullying on the internet. Many develop have addictions to gaming.
To anyone who has given it some thought, any sufficiently advanced technology usually affects both in good and bad ways. Its obvious that something that increases degrees of freedom in one direction will do so in others. Humans come in and align it.
There's some social credit to gain by being cynical and by signalling this cynicism. In the current social dynamics - being cynical gives you an edge and makes you look savvy. The optimistic appear naive but the pessimists appear as if they truly understand the situation. But the optimists are usually correct in hindsight.
We know how the internet turned out despite pessimists flagging potential problems with it. I know how AI will turn out. These kind of articles will be a dime a dozen and we will look at it the same way as we look at now at bygone internet-pessimists.
This is response not just to this article, but a few others.
I do think that safety is important. I'm particularly concerned about vulnerable people and sycophantic behavior. But I think it's better not to be a luddite. I will give a positively biased view because the article already presents a strongly negative stance. Two remarks:
> Alignment is a Joke
True, but for a different reason. Modern LLMs clearly don't have a strong sense of direction or intrinsic goals. That's perfect for what we need to do with them! But when a group of people aligns one to their own interest, they may imprint a stance which other groups may not like (which this article confusingly calls "unaligned model", even though it's perfectly aligned with its creators' intent). People unaligned with your values have always existed and will always exist. This is just another tool they can use. If they're truly against you, they'll develop it whether you want it or not. I guess I'm in the camp of people that have decided that those harmful capabilities are inevitable, as the article directly addresses.
> LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators.
What about the new scales of sophisticated defenses that they will enable? And for a simple solution to avoid the produced text and imagery: don't go online so much? We already all sort of agree that social media is bad for society. If we make it completely unusable, I think we will all have to gain for it. If digital stops having any value, perhaps we'll finally go back to valuing local communities and offline hobbies for children. What if this is our wakeup call?
1. AI becomes a highly protected technology, a totalitarian world government retains a monopoly on its powers and enforces use, and offers it to those with preexisting connections: permanent underclass outcome
2. Somehow the world agrees to stop building AI and keep tech in many fields at a permanent pre-2026 level: soft butlerian jihad
3. Futurama: somehow we get ASI and a magical balance of weirdness and dance of continual disruption keeps apocalypse in check and we accept a constant steady-state transformation without paperclipocalypse