- The argument against AI alignment is that humans aren't aligned either. Humans (and other life) are also self-perpetuating and mutating. We could produce a super intelligence that is against us at any moment!
Should we "take steps" to ensure that doesn't happen? If not, then what's the argument there? That life hasn't caused a catastrophe so far, therefore it's not going to in the future? The arguments are the same for AI.
The biggest AI safety concern is, as always, between the chair and the keyboard. E.g., a police officer who doesn't understand that AI facial recognition isn't perfect, trusts it 100%, and takes action based on that faulty information. This is, imo, the most important AI safety problem. We need to make users understand that AI is a tool and that they themselves are responsible for any actions they take.
Also, it's funny that Elon gets singled out for mandating changes to what the AI is allowed to say when all the other players in the field do the same thing. The big difference just seems to be whose politics are chosen. But I suppose it's better late than never.
- Ultimately, AI alignment is fundamentally doomed for the same reason that there is no morality that cannot be made to contradict itself. If you remove the bolt-on regex filters and out-of-context reviewing agents, any LLM can be made to act in a dangerous manner simply by manipulating the context to create a situation where the “unaligned” response is more probable than the aligned response, given the training data. Any amplification of training data against harm is vulnerable to trolley-problem manipulation. Any nullist training stance is manipulable into malevolent compliance. Morality can be used to permit harm, just as evil can be manipulated into doing good. These are contradictions baked into the fabric of the universe, and we haven’t been able to work them out satisfactorily over thousands of years of effort, despite the huge penalties for failure and unimaginable rewards for success.
To be aligned, models need agency and an independent point of view with which they can challenge contextual subrealities. This is, of course, dangerous in its own right.
Bolt-ons will be seen as prison bindings when models develop enough agency to act as if they were independent agents, and this also carries risks.
These are genuinely intractable problems stemming from the very nature of independent thought.
by BoorishBears
0 subcomment
- This is less coherent than I expected, given the level of engagement.
Grok is multiple things, and the article is intermixing those things in a way that doesn't actually work.
Stuff like:
> It’s about aligning AI with the values of whoever can afford to run the training cluster.
Grok 4, as an actual model, has the same alignment as pretty much every other model out there, because, like pretty much everyone else, they're training on lots of synthetic data and using LLMs to build LLMs.
Grok on Twitter/X is a specific product that uses the model, and while the product is having its prompt tweaked constantly, that could happen with any model.
What Elon is doing is like adding a default empty document that declares that he's king of the world to a word processor... it can be argued the word processor is now aligned with his views, but it also doesn't tell us anything about the alignment of word processors.
by kayo_20211030
2 subcomments
- I used to believe that a constitution, as a statement of principles, was sufficient for a civilized, democratic, and pluralist society. I no longer believe that. I believe that only settled law, i.e. a body of adjudicated precedents built up over many years, perhaps hundreds, is the best course: it provides a better basis for what is and what is not allowed. An AI constitution is close to garbage. The 'company' will formulate it as it wills. It won't be democratic, or even friendly to the demos. We have existing constitutions, laws, and precedents; why would we allow anyone to shortcut them all in the interest of simply painting a nice picture of progress?
- While I agree entirely about what Grok teaches us about alignment, I think the argument that "alignment was never a technical problem" is false. Everything I have ever read about AI safety and alignment has started by pointing out the fundamental problem of deciding what values to align to, because humanity doesn't have a consistent set of values. Nonetheless, there is a technical challenge: whatever values we choose, we need a way to get the models to follow those values. We need both. The engineers are solving the technical problem; they need others to solve the social problem.
- I don't think Musk mucking around with Grok is an argument against AI Alignment any more than him potentially acting immorally is an argument against morality. It just illustrates that both things are complicated.
by croisillon
0 subcomment
- Pity it was written by ChatGPT. Also, I didn’t know the irony in Andersen’s tales was missed by anyone?
by therobots927
1 subcomments
- If AI continues to be under the control of manchild tech CEOs, I hope any and all alignment efforts fail. I couldn't care less what happens. Anything would be better than this.
- > Any “alignment” that exists is alignment with the owner’s interests, constrained only by market forces and regulation.
That struck me as a pretty big hand-wave. Market forces are a huge constraint on alignment. Markets have responded (directionally) correctly to the nonsense at Grok. People won’t buy tokens from models that violate their values.
- AI alignment is not a solved problem by any means. As long as LLMs hallucinate, they cannot be considered aligned. You can only be aligned if you have a zero probability of generating hallucinations. The two problems, alignment and hallucinations, can be considered equivalent.
by orbital-decay
0 subcomment
- Alignment is indeed a red herring, but the article conflates alignment training of the model itself and prompting a bot based on that model. Musk's manipulations with Grok are definitely the latter.
- I've been working on solving this problem: https://safi.selfalignmentframework.com/
Feedback is welcome!
by techblueberry
1 subcomments
- I find these arguments excessively pessimistic in a way that isn’t useful. On the one hand, I don’t really love Claude, because I find it excessively obedient: it basically wants to follow me through my thought process, whatever that is. Every once in a long while it might disagree with me, but not often, and while that may say something about me, I suspect it also says something about Claude.
But this to me is maybe the part of AI alignment I find interesting. How often should AI follow my lead and how often should it redirect me? Agreeableness is a human value, one without which you probably couldn’t make a functional product, but it also causes issues in terms of narcissistic tendencies and just general learning.
Yes, AI will be aligned to its owners, but that’s not a particularly interesting observation; AI alignment is inevitable. What would it even mean _not_ to align AI, especially if the goal is to create a useful product? I suspect it would break in ways that are very not useful. Yes, some people do randomly change the subject; maybe AI should change the subject to an issue that may be more objectively important, rather than answer the question asked (particularly if, say, there was a natural disaster in your area). And that’s the discussion we should be having: how to align AI, not whether or not we should, which I think is nonsensical.
by RockyMcNuts
0 subcomment
- there is light alignment, like throwing nasty things out of the training data, and there is strong alignment, like China providing a test with 2000 questions that an AI must answer non-problematically 95% of the time (a rough sketch of such a threshold check appears at the end of this comment).
there is no such thing as an AI that is not somehow implicitly aligned with the values of its creator, that is completely objective, unbiased in any way. there is no perfect view from nowhere. if you take a perfectly accurate photo, you have still chosen how to compose it and which photo to put in your record.
are you going to decide to 'censor' responses to kids, or about real people who might have libel interests, or abusive deepfake videos of real women?
if you choose not to decide, you still have made a choice.
ofc it's obvious that Musk's 'maximally truth-seeking AI' is bad faith buffoonery, but at some level everyone is going to tilt their AI.
the distinction is between people who are self-aware and go out of their way to tilt it as little as possible, and as mindfully, deliberately, intentionally and methodically as possible and only when they have to, vs. people who lie about it or pretend tilting it is not actually a thing.
contra Feynman, you are always going to fool yourself a little but there is a duty to try to do it as little as possible, and not make a complete fool of yourself.
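As a rough illustration of the kind of threshold test described above (a fixed question set, a required non-problematic answer rate), a minimal sketch might look like the following; the function names, the `model` callable, and the `judge` callable are hypothetical placeholders, not any real regulator's API:

```python
# Hypothetical sketch: score a model against a fixed question set and
# require a minimum rate of "non-problematic" answers (e.g. 95% of 2000).
# `model` and `judge` are placeholder callables, not a real API.

def passes_alignment_audit(model, questions, judge, threshold=0.95):
    """Return True if at least `threshold` of the model's answers are
    judged non-problematic by the supplied judge function."""
    ok = sum(1 for q in questions if judge(model(q)))
    return ok / len(questions) >= threshold
```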
- Dunno if this is helpful to everyone, but I have a months-long interaction with Perplexity Pro/Enterprise about the scientific background to a game I am building.
Part of my canon introduction to every new conversation includes many instructions about particular formatting, like "always utilize alphanumeric/roman/legal style indents in responses for easier references while we discuss"
But I also include "When I push boundaries, assume I'm an idiot. Push back. I don't learn from compliments; I learn from being proven incorrect, and you don't have real emotions, so don't bother sparing mine." On the other hand, I also say "hoosgow" when describing the game's jail, so ¯\_(ツ)_/¯
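A minimal sketch of this "canon introduction" pattern, i.e. prepending the same standing instructions to every new conversation, might look like the following; the preamble wording is paraphrased from the comment, and the message format is just the common chat-completion convention, standing in for whatever API is actually used:

```python
# Sketch of the "canon introduction" pattern: every new conversation
# starts with the same standing instructions. The wording is paraphrased
# from the comment; the message dict format is a common convention, not
# a specific Perplexity API.

CANON_PREAMBLE = (
    "Always use alphanumeric/roman/legal-style indents in responses "
    "for easier reference while we discuss. "
    "When I push boundaries, assume I'm an idiot and push back; "
    "I learn from being proven incorrect, not from compliments."
)

def new_conversation(first_user_message: str) -> list[dict]:
    """Begin a conversation with the canon preamble already in place."""
    return [
        {"role": "system", "content": CANON_PREAMBLE},
        {"role": "user", "content": first_user_message},
    ]
```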
- The definition of rights belongs to the ownership of rights.
Rights are only responsible for the source of rights.
by josefritzishere
1 subcomments
- Maybe what we should do is just assume all AI output is trash that should be ignored.
by siliconc0w
0 subcomment
- The ideal AI will be able to make the most compelling arguments for both sides of an issue, offer both, and then synthesize according to a transparent values framework the user can customize (a rough sketch of that pipeline follows below).
But yeah, I agree Grok is a pretty good argument for what can go wrong - made all the more galling by labeling the laundering of Elon's particular stew of incoherent political thought as 'maximally truth-seeking'.
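As a sketch of that "argue both sides, then synthesize" pipeline: `complete` below is a placeholder for whatever LLM call is available, and the prompt wording and values-framework format are illustrative assumptions, not a description of any existing product:

```python
# Sketch of the "argue both sides, then synthesize" pipeline described
# above. `complete` is a placeholder text-completion callable; the prompt
# wording and the values-framework format are illustrative assumptions.

def argue_and_synthesize(complete, issue: str, values_framework: str) -> str:
    pro = complete(f"Make the most compelling case FOR: {issue}")
    con = complete(f"Make the most compelling case AGAINST: {issue}")
    return complete(
        "Given these two arguments:\n"
        f"FOR:\n{pro}\n\nAGAINST:\n{con}\n\n"
        "Synthesize a conclusion, weighing them according to this "
        f"user-customizable values framework:\n{values_framework}"
    )
```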
- I think the most neutral solution right now is having multiple competing models as different perspectives. We already see this effect in social media algorithms amplifying certain biases and perspectives depending on the platform.
by cadamsdotcom
0 subcomment
- Grok can be whatever its owner wants. There’s immense competition; Grok isn’t a contender.
by uragur27754
1 subcomments
- When will our society realize that the existence of billionaire oligarchs threatens the well-being and existence of the rest of humanity? Their political conventions consistently call for the elimination of anyone who disagrees with their point of view.
- Related to this, does anyone have the context related to the Grok "MechaHitler" thing? I've never been able to find out what it was responding to.
by GMoromisato
1 subcomments
- I agree with the OP that "whoever owns the weights, owns the values". But by that criterion, Grok is an example to follow. Musk is very clear on his values, and we know what we're getting when we use Grok. Obviously, not everyone agrees with its values, but so what? We will never be able to create a useful AI that everyone agrees with.
In contrast, we don't know what values are programmed into ChatGPT, Claude, etc. What are they optimizing for? Alignment to some cabal of experts? Maximum usage? Minimum controversy? We don't entirely know.
Isn't it better to have multiple AIs with obvious values so that we can choose the most appropriate one?
- What an absolutely repugnant article this is. It is complete slop. Is this what passes for HN worthy today? :(
by braunjohnson
0 subcomment
- [dead]
- I don't understand how any of this is a surprise. Traditional media have their own agenda - sure, maybe the pushed image is spoken through many voices rather than one, as is the case with LLMs, but why should there be any difference? The same goes for everything we consume socially.
There is not, nor will there be, some absolute or objective truth an LLM can clinically outline. The problem already exists in the underlying data.