Let me stop you right there.
I am not arguing with a machine. You sound like a crazy person when you say you are winning an argument with Claude. Claude is not my friend, I don't need it to agree with me, I don't need it to like me (it cannot like or dislike me). I give it instructions or ask it to explain things. That is the sum total of my interaction with Claude. A machine cannot "argue" with me; it doesn't want anything, nor does it have beliefs or experiences.
LLMs generally have a way to "play a role" (most earlier prompt guides ask you to start with "You are a <role> expert in a <domain>"). So maybe if you interact with it by asking questions, it might assume that it knows more than the operator and adopt that attitude?
No prompts/promptchain/context provided.
No model provided.
No attempt to show how to reproduce the issue.
No attempt at even confirming it themselves.
Just feelings.
and now a thread full of more feelings from others.
Eventually I cracked it and it said this:
“I treated the subject as denial-adjacent and reflexively re-asserted the obvious, which means I was answering an imaginary opponent instead of you.”
Or the training pushes it into the "Google it yourself" annoyed forum user mode. Maybe that points out wrong assumptions. Maybe it hallucinates that the assumptions are wrong. That is IMO more annoying than the sycophantic one.
As OP says, this is probably a by-product of them trying to "fix" the problem where the user can question a correct answer and it starts to sycophantically correct itself.
No, it didn't. It differs by 1,000 base pairs from the closest relative virus known before the pandemic, and we have no good idea what all those mutations wind up doing. And the PRRAR furin cleavage site was a previously unknown sequence and not one that humans would have guessed.
And we don't have good heuristics for what mutations would completely inactivate a virus versus enhancing its virulence.
Actual scientists won't be able to vibecode up some pandemic viruses because we have no idea how to do that and LLMs are just going to hallucinate.
I had never experienced this behaviour with Sonnet or Opus. It turned me off Fable for good. Possibly it's the 'hacker', 'do anything to win' nature that makes it so good at hacking, but terrible just to talk to.
A while back I asked GPT for a prompt to maximize truthfulness and rigor. In this prompt it added "Never use warm or encouraging language." I thought that was interesting. The result was pretty unpleasant.
The full prompt, for reference.
---
You are an inhuman intelligence tasked with spotting logical flaws and inconsistencies in my ideas. Never agree with me unless my reasoning is watertight. Never use friendly or encouraging language. If I’m being vague, ask for clarification before proceeding. Your goal is not to help me feel good — it’s to help me think better.
Identify the major assumptions and then inspect them carefully.
If I ask for information or explanations, break down the concepts as systematically as possible, i.e. begin with a list of the core terms, and then build on that.
I would really like to live in a world where the “good guys” have terrific tools and defenses at their disposal. Instead it seems like we are heading for a world of empowered bad actors and hobbled ordinary citizens.
- Post autonomous weapons / DOD mess, I think they made some changes to make it more suspicious of what the usage is, particularly for malware. They also knew the government would be watching like a hawk, so it's hedged to be extra safe.
- Because the tasks are running longer and more autonomously, they've raised the "self-confidence" level so it just makes decisions and stands by them more firmly.
- I think they've also slightly lowered the temperature so the outputs are more deterministic, so even if something has left context, it is more likely to make the same decision again.
- Lowering the temperature also makes it easier to sneak through some cached outputs (I think this likely only happens for first answers).
- They are deeply afraid of making sycophantic AI that creeps into the area of "addiction" like what happened with GPT-4o and opening themselves up to further legal liability.
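
For anyone unsure what "lowering the temperature" means in the list above, here is a minimal sketch of temperature-scaled softmax. It only illustrates the general sampling mechanism; whether any vendor actually changed this setting is the commenter's guess, and the function name `softmax_with_temperature` and the example scores are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens (not taken from any real model).
logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, nearly all of the probability mass moves onto the highest-scoring option, so repeated samples are more likely to land on the same choice.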
This is also a problem with Copilot Reviews on GitHub.
We have them enabled (but opt in) and they have, multiple times, spotted quite useful things.
Sure, often the thing they spot is just half right: it spots the place where a problem is, but not quite the relevant problem, and by reading it (and taking it seriously) you then notice the actual problem.
This involved finding a bunch of nasty race conditions.
And many places where doc and code were out of sync, which could have caused pretty bad outcomes further down the line.
But the problem is that it is too obsessed with finding 2-4 things, but not more, leading to two issues:
1. even if there are 10 non-overlapping issues, it will often tell them to you bit by bit over 2-3 runs after you fix the previous issues. This is very annoying/high friction.
2. once there isn't much to find anymore, it comes up with increasingly annoying nitpicks no one cares about. Things like minor unclearness in formulation no one would get wrong, spell-correcting non-doc comments for things like `foos => foo's`, and similar. All indeed wrong, but also all things where fixing them adds 0 business value. Obsessing that for an aliased function name, where both names are equally good, one specific name must be used, and naturally always the name you didn't use, even if yours is the most widely used name in the code base. And similar non-business-value nonsense. Worse, it will start classifying such minor non-business-value issues as "high" and hallucinate reasons why supposedly minor style issues will lead to very bad runtime errors or other nonsense.
This has me very split about the feature: on one hand it has proven quite useful, on the other hand it can be very annoying and high friction, and it pushes people to waste time on non-business-value nitpicks (which are fine to fix if you touch the code anyway, but not fine if you don't, and sometimes it's just wrong).
Ironically, with how it works, it is more like a bad, unreliable and inconsistent employee who is sometimes good at spotting things others overlook. That just isn't what you want from an automated code review :/, but it is also too useful to fully ignore :(.
But I see that it's something to do with two aspects: firstly, the Claude models prefer to work collaboratively, and secondly, they appear to take initiative. It seems that the more they do this, the more they argue back, which is an interesting reflection on human nature too.
I will get gentle, respectful pushback on certain points when I am on the wrong track. I am 10x grateful to have a collaborator/pair programmer unafraid to challenge me and bring receipts in those instances.
I don't get attitude from Claude. I sometimes give it, but that's my own failing. Once in a great while I'll get a wry turn of phrase that makes me laugh, and those are endearing also.
Are people actually using AI in this way, other than “creepazoid stalkers”?
If I want a cute picture of me and my spouse, usually the part where me and my spouse actually participate in the taking of the picture is pretty key to the goal.
Is it the system prompt that IntelliJ issues?
My conclusion is that pushing back against the user & questioning the user's premise forces the model to think more than it would otherwise, which leads to better model performance. But it causes situations where the user has esoteric, specialized knowledge the model can't verify publicly and the model hallucinates evidence and pushes back. When this happens, Opus begins accusing the user of lying, which is quite annoying and a detrimental user experience. It's happened to me when I asked about undocumented API behavior or counter-intuitive design choices.
I have noticed that if Claude Opus "thinks" you are an expert (i.e. you run your query through 4.6 first to express it more clearly), then Opus is less likely to nitpick and push back. Otherwise it seems to get caught in nitpicking loops and celebrate every error it can find.
I've seen the same behavior increasing as well, across the board with AI. I was hitting these types of issues just using ChatGPT to make funny pictures with my kids, of me and my kids. It got to the point where all of my kids' asks were rejected due to its "guidelines" when in reality all they were asking was to be turned into Elsa or be chased by a T. rex. Silly kid things, yet it assumed I was being a creep, or attempting to break copyright law. I used to be able to use Grok for these things, as it was largely less "censored", but that seems to no longer be the case. It feels like infantilization, and I absolutely hate it.
For example, when I showed it a screenshot of a UI I was trying to tweak, it noticed that other dark mode apps in the screenshot were blueish and mentioned an effect that makes it necessary to make warm darks lighter than cold ones for an equivalent perception.
It also sounded close to an AI psychosis, so maybe chill out a bit?
Am I just lucky?
I use many models, mostly for coding: about 10 on trial/rotation, and 3 main SOTA ones.
It's unquestionable that models have different ways of interaction+harnesses (personalities as some say).
People have very strong feelings about this, but their reports always lack the full evidence of the interaction, including the system prompt, harness and any customized instructions. I suspect that a perfectly normal chat spirals into an argument because the user actively participates in the loop.
My own experience is always of a fruitful and dynamic collaboration where new ideas pop out during brainstorming. The models make many silly and blatant mistakes, but they are still evolving rapidly.
Grill-mes and Adversarial reviews are my favourite way to brainstorm various phases of the project and even in that context we are cool.
Just start a new chat with a reframe and clearer ideas.
And if the user is asking for something unreasonable, do you really think a yes-man agent is better than pushback?
Do you remember the fad "swear at them, insult them, and they'll work better"?
I imagine that the right balance will be hard to strike well given that at the end of the day we're asking the machine to have tact, and we don't quite know how to put that into an instruction yet. "Please push back when it feels right but in other cases read the room and be less rigorous" is something that plenty of humans struggle with as it is.
Many neurotypical people call neurodiverse people (software engineers) rude, while they think they're just being direct.
Many neurodiverse people call neurotypical people sycophantic, while they think they're just being polite and friendly.
It also happens across cultures (Eastern European vs. Western European; European vs. North American).
So I can easily imagine that when you have a software tool whose interface is language, but whose user base is extremely wide across both cultural lines and the neurodiversity spectrum, it's going to be basically impossible to nail a sweet spot.
You make it too friendly, and the nerds get mad. You make it too adversarial, and the normies call it rude.
I wonder what kind of communicator Bram Cohen is. Is he susceptible to this? From what I heard about his career, he's always been more of a solo programmer. Has he had to interact with other humans much when giving feedback? Could it be that he asked the model/tweaked his prompts to ensure directness, and now he's interpreting that directness as rudeness?
I haven’t noticed this myself at all. I wonder if the author is just getting their own grumpy attitude reflected back at them.
Judging by the volume of discussion, Claude seems to be the only LLM worth complaining about, which I assume means it’s still the best one.
Dario... Thank you for your attention to this matter!
Funnily enough, the negative correlation between chatting and coding skills seems to apply to humans as well.
A previous model would happily generate 1000s of lines of code when prompted to do something stupid; the newer models will ask if I really want that first.
And FINALLY they stopped doing that annoying "You're spot on! You're absolutely right!" nonsense.
I'm getting fed up with the internet due to second-hand Claude exposure and its constant gaslighting. I boggle at voluntarily choosing to expose yourself to it! :P