by Jcampuzano2
11 subcomments
- I'm struggling to understand why I'd ever use this instead of just using a lower effort level for Opus, given that on many of the listed benchmarks the cost per task rises above Opus at anything higher than medium effort.
Only thing I can think of is for when someone is out of opus credits. Of course there are API billing use cases but I'd probably still just use opus on low.
by conradkay
3 subcomments
- Wow, seems worse even on price/performance than GLM 5.2, which is only 744b parameters.
From the system card: "On CyberGym vulnerability discovery, Claude Sonnet 5 is less capable than Sonnet 4.6, and far less capable than Opus 4.8 and Mythos 5.
As with the other evaluations in this section, these results were achieved with all safeguards turned off. When run with our default mitigations, Sonnet 5 scored a 0 on CyberGym"
by microtonal
21 subcomments
- Claude Sonnet 5 is built to be the most agentic Sonnet model yet. It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.
I have been using Sonnet 4.6 more than Opus, because I'm mostly doing agent-assisted development and not fully agent-driven development. This announcement does not make me positive. I have found that the more models are optimized for fully agentic development, the worse they get at assisted development; they often start doing too much despite very strict/specific instructions.
I have been moving more and more to K2.7 Code and GLM-5.2 the last few weeks. They are often good enough for assistance, very fast, and cheap.
- I just tested it on my benchmarks[0], it's GLM-5.2 level, at 2x cost, but also 2x faster.
Weak spots (categories it fails):
- Trivia — 0/3 - basically not much built-in knowledge
- Combined tool-calling tasks — score 45/100, sometimes makes invalid tool calls
- Puzzle Solving — score 77, flubs carwash-like tests
[0]: https://aibenchy.com/compare/anthropic-claude-sonnet-4-6-med...
- Claude Sonnet 5 itself described its pelican as looking like a goose:
> Illustration of a white goose riding a bicycle, with one wing extended forward to grip the handlebar, set against a plain white background with a brown ground line.
https://simonwillison.net/2026/Jun/30/claude-sonnet-5/
- Wonder if the whole cyber paranoia leads to their models ultimately generating less secure code. After all, if it has the ability to generate safe code, it would imply that it knows something about cybersecurity, which could surely be used to hack all the banks in the world.
- Important to note: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer that changes how the model processes text to improve performance (this is similar to the tokenizer change we introduced with Claude Opus 4.7). The tradeoff is that the same input can map to more tokens: roughly 1.0–1.35× depending on the content type. The introductory pricing is set so that the transition to Sonnet 5 is roughly cost-neutral."
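A rough way to sanity-check the "cost-neutral" claim: the cost of the same text scales with the per-token price times the token multiplier. The sketch below is illustrative only. The $3 and $2 per-million input prices are the full and launch prices quoted elsewhere in this thread; that Sonnet 4.6 was also billed at $3 is an assumption, and `relative_cost` is a hypothetical helper, not anything from Anthropic.

```python
def relative_cost(old_price: float, new_price: float, token_multiplier: float) -> float:
    """Cost of the same text on the new model, relative to the old model."""
    return (new_price / old_price) * token_multiplier


OLD_PRICE = 3.0      # assumed Sonnet 4.6 input price, $ per million tokens
LAUNCH_PRICE = 2.0   # launch price quoted in this thread
FULL_PRICE = 3.0     # full price quoted in this thread

# 1.0 and 1.35 are the two ends of the multiplier range in the quote above.
for multiplier in (1.0, 1.35):
    launch = relative_cost(OLD_PRICE, LAUNCH_PRICE, multiplier)
    full = relative_cost(OLD_PRICE, FULL_PRICE, multiplier)
    print(f"multiplier {multiplier:.2f}: launch {launch:.2f}x, full price {full:.2f}x")
```

Under these assumptions, launch pricing comes out at roughly 0.67x to 0.90x of the old cost, while full pricing comes out at 1.0x to 1.35x.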
by phillipcarter
3 subcomments
- Seems to be another great incremental update to the workhorse, nice!
I've been using Sonnet instead of Opus for almost all coding tasks for a while now. A little elbow grease to break down tasks and you can spend a lot less money for just about the same output quality.
- Edit June 30, 2026: In the original version of this post, we included a cost-performance chart for the BrowseComp evaluation that was based on data from a simpler methodology that did not reflect the standard methodology we use for agentic search evaluations. This had the result of underestimating Sonnet 5's performance on the evaluation.
They changed the Sonnet 5 'Agentic search' benchmark graph overnight
by doctoboggan
20 subcomments
- The cost per task chart is telling me that I should _never_ use Sonnet 5 above medium effort level - Opus always performs better for a given cost. So I guess the takeaway is that if Sonnet 5 medium isn't good enough for you, switch models, not effort levels.
- Anthropic outsmarted everyone again.
They released Sonnet 5 with a temporary price reduction until August. Everyone was excited, but in reality, the new tokenizer produces about 50% more tokens. As a result, the actual cost went up by 50% while they shifted everyone's attention to the price decrease.
Thus, Anthropic is raising prices but not telling anyone about it. Nobody is really aware of it. You go to the pricing page, the price looks the same. Yet people are actually paying 50% more.
Very shady marketing.
And of course they lie about 35% again. In reality with coding it is 50%.
UPD: I run playcode.io, so it’s my job to test all models, their pricing, and their quality in order to provide the best price/quality/speed/reliability to non-technical users.
by satvikpendem
20 subcomments
- > Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.
Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.
And Opus 4.8 is still cheaper for a higher pass rate (not to mention open-weight models like GLM 5.2), so I'm not sure why I'd use Sonnet except on the low effort level for, I suppose, trivial tasks where I want it to work only 50% of the time, judging by the graph. The pricing doesn't really make any sense.
- Judging from those cost-performance graphs, Sonnet doesn't make sense to run at anything higher than a medium reasoning level, since Opus 4.8 low reasoning outclasses it for the price.
This line as a selling point is also pretty funny:
> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.
- I didn't think they'd actually release a model that was worse than the open-weight frontier and at a higher price-point. Wow.
by theLiminator
3 subcomments
- Seems like the way to go for any smaller models is to only use the low reasoning levels, and for anything where you'd want it to reason harder, to just use a larger model.
In effect, high reasoning only makes sense when you're using the frontier model and need extra performance (higher levels of reasoning are never pareto optimal unless you're at the largest model size).
- > Me: What was the sushi place near latitude 41 in Columbus? Did it go out of business I don't see it on Google maps anymore
> Sonnet 5 (medium): None of these past chats mention a sushi place — I don't have anything on record about that. Do you remember the name, or roughly which part of Columbus (neighborhood/street) it was near? That'll help me search and check its current status.
Not impressed. It got the name right on high effort one shot but hallucinated the date relativity (Jan 2026 is not last month...). Worked okay on extra. Sonnet 4.6 worked fine on medium, high, and extra one shot.
Edit: as the replies point out, the prompt is definitely ambiguous; however, Sonnet 5 didn't even extract the semantic meaning "looking for <place> near <compound place>", which all the human replies seem to understand. Even Haiku 4.5 identifies the semantic meaning, although it fails to retrieve the correct results.
> Haiku 4.5 (reasoning off) I'll help you find information about that sushi place. Let me search for sushi restaurants that were near latitude 41 in Columbus and check their current status.
>_Searched the web_
> I see that "Latitude 41" is a restaurant in Columbus, but it's actually a modern American restaurant, not specifically a sushi place. However, based on your mention of latitude 41, that's what came up. Let me search more specifically for sushi restaurants that may have closed in Columbus around that area.
- When can we get a new Haiku? 4.5 came out nearly a year ago, and it's showing its age.
by brunooliv
1 subcomments
- I only wish for Opus 4.6 from earlier this year at a faster inference speed.
Since Opus 4.6 things have been so much messier and the overall push for more agency isn’t really panning out for agent assisted development as much as they would like
by ashvardanian
1 subcomments
- Got really excited for this model and asked my Opus planners in 3 pretty different projects to use Sonnets instead of Opus subagents to help me experiment on HPC kernels faster. Not one of them ended up writing a single line of code... Sonnets just kept spinning, wasting tokens. Can't remember the last time it happened with Opus in my codebases. Reverting back.
by phtrivier
3 subcomments
- What is the reference, unbiased, honest, reputable and trustworthy site that ranks and compares models on the couple of realistic metrics that matter? ("Does it work for code", "no, I mean, for real", "how much does it cost", etc.)
- Seems like the cyber detection is even on Sonnet now. https://support.claude.com/en/articles/14604842-real-time-cy...
by SkitterKherpi
1 subcomments
- $5/$25 for Opus 4.8 vs $3/$15 doesn't seem cheap enough to be worth it. It depends how much better it is than e.g. Mimo, but I imagine Mimo and co. to be too cost-efficient in the lower tier to be overtaken by Sonnet for most tasks.
by theHocineSaad
0 subcomment
- What's interesting is that Claude Sonnet 5 costs more per task ($2.29) than Opus 4.8 ($1.80), while the latter is obviously better!
It actually costs more per task than every other model. It's only cheaper than Claude Fable 5.
Source: https://artificialanalysis.ai/?cost=cost-per-task#price-and-..., as of writing this comment (the results are frequently changing)
- This is much more interesting of a model at $2/$10 (their launch pricing) than at full price. There are many competing models at around this level of performance.
I also like that the difference between low, medium, high, xhigh seems more spread, which is actually a good thing for people trying to tune applications. Running Sonnet 5 on low with the launch pricing makes this potentially a better fit than Haiku or open source models for some tasks. I don't think it will make sense at full price.
- Ironically, the key message of today's release is that Sonnet 5 is far less capable than Opus 4.8 and Mythos 5. It's a funny development in the past few weeks.
- That’s nice, but we want Fable
by DonsDiscountGas
1 subcomments
- I'd love if they would include speed (though I know there are difficulties involved). At this point the quality of Opus 4.8 is no longer my limiting factor, it's the speed, so a faster model would be great.
by chipgap98
2 subcomments
- Interesting that tasks on extra high cost almost the same as Opus 4.8 with a slightly worse performance
by sreekanth850
0 subcomment
- After using codex i will never return to cc even if they offer it for free.
- Tbh we'll see what using it looks like, but the reasoning/cost charts do not look promising. It seems like the only useful reasoning level for Sonnet 5 is Low; medium might trade blows at price/performance with Opus, but anything beyond that Opus is Just Better.
I struggle to understand where this model fits in. If I need a cheap model for simple stuff (like, summarizing an email); I'd go Haiku (actually, I'd go Deepseek v4 Flash, but you catch my drift). I just can't think of many tasks where I'm like "yeah let me reach for Sonnet Low Reasoning so I can save a dollar but also seriously run the risk of it failing"; I'd just reach for Opus Low.
by johnhamlin
0 subcomment
- Kind of hilarious how much they’re touting that it sucks at cybersecurity like it’s a feature
- In our coding evaluations, we found Sonnet 5 is more capable than Sonnet 4.6 (which was an underrated model itself), but is now faster and slightly cheaper.
Sonnet 5's performance is comparable to GLM 5.2 in both one-shot coding and agentic ability. However, it's about 20% less verbose than GLM 5.2 in average code submission sizes, and uses fewer reasoning tokens, which reduces the cost gap and suggests it writes cleaner code. In practice, Sonnet 5 ends up being 40% more expensive and ~2x faster than GLM 5.2 in our evaluations (not 300% more expensive as the per-token pricing would suggest). Granted, GLM 5.2 is an extremely reasoning-heavy model.
Overall, it's a solid release that gives Anthropic some standing in the price-conscious inference market.
Data at https://gertlabs.com/rankings
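A back-of-the-envelope reading of those two numbers, as a sketch only: if cost per task is roughly billed tokens times per-token price, then dividing the observed cost ratio by the price ratio gives the implied ratio of billed tokens. This collapses input and output pricing into one blended price, which is a simplification, and `implied_token_ratio` is a hypothetical helper name.

```python
def implied_token_ratio(cost_ratio: float, price_ratio: float) -> float:
    """Billed-token ratio implied by cost = tokens * price (single blended price)."""
    return cost_ratio / price_ratio


PRICE_RATIO = 4.0  # "300% more expensive" per token means 4x the price
COST_RATIO = 1.4   # "40% more expensive" per task in the evaluations above

ratio = implied_token_ratio(COST_RATIO, PRICE_RATIO)
print(f"implied billed tokens vs GLM 5.2: {ratio:.2f}x")
```

Under that simplification, Sonnet 5 would be billing roughly a third of the tokens GLM 5.2 does per task, which is consistent with the point that GLM 5.2 is reasoning heavy.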
- System Card: https://www-cdn.anthropic.com/d9bb04416ffe1352af84721476c1fa...
- 5 as in 5 times more likely to tell you that you can't edit your driver INF files because that enables DRM circumvention and is dangerous!
by stavarotti
0 subcomment
- I’ll continue to use the last great reasonably affordable duo from Anthropic: Opus 4.6 for planning and Sonnet 4.6 for implementation.
by taspeotis
1 subcomments
- > Claude Opus 4.7 and later Opus models, Claude Fable 5, Claude Mythos 5, Claude Mythos Preview, and Claude Sonnet 5 use a newer tokenizer that contributes to their improved performance on a wide range of tasks. This tokenizer produces approximately 30% more tokens for the same text. Claude Sonnet 4.6 and earlier models use the previous tokenizer.
- I accidentally used Sonnet 5 a bit today. It seemed significantly worse to me than Opus 4.8 for software development.
- Until now we've been using Sonnet 4 to power an editing agent in ApostropheCMS. Sonnet is a good price/quality/speed compromise, but sometimes when giving it a large set of instructions it would miss half of them. At least until we told it to go back and try again.
In my early tests tonight, Sonnet 5 is a LOT better out of the box. It's one-shotting complex instructions. It also recovered independently from bad instructions that led to an uninformative 400 error by using its schema-fetching tool to figure out there was too much input.
If I have to gripe about something: it interpreted another impossible instruction by quietly discarding the input in question. But, the way it did it is... kinda exactly what anybody else would do, if they weren't in a position to change the implementation.
This is, obviously, early days but I'm impressed.
by Alien1Being
0 subcomment
- Only if you have no problem with their extremely harmful political lobbying.
by epsteingpt
0 subcomment
- If only the agentic model supported the most popular agents like Hermes and OpenClaw...
- Opus 4.8 beats Sonnet 5 on the pareto frontier in several of their graphs (Agentic Search, Agentic Computer Use).
In other words, for certain tasks, Opus 4.8 is cheaper than Sonnet 5, and does better than Sonnet 5.
I've noticed this pattern on a lot of benchmarks. You can try to emulate a bigger model by ramping up the test time compute (max reasoning, more turns, model fusion etc.), but you can't reach the same quality level, and you often exceed the cost you would have paid by just using a bigger model.
tldr: if you're doing something hard, just use a bigger model.
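For anyone who wants to check this kind of claim against their own numbers, here is a minimal sketch of a cost/score Pareto frontier. The data points are made up for illustration and are not taken from Anthropic's charts; the labels and the `pareto_frontier` helper are hypothetical.

```python
def pareto_frontier(points):
    """Return labels of points not dominated on (lower cost, higher score).

    points: iterable of (label, cost, score) tuples.
    """
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on equal cost, higher score first.
    for label, cost, score in sorted(points, key=lambda p: (p[1], -p[2])):
        # A point survives only if it beats everything at least as cheap.
        if score > best_score:
            frontier.append(label)
            best_score = score
    return frontier


# Illustrative numbers only: (label, cost per task in $, score in %).
runs = [
    ("small-low", 0.5, 50),
    ("small-high", 2.0, 60),
    ("large-low", 1.5, 65),
    ("large-high", 4.0, 80),
]

print(pareto_frontier(runs))
```

With these made-up numbers, `small-high` drops off the frontier because `large-low` is both cheaper and better, which is the pattern described above.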
- Claude Sonnet 5 is built to be the most agentic Sonnet model yet.
or
The Dodge Charger is built to be the most Charger like car yet.
by richardfey
1 subcomments
- I don't know what I am doing right, or wrong, but I have access to claude and codex and I find myself giving the more serious work to codex recently. I tend to trust it more.
I might try Fable again when it's back, but this Sonnet 5 didn't work well for my current projects.
by kingjimmy
1 subcomments
- interesting footnotes: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer... can map to more tokens: roughly 1.0–1.35× depending on the content type." AKA expect higher costs on Sonnet 5 vs Sonnet 4.6 for the same tasks.
by theplumber
2 subcomments
- Is there any reason to use Sonnet instead of GLM?
- Claude is a great model for me, but unfortunately, its quota is often insufficient. It seems that many people are now considering Codex as an alternative. If the quota is sufficient, I believe many people will continue to use the Claude Code model.
by Escapade5160
0 subcomment
- At that price you should just use glm-5.2. You get an Opus class model for 1/3 the cost.
- I tried Sonnet 5 and burned the entire 5h quota on a single deep research run. This has never happened with Opus before.
by terekhindc
0 subcomment
- cost per task > opus-low is a weird place to land. is there a specific task shape where sonnet 5 medium actually wins?
- interesting how much worse the sentiment around Anthropic is getting
by baalimago
1 subcomments
- Not looking great for an upcoming IPO
by docheinestages
1 subcomments
- But does it burn tokens just like Opus? That's the feeling I have nowadays. Regardless of what model I choose, the 5-hour limit gets exhausted in the first hour or so.
- What I'm starting to hate is that each model's effort level can mean completely different power.
Today sonnet 5's med level effort is equivalent to sonnet 4.6 low level effort :/
by benjiro29
2 subcomments
- Anybody notice that they did not include Sonnet 5 Max in the "Agentic Search results", when comparing to Opus 4.8 ...
Based upon the "Agentic Computer usage", Sonnet 5 Max was going to be off "Agentic Search results" chart. lol ...
In short, Sonnet 5 Low/Medium is more cost-efficient if it's a task below Opus 4.8 Medium. For the rest, it's expensive and you're better off using Opus 4.8.
Why even release this model?
- Why is Claude Sonnet 5 allowed to be released but OpenAI Terra not? Are they not the same class of models?
- Not sure what niche it's going to occupy: too expensive for its intelligence category.
- Fun/interesting to see how opensource models surpassed Anthropic's
- The jump in reasoning quality is noticeable. What's interesting is how it handles ambiguous instructions now — it seems to ask fewer clarifying questions and just makes a reasonable judgment call. That's a double-edged sword depending on your use case.
- Why did this get the coveted "5"? I want an Opus that can compete with GPT 5.5
- Sonnet 5 is not currently available in the EU region on Bedrock, whereas previous models were and still are. I wonder if this is only due to early stages of the rollout or if this is due to recent US restrictions.
Unfortunately that means I won't be using it at work for now.
- Sonnet seems to be really expensive
- The "cheaper models" from big AI companies are next to useless, as they don't even score as well as the open/super cheap Chinese models. Only the frontier big models like Fable and Opus have value.
- It does not pass the "I want to wash my car, should I drive or walk"
- I believe that’s gonna be meta for agentic coding this year for enterprises. Cost optimized models approaching SOTA capabilities on software engineering but without cybersec training.
- Anthropic's run on the model and product side of things is highly impressive. They got Sam A. punching the air consistently, which is well-deserved and self-inflicted above all.
- In the 4.x era, I prefer Sonnet to Opus. The quality of Sonnet generation is good enough for me, but it's much faster than Opus.
- > the computer use evaluation OSWorld-Verified. Sonnet 5 (orange line) is a strict improvement over Sonnet 4.6
cool to see, still waiting for models to get better at computer use.
- Let’s see how long until opus 5 comes out but to me this lends some credence to the rumour that fable/mythos was supposed to be opus 5
- > Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.
It seems being incompetent is a feature now...
by primaprashant
0 subcomment
- Based on both performance vs price charts, it seems using Opus 4.8 with med effort is almost a better choice than using Sonnet 5 at xhigh effort
by jerrygoyal
0 subcomment
- It's actually a huge update for building products, given most tasks are sub-agent driven where Sonnet is used, steered by Opus.
by OsrsNeedsf2P
0 subcomment
- Great timing. I just started using Claude Sonnet for a long-term reverse engineering project[0] for a game I used to play as a kid. Cheaper tokens plus a sufficiently smart model with hard verification make it a perfect combo for the task.
[0] https://github.com/dginovker/BFME-Source-Code/
- idk, i think they just tried to compensate for the ban of fable, nothing too good
- Costs are very opaque from within the product...
- In my case, 4.6 degraded massively over time. 5 fails the same basic tasks that I gave 4.6 yesterday. And quite frankly this low, med, high, extra, max, turbo, ultra, ludicrous nonsense is getting tiresome
- American AI company status: We are now bragging about how bad our models are unironically.
Okay.
by Scroll_Swe
0 subcomment
- I don't pay so I'm glad for the upgrade. I usually use Gemini, Mistral Le Chat (Vibe...) or Deepseek as they have way more generous free limits and I can basically spam forever.
by smallerfish
0 subcomment
- Ah that's why Opus has been so slow for the last couple of days.
- Important to note that the cost graphs are heavily distorted. The agentic search one, for example, is divided into 3 'columns': $0-$2, $2-$5 and $5-$10.
And yet, the $2-$5 section is the widest, even though it only contains a single point.
I can't even say if this is making the product look better or not, but it sure is weird. Maybe Claude just hallucinated those splits xD
- It looks good. Now waiting for Opus 5.
- It's not Fable, but I'll take it.
by tensegrist
1 subcomments
- there was a vibecoded prediction market–style page that was put up yesterday (?) that got the date exactly right i think
by matheusmoreira
1 subcomments
- Who cares about Sonnet? I want to know about Fable. Are the export restrictions really going to be permanent?
- Roughly on par with GLM 5.2 at 5x the price
- So many things to think about regarding these "benchmarks":
- Do the ever-increasing scores mean we will soon have models that approach 100%? And what would that even mean? That there is no more room for improvement?
- Would Anthropic (or any other model vendor for that matter) ever release a newer model that scores lower? If not, does that mean they keep tweaking a new model they want to release until it shows an improvement over the prior model?
- Would it be more useful to move toward a comparative rather than absolute ranking?
by ai_fry_ur_brain
0 subcomment
- Finally a model release where everyone is realising the scam. The world is healing (maybe).
by micromacrofoot
1 subcomments
- So they repackaged Fable and added "don't scare the government" to the prompt
by docheinestages
1 subcomments
- Is it just me or is there a huge difference between how much one can accomplish in a 5-hour window with GPT 5.5 on xhigh versus any Claude model?
by neonstatic
0 subcomment
- I appreciate they added thinking. Sonnet used to think in the actual response, leading to a lot of unnecessary burden for me. "This thing is X, no wait, it's actually Y. Therefore..." - now it's hidden in the thinking trail, so I don't have to read it unless I want to.
by PeterStuer
0 subcomment
- Anyone else feel like Opus 4.8 got significantly dumber over the last 2 weeks?
by Foobar8568
0 subcomment
- And Anthropic put that shit model as default, after a single prompt I was wondering what was the shit it was spouting, and yes, Sonnet 5.
- I'd rather upgrade myself to a more effective version, thanks. in part because I have a monopoly in the market on providing Me
- Too expensive?
- Have they ever said what the difference is between Sonnet and Opus? Are they trained differently? Different architectures? Is Sonnet a distillation? Is it just that Sonnet has less resources for inference?
None of the other labs are doing this kind of long lived two model series.
by artursapek
0 subcomment
- I run a proofreading benchmark that tests how well models can find and fix errors in English text. They get several passes in a simple agent loop. Sonnet 5 is definitely better than Sonnet 4.6, but inferior on both quality and cost to GLM 5.1, GLM 5.2, Gemini 3.1 Flash, and Gemini 3.1 Pro. https://revise.io/errata-bench
- Is this the default model for non-paying users? If so, that could be an interesting move in the competition for this segment.
by ekjhgkejhgk
0 subcomment
- In effective terms they're lowering prices.
- opus is better
by impodimium
0 subcomment
- Eh still looks like it is weaker than Opus 4.8 but maybe a good replacement for Sonnet 4.x
- I feel like this is a bit of a disappointment. Sonnet 4 was a clear step above Opus 3.x, while this is a lot muddier.
by ClaudioCronin
0 subcomment
- nice!
by andrewchambers
0 subcomment
- The whole fable fiasco really soured me on Anthropic. This just looks disappointing by comparison.
by mesmertech
0 subcomment
- Ok, that's a one-month clock to the next Opus model at least, so that's a silver lining to a meh model.
- What is the point if it is one Trump's brain fart away from being blocked?
by botfriendsarent
0 subcomment
- Sonnet 5 OUCH! every model is just loaded with more hurt, stolen content, BS prompts, more scare tactics, more illusions, more government lobbying, less honesty.
Oh Claude you master of software engineering does it ever end?
DO you have no bounds?
How may we further assist you oh Claude?
by stackedinserter
1 subcomments
- "Our new model is proudly dumber now!"
- should have called it 4.9, it doesn't deserve the 5 moniker
by Getchowned
0 subcomment
- Fable soon please.
by kvetching
2 subcomments
- GLM 5.2 is better and cheaper. Maybe they are trying to embarrass Trump by making it look like we are losing to China.
by Madmallard
0 subcomment
- Claude thread top of HN
loads of trust me bro benchmarks
financially incentivized comments and upvote/downvoting patterns
it's all slop
- AMAZING