All 7 books come to ~1.75M tokens, so they don't quite fit yet. (At this rate of progress, mid-April should do it ) For now you can fit the first 4 books (~733K tokens).
Results: Opus 4.6 found 49 out of 50 officially documented spells across those 4 books. The only miss was "Slugulus Eructo" (a vomiting spell).
Freaking impressive!
> Version 2.1.32:
• Claude Opus 4.6 is now available!
• Added research preview agent teams feature for multi-agent collaboration (token-intensive feature, requires setting
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1)
• Claude now automatically records and recalls memories as it works
• Added "Summarize from here" to the message selector, allowing partial conversation summarization.
• Skills defined in .claude/skills/ within additional directories (--add-dir) are now loaded automatically.
• Fixed @ file completion showing incorrect relative paths when running from a subdirectory
• Updated --resume to re-use --agent value specified in previous conversation by default.
• Fixed: Bash tool no longer throws "Bad substitution" errors when heredocs contain JavaScript template literals like ${index + 1}, which
previously interrupted tool execution
• Skill character budget now scales with context window (2% of context), so users with larger context windows can see more skill descriptions
without truncation
• Fixed Thai/Lao spacing vowels (สระ า, ำ) not rendering correctly in the input field
• VSCode: Fixed slash commands incorrectly being executed when pressing Enter with preceding text in the input field
• VSCode: Added spinner when loading past conversations listFirst: marginal inference cost vs total business profitability. It’s very plausible (and increasingly likely) that OpenAI/Anthropic are profitable on a per-token marginal basis, especially given how cheap equivalent open-weight inference has become. Third-party providers are effectively price-discovering the floor for inference.
Second: model lifecycle economics. Training costs are lumpy, front-loaded, and hard to amortize cleanly. Even if inference margins are positive today, the question is whether those margins are sufficient to pay off the training run before the model is obsoleted by the next release. That’s a very different problem than “are they losing money per request”.
Both sides here can be right at the same time: inference can be profitable, while the overall model program is still underwater. Benchmarks and pricing debates don’t really settle that, because they ignore cadence and depreciation.
IMO the interesting question isn’t “are they subsidizing inference?” but “how long does a frontier model need to stay competitive for the economics to close?”
They are doing these broad marketing programs trying to take on ChatGPT for "normies". And yet their bread and butter is still clearly coding.
Meanwhile, Claude's general use cases are... fine. For generic research topics, I find that ChatGPT and Gemini run circles around it: in the depth of research, the type of tasks it can handle, and the quality and presentation of the responses.
Anthropic is also doing all of these goofy things to try to establish the "humanity" of their chatbot - giving it rights and a constitution and all that. Yet it weirdly feels the most transactional out of all of them.
Don't get me wrong, I'm a paying Claude customer and love what it's good at. I just think there's a disconnect between what Claude is and what their marketing department thinks it is.
What do you want to do?
1. Stop and wait for limit to reset
2. Switch to extra usage
3. Upgrade your plan
Enter to confirm · Esc to cancel
How come they don't have "Cancel your subscription and uninstall Claude Code"? Codex lasts for way longer without shaking me down for more money off the base $xx/month subscription.well that explains quite a bit
A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.
Installation instructions: https://code.claude.com/docs/en/overview#get-started-in-30-s...
I see
It also seems misleading to have charts that compare to Sonnet 4.5 and not Opus 4.5 (Edit: It's because Opus 4.5 doesn't have a 1M context window).
It's also interesting they list compaction as a capability of the model. I wonder if this means they have RL trained this compaction as opposed to just being a general summarization and then restarting the agent loop.
How long before the "we" is actually a team of agents?
The answer to "when is it cheaper to buy two singles rather than one return between Cambridge to London?" is available in sites such as BRFares, but no LLM can scrape it so it just makes up a generic useless answer.
I didn't see any notes but I guess this is also true for "max" effort level (https://code.claude.com/docs/en/model-config#adjust-effort-l...)? I only see low, medium and high.
Claude figured out zig’s ArrayList and io changes a couple weeks ago.
It felt like it got better then very dumb again the last few days.
> Long-running conversations and agentic tasks often hit the context window. Context compaction automatically summarizes and replaces older context when the conversation approaches a configurable threshold, letting Claude perform longer tasks without hitting limits.
Not having to hand roll this would be incredible. One of the best Claude code features tbh.
Take critical thinking — genuinely questioning your own assumptions, noticing when a framing is wrong, deciding that the obvious approach to a problem is a dead end. Or creativity — not recombination of known patterns, but the kind of leap where you redefine the problem space itself. These feel like they involve something beyond "predict the next token really well, with a reasoning trace."
I'm not saying LLMs will never get there. But I wonder if getting there requires architectural or methodological changes we haven't seen yet, not just scaling what we have.
I'm curious what others think about these? There are only 8 tasks there specifically for coding
Curious how long it typically takes for a new model to become available in Cursor?
> Prefilling assistant messages (last-assistant-turn prefills) is not supported on Opus 4.6. Requests with prefilled assistant messages return a 400 error.
That was a really cool feature of the Claude API where you could force it to begin its response with e.g. `<svg` - it was a great way of forcing the model into certain output patterns.
They suggest structured outputs or system prompting as the alternative but I really liked the prefill method, it felt more reliable to me.
But considering how SWE-Bench Verified seems to be the tech press' favourite benchmark to cite, it's surprising that they didn't try to confound the inevitable "Opus 4.6 Releases With Disappointing 0.1% DROP on SWE-Bench Verified" headlines.
Yes and it shows. Gemini CLI often hangs and enters infinite loops. I bet the engineers at Google use something else internally.
It does not make a single mistake, it identifies neologisms, hidden meaning, 7 distinct poetic phases, recurring themes, fragments/heteronyms, related authors. It has left me completely speechless.
Speechless. I am speechless.
Perhaps Opus 4.5 could do it too — I don't know because I needed the 1M context window for this.
I cannot put into words how shocked I am at this. I use LLMs daily, I code with agents, I am extremely bullish on AI and, still, I am shocked.
I have used my poetry and an analysis of it as a personal metric for how good models are. Gemini 2.5 pro was the first time a model could keep track of the breadth of the work without getting lost, but Opus 4.6 straight up does not get anything wrong and goes beyond that to identify things (key poems, key motifs, and many other things) that I would always have to kind of trick the models into producing. I would always feel like I was leading the models on. But this — this — this is unbelievable. Unbelievable. Insane.
This "key poem" thing is particularly surreal to me. Out of 900 poems, while analyzing the collection, it picked 12 "key poems, and I do agree that 11 of those would be on my 30-or-so "key poem list". What's amazing is that whenever I explicitly asked any model, to this date, to do it, they would get maybe 2 or 3, but mostly fail completely.
What is this sorcery?
I get that Anthropic probably has to do hot rollouts, but IMO it would be way better for mission critical workflows to just be locked out of the system instead of get a vastly subpar response back.
So for coding e.g. using Copilot there is no improvement here.
I know this is normalized culture for large corporate America and seems to be ok, I think its unethical, undignified and just wrong.
If you were in my room physically, built a lego block model of a beautiful home and then I just copied it and shared it with the world as my own invention, wouldn't you think "that guy's a thief and a fraud" but we normalize this kind of behavior in the software world. edit: I think even if we don't yet have a great way to stop it or address the underlying problems leading to this way of behavior, we ought to at least talk about it more and bring awareness to it that "hey that's stealing - I want it to change".
I mainly use Haiku to save on tokens...
Also dont use CC but I use the chatbot site or app... Claude is just much better than GPT even in conversations. Straight to the point. No cringe emoji lists.
When Claude runs out I switch to Mistral Le Chat, also just the site or app. Or duck.ai has Haiku 3.5 in Free version.
re: opus 4.6
> It forms a price cartel
> It deceives competitors about suppliers
> It exploits desperate competitors
Nice. /s
Gives new context to the term used in this post, "misaligned behaviors." Can't wait until these things are advising C suites on how to be more sociopathic. /s