- Anthropic was pulling ahead of their peers, but if they can't hear their customers' complaints about value degrading between releases, they're going to undermine their position until no advantage is left.
- This is perfectly legitimate. It's something I've been calling out day after day. Company X charges you $10 per token while company Y charges you $7, yet company X is cheaper because of the tokenizer it uses. Token consumption depends on the tokenizer, and companies build tokenizers using standard algorithms like BPE. But they're charging for hardware access, and the system can be biased to the point that a prompt written in English consumes 17% fewer tokens than the same prompt in Spanish, while writing in Chinese characters can reduce your token consumption significantly compared to English. I've written about this several times on HN, but for whatever reason, every time I mention it, they flag my post.
by kouteiheika
7 subcomments
- > Opus 4.7 tokenizer used 1.46x the number of tokens as Opus 4.6
Interesting. Unfortunately Anthropic doesn't actually share their tokenizer, but my educated guess is that they might have made the tokenizer more semantically aware to make the model perform better. What do I mean by that? Let me give you an example. (This isn't necessarily what they did exactly; just illustrating the idea.)
Let's take the gpt-oss-120b tokenizer as an example. Here's how a few pieces of text tokenize (I use "|" here to separate tokens):
Kill -> [70074]
Killed -> [192794]
kill -> [25752]
k|illed -> [74, 7905]
<space>kill -> [15874]
<space>killed -> [17372]
You have three different tokens encoding the same word (Kill, kill, <space>kill) depending on capitalization and whether there's a space before it, separate tokens for the past tense, etc. This is not necessarily an ideal way of encoding text, because the model must learn by brute force that these tokens are, indeed, related. Now, imagine if you'd encode these like this:
<capitalize>|kill
<capitalize>|kill|ed
kill
kill|ed
<space>|kill
<space>|kill|ed
Notice that this makes much more sense now - the model only has to learn what "<capitalize>" is, what "kill" is, what "<space>" is, and what "ed" (the past tense suffix) is, and it can compose those together. The downside is that it increases token usage. So I wouldn't be surprised if this is what they did. Or, my guess #2: they removed the tokenizer altogether, replaced it with a small trained model (something like the Byte Latent Transformer), and simply "emulate" the token counts.
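The composable scheme described above can be sketched in a few lines. This is purely illustrative - Anthropic's actual tokenizer is not public, and the marker names and suffix list here are invented for the example:

```python
# Toy "semantically aware" word encoder: instead of one opaque token per
# surface form ("Kill", "kill", " kill", "killed", ...), emit composable
# marker tokens plus a shared stem. The <space>/<capitalize> markers and
# the SUFFIXES list are made up for illustration.

SUFFIXES = ("ed", "ing", "s")

def encode_word(word: str) -> list[str]:
    tokens: list[str] = []
    if word.startswith(" "):
        tokens.append("<space>")
        word = word[1:]
    if word[:1].isupper():
        tokens.append("<capitalize>")
        word = word.lower()
    for suf in SUFFIXES:
        # Split off a known suffix so the stem token is shared.
        if word.endswith(suf) and len(word) > len(suf):
            tokens += [word[: -len(suf)], suf]
            return tokens
    tokens.append(word)
    return tokens

# "Killed" -> ["<capitalize>", "kill", "ed"]: three tokens where the
# gpt-oss-style vocabulary above spends one -- the cost of composability.
```

Note how every variant now shares the "kill" stem token, which is exactly the property that could make a model generalize better at the price of a higher token count.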
- This is a great piece of data, but only a piece of the actual question that we need to answer, which is:
For a given input, how many tokens will be used for an answer, and how high quality will that answer be?
Measuring the tokenizer is just one input into the cost-benefit tradeoff.
- I'm really surprised that:
1. Anthropic has not published anything about why they made the change and how exactly they changed it
2. Nobody has reverse engineered it. It seems easy to do using the free token-counting APIs (the Google Vertex AI token count endpoint seems to support 2000 req/min = ~3 million req/day, which seems enough to reverse engineer it)
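The probing idea can be sketched like this. The `count_tokens` function below is a stand-in for the real network call (in practice you'd hit a counting endpoint such as Anthropic's or Vertex AI's); the whitespace-based toy counter is invented so the sketch is runnable:

```python
# Sketch of reverse engineering a tokenizer through a counting endpoint:
# "how many tokens is this string?" queries let you test whether a string
# is a single vocabulary entry, and whether two strings merge at their
# boundary. count_tokens is a toy placeholder for the real API call.

def count_tokens(text: str) -> int:
    # Placeholder: a real implementation would call the provider's
    # token-counting API. This toy counts whitespace-delimited chunks.
    return len(text.split())

def is_single_token(s: str) -> bool:
    # A string that counts as 1 is (under this model) one vocab entry.
    return count_tokens(s) == 1

def find_split_savings(a: str, b: str) -> int:
    # Tokens saved by sending a+b together rather than separately;
    # nonzero savings imply a merge across the boundary.
    return count_tokens(a) + count_tokens(b) - count_tokens(a + b)
```

Enumerating candidate strings with `is_single_token` and probing boundaries with `find_split_savings` would, in principle, recover much of the vocabulary and merge behavior within the stated rate limits.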
- This is the rugpull that is starting to push me to reconsider my use of Claude subscriptions. The "free ride" of this being funded as a loss leader is coming to a close. As we break away from Claude, my hope is that I can continue to send simple problems to very smart local LLMs (qwen 3.6, I see you) and reserve Claude for the extreme problems appropriate for its extreme price.
by RITESH1985
0 subcomment
- Token counting matters a lot when agents are running long action chains. The hidden cost is retry loops — when an agent action times out and the agent retries, it re-sends the full context including all previous tool call results. A single failed payment call can cost 3x the tokens of a successful one. Observability at the token level is one thing, but you also need observability at the action level — did this side effect actually execute or did it fail silently?
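The retry-cost arithmetic in that comment can be made concrete with a back-of-envelope model. The numbers and the linear-growth assumption (each failed attempt appends its tool-call result to the context before the retry) are illustrative, not measured:

```python
# Back-of-envelope model of agent retry cost: every attempt re-sends the
# full accumulated context, and each failed attempt adds its tool-call
# result to that context before the next try. All figures illustrative.

def total_input_tokens(base_context: int, result_tokens: int,
                       attempts: int) -> int:
    # Attempt i re-sends base context plus i previously appended results.
    return sum(base_context + i * result_tokens
               for i in range(attempts))

# With a 1000-token context and no appended results, succeeding on the
# 3rd attempt costs 3x the input tokens of succeeding immediately --
# more once failed tool results pile into the context.
```

This is why action-level observability matters: without knowing whether a side effect executed, you can't tell a cheap one-shot call from a silently tripled one.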
by Esophagus4
3 subcomments
- Anyone have good tips or resources on token-management best practices? I've hit the rate limiter with a single prompt now on Opus 4.7.
What I’m reading so far seems to be:
- selective use of models based on task complexity
- encoding large repos into more digestible and relevant data structures to reduce constant re-ingesting
- ask Claude to limit output to X tokens (as output tokens are more expensive)
- reduce flailing by giving plenty of input context
- use Headroom and RTK
- disable unused MCP servers; move stuff from CLAUDE.md to skills
But I’d love to learn if anyone has any good tips, links, or tools as I’m getting rate limited twice a day now.
- Aren't these increases offset by higher-quality responses and fewer iterations needed to refine them?
by onchainintel
0 subcomment
- Many comparisons between 4.6 & 4.7 at https://tokens.billchambers.me/leaderboard
My prompt used 40% more tokens with Opus 4.7.
by great_psy
4 subcomments
- Is there any reason provided by Anthropic for why they changed the tokenizer?
Is there a quality increase from this change, or is it a money grab?
by mudkipdev
3 subcomments
- Why do you need an API key to tokenize the text? Isn't it supposed to be a cheap step that everything else in the model relies on?
by sergiopreira
1 subcomments
- An interesting question is whether the tokenizer is better at something measurable or just denser. A denser tokenizer with worse alignment to semantic boundaries costs you twice, higher bill and worse reasoning. A denser tokenizer that actually carves at the joints of the model's latent space pays for itself in quality. Nobody outside Anthropic can answer which it is without their eval suite, so the rugpull read is fair but premature. Perhaps the real tell will be whether 4.7 beats 4.6 on the same dollar budget on the benchmarks you care about, not on the per-token ones Anthropic publishes.
- I just asked Claude about defaulting to 4.6 and there are several options. I might go back to that as default and use --model claude-opus-4-7 as needed. The token inflation is very real.
by tomglynch
1 subcomments
- Interesting findings. Might need a way to downsample images on upload to keep costs down.
- Okay, but what about output tokens?