by readthemanual
0 subcomments
- I think the main issue I have with the article is that the author's whole argument rests on the premise that Qwen wouldn't run at a loss. But why wouldn't it?
Despite being a business, there are a number of reasons they might decide to run without profit for now: from trying to expand the user base, to the Chinese government sponsoring Chinese AI businesses.
by hirako2000
5 subcomments
- > Qwen 3.5 397B-A17B is a good comparison
It is not. It's a terrible comparison. Qwen, DeepSeek, and other Chinese models are known for being 10x or more efficient to serve than Anthropic's.
That's why the difference between OpenRouter prices and those official providers' prices isn't that large. Plus, who knows what OpenRouter providers do in terms of quantization; they may be getting 100x better efficiency, hence the competitive price.
That being said, not all users max out their plan, so it's not as if each user costs Anthropic $5,000. The hemorrhage would be so brutal they would be out of business in months.
by overrun11
5 subcomments
- A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true and a lot of evidence that it isn't. It's just become a meme uncritically regurgitated.
This sloppy Forbes article has polluted the epistemic environment because now there's a source to point to as "evidence."
So yes, this post author's estimate isn't perfect, but it is far more rigorous than the original Forbes article, which doesn't appear to even understand the difference between Anthropic's API prices and its compute costs.
- > Cost remains an ever present challenge. Cursor’s larger rivals are willing to subsidize aggressively. According to a person familiar with the company’s internal analysis, Cursor estimated last year that a $200-per-month Claude Code subscription could use up to $2,000 in compute, suggesting significant subsidization by Anthropic. Today, that subsidization appears to be even more aggressive, with that $200 plan able to consume about $5,000 in compute, according to a different person who has seen analyses on the company’s compute spend patterns.
This is the relevant quote from the original article.
by anonzzzies
5 subcomments
- I calculated just last weekend that my team, if we ran Claude Code at retail API prices, would cost around $200k/mo. We pay $1,400/month in Max subscriptions. So that's $50k/user... But from the tokens CC reports in its JSON, a lot of this must be cached etc., so I doubt the real cost is anywhere near $50k. I'm not sure how to figure out what it would actually cost, though, and I'm sure as hell not going to try.
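For what it's worth, Claude Code's local JSON transcripts do report per-request token counts, so a rough retail-price estimate is possible. A minimal sketch: the usage field names below mirror Anthropic's API usage records, but the per-million-token rates are illustrative placeholders, not official prices, so check both against current docs before trusting any total.

```python
# Illustrative per-million-token rates (NOT official pricing; cache writes
# are often billed around 1.25x input and cache reads around 0.1x input).
RATES = {
    "input_tokens": 15.00,                 # uncached input
    "output_tokens": 75.00,                # output
    "cache_creation_input_tokens": 18.75,  # prompt-cache writes
    "cache_read_input_tokens": 1.50,       # prompt-cache reads
}

def session_cost(usage: dict) -> float:
    """Dollar cost of one usage record at the illustrative rates above."""
    return sum(usage.get(k, 0) / 1_000_000 * rate for k, rate in RATES.items())

# Example: a request that mostly hits the prompt cache is far cheaper
# than its raw token count would suggest at the uncached input rate.
usage = {
    "input_tokens": 2_000,
    "cache_read_input_tokens": 180_000,
    "output_tokens": 1_500,
}
print(round(session_cost(usage), 4))
```

Summing this over every record in the transcript files would give an upper-bound retail figure; it still says nothing about Anthropic's actual compute cost.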
by eaglelamp
8 subcomments
- If Anthropic's compute is fully saturated, then the Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.
Anthropic's models may be similar in parameter size to models on OpenRouter, but none of the others are in the headlines nearly as much (especially recently), so the comparison is extremely flawed.
The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch based on gear count.
- How confident are you in the Opus 4.6 model size? I've always assumed it was a beefier model with more active params than Qwen 397B (17B active on the forward pass).
- The article is right to separate compute cost from retail price, but the retail price baseline itself is arbitrary depending on where you run the model.
The same capability (e.g. Llama 3.3 70B with tool calling and 128K context) runs $3.00/1M tokens at the model developer's list price and $0.22/1M at Fireworks AI: a 93% gap for identical specs. That spread makes any "it costs Anthropic X" estimate depend entirely on which reference price you anchor to.
We track this live across 1,625 SKUs and 40+ vendors at a7om.com; the variance across the market is larger than most people realise when they back-calculate provider economics.
by faangguyindia
3 subcomments
- A Claude subscription is the equivalent of a spot instance,
and the API is the on-demand equivalent.
Priority goes to the API, and leftover compute is used by subscription plans.
When there is no capacity, subscriptions are routed to highly quantized, cheaper models behind the scenes.
Selling subscriptions makes it cheaper to run such inference at scale; otherwise, much of the time your capacity is just sitting there idle.
Also, these subscriptions help you train your model further on predictable workflows (because the model creators also control the client, like Qwen Code, Claude Code, Antigravity, etc.).
This is probably why they will ban you for violating the TOS if you use their subscription with other tools.
They aren't just selling a subscription; the subscription also helps them get better at the thing they are selling, which for coding models like Qwen and Claude is coding.
I've used Qwen Code, Codex, and Claude.
Codex is 2x better than Qwen Code, and Claude is 2x better than Codex.
So I'd hope Claude Opus is at least 4-5x more expensive to run than the flagship Qwen coding model hosted by Alibaba.
by himata4113
3 subcomments
- What people don't realize is that cache is *free*. Well, not free, but compared to the compute required to recompute it? Relatively free.
If you remove the cached-token cost from the pricing, the overall API usage drops from around $5,000 to $800 (or $200 per week) on the $200 Max subscription. Still 4x cheaper than paying via the API, but not costing them money either; if I had to guess, it's roughly break-even, since that compute would most likely be going idle otherwise.
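The shape of that claim is easy to sanity-check with toy numbers. In the sketch below the token mix and rates are made up (cache reads priced at roughly a tenth of uncached input, in line with Anthropic's published cache-read discount), chosen only to reproduce the rough $5,000-naive vs. $800-cache-aware split:

```python
# Toy re-pricing: a month of heavy usage that is overwhelmingly cache reads,
# billed two ways. All numbers are illustrative, not actual prices or usage.
INPUT_RATE = 15.0 / 1e6      # $/token, uncached input
CACHE_READ_RATE = 1.5 / 1e6  # $/token, cache reads (~0.1x input rate)
OUTPUT_RATE = 75.0 / 1e6     # $/token, output

cache_read_tokens = 311_000_000
fresh_input_tokens = 2_000_000
output_tokens = 3_000_000

# Naive: every input token billed at the full uncached rate.
naive = ((cache_read_tokens + fresh_input_tokens) * INPUT_RATE
         + output_tokens * OUTPUT_RATE)

# Cache-aware: cache reads billed at their discounted rate.
actual = (cache_read_tokens * CACHE_READ_RATE
          + fresh_input_tokens * INPUT_RATE
          + output_tokens * OUTPUT_RATE)

print(f"naive ${naive:,.0f} vs cache-aware ${actual:,.0f}")
```

The point is only that when most tokens are cache hits, the "API-equivalent" figure collapses by nearly an order of magnitude under cache-aware pricing, and the underlying compute cost falls even further.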
- This is such a well-written essay. Every line revealed the answer to the immediate question I had just thought of.
by jeff_antseed
0 subcomments
- the openrouter comparison is interesting because it shows what happens when you have actual supply-side competition: multiple providers, different quantizations, price competition. the spread between cheapest and priciest for the same model can be 3-5x.
anthropic doesn't have that. single provider, single pricing decision. whether or not $5k is accurate, the more interesting question is what happens to inference pricing when the supply side is genuinely open. we're seeing hints of it with openrouter, but it's still intermediated.
not saying this solves anthropic's cost problem, just that the "what does inference actually cost" question gets a lot more interesting when providers are competing directly.
- Good article! Small suggestions:
1. It would be nice to define terms like RSI or at least link to a definition.
2. I found the graph difficult to read. It's a computer font that is made to look hand-drawn and it's a bit low resolution. With some googling I'm guessing the words in parentheses are the clouds the model is running on. You could make that a bit more clear.
- I have a very naive question:
People in the comments assume Anthropic's model is 10 times bigger than the Chinese models, so its compute cost is 10 times higher.
But from a Big O perspective, only a few algorithms give you O(N); most highly optimized things are O(N log N).
So what is the big O for a single request to any open model?
- What CC costs internally is not public. How efficient it is, is not public.
…You could take efficiency-improvement rates from previous model releases (from x -> y) and assume they have already made similar "improvements" internally. That is likely closer to what their real costs are.
by brianjeong
1 subcomment
- These margins are far greater than the ones Dario has indicated during many of his recent podcast appearances.
by akhrail1996
0 subcomments
- The comparison with Qwen/Kimi by "comparable architecture size" is doing a lot of heavy lifting. Parameter count doesn't tell you much when the models aren't in the same league quality-wise.
I wonder if a better proxy would be comparing by capability level rather than size. The cost to go from "good" to "frontier" is probably exponential, not linear - so estimating Anthropic's real cost from what it takes to serve Qwen 397B seems off.
by functionmouse
4 subcomments
- Was anyone under the impression that it does? Serious question. I've never heard that, personally.
by vbezhenar
1 subcomment
- Why does Claude charge 10x for the API compared to subscriptions? They're not a monopoly, so one would expect margins to be thinner.
by aurareturn
2 subcomments
- By the way, one of the charts in the article shows that Opus 4.6 is 10x costlier than Kimi K2.5.
I thought there was no moat in AI? Even being 10x costlier, Anthropic still doesn't have enough compute to meet demand.
Those "AI has no moat" opinions are going to be so wrong so soon.
- Is it fair to say the OpenRouter models aren't subsidized, though? The article makes the case that companies on there are running a business, but there are free models, and companies with huge AI budgets that want to gather training data and show usage.
- Nobody gets RSI typing “iterate until tests pass”
- This article is hilariously flawed, and it takes all of 5 seconds of research to see why.
Alibaba is the primary comparison point made by the author, but it's a completely unsuitable comparison. Alibaba is closer to AWS than to Anthropic in terms of its business model. They make money selling infrastructure, not inference. It's entirely possible they see inference as a loss leader and are willing to offer it at cost or below to drive people onto the platform.
We also have absolutely no idea whether their model is anywhere near comparable to Opus 4.6. The author is guessing.
So the article's primary argument is based on a comparison to a company with an entirely different business model, running a model the author is just making wild guesses about.
- Did anthropic do the oldest SaaS sales trick in the 2010s SaaS playbooks ;)
- Well, IDK, I have used CC with API billing pretty extensively and managed to spend ~$1,000 in one month, more or less. I moved to a Max 20x subscription and am using it a bit less (I'm still scared), but not THAT much less, and I'm at around 10% weekly usage. I'm not counting the tokens, though.
- And on top of that, Anthropic does not run their own compute clusters, do they? They probably get completely ripped off by whoever is renting them the processors.
$200 worth of actual computation is an awful lot of computation.
- What this doesn't mention is the "cost" to the public: the inevitable bailouts after it all comes crashing down again, the massive subsidies that datacenters get from taxpayers, the fresh water they consume, the electricity price hikes for everyone else, the noise, air, and water pollution, and the massive health impact on the population surrounding every datacenter. The jobs it destroys and the innocent people it kills through use of the technology in military targeting and autonomous weapons.
- Tl;dr, their guesstimate:
> Anthropic is looking at approximately $500 in real compute cost for the heaviest users.
by beepbooptheory
5 subcomments
- Ok but so it does cost Cursor $5k per power-Cursor user?? Still seems pretty rough..
- > I'm fairly confident the Forbes sources are confusing retail API prices with actual compute costs
Aren't they losing money on the retail API pricing, too?
> ... comparisons to artificially low priced Chinese providers...
Yeah, no, this article does not pass the sniff test.
- I easily go through two $200/mo Max accounts, and yesterday I got a third Pro account when I ran out.
It's worth it, but I know they aren't making money on me. But of course I'm marketing them constantly, so…