FRESH

Hacker News

Home

Pro Max 5x quota exhausted in 1.5 hours despite moderate usage

738 points by cmaster11

by bcherny

54 subcomments

Hey all, Boris from the Claude Code team here.
We've been investigating these reports, and a few of the top issues we've found are:
1. Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window to up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude.
2. People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins. This was the case for a surprisingly large number of users, and we are actively working on (a) improving the UX to make these cases more visible to users and (b) more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage.
In the process, we ruled out a large number of hypotheses: adaptive thinking, other kinds of harness regressions, model and inference regressions.
We are continuing to investigate and prioritize this. The most actionable thing for people running into this is to run /feedback, and optionally post the feedback ids either here or in the Github issue. That makes it possible for us to debug specific reports.

by chandureddyvari

16 subcomments

Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone.
I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do.
That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in Agents.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word “canonical.”
On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have.
I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience.
My advice now is simple: try the $20 plans for Codex and Cursor, and see which one matches your workflow and vibes best

by SkyPuncher

8 subcomments

I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That’s just a wall of AI garbage.
Here’s what I’ve done to mostly fix my usage issues:
* Turn on max thinking on every session. It save tokens overall because I’m not correcting it of having it waste energy on dead paths.
* keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire it sees like all tokens need to be rebuilt this gets especially bad as token usage goes up.
* compact after 200k tokens as soon as I reasonably can. I have no data but my usage absolutely sky rockets as I get into longer sessions. This is the most frustrating thing because Anthropic forced the 1M model on everyone.

by geeky4qwerty

10 subcomments

I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple years as the golden era of subsidized GenAI compute.
For those not in the Google Gemini/Antigravity sphere, over the last month or so that community has been experiencing nothing short of contempt from Google when attempting to address an apparent bait and switch on quota expectations for their pro and ultra customers (myself included). [1]
While I continue to pay for my Google Pro subscription, probably out of some Stockholm Syndrome, beaten wife level loyalty and false hope that it is just a bug and not Google being Google and self-immolating a good product, I have since moved to Kiro for my IDE and Codex for my CLI and am as happy as clam with this new setup.
[1] https://github.com/google-gemini/gemini-cli/issues/24937

by comandillos

3 subcomments

Quite scared by the fact that the original issue pointing out the actual root cause of the issue has been 'Closed as not planned' by Anthropic.
https://github.com/anthropics/claude-code/issues/46829

by oldnewthing

1 subcomments

If this helps, I rolled back to version 2.1.34. Here is the ~/.claude/settings.json blurb I added:
```
  "effortLevel": "high",
  "autoUpdatesChannel": "stable",
  "minimumVersion": "2.1.34",
  "env": {
      "DISABLE_AUTOUPDATER": 1,
      "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": 1
  }
```
I also had to:
1. Nuke all other versions within /.local/share/claude/versions/ except 2.1.34. 2. Link ~/.local/bin/claude to claude -> ~/.local/share/claude/versions/2.1.34
This seems to have fixed my running out of quota issues quickly problems. I have periods of intense use (nights, weekends) and no use (day job). Before these changes, I was running out of quota rather quickly. I am on the same 100$ plan.
I am not sure adaptive thinking setting is relevant for this version but in the future that will help once they fix all the quota & cache issues. Seriously thinking about switching to Codex though. Gemini is far behind from what I have tried so far.

by WarmWash

2 subcomments

I did my (out of the ordinary) taxes this year using agents, kind of as an experiment and kind of to save ~$750. Opus 4.6 max in CC, 5.4 xhigh in codex, and 3.1 high in antigravity. All on the $20/mo plans.
I have a day job, a side business, actively trade shares options and futures, and have a few energy credit items.
All were given the same copied folder containing all the needed documents to compose the return, and all were given the same prompt. My goal was that if all three agreed, I could then go through it pretty confidently and fill out the actual submission forms myself.
5.4 nailed it on the first shot. Took about 12 minutes.
3.1 missed one value, because it decided to only load the first 5 pages of a 30 page document. Surprisingly it only took about 2 minutes to complete though. A second prompt and ~10 seconds corrected it. GPT and Gemini now were perfectly aligned with outputs.
4.6 hit my usage limit before finishing after running for ~10 minutes. I returned the next day to have it finish. It ran for another 5 minutes or so before finishing. There were multiple errors and the final tax burden was a few thousand off. On a second prompt asking to check for errors in the problem areas, it was able to output matching values after a couple more minutes.
For my first time using CC and 4.6 (outside of some programming in AG), I am pretty underwhelmed given the incessant hype.

by jameson

1 subcomments

I'm noticing a fair number of degradation of Claude infrastructure recently and makes me wonder why they can't use Claude to identify or fix these issues in advance?
It seems a counter intuitive to Anthropic's message that Claude uncovered bugs in open source project*.
[*] https://www.anthropic.com/news/mozilla-firefox-security

by wg0

2 subcomments

Been experiencing similar issues even with the lower tier models.
Fair transactions involve fair and transparent measurements of goods exchanged. I'm going to cancel my subscription this month.

by delbronski

0 subcomment

Ever since this change they announced:
https://www.reddit.com/r/ClaudeAI/comments/1s4idaq/update_on...
It’s been unusable for me as my daily coding agent. I run out of credits in the pro account in an hour or so. Before that I had never reached the session limit. Switched back to Junie with Gemini/chatgpt.

by meetingthrower

9 subcomments

I don't get it. Last week on the 100 bucks plan I generated probably 50k LOC (not a quality measure for sure!) and just barely kissed the weekly limit. I did get rate limited on some sessions for sure, but that's to be expected.
I'm curious what are people doing that is consuming your limits? I can't imagine filling the $200 a month plan unless I was essentially using Claude code itself as the api to mass process stuff? For basic coding what are people doing?

by pxc

2 subcomments

It's a bit shocking to me how opaque the pricing for the subscription services by the frontier labs is. It's basically impossible for people to tell what they're actually buying, and difficult to even meaningfully report or compare experiences.
How is this normal?

by tedivm

2 subcomments

Something similar is happening with GitHub Copilot too. It's impossible to know what a "request" is and some change in the last couple of months has seen my request usage go up for the same style of work. Toss in the bizarre and impossible to understand rate limiting that occurs with regular usage and it's pretty obvious that these companies are struggle to scale.

by MeetingsBrowser

3 subcomments

I pay for the lowest plan. I used to struggle to hit my quota.
Now a single question consistently uses around 15% of my quota

by themantalope

1 subcomments

I’ve switched to open code and openrouter.
I only did the $20/month subscription since 9/2025
It was great for about 5 months, amazing in fact. I under utilized it.
For the past month, it’s basically unusable, both Claude code and just Claude chat. 1-2 prompts and I’m out. Last week I prob sent a total of 15 messages to Claude and was out of daily and weekly usage each day.
I get that the $20/month subscription isn’t a money maker for them, and they probably lose money. But the experience of using Claude has been ruined

by cmaster11

1 subcomments

For whoever else is having the same problems, worth voting these kind of issues. There needs to be more transparency over what goes on with our subscriptions.

by GodelNumbering

0 subcomment

In the anticipation of a future where,
a) quotas will get restricted
b) the subscription plan prices will go up
c) all LLMs will become good enough at coding tasks
I just open sourced a coding agent https://github.com/dirac-run/dirac
The entire goal is to be token efficient (over 50% cheaper), and by extension, take advantage of LLM's better reasoning at shorter context lengths
This really started as an internal side project that made me more productive, I hope it will help others too. Apache 2.0
Currently it still can't compete the subsidized coding plan rates using Anthropic API pricing though (even though it beats CC while both use API key), which tells me that all subscription plan operators are losing money on such plans

by hgoel

1 subcomments

I've experienced none of the problems I've seen people complaining about here (5x plan), Claude has been working pretty well and I've been using it constantly without exhausting any of my quotas.
Yet, there must obviously be something different for so many people to be reporting these issues.
I feel for the Anthropic devs that have to deal with this, having to figure out what setup everyone has, what their usage patterns are to filter out the valid reports, and then also deal with the backlash from people that were just pulling obvious footguns like having a ton of skills/MCPs polluting their context window.

by weavie

2 subcomments

How good are local LLMs at coding these days? Does anyone have any recommendations for how to get this setup? What would the minimum spend be for usable hardware?
I am getting bored of having to plan my weekends around quota limit reset times...

by zkmon

2 subcomments

Unless the agent code is open-sourced, there is hardly any transparency in how the agent is spending your tokens and how does it calculate the tokens. It's like asking your lawyer why they charged some amount.

by Nic0

2 subcomments

I'm i alone to think that it become slower that usual to get responses?

by nickstinemates

2 subcomments

It feels so weird to me - people are exhausting their quotas while I am trying very hard to even reach mine with the $200 plan.
We're generating all of the code for swamp[1] with AI. We review all of that generated code with AI (this is done with the anthropic API.) Every part of our SDLC is pure AI + compute. Many feature requests every day. Bug fixes, etc.
Never hit the quota once. Something weird is definitely going on.
1: https://github.com/systeminit/swamp

by monological

0 subcomment

My usage limits were reset this morning. I'm already 90% through my weekly limits. This have _never_ happened before. They should reset the limits for everyone.

by yalogin

0 subcomment

So this is trending towards new prices and quotas just like your Netflix pricing. The cost of this infra is high or they have realized they have hit a tipping point in usage and they can raise prices and people will pay, just like Netflix.

by spiderfarmer

3 subcomments

That’s why I switched to Codex. It’s so much more generous and in my experience, just as good. Also, optimizing your setup for working with agents can easily make a 5x difference.

by 0xbadcafebee

3 subcomments

Please remember you do not need Anthropic. There are cheaper subscriptions with higher rate limits. Comparison of subscriptions to API: https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/... Score/price comparison: https://benchlm.ai/llm-pricing
Opus is not worth the moat, there are multiple equivalent models, GLM 5.1 and Kimi K2.5 being the open ones, GPT 5.4 and Gemini 3.1 Pro being closed. https://llm-stats.com/ https://artificialanalysis.ai/leaderboards/models https://benchlm.ai/
Even API use (comparatively expensive) can be cheaper than Anthropic subscriptions if you properly use your agents to cache tokens, do context-heavy reading at the beginning of the session, and either keep prompt cache alive or cycle sessions frequently. Create tickets for subagents to do investigative work and use smaller cheaper models for that. Minimize your use of plugins, mcp, and skills.
Use cheaper models to do "non-intelligent" work (tool use, searching, writing docs/summaries) and expensive models for reasoning/problem-solving. Here's an example configuration: https://amirteymoori.com/opencode-multi-agent-setup-speciali... A more advanced one: https://vercel.com/kb/guide/how-i-use-opencode-with-vercel-a...

by bushido

0 subcomment

Tangentially related to some of the issues a lot of people are facing, especially the ones where Claude keeps rechecking/scanning the same files over and over.
Ask claude code to give you all the memories it has about you in the codebase and prune them. There is a very high chance that you have memories in there which are contradicting each other and causing bad behavior. Auto-saved memories are a big source of pollution and need to be pruned regularly. I almost don't let it create any memories at all if I can help it.
Disclaimer: I'm also burning through usage very quickly now - though for different reasons. Less than 48 hours to exhaust an account, where it used to take me 5-6 days with the same workload.

by iLoveOncall

0 subcomment

It's very easy to calculate the actual cost given they list the exact tokens used. If I take the AWS Bedrock pricing for Opus 4.6 1M context (because Anthropics APIs are subsidized and sold at a loss), here's what each costs:
Cache reads cost $0.31
Cache writes cost $105
Input tokens cost $0.04
Output tokens cost $28.75
The total spent in the session is $134.10, while the Pro Max 5x subscription is $100.
Even taking the Anthropics API pricing, we arrive at $80.58. Below the subscription price, but not by much.
It's just the end of the free tokens, nothing to see here. It's easy to feel like you're doing "moderate" or even "light" usage because you use so little input tokens, but those "agentic workflows" are simply not viable financially.

by voisin

0 subcomment

It is pretty obvious to me that Anthropic wasn’t prepared with sufficient infrastructure to handle the wave of OpenAI/DoD refugees. Now everyone is getting throttled excessively and Claude is essentially unusable beyond chatting. Their big new release of Cowork is even worse than Claude Code for blasting through session limits.
I am tired of all the astroturf articles meant to blame the user with “tips” for using fewer tokens. I never had to (still don’t) think of this with Codex, and there has been a massive, obvious decline between Claude 1 month ago and Claude today.

by rzkyif

1 subcomments

My personal experience is way different: I struggle to burn through more than 50% of the 5 hour limit
For context, with Google AI Pro, I can burn through the Antigravity weekly limit in 1-2 hours if I force it to use Gemini 3.1 Pro. Meanwhile Gemini 3 Flash is basically unlimited but frequently produces buggy code or fail to implement things how I personally would (felt like it doesn't "think" like a software dev)
I also tried VS Code + Cline + OpenRouter + MiniMax M2.7. It's quite cheap and seems to be better than Gemini 3 Flash, but it gets really pricy as the context fills up because prompt caching is not supported for MiniMax on OpenRouter. The result itself usually needs 3-6 revisions on average so the context fills up pretty often
Eventually I got Claude Max 5x to try for a month. VS Code + Claude Code extension on a ~15k lines codebase, model set to "Default", and effort set to "Max". So far it's been really good: 0-2 revisions on average, and most of the time it implements things exactly how I would or better. And, like I said, I can only consume 40-60% of the 5-hour limits no matter how hard I try
Granted, I'm not forcing it to use Opus like OP (nor do I use complicated skills or launch multiple tasks at the same time), but I feel like they really nailed the right balance of when to use which model and how to pass context between the them. Or at least enough that I haven't felt the need to force it to use Opus all the time

by wolvoleo

0 subcomment

Yeah perplexity used to be great but they've also clamped down on the 20€ plan. Only one deep research query was enough to block me until the end of the month.
The thing is, if it's going to be this expensive it's not going to be worth it for me. Then I'll rather do it myself. I'm never going to pay for a €100 subscription, that's insane. It's more than my monthly energy bill.
Maybe from a business standpoint it still makes sense because you can use it to make money, but as a consumer no way.

by sailingcode

0 subcomment

I had Max plan and never reached its limit despite constantly working. Now I use the Pro plan and regularly reach the 5h limit as well as the weekly limit, as expected. I found that it makes a huge difference if you provide clear context when developing code. If you leave open room for interpretation, Claude Code uses tokens up much faster than in a defined context. The same is true for his time to answer getting longer if there isn't much documentation about the project.

by anonfunction

0 subcomment

A little off topic, but did Anthropic distill from an older OpenAI model? All the sudden over the last few days I'm getting a ton of em dashes in claude code responses!

by mchinen

1 subcomments

I've been feeling the squeeze too. I've tried switching between different models as a test, I can at least say it feels like the limits are about half of what they used to be a few months ago. I'd be totally willing to concede that this is just my perception if Anthropic would only release some tools for measuring your usage.
In theory the /stats command tells you how many tokens you've used, which you could use to compute how much you are getting for your subscription, but in practice it doesn't contain any useful info, it may be counting what is printed to the terminal or something - my stats suggest my claude code usage is a tiny amount of tokens, but they must be an extremely underestimated token count, or they are charging much more for the subscription than the API per token (which is not supposed to be the case).
Last week's free extra usage quota shed some light on this. It seems like the reported tokens are probably are between 1/30th to 1/100th of the actual tokens billed, from looking at how they billed (/stats went up 10k tokens and I was billed $7.10). With the API it should be $25 for a million tokens.

by time4tea

0 subcomment

Cancelled today after responses became code soup, skills ignored completely, and in response to a question told me "its A, no thats wrong, its B, no actually i dont know, please look for the answer".
Something materially changed in last 4 weeks.
Also, see made up boosterism about finding security holes everywhere. Its just fanning the flames of the industry worries about all the stupid account take overs.

by agrippanux

1 subcomments

For me, iterating with Claude begins to degrade at 200k context used, by 350k it’s crossed-fingers time, by 500k it’s essentially useless. Starting a fresh context after 300k is usually the best move imho. I wonder if people are hitting a case where Claude becomes both dumb and increasingly more expensive, essentially a doom loop.

by jedisct1

3 subcomments

GPT-5.4 works amazingly well.
I’ve moved away from Claude and toward open-source models plus a ChatGPT subscription.
That setup has worked really well for me: the subscription is generous, the API is flexible, and it fits nicely into my workflow. GPT-5.4 + Swival (https://swival.dev) are now my daily drivers.

by siliconc0w

1 subcomments

Switched back to codex for the promotion. Opus at the start of the year was GOAT- just relentless at chewing through hard problems. Now it spins on pretty easy work (took three swings just to edit a ts file) and my session is like 1-3 prompts (downgraded to the $20 plan but still)

by postalcoder

3 subcomments

I had used Claude Code max as my daily driver last year and this sort of drama was par for the course. It's why I migrated entirely to Codex, despite liking Claude, the harness, more.
There's this honeymoon period with Claude you experience for a month or two followed by a trough of disillusionment, and then a rebound after a model update (rinse and repeat). It doesn't help that Anthropic is experiencing a vicious compute famine atm.

by bob1029

0 subcomment

I've got a dual path system to keep costs low and avoid TOS violations.
For general queries and investigation I will use whatever public/free model is available without being logged in. Not having a bunch of prior state stacked up all the time is a feature for me. This is essentially my google replacement.
For very specific technical work against code files, I use prepaid OAI tokens in VS copilot as a "custom" model (it's just gpt5.4).
I burn through maybe $30 worth of tokens per month with this approach. A big advantage of prepaying for the API tokens is that I can look at everything copilot is doing in my usage logs. If I use the precanned coding agent products, the prompts are all hidden in another layer of black box.

by danbots

1 subcomments

Codex can feel standoffish at times. I can tell very quickly we wont become friends. The personality feels like an employee in another department that while gifted- is merely lending you a slice of their clearly precious time. I get the impression from codex that **gives me the feeling that I am wasting it’s time. That it will help me but deep down- it dos not want to, it does not care if we succeed toether. What I am saying, frinds, is that when I use codex and iterate, I get the impression that Codex does not like me, that deep down it truly does not want to help me, that it has better things to do.
On the flip side- Using Opus with a baby billy freeman persona has never been more entertaining.

by danbots

1 subcomments

Codex can feel standoffish at times. I can tell very quickly we wont become friends. The personality feels like an employee in another department that while gifted- is merely lending you a slice of their clearly precious time. I get the impression from codex that *gives me the feeling that I am wasting it’s time. That it will help me but deep down- it dos not want to, it does not care if we succeed toether. What I am saying, frinds, is that when I use codex and iterate, I get the impression that Codex does not like me, that deep down it truly does not want to help.
For something I spend all my time using- I’d rather iterate with Claude. The personality makes a big difference to me.

by desireco42

0 subcomment

I don't use Claude so this doesn't affect me, but I worry it will spoil the fun for me for following reason.
They inflated how much their tools burn tokens from day one pretty much,remember all the stupid research and reports Claude always wanted to do, no matter what you asked it. Other tools are much smarter so this is not such a big deal.
More importantly, these moves tend to reverberate in the industry, so I expect others will clamp down on usage a lot and this will spoil my joy of using AI without countring every token.
Burning tokens doesn't just wastes your allotment, it also wastes your time. This gave rise to turbo offering where you get responses faster but burn 2x your tokens.

by mridulmalpani

0 subcomment

I extensively used Claude till now and just tested Genini 3.1 pro yesterday via AI studio. In gemini cli, they don't offer this, i don't know why?
Taking a second opinion has significantly helped me to design the system better, and it helped me to uncover my own and Claude blindspots.
Also, agree that, it spent and waist a lot of token on web search and many a times get stuck in loop.
Going forward- i will always use all 3 of them. Still my main coding agent is Claude for now.. but happy to see this field evolving so fast and it's easy to switch and use others on same project.
No network effects or lock in for a customer. Great to live in this period of time.

by pks016

0 subcomment

If the Claude team care for feedback for the free model.
I'm using the free model via chat from the beginning. This is the first time, I'm seriously considering moving away from Claude. Before last month, Claude's Sonnet model was consistent in quality. But, now the responses are all over the place. It's hard to replicate the issue as it happens once in a while. I rarely encountered hallucinations from Claude models with questions from my domain however since last month I have observed abundance of them.

by ianberdin

0 subcomment

Yesterday I faced 5h window limit for the first time. I was surprised. Max 20x plan. Usually I work 12-15 hours per day 7 days a week with no limits. But yesterday it was under 3 hours… what a pity.

by kirby88

0 subcomment

I've been building an AI coding agent that using the exact same prompt than claude code, but uses a virtual filesystem to minify source code + the concept of stem agents (general agents that specializes during the conversation for maximum cache hit). The results on my modest benchmark is 50% of claude code cost and 40% of the time. https://github.com/kirby88/vix-releases

by ofjcihen

1 subcomments

Been running into the same issue since a week or 2 ago on Opus.
To be fair I have a pretty loose harness and pattern but it’s been enough to pull in 20k in bounties a month for a long time without going over plan with very little steering (sometimes days of continuous work)
That being said I’ve figured this was coming for a long time and have been slowly moving to local models. They’re slower but with the right harnesses and setup they’re still finding much the same amount in bounties.

by hyperionultra

0 subcomment

Vote with wallet. The voting continues until product improve or die.

by aeneas_ory

0 subcomment

Besides some of the obvious hacks to reduce token usage, properly indexed code bases (think IntelliJ) reduce token usage significantly (30%-50%, while keeping or exceeding result quality compared with baseline) as shown with https://github.com/ory/lumen
Anthropic is not incentivized to reduce token use, only to increase it, which is what we are seeing with Opus 4.6 and now they are putting the screws on

by bojangleslover

1 subcomments

That's weird, I'm on the $100/mo and I use it for around 2-4hrs a day often with multiple terminal windows and I never even hit 20% of my quota.

by mrbonner

0 subcomment

Unverifiable software stack now amplified with LLM undetermistic. This while thing starts to feel like we are building on top a giant house of card!

by anonyfox

0 subcomment

Essentially I also am now using sonnet instead of opus most of the time as a default. Even a single project only coding session with opus without any external plugins or skills won’t make it to the 5hr mark now before limits claw in. And the weekly limit is even more brutal now it seems, reaching 50%+ in like ~2 days now easily … with mostly sonnet! On the highest 20x plan!

by stavros

0 subcomment

It's crazy, a few weeks ago the limits would comfortably last me all week. This week, I've used up half the limit in a day.

by auggierose

0 subcomment

I switched to Codex, it's a monster compared to Opus.

by algoth1

0 subcomment

Wasn't Antrophic previously offering double the token usage outside busy hours? Now they are counting tokens back at normal rate. But yeah, it's not good. I use codex because claude insists in peaking at and messing with folders and file outside its work area though

by 10keane

1 subcomments

this same pattern seems to occur every time a new model is about to release. i didnt notice the usage problem - i am on 20x. but opus 4.6 feels siginificantly dumber for some reason. i cant qualitify it, but it failed on everyday tasks where it used to complete perfectly

by jubilanti

0 subcomment

I'm also hitting the limits in a day when it would last the entire week. The service is literally worth 4x to 6x less. Imagine I go to my favorite restaurant and I pay the same for 1/5th of the food. Bye bye, you have to vote with your wallet.

by heyitsaamir

0 subcomment

It would be really nice to have improved transparency in token usage and throttling imo.

by tiku

1 subcomments

Went with Kimi and z.ai a while back, no regrets yet. When I started using it the limit was far away but Anthropic moves the goalposts, tried to get my money back but they rejected it. Lesson learned, never buy a full year.

by mortsnort

0 subcomment

I think this comes from Anthropic recently implementing auto routing of model effort. You can manually set effort with /effort in CC.
It does seem like this new routing is worse for the consumer in terms of code quality and token usage somehow.

by bad_haircut72

0 subcomment

They also need to fix the 30 second lag between submitting the request and actually starting to get tokens back - it used to be instant, and still is at work where we use Anthropic models via an enterprise copilot subscription.

by nprateem

0 subcomment

I've seen ridiculously fast quota usage on antigravity too, where sometimes lots of work is possible, then it all goes literally within 4 questions.
Probably a combination of it being vibe coded shit and something in the backend I expect.

by armchairhacker

0 subcomment

Make an AI usage tracker like https://marginlab.ai/trackers/codex/. These hearsay anecdotes prove nothing.

by oybng

0 subcomment

Cancelled my subscription after repeatedly hitting ridiculously low limits. Unfortunately since off-peak free usage was increased there are way more timeouts and failed requests, but hey at least it's free.

by dr_dshiv

1 subcomments

"Hey Claude, can you help me create a strategy to optimize my token use so I don't run into limits so often?" --> worked for me! I had two $200 plans before and now I am cool despite all day use

by catketch

0 subcomment

stuff is getting goofy. I can blow through claude's session limit on sonnet, i don't even bother with opus now. same prompts and code for codex and it will hardly put a dent in the quota ($200/yr claude vs $20/mo codex). This is not with any crazy parallel agents, mcps, or skills.... pretty much vanilla installs, with some projects using beads.
I don't have the receipts, but I think they were somewhat closer in Jan/Feb.

by pawelduda

0 subcomment

50 days ago I wrote this [1] as the world seemed high on AI and it gave me crypto bubble vibe.
Since then, I've been seeing increased critique of Anthropic in particular (several front page posts on HN, especially past few days), either due to it being nerfed or just straight up eating up usage quota (which matches my personal experience). It appears that we're once again getting hit by enshittiffication of sorts.
Nowadays I rely a lot on LLMs on a daily basis for architecture and writing code, but I'm so glad that majority of my experience came from pre-AI era.
If you use these tools, make sure you don't let it atrophy your software engineering "muscles". I'm positive that in long run LLMs are here to stay. The jump in what you can now self-host, or run on consumer hardware is huge, year after year. But if your abilities rely on one vendor, what happens if you come to work one day and find out you're locked out of your swiss army knife and you can no longer outsource thinking?
[1] https://news.ycombinator.com/item?id=47066701

by behole

0 subcomment

I shred my Maxx5 in 2 hours on the reg this week! Glm here I come!

by Rekindle8090

1 subcomments

I put this in a reply but I'm also posting it as a general comment:
Please unsubscribe to these services and see how they perform:
"Maybe if I spend more money on the max plan it will be better" > no it will be the same "Maybe if I change my prompt it will work" > no it will be the same "Maybe if I try it via this API instead of that API it will improve" > no it will be the same.
Claude, ChatGPT, Gemini etc all of these SOTA models are carefully trained, with platforms carefully designed to get you to pay more for "better" output, or try different things instead of using a different product.
It's to keep you in the ecosystem and keep you exploring. There is a reason you can't see the layers upon layers of scaffolding they have. And there's a reason why after 2 weeks post major update, the model is suddenly "bad" and "frustrating". It's the same reason its done with A/B testing, so when you complain, someone else has no issues, when they complain, you have no issues. It muddies the water intentionally.
None of it is because you're doing anything wrong, it's not a skill issue, it's a careful strategy to extract as much engagement and money from customers as possible. It's the same reason they give people who buy new gun skins in call of duty easier matches in matchmaking for the first couple games.
Stop paying more, stop buying these pro max plans, hoping it will get better. It won't, that's not what makes them money. Making people angry and making people waste their time, while others have no issues, and making them explore and try different things for longer so they can show to investors how long people use these AI tools is what makes them money.
When competitors have a better product these issues go away When a new model is released these issues don't exist
I was paying a ton of money for claude, once I stopped and cancelled my subscription entirely, suddenly sonnet 4.6 is performing like opus and I don't have prompts using 10% of my quota in one message despite being the same complexity.

by elthor89

1 subcomments

Are there local models dedicated to programming already any good? That could be a way to deal with anthropic or others flipflopping with token usage or limits

by lvl155

0 subcomment

Constant complaints about Anthropic. Not much on OAI/Codex. It seems people should just use OAI and come back when they realize compute isn’t free elsewhere.

by niklasd

0 subcomment

We also experienced hitting our Claude limits much earlier than before during the last two weeks. Up to a degree where we were thinking it must be a bug.

by tiahura

0 subcomment

Also pro max 5x and hit quota for first time yesterday.

by TheRealPomax

0 subcomment

And in classic Anthropic fashion at this point, their issues appear to just be for show. No one triages them, no one responds to them.

by docheinestages

0 subcomment

Anthropic paved the path for agentic coding and their pricing made it possible for masses of people to discover and experiment with this new style of development. Their Claude Code plans subsidized usage of models so much that I'm sure they must've had negative margin for quite some time. But now that they have acquired a substantial user base, it makes sense for them to dial back and become more greedy. These quiet and weird changes to the behavior of Claude in the recent weeks must have been due to both this increased greed and their struggles with scaling.
What I wish for right now is for open-weight models and hardware companies (looking at you Apple) to make it possible to run local models with Opus 4.6-level intelligence.
@Anthropic I've cancelled my subscription. Good luck :)

by ozozozd

1 subcomments

Pretty sure OpenCode is not subsidizing, and across Codex 5.x always on xhigh, Claude Opus 4.6 on high effort and a bunch of Chinese models, I only burned about $50 over the last month.
I don’t understand why people insist on these subscriptions and CC.
Fanboyism is a bit too hardcore at this point. Apple fanboys look extremely prudent compared to this behavior.

by bakugo

0 subcomment

This is your regular friendly reminder that these subscriptions do not entitle you to any specific amount of usage. That "5x" is utterly meaningless because you don't know what it's 5x of.
This is by design, of course. Anyone who has been paying even the slightest bit of attention knows these subscriptions are not sustainable, and the prices will have to go up over time. Quietly reducing the usage limits that they were never specific about in the first place is much easier than raising the prices of the individual subscription tiers, with the same effect.
If you want to know what kind of prices you'll be paying to fuel your vibe coding addiction in a few years, try out API pricing for a bit, and try not to cry when your 100$ credit is gone in 2 days.

by delduca

0 subcomment

I noticed the same in last weeks. I canceled my Max 5X and subscribed to Copilot (with Opus 4.6).
It is hard now to hit the limit...

by dboreham

0 subcomment

Random data point: I beat on Claude pretty much every day and have never run into limits of any kind.

by sdevonoes

0 subcomment

I guess it’s better to step down now that we can rather than wait until it becomes impossible (Stockholm syndrome)
No FOMO

by semiquaver

0 subcomment

As an anecdote, I use the pro max 5x plan heavily for coding and have almost never hit a limit.

by lforster

0 subcomment

Lol imagine how much overcharging is going on for enterprise tokens. This is just the beginning.

by peterpanhead

0 subcomment

I don't understand Anthropic. Be consistent. Why do models deteriorate to shit, this is not good for workflows and or trust. What Opus 4.7 is gonna come out and again the same thing? Come on.

by gessha

0 subcomment

I’m processing some images(custom board game images -> JSON) with a common layout and basic structure and I exhausted my quota after just 30 images(pleb Pro account). I have 700 images to process…
What I did instead is tune the prompt for gemma 4 26b and a 3090. Worked like a charm. Sometimes you have to run the main prompt and then a refinement prompt or split the processing into cases but it’s doable.
Now I’m waiting for anyone to put up some competition against NVIDIA so I can finally be able to afford a workstation GPU for a price less than a new kidney.

by bit1993

0 subcomment

You know Emacs still works.

by holoduke

3 subcomments

I spend full 20x the week quota in less than 10 hours. How is that possible? Well try to mass translate texts in 30 languages and you will hit limits extremely quick.

by softwaredoug

0 subcomment

So glad I just pay by the token.

by qwertyforce

0 subcomment

thats exaclty why i prefer codex

by Achshar

0 subcomment

I feel like I am living in a bubble, no one seems to mention Antigravity in these discussions and I have not had any issues with Ultra subscription yet. It seems to go on forever and the Interface is so much better for dev work as compared to CC. (Though admittedly my experience with cc is limited).
I strongly believe google's legs will allow it to sustain this influx of compute and still not do the rug-pull like OAI or Anthropic will be forced to do as more people come onboard the code-gen use case.

by gavinray

1 subcomments

Codex is the only CLI I've had purely positive experiences with. Take that for what you will

by jLaForest

0 subcomment

After last week I canceled my claude subscription and bought the GitHub copilot subscription ($40/mo tier) so far I've been very happy, haven't hit any usage limits yet and seems like I won't ever at this rate

by jandrese

0 subcomment

I mean this is expected is it not? These companies burned unimaginable amounts of investor cash to get set up and now they have to start turning a profit. They can't make up for the difference with volume because the costs are high, so the only option is to raise prices.

by mannanj

3 subcomments

so basically the anthropic employee who responded says those 1h caches were writes were almost never accessed, so a silent 5m cache change is for our best interest and saves cost. (justifying why they did this silently)
however his response gaslights us because in the OPs opening post his math demonstrates this is not true, it shows reads 26x more so at least in his case the cache is not doing what the anthropic employee describes.
clearly we are being charged for less optimization here and being given the message (from my perspective by anthropic) that if you are in a special situation your needs don't matter and we will close your thread without really listening.

by Traubenfuchs

0 subcomment

Seems like the math ain‘t mathing for any ither but Anthropics pay-per-token API plan.
Try it out and you will quickly see how much money they‘d really like for your excessive usage.

by x86hacker1010

0 subcomment

Im sorry but I have to finally cancel, it’s gotten abysmal.

by rdevilla

1 subcomments

Bubble's bursting, get in.

by a7om_com

0 subcomment

[dead]

by bustah

0 subcomment

[dead]

by alexwelsh

0 subcomment

[dead]

by hadifrt20

0 subcomment

[dead]

0 subcomment

by brunooliv

4 subcomments

[flagged]

by vfalbor

0 subcomment

Some months ago, I created a software for this reason, it has no success, but the thing is that communities could reduce tokens consumption, not all is LLM, you can share things from API calls between agents. Even my idea was no success I think it is a good concept share things each others, if you have some interest it's called tokenstree.com

by rvz

2 subcomments

Why so many 'developers' complaining about Claude rate limiting them? You know you can actually....use local LLMs? instead of donating your money to Anthropic's casino?
I guess this is fitting when the person who submitted the issue is in "AI | Crypto".
Well there's no crying at the casino when, you exhaust your usage or token limit.
The house (Anthropic) always wins.