by CharlieDigital
12 subcomments
- $1500/mo is $18,000/seat/annum.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
- The $1,500 frames this as a per-engineer ceiling, but the unit of consumption shifted under everyone's feet — engineers don't issue prompts anymore, they kick off agent loops that fan out into 20–100 tool calls and 10–50 LLM calls per task. A single agent run on a non-trivial refactor burns more tokens than the engineer typing for an hour. So the cap doesn't constrain engineers, it constrains agent-task throughput per engineer — which is a different thing. The leaderboard-vs-cap debate misses that the metric worth bounding is $/successful-PR or $/correct-completion, not $/engineer-month; variance between cheap and expensive tasks at the same budget is 10–50x now rather than 2–3x. Per-tool caps eventually force every team to ask: which workflows justify burning through tokens, and which should be cached, retrieved, or templated.
- The $1500 number is less interesting than the fact that they hit a ceiling at all. Most engineering teams I've talked to have no idea what their AI spend is per developer because it's buried in a consolidated cloud bill. Having a hard cap forces two useful conversations: what workflows actually justify API calls vs local inference, and whether the output is being measured against any real productivity metric. Without that feedback loop it's just a race to see who can burn tokens fastest.
by PessimalDecimal
5 subcomments
- These are still at currently subsidized prices. We'll see if they think they're getting $1500/month of value when that buys significantly fewer tokens.
- How many more months do we need to wait, until big companies realize that flash models work just fine if you:
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly become a mess, if you don't pay enough attention. Does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
by LurkandComment
0 subcomment
- 1) This happened because they fundementally misunderstand how to use AI and how AI is priced
2) Most organizations are throwing everything in for analyses and not limiting the answer they want. You need to be specific of about what you analyze and what answers you want
3) People undervalue prompting or templated responses. I will have written. validated and sanity checked a prompt several times and run it across several models before I say its ready for use. But when it is, I know what it will give me and that the scope of its research and answer is as close to what I want as it can be. As little excess as I can. This all saves tokens
- If you estimate 10k salary per engineer that means the moment it’s cheaper for them to hire another engineer but that doesn’t mean it’s improving productivity 15% but if 15% is the moment it stopped being better than another human we can assume 7.5%?
Probably even less because you would spend those 1500 extra per employee also if you just save 10% so 150 per employee that’s 1.5% on salary.
This is imho one of the best ranges we can assume for now how much would that be on the whole swe market?
- Seems odd limit, especially since it highly dependant on Token provider used, with Opus this is not much and could easily be burnt in a week or less, but with something like deepseek the 1500 can literarily be an annual budget.
That being said, I do have to wonder why someone as bug as say Uber, simply not rollout OSS model in the cloud for their team, I'd imagine that would be cheapest & most flexible option, while also keeping all the data shared with LLM private.
by epsteingpt
0 subcomment
- Uber engineers reported that loading their workspace and pulling recent commits exhausted that AI limit for Claude Code (4.8 x-high) immediately.
- They are also beholden to enterprise pricing and can't use the subsidized consumer max plans.
- I have strong conviction that companies will now choose tech stack/programming languages based on 'tokenomics'. I am vibe coding using Clojure, a language I can read but cannot write and I never hit the usage limits even when using the latest model on Claude. I have similar experience with F#, which is a bit more verbose than clojure but absolutely beats every OOP language, Python, Typescript etc.
The reason, I use F# & Clojure is they hit JVM and CLR, two popular enterprise stacks.
In my not so humble opinion Lisp(Clojure) still remains the language of AI.
- A lot of things can be done with local models.
by ChrisArchitect
0 subcomment
- Related:
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
https://news.ycombinator.com/item?id=48335388