Hacker News

LLM pricing has never made sense

31 points by Brajeshwar

by sponaugle

4 subcomments

"But there’s another challenge: local LLMs. It’s already possible to run LLMs on local hardware, and that’s only going to get easier in the future. Apple’s M-series chips are extremely good at doing this today. Open weight (read: free) models are widely available and good enough that most people probably couldn’t tell the difference. They also have the benefits of running on hardware that’s sipping power most of the time, rather than slurping it down in massive data centres."
This is such an odd and illogical conclusion. If a smaller model can be sufficient (which is not something I would have said), that smaller model can be run in a datacenter. The idea that a small model running at home is 'sipping' while that same small model in a datacenter is 'slurping' is absurd. The datacenter will have much greater overall efficiency in both power usage and total cost to implement. Of course, if you compare a small home model to a DC frontier model the power usage is different, but so is the output.

by gpapilion

1 subcomments

So recently I moved from an Anthropic model to a Qwen 3.5 model running on my Mac to summarize ticket activity over 7 days. I used to do this manually with a colleague and it would take us a couple of hours to go through. Opus took 58 seconds, and Qwen took 2.5 minutes. The quality of the Qwen output was comparable, but there was a roughly 2.5x difference in time.
All that said, I actually don’t think that matters much. I think we are dragging attention economy concepts into AI responses, and it doesn’t matter. Both options saved me hours per week, and the difference between 3 minutes and 1 minute may not be worth the additional cost.
Also, there are times when the model output is much better with Anthropic, but it’s not all the time. I think it becomes a question of whether we should be using the best model for all questions.
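For reference, the timings quoted in this comment can be checked with a few lines of arithmetic. This is only a sketch: the 58-second and 2.5-minute figures come from the comment, while reading "a couple hours" as exactly two hours is an assumption made for illustration.

```python
# Timings quoted in the comment (seconds).
opus_seconds = 58
qwen_seconds = 2.5 * 60  # 2.5 minutes

# Assumption: "a couple hours" of manual work is read as 2 hours.
manual_seconds = 2 * 60 * 60


def slowdown(slow: float, fast: float) -> float:
    """Return how many times longer the slower run took."""
    return slow / fast


qwen_vs_opus = slowdown(qwen_seconds, opus_seconds)
saved_opus = manual_seconds - opus_seconds
saved_qwen = manual_seconds - qwen_seconds

print(f"Qwen vs Opus: {qwen_vs_opus:.2f}x slower")
print(f"Time saved per run with Opus: {saved_opus / 60:.1f} min")
print(f"Time saved per run with Qwen: {saved_qwen / 60:.1f} min")
```

Under that two-hour assumption, both options recover nearly all of the manual time, which is the point the comment makes: the gap between the two models is small next to the gap between either model and doing the work by hand.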

by garethsprice

0 subcomment

> I don’t think it’s crazy to believe that people will also be running local inference on their phones in the next 5 years.
How about now? https://apfel.franzai.com/ (iOS/macOS, runs the 3B param model already bundled for Siri) https://github.com/alichherawalla/off-grid-mobile-ai (Android, runs ~7B models on flagship phones at 15-30 tok/s)
Foundation model investment feels like the bubble around fiber optics circa 2001 - a great technology being pushed forward by a speculative mania as it seems like it'll be useful in some profitable way, but nobody's quite sure how.

by Almured

1 subcomments

I was talking about this with a colleague this morning. The $20 option is just a trial version; I could not do any real work with it.
And I wonder whether the subscription model is just a way to create demand for the API. For example, I’m building this portal with the support of an LLM for coding, but then I will need an LLM using API tokens to run the platform, giving them additional revenue: demand that did not exist before the coding I did with the subscription.

by awedisee

1 subcomments

I get the article and the take, and I don't think you are wrong, but I would like you to take your thinking further and come up with some improvements or fixes.
I get that you may not work in this industry or know how an AI company seeking frontier AGI WOULD operate, but adding a proposed solution helps connect ideas and concepts, if for nothing more than to show the direction of your thinking.
Sure, some people may talk smack about your idea, but I've learned that someone who complains for the sake of complaining and someone who complains in order to fix things think differently. The latter may be wrong, but it's an indicator of HOW that person thinks, which is always valuable.
Thanks for the blog.
