Tell HN: I cut Claude API costs from $70/month to pennies
38 points by ok_orco
by LTL_FTC
4 subcomments
It sounds like you don’t need immediate LLM responses and can batch process your data nightly. Have you considered running a local LLM? You may not need to pay for API calls at all. Today’s local models are quite good. I started off on CPU, and even that was fine for my pipelines.
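For anyone curious what that looks like in practice, here's a minimal sketch of a nightly batch job against a locally running Ollama server. This assumes Ollama's default local endpoint and a hypothetical model name; swap in whatever model you've pulled.

```python
import json
import urllib.request

# Default endpoint for a local Ollama install (assumption).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # Ollama's /api/generate takes a model name, a prompt, and a
    # stream flag; stream=False returns one complete JSON response.
    return {"model": model, "prompt": prompt, "stream": False}

def run_batch(prompts, model="llama3.1:8b"):
    # Sequential processing is fine for a nightly cron job where
    # latency doesn't matter -- only total wall-clock time does.
    results = []
    for p in prompts:
        req = urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps(build_payload(model, p)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read())["response"])
    return results
```

Point a cron entry at this script and the whole pipeline runs for free, modulo electricity.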
by 44za12
1 subcomment
This is the way. I actually mapped out the decision tree for this exact process and more here:
Consider using z.ai as a model provider to lower your costs further.
by deepsummer
0 subcomments
As much as I like the Claude models, they are expensive. I wouldn't use them to process large volumes of data.
Gemini 2.5 Flash-Lite is $0.10 per million tokens, and Grok 4.1 Fast is really good at only $0.20. They will work just as well for most simple tasks.
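The savings are easy to sanity-check with back-of-the-envelope arithmetic. This sketch uses the per-million-token prices from the comment above and a hypothetical monthly volume; it's a rough input-token-only estimate, not a full bill.

```python
def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    # Rough estimate: total tokens divided into millions, times the
    # per-million-token price. Ignores output-token pricing tiers.
    return tokens_per_month / 1_000_000 * price_per_million

# Hypothetical volume: 100M tokens per month.
volume = 100_000_000
gemini_flash_lite = monthly_cost(volume, 0.10)  # $0.10 / 1M tokens
grok_fast = monthly_cost(volume, 0.20)          # $0.20 / 1M tokens
```

At that volume, the cheap tiers come out to single-digit or low-double-digit dollars a month, which is why the pricier frontier models only make sense for the hard cases.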