Furthermore, the model they recommend doesn't quite reach ~gpt-5.4-mini-level performance. Given that quality dip, you may as well just pay for something like Kimi K2.6 via OpenRouter if you want something roughly at or above Sonnet 4.6 as a backup for when you run out of Anthropic/OpenAI usage.
The new ones running on a 16GB M1 are maybe GPT-4 level (and run at decent speed, to be fair).
I wonder if it's possible to make some hyper-overtuned model that does nothing but, say, program in Python, and get SOTA-ish performance in that narrow task.
Cost is not a reason to go local.
"./claude-2.1.126-linux-x64
Welcome to Claude Code v2.1.126
Unable to connect to Anthropic services
Failed to connect to api.anthropic.com: ECONNREFUSED
Please check your internet connection and network settings.
Note: Claude Code might not be available in your country. Check supported countries at https://anthropic.com/supported-countries"
Let me also add that most services billed as private will still connect to the internet. LM Studio and many others will try to reach their servers on startup. I don't remember a single one that doesn't phone home and send some kind of information.
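You can check this yourself on Linux while the app is running, e.g. with `lsof -i` or `ss -t`, or by reading /proc/net/tcp directly. A rough stdlib-only sketch of the latter (Linux-specific, IPv4 only; the file stores addresses as little-endian hex):

```python
def decode_addr(hex_addr: str) -> str:
    """Decode /proc/net/tcp's little-endian hex 'ADDR:PORT' into 'a.b.c.d:port'."""
    ip_hex, port_hex = hex_addr.split(":")
    octets = [str(int(ip_hex[i:i + 2], 16)) for i in (6, 4, 2, 0)]
    return ".".join(octets) + ":" + str(int(port_hex, 16))

def established_connections(path: str = "/proc/net/tcp") -> list[tuple[str, str]]:
    """Return (local, remote) address pairs for sockets in state 01 (ESTABLISHED)."""
    conns = []
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) >= 4 and fields[3] == "01":
                conns.append((decode_addr(fields[1]), decode_addr(fields[2])))
    return conns
```

Run it while the supposedly offline app is idle; any remote address you didn't expect is the app talking to somebody's servers.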