- DeepSeek v4 Pro feels like Claude Opus 4.6 in its personality, but here's what I found out about costs:
I cut DeepSeek v4 loose on a decent-sized TypeScript codebase and asked it to focus on a single endpoint, go through it in depth layer by layer (API, DTOs, service, database models), form a complete picture of the types involved and introduced, and ensure no ad hoc types were being introduced.
It produced a brief but very to-the-point summary of the types being introduced and which of them were redundant, etc.
Then I asked it to simplify it all.
It obviously went through lots of files in both prompts, but the total cost? Just $0.09 for the Pro version.
On Claude Opus, I think (from past experience, before the price hikes) these two prompts alone would easily have burned somewhere between $9 and $13, with not much benefit.
Note: I didn't use OpenRouter; I used the DeepSeek API directly, because OpenRouter itself was being rate limited by DeepSeek.
- So RPI/QRSPI-like skills (e.g. https://github.com/mattpocock/skills and https://github.com/humanlayer/humanlayer/tree/main/.claude/c... and https://github.com/dfrysinger/qrspi-plus) for working with Claude Code work well enough for me that they can reliably* produce code that matches the plan/spec, in a way they did not until December 2025.
I have a gut feeling that these models can do just as well. Has anyone run a reasonably sized task (>=1-2 days of designing and planning) and seen it work well with these models?
* For me, what worked well was the grill-me skill (or a variation of it) at the design stage. The hygiene I followed was having it ask one question at a time, resolving dependencies at the design stage, and reading the hashed-out plan closely, plus a couple of other MCP tools for grounding, like a documentation server (deepwiki) and arxiv. Other tricks I use are having high-signal tests and having Claude either able to read logs and code at the same time or embedded in the execution (e.g. as a debugger, REPL, or devtools).
- The biggest differentiator for me: DeepSeek just does what I ask. I've tried using both GPT and Claude for reverse engineering recently; both refused. I even got a warning on my OpenAI account.
- I'm surprised that people here don't care at all about these models openly training on your data, especially if you use them straight from the model developer. Whereas things like "GitHub now automatically opts everyone into using their code for model training" get hundreds of justifiably angry comments, I never see this brought up anymore on posts like these talking about using Chinese models through OpenRouter. This might be explained by "well, they're different people", but the difference is too stark for that to be the whole explanation.
by cheshire_cat
3 subcomments
- While the costs are lower than the frontier models', there are two factors that make DS4 Pro and K2.6 not as cheap as they might look.
For DS4 Pro there's a discount going on for the official API, which sometimes gets overlooked and mixed up in discussions. Simon uses the full price in the comparison, so that's not an issue here.
The other issue is that DS4 Pro and K2.6 often use way more reasoning tokens than the frontier models. In my testing there are certain pathological cases where a request can cost the same as with a frontier model because they use so many more tokens.
To be fair, I'm using DS and Kimi via third-party providers, so they might have issues with their setups.
But if you look at the Artificial Analysis pages for the models, you'll see that DSv4 Pro used 190M tokens and K2.6 170M tokens for their intelligence benchmark, while GPT 5.5 (high) used only 45M.[0][1][2]
I recommend looking at the "Intelligence vs. Cost to Run Artificial Analysis Intelligence Index" ("Intelligence vs Cost" in the UI). The open source models are still cheaper to run, but not by as much as you'd think just looking at the token prices.
[0] https://artificialanalysis.ai/models/deepseek-v4-pro
[1] https://artificialanalysis.ai/models/kimi-k2-6
[2] https://artificialanalysis.ai/models/gpt-5-5-high
- I tried DeepSeek via chat, and gave it a rather simple question:
"Can you tell me who was on series 8 of Taskmaster, and what's the general opinion about the series? No spoilers!"
It told me, amongst other things, that Paul Sinha was diagnosed with Parkinson's, as well as who the winner was.
Then I said, "But I said no spoilers!"
And it apologised for telling me Paul Sinha was diagnosed with Parkinson's.
- I've been using v4 Pro for the past few days, and honestly, in terms of quality it seems more or less on par with OpenAI's 5.4 or Opus 4.6 (I haven't tried 4.7).
To be clear, I'm not doing state-of-the-art stuff. I mostly used it for frontend development, since I'm not great at that and just need a decent-looking prototype.
But for my purposes it's a perfectly good model, and the price is decent.
I can't wait for an open model small enough for me to run locally to come out, though. I hate having to rely on someone else's machines (and getting all my data exfiltrated that way).
- DeepSeek's official API has a cache hit rate of over 99% if you use it continuously within the same codebase for long sessions, so it's much cheaper than frontier models. I have an example of a 200M-token session in Claude Code.
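A minimal back-of-the-envelope sketch of why that hit rate dominates the bill; the prices below are placeholders I made up, not DeepSeek's actual rates:

    # Placeholder prices in $/M input tokens -- check DeepSeek's pricing page.
    CACHE_HIT = 0.28   # assumed price on a cache hit
    CACHE_MISS = 1.12  # assumed price on a cache miss
    hit_rate = 0.99

    # Effective per-token input price collapses toward the cache-hit rate.
    effective = hit_rate * CACHE_HIT + (1 - hit_rate) * CACHE_MISS
    print(f"effective input price: ${effective:.3f}/M tokens")         # ~$0.29/M
    print(f"input cost of a 200M-token session: ${200 * effective:.2f}")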
by gyoridavid
0 subcomments
- I connected it to my VS Code Copilot and took it for a ride. I've tried both Flash and Pro.
For a small POC, Flash was sufficient: quite fast, and dirt cheap. It did stop a few times (maybe a latency issue?) but it did a good job.
I used Pro to do some heavy lifting, planning, etc., and it did a fantastic job.
I paid ~10 cents for a small proof of concept that worked exactly how I prompted it.
For me, this is a real alternative after I cancel my GitHub Copilot towards the end of the month.
- This gives me hope that when the subsidization circus ends and everyone is on pure usage pricing, it won't be entirely exclusionary to mere mortals who don't have $200/month budgets.
- I'm currently paying for Anthropic's Max subscription (the 100 USD one) and I quite often hit or approach the 5-hour limits, but usually get to around 60-80% of the weekly limits before they reset (Opus 4.7 with high thinking for everything, unless CC decides to spawn sub-agents with Haiku or something).
Those tokens are heavily subsidized, but DeepSeek's API pricing is looking really good. For example, with an agentic coding setup (roughly 85% input, 15% output and around 90% cache reads) I'd get around 150M tokens per month for the same 100 USD. Even at more output tokens and worse cache performance, it'd still most likely be upwards of 100M.
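For anyone redoing that arithmetic: a minimal sketch, with made-up placeholder prices (not DeepSeek's real rates) chosen only so the output lands near that ~150M figure:

    # Rough tokens-per-budget estimate for an agentic coding mix.
    # All prices are placeholders in $/M tokens, not DeepSeek's real rates.
    PRICE = {"cache_hit": 0.28, "cache_miss": 1.12, "output": 2.24}

    input_share, output_share = 0.85, 0.15  # traffic mix from the comment above
    cache_read_share = 0.90

    # Blend the input price over cache hits/misses, then mix in output tokens.
    blended = (
        input_share * (cache_read_share * PRICE["cache_hit"]
                       + (1 - cache_read_share) * PRICE["cache_miss"])
        + output_share * PRICE["output"]
    )
    budget = 100  # USD per month
    print(f"blended: ${blended:.3f}/M -> ~{budget / blended:.0f}M tokens/month")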
by curioussquirrel
0 subcomments
- V4 is definitely a step up from V3.2 on our multilingual benchmarks.
Two caveats:
- when inferring through OpenRouter, we've had a lot of issues with very slow speeds (TPS) and occasional instability. I just checked, and it's still 10-30 TPS on all available providers, which is not a lot for a model that likes to think as much as DeepSeek does.
- the official DeepSeek API makes no guarantees of data privacy even for paying users.
Both points could be moot if you use it through Azure AI Foundry (the latter is, afaik); I have yet to test that.
In any case, happy to see more open-weights models that are somewhat competitive with SOTA models!
- Run it on an NVIDIA GPU and charge $20 a month, and it becomes 'frontier.' That is what the term means these days. In terms of performance, it beats ChatGPT 5.5 and Mythos on several metrics.
- DeepSeek V4 Flash is the most cost-effective model we've tested.
We had to dig in to understand why it outperformed DeepSeek V4 Pro (although even on unreliable model cards, Flash was very close to Pro). Pro is slower and smarter on one-shot reasoning problems, but less effective with tools, and therefore less performant on long-horizon agentic tasks (especially with custom tools it was not trained on).
Benchmarks at https://gertlabs.com/rankings
- It might be at the frontier, but DeepSeek is really struggling with compute. The number of 429 Rate Limit responses I've been getting just testing this thing made me pause all my attempts at cross-comparing it to others.
I'm gonna stick to GLM5.1 for now.
by crakhamster01
1 subcomment
- I realize this post is about the pelican test, but with regard to coding, has anyone tried out the advisor strategy with V4?[0]
e.g. Have V4 call out to Opus when it's uncertain, but otherwise handle execution.
The results with Sonnet/Haiku in the blog post seemed promising, so I'm curious how it would go with these latest open models.
[0] https://claude.com/blog/the-advisor-strategy
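For the curious, a minimal sketch of the pattern under stated assumptions: call_model is a hypothetical stub (the model names are illustrative too), standing in for whatever client you actually use:

    import random

    EXECUTOR = "deepseek-v4"  # illustrative names, not real API model ids
    ADVISOR = "claude-opus"

    def call_model(model: str, prompt: str) -> tuple[str, float]:
        # Hypothetical stub for a real API client; returns
        # (response text, self-reported confidence in [0, 1]).
        return f"[{model} response]", random.random()

    def solve(task: str, threshold: float = 0.7) -> str:
        # The cheap executor drafts; the stronger advisor is consulted
        # only when the executor reports low confidence.
        draft, confidence = call_model(EXECUTOR, task)
        if confidence < threshold:
            advice, _ = call_model(ADVISOR, f"Advise on:\n{task}\nDraft:\n{draft}")
            draft, _ = call_model(EXECUTOR, f"{task}\nAdvisor notes:\n{advice}")
        return draft

    print(solve("Refactor the auth middleware"))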
by holysantamaria
3 subcomments
- From DeepSeek's pricing page:
(3) The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC.
Was this taken into account when reviewing the model?
by teruakohatu
3 subcomments
- The pelican is really getting old as a standalone evaluation metric. By now it is certainly going to be in the training set, if models aren't outright tuned to produce it, from the press on HN alone.
Keep the pelican, but isn't it time to add something more novel that all current and past models struggle with?
- I've been using the planning framework from Matt Pocock on very typical brownfield code. I use a harness over Claude Code; this is so cheap that I'd be tempted to mirror my initial prompt to it and compare their responses to the task.
- Why was the title changed from "DeepSeek V4—almost on the frontier, a fraction of the price" to "DeepSeek V4—almost on the frontier"?
- I tweeted about some implementation and review runs that used V4 Pro.
Even without the currently discounted pricing, the value is incredible.
It takes about twice as long to finish code reviews, given identical context, compared to Opus 4.7/GPT 5.5, but at 1/10 the cost or less, there's just no comparison.
https://twitter.com/aljosa/status/2049176528638902555
- Related: live demo of DeepSeek v4 Flash running on my 128GB MacBook. Italian language with English subs.
https://www.youtube.com/watch?v=todMmp6AGCE
- Jensen has a point. I believe these were trained and run on Huawei chips. The Nvidia embargo may backfire on American leadership, as necessity gives rise to invention.
by linzhangrun
0 subcomments
- Strangely, my experience using DeepSeek V4 Pro on OpenCode has been absolutely awful. I switched back to GPT-5.3-CodeX as the execution model.
by myaccountonhn
0 subcomments
- I recently switched from Claude to Opencode Go + pi.dev. It has DeepSeek v4 Pro along with Kimi K2.6, and it's performing quite well for basic coding, without hitting any limits.
- I've found this to be a very good model, and I think I'd even go as far as rating it higher than ChatGPT.
ChatGPT has really degraded in my eyes, and I find Grok and DeepSeek more helpful most of the time.
Of course, ChatGPT is better sometimes.
These models are just better than others at different cases, hence the reason to experiment.
by taffydavid
10 subcomments
- I tried DeepSeek v4 through OpenCode at the weekend. I'm a daily Claude/Claude Code user.
I tried to build something simple and while it got the job done the thinking displayed did not fill me with confidence. It was pages and pages of "actually no", "hang on", "wait that makes no sense". It was like the model was having a breakdown.
Bear in mind OpenCode was also new to me, so I could just be seeing thinking where I usually don't.
- > DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI’s GPT-5.4 Nano.
GPT-5 Nano should really be in the list too. It is $0.05 input and $0.40 output - and half that if you use the Flex tier.
Last week I upgraded an old batch process from GPT-4.1 Nano, and GPT-5 Nano worked just as well as GPT-5.4 Nano but at a much lower cost.
As always, OpenAI's naming is really bad: GPT-5.4 Nano is a different model; it's not a straight upgrade from GPT-5 Nano.
- Dumb question: why does Pro make a worse pelican than Flash?
- Perhaps the capital markets do not want to see it because they do not want to ruin o&a's IPO?
- Tokens are cheap. LLMs are fast. Pre-processing and post-processing are the real bottlenecks. I know you are going to ask why not use LLMs for that too. Complexity in an end-to-end workflow is a zero-sum game: if you throw more of the workflow at the LLM, more complexity comes back to you in the steps you still need to do on your own. If you keep only 10% of the work for yourself, that 10% is going to be 10 times more complex and fast-paced than what you usually do.
- Strangely, the V4 Flash pelican looks better than the V4 Pro one.
In my tests[0], V4 Flash actually does slightly better and for a lot cheaper than V4 Pro, mostly because it reasons twice as much.
[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...
- I'm not sure I'd call it "almost on the frontier," but I do think that v4 Pro is the most usable coding model I've seen out of China. I've used it via Ollama Cloud (coding) and OpenRouter (data processing). Feels Sonnet-level to me -- solid at implementation when given a specification, but falls a good bit short of Opus 4.7 max thinking when planning out larger changes or when given open-ended prompts.
- DS V4 Pro has rocked. ~250 million tokens through their API have cost me about $10, and some of that was at the non-discount rate. It would be ~$40 at the non-discount rate. I have yet to have a single request feel slow or get rejected.
I've used K2.6, GLM5.1, and DSV4 all a good amount. They're all very impressive, but DSV4 has taken the cake.
- In my experience V4 is pretty good, but for very hard problems it burns so many tokens that it ends up not being so cheap anymore. I'm working on a compiler, and the tasks are very involved. Tests won't pass unless it gets things absolutely right. 5.5 can achieve more in less time than V4 for me.
- There are so many login-free models now that most people will not even try DeepSeek if the access requires a login.
by twothreeone
1 subcomment
- For a solo dev, sure... but isn't there a huge privacy difference between Anthropic and DeepSeek APIs as well? I assumed part of the cost for Anthropic was essentially a privacy premium (plus they offer B2B).
- DeepSeek is very good at design and debugging, but it lacks the modern-tech feel that Gemini has.
by koala-news
0 subcomments
- Its cost is relatively low, making it very cost-effective.
- Has anybody used V4 hard, for the most challenging tasks (agentically, locally)? It's so hard to compare without putting serious time into it. Like spending a year daily with the model.
by aucisson_masque
0 subcomments
- From my testing, it's just as good as Claude Sonnet for a fraction of the price.
by makerofthings
0 subcomments
- Anybody know how much RAM you would need in a Mac to run the Pro model?
by fagnerbrack
0 subcomments
- I use it in readplace... oh boy, it's SO good and cheap for summaries!
by chaosprint
0 subcomments
- I wonder if those models already knew this pelican test...
by alfiedotwtf
0 subcomments
- … waiting patiently for llama.cpp support to land
by tomchui157
0 subcomments
- I wanna see people fine-tuning it.
- Here is a comparison for SVG generation for the top models: https://codeinput.com/s/5KEGl1e3rB3
OpenAI has GPT-5.5 Pro, whose only difference, I think, is the price. Billing is from OpenRouter, but the breakdown is roughly:
- GPT-5.5 Pro: so expensive it makes no sense (around $2)
- Gemini/Opus: $0.2/$0.1. Opus is cheaper as it consumed fewer tokens
- DeepSeek/GLM: $0.019/$0.021, 5-10 times cheaper than Gemini and Opus
The example Simon generated just shows that larger models don't necessarily produce better results.
by forrestthewoods
1 subcomment
- Naive question: is DeepSeek V4 actually cheaper to run? Or is it cheaper for other reasons? For example, Anthropic running at a higher margin, or DeepSeek at a larger loss?
- Does it censor mentions of what happened in Tiananmen Square in 1989?
- If I want to run coding prompts with the biggest DeepSeek model on CPU, what order of time will I have to wait: hours, days?
- My default model now; less censorship.
- The V3/R1 era and now are in such contrast. V3/R1 were hyped hard and barely usable for coding. V4 is much less hyped, but (anecdotally) it has completely demolished all the Flash/Lite/Spark models.
- https://www.reddit.com/r/Hugston/comments/1t1mk0j/comparison...
by tomjuggler
0 subcomments
- So I'm involved in an open-source AI CLI coding assistant called Cecli (cecli.dev), which is specifically designed to work well with DeepSeek.
DeepSeek is a great model, and Cecli is all about efficiency. It works great for my purposes: agentic programming on a budget.
by grassfedgeek
4 subcomments
- The credit for DeepSeek, in part, goes to US companies such as OpenAI [1] and Anthropic [2]. Portions of DeepSeek are based on their products.
[1] https://www.reuters.com/world/china/openai-accuses-deepseek-...
[2] https://x.com/AnthropicAI/status/2025997928242811253