Hacker News

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning [pdf]

207 points by fspeech

by dwohnitmok

3 subcomments

Is everyone just glossing over the first place score of 118/120 on the Putnam?! I mean we'll see how it does on the upcoming 2025 test, but that's insane!
We've seen absolutely ridiculous progress in model capability over the past year (which is also quite terrifying).

by N_Lens

1 subcomments

The core innovation is a verifier-generator dual architecture that enables the model to self-check reasoning rigor, addressing the fundamental problem that correct answers don't guarantee correct reasoning processes.

by jimmy76615

0 subcomment

Amazing model! I'm trying to get it to run on an EC2 machine right now, but it looks like a lot of the performance depends on more than just classical LLM inference. It also looks like DeepSeek didn't share their scripts for the parallel thinking traces and self-verification loops. Is anybody else working on recreating this right now?
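
For anyone wondering what such a loop might look like, here is a minimal sketch of parallel candidate generation followed by repeated verification and a refinement round. It is not DeepSeek's script: `generate`, `verify`, `score`, `solve`, and every parameter are hypothetical, and the two model calls are replaced by deterministic stubs so the control flow can be run as is.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def generate(problem: str, attempt: int, feedback: str = "") -> str:
    """Hypothetical stand-in for a model call that writes a candidate proof."""
    suffix = " (revised)" if feedback else ""
    return f"proof {attempt} of {problem}{suffix}"


def verify(proof: str, sample: int) -> float:
    """Hypothetical stand-in for one verifier sample; returns a score in [0, 1]."""
    return 1.0 if "(revised)" in proof else 0.5


def score(proof: str, n_verifications: int) -> float:
    """Average several verifier samples for a single proof."""
    return mean(verify(proof, i) for i in range(n_verifications))


def solve(problem, n_candidates=4, n_verifications=3, max_rounds=3, threshold=0.9):
    """Generate candidates in parallel, score them, and retry with feedback."""
    feedback = ""
    best_proof, best_score = "", -1.0
    for _ in range(max_rounds):
        # Parallel "thinking traces": several candidates per round.
        with ThreadPoolExecutor() as pool:
            candidates = list(
                pool.map(lambda i: generate(problem, i, feedback), range(n_candidates))
            )
        # Self-verification: keep the candidate with the highest mean score.
        for proof in candidates:
            s = score(proof, n_verifications)
            if s > best_score:
                best_proof, best_score = proof, s
        if best_score >= threshold:
            break
        # Assumed feedback format; a real verifier would return specific issues.
        feedback = f"verifier score {best_score:.2f}; address the flagged gaps"
    return best_proof, best_score
```

In a real setup the stubs would call an inference endpoint, and the candidate count, verification count, and threshold would need tuning; the values here are placeholders.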

by zaxioms

5 subcomments

It's cool, but I genuinely cannot fathom why they are targeting natural language proofs instead of a proof assistant.

by gunalx

3 subcomments

If I read it right, it used multiple samples of itself to verify the accuracy, but isn't this problematic?

by awei

8 subcomments

Something is weird here. Why is it so hard to have a deterministic program capable of checking a proof, or anything math related? Isn't math super deterministic, while natural language is not? From first principles, it should be possible to do this without an LLM verifier.
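
For contrast, deterministic checking does exist once a proof is written in a formal language. A small Lean 4 illustration (the theorem name is made up for this example; `Nat.add_comm` is from Lean's core library):

```lean
-- The Lean kernel checks this deterministically: it either type-checks or it does not.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The hard part is getting from a natural-language proof to a formal statement like this, which is the gap a natural-language verifier is trying to cover.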

by agentultra

1 subcomments

So it's designed for informal proofs and it "verifies" based on a rubric fitting function and human interaction, is that right?
What's the use case for a system like this?

by mekpro

1 subcomments

by photon_lines

0 subcomment

by createaccount99

0 subcomment

by newyankee

0 subcomment

That is amazing if they can do all of this at less than 10% of the cost of the frontier labs. Of course, they work in the shadow of the great work done and shared by the frontier labs, but there is some exceptionally high-speed execution happening behind the scenes. It shows this is clearly a race, but one where China is happy to be #2 as long as the gap is not significant and the costs are reasonable.