DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning [pdf]
207 points by fspeech
by dwohnitmok
3 subcomments
Is everyone just glossing over the first-place score of 118/120 on the Putnam?! I mean, we'll see how it does on the upcoming 2025 test, but that's insane!
We've seen absolutely ridiculous progress in model capability over the past year (which is also quite terrifying).
by N_Lens
1 subcomment
The core innovation is a verifier-generator dual architecture that enables the model to self-check reasoning rigor, addressing the fundamental problem that correct answers don't guarantee correct reasoning processes.
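A minimal sketch of a generate-then-verify loop of this kind, assuming the generator and verifier are two prompts to the same model, exposed here as plain callables so the control flow can be run and tested. The names `self_verify_loop`, `generate`, and `verify`, as well as the score threshold and round limit, are hypothetical illustrations, not DeepSeek's released code:

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces: in a real system both would be calls to an LLM
# with different prompts. Here they are plain callables.
Generator = Callable[[str, Optional[str]], str]        # (problem, critique) -> proof
Verifier = Callable[[str, str], Tuple[float, str]]     # (problem, proof) -> (score, critique)


def self_verify_loop(problem: str, generate: Generator, verify: Verifier,
                     threshold: float = 1.0, max_rounds: int = 4) -> Tuple[str, float]:
    """Generate a proof, score it with the verifier, and refine until it passes.

    Returns the best-scoring proof seen and its score.
    """
    critique: Optional[str] = None        # no feedback on the first attempt
    best_proof, best_score = "", -1.0
    for _ in range(max_rounds):
        proof = generate(problem, critique)       # draft or revise using the last critique
        score, critique = verify(problem, proof)  # verifier judges rigor, not just the answer
        if score > best_score:
            best_proof, best_score = proof, score
        if score >= threshold:
            break                                 # verifier found no issues; stop refining
    return best_proof, best_score
```

The loop keeps the best proof seen rather than the last one, so a refinement round that makes things worse cannot overwrite an earlier, higher-scoring attempt.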
by jimmy76615
0 subcomments
Amazing model! I'm trying to get it running on an EC2 machine right now, but it looks like much of the performance depends on more than plain LLM inference, and DeepSeek didn't share their scripts for the parallel thinking traces and self-verification loops. Is anybody else working on recreating this?
by zaxioms
5 subcomments
It's cool, but I genuinely cannot fathom why they are targeting natural language proofs instead of a proof assistant.
by gunalx
3 subcomments
If I read it right, it used multiple samples of itself to verify the accuracy, but isn't this problematic?
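One way to see the concern: if k verifier samples were independent and each wrong with probability p, a majority vote would be wrong only when more than half of them err, and that probability shrinks quickly as k grows. Samples drawn from the same model are not independent, though. The sketch below adds a toy correlation parameter `rho` (an assumption for illustration, not a model from the paper): with probability `rho` all samples share a single verdict, which caps how much extra sampling can help.

```python
from math import comb


def majority_error(p: float, k: int) -> float:
    """Probability that a strict majority of k independent verifiers is wrong,
    when each one is wrong with probability p."""
    if k % 2 == 0:
        raise ValueError("use an odd k to avoid ties")
    need = k // 2 + 1  # number of wrong votes needed for a wrong majority
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(need, k + 1))


def correlated_majority_error(p: float, k: int, rho: float) -> float:
    """Toy mixture model (assumption): with probability rho every sample gives
    the same verdict (wrong with probability p); otherwise samples are independent."""
    return rho * p + (1 - rho) * majority_error(p, k)
```

For example, with p = 0.2 and k = 3, independent samples give a majority error of 0.104, while fully correlated samples (`rho = 1`) stay at 0.2 no matter how many are drawn.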
by awei
8 subcomments
Something seems weird here. Why is it so hard to have a deterministic program check a proof, or anything else math-related? Math is deterministic in a way natural language is not, so from first principles it should be possible to do this without an LLM verifier.
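For formally stated mathematics such a checker does exist: a proof assistant's kernel checks proofs mechanically. A minimal Lean 4 illustration using only the core library (the examples are chosen here for illustration): Lean accepts these, and a term that does not actually prove the stated claim is rejected as a type error. Presumably the hard part is not the checking but translating an informal competition problem and its natural-language proof into statements like these, which is the gap an LLM verifier of natural-language proofs sidesteps.

```lean
-- Checked by Lean's kernel: deterministic, no model judgment involved.
example : 2 + 2 = 4 := rfl

-- Uses the core lemma Nat.add_comm; supplying a wrong proof term here
-- would fail to type-check.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```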
by agentultra
1 subcomment
So it's designed for informal proofs and it "verifies" based on a rubric fitting function and human interaction, is that right?
What's the use case for a system like this?
by mekpro
1 subcomment
How does this improvement translate into real-world agentic coding tasks?
by photon_lines
0 subcomments
Exciting stuff from a fantastic team.
by createaccount99
0 subcomments
This habit of making advertising repos on GitHub confounds many.
by newyankee
0 subcomments
That is amazing if they can do all of this at < 10% of the cost of the frontier labs. Of course, they work in the shadow of the great work done and shared by the frontier labs, but there is some exceptional, high-speed execution happening behind the scenes. This is clearly a race, but one where China is happy to be #2 as long as the gap is not significant and the costs are reasonable.