If we disregard programming and just look at formalizing math (Christian Szegedy has been working on it for a long time now), the length of proofs being formalized is growing exponentially, and there's a good chance that in 2026 close to 100% of big/important human-written proofs will be translated into Lean and verified.
Just as an example for programming / modelling cache lines and cycle counts: we have quite good models for lots of architectures (even quite good reverse-engineered models of NVIDIA GPUs in some papers). The problem is that calculating exact numbers for cache reads/writes is tedious bookkeeping full of architecture-specific constants, and whenever we change the model a little bit the calculations have to be redone.
It's a lot of tedious constraint solving, and the main bottleneck when I was trying to do it by hand was that I couldn't just trust the output of LLMs.
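To make the flavor of that bookkeeping concrete, here is a minimal sketch of such a cost model; every constant is an illustrative stand-in, not taken from any real datasheet or from the commenter's actual model:

    // Hypothetical cost model for a strided read over an array.
    // All constants are illustrative, not from a real architecture.
    const CACHE_LINE_BYTES: u64 = 64;
    const HIT_CYCLES: u64 = 4;    // assumed L1 hit latency
    const MISS_CYCLES: u64 = 200; // assumed miss-to-DRAM latency

    /// Estimated cycles to read `n` elements of `elem_size` bytes with a
    /// stride of `stride` elements, assuming a cold cache and no
    /// prefetching -- exactly the assumptions that must be revisited
    /// every time the model changes.
    fn strided_read_cycles(n: u64, elem_size: u64, stride: u64) -> u64 {
        let bytes_per_access = elem_size * stride;
        // Accesses that land in the line fetched by the previous miss:
        let hits_per_miss =
            (CACHE_LINE_BYTES / bytes_per_access.max(1)).max(1) - 1;
        let group = hits_per_miss + 1;
        let misses = n / group + if n % group != 0 { 1 } else { 0 };
        misses * MISS_CYCLES + (n - misses) * HIT_CYCLES
    }

    fn main() {
        // 1024 contiguous u64 reads: 8 accesses per 64-byte line.
        println!("{} cycles", strided_read_cycles(1024, 8, 1));
    }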
Not that relevant in context, as the code in question is used to conclude a formal proof, not the other way around. But hey, it is a common quote when talking about proving software and someone has to do it...
Context: https://staff.fnwi.uva.nl/p.vanemdeboas/knuthnote.pdf
Type errors, especially once you have designed your types to be correct by construction, are extremely useful for LLMs. Once you have the foundation correct, the model just has to wiggle through that narrow gap until it finds something that fits.
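As one illustration of "correct by construction" (my toy example, not the parent's): a type whose invalid states are unrepresentable, so the compiler rejects most wrong completions outright:

    /// A percentage that is valid by construction: the only way to obtain
    /// one is through `new`, which enforces the 0..=100 invariant.
    #[derive(Debug, Clone, Copy)]
    pub struct Percent(u8);

    impl Percent {
        pub fn new(value: u8) -> Option<Percent> {
            if value <= 100 { Some(Percent(value)) } else { None }
        }

        pub fn get(self) -> u8 {
            self.0
        }
    }

    /// Any function taking a `Percent` can rely on the invariant; an LLM
    /// (or a human) cannot hand it a raw out-of-range u8 without the
    /// compiler objecting.
    fn apply_discount(price_cents: u64, discount: Percent) -> u64 {
        price_cents * (100 - discount.get() as u64) / 100
    }

    fn main() {
        let d = Percent::new(20).expect("20 is in range");
        assert_eq!(apply_discount(10_000, d), 8_000);
    }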
But from what I have understood and read so far, I am not convinced by OP's "formal verification". A simple litmus test is to take any recent task from your day job and try to write a formal specification of it. Is it even doable? Reasonable? Does one even exist? For me the most useful kind of verification is verification of the lower-level tools, i.e. data structures, languages, compilers, etc.
For example, the type signature of Vec::operator[usize] in Rust returns T. This cannot be honest, because it cannot guarantee to return a T for ANY usize. To me, panicking is one of the laziest and worst ways to put something in a specification. It means that every single line of Rust code is now able to enter this termination state.
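Concretely (my illustration): the panicking index hides the partiality that `get` makes explicit in the type:

    fn main() {
        let v: Vec<i32> = vec![1, 2, 3];

        // Index<usize> claims `-> i32` but is actually partial:
        // the line below compiles fine and panics at runtime.
        // let x = v[10]; // panic: index out of bounds

        // The honest signature is get(&self, usize) -> Option<&i32>,
        // which forces the caller to handle the missing case.
        match v.get(10) {
            Some(x) => println!("got {x}"),
            None => println!("out of bounds, handled in the types"),
        }
    }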
- Lean will optimize Peano arithmetic into binary bignums under the hood (see the sketch after this list)
- Property-based checking and proof search already exist on a continuum, because counterexamples are a valid (dis)proof technique. This should surprise no writer of tactics.
- The lack of formal specs for existing software should become less of a problem for greenfield software after these techniques go mainstream. People will be incentivized to actually figure out what they want, and successfully doing so vastly improves project management.
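On the first bullet, a quick Lean 4 illustration (assuming a recent toolchain): Nat is defined as a Peano-style inductive type, yet both the compiler and the kernel represent it as an efficient bignum, so large arithmetic stays cheap:

    -- Nat is inductively defined, i.e. Peano numbers:
    --   inductive Nat where
    --     | zero : Nat
    --     | succ (n : Nat) : Nat
    -- Evaluation is still instant, because the runtime and kernel use a
    -- binary bignum representation, not unary succ chains:
    #eval 2 ^ 100  -- 1267650600228229401496703205376

    -- Even kernel-checked proofs by computation remain fast:
    example : 2 ^ 100 > 10 ^ 30 := by decide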
Finally, and most importantly, people who think that there is a "big specification" and then a "big implementation" are totally missing the mark. Remember, tools like Lean are just More Types. When we program with types, do we have a single big type and a single untyped term, paired together? Absolutely not.
As always, the key to productive software development is more and more libraries. Fancier types will allow writing more interesting libraries that tackle the "reusable core" of many tasks.
For example, do you want to write a "polymorphic web app" that can be instantiated with an arbitrary SQL schema? Ideas like that become describable.
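A hand-wavy sketch of what "polymorphic over a schema" could look like at the type level; every trait and type name here is hypothetical, not from any real framework:

    use std::collections::HashMap;

    /// Hypothetical: a table description the app is generic over.
    trait Table {
        /// The Rust type of one row.
        type Row;
        fn name() -> &'static str;
        fn parse_row(fields: &HashMap<String, String>) -> Option<Self::Row>;
    }

    /// A "web app" whose CRUD handlers are derived from any `Table`.
    struct CrudApp<T: Table> {
        rows: Vec<T::Row>,
    }

    impl<T: Table> CrudApp<T> {
        fn new() -> Self {
            CrudApp { rows: Vec::new() }
        }
        /// e.g. the handler behind POST /<table name>
        fn create(&mut self, fields: &HashMap<String, String>) -> bool {
            match T::parse_row(fields) {
                Some(row) => { self.rows.push(row); true }
                None => false,
            }
        }
    }

    // One concrete instantiation:
    struct Users;
    struct User { name: String }

    impl Table for Users {
        type Row = User;
        fn name() -> &'static str { "users" }
        fn parse_row(fields: &HashMap<String, String>) -> Option<User> {
            fields.get("name").map(|n| User { name: n.clone() })
        }
    }

    fn main() {
        let mut app = CrudApp::<Users>::new();
        let mut fields = HashMap::new();
        fields.insert("name".to_string(), "ada".to_string());
        assert!(app.create(&fields));
        println!("{} row(s) in table {}", app.rows.len(), Users::name());
        println!("first user: {}", app.rows[0].name);
    }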
This really resonates. We can write code a lot faster than we can safely deploy it at the moment.
“It affects point number 1 because AI-assisted programming is a very natural fit fot specification-driven development.”
made me smile. Reading something hand-made that hadn't been through the filters and presses of modern internet writing.
But you don't really need complete formal verification to get these benefits. TDD gets you a lot of them as well. Perhaps your verification is less certain, but it's much easier to get high automated test coverage than it is to get a formally verifiable codebase.
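Property-based testing pushes plain tests a step closer to specs; a small sketch using the quickcheck crate (one crate option among several, and the property is my own toy example):

    // Assumes quickcheck = "1" in Cargo.toml.

    /// Function under test: sort and remove duplicates.
    pub fn dedup_sorted(mut v: Vec<u32>) -> Vec<u32> {
        v.sort();
        v.dedup();
        v
    }

    #[cfg(test)]
    mod tests {
        use super::dedup_sorted;
        use quickcheck::quickcheck;

        quickcheck! {
            // A spec-like property instead of hand-picked cases: for ANY
            // input, the output is strictly increasing (sorted, no dups).
            fn output_sorted_and_unique(v: Vec<u32>) -> bool {
                let out = dedup_sorted(v);
                out.windows(2).all(|w| w[0] < w[1])
            }
        }
    }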
I think AI assisted coding is going to cause a resurgence of interest in XP (https://en.wikipedia.org/wiki/Extreme_programming) since AI is a great fit for two big parts of XP. AI makes it easy to write well-tested code. The "pairing" method of writing code is also a great model for interacting with an AI assistant (much better than the vibe-coding model).
AI will make formal verification go mainstream
All the problems mentioned in the article are serious. They're also easier than the problem of getting an AI to automatically prove at least hundreds of correctness properties on programs that are hundreds of thousands, if not millions, of lines long. Bringing higher mathematics into the discussion is also unhelpful. Proofs of interesting mathematical theorems require ingenuity and creativity that isn't needed in proving software correct, but they also require orders of magnitude fewer lemmas and inference steps than software proofs do. We're talking 100-1000 lines of proof per line of program code.
I don't know when AI will be able to do all that, but I see no reason to believe that a computer that can do that wouldn't also be able to reconcile the formal statements of correctness properties with informal requirements, and even match the requirements themselves to market needs.
> This makes formal verification a prime target for AI-assisted programming. Given that we have a formal specification, we can just let the machine wander around for hours, days, even weeks.
Is this sentiment completely discounting that there can be many possible ways to write a program that satisfies certain requirements, all of which have correct outputs? Won't many of these be terrible in terms of performance, time complexity, etc.? I know that in the most trivial case, AI doesn't jump straight to O(n^3) solutions or anything, but there's also no guarantee it won't have bugs that degrade performance as long as they don't interfere with technical correctness.
Also, are we pretending that having Claude spin for “even weeks” is free?
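To make the first point concrete, here are two functions (my toy example) that satisfy the same functional spec, so a proof against that spec accepts either, yet they differ asymptotically:

    /// Spec: returns true iff `x` occurs in `xs` (xs sorted ascending).
    /// O(log n): what you would want.
    fn contains_fast(xs: &[u64], x: u64) -> bool {
        xs.binary_search(&x).is_ok()
    }

    /// Exactly the same spec, also provably correct, but O(n):
    /// functional correctness alone cannot tell these two apart.
    fn contains_slow(xs: &[u64], x: u64) -> bool {
        xs.iter().any(|&y| y == x)
    }

    fn main() {
        let xs: [u64; 5] = [1, 3, 5, 7, 9];
        assert_eq!(contains_fast(&xs, 7), contains_slow(&xs, 7));
        assert_eq!(contains_fast(&xs, 4), contains_slow(&xs, 4));
    }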
Some software needs formal verification, but all software needs testing.
On another subject...
> Tests are great at finding bugs ... but they cannot prove the absence of bugs.
I wish more people understood this.
- A formal spec can be used to derive randomly generated tests attempting to find counterexamples to spec invariants (which the article hinted at but didn't describe explicitly)
- A formal spec can be used as input to a model checker, which will try to find counterexamples to spec invariants via bounded exploration of the model's state space
- A formal spec can be used to find spec invariant violations by analyzing traces from production systems
What all these examples have in common is that they attempt to falsify, rather than verify, the claim that the spec accurately describes the desired design properties or the actual implementation. That tends to be much more feasible than an actual proof.
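The third bullet is often the cheapest to adopt; here is a toy sketch of checking one invariant over a recorded trace (the event type and invariant are made up for illustration):

    /// Hypothetical event log from a production system.
    enum Event {
        Lock { id: u32 },
        Unlock { id: u32 },
    }

    /// Spec invariant: a resource is never unlocked unless held, and
    /// never locked twice. Returns the index of the first violation.
    fn check_trace(trace: &[Event]) -> Option<usize> {
        let mut held = std::collections::HashSet::new();
        for (i, e) in trace.iter().enumerate() {
            let ok = match e {
                Event::Lock { id } => held.insert(*id),
                Event::Unlock { id } => held.remove(id),
            };
            if !ok {
                return Some(i); // counterexample: falsified, not verified
            }
        }
        None
    }

    fn main() {
        let trace = [
            Event::Lock { id: 1 },
            Event::Unlock { id: 1 },
            Event::Unlock { id: 1 }, // double unlock
        ];
        assert_eq!(check_trace(&trace), Some(2));
    }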
To me, this reads as an insurmountably high hurdle for the application domain. We're talking about trying to verify systems which are produced very quickly by AIs. If the verification step is glacially slow (which, by any measure, a million cycles to add two integers is), I don't see how this could be considered a tractable solution.
Why can't we just prove theorems about the standard two's complement integers, instead of Nat?
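At least in Lean 4 you can: the core library ships fixed-width bitvectors, so two's complement facts can be stated directly (a small sketch, assuming a recent toolchain where BitVec is in core):

    -- INT32_MAX + 1 wraps to INT32_MIN in 32-bit two's complement:
    example : (2147483647 : BitVec 32) + 1 = -2147483648 := by decide

    -- Unsigned wraparound is the same bit pattern, read differently:
    example : (0xFFFFFFFF : BitVec 32) + 1 = 0 := by decide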
How I learned to deploy faster.
This nonsense again. No. No it isn’t.
I’m sure the people selling it wish it was, but that doesn’t make it true.
...but we will be modelling those 5-10kLOC modules across multiple services that do critical business logic or distributed transactions. This was unthinkable a couple of months ago, and today it is a read-only-Friday experiment away (try it with a frontier model and you'll be surprised).
Thanks for the article. Perhaps you could write a follow-up article or tutorial on your favored approach, Verification-Guided Development? This is new to most people, including myself, and you only briefly touch on it after spending most of the article on what you don't like.
Good luck with your degree!
P.S. Some links in your Research page are placeholders or broken.
1: https://blog.regehr.org/archives/482 (there were many issues here, not just with CompCert)