> A clear explanation can be found in Alex Kontorovich’s account of his own learning curve with formalized mathematics. In a nutshell: Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics. It exposes clean APIs and abstractions, without which no autoformalization could take place. By contrast, Math Inc’s autoformalized proof of Viazovska’s results exposes no intelligible interface. Who in their right mind would merge a 200,000-line unaudited vibe-coded blob into the master branch of global human science?
https://davidbessis.substack.com/p/the-fall-of-the-theorem-e...
When a codebase gets too large, you eventually can't understand all of it. Even code I wrote myself, I can't fully grasp it.
In those cases, we usually write tests.
But when tests get too big, we end up writing tests for the tests.
Eventually, it feels like we're heading into an era of proofs for proofs.
For me, this problem usually unfolds like this:
1.I can't trust SDKs or Stack Overflow code.
2.So I write tests.
3.But I can't trust the tests either.
4.So I use test coverage, mutation testing, property testing, and fuzzing.
5.If that's still not enough, I add formal verification.
6.And then the problem becomes: can I trust the verifier?
That's how it ends up. Wouldn't human work shift toward verifying the verification systems themselves?
There are of course all the computer-assisted proofs (see 4 color theorem), as well as the partially-assisted ones (see Viazovska et al on packing problems in dimensions 8, 24). But even finding a solution numerically, then rigorously verifying its properties can leave a lingering sense of incompleteness, of a gap in understanding. I like this one quote by (allegedly) Wigner that illustrates it well:
"It is nice to know that the computer understands the problem, but I would like to understand the problem, too."
In particular, until now, mathematics were one of the rare sciences were great scientists could emerge from any country with a good education system.
With the raise of strong AI tools, only scientists in rich countries with access to those tools might be able to advance faster on the most difficult problems like the millennium problems.
Mathematics might become like experimental sciences were you need to build expensive machines to make further progress, such as nuclear fusion.
Actually, even now, the strongest models in mathematics are only available to a few engineers and a few mathematicians selected by Openai and Google.
If it's utility, then why doesn't AI deliver utility? Isn't this an argument for AI?
One can expand this to various kinds of utility. Is knowing mathematical statements are true useful? In the ability to produce proofs useful? Does being able to prove carry over to other kinds of reasoning? Is producing a cadre of math-capable people useful in wartime? Is national prestige of value? There's a pro-AI reading of all these.
It it's beauty not utility, then why isn't math funding over with art funding, along with hills covered with acres of fabric, motorized embalmed animal carcasses, and religious symbols in bottles of urine?
Priests were interpreting the oracles (at least at a place like Delphi) according to the context of the people asking the questions aka participating in politics of that ancient times.
Subjectivity was a feature and I’m not sure that fits to mathematics though.
I wonder if mathematics as a science field moves more into engineering or if a different branch will emerge that is closer to that because to the point of the article, science is about understanding not just results.
I've been in the software industry for 30 years and I understand that sentiment perfectly on a personal level. However, also having used ChatGPT for the first time this year to help solve a technical problem, I was surprised to learn that feeling didn't go away. In fact, it was because I leaned on my experience and technical understanding to get to the point where I realized I needed help that I decided to try a new tool. I didn't feel any shame or disappointment in myself, rather I felt excited to learn something new that came out of the solution. It sent me on a new path of learning.
Now, that was research and not implementation. While plenty of code options were presented by ChatGPT, I analyzed the solution and educated myself from it. My final fix looked very different from the proposal because I do things a certain way, based on experience and learning from my many mistakes. In this scenario I was the secondary verification. My peers the tertiary. AI made the proposal, humans did the verification and could not have done so without cumulative knowledge and experience.
I have to assume that all fields utilizing AI will remain as they have always been, human education and experience will come first no matter what the tools available, because we are the ones impacted by the data produced by AI. As many math-oriented commenters here have already noted, human verification is a necessity and to do that, you must understand the discipline within which the data is being produced.
Personally, the idea of reaching solutions in math and computing (for example) exponentially beyond human capacity is exciting; I want certain answers before I die! But it still must be human-verified and the solutions should be for humans, not for machines, and not for "time to market". Repositories full of unvetted AI-generated code is bad enough, but once you start engineering structures, spacecraft and medicine strictly with AI, well...
It is precisely NOT about big questions but rather potentially covering and thus cornering away all the "small" questions.
PS: already argued for something similar in recent Ergos related work, see comment history.
If mathematics is human understanding of logical consequences, understanding is the priority. But if AI proves something we can't understand but can utilize, that is a different sort of useful.
We are getting awfully close to "the answer of the universe is 42" and having it not be a joke...
The quality of the mathematics is a function of who has authored it?
Is direct experience and struggle really the only driver for developing intuition?
> Some worry about the accessibility of AI tools. Traditionally, mathematicians have required little more than intuition, training, and a pen and paper to advance their field. If this slow, deliberative process is no longer valued by society, and particularly by research funders, then mathematics could become an elitist activity, only practiced by select organizations that can afford to work with proprietary AI models.
This can be true of anything LLMs do better than existing options. Because the best LLMs require enormous resources to develop, access to them can be very limited. Right now they are priced for market share. What happens when your small law firm attorney, or self-representation, goes up against a large firm with LLM expertise? Can the kid from the poor high school compete with the kid from the rich school with premium LLM access, in mathematics for example?
It would be great if someone could explain to me how AI improves this situation. Even if AI thinks it’s solved a problem, unless the proof is incredibly efficient and well explained, it will be difficult to verify the correctness. One hallucination in 300 steps of logic is enough to destroy the entire proof.
>> In February, for example, the AI company Math, Inc. used its aspirationally named reasoning agent Gauss to formalize a proof that had earned the mathematician Maryna Viazovska, of EPFL, in Switzerland, a Fields Medal in 2022. Gauss first helped human mathematicians complete the formalization of Viazovska’s solution to the 8-dimensional sphere-packing problem in a matter of days, and then autonomously formalized the more complicated 24-dimensional case in just two weeks.
Gauss "autonomously" formalized? There are no autonomous AI systems! Even the most performant agentic harnesses require extensive human guidance and cannot complete substantial problems without some user interaction. Indeed, if I read the Math, Inc. pages linked in the article I find that Gauss is a typical agentic harness that relies on human scaffolding and interaction and its vaunted autonomy is currently only speculative:
>> Gauss offers a glimpse of how formalization will scale into the future. Currently, it relies on natural language scaffolding supplied by human mathematicians, and requires high-level expert guidance and development on that scaffolding. We anticipate future iterations of Gauss to be more capable and autonomous.
Or take this bit from the Math, Inc. pages again:
>> In two weeks, Gauss then autoformalized the 24-dimensional case using only the original paper as input, performing autonomous literature searches when needed. This brought the total sphere packing formalization from 70k to ~200k lines.
https://www.math.inc/sphere-packing
Gauss took only the original paper as input? Really? No initial prompts? No scaffolding? Just the paper and off it went? But then, the future feared by the young mathematicians at the 12th Heidelberg Laureate Forum, has already come to pass:
>> Speakers described a future in which superhuman AI mathematicians transcend human knowledge and capabilities: forming conjectures, searching solution spaces, proving conjectures, and finally verifying the proofs and generalizing the results, all without human involvement.
Hold the press for that is already happening: Gauss, the autonomous agent for autoformilization, is already capable of running completely autonomously with "only a paper as input"!
This is not serious. It's some kind of over-exaggerated mumbo-jumbo that fails to impart any useful information and makes it impossible to tell what advances have really been made and where things still can go. But perhaps that's exactly the point? Unbelievable.