FRESH

Hacker News

Home

“Erdos problem #728 was solved more or less autonomously by AI”

613 points by cod1r

by maxwells-daemon

13 subcomments

I work at Harmonic, the company behind Aristotle.
To clear up a few misconceptions:
- Aristotle uses modern AI techniques heavily, including language modeling.
- Aristotle can be guided by an informal (English) proof. If the proof is correct, Aristotle has a good chance at translating it into Lean (which is a strong vote of confidence that your English proof is solid). I believe that's what happened here.
- Once a proof is formalized into Lean (assuming you have formalized the statement correctly), there is no doubt that the proof is correct. This is the core of our approach: you can do a lot of (AI-driven) search, and once you find the answer you are certain it's correct no matter how complex the solution is.
Happy to answer any questions!

by MyFirstSass

9 subcomments

Based on Tao’s description of how the proof came about - a human is taking results backwards and forwards between two separate AI tools and using an AI tool to fill in gaps the human found?
I don’t think it can really be said to have occurred autonomously then?
Looks more like a 50/50 partnership with a super expert human one the one side which makes this way more vague in my opinion - and in line with my own AI tests, ie. they are pretty stupid even OPUS 4.5 or whatever unless you're already an expert and is doing boilerplate.
EDIT: I can see the title has been fixed now from solved to "more or less solved" which is still think is a big stretch.

by observationist

5 subcomments

Reconfiguring existing proofs in ways that have been tedious or obscured from humans, or using well framed methods in novel ways, will be done at superhuman speeds, and it'll unlock all sorts of capabilities well before we have to be concerned about AGI. It's going to be awesome to see what mathematicians start to do with AI tools as the tools become capable of truly keeping up with what the mathematicians want from the tools. It won't necessarily be a huge direct benefit for non-mathematicians at first, because the abstract and complex results won't have direct applications, but we might start to see millenium problems get taken down as legitimate frontier model benchmarks.
Or someone like Terence Tao might figure out how to wield AI better than anyone else, even the labs, and use the tools to take a bunch down at once. I'm excited to see what's coming this year.

by svat

1 subcomments

For context, Terence Tao started a wiki page titled “AI contributions to Erdős problems”: https://github.com/teorth/erdosproblems/wiki/AI-contribution... (as mentioned in an earlier post https://mathstodon.xyz/@tao/115818402639190439) — even relative to when he started this page less than two weeks ago (Dec 31), the current result (for problem [728]) represents a milestone: it is the first green in Section 1 of that wiki page.

by lwansbrough

11 subcomments

Can anyone with specific knowledge in a sophisticated/complex field such as physics or math tell me: do you regularly talk to AI models? Do feel like there's anything to learn? As a programmer, I can come to the AI with a problem and it can come up with a few different solutions, some I may have thought about, some not.
Are you getting the same value in your work, in your field?

by tachim

3 subcomments

You can try out Aristotle yourself today https://aristotle.harmonic.fun/. No more waitlist!

0 subcomment

by D-Machine

5 subcomments

This is great, there is still so much potential in AI once we move beyond LLMs to specialized approaches like this.
EDIT: Look at all the people below just reacting to the headline and clearly not reading the posts. Aristotle (https://arxiv.org/abs/2510.01346) is key here folks.
EDIT2: It is clear much of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone) and if you aren't training on language (e.g. AlphaFold, or Aristotle on LEAN stuff), it isn't a "language" model.

by spopejoy

0 subcomment

Much of the discussion here seems focused on the Lean part/correctness, but it sure looks like for Tao its the rapid iteration on the _paper_ that's the important part:
> ... to me, the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument.
> This is sharp contrast to existing practice where the effort required to produce even one readable manuscript is quite time-consuming, and subsequent revisions (in response to referee reports, for instance) are largely confined to local changes (e.g., modifying the proof of a single lemma), with large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors. However, the combination of reasonably competent AI text generation and modification capabilities, paired with the ability of formal proof assistants to verify the informal arguments thus generated, allows for a much more dynamic and high-multiplicity conception of what a writeup of an argument is, with the ability for individual participants to rapidly create tailored expositions of the argument at whatever level of rigor and precision is desired.
Of course this implies that the math works which is the Aristotle part, and that's great ... but this rebuts the "but this isn't AI by itself, this is AI and a bunch of experts working hard, nothing to see here": right, well even "experts working hard" fail to iterate on the paper which significantly hinders research progress.

by markusde

0 subcomment

Very cool to see how far things have come with this technology!
Please remember that this is a theorem about integers that is subject to a fairly elementary proof that is well-supported by the existing Mathlib infrastructure. It seems that the AI relies on the symbolic proof checker, and the proofs that it is checking don't use very complex definitions in this result. In my experience, proofs like this which are one step removed from existing infra are much much more likely to work.
Again though, this is really insanely cool!!

by snowmobile

0 subcomment

Digging through the PDFs on Google Drive, this seems to be (one of) the generated proofs. I may be misunderstanding something, but 1400 lines of AI-generated code seems a very good place for some mistake in the translation to sneak in https://github.com/plby/lean-proofs/blob/main/src/v4.24.0/Er...
Though I suppose if the problem statement in Lean is human-generated and there are no ways to "cheat" in a Lean proof, the proof could be trusted without understanding it

by libraryofbabel

3 subcomments

2026 should be interesting. This stuff is not magic, and progress is always going to be gradual with solutions to less interesting or "easier" problems first, but I think we're going to see more milestones like this with AI able to chip away around the edges of unsolved mathematics. Of course, that will require a lot of human expertise too: even this one was only "solved more or less autonomously by AI (after some feedback from an initial attempt)".
People are still going to be moving the goalposts on this and claiming it's not all that impressive or that the solution must have been in the training data or something, but at this point that's kind of dubiously close to arguing that Terence Tao doesn't know what he's talking about, which to say the least is a rather perilous position.
At this point, I think I'm making a belated New Years resolution to stop arguing with people who are still staying that LLMs are stochastic parrots that just remix their training data and can never come up with anything novel. I think that discussion is now dead. There are lots of fascinating issues to work out with how we can best apply LLMs to interesting problems (or get them to write good code), but to even start solving those issues you have to at least accept that they are at least somewhat capable of doing novel things.
In 2023 I would have bet hard against us getting to this point ("there's no way chatbots can actually reason their way through novel math!"), but here we are are three years later. I wonder what comes next?

by cultofmetatron

1 subcomments

I remember seeing a documentary where there was a bit about some guy who' life's work was computing pi to 30 digits. Imagine all that time to do what my computer can do in less than a second + a day or two to write the code using the algorithm he used. 10 min if you use newton's

by zkmon

2 subcomments

When Deep Blue beat Kaspaorov, it was not the end of career for human players. But since mathematics is not a sport with human players, what are the career prospects for mathematicians or mathematics-like fields?

by aziis98

0 subcomment

The erdos problem website tells the theorem is formalized in Lean but on the mathlib project there is just the theorem statement with a sorry. Does someone know where I can find the lean proof? I don't know maybe it's in some random pull request I didn't find.
Edit: Found it here https://github.com/plby/lean-proofs/blob/main/src/v4.24.0/Er...

by mehdi1964

0 subcomment

If AI can rewrite and formalize proofs this way, do we risk losing the human intuition behind the arguments? Or is it just a tool to explore math faster?

by thomasahle

3 subcomments

It took Andrew Wiles 7 years of intense work to solve Fermat's Last Theorem.
The METR institute predicts that the length of tasks AI agents can complete doubles every 7 months.
We should expect it to take until 2033 before AI solves Clay Institute-level problems with 50% reliability.

by kittikitti

0 subcomment

This is a great achievement for AI! I quickly read through the thread but found that Tao's page on Github to be easier to comprehend,
https://github.com/teorth/erdosproblems/wiki/AI-contribution...
It classifies the advancements based on the level of AI input. In particular, the entry in Table 1 related to the original post has both a green and yellow light, reflecting the skepticism from others.

by dnw

0 subcomment

I really want to see if someone can prompt out a more elegant proof of Fermat's Last Theoremthan, compared to that of Wiles's proof.

by dylanz

0 subcomment

Does it work on cryptography? Can it find out the methods behind the fourth Kryptos problem?

0 subcomment

by esafak

1 subcomments

How are academics going to assess AI-coauthored research for appointment and promotion?

by emsign

0 subcomment

Sounds to me the actual work was done in the discussions with ChatGPT by the researchers.

by demirbey05

0 subcomment

https://news.ycombinator.com/item?id=46550836 Another view on that.

by shevy-java

1 subcomments

Skynet 3.0 is annoying.

by MORPHOICES

0 subcomment

[dead]

by maximgeorge

0 subcomment

[dead]

by bgwalter

7 subcomments

[flagged]

by muldvarp

0 subcomment

Everyone who works for a living is about to have a really bad time.

by leggothrow

4 subcomments

This almost implies mathematicians aren’t some ungodly geniuses if something as absolutely dumb as an LLM can solve these problems via blind pattern matching.
Meanwhile I can’t get Claude code to fix its own shit to save my life.