AI Solves Major Erdős Puzzle, Yet Data Reveals Systemic Failure Rates.

GPT-5.2 Pro solves a 44-year puzzle, but new data puts AI's overall success rate on such problems at only one to two percent.

January 18, 2026

Advanced mathematics has been shaken once more by an apparent triumph of artificial intelligence: the frontier language model GPT-5.2 Pro is credited with helping to solve another long-standing problem posed by the legendary Hungarian mathematician Paul Erdős. The breakthrough, on a deep number theory puzzle, marks an escalation in AI's demonstrated capacity for complex, abstract reasoning and proof generation. The excitement is tempered, however, by a reality check from within the mathematical community. New data that tracks AI failures as carefully as AI successes shows that the actual success rate on such challenges remains low, at just one to two percent.
The problem just solved is Erdős Problem #281, a number theory puzzle about covering systems and congruence classes that had remained open for over 44 years since Erdős and Ronald Graham proposed it in 1980. The solution was facilitated by technologist Neel Somani, who used GPT-5.2 Pro to generate a novel proof. The AI's approach drew on sophisticated mathematical tools, notably ergodic theory and the profinite integers equipped with Haar measure. The proof uses the Birkhoff ergodic theorem and Dini's theorem to show that avoiding all congruences leads to density zero, which goes well beyond simple pattern-matching and demonstrates the model's ability to combine advanced, disparate mathematical concepts in a coordinated way[1][2]. Crucially, the final proof was verified by Fields Medalist Terence Tao, one of the world's most accomplished living mathematicians and a leading authority on the Erdős problem set. Tao hailed the result as "perhaps the most unambiguous instance" of an AI contributing to the solution of an open mathematical problem, lending considerable weight to the AI's role[1][2]. The win follows a run of recent AI-assisted successes, including the resolution of Erdős Problems #397, #728, and #729, signaling a clear shift in the technological landscape of mathematical research[3][4][5].
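For readers unfamiliar with the objects involved, the following is a minimal Python sketch of what it means for integers to "avoid" a set of congruences. It is not part of the proof, and Problem #281 concerns infinite sequences of moduli, which this finite toy does not handle. The function name `uncovered_density` and the example system are illustrative assumptions; the example is a well-known textbook covering system. Because membership in finitely many congruence classes repeats with period equal to the least common multiple of the moduli, counting over one period gives the exact density.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+


def uncovered_density(congruences):
    """Exact density of integers lying in none of the classes a (mod n).

    `congruences` is a list of (a, n) pairs. The name and interface are
    illustrative only and do not come from the article's sources.
    """
    # The pattern of covered/uncovered integers repeats with this period.
    period = lcm(*(n for _, n in congruences))
    missed = sum(
        1
        for x in range(period)
        if all(x % n != a % n for a, n in congruences)
    )
    return Fraction(missed, period)


# A textbook covering system with distinct moduli: every integer is hit.
covering = [(0, 2), (0, 3), (1, 4), (5, 6), (7, 12)]
print(uncovered_density(covering))       # 0

# Dropping the last class leaves the residue 7 (mod 12) uncovered.
print(uncovered_density(covering[:-1]))  # 1/12
```

Using `Fraction` keeps the density exact, so a result of zero means the finite system genuinely covers every integer rather than merely coming close.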
Despite the genuine significance of these high-profile victories, Tao and other researchers caution against a skewed public perception of AI's overall competency in this domain. The celebratory announcements of successful solves tend to go viral, obscuring the vast number of attempts that result in failure or inaccurate proofs. Tao himself has been careful to categorize many of the recent AI successes as solving the "lowest-hanging fruit" within the Erdős collection, meaning problems that are solvable with standard techniques and established methods rather than requiring the profound, genuinely novel breakthroughs that characterize true mathematical frontiers[4][2]. The models, he suggests, are increasingly adept at manipulating existing knowledge to solve less complex problems, but still struggle acutely with those requiring fundamental, original conceptual insight. This distinction is vital for understanding the true trajectory of AI in scientific discovery.
This reality check is backed by a new database developed by researchers Paata Ivanisvili and Mehmet Mars Seven. It tracks all documented AI attempts at the Erdős problems, failed as well as successful, rather than only the ones that go viral[2]. Its data contradicts the prevailing public narrative of AI-driven mathematical dominance: the successes cluster around the easiest problems, and the overall success rate of AI attempts sits at a discouraging one to two percent[2]. That figure supplies systemic context that is often lost in the hype around singular, well-marketed achievements. Tracking and publishing negative results is a deliberate effort to correct the distortion that arises when only positive outcomes receive widespread attention, a bias that Tao has explicitly warned against[2]. The database thus offers an objective metric that grounds the debate about Artificial General Intelligence (AGI) in empirical evidence and shows how far current large language models remain from human-level ingenuity in high-level theoretical research.
The lasting implication of this period of AI-assisted discovery may not be the replacement of human mathematicians but the establishment of a new and powerful workflow, with verification as its cornerstone. For the GPT-5.2 Pro proof of Problem #281, as with the other recent successes, the raw proof generated by the language model was subsequently formalized by a specialized theorem-proving system named Aristotle, which converted the informal argument into the Lean verification language[3][4][5]. Lean functions as a 'compiler for truth,' a mechanism that catches subtle logical gaps and ensures that the final result is mathematically sound and verifiable, which makes it the bridge between the AI's probabilistic generation and the human need for certainty[3][4][5]. This shift from pattern-matching to verifiable proof generation, albeit with a low initial success rate, is what matters most for the AI industry and for other fields that require rigorous logical reasoning, such as contract analysis, regulatory compliance, and engineering[4]. With approximately 660 problems still open in the Erdős collection, this model of human-AI collaboration, in which the AI provides a candidate proof and a formal verification system guarantees its rigor, is projected to address a substantial fraction of the remaining open challenges over the coming months and years[5]. The ultimate value of GPT-5.2 Pro therefore lies not in solving every problem but in serving as a semi-autonomous partner for a human researcher, one that can tackle the more accessible tail of complex mathematical challenges and accelerate the pace of discovery, while the new database keeps a clear-eyed view of its systemic limitations.
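To give a sense of what a machine-checked statement looks like, here is a toy Lean 4 sketch. It is not drawn from the Problem #281 formalization or from Aristotle's output; the theorem name is hypothetical, and the sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is built in. It states that five congruence classes together cover every natural number, and Lean accepts the file only if the proof actually goes through.

```lean
-- Toy example only: a small covering system, checked by Lean.
-- Assumes a recent Lean 4 release where `omega` is available without imports.
-- `classic_covering` is an illustrative name, not part of any cited formalization.
theorem classic_covering (n : Nat) :
    n % 2 = 0 ∨ n % 3 = 0 ∨ n % 4 = 1 ∨ n % 6 = 5 ∨ n % 12 = 7 := by
  omega
```

If one of the five classes were deleted from the statement, the claim would be false and the proof would be rejected, which is the kind of certainty the workflow described above relies on.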

Sources