Google DeepMind AI solves legendary math problems that stumped human experts for decades

By pairing language models with formal logic, AlphaProof Nexus solves decades-old mathematical puzzles with unprecedented cost efficiency.

May 25, 2026

Artificial intelligence has crossed a historic threshold in pure mathematics, demonstrating an unprecedented ability to solve original, research-level problems that have stumped human experts for generations. Google DeepMind recently unveiled its AlphaProof Nexus framework, an advanced system that autonomously proved nine open problems posed by the legendary mathematician Paul Erdős. Among these achievements are the solutions to two highly complex conjectures that had remained entirely unresolved for 56 years[1][2]. What makes this milestone particularly remarkable is not just the depth of the mathematics involved, but the efficiency of the computation. DeepMind achieved these breakthroughs at an inference cost of only a few hundred dollars per problem[1][2]. While the system's overall success rate on these notoriously difficult questions sits at a modest 2.5 percent, every proof it does produce is checked step by step by a formal verifier, and that reliability marks a major leap forward for the AI industry[3][4].
The core technical breakthrough of AlphaProof Nexus lies in its rejection of traditional, natural-language approaches in favor of a hybrid system grounded in formal logic. While competitor systems rely primarily on generative natural language to write proofs, they remain highly vulnerable to logical hallucinations that can invalidate an entire mathematical argument[1]. DeepMind bypasses this vulnerability by pairing its Gemini 3.1 Pro large language model with Lean, a rigorous formal proof assistant and programming language[1]. When tasked with a problem, the AI does not merely write out a prose explanation; instead, it generates highly structured proof steps in Lean code[1][5].
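To give a sense of what "proof steps in Lean code" look like, here is a minimal illustration in Lean 4. It is a textbook fact chosen for this article, not one of the problems AlphaProof Nexus solved and not output from the system; the theorem name is made up for the example.

```lean
-- Illustrative only: a textbook statement, not taken from DeepMind's work.
-- The statement says that addition of natural numbers is commutative.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  -- The proof cites a lemma from Lean's core library; Lean checks that the
  -- lemma's statement matches the goal exactly.
  exact Nat.add_comm a b
```

If the proof step did not actually establish the stated goal, Lean would refuse to accept the theorem and report an error, which is the property that separates a formal proof from a persuasive-sounding paragraph of prose.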
This structure enables what researchers call agentic loops, which utilize symbolic feedback to self-correct in real time[1][4]. The generated Lean code is automatically fed into the Lean compiler, which meticulously verifies every single logical step[1][5]. If the compiler detects an error or a logical gap, it generates detailed error messages that are fed directly back to the language model[1][5]. The model then uses this diagnostic feedback to debug, revise, and refine its proof in subsequent attempts[1][5]. To prevent the system from accidentally proving a mathematically flawed statement due to a poor initial translation of the problem, DeepMind also implemented strict guardrails[5]. The agent is required to successfully prove test lemmas checking the early terms of each sequence, ensuring the underlying problem is correctly formalized before wasting compute on an impossible proof[5].
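The loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DeepMind's implementation: the function names, the `CheckResult` type, and the toy stand-ins for the language model (`toy_propose`) and the Lean compiler (`toy_check`) are all hypothetical, and a real system would call an actual model and an actual Lean toolchain in their place.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of one verification run (hypothetical type for this sketch)."""
    ok: bool
    errors: str = ""


def attempt(goal: str,
            propose: Callable[[str, str], str],
            check: Callable[[str], CheckResult],
            max_attempts: int) -> Optional[str]:
    """Propose a proof, verify it, and feed any errors back into the next try."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose(goal, feedback)   # model writes or revises a proof
        result = check(candidate)             # verifier accepts or rejects it
        if result.ok:
            return candidate
        feedback = result.errors              # errors steer the next revision
    return None


def prove_with_feedback(statement: str,
                        sanity_lemmas: list[str],
                        propose: Callable[[str, str], str],
                        check: Callable[[str], CheckResult],
                        max_attempts: int = 8) -> Optional[str]:
    """Run the guardrail lemmas first, then attempt the main statement."""
    for lemma in sanity_lemmas:
        # If a sanity lemma cannot be proved, the formalization is suspect,
        # so stop before spending effort on the main statement.
        if attempt(lemma, propose, check, max_attempts) is None:
            return None
    return attempt(statement, propose, check, max_attempts)


def toy_propose(goal: str, feedback: str) -> str:
    """Stand-in for the model: leaves a gap first, repairs it after an error."""
    body = "sorry" if not feedback else "rfl"
    return f"{goal} := by {body}"


def toy_check(candidate: str) -> CheckResult:
    """Stand-in for the verifier: rejects any proof that still contains a gap."""
    if "sorry" in candidate:
        return CheckResult(False, "error: proof contains 'sorry'")
    return CheckResult(True)


if __name__ == "__main__":
    proof = prove_with_feedback(
        "theorem main : 2 + 2 = 4",
        ["example : 1 + 1 = 2"],
        toy_propose,
        toy_check,
    )
    print(proof)
```

In this toy run the first proposal is rejected, the error message is passed back, and the second proposal is accepted. The sanity lemmas play the role of the guardrail: the main statement is only attempted once the smaller checks have gone through.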
The mathematical significance of these results is anchored in the legacy of Paul Erdős, one of the most prolific and influential mathematicians of the twentieth century[1]. Erdős was famous for posing hundreds of open conjectures across combinatorics, number theory, and graph theory, often attaching small cash bounties to their solutions as a challenge to the global mathematical community[1][4]. DeepMind’s AlphaProof Nexus attempted 353 of these formalized Erdős problems, successfully cracking nine of them, including difficult variants categorized under the Erdős catalog as problems 12, 125, 138, and 741[1][4]. These are not simplified high school competition questions, but active research frontiers where professional mathematicians have struggled to make progress for over half a century[1][4].
In addition to the Erdős problems, the system demonstrated its versatile problem-solving capabilities across several other complex mathematical disciplines[2]. According to DeepMind's published findings, AlphaProof Nexus proved 44 out of 492 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS), representing roughly a nine percent success rate[1][4]. It also settled a 15-year-old open question regarding Hilbert functions in algebraic geometry and discovered a novel algorithmic parameter schedule that improved a known open bound in convex optimization[1][2]. The AI even succeeded in identifying several long-standing misformalizations and ambiguities in existing mathematical literature, showing that its utility extends beyond proving theorems to actively curating the integrity of scientific data[2][5].
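The two headline success rates follow directly from the counts reported above. A short Python check, with a helper name chosen for this example, makes the arithmetic explicit:

```python
def success_rate(solved: int, attempted: int) -> float:
    """Return the share of attempted problems that were solved, in percent."""
    return 100.0 * solved / attempted


# Counts as reported in the article.
erdos_rate = success_rate(9, 353)    # Erdős problems solved / attempted
oeis_rate = success_rate(44, 492)    # OEIS conjectures proved / attempted

print(f"Erdős problems:   {erdos_rate:.2f}%")   # prints 2.55%
print(f"OEIS conjectures: {oeis_rate:.2f}%")    # prints 8.94%
```

Both figures match the rounded values quoted in the text: about 2.5 percent for the Erdős set and roughly nine percent for the OEIS conjectures.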
Architecturally, the success of AlphaProof Nexus signals a profound paradigm shift within the AI industry, moving away from brute-force scale and toward highly optimized agentic design[2]. In analyzing the framework, DeepMind researchers compared their full-featured agent, which integrates evolutionary search algorithms and specialized reinforcement learning theorem provers, with a much simpler, basic agent[2]. Remarkably, the basic agent was also capable of solving all nine of the Erdős problems[6][2]. However, the basic version required significantly more compute, and therefore higher cost, to reach the same conclusions[6][2]. This suggests that the advanced Nexus architecture does not necessarily possess superior raw reasoning capability, but is vastly more efficient at navigating the massive search space of potential proofs[1].
This finding carries major economic implications for the commercialization of advanced artificial intelligence. In an industry where training and running massive frontier models can cost tens of millions of dollars, the ability to solve decades-old scientific mysteries for a few hundred dollars of cloud compute per problem is a game-changer[1]. It suggests that future breakthroughs in AI reasoning will not depend solely on building larger, more expensive neural networks. Instead, substantial progress can be unlocked by designing clever multi-agent workflows, leveraging symbolic compiler feedback, and refining how models iteratively collaborate with specialized verification software[1][2].
Ultimately, the achievements of AlphaProof Nexus represent a pivotal moment in the evolution of machine intelligence, illustrating how AI is transitioning from a passive information synthesizer into an active scientific collaborator[6][5]. While a 2.5 percent success rate on Erdős problems indicates that general mathematical intelligence remains out of reach, it establishes a reliable baseline for automated theorem proving[3][4]. For mathematical proofs, the pairing of neural language models with rigid symbolic compilers addresses the persistent trust problem that has plagued generative AI since its inception, because every accepted result has been formally checked[1][5]. As these systems continue to refine their efficiency and expand their logical boundaries, the scientific community may soon find itself working hand-in-hand with machine agents to conquer the most stubborn intellectual frontiers of the human mind[6][1].

Sources