ChatGPT 5.5 Pro achieves landmark breakthrough by solving original doctoral-level math problems autonomously
Fields Medalist Timothy Gowers reveals how ChatGPT 5.5 Pro independently solved open problems, marking a milestone in autonomous mathematical research.
May 9, 2026

In a landmark moment for both computer science and higher mathematics, Sir Timothy Gowers, a Fields Medalist and professor at the University of Cambridge, has announced that OpenAI’s latest model, ChatGPT 5.5 Pro, successfully produced original mathematical research of doctoral quality. The achievement, which occurred in less than two hours and without any guiding human intervention, has sent shockwaves through the academic community. Gowers reported that the model solved an open problem in number theory, improving an existing mathematical bound from an exponential scale to a polynomial one.[1][2] This specific type of advancement—reducing the complexity of a bound—is often the centerpiece of a PhD thesis or a major peer-reviewed journal article. By demonstrating that an artificial intelligence can bridge this gap independently, the experiment suggests that large language models have moved beyond mere calculation and into the realm of creative, structural reasoning.
The experiment was conducted using problems recently proposed by Melvyn Nathanson in a paper regarding additive number theory.[2] Nathanson, a veteran in the field, is known for identifying problems that eventually become central to mathematical trends.[3][4] Gowers, who has long been interested in the intersection of AI and formal logic, decided to see if the Pro version of ChatGPT 5.5 could handle genuine, unsettled questions rather than textbook exercises. The model was tasked with investigating a specific combinatorial parameter where the best known upper bound was exponential.[2] After roughly 17 minutes of internal processing, the model produced a construction that yielded a quadratic upper bound, which Gowers described as clearly the best possible result for that specific case.[2][1] What followed was an even more complex challenge involving work by Isaac Rajagopal, a researcher at the Massachusetts Institute of Technology. Rajagopal had previously established an exponential bound for a related problem, and the AI was asked to improve upon it. Through a few iterations of self-correction, the model not only refined the bound but pushed it all the way to a polynomial result, a feat that typically requires a deep, intuitive understanding of combinatorial structures.
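The article does not reproduce the specific statements, but the shape of the improvement can be sketched in general terms. In the display below, $f(n)$, the constants $C$, $c$, $C'$, and the variable $n$ are illustrative stand-ins, not Nathanson's actual notation:

```latex
% Illustrative only: f(n) stands in for the combinatorial parameter.
% Before: the best known upper bound grew exponentially in n.
% After the model's construction: a quadratic (hence polynomial) bound.
\[
f(n) \;\le\; C \cdot 2^{cn}
\quad\longrightarrow\quad
f(n) \;\le\; C' n^{2},
\qquad C, c, C' > 0.
\]
% The gap is asymptotically enormous: n^2 / 2^{cn} -> 0 as n -> infinity.
```

A quadratic bound of this form is "best possible" when a matching construction shows the parameter really does grow at least quadratically, which is the sense in which Gowers described the model's answer as optimal for that case.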
What makes this development particularly significant is the nature of the "idea" generated by the AI. Rajagopal, upon reviewing the output, stated that the model’s core strategy was completely original and clever, noting it was the sort of insight a human mathematician would be proud to have achieved after weeks of intense deliberation. This marks a departure from the "stochastic parrot" critique often leveled at language models, which suggests they only recombine existing text without understanding. In this instance, the model did not simply retrieve a known proof from its training data, as the specific problem had not been solved in the literature. Instead, it swapped out a component of the existing proof for a more efficient variant known in other areas of combinatorics, a cross-disciplinary application that represents the high-water mark of mathematical creativity. Gowers emphasized that his own contribution to the process was zero; he did not provide hints or steer the logic, but merely acted as a witness to the model's autonomous reasoning process.[1]
The speed and efficiency of the research process have introduced a new variable into the equation of scientific discovery. The transition from identifying the problem to generating a full LaTeX preprint took the model just over 31 minutes. In a traditional academic setting, such a task would involve months of reading, drafting, and manual verification. The model even handled the technical "checking" of its own conjectures, reporting optimism about certain statements before taking nine minutes to verify them through an internal chain-of-thought process. This rapid-fire delivery of verified results suggests that the "System 2" reasoning capabilities—long-form, slow, and logical thinking—have been significantly refined in the 5.5 Pro architecture. For the AI industry, this validates the shift toward inference-time scaling, where models are given more computational resources to "think" before they speak, allowing them to navigate the vast tree of logical possibilities inherent in a mathematical proof.
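The inference-time-scaling idea can be sketched in miniature: spend more compute at answer time by sampling many candidate "reasoning chains" and keeping the one a verifier scores highest. Everything below is a hypothetical toy, not OpenAI's actual mechanism: the random-guess generator stands in for the model, and the scoring function stands in for a proof checker or reward model.

```python
import random

def generate_candidate(rng):
    # Stand-in for one sampled "reasoning chain": here, just a random guess.
    return rng.randint(0, 100)

def verify(candidate, target=42):
    # Stand-in for a verifier (e.g., a proof checker) scoring a candidate;
    # higher scores are better. The target of 42 is arbitrary.
    return -abs(candidate - target)

def best_of_n(n, seed=0):
    # Inference-time scaling in its simplest form: a larger n means more
    # compute spent searching candidates before committing to an answer.
    rng = random.Random(seed)
    candidates = [generate_candidate(rng) for _ in range(n)]
    return max(candidates, key=verify)
```

Because a larger sample can only improve the best verifier score found, raising `n` trades extra compute for a better answer, which is the essential bet behind giving a model more time to "think."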
Beyond the technical achievement, the implications for the future of mathematical research and education are profound.[5] Gowers argued that this event necessitates a total reassessment of what constitutes a "contribution" to the field. If a model can dispatch "gentle" research problems in an afternoon, the bar for human mathematicians is effectively raised. Gowers noted that the new standard for a PhD or a published paper may soon be the ability to prove something that an LLM cannot.[1] This creates a potential crisis for early-career researchers who traditionally cut their teeth on the very types of intermediate-level problems that AI is now beginning to dominate.[5] There is also the growing question of attribution and the "immortality" that mathematicians seek through named theorems.[6] If a human identifies a problem but the machine provides the central insight and the technical proof, the traditional structures of academic credit and tenure become increasingly difficult to maintain.[5]
However, many in the community, including Gowers and researchers at MIT such as Rajagopal, see this not as the end of human mathematics but as a transformation of the craft. The role of the mathematician may shift toward that of a "research director" or an architect who frames the high-level questions and verifies the machine's output. Just as the calculator did not eliminate the need for mathematicians but instead freed them from the drudgery of long division, these advanced models could act as accelerators that allow humans to tackle far more ambitious conjectures. The focus may move toward "formalization," where human intuition is paired with machine precision to explore territories of mathematics that were previously too dense or computationally taxing for a single human mind to grasp. The consensus among elite researchers is that we are entering a phase of human-AI synergy where the "manual labor" of proof-finding is automated, leaving the broader conceptual strategy to the human observer.
In conclusion, the performance of ChatGPT 5.5 Pro in solving these additive number theory problems represents a definitive shift in the capabilities of artificial intelligence. It has demonstrated an ability to produce doctoral-level research that is original, technically sound, and useful to the mathematical community. This milestone suggests that the boundary between human intuition and machine logic is becoming increasingly porous.[5] As AI systems continue to improve their reasoning depth, the mathematical world must now grapple with a reality where the machine is no longer just a tool, but a peer-level collaborator capable of independent discovery.[5] The challenge moving forward will be to redefine the value of human ingenuity in an era where the most difficult logical puzzles can be solved in the time it takes to have a lunch break.