OpenAI AI Reaches Gold-Medal IMO Level, Demonstrating Unprecedented Reasoning

Not just a math win: OpenAI's unreleased AI shows deep general reasoning, enabling unprecedented human-AI collaboration.

July 21, 2025

An unreleased artificial intelligence model from OpenAI has reportedly achieved a gold-medal-level performance at the prestigious International Mathematical Olympiad (IMO), a feat that signals a significant leap in AI reasoning capabilities and hints at the potential for these systems to tackle increasingly complex, multi-step problems. The model successfully solved five out of six problems under strict competition conditions, which prohibit the use of the internet or any external tools, earning a score of 35 out of 42 points.[1][2][3] This achievement is particularly noteworthy not just for the result itself, but for the methodology behind it, which deviates from specialized, task-specific AI and instead points toward more generalized and advanced reasoning processes.[4][5] The success suggests a future where AI can assist in, and perhaps even drive, breakthroughs in science, cryptography, and other fields that rely on intricate, long-form argumentation and creative problem-solving.[1][6]
The core of this breakthrough lies in a departure from narrow, task-specific training.[7] Unlike systems designed exclusively for mathematics, such as Google DeepMind's AlphaGeometry, the OpenAI model is described as a general-purpose reasoning language model.[4][5] According to OpenAI researchers, its success stems from new developments in general-purpose reinforcement learning and what they term "test-time compute scaling."[7][3] This approach allows the model to "think" for longer periods—hours, in the case of the IMO problems, compared to seconds or minutes for previous models—and to do so more efficiently.[7] The model generates solutions in natural language, crafting detailed, multi-page proofs that were independently graded and unanimously approved by three former IMO medalists.[1][7] This ability to construct what one researcher called "intricate, watertight arguments at the level of human mathematicians" marks a significant evolution from simply providing a final correct answer to demonstrating a human-like reasoning process.[2][8]
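OpenAI has not published the details of its test-time compute scaling, but a minimal sketch of one widely used public form of the idea, best-of-n sampling with self-consistency voting over longer reasoning rollouts, illustrates where the extra inference compute goes. Everything below is hypothetical: the toy solver, its accuracy curve, and the function names all stand in for calls to a real reasoning model.

```python
import random
from collections import Counter

def sample_reasoning(problem: str, thinking_budget: int, rng: random.Random) -> str:
    """One long reasoning rollout; a real system would call a model API.

    Toy stand-in: a stochastic solver whose accuracy rises with the
    thinking budget, mimicking the reported benefit of "thinking" longer.
    """
    p_correct = min(0.9, 0.3 + 0.1 * thinking_budget)  # toy scaling curve
    return "42" if rng.random() < p_correct else str(rng.randint(0, 41))

def solve_with_test_time_scaling(problem: str, n_samples: int = 32,
                                 thinking_budget: int = 4) -> str:
    """Spend extra inference compute in two ways (longer rollouts, more
    independent attempts), then majority-vote over the final answers."""
    rng = random.Random(0)
    answers = [sample_reasoning(problem, thinking_budget, rng)
               for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(solve_with_test_time_scaling("toy problem"))  # almost surely "42"
```

The design point the sketch captures is that accuracy here is bought with inference-time compute, through more and longer attempts, rather than with a larger or retrained model.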
The implications of this achievement extend far beyond the realm of competitive mathematics, suggesting a new trajectory for AI development focused on deep reasoning. The progress in handling IMO-level problems, which demand sustained creative thinking over hours, represents a significant jump in the "reasoning time horizon" for AI.[8] Previously, benchmarks like the MATH dataset required reasoning on the scale of minutes.[8] This new capability is built upon techniques like process supervision, where the model is rewarded for each correct step in its reasoning process, rather than just the final outcome.[9] This method not only improves performance but also enhances alignment with human-endorsed thinking, making the AI's reasoning more interpretable and trustworthy.[9][10] The ability to generate detailed explanations and self-evaluate thought processes is a fundamental shift, transforming AI from a pattern-matcher into a system capable of structured, transparent thought that can be applied to diverse fields like coding and scientific discovery.[11][12]
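The process supervision the article describes matches the approach popularized by work such as "Let's Verify Step by Step": reward each verified reasoning step instead of only the final answer. The toy comparison below sketches that scoring difference only; the hand-labeled `correct` flags stand in for a learned process reward model's judgments.

```python
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    correct: bool  # in practice, a learned process reward model's score

def outcome_reward(final_answer_ok: bool) -> float:
    """Outcome supervision: one sparse signal for the whole solution."""
    return 1.0 if final_answer_ok else 0.0

def process_reward(steps: list[Step]) -> float:
    """Process supervision: credit every verified step, so training can
    reinforce sound intermediate reasoning, not just lucky answers."""
    return sum(s.correct for s in steps) / len(steps) if steps else 0.0

proof = [
    Step("Assume n is even, so n = 2k for some integer k.", True),
    Step("Then n^2 = 4k^2, which is divisible by 4.", True),
    Step("Therefore every perfect square is divisible by 4.", False),  # flaw
]
print(process_reward(proof))   # ~0.67: partial credit localizes the flaw
print(outcome_reward(False))   # 0.0: says nothing about where it failed
```

The step-level signal is also what makes the resulting reasoning more interpretable, since each intermediate claim is graded and can be inspected, consistent with the alignment benefit the article cites.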
The success of the experimental model stands in stark contrast to the performance of current publicly available AI systems. A recent evaluation of leading models, including Gemini 2.5 Pro and OpenAI's own o3 and o4-mini, on the same IMO 2025 tasks showed that none could even reach the score required for a bronze medal.[4] Those models were plagued by logical errors, incomplete arguments, and even fabricated theorems, highlighting the "jagged intelligence" phenomenon, in which AI can excel at some complex tasks while failing at simpler ones.[1][4] OpenAI's announcement of the gold-medal-winning model, which is not expected to be publicly released for several months, seems strategically timed to showcase this massive leap in capability.[2] While OpenAI has released the model-generated proofs on GitHub, some experts remain skeptical, calling for independent verification by official IMO coordinators and more transparency about the methodology to rule out potential overfitting or other issues.[7][5][6]
Ultimately, this mathematical milestone serves as a powerful indicator of the future of AI. The development of general-purpose reasoning models that can tackle exceptionally difficult, long-form problems opens the door to a new era of human-AI collaboration.[13] Such advanced AI could serve as invaluable tools for mathematicians and scientists, helping to verify proofs, generate novel conjectures, and even accelerate research by bridging knowledge gaps between specialized fields.[14] While the path to artificial general intelligence (AGI) remains a subject of debate, the ability to craft creative, rigorous mathematical arguments suggests that AI is moving beyond simple computation and into the realm of genuine intellectual partnership.[5][6] The real story is not merely that an AI solved a hard math problem, but that it demonstrated a form of reasoning that may soon be applied to solve some of the most complex and pressing challenges facing humanity.
