OpenAI Model Solves Complex Math, Achieving Olympiad Gold Standard
Experimental AI tackles complex math with 'System 2' thinking, sparking AGI hopes but awaiting verification.
July 19, 2025

OpenAI has announced a significant development in artificial intelligence, claiming that an experimental large language model has solved complex mathematical problems at a level equivalent to a gold medal performance at the International Mathematical Olympiad (IMO). If independently verified, the result would mark a substantial leap in AI reasoning, in a domain long considered a grand challenge for machine intelligence. The IMO is the most prestigious mathematics competition for pre-university students, and its problems demand deep, creative, and sustained thinking that goes far beyond calculation. The claim has generated considerable excitement within the AI community, as success in this area suggests progress towards more general problem-solving abilities.
According to an OpenAI researcher, the experimental model was evaluated under the same stringent conditions as human competitors. It was given the problems from the 2025 IMO and had to produce solutions within two 4.5-hour sessions, without access to the internet or any other tools.[1] Its submissions, written as natural language proofs, were then graded by a panel of three former IMO medalists.[2] The panel's consensus was that the AI had solved five of the six problems, earning 35 of a possible 42 points, a score sufficient for a gold medal.[1][2] This marks a dramatic improvement over previous models: in a qualifying exam for the IMO, GPT-4o correctly solved only 13% of problems, whereas OpenAI's earlier 'o1' reasoning model scored 83%.[3][4]
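Each IMO problem is marked out of 7 points, so the reported total is consistent with full marks on five of the six problems:

$$5 \times 7 = 35 \ \text{points earned}, \qquad 6 \times 7 = 42 \ \text{points available}.$$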
The technical underpinnings of this apparent breakthrough lie in a shift away from simply scaling up existing models towards enhancing their intrinsic reasoning abilities.[3] This new generation of models, reportedly known internally by codenames such as "Strawberry" or "Q*", is described as a family of "reasoning engines" designed to tackle complex, multi-step problems.[5][6] These models employ a method analogous to what psychologist Daniel Kahneman termed "System 2" thinking: slow, deliberate, and analytical.[5][7] This contrasts with the faster, more intuitive "System 1" thinking that characterizes many previous AI systems.[5] The new approach combines techniques like chain-of-thought reasoning with reinforcement learning, allowing the model to think through problems step by step, generate potential solution paths, and recognize and correct its own mistakes in real time.[8][3] This iterative process lets the AI craft the intricate, logical arguments needed for the multi-page proofs that IMO problems require.[2]
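OpenAI has not published the mechanism, but the loop the reporting describes, where the system proposes a line of reasoning, verifies each step, and abandons paths that fail, can be illustrated with a toy search. The task below (combining numbers with arithmetic operators to hit a target) and every function in it are illustrative assumptions, not the model's actual implementation.

```python
from itertools import permutations, product
import operator

# Toy illustration of a "propose, verify, backtrack" loop in the spirit of
# deliberate System-2-style reasoning. Everything here is a simplified
# assumption for illustration, not OpenAI's actual method.

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def propose_paths(numbers):
    """Enumerate candidate solution paths: orderings of numbers and operators."""
    for nums in permutations(numbers):
        for ops in product(OPS, repeat=len(numbers) - 1):
            yield nums, ops

def verify(nums, ops, target):
    """Check a candidate path step by step; an invalid step rejects the path."""
    acc = nums[0]
    for op, n in zip(ops, nums[1:]):
        if op == "/" and n == 0:
            return False  # the mistake is caught and the path abandoned
        acc = OPS[op](acc, n)
    return abs(acc - target) < 1e-9

def solve(numbers, target):
    """Deliberate loop: generate a path, verify it, move on when it fails."""
    for nums, ops in propose_paths(numbers):
        if verify(nums, ops, target):
            return nums, ops  # the first fully verified chain of steps
    return None

print(solve((3, 7, 2), 20))  # ((3, 7, 2), ('+', '*')), i.e. (3 + 7) * 2 = 20
```

A real reasoning model would learn, via reinforcement learning, which paths are worth proposing rather than enumerating them exhaustively, but the generate-and-verify structure is the core idea the reporting describes.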
The implications of an AI achieving this level of mathematical reasoning are profound and extend far beyond the realm of competitive mathematics. Advanced reasoning is a cornerstone of human intelligence and a critical component for achieving artificial general intelligence (AGI). An AI that can independently reason through novel, complex problems could accelerate breakthroughs in numerous scientific and engineering disciplines, from drug discovery and materials science to advanced software development.[8][3] The ability to generate verifiable, human-readable proofs also opens up new possibilities for collaboration between humans and AI in research. Furthermore, this development has caused a significant stir in prediction markets, where the probability of an AI winning an IMO gold medal jumped from around 20% to 86% immediately following OpenAI's announcement.[1]
Despite the excitement, the claims remain unverified by the broader scientific community. OpenAI has described the system as an experimental research model and says it does not plan to release a model with this level of mathematical capability for several months.[1] The lack of immediate, independent access has prompted calls for caution. The history of AI is rife with claimed breakthroughs that later proved less robust than first suggested. Recent research has shown that some AI models excel on benchmarks by effectively "memorizing" solutions from their training data, with performance dropping sharply when they are presented with slightly altered problems.[9] Controversies have also arisen over transparency in AI benchmarking, including instances of AI companies funding the development of the very benchmarks on which they later achieve high scores.[10][11] While there is no evidence of such issues in this case, the context underscores the need for independent evaluation and replication of OpenAI's results by neutral third parties.
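The perturbation test described in that research is straightforward to sketch. The toy "memorizing model" below is a deliberate caricature, an illustrative assumption rather than any real system, but it shows the signature such studies look for: near-perfect accuracy on problems seen verbatim, collapsing on lightly altered variants.

```python
# Sketch of a contamination check: compare accuracy on original benchmark
# items with accuracy on lightly perturbed variants. A large gap suggests
# memorization rather than genuine reasoning. The "model" here is a toy
# lookup table, purely for illustration.

ANSWER_KEY = {"2+2": "4", "3*5": "15"}  # problems "seen in training"

def memorizing_model(question):
    """Answers only questions it has memorized verbatim."""
    return ANSWER_KEY.get(question)

def accuracy(model, items):
    """Fraction of (question, answer) pairs the model gets right."""
    return sum(model(q) == a for q, a in items) / len(items)

originals = [("2+2", "4"), ("3*5", "15")]
perturbed = [("2+3", "5"), ("4*5", "20")]  # same skills, slightly altered problems

gap = accuracy(memorizing_model, originals) - accuracy(memorizing_model, perturbed)
print(f"Accuracy drop on perturbed items: {gap:.0%}")  # 100% for a pure memorizer
```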
In conclusion, OpenAI's announcement heralds a potential watershed moment for artificial intelligence. The claim of gold-medal performance on International Mathematical Olympiad problems points to a new frontier in machine reasoning: a model capable of sustained, creative, and logical thought. The technical approach, built on deliberate, step-by-step problem-solving, could unlock new applications across science and technology and bring the prospect of AGI closer. However, the results are preliminary and have not yet faced independent peer review. The AI community awaits the opportunity to rigorously test these capabilities and determine whether this represents a true, generalizable leap in reasoning or a more specialized success. Until then, the claim stands as a tantalizing glimpse into the future of AI, tempered by healthy and necessary scientific skepticism.