Tiny Recursive AI TRM Outperforms LLM Giants on Reasoning, Redefines Efficiency
A tiny 7-million parameter AI model, using recursive reasoning, outperforms industry giants, signaling a shift to smarter, more efficient AI.
October 9, 2025

In a significant challenge to the prevailing "bigger is better" philosophy dominating the artificial intelligence industry, a new, remarkably small AI model has demonstrated superior performance on a key reasoning benchmark, outperforming massive systems developed by tech giants. The Tiny Recursive Model (TRM), with only 7 million parameters, has surpassed well-known large language models (LLMs) like Google’s Gemini 2.5 Pro and OpenAI's o3‑mini on the notoriously difficult Abstraction and Reasoning Corpus (ARC-AGI) benchmark, a test designed to measure an AI's fluid intelligence. This achievement suggests that architectural innovation, rather than sheer scale, may hold the key to unlocking more advanced and efficient AI reasoning capabilities.
Developed by researcher Alexia Jolicoeur-Martineau at the Samsung Advanced Institute of Technology in Montreal, TRM operates on a principle of recursive reasoning.[1][2] Instead of relying on trillions of parameters to process information in a single, massive computational step, TRM takes an iterative approach.[1] The model begins with an initial guess at a solution and then cycles through a refinement process up to 16 times, progressively improving its answer and correcting errors from previous steps.[3][1] This method lets a tiny, two-layer network emulate the reasoning depth of extremely deep neural networks without their immense memory and processing requirements.[4] TRM is a streamlined, more effective successor to the Hierarchical Reasoning Model (HRM): it uses a single network and dispenses with the biological analogies and mathematical theorems that underpinned the earlier design.[4][3] This lean architecture has proven to be a feature, not a bug, as its small size helps prevent overfitting, a common issue when training on specialized datasets.[3]
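The improve-and-correct loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the paper's implementation: the stand-in "network" and its method names (`initial_guess`, `refine_latent`, `update_answer`) are hypothetical, and its task here is simply computing a square root by Newton steps. Only the control flow, a latent scratchpad updated several times per cycle, then the answer revised, for up to 16 cycles, mirrors the published description.

```python
class ToyNet:
    """Stand-in for TRM's tiny network (hypothetical, for illustration).

    Its "task" is computing sqrt(x): each cycle estimates the current
    answer's error in a latent variable, then uses that estimate to
    correct the answer, mimicking TRM's improve-and-correct cycles.
    """

    def initial_guess(self, x):
        return x                      # crude first draft of the answer

    def initial_latent(self, x):
        return 0.0                    # empty reasoning "scratchpad"

    def refine_latent(self, x, y, z):
        return y * y - x              # latent step: how wrong is y?

    def update_answer(self, y, z):
        return y - z / (2 * y)        # revise the answer from the latent


def recursive_refine(x, net, n_latent=6, n_improve=16):
    """Run up to `n_improve` refinement cycles (TRM uses up to 16)."""
    y = net.initial_guess(x)
    z = net.initial_latent(x)
    for _ in range(n_improve):        # outer loop: improve the answer
        for _ in range(n_latent):     # inner loop: update the scratchpad
            z = net.refine_latent(x, y, z)
        y = net.update_answer(y, z)   # correct errors from prior steps
    return y


print(recursive_refine(2.0, ToyNet()))  # converges toward sqrt(2)
```

The key point the sketch captures is that the same small function is reused on every cycle, so reasoning depth comes from iteration rather than from parameter count.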
The success of TRM is most evident in its performance on the ARC-AGI benchmark, a test specifically created to evaluate an AI's ability to solve novel problems that are easy for humans but have historically stumped machines.[5] The benchmark assesses fluid intelligence, the capacity to reason and adapt to new situations, rather than reliance on memorized knowledge.[5] On the ARC-AGI-1 test, the 7M-parameter TRM achieved 45% accuracy.[4][1] On the newer, more challenging ARC-AGI-2, it scored 7.8%.[3][2] These results stand out against some of the industry's leading models: Gemini 2.5 Pro scored only 4.9% on ARC-AGI-2, while o3-mini-high achieved just 3%.[2] TRM's prowess extends beyond this specific test, as it also achieved state-of-the-art results on other complex reasoning tasks, scoring 87.4% on the Sudoku-Extreme dataset and 85.3% on Maze-Hard puzzles.[3]
Perhaps the most disruptive aspect of TRM's breakthrough is its profound computational efficiency, which stands in stark contrast to the resource-intensive nature of large-scale AI development. The entire model was trained in just two days on four NVIDIA H100 GPUs, at an estimated cost of under $500.[2][6] This is a minuscule fraction of the multi-million-dollar budgets needed to train foundational LLMs, which typically demand vast data centers and enormous energy consumption.[1][2] This "less is more" approach demonstrates that architectural ingenuity can be a more sustainable and accessible path toward advanced AI.[6] By proving that a small, specialized model can reason more effectively on certain tasks than a general-purpose giant, the research presents a compelling argument against the current trajectory of ever-expanding model sizes and opens the door for startups and smaller research labs to make significant contributions without massive financial backing.[2][7]
The emergence of TRM signals a potential paradigm shift in the field of artificial intelligence, suggesting that the future may lie not just in scaling up existing models but in creating a diversity of AI architectures tailored for specific functions. While TRM excels at abstract reasoning and logic puzzles, it has not yet been tested on the open-ended language and perception tasks where LLMs shine.[8][1] This points toward a future where hybrid systems may become the norm, with large models handling generative language tasks while delegating complex logical or mathematical reasoning to smaller, more efficient recursive modules like TRM.[8] This development could democratize AI, enabling powerful reasoning systems to run locally on devices like smartphones and wearables, enhancing privacy and real-time responsiveness.[1][9] The success of the Tiny Recursive Model is a powerful reminder that in the quest for artificial general intelligence, the most elegant solution may not be the largest, but the smartest.