Thinking Machines Achieves Deterministic LLMs, Revolutionizing AI Reliability
Mira Murati's startup ends LLM unpredictability, ensuring consistent, reliable AI for critical scientific and enterprise use.
September 11, 2025

In a significant development for the artificial intelligence industry, Thinking Machines Lab, the startup led by former OpenAI CTO Mira Murati, has announced what it describes as a breakthrough against the pervasive issue of nondeterminism in large language models. The company asserts it has identified and engineered a solution for the frustrating inconsistency that causes LLMs to produce different outputs even when given the exact same input under identical settings. This lack of predictability has been a major roadblock for scientific research and enterprise applications where reliability and reproducibility are paramount. For years, the AI community has grappled with this randomness, often attributing it to the complex interplay of concurrent processing on GPUs and the peculiarities of floating-point arithmetic. Thinking Machines' research, however, reframes the problem, pointing to a more controllable factor. The work promises to usher in a new era of reliability for AI, potentially accelerating its adoption in high-stakes, precision-critical fields and resolving a fundamental challenge that has plagued developers since the dawn of the LLM era.
The core of the problem, as detailed in a blog post by Thinking Machines researcher Horace He, is not what the long-held "concurrency + floating point" hypothesis suggests.[1] While tiny rounding errors in parallel computations can cause variations, the company argues the primary culprit is a lack of "batch invariance" in inference kernels, the small, highly optimized programs that run on GPUs to generate LLM responses.[2][3] In practice, LLM providers bundle multiple user queries into a "batch" to process them efficiently.[2] From a user's perspective, the number of other queries in that batch (i.e., the server load) is effectively random and constantly changing.[4] Thinking Machines found that the kernels used in current inference systems employ different calculation strategies for different batch sizes in order to maximize performance.[2] As a result, the mathematical path to the result changes depending on how many other people are using the service at the same moment, producing different final outputs even when a user's input and settings, such as a "temperature" of zero, remain constant.[2][4] This subtle but powerful variable has been the true source of nondeterminism, making debugging a nightmare and scientific replication nearly impossible.[2][5]
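The symptom is easy to sketch. The toy example below is not taken from the blog post; it simply runs the same single query through the same float32 weight matrix twice, once on its own and once stacked into a larger batch. Because optimized matrix-multiplication libraries may pick different blocking and reduction strategies for different batch sizes, nothing guarantees the two results agree bit for bit.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4096, 4096)).astype(np.float32)
query = rng.standard_normal((1, 4096)).astype(np.float32)
others = rng.standard_normal((31, 4096)).astype(np.float32)

# The same request processed on its own...
alone = query @ weights

# ...and the same request processed inside a larger batch,
# i.e., while 31 other users happen to be hitting the server.
batched = np.vstack([query, others]) @ weights

# Optimized GEMM backends may use different strategies for a 1-row
# and a 32-row input, so the first row need not match bit for bit.
print(np.array_equal(alone, batched[:1]))
```

Whether this particular snippet prints True or False depends on the linear-algebra backend and hardware it runs on; the point is that the mathematics alone does not force the two code paths to agree.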
Having diagnosed the problem, Thinking Machines engineered a direct solution: batch-invariant kernels. This involved rewriting the kernels behind a Transformer model's core operations so that they use the same calculation strategy regardless of batch size.[2] The team focused on three key areas: RMSNorm, matrix multiplication, and the most complex component, the attention mechanism.[2] By fixing the reduction order and calculation paths, they achieved bitwise-identical results across different server loads.[2] The approach involves a trade-off: a less specialized computational method can be slower, but it guarantees consistency.[2] The company's work effectively demonstrates that the randomness many accepted as an inherent, almost mystical property of large-scale AI is, in fact, an engineering artifact that can be controlled. This shifts the perception of LLM outputs from unavoidably unpredictable to predictably consistent, provided the underlying software is designed for it.
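As a rough illustration of the idea (a minimal sketch, not Thinking Machines' kernel code), the snippet below contrasts a reduction whose chunking is fixed in advance with one whose number of splits varies with load. The fixed-order version returns the same bits on every call, while changing the split count changes the rounding order and, usually, the final bits.

```python
import numpy as np

rng = np.random.default_rng(0)
row = rng.standard_normal(4096).astype(np.float32)

def reduce_fixed_order(v, chunk=256):
    # Batch-invariant strategy: always sum fixed-size chunks in the
    # same order, no matter how much other work shares the batch.
    partials = v.reshape(-1, chunk).sum(axis=1)
    total = np.float32(0.0)
    for p in partials:
        total += p
    return total

def reduce_load_dependent(v, n_splits):
    # Performance-tuned strategy: the number of splits tracks batch
    # size or GPU occupancy, so the rounding order shifts with load.
    partials = v.reshape(n_splits, -1).sum(axis=1)
    total = np.float32(0.0)
    for p in partials:
        total += p
    return total

# Same input, same strategy: bitwise identical every time.
print(reduce_fixed_order(row) == reduce_fixed_order(row))
# Same input, different split counts: usually not bitwise identical.
print(reduce_load_dependent(row, 4) == reduce_load_dependent(row, 16))
```

This is the trade-off described above: committing to one reduction layout, even when a differently shaped one would run faster for the batch at hand, costs some performance but keeps the arithmetic path constant.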
The implications of achieving deterministic AI are profound and far-reaching. For the scientific community, it means experiments involving LLMs can finally be truly reproducible, a cornerstone of scientific progress that has been largely absent in AI research.[4][5] Businesses that have hesitated to deploy AI in critical functions due to unpredictable outcomes can now move forward with greater confidence, knowing that the same input will yield the same output every time.[5][6] This is particularly crucial for industries like finance, healthcare, and law, where consistency is not just a feature but a requirement. Furthermore, solving nondeterminism stands to significantly advance the field of reinforcement learning (RL), a method for training AI through rewards.[6][7] Inconsistent outputs create noisy and unreliable training data, hindering an AI's ability to learn effectively.[6][7] With deterministic models, the feedback loop becomes clearer and more efficient, potentially unlocking huge advances in training more capable and complex AI agents, a key part of Thinking Machines' long-term strategy.[2][6]
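Teams that want to verify this property for themselves can run a simple black-box check: send the same greedy request many times and count the distinct answers. The sketch below assumes an OpenAI-compatible endpoint; the base URL, API key, and model name are placeholders, not details from the announcement.

```python
from openai import OpenAI  # any OpenAI-compatible serving endpoint will do

# Placeholders: point these at the inference service you want to audit.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
MODEL = "your-model-name"
PROMPT = "Summarize batch invariance in one sentence."

# Issue the same greedy (temperature 0) request repeatedly.
completions = [
    client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,
        max_tokens=128,
    ).choices[0].message.content
    for _ in range(10)
]

distinct = len(set(completions))
print(f"{distinct} distinct completion(s) across {len(completions)} identical requests")
# A deterministic, batch-invariant serving stack should report exactly 1,
# even while other users' traffic fluctuates between requests.
```

On a conventional serving stack the count often exceeds one once real traffic shares the server, which is precisely the behavior the batch-invariant kernels are meant to eliminate.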
In conclusion, the work by Mira Murati's Thinking Machines Lab marks a pivotal moment in the evolution of artificial intelligence. By correctly identifying the lack of batch invariance in inference kernels as the root cause of nondeterminism and engineering a viable solution, the company has addressed one of the most significant and persistent challenges facing the field.[2][3][8] This breakthrough moves the industry beyond simply accepting the erratic behavior of LLMs as an unavoidable flaw and provides a clear path toward building more reliable, trustworthy, and scientifically rigorous AI systems.[2] While performance trade-offs exist, the ability to guarantee consistent outputs opens the door to a new wave of enterprise adoption and more efficient research and development. The shift from unpredictable behavior to deterministic reliability could fundamentally alter how we build, deploy, and interact with artificial intelligence, making it a more dependable and indispensable tool for progress across all sectors of society.[5]