OpenAI Looks Beyond Nvidia Chips, Striking Cerebras Deal to Conquer the AI Speed Race
Seeking the world's fastest AI platform, OpenAI commits a reported $10 billion-plus to Cerebras for specialized, low-latency inference hardware.
February 3, 2026

OpenAI, the developer of ChatGPT and one of the world's leading artificial intelligence companies, has signaled a strategic shift in its hardware procurement by seeking alternatives to a segment of its Nvidia chip supply, a move that culminated in a landmark deal with AI chip startup Cerebras. The development underscores the intensely competitive nature of the AI infrastructure race and highlights a critical divergence in the performance requirements for different AI workloads. The reported dissatisfaction does not concern the chips used for training massive AI models, a domain where Nvidia's graphics processing units (GPUs) remain dominant, but rather the speed of its hardware for "inference": the process by which a trained model such as ChatGPT generates a real-time response to a user query.[1][2][3] Sources indicate that OpenAI has been exploring alternatives since the previous year, with a particular focus on latency in applications such as its code-generation tool, Codex, where speed is paramount for a seamless user experience.[1][2][3][4]
The core of the technical challenge lies in the memory architecture of traditional GPUs. Inference, particularly for today's enormous large language models, has a far lower ratio of computation to memory access than the dense calculations of model training; in practice, generating each token requires streaming a large fraction of the model's weights through the processor.[2][5] Nvidia's GPUs hold those weights in external memory, which introduces latency because the chip must repeatedly fetch data from off-chip storage.[1][5] OpenAI's search for new hardware therefore focused on chips with a large amount of static random-access memory (SRAM) embedded directly in the silicon, an architecture that drastically reduces data-access time and offers a significant speed advantage for high-volume, low-latency applications like chatbots.[1][2][5] The company is reportedly seeking new hardware to cover about ten percent of its future inference computing needs, an effort that reflects an increased emphasis on optimizing its models for real-time user interaction.[1][2][3][5]
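To make the bottleneck concrete, the back-of-envelope sketch below estimates per-token decode latency under the assumption that generation is purely memory-bandwidth-bound, i.e., that every weight byte must be streamed from memory once per generated token. The model size, byte width, and bandwidth figures are illustrative assumptions rather than vendor specifications, and the estimate ignores KV-cache traffic, compute time, and batching.

```python
# Roofline-style lower bound on per-token decode latency for a
# memory-bandwidth-bound LLM. All numbers are illustrative assumptions.

def decode_latency_ms(params_billion: float, bytes_per_param: float,
                      bandwidth_tb_s: float) -> float:
    """Time to stream every model weight from memory once, in milliseconds."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return weight_bytes / (bandwidth_tb_s * 1e12) * 1e3

MODEL_B = 70   # assumed parameter count, in billions
BYTES = 2      # 16-bit weights

# Assumed aggregate bandwidths (order-of-magnitude placeholders):
scenarios = {
    "off-chip HBM (GPU)": 3.0,            # TB/s
    "on-chip SRAM (wafer-scale)": 100.0,  # TB/s
}

for name, bw in scenarios.items():
    print(f"{name:>26}: ~{decode_latency_ms(MODEL_B, BYTES, bw):.1f} ms/token")
```

Under these assumed figures, the off-chip configuration is limited to roughly 47 ms per token while the on-chip one drops to about 1.4 ms per token, the class of gap that motivates SRAM-heavy inference silicon.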
The quest for faster inference led OpenAI into negotiations with several startups, most notably Cerebras and Groq. The talks with Groq were reportedly cut short when Nvidia, the established market leader, signed a licensing agreement with the startup for its technology, an arrangement valued at twenty billion dollars.[1][4][5] This aggressive counter-move, which also involved hiring Groq's chip designers, highlights Nvidia's keen awareness of the growing competitive threat in the inference market.[1][5] Following the end of the Groq talks, OpenAI formally announced a commercial deal with Cerebras, which had earlier declined an acquisition offer from Nvidia.[1][5]
The resulting partnership between OpenAI and Cerebras Systems is a multi-year, multi-billion-dollar agreement, with some industry sources estimating its value to exceed ten billion dollars.[6][7][8] The deal calls for the deployment of 750 megawatts of Cerebras's wafer-scale systems, with the computing capacity rolling out in multiple phases through 2028.[7][8][9] Cerebras's technology centers on its Wafer-Scale Engine (WSE), a single processor built from an entire silicon wafer and designed to run large AI models with exceptional efficiency.[6][7][8] The company claims its wafer-scale chips can deliver AI responses up to 15 times faster than conventional GPU-based systems, a claim that directly addresses OpenAI's stated need for dedicated, low-latency inference.[6][9] OpenAI CEO Sam Altman confirmed that the Cerebras deal is specifically aimed at meeting the high-speed requirements of coding models.[1][4] Greg Brockman, OpenAI's co-founder and president, said the partnership is expected to make the company's platform the fastest AI platform in the world, helping unlock the next generation of use cases.[8]
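For scale, the short sketch below works out what a claimed 15x per-token speedup would mean for a Codex-style workload. The baseline latency and completion length are assumptions chosen for illustration; the only figure taken from the reporting is the claimed speedup factor.

```python
# Illustrative arithmetic: impact of a claimed 15x inference speedup on a
# long code completion. Baseline latency and token count are assumptions.

BASELINE_MS_PER_TOKEN = 30.0   # assumed GPU-served decode latency
CLAIMED_SPEEDUP = 15           # "up to 15x" per the vendor claim
COMPLETION_TOKENS = 2_000      # assumed size of a large code suggestion

gpu_seconds = COMPLETION_TOKENS * BASELINE_MS_PER_TOKEN / 1e3
wse_seconds = gpu_seconds / CLAIMED_SPEEDUP
print(f"GPU baseline: {gpu_seconds:.0f} s; wafer-scale (claimed): {wse_seconds:.0f} s")
```

Under these assumptions, a completion that takes a full minute to stream would arrive in about four seconds, the difference between a tool a developer waits on and one that keeps pace with typing.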
This strategic pivot carries significant implications for the broader AI industry and the future of hardware design. It not only validates the architectures developed by challengers like Cerebras but also signals the maturation of the AI market into distinct segments, with specialized hardware emerging for model training and for real-time inference.[5] For Cerebras, the ten-billion-dollar-plus contract represents a monumental vote of confidence from a leading industry customer, potentially bolstering its financial position ahead of a renewed push for an initial public offering.[6][7] The hardware disagreement also coincides with a reported stall in Nvidia's highly publicized plan to invest up to one hundred billion dollars in OpenAI.[1][4][10][11] While Nvidia's CEO publicly dismissed notions of tension as "nonsense" and reaffirmed plans for a "huge" investment, the negotiations have dragged on for months, adding a layer of corporate intrigue to the technical shift.[4][10][11] The unfolding drama between two of the most influential companies in the AI ecosystem illustrates the growing pains of a rapidly expanding industry, where the pursuit of speed and efficiency is driving new alliances and challenging the established market hierarchy. It also underscores a fundamental principle in technology: as AI applications move from the research lab to mass-market utility, performance per dollar and response latency in inference become the new competitive battlegrounds.[5]