xAI Unleashes Grok 4 Fast: High-Performance AI Now 98% Cheaper
xAI's Grok 4 Fast makes high-performance AI accessible, cutting costs by up to 98% with advanced unified architecture.
September 20, 2025

In a significant move aimed at making high-performance artificial intelligence more accessible and affordable, xAI has released Grok 4 Fast, a more efficient and cost-effective version of its flagship Grok 4 model.[1][2] The new model is engineered to deliver nearly the same level of performance as its more powerful predecessor with substantially lower computational requirements, a development that could have wide-ranging implications for both enterprise and consumer applications of AI. According to xAI, Grok 4 Fast maintains comparable accuracy across a variety of benchmarks while using approximately 40% fewer "thinking tokens," the intermediate tokens a model generates while reasoning through a problem.[3][2] This efficiency gain translates directly into dramatic cost savings: the company claims the price per task can be reduced by as much as 98%, a figure corroborated by independent analysis.[3][1][4] The release signals a clear strategy from xAI to compete not just on raw power but on the increasingly crucial metrics of speed, cost, and accessibility, positioning the company aggressively in a crowded, rapidly evolving market. It is also poised to lower the barrier to entry for developers and businesses looking to leverage cutting-edge AI without prohibitive operational costs.
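The claimed savings compound two factors: fewer thinking tokens per task and a cheaper per-token rate. A minimal sketch of that arithmetic, using the 40% token reduction xAI reports and the $0.20-per-million input rate it publishes for Grok 4 Fast; the per-token price ratio between the two models is not given in this article, so the value below is purely illustrative:

```python
def task_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a task that consumes `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million

# xAI reports roughly 40% fewer thinking tokens for Grok 4 Fast.
TOKEN_REDUCTION = 0.40

# Hypothetical ratio of Grok 4 Fast's per-token price to Grok 4's;
# the article quotes $0.20/M input for Fast but not Grok 4's rate.
PRICE_RATIO = 0.05

# The per-task saving compounds both factors:
#   new_cost / old_cost = (1 - token_reduction) * price_ratio
saving = 1 - (1 - TOKEN_REDUCTION) * PRICE_RATIO
print(f"combined per-task saving: {saving:.0%}")  # 97% under these assumptions

# Concrete cost at the published Grok 4 Fast input rate:
print(f"50k input tokens cost ${task_cost(50_000, 0.20):.3f}")  # $0.010
```

Under these illustrative numbers a 40% token reduction alone is modest; the headline 98% figure only emerges when it is multiplied by a steep per-token price cut.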
At the core of Grok 4 Fast's appeal is its remarkable balance of performance and efficiency. xAI has substantiated its claims with a series of benchmark results that show the new model performing nearly on par with the larger Grok 4. On challenging academic benchmarks, Grok 4 Fast achieved scores of 85.7% on GPQA Diamond, 92% on AIME 2025, and 93.3% on HMMT 2025, closely mirroring the results of Grok 4.[3][5] Furthermore, the model has demonstrated improved capabilities in practical applications such as code execution and search-based tasks, boasting a 95% pass rate on SimpleQA and 74% on X Bench Deepsearch.[3][5] This high level of performance is coupled with a massive two million token context window, enabling the model to process and understand vast amounts of information in a single input.[3][6][5][4] The efficiency gains are not merely theoretical; they have been validated by independent reviews which confirm a state-of-the-art price-to-intelligence ratio for Grok 4 Fast.[4] This combination of high-level reasoning, a large context window, and breakthrough efficiency is set to make advanced AI capabilities accessible for a broader range of applications, from complex data analysis to more responsive and capable chatbots.
One of the key technical innovations underpinning Grok 4 Fast is its unified architecture, a significant departure from previous designs.[4][7] Unlike earlier iterations that relied on separate models for reasoning-intensive tasks and quicker, non-reasoning responses, Grok 4 Fast integrates both functionalities into a single, streamlined system.[3][6][5] This unified weight space, steerable via system prompts, allows the model to dynamically adjust its computational effort based on the query's complexity, reducing both latency and token consumption.[6][4][7] For users and developers, this means faster response times for simple questions and the ability to seamlessly engage in deeper, more complex reasoning without switching between different models.[4] The model has also been trained from the ground up using reinforcement learning specifically for tool use, excelling at deciding when to access external tools like web browsers or code interpreters to augment its responses with real-time information.[6][8] This "agentic" capability is particularly evident in its search performance, where it can navigate websites and the X platform to synthesize up-to-date answers, a feature that has already propelled it to the top of leaderboards in competitive evaluations like the LMArena Search Arena.[4][2][7]
In a move to democratize access to its latest technology, xAI has made Grok 4 Fast widely available across multiple platforms and at a highly competitive price point. The model is now accessible to all users on grok.com, as well as on its iOS and Android applications, with even free users gaining access to this new tier of performance for the first time.[3][6][4][2] For developers, xAI is providing API access through platforms like OpenRouter and Vercel AI Gateway, with a promotional free period to encourage adoption.[3][4][8] The pricing structure is designed to be highly accessible, starting at just $0.20 per million input tokens for smaller workloads.[3][5][2] The API is offered in two versions, "grok-4-fast-reasoning" and "grok-4-fast-non-reasoning," both of which support the full two million token context window, giving developers fine-grained control over the model's behavior to optimize for either speed or depth depending on their specific use case.[6][5][4] This broad availability and aggressive pricing strategy underscore xAI's intent to capture a significant share of the AI market by providing a powerful, yet economical, alternative to offerings from competitors.
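Because the two API variants named above differ only in their model identifier, switching between speed- and depth-optimized behavior is a one-field change in the request. A minimal sketch of a chat-completions payload, assuming the OpenAI-compatible request shape commonly used by gateways such as OpenRouter; the field names here are that convention, not quoted from xAI's documentation, so consult the provider's API reference for the authoritative format:

```python
import json

def build_request(prompt: str, deep_reasoning: bool) -> dict:
    """Build a chat-completions payload for either Grok 4 Fast variant."""
    # Model IDs as published by xAI; the surrounding request shape is the
    # widely used OpenAI-compatible convention (an assumption here).
    model = "grok-4-fast-reasoning" if deep_reasoning else "grok-4-fast-non-reasoning"
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }

payload = build_request("Summarize today's top AI news.", deep_reasoning=False)
print(json.dumps(payload, indent=2))
# Sending it is an ordinary authenticated POST to whichever gateway
# hosts the model (OpenRouter, Vercel AI Gateway, etc.).
```

Both variants accept the same two-million-token context window, so the choice is purely a speed-versus-depth trade-off rather than a capability cliff.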
The introduction of Grok 4 Fast represents a pivotal moment in the ongoing development of large language models. By successfully engineering a system that significantly reduces the cost and computational overhead of high-end AI without a commensurate drop in performance, xAI is challenging the notion that cutting-edge capabilities must come with an exorbitant price tag. This development not only intensifies the competitive landscape, putting pressure on other major AI labs to innovate on efficiency, but it also empowers a wider community of developers, researchers, and businesses. The ability to deploy a model with a two-million-token context window and sophisticated reasoning capabilities at a fraction of previous costs opens up new possibilities for complex, real-time applications that were previously impractical. As the industry continues to grapple with the immense energy and financial costs of training and deploying ever-larger models, the focus on "intelligence density" and cost-efficiency exemplified by Grok 4 Fast is likely to become an increasingly important and defining trend.