Google launches ultra-fast Gemini 3.5 Flash as industry-wide AI costs skyrocket

While Google’s ultra-fast new model delivers elite performance, its soaring token costs signal the end of cheap AI.

May 20, 2026

Google launches ultra-fast Gemini 3.5 Flash as industry-wide AI costs skyrocket
At its annual developer conference, Google unveiled Gemini 3.5 Flash, the latest iteration in its popular family of lightweight, high-speed artificial intelligence models. While the tech giant positioned the release as a major leap forward in both speed and capability, the launch has highlighted an increasingly undeniable reality for the software industry: advanced artificial intelligence is becoming substantially more expensive to operate[1][2]. Despite retaining the Flash moniker, which historically signaled a low-cost, developer-friendly alternative to premium models[1][3], Gemini 3.5 Flash marks a significant departure from its predecessor's economic model[1][2]. In benchmark testing, the new model has proven to be dramatically costlier, reflecting a broader pattern among leading AI labs, including OpenAI and Anthropic, which are aggressively raising prices on their newest systems to recoup massive research and development investments[1][2].
The pricing structure of Gemini 3.5 Flash reveals a sharp upward trajectory in token costs[1][2]. Google now charges developers $1.50 per million input tokens and $9.00 per million output tokens for the new model[1]. Compared to Gemini 3 Flash Preview, which cost just $0.50 per million input and $3.00 per million output tokens, this represents a tripling of basic rates[1][2]. When compared to the ultra-efficient Gemini 3.1 Flash-Lite, the price jump is even more pronounced, with the new model costing six times as much[2]. This pricing puts Gemini 3.5 Flash nearly on par with Google's premium Gemini 3.1 Pro, which sits at $2.00 per million input and $12.00 per million output tokens[2]. Consequently, the distinct boundary that once separated Google’s cheap utility models from its high-end reasoning systems has begun to blur[4][3].
However, the true financial impact for developers goes far beyond the base token rates[1]. According to an independent evaluation conducted by Artificial Analysis, Gemini 3.5 Flash actually costs 5.5 times as much to run in benchmark testing than its immediate predecessor[1]. This discrepancy is driven by a combination of higher baseline prices and increased token consumption during complex tasks[1]. Most notably, on multi-step agentic tasks, the total operational cost of Gemini 3.5 Flash actually exceeded that of the theoretically more expensive Gemini 3.1 Pro by 75 percent[1]. Because the model is highly verbose and requires a greater number of interaction steps to complete long-horizon tasks, it consumes vast quantities of tokens[1][5]. For enterprises building autonomous agents, this creates an unexpected paradox where a Flash model can end up costing significantly more than its premium stablemate[1][4].
Despite the financial sticker shock, Google is delivering undeniable performance upgrades in exchange for the premium[1][3]. Gemini 3.5 Flash has emerged as a powerhouse for agentic and multimodal workflows, proving to be exceptionally fast[1][3]. It delivers between 280 and 455 output tokens per second, making it the fastest model in its intelligence class and roughly four times faster than rival systems like OpenAI's GPT-5.5 and Anthropic's Claude Opus 4.7[1][3]. On Google's published evaluations, the model achieved an 83.6 percent score on the Model Context Protocol Atlas, an agentic benchmark[6]. This represents a solid victory over Claude Opus 4.7, which scored 79.1 percent, and GPT-5.5, which registered 75.3 percent[6]. For applications where rapid, multi-step tool execution and immediate feedback are paramount, the raw speed and coordination of Gemini 3.5 Flash present a compelling value proposition[6][3].
Nevertheless, the model is not a universal replacement for premier reasoning systems, particularly when it comes to programming and software development tasks[1]. In rigorous coding benchmarks, Gemini 3.5 Flash exhibits noticeable weaknesses, falling behind its primary competitors[1]. On SWE-Bench Pro, which measures a model's ability to resolve real-world software repository issues, the model scored 55.1 percent, trailing Anthropic's Claude Opus 4.7 at 64.3 percent and OpenAI's GPT-5.5 at 58.6 percent[6][7]. Similarly, in the Terminal-Bench 2.1 evaluation, GPT-5.5 outperformed Gemini 3.5 Flash with a score of 78.2 percent compared to the Google model's 76.2 percent[6]. These results suggest that while the model excels at fast-paced agentic planning and tool usage, developers focusing on deep, complex code generation may still find better accuracy and reliability in alternative frontier models[1][7].
Google's pricing adjustments are not an isolated strategy but rather part of a coordinated industry-wide shift toward monetizing frontier-level capabilities[1][2]. As the technology has matured, the era of heavily subsidized, ultra-low-cost AI models appears to be drawing to a close[1]. OpenAI set a similar precedent with the release of GPT-5.5, which launched at double the pricing of its predecessor, GPT-5.4, charging $5.00 input and $30.00 output per million tokens[8]. Anthropic followed a matching trajectory, elevating the cost of its Claude Opus 4.7 model[2]. This collective upward pressure on pricing is directly tied to the staggering capital requirements needed to build, train, and run these models[1]. With tech giants collectively investing tens of billions of dollars in specialized hardware, data centers, and power infrastructure, the commercial pressure to demonstrate clear avenues to profitability has intensified[1].
This transition presents a strategic dilemma for enterprises and independent developers who have built their entire pipelines around cheap API calls[1]. During the initial wave of AI adoption, many companies integrated large language models into their customer service, search, and internal automation tools with the assumption that operational costs would continually slide downward. Instead, they are now finding themselves squeezed by escalating token budgets and murky returns on investment[1][9]. While Google argues that customers can save money by migrating high-volume workloads from rival frontier models to Gemini 3.5 Flash, the fact remains that operating a modern, agentic system is no longer a cheap endeavor[9][3]. Developers are being forced to carefully architect their systems, utilizing cheaper legacy models like Gemini 3.1 Flash-Lite for low-complexity tasks, while reserving expensive new models like Gemini 3.5 Flash strictly for workflows that demand high speed and advanced agentic coordination[4][2].
Ultimately, the release of Gemini 3.5 Flash underscores a fundamental evolution in the generative AI market[1]. The distinction between affordable, lightweight models and premium, expensive systems is disappearing as the capabilities of Flash tiers begin to rival yesterday's flagship offerings[4][3]. By pricing Gemini 3.5 Flash closer to its premium Pro models and introducing massive performance enhancements alongside increased token demands, Google is signaling that the path forward for AI is one of premium, high-utility services rather than cheap, commoditized utilities[2][3]. As developers navigate this newly expensive landscape, the focus will inevitably shift from simply deploying the latest model to meticulously optimizing workflows to ensure that the impressive speed and intelligence of these new systems justify their rising financial costs[1].

Sources
Share this article