Google AI Overviews reach 91 percent accuracy milestone despite persistent source grounding errors

As Google AI hits 91 percent accuracy, source grounding failures and rising zero-click searches create new risks for digital truth.

April 7, 2026

The transformation of the modern search engine from a directory of links into an authoritative answer engine has reached a critical milestone, as new research indicates that Google AI Overviews are now factually accurate in more than nine out of ten instances.[1] For years, the integration of generative artificial intelligence into search results was viewed with skepticism, largely because of high-profile "hallucinations" in which the software suggested using non-toxic glue to keep cheese on pizza or recommended eating rocks for health benefits. A comprehensive study conducted by the AI startup Oumi on behalf of the New York Times, however, suggests that Google has significantly narrowed the error margin. By testing more than four thousand unique search queries against the industry-standard SimpleQA benchmark, researchers found that the latest iteration of Google's search AI, powered by the Gemini 3 model, achieved a 91 percent accuracy rate.[2][1] This represents a notable improvement from just a few months prior, when the previous Gemini 2 model registered an 85 percent success rate.[1] While a 91 percent accuracy rate would be considered excellent in many academic or industrial settings, the sheer scale of global search turns even a small error margin into a serious complication for the tech giant and its billions of users.
The methodology behind these findings highlights the rapid pace of development within Google's Large Language Model (LLM) ecosystem. The Oumi study utilized a sample of 4,326 searches, focusing on factual queries where a definitive answer exists. The jump from 85 percent to 91 percent accuracy between October and February indicates that Google is successfully fine-tuning its models to prioritize factual consistency. Despite this progress, the study uncovered a paradoxical trend regarding the verifiability of these answers.[2][1][3][4] Even as the AI became more accurate, its "grounding"—the ability of the linked sources to actually support the claims made in the summary—appeared to decline.[2][1] In the Gemini 3 testing, approximately 56 percent of the correct answers were considered ungrounded, meaning the sources Google cited did not fully back up the information provided, or in some cases, even contradicted it.[2] This suggests that while the AI is becoming better at "knowing" the right answer through its internal training data, it is struggling to consistently pair those answers with the correct external documentation.
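The distinction between the two metrics in the study, raw accuracy and grounding, can be made concrete with a small sketch. The function and toy data below are purely illustrative; the study's actual evaluation pipeline is not public, and the key point is only that the ungrounded share is computed over the *correct* answers, not over all answers.

```python
# Illustrative tally of accuracy vs. grounding for a SimpleQA-style
# benchmark run. All names and data here are hypothetical, not the
# study's actual code.

def summarize(results):
    """results: list of dicts with 'correct' and 'grounded' booleans."""
    total = len(results)
    correct = [r for r in results if r["correct"]]
    accuracy = len(correct) / total
    # Grounding is measured only over the correct answers: a right
    # answer whose cited source does not support it is "ungrounded".
    ungrounded = sum(1 for r in correct if not r["grounded"])
    ungrounded_rate = ungrounded / len(correct) if correct else 0.0
    return accuracy, ungrounded_rate

# Toy run: 10 queries, 9 answered correctly, 5 of those 9 ungrounded.
toy = ([{"correct": True, "grounded": False}] * 5
       + [{"correct": True, "grounded": True}] * 4
       + [{"correct": False, "grounded": False}])
acc, ungrounded = summarize(toy)
print(f"accuracy={acc:.0%}, ungrounded share of correct answers={ungrounded:.0%}")
# accuracy=90%, ungrounded share of correct answers=56%
```

The toy numbers are chosen to mirror the study's headline figures: a system can score roughly 90 percent on accuracy while more than half of its correct answers rest on citations that do not actually support them.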
The implications of a nine percent error rate are staggering when applied to Google's total search volume, which is estimated at more than five trillion queries per year.[1] In practical terms, an accuracy rate of 91 percent still leaves room for tens of millions of incorrect answers to be generated every hour.[1] The study highlighted several high-profile misses that illustrate the subtlety of these errors. For instance, in response to a query about when musician Bob Marley's home was converted into a museum, the AI provided the year 1987, whereas the correct answer was 1986.[1] In another example involving legendary cellist Yo-Yo Ma, the AI claimed there was no record of his induction into the Classical Music Hall of Fame despite linking directly to the organization's website, which listed his induction.[1] These types of factual slips are particularly concerning in the context of "Your Money or Your Life" (YMYL) topics, such as healthcare, legal advice, and financial planning. Although Google has been more cautious about triggering AI Overviews for these sensitive categories, data shows that the AI now appears in 88 percent of healthcare-related searches, meaning millions of users may be receiving medical summaries that, statistically, contain an error roughly once in every ten searches.
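The scale arithmetic behind that "tens of millions per hour" figure is straightforward. The sketch below takes the article's estimates at face value (roughly five trillion queries per year, a nine percent error rate) and treats every query as if it triggered an AI Overview, so it is an upper-bound back-of-the-envelope calculation, not an official Google statistic.

```python
# Back-of-the-envelope scale of a 9% error rate, using the article's
# estimates. Real AI Overview coverage is below 100% of queries, so
# this is an upper bound.
QUERIES_PER_YEAR = 5e12   # ~5 trillion searches per year (article's estimate)
ERROR_RATE = 0.09         # 1 - 0.91 accuracy

hours_per_year = 365 * 24                              # 8,760
queries_per_hour = QUERIES_PER_YEAR / hours_per_year   # ~571 million
errors_per_hour = queries_per_hour * ERROR_RATE        # ~51 million

print(f"~{queries_per_hour / 1e6:.0f} million queries per hour")
print(f"~{errors_per_hour / 1e6:.0f} million potentially wrong answers per hour")
```

Even if AI Overviews appeared on only a fraction of those queries, the hourly error count would still run into the millions, which is the core of the article's scale argument.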
Beyond the question of factual integrity, the rise of AI Overviews is fundamentally restructuring the economics of the open web. For decades, the implicit contract between Google and content creators was that the search engine would index information in exchange for sending traffic to the publisher's website. The Oumi report and accompanying industry data suggest this contract is being rewritten in favor of a centralized AI interface. Market analysis from SEO firms indicates that "zero-click" searches—queries where the user finds their answer on the search results page without clicking any external links—now account for nearly 58 percent of all Google activity. When an AI Overview provides a comprehensive summary that satisfies the user's intent, the organic click-through rate for the top-ranking websites can plummet by as much as 65 percent. This shift has led to the emergence of a new discipline known as Generative Engine Optimization (GEO).[5] Marketers and publishers are no longer just fighting for a spot in the traditional "top ten" blue links; they are now competing to be the "grounding" source that the AI cites in its response box, even if the AI doesn't actually require the user to visit their page to get the information.
The competitive landscape of the AI industry is also being shaped by these accuracy benchmarks. While Google dominates the market share, competitors like OpenAI’s SearchGPT and Perplexity have pushed the industry toward a model of "citations-first" search. Google's struggle with grounding—where 56 percent of its correct answers were not supported by the provided links—stands in contrast to some smaller competitors who prioritize transparency over pure speed or conversational flow. Furthermore, research from BrightEdge suggests that Google’s AI is significantly more likely to display a "point of view" than its competitors, showing a 44 percent higher tendency to surface negative brand sentiment or controversial opinions compared to ChatGPT.[6] This indicates that as AI Overviews become more accurate at fetching facts, they may also be becoming more influential in shaping public perception and brand reputation, moving away from the role of a neutral librarian and toward that of an automated editor.
For the average user, the standard disclaimer at the bottom of every AI Overview—stating that AI responses may include mistakes—serves as a legal safeguard for Google, but it does little to mitigate the psychological impact of a system that is right roughly nine times out of ten. Human behavior research suggests that when a system is correct most of the time, users develop "automation bias": they stop verifying the information and begin to trust the output implicitly. This makes the residual errors far more dangerous than they would be if the system were frequently wrong. As Google continues to iterate on its Gemini models, 100 percent accuracy remains an elusive, and perhaps impossible, target for generative AI. The industry is at a crossroads where the convenience of a 91 percent accurate instant answer is being weighed against the long-term health of the information ecosystem that supplies the very data these models need to function.
The future of search will likely depend on whether Google can solve the grounding paradox and ensure that every "correct" answer is also a "verifiable" one. If the search engine continues to move toward a model where users are presented with answers rather than links, the burden of proof will shift entirely to the AI itself. For now, the "nine out of ten" finding serves as both a testament to the incredible technical progress made in natural language processing and a stark reminder of the risks inherent in delegating our collective knowledge to a probabilistic machine. The transition from an index of the world's information to a definitive voice on the world's facts is nearly complete, but the final ten percent of the journey may prove to be the most challenging for the AI industry to navigate.
