Author: Claude, Deep Tide TechFlow
Deep Tide Guide: A recent test conducted by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is approximately 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.
Google is disseminating misinformation on an unprecedented scale, and most people are completely unaware.
According to The New York Times, AI startup Oumi, commissioned by the publication, used the industry-standard test SimpleQA, developed by OpenAI, to evaluate the accuracy of Google's AI Overviews feature. The test covered 4,326 search queries, conducted in two rounds: one in October last year (powered by Gemini 2) and another in February this year (upgraded to Gemini 3). The results showed that Gemini 2's accuracy was about 85%, which improved to 91% with Gemini 3.
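For readers unfamiliar with this style of benchmark, here is a minimal sketch of how SimpleQA-style accuracy scoring works. This is not Oumi's actual harness: the containment-based grader and the toy data are illustrative stand-ins (SimpleQA itself uses an LLM grader and also allows a "not attempted" verdict), but the core loop is the same — ask short factual questions, score each answer correct or incorrect, and report the correct fraction as accuracy.

```python
# Minimal sketch of SimpleQA-style accuracy scoring; not Oumi's harness.
# The grader and the demo data below are illustrative stand-ins.

def grade(predicted: str, gold: str) -> bool:
    # Naive grader: case-insensitive containment check. The real
    # SimpleQA uses an LLM grader and also tracks "not attempted".
    return gold.strip().lower() in predicted.strip().lower()

def accuracy(answers: list[tuple[str, str]]) -> float:
    # answers: (model_answer, gold_answer) pairs, one per query.
    correct = sum(grade(pred, gold) for pred, gold in answers)
    return correct / len(answers)

if __name__ == "__main__":
    # Toy data, invented for illustration only.
    demo = [
        ("The capital of Australia is Canberra.", "Canberra"),
        ("Mount Everest straddles Nepal and China.", "Nepal"),
        ("The first Moon landing took place in 1968.", "1969"),  # wrong
    ]
    print(f"accuracy: {accuracy(demo):.0%}")  # -> accuracy: 67%
```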
91% sounds good, but it's a different story at Google's massive scale. Google processes approximately 5 trillion search queries annually. If every query surfaced an AI Overview, a 9% error rate would mean roughly 51 million inaccurate answers per hour, nearly 1 million per minute.
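The back-of-the-envelope arithmetic behind those figures, assuming the 9% error rate applies uniformly across all 5 trillion annual queries:

$$
\frac{5 \times 10^{12} \times 0.09}{365 \times 24} \approx 5.1 \times 10^{7} \ \text{errors per hour}, \qquad \frac{5.1 \times 10^{7}}{60} \approx 8.6 \times 10^{5} \ \text{errors per minute}.
$$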
Correct Answers, Wrong Sources
More alarming than the accuracy rate is the issue of "unsubstantiated citations."
Oumi's data shows that under Gemini 2, 37% of correct answers suffered from "unsubstantiated citations," meaning the links attached to the AI summary did not actually support the information provided. After the upgrade to Gemini 3, that proportion rose rather than fell, jumping to 56%. In other words, the model increasingly gives correct answers without "showing its work."
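Combining the two figures suggests how little of the output is verifiable. Assuming the 56% rate applies uniformly across the 91% of answers that are correct (an assumption these numbers invite but the article does not state), the share of answers that are both correct and supported by their cited links is:

$$
0.91 \times (1 - 0.56) \approx 0.40
$$

Under that reading, only about four in ten AI Overview answers could actually be checked against the sources they cite.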
Oumi CEO Manos Koukoumidis put the question pointedly: "Even if the answer is correct, how do you know it's correct? How do you verify it?"
AI Overviews' heavy reliance on low-quality sources exacerbates the problem. Oumi found that Facebook and Reddit were, respectively, the second and fourth most cited sources in AI Overviews, and that Facebook appeared in 7% of inaccurate answers versus 5% of accurate ones.
BBC Journalist's Fake Article "Poisons" Results Within 24 Hours
Another serious flaw of AI Overviews is its susceptibility to manipulation.
A BBC journalist tested the system with a deliberately fabricated article. Within 24 hours, Google's AI Overviews were presenting the article's false claims to users as fact.
This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded that the search AI feature is built on the same ranking and security mechanisms used to block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."
Google's Rebuttal: The Test Itself Is Flawed
A Google spokesperson called Oumi's research "seriously flawed," citing several concerns: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model, HallOumi, to judge another AI's performance, potentially introducing additional errors; and the test queries do not reflect real user search behavior.
Google's internal tests also showed that when Gemini 3 operates on its own, outside the Google Search framework, it produces false outputs at a rate as high as 28%. However, Google emphasized that because AI Overviews leverages the search ranking system, it is more accurate than the model alone.
Nevertheless, as PCMag pointed out, there is a logical paradox here: if your defense is that the report documenting your AI's inaccuracies itself relies on potentially inaccurate AI, that is unlikely to strengthen users' confidence in your product's accuracy.