Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

marsbit | Published 2026-04-10 | Last updated 2026-04-10

Summary

A New York Times investigation, conducted with AI startup Oumi, reveals significant accuracy and reliability problems in Google's AI Overviews search feature. Testing on more than 4,300 queries showed accuracy improving from 85% (Gemini 2) to 91% (Gemini 3). However, given Google's scale of roughly 5 trillion searches per year, the remaining 9% error rate translates to over 57 million incorrect answers generated hourly. A more critical issue is the prevalence of unsupported citations. Among correct answers, the share with "unsupported citations" (where the provided source links do not support the AI's claims) worsened, rising from 37% with Gemini 2 to 56% with Gemini 3. This makes it difficult for users to verify the information. The AI also relies heavily on low-quality sources, with Facebook and Reddit being its second and fourth most cited domains. Furthermore, the system is highly susceptible to manipulation. A BBC journalist successfully "poisoned" it by publishing a fake article; within 24 hours, Google's AI was presenting the false information as fact. Google disputed the study's methodology, criticizing the use of the SimpleQA benchmark and of an AI model (Oumi's HallOumi) to grade Google's AI. The company maintains that its internal safeguards and ranking systems improve accuracy beyond the base model's performance.

Author: Claude, Deep Tide TechFlow

Deep Tide Introduction: The latest test by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is about 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.

Google is delivering misinformation to users on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the publication, used SimpleQA, an industry-standard benchmark developed by OpenAI, to evaluate the accuracy of Google's AI Overviews feature. The test covered 4,326 search queries and was run twice: once in October 2025, when AI Overviews was powered by Gemini 2, and again in February 2026, after the upgrade to Gemini 3. Accuracy was about 85% with Gemini 2 and improved to 91% with Gemini 3.

91% sounds good, but the picture changes at Google's scale. Google processes approximately 5 trillion search queries annually. Calculated with a 9% error rate, AI Overviews generates over 57 million inaccurate answers per hour, nearly 1 million per minute.
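The back-of-envelope arithmetic behind the hourly figure can be sketched as follows. This is a minimal illustration, not the methodology used by The New York Times or Oumi: it assumes that every one of the roughly 5 trillion annual searches returns an AI Overview and that the benchmark error rate applies uniformly to real traffic. The function name and constants are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def errors_per_hour(annual_searches: float, error_rate: float) -> float:
    """Expected number of wrong answers per hour under a uniform error rate."""
    return annual_searches * error_rate / HOURS_PER_YEAR


# ~5 trillion searches per year, as cited in the article.
# Assumption: every search produces an AI Overview.
ANNUAL_SEARCHES = 5e12

per_hour_9 = errors_per_hour(ANNUAL_SEARCHES, 0.09)
per_hour_10 = errors_per_hour(ANNUAL_SEARCHES, 0.10)

print(f"9% error rate:  {per_hour_9 / 1e6:.1f} million per hour")
print(f"10% error rate: {per_hour_10 / 1e6:.1f} million per hour")
print(f"9% error rate:  {per_hour_9 / 60 / 1e6:.2f} million per minute")
```

Under these assumptions, a 9% error rate works out to roughly 51 million wrong answers per hour, while a figure of 57 million corresponds to a rounded 10% error rate over the same volume. Either way, the order of magnitude is the same: tens of millions per hour.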

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the problem of citations that do not support the answers they accompany.

Oumi's data shows that in the Gemini 2 era, 37% of correct answers had "unsupported citations," meaning the links attached to the AI summaries did not support the information provided. After upgrading to Gemini 3, this proportion increased instead of decreasing, jumping to 56%. In other words, while the model gives correct answers, it's increasingly failing to "show its work."
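Combining the two reported rates gives a rough sense of how often an answer is both correct and backed by its citations. The sketch below simply multiplies the figures quoted above. It assumes the unsupported-citation rate applies to the same set of correct answers measured in the accuracy test, and the function name is illustrative rather than anything from Oumi's analysis.

```python
def correct_and_supported(accuracy: float, unsupported_rate: float) -> float:
    """Share of all answers that are correct AND have supporting citations."""
    return accuracy * (1.0 - unsupported_rate)


# (accuracy, share of correct answers with unsupported citations), per the article
reported = {
    "Gemini 2": (0.85, 0.37),
    "Gemini 3": (0.91, 0.56),
}

shares = {
    name: correct_and_supported(acc, unsupported)
    for name, (acc, unsupported) in reported.items()
}

for name, share in shares.items():
    print(f"{name}: about {share:.0%} of answers are correct and supported by their links")
```

On these figures, the share of answers that are both correct and checkable from their links falls from roughly 54% to roughly 40%, even as headline accuracy rises.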

Oumi CEO Manos Koukoumidis asked pointedly: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

The problem is exacerbated by AI Overviews' heavy reliance on low-quality sources. Oumi found that Facebook and Reddit are the second and fourth most cited sources for AI Overviews, respectively. In inaccurate answers, Facebook was cited 7% of the time, higher than the 5% in accurate answers.

BBC Journalist's Fake Article "Poisoned" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system by publishing a deliberately fabricated article. In less than 24 hours, Google's AI Overviews was presenting the article's false information to users as fact.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded by saying the search AI feature is built on the same ranking and security mechanisms that block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several objections to Oumi's research. A Google spokesperson called the study "seriously flawed," citing reasons including: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model HallOumi to judge another AI's performance, potentially introducing additional errors; and the test content doesn't reflect real user search behavior.

Google's internal tests also showed that when Gemini 3 operates independently outside the Google Search framework, it produces false outputs at a rate as high as 28%. But Google emphasized that AI Overviews leverages the search ranking system to improve accuracy, performing better than the model itself.

However, PCMag's commentary pointed out the logical paradox: if your defense is that "the report pointing out our AI's inaccuracies itself uses potentially inaccurate AI," that probably doesn't enhance users' confidence in your product's accuracy.

Related Questions

Q: What is the accuracy rate of Google's AI Overviews feature according to the Oumi study?

A: The accuracy rate of Google's AI Overviews was found to be approximately 91% when powered by Gemini 3, an improvement from about 85% with Gemini 2.

Q: How many inaccurate answers does the article estimate Google's AI Overviews produces per hour?

A: Based on Google's annual volume of 5 trillion searches and a 9% error rate, the AI Overviews feature is estimated to produce over 57 million inaccurate answers per hour.

Q: What is the 'unsubstantiated citation' problem identified in the report?

A: The 'unsubstantiated citation' problem refers to instances where AI Overviews provides a correct answer, but the attached source links do not actually support the information given. This issue increased from 37% with Gemini 2 to 56% with Gemini 3.

Q: Which low-quality websites are frequently used as sources by AI Overviews, according to the Oumi data?

A: According to Oumi's data, Facebook and Reddit are the second and fourth most cited sources by AI Overviews, with Facebook being cited more frequently in inaccurate answers.

Q: How did Google respond to the findings of the Oumi study?

A: Google criticized the study, calling it 'seriously flawed.' Its spokesperson argued that the SimpleQA benchmark itself contains inaccuracies, that using an AI (HallOumi) to judge another AI introduces errors, and that the test queries do not reflect real user search behavior.
