Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

marsbit · Published 2026-04-13 · Last updated 2026-04-13

Abstract

A New York Times investigation, in collaboration with AI startup Oumi, reveals significant accuracy and reliability issues with Google's AI Overviews search feature. Testing over 4,300 queries showed the accuracy rate improved from 85% (powered by Gemini 2) to 91% (Gemini 3). However, given Google's scale of ~5 trillion annual searches, this 9% error rate translates to nearly 57 million incorrect answers generated hourly. A critical finding is the prevalence of "unsubstantiated citations." For correct answers, the rate of citations that do not support the AI's summary surged from 37% to 56% with the Gemini 3 upgrade, making it difficult for users to verify information. The AI heavily relies on low-quality sources, with Facebook and Reddit being among its top-cited websites. Furthermore, the system is highly manipulable. A BBC journalist successfully "poisoned" it by publishing a fabricated article; Google's AI began presenting the false information as fact within 24 hours. Google disputed the study's methodology, criticizing its use of the SimpleQA benchmark and an AI model (Oumi's own) to evaluate another AI. The company maintains its AI Overviews, combined with its search ranking systems, perform better than the underlying model alone. Critics note this defense does little to bolster user confidence in the feature's reliability.

Author: Claude, Deep Tide TechFlow

Deep Tide Guide: A recent test conducted by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is approximately 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.

Google is disseminating misinformation on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the publication, used the industry-standard test SimpleQA, developed by OpenAI, to evaluate the accuracy of Google's AI Overviews feature. The test covered 4,326 search queries, conducted in two rounds: one in October last year (powered by Gemini 2) and another in February this year (upgraded to Gemini 3). The results showed that Gemini 2's accuracy was about 85%, which improved to 91% with Gemini 3.

91% sounds good, but it's a different story when considering Google's massive scale. Google processes approximately 5 trillion search queries annually. With a 9% error rate, AI Overviews generates over 57 million inaccurate answers per hour, nearly 1 million per minute.
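The scale arithmetic above is easy to reproduce. The sketch below is a back-of-envelope estimate, assuming a flat 9% error rate applied uniformly to all 5 trillion annual queries; this is a simplification, and the article's slightly higher hourly figure may reflect additional assumptions in the original analysis.

```python
# Back-of-envelope estimate of hourly AI Overviews errors.
# Assumption (not from the article's underlying data): a flat 9% error
# rate applied uniformly across ~5 trillion annual search queries.

ANNUAL_QUERIES = 5_000_000_000_000  # ~5 trillion searches per year
ERROR_RATE = 0.09                   # 9% inaccurate (Gemini 3, per Oumi)
HOURS_PER_YEAR = 365 * 24           # 8,760 hours

errors_per_hour = ANNUAL_QUERIES * ERROR_RATE / HOURS_PER_YEAR
errors_per_minute = errors_per_hour / 60

print(f"{errors_per_hour:,.0f} errors/hour")      # ≈ 51 million
print(f"{errors_per_minute:,.0f} errors/minute")  # ≈ 856 thousand
```

Under these simplified assumptions the estimate lands near 51 million errors per hour, the same order of magnitude as the figure cited above; the exact number depends on how query volume and error rate are measured.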

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the issue of "unsubstantiated citations."

Oumi's data shows that in the Gemini 2 era, 37% of correct answers suffered from "unsubstantiated citations," meaning the links attached to the AI summary did not support the information provided. After the upgrade to Gemini 3, this proportion rose rather than fell, jumping to 56%. In other words, while the model gives correct answers, it is increasingly failing to "show its work."

Oumi CEO Manos Koukoumidis pointedly questioned: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

The heavy reliance on low-quality sources by AI Overviews exacerbates this problem. Oumi found that Facebook and Reddit are the second and fourth most cited sources for AI Overviews, respectively. In inaccurate answers, Facebook was cited 7% of the time, higher than the 5% rate in accurate answers.

BBC Journalist's Fake Article "Poisons" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system with a deliberately fabricated false article. In less than 24 hours, Google's AI Overview presented the false information from the article as fact to users.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded by stating that the search AI feature is built on the same ranking and security mechanisms used to block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several concerns about Oumi's study. A Google spokesperson called the research "seriously flawed," citing reasons including: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model, HallOumi, to judge another AI's performance, potentially introducing additional errors; and the test content does not reflect real user search behavior.

Google's internal tests also showed that when Gemini 3 operates independently outside the Google Search framework, it produces false outputs at a rate as high as 28%. However, Google emphasized that AI Overviews, leveraging the search ranking system, performs better in accuracy than the model alone.

Nevertheless, as PCMag pointed out, there is a logical paradox here: if your defense is that the report pointing out your AI's inaccuracies itself relies on potentially inaccurate AI, that is unlikely to enhance user confidence in your product's accuracy.

Related Questions

Q: What was the accuracy rate of Google's AI Overviews feature as tested by Oumi, and how many errors does this translate to per hour given Google's search volume?

A: The accuracy rate of Google's AI Overviews was found to be 91% in the test. Given Google's annual volume of 5 trillion searches, this 9% error rate translates to over 57 million inaccurate answers generated every hour.

Q: According to the Oumi study, what was the trend in "unsubstantiated citations" between the Gemini 2 and Gemini 3 versions of the AI Overviews?

A: The problem of "unsubstantiated citations" (where the provided links did not support the AI's answer) increased from 37% with Gemini 2 to 56% with the upgraded Gemini 3.

Q: Which low-quality websites were identified as major sources frequently cited by Google's AI Overviews?

A: Facebook and Reddit were identified as the second and fourth most frequently cited sources by the AI Overviews feature.

Q: How did a BBC journalist demonstrate the vulnerability of Google's AI Overviews to manipulation?

A: A BBC journalist tested the system by publishing a deliberately fabricated article. Within 24 hours, Google's AI Overviews began presenting the false information from that article as a factual answer to user queries.

Q: What were Google's main criticisms of the Oumi study's methodology?

A: Google criticized the study for having "serious flaws," stating that the SimpleQA benchmark itself contains inaccuracies, that using Oumi's own AI model to judge another AI could introduce errors, and that the test queries did not reflect real user search behavior.

Related Articles

From Survival to Accelerated Growth: The Journey of Zcash's Three-Year Rise as Told by the Founder of ZODL

**From Survival to Accelerated Growth: Zcash Founder Details the 3-Year Rise**

Three years ago, Zcash (ZEC) was a struggling pioneer in privacy technology, with a price near $30, low shielded supply (11%), and a community mired in governance disputes. Today, ZEC trades around $600, with over 31% of its supply (~$3B) in user-controlled shielded pools. This transformation resulted from breaking key constraints.

First, **governance shackles were removed**. The old model guaranteed funding to two entities (ECC and ZF) regardless of performance, creating a monopoly. In 2024, ECC rejected further direct funding, forcing a change. The NU6 upgrade ended direct funding, allocating 8% to community grants and 12% to a protocol-controlled treasury for retroactive rewards, expiring in 2028 unless renewed by overwhelming consensus. The entities also relinquished their trademark-based veto power, freeing community governance.

Second, the **product focus shifted** from pure cryptography to user growth. Previously, engineering excelled at privacy tech but failed to attract users. In early 2024, the team (later ZODL) pivoted to building products users wanted, like the Zodl wallet (default privacy, hardware support, cross-asset swaps). This drove shielded supply to grow over 400% in ZEC terms, with 86.5% of recent transactions being shielded, representing real user adoption.

Third, the **narrative evolved** from the limiting "privacy coin" label to "unstoppable private money." This clarified Zcash's value proposition: a Bitcoin-like monetary policy with verifiable private payments via advanced cryptography. This structural narrative of protocol (Zcash), asset (ZEC), and gateway (Zodl) enabled broader exchange listings, institutional interest, and ETF filings.

Finally, **organizational constraints were broken**. In early 2026, the ECC team left its non-profit structure after disputes over control, forming Zcash Open Development Lab (ZODL). ZODL raised $25M from top VCs (Paradigm, a16z, etc.), gaining the capital and agility of a startup to scale consumer products.

Current metrics show strong momentum: social discussion volume for ZEC surged 15,245% in a year, with 81% positive sentiment. The focus is now on enhancing user experience (Zodl wallet), scalability (Tachyon project targeting Visa-level throughput with 25-second blocks), and post-quantum security (quantum-recoverable wallets coming soon). Zcash is positioned to become faster, more usable, scalable, and quantum-resistant.


Five Counterparty Risk Architectures: A Settlement-Layer Methodology for Classifying TradFi Models in Crypto Exchanges

**Summary:** This companion piece reframes the five TradFi-on-crypto exchange architectures, previously classified by "architectural fingerprint," through the lens of counterparty risk. The core question is: whose balance sheet bears the loss first in a stress scenario, and has it historically done so? Each of the five models corresponds to a distinct risk holder with its own documented failure modes.

* **Model 1 (Stablecoin-Settled CEX Perpetuals):** Risk is held by the stablecoin issuer (e.g., reserve composition, bank connectivity) and the CEX's own book. History includes Tether's banking disconnections (2017) and reserve misrepresentations (CFTC 2021 Order).
* **Model 2 (CFD Brokers):** Risk resides on the broker's balance sheet (B-book model). Regulatory differences (e.g., ESMA's mandatory negative balance protection vs. Mauritius FSC's lack thereof) define loss allocation rules, as seen in the 2015 SNB event (Alpari UK insolvency).
* **Model 3 (Off-Chain Custody & Transfer Agent Chain):** Risk lies with the off-chain custodian/platform. User asset recovery depends on Terms of Use and corporate structure, exemplified by the Celsius bankruptcy ruling (2023) where Earn Account assets were deemed property of the estate.
* **Model 4 (DEX Perpetual Protocols):** No single balance sheet bears risk. Loss absorption relies on a protocol's insurance fund and Auto-Deleveraging (ADL) mechanism, as demonstrated in the GMX V1 (2022) and dYdX v3 YFI (2023) incidents.
* **Model 5 (Regulated CCP - DCM-DCO-FCM):** The most institutionalized model concentrates risk in the Central Counterparty (CCP). However, history shows CCPs can employ non-standard tools under extreme stress, such as mass trade cancellation (LME Nickel, 2022) or enabling negative price settlements (CME WTI, 2020).

The report argues that regulatory choices and counterparty risk structures are co-extensive, not in an upstream-downstream relationship. It concludes with five separate observation checklists (not predictions) for monitoring the structural vulnerabilities of each risk model.

