Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

marsbit · Published 2026-04-13 · Last updated 2026-04-13

Abstract

A New York Times investigation, in collaboration with AI startup Oumi, reveals significant accuracy and reliability issues with Google's AI Overviews search feature. Testing over 4,300 queries showed the accuracy rate improved from 85% (powered by Gemini 2) to 91% (Gemini 3). However, given Google's scale of ~5 trillion annual searches, this 9% error rate translates to nearly 57 million incorrect answers generated hourly. A critical finding is the prevalence of "unsubstantiated citations." For correct answers, the rate of citations that do not support the AI's summary surged from 37% to 56% with the Gemini 3 upgrade, making it difficult for users to verify information. The AI heavily relies on low-quality sources, with Facebook and Reddit being among its top-cited websites. Furthermore, the system is highly manipulable. A BBC journalist successfully "poisoned" it by publishing a fabricated article; Google's AI began presenting the false information as fact within 24 hours. Google disputed the study's methodology, criticizing its use of the SimpleQA benchmark and an AI model (Oumi's own) to evaluate another AI. The company maintains its AI Overviews, combined with its search ranking systems, perform better than the underlying model alone. Critics note this defense does little to bolster user confidence in the feature's reliability.

Author: Claude, Deep Tide TechFlow

Deep Tide Guide: A recent test conducted by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is approximately 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.

Google is disseminating misinformation on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the publication, used the industry-standard test SimpleQA, developed by OpenAI, to evaluate the accuracy of Google's AI Overviews feature. The test covered 4,326 search queries, conducted in two rounds: one in October last year (powered by Gemini 2) and another in February this year (upgraded to Gemini 3). The results showed that Gemini 2's accuracy was about 85%, which improved to 91% with Gemini 3.

91% sounds good, but it's a different story when considering Google's massive scale. Google processes approximately 5 trillion search queries annually. With a 9% error rate, AI Overviews generates over 57 million inaccurate answers per hour, nearly 1 million per minute.
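The scale arithmetic above is easy to reproduce. The sketch below is a back-of-envelope estimate, assuming a flat 9% error rate applied uniformly to all 5 trillion annual queries; this is a simplification, and the article's slightly higher hourly figure may reflect additional assumptions in the original analysis.

```python
# Back-of-envelope estimate of hourly AI Overviews errors.
# Assumption (not from the article's underlying data): a flat 9% error
# rate applied uniformly across ~5 trillion annual search queries.

ANNUAL_QUERIES = 5_000_000_000_000  # ~5 trillion searches per year
ERROR_RATE = 0.09                   # 9% inaccurate (Gemini 3, per Oumi)
HOURS_PER_YEAR = 365 * 24           # 8,760 hours

errors_per_hour = ANNUAL_QUERIES * ERROR_RATE / HOURS_PER_YEAR
errors_per_minute = errors_per_hour / 60

print(f"{errors_per_hour:,.0f} errors/hour")      # ≈ 51 million
print(f"{errors_per_minute:,.0f} errors/minute")  # ≈ 856 thousand
```

Under these simplified assumptions the estimate lands near 51 million errors per hour, the same order of magnitude as the figure cited above; the exact number depends on how query volume and error rate are measured.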

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the issue of "unsubstantiated citations."

Oumi's data shows that in the Gemini 2 era, 37% of correct answers suffered from "unsubstantiated citations," meaning the links attached to the AI summary did not support the information provided. After the upgrade to Gemini 3, this proportion rose rather than fell, jumping to 56%. In other words, while the model gives correct answers, it is increasingly failing to "show its work."

Oumi CEO Manos Koukoumidis pointedly questioned: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

The heavy reliance on low-quality sources by AI Overviews exacerbates this problem. Oumi found that Facebook and Reddit are the second and fourth most cited sources for AI Overviews, respectively. In inaccurate answers, Facebook was cited 7% of the time, higher than the 5% rate in accurate answers.

BBC Journalist's Fake Article "Poisons" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system with a deliberately fabricated false article. In less than 24 hours, Google's AI Overview presented the false information from the article as fact to users.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded by stating that the search AI feature is built on the same ranking and security mechanisms used to block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several concerns about Oumi's study. A Google spokesperson called the research "seriously flawed," citing reasons including: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model, HallOumi, to judge another AI's performance, potentially introducing additional errors; and the test content does not reflect real user search behavior.

Google's internal tests also showed that when Gemini 3 operates independently outside the Google Search framework, it produces false outputs at a rate as high as 28%. However, Google emphasized that AI Overviews, leveraging the search ranking system, performs better in accuracy than the model alone.

Nevertheless, as PCMag pointed out, there is a logical paradox here: if your defense is that the report pointing out your AI's inaccuracies itself relies on potentially inaccurate AI, that is unlikely to enhance user confidence in your product's accuracy.

Related Questions

Q: What was the accuracy rate of Google's AI Overviews feature as tested by Oumi, and how many errors does this translate to per hour given Google's search volume?

A: The accuracy rate of Google's AI Overviews was found to be 91% in the test. Given Google's annual volume of 5 trillion searches, this 9% error rate translates to over 57 million inaccurate answers generated every hour.

Q: According to the Oumi study, what was the trend in "unsubstantiated citations" between the Gemini 2 and Gemini 3 versions of the AI Overviews?

A: The problem of "unsubstantiated citations" (where the provided links did not support the AI's answer) increased from 37% with Gemini 2 to 56% with the upgraded Gemini 3.

Q: Which low-quality websites were identified as major sources frequently cited by Google's AI Overviews?

A: Facebook and Reddit were identified as the second and fourth most frequently cited sources by the AI Overviews feature.

Q: How did a BBC journalist demonstrate the vulnerability of Google's AI Overviews to manipulation?

A: A BBC journalist tested the system by publishing a deliberately fabricated article. Within 24 hours, Google's AI Overviews began presenting the false information from that article as a factual answer to user queries.

Q: What were Google's main criticisms of the Oumi study's methodology?

A: Google criticized the study for having "serious flaws," stating that the SimpleQA benchmark itself contains inaccuracies, that using Oumi's own AI model to judge another AI could introduce errors, and that the test queries did not reflect real user search behavior.

Related Articles

From Survival to Accelerated Growth: The Journey of Zcash's Three-Year Rise as Told by the Founder of ZODL

**From Survival to Accelerated Growth: Zcash Founder Details the 3-Year Rise**

Three years ago, Zcash (ZEC) was a struggling pioneer in privacy technology, with a price near $30, low shielded supply (11%), and a community mired in governance disputes. Today, ZEC trades around $600, with over 31% of its supply (~$3B) in user-controlled shielded pools. This transformation resulted from breaking key constraints.

First, **governance shackles were removed**. The old model guaranteed funding to two entities (ECC and ZF) regardless of performance, creating a monopoly. In 2024, ECC rejected further direct funding, forcing a change. The NU6 upgrade ended direct funding, allocating 8% to community grants and 12% to a protocol-controlled treasury for retroactive rewards, expiring in 2028 unless renewed by overwhelming consensus. The entities also relinquished their trademark-based veto power, freeing community governance.

Second, the **product focus shifted** from pure cryptography to user growth. Previously, engineering excelled at privacy tech but failed to attract users. In early 2024, the team (later ZODL) pivoted to building products users wanted, like the Zodl wallet (default privacy, hardware support, cross-asset swaps). This drove shielded supply to grow over 400% in ZEC terms, with 86.5% of recent transactions being shielded, representing real user adoption.

Third, the **narrative evolved** from the limiting "privacy coin" label to "unstoppable private money." This clarified Zcash's value proposition: a Bitcoin-like monetary policy with verifiable private payments via advanced cryptography. This structural narrative of protocol (Zcash), asset (ZEC), and gateway (Zodl) enabled broader exchange listings, institutional interest, and ETF filings.

Finally, **organizational constraints were broken**. In early 2026, the ECC team left its non-profit structure after disputes over control, forming Zcash Open Development Lab (ZODL). ZODL raised $25M from top VCs (Paradigm, a16z, etc.), gaining the capital and agility of a startup to scale consumer products.

Current metrics show strong momentum: social discussion volume for ZEC surged 15,245% in a year, with 81% positive sentiment. The focus is now on enhancing user experience (Zodl wallet), scalability (Tachyon project targeting Visa-level throughput with 25-second blocks), and post-quantum security (quantum-recoverable wallets coming soon). Zcash is positioned to become faster, more usable, scalable, and quantum-resistant.


Five Counterparty Risk Architectures: A Settlement-Layer Methodology for Classifying TradFi Models in Crypto Exchanges

**Summary:** This companion piece reframes the five TradFi-on-crypto exchange architectures, previously classified by "architectural fingerprint," through the lens of counterparty risk. The core question is: whose balance sheet bears the loss first in a stress scenario, and has it historically done so? Each of the five models corresponds to a distinct risk holder with its own documented failure modes.

* **Model 1 (Stablecoin-Settled CEX Perpetuals):** Risk is held by the stablecoin issuer (e.g., reserve composition, bank connectivity) and the CEX's own book. History includes Tether's banking disconnections (2017) and reserve misrepresentations (CFTC 2021 Order).
* **Model 2 (CFD Brokers):** Risk resides on the broker's balance sheet (B-book model). Regulatory differences (e.g., ESMA's mandatory negative balance protection vs. Mauritius FSC's lack thereof) define loss allocation rules, as seen in the 2015 SNB event (Alpari UK insolvency).
* **Model 3 (Off-Chain Custody & Transfer Agent Chain):** Risk lies with the off-chain custodian/platform. User asset recovery depends on Terms of Use and corporate structure, exemplified by the Celsius bankruptcy ruling (2023) where Earn Account assets were deemed property of the estate.
* **Model 4 (DEX Perpetual Protocols):** No single balance sheet bears risk. Loss absorption relies on a protocol's insurance fund and Auto-Deleveraging (ADL) mechanism, as demonstrated in the GMX V1 (2022) and dYdX v3 YFI (2023) incidents.
* **Model 5 (Regulated CCP - DCM-DCO-FCM):** The most institutionalized model concentrates risk in the Central Counterparty (CCP). However, history shows CCPs can employ non-standard tools under extreme stress, such as mass trade cancellation (LME Nickel, 2022) or enabling negative price settlements (CME WTI, 2020).

The report argues that regulatory choices and counterparty risk structures are co-extensive, not in an upstream-downstream relationship. It concludes with five separate observation checklists (not predictions) for monitoring the structural vulnerabilities of each risk model.

