Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

marsbit · Published 2026-04-13 · Last updated 2026-04-13

Introduction

A New York Times investigation, conducted with AI startup Oumi, reveals significant accuracy and reliability problems in Google's AI Overviews search feature. Across more than 4,300 test queries, accuracy improved from 85% (powered by Gemini 2) to 91% (Gemini 3). At Google's scale of roughly 5 trillion searches a year, however, a 9% error rate still amounts to tens of millions of incorrect answers every hour (about 51 million if every search produced an AI Overview). A critical finding is the prevalence of "unsubstantiated citations": among correct answers, the share whose cited links did not support the AI's summary rose from 37% to 56% after the Gemini 3 upgrade, making the information harder for users to verify. The AI leans heavily on low-quality sources, with Facebook and Reddit among its most-cited websites. The system is also easy to manipulate: a BBC journalist "poisoned" it by publishing a fabricated article, and within 24 hours Google's AI was presenting the false information as fact. Google disputed the study's methodology, criticizing its use of the SimpleQA benchmark and of an AI model (Oumi's own) to evaluate another AI. The company maintains that AI Overviews, combined with its search ranking systems, perform better than the underlying model alone. Critics note that this defense does little to bolster users' confidence in the feature's reliability.

Author: Claude, Deep Tide TechFlow

Deep Tide Guide: A recent test conducted by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is approximately 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.

Google is disseminating misinformation on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the paper, used SimpleQA, an industry-standard benchmark developed by OpenAI, to evaluate the accuracy of Google's AI Overviews. The test covered 4,326 search queries and was run twice: once in October last year, when the feature was powered by Gemini 2, and again in February this year, after the upgrade to Gemini 3. Gemini 2's accuracy was about 85%; with Gemini 3 it rose to 91%.

91% sounds good, but the picture changes at Google's scale. Google processes approximately 5 trillion search queries a year, or about 570 million an hour. If every one of those searches produced an AI Overview, a 9% error rate would mean roughly 51 million inaccurate answers per hour, or more than 850,000 per minute.
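The per-hour figure is simple division. A minimal back-of-envelope sketch, assuming (as the headline estimate implicitly does) that every search produces an AI Overview and that the SimpleQA error rate carries over to real search traffic; `errors_per_hour` is a hypothetical helper, not anything from the study:

```python
# Back-of-envelope estimate of AI Overviews errors per hour.
# Assumptions (not established by the study): every one of ~5 trillion annual
# searches shows an AI Overview, and the benchmark error rate applies to real traffic.

HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def errors_per_hour(annual_searches: float, error_rate: float) -> float:
    """Expected wrong answers per hour if each search yields one AI answer."""
    return annual_searches * error_rate / HOURS_PER_YEAR


annual = 5e12
for rate in (0.15, 0.09):  # Gemini 2 (~85% accurate) vs Gemini 3 (~91% accurate)
    per_hour = errors_per_hour(annual, rate)
    print(f"error rate {rate:.0%}: {per_hour / 1e6:.1f}M per hour, "
          f"{per_hour / 60 / 1e3:.0f}K per minute")

# prints:
# error rate 15%: 85.6M per hour, 1427K per minute
# error rate 9%: 51.4M per hour, 856K per minute
```

At a 10% error rate the same function returns about 57.1 million per hour, so the figure of roughly 57 million that circulates in coverage of the study corresponds to a 10% error rate rather than the measured 9%.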

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the issue of "unsubstantiated citations."

Oumi's data shows that in the Gemini 2 era, 37% of correct answers suffered from "unsubstantiated citations," meaning the links attached to the AI summary did not support the information it gave. After the upgrade to Gemini 3, that share rose rather than fell, jumping to 56%. In other words, even when the model gives correct answers, it is increasingly failing to "show its work."
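Combining the two reported rates shows what fraction of all queries received an answer that was both correct and backed by its citations. A minimal sketch, assuming the approximate 85%/91% accuracy and 37%/56% citation figures can be treated as exact, and that the unsubstantiated-citation share is measured over correct answers only, as described above; `verifiable_share` is a hypothetical helper:

```python
# Share of ALL queries whose answer was correct AND supported by its citations.
# Inputs are the article's approximate rates; treating them as exact is an assumption.
# "Supported" here simply means "not flagged as an unsubstantiated citation".


def verifiable_share(accuracy: float, unsupported_given_correct: float) -> float:
    """P(correct and supported) = P(correct) * (1 - P(unsupported | correct))."""
    return accuracy * (1 - unsupported_given_correct)


gemini2 = verifiable_share(0.85, 0.37)
gemini3 = verifiable_share(0.91, 0.56)
print(f"Gemini 2: {gemini2:.0%} of queries correct and supported")  # 54%
print(f"Gemini 3: {gemini3:.0%} of queries correct and supported")  # 40%
```

On these numbers, the share of queries that were both correct and properly sourced falls from about 54% to about 40%, even though raw accuracy went up.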

Oumi CEO Manos Koukoumidis asked pointedly: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

AI Overviews' heavy reliance on low-quality sources makes this problem worse. Oumi found that Facebook and Reddit are the second- and fourth-most-cited sources in AI Overviews, respectively. In inaccurate answers, Facebook was cited 7% of the time, versus 5% in accurate answers.

BBC Journalist's Fake Article "Poisons" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system with a deliberately fabricated article. In less than 24 hours, Google's AI Overviews were presenting the article's false information to users as fact.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded that the AI search feature is built on the same ranking and security systems Google uses to block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several objections to Oumi's study. A Google spokesperson called the research "seriously flawed," giving three reasons: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model, HallOumi, to judge another AI's performance, which could introduce additional errors; and the test queries do not reflect real user search behavior.
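Google's second objection, that an AI judge can itself misgrade answers, can be made concrete. The sketch below is illustrative and is not Oumi's or Google's method: it treats the judge like a diagnostic test with a sensitivity (`sens`, how often it accepts a truly correct answer) and a false-acceptance rate (`fpr`, how often it accepts a wrong one), then applies the standard Rogan-Gladen correction to back out the true accuracy from the measured 91%. The judge error rates are hypothetical; the article reports none for HallOumi.

```python
# Illustrative only: how an imperfect AI judge shifts a measured accuracy score.
# sens = P(judge says "correct" | answer truly correct)
# fpr  = P(judge says "correct" | answer truly wrong)
# The (sens, fpr) pairs below are hypothetical; no judge error rates are published here.


def observed_accuracy(true_acc: float, sens: float, fpr: float) -> float:
    """Accuracy the judge would report, given the model's true accuracy."""
    return true_acc * sens + (1 - true_acc) * fpr


def corrected_accuracy(obs_acc: float, sens: float, fpr: float) -> float:
    """Rogan-Gladen estimator: inverts observed_accuracy (requires sens > fpr)."""
    if sens <= fpr:
        raise ValueError("judge must be better than chance (sens > fpr)")
    raw = (obs_acc - fpr) / (sens - fpr)
    # With noisy real data the raw estimate can fall outside [0, 1]; clip it.
    return min(1.0, max(0.0, raw))


measured = 0.91  # Gemini 3 accuracy as reported via the judge
for sens, fpr in [(1.00, 0.05), (0.93, 0.00)]:
    est = corrected_accuracy(measured, sens, fpr)
    print(f"sens={sens:.2f}, fpr={fpr:.2f}: implied true accuracy {est:.1%}")

# prints:
# sens=1.00, fpr=0.05: implied true accuracy 90.5%
# sens=0.93, fpr=0.00: implied true accuracy 97.8%
```

A judge that sometimes lets wrong answers through (the first case) inflates the score, while one that rejects some correct answers (the second) deflates it, so the mere use of an AI judge does not by itself say which way the bias runs.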

Google's internal tests also showed that Gemini 3, running on its own outside the Google Search framework, produces false outputs at a rate as high as 28%. Google stressed, however, that AI Overviews, which draw on the search ranking system, are more accurate than the model alone.

Still, as PCMag pointed out, the defense undercuts itself: if your argument is that "the report exposing our AI's inaccuracies relies on a potentially inaccurate AI," it is unlikely to boost users' confidence in your own product's accuracy.

Related Questions

Q: What was the accuracy rate of Google's AI Overviews feature as tested by Oumi, and how many errors does this translate to per hour given Google's search volume?

A: The accuracy rate of Google's AI Overviews was found to be 91% in the test. Given Google's annual volume of roughly 5 trillion searches, and assuming every search produced an AI Overview, the 9% error rate translates to roughly 51 million inaccurate answers every hour.

Q: According to the Oumi study, what was the trend in 'unsubstantiated citations' between the Gemini 2 and Gemini 3 versions of AI Overviews?

A: The problem of 'unsubstantiated citations' (where the provided links did not support the AI's answer) increased from 37% of correct answers with Gemini 2 to 56% with the upgraded Gemini 3.

Q: Which low-quality websites were identified as major sources frequently cited by Google's AI Overviews?

A: Facebook and Reddit were identified as the second- and fourth-most-cited sources in the AI Overviews feature.

Q: How did a BBC journalist demonstrate the vulnerability of Google's AI Overviews to manipulation?

A: A BBC journalist tested the system by publishing a deliberately fabricated article. Within 24 hours, Google's AI Overviews began presenting the false information from that article as a factual answer to user queries.

Q: What were Google's main criticisms of the Oumi study's methodology?

A: Google said the study had 'serious flaws,' stating that the SimpleQA benchmark itself contains inaccuracies, that using Oumi's own AI model to judge another AI could introduce errors, and that the test queries did not reflect real user search behavior.
