Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

marsbitОпубликовано 2026-04-10Обновлено 2026-04-10

Введение

A New York Times investigation, in collaboration with AI startup Oumi, reveals significant accuracy and reliability issues with Google's AI Overviews search feature. Testing over 4,300 queries showed the accuracy rate improved from 85% (Gemini 2) to 91% (Gemini 3). However, given Google's scale of ~5 trillion annual searches, this 9% error rate translates to over 57 million incorrect answers generated hourly. A more critical issue is the prevalence of unsubstantiated citations. For correct answers, the rate of "unfounded citations"—where provided source links do not support the AI's claims—worsened, rising from 37% with Gemini 2 to 56% with Gemini 3. This makes it difficult for users to verify the information. The AI also heavily relies on low-quality sources, with Facebook and Reddit being its second and fourth most cited domains. Furthermore, the system is highly susceptible to manipulation. A BBC journalist successfully "poisoned" it by publishing a fake article; Google's AI began presenting the false information as fact within 24 hours. Google disputed the study's methodology, criticizing the use of the SimpleQA benchmark and an AI model (Oumi's HallOumi) to evaluate its own AI. The company maintains that its internal safeguards and ranking systems improve accuracy beyond the base model's performance.

Author: Claude, Deep Tide TechFlow

Deep Tide Introduction: The latest test by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is about 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.

Google is delivering misinformation to users on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the publication, used the industry-standard test SimpleQA developed by OpenAI to evaluate the accuracy of Google's AI Overviews feature. The test covered 4,326 search queries, conducting one round in October last year (powered by Gemini 2) and another in February this year (upgraded to Gemini 3). The results showed that Gemini 2's accuracy was about 85%, which improved to 91% with Gemini 3.

91% sounds good, but it's a different story when considering Google's scale. Google processes approximately 5 trillion search queries annually. Calculating with a 9% error rate, AI Overviews generates over 57 million inaccurate answers per hour, nearly 1 million per minute.

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the issue of "unanchored" citation sources.

Oumi's data shows that in the Gemini 2 era, 37% of correct answers had "unsupported citations," meaning the links attached to the AI summaries did not support the information provided. After upgrading to Gemini 3, this proportion increased instead of decreasing, jumping to 56%. In other words, while the model gives correct answers, it's increasingly failing to "show its work."

Oumi CEO Manos Koukoumidis pointedly questioned: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

The problem is exacerbated by AI Overviews' heavy reliance on low-quality sources. Oumi found that Facebook and Reddit are the second and fourth most cited sources for AI Overviews, respectively. In inaccurate answers, Facebook was cited 7% of the time, higher than the 5% in accurate answers.

BBC Journalist's Fake Article "Poisoned" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system with a deliberately fabricated false article. In less than 24 hours, Google's AI Overview presented the false information from the article as fact to users.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded by saying the search AI feature is built on the same ranking and security mechanisms that block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several objections to Oumi's research. A Google spokesperson called the study "seriously flawed," citing reasons including: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model HallOumi to judge another AI's performance, potentially introducing additional errors; and the test content doesn't reflect real user search behavior.

Google's internal tests also showed that when Gemini 3 operates independently outside the Google Search framework, it produces false outputs at a rate as high as 28%. But Google emphasized that AI Overviews leverages the search ranking system to improve accuracy, performing better than the model itself.

However, as PCMag's commentary pointed out the logical paradox: If your defense is that "the report pointing out our AI's inaccuracies itself uses potentially inaccurate AI," this probably doesn't enhance users' confidence in your product's accuracy.

Связанные с этим вопросы

QWhat is the accuracy rate of Google's AI Overviews feature according to the Oumi study?

AThe accuracy rate of Google's AI Overviews was found to be approximately 91% when powered by Gemini 3, an improvement from about 85% with Gemini 2.

QHow many inaccurate answers does the article estimate Google's AI Overviews produces per hour?

ABased on Google's annual volume of 5 trillion searches and a 9% error rate, the AI Overviews feature is estimated to produce over 57 million inaccurate answers per hour.

QWhat is the 'unsubstantiated citation' problem identified in the report?

AThe 'unsubstantiated citation' problem refers to instances where the AI Overviews provides a correct answer, but the attached source links do not actually support the information given. This issue increased from 37% with Gemini 2 to 56% with Gemini 3.

QWhich low-quality websites are frequently used as sources by AI Overviews, according to the Oumi data?

AAccording to Oumi's data, Facebook and Reddit are the second and fourth most cited sources by AI Overviews, with Facebook being cited more frequently in inaccurate answers.

QHow did Google respond to the findings of the Oumi study?

AGoogle criticized the study, calling it 'seriously flawed.' Their spokesperson argued that the SimpleQA benchmark itself contains inaccuracies, that using an AI (HallOumi) to judge another AI introduces errors, and that the test queries do not reflect real user search behavior.

Похожее

Crypto Industry Watches As Poland Advances Long-Delayed Regulatory Bill

Poland's parliament has passed a long-delayed cryptocurrency regulatory bill, driven by urgency following a fraud scandal at a local exchange. Prime Minister Donald Tusk cited the Zondacrypto case, where users lost access to funds and which had alleged Russian ties, as proof of the need for investor protections. The approved bill grants Poland's Financial Supervision Authority (KNF) broad powers to oversee the market, impose penalties, and block accounts or transactions. However, the legislation faces potential rejection, as it retains contentious account-blocking provisions that led President Karol Nawrocki to veto two prior versions. Critics argue it lacks sufficient judicial oversight. A third veto would prolong regulatory uncertainty as Poland races to align with the EU's MiCA framework by a July deadline. The bill prevailed over three other competing proposals in a parliamentary vote.

bitcoinist14 мин. назад

Crypto Industry Watches As Poland Advances Long-Delayed Regulatory Bill

bitcoinist14 мин. назад

Winter for Crypto IPOs: Consensys and Ledger Withdraw Applications

The crypto IPO window is tightening significantly in 2026, marked by prominent companies delaying or pausing their public listing plans. Following a successful 2025 "harvest year" that saw Circle, Bullish, and Gemini go public amidst a bull market, the tide has turned. Consensys, developer of MetaMask, recently postponed its IPO until at least fall 2026. Hardware wallet leader Ledger also suspended its planned US listing due to unfavorable market conditions, with Kraken having previously delayed its own plans. This shift is driven by a cooling market in 2026, characterized by a significant Bitcoin price correction, declining trading volumes, and reduced investor risk appetite for crypto stocks. The poor post-IPO performance of 2025 listings like Circle and Bullish, which saw major share price declines, has heightened investor caution. This contrasts sharply with the current AI sector, where companies like SpaceX, OpenAI, and Anthropic are commanding massive valuations and investor enthusiasm based on narratives of stable, exponential growth. Crypto companies now face pressure to transition from hype-driven models to demonstrating reliable cash flows and robust compliance. While the paused IPO plans may lead to valuation resets and affect ecosystem liquidity, they also accelerate industry consolidation toward stronger, more compliant infrastructure players. A potential recovery in Bitcoin's price and clearer regulations could reopen the IPO window in the latter half of 2026.

marsbit31 мин. назад

Winter for Crypto IPOs: Consensys and Ledger Withdraw Applications

marsbit31 мин. назад

ChatGPT Can Manage Your Money for You. Would You Trust It with Your Bank Account?

OpenAI has launched a personal finance tool for ChatGPT, currently in preview for US-based ChatGPT Pro users. This feature allows users to connect their bank and investment accounts (via Plaid, supporting over 12,000 institutions) directly to ChatGPT. It analyzes transactions, generates visual dashboards, and offers conversational financial advice—such as budgeting or planning for major purchases—based on the user's actual data. This move follows OpenAI's acquisitions of fintech startups Roi and Hiro Finance, signaling a strategic push into vertical "super assistant" applications, similar to its earlier health-focused feature. However, the launch has sparked significant privacy concerns. Critics question the safety of granting such sensitive financial access to an AI, especially amid ongoing lawsuits alleging OpenAI shared user chat data with third parties like Meta and Google. OpenAI emphasizes that ChatGPT only reads data (no transaction capabilities), deletes it within 30 days if disconnected, and offers opt-out options for model training. Yet, trust remains a major hurdle. The trend reflects a broader industry shift: AI companies like Anthropic and Perplexity are also targeting high-value, data-rich domains like finance and health. While technically promising, the tool operates in a regulatory gray area—it provides personalized guidance but disclaims formal financial advice or liability. Ultimately, OpenAI's challenge is convincing users to trust an AI with their most private financial information.

marsbit32 мин. назад

ChatGPT Can Manage Your Money for You. Would You Trust It with Your Bank Account?

marsbit32 мин. назад

Hyperliquid Sued by Two Major Traditional Exchanges

In a notable development within the financial sector, two of the world's largest traditional exchanges, CME Group and Intercontinental Exchange (ICE), have jointly filed a complaint with the U.S. Congress and the Commodity Futures Trading Commission (CFTC). They are demanding stricter regulatory oversight of the cryptocurrency derivatives platform Hyperliquid. This move signals escalating tensions between established traditional financial institutions and emerging digital asset trading venues, highlighting the ongoing regulatory debate surrounding the crypto industry.

marsbit33 мин. назад

Hyperliquid Sued by Two Major Traditional Exchanges

Hyperliquid, a cryptocurrency derivatives platform, is facing formal complaints from two major traditional exchanges, CME (Chicago Mercantile Exchange) and ICE (Intercontinental Exchange, parent company of the New York Stock Exchange). The firms have jointly approached the U.S. Congress and the CFTC (Commodity Futures Trading Commission), urging stricter regulatory oversight of Hyperliquid's operations. The action highlights growing tensions between established traditional financial institutions and emerging crypto-native platforms in the derivatives market.

链捕手4 ч. назад

Торговля

Спот

Фьючерсы

Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

Введение

Correct Answers, Wrong Sources

BBC Journalist's Fake Article "Poisoned" Results Within 24 Hours

Google's Rebuttal: The Test Itself Is Flawed

Связанные с этим вопросы

Похожее

Crypto Industry Watches As Poland Advances Long-Delayed Regulatory Bill

Winter for Crypto IPOs: Consensys and Ledger Withdraw Applications

ChatGPT Can Manage Your Money for You. Would You Trust It with Your Bank Account?

Hyperliquid Sued by Two Major Traditional Exchanges

Hyperliquid Sued by Two Major Traditional Exchanges

Торговля

Популярные категории

Популярные теги