Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

marsbitОпубликовано 2026-04-10Обновлено 2026-04-10

Введение

A New York Times investigation, in collaboration with AI startup Oumi, reveals significant accuracy and reliability issues with Google's AI Overviews search feature. Testing over 4,300 queries showed the accuracy rate improved from 85% (Gemini 2) to 91% (Gemini 3). However, given Google's scale of ~5 trillion annual searches, this 9% error rate translates to over 57 million incorrect answers generated hourly. A more critical issue is the prevalence of unsubstantiated citations. For correct answers, the rate of "unfounded citations"—where provided source links do not support the AI's claims—worsened, rising from 37% with Gemini 2 to 56% with Gemini 3. This makes it difficult for users to verify the information. The AI also heavily relies on low-quality sources, with Facebook and Reddit being its second and fourth most cited domains. Furthermore, the system is highly susceptible to manipulation. A BBC journalist successfully "poisoned" it by publishing a fake article; Google's AI began presenting the false information as fact within 24 hours. Google disputed the study's methodology, criticizing the use of the SimpleQA benchmark and an AI model (Oumi's HallOumi) to evaluate its own AI. The company maintains that its internal safeguards and ranking systems improve accuracy beyond the base model's performance.

Author: Claude, Deep Tide TechFlow

Deep Tide Introduction: The latest test by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is about 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.

Google is delivering misinformation to users on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the publication, used the industry-standard test SimpleQA developed by OpenAI to evaluate the accuracy of Google's AI Overviews feature. The test covered 4,326 search queries, conducting one round in October last year (powered by Gemini 2) and another in February this year (upgraded to Gemini 3). The results showed that Gemini 2's accuracy was about 85%, which improved to 91% with Gemini 3.

91% sounds good, but it's a different story when considering Google's scale. Google processes approximately 5 trillion search queries annually. Calculating with a 9% error rate, AI Overviews generates over 57 million inaccurate answers per hour, nearly 1 million per minute.

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the issue of "unanchored" citation sources.

Oumi's data shows that in the Gemini 2 era, 37% of correct answers had "unsupported citations," meaning the links attached to the AI summaries did not support the information provided. After upgrading to Gemini 3, this proportion increased instead of decreasing, jumping to 56%. In other words, while the model gives correct answers, it's increasingly failing to "show its work."

Oumi CEO Manos Koukoumidis pointedly questioned: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

The problem is exacerbated by AI Overviews' heavy reliance on low-quality sources. Oumi found that Facebook and Reddit are the second and fourth most cited sources for AI Overviews, respectively. In inaccurate answers, Facebook was cited 7% of the time, higher than the 5% in accurate answers.

BBC Journalist's Fake Article "Poisoned" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system with a deliberately fabricated false article. In less than 24 hours, Google's AI Overview presented the false information from the article as fact to users.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded by saying the search AI feature is built on the same ranking and security mechanisms that block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several objections to Oumi's research. A Google spokesperson called the study "seriously flawed," citing reasons including: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model HallOumi to judge another AI's performance, potentially introducing additional errors; and the test content doesn't reflect real user search behavior.

Google's internal tests also showed that when Gemini 3 operates independently outside the Google Search framework, it produces false outputs at a rate as high as 28%. But Google emphasized that AI Overviews leverages the search ranking system to improve accuracy, performing better than the model itself.

However, as PCMag's commentary pointed out the logical paradox: If your defense is that "the report pointing out our AI's inaccuracies itself uses potentially inaccurate AI," this probably doesn't enhance users' confidence in your product's accuracy.

Связанные с этим вопросы

QWhat is the accuracy rate of Google's AI Overviews feature according to the Oumi study?

AThe accuracy rate of Google's AI Overviews was found to be approximately 91% when powered by Gemini 3, an improvement from about 85% with Gemini 2.

QHow many inaccurate answers does the article estimate Google's AI Overviews produces per hour?

ABased on Google's annual volume of 5 trillion searches and a 9% error rate, the AI Overviews feature is estimated to produce over 57 million inaccurate answers per hour.

QWhat is the 'unsubstantiated citation' problem identified in the report?

AThe 'unsubstantiated citation' problem refers to instances where the AI Overviews provides a correct answer, but the attached source links do not actually support the information given. This issue increased from 37% with Gemini 2 to 56% with Gemini 3.

QWhich low-quality websites are frequently used as sources by AI Overviews, according to the Oumi data?

AAccording to Oumi's data, Facebook and Reddit are the second and fourth most cited sources by AI Overviews, with Facebook being cited more frequently in inaccurate answers.

QHow did Google respond to the findings of the Oumi study?

AGoogle criticized the study, calling it 'seriously flawed.' Their spokesperson argued that the SimpleQA benchmark itself contains inaccuracies, that using an AI (HallOumi) to judge another AI introduces errors, and that the test queries do not reflect real user search behavior.

Похожее

KOL's Perspective: Why Is SOL Set to Rise from This Point?

**Summary: Why SOL is Positioned for Growth at This Level** The article argues that SOL is poised for an upward move from its current price point, citing several key factors. Primarily, SOL has just broken out of a 4-month consolidation phase. This breakout signals a return of risk appetite to the broader crypto market, as SOL is seen as a key indicator of overall crypto health. The token's ownership has reportedly shifted from short-term traders and tourists to long-term accumulators, leading to low volume. Any meaningful increase in trading activity could thus trigger significant upward momentum. Fundamental strengths include strong institutional adoption, integration with DeFi and RWAs (Real-World Assets), and the potential benefits from the Clarity Act. Despite its high volatility—having dropped 70% from its all-time high but still up 12x from its bear market low—SOL is highlighted as one of the few tokens from the last cycle to reach new highs. It boasts a robust ecosystem of applications, users, and protocols. Future catalysts include the expected influx of AI developers following the Miami Accelerate conference, which focused on AI on Solana. Furthermore, Solana is positioned as the premier chain for memecoin activity, a trend expected to continue and drive network usage and fees. The article concludes that recent price action reflects a healthy transfer to long-term holders, setting the stage for growth.

marsbit41 мин. назад

KOL's Perspective: Why Is SOL Set to Rise from This Point?

marsbit41 мин. назад

Those Pre-Bitcoin PoW Protocols Have Recently Been Reimplemented

This article details a recent surge in replicating pre-Bitcoin Proof-of-Work (PoW) protocols, specifically focusing on Hal Finney's 2004 RPOW (Reusable Proofs of Work). Within five days in May 2026, multiple independent builders in the Bitcoin/cypherpunk community launched projects inspired by this early electronic cash proposal. The initiative began with Fred Krueger's `rpow2.com`, a centralized but auditable system that replaced RPOW's original IBM 4758 hardware with Ed25519 signatures. Initially a faithful replica, it later adopted Bitcoin-like features (21M supply cap, difficulty adjustment) and a controversial 5.24% founder allocation. This sparked rapid forks, including `rpow4.com` which incorporated full Bitcoin parameters, a prediction market (`rpowmarket.com`), and a DEX (`rpow2swap.com`). Concurrently, Mike In Space created a prototype of Wei Dai's 1998 b-money proposal (`b-money.replit.app`), pushing the historical exploration even further back. The article contrasts these centralized, server-dependent experiments with Bitcoin's core innovation of decentralized, trustless consensus. It also highlights a parallel development: the `HASH` project on Ethereum, which uses smart contract hooks to enable a purely fair-launch, browser-mineable PoW token with 0% allocations to team or VCs. The collective activity is framed as a meme-driven, educational exploration of cypherpunk history rather than a serious financial movement, with all projects heavily disclaiming any investment value.

marsbit46 мин. назад

Those Pre-Bitcoin PoW Protocols Have Recently Been Reimplemented

marsbit46 мин. назад

South Korean Exchanges 'Battle' Regulators, Challenging the Boundaries of Enforcement and Legislation

South Korea's cryptocurrency industry is engaged in a rare, direct confrontation with regulators. The Financial Intelligence Unit (FIU), the primary anti-money laundering (AML) watchdog, has recently imposed heavy penalties on major exchanges like Upbit and Bithumb for alleged violations involving unregistered overseas VASPs and AML procedures. However, exchanges are now actively challenging these actions in court and through industry associations. In a significant shift, the Seoul Administrative Court ruled in favor of Upbit's operator, Dunamu, overturning part of an FIU-ordered business suspension. The court found the FIU's penalty criteria and justification insufficiently clear. Similarly, the court suspended the enforcement of a six-month business suspension against Bithumb pending a final ruling, citing potential irreversible harm to the exchange. Beyond legal battles, the industry is contesting proposed legislative amendments. The Digital Asset eXchange Alliance (DAXA) strongly opposes a draft rule that would mandate Suspicious Transaction Reports (STRs) for all crypto transfers over 10 million KRW (~$6,800). DAXA argues this "poison pill" clause violates legal principles and would overwhelm the STR system, increasing reports from 63,000 to an estimated 5.45 million annually for major exchanges, thereby crippling effective AML monitoring. This conflict highlights a structural tension in South Korea's crypto governance: comprehensive digital asset laws are still developing, while regulators rely heavily on AML enforcement. The industry's move from passive compliance to active legal and legislative challenges signifies a new phase, pressing for clearer rules and more proportionate enforcement. While short-term disputes may intensify, this clash could ultimately lead to a more mature and sustainable regulatory framework for South Korea's vibrant crypto market.

marsbit1 ч. назад

South Korean Exchanges 'Battle' Regulators, Challenging the Boundaries of Enforcement and Legislation

marsbit1 ч. назад

Earnings Report, CLARITY Bill, and Warsh's Arrival: CRCL Faces Three Major Tests This Week

Circle (CRCL) faces three major tests this week that will significantly impact its stock price and valuation. First, on May 11, it will release its Q1 2026 earnings. Key metrics to watch are overall revenue and EPS, the proportion of revenue paid to distributors like Coinbase, and growth in non-interest income. The market also awaits Circle's stance on renegotiating its revenue-sharing contract with Coinbase, which expires in August. Second, on May 14, the U.S. Senate Banking Committee will vote on the CLARITY Act. This bill aims to establish a clear federal regulatory framework for digital assets. A recent compromise proposal would ban yield on static stablecoin reserves but allow rewards for active ones. Its passage, currently seen as likely by prediction markets, would be a major positive for the industry and Circle. Finally, on May 15, Kevin Warsh will succeed Jerome Powell as Federal Reserve Chair. Warsh's proposed policy of quantitative tightening combined with interest rate cuts could pose a short-term headwind for CRCL, as lower rates reduce Circle's primary revenue from USDC reserves. However, long-term prospects may improve as Warsh, a known crypto investor who opposes a Fed CBDC, is seen as potentially favorable to regulated private stablecoins like USDC.

marsbit1 ч. назад

Earnings Report, CLARITY Bill, and Warsh's Arrival: CRCL Faces Three Major Tests This Week

marsbit1 ч. назад

After 50x Storage Surge, Justin Sun Always Looks to the Next Decade

Sun Yuchen, known for his controversial stunts like a $30 million lunch with Warren Buffett (canceled due to a kidney stone) and eating a $6.2 million duct-taped banana, is often overshadowed by a significant fact: his decade-long track record of spotting major investment trends. In 2016, he famously advised young people to invest in Bitcoin, Nvidia, Tesla, and Tencent instead of buying property. A hypothetical $20,000 investment in Nvidia and Tesla from that list would now be worth over 50 million RMB. His latest major call was on November 6, 2025, predicting a "50x storage opportunity" tied to the AI boom, which materialized with Sandisk's stock surging nearly 50-fold by 2026. Looking ahead, Sun now focuses on the next frontier: Physical AI. He identifies four key areas: 1. **Embodied AI/Robotics**: He sees this reaching its "iPhone moment," with companies like UBTech and Galaxy General leading in commercialization. 2. **Drones**: Viewed as the first commercially viable form of Physical AI, revolutionizing sectors from warfare (e.g., AeroVironment's Switchblade) to logistics. 3. **Spatial Computing**: Beyond VR, it's about AI understanding physical space, a foundational technology for robotics and autonomous systems, exemplified by Apple's Vision Pro. 4. **Space Exploration**: After a 2025 suborbital flight with Blue Origin, Sun advocates for space as the ultimate frontier, discussing blockchain's potential role in space asset management and data transactions. His investment philosophy involves betting on entire, inevitable trends rather than single companies. For robotics, he sees Tesla (the body/manufacturer) and Nvidia (the brain/AI platform) as complementary plays. In defense drones, he highlights companies making tanks obsolete (AeroVironment) and those augmenting fighter jets (Kratos). For space, he participated in Blue Origin's flight and anticipates SpaceX's potential IPO to redefine the sector's valuation. Sun Yuchen's vision frames the next two decades not as a revolution in information flow (like the internet), but in the fundamental operation of the physical world through AI-powered robots, autonomous systems, and spatial intelligence, ultimately extending human and AI activity into space. While many still focus on conventional assets, he continues to look toward the next technological horizon.

marsbit2 ч. назад

After 50x Storage Surge, Justin Sun Always Looks to the Next Decade

marsbit2 ч. назад

Торговля

Спот

Фьючерсы

Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

Введение

Correct Answers, Wrong Sources

BBC Journalist's Fake Article "Poisoned" Results Within 24 Hours

Google's Rebuttal: The Test Itself Is Flawed

Связанные с этим вопросы

Похожее

KOL's Perspective: Why Is SOL Set to Rise from This Point?

Those Pre-Bitcoin PoW Protocols Have Recently Been Reimplemented

South Korean Exchanges 'Battle' Regulators, Challenging the Boundaries of Enforcement and Legislation

Earnings Report, CLARITY Bill, and Warsh's Arrival: CRCL Faces Three Major Tests This Week

After 50x Storage Surge, Justin Sun Always Looks to the Next Decade

Торговля

Популярные категории

Популярные теги