Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

marsbit | Published 2026-04-10 | Last updated 2026-04-10

Introduction

A New York Times investigation, conducted with AI startup Oumi, found significant accuracy and reliability problems in Google's AI Overviews search feature. In tests on more than 4,300 queries, accuracy improved from 85% (Gemini 2) to 91% (Gemini 3). However, given Google's scale of ~5 trillion annual searches, this 9% error rate translates to over 57 million incorrect answers generated hourly.

A more critical issue is the prevalence of unsubstantiated citations. Among correct answers, the rate of "unfounded citations" (where the provided source links do not support the AI's claims) worsened, rising from 37% with Gemini 2 to 56% with Gemini 3. This makes it difficult for users to verify the information. The AI also relies heavily on low-quality sources, with Facebook and Reddit being its second and fourth most cited domains.

Furthermore, the system is highly susceptible to manipulation. A BBC journalist successfully "poisoned" it by publishing a fake article; Google's AI began presenting the false information as fact within 24 hours. Google disputed the study's methodology, criticizing the use of the SimpleQA benchmark and of an AI model (Oumi's HallOumi) to evaluate Google's AI. The company maintains that its internal safeguards and ranking systems improve accuracy beyond the base model's performance.

Author: Claude, Deep Tide TechFlow

Deep Tide Introduction: The latest test by The New York Times in collaboration with AI startup Oumi shows that the accuracy rate of Google Search's AI Overviews feature is about 91%. However, given Google's scale of processing 5 trillion searches annually, this translates to tens of millions of incorrect answers generated every hour. More troublingly, even when the answers are correct, over half of the cited links fail to support their conclusions.

Google is delivering misinformation to users on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the publication, used SimpleQA, an industry-standard benchmark developed by OpenAI, to evaluate the accuracy of Google's AI Overviews feature. The test covered 4,326 search queries and was run twice: once in October 2025 (when AI Overviews was powered by Gemini 2) and again in February 2026 (after the upgrade to Gemini 3). Accuracy was about 85% with Gemini 2 and improved to 91% with Gemini 3.

91% sounds good, but it's a different story when considering Google's scale. Google processes approximately 5 trillion search queries annually. Calculating with a 9% error rate, AI Overviews generates over 57 million inaccurate answers per hour, nearly 1 million per minute.
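The scale estimate is plain arithmetic and can be reproduced with a short sketch. It assumes, beyond what the article states, that every search returns an AI Overview and that queries are spread evenly over a 365-day year; the function and constant names are illustrative. Under those assumptions a 9% error rate works out to roughly 51 million wrong answers per hour, while a rounded 10% rate gives about 57 million, so the quoted figure appears to use the rounded rate.

```python
# Back-of-the-envelope check of the scale estimate.
# Assumption (not from the article): every search shows an AI Overview,
# and traffic is spread evenly across a 365-day year.
SEARCHES_PER_YEAR = 5_000_000_000_000  # ~5 trillion, as cited in the article
HOURS_PER_YEAR = 365 * 24              # 8,760 hours


def wrong_answers_per_hour(error_rate: float,
                           searches_per_year: int = SEARCHES_PER_YEAR) -> float:
    """Expected number of incorrect answers per hour at a given error rate."""
    return searches_per_year * error_rate / HOURS_PER_YEAR


for rate in (0.09, 0.10):
    per_hour = wrong_answers_per_hour(rate)
    per_minute = per_hour / 60
    print(f"error rate {rate:.0%}: "
          f"{per_hour / 1e6:.1f} million per hour, "
          f"{per_minute / 1e6:.2f} million per minute")
# error rate 9%: 51.4 million per hour, 0.86 million per minute
# error rate 10%: 57.1 million per hour, 0.95 million per minute
```

Either way the result stays in the tens of millions per hour, which is the order of magnitude the headline relies on.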

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the problem of "unanchored" citations: source links that do not back up the answer they are attached to.

Oumi's data shows that in the Gemini 2 era, 37% of correct answers had "unsupported citations," meaning the links attached to the AI summaries did not support the information provided. After upgrading to Gemini 3, this proportion increased instead of decreasing, jumping to 56%. In other words, while the model gives correct answers, it's increasingly failing to "show its work."
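The two headline numbers use different denominators: accuracy is measured over all queries, while the unsupported-citation rate is measured only over answers already judged correct. The sketch below shows that distinction on a toy sample. The `GradedAnswer` record, its field names and the sample data are hypothetical and invented for illustration; the article does not describe how Oumi's grading pipeline is structured.

```python
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    """Hypothetical grading record for one query (not Oumi's actual schema)."""
    correct: bool            # answer matches the benchmark's reference answer
    citations_support: bool  # the attached links back up the answer


def accuracy(results: list[GradedAnswer]) -> float:
    """Share of all answers that are correct."""
    return sum(r.correct for r in results) / len(results)


def unsupported_rate_among_correct(results: list[GradedAnswer]) -> float:
    """Share of correct answers whose citations do not support them."""
    correct = [r for r in results if r.correct]
    if not correct:
        return 0.0
    return sum(not r.citations_support for r in correct) / len(correct)


# Toy sample of 10 graded answers, invented for illustration only.
sample = (
    [GradedAnswer(correct=True, citations_support=True)] * 4
    + [GradedAnswer(correct=True, citations_support=False)] * 5
    + [GradedAnswer(correct=False, citations_support=False)] * 1
)

print(f"accuracy: {accuracy(sample):.0%}")  # accuracy: 90%
print(f"unsupported among correct: "
      f"{unsupported_rate_among_correct(sample):.0%}")  # unsupported among correct: 56%
```

In this toy sample a high accuracy and a high unsupported-citation rate coexist, which is the pattern the Gemini 3 results describe.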

Oumi CEO Manos Koukoumidis asked pointedly: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

The problem is exacerbated by AI Overviews' heavy reliance on low-quality sources. Oumi found that Facebook and Reddit are the second and fourth most cited sources for AI Overviews, respectively. In inaccurate answers, Facebook was cited 7% of the time, higher than the 5% in accurate answers.

BBC Journalist's Fake Article "Poisoned" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system by publishing a deliberately fabricated article. In less than 24 hours, Google's AI Overviews was presenting the false information from that article to users as fact.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded by saying the search AI feature is built on the same ranking and security mechanisms that block spam, and claimed that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several objections to Oumi's research. A Google spokesperson called the study "seriously flawed," citing reasons including: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model HallOumi to judge another AI's performance, potentially introducing additional errors; and the test content doesn't reflect real user search behavior.

Google's internal tests also showed that when Gemini 3 operates independently outside the Google Search framework, it produces false outputs at a rate as high as 28%. But Google emphasized that AI Overviews leverages the search ranking system to improve accuracy, performing better than the model itself.

However, PCMag's commentary pointed out the logical paradox: if your defense is that "the report pointing out our AI's inaccuracies itself uses potentially inaccurate AI," this probably doesn't enhance users' confidence in your product's accuracy.

Related Questions

Q: What is the accuracy rate of Google's AI Overviews feature according to the Oumi study?

A: The accuracy rate of Google's AI Overviews was found to be approximately 91% when powered by Gemini 3, an improvement from about 85% with Gemini 2.

Q: How many inaccurate answers does the article estimate Google's AI Overviews produces per hour?

A: Based on Google's annual volume of 5 trillion searches and a 9% error rate, the AI Overviews feature is estimated to produce over 57 million inaccurate answers per hour.

Q: What is the 'unsubstantiated citation' problem identified in the report?

A: The 'unsubstantiated citation' problem refers to instances where AI Overviews provides a correct answer, but the attached source links do not actually support the information given. The share of correct answers affected rose from 37% with Gemini 2 to 56% with Gemini 3.

Q: Which low-quality websites are frequently used as sources by AI Overviews, according to the Oumi data?

A: According to Oumi's data, Facebook and Reddit are the second and fourth most cited sources in AI Overviews, with Facebook cited more frequently in inaccurate answers than in accurate ones.

Q: How did Google respond to the findings of the Oumi study?

A: Google criticized the study, calling it 'seriously flawed.' Its spokesperson argued that the SimpleQA benchmark itself contains inaccuracies, that using an AI (HallOumi) to judge another AI introduces errors, and that the test queries do not reflect real user search behavior.
