Tens of Millions of Errors Per Hour: Investigation Reveals the 'Accuracy Illusion' of Google AI Search

Published by marsbit on 2026-04-13 · Updated 2026-04-13

Article Summary

A New York Times investigation, conducted with AI startup Oumi, reveals significant accuracy and reliability problems in Google's AI Overviews search feature. Testing on more than 4,300 queries showed accuracy improving from 85% (powered by Gemini 2) to 91% (Gemini 3). However, at Google's scale of roughly 5 trillion searches a year, a 9% error rate could translate into tens of millions of incorrect answers every hour (about 51 million if every search produced an AI Overview). A critical finding is the prevalence of "unsubstantiated citations": among correct answers, the share whose cited links did not support the AI's summary rose from 37% to 56% after the Gemini 3 upgrade, making it hard for users to verify the information. The AI also leans heavily on low-quality sources, with Facebook and Reddit among its most-cited websites. The system is also easy to manipulate: a BBC journalist "poisoned" it by publishing a fabricated article, and Google's AI began presenting the false information as fact within 24 hours. Google disputed the study's methodology, criticizing its use of the SimpleQA benchmark and of an AI model (Oumi's own) to evaluate another AI. The company maintains that AI Overviews, combined with its search ranking systems, performs better than the underlying model alone. Critics note that this defense does little to bolster users' confidence in the feature's reliability.

Author: Claude, Deep Tide TechFlow

Deep Tide Guide: A recent test conducted by The New York Times in collaboration with AI startup Oumi shows that Google Search's AI Overviews feature is accurate about 91% of the time. But because Google processes some 5 trillion searches a year, that still means tens of millions of incorrect answers every hour. More troubling, even when the answers are correct, more than half of them come with cited links that do not support their conclusions.

Google is disseminating misinformation on an unprecedented scale, and most people are completely unaware.

According to The New York Times, AI startup Oumi, commissioned by the paper, evaluated the accuracy of Google's AI Overviews using SimpleQA, an industry-standard benchmark developed by OpenAI. The test covered 4,326 search queries and ran in two rounds: one last October, when the feature was powered by Gemini 2, and another this February, after the upgrade to Gemini 3. Accuracy was about 85% with Gemini 2 and rose to 91% with Gemini 3.

91% sounds good, but the picture changes at Google's scale. Google processes roughly 5 trillion searches a year, or about 570 million an hour. If every one of those searches produced an AI Overview with a 9% error rate, that would mean roughly 51 million inaccurate answers per hour, or more than 850,000 per minute.
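The scale arithmetic can be checked in a few lines. The sketch below is a back-of-the-envelope estimate under two simplifying assumptions that the study itself does not make: every search triggers an AI Overview, and the 9% error rate measured on SimpleQA carries over unchanged to real search traffic.

```python
# Rough scale estimate for AI Overviews errors.
# Assumptions (illustrative, not Google data): every search shows an
# AI Overview, and the 9% SimpleQA error rate holds for real traffic.

ANNUAL_SEARCHES = 5e12      # roughly 5 trillion searches per year
ERROR_RATE = 0.09           # 91% accuracy with Gemini 3 -> 9% wrong
HOURS_PER_YEAR = 365 * 24   # 8,760 hours (ignores leap years)


def errors_per_hour(annual_searches: float, error_rate: float) -> float:
    """Expected number of incorrect answers per hour."""
    return annual_searches * error_rate / HOURS_PER_YEAR


per_hour = errors_per_hour(ANNUAL_SEARCHES, ERROR_RATE)
per_minute = per_hour / 60

print(f"{per_hour:,.0f} errors per hour")      # 51,369,863
print(f"{per_minute:,.0f} errors per minute")  # 856,164
```

Plugging in the Gemini 2 round's 15% error rate, `errors_per_hour(5e12, 0.15)`, gives about 85.6 million per hour, which shows the improvement in absolute terms. Both figures are upper bounds, since not every search displays an AI Overview.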

Correct Answers, Wrong Sources

More alarming than the accuracy rate is the issue of "unsubstantiated citations."

Oumi's data shows that in the Gemini 2 era, 37% of correct answers had "unsubstantiated citations," meaning the links attached to the AI summary did not support the information it gave. After the upgrade to Gemini 3, that share rose rather than fell, jumping to 56%. In other words, even as the model gets answers right, it is increasingly failing to "show its work."
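The unsubstantiated-citation figure is a conditional rate: its denominator is only the answers graded correct, not every answer. The minimal sketch below illustrates that calculation. The `GradedAnswer` record and its field names are hypothetical, and the toy data is made up for illustration rather than taken from Oumi's results; the article does not describe how Oumi's HallOumi grader structures its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedAnswer:
    """Hypothetical grading record for one query (field names are illustrative)."""
    correct: bool            # answer matched the benchmark's reference answer
    citations_support: bool  # cited links actually back up the summary


def accuracy(records: list[GradedAnswer]) -> float:
    """Share of all answers graded correct."""
    return sum(r.correct for r in records) / len(records)


def unsubstantiated_rate(records: list[GradedAnswer]) -> float:
    """Share of *correct* answers whose citations do not support them."""
    correct = [r for r in records if r.correct]
    if not correct:
        return 0.0
    return sum(not r.citations_support for r in correct) / len(correct)


# Toy data (not from the study): 10 answers, 9 correct, 5 of those unsupported.
sample = (
    [GradedAnswer(True, True)] * 4
    + [GradedAnswer(True, False)] * 5
    + [GradedAnswer(False, False)]
)
print(f"accuracy: {accuracy(sample):.0%}")                     # 90%
print(f"unsubstantiated: {unsubstantiated_rate(sample):.0%}")  # 56%
```

Because the denominator is the set of correct answers, accuracy and the unsubstantiated-citation rate can rise together, which is the pattern Oumi reported for Gemini 3.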

Oumi CEO Manos Koukoumidis put it pointedly: "Even if the answer is correct, how do you know it's correct? How do you verify it?"

AI Overviews' heavy reliance on low-quality sources makes the problem worse. Oumi found that Facebook and Reddit are, respectively, the second and fourth most-cited sources in AI Overviews. Facebook was cited 7% of the time in inaccurate answers, compared with 5% in accurate ones.

BBC Journalist's Fake Article "Poisons" Results Within 24 Hours

Another serious flaw of AI Overviews is its susceptibility to manipulation.

A BBC journalist tested the system with a deliberately fabricated article. In less than 24 hours, Google's AI Overview was presenting the article's false claims to users as fact.

This means anyone who understands how the system works could potentially "poison" AI search results by publishing false content and driving traffic to it. Google spokesperson Ned Adriance responded that the AI feature in Search is built on the same ranking and safety systems Google uses to block spam, and argued that "most examples in the test are unrealistic queries that people wouldn't actually search for."

Google's Rebuttal: The Test Itself Is Flawed

Google raised several objections to Oumi's study. A Google spokesperson called the research "seriously flawed," arguing that the SimpleQA benchmark itself contains inaccurate information; that Oumi used its own AI model, HallOumi, to judge another AI's performance, potentially introducing additional errors; and that the test queries do not reflect how real users search.

Google's internal tests also showed that Gemini 3, running on its own outside the Google Search framework, produces false outputs as much as 28% of the time. Google stressed, however, that AI Overviews, which draws on Search's ranking systems, is more accurate than the model alone.

Still, as PCMag pointed out, the defense contains a paradox: if your argument is that "the report exposing our AI's inaccuracies itself relies on a potentially inaccurate AI," that is unlikely to boost users' confidence in your product's accuracy.

You May Also Like

Strategy's STRC Slide Exposes the Risks Behind Bitcoin-Linked Credit Products

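Strategy (formerly MicroStrategy) saw its preferred stock STRC fall sharply amid recent market pressure, at one point dropping to $82.53, well below its $100 reference par value. The company's CEO attributed the drop to forced selling triggered by leveraged liquidations rather than to any fundamental default by the company. The episode highlights the risks hidden in bitcoin-linked credit products (such as yield-oriented preferred stock) when leverage is involved: in volatile markets, leverage can accelerate selling even if the issuer has not defaulted. It also shows that the financialization of bitcoin treasury strategies is growing more complex. These products are not risk-free, and their performance depends on the issuer's creditworthiness, market liquidity, and the capital structure's ability to withstand volatility. The sell-off should be read as a warning about leverage risk, not as a signal of default.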

bitcoinist · 5 hours ago

Australia's Highest Court Hands ASIC a Major Win in Block Earner Crypto Yield Product Case

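The High Court of Australia recently ruled unanimously in favor of the securities regulator ASIC in its appeal against crypto firm Block Earner, finding that the company's now-discontinued fixed-yield product "Earner" was a financial product and a derivative. The product operated from March to November 2022 without the required Australian financial services licence. The ruling sets an important precedent for ASIC applying traditional financial regulation to crypto yield products. The court stressed that the deciding factor is a product's economic substance, not its technical label. In Australia, this means any crypto product that offers structured returns or has the economic characteristics of a derivative may need to comply with the relevant financial licensing rules. Although the case concerns a historical product, the legal principle it establishes is binding in practice and gives ASIC a clear basis for regulating similar crypto investment products in the future. The case has now been sent back to the Full Court of the Federal Court to determine penalties. The ruling sends the crypto industry a clear signal: products that generate returns by deploying other people's assets will face strict compliance scrutiny. For consumers, it is a reminder that crypto yield products carry different risks from simply holding spot assets. Australia's crypto market is expected to see clearer regulatory boundaries, and firms will need to assess the compliance of their existing and planned products.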

bitcoinist · 8 hours ago

Blockchain.com Expands Access to Tokenized Stocks Through Partnership with Ondo Finance

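Through a partnership with Ondo Finance, Blockchain.com is extending access to tokenized U.S. stocks and ETFs to eligible users within its wallet ecosystem. The partnership lets users reach regulated real-world assets from a familiar crypto-wallet interface rather than through a traditional brokerage. Ondo Finance has become a well-known player in the tokenized real-world asset (RWA) market, focusing on bringing traditional financial products such as Treasuries, yield products, and equities on-chain. The integration gives Ondo a distribution channel to Blockchain.com's large base of wallet users, addressing the distribution and accessibility problems tokenized assets face. The move mainly targets users outside the United States, offering a crypto-native way to access the U.S. stock market to people who may already use stablecoins and crypto wallets as financial infrastructure. As competition in the RWA market intensifies, stocks and ETFs have become a focus because they are easy to understand and in strong global demand. Tokenized stocks, however, still have to resolve questions of custody, redemption, legal rights, and regulatory treatment. Blockchain.com and Ondo are betting on wallet-native access, aiming to make these assets as easy to use as ordinary cryptocurrencies while ensuring the legitimacy of the underlying assets.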

bitcoinist · 10 hours ago

CPUs Are Back at the Table as a $170 Billion Power Play Begins

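At Computex 2026 in Taipei, Nvidia unveiled Vera CPU, its first standalone CPU product line, marking an expansion of its focus from GPUs into broader computing. CEO Jensen Huang said that in the era of AI agents, the CPU has become a key bottleneck for data-center performance. Meanwhile, AMD sharply raised its forecast for the server CPU market to more than $120 billion, and industry forecasts put the potential market at about $170 billion by 2030. The market landscape is shifting: in Q1 2026, AMD closed in on Intel in server CPU revenue share, showing the strong pricing power of high-core-count products. Analysts note that AI is moving from training toward inference and agents, and agentic workloads involve frequent complex control flow, tool calls, and data processing that depend heavily on CPUs rather than GPUs. In agentic tasks, GPU utilization can fall below 50%, while CPUs can account for more than 70% of the workload. As a result, the CPU-to-GPU ratio has narrowed markedly, from 1:8 in the past to 1:4 or even 1:1. The shift in demand has driven the first broad price increase in more than a decade: Intel and AMD server CPU prices have generally risen 10-15%, and capacity has tightened. Demand is splitting into two categories: high-core-count CPUs paired with GPUs, and mid-core-count volume CPUs that orchestrate agentic tasks. Nvidia's entry with the Arm-based Vera CPU further underscores the CPU's strategic importance. For China's CPU industry chain, this is both an opportunity and a challenge: domestic vendors such as Hygon and Huawei Kunpeng benefit from growing global AI demand while also facing a domestic-substitution window opened by China's Xinchuang (IT localization) policy. The industry consensus is that the key to deploying AI at scale has shifted from single-chip performance to CPU-GPU coordination.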

marsbit · 10 hours ago

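TechFlow Intelligence Bureau: AMD AI Director Publicly Says Claude Code Has Become "Dumber and Lazier"; Trump Says Hormuz Will See a Full Ceasefire, but 80 Mines Still Await Clearing in the Strait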

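**Tech and Geopolitics Roundup**

**AI and Chips**
* **Tech competition and scrutiny**: South Korea's SK Telecom faces U.S. export-control scrutiny over its partnership with Anthropic. Meanwhile, China's Z.AI released GLM-5.2, a large model that does not rely on Nvidia chips and is positioned against Claude Opus, prompting debate over how effective technology containment has been.
* **Safety and ethics**: Google Gemini was reported to give misleading advice in scam scenarios, raising AI safety concerns. More than ten thousand repositories distributing trojans were found on GitHub, sounding an alarm for open-source supply-chain security.
* **Industry moves**: Amazon is in talks to sell its in-house AI chips to outside customers, signaling an intent to enter the market. Apple will reportedly have exclusive use of TSMC's latest process node for a special-edition iPhone. 0G Labs announced that its total on-chain AI inference volume has passed a major milestone.
* **Controversy and regulation**: AMD's AI director publicly criticized Claude Code's declining performance. Several Amazon engineers face internal investigation after criticizing the environmental impact of the company's AI data-center expansion. Microsoft's and Amazon's cloud services may face tough EU antitrust scrutiny.

**Crypto/Web3**
* South Korean exchange Bithumb listed ReProtocol (RE) trading pairs, while Upbit removed KernelDAO (KERNEL) trading pairs.

**Geopolitics and Markets**
* **Strait of Hormuz**: Although the U.S. and Iran reached an agreement, about 80 mines remain uncleared in the Strait of Hormuz's main shipping lane, leaving tankers loaded with nearly 80 million barrels of oil stranded while they wait for an "all-clear signal." Iran canceled a diplomatic trip to Switzerland, leaving the outlook for talks unclear. Trump called the agreement Iran's "unconditional surrender" and asserted that the president has unlimited power.
* **U.S. stocks**: U.S. semiconductor stocks rallied, with Intel jumping 10.6% on rumors of a partnership with Apple, while SpaceX shares fell 3.5%.

**Key Takeaway**
The current picture is one of sharp contrast: geopolitics has reached a temporary "peace," but real risks (the mines) and uncertainty (Iran's canceled trip) persist, stalling economic activity (tanker traffic). Meanwhile, competition and restructuring in tech are accelerating; from in-house chip development and AI model breakthroughs to supply-chain security, tech companies are reshaping the global landscape in a different way.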

marsbit · 10 hours ago
