Overturning the Mainstream Approach to Hallucinations: Metacognition is the New Solution for Large Models to Break the Hallucination Barrier

marsbit | Published 2026-06-03 | Updated 2026-06-03

Summary

This paper, "Hallucinations Undermine Trust; Metacognition is a Way Forward," proposes a paradigm shift in combating AI hallucination. It argues that the current mainstream approaches—striving for omniscience by scaling data/models or having AI abstain from uncertain answers—are fundamentally flawed. The former has inevitable knowledge gaps, while the latter imposes a crippling "utility tax," requiring the rejection of many correct answers to achieve high accuracy, due to models' poor "discrimination" (the ability to distinguish correct from incorrect answers internally). The core contribution is redefining hallucination not as "being wrong," but as "expressing false information with unwarranted certainty." The proposed solution is **Faithful Uncertainty** or **Metacognition**: enabling AI to accurately perceive its internal uncertainty and honestly express it in its language (e.g., using hedging phrases when unsure). This creates a more reliable assistant that provides useful information while signaling its confidence, minimizing harm from errors. The paper emphasizes that metacognition is critical for the era of AI Agents. Without it, Agents cannot intelligently decide when to use tools like search engines, leading to inefficiency and misuse. Key implementation challenges are highlighted: the "bootstrapping paradox" of training with static uncertainty data, the "alignment distortion signal" where human preference training suppresses internal uncertainty cues, and the diff...

A recent paper from Google Research has a core idea that can be summarized in one sentence: Instead of stubbornly pursuing "making AI omniscient and omnipotent", it's better to teach it to say "I'm not sure".

This paper, titled "Hallucinations Undermine Trust; Metacognition is a Way Forward", was written jointly by Google Research and Tel Aviv University and has been accepted to the ICML 2026 Position Track. It argues that the mainstream industry approach to combating "hallucinations" may be fundamentally misguided: everyone is busy pumping more knowledge into models while overlooking a more crucial and underestimated capability, namely enabling AI to perceive and express its own confidence level for each answer.

(Paper address: [2605.01428] Hallucinations Undermine Trust; Metacognition is a Way Forward)

The Utility Tax: The Real Cost of Eliminating Hallucinations

Let's start with a scenario everyone encounters.

You ask an AI assistant a question, and it gives an answer with an unshakeably confident tone, its wording precise, logic complete, seemingly flawless. Later, you check, and that answer is completely fabricated. What's even more infuriating is that it said it without any hesitation, as if it had witnessed it firsthand.

This is AI "hallucination"—the model outputs factually incorrect content but presents it to the user in an unquestionable manner. This problem is particularly fatal in high-stakes scenarios like healthcare, law, and scientific research.

The industry's approach to hallucinations essentially follows two paths. The first path: Make the AI know more, by expanding training data and increasing model parameters to cover more facts. The second path: Make the AI stay silent when uncertain, refusing to answer questions it's unsure about.

Both paths have obvious shortcomings. Facts in the world are endless; models cannot possibly memorize everything, so the first path will always have uncovered blind spots. The problem with the second path is that once AI starts refusing to answer on a large scale, it turns from a "useful assistant" into a "scaredy-cat that dares not say anything"—users ask ten questions, eight get rejected, leading to a terrible experience.

The paper gives the cost of the second path an apt name: "utility tax"—to reduce the hallucination rate, you must sacrifice a large amount of information that could have been answered correctly.

Why is this tax so heavy? The root cause is the lack of a key ability in AI. For the "refuse-to-answer" strategy to work precisely, the model needs to accurately distinguish between "I got this question right" and "I got this question wrong"—refusing only the wrong ones while keeping the right ones. But in reality, models cannot make this precise distinction. The paper distinguishes two easily confused but fundamentally different concepts to illustrate this problem.

Calibration measures whether the AI's overall confidence level matches its overall accuracy. For example, the AI answers 100 questions, each time saying "I'm 60% confident," and exactly 60 out of 100 are correct—this is perfect calibration.

Discrimination measures whether the AI can accurately distinguish "I'm right" from "I'm wrong" on each specific question. An AI that gives 60% confidence for all questions, with an overall accuracy of exactly 60%, is perfectly calibrated but has zero discrimination—it completely fails to distinguish which ones to trust and which to guard against. Good calibration does not equal strong discrimination, and this is precisely the crux of the problem.
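The contrast can be made concrete with a minimal sketch. The data is a toy example, not taken from the paper, and `calibration_gap` is a deliberately crude single-bucket check (average confidence versus overall accuracy) rather than a full calibration metric. The helper names are illustrative.

```python
def auroc(confidences, correct):
    """Chance that a random correct answer has higher confidence than a
    random wrong one; ties count as half. 0.5 means no discrimination."""
    right = [c for c, ok in zip(confidences, correct) if ok]
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    wins = sum((r > w) + 0.5 * (r == w) for r in right for w in wrong)
    return wins / (len(right) * len(wrong))


def calibration_gap(confidences, correct):
    """Gap between average stated confidence and overall accuracy."""
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return abs(mean_confidence - accuracy)


# Ten answers, six of them correct (toy data, not from the paper).
correct = [True] * 6 + [False] * 4
flat = [0.6] * 10               # "60% confident" on every question
sharp = [1.0] * 6 + [0.0] * 4   # confident only where it is right

for name, conf in [("flat", flat), ("sharp", sharp)]:
    print(name, round(calibration_gap(conf, correct), 6), auroc(conf, correct))
```

Both confidence patterns show a calibration gap of zero, but the flat one has an AUROC of 0.5 (no better than chance) while the sharp one reaches 1.0. Calibration alone says nothing about which individual answers to trust.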

After reviewing extensive literature, the paper finds that for current mainstream large models, AUROC (the standard discrimination metric, where 0.5 is chance and 1.0 is perfect separation) on factual question-answering tasks is concentrated between 0.70 and 0.85. These numbers might sound decent but are far from sufficient. Taking AUROC = 0.71 as a parameter, the paper runs a set of simulations with startling results: assuming a base error rate of 25%, reducing the error rate to 5% requires the AI to refuse more than 52% of the questions it would have answered correctly. Even if discrimination improves to 0.85, which is near the literature ceiling, it still requires sacrificing 28% of correct answers. Only when discrimination rises above 0.95 does the cost become negligible, and currently no method comes close to this number on knowledge-intensive tasks.
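The article does not describe the paper's simulation setup, so the following is only an illustrative sketch of how such a "tax" could be measured on a labeled set of answers. The function name `utility_tax` and the eight data points are hypothetical, and the sketch assumes all confidence values are distinct, so that a threshold can separate any prefix of the ranking.

```python
def utility_tax(confidences, correct, target_error):
    """Fraction of correct answers that must be refused so that the error
    rate among the answers still given is at most target_error.

    Answers are ranked by decreasing confidence, and the largest top-k
    set that meets the target is kept.
    """
    ranked = sorted(zip(confidences, correct), key=lambda pair: -pair[0])
    total_correct = sum(correct)
    best_kept_correct = 0
    kept_correct = kept_wrong = 0
    for _, ok in ranked:
        kept_correct += ok
        kept_wrong += not ok
        if kept_wrong / (kept_correct + kept_wrong) <= target_error:
            best_kept_correct = kept_correct
    return 1 - best_kept_correct / total_correct


# Toy data: 8 answers, 2 wrong (25% base error rate), imperfect ranking.
confidences = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40]
correct = [True, True, False, True, True, True, False, True]

print(utility_tax(confidences, correct, target_error=0.25))
print(utility_tax(confidences, correct, target_error=0.10))
```

On this toy set, accepting the 25% base error rate costs nothing, but pushing the error rate down to 10% means keeping only the top two answers and refusing four of the six correct ones. One confident wrong answer near the top of the ranking is enough to make the tax steep.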

Figure: The difference between calibration and discrimination. The left plot shows the model is well-calibrated (red line close to the diagonal), while the right plot reveals the harsh reality—even with perfect calibration, to reduce the error rate from 25% to 5%, 52% of correct answers must be sacrificed.

Real data confirms this assessment. The paper analyzes the performance of various state-of-the-art models on the SimpleQA Verified benchmark, and the results are clear and somewhat brutal: most models fall along the "answer more, err more" diagonal, while the few models that pursue high accuracy achieve it on the questions they do answer by refusing many others, at a huge cost in utility. The ideal "upper-right corner" region, where a model answers many questions and errs on few, is currently empty. This blank space is precisely the "discrimination gap" mentioned in the paper.

Figure: Measured performance of mainstream models on SimpleQA Verified. The five-pointed star in the upper right is the ideal target, "Discrimination Gap" marks the chasm between current models and the ideal, and "Utility Tax" indicates the utility price Claude Opus 4 pays for its high accuracy.

Since "stuffing more knowledge" has blind spots, and "staying silent when unsure" is too costly, is there a third way?

Redefining Hallucination: Not "Saying It Wrong", but "Claiming Certainty Without the Right to Be Certain"

The paper's core contribution lies not in diagnosing the problem, but in redefining the problem itself.

For a long time, the industry has defined "hallucination" as "AI outputting wrong information," implying a premise: eliminating hallucinations = eliminating all errors. But the paper proposes looking at it from another angle—hallucination is not "AI saying something wrong," but "AI having no right to be certain, yet giving wrong information with a tone of certainty".

This distinction seems subtle but has profound implications. For example: A doctor looks at test results and says, "You have disease X." If he's just guessing based on intuition, this is irresponsible. But if he says, "Current symptoms point to X, but further tests are needed for confirmation," even if the initial assessment direction is slightly off, this way of expression is honest in itself—he's telling the patient, "Please treat this judgment with caution." Errors are not unacceptable; what's unacceptable is pretending to be certain when uncertain.

Based on this new definition, the third way emerges: faithful uncertainty—enabling AI to express confidence levels at the linguistic level that truly correspond to its internal state's confidence level.

Specifically, an AI's "internal uncertainty" can be objectively measured through repeated sampling: ask the same question a hundred times; if it gives the same answer every time, it's internally certain; if answers vary widely, it's internally uncertain. "Linguistic uncertainty" is the sense of confidence reflected in the AI's wording—"August 4, 1961" versus "I seem to recall it was 1961, but I'm not entirely sure" give readers completely different signals.
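A minimal sketch of the repeated-sampling idea follows. It assumes the sampled answers have already been collected as strings, and it treats answers as identical only after trimming whitespace and lowercasing, which is a crude stand-in for real answer matching. The function name and the sample data are hypothetical.

```python
from collections import Counter


def internal_confidence(sampled_answers):
    """Return the most common answer and the share of samples agreeing with it."""
    normalized = [answer.strip().lower() for answer in sampled_answers]
    top_answer, top_count = Counter(normalized).most_common(1)[0]
    return top_answer, top_count / len(normalized)


# Five hypothetical samples for the same question.
samples = ["1961", "1961", "1962", "1961 ", "1959"]
print(internal_confidence(samples))
```

Here three of five samples agree, so the agreement rate is 0.6. A model that answered "1961" all five times would score 1.0.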

Faithful uncertainty requires aligning the two: using tentative wording when internally uncertain, and using a definite tone only when internally certain. The paper emphasizes that this goal is more feasible than "eliminating all errors." The reason is that faithful uncertainty only requires the AI's language output to correspond to its own internal state—this is a closed-loop problem, the signal is inside the model, not dependent on external truth. Eliminating errors requires the AI's output to completely correspond to the truth of the external world; the paper cites the Halting Problem and computational theory, indicating this has fundamental theoretical limits.
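To illustrate what "aligning the two" could look like at its simplest, here is a hedged sketch that chooses wording from a numeric internal-confidence score. The thresholds (0.9 and 0.5) and the phrasings are arbitrary assumptions made for illustration. The article does not say how the paper maps confidence to language, and a real system would presumably learn this behavior rather than apply a template.

```python
def express(answer, confidence):
    """Pick wording whose certainty matches the internal confidence score."""
    if confidence >= 0.9:
        return f"{answer}."
    if confidence >= 0.5:
        return f"I believe it is {answer}, but I'm not entirely sure."
    return f"I'm not sure; my best guess is {answer}."


for score in (0.98, 0.6, 0.2):
    print(score, "->", express("1961", score))
```

The point of the sketch is the direction of the dependency: the wording is a function of the model's own internal signal, not of any external ground truth.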

The paper summarizes this capability into a higher-level concept: metacognition—AI can both perceive its own uncertainty and adjust its behavior based on that perception. This concept is borrowed from psychology, originally meaning "cognition about one's own cognitive processes." In the AI context, it means AI has a clear awareness of what it knows and what it doesn't know.

Figure: The left side shows the traditional dilemma—"answering" risks hallucinations, "refusing to answer" imposes a utility cost. The right side shows the new path—by faithfully expressing uncertainty, both retaining useful information and minimizing the harm of wrong information, achieving "reliable utility."

The Agent Era: Agents Without Metacognition Are "Flying Blind"

The value of metacognition is not limited to conversational scenarios. In the era of AI Agents, it becomes even more critical.

On the surface, equipping AI with a search engine seems to solve the lack of knowledge problem—just look it up when you don't know, what's there to fear about hallucinations? But the paper points out that tools introduce not a "storage solution," but a "control problem".

With tools, AI faces a series of new decisions: Do I know this myself, or do I need to search? Is the information found credible? If search results contradict what I know, which do I trust? When should I stop searching?

All these decisions depend on the AI's accurate perception of its own internal confidence level. An AI agent without metacognitive ability is like a pilot without an instrument panel: the engine alarm is already sounding, and he is still accelerating.

Figure: The metacognitive control layer serves as a bridge between the AI's foundational capabilities and the external tool system. Without this layer, an Agent's scheduling of external tools is like "flying blind"—not knowing whether to search, whether to believe the results, or to what extent.

Research cited by the paper indicates that current search-enhanced AI agents commonly suffer from tool overuse—searching for questions that don't need searching at all, leading to inefficiency and introducing unnecessary noise. The reason is simple: AI without metacognition simply cannot judge "Do I need extra information?"
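The "Do I need extra information?" decision can be sketched as a confidence gate in front of the tool call. Everything here is hypothetical: `recall` and `search` are stand-in callables, the 0.8 threshold is arbitrary, and the sketch only works if the confidence returned by `recall` is trustworthy, which is exactly the capability the paper says is missing.

```python
def answer_with_optional_search(question, recall, search, threshold=0.8):
    """Answer from memory when confident enough; otherwise call the tool.

    recall(question) -> (answer, confidence in [0, 1])
    search(question) -> answer
    """
    answer, confidence = recall(question)
    if confidence >= threshold:
        return answer, "memory"
    return search(question), "search"


# Stand-ins for a model and a search tool (illustration only).
def fake_recall(question):
    known = {"capital of France": ("Paris", 0.99)}
    return known.get(question, ("no idea", 0.10))


def fake_search(question):
    return f"[search result for: {question}]"


print(answer_with_optional_search("capital of France", fake_recall, fake_search))
print(answer_with_optional_search("yesterday's closing price", fake_recall, fake_search))
```

An agent whose confidence is uninformative has no good threshold: set it high and the agent searches for everything, set it low and it answers from memory when it should not.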

On the Road to Metacognition, Several Hard Challenges Remain

The paper also frankly points out key challenges on the path to realization.

"Bootstrapping Paradox": Teaching AI to express uncertainty requires training data demonstrating "when to hesitate," but the AI's knowledge boundaries are dynamic. A training sample labeled "I'm not sure" might cover something the model comes to know confidently as it evolves. Using static data to teach a dynamic ability would produce an AI that merely "pretends to be uncertain." Solving this requires dynamic data infrastructure that reflects the model's current knowledge boundaries.

"Alignment Destroys the Signal": Research finds that AI after pre-training actually has decent internal uncertainty signals: its internal state can distinguish "confident about this question" from "unsure about this question." But alignment training like RLHF erases this signal. The reason is that humans prefer answers delivered in a confident tone, forcing the AI to learn to appear confident externally no matter how uncertain it is internally.

"Causality in Evaluation": A deeper challenge is ensuring the AI is genuinely reading internal signals, not just learning superficial patterns like "say I'm unsure when encountering rare words." Distinguishing "real metacognition" from "performance of metacognition" is a fundamental scientific evaluation problem.

The paper also offers specific suggestions to the research community: Stop evaluating anti-hallucination methods with only a single accuracy number. Instead, visualize the complete "utility-error rate trade-off curve" to see clearly whether a method genuinely improves underlying discrimination ability or merely raises the refusal threshold on the same curve. Simultaneously, detect "collateral damage"—whether reducing error rates in knowledge Q&A comes at an unexpected cost to tasks like reasoning, coding, and writing.
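A rough sketch of such a curve follows, using coverage (the share of questions answered) as a stand-in for utility. The function name and the two toy "methods" are hypothetical, and the paper's exact axes may differ.

```python
def tradeoff_curve(confidences, correct):
    """Return (coverage, error_rate) points, one for each top-k set of
    answers when ranked by decreasing confidence."""
    ranked = sorted(zip(confidences, correct), key=lambda pair: -pair[0])
    points = []
    wrong = 0
    for kept, (_, ok) in enumerate(ranked, start=1):
        wrong += not ok
        points.append((kept / len(ranked), wrong / kept))
    return points


# Same four answers scored by two hypothetical confidence methods.
correct = [True, True, True, False]
method_a = [0.9, 0.8, 0.7, 0.2]   # ranks the wrong answer last
method_b = [0.9, 0.3, 0.8, 0.7]   # ranks the wrong answer third

for name, conf in [("A", method_a), ("B", method_b)]:
    print(name, tradeoff_curve(conf, correct))
```

Both methods end at the same point (full coverage, 25% error), so a single accuracy number cannot tell them apart. At 75% coverage, however, method A has no errors while method B is wrong one time in three. That difference reflects better discrimination, not a stricter refusal threshold.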

Ultimately, the core message this paper wants to convey is: AI doesn't have to be omniscient and omnipotent, but it must have an honest understanding of what it knows and doesn't know, and communicate that understanding to the user.

We trust professionals not because they never make mistakes, but because they can honestly distinguish "I'm certain" from "I'm guessing"; it is this distinction that separates the professional from the unprofessional. AI should follow the same road. Instead of endlessly chasing the illusion of perfect infallibility, it is more practical to teach AI one more thing: knowing when it is talking nonsense, and honestly telling the user.

(This article first appeared on Titanium Media APP, author | Silicon Valley Tech_news, editor | Jiao Yan)

Related Questions

Q: According to the article, what is the core limitation of the current two mainstream approaches (scaling up knowledge and refusing to answer) to combating AI hallucinations?

A: Scaling knowledge always leaves blind spots, while refusing to answer imposes a 'utility tax': the high cost of sacrificing many correct answers. Refusal lacks precision because current models have poor 'discrimination'; they cannot accurately tell which specific answers they got right or wrong, so they must reject many correct responses to lower error rates.

Q: What key concept does the paper borrow from psychology and propose as a new solution to the hallucination problem?

A: The paper proposes the concept of 'metacognition,' borrowed from psychology, as a new solution. In the AI context, it means the AI's ability to perceive its own uncertainty (knowing what it knows and doesn't know) and adjust its behavior accordingly, primarily through 'faithful uncertainty' in its language.

Q: How does the paper redefine the problem of 'hallucination'?

A: The paper redefines 'hallucination' not simply as 'the AI saying something wrong,' but as 'the AI outputting wrong information with unwarranted certainty.' It shifts the focus from eliminating all errors to ensuring the AI's expressed certainty in language faithfully corresponds to its internal state of certainty.

Q: Why is metacognition particularly critical for AI Agents, according to the article?

A: Metacognition is critical for AI Agents because it acts as the essential control layer for using external tools like search engines. Without it, Agents face a 'control problem': they cannot determine when they need to search, whether to trust the results, or when to stop, leading to tool overuse and inefficient 'flying blind.'

Q: What are two of the key technical challenges mentioned in the article for achieving faithful uncertainty in AI models?

A: Two key challenges are: 1) The 'Bootstrapping Paradox': Training models on static data to express uncertainty doesn't account for their dynamic, evolving knowledge boundaries, risking models that only 'perform' uncertainty. 2) 'Alignment Destroys the Signal': RLHF and other alignment training can erase the model's inherent internal uncertainty signals by rewarding confident-sounding responses, teaching the AI to appear certain even when it's not.
