Kalshi's First Research Report Released: How Collective Intelligence Outperforms Wall Street Think Tanks in Predicting CPI

Odaily Planet Daily (Odaily星球日报) | Published on 2025-12-24 | Last updated on 2025-12-24

Abstract

Kalshi Research's inaugural report finds that prediction markets outperformed Wall Street consensus forecasts in predicting the U.S. year-over-year CPI inflation rate. The study, covering over 25 monthly CPI releases from February 2023 to mid-2025, shows that Kalshi's market-implied forecasts had a 40.1% lower mean absolute error (MAE) than consensus predictions across all environments. The advantage was most pronounced during economic "shocks." For large surprises (over 0.2 percentage points), the MAE of Kalshi's forecasts was 50% lower than that of consensus a week before the data release, and 60% lower the day before. For medium surprises (0.1 to 0.2 percentage points), the MAE was likewise 50% lower a week ahead, and 56.2% lower the day before. Crucially, a divergence of over 0.1 percentage points between the market forecast and consensus served as a strong signal, with an 81.2% probability that a shock would occur. When the two forecasts disagreed, the market prediction was more accurate 75% of the time. The report attributes this "Shock Alpha" to three factors: the "wisdom of crowds" aggregating diverse information, incentive structures that reward accuracy over conformity, and more efficient information synthesis, even with the same public data. This suggests prediction markets provide a valuable, differentiated signal for investors and policymakers, especially during periods of high uncertainty.

This article is from: Kalshi Research

Compiled by | Odaily Planet Daily (@OdailyChina); Translator | Azuma (@azuma_eth)

Editor's Note: Leading prediction market platform Kalshi yesterday announced the launch of a new research report series, Kalshi Research, aimed at providing Kalshi's internal data to scholars and researchers interested in topics related to prediction markets. The inaugural research report for this series has been released. The original title is "Kalshi Outperforms Wall Street in Predicting Inflation" (Beyond Consensus: Prediction Markets and the Forecasting of Inflation Shocks).

Below is the content of the original report, compiled by Odaily Planet Daily.

Overview

Typically, in the week leading up to the release of important economic statistics, analysts and senior economists from large financial institutions publish their estimates of the expected figures. These forecasts, when aggregated, are referred to as the "consensus expectation" and are widely regarded as a crucial reference for gauging market changes and adjusting positions.

In this research report, we compare the performance of the consensus expectation versus the implied pricing from Kalshi's prediction markets (sometimes referred to herein as "market prediction") in forecasting the actual value of a key macroeconomic signal: the year-over-year headline inflation rate (YoY CPI).

Key Highlights

  • Overall Superior Accuracy: Across all market environments (including normal and shock periods), Kalshi's predictions had a Mean Absolute Error (MAE) that was 40.1% lower than the consensus expectation.
  • "Shock Alpha": During major shocks (greater than 0.2 percentage points), Kalshi's predictions one week ahead had an MAE 50% lower than the consensus expectation; this advantage expanded to 60% on the day before the data release. During moderate shocks (between 0.1 and 0.2 percentage points), Kalshi's predictions one week ahead also had an MAE 50% lower than the consensus expectation, expanding to 56.2% on the day before the data release.
  • Predictive Signal: When the deviation between the market prediction and the consensus expectation exceeded 0.1 percentage points, the probability of a shock occurring was approximately 81.2%, rising to about 82.4% on the day before the data release. In cases where the market prediction differed from the consensus expectation, the market prediction was more accurate in 75% of instances.

Background

Macroeconomic forecasters face an inherent challenge: the times when forecasting is most critical—namely, during market dislocations, policy shifts, and structural breaks—are precisely the periods when historical models are most likely to fail. Financial market participants typically release consensus forecasts days before key economic data announcements, aggregating expert opinions into market expectations. However, these consensus views, while valuable, often share similar methodological approaches and information sources.

For institutional investors, risk managers, and policymakers, the stakes of forecasting accuracy are asymmetric. During calm periods, slightly better predictions offer limited value; but during periods of market turmoil, when volatility spikes, correlations break down, or historical relationships fail, superior accuracy can yield significant alpha and limit drawdowns.

Therefore, understanding how forecasts behave during market volatility is crucial. We focus on a key macroeconomic indicator, the year-over-year headline inflation rate (YoY CPI), which is a core reference for future interest rate decisions and an important signal of economic health.

We compared and evaluated forecasting accuracy across multiple time windows before the official data release. Our core finding is that so-called "Shock Alpha" indeed exists—during tail events, market-based predictions can achieve additional predictive precision compared to the consensus benchmark. This outperformance is not merely of academic interest; it significantly enhances signal quality at critical moments when forecasting errors carry the highest economic cost. In this context, the truly important question is not whether prediction markets are "always correct," but whether they provide a differentiated signal worthy of inclusion in traditional decision-making frameworks.

Methodology

Data

We analyzed the daily implied predictions of traders on the Kalshi platform at three time points: one week before the data release (matching the consensus release timing), one day before release, and the morning of the release. Each market used was (or had been) a real, tradable, active market, reflecting real-money positions at varying liquidity levels. For the consensus expectation, we collected institution-level YoY CPI consensus forecasts, typically published about a week before the U.S. Bureau of Labor Statistics official data release.

The sample period spans from February 2023 to mid-2025, covering over 25 monthly CPI release cycles across various macroeconomic environments.

Shock Classification

We categorized events into three types based on the "magnitude of surprise" relative to historical levels. A "shock" was defined as the absolute difference between the consensus expectation and the actual published data:

  • Normal Events: YoY CPI forecast error below 0.1 percentage points;
  • Moderate Shocks: YoY CPI forecast error between 0.1 and 0.2 percentage points;
  • Major Shocks: YoY CPI forecast error exceeding 0.2 percentage points.

This classification allows us to examine whether predictive advantages vary systematically with the difficulty of the forecast.
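The three-way classification above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_surprise` and the sample figures are hypothetical, and treating a miss of exactly 0.1 or 0.2 percentage points as "moderate" is one reading of the thresholds ("below 0.1" is normal, "exceeding 0.2" is major). Inputs are decimal strings so that one-decimal CPI figures compare exactly.

```python
from decimal import Decimal

# Thresholds in percentage points, taken from the classification above.
MODERATE_THRESHOLD = Decimal("0.1")
MAJOR_THRESHOLD = Decimal("0.2")


def classify_surprise(consensus: str, actual: str) -> str:
    """Label a CPI release by the absolute consensus miss.

    Hypothetical helper: inputs are decimal strings such as "3.1",
    which avoids binary floating-point artifacts at the boundaries.
    """
    surprise = abs(Decimal(actual) - Decimal(consensus))
    if surprise > MAJOR_THRESHOLD:
        return "major"
    if surprise >= MODERATE_THRESHOLD:
        return "moderate"
    return "normal"


# Made-up (consensus, actual) pairs, not data from the report.
for consensus, actual in [("3.1", "3.1"), ("3.1", "3.2"), ("3.1", "3.4")]:
    print(consensus, actual, classify_surprise(consensus, actual))
```

Note that the label depends only on the consensus miss, not on the market forecast, so the same release falls in the same bucket regardless of which forecaster is being scored.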

Performance Metrics

To evaluate forecasting performance, we employed the following metrics:

  • Mean Absolute Error (MAE): The primary accuracy metric, calculated as the average of the absolute differences between predicted and actual values.
  • Win Rate: When the difference between the consensus expectation and the market prediction reached or exceeded 0.1 percentage points (rounded to one decimal place), we recorded which forecast was closer to the final actual result.
  • Forecast Timeframe Analysis: We tracked how the accuracy of market valuations evolved from one week before release to the release day, revealing the value of continuously incorporating information.
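As a rough sketch of how the first two metrics could be computed, the Python below implements MAE, the relative MAE reduction quoted throughout the results, and a win rate restricted to releases where the two forecasts differ by at least 0.1 percentage points. The function names, the handling of ties (a tie is not counted as a market win), and the sample numbers are all assumptions made for illustration; the report does not publish its code.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecasts and realized values."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)


def relative_mae_reduction(market_mae, consensus_mae):
    """Fraction by which the market MAE is below the consensus MAE."""
    return (consensus_mae - market_mae) / consensus_mae


def win_rate(market, consensus, actuals, min_gap=0.1):
    """Share of divergent releases where the market forecast was closer.

    Returns None when no release meets the divergence threshold.
    """
    wins = 0
    contests = 0
    for m, c, a in zip(market, consensus, actuals):
        # Only count releases where the forecasts differ by at least
        # min_gap once the gap is rounded to one decimal place.
        if round(abs(m - c), 1) < min_gap:
            continue
        contests += 1
        # Round errors to suppress floating-point noise, so that exact
        # ties are not miscounted as wins.
        if round(abs(m - a), 6) < round(abs(c - a), 6):
            wins += 1
    return wins / contests if contests else None


# Made-up YoY CPI figures in percent, not data from the report.
market = [3.2, 3.0, 2.9, 3.4]
consensus = [3.1, 3.0, 3.1, 3.2]
actual = [3.3, 3.0, 2.9, 3.3]

market_mae = mean_absolute_error(market, actual)
consensus_mae = mean_absolute_error(consensus, actual)
print(relative_mae_reduction(market_mae, consensus_mae))
print(win_rate(market, consensus, actual))
```

With these illustrative inputs the market MAE is 0.05 against 0.125 for consensus, a 60% reduction, and the market wins two of the three divergent releases (the fourth release is a tie).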

Results: CPI Forecasting Performance

Overall Superior Accuracy

Across all market environments, market-based CPI predictions had a Mean Absolute Error (MAE) 40.1% lower than the consensus forecasts. The advantage held at every forecast horizon, with the MAE reduction ranging from 40.1% (one week ahead) to 42.3% (one day ahead).

Furthermore, in cases where the consensus expectation and the market-implied value diverged, Kalshi's market-based predictions demonstrated a statistically significant win rate, ranging from 75.0% one week ahead to 81.2% on release day. If ties with the consensus expectation (accurate to one decimal place) are included, the market-based prediction was tied or better than consensus in approximately 85% of cases one week ahead.

Such a high directional accuracy rate indicates that when market predictions diverge from the consensus expectation, the divergence itself carries significant information about the likelihood of a shock event.

"Shock Alpha" Exists

The difference in forecasting accuracy was particularly pronounced during shock events. During moderate shocks, the MAE of market predictions was 50% lower than the consensus expectation one week ahead (the time at which the consensus is published), and this advantage expanded to 56.2% or more on the day before the data release. During major shocks, the MAE of market predictions was likewise 50% lower one week ahead and reached 60% or more on the day before the data release. In normal environments without shocks, market predictions and consensus expectations performed roughly equally.

Although the sample size for shock events is small (reasonable in a world where shocks are inherently highly unpredictable), the overall pattern is clear: when the forecasting environment is most challenging, the information aggregation advantages of markets are most valuable.

More importantly, Kalshi's predictions do not merely perform better during shock periods: the divergence between market predictions and the consensus expectation may itself signal an impending shock. In cases of divergence, the win rate of market predictions relative to the consensus expectation reached 75% (within comparable time windows). Furthermore, threshold analysis indicates that when the deviation between the market and consensus exceeds 0.1 percentage points, the probability that a shock occurs is approximately 81.2%, and on the day before the data release this probability rises further to about 84.2%.

This practically significant difference suggests that prediction markets can serve not only as a competitive forecasting tool alongside consensus expectations but also as a "meta-signal" regarding forecasting uncertainty, transforming market-consensus divergence into a quantifiable early warning indicator for potential unexpected outcomes.
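The threshold analysis described in this section amounts to a conditional frequency: among releases where the market and consensus diverge by more than 0.1 percentage points, how often did a shock follow? A minimal sketch, assuming a shock means a consensus miss of at least 0.1 percentage points (moderate or major) and using a hypothetical function name and made-up records:

```python
from decimal import Decimal


def shock_rate_given_divergence(records, divergence_threshold="0.1",
                                shock_threshold="0.1"):
    """Share of divergent releases that turned out to be shocks.

    records: iterable of (market, consensus, actual) decimal strings.
    Returns None if no release exceeds the divergence threshold.
    """
    div_t = Decimal(divergence_threshold)
    shock_t = Decimal(shock_threshold)
    flagged = 0
    shocks = 0
    for market, consensus, actual in records:
        m, c, a = Decimal(market), Decimal(consensus), Decimal(actual)
        # Keep only releases where market and consensus diverge by
        # strictly more than the threshold.
        if abs(m - c) <= div_t:
            continue
        flagged += 1
        # Assumption: a shock is a consensus miss of at least 0.1 pp.
        if abs(a - c) >= shock_t:
            shocks += 1
    return shocks / flagged if flagged else None


# Made-up records, not data from the report.
records = [
    ("3.25", "3.1", "3.3"),  # divergent, consensus missed by 0.2
    ("3.12", "3.1", "3.3"),  # not divergent, ignored
    ("2.95", "3.1", "3.1"),  # divergent, no shock
    ("3.3", "3.1", "3.2"),   # divergent, consensus missed by 0.1
]
print(shock_rate_given_divergence(records))
```

On these four illustrative records, three are flagged as divergent and two of those are shocks. The thresholds are parameters so that the sensitivity of the signal to the cutoff can be inspected.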

Further Discussion

An obvious question follows: Why do market predictions outperform consensus forecasts during shocks? We propose three complementary mechanisms to explain this phenomenon.

Market Participant Heterogeneity and "Wisdom of the Crowd"

Traditional consensus expectations, while integrating views from multiple institutions, often share similar methodological assumptions and information sources. Econometric models, Wall Street research reports, and government data releases form a highly overlapping common knowledge base.

In contrast, prediction markets aggregate positions held by participants with diverse information bases: including proprietary models, industry-level insights, alternative data sources, and experience-based intuition. This participant diversity has a solid theoretical foundation in the "wisdom of crowds" theory. This theory suggests that when participants possess relevant information and their prediction errors are not perfectly correlated, aggregating independent predictions from diverse sources often yields superior estimates.
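The role of error correlation can be made concrete with a stylized calculation that is not taken from the report. Assume n unbiased forecasters whose errors share the same standard deviation sigma and the same pairwise correlation rho; the error of their simple average then has variance sigma² · (1 + (n − 1) · rho) / n. The function name and the example values below are hypothetical.

```python
import math


def std_of_average_error(sigma, n, rho):
    """Standard deviation of the error of an equal-weight average.

    Stylized assumption: n unbiased forecasters, each with error
    standard deviation sigma and common pairwise correlation rho.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A common pairwise correlation below -1/(n-1) is not feasible.
    lower = -1.0 / (n - 1) if n > 1 else -1.0
    if not lower <= rho <= 1.0:
        raise ValueError("rho is outside the feasible range for this n")
    variance = sigma ** 2 * (1 + (n - 1) * rho) / n
    return math.sqrt(max(variance, 0.0))


# Illustrative values: 25 forecasters, each off by about 0.2 pp.
for rho in (0.0, 0.8, 1.0):
    print(rho, std_of_average_error(0.2, 25, rho))
```

With independent errors (rho = 0) the average is five times more precise than any single forecaster, while with perfectly correlated errors (rho = 1) averaging brings no improvement. This is the sense in which a pool of forecasters sharing models and data sources gains less from aggregation than a diverse pool.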

The value of this informational diversity is particularly pronounced during "state shifts" in the macro environment—individuals with scattered, local information interact in the market, and their informational fragments combine to form a collective signal.

Differences in Participant Incentive Structures

Institution-level consensus forecasters often operate within complex organizational and reputational systems that systematically deviate from the goal of "purely pursuing predictive accuracy." The career risks faced by professional forecasters create an asymmetric payoff structure—significant forecasting errors incur substantial reputational costs, while even extreme accuracy, especially achieved by deviating substantially from peer consensus, may not yield proportional career rewards.

This asymmetry induces "herding behavior," where forecasters tend to cluster their predictions near the consensus value, even if their private information or model outputs suggest different results. The reason is that within the career system, the cost of "being wrong alone" is often higher than the reward for "being right alone."

In stark contrast, the incentive mechanism faced by prediction market participants directly aligns forecasting accuracy with economic outcomes: accurate predictions mean profits, and incorrect predictions mean losses. In this system, reputational factors are almost non-existent; the only cost of deviating from market consensus is economic loss, which depends entirely on whether the prediction is correct. This structure imposes stronger selection pressure for predictive accuracy. Participants who can systematically identify consensus forecast errors continuously accumulate capital and amplify their influence in the market through larger position sizes, while those who mechanically follow consensus suffer continuous losses when consensus proves wrong.

During periods of significantly heightened uncertainty, when the career cost for institutional forecasters to deviate from expert consensus is at its peak, this divergence in incentive structures is often most pronounced and economically most significant.

Information Aggregation Efficiency

A noteworthy empirical fact is that even one week before the data release, a timeframe matching the typical window for consensus expectation releases, market predictions still exhibit significant accuracy advantages. This suggests that the market advantage does not stem solely from the often-cited "information speed advantage" of prediction market participants.

Instead, market predictions may more efficiently aggregate information fragments that are too dispersed, too industry-specific, or too ambiguous to be formally incorporated into traditional econometric forecasting frameworks. The relative advantage of prediction markets may lie not in earlier access to public information, but in their ability to synthesize heterogeneous information more effectively within the same timeframe—information that survey-based consensus mechanisms, even with the same time window, often struggle to process efficiently.

Limitations and Caveats

Our findings require an important qualification. Because the overall sample covers only about 30 months, and major shock events are by definition rare, statistical power remains limited for larger tail events. A longer time series will strengthen future inference, although the current results strongly suggest that market predictions are both more accurate and a differentiated signal.

Conclusion

We document systematic and economically significant outperformance of prediction markets relative to expert consensus expectations, particularly during shock periods when forecasting accuracy is most critical. Market-based CPI predictions exhibited approximately 40% lower error overall, and this error reduction reached about 60% during periods of major structural change.

Based on these findings, several future research directions become particularly important: First, investigating whether "Shock Alpha" events themselves can be predicted using volatility and forecast divergence indicators, across a larger sample size and multiple macroeconomic indicators; Second, determining the liquidity threshold above which prediction markets can stably outperform traditional forecasting methods; Third, exploring the relationship between prediction market forecasts and those implied by high-frequency trading financial instruments.

In an environment where consensus forecasts heavily rely on correlated model assumptions and shared information sets, prediction markets offer an alternative information aggregation mechanism capable of capturing state switches earlier and processing heterogeneous information more efficiently. For entities needing to make decisions in an economic environment characterized by rising structural uncertainty and tail event frequency, "Shock Alpha" may represent not just an incremental improvement in predictive capability, but a fundamental component of a robust risk management infrastructure.

Related Questions

Q: What is the main finding of Kalshi Research's first report regarding CPI prediction accuracy?

A: Kalshi's prediction market had a 40.1% lower Mean Absolute Error (MAE) than Wall Street consensus forecasts across all market conditions.

Q: What is "Shock Alpha" as defined in the Kalshi report?

A: "Shock Alpha" refers to the significant additional predictive accuracy of Kalshi's market-based forecasts over consensus during shock events, with MAE reductions of 50% to 60%.

Q: How likely is a shock event when market and consensus forecasts diverge by more than 0.1 percentage points?

A: A divergence of over 0.1 percentage points signals an approximately 81.2% probability of a shock event occurring, rising to about 84.2% the day before the data release.

Q: What are the three mechanisms proposed in the report to explain why market predictions outperform consensus during shocks?

A: The three mechanisms are: 1. Participant heterogeneity and the "wisdom of crowds". 2. Differences in incentive structures (direct financial alignment in markets vs. career risks in institutions). 3. Superior information aggregation efficiency in markets.

Q: What key macroeconomic indicator was the focus of the performance comparison in this study?

A: The study focused on comparing the prediction performance for the year-over-year headline inflation rate (YoY CPI).
