Kalshi's First Research Report Released: How Collective Intelligence Outperforms Wall Street Think Tanks in Predicting CPI

marsbit | Published on 2025-12-24 | Last updated on 2025-12-24

Abstract

Kalshi Research's inaugural report demonstrates that prediction markets significantly outperform Wall Street consensus forecasts in predicting year-over-year CPI data, particularly during economic shocks. The study, analyzing data from February 2023 to mid-2025, shows Kalshi’s predictions had a 40.1% lower mean absolute error (MAE) than consensus forecasts overall. During major shocks (deviations >0.2 percentage points), Kalshi’s MAE was 50–60% lower, and for moderate shocks (0.1–0.2 points), it was 50–56.2% lower. When market predictions diverged from consensus by over 0.1 points, they correctly signaled a shock about 81% of the time and were more accurate in 75% of cases. The outperformance is attributed to diverse participant insights, direct financial incentives for accuracy, and efficient aggregation of dispersed information. These findings suggest prediction markets provide a valuable complementary signal, especially in volatile periods where traditional models often fail.

This article is from: Kalshi Research

Compiled by | Odaily Planet Daily (@OdailyChina); Translator | Azuma (@azuma_eth)

Editor's Note: Leading prediction market platform Kalshi announced the launch of a new research report series, Kalshi Research, yesterday. It aims to provide Kalshi's internal data to scholars and researchers interested in topics related to prediction markets. The inaugural report has been published, originally titled "Kalshi Outperforms Wall Street in Predicting Inflation" (Beyond Consensus: Prediction Markets and the Forecasting of Inflation Shocks).

Below is the content of the original report, compiled by Odaily Planet Daily.

Overview

Typically, in the week leading up to the release of important economic statistics, analysts and senior economists from large financial institutions provide their estimates of the expected figures. These forecasts are aggregated to form what is known as the "consensus expectation," which is widely regarded as a crucial reference for gaining insights into market changes and adjusting portfolio positioning.

In this research report, we compare the performance of the consensus expectation with the implied pricing from Kalshi's prediction markets (sometimes referred to as "market forecast" below) in predicting the actual value of the same core macroeconomic indicator—the year-over-year headline inflation rate (YOY CPI).

Key Highlights

  • Overall Superior Accuracy: Across all market environments (including normal and shock periods), Kalshi's predictions had a Mean Absolute Error (MAE) that was 40.1% lower than the consensus expectation.
  • "Shock Alpha": During significant shocks (greater than 0.2 percentage points), Kalshi's predictions had an MAE 50% lower than the consensus expectation in the one-week-ahead forecast window; this advantage expanded to 60% on the day before the data release. For moderate shocks (between 0.1 and 0.2 percentage points), Kalshi's predictions also had an MAE 50% lower than the consensus expectation one week ahead, which widened to 56.2% on the day before the release.
  • Predictive Signal: When the deviation between the market forecast and the consensus expectation exceeded 0.1 percentage points, the probability of a shock occurring was approximately 81.2%, rising to about 82.4% on the day before the data release. In cases where the market forecast differed from the consensus, the market forecast was more accurate in 75% of instances.

Background

Macroeconomic forecasters face an inherent challenge: the most critical times for forecasting—periods of market disorder, policy shifts, and structural breaks—are precisely when historical models are most likely to fail. Financial market participants typically release consensus forecasts for key economic data days in advance, aggregating expert opinions into market expectations. However, while valuable, these consensus views often share similar methodological approaches and information sources.

For institutional investors, risk managers, and policymakers, the stakes of forecasting accuracy are asymmetric. During calm periods, slightly better predictions offer limited value; but during periods of market turmoil, when volatility spikes, correlations break down, or historical relationships fail, superior accuracy can yield significant alpha and limit drawdowns.

Therefore, understanding how forecast accuracy behaves during volatile market periods is crucial. We focus on a key macroeconomic indicator, the year-over-year headline inflation rate (YOY CPI), which is a core reference for future interest rate decisions and an important signal of economic health.

We compared and evaluated forecasting accuracy across multiple time windows before the official data release. Our core finding is that "shock alpha" indeed exists—during tail events, market-based predictions can achieve additional forecasting precision compared to the consensus benchmark. This outperformance is not merely of academic interest; it significantly enhances signal quality precisely when forecasting errors carry the highest economic costs. In this context, the crucial question is not whether prediction markets are "always right," but whether they provide a differentiated signal worthy of inclusion in traditional decision-making frameworks.

Methodology

Data

We analyzed the daily implied predictions from traders on the Kalshi platform at three time points: one week before the data release (matching the timing of consensus expectation releases), one day before, and the morning of the release. Each market used was (or had been) a real, tradable, active market, reflecting real-money positions at varying liquidity levels. For the consensus, we collected institution-level YoY CPI consensus forecasts, typically published about a week before the official U.S. Bureau of Labor Statistics data release.

The sample period spans from February 2023 to mid-2025, covering over 25 monthly CPI release cycles across various macroeconomic environments.

Shock Classification

We categorized events into three types based on the "surprise magnitude" relative to historical levels. The size of a "shock" was defined as the absolute difference between the consensus expectation and the actual published data:

  • Normal Events: YOY CPI forecast error below 0.1 percentage points;
  • Moderate Shocks: YOY CPI forecast error between 0.1 and 0.2 percentage points;
  • Major Shocks: YOY CPI forecast error exceeding 0.2 percentage points.

This classification allows us to examine whether predictive advantages vary systematically with the difficulty of the forecast.
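The three buckets above can be expressed as a small classifier. The following Python sketch is illustrative only: the function name `classify_shock`, the rounding to one decimal place, and the treatment of the boundary values (0.1 and 0.2 both counted as moderate) are assumptions, not details confirmed by the report.

```python
def classify_shock(consensus: float, actual: float) -> str:
    """Label a CPI release by the size of the consensus miss (percentage points).

    Hypothetical helper: boundary handling is an assumption of this sketch.
    """
    # Round to one decimal so that float noise (e.g. 3.1 - 3.0 != 0.1 exactly)
    # does not push a value across a bucket boundary.
    surprise = round(abs(actual - consensus), 1)
    if surprise < 0.1:
        return "normal"
    if surprise <= 0.2:
        return "moderate"
    return "major"


# Made-up inputs, chosen only to exercise each bucket.
print(classify_shock(consensus=3.0, actual=3.0))
print(classify_shock(consensus=3.0, actual=3.2))
print(classify_shock(consensus=3.0, actual=3.3))
```

Rounding before comparing is a design choice for this sketch; with unrounded inputs, a release that misses by exactly 0.1 could otherwise land in either bucket depending on floating-point representation.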

Performance Metrics

To assess forecasting performance, we employed the following metrics:

  • Mean Absolute Error (MAE): The primary accuracy metric, calculated as the average of the absolute differences between predicted and actual values.
  • Win Rate: When the difference between the consensus expectation and the market forecast reached or exceeded 0.1 percentage points (rounded to one decimal place), we recorded which prediction was closer to the final actual result.
  • Forecast Horizon Analysis: We tracked how the accuracy of market valuations evolved from one week before the release to the release day, revealing the value of continuously incorporating information.
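The first two metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's actual code: the function names, the exclusion of ties from the win rate, and all sample numbers are hypothetical.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecast and realized values."""
    if not forecasts or len(forecasts) != len(actuals):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)


def win_rate(market, consensus, actuals, threshold=0.1):
    """Share of divergent releases where the market was closer than consensus.

    Only releases where market and consensus differ by at least `threshold`
    (after rounding to one decimal) are counted. Ties are skipped, which is
    an assumption of this sketch.
    """
    wins = losses = 0
    for m, c, a in zip(market, consensus, actuals):
        if round(abs(m - c), 1) < threshold:
            continue  # no meaningful divergence, so no winner to record
        market_err = round(abs(m - a), 1)
        consensus_err = round(abs(c - a), 1)
        if market_err < consensus_err:
            wins += 1
        elif consensus_err < market_err:
            losses += 1
    decided = wins + losses
    return wins / decided if decided else None


# Hypothetical YoY CPI values in percent; not data from the report.
market = [3.1, 3.4, 2.9, 3.0]
consensus = [3.0, 3.2, 3.0, 3.0]
actual = [3.1, 3.5, 3.0, 3.1]

print(mean_absolute_error(market, actual))
print(mean_absolute_error(consensus, actual))
print(win_rate(market, consensus, actual))
```

Running the same functions on forecasts taken one week ahead, one day ahead, and on release morning would give the horizon comparison described in the third metric.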

Results: CPI Forecasting Performance

Overall Superior Accuracy

Across all market environments, the market-based CPI forecasts had a lower Mean Absolute Error (MAE) than the consensus forecasts at every time horizon, ranging from 40.1% lower (one week ahead) to 42.3% lower (one day ahead).
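A statement such as "40.1% lower" is most naturally read as a relative reduction in MAE. The sketch below shows that arithmetic; the function name and the two MAE values are hypothetical, and the report does not spell out its exact calculation.

```python
def relative_mae_reduction(mae_market: float, mae_consensus: float) -> float:
    """Fraction by which the market MAE undercuts the consensus MAE."""
    if mae_consensus <= 0:
        raise ValueError("consensus MAE must be positive")
    return 1.0 - mae_market / mae_consensus


# Hypothetical MAEs in percentage points, chosen only to show the arithmetic.
reduction = relative_mae_reduction(mae_market=0.06, mae_consensus=0.10)
print(f"{reduction:.1%}")
```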

Furthermore, in cases where the consensus expectation and the market-implied value diverged, Kalshi's market-based forecasts demonstrated a statistically significant win rate, ranging from 75.0% one week ahead to 81.2% on release day. Including ties with the consensus (to one decimal place), the market-based forecast performed equally well or better than the consensus in approximately 85% of cases one week ahead.

Such a high directional accuracy rate indicates that when market forecasts diverge from the consensus expectation, the divergence itself carries significant informational value regarding the likelihood of a shock event.

"Shock Alpha" Exists

The difference in forecasting accuracy was particularly pronounced during shock events. During moderate shock events, the MAE of market forecasts was 50% lower than the consensus expectation one week ahead, when the consensus is released; this advantage expanded to 56.2% or more one day before the data release. During major shock events, the MAE of market forecasts was also 50% lower than the consensus one week ahead, reaching 60% or more one day before release. In contrast, during normal, non-shock environments, market forecasts and consensus expectations performed roughly equally well.

Although the sample size for shock events is small (reasonable in a world where shocks are inherently highly unpredictable), the overall pattern is clear: the information aggregation advantage of markets is most valuable precisely when the forecasting environment is most challenging.

However, more importantly, it's not just that Kalshi's predictions perform better during shock periods, but also that the divergence between the market forecast and the consensus expectation can itself be a signal of an impending shock. In cases of divergence, the market forecast's win rate against the consensus expectation was 75% (within comparable time windows). Furthermore, threshold analysis indicated that when the market deviation from the consensus exceeded 0.1 percentage points, the probability of predicting a shock was approximately 81.2%, rising to about 84.2% on the day before the data release.
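The threshold analysis described here amounts to a conditional frequency: among releases where the market and the consensus diverged, how often did the consensus then miss by a shock-sized margin? The Python sketch below illustrates the idea. The function name, the inclusive thresholds, and the sample data are all assumptions made for illustration.

```python
def shock_rate_given_divergence(market, consensus, actuals,
                                divergence_threshold=0.1,
                                shock_threshold=0.1):
    """Share of divergent releases that turned out to be shocks.

    A release is "flagged" when |market - consensus| >= divergence_threshold,
    and counts as a shock when |actual - consensus| >= shock_threshold.
    Inclusive comparisons are an assumption of this sketch.
    """
    flagged = shocks = 0
    for m, c, a in zip(market, consensus, actuals):
        # Round to two decimals to suppress floating-point noise.
        if round(abs(m - c), 2) < divergence_threshold:
            continue
        flagged += 1
        if round(abs(a - c), 2) >= shock_threshold:
            shocks += 1
    return shocks / flagged if flagged else None


# Hypothetical YoY CPI values in percent; not data from the report.
market = [3.2, 3.0, 2.8, 3.5, 3.1]
consensus = [3.0, 3.0, 3.0, 3.4, 3.1]
actual = [3.3, 3.0, 2.9, 3.4, 3.2]

print(shock_rate_given_divergence(market, consensus, actual))
```

In this made-up sample, three releases are flagged and two of them are shocks. The last release is a shock that the divergence signal did not flag, which is why a hit rate of this kind says nothing about how many shocks go unsignaled.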

This practically significant difference suggests that prediction markets can serve not only as a competitive forecasting tool alongside consensus expectations but also as a "meta-signal" regarding forecast uncertainty, transforming market-consensus divergence into a quantifiable early warning indicator for potential unexpected outcomes.

Further Discussion

An obvious question follows: Why do market forecasts outperform consensus forecasts during shocks? We propose three complementary mechanisms to explain this phenomenon.

Market Participant Heterogeneity and "Wisdom of the Crowd"

Traditional consensus expectations, while aggregating views from multiple institutions, often share similar methodological assumptions and information sources. Econometric models, Wall Street research reports, and government data releases form a highly overlapping common knowledge base.

In contrast, prediction markets aggregate positions held by participants with diverse information bases: including proprietary models, industry-level insights, alternative data sources, and experience-based intuition. This participant diversity has a solid theoretical foundation in the "wisdom of crowds" theory. This theory suggests that when participants possess relevant information and their prediction errors are not perfectly correlated, aggregating independent predictions from diverse sources often yields a superior estimate.

The value of this informational diversity is particularly pronounced during "state shifts" in the macro environment—individuals with scattered, local information interact in the market, combining their informational fragments to form a collective signal.
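The role of error correlation in this argument can be made concrete with a standard statistical result: for an equal-weighted average of n forecasts whose errors each have standard deviation sigma and share a common pairwise correlation rho, the variance of the average's error is sigma^2 * (1 + (n - 1) * rho) / n. The sketch below evaluates that formula. It is a textbook illustration, not a model from the report, and the parameter values are hypothetical.

```python
import math


def crowd_mean_error_std(sigma: float, n: int, rho: float) -> float:
    """Error std of an equal-weighted average of n equally correlated forecasts."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only handles 0 <= rho <= 1")
    variance = sigma ** 2 * (1 + (n - 1) * rho) / n
    return math.sqrt(variance)


# Hypothetical setup: 25 forecasters, each with a 0.2-point error std.
for rho in (0.0, 0.5, 1.0):
    print(rho, crowd_mean_error_std(sigma=0.2, n=25, rho=rho))
```

With uncorrelated errors the average is far more precise than any single forecaster, while with perfectly correlated errors averaging brings no improvement at all. This is the sense in which a consensus built from overlapping models and data can gain less from aggregation than a more heterogeneous crowd.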

Differences in Participant Incentive Structures

Institution-level consensus forecasters often operate within complex organizational and reputational systems that systematically deviate from the goal of "purely pursuing predictive accuracy." The career risks faced by professional forecasters create an asymmetric payoff structure—large forecasting errors incur significant reputational costs, while even extreme accuracy, especially achieved by deviating substantially from peer consensus, may not yield proportional career rewards.

This asymmetry induces "herding behavior," where forecasters tend to cluster their predictions near the consensus value, even if their private information or model outputs suggest different results. The reason is that in a professional context, the cost of "being wrong alone" often outweighs the benefits of "being right alone."

In stark contrast, the incentive structure faced by prediction market participants directly aligns forecasting accuracy with economic outcomes: accurate predictions mean profits, and incorrect predictions mean losses. In this system, reputational factors are almost non-existent; the only cost of deviating from the market consensus is economic loss, which depends solely on whether the prediction is correct. This structure imposes stronger selection pressure for predictive accuracy. Participants who can identify consensus forecast errors systematically accumulate capital and increase their influence in the market through larger position sizes, while those who mechanically follow the consensus suffer continuous losses when the consensus proves wrong.

During periods of significantly heightened uncertainty, when the career cost for institutional forecasters to deviate from the expert consensus is at its peak, this divergence in incentive structures is often most pronounced and economically most important.

Information Aggregation Efficiency

A noteworthy empirical fact is: even one week before the data release—a time point matching the typical release window of consensus expectations—market forecasts still exhibit significant accuracy advantages. This suggests that the market advantage does not stem solely from the often-cited "information speed advantage" of prediction market participants.

Instead, market forecasts may more efficiently aggregate informational fragments that are too dispersed, too industry-specific, or too nebulous to be formally incorporated into traditional econometric forecasting frameworks. The relative advantage of prediction markets may lie not in earlier access to public information, but in their ability to synthesize heterogeneous information more effectively within the same time frame—a task that survey-based consensus mechanisms, even with the same time window, often struggle to perform efficiently.

Limitations and Caveats

Our findings require an important qualification. Since the overall sample covers only about 30 months, and major shock events are by definition rare, statistical power for larger tail events remains limited. A longer time series would enhance future inferential capabilities, although the current results strongly suggest the superiority and differentiated signal of market forecasts.

Conclusion

We document systematic and economically significant outperformance of prediction markets relative to expert consensus expectations, particularly during shock periods where forecasting accuracy is most critical. Market-based CPI forecasts exhibited approximately 40% lower error overall, with error reduction reaching around 60% during periods of major structural change.

Based on these findings, several future research directions become particularly important: first, investigating whether "shock alpha" events can themselves be predicted using volatility and forecast divergence indicators, across a larger sample and multiple macroeconomic indicators; second, determining the liquidity threshold above which prediction markets can stably outperform traditional forecasting methods; third, exploring the relationship between prediction market forecast values and those implied by high-frequency traded financial instruments.

In an environment where consensus forecasts heavily rely on correlated model assumptions and shared information sets, prediction markets offer an alternative information aggregation mechanism, capable of capturing state switches earlier and processing heterogeneous information more efficiently. For entities needing to make decisions in an economic environment characterized by rising structural uncertainty and tail event frequency, "shock alpha" may represent not just an incremental improvement in forecasting ability, but a fundamental component of a robust risk management infrastructure.

Related Questions

Q: What is the main finding of Kalshi Research's report regarding the prediction of CPI?

A: Kalshi's prediction market had a 40.1% lower Mean Absolute Error (MAE) than the Wall Street consensus forecast in predicting the year-over-year CPI, with the advantage being even greater (up to 60%) during significant economic shocks.

Q: How does the performance of Kalshi's market prediction compare to consensus during "shock" events?

A: During moderate shock events (0.1-0.2 percentage point error), Kalshi's MAE was 50% lower than consensus a week before the data release, widening to 56.2% the day before. For major shocks (>0.2 percentage points), the MAE was 50% lower a week before and 60% or more lower the day before.

Q: What is the "Predictive Signal" that arises when the market and consensus forecasts diverge?

A: When the market prediction and consensus forecast differed by more than 0.1 percentage points, the probability of a shock event occurring was about 81.2%. In these cases of disagreement, the market prediction was more accurate 75% of the time.

Q: What are the proposed reasons (mechanisms) for why prediction markets outperform consensus during shocks?

A: Three complementary mechanisms are proposed: 1) participant heterogeneity and the "wisdom of crowds" aggregating diverse information; 2) differing incentive structures that directly align accuracy with profit and loss in markets, versus herding behavior in consensus forecasts; and 3) superior information aggregation efficiency, even with the same public information.

Q: What key macroeconomic indicator was the focus of this comparative study?

A: The study focused on comparing the accuracy of predicting the year-over-year (YoY) Consumer Price Index (CPI), a core indicator for future interest rate decisions and overall economic health.
