Your Backtest Is Lying: Why You Must Use Point-in-Time Data

insights.glassnode · Published 2026-03-13 · Last updated 2026-03-13

Introduction

This article warns against a common pitfall in backtesting trading strategies: look-ahead bias caused by using revised historical data. It illustrates this with a hypothetical Bitcoin strategy based on exchange outflows from Binance. The strategy is built on the premise that sustained outflows (when the 5-day moving average of BTC balance falls below the 14-day average) are bullish, while inflows signal a sell-off. An initial backtest using standard, revised data shows the strategy performing comparably to a simple buy-and-hold approach. However, the author argues these results are misleading because the data has been updated with information that wasn't available in real-time. This data mutation creates an unfair advantage in the backtest. To demonstrate, the test is rerun using Point-in-Time (PiT) data—an immutable, append-only record that reflects only what was known on any given day. The results are significantly worse, as the PiT-based strategy misses key profitable moves. The key takeaway is that accurate backtesting requires immutable Point-in-Time data to avoid look-ahead bias and replay history honestly.

Let's build a simple, hypothetical trading strategy. The premise is straightforward and rooted in a widely discussed narrative: when coins leave exchanges, it tends to be bullish. The reasoning is intuitive: coins moving off exchanges typically signal that holders are withdrawing to self-custody, reducing the available supply for selling. Conversely, coins flowing onto exchanges may indicate that holders are preparing to sell.

A single day of outflows, however, is just noise. To identify a genuine trend, we would apply a moving average crossover on the exchange balance. When the short-term average falls below the long-term average, it confirms that coins have been leaving exchanges consistently, as a sustained pattern rather than as isolated events.

Using Glassnode's exchange balance for Binance, we define the following:

  • Enter the market when the 5-day moving average of Binance's BTC balance falls below its 14-day moving average, signaling a sustained outflow trend.
  • Exit the market when the 5-day average rises back above the 14-day average, signaling that the outflow trend has reversed and coins are returning to the exchange.
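The crossover rule above can be sketched in a few lines of pandas. This is a minimal illustration, not Glassnode's code; the function name and defaults are our own:

```python
import pandas as pd

def outflow_signal(balance: pd.Series, fast: int = 5, slow: int = 14) -> pd.Series:
    """Return 1 while the fast MA of the exchange balance sits below the
    slow MA (sustained outflows), else 0."""
    fast_ma = balance.rolling(fast).mean()
    slow_ma = balance.rolling(slow).mean()
    # Comparisons involving the initial NaN window evaluate to False -> 0,
    # so the signal stays flat until both averages are defined.
    return (fast_ma < slow_ma).astype(int)
```

Applied to a daily series of Binance's BTC balance, this produces the binary in/out signal used throughout the article.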

We then benchmark this strategy against simply holding BTC over the same period, from January 1, 2024 through March 9, 2026, with an initial capital of $1,000 and 0.1% trading fees applied to each trade.
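A vectorized backtest over these parameters can be sketched as follows. This is a simplified model (daily closes, fees charged once per entry or exit, no slippage), not the exact methodology behind the charts:

```python
import pandas as pd

def backtest(price: pd.Series, signal: pd.Series,
             capital: float = 1_000.0, fee: float = 0.001) -> pd.Series:
    """Equity curve of a long/flat strategy with a per-trade fee."""
    pos = signal.shift(1).fillna(0)            # act on yesterday's signal
    ret = price.pct_change().fillna(0)
    strat_ret = pos * ret
    trades = pos.diff().abs().fillna(pos.iloc[0])  # 1 on each entry/exit
    strat_ret = strat_ret - trades * fee           # deduct 0.1% per trade
    return capital * (1 + strat_ret).cumprod()
```

The buy-and-hold benchmark is simply `capital * price / price.iloc[0]`, less one entry fee.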

This is a simplified trading strategy, designed primarily for illustrative purposes. It is not investment advice, nor is it meant to suggest that exchange balances are a robust foundation for a trading system.
Access live chart

Here's how to read this chart:

🟫 The brown line at the bottom is the binary trading signal, toggling between in the market (1) and out of the market (0).

🟦 The blue line tracks the strategy's portfolio value over time.

🟩 The green line is the buy-and-hold portfolio benchmark.

We can observe that the exchange balance strategy performed reasonably well, although at times the buy-and-hold strategy outperformed it. In the final days of the research period, however, the exchange balance strategy caught up. While some investors may find the combination of reduced volatility and an ultimately comparable performance to buy-and-hold appealing, the final numbers are misleading – and here’s why.

The Problem: Data Mutation and Look-Ahead Bias

Metrics are not static. Many are retroactively revised as new information becomes available. This is particularly true for metrics that depend on address clustering or entity labeling, such as on-chain exchange balances. However, it is also the case for metrics such as trading volume or price, as individual exchanges can occasionally submit their data with slight delays.

This means that a value you see today for, say, January 15, 2024, may not be the same value that was published on January 15, 2024. The data has been revised with hindsight. When you backtest a strategy on this revised data, you are implicitly using information that was not available at the time the trading decisions would have been made. This introduces a look-ahead bias.
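To make this concrete, here is a small hypothetical example in which a retroactive revision of recent balances flips the crossover signal. The numbers are invented purely to show the mechanism:

```python
import pandas as pd

# Hypothetical balances for the same dates: values seen in real time
# versus values after a retroactive entity-labeling revision.
as_published = pd.Series([1000.0] * 14 + [998.0, 996.0, 994.0, 992.0, 990.0])
revised = as_published.copy()
revised.iloc[-5:] += 6.0  # revision lifts the five most recent balances

def crossed(balance: pd.Series) -> bool:
    """True when the 5-day MA is below the 14-day MA on the last day."""
    return bool(balance.rolling(5).mean().iloc[-1]
                < balance.rolling(14).mean().iloc[-1])

# The real-time series fires the entry signal; the revised series does not.
# A backtest on revised data would therefore replay a different trade history.
```

The revision only shifted a handful of points by a fraction of a percent, yet the strategy's decision on that day changes.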

The Honest Backtest: Using Point-in-Time Data

Let's therefore repeat the exact same backtest – same signal logic, same parameters, same dates, same fees – but this time using the Point-in-Time (PiT) variant of the Exchange Balance metric, available in Glassnode Studio.

PiT metrics are strictly append-only and immutable. Each historical data point reflects only the information that was known at the time it was first computed. No retroactive revisions, no look-ahead bias.
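The append-only contract can be illustrated with a toy store keyed by both the observation date and the date a value became known ("as-of"). This is a hypothetical sketch of the concept, not Glassnode's implementation:

```python
from datetime import date
from typing import Optional

class PointInTimeStore:
    """Append-only store: each (observation_date, as_of_date) pair is written
    once and never mutated. Revisions arrive as new rows with a later as-of
    date, so a backtest can query exactly what was known on any given day."""

    def __init__(self) -> None:
        self._records: dict[tuple[date, date], float] = {}

    def append(self, obs_date: date, as_of: date, value: float) -> None:
        key = (obs_date, as_of)
        if key in self._records:
            raise ValueError("PiT records are immutable; publish a new as-of row")
        self._records[key] = value

    def as_known_on(self, obs_date: date, as_of: date) -> Optional[float]:
        """Latest value for obs_date using only vintages visible on as_of."""
        vintages = [(a, v) for (o, a), v in self._records.items()
                    if o == obs_date and a <= as_of]
        return max(vintages)[1] if vintages else None
```

Querying with `as_of` equal to the simulated trading day is what removes the look-ahead bias: the backtest never sees a revision before it was actually published.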

While we are using the same metric, the strategy now produces significantly different results, as illustrated by the purple line in the new chart below. The overall performance is notably worse.

Although both strategies behave similarly for much of 2024, we observe that the PiT-based version fails to capture the strong upticks in November 2024 and March 2025 as effectively. As a result, the cumulative performance diverges meaningfully and ends up considerably lower.

Access live chart

Key Takeaway

In this example, the purple strategy, which only has access to information as it was available at the time, performs noticeably worse. Backtests will lie if fed wrong or revised data. Only immutable, Point-in-Time metrics ensure you’re replaying history as it actually happened.

Related Questions

Q: What is the main problem with using revised data for backtesting a trading strategy?

A: The main problem is that it introduces look-ahead bias, as the revised data includes information that was not available at the time the trading decisions would have been made.

Q: How does the Point-in-Time (PiT) data differ from the standard exchange balance metric?

A: Point-in-Time data is strictly append-only and immutable, meaning each historical data point reflects only the information known at the time it was first computed, with no retroactive revisions.

Q: What was the trading signal used in the hypothetical strategy based on exchange balances?

A: The strategy entered the market when the 5-day moving average of Binance's BTC balance fell below its 14-day moving average, and exited when the 5-day average rose back above the 14-day average.

Q: Why did the backtest using Point-in-Time data perform worse than the one using revised data?

A: The PiT-based strategy failed to capture strong market upticks as effectively because it only had access to information available in real-time, without the benefit of hindsight revisions.

Q: What is the key takeaway from the article regarding backtesting and data quality?

A: Backtests will produce misleading results if fed with revised data; only immutable, Point-in-Time metrics ensure an accurate replay of history as it actually happened.
