Your Backtest Is Lying: Why You Must Use Point-in-Time Data

insights.glassnode | Published 2026-03-13 | Updated 2026-03-13

Introduction

The article warns that backtests run on revised historical data are misleading because of look-ahead bias. It illustrates this with a hypothetical trading strategy based on Bitcoin outflows from Binance: enter the market when the 5-day moving average of the BTC balance falls below the 14-day average, and exit on the reverse crossover. Initial backtest results using current data appeared reasonably successful and ended up comparable to buy-and-hold. However, the author argues this is deceptive: metrics like exchange balances are often revised retroactively as new data arrives, so the backtest uses information that was not available in real time. Re-running the identical strategy on Point-in-Time (PiT) data, which is immutable and reflects only what was known when each value was first published, yielded significantly worse performance. The PiT-based strategy captured key market upswings less effectively, resulting in meaningfully lower cumulative returns. The key takeaway is that only immutable, Point-in-Time data lets a backtest replay history without look-ahead bias; revised data produces misleading results.

Let's build a simple, hypothetical trading strategy. The premise is straightforward and rooted in a widely discussed narrative: when coins leave exchanges, it tends to be bullish. The reasoning is intuitive: coins moving off exchanges typically signal that holders are withdrawing to self-custody, reducing the available supply for selling. Conversely, coins flowing onto exchanges may indicate that holders are preparing to sell.

A single day of outflows, however, is just noise. To identify a genuine trend, we would apply a moving average crossover on the exchange balance. When the short-term average falls below the long-term average, it confirms that coins have been leaving exchanges consistently, as a sustained pattern, rather than isolated events.

Using Glassnode's exchange balance for Binance, we define the following:

  • Enter the market when the 5-day moving average of Binance's BTC balance falls below its 14-day moving average, signaling a sustained outflow trend.
  • Exit the market when the 5-day average rises back above the 14-day average, signaling that the outflow trend has reversed and coins are returning to the exchange.
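The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the article's actual implementation: the function names `moving_average` and `crossover_signal` are hypothetical, the averages are assumed to be simple trailing averages, and the signal is assumed to stay at 0 until 14 days of data exist.

```python
from typing import List, Optional


def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until `window` values are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_signal(balance: List[float], short: int = 5, long: int = 14) -> List[int]:
    """1 = in the market (short MA below long MA), 0 = out of the market."""
    fast = moving_average(balance, short)
    slow = moving_average(balance, long)
    return [
        1 if f is not None and s is not None and f < s else 0
        for f, s in zip(fast, slow)
    ]


# Illustrative input only: a balance that falls by one unit per day.
falling_balance = [float(100 - i) for i in range(20)]
signal = crossover_signal(falling_balance)
```

With a steadily falling balance, the signal switches to 1 as soon as the 14-day average can be computed, which matches the "sustained outflow" reading of the entry rule.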

We then benchmark this strategy against simply holding BTC over the same period, from January 1, 2024 through March 9, 2026, with an initial capital of $1,000 and a 0.1% trading fee applied to each trade.
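A rough sketch of such a backtest follows. The article does not spell out its execution rules, so several details here are assumptions: the position held from one day to the next is set by the previous day's signal, the portfolio is either fully in BTC or fully in cash, and the 0.1% fee is charged on the whole portfolio value at every entry and exit. The function names `backtest` and `buy_and_hold` are hypothetical.

```python
from typing import List


def backtest(prices: List[float], signal: List[int],
             capital: float = 1000.0, fee: float = 0.001) -> List[float]:
    """Daily portfolio value of a long-or-cash strategy driven by a 0/1 signal."""
    equity = [capital]
    position = 0  # 0 = cash, 1 = fully invested in BTC
    for t in range(1, len(prices)):
        target = signal[t - 1]  # assumption: act on the previous day's signal
        value = equity[-1]
        if target != position:
            value *= 1 - fee  # assumption: fee on full portfolio value per trade
            position = target
        if position == 1:
            value *= prices[t] / prices[t - 1]  # earn the daily BTC return
        equity.append(value)
    return equity


def buy_and_hold(prices: List[float], capital: float = 1000.0,
                 fee: float = 0.001) -> List[float]:
    """Benchmark: buy once on the first day (paying one fee) and hold."""
    units = capital * (1 - fee) / prices[0]
    return [units * p for p in prices]


# Illustrative prices only, not real BTC data.
prices = [100.0, 110.0, 121.0]
always_in = backtest(prices, [1, 1, 1])
always_out = backtest(prices, [0, 0, 0])
benchmark = buy_and_hold(prices)
```

When the signal is always 1, the strategy pays a single entry fee and then tracks the benchmark; when it is always 0, the portfolio stays at the initial $1,000.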

This is a simplified trading strategy, designed primarily for illustrative purposes. It is not investment advice, nor is it meant to suggest that exchange balances are a robust foundation for a trading system.
Access live chart

Here's how to read this chart:

🟫 The brown line at the bottom is the binary trading signal, toggling between in the market (1) and out of the market (0).

🟦 The blue line tracks the strategy's portfolio value over time.

🟩 The green line is the buy-and-hold portfolio benchmark.

We can observe that the exchange balance strategy performed reasonably well, although at times the buy-and-hold strategy outperformed it. In the final days of the research period, however, the exchange balance strategy caught up. While some investors may find the combination of reduced volatility and an ultimately comparable performance to buy-and-hold appealing, the final numbers are misleading – and here’s why.

The Problem: Data Mutation and Look-Ahead Bias

Metrics are not static. Many are retroactively revised as new information becomes available. This is particularly true for metrics that depend on address clustering or entity labeling, such as on-chain exchange balances. It also applies to metrics such as trading volume or price, because individual exchanges occasionally submit their data with slight delays.

This means that a value you see today for, say, January 15, 2024, may not be the same value that was published on January 15, 2024. The data has been revised with hindsight. When you backtest a strategy on this revised data, you are implicitly using information that was not available at the time the trading decisions would have been made. This introduces a look-ahead bias.
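One way to picture this is to store every published version of a value together with the date it became visible, and then ask what was known as of a given day. The sketch below is purely illustrative: the `Observation` record, the `value_as_of` helper, and the numbers are all hypothetical and are not real Binance balances or a Glassnode API.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Observation(NamedTuple):
    observed: date   # the day the value describes
    published: date  # the day this version of the value became visible
    value: float


def value_as_of(history: List[Observation], observed: date,
                known_at: date) -> Optional[float]:
    """Latest version of `observed`'s value published on or before `known_at`."""
    versions = [
        o for o in history
        if o.observed == observed and o.published <= known_at
    ]
    if not versions:
        return None  # nothing had been published yet
    return max(versions, key=lambda o: o.published).value


# Made-up numbers for illustration only.
history = [
    Observation(date(2024, 1, 15), date(2024, 1, 15), 500_000.0),  # first print
    Observation(date(2024, 1, 15), date(2024, 3, 1), 480_000.0),   # later revision
]

seen_live = value_as_of(history, date(2024, 1, 15), known_at=date(2024, 1, 15))
seen_today = value_as_of(history, date(2024, 1, 15), known_at=date(2026, 3, 9))
```

A backtest that reads `seen_today` for January 15, 2024 is trading on the revised figure; a live strategy on that day could only have seen `seen_live`.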

The Honest Backtest: Using Point-in-Time Data

Let's therefore repeat the exact same backtest – same signal logic, same parameters, same dates, same fees – but this time using the Point-in-Time (PiT) variant of the Exchange Balance metric, available in Glassnode Studio.

PiT metrics are strictly append-only and immutable. Each historical data point reflects only the information that was known at the time it was first computed. No retroactive revisions, no look-ahead bias.
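The append-only property can be illustrated with a small container that accepts new data points but refuses to touch existing ones. This is a conceptual sketch only; the class name `AppendOnlySeries` is hypothetical and says nothing about how Glassnode stores PiT metrics internally.

```python
from datetime import date
from typing import List, Tuple


class AppendOnlySeries:
    """Time series that can grow at the end but never rewrite its history."""

    def __init__(self) -> None:
        self._points: List[Tuple[date, float]] = []

    def append(self, day: date, value: float) -> None:
        # Reject anything that would overwrite or backfill an earlier day.
        if self._points and day <= self._points[-1][0]:
            raise ValueError(
                f"{day} is not after the last stored day; history is immutable"
            )
        self._points.append((day, value))

    def values(self) -> List[float]:
        return [v for _, v in self._points]


# Made-up values for illustration only.
series = AppendOnlySeries()
series.append(date(2024, 1, 1), 1.0)
series.append(date(2024, 1, 2), 2.0)
```

Because earlier points can never change, a backtest run on such a series today sees exactly the values a live strategy would have seen on each day.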

While we are using the same metric, the strategy now produces significantly different results, as illustrated by the purple line in the new chart below. The overall performance is notably worse.

Although both strategies behave similarly for much of 2024, we observe that the PiT-based version fails to capture the strong upticks in November 2024 and March 2025 as effectively. As a result, the cumulative performance diverges meaningfully and ends up considerably lower.
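To locate where two such backtests part ways, one simple check is to list the days on which the revised-data signal and the PiT signal disagree. The helper below is a hypothetical sketch with made-up inputs; it does not reproduce the article's results.

```python
from typing import List


def disagreement_days(dates: List[str], signal_a: List[int],
                      signal_b: List[int]) -> List[str]:
    """Dates on which two 0/1 trading signals take different positions."""
    return [d for d, a, b in zip(dates, signal_a, signal_b) if a != b]


# Made-up signals for illustration only.
dates = ["2024-11-01", "2024-11-02", "2024-11-03", "2024-11-04"]
revised_signal = [0, 1, 1, 1]
pit_signal = [0, 0, 1, 1]

diverging = disagreement_days(dates, revised_signal, pit_signal)
```

In this toy example the revised-data signal enters one day earlier than the PiT signal, which is the kind of gap that compounds into different cumulative returns.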

Access live chart

Key Takeaway

In this example, the purple strategy, which only has access to information as it was available at the time, performs noticeably worse.

► Backtests will lie if fed with wrong or revised data. Only immutable, Point-in-Time metrics ensure you’re replaying history as it actually happened.

Related Questions

Q: What is the main premise of the hypothetical trading strategy described in the article?

A: The strategy is based on the narrative that when coins leave exchanges, it is a bullish signal, as it indicates holders are moving to self-custody and reducing available supply. It enters the market when the 5-day moving average of Binance's BTC balance falls below its 14-day average and exits when it rises back above.

Q: What is the primary problem identified with using standard historical data for backtesting?

A: The primary problem is data mutation and look-ahead bias. Many metrics are retroactively revised as new information becomes available, meaning a backtest uses data that was not known at the time of the original trading decision, leading to inaccurate and overly optimistic results.

Q: What is Point-in-Time (PiT) data and how does it differ from standard data?

A: Point-in-Time data is strictly append-only and immutable. Each historical data point reflects only the information that was known and published at that specific time, with no retroactive revisions. This prevents look-ahead bias in backtesting.

Q: How did the performance of the strategy change when backtested with Point-in-Time data?

A: The performance was significantly worse. The PiT-based strategy failed to capture strong market upticks as effectively, leading to a meaningfully lower cumulative performance compared to the backtest using revised data.

Q: What is the key takeaway from the article regarding backtesting and data?

A: The key takeaway is that backtests will produce misleading results if they use revised data. Only immutable, Point-in-Time metrics ensure an accurate historical replay of what was actually knowable at the time, preventing look-ahead bias.
