How an Anthropic Engineer Saved 300 Million Tokens in a Week: A Claude Code Caching Guide

marsbitPublished on 2026-05-24Last updated on 2026-05-24

Abstract

Anthropic engineers reveal how prompt caching in Claude Code dramatically reduces token consumption. By reusing already-processed context, cached tokens cost only 10% of regular input tokens. The author saved over 300 million tokens in a week, with 91 million cached tokens in a single day counted as roughly 9 million for billing. Caching works via prefix matching across three layers: system (instructions, tools), project (CLAUDE.md, rules), and conversation (chat history). As long as the beginning of a request matches cached content, Claude reuses it instead of reprocessing. Key points for users: - Claude Code's cache TTL is 1 hour (vs. 5 minutes for API/Sub-agents). Avoid pausing a session beyond this. - Switching models (including enabling "Opus plan" mode) breaks cache, forcing a full reprocess. - For task switching, use a clear session handoff instead of letting an old session expire. - Place large documents in Projects, not directly in chat, for better caching. High cache reuse benefits both users (longer sessions) and Anthropic (lower costs). Monitoring cache hit rates is crucial internally. By managing context as an asset and avoiding cache-breaking habits, users can make their Claude Code sessions more efficient and cost-effective.

Editor's Note: When many people use Claude Code, the most intuitive feeling is that Token consumption is too fast, and long sessions can easily eat up the quota. But from the perspective of an Anthropic engineer, what truly affects cost is often not how much code you write, but whether the system consistently reuses context that has already been processed.

The core of this article is how to save Tokens through caching mechanisms. The author reused over 300 million Tokens through caching in one week, with a single-day cache hit reaching 91 million. Since the cost of a cached Token is only 10% of a regular input Token, this means 91 million cached Tokens are billed roughly equivalent to 9 million regular Tokens. The reason Claude Code long sessions seem more "durable" isn't because the model works for free, but because a large amount of repeated context is successfully reused.

The key to prompt caching is "don't break the cache." Claude Code caches system prompts, tool definitions, CLAUDE.md, project rules, and conversation history in layers; as long as the prefix of subsequent requests remains consistent, Claude can directly read from the cache instead of reprocessing the entire context. Anthropic internally also monitors prompt cache hit rates because it affects not only user quotas but also directly impacts model service costs and operational efficiency.

For ordinary users, you don't need to understand all the underlying details, just master a few key habits: don't let a session sit idle for more than 1 hour; perform a clean session handoff when switching tasks; avoid frequently switching models; put large documents into Projects instead of repeatedly pasting them into the conversation.

This article is less about a Token-saving trick and more about providing a Claude Code usage approach closer to an engineer's mindset: treat context as an asset to manage, let the cache be continuously reused, and make long sessions do less repetitive computation.

The following is the original text:

I saved 300 million Tokens this week, 91 million in a single day, over 300 million in a week.

I didn't change any settings. This is just prompt caching working normally in the background.

But after I truly understood what caching is and how to avoid "breaking" it, with the same usage quota, my sessions could last longer. So, here is a compiled 80/20 introductory guide to Claude Code prompt caching, without delving into deep API-level details.

TL;DR

Cached Tokens cost only 10% of regular input Tokens. 91 million cached Tokens are billed approximately equivalent to 9 million Tokens.

Claude Code subscription cache TTL is 1 hour; API default is 5 minutes; Sub-agent is always 5 minutes.

Caching is divided into three layers: system, project, and conversation.

Switching models mid-conversation breaks the cache, including enabling "opus plan" mode.

How is caching actually billed?

Every cached Token costs 10% of a regular input Token.

So, when my dashboard shows 91 million Tokens hitting cache on a particular day, the actual billing is roughly equivalent to processing only 9 million Tokens. This is also why, compared to having no cache, using Claude Code for long periods makes the session feel almost "free" to extend.

Two numbers on the dashboard are worth focusing on:

Cache create: The one-time cost incurred when writing content to the cache. It starts to take effect in the next round of conversation.
Cache read: Tokens Claude reuses from the cache, such as your CLAUDE.md, tool definitions, previous messages, etc. Cost is 10 times cheaper compared to reprocessing them as input.

If your Cache read number is high, it means you are effectively utilizing the cache; if this number is low, it means you are repeatedly paying for the same context.

Anthropic's Thariq once said something that left a deep impression on me: "We actually monitor prompt cache hit rates, and if the hit rate gets too low, it triggers an alert, even declaring a SEV-level incident."

He also wrote a great X article. When cache hit rates are high, four things happen simultaneously: Claude Code feels faster, Anthropic's service costs decrease, your subscription quota seems more durable, and long coding sessions become more realistic.

But if the hit rate is low, everyone loses.

So, the incentives are actually aligned: Anthropic wants your cache hit rate higher, and you yourself want the hit rate higher. What truly holds things back are some seemingly insignificant habits that quietly reset the cache.

How does the cache grow with each conversation turn?

Caching relies on prefix matching.

Without getting too deep into technical details, you just need to understand one thing: as long as the content before a certain position is completely identical to what's already cached, Claude can reuse those cached Tokens.

A brand new session typically unfolds like this:

According to Claude Code documentation, a fresh session usually runs like this:

First conversation turn: No cache exists yet. The system prompt, your project context (like CLAUDE.md, memory, rules), and your first message are all processed from scratch and written to the cache.

Second conversation turn: All content from the first turn is now cached. Claude only needs to process your new reply and the next message. This round is much cheaper.

Third conversation turn: Same logic. Previous conversation remains in the cache, only the latest round of interaction needs reprocessing.

The cache itself can be divided into three layers:

From Thariq's X article:

System layer: Includes base instructions, tool definitions (read, write, bash, grep, glob), and output style. This layer is cached globally.

Project layer: Includes CLAUDE.md, memory, project rules. This layer is cached per project.

Conversation layer: Includes replies and messages, growing with each conversation turn.

If anything in the system or project layer changes mid-session, everything must be recached from scratch. This is the most "expensive" operation. Imagine: you're already on the 16th message, then suddenly change the system prompt, or pause for an hour, all Tokens from message 1 onward need to be reprocessed.

The confusion between 1 hour and 5 minutes

This is the most easily misunderstood point.

Claude Code subscription: Default TTL is 1 hour.

Claude API: Default TTL is 5 minutes. You can pay a higher cost to increase it to 1 hour.
Sub-agent on any plan: Always 5 minutes.

Claude.ai web chat: Not officially documented. Likely same as subscription, but I haven't confirmed.

A few months ago, many people complained that Claude subscription quotas were being consumed too quickly. Some thought Anthropic had quietly reduced TTL from 1 hour to 5 minutes without notifying users. But that wasn't the case; Claude Code's TTL remains 1 hour.

The problem is, Claude Code and API documentation are kept separate, and these are two completely different things, leading to much confusion.

If you're running many Sub-agent workflows, or using the API directly, the 5-minute figure is important. But for 95% of Claude Code users, what really matters is that 1-hour window.

Three habits that cover 95% of users

The following are parts I find truly useful for daily use.

Don't pause too long

If you've been idle for more than an hour, previous content has mostly expired from the cache. Your next message will rebuild the cache. In such cases, instead of resuming an old session that has "gone cold," it's often cheaper to do a clean handoff and start a new session.

When switching tasks, just start fresh

/compact or /clear inherently break the cache, so it's better to use that moment for a true reset.

I made a session handoff skill to replace /compact. It summarizes what we've completed, what pending decisions remain, which files are most important, and where to continue next. Then I run /clear, paste this summary in, and can proceed as if nothing was interrupted.

The compact command sometimes runs slowly too. This handoff skill usually finishes in under a minute.

In Claude chat, put large documents into Projects

The caching mechanism on Claude.ai isn't officially documented in great detail, but Projects clearly use different optimizations compared to regular chat threads. So, if you need to paste large documents, it's better to put them in a Project rather than directly into the chat.

Which operations quietly break the cache?

A few things can completely reset the cache without obvious warning.

Switching models: Because caching relies on prefix matching, and each model has its own cache. Switching models means the next request will read the full history with no cache hits.

"Opus plan" mode: This setting uses Opus for planning and Sonnet for execution. I recommended it in some token optimization videos for a reason. But it's important to understand that each plan switch is essentially a model switch, meaning the cache must be rebuilt. In the long run, it still helps extend session quota, but you need to know what's happening under the hood.

Editing CLAUDE.md mid-session is okay: This change doesn't take effect immediately; it applies on the next restart. Therefore, the currently running cache isn't affected.

My free Token dashboard

The screenshot I showed earlier is from a token dashboard.

It's a simple GitHub repo. You give the link to Claude Code, have it deploy locally on localhost, and it will read all your past session records instead of starting statistics from scratch. You immediately see daily input, output, cache create, and cache read data.

One thing to note: this dashboard counts Token data on your local device. If you switch from desktop to laptop, the numbers won't match exactly. Each device has its own statistical view.

Summary

Prompt caching is something you can research deeply. Thariq's article covers it more completely than here; if you want the full picture, it's worth reading.

But you don't need to understand all the details to benefit. You just need to grasp the key 80/20: cached Tokens are 10 times cheaper than regular Tokens; Claude Code TTL is 1 hour; switching models breaks the cache; making a clean handoff between tasks is usually more cost-effective than forcing an old session back to life after it "expires."

Trending Cryptos

CitreaCTR

wrapped stUSDTWSTUSDT

Velodrome FinanceVELODROME

BrevisBREV

PancakeSwapCAKE

JUSTJST

Must-Watch Events Next Week｜CLARITY Act Could Face Senate Vote; SpaceX, Circle to Report Earnings (8.3-8.9)

**Summary: Key Events and Developments to Watch (August 3-9)** The upcoming week is marked by significant financial disclosures, key legislative deadlines, and notable product updates. **Major Financial Events:** Several companies are scheduled to release their Q2 2026 earnings. American Bitcoin (ABTC) will report on August 3, followed by SpaceX and Hut 8 Mining Corp. on August 4, and Circle on August 5. Notably, a significant portion of SpaceX shares (up to 12% of total shares) will be unlocked on August 6 following their earnings release. **Key Legislative Deadline:** The U.S. Senate faces an August 7 deadline to secure 60 votes for the CLARITY Act, a bipartisan bill aiming to establish a federal regulatory framework for cryptocurrencies. The Senate may hold a full vote on the bill during the week. **Economic Data:** The U.S. July Non-Farm Payrolls report will be released on August 7, providing crucial labor market data. **Technology & Product Updates:** * **Shutdowns:** DeFi portfolio tracker Zapper and wallet app Ctrl Wallet will cease operations on August 3. * **Upgrades:** LayerZero will deprecate its v1 relayers on August 3. XRP Ledger's new version 3.3.0, featuring five new functions, is expected next week. * **AI:** Elon Musk announced that the advanced Grok 4.6 AI model is set for release around August 7. * **Bitcoin:** The BIP-110 forced signaling for a potential Bitcoin network change is scheduled to begin around August 8. **Other Notable Events:** Chinese robotics firm Unitree Tech has set its preliminary price inquiry for its IPO for August 5. South Korean exchange Upbit will delist AQT and AERGO tokens on August 3.

marsbit36m ago

Must-Watch Events Next Week｜CLARITY Act Could Face Senate Vote; SpaceX, Circle to Report Earnings (8.3-8.9)

marsbit36m ago

Stocks Are Plummeting More Sharply Than Cryptocurrencies. Where Has the Money Gone?

Stock Markets Plunge Deeper Than Cryptocurrencies: Where Did the Money Go? In late July, Seoul's Kospi index triggered circuit breakers for two consecutive days, plummeting over 40% from its June high. The collapse was led by heavyweight stocks like SK Hynix, whose record profits still disappointed investors, and devastating leveraged ETFs, with one major product losing over 83% of its value. This signaled a global, forced deleveraging targeting the most crowded trades. Interestingly, while stocks exhibited extreme volatility akin to crypto markets, Bitcoin rose nearly 15% in July after a prior steep drop. Analysis shows the money fleeing equities did not flow into Bitcoin. Instead, Bitcoin had already absorbed its sell-off in May-June, when U.S. spot Bitcoin ETFs saw historic outflows. The true safe-haven beneficiary was gold, whose price rose over 20% year-on-year, highlighting a decoupling between Bitcoin and gold as "digital gold." The sell-off was a targeted unwinding of leveraged positions in tech and semiconductors, accelerated by broker-dealer risk management and shifts in the AI narrative, including new competition from Chinese memory chipmakers. The retreat path was clear: from high-valuation tech stocks to cash and U.S. Treasuries, then to gold. For Bitcoin to attract sustained institutional inflows, conditions like eased global liquidity pressure, a "soft-landing" Fed rate cut, and U.S. regulatory clarity via legislation like the stalled CLARITY Act are needed. Currently, Bitcoin is not a safe haven but an already-cleared asset. Its low correlation with tech stocks, however, makes it a potential diversification play for institutional portfolios once the storm passes. The money isn't here yet, but the positioning is underway.

marsbit36m ago

Stocks Are Plummeting More Sharply Than Cryptocurrencies. Where Has the Money Gone?

marsbit36m ago

In Conversation with Ray Dalio: We Are Currently in an AI Bubble, with 1% of My Portfolio in Bitcoin

Ray Dalio, founder of Bridgewater Associates, warns in an interview that the current AI boom shows classic bubble characteristics, which could lead to significant economic downturns as seen in past cycles like 1929 or 2000. He explains that speculative enthusiasm, fueled by debt and overvaluation, often precedes a crash when rising rates or taxation force asset sales, causing widespread losses and recession. Dalio also outlines his "Big Cycle" theory, describing an approximate 80-year pattern where widening wealth gaps, massive government deficits, and shifting geopolitical power (like China's rise) create internal conflict and global instability. He emphasizes that we are in a late-cycle, transitional phase where traditional powers like the US and UK face decline. For personal wealth protection, Dalio advises diversification beyond cash into assets like stocks, bonds, real estate, and particularly gold, which he prefers over Bitcoin. While he holds about 1% of his portfolio in Bitcoin as a non-printable hard asset, he views gold as more secure from technological or governmental threats. Regarding AI's impact, Dalio believes it will disproportionately benefit capital owners, worsening inequality by replacing both physical and cognitive labor. He suggests that human intuition and emotional intelligence, combined with AI, will be key for future workers. On taxation, Dalio argues that wealth taxes are impractical and risk triggering asset sell-offs, reducing productive investment. He points to the UK as a cautionary example of debt, low productivity, and political strife. Geopolitically, Dalio foresees a more regionalized world, with the US showing weakness in prolonged conflicts like with Iran, akin to past imperial declines. The ideal outcome, he suggests, is coexisting powerful blocs (e.g., Americas, China-Asia Pacific) without major war.

marsbit4h ago

In Conversation with Ray Dalio: We Are Currently in an AI Bubble, with 1% of My Portfolio in Bitcoin

marsbit4h ago

Daily 7.2 Trillion KRW: Foreign Capital's Record Net Buying on Friday! Wall Street Says Headwinds for Korean Stock Fund Flows Have Subsided

South Korean stock market sees a dramatic shift in fund flows. On July 31, foreign investors made a record net purchase of approximately KRW 7.2 trillion in KOSPI stocks, marking a fundamental reversal from the persistent large-scale net outflows seen in previous months. This contributed to a significant narrowing of foreign net selling in July to KRW 9.8 trillion, down sharply from KRW 48.4 trillion in June and KRW 44.5 trillion in May. Simultaneously, domestic institutional pressure eased. South Korean pension funds and asset managers turned to a net buying position in July, purchasing KRW 1.0 trillion worth of KOSPI shares, contrasting with net sales in May and June. Market volatility is expected to be dampened by new financial regulations. Effective July 31, the Financial Services Commission tightened access for retail investors to single-stock leveraged ETFs by raising the minimum cash deposit requirement. Trading volumes for these products subsequently dropped to about 50% of their monthly average. Citigroup Research maintains its year-end KOSPI target of 10,000 points. The firm cites several supportive factors: the substantial easing of headwinds from capital outflows, a robust fundamental outlook for the semiconductor sector, historically low market valuations, strong economic fundamentals, and the potential for policy support from financial authorities if needed.

marsbit4h ago

Daily 7.2 Trillion KRW: Foreign Capital's Record Net Buying on Friday! Wall Street Says Headwinds for Korean Stock Fund Flows Have Subsided

marsbit4h ago

Breaking! OpenAI's Next-Gen AI Solves 10 Fields Medal-Level Problems

OpenAI's next-generation AI model Astra achieves breakthroughs in 10 long-standing mathematical conjectures. The results, including constructing the first known infinite, finitely presented non-sofic group—resolving a major question since 1999—and advancing the high-dimensional sphere packing problem beyond a 46-year-old barrier, are detailed in a 249-page paper. Key proofs have been formally verified using Lean 4. The AI also refuted a rigidity conjecture by Fields Medalist Alain Connes. According to OpenAI, generating these proofs cost under $2,000. Experts describe the findings as potentially Fields Medal-worthy and a landmark moment for both mathematics and AI, showcasing the model's ability to produce profound, human-like reasoning across diverse fields like group theory, geometry, and operator algebras.

marsbit6h ago

Breaking! OpenAI's Next-Gen AI Solves 10 Fields Medal-Level Problems

marsbit6h ago

Trading

Spot

Hot Articles

Beoble: A Social App for Web3 People

Beoble is a communication infrastructure and ecosystem.

34.0k Total ViewsPublished 2024.03.13Updated 2024.03.13

How to Buy PEOPLE

Welcome to HTX.com! We've made purchasing ConstitutionDAO (PEOPLE) simple and convenient. Follow our step-by-step guide to embark on your crypto journey.Step 1: Create Your HTX AccountUse your email or phone number to sign up for a free account on HTX. Experience a hassle-free registration journey and unlock all features.Get My AccountStep 2: Go to Buy Crypto and Choose Your Payment MethodCredit/Debit Card: Use your Visa or Mastercard to buy ConstitutionDAO (PEOPLE) instantly.Balance: Use funds from your HTX account balance to trade seamlessly.Third Parties: We've added popular payment methods such as Google Pay and Apple Pay to enhance convenience.P2P: Trade directly with other users on HTX.Over-the-Counter (OTC): We offer tailor-made services and competitive exchange rates for traders.Step 3: Store Your ConstitutionDAO (PEOPLE)After purchasing your ConstitutionDAO (PEOPLE), store it in your HTX account. Alternatively, you can send it elsewhere via blockchain transfer or use it to trade other cryptocurrencies.Step 4: Trade ConstitutionDAO (PEOPLE)Easily trade ConstitutionDAO (PEOPLE) on HTX's spot market. Simply access your account, select your trading pair, execute your trades, and monitor in real-time. We offer a user-friendly experience for both beginners and seasoned traders.

7.8k Total ViewsPublished 2024.03.29Updated 2026.06.02

Discussions

Welcome to the HTX Community. Here, you can stay informed about the latest platform developments and gain access to professional market insights. Users' opinions on the price of PEOPLE (PEOPLE) are presented below.

How an Anthropic Engineer Saved 300 Million Tokens in a Week: A Claude Code Caching Guide

Abstract

TL;DR

How is caching actually billed?

How does the cache grow with each conversation turn?

The confusion between 1 hour and 5 minutes

Three habits that cover 95% of users

Don't pause too long

When switching tasks, just start fresh

In Claude chat, put large documents into Projects

Which operations quietly break the cache?

My free Token dashboard

Summary

Trending Cryptos

Related Questions

Related Reads

Must-Watch Events Next Week｜CLARITY Act Could Face Senate Vote; SpaceX, Circle to Report Earnings (8.3-8.9)

Stocks Are Plummeting More Sharply Than Cryptocurrencies. Where Has the Money Gone?

In Conversation with Ray Dalio: We Are Currently in an AI Bubble, with 1% of My Portfolio in Bitcoin

Daily 7.2 Trillion KRW: Foreign Capital's Record Net Buying on Friday! Wall Street Says Headwinds for Korean Stock Fund Flows Have Subsided

Breaking! OpenAI's Next-Gen AI Solves 10 Fields Medal-Level Problems

Trading

Hot Articles

Beoble: A Social App for Web3 People

How to Buy PEOPLE

Discussions

Top Questions