How an Anthropic Engineer Saved 300 Million Tokens in a Week: A Claude Code Caching Guide
Anthropic engineers reveal how prompt caching in Claude Code sharply reduces token consumption. Because cached tokens reuse already-processed context, they cost only 10% of regular input tokens. The author saved over 300 million tokens in a week; on a single day, 91 million cached tokens were billed as roughly 9 million.
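The billing figure above follows directly from the 10% rate. A minimal sketch of the arithmetic, assuming the flat 10% rate quoted in the article (the function name and the `uncached_tokens` parameter are illustrative, not part of any Anthropic tooling):

```python
# Assumption: cached input tokens are billed at 10% of the regular
# input-token rate, as stated in the article. Integer math avoids
# floating-point rounding.
CACHED_READ_PERCENT = 10

def billed_equivalent(cached_tokens: int, uncached_tokens: int = 0) -> int:
    """Return the number of regular input tokens the request is billed as."""
    return cached_tokens * CACHED_READ_PERCENT // 100 + uncached_tokens

# The article's single-day figure: 91 million cached tokens.
print(billed_equivalent(91_000_000))  # 9100000, i.e. roughly 9 million
```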
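Caching works via prefix matching across three layers, in order: system (instructions, tools), project (CLAUDE.md, rules), and conversation (chat history). As long as the beginning of a request matches cached content exactly, Claude reuses that portion instead of reprocessing it. Because the match runs from the start, a change in an earlier layer invalidates the cache for everything after it.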
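Prefix matching can be illustrated with a toy model. The sketch below treats a request as an ordered list of text blocks and counts how many leading blocks match what was cached; this is a simplification for illustration only (real caching operates on tokens, and the function name and sample blocks are hypothetical):

```python
def shared_prefix_len(cached: list[str], request: list[str]) -> int:
    """Count the leading blocks that are identical in both sequences."""
    n = 0
    for old, new in zip(cached, request):
        if old != new:
            break  # the first mismatch ends the reusable prefix
        n += 1
    return n

# Hypothetical blocks in layer order: system, project, conversation.
cached = ["system prompt", "CLAUDE.md", "user: hi", "assistant: hello"]

# Appending a new turn keeps the whole cached prefix reusable.
follow_up = cached + ["user: next question"]
print(shared_prefix_len(cached, follow_up))  # 4

# Changing the first layer leaves nothing to reuse.
edited = ["edited system prompt"] + cached[1:]
print(shared_prefix_len(cached, edited))  # 0
```

In this model, appending to the conversation is cheap, while editing anything near the start forces everything after it to be reprocessed.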
Key points for users:
- Claude Code's cache TTL (time to live) is 1 hour, versus 5 minutes for the API and sub-agents. Avoid pausing a session for longer than this, or the cache expires.
- Switching models (including enabling "Opus plan" mode) breaks the cache and forces a full reprocess.
- When switching tasks, do a clear session handoff instead of letting the old session expire.
- Place large documents in Projects, not directly in chat, for better caching.
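The first two points can be combined into a rough rule of thumb. A minimal sketch, assuming the TTL values quoted above and treating any model change as a cache miss (the function, the `client` labels, and the model names are hypothetical and do not correspond to a real API):

```python
# Assumed TTLs in seconds, taken from the figures quoted in the article.
TTL_SECONDS = {"claude_code": 60 * 60, "api": 5 * 60}

def cache_likely_reusable(idle_seconds: float, cached_model: str,
                          request_model: str,
                          client: str = "claude_code") -> bool:
    """Estimate whether a cached prefix would still be reused."""
    if cached_model != request_model:
        return False  # switching models breaks the cache
    return idle_seconds < TTL_SECONDS[client]

# A 20-minute pause is within the 1-hour TTL but past the 5-minute one.
print(cache_likely_reusable(20 * 60, "model-a", "model-a"))                # True
print(cache_likely_reusable(20 * 60, "model-a", "model-a", client="api"))  # False
# A model switch misses even after a short pause.
print(cache_likely_reusable(60, "model-a", "model-b"))                     # False
```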
High cache reuse benefits both users (longer sessions) and Anthropic (lower costs), and monitoring cache hit rates is considered crucial within Anthropic. By managing context as an asset and avoiding cache-breaking habits, users can make their Claude Code sessions more efficient and cost-effective.
marsbit 05/24 00:36