Author: xiyu
Want to use Claude Opus 4.6 without the bill exploding at the end of the month? This guide shows how to cut costs by 60-85%.
## 1. Where do tokens go?
You think tokens are just "what you say + what the AI replies"? Actually, it's far more than that.
Hidden costs of each conversation:
- **System prompt** (~3,000-5,000 tokens): OpenClaw's core instructions; cannot be changed
- **Context file injection** (~3,000-14,000 tokens): AGENTS.md, SOUL.md, MEMORY.md, etc., included in every conversation. This is the biggest hidden cost
- **Message history**: grows the longer you chat
- **Your input + the AI's output**: the part you thought was the whole bill
A simple "How's the weather today?" actually consumes 8,000-15,000 input tokens. At Opus pricing, the context alone costs $0.12-0.22.
Cron is even worse: each trigger is a brand-new conversation that re-injects all of that context. A cron job running every 15 minutes fires 96 times a day and costs $10-20 per day on Opus.
Heartbeat works on the same principle: each beat is essentially a conversation call, so the shorter the interval, the faster it burns money.
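The arithmetic behind these figures can be sketched in a few lines; the $15-per-million-input-tokens price is an assumption for Opus-class models, so check your provider's current rates:

```python
# Input-side cost model for the hidden context overhead described above.
OPUS_INPUT_PER_M = 15.0  # USD per 1M input tokens (assumed Opus-class price)

def call_cost(context_tokens: int, price_per_m: float = OPUS_INPUT_PER_M) -> float:
    """Input-side cost of one conversation call."""
    return context_tokens / 1_000_000 * price_per_m

# Even "How's the weather today?" pays for the full injected context:
print(f"${call_cost(8_000):.2f} to ${call_cost(15_000):.2f} per call")  # ~$0.12 to ~$0.22

# A cron job every 15 minutes = 96 fresh conversations per day:
print(f"~${96 * call_cost(10_000):.2f} per day")  # ~$14.40, inside the $10-20 range
```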
## 2. Model Tiering: Sonnet for Daily, Opus for Critical
The first money-saving trick, and the one with the most dramatic effect. Sonnet is priced at roughly 1/5 of Opus and is fully sufficient for 80% of daily tasks.
```markdown
Prompt:
Please help me change OpenClaw's default model to Claude Sonnet,
and only use Opus when deep analysis or creation is needed.
Specific needs:
1) Set the default model to Sonnet
2) Default cron tasks to Sonnet
3) Only specify Opus for writing and deep-analysis tasks
Opus scenarios: long-form writing, complex code, multi-step reasoning, creative tasks
Sonnet scenarios: daily chat, simple Q&A, cron checks, heartbeat, file operations, translation
```
Real-world result: after switching, monthly cost dropped 65%, with almost no noticeable difference in quality.
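A quick sanity check of that ~65% figure, under assumed list prices (Opus at ~$15 and Sonnet at ~$3 per million input tokens) and an 80/20 Sonnet/Opus split:

```python
# Back-of-envelope check of the tiering savings, with assumed list prices.
OPUS, SONNET = 15.0, 3.0  # USD per 1M input tokens (assumptions)

tokens = 30_000_000                                  # example monthly workload
all_opus = tokens / 1e6 * OPUS
tiered = tokens / 1e6 * (0.8 * SONNET + 0.2 * OPUS)  # 80% of calls on Sonnet

saving = 1 - tiered / all_opus
print(f"{saving:.0%}")  # 64% under these assumptions, in line with the ~65% observed
```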
## 3. Context Slimming: Cut the Hidden Token Hogs
The per-call "background noise" can be 3,000-14,000 tokens. Streamlining the injected files is the optimization with the best effort-to-savings ratio.
```markdown
Prompt:
Help me streamline OpenClaw's context files to save tokens. Specifically:
1) Delete unnecessary parts of AGENTS.md (group-chat rules, TTS, unused features); compress to within 800 tokens
2) Simplify SOUL.md to concise key points, 300-500 tokens
3) Clean expired information out of MEMORY.md; keep it within 2,000 tokens
4) Check the workspaceFiles configuration and remove unnecessary injected files
```
Rule of thumb: every 1,000 tokens trimmed from injection, at 100 Opus calls per day, saves about $45 per month.
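That figure checks out arithmetically (again assuming ~$15 per million Opus input tokens):

```python
# 1,000 fewer injected tokens x 100 calls/day x 30 days, at assumed Opus pricing.
OPUS_INPUT_PER_M = 15.0  # USD per 1M input tokens (assumed)
monthly_saving = 1_000 * 100 * 30 / 1_000_000 * OPUS_INPUT_PER_M
print(f"${monthly_saving:.0f}/month")  # $45/month
```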
## 4. Cron Optimization: The Most Hidden Cost Killer
```markdown
Prompt:
Help me optimize OpenClaw's cron tasks to save tokens. Please:
1) List all cron tasks with their frequency and model
2) Downgrade all non-creative tasks to Sonnet
3) Merge tasks that run in the same time window (e.g., combine multiple checks into one)
4) Reduce unnecessarily high frequencies (system check from every 10 minutes to every 30; version check from 3 times/day to once/day)
5) Configure delivery to notify only when needed; no message when everything is normal
```
Core principle: more frequent is not better; most "real-time" requirements are imagined. Merging 5 independent checks into 1 call cuts context-injection cost by about 75%.
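A rough way to see where the ~75% comes from, assuming a 10k-token injection per call and a somewhat longer prompt for the merged call:

```python
# Five separate checks each pay the context injection; one merged call pays it once.
injection = 10_000                 # assumed injected tokens per call
before = 5 * injection             # 50,000 injected tokens across 5 calls
after = injection + 2_000          # one call with a somewhat longer merged prompt
print(f"{1 - after / before:.0%} saved")  # 76% saved
```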
## 5. Heartbeat Optimization
```markdown
Prompt:
Help me optimize OpenClaw's heartbeat configuration:
1) Set the working-hours interval to 45-60 minutes
2) Make 23:00-08:00 a nighttime silent period
3) Trim HEARTBEAT.md to the minimum number of lines
4) Merge scattered check tasks into the heartbeat for batch execution
```
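These settings mostly pay off by reducing call count. A quick count, comparing against an assumed 15-minute baseline interval:

```python
# Calls per day before and after the heartbeat settings above.
baseline = 24 * 60 // 15          # 15-minute interval (assumed baseline): 96/day
active_hours = 24 - 9             # 23:00-08:00 silent leaves 15 active hours
tuned = active_hours * 60 // 60   # 60-minute interval: 15/day
print(f"{baseline} -> {tuned} heartbeats/day ({1 - tuned / baseline:.0%} fewer)")
```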
## 6. Precise Retrieval: Use qmd to Save 90% of Input Tokens
When the agent looks something up, it defaults to reading the full file: a 500-line file is 3,000-5,000 tokens, yet the agent may only need 10 lines of it. 90% of those input tokens are wasted.
qmd is a local semantic-retrieval tool that builds a full-text + vector index, letting the agent pinpoint relevant paragraphs instead of reading the whole file. Everything is computed locally, so there is zero API cost.
Use it with mq (Mini Query) to preview directory structure, extract specific paragraphs, and search by keyword, reading only the 10-30 lines actually needed each time.
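As a toy illustration of the principle (not qmd's actual API, which uses proper full-text and vector indexes), even a naive keyword scorer shows why returning chunks instead of whole files slashes input tokens:

```python
# Toy retrieval: score paragraphs by query-term frequency, return only the top k.
def top_chunks(text: str, query: str, k: int = 2) -> list[str]:
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    terms = query.lower().split()
    return sorted(chunks, key=lambda c: -sum(c.lower().count(t) for t in terms))[:k]

doc = "intro notes\n\ncron runs every 15 minutes\n\nheartbeat config details"
print(top_chunks(doc, "cron frequency")[0])  # the cron paragraph ranks first
```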
```markdown
Prompt:
Help me configure qmd knowledge-base retrieval to save tokens.
GitHub address: https://github.com/tobi/qmd
Needs:
1) Install qmd
2) Build an index for the working directory
3) Add retrieval rules to AGENTS.md forcing the agent to prefer qmd/mq search over reading full files
4) Set up scheduled index updates
```
Measured effect: each lookup dropped from ~15,000 tokens to ~1,500, a 90% reduction.
Difference from memorySearch: memorySearch manages "memories" (MEMORY.md), while qmd handles "looking things up" (a custom knowledge base); the two do not affect each other.
## 7. Memory Search Choice
```markdown
Prompt:
Help me configure OpenClaw's memorySearch.
If I don't have many memory files (a few dozen .md files),
should I use local embeddings or Voyage AI?
Please explain the cost and retrieval-quality differences of each.
```
Short conclusion: use local embeddings when you have few memory files (zero cost); use Voyage AI if you need strong multilingual retrieval or have many files (200 million free tokens per account).
## 8. Ultimate Configuration Checklist
```markdown
Prompt:
Please optimize my OpenClaw configuration in one pass for maximum token savings, following this checklist:
- Change the default model to Sonnet; reserve Opus for creative/analysis tasks
- Streamline AGENTS.md / SOUL.md / MEMORY.md
- Downgrade all cron tasks to Sonnet, merge them, and reduce their frequency
- Heartbeat interval of 45 minutes + nighttime silence
- Configure qmd precise retrieval to replace full-text reads
- Keep only necessary files in workspaceFiles
- Regularly trim memory files; keep MEMORY.md within 2,000 tokens
```
Configure once, benefit long-term:
1. Model Tiering — Sonnet daily, Opus critical, save 60-80%
2. Context Slimming — streamlined files + qmd precise retrieval, saving 30-90% of input tokens
3. Reduce Calls — Merge cron, extend heartbeat, enable silent period
Sonnet 4 is already very strong; in daily use you won't feel the difference. Just switch to Opus when you really need it.
Based on hands-on experience with multi-agent systems; figures are anonymized estimates.