OpenClaw Token Saving Ultimate Guide: Use the Strongest Model, Spend the Least Money / Includes Prompts

marsbit, published 2026-02-11, last updated 2026-02-11

Abstract

This guide provides strategies to reduce OpenClaw token usage by 60-85% when using expensive models like Claude Opus. The main costs come not just from your input and the model's output, but from hidden overhead: a fixed System Prompt (~3000-5000 tokens), injected context files like AGENTS.md and MEMORY.md (~3000-14000 tokens), and conversation history. Key strategies include:

1. Model Tiering: Use the cheaper Claude Sonnet for 80% of daily tasks (chat, simple Q&A, cron jobs) and reserve Opus for complex tasks like writing and deep analysis.
2. Context Slimming: Drastically reduce the token count in injected files (AGENTS.md, SOUL.md, MEMORY.md) and remove unnecessary files from `workspaceFiles`.
3. Cron Optimization: Lower the frequency, merge tasks, and downgrade non-critical cron jobs to Sonnet. Configure deliveries for notifications only when necessary.
4. Heartbeat Tuning: Increase the interval (e.g., 45-60 minutes), set a silent period overnight, and slim down the HEARTBEAT.md file.
5. Precise Retrieval with QMD: Implement the local, zero-cost qmd tool for semantic search. This allows the agent to retrieve only specific relevant paragraphs from documents instead of reading entire files, saving up to 90% of tokens per query.
6. Memory Search Selection: For small memory files, use local embedding; for larger or multi-language needs, consider Voyage AI's free tier.

By implementing these changes, including model switching, context reduction, and smarter...

Author: xiyu

Want to use Claude Opus 4.6 but don't want the bill to explode at the end of the month? This guide will help you cut 60-85% of the cost.

1. Where do tokens go?

You might think tokens are just "what you say + what the AI replies". In practice, there is far more to it.

Hidden costs of each conversation:

System Prompt (~3000-5000 tokens): OpenClaw core instructions, cannot be changed
Context file injection (~3000-14000 tokens): AGENTS.md, SOUL.md, MEMORY.md, etc., included in every conversation – this is the biggest hidden cost
Message history: Gets longer the more you chat
Your input + AI output: This is what you thought was the "whole" thing

A simple "How's the weather today?" actually consumes 8000-15000 input tokens. At Opus rates, the context alone costs $0.12-0.22.

Cron is even worse: Each trigger = a brand new conversation = re-injecting all context. A cron running every 15 minutes, 96 times a day, costs $10-20 per day under Opus.

Heartbeat works on the same principle: each beat is essentially a conversation call, so the shorter the interval, the more money it burns.
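The arithmetic behind these figures can be sketched as follows. The rate of $15 per million input tokens is an assumption inferred from the numbers above (8000 tokens costing about $0.12), not a quoted price, and the function names are purely illustrative.

```python
# Assumed Opus input price, inferred from the article's figures
# (8000 tokens costing about $0.12). Not an official rate.
OPUS_INPUT_USD_PER_MILLION = 15.0


def context_cost(input_tokens: int,
                 usd_per_million: float = OPUS_INPUT_USD_PER_MILLION) -> float:
    """Cost in USD of sending `input_tokens` of context in a single call."""
    return input_tokens * usd_per_million / 1_000_000


def daily_cron_cost(interval_minutes: int, input_tokens: int) -> float:
    """Daily context cost of a cron job that opens a fresh conversation per trigger."""
    triggers_per_day = 24 * 60 // interval_minutes
    return triggers_per_day * context_cost(input_tokens)


if __name__ == "__main__":
    for tokens in (8_000, 15_000):
        print(f"{tokens} tokens: ${context_cost(tokens):.3f} per call, "
              f"${daily_cron_cost(15, tokens):.2f} per day at a 15-minute cron")
```

Under that assumed rate, a 15-minute cron lands at roughly $11.5-21.6 per day for context alone, which is in line with the $10-20 estimate above.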

2. Model Tiering: Sonnet for Daily, Opus for Critical

This is the first money-saving trick, and the one with the most dramatic effect. Sonnet costs about one fifth as much as Opus and is fully sufficient for 80% of daily tasks.

Prompt:

Please help me change OpenClaw's default model to Claude Sonnet,
and use Opus only when deep analysis or creative work is needed.

Specific needs:

1) Set the default model to Sonnet
2) Make cron tasks default to Sonnet
3) Specify Opus only for writing and deep-analysis tasks

Opus scenarios: long-form writing, complex code, multi-step reasoning, creative tasks

Sonnet scenarios: daily chat, simple Q&A, cron checks, heartbeat, file operations, translation

In practice: after switching, the monthly cost dropped 65%, with almost no difference in experience.
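A quick sanity check on that saving, as a minimal sketch: it assumes every task consumes the same number of tokens regardless of model, which real workloads will not match exactly. The 80% share and the 1/5 price ratio come from the text above; the function name is illustrative.

```python
def blended_cost_ratio(sonnet_share: float, sonnet_price_ratio: float) -> float:
    """Cost after tiering, relative to running everything on Opus (1.0 = no change).

    sonnet_share:       fraction of tasks routed to Sonnet (0.0 to 1.0)
    sonnet_price_ratio: Sonnet price divided by Opus price
    """
    opus_share = 1.0 - sonnet_share
    return opus_share * 1.0 + sonnet_share * sonnet_price_ratio


if __name__ == "__main__":
    ratio = blended_cost_ratio(sonnet_share=0.8, sonnet_price_ratio=0.2)
    print(f"relative cost: {ratio:.2f}, saving: {1 - ratio:.0%}")
```

With 80% of tasks on a model priced at one fifth, the blended cost is about 36% of the all-Opus bill, a saving close to the 65% drop reported above.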

3. Context Slimming: Cut the Hidden Token Hogs

The "background noise" per call can be 3000-14000 tokens. Streamlining the injected files is the optimization with the best return for the effort.

Prompt:

Help me streamline OpenClaw's context files to save tokens. Specifically:

1) Delete unnecessary parts of AGENTS.md (group chat rules, TTS, unused features) and compress it to within 800 tokens
2) Simplify SOUL.md to concise key points, 300-500 tokens
3) Clean up expired information in MEMORY.md and keep it within 2000 tokens
4) Check the workspaceFiles configuration and remove unnecessary injected files

Rule of thumb: at 100 Opus calls per day, every 1000 tokens removed from the injected context saves about $45 per month.
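The rule of thumb can be reproduced with a few lines. This is a sketch under two assumptions that the article does not state outright: an Opus input rate of $15 per million tokens (inferred from the article's earlier per-call figures) and a 30-day month.

```python
def monthly_saving(tokens_removed: int, calls_per_day: int,
                   usd_per_million: float = 15.0, days: int = 30) -> float:
    """USD saved per month by trimming `tokens_removed` from the context of every call.

    usd_per_million and days are assumed defaults, not values from OpenClaw.
    """
    tokens_per_month = tokens_removed * calls_per_day * days
    return tokens_per_month * usd_per_million / 1_000_000


if __name__ == "__main__":
    saving = monthly_saving(tokens_removed=1_000, calls_per_day=100)
    print(f"${saving:.2f} saved per month")
```

The saving scales linearly, so the same function gives a quick estimate for any other trim size or call volume.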

4. Cron Optimization: The Most Hidden Cost Killer

Prompt:

Help me optimize OpenClaw's cron tasks to save tokens. Please:

1) List all cron tasks with their frequency and model
2) Downgrade all non-creative tasks to Sonnet
3) Merge tasks that run in the same time period (e.g., combine multiple checks into one)
4) Reduce unnecessarily high frequencies (system check from every 10 minutes to every 30 minutes, version check from 3 times/day to once a day)
5) Configure delivery to notify only on demand, with no message when everything is normal

Core principle: more frequent is not always better, and most "real-time" demands are false demands. Merging 5 independent checks into 1 call removes four of the five context injections, saving about 80% of that cost.
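To see how much a lower frequency helps, here is a minimal sketch that counts daily triggers before and after the changes named in the prompt. The schedule (one system check plus one version check) is a hypothetical example, not a real OpenClaw configuration.

```python
MINUTES_PER_DAY = 24 * 60


def triggers_per_day(interval_minutes: int) -> int:
    """Number of times a job with a fixed interval fires in 24 hours."""
    return MINUTES_PER_DAY // interval_minutes


def reduction(before: int, after: int) -> float:
    """Fraction of calls eliminated when going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical schedule mirroring the prompt: system check every 10 -> 30 minutes,
    # version check 3 -> 1 times per day.
    before = triggers_per_day(10) + 3
    after = triggers_per_day(30) + 1
    print(f"calls per day: {before} -> {after} ({reduction(before, after):.0%} fewer)")
```

Because every trigger re-injects the full context, the context cost falls in the same proportion as the number of calls.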

5. Heartbeat Optimization

Prompt:

Help me optimize OpenClaw's heartbeat configuration:

1) Set the interval during working hours to 45-60 minutes
2) Set 23:00-08:00 as a silent period
3) Streamline HEARTBEAT.md to the minimum number of lines
4) Merge scattered check tasks into the heartbeat for batch execution
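The effect of a longer interval plus a silent period can be estimated with a small sketch. It simply counts beats on a fixed grid starting at 00:00 and skips the silent hours; how OpenClaw actually schedules heartbeats is not described in this guide, so treat the model and the function names as assumptions.

```python
def is_silent(hour: int, silent_start: int = 23, silent_end: int = 8) -> bool:
    """True if `hour` (0-23) falls in the silent window, which may wrap past midnight."""
    if silent_start <= silent_end:
        return silent_start <= hour < silent_end
    return hour >= silent_start or hour < silent_end


def heartbeats_per_day(interval_minutes: int,
                       silent_start: int = 23, silent_end: int = 8) -> int:
    """Count heartbeats in one day, stepping from 00:00 and skipping silent hours."""
    return sum(
        1
        for minute in range(0, 24 * 60, interval_minutes)
        if not is_silent(minute // 60, silent_start, silent_end)
    )


if __name__ == "__main__":
    # Hypothetical baseline: a beat every 30 minutes with no silent period
    # (equal start and end hours mean the window is empty).
    print("baseline:", heartbeats_per_day(30, silent_start=0, silent_end=0))
    print("45 min + silence:", heartbeats_per_day(45))
    print("60 min + silence:", heartbeats_per_day(60))
```

Each beat that is skipped is one full context injection that never gets billed.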

6. Precise Retrieval: Use qmd to Save 90% of Input Tokens

When the agent looks up information, it defaults to "reading the full text": a 500-line file is 3000-5000 tokens, but the agent only needs about 10 lines of it. Over 90% of those input tokens are wasted.

qmd is a local semantic retrieval tool that builds a full-text + vector index, allowing the agent to pinpoint paragraphs instead of reading the entire file. Everything is computed locally, at zero API cost.

Pair it with mq (Mini Query) to preview the directory structure, extract specific paragraphs, and search by keyword, reading only the 10-30 lines needed each time.

Prompt:

Help me configure qmd knowledge base retrieval to save tokens.

GitHub address: https://github.com/tobi/qmd

Needs:

1) Install qmd
2) Build an index for the working directory
3) Add retrieval rules to AGENTS.md that force the agent to prefer qmd/mq search over reading full files directly
4) Set up scheduled index updates

Actual effect: Each information lookup dropped from 15000 tokens to 1500 tokens, a 90% reduction.

Difference from memorySearch: memorySearch manages "memories" (MEMORY.md), while qmd handles "looking up information" (a custom knowledge base). The two do not affect each other.
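To make the idea of paragraph-level retrieval concrete, here is a toy sketch. It is not qmd and does not use qmd's interface: it ranks paragraphs by plain keyword overlap, whereas qmd builds a full-text + vector index. The token estimate (about 4 characters per token) is also a rough assumption rather than a real tokenizer.

```python
import re


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (about 4 characters per token); not a tokenizer."""
    return max(1, len(text) // 4)


def top_paragraphs(document: str, query: str, k: int = 2) -> list[str]:
    """Return up to k paragraphs sharing the most words with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    scored = [
        (len(query_words & set(re.findall(r"\w+", p.lower()))), index, p)
        for index, p in enumerate(paragraphs)
    ]
    # Highest overlap first; ties keep document order; zero-overlap paragraphs are dropped.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [p for score, _, p in scored[:k] if score > 0]


if __name__ == "__main__":
    # Hypothetical knowledge-base file: 50 filler paragraphs and one relevant one.
    doc = "\n\n".join(
        [f"Section {i}: general notes about topic {i}." for i in range(50)]
        + ["Deployment: restart the gateway after editing the cron schedule."]
    )
    hits = top_paragraphs(doc, "how do I restart the gateway", k=1)
    print(hits)
    print(estimate_tokens(doc), "tokens for the full file vs",
          estimate_tokens("\n\n".join(hits)), "for the retrieved paragraph")
```

Even this crude ranking shows where the saving comes from: the agent pays for one matching paragraph instead of the whole file.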

7. Memory Search Choice

Prompt:

Help me configure OpenClaw's memorySearch. If I don't have many memory files (a few dozen .md files), do you recommend local embedding or Voyage AI? Please explain the differences in cost and retrieval quality.

Simple conclusion: with few memory files, use local embedding (zero cost); with strong multilingual needs or many files, use Voyage AI (200 million free tokens per account).

8. Ultimate Configuration Checklist

Prompt:

Please help me optimize my OpenClaw configuration in one pass to save as many tokens as possible, following this checklist:

1) Change the default model to Sonnet; reserve Opus for creative/analysis tasks
2) Streamline AGENTS.md / SOUL.md / MEMORY.md
3) Downgrade all cron tasks to Sonnet, merge them, and reduce their frequency
4) Set the heartbeat interval to 45 minutes with nighttime silence
5) Configure qmd precise retrieval to replace full-text reading
6) Keep only necessary files in workspaceFiles
7) Streamline memory files regularly and keep MEMORY.md within 2000 tokens

Configure once, benefit long-term:

1. Model Tiering: Sonnet for daily use, Opus for critical tasks, saving 60-80%

2. Context Slimming: streamlined files plus qmd precise retrieval, saving 30-90% of input tokens

3. Fewer Calls: merged cron jobs, longer heartbeat intervals, and a silent period

Sonnet 4 is already very strong; you won't feel the difference in daily use. Switch to Opus only when you really need it.

Based on hands-on experience with a multi-agent system; the figures are anonymized estimates.


Related questions

Q: What are the main hidden costs of token usage in OpenClaw according to the article?

A: The main hidden costs include the System Prompt (~3000-5000 tokens), context file injections like AGENTS.md, SOUL.md, and MEMORY.md (~3000-14000 tokens), and the accumulation of historical messages in conversations.

Q: What is the primary strategy recommended for reducing costs with model selection?

A: The primary strategy is model tiering: using Claude Sonnet for daily tasks and reserving Claude Opus only for critical tasks like deep analysis or creative work, as Sonnet is about 1/5 the cost of Opus.

Q: How does using qmd help in reducing token consumption?

A: qmd is a local semantic retrieval tool that builds a full-text + vector index for precise paragraph retrieval instead of reading entire files, reducing input tokens by up to 90% for lookups, as it only fetches the needed 10-30 lines.

Q: What optimizations are suggested for cron tasks to save tokens?

A: Optimizations include downgrading non-creative tasks to Sonnet, merging multiple tasks into single calls, reducing unnecessarily high frequencies (e.g., from every 10 to every 30 minutes), and configuring delivery to notify on demand so that no message is sent when everything is normal.

Q: What is the recommended approach for heartbeat configuration to minimize costs?

A: Set heartbeat intervals to 45-60 minutes during work hours, implement a silent period from 23:00 to 08:00, streamline HEARTBEAT.md to minimal lines, and consolidate scattered check tasks into batch executions within the heartbeat.
