OpenClaw Token Saving Ultimate Guide: Use the Strongest Model, Spend the Least Money / Includes Prompts

marsbit · Published on 2026-02-11 · Last updated on 2026-02-11

Abstract

This guide provides strategies to reduce OpenClaw token usage by 60-85% when using expensive models like Claude Opus. The main costs come not just from your input and the model's output, but from hidden overhead: a fixed System Prompt (~3000-5000 tokens), injected context files like AGENTS.md and MEMORY.md (~3000-14000 tokens), and conversation history. Key strategies include:

1. **Model Tiering:** Use the cheaper Claude Sonnet for 80% of daily tasks (chat, simple Q&A, cron jobs) and reserve Opus for complex tasks like writing and deep analysis.

2. **Context Slimming:** Drastically reduce the token count in injected files (AGENTS.md, SOUL.md, MEMORY.md) and remove unnecessary files from `workspaceFiles`.

3. **Cron Optimization:** Lower the frequency, merge tasks, and downgrade non-critical cron jobs to Sonnet. Configure deliveries for notifications only when necessary.

4. **Heartbeat Tuning:** Increase the interval (e.g., 45-60 minutes), set a silent period overnight, and slim down the HEARTBEAT.md file.

5. **Precise Retrieval with QMD:** Implement the local, zero-cost qmd tool for semantic search. This allows the agent to retrieve only specific relevant paragraphs from documents instead of reading entire files, saving up to 90% of tokens per query.

6. **Memory Search Selection:** For small memory files, use local embedding; for larger or multi-language needs, consider Voyage AI's free tier.

By implementing these changes: model switching, context reduction, and smarter...

Author: xiyu

Want to use Claude Opus 4.6 but don't want the bill to explode at the end of the month? This guide will help you cut 60-85% of the cost.

1. Where do tokens go?

You think tokens are just "what you say + what the AI replies"? Actually, it's far more than that.

Hidden costs of each conversation:

  • System Prompt (~3000-5000 tokens): OpenClaw core instructions, cannot be changed

  • Context file injection (~3000-14000 tokens): AGENTS.md, SOUL.md, MEMORY.md, etc., included in every conversation – this is the biggest hidden cost

  • Message history: Gets longer the more you chat

  • Your input + AI output: This is what you thought was the "whole" thing

A simple "How's the weather today?" actually consumes 8000-15000 input tokens. Calculated with Opus, just the context costs $0.12-0.22.

Cron is even worse: Each trigger = a brand new conversation = re-injecting all context. A cron running every 15 minutes, 96 times a day, costs $10-20 per day under Opus.

Heartbeat works on the same principle: each heartbeat is essentially a conversation call, so the shorter the interval, the more money it burns.
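To make these figures concrete, here is a minimal sketch of the arithmetic. It assumes an Opus input price of $15 per million tokens, which is the rate the $0.12-0.22 range above implies; the constant and function names are illustrative, so substitute your provider's current pricing.

```python
# Assumption: about $15 per million input tokens, inferred from the article's
# "$0.12-0.22 for 8000-15000 tokens" figures. Replace with your actual rate.
OPUS_INPUT_USD_PER_MTOK = 15.0


def input_cost_usd(tokens: int, usd_per_mtok: float = OPUS_INPUT_USD_PER_MTOK) -> float:
    """Cost of the input side of a single call."""
    return tokens / 1_000_000 * usd_per_mtok


def daily_cron_cost_usd(interval_minutes: int, tokens_per_call: int) -> float:
    """Each cron trigger is a fresh conversation, so the full context is billed every time."""
    calls_per_day = 24 * 60 // interval_minutes
    return calls_per_day * input_cost_usd(tokens_per_call)


if __name__ == "__main__":
    for tokens in (8_000, 15_000):
        print(
            f"{tokens} input tokens: ${input_cost_usd(tokens):.3f} per call, "
            f"${daily_cron_cost_usd(15, tokens):.2f} per day for a 15-minute cron"
        )
```

Under these assumptions a 15-minute cron lands at roughly $11.5 to $21.6 per day on input tokens alone, in line with the range quoted above; output tokens come on top.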

2. Model Tiering: Sonnet for Daily, Opus for Critical

The first major money-saving trick, with the most dramatic effect. Sonnet is priced at about 1/5 of Opus, and is fully sufficient for 80% of daily tasks.

Prompt:

Please help me change OpenClaw's default model to Claude Sonnet,

and use Opus only when deep analysis or creative work is needed.

Specific needs:

1) Set default model to Sonnet

2) cron tasks default to Sonnet

3) Specify Opus only for writing and deep analysis tasks

Opus scenarios: Long-form writing, complex code, multi-step reasoning, creative tasks

Sonnet scenarios: Daily chat, simple Q&A, cron checks, heartbeat, file operations, translation

In practice: after switching, monthly cost dropped 65%, with almost no noticeable difference in experience.
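The tiering rule can be sketched as a small routing function. This is only an illustration: the task labels and the "opus"/"sonnet" strings are hypothetical placeholders, not OpenClaw configuration keys. The second function estimates the saving under the simplifying assumption that every task uses the same number of tokens.

```python
# Hypothetical task labels for illustration; not OpenClaw configuration keys.
OPUS_TASKS = {"long_form_writing", "complex_code", "multi_step_reasoning", "creative"}


def pick_model(task_type: str) -> str:
    """Route to Opus only for the heavy task types; everything else goes to Sonnet."""
    return "opus" if task_type in OPUS_TASKS else "sonnet"


def blended_saving(sonnet_share: float, sonnet_price_ratio: float) -> float:
    """Fraction of an all-Opus bill saved, assuming equal token usage per task."""
    blended = (1 - sonnet_share) + sonnet_share * sonnet_price_ratio
    return 1 - blended


if __name__ == "__main__":
    print(pick_model("cron_check"))
    print(pick_model("long_form_writing"))
    # 80% of tasks on Sonnet at about 1/5 of the Opus price
    print(f"{blended_saving(0.8, 0.2):.0%}")
```

With 80% of tasks on Sonnet at one fifth of the price, the estimate comes out at about 64%, close to the 65% drop reported above.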

3. Context Slimming: Cut the Hidden Token Hogs

The "background noise" per call can be 3000-14000 tokens. Streamlining the injected files is the optimization with the best payoff for the effort.

Prompt:

Help me streamline OpenClaw's context files to save tokens.

Specifically:

1) Delete unnecessary parts of AGENTS.md (group chat rules, TTS, unused features) and compress it to within 800 tokens

2) Simplify SOUL.md to concise key points, 300-500 tokens

3) Clean up expired information in MEMORY.md and keep it within 2000 tokens

4) Check the workspaceFiles configuration and remove unnecessary injected files

Rule of thumb: at 100 Opus calls per day, every 1000 tokens removed from the injected context saves about $45 per month.
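The rule of thumb can be checked with a few lines of arithmetic. The sketch below assumes a 30-day month and an Opus input price of $15 per million tokens (the rate implied by the article's figures). The budget table copies the targets from the prompt above, and the 4-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```python
# Upper bounds in tokens, taken from the prompt above.
BUDGETS = {"AGENTS.md": 800, "SOUL.md": 500, "MEMORY.md": 2000}


def estimate_tokens(text: str) -> int:
    """Crude estimate assuming about 4 characters per token; real tokenizers differ."""
    return len(text) // 4


def over_budget(token_counts: dict[str, int]) -> dict[str, int]:
    """Return the excess tokens for each file that is above its budget."""
    return {
        name: count - BUDGETS[name]
        for name, count in token_counts.items()
        if name in BUDGETS and count > BUDGETS[name]
    }


def monthly_saving_usd(
    tokens_removed: int,
    calls_per_day: int,
    usd_per_mtok: float = 15.0,  # assumed Opus input price
    days: int = 30,
) -> float:
    """Money saved per month by removing tokens that were injected on every call."""
    return tokens_removed * calls_per_day * days / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    print(over_budget({"AGENTS.md": 2400, "SOUL.md": 450, "MEMORY.md": 5200}))
    print(monthly_saving_usd(1000, 100))
```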

4. Cron Optimization: The Most Hidden Cost Killer

Prompt: Help me optimize OpenClaw's cron tasks to save tokens.

Please:

1) List all cron tasks, their frequency, and model

2) Downgrade all non-creative tasks to Sonnet

3) Merge tasks in the same time period (e.g., combine multiple checks into one)

4) Reduce unnecessary high frequency (system check from 10 minutes to 30 minutes, version check from 3 times/day to 1 time/day)

5) Configure delivery to notify only when needed, with no message when everything is normal

Core principle: more frequent is not always better, and most "real-time" requirements are not real requirements. Merging 5 independent checks into 1 call saves about 75% of the context injection cost.
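A short sketch of why merging and slowing down cron jobs pays off. It treats context injection as a fixed cost per call, so the merge figure is an upper bound: the merged prompt is somewhat longer in practice, which is one reason a measured saving such as the 75% quoted above can come in lower. The function names are illustrative.

```python
def calls_per_day(interval_minutes: int) -> int:
    """Number of cron triggers in 24 hours at a fixed interval."""
    return 24 * 60 // interval_minutes


def injection_saving_from_merge(task_count: int) -> float:
    """Upper bound: n separate calls inject the context n times, one merged call injects it once."""
    return 1 - 1 / task_count


if __name__ == "__main__":
    before, after = calls_per_day(10), calls_per_day(30)
    print(f"system check: {before} -> {after} calls per day")
    print(f"merging 5 checks: up to {injection_saving_from_merge(5):.0%} less context injection")
```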

5. Heartbeat Optimization

Prompt: Help me optimize OpenClaw heartbeat configuration:

1) Set the working-hours interval to 45-60 minutes

2) Set 23:00-08:00 as a nighttime silent period

3) Streamline HEARTBEAT.md to the minimum number of lines

4) Merge scattered check tasks into heartbeat for batch execution
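To see what the settings in this prompt buy, the sketch below counts heartbeat calls per day for a given interval and silent period. It is plain arithmetic with illustrative parameter names; it does not read or write any OpenClaw configuration.

```python
def heartbeats_per_day(
    interval_minutes: int,
    quiet_start_hour: int = 23,
    quiet_end_hour: int = 8,
) -> int:
    """Heartbeat calls per day, skipping the silent period (which may wrap past midnight)."""
    quiet_hours = (quiet_end_hour - quiet_start_hour) % 24
    active_minutes = (24 - quiet_hours) * 60
    return active_minutes // interval_minutes


if __name__ == "__main__":
    # Baseline for comparison: every 15 minutes with no silent period
    print(heartbeats_per_day(15, quiet_start_hour=0, quiet_end_hour=0))
    # Settings from the prompt: 45-60 minutes, silent from 23:00 to 08:00
    print(heartbeats_per_day(45))
    print(heartbeats_per_day(60))
```

Going from a 15-minute heartbeat around the clock to 45 minutes with a 23:00-08:00 silent period cuts the count from 96 to 20 calls per day.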

6. Precise Retrieval: Use qmd to Save 90% of Input Tokens

When the agent looks up information, it defaults to "reading the full text": a 500-line file is 3000-5000 tokens, but the agent only needs about 10 lines from it. Roughly 90% of the input tokens are wasted.

qmd is a local semantic retrieval tool that builds a full-text + vector index, allowing the agent to pinpoint paragraphs instead of reading the entire file. Everything is computed locally, at zero API cost.

Use it together with mq (Mini Query) to preview the directory structure, extract precise paragraphs, and run keyword searches, reading only the 10-30 lines needed each time.
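The idea of returning only the relevant paragraphs can be illustrated with a toy example. This is not qmd's implementation or API: it swaps the full-text + vector index for a simple keyword-overlap score, and all function names are made up for the illustration.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split a document on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def words(text: str) -> set[str]:
    """Lowercased alphanumeric words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_paragraphs(text: str, query: str, k: int = 2) -> list[str]:
    """Rank paragraphs by how many query words they contain and keep the best k."""
    query_words = words(query)
    scored = [(len(query_words & words(p)), p) for p in split_paragraphs(text)]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:k]]


if __name__ == "__main__":
    doc = (
        "Deploy notes: restart the gateway after config changes.\n\n"
        "Billing: invoices are issued monthly.\n\n"
        "Cron: the version check runs once a day."
    )
    print(top_paragraphs(doc, "when does the version check cron run", k=1))
```

Only the matching paragraph is handed to the model, which is where the token saving comes from; a semantic index additionally matches paragraphs that share meaning but not exact words.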

Prompt:

Help me configure qmd knowledge base retrieval to save tokens.

GitHub repository: https://github.com/tobi/qmd

Needs:

1) Install qmd

2) Build index for the working directory

3) Add retrieval rules to AGENTS.md that make the agent prefer qmd/mq search over reading full files directly

4) Set up scheduled index updates

Actual effect: Each information lookup dropped from 15000 tokens to 1500 tokens, a 90% reduction.

Difference from memorySearch: memorySearch manages "memories" (MEMORY.md), while qmd handles "looking up information" (a custom knowledge base). The two do not affect each other.

7. Memory Search Choice

Prompt: Help me configure OpenClaw's memorySearch.

If I only have a few dozen Markdown memory files,

would you recommend local embedding or Voyage AI?

Please explain the cost and retrieval quality differences of each.

Simple conclusion: with only a few memory files, use local embedding (zero cost); with strong multilingual needs or many files, use Voyage AI (200 million free tokens per account).

8. Ultimate Configuration Checklist

Prompt:

Please help me optimize the OpenClaw configuration in one pass to save as many tokens as possible, following this checklist:

Change default model to Sonnet, only reserve Opus for creative/analysis tasks

Streamline AGENTS.md / SOUL.md / MEMORY.md

Downgrade all cron tasks to Sonnet + merge + reduce frequency

Heartbeat interval 45 minutes + nighttime silence

Configure qmd precise retrieval to replace full-text reading

workspaceFiles only keep necessary files

Regularly streamline memory files, control MEMORY.md within 2000 tokens

Configure once, benefit long-term:

1. Model Tiering: Sonnet daily, Opus critical, save 60-80%

2. Context Slimming: Streamline files + qmd precise retrieval, save 30-90% of input tokens

3. Reduce Calls: Merge cron, extend heartbeat, enable silent period

Sonnet 4 is already very strong, and the difference is hard to notice in daily use. Just switch to Opus when you really need it.

Based on practical experience running a multi-agent system; the figures are anonymized estimates.

Related Questions

Q: What are the main hidden costs of token usage in OpenClaw according to the article?

A: The main hidden costs include the System Prompt (~3000-5000 tokens), context file injections like AGENTS.md, SOUL.md, and MEMORY.md (~3000-14000 tokens), and the accumulation of historical messages in conversations.

Q: What is the primary strategy recommended for reducing costs with model selection?

A: The primary strategy is model tiering: using Claude Sonnet for daily tasks and reserving Claude Opus only for critical tasks like deep analysis or creative work, as Sonnet is about 1/5 the cost of Opus.

Q: How does using qmd help in reducing token consumption?

A: qmd is a local semantic retrieval tool that creates a vector index for precise paragraph retrieval instead of reading entire files, reducing input tokens by up to 90% for research tasks, as it only fetches the needed 10-30 lines.

Q: What optimizations are suggested for cron tasks to save tokens?

A: Optimizations include downgrading non-creative tasks to Sonnet, merging multiple tasks into single calls, reducing unnecessary high frequency (e.g., from 10 to 30 minutes), and configuring delivery for on-demand notifications to avoid messages when normal.

Q: What is the recommended approach for heartbeat configuration to minimize costs?

A: Set heartbeat intervals to 45-60 minutes during work hours, implement a silent period from 23:00 to 08:00, streamline HEARTBEAT.md to minimal lines, and consolidate scattered check tasks into batch executions within heartbeat.
