The Art of Saving in the AI Era: How to Spend Every Token Wisely

marsbit發佈於 2026-04-03更新於 2026-04-03

文章摘要

In the AI era, tokens are the new currency, and efficiency is paramount. This article outlines strategies to minimize token usage while maximizing value. Key principles include prioritizing high signal-to-noise ratio inputs by removing unnecessary content like greetings, repetitive context, or verbose instructions before processing. Converting files (e.g., PDFs to clean Markdown) and compressing images drastically reduce token consumption. Avoid conversational, multi-turn interactions; instead, provide clear, concise, and complete instructions upfront to prevent costly back-and-forth. Output costs are higher than input, so eliminate AI pleasantries and enforce structured responses (e.g., JSON) over verbose explanations. Use system prompts to mandate direct answers and disable unnecessary features like "extended thinking" for simple tasks. Manage context efficiently: start new conversations for new tasks, compress long histories, and leverage prompt caching to reuse fixed instructions at lower costs. Employ model tiering—assigning complex tasks to premium models (e.g., Claude Opus) and simpler subtasks to cheaper ones (e.g., Claude Haiku)—to optimize cost and performance. Ultimately, the most effective saving is questioning whether a task requires AI at all. Human judgment remains a critical filter to avoid unnecessary token expenditure, ensuring that AI complements rather than replaces human efficiency.

In the telegraph era where charges were per word, ink and paper were money. People were accustomed to condensing thousands of words to the extreme; "Return quickly" was worth more than a long letter, and "All is well" carried the heaviest嘱咐.

Later, telephones entered homes, but long-distance calls were charged by the minute. Parents' long-distance calls were always concise, hanging up promptly after the main point was made. Once the conversation started to extend slightly, the thought of the phone bill would cut short any budding small talk.

Then, broadband came home, and internet access was charged by the hour. People stared at the timer on the screen, opening and closing web pages instantly, only daring to download videos—streaming was a奢侈 verb back then. At the end of every download progress bar lay people's渴望 for "connecting to the world" and their忌惮 of "insufficient balance."

The unit of billing changed again and again, but the instinct to save money remained eternal.

Today, the Token has become the currency of the AI era. However, most people have yet to learn how to be thrifty in this age because we haven't learned how to calculate gains and losses within the invisible algorithms.

When ChatGPT first emerged in 2022, almost no one cared what a Token was. It was the era of AI's "communal pot," where you paid $20 a month and could chat as much as you wanted.

But ever since AI Agents recently became popular, Token expenditure has become something every AI Agent user must pay attention to.

Unlike simple Q&A dialogues, behind a task flow are hundreds or thousands of API calls. An Agent's independent thinking comes at a cost; every self-correction, every tool call, corresponds to a跳动数字 on the bill. Then you'll find that the money you topped up suddenly isn't enough, and you don't even know what the Agent has been doing.

In real life, everyone knows how to save money. When buying groceries at the market, we know to remove the muddy, rotten leaves before weighing; when taking a taxi to the airport, experienced drivers know to avoid the elevated highway during the morning rush hour.

The logic of saving money in the digital world is actually the same; it's just that the billing unit has changed from "jin" and "kilometers" to Tokens.

In the past, saving was due to scarcity; in the AI era, saving is for precision.

We hope that through this article, we can help you梳理出一套 money-saving methodology for the AI era, allowing you to spend every penny where it counts.

Before Weighing, Remove the Rotten Leaves

In the AI era, the value of information is no longer determined by its scope but by its purity.

AI's billing logic charges based on the number of words it reads. Whether you feed it profound insights or meaningless formatting nonsense, as long as it reads it, you have to pay.

Therefore, the first mindset for saving Tokens is to刻进潜意识 the "signal-to-noise ratio."

Every word, every image, every line of code you feed to AI costs money. So before handing anything over to AI, remember to ask yourself: How much of this does AI actually need? How much of it is muddy, rotten leaves?

For example,冗长的开场白 like "Hello, please help me...", repeated background introductions, and code comments that weren't cleaned up properly are all muddy, rotten leaves.

Beyond that, the most common waste is throwing a PDF or webpage screenshot directly to AI. This确实 saves you effort, but "saving effort" in the AI era often means "expensive."

A fully formatted PDF contains, besides the main text, headers, footers, chart labels, hidden watermarks, and a large amount of formatting code for typesetting. These things are utterly useless for AI to understand your question, but they are all billed.

Next time, remember to convert the PDF into clean Markdown text before feeding it to AI. When you turn a 10MB PDF into a 10KB clean text, you not only save 99% of the money but also make AI's brain run much faster than before.

Images are another gold sink.

In the logic of visual models, AI doesn't care how beautifully your photo is taken; it only cares how many pixels you occupy.

Taking Claude's official calculation logic as an example: Image Token consumption = width in pixels × height in pixels ÷ 750.

A 1000×1000 pixel image consumes about 1334 Tokens. Calculated at Claude Sonnet 4.6's pricing, that's about $0.004 per image.

But if you compress the same image to 200×200 pixels, it only consumes 54 Tokens, costing $0.00016, a full 25 times cheaper.

Many people throw high-definition photos taken with their phones or 4K screenshots directly to AI, unaware that the Tokens consumed by these images could be enough for AI to read most of a novella. If the task is merely to recognize text in an image or make a simple visual judgment, like asking AI to识别发票上的金额, read text in an instruction manual, or judge if there is a traffic light in the picture, then 4K resolution is pure waste. Compressing the image to the minimum usable resolution is sufficient.

But the biggest reason for wasted Tokens on the input side isn't actually file format, but inefficient ways of speaking.

Many people treat AI like a real-life neighbor,习惯用 social-style碎碎念 to communicate, first tossing out a "Help me write a webpage," waiting for AI to spit out a半成品, then补充细节, and反复拉扯. This挤牙膏式的 dialogue forces AI to generate content repeatedly, with each round of modification叠加 Token consumption.

Engineers at Tencent Cloud found in practice that for the same requirement,挤牙膏式的 multi-turn dialogue最终消耗的 Tokens are often 3 to 5 times that of stating things clearly once.

The true way to save money is to abandon this inefficient social probing and state the requirements, boundary conditions, and reference examples clearly all at once. Spend less effort explaining "what not to do," because negative sentences often consume more comprehension cost than affirmative ones; directly tell it "how to do it" and provide a clear, correct example.

同时, if you know the target, tell AI directly; don't make AI play detective.

When you command AI to "find the user-related code," it must perform large-scale scanning, analysis, and guessing in the background; whereas when you directly tell it "go look at the src/services/user.ts file," the Token consumption is worlds apart. In the digital world, information parity is the greatest saving.

Don't Pay for AI's "Politeness"

There's an潜规则 in large model billing that many people aren't aware of: Output Tokens are typically 3 to 5 times more expensive than Input Tokens.

That is to say, what AI says is much more expensive than what you say to it. Taking Claude Sonnet 4.6's pricing as an example, input costs only $3 per million Tokens, while output jumps sharply to $15 per million Tokens—a full 5 times difference.

Those polite opening remarks like "Okay, I have fully understood your需求, now I will begin to answer for you......" and those客套结尾 like "I hope the above content is helpful to you" are polite social pleasantries in human communication, but on the API bill, these寒暄 with no informational value also cost you your own money.

The most effective way to solve waste on the output side is to set rules for AI. Use system instructions to clearly tell it: No small talk, no explanations, no restating the requirements, just give the answer directly.

These rules only need to be set once and take effect in every subsequent conversation, truly a "one-time investment, permanent benefit" financial management手段. But when establishing rules, many people fall into another trap: using冗长的 natural language to pile up instructions.

Engineers'实测数据 show that the effectiveness of instructions lies not in word count but in density. Compressing a 500-word system prompt to 180 words, by deleting meaningless polite phrases, merging重复指令, and restructuring paragraphs into concise bulleted lists, the output quality remains almost unchanged, but the Token consumption per call can plummet by 64%.

There's an even more proactive control method: limiting the output length. Many people never set an output上限, letting AI run free. This放任 of expression rights often leads to extreme cost失控. You might only need a short sentence that makes the point, but AI, to demonstrate some kind of "intellectual sincerity," unceremoniously generates an 800-word essay for you.

If what you追求 is pure data, you should force AI to return structured formats, not冗长的 natural language descriptions. Carrying the same amount of information, the Token consumption of JSON format is far lower than that of散文-style paragraphs. This is because structured data剔除 all redundant connecting words, modal particles, and explanatory modifiers, retaining only the high-concentration logical core. In the AI era, you should be清醒ly aware that what is worth paying for is the value of the result, not AI's meaningless self-explanation.

Beyond that, AI's "overthinking" is also疯狂蚕食 your account balance.

Some advanced models have an "extended thinking" mode, which performs massive internal reasoning before answering. This reasoning process is also billed, and at the output price, which is very expensive.

This mode is essentially designed for "complex tasks requiring deep logical support." But most people also choose this mode when asking simple questions. For tasks that don't require deep reasoning, explicitly telling AI "no need to explain the思路, just give the answer directly," or manually turning off extended thinking, can also save you a lot of money.

Don't Let AI Dig Up Old History

Large models have no real memory; they are just疯狂地翻旧账.

This is an underlying mechanism many people don't know about. Every time you send a new message in a chat window, AI doesn't start understanding from your new sentence; it re-reads ALL the previous content you've chatted about, including every round of dialogue, every piece of code, every referenced document, and *then* answers you.

In the Token bill, this "reviewing the old to know the new" is by no means free. As dialogue rounds accumulate, even if you are just asking for a simple word, the cost of AI re-reading the entire old history behind the scenes grows exponentially. This mechanism dictates that the heavier the conversation history, the more expensive your every question becomes.

Someone tracked 496 real conversations containing more than 20 messages and found that the 1st message平均读取 14,000 Tokens, costing about 3.6 cents each; by the 50th message, it平均读取 79,000 Tokens, costing about 4.5 cents each—a full 80% more expensive. And the context gets longer and longer; by the 50th message, the context AI has to reprocess is already 5.6 times that of the 1st message.

The simplest habit to solve this problem is: One task, one chat window.

When a topic is finished,果断开启 a new conversation. Don't treat AI like a聊天窗口 that is never turned off. This habit sounds simple, but many people just can't do it, always thinking "what if I need the previous content later." In fact, the "what if" you worry about绝大多数时候 will not happen, and for this "what if," you are already paying several times more for every new message.

When the dialogue确实需要延续 but the context has become very long, we can use some tools' compression functions. Claude Code has a /compact command that can condense long-winded conversation history into a brief summary, helping you do a cyber断舍离.

There's also a省钱 logic called Prompt Caching. If you repeatedly use the same system prompt, or need to reference the same document in every conversation, AI will cache this content. The next time it's called, it will only charge a small cache read fee, not the full price every time.

Anthropic's official pricing shows that the Token price for a cache hit is 1/10 of the normal price. OpenAI's Prompt Caching can similarly reduce input costs by about 50%. A paper published on arXiv in January 2026 tested long tasks on multiple AI platforms and found that prompt caching could reduce API costs by 45% to 80%.

That is to say, for the same content, the first time you feed it to AI you pay full price; every subsequent call, you only pay 1/10. For users who need to use the same set of规范文档 or system prompts every day, this feature can save a huge number of Tokens.

But Prompt Caching has one prerequisite: your system prompt and reference document content and order must remain consistent, and it must be placed at the very beginning of the conversation. Once the content is changed in any way, the cache becomes invalid, and it's billed at full price again. So, if you have a fixed set of work规范, write it down and don't modify it随意ly.

The last技巧 for context management is loading on demand. Many people like to stuff all their规范, documents, and precautions into the system prompt at once, again for that "just in case" reason.

But the cost of doing this is that you are明明 only doing a very simple task but are forced to load thousands of words of rules,白白浪费 a bunch of Tokens. Claude Code's official documentation recommends keeping CLAUDE.md under 200 lines, splitting专项规则 for different scenarios into independent skill files, and only loading the rules for the scenario you are using. Keeping the context absolutely pure is the highest form of respect for computing power.

Don't Drive a Porsche to Buy Groceries

Different AI models have vastly different prices.

Claude Opus 4.6 costs $5 per million Tokens input and $25 output, while Claude Haiku 3.5 costs only $0.8 input and $4 output—a difference of nearly 6 times. Using the top-tier model to do杂活 like gathering materials or formatting is not only slow but also very expensive.

The smart way is to bring the common "class division of labor" thinking from our human society into AI society. Tasks of different difficulties are assigned to models of different price points.

Just like hiring people to work in the real world, you wouldn't specifically hire an expert with an annual salary of a million dollars to carry bricks at a construction site. It's the same with AI. Claude Code's official documentation also explicitly recommends: Use Sonnet for most programming tasks, reserve Opus for complex architectural decisions and multi-step reasoning, and assign simple subtasks specifically to Haiku.

A more specific practical solution is to构建 a "two-stage workflow." In the first stage, use free or cheap base models to do the early脏活累活, like data collection, format cleaning, draft generation, simple classification, and summarization. Entering the second stage, feed the refined high-purity essence to the top-tier model for core decision-making and deep polishing.

For example, if you need to analyze a 100-page industry report, you can first use Gemini Flash to extract the key data and conclusions from the report and organize them into a 10-page summary. Then, feed this summary to Claude Opus for in-depth analysis and judgment. This two-stage workflow can大幅压缩 costs while ensuring quality.

More advanced than simple分段处理 is deep division of labor based on task deconstruction. A complex engineering task can be completely broken down into several independent subtasks, each matched with the most suitable model.

For example, for a task requiring code writing, you can have a cheap model write the framework and boilerplate code first, and then only hand over the core logic part to an expensive model to implement. Each subtask has a clean, focused context, resulting in more accurate results and lower costs.

You Didn't Need to Spend Tokens in the First Place

All the previous discussions essentially solve the tactical problem of "how to save money," but a more fundamental logical proposition is overlooked by many: Does this action actually require spending Tokens?

The most extreme saving is not the optimization of algorithms but the断舍离 of decisions. We are accustomed to seeking万能解答 from AI, forgetting that in many scenarios, calling an expensive large model is无异于 using a cannon to kill a mosquito.

For example, letting AI automatically process emails, it will treat every email as an independent task to understand, classify, and reply to, consuming huge amounts of Tokens. But if you first spend 30 seconds glancing through the inbox, manually filtering out those emails that clearly don't require AI processing, and then hand the rest over to AI, the cost immediately drops to a small fraction of the original. Human judgment here is not an obstacle but the best filter.

People in the telegraph era knew how much money each additional word cost, so they would掂量—this was an intuitive perception of resources. It's the same in the AI era. When you truly know how much money it costs to make AI say one more sentence, you will naturally掂量 whether this thing is worth letting AI do, whether this task requires a top-tier model or a cheap one, and whether this piece of context is still useful.

This掂量 is the most money-saving ability. In an era where computing power is becoming increasingly expensive, the smartest usage is not to let AI replace humans, but to let AI and humans do what they are each good at. When this sensitivity to Tokens is internalized as a conditioned reflex, you truly change from being a附庸 of computing power back to being its master.

你可能也喜歡

每月10万美元：Truth Social向投资公司出售特朗普帖文访问权

特朗普媒体与技术集团于2026年8月1日正式推出付费数据服务Truth API。该服务以每月高达10万美元的费用，向机构投资者和高频交易公司提供实时访问Truth Social上最具影响力账户（包括特朗普总统拥有约1300万粉丝的账号）帖文的权限，延迟仅毫秒级。公司称此举是将其核心资产货币化、创造稳定高利润收入来源的战略一部分。此项服务引发了政治争议。民主党参议员沃伦和希夫要求美国证券交易委员会调查其是否违法。共和党参议员卡西迪批评这是以金钱售卖获取总统言论的特权通道。TMTG回应称批评是协调一致的抹黑行动，旨在损害这家上市公司。分析指出，此类高速数据流可能重现类似2013年美联社账号被黑导致市场闪崩的风险，因为交易算法会快于人工验证而做出反应。这引发了对于帖子真实性验证机制缺失及潜在市场操纵或黑客攻击风险的担忧。特朗普目前仍持有TMTG约41%的股份。

cryptonews.ru27 分鐘前

cryptonews.ru27 分鐘前

STRC优先股价格仍低于面值，策略集团股息维持在12%

Strategy公司的优先股STRC在7月份价格持续低于其100美元的面值，但公司宣布8月股息将维持12%不变，不会上调。董事长Michael Saylor通过社交媒体确认了这一消息，并继续将STRC宣传为增加收入的工具。8月将是股息改为半月支付后的第二个月。 STRC股价在7月有所回升，月底收于89.46美元，全月上涨5.42%，但交易量低于日均水平。公司CEO重申，管理层的目标是让STRC股价最终达到99-100美元区间，但未给出具体时间表。尽管公司第二季度因比特币持仓未实现亏损而录得巨额净亏损，但已建立37.5亿美元的现金储备，以支持其BTC货币化计划下的优先股派息。该储备足以支付超过两年的优先股股息和利息义务。公司近期已折价回购了部分STRC优先股，并计划在股价低于面值时继续回购。

cointelegraph1 小時前

cointelegraph1 小時前

比特币提现仍在继续：Coldcard冷钱包8年存储终成空

硬件钱包Coldcard遭黑客攻击，导致大量资金从易受攻击设备中被持续转出。据Galaxy Research数据，截至2026年8月2日，已有4585个地址被盗，损失总额达1367.05 BTC（约合8860万美元），远超7月30日最初报告的594.5 BTC。大部分被盗资金仍停留在攻击者地址。问题根源并非固件，而是设备生成的种子短语存在漏洞。2021年3月起，因程序员错误集成libNgU库，设备从使用STM32硬件随机数生成器转为使用软件生成器Yasmarang，该生成器由公开可获取的芯片序列号和计时器状态初始化，导致生成的种子短语可在离线状态下被暴力破解。即使固件后续已更新，只要用户未将资金转移至基于新种子短语生成的新地址，旧钱包就始终处于风险中。受影响的设备包括特定固件版本的Mk2/Mk3、Mk4/Mk5及Q系列。仅当种子短语是通过至少50次独立掷骰子或强唯一性BIP-39密码短语创建时方可幸免。官方建议受影响用户立即在已修复的固件上生成新种子短语并转移资产。报道提及一位39岁投资者的案例，他因该漏洞损失了2 BTC（约13万美元）。他多年来通过体力劳动积攒比特币，将其视为在制裁和高通胀国家中的财务保障与提前退休的途径。此次事件使他的长期持有策略和“冷存储”信心遭受重击，他因此决定彻底退出加密货币领域。从历史数据看，随机数生成器缺陷并非首例，类似问题曾导致巨额损失。此次事件警示，即使离线存储也未必绝对安全，其安全性高度依赖于底层硬件和算法的可靠性。

cryptonews.ru1 小時前

cryptonews.ru1 小時前

韩国15种山寨币交易量呈现爆发式增长！

韩国主要加密货币交易所Upbit和Bithumb上部分山寨币交易量出现显著增长。过去24小时内，最受欢迎的山寨币总交易额达到约3.477亿美元。其中，MetaDAO（META）交易量居首，仅在Upbit上的单日交易额就达6584万美元，占该交易所现货总交易量的12.39%。Euler（EUL）以4765万美元的总交易额位居第二，XRP以3811万美元位列第三，持续受到韩国投资者关注。其他交易量靠前的山寨币包括ThunderCore（TT）、Babylon（BABY）、Geodnet（GEOD）、Hyperlane（HYPER）、Momentum（MMT）、Ondo（ONDO）、柴犬币（SHIB）等。本文提供的信息不构成投资建议。

cryptonews.ru3 小時前

cryptonews.ru3 小時前

唐纳德·特朗普的公司再度出售大批比特币！

据报道，与美国总统唐纳德·特朗普的媒体公司Trump Media & Technology Group相关的地址，疑似向加密货币交易所CryptoCom转移了约2628枚比特币，价值约1.65亿美元。此前有分析称，该公司总计购买了11542枚比特币，平均成本为每枚11.85万美元。据称，2026年至今，相关地址已转出约7281枚比特币，目前仍持有约4261枚。 Trump Media在比特币投资上的已实现和未实现损失总额估计约为5.55亿美元。不过，将比特币转移至交易所并不一定意味着出售资产，也可能是为了托管、流动性管理或其他财务操作。目前尚无法确定其具体意图，但从冷钱包向中心化交易所转移通常被视为潜在的出售行为。 *本文不构成投资建议。

cryptonews.ru5 小時前

cryptonews.ru5 小時前

交易

現貨

The Art of Saving in the AI Era: How to Spend Every Token Wisely

文章摘要

Before Weighing, Remove the Rotten Leaves

Don't Pay for AI's "Politeness"

Don't Let AI Dig Up Old History

Don't Drive a Porsche to Buy Groceries

You Didn't Need to Spend Tokens in the First Place

熱門幣種推薦

相關問答

你可能也喜歡

每月10万美元：Truth Social向投资公司出售特朗普帖文访问权

STRC优先股价格仍低于面值，策略集团股息维持在12%

比特币提现仍在继续：Coldcard冷钱包8年存储终成空

韩国15种山寨币交易量呈现爆发式增长！

唐纳德·特朗普的公司再度出售大批比特币！

交易

熱門文章

如何購買ERA

相關討論

熱門問答

熱門分類

熱門標籤