The Art of Saving in the AI Era: How to Spend Every Token Wisely

marsbit發佈於 2026-04-03更新於 2026-04-03

文章摘要

In the AI era, tokens are the new currency, and efficiency is paramount. This article outlines strategies to minimize token usage while maximizing value. Key principles include prioritizing high signal-to-noise ratio inputs by removing unnecessary content like greetings, repetitive context, or verbose instructions before processing. Converting files (e.g., PDFs to clean Markdown) and compressing images drastically reduce token consumption. Avoid conversational, multi-turn interactions; instead, provide clear, concise, and complete instructions upfront to prevent costly back-and-forth. Output costs are higher than input, so eliminate AI pleasantries and enforce structured responses (e.g., JSON) over verbose explanations. Use system prompts to mandate direct answers and disable unnecessary features like "extended thinking" for simple tasks. Manage context efficiently: start new conversations for new tasks, compress long histories, and leverage prompt caching to reuse fixed instructions at lower costs. Employ model tiering—assigning complex tasks to premium models (e.g., Claude Opus) and simpler subtasks to cheaper ones (e.g., Claude Haiku)—to optimize cost and performance. Ultimately, the most effective saving is questioning whether a task requires AI at all. Human judgment remains a critical filter to avoid unnecessary token expenditure, ensuring that AI complements rather than replaces human efficiency.

In the telegraph era where charges were per word, ink and paper were money. People were accustomed to condensing thousands of words to the extreme; "Return quickly" was worth more than a long letter, and "All is well" carried the heaviest嘱咐.

Later, telephones entered homes, but long-distance calls were charged by the minute. Parents' long-distance calls were always concise, hanging up promptly after the main point was made. Once the conversation started to extend slightly, the thought of the phone bill would cut short any budding small talk.

Then, broadband came home, and internet access was charged by the hour. People stared at the timer on the screen, opening and closing web pages instantly, only daring to download videos—streaming was a奢侈 verb back then. At the end of every download progress bar lay people's渴望 for "connecting to the world" and their忌惮 of "insufficient balance."

The unit of billing changed again and again, but the instinct to save money remained eternal.

Today, the Token has become the currency of the AI era. However, most people have yet to learn how to be thrifty in this age because we haven't learned how to calculate gains and losses within the invisible algorithms.

When ChatGPT first emerged in 2022, almost no one cared what a Token was. It was the era of AI's "communal pot," where you paid $20 a month and could chat as much as you wanted.

But ever since AI Agents recently became popular, Token expenditure has become something every AI Agent user must pay attention to.

Unlike simple Q&A dialogues, behind a task flow are hundreds or thousands of API calls. An Agent's independent thinking comes at a cost; every self-correction, every tool call, corresponds to a跳动数字 on the bill. Then you'll find that the money you topped up suddenly isn't enough, and you don't even know what the Agent has been doing.

In real life, everyone knows how to save money. When buying groceries at the market, we know to remove the muddy, rotten leaves before weighing; when taking a taxi to the airport, experienced drivers know to avoid the elevated highway during the morning rush hour.

The logic of saving money in the digital world is actually the same; it's just that the billing unit has changed from "jin" and "kilometers" to Tokens.

In the past, saving was due to scarcity; in the AI era, saving is for precision.

We hope that through this article, we can help you梳理出一套 money-saving methodology for the AI era, allowing you to spend every penny where it counts.

Before Weighing, Remove the Rotten Leaves

In the AI era, the value of information is no longer determined by its scope but by its purity.

AI's billing logic charges based on the number of words it reads. Whether you feed it profound insights or meaningless formatting nonsense, as long as it reads it, you have to pay.

Therefore, the first mindset for saving Tokens is to刻进潜意识 the "signal-to-noise ratio."

Every word, every image, every line of code you feed to AI costs money. So before handing anything over to AI, remember to ask yourself: How much of this does AI actually need? How much of it is muddy, rotten leaves?

For example,冗长的开场白 like "Hello, please help me...", repeated background introductions, and code comments that weren't cleaned up properly are all muddy, rotten leaves.

Beyond that, the most common waste is throwing a PDF or webpage screenshot directly to AI. This确实 saves you effort, but "saving effort" in the AI era often means "expensive."

A fully formatted PDF contains, besides the main text, headers, footers, chart labels, hidden watermarks, and a large amount of formatting code for typesetting. These things are utterly useless for AI to understand your question, but they are all billed.

Next time, remember to convert the PDF into clean Markdown text before feeding it to AI. When you turn a 10MB PDF into a 10KB clean text, you not only save 99% of the money but also make AI's brain run much faster than before.

Images are another gold sink.

In the logic of visual models, AI doesn't care how beautifully your photo is taken; it only cares how many pixels you occupy.

Taking Claude's official calculation logic as an example: Image Token consumption = width in pixels × height in pixels ÷ 750.

A 1000×1000 pixel image consumes about 1334 Tokens. Calculated at Claude Sonnet 4.6's pricing, that's about $0.004 per image.

But if you compress the same image to 200×200 pixels, it only consumes 54 Tokens, costing $0.00016, a full 25 times cheaper.

Many people throw high-definition photos taken with their phones or 4K screenshots directly to AI, unaware that the Tokens consumed by these images could be enough for AI to read most of a novella. If the task is merely to recognize text in an image or make a simple visual judgment, like asking AI to识别发票上的金额, read text in an instruction manual, or judge if there is a traffic light in the picture, then 4K resolution is pure waste. Compressing the image to the minimum usable resolution is sufficient.

But the biggest reason for wasted Tokens on the input side isn't actually file format, but inefficient ways of speaking.

Many people treat AI like a real-life neighbor,习惯用 social-style碎碎念 to communicate, first tossing out a "Help me write a webpage," waiting for AI to spit out a半成品, then补充细节, and反复拉扯. This挤牙膏式的 dialogue forces AI to generate content repeatedly, with each round of modification叠加 Token consumption.

Engineers at Tencent Cloud found in practice that for the same requirement,挤牙膏式的 multi-turn dialogue最终消耗的 Tokens are often 3 to 5 times that of stating things clearly once.

The true way to save money is to abandon this inefficient social probing and state the requirements, boundary conditions, and reference examples clearly all at once. Spend less effort explaining "what not to do," because negative sentences often consume more comprehension cost than affirmative ones; directly tell it "how to do it" and provide a clear, correct example.

同时, if you know the target, tell AI directly; don't make AI play detective.

When you command AI to "find the user-related code," it must perform large-scale scanning, analysis, and guessing in the background; whereas when you directly tell it "go look at the src/services/user.ts file," the Token consumption is worlds apart. In the digital world, information parity is the greatest saving.

Don't Pay for AI's "Politeness"

There's an潜规则 in large model billing that many people aren't aware of: Output Tokens are typically 3 to 5 times more expensive than Input Tokens.

That is to say, what AI says is much more expensive than what you say to it. Taking Claude Sonnet 4.6's pricing as an example, input costs only $3 per million Tokens, while output jumps sharply to $15 per million Tokens—a full 5 times difference.

Those polite opening remarks like "Okay, I have fully understood your需求, now I will begin to answer for you......" and those客套结尾 like "I hope the above content is helpful to you" are polite social pleasantries in human communication, but on the API bill, these寒暄 with no informational value also cost you your own money.

The most effective way to solve waste on the output side is to set rules for AI. Use system instructions to clearly tell it: No small talk, no explanations, no restating the requirements, just give the answer directly.

These rules only need to be set once and take effect in every subsequent conversation, truly a "one-time investment, permanent benefit" financial management手段. But when establishing rules, many people fall into another trap: using冗长的 natural language to pile up instructions.

Engineers'实测数据 show that the effectiveness of instructions lies not in word count but in density. Compressing a 500-word system prompt to 180 words, by deleting meaningless polite phrases, merging重复指令, and restructuring paragraphs into concise bulleted lists, the output quality remains almost unchanged, but the Token consumption per call can plummet by 64%.

There's an even more proactive control method: limiting the output length. Many people never set an output上限, letting AI run free. This放任 of expression rights often leads to extreme cost失控. You might only need a short sentence that makes the point, but AI, to demonstrate some kind of "intellectual sincerity," unceremoniously generates an 800-word essay for you.

If what you追求 is pure data, you should force AI to return structured formats, not冗长的 natural language descriptions. Carrying the same amount of information, the Token consumption of JSON format is far lower than that of散文-style paragraphs. This is because structured data剔除 all redundant connecting words, modal particles, and explanatory modifiers, retaining only the high-concentration logical core. In the AI era, you should be清醒ly aware that what is worth paying for is the value of the result, not AI's meaningless self-explanation.

Beyond that, AI's "overthinking" is also疯狂蚕食 your account balance.

Some advanced models have an "extended thinking" mode, which performs massive internal reasoning before answering. This reasoning process is also billed, and at the output price, which is very expensive.

This mode is essentially designed for "complex tasks requiring deep logical support." But most people also choose this mode when asking simple questions. For tasks that don't require deep reasoning, explicitly telling AI "no need to explain the思路, just give the answer directly," or manually turning off extended thinking, can also save you a lot of money.

Don't Let AI Dig Up Old History

Large models have no real memory; they are just疯狂地翻旧账.

This is an underlying mechanism many people don't know about. Every time you send a new message in a chat window, AI doesn't start understanding from your new sentence; it re-reads ALL the previous content you've chatted about, including every round of dialogue, every piece of code, every referenced document, and *then* answers you.

In the Token bill, this "reviewing the old to know the new" is by no means free. As dialogue rounds accumulate, even if you are just asking for a simple word, the cost of AI re-reading the entire old history behind the scenes grows exponentially. This mechanism dictates that the heavier the conversation history, the more expensive your every question becomes.

Someone tracked 496 real conversations containing more than 20 messages and found that the 1st message平均读取 14,000 Tokens, costing about 3.6 cents each; by the 50th message, it平均读取 79,000 Tokens, costing about 4.5 cents each—a full 80% more expensive. And the context gets longer and longer; by the 50th message, the context AI has to reprocess is already 5.6 times that of the 1st message.

The simplest habit to solve this problem is: One task, one chat window.

When a topic is finished,果断开启 a new conversation. Don't treat AI like a聊天窗口 that is never turned off. This habit sounds simple, but many people just can't do it, always thinking "what if I need the previous content later." In fact, the "what if" you worry about绝大多数时候 will not happen, and for this "what if," you are already paying several times more for every new message.

When the dialogue确实需要延续 but the context has become very long, we can use some tools' compression functions. Claude Code has a /compact command that can condense long-winded conversation history into a brief summary, helping you do a cyber断舍离.

There's also a省钱 logic called Prompt Caching. If you repeatedly use the same system prompt, or need to reference the same document in every conversation, AI will cache this content. The next time it's called, it will only charge a small cache read fee, not the full price every time.

Anthropic's official pricing shows that the Token price for a cache hit is 1/10 of the normal price. OpenAI's Prompt Caching can similarly reduce input costs by about 50%. A paper published on arXiv in January 2026 tested long tasks on multiple AI platforms and found that prompt caching could reduce API costs by 45% to 80%.

That is to say, for the same content, the first time you feed it to AI you pay full price; every subsequent call, you only pay 1/10. For users who need to use the same set of规范文档 or system prompts every day, this feature can save a huge number of Tokens.

But Prompt Caching has one prerequisite: your system prompt and reference document content and order must remain consistent, and it must be placed at the very beginning of the conversation. Once the content is changed in any way, the cache becomes invalid, and it's billed at full price again. So, if you have a fixed set of work规范, write it down and don't modify it随意ly.

The last技巧 for context management is loading on demand. Many people like to stuff all their规范, documents, and precautions into the system prompt at once, again for that "just in case" reason.

But the cost of doing this is that you are明明 only doing a very simple task but are forced to load thousands of words of rules,白白浪费 a bunch of Tokens. Claude Code's official documentation recommends keeping CLAUDE.md under 200 lines, splitting专项规则 for different scenarios into independent skill files, and only loading the rules for the scenario you are using. Keeping the context absolutely pure is the highest form of respect for computing power.

Don't Drive a Porsche to Buy Groceries

Different AI models have vastly different prices.

Claude Opus 4.6 costs $5 per million Tokens input and $25 output, while Claude Haiku 3.5 costs only $0.8 input and $4 output—a difference of nearly 6 times. Using the top-tier model to do杂活 like gathering materials or formatting is not only slow but also very expensive.

The smart way is to bring the common "class division of labor" thinking from our human society into AI society. Tasks of different difficulties are assigned to models of different price points.

Just like hiring people to work in the real world, you wouldn't specifically hire an expert with an annual salary of a million dollars to carry bricks at a construction site. It's the same with AI. Claude Code's official documentation also explicitly recommends: Use Sonnet for most programming tasks, reserve Opus for complex architectural decisions and multi-step reasoning, and assign simple subtasks specifically to Haiku.

A more specific practical solution is to构建 a "two-stage workflow." In the first stage, use free or cheap base models to do the early脏活累活, like data collection, format cleaning, draft generation, simple classification, and summarization. Entering the second stage, feed the refined high-purity essence to the top-tier model for core decision-making and deep polishing.

For example, if you need to analyze a 100-page industry report, you can first use Gemini Flash to extract the key data and conclusions from the report and organize them into a 10-page summary. Then, feed this summary to Claude Opus for in-depth analysis and judgment. This two-stage workflow can大幅压缩 costs while ensuring quality.

More advanced than simple分段处理 is deep division of labor based on task deconstruction. A complex engineering task can be completely broken down into several independent subtasks, each matched with the most suitable model.

For example, for a task requiring code writing, you can have a cheap model write the framework and boilerplate code first, and then only hand over the core logic part to an expensive model to implement. Each subtask has a clean, focused context, resulting in more accurate results and lower costs.

You Didn't Need to Spend Tokens in the First Place

All the previous discussions essentially solve the tactical problem of "how to save money," but a more fundamental logical proposition is overlooked by many: Does this action actually require spending Tokens?

The most extreme saving is not the optimization of algorithms but the断舍离 of decisions. We are accustomed to seeking万能解答 from AI, forgetting that in many scenarios, calling an expensive large model is无异于 using a cannon to kill a mosquito.

For example, letting AI automatically process emails, it will treat every email as an independent task to understand, classify, and reply to, consuming huge amounts of Tokens. But if you first spend 30 seconds glancing through the inbox, manually filtering out those emails that clearly don't require AI processing, and then hand the rest over to AI, the cost immediately drops to a small fraction of the original. Human judgment here is not an obstacle but the best filter.

People in the telegraph era knew how much money each additional word cost, so they would掂量—this was an intuitive perception of resources. It's the same in the AI era. When you truly know how much money it costs to make AI say one more sentence, you will naturally掂量 whether this thing is worth letting AI do, whether this task requires a top-tier model or a cheap one, and whether this piece of context is still useful.

This掂量 is the most money-saving ability. In an era where computing power is becoming increasingly expensive, the smartest usage is not to let AI replace humans, but to let AI and humans do what they are each good at. When this sensitivity to Tokens is internalized as a conditioned reflex, you truly change from being a附庸 of computing power back to being its master.

相關問答

QWhat is the core concept of saving money' in the AI era according to the article?

AThe core concept is to spend every Token precisely and efficiently, focusing on maximizing the value of each Token spent rather than simply reducing usage. It's about optimizing input purity, managing output, and making smart decisions about when and how to use AI models.

QWhat are the two main areas where Token waste commonly occurs, as identified in the article?

AThe two main areas are input waste (e.g., feeding AI unnecessary formatting, low-resolution images, or inefficient prompts) and output waste (e.g., paying for AI's polite greetings, lengthy explanations, or 'over-thinking' on simple tasks).

QWhy does a long conversation history lead to higher costs, and what is one simple habit to mitigate this?

AA long history is costly because AI re-reads the entire conversation context with each new message, causing the token count to grow geometrically. One simple habit to mitigate this is to use 'one task, one dialog window' and start a new conversation after a topic is finished.

QHow can users significantly reduce costs when working with visual models and images?

AUsers can significantly reduce costs by compressing images to the minimum usable resolution needed for the task. For example, compressing a 1000x1000 pixel image to 200x200 can reduce token consumption by 25 times, as cost is based on the number of pixels processed.

QWhat is the recommended strategy for choosing which AI model to use for a task to optimize costs?

AThe recommended strategy is to implement a 'class-based division of labor' or a 'two-stage workflow.' Use cheaper models for preliminary tasks like data gathering and formatting, and reserve expensive, powerful models only for complex core decision-making and deep analysis, matching the model's cost to the task's difficulty.

你可能也喜歡

从身份协议到AI入口,World的野心有多大?

近期,加密市场中的WLD成为焦点,其价格持续上涨,市值突破30亿美元。这一热度源于World项目正式进入“The Simple Plan”第三阶段,其发展逻辑正从早期的代币激励转向实用驱动。World的核心目标是构建全球“人格证明”网络,通过扫描虹膜的World ID解决互联网中验证真实人类身份的关键问题。随着生成式AI爆发,区分真人与AI变得日益紧迫。 World的落地场景正在拓宽,覆盖企业端、个人端及AI Agent端。企业方面,与Zoom等公司合作应对深度伪造;个人层面,瞄准社交与票务等场景的真人验证需求;AI Agent端则推出AgentKit,旨在建立人与AI间的可信授权框架,为未来AI经济奠定信任基础。 市场上涨背后是对“真人身份”稀缺价值的重估。在AI内容成本趋近零的未来,真人身份与行为可能成为稀缺资源。World的运营策略也更聚焦,资源集中于高价值城市以构建网络效应,同时下一代Orb设备将实现自助化以降低扩张成本。 宏观来看,World可能推动加密叙事从金融扩展到身份基础设施,身份或成为可组合资产。它也有望成为AI Agent经济的关键入口,解决Agent归属、可信与验证问题。World ID 4.0引入的费用机制开启了协议的收入来源,使其商业模式更趋清晰。 总之,WLD的上涨反映了市场对World在AI时代定位的认可——其野心是成为验证人类身份的关键入口。随着AI与人的界限模糊,掌握人格证明网络可能意味着掌握下一代互联网的重要枢纽。

marsbit52 分鐘前

从身份协议到AI入口,World的野心有多大?

marsbit52 分鐘前

没有腾讯,燧原还剩什么?

燧原科技科创板IPO获通过,成为国产GPU“四小龙”中最后一家上市的公司。其招股书揭示了一个核心问题:公司营收高度依赖单一客户腾讯,2025年销售额的74.9%(按另一口径超80%)来自腾讯。 与其他“四小龙”先融资、讲故事的路径不同,燧原从成立起就锚定大客户交付,营收增长迅猛,2026年第一季度同比暴增1474.85%。这种陡峭增长源于超级大客户的算力订单集中释放。 腾讯大规模采购燧原芯片,背后是自身庞大的AI算力需求(如混元大模型、元宝等)以及构建可控、稳定算力供应链的战略考量。燧原超过80%的加速卡及模组收入来自推理产品,精准匹配了腾讯大模型落地的急需。 腾讯不仅是燧原第一大客户,也是持股20.26%的第一大股东。这种“股东+客户”的深度绑定,在产业逻辑上被视为供应链培育。腾讯通过确定性订单帮助燧原迭代工艺,而自身业务系统与燧原芯片的深度集成也形成了较高的替换成本,构成了燧原的生态护城河。 行业格局逐渐清晰:英伟达为规则制定者,华为昇腾走国家级路线,而燧原、摩尔线程等商业化玩家则依靠市场订单。燧原的定位愈发偏向“腾讯生态的算力底座”,其产品路线图与腾讯需求高度协同。 文章指出,中国AI芯片行业已告别PPT融资驱动,进入残酷的订单交付周期。未来比拼的关键不再是技术参数,而是订单量、交付能力和生态绑定深度。燧原手握腾讯长期且金额翻倍的采购订单,这或许比技术本身更能体现其现阶段价值。国产芯片的长期主义,在于赢得客户的信任、场景和持续订单。

marsbit1 小時前

没有腾讯,燧原还剩什么?

marsbit1 小時前

BTC 市场脉搏:第25周

比特币市场显现试探性反弹,但结构证据指向企稳而非趋势逆转。上周关键变化是交易者行为显著转变:永续合约CVD从-7.7亿美元转为+1.82亿美元,现货CVD从-2.05亿美元回升至接近盈亏平衡。RSI自超卖区反弹94.8%,但仍处29.1低位,显示缺乏持续买盘主导。 反弹基础脆弱:现货成交量骤降40.4%至58亿美元,期货未平仓合约再降3%至306亿美元,表明上涨主要由空头回补驱动。多头资金费率下降22.3%,ETF交易量下降38.1%至111亿美元,市场流动性减弱而非健康改善。 市场恐慌情绪有所缓解:波动率利差一周内压缩85%至4.07%,期权参与者快速下调尾部风险定价。25-Delta偏度从19.07%降至15.99%,反映下行保护需求减少。ETF净流出改善65.5%至-4.65亿美元,ETF MVRV回升至1.06。投降速度放缓:已实现盈亏比改善46%,NUPL收窄14%,但两者仍处亏损区间。 链上数据显示市场活动趋冷:活跃地址减少6.3%,实体调整转账量下降38.8%至39亿美元。已实现市值变化加深至-1.3%,表明资金持续流出网络。积极信号在于供应结构:热资本占比和短期持有者/长期持有者比率均跌破下轨,显示近期买入的供应已被大量清洗,持有者结构正转向长期主导。 目前仅50.8%流通供应处于盈利状态,低于55.1%的下轨,虽压制抛压但也延长投资者压力期。总体而言,市场正在构建盘整基础而非确认反转,缺乏成交量、衍生品规模收缩及资金持续外流表明,市场仍需等待真正的信心与机构资金回归作为催化动力。

insights.glassnode1 小時前

BTC 市场脉搏:第25周

insights.glassnode1 小時前

交易

現貨
合約

熱門文章

如何購買ERA

歡迎來到HTX.com!在這裡,購買Caldera (ERA)變得簡單而便捷。跟隨我們的逐步指南,放心開始您的加密貨幣之旅。第一步:創建您的HTX帳戶使用您的 Email、手機號碼在HTX註冊一個免費帳戶。體驗無憂的註冊過程並解鎖所有平台功能。立即註冊第二步:前往買幣頁面,選擇您的支付方式信用卡/金融卡購買:使用您的Visa或Mastercard即時購買Caldera (ERA)。餘額購買:使用您HTX帳戶餘額中的資金進行無縫交易。第三方購買:探索諸如Google Pay或Apple Pay等流行支付方式以增加便利性。C2C購買:在HTX平台上直接與其他用戶交易。HTX 場外交易 (OTC) 購買:為大量交易者提供個性化服務和競爭性匯率。第三步:存儲您的Caldera (ERA)購買Caldera (ERA)後,將其存儲在您的HTX帳戶中。您也可以透過區塊鏈轉帳將其發送到其他地址或者用於交易其他加密貨幣。第四步:交易Caldera (ERA)在HTX的現貨市場輕鬆交易Caldera (ERA)。前往您的帳戶,選擇交易對,執行交易,並即時監控。HTX為初學者和經驗豐富的交易者提供了友好的用戶體驗。

723 人學過發佈於 2025.07.17更新於 2026.06.02

如何購買ERA

相關討論

歡迎來到 HTX 社群。在這裡,您可以了解最新的平台發展動態並獲得專業的市場意見。 以下是用戶對 ERA (ERA)幣價的意見。

活动图片