Claude Bill Skyrockets to $500 Million, Surges 60-Fold Overnight: Can Your Token Budget Keep Up?

marsbit, published 2026-06-01, updated 2026-06-01

Article Summary

An enterprise reportedly ran up a staggering $500 million bill on Anthropic's Claude AI in just one month due to a simple oversight: failing to set usage limits for employee accounts. This incident highlights a growing trend of runaway AI costs. Other examples include a Google Cloud user hit with an unexpected $18,000 bill from API key abuse, and an OpenAI internal experiment that consumed 603 billion tokens, costing $1.3 million in 30 days. Major AI providers like OpenAI and GitHub are shifting from flat monthly fees to granular, usage-based pricing (per input/output/cached token), causing shock for some users whose costs skyrocketed by orders of magnitude. The root causes extend beyond pricing. The rise of autonomous AI agents executing long, complex tasks has drastically increased token consumption. Furthermore, misaligned incentives, like internal "leaderboards" ranking employees by AI usage, can encourage wasteful "tokenmaxxing"—using powerful models for trivial tasks just to inflate metrics. This has sparked a new industry focused on cost optimization. Solutions include providing AI with better context (reducing redundant searches) and intelligent model routing (matching tasks to the most cost-effective model). Research indicates token consumption for agentic tasks can vary wildly (up to 30x for the same job) without guaranteeing better results, and models often underestimate their own costs. As AI expenses begin to rival or even surpass human labor costs for some t...

A $500 Million Bill in Just 1 Month!

Recently, a shocking blunder erupted in the tech world. According to Axios, a company actually managed to rack up a $500 million bill on Claude in just one month!

The reason is laughable: management forgot to set usage limits when granting employees access to Claude accounts.

In fact, this isn't the only case of AI bills exploding.

In April, a Google Cloud user, whose publicly accessible API key was misused, received a bill for $18,000 overnight, despite having only a $7 budget set.

The unlucky user, Jesse Davies, is an Australian AI consultant and founder of Agentic Labs. He had set up two safeguards for his Google Cloud account: a A$10 (about $7) budget alert and a hard spending cap of $1,400.

As reported by Tom's Hardware, attackers discovered a Cloud Run service he had deployed months earlier from AI Studio, sending over 60,000 requests. Both safeguards failed: there was a delay in billing calculations, and by the time the system reacted, the amount had skyrocketed to $18,000.

In mid-May, Peter Steinberger, founder of the open-source project OpenClaw, posted a screenshot on X: a $1.3 million OpenAI API bill for 30 days.

His team has only three people, but they orchestrated 100 Codex agents running in parallel, burning through 603 billion tokens and making 7.6 million requests in 30 days. Fortunately, he didn't have to foot this $1.3 million bill himself.

Steinberger joined OpenAI this past February, and this $1.3 million was treated as an internal experiment to test the absolute limits of AI programming when token cost is not a consideration. He added that this was the result of Codex's "Fast Mode" (higher-tier billing); turning it off reduced the cost to about $300,000.
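To put those numbers in perspective, here is a back-of-the-envelope calculation using only the figures quoted above. The results are blended averages across input, cached, and output tokens, not actual OpenAI list prices, and the variable names are purely illustrative.

```python
# Figures quoted in the article for the 30-day experiment.
total_cost_usd = 1_300_000
total_tokens = 603_000_000_000
total_requests = 7_600_000
fast_mode_off_cost_usd = 300_000  # approximate, per Steinberger

# Blended averages; real bills price input, cached, and output tokens separately.
cost_per_million_tokens = total_cost_usd / (total_tokens / 1_000_000)
cost_per_request = total_cost_usd / total_requests
tokens_per_request = total_tokens / total_requests
fast_mode_multiplier = total_cost_usd / fast_mode_off_cost_usd

print(f"Blended cost per million tokens: ${cost_per_million_tokens:.2f}")
print(f"Average cost per request:        ${cost_per_request:.3f}")
print(f"Average tokens per request:      {tokens_per_request:,.0f}")
print(f"Fast Mode cost multiplier:       {fast_mode_multiplier:.1f}x")
```

Each request is cheap on its own, at roughly 17 cents; the bill comes from making millions of them, each carrying tens of thousands of tokens.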

Even earlier, Uber's CTO Praveen Neppalli Naga had admitted to The Information that the company had exhausted its annual Claude Code budget by April, and their COO also publicly stated that AI costs were becoming increasingly "hard to justify."

$500 million, $1.3 million, $18,000—though these figures differ by orders of magnitude, they point to the same reality:

In the age of agents, any one of these—a compromised key, an army of agents running 24/7, or an account with forgotten limits—can blow up your token bill overnight.
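One common defence against all three failure modes is to enforce a spending cap on the caller's side, before a request is sent, rather than relying on a provider's delayed billing alerts. The following is a minimal sketch under that assumption; `BudgetGuard`, `BudgetExceeded`, and the cost estimates passed in are hypothetical names and values, not part of any vendor's SDK.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push a user past the monthly cap."""


class BudgetGuard:
    """Tracks estimated spend per user and refuses requests over the cap."""

    def __init__(self, monthly_cap_usd: float):
        self.monthly_cap_usd = monthly_cap_usd
        self.spent_usd: dict[str, float] = {}

    def charge(self, user: str, estimated_cost_usd: float) -> float:
        """Record the spend and return the new total, or raise if over the cap."""
        spent = self.spent_usd.get(user, 0.0)
        if spent + estimated_cost_usd > self.monthly_cap_usd:
            raise BudgetExceeded(
                f"{user}: ${spent + estimated_cost_usd:.2f} would exceed "
                f"cap of ${self.monthly_cap_usd:.2f}"
            )
        self.spent_usd[user] = spent + estimated_cost_usd
        return self.spent_usd[user]


# Illustrative usage with made-up numbers.
demo_guard = BudgetGuard(monthly_cap_usd=10.0)
demo_guard.charge("alice", 6.0)          # accepted
try:
    demo_guard.charge("alice", 5.0)      # would total 11.0, so it is rejected
except BudgetExceeded as err:
    print("Blocked:", err)
```

Because the check runs synchronously before the call, it does not depend on how quickly the provider's billing system catches up. The trade-off is that it relies on cost estimates, which may differ from the final invoice.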

Why Do AI Bills Explode?

The answer lies mainly in the shift in billing methods.

Starting April this year, OpenAI began transitioning from monthly flat fees to usage-based billing by Token.

On April 2, Codex billing shifted from per-message estimates to alignment with actual Token usage: Input, Cached Input, and Output Tokens are billed separately. On April 23, this rule was extended to all Enterprise, Edu, Health, and Gov plans: the invisible discount within the monthly fee was removed.

GitHub followed closely, announcing that all Copilot plans will switch to usage-based billing effective June 1, 2026. The old premium-request logic is scrapped and replaced with AI credits, settled based on actual consumption of input, output, and cached tokens at each model's API rate.
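The mechanics of per-token billing can be sketched in a few lines. The rates below are placeholders chosen for round numbers, not OpenAI's or GitHub's actual prices, and the `Rates` and `Usage` types are hypothetical; the point is only how input, cached input, and output tokens are priced separately and summed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per one million tokens (placeholder values, not real prices)."""
    input: float
    cached_input: float
    output: float


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    cached_input_tokens: int
    output_tokens: int


def cost_usd(usage: Usage, rates: Rates) -> float:
    """Price each token class at its own rate and sum the result."""
    total = (
        usage.input_tokens * rates.input
        + usage.cached_input_tokens * rates.cached_input
        + usage.output_tokens * rates.output
    )
    return total / 1_000_000


HYPOTHETICAL_RATES = Rates(input=2.0, cached_input=0.2, output=10.0)

# A short chat question versus a long autonomous agent run (made-up sizes).
quick_chat = Usage(input_tokens=2_000, cached_input_tokens=0, output_tokens=500)
agent_run = Usage(
    input_tokens=3_000_000, cached_input_tokens=20_000_000, output_tokens=400_000
)

print(f"Quick chat: ${cost_usd(quick_chat, HYPOTHETICAL_RATES):.4f}")
print(f"Agent run:  ${cost_usd(agent_run, HYPOTHETICAL_RATES):.2f}")
```

Under these assumed rates and sizes, the chat question costs under one cent while the agent run costs about $14. A flat fee hides a gap of that size; usage-based billing exposes it.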

GitHub officially explained the reason for this change:

Currently, a quick chat question and a multi-hour autonomous coding task cost the user the same amount. GitHub has been subsidizing the heavy users, but this model is no longer sustainable.

Before the rise of AI agents, the costs of chat and completions were similar, and monthly fees could cover them.

After agents rose, a single task could run for hours and modify entire codebases, creating a cost difference of orders of magnitude between heavy and light users. The flat monthly fee model collapses in the face of such disparity.

The news sparked an uproar on Reddit and X.

A developer with the ID JBusu shared a screenshot of their bill, bluntly calling the new pricing "a joke." Their previous monthly cost of $28.12 would become $746.01 under the new system. They've decided to cancel: "At this price, I could rent a cloud server myself and it would be cheaper."

Another user shared an even more extreme screenshot, showing costs soaring from $50 to $3,000. They said they never expected pricing to be this outrageous, "Is anyone still subscribing?"

However, some veteran Copilot users countered: these extreme bills are likely burned by "vibe-coders" who aren't mindful of token usage and may not represent normal use.

One veteran user commented: "I use it all day long and rarely exceed limits by month-end. It's hard to believe this is due to differences in task complexity." Another was more direct: "It's people wanting fully automated YOLO-mode development, letting AI run wild. Culling this waste is actually good for everyone else."

One thing is clear: GitHub hasn't abolished monthly fees; the base subscription price remains unchanged. What has changed is that extra usage, agent tasks, and calls to more expensive models now fall under usage-based billing.

The hardest hit are those heavy agent users who rely on Copilot for long-chain tasks.

The Leaderboard Gamed by Its Own Users

The collapse of flat fees is partly due to platforms changing their billing rules, and partly because AI users themselves are burning through tokens.

In May, Business Insider reported that Amazon took down an internal AI usage leaderboard called KiroRank.

The report cited insiders saying the leaderboard quietly encouraged a strange work style: some employees, to climb the ranks, would burn tokens on tasks that didn't solve actual problems, purely for ranking.

After the story broke, Amazon SVP Dave Treadwell directly addressed all employees: "Don't use AI for the sake of using AI. Use it to solve customer problems, business problems, to innovate."

Though absurd, this is hardly surprising. When "burning tokens" gets you on a leaderboard, employees will naturally burn tokens.

Silicon Valley has coined a term for this phenomenon: Tokenmaxxing—treating consumption volume as productivity.

Axios's report also mentioned CTOs discovering employees using cutting-edge AI models to check the weather or write routine emails—trivial tasks that, when run on the most expensive frontier models, can silently send bills soaring.

KiroRank wasn't part of Amazon's official evaluation system but an informal tool built by employees. Yet it clearly exposes a classic management principle: when KPIs are set wrong, people will use the cleverest ways to game the system.

Equating "how much was used" with "how well it was done"—this is the systemic root of this wave of AI waste.

Those Who Count Tokens Are Already Making Money

On the flip side of token bill anxiety, some are quietly turning it into a business.

First approach: Feed the AI with context.

Glean, led by CEO Arvind Jain, builds an enterprise AI work assistant. It unifies knowledge scattered across a company and gives employees' AI direct context, so it doesn't have to dig around. The AI takes fewer detours and naturally burns fewer tokens.

This mechanism helped Glean's annual revenue triple in 15 months, crossing $300 million, with clients including Databricks, Reddit, and Samsung.

Second approach: Delegate tasks to the right model.

This is what model routing startup Factory AI does: it automatically routes each task to the most suitable model, using cheap ones for simple tasks and top-tier ones for complex tasks. Jain also noted that routing done right can cut costs by a factor of 10.

Both paths lead to the same destination: Let AI work, but don't let it burn money indiscriminately.
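The routing idea can be illustrated with a small sketch. This is not Factory AI's implementation; the model names, prices, complexity tiers, and task mix are all invented for illustration, and the hard part in practice (deciding how complex a task is) is assumed to be given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # blended placeholder rate
    max_complexity: int            # highest task tier this model is trusted with


# Hypothetical catalogue of models.
MODELS = [
    Model("small-model", 0.5, 1),
    Model("mid-model", 3.0, 2),
    Model("frontier-model", 15.0, 3),
]


def route(task_complexity: int) -> Model:
    """Return the cheapest model whose tier covers the task."""
    for model in sorted(MODELS, key=lambda m: m.usd_per_million_tokens):
        if model.max_complexity >= task_complexity:
            return model
    raise ValueError(f"no model can handle complexity {task_complexity}")


def total_cost(complexities: list[int], tokens_per_task: int, routed: bool) -> float:
    """Cost of a batch, either routed per task or sent entirely to the top model."""
    top = max(MODELS, key=lambda m: m.usd_per_million_tokens)
    cost = 0.0
    for complexity in complexities:
        model = route(complexity) if routed else top
        cost += tokens_per_task * model.usd_per_million_tokens / 1_000_000
    return cost


# Made-up workload: mostly trivial tasks, a few hard ones, 10,000 tokens each.
workload = [1] * 80 + [2] * 15 + [3] * 5
routed_cost = total_cost(workload, 10_000, routed=True)
frontier_only_cost = total_cost(workload, 10_000, routed=False)
print(f"Routed: ${routed_cost:.2f}  Frontier only: ${frontier_only_cost:.2f}")
```

With this invented mix the routed batch costs a small fraction of the frontier-only batch. The actual saving depends entirely on the share of simple tasks and on the price gap between models.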

Academic research is also laying the groundwork for this shift.

https://arxiv.org/pdf/2604.22750

An arXiv paper from April 2026 was the first to systematically break down how agentic coding tasks actually burn money.

Conclusion One: Token consumption for agent tasks can be thousands of times higher than ordinary code reasoning or code chat, with Input Tokens being the main cost driver.

Conclusion Two: Running the same task multiple times can result in a 30x difference in Token consumption.

Conclusion Three: Higher Token consumption does not necessarily lead to higher accuracy. Accuracy often peaks at medium cost—burning more beyond that spends money without yielding better results.

The paper also found that even frontier models can't reliably predict their own token consumption, generally underestimating the real cost.

You think spending more gets more done. In reality, money is spent, the work isn't necessarily better, and the budget is still unpredictable.

When AI Bills Start Rivaling Labor Costs

"This is the first time in my memory that technology costs are starting to be on par with human costs."

On May 29, Glean CEO Arvind Jain said this in an interview with CNBC's Deirdre Bosa.

Observations from Nvidia's Vice President of Applied Deep Learning, Bryan Catanzaro, corroborate this.

He mentioned in an Axios interview that for his team, compute costs far exceed employee salaries.

Similar trends are emerging across multiple companies: from enterprise AI player Glean, to AI compute seller Nvidia, to AI user Uber—all are re-evaluating this equation.

In Arvind's view, historically, technology was just a small slice of overall corporate costs. But now, AI costs are catching up to payrolls. Many companies' annual AI budgets are often burned through in just one or two months.

Over the past year, AI usage rate was a worshipped metric: more usage meant being advanced, burning tokens meant embracing the future. Now, many companies are reflecting on that simple question: What exactly did all those burned tokens buy?

The window of free or flat-rate unlimited usage is closing right now.

Going forward, the question facing every developer is how to budget carefully and get the most value out of every token.

Undoubtedly, the true winners of the future will be those who learn to count tokens first.

References:

https://x.com/dee_bosa/status/2060791500049613306

https://www.cnbc.com/2026/05/29/-tokens-or-humans-the-new-corporate-trade-off.html

https://www.axios.com/2026/05/28/ai-spending-roi-enterprise-costs

https://www.businessinsider.com/amazon-ai-leaderboard-tokenmaxxing-2026-5

This article is from the WeChat public account "AI Era Insights", author: ASI启示录
