Claude Bill Skyrockets to $500 Million, Surges 60-Fold Overnight: Can Your Token Budget Keep Up?

marsbit | Published on 2026-06-01 | Last updated on 2026-06-01

Abstract

An enterprise reportedly ran up a staggering $500 million bill on Anthropic's Claude AI in just one month due to a simple oversight: failing to set usage limits for employee accounts. This incident highlights a growing trend of runaway AI costs. Other examples include a Google Cloud user hit with an unexpected $18,000 bill from API key abuse, and an OpenAI internal experiment that consumed 603 billion tokens, costing $1.3 million in 30 days. Major AI providers like OpenAI and GitHub are shifting from flat monthly fees to granular, usage-based pricing (per input/output/cached token), causing shock for some users whose costs skyrocketed by orders of magnitude. The root causes extend beyond pricing. The rise of autonomous AI agents executing long, complex tasks has drastically increased token consumption. Furthermore, misaligned incentives, like internal "leaderboards" ranking employees by AI usage, can encourage wasteful "tokenmaxxing"—using powerful models for trivial tasks just to inflate metrics. This has sparked a new industry focused on cost optimization. Solutions include providing AI with better context (reducing redundant searches) and intelligent model routing (matching tasks to the most cost-effective model). Research indicates token consumption for agentic tasks can vary wildly (up to 30x for the same job) without guaranteeing better results, and models often underestimate their own costs. As AI expenses begin to rival or even surpass human labor costs for some t...

A $500 Million Bill in Just 1 Month!

Recently, a shocking blunder erupted in the tech world. According to Axios, a company actually managed to rack up a $500 million bill on Claude in just one month!

The reason is laughable: management forgot to set usage limits when granting employees access to Claude accounts.

In fact, this isn't the only case of AI bills exploding.

In April, a Google Cloud user whose publicly accessible API key was misused received an $18,000 bill overnight, despite having set a budget alert of only about $7.

The unlucky user, Jesse Davies, is an Australian AI consultant and founder of Agentic Labs. He had set up two safeguards for his Google Cloud account: an A$10 (about $7) budget alert and a hard spending cap of $1,400.

As reported by Tom's Hardware, attackers discovered a Cloud Run service he had deployed months earlier from AI Studio and sent it over 60,000 requests. Both safeguards failed: billing calculations lagged behind actual usage, and by the time the system reacted, the amount had skyrocketed to $18,000.

In mid-May, Peter Steinberger, founder of the open-source project OpenClaw, posted a screenshot on X: a $1.3 million OpenAI API bill for 30 days.

His team has only three people, but they orchestrated 100 Codex agents running in parallel, burning through 603 billion tokens and making 7.6 million requests in 30 days. Fortunately, he didn't have to foot this $1.3 million bill himself.

Steinberger joined OpenAI this past February, and this $1.3 million was treated as an internal experiment to test the absolute limits of AI programming when token cost is not a consideration. He added that this was the result of Codex's "Fast Mode" (higher-tier billing); turning it off reduced the cost to about $300,000.
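To put those totals in perspective, the quoted figures can be reduced to blended averages. This is a back-of-the-envelope sketch using only the numbers reported above; the real bill mixes input, cached, and output rates, so these averages are illustrative rather than actual prices.

```python
# Figures quoted in the article for the 30-day experiment.
total_cost_usd = 1_300_000
total_tokens = 603_000_000_000
total_requests = 7_600_000

# Blended averages only; actual billing applies different rates
# to input, cached input, and output tokens.
usd_per_million_tokens = total_cost_usd / (total_tokens / 1_000_000)
usd_per_request = total_cost_usd / total_requests
tokens_per_request = total_tokens / total_requests

print(f"${usd_per_million_tokens:.2f} per million tokens")   # $2.16
print(f"${usd_per_request:.3f} per request")                 # $0.171
print(f"{tokens_per_request:,.0f} tokens per request")       # 79,342
```

At roughly 79,000 tokens per request, each call carries far more context than a typical chat message, which is consistent with long-running agent workloads.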

Even earlier, Uber's CTO Praveen Neppalli Naga had admitted to The Information that the company had exhausted its annual Claude Code budget by April, and their COO also publicly stated that AI costs were becoming increasingly "hard to justify."

$500 million, $1.3 million, $18,000—though these figures differ by orders of magnitude, they point to the same reality:

In the age of agents, any one of these—a compromised key, an army of agents running 24/7, or an account with forgotten limits—can blow up your token bill overnight.
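One way to avoid depending on delayed billing data is to count estimated spend on the caller's side and refuse requests before they are sent. The sketch below is a minimal illustration of that idea, not any provider's feature: `SpendGuard`, `BudgetExceeded`, and the $0.30 per-request cost are hypothetical (the $0.30 is simply the rough average implied by $18,000 across 60,000 requests), and the $1,400 cap mirrors the one in the Google Cloud incident.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a request would push spend past the hard cap."""


@dataclass
class SpendGuard:
    """Tracks estimated spend locally instead of waiting for billing data."""
    hard_cap_usd: float
    spent_usd: float = 0.0

    def charge(self, estimated_cost_usd: float) -> None:
        # Refuse before the request is sent, so the cap holds even if the
        # provider's billing figures lag behind real usage.
        if self.spent_usd + estimated_cost_usd > self.hard_cap_usd:
            raise BudgetExceeded(
                f"cap ${self.hard_cap_usd:.2f} reached at ${self.spent_usd:.2f}"
            )
        self.spent_usd += estimated_cost_usd


guard = SpendGuard(hard_cap_usd=1_400.00)
sent = 0
try:
    for _ in range(60_000):      # request count from the incident
        guard.charge(0.30)       # hypothetical average cost per request
        sent += 1
except BudgetExceeded:
    pass

print(f"requests allowed: {sent}, spend: ${guard.spent_usd:,.2f}")
```

A guard like this only protects traffic that passes through it; a leaked key used directly against the provider would bypass it, so it complements provider-side limits rather than replacing them.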

Why Do AI Bills Explode?

The answer lies mainly in the shift in billing methods.

Starting in April this year, OpenAI began transitioning from flat monthly fees to usage-based billing by token.

On April 2, Codex billing shifted from per-message estimates to alignment with actual Token usage: Input, Cached Input, and Output Tokens are billed separately. On April 23, this rule was extended to all Enterprise, Edu, Health, and Gov plans: the invisible discount within the monthly fee was removed.

GitHub followed closely, announcing that all Copilot plans will switch to usage-based billing effective June 1, 2026. The old premium-request logic is scrapped and replaced with AI credits, which are settled based on actual consumption of Input, Output, and Cached Tokens at each model's API rate.

GitHub officially explained the reason for this change:

Currently, a quick chat question and a multi-hour autonomous coding task cost the user the same amount. GitHub has been subsidizing the heavy users, but this model is no longer sustainable.

Before the rise of AI agents, the costs of chat and completions were similar, and monthly fees could cover them.

Once agents took off, a single task could run for hours and modify entire codebases, creating a cost difference of orders of magnitude between heavy and light users. The flat monthly fee model collapses in the face of such disparity.
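The gap between a quick chat and a long agent run is easy to see once each token type is billed at its own rate. The following is a rough sketch of that calculation; the `Rates` values and the token counts for both workloads are invented for illustration and are not any provider's actual prices or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens; the numbers used below are made up."""
    input: float
    cached_input: float
    output: float


def cost_usd(rates: Rates, input_tokens: int, cached_tokens: int,
             output_tokens: int) -> float:
    """Bill each token type separately at its own rate."""
    return (
        input_tokens * rates.input
        + cached_tokens * rates.cached_input
        + output_tokens * rates.output
    ) / 1_000_000


rates = Rates(input=2.00, cached_input=0.20, output=10.00)  # hypothetical

# A short chat question versus a multi-hour agent task (assumed sizes).
chat = cost_usd(rates, input_tokens=2_000, cached_tokens=0, output_tokens=500)
agent = cost_usd(rates, input_tokens=5_000_000, cached_tokens=20_000_000,
                 output_tokens=300_000)

print(f"chat: ${chat:.4f}, agent: ${agent:.2f}, ratio: {agent / chat:,.0f}x")
```

Under these assumed numbers the agent task costs well over a thousand times more than the chat question, which is the kind of disparity a flat fee cannot absorb.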

The news sparked an uproar on Reddit and X.

A developer with the ID JBusu shared a screenshot of their bill, bluntly calling the new pricing "a joke." Their previous monthly cost of $28.12 would become $746.01 under the new system. They have decided to cancel: "At this price, I could rent a cloud server myself and it would be cheaper."

Another user shared an even more extreme screenshot, showing costs soaring from $50 to $3,000. They said they never expected pricing to be this outrageous and asked, "Is anyone still subscribing?"

However, some veteran Copilot users countered that these extreme bills are likely run up by "vibe-coders" who aren't mindful of token usage, and may not represent normal use.

One veteran user commented: "I use it all day long and rarely exceed limits by month-end. It's hard to believe this is due to differences in task complexity." Another was more direct: "It's people wanting fully automated YOLO-mode development, letting AI run wild. Culling this waste is actually good for everyone else."

One thing is clear: GitHub hasn't abolished monthly fees; the base subscription price remains unchanged. What has changed is that extra usage, agent tasks, and calls to more expensive models now fall under usage-based billing.

The hardest hit are those heavy agent users who rely on Copilot for long-chain tasks.

The Leaderboard Gamed by Its Own Users

The collapse of flat fees is partly due to platforms changing their billing rules, and partly because AI users themselves are burning through tokens.

In May, Business Insider reported that Amazon took down an internal AI usage leaderboard called KiroRank.

The report cited insiders saying the leaderboard quietly encouraged a strange work style: some employees, to climb the ranks, would burn tokens on tasks that didn't solve actual problems, purely for ranking.

After the story broke, Amazon SVP Dave Treadwell directly addressed all employees: "Don't use AI for the sake of using AI. Use it to solve customer problems, business problems, to innovate."

Though absurd, this is hardly surprising. When "burning tokens" gets you on a leaderboard, employees will naturally burn tokens.

Silicon Valley has coined a term for this phenomenon: Tokenmaxxing—treating consumption volume as productivity.

Axios's report also mentioned CTOs discovering employees using cutting-edge AI models to check the weather or write routine emails—trivial tasks that, when run on the most expensive frontier models, can silently send bills soaring.

KiroRank wasn't part of Amazon's official evaluation system but an informal tool built by employees. Yet it clearly exposes a classic management principle: when KPIs are set wrong, people will use the cleverest ways to game the system.

Equating "how much was used" with "how well it was done"—this is the systemic root of this wave of AI waste.

Those Who Count Tokens Are Already Making Money

On the flip side of token bill anxiety, some are quietly turning it into a business.

First approach: Feed the AI with context.

Glean, led by CEO Arvind Jain, builds an enterprise AI work assistant that unifies knowledge scattered across a company and gives employees' AI direct context, so it doesn't have to dig around. The AI takes fewer detours and therefore burns fewer tokens.

This mechanism helped Glean's annual revenue triple in 15 months, crossing $300 million, with clients including Databricks, Reddit, and Samsung.

Second approach: Delegate tasks to the right model.

This is what model routing startup Factory AI does: automatically routing each task to the most suitable model, with cheap models for simple tasks and top-tier models for complex ones. Jain also noted that routing done right can cut costs by 10x.

Both paths lead to the same destination: Let AI work, but don't let it burn money indiscriminately.
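The routing idea can be sketched in a few lines. This is a deliberately naive illustration and not how Factory AI or any other vendor implements it: the model names, the blended prices, the keyword list, and the length threshold are all hypothetical, and a production router would more plausibly use a trained classifier than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical blended rate


# Placeholder names and prices, chosen only to show a 10x price gap.
CHEAP = Model("small-model", 0.50)
FRONTIER = Model("frontier-model", 5.00)

# Assumed signals that a task needs the stronger model.
COMPLEX_HINTS = ("refactor", "debug", "architecture", "migrate")


def route(task: str) -> Model:
    """Send a task to the frontier model only if it looks complex."""
    text = task.lower()
    if len(text) > 500 or any(hint in text for hint in COMPLEX_HINTS):
        return FRONTIER
    return CHEAP


for task in ("What's the weather in Sydney?", "Refactor the billing module"):
    print(f"{task!r} -> {route(task).name}")
```

With these placeholder prices, every trivial request that lands on the cheap model costs a tenth as much per token as it would on the frontier model.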

Academic research is also laying the groundwork for this shift.

https://arxiv.org/pdf/2604.22750

An arXiv paper from April 2026 offered the first systematic breakdown of how agent coding tasks actually burn money.

Conclusion One: Token consumption for agent tasks can be thousands of times higher than ordinary code reasoning or code chat, with Input Tokens being the main cost driver.

Conclusion Two: Running the same task multiple times can result in a 30x difference in Token consumption.

Conclusion Three: Higher Token consumption does not necessarily lead to higher accuracy. Accuracy often peaks at medium cost—burning more beyond that spends money without yielding better results.

The paper also found that even frontier models can't reliably predict their own token consumption, generally underestimating the real cost.
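Teams that log token usage per run can check for this kind of variance themselves. The snippet below is a small sketch with invented run totals (it does not reproduce the paper's data or method); it computes the spread between the cheapest and most expensive run of the same task and flags runs far above the median, using an arbitrary 5x threshold.

```python
from statistics import median

# Hypothetical token totals for five runs of the same agent task.
runs = [120_000, 95_000, 410_000, 2_850_000, 150_000]

spread = max(runs) / min(runs)
typical = median(runs)

# Flag runs far above the typical cost; the 5x threshold is arbitrary.
outliers = [r for r in runs if r > 5 * typical]

print(f"spread: {spread:.0f}x, median: {typical:,}, outliers: {outliers}")
```

Tracking the median rather than the mean keeps one runaway run from distorting the picture of what a task normally costs.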

You think spending more gets more done. In reality, money is spent, the work isn't necessarily better, and the budget is still unpredictable.

When AI Bills Start Rivaling Labor Costs

"This is the first time in my memory that technology costs are starting to be on par with human costs."

On May 29, Glean CEO Arvind Jain said this in an interview with CNBC's Deirdre Bosa.

Observations from Nvidia's Vice President of Applied Deep Learning, Bryan Catanzaro, corroborate this.

He mentioned in an Axios interview that for his team, compute costs far exceed employee salaries.

Similar trends are emerging across multiple companies: from enterprise AI player Glean, to AI compute seller Nvidia, to AI user Uber—all are re-evaluating this equation.

In Jain's view, technology was historically just a small slice of overall corporate costs. But now, AI costs are catching up to payrolls. Many companies burn through their annual AI budgets in just one or two months.

Over the past year, AI usage rate was a worshipped metric: more usage meant being advanced, burning tokens meant embracing the future. Now, many companies are reflecting on that simple question: What exactly did all those burned tokens buy?

The window of free or flat-rate unlimited usage is closing right now.

Going forward, the question facing all developers is how to budget meticulously and maximize the value of every single token.

Undoubtedly, the true winners of the future will be those who learn to count tokens first.

References:

https://x.com/dee_bosa/status/2060791500049613306

https://www.cnbc.com/2026/05/29/-tokens-or-humans-the-new-corporate-trade-off.html

https://www.axios.com/2026/05/28/ai-spending-roi-enterprise-costs

https://www.businessinsider.com/amazon-ai-leaderboard-tokenmaxxing-2026-5

This article is from the WeChat public account "AI Era Insights", author: ASI启示录

Related Questions

Q: What is the main reason behind the dramatic increase in AI usage costs as discussed in the article?

A: The primary reason is the shift from flat-rate monthly subscription models to consumption-based pricing (charging per token used). This change, implemented by companies like OpenAI and GitHub, means that intensive AI agent tasks, which can consume orders of magnitude more tokens than simple chats or completions, now incur significantly higher costs.

Q: What incident involving a leaked API key led to a massive unexpected bill, and how much was it?

A: An Australian AI consultant named Jesse Davies had a Google Cloud API key exposed from a public service. Attackers used it to make over 60,000 requests, resulting in a bill of $18,000, despite him having set a budget alert and a hard spending limit.

Q: What does the term 'Tokenmaxxing' refer to in the context of corporate AI use?

A: 'Tokenmaxxing' refers to the practice of employees excessively consuming AI tokens, not to solve real problems, but to climb internal usage leaderboards (like Amazon's KiroRank) or meet misguided productivity KPIs that equate high token usage with good performance.

Q: What was the key finding of the April 2026 arXiv paper regarding AI agent coding tasks and cost?

A: The key finding was that AI agent tasks can consume up to a thousand times more tokens than standard code reasoning/dialogue, primarily due to input tokens. Crucially, higher token consumption does not necessarily lead to higher accuracy, with performance often plateauing at a medium cost level.

Q: According to the article, what are the two main business approaches emerging to help manage and reduce AI token costs?

A: 1. Providing context to AI: Companies like Glean build systems that give AI assistants direct access to relevant company knowledge, reducing the need for lengthy searches and context-building, thus saving tokens. 2. Model routing: Startups like Factory AI automatically route tasks to the most cost-appropriate AI model (e.g., simple tasks to cheaper models, complex ones to top-tier models), potentially saving up to 10x in costs.
