Original title: The Largest U.S. Crypto Exchange Quietly Switched to a Chinese AI Model, Cutting Costs in Half
Original author: AI Hands-On Guide
A Number That Makes Silicon Valley Uneasy
Recently, a statement by Brian Armstrong, CEO of Coinbase, the largest U.S. cryptocurrency exchange, caused a stir in tech circles:
"We switched our AI models to China's GLM 5.2 and Kimi 2.7, cutting our AI spend directly in half."
Cut in half? So did usage go down?
Quite the opposite. Coinbase's token usage has been increasing.
Spending less while using more: that is what truly unsettles OpenAI and Anthropic.
How Did They Do It? Three Cost-Saving Strategies
Coinbase didn't just swap to a cheaper model. They built a complete "cost-saving system":
First Trick: Don't Commit to One Model, Let the System Choose
Coinbase built an automatic routing system: for each incoming request, it selects the most suitable model based on task type, price, and cache status.
Not every task needs the most expensive model. Cheap models can handle simple translation, while stronger ones take on complex reasoning, just as you wouldn't drive a sports car to the grocery store downstairs.
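A routing layer like the one described can be sketched in a few lines. This is a minimal illustration, not Coinbase's system: the model names (`small-model`, `mid-model`, `large-model`), their prices, the capability tiers, the task-to-tier mapping, and the 10% cached-input rate are all hypothetical placeholders. The router keeps only models capable enough for the task type, then picks the one with the lowest estimated cost, where a model that already holds the request's prompt in cache gets the discounted input rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float   # USD per million input tokens (placeholder)
    output_price: float  # USD per million output tokens (placeholder)
    tier: int            # hypothetical capability tier: higher = stronger


# Hypothetical model catalog; names and prices are illustrative only.
CATALOG = [
    Model("small-model", 0.20, 0.80, tier=1),
    Model("mid-model", 1.40, 4.40, tier=2),
    Model("large-model", 5.00, 25.00, tier=3),
]

# Hypothetical minimum tier per task type; unknown types default to the top tier.
REQUIRED_TIER = {"translation": 1, "summary": 1, "coding": 2, "hard_reasoning": 3}

# Assumption: cached input tokens are billed at 10% of the normal input price.
CACHED_INPUT_RATE = 0.10


def estimated_cost(model: Model, input_tokens: int, output_tokens: int,
                   cached_models: frozenset[str]) -> float:
    """Estimated USD cost of one request, using the discounted rate on a cache hit."""
    input_price = model.input_price
    if model.name in cached_models:
        input_price *= CACHED_INPUT_RATE
    return (input_tokens * input_price + output_tokens * model.output_price) / 1_000_000


def route(task_type: str, input_tokens: int, output_tokens: int,
          cached_models: frozenset[str] = frozenset()) -> Model:
    """Pick the cheapest model whose tier is high enough for the task."""
    needed = REQUIRED_TIER.get(task_type, max(m.tier for m in CATALOG))
    eligible = [m for m in CATALOG if m.tier >= needed]
    return min(eligible,
               key=lambda m: estimated_cost(m, input_tokens, output_tokens, cached_models))


print(route("translation", 10_000, 1_000).name)  # small-model
print(route("coding", 200_000, 1_000).name)  # mid-model
print(route("coding", 200_000, 1_000, frozenset({"large-model"})).name)  # large-model
```

With these placeholder numbers, a long coding request goes to `mid-model` by default but flips to `large-model` when that model already holds the prompt in cache, which is why cache status belongs in the routing decision alongside price and task type.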
Second Trick: Boosting Cache Hit Rate from 5% to 60%
This was the most effective move. By optimizing its caching strategy, Coinbase raised the cache hit rate from 5% to 60%.
Put simply, 60% of requests can now reuse previously computed results, which significantly lowers the actual cost per call. This one optimization alone saved a large amount of money.
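The arithmetic behind a higher hit rate is simple. Treat an uncached request as costing 1.0 and assume, purely for illustration, that a cache hit costs 10% of that (real discounts vary by provider and usually apply only to the cached input tokens; the article does not say what Coinbase pays). The average cost per request is then a weighted mix of hits and misses:

```python
def effective_cost(hit_rate: float, hit_cost_ratio: float) -> float:
    """Average cost per request, relative to an uncached request costing 1.0.

    hit_rate: fraction of requests served (at least partly) from cache.
    hit_cost_ratio: cost of a cache hit relative to a miss (an assumption).
    """
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= hit_cost_ratio <= 1.0):
        raise ValueError("hit_rate and hit_cost_ratio must be in [0, 1]")
    misses = (1.0 - hit_rate) * 1.0
    hits = hit_rate * hit_cost_ratio
    return misses + hits


ASSUMED_HIT_COST = 0.10  # hypothetical: a hit costs 10% of a miss

before = effective_cost(0.05, ASSUMED_HIT_COST)
after = effective_cost(0.60, ASSUMED_HIT_COST)
print(f"before={before:.3f} after={after:.3f} saving={1 - after / before:.1%}")
# before=0.955 after=0.460 saving=51.8%
```

Under that assumed 10% ratio, going from a 5% to a 60% hit rate cuts the average cost per request by a little over half. This illustrates the mechanism only; it is not a reconstruction of Coinbase's figures, which also reflect the model switch.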
Third Trick: Context Engineering
Coinbase requires developers to streamline context, starting new sessions for new tasks instead of cramming too much into a single conversation.
This isn't laziness; it's an emerging discipline the industry calls context engineering. In a technical blog post, Anthropic stated explicitly that when managing AI agents, context engineering is more effective than prompt engineering.
In short, the goal is not to make the AI smarter but to give it more precise information.
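The "new task, new session" rule can be sketched as follows. This is a hypothetical illustration, not Coinbase's or Anthropic's tooling: `Session`, `relevance`, and `start_session` are made-up names, and the word-overlap score is a deliberately crude stand-in for whatever retrieval a real system would use. Each new task opens a fresh session containing only the system prompt, a handful of relevant notes, and the task itself, rather than the full history of earlier conversations.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One conversation dedicated to one task (hypothetical structure)."""
    task: str
    messages: list[str] = field(default_factory=list)


def relevance(note: str, task: str) -> int:
    # Crude relevance score: how many words of the task also appear in the note.
    task_words = set(task.lower().split())
    return len(task_words & set(note.lower().split()))


def start_session(task: str, notes: list[str], system_prompt: str,
                  max_notes: int = 2) -> Session:
    """Open a fresh session for a new task with a deliberately small context.

    Instead of replaying earlier history, keep only the system prompt,
    the few notes most relevant to this task, and the task itself.
    """
    ranked = sorted(notes, key=lambda n: relevance(n, task), reverse=True)
    picked = [n for n in ranked[:max_notes] if relevance(n, task) > 0]
    return Session(task=task, messages=[system_prompt, *picked, task])


notes = [
    "weekly report format uses three bullet sections",
    "last month we discussed vacation plans",
    "the report is due friday",
]
session = start_session("draft the weekly report", notes, "You are a concise assistant.")
for message in session.messages:
    print(message)
```

The unrelated note about vacation plans never reaches the model, and nothing from a previous task's session is carried over.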

▲ More and more enterprises are becoming cost-conscious about AI models
It's Not Just Coinbase, It's a Trend
Coinbase isn't the first to try this.
Lindy, an AI startup with only 25 employees, is one example: CEO Flo Crivello replaced Claude with DeepSeek across the board. He told CNBC: "AI costs have already exceeded human labor costs; this is unsustainable." After the switch, costs "plummeted," saving millions of dollars.
Snowflake CEO Sridhar Ramaswamy ran a hands-on comparison: on 103 coding tasks, GLM-5.2 solved 66% and Claude Opus 4.7 solved 67%. The gap is almost nonexistent.
But the price difference is real:
Price Comparison (per million tokens)
- GLM-5.2: Input $1.40 / Output $4.40
- Claude Opus 4.7: Input $5 / Output $25
- GPT-5.5: Input $5 / Output $30
Output prices differ by a factor of roughly 5.7 (Claude Opus 4.7) to 6.8 (GPT-5.5); input prices differ by about 3.6 times.
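To see what these per-million-token prices mean for an actual bill, here is a small calculator using the three prices listed above. The workload (10 million input tokens and 2 million output tokens) is a hypothetical example; substitute your own ratio of input to output.

```python
# USD per million tokens (input, output), from the price comparison above.
PRICES = {
    "GLM-5.2": (1.40, 4.40),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens, 2M output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 10_000_000, 2_000_000

baseline = job_cost("GLM-5.2", INPUT_TOKENS, OUTPUT_TOKENS)
for name in PRICES:
    cost = job_cost(name, INPUT_TOKENS, OUTPUT_TOKENS)
    output_ratio = PRICES[name][1] / PRICES["GLM-5.2"][1]
    print(f"{name}: ${cost:.2f} total ({cost / baseline:.1f}x), output price {output_ratio:.1f}x")
```

On this input-heavy mix, the blended gap (about 4.4x between Claude Opus 4.7 and GLM-5.2) is smaller than the output-price gap, because input prices differ by only about 3.6x. The more output a workload generates, the closer the total ratio moves toward the 5.7x to 6.8x output figure.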
You Get What You Pay For? Don't Rush to Judgment
Reading this, you might ask: at such a steep discount, is the quality really the same?
To be honest, not exactly, but the gap is smaller than you think.
Snowflake's tests showed that GLM-5.2 is indeed less stable on some tasks: its first-attempt success rate was 47.6%, versus 53.7% for Opus. GLM also sometimes "doubles down" on a wrong path. On one task, it spent 24 minutes and made 411 tool calls, yet still failed; Opus finished the same task in 9 minutes with 49 calls.
But on most tasks, the two end up with almost the same final success rate. The key question is: are you willing to pay five times as much for a few percentage points of stability?
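One way to weigh stability against price is to compute the expected cost per solved task rather than per attempt. The sketch below assumes, hypothetically, that each attempt uses the same 200,000 input and 20,000 output tokens on both models and that retries succeed independently at the first-attempt rates from Snowflake's test (47.6% and 53.7%), so the expected number of attempts is 1/p. Prices come from the comparison table.

```python
def attempt_cost(input_price: float, output_price: float,
                 input_tokens: int, output_tokens: int) -> float:
    """USD cost of one attempt at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost to get one success if attempts succeed independently.

    With success probability p per attempt, the number of attempts until the
    first success is geometric with mean 1/p.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical attempt size, assumed equal for both models.
IN_TOK, OUT_TOK = 200_000, 20_000

glm = cost_per_success(attempt_cost(1.40, 4.40, IN_TOK, OUT_TOK), 0.476)
opus = cost_per_success(attempt_cost(5.00, 25.00, IN_TOK, OUT_TOK), 0.537)
print(f"GLM-5.2: ${glm:.2f} per solved task")
print(f"Claude Opus 4.7: ${opus:.2f} per solved task")
print(f"GLM-5.2 stays cheaper unless it uses over {opus / glm:.1f}x the tokens per attempt")
```

Under these assumptions, Opus costs roughly 3.6 times as much per solved task, so GLM-5.2 stays cheaper unless it burns more than about 3.6 times as many tokens per attempt. A runaway task like the 411-call example above can erase that margin, which is why the stability gap still matters.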
For many enterprises, the answer is increasingly clear: No.

▲ The price gap between Chinese and Western AI models is reshaping the industry landscape
What Does This Mean for Ordinary Users Like Us?
You might say: I'm not Coinbase, what does this have to do with me?
Actually, this trend has three direct implications for how you use AI:
1. Don't Stick to Just One Model
Many people stick to a single AI, either ChatGPT or Claude. Power users no longer work that way: using different models for different tasks is the most cost-effective approach.
Use cheap models for everyday Q&A and stronger ones for coding and analysis, the same way you don't eat at a Michelin-star restaurant for every meal.
2. Caching and Reuse Are Key to Saving Money
If you frequently use AI for similar tasks (e.g., writing weekly reports, organizing daily notes), learning to use caching and templates can significantly reduce consumption.
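For readers who call models through an API, one concrete way to combine caching and templates is to keep the fixed instructions at the start of every prompt and append only the part that changes. Prompt caches such as OpenAI's and Anthropic's reuse work only for an identical leading prefix, and they typically require a minimum prefix length, so check your provider's documentation for the exact rules. The names `WEEKLY_REPORT_PREFIX`, `WEEKLY_REPORT_BODY`, and `build_prompt` below are hypothetical, as are the example dates and notes.

```python
from string import Template

# Fixed instructions: identical on every call, so a provider's
# prefix cache (where available) can reuse them.
WEEKLY_REPORT_PREFIX = (
    "You write concise weekly reports.\n"
    "Use three sections: Done this week, Blockers, Plan for next week.\n"
    "Keep each section to at most five bullet points.\n"
)

# Variable part: always appended after the prefix, never inserted into it.
WEEKLY_REPORT_BODY = Template("Week of $week. Raw notes:\n$notes\n")


def build_prompt(week: str, notes: str) -> str:
    """Hypothetical helper: stable prefix first, changing content last."""
    return WEEKLY_REPORT_PREFIX + WEEKLY_REPORT_BODY.substitute(week=week, notes=notes)


first = build_prompt("2025-06-02", "- shipped the login page")
second = build_prompt("2025-06-09", "- fixed the export bug")
print(first.startswith(WEEKLY_REPORT_PREFIX) and second.startswith(WEEKLY_REPORT_PREFIX))
```

Every weekly prompt begins with the same characters, so only the short variable tail differs from one week to the next. Putting a date or other changing value at the top of the prompt instead would break the shared prefix on every call.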
3. Streamlined Context = Better Results
Many people try to cram all their background information into an AI conversation. In practice, though, giving the AI less but more precise information yields better results. For a new task, start a new conversation; don't make the AI dig through a pile of history for answers.
Deeper Change: The AI Pricing Model Is Being Reshaped
Behind this wave of "model migration" lies a shakeup in the entire AI industry's pricing logic.
The high valuations of OpenAI and Anthropic rest on the assumption of "continuously high revenue growth." But if more and more enterprises follow the lead of Coinbase and Lindy and switch to cheaper alternatives, that assumption won't hold.
According to reports, a price war has already started between OpenAI and Anthropic. In OpenAI's newly released GPT-5.6 series, the Terra model is half the price of GPT-5.5, and Luna is positioned as the lowest-cost option.
For users, this is good news: more competition means lower prices and more choices.
When American giants start using Chinese models to save money, it shows that AI competition is no longer just a benchmark race in the lab, but a real cost competition involving hard cash. Being able to do the same thing with less money is the real skill.