Original Title: US Largest Crypto Exchange Quietly Switches to Chinese AI Model, Saves Half the Cost
Original Author: AI Hands-on Notes
A Data Point That Makes Silicon Valley Uneasy
Recently, a statement made by Brian Armstrong, the CEO of the largest US cryptocurrency exchange, Coinbase, caused a stir in the tech circle:
"We switched our AI models to China's GLM 5.2 and Kimi 2.7, cutting AI expenses in half."
Cut in half? Did usage also drop?
On the contrary. Coinbase's token usage has been consistently increasing.
Using more while spending less is what truly makes OpenAI and Anthropic uneasy.
How Did They Do It? Three Cost-Saving Strategies
Coinbase didn't just swap to a cheaper model. They built a complete "cost-saving system":
First Move: Don't Lock into One Model, Let the System Choose
Coinbase built an automated routing system. For each incoming request, the system automatically selects the most suitable model based on task type, price, and cache status.
Not every task requires the most expensive model. Simple translations use cheaper ones; complex reasoning uses better ones—just like you wouldn't drive a sports car to buy groceries downstairs.
Second Move: Boost Cache Hit Rate from 5% to 60%
This is the most impactful move. By optimizing caching strategies, Coinbase increased the cache hit rate from 5% to 60%.
Simply put, 60% of requests can reuse previous calculation results, significantly reducing the actual cost per call. This single optimization saved a substantial amount of money.
Third Move: Context Engineering
Coinbase requires developers to streamline context, start new sessions for new tasks, and avoid cramming too much into a single conversation.
This isn't laziness; it's a new field of study—known in the industry as Context Engineering. In a technical blog, Anthropic explicitly stated: when managing AI agents, context engineering is more effective than prompt engineering.
Simply put: it's not about making the AI smarter, but giving it more precise information.

▲ More and more enterprises are starting to be meticulous about AI model costs
Not Just Coinbase, This is a Trend
Coinbase isn't the first to try this.
Lindy, an AI startup with only 25 people, had its CEO Flo Crivello completely replace Claude with Deepseek. He told CNBC: "AI costs have already surpassed human costs; this is unsustainable." After switching models, costs "plummeted," saving millions of dollars.
Snowflake's CEO Sridhar Ramaswamy conducted a hands-on comparison: on 103 coding tasks, GLM-5.2 solved 66%, while Claude Opus 4.7 solved 67%. The gap? Almost none.
But the price gap is real:
Price Comparison (Per Million Tokens)
- GLM-5.2: Input $1.40 / Output $4.40
- Claude Opus 4.7: Input $5 / Output $25
- GPT-5.5: Input $5 / Output $30
Output prices differ by 5-7 times.
Cheap Means No Good? Don't Jump to Conclusions
Reading this, you might ask: It's so much cheaper, is the quality the same?
Honestly, not exactly the same, but the gap is smaller than you think.
Snowflake's tests showed that GLM-5.2 is indeed less stable on certain tasks—first-attempt success rate was 47.6%, lower than Opus's 53.7%. Also, GLM sometimes "perseverates" on the wrong approach: on one task, it spent 24 minutes making 411 tool calls and still failed. Opus solved it in 9 minutes with 49 calls.
But on most tasks, the final success rates of the two were almost equal. The key question is: Are you willing to pay 5 times more for a few percentage points of stability?
For many companies, the answer is increasingly clear: No.

▲ The price gap between Chinese and Western AI models is reshaping the industry landscape
What Does This Mean for Us Ordinary People?
You might say: I'm not Coinbase, what does this have to do with me?
Actually, this trend offers three direct insights into how you use AI:
1. Don't Stick to Just One Model
Many people use AI and swear by just one—either ChatGPT or Claude. But professional players don't do that anymore. Using different models for different tasks is the most cost-effective approach.
Use cheaper ones for daily Q&A; use better ones for coding and analysis. It's like eating; you don't go to a Michelin-starred restaurant for every meal.
2. Caching and Reuse are Key to Saving Money
If you often use AI for similar tasks (like writing weekly reports or organizing notes daily), learning to leverage caching and templates can significantly reduce consumption.
3. Streamline Context = Better Results
Many people feed AI with every bit of background information. But facts show that giving AI less but more precise information leads to better results. New task? Start a new conversation. Don't make the AI search through a pile of history for answers.
Deeper Change: AI Pricing Models are Being Reshaped
Behind this wave of "model migration" is a shake-up of the entire AI industry's pricing logic.
The high valuations of OpenAI and Anthropic are built on the assumption of "continued high-speed revenue growth." But if more and more companies, like Coinbase and Lindy, switch to cheaper alternatives, this assumption crumbles.
Reportedly, OpenAI and Anthropic have already begun a price war. In OpenAI's newly released GPT-5.6 series, the Terra model is half the price of GPT-5.5, and the Luna model focuses on being the lowest-cost option.
For users, this is good news. The fiercer the competition, the lower the prices, and the more choices available.
When US giants start using Chinese models to save money, it shows that AI competition is no longer just a benchmark race in the lab, but a real cost competition measured in hard cash. The real skill is achieving the same results while spending less.






