When US Giants Collectively "Defect" to Chinese AI Models

Published by marsbit on 2026-07-03, updated 2026-07-03

Article Summary

When Silicon Valley Giants Turn to Chinese AI Models to Cut Costs

A surprising trend is emerging: major U.S. tech companies are significantly reducing AI costs by switching to Chinese models. Coinbase, the largest U.S. cryptocurrency exchange, reportedly halved its AI spending after migrating to China's GLM-5.2 and Kimi 2.7 models, even as its usage increased. It achieved this through a three-part strategy: an automatic routing system that selects the most cost-effective model for each task, a cache hit rate boosted from 5% to 60% so that previous computations can be reused, and "context engineering" that gives the AI more precise, less cluttered information. Coinbase is not alone. AI startup Lindy switched from Claude to DeepSeek, saving millions, while Snowflake's tests found that GLM-5.2 solved 66% of coding tasks compared to Claude Opus's 67%, at a fraction of the cost (output pricing is 5-7 times lower). While the top Western models may offer slightly better stability, the massive price differential is leading many businesses to reconsider whether the premium is worth paying. This shift signals a deeper change in the AI industry, from pure performance benchmarks to fierce cost competition. As pressure mounts, even OpenAI and Anthropic have begun cutting prices. For users, this means more choices, lower costs, and a crucial lesson: using multiple models based on task complexity, optimizing with caching, and keeping contexts lean are now key to leveraging AI efficient...

Original Title: US Largest Crypto Exchange Quietly Switches to Chinese AI Model, Saves Half the Cost

Original Author: AI Hands-on Notes

A Data Point That Makes Silicon Valley Uneasy

Recently, a statement by Brian Armstrong, CEO of Coinbase, the largest US cryptocurrency exchange, caused a stir in the tech community:

"We switched our AI models to China's GLM 5.2 and Kimi 2.7, cutting AI expenses in half."

Cut in half? Did usage also drop?

On the contrary. Coinbase's token usage has been consistently increasing.

Using more while spending less is what truly makes OpenAI and Anthropic uneasy.

How Did They Do It? Three Cost-Saving Strategies

Coinbase didn't just swap to a cheaper model. They built a complete "cost-saving system":

First Move: Don't Lock into One Model, Let the System Choose

Coinbase built an automated routing system. For each incoming request, the system automatically selects the most suitable model based on task type, price, and cache status.

Not every task requires the most expensive model. Simple translations use cheaper ones; complex reasoning uses better ones—just like you wouldn't drive a sports car to buy groceries downstairs.
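A minimal sketch of such a router, in Python. Coinbase has not published its implementation: the model names (`budget-model`, `frontier-model`), the capability tiers, the task-type mapping and the cache-first tie-break are all hypothetical, and the prices are borrowed from the article's price comparison only for scale.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class ModelOption:
    name: str
    output_price: float  # USD per million output tokens
    tier: int            # hypothetical capability tier: 1 = basic, 2 = strong

# Hypothetical catalog; prices borrowed from the article's comparison for scale.
CATALOG = [
    ModelOption("budget-model", 4.40, 1),
    ModelOption("frontier-model", 25.00, 2),
]

# Hypothetical minimum tier per task type; unknown types get the strong tier.
REQUIRED_TIER = {"translation": 1, "summarization": 1, "complex_reasoning": 2}

def route(task_type: str, warm_cache: FrozenSet[str] = frozenset()) -> str:
    """Return the name of the model to call for this request."""
    needed = REQUIRED_TIER.get(task_type, 2)
    eligible = [m for m in CATALOG if m.tier >= needed]
    # Prefer models whose cache is already warm (False sorts before True), then the cheapest.
    best = min(eligible, key=lambda m: (m.name not in warm_cache, m.output_price))
    return best.name

print(route("translation"))        # budget-model
print(route("complex_reasoning"))  # frontier-model
```

Putting cache status ahead of price reflects the idea that a warm cache can make an otherwise pricier model cheaper for a given request; a production router would more likely estimate the cost of each request than rely on a fixed ordering.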

Second Move: Boost Cache Hit Rate from 5% to 60%

This is the most impactful move. By optimizing caching strategies, Coinbase increased the cache hit rate from 5% to 60%.

Simply put, 60% of the time the system can reuse previously computed results instead of recomputing them, which significantly lowers the effective cost per call. This single optimization saved a substantial amount of money.
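To see why the hit rate matters so much, here is a back-of-the-envelope calculation. It assumes, hypothetically, that the hit rate is the share of input tokens served from cache and that a cached token is billed at 10% of the normal input price; real providers set their own discounts, and the article does not say exactly how Coinbase measures its hit rate.

```python
def effective_input_cost(price_per_m: float, hit_rate: float, cached_discount: float) -> float:
    """Blended USD cost per million input tokens.

    hit_rate: fraction of input tokens served from cache (0..1).
    cached_discount: fraction of the normal price charged for a cached token
    (hypothetical; real providers set their own rates).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return price_per_m * ((1 - hit_rate) + hit_rate * cached_discount)

# GLM-5.2 input price from the article; the 10% cached-token rate is an assumption.
before = effective_input_cost(1.40, 0.05, 0.10)
after = effective_input_cost(1.40, 0.60, 0.10)
print(f"before: ${before:.3f}/M, after: ${after:.3f}/M, saving {1 - after / before:.0%}")
```

Under these assumptions, moving from a 5% to a 60% hit rate cuts the blended input price roughly in half. Output tokens are not discounted in this model, so the saving on a full bill would be smaller.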

Third Move: Context Engineering

Coinbase requires developers to streamline context, start new sessions for new tasks, and avoid cramming too much into a single conversation.

This isn't laziness; it's an emerging discipline that the industry calls Context Engineering. In a technical blog post, Anthropic explicitly stated that when managing AI agents, context engineering is more effective than prompt engineering.

Simply put: it's not about making the AI smarter, but giving it more precise information.
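A minimal Python sketch of two context-engineering habits described above: open a fresh session whenever the task changes, and trim old messages once a context budget is exceeded. The `Session` class, the word-count stand-in for tokens and the 2,000-token default budget are hypothetical choices, not Coinbase's or Anthropic's method.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Session:
    """One conversation per task, capped by a rough token budget (hypothetical policy)."""
    task_id: str
    max_tokens: int = 2000
    messages: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.messages.append(text)
        # Evict the oldest messages after the first one (the task statement),
        # always keeping the newest message, until the size fits the budget.
        while self.size() > self.max_tokens and len(self.messages) > 2:
            del self.messages[1]

    def size(self) -> int:
        # Crude estimate: whitespace-separated words stand in for tokens.
        return sum(len(m.split()) for m in self.messages)

def session_for(task_id: str, current: Optional[Session]) -> Session:
    """Reuse the session only for the same task; a new task gets a clean context."""
    if current is None or current.task_id != task_id:
        return Session(task_id)
    return current

s = session_for("weekly-report", None)
s.add("Draft the weekly report")
s = session_for("organize-notes", s)  # different task: fresh, empty session
print(s.task_id, len(s.messages))     # organize-notes 0
```

Dropping the oldest turns is only one trimming policy; summarizing old turns or selecting only relevant documents are common alternatives, with the same goal of keeping the context small and on-topic.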

▲ More and more enterprises are starting to be meticulous about AI model costs

Not Just Coinbase, This is a Trend

Coinbase isn't the first to try this.

At Lindy, a 25-person AI startup, CEO Flo Crivello replaced Claude entirely with DeepSeek. He told CNBC: "AI costs have already surpassed human costs; this is unsustainable." After the switch, costs "plummeted," saving millions of dollars.

Snowflake CEO Sridhar Ramaswamy ran a hands-on comparison: on 103 coding tasks, GLM-5.2 solved 66%, while Claude Opus 4.7 solved 67%. The gap? Just one percentage point.

But the price gap is real:

Price Comparison (Per Million Tokens)

  • GLM-5.2: Input $1.40 / Output $4.40
  • Claude Opus 4.7: Input $5 / Output $25
  • GPT-5.5: Input $5 / Output $30

Output prices differ by a factor of roughly 5 to 7.
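The ratios can be checked directly from the listed prices with a few lines of Python:

```python
# Prices in USD per million tokens (input, output), as listed in the comparison above.
PRICES = {
    "GLM-5.2": (1.40, 4.40),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}

def price_ratios(base: str) -> dict:
    """Return (input_ratio, output_ratio) of every other model relative to `base`."""
    base_in, base_out = PRICES[base]
    return {
        name: (round(inp / base_in, 2), round(out / base_out, 2))
        for name, (inp, out) in PRICES.items()
        if name != base
    }

print(price_ratios("GLM-5.2"))
# {'Claude Opus 4.7': (3.57, 5.68), 'GPT-5.5': (3.57, 6.82)}
```

Output is about 5.7 times (Opus) and 6.8 times (GPT-5.5) more expensive than GLM-5.2, while the input gap is a smaller 3.6 times.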

Does Cheap Mean Worse? Don't Jump to Conclusions

Reading this, you might ask: if it's so much cheaper, is the quality really the same?

Honestly, not exactly the same, but the gap is smaller than you think.

Snowflake's tests showed that GLM-5.2 is indeed less stable on certain tasks—first-attempt success rate was 47.6%, lower than Opus's 53.7%. Also, GLM sometimes "perseverates" on the wrong approach: on one task, it spent 24 minutes making 411 tool calls and still failed. Opus solved it in 9 minutes with 49 calls.

But on most tasks, the final success rates of the two were almost equal. The key question is: Are you willing to pay 5 times more for a few percentage points of stability?

For many companies, the answer is increasingly clear: No.

▲ The price gap between Chinese and Western AI models is reshaping the industry landscape

What Does This Mean for the Rest of Us?

You might say: I'm not Coinbase, what does this have to do with me?

Actually, this trend offers three direct insights into how you use AI:

1. Don't Stick to Just One Model

Many people swear by a single AI model, either ChatGPT or Claude. But power users no longer work that way. Using different models for different tasks is the most cost-effective approach.

Use cheaper ones for daily Q&A; use better ones for coding and analysis. It's like eating; you don't go to a Michelin-starred restaurant for every meal.

2. Caching and Reuse are Key to Saving Money

If you often use AI for similar tasks (like writing weekly reports or organizing notes daily), learning to leverage caching and templates can significantly reduce consumption.
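One reason templates help: many provider-side prompt caches reuse work only for an identical leading portion of the prompt, so keeping fixed instructions at the start and the changing material at the end maximizes what can be reused. The sketch below assumes such prefix-based caching; the template text and function names are hypothetical.

```python
import os.path

# Hypothetical fixed instructions reused for every weekly report.
REPORT_PREFIX = (
    "You are an assistant that writes concise weekly reports.\n"
    "Format: Summary, Done, Next week, Risks.\n"
)

def build_prompt(notes: str) -> str:
    """Stable instructions first, this week's variable notes last."""
    return REPORT_PREFIX + "Notes:\n" + notes

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading text: roughly what a prefix cache could reuse."""
    # os.path.commonprefix compares character by character, so it works on any strings.
    return len(os.path.commonprefix([a, b]))

week1 = build_prompt("Shipped the login page")
week2 = build_prompt("Fixed the billing bug")
print(shared_prefix_len(week1, week2), "of", len(week2), "characters are shared")
```

If the notes were placed before the instructions instead, the two prompts would diverge at the first character and nothing could be reused.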

3. Streamline Context = Better Results

Many people feed AI every bit of background information they have. In practice, though, giving AI less but more precise information tends to produce better results. New task? Start a new conversation. Don't make the AI dig through a pile of history for answers.

Deeper Change: AI Pricing Models are Being Reshaped

Behind this wave of "model migration" is a shake-up of the entire AI industry's pricing logic.

The high valuations of OpenAI and Anthropic are built on the assumption of "continued high-speed revenue growth." But if more and more companies, like Coinbase and Lindy, switch to cheaper alternatives, this assumption crumbles.

Reportedly, OpenAI and Anthropic have already begun a price war. In OpenAI's newly released GPT-5.6 series, the Terra model is half the price of GPT-5.5, and the Luna model focuses on being the lowest-cost option.

For users, this is good news. The fiercer the competition, the lower the prices, and the more choices available.

When US giants start using Chinese models to save money, it shows that AI competition is no longer just a benchmark race in the lab, but a real cost competition measured in hard cash. The real skill is achieving the same results while spending less.


Related Q&A

Q: According to the article, what specific cost-saving measures did Coinbase implement when switching to Chinese AI models?

A: Coinbase implemented three main cost-saving strategies: 1) Building an automatic routing system that selects the most appropriate model for each task based on type, price, and cache status. 2) Dramatically improving the cache hit rate from 5% to 60%, reusing previous computations. 3) Applying 'Context Engineering,' which involves keeping prompts concise and starting new sessions for new tasks to avoid bloated contexts.

Q: What performance and price comparison between GLM-5.2 and Claude Opus 4.7 is presented in the article?

A: In a test on 103 coding tasks, GLM-5.2 solved 66% while Claude Opus 4.7 solved 67%, showing minimal performance difference. However, the price gap was significant: GLM-5.2 costs $1.40 per million tokens for input and $4.40 for output, whereas Claude Opus 4.7 costs $5 and $25 respectively. That makes GLM's output roughly 5.7 times cheaper than Opus's; across the Western models compared, output prices are 5-7 times higher than GLM's.

Q: Besides Coinbase, which other companies are mentioned as switching to more affordable AI models, and what were their reasons?

A: Two other companies are mentioned: 1) Lindy, a 25-person AI startup, replaced Claude with DeepSeek because AI costs had surpassed human labor costs; after the switch, costs "plummeted," saving millions of dollars. 2) Snowflake's CEO ran tests comparing GLM-5.2 and Claude Opus, highlighting the massive price difference for similar performance.

Q: What does the article suggest are the key takeaways for individual users regarding efficient AI usage?

A: The article suggests three key takeaways for individuals: 1) Don't rely on just one model; use different models for different tasks for the best cost-performance ratio. 2) Utilize caching and templates for repetitive tasks to reduce consumption. 3) Practice 'Context Engineering': keep prompts focused and start new sessions for new tasks instead of using long, cluttered conversations.

Q: What broader industry shift does the trend of companies switching to Chinese AI models signify, according to the article?

A: The trend signifies a fundamental shake-up in the AI industry's pricing logic. It challenges the assumption of sustained high revenue growth for companies like OpenAI and Anthropic. As competition intensifies, as evidenced by OpenAI's own price cuts with models like Terra and Luna, the market is shifting from a pure performance race to a real-world cost-efficiency battle, benefiting users with lower prices and more choices.
