By the end of 2025, the annual usage report released by OpenRouter, the world's largest AI model aggregation platform, showed that 47% of its users were from the United States, while Chinese developers accounted for 6%. Additionally, English comprised 83% of the platform's content calls, with Chinese making up less than 5%.
However, as of the week of April 3, 2026, six of the top ten models by call volume on the platform were from China. Ranked from highest to lowest call volume, they were: Xiaomi MiMo-V2-Pro, StepFun Step 3.5 Flash, MiniMax M2.7, DeepSeek V3.2, Zhipu GLM 5 Turbo, and MiniMax M2.5. Among them, Xiaomi's MiMo-V2-Pro topped the entire platform with 4.82 trillion tokens.
In fact, since the week of February 9 to 15, 2026, when the call volume of Chinese models first surpassed that of the U.S., the lead of Chinese models has been maintained for nearly two months.
The OpenRouter platform aggregates over 400 AI models, covering more than 60 suppliers. Its call volume data is regarded as one of the windows to observe the model preference of global developers. Developers can switch between different models at any time using the same API Key (a key used for authentication and service calls).
Chris Clark, co-founder and COO of OpenRouter, publicly stated in February 2026 that Chinese open-source models account for a disproportionately high share in the Agent workflows run by U.S. enterprises. Meanwhile, discussions in the developer community about task allocation between models and cost optimization are increasing.
Some views compare this phenomenon to China's manufacturing industry 30 years ago: at that time, China leveraged cost advantages to enter the assembly segment of the global electronics industry chain, giving rise to contract manufacturers like Foxconn and Luxshare Precision; today, Chinese large models are also using price advantages to enter the execution segment of the global AI industry chain. Some also view domestic large models as the "Foxconn of the AI era."
What role do domestic large models play in the AI industry chain? How high is the actual value of this role?
Price Advantage
A review by Economic Observer reporters of the official API pricing of various manufacturers as of the end of March 2026 revealed a huge price gap between mainstream large models from China and the U.S.
Taking input prices as an example, among Chinese models, DeepSeek V3.2 is $0.28 per million tokens, MiniMax M2.5 is $0.3, and Moonshot AI's Kimi K2.5 is $0.42. Among U.S. models, Anthropic's Claude Opus 4.6 is $5, and OpenAI's GPT-5.4 is $2.50. The input price of mainstream U.S. models is about 10 to 20 times that of mainstream Chinese models.
The gap in output prices is even more pronounced. For Chinese models, DeepSeek V3.2 is $0.42 per million tokens, MiniMax M2.5 is $1.1, and Moonshot AI's Kimi K2.5 is $2.2. For U.S. models, OpenAI's GPT-5.4 is $15, and Claude Opus 4.6 is $25. The output price gap between mainstream Chinese and U.S. models ranges from about 7 times to 60 times.
This price difference has always existed but did not trigger large-scale user migration previously for a simple reason: most people's primary use case for AI was chatting, where token consumption was low, and the price difference had minimal impact.
However, in early 2026, the emergence of a "lobster" changed all that. The open-source tool OpenClaw (referred to as "Lobster" by the developer community) quickly gained popularity around February 2026, soon topping OpenRouter's application rankings and consuming over 600 billion tokens in a single week. "Lobster" is an agent application. Unlike the past "question-and-answer" chat mode, it enables AI to autonomously perform tasks like programming, testing, and file management on a computer without step-by-step human intervention.
In this workflow, token consumption is on a completely different scale compared to chat scenarios.
For example, a programming task might require dozens of cycles of "write code -> run -> error -> modify -> run again," each cycle being a complete model call. To allow the agent to remember previous operations, each call also requires the conversation history.
Some developers have stated on social platforms that an active OpenClaw session context can easily expand to over 230,000 tokens. If using the Claude API throughout, the cost could range from $800 to $1500 per month. Some users reported that a misconfigured automated task burned through $200 in a single day.
Agent applications like OpenClaw have driven up the platform's overall token consumption. For instance, in the week of March 3 to 9, 2025, the total weekly call volume of the top ten models on OpenRouter was 1.24 trillion tokens. By the week of February 16 to 22, 2026, the weekly call volume of just the top ten models exceeded 8.7 trillion tokens, an increase of nearly 7 times. The proportion of programming tasks in the platform's token consumption also rose from 11% in early 2025 to over 50% by the end of 2025.
When the token consumption per task increased from thousands to hundreds of thousands, the price gap between Chinese and U.S. models transformed from a negligible cost into a significant difference of hundreds or even thousands of dollars per month.
Around February 19, 2026, U.S. large model company Anthropic updated its terms of service, prohibiting users from connecting Claude subscription account credentials to third-party tools like OpenClaw and requiring pay-as-you-go billing via API. Google subsequently imposed similar restrictions. For agent applications that require frequent API calls daily, the price factor in model selection became an unavoidable issue, pushing developers onto the pay-as-you-go track.
In the core programming scenarios for agents, the capabilities of Chinese and U.S. models are already quite close.
SWE-Bench Verified is a public evaluation of programming capabilities maintained by a research team at Princeton University. The method involves having AI models fix real code issues on GitHub (the world's largest open-source code hosting platform). According to data on the public leaderboard of this evaluation, the Chinese model MiniMax M2.5, released on February 13, 2026, scored 80.2%, while the U.S. model Claude Opus 4.6, released on February 5, scored 80.8%, a difference of only 0.6 percentage points.
With comparable capabilities but vastly different prices, developers' choices were quickly reflected in the data.
In the week of February 9 to 15, 2026, Chinese model token call volume reached 4.12 trillion, surpassing the U.S. models' 2.94 trillion for the first time. The following week, Chinese model call volume rose to 5.16 trillion, a 127% increase in three weeks. During the same period, U.S. model call volume dropped to 2.7 trillion.
Why can Chinese large models be so much cheaper than U.S. models?
Pan Helin, a member of the Expert Committee on Information and Communication Economy of the Ministry of Industry and Information Technology, told the Economic Observer that there are two main reasons: first, the scale of China's computing power infrastructure is large with high reuse rates, leading to lower quotes; second, there is a large amount of self-built computing power within Chinese computing clusters, acquired at lower costs than overseas.
Additionally, technical routes also affect costs. Some industry insiders told reporters that mainstream Chinese large models generally adopt the MoE architecture, also known as "Mixture of Experts." Simply put, although a MoE model has a large total parameter count, only a small portion of these parameters are activated to handle a task during each operation, rather than all parameters, which significantly reduces the computational load required for each inference.
Different Paths
Martin Casado, a partner at Silicon Valley venture capital firm a16z, stated at the end of 2025 that among AI startups using open-source technology stacks, about 80% use Chinese models. He later clarified on social media that this did not mean 80% of U.S. AI startups use Chinese models, but rather that among those choosing the open-source technology route (accounting for about 20% to 30% of all U.S. AI startups), about 80% use Chinese models.
Reporters noted that multiple open-source tools have appeared on GitHub to help developers optimize costs across different models. The general idea is to grade tasks by difficulty, assigning simple tasks to free or low-cost Chinese models and reserving complex tasks for expensive U.S. models.
One project named ClawRouter provided comparative data in its documentation, showing that after adopting this mixed approach, the average cost dropped from $25 per million tokens to about $2. Anthropic's product ClaudeCode also uses a similar hierarchical design in its official documentation, defaulting to the cheapest model for routine tasks.
The premise for this model to work is that Chinese models are sufficiently capable in execution tasks. In programming, the SWE-Bench data mentioned earlier illustrates this point. But beyond programming, how large is the overall capability gap between Chinese and U.S. large models?
LMSYS Chatbot Arena is one of the globally most recognized AI model evaluation platforms. Its method involves having real users trial two models simultaneously without knowing their names, then voting for the better one, equivalent to a blind taste test for AIs.
In its comprehensive rankings as of March 25, 2026, the top five positions were all held by U.S. company models. The highest-ranked Chinese model, DeepSeek V3.2 Speciale, was sixth. The gap is more pronounced in the Hard Prompts category (specifically designed to test a model's ability to handle complex reasoning and multi-step logic tasks), where the first tier is still primarily composed of U.S. models.
Close programming capabilities but a remaining gap in complex reasoning—this is the manifestation of the differentiated capabilities between Chinese and U.S. large models today and the foundation for the viability of the "layered calling" approach.
However, unlike being locked into low-profit-margin contract manufacturing 30 years ago, Chinese large model vendors have not continuously driven prices down.
In fact, the Chinese large model industry experienced a price war starting in 2024: In May 2024, ByteDance's Volcano Engine Doubao model triggered a "price war" with a price of 0.0008 yuan per thousand tokens, followed by Alibaba Cloud and Baidu Intelligent Cloud. In the nearly year that followed, the industry saw token prices drop by over 90%, with inference computing毛利率 (gross margin) for some vendors turning negative at times.
The strategy for vendors at the time was to accept losses to gain scale and cultivate user calling habits. However, after OpenClaw's popularity surge in February 2026, token consumption growth far exceeded expectations, and computing power supply tightened.
Zhipu was the first to react. It raised API pricing when releasing the new model GLM-5 on February 12, 2026, and raised prices again when releasing GLM-5-Turbo on March 16, with a cumulative increase of 83% over the two rounds.
Zhipu CEO Zhang Peng stated at the 2025 annual performance briefing that API call pricing increased by 83% in Q1 2026, while call volume grew by 400%. According to the annual report, Zhipu's full-year revenue for 2025 was 724.3 million yuan, a year-on-year increase of 132%, and the annual recurring revenue of its MaaS (Model-as-a-Service) platform was approximately 1.7 billion yuan, a 60-fold increase in 12 months.
Zhipu wasn't the only one choosing to raise prices. On March 13, 2026, Tencent Cloud adjusted pricing for its Hunyuan series large models, with some models seeing increases of over 460%. On March 18, Alibaba Cloud and Baidu Intelligent Cloud issued price adjustment announcements on the same day, with increases for AI computing power-related products ranging from 5% to 34%, effective April 18.
Li Bin, Senior Vice President of Sugon, told the Economic Observer in an interview that the evaluation metrics for computing power systems are changing. The past standard for measuring a system was its amount of computing power, but now it's about how economically it can produce tokens.
The shift from collective price cuts to collective price hikes took less than two years.
In March 2026, Liu Liehong, head of the National Data Bureau, announced a set of figures at the China Development Forum: China's daily token call volume has exceeded 140 trillion, an increase of over 1000 times compared to two years ago.
At the GTC conference the same month, NVIDIA founder Jensen Huang stated that tokens would be the most core commodity in the future digital world.
In Pan Helin's view, the competitiveness of Chinese large models is strong; they are not catching up but leading, especially on the AI application end. However, he also stated that China still has room for improvement in original innovation. The core architectures in the current AI system, from artificial neural networks to attention mechanisms, were first proposed overseas and then iterated upon domestically. The next step for Chinese large models is to continue efforts on the application end while also pursuing original innovation in basic algorithms.
The consumer electronics contract manufacturing industry 30 years ago had a characteristic: the profit margin of the assembly segment was firmly suppressed by upstream brand owners. Many leading contract manufacturers still have gross margins not exceeding 10% today. Cost advantages brought orders but did not bring pricing power.
Currently, the situation of Chinese large models seems somewhat similar to the consumer electronics contract manufacturing industry back then, but seems quite different regarding pricing power. For example, after Zhipu raised prices by 83%, call volume grew by 400%. Alibaba Cloud, Baidu Intelligent Cloud, and Tencent Cloud collectively raised prices for AI computing power and model services in March 2026; demand did not shrink, and call volume continued to grow.
On the SWE-Bench programming evaluation, the gap between top Chinese models and top U.S. models has narrowed to less than 1 percentage point. The gap in complex reasoning remains, but it is also narrowing rapidly.
This time, the development path for Chinese large model manufacturers seems to be different.
This article is from the WeChat public account "Economic Observer", author: Zheng Chenye







