Chinese Large Models: This Time, the Script Is Different

marsbit | Published 2026-04-07 | Updated 2026-04-07

Introduction

By early 2026, Chinese large language models (LLMs) have gained significant global traction, representing six of the top ten most-used on the AI model aggregation platform OpenRouter. This shift, led by models like Xiaomi's MiMo-V2-Pro, occurred after Chinese models' weekly token usage surpassed that of U.S. models in February 2026. A key driver is the substantial price gap: Chinese models are often 10–20 times cheaper for input and up to 60 times cheaper for output tokens than leading U.S. models like OpenAI’s GPT-5.4 and Anthropic’s Claude Opus. This cost advantage became critical with the rise of agentic applications like OpenClaw, which automate complex tasks (e.g., programming, testing) and consume tokens at a much higher volume than traditional chat interfaces. While U.S. models still lead in complex reasoning benchmarks, Chinese models have nearly closed the gap in programming tasks—evidenced by near-parity scores on the SWE-Bench coding evaluation. This enabled cost-conscious developers, especially in AI startups using open-source stacks, to adopt a "layered" approach: using Chinese models for routine tasks and reserving premium U.S. models for harder problems. Rising demand led Chinese firms like Zhipu and Tencent to increase API prices in early 2026, yet usage continued growing sharply. Analysts note that China’s cost edge stems from large-scale, efficient compute infrastructure and widespread adoption of MoE (Mixture of Experts) architecture. Unlike the low-marg...

By the end of 2025, the annual usage report released by OpenRouter, the world's largest AI model aggregation platform, showed that 47% of its users were from the United States, while Chinese developers accounted for 6%. Additionally, English comprised 83% of the platform's content calls, with Chinese making up less than 5%.

However, as of the week of April 3, 2026, six of the top ten models by call volume on the platform were from China. Ranked from highest to lowest call volume, they were: Xiaomi MiMo-V2-Pro, StepFun Step 3.5 Flash, MiniMax M2.7, DeepSeek V3.2, Zhipu GLM 5 Turbo, and MiniMax M2.5. Among them, Xiaomi's MiMo-V2-Pro topped the entire platform with 4.82 trillion tokens.

In fact, Chinese models have held this lead for nearly two months, ever since the week of February 9 to 15, 2026, when their call volume first surpassed that of U.S. models.

The OpenRouter platform aggregates over 400 AI models from more than 60 suppliers, and its call volume data is widely treated as a window into the model preferences of developers worldwide. Developers can switch between models at any time using the same API key (a credential used for authentication and service calls).

Chris Clark, co-founder and COO of OpenRouter, publicly stated in February 2026 that Chinese open-source models account for a disproportionately high share in the Agent workflows run by U.S. enterprises. Meanwhile, discussions in the developer community about task allocation between models and cost optimization are increasing.

Some observers compare this phenomenon to China's manufacturing industry 30 years ago. At that time, China leveraged cost advantages to enter the assembly segment of the global electronics industry chain, giving rise to contract manufacturers like Foxconn and Luxshare Precision. Today, Chinese large models are likewise using price advantages to enter the execution segment of the global AI industry chain, and some call domestic large models the "Foxconn of the AI era."

What role do domestic large models play in the AI industry chain? How high is the actual value of this role?

Price Advantage

A review by Economic Observer reporters of the official API pricing of various manufacturers as of the end of March 2026 revealed a huge price gap between mainstream large models from China and the U.S.

Taking input prices as an example, among Chinese models, DeepSeek V3.2 is $0.28 per million tokens, MiniMax M2.5 is $0.3, and Moonshot AI's Kimi K2.5 is $0.42. Among U.S. models, Anthropic's Claude Opus 4.6 is $5, and OpenAI's GPT-5.4 is $2.50. The input price of mainstream U.S. models is about 10 to 20 times that of mainstream Chinese models.

The gap in output prices is even more pronounced. For Chinese models, DeepSeek V3.2 is $0.42 per million tokens, MiniMax M2.5 is $1.1, and Moonshot AI's Kimi K2.5 is $2.2. For U.S. models, OpenAI's GPT-5.4 is $15, and Claude Opus 4.6 is $25. The output price gap between mainstream Chinese and U.S. models ranges from about 7 times to 60 times.
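The ratios in the two paragraphs above can be recomputed directly from the quoted prices. The following is a minimal sketch, not part of the original reporting: the table holds only the five models named above, and the function name `ratio_range` is chosen here for illustration.

```python
# Prices in USD per million tokens, as quoted in the article (end of March 2026).
PRICES = {
    "DeepSeek V3.2":   {"origin": "CN", "input": 0.28, "output": 0.42},
    "MiniMax M2.5":    {"origin": "CN", "input": 0.30, "output": 1.10},
    "Kimi K2.5":       {"origin": "CN", "input": 0.42, "output": 2.20},
    "GPT-5.4":         {"origin": "US", "input": 2.50, "output": 15.00},
    "Claude Opus 4.6": {"origin": "US", "input": 5.00, "output": 25.00},
}


def ratio_range(kind):
    """Return the (smallest, largest) U.S.-to-Chinese price ratio.

    `kind` is either "input" or "output". The smallest ratio pairs the
    cheapest U.S. model with the most expensive Chinese model; the largest
    ratio pairs the most expensive U.S. model with the cheapest Chinese one.
    """
    cn = [p[kind] for p in PRICES.values() if p["origin"] == "CN"]
    us = [p[kind] for p in PRICES.values() if p["origin"] == "US"]
    return min(us) / max(cn), max(us) / min(cn)


for kind in ("input", "output"):
    low, high = ratio_range(kind)
    print(f"{kind}: {low:.1f}x to {high:.1f}x")
```

For these five models the output ratio spans roughly 7 to 60 times, in line with the figures above. The input ratio depends on which pairs are compared, so the extremes computed here are wider than a rounded summary figure.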

This price difference has always existed but did not trigger large-scale user migration previously for a simple reason: most people's primary use case for AI was chatting, where token consumption was low, and the price difference had minimal impact.

However, in early 2026, the emergence of a "lobster" changed all that. The open-source tool OpenClaw (referred to as "Lobster" by the developer community) quickly gained popularity around February 2026, soon topping OpenRouter's application rankings and consuming over 600 billion tokens in a single week. "Lobster" is an agent application. Unlike the past "question-and-answer" chat mode, it enables AI to autonomously perform tasks like programming, testing, and file management on a computer without step-by-step human intervention.

In this workflow, token consumption is on a completely different scale compared to chat scenarios.

For example, a programming task might require dozens of cycles of "write code -> run -> error -> modify -> run again," each cycle being a complete model call. So that the agent can remember its previous operations, each call must also carry the conversation history.
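A rough sketch of why this loop is expensive: if every call carries the full history, the input tokens billed grow with the square of the number of cycles. The cycle count, starting context, and tokens added per cycle below are illustrative assumptions, not measurements from OpenClaw, and the sketch ignores any prompt caching discounts a provider may offer.

```python
def agent_session_tokens(cycles, base_context, tokens_added_per_cycle):
    """Total input tokens billed when every call resends the whole history.

    All three parameters are hypothetical values chosen for illustration.
    Returns (total input tokens across all calls, final context size).
    """
    total_input = 0
    context = base_context
    for _ in range(cycles):
        total_input += context             # the full history is sent as input
        context += tokens_added_per_cycle  # new code, logs and errors are appended
    return total_input, context


def input_cost_usd(tokens, price_per_million):
    """Input cost in USD at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


total, final_context = agent_session_tokens(
    cycles=30, base_context=5_000, tokens_added_per_cycle=4_000
)
print(f"final context: {final_context:,} tokens")
print(f"input tokens billed over the session: {total:,}")
# Input prices per million tokens as quoted in the article.
print(f"at $5.00 (Claude Opus 4.6): ${input_cost_usd(total, 5.00):.2f}")
print(f"at $0.28 (DeepSeek V3.2):   ${input_cost_usd(total, 0.28):.2f}")
```

Under these assumptions a single 30-cycle task bills close to two million input tokens even though the context itself never exceeds 125,000 tokens, which is why per-token price matters far more for agents than for chat.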

Some developers have stated on social platforms that an active OpenClaw session context can easily expand to over 230,000 tokens. If using the Claude API throughout, the cost could range from $800 to $1500 per month. Some users reported that a misconfigured automated task burned through $200 in a single day.

Agent applications like OpenClaw have driven up the platform's overall token consumption. For instance, in the week of March 3 to 9, 2025, the total weekly call volume of the top ten models on OpenRouter was 1.24 trillion tokens. By the week of February 16 to 22, 2026, the weekly call volume of just the top ten models exceeded 8.7 trillion tokens, an increase of nearly 7 times. The proportion of programming tasks in the platform's token consumption also rose from 11% in early 2025 to over 50% by the end of 2025.

When the token consumption per task increased from thousands to hundreds of thousands, the price gap between Chinese and U.S. models transformed from a negligible cost into a significant difference of hundreds or even thousands of dollars per month.

Around February 19, 2026, U.S. large model company Anthropic updated its terms of service, prohibiting users from connecting Claude subscription account credentials to third-party tools like OpenClaw and requiring pay-as-you-go billing via API. Google subsequently imposed similar restrictions. These changes pushed developers onto pay-as-you-go billing, and for agent applications that make frequent API calls every day, price became an unavoidable factor in model selection.

In the core programming scenarios for agents, the capabilities of Chinese and U.S. models are already quite close.

SWE-Bench Verified is a public evaluation of programming capabilities maintained by a research team at Princeton University. The method involves having AI models fix real code issues on GitHub (the world's largest open-source code hosting platform). According to data on the public leaderboard of this evaluation, the Chinese model MiniMax M2.5, released on February 13, 2026, scored 80.2%, while the U.S. model Claude Opus 4.6, released on February 5, scored 80.8%, a difference of only 0.6 percentage points.

With comparable capabilities but vastly different prices, developers' choices were quickly reflected in the data.

In the week of February 9 to 15, 2026, Chinese model token call volume reached 4.12 trillion, surpassing the U.S. models' 2.94 trillion for the first time. The following week, Chinese model call volume rose to 5.16 trillion, a 127% increase in three weeks. During the same period, U.S. model call volume dropped to 2.7 trillion.
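The figures above imply two numbers that the text does not state outright: the Chinese share of the combined Chinese and U.S. volume, and the Chinese volume three weeks earlier. The sketch below derives both from the quoted values only. The share is computed against Chinese plus U.S. volume alone, which is an assumption of this example, since models from other countries are not counted.

```python
def share(a, b):
    """Fraction of the combined volume a + b that belongs to a."""
    return a / (a + b)


def implied_baseline(current, pct_increase):
    """Starting value that reaches `current` after a `pct_increase` percent rise."""
    return current / (1 + pct_increase / 100)


# Weekly call volumes in trillions of tokens, as quoted above.
cn_week1, us_week1 = 4.12, 2.94  # February 9 to 15, 2026
cn_week2, us_week2 = 5.16, 2.70  # the following week

print(f"Chinese share, week 1: {share(cn_week1, us_week1):.1%}")
print(f"Chinese share, week 2: {share(cn_week2, us_week2):.1%}")
print(
    "Implied Chinese volume three weeks earlier: "
    f"{implied_baseline(cn_week2, 127):.2f} trillion tokens"
)
```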

Why can Chinese large models be so much cheaper than U.S. models?

Pan Helin, a member of the Expert Committee on Information and Communication Economy of the Ministry of Industry and Information Technology, told the Economic Observer that there are two main reasons: first, the scale of China's computing power infrastructure is large with high reuse rates, leading to lower quotes; second, there is a large amount of self-built computing power within Chinese computing clusters, acquired at lower costs than overseas.

Additionally, technical routes also affect costs. Some industry insiders told reporters that mainstream Chinese large models generally adopt the MoE architecture, also known as "Mixture of Experts." Simply put, although a MoE model has a large total parameter count, only a small portion of these parameters are activated to handle a task during each operation, rather than all parameters, which significantly reduces the computational load required for each inference.
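The idea of activating only a few experts per input can be illustrated with a toy top-k router. This is a simplified sketch under stated assumptions, not the architecture of any specific model named in this article: the gate weights are random rather than learned, each "expert" is a trivial scaling function, and the names `moe_forward` and `make_expert` are hypothetical. Production MoE models route each token inside each layer with learned gates.

```python
import math
import random

calls = []  # records which experts were actually evaluated


def make_expert(idx, scale):
    """Build a toy expert that scales its input and logs that it ran."""
    def expert(v):
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    # Gate: one score per expert (dot product of x with that expert's gate vector).
    scores = [sum(w * xi for w, xi in zip(g, x)) for g in gate_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    # Only the chosen experts are evaluated; the others do no work for this input.
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen


random.seed(0)
dim, n_experts = 4, 8
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [make_expert(i, scale=i + 1) for i in range(n_experts)]
x = [1.0, 0.5, -0.5, 2.0]

out, chosen = moe_forward(x, gate_weights, experts, top_k=2)
print(f"experts evaluated: {len(calls)} of {n_experts}")
```

With 8 experts and `top_k=2`, only a quarter of the expert parameters take part in each forward pass, which is the source of the inference savings described above.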

Different Paths

Martin Casado, a partner at Silicon Valley venture capital firm a16z, stated at the end of 2025 that among AI startups using open-source technology stacks, about 80% use Chinese models. He later clarified on social media that this did not mean 80% of U.S. AI startups use Chinese models, but rather that among those choosing the open-source technology route (accounting for about 20% to 30% of all U.S. AI startups), about 80% use Chinese models.

Reporters noted that multiple open-source tools have appeared on GitHub to help developers optimize costs across different models. The general idea is to grade tasks by difficulty, assigning simple tasks to free or low-cost Chinese models and reserving complex tasks for expensive U.S. models.

One project named ClawRouter provided comparative data in its documentation, showing that after adopting this mixed approach, the average cost dropped from $25 per million tokens to about $2. Anthropic's official documentation for its product Claude Code also describes a similar tiered design, defaulting to the cheapest model for routine tasks.
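The arithmetic behind such a drop can be sketched with a simple difficulty threshold. This is an illustration only: ClawRouter's actual routing logic is not described here, the difficulty scores and the 0.8 threshold are hypothetical, and the two prices are the output prices quoted earlier for DeepSeek V3.2 and Claude Opus 4.6, used on the assumption that the $25 baseline corresponds to a premium output price.

```python
CHEAP_PRICE = 0.42    # USD per million output tokens (DeepSeek V3.2, quoted earlier)
PREMIUM_PRICE = 25.0  # USD per million output tokens (Claude Opus 4.6, quoted earlier)


def pick_tier(difficulty, threshold=0.8):
    """Hypothetical rule: difficulty is in [0, 1]; only the hardest tasks go premium."""
    return "premium" if difficulty >= threshold else "cheap"


def blended_price(tasks, threshold=0.8):
    """Average USD per million tokens over a list of (difficulty, tokens) pairs."""
    cost = 0.0
    tokens_total = 0
    for difficulty, tokens in tasks:
        tier = pick_tier(difficulty, threshold)
        price = PREMIUM_PRICE if tier == "premium" else CHEAP_PRICE
        cost += tokens / 1_000_000 * price
        tokens_total += tokens
    return cost / tokens_total * 1_000_000


# Illustrative workload: 94 routine tasks and 6 hard ones, 100,000 tokens each.
workload = [(0.3, 100_000)] * 94 + [(0.9, 100_000)] * 6
print(f"all premium: ${PREMIUM_PRICE:.2f} per million tokens")
print(f"layered:     ${blended_price(workload):.2f} per million tokens")
```

Under these assumptions, sending about 6% of tokens to the premium model yields a blended price a little under $2 per million tokens. The result is very sensitive to that fraction, so the quality of the difficulty grading determines most of the saving.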

The premise for this model to work is that Chinese models are sufficiently capable in execution tasks. In programming, the SWE-Bench data mentioned earlier illustrates this point. But beyond programming, how large is the overall capability gap between Chinese and U.S. large models?

LMSYS Chatbot Arena is one of the most widely recognized AI model evaluation platforms in the world. Real users try two models side by side without knowing their names and then vote for the better one, the equivalent of a blind taste test for AI.

In its comprehensive rankings as of March 25, 2026, the top five positions were all held by U.S. company models. The highest-ranked Chinese model, DeepSeek V3.2 Speciale, was sixth. The gap is more pronounced in the Hard Prompts category (specifically designed to test a model's ability to handle complex reasoning and multi-step logic tasks), where the first tier is still primarily composed of U.S. models.

Programming capabilities that are close, alongside a remaining gap in complex reasoning: this is how the capabilities of Chinese and U.S. large models differ today, and it is the foundation that makes the "layered calling" approach viable.

However, unlike the contract manufacturers that were locked into low profit margins 30 years ago, Chinese large model vendors have not kept driving prices down.

In fact, the Chinese large model industry went through a price war starting in 2024. In May 2024, ByteDance's Volcano Engine set off the price war by pricing its Doubao model at 0.0008 yuan per thousand tokens, and Alibaba Cloud and Baidu Intelligent Cloud followed. Over roughly the next year, token prices across the industry dropped by more than 90%, and the gross margin on inference computing turned negative at times for some vendors.

The strategy for vendors at the time was to accept losses to gain scale and cultivate user calling habits. However, after OpenClaw's popularity surge in February 2026, token consumption growth far exceeded expectations, and computing power supply tightened.

Zhipu was the first to react. It raised API pricing when releasing the new model GLM-5 on February 12, 2026, and raised prices again when releasing GLM-5-Turbo on March 16, with a cumulative increase of 83% over the two rounds.

Zhipu CEO Zhang Peng stated at the 2025 annual performance briefing that API call pricing increased by 83% in Q1 2026, while call volume grew by 400%. According to the annual report, Zhipu's full-year revenue for 2025 was 724.3 million yuan, a year-on-year increase of 132%, and the annual recurring revenue of its MaaS (Model-as-a-Service) platform was approximately 1.7 billion yuan, a 60-fold increase in 12 months.

Zhipu wasn't the only one choosing to raise prices. On March 13, 2026, Tencent Cloud adjusted pricing for its Hunyuan series large models, with some models seeing increases of over 460%. On March 18, Alibaba Cloud and Baidu Intelligent Cloud issued price adjustment announcements on the same day, with increases for AI computing power-related products ranging from 5% to 34%, effective April 18.

Li Bin, Senior Vice President of Sugon, told the Economic Observer in an interview that the evaluation metrics for computing power systems are changing. The past standard for measuring a system was its amount of computing power, but now it's about how economically it can produce tokens.

The shift from collective price cuts to collective price hikes took less than two years.

In March 2026, Liu Liehong, head of the National Data Bureau, announced a set of figures at the China Development Forum: China's daily token call volume has exceeded 140 trillion, an increase of over 1000 times compared to two years ago.

At the GTC conference the same month, NVIDIA founder Jensen Huang stated that tokens would be the most core commodity in the future digital world.

In Pan Helin's view, the competitiveness of Chinese large models is strong; they are not catching up but leading, especially on the AI application end. However, he also stated that China still has room for improvement in original innovation. The core architectures in the current AI system, from artificial neural networks to attention mechanisms, were first proposed overseas and then iterated upon domestically. The next step for Chinese large models is to continue efforts on the application end while also pursuing original innovation in basic algorithms.

The consumer electronics contract manufacturing industry 30 years ago had a characteristic: the profit margin of the assembly segment was firmly suppressed by upstream brand owners. Many leading contract manufacturers still have gross margins not exceeding 10% today. Cost advantages brought orders but did not bring pricing power.

Currently, the situation of Chinese large models resembles the consumer electronics contract manufacturing industry of that era in some respects, but it looks quite different when it comes to pricing power. For example, after Zhipu raised prices by 83%, call volume grew by 400%. Alibaba Cloud, Baidu Intelligent Cloud, and Tencent Cloud collectively raised prices for AI computing power and model services in March 2026; demand did not shrink, and call volume continued to grow.

On the SWE-Bench programming evaluation, the gap between top Chinese models and top U.S. models has narrowed to less than 1 percentage point. The gap in complex reasoning remains, but it is also narrowing rapidly.

This time, the development path for Chinese large model manufacturers seems to be different.

This article is from the WeChat public account "Economic Observer", author: Zheng Chenye

Related Questions

Q: What percentage of AI model calls on OpenRouter came from Chinese models during the week of April 3, 2026?

A: Six out of the top ten most called models on OpenRouter during the week of April 3, 2026, were from China, with Xiaomi's MiMo-V2-Pro ranking first with 4.82 trillion tokens.

Q: What is the main reason cited for the significant price difference between Chinese and American AI models?

A: The main reasons are China's large-scale, highly utilized computing infrastructure with lower pricing, the prevalence of self-built computing clusters with lower acquisition costs, and the widespread adoption of the MoE (Mixture of Experts) architecture, which reduces computational requirements per task.

Q: What specific event in early 2026 triggered a massive shift in developer preference towards Chinese AI models?

A: The rise of the intelligent agent application 'OpenClaw' (also known as 'Lobster') in February 2026, which drastically increased token consumption for automated tasks like programming, making the large price gap between Chinese and American models a significant financial factor for developers.

Q: How did Chinese AI model companies change their pricing strategy in response to surging demand in early 2026?

A: After a previous price war, Chinese companies collectively shifted from cutting prices to raising them. For example, Zhipu raised its API prices by 83% over two adjustments, and other major providers like Alibaba Cloud, Baidu Cloud, and Tencent Cloud also announced significant price increases for their AI models and computing power.

Q: According to the SWE-Bench programming evaluation, how did the capabilities of top Chinese models compare to their American counterparts?

A: As of the data cited from February 2026, the gap was very small. The Chinese model MiniMax M2.5 scored 80.2% on the SWE-Bench benchmark, while the American model Claude Opus 4.6 scored 80.8%, a difference of only 0.6 percentage points.
