Chinese Large Models: This Time, the Script Is Different

marsbitОпубликовано 2026-04-07Обновлено 2026-04-07

Введение

By early 2026, Chinese large language models (LLMs) have gained significant global traction, representing six of the top ten most-used on the AI model aggregation platform OpenRouter. This shift, led by models like Xiaomi's MiMo-V2-Pro, occurred after Chinese models' weekly token usage surpassed that of U.S. models in February 2026. A key driver is the substantial price gap: Chinese models are often 10–20 times cheaper for input and up to 60 times cheaper for output tokens than leading U.S. models like OpenAI’s GPT-5.4 and Anthropic’s Claude Opus. This cost advantage became critical with the rise of agentic applications like OpenClaw, which automate complex tasks (e.g., programming, testing) and consume tokens at a much higher volume than traditional chat interfaces. While U.S. models still lead in complex reasoning benchmarks, Chinese models have nearly closed the gap in programming tasks—evidenced by near-parity scores on the SWE-Bench coding evaluation. This enabled cost-conscious developers, especially in AI startups using open-source stacks, to adopt a "layered" approach: using Chinese models for routine tasks and reserving premium U.S. models for harder problems. Rising demand led Chinese firms like Zhipu and Tencent to increase API prices in early 2026, yet usage continued growing sharply. Analysts note that China’s cost edge stems from large-scale, efficient compute infrastructure and widespread adoption of MoE (Mixture of Experts) architecture. Unlike the low-marg...

By the end of 2025, the annual usage report released by OpenRouter, the world's largest AI model aggregation platform, showed that 47% of its users were from the United States, while Chinese developers accounted for 6%. Additionally, English comprised 83% of the platform's content calls, with Chinese making up less than 5%.

However, as of the week of April 3, 2026, six of the top ten models by call volume on the platform were from China. Ranked from highest to lowest call volume, they were: Xiaomi MiMo-V2-Pro, StepFun Step 3.5 Flash, MiniMax M2.7, DeepSeek V3.2, Zhipu GLM 5 Turbo, and MiniMax M2.5. Among them, Xiaomi's MiMo-V2-Pro topped the entire platform with 4.82 trillion tokens.

In fact, since the week of February 9 to 15, 2026, when the call volume of Chinese models first surpassed that of the U.S., the lead of Chinese models has been maintained for nearly two months.

The OpenRouter platform aggregates over 400 AI models, covering more than 60 suppliers. Its call volume data is regarded as one of the windows to observe the model preference of global developers. Developers can switch between different models at any time using the same API Key (a key used for authentication and service calls).

Chris Clark, co-founder and COO of OpenRouter, publicly stated in February 2026 that Chinese open-source models account for a disproportionately high share in the Agent workflows run by U.S. enterprises. Meanwhile, discussions in the developer community about task allocation between models and cost optimization are increasing.

Some views compare this phenomenon to China's manufacturing industry 30 years ago: at that time, China leveraged cost advantages to enter the assembly segment of the global electronics industry chain, giving rise to contract manufacturers like Foxconn and Luxshare Precision; today, Chinese large models are also using price advantages to enter the execution segment of the global AI industry chain. Some also view domestic large models as the "Foxconn of the AI era."

What role do domestic large models play in the AI industry chain? How high is the actual value of this role?

Price Advantage

A review by Economic Observer reporters of the official API pricing of various manufacturers as of the end of March 2026 revealed a huge price gap between mainstream large models from China and the U.S.

Taking input prices as an example, among Chinese models, DeepSeek V3.2 is $0.28 per million tokens, MiniMax M2.5 is $0.3, and Moonshot AI's Kimi K2.5 is $0.42. Among U.S. models, Anthropic's Claude Opus 4.6 is $5, and OpenAI's GPT-5.4 is $2.50. The input price of mainstream U.S. models is about 10 to 20 times that of mainstream Chinese models.

The gap in output prices is even more pronounced. For Chinese models, DeepSeek V3.2 is $0.42 per million tokens, MiniMax M2.5 is $1.1, and Moonshot AI's Kimi K2.5 is $2.2. For U.S. models, OpenAI's GPT-5.4 is $15, and Claude Opus 4.6 is $25. The output price gap between mainstream Chinese and U.S. models ranges from about 7 times to 60 times.

This price difference has always existed but did not trigger large-scale user migration previously for a simple reason: most people's primary use case for AI was chatting, where token consumption was low, and the price difference had minimal impact.

However, in early 2026, the emergence of a "lobster" changed all that. The open-source tool OpenClaw (referred to as "Lobster" by the developer community) quickly gained popularity around February 2026, soon topping OpenRouter's application rankings and consuming over 600 billion tokens in a single week. "Lobster" is an agent application. Unlike the past "question-and-answer" chat mode, it enables AI to autonomously perform tasks like programming, testing, and file management on a computer without step-by-step human intervention.

In this workflow, token consumption is on a completely different scale compared to chat scenarios.

For example, a programming task might require dozens of cycles of "write code -> run -> error -> modify -> run again," each cycle being a complete model call. To allow the agent to remember previous operations, each call also requires the conversation history.

Some developers have stated on social platforms that an active OpenClaw session context can easily expand to over 230,000 tokens. If using the Claude API throughout, the cost could range from $800 to $1500 per month. Some users reported that a misconfigured automated task burned through $200 in a single day.

Agent applications like OpenClaw have driven up the platform's overall token consumption. For instance, in the week of March 3 to 9, 2025, the total weekly call volume of the top ten models on OpenRouter was 1.24 trillion tokens. By the week of February 16 to 22, 2026, the weekly call volume of just the top ten models exceeded 8.7 trillion tokens, an increase of nearly 7 times. The proportion of programming tasks in the platform's token consumption also rose from 11% in early 2025 to over 50% by the end of 2025.

When the token consumption per task increased from thousands to hundreds of thousands, the price gap between Chinese and U.S. models transformed from a negligible cost into a significant difference of hundreds or even thousands of dollars per month.

Around February 19, 2026, U.S. large model company Anthropic updated its terms of service, prohibiting users from connecting Claude subscription account credentials to third-party tools like OpenClaw and requiring pay-as-you-go billing via API. Google subsequently imposed similar restrictions. For agent applications that require frequent API calls daily, the price factor in model selection became an unavoidable issue, pushing developers onto the pay-as-you-go track.

In the core programming scenarios for agents, the capabilities of Chinese and U.S. models are already quite close.

SWE-Bench Verified is a public evaluation of programming capabilities maintained by a research team at Princeton University. The method involves having AI models fix real code issues on GitHub (the world's largest open-source code hosting platform). According to data on the public leaderboard of this evaluation, the Chinese model MiniMax M2.5, released on February 13, 2026, scored 80.2%, while the U.S. model Claude Opus 4.6, released on February 5, scored 80.8%, a difference of only 0.6 percentage points.

With comparable capabilities but vastly different prices, developers' choices were quickly reflected in the data.

In the week of February 9 to 15, 2026, Chinese model token call volume reached 4.12 trillion, surpassing the U.S. models' 2.94 trillion for the first time. The following week, Chinese model call volume rose to 5.16 trillion, a 127% increase in three weeks. During the same period, U.S. model call volume dropped to 2.7 trillion.

Why can Chinese large models be so much cheaper than U.S. models?

Pan Helin, a member of the Expert Committee on Information and Communication Economy of the Ministry of Industry and Information Technology, told the Economic Observer that there are two main reasons: first, the scale of China's computing power infrastructure is large with high reuse rates, leading to lower quotes; second, there is a large amount of self-built computing power within Chinese computing clusters, acquired at lower costs than overseas.

Additionally, technical routes also affect costs. Some industry insiders told reporters that mainstream Chinese large models generally adopt the MoE architecture, also known as "Mixture of Experts." Simply put, although a MoE model has a large total parameter count, only a small portion of these parameters are activated to handle a task during each operation, rather than all parameters, which significantly reduces the computational load required for each inference.

Different Paths

Martin Casado, a partner at Silicon Valley venture capital firm a16z, stated at the end of 2025 that among AI startups using open-source technology stacks, about 80% use Chinese models. He later clarified on social media that this did not mean 80% of U.S. AI startups use Chinese models, but rather that among those choosing the open-source technology route (accounting for about 20% to 30% of all U.S. AI startups), about 80% use Chinese models.

Reporters noted that multiple open-source tools have appeared on GitHub to help developers optimize costs across different models. The general idea is to grade tasks by difficulty, assigning simple tasks to free or low-cost Chinese models and reserving complex tasks for expensive U.S. models.

One project named ClawRouter provided comparative data in its documentation, showing that after adopting this mixed approach, the average cost dropped from $25 per million tokens to about $2. Anthropic's product ClaudeCode also uses a similar hierarchical design in its official documentation, defaulting to the cheapest model for routine tasks.

The premise for this model to work is that Chinese models are sufficiently capable in execution tasks. In programming, the SWE-Bench data mentioned earlier illustrates this point. But beyond programming, how large is the overall capability gap between Chinese and U.S. large models?

LMSYS Chatbot Arena is one of the globally most recognized AI model evaluation platforms. Its method involves having real users trial two models simultaneously without knowing their names, then voting for the better one, equivalent to a blind taste test for AIs.

In its comprehensive rankings as of March 25, 2026, the top five positions were all held by U.S. company models. The highest-ranked Chinese model, DeepSeek V3.2 Speciale, was sixth. The gap is more pronounced in the Hard Prompts category (specifically designed to test a model's ability to handle complex reasoning and multi-step logic tasks), where the first tier is still primarily composed of U.S. models.

Close programming capabilities but a remaining gap in complex reasoning—this is the manifestation of the differentiated capabilities between Chinese and U.S. large models today and the foundation for the viability of the "layered calling" approach.

However, unlike being locked into low-profit-margin contract manufacturing 30 years ago, Chinese large model vendors have not continuously driven prices down.

In fact, the Chinese large model industry experienced a price war starting in 2024: In May 2024, ByteDance's Volcano Engine Doubao model triggered a "price war" with a price of 0.0008 yuan per thousand tokens, followed by Alibaba Cloud and Baidu Intelligent Cloud. In the nearly year that followed, the industry saw token prices drop by over 90%, with inference computing毛利率 (gross margin) for some vendors turning negative at times.

The strategy for vendors at the time was to accept losses to gain scale and cultivate user calling habits. However, after OpenClaw's popularity surge in February 2026, token consumption growth far exceeded expectations, and computing power supply tightened.

Zhipu was the first to react. It raised API pricing when releasing the new model GLM-5 on February 12, 2026, and raised prices again when releasing GLM-5-Turbo on March 16, with a cumulative increase of 83% over the two rounds.

Zhipu CEO Zhang Peng stated at the 2025 annual performance briefing that API call pricing increased by 83% in Q1 2026, while call volume grew by 400%. According to the annual report, Zhipu's full-year revenue for 2025 was 724.3 million yuan, a year-on-year increase of 132%, and the annual recurring revenue of its MaaS (Model-as-a-Service) platform was approximately 1.7 billion yuan, a 60-fold increase in 12 months.

Zhipu wasn't the only one choosing to raise prices. On March 13, 2026, Tencent Cloud adjusted pricing for its Hunyuan series large models, with some models seeing increases of over 460%. On March 18, Alibaba Cloud and Baidu Intelligent Cloud issued price adjustment announcements on the same day, with increases for AI computing power-related products ranging from 5% to 34%, effective April 18.

Li Bin, Senior Vice President of Sugon, told the Economic Observer in an interview that the evaluation metrics for computing power systems are changing. The past standard for measuring a system was its amount of computing power, but now it's about how economically it can produce tokens.

The shift from collective price cuts to collective price hikes took less than two years.

In March 2026, Liu Liehong, head of the National Data Bureau, announced a set of figures at the China Development Forum: China's daily token call volume has exceeded 140 trillion, an increase of over 1000 times compared to two years ago.

At the GTC conference the same month, NVIDIA founder Jensen Huang stated that tokens would be the most core commodity in the future digital world.

In Pan Helin's view, the competitiveness of Chinese large models is strong; they are not catching up but leading, especially on the AI application end. However, he also stated that China still has room for improvement in original innovation. The core architectures in the current AI system, from artificial neural networks to attention mechanisms, were first proposed overseas and then iterated upon domestically. The next step for Chinese large models is to continue efforts on the application end while also pursuing original innovation in basic algorithms.

The consumer electronics contract manufacturing industry 30 years ago had a characteristic: the profit margin of the assembly segment was firmly suppressed by upstream brand owners. Many leading contract manufacturers still have gross margins not exceeding 10% today. Cost advantages brought orders but did not bring pricing power.

Currently, the situation of Chinese large models seems somewhat similar to the consumer electronics contract manufacturing industry back then, but seems quite different regarding pricing power. For example, after Zhipu raised prices by 83%, call volume grew by 400%. Alibaba Cloud, Baidu Intelligent Cloud, and Tencent Cloud collectively raised prices for AI computing power and model services in March 2026; demand did not shrink, and call volume continued to grow.

On the SWE-Bench programming evaluation, the gap between top Chinese models and top U.S. models has narrowed to less than 1 percentage point. The gap in complex reasoning remains, but it is also narrowing rapidly.

This time, the development path for Chinese large model manufacturers seems to be different.

This article is from the WeChat public account "Economic Observer", author: Zheng Chenye

Трендовые криптовалюты

CitreaCTR

wrapped stUSDTWSTUSDT

Связанные с этим вопросы

QWhat percentage of AI model calls on OpenRouter came from Chinese models during the week of April 3, 2026?

ASix out of the top ten most called models on OpenRouter during the week of April 3, 2026, were from China, with Xiaomi's MiMo-V2-Pro ranking first with 4.82 trillion tokens.

QWhat is the main reason cited for the significant price difference between Chinese and American AI models?

AThe main reasons are China's large-scale, highly utilized computing infrastructure with lower pricing, the prevalence of self-built computing clusters with lower acquisition costs, and the widespread adoption of the MoE (Mixture of Experts) architecture which reduces computational requirements per task.

QWhat specific event in early 2026 triggered a massive shift in developer preference towards Chinese AI models?

AThe rise of the intelligent agent application 'OpenClaw' (also known as 'Lobster') in February 2026, which drastically increased token consumption for automated tasks like programming, making the large price gap between Chinese and American models a significant financial factor for developers.

QHow did Chinese AI model companies change their pricing strategy in response to surging demand in early 2026?

AAfter a previous price war, Chinese companies collectively shifted from cutting prices to raising them. For example, Zhipu raised its API prices by 83% over two adjustments, and other major providers like Alibaba Cloud, Baidu Cloud, and Tencent Cloud also announced significant price increases for their AI models and computing power.

QAccording to the SWE-Bench programming evaluation, how did the capabilities of top Chinese models compare to their American counterparts?

AAs of the data cited from February 2026, the gap was very small. The Chinese model MiniMax M2.5 scored 80.2% on the SWE-Bench benchmark, while the American model Claude Opus 4.6 scored 80.8%, a difference of only 0.6 percentage points.

Похожее

Акции цикличные или роста? Отчет Coinbase за Q2 раскрывает "разногласия в оценке"

Криптобиржа Coinbase опубликовала отчет за второй квартал 2026 года. Выручка компании составила 1,22 млрд долларов, что на 19% меньше, чем годом ранее, и ниже рыночных ожиданий (1,29 млрд долларов). Основной источник доходов — комиссии от розничных транзакций с криптовалютами — сократился на 30% в годовом исчислении, достигнув уровня 2023 года. Чистый убыток составил 359 млн долларов, что является третьим кварталом убытков подряд. Несмотря на заявление Coinbase о рекордной доле рынка криптотрейдинга (10,3%), этот рост в основном обеспечен новыми продуктами, такими как деривативы, прогнозные рынки и токенизированные акции. Традиционный спотовый рынок для розничных клиентов продолжает сокращаться. В отчете также отмечаются положительные тенденции. Доходы от подписок и услуг, включая стабильные монеты, выросли и теперь почти равны доходам от транзакций. Компания делает ставку на диверсификацию: развивает платформу стабильных монет (лидер по хранению USDC), расширяет предложение деривативов после приобретения Deribit и укрепляет лидерство в области ончейн-агентской экономики через свою сеть Base и протокол x402. Таким образом, оценка Coinbase зависит от перспективы инвестора. Как циклическая акция, она страдает от медвежьего рынка. Как акция роста, она может быть недооценена благодаря потенциалу новых направлений, таких как агентская экономика, которую Coinbase оценивает как многотриллионный рынок к 2030 году.

Odaily星球日报7 мин. назад

Акции цикличные или роста? Отчет Coinbase за Q2 раскрывает "разногласия в оценке"

Odaily星球日报7 мин. назад

Утренний отчет с Уолл-Стрит: Microsoft развеяла опасения по поводу «AI cash burn», технологические акции совершили мощный отскок, ETF на память DRAM и индекс PHLX Semiconductor взлетели на 17% и 8%

Уолл-Стрит пережила мощный отскок после предыдущих распродаж. Ключевым фактором стал отчет Microsoft, который превзошел ожидания и развеял опасения по поводу затрат на ИИ. Акции Microsoft выросли на 15,51%, а капитализация компании увеличилась на рекордные 450 млрд долларов за день. Рост облачного сервиса Azure на 43% и оптимистичные прогнозы по денежным потокам восстановили доверие инвесторов к технологическому сектору. Индекс Nasdaq вырос на 2,78%, а индекс Philadelphia Semiconductor Index (SOX) взлетел на 8,19%. ETF Roundhill Memory, отслеживающий сектор памяти, подскочил почти на 17% на фоне ожиданий роста спроса на чипы для центров обработки данных ИИ. Однако широта рынка оставалась слабой: более 70% акций в S&P 500 закрылись в минусе, что указывает на концентрацию ралли в нескольких крупных технологических компаниях. Акции Meta упали на 7,95% из-за опасений по поводу больших капитальных затрат. На валютном рынке иена и вона значительно укрепились на фоне подозрений о валютных интервенциях Японии и Южной Кореи. Нефть слегка снизилась перед предстоящей встречей ОПЕК+, где ожидается решение о возможном увеличении добычи. Ближайшие события для наблюдения: срок выполнения правительством США рамок регулирования ИИ (1 августа), данные об экспорте Южной Кореи (1 августа) и встреча ОПЕК+ (2 августа).

marsbit13 мин. назад

Утренний отчет с Уолл-Стрит: Microsoft развеяла опасения по поводу «AI cash burn», технологические акции совершили мощный отскок, ETF на память DRAM и индекс PHLX Semiconductor взлетели на 17% и 8%

marsbit13 мин. назад

Исследование глобальной доли рынка: японские компании лидируют в области полупроводниковых материалов

Обзор доли рынка показывает, что японские компании сохраняют лидирующие позиции в секторе материалов для полупроводников. В 2025 году Shin-Etsu Chemical занимает первое место на рынке кремниевых подложек с долей 26,3%, а второй японский производитель, SUMCO, увеличивает свою долю. Три японские компании доминируют на рынке фоторезистов, занимая 60,5% доли. Однако в ключевых областях полупроводников, таких как память DRAM и NAND, а также GPU, лидерами являются южнокорейские (SK Hynix, Samsung) и американские (Micron) компании. Китайская компания CXMT удвоила свою долю на рынке DRAM. На фоне прогнозируемого резкого роста мирового рынка полупроводников японским производителям материалов для сохранения своих позиций необходимо осуществлять масштабные инвестиции, аналогичные инвестициям крупных производителей чипов. В автомобильной промышленности, являющейся опорой для Японии, наблюдается застой. Toyota сохранила первое место, но с незначительным ростом доли. Японские компании не входят в топ-5 на рынке электромобилей, где доминируют BYD и Tesla. В судостроении японская Imabari Shipbuilding поднялась на третье место в мире благодаря выполнению заказов на крупные контейнеровозы. Правительство Японии поставило цель удвоить объемы строительства к 2035 году, однако отрасль сталкивается с проблемой нехватки рабочей силы, а лидерами рынка остаются китайские и южнокорейские компании.

marsbit32 мин. назад

Исследование глобальной доли рынка: японские компании лидируют в области полупроводниковых материалов

marsbit32 мин. назад

25-летний вундеркинд OpenAI, который погубил глобальных инвесторов

Ведущий хедж-фонд Situational Awareness, управляемый 25-летним Леопольдом Ашенбреннером, бывшим сотрудником OpenAI, потерпел крах, потеряв миллиарды долларов. Фонд, достигший пика в $45 млрд и показавший доходность 439%, совершил ошибку, сделав чрезмерно leveraged-ставки на акции компаний, связанных с искусственным интеллектом, такие как SK Hynix и Micron. В июле 2026 года резкое падение цен на эти акции, вызванное в том числе массовыми продажами розничных инвесторов в Южной Корее, привело к маржин-коллам и принудительной ликвидации позиций. Ирония ситуации в том, что сразу после распродажи активов фонда по заниженным ценам (часть из них выкупила Citadel Кена Гриффина) эти же акции резко пошли вверх. История Ашенбреннера, автора популярного эссе об ИИ, и его фонда стала очередным примером на Уолл-стрит, когда верный долгосрочный прогноз терпит неудачу из-за неправильного выбора времени и чрезмерного использования кредитного плеча, что привело к огромным убыткам для него и последовавших за ним инвесторов.

marsbit57 мин. назад

25-летний вундеркинд OpenAI, который погубил глобальных инвесторов

marsbit57 мин. назад

STRC после деанкоринга: первая финансовая отчетность. Как Strategy намерена восстановить капитальное "колесо"?

Корпорация MicroStrategy опубликовала финансовый отчет за второй квартал 2026 года. Несмотря на рост выручки, компания зафиксировала чистый убыток в размере 8,22 млрд долларов США, в основном из-за нереализованных убытков от колебаний цены биткоина. В отчете основное внимание уделяется способности компании восстановить свою модель финансирования после дестабилизации своей ключевой приоритетной акции STRC, которая в настоящее время торгуется со скидкой к своей целевой номинальной стоимости. MicroStrategy продолжает придерживаться своей основной стратегии, увеличивая запасы биткоинов до 843 775 BTC. Однако подход к управлению капиталом изменился: от пассивного накопления к активному управлению, включая программу монетизации BTC для укрепления ликвидности. Главной задачей компании является восстановление стоимости STRC до номинала. Руководство исключило выпуск акций со скидкой и планирует использовать денежные резервы (3,75 млрд долларов США) и программу обратного выкупа акций на 10 млрд долларов для поддержки цены. Целевой срок восстановления — около 70 торговых дней. В будущем успех MicroStrategy зависит от двух ключевых факторов: краткосрочного восстановления STRC до номинальной стоимости и долгосрочного роста цены биткоина, который лежит в основе всей ее бизнес-модели.

marsbit1 ч. назад

STRC после деанкоринга: первая финансовая отчетность. Как Strategy намерена восстановить капитальное "колесо"?

marsbit1 ч. назад

Торговля

Спот

Обсуждения

Добро пожаловать в Сообщество HTX. Здесь вы сможете быть в курсе последних новостей о развитии платформы и получить доступ к профессиональной аналитической информации о рынке. Мнения пользователей о цене на S (S) представлены ниже.

Chinese Large Models: This Time, the Script Is Different

Введение

Price Advantage

Different Paths

Трендовые криптовалюты

Связанные с этим вопросы

Похожее

Акции цикличные или роста? Отчет Coinbase за Q2 раскрывает "разногласия в оценке"

Исследование глобальной доли рынка: японские компании лидируют в области полупроводниковых материалов

25-летний вундеркинд OpenAI, который погубил глобальных инвесторов

STRC после деанкоринга: первая финансовая отчетность. Как Strategy намерена восстановить капитальное "колесо"?

Торговля

Популярные статьи

Как купить S

Sonic: Обновления под руководством Андре Кронье – новая звезда Layer-1 на фоне спада рынка

HTX Learn: Пройдите обучение по "Sonic" и разделите 1000 USDT

Обсуждения

Топ вопросы

Популярные категории

Популярные теги