The Computing Power Dilemma in the Sino-US AI Rivalry

marsbit | Published on 2026-06-22 | Last updated on 2026-06-22

Abstract

The Sino-US AI rivalry faces a fundamental bottleneck: the widening gap in computing power. Chinese AI chip companies have seen a surge of investment, but their focus remains largely on the less demanding inference market. The real challenge lies in high-end training chips, which are crucial for developing cutting-edge large language models (LLMs) and where Nvidia holds a near-monopoly. The compute disparity is stark. US tech giants like Meta, Google, and xAI command massive GPU clusters that let them train trillion-parameter models rapidly. Estimates suggest the US data center count and total compute capacity significantly outstrip China's. This "brute force" advantage allows faster model iteration and exploration of larger parameter scales, with top US models reportedly leading their Chinese counterparts by 8 to 15 months. Chinese alternatives, such as Huawei's Ascend and chips from companies like Moore Threads and Biren, are emerging. They show promise in inference and some training scenarios and are closing the performance gap with mid-range Nvidia products. However, the core hurdle extends beyond raw chip performance to the entrenched software ecosystem, exemplified by Nvidia's CUDA platform. The path forward involves "walking on two legs": navigating import restrictions while investing heavily in the domestic chip industry. Though still in a catch-up phase, China's vast market, talent pool, and capital are fostering progress. The ultimate test is whether Chinese firms can...

The Constraint of Computing Power

Since the end of last year, domestic GPU companies such as Moore Threads, MetaX, Biren Technology, and Tianshu Zhixin have set off a capital frenzy. Beneath the windfall in the secondary market, however, an undercurrent is becoming hard to ignore, and the problems it raises are growing more urgent.

Over the past few years, domestic AI chips have focused mainly on the relatively safe and more peripheral "inference side." For instance, Doubao recently planned a massive purchase of 50,000 Tianshu Zhixin chips for inference, to handle the heavy request volume of China's largest AI app.

In the top tier of the computing power pyramid—AI training—domestic chips can currently only participate in peripheral "support" tasks.

AI training chips are primarily used for training artificial intelligence models, which involves massive matrix operations and parameter adjustments. They therefore require powerful computing capabilities and high energy efficiency, and they deliver higher performance at a significantly higher price. Typical examples are NVIDIA's A100, H100, and H200 and AMD's MI300 series.

In contrast, the task of inference chips is much lighter. Used in the deployment stage after model training is completed, they are mainly responsible for executing model inference tasks, which require high real-time performance. Inference chips need to ensure rapid response and low power consumption while maintaining accuracy.

An apt analogy is that training makes an AI model "learn knowledge," while inference makes it "apply knowledge." During the learning phase, training chips must process massive amounts of data to drive continuous updates to billions, trillions, or even tens of trillions of parameters. They need not only robust computing power but also high bandwidth and efficient communication, and they must stay stable in clusters of tens of thousands of cards.
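One way to see why training is so much heavier than inference is to count the memory needed just to hold a model's state. The sketch below is illustrative only: the bytes-per-parameter figures are commonly cited rules of thumb (FP16 weights for inference; mixed-precision training with the Adam optimizer), and the 80 GB card and the 1-trillion-parameter dense model are hypothetical inputs. None of these numbers come from the article.

```python
import math

# Assumed rule-of-thumb figures, not taken from the article:
INFERENCE_BYTES_PER_PARAM = 2   # FP16 weights only
TRAINING_BYTES_PER_PARAM = 16   # FP16 weights (2) + FP16 gradients (2)
                                # + FP32 master weights (4) + two FP32 Adam moments (8)
CARD_MEMORY_GB = 80.0           # hypothetical accelerator with 80 GB of memory


def model_state_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for model state in GB (1e9 bytes); activations and caches are ignored."""
    return num_params * bytes_per_param / 1e9


def min_cards(num_params: float, bytes_per_param: int,
              card_gb: float = CARD_MEMORY_GB) -> int:
    """Smallest number of cards whose combined memory can hold the model state."""
    return math.ceil(model_state_gb(num_params, bytes_per_param) / card_gb)


if __name__ == "__main__":
    n = 1e12  # hypothetical 1-trillion-parameter dense model
    for label, bpp in (("inference", INFERENCE_BYTES_PER_PARAM),
                       ("training", TRAINING_BYTES_PER_PARAM)):
        print(f"{label}: {model_state_gb(n, bpp):,.0f} GB, "
              f"at least {min_cards(n, bpp)} cards")
```

Under these assumptions the training state is eight times larger than the inference state, before counting activations or the interconnect traffic needed to keep the cards synchronized.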

The root of the Sino-US model gap lies precisely in these "invisible areas," especially the absence of high-end training chips.

Under the scaling law of large models, computing power demands rise at least in proportion to parameter count, and faster still when training data is scaled up alongside the model. The rapidly expanding computing power and hardware costs make training large models an "exclusive game" for a very few tech giants.
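The relationship between model size and compute can be made concrete with the widely used approximation that training a dense model costs about 6 × N × D floating-point operations, where N is the parameter count and D is the number of training tokens. The sketch below is a back-of-the-envelope estimate under that assumption; the cluster size, per-GPU throughput, and utilization are hypothetical inputs, not figures from the article.

```python
SECONDS_PER_DAY = 86_400


def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute using the C ~ 6 * N * D rule of thumb."""
    return 6.0 * num_params * num_tokens


def training_days(num_params: float, num_tokens: float, num_gpus: int,
                  flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days for one training run at a sustained utilization rate."""
    sustained = num_gpus * flops_per_gpu * utilization  # FLOP/s across the cluster
    return training_flops(num_params, num_tokens) / sustained / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 1 trillion parameters, 10 trillion tokens,
    # 10,000 GPUs at 1e15 FLOP/s peak each, 40% utilization.
    days = training_days(1e12, 1e13, 10_000, 1e15, 0.4)
    print(f"{training_flops(1e12, 1e13):.1e} FLOPs, about {days:.0f} days")

    # Doubling only the parameters doubles the compute;
    # doubling parameters and tokens together quadruples it.
    base = training_flops(1e12, 1e13)
    print(training_flops(2e12, 1e13) / base, training_flops(2e12, 2e13) / base)
```

With the same formula, a tenfold larger cluster cuts the run from months to weeks, which is the practical meaning of being able to run many iteration experiments per year.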

Among US tech giants, Meta alone plans to deploy over 1.2 million high-end GPUs by the end of 2026, with annual investments exceeding $145 billion; another estimate suggests Google's total AI computing power is equivalent to 5 million NVIDIA H100s, accounting for one-quarter of the global total for a single company.

The capital expenditures of Amazon, Microsoft, Alphabet, and Meta this year amount to a staggering $725 billion, a sharp 77% increase year-on-year. This scale is equivalent to 13% of the total annual private domestic investment in the United States. Morgan Stanley further predicts that by 2027, US tech companies' capital expenditures could reach a record $1.1 trillion.

The US currently controls over 70% of the world's high-end GPUs. After the chip export bans, the high-end chips available domestically in China are only 1/8th of those in the US. The Stanford AI Index Report 2026 points out that the number of data centers in the US (5,427) is over 10 times that of China.

According to calculations by the China Academy of Information and Communications Technology (CAICT), as of early 2025, the US computing power scale was 2400 EFLOPS, while China's was 1053 EFLOPS, making the US's scale more than double that of China.
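The "more than double" claim follows directly from the two CAICT figures quoted above. A minimal check, where the function name is only illustrative:

```python
def scale_ratio(larger: float, smaller: float) -> float:
    """How many times larger the first compute scale is than the second."""
    return larger / smaller


US_EFLOPS = 2400.0     # CAICT figure for the US, early 2025 (from the article)
CHINA_EFLOPS = 1053.0  # CAICT figure for China, early 2025 (from the article)

if __name__ == "__main__":
    ratio = scale_ratio(US_EFLOPS, CHINA_EFLOPS)
    share = CHINA_EFLOPS / US_EFLOPS
    print(f"US / China: {ratio:.2f}x")          # prints "US / China: 2.28x"
    print(f"China as share of US: {share:.1%}")  # prints "China as share of US: 43.9%"
```

Note that this ratio covers total computing power. The one-eighth figure cited earlier refers specifically to high-end chips, so the two numbers measure different things.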

Each of the four tech giants mentioned above individually holds more computing power than all Chinese AI companies combined.

This overwhelming computing power advantage allows US companies to complete over a dozen rounds of large model iteration experiments within a year.

Elon Musk is even more extravagant. His xAI operates Colossus 2, claimed to be the world's "first GW-class AI cluster." On that basis he confidently announced that he is training 7 models simultaneously, including two with 1 trillion parameters, two with 1.5 trillion, one with 6 trillion, and one with 10 trillion. This kind of "brute force aesthetics" is only possible with an extreme abundance of computing power.

Meanwhile, because of US restrictions on chip exports, the share of high-end AI chip shipments going to Chinese companies has been declining steadily, according to Epoch AI statistics.

It is no exaggeration to say that this huge gap in the computing power foundation will keep Chinese AI in a catching-up phase for a long time and will make it even harder for domestic large models to close the distance to their US counterparts.

The Gap Between Generations

"The pace of Chinese innovation is unstoppable," "Anyone who thinks China cannot make (chips) is truly mistaken. The gap between China and the US is only at the nanosecond level."

NVIDIA founder Jensen Huang has praised the progress of China's semiconductor industry on multiple public occasions.

Elon Musk also frequently expresses similar views on X—"China will definitely solve the chip bottleneck issue; in the field of artificial intelligence computing power, it will far surpass all other countries globally," "China will win the AI race on Earth."

It is easy to take such lavish praise from globally renowned tech leaders at face value. Yet these remarks look suspiciously like flattery aimed at setting unrealistic expectations. Some US media keep promoting the narrative that the gap between Chinese and US models is minimal, which obscures the objective facts.

In this regard, all domestic AI-related fields should maintain a clear and calm perspective.

Even if China's advanced large models perform close to their US counterparts on standardized problems, the gap becomes more apparent in complex industrial and enterprise environments.

Compared to cutting-edge models from US companies like Anthropic, China still belongs to the catching-up camp. US CAISI assessments suggest that China's strongest model, DeepSeek V4 Pro, lags behind the US cutting-edge by about 8 months.

Kai-Fu Lee recently stated in an interview with The Wall Street Journal that, using top US models like Anthropic's Claude Fable 5 as benchmarks, the US currently leads China by about 15 months.

Large models follow the scaling law: the larger the model parameters, the more training data, and the greater the computing power invested, the better the model's performance. Currently, the most cutting-edge US large models have entered the era of tens of trillions of parameters, and the iteration speed is still accelerating.

Anthropic's most powerful model, Mythos, has reached 10 trillion parameters, costing $10 billion to train; xAI's Colossus 2 is simultaneously training 7 models, including 6-trillion and 10-trillion parameter models; OpenAI's iteration cycle for a 4-trillion parameter model is just one month.

China's strongest model, DeepSeek V4 Pro, has a total parameter count of 1.6 trillion, roughly one-sixth the size of the 10-trillion-parameter models at the US frontier.

Anthropic's Claude series has been widely recognized in recent years as the strongest family of large models for AI programming. Mythos has once again reset public perception, outperforming even the previous flagship, Opus 4.6.

OpenBSD is reputed in the industry for having the most secure system. Yet, Mythos found a vulnerability that had gone undetected for 27 years. It also discovered vulnerabilities in FFmpeg and the Linux kernel that had been unnoticed for years or even over a decade, and it did so completely autonomously, without human assistance.

It is important to note that a model's "pre-training" determines the upper limit of its capabilities. It is impossible to fine-tune a trillion-parameter level model through "post-training" to reach the capability level of a 10-trillion parameter model. And the decisive factor in pre-training is high-end computing power chips, which determine the parameter scale and training iteration speed.

Liu Qingfeng, Chairman of iFlytek, frankly admitted that currently, all top large model companies, especially US giants, are building ultra-large-scale computing power platforms. Domestic computing power indeed faces a painful period, leading to limitations encountered when training on ultra-long contexts.

Thus, the computing power gap is the root cause of the difference between Chinese and US models.

The Rise of Domestic Chips

One company controls 90% of the global high-end AI training chip market, which has helped NVIDIA hold its place as the world's largest company by market capitalization. Its total market value once exceeded the 2025 GDP of Germany, the world's third-largest economy.

Data from TrendForce shows that in Q1 2026, NVIDIA alone accounted for 68% of the global GPU server market, AMD held 5%-6%, while domestic GPU manufacturers collectively accounted for less than 4%.

Leveraging its first-mover advantage, formidable technical barriers, high-speed interconnects, software ecosystem, and ties to TSMC's advanced processes, NVIDIA dominates the global market. In high-end training scenarios, NVIDIA's GB300 outperforms AMD's MI325, Cambricon's Siyuan 690, and Moore Threads' MTT40. In trillion-parameter large model training in particular, it outperforms competitors by over 30%.

Under the export bans, Jensen Huang has stated that NVIDIA's share of new sales in China has essentially dropped to zero, leaving only the installed base. Supported by domestic substitution policies, alternatives have emerged, including Huawei's Ascend 910, Hygon's DCU ShenSuan 2, Cambricon's Siyuan 370/590, and products from Moore Threads, MetaX, and others.

Among them, the Ascend 910 is Huawei's strongest computing power chip. The Ascend 910B's computing power reaches 640 TOPS (INT8), comparable to NVIDIA's A100 chip.

Although domestic GPUs still trail in absolute performance, they can gain a foothold in inference and edge scenarios. Domestic GPUs already meet most of the general inference needs of the domestic government and enterprise sectors. The gap with NVIDIA's mid-range products has narrowed to 15%-20%, making substitution feasible.

It is particularly important to note that while computing performance is crucial, the underlying software ecosystem is the Achilles' heel of domestic GPUs. CUDA is the foundation of NVIDIA's GPU empire. Zheng Weimin, an academician of the Chinese Academy of Engineering, has pointed out that the core issue with domestic AI chips is an underdeveloped ecosystem: if the ecosystem were good, there would be users even at 60% of the performance.

The software ecosystem can be called the hardest barrier in the GPU race, and in this regard NVIDIA is just as difficult to replace.

The CUDA ecosystem, cultivated over more than a decade, now boasts over 4 million developers, hundreds of thousands of open-source models, and a full range of third-party toolchains, covering AI training, inference, graphics rendering, and scientific computing. Its ecosystem barrier is formidable and unparalleled.

IDC data shows that currently over 95% of global AI models are developed based on the CUDA ecosystem. While domestic GPUs rely on policy support, they need long-term collaboration with the industry chain and require sufficient patience from media, public opinion, and the capital market.

In January this year, Zhipu AI, in collaboration with Huawei, open-sourced the new-generation image generation model GLM-Image. This model was developed based on Huawei's Ascend Atlas 800T A2 equipment and the MindSpore AI framework, achieving a full-process closed loop from data processing to model training. It is the first SOTA multimodal model trained entirely on domestic chips.

Moore Threads also collaborated with the Beijing Academy of Artificial Intelligence (BAAI). Based on the MTT S5000 intelligent computing cluster and the FlagOS-Robo framework, they completed the full-process training of BAAI's self-developed embodied brain model, RoboBrain 2.5. This achievement marks the first verification of the usability of domestic computing power clusters in training embodied intelligence large models.

It can be seen that domestic GPUs have made breakthroughs in compatibility and ecosystem building, and are moving from "single-point breakthroughs" on the inference side to "gradual adaptation" on the training side. This already represents significant progress.

Summary

Overall, with imports of advanced foreign chips obstructed, it is advisable to "combine Chinese and Western approaches" and walk on two legs, while focusing support on domestic computing power chips to meet urgent market demand.

That the demand is real is undeniable. The "bubble theory" persists, but its voice is not growing louder. The global market's enthusiasm for building AI has already surpassed anything seen in the early development of any previous industry.

Since the beginning of this year, the global capital market has once again ignited a super AI cycle. Stock prices of Samsung, SK Hynix, Broadcom, and TSMC have repeatedly hit new highs. In the domestic market, hard-tech companies represented by Cambricon have seen strong gains, and the optical module giant Zhongji Innolight's market capitalization once surpassed that of Kweichow Moutai.

Looking back at the history of South Korea's semiconductor industry, South Korea supported its memory chip industry with a national effort, endured the darkest moments, and ultimately defeated Japan to become the absolute world leader in the memory industry.

Whether it's memory chips, mobile phone chips, or even current AI chips, China is still in a catching-up stage. This is by no means an overnight achievement. However, with a huge market, continuously emerging AI talent, and massive capital strength, domestic GPUs have begun to demonstrate certain adaptability and can solve the real needs of many AI enterprises.

In this AI rivalry, on which national destinies depend, China and the United States are rivals, yet each possesses technologies, markets, and resources that the other needs.

This article is from the WeChat public account: 巨潮WAVE , Editor: Yang Xuran, Author: Xie Zefeng, Original Title: "The Computing Power Dilemma in the Sino-US AI Rivalry | 巨潮"

Related Questions

Q: According to the article, what is the fundamental reason for the performance gap between Chinese and American AI models?

A: The fundamental reason is the significant disparity in computing power, particularly China's lack of high-end training chips (GPUs), which are crucial for training large-scale AI models under the scaling law.

Q: What are the two main application sides of AI chips mentioned in the article, and how do they differ in task difficulty?

A: The two main sides are training and inference. Training chips serve the computationally intensive "learning" phase of AI models and sit at the top of the computing power pyramid. Inference chips serve the "application" phase after training, handling real-time tasks that are considered relatively lighter.

Q: Besides raw chip performance, what does the article identify as the biggest weakness of domestic Chinese GPUs?

A: The biggest weakness is the software ecosystem. The article points out that NVIDIA's CUDA ecosystem, with millions of developers and a full range of tools, is a formidable barrier. Domestic GPUs lack a similarly mature and comprehensive software environment.

Q: What specific examples does the article give to demonstrate recent progress in domestic Chinese AI chip adaptation and ecosystem building?

A: The article mentions two examples: 1) Zhipu AI and Huawei open-sourced the GLM-Image model, developed on Huawei's Ascend equipment and the MindSpore framework. 2) Moore Threads and the Beijing Academy of Artificial Intelligence completed the full training of the RoboBrain 2.5 model on Moore Threads' MTT S5000 cluster, validating the usability of domestic computing clusters for embodied AI model training.

Q: What is the article's suggested strategic approach for China to address the computing power challenge under export restrictions?

A: The article suggests a dual-track, or "walking on two legs," approach: seeking access to advanced foreign chips where possible while supporting and nurturing the domestic computing chip industry to meet urgent market demands.
