Autonomy or Compatibility: The Choice Facing China's AI Ecosystem Behind the Delay of DeepSeek V4

marsbit | Published 2026-04-21 | Last updated 2026-04-21

Introduction

DeepSeek V4's repeated delay in early 2026 has sparked global discussions on "de-CUDA-ization" in AI. The highly anticipated trillion-parameter open-source model is undergoing deep adaptation to Huawei’s Ascend chips using the CANN framework, representing China’s first systematic attempt to run a core AI model outside the CUDA ecosystem. This shift, however, comes with significant engineering challenges. While the model uses a MoE architecture to reduce computational load, it places extreme demands on memory bandwidth, chip interconnects, and system scheduling—areas where NVIDIA’s mature CUDA ecosystem currently excels. Migrating to Ascend introduces complexities in hardware topology, communication latency, and software optimization due to CANN’s relative immaturity compared to CUDA. The move highlights a broader strategic dilemma: short-term compatibility with CUDA offers practical benefits and faster adoption, as seen in CANN’s efforts to emulate CUDA interfaces. Yet, long-term over-reliance on compatibility risks inheriting CUDA’s limitations and stifling native innovation. If global AI shifts away from transformer-based architectures, strict compatibility could lead to technological obsolescence. Despite these challenges, DeepSeek V4’s eventual release could demonstrate the viability of a full domestic AI stack and accelerate CANN’s ecosystem growth. However, true technological independence will require building an original software-hardware paradigm beyond compatibili...

By Sun Yongjie

Entering 2026, the release window for DeepSeek V4 has been repeatedly postponed, unexpectedly igniting global discussion in the AI community about "de-CUDA-ization." According to multiple media reports, this open-source multimodal model, expected to reach trillion-parameter scale and support million-token context, is being fully adapted to Huawei's Ascend chips, with core code rewritten on the CANN framework.

If this eventually becomes reality, it will mark the first time China's AI system has systematically explored the possibility of carrying core model capabilities on a non-CUDA platform in a real production environment. In other words, this is not just the release of a model but more like a "stress test" of the underlying technical route.

However, as DeepSeek founder Liang Wenfeng emphasized in internal communications, this is only "the first step of a long march." The future holds both risks and opportunities, and the balance—or even trade-off—between compatibility and self-reliance will determine whether China's AI can truly forge its own path of development.

DeepSeek V4 Delay: The Inevitable Cost of Transitioning the Foundational AI Computing Platform

As mentioned, V4, originally planned for release around the Chinese New Year or in February–March of this year, has repeatedly missed its window. As of early April, media reports confirmed a release "within weeks." The reason lies in the deep adaptation of the inference side to Huawei's Ascend chips, a path far more complex than expected. To understand this complexity, we must first return to the technical characteristics of DeepSeek V4 itself.

As is well known, entering 2026, large model parameter scales have crossed the "trillion" threshold and are moving toward tens of trillions. Against this backdrop, although V4 adopts a more aggressive MoE (Mixture of Experts) architecture, theoretically reducing single-inference computational load by "activating experts on demand," the trade-off is that it places more extreme demands on system capabilities, including memory bandwidth, inter-chip connectivity, and KV Cache management.

In other words, the computational pressure shifts from "pure computation" to "system scheduling and communication." Within the NVIDIA ecosystem, this set of problems has relatively mature solutions.
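The "activating experts on demand" idea can be sketched in a few lines. The following is a minimal illustration of top-k expert routing, not DeepSeek's actual router; the function names, the eight-expert setup, and the logits are hypothetical values chosen for the example.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over only the chosen logits so the weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def active_fraction(num_experts, k):
    """Fraction of expert parameters touched per token (equal-size experts assumed)."""
    return k / num_experts

# Hypothetical router scores for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 1.5, 0.3, 0.0, 0.7, -0.5]
routes = top_k_route(logits, k=2)
print(routes, active_fraction(len(logits), 2))
```

In this toy setup each token computes with only a quarter of the expert parameters, yet every expert must still be resident in memory and tokens must be shipped to whichever device holds their experts. That is the sense in which the load moves from arithmetic to memory and interconnect.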

For example, based on H100 or B200, high-bandwidth interconnects built via NVLink and NVSwitch can achieve TB/s-level bandwidth between GPUs within a single node, forming a near "fully connected" computational network where data flows between chips like on a highway, with latency and synchronization costs greatly compressed. However, when DeepSeek attempts to migrate this sophisticated system to the Huawei Ascend platform, it faces a completely different hardware topology.

It is undeniable that Ascend chips have made significant progress in recent years, but there remains a physical-layer gap with NVIDIA in terms of "full connectivity capability" in ultra-large-scale clusters. For instance, constrained by process technology and SerDes IP capabilities, Ascend relies more on optical modules for cross-node expansion. This "trading space for bandwidth" solution, while feasible, introduces longer physical links, thereby bringing complexities such as signal latency, synchronization overhead, and power and thermal management.
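A first-order cost model makes the latency and bandwidth trade-off concrete. The sketch below uses the common "fixed latency plus size over bandwidth" approximation; the link figures are placeholders chosen for illustration, not measured NVLink or Ascend numbers.

```python
def transfer_seconds(payload_bytes, bandwidth_bytes_per_s, latency_s):
    """Approximate time to move one message: fixed latency plus serialization time."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s

# Hypothetical links (illustrative values only, not vendor specifications).
fast_link = {"bandwidth_bytes_per_s": 1e12, "latency_s": 2e-6}
slow_link = {"bandwidth_bytes_per_s": 2e11, "latency_s": 1e-5}

small_msg = 64 * 1024       # 64 KiB: cost is dominated by latency
large_msg = 1 * 1024 ** 3   # 1 GiB: cost is dominated by bandwidth

for name, link in (("fast", fast_link), ("slow", slow_link)):
    t_small = transfer_seconds(small_msg, **link)
    t_large = transfer_seconds(large_msg, **link)
    print(f"{name}: small={t_small * 1e6:.1f} us, large={t_large * 1e3:.2f} ms")
```

Under this model, longer links hurt small, frequent messages (synchronization, expert dispatch) through the latency term, while lower bandwidth hurts bulk transfers through the size term. A real cluster adds contention and topology effects that the sketch ignores.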

At the same time, the gap at the software level is equally non-negligible. The maturity of Huawei's CANN framework in areas such as operator coverage, automatic parallelism, kernel fusion, and distributed communication scheduling still lags behind the CUDA ecosystem overall. This means that DeepSeek's engineering team needs to perform targeted optimizations on a large number of underlying details, even manually rewriting key operators.

More troublesome still, this lag is often not linear but systemic. Specifically, a performance drop in one operator can affect the entire computational chain, and a reduction in communication efficiency can cause significant fluctuations in overall throughput. The end result may be that the model runs but remains a long way from being stable, efficient, and scalable.
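One way to see why a single slow operator matters is to treat the computation as a pipeline whose steady-state throughput is set by its slowest stage. The stage times below are invented for illustration, and the model ignores overlap and batching effects.

```python
def pipeline_throughput(stage_ms):
    """Steady-state items per second of a pipeline, limited by its slowest stage."""
    return 1000.0 / max(stage_ms)

def end_to_end_latency(stage_ms):
    """Latency in ms of one item passing through every stage in sequence."""
    return sum(stage_ms)

baseline = [2.0, 2.0, 2.0, 2.0]   # hypothetical per-stage times in ms
degraded = [2.0, 2.0, 6.0, 2.0]   # one operator runs 3x slower

slowdown = pipeline_throughput(baseline) / pipeline_throughput(degraded)
print(f"throughput drops by {slowdown:.1f}x, latency rises from "
      f"{end_to_end_latency(baseline):.0f} ms to {end_to_end_latency(degraded):.0f} ms")
```

In this simplified picture the other three stages are unchanged, yet the whole pipeline runs at the pace of the degraded one, which is why unoptimized operators have an outsized effect.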

From this perspective, the delay of DeepSeek V4 is not simply a product timing issue but an inevitable cost of the deep mutual adaptation between China's top algorithm teams and the domestic chip system. Although the process is difficult, it is highly significant.

More importantly, this process sends a clear signal that AI competition is shifting from a "model capability contest" to a "systems engineering capability contest." In this phase, whoever can "run the model, run it stably, and run it cheaply" faster will truly approach an industrial-level advantage.

CUDA Monopoly Hard to Break, CANN's Reluctant Compromise

If the adaptation difficulties of DeepSeek V4 on the inference side reveal practical bottlenecks at the engineering level, then pursuing the matter further raises a more fundamental question: why has migrating a model from one computing platform to another become so difficult?

Looking back at the Wintel alliance of the PC era, although Microsoft and Intel jointly held a monopoly, the two companies' interests competed, which left room for the later rise of Linux, AMD, and even Apple's systems. NVIDIA, however, has established a "monolithic vertical monopoly" in the AI field, essentially Microsoft and Intel combined in one entity.

Specifically, at the hardware level, NVIDIA defined the physical structure of the SM (Streaming Multiprocessor) and the computational logic of the Tensor Core; at the software level, CUDA provides closed-source libraries like cuBLAS and cuDNN that are matched one-to-one with that hardware. The combination of the two has led to an extremely daunting reality: over 6 million developers worldwide optimize algorithms around cuBLAS, cuDNN, and NVLink/NVSwitch; frameworks (PyTorch, TensorFlow) prioritize CUDA implementations; and even "anti-NVIDIA" heterogeneous clusters like AWS Trainium + Cerebras WSE still require NVIDIA NIXL software and AWS EFA for KV cache migration.

This shows that the problem is no longer a single-point technical detail but an ecosystem lock-in. Well before model portability becomes the obstacle, developers' habit of "thinking in the language of NVIDIA hardware features" has hardened into inertia. It is precisely this ecosystem inertia that makes NVIDIA like a huge black hole, absorbing over 90% of the dividends of global innovation.

Against this background, Huawei's CANN, as CUDA's strongest competitor, initially attempted to follow a relatively independent path. However, with the advent of the large model era, this path gradually revealed problems, such as developers' reluctance to migrate, enterprises' fear of taking risks, and slow ecosystem growth. Coupled with the pressure of time (e.g., the rapid iteration of large models), the path of complete self-reliance began to look unrealistic.

Based on this, CANN gradually introduced abstraction layer designs similar to CUDA. For example, CANN Next attempted to match the cuBLAS and cuDNN interfaces, achieving a high degree of compatibility and reducing model migration costs from "weeks or even months" to "hours." At the architectural level, the recently released 950PR heterogeneous architecture (prefill/decode decoupling) also deliberately imitates NVIDIA's decoupled service approach, rather than Google TPU's thoroughly heterogeneous route.

We must admit that this "compatibility first" strategy is successful in the short term. It lowers the threshold, allowing Ascend to quickly gain an application foundation in the domestic market, and enables companies like DeepSeek, Tencent, and ByteDance to try domestic computing power with a relatively low barrier to entry. For example, CANN Next achieves over 95% CUDA compatibility through the SIMT programming model and has already helped many enterprises shorten migration time substantially, to the hour level, accelerating practical deployment.

But the challenge that follows is that once cutting-edge innovation is involved, the compatibility layer becomes a "ceiling."

For instance, developers who use the Ascend platform in depth find that although the common paths have been smoothed, CANN's support drops and performance jitters sharply once niche, innovative low-level operators are involved. DeepSeek V4 ran into this during adaptation: when it tried to introduce hybrid, non-Transformer structures such as SSM (State Space Model) or Mamba, it found that CANN's underlying optimization is still tilted mainly toward matrix multiplication (GEMM). Such difficulties arise largely because algorithm optimizations that go beyond the conventional hit the "boundary" of CANN's compatibility layer.

The deeper issue is that choosing compatibility means accepting by default that CUDA remains the invisible standard. You can replace the hardware, but in terms of software semantics and development paradigms, you are still following the rules defined by the other party. This is both a shortcut and a limitation.

Compatibility Hides Risks and Challenges, Future Opportunities Still Require True Self-Reliance

As noted above, given that the CUDA ecosystem has become a de facto standard, Huawei's choice of a "compatibility-like" path is almost inevitable. However, it also pushes the entire Chinese AI industry to a critical decision point: continue to be compatible with CUDA, or gradually move towards a truly independent ecosystem?

In the short term, the answer is almost unquestionably that compatibility is necessary; it is a choice of efficiency and reality. But in the long term, this path hides risks that cannot be ignored.

As is well known, when a system (like CANN) is designed to be compatible with another system (like CUDA), it inevitably inherits the other's limitations.

The fact is, most global open-source algorithms are currently developed around the NVIDIA architecture. If we blindly pursue 1:1 compatibility to leverage these existing assets, we will fall into the "imitator trap" in hardware design. It manifests as follows: if NVIDIA's hardware architecture faces a paradigm shift at some future point, for example a move from Transformer to some new architecture that doesn't require large-scale matrix multiplication but relies more on asynchronous logic, then the domestic computing stack, which has been in a "shadow state," might face an instant technological discontinuity. This dead end of "bug-for-bug compatibility" undoubtedly keeps our underlying innovation perpetually in the shadow of others.

And the deeper risk lies in the "time gap." According to statistics from Bernstein and Epoch AI, although Huawei's domestic share has surged, domestic chips account for only 5% of total global AI computing power, which remains relatively limited. It is precisely this gap in absolute scale that causes serious "R&D efficiency friction."

Specifically, US AI giants can leverage Blackwell's powerful communication bandwidth to run through 10T parameter Scaling Laws in 18 months, while China's top talent has to spend over 50% of its R&D capacity on issues like "how to solve signal attenuation in older chips" and "adapting to immature compilers."

It should be noted that this kind of time misalignment is magnified enormously in the fast-moving AI era. While our talent is still busy "filling pits," competitors may already have achieved compounding growth in model capabilities, so that a one-year lead in their models turns into a gap of more than a year for us, widened further by compounding gains in model capability, data flywheels, and safety alignment.

Of course, challenges often contain opportunities. If DeepSeek V4 is successfully released, it will prove the feasibility of a "domestic full-stack," accelerate the maturation of the CANN ecosystem, and attract more developers to follow suit. Coupled with the global sentiment that "the world has long suffered under NVIDIA," industry support for CANN may exceed expectations. If Huawei Ascend and other chips subsequently achieve 80%–90% of H100's inference performance, combined with the compatibility dividend of CANN Next, a critical mass for China's AI supply chain is expected to form within 1–2 years.

But we must be clear-eyed: compatibility can only solve the problem of "survival"; true self-reliance determines "how far we can go." The next 3-5 years will be a critical window. If we can gradually establish independent programming models, operator systems, and system architectures while maintaining compatibility, China's AI ecosystem still has the opportunity to leap from following the rules to defining them. Otherwise, Chinese AI may fall into a track of crude replication.

In conclusion: The delayed release of DeepSeek V4, seemingly an accidental "slip," actually reveals a deeper reality: AI competition has long ceased to be just about models; it is a comprehensive contest of underlying ecosystems and system capabilities. Compatibility with CUDA is admittedly the shortest practical path, but stopping there may also lock in the future ceiling.

Therefore, the real challenge lies not in whether we can replace one set of technologies, but in whether we can break free from dependence on existing paradigms and build our own rule system. The next 3-5 years will determine whether China's AI becomes an important pole in the global ecosystem or remains in a position of "high-level following" for the long term. Of course, while pursuing self-reliance, we must also be vigilant about the potential impact a closed ecosystem might have on its attractiveness to global developers, to ensure the ecosystem's openness and long-term international competitiveness.

Related Questions

Q: What is the main reason for the delay in the release of DeepSeek V4, and what broader challenge does this represent for China's AI ecosystem?

A: The delay is due to the deep adaptation of DeepSeek V4 for inference on Huawei's Ascend chips using the CANN framework, which involves rewriting core code. This represents the broader challenge of systematically exploring whether core AI models can run on a non-CUDA platform, a significant "stress test" for China's underlying AI technology roadmap and a move towards reducing reliance on NVIDIA's ecosystem.

Q: How does the article characterize the nature of NVIDIA's dominance in AI, and why is it described as more challenging than historical monopolies like Wintel?

A: The article characterizes NVIDIA's dominance as a "monolithic vertical monopoly," combining hardware (defining SM physical structure and Tensor Core logic) and software (the CUDA ecosystem with closed-source libraries like cuBLAS and cuDNN). This is described as more challenging than the Wintel alliance because NVIDIA's tight integration creates a powerful "ecosystem lock-in," where developers think in terms of NVIDIA's hardware features, making model portability difficult and allowing NVIDIA to capture the vast majority of the dividends of global innovation.

Q: What is the "compatibility-first" strategy adopted by Huawei's CANN, and what are its main advantages and limitations according to the article?

A: Huawei's CANN adopted a "compatibility-first" strategy by introducing abstraction layers designed to resemble CUDA interfaces (e.g., matching cuBLAS and cuDNN), aiming for high compatibility. The main advantage is that it significantly lowers the barrier to entry, allowing companies like DeepSeek, Tencent, and ByteDance to migrate models to domestic Ascend hardware with much lower cost and effort (e.g., reducing migration time to hours). The limitation is that this compatibility layer can become a "ceiling," hindering true low-level innovation. When developers attempt novel or cutting-edge optimizations or non-standard architectures (e.g., SSM, Mamba), they often hit the boundaries of CANN's support and experience performance issues, as the framework's underlying optimizations are still geared towards common operations like GEMM.

Q: What long-term risks does the article associate with over-reliance on a compatibility strategy with CUDA for China's AI development?

A: The article highlights two major long-term risks: 1) The "imitator trap": hardware design could be constrained by mimicking NVIDIA's architecture. If NVIDIA undergoes a major paradigm shift (e.g., moving away from Transformer-based models reliant on large matrix multiplications), China's compute stack, built for compatibility, could face a sudden technological discontinuity. 2) The "time gap" or "R&D efficiency friction": the significant disparity in the global share of AI compute (5% for domestic chips vs. NVIDIA's dominance) means China's top talent spends a disproportionate amount of R&D capacity solving basic hardware/software adaptation problems instead of pushing the boundaries of AI model capabilities. This time lag could lead to an exponentially widening gap in overall AI capability.

Q: What does the article conclude is the ultimate challenge for China's AI ecosystem, beyond simply replacing NVIDIA's technology?

A: The article concludes that the ultimate challenge is not merely replacing a set of technologies (like CUDA and NVIDIA hardware) but breaking free from dependence on the existing paradigm and building its own rule system. This involves establishing an independent programming model, operator system, and systems architecture. The next 3-5 years are a critical window to determine if China's AI can become an important pole in the global ecosystem with its own defining rules or remain in a position of "high-level followership." It also cautions against creating a closed ecosystem that could reduce its appeal to global developers, emphasizing the need to maintain openness and long-term international competitiveness.
