Autonomy or Compatibility: The Choice Facing China's AI Ecosystem Behind the Delay of DeepSeek V4

marsbit | Published 2026-04-21 | Last updated 2026-04-21

Introduction

DeepSeek V4's repeated delay in early 2026 has sparked global discussions on "de-CUDA-ization" in AI. The highly anticipated trillion-parameter open-source model is undergoing deep adaptation to Huawei’s Ascend chips using the CANN framework, representing China’s first systematic attempt to run a core AI model outside the CUDA ecosystem. This shift, however, comes with significant engineering challenges. While the model uses a MoE architecture to reduce computational load, it places extreme demands on memory bandwidth, chip interconnects, and system scheduling—areas where NVIDIA’s mature CUDA ecosystem currently excels. Migrating to Ascend introduces complexities in hardware topology, communication latency, and software optimization due to CANN’s relative immaturity compared to CUDA. The move highlights a broader strategic dilemma: short-term compatibility with CUDA offers practical benefits and faster adoption, as seen in CANN’s efforts to emulate CUDA interfaces. Yet, long-term over-reliance on compatibility risks inheriting CUDA’s limitations and stifling native innovation. If global AI shifts away from transformer-based architectures, strict compatibility could lead to technological obsolescence. Despite these challenges, DeepSeek V4’s eventual release could demonstrate the viability of a full domestic AI stack and accelerate CANN’s ecosystem growth. However, true technological independence will require building an original software-hardware paradigm beyond compatibili...

By Sun Yongjie

Entering 2026, the release window for DeepSeek V4 has been repeatedly postponed, unexpectedly igniting global discussions within the AI community about "de-CUDA-ization." According to reports from multiple media outlets, this open-source multimodal model, expected to have a parameter scale of trillions and support for million-token context, is being fully adapted for Huawei's Ascend chips, with core code rewritten through the CANN framework.

If this eventually becomes reality, it will mark the first time China's AI system has systematically explored the possibility of carrying core model capabilities on a non-CUDA platform in a real production environment. In other words, this is not just the release of a model but more like a "stress test" of the underlying technical route.

However, as DeepSeek founder Liang Wenfeng emphasized in internal communications, this is only "the first step of a long march." The future holds both risks and opportunities, and the balance—or even trade-off—between compatibility and self-reliance will determine whether China's AI can truly forge its own path of development.

DeepSeek V4 Delay: The Inevitable Cost of Transitioning the Foundational AI Computing Platform

As noted above, V4 was originally planned for release around the Chinese New Year or in February–March of this year, but it has repeatedly missed its window. As of early April, media reports indicated a release "within weeks." The reason lies in the deep adaptation of the inference side to Huawei's Ascend chips, a path that has proved far more complex than expected. To understand this complexity, we must first return to the technical characteristics of DeepSeek V4 itself.

As is well known, entering 2026, large model parameter scales have crossed the "trillion" threshold and are moving toward tens of trillions. Against this backdrop, although V4 adopts a more aggressive MoE (Mixture of Experts) architecture, theoretically reducing single-inference computational load by "activating experts on demand," the trade-off is that it places more extreme demands on system capabilities, including memory bandwidth, inter-chip connectivity, and KV Cache management.
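
To make "activating experts on demand" concrete, here is a minimal sketch of top-k expert routing in plain Python. The expert count, the value of k, the router scores, and the function names are all illustrative assumptions, not details of DeepSeek V4.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def active_fraction(num_experts, k):
    """Fraction of experts that actually compute for a single token."""
    return k / num_experts

# Hypothetical router output for one token over 8 experts.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
chosen = route_top_k(scores, k=2)
print(chosen, active_fraction(len(scores), 2))
```

In this toy case only two of the eight experts compute for the token, yet all eight must stay resident in memory, and each token has to be shipped to whichever device hosts its chosen experts. That is the sense in which the load moves from arithmetic toward memory and interconnect.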

In other words, the computational pressure shifts from "pure computation" to "system scheduling and communication." Within the NVIDIA ecosystem, this set of problems has relatively mature solutions.

For example, based on H100 or B200, high-bandwidth interconnects built via NVLink and NVSwitch can achieve TB/s-level bandwidth between GPUs within a single node, forming a near "fully connected" computational network where data flows between chips like on a highway, with latency and synchronization costs greatly compressed. However, when DeepSeek attempts to migrate this sophisticated system to the Huawei Ascend platform, it faces a completely different hardware topology.

It is undeniable that Ascend chips have made significant progress in recent years, but there remains a physical-layer gap with NVIDIA in terms of "full connectivity capability" in ultra-large-scale clusters. For instance, constrained by process technology and SerDes IP capabilities, Ascend relies more on optical modules for cross-node expansion. This "trading space for bandwidth" solution, while feasible, introduces longer physical links, thereby bringing complexities such as signal latency, synchronization overhead, and power and thermal management.
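
A rough way to see why link characteristics matter is the familiar latency-plus-bandwidth cost model, time = latency + bytes / bandwidth. The sketch below applies it to one step of a workload on two interconnects; every number is a placeholder chosen for illustration, not a measurement of any NVIDIA or Ascend system.

```python
def transfer_time_s(payload_bytes, latency_s, bandwidth_bytes_per_s):
    """Latency-plus-bandwidth cost of one point-to-point message."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s

def step_time_s(compute_s, num_messages, payload_bytes, latency_s, bandwidth):
    """Time for one step, assuming communication is not overlapped with compute."""
    comm = num_messages * transfer_time_s(payload_bytes, latency_s, bandwidth)
    return compute_s + comm

# Placeholder numbers: identical work on two hypothetical interconnects.
fast = step_time_s(compute_s=0.010, num_messages=64, payload_bytes=1_000_000,
                   latency_s=2e-6, bandwidth=400e9)
slow = step_time_s(compute_s=0.010, num_messages=64, payload_bytes=1_000_000,
                   latency_s=10e-6, bandwidth=100e9)
print(f"fast link: {fast * 1e3:.3f} ms, slow link: {slow * 1e3:.3f} ms")
```

The compute term is the same in both cases; only the per-message latency and the bandwidth differ. With many small messages, as in expert routing, the latency term alone can dominate the communication cost.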

At the same time, the gap at the software level is equally non-negligible. The maturity of Huawei's CANN framework in areas such as operator coverage, automatic parallelism, kernel fusion, and distributed communication scheduling still lags behind the CUDA ecosystem overall. This means that DeepSeek's engineering team needs to perform targeted optimizations on a large number of underlying details, even manually rewriting key operators.

More troublesome still, this lag is often not linear but systemic. Specifically, a performance drop in one operator can affect the entire computational chain, and a reduction in communication efficiency can cause significant fluctuations in overall throughput. The end result may be that the model can still run, but it remains a long way from being stable, efficient, and scalable.
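
The systemic effect can be sketched with a simple pipeline model in which steady-state throughput is bounded by the slowest stage. The stage timings below are hypothetical and only illustrate the shape of the problem.

```python
def pipeline_throughput(stage_times_s):
    """Steady-state items per second of a pipelined chain (slowest stage wins)."""
    return 1.0 / max(stage_times_s)

def chain_latency_s(stage_times_s):
    """End-to-end latency of one item passing through every stage."""
    return sum(stage_times_s)

# Hypothetical per-operator times in seconds.
baseline = [0.002, 0.002, 0.002, 0.002]
degraded = [0.002, 0.002, 0.006, 0.002]  # a single operator is 3x slower

slowdown = pipeline_throughput(baseline) / pipeline_throughput(degraded)
print(f"throughput drops by a factor of {slowdown:.1f}")
```

Three of the four operators are untouched, yet the whole chain runs at the pace of the one that regressed. This is why a small gap in operator coverage can show up as a large gap in delivered throughput.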

From this perspective, the delay of DeepSeek V4 is not simply a product timing issue but an inevitable cost of the deep break-in process between China's top algorithm teams and the domestic chip system. Although the process is difficult, it is highly significant.

More importantly, this process sends a clear signal that AI competition is shifting from a "model capability contest" to a "systems engineering capability contest." In this phase, whoever can "run the model, run it stably, and run it cheaply" faster will truly approach an industrial-level advantage.

CUDA Monopoly Hard to Break, CANN's Reluctant Compromise

If the aforementioned adaptation difficulties of DeepSeek V4 on the inference side reveal the practical bottlenecks at the engineering level, then following this question further, a more fundamental question emerges: Why has migrating a model from one computing platform to another become so difficult?

Looking back at the Wintel alliance of the PC era, although Microsoft and Intel jointly held a monopoly, the two companies' interests were not fully aligned, which left room for the later rise of Linux, AMD, and even Apple's system. NVIDIA, however, has established a "monolithic vertical monopoly" in the AI field, essentially Microsoft and Intel combined into a single entity.

Specifically, at the hardware level, NVIDIA defined the physical structure of the SM (Streaming Multiprocessor) and the computational logic of the Tensor Core; at the software level, CUDA provides closed-source libraries such as cuBLAS and cuDNN that are matched 1:1 with that hardware. The combination of the two has produced a daunting reality: over 6 million developers worldwide optimize algorithms around cuBLAS, cuDNN, and NVLink/NVSwitch; frameworks (PyTorch, TensorFlow) prioritize CUDA implementations; and even "anti-NVIDIA" heterogeneous clusters like AWS Trainium + Cerebras WSE still require NVIDIA NIXL software and AWS EFA for KV cache migration.

This shows that the problem is no longer a single technical detail but an ecosystem lock-in. Long before model portability becomes the obstacle, developers' habit of "thinking in the language of NVIDIA hardware features" has already hardened into inertia. It is precisely this ecosystem inertia that makes NVIDIA like a huge black hole, absorbing over 90% of the dividends of global innovation.

Against this background, as its strongest competitor, Huawei's CANN initially attempted to follow a relatively independent path. However, with the advent of the large model era, this path gradually revealed problems, such as developers' reluctance to migrate, enterprises' fear of taking risks, and slow ecosystem growth. Coupled with the pressure of time (e.g., the rapid iteration of large models), the path of complete self-reliance began to become unrealistic.

Based on this, CANN gradually introduced abstraction layer designs similar to CUDA. For example, CANN Next attempted to match the cuBLAS and cuDNN interfaces, achieving a high degree of compatibility and reducing model migration costs from "weeks or even months" to "hours." At the architectural level, the recently released 950PR heterogeneous architecture (prefill/decode decoupling) also deliberately imitates NVIDIA's decoupled service approach rather than Google TPU's thoroughly heterogeneous route.

We must admit that this "compatibility first" strategy is successful in the short term. It lowers the threshold, allowing Ascend to quickly gain an application foundation in the domestic market, and enables companies like DeepSeek, Tencent, and ByteDance to try domestic computing power with a relatively low barrier to entry. For example, CANN Next achieves over 95% CUDA compatibility through the SIMT programming model and has already helped many enterprises cut migration time to the hour level, accelerating practical deployment.

But the challenge that follows is that once cutting-edge innovation is involved, the compatibility layer becomes a "ceiling."

For instance, when developers use the Ascend platform in depth, they find that although the common paths have been smoothed, CANN's support drops off and performance jitters severely once niche or innovative low-level operators are involved. The difficulties DeepSeek V4 encountered during adaptation are largely of this kind. When the team tried to introduce hybrid architectures such as SSM (State Space Model) or Mamba (non-Transformer structures), it found that CANN's underlying optimization still leans mainly toward matrix multiplication (GEMM). In other words, it hit the "boundary" of CANN's compatibility layer when attempting algorithm optimizations that go beyond the conventional.
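
The "ceiling" can be pictured with a toy dispatch table: operators that have a tuned kernel take the fast path, and everything else falls through to a generic one. All names below (`TUNED`, `GENERIC`, `dispatch`, and the operator strings) are hypothetical and do not correspond to real CANN or CUDA APIs; the scan is only a stand-in for an SSM-style recurrence.

```python
def matmul(a, b):
    """Reference matrix multiply on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cumulative_scan(xs, decay):
    """Toy linear recurrence h[t] = decay * h[t-1] + x[t]."""
    out, h = [], 0.0
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out

# Operators the hypothetical compatibility layer has tuned kernels for.
TUNED = {"gemm": matmul}
# Operators that run only through a slow, generic path.
GENERIC = {"scan": cumulative_scan}

def dispatch(op_name, *args):
    """Return (path, result); path records whether the tuned layer covered the op."""
    if op_name in TUNED:
        return "tuned", TUNED[op_name](*args)
    if op_name in GENERIC:
        return "fallback", GENERIC[op_name](*args)
    raise NotImplementedError(f"no kernel for operator {op_name!r}")

path_a, product = dispatch("gemm", [[1, 2], [3, 4]], [[5, 6], [7, 8]])
path_b, states = dispatch("scan", [1.0, 1.0, 1.0], 0.5)
print(path_a, product, path_b, states)
```

Both calls return correct results, but only the GEMM is served by the tuned path. A model whose hot loop is the scan inherits the fallback's performance, however complete the coverage of conventional operators may be.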

The deeper issue is that choosing compatibility means tacitly accepting that CUDA remains the invisible standard. You can replace the hardware, but in terms of software semantics and development paradigms, you are still following the rules defined by the other party. This is both a shortcut and a limitation.

Compatibility Hides Risks and Challenges, Future Opportunities Still Require True Self-Reliance

As noted above, given that the CUDA ecosystem has become a de facto standard, Huawei's choice of a "compatibility-like" path is almost inevitable. However, it also pushes the entire Chinese AI industry to a critical decision point: continue to be compatible with CUDA, or gradually move towards a truly independent ecosystem?

In the short term, the answer is almost unquestionably that compatibility is necessary; it is a choice of efficiency and reality. But in the long term, this path hides risks that cannot be ignored.

As is well known, when a system (like CANN) is designed to be compatible with another system (like CUDA), it inevitably inherits the other's limitations.

The fact is, most global open-source algorithms are currently developed around the NVIDIA architecture. If we blindly pursue 1:1 compatibility to leverage these existing assets, we will fall into the "imitator trap" in hardware design. This manifests as follows: if NVIDIA's hardware architecture undergoes a paradigm shift at some future point, for example as models move from Transformer to a new architecture that does not require large-scale matrix multiplication but relies more on asynchronous logic, then the domestic computing stack, which has been living in a "shadow state," might face an instant technological discontinuity. This dead end of "bug-for-bug compatibility" undoubtedly keeps our underlying innovation perpetually in the shadow of others.

The deeper risk lies in the "time gap." According to statistics from Bernstein and Epoch AI, although Huawei's domestic share has surged, domestic chips account for only 5% of total global AI computing power, which remains relatively limited. It is precisely this gap in absolute scale that causes serious "R&D efficiency friction."

Specifically, US AI giants can leverage Blackwell's powerful communication bandwidth to run through 10T parameter Scaling Laws in 18 months, while China's top talent has to spend over 50% of its R&D capacity on issues like "how to solve signal attenuation in older chips" and "adapting to immature compilers."

It should be noted that this kind of time misalignment is amplified without limit in the rapidly changing AI era. While our talent is still busy "filling in potholes," rivals may already have achieved exponential, compounding growth in model capabilities. A one-year lead in a rival's model thus turns into a gap of more than a year for us, compounded by exponential growth in model capability, the data flywheel, and safety alignment.

Of course, challenges often contain opportunities. If DeepSeek V4 is successfully released, it will prove the feasibility of a "domestic full stack," accelerate the maturation of the CANN ecosystem, and attract more developers to follow suit. Coupled with the global sentiment that "the world has long suffered under NVIDIA," industry support for CANN may exceed expectations. If Huawei Ascend and other chips subsequently achieve 80%–90% of H100's inference performance, combined with the compatibility dividend of CANN Next, a critical mass for China's AI supply chain is expected to form within 1–2 years.

But it is necessary to recognize clearly that compatibility can only solve the problem of "survival"; true self-reliance determines "how far we can go." The next 3-5 years will be a critical window. If we can gradually establish independent programming models, operator systems, and system architectures while maintaining compatibility, China's AI ecosystem still has the opportunity to achieve a leap from following to defining the rules. Otherwise, Chinese AI may fall into a track of crude replication.

In conclusion: The delayed release of DeepSeek V4, seemingly an accidental "slip," actually reveals a deeper reality: AI competition has long ceased to be just about models; it is a comprehensive contest of underlying ecosystems and system capabilities. Compatibility with CUDA is admittedly the shortest path to reality, but stopping there may also lock in the future ceiling.

Therefore, the real challenge lies not in whether we can replace one set of technologies, but in whether we can break free from dependence on existing paradigms and build our own rule system. The next 3-5 years will determine whether China's AI becomes an important pole in the global ecosystem or remains in a position of "high-level following" for the long term. Of course, while pursuing self-reliance, we must also be vigilant about the potential impact a closed ecosystem might have on its attractiveness to global developers, to ensure the ecosystem's openness and long-term international competitiveness.

Related Questions

Q: What is the main reason for the delay in the release of DeepSeek V4, and what broader challenge does this represent for China's AI ecosystem?

A: The delay is due to the deep adaptation of DeepSeek V4 for inference on Huawei's Ascend chips using the CANN framework, which involves rewriting core code. This represents a broader challenge of systematically exploring the possibility of running core AI models on a non-CUDA platform, a significant "stress test" for China's underlying AI technology roadmap and a move towards reducing reliance on NVIDIA's ecosystem.

Q: How does the article characterize the nature of NVIDIA's dominance in AI, and why is it described as more challenging than historical monopolies like Wintel?

A: The article characterizes NVIDIA's dominance as a "monolithic vertical monopoly," combining hardware (defining SM physical structure and Tensor Core logic) and software (the CUDA ecosystem with closed-source libraries like cuBLAS and cuDNN). This is described as more challenging than the Wintel alliance because NVIDIA's tight integration creates a powerful "ecosystem lock-in," where developers think in terms of NVIDIA's hardware features, making model portability difficult and allowing NVIDIA to capture the vast majority of the dividends of global innovation.

Q: What is the "compatibility-first" strategy adopted by Huawei's CANN, and what are its main advantages and limitations according to the article?

A: Huawei's CANN adopted a "compatibility-first" strategy by introducing abstraction layers designed to be similar to CUDA interfaces (e.g., matching cuBLAS and cuDNN), aiming for high compatibility. The main advantage is that it significantly lowers the barrier to entry, allowing companies like DeepSeek, Tencent, and ByteDance to migrate models to domestic Ascend hardware with much lower cost and effort (e.g., reducing migration time to hours). The limitation is that this compatibility layer can become a "ceiling," hindering true low-level innovation. When developers attempt novel or cutting-edge optimizations or non-standard architectures (e.g., SSM, Mamba), they often hit the boundaries of CANN's support and experience performance issues, as the framework's underlying optimizations are still geared towards common operations like GEMM.

Q: What long-term risks does the article associate with over-reliance on a compatibility strategy with CUDA for China's AI development?

A: The article highlights two major long-term risks: 1) The "imitator trap": Hardware design could be constrained by mimicking NVIDIA's architecture. If NVIDIA undergoes a major paradigm shift (e.g., moving away from Transformer-based models reliant on large matrix multiplications), China's compute stack, built for compatibility, could face a sudden technological discontinuity. 2) The "time gap" or "R&D efficiency friction": The significant disparity in the global share of AI compute (5% for domestic chips vs. NVIDIA's dominance) means China's top talent spends a disproportionate amount of R&D capacity solving basic hardware/software adaptation problems instead of pushing the boundaries of AI model capabilities. This time lag could lead to an exponentially widening gap in overall AI capability.

Q: What does the article conclude is the ultimate challenge for China's AI ecosystem, beyond simply replacing NVIDIA's technology?

A: The article concludes that the ultimate challenge is not merely replacing a set of technologies (like CUDA and NVIDIA hardware) but breaking free from dependence on the existing paradigm and building its own rule system. This involves establishing an independent programming model, operator system, and systems architecture. The next 3-5 years are a critical window to determine if China's AI can become an important pole in the global ecosystem with its own defining rules or remain in a position of "high-level followership." It also cautions against creating a closed ecosystem that could reduce its appeal to global developers, emphasizing the need to maintain openness and long-term international competitiveness.
