Can DeepSeek Save China One Trillion Dollars?

marsbit · Published 2026-06-03 · Last updated 2026-06-03

Summary

"DeepSeek and the $1 Trillion Infrastructure Question." The article examines whether DeepSeek's AI optimization breakthroughs could save China $1 trillion in future AI infrastructure costs. The analysis begins with NVIDIA's upcoming Vera Rubin AI platform, which costs about $7.8 million, of which memory (HBM4/LPDDR5X) accounts for $2 million, a 435% cost increase in one year that shows how AI hardware spending is shifting toward expensive memory components. DeepSeek's approach works in the opposite direction. Through three key technical innovations showcased in DeepSeek V4, the company dramatically improves hardware efficiency: 1. Memory Compression (MLA): Re-engineers the attention mechanism to compress long-context memory (KV Cache) by over 90%, drastically reducing expensive HBM usage. 2. Selective Activation (MoE): Employs a Mixture-of-Experts architecture where only a small fraction of parameters (e.g., 49B out of 1.6T in V4-Pro) are activated per token, allowing most parameters to reside in cheaper memory/SSD. 3. Computation Caching: Reuses previously computed results via cache hits, replacing expensive GPU computations with cheap memory reads. Combined, these optimizations allow the same hardware to produce approximately 4x more tokens, effectively reducing required hardware investment by 75%. DeepSeek's pricing reflects this: a 1-billion-token monthly workload costs ~$522 versus ~$9,000-$10,000 for competitors. The $1 trillion savings projection stems...

In the second half of 2026, NVIDIA will deliver its most powerful AI platform to date: the Vera Rubin VR200 NVL72. This single rack is packed with 72 Rubin GPUs and 36 Vera CPUs. Morgan Stanley estimates the bill-of-materials cost for this machine to be approximately $7.8 million.

That number is already staggering. But what's more crucial to watch is where the money is being spent.

Out of this $7.8 million, about $2 million isn't spent on the world-famous GPU chip or the compute cores. Instead, it's spent on memory—High Bandwidth Memory 4 (HBM4) and standard LPDDR5X. The cost of this memory component has skyrocketed by 435% in just one year due to price hikes.
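The memory share quoted above can be checked with a few lines of arithmetic. The sketch below takes the article's figures as inputs and assumes that "skyrocketed by 435%" means the cost rose to 5.35 times its earlier level on a like-for-like basis; that reading is an assumption, not something the source spells out.

```python
# Figures quoted in the article (Morgan Stanley bill-of-materials estimate).
rack_cost_usd = 7_800_000
memory_cost_usd = 2_000_000
increase_pct = 435  # assumed to mean "+435%", i.e. 5.35x the earlier cost

memory_share = memory_cost_usd / rack_cost_usd
implied_prior_memory_cost = memory_cost_usd / (1 + increase_pct / 100)

print(f"memory share of rack cost: {memory_share:.1%}")
print(f"implied memory cost a year earlier: ${implied_prior_memory_cost:,.0f}")
```

Under these assumptions, memory is roughly a quarter of the rack's bill of materials.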

This is a signal. Within the increasingly expensive AI machine, money is flowing in large quantities from the "components responsible for computation" to the "components responsible for memory and storage."

Keep this signal in mind, because the subject of this article, DeepSeek, is doing precisely the opposite. While everyone else is being pushed into paying a premium for ever more expensive memory, DeepSeek is working out how to raise token throughput more than fourfold through hardware-software co-design, without giving up competitiveness, which effectively saves 75% of hardware investment.

The ultimate implication of this effort has recently sparked heated discussion: can DeepSeek save China's AI infrastructure build-out one trillion dollars?

Is this really possible?

One Trillion Dollars Saved Through Efficiency

The NVIDIA price tag mentioned earlier represents the hardest cost in the recent AI infrastructure ledger. Under the current supply-demand dynamics, if you want to buy the most advanced AI machines, you must accept this bill.

DeepSeek cannot change that.

What it changes is something else: how many tokens the same machine, with the same $2 million worth of expensive storage hardware, can actually produce.

This question has become particularly concrete following the release of DeepSeek V4.

What is most noteworthy about V4 is not the model itself but the three-pronged approach it demonstrates. First, it keeps compressing "memory" so that long contexts do not overwhelm VRAM. Second, it activates the "body" on demand, so the massive expert model does not have to mobilize fully every time. Third, it turns repeated computation into reusable assets, so context that has already been computed does not burn money again and again.

These techniques share a distinct trait: they involve substantial hardware-software co-design, not just pure software optimization. Hence the joke in circulation that DeepSeek might become China's largest AI hardware company.

Its model page indicates that in scenarios with a 1 million token context, V4-Pro requires only 27% of the single-token inference compute and 10% of the cache footprint compared to the previous generation. For the calculations later in this article, we'll use the approximate value of a quarter of the compute.

Under the traditional approach, this hardware could only support one unit of throughput. However, through long-context compression, on-demand activation, cache reuse, and inference scheduling, DeepSeek can increase the effective token output of the same hardware to four times. The cost isn't "cut," but rather, it's diluted. Tasks that previously required 4 machines might now be handled by 1; the expensive hardware cost that previously had to be fully borne for generating every single token can now be spread across 4 tokens.
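The dilution argument above reduces to a one-line relationship between a throughput multiplier and the hardware needed for a fixed workload. The sketch below is only an illustration of that relationship; the function names are ours, and the 4x multiplier is the article's working assumption rather than a measured value.

```python
import math

def machines_needed(workload_units: float, throughput_multiplier: float) -> int:
    """Machines required when one baseline machine delivers one unit of throughput."""
    return math.ceil(workload_units / throughput_multiplier)

def hardware_saving(throughput_multiplier: float) -> float:
    """Fraction of hardware spend avoided for a fixed workload."""
    return 1 - 1 / throughput_multiplier

baseline = machines_needed(4, 1.0)   # traditional approach
optimized = machines_needed(4, 4.0)  # article's assumed 4x multiplier
print(baseline, optimized, f"{hardware_saving(4.0):.0%}")
```

The same formula shows why the saving flattens as the multiplier grows: going from 4x to 8x raises the avoided share only from 75% to 87.5%.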

This is where DeepSeek truly excels: It hasn't changed NVIDIA's price, but it has changed the output efficiency of NVIDIA's machines in the AI ledger. The significance of this far surpasses a single API price reduction.

And the scale of one trillion dollars is not a baseless assumption.

McKinsey's 2026 report "The Cost of Compute" provides a concrete number: by 2030, global data centers need approximately $6.7 trillion in investment to keep up with compute demand, with the portion dedicated to handling AI workloads consuming about $5.2 trillion.

In other words, the money humanity plans to pour into AI hardware in the coming years is measured in trillions of dollars.

A significant portion of this colossal sum will flow to the most cutting-edge, scarcest hardware, namely HBM high-bandwidth memory and LPDDR memory. What DeepSeek is doing is systematically reducing the Chinese AI industry's dependence on this expensive hardware. Lowering that dependence even partially could save the industry a sum on the order of a trillion dollars.

When China's daily token consumption rises from today's level of more than a hundred trillion to many hundreds or even thousands of trillions, any reduction in per-token cost will be magnified into huge infrastructure savings. If the same throughput truly can be achieved with a quarter of the hardware, then in the foreseeable future it could save China's AI infrastructure close to one trillion dollars in compute hardware investment.

This is a fundamental infrastructure equation: whoever can make the same rigid hardware expenditure produce more tokens is effectively building fewer data centers, buying fewer GPUs, stacking less VRAM—redistributing the future tickets to the AI arena.

So, how does DeepSeek do it? The answer: it performs three "surgeries" on the large model machine.

Two Gas Guzzlers

A common misconception is that the most expensive part of a large model is the "thinking," that is, computation. It is not.

Its two real gas guzzlers are "memory" and "body," and both burn the same, most expensive fuel: High Bandwidth Memory (HBM), memory integrated directly into the GPU package that is extremely fast and extremely expensive.

First, memory. When a large model generates text, it has an awkward characteristic: to spit out each new word, it must look back over all the previous content. Because the meaning of language is built up layer by layer; what to say next depends entirely on what context has already been established.

This is like a simultaneous interpreter. He can't start translating based solely on your last sentence; he must always keep in mind everything you've said before—only by remembering that context can he understand the true direction of the current sentence. The longer you speak, the more he must remember.

To avoid recalculating from scratch for every word (which would be unbearably slow), the model caches intermediate results it has already computed. This archive is called the KV Cache (Key-Value Cache, roughly understood as the model's short-term memory).

The trouble is, it expands crazily as the conversation lengthens.

Take a concrete number. Estimated for one standard architecture, a context of roughly 120,000 words could consume 488GB of HBM for this memory alone. NVIDIA's upcoming top-tier Rubin GPU has 288GB of VRAM per card, so storing this memory alone would fill about 1.7 cards, close to the full VRAM of two of the most advanced GPUs, before the model has started doing any real work.
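For readers who want to see where a number like 488GB can come from, here is a minimal sketch of the usual KV Cache size estimate for standard multi-head attention. The article does not say which architecture it used; the dimensions below (61 layers, 128 heads of size 128, 16-bit values, a 128K-token context) are assumptions chosen because they reproduce the quoted figure.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Keys and values stored for every layer, head and token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value

GIB = 1024 ** 3
# Assumed dimensions (not stated in the article): 61 layers, 128 KV heads,
# head size 128, 16-bit values, 128K-token context.
size = kv_cache_bytes(layers=61, kv_heads=128, head_dim=128, tokens=128 * 1024)
size_gib = size / GIB
# 288 GB of VRAM per Rubin GPU, as quoted; GB and GiB are treated loosely here.
gpus_filled = size_gib / 288
print(f"{size_gib:.0f} GiB, {gpus_filled:.2f} GPUs' worth of VRAM")
```

Because every term multiplies, the cache grows linearly with context length: doubling the context doubles the memory.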

Next, the body. The model's "body" refers to its parameter weights, which can be roughly understood as the carrier of all its knowledge and capabilities. The stronger the capability, the larger the body tends to be, often reaching hundreds of billions or trillions of parameters.

Traditional dense models (Dense Model, meaning models that must use all parameters for any input) have a flaw: no matter what you ask, they must mobilize their entire body. It's like going to a hospital just to see a dentist, but doctors from all departments are called to surround you, examining you from head to toe, and finally the dentist gets their turn. Absurd, but you pay the full bill.

This massive body also needs to reside in expensive HBM, ready at all times.

Memory and body, these two gas guzzlers, firmly anchor the value distribution of the entire hardware system to that most expensive, scarcest, and most externally constrained hardware. Over the past decade-plus, the industry's approach has been simple and crude: if compute power is insufficient, add more; if VRAM is insufficient, add more. Thus, the industry's wealth has become highly concentrated in this most cutting-edge hardware chain, with the fattest profits stuck at the scarcest link.

The price of tokens is thus held hostage by the scarcity of one type of hardware. And DeepSeek's three "surgeries" precisely aim to loosen this constraint.

First Surgery: Operating on the Brain

The first surgery targets "memory," and the incision is made exactly where others least dare to cut: the Attention mechanism (the core mechanism large models use to understand contextual relationships).

The Attention mechanism is the large model's brain. Its ability to comprehend context and grasp key points in long conversations relies entirely on this mechanism repeatedly weighing relationships between words. That expensive memory mentioned earlier is precisely the product of each pulsation of this brain.

Wanting to save memory but wary of the risk, almost everyone chose to work around this brain and operate only on the periphery. From Multi-Query Attention (MQA), proposed in 2019 by Noam Shazeer, one of the original Transformer authors, to Grouped-Query Attention (GQA), proposed by Google in 2023 and widely adopted by models like Llama, the mainstream approach has been to "let multiple query heads share the same memory," in essence "remember less, make do." The space savings are impressive, but they come at some cost to model quality. The consensus on this path has always been compromise: it takes for granted that compression damages quality and haggles only over how much.
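The "share the same memory" idea is easy to quantify: the cache scales with the number of key/value heads, not the number of query heads. The sketch below compares the three layouts for a hypothetical model shape (32 layers, 32 query heads, head size 128); these sizes are illustrative and do not describe any specific model.

```python
def cached_values_per_token(layers: int, kv_heads: int, head_dim: int) -> int:
    """Key and value entries stored per token across all layers."""
    return 2 * layers * kv_heads * head_dim

LAYERS, QUERY_HEADS, HEAD_DIM = 32, 32, 128  # hypothetical model shape

variants = {
    "MHA (one KV head per query head)": QUERY_HEADS,
    "GQA (8 shared KV heads)": 8,
    "MQA (1 shared KV head)": 1,
}
baseline = cached_values_per_token(LAYERS, QUERY_HEADS, HEAD_DIM)
for name, kv_heads in variants.items():
    values = cached_values_per_token(LAYERS, kv_heads, HEAD_DIM)
    print(f"{name}: {values:,} values per token ({values / baseline:.1%} of MHA)")
```

The saving comes purely from storing fewer heads, which is why these methods trade memory against how much distinct information each query head can attend to.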

DeepSeek refuses to compromise. It chooses to operate directly on the brain, transforming the Attention mechanism itself.

Its solution is called Multi-head Latent Attention (MLA), which first appeared in DeepSeek-V2 in 2024. To use an analogy: other models take notes by copying every detail verbatim, filling several large notebooks; MLA first distills the notes into a highly condensed summary, stores only the summary, and reconstructs the details from it when needed. Technically, this is called "low-rank compression": projecting those seemingly vast but highly redundant memories into a much more compact space for storage.
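A toy sketch of the data flow may help make "store the summary, rebuild the details" concrete. It is not DeepSeek's implementation: the matrices here are random rather than learned, the sizes are tiny, and the handling of positional information in the real design is omitted. It only shows what gets cached and what gets recomputed.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
HIDDEN, LATENT, KV_DIM = 64, 8, 64  # toy sizes chosen for illustration

w_down = random_matrix(LATENT, HIDDEN, rng)   # compresses the hidden state
w_up_k = random_matrix(KV_DIM, LATENT, rng)   # rebuilds keys from the latent
w_up_v = random_matrix(KV_DIM, LATENT, rng)   # rebuilds values from the latent

hidden_state = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

latent = matvec(w_down, hidden_state)  # this is all that gets cached
key = matvec(w_up_k, latent)           # recomputed at attention time
value = matvec(w_up_v, latent)

stored_plain = len(key) + len(value)   # what a standard cache would hold
stored_latent = len(latent)
print(f"cached values per token: {stored_latent} instead of {stored_plain}")
```

The design choice is to spend a little extra computation (the two up-projections) in exchange for a much smaller cache, which suits hardware where memory is the scarcer resource.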

How large is the effect? The DeepSeek-V2 paper reports that, compared with its predecessor DeepSeek 67B, V2 is more capable while cutting training cost by 42.5%, shrinking the KV Cache by 93.3%, and raising maximum generation throughput to 5.76 times. The earlier example that consumed 488GB could potentially be compressed to a few GB with this approach.
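To see how a 488GB cache could shrink to single-digit gigabytes, the sketch below compares a standard per-layer cache entry with a latent one. The latent width (512 compressed values plus 64 positional values per token per layer) follows the DeepSeek-V2 configuration as we understand it, and the baseline of 128 heads by 128 dimensions is the same assumption used for the 488GB estimate; treat all of these as assumptions rather than figures from the article.

```python
BASELINE_GB = 488                 # figure quoted earlier in the article
mha_values = 2 * 128 * 128        # keys + values, 128 heads x 128 dims (assumed)
latent_values = 512 + 64          # compressed latent + positional part (assumed)

ratio = latent_values / mha_values
compressed_gb = BASELINE_GB * ratio
print(f"latent cache is {ratio:.2%} of the standard one")
print(f"{BASELINE_GB} GB would shrink to about {compressed_gb:.1f} GB")
```

The 93.3% reduction in the paper is measured against a different baseline model, so the two ratios are not expected to match.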

But what's truly remarkable isn't how much is saved, but that it paid almost no cost in detail loss.

Logically, compressing a book into a one-page summary, no matter how you reconstruct it, cannot recover all details. Yet in experiments published by DeepSeek, this compressed memory not only matched but in some cases slightly outperformed the standard attention that "copies the entire book."

With V4, this approach was pushed to even more extreme long-context scenarios: V4-Pro employs a hybrid attention architecture, and in a 1 million token context setting, it requires only 27% of the inference compute and 10% of the cache footprint compared to the previous generation.

To appreciate how difficult this is, one must understand this is like performing surgery on a flying airplane. Modifying the Attention mechanism means rewriting the model's most fundamental computational logic, retraining the entire model, and redoing the entire service system that supports its operation. A mistake in any link, and the intelligence collapses. This isn't changing a tire valve; it's brain surgery.

And DeepSeek did it, making the AI healthier post-surgery than before.

Second and Third Surgeries: Installing Numbered Storage Lockers

The first surgery tamed memory. The second surgery tackles that massive "body."

DeepSeek didn't originate the idea for this surgery; it continues along a clear, established path: Mixture of Experts (MoE), a structure that splits the model into many "experts," only calling a few of them each time.

The concept dates back to 1991. Shazeer et al. brought it to large-scale deep networks in 2017, and Google's GShard and Switch Transformer later carried it into the Transformer. What truly made it mainstream was Mixtral 8x7B, released in late 2023 by the French company Mistral with nothing but a torrent link. It had about 46.7 billion total parameters, but only about 12.9 billion were activated for each token.

Back to the analogy of seeing a dentist and alarming the entire hospital. MoE turns it into a hospital with clear, specialized departments: you come for dental care, the front desk sends you straight to the dental department, and doctors in other departments go about their business. The hospital's total staff can still be massive, with total parameters in the hundreds of billions or trillions, but only a small fraction is mobilized each time.
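The "front desk" in this analogy is a router that scores the experts and sends each token to only a few of them. Below is a minimal top-k routing sketch; it is a generic illustration of the technique, not DeepSeek's router, and the toy experts and scores are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_scores, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(token, experts, router_scores, top_k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(weight * experts[i](token) for i, weight in route(router_scores, top_k))

# Toy experts: each is a scalar function standing in for a feed-forward block.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]  # hypothetical router output
selected = route(scores, top_k=2)
print("experts used:", [i for i, _ in selected], "of", len(experts))
print("layer output:", moe_layer(1.0, experts, scores))
```

Only two of the eight expert functions are ever called for this token, which is the whole point: compute per token tracks the activated experts, not the total.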

DeepSeek pushed this approach to a quite aggressive scale in V3, and by the V4 era it's even more extreme—V4-Pro has 1.6 trillion total parameters and 49 billion activated parameters; V4-Flash has 284 billion total parameters and 13 billion activated parameters. This means the model's "total body" continues to grow, but the part that actually moves each step is still kept very small.
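The parameter counts quoted above translate directly into the share of the model that is active for each token. The short calculation below uses only figures from the article.

```python
models = {
    # name: (total parameters, activated parameters per token), in billions
    "Mixtral 8x7B": (46.7, 12.9),
    "DeepSeek V4-Flash": (284, 13),
    "DeepSeek V4-Pro": (1600, 49),
}
active_share = {name: active / total for name, (total, active) in models.items()}
for name, share in active_share.items():
    print(f"{name}: {share:.1%} of parameters active per token")
```

By this measure V4-Pro touches roughly 3% of its weights per token, against more than a quarter for Mixtral 8x7B.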

But the true ingenuity of the second surgery lies not just in "mobilizing fewer doctors." It logically transforms how the model accesses this "body."

Here's a more fitting picture: Large models of the past were like a huge but utterly disorganized storage room: everything piled together, and every time you wanted just one thing, you had to swing open the door and rummage through everything from the bottom up to find it. To make this rummaging fast enough to handle the flood of customers, you had no choice but to place the entire storage room in the most expensive "downtown storefront"—the HBM.

DeepSeek transforms this storage room into a cabinet with tens of thousands of numbered compartments. Want to use something? Just pull open the corresponding numbered compartment, never touching the rest. This means you no longer need to pile the entire cabinet's contents in the most expensive storefront. Most compartments not currently in use can be placed in much cheaper standard memory (LPDDR), or in even cheaper solid-state drives, and fetched quickly when needed. Both the DeepSeek ecosystem and open-source inference systems such as SGLang continue to explore this kind of offloading and streaming loading.
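One simple way to picture the numbered cabinet is a small fast tier that holds recently used experts, backed by a larger, cheaper tier. The sketch below uses a least-recently-used policy; that policy, the class name and the sizes are assumptions for illustration and do not describe how DeepSeek or SGLang actually schedule weights.

```python
from collections import OrderedDict

class ExpertCache:
    """Keeps the most recently used experts in a small 'fast' tier."""

    def __init__(self, fast_slots, slow_store):
        self.fast_slots = fast_slots
        self.slow_store = slow_store      # stands in for LPDDR or SSD
        self.fast = OrderedDict()         # stands in for HBM
        self.slow_loads = 0

    def get(self, expert_id):
        if expert_id in self.fast:
            self.fast.move_to_end(expert_id)   # mark as recently used
            return self.fast[expert_id]
        weights = self.slow_store[expert_id]   # slower fetch from the cheap tier
        self.slow_loads += 1
        self.fast[expert_id] = weights
        if len(self.fast) > self.fast_slots:
            self.fast.popitem(last=False)      # evict the least recently used
        return weights

slow_store = {i: f"weights-{i}" for i in range(1000)}  # hypothetical expert pool
cache = ExpertCache(fast_slots=4, slow_store=slow_store)
for expert_id in [3, 7, 3, 3, 42, 7, 3, 900, 3]:
    cache.get(expert_id)
print("fetches from the slow tier:", cache.slow_loads)
```

How well this works in practice depends on how predictable expert usage is and how long a slow-tier fetch takes relative to one decoding step, which is exactly the trade-off the article returns to later.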

At this point, the synergy between the first two surgeries becomes clear: The first compresses "memory," the second numbers the "body" and only fetches the needed compartment. Combined, the portion of the machine that truly needs to occupy the most expensive VRAM at any given moment is compressed to an extremely low level.

The third surgery pushes this "fetch by number" logic to the extreme: even the "computation" action is saved wherever possible. Some computation results can actually be pre-computed and stored as numbered compartments for direct retrieval when needed, instead of being recomputed each time. It's like someone who knows the multiplication tables by heart doesn't count seven times eight on their fingers each time; they just say fifty-six. This substitutes a very low-cost "lookup" (memory read) for a very high-cost "hard computation" (chip compute).
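The lookup-instead-of-recompute idea can be sketched as a cache keyed by the content of a prompt prefix. This is a generic illustration with hypothetical names; `expensive_prefill` is a placeholder for the GPU work of encoding a long context, not a real API.

```python
import hashlib

class PrefixCache:
    """Reuses the result of an expensive computation for prefixes seen before."""

    def __init__(self, compute):
        self.compute = compute
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, prefix: str):
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.compute(prefix)
        return self.store[key]

def expensive_prefill(prefix: str) -> int:
    # Placeholder for the GPU work of encoding a long context.
    return sum(ord(ch) for ch in prefix)

cache = PrefixCache(expensive_prefill)
codebase = "def main(): ..." * 1000     # the same long context each time
for _ in range(100):
    cache.get(codebase)
print(f"computed {cache.misses} time(s), reused {cache.hits} time(s)")
```

The cached result still has to live somewhere, so the saving is a trade of chip compute for memory or storage capacity.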

In V4, this third surgery found a more direct commercial expression: the cache-hit price was set extremely low, and long-context reuse was directly written into the pricing system—repeated computation isn't just something you *can* save technically; you're *encouraged* to save it commercially.

Looking at the three surgeries together, they aren't three isolated things, but a progressive application of the same logic: transforming a messy heap you have to rummage through into a system where everything can be precisely fetched by number. Compress memory to the minimum, activate only the needed parts of the body, and prefer lookup over recomputation. Each surgery makes the machine's consumption of the most expensive hardware a bit smaller; together, for the same workload, its consumption of the most cutting-edge hardware is only a fraction of what it used to be.

How Cheap Does It Get?

In May 2026, DeepSeek announced converting the previous 75% discounted price for V4-Pro into its long-term price, creating a huge gap between the cache-hit price, cache-miss price, and output token price. The importance of the cache-hit price is that it turns DeepSeek's third surgery directly into a business rule: computed context shouldn't be charged repeatedly as "new work."

Placed in a real bill for comparison, the difference becomes concrete. For a medium-scale application processing 1 billion tokens monthly: using DeepSeek V4-Pro, the monthly bill is about $522; using Claude Opus 4.7, about $9,000; using GPT-5.5, about $10,000. That's a 17 to 19-fold difference.
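The fold difference follows directly from the three bills. The calculation below also derives the blended price per million tokens implied by each bill; the article does not give the split between input, output and cached tokens, so these are blended averages only.

```python
monthly_tokens = 1_000_000_000
bills_usd = {"DeepSeek V4-Pro": 522, "Claude Opus 4.7": 9_000, "GPT-5.5": 10_000}

per_million = {name: bill / (monthly_tokens / 1_000_000) for name, bill in bills_usd.items()}
ratios = {name: bill / bills_usd["DeepSeek V4-Pro"] for name, bill in bills_usd.items()}

for name in bills_usd:
    print(f"{name}: ${per_million[name]:.3f} per million tokens, "
          f"{ratios[name]:.1f}x the DeepSeek bill")
```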

Look at an extreme but common scenario: a long-context programming assistant repeatedly reading a 100,000-token codebase one hundred times. Thanks to the almost-free cache-hit pricing, DeepSeek spends only about $0.036 for this task; the same task costs about $5 for both GPT-5.5 and Claude Opus 4.7—a difference of over 100 times.
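A simple cost model shows why a steep cache-hit discount dominates this kind of workload. The prices below are hypothetical round numbers chosen for illustration; they are not any vendor's published price list and are not meant to reproduce the article's $0.036 figure.

```python
def job_cost_usd(context_tokens, reads, miss_price_per_m, hit_price_per_m):
    """First read is billed as a cache miss, every later read as a cache hit."""
    miss_cost = context_tokens / 1e6 * miss_price_per_m
    hit_cost = context_tokens * (reads - 1) / 1e6 * hit_price_per_m
    return miss_cost + hit_cost

# Hypothetical prices in USD per million input tokens, for illustration only.
with_cache = job_cost_usd(100_000, 100, miss_price_per_m=0.50, hit_price_per_m=0.005)
without_cache = job_cost_usd(100_000, 100, miss_price_per_m=0.50, hit_price_per_m=0.50)
print(f"with cache discount: ${with_cache:.4f}")
print(f"without discount:    ${without_cache:.2f}")
```

With 99 of 100 reads billed at the hit price, the total is driven almost entirely by how low that price is set.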

This price is explosively low, but it's not a loss leader; it's how cheaply this modified machine inherently runs—costs meticulously engineered down by the Chinese team. Two years ago, when Liang Wenfeng discussed pricing, he said the principle was "not to subsidize, nor to reap excessive profits." It should be understood this way: when your cost structure is fundamentally not on the same line as others, your pricing naturally isn't in the same range either.

Of course, this modification isn't a guaranteed win. For instance, moving loads to cheaper memory and disks, existing research points out, might incur costs in power consumption, latency, and scheduling complexity from frequent transfers. In some cases, the total system cost per generated token might not necessarily be lower unless hardware, software stacks, and storage media are further optimized. So these three surgeries represent an extremely delicate balancing act, not mindless cost-saving. But the direction is clear: use cheaper, more easily accessible resources to replace that most expensive, most supply-chain-constrained resource.

Turning "One Trillion" into a Tangible Equation

Having talked so much about "saving," let's translate it into a more intuitive picture: how many intelligence computing centers are we not building?

First, token volume. National statistics show that by March 2026, China's daily token call volume had already exceeded 140 trillion, a more than thousand-fold increase compared to early 2024. In industry terms, the Doubao large model alone also saw its daily usage exceed 120 trillion in the same month. While statistical boundaries differ, they collectively indicate one thing: China's AI token consumption has entered the daily operation scale of hundreds of trillions and is rapidly advancing towards thousands of trillions. So, 500 trillion tokens/day can be seen as the next stop in the near future; and 5,000 trillion tokens/day represents a high-volume scenario with widespread adoption of agents, multimodal capabilities, and code generation.

Against this backdrop, looking at the cost of computing centers, DeepSeek's value becomes prominent. In 2025, China Unicom began constructing a thousand-card intelligent computing inference center in Wuhan, with an initial investment of nearly 200 million yuan. We can roughly consider this a sample investment for a thousand-card-scale inference center: about 200 million yuan per center.

Calculated according to DeepSeek V4's efficiency improvements—at least in the long-context scenarios it excels at—the change it brings is not a mere ten-plus percent optimization, but a hardware efficiency improvement of several times. We won't take the most aggressive claim, but a more conservative, easier-to-understand assumption: V4's three-pronged approach increases the effective token throughput of the same hardware by 4 times. That means work that previously required 4 centers might now be done by 1; the 3 centers no longer needed represent savings equivalent to 75% of hardware investment.

Note, DeepSeek isn't simply using less storage. On the contrary, it's making smarter use of storage—using compressed attention, on-demand activation, cache hits, and inference scheduling to utilize the most expensive GPU and VRAM time more intensely. What's truly saved is the additional hardware that would have needed to be purchased for the same token throughput.

So, what does one trillion dollars correspond to? $1 trillion is roughly equivalent to 7 trillion RMB. At 200 million RMB per thousand-card inference center, 7 trillion RMB corresponds to 35,000 such centers. If the V4 approach yields a 4x effective throughput improvement, avoiding the need to build 35,000 equivalent centers would correspond to a daily token flow of approximately 5,000 trillion.
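The chain of conversions in this paragraph can be laid out step by step. The inputs are the article's own round numbers; the per-center token output at the end is not stated in the source and is simply what those numbers imply if every center is a thousand-card site of equal capacity.

```python
USD_TO_RMB = 7                      # rough exchange rate used in the article
savings_rmb = 1e12 * USD_TO_RMB     # one trillion dollars
center_cost_rmb = 200e6             # thousand-card inference center
throughput_multiplier = 4

centers_avoided = savings_rmb / center_cost_rmb
# If 3 of every 4 baseline centers are avoided, the baseline fleet is 4/3 as large.
avoided_fraction = 1 - 1 / throughput_multiplier
baseline_centers = centers_avoided / avoided_fraction
centers_built = baseline_centers - centers_avoided

daily_tokens = 5_000e12
tokens_per_center_per_day = daily_tokens / centers_built
print(f"centers avoided: {centers_avoided:,.0f}")
print(f"centers still built: {centers_built:,.0f}")
print(f"implied output per center: {tokens_per_center_per_day:.3g} tokens/day")
```

Changing any single input, such as the exchange rate, the cost per center or the multiplier, moves the headline figure proportionally, which is why the article treats it as a scale estimate.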

This is the industry landscape corresponding to the "one trillion dollars" mentioned in this article. This isn't a precise calculation from an engineering tender, but a fundamental infrastructure-scale equation, corresponding to a future token volume scenario years from now, not something already realized. Its true purpose is to illustrate: in an era of low call volumes, efficiency gains save a few cards or racks; in an era of daily thousands of trillions of tokens, efficiency gains save tens of thousands of intelligence computing centers that would otherwise have been built.

Therefore, what DeepSeek truly changes isn't the price of a single API call, but the future ledger of AI infrastructure.

It Reverses a Dangerous Trend

Now, back to that machine at the beginning. Remember? Out of the Vera Rubin's $7.8 million, $2 million is locked up in memory, and this portion is seeing frenzied price increases. This reveals a dangerous trend: the entire industry's value is being tied, increasingly and unhealthily, to memory chips. And memory shouldn't be pushed to be this expensive.

A common misconception is that DeepSeek is "adapting" to this trend because it also uses a lot of memory. Quite the opposite, DeepSeek is reversing it. The old method is passively, inefficiently devouring hardware, stacking value upside-down onto chips, letting memory be pushed along by price surges; DeepSeek first drastically reduces the real demand for hardware with its three surgeries, then intelligently allocates the remaining small demand to the cheapest, most suitable tier of storage. The former is "being pushed by prices"; the latter is "first calculating the ledger clearly, then deciding where to spend."

This difference is particularly important for China. Because it shifts the battlefield from a place where we are at a disadvantage to a place where we have a better chance. The most cutting-edge compute chips—we are temporarily behind. But memory and other storage chips are precisely where China has made real progress this year.

China's domestic DRAM leader, ChangXin Memory Technologies (CXMT), reported Q1 2026 revenue of 50.8 billion RMB, with net profit around 25 billion RMB. The company expects first-half net profit to reach 66 to 75 billion RMB, equivalent to earning ByteDance's entire annual net profit from last year in just half a year. Although CXMT still holds only the fourth position in the global DRAM market, this domestic production capacity, which was almost nonexistent before, is finally standing strong this year.

And this is precisely the strategic significance of DeepSeek's three surgeries. This isn't "substituting storage for compute," but reducing marginal dependence on the scarcest compute and shifting some of the pressure to more accessible storage, caching, and systems engineering. When an AI machine relies more on memory, caching, scheduling, and systems engineering—areas where we have a better chance of mastery—China's existing supply chain suddenly shifts from "constrained everywhere" to "sufficient," even "effective." This greatly enhances the security of the entire supply chain.

Conclusion

Someone like Liang Wenfeng, for whom "eliminating inefficiency" is instinct, wouldn't be satisfied with making a single model slightly cheaper. What he is targeting is the greatest inefficiency in the entire AI industry, the premise the whole industry takes as gospel: "to achieve stronger intelligence, one must rely on the most cutting-edge, scarcest, most expensive hardware."

If it can enable the entire industry to accomplish the same things with far less cutting-edge hardware, what it creates for the industry is, in effect, a trillion-dollar-scale virtual production base: it occupies not an inch of factory space, yet it genuinely frees up the massive investment that would otherwise have been poured into hardware. That "one trillion" thus ceases to be a valuation story and becomes a fundamental infrastructure equation.

Portraying DeepSeek as "using algorithms to eliminate NVIDIA" is another cheap myth. But if we rephrase the question, the answer becomes interesting: Is it possible for DeepSeek to make the industry purchase less of the most expensive hardware, occupy less of the scarcest VRAM, and pay less of the inference cost previously considered a given? Yes. Is it possible for DeepSeek to redistribute the value of AI infrastructure from a singular high-end GPU narrative to model architecture, inference systems, cache management, storage scheduling, and engineering optimization? Also yes. This is its true industrial significance.

Genuine technological revolutions often aren't about making everything more expensive, but about suddenly making what was once affordable only to a few into everyday infrastructure affordable to the majority. From a broader perspective, what truly matters in this grand scheme isn't how much money is saved, but that the act of saving money quietly redistributes the tickets to the future back into the hands of China's myriad industries that need to be empowered by AI.

(This article is based on public information and industry discussion. Some forward-looking judgments herein, such as the trillion-dollar-scale infrastructure substitution value, hardware efficiency trade-offs, and equivalent cost calculations, belong to industry projections and debated viewpoints, not established facts. Readers are advised to view them with discretion.)

This article is from the WeChat public account "Hushuo Chengli" (胡說成理), author: Hu Zhe.

Related Questions

Q: According to the article, what is the core technological path DeepSeek is taking to reduce AI infrastructure costs?

A: DeepSeek's core technological path involves three key optimizations: 1) Compressing 'memory' (KV Cache) through advanced attention mechanisms like MLA, drastically reducing high-bandwidth memory (HBM) requirements for long contexts. 2) Using a Mixture of Experts (MoE) architecture to activate only a small fraction of the model's total parameters ('body') for each token. 3) Implementing caching and reuse of computed results to avoid redundant calculations, turning expensive computation into cheap memory lookups. Together, these 'three surgeries' significantly reduce dependency on the most expensive and scarce hardware components like HBM.

Q: What key industry trend does the article suggest DeepSeek is attempting to reverse?

A: The article suggests DeepSeek is attempting to reverse a dangerous trend where the value and cost of AI infrastructure are becoming increasingly and unhealthily concentrated on the most expensive and scarce components, particularly high-bandwidth memory (HBM). While traditional approaches passively accept rising memory costs and inefficiently consume hardware, DeepSeek's optimizations first drastically reduce the real demand for such premium hardware and then intelligently allocate the remaining needs to cheaper, more suitable storage tiers. This shifts the battlefield from an area of disadvantage (cutting-edge compute chips) to areas where China has growing strength, like memory chips and systems engineering.

Q: How does the article justify the hypothetical figure of 'saving $1 trillion' for China's AI infrastructure?

A: The $1 trillion saving is presented as a large-scale infrastructure accounting projection, not a precise current calculation. It's based on: 1) Projections like McKinsey's estimate of $5.2 trillion in global AI hardware investment needed by 2030. 2) The rapid growth of China's daily token consumption, expected to reach levels like 5,000 trillion tokens/day. 3) DeepSeek V4's claimed efficiency improvements, conservatively estimated as a 4x increase in effective token throughput per unit of hardware. If the same throughput can be achieved with 1/4 of the hardware, then for massive future token volumes, it could theoretically save the equivalent cost of building tens of thousands of AI compute centers in China, aggregating to a scale of approximately $1 trillion (7 trillion RMB) in avoided hardware investment.

Q: What are the 'two gas guzzlers' of large language models identified in the article, and what expensive resource do they consume?

A: The 'two gas guzzlers' are: 1) 'Memory' (记性): This refers to the Key-Value (KV) Cache, the model's short-term memory of the context. It expands dramatically with conversation length. 2) 'Body' (身体): This refers to the model's parameter weights, its repository of knowledge and capability. Traditional dense models activate the entire 'body' for every token. Both of these components primarily consume and must reside in the most expensive and scarce resource: High-Bandwidth Memory (HBM) integrated within GPU packages.

Q: What is the strategic significance of DeepSeek's approach for China's AI industry, beyond just cost savings?

A: Beyond cost savings, the strategic significance lies in reshaping the competitive landscape and enhancing supply chain security. DeepSeek's approach reduces the marginal dependency on the most scarce and geopolitically sensitive components (like cutting-edge GPUs and HBM where China trails). It shifts more of the performance burden to areas where China has growing capabilities, such as DRAM memory chips (e.g., CXMT), caching systems, storage scheduling, and systems engineering. This makes the entire AI supply chain less vulnerable to external constraints and more reliant on domestically improvable and obtainable resources, thereby increasing the overall security and sustainability of China's AI industry development.
