Can DeepSeek Save China One Trillion Dollars?

marsbit | Published: 2026-06-03 | Last updated: 2026-06-03

Abstract

"DeepSeek and the $1 Trillion Infrastructure Question" The article examines whether DeepSeek's AI efficiency breakthroughs could save China $1 trillion in future AI infrastructure costs. The analysis begins with Nvidia's upcoming Vera Rubin AI platform, with an estimated bill of materials of ~$7.8 million. Memory (HBM4/LPDDR5X) accounts for $2 million of that after a 435% cost increase in one year, showing how AI hardware spending is shifting toward expensive memory components. DeepSeek's approach works in the opposite direction. Through three key technical innovations showcased in DeepSeek V4, the company dramatically improves hardware efficiency:

1. **Memory Compression (MLA)**: Re-engineers the attention mechanism to compress long-context memory (the KV Cache) by over 90%, drastically reducing expensive HBM usage.
2. **Selective Activation (MoE)**: Uses a Mixture-of-Experts architecture in which only a small fraction of parameters (e.g., 49B out of 1.6T in V4-Pro) is activated per token, allowing most parameters to reside in cheaper memory or SSD.
3. **Computation Caching**: Reuses previously computed results through cache hits, replacing expensive GPU computation with cheap memory reads.

Combined, these optimizations allow the same hardware to produce approximately 4x more tokens, effectively reducing the required hardware investment by 75%. DeepSeek's pricing reflects this: a 1-billion-token monthly workload costs ~$522 versus ~$9,000-$10,000 for competitors. The $1 trillion savings projection stems...

In the second half of 2026, NVIDIA will deliver its most powerful AI platform to date: the Vera Rubin VR200 NVL72. This single rack is packed with 72 Rubin GPUs and 36 Vera CPUs. Morgan Stanley estimates the bill-of-materials cost for this machine to be approximately $7.8 million.

That number is already staggering. But what's more crucial to watch is where the money is being spent.

Out of this $7.8 million, about $2 million isn't spent on the world-famous GPU chip or the compute cores. Instead, it's spent on memory—High Bandwidth Memory 4 (HBM4) and standard LPDDR5X. The cost of this memory component has skyrocketed by 435% in just one year due to price hikes.

This is a signal. Inside the increasingly expensive AI machine, a growing share of the money is shifting from the "components responsible for computation" to the "components responsible for memory and storage."

Remember this signal, because the subject of this article, DeepSeek, is doing precisely the opposite. While the whole industry is being forced to pay premiums for ever more expensive memory, DeepSeek is working out how to use hardware-software co-design to raise token throughput by more than 4x without sacrificing competitiveness, effectively saving 75% of the hardware investment.

And the ultimate implication of this effort has sparked a recent fervent discussion—can DeepSeek's efforts save one trillion dollars for China's AI infrastructure development?

Is this really possible?

One Trillion Dollars Saved Through Efficiency

The NVIDIA price tag mentioned earlier is the most rigid cost in today's AI infrastructure ledger. Under current supply-demand conditions, anyone who wants the most advanced AI machines has to accept this bill.

DeepSeek cannot change that.

What it changes is something else: how many tokens the same machine, with the same $2 million worth of expensive storage hardware, can actually produce.

This question has become particularly concrete following the release of DeepSeek V4.

What is even more noteworthy about V4 is not just the model itself but the three-pronged approach it demonstrates: first, keep compressing "memory" so that long contexts do not overwhelm VRAM; second, activate the "body" on demand so that the massive expert model does not have to mobilize fully every time; third, turn repeated computation into reusable assets so that already-computed context does not burn money again and again.

A distinctive trait runs through these techniques: they involve substantial hardware-software co-design, not just pure software optimization. Hence the joke now circulating that DeepSeek might become China's largest AI hardware company.

Its model page indicates that in scenarios with a 1 million token context, V4-Pro requires only 27% of the single-token inference compute and 10% of the cache footprint compared to the previous generation. For the calculations later in this article, we'll use the approximate value of a quarter of the compute.

Under the traditional approach, this hardware could only support one unit of throughput. However, through long-context compression, on-demand activation, cache reuse, and inference scheduling, DeepSeek can increase the effective token output of the same hardware to four times. The cost isn't "cut," but rather, it's diluted. Tasks that previously required 4 machines might now be handled by 1; the expensive hardware cost that previously had to be fully borne for generating every single token can now be spread across 4 tokens.
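
The "dilution" arithmetic can be made concrete with a minimal sketch. Only the $7.8 million bill-of-materials figure and the 4x multiplier come from the text; the four-unit workload and the lifetime output of 10^12 tokens per rack are hypothetical values chosen for illustration.

```python
import math

def machines_needed(workload: float, per_machine: float, multiplier: float = 1.0) -> int:
    """Machines required when each machine's effective output is scaled by `multiplier`."""
    return math.ceil(workload / (per_machine * multiplier))

def hardware_cost_per_token(machine_cost: float, lifetime_tokens: float,
                            multiplier: float = 1.0) -> float:
    """Machine cost spread over every token it produces during its lifetime."""
    return machine_cost / (lifetime_tokens * multiplier)

RACK_COST = 7.8e6          # Morgan Stanley BOM estimate quoted in the text
LIFETIME_TOKENS = 1e12     # hypothetical lifetime output per rack at baseline efficiency

before = machines_needed(workload=4, per_machine=1)                # 4 machines
after = machines_needed(workload=4, per_machine=1, multiplier=4)   # 1 machine
print(before, after, 1 - after / before)                           # 4 1 0.75
print(hardware_cost_per_token(RACK_COST, LIFETIME_TOKENS),
      hardware_cost_per_token(RACK_COST, LIFETIME_TOKENS, multiplier=4))
```

The price of the rack never changes in this model; only the denominator (tokens produced) grows, which is exactly the "cost is diluted, not cut" point.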

This is where DeepSeek truly excels: It hasn't changed NVIDIA's price, but it has changed the output efficiency of NVIDIA's machines in the AI ledger. The significance of this far surpasses a single API price reduction.

And the scale of one trillion dollars is not a baseless assumption.

McKinsey's 2026 report "The Cost of Compute" provides a concrete number: by 2030, global data centers need approximately $6.7 trillion in investment to keep up with compute demand, with the portion dedicated to handling AI workloads consuming about $5.2 trillion.

In other words, the money humanity plans to pour into AI hardware in the coming years is measured in trillions of dollars.

A significant portion of this colossal sum will flow to the most cutting-edge, scarcest hardware, namely HBM and LPDDR memory. What DeepSeek is doing is systematically reducing the Chinese AI industry's dependence on this expensive hardware. Lowering that dependence even partially could save the industry a sum on the order of a trillion dollars.

When China's daily token consumption grows from today's level of more than a hundred trillion to several hundred, or even several thousand, trillion, any reduction in per-token cost will be magnified into huge infrastructure savings. If the same throughput truly can be achieved with a quarter of the hardware, then in the foreseeable future it could save China's AI infrastructure close to one trillion dollars in compute hardware investment.

This is a fundamental infrastructure equation: whoever can make the same rigid hardware spending produce more tokens is effectively building fewer data centers, buying fewer GPUs, and stacking less VRAM, and in doing so is redistributing the tickets to the future AI arena.

So, how does DeepSeek do it? The answer: it performs three "surgeries" on the large model machine.

Two Gas Guzzlers

A common misconception is that the most expensive part of a large model is "thinking," that is, computation. It is not.

Its two real gas guzzlers are "memory" and "body," and both burn the same, most expensive fuel: High Bandwidth Memory (HBM), memory integrated directly into the GPU package that is extremely fast and extremely expensive.

First, memory. When a large model generates text, it has an awkward characteristic: to produce each new word, it must look back over all the preceding content. This is because the meaning of language builds up layer by layer; what to say next depends entirely on the context already established.

This is like a simultaneous interpreter. He can't start translating based solely on your last sentence; he must always keep in mind everything you've said before—only by remembering that context can he understand the true direction of the current sentence. The longer you speak, the more he must remember.

To avoid recalculating from scratch for every word (which would be unbearably slow), the model caches intermediate results it has already computed. This archive is called the KV Cache (Key-Value Cache, roughly understood as the model's short-term memory).
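
To make the KV Cache idea concrete, here is a minimal single-head sketch of cached decoding. The toy vectors stand in for the key, value, and query projections a real model would compute from each token (an assumption for brevity); the point is that each step adds only the new token's key and value and reads every earlier one from the cache, which is why the cache grows with every token.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Scaled dot-product attention of one query over every cached key/value."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, k) * scale for k in keys]
    peak = max(scores)                       # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

class KVCache:
    """Short-term memory: one key and one value vector per token seen so far."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

def decode_step(cache, query, k, v):
    # Add only the new token's key/value, then attend over the whole cache.
    cache.append(k, v)
    return attend(query, cache.keys, cache.values)

cache = KVCache()
toy_tokens = [([1.0, 0.0], [0.5, 2.0]), ([0.0, 1.0], [1.0, 1.0]), ([1.0, 1.0], [2.0, 0.0])]
for key, value in toy_tokens:
    decode_step(cache, query=key, k=key, v=value)   # toy: the key doubles as the query
print(len(cache))  # 3
```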

The trouble is, it expands crazily as the conversation lengthens.

Take a concrete number: by one estimate for a standard architecture, processing a context of roughly 120,000 words could consume 488GB of HBM for this memory alone. NVIDIA's upcoming top-tier Rubin GPU has 288GB of VRAM per card. Just storing this memory would therefore fill about 1.7 of the most advanced GPUs' entire VRAM (488 / 288 ≈ 1.7), and the model has not even started doing real work yet.
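
The usual back-of-the-envelope formula multiplies two tensors (keys and values) by layers, KV heads, head dimension, context length, and bytes per element. The configuration below is hypothetical (60 layers, 128 KV heads of dimension 128, 16-bit values, and 120,000 tokens standing in for the article's 120,000 words); it is not the architecture behind the article's 488GB estimate, but it lands in the same range.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # Factor 2: one key vector and one value vector per layer, KV head, and token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

GB = 10**9
RUBIN_HBM_GB = 288  # per-GPU capacity quoted in the text

size = kv_cache_bytes(layers=60, kv_heads=128, head_dim=128, seq_len=120_000)
print(f"{size / GB:.1f} GB, about {size / (RUBIN_HBM_GB * GB):.2f} GPUs' worth of HBM")
# 471.9 GB, about 1.64 GPUs' worth of HBM
```

Every factor is linear, so doubling the context doubles the cache, and any technique that shrinks `kv_heads` or the per-head width shrinks it proportionally.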

Next, the body. The model's "body" refers to its parameter weights, which can be roughly understood as the carrier of all its knowledge and capabilities. The stronger the capability, the larger the body tends to be, often reaching hundreds of billions or trillions of parameters.

Traditional dense models (models that must use all of their parameters for every input) have a flaw: no matter what you ask, they must mobilize their entire body. It's like going to a hospital just to see a dentist, only to have doctors from every department surround you and examine you from head to toe before the dentist finally gets a turn. Absurd, but you pay the full bill.

This massive body also needs to reside in expensive HBM, ready at all times.

Memory and body, these two gas guzzlers, firmly anchor the value of the entire hardware system to the most expensive, scarcest, and most externally constrained hardware. For more than a decade, the industry's approach has been brute force: if compute is insufficient, add more; if VRAM is insufficient, add more. As a result, the industry's wealth has become highly concentrated in this most cutting-edge hardware chain, with the fattest profits captured at the scarcest link.

The price of tokens is thus held hostage by the scarcity of one type of hardware. And DeepSeek's three "surgeries" precisely aim to loosen this constraint.

First Surgery: Operating on the Brain

The first surgery targets "memory," and the incision is made precisely where almost no one dares to cut: the Attention mechanism (the core mechanism large models use to understand relationships within the context).

The Attention mechanism is the large model's brain. Its ability to understand context and grasp key points in long conversations relies entirely on this mechanism repeatedly weighing the relationships between words. The expensive memory described earlier, the KV Cache, is exactly what this brain produces with every beat.

Wanting to save memory but fearing the risks, almost everyone chooses to bypass this brain, operating only on the periphery. From Multi-Query Attention (MQA) proposed in 2019 by Noam Shazeer, one of the original Transformer authors, to Grouped-Query Attention (GQA) proposed by Google in 2023 and widely adopted by models like Llama, the mainstream approach has always been "let multiple query heads share the same memory"—essentially "remember less, make do." The space-saving effect is impressive, but the cost is compromised model quality. Frankly, the consensus of this path has always been "compromise": defaulting that compression inevitably damages quality, only haggling over how much damage.
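
The "let multiple query heads share the same memory" idea behind MQA and GQA comes down to an index mapping plus a smaller cache. The sketch below is generic; the 32-query-head layout is an arbitrary illustrative choice, not any particular model's configuration.

```python
def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Index of the shared key/value head that a given query head reads."""
    if n_query_heads % n_kv_heads:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

def cache_fraction(n_query_heads: int, n_kv_heads: int) -> float:
    """KV-cache size relative to multi-head attention, which keeps one KV head per query head."""
    return n_kv_heads / n_query_heads

N_QUERY_HEADS = 32  # illustrative only
for name, n_kv in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    mapping = [kv_head_for(q, N_QUERY_HEADS, n_kv) for q in range(8)]
    print(name, cache_fraction(N_QUERY_HEADS, n_kv), mapping)
# MHA keeps the full cache (1.0), GQA with 8 groups keeps 0.25, MQA keeps 1/32
```

The saving comes entirely from storing fewer distinct keys and values, which is why this family of methods trades memory against expressiveness.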

DeepSeek refuses to compromise. It chooses to operate directly on the brain, transforming the Attention mechanism itself.

Its solution is called Multi-head Latent Attention (MLA), first appearing in DeepSeek-V2 in 2024. To use an analogy: other models take notes by copying every detail verbatim, filling several large notebooks; MLA first distills the notes into a highly condensed summary, stores only the summary, and when needed, accurately reconstructs details based on the summary. Technically, this is called "low-rank compression"—projecting those seemingly vast, actually highly redundant memories into a much more compact space for storage.
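
The "store the summary, reconstruct the details" idea can be sketched as a cache that keeps one small latent vector per token and rebuilds keys and values from it on demand. This is a simplified illustration, assuming a single head, toy weights, and no decoupled rotary-position key; real implementations learn the projections and can fold the up-projections into the query and output matrices instead of rebuilding K and V explicitly. The per-token comparison uses dimensions modeled on those reported for DeepSeek-V2 (60 layers, 128 heads of size 128, a 512-dim latent, a 64-dim rotary key); treat them as assumptions.

```python
def matvec(matrix, vec):
    """Plain-Python matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

class LatentKVCache:
    """Caches one compressed latent per token and rebuilds keys/values on demand."""
    def __init__(self, w_down, w_up_k, w_up_v):
        self.w_down, self.w_up_k, self.w_up_v = w_down, w_up_k, w_up_v
        self.latents = []

    def append(self, hidden):
        # Only the low-rank latent c = W_down @ h is stored.
        self.latents.append(matvec(self.w_down, hidden))

    def keys_values(self):
        # Keys and values are reconstructed from the latents when attention needs them.
        keys = [matvec(self.w_up_k, c) for c in self.latents]
        values = [matvec(self.w_up_v, c) for c in self.latents]
        return keys, values

def elems_per_token_mha(n_heads, head_dim, layers):
    return 2 * n_heads * head_dim * layers      # full K and V for every head

def elems_per_token_mla(latent_dim, rope_dim, layers):
    return (latent_dim + rope_dim) * layers     # one latent plus a small positional key

# Toy 4-dim hidden state compressed to a 2-dim latent (lossless only because the toy is tiny).
cache = LatentKVCache(
    w_down=[[1, 0, 0, 0], [0, 1, 0, 0]],
    w_up_k=[[1, 0], [0, 1], [0, 0], [0, 0]],
    w_up_v=[[0, 0], [0, 0], [1, 0], [0, 1]],
)
cache.append([3, 4, 5, 6])
print(cache.latents, cache.keys_values())

ratio = elems_per_token_mla(512, 64, 60) / elems_per_token_mha(128, 128, 60)
print(f"MLA cache is {ratio:.2%} of the MHA cache")  # MLA cache is 1.76% of the MHA cache
```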

How striking is the effect? The DeepSeek-V2 paper reports that, compared with its predecessor DeepSeek 67B, V2 was more capable while cutting training cost by 42.5% and the KV Cache by 93.3%, and raising maximum generation throughput to 5.76 times. Applied to the earlier example, the 93.3% figure alone would shrink the 488GB to roughly 33GB, a fraction of a single GPU's 288GB, and against a full multi-head baseline the savings can be larger still.

But what's truly remarkable isn't how much is saved, but that it paid almost no cost in detail loss.

Logically, compressing a book into a one-page summary, no matter how you reconstruct it, cannot recover all details. Yet in experiments published by DeepSeek, this compressed memory not only matched but in some cases slightly outperformed the standard attention that "copies the entire book."

With V4, this approach was pushed to even more extreme long-context scenarios: V4-Pro employs a hybrid attention architecture, and in a 1 million token context setting, it requires only 27% of the inference compute and 10% of the cache footprint compared to the previous generation.

To appreciate how hard this is: it is like performing surgery on an airplane in flight. Modifying the Attention mechanism means rewriting the model's most fundamental computational logic, retraining the entire model, and rebuilding the entire serving system that runs it. A mistake in any link, and the intelligence collapses. This isn't changing a tire valve; it's brain surgery.

And DeepSeek did it, making the AI healthier post-surgery than before.

Second and Third Surgeries: Installing Numbered Storage Lockers

The first surgery tamed memory. The second surgery tackles that massive "body."

DeepSeek didn't originate the idea for this surgery; it continues along a clear, established path: Mixture of Experts (MoE), a structure that splits the model into many "experts," only calling a few of them each time.

The concept dates back to 1991; Shazeer et al. scaled it up as a sparsely gated layer in deep neural networks in 2017, and Google's GShard and Switch Transformer later brought it into the Transformer. What truly made it mainstream was Mixtral 8x7B, released in late 2023 by the French company Mistral with nothing more than a torrent link. Its total parameters were about 46.7 billion, but only about 12.9 billion were activated for each token.

Back to the "see a dentist and alarm the entire hospital" analogy. What MoE does is turn it into a hospital with clear, specialized departments: you come for dental care, the front desk sends you straight to the dental department, and doctors in other departments go about their business. The hospital's total staff can still be massive, with total parameters in the hundreds of billions or trillions, but only a small fraction is mobilized each time.
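
The "front desk" in the analogy is the router. A minimal sketch of generic top-k gating, in the softmax-over-selected-experts style popularized by Mixtral, might look like this. The four toy experts and the fixed router scores are hypothetical, and DeepSeek's own routing (for example, its use of shared experts) differs in details this sketch does not cover.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Select the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))

def moe_forward(x, experts, router_logits, k=2):
    """Combine only the chosen experts' outputs; unchosen experts are never evaluated."""
    out = [0.0] * len(x)
    for idx, gate in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

calls = [0, 0, 0, 0]

def make_expert(i):
    def expert(x):
        calls[i] += 1                      # count how often each expert actually runs
        return [(i + 1) * v for v in x]    # toy expert: scale the input
    return expert

experts = [make_expert(i) for i in range(4)]
# In a real model the logits come from a learned router layer applied to x.
y = moe_forward([1.0, 2.0], experts, router_logits=[0.1, 2.0, -1.0, 2.0], k=2)
print(y, calls)  # [3.0, 6.0] [0, 1, 0, 1]
```

Only two of the four experts run; the other two cost nothing for this token, which is the whole point of the "specialized departments" design.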

DeepSeek pushed this approach to a quite aggressive scale in V3, and by the V4 era it's even more extreme—V4-Pro has 1.6 trillion total parameters and 49 billion activated parameters; V4-Flash has 284 billion total parameters and 13 billion activated parameters. This means the model's "total body" continues to grow, but the part that actually moves each step is still kept very small.
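
Using only the parameter counts quoted in this article, the share of each model that actually works on a given token is a one-line ratio:

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a model's parameters used for each token."""
    return active_params / total_params

models = {  # (total, active) parameter counts quoted in the text
    "Mixtral 8x7B": (46.7e9, 12.9e9),
    "DeepSeek V4-Pro": (1.6e12, 49e9),
    "DeepSeek V4-Flash": (284e9, 13e9),
}
for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} active per token")
# Mixtral ~27.6%, V4-Pro ~3.1%, V4-Flash ~4.6%
```

By this measure, V4-Pro activates roughly a ninth of the parameter share that Mixtral did, even though its total size is about 34 times larger.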

But the true ingenuity of the second surgery lies not just in "mobilizing fewer doctors." It logically transforms how the model accesses this "body."

Here's a more fitting picture: Large models of the past were like a huge but utterly disorganized storage room: everything piled together, and every time you wanted just one thing, you had to swing open the door and rummage through everything from the bottom up to find it. To make this rummaging fast enough to handle the flood of customers, you had no choice but to place the entire storage room in the most expensive "downtown storefront"—the HBM.

DeepSeek transforms this storage room into a cabinet with tens of thousands of numbered compartments. Want to use something? Just pull open the corresponding compartment directly, never touching the rest. This means you no longer need to keep the entire cabinet's contents in the most expensive storefront. Most temporarily unused compartments can be placed in much cheaper standard memory (LPDDR), or even cheaper solid-state drives, and quickly fetched when needed. Both the DeepSeek ecosystem and open-source inference systems such as SGLang are actively exploring this kind of offloading and streaming.
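
The "numbered compartments" picture can be sketched as a small least-recently-used cache: a fixed number of expert slots in fast memory, backed by a larger, cheaper tier. This is a hypothetical illustration of the general offloading idea, not DeepSeek's or SGLang's implementation; real systems also prefetch experts and overlap transfers with computation.

```python
from collections import OrderedDict

class TieredExpertStore:
    """Keeps at most `hbm_slots` experts in fast memory; the rest stay in a slow tier.

    Hypothetical sketch: plain dicts stand in for GPU HBM and for cheaper DRAM/SSD.
    """
    def __init__(self, slow_tier, hbm_slots):
        self.slow = slow_tier
        self.hbm = OrderedDict()
        self.slots = hbm_slots
        self.fetches = 0

    def get(self, expert_id):
        if expert_id in self.hbm:
            self.hbm.move_to_end(expert_id)   # hit: mark as most recently used
            return self.hbm[expert_id]
        self.fetches += 1                     # miss: copy the expert in from the slow tier
        weights = self.slow[expert_id]
        self.hbm[expert_id] = weights
        if len(self.hbm) > self.slots:
            self.hbm.popitem(last=False)      # evict the least recently used expert
        return weights

slow_tier = {i: f"weights-of-expert-{i}" for i in range(64)}  # hypothetical DRAM/SSD tier
store = TieredExpertStore(slow_tier, hbm_slots=4)
for expert_id in [1, 2, 1, 3, 1, 2]:
    store.get(expert_id)
print(store.fetches, list(store.hbm))  # 3 [3, 1, 2]
```

Only the compartments a token actually asks for are pulled into fast memory; everything else stays in the cheap tier until it is needed.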

At this point, the synergy between the first two surgeries becomes clear: The first compresses "memory," the second numbers the "body" and only fetches the needed compartment. Combined, the portion of the machine that truly needs to occupy the most expensive VRAM at any given moment is compressed to an extremely low level.

The third surgery pushes this "fetch by number" logic to the extreme: even the "computation" action is saved wherever possible. Some computation results can actually be pre-computed and stored as numbered compartments for direct retrieval when needed, instead of being recomputed each time. It's like someone who knows the multiplication tables by heart doesn't count seven times eight on their fingers each time; they just say fifty-six. This substitutes a very low-cost "lookup" (memory read) for a very high-cost "hard computation" (chip compute).

In V4, this third surgery found a more direct commercial expression: the cache-hit price was set extremely low, and long-context reuse was directly written into the pricing system—repeated computation isn't just something you *can* save technically; you're *encouraged* to save it commercially.
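
The billing rule rewards a simple mechanism: if a request starts with tokens that were already processed, the stored work for that prefix is reused and only the new suffix is computed. The sketch below is a hypothetical, deliberately naive version that records every prefix in a set (memory grows quadratically with length; production inference engines typically index prefixes with hashed blocks or a radix tree). The 1,000-token "codebase" and one-token questions are illustrative.

```python
class PrefixCache:
    """Counts how many tokens are recomputed versus reused across requests."""
    def __init__(self):
        self.seen_prefixes = set()
        self.computed_tokens = 0
        self.reused_tokens = 0

    def process(self, tokens):
        tokens = tuple(tokens)
        # Longest prefix already processed by an earlier request.
        hit = 0
        for end in range(len(tokens), 0, -1):
            if tokens[:end] in self.seen_prefixes:
                hit = end
                break
        self.reused_tokens += hit
        self.computed_tokens += len(tokens) - hit
        # Remember the newly computed prefixes for future requests.
        for end in range(hit + 1, len(tokens) + 1):
            self.seen_prefixes.add(tokens[:end])

codebase = list(range(1_000))            # hypothetical shared context
cache = PrefixCache()
for question in range(100):
    cache.process(codebase + [10_000 + question])
print(cache.computed_tokens, cache.reused_tokens)  # 1100 99000
```

After the first request, each repeat pays for one new token instead of 1,001, which is the behavior a low cache-hit price is designed to reward.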

Looking at the three surgeries together, they aren't three isolated things, but a progressive application of the same logic: transforming a messy heap you have to rummage through into a system where everything can be precisely fetched by number. Compress memory to the minimum, activate only the needed parts of the body, and prefer lookup over recomputation. Each surgery makes the machine's consumption of the most expensive hardware a bit smaller; together, for the same workload, its consumption of the most cutting-edge hardware is only a fraction of what it used to be.

How Cheap Does It Get?

In May 2026, DeepSeek announced converting the previous 75% discounted price for V4-Pro into its long-term price, creating a huge gap between the cache-hit price, cache-miss price, and output token price. The importance of the cache-hit price is that it turns DeepSeek's third surgery directly into a business rule: computed context shouldn't be charged repeatedly as "new work."

Placed in a real bill for comparison, the difference becomes concrete. For a medium-scale application processing 1 billion tokens monthly: using DeepSeek V4-Pro, the monthly bill is about $522; using Claude Opus 4.7, about $9,000; using GPT-5.5, about $10,000. That's a 17 to 19-fold difference.
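
Dividing the quoted bills by the token volume gives the implied blended price per million tokens and reproduces the 17x to 19x gap. The blended rates are derived here, not published list prices, since the article does not give the input/output mix behind its bills.

```python
MONTHLY_TOKENS = 1_000_000_000
monthly_bill_usd = {"DeepSeek V4-Pro": 522, "Claude Opus 4.7": 9_000, "GPT-5.5": 10_000}  # from the text

def blended_rate_per_million(bill_usd: float, tokens: int) -> float:
    """Average price per million tokens, mixing input, cached, and output tokens."""
    return bill_usd / (tokens / 1_000_000)

base = monthly_bill_usd["DeepSeek V4-Pro"]
for name, bill in monthly_bill_usd.items():
    rate = blended_rate_per_million(bill, MONTHLY_TOKENS)
    print(f"{name}: ${rate:.3f} per million tokens, {bill / base:.1f}x DeepSeek")
```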

Look at an extreme but common scenario: a long-context programming assistant repeatedly reading a 100,000-token codebase one hundred times. Thanks to the almost-free cache-hit pricing, DeepSeek spends only about $0.036 for this task; the same task costs about $5 for both GPT-5.5 and Claude Opus 4.7—a difference of over 100 times.
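
The same division applied to the repeated-read scenario shows the scale of the gap. The per-million figures below are implied by the article's totals, not quoted prices; the difference presumably comes from most of the 10 million tokens being billed at DeepSeek's cache-hit rate.

```python
CONTEXT_TOKENS = 100_000
READS = 100
total_millions = CONTEXT_TOKENS * READS / 1_000_000   # 10 million tokens read in total

task_cost_usd = {"DeepSeek V4-Pro": 0.036, "GPT-5.5": 5.0, "Claude Opus 4.7": 5.0}  # from the text
for name, usd in task_cost_usd.items():
    print(f"{name}: ${usd / total_millions:.4f} per million tokens read")
print(f"gap: about {task_cost_usd['GPT-5.5'] / task_cost_usd['DeepSeek V4-Pro']:.0f}x")
```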

This price is strikingly low, but it is not a loss leader; it reflects how cheaply this modified machine inherently runs, with costs meticulously engineered down by the Chinese team. Two years ago, when Liang Wenfeng discussed pricing, he said the principle was "neither to subsidize nor to reap excessive profits." The way to read it: when your cost structure is fundamentally different from everyone else's, your pricing naturally falls in a different range too.

Of course, this modification is not a guaranteed win. Existing research points out, for instance, that moving loads to cheaper memory and disks can incur costs in power consumption, latency, and scheduling complexity from frequent transfers. In some cases, the total system cost per generated token may not actually be lower unless the hardware, software stack, and storage media are further optimized. So these three surgeries are an extremely delicate balancing act, not mindless cost-cutting. But the direction is clear: use cheaper, more accessible resources to replace the most expensive, most supply-chain-constrained one.
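
One way to see the latency cost of offloading is to compare how long it takes to move the same block of weights from each tier. Every number below is a hypothetical round figure chosen for illustration (a 100 MB expert and order-of-magnitude bandwidths), not a measured or vendor-specified value, and the model ignores per-transfer latency and any overlap with computation.

```python
def transfer_ms(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Milliseconds to move num_bytes at a sustained bandwidth in decimal GB/s."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3

EXPERT_BYTES = 100e6  # hypothetical expert size
bandwidth_gb_s = {     # hypothetical order-of-magnitude bandwidths
    "HBM": 8_000,
    "host DRAM over an interconnect": 64,
    "NVMe SSD": 7,
}
for tier, bw in bandwidth_gb_s.items():
    print(f"{tier}: {transfer_ms(EXPERT_BYTES, bw):.3f} ms")
```

Under these assumptions, a miss served from SSD takes about a thousand times longer than a read from HBM, which is why hit rates and prefetching decide whether offloading pays off.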

Turning "One Trillion" into a Tangible Equation

Having talked so much about "saving," let's translate it into a more intuitive picture: how many intelligence computing centers are we not building?

First, token volume. National statistics show that by March 2026, China's daily token call volume had already exceeded 140 trillion, a more than thousand-fold increase compared to early 2024. In industry terms, the Doubao large model alone also saw its daily usage exceed 120 trillion in the same month. While statistical boundaries differ, they collectively indicate one thing: China's AI token consumption has entered the daily operation scale of hundreds of trillions and is rapidly advancing towards thousands of trillions. So, 500 trillion tokens/day can be seen as the next stop in the near future; and 5,000 trillion tokens/day represents a high-volume scenario with widespread adoption of agents, multimodal capabilities, and code generation.

Against this backdrop, looking at the cost of computing centers, DeepSeek's value becomes prominent. In 2025, China Unicom began constructing a thousand-card intelligent computing inference center in Wuhan, with an initial investment of nearly 200 million yuan. We can roughly consider this a sample investment for a thousand-card-scale inference center: about 200 million yuan per center.

Calculated according to DeepSeek V4's efficiency improvements—at least in the long-context scenarios it excels at—the change it brings is not a mere ten-plus percent optimization, but a hardware efficiency improvement of several times. We won't take the most aggressive claim, but a more conservative, easier-to-understand assumption: V4's three-pronged approach increases the effective token throughput of the same hardware by 4 times. That means work that previously required 4 centers might now be done by 1; the 3 centers no longer needed represent savings equivalent to 75% of hardware investment.

Note, DeepSeek isn't simply using less storage. On the contrary, it's making smarter use of storage—using compressed attention, on-demand activation, cache hits, and inference scheduling to utilize the most expensive GPU and VRAM time more intensely. What's truly saved is the additional hardware that would have needed to be purchased for the same token throughput.

So, what does one trillion dollars correspond to? $1 trillion is roughly equivalent to 7 trillion RMB. At 200 million RMB per thousand-card inference center, 7 trillion RMB corresponds to 35,000 such centers. If the V4 approach yields a 4x effective throughput improvement, avoiding the need to build 35,000 equivalent centers would correspond to a daily token flow of approximately 5,000 trillion.
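
The chain of numbers in this paragraph can be checked step by step. The article does not state how many tokens one thousand-card center produces per day, so the sketch backs that figure out of its other numbers, under the assumption that the 35,000 avoided centers are the 75% that a 4x gain makes unnecessary. Treat the result as an implied input, not a reported one.

```python
USD_TO_RMB = 7.0              # article's rounding: $1 trillion ≈ 7 trillion RMB
CENTER_COST_RMB = 200e6       # thousand-card inference center (Wuhan example)
THROUGHPUT_GAIN = 4.0         # conservative V4 assumption from the text
DAILY_TOKENS = 5e15           # high-volume scenario: 5,000 trillion tokens/day

saving_fraction = 1 - 1 / THROUGHPUT_GAIN                  # 0.75
centers_avoided = 1e12 * USD_TO_RMB / CENTER_COST_RMB      # 35,000
baseline_centers = centers_avoided / saving_fraction       # centers needed without the gain
implied_tokens_per_center = DAILY_TOKENS / baseline_centers

print(f"{centers_avoided:,.0f} centers avoided out of {baseline_centers:,.0f}")
print(f"implied baseline output: {implied_tokens_per_center:.3g} tokens per center per day")
```

Spread over a thousand cards, that implied figure is roughly a hundred million tokens per card per day at baseline efficiency.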

This is the industry landscape corresponding to the "one trillion dollars" mentioned in this article. This isn't a precise calculation from an engineering tender, but a fundamental infrastructure-scale equation, corresponding to a future token volume scenario years from now, not something already realized. Its true purpose is to illustrate: in an era of low call volumes, efficiency gains save a few cards or racks; in an era of daily thousands of trillions of tokens, efficiency gains save tens of thousands of intelligence computing centers that would otherwise have been built.

Therefore, what DeepSeek truly changes isn't the price of a single API call, but the future ledger of AI infrastructure.

It Reverses a Dangerous Trend

Now, back to the machine at the beginning. Remember? Of the Vera Rubin's $7.8 million, $2 million is locked up in memory, and that portion is seeing steep price increases. This reveals a dangerous trend: the entire industry's value is becoming increasingly, and unhealthily, tied to memory chips. Memory should not be pushed to be this expensive.

A common misconception is that DeepSeek is "adapting" to this trend because it also uses a lot of memory. Quite the opposite: DeepSeek is reversing it. The old method passively and inefficiently devours hardware, piles value onto chips in the wrong places, and lets memory be dragged along by price surges. DeepSeek first uses its three surgeries to sharply reduce the real demand for hardware, then intelligently assigns the small remaining demand to the cheapest, most suitable storage tier. The former is "being pushed by prices"; the latter is "working out the ledger first, then deciding where to spend."

This difference is particularly important for China. Because it shifts the battlefield from a place where we are at a disadvantage to a place where we have a better chance. The most cutting-edge compute chips—we are temporarily behind. But memory and other storage chips are precisely where China has made real progress this year.

China's domestic DRAM leader, ChangXin Memory Technologies (CXMT), reported Q1 2026 revenue of 50.8 billion RMB and net profit of around 25 billion RMB. The company expects first-half net profit of 66 to 75 billion RMB, equivalent to earning ByteDance's entire net profit for last year in just half a year. Although CXMT still ranks only fourth in the global DRAM market, this domestic production capacity, which barely existed before, has finally established itself this year.

And this is precisely the strategic significance of DeepSeek's three surgeries. This isn't "substituting storage for compute," but reducing marginal dependence on the scarcest compute and shifting some of the pressure to more accessible storage, caching, and systems engineering. When an AI machine relies more on memory, caching, scheduling, and systems engineering—areas where we have a better chance of mastery—China's existing supply chain suddenly shifts from "constrained everywhere" to "sufficient," even "effective." This greatly enhances the security of the entire supply chain.

Conclusion

Someone like Liang Wenfeng, for whom eliminating inefficiency is an instinct, would not be satisfied with making a single model slightly cheaper. He is targeting the greatest inefficiency in the entire AI industry: the premise the whole industry takes as gospel, that "stronger intelligence requires the most cutting-edge, scarcest, most expensive hardware."

If DeepSeek can enable the entire industry to do the same work with far less cutting-edge hardware, it effectively creates a trillion-dollar-scale virtual production capacity base: one that occupies not an inch of factory space, yet genuinely frees up the massive investment that would otherwise have been poured into hardware. That "one trillion" then stops being a valuation story and becomes a fundamental infrastructure equation.

Portraying DeepSeek as "using algorithms to eliminate NVIDIA" is another cheap myth. But if we rephrase the question, the answer becomes interesting: Is it possible for DeepSeek to make the industry purchase less of the most expensive hardware, occupy less of the scarcest VRAM, and pay less of the inference cost previously considered a given? Yes. Is it possible for DeepSeek to redistribute the value of AI infrastructure from a singular high-end GPU narrative to model architecture, inference systems, cache management, storage scheduling, and engineering optimization? Also yes. This is its true industrial significance.

Genuine technological revolutions often aren't about making everything more expensive, but about suddenly making what was once affordable only to a few into everyday infrastructure affordable to the majority. From a broader perspective, what truly matters in this grand scheme isn't how much money is saved, but that the act of saving money quietly redistributes the tickets to the future back into the hands of China's myriad industries that need to be empowered by AI.

(This article is based on public information and industry discussion. Some forward-looking judgments herein, such as the trillion-dollar-scale infrastructure substitution value, hardware efficiency trade-offs, and equivalent cost calculations, belong to industry projections and debated viewpoints, not established facts. Readers are advised to view them with discretion.)

This article is from the WeChat public account "Hushuo Chengli" (胡說成理), author: Hu Zhe.

Related Questions

Q: According to the article, what is the core technological path DeepSeek is taking to reduce AI infrastructure costs?

A: DeepSeek's core technological path involves three key optimizations: 1) Compressing "memory" (the KV Cache) through advanced attention mechanisms like MLA, drastically reducing high-bandwidth memory (HBM) requirements for long contexts. 2) Using a Mixture of Experts (MoE) architecture to activate only a small fraction of the model's total parameters (its "body") for each token. 3) Caching and reusing computed results to avoid redundant calculations, turning expensive computation into cheap memory lookups. Together, these three "surgeries" significantly reduce dependency on the most expensive and scarce hardware components, such as HBM.

Q: What key industry trend does the article suggest DeepSeek is attempting to reverse?

A: The article suggests DeepSeek is attempting to reverse a dangerous trend where the value and cost of AI infrastructure are becoming increasingly and unhealthily concentrated on the most expensive and scarce components, particularly high-bandwidth memory (HBM). While traditional approaches passively accept rising memory costs and inefficiently consume hardware, DeepSeek's optimizations first drastically reduce the real demand for such premium hardware and then intelligently allocate the remaining needs to cheaper, more suitable storage tiers. This shifts the battlefield from an area of disadvantage (cutting-edge compute chips) to areas where China has growing strength, like memory chips and systems engineering.

Q: How does the article justify the hypothetical figure of 'saving $1 trillion' for China's AI infrastructure?

A: The $1 trillion saving is presented as a large-scale infrastructure accounting projection, not a precise current calculation. It rests on: 1) Projections such as McKinsey's estimate that data centers will need about $5.2 trillion of investment for AI workloads by 2030. 2) The rapid growth of China's daily token consumption, expected to reach levels like 5,000 trillion tokens/day. 3) DeepSeek V4's claimed efficiency improvements, conservatively estimated as a 4x increase in effective token throughput per unit of hardware. If the same throughput can be achieved with 1/4 of the hardware, then at massive future token volumes it could theoretically save the equivalent cost of building tens of thousands of AI compute centers in China, adding up to approximately $1 trillion (7 trillion RMB) in avoided hardware investment.

Q: What are the 'two gas guzzlers' of large language models identified in the article, and what expensive resource do they consume?

A: The 'two gas guzzlers' are: 1) 'Memory' (记性): the Key-Value (KV) Cache, the model's short-term memory of the context, which expands dramatically with conversation length. 2) 'Body' (身体): the model's parameter weights, its repository of knowledge and capability; traditional dense models activate the entire 'body' for every token. Both components primarily consume, and must reside in, the most expensive and scarce resource: High-Bandwidth Memory (HBM) integrated within GPU packages.

Q: What is the strategic significance of DeepSeek's approach for China's AI industry, beyond just cost savings?

A: Beyond cost savings, the strategic significance lies in reshaping the competitive landscape and enhancing supply chain security. DeepSeek's approach reduces the marginal dependency on the most scarce and geopolitically sensitive components (like cutting-edge GPUs and HBM where China trails). It shifts more of the performance burden to areas where China has growing capabilities, such as DRAM memory chips (e.g., CXMT), caching systems, storage scheduling, and systems engineering. This makes the entire AI supply chain less vulnerable to external constraints and more reliant on domestically improvable and obtainable resources, thereby increasing the overall security and sustainability of China's AI industry development.
