Computing Power Constrained, Why Did DeepSeek-V4 Open Source?

marsbitPublished on 2026-04-26Last updated on 2026-04-26

Abstract

DeepSeek-V4 has been released as a preview open-source model, featuring 1 million tokens of context length as a baseline capability—previously a premium feature locked behind enterprise paywalls by major overseas AI firms. The official announcement, however, openly acknowledges computational constraints, particularly limited service throughput for the high-end DeepSeek-V4-Pro version due to restricted high-end computing power. Rather than competing on pure scale, DeepSeek adopts a pragmatic approach that balances algorithmic innovation with hardware realities in China’s AI ecosystem. The V4-Pro model uses a highly sparse architecture with 1.6T total parameters but only activates 49B during inference. It performs strongly in agentic coding, knowledge-intensive tasks, and STEM reasoning, competing closely with top-tier closed models like Gemini Pro 3.1 and Claude Opus 4.6 in certain scenarios. A key strategic product is the Flash edition, with 284B total parameters but only 13B activated—making it cost-effective and accessible for mid- and low-tier hardware, including domestic AI chips from Huawei (Ascend), Cambricon, and Hygon. This design supports broader adoption across developers and SMEs while stimulating China's domestic semiconductor ecosystem. Despite facing talent outflow and intense competition in user traffic—with rivals like Doubao and Qianwen leading in monthly active users—DeepSeek has maintained technical momentum. The release also comes amid reports of a new...

By | Tech Never Cold

On April 24, a shoe dropped in the domestic large model race. The preview version of DeepSeek-V4 was officially launched and simultaneously open-sourced, directly making 1M (one million words) of ultra-long context a standard feature of the official service.

If this were a year ago, this level of long-text processing capability was still an exclusive right locked behind the enterprise paywalls of overseas tech giants. Now, it's laid out on the open-source community's table, becoming infrastructure that developers can use at will. For developers who have been staying up late dealing with lengthy codebases or complex legal contracts, this is undoubtedly good news.

But behind this technological release, the official press release contained a very restrained admission: "Limited by high-end computing power, the service throughput of DeepSeek-V4-Pro is currently very constrained."

For those accustomed to vendors boasting about their computing reserves at launch events, this candor carries a rare sense of cold realism.

As the large model race enters its second half, the industry has a clear idea of who holds how many high-end hardware chips. Rather than maintaining the illusion of parametric prosperity, it's better to lay bare the current state of the industry. DeepSeek's move actually abandons the obsession with pure benchmark competition, finding a compromise between core breakthroughs, China's still-maturing heterogeneous computing ecosystem, and the real commercial environment of enterprises.

China's AI industry is shedding its early blind cash-burning exterior and entering an extremely realistic era of "computing power accounting."

How to Balance the Pro Version's Computing Power Ledger?

Looking specifically at the throughput-limited V4-Pro. As the flagship of the system, V4-Pro has a total parameter count of up to 1.6T, but only activates 49B parameters during inference. This extreme sparsity design is not just a showcase model; under the harsh testing of real production lines, its technical foundation is highly robust.

The ability to handle complex code and logical reasoning is the litmus test for whether a large model can truly enter core production. In the Agentic Coding evaluation environment, V4-Pro's practical performance firmly places it in the top tier of current open-source models.

DeepSeek has already integrated it into its internal code pipeline, making it a productivity tool heavily relied upon by frontline engineers. Feedback from R&D personnel indicates that its code generation and error correction experience is better than Sonnet 4.5, and in non-deep-thinking scenarios, it is close to Opus 4.6, though there is still a gap with Opus 4.6's thinking mode.

Behind this实战 performance is the R&D team's极致挖掘 of algorithmic depth. In world knowledge evaluations that test the quality of pre-training data cleaning and knowledge density, V4-Pro leads most existing open-source models, currently only slightly behind the top closed-source model Gemini-Pro-3.1. As for mathematics, STEM (Science, Technology, Engineering, Mathematics), and competitive code evaluations, it has earned the right to compete with the world's top closed-source giants.

Gaining this capability显然 does not rely solely on stacking compute cards. Domestic teams know well that competing on high-end GPU reserves is unrealistic. That V4-Pro can handle 1M of ultra-large context with limited VRAM is underpinned by the R&D team's deep reconstruction of the attention mechanism. They implemented a novel attention compression scheme, performing high-intensity compression at the token dimension, paired with their signature DSA sparse attention technology (DeepSeek Sparse Attention).

This original technical路线, combined with the首次引入 KV Cache sliding window and compression algorithm, effectively controls the computational overhead and memory usage brought by long sequence processing. To allow developers to truly调用 its capabilities in business, the R&D team specifically made underlying adaptations for mainstream Agent tools like Claude Code and OpenClaw.

The technical documentation even explicitly states that developers can directly enable the thinking mode for complex tasks by setting the reasoning_effort parameter to max. This system-level engineering optimization under limited computing resources恰恰 proves to the industry that even with constrained high-end computing power, local teams can still expand the performance boundaries of models through native architecture design.

Who Is Pinned Down by the 13B Activation Quantity?

Those focusing on the Pro version's throughput bottleneck often overlook the commercial pivot DeepSeek has hidden behind it: the Flash version. Some industry voices believe this is merely a compromise product under computing power shortages, a view that显然 underestimates the management team's long-term considerations. This is a pragmatic move to卡位 the下沉 ecosystem after rigorous cost accounting.

According to publicly disclosed adaptation code information, the Flash version maintains a massive total parameter count of 284B, but its activated parameter quantity is precisely capped at 13B.

13B, in a context where peers are trying to push parameters to trillions, seems unremarkable. But this恰恰 reflects the economic logic of the Mixture of Experts (MoE) architecture in commercial deployment: total parameters determine the breadth of the model's knowledge, while activated parameters directly determine the electricity cost and memory bandwidth the server needs to expend for each API call.

Capping the activation at 13B directly剥离 the large model from the expensive top-tier AI computing centers. Its demands on single-card VRAM and compute peaks are very modest. Actual tests show that the Flash version maintains stable response speed and accuracy when handling massive, high-frequency simple daily tasks, with no significant drop in underlying general reasoning capability. For small and medium-sized developers and long-tail enterprises that need to handle thousands of API calls daily, this is a truly affordable and runnable productivity tool.

The deeper industrial logic lies in the fact that mainstream domestic heterogeneous computing chips are still in the catch-up phase in terms of single-card absolute performance. Computing systems carrying full activation easily hit the memory wall, leading to low operational efficiency; but facing the Flash version with only 13B activation, these chips can run smoothly at medium-to-low power consumption.

DeepSeek's move revitalizes a large amount of idle mid-to-low-end computing resources in China, providing a highly suitable testing ground for domestic chips urgently needing deployment scenarios. This downward-compatible infrastructure building logic is far more aligned with current commercial reality than simply chasing rankings on various test leaderboards.

Can Domestic Chips Handle It?

What sparked widespread industry discussion about this release was its label of full-stack domestic deployment. For a long time, there has been a certain misalignment between algorithm companies and domestic chip manufacturers: model vendors worried that an immature hardware ecosystem would hinder R&D progress, while chip manufacturers lacked cutting-edge large models for deep optimization. This time, the deadlock has been substantively broken.

Huawei Computing quickly announced that its Ascend Super Node full product series fully supports the new model. From a technical perspective, the underlying Ascend chips rely on fused kernel and multi-stream parallel technology to effectively reduce system computational overhead, thereby stabilizing推理 performance in long-text scenarios. Cambricon also quickly completed Day 0 adaptation and open-sourced the underlying code, while Hygon DCU simultaneously announced a closed-loop打通.

But we need to look beyond the表象 of ecological prosperity and examine the real resistance faced when stitching software and hardware together in the server room. Taking the Ascend 950 series chip as an example, according to industry消息, this chip features 112GB of self-developed HBM, 1.4TB/s bandwidth, and a single-card power consumption of 600W. At specific推理 precision (e.g., FP4), its single-card算力 has shown extremely strong data performance, reaching 2.87 times that of NVIDIA's H20. But in the more demanding FP16 or FP32 general training precision ranges, the performance gap between domestic hardware and NVIDIA still exists.

Furthermore,所谓的 "Day 0 adaptation" still needs to跨越 the hidden costs brought by opaque supply chains before achieving lossless operation for enterprise-level business. The high-speed connection standards of Super Node hardware are extremely封闭, and the flow of core components is like an information black box. This procurement-side barrier无疑 makes the large-scale deployment and maintenance of computing systems more complex.

Simultaneously, the current system highly relies on procurement large orders from a very few large domestic institutions. The lack of overseas market orders means this computing power breakout battle can only be fought within an internal循环. This single commercial closed-loop means the operational efficiency of the entire软硬协同 system urgently needs tempering in more diverse commercial environments.

The tight ramp-up of high-end computing power产能 directly led DeepSeek to坦承 in its press release that a significant price reduction for the Pro version还需要等待 the batch上市 of Super Nodes in the second half of the year. Large models and domestic chips have indeed completed a preliminary physical咬合, but under technological gaps and supply chain constraints, this posture of running wounded恰恰 is the most realistic survival profile of the domestic computing power ecosystem.

If People Leave, Can the Technology Still Run?

Zooming out to the real商业竞争, the release of DeepSeek-V4 is an extremely precise strategic defense. Over the past half year, this company's situation has been under high pressure. The C-end track has become a red ocean, with top vendors using massive funds for intensive投放. QuestMobile data shows a clear competitive landscape: as of March 2026, Doubao's MAU reached 345 million, Qianwen 166 million, with DeepSeek holding its own basic盘 at 127 million.

External traffic competition is fierce, and the internal technical team also faces流动性考验. Poaching competition within the industry is白热化, with key personnel from multiple business lines leaving consecutively. According to public resumes and industry information, the core author of the first-generation large language model has confirmed joining Tencent, a core contributor to V3 went to Xiaomi, a core researcher from R1 joined ByteDance, and core strength in the multimodal direction has also confirmed new destinations. According to industry rumors, core OCR author Wei Haoran has also left.

The turnover of core R&D members will inevitably trigger strict external scrutiny of its R&D momentum: will the underlying architectural innovation capability of this technology-based company be affected?

At this juncture, the release of the V4 preview version became the most direct response. It proves to the market that the company has established a systematic R&D pipeline with risk resistance. Even facing personnel structure adjustments, the logic of its technological evolution can still operate precisely. This organizational resilience built on an engineering system quickly received positive feedback in the capital market.

Recently, DeepSeek was reported to be seeking financing at a valuation of no less than $10 billion, planning to raise funds to replenish reserves. According to industry media citing sources close to the transaction, market rumors suggest a leading internet giant is expected to invest, potentially pushing the valuation higher. If this deal is finalized, it will rewrite the valuation records of the domestic large model track, surpassing Moonlight's previous performance. At a critical juncture in financing negotiations, presenting the substantive achievement of million-word context and full-stack domestic adaptation is a rational move by management to stabilize the strategic大盘 and respond to external doubts.

Final Thoughts

In the tech business语境 where concepts change frequently, teams willing to focus on building underlying infrastructure are always scarce. The release of DeepSeek-V4 sets a pragmatic and冷峻 tone for the competition in the second half of the large model race.

Facing computing power bottlenecks, they chose not to embellish but threw the真实供需现状 of domestic high-end hardware to the market; facing下沉 deployment needs, they used the Flash version with 13B activation to provide生存空间 for domestic computing chips in the catch-up phase; facing external traffic siege and talent competition, they responded on an industry level with concrete long-text processing capabilities.

The quote from "Xunzi" cited by the official account on the day of release is deeply meaningful: "Not seduced by praise, not frightened by slander, follow the path and act,端正正己 (correct oneself)."

The model can be open-sourced, but computing power is not free. What DeepSeek has delivered this time is not a stronger model, but a solution for how capabilities are redistributed after computing power becomes a constraint. In the reality where computing power is still imperfect, this is perhaps the evolutionary direction closer to the essence of the industry.

Related Questions

QWhat is the key technical innovation that allows DeepSeek-V4 to handle 1M context length despite limited high-end computing power?

ADeepSeek-V4 achieves 1M context length through a deep reconstruction of the attention mechanism, including a novel attention compression scheme on the token dimension, combined with its signature DSA sparse attention) technology, and the introduction of KV Cache sliding window and compression algorithms to control computational overhead and memory usage.

QHow does the DeepSeek-V4-Flash version balance performance and cost for commercial use?

AThe Flash version maintains a total parameter count of 284B but activates only 13B parameters per inference. This design significantly reduces electricity and memory bandwidth costs per API call, making it affordable for SMEs and long-tail enterprises while maintaining stable performance for high-frequency, simple tasks.

QWhich domestic hardware vendors have announced support for DeepSeek-V4, and what are the challenges in full-stack domestic adoption?

AHuawei Computing (Ascend), Cambricon, and Haiguang DCU have announced support. Challenges include performance gaps in higher precision computations (e.g., FP16/FP32) compared to Nvidia, opaque supply chains for critical components, reliance on domestic large-scale procurement, and lack of overseas market orders, limiting the ecosystem's maturity and scalability.

QWhat competitive pressures is DeepSeek facing in the consumer market and internal talent retention?

AIn the C-end market, DeepSeek faces intense competition with Doubao (345M MAU) and Qianwen (166M MAU) leading, while DeepSeek holds 127M MAU. Internally, key technical staff have left for major tech firms like Tencent, Xiaomi, and ByteDance, raising concerns about sustained innovation, though the V4 release demonstrates resilient R&D pipelines.

QWhat strategic significance does the release of DeepSeek-V4 hold for the broader AI industry in China?

ADeepSeek-V4 represents a shift towards pragmatic infrastructure building in the face of compute constraints. It highlights a focus on algorithmic efficiency over brute-force scaling, enables cost-effective deployment for diverse businesses, fosters domestic hardware-software collaboration, and sets a realistic benchmark for industry evolution amid limited high-end compute resources.

Related Reads

How to Define "Real U.S. Stocks": Differences Between On-Chain Tokens, Price Contracts, and Direct Broker Connections

**Title:** Defining "Real US Stocks": Differences Among On-Chain Tokens, Price Contracts, and Broker-Direct Access **Summary:** In 2026, using stablecoins to purchase US stocks is mainstream, but products marketed as "buying US stocks with USDT" offer fundamentally different assets. This article analyzes three primary models. **1. Tokenized Stocks:** These are on-chain tokens representing economic exposure to underlying stocks, held by an issuer or custodian. They offer benefits like 24/7 trading and DeFi composability (e.g., use as loan collateral). However, users lack direct legal shareholder status; dividends may not be paid in cash, and voting rights are typically non-binding advisory expressions. Examples include platforms like Ondo Finance. **2. Stock Futures / Equity Perpetuals:** These are derivative contracts tracking a stock's price, allowing leveraged long/short positions 24/7, similar to crypto perpetuals. They offer high efficiency and flexibility but involve funding fees, which can be a significant long-term cost, especially during strong trends. Crucially, they confer no ownership rights (dividends, voting) to the holder. **3. Broker-Direct Model:** This model provides access to real securities via licensed broker-dealers. Stocks/ETFs are bought and held within the US clearing and custodial system (e.g., DTCC), making it the only path to genuine stock ownership. Users receive cash dividends and formal proxy voting rights (where applicable). It supports thousands of stocks and ETFs, far exceeding the coverage of the other two models. Key advantages include no funding fees, a clean cost structure for long-term holds, and the potential to transfer holdings to other brokers. Some platforms facilitate stablecoin (USDT/USDC) deposits, reducing reliance on traditional banking. A critical distinction exists *within* the broker-direct model: the underlying brokerage architecture (e.g., Fully Disclosed IB, Omnibus IB, Self-Clearing) determines how client assets are held, protected, and how safeguards like SIPC insurance are conveyed. Users should verify the specific clearing structure and regulatory compliance of any platform. In conclusion, "buying US stocks with USDT" can mean holding an on-chain economic proxy (Tokenized Stocks), trading a price derivative (Stock Futures), or owning the actual security (Broker-Direct). For users seeking full ownership rights and long-term investment, the broker-direct model is the definitive choice, though its implementation details require careful scrutiny.

marsbit18m ago

How to Define "Real U.S. Stocks": Differences Between On-Chain Tokens, Price Contracts, and Direct Broker Connections

marsbit18m ago

NVIDIA Launches DSX Platform, Expanding into AI Factory Infrastructure

NVIDIA has unveiled the DSX platform at its GTC Taipei event, marking a strategic expansion from GPU sales into comprehensive AI factory infrastructure solutions. The platform addresses challenges like power supply, cooling, and resource orchestration as AI models scale, shifting the industry focus from single-chip performance to overall infrastructure efficiency. DSX integrates NVIDIA's chips, systems, software, and partner technologies to cover the entire AI factory lifecycle—from design and simulation to deployment and operations. It aims to accelerate deployment, improve reliability and operational efficiency, and reduce the cost per generated token in AI inference. The software suite includes DSX MaxLPS, which uses 45°C liquid cooling and rack-level optimization to allow up to 40% more GPUs per megawatt, and DSX OS, an open-source platform for AI factory operations. The platform also encompasses reference designs, digital twin simulation (DSX Sim), dynamic workload adjustment based on grid conditions (DSX Flex), and data exchange between systems. Early adopters include cloud providers like CoreWeave and Lambda. Major hardware partners, including Dell, HPE, Lenovo, and Supermicro, are developing DSX-ready systems. Pilot projects for DSX Flex are underway with energy providers. Strategically, DSX represents NVIDIA's ongoing transition from an AI chip supplier to a full-stack AI infrastructure platform provider, aiming to set industry standards and solidify its market leadership.

marsbit23m ago

NVIDIA Launches DSX Platform, Expanding into AI Factory Infrastructure

marsbit23m ago

After Burning Tens of Billions of Dollars in Tokens, Silicon Valley Giants Start Limiting Employee Token Usage

After burning tens of billions of dollars on AI tokens, major Silicon Valley firms are now restricting employee usage. Companies like Microsoft, Uber, and Salesforce, which heavily promoted AI for "efficiency," are facing a cost crisis. The practice of "tokenmaxxing"—pushing employees to maximize AI tool usage—led to wasteful spending on trivial tasks like checking the weather or writing birthday messages, with studies showing significant hidden costs for bug fixes and code rewrites. The core issue is a misalignment between individual productivity gains and actual business value. While employees use AI to automate tasks they dislike, such as writing reports, this often doesn't translate to increased company revenue or improved core business outcomes. For instance, AI-generated code speeds up development but also sees an 800% increase in "code churn" (code being discarded or rewritten). As a result, only 14% of CFOs report seeing a clear, measurable return on AI investments. Firms are now shifting strategies. Microsoft has revoked most internal licenses for Claude Code, while others are implementing monitoring and cost controls. New tools from companies like Harness and CloudZero aim to track AI spending and tie costs to business results. Some AI vendors, like HubSpot, are moving from token-based pricing to charging based on outcomes, such as "resolved conversations" or "leads generated." This represents a necessary correction in the AI adoption cycle. The challenge now is for companies to move beyond using AI merely to speed up old tasks and instead rethink their workflows and business models fundamentally. The future of enterprise AI depends on proving its value, not just its usage.

marsbit44m ago

After Burning Tens of Billions of Dollars in Tokens, Silicon Valley Giants Start Limiting Employee Token Usage

marsbit44m ago

Trading

Spot
Futures
活动图片