By Gao Heng (Member Expert of the Science Fiction Communication and Future Industry Committee, China Society for Science and Technology Journalism)
After the release of DeepSeek V4, what most deserves attention is not the benchmark scores but a line of fine print below the price list.
In the pricing description for V4, DeepSeek mentioned that, limited by high-end computing power, the Pro version service currently has very limited throughput. It is expected that after the batch release of Ascend 950 super nodes in the second half of the year, the Pro price will be significantly reduced.
This sentence contains more information than many technical parameters. It shows that DeepSeek's low prices are no longer just the result of model engineering optimization but are beginning to be tied to the supply rhythm of domestic computing power. In the past, when model companies reduced prices, it was usually interpreted as improved algorithm efficiency, vendor subsidies, or a new round of price wars. But this time, DeepSeek directly linked the premise of future price reductions to the large-scale deployment of Ascend 950 super nodes.
This is also what truly sets the V4 release apart. On the surface, it is a routine model upgrade: 1.6 trillion parameters, 1 million token context, stronger code and Agent capabilities, and lower API prices. But looking deeper, it seems more like Liang Wenfeng is simultaneously answering three questions: Can DeepSeek continue to make models cheaper? Can domestic computing power enter the critical path of cutting-edge models? Can a team that has long presented itself as technologically idealistic withstand the pressures of financing, talent retention, and commercialization?
Over the past year, DeepSeek has changed the pricing model of China's large model industry. After the releases of V3 and R1, domestic and foreign model vendors were forced to recalculate API prices, training costs, and commercialization paths. With V4, the issue has become more complex. DeepSeek is not only continuing to lower prices but is also pinning the next step of price reductions on the large-scale deployment of domestic computing power. In my opinion, this means that the competition in China's large models is shifting from "whose model capability is stronger" to "who can integrate models, chips, engineering systems, and business organizations into a closed loop."
01 DeepSeek Made Long Context Affordable
On the morning of April 24, DeepSeek announced the official launch of the preview version of its new model series, DeepSeek-V4, and simultaneously open-sourced it.
This time, it was not a single model but two versions released simultaneously: DeepSeek-V4-Pro and DeepSeek-V4-Flash. According to information disclosed by DeepSeek, V4-Pro has a total of 1.6 trillion parameters, with 49 billion activated parameters, targeting high-performance tasks; V4-Flash has a total of 284 billion parameters, with 13 billion activated parameters, focusing on low cost and high throughput. Both models adopt the MoE architecture, or "Mixture of Experts."
Renowned technology industry commentator Peng Deyu analyzed for me: The logic of MoE is not complicated. A large model can have many "experts" inside, but each time a question is answered, not all experts need to work simultaneously—only the most relevant ones are called. This allows the model capacity to be large without burdening every call with the full parameter computational load. For users, the perceived result is a cheaper and faster model; for model companies, the key is that the inference cost per unit is reduced.
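Peng Deyu's description of MoE can be sketched in a few lines. The following is a toy illustration of top-k gated routing, not DeepSeek's actual implementation; the expert functions, gate weights, and scalar inputs are all invented for illustration:

```python
# Toy sketch of Mixture-of-Experts top-k routing (illustrative only).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to only the top_k highest-scoring experts.

    x: a single scalar (toy stand-in for a token embedding)
    experts: list of callables, each a tiny "expert network"
    gate_weights: one gating weight per expert (toy linear gate)
    """
    scores = softmax([w * x for w in gate_weights])        # gate scores
    ranked = sorted(range(len(experts)), key=lambda i: -scores[i])
    chosen = ranked[:top_k]                                # sparse selection
    total = sum(scores[i] for i in chosen)
    # Only the chosen experts run; the others cost nothing on this call.
    return sum(scores[i] / total * experts[i](x) for i in chosen)

experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
gates = [0.5, 1.0, -0.3, 0.1]
print(moe_forward(3.0, experts, gates, top_k=2))
```

The point visible even in the toy: total capacity grows with the number of experts, but per-call cost grows only with `top_k`, which is exactly why V4-Pro can hold 1.6 trillion parameters while activating only 49 billion.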
Another change with V4 is making a 1 million token context a standard official service. This capability may sound abstract to ordinary users, but in usage scenarios, it is straightforward: users can let the model process an entire book, a large codebase, a full annual report, or a set of complex project documents at once. In the past, such long-text processing was typically an additional feature of high-end models, with high prices, slow calls, and significant memory pressure. In my opinion, the focus of V4 is not being the first to achieve a million-token context but attempting to make it a low-cost basic capability.
Li Rui, Executive Director of Qishijie Beijing Technology Co., Ltd., told me: This is also the most practically significant change with V4. A million-token context is no longer an exclusive capability today, as models like Gemini and Qwen have also reached this level. The question DeepSeek needs to answer is not "whether it can be done" but "whether the cost can be sustained after it is done." If long context remains expensive, it is only a feature for a few high-end users; if the cost is reduced, it can become infrastructure that enterprises and developers can use daily.
A large model industry researcher told me: This addresses a long-standing contradiction in the large model industry: the longer the context, the higher the cost. Traditional models must compute the interactions between every pair of tokens to understand long text, so as the text grows, the computational load rises roughly quadratically and memory usage climbs with it. DeepSeek V4 does not tackle this problem head-on but instead compresses long text first and then focuses on key points through sparse attention and compression mechanisms. In other words, it does not make the model reread all the content from start to finish repeatedly but first organizes the content into a more condensed information structure and then performs reasoning around the key points.
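The "compress first, then focus on key points" idea can be illustrated with a toy sketch. This assumes mean-pooled chunk summaries and a dot-product relevance score, which are my simplifications, not DeepSeek's actual sparse-attention mechanism:

```python
# Toy sketch: summarize chunks of a long input, then attend only inside
# the top-scoring chunks instead of over every token.

def chunk_mean(vecs, size):
    """Compress: summarize each chunk of token vectors by its mean."""
    out = []
    for i in range(0, len(vecs), size):
        chunk = vecs[i:i + size]
        dim = len(chunk[0])
        out.append(([sum(v[d] for v in chunk) / len(chunk)
                     for d in range(dim)], i))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_attend(query, tokens, chunk_size=4, top_chunks=2):
    """Score compressed summaries, then keep only the top chunks' tokens."""
    summaries = chunk_mean(tokens, chunk_size)
    ranked = sorted(summaries, key=lambda s: -dot(query, s[0]))[:top_chunks]
    selected = []
    for _, start in ranked:
        selected.extend(range(start, min(start + chunk_size, len(tokens))))
    # Full attention would touch all len(tokens) positions; this touches
    # only top_chunks * chunk_size of them.
    return sorted(selected)

tokens = [[1, 0]] * 4 + [[0, 1]] * 4 + [[1, 1]] * 4 + [[0, 0]] * 4
print(sparse_attend([1, 0], tokens))
```

Of 16 token positions, only the 8 inside the two most query-relevant chunks survive to the attention step, which is the cost structure that makes million-token contexts affordable in principle.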
The pricing continues DeepSeek's consistent strategy. According to the API pricing announced for V4, the Pro version's input cache hit price is 1 yuan per million tokens, and output is 24 yuan per million tokens; the Flash version's input cache hit price is 0.2 yuan per million tokens, and output is 2 yuan per million tokens. A comparison shows that currently, Zhipu GLM-5.1's input cache hit price is about 1.3-2 yuan per million tokens, and Kimi-K2.6's input cache hit price is about 1.1 yuan per million tokens. This means V4's input price remains at the low end among mainstream domestic models.
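Using the cache-hit prices quoted above, a back-of-the-envelope cost per call is easy to work out. The request sizes below are hypothetical examples:

```python
# Cost per call from the announced cache-hit prices (yuan per million tokens).
PRICES = {
    "v4-pro":   {"input_hit": 1.0, "output": 24.0},
    "v4-flash": {"input_hit": 0.2, "output": 2.0},
}

def call_cost(model, input_tokens, output_tokens):
    p = PRICES[model]
    return (input_tokens * p["input_hit"]
            + output_tokens * p["output"]) / 1_000_000

# e.g. feeding a ~200k-token codebase and getting a 2k-token answer back:
print(round(call_cost("v4-pro", 200_000, 2_000), 3))    # 0.248 yuan
print(round(call_cost("v4-flash", 200_000, 2_000), 3))  # 0.044 yuan
```

At these prices, reading a couple of hundred thousand tokens of context costs a fraction of a yuan, which is what turns long context from a premium feature into a routine one.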
In my opinion, what is truly noteworthy this time is that low prices and long context are combined. A million-token context is not an isolated parameter; it determines whether the model can enter heavier workflows. Scenarios such as code, finance, law, scientific research, and enterprise knowledge bases all require the model to read long materials, process complex structures, and retain context.
V4's capability changes also revolve around these scenarios. Evaluation results disclosed by DeepSeek show that V4-Pro outperforms most open-source models in public evaluations on tasks such as mathematics, STEM, and competitive coding; it sits in the first tier of open-source models for Agentic Coding and is used internally by DeepSeek's engineering team as a coding tool. It has also been adapted for mainstream Agent tools like Claude Code, OpenClaw, and CodeBuddy, with optimized performance in code generation, document processing, and tool invocation scenarios.
But this does not mean V4 has completely pulled ahead. Corporate strategy positioning expert Wu Yuxing analyzed for me: V4's performance breakthrough is somewhat smaller compared to the impact R1 had at the time. It is still in the first tier, but there remains a gap with the top closed-source models in some complex Agent tasks and the broadest world knowledge.
The highlight of V4 is not "comprehensive crushing" but providing sufficiently strong long-context and production task capabilities at a lower price. This is the first layer of meaning of DeepSeek V4: it continues to lower the usage threshold for high-performance models. But more importantly, DeepSeek is beginning to explain what else this low price can rely on to continue, and the answer points to domestic computing power.
02 The Next Step of Affordability Points to Domestic Computing Power
The most critical point of V4 is not in the parameter table but in that sentence about Ascend 950.
DeepSeek explicitly mentioned in the pricing description that, limited by high-end computing power, the Pro version service currently has very limited throughput. It is expected that after the batch release of Ascend 950 super nodes in the second half of the year, the Pro price will be significantly reduced. It is uncommon in the industry for a model company to directly tie future price reductions to the release schedule of a specific type of computing cluster. It shows that model prices are beginning to be determined by the computing power structure.
In the past, DeepSeek's affordability was more understood as a victory of model architecture and engineering efficiency. V2 used MoE to reduce the scale of activated parameters; R1 used more efficient training and inference routes to challenge the industry's reliance on computing power stacking; then V3 used extreme cost control and engineering optimization to dismantle the traditional pricing logic of general large models. After V3 and R1, domestic large models were forced into a new round of price reassessment. But what sets V4 apart is that DeepSeek is beginning to place the next step of low prices on the large-scale deployment of domestic computing power.
According to DeepSeek's technical report, V4 implements fine-grained expert parallelism, or the EP scheme, at the underlying system level. In simple terms, this optimizes how the model is scheduled across chips, overlapping computation and communication like an assembly line and reducing chip idle time. If the same batch of chips can handle more requests, the inference cost per unit naturally decreases.
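The assembly-line intuition can be made concrete with a toy timing model. The numbers are arbitrary and this is only the scheduling idea, not the EP scheme itself:

```python
# Toy model of compute/communication overlap: while one micro-batch's
# results are being communicated between chips, the next one computes.

def serialized_time(n, compute, comm):
    """No overlap: every micro-batch waits for its communication to finish."""
    return n * (compute + comm)

def overlapped_time(n, compute, comm):
    """Two-stage pipeline: comm of batch i overlaps compute of batch i+1."""
    return compute + (n - 1) * max(compute, comm) + comm

n, compute, comm = 8, 1.0, 0.8
print(serialized_time(n, compute, comm))   # 14.4
print(overlapped_time(n, compute, comm))   # 8.8
```

In this made-up example the overlap alone yields about a 1.6x speedup, the same order as the 1.5-1.73x general-inference acceleration the technical report claims for the EP scheme.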
The technical report mentions that this EP scheme has been validated on both NVIDIA GPU and Huawei Ascend NPU systems. For general inference tasks, it can achieve 1.5-1.73 times acceleration, and in latency-sensitive scenarios (such as RL inference and high-speed agent services), it can reach up to 1.96 times. Huawei Ascend also announced after the V4 release that its super node full series products support the DeepSeek V4 series. It is understood that Ascend 950 reduces Attention computation and memory access overhead through fused kernels and multi-stream parallel technology, significantly improving inference performance. Combined with various quantization algorithms, it enables high-throughput, low-latency inference deployment of the DeepSeek V4 model.
Peng Deyu told me: The significance of this information is not just "faster inference." It means that DeepSeek's engineering optimization is beginning to possess cross-platform capabilities. In the past, most large model companies developed around the NVIDIA CUDA ecosystem. CUDA is not just a programming tool; it is more like the underlying operating system of the AI era. A large number of global developers, operator libraries, frameworks, and model codes are built around CUDA. Once away from this system, much underlying code needs to be rewritten, with high engineering and testing costs. This is also NVIDIA's real moat.
What DeepSeek is doing now is not immediately overthrowing CUDA but trying to leave itself a second path. Based on comprehensive media reports, DeepSeek uses methods like TileLang and Tile Kernels to abstract part of the underlying operator logic from the single CUDA path, expressing the computation logic in a more universal language, and then having the compiler generate underlying code adapted to different hardware. This way, developers do not have to completely rewrite a set of code for each GPU or NPU but can first write universal logic and then optimize for specific hardware.
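The "write the logic once, generate code per hardware" pattern can be sketched as a simple lowering registry. The backend names and the strings they emit below are invented for illustration and bear no relation to TileLang's real API:

```python
# Minimal sketch of "write once, lower per backend": an abstract kernel
# description plus one registered lowering function per hardware target.

BACKENDS = {}

def backend(name):
    """Decorator that registers a lowering function for one target."""
    def register(fn):
        BACKENDS[name] = fn
        return fn
    return register

@backend("cuda")
def lower_cuda(op, tile):
    # Stand-in for emitting CUDA source for this op and tile shape.
    return f"__global__ kernel: {op}, tile={tile}"

@backend("ascend")
def lower_ascend(op, tile):
    # Stand-in for emitting Ascend NPU code for the same abstract op.
    return f"NPU kernel: {op}, tile={tile}"

def compile_kernel(op, tile, target):
    """Same abstract op, different generated code per hardware target."""
    return BACKENDS[target](op, tile)

print(compile_kernel("matmul", (128, 128), "cuda"))
print(compile_kernel("matmul", (128, 128), "ascend"))
```

The developer-facing point is the one in the paragraph above: the universal description (`op`, `tile`) is written once, and adding a new chip means adding one lowering function rather than rewriting every kernel.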
This is important for domestic chips. Domestic AI chips have faced not just the issue of on-paper computing power but also issues of software ecosystem and effective utilization. Whether chips can be used well depends on multiple links: models, operators, compilers, communication, and memory management. If DeepSeek can run cutting-edge models on Huawei Ascend and bring inference costs down, it will deliver not just one model's adaptation case but a technical validation of software-hardware coordination.
But DeepSeek has not immediately moved away from NVIDIA. In the short term, CUDA remains the most mature and stable path. The signal released by V4 is that domestic computing power has begun to enter DeepSeek's key cost structure and, to some extent, influence future pricing. It has not overturned CUDA, but it makes CUDA no longer seem completely irreplaceable.
This is exactly what Jensen Huang is worried about. NVIDIA founder Jensen Huang recently stated in an interview with Dwarkesh Patel that if DeepSeek were to release on the Huawei platform first, it would be disastrous for the United States. Li Rui pointed out that this judgment is not because one of DeepSeek's benchmark scores surpassed someone else's but because once top open-source models can run stably on non-NVIDIA systems, developers might start to change their habits. If the models are good enough, the prices are low enough, and the toolchain gradually matures, migration will no longer be just a political or supply chain choice but a business choice.
Therefore, the second layer of meaning of V4 is that DeepSeek's low-price logic is shifting from "model optimization-driven" to "model optimization + computing system-driven." In the past, large model prices were mainly determined by algorithm efficiency, training costs, and vendor subsidies; now, prices are beginning to be tied to chip supply, super node deployment, and software-hardware coordination efficiency. For DeepSeek, this is a path to lower costs; for NVIDIA, it is a crack that is small for now but must be watched.
However, software-hardware coordination is not a light-asset business. The deeper the model is embedded in chips and infrastructure, the greater the cost, organizational pressure, and commercialization pressure DeepSeek must bear.
03 DeepSeek Begins to Become Heavier
This is also why the news that Liang Wenfeng began approaching external financing around the time of the V4 release is equally important.
According to Sina Tech reports, DeepSeek was recently revealed to be planning to raise 50 billion yuan. Sources close to DeepSeek say its pre-money valuation is 300 billion yuan, approximately $44 billion, and that Tencent Holdings and Alibaba Group are both in talks to invest. However, DeepSeek has not yet directly responded to media inquiries about the financing.
The specific valuation is not the most important. The key is that DeepSeek is beginning to open the window for external financing. This means the competition it faces is no longer just about capability but has extended to computing power investment, talent stability, employee incentives, and commercialization capability.
This matter is important not because the amount is large; in today's AI financing market, it is not an exaggerated figure. What matters is that the person opening the financing window is Liang Wenfeng. DeepSeek was long regarded as a rare technologically idealistic company, backed by High-Flyer Quant (幻方量化), in no hurry to take external capital or tell commercial stories. That it is now beginning to approach external financing indicates the competitive landscape after V4 has become heavier and the pressure is real: computing power infrastructure, talent incentives, and commercialization all require more stable capital arrangements than before.
The first pressure comes from computing power. The deeper V4 goes into domestic computing power, the more infrastructure investment is needed. As model parameters move from hundreds of billions to trillions, training and inference costs will rise. If more adaptation, tuning, and deployment around the Ascend system are needed, DeepSeek cannot just be a light-asset model company. Currently, DeepSeek is already recruiting data center operation and maintenance engineers in Ulanqab, Inner Mongolia. This is its first recruitment of personnel directly responsible for computing infrastructure operation, which is also seen by the outside world as a signal of its move towards heavier computing power infrastructure.
The second pressure comes from talent. Reports from multiple media outlets show that currently, 5 core technical experts have confirmed leaving DeepSeek, going to companies like ByteDance, Tencent, Xiaomi, and Yuanrong Qihang, involving areas such as base models, inference reinforcement learning, multimodal, and OCR. Among them, Guo Daya (core author of DeepSeek R1) was reported to have joined ByteDance's Seed team; Wang Bingxuan (core author of DeepSeek LLM) joined Tencent Hunyuan; Ruan Chong (deeply involved in the development of DeepSeek-VL, VL2, Janus series and other multimodal models) joined Yuanrong Qihang; Luo Fuli (key developer of DeepSeek-V2 and core contributor to MLA technology) joined Xiaomi; Wei Haoran (core author of the DeepSeek OCR series) destination has not been publicly disclosed.
For a company with fewer than 200 people in total, this kind of turnover is not an ordinary personnel change. Media reports state that DeepSeek's core R&D team is about 100 people, that it hardly recruits from the broader job market, and that it relies mainly on fresh graduates and interns who stay on. In such a team, the departure of a core researcher may affect not just one position but the continuity of an entire technical line.
This does not mean DeepSeek's organization is poor. On the contrary, the long-term external impression of DeepSeek is that it has an organizational method difficult for large companies to replicate: no clocking in, no KPIs, and researchers free to form teams or pursue new ideas on their own. This organizational method is suitable for early technological breakthroughs and explains why DeepSeek has been able to make counter-intuitive engineering innovations over the past few years. But when the industry enters a heavier phase, the problem changes. Top talent looks not only at freedom in their work but also at technical direction, resource investment, and deployment scenarios. Large companies can offer money, computing power, product scenarios, and larger teams all at once.
The third pressure comes from commercialization. Before the V4 release, the DeepSeek App was revamped on April 8, launching an "Expert Mode" that supports complex reasoning and a "Quick Mode" for simple tasks. With the V4 release, the outside world learned that the Expert Mode corresponds to the 1.6 trillion parameter V4-Pro, and the Quick Mode corresponds to the 284 billion parameter V4-Flash. This change shows that DeepSeek is no longer just releasing models for developers to use but is beginning to polish product tiering for users.
Peng Deyu pointed out that there is a natural tension with the open-source route. Open source can quickly build technical momentum and allow developers and ecosystem partners to reuse DeepSeek's route faster. But open source usually means thinner profit margins and higher cost sensitivity. Closed-source companies like OpenAI and Anthropic can establish more direct commercial closed loops through subscriptions, APIs, and enterprise services; Google, Amazon, and Microsoft can absorb model costs within their cloud and ecosystem businesses. DeepSeek has none of these ready-made commercial buffer layers. If it continues to insist on low prices, open source, and cutting-edge model R&D, it must find new funding, computing power, and commercialization support.
Li Rui said: Therefore, the V4 release and the financing are not two independent events. V4 is the answer sheet Liang Wenfeng hands to the market, proving that DeepSeek can still build strong models, keep prices low, and push domestic computing power onto the critical path. Financing is the answer sheet he hands to the team, leaving buffer room for computing power investment, employee options, talent stability, and commercialization exploration.
Wu Yuxing further stated: There is also a more realistic paradox here. Financing can solve equity pricing, alleviate computing power pressure, and give the company more chips in the talent war. But financing cannot solve all problems. What was most scarce about DeepSeek in the past was not money but that organizational temperament willing to long-term bet on underlying technology and willing to bypass mainstream paths to make engineering innovations. Once capital, commercialization, and the large company talent war enter simultaneously, what DeepSeek must guard is not only model leadership but also its original technical route and organizational culture.
In my opinion, this is also the deeper problem truly exposed by V4. It proves that China's large models already have the ability to simultaneously take a step forward in model capability, inference price, and domestic computing power adaptation; but it also proves that large model competition is no longer a contest where a few geniuses write better algorithms. The next phase competes on computing power infrastructure, engineering systems, product transformation, financing capability, and talent density.
Liang Wenfeng has this time placed his bet on domestic computing power. V4 keeps DeepSeek at the center of the industry and also lets the outside world see that the CUDA ecosystem is not completely unshakable. But the harder problem has just begun: as models become heavier, talent becomes more expensive, and commercialization becomes more urgent, can DeepSeek, after becoming a heavier AI infrastructure company, still maintain the ability to change the rules as it did in the past?





