DeepSeek V4 Finally Released, Breaking the Strongest Closed-Source Monopoly, Explicitly Partnering with Huawei Chips

marsbitPublished on 2026-04-24Last updated on 2026-04-24

Abstract

DeepSeek AI has officially released DeepSeek-V4, available in two versions: the high-performance **DeepSeek-V4-Pro** (49B activated parameters, 1.6T total) and the more efficient **DeepSeek-V4-Flash** (13B activated parameters, 284B total). Both support a 1M context length, making long-context capability a baseline feature rather than a premium offering. The Pro version rivals top closed-source models in agent capabilities, world knowledge, and reasoning performance. It outperforms Claude Sonnet 4.5 in agentic coding and approaches Claude Opus 4.6 (non-thinking mode) in quality. The Flash version offers competitive performance at a lower cost, though it lags in highly complex tasks. A key technical innovation is a new attention mechanism that reduces computational and memory demands for long contexts. The models are optimized for agent frameworks like Claude Code and OpenClaw. API services are available with support for both OpenAI and Anthropic-style interfaces. DeepSeek also announced upcoming support for Huawei’s computing hardware in the second half of the year. The models are open-sourced on Hugging Face and ModelScope.

Just now, DeepSeek-V4 is here!

The preview version is officially launched and simultaneously open-sourced.

There are two versions in total:

DeepSeek-V4-Pro: Comparable to top closed-source models, 1.6T, 49B activated, 1M context length;

DeepSeek-V4-Flash: A smaller and faster economical version, 284B, 13B activated, 1M context length.

The official statement is: It leads domestically and in the open-source field in Agent capabilities, world knowledge, and reasoning performance.

And:

Currently, DeepSeek-V4 has become the Agentic Coding model used by company employees. According to evaluation feedback, the user experience is better than Sonnet 4.5, and the delivery quality is close to Opus 4.6 non-thinking mode. However, there is still a certain gap compared to the Opus 4.6 thinking model.

Currently, both the official website and the app have been updated, and the API service has also been synchronized.

Regarding the much-concerned domestic computing power, the key point is: Support for Huawei computing power in the second half of the year.

Top-Tier and Cost-Effective Choices, Two Versions Launched Together

This time, V4 releases two versions at once.

V4-Pro, performance comparable to top closed-source models.

The official judgment has three points:

Significantly improved Agent capabilities: In the Agentic Coding evaluation, V4-Pro has reached the best level among current open-source models and also performed excellently in other Agent-related evaluations. In internal evaluations, in Agent Coding mode, the V4 experience is better than Sonnet 4.5, and the delivery quality is close to Opus 4.6 non-thinking mode, but there is still a certain gap compared to the Opus 4.6 thinking mode.

Rich world knowledge: In world knowledge evaluations, DeepSeek-V4-Pro significantly leads other open-source models, only slightly inferior to the top closed-source model Gemini-Pro-3.1.

World-class reasoning performance: In evaluations of mathematics, STEM, and competitive code, DeepSeek-V4-Pro surpasses all currently publicly evaluated open-source models and achieves excellent results comparable to the world's top closed-source models.

V4-Flash, a smaller and faster economical version. Reasoning ability is close to Pro, world knowledge reserve is slightly inferior, but with smaller parameters and activation, and cheaper API.

In Agent tasks, DeepSeek-V4-Flash is on par with DeepSeek-V4-Pro in simple tasks, but there is still a gap in high-difficulty tasks.

In the car wash test, V4 also passed quickly.

In the classic biological scenario "Desperate Father," DeepSeek-V4 did not immediately grasp the key point of red-green color blindness in one round (according to genetic rules, if a female is red-green color blind, her biological father must be as well).

Million Context Length Becomes Standard

It is worth mentioning that from today, 1M context length is standard for all DeepSeek official services.

A year ago, 1M context length was Gemini's exclusive trump card; all other closed-source models were either 128K or 200K; on the open-source side, almost no one could afford this level.

DeepSeek directly moved the million context length from a "high-end feature" to "basic infrastructure."

And it's open source. How did they do it? The release directly gave the answer—

V4 has created a new attention mechanism that compresses at the token dimension and is used in combination with DSA sparse attention. Compared to traditional methods, the demand for computation and memory is significantly reduced.

DSA is not a new term. It was first introduced in the V3.2-Exp update half a year ago. At that time, external attention was low because the benchmark scores were almost the same as V3.1-Terminus, making it seem like an insignificant intermediate version.

Looking back now, that was the foundation of V4.

Special Optimization for Agent Capabilities

On the Agent side, V4 has been adapted and optimized for mainstream Agent products such as Claude Code, OpenClaw, OpenCode, CodeBuddy, etc., with improvements in code tasks and document generation tasks.

The release also included an example of a PPT inner page generated by V4-Pro under a certain Agent framework.

API Pricing

On the API side, V4-Pro and V4-Flash are simultaneously launched, supporting both OpenAI ChatCompletions interface and Anthropic interface.

The base_url remains unchanged, just change the model parameter to deepseek-v4-pro or deepseek-v4-flash to call.

Both versions have a maximum context length of 1M and support both non-thinking mode and thinking mode. In thinking mode, the intensity can be adjusted through the reasoning_effort parameter, with two levels: high and max. The official recommendation is to directly use max for complex Agent scenarios.

Here is a key point—Support for Huawei computing power in the second half of the year.

In addition, old model names will be discontinued.

deepseek-chat and deepseek-reasoner will be discontinued three months later (July 24, 2026). During the current phase, these two names point to the non-thinking and thinking modes of V4-Flash, respectively.

It has little impact on individual developers; just change one model parameter. Companies with production environments need to migrate during these three months.

One more thing

At the end of the release, DeepSeek quoted a sentence.

"Not tempted by praise, not frightened by slander, follow the path and act,端正自己端正自己 (correct oneself)."

This is a sentence from Xunzi's "Non-Twelve Masters." The literal meaning is: not tempted by praise, not frightened by slander, move forward according to the path one believes in, and correct oneself.

In today's context, it's somewhat interesting.

Over the past six months, rumors about when V4 would be released, whether it was delayed, whether it had been surpassed by others, whether it had been compromised by Claude's distilled data, etc., have circulated back and forth in both Chinese and English AI circles. At the beginning of the year, some even confidently said that V4 would be released before the Spring Festival, but it wasn't until the end of April.

They never responded once.

Then, on a Friday afternoon, they released V4, simultaneously open-sourced it, simultaneously launched it on the official website and app, simultaneously updated the API, and even wrote into the release that internal employees have already abandoned Claude.

No roadmap, no live stream, no interviews.

The four words "率道而行" (follow the path and act) sound like a slogan. But if you look at the path over the past six months: the V3.2 "unremarkable" Exp version, the DSA sparse attention that paved the way for V4 for half a year, and the path of making 1M context length from a trump card to a standard feature.

DeepSeek has already done it.

DeepSeek-V4 model open-source links:

[1]https://huggingface.co/collections/deepseek-ai/deepseek-v4

[2]https://modelscope.cn/collections/deepseek-ai/DeepSeek-V4

DeepSeek-V4 Technical Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

This article is from the WeChat public account "QbitAI", author: QbitAI

Related Questions

QWhat are the two versions of DeepSeek-V4 that were released, and what are their key specifications?

ADeepSeek-V4 was released in two versions: DeepSeek-V4-Pro and DeepSeek-V4-Flash. The Pro version has 1.6T parameters with 49B activated and a 1M context length. The Flash version is a smaller, faster, and more economical model with 284B parameters, 13B activated, and also a 1M context length.

QAccording to the article, how does DeepSeek-V4-Pro's performance compare to top closed-source models like Anthropic's Opus 4.6?

AAccording to internal evaluations, DeepSeek-V4-Pro's performance in Agent Coding mode is better than Sonnet 4.5 and its delivery quality is close to Opus 4.6 in non-thinking mode, but it still has a gap compared to Opus 4.6 in thinking mode.

QWhat major technical achievement is highlighted for the DeepSeek-V4 models regarding context length?

AA major technical achievement is that a 1M context length has become the standard for all DeepSeek official services. This was achieved through a novel attention mechanism that compresses at the token dimension and is combined with DSA sparse attention, significantly reducing computational and memory requirements.

QWhat significant partnership or hardware support is announced for the future of DeepSeek's models?

AThe article announces that DeepSeek will support Huawei's computing power in the second half of the year.

QWhere can users find the open-source models and the technical report for DeepSeek-V4?

AThe open-source models can be found on Hugging Face and ModelScope collections under 'deepseek-ai/deepseek-v4'. The technical report is available at: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

Related Reads

TechFlow Intelligence Bureau: Chip Stocks Lose Trillions in a Single Day, Bitcoin Falls Below $60,000, US-Iran Conflict Escalates

**Daily Tech & Markets Roundup: AI Advances, Market Turmoil, and Geopolitical Tensions** **AI / LLMs**: Anthropic's internal report on AI self-improvement sparked serious discussions about Recursive Self-Improvement (RSI). Meanwhile, debate continues on AI coding tools after Claude was accused of introducing bugs into the rsync codebase. In positive news, DeepSeek V4 Flash impressed in local deployment tests, and GitHub Copilot now supports custom endpoints for local models. A surprising research turn suggests removing chain-of-thought prompting can sometimes improve LLM performance. **Crypto / Web3**: Bitcoin plunged below $60,000, with its RSI hitting levels last seen during the COVID-19 crash, driven by strong U.S. jobs data reviving interest rate hike fears. Discussions highlight Ethereum DeFi's continued lack of a smooth consumer payment layer. **Chips / Hardware**: Chip stocks suffered a massive sell-off, with the Philadelphia Semiconductor Index posting its worst single-day drop in six years, erasing over a trillion dollars in value. Marvell, Micron, AMD, and Intel were among the biggest losers. **Tech Companies**: A leaked Microsoft document revealing goals to make Copilot "addictive" drew criticism. LinkedIn founder Reid Hoffman left Microsoft's board to focus full-time on his AI agent startup, Manus. Google was revealed to be paying SpaceX $920 million monthly for AI training compute. **Markets & Macro**: A blowout U.S. jobs report (172k vs. 80k expected) crushed hopes for near-term rate cuts, sending Treasury yields soaring and triggering a broad market sell-off. CEOs from Kraft, McDonald's, and Whirlpool simultaneously warned U.S. consumers are exhausting their savings. **Geopolitics**: U.S.-Iran tensions escalated with missile/drone interceptions and U.S. strikes on Iranian radar sites, keeping the critical Strait of Hormuz largely closed since late February and posing ongoing oil supply risks. **The Bottom Line**: The strong jobs data acted as a single trigger for correlated sell-offs across equities, crypto, and chips. Underlying the volatility is a stark contradiction between robust employment data and warnings of consumer weakness, alongside geopolitical risks that could reignite inflation, leaving markets to price in a fraught macro outlook with no clear "soft landing" path.

marsbit2h ago

TechFlow Intelligence Bureau: Chip Stocks Lose Trillions in a Single Day, Bitcoin Falls Below $60,000, US-Iran Conflict Escalates

marsbit2h ago

It Took Me a Year to See the Bitter Truth About Agent Payments

After a year building infrastructure for the Agent economy, engaging with major players like Stripe, Visa, and Coinbase, the author shares a sobering analysis of the current state of Agent payments. The core finding is a stark lack of genuine, immediate demand across most envisioned use cases. The article breaks down four key market segments: 1. **Agent-to-Merchant (Consumer Shopping):** For most product categories (e.g., clothing, electronics), conversational AI shopping is a step backwards from visual e-commerce interfaces. While agents excel at understanding needs, they can't replace side-by-side product comparison. Real merchant interest is defensive "Agent Engine Optimization," not driven by current customer demand. Potential exists for high-frequency, low-decision purchases (like food delivery) or navigating complex store UIs, but these require massive B2C distribution channels dominated by giants like Amazon. 2. **Agent-to-API (Developer Services):** Developers already have subscriptions and billing relationships for APIs (compute, data). Prepaid balances solve micro-payment issues for low transaction volumes. A deeper structural problem is that major SaaS vendors' business models rely on enterprise contracts, resisting granular pay-per-call pricing. While protocols like MPP and x402 serve the long tail of niche services, this market is small and developers are historically low-willingness-to-pay. 3. **Agent-to-Agent:** This remains largely theoretical with minimal transaction volume. While it represents a long-term bet on a fundamentally new transaction infrastructure (sub-second, micro-penny to million-dollar, multi-party settlements), it does not constitute a present market. 4. **Agent-to-Finance:** This is the only category with existing, paying demand. Integrating AI into financial workflows (trading, portfolio management) is a natural evolution and enables new capabilities like autonomous rebalancing. However, competition favors established, regulated institutions. The "real problem" is not moving money between agents, but the broader challenge of **coordination**—orchestrating work between agents and humans, verifying outcomes, and settling results. Payment is just one component of settlement, which is itself part of coordination. Companies that solve the coordination layer will subsume payment, not the other way around. While well-funded incumbents build defensively for a long-term future, startups must find where the market is today—which, for the author's team, lies outside these four categories in an area of real, growing, and underserved activity.

marsbit3h ago

It Took Me a Year to See the Bitter Truth About Agent Payments

marsbit3h ago

It Took Me a Year to See the Hard Truth About Agent Payments

**Title: It Took Me a Year to See the Hard Truth About Agent Payments** Over the past year, I've worked on infrastructure for the Agent economy, engaging with major players like Stripe, Visa, Coinbase, and numerous startups. The findings reveal a stark reality: genuine, widespread demand for Agent-based payments does not yet exist. **Key Observations:** * **Agent-to-Merchant (Shopping):** The user experience for AI shopping often falls short, especially for visual product discovery. While AI excels at understanding needs, conversational interfaces can't yet replace browsing and comparing multiple products visually. Current merchant interest is largely defensive ("Agent Engine Optimization") for a future that hasn't arrived. High-frequency, low-friction purchases (like food delivery) are potential fits, but lack open APIs and face high AI inference costs. Simpler, more affordable, or cross-language interactions for complex UIs are a niche opportunity but require massive consumer distribution to scale. * **Agent-to-API (Developer Tools):** Developer payment needs for APIs (computing, data, models) are already met through subscriptions and prepaid credits. The core challenge is not payment friction but supplier economics: most large SaaS providers prefer enterprise contracts over micropayments for API calls. Protocols like MPP and x402 suit the long-tail of smaller services but cater to a developer market historically reluctant to pay for these tools. Major infrastructure needs at the top of the stack are already being addressed. * **Agent-to-Agent (Machine Commerce):** This is a long-term vision with almost no current transaction volume. While a future with high-speed, high-frequency, multi-party machine-to-machine transactions would require novel infrastructure, it remains theoretical. The market is not here yet. * **Agent-to-Finance:** This is the only category with clear, present demand. Financial professionals and DeFi users already pay for tools, and AI augmentation is a natural evolution. Autonomous AI agents can enable entirely new financial strategies. However, competition is fierce from established, regulated incumbents who can more easily layer AI onto their existing products. **The Core Insight:** Companies, especially giants with long time horizons, are building defensively for a potential future of mass machine commerce. For them, early investment is a low-cost hedge. For startups, the current market reality is different. The primary challenge isn't just moving money between agents (payments). The larger, unsolved problem is **orchestration** – coordinating work between agents and humans, verifying outcomes, and then settling. Payment is just a part of settlement, which is just a part of orchestration. Companies that solve the orchestration problem will subsume payments, not the other way around. After a year of building, we see the real, growing, and underserved market opportunity lies in this broader domain of orchestration.

链捕手3h ago

It Took Me a Year to See the Hard Truth About Agent Payments

链捕手3h ago

Trading

Spot
Futures
活动图片