Insider: DeepSeek Is Forming a Harness Team to Benchmark Against Claude Code

链捕手 | Published 2026-05-22 | Updated 2026-05-22

Introduction

DeepSeek is reportedly forming a dedicated "Harness" team to develop a code agent product, directly targeting Anthropic's Claude Code. According to internal sources and a social media post by DeepSeek senior researcher Chen Deli, the team will focus on building "DeepSeek Code Harness." The initiative involves recruiting for key roles like Harness Product Manager and Harness R&D Engineer in Beijing. DeepSeek defines its approach with the core formula: Model + Harness = Agent. This signifies a strategic shift from merely offering a powerful coding model to creating the essential middleware that connects the model to real-world developer workflows. The Harness will handle context management, tool calls, task planning, file operations, code editing, terminal execution, and feedback loops. The move highlights that competition in AI-assisted coding is evolving from pure model capability to ownership of the developer workflow entry point. While DeepSeek has strong foundational models (e.g., DeepSeek-Coder series), it has lacked an integrated, productized agent experience. The popularity of a community-built project, DeepSeek-TUI, demonstrated developer demand for a Claude Code-like tool using DeepSeek's models, but also revealed the limitations of unofficial solutions. By building its official Code Harness, DeepSeek aims to leverage its unique advantages: direct collaboration with its model training team, control over APIs and design, the ability to create a data feedback loop fo...

Author | Wang Bo, Jiazi Guangnian

According to information "Jiazi Guangnian" obtained from a source close to DeepSeek, the company is assembling a new Harness team focused on a code agent product, benchmarked internally against Anthropic's Claude Code.

Senior DeepSeek researcher Chen Deli recently confirmed this on social media, stating, "DeepSeek is organizing a new Harness team to work on Harness-related products and research," and explicitly said, "In simple terms, it's to benchmark against Claude Code and create DeepSeek Code Harness."

This is not an ordinary recruitment.

Recruitment information shows DeepSeek is opening two key positions: Harness Product Manager and Harness R&D Engineer, currently limited to Beijing. DeepSeek's Beijing office is located in the R&F Centre in Haidian District, close to Peking University and Tsinghua University. Officially, it's located in the "Centennial Beijing-Zhangjiakou AI Innovation Belt," while colloquially, it's also in the recently popular "Wang Huiwen Area."

Core Definition: Model + Harness = Agent

In the job description, a core formula is placed in the most prominent position:

Model + Harness = Agent

This statement can almost be seen as DeepSeek's internal definition for the next phase of productization: the model itself is merely the foundation for an Agent. Everything outside the model—context management, tool calling, task planning, file reading/writing, code modification, terminal execution, feedback collection, and evaluation loops—is the critical part that enables the Agent to truly integrate into workflows.
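The formula can be made concrete with a minimal sketch. The code below is an illustration only, not DeepSeek's design: `Action`, `Harness`, `scripted_model`, and the in-memory `files` store are all hypothetical names, and the "model" is a scripted stand-in. It shows how the harness, rather than the model, owns the loop that dispatches tool calls, collects feedback, and trims context.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical types for illustration; none of these names come from DeepSeek.
@dataclass
class Action:
    tool: str            # name of the tool to call, or "finish"
    argument: str = ""   # a single string argument, kept simple on purpose


Model = Callable[[List[str]], Action]   # context in, next action out
Tool = Callable[[str], str]             # argument in, observation out


@dataclass
class Harness:
    model: Model
    tools: Dict[str, Tool]
    max_steps: int = 8       # hard cap so a confused model cannot loop forever
    max_context: int = 20    # assumed to be at least 2
    context: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.context = [f"task: {task}"]
        for _ in range(self.max_steps):
            action = self.model(self.context)
            if action.tool == "finish":
                return action.argument
            tool = self.tools.get(action.tool)
            if tool is None:
                observation = f"error: unknown tool {action.tool!r}"
            else:
                try:
                    observation = tool(action.argument)
                except Exception as exc:
                    # Feed failures back to the model instead of crashing.
                    observation = f"error: {exc}"
            self.context.append(
                f"{action.tool}({action.argument}) -> {observation}"
            )
            # Crude context management: keep the task line plus recent entries.
            if len(self.context) > self.max_context:
                recent = self.context[-(self.max_context - 1):]
                self.context = self.context[:1] + recent
        return "stopped: step limit reached"


def scripted_model(context: List[str]) -> Action:
    # Stand-in for a real model: read one file, then finish with what it saw.
    if len(context) == 1:
        return Action("read_file", "notes.txt")
    return Action("finish", context[-1])


files = {"notes.txt": "hello"}  # in-memory stand-in for a file system
harness = Harness(model=scripted_model, tools={"read_file": lambda p: files[p]})
result = harness.run("summarize notes.txt")
```

Swapping `scripted_model` for a call to a real model would leave the loop unchanged, which is the sense in which everything outside the model belongs to the harness.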

The job description further states: "We are transforming DeepSeek's cutting-edge model capabilities into leading Agent products. All work beyond the model itself falls under the scope of Harness." Additionally, the role will participate in the entire process of developing "DeepSeek desktop Agent products" and "define DeepSeek's understanding of Harness."

In Jiazi Guangnian's analysis, DeepSeek does not simply want to create a code assistant plugin; it is filling in the middle layer that connects the model to real-world workflows.

Over the past year, the industry has shown that a model with strong coding ability will not necessarily be adopted by developers, and that being able to write code does not mean a model can reliably complete an engineering task.

What truly changes developers' workflow is not the Claude model alone, but Claude Code; not the GPT model alone, but Codex; not just a code response in a chatbox, but an engineering agent capable of entering the terminal, understanding projects, reading/writing files, running commands, fixing errors, managing Git, and calling tools.

DeepSeek's past strength was its model. Now, it is beginning to add that layer of "hands" on top of the model.

I. Why DeepSeek Emphasizes Harness

In traditional AI product contexts, "code assistant" usually refers to two types of products: one is a completion plugin in the IDE, and the other is code Q&A in a chatbox.

But the term repeatedly appearing in this DeepSeek recruitment is not Code Assistant, but Harness.

Harness originally referred to a "test harness" or "execution framework" in an engineering context. In the Agent context, it is closer to an external system that enables the model to truly act. The model is responsible for understanding, reasoning, and generation; Harness is responsible for connecting these capabilities to the real environment.

The job description mentions that this role needs to plan the DeepSeek Harness product roadmap, connect researchers, engineers, the open-source community, and end-users, and communicate deeply with researchers from the model training team to achieve co-evolution between the model and Harness.

This is a crucial point.

It indicates that DeepSeek's goal is not just to wrap a shell around existing models but to make the Agent product itself a part of the model's evolution. In the past, the common product logic for large model companies was: the research team trains a model first, and then the product team builds applications based on the model's capabilities. However, in the Agent era, this sequence is being disrupted. The product is no longer just an outlet for model capabilities but a training ground for those capabilities.

A code Agent failing on a real project might not be due to product interaction issues but to incorrect methods of long-context compression by the model; it might not be a problem with the tool-calling pipeline but instability in the model's task decomposition strategy; it might not be insufficient coding ability but a lack of continuous understanding of engineering constraints, test feedback, and user intent.

Therefore, the value of the Harness team is not just "building a product" but turning real development tasks into a feedback source for continuous model evolution.
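One way to picture such a feedback source is a log of agent runs with their outcomes. The sketch below is an assumption for illustration, not a description of DeepSeek's pipeline: the `RunRecord` fields, the failure labels, and the sample records are all invented. It shows how recorded runs could be serialized for later analysis and tallied by suspected failure cause, mirroring the distinction drawn above between context, tooling, and planning problems.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class RunRecord:
    task: str
    steps: int
    passed: bool
    failure_cause: str = ""  # hypothetical label: "context", "tool", "planning"


def to_jsonl(records: List[RunRecord]) -> str:
    # One JSON object per line, a common format for training and eval logs.
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def failure_breakdown(records: List[RunRecord]) -> Counter:
    # Count failed runs by their suspected cause.
    return Counter(r.failure_cause for r in records if not r.passed)


# Invented sample records, for illustration only.
records = [
    RunRecord("fix failing test", steps=6, passed=True),
    RunRecord("rename module", steps=12, passed=False, failure_cause="context"),
    RunRecord("bump dependency", steps=9, passed=False, failure_cause="planning"),
    RunRecord("refactor parser", steps=15, passed=False, failure_cause="context"),
]
breakdown = failure_breakdown(records)
log_text = to_jsonl(records)
```

In this toy data the breakdown points at context handling rather than the tool pipeline, the kind of signal a harness team could pass back to model training.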

II. Why Must DeepSeek Fill In the Code Harness Layer?

DeepSeek placed early bets on coding capabilities. From DeepSeek-Coder to DeepSeek-Coder-V2, DeepSeek has continuously increased investment in code models, with improvements in supported languages, context length, and complex task capabilities. Its problem is not a lack of coding capability but that this capability has largely remained at the model layer and has not yet become a high-frequency product in developers' daily workflows.

The popularity of Claude Code proves one thing: The competition in AI Coding is shifting from model capability competition to competition for developer workflow entry points.

This is also a lesson DeepSeek must learn now. More subtly, before DeepSeek officially stepped in, the developer community had already created a version of "DeepSeek Claude Code" for it.

An open-source project called DeepSeek-TUI previously gained popularity in the developer community. It is a coding agent running in the terminal that can read/write files, execute Shell commands, search the web, manage Git, and coordinate sub-agents through a TUI interface.

The popularity of DeepSeek-TUI highlights two issues:

  1. Developer mindshare is mature: Developers already regard the DeepSeek model as a viable foundation for a code Agent. Otherwise, the community would not have spontaneously built Claude Code-like products around it.

  2. The official layer is missing: What DeepSeek lacks is not attention to its models but an official Harness.

For developers, the appeal of DeepSeek-TUI is straightforward: low cost, domestic availability, long context, and relatively low deployment barriers. Many domestic developers aren't unwilling to use Claude Code but are constrained by price, access stability, account systems, and enterprise compliance.

However, community projects also have inherent limitations:

  • No matter how active a third-party open-source project is, it's difficult to truly grasp the rhythm of internal model capability evolution;

  • It can adapt around APIs but cannot, in turn, influence how the model is trained;

  • It can work on prompts, toolchains, and interaction optimization, but it's difficult to systematically inject massive real-task feedback into model improvements.

This is precisely the significance of an official Harness.

By creating its own Code Harness, DeepSeek possesses several advantages that community projects lack: collaboration with the model team, design authority over interfaces, closed-loop training data, access to internal real task scenarios, and the ability for long-term developer ecosystem operations.

The open-source community has already paved the way: developers indeed need a DeepSeek version of Claude Code. Now, DeepSeek is reclaiming this path to build its own core product.

And DeepSeek officially starting to recruit means it is finally preparing to step onto the field.

Chen Deli mentioned at the 2025 World Internet Conference Wuzhen Summit last November: "One of our company's core advantages is long-termism, adhering to the main line of frontier AI breakthroughs. In this process, we have also abandoned many side paths, not engaging in those short and quick side projects."

After the model war, the real Agent war has begun. This time, DeepSeek is filling in the most critical layer between the model and action: the Harness.

DeepSeek is equipping its model with a pair of hands.

Related Questions

Q: What is the main objective of DeepSeek's new Harness team, according to the article?

A: The main objective is to develop a code agent product, benchmarked internally against Anthropic's Claude Code. The focus is on building the "harness," the external system that enables the model to interact with real-world environments, tools, and workflows, moving beyond just the model's coding capabilities.

Q: What core formula is highlighted in DeepSeek's job description, and what does it signify?

A: The core formula is "Model + Harness = Agent." It signifies DeepSeek's internal definition for its productization path: the model is just the foundation of an agent. The crucial part that allows an agent to integrate into real workflows is the harness, which includes context management, tool calling, task planning, file I/O, code modification, terminal execution, feedback collection, and evaluation loops.

Q: Why does the article suggest DeepSeek's move to build Code Harness is crucial, beyond just having strong coding models?

A: Because competition in AI-assisted coding is shifting from pure model capability to a contest for the entry point into developers' workflows. Products like Claude Code have shown that developers adopt tools that integrate deeply into their workflow (terminal, project understanding, Git, etc.), not just models that can generate code in a chat interface. DeepSeek needs to bridge this gap to turn its model strength into a high-frequency product.

Q: What is the significance of the community project 'DeepSeek-TUI' mentioned in the article?

A: DeepSeek-TUI is a third-party, open-source terminal-based coding agent that gained popularity. Its significance is twofold: 1) It proves that developers already perceive DeepSeek's model as a solid foundation for a Claude Code-like agent, demonstrating mature developer mindshare. 2) It highlights a gap: the lack of an official, first-party harness from DeepSeek itself, which the company now aims to fill with its own team.

Q: What key advantage does an official DeepSeek Harness team have over community projects, according to the article?

A: An official team has several key advantages: direct collaboration with the model research team, control over API/interface design, the ability to create a closed-loop system for training data and feedback from real tasks, access to internal real-world scenarios, and the capacity for long-term developer ecosystem operations. This allows for co-evolution of the model and the harness, which community projects cannot achieve.
