Insider: DeepSeek Is Forming a Harness Team to Benchmark Against Claude Code

ChainCatcher (链捕手) | Published 2026-05-22 | Updated 2026-05-22

Summary

DeepSeek is reportedly forming a dedicated "Harness" team to develop a code agent product, directly targeting Anthropic's Claude Code. According to internal sources and a social media post by DeepSeek senior researcher Chen Deli, the team will focus on building "DeepSeek Code Harness." The initiative involves recruiting for key roles like Harness Product Manager and Harness R&D Engineer in Beijing. DeepSeek defines its approach with the core formula: Model + Harness = Agent. This signifies a strategic shift from merely offering a powerful coding model to creating the essential middleware that connects the model to real-world developer workflows. The Harness will handle context management, tool calls, task planning, file operations, code editing, terminal execution, and feedback loops. The move highlights that competition in AI-assisted coding is evolving from pure model capability to ownership of the developer workflow entry point. While DeepSeek has strong foundational models (e.g., DeepSeek-Coder series), it has lacked an integrated, productized agent experience. The popularity of a community-built project, DeepSeek-TUI, demonstrated developer demand for a Claude Code-like tool using DeepSeek's models, but also revealed the limitations of unofficial solutions. By building its official Code Harness, DeepSeek aims to leverage its unique advantages: direct collaboration with its model training team, control over APIs and design, the ability to create a data feedback loop fo...

Author | Wang Bo, Jiazi Guangnian

According to information obtained by "Jiazi Guangnian" from a source close to DeepSeek, the company is organizing a new Harness team focused on a code agent product, with Anthropic's Claude Code as its internal benchmark.

Senior DeepSeek researcher Chen Deli recently confirmed this on social media, stating, "DeepSeek is organizing a new Harness team to work on Harness-related products and research," and explicitly said, "In simple terms, it's to benchmark against Claude Code and create DeepSeek Code Harness."

This is no ordinary hiring round.

Recruitment information shows DeepSeek is opening two key positions: Harness Product Manager and Harness R&D Engineer, currently limited to Beijing. DeepSeek's Beijing office is located in the R&F Centre in Haidian District, close to Peking University and Tsinghua University. Officially, it's located in the "Centennial Beijing-Zhangjiakou AI Innovation Belt," while colloquially, it's also in the recently popular "Wang Huiwen Area."

Core Definition: Model + Harness = Agent

In the job description, a core formula is placed in the most prominent position:

Model + Harness = Agent

This statement can almost be seen as DeepSeek's internal definition for the next phase of productization: the model itself is merely the foundation for an Agent. Everything outside the model—context management, tool calling, task planning, file reading/writing, code modification, terminal execution, feedback collection, and evaluation loops—is the critical part that enables the Agent to truly integrate into workflows.
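The division of labor described above can be sketched in a few lines of Python. This is only an illustrative sketch, not DeepSeek's design: the names `Action`, `Harness`, `scripted_model`, and the `read_file` tool are all hypothetical, and a scripted function stands in for a real model. The model decides the next action; the harness owns the context, the tools, and the loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# All names below are hypothetical and exist only for illustration.
@dataclass
class Action:
    tool: str      # name of the tool to call, or "finish" to end the task
    argument: str  # a single string argument, kept simple on purpose


Model = Callable[[List[str]], Action]  # the "Model": context in, next action out
Tool = Callable[[str], str]            # a harness tool: argument in, observation out


@dataclass
class Harness:
    tools: Dict[str, Tool]
    max_steps: int = 8
    context: List[str] = field(default_factory=list)

    def run(self, model: Model, task: str) -> str:
        """Drive the model step by step until it finishes or the step limit is hit."""
        self.context = [f"task: {task}"]          # context management
        for _ in range(self.max_steps):           # bounded task loop
            action = model(self.context)
            if action.tool == "finish":
                return action.argument
            tool = self.tools.get(action.tool)    # tool calling
            observation = (
                tool(action.argument) if tool else f"unknown tool: {action.tool}"
            )
            # Feed the observation back so the next model call can see it.
            self.context.append(f"{action.tool}({action.argument}) -> {observation}")
        return "stopped: step limit reached"


def scripted_model(context: List[str]) -> Action:
    """Stand-in for a real model: read one file, then finish."""
    if len(context) == 1:
        return Action("read_file", "README.md")
    return Action("finish", f"done after {len(context) - 1} tool call(s)")


files = {"README.md": "hello"}  # in-memory stand-in for a project directory
harness = Harness(tools={"read_file": lambda path: files.get(path, "<missing>")})
result = harness.run(scripted_model, "summarize the README")
print(result)
```

Swapping `scripted_model` for a call to a real model API would leave the loop unchanged, which is the sense in which the harness is everything outside the model.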

The job description further states: "We are transforming DeepSeek's cutting-edge model capabilities into leading Agent products. All work beyond the model itself falls under the scope of Harness." Additionally, the role will participate in the entire process of developing "DeepSeek desktop Agent products" and "define DeepSeek's understanding of Harness."

Jiazi Guangnian's analysis is that DeepSeek does not simply want to build a code assistant plugin; it is filling in the middle layer that connects the model to real-world workflows.

Over the past year, the industry has shown that a model's strong coding ability does not mean developers will actually adopt it, and that being able to write code does not mean a model can reliably complete an engineering task.

What truly changes developers' workflow is not the Claude model alone, but Claude Code; not the GPT model alone, but Codex; not just a code response in a chatbox, but an engineering agent capable of entering the terminal, understanding projects, reading/writing files, running commands, fixing errors, managing Git, and calling tools.

DeepSeek's past strength was its model. Now, it is beginning to add that layer of "hands" on top of the model.

I. Why DeepSeek Emphasizes Harness

In traditional AI product contexts, "code assistant" usually refers to two types of products: one is a completion plugin in the IDE, and the other is code Q&A in a chatbox.

But the term repeatedly appearing in this DeepSeek recruitment is not Code Assistant, but Harness.

Harness originally referred to a "test harness" or "execution framework" in an engineering context. In the Agent context, it is closer to an external system that enables the model to truly act. The model is responsible for understanding, reasoning, and generation; Harness is responsible for connecting these capabilities to the real environment.

The job description mentions that this role needs to plan the DeepSeek Harness product roadmap, connect researchers, engineers, the open-source community, and end-users, and communicate deeply with researchers from the model training team to achieve co-evolution between the model and Harness.

This is a crucial point.

It indicates that DeepSeek's goal is not just to wrap a shell around existing models but to make the Agent product itself a part of the model's evolution. In the past, the common product logic for large model companies was: the research team trains a model first, and then the product team builds applications based on the model's capabilities. However, in the Agent era, this sequence is being disrupted. The product is no longer just an outlet for model capabilities but a training ground for those capabilities.

When a code Agent fails on a real project, the cause may not be the product's interaction design but the way the model compresses long context; not the tool-calling pipeline but an unstable task decomposition strategy; not insufficient coding ability but a failure to keep track of engineering constraints, test feedback, and user intent.

Therefore, the value of the Harness team is not just "building a product" but turning real development tasks into a feedback source for continuous model evolution.
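One way to picture "real tasks as a feedback source" is a harness that logs each task outcome together with a coarse failure cause, so the records can later be aggregated for the model team. The sketch below is an assumption for illustration only: the article does not describe DeepSeek's data format, and the names `TaskRecord`, `FAILURE_CAUSES`, `to_jsonl`, and `failure_summary` are hypothetical. The three cause labels paraphrase the failure modes listed above.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import List, Optional

# Hypothetical labels paraphrasing the failure modes named in the article.
FAILURE_CAUSES = {"context_compression", "task_decomposition", "constraint_tracking"}


@dataclass
class TaskRecord:
    task_id: str
    succeeded: bool
    failure_cause: Optional[str] = None

    def __post_init__(self) -> None:
        # Keep records consistent so that aggregated counts stay meaningful.
        if self.succeeded and self.failure_cause is not None:
            raise ValueError("a successful task cannot carry a failure cause")
        if not self.succeeded and self.failure_cause not in FAILURE_CAUSES:
            raise ValueError(f"unknown failure cause: {self.failure_cause!r}")


def to_jsonl(records: List[TaskRecord]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def failure_summary(records: List[TaskRecord]) -> Counter:
    """Count failed tasks by cause."""
    return Counter(r.failure_cause for r in records if not r.succeeded)


records = [
    TaskRecord("t1", True),
    TaskRecord("t2", False, "task_decomposition"),
    TaskRecord("t3", False, "context_compression"),
    TaskRecord("t4", False, "task_decomposition"),
]
summary = failure_summary(records)
print(to_jsonl(records))
print(summary.most_common())
```

A summary like this is only a starting point; deciding which failures are attributable to the model rather than the harness is exactly the judgment the article says requires close work with the training team.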

II. Why Must DeepSeek Fill the Code Harness Gap?

DeepSeek placed early bets on coding capabilities. From DeepSeek-Coder to DeepSeek-Coder-V2, DeepSeek has continuously increased investment in code models, with improvements in supported languages, context length, and complex task capabilities. Its problem is not a lack of coding capability but that this capability has largely remained at the model layer and has not yet become a high-frequency product in developers' daily workflows.

The popularity of Claude Code proves one thing: The competition in AI Coding is shifting from model capability competition to competition for developer workflow entry points.

This is also a lesson DeepSeek must learn now. More tellingly, before DeepSeek officially stepped in, the developer community had already built a "DeepSeek Claude Code" on its behalf.

An open-source project called DeepSeek-TUI previously gained popularity in the developer community. It is a coding agent running in the terminal that can read/write files, execute Shell commands, search the web, manage Git, and coordinate sub-agents through a TUI interface.

The popularity of DeepSeek-TUI highlights two issues:

Foundation Mindshare is Mature: Developers already see DeepSeek's models as a viable base for a code Agent. Otherwise, the community would not have built Claude Code-like products around them on its own.
Official Layer is Missing: What DeepSeek lacks is not attention to its models but an official Harness.

For developers, the appeal of DeepSeek-TUI is straightforward: low cost, domestic availability, long context, and relatively low deployment barriers. Many domestic developers aren't unwilling to use Claude Code but are constrained by price, access stability, account systems, and enterprise compliance.

However, community projects also have inherent limitations:

No matter how active a third-party open-source project is, it can hardly keep pace with how the model's capabilities are evolving internally;
It can adapt around the APIs but cannot, in turn, influence how the model is trained;
It can work on prompts, toolchains, and interaction optimization, but can hardly feed large volumes of real-task feedback into model improvements in a systematic way.

This is precisely the significance of an official Harness.

By creating its own Code Harness, DeepSeek gains several advantages that community projects lack: collaboration with the model team, design authority over interfaces, a closed loop for training data, access to real internal task scenarios, and the ability to run a developer ecosystem over the long term.

The open-source community has already shown the way: developers do need a DeepSeek version of Claude Code. Now, DeepSeek is reclaiming this path to build its own core product.

And DeepSeek officially starting to recruit means it is finally preparing to step onto the field.

Chen Deli mentioned at the 2025 World Internet Conference Wuzhen Summit last November: "One of our company's core advantages is long-termism, adhering to the main line of frontier AI breakthroughs. In this process, we have also abandoned many side paths, not engaging in those short and quick side projects."

After the model war, the real Agent war has begun. This time, DeepSeek is filling in the most critical layer between the model and action: the Harness.

DeepSeek is equipping its model with a pair of hands.

Related Questions

Q: What is the main objective of DeepSeek's new Harness team, according to the article?

A: The main objective is to develop a code agent product, specifically a code intelligent agent, internally benchmarked against Anthropic's Claude Code. The focus is on building the "harness" - the external system that enables the model to interact with real-world environments, tools, and workflows, moving beyond just the model's coding capabilities.

Q: What core formula is highlighted in DeepSeek's job description, and what does it signify?

A: The core formula is "Model + Harness = Agent." It signifies DeepSeek's internal definition for its productization path: the model is just the foundation of an agent. The crucial part that allows an agent to integrate into real workflows is the harness, which includes context management, tool calling, task planning, file I/O, code modification, terminal execution, feedback collection, and evaluation loops.

Q: Why does the article suggest DeepSeek's move to build Code Harness is crucial, beyond just having strong coding models?

A: Because the competition in AI-assisted coding is shifting from pure model capability competition to a competition for becoming the entry point into developers' workflows. Products like Claude Code have shown that developers adopt tools that integrate deeply into their workflow (terminal, project understanding, Git, etc.), not just models that can generate code in a chat interface. DeepSeek needs to bridge this gap to turn its model strength into a high-frequency product.

Q: What is the significance of the community project 'DeepSeek-TUI' mentioned in the article?

A: DeepSeek-TUI is a third-party, open-source terminal-based coding agent that gained popularity. Its significance is twofold: 1) It proves that developers already perceive DeepSeek's model as a solid foundation for a Claude Code-like agent, demonstrating mature developer mindshare. 2) It highlights a gap: the lack of an official, first-party harness from DeepSeek itself, which the company now aims to fill with its own team.

Q: What key advantage does an official DeepSeek Harness team have over community projects, according to the article?

A: An official team has several key advantages: direct collaboration with the model research team, control over API/interface design, the ability to create a closed-loop system for training data and feedback from real tasks, access to internal real-world scenarios, and the capacity for long-term developer ecosystem operations. This allows for co-evolution of the model and the harness, which community projects cannot achieve.
