Insider: DeepSeek Is Forming a Harness Team to Benchmark Against Claude Code

ChainCatcher · Published 2026-05-22 · Updated 2026-05-22

Introduction

DeepSeek is reportedly forming a dedicated "Harness" team to develop a code agent product, directly targeting Anthropic's Claude Code. According to internal sources and a social media post by DeepSeek senior researcher Chen Deli, the team will focus on building "DeepSeek Code Harness." The initiative involves recruiting for key roles like Harness Product Manager and Harness R&D Engineer in Beijing. DeepSeek defines its approach with the core formula: Model + Harness = Agent. This signifies a strategic shift from merely offering a powerful coding model to creating the essential middleware that connects the model to real-world developer workflows. The Harness will handle context management, tool calls, task planning, file operations, code editing, terminal execution, and feedback loops. The move highlights that competition in AI-assisted coding is evolving from pure model capability to ownership of the developer workflow entry point. While DeepSeek has strong foundational models (e.g., DeepSeek-Coder series), it has lacked an integrated, productized agent experience. The popularity of a community-built project, DeepSeek-TUI, demonstrated developer demand for a Claude Code-like tool using DeepSeek's models, but also revealed the limitations of unofficial solutions. By building its official Code Harness, DeepSeek aims to leverage its unique advantages: direct collaboration with its model training team, control over APIs and design, the ability to create a data feedback loop fo...

Author | Wang Bo, Jiazi Guangnian

According to information obtained by "Jiazi Guangnian" from a source close to DeepSeek, the company is organizing a new Harness team focused on a code agent product, with Anthropic's Claude Code as its internal benchmark.

Senior DeepSeek researcher Chen Deli recently confirmed this on social media, stating, "DeepSeek is organizing a new Harness team to work on Harness-related products and research," and explicitly said, "In simple terms, it's to benchmark against Claude Code and create DeepSeek Code Harness."

This is not an ordinary recruitment.

Recruitment information shows DeepSeek is opening two key positions: Harness Product Manager and Harness R&D Engineer, currently limited to Beijing. DeepSeek's Beijing office is located in the R&F Centre in Haidian District, close to Peking University and Tsinghua University. Officially, it's located in the "Centennial Beijing-Zhangjiakou AI Innovation Belt," while colloquially, it's also in the recently popular "Wang Huiwen Area."

Core Definition: Model + Harness = Agent

In the job description, a core formula is placed in the most prominent position:

Model + Harness = Agent

This statement can almost be seen as DeepSeek's internal definition for the next phase of productization: the model itself is merely the foundation for an Agent. Everything outside the model—context management, tool calling, task planning, file reading/writing, code modification, terminal execution, feedback collection, and evaluation loops—is the critical part that enables the Agent to truly integrate into workflows.

The job description further states: "We are transforming DeepSeek's cutting-edge model capabilities into leading Agent products. All work beyond the model itself falls under the scope of Harness." Additionally, the role will participate in the entire process of developing "DeepSeek desktop Agent products" and "define DeepSeek's understanding of Harness."

In Jiazi Guangnian's analysis, DeepSeek does not simply want to create a code assistant plugin; it is filling in the middle layer that connects the model to real-world workflows.

Over the past year, the industry has learned two things: a model with strong coding ability will not necessarily be adopted by developers, and a model that can write code cannot necessarily complete an engineering task consistently.

What truly changes developers' workflow is not the Claude model alone, but Claude Code; not the GPT model alone, but Codex; not just a code response in a chatbox, but an engineering agent capable of entering the terminal, understanding projects, reading/writing files, running commands, fixing errors, managing Git, and calling tools.

DeepSeek's past strength was its model. Now, it is beginning to add that layer of "hands" on top of the model.

I. Why DeepSeek Emphasizes Harness

In traditional AI product contexts, "code assistant" usually refers to two types of products: one is a completion plugin in the IDE, and the other is code Q&A in a chatbox.

But the term repeatedly appearing in this DeepSeek recruitment is not Code Assistant, but Harness.

Harness originally referred to a "test harness" or "execution framework" in an engineering context. In the Agent context, it is closer to an external system that enables the model to truly act. The model is responsible for understanding, reasoning, and generation; Harness is responsible for connecting these capabilities to the real environment.
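The division of labor described above can be sketched in a few lines of Python. This is only an illustrative sketch, not DeepSeek's design: the `Action` and `Harness` types, the tool names, and the scripted stand-in for a model are all hypothetical, and a real harness would add context compression, sandboxing, and evaluation on top.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """Hypothetical model output: which tool to call, or "finish"."""
    tool: str
    argument: str = ""


# A "model" here is any callable that maps the current context to an Action.
Model = Callable[[List[str]], Action]


@dataclass
class Harness:
    """Hypothetical harness: owns tools, context, and the feedback loop."""
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 8
    context: List[str] = field(default_factory=list)

    def run(self, model: Model, task: str) -> List[str]:
        self.context = [f"task: {task}"]
        for _ in range(self.max_steps):
            action = model(self.context)          # model decides
            if action.tool == "finish":
                self.context.append(f"done: {action.argument}")
                break
            tool = self.tools.get(action.tool)    # harness acts
            if tool is None:
                result = f"error: unknown tool {action.tool!r}"
            else:
                result = tool(action.argument)
            # Feed the tool result back so the next decision can use it.
            self.context.append(f"{action.tool}({action.argument}) -> {result}")
        return self.context


# In-memory stand-ins for a file system and a model, for illustration only.
files = {"app.py": "print('helo')"}


def read_file(path: str) -> str:
    return files.get(path, "error: no such file")


def fix_typo(path: str) -> str:
    files[path] = files[path].replace("helo", "hello")
    return "ok"


def scripted_model(context: List[str]) -> Action:
    # A fixed plan indexed by how many steps have already been recorded.
    plan = [
        Action("read_file", "app.py"),
        Action("fix_typo", "app.py"),
        Action("finish", "typo fixed"),
    ]
    return plan[len(context) - 1]


harness = Harness(tools={"read_file": read_file, "fix_typo": fix_typo})
transcript = harness.run(scripted_model, "fix the typo in app.py")
```

In this sketch the model only chooses the next action; everything that touches the environment (tool lookup, execution, recording results in the context) lives in the harness, which is the sense in which "Model + Harness = Agent" is used in the job description.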

The job description mentions that this role needs to plan the DeepSeek Harness product roadmap, connect researchers, engineers, the open-source community, and end-users, and communicate deeply with researchers from the model training team to achieve co-evolution between the model and Harness.

This is a crucial point.

It indicates that DeepSeek's goal is not just to wrap a shell around existing models but to make the Agent product itself a part of the model's evolution. In the past, the common product logic for large model companies was: the research team trains a model first, and then the product team builds applications based on the model's capabilities. However, in the Agent era, this sequence is being disrupted. The product is no longer just an outlet for model capabilities but a training ground for those capabilities.

When a code Agent fails on a real project, the cause may not be product interaction but the way the model compresses long context; not the tool-calling pipeline but an unstable task-decomposition strategy in the model; not insufficient coding ability but a failure to keep track of engineering constraints, test feedback, and user intent.

Therefore, the value of the Harness team is not just "building a product" but turning real development tasks into a feedback source for continuous model evolution.

II. Why Must DeepSeek Fill In the Code Harness Layer?

DeepSeek placed early bets on coding capabilities. From DeepSeek-Coder to DeepSeek-Coder-V2, DeepSeek has continuously increased investment in code models, with improvements in supported languages, context length, and complex task capabilities. Its problem is not a lack of coding capability but that this capability has largely remained at the model layer and has not yet become a high-frequency product in developers' daily workflows.

The popularity of Claude Code proves one thing: competition in AI coding is shifting from model capability to control of the entry points into developer workflows.

This is also a lesson DeepSeek must learn now. More subtly, before DeepSeek officially stepped in, the developer community had already created a version of "DeepSeek Claude Code" for it.

An open-source project called DeepSeek-TUI previously gained popularity in the developer community. It is a coding agent running in the terminal that can read/write files, execute Shell commands, search the web, manage Git, and coordinate sub-agents through a TUI interface.

The popularity of DeepSeek-TUI highlights two issues:

  1. Foundation Mindshare is Mature: The DeepSeek model already has a foundation in developers' minds for being a code Agent. Otherwise, the community wouldn't naturally develop Claude Code-like products around it.

  2. Official Layer is Missing: What DeepSeek lacks is not attention on its models but an official Harness.

For developers, the appeal of DeepSeek-TUI is straightforward: low cost, domestic availability, long context, and relatively low deployment barriers. Many domestic developers aren't unwilling to use Claude Code but are constrained by price, access stability, account systems, and enterprise compliance.

However, community projects also have inherent limitations:

  • No matter how active a third-party open-source project is, it's difficult to truly grasp the rhythm of internal model capability evolution;

  • It can adapt around APIs but cannot in turn influence how the model is trained;

  • It can work on prompts, toolchains, and interaction optimization, but it's difficult to systematically inject massive real-task feedback into model improvements.

This is precisely the significance of an official Harness.

By creating its own Code Harness, DeepSeek gains several advantages that community projects lack: collaboration with the model team, design authority over interfaces, closed-loop training data, access to real internal task scenarios, and the ability to run a developer ecosystem over the long term.

The open-source community has already paved the way: developers indeed need a DeepSeek version of Claude Code. Now, DeepSeek is reclaiming this path to build its own core product.

And DeepSeek officially starting to recruit means it is finally preparing to step onto the field.

Chen Deli mentioned at the 2025 World Internet Conference Wuzhen Summit last November: "One of our company's core advantages is long-termism, adhering to the main line of frontier AI breakthroughs. In this process, we have also abandoned many side paths, not engaging in those short and quick side projects."

After the model war, the real Agent war has begun. This time, DeepSeek is filling in the most critical layer between the model and action: the Harness.

DeepSeek is equipping its model with a pair of hands.

Related Questions

Q: What is the main objective of DeepSeek's new Harness team, according to the article?

A: The main objective is to develop a code agent product, internally benchmarked against Anthropic's Claude Code. The focus is on building the "harness," the external system that enables the model to interact with real-world environments, tools, and workflows, moving beyond just the model's coding capabilities.

Q: What core formula is highlighted in DeepSeek's job description, and what does it signify?

A: The core formula is "Model + Harness = Agent." It signifies DeepSeek's internal definition for its productization path: the model is just the foundation of an agent. The crucial part that allows an agent to integrate into real workflows is the harness, which includes context management, tool calling, task planning, file I/O, code modification, terminal execution, feedback collection, and evaluation loops.

Q: Why does the article suggest DeepSeek's move to build Code Harness is crucial, beyond just having strong coding models?

A: Because the competition in AI-assisted coding is shifting from pure model capability competition to a competition for becoming the entry point into developers' workflows. Products like Claude Code have shown that developers adopt tools that integrate deeply into their workflow (terminal, project understanding, Git, etc.), not just models that can generate code in a chat interface. DeepSeek needs to bridge this gap to turn its model strength into a high-frequency product.

Q: What is the significance of the community project 'DeepSeek-TUI' mentioned in the article?

A: DeepSeek-TUI is a third-party, open-source terminal-based coding agent that gained popularity. Its significance is twofold: 1) It proves that developers already perceive DeepSeek's model as a solid foundation for a Claude Code-like agent, demonstrating mature developer mindshare. 2) It highlights a gap: the lack of an official, first-party harness from DeepSeek itself, which the company now aims to fill with its own team.

Q: What key advantage does an official DeepSeek Harness team have over community projects, according to the article?

A: An official team has several key advantages: direct collaboration with the model research team, control over API/interface design, the ability to create a closed-loop system for training data and feedback from real tasks, access to internal real-world scenarios, and the capacity for long-term developer ecosystem operations. This allows for co-evolution of the model and the harness, which community projects cannot achieve.
