Insider: DeepSeek Is Forming a Harness Team to Benchmark Against Claude Code

ChainCatcher (链捕手) | Published 2026-05-22 | Updated 2026-05-22

Summary

DeepSeek is reportedly forming a dedicated "Harness" team to develop a code agent product, directly targeting Anthropic's Claude Code. According to internal sources and a social media post by DeepSeek senior researcher Chen Deli, the team will focus on building "DeepSeek Code Harness." The initiative involves recruiting for key roles like Harness Product Manager and Harness R&D Engineer in Beijing. DeepSeek defines its approach with the core formula: Model + Harness = Agent. This signifies a strategic shift from merely offering a powerful coding model to creating the essential middleware that connects the model to real-world developer workflows. The Harness will handle context management, tool calls, task planning, file operations, code editing, terminal execution, and feedback loops. The move highlights that competition in AI-assisted coding is evolving from pure model capability to ownership of the developer workflow entry point. While DeepSeek has strong foundational models (e.g., DeepSeek-Coder series), it has lacked an integrated, productized agent experience. The popularity of a community-built project, DeepSeek-TUI, demonstrated developer demand for a Claude Code-like tool using DeepSeek's models, but also revealed the limitations of unofficial solutions. By building its official Code Harness, DeepSeek aims to leverage its unique advantages: direct collaboration with its model training team, control over APIs and design, the ability to create a data feedback loop fo...

Author | Wang Bo, Jiazi Guangnian

According to information obtained by "Jiazi Guangnian" from a source close to DeepSeek, the company is forming a new Harness team focused on a code agent product, which it benchmarks internally against Anthropic's Claude Code.

Senior DeepSeek researcher Chen Deli recently confirmed this on social media, stating, "DeepSeek is organizing a new Harness team to work on Harness-related products and research," and explicitly said, "In simple terms, it's to benchmark against Claude Code and create DeepSeek Code Harness."

This is not an ordinary recruitment.

Recruitment information shows DeepSeek is opening two key positions: Harness Product Manager and Harness R&D Engineer, currently limited to Beijing. DeepSeek's Beijing office is located in the R&F Centre in Haidian District, close to Peking University and Tsinghua University. Officially, it's located in the "Centennial Beijing-Zhangjiakou AI Innovation Belt," while colloquially, it's also in the recently popular "Wang Huiwen Area."

Core Definition: Model + Harness = Agent

In the job description, a core formula is placed in the most prominent position:

Model + Harness = Agent

This statement can almost be seen as DeepSeek's internal definition for the next phase of productization: the model itself is merely the foundation for an Agent. Everything outside the model—context management, tool calling, task planning, file reading/writing, code modification, terminal execution, feedback collection, and evaluation loops—is the critical part that enables the Agent to truly integrate into workflows.
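The formula can be illustrated with a minimal sketch. It is not DeepSeek's design: the `Harness` class, the `"tool:argument"` action format, the scripted stand-in model, and the in-memory file system are all hypothetical, chosen only to show how context management, tool calling, and a feedback loop sit outside the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool signature: a tool takes one string argument and returns text.
Tool = Callable[[str], str]


@dataclass
class Harness:
    """Everything outside the model: context, tools, and the step loop."""
    model: Callable[[List[str]], str]  # stand-in for a model call (assumption)
    tools: Dict[str, Tool]
    max_steps: int = 5
    context: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.context.append(f"task: {task}")
        for _ in range(self.max_steps):
            # The model only decides; the harness acts on its decision.
            action = self.model(self.context)
            name, _, arg = action.partition(":")
            if name == "done":
                return arg
            tool = self.tools.get(name)
            result = tool(arg) if tool else f"unknown tool {name!r}"
            # Feed the tool result back so the next model call can see it.
            self.context.append(f"{action} -> {result}")
        return "step limit reached"


def scripted_model(context: List[str]) -> str:
    """A fake model: read one file first, then declare the task finished."""
    if len(context) == 1:
        return "read_file:main.py"
    return "done:inspected main.py"


# In-memory stand-in for a project directory (assumption).
fake_fs = {"main.py": "print('hi')"}
harness = Harness(
    model=scripted_model,
    tools={"read_file": lambda path: fake_fs.get(path, "not found")},
)
answer = harness.run("summarize main.py")
```

In this toy version the "Agent" is simply the pairing of `scripted_model` with `Harness`; swapping in a real model would leave the loop, the tool registry, and the context handling unchanged, which is the sense in which that work belongs to the Harness.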

The job description further states: "We are transforming DeepSeek's cutting-edge model capabilities into leading Agent products. All work beyond the model itself falls under the scope of Harness." Additionally, the role will participate in the entire process of developing "DeepSeek desktop Agent products" and "define DeepSeek's understanding of Harness."

Jiazi Guangnian's analysis: DeepSeek does not simply want to build a code assistant plugin; it is filling in the middle layer that connects the model to real-world workflows.

Over the past year, the industry has shown two things: strong coding ability does not guarantee that developers will actually adopt a model, and a model's ability to write code does not mean it can reliably complete an engineering task.

What truly changes developers' workflow is not the Claude model alone, but Claude Code; not the GPT model alone, but Codex; not just a code response in a chatbox, but an engineering agent capable of entering the terminal, understanding projects, reading/writing files, running commands, fixing errors, managing Git, and calling tools.

DeepSeek's past strength was its model. Now, it is beginning to add that layer of "hands" on top of the model.

I. Why DeepSeek Emphasizes Harness

In traditional AI product contexts, "code assistant" usually refers to two types of products: one is a completion plugin in the IDE, and the other is code Q&A in a chatbox.

But the term repeatedly appearing in this DeepSeek recruitment is not Code Assistant, but Harness.

Harness originally referred to a "test harness" or "execution framework" in an engineering context. In the Agent context, it is closer to an external system that enables the model to truly act. The model is responsible for understanding, reasoning, and generation; Harness is responsible for connecting these capabilities to the real environment.

The job description mentions that this role needs to plan the DeepSeek Harness product roadmap, connect researchers, engineers, the open-source community, and end-users, and communicate deeply with researchers from the model training team to achieve co-evolution between the model and Harness.

This is a crucial point.

It indicates that DeepSeek's goal is not just to wrap a shell around existing models but to make the Agent product itself a part of the model's evolution. In the past, the common product logic for large model companies was: the research team trains a model first, and then the product team builds applications based on the model's capabilities. However, in the Agent era, this sequence is being disrupted. The product is no longer just an outlet for model capabilities but a training ground for those capabilities.

When a code Agent fails on a real project, the cause might not be product interaction but the way the model compresses long context; it might not be the tool-calling pipeline but instability in the model's task decomposition strategy; it might not be insufficient coding ability but a failure to keep track of engineering constraints, test feedback, and user intent.

Therefore, the value of the Harness team is not just "building a product" but turning real development tasks into a feedback source for continuous model evolution.
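As a rough illustration of what "a feedback source" could mean in practice, the sketch below logs task outcomes and tallies where failures originate. The `TaskRecord` fields and the stage labels are hypothetical; they merely mirror the failure causes named above and say nothing about how DeepSeek actually collects data.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    """One finished agent task, as a harness might log it (hypothetical schema)."""
    task: str
    steps: int
    tests_passed: bool
    failure_stage: Optional[str] = None  # e.g. "context_compression"


def to_jsonl(records: List[TaskRecord]) -> str:
    """Serialize records one JSON object per line, a common log format."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def failure_counts(records: List[TaskRecord]) -> Dict[str, int]:
    """Count failed tasks by the stage blamed for the failure."""
    failed = (r.failure_stage for r in records if not r.tests_passed)
    return dict(Counter(stage for stage in failed if stage is not None))


# Made-up example records, for illustration only.
records = [
    TaskRecord("fix flaky test", steps=7, tests_passed=True),
    TaskRecord("refactor parser", steps=12, tests_passed=False,
               failure_stage="context_compression"),
    TaskRecord("add CLI flag", steps=9, tests_passed=False,
               failure_stage="task_decomposition"),
    TaskRecord("migrate config", steps=15, tests_passed=False,
               failure_stage="context_compression"),
]
counts = failure_counts(records)
log_text = to_jsonl(records)
```

A tally like `counts` is the kind of signal that only a first-party harness can route back to a model training team; a community project can collect the same log but has no channel for acting on it.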

II. Why Must DeepSeek Fill the Code Harness Gap?

DeepSeek placed early bets on coding capabilities. From DeepSeek-Coder to DeepSeek-Coder-V2, DeepSeek has continuously increased investment in code models, with improvements in supported languages, context length, and complex task capabilities. Its problem is not a lack of coding capability but that this capability has largely remained at the model layer and has not yet become a high-frequency product in developers' daily workflows.

The popularity of Claude Code proves one thing: competition in AI coding is shifting from model capability to the entry points of developer workflows.

This is the lesson DeepSeek now has to catch up on. More subtly, before DeepSeek officially stepped in, the developer community had already built a "DeepSeek version of Claude Code" on its behalf.

An open-source project called DeepSeek-TUI previously gained popularity in the developer community. It is a coding agent running in the terminal that can read/write files, execute Shell commands, search the web, manage Git, and coordinate sub-agents through a TUI interface.

The popularity of DeepSeek-TUI highlights two issues:

  1. Developer mindshare is already there: developers already see the DeepSeek model as a viable foundation for a code Agent. Otherwise, the community would not have spontaneously built Claude Code-like products around it.

  2. The official layer is missing: what DeepSeek lacks is not attention to its models but an official Harness.

For developers, the appeal of DeepSeek-TUI is straightforward: low cost, domestic availability, long context, and relatively low deployment barriers. Many domestic developers aren't unwilling to use Claude Code but are constrained by price, access stability, account systems, and enterprise compliance.

However, community projects also have inherent limitations:

  • No matter how active a third-party open-source project is, it's difficult to truly grasp the rhythm of internal model capability evolution;

  • It can adapt to the APIs but cannot in turn shape how the model is trained;

  • It can work on prompts, toolchains, and interaction optimization, but it's difficult to systematically inject massive real-task feedback into model improvements.

This is precisely the significance of an official Harness.

By building its own Code Harness, DeepSeek gains several advantages that community projects lack: collaboration with the model team, design authority over interfaces, closed-loop training data, access to internal real task scenarios, and the ability to run a developer ecosystem over the long term.

The open-source community has already proven the path: developers do need a DeepSeek version of Claude Code. Now DeepSeek is taking that path back and turning it into its own core product.

That DeepSeek has officially started recruiting means it is finally preparing to step onto the field.

Chen Deli mentioned at the 2025 World Internet Conference Wuzhen Summit last November: "One of our company's core advantages is long-termism, adhering to the main line of frontier AI breakthroughs. In this process, we have also abandoned many side paths, not engaging in those short and quick side projects."

After the model war, the real Agent war has begun. This time, DeepSeek is filling in the most critical layer between the model and action: the Harness.

DeepSeek is equipping its model with a pair of hands.

Related Questions

Q: What is the main objective of DeepSeek's new Harness team, according to the article?

A: The main objective is to develop a code agent product, benchmarked internally against Anthropic's Claude Code. The focus is on building the "harness," the external system that enables the model to interact with real-world environments, tools, and workflows, moving beyond the model's coding capabilities alone.

Q: What core formula is highlighted in DeepSeek's job description, and what does it signify?

A: The core formula is "Model + Harness = Agent." It signifies DeepSeek's internal definition for its productization path: the model is just the foundation of an agent. The crucial part that allows an agent to integrate into real workflows is the harness, which includes context management, tool calling, task planning, file I/O, code modification, terminal execution, feedback collection, and evaluation loops.

Q: Why does the article suggest DeepSeek's move to build Code Harness is crucial, beyond just having strong coding models?

A: Because competition in AI-assisted coding is shifting from pure model capability to becoming the entry point into developers' workflows. Products like Claude Code have shown that developers adopt tools that integrate deeply into their workflow (terminal, project understanding, Git, etc.), not just models that can generate code in a chat interface. DeepSeek needs to bridge this gap to turn its model strength into a high-frequency product.

Q: What is the significance of the community project "DeepSeek-TUI" mentioned in the article?

A: DeepSeek-TUI is a third-party, open-source terminal-based coding agent that gained popularity. Its significance is twofold: 1) It shows that developers already perceive DeepSeek's model as a solid foundation for a Claude Code-like agent, demonstrating mature developer mindshare. 2) It highlights a gap: the lack of an official, first-party harness from DeepSeek itself, which the company now aims to fill with its own team.

Q: What key advantage does an official DeepSeek Harness team have over community projects, according to the article?

A: An official team has several key advantages: direct collaboration with the model research team, control over API/interface design, the ability to create a closed-loop system for training data and feedback from real tasks, access to internal real-world scenarios, and the capacity for long-term developer ecosystem operations. This allows for co-evolution of the model and the harness, which community projects cannot achieve.
