Beyond the Model Lies the Harness: Deepseek Enters the Arena, Why Has the Main Battlefield of China's AI Competition Shifted?

marsbit | Published on 2026-06-22 | Last updated on 2026-06-22

Abstract

In mid-to-late May 2026, Deepseek internally established a new Harness team focused on code agent products, internally benchmarked against Anthropic's Claude Code. This move, marked by the formula "Model + Harness = Agent" in their job postings, signals a major shift in China's AI competition: the main battlefield is transitioning from developing large models to building toolchains and achieving workplace integration. Deepseek's direct involvement in Harness development aims to secure control over interface design and training data feedback loops, moving beyond open-sourcing powerful models. Harness, the runtime infrastructure for AI agents, handles everything beyond model reasoning—task orchestration, tool calling, context management, safety checks, and error recovery. It is crucial because agent products are not just outputs of model capability but also training grounds for it. Real-world task failures recorded by Harness can feed back into model training, creating a flywheel effect. Engineering Harness is more critical than optimizing prompts, as poor context management or error handling can drastically reduce agent success rates in multi-step, real-world scenarios. This shift is not isolated. Other major Chinese tech companies are also pursuing differentiated toolchain strategies. Tencent leverages its enterprise ecosystem (WeChat Work, Tencent Cloud) to build connectors for organizational-level AI collaboration and complex task delivery. Alibaba focuses on lowering aut...

In late May 2026, Deepseek formed a new in-house Harness team to build a code agent product, with Anthropic's Claude Code as its internal benchmark. Cui Tianyi, a former star quantitative engineer at Jane Street, joined the team in March, and senior researcher Chen Deli has publicly confirmed the effort and is leading recruitment. Deepseek's job description states the formula outright: 'Model + Harness = Agent'. As the capabilities of foundational large models gradually converge, the era of competing purely on parameter counts is fading. Deepseek's decision to build its own toolchain team marks a shift in the main battlefield of China's AI competition from 'refining large models' to 'building toolchains and integrating with office productivity'.

Why is Deepseek Building Its Own Harness?

For a long time, developers mainly expected Deepseek to open-source ever more powerful base models. But strong coding capability alone does not make a model a productivity tool that developers adopt. What truly changes workflows is not code answers in a chat box, but engineering agents that can enter a terminal, understand a project, read and write files, run commands, and fix bugs. Before this official move, the developer community had already built various open-source terminal Agents on top of Deepseek models. By forming the Harness team now, Deepseek aims to take control of interface design and close the training data loop, folding the paths the community has explored into its official core product.

To understand this strategic intent, one must first clarify what 'Harness' means, since the term may be unfamiliar to non-technical readers. In Deepseek's formula, the model handles reasoning and the Harness handles everything else. The word originally refers to horse tack or a safety harness; in the AI field it has been extended to mean the 'runtime infrastructure' of an Agent.

As a more accessible analogy, think of the large model as the 'brain' of a highly capable employee, while the Harness is that employee's 'job description, KPI evaluation criteria, safety guardrails, and toolbox'. It is neither 'scaffolding' assembled before runtime nor a 'framework' that supplies building blocks, but a continuously running system. It orchestrates the execution loop, dispatches tool calls, manages context, performs security checks, and handles error recovery and state persistence. The large model itself is stateless and cannot interact with its environment: it can only receive text and output text. The Harness compensates for these limitations, enabling the model to interact with the external world and execute concrete tasks.
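The division of labor described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not Deepseek's design: the `Harness` class, the `CALL <tool> <argument>` and `FINAL:` reply formats, and the `scripted_model` stand-in are all hypothetical names invented for the example. The point it shows is that the model is a stateless text-to-text function, while the loop, the tools, the context, and the error handling all live in the harness.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Toy harness: owns the loop, the tools, and the context."""
    model: Callable[[str], str]              # stateless: prompt text in, reply text out
    tools: dict[str, Callable[[str], str]]   # tool name -> function of one string argument
    max_steps: int = 10
    context: list[str] = field(default_factory=list)  # all state is kept here

    def run(self, task: str) -> str:
        self.context.append(f"TASK: {task}")
        for _ in range(self.max_steps):
            reply = self.model("\n".join(self.context))
            self.context.append(f"MODEL: {reply}")
            if reply.startswith("FINAL:"):
                return reply[len("FINAL:"):].strip()
            # Hypothetical wire format: "CALL <tool> <argument>"
            parts = reply.split(" ", 2)
            if len(parts) == 3 and parts[0] == "CALL" and parts[1] in self.tools:
                try:
                    result = self.tools[parts[1]](parts[2])
                except Exception as exc:     # error recovery: report the failure, keep going
                    result = f"ERROR: {exc}"
            else:
                result = "ERROR: unknown or malformed tool call"
            self.context.append(f"TOOL: {result}")
        return "ABORTED: step budget exhausted"


def scripted_model(prompt: str) -> str:
    """Stand-in for a real model: requests one tool call, then finishes."""
    if "TOOL:" not in prompt:
        return "CALL upper hello"
    tool_lines = [line for line in prompt.splitlines() if line.startswith("TOOL:")]
    return "FINAL: " + tool_lines[-1][len("TOOL: "):]


harness = Harness(model=scripted_model, tools={"upper": str.upper})
answer = harness.run("shout hello")
print(answer)
```

Swapping `scripted_model` for a call to a real model would not change the loop, which is the sense in which the harness, rather than the model, defines how the agent behaves at runtime.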

Why must foundational model companies own this runtime themselves? The core reason is that Agent products are not just outlets for model capability but also training grounds for it. Deepseek's job description emphasizes 'achieving co-evolution of the model and Harness'. In complex real-world tasks, models fail in various ways because of environmental constraints or tool exceptions. Recording these failure trajectories through the Harness lets them feed back into model training, creating a flywheel effect. If the Harness is left to the community, model providers risk losing this application-layer feedback data and becoming mere providers of compute and weights.

From an engineering perspective, optimizing the Harness is more critical to Agent success than merely optimizing prompts. According to technical experts, in Agent runtime, tool outputs constitute 67.6% of the content the Agent actually sees in its context, while system prompts account for only 3.4%. This means most of the model's 'view' is occupied by tool call results. If the Harness mishandles tool output formatting or fails to compress redundant information effectively, the model suffers from 'context rot', causing subsequent reasoning quality to plummet.

More critical is the compound error problem. An Agent process with 10 steps, each 99% reliable, has an end-to-end success rate of about 90%. When task complexity rises to 50 steps, the success rate plummets to around 60%. In real-world scenarios like codebase maintenance or enterprise office automation, continuous operations spanning dozens of steps are common. Here, even the strongest model reasoning cannot compensate for the cumulative probability loss. Only through error handling and recovery mechanisms within the Harness can retries or path corrections occur upon step failures. This is the engineering value of Harness and precisely why Deepseek must enter this arena directly.
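The arithmetic behind these figures is simple compounding, and a short Python sketch makes it easy to check. It assumes steps fail independently, which real agent runs may not satisfy; the function names and the single-retry scenario are illustrative assumptions, not figures from the article.

```python
def end_to_end_success(step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return step_reliability ** steps


def with_retries(step_reliability: float, retries: int) -> float:
    """A step fails only if the first attempt and every retry fail."""
    return 1 - (1 - step_reliability) ** (retries + 1)


p = 0.99
ten_steps = end_to_end_success(p, 10)      # about 0.904
fifty_steps = end_to_end_success(p, 50)    # about 0.605

# Hypothetical scenario: the harness retries each failed step once.
p_retry = with_retries(p, retries=1)       # 0.9999 per step
fifty_with_retry = end_to_end_success(p_retry, 50)  # about 0.995

print(round(ten_steps, 3), round(fifty_steps, 3), round(fifty_with_retry, 3))
```

Under these assumptions, a single retry per step lifts the 50-step success rate from roughly 60% to roughly 99.5%, which illustrates why recovery logic in the harness can matter more than a marginally stronger model.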

Tencent Makes Connectors, Alibaba Makes Frontend Inroads: Big Tech's Divergent Toolchain Paths

Deepseek's shift is not an isolated case. According to industry media, strengthening Agent capabilities has become a key development direction for domestic foundational large models in 2026. Foundational models are gradually becoming 'utilities', which moves the main battlefield of competition to the application layer. Other domestic tech giants are also differentiating through toolchains, but with distinct approaches that reflect their respective ecosystems and target users.

In June 2026, Tencent played a new card in enterprise Agents by launching WorkBuddy Enterprise Edition. It is positioned as a desktop workbench of intelligent agents for all workplace scenarios, with the focus moving from individual efficiency to organizational collaboration. WorkBuddy Enterprise Edition supports multiple agents running in parallel and Connector integration with business systems, aiming to become the unified entry point for AI in the office. Tencent's positioning builds on its vast WeCom (Enterprise WeChat) and Tencent Cloud ecosystem. For large enterprises, the pain point in AI office automation is not how polished a single-point tool feels, but whether it can integrate with siloed internal office systems. By building connectors, Tencent lets Agents directly orchestrate enterprise data and workflows, focusing on organization-level collaboration and complex task delivery. The strength of this path is its high barrier to entry: once an Agent is integrated into core business processes, switching costs are immense. The challenge is that it requires robust enterprise service capabilities and customized support.

Alibaba has taken a different path, choosing to lower automation barriers on the web frontend. Alibaba open-sourced the purely frontend, in-browser GUI Agent framework, PageAgent. This framework requires no backend deployment; a single line of code allows any website to integrate AI operator capabilities. Alibaba's positioning logic is empowering web developers, instantly transforming any webpage into an AI-native application. Given the reality that many legacy enterprise systems lack API interfaces, achieving automation through frontend DOM manipulation is a pragmatic, disruptive path. This approach's advantage is its lightweight, easy integration nature, enabling rapid coverage of a vast long tail of websites. However, frequent changes to frontend DOM structures pose stability challenges, demanding higher error recovery capabilities from the Harness.

Taken together, these companies are no longer competing solely on model benchmarks but are building toolchains around their own ecosystem strengths. Tencent focuses on connectors, Alibaba on frontend penetration, while Deepseek starts with the most critical pain point for developers: code engineering scenarios. This divergence indicates that China's AI industry has recognized there is no perfect, universal Agent, only vertical solutions honed through robust Harness engineering for specific scenarios. For enterprise procurement, choosing a toolchain essentially means choosing an automation path: deep integration with an office ecosystem, flexible embedding into existing web systems, or empowering developer engineering workflows.

Viktor's $20M ARR Proof: Enterprises Will Pay for Autonomous Execution

The maturation of toolchains is changing the role AI plays in the office. The original Copilot logic is 'draft and wait for a human to finish': AI generates copy or code, and the final step requires a human to modify and execute it. In this mode, AI is merely an efficiency tool, not a true replacement for labor. Employees must constantly monitor AI output to verify and implement it, which can actually increase cognitive load.

Overseas markets already show clear signals of a paradigm shift. As a reference point for global trends, Poland-based AI office automation company Viktor, positioned as an AI employee inside Slack, reached $20 million in Annual Recurring Revenue (ARR) without a sales team while serving 30,000 companies, and secured a $75 million Series A funding round in May 2026. Viktor's model represents the end state of the new AI employee: it has its own cloud computer, can operate continuously for long periods, keeps a firm grip on a massive context, and delivers results directly.

Viktor is positioned as a Tier 3 AI Coworker, meaning it handles not simple Q&A but complex tasks such as marketing audits, ad campaign management, and lead research, all of which require multi-step, long-running operations. Enterprises show a strong willingness to pay for this type of AI, which requires no final human confirmation and can operate continuously for long periods. This surge in commercial traction suggests that the value anchor of office automation has shifted from 'assistive generation' to 'autonomous execution'.

Domestic manufacturers' focus on Harness and Agent toolchains aims to capture this trend. When the Harness provides sufficient safety rails, state persistence, and error recovery capabilities, AI can evolve from an 'intern' requiring constant human supervision to an 'outsourcing partner' capable of independently delivering work outcomes. Enterprise procurement focus will shift from model parameter size to whether the Agent can run stably for 8 hours without crashing, automatically handle API rate limits, and adapt to webpage structure changes. For developers, this means the focus of building AI applications shifts from 'how to write good prompts' to 'how to design a robust runtime environment'.
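One of the capabilities named above, automatically handling API rate limits, is commonly implemented as retry with exponential backoff and jitter. The sketch below shows that general technique in Python; it is not taken from any vendor's product. The `RateLimited` exception and `call_with_backoff` helper are hypothetical names, and the `sleep` parameter is injectable only so the example runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by a tool wrapper when an API reports too many requests."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn on RateLimited, doubling the base wait each time and adding jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise                        # out of attempts: surface the error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter avoids synchronized retries
    raise ValueError("max_attempts must be at least 1")


# Demo: a tool that is rate-limited twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


waits: list[float] = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, attempts["count"], len(waits))
```

In a long-running agent, a wrapper of this kind would sit between the harness and each external tool so that a temporary throttle becomes a delay rather than a failed task.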

Token Explosion and the Engineering Barriers of 'Thick Frameworks'

As competition shifts to toolchains, the challenges faced by enterprises and developers in practical implementation haven't decreased but have become more focused on the engineering layer.

First and foremost is the Token explosion problem. Agents running for extended durations, in their 'think, act, feedback' loops, are prone to rapidly inflating context due to redundant tool outputs. This is widely discussed in developer communities, as it not only drives up inference costs but also causes model attention to scatter, drastically increasing task failure rates. For example, in a web scraping task, if the Harness feeds the entire webpage's HTML source code unchanged into the context, the model quickly gets lost in redundant information, forgetting the original task objective. Therefore, the Harness's context compression and memory management capabilities become a core consideration for enterprise procurement. A superior Harness must know which historical information can be discarded and which tool return results need summarization. This tests deep engineering architectural capabilities, not the model's inherent intelligence.
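The web scraping example can be made concrete with a small Python sketch of two common tactics: shrinking a tool result before it enters the context, and pruning old history while keeping the task statement. Both functions are hypothetical illustrations, and the regex-based tag stripping is a deliberately crude assumption; a production harness would more likely use a real HTML parser and model-generated summaries.

```python
import re


def compact_tool_output(raw: str, max_chars: int = 2000) -> str:
    """Strip markup and cap length before a tool result enters the context."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", raw)  # drop script/style bodies
    text = re.sub(r"(?s)<[^>]+>", " ", text)                     # drop remaining tags
    text = re.sub(r"\s+", " ", text).strip()                     # collapse whitespace
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    omitted = len(text) - max_chars
    # Keep the start and the end, and say explicitly how much was cut.
    return f"{text[:head]} [... {omitted} chars omitted ...] {text[len(text) - tail:]}"


def prune_history(messages: list[str], keep_last: int = 6) -> list[str]:
    """Keep the task statement (first message) plus the most recent messages."""
    if len(messages) <= keep_last + 1:
        return list(messages)
    dropped = len(messages) - keep_last - 1
    recent = messages[len(messages) - keep_last:]
    return [messages[0], f"[{dropped} earlier messages dropped]"] + recent


page = ("<html><head><style>p{color:red}</style></head>"
        "<body><p>Price: 42</p><script>track()</script></body></html>")
compacted = compact_tool_output(page)
pruned = prune_history(["task", "a", "b", "c", "d"], keep_last=2)
print(compacted)
print(pruned)
```

Always keeping the first message is one simple guard against the failure described above, in which the model loses sight of the original task objective.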

This also heightens developer wariness towards 'thin-shell' frameworks. If the Harness launched by a large model provider is merely a simple API wrapper offering basic chat windows and tool-calling interfaces, it will lack practical debugging value. The fragility of production environments demands Harness features like sandbox isolation, fine-grained permission control, and checkpoint/restart—characteristics of a 'thick framework'. Only a runtime with solid engineering barriers can truly meet the stability needs of enterprise-grade applications. For instance, in code execution scenarios, the Harness must provide a safe sandbox environment to prevent malicious code generated by the model from harming the host system. For long-running tasks, it must support checkpoint/restart to avoid restarting entire tasks due to network fluctuations.
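Checkpoint/restart, one of the 'thick framework' features listed above, can be illustrated with a minimal Python sketch. It assumes task state is JSON-serializable and that each step is a function mutating a shared state dictionary; `run_with_checkpoints` and the demo steps are hypothetical names for this example, not part of any product mentioned in the article.

```python
import json
import os
import tempfile
from typing import Callable


def run_with_checkpoints(steps: list[Callable[[dict], None]], path: str) -> dict:
    """Run steps in order, saving progress after each so a restart resumes where it stopped."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            saved = json.load(f)
    else:
        saved = {"next_step": 0, "state": {}}
    for index in range(saved["next_step"], len(steps)):
        steps[index](saved["state"])          # may raise; the file keeps earlier progress
        saved["next_step"] = index + 1
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(saved, f)
        os.replace(tmp, path)                 # swap in the new checkpoint in one step
    return saved["state"]


# Demo: the second step fails once because of a simulated network fluctuation.
calls: list[str] = []
network_up = {"ok": False}


def fetch(state: dict) -> None:
    calls.append("fetch")
    state["data"] = [1, 2, 3]


def upload(state: dict) -> None:
    calls.append("upload")
    if not network_up["ok"]:
        raise ConnectionError("network fluctuation")
    state["uploaded"] = len(state["data"])


checkpoint = os.path.join(tempfile.mkdtemp(), "task.json")
try:
    run_with_checkpoints([fetch, upload], checkpoint)
except ConnectionError:
    network_up["ok"] = True                   # simulate the network coming back
final_state = run_with_checkpoints([fetch, upload], checkpoint)
print(calls, final_state)
```

On the second run only `upload` executes again; `fetch` is skipped because its result was already persisted, which is the behavior the paragraph above describes as avoiding a restart of the entire task.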

Furthermore, geopolitical factors create a significant market vacuum for domestic Harness solutions. Top overseas engineering agent products like Claude Code restrict access for mainland China and Chinese-affiliated enterprises. Unable to use these top tools directly, domestic developers can only seek domestic alternatives. Deepseek forming its Harness team is not just following a technical trend but also responding to this vast replacement demand.

For enterprises and developers, understanding the value of Harness means when selecting AI products, they won't be dazzled by flashy demo conversations but will instead probe into its error recovery mechanisms, context management strategies, and whether it can truly integrate into existing workflows. In the toolchain competition stage, enterprises should prioritize evaluating vendors' engineering delivery capabilities and ecosystem compatibility over simply comparing model benchmarks. Developers should focus on the Harness framework's openness and the completeness of its debugging toolchain, choosing platforms that offer deeply controllable runtimes.

Related Questions

Q: What does the term 'Harness' refer to in the context of AI agents, according to the Deepseek article?

A: In the context of AI agents, the article defines 'Harness' as the "runtime infrastructure" that complements the core model. It is likened to a job description, KPI, safety protocols, and toolkit for a highly intelligent worker (the AI model). It manages the execution loop, tool calls, context, security, error recovery, and state persistence, enabling the stateless model to interact with the external world.

Q: Why did Deepseek decide to build its own Harness team for code agents, as explained in the article?

A: Deepseek built its own Harness team to master the interface design and establish a training data feedback loop. As model capabilities converge, the competition shifts to toolchains. An official Harness allows Deepseek to control the product, collect crucial failure data from real tasks to improve the model, and avoid becoming a mere model provider while the community builds the critical application layer.

Q: How do the toolchain strategies of Tencent and Alibaba differ from each other, based on the article's analysis?

A: Their strategies differ based on their respective ecosystems. Tencent's WorkBuddy Enterprise Edition focuses on being a connector and unified AI office entry point, leveraging its WeCom (Enterprise WeChat) and Tencent Cloud ecosystem to integrate with and orchestrate complex internal business systems for organizational tasks. Alibaba's PageAgent is a lightweight frontend framework that enables AI automation directly within web browsers by manipulating the DOM, aiming to lower the barrier for web-based automation without backend APIs.

Q: What key shift in the value of office AI does the success of the company Viktor represent, according to the article?

A: The success of Viktor, with its $20M ARR, represents a shift in the value proposition of office AI from 'assisted generation' to 'autonomous execution.' Instead of just drafting content for humans to finalize, AI like Viktor acts as a Tier 3 coworker that can handle multi-step, long-running complex tasks (e.g., marketing audits) independently and deliver final results without constant human supervision or final approval.

Q: What are the main engineering challenges highlighted for running long-lived AI agents, and why is a 'thick framework' Harness important?

A: The main engineering challenges are token explosion from redundant tool outputs cluttering context and the cumulative probability of failure in multi-step tasks. A 'thick framework' Harness is crucial because it provides essential features like context compression, memory management, sandbox isolation, fine-grained permission control, and checkpoint recovery. These features, which go beyond simple API wrappers, are needed to ensure stability, security, and cost-effectiveness in production environments.
