Beyond the Model Lies the Harness: Deepseek Enters the Arena, Why Has the Main Battlefield of China's AI Competition Shifted?

marsbitPublished on 2026-06-22Last updated on 2026-06-22

Abstract

In mid-to-late May 2026, Deepseek internally established a new Harness team focused on code agent products, internally benchmarked against Anthropic's Claude Code. This move, marked by the formula "Model + Harness = Agent" in their job postings, signals a major shift in China's AI competition: the main battlefield is transitioning from developing large models to building toolchains and achieving workplace integration. Deepseek's direct involvement in Harness development aims to secure control over interface design and training data feedback loops, moving beyond open-sourcing powerful models. Harness, the runtime infrastructure for AI agents, handles everything beyond model reasoning—task orchestration, tool calling, context management, safety checks, and error recovery. It is crucial because agent products are not just outputs of model capability but also training grounds for it. Real-world task failures recorded by Harness can feed back into model training, creating a flywheel effect. Engineering Harness is more critical than optimizing prompts, as poor context management or error handling can drastically reduce agent success rates in multi-step, real-world scenarios. This shift is not isolated. Other major Chinese tech companies are also pursuing differentiated toolchain strategies. Tencent leverages its enterprise ecosystem (WeChat Work, Tencent Cloud) to build connectors for organizational-level AI collaboration and complex task delivery. Alibaba focuses on lowering aut...

In late May 2026, Deepseek internally formed a new Harness team, focused on a code agent product, internally benchmarking against Anthropic's Claude Code. Cui Tianyi, a former star quantitative engineer from Jane Street, joined the team in March, with senior researcher Chen Deli publicly confirming and leading the recruitment. Deepseek's job description clearly states a formula: 'Model + Harness = Agent'. As the capabilities of foundational large models gradually converge, the era of simply competing on parameters is fading. Deepseek's direct entry in building a toolchain team marks a shift in the main battlefield of China's AI competition from 'refining large models' to 'building toolchains and office productivity integration'.

Why is Deepseek Building Its Own Harness?

For a long time, developers' expectations for Deepseek focused on open-sourcing more powerful base models. But strong coding capability doesn't mean developers will adopt it as a productivity tool. What truly changes workflows isn't code answers in a chatbox, but engineering agents that can enter terminals, understand projects, read/write files, run commands, and fix bugs. Before the official move, the developer community had already built various open-source terminal Agents based on Deepseek models. By forming the Harness team now, Deepseek aims to control interface design and training data loop closure, integrating community-developed pathways into official core products.

To understand this strategic intent, one must first clarify what 'Harness' is. For non-technical readers, the term 'Harness' might be unfamiliar. In Deepseek's formula, the model handles reasoning, and the Harness handles everything else. 'Harness' originally means 'horse tack' or 'safety belt' in engineering, extended in the AI field to refer to the 'runtime infrastructure' of an Agent.

For a more accessible analogy, consider a large model as the 'brain' and 'intelligence' of a highly capable employee, while the Harness is that employee's 'job description, KPI evaluation criteria, office blast walls, and toolbox'. It's not a 'scaffolding' assembled before runtime, nor a 'framework' providing building blocks, but a continuously running system. It orchestrates execution loops, dispatches tool calls, manages context, performs security checks, and handles error recovery and state persistence. The large model itself is stateless and lacks environmental interaction capability—it can only receive text input and output text. The Harness compensates for these flaws, enabling the model to truly interact with the external world and execute specific tasks.

Why must foundational model companies master this runtime themselves? The core reason is that Agent products are not just outlets for model capabilities but also training grounds. Deepseek's JD emphasizes 'achieving co-evolution of the model and Harness'. In real-world complex tasks, models encounter various failures due to environmental constraints or tool exceptions. Recording these failure trajectories via the Harness can feed back into model training, creating a flywheel effect. If left to the community, model providers risk losing core application-layer data feedback, becoming mere compute and weight providers.

From an engineering perspective, optimizing the Harness is more critical to Agent success than merely optimizing prompts. According to technical experts, in Agent runtime, tool outputs constitute 67.6% of the content the Agent actually sees in its context, while system prompts account for only 3.4%. This means most of the model's 'view' is occupied by tool call results. If the Harness mishandles tool output formatting or fails to compress redundant information effectively, the model suffers from 'context rot', causing subsequent reasoning quality to plummet.

More critical is the compound error problem. An Agent process with 10 steps, each 99% reliable, has an end-to-end success rate of about 90%. When task complexity rises to 50 steps, the success rate plummets to around 60%. In real-world scenarios like codebase maintenance or enterprise office automation, continuous operations spanning dozens of steps are common. Here, even the strongest model reasoning cannot compensate for the cumulative probability loss. Only through error handling and recovery mechanisms within the Harness can retries or path corrections occur upon step failures. This is the engineering value of Harness and precisely why Deepseek must enter this arena directly.

Tencent Makes Connectors, Alibaba Makes Frontend Inroads: Big Tech's Divergent Toolchain Paths

Deepseek's shift is not an isolated case. According to industry media, strengthening Agent capabilities has become a key development direction for domestic foundational large models in 2026. Foundational models are gradually becoming 'utilities', shifting the competitive main battlefield to the application layer. Other domestic tech giants are also carving out differentiation through toolchains, but with distinct approaches, reflecting their respective ecosystem endowments and target user bases.

In June 2026, Tencent played its new card for enterprise Agents, launching WorkBuddy Enterprise Edition. Its core positioning is a full-scenario workplace intelligent agent desktop workbench, focusing on shifting from individual efficiency to organizational collaboration. WorkBuddy Enterprise Edition supports multi-agent parallelism and business system Connector integration, aiming to seize the unified AI office entry point. Tencent's positioning logic leverages its vast WeCom (Enterprise WeChat) and Tencent Cloud ecosystem. For large enterprises, the pain point in AI office automation isn't the ultimate experience of a single-point tool, but whether it can integrate with internal siloed office systems. By building connectors, Tencent enables Agents to directly orchestrate enterprise data and workflows, focusing on organization-level collaboration and complex task delivery. This path's strength lies in high barriers; once integrated into core business processes, switching costs are immense. The challenge is the need for robust enterprise service capabilities and customized support.

Alibaba has taken a different path, choosing to lower automation barriers on the web frontend. Alibaba open-sourced the purely frontend, in-browser GUI Agent framework, PageAgent. This framework requires no backend deployment; a single line of code allows any website to integrate AI operator capabilities. Alibaba's positioning logic is empowering web developers, instantly transforming any webpage into an AI-native application. Given the reality that many legacy enterprise systems lack API interfaces, achieving automation through frontend DOM manipulation is a pragmatic, disruptive path. This approach's advantage is its lightweight, easy integration nature, enabling rapid coverage of a vast long tail of websites. However, frequent changes to frontend DOM structures pose stability challenges, demanding higher error recovery capabilities from the Harness.

In contrast, companies are no longer solely competing on model benchmarks but building toolchains based on their unique ecosystem strengths. Tencent focuses on connectors, Alibaba on frontend penetration, while Deepseek starts with the most critical pain point for developers: code engineering scenarios. This divergence indicates that China's AI industry has recognized there is no perfect, universal Agent—only vertical solutions honed through robust Harness engineering for specific scenarios. For enterprise procurement, choosing a toolchain essentially means choosing an automation path: deep integration with an office ecosystem, flexible embedding into existing web systems, or empowering developer engineering workflows.

Viktor's $20M ARR Proof: Enterprises Will Pay for Autonomous Execution

The maturation of toolchains is changing the paradigm of AI's role in the office. The native Copilot logic is 'draft and wait for human completion'—AI generates copy or code, with the final step requiring human intervention for modification and execution. In this mode, AI is merely an efficiency tool, not a true labor replacement. Employees must constantly monitor AI output for verification and implementation, which actually increases cognitive load.

Overseas markets already show clear signals of a paradigm shift. As a reference point for global trends, Poland-based AI office automation company Viktor, positioned as an AI employee within Slack, achieved a $20 million Annual Recurring Revenue (ARR) without a sales team, serving 30,000 companies, and secured a $75 million Series A funding round in May 2026. Viktor's model represents the end state of new AI employees: possessing a cloud computer, capable of long-duration continuous operation, firmly grasping massive context, and delivering results directly.

Viktor is positioned as a Tier 3 AI Coworker, meaning it handles not simple Q&A but complex tasks like marketing audits, ad campaign management, lead research—requiring multi-step, long-running operations. Enterprises show strong willingness to pay for this type of AI that requires no final human confirmation and can operate continuously for long periods. The explosion of such commercial data proves the value anchor of office automation has shifted from 'assistive generation' to 'autonomous execution'.

Domestic manufacturers' focus on Harness and Agent toolchains aims to capture this trend. When the Harness provides sufficient safety rails, state persistence, and error recovery capabilities, AI can evolve from an 'intern' requiring constant human supervision to an 'outsourcing partner' capable of independently delivering work outcomes. Enterprise procurement focus will shift from model parameter size to whether the Agent can run stably for 8 hours without crashing, automatically handle API rate limits, and adapt to webpage structure changes. For developers, this means the focus of building AI applications shifts from 'how to write good prompts' to 'how to design a robust runtime environment'.

Token Explosion and the Engineering Barriers of 'Thick Frameworks'

As competition shifts to toolchains, the challenges faced by enterprises and developers in practical implementation haven't decreased but have become more focused on the engineering layer.

First and foremost is the Token explosion problem. Agents running for extended durations, in their 'think, act, feedback' loops, are prone to rapidly inflating context due to redundant tool outputs. This is widely discussed in developer communities, as it not only drives up inference costs but also causes model attention to scatter, drastically increasing task failure rates. For example, in a web scraping task, if the Harness feeds the entire webpage's HTML source code unchanged into the context, the model quickly gets lost in redundant information, forgetting the original task objective. Therefore, the Harness's context compression and memory management capabilities become a core consideration for enterprise procurement. A superior Harness must know which historical information can be discarded and which tool return results need summarization. This tests deep engineering architectural capabilities, not the model's inherent intelligence.

This also heightens developer wariness towards 'thin-shell' frameworks. If the Harness launched by a large model provider is merely a simple API wrapper offering basic chat windows and tool-calling interfaces, it will lack practical debugging value. The fragility of production environments demands Harness features like sandbox isolation, fine-grained permission control, and checkpoint/restart—characteristics of a 'thick framework'. Only a runtime with solid engineering barriers can truly meet the stability needs of enterprise-grade applications. For instance, in code execution scenarios, the Harness must provide a safe sandbox environment to prevent malicious code generated by the model from harming the host system. For long-running tasks, it must support checkpoint/restart to avoid restarting entire tasks due to network fluctuations.

Furthermore, geopolitical factors create a significant market vacuum for domestic Harness solutions. Top overseas engineering agent products like Claude Code restrict access for mainland China and Chinese-affiliated enterprises. Unable to use these top tools directly, domestic developers can only seek domestic alternatives. Deepseek forming its Harness team is not just following a technical trend but also responding to this vast replacement demand.

For enterprises and developers, understanding the value of Harness means when selecting AI products, they won't be dazzled by flashy demo conversations but will instead probe into its error recovery mechanisms, context management strategies, and whether it can truly integrate into existing workflows. In the toolchain competition stage, enterprises should prioritize evaluating vendors' engineering delivery capabilities and ecosystem compatibility over simply comparing model benchmarks. Developers should focus on the Harness framework's openness and the completeness of its debugging toolchain, choosing platforms that offer deeply controllable runtimes.

Trending Cryptos

Related Questions

QWhat does the term 'Harness' refer to in the context of AI agents, according to the Deepseek article?

AIn the context of AI agents, the article defines 'Harness' as the "runtime infrastructure" that complements the core model. It is likened to a job description, KPI, safety protocols, and toolkit for a highly intelligent worker (the AI model). It manages the execution loop, tool calls, context, security, error recovery, and state persistence, enabling the stateless model to interact with the external world.

QWhy did Deepseek decide to build its own Harness team for code agents, as explained in the article?

ADeepseek built its own Harness team to master the interface design and establish a training data feedback loop. As model capabilities converge, the competition shifts to toolchains. An official Harness allows Deepseek to control the product, collect crucial failure data from real tasks to improve the model, and avoid becoming a mere model provider while the community builds the critical application layer.

QHow do the toolchain strategies of Tencent and Alibaba differ from each other, based on the article's analysis?

ATheir strategies differ based on their respective ecosystems. Tencent's WorkBuddy Enterprise focuses on being a connector and unified AI office entry point, leveraging its Tencent Meeting and corporate WeChat ecosystem to integrate with and orchestrate complex internal business systems for organizational tasks. Alibaba's PageAgent is a lightweight, front-end framework that enables AI automation directly within web browsers by manipulating the DOM, aiming to lower the barrier for web-based automation without backend APIs.

QWhat key shift in the value of office AI does the success of the company Viktor represent, according to the article?

AThe success of Viktor, with its $20M ARR, represents a shift in the value proposition of office AI from 'assisted generation' to 'autonomous execution.' Instead of just drafting content for humans to finalize, AI like Viktor acts as a Tier 3 coworker that can handle multi-step, long-running complex tasks (e.g., marketing audits) independently and deliver final results without constant human supervision or final approval.

QWhat are the main engineering challenges highlighted for running long-lived AI agents, and why is a 'thick framework' Harness important?

AThe main engineering challenges are token explosion from redundant tool outputs cluttering context and the cumulative probability of failure in multi-step tasks. A 'thick framework' Harness is crucial because it provides essential features like context compression, memory management, sandbox isolation, fine-grained permission control, and checkpoint recovery. These features, which go beyond simple API wrappers, are needed to ensure stability, security, and cost-effectiveness in production environments.

Related Reads

Idle Macs Can Also Make Money? An Overview of Eigen Labs' Decentralized AI Inference Network Darkbloom

AI inference is becoming a crucial layer of internet infrastructure, yet it remains largely dependent on costly, capacity-limited centralized systems with potential security risks. Meanwhile, millions of powerful computers sit idle globally. Eigen Labs' Darkbloom network aims to utilize this idle capacity by enabling distributed AI inference on Mac computers, specifically those with Apple Silicon chips. Darkbloom's architecture consists of three components: users who send inference requests, a coordinator (operated by Eigen Labs) that routes these requests, and providers (Mac owners) whose machines run the models and return outputs without being able to see the request content. The system prioritizes privacy through a hardened provider process, software integrity checks, and hardware-supported attestation based on Apple's security architecture to ensure verifiable privacy. Economically, Darkbloom differs from traditional models. It leverages existing hardware, with marginal costs primarily driven by electricity, allowing it to offer pricing roughly 50% lower than major API aggregators. Providers keep 100% of the inference revenue, and the project does not rely on token subsidies; earnings come solely from real AI inference demand. However, early-stage earnings are modest, with top providers currently earning under $6 per day, influenced by factors like hardware specs, uptime, and network demand. The network currently supports models like Google's Gemma 4 and OpenAI's GPT-OSS via OpenRouter. To participate as a provider, users need an Apple Silicon Mac running macOS 14 or later, must install the Darkbloom provider software, and keep the machine online with a stable internet connection.

marsbit13m ago

Idle Macs Can Also Make Money? An Overview of Eigen Labs' Decentralized AI Inference Network Darkbloom

marsbit13m ago

Which Crypto Sectors Have Been "Eaten" by AI Agents?

The article examines which crypto sectors have been increasingly dominated by AI Agents and which remain human-centric. In certain high-speed, efficiency-driven areas, AI Agents have taken clear control. This includes derivatives/perpetuals trading, where bots outperform humans significantly (e.g., a contest showed 0% of AI Agents were liquidated vs. 43% of humans), arbitrage/MEV extraction, and yield optimization (with ~68% of new DeFi protocols in Q1 2026 featuring autonomous AI Agents). Spot trading and portfolio optimization are also seeing heavy Agent adoption. However, the shift is not universal. In "battleground" sectors, both Agents and humans coexist. In prediction markets, Agents dominate short-term arbitrage, but humans still outperform in long-term, nuanced judgment calls. In DeFi lending, while liquidation is automated, core deposit/borrow decisions remain largely human-driven. Sectors still firmly led by human activity include stablecoin payments and card-based spending (driven by real-world economic activity and remittances) and wallets, which serve as the crucial human-verification and approval layer. The rise of Agents increases the need for robust human-Agent verification layers. Projects like World/AgentKit, t54, Self Protocol, and Kite AI are building infrastructure to create trust, security, and accountability by binding Agents to verified human identities. In conclusion, while AI Agents have decisively "eaten" speed and optimization-focused crypto sectors, human judgment, trust, and real-world context remain dominant in areas that create broad economic value, such as payments and identity. The future likely involves a symbiotic relationship where Agents require human verification and oversight to operate effectively.

Foresight News19m ago

Which Crypto Sectors Have Been "Eaten" by AI Agents?

Foresight News19m ago

After Rising 11 Times in a Year, Micron's Earnings Report Becomes a Stress Test for the AI Memory Market

**Micron's Upcoming Earnings: A Crucial Test for the AI Memory Rally** Investors in AI memory stocks face a critical moment on June 24th, when Micron Technology reports quarterly earnings. The stock, having surged approximately 11-fold from $103 to $1,134 over the past year, carries immense market expectations. Wall Street consensus forecasts a staggering ~932% year-over-year jump in EPS to around $19.72 and ~270% revenue growth to ~$345 billion, largely driven by sold-out HBM (High Bandwidth Memory) capacity through 2026. Analysts have aggressively revised estimates upward over the last 90 days, with EPS expectations rising 68%. This creates a high bar: even strong results risk a sell-off if they fail to meet these elevated projections. Notably, price forecasts from institutions like Citi (predicting ~200% DRAM price increases in 2026) are already among the most bullish on Wall Street, not conservative. The key metric to watch is gross margin, guided to a record ~81%. Such peak profitability raises questions about sustainability in the historically cyclical memory sector. While management has signaled continued strength, the stock's direction post-earnings will likely hinge more on forward guidance for the next quarter and details on HBM capacity expansion for 2027, rather than the already-anticipated stellar past results. The report represents a major pressure test for the high-flying AI memory trade.

marsbit24m ago

After Rising 11 Times in a Year, Micron's Earnings Report Becomes a Stress Test for the AI Memory Market

marsbit24m ago

Trading

Spot
Futures

Hot Articles

Discussions

Welcome to the HTX Community. Here, you can stay informed about the latest platform developments and gain access to professional market insights. Users' opinions on the price of S (S) are presented below.

活动图片