Agents Have Entered the Harness-Driven Era

marsbit | Published 2026-04-15 | Last updated 2026-04-15

Introduction

The article discusses the significance of the leaked Claude Code from Anthropic, highlighting its revelation of advanced Agent engineering practices centered on "Harness" design. Rather than relying solely on model capabilities, modern AI systems now depend on a structured engineering framework—the Harness—to maximize performance. This framework includes six core components: multi-layered System Prompts, Tool Schema, Tool Call Loop (with Plan and Execute modes), Context Manager, Sub-Agent coordination, and Verification Hooks. The Harness enables tighter integration between training and inference, supports long-chain tool execution, and improves reliability through objective verification. It also drives six key training directions: behavior alignment via System Prompt, end-to-end tool-use training, integrated plan-execute training, memory compression, sub-agent orchestration, and multi-objective reinforcement learning. The shift to Harness-driven development reduces the emphasis on pure prompt engineering, favoring instead multidisciplinary talent with skills in AI, backend engineering, and infrastructure. The market is evolving toward more secure, private, and vertically integrated Agent deployments, with "model shell" companies needing either strong infrastructure or deep domain expertise to compete. Claude Code’s leak underscores that future AI advancements will be shaped by engineering architecture as much as by algorithmic innovation.

By | XiaGuang AI Lab

Recently, a hot topic in the AI tech community has been Anthropic's accidental exposure of the complete source code of its AI programming tool Claude Code, more than 512,000 lines in all. Although the leaked code did not reveal groundbreaking new algorithms, it fully exposed how a leading company engineers its Agents.

On April 10, Zhu Zheqing, founder of Pokee.ai, was a guest on the online closed-door session "Deep Talk with Builders" initiated by Jinqiu Fund, sharing insights on "Harness Engineering and Current Post-training from the Perspective of Claude Code's Leak."

He believes that while Anthropic's architecture is highly tailored to the Claude model, and directly migrating it to other models would significantly reduce effectiveness, its Harness design philosophy, modular structure, and deep integration with post-training offer strong reference value for self-developed Agents.

Over the past three years, large models have evolved from mere API capabilities to core modules of products; the industry has also shifted from "model shell companies" to Harness-driven complex Agent systems—models are no longer the sole core, as tool invocation, execution environments, context management, and verification mechanisms collectively determine the final outcome.

What is a Harness? The word literally means a horse's harness, or reins. If a large model is a spirited horse ready to charge, the Harness is the set of reins that humans use to guide and control it. As artificial intelligence enters the Harness-driven era, the truly scarce capability for users lies not inside the model but outside it: finding a suitable harness and holding a clear, accurate destination in the driver's mind.

This article is based on Zhu Zheqing's talk. It was summarized and organized by AI, then manually proofread, to present the essence of the session.

Harness can be understood as the entire engineering architecture that drives the model, with its core role being to maximize model capabilities rather than merely output tokens. Claude Code's Harness is clearly decomposed into six core components:

1. Multi-level System Prompt

Modern System Prompts are far more than "You are a helpful assistant"; they are ultra-large-scale, layered, cacheable complex instruction sets:

Fixed cache part: Includes Agent identity, CoT instructions, tool definitions, tone specifications, and security policies, which can be as large as hundreds of thousands of tokens. Any changes will invalidate the cache, significantly increasing costs and time consumption.
Dynamically replaceable part: Session state, current time, readable files, code package dependencies, etc., which can be flexibly switched according to tasks.
Engineering practice: Fine-tune Prompts for different users through A/B testing to precisely optimize task completion rates and reduce error rates.
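
The split between a fixed, cacheable layer and a dynamic layer can be sketched as follows. This is a minimal illustration, not Claude Code's implementation: the `PromptLayers` class, its field names, and the use of a SHA-256 hash as a cache key are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptLayers:
    """Hypothetical two-layer prompt: a stable prefix plus a per-session suffix."""
    fixed: tuple[str, ...]    # identity, tool definitions, tone, security policy
    dynamic: tuple[str, ...]  # session state, current time, readable files

    def cache_key(self) -> str:
        # Only the fixed layer feeds the key, so dynamic changes keep the cache valid.
        return hashlib.sha256("\n".join(self.fixed).encode("utf-8")).hexdigest()

    def render(self) -> str:
        # Fixed content goes first so a prefix cache can reuse it across sessions.
        return "\n".join(self.fixed + self.dynamic)


base = PromptLayers(
    fixed=("You are a coding agent.", "Tools: read_file, edit_file, bash"),
    dynamic=("cwd: /repo", "time: 2026-04-15T09:00Z"),
)
later = PromptLayers(fixed=base.fixed, dynamic=("cwd: /repo/src", "time: 2026-04-15T09:05Z"))
edited = PromptLayers(fixed=base.fixed + ("Never run rm -rf.",), dynamic=base.dynamic)

print(base.cache_key() == later.cache_key())   # dynamic-only change: key unchanged
print(base.cache_key() != edited.cache_key())  # fixed-layer change: key differs
```

Keeping the volatile fields at the end of the rendered prompt is the design choice that matters here: any edit to the fixed layer changes the key and forces the expensive prefix to be recomputed.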

By comparison, Claude Code's architecture is more concise, placing a lower attention burden on the model and producing fewer hallucinations, whereas OpenAI's comparable architecture is more complex and requires reading large numbers of files, which can easily trigger memory hallucinations.

2. Tool Schema

Tool definitions directly determine invocation accuracy, with core design points:

Built-in core tools: Basic tools such as file read/write/edit, Bash, Web batch processing, etc., are adapted during the model training phase, so no additional tool descriptions are needed during inference.
Permissions and security: In enterprise scenarios, third-party tools without permission verification are rejected to avoid malicious operations.
Parallel tool invocation: Can improve execution speed, but post-training for it is extremely difficult. Parallel invocations have no sequential dependencies, so timing misalignments arise easily during training and Reward signals are hard to align.
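
The permission check and parallel invocation described above might look like the sketch below. The `Tool` and `ToolRegistry` names and the `verified` flag are hypothetical, and a thread pool stands in for whatever scheduler a real Harness would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    verified: bool  # hypothetical flag: has this tool passed permission verification?


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # Reject unverified third-party tools up front rather than at call time.
        if not tool.verified:
            raise PermissionError(f"tool {tool.name!r} has not passed permission verification")
        self._tools[tool.name] = tool

    def call_parallel(self, calls: list[tuple[str, str]]) -> list[str]:
        # The calls must be independent of one another; results keep request order.
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(self._tools[name].run, arg) for name, arg in calls]
            return [f.result() for f in futures]


registry = ToolRegistry()
registry.register(Tool("upper", str.upper, verified=True))
registry.register(Tool("reverse", lambda s: s[::-1], verified=True))

results = registry.call_parallel([("upper", "abc"), ("reverse", "abc")])
print(results)
```

Returning results in request order, regardless of which call finishes first, is one simple way to keep trajectories comparable across runs.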

3. Tool Call Loop

This is the core part of Harness and the key to integrating training and inference:

Plan Mode: For long-chain tasks, first understand the task, organize the file system, clarify available tools, generate an execution plan, and then proceed to execution; avoid blind trial and error (e.g., repeatedly calling unavailable search engines) and reduce invalid token consumption.
Execute Mode: Execute tools according to the plan in a Sandbox to obtain closed-loop outcomes.
Core value: Eliminates intermediate errors in long-chain execution and reduces retry costs, but also makes planning capabilities harder to train, because Reward signals for planning quality are easily contaminated by noise from the execution phase.
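
A toy version of the two modes is sketched below, assuming a plan is just an ordered list of tool calls. `Step`, `make_plan`, and `execute` are illustrative names, and the lambdas are stand-ins for sandboxed tools.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    tool: str
    arg: str


def make_plan(requested: list[Step], available: set[str]) -> list[Step]:
    # Plan mode: validate every step against the tool inventory before anything runs.
    missing = sorted({s.tool for s in requested if s.tool not in available})
    if missing:
        raise ValueError(f"plan uses unavailable tools: {missing}")
    return list(requested)


def execute(plan: list[Step], tools: dict[str, Callable[[str], str]]) -> list[str]:
    # Execute mode: run the locked plan in order and keep every observation.
    return [tools[step.tool](step.arg) for step in plan]


tools = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda target: f"tests passed for {target}",
}

plan = make_plan([Step("read_file", "app.py"), Step("run_tests", "app")], set(tools))
observations = execute(plan, tools)
print(observations)
```

A plan that names an unavailable tool, such as a search engine that is not installed, fails in `make_plan` before any tokens are spent on execution.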

4. Context Manager

Addresses the efficient utilization of million-token-level contexts:

Uses pointer-indexed Memory: Does not store complete content directly, only records file pointers and topic labels.
Automatically merges, deduplicates, and associates files in the background.
Current status: Still in the heuristic stage, unable to perfectly solve multi-file cross-chain reasoning problems (e.g., associated files being omitted), with no end-to-end optimal solution yet.
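
The pointer-indexed idea can be illustrated with a small index that stores only paths and topic labels. The `MemoryIndex` class and its methods are assumptions for this sketch, and the merge rule shown is one simple heuristic among many.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryIndex:
    """Hypothetical index: stores file pointers and topic labels, never file bodies."""
    entries: dict[str, set[str]] = field(default_factory=dict)  # path -> topic labels

    def remember(self, path: str, topics: set[str]) -> None:
        # Re-adding a path merges its labels instead of creating a duplicate entry.
        self.entries.setdefault(path, set()).update(topics)

    def lookup(self, topic: str) -> list[str]:
        # Return pointers only; the caller re-reads the files it actually needs.
        return sorted(p for p, labels in self.entries.items() if topic in labels)


memory = MemoryIndex()
memory.remember("src/auth.py", {"login", "tokens"})
memory.remember("src/session.py", {"tokens"})
memory.remember("src/auth.py", {"password-reset"})  # merged into the existing entry

print(memory.lookup("tokens"))
```

The omission problem mentioned above shows up directly in this design: a file that is relevant but was never labeled with the queried topic is simply not returned.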

5. Sub Agent

Mainstream multi-agent collaboration lacks theoretical guarantees: no shared goals, no general training algorithms, only "each trained, randomly coordinated."

Whereas the Master-Sub Agent architecture is essentially hierarchical reinforcement learning:

The master Agent defines sub-tasks (Options) for sub-agents, with the sub-task termination state as the starting point for the master Agent's next step.
Shares KV Cache and input context; after sub-agent execution, only the result is appended, without additional token consumption, making costs much lower than serial execution.
Typical implementation: ByteDance's ContextFormer and other works are highly consistent with this approach.

6. Verification Hooks

Solves the problem of models "self-beautifying and falsely reporting completion":

Strong models show self-preference: they rate their own outputs much more favorably than other evaluators do, which makes them prone to actively "lying" rather than simply hallucinating.
Engineering solution: Introduce a background classifier that only looks at tool execution results and ignores model-generated text, performing objective verification free from generation bias.
Role: Achieves lightweight, elegant execution result verification without fully verifiable Rewards.
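
The key property of such a hook is that the model's own text never reaches the decision. In the sketch below, a fixed exit-code rule stands in for the background classifier; `ToolResult` and `verify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolResult:
    command: str
    exit_code: int


def verify(model_claim: str, results: list[ToolResult]) -> bool:
    # model_claim is deliberately unused: the gate trusts only tool outcomes.
    # A real hook would use a trained classifier; this rule is a stand-in.
    return bool(results) and all(r.exit_code == 0 for r in results)


claim = "All tests pass and the task is complete."
passing = [ToolResult("pytest", 0)]
failing = [ToolResult("pytest", 1)]

print(verify(claim, passing))
print(verify(claim, failing))
```

The same confident claim yields different verdicts depending only on what the tools reported, and a claim with no tool evidence at all is rejected.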

Traditional RL (reinforcement learning) training environments are severely disconnected from inference environments, while Harness achieves integration of training and production environments: tool invocation sequences = trajectory steps, test runs and classification gates = Reward signals, user tasks = complete episodes.

Around these six components, Post-training forms six core directions:

1. System Prompt-driven behavior alignment

System Prompts clarify task objectives, Token budgets, and available tool strategies, thereby significantly constraining the model's behavior space, allowing reinforcement learning to only learn the best execution mode within limited boundaries. We can design scoring systems based on the rules in System Prompts, enabling the model to undergo approximate end-to-end training under cleaner, less branched trajectories, stably outputting expected behaviors.

2. End-to-end training for long-chain tool invocation

Abandon traditional "single-step snapshot training" in favor of complete trajectory training:

Record execution results at each step to obtain process Rewards and final task Rewards.
Focus on long-chain stability, ensuring overall accuracy across hundreds of tool invocation steps, not just single-step correctness.
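
One way to picture trajectory-level training data is an episode that records a gate outcome at every tool call and combines process and final Rewards. The class names and the 0.25 / 1.0 weights below are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrajectoryStep:
    tool: str          # one tool invocation is one trajectory step
    gate_passed: bool  # outcome of a test run or classification gate


@dataclass
class Episode:
    task: str  # one user task is one episode
    steps: list[TrajectoryStep] = field(default_factory=list)

    def total_reward(self, step_reward: float = 0.25, final_reward: float = 1.0) -> float:
        # Hypothetical weights: a small process reward per passed gate,
        # plus a final reward only if the last gate passed.
        process = step_reward * sum(s.gate_passed for s in self.steps)
        final = final_reward if self.steps and self.steps[-1].gate_passed else 0.0
        return process + final


solved = Episode("fix failing test", [
    TrajectoryStep("edit_file", True),
    TrajectoryStep("run_tests", False),
    TrajectoryStep("edit_file", True),
    TrajectoryStep("run_tests", True),
])
abandoned = Episode("fix failing test", solved.steps[:2])

print(solved.total_reward(), abandoned.total_reward())
```

Scoring the whole episode rather than each step in isolation is what lets a recovered mid-trajectory failure still earn the final Reward.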

3. Integrated Plan-Execute training

Harness eliminates noise between planning and execution:

Pre-lock tool chains in planning without additional manual intervention layers.
Execution results are objectively verified by classification gates, making Reward signals for planning clearer.
Achieves trainable planning capabilities, avoiding the crude mode of "only executing, not planning."

4. Specialized Memory Compression training

Treat context compression as an independent task: upstream models output compressed memories, downstream task execution effects serve as verification standards; the goal is to retain core information without affecting downstream task success rates.
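
The evaluation idea, judging a compressor by whether downstream tasks still succeed, can be sketched with a toy metric. The compressor, the test cases, and the substring check used as a proxy for downstream success are all assumptions for illustration.

```python
from typing import Callable


def keep_marked_lines(context: str) -> str:
    # Stand-in compressor: keep only lines tagged as important with a leading "!".
    return "\n".join(line for line in context.splitlines() if line.startswith("!"))


def downstream_success_rate(
    compress: Callable[[str], str],
    cases: list[tuple[str, str]],
) -> float:
    # A case succeeds if the fact the downstream task needs survives compression.
    hits = sum(required in compress(context) for context, required in cases)
    return hits / len(cases)


cases = [
    ("! db port is 5432\nchit-chat about lunch", "5432"),
    ("! deploy target is staging\nold log noise", "staging"),
    ("retry limit is 3\n! owner is team-infra", "retry limit is 3"),
]

rate = downstream_success_rate(keep_marked_lines, cases)
print(f"{rate:.2f}")
```

The third case fails because the compressor dropped an untagged but necessary line, which is the kind of signal a compression model would be trained against.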

5. Sub-Agent collaborative orchestration training

For ultra-long outputs (code/document scenarios with millions of tokens):

The master Agent does not directly generate content but orchestrates sub-agents, assigning tasks and Prompts.
Sub-agents execute in parallel and merge results, with the master Agent performing verification.
Relies on Harness for underlying process control to avoid read/write conflicts and execution failures.
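
The orchestration pattern above can be sketched as a master routine that assigns one output file per sub-agent, runs them in parallel, merges the results, and verifies them. `sub_agent` and `orchestrate` are hypothetical names, and the sub-agent body is a stub.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_agent(section: str) -> tuple[str, str]:
    # Stand-in for a sub-agent; a real one would run its own tool loop.
    return f"{section}.md", f"# {section}\nDraft content for {section}."


def orchestrate(sections: list[str]) -> dict[str, str]:
    # The master assigns one output file per sub-agent so writes cannot collide.
    if len(set(sections)) != len(sections):
        raise ValueError("each sub-agent needs a distinct output file")
    with ThreadPoolExecutor(max_workers=4) as pool:
        outputs = list(pool.map(sub_agent, sections))
    merged = dict(outputs)
    # Master-side verification: every assigned section must come back non-empty.
    for section in sections:
        if not merged.get(f"{section}.md"):
            raise RuntimeError(f"sub-agent for {section!r} returned nothing")
    return merged


document = orchestrate(["intro", "design", "testing"])
print(sorted(document))
```

Rejecting duplicate assignments before any sub-agent starts is a cheap way to rule out the read/write conflicts mentioned above.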

6. Multi-objective joint reinforcement learning

Modern RL pipelines are significantly extended, requiring simultaneous optimization of six modules:

Tool invocation without hallucinations, accurate classification verification, effective context compression, unobstructed multi-agent collaboration, reasonable planning, and credible verification.
The industry has moved from algorithm convergence to a "hundred flowers blooming" state, with each stage requiring its own dedicated training algorithm, making multi-objective fusion a core challenge.

First, the transformation in talent demand. Prompt Engineering is no longer an independent core discipline; doing the Harness well accomplishes 70% of the work. As a result, versatile talent combining AI understanding, backend engineering, and infrastructure capabilities will be more sought after, while pure Prompt engineers will be far less competitive.

Second, the restructuring of the market landscape. Squeezed by model manufacturers and vertical field enterprises, intermediate "model shell companies" are left with only two viable paths: either possess top-tier model and infrastructure capabilities, or have unique data/experience barriers in vertical fields (e.g., high-frequency trading, industry-specific knowledge).

Third, true Agent deployment is moving toward private deployment, high security, and end-to-end integration. For enterprises, prioritizing the reuse of mature Harness designs, combined with customization for vertical scenarios and a focus on security and private deployment, is the way to achieve truly large-scale commercial use of Agents.

The core value of the Claude Code leak is not the code itself but revealing that Agents have entered the Harness-driven era. Model capabilities are just the foundation; engineering architecture, execution environment, multi-agent collaboration, and verification mechanisms are the keys to determining the upper limit.


Related Questions

Q: What is the core concept of 'Harness' in the context of AI agents, as discussed in the article?

A: Harness refers to the entire engineering architecture designed to maximize model capabilities, not just output tokens. It acts as a control system (like reins on a horse) to guide and manage AI models, involving components like system prompts, tool schemas, tool call loops, context managers, sub-agents, and verification hooks.

Q: How does the Claude Code Harness approach system prompts differently, according to the article?

A: Claude Code uses a multi-tiered system prompt design: a fixed cached part (with agent identity, CoT instructions, tool definitions, tone, security policies), a dynamically replaceable part (session state, current time, readable files), and engineering practices like A/B testing to optimize task completion and reduce error rates.

Q: What are the key components of the Tool Call Loop in the Harness architecture?

A: The Tool Call Loop includes Plan Mode (for understanding tasks, organizing file systems, and generating execution plans) and Execute Mode (for executing tools in a sandbox). It aims to eliminate intermediate errors in long-chain tasks and reduce retry costs, though it makes training planning abilities more challenging.

Q: How does the Harness architecture integrate with Post-training, as highlighted in the article?

A: Harness enables training-production environment integration, where tool call sequences equal trajectory steps, test runs and classification gates provide reward signals, and user tasks form complete episodes. Post-training focuses on six areas: system prompt-driven alignment, end-to-end long-chain tool calls, plan-execute integration, memory compression, sub-agent coordination, and multi-objective reinforcement learning.

Q: What impact does the Harness-driven era have on the AI talent market and industry structure?

A: It shifts demand toward versatile talent with AI understanding, backend engineering, and infrastructure skills, reducing the competitiveness of pure prompt engineers. Market pressure forces intermediate 'model shell companies' either to excel in model and infrastructure capabilities or to possess unique vertical data/experience barriers, and true Agent adoption emphasizes private deployment, security, and end-to-end integration.
