NVIDIA Team Enables Programming Agent to Take Over Real Robot Experiments, Achieves 99% Success Rate

marsbit | Published 2026-06-18 | Updated 2026-06-18

Summary

NVIDIA's ENPIRE project demonstrates fully automated robotic research where AI agents, given only high-level goals, autonomously manage the entire loop—from literature search and code development to training, deployment, and hardware iteration—on a fleet of physical robots. The system achieves 99% success rates on dexterous real-world tasks like cable tying and peg sorting. Key insights include the discovery of a "physical scaling law" where more parallel robots speed up task resolution, and the observation that resetting an environment is often easier than the main task. The framework introduces metrics like Mean Robot Utilization (MRU) to measure efficiency, with robots often idle half the time, waiting for agent decisions. The long-term vision is a lab that runs autonomously, even without human oversight. The project will be open-sourced, allowing developers to build similar systems.

Automated research has truly stepped out of the code sandbox and into the real physical world.

Recently, Jim Fan, lead of NVIDIA's GEAR lab, introduced a new project called ENPIRE. This marks their first implementation of automated research on robot hardware.

They placed 8 Codex Agents into a robot fleet, allocated GPU computing power and ample token budgets, and gave a simple goal: solve the task as quickly as possible, keep the robots busy but safe, and avoid wasting computing power.

After that, human intervention was largely withdrawn. The Agents autonomously drove the entire closed loop, including automatic scene resetting, literature review, idea implementation and infrastructure setup, policy training and deployment, self-verification, log analysis and code improvement, iterating continuously until high-precision dexterous tasks were reliably completed on real hardware, such as fastening cable ties, organizing pins into a box, installing GPUs, etc.

They also observed a "physical scaling law": increasing the number of parallel robots (e.g., from a few to 8) significantly sped up task resolution.

Currently, part of the lab's systems can perform self-iteration overnight without human intervention, with researchers only needing to check reports in the morning.

Jim Fan stated that the future goal is to allow team members to go on vacation with peace of mind, with even NVIDIA CEO Jensen Huang unaware that the lab is still running autonomously.

The team plans to fully open-source ENPIRE, which could let ordinary developers set up similar autonomous robot research systems at home.

Project address: https://research.nvidia.com/labs/gear/enpire/

ENPIRE System Architecture: Four Modules Forming a Closed Loop

ENPIRE is a framework system designed for coding Agents, constructing a repeatable physical feedback loop through four core modules: the Environment module (EN) handles automatic resetting and verification; the Policy Improvement module (PI) initiates policy optimization; the Rollout module (R) supports policy evaluation on single or multiple robots in parallel; and the Evolution module (E) enables coding Agents to analyze logs, review literature, and improve training infrastructure and algorithm code to address failure modes.

This closed-loop system transforms real-world robot learning into an Agent-managed, controllable optimization process, thereby minimizing manual input while supporting fair ablation experiments across different training recipes and Agent variants.
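The four-module loop described above can be sketched in a few lines of Python. This is only a toy simulation under stated assumptions: the `Policy` class, the scalar `skill` value, the step sizes, and every function name are hypothetical stand-ins invented for illustration, not ENPIRE's actual interfaces, and no real hardware or training is involved.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical stand-in for a robot policy: one scalar 'skill' in [0, 1]."""
    skill: float = 0.2
    history: list = field(default_factory=list)


def environment_reset_and_verify(rng):
    """EN: reset the scene and verify it (assumed to always succeed in this toy)."""
    return True


def rollout(policy, rng, episodes=20):
    """R: run simulated episodes and return the measured success rate."""
    successes = 0
    for _ in range(episodes):
        if environment_reset_and_verify(rng) and rng.random() < policy.skill:
            successes += 1
    return successes / episodes


def evolution(policy, success_rate):
    """E: the agent reads the logs and picks the next recipe (toy: a step size)."""
    policy.history.append(success_rate)
    return 0.2 if success_rate < 0.5 else 0.05


def policy_improvement(policy, step):
    """PI: one optimization round (toy: nudge the skill upward, capped at 1.0)."""
    return Policy(skill=min(1.0, policy.skill + step), history=policy.history)


def run_loop(target=0.99, max_iters=50, seed=0):
    """Iterate R -> E -> PI until the measured success rate reaches the target."""
    rng = random.Random(seed)
    policy = Policy()
    rate = 0.0
    for i in range(max_iters):
        rate = rollout(policy, rng)
        if rate >= target:
            return i, rate, policy
        step = evolution(policy, rate)
        policy = policy_improvement(policy, step)
    return max_iters, rate, policy
```

Calling `run_loop()` returns the iteration at which the target was reached, the final measured success rate, and the final policy. The point of the sketch is the control flow: evaluation on (simulated) robots feeds log analysis, which feeds the next round of policy improvement.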

Supported by ENPIRE, cutting-edge programming Agents can autonomously develop policies and achieve a 99% success rate on challenging real-world dexterous manipulation tasks like PushT, organizing pins into a pin box, and cutting cable ties with a cutter.

Key Finding: Resetting the Environment is Often Easier Than Completing the Task Itself

One key observation: for many robot tasks, resetting the environment is often easier than completing the task itself.

Therefore, ENPIRE's approach is to first let the Agent build an automatic resetting environment via Code-as-Policy. Often, the so-called reset is essentially a pick-and-place task, solvable by Cap-X.
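To make the idea of a reset as a pick-and-place routine concrete, here is a minimal sketch. The `FakeArm` class and its `move_to`, `grasp`, and `release` methods are hypothetical and only record commands; they are not the API of Cap-X, ENPIRE, or any real robot SDK, and the hover height is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FakeArm:
    """Hypothetical arm interface that only logs commands (not a real robot API)."""
    log: list = field(default_factory=list)
    holding: bool = False

    def move_to(self, xyz):
        self.log.append(("move_to", tuple(xyz)))

    def grasp(self):
        self.holding = True
        self.log.append(("grasp",))

    def release(self):
        self.holding = False
        self.log.append(("release",))


def reset_object(arm, current_xyz, home_xyz, hover=0.10):
    """Pick the object up where the episode left it and place it at its start pose."""
    x, y, z = current_xyz
    hx, hy, hz = home_xyz
    arm.move_to((x, y, z + hover))      # approach from above
    arm.move_to((x, y, z))              # descend to the object
    arm.grasp()
    arm.move_to((x, y, z + hover))      # lift
    arm.move_to((hx, hy, hz + hover))   # carry to above the start pose
    arm.move_to((hx, hy, hz))           # lower
    arm.release()
    arm.move_to((hx, hy, hz + hover))   # retreat
    return not arm.holding              # True once the object has been released
```

A real reset would also need perception to find `current_xyz` and a verification step afterwards; both are omitted here.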

Subsequently, the agent writes a heuristic rule-based reward function. The research team then places this environment into a sandbox and initiates automated research by the Agent centered around scoring.
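As an illustration of what a heuristic rule-based reward might look like, the sketch below scores a planar pushing task such as PushT from the block's pose error. The function name, the tolerances, and the exponential shaping are all assumptions made for this example; the article does not describe the reward functions the agents actually wrote.

```python
import math


def pusht_reward(block_xy, block_theta, goal_xy, goal_theta,
                 pos_tol=0.01, ang_tol=math.radians(5)):
    """Return (score in (0, 1], success flag) from position and angle error.

    Tolerances and shaping constants are illustrative assumptions.
    """
    pos_err = math.dist(block_xy, goal_xy)
    # Wrap the angle difference into [-pi, pi] before taking its magnitude.
    diff = block_theta - goal_theta
    ang_err = abs(math.atan2(math.sin(diff), math.cos(diff)))
    success = pos_err <= pos_tol and ang_err <= ang_tol
    # Dense score: 1.0 at a perfect match, decaying as either error grows.
    score = 0.5 * math.exp(-pos_err / 0.1) + 0.5 * math.exp(-ang_err / 0.5)
    return score, success
```

The boolean gives the agent an unambiguous success criterion for computing success rates, while the dense score gives a smoother signal to optimize against.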

This also echoes Karpathy's definition of automated research: automated research here is not simply tuning a hyperparameter or modifying a small piece of code. The Agent explores different paradigms from the internet and rewrites anything that could potentially boost performance, including algorithms, training objectives, and even the data loader.

In the pin task, one Agent even wrote its own contact force safety controller, which proved more effective than merely adjusting several reinforcement learning parameters.

New Metrics: MRU and MTU

ENPIRE's scalability depends on the size of the Agent team and computing resources, but here, the truly scarce resource is not GPUs, but robot time.

When the research team provided Agents with 8 robots instead of 1, the time needed for the pin task to achieve near-perfect performance was reduced from over 1.5 hours to about 40 minutes. These Agents coordinate via Git: sharing code, discarding suboptimal ideas, and autonomously selecting each other's best-performing runs.
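A quick back-of-the-envelope calculation puts the reported times in perspective. It assumes roughly 90 minutes for one robot (the article says "over 1.5 hours", so the true speedup is at least this large) and roughly 40 minutes for eight robots; the helper function is purely illustrative.

```python
def parallel_speedup(t_single_min, t_parallel_min, n_robots):
    """Return (speedup, per-robot efficiency) for a parallel run."""
    speedup = t_single_min / t_parallel_min
    efficiency = speedup / n_robots
    return speedup, efficiency


# Approximate figures from the article: ~90 min with 1 robot, ~40 min with 8.
speedup, efficiency = parallel_speedup(90, 40, 8)
print(f"speedup ~ {speedup:.2f}x, efficiency ~ {efficiency:.0%}")
# prints: speedup ~ 2.25x, efficiency ~ 28%
```

Under these assumed figures the scaling is real but sublinear: eight robots deliver a bit more than a 2x reduction in wall-clock time, not 8x.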

This points to a larger shift: robotics research is becoming a work of environment design—building environments where coding Agents can conduct automated research; algorithmic work is shifting to a higher layer, towards constructing feedback loops that Agents can autonomously close.

And this loop compounds continuously: a skill mastered by an Agent today becomes the building block for constructing and resetting environments for more difficult tasks tomorrow. Capabilities bootstrap new capabilities.

Under this paradigm, the true hard constraint is the budget for real-world interaction.

Therefore, the research team proposed two metrics:

  • Mean Robot Utilization (MRU): The proportion of time robots are actually running experiments relative to total elapsed real-world time.
  • Mean Token Utilization (MTU): Measures the efficiency of Agents in converting tokens into research progress.
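The MRU definition above translates directly into a small computation over robot activity logs. The sketch below assumes each robot's log is a list of non-overlapping `(start, end)` busy intervals in seconds; that log format and the function names are assumptions for illustration, not ENPIRE's tooling. MTU is not sketched because the article gives no formula for it.

```python
def robot_utilization(busy_intervals, t_start, t_end):
    """Fraction of the window [t_start, t_end] covered by busy intervals.

    Intervals are assumed not to overlap; parts outside the window are clipped.
    """
    total = t_end - t_start
    if total <= 0:
        raise ValueError("window must have positive duration")
    busy = sum(
        max(0.0, min(end, t_end) - max(start, t_start))
        for start, end in busy_intervals
    )
    return busy / total


def mean_robot_utilization(fleet_logs, t_start, t_end):
    """MRU: average of per-robot utilization across the fleet."""
    if not fleet_logs:
        raise ValueError("fleet_logs must contain at least one robot")
    per_robot = [
        robot_utilization(intervals, t_start, t_end)
        for intervals in fleet_logs.values()
    ]
    return sum(per_robot) / len(per_robot)
```

For example, with `{"r0": [(0, 30), (40, 50)], "r1": [(10, 20)]}` over a 100-second window, the per-robot utilizations are 0.4 and 0.1, so the MRU is 0.25.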

In their experiments, MRU consistently remained below 50%. In other words, the robots sat idle more than half the time, waiting for the Agents to think. A better harness and faster models would therefore translate directly into higher robot utilization.

PushT is a long-standing robot manipulation benchmark. Typically, completing this task requires extensive human demonstration data plus hours of behavior cloning training.

However, they observed that Codex, Claude Code, and Kimi Code all "solved" this task in under 2 hours using a rule-based heuristic approach: without neural networks, without training, and without relying on any human data.

To enable more people to experiment with automated research in the physical world at home, they developed a full-stack system based on the LeRobot SO-101 kit (@LeRobotHF) and NVIDIA Jetson Thor. This system can perform the PushT task.

Reference Links:

https://x.com/_wenlixiao/status/2066913334994358342

https://x.com/DrJimFan/status/2066921736369766762

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Yang Wen

Related Questions

Q: What is the key achievement of the ENPIRE project as described in the article?

A: The ENPIRE project by NVIDIA's GEAR lab has, for the first time, successfully implemented automated research on physical robot hardware. This involved 8 Codex Agents autonomously driving a closed-loop process to solve complex, high-precision dexterous manipulation tasks like tying zip ties and organizing a pin box, achieving a success rate of 99% with minimal human intervention.

Q: What are the four core modules that make up the ENPIRE system's closed-loop architecture?

A: The four core modules are: the Environment (EN) module for automatic resetting and validation, the Policy Improvement (PI) module for initiating policy optimization, the Rollout (R) module for evaluating policies on single or multiple robots, and the Evolution (E) module where coding agents analyze logs, consult literature, and improve infrastructure and code to address failure modes.

Q: What key insight regarding task difficulty did the researchers discover, and how did ENPIRE leverage it?

A: A key observation was that resetting a robot environment is often easier than completing the main task itself. ENPIRE leverages this by first having agents build an automatic reset environment (often a simple pick-and-place task) using Code-as-Policy. This automated reset capability is crucial for enabling continuous, unattended experimentation.

Q: According to the article, what new metrics were proposed to measure the efficiency of this automated research paradigm, and what does a low MRU indicate?

A: Two new metrics were proposed: Mean Robot Utilization (MRU), which measures the proportion of time robots are actively running experiments, and Mean Token Utilization (MTU), which measures how efficiently agent tokens are converted into research progress. In their experiments, MRU was below 50%, indicating robots spent over half their time idle, waiting for the agents to think, highlighting a bottleneck.

Q: What future vision did Jim Fan, leader of the NVIDIA GEAR lab, describe for the ENPIRE system?

A: Jim Fan's stated future goal is for the lab team to be able to go on vacation comfortably, with the system running so autonomously that even NVIDIA's CEO, Jensen Huang, would be unaware that the laboratory continues to operate and conduct research on its own.
