NVIDIA Team Enables Programming Agent to Take Over Real Robot Experiments, Achieves 99% Success Rate

marsbit | Published on 2026-06-18 | Last updated on 2026-06-18

Abstract

NVIDIA's ENPIRE project demonstrates fully automated robotic research in which AI agents, given only high-level goals, autonomously manage the entire loop on a fleet of physical robots, from literature search and code development to training, deployment, and hardware iteration. The system achieves 99% success rates on dexterous real-world tasks such as fastening cable ties and organizing pins into a box. Key insights include a "physical scaling law", under which more parallel robots speed up task resolution, and the observation that resetting an environment is often easier than the main task. The framework introduces metrics such as Mean Robot Utilization (MRU) to measure efficiency; in the experiments, robots sat idle more than half the time, waiting for agent decisions. The long-term vision is a lab that runs autonomously without human oversight. The project will be open-sourced, allowing developers to build similar systems.

Automated research has truly stepped out of the code sandbox and into the real physical world.

Recently, Jim Fan, lead of NVIDIA's GEAR lab, introduced a new project called ENPIRE. This marks their first implementation of automated research on robot hardware.

They placed 8 Codex Agents into a robot fleet, allocated GPU computing power and ample token budgets, and gave a simple goal: solve the task as quickly as possible, keep the robots busy but safe, and avoid wasting computing power.

After that, human intervention was largely withdrawn. The Agents autonomously drove the entire closed loop: automatic scene resetting, literature review, idea implementation and infrastructure setup, policy training and deployment, self-verification, log analysis, and code improvement. They iterated continuously until high-precision dexterous tasks, such as fastening cable ties, organizing pins into a box, and installing GPUs, were completed reliably on real hardware.

They also observed a "physical scaling law": increasing the number of parallel robots (e.g., from a few to 8) significantly sped up task resolution.

Currently, part of the lab's systems can perform self-iteration overnight without human intervention, with researchers only needing to check reports in the morning.

Jim Fan stated that the future goal is to allow team members to go on vacation with peace of mind, with even NVIDIA CEO Jensen Huang unaware that the lab is still running autonomously.

The team plans to fully open-source ENPIRE, which could let ordinary developers set up similar autonomous robot research systems at home.

Project address: https://research.nvidia.com/labs/gear/enpire/

ENPIRE System Architecture: Four Modules Forming a Closed Loop

ENPIRE is a framework system designed for coding Agents, constructing a repeatable physical feedback loop through four core modules: the Environment module (EN) handles automatic resetting and verification; the Policy Improvement module (PI) initiates policy optimization; the Rollout module (R) supports policy evaluation on single or multiple robots in parallel; and the Evolution module (E) enables coding Agents to analyze logs, review literature, and improve training infrastructure and algorithm code to address failure modes.
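The four-module loop can be pictured as a simple control flow. The following is a minimal sketch under stated assumptions, not ENPIRE's actual code: every name in it (`Recipe`, `Environment`, `improve_policy`, `rollout`, `evolve`, `run_loop`) is hypothetical, and the robot is replaced by a random toy simulator so that the loop runs anywhere.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical training recipe that the agent is allowed to rewrite."""
    skill: float = 0.2  # toy stand-in for policy quality, in [0, 1]
    notes: list = field(default_factory=list)


class Environment:
    """EN: resets the scene and verifies whether an episode succeeded."""

    def reset(self) -> None:
        pass  # a real system would run a scripted reset on the robot here

    def verify(self, outcome: float) -> bool:
        return outcome > 0.5


def improve_policy(recipe: Recipe) -> float:
    """PI: stand-in for policy optimization; returns the toy skill value."""
    return recipe.skill


def rollout(env: Environment, policy_skill: float, episodes: int,
            rng: random.Random) -> float:
    """R: evaluate the policy over several episodes; return the success rate."""
    successes = 0
    for _ in range(episodes):
        env.reset()
        outcome = 1.0 if rng.random() < policy_skill else 0.0
        successes += int(env.verify(outcome))
    return successes / episodes


def evolve(recipe: Recipe, success_rate: float) -> Recipe:
    """E: stand-in for the agent reading logs and revising its code."""
    recipe.notes.append(f"observed success rate {success_rate:.2f}")
    recipe.skill = min(1.0, recipe.skill + 0.2)
    return recipe


def run_loop(target: float = 0.99, max_iters: int = 10, seed: int = 0) -> list:
    """Iterate PI -> R -> E until the target success rate is reached."""
    rng = random.Random(seed)
    env, recipe = Environment(), Recipe()
    history = []
    for _ in range(max_iters):
        policy = improve_policy(recipe)
        rate = rollout(env, policy, episodes=100, rng=rng)
        history.append(rate)
        if rate >= target:
            break
        recipe = evolve(recipe, rate)
    return history
```

Calling `run_loop()` returns the success rate measured in each iteration. In a real deployment, the body of `evolve` is where the coding Agent would do its open-ended work.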

This closed-loop system transforms real-world robot learning into an Agent-managed, controllable optimization process, thereby minimizing manual input while supporting fair ablation experiments across different training recipes and Agent variants.

Supported by ENPIRE, cutting-edge programming Agents can autonomously develop policies and achieve a 99% success rate on challenging real-world dexterous manipulation tasks like PushT, organizing pins into a pin box, and cutting cable ties with a cutter.

Key Finding: Resetting the Environment is Often Easier Than Completing the Task Itself

One key observation is that, for many robot tasks, resetting the environment is often easier than completing the task itself.

Therefore, ENPIRE's approach is to first let the Agent build an automatic resetting environment via Code-as-Policy. Often, the so-called reset is essentially a pick-and-place task, solvable by Cap-X.

Subsequently, the agent writes a heuristic rule-based reward function. The research team then places this environment into a sandbox and initiates automated research by the Agent centered around scoring.
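The article does not show what such a reward function looks like. As an illustration only, here is a minimal sketch of a rule-based score for a PushT-style task, assuming the object's planar pose (x, y, theta) can be measured. The function names and the tolerance values are hypothetical and are not taken from ENPIRE.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def heuristic_reward(pose, target, pos_tol=0.01, ang_tol=math.radians(5)):
    """Score a planar pose (x, y, theta) against a target pose.

    Returns a value in (0, 1]; 1.0 means both errors are within tolerance.
    The tolerances are illustrative assumptions, not values from the article.
    """
    x, y, theta = pose
    tx, ty, ttheta = target
    pos_err = math.hypot(x - tx, y - ty)
    ang_err = abs(wrap_angle(theta - ttheta))
    # Each term is 1.0 inside its tolerance and decays as the error grows.
    pos_score = min(1.0, pos_tol / pos_err) if pos_err > 0 else 1.0
    ang_score = min(1.0, ang_tol / ang_err) if ang_err > 0 else 1.0
    return pos_score * ang_score


def is_success(pose, target) -> bool:
    """An episode counts as a success only when the score is perfect."""
    return heuristic_reward(pose, target) >= 1.0
```

A score of this kind gives the Agent a single number to optimize and a binary success check for computing success rates.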

This also echoes Karpathy's definition of automated research: it is not simply tuning a hyperparameter or modifying a small piece of code. The Agent explores different paradigms from the internet and rewrites anything that could potentially boost performance, including algorithms, training objectives, and even the data loader.

In the pin task, one Agent even wrote its own contact force safety controller, which proved more effective than merely adjusting several reinforcement learning parameters.
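The article gives no details of that controller. To make the idea concrete, the sketch below shows one common pattern for a contact-force safety layer: scaling the commanded velocity down as the measured force rises. It is an assumption for illustration, not the Agent's actual code, and the function name and force limits are hypothetical.

```python
def limit_velocity(cmd_velocity, measured_force_n,
                   soft_limit_n=5.0, hard_limit_n=10.0):
    """Scale a commanded velocity vector according to measured contact force.

    Below the soft limit the command passes through unchanged. Between the
    soft and hard limits it is scaled linearly toward zero. At or above the
    hard limit the command is zeroed. Limits are illustrative assumptions.
    """
    if hard_limit_n <= soft_limit_n:
        raise ValueError("hard_limit_n must be greater than soft_limit_n")
    if measured_force_n <= soft_limit_n:
        scale = 1.0
    elif measured_force_n >= hard_limit_n:
        scale = 0.0
    else:
        scale = (hard_limit_n - measured_force_n) / (hard_limit_n - soft_limit_n)
    return [v * scale for v in cmd_velocity]
```

A layer like this sits between the policy and the robot, so it protects the hardware regardless of which learning algorithm produced the command.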

New Metrics: MRU and MTU

ENPIRE's scalability depends on the size of the Agent team and computing resources, but here, the truly scarce resource is not GPUs, but robot time.

When the research team provided Agents with 8 robots instead of 1, the time needed for the pin task to achieve near-perfect performance was reduced from over 1.5 hours to about 40 minutes. These Agents coordinate via Git: sharing code, discarding suboptimal ideas, and autonomously selecting each other's best-performing runs.
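The reported times allow a rough back-of-the-envelope calculation. The sketch below takes 90 minutes as a lower bound for "over 1.5 hours" and 40 minutes for "about 40 minutes"; both inputs are approximations of the article's figures, and the function names are hypothetical.

```python
def speedup(t_single_min: float, t_parallel_min: float) -> float:
    """Wall-clock speedup of the parallel setup over a single robot."""
    return t_single_min / t_parallel_min


def parallel_efficiency(t_single_min: float, t_parallel_min: float,
                        n_robots: int) -> float:
    """Speedup divided by the number of robots (1.0 would be linear scaling)."""
    return speedup(t_single_min, t_parallel_min) / n_robots


s = speedup(90, 40)
e = parallel_efficiency(90, 40, 8)
print(f"speedup of at least {s:.2f}x, efficiency of at least {e:.0%}")
```

Under these assumptions, 8 robots give a speedup of at least 2.25x, which is well short of linear scaling. Wall-clock time also includes the time Agents spend thinking, so this is not a pure measure of robot parallelism.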

This points to a larger shift: robotics research is becoming a matter of environment design, that is, building environments where coding Agents can conduct automated research. Algorithmic work is shifting to a higher layer: constructing feedback loops that Agents can autonomously close.

And this loop compounds continuously: a skill mastered by an Agent today becomes the building block for constructing and resetting environments for more difficult tasks tomorrow. Capabilities bootstrap new capabilities.

Under this paradigm, the true hard constraint is the budget for real-world interaction.

Therefore, the research team proposed two metrics:

  • Mean Robot Utilization (MRU): The proportion of time robots are actually running experiments relative to total elapsed real-world time.
  • Mean Token Utilization (MTU): Measures the efficiency of Agents in converting tokens into research progress.

In their experiments, MRU consistently remained below 50%. That means the robots were idle more than half the time, waiting for the Agents to think. Therefore, a better harness and faster models directly translate into tangible benefits.
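MRU as defined above can be computed directly from experiment logs. The following is a minimal sketch that assumes each robot logs the start and end times of its experiment runs. The function name, the log format, and the choice to average over all robots are assumptions. MTU is not sketched because the article gives no formula for it.

```python
def mean_robot_utilization(busy_intervals, wall_clock_s, n_robots):
    """Fraction of total robot time spent running experiments.

    busy_intervals maps a robot id to a list of (start_s, end_s) tuples,
    assumed not to overlap for the same robot.
    """
    total_busy = sum(
        end - start
        for intervals in busy_intervals.values()
        for start, end in intervals
    )
    return total_busy / (wall_clock_s * n_robots)


# Toy log: two robots observed over one hour of wall-clock time.
log = {
    "robot_a": [(0, 900), (1800, 2700)],  # 1800 s busy
    "robot_b": [(600, 1500)],             # 900 s busy
}
mru = mean_robot_utilization(log, wall_clock_s=3600, n_robots=2)
print(f"MRU = {mru:.1%}")
```

In this toy log the robots are busy for 2700 of 7200 available robot-seconds, giving an MRU of 37.5%, which falls in the below-50% range the team reports.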

PushT is a long-standing robot manipulation benchmark. Typically, completing this task requires extensive human demonstration data plus hours of behavior cloning training.

However, they observed that Codex, Claude Code, and Kimi Code all "solved" this task in under 2 hours using a rule-based heuristic approach: without neural networks, without training, and without relying on any human data.

To enable more people to experiment with automated research in the physical world at home, they developed a full-stack system based on the LeRobot SO-101 kit and NVIDIA Jetson Thor. This system can perform the PushT task.

Reference Links:

https://x.com/_wenlixiao/status/2066913334994358342

https://x.com/DrJimFan/status/2066921736369766762

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Yang Wen

Related Questions

Q: What is the key achievement of the ENPIRE project as described in the article?

A: The ENPIRE project by NVIDIA's GEAR lab has, for the first time, successfully implemented automated research on physical robot hardware. This involved 8 Codex Agents autonomously driving a closed-loop process to solve complex, high-precision dexterous manipulation tasks like tying zip ties and organizing a pin box, achieving a success rate of 99% with minimal human intervention.

Q: What are the four core modules that make up the ENPIRE system's closed-loop architecture?

A: The four core modules are: the Environment (EN) module for automatic resetting and validation, the Policy Improvement (PI) module for initiating policy optimization, the Rollout (R) module for evaluating policies on single or multiple robots, and the Evolution (E) module where coding agents analyze logs, consult literature, and improve infrastructure and code to address failure modes.

Q: What key insight regarding task difficulty did the researchers discover, and how did ENPIRE leverage it?

A: A key observation was that resetting a robot environment is often easier than completing the main task itself. ENPIRE leverages this by first having agents build an automatic reset environment (often a simple pick-and-place task) using Code-as-Policy. This automated reset capability is crucial for enabling continuous, unattended experimentation.

Q: According to the article, what new metrics were proposed to measure the efficiency of this automated research paradigm, and what does a low MRU indicate?

A: Two new metrics were proposed: Mean Robot Utilization (MRU), which measures the proportion of time robots are actively running experiments, and Mean Token Utilization (MTU), which measures how efficiently agent tokens are converted into research progress. In their experiments, MRU was below 50%, indicating robots spent over half their time idle, waiting for the agents to think, highlighting a bottleneck.

Q: What future vision did Jim Fan, leader of the NVIDIA GEAR lab, describe for the ENPIRE system?

A: Jim Fan's stated future goal is for the lab team to be able to go on vacation comfortably, with the system running so autonomously that even NVIDIA's CEO, Jensen Huang, would be unaware that the laboratory continues to operate and conduct research on its own.
