NVIDIA Team Enables Programming Agent to Take Over Real Robot Experiments, Achieves 99% Success Rate

Published by marsbit on 2026-06-18; updated 2026-06-18

Summary

NVIDIA's ENPIRE project demonstrates fully automated robotic research: AI agents, given only high-level goals, autonomously manage the entire loop, from literature search and code development to training, deployment, and hardware iteration, on a fleet of physical robots. The system achieves 99% success rates on dexterous real-world tasks like cable tying and peg sorting. Key insights include the discovery of a "physical scaling law," where more parallel robots speed up task resolution, and the observation that resetting an environment is often easier than the main task. The framework introduces metrics like Mean Robot Utilization (MRU) to measure efficiency; in the experiments MRU stayed below 50%, meaning robots sat idle more than half the time while waiting for agent decisions. The long-term vision is a lab that runs autonomously, even without human oversight. The team plans to open-source the project, allowing developers to build similar systems.

Automated research has truly stepped out of the code sandbox and into the real physical world.

Recently, Jim Fan, lead of NVIDIA's GEAR lab, introduced a new project called ENPIRE. This marks their first implementation of automated research on robot hardware.

They placed 8 Codex Agents into a robot fleet, allocated GPU computing power and ample token budgets, and gave a simple goal: solve the task as quickly as possible, keep the robots busy but safe, and avoid wasting computing power.

After that, human intervention was largely withdrawn. The Agents autonomously drove the entire closed loop: automatic scene resetting, literature review, idea implementation and infrastructure setup, policy training and deployment, self-verification, log analysis, and code improvement. They iterated continuously until high-precision dexterous tasks, such as fastening cable ties, organizing pins into a box, and installing GPUs, were reliably completed on real hardware.

They also observed a "physical scaling law": increasing the number of parallel robots (e.g., from a few to 8) significantly sped up task resolution.

Currently, part of the lab's systems can perform self-iteration overnight without human intervention, with researchers only needing to check reports in the morning.

Jim Fan stated that the future goal is to allow team members to go on vacation with peace of mind, with even NVIDIA CEO Jensen Huang unaware that the lab is still running autonomously.

The team plans to fully open-source ENPIRE, potentially enabling regular developers to set up similar autonomous robot research systems at home.

Project address: https://research.nvidia.com/labs/gear/enpire/

ENPIRE System Architecture: Four Modules Forming a Closed Loop

ENPIRE is a framework system designed for coding Agents, constructing a repeatable physical feedback loop through four core modules: the Environment module (EN) handles automatic resetting and verification; the Policy Improvement module (PI) initiates policy optimization; the Rollout module (R) supports policy evaluation on single or multiple robots in parallel; and the Evolution module (E) enables coding Agents to analyze logs, review literature, and improve training infrastructure and algorithm code to address failure modes.
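As a rough illustration of how the four modules could chain into one loop, here is a minimal Python sketch. All names (`run_closed_loop`, `reset_and_verify`, `improve_policy`, `rollout`, `evolve`, `LoopResult`), the episode counts, and the stopping rule are assumptions made for this example, not ENPIRE's actual API; the article does not describe the real interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    iterations: int
    success_rate: float
    history: List[float] = field(default_factory=list)


def run_closed_loop(
    reset_and_verify: Callable[[], bool],         # EN: reset the scene, True if the reset checks out
    improve_policy: Callable[[dict], dict],       # PI: return an updated policy
    rollout: Callable[[dict], bool],              # R: run one episode, True on task success
    evolve: Callable[[dict, List[bool]], dict],   # E: revise the recipe based on episode outcomes
    policy: dict,
    episodes_per_iter: int = 20,
    target: float = 0.99,
    max_iters: int = 50,
) -> LoopResult:
    """Hypothetical EN -> PI -> R -> E loop that stops once the target success rate is hit."""
    history: List[float] = []
    for iteration in range(1, max_iters + 1):
        policy = improve_policy(policy)
        outcomes: List[bool] = []
        for _ in range(episodes_per_iter):
            if not reset_and_verify():
                continue  # skip episodes whose reset could not be verified
            outcomes.append(rollout(policy))
        rate = sum(outcomes) / len(outcomes) if outcomes else 0.0
        history.append(rate)
        if rate >= target:
            return LoopResult(iteration, rate, history)
        policy = evolve(policy, outcomes)
    return LoopResult(max_iters, history[-1] if history else 0.0, history)


# Toy stand-ins so the sketch runs without hardware (purely illustrative).
demo = run_closed_loop(
    reset_and_verify=lambda: True,
    improve_policy=lambda p: {"skill": p["skill"] + 0.25},
    rollout=lambda p: p["skill"] >= 1.0,
    evolve=lambda p, outcomes: p,
    policy={"skill": 0.0},
)
print(demo.iterations, demo.success_rate)  # 4 1.0
```

In a real deployment each callable would wrap robot hardware, a training job, or a coding Agent session; here they are trivial lambdas only so the control flow can be followed.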

This closed-loop system transforms real-world robot learning into an Agent-managed, controllable optimization process, thereby minimizing manual input while supporting fair ablation experiments across different training recipes and Agent variants.

Supported by ENPIRE, cutting-edge programming Agents can autonomously develop policies and achieve a 99% success rate on challenging real-world dexterous manipulation tasks like PushT, organizing pins into a pin box, and cutting cable ties with a cutter.

Key Finding: Resetting the Environment is Often Easier Than Completing the Task Itself

One key observation: for many robot tasks, resetting the environment is often easier than completing the task itself.

Therefore, ENPIRE's approach is to first let the Agent build an automatic resetting environment via Code-as-Policy. Often, the so-called reset is essentially a pick-and-place task, solvable by Cap-X.

Subsequently, the agent writes a heuristic rule-based reward function. The research team then places this environment into a sandbox and initiates automated research by the Agent centered around scoring.
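The article does not show what such a reward function looks like. A minimal sketch of one possibility, for a planar object that must reach a goal pose, is given below. The function name `pose_reward`, the tolerances, and the shaping constants are all hypothetical choices for illustration, not values from ENPIRE.

```python
import math
from typing import Tuple


def pose_reward(
    x: float,
    y: float,
    theta: float,
    goal: Tuple[float, float, float] = (0.0, 0.0, 0.0),
    pos_tol: float = 0.01,                 # metres (assumed tolerance)
    ang_tol: float = math.radians(5.0),    # radians (assumed tolerance)
) -> Tuple[float, bool]:
    """Rule-based score in (0, 1] plus a binary success flag for a planar pose."""
    gx, gy, gtheta = goal
    pos_err = math.hypot(x - gx, y - gy)
    # Wrap the angular difference into [-pi, pi) before taking its magnitude.
    ang_err = abs((theta - gtheta + math.pi) % (2.0 * math.pi) - math.pi)
    success = pos_err <= pos_tol and ang_err <= ang_tol
    # Dense shaping: each term decays from 0.5 toward 0 as its error grows.
    score = 0.5 * math.exp(-pos_err / 0.1) + 0.5 * math.exp(-ang_err / 0.5)
    return score, success


print(pose_reward(0.0, 0.0, 0.0))      # (1.0, True)
print(pose_reward(0.5, 0.0, 0.0)[1])   # False
```

A hand-written rule like this gives the Agent a scalar to optimize without any learned reward model, which is the property the passage above relies on.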

This also echoes Karpathy's definition of automated research: automated research here is not simply tuning a hyperparameter or modifying a small piece of code. The Agent explores different paradigms from the internet and rewrites anything that could potentially boost performance, including algorithms, training objectives, and even the data loader.

In the pin task, one Agent even wrote its own contact force safety controller, which proved more effective than merely adjusting several reinforcement learning parameters.

New Metrics MRU and MTU

ENPIRE's scalability depends on the size of the Agent team and computing resources, but here, the truly scarce resource is not GPUs, but robot time.

When the research team provided Agents with 8 robots instead of 1, the time needed for the pin task to achieve near-perfect performance was reduced from over 1.5 hours to about 40 minutes. These Agents coordinate via Git: sharing code, discarding suboptimal ideas, and autonomously selecting each other's best-performing runs.
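The two timings quoted above imply a sub-linear speedup. The short calculation below treats "over 1.5 hours" as 90 minutes and "about 40 minutes" as 40 minutes; both roundings are assumptions, so the results are approximate. The helper names are illustrative.

```python
def speedup(t_single_min: float, t_parallel_min: float) -> float:
    """Ratio of single-robot time to multi-robot time."""
    return t_single_min / t_parallel_min


def parallel_efficiency(t_single_min: float, t_parallel_min: float, n_robots: int) -> float:
    """Speedup divided by robot count; 1.0 would mean perfectly linear scaling."""
    return speedup(t_single_min, t_parallel_min) / n_robots


s = speedup(90, 40)
e = parallel_efficiency(90, 40, 8)
print(s)  # 2.25
print(e)  # 0.28125
```

Under these assumed figures, eight robots finish roughly 2.25 times faster than one, or about 28% of ideal linear scaling. That is consistent with the article's point that robots spend much of their time waiting on the Agents.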

This points to a larger shift: robotics research is becoming a matter of environment design, that is, building environments where coding Agents can conduct automated research. Algorithmic work is shifting to a higher layer, toward constructing feedback loops that Agents can autonomously close.

And this loop compounds continuously: a skill mastered by an Agent today becomes the building block for constructing and resetting environments for more difficult tasks tomorrow. Capabilities bootstrap new capabilities.

Under this paradigm, the true hard constraint is the budget for real-world interaction.

Therefore, the research team proposed two metrics:

Mean Robot Utilization (MRU): The proportion of time robots are actually running experiments relative to total elapsed real-world time.
Mean Token Utilization (MTU): Measures the efficiency of Agents in converting tokens into research progress.
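The article gives MRU only in words, so the following is one plausible reading rather than the team's exact formula: per-robot busy time divided by the wall-clock window, averaged over robots. The function name, the interval representation, and the sample numbers are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) of one experiment run, in minutes


def mean_robot_utilization(
    busy: Dict[str, List[Interval]], start: float, end: float
) -> float:
    """Average fraction of the [start, end] window each robot spent running experiments.

    Assumes the intervals listed for a single robot do not overlap each other.
    """
    if end <= start:
        raise ValueError("end must be after start")
    if not busy:
        raise ValueError("need at least one robot")
    window = end - start
    utilizations = []
    for intervals in busy.values():
        active = 0.0
        for s, e in intervals:
            s, e = max(s, start), min(e, end)  # clip to the measurement window
            if e > s:
                active += e - s
        utilizations.append(active / window)
    return sum(utilizations) / len(utilizations)


# Made-up log: two robots observed over a 60-minute window.
log = {
    "robot_a": [(0.0, 15.0), (30.0, 45.0)],  # 30 of 60 minutes busy
    "robot_b": [(10.0, 25.0)],               # 15 of 60 minutes busy
}
print(mean_robot_utilization(log, 0.0, 60.0))  # 0.375
```

MTU is described too loosely in the article ("research progress" per token) to sketch without inventing a definition, so it is left out here.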

In their experiments, MRU consistently remained below 50%. That means the robots were idle more than half the time, waiting for the Agents to think. A better harness and faster models would therefore translate directly into tangible benefits.

PushT is a long-standing robot manipulation benchmark. Typically, completing this task requires extensive human demonstration data plus hours of behavior cloning training.

However, they observed that Codex, Claude Code, and Kimi Code all "solved" this task in under 2 hours using a rule-based heuristic approach: without neural networks, without training, and without relying on any human data.

To enable more people to experiment with automated research in the physical world at home, they developed a full-stack system based on the @LeRobotHF SO-101 kit + NVIDIA Jetson Thor. This system can perform the PushT task.

Reference Links:

https://x.com/_wenlixiao/status/2066913334994358342

https://x.com/DrJimFan/status/2066921736369766762

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Yang Wen

Related Questions

Q: What is the key achievement of the ENPIRE project as described in the article?

A: The ENPIRE project by NVIDIA's GEAR lab has, for the first time, successfully implemented automated research on physical robot hardware. This involved 8 Codex Agents autonomously driving a closed-loop process to solve complex, high-precision dexterous manipulation tasks like tying zip ties and organizing a pin box, achieving a success rate of 99% with minimal human intervention.

Q: What are the four core modules that make up the ENPIRE system's closed-loop architecture?

A: The four core modules are: the Environment (EN) module for automatic resetting and validation, the Policy Improvement (PI) module for initiating policy optimization, the Rollout (R) module for evaluating policies on single or multiple robots, and the Evolution (E) module where coding agents analyze logs, consult literature, and improve infrastructure and code to address failure modes.

Q: What key insight regarding task difficulty did the researchers discover, and how did ENPIRE leverage it?

A: A key observation was that resetting a robot environment is often easier than completing the main task itself. ENPIRE leverages this by first having agents build an automatic reset environment (often a simple pick-and-place task) using Code-as-Policy. This automated reset capability is crucial for enabling continuous, unattended experimentation.

Q: According to the article, what new metrics were proposed to measure the efficiency of this automated research paradigm, and what does a low MRU indicate?

A: Two new metrics were proposed: Mean Robot Utilization (MRU), which measures the proportion of time robots are actively running experiments, and Mean Token Utilization (MTU), which measures how efficiently agent tokens are converted into research progress. In their experiments, MRU was below 50%, indicating robots spent over half their time idle, waiting for the agents to think, highlighting a bottleneck.

Q: What future vision did Jim Fan, leader of the NVIDIA GEAR lab, describe for the ENPIRE system?

A: Jim Fan's stated future goal is for the lab team to be able to go on vacation comfortably, with the system running so autonomously that even NVIDIA's CEO, Jensen Huang, would be unaware that the laboratory continues to operate and conduct research on its own.
