NVIDIA Team Enables Programming Agent to Take Over Real Robot Experiments, Achieves 99% Success Rate

marsbit | Published on 2026-06-18 | Last updated on 2026-06-18

Abstract

NVIDIA's ENPIRE project demonstrates fully automated robotic research where AI agents, given only high-level goals, autonomously manage the entire loop, from literature search and code development to training, deployment, and hardware iteration, on a fleet of physical robots. The system achieves 99% success rates on dexterous real-world tasks like cable tying and peg sorting. Key insights include the discovery of a "physical scaling law" where more parallel robots speed up task resolution, and the observation that resetting an environment is often easier than the main task. The framework introduces metrics like Mean Robot Utilization (MRU) to measure efficiency; in the experiments, robots were idle more than half the time, waiting for agent decisions. The long-term vision is a lab that runs autonomously, even without human oversight. The team plans to open-source the project, allowing developers to build similar systems.

Automated research has truly stepped out of the code sandbox and into the real physical world.

Recently, Jim Fan, lead of NVIDIA's GEAR lab, introduced a new project called ENPIRE. This marks their first implementation of automated research on robot hardware.

They placed 8 Codex Agents into a robot fleet, allocated GPU computing power and ample token budgets, and gave a simple goal: solve the task as quickly as possible, keep the robots busy but safe, and avoid wasting computing power.

After that, humans largely stepped back. The Agents drove the entire closed loop on their own: automatic scene resetting, literature review, idea implementation and infrastructure setup, policy training and deployment, self-verification, log analysis, and code improvement. They iterated until high-precision dexterous tasks, such as fastening cable ties, organizing pins into a box, and installing GPUs, were completed reliably on real hardware.

They also observed a "physical scaling law": increasing the number of parallel robots (e.g., from a few to 8) significantly sped up task resolution.

Currently, part of the lab's systems can perform self-iteration overnight without human intervention, with researchers only needing to check reports in the morning.

Jim Fan said the future goal is for team members to be able to go on vacation with peace of mind while the lab keeps running autonomously, without even NVIDIA CEO Jensen Huang noticing.

The team plans to fully open-source ENPIRE, which could let regular developers set up similar autonomous robot research systems at home.

Project address: https://research.nvidia.com/labs/gear/enpire/

ENPIRE System Architecture: Four Modules Forming a Closed Loop

ENPIRE is a framework system designed for coding Agents, constructing a repeatable physical feedback loop through four core modules: the Environment module (EN) handles automatic resetting and verification; the Policy Improvement module (PI) initiates policy optimization; the Rollout module (R) supports policy evaluation on single or multiple robots in parallel; and the Evolution module (E) enables coding Agents to analyze logs, review literature, and improve training infrastructure and algorithm code to address failure modes.

This closed-loop system transforms real-world robot learning into an Agent-managed, controllable optimization process, thereby minimizing manual input while supporting fair ablation experiments across different training recipes and Agent variants.
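The four-module loop described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not ENPIRE's actual code: the `Policy` and `Loop` classes, the single `skill` number standing in for a policy, and the update rule are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical stand-in for a policy: one number for its true success rate."""
    skill: float = 0.2


@dataclass
class Loop:
    policy: Policy = field(default_factory=Policy)
    history: list = field(default_factory=list)

    def reset_and_verify(self) -> bool:
        # EN: reset the scene and verify it is ready (always succeeds in this toy).
        return True

    def rollout(self, episodes: int, rng: random.Random) -> float:
        # R: evaluate the current policy and return the measured success rate.
        successes = 0
        for _ in range(episodes):
            if self.reset_and_verify() and rng.random() < self.policy.skill:
                successes += 1
        return successes / episodes

    def improve_policy(self) -> None:
        # PI: one round of policy optimization (toy update that halves the gap to 1.0).
        self.policy.skill += 0.5 * (1.0 - self.policy.skill)

    def evolve(self, success_rate: float) -> None:
        # E: a real agent would analyze logs and rewrite code; here we only record the result.
        self.history.append(success_rate)


def run(target: float = 0.99, max_iters: int = 20, seed: int = 0) -> Loop:
    """Iterate rollout -> evolve -> improve until the target rate or the iteration cap."""
    rng = random.Random(seed)
    loop = Loop()
    for _ in range(max_iters):
        rate = loop.rollout(episodes=200, rng=rng)
        loop.evolve(rate)
        if rate >= target:
            break
        loop.improve_policy()
    return loop
```

Calling `run()` returns the loop object, whose `history` list holds the measured success rate of each iteration. The point of the sketch is the control flow: every iteration ends in a measurement on the (simulated) hardware, and that measurement decides whether the loop continues.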

Supported by ENPIRE, frontier coding Agents can autonomously develop policies and achieve a 99% success rate on challenging real-world dexterous manipulation tasks like PushT, organizing pins into a pin box, and cutting cable ties with a cutter.

Key Finding: Resetting the Environment is Often Easier Than Completing the Task Itself

One key observation: for many robot tasks, resetting the environment is easier than completing the task itself.

Therefore, ENPIRE's approach is to first let the Agent build an automatic resetting environment via Code-as-Policy. Often, the so-called reset is essentially a pick-and-place task, solvable by Cap-X.

Subsequently, the Agent writes a heuristic, rule-based reward function. The research team then places this environment into a sandbox and lets the Agent begin automated research aimed at improving that score.

This also echoes Karpathy's definition of automated research: it is not simply tuning a hyperparameter or modifying a small piece of code. The Agent explores different paradigms found on the internet and rewrites anything that could boost performance, including algorithms, training objectives, and even the data loader.
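The article does not show what such a reward function looks like. As a rough illustration only, here is a hypothetical rule-based reward for a PushT-style task, where the object's planar pose (x, y, theta) is compared with a goal pose. The function name, tolerances, and shaping term are assumptions, not the rewards the Agents actually wrote.

```python
import math


def heuristic_reward(x, y, theta, goal=(0.0, 0.0, 0.0),
                     pos_tol=0.01, ang_tol=math.radians(5)):
    """Return (reward, success) for an object pose; all thresholds are illustrative."""
    gx, gy, gtheta = goal
    # Position error in meters.
    pos_err = math.hypot(x - gx, y - gy)
    # Angle error wrapped into [0, pi], so 359 degrees counts as 1 degree off.
    diff = theta - gtheta
    ang_err = abs(math.atan2(math.sin(diff), math.cos(diff)))
    # Rule-based success check: both errors within tolerance.
    success = pos_err <= pos_tol and ang_err <= ang_tol
    # Dense shaping term in (0, 1]: equals 1 at the goal and decays with error.
    shaped = math.exp(-(pos_err / 0.1 + ang_err / math.pi))
    return (1.0 if success else 0.0) + shaped, success
```

A reward of this kind needs no learned components, which is what makes it cheap for an agent to write and easy to audit. In practice the pose would have to come from a perception system, which this sketch leaves out.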

In the pin task, one Agent even wrote its own contact force safety controller, which proved more effective than merely adjusting several reinforcement learning parameters.

New Metrics: MRU and MTU

ENPIRE's scalability depends on the size of the Agent team and computing resources, but here, the truly scarce resource is not GPUs, but robot time.

When the research team provided Agents with 8 robots instead of 1, the time needed for the pin task to achieve near-perfect performance was reduced from over 1.5 hours to about 40 minutes. These Agents coordinate via Git: sharing code, discarding suboptimal ideas, and autonomously selecting each other's best-performing runs.
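To put the reported numbers in perspective, the snippet below computes the speedup and per-robot efficiency they imply. It takes "over 1.5 hours" as 90 minutes and "about 40 minutes" as 40, so the result is a rough lower bound on the speedup rather than a measured value; the function name is an illustrative choice.

```python
def parallel_speedup(t_single_min, t_parallel_min, n_robots):
    """Return (speedup, efficiency) where efficiency = speedup / n_robots."""
    if t_single_min <= 0 or t_parallel_min <= 0 or n_robots <= 0:
        raise ValueError("times and robot count must be positive")
    speedup = t_single_min / t_parallel_min
    return speedup, speedup / n_robots


# Figures from the article, rounded: 90 min on 1 robot, 40 min on 8 robots.
speedup, efficiency = parallel_speedup(90, 40, 8)
print(f"speedup: {speedup:.2f}x, per-robot efficiency: {efficiency:.0%}")
```

With these rounded inputs, the speedup is about 2.25x on 8 robots, or roughly 28% efficiency per robot. So the scaling reported here is real but sub-linear.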

This points to a larger shift: robotics research is becoming a matter of environment design, that is, building environments where coding Agents can conduct automated research. Algorithmic work is moving up a layer, toward constructing feedback loops that Agents can close on their own.

And this loop compounds continuously: a skill mastered by an Agent today becomes the building block for constructing and resetting environments for more difficult tasks tomorrow. Capabilities bootstrap new capabilities.

Under this paradigm, the true hard constraint is the budget for real-world interaction.

Therefore, the research team proposed two metrics:

Mean Robot Utilization (MRU): The proportion of time robots are actually running experiments relative to total elapsed real-world time.
Mean Token Utilization (MTU): Measures the efficiency of Agents in converting tokens into research progress.
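The MRU definition above translates directly into a small calculation. The sketch below assumes each robot's activity is logged as non-overlapping (start, end) intervals in seconds within a shared wall-clock window; that logging format and the function name are assumptions, not ENPIRE's implementation. MTU is left out because the article gives no formula for it.

```python
def mean_robot_utilization(busy_intervals, wall_clock_seconds):
    """Average, over robots, of (time spent running experiments) / (elapsed time).

    busy_intervals: one list per robot of non-overlapping (start, end) pairs in seconds.
    """
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    if not busy_intervals:
        raise ValueError("need at least one robot")
    per_robot = []
    for intervals in busy_intervals:
        busy = sum(end - start for start, end in intervals)
        per_robot.append(busy / wall_clock_seconds)
    return sum(per_robot) / len(per_robot)


# Two robots over one hour: one busy for 30 minutes in total, one for 15 minutes.
logs = [
    [(0, 900), (1800, 2700)],   # robot A: 1800 s busy
    [(600, 1500)],              # robot B: 900 s busy
]
print(mean_robot_utilization(logs, 3600))
```

In this example, robot A is at 50% and robot B at 25%, so the fleet's MRU is 0.375. The remaining time is what the robots spend waiting, for instance while an Agent reads logs or edits code.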

In their experiments, MRU consistently remained below 50%. That means the robots were idle more than half the time, waiting for the Agents to think. A better harness and faster models would therefore translate directly into tangible gains.

PushT is a long-standing robot manipulation benchmark. Typically, completing this task requires extensive human demonstration data plus hours of behavior cloning training.

However, they observed that Codex, Claude Code, and Kimi Code all "solved" this task in under 2 hours using a rule-based heuristic approach: without neural networks, without training, and without relying on any human data.

To enable more people to experiment with automated research in the physical world at home, they developed a full-stack system based on the @LeRobotHF SO-101 kit + NVIDIA Jetson Thor. This system can perform the PushT task.

Reference Links:

https://x.com/_wenlixiao/status/2066913334994358342

https://x.com/DrJimFan/status/2066921736369766762

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Yang Wen
