NVIDIA's robot fleet trains itself via AI coding agents

Huma Shazia | 18 June 2026 at 6:47 am | 5 min read

Key Takeaways

Eight robot stations run by AI coding agents reached up to 99% success on manipulation tasks such as pin insertion and cable tie cutting
The system minimizes human oversight by having AI agents write their own reward functions and share discoveries via Git
Real-world testing remains the hardest challenge, with agents failing tasks they solved easily in simulation

A fleet of eight robots at NVIDIA's labs is teaching itself dexterous manipulation tasks with almost no human involvement. The ENPIRE project, a collaboration between NVIDIA, Carnegie Mellon University, and UC Berkeley, uses AI coding agents to run physical experiments, analyze results, and rewrite their own training code. On tasks like inserting pins and cutting cable ties, the system hit up to 99% success rates.

"We are effectively turning a hardware lab into a self-improving software pipeline where agents can iterate on physical experiments overnight," said Jim Fan, Director of AI at NVIDIA.

The bottleneck in robot training has always been human labor. Someone has to collect data, reset the scene after each attempt, and tweak algorithms when things go wrong. ENPIRE eliminates that bottleneck by handing the entire loop to AI agents.

How do AI coding agents teach robots new skills?

ENPIRE runs in two phases. In the first, an AI agent sets up the working environment with minimal human input. It defines safety boundaries, creates an automatic reset mechanism, and writes its own reward function to distinguish success from failure. The agent needs only a few minutes of example video showing both outcomes.

For a pin insertion task, the agent developed an evaluation check combining visual alignment, gripper height, and estimated force. For closing a cable tie, it used two camera angles to avoid false positives and pushed reaction time below 150 milliseconds. These evaluation tools get built once and reused without changes.
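
To make the idea concrete, here is a minimal sketch of a success check that combines the three signals named above. It is not ENPIRE's code: the `Observation` fields, the threshold values, and the function names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One snapshot at the end of an insertion attempt (all fields hypothetical)."""
    alignment_error_px: float  # pixel offset between pin and hole in the camera image
    gripper_height_mm: float   # gripper height above the board surface
    estimated_force_n: float   # estimated contact force at the gripper


# Illustrative thresholds only; the article does not report the real values.
MAX_ALIGNMENT_ERROR_PX = 5.0
MAX_SEATED_HEIGHT_MM = 12.0
MAX_SAFE_FORCE_N = 20.0


def pin_inserted(obs: Observation) -> bool:
    """Return True only when all three signals agree the pin is seated."""
    aligned = obs.alignment_error_px <= MAX_ALIGNMENT_ERROR_PX
    seated = obs.gripper_height_mm <= MAX_SEATED_HEIGHT_MM
    not_jammed = obs.estimated_force_n <= MAX_SAFE_FORCE_N
    return aligned and seated and not_jammed


def reward(obs: Observation) -> float:
    """Sparse reward: 1.0 on success, 0.0 otherwise."""
    return 1.0 if pin_inserted(obs) else 0.0
```

Requiring every signal to pass is one simple way to cut false positives: a pin that looks aligned on camera but sits too high still counts as a failure.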

The second phase is fully autonomous. The agent reads research papers, forms hypotheses, and edits training code directly. It picks between methods like behavior cloning, where the robot mimics human demonstrations, or reinforcement learning, where it improves through trial and error. The choice depends on real-world success signals, not human judgment.
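
As a rough illustration of letting measured outcomes drive that choice, the sketch below picks whichever method has the higher success rate across real-world trials. The method names, the trial data, and the selection rule are assumptions; the article does not describe how ENPIRE's agents actually weigh the evidence.

```python
from __future__ import annotations


def success_rate(outcomes: list[bool]) -> float:
    """Fraction of successful trials; 0.0 when no trials have been run."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def pick_method(trials: dict[str, list[bool]]) -> str:
    """Return the method name with the highest measured success rate.

    Ties go to the method listed first in the dictionary.
    """
    return max(trials, key=lambda name: success_rate(trials[name]))


# Hypothetical outcomes from physical trials (True means the task succeeded).
trials = {
    "behavior_cloning": [True, False, False, True],
    "reinforcement_learning": [True, True, False, True],
}
chosen = pick_method(trials)
```

With only a handful of trials per method, a raw comparison like this is noisy, so a real pipeline would likely want more attempts or a confidence margin before switching.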

Why use Git for robot coordination?

The fleet consists of eight dual-arm YAM robot stations, each with its own hardware, computer, and coding agent. The agents test different hypotheses simultaneously and share results only through Git, the version control system developers use for software.

This approach treats physical experiments like software engineering. When one agent discovers a breakthrough, successful training recipes spread across the entire fleet. Bad ideas get discarded without committee meetings.
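
One way to picture this workflow is a shared repository where each agent commits a small result file and adopts the best recipe it finds from its peers. The sketch below is an assumption throughout: the record fields, file path, and commit message format are hypothetical, and the Git commands are only assembled as argument lists, not executed.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class ExperimentRecord:
    """Result of one training recipe on one station (hypothetical schema)."""
    agent_id: str
    task: str
    recipe: str
    success_rate: float


def record_to_json(record: ExperimentRecord) -> str:
    """Serialize a record so it can be written to a file and committed."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def share_commands(path: str, record: ExperimentRecord) -> list[list[str]]:
    """Build standard Git commands that would publish a result file."""
    message = (
        f"{record.task}: {record.recipe} reached "
        f"{record.success_rate:.0%} ({record.agent_id})"
    )
    return [
        ["git", "add", path],
        ["git", "commit", "-m", message],
        ["git", "pull", "--rebase"],
        ["git", "push"],
    ]


def best_recipe(records: list[ExperimentRecord], task: str) -> ExperimentRecord | None:
    """Pick the highest-scoring record for a task, or None if there is none."""
    matching = [r for r in records if r.task == task]
    return max(matching, key=lambda r: r.success_rate, default=None)
```

Each inner list could be passed to `subprocess.run(cmd, check=True)` inside a cloned repository. Because everything lives in commits, the history doubles as an audit log of which station tried what.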

The scaling benefits are measurable. On the Push-T test, where a robot slides a T-shaped block into a target position, going from one to eight agents cut the time to full success from five hours to two. For pin insertion, time dropped from over 90 minutes to roughly 40.
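
A quick calculation puts those figures in context. The sketch below computes speedup and parallel efficiency (speedup divided by agent count) from the reported times. It treats "over 90 minutes" and "roughly 40" as exactly 90 and 40, so the pin insertion numbers are approximate.

```python
def speedup(time_one_agent: float, time_n_agents: float) -> float:
    """How many times faster the larger fleet reached the goal."""
    return time_one_agent / time_n_agents


def parallel_efficiency(time_one_agent: float, time_n_agents: float, n_agents: int) -> float:
    """Speedup per agent; 1.0 would mean perfectly linear scaling."""
    return speedup(time_one_agent, time_n_agents) / n_agents


def report(name: str, t1: float, tn: float, n: int) -> str:
    """Format one line summarizing speedup and efficiency for a task."""
    return (
        f"{name}: {speedup(t1, tn):.2f}x speedup, "
        f"{parallel_efficiency(t1, tn, n):.0%} efficiency"
    )


print(report("Push-T", 5.0, 2.0, 8))           # Push-T: 2.50x speedup, 31% efficiency
print(report("Pin insertion", 90.0, 40.0, 8))  # Pin insertion: 2.25x speedup, 28% efficiency
```

Eight agents deliver roughly a 2.3x to 2.5x speedup, well short of the 8x that linear scaling would give.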

Which AI models performed best?

The researchers tested three coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6. Codex performed best in most cases. But the more interesting finding was the gap between simulation and reality.

On the Push-T test, all three agents solved the task in simulation. Two out of three failed in the real environment. The researchers blame unpredictable conditions: robot dynamics, friction, and object movement that simulations cannot fully capture.

In the RoboCasa simulation benchmark, ENPIRE beat both GR00T, an end-to-end vision-language-action model, and CaP-X, a tool-based approach without automated research capabilities.

What are the system's limits?

The study is candid about inefficiencies. Robots and compute sit partly idle because agents spend significant time reading logs, writing code, and waiting. Per-robot utilization falls as the fleet grows, as agents spend more time summarizing each other's results than running experiments.

Token costs also grow faster than performance gains. Larger fleets reach goals sooner but burn through far more compute budget to get there. The economics of scaling are not linear.

Still, learned skills do transfer. Experience from pin insertion helped the agents slot GPUs into motherboards with the robot arms, a task they were never explicitly trained on.

Logicity's Take

The real breakthrough here is not the 99% success rate on manipulation tasks. It is the automation of the scientific method itself. ENPIRE's agents read papers, form hypotheses, and run experiments. If this approach generalizes beyond manipulation, research labs may soon deploy AI agents that run physical experiments 24/7, testing ideas faster than any human team could. The token cost problem is temporary. The paradigm shift is not.

Frequently Asked Questions

What is NVIDIA's ENPIRE project?

ENPIRE is a research framework that uses AI coding agents to autonomously train robots on dexterous manipulation tasks. The system handles experiment setup, evaluation, and code optimization without continuous human oversight.

How do AI coding agents train robots without humans?

The agents write their own reward functions by analyzing example videos, then run experiments, check results, and rewrite training code based on what works. They share successful approaches through Git.

What success rates did the ENPIRE robots achieve?

The eight-robot fleet achieved up to 99% success on tasks like pin insertion and cable tie cutting. Scaling from one agent to eight cut the time to full success on pin insertion from over 90 minutes to roughly 40, and on Push-T from five hours to two.

Why do robots still struggle with real-world tasks?

Simulations cannot fully capture real-world variables like friction, robot dynamics, and unpredictable object movement. On the Push-T test, all three agents solved the task in simulation, but two of the three failed on physical hardware.

Which AI coding model performed best in the ENPIRE tests?

Codex with GPT-5.5 performed best in most cases, outperforming Claude Code with Opus 4.7 and Kimi Code with Kimi K2.6 on the dexterous manipulation tasks.

Source: The Decoder / Maximilian Schreiner