
Tool-as-Interface is a two-camera, video-only method that teaches robots to use tools, reporting a higher average success rate and shorter data-collection time than teleoperation-based baselines.

A research team from the University of Illinois, in collaboration with Columbia University and UT Austin, has unveiled a framework that trains robots to use tools by learning directly from ordinary human videos. The method reports a higher task success rate and faster data collection than teleoperation-based baselines, pointing to a lower-cost route for teaching dynamic skills.
In the approach, called Tool-as-Interface, the robot learns from data captured as two RGB camera views of a person performing a task. A 3D reconstruction model (MASt3R) builds the scene geometry, and 3D Gaussian splatting synthesises extra views to improve robustness.
The real magic happens when the human is removed from the video. Using Grounded-SAM, which combines an open-set object detector with the Segment Anything Model (SAM), the system tracks only the tool and its interaction with the scene and ignores the person holding it.
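To make the masking step concrete, here is a minimal sketch in Python with NumPy. It assumes a per-pixel boolean mask of the human is already available from a segmentation model. The function name `remove_human` and the choice to fill masked pixels with a constant are illustrative assumptions, since the article does not say how the masked pixels are handled, and no Grounded-SAM call is made here.

```python
import numpy as np


def remove_human(frame: np.ndarray, human_mask: np.ndarray, fill_value: int = 0) -> np.ndarray:
    """Return a copy of `frame` with every pixel flagged in `human_mask` overwritten.

    frame:      H x W x 3 image array.
    human_mask: H x W boolean array, True where a pixel belongs to the person.
                (Hypothetical input; assumed to come from a segmentation model.)
    fill_value: value written into masked pixels (an assumption for illustration).
    """
    if frame.shape[:2] != human_mask.shape:
        raise ValueError("mask and frame must share the same height and width")
    cleaned = frame.copy()            # leave the caller's frame untouched
    cleaned[human_mask] = fill_value  # boolean indexing selects the human pixels
    return cleaned


# Tiny 2 x 2 white image with the top-left pixel marked as "human".
frame = np.full((2, 2, 3), 255, dtype=np.uint8)
human_mask = np.array([[True, False], [False, False]])
cleaned = remove_human(frame, human_mask)
```

After this step, only the tool and scene pixels remain for the downstream stages to consume.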
The system then estimates the tool’s six-degree-of-freedom (6-DoF) pose, meaning its 3D position and orientation, for the robot to mimic, and learns a tool-centred policy, which supports cross-robot transfer.
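One way to see why a tool-centred representation can carry over between robots is to write the 6-DoF pose as a 4 x 4 homogeneous transform. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the tool is rigidly attached to the gripper with a known, fixed offset, and all function and variable names (`make_pose`, `tool_to_gripper_target`, `T_gripper_tool`) are hypothetical.

```python
import numpy as np


def make_pose(rotation: np.ndarray, translation) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def rot_z(theta: float) -> np.ndarray:
    """Rotation by `theta` radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def invert_pose(T: np.ndarray) -> np.ndarray:
    """Invert a rigid transform using R^T and -R^T t."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def tool_to_gripper_target(T_world_tool: np.ndarray, T_gripper_tool: np.ndarray) -> np.ndarray:
    """Gripper pose in the world frame that places the tool at `T_world_tool`.

    T_gripper_tool is the fixed pose of the tool in the gripper frame
    (assumed known and constant because the tool is rigidly held).
    """
    return T_world_tool @ invert_pose(T_gripper_tool)


# Desired tool pose: rotated 90 degrees about z, at (0.5, 0.0, 0.3) metres.
T_world_tool = make_pose(rot_z(np.pi / 2), [0.5, 0.0, 0.3])
# Assumed mounting: tool tip sits 0.1 m along the gripper's z axis.
T_gripper_tool = make_pose(np.eye(3), [0.0, 0.0, 0.1])

T_world_gripper = tool_to_gripper_target(T_world_tool, T_gripper_tool)
```

Under these assumptions, the desired tool pose stays the same for every robot and only the fixed offset `T_gripper_tool` changes, so the same tool trajectory can be retargeted to a different arm.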
The team validated the framework on five tasks that require speed or precision: hammering a nail, scooping a meatball, flipping food in a pan, balancing a wine bottle, and kicking a football into a goal. Across these tasks, the method achieved a 71% higher average success rate than diffusion policies trained on teleoperation data and reduced data-collection time by 77%. Some tasks were solved only by this framework in the reported tests.
The data pipeline uses commodity cameras and does not require robot-side operators or motion-capture rigs. That reduces setup complexity and can scale to demonstrations recorded outside the lab.
Limitations remain. The current system assumes a rigid tool fixed to the gripper and can suffer from pose-estimation errors; novel-view synthesis can degrade under large viewpoint changes. These constraints guide the next set of engineering targets.