While autonomous driving is complex, roboticist Ken Goldberg argues it's an easier problem than dexterous manipulation. Driving fundamentally involves avoiding contact with objects, whereas manipulation requires precisely controlled contact and interaction with them, a much harder challenge.
Ken Goldberg's company, Ambi Robotics, successfully uses simple suction cups for logistics. He argues that the industry's focus on human-like hands is misplaced, as simpler grippers are more practical, more reliable, and already handle demanding real-world tasks today.
Instead of creating bespoke self-driving kits for every car model, a humanoid robot can physically sit in any driver's seat and operate the controls. This concept, highlighted by George Hotz, bypasses proprietary vehicle systems and hardware lock-in, treating the car as a black box.
Leading roboticist Ken Goldberg clarifies that while legged robots show immense progress in navigation, fine motor skills for tasks like tying shoelaces are far beyond current capabilities. This is due to challenges in sensing and handling deformable, unpredictable objects in the real world.
The dream of a do-everything humanoid is a top-down approach that will take a long time. Roboticist Ken Goldberg argues for a bottom-up strategy: master specific, valuable tasks like folding clothes or making coffee reliably first. General intelligence will emerge from combining these skills over time.
The choice between simulation and real-world data depends on where a task's difficulty lies. For locomotion, the ground-contact physics is simple enough to simulate faithfully, so the hard part, learning complex reactive behavior, can be practiced cheaply and at scale in simulation. For manipulation, the situation is reversed: contact, friction, and object physics are very hard to simulate faithfully, so real-world data is favored even though the grasping behaviors themselves may be simpler.
Autonomous systems can perceive and react to dangers beyond human capability. One example is a Cybertruck autonomously accelerating to lessen the impact of a potential high-speed rear-end collision from a car its driver never even saw, a kind of predictive safety maneuver humans cannot replicate and a step beyond simple accident avoidance.
Ken Goldberg quantifies the challenge: the text data used to train LLMs would take a human 100,000 years to read. Equivalent data for robot manipulation (vision-to-control signals) doesn't exist online and must be generated from scratch, explaining the slower progress in physical AI.
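As a rough sanity check on that figure, here is a minimal back-of-envelope sketch in Python; the corpus size (~15 trillion tokens), tokens-to-words ratio, and reading speed are illustrative assumptions, not numbers taken from Goldberg.

```python
# Back-of-envelope check of the "100,000 years of reading" figure.
# All constants below are illustrative assumptions, not Goldberg's exact inputs.

TRAINING_TOKENS = 15e12      # assumed size of a modern LLM training corpus (~15T tokens)
WORDS_PER_TOKEN = 0.75       # rough conversion from tokens to English words
READING_SPEED_WPM = 250      # typical adult reading speed, words per minute

words = TRAINING_TOKENS * WORDS_PER_TOKEN
minutes = words / READING_SPEED_WPM
years_nonstop = minutes / (60 * 24 * 365)   # reading around the clock, no breaks

print(f"~{years_nonstop:,.0f} years of continuous reading")
# Prints roughly 86,000 years -- the same order of magnitude as the 100,000-year claim.
```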
Self-driving cars, a 20-year journey so far, are relatively simple robots: metal boxes on 2D surfaces designed *not* to touch things. General-purpose robots operate in complex 3D environments with the primary goal of *touching* and manipulating objects. This highlights the immense, often underestimated, physical and algorithmic challenges facing robotics.
The debate over putting cameras in a robot's palm is analogous to Tesla's refusal to use LIDAR. Ken Goldberg suggests that just as LIDAR helps with edge cases in driving, in-hand cameras provide crucial, low-cost data for manipulation. Musk's purist approach may be a self-imposed handicap in both domains.
Surgeons operating through robotic instruments perform intricate tasks without tactile feedback, relying instead on visual cues such as tissue deformation. This suggests robotics could achieve complex manipulation by advancing visual interpretation of physical interactions, bypassing the immense difficulty of building and integrating artificial touch sensors.