Leading roboticist Ken Goldberg clarifies that while legged robots show immense progress in navigation, fine motor skills for tasks like tying shoelaces are far beyond current capabilities. This is due to challenges in sensing and handling deformable, unpredictable objects in the real world.
Ken Goldberg's company, Ambi Robotics, successfully uses simple suction cups for logistics. He argues that the industry's focus on human-like hands is misplaced: simpler grippers are more practical, more reliable, and already capable of performing complex real-world tasks today.
While autonomous driving is complex, roboticist Ken Goldberg argues it's an easier problem than dexterous manipulation. Driving fundamentally involves avoiding contact with objects, whereas manipulation requires precisely controlled contact and interaction with them, a much harder challenge.
The dream of a do-everything humanoid is a top-down approach that will take a long time. Roboticist Ken Goldberg argues for a bottom-up strategy: master specific, valuable tasks like folding clothes or making coffee reliably first. General intelligence will emerge from combining these skills over time.
The robotics field has a scalable recipe for AI-driven manipulation, analogous to GPT, but has not yet turned that recipe into a polished, mass-market product the way ChatGPT packaged GPT. The current phase is about scaling data and refining systems to bridge that gap, not just discovering fundamental algorithms.
Ken Goldberg quantifies the challenge: the text data used to train LLMs would take a human 100,000 years to read. Equivalent data for robot manipulation (vision-to-control signals) doesn't exist online and must be generated from scratch, explaining the slower progress in physical AI.
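Goldberg's figure is easy to sanity-check with a back-of-envelope calculation; the corpus size, tokens-to-words ratio, and reading speed below are illustrative assumptions rather than his numbers, but they land in the same order of magnitude:

```python
# Back-of-envelope check of the "100,000 years of reading" claim.
# All figures below are illustrative assumptions, not sourced numbers.

TRAINING_TOKENS = 15e12      # assumed LLM corpus of ~15 trillion tokens
WORDS_PER_TOKEN = 0.75       # rough English tokens-to-words ratio
READING_SPEED_WPM = 250      # typical adult reading speed, words per minute

words = TRAINING_TOKENS * WORDS_PER_TOKEN
minutes = words / READING_SPEED_WPM
years = minutes / (60 * 24 * 365)  # reading nonstop, with no sleep

print(f"~{years:,.0f} years of continuous reading")
```

Under these assumptions the result is on the order of 10^5 years, consistent with the claim.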
World Labs co-founder Fei-Fei Li posits that spatial intelligence—the ability to reason and interact in 3D space—is a distinct and complementary form of intelligence to language. This capability is essential for tasks like robotic manipulation and scientific discovery that cannot be reduced to linguistic descriptions.
Self-driving cars, a 20-year journey so far, are relatively simple robots: metal boxes on 2D surfaces designed *not* to touch things. General-purpose robots operate in complex 3D environments with the primary goal of *touching* and manipulating objects. This highlights the immense, often underestimated, physical and algorithmic challenges facing robotics.
Surgeons perform intricate tasks without tactile feedback, relying on visual cues of tissue deformation. This suggests robotics could achieve complex manipulation by advancing visual interpretation of physical interactions, bypassing the immense difficulty of creating and integrating artificial touch sensors.
Moving a robot from a lab demo to a commercial system reveals that AI is just one component. Success depends heavily on traditional engineering for sensor calibration, arm accuracy, system speed, and reliability. These unglamorous details are critical for performance in the real world.
The "bitter lesson" (general methods plus scale beat hand-engineered ones) works for language because the training data (text) matches the required output (text). Robotics faces a critical misalignment: models are trained largely on passive web videos but must output physical actions in a 3D world. This data gap is a fundamental hurdle that pure scaling alone cannot solve.
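The alignment argument can be made concrete with a schematic comparison of the two training-example types; the class names and fields below are hypothetical illustrations, not any real dataset's schema:

```python
from dataclasses import dataclass, field
from typing import List

# Schematic illustration of the data-alignment gap (types are hypothetical).

@dataclass
class TextExample:
    prompt: str       # input: text
    completion: str   # target: text -- the same modality as the input

@dataclass
class ManipulationExample:
    frames: List[bytes]                             # input: camera images
    joint_torques: List[float] = field(default_factory=list)
    # target: motor commands -- a modality that web video never records

# Web text comes with its target column built in:
web_text = TextExample(prompt="The robot picked up the", completion=" cup.")

# A scraped video yields frames but no torques, so the target is missing:
web_video = ManipulationExample(frames=[b"<jpeg bytes>"])
print(web_video.joint_torques)  # empty: the action labels must be generated
```

The point of the sketch is that scaling up `web_video`-style data never fills in the `joint_torques` column, which is why robot action data must be collected or generated from scratch.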