A humanoid robot with 40 joints has more potential configurations than there are atoms in the observable universe (roughly 360^40, assuming one-degree resolution per joint). This combinatorial explosion makes it impossible to solve movement and interaction with traditional, hard-coded rules. Consequently, learning-based approaches such as neural networks are not just an optimization but a fundamental necessity.
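A quick back-of-the-envelope check of that arithmetic (a minimal sketch, assuming one-degree joint resolution and the commonly cited estimate of roughly 10^80 atoms in the observable universe):

```python
import math

# Illustrative assumptions: 40 independent joints, 360 discrete positions each
# (one-degree resolution), and ~10^80 atoms in the observable universe.
joints = 40
positions_per_joint = 360

log10_configurations = joints * math.log10(positions_per_joint)  # ~102.2
log10_atoms_universe = 80  # common order-of-magnitude estimate

print(f"~10^{log10_configurations:.0f} joint configurations "
      f"vs ~10^{log10_atoms_universe} atoms in the observable universe")
```

Even at this coarse discretization, the configuration count exceeds the atom count by more than twenty orders of magnitude, which is the point of the comparison.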
While language models understand the world through text, Demis Hassabis argues they lack an intuitive grasp of physics and spatial dynamics. He sees 'world models'—simulations that understand cause and effect in the physical world—as the critical technology needed to advance AI from digital tasks to effective robotics.
While LLMs dominate headlines, Dr. Fei-Fei Li argues that "spatial intelligence"—the ability to understand and interact with the 3D world—is the critical, underappreciated next step for AI. This capability is the linchpin for unlocking meaningful advances in robotics, design, and manufacturing.
Physical Intelligence demonstrated an emergent capability: once its robotics model reached a certain performance threshold, training on egocentric human video produced significant further improvement. This solves a major bottleneck by leveraging vast, existing video datasets instead of expensive, limited teleoperated data.
The adoption of powerful AI architectures like transformers in robotics was bottlenecked by data quality, not algorithmic invention. Only after data collection methods improved to capture more dexterous, high-fidelity human actions did these advanced models become effective, reversing the typical 'algorithm-first' narrative of AI progress.
Self-driving cars, a 20-year journey so far, are relatively simple robots: metal boxes on 2D surfaces designed *not* to touch things. General-purpose robots operate in complex 3D environments with the primary goal of *touching* and manipulating objects. This highlights the immense, often underestimated, physical and algorithmic challenges facing robotics.
The computer industry originally chose the path of the "hyper-literal mathematical machine" over a "human brain model" based on neural networks, an idea that had existed since the 1940s. The current AI wave represents the long-delayed success of that abandoned alternative path.
Human intelligence is multifaceted. While LLMs excel at linguistic intelligence, they lack spatial intelligence—the ability to understand, reason, and interact within a 3D world. This capability, crucial for tasks from robotics to scientific discovery, is the focus for the next wave of AI models.
Manipulating deformable objects like towels was long considered one of the final, hardest challenges in robotics due to their infinite variations. The fact that Figure's neural networks can now successfully fold laundry indicates that the core technological hurdles for truly general-purpose robots have been overcome.
The "bitter lesson" (scale and simple models win) works for language because training data (text) aligns with the output (text). Robotics faces a critical misalignment: it's trained on passive web videos but needs to output physical actions in a 3D world. This data gap is a fundamental hurdle that pure scaling cannot solve.
Unlike older robots, which required precise maps and explicit trajectory calculations, new robots draw on internet-scale common sense and learn motion by imitating humans or training in simulation. This combination has “wiped the slate clean” for what is possible in the field.