Neurobotics posits that true physical AI requires more than just vision-language models; it needs a "nervous system" and reflexes. They advocate for training robots in physical "gyms" to collect embodied data, arguing that complex physical tasks cannot be learned solely by watching videos.

Related Insights

A core controversy in robotics is whether to follow AI's "bitter lesson"—that general methods using massive data outperform systems with hand-coded knowledge. Many roboticists still argue for programming in physics for reliability, resisting a purely end-to-end learning approach that relies solely on data.

While language models understand the world through text, Demis Hassabis argues they lack an intuitive grasp of physics and spatial dynamics. He sees 'world models'—simulations that understand cause and effect in the physical world—as the critical technology needed to advance AI from digital tasks to effective robotics.

In robotics, purely imitating human actions is insufficient. A model trained this way doesn't learn how to recover from inevitable errors. Comma AI solves this by training its models in a simulator where they are forced to learn recovery paths from off-course situations, a critical step for real-world deployment.
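The gap between imitation and recovery can be sketched in a toy setting. This is an illustration only, not Comma AI's training setup: the one-dimensional "lane offset" world, the lookup-table policy, and the names `expert_action`, `collect`, and `rollout` are all hypothetical. A policy cloned from on-course demonstrations has no entry for off-course states, while one trained on simulator episodes that start off course does.

```python
def expert_action(offset: int) -> int:
    """Hypothetical expert: steer one step back toward the lane center (offset 0)."""
    if offset > 0:
        return -1
    if offset < 0:
        return 1
    return 0


def collect(start_offsets, steps=10):
    """Roll out the expert in the toy simulator and record state -> action pairs."""
    table = {}
    for offset in start_offsets:
        for _ in range(steps):
            action = expert_action(offset)
            table[offset] = action
            offset += action
    return table


def rollout(policy, offset, steps=10):
    """Run a lookup-table policy; a state never seen in training gets no correction."""
    for _ in range(steps):
        offset += policy.get(offset, 0)
    return offset


# Demonstrations that only ever start near the lane center.
on_course_only = collect(start_offsets=[-1, 0, 1])

# Simulator episodes deliberately started off course, so recovery is demonstrated.
with_recovery = collect(start_offsets=range(-5, 6))

print(rollout(on_course_only, 4))  # stays at 4: no recovery behavior was learned
print(rollout(with_recovery, 4))   # returns to 0
```

The lookup table is a deliberate simplification: it makes "never saw this state" explicit, whereas a learned model would extrapolate in less predictable ways.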

Large Language Models are limited because they lack an understanding of the physical world. The next evolution is 'World Models'—AI trained on real-world sensory data to understand physics, space, and context. This is the foundational technology required to unlock physical AI like advanced robotics.

Unlike pre-programmed industrial robots, "Physical AI" systems sense their environment, make intelligent choices, and receive live feedback. This paradigm shift, similar to Waymo's self-driving cars versus simple cruise control, allows for autonomous and adaptive scientific experimentation rather than just repetitive tasks.

Ken Goldberg quantifies the challenge: the text data used to train LLMs would take a human 100,000 years to read. Equivalent data for robot manipulation (vision-to-control signals) doesn't exist online and must be generated from scratch, explaining the slower progress in physical AI.
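For a sense of scale, the 100,000-year figure can be turned into an approximate word count. This is back-of-the-envelope arithmetic only: the reading speed (250 words per minute) and reading time (8 hours a day, every day) are assumptions chosen for illustration, not figures from Goldberg.

```python
def words_for_reading_years(years, words_per_minute=250, hours_per_day=8):
    """Words a person would read in `years`, under the assumed pace (all defaults are assumptions)."""
    minutes_per_year = hours_per_day * 60 * 365
    return years * words_per_minute * minutes_per_year


def reading_years(words, words_per_minute=250, hours_per_day=8):
    """Inverse: years needed to read `words` at the assumed pace."""
    minutes_per_year = hours_per_day * 60 * 365
    return words / (words_per_minute * minutes_per_year)


corpus_words = words_for_reading_years(100_000)
print(f"{corpus_words:,}")          # 4,380,000,000,000
print(reading_years(corpus_words))  # 100000.0
```

Under these assumptions, 100,000 years of reading corresponds to a few trillion words. A different assumed pace shifts the number, but not the order-of-magnitude gap with the robot manipulation data available today.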

AI can generate art because it was trained on the internet's vast trove of images. It struggles with physical tasks like washing dishes because there is virtually no first-person video data for such actions. Solving this data-gathering problem is key to advancing robotics.

The "bitter lesson" (scale and simple models win) works for language because training data (text) aligns with the output (text). Robotics faces a critical misalignment: it's trained on passive web videos but needs to output physical actions in a 3D world. This data gap is a fundamental hurdle that pure scaling cannot solve.

1X designs its robots with human-like physical properties, down to skin tissue stiffness. This allows them to effectively leverage the internet's vast repository of human video data (e.g., YouTube) as a training set, bootstrapping intelligence without needing to create an entirely new internet-sized dataset.

Figure CEO Brett Adcock posits that real-world interaction is the 'last missing piece' for AGI. Because humanoid robots can learn by physically touching the world, through trial and error and its consequences, he believes they may be the first systems to achieve artificial general intelligence, surpassing purely digital models.