
Waabi's CEO explains that for physical AI, world models must go beyond just creating realistic simulations. The critical feature is 'controllability'—the ability to precisely generate and manipulate specific, safety-critical scenarios for testing. This is a fundamental difference from world models used for generating creative media or games.

Related Insights

Demis Hassabis notes that while generative AI can create visually realistic worlds, their underlying physics are mere approximations. They look correct at a casual glance but fail under rigorous testing. This gap between plausible and accurate physics is a key challenge that must be solved before these models can be reliably used for robotics training.

While language models understand the world through text, Demis Hassabis argues they lack an intuitive grasp of physics and spatial dynamics. He sees 'world models'—simulations that understand cause and effect in the physical world—as the critical technology needed to advance AI from digital tasks to effective robotics.

Instead of merely reacting to its environment, 1X's world-model AI allows its robots to 'think' forward and simulate potential outcomes of an action. Like a human anticipating spilling hot coffee, the robot can identify risks and select the safest trajectory, which is critical for operating in a home.

Startups and major labs are focusing on "world models," which simulate physical reality and its cause-and-effect dynamics. This is seen as the necessary step beyond text-based LLMs to create agents that can truly understand and interact with the physical world, a key step toward AGI.

Large language models are insufficient for tasks requiring real-world interaction and spatial understanding, like robotics or disaster response. World models provide this missing piece by generating interactive 3D environments that AI can reason about. They represent a foundational shift from language-based AI to a more holistic, spatially intelligent AI.

Large Language Models are limited because they lack an understanding of the physical world. The next evolution is 'World Models'—AI trained on real-world sensory data to understand physics, space, and context. This is the foundational technology required to unlock physical AI like advanced robotics.

Waabi's CEO argues that achieving Level 4 (eyes-off) autonomy isn't a linear progression from Level 2 (driver-assist). They are entirely different safety problems. L4 requires a purpose-built technology stack from day one, as the absence of a human driver introduces challenges that cannot be solved by simply improving an L2 system.

The AI's ability to handle novel situations isn't just an emergent property of scale. Wayve actively trains "world models," which are internal generative simulators. These enable the AI to reason about what might happen next, leading to sophisticated behaviors like nudging into intersections or slowing in fog.

Instead of using traditional, rule-based simulators, Comma AI trains its driving agent inside a learned "world model." This generative model creates photorealistic, diverse driving scenarios and, crucially, responds accurately to the agent's simulated actions—a key requirement for effective robotics training.

Demis Hassabis sees video generation as more than a content tool; it's a step toward building AI with "world models." By learning to generate realistic scenes, these models develop an intuitive understanding of physics and causality, a foundational capability for AGI to perform long-term planning in the real world.