Demis Hassabis sees video generation as more than a content tool; it's a step toward building AI with "world models." By learning to generate realistic scenes, these models develop an intuitive understanding of physics and causality, a foundational capability for AGI to perform long-term planning in the real world.

Related Insights

Demis Hassabis notes that while generative AI can create visually realistic worlds, their underlying physics are mere approximations. They look correct at a glance but fail rigorous tests. This gap between plausible and accurate physics is a key challenge that must be solved before these models can be reliably used for robotics training.

While language models understand the world through text, Demis Hassabis argues they lack an intuitive grasp of physics and spatial dynamics. He sees 'world models'—simulations that understand cause and effect in the physical world—as the critical technology needed to advance AI from digital tasks to effective robotics.

Demis Hassabis describes an innovative training method combining two AI projects: Genie, which generates interactive worlds, and SIMA, an AI agent. By placing a SIMA agent inside a world created by Genie, the team can create a dynamic feedback loop with virtually infinite, increasingly complex training scenarios.

Language is just one 'keyhole' into intelligence. True artificial general intelligence (AGI) requires 'world modeling'—a spatial intelligence that understands geometry, physics, and actions. This capability to represent and interact with the state of the world is the next critical phase of AI development beyond current language models.

Startups and major labs are focusing on "world models," which simulate physical reality, cause, and effect. This is seen as the necessary move beyond text-based LLMs to create agents that can truly understand and interact with the physical world, a key step towards AGI.

Hassabis argues AGI isn't just about solving existing problems. True AGI must demonstrate the capacity for breakthrough creativity, like Einstein developing a new theory of physics or Picasso creating a new art genre. This sets a much higher bar than current systems.

Large language models are insufficient for tasks requiring real-world interaction and spatial understanding, like robotics or disaster response. World models provide this missing piece by generating interactive 3D environments that a system can reason about. They represent a foundational shift from language-based AI to a more holistic, spatially intelligent AI.

Today's AI models are powerful but lack a true sense of causality, leading to illogical errors. Unconventional AI's Naveen Rao hypothesizes that building AI on substrates with inherent time and dynamics—mimicking the physical world—is the key to developing this missing causal understanding.

The AI's ability to handle novel situations isn't just an emergent property of scale. Wayve actively trains "world models," which are internal generative simulators. This enables the AI to reason about what might happen next, leading to sophisticated behaviors like nudging into intersections or slowing in fog.

Google DeepMind CEO Demis Hassabis argues that today's large models are insufficient for AGI. He believes progress requires reintroducing algorithmic techniques from systems like AlphaGo, specifically planning and search, to enable more robust reasoning and problem-solving capabilities beyond simple pattern matching.