New AI lab Odyssey is not building a direct robot controller. Instead, its 'foundation world model' acts as a general-purpose 'physics engine' for AI, learning the rules of reality from data. This foundational layer can then be licensed and used by other companies to build their specific action-oriented robot models.
The next major leap in AI may come from "world models," which aim to give LLMs an experiential, physical understanding of concepts like space and physics. This mirrors the difference between knowing facts from a book and having real-world experience.
To build generalist robots, the most effective approach is pre-training foundation models on internet-scale video datasets, not just simulation or tele-operated data. This vast, diverse data provides a deep, implicit understanding of physics and object interaction that is impossible to replicate in controlled environments, enabling true generalization.
While language models understand the world through text, Demis Hassabis argues they lack an intuitive grasp of physics and spatial dynamics. He sees 'world models'—simulations that understand cause and effect in the physical world—as the critical technology needed to advance AI from digital tasks to effective robotics.
Vision-Language-Action models (VLAs) have not yet produced a 'ChatGPT moment' for robotics. Consequently, investor enthusiasm and capital are increasingly flowing toward the alternative 'World Model' approach, which learns physics from video, even though it has yet to demonstrate superior tangible results.
The Physical Intelligence thesis is that a foundation model learning from diverse data can achieve a "physical understanding" of the world, making it easier to adapt to new tasks than building single-purpose robots from scratch. A generalist model can draw on broader data, which ultimately makes the approach more scalable.
Startups and major labs are focusing on "world models," which simulate physical reality, cause, and effect. This is seen as the necessary step beyond text-based LLMs to create agents that can truly understand and interact with the physical world, a key step towards AGI.
Large language models are insufficient for tasks requiring real-world interaction and spatial understanding, like robotics or disaster response. World models provide this missing piece by generating interactive 3D environments that an AI can reason about. They represent a foundational shift from language-based AI to a more holistic, spatially intelligent AI.
Waymo’s system starts with a large, off-board foundation model that understands the physical world. This model is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.
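
The summary does not say how Waymo's teachers actually train their students. As a rough illustration of the general teacher-student idea only, here is a minimal sketch of standard knowledge distillation, where a student is penalized for diverging from a teacher's temperature-softened output distribution. The function names, the three-way action scores, and the temperature value are all hypothetical and are not taken from Waymo.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Assumption: this is the textbook distillation objective, used here
    only as a stand-in for whatever objective a real system would use.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate actions.
teacher = [2.0, 0.5, -1.0]
untrained_student = [0.0, 0.0, 0.0]

print(distillation_loss(teacher, untrained_student))  # positive: student disagrees
print(distillation_loss(teacher, teacher))            # zero: student matches teacher
```

In a real training loop this loss would be minimized with respect to the student's parameters, so a much smaller model learns to reproduce the larger model's judgments.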
Large Language Models are limited because they lack an understanding of the physical world. The next evolution is 'World Models'—AI trained on real-world sensory data to understand physics, space, and context. This is the foundational technology required to unlock physical AI like advanced robotics.
By solving the core "intelligence" problem with a foundation model, the barrier to entry for creating novel robotic applications and form factors will dramatically decrease. This will enable a "Cambrian explosion" of hardware creativity, as builders will no longer need to solve AI from scratch for each new idea.