The cutting edge of physical AI involves more than just programming a robot's response to a stimulus ("policy"). It also requires a "world capability"—a virtual twin that simulates and predicts outcomes, allowing the physical robot to choose intelligent actions based on those predictions.
Figure trains its robot's stability controller entirely in a physics simulator, akin to a video game, which lets the company test countless scenarios synthetically. The resulting AI model is effective enough to be deployed 'zero-shot' directly onto the physical robot, achieving human-level stability immediately.
While language models understand the world through text, Demis Hassabis argues they lack an intuitive grasp of physics and spatial dynamics. He sees 'world models'—simulations that understand cause and effect in the physical world—as the critical technology needed to advance AI from digital tasks to effective robotics.
Rather than merely reacting to its environment, 1X's world model AI lets its robots 'think' forward by simulating the potential outcomes of an action before committing to it. Like a human anticipating a hot coffee spill, the robot can identify risks and select the safest trajectory, which is critical for operating in a home.
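The idea of simulating outcomes before acting can be illustrated with a minimal sketch. It is not the company's actual system: the point-mass `predict` function stands in for a learned world model, and the names `State`, `rollout`, `risk`, and `choose_plan`, along with the spill threshold and goal position, are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class State:
    position: float  # metres along a straight path (assumed 1-D world)
    velocity: float  # metres per second


def predict(state: State, accel: float, dt: float = 0.1) -> State:
    """Toy stand-in for a world model: one step of point-mass dynamics."""
    velocity = state.velocity + accel * dt
    position = state.position + velocity * dt
    return State(position, velocity)


def rollout(state: State, plan: Sequence[float]) -> List[State]:
    """Imagine the future by applying each planned acceleration in turn."""
    states = []
    for accel in plan:
        state = predict(state, accel)
        states.append(state)
    return states


def risk(plan: Sequence[float], states: Sequence[State],
         spill_accel: float = 2.0, goal: float = 1.0) -> float:
    """Hypothetical cost: heavy penalty per jerky step, plus distance from goal."""
    spills = sum(1 for accel in plan if abs(accel) > spill_accel)
    miss = abs(states[-1].position - goal)
    return 10.0 * spills + miss


def choose_plan(state: State, candidates: Sequence[Sequence[float]]) -> Sequence[float]:
    """Pick the candidate whose simulated outcome has the lowest risk."""
    return min(candidates, key=lambda plan: risk(plan, rollout(state, plan)))


if __name__ == "__main__":
    start = State(position=0.0, velocity=0.0)
    gentle = [1.0] * 10                    # slow, smooth approach
    aggressive = [5.0] * 5 + [-5.0] * 5    # fast, but would spill the coffee
    idle = [0.0] * 10                      # safe, but never reaches the goal
    print(choose_plan(start, [aggressive, gentle, idle]))
```

In this toy setup the robot never executes the aggressive plan: it is rejected in simulation because every step exceeds the assumed spill threshold. A real system would replace `predict` with a learned model and `risk` with a task-specific cost.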
Siemens discovered that standard virtual training for robots was insufficient for real-world application. The robot's accuracy only jumped to a usable level after they switched to a photorealistic digital twin using advanced ray-tracing, which more accurately modeled light and texture for the AI.
Startups and major labs are focusing on "world models," which simulate physical reality, cause, and effect. This is seen as the necessary step beyond text-based LLMs to create agents that can truly understand and interact with the physical world, a key step towards AGI.
Large language models are insufficient for tasks requiring real-world interaction and spatial understanding, like robotics or disaster response. World models provide this missing piece by generating interactive 3D environments that an AI can reason about. They represent a foundational shift from language-based AI to a more holistic, spatially intelligent AI.
While often used interchangeably, 'Physical AI' is more specific than 'Edge AI.' Edge AI broadly concerns processing data locally. Physical AI refers to edge systems, like robots or autonomous vehicles, that not only sense and predict but also execute physical actions based on those predictions.
Large Language Models are limited because they lack an understanding of the physical world. The next evolution is 'World Models'—AI trained on real-world sensory data to understand physics, space, and context. This is the foundational technology required to unlock physical AI like advanced robotics.
Unlike pre-programmed industrial robots, "Physical AI" systems sense their environment, make intelligent choices, and receive live feedback. This paradigm shift, similar to Waymo's self-driving cars versus simple cruise control, allows for autonomous and adaptive scientific experimentation rather than just repetitive tasks.
Unlike older robots requiring precise maps and trajectory calculations, new robots use internet-scale common sense and learn motion by mimicking humans or simulations. This combination has “wiped the slate clean” for what is possible in the field.