Startups and major labs are focusing on "world models," which simulate physical reality, cause, and effect. This is seen as the necessary step beyond text-based LLMs to create agents that can truly understand and interact with the physical world, a key step towards AGI.
Sora 2's most significant advancement is not its visual quality, but its ability to understand and simulate physics. The model accurately portrays how water splashes or vehicles kick up snow, demonstrating a grasp of cause and effect crucial for true world-building.
While LLMs dominate headlines, Dr. Fei-Fei Li argues that "spatial intelligence"—the ability to understand and interact with the 3D world—is the critical, underappreciated next step for AI. This capability is the linchpin for unlocking meaningful advances in robotics, design, and manufacturing.
Language is just one 'keyhole' into intelligence. True artificial general intelligence (AGI) requires 'world modeling'—a spatial intelligence that understands geometry, physics, and actions. This capability to represent and interact with the state of the world is the next critical phase of AI development beyond current language models.
Large language models are insufficient for tasks requiring real-world interaction and spatial understanding, like robotics or disaster response. World models provide this missing piece by generating interactive, reason-able 3D environments. They represent a foundational shift from language-based AI to a more holistic, spatially intelligent AI.
Today's AI models are powerful but lack a true sense of causality, leading to illogical errors. Unconventional AI's Naveen Rao hypothesizes that building AI on substrates with inherent time and dynamics—mimicking the physical world—is the key to developing this missing causal understanding.
The AI's ability to handle novel situations isn't just an emergent property of scale. Waive actively trains "world models," which are internal generative simulators. This enables the AI to reason about what might happen next, leading to sophisticated behaviors like nudging into intersections or slowing in fog.
World Labs co-founder Fei-Fei Li posits that spatial intelligence—the ability to reason and interact in 3D space—is a distinct and complementary form of intelligence to language. This capability is essential for tasks like robotic manipulation and scientific discovery that cannot be reduced to linguistic descriptions.
Meta's chief AI scientist, Yann LeCun, is reportedly leaving to start a company focused on "world models"—AI that learns from video and spatial data to understand cause-and-effect. He argues the industry's focus on LLMs is a dead end and that his alternative approach will become dominant within five years.
Dr. Fei-Fei Li, a leading AI scientist, believes world models are deeply underappreciated. The reason isn't a lack of vision but the sheer novelty and technical difficulty of the field. As the "next frontier of AI," it hasn't had time to mature or be understood by the broader market in the way that LLMs have.
Human intelligence is multifaceted. While LLMs excel at linguistic intelligence, they lack spatial intelligence—the ability to understand, reason, and interact within a 3D world. This capability, crucial for tasks from robotics to scientific discovery, is the focus for the next wave of AI models.