Waymo uses a foundation model to create specialized, high-capacity "teacher" models (Driver, Simulator, Critic) offline. These teachers then distill their knowledge into smaller, efficient "student" models that can run in real-time on the vehicle, balancing massive computational power with on-device constraints.
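The teacher-to-student step described above is commonly implemented as knowledge distillation, where the student is trained to match the teacher's softened output distribution. The following is a minimal sketch of that generic loss, not Waymo's actual training code; the function names, the temperature value, and the three-way logits are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs.

    Hypothetical helper: the T^2 factor is the usual convention for keeping
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Made-up logits over three candidate maneuvers (assumption for illustration).
teacher = [2.0, 1.0, 0.1]
student = [0.1, 1.0, 2.0]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets a small on-vehicle model absorb behavior from a much larger offline one.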

Related Insights

The system demonstrates emergent capabilities beyond its explicit design. In one case, it detected a pedestrian hidden behind a bus by interpreting extremely faint, noisy LiDAR returns that had bounced off the person's feet from underneath the bus, a detection nobody had explicitly programmed it to make.

Advanced AI architectures will use small, fast, and cheap local models to act as intelligent routers. These models will first analyze a complex request, formulate a plan, and then delegate different sub-tasks to a fleet of more powerful or specialized models, optimizing for cost and performance.
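The routing pattern described above can be sketched as a plan-then-dispatch loop. This is a toy illustration under stated assumptions: `plan` stands in for the small local router model using a keyword rule, and the `specialists` table stands in for calls to larger models. All names here are hypothetical and do not correspond to any real library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SubTask:
    kind: str     # which specialist should handle this piece
    payload: str  # the text of the sub-request

def plan(request: str) -> List[SubTask]:
    """Stand-in for the small router model: split on ';' and tag each part by keyword."""
    tasks = []
    for part in request.split(";"):
        text = part.strip()
        if not text:
            continue
        kind = "code" if "code" in text.lower() else "general"
        tasks.append(SubTask(kind, text))
    return tasks

def run(request: str, specialists: Dict[str, Callable[[str], str]]) -> List[str]:
    """Delegate each planned sub-task to its specialist, falling back to 'general'."""
    results = []
    for task in plan(request):
        handler = specialists.get(task.kind, specialists["general"])
        results.append(handler(task.payload))
    return results

# Placeholder specialists; in practice these would call larger or specialized models.
specialists = {
    "code": lambda text: f"[code-model] {text}",
    "general": lambda text: f"[general-model] {text}",
}

answers = run("write code for a parser; summarize the meeting", specialists)
```

Keeping the router cheap matters because it runs on every request, while the expensive specialists run only for the sub-tasks that need them.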

Rivian's CEO explains that early autonomous systems, which were built on rigid rule-based "planners," have been superseded by end-to-end AI. This new approach uses a large "foundation model for driving" that can improve continuously with more data, breaking through the performance plateau of the older method.

The transition from Gen 4 to Gen 5 was a discontinuous jump that enabled rapid expansion. Waymo made a "big bet on AI," replacing a system of many smaller, specialized ML models with a single, generalizable AI backbone. This new architecture, trained on diverse national data, was the key to scaling beyond specific pre-mapped areas.

A pure "pixels-in, actions-out" model is insufficient for full autonomy. While easy to start, this approach is extremely inefficient to simulate and validate for safety-critical edge cases. Waymo augments its end-to-end system with intermediate representations (like objects and road signs) to make simulation and validation tractable.

The AI's ability to handle novel situations isn't just an emergent property of scale. Wayve actively trains "world models," which are internal generative simulators. This enables the AI to reason about what might happen next, leading to sophisticated behaviors like nudging into intersections or slowing in fog.

Tesla was initially criticized for forgoing expensive LiDAR, but its vision-only approach forced it to solve the harder, more scalable problem of AI-based reasoning. Competitors are now converging on the same long-term bet on foundation models for driving.

A fundamental constraint today is that the model architecture used for training must be the same as the one used for inference. Future breakthroughs could come from lifting this constraint. This would allow for specialized models: one optimized for compute-intensive training and another for memory-intensive serving.

Wayve's core strategy is generalization. By training a single, large AI on diverse global data, vehicles, and sensor sets, the company can adapt to new cars and countries in months, not years. This avoids the AV 1.0 pitfall of building bespoke, infrastructure-heavy solutions for each new market.

The winning vehicle in the 2005 DARPA Grand Challenge, built by a team led by Sebastian Thrun, who later founded the Google self-driving project that became Waymo, used a clever machine learning approach. It overlaid precise laser sensor data onto a regular video camera feed, teaching the system to recognize the color and texture of "safe" terrain and extrapolate a drivable path far ahead.
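The idea of using short-range laser data to label camera pixels, then extrapolating by appearance, can be sketched as follows. This is a deliberately simplified illustration, not the vehicle's actual algorithm: it assumes a per-channel mean and standard deviation as the color model, and the pixel values, function names, and 3-sigma threshold are all made up for the example.

```python
import statistics

def fit_color_model(labeled_pixels):
    """Per-channel mean and standard deviation of pixels the laser marked as drivable."""
    channels = list(zip(*labeled_pixels))
    means = [statistics.mean(c) for c in channels]
    # Floor the spread at 1.0 so a very uniform patch does not reject everything.
    stds = [max(statistics.pstdev(c), 1.0) for c in channels]
    return means, stds

def looks_drivable(pixel, model, max_sigma=3.0):
    """Accept a pixel if every channel lies within max_sigma deviations of the mean."""
    means, stds = model
    return all(abs(v - m) <= max_sigma * s for v, m, s in zip(pixel, means, stds))

# Hypothetical RGB samples from the near field, where the laser confirmed flat ground.
near_road = [(120, 110, 100), (125, 112, 98), (118, 108, 103)]
model = fit_color_model(near_road)

# Far-field pixels beyond laser range, classified by appearance alone.
far_dirt = looks_drivable((122, 111, 101), model)
far_grass = looks_drivable((30, 160, 40), model)
```

Because the labels come from the laser rather than from a human, the color model can be refit continuously as the terrain and lighting change.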