A "vanilla" end-to-end model is insufficient for safety-critical systems. Waymo's foundation model is end-to-end but is augmented with "structured materialized intermediate representation." This allows for crucial runtime validation, richer training, and closed-loop evaluation necessary for superhuman performance at scale.
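One way to picture runtime validation against an intermediate representation is a check that compares a planned path with the objects the model has materialized. The sketch below is illustrative only: `TrackedObject`, `validate_plan`, the circular object footprint, and the 0.5 m margin are all assumptions, not details from the source.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class TrackedObject:
    """Hypothetical intermediate representation: an object as a circle (metres)."""
    x: float
    y: float
    radius: float


def validate_plan(waypoints, objects, margin=0.5):
    """Return True if every waypoint keeps at least `margin` metres of
    clearance from every tracked object's footprint."""
    for wx, wy in waypoints:
        for obj in objects:
            if hypot(wx - obj.x, wy - obj.y) < obj.radius + margin:
                return False
    return True


# Example: a straight path checked against one object near its end.
plan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
blocked = validate_plan(plan, [TrackedObject(2.0, 0.3, 0.5)])
clear = validate_plan(plan, [TrackedObject(2.0, 5.0, 0.5)])
```

A check like this is only possible because the objects exist as explicit data; a model that maps pixels directly to actions exposes nothing comparable to test against.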

Related Insights

The move from Waymo's 4th to 5th generation driver was a discontinuous jump. Waymo abandoned smaller, specialized ML models for a single AI backbone trained on a massive, nationwide dataset. This generalizable stack, rather than city-specific tuning, enabled its recent rapid scaling across the US.

Unlike typical tech development that focuses on capabilities first, Waymo embeds safety as a "non-negotiable foundation" from the start. This means building safety into the model architecture and team mindset, because the approach that reaches 90% performance is fundamentally different from the one needed to reach the final "nines" of reliability.

To address safety concerns of an end-to-end "black box" self-driving AI, NVIDIA runs it in parallel with a traditional, transparent software stack. A "safety policy evaluator" then decides which system to trust at any moment, providing a fallback to a more predictable system in uncertain scenarios.
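The arbitration idea can be sketched as a function that chooses between two proposals. The source does not say how the evaluator decides, so the confidence threshold used here is purely an assumption, as are the names `Proposal` and `select_proposal`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical driving command produced by one of the two stacks."""
    source: str        # "end_to_end" or "classical"
    steering: float    # radians
    accel: float       # m/s^2
    confidence: float  # self-reported, in [0, 1] (an assumption)


def select_proposal(learned, classical, min_confidence=0.8):
    """Trust the end-to-end proposal only when its confidence clears the
    threshold; otherwise fall back to the transparent classical stack."""
    if learned.confidence >= min_confidence:
        return learned
    return classical


# Example: a low-confidence learned proposal triggers the fallback.
uncertain = Proposal("end_to_end", steering=0.10, accel=1.0, confidence=0.4)
fallback = Proposal("classical", steering=0.05, accel=0.0, confidence=1.0)
chosen = select_proposal(uncertain, fallback)
```

A real evaluator would presumably weigh far more than a single score; the point of the sketch is only that the fallback path stays predictable and is always available.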

A Waymo vehicle detected and reacted to a pedestrian completely occluded by a bus. The AI system achieved this by interpreting faint LiDAR returns that bounced under the bus off the person's feet, a feat impossible for human drivers and a powerful demonstration of emergent capabilities.

Waymo’s system starts with a large, off-board foundation model that understands the physical world. This is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.

A pure 'pixels in, actions out' model is insufficient for full autonomy. Waymo augments its end-to-end learning with structured, intermediate representations (like objects and road concepts). This provides crucial knobs for scalable simulation, safety validation, and defining reward functions.

Rivian's CEO explains that early autonomous systems, which were based on rigid rules-based "planners," have been superseded by end-to-end AI. This new approach uses a large "foundation model for driving" that can improve continuously with more data, breaking through the performance plateau of the older method.

The transition from Gen 4 to Gen 5 was a discontinuous jump that enabled rapid expansion. Waymo made a "big bet on AI," replacing a system of many smaller, specialized ML models with a single, generalizable AI backbone. This new architecture, trained on diverse national data, was the key to scaling beyond specific pre-mapped areas.

A pure "pixels-in, actions-out" model is insufficient for full autonomy. Although this approach is easy to start with, it is extremely inefficient to simulate and validate for safety-critical edge cases. Waymo augments its end-to-end system with intermediate representations (like objects and road signs) to make simulation and validation tractable.

Waymo uses a foundation model to create specialized, high-capacity "teacher" models (Driver, Simulator, Critic) offline. These teachers then distill their knowledge into smaller, efficient "student" models that can run in real-time on the vehicle, balancing massive computational power with on-device constraints.
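Distillation in general can be illustrated with the widely used soft-target loss, in which the student is trained to match the teacher's temperature-softened output distribution. The source does not state which loss Waymo uses, so this is a generic sketch; the discrete action logits and the temperature of 2.0 are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.
    It is zero when the two distributions match and positive otherwise."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Example: hypothetical logits over three discrete manoeuvres.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(teacher, student)
```

Minimizing this quantity over many examples pulls the small on-vehicle model toward the large offline model's behavior without requiring the student to match its size.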

Waymo Augments End-to-End AI with Structured Representations for Real-World Safety | RiffOn