Waymo's system starts with a large, off-board foundation model that understands the physical world. That model is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.
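The teacher-to-student step is commonly done with knowledge distillation, where the student is trained to match the teacher's output distribution. The sketch below shows a generic soft-label distillation loss in plain Python. It is an illustration of the general technique only: the function names, the temperature value, and the three-way logits are assumptions, and nothing here describes Waymo's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution so small teacher preferences stay visible.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions.
    # Hypothetical helper: a generic distillation objective, not Waymo's.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up logits over three candidate maneuvers (illustrative only).
teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, teacher))          # student matches teacher
print(distillation_loss(teacher, [0.0, 0.0, 0.0]))  # untrained, uniform student
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets a small in-vehicle model inherit behavior from a much larger off-board one.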
A Waymo vehicle detected and reacted to a pedestrian completely occluded by a bus. The AI system achieved this by interpreting faint LiDAR reflections from the person's feet, visible through the gap under the bus. A human driver could not have done this, and it is a powerful demonstration of emergent capabilities.
Waymo's CEO argues that it is misleading to assume Level 2/3 driver-assist systems sit on a continuous spectrum with Level 4/5 full autonomy. The hardest parts of building a 'rider only' system are fundamentally different and require a qualitative jump in technology.
A pure 'pixels in, actions out' model is insufficient for full autonomy. Waymo augments its end-to-end learning with structured, intermediate representations (like objects and road concepts). This provides crucial knobs for scalable simulation, safety validation, and defining reward functions.
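To make the value of structured representations concrete, the sketch below defines a reward term directly over a list of detected objects, something that is hard to express over raw pixels. Everything in it is hypothetical: the TrackedObject fields, the clearance_reward function, and the 2-metre threshold are assumptions chosen for illustration, not Waymo's representation or reward.

```python
import math
from dataclasses import dataclass

@dataclass
class TrackedObject:
    kind: str  # hypothetical label, e.g. "pedestrian" or "vehicle"
    x: float   # position in metres, ego-centred frame (assumed convention)
    y: float

def min_clearance(ego_xy, objects):
    # Distance from the ego position to the nearest object; infinite if the scene is empty.
    if not objects:
        return math.inf
    return min(math.hypot(o.x - ego_xy[0], o.y - ego_xy[1]) for o in objects)

def clearance_reward(ego_xy, objects, safe_distance=2.0):
    # 0 when every object is at least safe_distance away, negative otherwise.
    return min(0.0, min_clearance(ego_xy, objects) - safe_distance)

scene = [TrackedObject("pedestrian", 3.0, 4.0), TrackedObject("vehicle", 10.0, 0.0)]
print(clearance_reward((0.0, 0.0), scene))  # nearest object is 5 m away
print(clearance_reward((3.0, 3.0), scene))  # pedestrian is 1 m away
```

Because the scene is an explicit list of objects, a simulator can add, move, or remove one of them and a validator can assert a minimum clearance, which is the kind of knob an opaque pixels-to-actions model does not expose.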
Waymo alternates major upgrades between hardware and software. Its 6th generation system introduces a custom vehicle and a cheaper, simpler sensor stack, but runs largely the same software as the 5th generation. This demonstrates software generalizability and de-risks the launch of new hardware.
Waymo demonstrated that a standard Vision Language Model (VLM) can be fine-tuned to output driving trajectories instead of text. While unsafe for public roads, it drives 'pretty darn well' in normal conditions, showing the surprising generalizability of foundational vision-language understanding.
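One simple way a text-generating model can emit a trajectory is to serialize waypoints as plain text, so the fine-tuning target is still a string. The source does not say how Waymo encodes trajectories, so the sketch below is purely an assumption: the "x,y; x,y" format, the two-decimal precision, and both function names are invented for illustration.

```python
def trajectory_to_text(waypoints, precision=2):
    # Serialize (x, y) waypoints in metres into a string a language model could be trained to emit.
    return "; ".join(f"{x:.{precision}f},{y:.{precision}f}" for x, y in waypoints)

def text_to_trajectory(text):
    # Parse the model's text output back into (x, y) waypoints.
    # Assumes well-formed, non-empty output; a real system would validate it.
    points = []
    for pair in text.split(";"):
        x_str, y_str = pair.strip().split(",")
        points.append((float(x_str), float(y_str)))
    return points

# Made-up future waypoints in the ego frame (illustrative only).
plan = [(0.0, 0.0), (1.5, 0.1), (3.0, 0.4)]
encoded = trajectory_to_text(plan)
print(encoded)
print(text_to_trajectory(encoded))
```

With a target in this form, fine-tuning looks like ordinary next-token training: the input is the camera context and the label is the encoded trajectory string rather than a caption or an answer.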
The move from Waymo's 4th to 5th generation driver was a discontinuous jump. Waymo abandoned smaller, specialized ML models for a single AI backbone trained on a massive, nationwide dataset. This generalizable stack, rather than city-specific tuning, enabled its recent rapid scaling across the US.
While safety-critical driving inference happens locally, Waymo leverages the cloud for operational tasks. After a ride, an off-board model analyzes the interior to check if a passenger left an item or if the car needs cleaning, which helps optimize fleet management without burdening the in-car compute.
