Waymo's system starts with a large, off-board foundation model that understands the physical world. That model is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.
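The source does not say how the teachers train the students. One common technique for this kind of setup is knowledge distillation, where the student is trained to match the teacher's softened output distribution. The sketch below assumes that technique; the names `softmax` and `distillation_loss`, the temperature value, and the idea of scoring a small set of discrete maneuvers are all illustrative assumptions, not Waymo's method.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Hypothetical setup: each logit scores one candidate maneuver
    (for example keep lane, slow down, nudge left).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # assumed scores from a large off-board model
    student = [1.0, 0.8, -0.5]   # assumed scores from a small in-vehicle model
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets a small model inherit behavior from a much larger one.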
A Waymo vehicle detected and reacted to a pedestrian completely occluded by a bus. The AI system did this by interpreting faint LiDAR reflections from the person's feet that bounced back under the bus, a feat impossible for a human driver and a powerful demonstration of emergent capabilities.
Waymo's CEO argues that it is misleading to assume Level 2/3 driver-assist systems sit on a continuous spectrum with Level 4/5 full autonomy. The hardest parts of building a 'rider only' system are fundamentally different from those of driver assistance, so getting there requires a qualitative jump in technology.
Waymo alternates major upgrades between hardware and software. Its 6th generation system introduces a custom vehicle and a cheaper, simpler sensor stack, but runs largely the same software as the 5th generation. This demonstrates software generalizability and de-risks the launch of new hardware.
A pure 'pixels in, actions out' model is insufficient for full autonomy. Waymo augments its end-to-end learning with structured intermediate representations, such as objects and road concepts. These representations provide crucial knobs for scalable simulation, safety validation, and defining reward functions.
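To make the "knobs" idea concrete, here is a minimal sketch of why an object-level representation helps. Everything in it is hypothetical: the `TrackedObject` and `Scene` types, the 2 m `safe_distance` threshold, and the binary reward are assumptions chosen for illustration, not a description of Waymo's stack.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedObject:
    kind: str   # e.g. "pedestrian" or "vehicle" (illustrative labels)
    x: float    # position in metres, ego-vehicle frame
    y: float


@dataclass(frozen=True)
class Scene:
    objects: tuple  # tuple of TrackedObject


def min_clearance(scene, trajectory):
    """Smallest distance between any waypoint and any object in the scene."""
    return min(
        (math.hypot(px - o.x, py - o.y)
         for px, py in trajectory
         for o in scene.objects),
        default=math.inf,
    )


def reward(scene, trajectory, safe_distance=2.0):
    """Penalize trajectories that come closer than safe_distance to an object."""
    return -1.0 if min_clearance(scene, trajectory) < safe_distance else 1.0


if __name__ == "__main__":
    scene = Scene(objects=(TrackedObject("pedestrian", 5.0, 0.0),))
    straight = [(0.0, 0.0), (5.0, 0.5)]   # passes 0.5 m from the pedestrian
    swerve = [(0.0, 0.0), (5.0, 4.0)]     # keeps at least 4 m away
    print(reward(scene, straight), reward(scene, swerve))
```

Because the pedestrian is an explicit object rather than a pattern of pixels, a simulator can move it, a validator can measure clearance against it, and a reward function can refer to it by name. None of that is straightforward with raw sensor input alone.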
Waymo demonstrated that a standard Vision Language Model (VLM) can be fine-tuned to output driving trajectories instead of text. While unsafe for public roads, it drives 'pretty darn well' in normal conditions, showing the surprising generalizability of foundational vision-language understanding.
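The source does not describe how the model's output was changed from text to trajectories. One plausible approach, assumed here purely for illustration, is to quantize each waypoint coordinate into a discrete bin so that a trajectory becomes a token sequence the language model can emit. The function names, the 0.1 m resolution, and the ±50 m range below are all hypothetical.

```python
def encode_trajectory(waypoints, resolution=0.1, max_abs=50.0):
    """Map (x, y) waypoints in metres to a flat list of integer tokens."""
    bins = int(round(2 * max_abs / resolution))
    tokens = []
    for x, y in waypoints:
        for value in (x, y):
            clipped = max(-max_abs, min(max_abs, value))
            index = int((clipped + max_abs) / resolution)
            tokens.append(min(bins - 1, index))  # keep the top edge in range
    return tokens


def decode_trajectory(tokens, resolution=0.1, max_abs=50.0):
    """Map tokens back to waypoints, using the centre of each bin."""
    values = [(t + 0.5) * resolution - max_abs for t in tokens]
    return list(zip(values[0::2], values[1::2]))


if __name__ == "__main__":
    path = [(0.0, 0.0), (1.23, -0.47), (12.5, 3.3)]
    tokens = encode_trajectory(path)
    print(tokens)
    print(decode_trajectory(tokens))
```

Each coordinate survives the round trip to within half a bin width (5 cm at the assumed resolution), which shows the trade-off in this kind of scheme: finer bins give more precise trajectories but a larger output vocabulary.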
The move from Waymo's 4th to 5th generation driver was a discontinuous jump. Waymo abandoned smaller, specialized ML models for a single AI backbone trained on a massive, nationwide dataset. This generalizable stack, rather than city-specific tuning, enabled its recent rapid scaling across the US.
While safety-critical driving inference happens locally, Waymo leverages the cloud for operational tasks. After a ride, an off-board model analyzes the interior to check if a passenger left an item or if the car needs cleaning, which helps optimize fleet management without burdening the in-car compute.
