Waymo uses a foundation model to create specialized, high-capacity "teacher" models (Driver, Simulator, Critic) offline. These teachers then distill their knowledge into smaller, efficient "student" models that can run in real-time on the vehicle, balancing massive computational power with on-device constraints.
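The teacher-to-student step can be illustrated with a generic knowledge-distillation loss. This is a minimal sketch of the textbook technique (matching temperature-softened output distributions), not a description of Waymo's actual training code; the function names, the example logits, and the temperature value are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper: the student is trained to minimize this value so that
    its outputs imitate those of the larger teacher.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student (predicted) distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative logits over three hypothetical driving actions.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 1.0]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets a small on-vehicle model inherit behavior from a much larger offline one.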
Waymo's co-CEO argues that Level 4/5 autonomy will not emerge by incrementally improving Level 2/3 driver-assist systems. The hardest challenges of operating without a human driver are entirely absent from assist systems, so full autonomy requires a "qualitative jump" and a completely different approach from the outset.
Beyond basic navigation, the most nuanced challenge for AVs is mastering pickups and drop-offs. The system must understand complex social context, like when it is acceptable to briefly double-park or how to avoid blocking a driveway, which is a more subtle problem than structured highway driving.
Major AI breakthroughs like Transformers accelerate initial progress but are not silver bullets for the safety-critical long tail. Getting a prototype working is relatively easy, but achieving the final "nines" of reliability is incredibly difficult, a gap that helps explain Google's early, multi-decade investment.
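A rough calculation shows why each additional "nine" is so costly to demonstrate. The sketch below uses a standard zero-failure binomial bound: if the true failure probability per trial is p, the chance of n failure-free trials is (1 - p)^n, so ruling out a rate of p at a given confidence requires n >= ln(1 - confidence) / ln(1 - p). The function name, the 95% confidence level, and the notion of a "trial" (a mile, a trip, a scenario) are assumptions for illustration and do not come from Waymo.

```python
import math

def failure_free_trials_needed(failure_rate, confidence=0.95):
    """Failure-free trials required to show the true failure rate is below
    `failure_rate` at the given confidence (zero-failure binomial bound)."""
    # log1p(-x) computes ln(1 - x) accurately for very small x.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-failure_rate))

# Each extra "nine" of reliability multiplies the evidence needed by about ten.
for nines in range(3, 8):
    rate = 10.0 ** -nines
    print(nines, "nines:", failure_free_trials_needed(rate), "failure-free trials")
```

At 95% confidence the result is close to 3 divided by the target failure rate (the "rule of three"), so the required evidence grows roughly tenfold with every added nine.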
A pure "pixels-in, actions-out" model is insufficient for full autonomy. Although such a model is easy to get started with, it is extremely inefficient to simulate and validate against safety-critical edge cases. Waymo augments its end-to-end system with intermediate representations (like objects and road signs) to make simulation and validation tractable.
The system demonstrates emergent capabilities beyond its explicit design. In one case, it detected a pedestrian obscured by a bus by interpreting extremely faint, noisy LiDAR signals that had bounced off the person's feet from underneath the bus, showcasing a profound level of environmental understanding.
All critical, real-time driving inference happens locally on the vehicle. The cloud is reserved for non-time-sensitive operational tasks that enhance the customer experience. For example, after a ride, the car can use an off-board cloud model to check for forgotten items or determine whether it needs cleaning.
According to its co-CEO, Waymo has moved beyond fundamental research and development. The company believes its core technology is sufficient to handle all aspects of driving. The current work is an engineering challenge of specialization, validation, and data collection for new environments like London, signaling a shift to commercial deployment.
Waymo decouples major hardware and software upgrades. Its 6th-generation platform introduces a new custom vehicle and a cheaper, simpler sensor stack, but runs the same proven 5th-generation software. This "tick-tock" approach lets Waymo validate a new hardware platform while relying on a mature, generalizable software stack.
The transition from Gen 4 to Gen 5 was a discontinuous jump that enabled rapid expansion. Waymo made a "big bet on AI," replacing a system of many smaller, specialized ML models with a single, generalizable AI backbone. This new architecture, trained on diverse national data, was the key to scaling beyond specific pre-mapped areas.
