Safety-critical driving inference runs locally on the vehicle, but Waymo uses the cloud for operational tasks. After a ride, an off-board model analyzes the interior to check whether a passenger left an item behind or whether the car needs cleaning. This helps optimize fleet management without burdening the in-car compute.

Related Insights

The move from Waymo's 4th to 5th generation driver was a discontinuous jump. Waymo abandoned smaller, specialized ML models for a single AI backbone trained on a massive, nationwide dataset. This generalizable stack, rather than city-specific tuning, enabled its recent rapid scaling across the US.

Waymo demonstrated that a standard Vision Language Model (VLM) can be fine-tuned to output driving trajectories instead of text. While unsafe for public roads, it drives 'pretty darn well' in normal conditions, showing the surprising generalizability of foundational vision-language understanding.

Waymo alternates major upgrades between hardware and software. Its 6th generation system introduces a custom vehicle and a cheaper, simpler sensor stack, but runs largely the same software as the 5th generation. This demonstrates software generalizability and de-risks the launch of new hardware.

The seamless experience of an autonomous vehicle hides a complex backend. A subsidiary company, FlexDrive, manages the fleet and handles services such as cleaning, charging, maintenance, and teleoperation. This "fleet management" layer is a significant, often overlooked part of the AV value chain and business model.

According to its co-CEO, Waymo has moved beyond fundamental research and development. The company believes its core technology is sufficient to handle all aspects of driving. The current work is an engineering challenge of specialization, validation, and data collection for new environments like London, signaling a shift to commercial deployment.

Waymo’s system starts with a large, off-board foundation model that understands the physical world. This model is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.

All critical, real-time driving inference happens locally on the vehicle. The cloud handles non-time-sensitive operational tasks that enhance the customer experience. For example, after a ride, the car can use an off-board cloud model to check for forgotten items or determine whether it needs cleaning.

Unlike traditional fleet management focused on maximizing vehicle utilization ('butts in seats'), AV fleet management prioritizes safety with airline-like rigor. This includes meticulous logging of every repair (e.g., torque values on lug nuts) and sophisticated matching of fleet supply to real-time rider demand.

Waymo uses a foundation model to create specialized, high-capacity "teacher" models (Driver, Simulator, Critic) offline. These teachers then distill their knowledge into smaller, efficient "student" models that can run in real-time on the vehicle, balancing massive computational power with on-device constraints.
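
The teacher-to-student step can be illustrated with a generic knowledge-distillation loss. The sketch below is not Waymo's method, which the source does not describe in detail. It assumes the common textbook formulation in which the student is trained to match the teacher's temperature-softened output distribution. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math


def log_softmax(logits, temperature):
    """Return log-probabilities of temperature-scaled logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions, scaled by T^2.

    Assumption: this is the generic soft-target loss, used here only to
    illustrate the idea of a small model imitating a large one.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same classes")
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Hypothetical logits over three candidate outputs.
teacher = [4.0, 1.0, 0.5]
student_far = [0.5, 1.0, 4.0]    # disagrees with the teacher
student_near = [3.5, 1.2, 0.4]   # roughly agrees with the teacher

loss_far = distillation_loss(teacher, student_far)
loss_near = distillation_loss(teacher, student_near)
```

In this toy setup, a student whose outputs track the teacher's gets a lower loss than one that disagrees. Minimizing that loss during training is what pushes the smaller model toward the larger model's behavior.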

Instead of streaming all data, Samsara runs inference on low-power cameras. The company trains large models in the cloud and then "distills" them into smaller, specialized models that run efficiently at the edge and focus only on relevant tasks like risk detection.