A Waymo vehicle detected and reacted to a pedestrian completely occluded by a bus. The system achieved this by interpreting faint, noisy LiDAR returns that had bounced off the person's feet beneath the bus, a detection impossible for a human driver and a striking demonstration of emergent capability.

Related Insights

Waymo demonstrated that a standard Vision Language Model (VLM) can be fine-tuned to output driving trajectories instead of text. While not nearly safe enough for public roads, the fine-tuned model drives 'pretty darn well' in normal conditions, showing the surprising generalizability of foundational vision-language understanding.
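One common way to make a text-token model emit trajectories is to discretize waypoints into extra vocabulary tokens the model can predict like words. The sketch below illustrates that idea only; the bin count, range, and encoding are illustrative assumptions, as the source does not describe Waymo's actual formulation.

```python
# Hedged sketch: quantizing (x, y) waypoints into token ids so a language
# model head could emit a trajectory as a token sequence. BINS and RANGE_M
# are hypothetical values, not Waymo's.

BINS = 256          # quantization bins per axis (assumption)
RANGE_M = 50.0      # waypoints assumed to span [-50 m, +50 m]

def waypoint_to_tokens(x, y):
    """Quantize one waypoint into two vocabulary indices."""
    def quantize(v):
        v = max(-RANGE_M, min(RANGE_M, v))
        return int((v + RANGE_M) / (2 * RANGE_M) * (BINS - 1))
    return quantize(x), quantize(y)

def tokens_to_waypoint(tx, ty):
    """Invert the quantization (up to bin resolution)."""
    def dequantize(t):
        return t / (BINS - 1) * 2 * RANGE_M - RANGE_M
    return dequantize(tx), dequantize(ty)

# A trajectory becomes a flat token sequence the VLM is fine-tuned to predict:
trajectory = [(0.0, 0.0), (1.2, 0.1), (2.5, 0.4)]
tokens = [t for wp in trajectory for t in waypoint_to_tokens(*wp)]
decoded = [tokens_to_waypoint(tokens[i], tokens[i + 1])
           for i in range(0, len(tokens), 2)]
```

Decoding loses only quantization precision, so the same softmax head that produces text can, after fine-tuning, produce driveable paths.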

The shift to learned AI drivers makes multi-sensor arrays (including LiDAR) more valuable. In older rules-based systems, fusing data from different sensors was complex and hand-engineered; a learned model, by contrast, benefits directly from more diverse input data. With LiDAR getting steadily cheaper, this tips the balance toward a multi-sensor approach.
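The contrast can be made concrete: in a learned system, adding a sensor can be as simple as concatenating more features into the model's input, with no hand-written fusion rules. A minimal sketch, with purely illustrative feature vectors:

```python
# Hedged sketch of early fusion: each sensor contributes features, and the
# combined vector is the model's input. Names and values are illustrative,
# not Waymo's actual architecture.

def fuse(camera_feat, lidar_feat, radar_feat):
    """Early fusion: concatenate per-sensor features into one model input."""
    return camera_feat + lidar_feat + radar_feat  # list concatenation

model_input = fuse([0.2, 0.7], [1.5, 0.1, 0.9], [0.05])
```

A richer sensor suite simply widens the input; the training process, not an engineer, learns how to weigh the modalities.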

After proving its robo-taxis are 90% safer than human drivers, Waymo is now making them more "confidently assertive" to better navigate real-world traffic. This counter-intuitive shift from passive safety to calculated aggression is a necessary step to improve efficiency and reduce delays, highlighting the trade-offs required for autonomous vehicle integration.

Waymo’s system starts with a large, off-board foundation model understanding the physical world. This is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.
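The teacher-to-student step is classic knowledge distillation: the student is trained against the teacher's soft outputs rather than ground-truth labels. A minimal sketch with tiny logistic models (the model forms and training loop are assumptions, not Waymo's implementation):

```python
import math
import random

# Hedged sketch of offline distillation: a compact "student" is fit to
# reproduce a larger "teacher's" soft outputs.

def teacher(x):
    """Stand-in for a high-capacity off-board model's output probability."""
    return 1 / (1 + math.exp(-(3.0 * x - 1.0)))

random.seed(0)
data = [random.uniform(-2.0, 2.0) for _ in range(200)]

# Student: a small logistic model trained by SGD on cross-entropy against
# the teacher's soft labels (the core distillation recipe).
w, b, lr = 0.0, 0.0, 0.2
for _ in range(500):
    for x in data:
        p_t = teacher(x)                          # soft target from teacher
        s = 1 / (1 + math.exp(-(w * x + b)))      # student prediction
        grad = s - p_t                            # dCE/dlogit
        w -= lr * grad * x
        b -= lr * grad
# The student recovers the teacher's behavior (w near 3.0, b near -1.0).
```

The same pattern scales up: the expensive teachers run in the data center, and only the distilled students must meet the vehicle's real-time compute budget.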

A pure 'pixels in, actions out' model is insufficient for full autonomy. Waymo augments its end-to-end learning with structured, intermediate representations (like objects and road concepts). This provides crucial knobs for scalable simulation, safety validation, and defining reward functions.
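Structured outputs are what make a reward function even expressible. The hypothetical snippet below shows the kind of "knob" an intermediate representation provides; every field name and weight is an illustrative assumption, not part of Waymo's system:

```python
# Hedged sketch: a reward defined over structured state (objects, lane
# membership, progress) rather than raw pixels. All fields and weights
# are invented for illustration.

def reward(state):
    r = 0.0
    r += 1.0 * state["progress_m"]                 # reward forward progress
    if not state["in_lane"]:
        r -= 5.0                                   # penalize leaving the lane
    for obj in state["objects"]:
        if obj["type"] == "pedestrian" and obj["distance_m"] < 3.0:
            r -= 10.0 * (3.0 - obj["distance_m"])  # penalize close approaches
    return r

safe_state = {"progress_m": 2.0, "in_lane": True, "objects": []}
risky_state = {"progress_m": 2.0, "in_lane": True,
               "objects": [{"type": "pedestrian", "distance_m": 1.0}]}
```

With pixels alone there is no "pedestrian" to penalize; the intermediate representation is what lets safety constraints be written down, validated, and simulated at scale.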

Autonomous systems can perceive and react to dangers beyond human capability. The example of a Cybertruck autonomously accelerating to lessen the impact of a potential high-speed rear-end collision—a car the human driver didn't even see—showcases a level of predictive safety that humans cannot replicate, moving beyond simple accident avoidance.

A Waymo car hiring a DoorDasher isn't just a funny anecdote; it's a sign that AI has moved beyond simple tasks. It can now understand disparate human-designed systems (like the gig economy), identify its own physical limitations, and strategically leverage those systems to achieve a goal.

The winning vehicle in the 2005 DARPA Grand Challenge, built by a Stanford team led by Sebastian Thrun (who went on to found the Google self-driving car project that became Waymo), used a clever machine-learning approach: it overlaid precise short-range laser data onto the video camera feed, teaching the system the color and texture of laser-confirmed "safe" terrain so it could extrapolate a drivable path far ahead.
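The trick can be sketched in a few lines: laser-confirmed road pixels define a color model, and distant pixels are classified by their distance to that model. The values and threshold below are illustrative, not the team's actual parameters:

```python
# Hedged sketch of laser-to-camera terrain extrapolation: pixels the
# short-range laser confirms as drivable "teach" a color model, which then
# classifies terrain far beyond laser range.

def mean_color(pixels):
    """Average RGB of the laser-confirmed drivable patch."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def is_drivable(pixel, safe_mean, threshold=40.0):
    """Classify by Euclidean distance to the learned road color."""
    dist = sum((a - b) ** 2 for a, b in zip(pixel, safe_mean)) ** 0.5
    return dist < threshold

# Laser-confirmed road pixels just ahead of the vehicle (hypothetical RGB):
near_road = [(120, 110, 95), (125, 112, 98), (118, 108, 92)]
road_model = mean_color(near_road)

# Far-field pixels, beyond laser range:
far_road, grass = (122, 111, 96), (60, 140, 50)
```

The short-range sensor supplies free training labels every frame, letting a cheap camera see a drivable corridor far beyond where the laser can reach.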