We scan new podcasts and send you the top 5 insights daily.
Dolgov shared a story in which a Waymo vehicle reacted to a hidden pedestrian. The system's LiDAR captured sparse returns from the person's feet moving under a bus. This sliver of data was enough for the AI not only to detect the person but also to predict their future path, demonstrating an emergent, superhuman capability.
The shift to AI makes multi-sensor arrays (including LiDAR) more valuable. In older rules-based systems, fusing data from disparate sensors required complex hand-written logic; AI models benefit directly from more diverse input. Richer data improves the training of the core driving model, making a multi-sensor approach, with increasingly cheap LiDAR, even more attractive.
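To illustrate why learned models sidestep hand-written fusion logic, here is a minimal sketch (not Waymo's architecture; shapes and names are invented for illustration): adding a sensor modality simply widens the input to a learned model, rather than requiring new fusion rules.

```python
import numpy as np

def fuse_features(camera_feat: np.ndarray, lidar_feat: np.ndarray,
                  radar_feat: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate per-sensor feature vectors into one
    input for a learned driving model. Adding a modality just widens
    the input; the network learns how to combine the signals."""
    return np.concatenate([camera_feat, lidar_feat, radar_feat])

# Toy feature vectors (dimensions are illustrative only).
camera = np.random.rand(8)
lidar = np.random.rand(4)
radar = np.random.rand(2)
fused = fuse_features(camera, lidar, radar)  # shape (14,)
```

In a rules-based stack, each new sensor meant new fusion code; here it only means a wider input vector for training.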
Waymo’s system starts with a large, off-board foundation model that understands the physical world. This is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.
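The teacher-to-student step described above is commonly done with knowledge distillation. The sketch below shows the standard soft-label distillation loss (Hinton et al.) as a stand-in for how a large teacher could supervise a small in-vehicle student; it is a generic illustration, not Waymo's actual training objective.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's softened predictions against the
    teacher's softened distribution: the student learns to mimic the
    teacher's full output distribution, not just its top answer."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -np.sum(t * np.log(s + 1e-12))

# A student that matches the teacher incurs a lower loss than one that doesn't.
teacher = np.array([3.0, 1.0, 0.2])
good_student = np.array([2.9, 1.1, 0.1])
bad_student = np.array([0.1, 0.2, 3.0])
assert distillation_loss(teacher, good_student) < distillation_loss(teacher, bad_student)
```

The temperature softens the teacher's distribution so the student also learns the relative plausibility of wrong answers, which carries more signal than hard labels alone.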
A pure 'pixels in, actions out' model is insufficient for full autonomy. Waymo augments its end-to-end learning with structured, materialized intermediate representations (like objects and road concepts). These provide crucial hooks for scalable simulation, runtime safety validation, closed-loop evaluation, and defining reward functions.
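One concrete benefit of materialized intermediate representations is that they give the system something to validate against at runtime. The toy sketch below (all names hypothetical, a radical simplification of any real stack) shows an interpretable intermediate layer acting as a guardrail that a raw pixels-to-controls model could not offer.

```python
from dataclasses import dataclass, field

@dataclass
class Intermediate:
    """Materialized intermediate outputs of an otherwise end-to-end
    model. Fields are illustrative, not Waymo's actual schema."""
    objects: list = field(default_factory=list)  # e.g. ["pedestrian", "bus"]
    corridor_clear: bool = True                  # is the planned path free?

def validate(action: str, inter: Intermediate) -> str:
    """Runtime guardrail: veto an unsafe proposed action using the
    interpretable intermediate state instead of opaque activations."""
    if action == "accelerate" and ("pedestrian" in inter.objects
                                   or not inter.corridor_clear):
        return "brake"
    return action

assert validate("accelerate", Intermediate(["pedestrian"])) == "brake"
assert validate("accelerate", Intermediate([])) == "accelerate"
```

The same explicit objects can also seed simulations and define reward terms, which is much harder when the model's only output is a control signal.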
The AI's ability to handle novel situations isn't just an emergent property of scale. Waymo actively trains "world models," which are internal generative simulators. This enables the AI to reason about what might happen next, leading to sophisticated behaviors like nudging into intersections or slowing in fog.
Autonomous systems can perceive and react to dangers beyond human capability. In one example, a Cybertruck autonomously accelerated to lessen the impact of an imminent high-speed rear-end collision from a car its human driver had not even seen, showcasing a level of predictive safety that humans cannot replicate and moving beyond simple accident avoidance.
Dmitri Dolgov explains that while AI advancements create hype, they primarily speed up progress on the initial, easier parts of a problem. They don't change the "long tail" of complex, rare edge cases, which remains the core challenge in achieving full, superhuman autonomy.
The winning vehicle in the 2005 DARPA self-driving challenge, led by future Waymo founder Sebastian Thrun, used a clever machine learning approach. It overlaid precise laser sensor data onto a regular video camera feed, teaching the system to recognize the color and texture of "safe" terrain and extrapolate a drivable path far ahead.
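The self-supervised trick described above can be sketched simply: laser-verified near-field pixels provide free "drivable" labels, a color model is fit to them, and far-away camera pixels beyond laser range are classified by color similarity. The code below is a toy illustration under those assumptions, not Stanley's actual pipeline.

```python
import numpy as np

def learn_road_color(near_pixels, laser_is_road):
    """Average the color of near-field pixels the laser confirmed as
    flat, drivable terrain (the self-supervised label source)."""
    return near_pixels[laser_is_road].mean(axis=0)

def extrapolate_drivable(far_pixels, road_color, tol=30.0):
    """Classify far-away camera pixels, beyond laser range, as drivable
    when their color is close enough to the learned road color."""
    dist = np.linalg.norm(far_pixels - road_color, axis=1)
    return dist < tol

# Toy RGB pixels (values are illustrative, not from the actual race data).
near = np.array([[100.0, 100.0, 100.0],   # gray dirt road, laser says flat
                 [105.0, 102.0,  98.0],   # gray dirt road, laser says flat
                 [ 60.0, 150.0,  60.0]])  # green brush, laser says rough
is_road = np.array([True, True, False])
road_color = learn_road_color(near, is_road)

far = np.array([[101.0, 100.0, 100.0],    # looks like road
                [ 55.0, 155.0,  58.0]])   # looks like brush
drivable = extrapolate_drivable(far, road_color)
```

The key design point is that no human labeling is needed: the short-range laser continuously supervises the long-range camera, extending the planning horizon well past the sensors' geometric reach.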