We scan new podcasts and send you the top 5 insights daily.
A pure 'pixels in, actions out' model is insufficient for full autonomy: while easy to start with, it is extremely inefficient to simulate and validate for safety-critical edge cases. Waymo therefore augments its end-to-end learning with structured, intermediate representations (like objects and road concepts), which provide crucial knobs for scalable simulation, safety validation, and defining reward functions.
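The idea above can be sketched as a multi-head network: one shared backbone, with a trajectory head (the 'actions out' path) plus an auxiliary head that exposes structured intermediates a simulator or validator can inspect. This is a toy numpy illustration, not Waymo's architecture; all shapes and names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for illustration: 128-dim scene features in,
# a 10-waypoint trajectory and 5 object-presence logits out.
FEAT, WAYPOINTS, OBJECTS = 128, 10, 5

class DriverWithIntermediates:
    """Toy multi-head model: one shared backbone, two outputs.

    The trajectory head is the end-to-end 'actions out' path; the
    object head surfaces a structured intermediate (detected objects)
    that simulation and safety validation can hook into directly.
    """
    def __init__(self):
        self.backbone = rng.normal(size=(FEAT, 64)) * 0.1
        self.traj_head = rng.normal(size=(64, WAYPOINTS * 2)) * 0.1
        self.obj_head = rng.normal(size=(64, OBJECTS)) * 0.1

    def forward(self, scene_feat):
        h = np.tanh(scene_feat @ self.backbone)
        trajectory = (h @ self.traj_head).reshape(WAYPOINTS, 2)  # (x, y) per step
        object_logits = h @ self.obj_head  # e.g. pedestrian, cyclist, ...
        return trajectory, object_logits

model = DriverWithIntermediates()
traj, objs = model.forward(rng.normal(size=FEAT))
```

The point of the second head is not the driving output itself but the handle it gives you: a reward function or safety check can be written against `object_logits` without reverse-engineering the pixels-to-actions path.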
The move from Waymo's 4th to 5th generation driver was a discontinuous jump. Waymo abandoned smaller, specialized ML models for a single AI backbone trained on a massive, nationwide dataset. This generalizable stack, rather than city-specific tuning, enabled its recent rapid scaling across the US.
Waymo demonstrated that a standard Vision Language Model (VLM) can be fine-tuned to output driving trajectories instead of text. While unsafe for public roads, it drives 'pretty darn well' in normal conditions, showing the surprising generalizability of foundational vision-language understanding.
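Mechanically, "output driving trajectories instead of text" can be as simple as decoding waypoints from the model's token stream. The format below is hypothetical, purely to show the decoding step; the source does not specify Waymo's actual output encoding.

```python
# Hypothetical output format: the fine-tuned VLM emits waypoints as
# plain text, "x,y; x,y; ...", instead of a natural-language answer.
def decode_trajectory(vlm_output: str) -> list[tuple[float, float]]:
    """Parse a semicolon-separated waypoint string into (x, y) pairs."""
    waypoints = []
    for pair in vlm_output.strip().split(";"):
        x, y = (float(v) for v in pair.split(","))
        waypoints.append((x, y))
    return waypoints

path = decode_trajectory("0.0,0.0; 1.2,0.1; 2.5,0.3")
```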
Waymo’s system starts with a large, off-board foundation model that understands the physical world. This is specialized into three high-capacity 'teacher' models: the Driver, the Simulator, and the Critic. The teachers then distill their knowledge into smaller, efficient 'student' models that run in real time on the vehicle, balancing massive off-board compute with on-device constraints.
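The teacher-to-student step is a form of knowledge distillation: the small model is trained on the big model's outputs rather than on raw labels. A minimal numpy sketch, assuming linear models and plain least-squares regression purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy distillation: a large 'teacher' linear map is compressed into a
# smaller 'student' by regressing on the teacher's outputs.
teacher_W = rng.normal(size=(32, 4))               # off-board, high capacity
student_W = np.zeros((8, 4))                       # in-car, low capacity
project = rng.normal(size=(32, 8)) / np.sqrt(32)   # feature compression

X = rng.normal(size=(256, 32))
targets = X @ teacher_W        # teacher predictions act as labels
Xs = X @ project               # student sees compressed features

for _ in range(500):           # plain gradient descent on squared error
    grad = Xs.T @ (Xs @ student_W - targets) / len(X)
    student_W -= 0.05 * grad

err = np.mean((Xs @ student_W - targets) ** 2)
```

The student cannot match the teacher exactly (it has fewer parameters), but it gets close enough to run under real-time, on-device constraints, which is the trade-off the insight describes.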
Rivian's CEO explains that early autonomous systems, which were based on rigid rules-based "planners," have been superseded by end-to-end AI. This new approach uses a large "foundation model for driving" that can improve continuously with more data, breaking through the performance plateau of the older method.
The AI's ability to handle novel situations isn't just an emergent property of scale. Wayve actively trains "world models," which are internal generative simulators. This enables the AI to reason about what might happen next, leading to sophisticated behaviors like nudging into intersections or slowing in fog.
Instead of simulating photorealistic worlds, robotics firm Flexion trains its models on simplified, abstract representations. For example, it uses perception models like Segment Anything to 'paint' a door red and its handle green. By training on this simplified abstraction, the robot learns the core task (opening doors) in a way that generalizes across all real-world doors, bypassing the need for perfect simulation.
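The 'painting' step amounts to replacing photorealistic pixels with a flat, task-relevant color per segment. A toy numpy version, with a hand-made mask standing in for the output of a perception model like Segment Anything (the 4x4 grid and color choices are illustrative only):

```python
import numpy as np

# Flat colors for the abstraction: door panel red, handle green,
# everything else gray.
GRAY, RED, GREEN = (128, 128, 128), (255, 0, 0), (0, 255, 0)

def paint_abstraction(mask: np.ndarray) -> np.ndarray:
    """mask: HxW ints (0=background, 1=door, 2=handle) -> HxWx3 image."""
    palette = np.array([GRAY, RED, GREEN], dtype=np.uint8)
    return palette[mask]   # numpy integer indexing maps each id to a color

mask = np.zeros((4, 4), dtype=int)
mask[1:4, 1:3] = 1   # door panel
mask[2, 2] = 2       # handle
abstract_img = paint_abstraction(mask)
```

Every real door, whatever its texture or lighting, collapses to the same red-panel/green-handle image, which is exactly why a policy trained on the abstraction transfers across doors.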
Instead of using traditional, rule-based simulators, Comma AI trains its driving agent inside a learned "world model." This generative model creates photorealistic, diverse driving scenarios and, crucially, responds accurately to the agent's simulated actions—a key requirement for effective robotics training.
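The key property named above (the simulator responds to the agent's actions) is what closes the training loop. A toy closed-loop rollout, with a linear function standing in for the learned world model; nothing here reflects Comma's actual models:

```python
import numpy as np

rng = np.random.default_rng(2)

def world_model(obs, action):
    """Stand-in for a learned generative simulator: the next
    observation depends on the agent's action, not just on time."""
    return 0.9 * obs + 0.5 * action + rng.normal(scale=0.01, size=obs.shape)

def agent_policy(obs):
    return -0.4 * obs   # toy policy: steer the state toward zero

obs = np.ones(3)
for _ in range(50):     # the entire rollout happens inside the world model
    obs = world_model(obs, agent_policy(obs))
```

If the simulator ignored `action`, the agent could never learn that its choices matter; action-conditioned dynamics are what make a learned world model usable for training rather than just for generating video.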
Comma AI's architecture is "end-to-end," meaning its model takes raw video and directly outputs driving commands like acceleration and steering angle. This avoids the traditional, more brittle pipeline of separately detecting lanes, traffic lights, and other objects as intermediate steps before planning a path.
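In code, "end-to-end" means the interface is literally frames in, control scalars out, with no lane or object detection stage in between. A minimal sketch with invented shapes (not Comma's real model):

```python
import numpy as np

rng = np.random.default_rng(3)

H, W, C = 32, 64, 3   # illustrative frame size

class EndToEndPolicy:
    """Raw frame -> (acceleration, steering), no intermediate
    detection pipeline."""
    def __init__(self):
        self.W1 = rng.normal(size=(H * W * C, 16)) * 0.01
        self.W2 = rng.normal(size=(16, 2)) * 0.1   # -> (accel, steer)

    def act(self, frame: np.ndarray) -> tuple[float, float]:
        h = np.maximum(frame.reshape(-1) @ self.W1, 0.0)  # ReLU features
        accel, steer = h @ self.W2
        return float(accel), float(steer)

policy = EndToEndPolicy()
accel, steer = policy.act(rng.random((H, W, C)))
```

The contrast with the brittle pipeline is in the signature: there is no `detect_lanes()` or `classify_lights()` call whose failure could silently poison the planner downstream.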