Off-the-Shelf Vision Language Models Can Learn to Drive Nominally

Related Insights

Waymo’s Rapid Scaling Was Unlocked by a Unified AI Backbone

The move from Waymo's 4th to 5th generation driver was a discontinuous jump. Waymo abandoned smaller, specialized ML models for a single AI backbone trained on a massive, nationwide dataset. This generalizable stack, rather than city-specific tuning, enabled its recent rapid scaling across the US.

From Models to Mobility: Building Waymo with Dmitri Dolgov

The a16z Show·3 months ago

Vision-Language-Action (VLA) Models Are an Emerging S-Curve for Robotics

A key trend to watch is the rise of Vision-Language-Action (VLA) models, which are critical for robotics. These models take an instruction (language), understand a scene (vision), and then manipulate the environment (action). This represents a new paradigm that combines "read" and "write" access to the physical world, often requiring edge-ready compute.

Training the AIs' Eyes: How Roboflow is Making the Real World Programmable, with CEO Joseph Nelson

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·3 months ago

Waymo's AI Uses a Foundation Model to Train Specialized 'Teacher' Models

Waymo’s system starts with a large, off-board foundation model that understands the physical world. This is specialized into three 'teacher' models: the Driver, the Simulator, and the Critic. These teachers then train smaller, efficient 'student' models that run in the vehicle.

From Models to Mobility: Building Waymo with Dmitri Dolgov

The a16z Show·3 months ago

Waymo Augments Its End-to-End AI with Intermediate Representations for Simulation

A pure 'pixels in, actions out' model is insufficient for full autonomy. Waymo augments its end-to-end learning with structured, intermediate representations (like objects and road concepts). This provides crucial knobs for scalable simulation, safety validation, and defining reward functions.

From Models to Mobility: Building Waymo with Dmitri Dolgov

The a16z Show·3 months ago

Autonomous Driving Has Shifted From Brittle "Rules-Based" Systems to Trainable AI Models

Rivian's CEO explains that early autonomous systems, which were based on rigid rules-based "planners," have been superseded by end-to-end AI. This new approach uses a large "foundation model for driving" that can improve continuously with more data, breaking through the performance plateau of the older method.

Rivian CEO: 'We're really convicted' about skipping Carplay

Decoder with Nilay Patel·9 months ago

Pure End-to-End AV Models Fail Due to Simulation and Validation Challenges

A pure "pixels-in, actions-out" model is insufficient for full autonomy. Although such a model is easy to get started with, it is extremely inefficient to simulate and validate for safety-critical edge cases. Waymo augments its end-to-end system with intermediate representations (like objects and road signs) to make simulation and validation tractable.

The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo

Cheeky Pint·4 months ago

Wayve Uses Language Models Not Just for Driving, But for User Interaction and Diagnostics

Wayve integrates Vision-Language-Action models (VLAs) to create a conversational interface for the car. This allows users to talk to the AI chauffeur ("drive faster") and gives engineers a powerful introspection tool: they can ask the system why it made a certain decision, demystifying its reasoning.

How End-to-End Learning Created Autonomous Driving 2.0: Wayve CEO Alex Kendall

Training Data·8 months ago

Waymo's AI Architecture Uses Off-Board 'Teachers' to Train On-Device 'Student' Models

Waymo uses a foundation model to create specialized, high-capacity "teacher" models (Driver, Simulator, Critic) offline. These teachers then distill their knowledge into smaller, efficient "student" models that can run in real-time on the vehicle, balancing massive computational power with on-device constraints.

The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo

Cheeky Pint·4 months ago
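
The teacher-to-student step in the card above is commonly implemented as knowledge distillation. The sketch below shows the generic, textbook form of that idea (a temperature-softened KL divergence between teacher and student outputs). It is an illustration only: the source does not describe Waymo's actual training objective, and the function names, the three-way action example, and the temperature value are all assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: a generic distillation objective, not a
    description of any production system.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl

# Hypothetical scores over three discrete actions: brake, coast, accelerate.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

A student whose outputs track the teacher's gets a smaller loss than one that disagrees, which is the signal that lets a small on-vehicle model inherit behavior from a much larger off-board one.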

Sebastian Thrun's 2005 DARPA-Winning Car Used Machine Learning to See Safe Paths

The winning vehicle in the 2005 DARPA self-driving challenge, led by future Waymo founder Sebastian Thrun, used a clever machine learning approach. It overlaid precise laser sensor data onto a regular video camera feed, teaching the system to recognize the color and texture of "safe" terrain and extrapolate a drivable path far ahead.

Google: The AI Company

Acquired·9 months ago
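
The laser-to-camera idea in the card above can be sketched as a small self-supervised classifier: pixels that the laser has already confirmed as safe become training examples for a color model, which is then applied to pixels beyond laser range. This is a deliberately simplified illustration, assuming a single per-channel Gaussian over RGB values; the helper names, sample pixel values, and threshold are hypothetical and do not reproduce the 2005 vehicle's actual software.

```python
def fit_color_model(safe_pixels):
    """Fit a per-channel mean and variance to RGB pixels labeled safe by the laser."""
    n = len(safe_pixels)
    mean = [sum(p[c] for p in safe_pixels) / n for c in range(3)]
    # A small constant keeps the variance strictly positive.
    var = [
        sum((p[c] - mean[c]) ** 2 for p in safe_pixels) / n + 1e-6
        for c in range(3)
    ]
    return mean, var

def is_drivable(pixel, model, threshold=9.0):
    """Accept a pixel whose variance-normalized squared distance is small."""
    mean, var = model
    d2 = sum((pixel[c] - mean[c]) ** 2 / var[c] for c in range(3))
    return d2 <= threshold

# Hypothetical near-field pixels the laser has confirmed as flat road.
laser_confirmed = [(120, 110, 100), (124, 112, 98), (118, 108, 102), (122, 111, 101)]
model = fit_color_model(laser_confirmed)

far_road_pixel = (121, 110, 100)   # similar color, beyond laser range
far_brush_pixel = (30, 160, 40)    # green vegetation

road_ok = is_drivable(far_road_pixel, model)
brush_ok = is_drivable(far_brush_pixel, model)
```

Because the laser supplies the labels continuously, a model of this kind can be refit as the terrain changes, without any hand-labeled images.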

Comma AI Skips Explicit Object Detection for a Direct End-to-End Driving Model

Comma AI's architecture is "end-to-end," meaning its model takes raw video and directly outputs driving commands like acceleration and steering angle. This avoids the traditional, more brittle pipeline of separately detecting lanes, traffic lights, and other objects as intermediate steps before planning a path.

Open Source Self-Driving with Comma AI

Practical AI·3 months ago
