The key difference between AV 1.0 and AV 2.0 isn't just using deep learning. Many legacy systems use DL for individual components like perception. The revolutionary AV 2.0 approach replaces the entire modular stack and its hand-coded interfaces with one unified, data-driven neural network.
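To make the architectural contrast concrete, here is a minimal Python/PyTorch sketch. The `EndToEndDriver` class, its layer sizes, and the two-value control output are illustrative assumptions, not any vendor's actual stack: AV 1.0 chains separately engineered components through hand-coded interfaces, while AV 2.0 maps raw sensor frames to driving commands with one learned model.

```python
import torch
import torch.nn as nn

# AV 1.0 (for contrast): perception -> tracking -> prediction -> planning -> control,
# each a separate component joined by hand-coded interfaces; only some stages use DL.

# AV 2.0: a single learned mapping from raw sensor input to driving commands.
class EndToEndDriver(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=4), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(2)  # outputs: [steering, acceleration]

    def forward(self, frames):        # frames: (batch, 3, H, W) camera images
        return self.head(self.encoder(frames))

driver = EndToEndDriver()
controls = driver(torch.randn(1, 3, 128, 256))  # one forward pass: pixels -> controls
```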
Overly structured, workflow-based systems that work with today's models will become bottlenecks tomorrow. Engineers must be prepared to shed abstractions and rebuild simpler, more general systems to capture the gains from exponentially improving models.
Don't just sprinkle AI features onto your existing product ('AI at the edge'). Transformative companies rethink workflows and shrink their old codebase, making the LLM a core part of the product. This is about re-architecting from the ground up, not just enhancing what already exists.
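A toy sketch of the difference, with a stubbed `llm()` helper and hypothetical ticket-handling functions standing in for whatever the real product does:

```python
def llm(prompt: str) -> str:
    return "stubbed model response"   # stand-in for any real chat-completion call

def legacy_rules_classify(ticket):    # stand-in for hundreds of hand-written rules
    return "billing"

def legacy_template_engine(category, ticket):
    return f"[{category}] canned reply for ticket {ticket['id']}"

# "AI at the edge": the old pipeline stays intact; the LLM only polishes the output.
def handle_ticket_legacy(ticket):
    category = legacy_rules_classify(ticket)
    reply = legacy_template_engine(category, ticket)
    return llm(f"Polish the tone of this reply:\n{reply}")

# AI-native: the LLM is the workflow; the rules and templates shrink away.
def handle_ticket_ai_native(ticket):
    return llm(
        "You are a support agent. Classify this ticket and draft the full reply, "
        f"citing policy where relevant:\n{ticket['body']}"
    )

print(handle_ticket_ai_native({"id": 1, "body": "I was charged twice this month."}))
```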
Today's AI is largely text-based (LLMs). The next phase involves Vision-Language Models (VLMs) that interpret and interact with the physical world for robotics and surgery. This transition requires a 50-1000x increase in compute power, which underpins the long-term AI infrastructure build-out.
The latest Full Self-Driving release likely replaces traditional `if-then` code with a pure neural network. This leap in performance comes at the cost of human auditability, since no one can fully trace *how* the AI makes its life-or-death decisions, marking a profound shift in how software is built.
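As an illustration of that trade-off (not Tesla's actual code), compare an auditable rule with an opaque learned policy; the `rule_based_target_speed` function and the tiny `policy` network are invented for the example:

```python
import torch
import torch.nn as nn

# Legacy style: explicit, auditable if-then driving logic.
def rule_based_target_speed(lead_gap_m: float, speed_limit_kph: float) -> float:
    if lead_gap_m < 10:
        return 0.0                        # too close: brake
    if lead_gap_m < 30:
        return speed_limit_kph * 0.5      # follow cautiously
    return speed_limit_kph                # cruise at the limit

# AV 2.0 style: the same decision emerges from learned weights; behavior can only
# be observed at the outputs, not read off from inspectable branches.
policy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
features = torch.tensor([[12.0, 100.0]])   # [lead_gap_m, speed_limit_kph]
target_speed = policy(features)            # untrained here, so the value is arbitrary
```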
Rivian's CEO explains that early autonomous systems, built around rigid, rules-based "planners," have been superseded by end-to-end AI. This new approach uses a large "foundation model for driving" that improves continuously with more data, breaking through the performance plateau of the older method.
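A toy sketch of that data flywheel, with `collect_fleet_clips` and the commented-out retraining step as hypothetical placeholders; the point is that the same model keeps improving as the dataset grows, which a fixed rule set cannot do:

```python
import random

def collect_fleet_clips(n):
    # Stand-in for logged (observation, expert_action) driving segments.
    return [(random.random(), random.random()) for _ in range(n)]

dataset = []
for release in range(1, 4):
    dataset += collect_fleet_clips(10_000)        # more fleet data every release cycle
    # retrain_driving_foundation_model(dataset)   # hypothetical training step
    print(f"release {release}: {len(dataset):,} training samples")
```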
Wayve integrates Vision-Language-Action models (VLAs) to create a conversational interface for the car. This lets users talk to the AI chauffeur ("drive faster") and gives engineers a powerful introspection tool: they can ask the system why it made a certain decision, demystifying its reasoning.
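What such an interface could look like is sketched below; the `ConversationalDriver` wrapper, `VLAResponse` type, and `fake_vla` stub are hypothetical, not Wayve's API:

```python
from dataclasses import dataclass

@dataclass
class VLAResponse:
    action: str
    explanation: str

class ConversationalDriver:
    """Hypothetical wrapper around a vision-language-action driving model."""

    def __init__(self, vla_model):
        self.vla = vla_model          # stand-in for the real multimodal model
        self.last = None

    def command(self, utterance: str, frame) -> VLAResponse:
        # Passenger instruction plus the current scene go in; an action and a
        # natural-language rationale come back out.
        self.last = self.vla(frame=frame, instruction=utterance)
        return self.last

    def why(self) -> str:
        # Engineers can interrogate the model's own account of its last decision.
        return self.last.explanation if self.last else "no decision yet"

def fake_vla(frame, instruction):
    return VLAResponse("increase_speed", f"Passenger asked '{instruction}'; road ahead is clear.")

car = ConversationalDriver(fake_vla)
car.command("drive a bit faster", frame=None)
print(car.why())
```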
Incumbents face the innovator's dilemma; they can't afford to scrap existing infrastructure for AI. Startups can build "AI-native" from a clean sheet, creating a fundamental advantage that legacy players can't replicate by just bolting on features.
Wayve's core strategy is generalization. By training a single, large AI on diverse global data, vehicles, and sensor sets, they can adapt to new cars and countries in months, not years. This avoids the AV 1.0 pitfall of building bespoke, infrastructure-heavy solutions for each new market.
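A schematic of that strategy, assuming a hypothetical `GeneralDriver` foundation model and `VehicleConfig` descriptor; onboarding a new platform becomes an adaptation step rather than a rebuild:

```python
from dataclasses import dataclass

@dataclass
class VehicleConfig:
    cameras: int
    layout: str        # e.g. "surround-6" vs "forward-3"
    country: str       # local driving conventions come from data, not a new stack

class GeneralDriver:
    """One shared driving model; new platforms get adapters, not bespoke rebuilds."""

    def __init__(self, foundation_weights):
        self.weights = foundation_weights      # trained once on diverse global data

    def adapt(self, config: VehicleConfig, local_clips: int):
        # Illustrative: a lightweight fine-tune on a new sensor set and region,
        # rather than re-engineering per-city maps and infrastructure.
        print(f"Adapting to a {config.cameras}-camera {config.layout} platform "
              f"in {config.country} using {local_clips:,} local clips")
        return self

GeneralDriver("global-weights").adapt(VehicleConfig(6, "surround-6", "UK"), local_clips=50_000)
```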
IBM's CEO explains that previous deep learning models were "bespoke and fragile," requiring massive, costly human labeling for single tasks. LLMs are an industrial-scale unlock because they eliminate this labeling step, making them vastly faster and cheaper to tune and deploy across many tasks.
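A minimal sketch of that shift, using a stubbed `llm()` call in place of any real model: the task definition lives in the prompt, and the labeled-data-plus-bespoke-classifier step disappears.

```python
# Old approach (for contrast): one task = one labeled dataset = one bespoke model.
#   1. Pay annotators to label tens of thousands of emails by intent.
#   2. Train a task-specific classifier; repeat from scratch for the next task.

# LLM approach: the same task becomes a prompt, with no labeling pass.
def llm(prompt: str) -> str:
    return "billing_question"   # stand-in for a call to any hosted or local LLM

def classify_intent(email_body: str) -> str:
    return llm(
        "Classify this customer email into one of: billing_question, bug_report, "
        f"cancellation, other. Reply with the label only.\n\n{email_body}"
    )

print(classify_intent("I was charged twice this month, please refund one payment."))
```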
Unlike older robots, which required precise maps and hand-computed trajectories, new robots draw on internet-scale common sense and learn motion by mimicking human demonstrations or simulated ones. This combination has "wiped the slate clean" for what is possible in the field.
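A minimal behavior-cloning sketch in PyTorch illustrates the "learn by mimicking" idea; the observation and action tensors below are random stand-ins for recorded human demonstrations:

```python
import torch
import torch.nn as nn

# Minimal behavior cloning: learn motion by imitating recorded human demonstrations
# instead of hand-computing trajectories from a precise map.
obs = torch.randn(256, 12)            # stand-in for robot observations (joint angles, etc.)
human_actions = torch.randn(256, 4)   # stand-in for the human teleoperator's commands

policy = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 4))
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    loss = nn.functional.mse_loss(policy(obs), human_actions)  # "do what the human did"
    optim.zero_grad()
    loss.backward()
    optim.step()
```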