With nearly a quarter-trillion annual car trips in the US, even a system with 99.9% accuracy would generate hundreds of millions of incorrect results each year. These errors would predominantly affect sober drivers, creating significant public frustration and logistical nightmares that could hinder adoption.
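The scale is easy to check with back-of-envelope arithmetic, using the trip count and accuracy figure quoted above (not official NHTSA data):

```python
# Rough estimate: errors per year = trips x per-trip error rate.
annual_trips = 250e9      # ~a quarter-trillion US car trips per year (as cited)
accuracy = 0.999          # 99.9% per-trip accuracy

incorrect_results = annual_trips * (1 - accuracy)
print(f"{incorrect_results:,.0f} incorrect results per year")  # ~250 million
```

Even "three nines" of accuracy leaves roughly 250 million wrong decisions annually at this volume.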
The law mandating advanced drunk driving prevention in new cars allows for delays. The National Highway Traffic Safety Administration (NHTSA) will only issue a binding mandate when the technology is proven ready, which it currently is not, making the 2027 date a soft target.
Traditional vehicle safety (e.g., Euro NCAP) used a checklist of specific test cases with binary pass/fail answers. For AI systems, this is insufficient. The new paradigm is statistical validation, where the goal is to prove reliability to a certain number of "nines" across a vast range of scenarios.
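One way to see why statistical validation is so demanding: the standard zero-failure binomial bound gives the number of consecutive failure-free trials needed before you can claim a given number of nines. The helper below (`trials_for_nines` is a hypothetical name, not from any standard) sketches that calculation:

```python
import math

def trials_for_nines(nines: int, confidence: float = 0.95) -> int:
    """Failure-free trials needed to claim a failure rate below 10**-nines
    at the given confidence (zero-failure binomial bound)."""
    p = 10.0 ** -nines
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

for n in range(2, 6):
    print(f"{n} nines: ~{trials_for_nines(n):,} failure-free trials")
```

Each additional nine multiplies the required evidence by roughly ten: a two-nines claim takes a few hundred clean trials, while a five-nines claim takes on the order of 300,000, which is why a binary checklist cannot substitute for statistical validation.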
While more data seems better, comprehensive imaging scans can be problematic. Each measurement carries a false positive risk, so the cumulative probability of receiving a disruptive, incorrect result becomes material, leading to unnecessary stress and follow-up procedures.
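The cumulative risk compounds in a simple way: assuming independent measurements and an illustrative 1% false-positive rate per measurement (real rates vary by test), the chance of at least one incorrect result grows quickly with the number of measurements:

```python
def p_any_false_positive(n_measurements: int, fpr: float = 0.01) -> float:
    """Probability of at least one false positive across independent
    measurements, each with false-positive rate `fpr` (assumed value)."""
    return 1 - (1 - fpr) ** n_measurements

print(round(p_any_false_positive(100), 3))  # ~0.634
```

At 100 measurements, a patient is more likely than not to receive at least one spurious finding, even though each individual test looks highly reliable.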
A key risk in deploying AI is its inability to generalize to 'long-tail' or out-of-distribution events. Models trained on vast but finite data often fail when encountering novel situations common in the open-ended real world, such as a self-driving car mistaking a stop sign on a billboard for a real one.
Even with available AI detection software, professors are hesitant to take punitive action like failing a student. The risk of even a small number of false positives is too high, making anything less than perfect reliability unusable for accountability.
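The professors' hesitation follows from the same compounding logic. Assuming, for illustration, a 1% false-positive rate for a detector (actual rates vary widely by tool and writing style) and a class of 30 honest students:

```python
# Hypothetical numbers: detector false-positive rate and class size
# are assumptions for illustration, not measured values.
fpr = 0.01
honest_students = 30

expected_false_flags = fpr * honest_students
p_at_least_one = 1 - (1 - fpr) ** honest_students
print(f"expected false flags per class: {expected_false_flags:.2f}")
print(f"chance of wrongly flagging at least one student: {p_at_least_one:.0%}")
```

Roughly a one-in-four chance of falsely accusing someone in every class is far too high a cost when the penalty is failing a student, which is why imperfect detectors are unusable for accountability.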
Drawing from his Tesla experience, Karpathy warns of a massive "demo-to-product gap" in AI. Getting a demo to work 90% of the time is easy. But achieving the reliability needed for a real product is a "march of nines," where each additional 9 of accuracy requires a constant, enormous effort, explaining long development timelines.
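The "march of nines" can be sketched numerically: each added nine cuts the failure rate tenfold, but if each nine costs roughly constant engineering effort (the premise above), total effort grows linearly while visible progress shrinks exponentially:

```python
# Illustrative only: maps each count of nines to its reliability and
# residual failure rate; the "constant effort per nine" framing is
# Karpathy's claim, not something this arithmetic proves.
for nines in range(1, 6):
    failure_rate = 10.0 ** -nines
    print(f"{nines} nine(s): reliability {1 - failure_rate:.5f}, "
          f"~{failure_rate * 1_000_000:,.0f} failures per million runs")
```

Going from 90% to 99% looks like a 9-point gain; going from 99.99% to 99.999% looks like nothing on a dashboard, yet demands comparable work, which is why product timelines stretch long after the demo works.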
A technology like Waymo's self-driving cars could be statistically safer than human drivers yet still be rejected by the public. Society is unwilling to accept thousands of deaths directly caused by a single corporate algorithm, even if it represents a net improvement over the chaotic, decentralized risk of human drivers.
A pre-drive lockout system, while well-intentioned, fails to account for nuanced emergencies. For instance, it could prevent a driver who has had alcohol from evacuating during a tsunami warning, raising serious ethical and safety questions about rigid, automated decision-making.
Achieving near-perfect AV reliability (99.999%) is exponentially harder than getting to 99%. This final push involves solving countless subtle, city-specific issues, from differing traffic-light colors and curb heights to unique local sounds, such as emergency sirens, that vehicles must recognize.
The public holds new technologies to a much higher safety standard than human performance. Waymo could deploy cars that are statistically safer than human drivers, but society would not accept them killing tens of thousands of people annually, even if it's an improvement. This demonstrates the need for near-perfection in high-stakes tech launches.