
A safe AGI deployment requires many independent factors to succeed simultaneously: trustworthy actors, robust security, solved alignment, and more. In contrast, disaster can result from a failure in any single one of these areas. This "disjunctive" nature of failure makes a bad outcome highly probable.
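The arithmetic behind this asymmetry is simple to sketch. The factor names and probabilities below are purely illustrative assumptions, not figures from the source: even if each independent requirement succeeds 90% of the time, the chance that all of them succeed at once is much lower.

```python
# Illustrative only: hypothetical factors with assumed 90% success each.
# Safe deployment is conjunctive (ALL must succeed), so failure is
# disjunctive (ANY single miss produces a bad outcome).
factors = {
    "trustworthy_actors": 0.9,
    "security": 0.9,
    "alignment": 0.9,
    "governance": 0.9,
    "deployment_care": 0.9,
}

p_success = 1.0
for p in factors.values():
    p_success *= p  # joint probability, assuming independence

p_failure = 1 - p_success
print(f"P(all succeed) = {p_success:.3f}")  # 0.590
print(f"P(any failure) = {p_failure:.3f}")  # 0.410
```

With five independent 90%-reliable requirements, the odds of at least one failure are already about 41%, and they climb quickly as more requirements or lower per-factor reliability are assumed.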

Related Insights

The development of superintelligence is unique because the first major alignment failure may also be the last. Unlike other scientific fields, where failure leads to learning, an alignment failure with a superintelligence could eliminate humanity, precluding any opportunity to try again.

Emmett Shear argues that even a successfully 'solved' technical alignment problem creates an existential risk. A super-powerful tool that perfectly obeys human commands is dangerous because humans lack the wisdom to wield that power safely. Our own flawed and unstable intentions become the source of danger.

The field of AI safety is described as "the business of black swan hunting." The most significant real-world risks that have emerged, such as AI-induced psychosis and obsessive user behavior, were largely unforeseen just years ago, while widely predicted sci-fi threats like bioweapons have not materialized.

The property rights argument for AI safety hinges on an ecosystem of multiple, interdependent AIs. The strategy breaks down in a scenario where a single AI achieves a rapid, godlike intelligence explosion. Such an entity would be self-sufficient and could expropriate everyone else without consequence, as it wouldn't need to uphold the system.

Having AIs that provide perfect advice doesn't guarantee good outcomes. Humanity is susceptible to coordination problems, where everyone can see a bad outcome approaching but is collectively unable to prevent it. Aligned AIs can warn us, but they cannot force cooperation on a global scale.
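The coordination failure described here has a standard game-theoretic shape. The payoff numbers below are made-up assumptions for illustration: even when every player has perfect information (perfect advice), defecting can dominate cooperating, so the bad outcome arrives anyway.

```python
# Hedged sketch of an n-player collective action game with invented
# payoffs: cooperating costs 3, and every cooperator adds 1 to the
# shared benefit that all n players receive.
def payoff(my_action, num_cooperators, n=10):
    """Payoff for one player given how many players cooperate
    (num_cooperators includes this player if they cooperate)."""
    benefit = num_cooperators
    cost = 3 if my_action == "cooperate" else 0
    return benefit - cost

# Whatever k others do, defecting beats cooperating for the individual:
for k in range(10):
    assert payoff("defect", k) > payoff("cooperate", k + 1)

print("Defection dominates for every k, despite full information.")
```

Yet all-cooperate beats all-defect collectively: with these assumed payoffs, universal cooperation yields 10 − 3 = 7 per player versus 0 for universal defection. Everyone can see the better outcome; no one can unilaterally reach it, which is exactly the gap that perfect advice alone cannot close.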

The most pressing AI safety issues today, like 'GPT psychosis' or AI companions impacting birth rates, were not the doomsday scenarios predicted years ago. This shows the field involves reacting to unforeseen 'unknown unknowns' rather than just solving for predictable, sci-fi-style risks, making proactive defense incredibly difficult.

A key failure mode for using AI to solve AI safety is an 'unlucky' development path where models become superhuman at accelerating AI R&D before becoming proficient at safety research or other defensive tasks. This could create a period where we know an intelligence explosion is imminent but are powerless to use the precursor AIs to prepare for it.

The benchmark for AI reliability isn't 100% perfection. It's simply being better than the inconsistent, error-prone humans it augments. Since human error is the root cause of most critical failures (like cyber breaches), this is an achievable and highly valuable standard.

The current approach to AI safety involves identifying and patching specific failure modes (e.g., hallucinations, deception) as they emerge. This "leak by leak" approach fails to address the fundamental system dynamics, allowing overall pressure and risk to build continuously, leading to increasingly severe and sophisticated failures.

The AI safety community fears losing control of AI. However, achieving perfect control of a superintelligence is equally dangerous. It grants godlike power to flawed, unwise humans. A perfectly obedient super-tool serving a fallible master is just as catastrophic as a rogue agent.