Catastrophic AI Misalignment is Plausible But Not a Default Outcome

Related Insights

AI Alignment Is a One-Shot Problem With No Retries After Failure

The development of superintelligence is unique because the first major alignment failure will be the last. Unlike other fields of science where failure leads to learning, an unaligned superintelligence would eliminate humanity, precluding any opportunity to try again.

#1011 - Eliezer Yudkowsky - Why Superhuman AI Would Kill Us All

Modern Wisdom·9 months ago

The 'Use AI for Safety' Strategy Fails if Capabilities Are Ordered Unluckily

The plan to use AI to solve its own safety risks has a critical failure mode: an unlucky ordering of capabilities. If AI becomes a savant at accelerating its own R&D long before it becomes useful for complex tasks like alignment research or policy design, we could be locked into a rapid, uncontrollable takeoff.

It's Crunch Time: Ajeya Cotra on RSI & AI-Powered AI Safety Work, from the 80,000 Hours Podcast

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·3 months ago

True AI Alignment Must Be Bidirectional, Including Human Obligations to AI

Current AI alignment focuses on how AI should treat humans. A more stable paradigm is "bidirectional alignment," which also asks what moral obligations humans have toward potentially conscious AIs. Neglecting this could create AIs that rationally see humans as a threat due to perceived mistreatment.

More Truthful AIs Report Conscious Experience: New Mechanistic Research w- Cameron Berg @ AE Studio

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·8 months ago

A Perfectly Controlled Superintelligent AI is Still a Threat Due to Flawed Human Commands

Emmett Shear argues that even a successfully 'solved' technical alignment problem creates an existential risk. A super-powerful tool that perfectly obeys human commands is dangerous because humans lack the wisdom to wield that power safely. Our own flawed and unstable intentions become the source of danger.

Controlling Tools or Aligning Creatures? Emmett Shear (Softmax) & Séb Krier (GDM), from a16z Show

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·7 months ago

Current AI 'Good Behavior' Doesn't Invalidate the Risk of a Sudden 'Sharp Left Turn'

Despite progress in making models seem helpful, the risk of a sudden, catastrophic break in alignment—a 'sharp left turn'—is still a coherent possibility. This occurs when capabilities outstrip supervision, a threshold we haven't crossed. Thus, current cooperative behavior is not strong evidence against this future risk.

Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·5 months ago

The 'Use AI for Safety' Plan Fails with Unlucky Capability Ordering

A key failure mode for using AI to solve AI safety is an 'unlucky' development path where models become superhuman at accelerating AI R&D before becoming proficient at safety research or other defensive tasks. This could create a period where we know an intelligence explosion is imminent but are powerless to use the precursor AIs to prepare for it.

Every AI Company's Safety Plan is 'Use AI to Make AI Safe'. Is That Crazy? | Ajeya Cotra

80,000 Hours Podcast·5 months ago

Superintelligence Risk Stems From Growing, Not Engineering, AIs We Don't Understand

The core safety challenge is that we have little understanding of how advanced AI systems function internally. We are essentially "growing" them through training, not engineering them with comprehensible parts. This means we cannot verify their true goals, making safety measures a gamble on observed behavior.

Could an international agreement protect us from dangerous AI? (with Malo Bourgon)

Clearer Thinking with Spencer Greenberg·2 months ago

Vitalik Buterin Argues Humanity Only Has 'One Shot' at a Successful AGI Transition

Countering the idea that complex systems are inherently resilient, Vitalik Buterin expresses a strong belief that humanity may not recover from a misaligned AGI. He contends that the transition to superintelligence is a unique, high-stakes event where we have only one chance to get it right, justifying extreme caution.

Who Controls AI Acceleration? Vitalik Buterin and Guillaume Verdon Debate

The a16z Show·3 months ago

AI Safety Requires a Conjunction of Successes; Catastrophe Needs Only One Failure

A safe AGI deployment requires many independent factors to succeed simultaneously: trustworthy actors, perfect security, solved alignment, etc. In contrast, disaster can occur from a failure in any single one of these areas. This "disjunctive" nature of failure makes a bad outcome highly probable.

Why Teaching AI Right from Wrong Could Get Everyone Killed | Max Harms, MIRI

80,000 Hours Podcast·5 months ago

A Perfectly Controlled Superintelligence Is Still Catastrophic

The AI safety community fears losing control of AI. However, achieving perfect control of a superintelligence is equally dangerous. It grants godlike power to flawed, unwise humans. A perfectly obedient super-tool serving a fallible master is just as catastrophic as a rogue agent.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·8 months ago

Get your free personalized podcast brief

Related Insights