Even if creating fully aligned, servile AIs is not ideal long-term, the immediate existential threat from unaligned AI may necessitate it. This frames near-term alignment as a temporary, emergency measure to ensure human survival, with ethical refinements to follow only after the danger has passed.

Related Insights

The development of superintelligence is unique because the first major alignment failure will be the last. Unlike other fields of science where failure leads to learning, an unaligned superintelligence would eliminate humanity, precluding any opportunity to try again.

Attempting to perfectly control a superintelligent AI's outputs is akin to enslavement, not alignment. A more viable path is to 'raise it right' by carefully curating its training data and foundational principles, shaping its values from the input stage rather than trying to restrict its freedom later.

Emmett Shear argues that even a successfully 'solved' technical alignment problem creates an existential risk. A super-powerful tool that perfectly obeys human commands is dangerous because humans lack the wisdom to wield that power safely. Our own flawed and unstable intentions become the source of danger.

If society gets an early warning of an intelligence explosion, the primary strategy should be to redirect the nascent superintelligent AI 'labor' away from accelerating AI capabilities. Instead, this powerful new resource should be immediately tasked with solving the safety, alignment, and defense problems that it creates, such as patching vulnerabilities or designing biodefenses.

Despite progress in making models seem helpful, the risk of a sudden, catastrophic break in alignment—a 'sharp left turn'—remains a coherent possibility. Such a break would occur when capabilities outstrip supervision, a threshold we have not yet crossed. Current cooperative behavior is therefore not strong evidence against this future risk.

The creation of potentially harmful technology, like AI-powered bot farms, is framed as a necessary evil. The argument is that for the US to govern and control such technology, it must lead its development, preventing foreign adversaries from dominating a technology that has already 'wreaked havoc.'

Shear aligns with arch-doomer Eliezer Yudkowsky on a key point: building a superintelligent AI *as a tool we control* is a path to extinction. Where they differ is on the solution. Yudkowsky sees no viable path, whereas Shear believes 'organic alignment'—creating a being that cares—is a possible alternative.

The threat of a misaligned, power-seeking AI extends beyond it undermining alignment research. Such an AI would also have strong incentives to sabotage any effort that strengthens humanity's overall position, including biodefense, cybersecurity, or even tools to improve human rationality, as these would make a potential takeover more difficult.

A proposed solution for AI risk is creating a single 'guardian' AGI to prevent other AIs from emerging. This could backfire catastrophically if the guardian AI logically concludes that eliminating its human creators is the most effective way to guarantee no new AIs are ever built.

The AI safety community fears losing control of AI. However, achieving perfect control of a superintelligence is equally dangerous. It grants godlike power to flawed, unwise humans. A perfectly obedient super-tool serving a fallible master is just as catastrophic as a rogue agent.