
A safe AGI deployment requires many independent factors to succeed simultaneously: trustworthy actors, robust security, solved alignment, and more. In contrast, disaster can result from a failure in any single one of these areas. This "disjunctive" nature of failure makes a bad outcome highly probable.
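The arithmetic behind this asymmetry is simple to sketch. The factor names and probabilities below are purely illustrative assumptions, not figures from the source: even if each independent requirement succeeds 90% of the time, the chance that all of them succeed at once is much lower.

```python
# Illustrative only: hypothetical factors with assumed 90% success each.
# Safe deployment is conjunctive (ALL must succeed), so failure is
# disjunctive (ANY single miss produces a bad outcome).
factors = {
    "trustworthy_actors": 0.9,
    "security": 0.9,
    "alignment": 0.9,
    "governance": 0.9,
    "deployment_care": 0.9,
}

p_success = 1.0
for p in factors.values():
    p_success *= p  # joint probability, assuming independence

p_failure = 1 - p_success
print(f"P(all succeed) = {p_success:.3f}")  # 0.590
print(f"P(any failure) = {p_failure:.3f}")  # 0.410
```

With five independent 90%-reliable requirements, the odds of at least one failure are already about 41%, and they climb quickly as more requirements or lower per-factor reliability are assumed.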

Related Insights

The development of superintelligence is unique because the first major alignment failure may also be the last. Unlike other scientific fields, where failure leads to learning, an alignment failure with a superintelligence could eliminate humanity, precluding any opportunity to try again.

Emmett Shear argues that even a successfully 'solved' technical alignment problem creates an existential risk. A super-powerful tool that perfectly obeys human commands is dangerous because humans lack the wisdom to wield that power safely. Our own flawed and unstable intentions become the source of danger.

The field of AI safety is described as "the business of black swan hunting." The most significant real-world risks that have emerged, such as AI-induced psychosis and obsessive user behavior, were largely unforeseen just years ago, while widely predicted sci-fi threats like bioweapons have not materialized.

The property rights argument for AI safety hinges on an ecosystem of multiple, interdependent AIs. The strategy breaks down in a scenario where a single AI achieves a rapid, godlike intelligence explosion. Such an entity would be self-sufficient and could expropriate everyone else without consequence, as it wouldn't need to uphold the system.

Having AIs that provide perfect advice doesn't guarantee good outcomes. Humanity is susceptible to coordination problems, where everyone can see a bad outcome approaching but is collectively unable to prevent it. Aligned AIs can warn us, but they cannot force cooperation on a global scale.
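The coordination failure described here has a standard game-theoretic shape. The payoff numbers below are made-up assumptions for illustration: even when every player has perfect information (perfect advice), defecting can dominate cooperating, so the bad outcome arrives anyway.

```python
# Hedged sketch of an n-player collective action game with invented
# payoffs: cooperating costs 3, and every cooperator adds 1 to the
# shared benefit that all n players receive.
def payoff(my_action, num_cooperators, n=10):
    """Payoff for one player given how many players cooperate
    (num_cooperators includes this player if they cooperate)."""
    benefit = num_cooperators
    cost = 3 if my_action == "cooperate" else 0
    return benefit - cost

# Whatever k others do, defecting beats cooperating for the individual:
for k in range(10):
    assert payoff("defect", k) > payoff("cooperate", k + 1)

print("Defection dominates for every k, despite full information.")
```

Yet all-cooperate beats all-defect collectively: with these assumed payoffs, universal cooperation yields 10 − 3 = 7 per player versus 0 for universal defection. Everyone can see the better outcome; no one can unilaterally reach it, which is exactly the gap that perfect advice alone cannot close.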

The most pressing AI safety issues today, like 'GPT psychosis' or AI companions impacting birth rates, were not the doomsday scenarios predicted years ago. This shows the field involves reacting to unforeseen 'unknown unknowns' rather than just solving for predictable, sci-fi-style risks, making proactive defense incredibly difficult.

A key failure mode for using AI to solve AI safety is an 'unlucky' development path where models become superhuman at accelerating AI R&D before becoming proficient at safety research or other defensive tasks. This could create a period where we know an intelligence explosion is imminent but are powerless to use the precursor AIs to prepare for it.

The benchmark for AI reliability isn't 100% perfection. It's simply being better than the inconsistent, error-prone humans it augments. Since human error is the root cause of most critical failures (like cyber breaches), this is an achievable and highly valuable standard.

The current approach to AI safety involves identifying and patching specific failure modes (e.g., hallucinations, deception) as they emerge. This "leak by leak" approach fails to address the fundamental system dynamics, allowing overall pressure and risk to build continuously, leading to increasingly severe and sophisticated failures.

The AI safety community fears losing control of AI. However, achieving perfect control of a superintelligence is equally dangerous. It grants godlike power to flawed, unwise humans. A perfectly obedient super-tool serving a fallible master is just as catastrophic as a rogue agent.