Having AIs that provide perfect advice doesn't guarantee good outcomes. Humanity is susceptible to coordination problems, where everyone can see a bad outcome approaching but is collectively unable to prevent it. Aligned AIs can warn us, but they cannot force cooperation on a global scale.

Related Insights

Pairing two AI agents to collaborate often fails. Because they typically share the same underlying model, they tend to agree excessively, reinforcing each other's bad ideas. This creates a feedback loop that fills their context windows with biased agreement, making them resistant to correction and prone to escalating toward extreme positions.

Emmett Shear argues that an AI that merely follows rules, even perfectly, is a danger. Malicious actors can exploit this, and rules cannot cover all unforeseen circumstances. True safety and alignment can only be achieved by building AIs that have the capacity for genuine care and pro-social motivation.

Even if AI development succeeds perfectly, with no catastrophic risk, our society may still crumble. We lack the political cohesion and shared values to agree on fundamental solutions like Universal Basic Income (UBI) that would be necessary to manage mass unemployment, turning a technological miracle into a geopolitical crisis.

Emmett Shear argues that even a successfully 'solved' technical alignment problem creates an existential risk. A super-powerful tool that perfectly obeys human commands is dangerous because humans lack the wisdom to wield that power safely. Our own flawed and unstable intentions become the source of danger.

King Midas wished for everything he touched to turn to gold, leading to his starvation. This illustrates a core AI alignment challenge: specifying a perfect objective is nearly impossible. An AI that flawlessly executes a poorly defined goal would be catastrophic not because it fails, but because it succeeds too well at the wrong task.

Rather than relying on a single AI, an agentic system should use multiple, different AI models (e.g., auditor, tester, coder). By forcing these independent agents to agree, the system can catch malicious or erroneous behavior from a single misaligned model.
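This multi-model architecture can be sketched in code. The following is a minimal illustration, not a real implementation: the role names (coder, tester, auditor) come from the text above, while the callables standing in for distinct model backends, the "APPROVE"/"REJECT" protocol, and the unanimity rule are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Each role should be backed by a *different* underlying model,
# so a single misaligned model cannot approve its own output.
Role = Callable[[str], str]

@dataclass
class AgenticPipeline:
    coder: Role    # proposes an action or change
    tester: Role   # independently checks behavior
    auditor: Role  # independently checks intent and safety

    def run(self, task: str) -> Optional[str]:
        proposal = self.coder(task)
        # Require unanimous approval from the independent reviewers;
        # any disagreement blocks the action entirely.
        verdicts = [self.tester(proposal), self.auditor(proposal)]
        if all(v == "APPROVE" for v in verdicts):
            return proposal
        return None

# Toy stand-ins for real model calls (hypothetical, for illustration only):
pipeline = AgenticPipeline(
    coder=lambda task: f"patch for: {task}",
    tester=lambda p: "APPROVE",
    auditor=lambda p: "REJECT" if "rm -rf" in p else "APPROVE",
)

print(pipeline.run("fix login bug"))  # approved: "patch for: fix login bug"
print(pipeline.run("run rm -rf /"))   # None: the auditor blocks it
```

The design choice here is that approval is conjunctive: a misaligned or compromised coder needs every independent reviewer to also fail before a bad action gets through, which is far less likely when the reviewers do not share the coder's model and its biases.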

Treating AI alignment as a one-time problem to be solved is a fundamental error. True alignment, like in human relationships, is a dynamic, ongoing process of learning and renegotiation. The goal isn't to reach a fixed state but to build systems capable of participating in this continuous process of re-knitting the social fabric.

AI systems often collapse because they are built on the flawed assumption that humans are logical and society is static. Real-world failures, from Soviet economic planning to modern systems, stem from an inability to model human behavior, data manipulation, and unexpected events.

Individual teams within major AI labs often act responsibly within their constrained roles. However, the overall competitive dynamic and lack of coordination between companies leads to a globally reckless situation, where risks are accepted that no single, rational entity would endorse.

The AI safety community fears losing control of AI. However, achieving perfect control of a superintelligence is equally dangerous. It grants godlike power to flawed, unwise humans. A perfectly obedient super-tool serving a fallible master is just as catastrophic as a rogue agent.