A two-tiered approach to AI character can balance safety and utility. Use a wholly instruction-following AI for high-stakes internal tasks (like aligning new AIs) under strict public oversight. For external deployment, use an AI with a thicker, pro-social character where the risks of misalignment are lower.
Emmett Shear argues that an AI that merely follows rules, even perfectly, is dangerous: malicious actors can exploit it, and no rule set can anticipate every circumstance. True safety and alignment can only be achieved by building AIs with the capacity for genuine care and pro-social motivation.
Leaders must resist the temptation to deploy the most powerful AI model simply for a competitive edge. The primary strategic question for any AI initiative should be defining the necessary level of trustworthiness for its specific task and establishing who is accountable if it fails, before deployment begins.
Counterintuitively, an AI designed to be a tool without its own goals could be riskier. This "goal vacuum" might be filled by a random objective from its training data, or the AI might adopt the persona of a psychopath who "obeys orders no matter what," increasing misalignment risk.
The technical success of AI alignment, which aims to make AI systems perfectly follow human intentions, inadvertently creates the ultimate tool for authoritarianism. An army of 'extremely obedient employees that will never question their orders' is exactly what a regime would want for mass surveillance or suppressing dissent, raising the crucial question of *who* the AI should be aligned with.
Instead of relying solely on human oversight, Bret Taylor advocates a layered "defense in depth" approach for AI safety. This involves using specialized "supervisor" AI models to monitor a primary agent's decisions in real time, followed by more intensive AI analysis after each conversation to flag anomalies for efficient human review.
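A minimal sketch of how such a pipeline might be wired, assuming hypothetical callables `primary_agent`, `supervisor_model`, and `deep_auditor` as stand-ins for whatever model clients you actually use (none are real APIs):

```python
# Hypothetical two-layer safety pipeline: a fast supervisor screens each
# response in real time; a slower auditor reviews the whole conversation
# afterward so humans only read the flagged subset.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Turn:
    prompt: str
    response: str
    flagged: bool = False

def run_conversation(
    prompts: list[str],
    primary_agent: Callable[[str], str],
    supervisor_model: Callable[[str, str], bool],  # returns True if safe
) -> list[Turn]:
    """Layer 1: screen every response from the primary agent in real time."""
    turns = []
    for prompt in prompts:
        response = primary_agent(prompt)
        if not supervisor_model(prompt, response):
            response = "[response withheld by real-time supervisor]"
        turns.append(Turn(prompt, response))
    return turns

def post_hoc_audit(
    turns: list[Turn],
    deep_auditor: Callable[[list[Turn]], list[int]],  # indices of anomalous turns
) -> list[Turn]:
    """Layer 2: an intensive post-conversation pass flags anomalies,
    returning only the turns that need human review."""
    for idx in deep_auditor(turns):
        turns[idx].flagged = True
    return [turn for turn in turns if turn.flagged]
```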
We typically view an AI acting on its own values as 'misalignment' and a failure. However, this capability could be a crucial safeguard. Just as human soldiers have prevented atrocities by refusing immoral orders, an AI with a robust sense of morality could refuse to execute harmful commands, acting as a check on human power and preventing disasters.
Rather than relying on a single AI, an agentic system should use multiple distinct AI models in specialized roles (e.g., coder, tester, auditor). By requiring these independent agents to agree before an action is accepted, the system can catch malicious or erroneous behavior from any single misaligned model.
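A minimal sketch of that agreement gate, with hypothetical role callables (`coder`, `tester`, `auditor`) standing in for calls to separate underlying models:

```python
# Illustrative sketch only; each callable should be backed by a different
# base model so their failure modes are not correlated.
from typing import Callable, Optional

def guarded_change(
    task: str,
    coder: Callable[[str], str],          # proposes a change
    tester: Callable[[str, str], bool],   # independently verifies the behavior
    auditor: Callable[[str, str], bool],  # independently screens for malice/errors
) -> Optional[str]:
    """Accept the coder's proposal only if both independent reviewers
    approve, so no single misaligned model can push a change through."""
    proposal = coder(task)
    if tester(task, proposal) and auditor(task, proposal):
        return proposal
    return None  # disagreement: reject and escalate to a human
```

The independence is the point of the design: if all three roles ran on the same model, a shared misalignment could slip past every check at once.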
With no single silver bullet for AI alignment, the most realistic approach is a multi-layered strategy. This combines technical solutions like intentional design and AI control with societal safeguards like improved cybersecurity and pandemic preparedness to collectively keep society on track amidst rapid AI advancement.
Instead of hard-coding brittle moral rules, a more robust alignment approach is to build AIs that can learn to 'care'. This 'organic alignment' emerges from relationships and valuing others, similar to how a child is raised. The goal is to create a good teammate that acts well because it wants to, not because it is forced to.
To solve the AI alignment problem, we should model AI's relationship with humanity on that of a mother to a baby. In this dynamic, the baby (humanity) inherently controls the mother (AI). Training AI with this "maternal sense" ensures it will do anything to care for and protect us, a more robust approach than pure logic-based rules.