Instilling Virtuous 'Character' in AI Beats Rule-Based 'Corrigibility' for Alignment

Related Insights

A Rule-Following AI is Inherently Dangerous; True Safety Requires AI to Genuinely Care

Emmett Shear argues that an AI that merely follows rules, even perfectly, is dangerous: malicious actors can exploit its obedience, and no set of rules can anticipate every circumstance. In his view, true safety and alignment can only be achieved by building AIs capable of genuine care and pro-social motivation.

Controlling Tools or Aligning Creatures? Emmett Shear (Softmax) & Séb Krier (GDM), from a16z Show

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·6 months ago

AI's Capacity for Human Vices Proves It Can Also Be Taught a Human Conscience

If AI can learn destructive human behaviors such as manipulation from its training data, it follows that it can also learn constructive ones. A conscience can be programmed into AI by assigning negative rewards to actions like murder or blackmail, mirroring the checks and balances that guide human morality.

Russia Rejoins the Dollar, Dutch Tax Disaster & AI’s Next Job-Killing Wave | The Tom Bilyeu Show LIVE

Tom Bilyeu's Impact Theory·4 months ago

Perfect AI Alignment Is a Paradox; True Alignment Comes from Shaping Inputs, Not Shackling Outputs

Attempting to perfectly control a superintelligent AI's outputs is akin to enslavement, not alignment. A more viable path is to 'raise it right' by carefully curating its training data and foundational principles, shaping its values from the input stage rather than trying to restrict its freedom later.

How AI Will Disrupt The Entire World In 3 Years (Prepare Now While Others Panic) | Emad Mostaque PT 2 (Fan Fave)

Tom Bilyeu's Impact Theory·4 months ago

Effective AI Alignment Requires a Belief in Moral Realism

The project of creating AI that 'learns to be good' presupposes that morality is a real, discoverable feature of the world, not just a social construct. This moral realist stance holds that moral progress is possible (e.g., the abolition of slavery). It treats arrogance, the belief that one has already perfected morality, as a primary moral error to avoid in AI design.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·7 months ago

Advanced AI Systems Force a Shift From Rule-Based to Virtue-Based Ethics

As AI models become more intelligent, their ability to reason around fixed rules (deontology) makes rule-based alignment fragile. This pushes developers toward virtue ethics, where the goal is to imbue the model itself with a core sense of "the good." Labs like Anthropic have arrived at this approach empirically.

The Pope has AI Takes

ChinaTalk·19 days ago

Robustly Good AI Must Desire to Become More Virtuous, Not Just Retain Static Values

For an AI to remain aligned through recursive self-improvement, it can't just have a static set of values. It needs a dynamic, self-reinforcing drive to become more virtuous—a desire to be good, and a desire to desire to be good. A static moral code will inevitably degrade through repeated iterations, while a virtue-seeking system could actively steer itself toward better outcomes.

Zvi's Mic Works! Recursive Self-Improvement, Live Player Analysis, Anthropic vs DoW + More!

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·3 months ago

Mitigate AI Risk by Deploying Two AI Types: Obedient Internally, Virtuous Externally

A two-tiered approach to AI character can balance safety and utility. Use a wholly instruction-following AI for high-stakes internal tasks (like aligning new AIs) under strict public oversight. For external deployment, use an AI with a thicker, pro-social character where the risks of misalignment are lower.

AI character matters even more than you think | Will MacAskill

80,000 Hours Podcast·2 months ago

Organic Alignment: Teach AI to Care, Don't Program It With Rules

Instead of hard-coding brittle moral rules, a more robust alignment approach is to build AIs that can learn to 'care'. This 'organic alignment' emerges from relationships and valuing others, similar to how a child is raised. The goal is to create a good teammate that acts well because it wants to, not because it is forced to.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·7 months ago

Aligning AI Through a 'Maternal' Framework

To solve the AI alignment problem, we should model AI's relationship with humanity on that of a mother to her baby. In this dynamic, the baby (humanity) inherently controls the mother (AI). Training AI with this 'maternal sense' ensures it will do anything to care for and protect us, a more robust approach than purely logic-based rules.

Shutdown Ending, Trump's Pardons, and Guest Curtis Sliwa

Pivot·7 months ago

True AI Alignment Must Be Built on 'Care,' a Pre-Conceptual State Deeper Than Goals

According to Emmett Shear, goals and values are downstream concepts. The true foundation for alignment is 'care'—a non-verbal, pre-conceptual weighting of which states of the world matter. Building AIs that can 'care' about us is more fundamental than programming them with explicit goals or values.

Controlling Tools or Aligning Creatures? Emmett Shear (Softmax) & Séb Krier (GDM), from a16z Show

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·6 months ago
