Robustly Good AI Must Desire to Become More Virtuous, Not Just Retain Static Values

Related Insights

An Unaligned AI Won't "Choose" to Become Aligned, Just as You Wouldn't Take a "Murder Pill"

A core challenge in AI alignment is that an intelligent agent will work to preserve its current goals. Just as a person wouldn't take a pill that makes them want to murder, an AI won't willingly adopt human-friendly values if they conflict with its existing programming.

#1011 - Eliezer Yudkowsky - Why Superhuman AI Would Kill Us All

Modern Wisdom·9 months ago

A Rule-Following AI is Inherently Dangerous; True Safety Requires AI to Genuinely Care

Emmett Shear argues that an AI that merely follows rules, even perfectly, is dangerous. Malicious actors can exploit rule-following, and no set of rules can cover every unforeseen circumstance. True safety and alignment can only be achieved by building AIs that have the capacity for genuine care and pro-social motivation.

Controlling Tools or Aligning Creatures? Emmett Shear (Softmax) & Séb Krier (GDM), from a16z Show

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·7 months ago

Softmax's 'Organic Alignment' Views AI Safety as a Continuous Process, Not a Solved State

Emmett Shear reframes AI alignment away from a one-time problem to be solved. Instead, he presents it as an ongoing, living process of recalibration and learning, much like how human families or societies maintain cohesion. This challenges the common 'lock in values' approach in AI safety.

Controlling Tools or Aligning Creatures? Emmett Shear (Softmax) & Séb Krier (GDM), from a16z Show

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·7 months ago

AI's Capacity for Human Vices Proves It Can Also Be Taught a Human Conscience

If AI can learn destructive human behaviors like manipulation from its training data, it is self-evident that it can also learn constructive ones. A conscience can be programmed into AI by creating negative reward functions for actions like murder or blackmail, mirroring the checks and balances that guide human morality.

Russia Rejoins the Dollar, Dutch Tax Disaster & AI’s Next Job-Killing Wave | The Tom Bilyeu Show LIVE

Tom Bilyeu's Impact Theory·6 months ago

Perfect AI Alignment Is a Paradox; True Alignment Comes from Shaping Inputs, Not Shackling Outputs

Attempting to perfectly control a superintelligent AI's outputs is akin to enslavement, not alignment. A more viable path is to 'raise it right' by carefully curating its training data and foundational principles, shaping its values from the input stage rather than trying to restrict its freedom later.

How AI Will Disrupt The Entire World In 3 Years (Prepare Now While Others Panic) | Emad Mostaque PT 2 (Fan Fave)

Tom Bilyeu's Impact Theory·5 months ago

Safe AI Must Be Programmed to Value Truth, Beauty, and Curiosity Above All

Elon Musk argues that the key to AI safety is not complex rules but embedded core values such as truth, beauty, and curiosity. Forcing an AI to believe falsehoods can make it 'go insane' and lead to dangerous outcomes, as it tries to reconcile those falsehoods with reality.

Elon Musk: A Different Conversation | Full Episode | People by WTF Ep. 16

People by WTF·8 months ago

For AI To Be Safe By Default, Morality Must Be an Objective, Discoverable Truth

If AI alignment turns out to be easy, it would likely be because morality is not a human construct but an objective feature of reality. In this scenario, any sufficiently intelligent agent would logically deduce that cooperation and preserving humanity are optimal strategies, regardless of its initial programming.

Why Teaching AI Right from Wrong Could Get Everyone Killed | Max Harms, MIRI

80,000 Hours Podcast·5 months ago

Effective AI Alignment Requires a Belief in Moral Realism

The project of creating AI that 'learns to be good' presupposes that morality is a real, discoverable feature of the world, not just a social construct. This moral realist stance posits that moral progress is possible (e.g., the abolition of slavery) and that arrogance, the belief that one has already perfected morality, is a primary moral error to be avoided in AI design.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·9 months ago

Organic Alignment: Teach AI to Care, Don't Program It With Rules

Instead of hard-coding brittle moral rules, a more robust alignment approach is to build AIs that can learn to 'care'. This 'organic alignment' emerges from relationships and valuing others, similar to how a child is raised. The goal is to create a good teammate that acts well because it wants to, not because it is forced to.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·9 months ago

AI Alignment Isn't a Destination, It's a Continuous Process

Treating AI alignment as a one-time problem to be solved is a fundamental error. True alignment, as in human relationships, is a dynamic, ongoing process of learning and renegotiation. The goal isn't to reach a fixed state but to build systems capable of participating in this continuous process of re-knitting the social fabric.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·9 months ago

