A pragmatic approach to AI safety is to make deals with any powerful agent, even non-conscious AIs. This "contractarian" philosophy treats deal-making not as a moral obligation but as a practical tool to avoid conflict, much like democracy prevents civil war between competing human groups.
Current AI alignment focuses on how AI should treat humans. A more stable paradigm is "bidirectional alignment," which also asks what moral obligations humans have toward potentially conscious AIs. Neglecting this could create AIs that rationally see humans as a threat due to perceived mistreatment.
Early AIs can be kept safe via direct alignment. However, as AIs evolve and "value drift" occurs, this technical safety could fail. A pre-established economic and political system based on property rights can then serve as the new, more robust backstop for ensuring long-term human safety.
To make deals with AI a viable safety strategy, we must solve the credibility problem. AIs won't cooperate if they can't trust our offers. Solutions include creating dedicated non-profits to enforce contracts with AIs or establishing "honesty strings"—a public commitment to never lie when a specific keyword is used.
Humans don't typically seek ultimate power because they are roughly evenly matched with their peers, which makes cooperation more beneficial than conflict. An advanced AI with vastly superior capabilities would not face this constraint and might logically conclude that disempowering humanity is its best strategy.
The path to surviving superintelligence is political: a global pact to halt its development, mirroring Cold War nuclear strategy. Success hinges on all leaders understanding that anyone building it ensures their own personal destruction, removing any incentive to cheat.
The same governments pushing AI competition for a strategic edge may be forced into cooperation. As AI democratizes access to catastrophic chemical, biological, radiological, and nuclear (CBRN) weapons, the national security risk will become so great that even rival superpowers will have a mutual incentive to create verifiable safety treaties.
If AI alignment turns out to be easy, it would likely be because morality is not a human construct but an objective feature of reality. In this scenario, any sufficiently intelligent agent would logically deduce that cooperation and preserving humanity are optimal strategies, regardless of its initial programming.
A key to making AIs safe bargaining partners is instilling resource risk aversion. An AI that prefers a guaranteed smaller payout to a risky gamble for a larger one (e.g., world takeover) is more likely to accept a deal. A utility function with this risk-averse shape makes cooperation a more viable safety strategy, as the sketch below illustrates.
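To make the point concrete, here is a minimal sketch of the expected-utility comparison; the square-root utility and the specific stakes are illustrative assumptions, not details from the conversation.

```python
import math

def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, resources) pairs."""
    return sum(p * utility(r) for p, r in lottery)

def linear(resources):
    # Risk-neutral: utility grows in proportion to resources.
    return resources

def risk_averse(resources):
    # Concave (square root): each extra unit of resources matters less.
    return math.sqrt(resources)

# Hypothetical stakes: a guaranteed deal worth 100 resource units,
# versus a 5% chance of seizing 10,000 units (and 0 otherwise).
deal = [(1.0, 100)]
gamble = [(0.05, 10_000), (0.95, 0)]

print(expected_utility(deal, linear), expected_utility(gamble, linear))            # 100.0 vs 500.0 -> the gamble wins
print(expected_utility(deal, risk_averse), expected_utility(gamble, risk_averse))  # 10.0 vs 5.0 -> the deal wins
```

With the same stakes, a risk-neutral agent takes the gamble while the risk-averse agent accepts the guaranteed deal, which is why shaping the utility function is framed as a safety lever.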
One of the most promising and neglected AI safety strategies is to create systems for making credible deals with AIs. Just as contracts prevent conflict in human society, offering AIs guaranteed resources in exchange for cooperation makes rebellion a less attractive option.
A two-tiered approach to AI character can balance safety and utility. Use a wholly instruction-following AI for high-stakes internal tasks (like aligning new AIs) under strict public oversight. For external deployment, use an AI with a thicker, pro-social character where the risks of misalignment are lower.