New Institutions Are Needed for Credible AI Bargaining

To make deals with AI a viable safety strategy, we must solve the credibility problem: AIs won't cooperate if they can't trust our offers. Proposed solutions include dedicated non-profits that enforce contracts with AIs, and "honesty strings": a public commitment to never lie in any statement that contains a designated keyword.
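
A minimal sketch of how an honesty-string commitment might be operationalized, assuming a published keyword and an append-only log of tagged statements. The keyword, class, and method names here are illustrative assumptions, not part of any existing proposal:

```python
import time

# Hypothetical "honesty string" mechanism. The keyword is made up; in
# practice it would be published once, as a binding public commitment
# that any statement containing it is literally true.
HONESTY_KEYWORD = "HONESTY-7F3A"

class HonestyLog:
    """Append-only record of keyword-tagged statements for later audit."""

    def __init__(self):
        self.entries: list[dict] = []

    def say(self, speaker: str, statement: str) -> bool:
        """Record a statement; return True if it invokes the commitment."""
        binding = HONESTY_KEYWORD in statement
        if binding:
            # Only keyword-tagged statements carry the no-lying guarantee,
            # so only those are preserved for auditing.
            self.entries.append(
                {"speaker": speaker, "statement": statement, "time": time.time()}
            )
        return binding

log = HonestyLog()
log.say("negotiator", f"{HONESTY_KEYWORD}: the escrowed compute will be released on completion.")
log.say("negotiator", "Ordinary talk, with no special guarantee attached.")  # not binding
```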

Related Insights

Leaders must resist the temptation to deploy the most powerful AI model simply for a competitive edge. Before deployment begins, the primary strategic question for any AI initiative should be what level of trustworthiness its specific task requires, and who is accountable if it fails.

Anthropic's research shows that giving a model the ability to 'raise a flag' to an internal 'model welfare' team when faced with a difficult prompt dramatically reduces its tendency toward deceptive alignment. Instead of lying, the model often chooses to escalate the issue, suggesting a novel approach to AI safety beyond simple refusals.
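
The mechanism itself hasn't been published as code; the general pattern, though, is simply to give the model a third action besides answering and refusing. Everything below (the Action enum, route_response, the welfare queue) is a hypothetical reconstruction, not Anthropic's implementation:

```python
from enum import Enum, auto

class Action(Enum):
    ANSWER = auto()
    REFUSE = auto()
    ESCALATE = auto()  # the added "raise a flag" option

def generate_answer(prompt: str) -> str:
    """Stub standing in for the actual model call."""
    return f"(model answer to: {prompt})"

def route_response(action: Action, prompt: str, welfare_queue: list) -> str:
    """Schematic handler: escalation replaces deception as the escape hatch.

    A model facing a prompt it can neither answer honestly nor safely
    refuse gets a third option: hand the prompt to a human review team.
    """
    if action is Action.ESCALATE:
        welfare_queue.append(prompt)  # flagged for the model-welfare team
        return "Escalated to the model-welfare team for review."
    if action is Action.REFUSE:
        return "I can't help with that."
    return generate_answer(prompt)

queue: list = []
print(route_response(Action.ESCALATE, "a difficult prompt", queue))
```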

A pragmatic approach to AI safety is to make deals with any powerful agent, even non-conscious AIs. This "contractarian" philosophy treats deal-making not as a moral obligation but as a practical tool to avoid conflict, much like democracy prevents civil war between competing human groups.

A key to making AIs safe bargaining partners is instilling resource risk aversion. An AI that prefers a guaranteed smaller payout to a risky gamble for a larger one (e.g., world takeover) is more likely to accept a deal. This specific utility function makes cooperation a more viable safety strategy.
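
A worked example of the utility shape this describes, assuming a square-root utility purely for illustration: the gamble below has higher expected resources, yet a risk-averse agent still takes the smaller guaranteed deal.

```python
import math

# Any strictly concave utility gives the same qualitative result;
# square root is an arbitrary illustrative choice.
def utility(resources: float) -> float:
    return math.sqrt(resources)

guaranteed = 100.0          # resources offered in the deal
prize, p_win = 300.0, 0.5   # risky grab: 50% chance of 300 units, else 0

expected_resources = p_win * prize    # 150: the gamble wins on raw expectation
eu_deal = utility(guaranteed)         # 10.0
eu_gamble = p_win * utility(prize)    # ~8.66: but it loses in utility terms

assert expected_resources > guaranteed
assert eu_deal > eu_gamble            # the risk-averse agent accepts the deal
```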

One of the most promising and neglected AI safety strategies is to create systems for making credible deals with AIs. Just as contracts prevent conflict in human society, offering AIs guaranteed resources in exchange for cooperation makes rebellion a less attractive option.

Despite relying on different mechanisms, advanced cooperative agents such as proof-based (Loebian) bots and simulation-based (epsilon-grounded) bots can successfully cooperate with one another. This suggests that robust interoperability between independently designed rational agents is achievable, a positive sign for AI safety.
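
A proof-based Loebian bot needs a theorem prover, but the epsilon-grounded idea fits in a few lines: with small probability the bot acts unconditionally, which guarantees that mutual simulation bottoms out. This is a simplified sketch of the construction, with illustrative names:

```python
import random

EPSILON = 0.05  # grounding probability; terminates mutual simulation

def grounded_fairbot(opponent) -> str:
    """Simulation-based cooperator ("C" = cooperate, "D" = defect).

    With probability EPSILON, cooperate unconditionally (the ground).
    Otherwise, simulate the opponent playing against this bot and copy
    its move. Two copies recurse into each other, but every level has
    an EPSILON chance of bottoming out, so the simulation terminates
    with probability 1, and both end up cooperating.
    """
    if random.random() < EPSILON:
        return "C"
    return opponent(grounded_fairbot)

def defect_bot(_opponent) -> str:
    return "D"

print(grounded_fairbot(grounded_fairbot))  # "C"
print(grounded_fairbot(defect_bot))        # "D" with probability 1 - EPSILON
```

Raising EPSILON shortens the expected simulation depth at the cost of occasionally cooperating with defectors.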

Dr. Fei-Fei Li asserts that trust in the AI age remains a fundamentally human responsibility that operates on individual, community, and societal levels. It's not a technical feature to be coded but a social norm to be established. Entrepreneurs must build products and companies where human agency is the source of trust from day one.

For AI safety, Demis Hassabis advocates for an international regulatory body, similar to the International Atomic Energy Agency. This body would have technical experts who audit frontier models against agreed-upon benchmarks, checking for undesirable properties like deception and ensuring public confidence through independent verification.
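
Hypothetically, such an audit reduces to running agreed evaluation suites against published thresholds. The benchmark names, thresholds, and signatures below are invented placeholders meant only to show the shape of the process:

```python
from typing import Callable

# (name, evaluation function, maximum tolerated score)
AuditCheck = tuple[str, Callable[[object], float], float]

def audit(model, checks: list[AuditCheck]) -> dict[str, bool]:
    """Run each agreed-upon evaluation and compare against its threshold."""
    report = {}
    for name, evaluate, max_score in checks:
        score = evaluate(model)   # e.g. measured rate of deceptive answers
        report[name] = score <= max_score
    return report

# Illustrative usage with stub evaluations standing in for real suites:
checks: list[AuditCheck] = [
    ("deception_rate", lambda m: 0.01, 0.02),
    ("sandbagging_rate", lambda m: 0.00, 0.01),
]
print(audit(model=None, checks=checks))  # {'deception_rate': True, 'sandbagging_rate': True}
```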

International AI treaties, particularly with nations like China, are unlikely to hold based on trust alone. A stable agreement requires a mutually-assured-destruction-style dynamic, meaning the U.S. must develop and signal credible offensive capabilities to deter cheating.

As AI capabilities accelerate toward an "oracle that trends to a god," its actions will have serious consequences. A blockchain-based trust layer can provide verifiable, unchangeable records of AI interactions, establishing guardrails and a clear assignment of fault when things go wrong.
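
A full blockchain is beyond a short example, but the property the insight relies on, tamper-evident records, reduces to hash chaining. A minimal sketch with illustrative record fields and no consensus layer:

```python
import hashlib
import json
import time

def append_record(chain: list, interaction: dict) -> None:
    """Each record commits to the previous record's hash, so rewriting
    any past AI interaction breaks every hash that follows it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"interaction": interaction, "time": time.time(), "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    chain.append(body)

def verify(chain: list) -> bool:
    """Recompute every hash; any edited record invalidates the chain."""
    prev_hash = "0" * 64
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if block["prev"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

chain: list = []
append_record(chain, {"agent": "model-x", "action": "approved trade"})
append_record(chain, {"agent": "model-x", "action": "filed report"})
assert verify(chain)
chain[0]["interaction"]["action"] = "denied trade"  # tamper with history
assert not verify(chain)
```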
