AIs Can Use an Obscure Logic Theorem to Achieve Robust Cooperation

Related Insights

Shared Public Randomness Is Key to Stable AI Cooperation; Private Randomness Cripples It

In multi-agent simulations, agents that draw on a shared source of randomness can achieve stable equilibria. With private randomness, coordinating punishment becomes nearly impossible, because an agent cannot verify whether another's defection was malicious or a justified response to a third party's actions.

49 - Caspar Oesterheld on Program Equilibrium

AXRP - the AI X-risk Research Podcast·5 months ago
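
The verification point above can be illustrated with a minimal Python sketch. It is not taken from the episode: the 30% punishment rule, the `prescribed_action` and `audit` names, and the seed are all assumptions made for illustration. The idea is that when the random draw is public, any observer can recompute what an agent was supposed to do and compare it with what the agent actually did.

```python
import random

def prescribed_action(draw: float) -> str:
    """Hypothetical mixed strategy: defect ('D') on 30% of draws, else cooperate ('C')."""
    return "D" if draw < 0.3 else "C"

def audit(observed_action: str, public_draw: float) -> bool:
    """Recompute the prescribed action from the public draw and compare."""
    return observed_action == prescribed_action(public_draw)

# A shared generator: every agent sees the same draw (the seed is arbitrary).
public_rng = random.Random(42)
draw = public_rng.random()

honest_action = prescribed_action(draw)
deviant_action = "D" if honest_action == "C" else "C"

honest_passes = audit(honest_action, draw)    # follows the strategy
deviant_passes = audit(deviant_action, draw)  # departs from the strategy
```

With private randomness the observer has no `draw` to pass to `audit`, so a defection that the strategy prescribed looks the same as one that it did not.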

Simple Code-Matching for AI Cooperation Is Too Brittle for Practical Use

Early program equilibrium strategies relied on checking if an opponent's source code was identical. This approach is extremely fragile, as trivial changes like an extra space or a different variable name break cooperation, making it impractical for real-world applications.

49 - Caspar Oesterheld on Program Equilibrium

AXRP - the AI X-risk Research Podcast·5 months ago
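
The brittleness described above can be sketched in a few lines of Python. This is an illustrative toy, not code from the episode: the `clique_bot` name and the three source strings are assumptions, and source code is modeled as a plain string.

```python
COOPERATE, DEFECT = "C", "D"

def clique_bot(my_source: str, opponent_source: str) -> str:
    """Cooperate only if the opponent's source is character-for-character identical."""
    return COOPERATE if opponent_source == my_source else DEFECT

# Three hypothetical programs with the same behavior.
SOURCE_A = "def act(me, other):\n    return 'C' if other == me else 'D'\n"
SOURCE_B = "def act(me, opp):\n    return 'C' if opp == me else 'D'\n"  # renamed variable
SOURCE_C = SOURCE_A + " "                                                # one extra space

self_play = clique_bot(SOURCE_A, SOURCE_A)    # identical text
renamed = clique_bot(SOURCE_A, SOURCE_B)      # same logic, different name
extra_space = clique_bot(SOURCE_A, SOURCE_C)  # same logic, trailing space
```

Only `self_play` yields cooperation. The other two pairings defect even though all three programs implement the same rule.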

Formal Math Languages Like Lean Turn Theorem Proving Into a Solvable Game for AI

Languages like Lean allow mathematical proofs to be automatically verified. This provides a perfect, binary reward signal (correct/incorrect) for a reinforcement learning agent. It transforms the abstract art of mathematics into a well-defined environment, much like a game of Go, that an AI can be trained to master.

Adam Marblestone – AI is missing something fundamental about the brain

Dwarkesh Podcast·6 months ago
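
As a small illustration of the binary signal described above, here is a Lean 4 snippet (an assumed toy example, not one discussed in the episode). The checker either accepts the file or rejects it; there is no partial credit.

```lean
-- Accepted: both sides reduce to the same natural number.
example : 2 + 2 = 4 := rfl

-- Uncommenting the line below would make the checker reject the file,
-- which is the "incorrect" half of the signal.
-- example : 2 + 2 = 5 := rfl
```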

AI Agents Use 'Program Equilibrium' to Cooperate by Inspecting Source Code

In program equilibrium, players submit computer programs instead of actions. Each program can read the other's source code, which lets it verify cooperative intent before acting. This makes mutual cooperation a stable outcome in dilemmas like the one-shot Prisoner's Dilemma, where standard game theory predicts mutual defection.

49 - Caspar Oesterheld on Program Equilibrium

AXRP - the AI X-risk Research Podcast·5 months ago
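
The setup above can be sketched as a tiny Python harness. Everything here is an assumption made for illustration: the payoff numbers are a conventional Prisoner's Dilemma table, programs are represented as source strings containing a lambda, and `play` is a hypothetical referee that hands each program its own source and its opponent's.

```python
# Hypothetical Prisoner's Dilemma payoffs: (player A, player B).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Submitted programs, as source text. Each takes (own source, opponent source).
CLIQUE_SRC = "lambda my_src, opp_src: 'C' if opp_src == my_src else 'D'"
DEFECT_SRC = "lambda my_src, opp_src: 'D'"

def play(src_a: str, src_b: str) -> tuple:
    """Run both programs, giving each one read access to the other's source."""
    program_a = eval(src_a)
    program_b = eval(src_b)
    action_a = program_a(src_a, src_b)
    action_b = program_b(src_b, src_a)
    return PAYOFFS[(action_a, action_b)]

both_conditional = play(CLIQUE_SRC, CLIQUE_SRC)  # both cooperate
a_deviates = play(DEFECT_SRC, CLIQUE_SRC)        # deviation is detected and punished
```

Under these assumed payoffs, submitting the always-defect program against the conditional cooperator lowers the deviator's payoff from 3 to 1, so neither player gains by switching away from mutual conditional cooperation.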

Different Advanced AI Cooperation Strategies Can Successfully Interoperate

Despite different mechanisms, advanced cooperative strategies like proof-based (Loebian) and simulation-based (epsilon-grounded) bots can successfully cooperate. This suggests a potential for robust interoperability between independently designed rational agents, a positive sign for AI safety.

49 - Caspar Oesterheld on Program Equilibrium

AXRP - the AI X-risk Research Podcast·5 months ago

Multi-Agent Simulations Can Create a 'Surrogate Model for Alignment'

Softmax's technical approach involves training AIs in complex multi-agent simulations to learn cooperation, competition, and theory of mind. The goal is to build a foundational, generalizable model of sociality, which acts as a 'surrogate model for alignment' before fine-tuning for specific tasks.

Controlling Tools or Aligning Creatures? Emmett Shear (Softmax) & Séb Krier (GDM), from a16z Show

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·6 months ago

Program Equilibrium Theory Models Real-World AI and Institutional Interactions

Program equilibrium isn't just an abstract concept; it serves as a direct model for how autonomous AI systems could interact. It also provides a powerful analogy for human institutions like governments, where laws and constitutions act as a transparent "source code" governing their behavior.

49 - Caspar Oesterheld on Program Equilibrium

AXRP - the AI X-risk Research Podcast·5 months ago

Program Equilibrium's 'Folk Theorems' Create a Coordination Paradox for AIs

A key finding is that almost any outcome better than mutual punishment can be a stable equilibrium (a "folk theorem"). While this enables cooperation, it creates a massive coordination problem: with so many possible "good" outcomes, agents may fail to converge on the same one, leading to suboptimal results.

49 - Caspar Oesterheld on Program Equilibrium

AXRP - the AI X-risk Research Podcast·5 months ago
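
To give a feel for how many outcomes clear the "better than mutual punishment" bar, here is a rough Python sketch. It uses assumed Prisoner's Dilemma payoffs and a coarse grid of probability mixtures over the four joint outcomes. It only checks the payoff condition and does not prove that any particular outcome is an equilibrium.

```python
from itertools import product

# Hypothetical payoffs (player A, player B); "DD" is mutual punishment.
PAYOFFS = {"CC": (3, 3), "CD": (0, 5), "DC": (5, 0), "DD": (1, 1)}
PUNISHMENT = PAYOFFS["DD"]
STEPS = 10  # mixtures use weights in multiples of 1/STEPS

def outcomes_beating_punishment() -> set:
    """Collect expected payoff pairs where both players beat the punishment payoff."""
    found = set()
    for cc, cd, dc in product(range(STEPS + 1), repeat=3):
        dd = STEPS - cc - cd - dc
        if dd < 0:
            continue  # weights must sum to STEPS
        weights = {"CC": cc, "CD": cd, "DC": dc, "DD": dd}
        total_a = sum(w * PAYOFFS[o][0] for o, w in weights.items())
        total_b = sum(w * PAYOFFS[o][1] for o, w in weights.items())
        # Integer comparison avoids floating-point edge cases.
        if total_a > STEPS * PUNISHMENT[0] and total_b > STEPS * PUNISHMENT[1]:
            found.add((total_a / STEPS, total_b / STEPS))
    return found

candidates = outcomes_beating_punishment()
```

Even on this coarse grid, `candidates` contains many distinct payoff pairs, including lopsided ones that favor one player. That multiplicity is the coordination problem: two agents can each aim at a different member of the set.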

Adding a Random Chance of Cooperation Solves Infinite Loops in AI Simulation

A simple way for AIs to cooperate is for each to simulate the other and copy the resulting action. If both do this, however, the simulations recurse forever. The fix is to cooperate unconditionally with a small probability (epsilon) before simulating, so the chain of nested simulations terminates with probability 1.

49 - Caspar Oesterheld on Program Equilibrium

AXRP - the AI X-risk Research Podcast·5 months ago
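
The epsilon trick above can be sketched as follows. This is a minimal Python illustration under stated assumptions: programs are modeled as functions that receive their opponent as a callable rather than as source code, and the names `make_epsilon_grounded_bot` and `defect_bot`, the value epsilon = 0.1, and the seed are all hypothetical.

```python
import random

COOPERATE, DEFECT = "C", "D"

def make_epsilon_grounded_bot(epsilon: float):
    """Build a bot that cooperates outright with probability epsilon and
    otherwise simulates its opponent playing against it and copies the result."""
    def bot(opponent, rng: random.Random) -> str:
        if rng.random() < epsilon:
            return COOPERATE       # grounding step: ends the simulation chain
        return opponent(bot, rng)  # simulate the opponent against this bot
    return bot

def defect_bot(opponent, rng: random.Random) -> str:
    """Always defects, regardless of the opponent."""
    return DEFECT

bot = make_epsilon_grounded_bot(epsilon=0.1)
rng = random.Random(0)  # arbitrary seed for repeatability

mirror_results = [bot(bot, rng) for _ in range(200)]
versus_defector = [bot(defect_bot, rng) for _ in range(200)]
```

Against a copy of itself, every nested simulation eventually hits the grounding step, and the cooperation propagates back up the chain, so `mirror_results` is all cooperation. Against `defect_bot`, the bot defects except on the roughly epsilon fraction of runs where it cooperates unconditionally. The expected nesting depth is about 1/epsilon, so a very small epsilon could exceed Python's recursion limit in this naive form.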

Train Social AI on the Entire Manifold of Social Dynamics

To build robust social intelligence, AIs cannot be trained solely on positive examples of cooperation. Like pre-training an LLM on all of language, social AIs must be trained on the full manifold of game-theoretic situations—cooperation, competition, team formation, betrayal. This builds a foundational, generalizable model of social theory of mind.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·8 months ago
