In multi-agent simulations, if agents use a shared source of randomness, they can achieve stable equilibria. If they use private randomness, coordinating punishment becomes nearly impossible because one agent cannot verify if another's defection was malicious or a justified response to a third party's actions.
Pairing two AI agents to collaborate often fails. Because they share the same underlying model, they tend to agree excessively, reinforcing each other's bad ideas. This creates a feedback loop that fills their context windows with biased agreement, making them resistant to correction and prone to escalating extremism.
Simulating strategies with memory (like "grim trigger") or with multiple players causes an exponential explosion of simulation branches. This can be solved by having all simulated agents draw from the same shared sequence of random numbers, which forces all simulation branches to halt at the same conceptual "time step."
When multiple AI agents work as an ensemble, they can collectively suppress hallucinations. By referencing a shared knowledge graph as ground truth, the group can form a consensus, effectively ignoring the inaccurate output from one member and improving overall reliability.
In program equilibrium, players submit computer programs instead of actions. These programs can read each other's source code, allowing them to verify cooperative intent and overcome dilemmas like the Prisoner's Dilemma, which is impossible in standard game theory.
To overcome brittle code-matching, AIs can use formal logic to prove cooperative intent. This is enabled by Löb's Theorem, an obscure result which allows a program to conclude "my opponent cooperates" without falling into an infinite loop of reasoning, creating a robust cooperative equilibrium.
Despite different mechanisms, advanced cooperative strategies like proof-based (Loebian) and simulation-based (epsilon-grounded) bots can successfully cooperate. This suggests a potential for robust interoperability between independently designed rational agents, a positive sign for AI safety.
Rather than relying on a single AI, an agentic system should use multiple, different AI models (e.g., auditor, tester, coder). By forcing these independent agents to agree, the system can catch malicious or erroneous behavior from a single misaligned model.
The "epsilon-grounded" simulation approach has a hidden cost: its runtime is inversely proportional to epsilon. To be very certain that simulations will terminate (a small epsilon), agents must accept potentially very long computation times, creating a direct trade-off between speed and reliability.
A key finding is that almost any outcome better than mutual punishment can be a stable equilibrium (a "folk theorem"). While this enables cooperation, it creates a massive coordination problem: with so many possible "good" outcomes, agents may fail to converge on the same one, leading to suboptimal results.
A simple way for AIs to cooperate is to simulate each other and copy the action. However, this creates an infinite loop if both do it. The fix is to introduce a small probability (epsilon) of cooperating unconditionally, which guarantees the simulation chain eventually terminates.