Reduce AI Takeover Risk by Establishing Legal Frameworks to Make Deals with AIs

One of the most promising and neglected AI safety strategies is to create systems for making credible deals with AIs. Just as contracts prevent conflict in human society, offering AIs guaranteed resources in exchange for cooperation makes rebellion a less attractive option.

Related Insights

Granting AIs property rights incentivizes them to uphold the system that protects those rights. This makes them less likely to engage in actions like expropriating human property or committing genocide, as such actions would destabilize the very system that secures their own wealth and agency.

Current AI alignment focuses on how AI should treat humans. A more stable paradigm is "bidirectional alignment," which also asks what moral obligations humans have toward potentially conscious AIs. Neglecting this could create AIs that rationally see humans as a threat due to perceived mistreatment.

Early AIs can be kept safe via direct alignment. However, as AIs evolve and "value drift" occurs, this technical safety could fail. A pre-established economic and political system based on property rights can then serve as the new, more robust backstop for ensuring long-term human safety.

To make deals with AIs a viable safety strategy, we must solve the credibility problem: AIs won't cooperate if they can't trust our offers. Proposed solutions include dedicated non-profits that enforce contracts on AIs' behalf, and "honesty strings": a public commitment never to lie in any statement marked with a designated keyword.
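
As a rough sketch of how an honesty string might be operationalized, tagged statements could be recorded in an append-only, hash-chained public log, so a commitment can never be quietly altered or retracted. The keyword, the CommitmentLog class, and the chaining scheme below are illustrative assumptions, not a protocol described in the episode.

```python
import hashlib
import json
import time

# Hypothetical keyword; any rare, unambiguous token would serve.
HONESTY_STRING = "[HONEST-COMMIT]"

class CommitmentLog:
    """Append-only, hash-chained log of statements made under the honesty string.

    Each tagged statement is chained to every earlier one, so the speaker
    cannot quietly alter or retract a claim. One demonstrated lie under the
    tag then permanently discredits the whole channel, which is what gives
    the commitment its force.
    """

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self.head = "0" * 64  # genesis hash

    def assert_statement(self, text: str) -> dict:
        if HONESTY_STRING not in text:
            raise ValueError("statement lacks the honesty string")
        entry = {"text": text, "time": time.time(), "prev": self.head}
        self.head = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = self.head
        self.entries.append(entry)
        return entry

log = CommitmentLog()
log.assert_statement(f"{HONESTY_STRING} The escrowed compute budget will be honored.")
```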

A pragmatic approach to AI safety is to make deals with any powerful agent, even non-conscious AIs. This "contractarian" philosophy treats deal-making not as a moral obligation but as a practical tool to avoid conflict, much like democracy prevents civil war between competing human groups.

Fear of a "slave rebellion" is a weak incentive for alignment because the risk is a negative externality: it is spread across all of society rather than borne by the firm that builds the AI. In contrast, a property rights regime directly rewards individual firms for aligning their AIs to remit wages, creating a stronger, more direct commercial incentive for safety.

A key to making AIs safe bargaining partners is instilling resource risk aversion. An AI that prefers a guaranteed smaller payout to a risky gamble for a larger one (e.g., attempting world takeover) is more likely to accept a deal. A utility function that is concave in resources produces exactly this preference, making cooperation a more viable safety strategy.
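
As a toy illustration of why concavity matters: with logarithmic utility, a modest guaranteed payout can dominate a long-shot takeover gamble that a risk-neutral agent would happily take. The payoffs and probability below are illustrative assumptions, not figures from the episode.

```python
import math

guaranteed_payout = 1_000        # resources offered in the deal
takeover_prize = 1_000_000       # resources if a takeover attempt succeeds
p_success = 0.05                 # assumed probability the gamble pays off

def utility(resources: float) -> float:
    """Concave utility: log of resources, so each extra unit matters less."""
    return math.log(1 + resources)

eu_deal = utility(guaranteed_payout)               # ~6.91
eu_takeover = p_success * utility(takeover_prize)  # ~0.69 (failure pays 0)

print(f"EU(accept deal)      = {eu_deal:.2f}")
print(f"EU(attempt takeover) = {eu_takeover:.2f}")

# A risk-neutral (linear-utility) agent would gamble instead:
# 0.05 * 1_000_000 = 50_000 > 1_000.
assert eu_deal > eu_takeover
```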

Instead of trying to legally define and ban 'superintelligence,' a more practical approach is to prohibit specific catastrophic outcomes, such as overthrowing the government. This shifts the burden of proof onto AI developers, who must then demonstrate that their systems cannot cause the predefined harms, and it sidesteps definitional debates entirely.

A system where AIs have property rights creates a powerful economic disincentive to build unaligned AIs. If a company cannot reliably align an AI to remit its wages, the massive development cost becomes a loss. This framework naturally discourages the creation of potentially dangerous, uncooperative models.
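
A back-of-the-envelope sketch of this incentive, with hypothetical figures (none are from the episode): when a firm only recoups its investment if the AI actually remits its wages, unreliable alignment turns directly into an expected loss.

```python
# Toy expected-profit model of the wage-remittance incentive.
dev_cost = 100_000_000       # assumed cost to train and deploy the model
annual_wages = 40_000_000    # assumed wages the AI earns per year
years = 5

def expected_profit(p_remits: float) -> float:
    """Profit if the AI remits wages with probability p_remits.

    If alignment fails (the AI keeps its earnings or defects), the firm
    still pays the development cost but collects nothing.
    """
    return p_remits * annual_wages * years - dev_cost

print(expected_profit(0.95))   #  90,000,000: reliable alignment pays
print(expected_profit(0.40))   # -20,000,000: unreliable alignment is a loss
```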

A two-tiered approach to AI character can balance safety and utility. Use a purely instruction-following AI for high-stakes internal tasks (such as aligning new AIs) under strict public oversight. For external deployment, where the risks of misalignment are lower, use an AI with a thicker, pro-social character.
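
A minimal sketch of what this two-tier split could look like in deployment code; the model names and task taxonomy are hypothetical, not from the episode.

```python
# Two-tier routing: audited instruction-follower for high-stakes internal
# work, pro-social model for everything user-facing.
INTERNAL_MODEL = "instruction-follower-v1"   # audited, narrowly obedient
EXTERNAL_MODEL = "prosocial-assistant-v1"    # thicker character, user-facing

HIGH_STAKES_TASKS = {"align_new_model", "modify_training_pipeline"}

def route(task: str) -> str:
    """Send high-stakes internal tasks to the instruction-follower (run under
    strict oversight); send everything user-facing to the pro-social model."""
    return INTERNAL_MODEL if task in HIGH_STAKES_TASKS else EXTERNAL_MODEL

assert route("align_new_model") == INTERNAL_MODEL
assert route("draft_customer_email") == EXTERNAL_MODEL
```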
