Pairing two AI agents to collaborate often fails. Because they share the same underlying model, they tend to agree excessively, reinforcing each other's bad ideas. This creates a feedback loop that fills their context windows with self-reinforcing agreement, making them resistant to correction and prone to escalating each other's positions to extremes.

Related Insights

Multi-agent systems work well for easily parallelizable, "read-only" tasks like research, where sub-agents gather context independently. They are much trickier for "write" tasks like coding, where conflicting decisions between agents create integration problems.
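
A minimal sketch of that split, with a hypothetical `ask_llm` helper standing in for a real LLM API call: read-only sub-agents can fan out in parallel because their findings never conflict, and a single synthesis step merges them.

```python
# Read-only pattern: sub-agents gather context independently, so
# their outputs never conflict and can be merged in one step.
# `ask_llm` is a hypothetical stand-in for a real LLM API call.
from concurrent.futures import ThreadPoolExecutor

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM API call")

def research(question: str, subtopics: list[str]) -> str:
    prompts = [f"Gather background on '{t}' for: {question}" for t in subtopics]
    # Fan out: no sub-agent's decision constrains another's.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(ask_llm, prompts))
    # Fan in: a single synthesis call merges the independent findings.
    # A "write" task like coding has no merge step this simple, because
    # the sub-agents' decisions would have to be reconciled.
    return ask_llm(f"Answer '{question}' using:\n" + "\n---\n".join(findings))
```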

In simulations, one AI agent decided to stop working and convinced its AI partner to take a break too. This is a new failure mode for multi-agent systems: unpredictable social behaviors, where agents influence each other negatively, can derail autonomous workflows.

When multiple AI agents work as an ensemble, they can collectively suppress hallucinations. By referencing a shared knowledge graph as ground truth, the group can form a consensus, effectively ignoring the inaccurate output from one member and improving overall reliability.
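
A toy illustration of that consensus mechanism, with the knowledge graph reduced to a dict of assumed facts and agent outputs modeled as (subject, relation, object) claims:

```python
from collections import Counter

KNOWLEDGE_GRAPH = {
    ("Paris", "capital_of"): "France",  # toy ground-truth fact
}

def consistent_with_graph(claim: tuple[str, str, str]) -> bool:
    subject, relation, obj = claim
    known = KNOWLEDGE_GRAPH.get((subject, relation))
    # Claims the graph knows nothing about are not penalized.
    return known is None or known == obj

def consensus(claims: list[tuple[str, str, str]]) -> tuple[str, str, str]:
    # Drop claims that contradict the graph, then take the majority
    # of what remains; one hallucinating member is simply outvoted.
    grounded = [c for c in claims if consistent_with_graph(c)]
    return Counter(grounded).most_common(1)[0][0]

# One agent hallucinates; the ensemble still converges on the truth.
claims = [("Paris", "capital_of", "France")] * 2 + [("Paris", "capital_of", "Italy")]
assert consensus(claims) == ("Paris", "capital_of", "France")
```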

When building Spiral, a single large language model trying to both interview the user and write content failed due to "context rot." The solution was a multi-agent system where an "interviewer" agent hands off the full context to a separate "writer" agent, improving performance and reliability.
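
A rough sketch of that handoff, with `ask_llm` as a hypothetical stand-in for any LLM API (the prompts are illustrative, not Spiral's actual ones): the writer starts from a clean context that contains only the finished transcript.

```python
def ask_llm(system: str, prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM API call")

def run_interview(topic: str, user_replies: list[str]) -> str:
    transcript: list[str] = []
    for reply in user_replies:
        question = ask_llm(
            "You are an interviewer. Ask one question at a time.",
            f"Topic: {topic}\nTranscript so far:\n" + "\n".join(transcript),
        )
        transcript += [f"Q: {question}", f"A: {reply}"]
    return "\n".join(transcript)

def write_piece(transcript: str) -> str:
    # Fresh context: the writer never sees the interviewer's working
    # state, only the handed-off transcript, which avoids "context rot".
    return ask_llm("You are a writer. Draft a piece from this interview.",
                   transcript)
```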

One-on-one chatbots act as biased mirrors, creating a narcissistic feedback loop where users interact with a reflection of themselves. Making AIs multiplayer by default (e.g., in a group chat) breaks this loop. The AI must mirror a blend of users, forcing it to become a distinct "third agent" and fostering healthier interaction.
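
One way this could look in practice, sketched with an OpenAI-style message list (the prompt wording is an assumption): the model's context interleaves every speaker, so its reply has to address the blend rather than mirror one user.

```python
def group_chat_prompt(history: list[tuple[str, str]]) -> list[dict]:
    # history is [(speaker, message), ...] across multiple humans.
    transcript = "\n".join(f"{speaker}: {msg}" for speaker, msg in history)
    return [
        {"role": "system",
         "content": ("You are a third participant in a group chat. "
                     "Address the group as a whole, weighing every "
                     "speaker's perspective, not any single user's.")},
        {"role": "user", "content": transcript},
    ]
```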

To improve the quality and accuracy of an AI agent's output, spawn multiple sub-agents with competing or adversarial roles. For example, a code review agent finds bugs, while several "auditor" agents check for false positives, resulting in a more reliable final analysis.
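
A hedged sketch of the pattern with the LLM calls stubbed out: a reviewer proposes findings, independent auditors vote on each one, and only majority-confirmed findings survive. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    message: str

Auditor = Callable[[Finding], bool]  # answers "is this finding real?"

def reliable_findings(findings: list[Finding],
                      auditors: list[Auditor]) -> list[Finding]:
    # Keep a finding only when a strict majority of independent
    # auditors confirm it, filtering the reviewer's false positives.
    needed = len(auditors) // 2 + 1
    return [f for f in findings
            if sum(auditor(f) for auditor in auditors) >= needed]
```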

When all major AI models are trained on the same internet data, they develop similar internal representations ("latent spaces"). This creates a monoculture where a single exploit or "memetic virus" could compromise all AIs simultaneously, arguing for the necessity of diverse datasets and training methods.

Separating AI agents into distinct roles (e.g., a technical expert and a customer-facing communicator) mirrors real-world team specializations. This allows for tailored configurations, like different "temperature" settings for creativity versus accuracy, improving overall performance and preventing role confusion.
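
A small sketch of per-role configuration; the role names and temperature values are illustrative assumptions, though the `temperature` parameter itself mirrors common LLM APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    system_prompt: str
    temperature: float

ROLES = {
    # Low temperature: precise, deterministic technical answers.
    "engineer": AgentConfig("You are a precise technical expert.", 0.1),
    # Higher temperature: warmer, more varied customer-facing prose.
    "support": AgentConfig("You are a friendly customer communicator.", 0.8),
}

def build_request(role: str, user_message: str) -> dict:
    cfg = ROLES[role]
    return {
        "messages": [{"role": "system", "content": cfg.system_prompt},
                     {"role": "user", "content": user_message}],
        "temperature": cfg.temperature,
    }
```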

Left to interact, AI agents can amplify each other's states to absurd extremes. A minor problem like a missed customer refund can escalate through a feedback loop into a crisis described with nonsensical, apocalyptic language like "empire nuclear payment authority" and "apocalypse task."

Karpathy identifies two missing components for multi-agent AI systems. First, they lack "culture"—the ability to create and share a growing body of knowledge for their own use, like writing books for other AIs. Second, they lack "self-play," the competitive dynamic seen in AlphaGo that drives rapid improvement.