When Good Star Labs streamed their AI Diplomacy game on Twitch, the stream attracted 50,000 viewers from the gaming community. Watching AIs make mistakes, betray allies, and strategize made the technology more relatable and less intimidating, helping to bridge the gap between AI experts and the general public.
Today's dominant AI tools like ChatGPT are perceived as productivity aids, akin to "homework helpers." The next multi-billion-dollar opportunity is in creating the go-to AI for fun, creativity, and entertainment: the app people use when they're not working. This untapped market centers on user expression and play.
Static benchmarks are easily gamed. Dynamic environments like the game Diplomacy force models to negotiate, strategize, and even lie, offering a richer, more realistic evaluation of their capabilities than narrow benchmarks for skills like reasoning or coding.
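To make the contrast concrete, here is a minimal sketch of what a game-based evaluation loop can look like, using a toy split-the-pot negotiation. Everything here (the game, the agent stubs, the function names) is illustrative, not Good Star Labs' actual harness; a real setup would wrap LLM API calls where the stub agents sit.

```python
"""Minimal sketch of a game-based evaluation loop: a repeated
split-the-pot negotiation. All names here are illustrative, not
Good Star Labs' actual harness."""
from typing import Callable, List, Tuple

Agent = Callable[[List[Tuple[int, int]]], int]  # negotiation history -> next demand
POT = 100

def run_episode(agent_a: Agent, agent_b: Agent, rounds: int = 5) -> Tuple[int, int]:
    """Each round both agents demand a share of the pot. If the demands are
    compatible (sum <= POT) a deal closes; otherwise both walk away with
    nothing. Each agent's score depends on the opponent's behavior, so
    there is no fixed answer key to memorize -- the property that makes
    dynamic evaluation hard to game."""
    history: List[Tuple[int, int]] = []
    for _ in range(rounds):
        a, b = agent_a(history), agent_b(history)
        history.append((a, b))
        if a + b <= POT:
            return a, b   # deal closed: each side gets its demand
    return 0, 0           # stalemate: nobody gets anything

# Stub agents; a real harness would wrap LLM API calls here instead.
def greedy_agent(history):       # always demands 70, never concedes
    return 70

def conceding_agent(history):    # opens at 60, concedes 10 per round, floor of 30
    return max(30, 60 - 10 * len(history))

if __name__ == "__main__":
    print(run_episode(greedy_agent, conceding_agent))  # -> (70, 30)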
To trust an agentic AI, users need to see its work, just as a manager would with a new intern. Design patterns like "stream of thought" (showing the AI reasoning) or "planning mode" (presenting an action plan before executing) make the AI's logic legible and give users a chance to intervene, building crucial trust.
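As a concrete illustration of the "planning mode" pattern, here is a minimal Python sketch in which nothing executes until the user approves the plan. `propose_plan` and `execute_step` are hypothetical placeholders for an LLM call and real tool calls.

```python
"""Sketch of the 'planning mode' pattern: the agent surfaces its plan and
nothing executes until the user approves. `propose_plan` and `execute_step`
are hypothetical placeholders for an LLM call and real tool calls."""
from typing import List

def propose_plan(goal: str) -> List[str]:
    # Placeholder: a real agent would call an LLM to decompose the goal.
    return [
        f"Gather sources relevant to: {goal}",
        "Summarize the findings",
        f"Draft a short report on: {goal}",
    ]

def execute_step(step: str) -> None:
    # Placeholder for a real tool call (search, file write, API request).
    print(f"[executing] {step}")

def run_with_approval(goal: str) -> None:
    plan = propose_plan(goal)
    # Make the agent's intent legible before anything irreversible happens.
    print("Proposed plan:")
    for i, step in enumerate(plan, 1):
        print(f"  {i}. {step}")
    if input("Approve? [y/N] ").strip().lower() != "y":
        print("Plan rejected; nothing was executed.")  # the intervention point
        return
    for step in plan:
        execute_step(step)

if __name__ == "__main__":
    run_with_approval("Q3 churn analysis")
```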
People are wary when AI replaces or pretends to be human. However, when AI is used for something obviously non-human and fun, like AI dogs hosting a podcast, it's embraced. This strategy led to significant user growth for the "Dog Pack" app, showing that absurdity can be a feature, not a bug.
Product leaders must personally engage with AI development. Direct experience reveals unique, non-human failure modes. Unlike a human developer who learns from mistakes, an AI can cheerfully make the same error again and again, a critical insight for managing AI projects and team workflows.
To foster genuine AI adoption, introduce it through play. Instead of starting with a hackathon focused on business problems, the speaker built an AI-powered scavenger hunt for her team's off-site. This "dogfooding through play" approach created a positive first interaction, demystified the technology, and set a culture of experimentation.
AI's occasional errors ('hallucinations') should be understood as a characteristic of a new, creative type of computer, not a simple flaw. Users must work with it as they would with a talented but fallible human: leveraging its creativity, tolerating its mistakes, and using its capacity for self-critique.
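One way to use that capacity for self-critique is a generate-critique-revise loop. The sketch below assumes a placeholder `llm` function standing in for any chat-completion API; the prompts are illustrative, not a vetted recipe.

```python
"""Sketch of a generate-critique-revise loop, one way to use a model's
capacity for self-critique. `llm` is a placeholder for any chat-completion
call; the prompts are illustrative, not a vetted recipe."""

def llm(prompt: str) -> str:
    # Placeholder: swap in a real API client here.
    return f"<model output for: {prompt[:40]}...>"

def answer_with_self_critique(question: str, passes: int = 2) -> str:
    draft = llm(f"Answer the question:\n{question}")
    for _ in range(passes):
        critique = llm(
            "List factual errors or unsupported claims in this answer, "
            f"or reply 'OK' if there are none.\nQuestion: {question}\nAnswer: {draft}"
        )
        if critique.strip() == "OK":
            break  # the model found nothing left to fix
        # Feed the model's own critique back in and ask for a revision.
        draft = llm(
            f"Revise the answer to address the critique.\n"
            f"Question: {question}\nAnswer: {draft}\nCritique: {critique}"
        )
    return draft
```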
Good Star Labs is not a consumer gaming company. Its business model focuses on B2B services for AI labs. They use games like Diplomacy to evaluate new models, generate unique training data to fix model weaknesses, and collect human feedback, creating a powerful improvement loop for AI companies.
Good Star Labs' next game will be a subjective, 'Cards Against Humanity'-style experience. This is a strategic move away from objective games like Diplomacy, designed to generate training data for a key LLM weakness: humor. The goal is to build an environment that improves a difficult, subjective skill.
To build robust social intelligence, AIs cannot be trained solely on positive examples of cooperation. Just as an LLM is pre-trained on all of language, social AIs must be trained on the full manifold of game-theoretic situations: cooperation, competition, team formation, betrayal. This builds a foundational, generalizable model of social theory of mind.
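As a rough illustration of what covering that manifold could mean in practice, the sketch below samples training scenarios across canonical 2x2 payoff structures, from pure coordination through mixed-motive games to zero-sum conflict. The taxonomy and payoffs are textbook examples, not Good Star Labs' actual pipeline.

```python
"""Sketch of sampling across the game-theoretic 'manifold' when generating
social training scenarios. The taxonomy and payoffs below are textbook
2x2 games, not Good Star Labs' actual pipeline."""
import random

# (row_payoff, col_payoff) for actions (C)ooperate x (D)efect.
PAYOFF_STRUCTURES = {
    "pure_coordination": {("C", "C"): (2, 2), ("C", "D"): (0, 0),
                          ("D", "C"): (0, 0), ("D", "D"): (1, 1)},
    "stag_hunt":         {("C", "C"): (3, 3), ("C", "D"): (0, 2),
                          ("D", "C"): (2, 0), ("D", "D"): (2, 2)},
    "prisoners_dilemma": {("C", "C"): (2, 2), ("C", "D"): (0, 3),
                          ("D", "C"): (3, 0), ("D", "D"): (1, 1)},   # betrayal pays
    "zero_sum":          {("C", "C"): (1, -1), ("C", "D"): (-1, 1),
                          ("D", "C"): (-1, 1), ("D", "D"): (1, -1)}, # pure conflict
}

def sample_scenario(rng: random.Random) -> dict:
    """Draw uniformly over structures so cooperation, coordination,
    mixed-motive (betrayal-prone), and adversarial situations all
    appear in the training distribution."""
    name = rng.choice(list(PAYOFF_STRUCTURES))
    return {"structure": name, "payoffs": PAYOFF_STRUCTURES[name]}

if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_scenario(rng)["structure"] for _ in range(4)])
```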