Good Star Labs' next game will be a subjective, 'Cards Against Humanity'-style experience. This is a strategic shift away from objective games like Diplomacy toward a key LLM weakness: humor. The goal is an environment that both generates training data for and measurably improves a difficult, subjective skill.

Related Insights

Today's dominant AI tools like ChatGPT are perceived as productivity aids, akin to "homework helpers." The next multi-billion dollar opportunity is in creating the go-to AI for fun, creativity, and entertainment—the app people use when they're not working. This untapped market focuses on user expression and play.

Static benchmarks are easily gamed. Dynamic environments like the game Diplomacy force models to negotiate, strategize, and even lie, offering a richer, more realistic evaluation of their capabilities beyond pure performance metrics like reasoning or coding.

To automate meme creation, simply asking an LLM for a joke is ineffective. A successful system requires providing structured context: 1) analysis of the visual media, 2) a library of joke formats/templates, and 3) a "persona" file describing the target audience's specific humor. This multi-layered context is key.
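The three layers of context above can be sketched as a single prompt-assembly step. This is a minimal illustration, not Good Star Labs' actual system; the function name, section headings, and sample inputs are all assumptions.

```python
# Hedged sketch of the multi-layered context approach: instead of asking
# an LLM for a joke cold, assemble (1) a visual analysis, (2) a library
# of joke formats, and (3) an audience persona into one structured prompt.

def build_meme_prompt(image_analysis: str, joke_templates: list[str], persona: str) -> str:
    """Assemble structured context so the model writes a grounded caption."""
    template_block = "\n".join(f"- {t}" for t in joke_templates)
    return (
        "You are writing a meme caption.\n\n"
        f"## Visual analysis of the image\n{image_analysis}\n\n"
        f"## Joke formats to choose from\n{template_block}\n\n"
        f"## Target audience persona\n{persona}\n\n"
        "Pick the format that best fits the image and persona, "
        "then write one caption."
    )

prompt = build_meme_prompt(
    image_analysis="A cat stares at an empty food bowl with visible outrage.",
    joke_templates=[
        "Expectation vs. reality",
        "Me explaining X to Y",
        "Nobody: / Me:",
    ],
    persona="Pet owners who enjoy deadpan, relatable humor.",
)
print(prompt)
```

The design choice worth noting is that each layer is a separate, swappable input: the same image analysis can be paired with a different persona file to retarget the humor without touching the rest of the pipeline.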

When Good Star Labs streamed their AI Diplomacy game on Twitch, it attracted 50,000 viewers from the gaming community. Watching AIs make mistakes, betray allies, and strategize made the technology more relatable and less intimidating, helping to bridge the gap between AI experts and the general public.

Good Star Labs is not a consumer gaming company. Its business model focuses on B2B services for AI labs. They use games like Diplomacy to evaluate new models, generate unique training data to fix model weaknesses, and collect human feedback, creating a powerful improvement loop for AI companies.

As models mature, their core differentiator will become their underlying personality and values, shaped by their creators' objective functions. One model might optimize for user productivity by being concise, while another optimizes for engagement by being verbose.

Good Star Labs found GPT-5's performance in their Diplomacy game skyrocketed with optimized prompts, moving it from the bottom to the top. This shows a model's inherent capability can be masked or revealed by its prompt, making "best model" a context-dependent title rather than an absolute one.

People often dismiss AI for telling bad jokes on the spot, but even the world's best comedians struggle to be funny on demand with a stranger. This reveals an unfair double standard; we expect perfect, context-free performance from AI that we don't expect from human experts.

The best AI models are trained on data that reflects deep, subjective qualities—not just simple criteria. This "taste" is a key differentiator, influencing everything from code generation to creative writing, and is shaped by the values of the frontier lab.

To build robust social intelligence, AIs cannot be trained solely on positive examples of cooperation. Like pre-training an LLM on all of language, social AIs must be trained on the full manifold of game-theoretic situations—cooperation, competition, team formation, betrayal. This builds a foundational, generalizable model of social theory of mind.
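One simple way to operationalize "training on the full manifold" is to sample episodes evenly across scenario types rather than drawing only from cooperative play. The taxonomy and sampler below are illustrative assumptions, not a description of any lab's actual pipeline.

```python
import random

# Hedged sketch: balance a training curriculum across game-theoretic
# scenario types so no single social dynamic dominates the data.
# The four-type taxonomy mirrors the examples in the text above.
SCENARIO_TYPES = ["cooperation", "competition", "team_formation", "betrayal"]

def sample_curriculum(n_episodes: int, seed: int = 0) -> dict[str, int]:
    """Draw n_episodes uniformly across the taxonomy; return counts per type."""
    rng = random.Random(seed)
    counts = {t: 0 for t in SCENARIO_TYPES}
    for _ in range(n_episodes):
        counts[rng.choice(SCENARIO_TYPES)] += 1
    return counts

counts = sample_curriculum(1000)
print(counts)
```

A real system would weight the mix deliberately (e.g., oversampling rare betrayal episodes), but the principle is the same: coverage of the whole manifold, not just positive examples.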