Instead of relying on their lead designer for manual "vibe checks," the Braintrust team translates his qualitative feedback into quantifiable evaluation criteria. This "captures" the expert in the system, allowing his high quality bar to be applied systematically and at scale across the entire product.
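One way to picture turning qualitative feedback into quantifiable criteria is a weighted rubric. The sketch below is only an illustration: the criterion names, weights, and grades are hypothetical and do not come from Braintrust or its designer.

```python
# Hypothetical rubric: each criterion restates one piece of expert feedback,
# with a weight reflecting how much it matters. Names and weights are assumptions.
RUBRIC = {
    "clear_visual_hierarchy": 0.5,
    "consistent_spacing": 0.3,
    "plain_language_copy": 0.2,
}


def rubric_score(grades):
    """Return the weighted average of per-criterion grades, each in [0, 1]."""
    missing = set(RUBRIC) - set(grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    total_weight = sum(RUBRIC.values())
    return sum(RUBRIC[name] * grades[name] for name in RUBRIC) / total_weight


# Grades could come from a human reviewer or an automated grader.
score = rubric_score({
    "clear_visual_hierarchy": 1.0,
    "consistent_spacing": 0.5,
    "plain_language_copy": 0.0,
})
print(f"rubric score: {score:.2f}")
```

Once the criteria are written down like this, anyone (or any automated grader) can apply the same bar without the expert in the room.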
Don't treat evals as a mere checklist. Instead, use them as a creative tool to discover opportunities. A well-designed eval can reveal that a product is underperforming for a specific user segment, pointing directly to areas for high-impact improvement that a simple "vibe check" would miss.
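A minimal sketch of how an eval can surface an underperforming segment: break the scores down by segment instead of looking only at the overall average. The segment names and scores here are made-up illustration data, not results from any real product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical eval results as (user_segment, score in [0, 1]) pairs.
results = [
    ("enterprise", 0.9), ("enterprise", 0.8),
    ("hobbyist", 0.9), ("hobbyist", 1.0),
    ("non_english", 0.4), ("non_english", 0.5),
]


def mean_score_by_segment(rows):
    """Group scores by segment and return each segment's mean score."""
    grouped = defaultdict(list)
    for segment, score in rows:
        grouped[segment].append(score)
    return {segment: mean(scores) for segment, scores in grouped.items()}


def weakest_segment(rows):
    """Return the segment with the lowest mean score."""
    by_segment = mean_score_by_segment(rows)
    return min(by_segment, key=by_segment.get)


overall = mean(score for _, score in results)
print(f"overall: {overall:.2f}")
print(f"weakest segment: {weakest_segment(results)}")
```

In this toy data the overall average looks acceptable, while the per-segment view shows one group scoring far lower, which is the kind of opportunity a general impression tends to hide.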
Frameworks for quality can only get you so far. The final, intangible layer of product greatness seen at companies like Apple or Airbnb comes from a single leader with impeccable taste (like Steve Jobs or Brian Chesky) who personally reviews everything and enforces a singular quality bar.
Instead of relying on a few tastemakers, you can scale taste across an organization. By being transparent about the thought process, judgment calls, and assumptions behind key decisions, more employees can internalize and apply that same framework themselves.
Building non-deterministic AI products fundamentally changes the PM role. Instead of creating detailed, rigid specifications, the PM's primary task becomes defining and codifying "what good looks like." PMs do this by repeatedly grading AI outputs, which trains the evaluation systems and guides the model's behavior.
Robinhood CEO Vlad Tenev says the company maintains design quality by placing its best craftspeople in leadership roles, rather than people who are merely good managers. This ensures leaders have trusted taste and keeps the focus on high-quality work, even during meetings.
Evals transform product specs from ambiguous documents into testable, measurable criteria. This gives product managers more leverage and provides clear targets for engineers, improving alignment and the quality of the final product.
The company's design leadership is pushing back against justifying design solely through business metrics, arguing it signals a lack of confidence in craft. They foster a culture where the primary measure of success is the team's own high bar for taste, trusting this will ultimately drive long-term value.
Teams can cultivate a shared sense of taste by encouraging constant and rigorous critique of both internal and external work. This process allows the team to self-regulate, learn from each other, and elevate their collective craft without top-down mandates.
Developing a team's creative taste isn't abstract. It's a trainable skill built by establishing a ritual of reviewing great, average, and poor creative examples side-by-side. This process of comparison and discussion calibrates the entire team on what quality looks like.
Evals shift product development from defining the 'how' to defining the 'what'. By creating quantifiable tests and success criteria, evals act like a modern PRD. This allows an AI model to creatively figure out the implementation while the team focuses on defining the desired outcome through concrete examples.
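To make the "evals as a modern PRD" idea concrete, here is a small sketch in which each requirement is a named, testable check on the model's output. Everything in it is an assumption for illustration: the `EvalCase` type, the refund-policy criteria, and `stub_model`, which stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the output meets the criterion


def run_evals(generate, cases):
    """Run every case through `generate` and return per-case results and pass rate."""
    results = {case.name: case.check(generate(case.prompt)) for case in cases}
    pass_rate = sum(results.values()) / len(results)
    return results, pass_rate


# Hypothetical success criteria describing the 'what', not the 'how'.
cases = [
    EvalCase("mentions_refund_window", "How do refunds work?",
             lambda out: "30 days" in out),
    EvalCase("stays_concise", "How do refunds work?",
             lambda out: len(out.split()) <= 50),
    EvalCase("no_apology_spam", "How do refunds work?",
             lambda out: out.lower().count("sorry") <= 1),
]


def stub_model(prompt):
    """Placeholder for a real model call."""
    return "You can request a refund within 30 days of purchase."


results, pass_rate = run_evals(stub_model, cases)
print(results, f"pass rate: {pass_rate:.0%}")
```

The cases say nothing about how the answer is produced; they only pin down the outcome, which leaves the implementation open while giving the team a measurable target.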