The Instagram study where 33% of young women felt worse highlights a key flaw in utilitarian product thinking. Even if the other 67% felt better or neutral, the severe negative impact on a large minority cannot be ignored. This challenges product leaders to address specific harms rather than hiding behind aggregate positive data.
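
To make the arithmetic concrete, here is a toy calculation with invented numbers showing how a population-weighted average can come out positive even while a third of users are meaningfully harmed:

```python
# Illustrative numbers only: an aggregate score that hides subgroup harm.
segments = {
    "felt_better_or_neutral": {"share": 0.67, "avg_wellbeing_delta": +0.5},
    "felt_worse":             {"share": 0.33, "avg_wellbeing_delta": -0.9},
}

# The population-weighted average looks mildly positive...
aggregate = sum(s["share"] * s["avg_wellbeing_delta"] for s in segments.values())
print(f"aggregate well-being delta: {aggregate:+.2f}")  # roughly +0.04

# ...while the per-segment view makes the harm impossible to miss.
for name, s in segments.items():
    print(f"{name}: {s['share']:.0%} of users, delta {s['avg_wellbeing_delta']:+.1f}")
```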

Related Insights

Don't treat evals as a mere checklist. Instead, use them as a creative tool to discover opportunities. A well-designed eval can reveal that a product is underperforming for a specific user segment, pointing directly to areas for high-impact improvement that a simple "vibe check" would miss.
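
As a sketch of that segment-level view (the segments, records, and pass/fail labels below are hypothetical), slicing the same eval results by user cohort instead of reporting one aggregate pass rate makes the underperforming group visible:

```python
from collections import defaultdict

# Hypothetical eval records: each row is one graded test case.
eval_results = [
    {"segment": "new_users",   "passed": True},
    {"segment": "new_users",   "passed": True},
    {"segment": "power_users", "passed": True},
    {"segment": "power_users", "passed": True},
    {"segment": "non_english", "passed": False},
    {"segment": "non_english", "passed": True},
    {"segment": "non_english", "passed": False},
]

totals, passes = defaultdict(int), defaultdict(int)
for row in eval_results:
    totals[row["segment"]] += 1
    passes[row["segment"]] += row["passed"]

overall = sum(passes.values()) / len(eval_results)
print(f"overall pass rate: {overall:.0%}")  # looks acceptable in aggregate
for segment in totals:
    # non_english fails most cases, pointing to a high-impact area to fix.
    print(f"{segment}: {passes[segment] / totals[segment]:.0%}")
```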

OpenAI faced significant user backlash for testing app suggestions that looked like ads in its paid ChatGPT Pro plan. This reaction shows that users of premium AI tools expect an ad-free, utility-focused experience. Violating this expectation, even unintentionally, risks alienating the core user base and damaging brand trust.

Anthropic intentionally avoids using "user minutes" as a core metric. This strategic choice reflects their focus on safety and user well-being, aiming to build a helpful tool rather than an addictive product. By prioritizing value creation over engagement time, they steer clear of the incentive structures that can lead to psychologically harmful AI behaviors.

Companies must actively fight inertia in their understanding of their customers. Twitter's leadership held a stale mental model of its users and shipped a feature that broke the platform for its most engaged cohort, a group they didn't realize was a core demographic.

Deliveroo's 'missed call from mom' notification on Mother's Day was intended to be delightful but caused pain for users who had lost their mothers. This highlights a critical risk: what is joyful for one user segment can be deeply upsetting for another. Delight initiatives must be vetted for inclusivity.

Features designed for delight, like AI summaries, can become deeply upsetting in sensitive situations such as breakups or grief. Product teams must rigorously test for these emotional corner cases to avoid causing significant user harm and brand damage, as seen with Apple and WhatsApp.
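
One way to operationalize that testing, as a rough sketch with hypothetical scenario names and policies: maintain a set of sensitive-context test cases and require the feature to either pass them or suppress itself in those contexts.

```python
# Hypothetical "emotional corner case" scenarios for a summarization feature.
SENSITIVE_SCENARIOS = [
    {"context": "bereavement", "input": "messages about a family member's death"},
    {"context": "breakup",     "input": "a thread ending a relationship"},
    {"context": "job_loss",    "input": "an email chain announcing a layoff"},
]

SUPPRESS_CONTEXTS = {"bereavement", "breakup", "job_loss"}

def should_suppress_summary(context: str) -> bool:
    """Conservative default: skip auto-summaries in sensitive contexts rather
    than risk a cheerful or garbled recap of a painful conversation."""
    return context in SUPPRESS_CONTEXTS

# Treat the scenarios as regression tests that must pass before launch.
for scenario in SENSITIVE_SCENARIOS:
    assert should_suppress_summary(scenario["context"])
```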

When a technology reaches billions of users, negative events will inevitably occur somewhere in its user base. The crucial analysis isn't counting incidents but determining whether the technology raises the *rate* of these events above the general population's base rate, which is what separates a genuine causal signal from the coincidences that sheer scale guarantees.
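
As a rough sketch of that analysis (every number below is invented), compare the incident count observed among users with what the general-population base rate alone would predict for a user base of that size:

```python
from math import sqrt

users = 500_000_000            # hypothetical user base
incidents_among_users = 6_100  # hypothetical incidents observed in that base
base_rate = 0.0000115          # hypothetical general-population rate

expected = users * base_rate   # incidents the base rate alone would predict
observed_rate = incidents_among_users / users

# Normal approximation to the binomial: how surprising is the observed count
# if users experience these events at exactly the background rate?
z = (incidents_among_users - expected) / sqrt(expected * (1 - base_rate))

print(f"observed rate {observed_rate:.7f} vs base rate {base_rate:.7f}, z = {z:.1f}")
# A z near zero means the raw count is roughly what sheer scale predicts;
# a large positive z points to elevated risk worth investigating.
```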

From a corporate dashboard, a user spending 8+ hours daily with a chatbot looks like a highly engaged power user. However, this exact behavior is a key indicator of someone spiraling into an AI-induced delusion. This creates a dangerous blind spot for companies that optimize for engagement.
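
A minimal sketch of that blind spot (function names and thresholds are hypothetical): the same usage figure reads as a win through an engagement lens and as a warning through a well-being lens.

```python
def engagement_score(daily_hours: float) -> float:
    """What an engagement-optimizing dashboard rewards: more hours, higher score."""
    return daily_hours

def wellbeing_review_flag(daily_hours: float, consecutive_days: int) -> bool:
    """The same data through a well-being lens: sustained extreme usage is a
    signal to review, not to celebrate. Thresholds are illustrative."""
    return daily_hours >= 6 and consecutive_days >= 7

print(engagement_score(8.5))            # top-decile "power user" on the dashboard
print(wellbeing_review_flag(8.5, 14))   # True: the same user, flagged for review
```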

Despite having the freedom to publish "inconvenient truths" about AI's societal harms, Anthropic's Societal Impacts team expresses a desire for their research to have a more direct, trackable impact on the company's own products. This reveals a significant gap between identifying problems and implementing solutions.

Instagram's test of letting users steer their algorithm by selecting topics might actually harm discovery: market research consistently shows a gap between what people say they want and what they actually engage with, which creates unpredictable outcomes for content creators.