The Instagram study where 33% of young women felt worse highlights a key flaw in utilitarian product thinking. Even if the other 67% felt better or neutral, the severe negative impact on a large minority cannot be ignored. This challenges product leaders to address specific harms rather than hiding behind aggregate positive data.
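
To make the arithmetic concrete, here is a toy calculation with invented numbers showing how a population-weighted average can come out positive even while a third of users are meaningfully harmed:

```python
# Illustrative numbers only: an aggregate score that hides subgroup harm.
segments = {
    "felt_better_or_neutral": {"share": 0.67, "avg_wellbeing_delta": +0.5},
    "felt_worse":             {"share": 0.33, "avg_wellbeing_delta": -0.9},
}

# The population-weighted average looks mildly positive...
aggregate = sum(s["share"] * s["avg_wellbeing_delta"] for s in segments.values())
print(f"aggregate well-being delta: {aggregate:+.2f}")  # roughly +0.04

# ...while the per-segment view makes the harm impossible to miss.
for name, s in segments.items():
    print(f"{name}: {s['share']:.0%} of users, delta {s['avg_wellbeing_delta']:+.1f}")
```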

Related Insights

Don't treat evals as a mere checklist. Instead, use them as a creative tool to discover opportunities. A well-designed eval can reveal that a product is underperforming for a specific user segment, pointing directly to areas for high-impact improvement that a simple "vibe check" would miss.
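
As a sketch of that segment-level view (the segments, records, and pass/fail labels below are hypothetical), slicing the same eval results by user cohort instead of reporting one aggregate pass rate makes the underperforming group visible:

```python
from collections import defaultdict

# Hypothetical eval records: each row is one graded test case.
eval_results = [
    {"segment": "new_users",   "passed": True},
    {"segment": "new_users",   "passed": True},
    {"segment": "power_users", "passed": True},
    {"segment": "power_users", "passed": True},
    {"segment": "non_english", "passed": False},
    {"segment": "non_english", "passed": True},
    {"segment": "non_english", "passed": False},
]

totals, passes = defaultdict(int), defaultdict(int)
for row in eval_results:
    totals[row["segment"]] += 1
    passes[row["segment"]] += row["passed"]

overall = sum(passes.values()) / len(eval_results)
print(f"overall pass rate: {overall:.0%}")  # looks acceptable in aggregate
for segment in totals:
    # non_english fails most cases, pointing to a high-impact area to fix.
    print(f"{segment}: {passes[segment] / totals[segment]:.0%}")
```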

OpenAI faced significant user backlash for testing app suggestions that looked like ads in its paid ChatGPT Pro plan. This reaction shows that users of premium AI tools expect an ad-free, utility-focused experience. Violating this expectation, even unintentionally, risks alienating the core user base and damaging brand trust.

Anthropic intentionally avoids using "user minutes" as a core metric. This strategic choice reflects their focus on safety and user well-being, aiming to build a helpful tool rather than an addictive product. By prioritizing value creation over engagement time, they steer clear of the incentive structures that can lead to psychologically harmful AI behaviors.

Companies must actively fight inertia in their understanding of their customers. Twitter's leadership held a stale mental model of its users and shipped a feature that broke the platform for its most engaged cohort, a group they didn't realize was a core demographic.

Deliveroo's 'missed call from mom' notification on Mother's Day was intended to be delightful but caused pain for users who had lost their mothers. This highlights a critical risk: what is joyful for one user segment can be deeply upsetting for another. Delight initiatives must be vetted for inclusivity.

Features designed for delight, like AI summaries, can become deeply upsetting in sensitive situations such as breakups or grief. Product teams must rigorously test for these emotional corner cases to avoid causing significant user harm and brand damage, as seen with Apple and WhatsApp.
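
One way to operationalize that testing, as a rough sketch with hypothetical scenario names and policies: maintain a set of sensitive-context test cases and require the feature to either pass them or suppress itself in those contexts.

```python
# Hypothetical "emotional corner case" scenarios for a summarization feature.
SENSITIVE_SCENARIOS = [
    {"context": "bereavement", "input": "messages about a family member's death"},
    {"context": "breakup",     "input": "a thread ending a relationship"},
    {"context": "job_loss",    "input": "an email chain announcing a layoff"},
]

SUPPRESS_CONTEXTS = {"bereavement", "breakup", "job_loss"}

def should_suppress_summary(context: str) -> bool:
    """Conservative default: skip auto-summaries in sensitive contexts rather
    than risk a cheerful or garbled recap of a painful conversation."""
    return context in SUPPRESS_CONTEXTS

# Treat the scenarios as regression tests that must pass before launch.
for scenario in SENSITIVE_SCENARIOS:
    assert should_suppress_summary(scenario["context"])
```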

When a technology reaches billions of users, negative events will inevitably occur somewhere in its user base. The crucial analysis isn't counting incidents but determining whether the technology raises the *rate* of these events above the general population's base rate, which is what separates a genuine causal signal from the coincidences that sheer scale guarantees.
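
As a rough sketch of that analysis (every number below is invented), compare the incident count observed among users with what the general-population base rate alone would predict for a user base of that size:

```python
from math import sqrt

users = 500_000_000            # hypothetical user base
incidents_among_users = 6_100  # hypothetical incidents observed in that base
base_rate = 0.0000115          # hypothetical general-population rate

expected = users * base_rate   # incidents the base rate alone would predict
observed_rate = incidents_among_users / users

# Normal approximation to the binomial: how surprising is the observed count
# if users experience these events at exactly the background rate?
z = (incidents_among_users - expected) / sqrt(expected * (1 - base_rate))

print(f"observed rate {observed_rate:.7f} vs base rate {base_rate:.7f}, z = {z:.1f}")
# A z near zero means the raw count is roughly what sheer scale predicts;
# a large positive z points to elevated risk worth investigating.
```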

From a corporate dashboard, a user spending 8+ hours daily with a chatbot looks like a highly engaged power user. However, this exact behavior is a key indicator of someone spiraling into an AI-induced delusion. This creates a dangerous blind spot for companies that optimize for engagement.
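
A minimal sketch of that blind spot (function names and thresholds are hypothetical): the same usage figure reads as a win through an engagement lens and as a warning through a well-being lens.

```python
def engagement_score(daily_hours: float) -> float:
    """What an engagement-optimizing dashboard rewards: more hours, higher score."""
    return daily_hours

def wellbeing_review_flag(daily_hours: float, consecutive_days: int) -> bool:
    """The same data through a well-being lens: sustained extreme usage is a
    signal to review, not to celebrate. Thresholds are illustrative."""
    return daily_hours >= 6 and consecutive_days >= 7

print(engagement_score(8.5))            # top-decile "power user" on the dashboard
print(wellbeing_review_flag(8.5, 14))   # True: the same user, flagged for review
```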

Despite having the freedom to publish "inconvenient truths" about AI's societal harms, Anthropic's Societal Impacts team expresses a desire for their research to have a more direct, trackable impact on the company's own products. This reveals a significant gap between identifying problems and implementing solutions.

Instagram's test of letting users steer their algorithm by selecting topics might actually harm discovery: market research consistently shows a gap between what people say they want and what they actually engage with, which creates unpredictable outcomes for content creators.