According to Goodhart's Law, when a measure becomes a target, it ceases to be a good measure. If you incentivize employees on AI-driven metrics like 'emails sent,' they will optimize for the number, not quality, corrupting the data and giving false signals of productivity.

Related Insights

The best barometer for AI's enterprise value is not replacing the bottom 5% of workers. A better goal is empowering most employees to become 10x more productive. This reframes the AI conversation from a cost-cutting tool to a massive value-creation engine through human-AI partnership.

The proliferation of AI leaderboards incentivizes companies to optimize models for specific benchmarks. This creates a risk of "acing the SATs" where models excel on tests but don't necessarily make progress on solving real-world problems. This focus on gaming metrics could diverge from creating genuine user value.

Public leaderboards like LM Arena are becoming unreliable proxies for model performance. Teams implicitly or explicitly "benchmark" by optimizing for specific test sets. The superior strategy is to focus on internal, proprietary evaluation metrics and use public benchmarks only as a final, confirmatory check, not as a primary development target.

Human intuition is a poor gauge of AI's actual productivity benefits. A study found developers felt significantly sped up by AI coding tools even when objective measurements showed no speed increase. The real value may come from enabling tasks that otherwise wouldn't be attempted, rather than simply accelerating existing workflows.

Traditional product metrics like DAU are meaningless for autonomous AI agents that operate without user interaction. Product teams must redefine success by focusing on tangible business outcomes. Instead of tracking agent usage, measure "support tickets automatically closed" or "workflows completed."

Open and click rates are ineffective for measuring AI-driven, two-way conversations. Instead, leaders should adopt new KPIs: outcome metrics (e.g., meetings booked), conversational quality (tracking an agent's 'I don't know' rate to measure trust), and, ultimately, customer lifetime value.

Research highlights "work slop": AI output that appears polished but lacks human context. This forces coworkers to spend significant time fixing it, effectively offloading cognitive labor and damaging perceptions of the sender's capability and trustworthiness.

An employee using AI to do 8 hours of work in 4 benefits personally by gaining free time. The company (the principal) sees no productivity gain unless that employee produces more. This misalignment reveals the core challenge of translating individual AI efficiency into corporate-level growth.

AI tools can generate vast amounts of verbose code on command, making metrics like 'lines of code' easily gameable and meaningless for measuring true engineering productivity. This practice introduces complexity and technical debt rather than indicating progress.

The employees who discover clever AI shortcuts to be 'lazy' are your biggest innovation assets. Instead of letting them hide their methods, companies should find them, make them heroes, and systematically scale their bottom-up productivity hacks across the organization.