
METR's influential study on AI developer productivity is now difficult to replicate. As AI tools become more powerful, developers are unwilling to be randomized into a control group where AI use is forbidden. This selection bias makes it increasingly impractical to measure true productivity gains with the original study design.

Related Insights

An MIT study reveals AI's asymmetrical impact on productivity. While it moderately improves performance for average workers, it delivers an outsized boost to the top 5%. Effectively harnessing AI is a skill in itself, so the gap between good and great widens.

Even within OpenAI, a stark performance gap exists. Engineers who avoid using agentic AI for coding are reportedly about one-tenth as productive across metrics like code volume, commits, and business impact. This creates significant challenges for performance management and HR.

There's a significant gap between AI performance on structured benchmarks and its real-world utility. A randomized controlled trial (RCT) found that open-source software developers were actually slowed down by roughly 20% when using AI assistants, even as they were miscalibrated enough to believe the tools were helping. This highlights the limitations of current evaluation methods.

While many believe AI will primarily help average performers become great, LinkedIn's experience shows the opposite. Its top performers were the first and most effective adopters of new AI tools, using them to become even more productive. This suggests AI may amplify existing talent disparities.

A randomized controlled trial revealed a nearly 40-percentage-point perception gap in developer productivity: experienced developers using AI tools were measurably 19% slower, yet self-reported feeling 20% faster. This highlights the unreliability of self-reported metrics for assessing AI's impact.

Human intuition is a poor gauge of AI's actual productivity benefits. A study found developers felt significantly sped up by AI coding tools even when objective measurements showed no speed increase. The real value may come from enabling tasks that otherwise wouldn't be attempted, rather than simply accelerating existing workflows.

A recent study found that AI assistants actually slowed down programmers working on complex codebases. More importantly, the programmers mistakenly believed the AI was speeding them up. This suggests a general human bias towards overestimating AI's current effectiveness, which could lead to flawed projections about future progress.

While AI coding assistants appear to boost output, they introduce a "rework tax." A Stanford study found AI-generated code leads to significant downstream refactoring. A team might ship 40% more code, but if half of that increase is just fixing last week's AI-generated "slop," the real productivity gain is much lower than headlines suggest.
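The back-of-the-envelope math behind the "rework tax" can be sketched as a quick calculation. The 40% increase and the assumption that half of it is rework come from the hypothetical in the text, not from measured data:

```python
# Illustrative "rework tax" arithmetic using the text's hypothetical numbers.
baseline_output = 100                 # units of code shipped before AI adoption
shipped_with_ai = 140                 # 40% more code shipped with AI assistance

raw_increase = shipped_with_ai - baseline_output   # 40 units of apparent gain
rework = raw_increase / 2                          # half the increase is fixing AI "slop"
net_new_output = shipped_with_ai - rework          # 120 units of genuinely new code

headline_gain = raw_increase / baseline_output
real_gain = (net_new_output - baseline_output) / baseline_output

print(f"Headline gain: {headline_gain:.0%}")            # 40%
print(f"Real gain after rework tax: {real_gain:.0%}")   # 20%
```

Under these assumptions, the team's true throughput gain is half the headline figure, which is the gap the summary warns about.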

A Meta study found expert programmers were less productive with AI tools. The speaker suggests this is because users thought they were faster while actually being distracted (e.g., social media) waiting for the AI, highlighting a dangerous gap between perceived and actual productivity.

Data on AI tool adoption among engineers is conflicting. One A/B test showed that the highest-performing senior engineers gained the biggest productivity boost. However, other companies report that opinionated senior engineers are the most resistant to using AI tools, viewing the tools' output as subpar.