Despite perceptions of rapid acceleration, a large-scale analysis by Google DeepMind and Epoch AI that stitches together many benchmarks over time shows that general AI capability progress has been remarkably linear. This suggests AI is currently a better tool, not an expanding population of researchers.
On financial analyst benchmarks, top models from Anthropic, Google, and OpenAI are now almost indistinguishable in capability. This convergence suggests the frontier is commoditizing, which calls into question the return on investment for massive training runs and shifts value up the application stack.
When AI models achieve superhuman performance on specific benchmarks like coding challenges, that performance does not necessarily carry over to real-world problems. This is because we implicitly optimize for the benchmark itself, creating "peaky" performance rather than broad, generalizable intelligence.
METR's research reveals a consistent, exponential trend in AI capabilities over the last five years. When measured by the length of tasks an AI can complete (based on human completion time), this 'time horizon' has been doubling approximately every seven months, providing a single, robust metric for tracking progress.
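To make the seven-month doubling concrete, here is a minimal sketch that projects a time horizon forward under constant exponential growth. The function name `horizon_after` and the starting value of 1.0 are illustrative assumptions, not anything published by METR; only the seven-month doubling time and the five-year window come from the text above.

```python
def horizon_after(months: float,
                  start_horizon: float = 1.0,
                  doubling_months: float = 7.0) -> float:
    """Project a task time horizon forward, assuming it doubles
    every `doubling_months` months (hypothetical helper)."""
    return start_horizon * 2 ** (months / doubling_months)


# Growth multiple over five years (60 months) at a seven-month doubling time.
five_year_multiple = horizon_after(60)
print(f"{five_year_multiple:.0f}x")  # roughly 380x
```

Under these assumptions, five years of seven-month doublings compounds to a time horizon a few hundred times longer than at the start.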
Instead of a single "AGI" event, AI progress is better understood in three stages. We're in the "powerful tools" era. The next is "powerful agents" that act autonomously. The final stage, "autonomous organizations" that outcompete human-led ones, is much further off due to capability "spikiness."
While AI progress is marketed as revolutionary "step-changes" (e.g., GPT-3 to GPT-4), the underlying reality is more like compound interest. A continuous stream of small, incremental improvements is accumulating, and their combined effect is what creates the feeling of an exponential leap in capability over time.
Third-party evaluator METR observed that the length of tasks models can complete independently was doubling every seven months. However, a recent proprietary model broke sharply from this trend, demonstrating nearly double the expected capability for independent operation (15 hours vs. an expected 8). This signals that AI advancement may be accelerating beyond what the prior trend predicted.
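One way to read the 15-hour versus 8-hour gap is to ask how many months ahead of a seven-month doubling trend the model landed. The sketch below does that arithmetic; the helper name `months_ahead_of_trend` is hypothetical, and the calculation assumes the trend is a clean exponential.

```python
import math


def months_ahead_of_trend(observed: float,
                          expected: float,
                          doubling_months: float = 7.0) -> float:
    """Months of trend progress represented by observed/expected,
    assuming a constant doubling time (hypothetical helper)."""
    return doubling_months * math.log2(observed / expected)


# 15 hours observed against 8 hours expected, from the text above.
lead = months_ahead_of_trend(15, 8)
print(f"{lead:.1f} months ahead of trend")  # about 6.3 months
```

Because 15/8 is just under 2, the model sits just under one full doubling period ahead of the trend line.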
While the long-term trend for AI capability shows a seven-month doubling time, data since 2024 suggests an acceleration to a four-month doubling time. This faster pace has been a much better predictor of recent model performance, indicating a potential shift to a super-exponential trajectory.
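The difference between the two doubling times is easier to see as an annual growth factor. This is a small sketch under the assumption of constant exponential growth within each regime; `annual_growth` is an illustrative name, not a metric defined in the source.

```python
def annual_growth(doubling_months: float) -> float:
    """Multiplicative growth over 12 months for a given doubling time."""
    return 2 ** (12 / doubling_months)


for label, months in [("seven-month doubling", 7), ("four-month doubling", 4)]:
    print(f"{label}: {annual_growth(months):.1f}x per year")
```

A four-month doubling time means three doublings per year (8x), compared with a little over 3x per year at seven months, so the two trends diverge quickly.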
The media portrays AI development as volatile, with huge breakthroughs and sudden plateaus. The reality inside labs like OpenAI is a steady, continuous process of experimentation, stacking small wins, and consistent scaling. The internal experience is one of "chugging along."
The current AI boom isn't a sudden, dangerous phenomenon. It's the culmination of 80 years of research since the first neural network paper in 1943. This long, steady progress counters the recent media-fueled hysteria about AI's immediate dangers.
An analysis of AI model performance shows a 2x to 2.5x improvement in intelligence scores across all major players within the last year. This rapid advancement is leading to near-perfect scores on existing benchmarks, indicating a need for new, more challenging tests to measure future progress.