Richard Sutton, whose "Bitter Lesson" essay was a foundational argument for scaling compute in AI, has publicly aligned with critiques from LLM skeptic Gary Marcus. This surprising shift suggests that the original simplistic interpretation of "more compute is all you need" is being re-evaluated by its own progenitor.
A core debate in AI is whether LLMs, which are text prediction engines, can achieve true intelligence. Critics argue they cannot because they lack a model of the real world. This prevents them from making meaningful, context-aware predictions about future events, a limitation that more data alone may not solve.
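Concretely, "text prediction engine" means an autoregressive next-token sampler. A minimal sketch, assuming the Hugging Face transformers library and using GPT-2 purely as a small public stand-in model, shows that generation is nothing more than repeatedly predicting a distribution over the next token and appending one choice:

```python
# Minimal sketch: an LLM as a next-token prediction engine.
# Assumes the Hugging Face `transformers` library; GPT-2 is a small,
# public stand-in model chosen only for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):                          # generate five tokens, one at a time
        logits = model(ids).logits[0, -1]       # scores for the next token only
        next_id = torch.argmax(logits)          # greedy choice: most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

The skeptics' point is that nothing in this loop consults a model of the world; the network only reproduces statistical regularities of its training text.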
With industry dominating large-scale compute, academia's function is no longer to train the biggest models. Instead, its value lies in pursuing unconventional, high-risk research in areas like new algorithms, architectures, and theoretical underpinnings that commercial labs, focused on scaling, might overlook.
For the first time in years, the perceived rate of improvement in LLM capabilities has slowed. Models are still getting better, but the tenfold jump in price for top-tier access (from $20 to $200 per month) is not matched by a proportional increase in practical utility, suggesting a plateau or diminishing returns.
The history of AI, exemplified by the 2012 AlexNet breakthrough, demonstrates that scaling compute and data with simpler, older algorithms often yields greater advances than designing intricate new ones. This "bitter lesson" suggests prioritizing scalability over algorithmic complexity for future progress.
Ilya Sutskever argues the 'age of scaling' is ending. Further progress towards AGI won't come from just making current models bigger. The new frontier is fundamental research to discover novel paradigms and bend the scaling curve, a strategy his company SSI is pursuing.
Richard Sutton, author of "The Bitter Lesson," argues that today's LLMs are not truly "bitter lesson-pilled." Their reliance on finite, human-generated data introduces inherent biases and limitations, contrasting with systems that learn from scratch purely through computational scaling and environmental interaction.
The "bitter lesson" in AI research posits that methods leveraging massive computation scale better and ultimately win out over approaches that rely on human-designed domain knowledge or clever shortcuts, favoring scale over ingenuity.
The era of guaranteed progress by simply scaling up compute and data for pre-training is ending. With massive compute now available, the bottleneck is no longer resources but fundamental ideas. The AI field is re-entering a period where novel research, not just scaling existing recipes, will drive the next breakthroughs.
Contrary to the prevailing 'scaling laws' narrative, leaders at Z.AI believe that simply adding more data and compute to current Transformer architectures yields diminishing returns. They operate under the conviction that a fundamental performance 'wall' exists, necessitating research into new architectures for the next leap in capability.
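For context, the "scaling laws" being questioned here are power-law fits of loss against model size N and training tokens D. A minimal sketch, assuming the Chinchilla parametric form L(N, D) = E + A/N^α + B/D^β with the approximate constants reported by Hoffmann et al. (2022), shows the diminishing-returns shape: each order of magnitude of scale buys a smaller absolute loss reduction as the irreducible term E comes to dominate.

```python
# Illustrative only: Chinchilla-style parametric loss fit,
# L(N, D) = E + A / N**alpha + B / D**beta  (Hoffmann et al., 2022).
# The constants are the approximate fit reported in that paper, used here
# purely to show the diminishing-returns shape of the curve.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# Grow parameters 10x per step, keeping tokens at ~20x parameters
# (the compute-optimal ratio the paper suggests).
for n in [1e9, 1e10, 1e11, 1e12]:
    print(f"N={n:.0e}: L={loss(n, 20 * n):.3f}")
```

Whether the curve merely flattens or hits a genuine "wall" is exactly the point of disagreement between the scaling-laws camp and the position attributed to Z.AI here.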
Ilya Sutskever argues that the AI industry's "age of scaling" (2020-2025) is insufficient for achieving superintelligence. He posits that the next leap requires a return to the "age of research" to discover new paradigms, as simply making existing models 100x larger won't be enough for a breakthrough.