Richard Sutton's "Bitter Lesson" suggests that general methods which leverage compute always win. Applied to LLMs, this means building complex workflows or fine-tuning yields only temporary gains that the next generation of general models will erase. Always bet on the more general model.
The "Bitter Lesson" is not just about using more compute, but leveraging it scalably. Current LLMs are inefficient because they only learn during a discrete training phase, not during deployment where most computation occurs. This reliance on a special, data-intensive training period is not a scalable use of computational resources.
General LLMs are optimized for short, stateless interactions. In complex, multi-step learning, they quickly lose context and drift from the user's original goal. A true learning platform must provide persistent "scaffolding" that continually brings the user back to their objective, something the models alone do not supply.
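One way to read "scaffolding" concretely is a session loop that owns the learner's objective and restates it on every request, rather than trusting the model's context window to preserve it. The sketch below is only an illustration of that idea, not anyone's actual platform: `call_model` is a stub standing in for whatever chat-completion API is used, and the six-message rolling window is an arbitrary choice.

```python
# Minimal sketch of goal-persistent scaffolding. `call_model` is a stub
# (an assumption), standing in for a real chat-completion client.

def call_model(messages: list[dict]) -> str:
    """Stub for an LLM API call; replace with a real client."""
    return f"(model reply to: {messages[-1]['content']!r})"


def scaffolded_session(objective: str, user_turns: list[str]) -> list[str]:
    replies: list[str] = []
    history: list[dict] = []
    for turn in user_turns:
        messages = [
            # The objective is re-injected at the top of every request, so the
            # session stays anchored to it no matter how long it runs.
            {"role": "system",
             "content": f"The learner's standing objective: {objective}. "
                        "Relate every answer back to this objective."},
            *history[-6:],  # keep only a short rolling window of prior turns
            {"role": "user", "content": turn},
        ]
        reply = call_model(messages)
        history += [{"role": "user", "content": turn},
                    {"role": "assistant", "content": reply}]
        replies.append(reply)
    return replies


print(scaffolded_session("learn linear algebra",
                         ["What is a matrix?", "Now explain eigenvalues."]))
```

The design point is simply that the scaffold, not the model, is the source of truth for the goal; the model is called fresh each turn with the goal restated.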
Overly structured, workflow-based systems that work with today's models will become bottlenecks tomorrow. Engineers must be prepared to shed abstractions and rebuild simpler, more general systems to capture the gains from exponentially improving models.
AI development history shows that complex, hard-coded approaches to intelligence are often superseded by more general, simpler methods that scale more effectively. This "bitter lesson" warns against building brittle solutions that will become obsolete as core models improve.
An LLM shouldn't do math internally any more than a human would. The most intelligent AI systems will be those that know when to call specialized, reliable tools—like a Python interpreter or a search API—instead of attempting to internalize every capability from first principles.
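A toy sketch of that delegation pattern follows. The model call is stubbed out and the reply format (`{"tool": ..., "arguments": ...}`) is a simplified stand-in for a real provider's function-calling API; the tools themselves, a small arithmetic evaluator and a placeholder search function, are hypothetical examples, but the dispatch logic is the point.

```python
import ast
import operator

# Two "specialized, reliable tools" the model can delegate to instead of
# computing answers inside its own forward pass.

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression with Python's AST, not the LLM."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow, ast.USub: operator.neg}

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp):
            return ops[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return str(_eval(ast.parse(expression, mode="eval").body))


def search(query: str) -> str:
    """Placeholder for a real search API call."""
    return f"[top results for: {query}]"


TOOLS = {"calculator": calculator, "search": search}


def run_turn(model_reply: dict) -> str:
    """Dispatch a model reply: either plain text or a request to use a tool."""
    if model_reply.get("tool"):
        return TOOLS[model_reply["tool"]](model_reply["arguments"])
    return model_reply["content"]


# The model replies are hard-coded here; a real system would receive this
# structure from an LLM API that supports tool/function calling.
print(run_turn({"tool": "calculator", "arguments": "1234 * 5678"}))  # 7006652
print(run_turn({"tool": "search", "arguments": "bitter lesson Sutton"}))
```

The arithmetic comes back exact because a deterministic interpreter computed it; the model's job is only to decide which tool to invoke and with what arguments.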
Richard Sutton, author of "The Bitter Lesson," argues that today's LLMs are not truly "bitter lesson-pilled." Their reliance on finite, human-generated data introduces inherent biases and limitations, contrasting with systems that learn from scratch purely through computational scaling and environmental interaction.
The "bitter lesson" in AI research posits that methods leveraging massive computation scale better and ultimately win out over approaches that rely on human-designed domain knowledge or clever shortcuts, favoring scale over ingenuity.
The "bitter lesson" of AI applies to product development: complex scaffolding built around model limitations (like early vector stores or agent frameworks) will inevitably become obsolete as the models themselves get smarter and absorb those functions. Don't over-engineer solutions that a future model will solve natively.
Just as neural networks replaced hand-crafted features, large generalist models are replacing narrow, task-specific ones. Jeff Dean notes the era of unified models is "really upon us." A single, large model that can generalize across domains like math and language is proving more powerful than bespoke solutions for each, a modern take on the "bitter lesson."
Richard Sutton, whose "Bitter Lesson" essay was a foundational argument for scaling compute in AI, has publicly aligned with critiques from LLM skeptic Gary Marcus. This surprising shift suggests that the original simplistic interpretation of "more compute is all you need" is being re-evaluated by its own progenitor.