RiffOn - Some thoughts on the Sutton interview

LLMs' reliance on human data is a feature, not a bug: it is a crucial 'fossil fuel' that provides the prior needed for true reinforcement learning.

Richard Sutton's 'Bitter Lesson' Implies Current LLMs Are Inefficient Users of Compute

The "Bitter Lesson" is not just about using more compute but about leveraging it scalably. Current LLMs are inefficient because they learn only during a discrete training phase, not during deployment, where most computation occurs. Relying on a special, data-intensive training period is not a scalable use of computational resources.

Some thoughts on the Sutton interview

Dwarkesh Podcast·5 months ago

LLMs Follow a 'Backwards' Path to Agency Compared to Biological Evolution

Biological evolution used meta-reinforcement learning to create agents that could then perform imitation learning. The current AI paradigm is inverted: it starts with pure imitation learners (base LLMs) and then attempts to graft reinforcement learning on top to create coherent agency and goals. The success of this biologically 'backwards' approach remains an open question.

View LLM Imitation Learning as Reinforcement Learning with a One-Token Horizon

The distinction between imitation learning and reinforcement learning (RL) is not a rigid dichotomy. Next-token prediction in LLMs can be framed as a form of RL where the "episode" is just one token long and the reward is based on prediction accuracy. This conceptual model places both learning paradigms on a continuous spectrum rather than in separate categories.
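
One way to make this framing concrete is to compare gradients on a toy example. The sketch below assumes a softmax policy over a four-token vocabulary and a hypothetical reward of 1 when the sampled token matches the observed next token and 0 otherwise; the logits, the target index, and the function names are all illustrative and do not come from the interview. Under those assumptions, the exact expected policy gradient for a one-token episode points in the same direction as the negative cross-entropy gradient, scaled by the probability the model assigns to the correct token.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits, target):
    """Gradient of -log p(target) with respect to the logits (imitation view)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def expected_policy_grad(logits, target):
    """Exact expected REINFORCE gradient for a one-token episode (RL view).

    Assumed reward: 1 if the sampled token equals the target, else 0.
    Computes sum over actions a of p(a) * reward(a) * d log p(a) / d logits.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, p_action in enumerate(probs):
        reward = 1.0 if action == target else 0.0
        for i in range(len(logits)):
            dlogp = (1.0 if i == action else 0.0) - probs[i]
            grad[i] += p_action * reward * dlogp
    return grad


# Illustrative values only.
logits = [2.0, 0.5, -1.0, 0.1]
target = 2

probs = softmax(logits)
ce_grad = cross_entropy_grad(logits, target)
pg_grad = expected_policy_grad(logits, target)

# The RL gradient (ascent direction) equals the negated cross-entropy
# gradient scaled by the probability of the correct token.
for i in range(len(logits)):
    print(i, pg_grad[i], -probs[target] * ce_grad[i])
```

The two columns printed for each logit agree up to floating-point error. The scaling factor is where the two views differ in practice: when the model assigns low probability to the correct token, the sampled-reward signal is weak, whereas the cross-entropy gradient is not damped in that way.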

Human Pre-Training Data is the 'Fossil Fuel' for Bootstrapping AGI

Like fossil fuels, finite human data is not a dead end for AI but a crucial, non-renewable resource. It provides the initial energy to bootstrap more advanced, self-sustaining learning systems (the AI equivalent of renewable energy) that could not have been built from scratch. This frames imitation learning as a necessary intermediate step, not the final destination.
