We scan new podcasts and send you the top 5 insights daily.
The argument that LLMs are just "stochastic parrots" is outdated. Current frontier models are also trained via reinforcement learning, where the training signal is not "did you predict the right token?" but "did you get the right answer?" Correctness is judged against complex, often qualitative criteria, pushing models beyond simple statistical correlation.
Reinforcement learning incentivizes AIs to find the right answer, not just mimic human text. This leads to them developing their own internal "dialect" for reasoning—a chain of thought that is effective but increasingly incomprehensible and alien to human observers.
The LLM boom was a 'shortcut' that mined intelligence from existing human data, and that shortcut has limits. To achieve novel breakthroughs beyond the human corpus, the field is re-integrating the original DeepMind philosophy: agents that learn through interaction (i.e., reinforcement learning) to generate genuinely new knowledge.
Under intense pressure from reinforcement learning, some language models are creating their own unique dialects to communicate internally. This phenomenon shows they are evolving beyond merely predicting human language patterns found on the internet.
Modern LLMs use a simple form of reinforcement learning that directly rewards successful outcomes. This contrasts with more sophisticated methods, like those in AlphaGo or the brain, which use "value functions" to estimate long-term consequences. It's a mystery why the simpler approach is so effective.
Pre-training on internet text data is hitting a wall. The next major advancements will come from reinforcement learning (RL), where models learn by interacting with simulated environments (like games or fake e-commerce sites). This post-training phase is in its infancy but will soon consume the majority of compute.
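The shape of that interaction loop is easy to sketch. Below is a minimal gym-style toy (the `ToyEnv` corridor is a hypothetical stand-in for the games or mock e-commerce sites mentioned above, not a real training environment): the agent learns purely from trial-and-error reward, with no text to imitate.

```python
import random

class ToyEnv:
    """Minimal simulated environment: reach position 3 in a short corridor."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(3, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 3
        return self.pos, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning: improve a policy from interaction alone."""
    env, q = ToyEnv(), [[0.0, 0.0] for _ in range(4)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Mostly exploit the current estimates, occasionally explore.
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = env.step(a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

random.seed(0)
q = train()
print([max((0, 1), key=lambda a: q[s][a]) for s in range(3)])  # learned policy: [1, 1, 1]
```

Real post-training environments are vastly richer, but the loop — act, observe, get rewarded, update — is the same, which is why it scales with compute rather than with scraped data.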
It's a misconception that Reinforcement Learning's power is limited to domains with clear, verifiable rewards. Geoffrey Irving points out that frontier models use RL to improve on fuzzy, unverifiable tasks, like giving troubleshooting advice from a photo of a lab setup, proving the technique's much broader effectiveness.
The transition from supervised learning (imitating internet text) to reinforcement learning (rewarding a model for achieving a goal) marks a fundamental breakthrough. This method, used in Anthropic's Claude 3 Opus model, allows AI to develop novel problem-solving capabilities beyond simple data emulation.
Static data scraped from the web is becoming less central to AI training. The new frontier is "dynamic data," where models learn through trial-and-error in synthetic environments (like solving math problems), effectively creating their own training material via reinforcement learning.
The distinction between imitation learning and reinforcement learning (RL) is not a rigid dichotomy. Next-token prediction in LLMs can be framed as a form of RL where the "episode" is just one token long and the reward is based on prediction accuracy. This conceptual model places both learning paradigms on a continuous spectrum rather than in separate categories.
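This framing can be checked numerically: for a one-token episode with reward 1 for emitting the correct token, the expected policy-gradient (REINFORCE) update points in exactly the same direction as the supervised cross-entropy update, just rescaled by the token's probability. A small NumPy sketch over a toy 5-token vocabulary (all numbers illustrative):

```python
import numpy as np

# One-token "episode": the policy is a softmax over a toy 5-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=5)
target = 2  # index of the "correct" next token

p = np.exp(logits - logits.max())
p /= p.sum()
onehot = np.eye(5)[target]

# Supervised view: negative gradient of cross-entropy loss w.r.t. the logits.
supervised_dir = onehot - p

# RL view: expected REINFORCE gradient, E[r(a) * grad log p(a)].
# Only a == target earns reward, and grad log p(a) = onehot(a) - p,
# so the expectation collapses to p(target) * (onehot(target) - p).
reinforce_dir = p[target] * (onehot - p)

# The two update directions are parallel: RL just rescales by p(target).
cosine = supervised_dir @ reinforce_dir / (
    np.linalg.norm(supervised_dir) * np.linalg.norm(reinforce_dir))
print(round(cosine, 6))  # 1.0
```

In other words, next-token prediction sits at the degenerate one-step end of the RL spectrum; longer episodes and sparser rewards move you along it.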
Unlike traditional software, large language models are not programmed with specific instructions. They evolve through a process where different strategies are tried, and those that receive positive rewards are repeated, making their behaviors emergent and sometimes unpredictable.
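The "try strategies, repeat what gets rewarded" dynamic can be shown in a few lines (a deliberately crude sketch with made-up strategies and success rates, nothing like production RLHF):

```python
import random

# Three candidate strategies; one succeeds far more often than the others.
# The learner never sees these rates, only per-trial reward.
random.seed(1)
success_rate = {"A": 0.2, "B": 0.8, "C": 0.3}
weights = {"A": 1.0, "B": 1.0, "C": 1.0}

for _ in range(2000):
    # Sample a strategy in proportion to its current weight...
    pick = random.choices(list(weights), list(weights.values()))[0]
    # ...and up-weight it only when that trial happens to be rewarded.
    if random.random() < success_rate[pick]:
        weights[pick] *= 1.01

print(max(weights, key=weights.get))  # the frequently rewarded strategy wins: B
```

Nobody "programmed" strategy B in; it emerged from reward statistics — which is also why behaviors shaped this way can surprise their developers.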