Modern LLMs use a simple form of reinforcement learning that directly rewards successful outcomes. This contrasts with more sophisticated methods, like those in AlphaGo or the brain, which use "value functions" to estimate long-term consequences. It's a mystery why the simpler approach is so effective.
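A minimal sketch of the contrast, assuming a toy episode of a few sampled steps (the function names and numbers are invented for illustration, not any lab's actual training code):

```python
# Two ways to assign credit to the steps of an episode.

def outcome_only_returns(num_steps: int, final_reward: float) -> list[float]:
    """Outcome-reward RL, as in much modern LLM post-training: every step
    receives the same signal, namely whether the final answer was judged correct."""
    return [final_reward] * num_steps

def value_function_advantages(rewards: list[float], values: list[float],
                              gamma: float = 0.99) -> list[float]:
    """Actor-critic style credit assignment, closer to AlphaGo: each step is judged
    against a learned value function that estimates long-term consequences."""
    advantages = []
    for t, r in enumerate(rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        advantages.append(r + gamma * next_value - values[t])  # one-step TD error
    return advantages

# A four-step episode that ends in success.
print(outcome_only_returns(4, final_reward=1.0))                     # [1.0, 1.0, 1.0, 1.0]
print(value_function_advantages([0, 0, 0, 1.0], [0.2, 0.4, 0.6, 0.8]))
```

The outcome-only recipe needs nothing beyond a final pass/fail signal; the value-function recipe requires training an entire extra estimator of long-term consequences, which is part of what makes the simpler approach's success surprising.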

Related Insights

A core debate in AI is whether LLMs, which are text prediction engines, can achieve true intelligence. Critics argue they cannot because they lack a model of the real world. This prevents them from making meaningful, context-aware predictions about future events—a limitation that more data alone may not solve.

In a 2018 interview, OpenAI's Greg Brockman described their foundational training method: ingesting thousands of books with the sole task of predicting the next word. This simple predictive objective was the key that unlocked complex, generalizable language understanding in their models.
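Concretely, that objective amounts to minimizing the average negative log-probability the model assigns to each actual next word. A toy sketch, where the "model" and vocabulary are stand-ins rather than OpenAI's setup:

```python
import math

def next_word_loss(model_probs, tokens):
    """Average negative log-probability of each actual next word, given its context.
    `model_probs(context)` is a stand-in for a language model: it returns a dict
    mapping candidate next words to probabilities."""
    total = 0.0
    for i in range(1, len(tokens)):
        context, target = tokens[:i], tokens[i]
        total += -math.log(model_probs(context).get(target, 1e-12))
    return total / (len(tokens) - 1)

# A deliberately useless uniform "model" over a tiny vocabulary, just to show the interface.
vocab = ["the", "cat", "sat", "on", "mat"]
uniform = lambda context: {word: 1.0 / len(vocab) for word in vocab}
print(next_word_loss(uniform, ["the", "cat", "sat", "on", "the", "mat"]))  # ~1.61, i.e. log(5)
```

Scaled from this toy to thousands of books and billions of parameters, minimizing this one quantity is the "simple predictive objective" Brockman described.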

Reinforcement learning incentivizes AIs to find the right answer, not just mimic human text. This leads them to develop their own internal "dialect" for reasoning: a chain of thought that is effective but increasingly incomprehensible and alien to human observers.

In domains like coding and math, where correctness is automatically verifiable, AI can move beyond learning from human feedback (RLHF). Using pure reinforcement learning, or "experiential learning," models learn via self-play and can discover novel, superhuman strategies similar to AlphaGo's Move 37.
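"Automatically verifiable" means the reward comes from a checker rather than from human labels. A hedged sketch, with the reward functions and test suite invented for illustration:

```python
def math_reward(model_answer: str, ground_truth: str) -> float:
    """1.0 if the model's final answer matches the known solution, else 0.0."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def code_reward(candidate_fn, test_cases) -> float:
    """Fraction of unit tests a generated function passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing program simply earns no reward
    return passed / len(test_cases)

# Score a candidate sorting routine (here Python's built-in `sorted`, standing in
# for model-generated code) against a tiny test suite.
tests = [(([3, 1, 2],), [1, 2, 3]), (([],), [])]
print(code_reward(sorted, tests))   # 1.0: every test passes, full reward
print(math_reward("42", "41"))      # 0.0: wrong final answer, no reward
```

Because no human needs to grade these outcomes, the model can be trained on millions of its own attempts, which is what opens the door to strategies no human ever demonstrated.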

The transition from supervised learning (copying internet text) to reinforcement learning (rewarding a model for achieving a goal) marks a fundamental breakthrough. This method, used in Anthropic's Claude 3 Opus model, allows AI to develop novel problem-solving capabilities beyond simple data emulation.

The distinction between imitation learning and reinforcement learning (RL) is not a rigid dichotomy. Next-token prediction in LLMs can be framed as a form of RL where the "episode" is just one token long and the reward is based on prediction accuracy. This conceptual model places both learning paradigms on a continuous spectrum rather than in separate categories.
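One standard way to make this concrete (the indicator reward below is an illustrative choice, not the source's formulation): write $\pi_\theta(x_t \mid x_{<t})$ for the model's next-token distribution. Maximum-likelihood training ascends

$$\nabla_\theta \log \pi_\theta(x_t \mid x_{<t}),$$

while treating the same step as a one-token "episode" with reward $r(a) = \mathbf{1}[a = x_t]$ and actions sampled from the model gives the policy gradient

$$\mathbb{E}_{a \sim \pi_\theta}\!\big[\, r(a)\, \nabla_\theta \log \pi_\theta(a \mid x_{<t}) \,\big] = \pi_\theta(x_t \mid x_{<t})\, \nabla_\theta \log \pi_\theta(x_t \mid x_{<t}),$$

which is the same update direction, merely rescaled by the probability the model already assigns to the correct token. Supervised next-token prediction thus sits at the short-horizon, dense-reward end of the spectrum rather than outside it.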

Biological evolution used meta-reinforcement learning to create agents that could then perform imitation learning. The current AI paradigm is inverted: it starts with pure imitation learners (base LLMs) and then attempts to graft reinforcement learning on top to create coherent agency and goals. The success of this biologically 'backwards' approach remains an open question.

Unlike traditional software, large language models are not programmed with specific instructions. They evolve through a process where different strategies are tried, and those that receive positive rewards are repeated, making their behaviors emergent and sometimes unpredictable.
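A toy loop, with invented strategy names and rewards, shows why the resulting behavior is emergent rather than specified:

```python
import math, random

# Toy illustration (not actual LLM training code) of how behavior emerges from reward:
# the learner samples a strategy, an external reward scores it, and preferences for
# well-rewarded strategies are nudged up, so they get sampled more and more often.

preferences = {"strategy_a": 0.0, "strategy_b": 0.0, "strategy_c": 0.0}
true_reward = {"strategy_a": 0.0, "strategy_b": 1.0, "strategy_c": 0.2}  # unknown to the learner
learning_rate = 0.5

for _ in range(200):
    weights = [math.exp(p) for p in preferences.values()]           # prefer what worked before
    chosen = random.choices(list(preferences), weights=weights)[0]  # but keep exploring
    error = true_reward[chosen] - preferences[chosen]
    preferences[chosen] += learning_rate * error                    # reinforce rewarded choices

print(max(preferences, key=preferences.get))  # ends up "strategy_b" essentially every run
```

Nothing in the loop names a preferred strategy; "strategy_b" wins only because the reward signal happens to favor it, which is exactly why the learned behavior can surprise even its own developers.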

AI models use simple, mathematically clean loss functions. The human brain's superior learning efficiency might stem from evolution hard-coding numerous, complex, and context-specific loss functions that activate at different developmental stages, creating a sophisticated learning curriculum.

A key gap between AI and human intelligence is the lack of experiential learning. Unlike a human who improves on a job over time, an LLM is stateless. It doesn't truly learn from interactions; it's the same static model for every user, which is a major barrier to AGI.