By removing all human game data and learning only from self-play, AlphaZero first rediscovered human strategies and then discarded them for superior, 'alien' ones. This showed that relying solely on human data can limit an AI's potential, anchoring it to existing knowledge and cognitive biases.

Related Insights

DeepMind's core breakthrough was treating AI like a child, not a machine. Instead of programming complex strategies, they taught it to master tasks through simple games like Pong, giving it only one rule ('score go up is good') and allowing it to learn for itself through trial and error.
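A minimal sketch of that "one rule, trial and error" idea, using a multi-armed bandit rather than Pong (an assumption for brevity): the agent knows nothing about the arms and is told only that higher score is better, yet it discovers the best arm from reward feedback alone.

```python
import random

def train_bandit(arm_means, episodes=5000, epsilon=0.1, seed=0):
    """Trial-and-error learning with one rule: higher score is better.

    The agent never sees the true arm_means; it only observes the noisy
    reward each pull returns and keeps a running average per arm.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_means)
    counts = [0] * len(arm_means)
    for _ in range(episodes):
        if rng.random() < epsilon:               # occasionally explore
            arm = rng.randrange(len(arm_means))
        else:                                    # otherwise exploit best estimate
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])
        reward = rng.gauss(arm_means[arm], 1.0)  # noisy score signal
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

# The agent converges on arm 2 (true mean 1.0) purely from reward feedback.
est = train_bandit([0.2, 0.5, 1.0])
best = max(range(3), key=lambda a: est[a])
```

The agent is never programmed with a strategy; the policy emerges from the reward signal, which is the point the insight makes about Pong.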

Reinforcement learning incentivizes AIs to find the right answer, not just mimic human text. This leads to them developing their own internal "dialect" for reasoning—a chain of thought that is effective but increasingly incomprehensible and alien to human observers.

A novel prompting technique involves instructing an AI to assume it knows nothing about a fundamental concept, like gender, before analyzing data. This "unlearning" process allows the AI to surface patterns from a truly naive perspective that is impossible for a human to replicate.

In domains like coding and math where correctness is automatically verifiable, AI can move beyond learning from human preference judgments (RLHF). Using pure reinforcement learning, or "experiential learning," models learn via self-play against a verifier and can discover novel, superhuman strategies similar to AlphaGo's Move 37.
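A toy sketch of that propose-and-verify loop (the task and search procedure are hypothetical simplifications): correctness itself is the reward, with no human judgment involved. The "model" here is just random search over the coefficients of a linear function, but the structure — generate a candidate, check it against an automatic verifier, keep what passes — is the same.

```python
import random

def verify(candidate, tests):
    """Automatic verifier: reward is earned only if every test case passes.

    No human preference signal is involved; correctness is the reward,
    as in RL on math or coding tasks."""
    return all(candidate(x) == y for x, y in tests)

def search_linear_fn(tests, iterations=10_000, seed=0):
    """Toy 'experiential learning' loop: propose, verify, keep the winner.

    Hypothetical task: find integers (a, b) so that f(x) = a*x + b
    satisfies the verifier on all test cases."""
    rng = random.Random(seed)
    for _ in range(iterations):
        a, b = rng.randint(-10, 10), rng.randint(-10, 10)
        if verify(lambda x: a * x + b, tests):
            return a, b
    return None

# The test cases alone define success; any (a, b) that passes is accepted.
tests = [(0, 1), (1, 4), (2, 7)]   # consistent with f(x) = 3x + 1
result = search_linear_fn(tests)
```

Because the verifier is mechanical, this loop can run millions of times without a human in it, which is what makes superhuman strategies reachable in verifiable domains.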

AlphaGo's infamous 'Move 37' was a play no human expert would have made, initially dismissed as an error. Its eventual success demonstrated that AI can discover novel, superior strategies beyond the existing corpus of human knowledge, fundamentally expanding a field of study rather than just mastering it.

The "bitter lesson" in AI research posits that methods leveraging massive computation scale better and ultimately win out over approaches that rely on human-designed domain knowledge or clever shortcuts, favoring scale over ingenuity.

Human intelligence is shaped by limitations like a finite lifespan and small brain, forcing efficient learning from sparse data. AI lacks these constraints, learning from lifetimes of data with massive compute. This fundamental difference means AI will naturally evolve into a distinct, non-human form of intelligence unless we explicitly engineer human-like biases into it.

Even when a model performs a task correctly, interpretability can reveal it learned a bizarre, "alien" heuristic that is functionally equivalent but not the generalizable, human-understood principle. This highlights the challenge of ensuring models truly "grok" concepts.

In the endgame, AlphaGo made moves that seemed suboptimal, even giving up points. This was because it wasn't optimizing for a large victory margin (a human heuristic) but purely for maximizing the probability of winning, even by a half-point. This reveals how literal AI objective functions can differ from human proxies for success.
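The objective-function difference can be shown in a few lines. The numbers below are illustrative, not from the actual match: one candidate move promises a large margin but carries more risk, the other all but guarantees a half-point win. A margin-maximizing heuristic and a win-probability objective pick different moves.

```python
# Each candidate move: (name, win_probability, expected_point_margin).
# Illustrative values only — not taken from the AlphaGo vs. Lee Sedol games.
moves = [
    ("aggressive", 0.78, 12.0),   # big margin if it works, riskier
    ("safe",       0.91, 0.5),    # tiny margin, near-certain win
]

def pick_by_margin(moves):
    """Human-style proxy for success: maximize expected victory margin."""
    return max(moves, key=lambda m: m[2])[0]

def pick_by_win_prob(moves):
    """AlphaGo-style objective: maximize the probability of winning at all."""
    return max(moves, key=lambda m: m[1])[0]
```

`pick_by_margin(moves)` chooses "aggressive" while `pick_by_win_prob(moves)` chooses "safe" — the move that looks like it gives up points is optimal under the literal objective.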

Move 37 in the AlphaGo vs. Lee Sedol match was AI's 'four-minute mile.' It marked the first time an AI made a move that was not just optimal but also novel and creative—one no human professional would have conceived. This signaled a shift from pattern matching to genuine, emergent intelligence.