A key insight from AlphaGo is that a relatively shallow neural network can approximate the result of an incredibly deep and complex search tree. This suggests neural nets can learn to compress sequential, recursive computation into a single, efficient forward pass.
The enormous compute budget for the original AlphaGo was not about finding the most efficient training method, but about proving a method could work at all. Once a breakthrough is made and the path is clear, subsequent efforts can focus on optimization and achieve similar results with far less compute.
Monte Carlo Tree Search (MCTS) acts as a 'policy improvement operator.' After the search finds a better move distribution, the policy network is trained to directly predict this improved distribution. This distills the expensive search process into the network itself, making it stronger over time.
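A minimal PyTorch sketch of that distillation step, assuming a hypothetical two-headed `net` that returns `(policy_logits, value)`; the policy target is the normalized MCTS visit-count distribution:

```python
import torch
import torch.nn.functional as F

def train_step(net, optimizer, states, mcts_pi, outcomes):
    """One distillation step: push the policy head toward the MCTS
    visit distribution and the value head toward the game outcome.

    states:   (B, C, 19, 19) encoded board tensors
    mcts_pi:  (B, A) normalized visit counts from search (the improved policy)
    outcomes: (B,) final results in {-1.0, +1.0} from the mover's perspective
    """
    logits, value = net(states)  # assumed two-headed network
    # Cross-entropy against the *search* distribution, not the net's own output.
    policy_loss = -(mcts_pi * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    value_loss = F.mse_loss(value.squeeze(-1), outcomes)
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Iterating this loop is the improvement operator in action: the stronger net guides the next round of search, which produces still better targets.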
Current LLM agents are effective at executing and optimizing experiments within a defined research track, like hyperparameter tuning. However, they lack the crucial scientific skill of 'lateral thinking'—recognizing when a research path is a dead end and strategically pivoting to a fundamentally new approach.
Go's search space contains more positions than there are atoms in the observable universe, making exhaustive search impossible. AlphaGo's core breakthrough was using neural networks to guide its search intelligently, evaluating only the most promising moves and thereby making an intractable problem tractable.
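The guidance mechanism is the PUCT selection rule: the policy network's prior concentrates visits on promising moves, so weak moves are rarely expanded at all. A NumPy sketch (the exploration constant and the `+ 1` inside the square root are common implementation choices, not fixed by the paper):

```python
import numpy as np

def puct_select(q, prior, visits, c_puct=1.5):
    """Pick the next move to explore at a tree node.

    q:      (A,) mean value of each move observed so far
    prior:  (A,) policy-network probabilities for each move
    visits: (A,) visit counts per move
    """
    # Exploration bonus is scaled by the network's prior: moves the
    # policy net dislikes get almost no search budget.
    u = c_puct * prior * np.sqrt(visits.sum() + 1) / (1 + visits)
    return int(np.argmax(q + u))
```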
Humans stop analyzing a game when they intuit a winning or losing position. AlphaGo’s value function mimics this by predicting the eventual outcome from any board state. This allows the search to be drastically shortened, as it doesn't need to play out every possibility to the very end.
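A sketch of how that truncation looks inside the search; `env`, `value_net`, and their methods are hypothetical stand-ins:

```python
def evaluate_leaf(state, value_net, env):
    """Score a leaf without rolling the game out to the end."""
    if env.is_terminal(state):
        return env.winner_score(state)  # exact result when the game is over
    # Learned intuition: predicted outcome in [-1, 1], backed up the tree
    # exactly as a finished-game result would be.
    return value_net(state)
```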
MCTS acts like the DAgger (Dataset Aggregation) algorithm in robotics. For every state in a game, even one on a losing path, MCTS provides a 'better' action. This teaches the policy not just the optimal path, but also how to recover and get back to it from suboptimal states, creating a more robust agent.
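A sketch of the DAgger-flavored data collection this describes: states are visited under the learner's own (imperfect) policy, but every state is labeled with the stronger search distribution. All of `policy_net`, `mcts_expert`, and `env` are hypothetical stand-ins:

```python
import numpy as np

def collect_dagger_style(policy_net, mcts_expert, env, dataset, n_games=100):
    """Aggregate (state, expert-label) pairs from learner rollouts."""
    for _ in range(n_games):
        state = env.reset()
        while not env.is_terminal(state):
            pi_search = mcts_expert(state)        # stronger MCTS distribution
            dataset.append((state, pi_search))    # label even losing-path states
            pi_net = policy_net(state)            # learner's own (weaker) policy
            # Crucially, the *learner* chooses the move, so the dataset covers
            # the suboptimal states the learner actually reaches.
            move = np.random.choice(len(pi_net), p=pi_net)
            state = env.step(state, move)
    return dataset
```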
In games too complex for a clean search tree (e.g., StarCraft), AIs use 'neural fictitious self-play.' They train specialized model-free RL agents to be a 'best response' against specific, fixed opponents. These specialists are then distilled into a single, robust policy that averages across many opponents.
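A heavily simplified sketch of that loop, loosely inspired by league-style training; every function here is a hypothetical stand-in:

```python
def fictitious_self_play(make_agent, train_best_response, distill, n_rounds=10):
    """Grow a pool of frozen opponents, train a specialist against each
    snapshot of the pool, then fold the specialists into one policy."""
    pool = [make_agent()]  # frozen opponents; never updated after joining
    for _ in range(n_rounds):
        # Model-free RL against a *fixed* opponent set is a far easier
        # problem than beating a moving target.
        specialist = train_best_response(opponents=pool)
        pool.append(specialist)
    # Single robust policy imitating the mixture of all specialists.
    return distill(pool)
```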
The success of neural networks on problems like Go and protein folding, both long considered computationally intractable (protein folding is NP-hard under standard formulations, and generalized Go is harder still), is profound. It suggests our formal theory of computational hardness, which focuses on worst-case instances, may be an incomplete guide to finding useful, approximate solutions in practice.
Unlike typical reinforcement learning, which learns from sparse win/loss signals, AlphaGo's method is remarkably stable. It uses MCTS to generate an 'improved' move distribution for every state, turning the problem into a simple supervised learning task of imitating a better version of itself and avoiding high-variance policy-gradient updates.
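To see the variance difference, compare a REINFORCE-style loss, where every move in a game inherits the same noisy ±1 outcome, with the per-state cross-entropy on MCTS targets in the `train_step` sketch above:

```python
import torch
import torch.nn.functional as F

def reinforce_loss(logits, actions, game_outcome):
    """Policy-gradient loss from sparse win/loss signals.

    logits:       (B, A) policy outputs for B positions
    actions:      (B,) moves actually played
    game_outcome: (B,) the single ±1 game result, broadcast to every move
    """
    logp = F.log_softmax(logits, dim=1)
    chosen = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    # High variance: one noisy scalar credits/blames every move in the game,
    # whereas the MCTS target gives each state its own dense label.
    return -(game_outcome * chosen).mean()
```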
For board games like Go, ResNet architectures can outperform Transformers in lower-data regimes. ResNets have a built-in inductive bias for local spatial patterns via convolutions, which is highly relevant for Go. Transformers must learn these patterns from scratch, requiring more data to achieve similar performance.
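A sketch of one residual block from an AlphaGo-Zero-style convolutional tower (256 channels matches the paper; the rest is standard PyTorch). The 3x3 convolutions hard-code the assumption that nearby stones interact, an inductive bias a Transformer would have to learn from data:

```python
import torch
import torch.nn as nn

class GoResBlock(nn.Module):
    """One residual block over a 19x19 board representation."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)  # skip connection keeps gradients flowing

# Usage: x = torch.randn(1, 256, 19, 19); y = GoResBlock()(x)
```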
Instead of training on the single best action from its search (a one-hot label), AlphaGo's policy network learns to imitate the entire probability distribution of moves from MCTS. This 'soft label' contains far more information, enabling a much more effective and sample-efficient form of knowledge distillation.
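A toy NumPy example of why the soft target is richer: when MCTS rates two moves nearly equally, a one-hot label discards that, while the full distribution pulls probability mass toward both good moves:

```python
import numpy as np

# Hypothetical position with 4 legal moves. MCTS says two moves are
# nearly equally good; the one-hot label keeps only the argmax.
mcts_pi = np.array([0.48, 0.44, 0.06, 0.02])
one_hot = np.eye(4)[np.argmax(mcts_pi)]         # [1., 0., 0., 0.]

net_probs = np.array([0.30, 0.40, 0.20, 0.10])  # current policy output

def cross_entropy(target, probs):
    return -(target * np.log(probs)).sum()

print(cross_entropy(one_hot, net_probs))  # gradient only 'sees' move 0
print(cross_entropy(mcts_pi, net_probs))  # rewards mass on both strong moves
```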
