Attempting to interpret every learned circuit in a complex neural network is a futile effort. True understanding comes from describing the system's foundational elements: its architecture, learning rule, loss functions, and the data it was trained on. The emergent complexity is a result of this process.
Reinforcement learning incentivizes AIs to find the right answer, not just mimic human text. This pushes models to develop their own internal "dialect" for reasoning: a chain of thought that is effective but increasingly alien and incomprehensible to human observers.
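A minimal sketch of that incentive structure, under entirely toy assumptions (the vocabulary, chain length, and reward function below are hypothetical): a REINFORCE-style loop scores only the final answer, so nothing in the objective ever rewards the intermediate tokens for staying human-readable.

```python
# Minimal sketch, toy setup (hypothetical vocabulary, chain length, and reward):
# only the final answer is rewarded, so whatever intermediate "dialect" happens
# to produce it gets reinforced.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, CHAIN_LEN, TARGET = 8, 4, 1
logits = np.zeros((CHAIN_LEN, VOCAB))      # per-position token policy

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def answer_from_chain(chain):
    # Toy stand-in for "decoding a final answer from the chain of thought".
    return sum(chain) % 2

for step in range(2000):
    probs = [softmax(logits[i]) for i in range(CHAIN_LEN)]
    chain = [rng.choice(VOCAB, p=p) for p in probs]
    reward = 1.0 if answer_from_chain(chain) == TARGET else 0.0
    # REINFORCE update: make the sampled tokens more likely in proportion to
    # reward. The chain itself is never graded for readability.
    for i, tok in enumerate(chain):
        grad = -probs[i]
        grad[tok] += 1.0
        logits[i] += 0.1 * (reward - 0.5) * grad
```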
The progression from early neural networks to today's massive models has been driven fundamentally by the exponential growth in available computational power, from the initial move to GPUs to the roughly million-fold increase in compute now applied to training a single model.
Contrary to the view that in-context learning is a distinct process from training, Karpathy speculates it might be an emergent form of gradient descent happening within the model's layers. He cites papers showing that transformers can learn to perform linear regression in-context, with internal mechanics that mimic an optimization loop.
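A minimal sketch of the experimental setup behind that speculation (assumed here from the papers' public descriptions, not their code): each prompt is a fresh linear-regression task, and the hypothesis is that a trained transformer's forward pass effectively runs an optimization loop like the explicit one below.

```python
# Sketch of in-context linear regression (task construction assumed from the
# papers' setup; the gradient loop is the hypothesized internal mechanism,
# written out explicitly rather than hidden in transformer layers).
import numpy as np

rng = np.random.default_rng(0)
d, n_context = 5, 20

w_true = rng.normal(size=d)                  # a new task per prompt
X = rng.normal(size=(n_context, d))          # in-context examples
y = X @ w_true
x_query = rng.normal(size=d)

# "Learning in the forward pass": gradient descent on the context examples
# only, with no persistent weights ever updated.
w = np.zeros(d)
for _ in range(500):
    w -= 0.1 * X.T @ (X @ w - y) / n_context

print("in-context GD prediction:", x_query @ w)
print("ground-truth value:      ", x_query @ w_true)
```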
The ambition to fully reverse-engineer AI models into simple, understandable components is proving unrealistic because their internal workings are messy and complex. Interpretability's practical value is less about achieving guarantees and more about coarse-grained analysis, such as identifying when specific high-level capabilities are being used.
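One concrete form of that coarse-grained analysis is a linear probe. The sketch below uses synthetic "activations" (the layer width, feature direction, and labels are all made up) to show the idea: detecting that a high-level feature is active, without reverse-engineering the circuit that computes it.

```python
# Minimal probe sketch on synthetic activations (d_model, the feature direction,
# and the labels are hypothetical, not taken from any real model).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n = 64, 2000

feature_dir = rng.normal(size=d_model)            # pretend capability direction
labels = rng.integers(0, 2, size=n)               # capability on/off per example
acts = rng.normal(size=(n, d_model)) + np.outer(labels, feature_dir)

probe = LogisticRegression(max_iter=1000).fit(acts[:1500], labels[:1500])
print("held-out probe accuracy:", probe.score(acts[1500:], labels[1500:]))
```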
Current AI can learn to predict complex patterns, like planetary orbits, from data. However, it struggles to abstract the underlying causal laws, such as Newton's second law (F = ma). This leap to a higher level of abstraction remains a fundamental challenge beyond simple pattern recognition.
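A small sketch of that gap (the simulation constants and the ridge-regression "learner" are both toy assumptions): a curve-fitting model predicts the simulated orbit's next state accurately, yet its parameters are just a matrix of numbers; extracting the inverse-square law and F = ma from them is the abstraction step that remains hard.

```python
# Toy sketch (illustrative constants and regressor): the fitted model predicts
# the orbit it was trained on, but nothing in its 4x4 weight matrix resembles
# Newton's laws.
import numpy as np

# Ground truth: 2D orbit under an inverse-square law, integrated with Euler steps.
GM, dt = 1.0, 0.01
state = np.array([1.0, 0.0, 0.0, 1.0])            # x, y, vx, vy
traj = []
for _ in range(5000):
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    ax, ay = -GM * x / r3, -GM * y / r3
    state = np.array([x + vx * dt, y + vy * dt, vx + ax * dt, vy + ay * dt])
    traj.append(state)
traj = np.array(traj)

# "Pattern learner": ridge regression mapping the current state to the next one.
X, Y = traj[:-1], traj[1:]
W = np.linalg.solve(X.T @ X + 1e-6 * np.eye(4), X.T @ Y)

print("one-step prediction error:", np.linalg.norm(X[1000] @ W - Y[1000]))
# W captures the pattern; recovering the law (force ~ 1/r^2, a = F/m) from W
# is the higher level of abstraction the passage describes.
```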
As AI models are used for critical decisions in finance and law, black-box empirical testing will become insufficient. Mechanistic interpretability, which analyzes model weights to understand reasoning, is a bet that society and regulators will require explainable AI, making it a crucial future technology.
For AI systems to be adopted in scientific labs, they must be interpretable. Researchers need to understand the 'why' behind an AI's experimental plan to validate and trust the process, making interpretability a more critical feature than raw predictive power.
Demanding interpretability from AI trading models is a fallacy because they operate at a superhuman level. An AI predicting a stock's price in one minute is processing data in a way no human can. Expecting a simple, human-like explanation for its decision is unreasonable, much like asking a chess engine to explain its moves in prose.
Unlike traditional software, large language models are not programmed with specific instructions. They evolve through a process where different strategies are tried, and those that receive positive rewards are repeated, making their behaviors emergent and sometimes unpredictable.
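As a contrast with explicitly programmed behavior, here is a minimal sketch (a toy bandit with invented payoffs): no line of the program states which choice is correct; the preference emerges only because some tried strategies happened to be rewarded.

```python
# Toy sketch (hypothetical payoffs): behavior is shaped purely by trial and
# reward, with no explicit instruction about which arm is "right".
import numpy as np

rng = np.random.default_rng(0)
true_payoffs = [0.2, 0.5, 0.8]                 # unknown to the learner
estimates, counts = np.zeros(3), np.zeros(3)

for step in range(5000):
    # Epsilon-greedy: mostly repeat what has worked, occasionally explore.
    arm = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(estimates))
    reward = float(rng.random() < true_payoffs[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("learned preferences:", estimates)       # the best arm dominates, unprogrammed
```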
AI models use simple, mathematically clean loss functions. The human brain's superior learning efficiency might stem from evolution hard-coding numerous, complex, and context-specific loss functions that activate at different developmental stages, creating a sophisticated learning curriculum.
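A minimal sketch of what such a curriculum of losses might look like (the individual loss terms and the stage schedule are entirely hypothetical, chosen only to illustrate the idea): the effective objective is a mixture of context-specific terms whose weights shift across developmental stages.

```python
# Hypothetical curriculum of losses (all terms and the schedule are invented
# for illustration): different objectives dominate at different stages.
import numpy as np

def imitation_loss(pred, target):        # e.g., copy observed behavior
    return float(np.mean((pred - target) ** 2))

def curiosity_loss(pred, prev_pred):     # e.g., favor novelty / prediction change
    return -float(np.mean(np.abs(pred - prev_pred)))

def social_loss(pred, peer_pred):        # e.g., match peers
    return float(np.mean((pred - peer_pred) ** 2))

def stage_weights(step, total_steps):
    """Curiosity dominates early, imitation in the middle, social terms late."""
    t = step / total_steps
    return {"curiosity": max(0.0, 1.0 - 2.0 * t),
            "imitation": 1.0 - abs(2.0 * t - 1.0),
            "social":    max(0.0, 2.0 * t - 1.0)}

def total_loss(step, total_steps, pred, target, prev_pred, peer_pred):
    w = stage_weights(step, total_steps)
    return (w["curiosity"] * curiosity_loss(pred, prev_pred)
            + w["imitation"] * imitation_loss(pred, target)
            + w["social"] * social_loss(pred, peer_pred))
```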