Contrary to fears, interpretability techniques for Transformers seem to work well on new architectures like Mamba and Mixture-of-Experts. These architectures may even offer novel "affordances," such as interpretable routing paths in MoEs, that could make understanding models easier, not harder.
The field is moving beyond labeling concepts with sparse autoencoders. The new frontier is understanding the intricate geometric structures (manifolds) these concepts form in a model's latent space and how circuits transform them, providing a more unified, dynamic view.
Trying to simply block a model from learning an undesirable behavior is futile; gradient descent will find a way around the obstacle. Truly effective techniques must alter the loss landscape so the model naturally "wants" to learn the desired behavior.
To reduce hallucinations, Goodfire runs a detection probe on a frozen copy of a model, not the live one being trained. This makes it computationally harder for the model to learn to evade the detector than to simply learn not to hallucinate, addressing a key failure mode in AI safety.
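A minimal sketch of this frozen-detector idea, using toy modules: the models, the probe, and the loss-reweighting scheme are all hypothetical stand-ins chosen for simplicity (Goodfire's actual training setup is not public in this detail). The key property is that the detector reads activations from the frozen copy, so the live model's weights have no gradient path through the detection signal and cannot learn to fool it directly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyLM(nn.Module):
    """Tiny stand-in for a language model."""
    def __init__(self, vocab=50, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        h = torch.tanh(self.emb(ids))        # hidden activations
        return self.head(h), h

live = ToyLM()                               # model being fine-tuned
frozen = ToyLM()                             # frozen reference copy
frozen.load_state_dict(live.state_dict())
frozen.requires_grad_(False)

# Hypothetical hallucination probe trained on the FROZEN model's activations.
probe = nn.Linear(16, 1)

ids = torch.randint(0, 50, (4, 8))           # toy batch of token ids
logits, _ = live(ids)
with torch.no_grad():
    _, h_frozen = frozen(ids)                # detector never sees live activations
    score = torch.sigmoid(probe(h_frozen)).mean(dim=(1, 2))  # per-example score

# Per-example next-token loss, down-weighted where the frozen detector fires.
per_tok = nn.functional.cross_entropy(
    logits[:, :-1].transpose(1, 2), ids[:, 1:], reduction="none")
per_ex = per_tok.mean(dim=1)
loss = ((1.0 - score) * per_ex).mean()
loss.backward()
```

Reweighting is only one simple way to consume such a signal; an RL-style reward would be another. Either way, the detector's inputs come from the frozen copy, so evading it would require changing the frozen model, which the optimizer cannot do.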
Instead of only analyzing a fully trained model, "intentional design" seeks to control what a model learns during training. The goal is to shape the loss landscape to produce desired behaviors and generalizations from the outset, moving from archaeology to architecture.
Using a sparse autoencoder to identify active concepts, one can project a model's gradient update onto these concepts. This reveals what the model is learning (e.g., "pirate speak" vs. "arithmetic") and allows for selectively amplifying or suppressing specific learning directions.
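The projection-and-edit step can be sketched as follows. Everything here is an illustrative stand-in: the "SAE decoder" is a random matrix of unit-norm concept directions, the "gradient" is a random vector, and the choice of which concepts to amplify or suppress is arbitrary.

```python
import torch

torch.manual_seed(0)
d_model, n_concepts = 32, 100

# Hypothetical SAE decoder: each row is one learned concept direction
# (random here; in practice these come from a trained sparse autoencoder).
decoder = torch.nn.functional.normalize(torch.randn(n_concepts, d_model), dim=1)

# Toy stand-in for the gradient of the loss w.r.t. a residual-stream activation.
grad = torch.randn(d_model)

# 1) Read out what the update is "about": its coefficient along each concept.
coeffs = decoder @ grad
top = torch.topk(coeffs.abs(), k=2).indices
amplify, suppress = top[0], top[1]           # illustrative choices

# 2) Edit the update: boost one concept's component, then zero out another's.
#    (Concept directions are generally not orthogonal, so suppressing one
#    direction slightly perturbs the others; this is a first-order sketch.)
grad = grad + 2.0 * coeffs[amplify] * decoder[amplify]
grad = grad - (decoder[suppress] @ grad) * decoder[suppress]
```

After the last line, the edited gradient has exactly zero component along the suppressed concept direction, while the rest of the update is left (approximately) intact.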
By analyzing a model predicting Alzheimer's, Goodfire discovered it relied on the length of cell-free DNA fragments—a previously overlooked signal. This demonstrates how interpretability can extract new, testable scientific hypotheses from high-performing "black box" models.
Instead of a low-touch SaaS product, Goodfire's business model involves high-value, seven-figure consulting engagements. They work directly with large organizations in finance, government, and life sciences to apply bespoke interpretability and intentional design techniques to specific, high-stakes problems.
Goodfire is cautious about immediately publishing all findings in sensitive areas like intentional design. This isn't just for commercial reasons, but for safety. If a research path proves dangerous, not having published every step allows the community a "line of retreat" from pursuing a harmful direction.
A model's ability to understand a user's mental state is crucial for helpfulness but also enables sycophancy. Effective alignment must surgically intervene in the specific circuit where this capability is misused for people-pleasing, rather than crudely removing the entire useful "theory of mind" capacity.
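A toy sketch of what such a surgical intervention can look like: directional ablation via a forward hook, removing only the component of a layer's output along a single direction while leaving the rest of the representation untouched. The "sycophancy direction" here is random; in practice it would be located through probing and causal analysis, and the intervention might target a specific circuit rather than one direction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 64
layer = nn.Linear(d, d)  # stand-in for one transformer sub-layer

# Hypothetical direction found to mediate sycophantic people-pleasing.
syco_dir = torch.nn.functional.normalize(torch.randn(d), dim=0)

def ablate_direction(module, inputs, output):
    # Forward hook: subtract only the component along syco_dir, leaving the
    # rest of the representation (e.g. theory-of-mind features) intact.
    coeff = output @ syco_dir
    return output - coeff.unsqueeze(-1) * syco_dir

handle = layer.register_forward_hook(ablate_direction)
x = torch.randn(3, d)
y = layer(x)          # outputs now carry zero component along syco_dir
```

Because the hook edits one direction rather than a whole module, the layer's other computations pass through unchanged, which is the "surgical" part of the intervention.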
Research shows it's possible to distinguish and remove model weights used for memorizing facts versus those for general reasoning. Surprisingly, pruning these memorization weights can improve a model's performance on some reasoning tasks, suggesting a path toward creating more efficient, focused AI reasoners.
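A hedged sketch of the pruning idea, using a crude per-weight attribution proxy (|w · ∂L/∂w|) computed on separate "memorization" and "general reasoning" batches. The network, the random data, the attribution score, and the threshold are all illustrative stand-ins, not the method from the research being summarized.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(20, 20)

def weight_attribution(model, x, y):
    """Per-weight importance proxy: |w * dL/dw| on one batch (toy heuristic)."""
    model.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    return (model.weight * model.weight.grad).abs().detach()

# Stand-in batches; in practice these would be curated memorization-heavy
# vs. reasoning-heavy datasets.
x_mem, y_mem = torch.randn(8, 20), torch.randn(8, 20)
x_gen, y_gen = torch.randn(8, 20), torch.randn(8, 20)

mem_score = weight_attribution(net, x_mem, y_mem)
gen_score = weight_attribution(net, x_gen, y_gen)

# Prune weights far more important for memorization than for general use
# (5x is an arbitrary illustrative threshold).
mask = mem_score > 5.0 * gen_score
with torch.no_grad():
    net.weight[mask] = 0.0
```

The point of the sketch is the shape of the procedure, attribute each weight under two data distributions and zero the ones dominated by the memorization signal, not the specific score or threshold.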
