
Research shows it's possible to distinguish the model weights used for memorizing facts from those used for general reasoning, and to remove the former. Surprisingly, pruning these memorization weights can improve a model's performance on some reasoning tasks, suggesting a path toward more efficient, focused AI reasoners.
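
A minimal sketch of the idea, not the research's actual method: score each weight by some memorization attribution and zero out the most implicated fraction. The scoring here is a random placeholder; a real implementation would derive scores from, e.g., gradients on memorized examples.

```python
import torch
import torch.nn as nn

def prune_memorization_weights(model: nn.Module, scores: dict, frac: float = 0.05):
    """Zero the `frac` of weights with the highest memorization scores.

    `scores` maps parameter name -> tensor of the same shape, where higher
    means the weight is more implicated in rote factual recall (assumption:
    such scores are available from an attribution procedure).
    """
    for name, param in model.named_parameters():
        if name not in scores:
            continue
        s = scores[name].flatten()
        k = int(frac * s.numel())
        if k == 0:
            continue
        threshold = s.topk(k).values.min()
        mask = (scores[name] < threshold).to(param.dtype)
        param.data.mul_(mask)  # keep low-scoring (reasoning) weights, drop the rest

# Toy usage with purely illustrative random scores:
model = nn.Linear(16, 16)
scores = {"weight": torch.rand(16, 16)}
prune_memorization_weights(model, scores, frac=0.05)
```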

Related Insights

LLMs learn two things from pre-training: factual knowledge and intelligent algorithms (the "cognitive core"). Karpathy argues the vast memorized knowledge is a hindrance, making models rely on memory instead of reasoning. The goal should be to strip away this knowledge to create a pure, problem-solving cognitive entity.

Solving key AI weaknesses like continual learning or robust reasoning isn't just a matter of bigger models or more data. Shane Legg argues it requires fundamental algorithmic and architectural changes, such as building new processes for integrating information over time, akin to an episodic memory.

When designing smaller models, it's inefficient to use limited parameters for memorizing facts that can be looked up. Jeff Dean advocates for focusing a model's capacity on core reasoning abilities and pairing it with a retrieval system. This makes the model more generally useful, as it can access a vast external knowledge base when needed.
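
A minimal sketch of the pattern Dean describes: keep facts in an external store and let a compact model consult it at inference time. The retriever below is a toy bag-of-words cosine similarity, and `small_model_generate` is a hypothetical stand-in for any small reasoning model.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = Counter(query.lower().split())
    return sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                  reverse=True)[:k]

def answer(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return prompt  # in practice: small_model_generate(prompt)

docs = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "Mount Everest is 8849 metres high.",
]
print(answer("How tall is the Eiffel Tower?", docs))
```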

Classifying a model as "reasoning" simply because it emits a chain-of-thought step is no longer useful. With massive differences in token efficiency, a so-called "reasoning" model can be faster and cheaper than a "non-reasoning" one for a given task. The focus is shifting to a continuous spectrum of capability versus overall cost.
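
Illustrative arithmetic only (prices and token counts below are made up): per-task cost depends on tokens consumed as much as on per-token price, so a token-efficient "reasoning" model can undercut a cheaper-per-token model that needs more output to solve the same task.

```python
def cost_per_task(output_tokens: int, price_per_mtok: float) -> float:
    """Cost in dollars, given output tokens and price per million tokens."""
    return output_tokens * price_per_mtok / 1_000_000

# Hypothetical: the non-reasoning model needs retries (more total tokens),
# while the reasoning model solves the task in one short trace.
non_reasoning = cost_per_task(output_tokens=12_000, price_per_mtok=0.50)
reasoning     = cost_per_task(output_tokens=3_000,  price_per_mtok=1.50)
print(f"non-reasoning: ${non_reasoning:.4f}, reasoning: ${reasoning:.4f}")
# -> non-reasoning: $0.0060, reasoning: $0.0045
```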

Performance on knowledge-intensive benchmarks correlates strongly with an MoE model's total parameter count, not its active parameter count. With leading models like Kimi K2 reportedly using only ~3% active parameters, this suggests there is significant room to increase sparsity and efficiency without degrading factual recall.

Artificial Analysis found that a model's ability to recall facts is a strong function of its total size, even for sparse Mixture-of-Experts (MoE) models. This suggests that the many parameters left "inactive" on any given token still contribute significantly to the model's overall knowledge base, not just the parameters active per token.
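
Back-of-the-envelope arithmetic for the sparsity figure above, using rough public numbers for Kimi K2 (about 1T total parameters, about 32B active per token); treat them as illustrative, not authoritative.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of an MoE model's parameters active for a single token."""
    return active_params / total_params

total, active = 1.0e12, 32e9  # ~1T total, ~32B active (reported figures)
print(f"active fraction: {active_fraction(total, active):.1%}")  # -> 3.2%
```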

The "memory" feature in today's LLMs is a convenience that saves users from re-pasting context. It is far from human memory, which abstracts concepts and builds pattern recognition. The true unlock will be when AI develops intuitive judgment from past "experiences" and data, a much longer-term challenge.

Unlike humans, whose poor memory forces them to generalize and find patterns, LLMs are incredibly good at memorization. Karpathy argues this is a flaw: it distracts models into recalling specific training documents instead of focusing on the underlying, generalizable algorithms of thought, hindering true understanding.

To improve LLM reasoning, researchers feed them data that inherently contains structured logic. Training on computer code was an early breakthrough, as it teaches patterns of reasoning far beyond coding itself. Textbooks are another key source for building smaller, effective models.

Recent AI breakthroughs aren't just from better models, but from clever 'architecture' or 'scaffolding' around them. For example, Claude Code 'cheats' its context window limit by taking notes, clearing its memory, and then reading the notes to resume work. This architectural innovation drives performance.
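
A minimal sketch of that note-taking pattern, with hypothetical `llm` and `summarize` callables (the actual Claude Code mechanism is not public): when the transcript nears the context limit, compress it into notes, clear it, and seed a fresh context with the notes so work can resume.

```python
MAX_CONTEXT_TOKENS = 8_000  # illustrative limit, not a real model's

def count_tokens(text: str) -> int:
    return len(text.split())  # crude word-count proxy for a real tokenizer

def run_agent(task: str, steps: int, llm, summarize):
    transcript = f"Task: {task}"
    for _ in range(steps):
        if count_tokens(transcript) > MAX_CONTEXT_TOKENS:
            notes = summarize(transcript)  # "take notes" before clearing
            transcript = f"Task: {task}\nNotes so far:\n{notes}"  # fresh context
        transcript += "\n" + llm(transcript)  # one more step of work
    return transcript

# Toy usage with stub callables:
out = run_agent("sort a large log file", steps=3,
                llm=lambda ctx: "step output",
                summarize=lambda t: t[:200])
```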