LLM Factual Knowledge Correlates Strongly with Total Parameter Count, Not Active Parameters

Related Insights

LLMs Can Predict Words But Can't Predict the Future Without Real-World Understanding

A core debate in AI is whether LLMs, which are text prediction engines, can achieve true intelligence. Critics argue they cannot because they lack a model of the real world. This prevents them from making meaningful, context-aware predictions about future events—a limitation that more data alone may not solve.

#119 OpenAI Sora vs. TikTok: Can “AI Entertainment” Fund the Compute Bill?

More or Less·5 months ago

LLM Knowledge is a Crutch; Future Research Must Isolate the "Cognitive Core"

LLMs learn two things from pre-training: factual knowledge and intelligent algorithms (the "cognitive core"). Karpathy argues the vast memorized knowledge is a hindrance, making models rely on memory instead of reasoning. The goal should be to strip away this knowledge to create a pure, problem-solving cognitive entity.

Andrej Karpathy — AGI is still a decade away

Dwarkesh Podcast·4 months ago

LLMs Appear Intelligent by Reflecting the Collective Creativity of Humanity

When AI pioneers like Geoffrey Hinton see agency in an LLM, they are misinterpreting the output. What they are actually witnessing is a compressed, probabilistic reflection of the immense creativity and knowledge from all the humans who created its training data. It's an echo, not a mind.

Lee Cronin "Sam Altman Is Delusional, Hinton Needs Therapy, P(Doom) Is Nonsense"

AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts·a month ago

LLMs Lack a "Sleep" Phase to Distill Daily Experiences into Long-Term Memory

Karpathy identifies a key missing piece for continual learning in AI: an equivalent to sleep. Humans seem to use sleep to distill the day's experiences (their "context window") into the compressed weights of the brain. LLMs lack this distillation phase, forcing them to restart from a fixed state in every new session.

Andrej Karpathy — AGI is still a decade away

Dwarkesh Podcast·4 months ago

AI's 'Bitter Lesson': Massive Compute Consistently Beats Human-Crafted Heuristics

The "bitter lesson" in AI research posits that methods leveraging massive computation scale better and ultimately win out over approaches that rely on human-designed domain knowledge or clever shortcuts, favoring scale over ingenuity.

#172: Sora 2, Claude Sonnet 4.5, ChatGPT Instant Checkout, How OpenAI Uses AI, Grokipedia & Mercor’s AI Productivity Index

The Artificial Intelligence Show·4 months ago

The Binary "Reasoning vs. Non-Reasoning" Model Distinction Is Now Obsolete

Classifying a model as "reasoning" based on a chain-of-thought step is no longer useful. With massive differences in token efficiency, a so-called "reasoning" model can be faster and cheaper than a "non-reasoning" one for a given task. The focus is shifting to a continuous spectrum of capability versus overall cost.

Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah-Hill Smith

Latent Space: The AI Engineer Podcast·a month ago

Google's "Titans" AI Achieves Long-Term Memory by Detecting Information "Surprise"

Google's Titans architecture for LLMs mimics human memory by applying Claude Shannon's information theory. It scans vast data streams and identifies "surprise"—statistically unexpected or rare information relative to its training data. This novel data is then prioritized for long-term memory, preventing clutter from irrelevant information.

TECH009: Data Centers in Space, AI Education, Haptic Touch Robotics and More w/ Seb Bunney

We Study Billionaires - The Investor’s Podcast Network·2 months ago

Architectural Innovation Is Key to China's AI Cost Efficiency

Chinese AI models like Kimi achieve dramatic cost reductions through specific architectural choices, not just scale. Using a "mixture of experts" design, they only utilize a fraction of their total parameters for any given task, making them far more efficient to run than the "dense" models common in the West.

China Decode: How an AI Price War Could Spark a Market Correction

The Prof G Pod with Scott Galloway·3 months ago

LLMs' Superhuman Memorization is a Bug, Not a Feature

Unlike humans, whose poor memory forces them to generalize and find patterns, LLMs are incredibly good at memorization. Karpathy argues this is a flaw. It distracts them with recalling specific training documents instead of focusing on the underlying, generalizable algorithms of thought, hindering true understanding.

Andrej Karpathy — AGI is still a decade away

Dwarkesh Podcast·4 months ago

Imbue LLMs with Reasoning by Training on Code and Textbooks

To improve LLM reasoning, researchers feed them data that inherently contains structured logic. Training on computer code was an early breakthrough, as it teaches patterns of reasoning far beyond coding itself. Textbooks are another key source for building smaller, effective models.

Best of the Pod: Reid Hoffman on How AI Is Answering Our Biggest Questions

AI & I·2 months ago