LLMs learn two things from pre-training: factual knowledge and intelligent algorithms (the "cognitive core"). Karpathy argues the vast memorized knowledge is a hindrance, making models rely on memory instead of reasoning. The goal should be to strip away this knowledge to create a pure, problem-solving cognitive entity.

Related Insights

A core debate in AI is whether LLMs, which are text prediction engines, can achieve true intelligence. Critics argue they cannot because they lack a model of the real world. This prevents them from making meaningful, context-aware predictions about future events—a limitation that more data alone may not solve.

LLMs shine when acting as a 'knowledge extruder'—shaping well-documented, 'in-distribution' concepts into specific code. They fail when the core task is novel problem-solving where deep thinking, not code generation, is the bottleneck. In these cases, the code is the easy part.

Karpathy identifies a key missing piece for continual learning in AI: an equivalent to sleep. Humans seem to use sleep to distill the day's experiences (their "context window") into the compressed weights of the brain. LLMs lack this distillation phase, forcing them to restart from a fixed state in every new session.

An LLM shouldn't do math internally any more than a human would. The most intelligent AI systems will be those that know when to call specialized, reliable tools—like a Python interpreter or a search API—instead of attempting to internalize every capability from first principles.

AI expert Andrej Karpathy suggests treating LLMs as simulators, not entities. Instead of asking, "What do you think?", ask, "What would a group of [relevant experts] say?". This elicits a wider range of simulated perspectives and avoids the biases inherent in forcing the LLM to adopt a single, artificial persona.

Current AI models resemble a student who grinds 10,000 hours on a narrow task. They achieve superhuman performance on benchmarks but lack the broad, adaptable intelligence of someone with less specific training but better general reasoning. This explains the gap between eval scores and real-world utility.

Karpathy claims that despite their ability to pass advanced exams, LLMs cognitively resemble "savant kids." They possess vast, perfect memory and can produce impressive outputs, but they lack the deeper understanding and cognitive maturity to create their own culture or truly grasp what they are doing. They are not yet adult minds.

Overloading LLMs with excessive context degrades performance, a phenomenon known as 'context rot'. Claude Skills address this by loading context only when relevant to a specific task. This laser-focused approach improves accuracy and avoids the performance degradation seen in broader project-level contexts.

Unlike humans, whose poor memory forces them to generalize and find patterns, LLMs are incredibly good at memorization. Karpathy argues this is a flaw. It distracts them with recalling specific training documents instead of focusing on the underlying, generalizable algorithms of thought, hindering true understanding.

To improve LLM reasoning, researchers feed them data that inherently contains structured logic. Training on computer code was an early breakthrough, as it teaches patterns of reasoning far beyond coding itself. Textbooks are another key source for building smaller, effective models.