The idea of separating "fact learning" from "skill learning" is a false dichotomy. Models need a base of internalized facts to reason effectively. The key is developing intelligence to compress what's important and discard what isn't, much like lossy human memory.
A key challenge in AI development is creating constraints on memory. Unlike humans, who naturally filter for relevance, AI systems that retain all information get overwhelmed by noise. Building an effective "forgetting" mechanism is crucial if AI is to determine salience and avoid making faulty connections based on irrelevant data.
LLMs learn two things from pre-training: factual knowledge and intelligent algorithms (the "cognitive core"). Karpathy argues the vast memorized knowledge is a hindrance, making models rely on memory instead of reasoning. The goal should be to strip away this knowledge to create a pure, problem-solving cognitive entity.
Implementing effective long-term memory for AI agents is a major unsolved problem. The difficulty is not in storing information, but in automatically generating useful memories from interactions and accurately retrieving the correct, context-specific memory without cluttering the prompt with irrelevant information.
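The retrieval half of this problem can be illustrated with a minimal sketch. It assumes a toy bag-of-words cosine similarity in place of a learned embedding model, and the names `retrieve`, `k`, and `min_score` are hypothetical. The point is only that a relevance floor plus a small top-k keeps unrelated memories out of the prompt; it does not address how memories are generated in the first place.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, memories: list[str], k: int = 2, min_score: float = 0.3) -> list[str]:
    """Return at most k memories whose similarity to the query clears min_score.

    Assumption: word overlap stands in for semantic similarity. A real agent
    would likely use embeddings, but the filtering logic would look similar.
    """
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(m.lower().split())), m) for m in memories]
    # Drop anything below the relevance floor so it never reaches the prompt.
    relevant = [(s, m) for s, m in scored if s >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:k]]


memories = [
    "user prefers dark mode in the editor",
    "user lives in berlin",
    "deploy script failed on friday",
]
print(retrieve("which editor mode does the user prefer", memories))
```

With the threshold in place, a query that matches nothing returns an empty list rather than the least-bad memories, which is the behavior that avoids prompt clutter.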
The "Omniscience" accuracy benchmark, which measures pure factual knowledge, correlates more closely with a model's total parameter count than any other metric does. This suggests embedded knowledge is a direct function of model size, distinct from the reasoning abilities developed via training techniques.
Contrary to the belief that memorization requires multiple training epochs, large language models demonstrate the capacity to perfectly recall specific information after seeing it only once. This surprising phenomenon highlights how understudied the information theory behind LLMs still is.
Google's Titans architecture for LLMs mimics human memory by applying Claude Shannon's information theory. It scans vast data streams and identifies "surprise": information that is statistically unexpected or rare relative to its training data. This novel data is then prioritized for long-term memory, which keeps irrelevant information from cluttering it.
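To make the notion of "surprise" concrete, here is a toy sketch of Shannon surprisal, -log2 p(x), used as a filter for what gets remembered. It is not the Titans mechanism, whose details are not given here. It assumes a smoothed unigram word model built from a small background corpus, and the function names and the threshold are hypothetical.

```python
import math
from collections import Counter


def surprisal_bits(token: str, counts: Counter, total: int, vocab_size: int) -> float:
    """Shannon surprisal -log2 p(token) under an add-one-smoothed unigram model."""
    p = (counts[token] + 1) / (total + vocab_size)
    return -math.log2(p)


def select_surprising(stream, background, threshold_bits):
    """Return (item, mean surprisal) for stream items that exceed the threshold.

    Assumption: 'training data' is approximated by word counts over `background`.
    """
    counts = Counter(tok for doc in background for tok in doc.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens
    kept = []
    for item in stream:
        tokens = item.split()
        if not tokens:
            continue
        mean = sum(
            surprisal_bits(t, counts, total, vocab_size) for t in tokens
        ) / len(tokens)
        if mean > threshold_bits:
            kept.append((item, mean))
    return kept


background = ["the cat sat on the mat", "the dog sat on the rug"]
stream = ["the cat sat on the mat", "quantum flux detected"]
for item, bits in select_surprising(stream, background, threshold_bits=3.5):
    print(f"{bits:.2f} bits/token: {item}")
```

Familiar text scores low because its words are frequent in the background corpus, so only the unfamiliar sentence is kept for long-term storage.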
The "memory" feature in today's LLMs is a convenience that saves users from re-pasting context. It is far from human memory, which abstracts concepts and builds pattern recognition. The true unlock will be when AI develops intuitive judgment from past "experiences" and data, a much longer-term challenge.
Unlike humans, whose poor memory forces them to generalize and find patterns, LLMs are incredibly good at memorization. Karpathy argues this is a flaw: it pulls them toward recalling specific training documents rather than focusing on the underlying, generalizable algorithms of thought, which hinders true understanding.
Research shows it's possible to distinguish the model weights used for memorizing facts from those used for general reasoning, and to remove the former. Surprisingly, pruning these memorization weights can improve a model's performance on some reasoning tasks, suggesting a path toward creating more efficient, focused AI reasoners.
Mem0 employs a two-phase process for agent memory. It first extracts atomic facts solely from human-computer dialogue, ignoring verbose tool outputs. A separate LLM call then compares these new facts to existing memories and decides whether to add, update, or ignore each one, which prevents redundant or contradictory storage and minimizes token usage.
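The two-phase process above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual API: both LLM calls are replaced by simple deterministic stand-ins, facts are modeled as `key: value` pairs, and `MemoryStore`, `extract_facts`, and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    facts: dict = field(default_factory=dict)  # fact key -> current value


def extract_facts(turns):
    """Phase 1 stand-in: read only dialogue turns and skip tool output.

    Assumption: a real system would ask an LLM for atomic facts; here any
    'key: value' line from the user or assistant counts as one fact.
    """
    facts = []
    for role, text in turns:
        if role == "tool":
            continue  # verbose tool output never reaches extraction
        if ":" in text:
            key, value = text.split(":", 1)
            facts.append((key.strip().lower(), value.strip()))
    return facts


def reconcile(store, facts):
    """Phase 2 stand-in: compare each candidate with stored memories."""
    decisions = []
    for key, value in facts:
        if key not in store.facts:
            store.facts[key] = value
            decisions.append(("ADD", key))
        elif store.facts[key] != value:
            store.facts[key] = value  # newer statement replaces the old one
            decisions.append(("UPDATE", key))
        else:
            decisions.append(("IGNORE", key))  # already known, store nothing
    return decisions


store = MemoryStore({"city": "Paris"})
turns = [
    ("user", "City: Berlin"),
    ("tool", "status: 200 OK, 4812 rows returned"),
    ("assistant", "Diet: vegetarian"),
    ("user", "Diet: vegetarian"),
]
print(reconcile(store, extract_facts(turns)))
print(store.facts)
```

Keeping extraction and reconciliation separate means the second step only ever sees short candidate facts, not the full transcript, which is where the token savings would come from.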