Memory and Continual Learning: Engram's Dan Biderman and Jessy Lin

Training Data · Jun 24, 2026

Engram's founders discuss moving beyond pre-training to models that are always learning, internalizing context to create personalized AI memory.

Engram Bets on a Future of Many Personalized AI Models, Not One Giant AGI

While frontier labs aim for a single, universally intelligent model, Engram believes the value lies in specialized models that learn from private, conflicting, or ambiguous user-specific data, which is difficult to incorporate into a single, massive model.

Internalizing Knowledge Into Model Weights Can Reduce Inference Costs Up to 100x

Continuously training a model on private data internalizes concepts, reducing the need for massive context windows and system prompts. Inference then consumes far fewer tokens than RAG-based approaches, which re-read the same documents on every query.
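
The arithmetic behind a figure like 100x can be sketched as follows. This is a rough illustration, not Engram's own accounting: the function name and the token counts are hypothetical, and the sketch counts only input tokens per query, ignoring output tokens, prompt caching, and the one-time cost of training.

```python
def input_token_ratio(context_tokens: int, question_tokens: int) -> float:
    """Input tokens per query when context is re-read every time,
    divided by input tokens per query when only the question is sent."""
    with_context = context_tokens + question_tokens  # documents + question
    question_only = question_tokens                  # knowledge lives in the weights
    return with_context / question_only


# Hypothetical workload: 9,900 tokens of retrieved documents and system
# prompt accompany every 100-token question.
ratio = input_token_ratio(context_tokens=9_900, question_tokens=100)
print(f"Input tokens per query shrink by a factor of {ratio:.0f}")
```

The ratio grows with the amount of context attached to each query, so the saving depends heavily on how much material a workload would otherwise re-read.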

Deep Learning Fuses Algorithms and Databases, Forcing a Rethink of AI Architecture

Traditional computer science separates algorithms (processing) and databases (knowledge). Deep learning merges them into model weights. The new challenge is not to re-separate them, but to manage this fusion by deciding which facts should be periodically mixed into the model.

AI Models Need an Offline 'Dreaming' Phase to Internalize Knowledge and Test Capabilities

Instead of only learning at test time, models should have a phase to retreat from live interaction and deeply integrate new information. This 'dreaming' allows them to experiment with their affordances and what they know, analogous to how humans consolidate memories.

True AI Insight Requires Associative Memory in Weights, Not Just RAG Lookups

RAG systems are limited to direct retrieval and can't make spontaneous, abstract connections. This human-like ability to notice related but unasked-for concepts can only emerge from knowledge internalized within model weights, forming an associative memory.

Engram Argues Future AI Models Will Be 'Always Training,' Not Just Pre-Trained and Fine-Tuned

The bottleneck for AI is not raw intelligence but understanding new context. This requires models that continuously learn from new data and interactions, moving beyond the static pre-train/fine-tune paradigm and deeply baking new information into the model weights.

LLM Fact Memorization Is a Feature, Not a Bug; The Real Problem Is What to Remember

The idea of separating "fact learning" from "skill learning" is a false dichotomy. Models need a base of internalized facts to reason effectively. The key is developing intelligence to compress what's important and discard what isn't, much like lossy human memory.

A KV Cache for One Article Can Rival an Entire 70B Model’s Size, Highlighting Its Inefficiency

A KV cache for a single Wikipedia article can consume 80GB of HBM, while a 70B model storing the internet's knowledge is only slightly larger (100GB). This highlights the inefficiency of context-window memory and the benefit of compressing that knowledge into model weights.
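
How a KV cache reaches this scale can be estimated with a short calculation. The sketch below assumes a Llama-style 70B architecture (80 layers, head dimension 128, 16-bit values, and either 64 key-value heads or 8 with grouped-query attention); these parameters are illustrative assumptions and are not stated in the source.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache for one sequence.

    The leading 2 accounts for storing one key vector and one value
    vector per token, per layer, per KV head.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Assumed architecture (hypothetical values for illustration).
LAYERS, HEAD_DIM = 80, 128

per_token_full = kv_cache_bytes(1, LAYERS, kv_heads=64, head_dim=HEAD_DIM)
per_token_gqa = kv_cache_bytes(1, LAYERS, kv_heads=8, head_dim=HEAD_DIM)

budget = 80e9  # 80 GB of HBM
print(f"Full attention: {per_token_full / 2**20:.2f} MiB per token, "
      f"about {budget / per_token_full:,.0f} tokens fit in 80 GB")
print(f"Grouped-query:  {per_token_gqa / 2**20:.2f} MiB per token, "
      f"about {budget / per_token_gqa:,.0f} tokens fit in 80 GB")
```

Under these assumptions the cache grows linearly with context length, at a few hundred kilobytes to a few megabytes per token, so an 80 GB cache corresponds to a long document that fills a large context window.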

For High-Stakes Tasks Like a Math Olympiad, Training on Data Is Superior to RAG

To win a math competition, an AI lab wouldn't just build a RAG system over textbooks. They would synthesize training data and launch a training job. This 'magic of training' integrates concepts more deeply than retrieval, a principle applicable beyond just frontier models.
