While frontier labs aim for a single, universally intelligent model, Engram believes the value lies in specialized models that learn private, conflicting, or ambiguous user-specific data: information that is difficult to fold into a single, massive model.
Continuously training a model on private data internalizes its concepts, reducing the need for massive context windows and system prompts. This dramatically cuts token consumption at inference time compared to RAG-based approaches, which re-read the same documents on every query.
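The token savings can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative only: the workload (10,000 queries per day, 50-token questions, 4,000 tokens of retrieved context per RAG query) is entirely hypothetical, and it deliberately ignores the periodic training cost of internalizing the data, which a real comparison would have to amortize.

```python
def tokens_per_day(queries: int, query_tokens: int, context_tokens: int) -> int:
    """Input tokens processed per day when every query carries `context_tokens` of extra context."""
    return queries * (query_tokens + context_tokens)

# Hypothetical workload, chosen only for illustration.
QUERIES = 10_000
QUERY_TOKENS = 50

# RAG: each query re-reads ~4,000 tokens of retrieved documents (assumed figure).
rag = tokens_per_day(QUERIES, QUERY_TOKENS, context_tokens=4_000)
# Internalized: the facts live in the weights, so no retrieved context is sent.
internalized = tokens_per_day(QUERIES, QUERY_TOKENS, context_tokens=0)

print(f"RAG:          {rag:,} tokens/day")
print(f"Internalized: {internalized:,} tokens/day")
print(f"Ratio:        {rag / internalized:.0f}x")
```

Under these assumptions the ratio is simply (query + context) / query, so the savings grow with how much context RAG has to re-read per question.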
Traditional computer science separates algorithms (processing) from databases (knowledge). Deep learning merges the two into model weights. The new challenge is not to re-separate them but to manage this fusion by deciding which facts should be periodically mixed into the model.
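One way to picture "deciding which facts to mix in" is as a data-selection step followed by a mixed training batch. The sketch below is a hypothetical illustration, not Engram's method: `select_facts` uses an assumed criterion (how often a fact was mentioned), and `build_mixed_batch` uses replay (rehearsal), a standard continual-learning technique in which fresh data is mixed with older data to reduce forgetting.

```python
import random

def select_facts(candidates: dict[str, int], min_mentions: int) -> list[str]:
    """Keep facts mentioned at least `min_mentions` times (hypothetical selection criterion)."""
    return [fact for fact, mentions in candidates.items() if mentions >= min_mentions]

def build_mixed_batch(new_facts: list[str], replay_pool: list[str],
                      batch_size: int, new_fraction: float,
                      rng: random.Random) -> list[str]:
    """Mix newly selected facts with replayed older examples in one shuffled batch."""
    # Cap the new share at however many new facts actually exist.
    n_new = min(len(new_facts), round(batch_size * new_fraction))
    n_replay = batch_size - n_new
    # New facts are drawn without replacement; replay data is drawn with replacement.
    batch = rng.sample(new_facts, n_new) + rng.choices(replay_pool, k=n_replay)
    rng.shuffle(batch)
    return batch

candidates = {"fact A": 5, "fact B": 1, "fact C": 3}
new = select_facts(candidates, min_mentions=3)
replay = ["old 1", "old 2", "old 3", "old 4"]
batch = build_mixed_batch(new, replay, batch_size=8, new_fraction=0.25, rng=random.Random(0))
print(new, batch)
```

The `new_fraction` knob is where the trade-off lives: a higher value bakes in new facts faster, while a lower value spends more of each batch rehearsing what the model already knows.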
Instead of only learning at test time, models should have a phase in which they retreat from live interaction to deeply integrate new information. This 'dreaming' phase lets them experiment with their affordances and with what they know, analogous to how humans consolidate memories.
RAG systems are limited to direct retrieval and can't make spontaneous, abstract connections. This human-like ability to notice related but unasked-for concepts can only emerge from knowledge internalized in the model weights, which then function as an associative memory.
The bottleneck for AI is not raw intelligence but understanding new context. This requires models that continuously learn from new data and interactions, moving beyond the static pre-train/fine-tune paradigm and deeply baking new information into the model weights.
The idea of separating "fact learning" from "skill learning" is a false dichotomy. Models need a base of internalized facts to reason effectively. The key is developing intelligence to compress what's important and discard what isn't, much like lossy human memory.
A KV cache for a single Wikipedia article can consume 80GB of HBM (GPU high-bandwidth memory), while a 70B-parameter model storing much of the internet's knowledge is only slightly larger (about 100GB). This highlights the inefficiency of context-window memory and the benefit of compressing that knowledge into model weights.
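The comparison rests on two standard size formulas: the KV cache stores one key and one value vector per layer, per KV head, per token, while weights cost parameters times bytes per parameter. The sketch below assumes a Llama-2-70B-like shape (80 layers, head dimension 128, 64 query heads, 8 KV heads) and 16-bit values; the exact numbers behind the 80GB and 100GB figures above are not stated, so this only shows how such figures arise.

```python
GB = 10**9

def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size: factor 2 covers one key and one value vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the model weights at a given precision."""
    return n_params * bytes_per_param

# Assumed Llama-2-70B-like shape, 16-bit values.
per_token_mha = kv_cache_bytes(1, n_layers=80, n_kv_heads=64, head_dim=128)  # every head keeps its own K/V
per_token_gqa = kv_cache_bytes(1, n_layers=80, n_kv_heads=8, head_dim=128)   # grouped-query attention shares K/V

tokens_in_80gb_mha = 80 * GB // per_token_mha
tokens_in_80gb_gqa = 80 * GB // per_token_gqa

weights_16bit_gb = weight_bytes(70 * 10**9, 2) // GB
weights_8bit_gb = weight_bytes(70 * 10**9, 1) // GB

print(per_token_mha, per_token_gqa)            # bytes of KV cache per token
print(tokens_in_80gb_mha, tokens_in_80gb_gqa)  # tokens that fit in 80 GB
print(weights_16bit_gb, weights_8bit_gb)       # GB of weights at 16-bit and 8-bit
```

Under these assumptions, 80GB holds roughly 30,000 tokens with full multi-head attention, or about eight times more with grouped-query attention, while the 70B weights take 140GB at 16-bit and 70GB at 8-bit. A figure like 100GB therefore depends on the precision used.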
To win a math competition, an AI lab wouldn't just build a RAG system over textbooks. It would synthesize training data and launch a training job. This 'magic of training' integrates concepts more deeply than retrieval does, a principle that applies beyond frontier models.
