Despite the massive context windows in new models, AI agents still suffer from a form of "memory leak": accuracy degrades and irrelevant information from past interactions bleeds into current tasks. Power users manually delete old conversations to maintain performance, which suggests the issue is a core architectural challenge, not just a matter of context size.
Karpathy identifies a key missing piece for continual learning in AI: an equivalent to sleep. Humans seem to use sleep to distill the day's experiences (their "context window") into the compressed weights of the brain. LLMs lack this distillation phase, forcing them to restart from a fixed state in every new session.
When an AI's context window is nearly full, don't rely on its automatic compaction feature. Instead, proactively instruct the AI to summarize the current project state into a "process notes" file, then clear the context and have it read the summary to avoid losing key details.
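One way to make the "proactively" part concrete is to track a rough estimate of context usage and trigger the hand-off before the tool's own compaction kicks in. The sketch below is illustrative only: the four-characters-per-token ratio, the 80% threshold, and the names `estimate_tokens`, `should_compact`, and `PROCESS_NOTES_PROMPT` are all assumptions, not part of any particular product.

```python
from typing import List

# Assumption: roughly 4 characters per token. This is a crude heuristic,
# not a real tokenizer; swap in the model's tokenizer if one is available.
CHARS_PER_TOKEN = 4

# Hypothetical wording for the instruction sent to the assistant.
PROCESS_NOTES_PROMPT = (
    "Summarize the current project state into a process notes file: "
    "goals, decisions made, open questions, and next steps."
)


def estimate_tokens(text: str) -> int:
    """Return a rough token count for a piece of text."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def should_compact(messages: List[str], window_tokens: int,
                   threshold: float = 0.8) -> bool:
    """Return True once estimated usage reaches the chosen share of the window."""
    used = sum(estimate_tokens(m) for m in messages)
    return used >= threshold * window_tokens
```

When `should_compact` returns `True`, the idea is to send `PROCESS_NOTES_PROMPT`, clear the context, and have the assistant read the notes file back in. The threshold is a judgment call; a lower value leaves more room for the summary itself.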
When an AI model gives nonsensical responses after a long conversation, its context window is likely full. Instead of trying to correct it, reset the context. For prototypes, fork the design to start a new session. For chats, ask the AI to summarize the conversation, then start a new chat with that summary.
Even models with million-token context windows suffer from "context rot" when overloaded with information. Performance degrades as the model struggles to find the signal in the noise. Effective context engineering requires precision, packing the window with only the exact data needed.
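As a rough illustration of packing the window with only what is needed, the sketch below ranks candidate snippets against the task and keeps the most relevant ones that fit a budget. Everything here is an assumption made for the example: the keyword-overlap score is a naive stand-in for real retrieval, word counts stand in for token counts, and `score` and `pack_context` are hypothetical names.

```python
from typing import List


def score(query: str, snippet: str) -> int:
    """Naive relevance: number of distinct query words that appear in the snippet."""
    query_words = set(query.lower().split())
    return len(query_words & set(snippet.lower().split()))


def pack_context(query: str, snippets: List[str], budget_tokens: int) -> List[str]:
    """Greedily keep the most relevant snippets that fit within the budget."""
    ranked = sorted(snippets, key=lambda s: score(query, s), reverse=True)
    packed: List[str] = []
    used = 0
    for snippet in ranked:
        if score(query, snippet) == 0:
            break  # everything from here on is irrelevant to the query
        cost = len(snippet.split())  # assumption: one word ~ one token
        if used + cost <= budget_tokens:
            packed.append(snippet)
            used += cost
    return packed
```

The important design choice is that irrelevant snippets are dropped even when there is room for them: spare capacity is left empty rather than filled with noise.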
Long, continuous AI chat threads degrade output quality as the context window fills up, making it harder for the model to recall early details. To maintain high-quality results, treat each discrete feature or task as a new chat, ensuring the agent has a clean, focused context for each job.
Long-running AI agent conversations degrade in quality as the context window fills. The best engineers combat this with "intentional compaction": they direct the agent to summarize its progress into a clean markdown file, then start a fresh session using that summary as the new, clean input. This is like rebooting the agent's short-term memory.
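The round trip through a markdown file might look something like the sketch below. It is a minimal illustration under stated assumptions: `compact_and_restart` and `placeholder_summarize` are hypothetical names, and the placeholder summarizer simply keeps lines tagged `DECISION:` where a real workflow would ask the agent itself to write the summary.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def compact_and_restart(history: List[str],
                        summarize: Callable[[List[str]], str],
                        notes_path: Path) -> List[str]:
    """Write a summary of the session to a markdown file, then return the
    opening message list for a fresh session seeded only with that file."""
    notes_path.write_text(summarize(history), encoding="utf-8")
    notes = notes_path.read_text(encoding="utf-8")
    return ["Read these notes before continuing:\n\n" + notes]


def placeholder_summarize(history: List[str]) -> str:
    """Stand-in summarizer (assumption): keep only lines marked as decisions."""
    kept = [line for line in history if line.startswith("DECISION:")]
    bullets = "\n".join("- " + line for line in kept)
    return "# Progress notes\n\n" + bullets + "\n"


if __name__ == "__main__":
    old_session = ["hello", "DECISION: use SQLite for storage", "some chatter"]
    notes_file = Path(tempfile.mkdtemp()) / "progress.md"
    new_session = compact_and_restart(old_session, placeholder_summarize, notes_file)
    print(new_session[0])
```

Keeping the summary in a file rather than only in the chat means it can be reviewed and corrected by hand before the new session starts.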
Long conversations degrade LLM performance as attention gets clogged with irrelevant details. An expert workflow is to stop, ask the model to summarize the key points of the discussion, and then start a fresh chat with that summary as the initial prompt. This keeps the context clean and the model on track.
Simply having a large context window is insufficient. Models may fail to "see" or recall specific facts embedded deep within the context, a phenomenon exposed by "needle in a haystack" evaluations. The ability to reason effectively across the entire window is a separate, critical factor.
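The basic shape of such an evaluation can be sketched in a few lines: plant one distinctive fact at different depths in filler text and check whether the model retrieves it. This is a simplified illustration, not a reproduction of any published benchmark. The function names are hypothetical, and `forgetful_model` is a toy stand-in that only "sees" the last 500 characters, included so the harness runs without a real model.

```python
from typing import Callable, Dict, List

FILLER = "The sky was grey that day."


def build_haystack(needle: str, n_sentences: int, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def run_needle_eval(ask_model: Callable[[str, str], str], needle: str,
                    question: str, expected: str, depths: List[float],
                    n_sentences: int = 200) -> Dict[float, bool]:
    """Map each depth to whether the model's answer contains the expected fact."""
    results: Dict[float, bool] = {}
    for depth in depths:
        context = build_haystack(needle, n_sentences, depth)
        results[depth] = expected in ask_model(context, question)
    return results


def forgetful_model(context: str, question: str) -> str:
    """Toy stand-in (assumption): only the last 500 characters are visible."""
    visible = context[-500:]
    for sentence in visible.split(". "):
        if "secret code" in sentence:
            return sentence
    return "I don't know."
```

With the toy model, a needle placed at the start or middle of the context is missed and one placed at the end is found. A real evaluation would replace `forgetful_model` with a call to the model under test and sweep both depth and context length.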
Even with large advertised context windows, LLMs show performance degradation and strange behaviors when overloaded. In a pattern described as "context anxiety," they may prematurely give up on complex tasks, claim imaginary time constraints, or oversimplify the problem, which highlights the gap between advertised and effective context sizes.
To make agents useful over long periods, Tasklet creates an "illusion" of infinite memory. Instead of feeding the model a long chat history, it relies on advanced context engineering: LLM-based compaction, scoped context for sub-agents, and letting the LLM manage its own state in a SQL database so it can recall relevant information efficiently.
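To give a feel for the SQL-backed state idea, here is a minimal sketch using Python's built-in `sqlite3` module. It is not Tasklet's implementation, whose details are not described here. The `memory` table, its columns, and the `open_memory`, `remember`, and `recall` helpers are all hypothetical, standing in for tools an agent could call to store and look up facts instead of carrying them in its context.

```python
import sqlite3
from typing import List, Tuple


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the agent's state store. Schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        " key TEXT PRIMARY KEY,"
        " value TEXT NOT NULL,"
        " updated_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def remember(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Store a fact, overwriting any earlier value saved under the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)", (key, value)
    )
    conn.commit()


def recall(conn: sqlite3.Connection, pattern: str) -> List[Tuple[str, str]]:
    """Fetch only the facts whose key matches a SQL LIKE pattern."""
    cursor = conn.execute(
        "SELECT key, value FROM memory WHERE key LIKE ? ORDER BY key", (pattern,)
    )
    return cursor.fetchall()
```

For example, after `remember(conn, "user.timezone", "UTC+1")`, a later session could call `recall(conn, "user.%")` to pull just the user-related facts into context, leaving everything else in the database.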