Agentic workflows involving tool use or human-in-the-loop steps break the simple request-response model. The serving system no longer knows when a "conversation" is truly over, which creates an open cache-invalidation problem: state such as the KV cache might need to be preserved for seconds, minutes, or hours, disrupting established memory-management patterns.
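To make the lifetime problem concrete, here is a minimal sketch assuming a hypothetical server-side `SessionKVCache` keyed by session ID (all names are illustrative, not any real inference server): with only a TTL heuristic to go on, any fixed expiry either wastes memory on idle sessions or forces an expensive prefix recompute when a slow tool call or human reply finally arrives.

```python
# Minimal sketch of the lifetime problem: a per-session cache of KV state with
# only a TTL heuristic for eviction. Hypothetical names; real servers manage
# KV blocks in GPU memory, but the dilemma is the same.
import time

class SessionKVCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}  # session_id -> (last_used, kv_state)

    def put(self, session_id: str, kv_state: object) -> None:
        self._store[session_id] = (time.monotonic(), kv_state)

    def get(self, session_id: str):
        entry = self._store.get(session_id)
        if entry is None:
            return None  # cache miss: the prompt prefix must be recomputed
        self._store[session_id] = (time.monotonic(), entry[1])  # refresh on use
        return entry[1]

    def evict_expired(self) -> None:
        # The core problem: a tool call or human approval might take seconds
        # or hours, so any fixed TTL either wastes memory or forces recompute.
        now = time.monotonic()
        for sid, (last_used, _) in list(self._store.items()):
            if now - last_used > self.ttl:
                del self._store[sid]
```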

Related Insights

An AI agent uses an LLM with tools, giving it agency to decide its next action. In contrast, a workflow is a predefined, deterministic path in which each of the LLM's actions is prescribed in advance. Most production AI systems are actually workflows, not true agents.
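A rough sketch of the distinction, using a hypothetical `call_llm` stub and a toy `TOOLS` registry rather than any real API: in the agent, the model chooses the next step on every turn; in the workflow, the path is fixed in code and the model only fills in each step.

```python
# Hedged sketch contrasting the two shapes. `call_llm` and `TOOLS` are
# hypothetical stand-ins, not a specific vendor API.
def call_llm(messages, tools=None):
    """Placeholder: returns either a tool request or a final answer."""
    return {"type": "final", "content": "stub answer"}

TOOLS = {"search": lambda q: f"results for {q}"}

def run_agent(task: str, max_steps: int = 10) -> str:
    # Agent: the model decides the next action on every turn.
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages, tools=list(TOOLS))
        if reply["type"] == "final":
            return reply["content"]
        # Otherwise the model asked for a tool; run it and feed the result back.
        result = TOOLS[reply["name"]](reply["arguments"])
        messages.append({"role": "tool", "content": result})
    return "step limit reached"

def run_workflow(task: str) -> str:
    # Workflow: the path is fixed in code; the LLM only fills in each step.
    findings = TOOLS["search"](task)
    draft = call_llm([{"role": "user", "content": f"Summarize: {findings}"}])
    return draft["content"]
```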

Purely agentic systems can be unpredictable. A hybrid approach, like OpenAI's Deep Research forcing a clarifying question, inserts a deterministic workflow step (a "speed bump") before unleashing the agent. This mitigates risk, reduces errors, and confirms alignment with the user's intent before costly computation begins.
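A minimal sketch of the speed-bump pattern, with `call_llm` and `run_agent` as hypothetical placeholders (this is not OpenAI's actual implementation): the clarifying question is a hard-coded step that always runs before any expensive agentic work starts.

```python
# Sketch of a deterministic "speed bump" before the agent loop.
# All helpers are hypothetical stubs.
def call_llm(prompt: str) -> str:
    return "Which time period should the research cover?"  # stub

def run_agent(task: str) -> str:
    return f"(long agentic run on: {task})"  # stub for the expensive part

def deep_research(task: str) -> str:
    # Forced workflow step: ask one clarifying question, no matter what.
    question = call_llm(f"Ask one clarifying question about: {task}")
    clarification = input(question + " ")
    # Only after intent is confirmed does the costly agent loop begin.
    return run_agent(f"{task}\nClarification: {clarification}")
```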

When building Spiral, the builders found that a single large language model trying to both interview the user and write content failed due to "context rot." The solution was a multi-agent system in which an "interviewer" agent hands off the full context to a separate "writer" agent, improving both performance and reliability.
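The handoff pattern might look roughly like the following sketch, where `call_llm` is a hypothetical stand-in for a chat API and the code is illustrative rather than Spiral's own: the interviewer only collects a transcript, and the writer receives that full transcript plus a fresh, single-purpose instruction.

```python
# Hedged sketch of an interviewer -> writer handoff; not Spiral's actual code.
def call_llm(system: str, messages: list[dict]) -> str:
    return "stub reply"  # placeholder for any chat API

def interview(topic: str, turns: int = 3) -> list[dict]:
    # Agent 1: only asks questions and collects answers.
    transcript = []
    for _ in range(turns):
        question = call_llm("You are an interviewer. Ask one question.",
                            transcript + [{"role": "user", "content": topic}])
        answer = input(question + " ")
        transcript += [{"role": "assistant", "content": question},
                       {"role": "user", "content": answer}]
    return transcript

def write(topic: str, transcript: list[dict]) -> str:
    # Agent 2: receives the full interview context but a fresh instruction,
    # so drafting is not competing with interviewing inside one prompt.
    return call_llm("You are a writer. Draft a piece from this interview.",
                    transcript + [{"role": "user", "content": f"Topic: {topic}"}])
```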

Long-running AI agent conversations degrade in quality as the context window fills. The best engineers combat this with "intentional compaction": they direct the agent to summarize its progress into a clean markdown file, then start a fresh session using that summary as the new, clean input. This is like rebooting the agent's short-term memory.
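A minimal sketch of intentional compaction, assuming a hypothetical `call_llm` helper: the summary is written to a markdown file, and the next session starts from that file instead of the bloated history.

```python
# Sketch of intentional compaction; the markdown file is the durable handoff
# between sessions. `call_llm` is a hypothetical placeholder.
from pathlib import Path

def call_llm(messages: list[dict]) -> str:
    return "## Progress\n- stub summary"  # placeholder for any chat API

def compact(history: list[dict], path: str = "progress.md") -> str:
    # Ask the model to distill everything done so far into clean markdown.
    summary = call_llm(history + [{
        "role": "user",
        "content": "Summarize all progress, open questions, and next steps as markdown.",
    }])
    Path(path).write_text(summary)
    return summary

def fresh_session(path: str = "progress.md") -> list[dict]:
    # The new session starts from the compact summary, not the full history.
    return [{"role": "user", "content": f"Continue this work:\n\n{Path(path).read_text()}"}]
```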

Contrary to the idea that infrastructure problems get commoditized, AI inference is growing more complex. This is driven by three factors: (1) increasing model scale (multi-trillion parameters), (2) greater diversity in model architectures and hardware, and (3) the shift to agentic systems that require managing long-lived, unpredictable state.

OpenAI identifies agent evaluation as a key challenge. While they can currently grade an entire task's trace, the real difficulty lies in evaluating and optimizing the individual steps within a long, complex agentic workflow. This is a work-in-progress area critical for building reliable, production-grade agents.
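One way to picture the gap, using a hypothetical LLM-as-judge `grade` function: grading the whole trace is a single end-to-end judgment, while grading steps means asking, for each action, whether it was the right move given only what the agent knew at that point.

```python
# Sketch of trace-level vs. step-level evaluation. `grade` is a hypothetical
# LLM-as-judge call returning a score between 0 and 1.
def grade(prompt: str) -> float:
    return 0.5  # placeholder judge

def grade_trace(task: str, steps: list[dict]) -> float:
    # Feasible today: one end-to-end judgment of the final outcome.
    transcript = "\n".join(f"{s['action']}: {s['result']}" for s in steps)
    return grade(f"Task: {task}\nTrace:\n{transcript}\nWas the task completed correctly?")

def grade_steps(task: str, steps: list[dict]) -> list[float]:
    # Harder: judge each action given only what the agent knew at that point,
    # so individual weak steps can be found and optimized.
    scores = []
    for i, step in enumerate(steps):
        context = "\n".join(f"{s['action']}: {s['result']}" for s in steps[:i])
        scores.append(grade(f"Task: {task}\nSo far:\n{context}\n"
                            f"Next action: {step['action']}\nWas this a good step?"))
    return scores
```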

"Context Engineering" is the critical practice of managing information fed to an LLM, especially in multi-step agents. This includes techniques like context compaction, using sub-agents, and managing memory. Harrison Chase considers this discipline more crucial than prompt engineering for building sophisticated agents.

While AI models excel at gathering and synthesizing information ('knowing'), they are not yet reliable at executing actions in the real world ('doing'). True agentic systems require bridging this gap by adding crucial layers of validation and human intervention to ensure tasks are performed correctly and safely.
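A hedged sketch of what such a layer can look like, with every helper (`propose_action`, `validate`, `human_approves`) as a hypothetical placeholder: the model proposes an action, and deterministic checks plus a human gate decide whether anything actually executes.

```python
# Sketch of a "doing" gate: validate a proposed action and escalate to a human
# before executing. All helpers are hypothetical placeholders.
def propose_action(task: str) -> dict:
    return {"tool": "send_email", "args": {"to": "alice@example.com", "body": "stub"}}

def validate(action: dict) -> bool:
    # Deterministic checks the model cannot skip (allow-lists, schemas, limits).
    return action["tool"] in {"send_email", "create_ticket"}

def human_approves(action: dict) -> bool:
    return input(f"Execute {action['tool']} with {action['args']}? [y/N] ").lower() == "y"

def execute(task: str) -> str:
    action = propose_action(task)
    if not validate(action):
        return "rejected: failed validation"
    if not human_approves(action):
        return "rejected: human declined"
    return f"executed {action['tool']}"  # the real side effect would happen here
```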

The simple "tool calling in a loop" model for agents is deceptively simple. Without context management, token-heavy tool calls quickly accumulate, leading to high costs ($1-2 per run), exhausted context windows, and the performance degradation known as "context rot."
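A back-of-the-envelope sketch of why the naive loop gets expensive, under rough illustrative assumptions (4 characters per token, 8 kB tool outputs, $3 per million input tokens): because the whole history is re-sent on every turn, input tokens grow roughly quadratically with the number of steps.

```python
# Sketch of the cost dynamics of the naive loop: every tool result is appended
# and re-sent each turn. The token and price figures are rough assumptions.
def estimate_tokens(messages: list[dict]) -> int:
    return sum(len(m["content"]) for m in messages) // 4  # crude 4-chars/token rule

def naive_agent_cost(steps: int, tool_output_chars: int = 8_000,
                     usd_per_million_input_tokens: float = 3.0) -> float:
    messages = [{"role": "user", "content": "do the task"}]
    total_input_tokens = 0
    for _ in range(steps):
        total_input_tokens += estimate_tokens(messages)  # whole history re-sent
        messages.append({"role": "tool", "content": "x" * tool_output_chars})
    return total_input_tokens * usd_per_million_input_tokens / 1_000_000

# 25 steps of ~8 kB tool outputs already costs about $1.80 in input tokens
# alone under these assumptions, in the same range as the $1-2 figure above.
print(f"${naive_agent_cost(25):.2f}")
```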

To make agents useful over long periods, Tasklet engineers an "illusion" of infinite memory. Instead of feeding a long chat history, they use advanced context engineering: LLM-based compaction, scoping context for sub-agents, and having the LLM manage its own state in a SQL database to recall relevant information efficiently.
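A minimal sketch of the SQL-backed memory idea, illustrative rather than Tasklet's implementation: the model is given `remember` and `recall` tools backed by SQLite, and only small query results ever re-enter the context window.

```python
# Sketch of LLM-managed state in SQL: the model persists facts via tools and
# pulls back only what it needs, instead of carrying the full chat history.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (topic TEXT, fact TEXT)")

def remember(topic: str, fact: str) -> str:
    """Tool the model can call to persist a fact outside the context window."""
    conn.execute("INSERT INTO memory VALUES (?, ?)", (topic, fact))
    conn.commit()
    return "stored"

def recall(topic: str) -> str:
    """Tool the model can call to retrieve only the facts relevant right now."""
    rows = conn.execute("SELECT fact FROM memory WHERE topic = ?", (topic,)).fetchall()
    return "\n".join(fact for (fact,) in rows) or "nothing stored"

remember("customer", "Prefers weekly email updates")
print(recall("customer"))  # only this small result re-enters the prompt
```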