The vision for Model Context Protocol (MCP) is to let AIs perform complex, multi-app tasks. However, translating a full API like Stripe's into MCP tools overwhelms current models' context windows, making them confused and ineffective. This forces developers to handcraft a small subset of tools.
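To make the scale problem concrete, here is a rough sketch that estimates how many context tokens a set of tool definitions occupies. It assumes the common heuristic of about four characters of JSON per token and uses hypothetical, uniformly sized endpoint definitions. Real tokenizers, and Stripe's actual API surface, will differ. The point is only the order of magnitude.

```python
import json

# Assumption: roughly 4 characters of serialized JSON per token.
# Real tokenizers vary; this is only an order-of-magnitude estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool_defs: list[dict]) -> int:
    """Estimate how many context tokens a list of tool definitions occupies."""
    return len(json.dumps(tool_defs)) // CHARS_PER_TOKEN


def make_tool(i: int) -> dict:
    """Build a hypothetical tool definition for endpoint number i."""
    return {
        "name": f"endpoint_{i}",
        "description": "Hypothetical endpoint with a typical one-sentence description.",
        "input_schema": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "limit": {"type": "integer"},
            },
        },
    }


full_api = [make_tool(i) for i in range(500)]  # hypothetical full API surface
handcrafted = full_api[:5]                     # small handcrafted subset

print("full API:", estimate_tokens(full_api), "tokens")
print("handcrafted:", estimate_tokens(handcrafted), "tokens")
```

Because every definition is serialized into the prompt, the cost grows linearly with the number of endpoints, before the model has done any work.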

Related Insights

Current LLMs are intelligent enough for many tasks but fail because they lack access to complete context—emails, Slack messages, past data. The next step is building products that ingest this real-world context, making it available for the model to act upon.

To avoid overwhelming an LLM's context with hundreds of tools, a dynamic MCP approach offers just three: one to list available API endpoints, one to get details on a specific endpoint, and one to execute it. This scales well but increases latency and complexity due to the multiple turns required for a single action.
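A minimal sketch of the three-tool pattern in plain Python, without any MCP SDK. The endpoint registry, the function names, and the handlers are hypothetical stand-ins; a real server would register these three functions as MCP tools and have the execute step forward to the actual API.

```python
from typing import Any

# Hypothetical registry: endpoint name -> description, parameters, handler.
ENDPOINTS: dict[str, dict[str, Any]] = {
    "create_customer": {
        "description": "Create a customer record.",
        "params": {"email": "string"},
        "handler": lambda args: {"id": "cus_1", "email": args["email"]},
    },
    "list_invoices": {
        "description": "List invoices for a customer.",
        "params": {"customer_id": "string"},
        "handler": lambda args: {"invoices": [], "customer_id": args["customer_id"]},
    },
}


def list_endpoints() -> list[str]:
    """Tool 1: return only endpoint names, keeping the response small."""
    return sorted(ENDPOINTS)


def get_endpoint_details(name: str) -> dict[str, Any]:
    """Tool 2: return the description and parameters for one endpoint."""
    spec = ENDPOINTS[name]
    return {"name": name, "description": spec["description"], "params": spec["params"]}


def execute_endpoint(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Tool 3: check that required parameters are present, then run the endpoint."""
    spec = ENDPOINTS[name]
    missing = [p for p in spec["params"] if p not in args]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return spec["handler"](args)


# One action costs three model turns: discover, inspect, execute.
names = list_endpoints()
details = get_endpoint_details("create_customer")
result = execute_endpoint("create_customer", {"email": "a@example.com"})
```

The context cost stays fixed at three tool definitions no matter how large the registry grows; the price is the extra round trips shown in the last three lines.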

MCP's primitives were not shaped by current model limitations. Instead, the protocol was designed with the expectation that models would improve exponentially. For example, "progressive discovery" was built in, anticipating that models could be trained to fetch context on demand, which would solve future context-bloat problems.

Although models now advertise million-token context windows, Blitzy's CEO claims that effective intelligence depreciates rapidly beyond 100k tokens due to "context pressure." This suggests that solving large-scale problems requires complex system-level orchestration, not just bigger models.
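One simple form of system-level orchestration is to keep every model call under a fixed budget and spread larger inputs across several calls. The sketch below greedily packs documents, given only their hypothetical token counts, into batches that stay under a cap. The 100k default mirrors the claim above; it is not a measured limit for any particular model.

```python
def pack_batches(doc_tokens: list[int], budget: int = 100_000) -> list[list[int]]:
    """Greedily group document indices so each batch stays within the token budget.

    A document larger than the budget ends up alone in its own batch and would
    need further splitting before being sent to a model.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for i, n in enumerate(doc_tokens):
        # Start a new batch when adding this document would exceed the budget.
        if current and used + n > budget:
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        batches.append(current)
    return batches


print(pack_batches([60_000, 30_000, 20_000, 150_000, 10_000]))
# [[0, 1], [2], [3], [4]]
```

Each batch would go to a separate model call, with an outer step combining the partial results; that outer step is where the orchestration complexity lives.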

To overcome LLM limitations, successful Model Context Protocol (MCP) design involves severe constraints: keep the number of tools low, use precise yet concise names and descriptions, minimize input parameters, and return only essential data. This handcrafted approach is necessary for models to perform reliably.
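These constraints are concrete enough to check mechanically. The sketch below lints a set of tool definitions; the `Tool` class and the thresholds (10 tools, 100-character descriptions, 3 parameters) are illustrative assumptions, not figures from the source.

```python
from dataclasses import dataclass, field

# Illustrative thresholds (assumptions, not values from the source).
MAX_TOOLS = 10
MAX_DESCRIPTION_CHARS = 100
MAX_PARAMS = 3


@dataclass
class Tool:
    """Hypothetical minimal tool definition."""
    name: str
    description: str
    params: list[str] = field(default_factory=list)


def lint_tools(tools: list[Tool]) -> list[str]:
    """Return warnings for tool sets that break the design constraints."""
    warnings: list[str] = []
    if len(tools) > MAX_TOOLS:
        warnings.append(f"{len(tools)} tools (limit {MAX_TOOLS})")
    for t in tools:
        if not t.description:
            warnings.append(f"{t.name}: missing description")
        elif len(t.description) > MAX_DESCRIPTION_CHARS:
            warnings.append(f"{t.name}: description over {MAX_DESCRIPTION_CHARS} chars")
        if len(t.params) > MAX_PARAMS:
            warnings.append(f"{t.name}: {len(t.params)} params (limit {MAX_PARAMS})")
    return warnings


tools = [
    Tool("refund_payment", "Refund a payment by ID.", ["payment_id", "amount"]),
    Tool("search", "", ["q", "a", "b", "c"]),
]
print(lint_tools(tools))
```

A check like this can run in CI so that a growing tool set is caught before it reaches the model.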

To bypass context window limits with large APIs, Stainless uses a 'dynamic mode' for its MCP servers. It provides only three tools: `list endpoints`, `get endpoint details`, and `execute endpoint`. This scales to APIs of any size but adds latency, as the model needs three separate turns to perform a single action.

Instead of having the AI make direct API calls, build Model Context Protocol (MCP) servers. They act as better guardrails for the AI, allowing it to interact with external data more effectively and even suggest novel use cases based on API documentation.
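One way to read "guardrails" here: rather than letting the model issue arbitrary API requests, the server exposes a narrow tool that checks each request against a policy before forwarding it. This is a minimal sketch; the allowed methods, the path prefix, and the `fake_api` backend are all hypothetical.

```python
from typing import Any

# Hypothetical policy: read-only access to one resource family.
ALLOWED_METHODS = {"GET"}
ALLOWED_PREFIX = "/v1/customers/"


def fake_api(method: str, path: str) -> dict[str, Any]:
    """Hypothetical stand-in for a real HTTP client."""
    return {"method": method, "path": path, "status": 200}


def guarded_request(method: str, path: str) -> dict[str, Any]:
    """Forward a request only if it matches the server's policy."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        return {"error": f"method {method} not allowed"}
    # Reject paths outside the allowed prefix, including ".." traversal tricks.
    if not path.startswith(ALLOWED_PREFIX) or ".." in path:
        return {"error": f"path {path} not allowed"}
    return fake_api(method, path)
```

Returning refusals as data rather than raising lets the model read the error and adjust its next call.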

Even models with million-token context windows suffer from "context rot" when overloaded with information. Performance degrades as the model struggles to find the signal in the noise. Effective context engineering requires precision, packing the window with only the exact data needed.
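A minimal sketch of precision context packing. It assumes each candidate snippet already carries a relevance score (for example from a retriever) and a token count; both are hypothetical inputs here. The function keeps the highest-scoring snippets that fit the budget and drops anything below a relevance floor. Greedy selection by score is a simple heuristic, not an optimal knapsack solution.

```python
def pack_context(
    snippets: list[tuple[str, float, int]],
    budget: int,
    min_score: float = 0.0,
) -> list[str]:
    """Select relevant snippets that fit the token budget.

    Each snippet is (text, relevance_score, token_count). Snippets are
    considered from highest to lowest score; any that would overflow the
    budget are skipped.
    """
    chosen: list[str] = []
    used = 0
    for text, score, tokens in sorted(snippets, key=lambda s: s[1], reverse=True):
        if score < min_score:
            break  # list is sorted, so every remaining snippet scores lower
        if used + tokens <= budget:
            chosen.append(text)
            used += tokens
    return chosen


snippets = [
    ("refund policy", 0.9, 300),
    ("old changelog", 0.2, 5_000),
    ("customer's last invoice", 0.8, 400),
    ("marketing copy", 0.1, 200),
]
print(pack_context(snippets, budget=1_000, min_score=0.5))
# ['refund policy', "customer's last invoice"]
```

Without the `min_score` floor, the low-value "marketing copy" snippet would still fit the budget and be included, which is exactly the kind of noise the paragraph warns about.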

Simply giving an AI agent thousands of tools is counterproductive. The real value lies in an 'agentic tool execution layer' that provides just-in-time discovery and managed execution to prevent the agent from getting overwhelmed by its options.
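A sketch of the just-in-time discovery half of such a layer: instead of loading every tool definition, the agent searches a registry and receives only the few best matches. Plain word overlap stands in here for whatever ranking a real layer would use (embeddings, for instance), and the registry entries are hypothetical. Managed execution is not shown.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool entry in a large registry."""
    name: str
    description: str


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric words; underscores in names act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(registry: list[ToolSpec], query: str, k: int = 3) -> list[ToolSpec]:
    """Return up to k tools sharing the most words with the query."""
    terms = _words(query)

    def score(tool: ToolSpec) -> int:
        return len(terms & _words(f"{tool.name} {tool.description}"))

    ranked = sorted(registry, key=score, reverse=True)
    return [t for t in ranked if score(t) > 0][:k]


REGISTRY = [
    ToolSpec("refund_payment", "Refund a captured payment."),
    ToolSpec("create_invoice", "Create an invoice for a customer."),
    ToolSpec("list_payments", "List recent payments."),
    ToolSpec("update_webhook", "Change a webhook URL."),
]
print([t.name for t in search_tools(REGISTRY, "refund the payment")])
```

Only the returned definitions would be placed in the agent's context, so the registry can grow without growing the prompt.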

Exposing a full API via the Model Context Protocol (MCP) overwhelms an LLM's context window and reasoning. This forces developers to abandon exposing their entire service and instead manually craft a few highly specific tools, limiting the AI's capabilities and defeating the "do anything" vision of agents.