The MCP transport protocol requires the server to hold session state. That is fine for a single server, but it becomes a problem at scale: when requests are distributed across multiple pods, a shared state layer (such as Redis or Memcached) is needed so that any server can access the same session data.
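As a rough sketch of what that shared layer can look like, assuming Redis via the ioredis client (the key scheme and the Session shape here are illustrative, not part of the MCP spec):

```typescript
// Sketch: MCP session state in a shared store so any pod can serve any
// request. Assumes ioredis; the key scheme and Session shape are
// illustrative, not part of the MCP spec.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

interface Session {
  protocolVersion: string;
  clientCapabilities: Record<string, unknown>;
}

async function loadSession(sessionId: string): Promise<Session | null> {
  // Any pod can rehydrate the session; no pod-local state is required.
  const raw = await redis.get(`mcp:session:${sessionId}`);
  return raw ? (JSON.parse(raw) as Session) : null;
}

async function saveSession(sessionId: string, session: Session): Promise<void> {
  // A TTL keeps abandoned sessions from accumulating.
  await redis.set(`mcp:session:${sessionId}`, JSON.stringify(session), "EX", 3600);
}
```

With a store like this in place, any pod can rehydrate a session on demand, so the load balancer no longer needs sticky routing.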

Related Insights

The evolution of a protocol like MCP depends on a tight feedback loop with real-world implementations. Open source clients such as Goose serve as reference implementations to test and demonstrate the value of new, abstract specs like MCP-UI (for user interfaces), making the protocol's benefits concrete.

To avoid overwhelming an LLM's context with hundreds of tools, a dynamic MCP approach offers just three: one to list available API endpoints, one to get details on a specific endpoint, and one to execute it. This scales well but increases latency and complexity due to the multiple turns required for a single action.
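A minimal sketch of this three-tool pattern, using tool registration from the MCP TypeScript SDK; the in-memory catalog is a stand-in for a real API registry:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Illustrative in-memory catalog standing in for a real API registry.
const catalog: Record<
  string,
  { schema: string; invoke: (args: Record<string, unknown>) => Promise<unknown> }
> = {
  "orders.search": {
    schema: "orders.search(customerId: string) -> Order[]",
    invoke: async (args) => ({ orders: [], query: args }),
  },
};

const server = new McpServer({ name: "dynamic-api", version: "1.0.0" });

// Tool 1: enumerate endpoints (cheap, always in context).
server.tool("list_endpoints", "List available API endpoints", async () => ({
  content: [{ type: "text", text: Object.keys(catalog).join("\n") }],
}));

// Tool 2: fetch the schema for one endpoint on demand.
server.tool(
  "describe_endpoint",
  "Get the schema for a specific endpoint",
  { name: z.string() },
  async ({ name }) => ({
    content: [{ type: "text", text: catalog[name]?.schema ?? "unknown endpoint" }],
  })
);

// Tool 3: execute an endpoint with JSON arguments.
server.tool(
  "call_endpoint",
  "Execute an endpoint",
  { name: z.string(), args: z.record(z.unknown()) },
  async ({ name, args }) => {
    const entry = catalog[name];
    const result = entry ? await entry.invoke(args) : { error: "unknown endpoint" };
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  }
);
```

The cost mentioned above is visible in the structure: a single action is now a list, describe, call sequence rather than one tool invocation.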

Tools like Git were designed for human-paced development. AI agents, which can make thousands of changes in parallel, require a new infrastructure layer—real-time repositories, coordination mechanisms, and shared memory—that traditional systems cannot support.

MCP shouldn't be thought of as just another developer API like REST. Its true purpose is to enable seamless, consumer-focused pluggability. In a successful future, a user's mom wouldn't know what MCP is; her AI application would just connect to the right services automatically to get tasks done.

The MCP protocol's primitives are not directly influenced by current model limitations. Instead, it was designed with the expectation that models would improve exponentially. For example, "progressive discovery" was built in, anticipating that models could be trained to fetch context on demand, solving future context-bloat problems.
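From the client side, progressive discovery might look like the following sketch (MCP TypeScript SDK; the two-step flow is the point, not the specific URIs):

```typescript
// Sketch: surface only lightweight resource metadata up front, and fetch
// full contents only when the model decides it needs them. Assumes a
// Client that is already connected to a server.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

async function progressiveContext(client: Client, wantedUri: string) {
  // Step 1: cheap metadata only — names and URIs, not contents.
  const { resources } = await client.listResources();
  console.log(resources.map((r) => r.uri));

  // Step 2: pull a specific resource only once it becomes relevant.
  const { contents } = await client.readResource({ uri: wantedUri });
  return contents;
}
```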

MCP was born from the need for a central dev team to scale its impact. By creating a protocol, they empowered individual teams at Anthropic to build and deploy their own MCP servers without being a bottleneck. This decentralized model is so successful that the core team doesn't know about 90% of the internal servers.

The MCP protocol made the client's return stream optional to simplify implementation. This backfired: most clients never built it, leaving server-side features like elicitation and sampling unavailable because the communication channel didn't exist. It's a key lesson in protocol design: an optional channel is, in practice, a channel most implementations will omit.
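The practical consequence for server authors: never assume the return channel exists. A sketch of the defensive check, assuming the low-level Server class from the MCP TypeScript SDK (the null fallback is illustrative):

```typescript
// Sketch: gate server-initiated features on the capabilities the client
// actually advertised at initialize time.
import { Server } from "@modelcontextprotocol/sdk/server/index.js";

const server = new Server(
  { name: "example-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

async function summarizeViaClientModel(text: string): Promise<string | null> {
  const caps = server.getClientCapabilities();
  if (!caps?.sampling) {
    // No return channel: degrade gracefully instead of failing mid-task.
    return null;
  }
  const result = await server.createMessage({
    messages: [
      { role: "user", content: { type: "text", text: `Summarize: ${text}` } },
    ],
    maxTokens: 200,
  });
  return result.content.type === "text" ? result.content.text : null;
}
```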

MCP acts as a universal translator, allowing different AI models and platforms to share context and data. This prevents "AI amnesia" where customer interactions start from scratch, creating a continuous, intelligent experience by giving AI a persistent, shared memory.

OpenAI uses two connector types. First-party (1P) "sync connectors" store data to enable higher-quality, optimized experiences (e.g., re-ranking). Third-party (3P) MCP connectors provide broad, long-tail coverage but offer less control. This dual approach strategically trades off deep integration quality against ecosystem scale.

Exposing a full API via the Model Context Protocol (MCP) overwhelms an LLM's context window and reasoning. This forces developers to abandon exposing their entire service and instead manually craft a few highly specific tools, limiting the AI's capabilities and defeating the "do anything" vision of agents.
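Back-of-envelope arithmetic makes the overload concrete; the numbers below are illustrative, not measurements:

```typescript
// Back-of-envelope: every registered tool's name, description, and JSON
// schema is serialized into the model's context on each turn.
const endpoints = 400;           // a mid-sized REST API surface
const tokensPerToolSchema = 150; // rough cost of one tool definition
console.log(`~${endpoints * tokensPerToolSchema} tokens of tool definitions`); // ~60,000
```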
