In a significant strategic move, OpenAI's Evals product within AgentKit allows developers to evaluate outputs from non-OpenAI models via integrations like OpenRouter. This positions AgentKit not just as an OpenAI-centric tool, but as a central, model-agnostic platform for building and optimizing agents.
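To make this concrete, here is a minimal sketch of evaluating a non-OpenAI model through OpenRouter's OpenAI-compatible endpoint, so the same scoring code can compare providers. The model slug, key placeholder, and eval case are illustrative, not from the source.

```python
# Minimal sketch: score a non-OpenAI model via OpenRouter's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter speaks the OpenAI API shape
    api_key="YOUR_OPENROUTER_KEY",            # placeholder
)

eval_cases = [
    {"prompt": "Summarize: the meeting moved to Tuesday.", "expected_keyword": "Tuesday"},
]

def run_case(model: str, case: dict) -> bool:
    """Return True if the model's answer contains the expected keyword."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": case["prompt"]}],
    )
    return case["expected_keyword"].lower() in resp.choices[0].message.content.lower()

# "anthropic/claude-3.5-sonnet" is an illustrative OpenRouter model slug.
score = sum(run_case("anthropic/claude-3.5-sonnet", c) for c in eval_cases) / len(eval_cases)
print(f"pass rate: {score:.0%}")
```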

Related Insights

OpenAI embraces the 'platform paradox' by selling API access to startups that compete directly with its own apps like ChatGPT. The strategy is to foster a broad ecosystem, believing that enabling competitors is necessary to avoid losing the platform race entirely.

Recognizing there is no single "best" LLM, AlphaSense built a system to test and deploy various models for different tasks. This allows them to optimize for performance and even stylistic preferences, using different models for their buy-side finance clients versus their corporate users.
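A hypothetical sketch of what per-task, per-audience model routing could look like; the task keys, audience labels, and model names are placeholders, not AlphaSense's actual configuration.

```python
# Hypothetical model router keyed by task and audience, with a fallback default.
from dataclasses import dataclass

@dataclass(frozen=True)
class RouteKey:
    task: str      # e.g. "summarize", "extract_metrics"
    audience: str  # e.g. "buy_side", "corporate"

MODEL_ROUTES = {
    RouteKey("summarize", "buy_side"): "provider-a/terse-analyst-model",
    RouteKey("summarize", "corporate"): "provider-b/plain-language-model",
}
DEFAULT_MODEL = "provider-a/general-model"

def pick_model(task: str, audience: str) -> str:
    """Resolve which model to call, falling back to a default for uncovered routes."""
    return MODEL_ROUTES.get(RouteKey(task, audience), DEFAULT_MODEL)

print(pick_model("summarize", "buy_side"))          # provider-a/terse-analyst-model
print(pick_model("extract_metrics", "corporate"))   # provider-a/general-model
```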

The "AI wrapper" concern is mitigated by a multi-model strategy. A startup can integrate the best models from various providers for different tasks, creating a superior product. A platform like OpenAI is incentivized to only use its own models, creating a durable advantage for the startup.

Simply offering the latest model is no longer a competitive advantage. True value is created in the system built around the model—the system prompts, tools, and overall scaffolding. This 'harness' is what optimizes a model's performance for specific tasks and delivers a superior user experience.
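A minimal sketch of what such a "harness" means in code: the same model wrapped with a task-specific system prompt and a tool it can call. The tool schema, prompt, and model name are illustrative, and executing the returned tool call is omitted for brevity.

```python
# Sketch of a harness: system prompt + tool definition wrapped around a model call.
import json
from openai import OpenAI

client = OpenAI()

def lookup_ticket(ticket_id: str) -> str:
    # Placeholder for a real internal lookup the harness would expose to the model.
    return json.dumps({"id": ticket_id, "status": "open", "priority": "high"})

TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_ticket",
        "description": "Fetch the current status of a support ticket.",
        "parameters": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    },
}]

SYSTEM_PROMPT = "You are a support triage assistant. Always check ticket status before answering."

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What's happening with ticket 4812?"},
    ],
    tools=TOOLS,
)
print(resp.choices[0].message)
```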

OpenAI integrated the Model Context Protocol (MCP) into its agentic APIs instead of building its own. The decision was driven by Anthropic treating MCP as a truly open standard, complete with a cross-company steering committee, which fostered trust and made adoption easy and pragmatic.
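A sketch of what attaching a remote MCP server as a tool looks like through OpenAI's Responses API. The server URL is illustrative, and exact field names may differ across SDK versions.

```python
# Sketch: pointing the agentic API at any MCP-speaking server as a tool.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",                             # MCP tool type in the agentic API
        "server_label": "docs",
        "server_url": "https://example.com/mcp",   # illustrative MCP server endpoint
        "require_approval": "never",
    }],
    input="Search the docs for the rate-limit policy and summarize it.",
)
print(resp.output_text)
```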

Initially, even OpenAI believed a single, ultimate 'model to rule them all' would emerge. This thinking has completely changed to favor a proliferation of specialized models, creating a healthier, less winner-take-all ecosystem where different models serve different needs.

Using a composable, 'plug and play' architecture allows teams to build specialized AI agents faster and with less overhead than integrating a monolithic third-party tool. This approach enables lightweight, tailored solutions for niche use cases without the complexity of external API integrations, keeping the entire workflow within one platform.
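One hypothetical way to picture this composability: small, local steps composed into a pipeline, each swappable for a niche use case. The step names and logic are illustrative.

```python
# Hypothetical "plug and play" agent built from small composable steps.
from typing import Callable

Step = Callable[[dict], dict]

def classify(ctx: dict) -> dict:
    ctx["category"] = "billing" if "invoice" in ctx["text"].lower() else "general"
    return ctx

def draft_reply(ctx: dict) -> dict:
    ctx["reply"] = f"[{ctx['category']}] Thanks for reaching out about: {ctx['text']}"
    return ctx

def run_pipeline(steps: list[Step], ctx: dict) -> dict:
    for step in steps:
        ctx = step(ctx)  # each step is a lightweight, swappable unit
    return ctx

print(run_pipeline([classify, draft_reply], {"text": "Question about my invoice"})["reply"])
```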

OpenAI uses two connector types. First-party (1P) "sync connectors" store data to enable higher-quality, optimized experiences (e.g., re-ranking). Third-party (3P) MCP connectors provide broad, long-tail coverage but offer less control. This dual approach strategically trades off deep integration quality against ecosystem scale.
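A hypothetical illustration of the trade-off: a first-party sync connector keeps a local copy it can re-rank, while a third-party MCP-style connector simply forwards queries. Class and method names are illustrative, not OpenAI's internals.

```python
# Illustrative contrast between a synced 1P connector and a pass-through 3P connector.
from abc import ABC, abstractmethod

class Connector(ABC):
    @abstractmethod
    def search(self, query: str) -> list[str]: ...

class SyncConnector(Connector):
    """1P: data is synced locally, enabling re-ranking and other optimizations."""
    def __init__(self, synced_docs: list[str]):
        self.docs = synced_docs
    def search(self, query: str) -> list[str]:
        hits = [d for d in self.docs if query.lower() in d.lower()]
        return sorted(hits, key=len)  # stand-in for a real re-ranker

class MCPConnector(Connector):
    """3P: broad long-tail coverage, but results come back as-is from the remote server."""
    def __init__(self, call_remote):
        self.call_remote = call_remote
    def search(self, query: str) -> list[str]:
        return self.call_remote(query)

local = SyncConnector(["Q3 planning notes", "Q3 budget review"])
print(local.search("q3"))
```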

Replit's leap in AI agent autonomy isn't from a single superior model, but from orchestrating multiple specialized agents using models from various providers. This multi-agent approach creates a different, faster scaling paradigm for task completion compared to single-model evaluations, suggesting a new direction for agent research.
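A hypothetical sketch of that orchestration pattern: a coordinator fans a task out to specialized agents backed by models from different providers. The roles, providers, and model names are placeholders, not Replit's actual architecture.

```python
# Hypothetical orchestrator delegating subtasks to specialized, multi-provider agents.
from dataclasses import dataclass

@dataclass
class Agent:
    role: str
    provider: str
    model: str
    def run(self, subtask: str) -> str:
        # A real system would call the provider's API here.
        return f"[{self.role} via {self.provider}/{self.model}] done: {subtask}"

AGENTS = {
    "plan": Agent("planner", "provider-a", "reasoning-model"),
    "code": Agent("coder", "provider-b", "code-model"),
    "test": Agent("tester", "provider-c", "fast-model"),
}

def orchestrate(task: str) -> list[str]:
    return [agent.run(f"{kind} for: {task}") for kind, agent in AGENTS.items()]

for line in orchestrate("add pagination to the API"):
    print(line)
```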

The developer abstraction layer is moving up from the model API to the agent. A generic interface for switching models is insufficient because it creates a 'lowest common denominator' product. Real power comes from tightly binding a specific model to an agentic loop with compute and file system access.
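A minimal sketch of what that tighter binding looks like: a specific model run inside a bounded agentic loop with a file-system tool, rather than behind a thin, interchangeable model wrapper. Simplified for illustration; a real agent loop adds sandboxing, error handling, and stricter limits.

```python
# Minimal agentic loop: model proposes tool calls, tools touch the file system,
# results feed back into the conversation until the model answers.
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def read_file(path: str) -> str:
    return Path(path).read_text()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file from the local workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

messages = [{"role": "user", "content": "Open notes.txt and list its TODO items."}]
for _ in range(5):  # bounded number of agent turns
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": read_file(args["path"])})
```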