ChatKit is delivered as an embeddable iframe, not an open-source library. This is a deliberate choice modeled after Stripe Checkout, allowing OpenAI to push updates (new models, UI features, modalities) automatically. This saves developers from constant frontend maintenance and keeps the experience cutting-edge.
OpenAI identifies agent evaluation as a key challenge. It can currently grade an entire task's trace, but the real difficulty lies in evaluating and optimizing the individual steps within a long, complex agentic workflow. This area is still a work in progress and is critical for building reliable, production-grade agents.
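To make the trace-versus-step distinction concrete, here is a minimal sketch. The `Step` and `Trace` types, the grader functions, and the checks are all hypothetical illustrations, not part of any OpenAI API. It shows how a trace can pass an end-to-end grade while an intermediate step still fails its own check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str      # hypothetical step label, e.g. "retrieve"
    output: str    # whatever that step produced


@dataclass
class Trace:
    steps: List[Step]
    final_answer: str


def grade_trace(trace: Trace, expected: str) -> bool:
    """Trace-level grade: looks only at the final answer."""
    return trace.final_answer.strip() == expected.strip()


def grade_steps(trace: Trace, checks: Dict[str, Callable[[str], bool]]) -> Dict[str, bool]:
    """Step-level grade: runs a per-step check wherever one is defined."""
    return {s.name: checks[s.name](s.output) for s in trace.steps if s.name in checks}


# Illustrative data: the summarize step produced nothing, yet the final answer matches.
trace = Trace(
    steps=[Step("retrieve", "doc_17"), Step("summarize", "")],
    final_answer="Refund approved",
)
checks = {
    "retrieve": lambda out: out.startswith("doc_"),
    "summarize": lambda out: len(out) > 0,
}

trace_ok = grade_trace(trace, "Refund approved")
step_results = grade_steps(trace, checks)
print(trace_ok)
print(step_results)
```

The end-to-end grade hides the empty summarize step; only the per-step results expose it, which is the gap the takeaway describes.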
OpenAI integrated the Model Context Protocol (MCP) into its agentic APIs instead of building its own protocol. The decision was driven by Anthropic treating MCP as a truly open standard, complete with a cross-company steering committee, which fostered trust and made adoption easy and pragmatic.
OpenAI favors "zero gradient" prompt optimization because serving thousands of unique, fine-tuned model snapshots is operationally very difficult. Prompt-based adjustments allow performance gains without the immense infrastructure burden, making it a more practical and scalable approach for both OpenAI and developers.
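As a rough illustration of what gradient-free prompt optimization can look like, the sketch below scores a handful of candidate prompts against a small labeled set and keeps the best one. The model weights are never touched. `toy_model` is a hypothetical stand-in for a real model call, and the selection strategy (exhaustive scoring of fixed candidates) is an assumption chosen for simplicity, not OpenAI's actual optimizer.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]            # (input, expected output)
Model = Callable[[str, str], str]    # (prompt, input) -> output


def score_prompt(prompt: str, model: Model, examples: List[Example]) -> float:
    """Fraction of examples the model answers correctly under this prompt."""
    hits = sum(1 for x, y in examples if model(prompt, x) == y)
    return hits / len(examples)


def pick_best_prompt(candidates: List[str], model: Model,
                     examples: List[Example]) -> Tuple[str, float]:
    """Score every candidate prompt and return the best one with its score."""
    scored = [(score_prompt(p, model, examples), p) for p in candidates]
    best_score, best_prompt = max(scored, key=lambda t: t[0])
    return best_prompt, best_score


def toy_model(prompt: str, text: str) -> str:
    # Hypothetical stand-in for a model call: uppercases only when the prompt asks.
    return text.upper() if "uppercase" in prompt.lower() else text


examples = [("abc", "ABC"), ("hi", "HI")]
candidates = ["Repeat the input.", "Return the input in uppercase."]

best_prompt, best_score = pick_best_prompt(candidates, toy_model, examples)
print(best_prompt, best_score)
```

Because the only artifact that changes is a string, one shared model snapshot can serve every customer, which is the operational advantage the takeaway points to.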
In a significant strategic move, OpenAI's Evals product within AgentKit allows developers to test results from non-OpenAI models via integrations like OpenRouter. This positions AgentKit not just as an OpenAI-centric tool, but as a central, model-agnostic platform for building and optimizing agents.
OpenAI learned from its "Plugins" product that developers need control over their brand and user experience. The new Apps SDK allows custom UI components inside ChatGPT, a direct response to feedback that Plugins offered too little control, binding developers too tightly to the standard chat interface.
An emerging power-user pattern, especially among new grads, is to trust AI coding assistants like Codex with entire features, not just small snippets. This "full YOLO mode" approach sometimes fails but often "one-shots" complex tasks, which suggests developers should recalibrate how much work they delegate to AI in a single request.
OpenAI uses two connector types. First-party (1P) "sync connectors" store data to enable higher-quality, optimized experiences (e.g., re-ranking). Third-party (3P) MCP connectors provide broad, long-tail coverage but offer less control. This dual approach strategically trades off deep integration quality against ecosystem scale.
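The trade-off between the two connector types can be sketched roughly as follows. Every class and function here is a hypothetical illustration, and the term-count re-ranker is a toy stand-in, not how OpenAI's connectors actually work. The point is only that a connector holding synced data can reorder results itself, while a pass-through connector returns whatever order the remote server chose.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    title: str
    text: str


def rerank(query: str, docs: List[Doc]) -> List[Doc]:
    """Toy re-ranker: order documents by how often query terms appear."""
    terms = query.lower().split()

    def overlap(doc: Doc) -> int:
        words = doc.text.lower().split()
        return sum(words.count(t) for t in terms)

    return sorted(docs, key=overlap, reverse=True)


class SyncConnector:
    """1P-style sketch: data is stored locally, so results can be re-ranked."""

    def __init__(self, synced_docs: List[Doc]):
        self.docs = list(synced_docs)

    def search(self, query: str) -> List[Doc]:
        return rerank(query, self.docs)


class PassThroughConnector:
    """3P-style sketch: forwards the query and returns the remote order as-is."""

    def __init__(self, remote_search: Callable[[str], List[Doc]]):
        self.remote_search = remote_search

    def search(self, query: str) -> List[Doc]:
        return self.remote_search(query)


docs = [Doc("a", "billing faq"), Doc("b", "refund policy refund steps")]

synced_top = SyncConnector(docs).search("refund")[0].title
remote_top = PassThroughConnector(lambda q: docs).search("refund")[0].title
print(synced_top, remote_top)
```

In this toy setup the synced connector surfaces the refund document first, while the pass-through connector is limited to the order the remote source returned.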
