The true power of the AI application layer lies in orchestrating multiple, specialized foundation models. Users want a single interface (like Cursor for coding) that intelligently routes tasks to the best model (e.g., Gemini for front-end, Codex for back-end), creating value through aggregation and workflow integration.
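As a rough illustration of what such routing can look like, here is a minimal Python sketch; the task categories, model identifiers, and the `route` function are assumptions made for the example, not any particular product's implementation.

```python
# Illustrative sketch of task-based model routing. Model identifiers and task
# categories are hypothetical, not any vendor's actual API.
from dataclasses import dataclass

@dataclass
class Task:
    kind: str    # e.g. "frontend", "backend", "docs"
    prompt: str

# Map each task category to the model judged best for it.
ROUTES = {
    "frontend": "frontend-specialist-model",
    "backend": "backend-specialist-model",
}
DEFAULT_MODEL = "general-purpose-model"

def route(task: Task) -> str:
    """Pick the model for a task; fall back to a general-purpose model."""
    return ROUTES.get(task.kind, DEFAULT_MODEL)

print(route(Task(kind="frontend", prompt="Build a responsive navbar")))
# -> frontend-specialist-model
```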
Building a durable business on top of foundation models requires more than a thin API call. Gamma creates a moat by deeply owning an entire workflow (visual communication) and orchestrating over 20 different specialized AI models, each chosen for a specific sub-task in the user journey.
The new Codex app is designed as an "agent command center" for managing multiple AI agents working in parallel. This interface-driven approach suggests OpenAI believes the developer's role is evolving from a hands-on coder into a high-level orchestrator, fundamentally changing the software development paradigm.
Rather than committing to a single provider such as OpenAI or Google (Gemini), Hux uses multiple commercial models. They've found that different models excel at different tasks within their app. This multi-model strategy allows them to optimize for quality and latency on a per-workflow basis, avoiding a one-size-fits-all compromise.
Enterprises will shift from relying on a single large language model to using orchestration platforms. These platforms will allow them to hot-swap various models—including smaller, specialized ones—for different tasks within a single system, optimizing for performance, cost, and use case without being locked into one provider.
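A minimal sketch of what hot-swapping can look like behind a shared interface, assuming a simple config-driven registry; the class names and the `complete` signature are illustrative, not a specific platform's API.

```python
# Sketch of hot-swapping models behind one interface: every provider is wrapped
# in a common Protocol, so swapping a model is a config change, not a code change.
# All names here are hypothetical examples.
from typing import Protocol

class TextModel(Protocol):
    """Common interface every wrapped provider must satisfy."""
    def complete(self, prompt: str) -> str: ...

class LargeGeneralModel:
    def complete(self, prompt: str) -> str:
        return f"[large general model] {prompt}"

class SmallSpecializedModel:
    def complete(self, prompt: str) -> str:
        return f"[small specialized model] {prompt}"

# Configuration decides which model serves which task; changing an entry here
# swaps the model without touching any calling code.
MODELS: dict[str, TextModel] = {
    "summarize": LargeGeneralModel(),
    "classify": SmallSpecializedModel(),
}

def run(task: str, prompt: str) -> str:
    return MODELS[task].complete(prompt)

print(run("classify", "Is this support ticket about billing?"))
```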
The comparison reveals that different AI models excel at specific tasks. Opus 4.5 is a strong front-end designer, while Codex 5.1 might be better for back-end logic. The optimal workflow involves "model switching"—assigning the right AI to the right part of the development process.
While GenAI grabs headlines, its most practical enterprise use is as an intelligent orchestrator. It can call upon and synthesize results from highly effective traditional tools like time-series forecasting models or SQL databases, multiplying their value within a larger, more powerful system.
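The sketch below illustrates that orchestration pattern under simplifying assumptions: a keyword check stands in for the LLM planner, an in-memory SQLite table stands in for the warehouse, and a moving average stands in for the forecasting model; none of these names reflect a specific enterprise stack.

```python
# Sketch of GenAI as an orchestrator over traditional tools. The "planner" here
# is a keyword check standing in for an LLM's tool-selection step.
import sqlite3
from statistics import mean

def sql_tool(query: str) -> list[tuple]:
    """Traditional tool #1: query a (toy, in-memory) sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month INTEGER, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [(1, 100.0), (2, 120.0), (3, 130.0)])
    return conn.execute(query).fetchall()

def forecast_tool(history: list[float]) -> float:
    """Traditional tool #2: naive stand-in for a time-series forecaster."""
    return mean(history[-3:])

def orchestrate(question: str) -> str:
    # A real system would let an LLM choose and sequence the tools;
    # a keyword check stands in for that planning step here.
    if "forecast" in question.lower():
        rows = sql_tool("SELECT revenue FROM sales ORDER BY month")
        prediction = forecast_tool([r[0] for r in rows])
        return f"Projected next-month revenue: {prediction:.1f}"
    return f"Historical sales: {sql_tool('SELECT month, revenue FROM sales')}"

print(orchestrate("Forecast next month's revenue"))
```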
The belief that a single, god-level foundation model would dominate has proven false. Horowitz points to successful AI applications like Cursor, which uses 13 different models. This shows that value lies in the complex orchestration and design at the application layer, not just in having the largest single model.
Like Kayak for flights, being a model aggregator provides superior value to users who want access to the best tool for a specific job. Big tech companies are restricted to their own models, creating an opportunity for startups to win by offering a 'single pane of glass' across all available models.
The common critique of AI application companies as "GPT wrappers" with no moat is proving false. The best startups are evolving beyond using a single third-party model. They are using dozens of models and, crucially, are backward-integrating to build their own custom AI models optimized for their specific domain.
Powerful AI tools are becoming aggregators like Manus, which intelligently select the best underlying model for a specific task—research, data visualization, or coding. This multi-model approach enables a seamless workflow within a single thread, outperforming systems reliant on one general-purpose model.