The popular cost-saving strategy of using a cheap AI to route tasks to a smarter AI is backwards. A 'dumb' model cannot reliably know what it doesn't know, making it a poor judge of when to escalate. The logically sound but more expensive approach is for a smart model to delegate tasks downward.
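The top-down pattern described above can be sketched in a few lines. This is an illustration only, under stated assumptions: `Model`, `delegate_down`, the cost units, and the word-count `judge` are all hypothetical stand-ins, and the lambdas replace real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float        # hypothetical relative cost units
    run: Callable[[str], str]   # stand-in for a real model call


def delegate_down(
    task: str,
    strong: Model,
    cheap: Model,
    strong_judges_simple: Callable[[str], bool],
) -> Tuple[str, float]:
    """Let the strong model triage first, then hand simple work downward."""
    cost = strong.cost_per_call  # triage by the strong model is always paid for
    if strong_judges_simple(task):
        return cheap.run(task), cost + cheap.cost_per_call
    return strong.run(task), cost + strong.cost_per_call


strong = Model("strong", 10.0, lambda t: f"strong:{t}")
cheap = Model("cheap", 1.0, lambda t: f"cheap:{t}")
# Toy judgment (assumption): short requests count as simple.
# A real system would ask the strong model itself.
judge = lambda t: len(t.split()) < 5

print(delegate_down("rename a variable", strong, cheap, judge))
```

The sketch makes the trade-off visible: every request pays for the strong model's triage, which is why this approach is sounder but more expensive than letting a cheap model decide when to escalate.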

Related Insights

The future of AI is not a single all-knowing model, but a "router" model that triages requests to a suite of specialized expert AIs (e.g., doctor, programmer). The primary technical and business challenge will shift to building the most efficient and accurate routing system, which will determine market leadership.

Don't use your most powerful and expensive AI model for every task. A crucial skill is model triage: using cheaper models for simple, routine tasks like monitoring and scheduling, while saving premium models for complex reasoning, judgment, and creative work.

Advanced AI architectures will use small, fast, and cheap local models to act as intelligent routers. These models will first analyze a complex request, formulate a plan, and then delegate different sub-tasks to a fleet of more powerful or specialized models, optimizing for cost and performance.

Instead of relying on one powerful model for all tasks, the leading strategy is 'smart routing': using a panel of models and directing each task to the most appropriate one. This compound architecture demonstrably beats single frontier models on both cost and performance.

Legal AI firm Harvey showed that a hybrid system, which uses a smaller model as the primary worker and routes selectively to a frontier model as an "advisor," can beat a frontier-only approach on both quality and cost. This suggests that intelligent orchestration is a more effective strategy than simply using the most powerful model for every task.
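A worker-plus-advisor arrangement of this kind might look roughly like the sketch below. It is not Harvey's implementation, which the summary does not describe. The function name, the confidence score, and the 0.7 threshold are all assumptions made for illustration.

```python
from typing import Callable, Tuple

# Hypothetical signatures: the worker returns (draft, confidence in [0, 1]);
# the advisor receives the task plus the worker's draft and returns a revision.
Worker = Callable[[str], Tuple[str, float]]
Advisor = Callable[[str, str], str]


def answer_with_advisor(
    task: str,
    worker: Worker,
    advisor: Advisor,
    threshold: float = 0.7,  # assumed cut-off, would need tuning in practice
) -> Tuple[str, bool]:
    """Return (answer, escalated). The small model works first; the frontier
    model is consulted only when the worker's confidence is below threshold."""
    draft, confidence = worker(task)
    if confidence >= threshold:
        return draft, False
    return advisor(task, draft), True


# Stub models so the sketch runs without any external service.
def stub_worker(task: str) -> Tuple[str, float]:
    confident = "summarize" in task
    return f"draft for: {task}", 0.9 if confident else 0.4


def stub_advisor(task: str, draft: str) -> str:
    return f"revised ({draft})"


print(answer_with_advisor("summarize this clause", stub_worker, stub_advisor))
print(answer_with_advisor("assess cross-border liability", stub_worker, stub_advisor))
```

The escalation signal here comes from the smaller model, so the design depends on that signal being trustworthy, which is exactly what the lead insight on this page calls into question.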

Companies are building intelligent systems that analyze a user's prompt and automatically route it to the most cost-effective model that can handle the task. This avoids using expensive frontier models for simple requests, with some companies like Coinbase successfully keeping costs flat despite exponential usage growth.
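Choosing "the most cost-effective model that can handle the task" reduces to a small selection problem once each prompt has a difficulty estimate. The sketch below assumes that estimate already exists as an integer. The tier names, capability levels, and prices are invented for illustration and do not describe any company's system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: int            # hypothetical ordinal score, higher is stronger
    cost_per_1k_tokens: float  # hypothetical price


def cheapest_capable(tiers: List[ModelTier], required_capability: int) -> ModelTier:
    """Pick the lowest-cost tier whose capability meets the requirement."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError("no model tier can handle this task")
    return min(eligible, key=lambda t: t.cost_per_1k_tokens)


tiers = [
    ModelTier("small", capability=1, cost_per_1k_tokens=0.1),
    ModelTier("mid", capability=2, cost_per_1k_tokens=1.0),
    ModelTier("frontier", capability=3, cost_per_1k_tokens=10.0),
]

print(cheapest_capable(tiers, required_capability=2).name)
```

The hard part in practice is producing `required_capability` from a raw prompt. The selection step itself is simple.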

A hybrid approach to AI agent architecture is emerging. Use the most powerful, expensive cloud models like Claude for high-level reasoning and planning (the "CEO"). Then, delegate repetitive, high-volume execution tasks to cheaper, locally-run models (the "line workers").

GitHub expects intelligent model routing to be what keeps AI agent usage costs from spiraling. These systems will automatically select the most efficient and cost-effective AI model for a given task, such as using a cheap model for simple refactoring instead of a powerful, expensive one.

Building AI systems around rigid "workflows" is a mistake because knowledge work lacks predictable "happy paths." A superior mental model is "delegation," where the AI is treated like a human assistant. You delegate a task area, and the AI is expected to learn and adapt to novel circumstances, not just execute a process.

To manage costs, the optimal architecture isn't running everything on the most powerful model. Instead, a smart orchestrator agent should break down complex problems and dispatch simpler sub-tasks to smaller, cheaper models, optimizing for both cost and performance.