
Legal AI firm Harvey showed that a hybrid system, in which a smaller model acts as the primary worker and routes selectively to a frontier model acting as an "advisor", can beat a frontier-only approach on both quality and cost. This suggests that intelligent orchestration is a more effective strategy than simply using the most powerful model for every task.
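A minimal sketch of the worker/advisor pattern, assuming each model is exposed as a hypothetical function that returns an answer plus a self-reported confidence score. `HybridAgent`, `threshold`, and the stub models are illustrative names, not Harvey's implementation; a real system might use a learned escalation signal instead of self-reported confidence.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: prompt in, (answer, self-reported confidence in [0, 1]) out.
ModelFn = Callable[[str], tuple[str, float]]


@dataclass
class HybridAgent:
    worker: ModelFn         # small, cheap model that sees every request first
    advisor: ModelFn        # frontier model consulted only when the worker is unsure
    threshold: float = 0.7  # assumed escalation cutoff; would be tuned on real traffic
    escalations: int = 0    # how many requests needed the advisor

    def run(self, prompt: str) -> str:
        draft, confidence = self.worker(prompt)
        if confidence >= self.threshold:
            return draft  # confident: no frontier tokens spent
        self.escalations += 1
        # The advisor returns guidance; the worker stays in charge of the final answer.
        guidance, _ = self.advisor(f"Advise on this task: {prompt}\nCurrent draft: {draft}")
        final, _ = self.worker(f"{prompt}\nAdvisor guidance: {guidance}")
        return final


# Stub models so the sketch runs without any API.
def fake_worker(prompt: str) -> tuple[str, float]:
    if "Advisor guidance" in prompt:
        return "revised answer", 0.9
    return "draft answer", (0.3 if "merger" in prompt else 0.95)


def fake_advisor(prompt: str) -> tuple[str, float]:
    return "check the indemnification terms", 0.99


agent = HybridAgent(fake_worker, fake_advisor)
print(agent.run("Summarize this NDA"))            # draft answer
print(agent.run("Review this merger agreement"))  # revised answer
print(agent.escalations)                          # 1
```

The frontier model is paid for only on escalated requests, so `threshold` is the knob that trades spend against quality.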

Related Insights

A single AI model is insufficient for running a complex company. An orchestration layer allows you to assign different models (e.g., a powerful frontier model for the CEO, cheaper models for routine tasks) based on their unique "personalities" and cost-effectiveness.

Enterprises are currently overspending on tokens by sending all queries to the most powerful LLMs. A new software category will emerge to intelligently route requests to smaller, cheaper models when possible, creating a critical efficiency and cost-saving layer between companies and foundational model providers.
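One way such a routing layer could work is sketched below: estimate how hard a query is, then pick the cheapest model trusted at that difficulty. The model names, prices, tiers, and the keyword heuristic in `estimate_difficulty` are all hypothetical placeholders; a production router would more likely use a trained classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    tier: int                      # highest difficulty level this model is trusted with
    usd_per_million_tokens: float  # hypothetical price


# Hypothetical catalog; real names and prices would come from your providers.
CATALOG = [
    ModelOption("small-model", tier=1, usd_per_million_tokens=0.2),
    ModelOption("mid-model", tier=2, usd_per_million_tokens=1.0),
    ModelOption("frontier-model", tier=3, usd_per_million_tokens=15.0),
]


def estimate_difficulty(query: str) -> int:
    """Toy keyword/length heuristic standing in for a learned classifier: 1 = easy, 3 = hard."""
    hard_markers = ("prove", "analyze", "compare", "draft a contract")
    if any(marker in query.lower() for marker in hard_markers):
        return 3
    return 2 if len(query.split()) > 30 else 1


def route(query: str, catalog: list[ModelOption] = CATALOG) -> ModelOption:
    needed = estimate_difficulty(query)
    # The catalog always contains a tier-3 model, so `eligible` is never empty here.
    eligible = [m for m in catalog if m.tier >= needed]
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


print(route("What is our refund policy?").name)                    # small-model
print(route("Analyze the liability exposure in this lease").name)  # frontier-model
```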

Don't use your most powerful and expensive AI model for every task. A crucial skill is model triage: using cheaper models for simple, routine tasks like monitoring and scheduling, while saving premium models for complex reasoning, judgment, and creative work.

Rather than relying on a single LLM, LexisNexis employs a "planning agent" that decomposes a complex legal query into sub-tasks. It then assigns each task (e.g., deep research, document drafting) to the specific LLM best suited for it, demonstrating a sophisticated, model-agnostic approach for enterprise AI.
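The plan-then-assign structure can be sketched as below. This is not LexisNexis's system: `plan` is a hard-coded stand-in for what would be an LLM call, and the task types and model names in `MODEL_FOR_TASK` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    kind: str         # task type, e.g. "research" or "drafting"
    instruction: str


# Hypothetical mapping from task type to the model judged best suited for it.
MODEL_FOR_TASK = {
    "research": "deep-research-model",
    "drafting": "drafting-model",
    "summary": "small-fast-model",
}
DEFAULT_MODEL = "general-frontier-model"


def plan(query: str) -> list[SubTask]:
    """Stand-in for a planning agent; a real one would ask an LLM to produce this list."""
    return [
        SubTask("research", f"Find authorities relevant to: {query}"),
        SubTask("drafting", f"Draft a memo answering: {query}"),
        SubTask("summary", "Summarize the memo in three bullet points"),
    ]


def assign(subtasks: list[SubTask]) -> list[tuple[str, SubTask]]:
    # Task types without a registered specialist fall back to a general-purpose model.
    return [(MODEL_FOR_TASK.get(t.kind, DEFAULT_MODEL), t) for t in subtasks]


for model, task in assign(plan("Is a non-compete enforceable in California?")):
    print(f"{model:22} <- {task.kind}: {task.instruction}")
```

Keeping the task-to-model table separate from the planner is what makes the approach model-agnostic: swapping a provider means editing one mapping entry.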

To provide high-quality AI insights in real-time without prohibitive costs, Abridge employs a "fast and slow" thinking approach. It uses a constellation of models, where a cheaper, faster model first triages a situation and then hands off complex tasks to a more powerful, expensive model only when necessary.
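The cost side of a fast/slow cascade reduces to simple arithmetic. If every request hits the fast model at cost c_fast and a fraction p is escalated to the slow model at cost c_slow, the average cost is c_fast + p * c_slow, which beats slow-only whenever p < 1 - c_fast / c_slow. The sketch below assumes per-request costs are known constants; the example figures are invented, and the quality side (how well triage decides when to escalate) is not captured here.

```python
def cascade_cost(c_fast: float, c_slow: float, escalation_rate: float) -> float:
    """Average cost per request when every request runs on the fast model and a
    fraction `escalation_rate` is additionally sent to the slow model."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return c_fast + escalation_rate * c_slow


def break_even_rate(c_fast: float, c_slow: float) -> float:
    """Largest escalation rate at which the cascade costs no more than slow-only."""
    return 1.0 - c_fast / c_slow


# Invented per-request costs (in cents) for illustration only.
fast, slow = 1.0, 20.0
print(cascade_cost(fast, slow, 0.10))
print(break_even_rate(fast, slow))
```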

Advanced AI architectures will use small, fast, and cheap local models to act as intelligent routers. These models will first analyze a complex request, formulate a plan, and then delegate different sub-tasks to a fleet of more powerful or specialized models, optimizing for cost and performance.

An intelligent AI orchestration layer can achieve a cost-to-accuracy balance superior to any single model. By routing queries to a portfolio of different models (large, small, specialized), it creates a new Pareto frontier, delivering higher success rates at a lower average cost than relying on one "best" model.
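To make "Pareto frontier" concrete: a configuration is on the frontier if no other configuration is at least as cheap and at least as accurate while being strictly better on one of the two. The helper below computes that set. The numbers in `candidates` are made up purely to exercise the function and are not benchmark results.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    avg_cost: float   # average cost per query (any consistent unit)
    accuracy: float   # fraction of queries answered correctly


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on at least one."""
    no_worse = a.avg_cost <= b.avg_cost and a.accuracy >= b.accuracy
    better = a.avg_cost < b.avg_cost or a.accuracy > b.accuracy
    return no_worse and better


def pareto_frontier(configs: list[Config]) -> list[Config]:
    frontier = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(frontier, key=lambda c: c.avg_cost)


# Made-up numbers purely to exercise the function; not measurements.
candidates = [
    Config("small", 1.0, 0.70),
    Config("specialized", 2.0, 0.78),
    Config("large", 10.0, 0.85),
    Config("router", 3.0, 0.86),
]
for c in pareto_frontier(candidates):
    print(c.name, c.avg_cost, c.accuracy)
```

With these invented figures the routed portfolio dominates the single large model, which is the shape of result the paragraph above describes; real numbers would have to come from evaluating each configuration on the same query set.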

To optimize costs, users configure a powerful model such as Claude Opus as the "brain" that strategizes, then delegate execution tasks (e.g., coding) to cheaper, specialized models such as OpenAI's Codex, treating them as the "muscles".

A hybrid approach to AI agent architecture is emerging. Use the most powerful, expensive cloud models like Claude for high-level reasoning and planning (the "CEO"). Then, delegate repetitive, high-volume execution tasks to cheaper, locally-run models (the "line workers").

An emerging rule from enterprise deployments is to use small, fine-tuned models for well-defined, domain-specific tasks where they excel. Large models should be reserved for generic, open-ended applications with unknown query types where their broad knowledge base is necessary. This hybrid approach optimizes performance and cost.
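The rule above can be expressed as an open-set routing decision: send a query to a fine-tuned specialist only when a classifier confidently recognizes it as one of the well-defined tasks, and otherwise fall back to a large generalist. The registry, model names, `min_score` default, and `toy_classifier` below are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical fine-tuned small models, one per well-defined task.
FINE_TUNED = {
    "invoice_extraction": "ft-small-invoices",
    "clause_classification": "ft-small-clauses",
}
GENERALIST = "large-general-model"

# A classifier returns (task label or None, confidence score).
Classifier = Callable[[str], tuple[Optional[str], float]]


def pick_model(query: str, classify: Classifier, min_score: float = 0.8) -> str:
    label, score = classify(query)
    # Only trust a specialist when the query clearly matches a task it was tuned for.
    if label in FINE_TUNED and score >= min_score:
        return FINE_TUNED[label]
    # Unknown or ambiguous query types go to the broad-knowledge model.
    return GENERALIST


def toy_classifier(query: str) -> tuple[Optional[str], float]:
    q = query.lower()
    if "invoice" in q:
        return "invoice_extraction", 0.95
    if "clause" in q:
        return "clause_classification", 0.6  # deliberately low confidence
    return None, 0.0


print(pick_model("Extract totals from this invoice", toy_classifier))  # ft-small-invoices
print(pick_model("Classify this clause", toy_classifier))              # large-general-model
print(pick_model("Write a product launch email", toy_classifier))      # large-general-model
```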

Hybrid AI Agents Outperform Frontier Models by Using Smart Routing, Not Brute Force | RiffOn