An emerging rule from enterprise deployments is to use small, fine-tuned models for well-defined, domain-specific tasks where they excel. Large models should be reserved for generic, open-ended applications with unknown query types where their broad knowledge base is necessary. This hybrid approach optimizes performance and cost.

Related Insights

For specialized, high-stakes tasks like insurance underwriting, enterprises will favor smaller, on-prem models fine-tuned on proprietary data. These models can be faster, more accurate, and more secure than general-purpose frontier models, creating a lasting market for custom AI solutions.

Use a tiered approach for model selection based on parameter count. Models under 10B are for simple tasks like RAG. The 10-100B range is the sweet spot for agentic systems. Models over 100B parameters are for complex, multi-lingual, enterprise-wide deployments.

The path to robust AI applications isn't a single, all-powerful model. It's a system of specialized "sub-agents," each handling a narrow task like context retrieval or debugging. This architecture allows for using smaller, faster, fine-tuned models for each task, improving overall system performance and efficiency.

The model uses a Mixture-of-Experts (MoE) architecture with over 200 billion parameters, but only activates a "sparse" 10 billion for any given task. This design provides the knowledge base of a massive model while keeping inference speed and cost comparable to much smaller models.

Instead of relying solely on massive, expensive, general-purpose LLMs, the trend is toward creating smaller, focused models trained on specific business data. These "niche" models are more cost-effective to run, less likely to hallucinate, and far more effective at performing specific, defined tasks for the enterprise.

The "agentic revolution" will be powered by small, specialized models. Businesses and public sector agencies don't need a cloud-based AI that can do 1,000 tasks; they need an on-premise model fine-tuned for 10-20 specific use cases, driven by cost, privacy, and control requirements.

Enterprises will shift from relying on a single large language model to using orchestration platforms. These platforms will allow them to 'hot swap' various models—including smaller, specialized ones—for different tasks within a single system, optimizing for performance, cost, and use case without being locked into one provider.

The trend toward specialized AI models is driven by economics, not just performance. A single, monolithic model trained to be an expert in everything would be massive and prohibitively expensive to run continuously for a specific task. Specialization keeps models smaller and more cost-effective for scaled deployment.

While frontier models like Claude excel at analyzing a few complex documents, they are impractical for processing millions. Smaller, specialized, fine-tuned models offer orders of magnitude better cost and throughput, making them the superior choice for large-scale, repetitive extraction tasks.

For specific, high-leverage tasks like conversation summarization and re-ranking search results, Intercom trains its own custom models. These smaller, fine-tuned models have proven to be cheaper, faster, and higher quality than using general-purpose frontier models from vendors.