Unlike companies that resell tokens for every query, Serval uses expensive models once to create a durable script. This automation is executed repeatedly at low cost. This "generate-once, run-many" approach dramatically improves unit economics and insulates the business from high token consumption.
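The economics described above can be sketched as a break-even calculation. All three prices below are hypothetical placeholders chosen for illustration; they are not Serval's actual costs, and the function names are assumptions of this sketch.

```python
# Hypothetical prices, chosen only to illustrate the shape of the trade-off.
GENERATION_COST_USD = 2.00      # assumed one-time cost of having a large model write the script
RUN_COST_USD = 0.001            # assumed compute cost of executing the script once
PER_QUERY_LLM_COST_USD = 0.05   # assumed cost of answering each request with a model call


def cost_generate_once(runs: int) -> float:
    """Total cost when a script is generated once and then executed `runs` times."""
    return GENERATION_COST_USD + runs * RUN_COST_USD


def cost_per_query(runs: int) -> float:
    """Total cost when every one of `runs` requests triggers a fresh model call."""
    return runs * PER_QUERY_LLM_COST_USD


def break_even_runs() -> int:
    """Smallest run count at which generate-once is no more expensive than per-query."""
    runs = 1
    while cost_generate_once(runs) > cost_per_query(runs):
        runs += 1
    return runs
```

Under these assumed numbers the one-time generation cost is recovered after a few dozen runs, and every run after that costs a small fraction of a per-query model call.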

Related Insights

Enterprises are currently overspending on tokens by sending all queries to the most powerful LLMs. A new software category will emerge to intelligently route requests to smaller, cheaper models when possible, creating a critical efficiency and cost-saving layer between companies and foundational model providers.

Contrary to the belief that enterprises have unlimited budgets, they are focused on the ROI of their AI spend. As agentic workflows cause token bills to skyrocket, orchestration tools that intelligently route queries to the most cost-effective model for a given task are becoming essential infrastructure.
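A routing layer of the kind described in the two insights above could look roughly like the following minimal sketch. The model names, prices, keyword list, and length threshold are all hypothetical; a production router would more likely use a trained classifier than a keyword heuristic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float


# Hypothetical models and prices, for illustration only.
SMALL = Model("small-model", 0.0005)
LARGE = Model("large-model", 0.015)

# Assumed signal words suggesting a request needs the stronger model.
HARD_KEYWORDS = frozenset({"prove", "refactor", "multi-step", "plan"})


def route(query: str, max_easy_words: int = 40) -> Model:
    """Send long or keyword-flagged queries to LARGE, everything else to SMALL."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", query.lower())
    looks_hard = len(tokens) > max_easy_words or bool(HARD_KEYWORDS & set(tokens))
    return LARGE if looks_hard else SMALL
```

The design choice here is to default to the cheap model and escalate only on an explicit signal, which is what keeps the average cost per query low.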

It's counterintuitive, but using a more expensive, more capable model like Opus 4.5 can be cheaper overall than using a smaller model. Because the smarter model needs fewer interactions to solve a problem, it can consume fewer tokens in total, which offsets its higher per-token price.
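The arithmetic behind this claim can be sketched as total cost = turns × tokens per turn × price per token. The turn counts, token counts, and prices below are hypothetical, and the sketch assumes a constant number of tokens per turn, which real agent loops with growing context do not guarantee.

```python
def task_cost_usd(turns: int, tokens_per_turn: int, usd_per_million_tokens: float) -> float:
    """Cost of finishing one task, assuming every turn uses the same number of tokens."""
    return turns * tokens_per_turn * usd_per_million_tokens / 1_000_000


# Hypothetical scenario: the small model needs many more attempts to finish the task.
small_model_cost = task_cost_usd(turns=20, tokens_per_turn=4000, usd_per_million_tokens=1.0)
large_model_cost = task_cost_usd(turns=3, tokens_per_turn=4000, usd_per_million_tokens=5.0)
```

With these assumed inputs the model that is five times more expensive per token still finishes the task for less, because it uses fewer than a fifth as many turns.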

Unlike traditional SaaS, achieving product-market fit in AI is not enough for survival. The high and variable costs of model inference mean that as usage grows, companies can scale directly into unprofitability. This makes developing cost-efficient infrastructure a critical moat and survival strategy, not just an optimization.

An effective cost-saving strategy for agentic workflows is to use a powerful model like Claude Opus to perform a complex task once and generate a detailed 'skill.' This skill can then be reliably executed by a much cheaper and faster model like Sonnet for subsequent use.
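The pattern described above can be sketched as a small cache: the strong model is called once per task type to write the skill, and the cheap model handles every later execution. `SkillCache`, its method names, and the prompt wording are hypothetical, and both models are passed in as plain callables rather than real API clients.

```python
from typing import Callable, Dict


class SkillCache:
    """Generate a skill once with a strong model, then execute it with a cheap one."""

    def __init__(self, strong_model: Callable[[str], str], cheap_model: Callable[[str], str]):
        self._strong = strong_model
        self._cheap = cheap_model
        self._skills: Dict[str, str] = {}
        self.strong_calls = 0  # counts how often the expensive model was invoked

    def run(self, task_type: str, task_input: str) -> str:
        # Only the first request for a task type pays for the strong model.
        if task_type not in self._skills:
            prompt = f"Write step-by-step instructions for: {task_type}"
            self._skills[task_type] = self._strong(prompt)
            self.strong_calls += 1
        # Every request, including the first, is executed by the cheap model.
        return self._cheap(self._skills[task_type] + "\n\nInput: " + task_input)
```

In practice the stored skill would also need review and versioning, since every later run inherits whatever mistakes the first generation contains.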

Current AI pricing models, which pass expensive LLM costs on to users, are temporary. As LLM costs inevitably collapse and become commoditized, the winning companies will be those that have already shifted their monetization to the value their product delivers.

The process of 'distillation' involves using a large, expensive LLM to perform a task repeatedly. The resulting prompts and responses then become the training data to create a smaller, specialized, and much cheaper Small Language Model (SLM) that can perform that specific task, potentially saving 90% on inference costs.
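The data-collection half of this process can be sketched as follows: run the large model over a set of prompts and store each prompt and response pair as one training record. The function names and the JSONL record shape are assumptions of this sketch, `fake_teacher` is a stand-in for a real model call, and the fine-tuning step that consumes the file is out of scope.

```python
import json
from typing import Callable, Iterable, List


def build_distillation_set(prompts: Iterable[str], teacher: Callable[[str], str]) -> List[dict]:
    """Ask the teacher model each prompt and keep the pair as a training record."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records one JSON object per line, a common fine-tuning input layout."""
    return "\n".join(json.dumps(r) for r in records)


def fake_teacher(prompt: str) -> str:
    """Stand-in for the large model: labels a support message by topic."""
    return "LABEL:" + ("refund" if "refund" in prompt.lower() else "other")


records = build_distillation_set(["I want a refund", "Where is my order?"], fake_teacher)
training_file_contents = to_jsonl(records)
```

The expensive model is paid for only while this dataset is being built; once the small model is trained on it, inference no longer involves the teacher.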

By training a smaller, specialized model where company data is in the weights, firms avoid the high token costs of repeatedly feeding context to large frontier models. This makes complex, data-intensive workflows significantly cheaper and faster.

AI-native companies grow so rapidly that, at the $100M scale, their cost to acquire an incremental dollar of ARR is about one-quarter that of traditional SaaS. This superior burn multiple makes them more attractive to VCs, even with higher operational costs from tokens.

Despite fears of high AI usage bills, the actual token costs for running multiple customer-facing AI applications can be trivial. SaaStr's entire suite of AI tools, including its AI VP of CS, runs on a total budget of less than $200 per month for all API usage.

Serval's "Generate-Once, Run-Many" Model Bypasses Poor AI Unit Economics | RiffOn