In response to budget blowouts from agentic AI, enterprises are moving beyond simple adoption to active cost management. A new "token efficiency" stack is emerging, featuring tactics like model routing to cheaper alternatives (e.g., DeepSeek) and custom post-trained models to reduce reliance on expensive foundation models.

Related Insights

Faced with rising costs from proprietary labs, sophisticated enterprise clients are building internal evaluation and routing systems. This allows them to use cheaper, open-source models for less complex tasks, optimizing for both cost and performance.

Enterprises are currently overspending on tokens by sending all queries to the most powerful LLMs. A new software category will emerge to intelligently route requests to smaller, cheaper models when possible, creating a critical efficiency and cost-saving layer between companies and foundation model providers.

Enterprises do not have unlimited budgets, contrary to a common belief; they are focused on the ROI of their AI spend. As agentic workflows cause token bills to skyrocket, orchestration tools that intelligently route queries to the most cost-effective model for a given task are becoming essential infrastructure.

As enterprises become more cost-conscious about token spend, they are actively seeking cheaper alternatives to OpenAI and Anthropic. Data from Ramp shows China's DeepSeek is the top trending software vendor, indicating a new willingness to use foreign or open-source models despite potential data privacy concerns.

The most sophisticated AI users aren't locking into one provider. Faced with a 13x annual increase in token costs, they leverage multiple models and routing platforms like OpenRouter to optimize for price and performance. This behavior suggests a future of model commoditization, not monopoly.

The most heated topic among Fortune 500 CIOs is no longer which AI model is most powerful, but how to manage unpredictable and soaring token costs. Companies are struggling to find the right strategies, from workload prioritization to user-based access tiers, to create a predictable cost model in a rapidly evolving tech landscape.

The AI industry has shifted from a subsidized model to a "token shortage" era. This forces all companies, from AI providers to enterprise users like Uber, to prioritize cost-effective usage. Business models are now usage-based, making architectural and financial efficiency paramount.

Companies are building intelligent systems that analyze a user's prompt and automatically route it to the most cost-effective model that can handle the task. This avoids using expensive frontier models for simple requests, with some companies like Coinbase successfully keeping costs flat despite exponential usage growth.
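The routing idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the model names, prices, capability tiers, and the keyword heuristic are hypothetical and do not describe any company's actual system. It only shows the general shape: estimate how demanding a prompt is, then pick the cheapest model rated for that level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str           # hypothetical label, not a real product identifier
    cost_per_1k: float  # assumed price in USD per 1,000 tokens
    capability: int     # assumed tier: 1 = simple tasks, 3 = hardest tasks


# Hypothetical catalog; real prices and tiers would come from evaluation data.
CATALOG = [
    Model("small-open-model", 0.0002, 1),
    Model("mid-tier-model", 0.002, 2),
    Model("frontier-model", 0.03, 3),
]

# Crude stand-in for a real complexity classifier (an assumption).
REASONING_HINTS = ("prove", "refactor", "multi-step", "analyze", "plan")


def required_capability(prompt: str) -> int:
    """Guess the capability tier a prompt needs from simple surface features."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return 3
    if len(text.split()) > 50:
        return 2
    return 1


def route(prompt: str, catalog=CATALOG) -> Model:
    """Return the cheapest model whose tier meets the prompt's estimated need."""
    need = required_capability(prompt)
    eligible = [m for m in catalog if m.capability >= need]
    if not eligible:
        raise ValueError("no model in the catalog can handle this prompt")
    return min(eligible, key=lambda m: m.cost_per_1k)
```

In practice the heuristic would likely be replaced by a trained classifier or an internal evaluation suite; the selection step (cheapest eligible model) stays the same.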

Companies initially gamified AI use, leading to a "token maxing" culture. Now, facing enormous, unexpected bills, they are experiencing "sticker shock." This is forcing a strategic shift from encouraging maximum usage to demanding ROI calculations and finding the most cost-effective AI model for a given task.

After encouraging heavy internal AI usage ("token maxing"), Meta is now launching an efficiency program to control ballooning costs. It's building an "AI Gateway" to track usage, set budgets, and push employees toward cheaper, in-house tools, signaling a broader industry trend of reining in AI spending.
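A gateway of this kind could, in its simplest form, look like the sketch below. The source does not describe how Meta's gateway works, so the class name, the per-team budget, and the fallback rule are all hypothetical. The sketch tracks token usage per team and steers requests to a cheaper model once a team's allowance is exhausted.

```python
from collections import defaultdict


class TokenBudgetGateway:
    """Hypothetical gateway: tracks usage per team and enforces a token budget."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)  # team -> token allowance (assumed unit)
        self.used = defaultdict(int)  # team -> tokens consumed so far

    def remaining(self, team: str) -> int:
        """Tokens left for a team; negative once the team has overspent."""
        return self.budgets.get(team, 0) - self.used[team]

    def choose_model(self, team: str, tokens: int,
                     preferred: str = "frontier-model",
                     fallback: str = "in-house-model") -> str:
        """Record the request and return the model label it should use.

        The preferred model is used only while the request fits in the
        remaining budget; otherwise the cheaper fallback is chosen.
        """
        within_budget = tokens <= self.remaining(team)
        self.used[team] += tokens  # all usage is tracked, whichever model runs
        return preferred if within_budget else fallback
```

For example, a team with a 1,000-token allowance that sends two 600-token requests would get the preferred model for the first and the fallback for the second. A production gateway would also need authentication, per-model pricing, and reporting, none of which is modeled here.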

Enterprises Are Building a "Token Efficiency" Stack to Combat Soaring AI Costs | RiffOn