Get your free personalized podcast brief

We scan new podcasts and send you the top 5 insights daily.

A model with a low per-token price can be more expensive if it's inefficient, verbose, or requires multiple attempts ('overthinking'). The actual invoice depends on the total tokens needed to complete a task, making token efficiency a hidden multiplier that savvy enterprises are now tracking to determine the true cost.

Related Insights

Flat-rate AI plans are becoming economically unviable due to token-hungry agents. Companies like Google and Microsoft are pushing usage-based billing, forcing enterprises to confront the surprisingly high real cost of running models at scale, which was previously hidden by subsidized pricing experiments.

It's counterintuitive, but using a more expensive, intelligent model like Opus 4.5 can be cheaper than smaller models. Because the smarter model is more efficient and requires fewer interactions to solve a problem, it ends up using fewer tokens overall, offsetting its higher per-token price.

When evaluating AI agents, the total cost of task completion is what matters. A model with a higher per-token cost can be more economical if it resolves a user's query in fewer turns than a cheaper, less capable model. This makes "number of turns" a primary efficiency metric.

A paradox exists where the cost for a fixed level of AI capability (e.g., GPT-4 level) has dropped 100-1000x. However, overall enterprise spend is increasing because applications now use frontier models with massive contexts and multi-step agentic workflows, creating huge multipliers on token usage that drive up total costs.

The binary distinction between "reasoning" and "non-reasoning" models is becoming obsolete. The more critical metric is now "token efficiency"—a model's ability to use more tokens only when a task's difficulty requires it. This dynamic token usage is a key differentiator for cost and performance.

While the cost to achieve a fixed capability level (e.g., GPT-4 at launch) has dropped over 100x, overall enterprise spending is increasing. This paradox is explained by powerful multipliers: demand for frontier models, longer reasoning chains, and multi-step agentic workflows that consume exponentially more tokens.

OpenAI's GPT-5.5 is more expensive per token, but a new evaluation framework is emerging. The key metric isn't raw cost, but the model's efficiency in solving a problem. This 'intelligence per dollar' reframes cost analysis around performance and compute, where more expensive models can be cheaper overall if they solve tasks more efficiently.

In complex, multi-step tasks, overall cost is determined by tokens per turn and the total number of turns. A more intelligent, expensive model can be cheaper overall if it solves a problem in two turns, while a cheaper model might take ten turns, accumulating higher total costs. Future benchmarks must measure this turn efficiency.

OpenAI is reportedly exploring outcome-based pricing, where customers are charged only if an AI successfully completes a task. This model shifts from a commodity-like 'cost per 1000 tokens' (CPM) to a value-aligned 'cost per successful action' (CPA), better aligning incentives.

While the cost for GPT-4 level intelligence has dropped over 100x, total enterprise AI spend is rising. This is driven by multipliers: using larger frontier models for harder tasks, reasoning-heavy workflows that consume more tokens, and complex, multi-turn agentic systems.