Anthropic's Fable 5 costs twice as much per token as its predecessor. However, its increased intelligence leads to fewer errors and more direct solutions, reducing the total tokens needed for a task and making the overall cost more competitive.
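To make the arithmetic behind this claim concrete, here is a minimal sketch of the break-even point. The function names, token counts, and prices are hypothetical and chosen for illustration only; none of them come from the podcast or from any published price list.

```python
def break_even_token_ratio(price_multiplier: float) -> float:
    """Fraction of the old model's token usage at which the new model costs the same per task.

    If the new model uses a smaller fraction than this, it is cheaper per task.
    """
    if price_multiplier <= 0:
        raise ValueError("price_multiplier must be positive")
    return 1.0 / price_multiplier


def task_cost(tokens: int, price_per_million: float) -> float:
    """Cost of one task given total tokens used and a price per million tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical numbers: the new model costs 2x per token but needs 40% of the tokens.
old_cost = task_cost(tokens=100_000, price_per_million=10.0)
new_cost = task_cost(tokens=40_000, price_per_million=20.0)

print(break_even_token_ratio(2.0))  # fraction of old token usage at which costs match
print(old_cost, new_cost)
```

Under these assumed numbers, a model priced at twice the per-token rate breaks even once it needs half the tokens, and is cheaper per task below that.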
The common analogy of new models being like faster but less fuel-efficient sports cars is wrong. Anthropic finds that each new model generation brings a step-function improvement in both capability and token processing efficiency, benefiting both customers and internal R&D.
Although the cost per token is falling as models become more efficient, that efficiency gain drives a massive increase in new use cases and overall consumption. This economic principle, the Jevons paradox, explains why total enterprise spending on model inference is rising sharply even as the unit cost falls.
Fable 5's advanced reasoning comes at a steep cost: it consumes tokens and rate limits at twice the rate of previous models. This is presented as an intentional design choice, forcing users to decide whether a task's complexity justifies the significant increase in operational expense.
It's counterintuitive, but using a more expensive, more intelligent model like Opus 4.5 can be cheaper than using a smaller model. Because the smarter model is more efficient and requires fewer interactions to solve a problem, it ends up using fewer tokens overall, offsetting its higher per-token price.
When evaluating AI agents, the total cost of task completion is what matters. A model with a higher per-token cost can be more economical if it resolves a user's query in fewer turns than a cheaper, less capable model. This makes "number of turns" a primary efficiency metric.
Despite a higher price per token, Fable 5 can be more cost-effective in practice. Its ability to solve complex problems correctly on the first try ("one-shot") eliminates the significant token and time costs associated with iterative reprompting, making it cheaper for ambitious projects that require high accuracy.
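The cost of reprompting can be sketched as an expected-value calculation. This is a simplified model, not something stated in the source: it assumes every attempt costs the same and succeeds independently with the same probability, so the expected number of attempts is 1 / success_rate. The per-attempt costs and success rates below are hypothetical.

```python
def expected_cost(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until the first success.

    Assumes independent attempts with a constant success probability,
    so the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical figures: the pricier model usually succeeds on the first try,
# while the cheaper model needs several reprompts on average.
pricier = expected_cost(cost_per_attempt=0.40, success_rate=0.9)
cheaper = expected_cost(cost_per_attempt=0.20, success_rate=0.3)

print(f"pricier model: ${pricier:.2f} expected per solved task")
print(f"cheaper model: ${cheaper:.2f} expected per solved task")
```

The sketch leaves out the time cost of iterating, which the paragraph above also counts against the cheaper model.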
The binary distinction between "reasoning" and "non-reasoning" models is becoming obsolete. The more critical metric is now "token efficiency": a model's ability to spend additional tokens only when a task's difficulty requires it. This dynamic token usage is a key differentiator for cost and performance.
OpenAI's GPT-5.5 is more expensive per token, but a new evaluation framework is emerging. The key metric isn't raw cost, but the model's efficiency in solving a problem. This 'intelligence per dollar' metric reframes cost analysis around performance and compute, where more expensive models can be cheaper overall if they solve tasks more efficiently.
A model with a low per-token price can be more expensive if it's inefficient, verbose ('overthinking'), or requires multiple attempts. The actual invoice depends on the total tokens needed to complete a task, making token efficiency a hidden multiplier that savvy enterprises now track to determine the true cost.
In complex, multi-step tasks, overall cost is determined by tokens per turn and the total number of turns. A more intelligent, expensive model can be cheaper overall if it solves a problem in two turns, while a cheaper model might take ten turns, accumulating higher total costs. Future benchmarks must measure this turn efficiency.
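As a rough illustration of the two-turn versus ten-turn comparison, the sketch below multiplies turns by tokens per turn by price. The class name, token counts, and prices are hypothetical. The model also deliberately assumes a constant token count per turn, ignoring the way context typically grows as a conversation gets longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    """One completed task, described by turn count, average tokens per turn, and price."""
    turns: int
    tokens_per_turn: int
    price_per_million: float

    def total_cost(self) -> float:
        # Simplification: every turn is assumed to use the same number of tokens.
        total_tokens = self.turns * self.tokens_per_turn
        return total_tokens / 1_000_000 * self.price_per_million


# Hypothetical runs: the pricier model finishes in two turns, the cheaper one in ten.
smart = AgentRun(turns=2, tokens_per_turn=8_000, price_per_million=20.0)
cheap = AgentRun(turns=10, tokens_per_turn=8_000, price_per_million=10.0)

print(f"smart: ${smart.total_cost():.2f}, cheap: ${cheap.total_cost():.2f}")
```

If per-turn context did accumulate, the gap would likely widen in favor of the model that needs fewer turns, since later turns would carry more tokens.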