
When an efficient model like DeepSeek was released, Nebius's stock fell on fears of reduced compute demand. Internally, the company had its best sales week ever. Cheaper intelligence makes new products economically viable, which increases overall compute consumption rather than decreasing it.

Related Insights

While the cost per token is decreasing as models become more efficient, this efficiency gain drives a massive increase in new use cases and overall consumption. This economic principle, the Jevons paradox, explains why total enterprise spending on model inference is skyrocketing even as the unit cost falls.
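
The mechanism can be illustrated with a toy demand curve. The sketch below assumes a constant-elasticity demand function, which is a textbook simplification and not something the insight itself specifies; the prices, the `scale` parameter, and the elasticity values are all hypothetical. It shows that total spend rises after a price cut only when the elasticity of demand is greater than 1.

```python
def demand(price: float, elasticity: float, scale: float = 1.0) -> float:
    """Constant-elasticity demand: quantity = scale * price ** (-elasticity)."""
    return scale * price ** (-elasticity)


def total_spend(price: float, elasticity: float, scale: float = 1.0) -> float:
    """Total spend is the unit price times the quantity demanded at that price."""
    return price * demand(price, elasticity, scale)


if __name__ == "__main__":
    # Hypothetical scenario: the unit price falls 10x, from 1.0 to 0.1.
    for elasticity in (0.5, 1.0, 1.5):
        before = total_spend(1.0, elasticity)
        after = total_spend(0.1, elasticity)
        print(f"elasticity={elasticity}: spend changes by {after / before:.2f}x")
```

Under this assumption, the Jevons-style outcome (falling unit cost, rising total spend) corresponds to the elasticity-above-1 case; with elasticity below 1 the same price cut would shrink total spend.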

While the cost per AI query drops, companies find more complex, compute-intensive uses for it. This elasticity of demand means total AI spending becomes a significant and variable operational expense, similar to a utility bill, rather than a predictable software cost.

While an AI bubble seems negative, the overproduction of compute power creates a favorable environment for companies that consume it. As prices for compute drop, their cost of goods sold decreases, leading to higher gross margins and better business fundamentals.

The comparison of the AI hardware buildout to the dot-com "dark fiber" bubble is flawed because there are no "dark GPUs": all compute is being used. As hardware efficiency improves and token costs fall, the cheaper tokens will unlock countless new AI applications (the Jevons paradox), ensuring that demand continues to absorb all available supply.

While the growth of new consumer AI users is slowing into an S-curve, the compute consumption per user is still growing exponentially. This is driven by the shift from simple queries to complex, token-intensive tasks like reasoning and agents, sustaining massive demand for GPU infrastructure.
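
A small model makes the interaction of the two curves concrete. This sketch assumes a logistic curve for user count and a plain exponential for per-user token consumption; the cap, rates, midpoint, and base values are hypothetical placeholders chosen only to show the shape, not measured figures.

```python
import math


def users(t: float, cap: float = 100e6, rate: float = 1.0, midpoint: float = 3.0) -> float:
    """Logistic (S-curve) user count that saturates at `cap` (hypothetical values)."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))


def tokens_per_user(t: float, base: float = 1e4, growth: float = 0.5) -> float:
    """Per-user token consumption growing exponentially over time (hypothetical values)."""
    return base * math.exp(growth * t)


def total_tokens(t: float) -> float:
    """Aggregate demand is the user count times per-user consumption."""
    return users(t) * tokens_per_user(t)


if __name__ == "__main__":
    for year in range(0, 10, 3):
        print(f"t={year}: users={users(year):.3e}, total tokens={total_tokens(year):.3e}")
```

Once the user curve flattens, nearly all of the growth in total demand comes from the per-user term, which is the pattern the insight describes.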

The demand for AI inference is insatiable. As models become cheaper and more efficient, developers and businesses find more ways to embed intelligence, creating a perpetually growing market. Even with AGI, the core need will be running inference.

The cost of AI, priced in "tokens by the drink," is falling dramatically. All inputs are on a downward cost curve, leading to a hyper-deflationary effect on the price of intelligence. This, in turn, fuels massive demand elasticity as more use cases become economically viable.

The future of compute demand is a tale of two opposing forces. Enterprises will use AI to compress redundant data and streamline operations, reducing compute costs. Consumers, however, will demand generative AI for entertainment and personalization (e.g., 'Star Wars with my face'), creating massive new compute needs.

While the cost for GPT-4 level intelligence has dropped over 100x, total enterprise AI spend is rising. This is driven by multipliers: using larger frontier models for harder tasks, reasoning-heavy workflows that consume more tokens, and complex, multi-turn agentic systems.
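
The arithmetic behind this can be sketched as a product of multipliers set against the unit-price drop. Only the 100x price drop comes from the text above; the three multiplier values and their names are hypothetical, picked to show how the factors compound.

```python
from math import prod

# From the text: GPT-4-level intelligence is "over 100x" cheaper per unit.
UNIT_PRICE_DROP = 100.0

# Hypothetical multipliers, one per driver named in the text.
MULTIPLIERS = {
    "frontier_model_price_premium": 5.0,  # larger models for harder tasks
    "reasoning_token_overhead": 10.0,     # extra tokens spent on reasoning
    "agent_turns_per_task": 20.0,         # multi-turn agentic workflows
}


def net_spend_change(unit_price_drop: float, multipliers: dict[str, float]) -> float:
    """Net change in spend per task: compounded multipliers divided by the price drop."""
    return prod(multipliers.values()) / unit_price_drop


if __name__ == "__main__":
    change = net_spend_change(UNIT_PRICE_DROP, MULTIPLIERS)
    print(f"Net spend per task changes by {change:.1f}x")
```

With these illustrative values the multipliers compound to 1000x, so spend per task still rises despite the 100x cheaper unit price. Smaller multipliers would give a net decrease, so the outcome depends entirely on how the drivers compound.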

The success of personal AI assistants signals a massive shift in compute usage. While training models is resource-intensive, the next 10x in demand will come from widespread, continuous inference as millions of users run these agents. This effectively means consumers are buying fractions of datacenter GPUs like the GB200.

Cheaper AI Models Don't Kill Compute Demand; They Create an Explosion of New Use Cases | RiffOn