While growth in new consumer AI users is slowing into an S-curve, compute consumption per user is still growing exponentially. The driver is the shift from simple queries to complex, token-intensive tasks like reasoning and agents, which sustains massive demand for GPU infrastructure.
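
To make that arithmetic concrete, here is a minimal sketch, assuming an illustrative 1.5B-user S-curve and per-user token consumption compounding 3x per year (both parameters are assumptions for illustration, not figures from the episode): total demand keeps climbing even after adoption saturates.

```python
import math

def users(year: float) -> float:
    """Logistic (S-curve) adoption saturating near 1.5B users.
    All parameters are illustrative assumptions."""
    cap, midpoint, steepness = 1.5e9, 2.0, 1.2
    return cap / (1 + math.exp(-steepness * (year - midpoint)))

def tokens_per_user(year: float) -> float:
    """Per-user consumption compounding ~3x per year as simple queries
    give way to reasoning chains and agents (assumed growth rate)."""
    return 1e6 * 3.0 ** year  # tokens per user per year

for year in range(6):
    total = users(year) * tokens_per_user(year)
    print(f"year {year}: {users(year):.2e} users x "
          f"{tokens_per_user(year):.1e} tok/user = {total:.2e} tokens")
```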

Related Insights

While focus is on massive supercomputers for training next-gen models, the real supply chain constraint will be 'inference' chips—the GPUs needed to run models for billions of users. As adoption goes mainstream, demand for everyday AI use will far outstrip the supply of available hardware.

Unlike the dot-com bubble's speculative fiber build-out, which left behind unused "dark fiber," today's AI infrastructure boom sees every GPU put to use immediately. This signals that the massive investment is driven by tangible, present demand for AI computation, not future speculation.

The frenzy over buying Mac Minis to run Moltbot is a "sideshow." The true economic impact is the massive increase in GPU/TPU demand for inference: each user running a persistent personal agent is effectively consuming the output of a dedicated data center chip, not just a local machine.

The focus in AI has shifted from rapid software capability gains to the physical constraints on adoption. Demand for compute is expected to significantly outstrip supply, making infrastructure, not algorithms, the defining bottleneck for future growth.

Unlike the dot-com era's speculative infrastructure buildout for non-existent users, today's AI CapEx is driven by proven demand. Profitable giants like Microsoft and Google are scrambling to meet active workloads from billions of users, indicating a compute bottleneck, not a hype cycle.

The comparison of the AI hardware buildout to the dot-com "dark fiber" bubble is flawed because there are no "dark GPUs": all available compute is being used. And as hardware efficiency improves and token costs fall, the Jevons paradox predicts that cheaper compute will unlock countless new AI applications, ensuring demand continues to absorb all available supply.

The future of compute demand is a tale of two opposing forces. Enterprises will use AI to compress redundant data and streamline operations, reducing compute costs. Consumers, however, will demand generative AI for entertainment and personalization (e.g., 'Star Wars with my face'), creating massive new compute needs.

While the cost to achieve a fixed capability level (e.g., GPT-4 at launch) has dropped over 100x, overall enterprise spending is increasing. This paradox is explained by powerful multipliers: demand for frontier models, longer reasoning chains, and multi-step agentic workflows that consume exponentially more tokens.
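
A back-of-envelope version of that multiplier argument, with hypothetical factors (the 5x frontier premium, 10x reasoning-chain blow-up, and 20x agentic-step count below are assumptions chosen only to show the shape of the math, not sourced figures):

```python
# Hypothetical multiplier math for the cost paradox; shows how a 100x
# per-token price drop can coexist with rising total spend.
cost_per_token = 1 / 100   # fixed-capability cost: 100x cheaper than at GPT-4's launch
frontier_premium = 5       # assumed cost premium for frontier models over the old tier
reasoning_chain = 10       # assumed token blow-up from longer reasoning chains
agent_steps = 20           # assumed model calls per multi-step agentic workflow

relative_spend = cost_per_token * frontier_premium * reasoning_chain * agent_steps
print(f"spend per task vs. the old baseline: {relative_spend:.0f}x")  # 10x higher
```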

AI's computational needs do not stop at initial training. They compound through post-training (reinforcement learning) and inference (multi-step reasoning), creating a much larger demand profile than previously understood and driving a billion-fold increase in compute.
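
One way to see the compounding, as a sketch with assumed magnitudes (the compute ratios and query counts below are illustrative, not measurements): once post-training and lifetime inference are counted, the pre-training run becomes a small slice of the total.

```python
# Illustrative decomposition of lifetime compute for one model generation.
pretraining = 1.0               # normalize the pre-training run to 1
post_training_rl = 0.5          # assumed RL/post-training overhead vs. pre-training
per_query = 1e-12               # assumed inference cost per query, relative to training
reasoning_multiplier = 30       # assumed token blow-up from multi-step reasoning
queries_over_lifetime = 1e12    # assumed total queries served by the deployed model

inference = per_query * reasoning_multiplier * queries_over_lifetime
total = pretraining + post_training_rl + inference
print(f"inference share of lifetime compute: {inference / total:.0%}")  # ~95%
```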

The success of personal AI assistants signals a massive shift in compute usage. While training models is resource-intensive, the next 10x in demand will come from widespread, continuous inference as millions of users run these agents. This effectively means consumers are buying fractions of datacenter GPUs like the GB200.
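
A rough sizing of that "fraction of a GB200" claim, assuming a 5,000 tokens/sec sustained chip throughput and a 50M-tokens/day persistent agent (both figures are assumptions for illustration, not vendor specs or numbers from the episode):

```python
# Rough sizing of 'a fraction of a datacenter GPU per user'.
gpu_throughput = 5_000          # assumed sustained tokens/sec for one GB200-class chip
agent_tokens_per_day = 50e6     # assumed tokens/day for an always-on personal agent

agent_rate = agent_tokens_per_day / 86_400   # average tokens/sec per user
fraction = agent_rate / gpu_throughput
print(f"each always-on agent ~ {fraction:.1%} of one chip "
      f"-> one chip serves ~{1 / fraction:.0f} such users")
```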