The demand for AI inference is insatiable. As models become cheaper and more efficient, developers and businesses find more ways to embed intelligence, creating a perpetually growing market. Even with AGI, the core need will be running inference.
Despite the hype around enterprise AI, the vast majority of current inference workloads are driven by new, AI-native application companies. This suggests that broader enterprise adoption is still in its infancy, leaving a massive growth opportunity ahead.
Analysts distinguish between initial revenue from training large language models (LLMs) and more sustainable, long-term revenue from 'inference': the actual use of AI applications by end-market companies. The latter, such as a bank running an AI chatbot, signals true market adoption and is considered the more valuable, 'sticky' revenue base.
Previously, the biggest constraint in AI was compute for training next-generation models. Now, the critical bottleneck is providing enough compute for *inference*: serving real-time queries from a rapidly growing user base.
The inference market is too large to remain monolithic. It will fragment into specialized platforms for different use cases such as real-time video, long-running agents, or language models. This specialization will extend to hardware: high-throughput tasks that can tolerate higher latency (like agents) will favor cheaper AMD and Intel chips over NVIDIA's top GPUs.
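A toy batching model makes the throughput-versus-latency tradeoff behind that hardware split concrete. This is a minimal sketch with made-up illustrative constants (the overhead and per-request costs are assumptions, not measurements): a latency-tolerant agent backend can wait to fill large batches, amortizing fixed per-pass overhead, while an interactive endpoint must answer each request immediately.

```python
# Toy cost model; the constants are illustrative assumptions, not benchmarks.
# Each forward pass pays a fixed overhead (kernel launches, weight traffic)
# plus a small per-request cost, so larger batches amortize the overhead.
FIXED_OVERHEAD_S = 0.050   # seconds of fixed cost per forward pass
PER_ITEM_S = 0.005         # seconds of marginal cost per request in a batch

def pass_time(batch: int) -> float:
    """Simulated wall-clock time for one batched forward pass."""
    return FIXED_OVERHEAD_S + PER_ITEM_S * batch

def stats(batch: int, fill_wait_s: float) -> tuple[float, float]:
    """Peak throughput (req/s) and per-request latency (s) when the server
    waits up to fill_wait_s to accumulate a full batch before running it."""
    throughput = batch / pass_time(batch)
    latency = fill_wait_s + pass_time(batch)
    return throughput, latency

# An interactive chat endpoint answers each request alone and immediately;
# a hypothetical agent backend can afford to wait 200 ms to fill a batch of 64.
for name, batch, wait in [("interactive", 1, 0.0), ("agent batch", 64, 0.200)]:
    t, l = stats(batch, wait)
    print(f"{name:12s} batch={batch:3d} throughput={t:6.1f} req/s latency={l:.3f} s")
```

Under these assumed numbers, the batched agent path delivers roughly ten times the throughput per device in exchange for roughly ten times the latency, which is why latency-insensitive workloads can be served profitably on cheaper, less latency-optimized silicon.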
CoreWeave, a major AI infrastructure provider, reports that its compute mix has shifted from roughly two-thirds training to nearly half inference. This indicates the AI industry is moving beyond model creation to real-world application and monetization, a crucial sign of enterprise adoption and market maturity.
The true commercial impact of AI will likely come from small, specialized "micro models" solving boring, high-volume business tasks. These models are highly valuable in aggregate, but because each one is cheap to run, they cannot economically justify the current massive capital expenditure on AGI-focused data centers.
While the most powerful AI will reside in large "god models" (like supercomputers), the majority of the market volume will come from smaller, specialized models. These will cascade down in size and cost, eventually being embedded in every device, much like microchips proliferated from mainframes.
The most profound near-term shift from AI won't be a single killer app, but rather constant, low-level cognitive support running in the background. Having an AI provide a 'second opinion for everything,' from reviewing contracts to planning social events, will allow people to move faster and with more confidence.
Don't assume that a "good enough" cheap model will satisfy all future needs. Jeff Dean argues that as AI models become more capable, users' expectations and the complexity of their requests grow in tandem. This creates a perpetual need to push the performance frontier, as today's complex tasks become tomorrow's standard expectations.
As AI models become commodities, the speed and efficiency of the underlying inference hardware become the true differentiators. The company that powers the fastest AI experiences will win, much as Google won with fast search, because there is no market for slow AI.