Companies like Baseten and OpenRouter are securing billion-dollar valuations, signaling a major investment shift. The market now prioritizes the "inference layer" (serving and routing AI models in production) over just training them, as this is where recurring costs and value are generated at scale.
As chip manufacturers like NVIDIA release new hardware, inference providers like Baseten absorb the complexity and engineering effort required to optimize AI models for the new chips. This service is a key value proposition, saving customers from the challenging process of re-optimizing workloads for new hardware.
The recent explosion in AI agent usage is a key driver behind the massive funding rounds for inference providers like Baseten. Agents, which can be autonomous and perform complex tasks, "gobble up" significantly more compute resources and tokens than previous AI applications, directly boosting revenue for the companies that run the underlying models.
The AI value stack has evolved from chips (NVIDIA) to models (OpenAI). The next critical phase is the application layer. It's unclear if value will be captured by new application companies or if the underlying model providers will absorb all the profits, a key question for investors and founders.
Analysts distinguish between initial revenue from training large language models (LLMs) and more sustainable, long-term revenue from 'inference'—the actual use of AI applications by end-market companies. The latter, like a bank using an AI chatbot, signals true market adoption and is considered the more valuable, 'sticky' revenue base.
The demand for AI inference is insatiable. As models become cheaper and more efficient, developers and businesses find more ways to embed intelligence, creating a perpetually growing market. Even with AGI, the core need will be running inference.
Previously, the biggest constraint in AI was compute for training next-gen models. Now, the critical bottleneck is providing enough compute for *inference*—the real-time processing of queries from a rapidly growing user base.
While training has been the focus, user experience and revenue happen at inference. OpenAI's massive deal with chip startup Cerebras is for faster inference, showing that response time is a critical competitive vector that determines whether AI becomes utility infrastructure or remains a novelty.
CoreWeave, a major AI infrastructure provider, reports its compute workload is shifting from two-thirds training to nearly 50% inference. This indicates the AI industry is moving beyond model creation to real-world application and monetization, a crucial sign of enterprise adoption and market maturity.
The joint venture between Google and Blackstone is likely not aimed at the crowded AI training market. Instead, it appears to be a strategic play for the rapidly growing inference market, where demand for running open-source models is exploding and requires different infrastructure.
As foundational AI models become commoditized 'intelligence utilities,' the economic value moves up the stack. Orchestrators like OpenClaw, which can intelligently route tasks to the most efficient model based on cost or use case, are positioned to capture the margin that the underlying model providers cannot.
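To make the routing idea concrete, here is a minimal sketch of cost-based routing: pick the cheapest model that can handle a given task type. It is an illustration under stated assumptions, not a description of how OpenClaw or any real orchestrator works. The model names, prices, and capability tags are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                   # hypothetical model identifier
    cost_per_1k_tokens: float   # illustrative price in USD, not real pricing
    capabilities: frozenset     # task types this model is assumed to handle


def route(task_type: str, models: list) -> ModelOption:
    """Return the cheapest model whose capabilities include task_type."""
    eligible = [m for m in models if task_type in m.capabilities]
    if not eligible:
        raise ValueError(f"no model supports task type {task_type!r}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Hypothetical catalog: cheaper models cover fewer task types.
CATALOG = [
    ModelOption("small-model", 0.10, frozenset({"summarize", "classify"})),
    ModelOption("mid-model", 0.50, frozenset({"summarize", "classify", "code"})),
    ModelOption(
        "large-model", 3.00,
        frozenset({"summarize", "classify", "code", "reason"}),
    ),
]

if __name__ == "__main__":
    for task in ("classify", "code", "reason"):
        print(task, "->", route(task, CATALOG).name)
```

In this toy setup, simple tasks go to the cheapest model and only tasks that need more capability reach the expensive one. A real orchestrator would likely also weigh latency, quality, and availability.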