World models are more computationally intense than language models: they demand far more computation (FLOPs) relative to memory access. This distinct computational pattern will create a market for chips optimized specifically for these workloads, diverging from the current hardware landscape built for LLMs.
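The ratio of computation to memory access is often reasoned about with a roofline-style estimate. The sketch below is one way to picture it, not a description of any real chip: the peak compute, memory bandwidth, and workload intensities are all hypothetical numbers chosen for illustration.

```python
def attainable_flops(intensity, peak_flops, mem_bandwidth):
    """Roofline estimate: throughput is capped either by the chip's peak
    compute or by how fast memory can feed it (bandwidth * FLOPs per byte)."""
    return min(peak_flops, intensity * mem_bandwidth)


def ridge_point(peak_flops, mem_bandwidth):
    """Intensity (FLOPs per byte) above which a workload is compute-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical accelerator, not a real product.
PEAK_FLOPS = 1.0e15      # 1 PFLOP/s of peak compute
MEM_BANDWIDTH = 2.0e12   # 2 TB/s of memory bandwidth

# Hypothetical workloads, described only by FLOPs per byte moved.
workloads = [
    ("memory-heavy workload", 50.0),
    ("compute-heavy workload", 2000.0),
]

for name, intensity in workloads:
    perf = attainable_flops(intensity, PEAK_FLOPS, MEM_BANDWIDTH)
    ridge = ridge_point(PEAK_FLOPS, MEM_BANDWIDTH)
    bound = "compute-bound" if intensity >= ridge else "memory-bound"
    print(f"{name}: {perf:.2e} FLOP/s attainable ({bound})")
```

Under these assumed numbers, a chip with more compute per unit of bandwidth helps only the workload sitting above the ridge point, which is the sense in which a compute-heavy workload could favor a different hardware balance.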

Related Insights

The AI inference process involves two distinct phases: "prefill" (reading the prompt, which is compute-bound) and "decode" (writing the response, which is memory-bound). NVIDIA GPUs excel at prefill, while companies like Groq optimize for decode. The Groq-NVIDIA deal signals a future of specialized, complementary hardware rather than one-size-fits-all chips.
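A rough way to see why prefill tends to be compute-bound and decode memory-bound is to count FLOPs per byte for a single weight-matrix multiply. This is a simplified sketch: it assumes 2-byte values, a batch of one request, and a hypothetical 4096 x 4096 layer, and it ignores attention, the KV cache, and on-chip caching.

```python
def linear_layer_intensity(num_tokens, d_in, d_out, bytes_per_value=2):
    """FLOPs per byte for multiplying num_tokens activations by a
    d_in x d_out weight matrix (illustrative cost model only)."""
    flops = 2 * num_tokens * d_in * d_out             # one multiply-add = 2 FLOPs
    weight_bytes = bytes_per_value * d_in * d_out     # weights are read once
    activation_bytes = bytes_per_value * num_tokens * (d_in + d_out)  # inputs + outputs
    return flops / (weight_bytes + activation_bytes)


# Hypothetical layer size and prompt length.
D = 4096
prefill = linear_layer_intensity(num_tokens=2048, d_in=D, d_out=D)  # whole prompt at once
decode = linear_layer_intensity(num_tokens=1, d_in=D, d_out=D)      # one new token per step

print(f"prefill: {prefill:.1f} FLOPs/byte")
print(f"decode:  {decode:.3f} FLOPs/byte")
```

In this toy model, prefill reuses each loaded weight across thousands of tokens, while decode reloads the full weight matrix to produce a single token, so its work per byte is roughly a thousand times lower.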

Today's AI is largely text-based (LLMs). The next phase involves Vision-Language Models (VLMs) that interpret and interact with the physical world for robotics and surgery. This transition requires a 50-1000x increase in compute power, underwriting the long-term AI infrastructure build-out.

Large Language Models are limited because they lack an understanding of the physical world. The next evolution is 'World Models'—AI trained on real-world sensory data to understand physics, space, and context. This is the foundational technology required to unlock physical AI like advanced robotics.

The intense power demands of AI inference will push data centers to adopt the "heterogeneous compute" model from mobile phones. Instead of a single GPU architecture, data centers will use disaggregated, specialized chips for different tasks to maximize power efficiency, creating a post-GPU era.

The era of dual-purpose AI chips is ending. The overwhelming demand for real-time processing from AI agents is forcing companies like Google and NVIDIA to create dedicated, inference-optimized hardware. This marks a fundamental and permanent split in the AI infrastructure market, separating training from inference.

The rise of agent orchestration using specialized, open-source models will drive demand for custom ASICs. Jerry Murdock argues that putting a model on a dedicated chip will be far cheaper and more tunable for specific workloads than using expensive, general-purpose GPUs like NVIDIA's, spurring a hardware shift.

The inference market is too large to remain monolithic. It will fragment into specialized platforms for different use cases like real-time video, long-running agents, or language models. This specialization will extend to hardware, with high-throughput tasks that tolerate higher latency (like agents) favoring cheaper AMD/Intel chips over NVIDIA's top GPUs.

The AI hardware market is splitting into two distinct segments: training and inference. While NVIDIA dominates training, the larger, long-term opportunity lies in inference. This is creating a market for specialized, memory-optimized chips from companies like Cerebras and Groq designed for running models efficiently.

While NVIDIA currently has a stranglehold on AI compute, this dominance won't last. The industry will move towards specialization, with new architectures and ASICs designed for specific tasks like inference (e.g., Cerebras) or with neural network weights baked in. This will fragment the market.

At a massive scale, chip design economics flip. For a $1B training run, the potential efficiency savings on compute and inference can far exceed the ~$200M cost to develop a custom ASIC for that specific task. The bottleneck becomes chip production timelines, not money.
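The break-even arithmetic behind this claim can be sketched in a few lines. The $1B and ~$200M figures come from the insight above. The 50% savings rate is a hypothetical input chosen only to show the calculation, and the model ignores per-chip manufacturing cost and schedule risk.

```python
def breakeven_savings_fraction(workload_cost, asic_dev_cost):
    """Fraction of the workload cost a custom chip must save to pay for itself."""
    return asic_dev_cost / workload_cost


def net_savings(workload_cost, asic_dev_cost, savings_fraction):
    """Money saved after subtracting the one-time chip development cost."""
    return workload_cost * savings_fraction - asic_dev_cost


TRAINING_RUN_COST = 1.0e9   # $1B training run (figure from the text)
ASIC_DEV_COST = 2.0e8       # ~$200M custom ASIC development (figure from the text)
ASSUMED_SAVINGS = 0.5       # hypothetical 50% efficiency gain, not from the source

breakeven = breakeven_savings_fraction(TRAINING_RUN_COST, ASIC_DEV_COST)
net = net_savings(TRAINING_RUN_COST, ASIC_DEV_COST, ASSUMED_SAVINGS)

print(f"break-even savings: {breakeven:.0%}")
print(f"net savings at {ASSUMED_SAVINGS:.0%} efficiency gain: ${net:,.0f}")
```

With these inputs, the custom chip pays for itself once it cuts the cost of the run by a fifth, and anything beyond that is net gain, provided the chip arrives in time to be used.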