Intel is using less expensive LPDDR memory in its new AI chip to compete on cost in the inference market, not performance in the training market dominated by Nvidia. This niche strategy aims to capture cost-sensitive customers and potentially the restricted China market.

Related Insights

The next wave of AI silicon may pivot from today's compute-heavy architectures to memory-centric ones optimized for inference. This fundamental shift would allow high-performance chips to be produced on older, more accessible 7-14nm manufacturing nodes, disrupting the current dependency on cutting-edge fabs.

Despite its high valuation post-IPO, AI chipmaker Cerebras's long-term strategy focuses on inference, not just training. The bet is that inference will become a much larger segment of the AI compute market. By developing chips specifically optimized for this task, Cerebras aims to take significant market share from NVIDIA.

Nvidia dominates the AI training chip market, but training represents only about 1% of the total compute workload; the other 99% is inference. Nvidia's risk is that competitors' chips and customers' in-house silicon will deliver cheaper, more efficient inference, bifurcating the market and eroding its monopoly.

China is compensating for its deficit in cutting-edge semiconductors by pursuing an asymmetric strategy. It focuses on massive 'superclusters' of less advanced domestic chips and creating hyper-efficient, open-source AI models. This approach prioritizes widespread, low-cost adoption over chasing the absolute peak of performance like the US.

The AI narrative has focused on GPUs for training, but the proliferation of AI agents for task execution is creating a massive, overlooked demand for CPUs. This shift to inference and orchestration is reversing Intel's recent decline.

The inference market is too large to remain monolithic. It will fragment into specialized platforms for different use cases like real-time video, long-running agents, or language models. This specialization will extend to hardware, with high-throughput tasks that tolerate higher latency (like agents) favoring cheaper AMD/Intel chips over NVIDIA's top GPUs.

Unlike general-purpose NVIDIA GPUs, Microsoft's custom Maia 200 chip focuses specifically on running existing AI models (inference). Microsoft claims this makes it cheaper for certain tasks, like its own Copilot tools, creating a cost-saving value proposition for potential customers like Anthropic.

Microsoft's new AI chip is not designed as an "NVIDIA killer" for the open market. Instead, it's optimized for internal use within its hyperscaler fleet, prioritizing performance-per-dollar and efficiency—operating at half the power of NVIDIA's Blackwell—for its own inference workloads.

The AI hardware market is splitting into two distinct segments: training and inference. While NVIDIA dominates training, the larger, long-term opportunity lies in inference. This is creating a market for specialized, memory-optimized chips from companies like Cerebras and Groq designed for running models efficiently.

Previously, the bottleneck for AI labs was researcher time, making Nvidia's easy-to-use CUDA ecosystem dominant. Now, the biggest cost is compute capacity itself, creating massive economic incentives for labs to adopt cheaper, even if less convenient, competing chips from AMD or Google.