A critical, under-discussed constraint on Chinese AI progress is the compute bottleneck caused by inference. Chinese firms' massive user bases consume the available GPU capacity serving requests, leaving little compute for the R&D and training runs needed to innovate and improve their models.
The performance gap between US and Chinese AI models may be widening due to second-order effects of chip controls. By limiting inference at scale, the controls reduce the volume of customer interactions and feedback Chinese firms receive. This starves them of the data needed to identify and patch model weaknesses on diverse, real-world tasks.
Facing semiconductor shortages, China is pursuing a distinct AI development path. Instead of competing directly on compute, it is leveraging national strengths in vast data sets, a large talent pool, and abundant power infrastructure to drive AI progress and support a medium-term chip-localization strategy.
China is gaining an efficiency edge in AI through "distillation": using a large "teacher" model's outputs to train smaller, cheaper "student" models. This teacher-student approach is far faster than training from scratch and challenges the capital-intensive US strategy, highlighting how inefficient and "bloated" current Western foundation models are.
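A minimal sketch makes the mechanics concrete. The PyTorch code below assumes toy model sizes, a temperature of 2.0, and random stand-in batches; none of these reflect any lab's actual recipe, only the general teacher-student pattern.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen "teacher" (large) and trainable "student" (small); sizes are toy values.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens distributions so the student sees more signal

for step in range(100):
    x = torch.randn(32, 128)  # stand-in for real input batches
    with torch.no_grad():
        teacher_logits = teacher(x)  # teacher provides targets; no gradient needed
    student_logits = student(x)
    # KL divergence between softened distributions, scaled by T^2 (Hinton et al.)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Note that the student needs only the teacher's outputs, not its weights or training data, which is what makes distillation so much cheaper than training from scratch.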
The focus in AI has shifted from rapid software capability gains to the physical constraints on adoption. Demand for compute is expected to significantly outstrip supply, making infrastructure, not algorithms, the defining bottleneck for future growth.
A primary risk for major AI infrastructure investments is not just competition but rapidly falling inference costs. As models become efficient enough to run on cheaper hardware, the economic justification for multi-billion-dollar investments in complex, high-end GPU clusters could be undermined, stranding capital.
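A toy payback calculation shows how this could strand capital. Every figure below (capex, first-year inference margin, the rate at which margins decay as cheaper hardware catches up) is an assumption chosen purely for illustration.

```python
# All figures are assumptions for illustration, not estimates for any real cluster.
CAPEX_USD = 5e9          # assumed up-front cluster investment
ANNUAL_MARGIN_USD = 1e9  # assumed first-year gross margin from inference
MARGIN_DECAY = 0.5       # assumed: margins halve yearly as cheaper hardware catches up

recovered, year, margin = 0.0, 0, ANNUAL_MARGIN_USD
while year < 10 and recovered < CAPEX_USD:
    recovered += margin
    margin *= MARGIN_DECAY
    year += 1

print(f"recovered ${recovered / 1e9:.2f}B of ${CAPEX_USD / 1e9:.0f}B after {year} years")
# Under these assumptions cumulative margin converges to ~$2B, leaving ~$3B
# of the original capex permanently stranded.
```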
Echoing Don Valentine's VC wisdom that 'scarcity sparks ingenuity,' US restrictions on advanced chips are compelling Chinese firms to become hyper-efficient at optimizing older hardware. This necessity-driven innovation could allow them to build a more resilient and cost-effective AI ecosystem, posing a long-term competitive threat.
China can compensate for less energy-efficient domestic AI chips by drawing on its vast and rapidly expanding power grid. Since the primary trade-off with lower-end chips is energy efficiency, matching frontier compute mostly means buying more chips and paying a larger power bill, and China's ability to absorb that cost allows it to scale large-model training despite semiconductor limitations.
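A back-of-envelope sketch of that trade-off, with all numbers (training energy, efficiency penalty, electricity price) assumed for illustration only:

```python
# All numbers are illustrative assumptions, not measured figures.
TRAINING_ENERGY_MWH = 10_000  # assumed energy for one large run on efficient chips
EFFICIENCY_PENALTY = 3.0      # assumed: domestic chips draw 3x the energy per FLOP
PRICE_PER_MWH_USD = 60        # assumed industrial electricity price

baseline_cost = TRAINING_ENERGY_MWH * PRICE_PER_MWH_USD
domestic_cost = TRAINING_ENERGY_MWH * EFFICIENCY_PENALTY * PRICE_PER_MWH_USD
print(f"efficient chips: ${baseline_cost:,.0f}")  # -> $600,000
print(f"domestic chips:  ${domestic_cost:,.0f}")  # -> $1,800,000
# A 3x energy penalty triples the power bill, but the absolute gap is small
# next to the cost of the chips themselves; grid capacity substitutes for
# chip efficiency.
```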
An Alibaba tech lead claims the US compute advantage allows for wasteful but effective "rich people innovation" (running many experiments). In contrast, Chinese firms are forced into "poor people innovation," bogged down by operational needs and unable to risk compute on next-gen research.
The widely discussed compute shortage is primarily an inference problem, not a training one. According to Mustafa Suleyman, Microsoft has enough power for training next-gen models but is constrained by the massive demand for running existing services like Copilot.
Contrary to the idea that infrastructure problems get commoditized, AI inference is growing more complex. This is driven by three factors: (1) increasing model scale (multi-trillion parameters), (2) greater diversity in model architectures and hardware, and (3) the shift to agentic systems that require managing long-lived, unpredictable state.
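To make point (3) concrete, here is a minimal Python sketch of the session bookkeeping an agentic serving layer takes on; the class and method names are invented for illustration and are not drawn from any real serving stack.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class AgentSession:
    """State for one agent: grows unpredictably with each turn or tool call."""
    session_id: str
    history: list[str] = field(default_factory=list)
    last_active: float = field(default_factory=time.time)

class SessionStore:
    """Unlike stateless request-reply inference, agent sessions must be
    retained across many model calls and eventually evicted."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self.sessions: dict[str, AgentSession] = {}

    def create(self) -> AgentSession:
        session = AgentSession(session_id=uuid.uuid4().hex)
        self.sessions[session.session_id] = session
        return session

    def append(self, session_id: str, event: str) -> None:
        session = self.sessions[session_id]
        session.history.append(event)
        session.last_active = time.time()

    def evict_idle(self) -> None:
        now = time.time()
        self.sessions = {
            sid: s for sid, s in self.sessions.items()
            if now - s.last_active < self.ttl
        }

# Usage: every live agent pins memory (and, in a real system, KV-cache) until evicted.
store = SessionStore(ttl_seconds=600)
session = store.create()
store.append(session.session_id, "user: book a flight")
store.evict_idle()
```

The retain/evict lifecycle is the core new burden: capacity is no longer a function of requests per second alone, but of how many long-lived sessions must be kept warm at once.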