Microsoft's new data centers, like Fairwater 2, are designed for massive scale. They use high-speed networking to aggregate computing power across different sites and even regions (e.g., Atlanta and Wisconsin), enabling unprecedentedly large models to be trained as a single job.

Related Insights

The standard for measuring large compute deals has shifted from number of GPUs to gigawatts of power. This provides a normalized, apples-to-apples comparison across different chip generations and manufacturers, acknowledging that energy is the primary bottleneck for building AI data centers.
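
As a back-of-envelope illustration of why power is a cleaner unit than GPU count, the sketch below converts a hypothetical deal's accelerator count into facility gigawatts; the per-chip wattages and the PUE overhead factor are illustrative assumptions, not figures from the discussion.

```python
# Rough sketch: normalizing compute deals by power rather than GPU count.
# Per-accelerator wattages and the PUE overhead factor are assumptions.

ASSUMED_CHIP_POWER_W = {
    "older_gen": 400,    # e.g., an A100-class part (assumed board power)
    "current_gen": 700,  # e.g., an H100-class part (assumed board power)
}
ASSUMED_PUE = 1.3        # assumed facility overhead: cooling, networking, losses

def deal_size_gw(chip: str, gpu_count: int) -> float:
    """Convert a GPU count into approximate facility power in gigawatts."""
    it_load_w = ASSUMED_CHIP_POWER_W[chip] * gpu_count
    return it_load_w * ASSUMED_PUE / 1e9

# Two 100k-GPU deals look identical by chip count, but differ markedly
# once expressed in gigawatts of power that must be built out.
print(f"{deal_size_gw('older_gen', 100_000):.2f} GW")    # ~0.05 GW
print(f"{deal_size_gw('current_gen', 100_000):.2f} GW")  # ~0.09 GW
```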

The progress in deep learning, from AlexNet's GPU leap to today's massive models, is best understood as a history of scaling compute. This scaling, a million-fold increase in training compute, enabled the transition from text to more data-intensive modalities like vision and spatial intelligence.

The progression from early neural networks to today's massive models is fundamentally driven by the exponential growth in available computational power, from the initial move to GPUs to the million-fold increase in compute now applied to training a single model.

Instead of bearing the full cost and risk of building new AI data centers, large cloud providers like Microsoft use CoreWeave for 'overflow' compute. This allows them to meet surges in customer demand without committing capital to assets that depreciate quickly and may, in the long run, end up serving competitors.

While AI inference can be decentralized, training the most powerful models demands extreme centralization of compute. The need for high-bandwidth, low-latency communication between GPUs means the best models are trained by concentrating hardware in the smallest possible physical space, in direct contradiction of decentralized ideals.
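
A crude way to see why the bandwidth requirement forces this concentration: synchronizing gradients each step moves data on the order of the model's size between replicas. The sketch below estimates ring all-reduce time under assumed figures for model size, replica count, and link speed; real systems shard and overlap this traffic, so treat the numbers as relative rather than absolute.

```python
# Back-of-envelope: gradient synchronization time for one data-parallel step.
# Model size, replica count, and link speeds are illustrative assumptions;
# real systems shard gradients and overlap communication with compute.

def ring_allreduce_seconds(params: float, bytes_per_param: int,
                           replicas: int, link_gbit_per_s: float) -> float:
    """Each replica sends/receives ~2*(n-1)/n times the gradient payload."""
    payload_bytes = params * bytes_per_param
    traffic_bytes = 2 * (replicas - 1) / replicas * payload_bytes
    return traffic_bytes / (link_gbit_per_s * 1e9 / 8)

PARAMS = 70e9     # assumed 70B-parameter model
BYTES = 2         # bf16 gradients
REPLICAS = 512    # assumed data-parallel replicas

in_building = ring_allreduce_seconds(PARAMS, BYTES, REPLICAS, 400.0)    # 400 Gb/s fabric
across_regions = ring_allreduce_seconds(PARAMS, BYTES, REPLICAS, 10.0)  # 10 Gb/s WAN share

print(f"same-building sync: ~{in_building:.0f} s")      # ~6 s
print(f"cross-region sync:  ~{across_regions:.0f} s")   # ~224 s
```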

Satya Nadella reveals that Microsoft prioritizes building a flexible, "fungible" cloud infrastructure over catering to every demand of its largest AI customer, OpenAI. This involves strategically denying requests for massive, dedicated data centers to ensure capacity remains balanced for other customers and Microsoft's own high-margin products.

Instead of relying on hyped benchmarks, the truest measure of the AI industry's progress is the physical build-out of data centers. Tracking permits, power consumption, and satellite imagery reveals the concrete, multi-billion dollar bets being placed, offering a grounded view that challenges both extreme skeptics and believers.

Satya Nadella clarifies that the primary constraint on scaling AI compute is not the availability of GPUs, but the lack of power and physical data center infrastructure ("warm shells") to install them in. This highlights a critical, often overlooked dependency in the AI race: the speed of energy and real estate development.

The infrastructure demands of AI have caused an exponential increase in data center scale. Two years ago, a 1-megawatt facility was considered a good size; today, a large AI data center is a 1-gigawatt facility, a 1,000-fold increase. This rapid escalation underscores the immense capital investment required to power AI.
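
To make the jump concrete, the short sketch below converts facility power into a rough accelerator count; the ~1.5 kW all-in figure per accelerator (chip plus cooling, networking, and other overhead) is an assumption for illustration.

```python
# Rough sketch: what a 1 MW vs. 1 GW facility implies in accelerator count.
# The 1.5 kW all-in figure per accelerator is an illustrative assumption.

ASSUMED_KW_PER_ACCELERATOR = 1.5  # chip + cooling + networking + overhead

def accelerators_supported(facility_megawatts: float) -> int:
    return int(facility_megawatts * 1_000 / ASSUMED_KW_PER_ACCELERATOR)

print(accelerators_supported(1))      # ~666 accelerators in a 1 MW facility
print(accelerators_supported(1_000))  # ~666,666 in a 1 GW facility (1000x more)
```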

OpenAI’s pivotal partnership with Microsoft was driven more by the need for massive-scale cloud computing than by cash alone. To train its ambitious GPT models, OpenAI required infrastructure it could not build itself. Microsoft Azure provided this essential, non-commoditized resource, making Microsoft a strategic partner whose value went well beyond its balance sheet.