The data black hole at the center of AI

Dwarkesh Podcast · Jun 19, 2026

AI's glitter hides a black hole of data. Models are millions of times less sample-efficient than humans, a fundamental gap that scaling alone can't bridge.

Frontier AI Models Are Powered by a Hidden Decabillion-Dollar Industry of Human Experts

Achieving state-of-the-art AI performance requires a massive, bespoke data generation process. Thousands of human experts, from legal specialists to management consultants, write task-specific examples, rubrics, and chain-of-thought explanations. Together they form a new and rapidly growing data industry that is the true engine of progress.

AI's Ludicrous Training Inefficiency Is Economically Viable Due to Massive Scalability

While training AI is vastly less data-efficient than training a human, it remains a winning economic strategy. Unlike humans, AI training can be massively parallelized, and the resulting skills can be amortized across billions of simultaneous user sessions, making the inefficient process highly profitable and scalable.

Open Source AI Catches Up Fast Because Data, Not Architecture, Is the Key Driver

The rapid progress of open-source models is evidence that data, not proprietary architectures or training tricks, is the primary driver of AI capability. Training data can be distilled from a frontier model's outputs through its public API, allowing competitors to quickly close the gap. That would be impossible if secret architectural tricks were the main advantage.

Scaling AI Models Larger Won't Solve Their Fundamental Data Inefficiency Problem

According to scaling laws, increasing model size offers minimal improvement to data efficiency. Even an infinitely large model would only reduce data needs by about 10x, a trivial amount compared to the thousands-to-millions-fold efficiency gap between AIs and humans. This suggests current architectures are on the wrong scaling curve for true intelligence.
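
A rough sketch of where a figure like "about 10x" can come from. The summary does not say which scaling law is meant, so this assumes the Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta, with fitted constants recalled from Hoffmann et al. (2022). The function names and the 70B-parameter, 1.4T-token example are illustrative assumptions, not taken from the episode.

```python
# Assumed loss form (not stated in the source):
#   L(N, D) = E + A / N**alpha + B / D**beta
# Letting N go to infinity removes the A / N**alpha term. To match the loss of a
# finite model, the infinite model needs D' tokens with
#   B / D'**beta = A / N**alpha + B / D**beta
# which gives D / D' = (1 + param_term / data_term) ** (1 / beta).


def chinchilla_terms(n_params, n_tokens, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Return the (parameter, data) loss terms. Constants are assumed fitted values."""
    return A / n_params ** alpha, B / n_tokens ** beta


def data_saving_from_infinite_model(param_term, data_term, beta=0.28):
    """Factor by which an infinitely large model cuts the data needed for the same loss."""
    if param_term < 0 or data_term <= 0 or beta <= 0:
        raise ValueError("loss terms and beta must be positive")
    return (1.0 + param_term / data_term) ** (1.0 / beta)


# When both loss terms are equal, the saving is exactly 2 ** (1 / beta).
balanced_saving = data_saving_from_infinite_model(1.0, 1.0)

# Hypothetical training run: 70B parameters on 1.4T tokens.
p_term, d_term = chinchilla_terms(70e9, 1.4e12)
example_saving = data_saving_from_infinite_model(p_term, d_term)

print(f"balanced terms: {balanced_saving:.1f}x less data")
print(f"70B / 1.4T run: {example_saving:.1f}x less data")
```

Under these assumptions the saving stays around one order of magnitude, because only the parameter term vanishes while the data term still has to carry the whole reducible loss.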

The Human Genome Is Too Small to Support the 'Evolution Pre-Trained Our Brains' AI Analogy

The argument that evolution 'pre-trained' humans, excusing AI's data needs, is flawed. The human genome is too small to store a complex neural network's parameters. A better analogy is that evolution found the right hyperparameters and loss functions, while our brain's 'weights' are learned from scratch in our lifetime. That makes AI's data hunger look even starker.
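
The size argument can be checked with back-of-envelope arithmetic. The figures below are commonly cited estimates and are assumptions here, not numbers from the episode: roughly 3.1 billion base pairs in the haploid human genome, 2 bits per base, and on the order of 10^14 synapses stored at one byte each.

```python
# Rough estimates (assumptions, not taken from the source).
BASE_PAIRS = 3.1e9        # approximate haploid human genome length
BITS_PER_BASE = 2         # four possible bases: A, C, G, T
SYNAPSES = 1e14           # order-of-magnitude estimate for the human brain
BYTES_PER_SYNAPSE = 1     # hypothetical: one 8-bit weight per synapse


def genome_capacity_bytes(base_pairs=BASE_PAIRS, bits_per_base=BITS_PER_BASE):
    """Upper bound on raw information in the genome, ignoring redundancy."""
    return base_pairs * bits_per_base / 8


def brain_weight_bytes(synapses=SYNAPSES, bytes_per_synapse=BYTES_PER_SYNAPSE):
    """Storage needed if every synapse were an explicitly stored weight."""
    return synapses * bytes_per_synapse


genome_bytes = genome_capacity_bytes()
brain_bytes = brain_weight_bytes()
shortfall = brain_bytes / genome_bytes

print(f"genome capacity : {genome_bytes / 1e6:.0f} MB")
print(f"brain 'weights' : {brain_bytes / 1e12:.0f} TB")
print(f"shortfall       : about {shortfall:,.0f}x")
```

Even with generous rounding, the genome falls short by around five orders of magnitude, which is why it is more plausible as a compact specification of architecture and learning rules than as a store of trained weights.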

RiffOn - The data black hole at the center of AI | Dwarkesh Podcast