The "Bitter Lesson" of Scale Applies Directly to Training Powerful Search Models

Related Insights

Richard Sutton's 'Bitter Lesson' Implies Current LLMs Are Inefficient Users of Compute

The "Bitter Lesson" is not just about using more compute, but about leveraging it scalably. Current LLMs are inefficient because they learn only during a discrete training phase, not during deployment, where most computation occurs. This reliance on a special, data-intensive training period is not a scalable use of computational resources.

Some thoughts on the Sutton interview

Dwarkesh Podcast·10 months ago

Search's Next Bottlenecks Are Infrastructure Scale and Unearthing Real-World Data

With model intelligence advancing, the next hurdles for perfect search are operational. The first is building infrastructure that can handle a 1000x increase in agent-driven queries. The second is the "data bottleneck": capturing and indexing the vast amount of information that exists only offline.

Building Search for AI Agents with Exa CEO Will Bryk

The a16z Show·2 months ago

Efficient Retrieval Lets Smaller LLMs Outperform Large Ones, Solving the 'Tokenpocalypse'

Instead of using massive, expensive LLMs for every task, companies can solve the "tokenpocalypse" (runaway token costs) by pairing smaller models with high-quality retrieval systems. This lets cheap models perform like large ones while significantly reducing costs.

Building Search for AI Agents with Exa CEO Will Bryk

The a16z Show·2 months ago

Frontier AI Labs Now Deny "Scaling Is All You Need," Focusing on Complex Post-Training Pipelines

The original playbook of simply scaling parameters and data is now obsolete. Top AI labs have pivoted to heavily designed post-training pipelines, retrieval, tool use, and agent training, acknowledging that raw scaling is insufficient to solve real-world problems.

How Foundation Models Evolved: A PhD Journey Through AI's Breakthrough Era

The a16z Show·6 months ago

Superhuman AI Performance Comes from RL Eliciting Latent, Pre-Trained Capabilities

Reinforcement learning achieves superhuman results not by inventing alien concepts, but by surfacing and combining rare behaviors that are already possible within a model's vast pre-trained distribution. The goal of pre-training is to make this search for novel solutions more efficient and less random.

The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·3 months ago

LLMs Nullify Google's Historical Moats, Enabling Startups to Outcompete in Search

Google's moats (human click data, large re-ranking teams) are less relevant for AI agents. LLMs allow small, agile teams to build superior search products by training their own models without needing decades of user signal data.

Building Search for AI Agents with Exa CEO Will Bryk

The a16z Show·2 months ago

Google Search's 2001 Quality Leap Came from Fitting Its Entire Index in Memory

In 2001, Google realized its combined server RAM could hold a full copy of its web index. Moving from disk-based to in-memory systems eliminated slow disk seeks, enabling complex queries with synonyms and semantic expansion. This fundamentally improved search quality long before LLMs became mainstream.

Owning the AI Pareto Frontier — Jeff Dean

Latent Space: The AI Engineer Podcast·5 months ago
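
The query expansion described in the Google item above can be illustrated with a toy in-memory inverted index. This is a minimal sketch, not Google's actual design: the function names `build_index` and `search`, the sample documents, and the synonym table are all hypothetical. It shows the idea that once the index lives in RAM, OR-expanding each query term with its synonyms costs only a few extra dictionary lookups rather than extra disk seeks.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query, synonyms):
    """AND across query terms; each term is OR-expanded with its synonyms."""
    result = None
    for term in query.lower().split():
        variants = {term} | set(synonyms.get(term, ()))
        matches = set()
        for variant in variants:
            # .get avoids creating empty entries in the defaultdict
            matches |= index.get(variant, set())
        result = matches if result is None else result & matches
    return result or set()

# Hypothetical sample data, for illustration only.
DOCS = {
    1: "cheap car rental",
    2: "inexpensive automobile hire",
    3: "car repair manual",
}
SYNONYMS = {"cheap": ["inexpensive"], "car": ["automobile"]}
INDEX = build_index(DOCS)
```

With these sample documents, `search(INDEX, "cheap car", SYNONYMS)` matches documents 1 and 2, even though document 2 contains neither query word literally.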

AI Capability Improves Non-Linearly With Massive Increases in Training Data

A key surprise in AI development was the non-linear impact of scale. Sebastian Thrun noted that while AI trained on millions of documents is 'fine,' training it on hundreds of billions creates an 'unbelievably smart' system, shocking even its creators and demonstrating data volume as a primary driver of breakthroughs.

Search Engine Presents: Are you a good driver?

Odd Lots·3 months ago

AI Product Usage Creates a Data Flywheel for Improving Search Accuracy

Perplexity leverages its user-facing product to improve its core search technology. When the LLM reasons through search snippets and selects which ones to cite in an answer, that selection process acts as a powerful signal to refine and improve the underlying search ranking algorithm for future queries.

Perplexity Chief Business Officer Dmitry Shevelenko: why curiosity is now AI’s scarcest resource

Summation with Auren Hoffman·2 months ago
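
The citation flywheel described in the Perplexity item above can be sketched as a simple feedback loop. This is an assumption-laden illustration, not Perplexity's system: the class `CitationFeedback`, its smoothing priors, and the linear blend with a base retrieval score are all hypothetical choices. It shows how "which snippets the LLM chose to cite" could be logged and folded back into ranking.

```python
class CitationFeedback:
    """Tracks how often a document is cited when shown to the answering LLM."""

    def __init__(self, prior_shown=2.0, prior_cited=1.0):
        # Priors smooth the rate so unseen documents start at 0.5 (assumed values).
        self.shown = {}
        self.cited = {}
        self.prior_shown = prior_shown
        self.prior_cited = prior_cited

    def record(self, shown_ids, cited_ids):
        """Log one answer: the snippets retrieved and the subset actually cited."""
        cited = set(cited_ids)
        for doc_id in shown_ids:
            self.shown[doc_id] = self.shown.get(doc_id, 0) + 1
            if doc_id in cited:
                self.cited[doc_id] = self.cited.get(doc_id, 0) + 1

    def citation_rate(self, doc_id):
        """Smoothed fraction of impressions in which the document was cited."""
        cited = self.cited.get(doc_id, 0) + self.prior_cited
        shown = self.shown.get(doc_id, 0) + self.prior_shown
        return cited / shown

    def rerank(self, base_scores, weight=0.5):
        """Order documents by a blend of base retrieval score and citation rate."""
        def blended(doc_id):
            return (1 - weight) * base_scores[doc_id] + weight * self.citation_rate(doc_id)
        return sorted(base_scores, key=blended, reverse=True)
```

For example, if document "b" is cited every time it is shown and document "a" never is, `rerank` can place "b" first even when "a" has the slightly higher base score.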

Yahoo's AI Search Uses a Lightweight LLM to Process Its 30-Year Proprietary Data Trove

Yahoo built its AI search engine, Scout, not by training a massive model, but by using a smaller, affordable LLM (Anthropic's Haiku) as a processing layer. The real power comes from feeding this model Yahoo's 30 years of proprietary search data and knowledge graphs.

Yahoo CEO Jim Lanzone on reviving the web's homepage

Decoder with Nilay Patel·4 months ago
