Verifiability alone doesn't explain AI's rapid progress in math and coding. The key factor is 'grindability': the ability to run thousands of parallel, containerized, deterministic simulations. That volume of cheap, repeatable trials makes credit assignment and learning efficient, a luxury unavailable in domains like e-commerce or business strategy, where every trial depends on real-world interaction and runs into obstacles such as bot detectors.
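As a rough illustration of what 'grindability' might look like in practice, the sketch below runs many seeded, independent trials in parallel and scores each with an automatic check. Everything here is hypothetical: the toy task (finding a square root of 1 modulo 1000), the names `verify`, `rollout`, and `grind`, and the use of threads as a stand-in for containers are assumptions made for the example, not anything described in the source.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def verify(candidate: int) -> bool:
    """Cheap automatic check (toy task): is candidate^2 congruent to 1 mod 1000?"""
    return candidate * candidate % 1000 == 1


def rollout(seed: int) -> Tuple[int, int, bool]:
    """One isolated trial. A private, seeded RNG makes it deterministic."""
    rng = random.Random(seed)
    candidate = rng.randrange(1000)  # stand-in for a model's proposed answer
    return seed, candidate, verify(candidate)


def grind(num_rollouts: int, workers: int = 8) -> List[Tuple[int, int, bool]]:
    """Run many independent trials in parallel; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(rollout, range(num_rollouts)))


if __name__ == "__main__":
    results = grind(5000)
    successes = [(seed, cand) for seed, cand, ok in results if ok]
    print(f"{len(successes)} of {len(results)} rollouts passed verification")
```

Because each trial depends only on its seed, any result can be replayed exactly, and each outcome is attributable to a single trial. Trials that depend on live customers or third-party websites offer neither property.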

Related Insights

AI agents excel not because they are inherently more intelligent, but because they can exhaustively test possibilities without the cognitive fatigue that limits human performance. This 'relentless tedium' is a superpower for tasks like finding obscure bugs.

Today's AI boom is fueled by scaling computation, which is a known engineering challenge. The alternative, embedding nuanced, human-like inductive biases, is far harder because it requires a deep understanding of the problem space. This difficulty gap explains why massive models dominate AI development over more targeted, efficient ones: scaling is simply the more straightforward path.

Andrej Karpathy's 'Software 2.0' framework posits that AI automates tasks that are easily *verifiable*. This explains the 'jagged frontier' of AI progress: fields like math and code, where correctness is verifiable, advance rapidly. In contrast, creative and strategic tasks, where success is subjective and hard to verify, lag significantly behind.

In domains like coding and math where correctness is automatically verifiable, AI can move beyond learning from human demonstrations and preferences (as in RLHF). Using pure reinforcement learning, or "experiential learning," models learn via self-play and can discover novel, superhuman strategies similar to AlphaGo's Move 37.

Judgment Labs CEO Alex Shan argues that AI agents will first dominate domains with easily verifiable results, like coding, where a solution's correctness can be quickly checked. Progress will be slower in non-verifiable fields like law or complex drug discovery, where feedback loops are long and ambiguous.

Unlike other sciences, mathematics has historically lacked a strong experimental branch. AI changes this by enabling large-scale studies, such as testing a thousand different problem-solving approaches on a thousand problems. This creates a new, data-driven methodology for a field that has been almost entirely theoretical.

Demis Hassabis identifies a key obstacle for AGI. Unlike in math or games where answers can be verified, the messy real world lacks clear success metrics. This makes it difficult for AI systems to use self-improvement loops, limiting their ability to learn and adapt outside of highly structured domains.

AI's ability to code seems like advanced reasoning, but it's actually just navigating the most complete archive of human knowledge ever created. Programming's version control, documentation, and forums provide a perfectly mapped territory for AI to search, not a complex problem for it to solve through intelligence.

AI models improve dramatically in domains with objective feedback, like coding (unit tests) or science (lab results). Progress is slower in subjective fields like creative writing where feedback is opinion-based, explaining the uneven impact of AI across different types of knowledge work.
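To make "objective feedback" concrete, here is a minimal sketch of unit tests used as a score: a candidate function's reward is the fraction of test cases it passes. The test cases, the `reward` function, and the two candidate functions are invented for illustration and are not taken from the source.

```python
from typing import Callable, List, Tuple

# Hypothetical test suite for an "add two integers" task: (arguments, expected result).
TESTS: List[Tuple[Tuple[int, int], int]] = [
    ((1, 2), 3),
    ((0, 0), 0),
    ((-1, 1), 0),
    ((10, 5), 15),
]


def reward(candidate: Callable[[int, int], int]) -> float:
    """Fraction of test cases passed; a crash counts as a failed case."""
    passed = 0
    for args, expected in TESTS:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a raising candidate simply earns no credit for this case
    return passed / len(TESTS)


def good_add(a: int, b: int) -> int:
    return a + b


def buggy_add(a: int, b: int) -> int:
    return a - b  # deliberate bug for the example


if __name__ == "__main__":
    print(reward(good_add), reward(buggy_add))
```

The score needs no human judge and is the same on every run, which is the property that opinion-based feedback on creative writing lacks.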

We perceive complex math as a pinnacle of intelligence, but for AI, it may be an easier problem than tasks we find trivial. Like chess, which computers mastered decades ago, solving major math problems might not signify human-level reasoning but rather that the domain is surprisingly susceptible to computational approaches.