Transformer LLMs' 0% Sudoku Score Reveals a Core Reasoning Failure

Related Insights

AI Reasoning Fails at Hierarchical Planning, Unlike Human Problem-Solving

AI models struggle to plan at several levels of abstraction at once. Humans readily move from a high-level goal down to a detailed task, then back up to revise the high-level plan when that detail turns out to be blocked; current models cannot do this easily.

AI's Research Frontier: Memory, World Models, & Planning — With Joelle Pineau

Big Technology Podcast·5 months ago
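A minimal sketch of classical hierarchical planning with backtracking, in the style of a hierarchical task network (HTN) planner, makes the gap concrete. This is not how any of the models discussed here work. The task names, the METHODS table and the plan function are hypothetical. When a low-level step is unavailable, the planner abandons the enclosing method and tries the next alternative one level up. That is the "back up and revise" move described above.

```python
# Hypothetical task library: each compound task maps to alternative methods,
# and each method is an ordered list of subtasks (compound or primitive).
METHODS = {
    "travel_to_conference": [
        ["book_flight", "get_to_airport"],  # preferred high-level plan
        ["rent_car", "drive"],              # fallback high-level plan
    ],
    "get_to_airport": [
        ["take_train"],
        ["take_taxi"],
    ],
}


def plan(task, available):
    """Return a list of primitive actions that achieves `task`, or None.

    A primitive task succeeds only if it is in `available`. If any subtask
    of a method cannot be planned, that whole method is abandoned and the
    next alternative for the parent task is tried instead.
    """
    if task not in METHODS:  # primitive action
        return [task] if task in available else None
    for subtasks in METHODS[task]:
        steps = []
        for sub in subtasks:
            sub_plan = plan(sub, available)
            if sub_plan is None:  # detail is blocked: give up on this method
                steps = None
                break
            steps.extend(sub_plan)
        if steps is not None:
            return steps
    return None  # every alternative failed; let the caller back up further
```

With no train available, plan("travel_to_conference", {"book_flight", "take_taxi", "rent_car", "drive"}) repairs only the lower level and returns ["book_flight", "take_taxi"]. With only {"take_train", "rent_car", "drive"} available, the flight branch fails, so the planner switches the top-level plan and returns ["rent_car", "drive"].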

The Human Cortex Performs Omnidirectional Inference, Unlike LLMs' Unidirectional Prediction

LLMs predict the next token in a sequence. The brain's cortex may function as a general prediction engine capable of "omnidirectional inference"—predicting any missing information from any available subset of inputs, not just what comes next. This offers a more flexible and powerful form of reasoning.

Adam Marblestone – AI is missing something fundamental about the brain

Dwarkesh Podcast·6 months ago

Internal Reasoning Makes New AI Models 10x Cheaper Than LLMs

Pathway's BDH model reaches 97.4% accuracy on extreme Sudoku at one-tenth the cost of LLMs, which score 0%. Instead of burning GPU cycles on generating text-based, step-by-step thoughts (Chain of Thought), it reasons within its internal latent space. This demonstrates a massive economic advantage for non-transformer architectures on complex reasoning tasks.

A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

Super Data Science: ML & AI Podcast with Jon Krohn·3 months ago

LLMs Excel at 'Knowledge Extrusion,' Not Novel Problem-Solving

LLMs shine when acting as a 'knowledge extruder'—shaping well-documented, 'in-distribution' concepts into specific code. They fail when the core task is novel problem-solving where deep thinking, not code generation, is the bottleneck. In these cases, the code is the easy part.

Why IDEs Won't Die in the Age of AI Coding: Zed Founder Nathan Sobo

Training Data·7 months ago

AI's Next Frontier is 'Generative Strategy,' Not Just Information Summarization

Success on constraint-satisfaction puzzles like Sudoku signals a shift from current AI that summarizes existing information to a new class capable of 'generative strategy.' These models can analyze constraints and creatively propose novel solutions, tackling real-world planning problems in medicine, law, and operations rather than just describing what's already known.

A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

Super Data Science: ML & AI Podcast with Jon Krohn·3 months ago
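Sudoku is a textbook constraint-satisfaction problem. Each empty cell must take a digit from 1 to 9 that does not repeat in its row, its column, or its 3x3 box. The sketch below is a plain backtracking search. It is not Pathway's BDH model and not an LLM. It only illustrates what "analyzing constraints and proposing a solution" means mechanically. The grid is assumed to be a 9x9 list of ints with 0 marking blanks, and the function names are illustrative.

```python
def candidates(grid, r, c):
    """Digits that would not violate the row, column, or 3x3 box constraint."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the cell's box
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill the zeros of a 9x9 grid in place; return True if a solution exists."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d  # tentatively commit to a digit
                    if solve(grid):
                        return True
                    grid[r][c] = 0  # undo and try the next digit
                return False  # no digit fits this cell: backtrack
    return True  # no empty cells left
```

Naive backtracking like this can be slow on the hardest puzzles. Practical solvers add constraint propagation and fill the most constrained cell first.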

Anthropic's Claude Model Can Perform PhD-Level Math But Fails at Basic Spatial Reasoning

Advanced AI models show a striking mismatch: they master complex, abstract tasks while failing at simple, intuitive ones. An Anthropic team member notes that Claude can solve PhD-level math yet cannot grasp basic spatial concepts such as "left vs. right" or navigating around an object in a game, highlighting the alien nature of their intelligence.

The good, bad, and future of AI agents

Decoder with Nilay Patel·9 months ago

AGI Requires Combining LLMs with AlphaGo's Planning and Search Techniques

Google DeepMind CEO Demis Hassabis argues that today's large models are insufficient for AGI. He believes progress requires reintroducing algorithmic techniques from systems like AlphaGo, specifically planning and search, to enable more robust reasoning and problem-solving capabilities beyond simple pattern matching.

Best of Big Technology: Demis Hassabis On AGI, Deceptive AIs, Building a Virtual Cell

Big Technology Podcast·6 months ago

AI Reasoning Fails to Generalize from Puzzles to Messy, Real-World Tasks

Hopes that AI's new reasoning skills in checkable domains like math and code would generalize to ambiguous, real-world tasks like booking a flight did not materialize. This failure of 'reasoning generalization' was a major technical roadblock that forced experts to lengthen AGI timelines.

What the hell happened with AGI timelines in 2025?

80,000 Hours Podcast·5 months ago

Imbue LLMs with Reasoning by Training on Code and Textbooks

To improve LLM reasoning, researchers feed them data that inherently contains structured logic. Training on computer code was an early breakthrough, as it teaches patterns of reasoning far beyond coding itself. Textbooks are another key source for building smaller, effective models.

Best of the Pod: Reid Hoffman on How AI Is Answering Our Biggest Questions

AI & I·6 months ago

Current LLMs Are Plateauing in General Intelligence, Not Specialized Skills

Replit's CEO argues that today's LLMs are asymptoting on general reasoning tasks. Progress continues only in domains with binary (pass/fail) outcomes, like coding, where synthetic data can be generated in unlimited quantities. He sees this as a fundamental limitation of the current 'ingest the internet' approach to achieving AGI.

The Rise of Coding Agents, Functional AGI, and the Skills Gen Z Needs Now | Replit CEO Amjad Masad x Impact Theory With Tom Bilyeu

Tom Bilyeu's Impact Theory·5 months ago
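The point about binary outcomes can be illustrated with a toy coding task: sorting a list. The helpers make_task and verify are hypothetical. This is not Replit's or any lab's actual training pipeline. Because a trusted program computes the ground truth, new labelled tasks can be generated without limit, and every candidate solution receives an unambiguous pass/fail score.

```python
import random


def make_task(rng):
    """Hypothetical synthetic task: sort a random list of integers.

    The expected answer comes from a trusted program, so the supply of
    correctly labelled tasks is effectively unlimited.
    """
    xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
    return xs, sorted(xs)


def verify(candidate, tasks):
    """Binary outcome: 1 if `candidate` solves every task, else 0.

    A candidate that raises an exception also scores 0.
    """
    try:
        return int(all(candidate(list(xs)) == expected for xs, expected in tasks))
    except Exception:
        return 0


rng = random.Random(0)                        # seeded for reproducibility
tasks = [make_task(rng) for _ in range(100)]  # generate as many as needed
```

For example, verify(sorted, tasks) returns 1. A candidate that returns its input unchanged fails on any unsorted list and scores 0.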
