

A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

Super Data Science: ML & AI Podcast with Jon Krohn · Mar 27, 2026

A new post-transformer architecture, BDH, crushes extreme Sudoku with 97.4% accuracy, while leading LLMs score 0%, revealing their limits.

Transformer LLMs' 0% Sudoku Score Reveals a Core Reasoning Failure

Top LLMs like Claude 3 and DeepSeek score 0% on complex Sudoku puzzles, a task humans can solve. This isn't a minor flaw but a categorical failure, exposing the transformer architecture's inability to handle constraint-satisfaction problems that require backtracking and parallel reasoning, something its sequential, token-by-token processing cannot provide.
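
To make concrete what "backtracking" means here, below is a minimal Python sketch of the classic backtracking approach to Sudoku; it is illustrative only and not taken from the episode.

# Minimal backtracking Sudoku solver: try a digit, recurse, undo on failure.
# This try/check/undo search is exactly what token-by-token decoding lacks.
def solve(grid):
    """grid: 9x9 list of lists, 0 = empty. Solves in place; returns True if solvable."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for digit in range(1, 10):
                    if is_valid(grid, r, c, digit):
                        grid[r][c] = digit
                        if solve(grid):       # recurse on the partial solution
                            return True
                        grid[r][c] = 0        # backtrack: undo and try the next digit
                return False                  # dead end: no digit fits this cell
    return True                               # no empty cells left: solved

def is_valid(grid, r, c, digit):
    """Row, column, and 3x3 box constraints for placing digit at (r, c)."""
    if digit in grid[r] or digit in (grid[i][c] for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != digit
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))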

AI's Next Frontier is 'Generative Strategy,' Not Just Information Summarization

Success on constraint-satisfaction puzzles like Sudoku signals a shift from current AI that summarizes existing information to a new class capable of 'generative strategy.' These models can analyze constraints and creatively propose novel solutions, tackling real-world planning problems in medicine, law, and operations rather than just describing what's already known.

Pathway's BDH Model Uses Brain-Like 'Sparse Activations' for Efficient Reasoning

Unlike transformers, which use dense activations (most neurons fire on every input), Pathway's BDH architecture uses sparse positive activations, with only ~5% of neurons firing at once. This approach is more biologically plausible, mimicking the human brain's energy efficiency and enabling complex reasoning without the massive computational overhead of dense models.
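
A toy NumPy sketch of the contrast, using hypothetical pre-activations; the ~5% figure is from the episode, but the top-k mechanism below is just one way to get sparsity and is not claimed to be BDH's actual method.

import numpy as np

rng = np.random.default_rng(0)
pre = rng.standard_normal(10_000)        # hypothetical layer of 10,000 neurons

# Dense, transformer-style nonlinearity: roughly half the neurons stay nonzero.
dense = np.maximum(pre, 0)

# Sparse positive activations: keep only the top ~5% of neurons, zero the rest.
k = int(0.05 * pre.size)
threshold = np.partition(pre, -k)[-k]    # value of the k-th largest pre-activation
sparse = np.where(pre >= threshold, np.maximum(pre, 0), 0.0)

print(f"dense nonzero:  {np.count_nonzero(dense) / dense.size:.0%}")   # ~50%
print(f"sparse nonzero: {np.count_nonzero(sparse) / sparse.size:.0%}")  # ~5%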

Internal Reasoning Makes New AI Models 10x Cheaper Than LLMs

Pathway's BDH model achieves 97.4% accuracy on extreme Sudoku at roughly one-tenth the cost of LLMs that score 0%. It avoids burning GPU cycles on generating text-based, step-by-step thoughts (chain of thought) by reasoning within its internal latent space. This demonstrates a massive economic advantage for non-transformer architectures on complex reasoning tasks.
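
A rough sketch of where the cost difference comes from, with entirely hypothetical dimensions and token counts; it only illustrates the contrast between decoding every intermediate thought as text and iterating on a hidden state, not how BDH actually works.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) * 0.1          # hypothetical recurrent weights

def latent_reasoning(x, steps=16):
    """Refine a hidden state internally; nothing is decoded until the final answer."""
    h = x.copy()
    for _ in range(steps):
        h = np.tanh(W @ h + x)                   # iterative refinement in latent space
    return h, 0                                  # zero intermediate tokens generated

def chain_of_thought(x, steps=16, tokens_per_step=50):
    """Stand-in for CoT: every reasoning step is also written out as text tokens."""
    h = x.copy()
    tokens = 0
    for _ in range(steps):
        h = np.tanh(W @ h + x)
        tokens += tokens_per_step                # pay to decode the step as text
    return h, tokens

x = rng.standard_normal(64)
print("latent intermediate tokens:", latent_reasoning(x)[1])   # 0
print("CoT intermediate tokens:   ", chain_of_thought(x)[1])   # 800 under these assumptions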
