Pathway's BDH Model Uses Brain-Like 'Sparse Activations' for Efficient Reasoning

Related Insights

LLMs Function as Compressed Representations of an Impossibly Large and Sparse Probability Matrix

A useful mental model for an LLM is a giant matrix where each row is a possible prompt and columns represent next-token probabilities. This matrix is impossibly large but also extremely sparse, as most token combinations are gibberish. The LLM's job is to efficiently compress and approximate this matrix.

What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado

The a16z Show·3 months ago
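
To get a feel for why the matrix in this mental model is "impossibly large", the row count can be computed directly. This is a minimal sketch, not something taken from the episode: the function name `prompt_matrix_shape`, the vocabulary size, and the context length are all hypothetical values chosen for illustration.

```python
def prompt_matrix_shape(vocab_size: int, context_len: int) -> tuple[int, int]:
    """Shape of the conceptual prompt-by-next-token matrix.

    Rows: every possible prompt of exactly `context_len` tokens.
    Columns: one next-token probability per vocabulary entry.
    """
    rows = vocab_size ** context_len
    cols = vocab_size
    return rows, cols


# Hypothetical sizes, chosen only to illustrate the scale.
rows, cols = prompt_matrix_shape(vocab_size=50_000, context_len=100)

# Python integers are arbitrary precision, so the row count is exact.
row_digits = len(str(rows))
print(f"columns: {cols}, rows: a number with {row_digits} decimal digits")
```

Almost all of those rows correspond to gibberish prompts, which is the sparsity the summary refers to. No table of that size can be stored explicitly, hence the framing of the model as a compressed approximation.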

The Human Cortex Performs Omnidirectional Inference, Unlike LLMs' Unidirectional Prediction

LLMs predict the next token in a sequence. The brain's cortex may function as a general prediction engine capable of "omnidirectional inference"—predicting any missing information from any available subset of inputs, not just what comes next. This offers a more flexible and powerful form of reasoning.

Adam Marblestone – AI is missing something fundamental about the brain

Dwarkesh Podcast·6 months ago

The Brain's Power Is Its Vast Configurational Space, Not Processing Speed

The human brain's neurons can be wired together in more possible configurations than there are atoms in the universe. This immense, dynamic 'configurational space' is the source of its power, not raw processing speed. Silicon chips are fundamentally different and cannot replicate this morphing, high-dimensional architecture.

Lee Cronin "Sam Altman Is Delusional, Hinton Needs Therapy, P(Doom) Is Nonsense"

AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts·6 months ago

Internal Reasoning Makes New AI Models 10x Cheaper Than LLMs

Pathway's BDH model achieves 97.4% accuracy on extreme Sudoku at roughly one-tenth the cost of LLMs, which score about 0% on the same task. It avoids burning GPU cycles on generating text-based, step-by-step thoughts (Chain of Thought) by reasoning within its internal latent space. This demonstrates a massive economic advantage for non-transformer architectures on complex reasoning tasks.

A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

Super Data Science: ML & AI Podcast with Jon Krohn·3 months ago

MiniMax M2.1 Uses a 'Sparse' Architecture for Big Model Power at Small Model Cost

The model uses a Mixture-of-Experts (MoE) architecture with over 200 billion parameters but activates only a "sparse" 10 billion of them for any given token. This design provides the knowledge base of a massive model while keeping inference speed and cost comparable to much smaller models.

MiniMax M2.1 Bets That ‘Most Usable’ Beats ‘Most Massive’

Machine Learning Tech Brief By HackerNoon·5 months ago
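
The sparse activation described in this summary is usually implemented with a router that picks a few experts per input. The following is a minimal sketch of generic top-k softmax routing in plain Python. It is not MiniMax's actual router: the expert functions, the router scores, and the names `route_top_k` and `moe_forward` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalised so the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Mix the outputs of the selected experts only; the others are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Toy setup: 8 hypothetical "experts", each a simple scalar function.
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]
scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]  # made-up router scores

selected = route_top_k(scores, k=2)
output = moe_forward(3.0, experts, scores, k=2)
print(selected, output)
```

With `k=2` out of 8 experts, only a quarter of the expert parameters take part in the computation, which is the same idea as activating 10 billion out of 200 billion at a much larger scale.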

The Binary "Reasoning vs. Non-Reasoning" Model Distinction Is Now Obsolete

Classifying a model as "reasoning" based on a chain-of-thought step is no longer useful. With massive differences in token efficiency, a so-called "reasoning" model can be faster and cheaper than a "non-reasoning" one for a given task. The focus is shifting to a continuous spectrum of capability versus overall cost.

Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah Hill-Smith

Latent Space: The AI Engineer Podcast·6 months ago

Co-designing LLMs with Target Hardware Unlocks Major Inference Efficiency Gains

Model architecture decisions directly impact inference performance. AI company Zyphra selects its target hardware first and then chooses model parameters, such as a hidden dimension divisible by a large power of two, to align with how GPUs split up workloads, maximizing efficiency from day one.

How Zyphra went all-in on AMD + Why Devs feel faster with AI but are slower — with Quentin Anthony

Latent Space: The AI Engineer Podcast·8 months ago
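
One way to read the hidden-dimension point is that a dimension divisible by a large power of two splits evenly across fixed-size hardware tiles. The sketch below checks candidate dimensions under that assumption. The helper names and the tile size of 256 are hypothetical and are not figures from Zyphra or any GPU vendor.

```python
def largest_power_of_two_divisor(n: int) -> int:
    """Largest power of two that divides n (uses the lowest-set-bit trick)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return n & -n


def round_up_to_multiple(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


# Candidate hidden dimensions (illustrative values only).
candidates = [4096, 5120, 5000]
report = {d: largest_power_of_two_divisor(d) for d in candidates}

# Hypothetical tile width used to pad an awkward dimension.
padded = round_up_to_multiple(5000, 256)

for dim, divisor in report.items():
    print(f"hidden_dim={dim}: divisible by 2-power {divisor}")
print(f"5000 padded to a multiple of 256 -> {padded}")
```

A dimension like 5000 is divisible only by 8, so it would leave partially filled tiles under this assumption, while 5120 divides cleanly by 1024.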

LLM Performance Correlates with Total, Not Active, Parameters, Suggesting Sparsity Can Increase Further

Performance on knowledge-intensive benchmarks correlates strongly with an MoE model's total parameter count, not its active parameter count. With leading models like Kimi K2 reportedly using only ~3% active parameters, this suggests there is significant room to increase sparsity and efficiency without degrading factual recall.

Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah Hill-Smith

Latent Space: The AI Engineer Podcast·6 months ago
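
The gap between total and active parameters can be made concrete with some back-of-the-envelope arithmetic. This sketch uses the round MiniMax M2.1 figures quoted elsewhere on this page (about 200 billion total, 10 billion active) and the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass. The function names are hypothetical, and the dense comparison model is an assumption for illustration only.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: about 2 FLOPs (multiply + add) per active parameter."""
    return 2.0 * active_params


# Round figures from the MiniMax M2.1 summary on this page.
total, active = 200e9, 10e9

fraction = active_fraction(total, active)
sparse_cost = forward_flops_per_token(active)
# Hypothetical dense model with the same total parameter count.
dense_cost = forward_flops_per_token(total)
ratio = dense_cost / sparse_cost

print(f"active fraction: {fraction:.1%}")
print(f"dense / sparse compute per token: {ratio:.0f}x")
```

Under these assumptions, lowering the active fraction further (for example toward the ~3% reported for Kimi K2) cuts per-token compute proportionally, while the total parameter count, which the summary ties to factual recall, stays unchanged.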

Architectural Innovation Is Key to China's AI Cost Efficiency

Chinese AI models like Kimi achieve dramatic cost reductions through specific architectural choices, not just scale. Using a "mixture of experts" design, they use only a fraction of their total parameters for any given token, making them far more efficient to run than the "dense" models common in the West.

China Decode: How an AI Price War Could Spark a Market Correction

The Prof G Pod with Scott Galloway·7 months ago

Startup Uses Live Neurons as an 'Algorithm Discovery Platform' for AI

A neuroscientist-led startup is growing live neurons on electrodes not just for compute efficiency, but as a platform to discover novel algorithms. By studying how biological networks process information, they identify neuroscience principles that can be used as software plugins to improve current AI models and find successors to the transformer architecture.

Something Mini is Coming, Anthropic's $20B Round, Ackman’s Meta Move | Bryan Johnson, Andrew Huberman, Matthew Zeitlin, Joon Sung Park, David Risher, Todd McKinnon, Alexander Ksendzovsky

TBPN·4 months ago
