Unlike transformers, which use dense activations (most neurons fire), Pathway's BDH architecture uses sparse positive activations: only about 5% of neurons fire at any one time. This approach is more biologically plausible, mimicking the human brain's energy efficiency and enabling complex reasoning without the massive computational overhead of dense models.
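
To make the contrast concrete, here is a minimal Python sketch of what "sparse positive activations" could look like. It is not Pathway's implementation: the top-k rule, the function names, and the 1,000-neuron layer are assumptions chosen only to illustrate keeping roughly 5% of neurons active with non-negative values.

```python
import random

def relu(x):
    """Standard ReLU: negative pre-activations become zero."""
    return x if x > 0.0 else 0.0

def sparse_positive(pre_activations, active_fraction=0.05):
    """Keep only the largest positive values and zero out everything else.

    Assumption: a simple top-k rule stands in for whatever mechanism
    BDH actually uses to obtain sparsity.
    """
    k = max(1, round(len(pre_activations) * active_fraction))
    # The k-th largest value acts as the firing threshold.
    threshold = sorted(pre_activations, reverse=True)[k - 1]
    out = [0.0] * len(pre_activations)
    kept = 0
    for i, v in enumerate(pre_activations):
        if v >= threshold and v > 0.0 and kept < k:
            out[i] = v
            kept += 1
    return out

def firing_fraction(activations):
    """Share of neurons with a strictly positive output."""
    return sum(1 for a in activations if a > 0.0) / len(activations)

rng = random.Random(0)
pre = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical layer
dense = [relu(v) for v in pre]
sparse = sparse_positive(pre)

print(f"ReLU layer firing:   {firing_fraction(dense):.1%}")
print(f"Sparse layer firing: {firing_fraction(sparse):.1%}")
```

Any downstream computation that skips zero entries touches far fewer values in the sparse case, which is where the efficiency argument comes from.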

Related Insights

A useful mental model for an LLM is a giant matrix where each row is a possible prompt and columns represent next-token probabilities. This matrix is impossibly large but also extremely sparse, as most token combinations are gibberish. The LLM's job is to efficiently compress and approximate this matrix.
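
The matrix picture can be illustrated with a toy lookup table. The sketch below is an assumption-laden simplification: it uses a three-sentence corpus and two-token contexts (both hypothetical), and it stores the table explicitly, whereas an LLM approximates it with a compressed, learned function.

```python
from collections import Counter, defaultdict

def build_sparse_table(corpus, context_len=2):
    """Map each observed context (a tuple of tokens) to next-token counts."""
    table = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(context_len, len(tokens)):
            context = tuple(tokens[i - context_len:i])
            table[context][tokens[i]] += 1
    return table

def next_token_probs(table, context):
    """One 'row' of the matrix: next-token probabilities for a context."""
    counts = table.get(tuple(context))
    if not counts:
        return {}  # unseen context: the row is empty
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

# Hypothetical toy corpus.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]
table = build_sparse_table(corpus)
vocab = {tok for s in corpus for tok in s.split()}

possible_rows = len(vocab) ** 2   # every two-token context
stored_rows = len(table)          # contexts that actually occur

print(f"{stored_rows} of {possible_rows} possible rows are non-empty")
print(next_token_probs(table, ["on", "the"]))
```

Even at this scale most rows are empty; with real vocabularies and long prompts the number of possible rows grows exponentially, which is why the table can only be approximated rather than stored.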

LLMs predict the next token in a sequence. The brain's cortex may function as a general prediction engine capable of "omnidirectional inference"—predicting any missing information from any available subset of inputs, not just what comes next. This offers a more flexible and powerful form of reasoning.

The human brain contains more potential connections than there are atoms in the universe. This immense, dynamic 'configurational space' is the source of its power, not raw processing speed. Silicon chips are fundamentally different and cannot replicate this morphing, high-dimensional architecture.

Pathway's BDH model achieves 97.4% accuracy on extreme Sudoku at roughly one-tenth the cost of LLMs that score 0%. Rather than burning GPU cycles on generating text-based, step-by-step thoughts (chain of thought), it reasons within its internal latent space. This demonstrates a massive economic advantage for non-transformer architectures on complex reasoning tasks.

The model uses a Mixture-of-Experts (MoE) architecture with over 200 billion parameters but activates only a sparse subset of 10 billion for any given task. This design provides the knowledge base of a massive model while keeping inference speed and cost comparable to much smaller models.
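
The routing idea behind this can be sketched in a few lines of Python. This is a generic top-k MoE illustration, not the model's actual router: the 20 scalar "experts", the random router scores, and all function names are hypothetical, chosen so that 1 active expert out of 20 mirrors the 10-billion-of-200-billion ratio. Real MoE models also contain shared (always-active) parameters, so their active fraction is not simply k divided by the expert count.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_scores, k):
    """Weighted sum over the selected experts only; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy setup: 20 scalar experts, 1 active per input.
NUM_EXPERTS, TOP_K = 20, 1
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

y = moe_forward(2.0, experts, scores, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS  # same ratio as 10B of 200B

print(f"output={y}, active fraction={active_fraction:.0%}")
```

The cost of a forward pass scales with the experts that actually run, while the total parameter count (and the knowledge it can hold) scales with all of them.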

Classifying a model as "reasoning" based on a chain-of-thought step is no longer useful. With massive differences in token efficiency, a so-called "reasoning" model can be faster and cheaper than a "non-reasoning" one for a given task. The focus is shifting to a continuous spectrum of capability versus overall cost.

Model architecture decisions directly impact inference performance. AI company Zyphra pre-selects target hardware and then chooses model parameters (such as a hidden dimension divisible by a large power of two) to align with how GPUs split up workloads, maximizing efficiency from day one.
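
As a rough illustration of why factors of two matter, the sketch below checks how cleanly a candidate hidden dimension divides across shards. It is not Zyphra's procedure: the candidate sizes, the 8-way split, and the 64-wide tile are all assumptions made for the example.

```python
def power_of_two_factor(n):
    """Largest power of two that divides n (n must be a positive integer)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n & -n  # isolates the lowest set bit

def splits_evenly(hidden_dim, num_shards, tile=64):
    """True if each shard receives a whole number of `tile`-wide blocks.

    Assumption: `tile` is a hypothetical block width; real kernels and
    hardware have their own preferred sizes.
    """
    return hidden_dim % (num_shards * tile) == 0

candidates = [4000, 4096, 5000, 5120]  # hypothetical hidden dimensions
for dim in candidates:
    print(dim, power_of_two_factor(dim), splits_evenly(dim, num_shards=8))
```

Dimensions such as 4096 and 5120 divide evenly under these assumptions, while 4000 and 5000 leave partial tiles that would need padding or leave hardware idle.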

Performance on knowledge-intensive benchmarks correlates strongly with an MoE model's total parameter count, not its active parameter count. With leading models like Kimi K2 reportedly using only ~3% active parameters, this suggests there is significant room to increase sparsity and efficiency without degrading factual recall.

Chinese AI models like Kimi achieve dramatic cost reductions through specific architectural choices, not just scale. Using a "mixture of experts" design, they use only a fraction of their total parameters for any given task, making them far more efficient to run than the "dense" models common in the West.

A neuroscientist-led startup is growing live neurons on electrodes not just for compute efficiency, but as a platform to discover novel algorithms. By studying how biological networks process information, they identify neuroscience principles that can be used as software plugins to improve current AI models and find successors to the transformer architecture.

Pathway's BDH Model Uses Brain-Like 'Sparse Activations' for Efficient Reasoning | RiffOn