The structure of neural networks with activation functions like ReLU can be modeled by "threshold circuits" (TC circuits). These circuits use majority gates instead of traditional AND/OR gates, providing a formal framework from complexity theory for analyzing the computational power of neural net architectures.
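As a rough illustration of the gate type involved, the sketch below defines a generic threshold gate and derives majority, AND, and OR from it. The function names are hypothetical and chosen for this example; the point is only that a threshold gate compares a weighted sum against a cutoff, much as a single artificial neuron does.

```python
def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of the 0/1 inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return int(total >= threshold)


def majority(inputs):
    """Fire when strictly more than half of the inputs are 1."""
    n = len(inputs)
    return threshold_gate(inputs, [1] * n, n // 2 + 1)


def and_gate(inputs):
    """AND is a threshold gate that requires every input to be 1."""
    return threshold_gate(inputs, [1] * len(inputs), len(inputs))


def or_gate(inputs):
    """OR is a threshold gate that requires at least one input to be 1."""
    return threshold_gate(inputs, [1] * len(inputs), 1)
```

AND and OR fall out as special cases with unit weights and different cutoffs, which is one way to see why threshold gates are at least as expressive as the traditional gate set.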

Related Insights

The success of neural networks on problems like Go and protein folding, long considered computationally intractable, is profound. It suggests our formal understanding of computational hardness, which focuses on worst-case scenarios, may be an incomplete model for how to find useful, approximate solutions in practice.

A technique from cryptography, the Feistel network, builds an invertible transformation out of arbitrary functions that need not be invertible themselves. When applied to neural network layers ("RevNets"), it allows activations from the forward pass to be recomputed during the backward pass instead of stored. This trades extra compute for a massive reduction in memory footprint during training.
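A minimal sketch of the idea, assuming the additive coupling used in reversible residual networks: the input is split into two halves, and each half is updated using a function of the other. The functions `f` and `g` here are arbitrary stand-ins chosen for illustration (neither is invertible on its own), and plain Python lists are used in place of tensors.

```python
def f(v):
    # Stand-in residual function: ReLU followed by scaling (not invertible).
    return [max(0.0, a) * 0.5 for a in v]


def g(v):
    # Second stand-in residual function: elementwise square (not invertible).
    return [a * a for a in v]


def forward(x1, x2):
    """Feistel-style coupling: each half is updated using the other half."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2):
    """Recover the inputs from the outputs by undoing the two updates in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Because `inverse` reconstructs `x1` and `x2` from the block's outputs (up to floating-point rounding), a training loop built this way could discard the inputs after the forward pass and recompute them when gradients are needed.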

Today's AI, particularly neural networks, stems from a long tradition in cognitive science where psychologists used mathematical models to understand human thought. Key advances in neural nets were made by researchers trying to replicate how human minds work, not just build intelligent machines.

The progression from early neural networks to today's massive models is fundamentally driven by exponential growth in available computational power, from the initial move to GPUs to the million-fold increases in compute now devoted to training a single model.

Attempting to interpret every learned circuit in a complex neural network is a futile effort. True understanding comes from describing the system's foundational elements: its architecture, learning rule, loss functions, and the data it was trained on. The emergent complexity is a result of this process.

Unlike transformers, which use dense activations (most neurons fire), Pathway's BDH architecture uses sparse positive activations, where only ~5% of neurons fire at once. This approach is more biologically plausible, mimicking the human brain's energy efficiency and enabling complex reasoning without the massive computational overhead of dense models.

Just as biology deciphers the complex systems created by evolution, mechanistic interpretability seeks to understand the "how" inside neural networks. Instead of treating models as black boxes, it examines their internal parameters and activations to reverse-engineer how they work, moving beyond just measuring their external behavior.

The fundamental primitive for AI chips isn't arbitrary; it's the multiply-accumulate (MAC) operation. This is because it directly maps to the innermost computational loop of matrix multiplication (output += input1 * input2), which is the foundational computation for most neural networks.
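To make the mapping concrete, here is a minimal sketch of naive matrix multiplication in pure Python, with the multiply-accumulate step marked in the innermost loop. The function name `matmul` and the list-of-lists layout are choices made for this example; real hardware and libraries tile and parallelize this loop heavily.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols), both as lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                # One multiply-accumulate (MAC): output += input1 * input2
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out
```

Multiplying a `rows x inner` matrix by an `inner x cols` matrix performs `rows * inner * cols` MACs, which is why MAC throughput is a natural measure of accelerator capacity.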

A key insight from AlphaGo is that a relatively shallow neural network can approximate the result of an incredibly deep and complex search tree. This suggests neural nets can learn to compress sequential, recursive computation into a single, efficient forward pass.

Today's transformers are optimized for matrix multiplication (MatMul) on GPUs. However, as compute scales to distributed clusters, MatMul may not be the most efficient primitive. Future AI architectures could be drastically different, built on new primitives better suited for large-scale, distributed hardware.