Attempting to interpret every learned circuit in a complex neural network is a futile effort. True understanding comes from describing the system's foundational elements: its architecture, learning rule, loss functions, and the data it was trained on. The emergent complexity is a result of this process.
The brain connects abstract, learned concepts (like social status) to innate rewards (like shame or pride) via a "steering subsystem." The cortex learns to predict the responses of this more primitive system, effectively linking new knowledge to hardwired emotional and motivational circuits.
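A minimal sketch of that arrangement, under invented assumptions: a fixed, innate steering_reward function supervises a learned "cortical" predictor that only ever sees abstract features. All names, dimensions, and weights below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def steering_reward(raw_signals):
    """Hard-wired 'steering subsystem': a fixed, genetically specified mapping
    from primitive signals (e.g., pain, sweetness, social warmth) to reward."""
    innate_weights = np.array([-2.0, 1.0, 1.5])        # innate, never learned
    return raw_signals @ innate_weights

# The "cortex" sees only high-dimensional learned features, not the raw signals.
# It learns (here, a delta-rule linear model) to predict the steering subsystem's
# response, which is how abstract concepts end up attached to innate value.
n_features, n_steps, lr = 16, 2000, 0.01
proj = rng.normal(size=(3, n_features))                # how raw signals show up in features
w_cortex = np.zeros(n_features)

for _ in range(n_steps):
    raw = rng.normal(size=3)
    features = raw @ proj + 0.1 * rng.normal(size=n_features)
    target = steering_reward(raw)                      # supervision from the steering system
    prediction = features @ w_cortex
    w_cortex += lr * (target - prediction) * features  # delta-rule update

test_raw = rng.normal(size=3)
print("innate reward:", round(float(steering_reward(test_raw)), 3),
      "| cortical prediction:", round(float((test_raw @ proj) @ w_cortex), 3))
```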
The small size of the human genome is a puzzle: it is nowhere near large enough to specify the brain's wiring in detail. The solution may be that evolution doesn't store a large "pre-trained model." Instead, it uses the limited genomic space to encode a complex set of reward and loss functions, which is a far more compact way to guide a powerful learning algorithm.
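A back-of-envelope comparison of the two storage strategies, assuming roughly 3.1 billion base pairs at 2 bits each versus a 100B-parameter model stored in fp16:

```python
# Rough arithmetic only; both figures are order-of-magnitude assumptions.
genome_bytes = 3.1e9 * 2 / 8          # ~0.78 GB of raw information capacity
model_bytes  = 100e9 * 2              # ~200 GB of fp16 weights

print(f"genome:     {genome_bytes / 1e9:.2f} GB")
print(f"pretrained: {model_bytes / 1e9:.0f} GB")
print(f"ratio:      ~{model_bytes / genome_bytes:.0f}x")
```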
LLMs predict the next token in a sequence. The brain's cortex may function as a general prediction engine capable of "omnidirectional inference"—predicting any missing information from any available subset of inputs, not just what comes next. This offers a more flexible and powerful form of reasoning.
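One way to make the contrast concrete is in how training examples are constructed. The sketch below assumes a particular formulation of "omnidirectional" prediction (hide an arbitrary subset of positions, predict it from the rest); the autoregressive case always conditions on a prefix.

```python
import numpy as np

rng = np.random.default_rng(0)
seq = np.array([5, 12, 7, 3, 9, 1])    # a toy token sequence

def next_token_examples(seq):
    """Autoregressive objective: the context is always a prefix,
    and the target is always the single next token."""
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]

def omnidirectional_examples(seq, n_examples=3, p_hide=0.4):
    """Assumed 'omnidirectional' objective: hide a random subset of positions
    and predict the hidden tokens from whatever remains, in any direction."""
    examples = []
    for _ in range(n_examples):
        hidden = rng.random(len(seq)) < p_hide          # mask can fall anywhere
        context = np.where(hidden, -1, seq)             # -1 marks missing tokens
        examples.append((context, hidden.nonzero()[0], seq[hidden]))
    return examples

print(next_token_examples(seq)[:2])
print(omnidirectional_examples(seq)[0])
```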
Languages like Lean allow mathematical proofs to be automatically verified. This provides a perfect, binary reward signal (correct/incorrect) for a reinforcement learning agent. It transforms the abstract art of mathematics into a well-defined environment, much like a game of Go, that an AI can be trained to master.
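A sketch of how that reward signal could be wired up, assuming a Lean 4 toolchain with `lean` on PATH and that the checker exits non-zero on a failed proof; the wrapper function and example theorem are illustrative, not any particular system's API.

```python
import subprocess, tempfile, os

def proof_reward(theorem_statement: str, candidate_proof: str) -> float:
    """Binary RL reward: 1.0 if the Lean checker accepts the proof, else 0.0."""
    source = f"theorem goal {theorem_statement} := {candidate_proof}\n"
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(["lean", path], capture_output=True, timeout=60)
        return 1.0 if result.returncode == 0 else 0.0
    finally:
        os.remove(path)

# A policy proposing proofs would be trained to maximize exactly this signal:
print(proof_reward("(a b : Nat) : a + b = b + a", "Nat.add_comm a b"))   # 1.0
print(proof_reward("(a b : Nat) : a + b = b + a", "rfl"))                # 0.0
```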
Modern LLMs use a simple form of reinforcement learning that directly rewards successful outcomes. This contrasts with more sophisticated methods, like those in AlphaGo or the brain, which use "value functions" to estimate long-term consequences. It's a mystery why the simpler approach is so effective.
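The difference in credit assignment, sketched on a toy episode with invented numbers: outcome-only training smears one terminal reward across every step, while a value function produces per-step TD errors.

```python
import numpy as np

GAMMA = 0.99

def outcome_only_credit(rewards):
    """Outcome-reward RL (roughly what current LLM fine-tuning uses): every step
    in the episode is credited with the same terminal outcome."""
    terminal = rewards[-1]                      # e.g., 1.0 if the final answer was correct
    return np.full(len(rewards), terminal)

def td_errors(rewards, values):
    """Value-function credit assignment (AlphaGo-style, and plausibly brain-like):
    each step is judged against a learned estimate of long-term consequences."""
    values = np.asarray(values, dtype=float)
    next_values = np.append(values[1:], 0.0)    # V(s'), zero after the terminal state
    return np.asarray(rewards) + GAMMA * next_values - values

episode_rewards = [0.0, 0.0, 0.0, 1.0]          # sparse success signal at the very end
episode_values  = [0.2, 0.4, 0.7, 0.9]          # a learned value function's estimates
print(outcome_only_credit(episode_rewards))     # [1. 1. 1. 1.]
print(td_errors(episode_rewards, episode_values))
```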
"Amortized inference" bakes slow, deliberative reasoning into a fast, single-pass model. While the brain uses a mix, digital minds have a strong incentive to amortize more capabilities. This is because once a capability is baked in, the resulting model can be copied infinitely, unlike a biological brain.
AI models use simple, mathematically clean loss functions. The human brain's superior learning efficiency might stem from evolution hard-coding numerous, complex, and context-specific loss functions that activate at different developmental stages, creating a sophisticated learning curriculum.
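A hedged sketch of what such a curriculum might look like mechanically; the stage names, objectives, and weights below are entirely hypothetical.

```python
# Hypothetical developmental curriculum: which hard-coded loss terms are active,
# and how strongly, depends on the developmental stage.
CURRICULUM = {
    "infant":     {"sensory_prediction": 1.0, "face_detection": 0.8, "language": 0.1},
    "child":      {"sensory_prediction": 0.5, "language": 1.0, "social_approval": 0.6},
    "adolescent": {"language": 0.4, "social_approval": 1.0, "status_modeling": 0.8},
}

def total_loss(stage, losses):
    """Combine per-objective losses with stage-dependent innate weights,
    ignoring objectives that are not measured in this batch."""
    weights = CURRICULUM[stage]
    return sum(w * losses[name] for name, w in weights.items() if name in losses)

batch_losses = {"sensory_prediction": 0.9, "language": 1.4, "social_approval": 0.7}
for stage in CURRICULUM:
    print(stage, round(total_loss(stage, batch_losses), 2))
```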
The brain's hardware limitations, like slow and stochastic neurons, may actually be advantages. These properties seem perfectly suited for probabilistic inference algorithms that rely on sampling, a task that requires explicit, computationally intensive random number generation in digital systems. Hardware and algorithm are likely co-designed.
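A small example of why sampling costs explicit randomness on digital hardware: plain Metropolis sampling for a toy Gaussian model, where every iteration consumes freshly generated random numbers that noisy neurons would arguably provide for free.

```python
import numpy as np

rng = np.random.default_rng(0)   # digital hardware must generate randomness explicitly

def log_posterior(theta, data):
    """Unnormalized log posterior for a toy model: Gaussian likelihood, Gaussian prior."""
    return -0.5 * np.sum((data - theta) ** 2) - 0.5 * theta ** 2

def metropolis(data, n_samples=5000, step=0.5):
    """Plain Metropolis sampling: a random proposal and a random accept/reject
    decision on every single iteration."""
    theta, samples = 0.0, []
    for _ in range(n_samples):
        proposal = theta + step * rng.normal()
        if np.log(rng.random()) < log_posterior(proposal, data) - log_posterior(theta, data):
            theta = proposal
        samples.append(theta)
    return np.array(samples)

data = rng.normal(loc=1.5, scale=1.0, size=20)
samples = metropolis(data)
print("posterior mean estimate:", round(samples[1000:].mean(), 3))
```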
An experiment showed that, given a fixed compute budget, splitting it across a population of 16 agents produced a top performer that beat a single agent trained with the entire budget. This suggests that the co-evolution and diversity of strategies in a multi-agent setup can be more effective than raw computational power alone.
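The budget-splitting setup can be sketched on a toy optimization landscape. This is only an illustration of the comparison, not the cited experiment; the outcome depends on the landscape and seed, but diverse starting points usually give the population's best member the edge here.

```python
import numpy as np

rng = np.random.default_rng(0)
TOTAL_BUDGET = 16_000          # total optimization steps, shared by both setups

def fitness(x):
    """Toy multimodal 'task': many local optima, one broad global peak near x = 2.6."""
    return np.sin(3 * x) + np.exp(-0.5 * (x - 3) ** 2)

def train_agent(x0, steps, sigma=0.05):
    """A single 'agent': simple hill-climbing from its own starting point."""
    x, best = x0, fitness(x0)
    for _ in range(steps):
        candidate = x + sigma * rng.normal()
        if fitness(candidate) > best:
            x, best = candidate, fitness(candidate)
    return best

# Setup A: one agent gets the entire budget.
single = train_agent(rng.uniform(-5, 5), TOTAL_BUDGET)

# Setup B: a population of 16 diverse agents splits the same budget.
population = [train_agent(rng.uniform(-5, 5), TOTAL_BUDGET // 16) for _ in range(16)]

print("single agent:      ", round(single, 3))
print("best of population:", round(max(population), 3))
```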
A novel training method involves adding an auxiliary task for AI models: predicting the neural activity of a human observing the same data. This "brain-augmented" learning could force the model to adopt more human-like internal representations, improving generalization and alignment beyond what simple labels can provide.
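A minimal sketch of the auxiliary objective, assuming a recorded neural-activity vector is available per stimulus; the architecture, loss weight lam, and all shapes below are invented.

```python
import torch
import torch.nn as nn

class BrainAugmentedNet(nn.Module):
    """Ordinary classifier plus an auxiliary head that predicts recorded neural
    activity (e.g., fMRI/MEG features) for the same stimulus."""
    def __init__(self, d_in=128, d_hidden=256, n_classes=10, d_brain=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.classifier = nn.Linear(d_hidden, n_classes)
        self.brain_readout = nn.Linear(d_hidden, d_brain)   # auxiliary head

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.brain_readout(h)

model = BrainAugmentedNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.3                                  # weight of the brain-prediction term (arbitrary)

x = torch.randn(32, 128)                   # stimuli shown to both the model and a human
labels = torch.randint(0, 10, (32,))       # ordinary task labels
brain_activity = torch.randn(32, 64)       # recorded neural responses (stand-in data)

opt.zero_grad()
logits, brain_pred = model(x)
loss = nn.functional.cross_entropy(logits, labels) \
     + lam * nn.functional.mse_loss(brain_pred, brain_activity)
loss.backward()
opt.step()
print("combined loss:", float(loss))
```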
Single-cell brain atlases reveal that subcortical "steering" regions have a vastly greater diversity of cell types than the more uniform cortex. This supports the idea that our innate drives and reflexes are encoded in complex, genetically pre-wired circuits, while the cortex is a more general-purpose learning architecture.
