LLMs operate autoregressively, making one decision (token) at a time without seeing the full problem space. This can lead to hallucinations or dead ends. EBMs are non-autoregressive, allowing them to consider all possible routes simultaneously and select an optimal path, much like using a bird's-eye view of a map to spot a hole in the road before committing to a route.
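The difference can be illustrated with a toy route-finding sketch (the graph and function names are purely illustrative, not from either architecture): a greedy, one-step-at-a-time chooser can walk into a dead end that a global view over all routes avoids.

```python
# Toy route graph: edge weights are step costs. The "shortcut" looks
# cheapest locally but is a dead end (the "hole in the road").
graph = {
    "start": {"shortcut": 1, "detour": 3},
    "shortcut": {},                 # dead end
    "detour": {"goal": 2},
    "goal": {},
}

def greedy(node):
    """Autoregressive-style: commit to the cheapest next step each time."""
    path = [node]
    while graph[node]:
        node = min(graph[node], key=graph[node].get)
        path.append(node)
    return path

def reaches_goal(node, target="goal"):
    """Global view: consider every route before judging reachability."""
    if node == target:
        return True
    return any(reaches_goal(n, target) for n in graph[node])

print(greedy("start"))        # ['start', 'shortcut'] -- stuck at the hole
print(reaches_goal("start"))  # True -- the full view finds the detour
```

The greedy walker gets trapped precisely because it never looks past the next edge; the exhaustive check succeeds because it scores whole routes, which is the intuition behind the "bird's-eye view" claim above.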
A useful mental model for an LLM is a giant matrix where each row is a possible prompt and columns represent next-token probabilities. This matrix is impossibly large but also extremely sparse, as most token combinations are gibberish. The LLM's job is to efficiently compress and approximate this matrix.
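The sparsity of that mental-model matrix is easy to see in a toy version (all names here are illustrative): store only the context rows that actually occur, and estimate next-token probabilities from counts.

```python
from collections import Counter, defaultdict

# Tiny corpus; in the mental model above, each distinct context is a
# "row" and the next-token distribution fills its "columns".
corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1          # only observed rows are ever stored

def next_token_probs(context):
    """Return the normalized next-token distribution for one context."""
    row = counts[context]
    total = sum(row.values())
    return {tok: n / total for tok, n in row.items()}

print(next_token_probs("the"))      # {'cat': 0.666..., 'mat': 0.333...}
```

Almost every conceivable context is absent from `counts`, which is the sparsity point: an LLM's job is to compress this astronomically large, mostly empty table into a model that can also generalize to rows it never saw.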
LLMs predict the next token in a sequence. The brain's cortex may function as a general prediction engine capable of "omnidirectional inference"—predicting any missing information from any available subset of inputs, not just what comes next. This offers a more flexible and powerful form of reasoning.
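A minimal sketch of the contrast (toy counts, hypothetical function names, not any brain or model API): instead of only predicting what comes next, score a masked slot by how well each candidate fits both its left and right neighbors.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

fwd = defaultdict(Counter)          # fwd[a][b]: how often b follows a
for a, b in zip(corpus, corpus[1:]):
    fwd[a][b] += 1

def fill_mask(left, right, vocab):
    # "Omnidirectional" toy: the candidate must fit BOTH neighbors,
    # not just continue from the left context.
    return max(vocab, key=lambda t: fwd[left][t] * fwd[t][right])

vocab = set(corpus)
print(fill_mask("the", "sat", vocab))   # 'cat'
```

A pure next-token predictor conditioned only on "the" would weigh "cat" and "mat" against each other; using the right-hand context as well immediately rules "mat" out, a small instance of predicting missing information from any available subset.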
According to Demis Hassabis, LLMs feel uncreative because they only perform pattern matching. To achieve true, extrapolative creativity like AlphaGo's famous 'Move 37,' models must be paired with a search component that actively explores new parts of the knowledge space beyond the training data.
Billions have been invested in the LLM data center and hardware ecosystem, creating a powerful inertia. For an alternative architecture like EBMs to succeed, it cannot demand a full replacement. Instead, it must position itself as a compatible layer that makes existing LLM investments cheaper and more effective for specific tasks like spatial reasoning.
The argument that LLMs are just "stochastic parrots" is outdated. Current frontier models are trained via Reinforcement Learning, where the signal is not "did you predict the right token?" but "did you get the right answer?" This is based on complex, often qualitative criteria, pushing models beyond simple statistical correlation.
Unlike transformers, which use dense activations (most neurons fire on every input), Pathway's BDH architecture uses sparse positive activations, in which only ~5% of neurons fire at once. This approach is more biologically plausible, mimicking the human brain's energy efficiency and enabling complex reasoning without the massive computational overhead of dense models.
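A minimal sketch of sparse positive activations (this is a generic top-k/ReLU illustration, not Pathway's actual BDH code): clamp activations to be non-negative, then keep only the top ~5% of units.

```python
import numpy as np

def sparse_positive_activation(pre, frac=0.05):
    """Keep only the top ~frac of units, clamped positive; zero the rest."""
    k = max(1, int(len(pre) * frac))
    out = np.maximum(pre, 0.0)        # positive activations only (ReLU)
    losers = np.argsort(out)[:-k]     # indices of all but the top-k units
    out[losers] = 0.0
    return out

rng = np.random.default_rng(0)
pre = rng.normal(size=100)            # dense pre-activations
act = sparse_positive_activation(pre)
print(np.count_nonzero(act))          # at most 5 of 100 units fire
```

Because at most `k` entries survive, downstream layers only need to process a handful of active units per step, which is where the claimed efficiency gain over dense activations comes from.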
Unlike LLMs, which can hallucinate and behave unpredictably in novel situations, EBMs are built to operate under explicit constraints. A human can define a set of rules, and the EBM is forced to respect them, making it a more reliable choice for mission-critical systems like autonomous vehicles or financial trading.
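One simple way to picture hard constraints in an energy-minimization setting (a toy projected-gradient sketch under assumed numbers, not any production EBM): after every descent step, project the state back into the human-defined allowed region.

```python
# Toy energy E(x) = (x - 3)^2 whose unconstrained minimum is x = 3,
# plus a human-specified hard constraint: x must stay >= 5.
def grad_energy(x):
    return 2.0 * (x - 3.0)

x = 10.0
for _ in range(200):
    x -= 0.1 * grad_energy(x)   # descend the energy landscape
    x = max(x, 5.0)             # projection: the model cannot leave the safe set

print(x)   # 5.0 -- the lowest-energy point that satisfies the constraint
```

The model never outputs a state outside the allowed region, by construction; an LLM, by contrast, can only be nudged toward compliance through training or prompting.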
LLMs' intelligence is dependent on the language they are trained on, meaning their reasoning process differs between, for example, English and French. This is unnatural for tasks like spatial reasoning, which are language-agnostic. EBMs operate on an abstract, token-free level, mapping information directly without a language-based intermediary.
EBMs are based on a fundamental principle in physics where systems naturally seek their lowest energy state (e.g., sitting on a couch when tired). The model maps all possible outcomes onto an 'energy landscape,' where the lowest points represent the most probable solutions. This avoids the expensive, token-by-token guessing game played by LLMs.
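The lowest-energy-state principle can be shown with a toy one-dimensional landscape (assumed quadratic energy, illustrative only): rather than guessing token by token, descend the landscape directly toward its minimum.

```python
# Toy energy landscape: E(x) = (x - 3)^2 + 1, with its lowest point
# (the most probable solution) at x = 3.
def grad_energy(x):
    return 2.0 * (x - 3.0)

x = 10.0            # arbitrary starting configuration
lr = 0.1            # step size
for _ in range(200):
    x -= lr * grad_energy(x)   # roll downhill, like the system relaxing

print(round(x, 3))  # converges to the minimum near x = 3
```

Real EBMs minimize over high-dimensional outputs rather than a single scalar, but the mechanism is the same: inference is "relaxation" into a low-energy valley, not a sequence of irreversible token commitments.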
EBMs analyze data to understand its underlying rules, storing this knowledge as inspectable 'latent variables' that shape an energy landscape. This contrasts with LLMs, which are black boxes whose reasoning process is opaque. With EBMs, you can observe the model's internal state in real time to see what it has learned.