We scan new podcasts and send you the top 5 insights daily.
EBMs analyze data to infer its underlying rules, encoding what they learn in inspectable latent variables that shape an energy landscape. This contrasts with LLMs, which are black boxes whose reasoning process is opaque. With an EBM, you can observe the model's internal state in real time to see what it has learned.
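A minimal sketch of what "inspectable latent variables" means in practice, assuming a toy quadratic energy function (the shapes, weights, and learning rate here are illustrative assumptions, not any specific EBM): for a given input, gradient descent on the energy recovers the latent state, which can then be read off directly.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # maps a 3-dim latent z to 2-dim observation space

def energy(x, z):
    """Toy quadratic energy: reconstruction error plus a small prior on z."""
    return np.sum((x - W @ z) ** 2) + 0.1 * np.sum(z ** 2)

def infer_latent(x, steps=200, lr=0.05):
    """Find the lowest-energy latent for x by gradient descent.
    The resulting z is the 'inspectable internal state' for this input."""
    z = np.zeros(3)
    for _ in range(steps):
        grad = -2 * W.T @ (x - W @ z) + 0.2 * z  # dE/dz
        z -= lr * grad
    return z

x = np.array([1.0, -0.5])
z_star = infer_latent(x)
print("inferred latent:", z_star)
print("energy at z*:", energy(x, z_star))
```

Because inference is explicit minimization rather than a hidden forward pass, the recovered `z_star` can be logged and examined at every step.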
Contrary to fears that reinforcement learning would push models' internal reasoning (chain-of-thought) into an unexplainable shorthand, OpenAI has not seen significant evidence of this "neuralese." Models still predominantly use plain English for their internal monologue, a pleasantly surprising empirical finding that preserves a crucial method for safety research and interpretability.
Just as biology deciphers the complex systems created by evolution, mechanistic interpretability seeks to understand the "how" inside neural networks. Instead of treating models as black boxes, it examines their internal parameters and activations to reverse-engineer how they work, moving beyond just measuring their external behavior.
Large Language Models are limited because they lack an understanding of the physical world. The next evolution is 'World Models'—AI trained on real-world sensory data to understand physics, space, and context. This is the foundational technology required to unlock physical AI like advanced robotics.
Contrary to fears, interpretability techniques for Transformers seem to work well on new architectures like Mamba and Mixture-of-Experts. These architectures may even offer novel "affordances," such as interpretable routing paths in MoEs, that could make understanding models easier, not harder.
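To make the "interpretable routing" affordance concrete, here is a hedged sketch of a top-2 MoE router (the dimensions, random weights, and `route` helper are assumptions for illustration): the expert choices and their weights are directly observable per token, with no probing required.

```python
import numpy as np

rng = np.random.default_rng(1)
n_experts, d_model = 4, 8
W_router = rng.normal(size=(d_model, n_experts))  # toy routing matrix

def route(token_vec, k=2):
    """Return the indices and softmax weights of the top-k experts."""
    logits = token_vec @ W_router
    top = np.argsort(logits)[-k:][::-1]           # chosen experts, best first
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()

tokens = rng.normal(size=(3, d_model))            # a toy 3-token sequence
for i, t in enumerate(tokens):
    experts, weights = route(t)
    print(f"token {i} -> experts {experts.tolist()}, weights {weights.round(2)}")
```

Each token's routing path is a small, discrete decision that can be logged and audited, which is the kind of built-in interpretability hook the insight describes.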
Unlike LLMs, which can hallucinate and behave unpredictably in novel situations, EBMs have an architecture designed to be constrained. A human can define a set of rules or constraints, and the EBM is forced to follow them, making it a more reliable choice for mission-critical systems like autonomous vehicles or financial trading.
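A hedged sketch of how human-defined constraints can be enforced in an energy-based setup (the driving scenario, penalty scheme, and numbers are invented for illustration): a rule becomes an infinite energy penalty, so the minimum-energy choice can never violate it.

```python
import numpy as np

def base_energy(speed):
    # Toy preference: the model "likes" driving fast (lower energy = preferred).
    return -speed

def constrained_energy(speed, speed_limit=60.0):
    # A hard human-written constraint: speeds above the limit get infinite energy.
    return base_energy(speed) + (np.inf if speed > speed_limit else 0.0)

candidates = np.array([30.0, 55.0, 60.0, 80.0, 120.0])
energies = [constrained_energy(s) for s in candidates]
best = candidates[int(np.argmin(energies))]
print("chosen speed:", best)  # prints 60.0 -- never exceeds the limit
```

The guarantee comes from the structure of the selection itself: no matter how attractive a violating option looks to the base energy, it can never be the minimum.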
LLMs' intelligence is dependent on the language they are trained on, meaning their reasoning process differs between, for example, English and French. This is unnatural for tasks like spatial reasoning, which are language-agnostic. EBMs operate on an abstract, token-free level, mapping information directly without a language-based intermediary.
LLMs operate autoregressively, making one decision (token) at a time without seeing the full problem space. This can lead to hallucinations or dead ends. EBMs are non-autoregressive, allowing them to see all possible routes simultaneously and select an optimal path, much like having a bird's-eye view of a map to avoid a hole in the road.
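The contrast above can be sketched on a toy route-finding problem (the graph and edge costs are made up): a greedy, step-at-a-time chooser commits to the locally cheapest edge and ends up on an expensive road, while scoring complete routes at once, as an EBM effectively does, finds the globally best path.

```python
graph = {
    "A": {"B": 1, "C": 3},   # edge costs from each node
    "B": {"D": 10},          # the cheap first step leads onto a costly road
    "C": {"D": 1},
    "D": {},
}

def greedy_route(start, goal):
    """Autoregressive-style: commit to the cheapest next edge at each step."""
    path, node = [start], start
    while node != goal:
        node = min(graph[node], key=graph[node].get)
        path.append(node)
    return path

def all_routes(node, goal):
    """Enumerate every complete route, so they can be scored as wholes."""
    if node == goal:
        return [[goal]]
    return [[node] + rest for nxt in graph[node] for rest in all_routes(nxt, goal)]

def route_energy(path):
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

greedy = greedy_route("A", "D")
best = min(all_routes("A", "D"), key=route_energy)
print(greedy, route_energy(greedy))  # ['A', 'B', 'D'] 11
print(best, route_energy(best))      # ['A', 'C', 'D'] 4
```

Enumerating all routes is only feasible for toy graphs; the point is the bird's-eye scoring of whole candidates versus irreversible one-step commitments.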
Language models work by identifying subtle, implicit patterns in human language that even linguists cannot fully articulate. Their success broadens our definition of "knowledge" to include systems that can embody and use information without the explicit, symbolic understanding that humans traditionally require.
EBMs are based on a fundamental principle in physics where systems naturally seek their lowest energy state (e.g., sitting on a couch when tired). The model maps all possible outcomes onto an 'energy landscape,' where the lowest points represent the most probable solutions. This avoids the expensive, token-by-token guessing game played by LLMs.
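The physics analogy maps directly to code (the outcomes and energy values below are made up for illustration): a Boltzmann distribution, p ∝ exp(−E), turns an energy landscape into probabilities, so the lowest-energy outcome is automatically the most probable one.

```python
import numpy as np

outcomes = ["couch", "chair", "standing"]
E = np.array([0.5, 1.2, 3.0])   # lower energy = more "restful" state

p = np.exp(-E)
p /= p.sum()                    # normalize into a probability distribution

for o, e, prob in zip(outcomes, E, p):
    print(f"{o:>9}: E={e:.1f}  p={prob:.2f}")
```

Because the whole landscape is scored at once and then normalized, there is no token-by-token guessing: the most probable solution is simply the landscape's lowest point.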
There is now strong evidence that LLMs are not just correlating tokens but are developing sophisticated internal world models. Techniques like sparse autoencoders untangle the network's dense activations, revealing distinct, manipulable concepts like "Golden Gate Bridge." This points to a deeper, conceptual understanding within the models.
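A forward-pass-only sketch of the sparse autoencoder idea (the weights here are random stand-ins; a real SAE is trained on model activations with a reconstruction-plus-L1-sparsity loss, and all dimensions are assumptions): a dense activation vector gets rewritten as a handful of active features in an overcomplete dictionary.

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_features = 16, 64             # overcomplete feature dictionary

W_enc = rng.normal(size=(d_model, d_features)) / np.sqrt(d_model)
b_enc = -0.5 * np.ones(d_features)       # negative bias encourages sparsity
W_dec = rng.normal(size=(d_features, d_model)) / np.sqrt(d_features)

def sae(activation):
    """Encode to sparse non-negative features, then reconstruct the activation."""
    features = np.maximum(0.0, activation @ W_enc + b_enc)  # ReLU
    reconstruction = features @ W_dec
    return features, reconstruction

x = rng.normal(size=d_model)             # stand-in for one model activation
f, x_hat = sae(x)
active = np.flatnonzero(f)
print(f"{len(active)} of {d_features} features active:", active.tolist())
```

In a trained SAE, each active feature index tends to correspond to a human-recognizable concept, which is what makes the decomposition interpretable.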