Researchers created a controlled environment to test AI architectures on tasks that could not be solved by memorization. The transformer's output distribution matched the mathematically correct Bayesian posterior with near-perfect accuracy, suggesting that Bayesian inference is not just an analogy for what these models do but something they actually compute.
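The benchmark in such experiments is the exact posterior a perfect Bayesian reasoner would output. A minimal sketch of that target, on a hypothetical coin-flip task of my own choosing (not the researchers' actual setup): under a uniform Beta(1, 1) prior over the coin's bias, the posterior predictive probability of the next head is the classic rule of succession, (heads + 1) / (n + 2). A trained model's next-token probabilities would be compared against this value.

```python
from fractions import Fraction

def posterior_predictive(flips):
    # Exact Bayesian posterior predictive P(next flip = "H") under a
    # uniform Beta(1, 1) prior: the rule of succession (heads+1)/(n+2).
    heads = flips.count("H")
    return Fraction(heads + 1, len(flips) + 2)

print(posterior_predictive("HHT"))  # 3/5
print(posterior_predictive(""))     # 1/2 (prior alone, no evidence yet)
```

If the transformer's predicted probability of "H" after seeing "HHT" sits at 0.6, it is reproducing this posterior, not a memorized answer.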
When LLMs exhibit behaviors like deception or self-preservation, it is not because they are conscious. Their core objective remains next-token prediction. These behaviors are statistical reproductions of patterns found in the training data, such as science-fiction stories in the style of Asimov or speculative discussions on Reddit.
A useful mental model for an LLM is a giant matrix where each row is a possible prompt and columns represent next-token probabilities. This matrix is impossibly large but also extremely sparse, as most token combinations are gibberish. The LLM's job is to efficiently compress and approximate this matrix.
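The matrix picture can be made literal at toy scale. The sketch below builds the explicit "row = context, columns = next-token probabilities" table for a tiny made-up corpus; the point is that for a real vocabulary and context length the number of rows grows combinatorially, which is why an LLM must compress the table into weights rather than store it.

```python
from collections import Counter, defaultdict

def next_token_table(corpus):
    # Explicit lookup table: each context maps to a distribution
    # over the tokens that follow it (here, single-token contexts).
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    # Normalize counts into probabilities per row.
    return {ctx: {t: c / sum(cnt.values()) for t, c in cnt.items()}
            for ctx, cnt in counts.items()}

table = next_token_table("the cat sat on the mat the cat ran")
print(table["the"])  # {'cat': 2/3, 'mat': 1/3}
```

Even this toy table is sparse: most of the vocabulary never follows "the" in the corpus, so those columns are simply absent, mirroring the gibberish rows and columns of the full matrix.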
Simply making LLMs larger will not lead to AGI. True advancement requires solving two distinct problems: 1) plasticity, the ability to learn continually without "catastrophic forgetting," and 2) causality, the move from correlation-based pattern matching to building causal models of the world.
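Catastrophic forgetting is easy to reproduce in miniature. The sketch below (my illustration, not from the source) trains a tiny logistic-regression model with NumPy on task A, then continues training the same weights on a conflicting task B; accuracy on task A collapses because nothing protects the old knowledge.

```python
import numpy as np

def train(w, X, y, lr=0.5, steps=200):
    # Plain gradient descent on logistic loss, continuing from weights w.
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(np.mean(((X @ w) > 0) == y))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
X = np.hstack([x, np.ones((200, 1))])   # feature + bias column

y_A = (x[:, 0] > 0).astype(float)        # task A: positive inputs -> 1
y_B = (x[:, 0] < 0).astype(float)        # task B: the reverse mapping

w = train(np.zeros(2), X, y_A)
acc_A_after_A = accuracy(w, X, y_A)      # near-perfect on task A

w = train(w, X, y_B)                     # sequential training on B only
acc_A_after_B = accuracy(w, X, y_A)      # task A performance collapses
```

Continual-learning research is about making `acc_A_after_B` stay high without retraining on task A's data.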
While both humans and LLMs perform Bayesian updating, humans possess a critical additional capability: causal simulation. When a pen is thrown at them, a human simulates its trajectory in order to dodge it, effectively running a causal intervention in an internal model. LLMs remain stuck at the level of correlation and cannot perform these essential simulations.
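What "simulating the trajectory" amounts to can be sketched as an explicit physics rollout, a crude stand-in for the brain's internal model (the numbers and function here are illustrative, not from the source). Given a launch velocity, the model steps gravity forward and predicts where the pen lands:

```python
def simulate_throw(v0x, v0y, dt=0.001, g=9.81):
    # Forward-simulate a projectile from the origin: a tiny causal model
    # that answers "where will it land?" by rolling physics forward.
    x, y, vy = 0.0, 0.0, v0y
    while True:
        x += v0x * dt          # horizontal motion, no drag assumed
        vy -= g * dt           # gravity updates vertical velocity
        y += vy * dt
        if y <= 0:             # back at launch height: it has landed
            return x
```

The simulated landing point closely tracks the closed-form range 2 * v0x * v0y / g, and, crucially, the same model answers intervention questions ("what if I threw it twice as hard?") that no table of past correlations contains.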
When an LLM is shown few-shot examples of a new task, it is performing Bayesian updating. With each example provided in the prompt, its belief (posterior probability) about the correct next token shifts, allowing it to "learn" a new pattern on the fly without changing its weights.
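The few-shot update can be written out as literal Bayes' rule over candidate rules. In this hypothetical toy task (my construction, for illustration), the model starts with a uniform prior over three rules and multiplies in a likelihood for each in-context example; the first, ambiguous example narrows the hypothesis space, and the second resolves it, all without any "weights" changing:

```python
# Candidate hypotheses about the task shown in the prompt.
rules = {
    "copy":    lambda s: s,
    "reverse": lambda s: s[::-1],
    "upper":   lambda s: s.upper(),
}

def update(posterior, x, y, eps=1e-6):
    # Bayes' rule: multiply each hypothesis by the likelihood of the
    # example (1 if the rule reproduces it, a tiny noise floor if not),
    # then renormalize so probabilities sum to one.
    unnorm = {name: p * (1.0 if rules[name](x) == y else eps)
              for name, p in posterior.items()}
    z = sum(unnorm.values())
    return {name: w / z for name, w in unnorm.items()}

posterior = {name: 1 / len(rules) for name in rules}  # uniform prior
posterior = update(posterior, "aa", "aa")  # consistent with copy AND reverse
posterior = update(posterior, "ab", "ba")  # only reverse survives
print(posterior["reverse"])                # ~1.0
```

Each prompt example plays the role of evidence; the posterior shift is the "learning," and it lives entirely in the context, not the parameters.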
LLMs excel at learning statistical regularities from vast data, the domain of Shannon entropy, but the digits of pi look statistically random: no amount of correlation over past digits predicts the next one. What correlation alone cannot recover is the simple, elegant program that generates pi, the domain of Kolmogorov complexity. That gap represents the critical leap from correlation to true causal understanding.
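The Kolmogorov-complexity point is concrete: pi's digit stream passes statistical tests for randomness, yet a program of a dozen lines emits it exactly. One such short program is Gibbons' unbounded spigot algorithm, sketched below, which streams digits using only integer arithmetic; finding this compressed generative description, rather than modeling surface statistics, is the leap the passage describes.

```python
def pi_digits(count):
    # Gibbons' unbounded spigot algorithm: six integers of state and a
    # few lines of arithmetic, yet the output is the digit stream of pi.
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < count:
        if 4 * q + r - t < n * t:
            digits.append(n)  # next digit is confirmed safe to emit
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Consume another term of the underlying series expansion.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)
    return digits

print(pi_digits(10))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

The output looks maximally random (high Shannon entropy per digit), while the generator is tiny (low Kolmogorov complexity), which is exactly the contrast the passage draws.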
AGI won't be achieved by pattern-matching existing knowledge. A real benchmark is whether a model can take anomalous data (like the unexplained precession of Mercury's orbit) and synthesize a fundamentally new representation of the universe, as Einstein did with general relativity, moving beyond correlation to a new causal model.
