We scan new podcasts and send you the top 5 insights daily.
The seemingly simple task of next-token prediction, when perfected, requires a model to understand concepts as deeply as the source. To accurately predict what Einstein would say in a new situation, a system must be as intelligent as Einstein. This is the core argument that prediction is fundamental to intelligence.
A core debate in AI is whether LLMs, which are text prediction engines, can achieve true intelligence. Critics argue they cannot because they lack a model of the real world. This prevents them from making meaningful, context-aware predictions about future events—a limitation that more data alone may not solve.
LLMs predict the next token in a sequence. The brain's cortex may function as a general prediction engine capable of "omnidirectional inference"—predicting any missing information from any available subset of inputs, not just what comes next. This offers a more flexible and powerful form of reasoning.
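A toy sketch of the difference, using a made-up 12-word corpus and bigram counts rather than a real model: a next-token predictor conditions only on leftward context, while omnidirectional inference can condition on everything surrounding a gap.

```python
from collections import Counter, defaultdict

# Invented toy corpus for illustration only.
corpus = "the cat sat on the mat the dog ran to the park".split()

fwd = defaultdict(Counter)    # next-token style: left neighbor only
both = defaultdict(Counter)   # omnidirectional style: both neighbors

for i in range(1, len(corpus)):
    fwd[corpus[i - 1]][corpus[i]] += 1
for i in range(1, len(corpus) - 1):
    both[(corpus[i - 1], corpus[i + 1])][corpus[i]] += 1

# Fill the blank in "the ___ ran":
print(fwd["the"].most_common())            # left context alone: four equally likely words
print(both[("the", "ran")].most_common(1)) # → [('dog', 1)] — both sides pin it down
```

With only left context, "the" is followed by four different words; adding the right-hand context resolves the gap uniquely, which is the flexibility the cortex analogy points at.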
The complexity in LLMs isn't intelligence emerging in silicon; it reflects our own. These models are deep because they encode the vast, causally powerful structure of human language and culture. We are looking at a high-resolution imprint of our own collective mind.
Reinforcement learning incentivizes AIs to find the right answer, not just mimic human text. This leads to them developing their own internal "dialect" for reasoning—a chain of thought that is effective but increasingly incomprehensible and alien to human observers.
Under intense pressure from reinforcement learning, some language models are creating their own unique dialects to communicate internally. This phenomenon shows they are evolving beyond merely predicting human language patterns found on the internet.
The argument that LLMs are just "stochastic parrots" is outdated. Current frontier models are trained via Reinforcement Learning, where the signal is not "did you predict the right token?" but "did you get the right answer?" This is based on complex, often qualitative criteria, pushing models beyond simple statistical correlation.
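A toy illustration of the two training signals; the function names and scoring scheme are invented for this sketch, not taken from any real RL framework.

```python
def token_mimicry_loss(output, reference):
    # Pretraining-style signal: penalize every token that differs from the reference text.
    return sum(a != b for a, b in zip(output.split(), reference.split()))

def outcome_reward(output, correct_answer):
    # RL-style signal: only whether the final answer is right matters, not the wording.
    return 1.0 if output.split()[-1] == correct_answer else 0.0

reference = "2 plus 2 equals 4"
novel = "adding them gives 4"   # different wording, right answer

print(token_mimicry_loss(novel, reference))  # 4 — every compared token mismatches
print(outcome_reward(novel, "4"))            # 1.0 — full reward despite novel phrasing
```

The point of the contrast: a mimicry signal punishes any departure from human text, while an outcome signal is indifferent to phrasing, which is exactly the pressure that lets models drift toward their own reasoning style.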
An LLM's core function is predicting the next word. When it encounters text to which it assigned low probability, that text registers as surprising. This mechanism gives it an innate ability to flag "interesting" or novel concepts within a body of text.
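This notion of surprise has a standard information-theoretic form: surprisal = -log2(p). A minimal sketch with made-up probabilities (illustrative numbers, not real model output):

```python
import math

# Hypothetical next-word probabilities after "The capital of France is".
probs = {"Paris": 0.92, "Lyon": 0.05, "bananas": 0.0001}

def surprisal(p):
    """Information content in bits: rare continuations score high."""
    return -math.log2(p)

for word, p in probs.items():
    flag = "  <- surprising" if surprisal(p) > 5 else ""
    print(f"{word:8s} {surprisal(p):6.2f} bits{flag}")
```

Expected continuations carry almost no information; a very low-probability continuation carries many bits, which is the signal a prediction engine can use to mark novelty.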
It's unsettling to trust an AI that's just predicting the next word. The best approach is to accept this as a functional paradox, similar to how we trust gravity without fully understanding its origins. Maintain healthy skepticism about outputs, but embrace the technology's emergent capabilities to use it as an effective thought partner.
Interpretability research increasingly suggests that LLMs are not just correlating tokens but are developing sophisticated internal world models. Techniques like sparse autoencoders untangle the network's dense activations, revealing distinct, manipulable concepts like "Golden Gate Bridge." This is strong evidence of a deeper, conceptual understanding within the models.
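A minimal sketch of the sparse-autoencoder shape. The weights here are random and the dimensions invented; in real interpretability work, the encoder and decoder are trained to reconstruct a model's activations under an L1 sparsity penalty, so each feature ends up corresponding to a human-legible concept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a 16-dim activation vector, an overcomplete 64-feature dictionary.
d_model, d_features = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_features))
W_dec = rng.normal(0, 0.1, (d_features, d_model))
b_enc = np.zeros(d_features)

def encode(x):
    # ReLU zeroes out negatively-activated features, the source of sparsity.
    return np.maximum(0, x @ W_enc + b_enc)

def decode(f):
    return f @ W_dec

x = rng.normal(size=d_model)   # one activation vector
f = encode(x)                  # feature activations (sparse once trained)
x_hat = decode(f)              # reconstruction

# "Steering" analogy: clamp one feature high and the reconstruction shifts,
# the way amplifying a "Golden Gate Bridge" feature steers model output.
f_steered = f.copy()
f_steered[7] = 10.0
print("reconstruction shift:", float(np.linalg.norm(decode(f_steered) - x_hat)))
```

Even this untrained skeleton shows the key design choice: an overcomplete, non-negative feature basis in which individual directions can be read and manipulated.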
The 2017 introduction of "transformers" revolutionized AI. Instead of assigning each word a fixed meaning, models learn the contextual relationships between all the words in a sequence. This allowed AI to predict the next word without needing a formal dictionary, leading to more generalist capabilities.
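The mechanism behind that contextual learning is attention. A minimal numpy sketch of scaled dot-product attention, with toy dimensions and random vectors standing in for learned word representations:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, the core operation of the transformer."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # relevance of each word to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows are probability distributions
    return weights @ V                               # context-weighted mix of the other words

# Three toy 4-dim word vectors; each position attends over the whole sequence,
# so a word's representation depends on its context, not a fixed definition.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))
out = attention(X, X, X)
print(out.shape)  # (3, 4)
```

Each output row is a blend of every input row, weighted by learned relevance, which is why the same word can mean different things in different sentences.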