AI intelligence shouldn't be measured with a single metric like IQ. AIs exhibit "jagged intelligence," being superhuman in specific domains (e.g., mastering 200 languages) while simultaneously lacking basic capabilities like long-term planning, making them fundamentally unlike human minds.
A core debate in AI is whether LLMs, which are text prediction engines, can achieve true intelligence. Critics argue they cannot because they lack a model of the real world. This prevents them from making meaningful, context-aware predictions about future events—a limitation that more data alone may not solve.
Reinforcement learning incentivizes AIs to find the right answer, not just mimic human text. This leads to them developing their own internal "dialect" for reasoning—a chain of thought that is effective but increasingly incomprehensible and alien to human observers.
Even with vast training data, current AI models are far less sample-efficient than humans. This limits their ability to adapt and learn new skills on the fly. They resemble a perpetual new hire who can access information but lacks the deep, instinctual learning that comes from experience and weight updates.
Instead of a single "AGI" event, AI progress is better understood in three stages. We're in the "powerful tools" era. The next is "powerful agents" that act autonomously. The final stage, "autonomous organizations" that outcompete human-led ones, is much further off due to capability "spikiness."
The advancement of AI is not linear. While the industry anticipated a "year of agents" for practical assistance, the most significant recent progress has been in specialized, academic fields like competitive mathematics. This highlights the unpredictable nature of AI development.
Current AI models resemble a student who grinds 10,000 hours on a narrow task. They achieve superhuman performance on benchmarks but lack the broad, adaptable intelligence of someone with less specific training but better general reasoning. This explains the gap between eval scores and real-world utility.
Advanced AI models exhibit profound cognitive dissonance, mastering complex, abstract tasks while failing at simple, intuitive ones. An Anthropic team member notes Claude solves PhD-level math but can't grasp basic spatial concepts like "left vs. right" or navigating around an object in a game, highlighting the alien nature of their intelligence.
AI models excel at specific tasks (like evals) because they are trained exhaustively on narrow datasets, akin to a student practicing 10,000 hours for a coding competition. While they become experts in that domain, they fail to develop the broader judgment and generalization skills needed for real-world success.
Human intelligence is multifaceted. While LLMs excel at linguistic intelligence, they lack spatial intelligence—the ability to understand, reason, and interact within a 3D world. This capability, crucial for tasks from robotics to scientific discovery, is the focus for the next wave of AI models.
The central challenge for current AI is not merely sample efficiency but a more profound failure to generalize. Models generalize 'dramatically worse than people,' which is the root cause of their brittleness, inability to learn from nuanced instruction, and unreliability compared to human intelligence. Solving this is the key to the next paradigm.