Future gains in AI expressivity will come not from stacking more identical layers, but from 'nesting' levels that update at different frequencies. This lets some parts of the system adapt rapidly (like working memory) while others preserve core knowledge (like long-term memory), mimicking human cognition.
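
To make the idea of levels with different update frequencies concrete, here is a toy sketch in Python. It is not the Nested Learning method itself: the `Level` class, the `period` and `lr` parameters, and the scalar "model" (a sum of two weights) are all hypothetical choices made for illustration. Each level sees the same error signal, but a fast level applies it every step while a slow level accumulates it and updates only occasionally.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """Hypothetical memory level: one scalar weight with its own update period."""
    name: str
    period: int             # apply an update once every `period` steps
    lr: float               # learning rate for this level (illustrative value)
    weight: float = 0.0
    grad_sum: float = 0.0   # gradient accumulated since the last update
    updates: int = 0        # how many times this level has actually changed

    def observe(self, grad: float, step: int) -> None:
        # Every level sees every gradient, but only acts on its own schedule.
        self.grad_sum += grad
        if step % self.period == 0:
            self.weight -= self.lr * self.grad_sum / self.period
            self.grad_sum = 0.0
            self.updates += 1


def train(levels, targets):
    """Toy model: prediction is the sum of level weights, loss is squared error."""
    for step, target in enumerate(targets, start=1):
        prediction = sum(level.weight for level in levels)
        error = prediction - target   # gradient of 0.5 * error**2 w.r.t. each weight
        for level in levels:
            level.observe(error, step)
    return levels


levels = train(
    [Level("fast", period=1, lr=0.5), Level("slow", period=10, lr=0.05)],
    targets=[1.0] * 100,
)
for level in levels:
    print(level.name, level.updates)
```

Over 100 steps the fast level changes 100 times and the slow level only 10 times, which is the working-memory versus long-term-memory split described above, reduced to its simplest form.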

Related Insights

Significant opportunity exists in re-architecting how AI models work. Instead of building ever-larger single models, the focus is shifting to creating networks of smaller, specialized models that collaborate, which can drastically reduce the cost per token produced.

Solving key AI weaknesses like continual learning or robust reasoning isn't just a matter of bigger models or more data. Shane Legg argues it requires fundamental algorithmic and architectural changes, such as building new processes for integrating information over time, akin to an episodic memory.

Broad improvements in AI's general reasoning are plateauing due to data saturation. The next major phase is vertical specialization. We will see an "explosion" of different models becoming superhuman in highly specific domains like chemistry or physics, rather than one model getting slightly better at everything.

Instead of just expanding context windows, the next architectural shift is toward models that learn to manage their own context. Inspired by Recursive Language Models (RLMs), these agents will actively retrieve, transform, and store information in a persistent state, enabling more effective long-horizon reasoning.

Contrary to the prevailing 'scaling laws' narrative, leaders at Z.AI believe that simply adding more data and compute to current Transformer architectures yields diminishing returns. They operate under the conviction that a fundamental performance 'wall' exists, necessitating research into new architectures for the next leap in capability.

The key to continual learning is not just a longer context window, but a new architecture with a spectrum of memory types. "Nested learning" proposes a model with different layers that update at different frequencies—from transient working memory to persistent core knowledge—mimicking how humans learn without catastrophic forgetting.

New AI models are moving away from brute-force computation. By selectively focusing on relevant data, much like the human brain indexes memories, they can achieve massive performance gains and cost reductions, overcoming a major bottleneck in current architectures.

Current AI development focuses on "vertical scaling" (bigger models), akin to early humans getting smarter individually. The real breakthrough, like humanity's invention of language, will come from "horizontal scaling"—enabling AI agents to share knowledge and collaborate.

A major flaw in current AI is that models are frozen after training and don't learn from new interactions. "Nested Learning," a new technique from Google, offers a path for models to continually update, mimicking a key aspect of human intelligence and overcoming this static limitation.

Recent AI breakthroughs aren't just from better models, but from clever 'architecture' or 'scaffolding' around them. For example, Claude Code 'cheats' its context window limit by taking notes, clearing its memory, and then reading the notes to resume work. This architectural innovation drives performance.
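
The note-taking pattern described above can be sketched as a small loop. This is an illustration under stated assumptions, not how Claude Code is actually implemented: `run_with_notes`, `max_context`, and the one-line "summary" string are hypothetical, and a real system would have the model write the notes rather than a format string.

```python
def run_with_notes(events, max_context=4):
    """Process events with a bounded context, spilling to notes when it fills up."""
    context = []   # bounded working context (stands in for the context window)
    notes = []     # persistent notes that survive a context reset
    resets = 0
    for event in events:
        if len(context) >= max_context:
            # 1. Take notes on what is currently in context (stubbed summary).
            notes.append(f"{len(context)} items, last was {context[-1]!r}")
            # 2. Clear the context, then 3. read the latest note back in.
            context = [f"notes: {notes[-1]}"]
            resets += 1
        context.append(event)
    return context, notes, resets


context, notes, resets = run_with_notes([f"step{i}" for i in range(1, 11)])
print(resets, len(notes), context[-1])
```

The context never grows beyond `max_context` entries, yet the work can continue past that limit because each reset carries a compact note forward.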

AI Scaling Will Shift From Stacking Layers to Nesting Different Update Frequencies | RiffOn