While language models are becoming incrementally better at conversation, the next significant leap in AI is defined by multimodal understanding and the ability to perform tasks, such as navigating websites. This shift from conversational prowess to agentic action marks the new frontier for a true "step change" in AI capabilities.

Related Insights

Language is just one 'keyhole' into intelligence. True artificial general intelligence (AGI) requires 'world modeling'—a spatial intelligence that understands geometry, physics, and actions. This capability to represent and interact with the state of the world is the next critical phase of AI development beyond current language models.

The true evolution of voice AI is not just adding voice commands to screen-based interfaces. It's about building agents so trustworthy they eliminate the need for screens for many tasks. This shift from hybrid voice/screen interaction to a screenless future is the next major leap in user modality.

The evolution from simple voice assistants to 'omni intelligence' marks a critical shift where AI not only understands commands but can also take direct action through connected software and hardware. This capability, seen in new smart home and automotive applications, will embed intelligent automation into our physical environments.

The next frontier for conversational AI is not just better text, but "Generative UI"—the ability to respond with interactive components. Instead of describing the weather, an AI can present a weather widget, merging the flexibility of chat with the richness of a graphical interface.

Elias Torres argues that the current AI paradigm, which focuses on tools that assist humans (e.g., summarizers, drafters), is fundamentally limited. He believes true value is unlocked when you can instruct an AI to perform a task *infinitely* on its own, without requiring a human to type into a chat box for every action.

The perceived limits of today's AI are not inherent to the models themselves but to our failure to build the right "agentic scaffold" around them. There's a "model capability overhang" where much more potential can be unlocked with better prompting, context engineering, and tool integrations.

The recent leap in AI coding isn't solely from a more powerful base model. The true innovation is a product layer that enables agent-like behavior: the system constantly evaluates and refines its own output, leading to far more complex and complete results than the LLM could achieve alone.

While AI models excel at gathering and synthesizing information ('knowing'), they are not yet reliable at executing actions in the real world ('doing'). True agentic systems require bridging this gap by adding crucial layers of validation and human intervention to ensure tasks are performed correctly and safely.

Human intelligence is multifaceted. While LLMs excel at linguistic intelligence, they lack spatial intelligence—the ability to understand, reason, and interact within a 3D world. This capability, crucial for tasks from robotics to scientific discovery, is the focus for the next wave of AI models.

The future of AI is not just humans talking to AI, but a world where personal agents communicate directly with business agents (e.g., your agent negotiating a loan with a bank's agent). This will necessitate new communication protocols and guardrails, creating a societal transformation comparable to the early internet.

True AI Breakthroughs Are No Longer About Better Chat, But About Agentic Capabilities | RiffOn