Human communication is returning to its oral and visual roots. Text, a low-dimensional medium, was a temporary necessity for scalable knowledge storage—a 'parenthesis' in history. As AI makes creating rich media as easy as writing, society will default back to more natural, higher-bandwidth formats like audio and video.
Don't view generative AI video as just a way to make traditional films more efficiently. Ben Horowitz sees it as a fundamentally new creative medium, much as film was relative to theater. It enables entirely new forms of storytelling by making visuals that once required massive budgets accessible to anyone.
Current text-based prompting for AI is a primitive, temporary phase, similar to MS-DOS. The future lies in more intuitive, constrained, and creative interfaces that allow for richer, more visual exploration of a model's latent space, moving beyond just natural language.
Observing that younger generations prefer consuming information via video (TikTok) and communicating via voice, Superhuman's CTO predicts a fundamental shift in user experience. Future interfaces, including email, will likely become more conversational and audio-based rather than relying on typing and reading.
While today's focus is on text-based LLMs, the true, defensible AI battleground will be in complex modalities like video. Generating video requires multiple interacting models and unique architectures, creating far greater potential for differentiation and a wider competitive moat than text-based interfaces, which will become commoditized.
One view holds that language is a lossy, discrete abstraction of reality, whereas pixels (visual input) are a more fundamental representation. We perceive language physically, as pixels on a page or as sound waves, and tokenizing it discards rich information like font, layout, and visual context.
The future of media is not just recommended content, but content rendered on the fly for each user. AI will analyze micro-behaviors like eye movement and swipe speed to generate the most engaging possible video in that exact moment. The algorithm will become the content itself.
As AI-generated content creates a sea of sameness, audiences will seek what machines cannot replicate: genuine emotion and deep, personal narrative. This will drive a creator-led shift toward more meaningful, long-form content that offers a real human connection.
The next user interface paradigm is delegation, not direct manipulation. Humans will communicate with AI agents via voice, instructing them to perform complex tasks on computers. This will reduce the hours spent clicking and typing each day to zero, fundamentally changing our relationship with technology.
The next generation of social networks will be fundamentally different, built around the creation of functional software and AI models, not just media. The status game will shift from who has the best content to who can build the most useful or interesting tools for the community.
An AI CEO predicts that within two years, AI tools will make content creation instantaneous and nearly free. This will destroy traditional moats like audience loyalty and production quality, because anyone will be able to generate photorealistic content. The market's focus will shift from the creator to the individual piece of content.