Effective agentic commerce requires a visual interface, which current text-based LLMs do not provide. Consumers need to see generated images of products, especially how clothing looks on them or how furniture fits in their home. To be truly useful, the output must be product imagery, not just descriptive text.

Related Insights

While companies readily use models that process image, audio, and text inputs, generating multimodal outputs (like video or complex graphics) remains rare in business practice. The primary output is still text or structured data, with synthesized speech being the main exception.

Current text-based prompting for AI is a primitive, temporary phase, similar to MS-DOS. The future lies in more intuitive, constrained, and creative interfaces that allow for richer, more visual exploration of a model's latent space, moving beyond just natural language.

The quality and vision of an AI-generated video are determined more by the source reference images and videos than by the text prompt itself. Providing a strong visual reference gives the model a clear understanding of taste, style, and desired outcome, acting as a more powerful input than descriptive text alone.

The current enterprise AI boom is a symptom of two things: AI teams that lack product designers, and the limitations of text-based models. A true consumer AI revolution awaits mature image and video generation, which will unlock the immersive, visual interfaces necessary for breakout consumer apps.

Brian Chesky believes the current chatbot paradigm is flawed for consumer applications like travel because it's text-first, lacks direct manipulation, and is poor for comparison shopping. He predicts the future belongs to rich, visual, agentic interfaces, not simple text conversations.

The next frontier for conversational AI is not just better text, but "Generative UI"—the ability to respond with interactive components. Instead of describing the weather, an AI can present a weather widget, merging the flexibility of chat with the richness of a graphical interface.

The ultimate goal of AI in e-commerce is not to point users to a vast catalog, but to emulate a skilled store associate. This means presenting a few highly curated options based on deep customer knowledge, which improves conversion and helps reduce the industry's staggering 18% apparel return rate.

AI tools that generate functional UIs from prompts are eliminating the "language barrier" between marketing, design, and engineering teams. Marketers can now create visual prototypes of what they want instead of writing ambiguous text-based briefs, ensuring alignment and drastically reducing development cycles.

Merchants with thousands of products struggle to create unique visuals for each item. AI tools can automatically generate compelling creative at scale—adding motion and frames to basic product images—solving the bottleneck of low "creative density" against a large catalog.

While companies customize LLMs for writing style, visual identity (logos, colors, style) is a far stronger brand differentiator. The CEO argues that since visual brands are more immediately recognizable and diverse than writing styles, the enterprise demand for custom-trained visual models will ultimately be much greater.