AI apps that require users to select a mode like 'image' or 'text' before a query are revealing their underlying technical limitations. A truly intelligent, multimodal system should infer user intent directly from the prompt within a single conversational flow, rather than relying on a clumsy UI to route the request.
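A minimal sketch of what prompt-side routing could look like, assuming a cheap intent classifier sits in front of separate generation backends. The names `classifyIntent`, `generateImage`, and `generateText` are hypothetical stand-ins, and the keyword heuristic is only a placeholder for a real classifier call:

```typescript
// Minimal sketch of modality routing inferred from the prompt itself,
// instead of a mode picker in the UI. classifyIntent uses a keyword
// heuristic so the example is self-contained; a real app would use a
// cheap model call or a fine-tuned classifier, and the generate*
// handlers are hypothetical placeholders for actual backends.

type Modality = "text" | "image";

function classifyIntent(prompt: string): Modality {
  const imageCues = /\b(draw|sketch|illustrate|picture|image of)\b/i;
  return imageCues.test(prompt) ? "image" : "text";
}

async function handlePrompt(prompt: string): Promise<string> {
  switch (classifyIntent(prompt)) {
    case "image":
      return generateImage(prompt); // hypothetical image backend
    case "text":
      return generateText(prompt); // hypothetical text backend
  }
}

// Stubs so the sketch runs; real apps would call their model APIs here.
async function generateImage(prompt: string): Promise<string> {
  return `[image for: ${prompt}]`;
}
async function generateText(prompt: string): Promise<string> {
  return `[answer to: ${prompt}]`;
}

handlePrompt("draw a lighthouse at dusk").then(console.log);
```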
Figma CEO Dylan Field predicts we will look back on today's text prompting for AI as a primitive, command-line phase, akin to MS-DOS. The next major opportunity is to build more intuitive, constrained, use-case-specific interfaces, a kind of compass for a model's latent space, that enable richer, more visual exploration and more precise control than natural language alone.
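To make the "compass" metaphor concrete, here is a toy sketch in which named semantic directions (assumed to be precomputed, e.g. from paired examples) are mixed into a latent point by slider weights. The 4-dimensional space and the direction vectors are invented for illustration:

```typescript
// Sketch of a "compass" over latent space: named, human-meaningful
// directions are blended into a base embedding by slider weights,
// steering generation without a text prompt. The directions below are
// illustrative, not real model directions.

type Vec = number[];

const add = (a: Vec, b: Vec): Vec => a.map((x, i) => x + b[i]);
const scale = (v: Vec, s: number): Vec => v.map((x) => x * s);

// Hypothetical precomputed semantic directions in a 4-d toy latent space.
const directions: Record<string, Vec> = {
  warmer: [0.9, 0.1, 0, 0],
  moreDetailed: [0, 0.8, 0.2, 0],
  moreAbstract: [0, 0, -0.5, 0.7],
};

// Slider positions (-1..1) steer the point instead of words.
function steer(base: Vec, sliders: Record<string, number>): Vec {
  return Object.entries(sliders).reduce(
    (point, [name, weight]) => add(point, scale(directions[name], weight)),
    base,
  );
}

const steered = steer([0, 0, 0, 0], { warmer: 0.6, moreAbstract: -0.3 });
console.log(steered); // the point a generative model would decode
```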
The true evolution of voice AI is not just adding voice commands to screen-based interfaces. It is about building agents trustworthy enough to eliminate the need for a screen in many tasks. This shift from hybrid voice-and-screen interaction to a screenless future is the next major leap in interaction modality.
Existing AI tools are good at either "asking" for information (e.g., search) or "doing" a task. AI-first browsers like Comet struggle because browsing requires seamlessly blending both intents, a difficult product challenge that no one has solved well yet, and that gap is hindering their adoption.
Comparing chat interfaces to the MS-DOS command line, Atlassian's Sharif Mansour argues that while chat is a universal entry point for AI, it's the worst interface for specialized tasks. The future lies in verticalized applications with dedicated UIs built on top of conversational AI, just as apps were built on DOS.
The best UI for an AI tool is a direct function of the underlying model's power. A more capable model unlocks more autonomous 'form factors.' For example, the sudden rise of CLI agents was only possible once models like Claude 3 became capable enough to reliably handle multi-step tasks.
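For intuition, here is a stripped-down sketch of the loop behind a CLI agent: the model proposes a command, the harness executes it, and the output is fed back until the model says it is done. `callModel` is stubbed with a scripted reply; a real harness would call an actual LLM API and sandbox every command it runs:

```typescript
// Stripped-down sketch of a CLI agent loop. callModel is a stub standing
// in for a real model call; the shell execution is for illustration only.

import { execSync } from "node:child_process";

type Step =
  | { kind: "tool"; command: string }
  | { kind: "done"; summary: string };

// Stub: runs one command, then finishes.
let turn = 0;
async function callModel(_transcript: string[]): Promise<Step> {
  return turn++ === 0
    ? { kind: "tool", command: "ls" }
    : { kind: "done", summary: "Listed the project files." };
}

async function runAgent(task: string, maxSteps = 10): Promise<string> {
  const transcript = [`TASK: ${task}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = await callModel(transcript);
    if (step.kind === "done") return step.summary;
    // Multi-step reliability is the capability threshold: each command's
    // output becomes context for the next decision, so errors compound.
    let output: string;
    try {
      output = execSync(step.command, { encoding: "utf8" });
    } catch (err) {
      output = `ERROR: ${String(err)}`;
    }
    transcript.push(`$ ${step.command}\n${output}`);
  }
  return "Stopped: step limit reached.";
}

runAgent("see what is in this directory").then(console.log);
```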
AI is best understood not as a single tool, but as a flexible underlying interface. It can manifest as a chat box for some, but its real potential is in creating tailored workflows that feel native to different roles, like designers or developers, without forcing everyone into a single interaction model.
The next frontier for conversational AI is not just better text, but "Generative UI"—the ability to respond with interactive components. Instead of describing the weather, an AI can present a weather widget, merging the flexibility of chat with the richness of a graphical interface.
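A minimal sketch of one way a generative-UI contract can work, assuming the model's output is schema-constrained to either plain text or a typed component the client knows how to render. The `weather_widget` shape is an invented example, not any particular product's API:

```typescript
// Sketch of a generative-UI contract: the model's reply is constrained
// (e.g. via JSON mode or a schema) to plain text or a typed component.
// The WeatherWidget shape is an assumption for illustration.

type Reply =
  | { type: "text"; content: string }
  | { type: "weather_widget"; city: string; tempC: number; condition: string };

function render(reply: Reply): string {
  switch (reply.type) {
    case "text":
      return reply.content;
    case "weather_widget":
      // A real client would mount an interactive component here.
      return `${reply.city}: ${reply.tempC}°C, ${reply.condition}`;
  }
}

// Example payload as a schema-constrained model might produce it.
const reply: Reply = {
  type: "weather_widget",
  city: "Oslo",
  tempC: 4,
  condition: "partly cloudy",
};
console.log(render(reply));
```

The key design choice is that the component vocabulary is fixed by the client: the model only selects and fills shapes the client can render, which keeps the interface flexible without letting the model emit arbitrary UI.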
Cues uses 'Visual Context Engineering' to let users communicate intent without complex text prompts. By using a 2D canvas for sketches, graphs, and spatial arrangements of objects, users can express relationships and structure visually, which the AI interprets for more precise outputs.
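A plausible sketch of how such a canvas might be serialized for the model, assuming each object carries a type, label, and position from which simple spatial relations are derived. The relation rules and field names are assumptions, since Cues' actual encoding isn't specified here:

```typescript
// Sketch of turning a 2D canvas into model context: object positions are
// serialized and coarse spatial relations are derived, so the model sees
// structure rather than pixels. Thresholds and rules are illustrative.

interface CanvasObject {
  id: string;
  kind: "sketch" | "note" | "graph";
  label: string;
  x: number; // canvas coordinates
  y: number;
}

function spatialRelations(objects: CanvasObject[]): string[] {
  const rels: string[] = [];
  for (const a of objects) {
    for (const b of objects) {
      if (a.id >= b.id) continue; // each unordered pair once
      if (Math.abs(a.y - b.y) < 50) rels.push(`${a.label} is beside ${b.label}`);
      else if (a.y < b.y) rels.push(`${a.label} is above ${b.label}`);
      else rels.push(`${a.label} is below ${b.label}`);
    }
  }
  return rels;
}

function toContext(objects: CanvasObject[]): string {
  const items = objects.map((o) => `- ${o.kind} "${o.label}" at (${o.x}, ${o.y})`);
  return ["Canvas objects:", ...items, "Relations:", ...spatialRelations(objects)].join("\n");
}

const context = toContext([
  { id: "a", kind: "note", label: "Hero section", x: 100, y: 40 },
  { id: "b", kind: "sketch", label: "Pricing table", x: 120, y: 300 },
]);
console.log(context); // prepended to the prompt the model receives
```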
Open-ended prompts overwhelm new users who don't know what's possible. A better approach is to productize AI into specific features. Use familiar UI like sliders and dropdowns to gather user intent, which then constructs a complex prompt behind the scenes, making powerful AI accessible without requiring prompt engineering skills.
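A small sketch of this pattern, assuming a summarization feature whose tone dropdown, length slider, and audience dropdown are invented for illustration. The builder assembles the detailed prompt the user never has to see:

```typescript
// Sketch of productized AI: familiar controls capture intent, and a
// builder constructs the prompt behind the scenes. Field names and the
// prompt template are assumptions for illustration.

interface SummarizeOptions {
  tone: "formal" | "casual" | "playful";    // dropdown
  length: number;                            // slider, 1..5
  audience: "experts" | "general readers";  // dropdown
}

function buildPrompt(text: string, opts: SummarizeOptions): string {
  const lengthWords = ["one sentence", "a short paragraph", "two paragraphs",
    "a detailed page", "an in-depth report"][opts.length - 1];
  return [
    `Summarize the text below in ${lengthWords}.`,
    `Use a ${opts.tone} tone suitable for ${opts.audience}.`,
    `Do not add information that is not in the source.`,
    ``,
    text,
  ].join("\n");
}

// The user only moved a slider and picked two dropdowns:
console.log(buildPrompt("…", { tone: "casual", length: 2, audience: "general readers" }));
```

Because the prompt is assembled in code, the team can iterate on its wording centrally while users keep the same simple controls.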