Until brain-computer interfaces are viable, the highest-bandwidth way to interact with AI is to speak commands (voice out) and receive information visually (visual in), whether on a screen or via glasses. People speak far faster than they type (roughly 150 words per minute versus around 40), and they can read and scan faster than they can listen, so this pairing maximizes throughput in both directions.

Related Insights

AI devices must be close to human senses to be effective. Glasses are the most natural form factor: they sit at the eyes and ears to capture sight and sound, and rest close to the mouth for speech. This sensory proximity gives them an advantage over other wearables like earbuds or pins.

OpenAI's upcoming hardware family, including a smart speaker and glasses, will have no screens, a deliberate strategic choice to move beyond the screen-centric ecosystem dominated by Apple and Google. It is a bet on a future where AI interaction is primarily ambient, powered by voice and computer vision rather than touchscreens.

The dominant AI interface will be a universal conversational layer (chat/voice) for any task. This will be supplemented by specialized graphical UIs for power users needing deep functional control, much like an executive sometimes needs to edit a document directly instead of dictating to an assistant.

Power users of AI agents believe the ideal user interface is not graphical but conversational. They prefer text-based interactions within existing chat apps and see voice as the endgame. The goal is an invisible assistant that operates autonomously and prompts for input only when absolutely necessary, making traditional UIs feel like friction.

Observing that younger generations prefer consuming information via video (TikTok) and communicating via voice, Superhuman's CTO predicts a fundamental shift in user experience. Future interfaces, including email, will likely become more conversational and audio-based rather than relying on typing and reading.

The ultimate winner in the AI race may not be whoever has the most advanced model, but whoever offers the most seamless, low-friction user interface. Since most queries are simple, the battle is shifting to hardware that is 'closest to the person's face,' like glasses or ambient devices, where distribution is king.

The true evolution of voice AI is not just adding voice commands to screen-based interfaces; it is building agents so trustworthy that they eliminate the need for a screen in many tasks. This shift from hybrid voice-and-screen interaction to a screenless future is the next major leap in interaction modality.

Adding established health sensors, such as heart-rate monitors, to smart glasses offers diminishing returns. The real innovation and value for new wearables lie in developing new interaction paradigms, particularly advanced, low-latency audio interfaces for seamless communication in any environment.

The magic of ChatGPT's voice mode in a car is that it feels like another person in the conversation. By contrast, Meta's AI glasses failed at translating a menu because they acted like a screen reader, ignoring how people actually read menus. Context is everything for voice.

The next user-interface paradigm is delegation, not direct manipulation. Humans will instruct AI agents by voice to perform complex tasks on computers, shrinking daily work from hours of clicking and typing to zero and fundamentally changing our relationship with technology.