While users can read text faster than they can listen, the Hux team chose audio as their primary medium. Reading requires a user's full attention, whereas audio is a passive medium that can be consumed concurrently with other activities like commuting or cooking, integrating more seamlessly into daily life.
Instead of debating AI's creative limits, The New Yorker pragmatically adopted it to solve a production bottleneck. AI-generated voiceovers make written pieces available for listening "well nigh immediately," expanding reach to audio-first consumers without compromising the human-led creative process of the articles themselves.
The Hux founder, formerly of Google's NotebookLM, is building an AI that moves beyond the prompt-and-response model. By connecting to a user's calendar and email, it proactively generates personalized audio content, acting like a "friend that was ready to get you caught up" without requiring user input.
Observing that younger generations prefer consuming information via video (TikTok) and communicating via voice, Superhuman's CTO predicts a fundamental shift in user experience. Future interfaces, including email, will likely become more conversational and audio-based rather than relying on typing and reading.
The host, a self-proclaimed "hard copy person," used the AI app Speechify to consume a digital book he otherwise would have skipped. By selecting celebrity voices like Snoop Dogg and Gwyneth Paltrow, he made the content more accessible and enjoyable, highlighting a novel productivity hack for auditory learners.
For professionals who find phone calls demanding and texting too superficial for relationship building, voice memos offer an effective middle ground. This asynchronous communication method allows for the nuance and personality of voice, fostering a deeper connection without the pressure of a real-time conversation.
The magic of ChatGPT's voice mode in a car is that it feels like another person in the conversation. Conversely, Meta's AI glasses failed when translating a menu because they acted like a screen reader, ignoring the human context of how people actually read menus. Context is everything for voice.
Investors may be under-bullish on voice because they judge it by current adoption. However, observing the communication habits of the under-25 demographic—who heavily favor voice notes—provides a clear signal that the next generation of workers will expect and demand voice-native tools.
Even when consuming podcasts on video platforms, users often treat them as audio-first experiences, listening while multitasking. This behavior reveals that the core value remains the audio connection and storytelling, regardless of the visual medium used for delivery.
Once a voice input tool reaches a high quality threshold, user behavior changes dramatically. Wispr Flow users go from doing 20% of their computer work by voice to 80% within four months. This suggests that a powerful, sticky habit forms and voice effectively replaces the keyboard for most tasks.
Despite the focus on text interfaces, voice is the most effective entry point for AI into the enterprise. Because every company already has voice-based workflows (phone calls), AI voice agents can be inserted seamlessly to automate tasks. This use case is scaling faster than passive "scribe" tools.