When building feedback tools, recognize that users favor audio. It is easier to use while multitasking, supports multiple languages, and leaves users feeling less inhibited than writing does. Video feedback, by contrast, is strongly disliked and should be avoided as a primary collection method.
While users can read text faster than they can listen, the Hux team chose audio as their primary medium. Reading requires a user's full attention, whereas audio is a passive medium that can be consumed concurrently with other activities like commuting or cooking, integrating more seamlessly into daily life.
User expectations for AI responses change dramatically based on the input method. A spoken query demands a concise, direct answer, whereas a typed query implies the user has more patience and is receptive to a detailed, link-filled response. Contextual awareness of input modality is critical for good UX.
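As a minimal sketch of what modality-aware responses could look like in practice, the snippet below maps the input method to a response style and turns that style into instructions for an answer generator. Every name here (`InputModality`, `ResponseStyle`, `build_instructions`) and the sentence limits are hypothetical values chosen for illustration, not taken from any particular product.

```python
from dataclasses import dataclass
from enum import Enum


class InputModality(Enum):
    VOICE = "voice"
    TEXT = "text"


@dataclass(frozen=True)
class ResponseStyle:
    max_sentences: int   # upper bound on answer length
    include_links: bool  # links are of little use in a spoken reply


# Assumed limits for illustration only; tune them against real user feedback.
STYLES = {
    InputModality.VOICE: ResponseStyle(max_sentences=2, include_links=False),
    InputModality.TEXT: ResponseStyle(max_sentences=12, include_links=True),
}


def build_instructions(modality: InputModality) -> str:
    """Turn the style for this input modality into plain-language instructions."""
    style = STYLES[modality]
    parts = [f"Answer in at most {style.max_sentences} sentences."]
    if style.include_links:
        parts.append("Include relevant links and supporting detail.")
    else:
        parts.append("Give the direct answer first and do not include links.")
    return " ".join(parts)


if __name__ == "__main__":
    for modality in InputModality:
        print(f"{modality.value}: {build_instructions(modality)}")
```

Keeping the style in a lookup table, rather than scattering modality checks through the response logic, makes it straightforward to add another input method later.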
Observing that younger generations prefer consuming information via video (TikTok) and communicating via voice, Superhuman's CTO predicts a fundamental shift in user experience. Future interfaces, including email, will likely become more conversational and audio-based rather than relying on typing and reading.
To get the most out of recording yourself, review the recording three separate times. First, listen without video to focus on your tone, pace, and filler words. Second, watch without sound to analyze body language and posture. Finally, watch with sound to see the complete picture. Reviewing one channel at a time isolates variables and makes the feedback more effective.
For professionals who find phone calls demanding and texting too superficial for relationship building, voice memos offer an effective middle ground. This asynchronous communication method allows for the nuance and personality of voice, fostering a deeper connection without the pressure of a real-time conversation.
The magic of ChatGPT's voice mode in a car is that it feels like another person in the conversation. Conversely, Meta's AI glasses failed when translating a menu because they acted like a screen reader, ignoring the human context of how people actually read menus. Context is everything for voice.
Investors may be underestimating voice because they judge it by current adoption. However, the communication habits of the under-25 demographic, who heavily favor voice notes, clearly signal that the next generation of workers will expect and demand voice-native tools.
When giving feedback, structure it in three parts. "What" is the specific observation. "So what" explains its impact on you or the situation. "Now what" provides a clear, forward-looking suggestion for change. This framework ensures feedback is understood and actionable.
Even when consuming podcasts on video platforms, users often treat them as an audio-first experience, listening while multitasking. This behavior shows that the core value remains the audio connection and storytelling, regardless of the visual medium used for delivery.
Once a voice input tool crosses a high quality threshold, user behavior changes dramatically. Wispr Flow users go from doing 20% of their computer work by voice to 80% within four months, which suggests that a sticky habit forms and voice effectively replaces the keyboard for most tasks.