Investors may be underestimating voice because they judge it by current adoption. However, the communication habits of the under-25 demographic, who heavily favor voice notes, are a clear signal that the next generation of workers will expect and demand voice-native tools.
A one-size-fits-all AI voice fails. For a Japanese healthcare client, ElevenLabs' agent used quick, short responses for younger callers but a calmer, slower style for older callers. Personalizing the delivery, not just the content, to each caller's demographic context was critical for success.
Observing that younger generations prefer consuming information via video (TikTok) and communicating via voice, Superhuman's CTO predicts a fundamental shift in user experience. Future interfaces, including email, will likely become more conversational and audio-based rather than relying on typing and reading.
The true evolution of voice AI is not just adding voice commands to screen-based interfaces. It's about building agents so trustworthy they eliminate the need for screens for many tasks. This shift from hybrid voice/screen interaction to a screenless future is the next major leap in user modality.
When a prospect's voicemail directs you to text, structure your message for reading, not listening. Lead with what is relevant to them, not your name, because they will likely read a transcript. This optimizes the message for the medium they've chosen.
Contrary to the focus on professional use cases, OpenAI's largest study shows that 46% of messages from adult consumer users are from the 18-25 age group. This indicates the emergence of an "AI native" generation whose approach to work and education will be fundamentally different.
The next evolution of sales technology isn't an improved CRM but an integrated platform connecting ERP, finance, and legal systems. Salespeople will interact with it via voice commands to get instant answers, generate proposals, and coordinate cross-departmental actions without manual input.
For professionals who find phone calls demanding and texting too superficial for relationship building, voice memos offer an effective middle ground. This asynchronous communication method allows for the nuance and personality of voice, fostering a deeper connection without the pressure of a real-time conversation.
The magic of ChatGPT's voice mode in a car is that it feels like another person in the conversation. Conversely, Meta's AI glasses failed when translating a menu because they acted like a screen reader, ignoring the human context of how people actually read menus. Context is everything for voice.
Reacting against digital oversaturation, younger consumers are creating a counter-movement toward "acoustic real experiences." This involves deliberately choosing analog technologies like point-and-shoot cameras and flip phones over their more efficient digital counterparts. The trend opens new market opportunities for founders who cater to this desire for tangible, focused experiences.
Despite the focus on text interfaces, voice is the most effective entry point for AI into the enterprise. Because every company already has voice-based workflows (phone calls), AI voice agents can be inserted seamlessly to automate tasks. This use case is scaling faster than passive "scribe" tools.