The Age of AI: The Revolution Will Be Conversational

The intelligence revolution will be defined by the rise of AI systems that can perform the knowledge work and decision-making previously done by humans alone

Scott Stephenson, Co-founder and CEO, Deepgram, the voice AI company

November 6, 2024


Throughout history, revolutions in human productivity have been driven by the mechanization of key tasks, from food production in the agricultural revolution to physical labor in the industrial revolution to information processing in the digital revolution. We are now on the cusp of a new era in which the tasks being mechanized are those we consider quintessentially human: cognition, reasoning and intelligence.

This intelligence revolution will be defined by the rise of artificial intelligence systems that can perform the knowledge work and decision-making previously done by humans alone. Like previous revolutions, it will be shaped by the constraints and interfaces of its time, with AI systems initially focused on replicating human interaction modalities such as vision, language and speech. But as these systems grow in sophistication, they will unlock entirely new forms of human-machine collaboration and cognition.

The impacts of this shift will be staggering. Just as the Industrial Revolution automated routine physical labor and drove an explosion in material wealth, the intelligence revolution will automate routine cognitive labor and drive an explosion in the wealth of knowledge and ideas. It will augment and uplift human capabilities in every domain, from science and engineering to art and entertainment. And it will fundamentally reshape the way we live, work and interact with the world around us.


Successfully navigating this transition requires visionary leaders who not only grasp the underlying technologies but also anticipate their societal impacts. We must proactively create the technical, economic and ethical frameworks that ensure this revolution enhances the human condition and that its benefits are shared globally. Though the path forward will be challenging, the outcome promises a future of unimaginable abundance and human flourishing for all.

The Future of Voice: From Interfaces to Autonomous Agents

Voice and audio represent the most natural modalities for human communication, which is why they will be essential interfaces in the coming intelligence revolution. We are already seeing the rise of voice-based AI, but this is only the beginning. As speech recognition, language understanding and speech synthesis technologies mature, voice will become the primary way we access and interact with AI systems. After all, it’s much faster and more natural to speak than it is to type.

This presents both immense opportunities and non-trivial challenges. On one hand, voice interfaces can make AI feel more human and accessible, enabling people to leverage its capabilities through fluid conversation. Well-designed voice AI can empathize, engage and build rapport in ways that a screen never could. It can also enable hands-free, ambient interactions that weave intelligent assistance seamlessly into our environment.


Furthermore, voice-enabled AI can convey much more information than AI that communicates through text alone. As experts have outlined, vocal tone, speaking rate and even dramatic pauses are largely lost in text-only communication. Because voice allows for richer, more dynamic interactions, we’re likely to default to spoken rather than written communication as our machines’ capabilities expand.

However, voice interfaces also introduce new frictions. They lack the visual affordances of a graphical UI, struggle to present complex information and can feel socially awkward in many contexts. Designing effective voice experiences requires a deep understanding of the strengths and limitations of the medium. It also requires solving extremely difficult technical problems in areas like multi-party conversation, prosody control and audio rendering.

In the long run, the optimal solution may be found in the fusion of voice with other complementary modalities. Wearables like AR glasses can provide visual grounding while leveraging voice for input and simple outputs. Newer technologies like ultrasound haptics can add a tactile dimension. The AI itself must be able to fluidly mix and match modalities based on the user's context and task.

As we master this fusion of modalities, voice AI will blossom from narrow interfaces for controlling our devices into rich, persistent, personalized assistants that accompany us throughout our daily lives. They will always be available, not to replace human connection but to augment our agency in the world. Through open-ended dialogue, they will help us access knowledge, generate ideas, reason through problems and manage tasks. As technological progress accelerates, they will be our indispensable copilots in navigating a future of volatility, uncertainty, complexity and ambiguity (VUCA).

About the Author

Scott Stephenson

Co-founder and CEO, Deepgram, the voice AI company

Scott is a dark matter physicist turned deep learning entrepreneur. He earned a PhD in particle physics from the University of Michigan where his research involved building a lab two miles underground to detect dark matter. Scott left his physics post-doc research position to found Deepgram.
