Nvidia shows off AI software platform designed to create synthetic voices from less than an hour of audio

Riva Custom Voice contains automatic speech recognition and customizable text-to-speech capabilities

November 11, 2021

Nvidia has revealed Riva Custom Voice: an AI software platform that can create ‘human-like’ voices in one day, using just 30 minutes of audio data.

Announced at the company’s annual GTC event, Riva Custom Voice is designed to help companies create voices for virtual assistants in settings such as call centers.

During his keynote, Nvidia founder and CEO Jensen Huang demoed Riva, showing the system generating a voice for an Albert Einstein speech from around 30 minutes of audio data.

Riva Custom Voice is available in the latest version of the company’s Riva speech AI software development kit. It includes automatic speech recognition and customizable text-to-speech capabilities.

"It also has the ability to scale speech services to hundreds of thousands of streams in the cloud, in the data center, or at the edge," Nvidia said in the announcement.

“Now companies can use speech AI to listen and respond to customers with an expressive voice that’s unique to their brand and that drives more engaging and delightful interactions,” said Kari Briski, VP of product management for AI software at Nvidia.

Scaling speech

The new product “makes it practical for millions of companies to develop an expressive custom voice with Riva in hours versus weeks, using a small amount of data,” Nvidia said.

RingCentral, a provider of cloud-based communications for businesses, was one of the first companies to try the system. It is using Riva automatic speech recognition for live captioning in its video conferencing service.

"Our goal is to make meetings smarter and with Riva, it's now possible to train live transcription models on Nvidia GPUs for accuracy against varied accents," said Nat Natarajan, executive vice president and general manager of products and engineering at RingCentral.

"In the future, we expect there to be several concurrent streams and Riva can easily scale, running these streams in real-time in under 300 milliseconds. We are excited to partner with Nvidia and for the future."

Another Riva user is Ping An, the Chinese financial services firm. It is using the voice product as part of its virtual agent service to decrease customer wait times.

"Using Nvidia's pre-trained models for automatic speech recognition, further fine-tuned on our data, our system has achieved a five percent improvement in accuracy, enabling us to provide more engaging and authentic services," said Jing Xiao, chief scientist at Ping An.

Nvidia said its conversational AI software has been downloaded more than 250,000 times to date.

The voice offering wasn’t the only AI product on display at GTC: Nvidia also unveiled Omniverse Avatar, a platform for creating interactive 3D representations of people.

Hear the voice grow louder

Using AI software to generate voices is a relatively new area, but one that is quickly attracting industry players.

Take Veritone, the enterprise software developer, which launched a Voice-as-a-Service (VaaS) product in May to create ‘hyper-realistic’ synthetic voices.

Dubbed MARVEL.ai, the platform lets creators use AI to generate, manage, license, and monetize synthetic speech, including celebrity voices.

Veritone CEO Ryan Steelberg hosted a keynote session at the recent AI Summit & IoT World Silicon Valley 2021, explaining how AI at scale was helping content creators monetize their voice talent.

In another example, Hour One helped create an AI-powered version of the animated character Boss Baby that can deliver personalized video messages on Cameo.

Amir Konigsberg, founding director and board member of Hour One, recently sat down with AI Business to talk about AI in video messaging and whether such systems will soon be capable of recreating human characters.

The AI Business team itself got in on the act, using Uberduck.ai, the free-to-use synthetic voice tool, to recreate Sir Patrick Stewart’s voice on our podcast.

The voice commerce market is predicted to hit $80bn by 2023, according to Juniper Research.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
