Nvidia releases Jarvis, a framework for conversational AI services
Using Mozilla’s Common Voice data to offer free, pre-trained ML models
April 13, 2021
Nvidia has officially launched Jarvis, a collection of pre-trained machine learning models and tools designed to help businesses develop their own interactive conversational AI services.
The announcement was made at the company’s GPU Technology Conference (GTC) 2021, which also saw the reveal of the world's most powerful AI-capable supercomputer and Nvidia’s first-ever data center CPU.
In his keynote, Nvidia founder and CEO Jensen Huang said conversational AI was “in many ways, the ultimate AI.”
"Deep learning breakthroughs in speech recognition, language understanding, and speech synthesis have enabled engaging cloud services. Nvidia Jarvis brings this state-of-the-art conversational AI out of the cloud for customers to host AI services anywhere," he added.
The ‘ultimate AI’
Nvidia said Jarvis “will enable a new wave of language-based applications previously not possible, improving interactions with humans and machines.”
It claimed that the system can run end-to-end speech analysis in under 100 milliseconds and be deployed in the cloud, in a corporate data center, or at the edge.
The company said Jarvis was built "using models trained for several million GPU hours on over one billion pages of text, 60,000 hours of speech data, and in different languages, accents, environments, and lingos to achieve world-class accuracy."
Users can pick from a selection of pre-trained models, which can then be fine-tuned using their in-house data.
Nvidia said that “thousands” of brands asked to join the Jarvis early access program.
T-Mobile was one of the brands given early access. The company’s product and technology vice president, Matthew Davis, said, “With Nvidia Jarvis services, fine-tuned using T-Mobile data, we're building products to help us resolve customer issues in real-time.
“After evaluating several automatic speech recognition solutions, T-Mobile has found Jarvis to deliver a quality model at extremely low latency, enabling experiences our customers love.”
Nvidia promised additional features for Jarvis in the second quarter of the year.
Mozilla Common Voice partnership
To make Jarvis more useful, Nvidia partnered with Mozilla Common Voice, an open-source collection of voice data for startups, researchers, and developers that is commonly used to train voice-enabled apps, services, and devices.
Common Voice contains over 9,000 hours of contributed voice data in 60 different languages and claims to be the world's largest multi-language, public domain voice dataset.
Nvidia is using Jarvis to develop pre-trained models with the dataset, which it will then offer back to the community for free.
"We launched Common Voice to teach machines how real people speak in their unique languages, accents, and speech patterns," Mark Surman, executive director at Mozilla, said.
“Nvidia and Mozilla have a common vision of democratizing voice technology — and ensuring that it reflects the rich diversity of people and voices that make up the Internet.”