Nvidia releases Jarvis, a framework for conversational AI services

Using Mozilla’s Common Voice data to offer free, pre-trained ML models

April 13, 2021

Nvidia has officially launched Jarvis, a collection of pre-trained machine learning models and tools designed to help businesses develop their own interactive conversational AI services.

The announcement was made at the company’s GPU Technology Conference (GTC) 2021. The event also saw the unveiling of what Nvidia billed as the world's most powerful AI-capable supercomputer, as well as Nvidia’s first-ever data center CPU.

In his keynote, Nvidia founder and CEO Jensen Huang said conversational AI was “in many ways, the ultimate AI.”

"Deep learning breakthroughs in speech recognition, language understanding, and speech synthesis have enabled engaging cloud services. Nvidia Jarvis brings this state-of-the-art conversational AI out of the cloud for customers to host AI services anywhere," he added.

The ‘ultimate AI’

Nvidia said Jarvis “will enable a new wave of language-based applications previously not possible, improving interactions with humans and machines.”

The company claimed that the system can run end-to-end speech analysis in under 100 milliseconds and can be deployed in the cloud, in a corporate data center, or at the edge.

The company said Jarvis was built "using models trained for several million GPU hours on over one billion pages of text, 60,000 hours of speech data, and in different languages, accents, environments, and lingos to achieve world-class accuracy."

Users can pick from a selection of pre-trained models, which can then be fine-tuned using their in-house data.

Nvidia said that “thousands” of brands asked to join the Jarvis early access program.

T-Mobile was one of the brands given early access. Matthew Davis, the company’s vice president of product and technology, said, “With Nvidia Jarvis services, fine-tuned using T-Mobile data, we're building products to help us resolve customer issues in real-time.

“After evaluating several automatic speech recognition solutions, T-Mobile has found Jarvis to deliver a quality model at extremely low latency, enabling experiences our customers love.”

Nvidia said additional Jarvis features would arrive in the second quarter of 2021.

Mozilla Common Voice partnership

To make Jarvis more useful, Nvidia partnered with Mozilla Common Voice, an open-source collection of voice data for startups, researchers, and developers that is commonly used to train voice-enabled apps, services, and devices.

Common Voice contains over 9,000 hours of contributed voice data in 60 different languages and claims to be the world's largest multi-language, public domain voice dataset.

Nvidia is using Jarvis to develop pre-trained models from the dataset and will offer them back to the community for free.

“We launched Common Voice to teach machines how real people speak in their unique languages, accents, and speech patterns,” said Mark Surman, executive director at Mozilla.

“Nvidia and Mozilla have a common vision of democratizing voice technology — and ensuring that it reflects the rich diversity of people and voices that make up the Internet.”

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.