Nvidia reveals Omniverse Avatars: A suite of interactive AI assistants
by Ben Wodecki
Tech giant Nvidia has revealed the Omniverse Avatar – a platform to generate interactive AI avatars.
At the company's GTC event, Jensen Huang, founder and CEO of Nvidia, was replicated as an avatar through Project Tokkio, designed for customer support.
Also demonstrated were DRIVE Concierge for always-on, intelligent services in vehicles, and Project Maxine for video conferencing.
The Avatar platform is part of the company's wider Omniverse virtual environment offering, currently in open beta with around 70,000 users.
“The dawn of intelligent virtual assistants has arrived,” Huang said.
"Omniverse Avatar combines Nvidia's foundational graphics, simulation, and AI technologies to make some of the most complex real-time applications ever created. The use cases of collaborative robots and virtual assistants are incredible and far-reaching."
In the footsteps of Nintendo
As part of the first demonstration – focusing on Project Tokkio – Huang was displayed as a toy replica of himself, communicating with colleagues on topics like biology and climate science.
A second Project Tokkio demo saw a customer-service avatar in a restaurant kiosk conversing with customers as they ordered food.
Both demos were powered by Nvidia’s Megatron 530B, a customizable language model.
Meanwhile, Project Maxine showed an English speaker on a video call in a noisy cafe who could be heard without background noise. As she spoke, her words were transcribed and translated in real time into German, French, and Spanish, preserving her voice and intonation.
Maxine has been teased for some time – with Nvidia showing off one of its tools, Vid2Vid Cameo, earlier this year, as an AI model capable of creating realistic videos of a person from a single photo. It uses generative adversarial networks (GANs) and is specifically designed for video calls.
The DRIVE Concierge demo saw a digital assistant appearing on the dashboard screen of a car to help the driver select the best driving mode to reach their destination.
The various Omniverse Avatar options “[open] the door to the creation of AI assistants that are easily customizable for virtually any industry,” Nvidia said in the announcement.
“These could help with the billions of daily customer service interactions — restaurant orders, banking transactions, making personal appointments and reservations, and more — leading to greater business opportunities and improved customer satisfaction.”
The demo harks back to Facebook's recent Meta rebrand, when CEO Mark Zuckerberg was recreated in digital form as his company pivoted to become a metaverse-focused venture.
The major difference between the respective demos was that Nvidia wasn't rocked by PR disasters shortly before its announcement.
Other criticisms of Meta’s approach are emerging: the reveal should have focused less on entertainment and videoconferencing, and more on industrial uses, Danny Lange, SVP of AI at Unity, said at the recent AI Summit Silicon Valley 2021.
In a humorous twist, Lange's session at the event was followed by Katie Duffy, director of Facebook IQ, who doubled down on the entertainment aspect, going as far as to suggest that AI could be worthy of earning film credits.
What’s in an avatar?
According to Nvidia, the Omniverse Avatars combine elements of computer vision, natural language understanding, speech AI, and recommendation engines.
The speech component is based on Riva, the company's speech AI software development kit, which recognizes spoken input and generates human-like responses using text-to-speech.
The natural language understanding stems from the Megatron 530B model, while the recommendation engine is borrowed from Nvidia Merlin.
Metropolis, Video2Face, and Audio2Face are among the other Nvidia tools used to bring the package to life.
“These technologies are composed into an application and processed in real-time using the Nvidia Unified Compute Framework," the company said.
“Packaged as scalable, customizable microservices, the skills can be securely deployed, managed and orchestrated across multiple locations by Fleet Command.”
Omniverse, initially dubbed Holodeck after the VR environment room from Star Trek: The Next Generation, can be used to create 3D environments for production teams to work together without the need for in-person meetings or sizable file exchanges.
What exactly is the metaverse?
Nvidia and Facebook join a list of companies attempting to conquer the metaverse.
For reference, the metaverse is a hypothetical concept of always-online 3D virtual environments where users can converse and collaborate through technologies like VR and AR.
The area is still in its infancy – so far, it has been limited to a few VR experiences, mainly found in gaming, through the likes of VRChat.
The video game Second Life, where users create an avatar of themselves and effectively live online, is a precursor to what many companies are trying to achieve today. Second Life was released in 2003 and retains players today, though it is nowhere near as popular as it was two decades ago.
The AI Business team has discussed Second Life and the metaverse repeatedly in our podcast series.
Other notable names operating in this space include Matterport, which is developing AI platforms that let users recreate physical spaces. The company went public in July and brought its system to Android users in the form of a mobile app in late October.
Following the float, James Morris-Manuel, the company’s managing director for EMEA, told AI Business his team sees the built environment as a $230 trillion asset class, suggesting it's as much as three times larger than the current equities markets combined.
Among the open source options, COLMAP is a tool that reconstructs 3D scenes using Structure-from-Motion and Multi-View Stereo algorithms. The software runs on Windows, Mac, or Linux, and users can export a 3D mesh from a set of overlapping photos – although refining the result requires programs like MeshLab.
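As a rough illustration of the workflow, a COLMAP reconstruction can be driven from the command line. This is a minimal sketch: the folder paths are placeholders, and it assumes COLMAP is installed and a set of overlapping photos of one scene sits in an `images` directory.

```shell
#!/bin/sh
# Placeholder dataset layout: $DATASET/images holds the input photos.
DATASET="$HOME/scene"
mkdir -p "$DATASET/workspace"

# COLMAP's one-shot pipeline: feature extraction, image matching,
# sparse reconstruction, then dense reconstruction in the workspace.
colmap automatic_reconstructor \
    --workspace_path "$DATASET/workspace" \
    --image_path "$DATASET/images"
```

The resulting dense point cloud and mesh files in the workspace can then be opened in a program such as MeshLab for cleanup and refinement, as noted above.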