Nvidia demos Vid2Vid Cameo – an AI model that brings 2D to life for better video calls

Available soon as part of Nvidia’s Maxine SDK

June 28, 2021


Nvidia has unveiled Vid2Vid Cameo – an AI model capable of creating realistic videos of a person from a single photo.

Vid2Vid Cameo uses generative adversarial networks (GANs) and is specifically designed for video calls.

The product will soon be available as part of Nvidia’s Maxine SDK, a collection of pre-trained AI models that provide augmented reality effects during video conferences or live streams, the company announced.

“Many people have limited internet bandwidth, but still want to have a smooth video call with friends and family,” Nvidia researcher Ming-Yu Liu said.

“In addition to helping them, the underlying technology could also be used to assist the work of animators, photo editors, and game developers.”

Maxine, you’re on mute

Nvidia debuted Maxine as a video conferencing service last October. Maxine offers subtle AI-powered features such as face alignment and noise reduction, as well as live translation.

Vid2Vid Cameo was teased around the launch of Maxine, with a demonstration published last December. Nvidia said that Maxine will use GANs to “dramatically” reduce the bandwidth required for video conferencing calls.

The new service works by identifying and encoding facial features in an image and then automatically extracting them.

The extracted features are then sent to other video conference participants, and the system can save them and reuse them from prior meetings. On the receiver’s side, the GANs use this information to generate a video that mimics the appearance of the original picture.
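The save-and-reuse step can be pictured with a minimal Python sketch. It is an illustration only, not Nvidia's implementation or the Maxine SDK API: the `Receiver` class, the `send_frame` function, and the message labels are all hypothetical, and the GAN itself is replaced by a placeholder that simply returns its inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical type: a list of (x, y) facial key points for one frame.
Keypoints = List[Tuple[float, float]]


@dataclass
class Receiver:
    """Hypothetical receiver that keeps each speaker's extracted features."""
    features_by_speaker: Dict[str, bytes] = field(default_factory=dict)

    def has_features(self, speaker_id: str) -> bool:
        return speaker_id in self.features_by_speaker

    def store_features(self, speaker_id: str, features: bytes) -> None:
        self.features_by_speaker[speaker_id] = features

    def synthesize(self, speaker_id: str, keypoints: Keypoints) -> dict:
        # Placeholder for the GAN generator: a real system would render a
        # video frame from the stored features and the incoming key points.
        return {
            "features": self.features_by_speaker[speaker_id],
            "keypoints": keypoints,
        }


def send_frame(receiver: Receiver, speaker_id: str,
               features: bytes, keypoints: Keypoints) -> List[str]:
    """Return the list of message types that had to be sent for this frame."""
    sent = []
    if not receiver.has_features(speaker_id):
        # The extracted features travel only when the receiver lacks them.
        receiver.store_features(speaker_id, features)
        sent.append("features")
    receiver.synthesize(speaker_id, keypoints)
    sent.append("keypoints")
    return sent


receiver = Receiver()
print(send_frame(receiver, "alice", b"encoded-face", [(0.1, 0.2)]))
print(send_frame(receiver, "alice", b"encoded-face", [(0.2, 0.2)]))
```

In this sketch the first call sends both the features and the key points, while later calls for the same speaker send only the key points, which is the reuse behavior the article describes.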

Liu and fellow Nvidia researchers Ting-Chun Wang and Arun Mallya published a research paper explaining the process behind the new service.

“Instead of sending bulky live video streams from one participant to the other, video conferencing platforms can simply send data on how the speaker’s key facial points are moving. On the receiver’s side, the GAN model uses this information to synthesize a video that mimics the appearance of the reference image,” the paper reads.
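A rough back-of-the-envelope sketch in Python shows why key-point data is so much smaller than image data. Every number here is an assumption chosen for illustration: the article does not say how many key points the model uses or how they are encoded, so the sketch assumes 20 two-dimensional points stored as 32-bit floats, and compares them with one uncompressed 1280x720 RGB frame. Real video calls send compressed video, so the actual saving would be smaller than this comparison suggests.

```python
import struct


def pack_keypoints(keypoints):
    """Serialize (x, y) key points as little-endian 32-bit floats."""
    flat = [coord for point in keypoints for coord in point]
    return struct.pack(f"<{len(flat)}f", *flat)


def raw_frame_bytes(width, height, channels=3):
    """Size of one uncompressed frame at 8 bits per channel."""
    return width * height * channels


# Assumed values, not taken from the article or the paper.
keypoints = [(i / 20, i / 20) for i in range(20)]

payload = pack_keypoints(keypoints)
print(len(payload))                 # 160 bytes: 20 points x 2 coords x 4 bytes
print(raw_frame_bytes(1280, 720))   # 2764800 bytes for one raw 720p frame
```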

The paper suggests that the same technology could be used to transfer movements of one person onto the image of another, or to animate digital avatars.

Nvidia’s Vid2Vid Cameo unveiling comes shortly after it launched Fleet Command, a remote management platform designed to allow businesses to monitor and manage AI applications at the edge.

The company also announced a partnership with Equinix on an infrastructure program called AI LaunchPad – offering speedy access to Nvidia’s hardware and software.

A few weeks prior, the Santa Clara-based firm unveiled the Jetson AGX Xavier Industrial, a compute module designed for AI systems in safety-critical industrial environments.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
