This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 3099067.
Nvidia demos Vid2Vid Cameo – an AI model that brings 2D to life for better video calls
by Ben Wodecki
Nvidia has unveiled Vid2Vid Cameo – an AI model capable of creating realistic videos of a person from a single photo.
Vid2Vid Cameo uses generative adversarial networks (GANs) and is specifically designed for video calls.
The product will soon be available as part of Nvidia’s Maxine SDK, a collection of pre-trained AI models that provide augmented reality effects during video conferences or live streams, the company announced.
“Many people have limited internet bandwidth, but still want to have a smooth video call with friends and family,” Nvidia researcher Ming-Yu Liu said.
“In addition to helping them, the underlying technology could also be used to assist the work of animators, photo editors, and game developers.”
Maxine, you’re on mute
Nvidia debuted Maxine as a video conferencing service last October. Maxine is capable of generating subtle AI-powered features like face alignment and noise reduction, as well as live translation.
Vid2Vid Cameo was teased around the launch of Maxine, with a demonstration published last December. Nvidia said that Maxine, by using GANs, will “dramatically” reduce the bandwidth required for videoconferencing calls.
The new service works by identifying facial features in an image, then automatically extracting and encoding them.
The extracted features are then sent to other video conference participants, and the system can save them and reuse them in later meetings. On the receiver’s side, the GAN then uses the information to generate a video that mimics the appearance of the original picture.
Liu and fellow Nvidia researchers Ting-Chun Wang and Arun Mallya published a research paper explaining the process behind the new service.
“Instead of sending bulky live video streams from one participant to the other, video conferencing platforms can simply send data on how the speaker’s key facial points are moving. On the receiver’s side, the GAN model uses this information to synthesize a video that mimics the appearance of the reference image,” the paper reads.
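To make the bandwidth argument concrete, here is a minimal Python sketch of the sender and receiver exchanging facial keypoints instead of pixels. It is an illustration under stated assumptions, not Nvidia's implementation: the keypoint count, the 2D float32 wire format, the frame resolution, and the function names are all hypothetical, and the GAN synthesis step on the receiver's side is not shown.

```python
import struct

# Hypothetical values for illustration only; the article does not state them.
NUM_KEYPOINTS = 20                      # assumed facial keypoints per frame
FRAME_WIDTH, FRAME_HEIGHT = 1280, 720   # assumed frame size, 3 bytes per pixel


def encode_keypoints(keypoints):
    """Sender side: pack (x, y) pairs as a 2-byte count plus float32 values."""
    payload = struct.pack("<H", len(keypoints))
    for x, y in keypoints:
        payload += struct.pack("<ff", x, y)
    return payload


def decode_keypoints(payload):
    """Receiver side: recover the (x, y) pairs that would drive the GAN."""
    (count,) = struct.unpack_from("<H", payload, 0)
    flat = struct.unpack_from("<" + "ff" * count, payload, 2)
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def raw_frame_bytes():
    """Size of one uncompressed RGB frame, used only as a rough upper bound."""
    return FRAME_WIDTH * FRAME_HEIGHT * 3


if __name__ == "__main__":
    # Dummy keypoints; a real system would obtain these from a face model.
    keypoints = [(0.5 * i, 0.25 * i) for i in range(NUM_KEYPOINTS)]
    payload = encode_keypoints(keypoints)
    print("keypoint payload bytes:", len(payload))
    print("raw frame bytes:", raw_frame_bytes())
```

Real video calls send compressed rather than raw frames, so the gap in practice is smaller than this comparison suggests. The sketch only shows why a handful of coordinates per frame is far lighter than pixel data.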
The paper suggests that the same technology could be used to transfer movements of one person onto the image of another, or to animate digital avatars.
Nvidia’s Vid2Vid Cameo unveiling comes shortly after it launched Fleet Command, a remote management platform designed to allow businesses to monitor and manage AI applications at the edge.
The company also announced a partnership with Equinix on an infrastructure program called AI LaunchPad – offering speedy access to Nvidia’s hardware and software.
A few weeks prior, the Santa Clara-based firm unveiled the Jetson AGX Xavier Industrial, a compute module designed for AI systems in safety-critical industrial environments.