January 29, 2024
Google has unveiled a new text-to-video model capable of generating lifelike videos from short text inputs.
Lumiere creates videos that showcase realistic motion and can even use images and other videos as inputs to improve results. Unveiled in a paper titled "A Space-Time Diffusion Model for Video Generation," Lumiere works differently from existing video generation models. It generates the entire temporal duration of the video at once, whereas existing models synthesize distant keyframes and then fill in the gaps with temporal super-resolution.
Put simply, Lumiere focuses on the movement of objects in the image, whereas prior systems patch together a video from key frames where the movement already happened.
The model is capable of generating videos comprising 80 frames. For comparison, Stability's Stable Video Diffusion generates 14 or 25 frames. The more frames, the smoother the motion of the video.
The researchers also contend that Lumiere produces state-of-the-art generation outputs as a result of this alternative approach. They claim Lumiere's outputs could be used in content creation and video editing tasks, including video inpainting and stylized generation (mimicking artistic styles it is shown), by using fine-tuned text-to-image model weights.
To achieve its results, Lumiere leverages a new architecture, Space-Time U-Net. This generates the entire temporal duration of the video at once, through a single pass in the model.
The Google team wrote that the novel approach improves consistency in outputs. “By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-framerate, low-resolution video by processing it in multiple space-time scales,” the paper reads.
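Lumiere's Space-Time U-Net is not publicly available, but the idea of "temporal down- and up-sampling" can be illustrated in tensor terms. The toy sketch below (the function names, pooling choices, and tensor sizes are our own illustrative assumptions, not Google's implementation) shows a video being pooled to a coarser space-time scale and then expanded back to full frame rate and resolution, which is the kind of multi-scale processing the paper describes:

```python
import numpy as np

def downsample(video, sf=2, tf=2):
    """Average-pool a (T, H, W) grayscale video by tf in time and sf in space."""
    T, H, W = video.shape
    v = video[: T - T % tf, : H - H % sf, : W - W % sf]
    v = v.reshape(T // tf, tf, H // sf, sf, W // sf, sf)
    return v.mean(axis=(1, 3, 5))

def upsample(video, sf=2, tf=2):
    """Nearest-neighbor upsampling back to the original frame rate and resolution."""
    return video.repeat(tf, axis=0).repeat(sf, axis=1).repeat(sf, axis=2)

# 80 frames of 64x64 video -- matching Lumiere's reported 80-frame output length
video = np.random.rand(80, 64, 64)
coarse = downsample(video)    # (40, 32, 32): a lower-framerate, lower-resolution space-time scale
restored = upsample(coarse)   # back to (80, 64, 64)
print(coarse.shape, restored.shape)
```

In the real model, a learned U-Net denoises the video across several such space-time scales in a single pass, rather than simply averaging and repeating pixels as this sketch does.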
The goal of the Lumiere project was to create a system to enable novice users to more easily create video content.
However, the paper acknowledges the risk of potential misuse, specifically warning that models like Lumiere could be used to create fake or harmful content.
“We believe that it is crucial to develop and apply tools for detecting biases and malicious use cases in order to ensure a safe and fair use,” the paper reads.
Google has not made the model available to the public at the time of writing. However, you can explore various example generations on the showcase page on GitHub.
Lumiere follows VideoPoet, a Google-produced multimodal model that creates videos from text, video and image inputs. Unveiled last December, VideoPoet uses a decoder-only transformer architecture, making it capable of creating content it has not been trained on.
Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.