Google Unveils Next-Gen Video, Image, Music Models at Google I/O 2024

I/O 2024 offered a glimpse of Google’s Sora rival Veo, a new Imagen model and music generation tools

Ben Wodecki, Jr. Editor

May 16, 2024

4 Min Read
Image credit: Glenn Chapman/AFP via Getty Images

Google has unveiled a series of new creative-focused generative AI models, including a video generation model set to rival OpenAI’s Sora.

Google has been experimenting with AI video generation for some time, most recently with the text-to-video model Lumiere.

At the company’s annual I/O event, Google announced it would bring AI video generation to the masses through Veo, a new model capable of creating high-quality 1080p videos that can run beyond a minute.


Developed by Google DeepMind, Veo can turn text, images and other videos into new video content.

The model understands cinematic concepts in prompts and can be used to create visuals like time lapses and aerial landscape shots.

Google DeepMind used the multimodal capabilities of its flagship foundation model Gemini to optimize Veo during the training process, enabling it to better understand nuance from prompts.

Google DeepMind said the model is designed to help creators and make video production accessible to everyone.

“Techniques for generating static images have come a long way,” said Sir Demis Hassabis, Google DeepMind’s CEO. “Generating video is a different challenge altogether. Not only is it important to understand where an object or subject should be in space, it needs to maintain this consistency over time.


“Veo builds upon years of our pioneering generative video model work, including GQN, Phenaki, WALT, VideoPoet, Lumiere and much more. We combined the best of these architectures and techniques to improve consistency, quality and output resolution.”

Actor Donald Glover and his creative studio, Gilga, were given early access to the video generation tool.

“Everybody is going to become a director and everybody should be a director because at the heart of all of this is just storytelling,” Glover said. “The closer we are to being able to tell each other our stories, the more we'll understand each other.”

Videos generated by Veo can be traced back to the model through SynthID, Google’s watermarking technology.

The video generation model will be made available through VideoFX. Access is currently limited to select creators, though users can sign up for a waitlist.

Hassabis said Google is experimenting with the video generation model for features like storyboarding and for creating longer scenes.

Google said the model will also power some creative features on YouTube, including Shorts, though it did not disclose what that will entail.


Imagen 3: Improved Image Generation

Google also unveiled the latest version of its Imagen line of image generation models, Imagen 3.

The new Imagen model creates more photorealistic images compared to previous versions, incorporating more intricate details in outputs with fewer distorted objects.

The new model has a better understanding of prompts, with Google claiming it can comprehend the intent behind an input.

“Imagen 3 understands prompts written the way people write; the more creative and detailed you are, the better,” said Douglas Eck, Google’s senior research director.

Google also boosted Imagen 3’s ability to render text, a feature with which most image generation models tend to struggle.

Imagen 3 is currently available to select creators in private preview through the ImageFX platform.

Users can sign up for the waitlist, with the image generation model set to be available in Vertex AI “soon.”

Previous Imagen models were the subject of a copyright infringement lawsuit brought against Google earlier this month.

Music: Creative Collaborations

Google also announced it has been working on AI-powered tools that let musicians create tracks from scratch.

Music AI Sandbox uses the Lyria model to offer musicians a creative playground. Users can generate instrumental sections from natural language prompts.


“We've been working closely with incredible musicians, songwriters and producers,” Eck said, adding that some of the resulting tracks “may even be entirely new songs in ways that would have not been possible without these tools.”

Artists including Wyclef Jean were brought in to test the platform.

Marc Rebillet, an improvisational electronic musician and YouTuber, used the tools to make what he called “Gloops: Google loops.”

Rebillet was the keynote’s warm-up act and used the new generative AI music tools to create tracks live: the audience shouted out instruments for him to prompt, and the system generated snippets that he mixed into tracks on the spot.


About the Author

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
