What’s Next for AI Video Generation

Using generative AI to create visual content has come a long way, but there’s much more to come

Alon Yaar, Vice President of Product at Lightricks

August 6, 2024

The use of generative AI tech to create visual content has come a long way over the last two years, as models have been democratized and improved to deliver staggering photo realism. They’ve also been integrated into mobile apps that anyone can use.

In 2024, the emergence of more sophisticated text-to-video models has already started to give businesses and content creators vastly more powerful tools that can help them create realistic, professional-grade video content at scale. While the potential is still far from realized, the industry has been pushing things forward quickly, as seen in models like OpenAI’s Sora and Shengshu Technology’s Vidu and in apps like Runway and Lightricks’ LTX Studio.

Generative AI-powered video is already making an impact, helping companies and individuals alike to create professional-grade video content, without the need for high budgets, production crews, actors, or even technical skills. To get started with making generative AI videos, you don’t even need a developed idea, as it can all start from a descriptive prompt.

Power of Generative AI Video

AI-generated videos already excel in several ways, despite being in their infancy. When it comes to recreating natural phenomena such as fire, weather, waves and the effect of windy conditions on a landscape, the technology already produces ultra-realistic video that genuinely looks as if it were filmed, thanks to being trained on vast amounts of existing video data.

AI can also create extremely compelling “high-level” footage of things such as aerial city shots, large crowds and nature, which is especially useful given the costs and difficulty of using computer-generated imagery (CGI) to enhance regular photography and replicate these effects.

Character consistency is another area where AI rivals traditional video, with existing models being able to generate and preserve the identity and style of unique characters across multiple frames and scenes in different contexts. Similarly, today’s generative AI video creators shine when it comes to camera motion, providing a vital tool for filmmakers.

These are essential capabilities for storytelling, enabling the characters in AI videos to remain true to their visual appearance and giving video creators fine-grained control over elements such as moving camera angles, lighting and composition. With these capabilities, creators can realize their vision exactly as they imagine it.

Limitations Still Persist

Media outlets are abuzz nowadays with examples of dazzling AI-generated videos that illustrate the technology’s remarkable potential. But as impressive as these early examples are, it’s also clear that AI video generation is held back by some limitations that the industry must resolve if it is to become a viable alternative to traditional video production.

In particular, AI-created videos have been criticized for their failure to create realistic-looking human characters. While these technical challenges have been more or less solved when it comes to still photos, in video, AI-generated faces are a particular challenge, as are elements such as human hands and feet, and their interactions, gestures and emotions.

Although some improvements have been made in these areas recently, AI-generated human characters still tend to fall into what’s known as the “uncanny valley,” due to a lack of subtlety and authenticity.

This is a significant problem: a recent study by researchers at the University of Cambridge found that the uncanny valley effect, with its deviations from realism, can be deeply unsettling to human viewers.

What Causes These Limitations?

AI struggles to generate consistent, compelling human motion because of the underlying model architecture, the way models are trained, and a general shortage of specific training data. Generative AI models need enormous, diverse datasets for training, and there simply isn’t enough high-quality, unbiased data available to build them.

A lack of diversity in training datasets leads to the perpetuation of stereotyped characters and unfair representation in AI-generated content. One recent study by MIT, for example, showed that AI models trained on biased datasets reflect those biases in the content they generate.

Humans are remarkably complex beings and the task of creating an artificial one means reflecting the hundreds of nuances, such as facial expressions and gestures. However, a dearth of useful training data prevents AI video generators from capturing the scale of complexity necessary to recreate these human traits effectively. Similarly, there’s not enough data on hand to replicate the natural speech of humans or understand the context of different environments.

AI also still falls short in terms of cultural nuances, creative expression and complex decision-making, and again, this is due to the lack of high-quality data.

Thirst for More Data

It’s easy to identify the solution to the limitations of AI video generation: we simply need to accumulate more high-quality training data. But getting hold of that data is not as easy as it sounds.

Some have promoted the idea of using synthetic datasets instead of real-world data to train video AI models, but research suggests that this is not a viable alternative. An article in New Scientist warns that as the internet becomes populated with AI-created humans and other synthetic images, generative AI will increasingly be trained on its own artificially created data, producing a self-consuming loop that stalls any progress in image and video quality.
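The mechanics of that self-consuming loop can be illustrated with a toy sketch. Everything below is an illustrative assumption rather than how real video models work: the "generative model" is just a Gaussian fitted to its training data, and each new generation is trained only on the previous generation's synthetic output.

```python
import random
import statistics

def train_and_sample(data, n_samples, rng):
    """A trivial stand-in for a generative model: fit a Gaussian to
    the training data, then emit synthetic samples from the fit."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [rng.gauss(mu, sigma) for _ in range(n_samples)]

rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(20)]  # small "real" dataset

# The self-consuming loop: each generation of the model is trained
# only on the previous generation's synthetic output.
for generation in range(500):
    data = train_and_sample(data, 20, rng)

# The spread (diversity) of the generated data collapses toward zero,
# whereas fresh real data would keep a spread of roughly 1.
print(round(statistics.pstdev(data), 4))
```

Because every generation re-fits to a finite, slightly degraded sample, estimation error compounds and the diversity of the output tends to shrink by orders of magnitude over many generations, mirroring the stalled quality the New Scientist piece warns about.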

A better solution might be to plug the gaps in existing datasets with specific imagery generated using CGI tools. This is essentially video-to-video generation rather than text-to-video generation. The premise is that AI and CGI can be merged, with AI handling the things that CGI struggles to create convincingly, such as human eyes, and CGI handling the things that AI doesn’t do so well, such as beards or emotional reactions.

Pedal to the Metal

AI video generation has the potential to accelerate the creativity and productivity of humans to an unprecedented level, but the road to realizing this dream is dependent on the amount and quality of the data that powers these new systems.

Fortunately, the possibilities of this technology are so immense that no one is standing still and we’re already making strong progress in solving AI’s training data challenge. That progress will only accelerate as more innovators rise to the challenge, tempted by the rewards that AI can offer.

The journey has only just begun and we’ve come very far, but there’s a much longer road ahead. By the time we reach the end of that road, the creative possibilities of generative AI will far exceed the limits of our imagination.

About the Author

Alon Yaar

Vice President of Product, Lightricks

Alon Yaar is vice president of product at Lightricks, an award-winning developer of AI-first photo and video editing tools that have been downloaded over 730 million times worldwide. A seasoned freelance designer, Yaar first joined the Lightricks team in 2018 as a machine learning researcher.
