Text-to-Video Generative AI Models: The Definitive List
Explore the growing text-to-video AI space and find out about models like Imagen Video
What are text-to-video AI models?
Text-to-video models, as the name suggests, generate a video from a natural language prompt. These models use deep learning techniques, such as recurrent neural networks, transformers or diffusion models, to understand the context and semantics of the input text and then generate a corresponding video sequence.
Text-to-video AI models require massive amounts of data and computing power to train, and the field is still evolving.
Such models could be used to create video content for advertising or entertainment or aid in film production processes.
AI Business explores the growing field of text-to-video AI, outlining the models and platforms available today.
Text-to-video AI models
Imagen Video
Creator: Google
First published: October 2022
Imagen Video is a text-to-video version of Google’s Imagen generative model. Using a natural language prompt, Imagen Video generates high-definition videos.
The model can generate videos and text animations in various artistic styles and with 3D object understanding. To achieve this, Imagen Video uses ‘Cascaded Diffusion Models’: a base video generation model followed by a sequence of interleaved spatial and temporal video super-resolution models that progressively upscale its output into HD video.
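The cascade described above can be pictured as a chain of stages that each enlarge either the frame size or the frame count. The sketch below tracks only the shape of the video as it moves through such a chain; it runs no diffusion model, and the stage order, scaling factors and base resolution are illustrative assumptions rather than values confirmed by this article. The names VideoShape, STAGES and run_cascade are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    """Size of a video clip: number of frames and frame dimensions in pixels."""
    frames: int
    height: int
    width: int


def spatial_super_resolution(shape: VideoShape, factor: int) -> VideoShape:
    # A spatial stage enlarges each frame and leaves the frame count unchanged.
    return VideoShape(shape.frames, shape.height * factor, shape.width * factor)


def temporal_super_resolution(shape: VideoShape, factor: int) -> VideoShape:
    # A temporal stage adds frames between existing ones and keeps the frame size.
    return VideoShape(shape.frames * factor, shape.height, shape.width)


def run_cascade(base: VideoShape, stages: list[tuple[str, int]]) -> list[VideoShape]:
    """Return the video shape after the base model and after every later stage."""
    shapes = [base]
    for kind, factor in stages:
        current = shapes[-1]
        if kind == "spatial":
            shapes.append(spatial_super_resolution(current, factor))
        elif kind == "temporal":
            shapes.append(temporal_super_resolution(current, factor))
        else:
            raise ValueError(f"unknown stage kind: {kind}")
    return shapes


# Assumed, illustrative configuration: a small, short base clip that is
# upscaled by interleaved temporal and spatial stages.
BASE = VideoShape(frames=16, height=24, width=40)
STAGES = [
    ("temporal", 2), ("spatial", 2),
    ("temporal", 2), ("spatial", 4),
    ("temporal", 2), ("spatial", 4),
]

if __name__ == "__main__":
    for step, shape in enumerate(run_cascade(BASE, STAGES)):
        print(step, shape)
```

Splitting the work this way means the base model only has to get the content and motion right at a small size, while each later stage handles a narrower upscaling task.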
Access the paper detailing Imagen Video: https://imagen.research.google/video/paper.pdf
CogVideo
Creator: Tsinghua University (THUDM)
Try CogVideo via the demo on Hugging Face Spaces: https://huggingface.co/spaces/THUDM/CogVideo
CogVideo is a pre-trained transformer for text-to-video generation. It has 9.4 billion parameters and builds on a text-to-image model, CogView2, using a multi-frame-rate hierarchical training strategy to turn the generated images into short videos.
CogVideo currently only supports inputs in Chinese, although some demos automatically translate English prompts into simplified Chinese.
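One way to picture a multi-frame-rate hierarchy is as a coarse-to-fine schedule: a few widely spaced frames come first, and later passes fill in the gaps between them. The sketch below is not CogVideo's code and works only with frame timestamps, not images; the function names, the midpoint rule and the example numbers are assumptions made for illustration.

```python
def generate_keyframes(num_keyframes: int, stride: float) -> list[float]:
    """Timestamps of the first, low-frame-rate pass (hypothetical stand-in)."""
    return [float(i * stride) for i in range(num_keyframes)]


def interpolate_once(timestamps: list[float]) -> list[float]:
    """Insert one new timestamp halfway between every pair of neighbours."""
    filled = []
    for left, right in zip(timestamps, timestamps[1:]):
        filled.append(left)
        filled.append((left + right) / 2)
    filled.append(timestamps[-1])
    return filled


def hierarchical_schedule(num_keyframes: int, stride: float, rounds: int) -> list[float]:
    """Start from sparse keyframes, then double the frame rate `rounds` times."""
    timestamps = generate_keyframes(num_keyframes, stride)
    for _ in range(rounds):
        timestamps = interpolate_once(timestamps)
    return timestamps


if __name__ == "__main__":
    # 5 keyframes spaced 4 time units apart, refined twice.
    print(hierarchical_schedule(num_keyframes=5, stride=4, rounds=2))
```

In a real model each timestamp would correspond to a generated frame, with the in-between frames conditioned on their neighbours rather than computed as a simple midpoint.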
Access the CogVideo code: https://github.com/THUDM/CogVideo#
Make-A-Video
Creator: Meta
First published: September 2022
Make-A-Video takes text prompts and generates short videos similar to GIFs. Make-A-Video can also create videos from images or take existing videos and create similar new ones.
Built using publicly available datasets, the model uses images with descriptions to “learn what the world looks like and how it is often described,” according to Meta.
Check out the Make-A-Video paper: https://arxiv.org/abs/2209.14792
Read more from AI Business on Make-A-Video: https://aibusiness.com/ml/meta-unveils-ai-model-that-can-generate-videos-from-text-inputs
Phenaki
Creator: Google
First published: October 2022
Phenaki can generate videos from text that are several minutes long, longer than those produced by the other models on this list. The model was trained on both image-text pairs and a smaller number of video-text examples, a method that Google claims offers improved generation capabilities compared to models trained solely on video datasets.
Read more on Phenaki: https://sites.research.google/phenaki/
Read the Phenaki research paper: https://openreview.net/forum?id=vOEXS39nOF
AI text-to-video platforms
Here are some text-to-video AI platforms you can try today:
Synthesia
Synthesia is a platform where users type in a video idea and the platform generates the content. Users can select a template and edit their script to obtain the desired content.
The team behind it sought to build a platform where anyone can produce video content. Synthesia can be used to create YouTube 'How To' videos or enterprise-focused content like sales pitches. The Synthesia platform cannot be used to generate political, sexual or discriminatory content.
Hour One
Hour One is an AI video generation platform. Users can create videos from text prompts, as well as use templates and virtual human presenters to craft their ideal output.
The likes of HP, T-Mobile and AstraZeneca are among its customers. Hour One tech was used to generate video greetings on Cameo for the Alec Baldwin character, Boss Baby.
Try Hour One: https://app.hourone.ai/?init=signUp
Colossyan
Colossyan users can create videos using text prompts. Its video generation platform auto-translates content into other languages.
Users can also choose from a range of AI presenters or customize their own.
Automobile giant BMW, professional services firm AAB and chemical manufacturer BASF are among Colossyan’s client base.
Try Colossyan: https://app.colossyan.com/try