Text-to-Video Generative AI Models: The Definitive List

Explore the growing text-to-video AI space and find out about models like Imagen Video

August 10, 2023

3 Min Read

What are text-to-video AI models?

Text-to-video models, as the name suggests, take natural language prompts as input and generate a video. These models use deep learning techniques to understand the context and semantics of the input text and then generate a corresponding video sequence.

Text-to-video AI models require massive amounts of data and computing power to train, and the field is still evolving.

Such models could be used to create video content for advertising or entertainment or aid in film production processes.

AI Business explores the growing field of text-to-video AI, outlining the models and platforms available today.

Text-to-video AI models

Imagen Video

Creator: Google

First published: October 2022

Imagen Video is a text-to-video version of Google’s Imagen generative model. Using a natural language prompt, Imagen Video generates high-definition videos.

The model can generate videos and text animations in various artistic styles and with 3D object understanding. To achieve this, Imagen Video uses ‘Cascaded Diffusion Models’ - a combination of a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models to create HD videos.
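To make the cascade idea concrete, here is a minimal sketch in Python that tracks only the shape of the video as it passes through each stage. It is not Google's implementation: the names VideoShape, spatial_sr, temporal_sr and run_cascade are hypothetical, and the base resolution and upscaling factors are illustrative assumptions chosen to end at an HD-sized clip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    """Size of a video clip: number of frames and frame dimensions."""
    frames: int
    height: int
    width: int


def spatial_sr(shape: VideoShape, factor: int) -> VideoShape:
    """Spatial super-resolution: same frame count, larger frames."""
    return VideoShape(shape.frames, shape.height * factor, shape.width * factor)


def temporal_sr(shape: VideoShape, factor: int) -> VideoShape:
    """Temporal super-resolution: same frame size, more frames."""
    return VideoShape(shape.frames * factor, shape.height, shape.width)


def run_cascade(base: VideoShape, stages: list[tuple[str, int]]) -> list[VideoShape]:
    """Apply each (kind, factor) stage in order and record every intermediate shape."""
    shapes = [base]
    for kind, factor in stages:
        if kind == "spatial":
            shapes.append(spatial_sr(shapes[-1], factor))
        elif kind == "temporal":
            shapes.append(temporal_sr(shapes[-1], factor))
        else:
            raise ValueError(f"unknown stage kind: {kind}")
    return shapes


# Illustrative numbers only (assumed, not quoted from the article):
# a small base clip, then interleaved temporal and spatial stages.
base = VideoShape(frames=16, height=24, width=40)
stages = [
    ("temporal", 2),
    ("spatial", 2),
    ("spatial", 4),
    ("temporal", 2),
    ("temporal", 2),
    ("spatial", 4),
]
shapes = run_cascade(base, stages)
final = shapes[-1]
```

In a real system each stage would be a separate diffusion model conditioned on the output of the previous one; the sketch only shows why interleaving the two kinds of stage lets a small base model end up at high resolution and a high frame count.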

Access the paper detailing Imagen Video: https://imagen.research.google/video/paper.pdf

CogVideo

Creator: THUDM (Tsinghua University)

Try CogVideo via the demo on Hugging Face Spaces: https://huggingface.co/spaces/THUDM/CogVideo

CogVideo is a pre-trained transformer for text-to-video generation. It has 9.4 billion parameters and builds on a text-to-image model, CogView2, using a multi-frame-rate hierarchical training strategy to extend image generation to short videos.
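One way to picture a hierarchical, multi-frame-rate scheme is to place key frames at a low frame rate first and then fill in the gaps between them in repeated passes. The Python sketch below only computes frame timestamps under that assumption; it is not CogVideo's code, and the function names key_frame_times, interpolate_once and hierarchical_schedule are hypothetical.

```python
def key_frame_times(duration_s: int, fps: int) -> list[float]:
    """Timestamps (in seconds) of frames sampled at a given frame rate."""
    count = duration_s * fps
    return [i / fps for i in range(count + 1)]


def interpolate_once(times: list[float]) -> list[float]:
    """Insert one midpoint between each pair of neighbouring timestamps."""
    out: list[float] = []
    for a, b in zip(times, times[1:]):
        out.extend([a, (a + b) / 2])
    out.append(times[-1])
    return out


def hierarchical_schedule(duration_s: int, base_fps: int, passes: int) -> list[list[float]]:
    """Return the timestamps at each level, from coarse key frames to the densest pass."""
    times = key_frame_times(duration_s, base_fps)
    levels = [times]
    for _ in range(passes):
        times = interpolate_once(times)
        levels.append(times)
    return levels


# Assumed example values: a 4-second clip, key frames at 1 fps, two fill-in passes.
levels = hierarchical_schedule(duration_s=4, base_fps=1, passes=2)
frame_counts = [len(level) for level in levels]
```

Each pass roughly doubles the effective frame rate, so the coarse level fixes the overall motion while later levels add smoothness between key frames.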

CogVideo currently only supports inputs in Chinese - with some demos automatically translating English prompts into simplified Chinese.

Access the CogVideo code: https://github.com/THUDM/CogVideo

Make-A-Video

Creators: Meta, FAIR

First published: September 2022

Make-A-Video takes text prompts and generates short videos similar to GIFs. Make-A-Video can also create videos from images or take existing videos and create similar new ones.

Built using publicly available datasets, the model uses images with descriptions to “learn what the world looks like and how it is often described,” according to Meta.

Check out the Make-A-Video paper: https://arxiv.org/abs/2209.14792

Read more from AI Business on Make-A-Video: https://aibusiness.com/ml/meta-unveils-ai-model-that-can-generate-videos-from-text-inputs

Phenaki

Creator: Google

First published: October 2022

Phenaki can generate videos from text that are several minutes long, longer than those produced by the other models on this list. The model was trained on both image-text pairs and a smaller number of video-text examples, a method that Google claims offers improved generation capabilities compared to models trained solely on video datasets.

Read more on Phenaki: https://sites.research.google/phenaki/

Read the Phenaki research paper: https://openreview.net/forum?id=vOEXS39nOF

AI text-to-video platforms

Here are some text-to-video AI platforms you can try today:

Synthesia

Synthesia is a platform where users type in a video idea and the platform generates the content. Users can select a template and edit their script to obtain the desired content.

The team behind it sought to build a platform where anyone can produce video content. Synthesia can be used to create YouTube 'How To' videos or enterprise-focused content like sales pitches. The Synthesia platform cannot be used to generate political, sexual or discriminatory content.

Hour One

Hour One is an AI video generation platform. Users can create videos from text prompts, as well as use templates and virtual human presenters to craft their ideal output.

The likes of HP, T-Mobile and AstraZeneca are among its customers. Hour One tech was used to generate video greetings on Cameo for the Alec Baldwin character, Boss Baby.

Try Hour One: https://app.hourone.ai/?init=signUp

Colossyan

Colossyan users can create videos using text prompts. Its video generation platform auto-translates content into other languages.

Users can also choose from a range of AI presenters or customize their own.

Automobile giant BMW, professional services firm AAB and chemical manufacturer BASF are among Colossyan’s client base.

Try Colossyan: https://app.colossyan.com/try

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
