Text-to-Video Generative AI Models: The Definitive List

Explore the growing text-to-video AI space and find out about models like Imagen Video

Ben Wodecki, Jr. Editor

August 10, 2023


What are text-to-video AI models?

Text-to-video models, as the name suggests, use natural language prompts as input to generate a video. These models use machine learning techniques, such as deep learning or recurrent neural networks, to understand the context and semantics of the input text and then generate a corresponding video sequence.

Text-to-video AI models require massive amounts of data and computing power to train, and the field is still evolving.

Such models could be used to create video content for advertising or entertainment, or to aid film production processes.

AI Business explores the growing field of text-to-video AI, outlining the models and platforms available today.

Text-to-video AI models

Imagen Video

Creator: Google

First published: October 2022

Imagen Video is a text-to-video version of Google’s Imagen generative model. Using a natural language prompt, Imagen Video generates high-definition videos.

The model can generate videos and text animations in various artistic styles and with 3D object understanding. To achieve this, Imagen Video uses ‘Cascaded Diffusion Models’: a base video generation model followed by a sequence of interleaved spatial and temporal video super-resolution models that together produce HD videos.
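To make the cascade idea concrete, here is a minimal Python sketch that tracks only the shape of a video as it passes through interleaved super-resolution stages. It does not run any diffusion model. The stage order, the upscaling factors and the base resolution are hypothetical values chosen for illustration, not figures from Google, and the names VideoShape, spatial_sr, temporal_sr and run_cascade are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoShape:
    frames: int
    height: int
    width: int


def spatial_sr(shape: VideoShape, factor: int) -> VideoShape:
    """Spatial super-resolution: same frame count, larger frames."""
    return VideoShape(shape.frames, shape.height * factor, shape.width * factor)


def temporal_sr(shape: VideoShape, factor: int) -> VideoShape:
    """Temporal super-resolution: same frame size, more frames."""
    return VideoShape(shape.frames * factor, shape.height, shape.width)


# Hypothetical cascade: the order and factors are illustrative assumptions.
CASCADE = [
    (temporal_sr, 2),
    (spatial_sr, 2),
    (temporal_sr, 2),
    (spatial_sr, 4),
]


def run_cascade(base: VideoShape) -> List[VideoShape]:
    """Return the video shape after the base model and after each stage."""
    shapes = [base]
    for stage, factor in CASCADE:
        shapes.append(stage(shapes[-1], factor))
    return shapes


if __name__ == "__main__":
    # Assumed base output: a short, low-resolution clip.
    for shape in run_cascade(VideoShape(frames=16, height=24, width=40)):
        print(shape)
```

The point of the structure is that the base model only has to produce a short, low-resolution clip, while each later stage handles a single job: adding frames or adding pixels.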


Access the paper detailing Imagen Video: https://imagen.research.google/video/paper.pdf


CogVideo

Creator: THUDM (Tsinghua University)

Try CogVideo via the demo on Hugging Face Spaces: https://huggingface.co/spaces/THUDM/CogVideo

CogVideo is a pre-trained transformer for text-to-video generation. It has 9.4 billion parameters and builds on a text-to-image model, CogView2, applying a multi-frame-rate hierarchical training strategy to turn generated images into short videos.

CogVideo currently supports inputs only in Chinese, with some demos automatically translating English prompts into simplified Chinese.

Access the CogVideo code: https://github.com/THUDM/CogVideo


Make-A-Video

Creators: Meta, FAIR

First published: September 2022

Make-A-Video takes text prompts and generates short videos similar to GIFs. Make-A-Video can also create videos from images or take existing videos and create similar new ones.

Built using publicly available datasets, the model uses images with descriptions to “learn what the world looks like and how it is often described,” according to Meta.

Check out the Make-A-Video paper: https://arxiv.org/abs/2209.14792

Read more from AI Business on Make-A-Video: https://aibusiness.com/ml/meta-unveils-ai-model-that-can-generate-videos-from-text-inputs



Phenaki

Creator: Google

First published: October 2022

Phenaki can generate videos from text that are several minutes long, longer than those produced by other models on this list. The model was trained on both image-text pairs and a number of video-text examples, a method that Google claims offers improved generation capabilities compared to models that use video datasets alone.

Read more on Phenaki: https://sites.research.google/phenaki/

Read the Phenaki research paper: https://openreview.net/forum?id=vOEXS39nOF

AI text-to-video platforms

Here are some text-to-video AI platforms you can try today:


Synthesia

Synthesia is a platform where users type in a video idea and the platform generates the content. Users can select a template and edit their script to obtain the desired content.

The team behind it sought to build a platform where anyone can produce video content. Synthesia can be used to create YouTube 'How To' videos or enterprise-focused content like sales pitches. The Synthesia platform cannot be used to generate political, sexual or discriminatory content.

Hour One

Hour One is an AI video generation platform. Users can create videos from text prompts, as well as use templates and virtual human presenters to craft their ideal output.


The likes of HP, T-Mobile and AstraZeneca are among its customers. Hour One's technology was used to generate video greetings on Cameo from Boss Baby, the character voiced by Alec Baldwin.

Try Hour One: https://app.hourone.ai/?init=signUp


Colossyan

Colossyan users can create videos using text prompts. Its video generation platform auto-translates content into other languages.

Users can also choose from a range of AI presenters or customize their own.

Automobile giant BMW, professional services firm AAB and chemical manufacturer BASF are among Colossyan’s client base.

Try Colossyan: https://app.colossyan.com/try


