AI Image-Generation Models and Tools: The Ultimate List

AI models like DALL-E and Stable Diffusion paved the way for text-to-image innovation, transforming creative expression

Ben Wodecki, Jr. Editor

July 25, 2023

[Image: Close-up photorealistic image of a young woman's face, with Mediterranean features. Created using the prompt: “Portrait of a young girl with tearful eyes.” Credit: Craiyon]

Before ChatGPT, image-generation models such as DALL-E, Stable Diffusion and Midjourney sat atop the generative AI wave.

These models and their striking outputs captivated the mainstream public, enticing an audience yet to come to grips with ChatGPT.

Text-to-image generative technologies altered the dynamics of creativity, challenging the boundary between human imagination and machine interpretation.

AI Business dives into the world of text-to-image generative AI models, explaining how they work and outlining the various models and applications available.

What are text-to-image generative AI models?

Text-to-image AI models take inputs in the form of text prompts and produce an image matching the description using machine learning and deep neural networks.

These models work by training on large datasets that contain both images and corresponding textual descriptions. They learn to understand the relationship between specific words and phrases and the visual components they represent. When a user provides a new textual input, the model uses what it has learned to generate an image that it believes corresponds to the description.
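The pairing idea can be illustrated with a toy sketch. The snippet below assumes that captions and images have already been mapped to vectors in a shared embedding space (the approach used by contrastive models such as CLIP, which appears later in this list) and scores how well each image matches a prompt using cosine similarity. The three-dimensional vectors and file names are made up for illustration. Real models learn much larger embeddings from training data, and a generator goes further by producing a new image rather than picking an existing one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; the axes loosely mean (animal, vehicle, outdoors).
IMAGE_EMBEDDINGS = {
    "dog_in_park.png": [0.9, 0.0, 0.4],
    "red_sports_car.png": [0.0, 1.0, 0.2],
    "mountain_lake.png": [0.1, 0.0, 0.9],
}


def best_match(text_embedding, image_embeddings):
    """Return the image name whose embedding is closest to the text embedding."""
    return max(
        image_embeddings,
        key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
    )


# Hypothetical embedding for the prompt "a puppy playing outside".
prompt_embedding = [0.8, 0.0, 0.5]
print(best_match(prompt_embedding, IMAGE_EMBEDDINGS))
```

With these made-up numbers the dog photo scores highest because its vector points in nearly the same direction as the prompt's vector.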

Text-to-image generation has potential applications in numerous fields, such as graphic design, video game development, marketing and advertising.

Text-to-image AI models

DALL-E

Creator: OpenAI – San Francisco-based AI research lab backed by Microsoft

First published: January 2021

Current version: DALL-E 2

OpenAI’s DALL-E can produce high-quality images based on text inputs. According to OpenAI, more than 1.5 million users are already using the tool, generating over two million images a day.

DALL-E is available via what is essentially a freemium model: Users are given 50 credits for signing up and subsequently 15 credits a month. Users wanting further credits can purchase them.

Aside from OpenAI’s own platform, DALL-E also powers Shutterstock’s text-to-image platform. OpenAI struck a deal with Shutterstock in October 2022 to power the platform and to gather user insights on AI-generated content that could help address its potential ramifications.

Stable Diffusion

Creators:

  • Stability AI - Based in London, Stability markets and manages the model.

  • CompVis LMU - Research group from the Ludwig Maximilian University of Munich (LMU Munich) that created the deep generative neural network powering the model.

  • Runway - New York-based applied AI research company building next-gen creativity tools using generative AI. Runway is responsible for the underlying algorithm that powers Stable Diffusion.

  • LAION - A German nonprofit that built Stable Diffusion’s underlying dataset.

First published: August 2022

Current version: Stable Diffusion XL 1.0

Stable Diffusion uses deep learning to generate results and can be used for inpainting and generating image-to-image translations guided by a text prompt, as well as standard text-to-image generation.

Stable Diffusion can run on consumer-grade hardware, requiring a GPU with just eight gigabytes of video memory (VRAM). That sets it apart from DALL-E and Midjourney, which require cloud services to run.
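A rough back-of-the-envelope estimate shows how a hardware requirement like this can be reasoned about. The sketch below computes only the memory needed to hold a model's weights. The one-billion-parameter count is a round placeholder, not a figure from Stability AI, and real usage is higher because activations and intermediate images also occupy GPU memory.

```python
def weights_gib(num_params, bytes_per_param):
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Placeholder size: one billion parameters (an assumption for illustration).
NUM_PARAMS = 1_000_000_000

fp32 = weights_gib(NUM_PARAMS, 4)  # 32-bit floats: 4 bytes each
fp16 = weights_gib(NUM_PARAMS, 2)  # 16-bit floats: 2 bytes each

print(f"fp32 weights: {fp32:.2f} GiB")
print(f"fp16 weights: {fp16:.2f} GiB")
```

Halving the numeric precision halves the weight footprint, which is one reason half-precision inference is common on consumer GPUs.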

The team behind Stable Diffusion is looking beyond images, with the lessons learned from its flagship model set to be applied to audio, language, video and 3D generation for both consumer and enterprise use cases.

Stable Diffusion’s popularity propelled both Runway and Stability AI into the limelight, helping the companies recently raise $50 million and $101 million in funding rounds, respectively. 

Midjourney

Creator: David Holz, co-founder of Leap Motion (now Ultraleap)

First published: July 2022

Current version: Version 5.2

Midjourney is accessible only through a Discord bot on its official Discord server. Users can direct message the bot or invite it to a third-party server. They then use the /imagine command and type in a prompt, and the bot generates four images based on the request, which users can upscale.

Midjourney is working on a web interface, similar to how Stable Diffusion has ClipDrop.

Imagen

Creator: Google

First published: May 2022

Current version: 23.10.3

Not released to the public, Imagen is a text-to-image diffusion model that uses transformer language models to understand text and relies on diffusion models to generate high-fidelity images. Google also introduced DrawBench, a benchmark for text-to-image models, to compare Imagen with other methods including VQGAN+CLIP, latent diffusion models and DALL-E 2. Google said human raters preferred Imagen over these other models in sample quality and image-text alignment.

GauGAN

Creator: Nvidia - Chipmaking giant and AI powerhouse

First published: November 2021

Current version: GauGAN2

Named after French post-Impressionist painter Paul Gauguin, GauGAN works a little differently to other image models on this list. It creates realistic images from segmentation maps – or labeled sketches depicting a scene.

Users have an MS Paint-style platform to design a landscape or upload their own segmentation maps. Natural language prompts can also be applied. The system then generates a realistic depiction of the scene, and users can tweak aspects of the image using a smart paintbrush.
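To make the idea of a segmentation map concrete, the sketch below represents one as a small grid in which every cell holds a class label rather than a color. The label names, grid size and helper function are hypothetical and are not GauGAN's actual input format. The point is only that the model receives a layout of labeled regions and fills in realistic texture.

```python
from collections import Counter

# A tiny 4x6 "labeled sketch": each cell names what should appear there.
SEGMENTATION_MAP = [
    ["sky",   "sky",   "sky",      "sky",      "sky",   "sky"],
    ["sky",   "sky",   "mountain", "mountain", "sky",   "sky"],
    ["grass", "grass", "mountain", "mountain", "water", "water"],
    ["grass", "grass", "grass",    "water",    "water", "water"],
]


def label_coverage(seg_map):
    """Return the fraction of cells covered by each label."""
    counts = Counter(label for row in seg_map for label in row)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


coverage = label_coverage(SEGMENTATION_MAP)
for label, share in sorted(coverage.items()):
    print(f"{label}: {share:.0%}")
```

Repainting a few cells, for example turning some "grass" into "water", changes the layout the generator is asked to render.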

Pixray-text2image

Creator: Pixray

First published: May 2022

Current version: 50f96fcd

Pixray-text2image can generate an image from a text prompt. It uses a combination of image generation features such as Perception Engines and CLIP-guided GAN imagery to generate high-quality images. 

AI text-to-image generator tools

Jasper Art

Creator: Jasper - Founded in 2015, Jasper is a Y Combinator-backed startup developing AI-powered productivity tools.

AI model used: DALL-E 2

First published: August 2022

Jasper users can generate images from both text and images. After a user enters a written prompt or an image and selects a style, the system returns a series of images.

Jasper Art can be accessed via a user’s dashboard and supports 29 languages. Jasper bills its Art tool as a way for users to “ditch stock photos.” 

Craiyon

Creator: Craiyon - What started out as an independent research project by one Boris Dayma turned into a popular image generator.

First published: April 2022

Formerly DALL-E Mini, Craiyon is designed to be a lightweight alternative to the text-to-image models on this list.

Craiyon is a free-to-use tool for non-commercial purposes. To use the model for commercial use cases, paid subscriptions are available. Users on the premium tiers have access to shorter wait times for their generations. Craiyon also relies on ads to pay for its servers.

NightCafe

Creator: NightCafe Studio - Founded by Angus Russell and launched via a November 2019 Reddit post

AI models used: Stable Diffusion, Coherent (CLIP-Guided Diffusion), Artistic (VQGAN+CLIP), Style Transfer

First published: November 2019

NightCafe can be used to generate AI art using natural language prompts. Users have to create an account in order to access the tool. 

NightCafe shot to fame for its use of the VQGAN+CLIP text-to-image art generation method, quickly growing in popularity as a result.

As of October 2022, more than 35 million AI-generated artworks had been created on the NightCafe platform.

The NightCafe name is an apparent nod to the Vincent Van Gogh painting of the same name.

Wombo Dream

Creator: Wombo - Canadian AI company whose name comes from the term ‘wombo combo’ from the Super Smash Bros video game.

AI model used: VQGAN+CLIP

First published: February 2021

Current version: 3.5.0

Turn words into images using your phone: Wombo Dream is a text-to-image tool available as a mobile app.

Its AI generator lets users create and share images. According to its App Store description, Wombo Dream has more than 140 million app installs. 

Wombo offers in-app purchases covering monthly and yearly subscriptions.

Wonder

Creator: Codeway Digital - Founded by Anıl Simsek, Codeway is a Turkish AI app developer that has also created Ask AI, Facemix and PixelUp

First published: June 2022

Current version: 3.2.0

Wonder is an app-based AI image generator where handset users can generate artwork and images using natural language prompts. 

Images generated on the app can be shared to social media. Wonder sells premium subscriptions that raise the number of images a user can generate.

About the Author(s)

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
