July 25, 2023
Before ChatGPT, sitting atop the generative AI wave were image generation models such as DALL-E, Stable Diffusion and Midjourney.
These models and their striking outputs captivated a mainstream public that had yet to encounter ChatGPT.
Text-to-image generative technologies changed the dynamics of creativity, blurring the boundary between human imagination and machine interpretation.
AI Business dives into the world of text-to-image generative AI models, explaining how they work and outlining the various models and applications available.
What are text-to-image generative AI models?
Text-to-image AI models take inputs in the form of text prompts and produce an image matching the description using machine learning and deep neural networks.
These models work by training on large datasets that contain both images and corresponding textual descriptions. They learn to understand the relationship between specific words and phrases and the visual components they represent. When a user provides a new textual input, the model uses what it has learned to generate an image that it believes corresponds to the description.
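The word-to-image matching idea can be illustrated with a toy example. The sketch below assumes, as CLIP-style models do, that text and images are mapped into a shared vector space where matching pairs point in similar directions. The embeddings are hypothetical hand-picked numbers rather than outputs of a real model, and a real generator uses this kind of similarity signal to steer the creation of a new image rather than to pick from existing files.

```python
import math

# Hypothetical 3-dimensional embeddings. Real models such as CLIP learn
# embeddings with hundreds of dimensions from huge sets of image-caption pairs.
TEXT_EMBEDDINGS = {
    "a red apple": [0.9, 0.1, 0.0],
    "a blue ocean": [0.0, 0.2, 0.9],
}
IMAGE_EMBEDDINGS = {
    "apple.png": [0.8, 0.2, 0.1],
    "ocean.png": [0.1, 0.1, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(prompt):
    """Return the image whose embedding lies closest to the prompt's embedding."""
    text_vec = TEXT_EMBEDDINGS[prompt]
    return max(
        IMAGE_EMBEDDINGS,
        key=lambda name: cosine_similarity(text_vec, IMAGE_EMBEDDINGS[name]),
    )

print(best_match("a red apple"))   # apple.png
print(best_match("a blue ocean"))  # ocean.png
```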
Text-to-image generation has potential applications in numerous fields, such as graphic design, video game development, marketing and advertising.
Text-to-image AI models
DALL-E
Creator: OpenAI - San Francisco-based AI research lab backed by Microsoft
First published: January 2021
Current version: DALL-E 2
OpenAI’s DALL-E can produce high-quality images based on text inputs. According to OpenAI, more than 1.5 million users are already using the tool, generating over two million images a day.
DALL-E follows what is essentially a freemium model: users receive 50 free credits when they sign up and 15 free credits each month after that. Users who want more credits can buy them.
Beyond OpenAI’s own platform, DALL-E also powers the machine learning component behind Shutterstock’s text-to-image platform. OpenAI struck a deal with Shutterstock in October 2022 to power the platform and to gather user insights on AI-generated content in order to address its potential ramifications.
Access the code for the discrete VAE used in the original DALL-E: https://github.com/openai/DALL-E
Stable Diffusion
Creators:
Stability AI - London-based company that markets and manages the model.
CompVis LMU - Research group from the Ludwig Maximilian University of Munich (LMU Munich) that created the deep generative neural network powering the model.
Runway - New York-based applied AI research company building next-gen creativity tools using generative AI. Runway is responsible for the underlying algorithm that powers Stable Diffusion.
LAION - A German nonprofit that built Stable Diffusion’s underlying dataset.
First published: August 2022
Current version: Stable Diffusion XL 1.0
Stable Diffusion is a latent diffusion model: it generates images by iteratively removing noise from a compressed (latent) representation of an image. Besides standard text-to-image generation, it can be used for inpainting and for image-to-image translation guided by a text prompt.
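Diffusion models are trained to undo a gradual noising process, and generation runs that process in reverse, starting from noise. A minimal sketch of the forward (noising) step is below, assuming the linear variance schedule and constants from the original DDPM paper, which are not necessarily Stable Diffusion's own settings. It uses the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The four-value "latent" is hypothetical, and the learned denoising network that does the real work is omitted.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T spaced linearly (DDPM-style constants)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result

def add_noise(x0, t, abar, rng):
    """Jump straight to step t: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    signal = math.sqrt(abar[t])
    noise_scale = math.sqrt(1.0 - abar[t])
    return [signal * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule(1000))
latent = [0.5, -0.3, 0.8, 0.1]  # hypothetical 4-value "latent image"

print(add_noise(latent, 10, abar, rng))   # early step: still close to the original
print(add_noise(latent, 999, abar, rng))  # final step: almost pure noise
```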
Stable Diffusion can run on consumer-grade hardware, requiring a GPU with just eight gigabytes of video memory. That sets it apart from DALL-E and Midjourney, which are available only as cloud services.
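A back-of-the-envelope calculation shows why a GPU of that size can be enough. The sketch assumes a hypothetical model of one billion parameters (a round illustrative number, not an official figure for any Stable Diffusion release) and counts only the weights; activations during generation and framework overhead need additional memory on top of this.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed just to hold the model weights, in gibibytes."""
    return num_params * bytes_per_param / 1024**3

PARAMS = 1_000_000_000  # hypothetical round figure, not an official spec

fp32 = weight_memory_gib(PARAMS, 4)  # 32-bit floats
fp16 = weight_memory_gib(PARAMS, 2)  # 16-bit floats (half precision)
print(f"fp32: {fp32:.2f} GiB, fp16: {fp16:.2f} GiB")  # fp32: 3.73 GiB, fp16: 1.86 GiB
```

Halving the bytes per parameter halves the weight footprint, which is one reason half-precision inference is common on consumer GPUs.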
The team behind Stable Diffusion is looking beyond images, with the lessons learned from its flagship model set to be applied to audio, language, video and 3D generation for both consumer and enterprise use cases.
Midjourney
Creator: David Holz, co-founder of Leap Motion (now Ultraleap)
First published: July 2022
Current version: Version 5.2
Midjourney is accessible only through a Discord bot. Users can work with it on the official Midjourney Discord server, send the bot direct messages, or invite it to a third-party server. They type the /imagine command followed by a prompt, and the bot generates four images based on the request, which users can then upscale.
Midjourney is working on a web interface, similar to how Stable Diffusion is available through ClipDrop.
Midjourney’s model is proprietary, but the company has a GitHub page: https://github.com/midjourney
Imagen
Creator: Google
First published: May 2022
Current version: 23.10.3
Not released to the public, Imagen is a text-to-image diffusion model that uses transformer language models to understand text and relies on diffusion models to generate high-fidelity images. Google also introduced DrawBench, a benchmark for text-to-image models, which it used to compare Imagen with other methods including VQGAN+CLIP, latent diffusion models and DALL-E 2. Google said human raters preferred Imagen over these models in both sample quality and image-text alignment.
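One common way a diffusion model's text understanding steers generation is classifier-free guidance, which the Imagen paper also relies on: the model predicts the noise twice, once with the prompt and once without, and extrapolates toward the prompt-conditioned prediction. A minimal sketch of that combination step follows. The numbers are hypothetical, the noise-predicting network is omitted, and this illustrates the general technique rather than Google's implementation.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine two noise predictions: eps_uncond + w * (eps_cond - eps_uncond).

    A scale of 1.0 returns the text-conditioned prediction unchanged;
    larger scales push the result further in the direction the prompt implies.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Hypothetical noise predictions for a 3-value latent: one made without the
# prompt, one made with it. Real models predict one value per latent element.
eps_without_prompt = [0.10, -0.20, 0.05]
eps_with_prompt = [0.30, -0.10, 0.05]

print(classifier_free_guidance(eps_without_prompt, eps_with_prompt, 1.0))
print(classifier_free_guidance(eps_without_prompt, eps_with_prompt, 7.5))
```

Where the two predictions agree (the last element here), guidance leaves the value alone; where they differ, a larger scale amplifies the prompt's influence.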
GauGAN
Creator: Nvidia - Chipmaking giant and AI powerhouse
First published: November 2021
Current version: GauGAN2
Named after French post-Impressionist painter Paul Gauguin, GauGAN works a little differently from the other image models on this list. It creates realistic images from segmentation maps, or labeled sketches depicting a scene.
Users get an MS Paint-style canvas on which to design a landscape, or they can upload their own segmentation maps. Natural language prompts can also be applied. The system then generates a realistic depiction of the scene, which users can tweak using a smart paintbrush.
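A segmentation map is simply a grid in which every pixel carries a class label. Generators in the GauGAN family (built on Nvidia's SPADE research) typically receive such a map as one binary mask per class. The sketch below shows that conversion on a tiny hand-painted scene; the label names and IDs are hypothetical, and the generator network itself is not included.

```python
# Hypothetical label IDs; GauGAN's real label set is much larger.
LABELS = {"sky": 0, "mountain": 1, "water": 2}

# A tiny 3x4 "painted" scene: each cell holds the label the user painted.
painted = [
    ["sky", "sky", "sky", "sky"],
    ["mountain", "sky", "mountain", "mountain"],
    ["water", "water", "water", "water"],
]

def to_label_map(scene, labels):
    """Convert painted label names into a grid of integer class IDs."""
    return [[labels[name] for name in row] for row in scene]

def one_hot(label_map, num_classes):
    """Return one binary H x W mask per class (channel-first layout)."""
    return [
        [[1 if cell == c else 0 for cell in row] for row in label_map]
        for c in range(num_classes)
    ]

masks = one_hot(to_label_map(painted, LABELS), len(LABELS))
for name, class_id in LABELS.items():
    print(name, masks[class_id])
```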
Pixray-text2image
First published: May 2022
Current version: 50f96fcd
Pixray-text2image generates an image from a text prompt. It combines image generation techniques such as Perception Engines and CLIP-guided GAN imagery to produce high-quality images.
Access Pixray-text2image via Replicate: https://replicate.com/pixray/text2image
Access the Pixray code on GitHub: https://github.com/pixray/pixray
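CLIP guidance works by repeatedly adjusting an image so that CLIP rates it as a better match for the prompt. The toy sketch below replaces both the GAN and CLIP with hypothetical three-dimensional vectors and runs plain gradient ascent on their cosine similarity. It illustrates the optimization loop only and is not Pixray's actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_gradient(x, t):
    """Analytic gradient of cosine(x, t) with respect to x."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_t = math.sqrt(sum(v * v for v in t))
    c = cosine(x, t)
    return [ti / (norm_x * norm_t) - c * xi / (norm_x * norm_x) for xi, ti in zip(x, t)]

def guide(image_vec, text_vec, steps=200, lr=0.5):
    """Gradient ascent that nudges the image representation toward the prompt."""
    x = list(image_vec)
    for _ in range(steps):
        grad = cosine_gradient(x, text_vec)
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x

# Hypothetical stand-ins: in a real CLIP-guided system the "image" would be
# generator parameters, and both vectors would come from CLIP's encoders.
prompt_vec = [0.2, 0.9, 0.4]
start_image = [1.0, 0.0, 0.0]

guided = guide(start_image, prompt_vec)
print(f"similarity before: {cosine(start_image, prompt_vec):.3f}")
print(f"similarity after:  {cosine(guided, prompt_vec):.3f}")
```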
AI text-to-image generator tools
Jasper Art
Creator: Jasper - Founded in 2015, Jasper is a Y Combinator-backed startup developing AI-powered productivity tools.
AI model used: DALL-E 2
First published: August 2022
Jasper users can generate images from either text or images: they enter a written prompt or an image, select a style, and the system returns a series of images.
Jasper Art can be accessed via a user’s dashboard and supports 29 languages. Jasper bills its Art tool as a way for users to “ditch stock photos.”
Access Jasper Art: https://www.jasper.ai/tools/ai-image-generator
Craiyon
Creator: Craiyon - What started as an independent research project by Boris Dayma turned into a popular image generator.
First published: April 2022
Formerly known as DALL-E Mini, Craiyon is designed as a lightweight alternative to the other text-to-image models on this list.
Craiyon is a free-to-use tool for non-commercial purposes. To use the model for commercial use cases, paid subscriptions are available. Users on the premium tiers have access to shorter wait times for their generations. Craiyon also relies on ads to pay for its servers.
Access Craiyon: https://www.craiyon.com/
NightCafe
AI models used: Stable Diffusion, Coherent (CLIP-Guided Diffusion), Artistic (VQGAN+CLIP), Style Transfer
First published: November 2019
NightCafe can be used to generate AI art using natural language prompts. Users have to create an account in order to access the tool.
NightCafe shot to fame through its use of the VQGAN+CLIP text-to-image art generation method.
As of October 2022, more than 35 million AI-generated artworks had been created on the NightCafe platform.
The NightCafe name is an apparent nod to Vincent van Gogh’s painting The Night Café.
Access NightCafe: https://nightcafe.studio/
Wombo Dream
Creator: Wombo - Canadian AI company whose name comes from the term ‘wombo combo’ from the Super Smash Bros. video game.
AI model used: VQGAN+CLIP
First published: February 2021
Current version: 3.5.0
Turn words into images using your phone: Wombo Dream is a text-to-image tool available as a mobile app.
Its AI generator lets users create and share images. According to its App Store description, Wombo Dream has more than 140 million app installs.
Wombo offers in-app purchases covering monthly and yearly subscriptions.
Access the Wombo app on Apple’s App Store: https://apps.apple.com/in/app/wombo-dream-ai-art-generator/id1586366816
Access the Wombo app on the Google Play store: https://play.google.com/store/apps/details?id=com.womboai.wombodream
Wonder
First published: June 2022
Current version: 3.2.0
Wonder is an app-based AI image generator that lets smartphone users create artwork and images from natural language prompts.
Images generated in the app can be shared to social media. Wonder offers premium subscriptions that increase the number of images users can generate.
Access Wonder on the Google Play Store: https://play.google.com/store/apps/details?id=com.codeway.wonder&hl=en&gl=US
Access the Wonder app on Apple’s App Store: https://apps.apple.com/us/app/wonder-ai-art-generator/id1621278575