November 29, 2022
Stable Diffusion 2.0 was released as open-source software last week. It includes text-to-image models trained using a new text encoder, OpenCLIP, which was developed by LAION with support from Stability. OpenCLIP is designed to improve the quality of generated images compared with the original version of Stable Diffusion.
Stable Diffusion 2.0 can generate images at a default resolution of 512x512 pixels – the same as the previous iteration – but also at 768x768 pixels. Users will likely turn to external tools, such as chaiNNer or TinyWow, to upscale images further.
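The 2.0 release also ships its own 4x latent upscaler (the "higher resolution upscaling capabilities" Stability refers to below). As a minimal sketch, here is how that upscaler can be driven from Hugging Face's diffusers library – an assumption on tooling, since the article names no library; the heavy imports are kept inside the function because they pull in large dependencies and download model weights:

```python
def upscale_4x(image, prompt,
               model_id="stabilityai/stable-diffusion-x4-upscaler"):
    """Upscale a PIL image 4x with Stability's diffusion upscaler.

    Requires `pip install diffusers transformers torch` and a GPU;
    imports are deferred so this module loads without them.
    """
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    # The upscaler is text-conditioned: the prompt describes the image
    # content so the model can hallucinate plausible fine detail.
    return pipe(prompt=prompt, image=image).images[0]


if __name__ == "__main__":
    from PIL import Image

    # "low_res.png" is a hypothetical input file for illustration.
    low_res = Image.open("low_res.png").convert("RGB")
    upscale_4x(low_res, "a photo of a cat").save("upscaled.png")
```

A 512x512 input comes back at 2048x2048, which is how pipelines chain a 768x768 base generation into print-ready output.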
A depth-guided Stable Diffusion model, depth2img, was also added in Stable Diffusion 2.0. It infers the depth of an input image and uses that depth map to generate new images that retain the basic shape and spatial layout of the original, enabling structure-preserving image-to-image translation and shape-conditional image synthesis.
Stable Diffusion 2.0 also has a new text-guided inpainting model, meaning users can quickly swap out parts of an image.
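In practice, text-guided inpainting takes the original image, a mask marking the region to replace, and a prompt describing what should fill it. A minimal sketch, assuming Hugging Face's diffusers library as before:

```python
def inpaint(image, mask, prompt,
            model_id="stabilityai/stable-diffusion-2-inpainting"):
    """Replace the masked (white) region of `image` with content
    described by `prompt`. Requires diffusers, transformers, torch,
    and a GPU; imports are deferred so this module loads without them.
    """
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    # White pixels in `mask` are regenerated from the prompt;
    # black pixels are kept from the original image.
    return pipe(prompt=prompt, image=image, mask_image=mask).images[0]


if __name__ == "__main__":
    from PIL import Image

    # "room.png" and "sofa_mask.png" are hypothetical files for illustration.
    img = Image.open("room.png").convert("RGB")
    mask = Image.open("sofa_mask.png").convert("RGB")
    inpaint(img, mask, "a red leather armchair").save("edited.png")
```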
“Just like the first iteration of Stable Diffusion, we’ve worked hard to optimize the model to run on a single GPU–we wanted to make it accessible to as many people as possible from the very start,” Stability said upon announcement.
“This new release, along with its powerful new features like depth2img and higher resolution upscaling capabilities, will serve as the foundation of countless applications and enable an explosion of new creative potential.”
Stability's new Stable Diffusion release comes hot on the heels of the company securing $101 million in new funding from backers including Coatue, Lightspeed Venture Partners and O'Shaughnessy Ventures. Before releasing Stable Diffusion 2.0, the startup said it wanted to develop open AI models for language, audio and video for both consumer and enterprise use cases.