New iteration trained on nearly 10 million images
Nvidia has unveiled an updated version of its GauGAN model, GauGAN2.
GauGAN was a Microsoft Paint-style platform that let users sketch rough landscapes, which the model then turned into photorealistic images.
The new GauGAN2 can generate images using only text.
Users can type phrases like ‘winter,’ ‘foggy’ or ‘rainbow’ and the AI model can produce images that match the desired descriptors.
“With the press of a button, users can generate a segmentation map, a high-level outline that shows the location of objects in the scene,” according to the Nvidia announcement.
“From there, they can switch to drawing, tweaking the scene with rough sketches using labels like sky, tree, rock and river, allowing the smart paintbrush to incorporate these doodles into stunning images.”
GauGAN2: Paint me a picture
The original GauGAN dates back to 2019. It was trained on public images from the platform Flickr.
The first version was trained on just over 1 million pictures. GauGAN2, however, was trained on 10 times that number and can understand natural-language descriptions of landscapes.
The first iteration of GauGAN was repackaged as Nvidia Canvas, a free app in beta for any RTX GPU user.
The company’s newly released demo of GauGAN2 is “one of the first to combine multiple modalities (text, semantic segmentation, sketch and style) within a single GAN framework,” Nvidia said.
“This makes it faster and easier to turn an artist’s vision into a high-quality AI-generated image.”
The announcement does not mention any commercialization plans or say whether GauGAN2 will be integrated with Canvas, stating only that the demo “illustrates the future possibilities for powerful image-generation tools for artists.”
Nvidia’s GauGAN2 comes shortly after a string of unveilings at its recent GTC event.
There, it showed off the Jetson AGX Orin, a small yet powerful compute module for AI workloads; Riva Custom Voice, a software platform that can create ‘human-like’ voices; and Omniverse Avatar, a platform for creating interactive three-dimensional representations of people.