Google’s DreamFusion uses AI to generate 3D models from text

Latest in generative AI requires no training on 3D model data

Ben Wodecki

October 5, 2022


Researchers from Google are the latest to unveil a generative AI tool capable of turning text prompts into digital 3D representations.

Dubbed DreamFusion, the AI-powered tool can generate 3D models from text inputs.

DreamFusion is an expanded version of Dream Fields, a generative 3D system Google unveiled back in 2021. This latest release, however, requires no 3D training data – meaning DreamFusion can generate 3D representations of objects without ever being trained on 3D examples.

Instead, the system uses 2D images of an object generated by the Imagen text-to-image diffusion model to understand different perspectives of the model it is trying to generate.

According to Google’s AI researchers, the resulting 3D model “can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment.”

“Given a caption, DreamFusion generates relightable 3D objects with high-fidelity appearance, depth, and normal,” according to a breakdown of the project.

DreamFusion: How does it work?

Google’s team proposed the concept of Score Distillation Sampling (SDS) – a way of generating samples from a diffusion model by optimizing a loss function.

“SDS allows us to optimize samples in an arbitrary parameter space, such as a 3D space, as long as we can map back to images differentiably,” they explained.
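To make the idea concrete, here is a minimal sketch of an SDS-style update in NumPy. It is illustrative only, not Google's implementation: `toy_denoiser` stands in for a pretrained diffusion model like Imagen, and the "renderer" is the identity map rather than a real differentiable 3D renderer.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, t):
    # Stand-in for a pretrained diffusion model's noise prediction
    # (Imagen, in DreamFusion's case). Here it just scales the input.
    return x_t * t

def sds_gradient(theta, render, t=0.5, w=1.0):
    """One SDS-style gradient estimate.

    theta  : parameters of the scene being optimized
    render : differentiable map from parameters to an "image"
    """
    x = render(theta)                            # differentiable rendering
    eps = rng.standard_normal(x.shape)           # sampled noise
    x_t = np.sqrt(1 - t) * x + np.sqrt(t) * eps  # noised render
    eps_hat = toy_denoiser(x_t, t)               # model's noise prediction
    # SDS: a weighted (predicted - sampled) noise residual, pushed back
    # through the renderer. With the identity renderer used below, the
    # renderer's Jacobian is the identity, so the residual is the gradient.
    return w * (eps_hat - eps)

theta = rng.standard_normal(4)
for _ in range(100):
    theta -= 0.1 * sds_gradient(theta, render=lambda p: p)
```

The key point the paper makes is in the `render` argument: because the loss only needs a differentiable map from parameters to images, `theta` can parameterize a 3D scene rather than pixels directly.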

Google’s researchers then used a 3D scene parameterization similar to Neural Radiance Fields, or NeRFs, to define the differentiable mapping of a model.

“SDS alone produces reasonable scene appearance, but DreamFusion adds additional regularizers and optimization strategies to improve geometry. The resulting trained NeRFs are coherent, with high-quality normals, surface geometry and depth, and are relightable with a Lambertian shading model.”
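The differentiable mapping at the heart of a NeRF-style parameterization is volume rendering: densities and colors sampled along a camera ray are alpha-composited into a pixel color. A minimal sketch of that compositing step, in NumPy (simplified; real NeRFs query a neural network for the densities and colors):

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, NeRF-style.

    densities : (N,) volume density at each sample point
    colors    : (N, 3) RGB color at each sample point
    deltas    : (N,) distance between consecutive samples
    """
    alphas = 1.0 - np.exp(-densities * deltas)    # opacity of each segment
    trans = np.cumprod(1.0 - alphas)              # transmittance past each segment
    trans = np.concatenate([[1.0], trans[:-1]])   # light reaching each sample
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)  # final pixel color

# An effectively opaque red sample in front of a green one yields red.
pixel = composite_ray(
    densities=np.array([1e9, 1e9]),
    colors=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    deltas=np.array([1.0, 1.0]),
)
```

Every operation here is differentiable, which is what lets the SDS loss flow from rendered pixels back to the scene parameters.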

Here’s a breakdown:

Step 1) Type in your prompt. The example Google offered was ‘a DSLR photo of a peacock on a surfboard.’

Figure 1:

Step 2) Apply the Imagen model to generate 2D views of the prospective object from multiple angles; these images guide the optimization of the 3D model.

Figure 2:

Step 3) Apply a 3D scene parameterization such as NeRF to optimize the 3D scene. Repeat this step to refine the result.

Figure 3:

Step 4) The result is a 3D representation of a peacock on a surfboard. You can now export this as a mesh – using the file formats STL or PLY – for use in another scene or project.

Figure 4:
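As a taste of what the export in step 4 involves, here is a minimal ASCII PLY writer in pure Python. This is a generic illustration of the PLY format mentioned above, not DreamFusion's exporter:

```python
def write_ply(path, vertices, faces):
    """Write a triangle mesh to an ASCII PLY file.

    vertices : list of (x, y, z) floats
    faces    : list of (i, j, k) vertex indices
    """
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(vertices)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write(f"element face {len(faces)}\n")
        f.write("property list uchar int vertex_indices\n")
        f.write("end_header\n")
        for x, y, z in vertices:
            f.write(f"{x} {y} {z}\n")
        for i, j, k in faces:
            f.write(f"3 {i} {j} {k}\n")

# Smoke test: a single triangle.
write_ply("triangle.ply", [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

The resulting file can be opened in any mesh viewer or 3D suite that reads PLY, such as Blender or MeshLab.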

For a more in-depth explainer, Google’s paper outlining DreamFusion is available via arXiv.

More ways to generate peacocks on surfboards

DreamFusion follows a host of generative AI tools showcased in the past few weeks, with OpenAI’s DALL-E sparking interest in the concept of generating images from text prompts.

DALL-E was followed by other text-to-image engines, including Midjourney and Stable Diffusion, which also rose to public prominence.

That interest spurred the launch of PromptBase, an online marketplace where users can purchase prompts for generating desired images.

The U.S. Copyright Office even granted protection to an AI-generated work. But not everyone is enamored with these newfound artworks: several online platforms, including heavyweight Getty Images, have barred AI-generated content from their sites.

Interest in generative AI is not limited to images, either. Facebook parent Meta recently unveiled Make-A-Video, an AI system capable of generating videos from text prompts.

About the Author

Ben Wodecki

Assistant Editor
