Apple Unveils GAUDI: An AI Model that Generates 3D Scenes from Any Angle

Type ‘move further down the corridor’ and the model repositions the scene.

Ben Wodecki, Jr. Editor

August 10, 2022

GAUDI can generate new camera movements through 3D indoor scenes via text. (Image: Apple)

Apple is throwing its hat into the AI text-to-image ring with GAUDI, an AI model that generates 3D scenes from text prompts and can redraw them from any angle.

Named after Antoni Gaudí, the Spanish architect known for his whimsical designs, Apple’s model uses a camera pose decoder that predicts plausible camera positions within a scene. With those poses, the model can render the 3D scene from essentially any angle.
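To make the idea of a camera pose decoder more concrete, here is a toy sketch in Python. In GAUDI the decoder is a learned neural network; the stand-in below is an assumption for illustration only. The hypothetical `decode_pose` treats the "pose latent" as a short list of 3D waypoints and, for a normalized time `t` in [0, 1], linearly interpolates a camera position along them, returning the direction of travel as the viewing direction. Sampling many values of `t` traces a camera path through the scene.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    """Scale a 3D vector to unit length (returns it unchanged if it is zero)."""
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return v if n == 0.0 else (v[0] / n, v[1] / n, v[2] / n)


def decode_pose(pose_latent: Sequence[Vec3], t: float) -> Tuple[Vec3, Vec3]:
    """Hypothetical toy decoder: map a 'latent' (here, waypoints) and time t to a pose.

    Returns (position, forward). Not GAUDI's learned decoder: it simply
    interpolates linearly between consecutive waypoints.
    """
    if len(pose_latent) < 2:
        raise ValueError("need at least two waypoints")
    t = min(max(t, 0.0), 1.0)            # clamp time to [0, 1]
    segments = len(pose_latent) - 1
    s = t * segments                      # position measured in segment units
    i = min(int(s), segments - 1)         # index of the active segment
    local = s - i                         # fraction travelled along that segment
    a, b = pose_latent[i], pose_latent[i + 1]
    position = (
        a[0] + local * (b[0] - a[0]),
        a[1] + local * (b[1] - a[1]),
        a[2] + local * (b[2] - a[2]),
    )
    # The camera looks along the current segment's direction of travel.
    forward = _normalize((b[0] - a[0], b[1] - a[1], b[2] - a[2]))
    return position, forward


def camera_path(pose_latent: Sequence[Vec3], steps: int) -> List[Tuple[Vec3, Vec3]]:
    """Sample `steps` evenly spaced poses (steps >= 2) along the decoded trajectory."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [decode_pose(pose_latent, j / (steps - 1)) for j in range(steps)]
```

For example, `decode_pose([(0.0, 1.5, 0.0), (0.0, 1.5, 4.0), (3.0, 1.5, 4.0)], 0.25)` places the camera halfway down the first leg of a corridor-like path, at `(0.0, 1.5, 2.0)`, looking along `(0.0, 0.0, 1.0)`. A learned decoder replaces the hand-written interpolation with a network, but the interface (latent plus time in, pose out) is the point of the sketch.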

In their paper, Apple’s researchers showed GAUDI reconstructing views from interior scans of rooms at a quality they say is on par with existing 3D scene generation techniques.

GAUDI can also generate new camera paths through 3D indoor scenes from text, such as the prompt ‘go through the corridor.’

In the paper, the researchers say GAUDI “generalizes” earlier work on 3D generation, which focused on single objects, by removing the assumption that the camera pose distribution can be shared across samples.
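The assumption GAUDI drops can be illustrated with a short sketch. Object-centric 3D generators typically draw every camera, for every object, from one fixed distribution, for example a point on a sphere around the object looking at its center. The Python below is a hypothetical illustration of such a shared distribution, not code from the paper; the function name, the radius parameter, and the upper-hemisphere restriction are assumptions. Inside a room, no single distribution like this fits every scene, which is why GAUDI models camera poses per scene instead.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sample_object_centric_pose(radius: float, rng: random.Random) -> Tuple[Vec3, Vec3]:
    """Sample a camera on the upper hemisphere of a sphere, looking at the origin.

    Every sample for every object comes from this same distribution: the
    'shared camera pose distribution' assumption. (Hypothetical helper.)
    """
    if radius <= 0.0:
        raise ValueError("radius must be positive")
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    # cos(polar angle) uniform in [0, 1] gives points uniform over the hemisphere's area.
    cos_polar = rng.uniform(0.0, 1.0)
    sin_polar = math.sqrt(1.0 - cos_polar * cos_polar)
    position = (
        radius * sin_polar * math.cos(azimuth),
        radius * cos_polar,                    # y is "up"
        radius * sin_polar * math.sin(azimuth),
    )
    # Unit vector pointing from the camera straight at the object at the origin.
    forward = (-position[0] / radius, -position[1] / radius, -position[2] / radius)
    return position, forward
```

Calling `sample_object_centric_pose(2.0, random.Random(0))` repeatedly always yields cameras exactly 2 units from the origin, all aimed at the same point. That works for a chair on a turntable, but not for a walk down a corridor, where plausible camera positions depend on each room's layout.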

"We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene," the authors wrote.

Co-author Miguel Ángel Bautista, a senior research scientist at Apple, said in a tweet that GAUDI tackles “the problem of learning a generative model of 3D scenes parametrized as radiance fields.”

“Very exciting times ahead for the interplay of powerful generative models and 3D data,” he added.
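A radiance field assigns each 3D point a density and a color, and an image is formed by accumulating those values along camera rays. The Python sketch below shows the standard discrete volume-rendering rule popularized by NeRF, which radiance-field methods generally build on; it is a generic illustration, not GAUDI’s renderer, and the sample densities and colors are made-up inputs. Sample i contributes its color with weight T_i * (1 - exp(-sigma_i * delta_i)), where T_i = exp(-(sigma_1 * delta_1 + ... + sigma_(i-1) * delta_(i-1))) is the transmittance, the fraction of light that survives the samples in front of it.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(densities: Sequence[float], colors: Sequence[RGB], deltas: Sequence[float]) -> RGB:
    """Composite samples along one ray, ordered near to far, NeRF-style.

    densities[i] is sigma_i >= 0, colors[i] is the RGB color at sample i,
    and deltas[i] is the distance from sample i to the next sample.
    """
    if not (len(densities) == len(colors) == len(deltas)):
        raise ValueError("densities, colors and deltas must have the same length")
    transmittance = 1.0                        # T_i: light not yet absorbed
    out = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha             # multiply in exp(-sigma * delta)
    return (out[0], out[1], out[2])
```

Because each weight is the remaining light times that segment’s opacity, the weights along a ray sum to at most 1: empty space (zero density) renders black, and a very dense sample hides everything behind it.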

Apple has published GAUDI’s code repository on GitHub.

GAUDI, meet GauGAN and GFP-GAN

GAUDI’s application is similar to Nvidia’s GauGAN2, which generates images from text: users type a phrase like ‘winter’ and the model produces an image that matches the description.

The release of GAUDI comes after researchers from Chinese tech company Tencent published a model that can restore damaged and low-resolution pictures.

GFP-GAN combines Tencent’s own restoration model with a pre-trained StyleGAN2 model from Nvidia to fill in the missing details of an old image in seconds.


About the Author

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
