Apple unveils GAUDI: An AI model that generates 3D scenes from any angle

Type ‘move further down the corridor’ and the model repositions the scene.

Apple is throwing its hat into the AI text-to-image ring with GAUDI, an AI model that can generate 3D scenes from text prompts and redraw the scene from any angle.

Named after Antoni Gaudí, the Spanish architect known for his whimsical designs, Apple’s AI model uses a camera pose decoder to predict possible camera positions within a scene. The decoder then enables the model to render the 3D scene from essentially any angle.

Apple's team showcased GAUDI in a paper, reconstructing views of interior room scans at a quality level the researchers suggested was on par with existing 3D scene generation techniques.

GAUDI can also generate new camera movements through 3D indoor scenes via text, such as a user input to ‘go through the corridor.’



According to their paper, Apple’s researchers believe GAUDI “generalizes” previous works of 3D scene generation that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples.

"We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene," the authors wrote.

Co-author Miguel Ángel Bautista, a senior research scientist at Apple, said in a tweet that GAUDI tackles “the problem of learning a generative model of 3D scenes parametrized as radiance fields.”

“Very exciting times ahead for the interplay of powerful generative models and 3D data,” he added.
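For readers unfamiliar with the term, a radiance field is a function that returns a color and a density for any 3D point and viewing direction; an image is formed by accumulating those values along each camera ray. The sketch below illustrates that general idea only. It is not GAUDI's code: `toy_field`, `render_ray`, and the hard-coded red sphere are hypothetical stand-ins for a learned model.

```python
import math


def toy_field(point, view_dir):
    """Hypothetical radiance field: a dense red sphere of radius 1 at the origin.

    A learned model would replace this function; view_dir is unused here.
    """
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    density = 5.0 if inside else 0.0
    color = (1.0, 0.0, 0.0)
    return color, density


def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Composite color along one camera ray by sampling the field at even steps."""
    step = (far - near) / n_samples
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step
        point = tuple(o + t * d for o, d in zip(origin, direction))
        color, density = toy_field(point, direction)
        alpha = 1.0 - math.exp(-density * step)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return tuple(pixel)


# Two rays from different camera positions: one passes through the sphere,
# the other misses it entirely.
hit = render_ray(origin=(0.0, 0.0, -2.0), direction=(0.0, 0.0, 1.0))
miss = render_ray(origin=(3.0, 0.0, -2.0), direction=(0.0, 0.0, 1.0))
```

Changing `origin` and `direction` corresponds to moving the camera, which is why a scene stored this way can in principle be viewed from any angle.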

Apple published GAUDI’s repository to GitHub.

GAUDI, meet GauGAN and GFP-GAN

The model’s application is similar to GauGAN2, developed by Nvidia. GauGAN2 generates images from text: users type a phrase such as ‘winter’ and the model produces images that match the description.

The release of GAUDI comes after researchers from Chinese tech company Tencent published a model that can restore damaged and low-resolution pictures.

GFP-GAN uses a combination of a proprietary model and a pre-trained StyleGAN-2 model from Nvidia to effectively fill in the missing elements of an old image in seconds.
