This AI model can turn still images into detailed 3D environments

COLMAP is free to use, requires just two images, and improves on classic approaches

Ben Wodecki

October 26, 2021

2 Min Read

It’s now possible to turn still images into exportable 3D environments using an AI algorithm.

COLMAP is a general-purpose, open source Structure-from-Motion (SfM) and Multi-View Stereo (MVS) 3D reconstruction pipeline with a graphical and command-line interface.

Capable of running on Windows, Mac, or Linux, the system offers a wide range of features for reconstructing ordered and unordered image collections.

Linux users are advised to enable CUDA support, which requires an Nvidia GPU.

COLMAPping the future

Using an AI system to recreate environments in 3D isn’t new – software firm Matterport’s entire business model revolves around creating digital twins of buildings.

And Nvidia is working on Vid2Vid Cameo – an AI model capable of creating realistic videos of a person from a single photo.

But what sets COLMAP apart is that it’s free, available to download under the BSD license via GitHub.

The technology could see potential uses in creating virtual reality experiences and visual effects.

COLMAP users can export a 3D mesh – although to refine it, programs like MeshLab would be required.

The algorithm is based on a paper titled ‘Structure-from-Motion Revisited,’ written by Johannes L. Schönberger of ETH Zurich and Jan-Michael Frahm of the University of North Carolina.

Schönberger is a principal scientist at the Microsoft Mixed Reality & AI lab, while Frahm, who was Schönberger’s PhD adviser, is a research scientist manager at Facebook.

“While incremental reconstruction systems have tremendously advanced in all regards, robustness, accuracy, completeness, and scalability remain the key problems towards building a truly general-purpose pipeline,” the pair wrote.

“We propose a new SfM technique that improves upon the state of the art to make a further step towards this ultimate goal.”

The system works by analyzing overlapping still images, estimating where each photo was taken, and building a 3D model from which the scene can be visualized from any viewpoint.

To capture images for use with COLMAP, a digital camera is recommended. The images do not all have to come from a single camera, however: differences in resolution or image size are acceptable.

While it’s possible to create a reconstruction from just two images, more data produces a higher-resolution result.
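For readers who want to try it, COLMAP ships a command-line interface alongside the GUI. Below is a minimal sketch of the standard sparse-then-dense workflow on a folder of photos; the paths are placeholders, and exact options may vary between COLMAP versions:

```shell
# Sparse reconstruction (Structure-from-Motion)
colmap feature_extractor --database_path db.db --image_path images
colmap exhaustive_matcher --database_path db.db
mkdir -p sparse
colmap mapper --database_path db.db --image_path images --output_path sparse

# Dense reconstruction (Multi-View Stereo) -- requires CUDA
colmap image_undistorter --image_path images --input_path sparse/0 --output_path dense
colmap patch_match_stereo --workspace_path dense
colmap stereo_fusion --workspace_path dense --output_path dense/fused.ply

# Turn the fused point cloud into an exportable mesh
colmap poisson_mesher --input_path dense/fused.ply --output_path dense/meshed.ply
```

The resulting PLY mesh can then be opened in a tool such as MeshLab for cleanup.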

In one cited example with 297 images, COLMAP completed the colored mesh reconstruction process in 15 hours. Comparatively, the existing CMPMVS application took over 20 hours.

The results from the test suggest that the COLMAP pipeline is "more comprehensive since it takes [an] image input and generates sparse/dense/mesh results.”

The system is also better at smoothing out featureless surfaces, the researchers said.

Within COLMAP itself, however, you can only work with the dense point cloud viewport; cleaning up and refining the exported mesh requires other programs like MeshLab.

