Facebook has introduced a system that turns ordinary two-dimensional pictures into 3D photos. The method, first published at this month’s SIGGRAPH 2020 virtual computer graphics conference, takes a single-shot image as input and runs directly on a mobile device. Although 3D photo creation is not new on today’s high-end smartphones, the proposed system is designed to work even on low-end mobile phones and without an Internet connection.

With a single-shot picture as input, the system estimates the depth of the scene and the content of parallax regions using learning-based methods. It does this in four stages:

Stage 1: Depth Estimation. The researchers proposed a new architecture, Tiefenrausch, with three improvements:

Efficient block structure that is fast on mobile devices

New network design that balances accuracy, latency, and model size using a neural architecture search algorithm

Reduced model size and latency through 8-bit quantization
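
To illustrate the third point, here is a minimal sketch of generic 8-bit affine quantization in Python. It is not the scheme used in Tiefenrausch, whose details the article does not give; the function names and the scale and zero-point formulation are assumptions chosen for illustration.

```python
def quantize_uint8(values):
    """Map floats to integers in [0, 255] using an affine scale and zero point.

    Illustrative only: a generic scheme, not the one used by the researchers.
    """
    lo, hi = min(values), max(values)
    # Widen the range to include zero so that 0.0 is represented exactly.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-lo / scale)
    quantized = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the 8-bit codes."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, zero_point = quantize_uint8(weights)
restored = dequantize(codes, scale, zero_point)
```

Each weight is stored in one byte instead of four, which is where the saving in model size comes from; the cost is a rounding error of at most about one quantization step per value.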

Stage 2: Layer Generation. Depth discontinuities were handled by grouping them into curve-like features (colour-coded, (a) in the above illustration) and inferring spatial constraints to better shape their growth (dashed lines, see above). The pixels were lifted onto a layered depth image (LDI). The researchers then synthesized new geometry by running an expansion algorithm for 50 iterations, obtaining a multi-layered LDI with sufficient overlap for displaying with parallax.
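
The idea of lifting pixels onto an LDI and cutting it at depth discontinuities can be sketched as follows. This is a simplified single-layer illustration, not the researchers' algorithm: the function name, the dictionary layout, and the threshold value are all hypothetical, and the curve grouping and expansion steps are omitted.

```python
def lift_to_ldi(depth, threshold=0.5):
    """Build a single-layer LDI from a depth map given as a list of rows.

    Each pixel keeps links to its 4-neighbours, except where the depth
    difference exceeds `threshold` (a hypothetical value), which is
    treated here as a depth discontinuity.
    """
    rows, cols = len(depth), len(depth[0])
    ldi = {}
    for r in range(rows):
        for c in range(cols):
            links = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and abs(depth[nr][nc] - depth[r][c]) <= threshold:
                    links.append((nr, nc))
            ldi[(r, c)] = {"depth": depth[r][c], "neighbours": links}
    return ldi


# A near surface (depth 1.0) next to a far surface (depth 5.0).
depth_map = [
    [1.0, 1.0, 5.0],
    [1.0, 1.0, 5.0],
]
ldi = lift_to_ldi(depth_map)
```

In this toy example the pixels in the last column stay connected to each other but not to the near surface, so the two surfaces can later be extended independently.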

Stage 3: Colour Inpainting. The researchers inpainted on the LDI structure by traversing the connections of LDI pixels to aggregate a local neighbourhood around a pixel, which allowed them to train a network in 2D and then use the pretrained weights for LDI inpainting. They created a new architecture, Farbrausch, to optimize the inpainting network to a mobile-friendly size.
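
One way to picture the neighbourhood aggregation is as a bounded traversal over the LDI's pixel connections. The sketch below uses a breadth-first search limited to a fixed number of hops; this is an assumption made for illustration, and the function name and the `connections` layout are hypothetical rather than taken from the paper.

```python
from collections import deque


def local_neighbourhood(connections, start, max_hops):
    """Collect pixels reachable from `start` within `max_hops` connections.

    `connections` maps each LDI pixel to the pixels it is linked to.
    Returns a dict of pixel -> hop distance. Pixels that are not linked,
    for example across a depth discontinuity, are never reached.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        pixel = queue.popleft()
        if distance[pixel] == max_hops:
            continue  # do not expand beyond the requested radius
        for neighbour in connections.get(pixel, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[pixel] + 1
                queue.append(neighbour)
    return distance


# A chain a-b-c-d, plus a pixel "e" on another layer with no link to it.
connections = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": [],
}
patch = local_neighbourhood(connections, "a", max_hops=2)
```

Because the traversal follows connections rather than image coordinates, the gathered neighbourhood can be laid out like an ordinary 2D patch, which is what makes it plausible to reuse weights trained on 2D images.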

Stage 4: Meshing. A custom algorithm constructs a simplified 3D triangle mesh directly. It exploits the 2.5D structure of the representation by operating in the 2D texture atlas domain: simplifying and triangulating the chart polygons first in 2D, then later lifting them to 3D.
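
The final "lifting" step can be illustrated with a standard pinhole camera unprojection. This is a sketch under stated assumptions, not the researchers' code: the function name and the camera parameters (`focal`, `cx`, `cy`) are hypothetical, and the triangulation is taken as already computed in 2D.

```python
def lift_vertices(vertices_2d, depths, focal, cx, cy):
    """Unproject 2D vertices to 3D using per-vertex depth.

    Assumes a pinhole camera with focal length `focal` (in pixels) and
    principal point (`cx`, `cy`); these parameters are illustrative.
    """
    return [
        ((u - cx) * z / focal, (v - cy) * z / focal, z)
        for (u, v), z in zip(vertices_2d, depths)
    ]


# One triangle, triangulated in 2D, stored as indices into the vertex list.
vertices_2d = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
depths = [4.0, 4.0, 4.0]
triangles = [(0, 1, 2)]

vertices_3d = lift_vertices(vertices_2d, depths, focal=2.0, cx=1.0, cy=1.0)
```

The triangle indices are unchanged by the lift; only the vertex positions gain a third coordinate, which is why the simplification and triangulation can be done entirely in 2D.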

Altogether, the processing takes just a few seconds, even on offline low-end mobile devices. In experiments, the method showed performance and accuracy comparable to current state-of-the-art 3D image generation approaches.



The paper One Shot 3D Photography is on arXiv. The code is available on GitHub.

Analyst: Reina Qi Wan | Editor: Michael Sarazen; Fangyu Cai

