The development of neural radiance fields (NeRFs) opened a new research avenue for 3D modelling and for photometric and geometric reconstruction. Although the results have been promising, 3D reconstruction models still typically require several 2D input images taken from different viewpoints before they can confidently predict an object’s 3D shape. Is there a way to mitigate this data requirement?
In the new paper RealFusion: 360° Reconstruction of Any Object from a Single Image, an Oxford University research team leverages a diffusion model to generate 360° reconstructions of objects from a single image. Their RealFusion approach achieves state-of-the-art performance on monocular 3D reconstruction benchmarks.
The team summarizes their main contributions as follows:
- We propose RealFusion, a method that can extract from a single image of an object a 360° photographic 3D reconstruction without assumptions on the type of object imaged or 3D supervision of any kind.
- We do so by leveraging an existing 2D diffusion image generator via a new single image variant of textual inversion.
- We also introduce new regularizers and provide an efficient implementation using InstantNGP.
- We demonstrate state-of-the-art reconstruction results on a number of in-the-wild images and images from existing datasets when compared to alternative approaches.
This work is based on the premise that pretrained diffusion models such as Stable Diffusion have learned high-quality priors — understandings of the real world and how (3D) objects are represented in 2D images — and that these priors can be conditioned to “dream up” or sample images that may plausibly constitute other views of a given object.
The team’s approach for reconstructing a 3D model of an object from a single image leverages such priors to supply missing information rather than obtaining this information through multi-view training data. The team optimizes their radiance field with two simultaneous objectives: 1) a reconstruction objective from a fixed viewpoint, and 2) an SDS-based (Score Distillation Sampling) prior objective based on novel generated views.
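The two objectives can be written compactly as a single weighted loss. The notation here is an assumption introduced for illustration and is not taken from the paper: θ denotes the radiance field parameters, R(θ, π) the image rendered from viewpoint π, I the input image, π₀ the fixed input viewpoint, and λ a weighting factor. The gradient of the prior term is shown in the general score distillation form popularized by DreamFusion, where ε̂ is the diffusion model's noise prediction for the noised rendering x_t under conditioning y, and w(t) is a timestep weighting.

```latex
\mathcal{L}(\theta)
  = \underbrace{\lVert R(\theta, \pi_0) - I \rVert^2}_{\text{reconstruction, fixed view}}
  + \lambda \, \underbrace{\mathcal{L}_{\mathrm{SDS}}(\theta)}_{\text{prior, novel views}},
\qquad
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon,\,\pi}\!\left[
      w(t)\,\bigl(\hat{\epsilon}(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial R(\theta, \pi)}{\partial \theta}
    \right]
```

The reconstruction term is evaluated only at the input viewpoint, while the expectation in the prior term runs over randomly sampled novel viewpoints.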
The reconstruction objective ensures that the radiance field resembles the input image when rendered from a specific, fixed viewpoint. The prior objective, meanwhile, encourages renderings from novel viewpoints to agree with what the diffusion model considers plausible views of the object. The team also introduces new regularizers to encourage their radiance field geometry to have smooth normals.
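As a rough illustration of how such terms might be combined, the following minimal sketch computes a reconstruction loss, a simple normal-smoothness penalty, and their weighted sum. It is not the paper's implementation: the function names, the weights `w_prior` and `w_normal`, and the finite-difference form of the smoothness penalty are all assumptions, and the diffusion prior is represented only by a placeholder scalar `prior_loss`.

```python
import numpy as np


def reconstruction_loss(rendered, target):
    """Mean squared error between the fixed-view rendering and the input image."""
    return float(np.mean((rendered - target) ** 2))


def normal_smoothness(normals):
    """Penalize differences between neighbouring normals.

    `normals` is an (H, W, 3) array of rendered surface normals.
    This finite-difference penalty is an illustrative stand-in,
    not the regularizer used in the paper.
    """
    dx = normals[:, 1:, :] - normals[:, :-1, :]   # horizontal neighbours
    dy = normals[1:, :, :] - normals[:-1, :, :]   # vertical neighbours
    return float(np.mean(dx ** 2) + np.mean(dy ** 2))


def total_loss(rendered, target, normals, prior_loss,
               w_prior=1.0, w_normal=0.1):
    """Weighted sum of the three terms; weights are hypothetical."""
    return (reconstruction_loss(rendered, target)
            + w_prior * prior_loss
            + w_normal * normal_smoothness(normals))


if __name__ == "__main__":
    image = np.ones((4, 4, 3))
    flat = np.zeros((4, 4, 3))
    flat[..., 2] = 1.0  # every normal points along +z
    print(total_loss(image, image, flat, prior_loss=0.5))
```

In a real pipeline the prior term would come from the diffusion model's score on renderings at randomly sampled viewpoints, and all three terms would be differentiated with respect to the radiance field parameters rather than evaluated on fixed arrays.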
In their empirical study, the team compared the proposed RealFusion approach against established methods for monocular 3D reconstruction of objects. The results show that RealFusion’s 3D reconstructions have higher quality and more plausible shape, appearance, and extrapolation characteristics.
The paper RealFusion: 360° Reconstruction of Any Object from a Single Image is on arXiv.
Author: Hecate He | Editor: Michael Sarazen