Recent developments in neural rendering such as neural radiance fields (NeRFs) have enabled photo-realistic reconstruction and novel view synthesis from large sets of camera images. Existing work, however, has focused on small-scale, object-centric reconstruction, as scaling up to city-scale environments results in problematic artifacts and low visual fidelity due to limited model capacity.
In the new paper Block-NeRF: Scalable Large Scene Neural View Synthesis, a team from UC Berkeley, Waymo and Google Research takes NeRFs to the next level, proposing the grid-based Block-NeRF variant for representing much larger environments. The team demonstrates Block-NeRF’s capabilities by rendering an entire San Francisco neighbourhood from some 2.8 million images — the largest neural scene representation to date.
Reconstructing city-scale environments is crucial for high-impact use cases such as autonomous driving and aerial surveying. Yet the task introduces many challenges: transient objects (cars and pedestrians), varying weather and lighting conditions, and constraints on model capacity, memory and compute. Moreover, it is highly unlikely that the training data for such large environments could be collected in a single capture under consistent conditions.
To address these issues, the team proposes splitting large environments into a set of Block-NeRFs that can be independently trained in parallel then rendered and combined dynamically at inference time. This enables the method to expand the environment with additional Block-NeRFs or update existing blocks without retraining the entire environment.
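The dynamic combination of independently trained blocks can be illustrated with a simple inverse-distance interpolation over per-block renderings. This is a hypothetical NumPy sketch, not the paper's exact procedure (Block-NeRF additionally filters blocks using a learned visibility prediction before interpolating); the function names and the `power` parameter are illustrative assumptions.

```python
import numpy as np

def composite_renders(query_pos, block_centers, block_rgbs, power=4.0):
    """Blend per-block renderings with inverse-distance weights.

    query_pos:     (D,)        camera position for the novel view
    block_centers: (B, D)      origin of each Block-NeRF
    block_rgbs:    (B, H, W, 3) image rendered by each block for this view

    Hypothetical sketch: the real method also discards blocks whose
    predicted visibility of the target view is too low.
    """
    d = np.linalg.norm(block_centers - query_pos, axis=1)  # (B,) distances
    w = d ** -power                                        # nearer blocks dominate
    w = w / w.sum()                                        # normalize to sum to 1
    return np.tensordot(w, block_rgbs, axes=1)             # weighted image blend
```

Because each block is rendered and weighted independently, a single block can be retrained or a new block appended without touching the rest of the scene, which is the property the paragraph above describes.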
Block-NeRF builds upon NeRF and the recently introduced mip-NeRF extension, a multiscale representation for anti-aliasing neural radiance fields that reduces the aliasing artifacts that hurt NeRF performance when the input images observe the scene from widely different distances. The team also incorporates techniques from NeRF in the Wild (NeRF-W), which handled inconsistent scene appearance when applying NeRF to landmarks from the Photo Tourism dataset. The proposed Block-NeRF can thus combine many NeRFs to reconstruct a coherent large environment from millions of images.
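The NeRF-W idea referenced above is to give each training image a learned appearance embedding that conditions the color prediction, so one scene can be rendered under different lighting and weather. The sketch below shows only the input construction for such a conditioned color branch; the embedding dimension, table initialization, and function names are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(x, n_freqs=4):
    """NeRF-style sinusoidal encoding of 3D points, shape (..., 3*2*n_freqs)."""
    freqs = 2.0 ** np.arange(n_freqs)
    angles = x[..., None] * freqs                                  # (..., 3, n_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

# One learnable appearance vector per training image (NeRF-W style);
# the sizes here are arbitrary illustration values.
n_images, app_dim = 5, 8
appearance_table = rng.normal(size=(n_images, app_dim))

def color_branch_input(points, image_ids):
    """Concatenate encoded positions with each sample's per-image appearance
    code; conditioning color (but not geometry) on this code is what lets
    the same scene render under different lighting conditions."""
    enc = positional_encoding(points)          # (..., 24) with n_freqs=4
    app = appearance_table[image_ids]          # (..., app_dim)
    return np.concatenate([enc, app], axis=-1)
```

At inference time the appearance code becomes a free knob: rendering the same viewpoint with different codes produces the same geometry under different appearance conditions.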
In their empirical study, the team used San Francisco’s Alamo Square neighbourhood as the target area and the city’s Mission Bay District for baseline comparisons. Their training dataset was derived from 13.4 hours of driving time sourced from 1,330 different data collection runs, for a total of 2,818,745 training images.
The results show that splitting a city-scale scene into multiple lower-capacity models can reduce overall computational cost, and that Block-NeRF effectively handles transient objects by masking them out during training with a segmentation algorithm. The team hopes their work can inspire future research in large-scale scene reconstruction using modern neural rendering methods.
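The transient-object handling amounts to excluding segmented pixels from the training loss. Here is a minimal sketch of that idea, assuming a semantic segmentation map is available per image; the class IDs and function name are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical label IDs for movable classes (e.g. person, car) from
# whatever segmentation model produced the masks.
MOVABLE_CLASSES = [11, 12]

def masked_photometric_loss(pred, target, seg):
    """Mean squared error over pixels NOT covered by movable objects.

    pred, target: (H, W, 3) rendered and ground-truth images
    seg:          (H, W)    integer semantic labels per pixel

    Pixels labeled as movable objects contribute nothing to the loss,
    so the model is never penalized for failing to reproduce cars or
    pedestrians that were only present in some captures.
    """
    keep = ~np.isin(seg, MOVABLE_CLASSES)   # (H, W) boolean mask
    diff = (pred - target) ** 2
    return diff[keep].mean()
```

Because the masked pixels simply drop out of the objective, no special architecture is needed; the reconstruction is supervised only by the static parts of the scene.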
The paper Block-NeRF: Scalable Large Scene Neural View Synthesis is on arXiv.
Author: Hecate He | Editor: Michael Sarazen