View synthesis is a computer vision technique that aims to recover a 3D scene representation to enable rendering of photorealistic images of the scene from unobserved viewpoints. View synthesis has recently seen impressive progress via the use of neural volumetric representations such as Neural Radiance Fields (NeRF).
Despite NeRF’s success, its rendering procedure is slow and compute-heavy, which limits its use for interactive view synthesis and makes it impossible to display a recovered 3D model in a normal web browser. To address this issue, a team from Google Research has proposed an approach to accelerate NeRF’s rendering procedure, enabling it to work in real-time while retaining its ability to represent fine geometric details and convincing view-dependent effects.
NeRF uses a multilayer perceptron (MLP) that represents a scene as a continuous neural volumetric function by mapping from a 3D coordinate to the volume density and view-dependent emitted radiance at that position. Recent research has explored strategies to improve the efficiency of NeRF, and one of the most effective approaches has been discretized volumetric representations. The Google researchers extend this approach with a deferred neural rendering technique for modelling view-dependent effects that enables the visualization of trained NeRF models in real-time on commodity hardware with minimal quality degradation.
The researchers reformulate NeRF in three ways:
- Limit the computation of view-dependent effects to a single network evaluation per ray.
- Introduce a small bottleneck in the network architecture that can be efficiently stored as 8-bit integers.
- Introduce a sparsity loss during training, which concentrates the opacity field around surfaces in the scene.
In the original NeRF, the MLP predicts a 256-dimensional feature vector for each input 3D location, then combines that feature vector with the viewing direction and decodes the result into an RGB colour. Because this requires evaluating an MLP at every sample along a ray to estimate the view-dependent colour, it is prohibitively expensive for real-time rendering. The proposed deferred NeRF architecture instead restructures NeRF to output a diffuse RGB colour and a 4-dimensional feature vector at each sample, both passed through a sigmoid function so that their values are bounded and can be compressed. The diffuse colours and feature vectors are accumulated along each ray, so the view-dependence MLP only needs to be evaluated once per pixel rather than once per sample.
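The deferred scheme can be illustrated with a minimal sketch in plain Python. It assumes standard volume-rendering weights and assumes the per-pixel network returns a view-dependent residual that is added to the accumulated diffuse colour. The function names, the sample values and `toy_shader` (a made-up stand-in for the small view-dependence MLP) are hypothetical and not taken from the paper's code.

```python
import math

def composite_weights(densities, deltas):
    """Volume-rendering weights w_k = T_k * (1 - exp(-sigma_k * delta_k))."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def accumulate(weights, vectors):
    """Weighted sum of per-sample vectors along one ray."""
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]

def toy_shader(diffuse, features, view_dir):
    """Hypothetical stand-in for the view-dependence MLP (weights are made up)."""
    x = diffuse + features + view_dir  # concatenate the three input lists
    return [0.05 * math.tanh(sum(x) + c) for c in range(3)]

def render_pixel_deferred(densities, deltas, diffuse_rgb, features, view_dir,
                          shader=toy_shader):
    weights = composite_weights(densities, deltas)
    rgb = accumulate(weights, diffuse_rgb)   # accumulated diffuse colour
    feat = accumulate(weights, features)     # accumulated 4-D feature vector
    residual = shader(rgb, feat, view_dir)   # the only network call for this ray
    return [min(1.0, max(0.0, c + r)) for c, r in zip(rgb, residual)]

# Three samples along one ray: empty space first, then a surface.
densities = [0.0, 5.0, 50.0]
deltas = [0.1, 0.1, 0.1]
diffuse_rgb = [[0.0, 0.0, 0.0], [0.8, 0.2, 0.2], [0.7, 0.1, 0.1]]
features = [[0.0] * 4, [0.5] * 4, [0.4] * 4]
pixel = render_pixel_deferred(densities, deltas, diffuse_rgb, features,
                              [0.0, 0.0, 1.0])
```

However many samples a ray contains, `shader` runs once, whereas a per-sample design would call it inside the loop over samples.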
Rendering time and required storage for a volumetric representation strongly depend on the sparsity of opacity within a given scene. To reduce time and memory cost, the team added a regularizer that penalizes predicted density, so as to make NeRF’s opacity field more sparse.
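As a rough illustration, a density regularizer of this kind could look like the sketch below. The article only says that predicted density is penalized, so the Cauchy-style form, the `weight` and `scale` values, and the function names here are assumptions chosen for illustration.

```python
import math

def sparsity_loss(densities, weight=1e-4, scale=0.5):
    """Cauchy-style penalty on predicted densities (assumed form)."""
    return weight * sum(math.log(1.0 + s * s / scale) for s in densities)

def total_loss(reconstruction_loss, densities):
    """Image reconstruction loss plus the sparsity regularizer."""
    return reconstruction_loss + sparsity_loss(densities)

# A ray passing mostly through empty space is penalized less
# than one whose samples all report high density.
sparse_ray = [0.0, 0.0, 0.0, 10.0]
dense_ray = [10.0, 10.0, 10.0, 10.0]
```

Because the penalty grows only logarithmically, a few high-density samples at a surface cost little, while density spread thinly through empty space is pushed towards zero.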
The team then converted a trained deferred NeRF model into a representation suitable for real-time rendering. They replaced the MLP evaluations in NeRF with fast lookups in a precomputed data structure they call Sparse Neural Radiance Grid (SNeRG), which makes it possible to skip blocks of empty space during rendering.
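The idea of a sparse grid with skippable empty blocks can be sketched as follows. This is a simplified illustration, not the SNeRG data layout: it assumes a coarse indirection table that points into a packed list of occupied blocks, and the names `bake`, `lookup` and `BLOCK` are hypothetical.

```python
BLOCK = 4  # voxels per macroblock edge (hypothetical value)

def voxel_offset(x, y, z, block=BLOCK):
    """Flat index of a voxel inside its macroblock."""
    return ((z % block) * block + (y % block)) * block + (x % block)

def bake(occupied, block=BLOCK):
    """Pack occupied voxels {(x, y, z): payload} into an atlas of dense blocks."""
    indirection = {}  # coarse block coordinate -> index into atlas
    atlas = []        # one dense block**3 payload list per occupied macroblock
    for (x, y, z), payload in occupied.items():
        key = (x // block, y // block, z // block)
        if key not in indirection:
            indirection[key] = len(atlas)
            atlas.append([None] * block ** 3)
        atlas[indirection[key]][voxel_offset(x, y, z, block)] = payload
    return indirection, atlas

def lookup(indirection, atlas, x, y, z, block=BLOCK):
    """Return the stored payload, or None for empty space."""
    index = indirection.get((x // block, y // block, z // block))
    if index is None:
        return None  # whole macroblock is empty: a renderer can skip past it
    return atlas[index][voxel_offset(x, y, z, block)]

# Three occupied voxels falling into two macroblocks of a 12^3 grid.
occupied = {(0, 0, 0): "a", (1, 0, 0): "b", (9, 9, 9): "c"}
indirection, atlas = bake(occupied)
```

Only two of the 27 possible macroblocks are stored here, and a lookup in any other block is a single failed table access rather than a network evaluation.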
All values in the “baked” SNeRG representation are then quantized to 8 bits, and the indirection grid and the 3D texture atlas are compressed separately. Although this makes SNeRG compact and easy to distribute, it degrades the quality of images rendered from the baked SNeRG. To recover rendering quality, the researchers fine-tuned the weights of the deferred per-pixel shading MLP.
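The 8-bit step can be illustrated with a minimal sketch of uniform quantization. It assumes the stored values lie in a known range, [0, 1] by default, as sigmoid outputs do. The helper names are hypothetical, and the separate compression of the grid and atlas is not shown.

```python
def quantize(values, lo=0.0, hi=1.0):
    """Map floats in [lo, hi] to one byte each (256 uniform levels)."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((v - lo) * scale))) for v in values)

def dequantize(quantized, lo=0.0, hi=1.0):
    """Recover approximate floats from the stored bytes."""
    step = (hi - lo) / 255.0
    return [lo + b * step for b in quantized]

stored = quantize([0.0, 0.25, 1.0])
recovered = dequantize(stored)
```

Each value now takes one byte instead of four, at the cost of a rounding error of at most half a quantization step, which is the kind of small error that fine-tuning the shading network can help compensate for.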
The team conducted extensive experiments on free-viewpoint rendering of 360-degree scenes to compare the proposed approach to recent techniques for accelerating NeRF. The evaluations considered three criteria: render-time performance, storage cost, and rendering quality.
The experiments showed that removing the view-dependence MLP had little impact on runtime performance, removing the sparsity loss increased memory usage, and replacing the proposed deferred rendering with NeRF’s original per-sample evaluation resulted in prohibitively long render times.
After fine-tuning, the rendering quality of the proposed SNeRG model was competitive with the neural model. The storage ablation study validated that the compressed SNeRG representations are small enough to load quickly in a web page, and they can be displayed at over 30 frames per second on a laptop GPU.
The team says it hopes the proposed approach’s ability to render neural volumetric representations such as NeRF in real-time on commodity graphics hardware will increase the adoption of such neural scene representations across a variety of vision and graphics applications.
The paper Baking Neural Radiance Fields for Real-Time View Synthesis is on arXiv.
Author: Hecate He | Editor: Michael Sarazen