Recent progress in neural 3D reconstruction has greatly simplified the capture of realistic digital representations of real-world 3D objects and scenes via neural radiance fields (NeRFs), which are built from images taken at multiple camera viewpoints. Current approaches for editing such 3D representations are, however, much less accessible and typically require specialized tools.
In the new paper Instruct-NeRF2NeRF: Editing 3D Scenes With Instructions, a UC Berkeley research team presents Instruct-NeRF2NeRF, an approach for editing 3D NeRF scenes through natural language text instructions alone. The proposed method is able to edit large-scale, real-world 3D scenes with improved ease of use and realism.

Instruct-NeRF2NeRF takes as its inputs a reconstructed NeRF scene, the set of captured images it was built from with their corresponding camera poses, and camera calibration information. Given a natural-language editing instruction from the user, it outputs an edited version of the NeRF that reflects that instruction.

Instruct-NeRF2NeRF uses InstructPix2Pix — a diffusion-based model specialized for image editing — to iteratively update image content at the captured viewpoints. These dataset edits are then consolidated into a globally consistent 3D representation via NeRF training. This novel Iterative Dataset Update (Iterative DU) approach enables Instruct-NeRF2NeRF to gradually percolate diffusion priors into a 3D scene reconstruction while maintaining the original scene’s structure and identity.
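The alternation between editing images and retraining the scene can be illustrated with a toy sketch. This is not the authors’ code: the scene is reduced to a single number, and `ToyScene`, `toy_edit`, the fixed shuffled visiting order and all parameter values are assumptions made purely for illustration. Only the loop structure (render a view, edit it, overwrite the training image, keep training) follows the description above.

```python
import random


def iterative_dataset_update(dataset, render, edit_image, train_step,
                             num_iters=2000, edit_every=10, seed=0):
    """Alternate between editing one training image and training the scene.

    render, edit_image and train_step are caller-supplied callables. Here they
    are hypothetical stand-ins for NeRF rendering, an image-editing diffusion
    model and a NeRF optimization step.
    """
    originals = list(dataset)   # unedited captures, kept as conditioning
    working = list(dataset)     # training set that is gradually overwritten
    if not working:
        return working
    order = list(range(len(working)))
    random.Random(seed).shuffle(order)  # assumption: fixed shuffled visiting order
    for it in range(num_iters):
        if it % edit_every == 0:
            view = order[(it // edit_every) % len(order)]
            rendered = render(view)     # current state of the scene at this view
            working[view] = edit_image(rendered, originals[view])
        train_step(working)             # pull the scene toward the edited dataset
    return working


class ToyScene:
    """Stand-in for a NeRF: one scalar that every viewpoint renders identically."""

    def __init__(self, value):
        self.value = value

    def render(self, view):
        return self.value

    def train_step(self, dataset, lr=0.2):
        # Move the scene toward the mean of the training images.
        mean = sum(dataset) / len(dataset)
        self.value += lr * (mean - self.value)


def toy_edit(rendered, original):
    """Hypothetical editor: a partial step toward 'original capture plus 1.0'."""
    target = original + 1.0
    return rendered + 0.5 * (target - rendered)


scene = ToyScene(0.0)
edited = iterative_dataset_update([0.0] * 8, scene.render, toy_edit,
                                  scene.train_step)
```

In this toy setting no single edit reaches the target, yet `scene.value` and every entry of `edited` end up close to 1.0 because each edit starts from the current render of a scene that has already absorbed earlier edits. That gradual accumulation is the behaviour the Iterative DU description attributes to the real method.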

The team uses Nerfstudio’s Nerfacto model as the underlying NeRF implementation. Before running the NeRF optimization, they tune the parameters that govern edit strength, namely the amount of noise added to an image before diffusion denoising and the model’s classifier-free guidance weights, which allows different degrees of scene edits.
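To make the role of the classifier-free guidance weights concrete, here is a minimal sketch of how InstructPix2Pix combines three noise predictions using separate image and text guidance scales, following the formulation published with that model. The function name, argument names and toy numbers are assumptions for illustration; in practice the predictions are tensors produced by the diffusion network rather than short lists.

```python
def guided_noise(eps_uncond, eps_image, eps_full, image_scale, text_scale):
    """Combine noise predictions with separate image and text guidance scales.

    eps_uncond: prediction with neither image nor text conditioning
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on both the image and the instruction
    """
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# Toy values chosen only to show how the scales move the result.
eps_uncond = [0.0, 1.0]
eps_image = [0.5, 1.0]
eps_full = [2.0, 0.0]

neutral = guided_noise(eps_uncond, eps_image, eps_full, 1.0, 1.0)
stronger_text = guided_noise(eps_uncond, eps_image, eps_full, 1.0, 2.0)
```

With both scales at 1.0 the result equals the fully conditioned prediction. Raising the text scale pushes the output further along the direction associated with the instruction, while raising the image scale keeps it closer to the input image.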

In their empirical study, the team applied Instruct-NeRF2NeRF to real-world 3D scenes of varying complexity, including 360-degree captures, and compared its qualitative and quantitative performance against ablative baselines. The results show that Instruct-NeRF2NeRF can perform targeted edits on 3D representations of people, objects, and large-scale real-world scenes, and that its outputs are more realistic than those of the baselines.
Result videos can be found on the project’s website. The paper Instruct-NeRF2NeRF: Editing 3D Scenes With Instructions is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
