R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

ECCV 2022

1Snap Inc., 2Northeastern University
(*Work done when Huan was an intern at Snap)
We present R2L, a deep (88-layer) residual MLP network that can represent the neural light field (NeLF) of complex synthetic and real-world scenes. It offers a compact representation (~20 MB of storage), fast rendering (~30x speedup over NeRF), and significantly improved visual quality (1.4 dB PSNR boost over NeRF), with no bells and whistles (no special data structure or parallelism required).


The recent explosion of research on Neural Radiance Fields (NeRF) shows the encouraging potential of representing complex scenes with neural networks. One major drawback of NeRF is its prohibitive inference time: rendering a single pixel requires querying the NeRF network hundreds of times. To address this, existing efforts mainly attempt to reduce the number of sampled points required, but the problem of iterative sampling remains. In contrast, a Neural Light Field (NeLF) offers a more straightforward representation than NeRF for novel view synthesis: rendering a pixel amounts to a single forward pass, without ray marching. In this work, we present a deep residual MLP network (88 layers) that effectively learns the light field. We show that the key to successfully training such a deep NeLF network is sufficient data, which we obtain by transferring knowledge from a pre-trained NeRF model via data distillation. Extensive experiments on both synthetic and real-world scenes show the merits of our method over counterpart algorithms. On the synthetic scenes, we achieve a 26-35x FLOPs reduction (per camera ray) and a 28-31x runtime speedup, while delivering significantly better rendering quality than NeRF (1.4-2.8 dB average PSNR improvement), without any customized implementation tricks.
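To make the single-forward-pass idea concrete, here is a minimal NumPy sketch of a residual MLP that maps a ray directly to an RGB color, plus a helper that labels random rays with a teacher's output to build pseudo training data. It is an illustration under stated assumptions, not the released R2L implementation: the ray encoding (concatenated points along the ray), the near/far bounds, the width, the split into 43 two-layer blocks plus an input and an output layer (one way to reach 88 layers), and the names ray_to_input, make_pseudo_data and dummy_teacher are all hypothetical. The teacher here is a placeholder function standing in for a pre-trained NeRF.

```python
import numpy as np

def ray_to_input(origins, dirs, n_points=16, near=2.0, far=6.0):
    """Encode each ray as the concatenated xyz of n_points points along it.
    The encoding and the near/far bounds are assumptions for illustration."""
    t = np.linspace(near, far, n_points)                              # (K,)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]   # (N, K, 3)
    return pts.reshape(len(origins), -1)                              # (N, 3K)

def init_params(in_dim, width=64, n_blocks=43, seed=0):
    """Input layer + n_blocks two-layer residual blocks + RGB output layer."""
    rng = np.random.default_rng(seed)

    def dense(fan_in, fan_out):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        return w, np.zeros(fan_out)

    return {
        "first": dense(in_dim, width),
        "blocks": [(dense(width, width), dense(width, width)) for _ in range(n_blocks)],
        "last": dense(width, 3),
    }

def count_layers(params):
    """Number of linear layers: first + 2 per residual block + last."""
    return 1 + 2 * len(params["blocks"]) + 1

def nelf_forward(params, x):
    """One forward pass per ray: no sampling loop, no volume integration."""
    relu = lambda v: np.maximum(v, 0.0)
    w, b = params["first"]
    h = relu(x @ w + b)
    for (w1, b1), (w2, b2) in params["blocks"]:
        h = h + relu(relu(h @ w1 + b1) @ w2 + b2)     # skip connection
    w, b = params["last"]
    return 0.5 * (1.0 + np.tanh(0.5 * (h @ w + b)))   # stable sigmoid, RGB in [0, 1]

def make_pseudo_data(teacher_render, n_rays, radius=4.0, seed=0):
    """Sample random rays and label them with the teacher's rendered colors."""
    rng = np.random.default_rng(seed)
    origins = rng.normal(size=(n_rays, 3))
    origins = radius * origins / np.linalg.norm(origins, axis=1, keepdims=True)
    dirs = -origins + 0.1 * rng.normal(size=(n_rays, 3))  # roughly toward the scene center
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return ray_to_input(origins, dirs), teacher_render(origins, dirs)

def dummy_teacher(origins, dirs):
    """Placeholder for a pre-trained NeRF renderer; returns colors in [0, 1]."""
    return np.abs(dirs)

inputs, targets = make_pseudo_data(dummy_teacher, n_rays=8)
params = init_params(in_dim=inputs.shape[1])
rgb = nelf_forward(params, inputs)              # untrained weights, so colors are arbitrary
loss = float(np.mean((rgb - targets) ** 2))     # distillation loss one would minimize
print(count_layers(params), rgb.shape, loss)
```

In a real setup the teacher call is where the expensive NeRF rendering happens, once per pseudo sample and offline; at test time only nelf_forward runs, which is the source of the speedup described above.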

1. Visual Comparison on NeRF Synthetic and Realistic Datasets

(You may pause the video to review the difference between NeRF and ours)

Scene: Chair (Blender). Left: NeRF (PSNR: 33.90), Right: Ours (PSNR: 36.71)

Scene: Drums (Blender). Left: NeRF (PSNR: 25.56), Right: Ours (PSNR: 26.03)

Scene: Ficus (Blender). Left: NeRF (PSNR: 28.88), Right: Ours (PSNR: 28.63)

Scene: Hotdog (Blender). Left: NeRF (PSNR: 34.64), Right: Ours (PSNR: 38.07)

Scene: Lego (Blender). Left: NeRF (PSNR: 31.42), Right: Ours (PSNR: 32.53)

Scene: Materials (Blender). Left: NeRF (PSNR: 29.22), Right: Ours (PSNR: 30.20)

Scene: Mic (Blender). Left: NeRF (PSNR: 30.84), Right: Ours (PSNR: 32.80)

Scene: Ship (Blender). Left: NeRF (PSNR: 29.30), Right: Ours (PSNR: 29.98)

Scene: Room (LLFF). Left: NeRF (PSNR: 33.07), Right: Ours (PSNR: 33.30)

Scene: Fern (LLFF). Left: NeRF (PSNR: 26.86), Right: Ours (PSNR: 26.87)

Scene: Leaves (LLFF). Left: NeRF (PSNR: 22.40), Right: Ours (PSNR: 22.71)

Scene: Orchids (LLFF). Left: NeRF (PSNR: 21.29), Right: Ours (PSNR: 21.01)

Scene: Flower (LLFF). Left: NeRF (PSNR: 28.22), Right: Ours (PSNR: 28.67)

Scene: T-Rex (LLFF). Left: NeRF (PSNR: 28.10), Right: Ours (PSNR: 28.12)

Scene: Horns (LLFF). Left: NeRF (PSNR: 28.86), Right: Ours (PSNR: 28.95)
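For readers unfamiliar with the metric, the sketch below shows the standard PSNR definition, 10 * log10(MAX^2 / MSE), in decibels. The page does not state how its PSNR values were computed; this assumes the usual convention of pixel values in [0, 1] and the error averaged over all pixels and channels. The function name psnr is chosen here for illustration.

```python
import numpy as np

def psnr(pred, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a rendered and a reference image."""
    pred = np.asarray(pred, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    mse = np.mean((pred - ref) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.01 on every pixel gives MSE = 1e-4, i.e. 40 dB.
reference = np.zeros((4, 4, 3))
rendered = np.full((4, 4, 3), 0.01)
print(round(psnr(rendered, reference), 2))  # -> 40.0
```

Because the scale is logarithmic, a gain of 1 dB corresponds to reducing the mean squared error by a factor of about 1.26 (10^0.1).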

2. Visual Comparison on DONeRF Synthetic Dataset

(You may pause the video to see the difference between NeRF and ours)

Scene: Sanmiguel (Blender). Left: NeRF (PSNR: 28.96), Right: Ours (PSNR: 31.37)

Scene: Pavillon (Blender). Left: NeRF (PSNR: 32.82), Right: Ours (PSNR: 34.10)

Scene: Classroom (Blender). Left: NeRF (PSNR: 35.33), Right: Ours (PSNR: 38.96)

Scene: Bulldozer (Blender). Left: NeRF (PSNR: 36.85), Right: Ours (PSNR: 38.01)

Scene: Forest (Blender). Left: NeRF (PSNR: 28.11), Right: Ours (PSNR: 34.18)

Scene: Barbershop (Blender). Left: NeRF (PSNR: 33.92), Right: Ours (PSNR: 36.05)
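As a quick check, the per-scene numbers listed in Sections 1 and 2 can be averaged to see how they relate to the reported 1.4-2.8 dB average PSNR improvement on synthetic scenes. The sketch below uses only the scenes and values shown on this page; the dictionary layout and the function name mean_gain are illustrative choices.

```python
from statistics import mean

# (NeRF PSNR, Ours PSNR) per scene, copied from the captions above.
scores = {
    "NeRF Synthetic (Blender)": {
        "Chair": (33.90, 36.71), "Drums": (25.56, 26.03), "Ficus": (28.88, 28.63),
        "Hotdog": (34.64, 38.07), "Lego": (31.42, 32.53), "Materials": (29.22, 30.20),
        "Mic": (30.84, 32.80), "Ship": (29.30, 29.98),
    },
    "LLFF": {
        "Room": (33.07, 33.30), "Fern": (26.86, 26.87), "Leaves": (22.40, 22.71),
        "Orchids": (21.29, 21.01), "Flower": (28.22, 28.67), "T-Rex": (28.10, 28.12),
        "Horns": (28.86, 28.95),
    },
    "DONeRF Synthetic": {
        "Sanmiguel": (28.96, 31.37), "Pavillon": (32.82, 34.10), "Classroom": (35.33, 38.96),
        "Bulldozer": (36.85, 38.01), "Forest": (28.11, 34.18), "Barbershop": (33.92, 36.05),
    },
}

def mean_gain(scenes):
    """Average PSNR of ours minus average PSNR of NeRF over the given scenes."""
    nerf = mean(n for n, _ in scenes.values())
    ours = mean(o for _, o in scenes.values())
    return ours - nerf

gains = {name: mean_gain(scenes) for name, scenes in scores.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f} dB")
# NeRF Synthetic (Blender): +1.40 dB
# LLFF: +0.12 dB
# DONeRF Synthetic: +2.78 dB
```

The two synthetic datasets bracket the 1.4-2.8 dB range quoted in the abstract, while the gain on the listed real-world LLFF scenes is much smaller, with Ficus and Orchids being the two scenes where NeRF scores slightly higher.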


  title={R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis},
  author={Wang, Huan and Ren, Jian and Huang, Zeng and Olszewski, Kyle and Chai, Menglei and Fu, Yun and Tulyakov, Sergey},
  booktitle={European Conference on Computer Vision},