Depth-supervised NeRF: Fewer Views and Faster Training for Free

Kangle Deng1    Andrew Liu2    Jun-Yan Zhu1    Deva Ramanan1,3

1CMU    2Google    3Argo AI   

Paper | GitHub | YouTube

Abstract

We propose DS-NeRF (Depth-supervised Neural Radiance Fields), a model for learning neural radiance fields that takes advantage of depth supervision from 3D point clouds. Current NeRF methods require many images with known camera parameters -- typically produced by running structure-from-motion (SfM) to estimate poses and a sparse 3D point cloud. Most, if not all, NeRF pipelines make use of the former but ignore the latter. Our key insight is that this sparse 3D output can serve as an additional, free supervisory signal during training. By adding a loss that encourages the depth rendered along a ray passing through a 3D keypoint to be close to that keypoint's observed depth, we find that DS-NeRF trains 2-6x faster and synthesizes better results from fewer training views. Using only 2 views from the NeRF Real-world dataset, our method's view synthesis significantly outperforms the original NeRF as well as other sparse-view variants. Unlike our category-agnostic approach, these competing baselines must learn category priors from training scenes and are sensitive to domain shift at test time. We also show that our loss is compatible with these NeRF variants, demonstrating that depth is a cheap and generalizable supervisory signal.
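
To make the idea concrete, here is a minimal PyTorch sketch of the depth-supervision term described above. It is a simplification, not the released implementation: the helper names (`rendered_depth`, `sparse_depth_loss`, `lambda_depth`, `sigma`) are ours, and the squared-error form with a reprojection-error weighting stands in for the paper's full formulation.

```python
import torch

def rendered_depth(weights: torch.Tensor, t_vals: torch.Tensor) -> torch.Tensor:
    """Expected termination depth of each ray from NeRF's compositing weights.

    weights: (num_rays, num_samples) volume-rendering weights along each ray.
    t_vals:  (num_rays, num_samples) depths of the samples along each ray.
    """
    return (weights * t_vals).sum(dim=-1)


def sparse_depth_loss(weights, t_vals, keypoint_depth, reproj_err, sigma=1.0):
    """Pull the rendered depth toward the COLMAP keypoint depth on each ray.

    keypoint_depth: (num_rays,) depth of the SfM 3D point the ray passes through.
    reproj_err:     (num_rays,) COLMAP reprojection error, used here to
                    down-weight unreliable keypoints.
    """
    d_hat = rendered_depth(weights, t_vals)
    conf = torch.exp(-reproj_err / sigma)      # noisier keypoints count less
    return (conf * (d_hat - keypoint_depth) ** 2).mean()


# In training, this term is simply added to the usual photometric loss:
# total_loss = rgb_loss + lambda_depth * sparse_depth_loss(...)
```

Only rays that pass through an SfM keypoint contribute to this term; all other rays are trained with the photometric loss alone.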



Paper

arXiv:2107.02791, 2021.

Citation

Kangle Deng, Andrew Liu, Jun-Yan Zhu, and Deva Ramanan. "Depth-supervised NeRF: Fewer Views and Faster Training for Free". arXiv preprint arXiv:2107.02791, 2021.




Video


Visual Comparisons

Trained on 2 views:

[Side-by-side comparisons on five scenes: RGB and depth renderings, NeRF vs. DS-NeRF.]

Trained on 5 views:

[Side-by-side comparisons on two scenes: RGB and depth renderings, NeRF vs. DS-NeRF.]


Make use of RGB-D input

DS-NeRF can use sources of depth other than COLMAP, such as RGB-D input. Here we derive a dense depth map for each training view from the RGB-D frames of the Redwood dataset. With this dense depth supervision, DS-NeRF renders even higher-quality images and depth maps than it does with COLMAP's sparse depth supervision (see the sketch below).
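
As a rough sketch of this dense variant, under the same assumptions as the sparse example above (hypothetical helper names, not the released code): every sampled ray now carries a depth target read from the sensor depth map, masked out wherever the sensor has no valid return.

```python
import torch

def dense_depth_loss(weights, t_vals, sensor_depth, valid_mask):
    """Depth loss with a dense RGB-D depth map instead of sparse COLMAP points.

    sensor_depth: (num_rays,) RGB-D depth sampled at each ray's pixel.
    valid_mask:   (num_rays,) boolean, True where the sensor returned a depth.
    """
    d_hat = (weights * t_vals).sum(dim=-1)        # expected termination depth
    sq_err = (d_hat - sensor_depth) ** 2
    valid = valid_mask.float()
    return (sq_err * valid).sum() / valid.sum().clamp(min=1.0)
```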

[Side-by-side comparisons on two scenes: NeRF vs. DS-NeRF with COLMAP sparse depth vs. DS-NeRF with RGB-D dense depth.]



Failure Cases

Our model does not work well when COLMAP fails; in particular, COLMAP cannot recover accurate sparse 3D points (or accurate camera poses) in textureless scenes.

Trained on 2 views

[Failure examples: side-by-side renderings, NeRF vs. DS-NeRF, on scenes where COLMAP fails.]


Acknowledgment

We thank Takuya Narihira, Akio Hayakawa, and Sheng-Yu Wang for helpful discussions. We are grateful for the support from Sony Corporation, Singapore DSTA, and the CMU Argo AI Center for Autonomous Vehicle Research.