Windowed Bundle Adjustment Framework for Unsupervised Learning of Monocular Depth Estimation with U-Net Extension and Clip Loss

Download: PDF.

“Windowed Bundle Adjustment Framework for Unsupervised Learning of Monocular Depth Estimation with U-Net Extension and Clip Loss” by L. Zhou and M. Kaess. IEEE Robotics and Automation Letters, RA-L, vol. 5, no. 2, Apr. 2020, pp. 3283-3290.

Abstract

This letter presents a self-supervised framework for learning depth from monocular videos. The main contributions are: (1) We present a windowed bundle adjustment framework to train the network. Compared to most previous works, which only consider constraints from consecutive frames, our framework increases the camera baseline and introduces more constraints to avoid overfitting. (2) We extend the widely used U-Net architecture by applying a Spatial Pyramid Net (SPN) and a Super Resolution Net (SRN). The SPN fuses information from an image spatial pyramid for depth estimation, which addresses the context information attenuation problem of the original U-Net. The SRN learns to estimate a high-resolution depth map from a low-resolution image, which benefits the recovery of details. (3) We adopt a clip loss function to handle moving objects and occlusions, which previous works addressed by designing complicated networks or requiring extra information (such as a segmentation mask [1]). Experimental results show that our algorithm provides state-of-the-art results on the KITTI benchmark.
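
As a rough illustration of the clip loss idea in contribution (3), the following minimal sketch caps per-pixel photometric errors at a threshold so that large residuals, such as those caused by moving objects or occlusions, cannot dominate the loss. The abstract does not say how the clip value is chosen; the function name `clip_loss`, the `percentile` parameter, and the percentile-based threshold are assumptions made here for illustration and are not taken from the letter.

```python
import numpy as np


def clip_loss(target, reconstructed, percentile=95.0):
    """Mean absolute photometric error with large residuals clipped.

    Assumption: the clip threshold is a percentile of the per-pixel
    errors. The abstract does not specify how the threshold is set.
    """
    # Per-pixel absolute difference between the target image and the
    # image reconstructed (warped) from a neighboring frame.
    err = np.abs(target.astype(np.float64) - reconstructed.astype(np.float64))

    # Errors above the threshold are treated as outliers.
    threshold = np.percentile(err, percentile)

    # Clipping replaces each outlier error with the threshold value, so
    # it adds only a bounded amount to the mean.
    clipped = np.minimum(err, threshold)
    return clipped.mean(), threshold
```

If the threshold is treated as a constant during training, clipped pixels contribute no gradient, which is one way such a loss could ignore regions that violate the static-scene assumption without a segmentation mask.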

BibTeX entry:

@article{Zhou20ral,
   author = {L. Zhou and M. Kaess},
   title = {Windowed Bundle Adjustment Framework for Unsupervised Learning
	of Monocular Depth Estimation with U-Net Extension and Clip Loss},
   journal = {IEEE Robotics and Automation Letters, RA-L},
   volume = {5},
   number = {2},
   pages = {3283--3290},
   month = apr,
   year = {2020}
}

Last updated: March 21, 2023