Recently, deep learning methods have achieved strong performance in depth estimation and visual odometry from monocular video sequences by optimizing the photometric consistency between frames. However, it remains difficult to obtain large-scale ground-truth depth maps for supervising a depth estimation network, and existing solutions typically produce low-resolution results. Inspired by recent deep learning methods for semantic segmentation, we present a simple but effective unsupervised deep network for more accurate depth and camera motion estimation. An atrous spatial pyramid pooling module and an additional refinement layer are added to an encoder-decoder base model. In addition, we introduce a consistency-regularization loss that increases robustness to illumination changes. Our approach produces high-resolution depth maps with sharper object boundaries and achieves better results on the KITTI benchmark.
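As a minimal illustration of the photometric consistency objective mentioned above (a sketch, not the paper's implementation): unsupervised depth and ego-motion networks are typically trained on a per-pixel reconstruction error between the target frame and a source frame warped into the target view using the predicted depth and camera motion. The function name, the plain L1 penalty, and the validity mask below are illustrative assumptions; the warping step itself is omitted.

```python
import numpy as np

def photometric_loss(target, warped, valid_mask=None):
    # target, warped: (H, W, 3) float arrays in [0, 1].
    # `warped` stands for the source frame re-projected into the
    # target view via predicted depth and pose (warping not shown).
    err = np.abs(target - warped).mean(axis=-1)  # per-pixel L1 error
    if valid_mask is None:
        # In practice, pixels that project outside the source image
        # are masked out; here all pixels are assumed valid.
        valid_mask = np.ones(err.shape, dtype=bool)
    return float(err[valid_mask].mean())

rng = np.random.default_rng(0)
frame = rng.random((4, 4, 3))
print(photometric_loss(frame, frame))  # identical frames give zero loss
```

Full systems usually combine this term with a structural similarity (SSIM) term and a depth smoothness prior; the consistency-regularization loss introduced in this work additionally penalizes sensitivity to illumination change.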