Geometry-Aware Representation Learning for Unsupervised Monocular Depth Estimation

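Monocular depth estimation is an important cue for scene understanding. Although many supervised and unsupervised machine learning methods have been proposed and have made considerable progress on this task, most of them still perform poorly at object boundaries and on fine details, even though depth in these regions matters a great deal in real-world applications. In this paper, we propose a novel geometry-aware representation learning method that brings object geometry into monocular depth estimation by adding semantic segmentation information. It is paired with a series of special conditional discriminators that integrate object structure and visual appearance. Together, these components improve unsupervised monocular depth estimation and reduce the inaccuracy at object boundaries and details that affects most previous methods. Quantitative and qualitative analyses on public datasets show that our representation learning method is on par with other current state-of-the-art methods for unsupervised monocular depth estimation, and achieves clear improvements on specific object classes.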

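Keywords

scene understanding; monocular depth estimation; semantic segmentation; domain adaptation; multi-task learning; unsupervised learning; representation learning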

Parallel Abstract

Monocular depth estimation plays an important role in scene understanding. While a number of supervised and unsupervised learning approaches for monocular depth estimation have been proposed, their promising quantitative performance does not necessarily reflect satisfactory depth outputs, because object boundaries are often inaccurate. In this paper, we propose a novel geometry-aware representation learning approach, which takes object geometry into account with the aid of semantic scene understanding (i.e., semantic segmentation). With a series of unique combinatorial conditional discriminators deployed over visual appearance and geometric representations, improved unsupervised depth estimation can be achieved. Experiments on the KITTI dataset successfully verify that our model performs favorably against recent approaches.

Parallel Keywords

scene understanding; monocular depth estimation; semantic segmentation; domain adaptation; multi-task learning; unsupervised learning; representation learning

