
Learning 3D Geometry for Monocular Depth Estimation

Advisor: Yong-Sheng Chen

Abstract


In this thesis, we propose a lightweight deep neural network for predicting the depth of an image. Most previous learning-based depth estimation methods reconstruct the depth map from an RGB image alone, which limits the accuracy of the predicted depth. In the proposed method, we instead take an RGB image together with the corresponding sparse depth information as input and extract features at multiple scales and resolutions to reconstruct the depth map. By exploiting the sparse depth information, we can substantially improve the accuracy of the predicted depth. In addition, we introduce the concept of multi-view learning and compute the photometric consistency between the current image and its neighboring views; this geometric constraint between images helps the network recover a more complete depth map. The proposed network thus combines the RGB image, the corresponding sparse depth information, and geometric cues to efficiently predict accurate, detail-rich depth maps. Furthermore, we use the predicted depth maps to reconstruct 3D models to demonstrate the capability of the network: even regions that are difficult to reconstruct, such as reflective materials and black light-absorbing objects, can be recovered well. In summary, this thesis presents a neural network that takes an RGB image and sparse depth information as input and incorporates image geometry to reconstruct the depth map. The predicted depth maps achieve high quality both numerically and visually, and we evaluate the network on several datasets, including RGBD, SUN3D, MVS, and ETH3D.
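As a concrete illustration of the input fusion described above, the sketch below shows one way a network could take an RGB image together with a sparse depth map and extract features at several resolutions. This is a minimal PyTorch sketch under assumed design choices; the module names, channel widths, and number of scales are illustrative and not taken from the thesis.

```python
# Minimal sketch: fuse RGB and sparse depth into a 4-channel input and
# extract features at three resolutions. Names and sizes are hypothetical.
import torch
import torch.nn as nn

class RGBSparseDepthEncoder(nn.Module):
    """Encodes a 4-channel tensor (RGB + sparse depth) into multi-scale features."""

    def __init__(self, base_channels: int = 32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, base_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Two further scales stand in for the multi-scale / multi-resolution branches.
        self.down1 = nn.Sequential(
            nn.Conv2d(base_channels, base_channels * 2, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.down2 = nn.Sequential(
            nn.Conv2d(base_channels * 2, base_channels * 4, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb: torch.Tensor, sparse_depth: torch.Tensor):
        # sparse_depth is zero at pixels without a measurement.
        x = torch.cat([rgb, sparse_depth], dim=1)   # (B, 4, H, W)
        f0 = self.stem(x)                           # full resolution
        f1 = self.down1(f0)                         # 1/2 resolution
        f2 = self.down2(f1)                         # 1/4 resolution
        return f0, f1, f2


if __name__ == "__main__":
    rgb = torch.rand(1, 3, 240, 320)
    sparse = torch.zeros(1, 1, 240, 320)
    feats = RGBSparseDepthEncoder()(rgb, sparse)
    print([f.shape for f in feats])
```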

English Abstract


In this thesis, we propose a convolutional neural network (CNN) for monocular depth estimation. Previous depth estimation approaches mostly take only RGB images as input to reconstruct a dense depth map, which limits the accuracy of the predicted depth values. In the proposed method, we take an RGB image and the corresponding sparse depth information as input, and extract both multi-scale context features and multi-resolution spatial features to reconstruct a dense depth map. By utilizing the sparse depth information, we can significantly improve the accuracy of the predicted depth map. Moreover, we introduce the concept of multi-view learning to our network and compute the photometric consistency between the reference and neighboring views. This provides a geometric constraint and helps the network recover a more complete depth map. The proposed network efficiently predicts accurate, detailed depth maps from the sparse depth information and geometry cues. In addition, we use the depth maps predicted by our method to demonstrate the ability of the network on the 3D reconstruction task. The 3D point clouds can be reconstructed well even in areas lacking ground truth, such as textureless and reflective materials. In conclusion, the proposed network takes an RGB image and sparse depth information as input and learns geometric constraints to predict the depth map. It produces dense depth maps with accurate depth values and high visual quality on a variety of datasets, including RGBD, SUN3D, MVS, and ETH3D.
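The photometric consistency term mentioned above can be made concrete with a short sketch: the neighboring view is inversely warped into the reference view using the predicted depth and the relative camera pose, and the photometric error against the reference image is penalized. The function name, the pinhole projection details, and the plain L1 error below are assumptions for illustration, not the exact formulation used in the thesis.

```python
# Minimal sketch of a photometric consistency loss via inverse warping.
import torch
import torch.nn.functional as F

def photometric_consistency_loss(ref_img, nbr_img, depth, K, T_ref_to_nbr):
    """ref_img, nbr_img: (B,3,H,W); depth: (B,1,H,W); K: (B,3,3); T_ref_to_nbr: (B,4,4)."""
    B, _, H, W = ref_img.shape
    device = ref_img.device

    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, 3, H * W)

    # Back-project reference pixels to 3D using the predicted depth.
    cam_pts = (torch.linalg.inv(K) @ pix) * depth.view(B, 1, -1)
    cam_pts_h = torch.cat([cam_pts, torch.ones(B, 1, H * W, device=device)], dim=1)

    # Transform into the neighboring camera and project with the intrinsics.
    nbr_pts = (T_ref_to_nbr @ cam_pts_h)[:, :3]
    proj = K @ nbr_pts
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    # Normalize coordinates to [-1, 1] and warp the neighboring image.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(nbr_img, grid, align_corners=True)

    # Simple L1 photometric error between the warped neighbor and the reference.
    return (warped - ref_img).abs().mean()
```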
