360MVSNet: Deep Multi-View Stereo Network with 360° Images

Advisor: 莊永裕 (Yung-Yu Chuang)

Abstract


The goal of multi-view stereo is to recover the 3D information of a scene from multiple images and their corresponding camera parameters. In recent years, with the growth of deep learning, many works have achieved excellent results on this task. However, when reconstructing large-scale scenes, existing methods require considerable labor to ensure sufficient overlap between the captured images. We therefore propose a new idea: using panoramic images as the input to multi-view stereo to infer the 3D geometry of a scene. The advantage of panoramas is that they capture the entire environment and provide broad, continuous information in a single image. To this end, we present 360MVSNet, a deep multi-view stereo network for 360° images. To make the training process account for the geometric information provided by 360° cameras, we propose a spherical sweeping method that projects image features onto spheres of different radii according to the hypothesized depths. Using multi-scale cost volumes and an uncertainty estimate at each scale, our model predicts depth in stages and generates high-resolution depth maps. In addition, we build EQMVS, a large-scale synthetic dataset containing about 50,000 RGB images, depth maps, and camera parameters. Experimental results show that our model reconstructs scenes more completely than other methods on both the test dataset and real-world scenes, while also outperforming them quantitatively.

Parallel Abstract


Recent works on multi-view stereo that estimate the dense representation of a scene have achieved promising performance with the growth of deep learning techniques. However, these methods become labor-intensive when reconstructing large-scale scenes, since they must ensure that input images with a normal field of view have enough visual overlap. Therefore, we propose using 360° images to infer the 3D geometry, since they can capture the entire environment and provide broad and continuous information in a single image. To this end, we present 360MVSNet, a deep learning network for multi-view stereo with 360° images. To embed the 360° camera geometry into the training process, we propose a spherical sweeping module that warps image features onto virtual spheres with different depth hypotheses to form the cost volumes. We construct multi-scale cost volumes with uncertainty estimation that predict the depth in a coarse-to-fine manner to generate high-resolution output. In addition, we build EQMVS, a large-scale synthetic dataset for training and testing multi-view stereo with 360° images; it consists of 50K RGB images, depth maps, and corresponding camera information. Experimental results on the synthetic dataset and a real-world scene show that our model can produce complete reconstructions and outperform classical and learning-based methods on large-scale datasets.
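The spherical sweeping step described in the abstract can be pictured with a short geometric sketch. The code below is a minimal illustration, not the thesis implementation: it assumes an equirectangular projection, a hypothetical extrinsic convention X_src = R·X_ref + t, and invented names (pixel_to_unit_sphere, spherical_sweep). For each depth hypothesis d, it lifts every reference pixel onto a sphere of radius d and re-projects the resulting 3D point into a source panorama, producing the sampling grids from which per-depth cost-volume slices could be warped.

```python
import numpy as np

def pixel_to_unit_sphere(u, v, width, height):
    # Equirectangular pixel -> unit viewing direction.
    # Longitude spans [-pi, pi] across the width, latitude [-pi/2, pi/2] down the height.
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

def unit_sphere_to_pixel(dirs, width, height):
    # Unit direction -> equirectangular pixel (inverse of the mapping above).
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    lon = np.arctan2(x, z)
    lat = np.arcsin(np.clip(y, -1.0, 1.0))
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

def spherical_sweep(width, height, depth_hypotheses, R, t):
    # For every reference pixel and every hypothesized sphere radius d:
    # place a 3D point at distance d along the viewing ray, move it into the
    # source camera frame (assumed convention: X_src = R @ X_ref + t), and
    # re-project it onto the source panorama. Each returned (u, v) grid can
    # warp source features into one depth slice of a cost volume.
    us, vs = np.meshgrid(np.arange(width), np.arange(height))
    rays = pixel_to_unit_sphere(us, vs, width, height)        # (H, W, 3)
    grids = []
    for d in depth_hypotheses:
        pts = rays * d                                        # points on the radius-d sphere
        pts_src = pts @ R.T + t                               # reference -> source frame
        dirs = pts_src / np.linalg.norm(pts_src, axis=-1, keepdims=True)
        grids.append(unit_sphere_to_pixel(dirs, width, height))
    return grids
```

In a full pipeline, these grids would drive a differentiable warp (for example, torch.nn.functional.grid_sample after normalizing u and v to [-1, 1]) so that the multi-scale cost volumes remain end-to-end trainable; the sketch above only illustrates the geometry.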
