
利用動態圖卷積之基於RGB影像之三維場景重建應用於擴增實境

3D Scene Reconstruction from RGB Images Using Dynamic Graph Convolution for Augmented Reality

Advisor: 傅立成 (Li-Chen Fu)

Abstract


In recent years, research on 3D scene reconstruction has been gaining momentum in the field of computer vision. 3D scene reconstruction aims to recover the layout of a scene together with the shapes and poses of the objects in it. In virtual and augmented reality, how effectively users can interact with their surroundings is closely tied to how well the machine understands the scene. However, most existing methods can only reconstruct objects with common appearances and adapt poorly to objects that look very different. In this thesis, we propose a scene reconstruction system that simultaneously and end-to-end reconstructs the 3D layout, object poses, and object shapes of a scene from RGB images alone. By first deriving coarse depth information, we can generate higher-quality 3D object poses in 3D object detection. Exploiting the properties of mesh points, we design an architecture that can adapt to the shapes of very different kinds of objects; it is built on a graph convolutional network for object surface generation, which aggregates the features of each node and its neighbors both locally and globally, and we further propose loss functions to constrain the learning process. In addition, we propose a scene-merging strategy that exploits different views from different timestamps to obtain a more complete scene, making the system better suited to augmented reality. In the experiments, we analyze and evaluate the proposed system on two public datasets, and we compare the results with those of state-of-the-art methods both qualitatively and quantitatively to demonstrate the superiority of our approach.

Parallel Abstract (English)


The popularity of 3D scene reconstruction has grown in recent years. 3D scene reconstruction aims to recover the object shapes, object poses, and the 3D layout of a scene. In virtual and augmented reality, a better understanding of the environment allows people to interact with their surroundings more effectively. However, most existing works can only reconstruct objects with common appearances and lack adaptability to objects with very different looks. In this thesis, we develop a holistic scene reconstruction system that reconstructs, end-to-end and simultaneously, the 3D scene layout, 3D object poses, and object surfaces from RGB images only. By first deriving coarse depth information, we can generate 3D object poses of better quality in 3D object detection. Based on the properties of mesh points, we design an architecture for object surface generation, built on a graph convolutional network, that adapts to very different types of objects by aggregating the features of every node and all of its neighbor nodes both locally and globally. Loss functions are also proposed to constrain the learning process. Moreover, a scene-merging strategy is proposed to obtain a more comprehensive reconstructed scene by utilizing different views at different timestamps, which makes the system more suitable for augmented reality. In the experiments, we evaluate the performance of the proposed system on two public datasets. We compare the results with those of state-of-the-art methods qualitatively and quantitatively, demonstrating the superiority of our method.
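To illustrate the "dynamic graph convolution" aggregation described in the abstract, below is a minimal sketch, not the thesis's actual implementation, of an EdgeConv-style layer in PyTorch. The names DynamicEdgeConv, knn_graph, and the parameter k are hypothetical; the sketch assumes vertex features arrive as a (batch, vertices, channels) tensor, rebuilds a k-nearest-neighbor graph in feature space at every layer, and combines each vertex's own feature (the global term) with the offsets to its neighbors (the local term).

```python
# Hedged sketch of an EdgeConv-style dynamic graph convolution layer.
# Assumptions (not from the thesis): k-NN graph recomputed in feature space,
# max-pooling over neighbors, and a 2-layer MLP on [center, neighbor - center].
import torch
import torch.nn as nn

def knn_graph(x, k):
    """x: (B, N, C) vertex features -> (B, N, k) indices of k nearest neighbors."""
    dist = torch.cdist(x, x)                                  # (B, N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop each point's self-match

class DynamicEdgeConv(nn.Module):
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        # MLP applied to the concatenated [global, local] edge feature
        self.mlp = nn.Sequential(
            nn.Linear(2 * in_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim))

    def forward(self, x):                                      # x: (B, N, C)
        B, N, C = x.shape
        idx = knn_graph(x, self.k)                             # rebuild the graph dynamically
        neigh = torch.gather(
            x.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C))         # (B, N, k, C) neighbor features
        center = x.unsqueeze(2).expand_as(neigh)               # global term: the vertex itself
        edge = torch.cat([center, neigh - center], dim=-1)     # local term: offsets to neighbors
        return self.mlp(edge).max(dim=2).values                # aggregate over the k neighbors
```

In a design of this kind, recomputing the neighborhood from the current features rather than from the fixed mesh topology is what makes the graph "dynamic": vertices that are far apart on the mesh but semantically similar can still exchange information, which is one plausible reading of the local-and-global aggregation the abstract refers to.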

