
3D Semantic Segmentation based on Spatial-aware Convolution and Shape Completion for Augmented Reality Applications

Advisor: 傅立成

Abstract


3D semantic segmentation of indoor scenes is a very active research topic in computer vision. For many applications, it is essential to know accurately which category each point in a scene belongs to. Benefiting from the development of deep learning, many voxel-based and point-based neural networks have been proposed to solve the semantic segmentation problem; however, most of them do not fully exploit information about the spatial structure. The goal of this thesis is to propose a system that performs semantic segmentation of an entire indoor scene from a scene point cloud with color information. Exploiting the sparsity of spatial data, we design a novel spatial-aware sparse convolution operation: an encoding of the space occupied by objects serves as an additional feature, and a self-attention mechanism effectively integrates the different sources of information. In addition, we introduce a completion network that refines the output of the segmentation network, so that every object in the scene obtains a more reasonable and complete shape. With these two methods, we build an accurate scene semantic segmentation network that classifies the entire scene. In the experiments, we perform quantitative and qualitative analyses on two public datasets. First, we compare models under different configurations to demonstrate the effectiveness of the proposed method; second, we compare against other state-of-the-art methods to demonstrate its superiority; finally, we analyze a practical application to illustrate its applicability. We expect the proposed 3D scene semantic segmentation system to deliver accurate and fast results for real applications.
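The spatial-aware idea described above can be illustrated in toy form: alongside each occupied voxel's features, a binary occupancy code over its neighbourhood records which adjacent cells contain data, so empty space is no longer an implicit zero, and an attention-style weighting mixes the two feature sources. Everything here (feature shapes, the neighbourhood offsets, the tiny softmax fusion) is an illustrative assumption, not the thesis's actual network.

```python
import numpy as np

def occupancy_code(occupied, index, neighbourhood):
    """Binary code: 1.0 if the neighbouring voxel exists, else 0.0."""
    return np.array([1.0 if tuple(i + j for i, j in zip(index, d)) in occupied
                     else 0.0
                     for d in neighbourhood])

def attention_fuse(feat, occ_feat):
    """Toy self-attention: softmax-weighted mix of the two feature sources."""
    scores = np.array([feat.mean(), occ_feat.mean()])
    w = np.exp(scores) / np.exp(scores).sum()   # softmax over the 2 sources
    # project the occupancy code to the feature width so the two can be mixed
    occ_proj = np.resize(occ_feat, feat.shape)
    return w[0] * feat + w[1] * occ_proj

# 3 occupied voxels of a sparse grid, each with a 4-d feature vector
occupied = {(0, 0, 0): np.array([0.2, 0.8, 0.1, 0.5]),
            (1, 0, 0): np.array([0.9, 0.1, 0.4, 0.3]),
            (0, 1, 0): np.array([0.6, 0.6, 0.2, 0.7])}
offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

for idx, feat in occupied.items():
    occ = occupancy_code(occupied, idx, offsets)   # which neighbours exist
    fused = attention_fuse(feat, occ)              # spatially informed feature
    print(idx, fused.round(3))
```

In a real sparse convolution the occupancy pattern would be consumed by learned kernels rather than this hand-written mix; the point of the sketch is only that empty cells contribute explicit structural information instead of silent zeros.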

Parallel Abstract (English)


3D semantic segmentation of indoor scenes is a popular research topic in the field of computer vision. For many applications, it is very important to know exactly which category each point in the scene belongs to. Benefiting from the development of deep learning, many voxel-based and point-based neural networks have been proposed to solve semantic segmentation problems. However, most of them do not fully consider the information of the spatial structure. Current voxel-based sparse convolutional neural networks can effectively extract 3D features in space, but they assume that the feature in empty space is zero, causing a loss of spatial-structure information. In this thesis, we propose a system that uses scene point clouds with color information to semantically segment an entire indoor scene. Exploiting the sparsity of spatial data, we design a novel spatial-aware sparse convolution operation: we encode the spatial occupancy of objects as an additional feature and use a self-attention mechanism to effectively aggregate features. In addition, we introduce a completion network that refines the results of the segmentation network, so that each object in the scene obtains a more reasonable and complete shape. Through these two methods, we build an accurate scene semantic segmentation network that recovers the semantic information of the entire scene. In the experimental part, we use two public datasets to perform quantitative and qualitative analyses. We compare our results with other state-of-the-art methods to show the superiority of the approach, and we evaluate models under different configurations to verify the effectiveness of the proposed method. Finally, a real application is presented to demonstrate our work. We expect that the proposed 3D scene semantic segmentation system can provide accurate and fast results for practical applications.
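A heavily simplified stand-in for the two-stage design described above: a segmentation stage produces noisy per-point labels, and a "completion" stage revises each point toward the locally dominant class so objects come out with more coherent shapes. The thesis uses a learned completion network; the majority-vote smoothing below is only an assumed illustration of the segment-then-refine idea, with made-up data.

```python
import numpy as np
from collections import Counter

def segment(points):
    """Hypothetical stage 1: noisy per-point labels (here: by height band)."""
    labels = (points[:, 2] > 0.5).astype(int)   # 0 = floor, 1 = object
    labels[::7] ^= 1                            # inject label noise
    return labels

def complete(points, labels, radius=0.3):
    """Hypothetical stage 2: relabel each point by neighbourhood majority."""
    refined = labels.copy()
    for i, p in enumerate(points):
        near = np.linalg.norm(points - p, axis=1) < radius
        refined[i] = Counter(labels[near]).most_common(1)[0][0]
    return refined

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(200, 3))          # toy scene point cloud
noisy = segment(pts)
clean = complete(pts, noisy)
print("points relabeled by completion:", int((noisy != clean).sum()))
```

The design point mirrored here is that refinement operates on the segmentation output rather than the raw input, so it can enforce shape-level consistency that a per-point classifier cannot.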

