
Cross-Domain Image-Based 3D Shape Retrieval by View Sequence Learning

Advisor: 徐宏民
Co-advisor: 黃寶儀

Abstract


We propose a method for cross-domain, image-based 3D shape retrieval that learns a joint feature space for natural images and 3D shapes in an end-to-end manner. Retrieval is driven by the similarity between an image and a 3D shape, which is computed as their distance in this feature space. First, we propose a feature extraction method for 3D shapes, called cross-view convolution (CVC), which combines the 2D view features rendered from different angles of a 3D shape according to their order, yielding a holistic feature for the whole shape. To close the domain gap between 2D natural-image features and 3D shape features, we propose a cross-domain triplet neural network (CDTNN), which inserts an adaptation layer into the network so that transformed image features can be compared directly with 3D shape features; the whole model can be trained end-to-end. Finally, we present an accelerated training scheme for the cross-domain triplet neural network that greatly reduces training time. To evaluate the effectiveness of the model, we built a large-scale dataset containing natural images and 3D shapes. Experimental results show that our method outperforms the current state-of-the-art approaches. We also experiment with various network architecture designs to reduce memory and computational cost.
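The abstract does not include any implementation, so as a rough illustration of the cross-view convolution idea described above, here is a minimal PyTorch-style sketch that treats the per-view CNN features of a shape as an ordered sequence and mixes them with a 1D convolution along the view axis. All class names, dimensions, and the max-pooling aggregation are hypothetical choices for illustration, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class CrossViewConvolution(nn.Module):
    """Hypothetical sketch of cross-view convolution (CVC): combine the
    2D features of a shape's rendered views according to their order."""

    def __init__(self, feat_dim=4096, out_dim=512):
        super().__init__()
        # A 1D convolution along the view axis lets neighboring views
        # interact, so the sequence order of the rendered views matters.
        self.conv = nn.Conv1d(feat_dim, out_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)  # collapse all views into one vector

    def forward(self, view_feats):
        # view_feats: (batch, n_views, feat_dim), e.g. features from a
        # shared 2D CNN applied to each rendered view in turn.
        x = view_feats.transpose(1, 2)   # (batch, feat_dim, n_views)
        x = torch.relu(self.conv(x))     # (batch, out_dim, n_views)
        return self.pool(x).squeeze(-1)  # (batch, out_dim) shape descriptor
```

Under these assumptions, a shape rendered from 12 views with 4096-dimensional view features would enter as a (batch, 12, 4096) tensor and come out as a single (batch, 512) shape descriptor.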

Parallel Abstract


We propose a cross-domain image-based 3D shape retrieval method, which learns a joint embedding space for natural images and 3D shapes in an end-to-end manner. The similarity between an image and a 3D shape can then be computed as their distance in this embedding space. To better encode a 3D shape, we propose a new feature aggregation method, Cross-View Convolution (CVC), which models a 3D shape as a sequence of rendered views. To bridge the gap between images and 3D shapes, we propose a Cross-Domain Triplet Neural Network (CDTNN) that incorporates an adaptation layer to better match features from the two domains; the whole network can be trained end-to-end. In addition, we present a new fast cross-domain triplet neural network architecture that speeds up triplet training. We evaluate our method on a new image-to-3D-shape dataset. Experimental results demonstrate that our method outperforms state-of-the-art approaches in retrieval performance. We also provide an in-depth analysis of various design choices to further reduce memory storage and computational cost.
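Likewise, a hedged sketch of the cross-domain part: an adaptation layer maps image features into the shape embedding space, and a standard triplet loss pulls an image embedding toward its matching shape and away from a non-matching one. The layer sizes, the L2 normalization, and the margin value are assumptions for illustration; the thesis's fast-triplet architecture is not reproduced here.

```python
import torch.nn as nn
import torch.nn.functional as F

class AdaptationLayer(nn.Module):
    """Hypothetical adaptation layer: project image features into the
    3D-shape embedding space so the two domains are directly comparable."""

    def __init__(self, img_dim=4096, embed_dim=512):
        super().__init__()
        self.fc = nn.Linear(img_dim, embed_dim)

    def forward(self, img_feat):
        # L2-normalize so distances live on a common scale (an assumption).
        return F.normalize(self.fc(img_feat), dim=-1)

def cross_domain_triplet_loss(img_emb, pos_shape_emb, neg_shape_emb, margin=0.2):
    """Standard triplet hinge loss with an image as the anchor and shape
    embeddings as positive/negative; margin=0.2 is an illustrative value."""
    d_pos = (img_emb - pos_shape_emb).pow(2).sum(dim=-1)
    d_neg = (img_emb - neg_shape_emb).pow(2).sum(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()
```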

