
Cross-Domain Image-Based 3D Shape Retrieval by View Sequence Learning

Advisor: 徐宏民
Co-advisor: 黃寶儀

Abstract


We propose a method for cross-domain, image-based 3D shape retrieval that learns a joint feature space for natural images and 3D shapes in an end-to-end manner. Retrieval is driven by the similarity between an image and a 3D shape, which is computed as their distance in this feature space. First, we propose a feature extraction method for 3D shapes, called cross-view convolution (CVC), which combines the 2D view features rendered from different angles of a 3D shape according to their order, yielding a holistic feature for the whole shape. To close the domain gap between 2D natural-image features and 3D shape features, we propose a cross-domain triplet neural network (CDTNN), which inserts an adaptation layer into the network so that transformed image features can be compared directly with 3D shape features; the whole model can be trained end-to-end. Finally, we present an accelerated training scheme for the cross-domain triplet neural network that greatly reduces training time. To evaluate the effectiveness of the model, we built a large-scale dataset containing natural images and 3D shapes. Experimental results show that our method outperforms the current state-of-the-art approaches. We also experiment with various network architecture designs to reduce memory and computational cost.
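The abstract does not include any implementation, so as a rough illustration of the cross-view convolution idea described above, here is a minimal PyTorch-style sketch that treats the per-view CNN features of a shape as an ordered sequence and mixes them with a 1D convolution along the view axis. All class names, dimensions, and the max-pooling aggregation are hypothetical choices for illustration, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class CrossViewConvolution(nn.Module):
    """Hypothetical sketch of cross-view convolution (CVC): combine the
    2D features of a shape's rendered views according to their order."""

    def __init__(self, feat_dim=4096, out_dim=512):
        super().__init__()
        # A 1D convolution along the view axis lets neighboring views
        # interact, so the sequence order of the rendered views matters.
        self.conv = nn.Conv1d(feat_dim, out_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)  # collapse all views into one vector

    def forward(self, view_feats):
        # view_feats: (batch, n_views, feat_dim), e.g. features from a
        # shared 2D CNN applied to each rendered view in turn.
        x = view_feats.transpose(1, 2)   # (batch, feat_dim, n_views)
        x = torch.relu(self.conv(x))     # (batch, out_dim, n_views)
        return self.pool(x).squeeze(-1)  # (batch, out_dim) shape descriptor
```

Under these assumptions, a shape rendered from 12 views with 4096-dimensional view features would enter as a (batch, 12, 4096) tensor and come out as a single (batch, 512) shape descriptor.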

Parallel Abstract


We propose a cross-domain image-based 3D shape retrieval method, which learns a joint embedding space for natural images and 3D shapes in an end-to-end manner. The similarity between an image and a 3D shape can then be computed as their distance in this embedding space. To better encode a 3D shape, we propose a new feature aggregation method, Cross-View Convolution (CVC), which models a 3D shape as a sequence of rendered views. To bridge the gap between images and 3D shapes, we propose a Cross-Domain Triplet Neural Network (CDTNN) that incorporates an adaptation layer to better match features from the two domains; the whole network can be trained end-to-end. In addition, we present a new fast cross-domain triplet neural network architecture that speeds up triplet training. We evaluate our method on a new image-to-3D-shape dataset. Experimental results demonstrate that our method outperforms state-of-the-art approaches in retrieval performance. We also provide an in-depth analysis of various design choices to further reduce memory storage and computational cost.
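Likewise, a hedged sketch of the cross-domain part: an adaptation layer maps image features into the shape embedding space, and a standard triplet loss pulls an image embedding toward its matching shape and away from a non-matching one. The layer sizes, the L2 normalization, and the margin value are assumptions for illustration; the thesis's fast-triplet architecture is not reproduced here.

```python
import torch.nn as nn
import torch.nn.functional as F

class AdaptationLayer(nn.Module):
    """Hypothetical adaptation layer: project image features into the
    3D-shape embedding space so the two domains are directly comparable."""

    def __init__(self, img_dim=4096, embed_dim=512):
        super().__init__()
        self.fc = nn.Linear(img_dim, embed_dim)

    def forward(self, img_feat):
        # L2-normalize so distances live on a common scale (an assumption).
        return F.normalize(self.fc(img_feat), dim=-1)

def cross_domain_triplet_loss(img_emb, pos_shape_emb, neg_shape_emb, margin=0.2):
    """Standard triplet hinge loss with an image as the anchor and shape
    embeddings as positive/negative; margin=0.2 is an illustrative value."""
    d_pos = (img_emb - pos_shape_emb).pow(2).sum(dim=-1)
    d_neg = (img_emb - neg_shape_emb).pow(2).sum(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()
```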

