

Sketch-based High-resolution 3D Reconstruction with Regional Segmentation Guidance

Advisor: 林奕成

Abstract


In this thesis, we propose a method that lets users draw a sketch of an object as input and generates a corresponding high-resolution 3D volume. 3D reconstruction with convolutional neural networks has become a popular topic in recent years, and prior work can be broadly divided into three types. The first relies on features learned from a single-view RGB image to generate the most likely 3D volume; the second argues that multi-view images help the network learn more information from different viewing angles; the third uses RGB-D images to obtain depth information about the object. In our view, a sketch is the most user-friendly input. Unlike drawing a colored image or capturing a depth map with professional instruments, sketching is the most intuitive and convenient option for everyone. A sketch is a double-edged sword, however: it is easy to draw, but it lacks texture features, which can make network learning harder. To address this, we feed our network not only the original sketch but also semantic region labels extracted with Mask R-CNN, which guide its learning. The other goal of our work is to generate high-resolution volumes. Memory limitation is a major challenge in 3D reconstruction: with conventional methods it forces low-resolution outputs that look coarse. We therefore adopt the Octree Generating Network (OGN) as our generator and modify its loss function to improve our network's learning.
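As a rough illustration of the pipeline above (a minimal sketch under assumptions, not the thesis's implementation: the module definitions, tensor shapes, and the COCO-trained torchvision Mask R-CNN standing in for the thesis's sketch-trained segmenter are all illustrative), the flow is: segment the sketch into labeled regions, stack the label map onto the sketch, and encode the result for an octree-style generator.

    # Hypothetical pipeline sketch; assumes PyTorch and torchvision >= 0.13.
    import torch
    import torchvision

    # Off-the-shelf Mask R-CNN as a stand-in for the thesis's region segmenter.
    maskrcnn = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    maskrcnn.eval()

    def region_label_map(sketch, score_thresh=0.5):
        """Collapse instance masks into one semantic-label map.
        sketch: float tensor of shape (3, H, W) in [0, 1]."""
        with torch.no_grad():
            pred = maskrcnn([sketch])[0]
        labels = torch.zeros(sketch.shape[1:])  # (H, W), 0 = background
        for mask, label, score in zip(pred["masks"], pred["labels"], pred["scores"]):
            if score >= score_thresh:
                labels[mask[0] > 0.5] = float(label)  # overwrite with class id
        return labels

    sketch = torch.rand(3, 256, 256)                      # stand-in for a real sketch
    labels = region_label_map(sketch)
    net_in = torch.cat([sketch, labels.unsqueeze(0)], 0)  # (4, 256, 256)

    # Toy encoder; the real work would pass the latent code to an OGN-style decoder.
    encoder = torch.nn.Sequential(
        torch.nn.Conv2d(4, 32, kernel_size=4, stride=4),
        torch.nn.ReLU(),
        torch.nn.Flatten(),
        torch.nn.LazyLinear(512),
    )
    z = encoder(net_in.unsqueeze(0))  # (1, 512) latent code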



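The abstract notes that the OGN loss function is modified but does not state the modification here. For orientation only: the original OGN (Tatarchenko et al., ICCV 2017) is trained with a cross-entropy loss over octree cells, accumulated across octree levels (up to per-level weighting), where every cell is classified as empty, filled, or mixed:

    \mathcal{L} = \sum_{\ell=1}^{L} \frac{1}{|\mathcal{C}_\ell|}
        \sum_{c \in \mathcal{C}_\ell} \mathrm{CE}(p_c, y_c),
    \qquad y_c \in \{\text{empty}, \text{filled}, \text{mixed}\}

Here \mathcal{C}_\ell is the set of cells at octree level \ell and p_c is the predicted three-way distribution for cell c; "mixed" cells are refined at the next level, which is what keeps memory usage low at high resolutions.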

References


R. Girdhar, D. F. Fouhey, M. Rodriguez, and A. Gupta. Learning a predictable and generative vector representation for objects. In European Conference on Computer Vision (pp. 484-499). Springer, Cham, 2016.
C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In European Conference on Computer Vision (pp. 628-644). Springer, Cham, 2016.
Y. L. Liao, Y. C. Yang, Y. F. Lin, P. J. Chen, C. W. Kuo, W. C. Chiu, and Y. C. F. Wang. Learning pose-aware 3D reconstruction via 2D-3D self-consistency. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3857-3861). IEEE, 2019.
X. Di, R. Dahyot, and M. Prasad. Deep shape from a low number of silhouettes. In European Conference on Computer Vision (pp. 251-265). Springer, Cham, 2016.
J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems (pp. 82-90), 2016.
