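基於永續深度學習機制之六自由度相機定位研究 (A Study of Six-Degree-of-Freedom Camera Localization Based on a Continual Deep Learning Mechanism)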

Continual Lifelong Learning of Deep Convolutional Networks for 6-DOF Camera Relocalization

Advisor: 王凡
Co-advisor: 陳祝嵩 (Chu-Song Chen)

Abstract


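Visual localization is essential to many computer vision applications, such as autonomous driving, virtual reality, and navigation. Deep learning has shown remarkable results across many areas of computer vision, including object detection, image classification, and action recognition. In recent years, deep neural networks have also been used to estimate the six-degree-of-freedom (6-DOF) camera pose (position and orientation) from a single image. Conventionally, a separate deep model has to be trained to learn the localization information of each scene. However, different scenes often share common features; if a model is trained independently for each scene, the models cannot share the knowledge extracted from those features. Such learned information could help pose prediction in a new scene. In addition, maintaining many independently trained models is a burden for applications on mobile devices.

This thesis proposes a continual learning method for visual localization that trains a single compact model to predict camera poses in multiple scenes. During training, information about previously learned scenes is not forgotten, and knowledge extracted from earlier scenes can be reused. To the best of our knowledge, this thesis is the first work to combine continual (lifelong) learning with visual localization. Experimental results show that the method can incrementally learn the poses of multiple scenes and achieves better localization accuracy than training on each scene individually. To better reflect real-world conditions, we collected a large-scale dataset on the National Taiwan University campus to evaluate the method.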
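The abstract compares localization accuracy without stating the metric at this point. A common convention in 6-DOF relocalization, assumed here rather than taken from the thesis, is to report the median position error (in scene units, usually meters) and the median orientation error in degrees, with orientation stored as a unit quaternion. A minimal stdlib sketch; the function names and the (w, x, y, z) quaternion order are hypothetical choices for illustration:

```python
import math
import statistics


def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth camera positions."""
    return math.dist(t_est, t_gt)


def rotation_error_deg(q_est, q_gt):
    """Angle in degrees of the relative rotation between two quaternions (w, x, y, z).

    Quaternions are normalized first; abs() handles the q / -q ambiguity,
    since both represent the same rotation.
    """
    n_est = math.sqrt(sum(c * c for c in q_est))
    n_gt = math.sqrt(sum(c * c for c in q_gt))
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt))) / (n_est * n_gt)
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


def median_errors(preds, gts):
    """Median (position, orientation) error over paired (t, q) poses."""
    t_errs = [translation_error(p[0], g[0]) for p, g in zip(preds, gts)]
    r_errs = [rotation_error_deg(p[1], g[1]) for p, g in zip(preds, gts)]
    return statistics.median(t_errs), statistics.median(r_errs)
```

For example, a predicted quaternion `(cos 45°, 0, 0, sin 45°)` against the identity `(1, 0, 0, 0)` yields an orientation error of 90 degrees. Medians are often preferred over means in this setting because a few badly mislocalized frames would otherwise dominate the summary.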

English Abstract


Visual localization, which aims to estimate an accurate camera pose in a known scene, is important in many computer vision applications such as autonomous driving, virtual reality, and navigation. Deep learning has achieved remarkable success across a variety of computer vision tasks, including object detection, image classification, and action recognition. Recently, deep neural networks have also been applied to estimate the six-degree-of-freedom (6-DOF) camera pose (position and orientation) from a single image. Traditionally, a separate deep model must be trained for each scene. However, multiple scenes commonly share some features, and when each scene is trained individually, the models cannot share the knowledge extracted from those features with one another. This learned information may assist prediction in new scenes. Additionally, maintaining many independent models can be a burden for mobile devices. This thesis proposes a continual learning approach to visual localization that effectively learns a compact model for estimating camera poses across multiple scenes without forgetting previously learned ones. During training, the knowledge learned from earlier scenes can also be reused. To the best of our knowledge, this thesis is the first work to combine continual learning and visual localization. Experimental results show that our method can incrementally learn a compact model for multiple scenes with better accuracy than individual scene training. To better reflect real-world conditions, we gathered a large-scale dataset on the National Taiwan University campus to benchmark our approach.
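The abstract does not state how forgetting is prevented. One widely used family of continual-learning techniques (for example, elastic weight consolidation and memory aware synapses) adds a quadratic penalty that keeps parameters deemed important for earlier scenes close to their previous values. The sketch below shows that generic technique, not the thesis's method: a toy quadratic loss stands in for a real pose-regression loss, and the importance weights, `lam`, and all function names are hypothetical.

```python
def consolidation_penalty(params, old_params, importance, lam):
    """Importance-weighted quadratic penalty: lam * sum_i w_i * (theta_i - theta_old_i)^2."""
    return lam * sum(w * (p - p0) ** 2
                     for p, p0, w in zip(params, old_params, importance))


def train_new_scene(old_params, importance, targets, lam, lr=0.05, steps=2000):
    """Plain gradient descent on a toy new-scene loss plus the penalty.

    Toy new-scene loss (hypothetical stand-in for a pose loss):
        sum_i (theta_i - target_i)^2
    Training starts from the parameters learned on the previous scene.
    """
    params = list(old_params)
    for _ in range(steps):
        # Gradient of the toy loss plus the gradient of the consolidation penalty.
        grads = [2.0 * (p - t) + 2.0 * lam * w * (p - p0)
                 for p, t, p0, w in zip(params, targets, old_params, importance)]
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

With `old_params=[1.0, 1.0]`, `importance=[10.0, 0.0]`, `targets=[0.0, 0.0]`, and `lam=1.0`, the first parameter (important for the old scene) settles at 10/11 ≈ 0.909, close to its old value, while the unimportant second parameter moves freely to the new target 0.0. In a real network the per-parameter importance would have to be estimated from the earlier scene's data, for example from gradient magnitudes, and the learning rate chosen small enough for the penalized loss to stay stable.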
