

Deformation invariant surface matching technique based on deep object detection

Advisors: 黃思皓, 陳安斌


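Object detection is a technology encountered throughout daily life, and one of its common applications is augmented reality (AR). Augmented reality provides real-time information about things in the real world, with applications such as driver-assistance systems and live sports broadcasts; the performance of the underlying object detection directly affects the quality of the AR result. In recent years, advances in deep learning have brought considerable progress in techniques for finding common object categories. However, techniques for detecting a specified pattern in a particular application have not seen corresponding development. This study aims to improve AR applications by proposing a novel deep-learning-based image detection technique that finds a specified template image in an arbitrary frame and simultaneously estimates its valid visible region. The technique detects images by keypoint matching: it takes keypoints and descriptors from the template image and the target frame as input and outputs a list of plausible keypoint matches. The listed matches are the inliers that can be matched accurately under a two-dimensional polyharmonic spline approximation; under this strategy, the method tolerates rotation, translation, scaling, deformation, and even occlusion of the template image. This study makes two main contributions. First, the method can detect the target when the template image is distorted. Under such constrained conditions it still achieves fairly high matching precision while keeping the error of incorrect matches low, which allows it to be applied to AR with better rendering results than before. Second, the method can be trained on an artificially constructed dataset and the model then transferred to images captured in the real world, which greatly reduces the cost of training.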


Object detection is an essential task with many uses in daily life. One of them is augmented reality, which can enrich our lives by providing information and visualizing virtual content in the real world. Previous work on object detection achieves notable accuracy in discovering common classes of objects. However, practical techniques for detecting a specific pattern are still lacking. We develop the spline network, a deep-learning-based surface matching method, to detect a known template pattern in an unknown scene regardless of its translation, rotation, scaling, deformation, and occlusion. The spline network takes paired keypoints and descriptors as input and outputs the list of inlier pairs based on polyharmonic spline interpolation. The system benefits from two properties. First, keypoint-based detection natively tolerates occlusion. Second, the spline-based error function gives the model the capacity to find correspondences on a deformed object. This work is designed to apply object detection to augmented reality. It not only performs keypoint matching but also estimates the visible area of the template pattern, which is needed to give a clear boundary when rendering. Further, we introduce a data simulation framework that allows the model to be trained on generated data. This can substantially reduce the difficulty of collecting training data.
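The inlier criterion described above can be illustrated with a small classical sketch. It is not the spline network itself, which is a learned model; it only shows, under stated assumptions, how a smoothed two-dimensional polyharmonic (thin-plate) spline fitted to candidate keypoint pairs can separate consistent matches from a gross mismatch. The function names, the smoothing value, the residual threshold, and the synthetic grid are all hypothetical choices made for this illustration.

```python
import math

def tps_kernel(r):
    """Thin-plate kernel r^2 log r (2D polyharmonic spline of order 2)."""
    return r * r * math.log(r) if r > 0.0 else 0.0

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[col])]
    return [[v / m[i][i] for v in m[i][n:]] for i in range(n)]

def fit_spline(src, dst, smoothing):
    """Fit a smoothed spline mapping template points src to scene points dst.

    Returns n kernel weights followed by 3 affine coefficients, each as [x, y].
    """
    n = len(src)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    b = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = tps_kernel(math.hypot(xi - xj, yi - yj))
        a[i][i] += smoothing  # regularization: larger values give a stiffer fit
        for k, p in enumerate((1.0, xi, yi)):
            a[i][n + k] = p
            a[n + k][i] = p
        b[i] = [dst[i][0], dst[i][1]]
    return solve(a, b)

def warp(src, coeffs, point):
    """Map one template point into the scene with the fitted spline."""
    n = len(src)
    x, y = point
    out = []
    for axis in range(2):
        value = coeffs[n][axis] + coeffs[n + 1][axis] * x + coeffs[n + 2][axis] * y
        value += sum(coeffs[i][axis] * tps_kernel(math.hypot(x - sx, y - sy))
                     for i, (sx, sy) in enumerate(src))
        out.append(value)
    return tuple(out)

def find_inliers(src, dst, smoothing=100.0, threshold=0.1):
    """Return indices of pairs whose residual under the fit is below threshold."""
    coeffs = fit_spline(src, dst, smoothing)
    inliers = []
    for i, (p, q) in enumerate(zip(src, dst)):
        wx, wy = warp(src, coeffs, p)
        if math.hypot(wx - q[0], wy - q[1]) < threshold:
            inliers.append(i)
    return inliers

# Synthetic template: a 5x5 grid of keypoints in the unit square.
grid = [(x / 4, y / 4) for y in range(5) for x in range(5)]
# Synthetic scene: rotation-like affine map, a shift, and a slight bend in y.
scene = [(0.9 * x - 0.1 * y + 0.3, 0.1 * x + 0.9 * y + 0.04 * (x - 0.5) ** 2)
         for x, y in grid]
# Corrupt the match at the grid center to imitate a wrong correspondence.
scene[12] = (scene[12][0] + 0.5, scene[12][1])

inliers = find_inliers(grid, scene)
print(inliers)
```

In this sketch the smoothing term controls the trade-off: a large value keeps the fit close to an affine map, so a mismatched pair stands out with a large residual, while a value near zero lets the spline interpolate every pair, including wrong ones. The values used here were picked for the synthetic data only and would need tuning for real keypoints.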


