於行動裝置上之實境即時影像辨識

Real-time Mobile Visual Object Recognition in Real Scene

Advisor: 陳銘憲

Abstract


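Multimedia data has become an important part of big data. We encounter a great deal of multimedia content in daily life, for example social media videos, surveillance and dashboard camera footage, and medical images. This rapidly growing volume of multimedia data can no longer be processed effectively with traditional methods, so finding important and valuable information in multimedia data in a timely manner is a key challenge for the future. In recent years, with the spread of mobile devices and the support of wearable devices, augmented reality (AR) applications have become more feasible. For example, a user who is out shopping only needs to point a phone camera at the street to instantly obtain information about all the shops and sights on it. All of these applications, however, depend on one indispensable technique: real-time image and object recognition in real-world environments. In this doctoral dissertation, we propose a real-scene object recognition system for mobile devices (SAGRO). By combining a grid-based representation with structured learning and taking into account the spatial relationships among grids, the proposed system recognizes real-scene objects more accurately and in a more timely manner. In addition, building on object recognition, we propose two applications that assist users in daily life. First, we propose an online product search and recommendation system (UbiShop): when users see a product they like, they only need to take a photo to obtain information about that product, along with a list of recommended products of similar appearance to choose from when buying. Second, we use object recognition to find interesting regions on the street and use these regions to improve in-car navigation prompts, helping drivers avoid wrong turns at complex intersections.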

Parallel Abstract


The popularization of mobile devices and the advancement of wearable devices have made augmented reality (AR) scenarios feasible. However, the success of AR applications relies on a key technique: real-time visual object recognition in real scenes. In this dissertation, we therefore develop a framework called SpAtialized Grid based structured learning for Real-scene Object recognition (SAGRO). SAGRO not only locates visual objects precisely but also achieves real-time performance. Building on mobile visual object recognition, we present two applications that improve users' daily experience. First, we propose UbiShop, a commercial item retrieval and recommendation system for mobile phones, with which users can promptly obtain information about commercial items of interest by taking pictures of them. Users can also obtain recommendations of visually similar commercial items to help with their purchasing decisions. Second, observing that more than 63 percent of drivers in the United States in 2013 had been led astray by confusing GPS driving instructions, we present iNavi, a more intuitive form of driving instruction that detects interesting regions in the driver's field of view to help drivers recognize turning points quickly and correctly.
