
即時人體動作辨識系統之演算法開發與架構設計

Algorithm and Architecture Design for Real-time Human Action Recognition

Advisor: 陳良基 (Liang-Gee Chen)

Abstract


Research in computer vision has been conducted for many years and has profoundly changed everyone's life. Thanks to advances in technology, we have entered the era of big data and smart devices. With the help of machine learning algorithms, electronic products can automatically learn useful knowledge from massive data sources such as the Internet, correcting and improving themselves. The combination of computer vision and machine learning has brought many different applications, making our lives faster and more convenient. The ultimate goal of computer vision is to build an intelligent robot that perceives and interacts no differently from an ordinary person. The first step toward this goal is to enable machines to interpret the semantic meaning behind videos.

Compared with still images, videos carrying spatio-temporal information often contain richer knowledge. Human action recognition has therefore become one of the most important foundations of robot vision. The many variations contained in videos greatly increase the difficulty of analysis, leading many researchers to focus on raising recognition accuracy. However, in past research, the algorithms for extracting features from videos remained too complex to run in real time.

In this thesis, we first introduce some fundamentals and applications of computer vision. Corner detection and feature extraction algorithms are very important, as they form the basis of all visual recognition tasks. Extending planar recognition into the three-dimensional space-time domain introduces additional challenges. By comparing and analyzing the strengths and weaknesses of various algorithms, we decide to use local spatio-temporal features for action recognition. Considering both system efficiency and accuracy, we use the MoFREAK feature to describe action videos robustly. MoFREAK describes static information with FREAK and dynamic information with MIP, and its good performance as a video descriptor is verified on many datasets. By analyzing the time spent on each step of the whole system, we show experimentally that the system runs in real time.

Experimental results on different datasets then validate the efficiency and accuracy of the whole system, demonstrating its practicality. Finally, the results of applying the system to real-world videos are described in the last chapter. We develop a novel sliding cumulative histogram architecture to handle videos in which the action keeps changing, and we investigate the influence of various parameters. In summary, we design a real-time, online human action recognition system that recognizes human actions instantly through fast feature extraction, fast matching, and an online architecture.
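To make the descriptor pipeline concrete, below is a minimal Python/OpenCV sketch of a MoFREAK-style feature. The appearance half uses OpenCV's real FREAK implementation (the contrib modules are required); the motion half is a simplified binary frame-difference pattern standing in for MIP, so this illustrates the idea rather than reproducing the thesis's exact computation.

```python
import cv2
import numpy as np

# FAST corners as interest points; FREAK for the appearance bits.
# cv2.xfeatures2d requires the opencv-contrib-python package.
detector = cv2.FastFeatureDetector_create(threshold=30)
freak = cv2.xfeatures2d.FREAK_create()

def mofreak_like(prev_gray, curr_gray, patch=8, motion_thresh=25):
    """Per keypoint: FREAK appearance bytes + one byte of binary motion bits."""
    kps = detector.detect(curr_gray)
    kps, app = freak.compute(curr_gray, kps)      # appearance half (real FREAK)
    if app is None:
        return np.empty((0, 0), dtype=np.uint8)
    diff = cv2.absdiff(curr_gray, prev_gray)      # coarse motion cue
    feats = []
    for kp, a in zip(kps, app):
        x, y = int(kp.pt[0]), int(kp.pt[1])
        roi = diff[max(y - patch, 0):y + patch, max(x - patch, 0):x + patch]
        if roi.size == 0:
            continue
        # Motion half (simplified stand-in for MIP): 8 bits, one per cell of
        # a 2x4 grid, set when the cell's mean frame difference is large.
        cells = cv2.resize(roi, (4, 2), interpolation=cv2.INTER_AREA)
        bits = (cells.reshape(-1) > motion_thresh).astype(np.uint8)
        feats.append(np.concatenate([a, np.packbits(bits)]))
    return np.asarray(feats, dtype=np.uint8)
```

Run on consecutive grayscale frames, this yields one 65-byte binary descriptor per surviving keypoint (64 FREAK bytes plus one motion byte), which keeps matching cheap via Hamming distance.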

Parallel Abstract (English)


Computer vision has been developed for decades and has profoundly changed our lives. Thanks to technological progress, we have entered the era of big data and smart devices. With the help of machine learning algorithms, electronic devices are able to learn from massive data sources such as the Internet. The combination of computer vision and machine learning has also brought a wide range of applications, making our lives more convenient. The ultimate goal of computer vision is to build an intelligent robot that perceives and interacts just like a human being. Understanding the semantic meaning behind videos is the first step toward this goal.

Unlike still images, videos carry spatio-temporal information and thus imply richer knowledge. Human action recognition therefore becomes a basic capability for robot vision. The many variations present in videos increase the difficulty of analysis, leading many researchers to develop algorithms aimed at raising recognition accuracy on benchmark datasets. However, in past research the computational complexity of video feature extraction remained too high for real-time operation.

In this thesis, we first introduce some applications and fundamental functions of computer vision. Algorithms such as corner detection and feature extraction are very important, since they form the basis of recognition tasks. Extending successful 2D object recognition frameworks to 3D video introduces additional challenges. After comparing several related algorithms and examining the pros and cons of each, we choose local space-time features for our approach. Considering both efficiency and accuracy, we extract the MoFREAK feature to generate robust descriptors of action videos. MoFREAK combines an appearance model and a motion model independently: static information is characterized by FREAK and dynamic information by MIP, and the descriptor performs well across several datasets. An analysis of the computation time of the entire procedure shows that the system is feasible for real-time applications.

The experimental results of the proposed action recognition system are then described thoroughly to demonstrate its robustness and efficiency. Finally, to adapt to high-resolution and online scenarios, an innovative sliding histogram scheme is developed. To sum up, we design a real-time online action recognition system that recognizes actions instantly, thanks to the proposed fast feature extraction and matching algorithms.
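The online behaviour described above can be illustrated with a short sketch. It assumes a standard bag-of-words pipeline in which each frame's binary descriptors are quantized against a learned codebook by Hamming distance, and a histogram over the most recent frames is maintained incrementally, so each new frame costs time proportional to its features rather than to the window. The `codebook` and `classifier` names are hypothetical stand-ins for the trained models, not APIs from the thesis.

```python
from collections import deque
import numpy as np

def quantize(descs, codebook):
    """Assign each binary descriptor to its nearest codeword (Hamming distance)."""
    xor = np.bitwise_xor(descs[:, None, :], codebook[None, :, :])
    dists = np.unpackbits(xor, axis=2).sum(axis=2)   # differing bits per pair
    return dists.argmin(axis=1)

class SlidingHistogram:
    """Bag-of-words histogram over the most recent `window` frames."""
    def __init__(self, num_words, window=60):
        self.hist = np.zeros(num_words, dtype=np.int64)
        self.frames = deque()                        # per-frame word counts
        self.window = window

    def push(self, word_ids):
        counts = np.bincount(word_ids, minlength=len(self.hist))
        self.frames.append(counts)
        self.hist += counts
        if len(self.frames) > self.window:           # slide: evict oldest frame
            self.hist -= self.frames.popleft()
        return self.hist

# Hypothetical per-frame loop, given trained `codebook` / `classifier` models:
#   words = quantize(descriptors, codebook)
#   hist = sliding.push(words)
#   label = classifier.predict([hist / max(1, hist.sum())])
```

Keeping per-frame counts in a deque makes the eviction exact: the histogram always equals the sum over the current window, so predictions track action changes with at most one window of latency.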

