
Algorithm Development and Architecture Design of a Real-Time Human Action Recognition System Using Depth Information

Algorithm and Architecture Design Using HON4D for Online Human Action Recognition

Advisor: 陳良基 (Liang-Gee Chen)

Abstract


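Research in computer vision has been under way for many years and has profoundly changed everyday life. Thanks to technological progress, we have entered the era of big data and smart devices, and innovative applications built on computer vision have made daily life more convenient. The ultimate goal of computer vision is to build intelligent robots that understand real-world information the way humans do, and the first step toward that goal is enabling machines to interpret the meaning behind dynamic video.

Human action recognition is one of the most important foundations of robot vision, since video carries both spatial and temporal information. With the development of depth sensors, depth data has become easier to obtain in everyday settings, and because it provides richer information about geometric shape, it has pushed action recognition research forward. In this thesis, we present a real-time human action recognition system that uses depth information. We present a method for automatically segmenting depth videos online, combined with a histogram of surface normals accumulated over the whole frame to describe the depth video.

Finally, we present a novel architecture suited to hardware implementation, consisting of a feature extraction engine and a histogram update engine. Run-time profiling shows that feature extraction is the most time-consuming part, so we implement it with several simplification techniques. We also compare hardware architectures for the histogram update engine: a direct sliding-window histogram, an optimized direct sliding-window histogram, and our proposed algorithm.

Overall, we develop a system that recognizes human actions in real time from depth information, and we propose a hardware architecture that reduces memory usage and bandwidth.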
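The abstract describes each depth video by a histogram of surface normals accumulated over the whole frame. A minimal sketch of that idea follows, assuming the formulation of the original HON4D paper (Oreifej and Liu, CVPR 2013): the depth sequence is treated as a surface z = f(x, y, t), each point's 4D normal is (∂z/∂x, ∂z/∂y, ∂z/∂t, -1), and normals are quantized by their positive projections onto the 120 vertices of a 600-cell polychoron. The function names are hypothetical, derivatives come from `np.gradient` finite differences, and the projector refinement and spatiotemporal cell grid of the full method are omitted. This is not the thesis's hardware-oriented approximation.

```python
import itertools

import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio


def _is_even_permutation(perm):
    """Return True if the permutation has an even number of inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2 == 0


def polychoron_600cell_vertices():
    """Return the 120 unit-length vertices of the 600-cell as a (120, 4) array."""
    verts = set()
    # 8 vertices: all permutations of (+-1, 0, 0, 0).
    for axis in range(4):
        for sign in (1.0, -1.0):
            v = [0.0] * 4
            v[axis] = sign
            verts.add(tuple(v))
    # 16 vertices: (+-1/2, +-1/2, +-1/2, +-1/2).
    for signs in itertools.product((0.5, -0.5), repeat=4):
        verts.add(signs)
    # 96 vertices: even permutations of (+-phi, +-1, +-1/phi, 0) / 2.
    base = (PHI / 2, 0.5, 1 / (2 * PHI))
    for perm in itertools.permutations(range(4)):
        if not _is_even_permutation(perm):
            continue
        for signs in itertools.product((1.0, -1.0), repeat=3):
            values = [s * b for s, b in zip(signs, base)] + [0.0]
            v = [0.0] * 4
            for src, dst in enumerate(perm):
                v[dst] = values[src]
            verts.add(tuple(v))
    return np.array(sorted(verts))


def hon4d_histogram(depth, projectors):
    """Global HON4D-style histogram for a depth clip of shape (T, H, W).

    Simplified sketch (assumption): one histogram for the whole clip,
    no spatiotemporal cell grid, no projector refinement.
    """
    # np.gradient returns derivatives along axes (t, y, x) in that order.
    dz_dt, dz_dy, dz_dx = np.gradient(depth.astype(float))
    normals = np.stack(
        [dz_dx, dz_dy, dz_dt, -np.ones_like(dz_dx)], axis=-1
    ).reshape(-1, 4)
    # The constant -1 component guarantees a nonzero norm.
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    # Keep only positive projections onto each polychoron vertex.
    contributions = np.maximum(normals @ projectors.T, 0.0)
    hist = contributions.sum(axis=0)
    return hist / hist.sum()
```

As a sanity check, a static, flat depth clip has zero gradients everywhere, so every normal is (0, 0, 0, -1) and the histogram peaks at that projector, with mass only on projectors whose fourth component is negative.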

Abstract (English)


The ultimate goal of computer vision is to help computing devices understand the real world, process visual information efficiently, and even achieve semantic understanding as humans do. Computer vision algorithms have progressed rapidly in recent years and have enabled many innovative applications. For example, future intelligent surveillance systems will be capable of monitoring real environments, including objects and people. With the release of Kinect, 3D sequences have become more accessible, pushing research further toward this ultimate goal. In the past few years, various methods have been proposed for recognizing human activities from depth images. Compared with traditional 2D videos, depth sequences provide geometric information and can therefore better describe the scene. In this thesis, we aim to provide an online action recognition system using 3D data. Since depth sequences are captured with a single commodity camera, noise and occlusion are common problems. To deal with these issues, we extract histogram of oriented 4D surface normals (HON4D) features, which capture joint shape-motion cues in the depth sequence. Moreover, we present an automatic segmentation method for online recognition of depth sequences. The overall framework consists of two main parts: a feature extraction engine and a histogram engine. According to our run-time profiling, feature extraction is the most time-consuming part. Therefore, we implement HON4D feature extraction with several approximation techniques while maintaining its performance. Furthermore, we discuss three online action recognition architectures using HON4D features, based respectively on a direct sliding window, a modified cell-based sliding window, and our proposed algorithm. In sum, we implement HON4D feature extraction to optimize the most time-consuming part of the proposed system.
We also propose an online action recognition framework that, compared with other sliding-window methods, requires less memory and bandwidth.
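The abstract compares histogram-update architectures by memory and bandwidth. The sketch below contrasts two of the named baselines under stated assumptions: a direct sliding window that stores the last W per-frame histograms and re-sums them every frame, and a cell-based variant that aggregates C consecutive frames into one cell histogram, stores only W/C cells, and keeps a running sum updated by adding the newest cell and subtracting the evicted one. The class names, the window and cell sizes, and this exact cell-based scheme are illustrative assumptions; the thesis's own proposed algorithm is not described in the abstract and is not reproduced here.

```python
from collections import deque

import numpy as np


class DirectSlidingWindow:
    """Stores the last `window` per-frame histograms and re-sums them each frame."""

    def __init__(self, window):
        self.frames = deque(maxlen=window)

    def push(self, frame_hist):
        self.frames.append(np.array(frame_hist, dtype=float))
        # O(window * bins) work per frame; memory holds `window` histograms.
        return np.stack(list(self.frames)).sum(axis=0)


class CellSlidingWindow:
    """Groups `cell` frames into one cell histogram and keeps a running window sum.

    Illustrative scheme (an assumption, not the thesis's design): memory holds
    window // cell cell histograms plus two accumulators, and the window only
    advances at cell boundaries.
    """

    def __init__(self, window, cell, bins):
        if window % cell != 0:
            raise ValueError("window must be a multiple of cell")
        self.cell = cell
        self.cells = deque(maxlen=window // cell)
        self.running = np.zeros(bins)  # sum of the stored cell histograms
        self.partial = np.zeros(bins)  # cell currently being filled
        self.count = 0

    def push(self, frame_hist):
        self.partial += frame_hist
        self.count += 1
        if self.count < self.cell:
            return None  # no new window sum until the cell is complete
        if len(self.cells) == self.cells.maxlen:
            self.running -= self.cells[0]  # subtract the cell about to be evicted
        self.cells.append(self.partial)  # deque drops the oldest cell itself
        self.running += self.partial
        self.partial = np.zeros_like(self.partial)
        self.count = 0
        return self.running.copy()
```

At every cell boundary both classes return the same window sum (up to floating-point rounding), but the cell-based version stores W/C instead of W histograms and does O(B) rather than O(W·B) work per update, at the cost of emitting a result only every C frames.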
