
動作辨識之三維梯度方向直方圖架構設計

Architecture Design of Histograms of 3D Gradient Orientations for Action Recognition

Advisor: Liang-Gee Chen (陳良基)

Abstract


As traditional digital image processing matures, computer vision, the field concerned with teaching machines to see the real world the way humans do, has become increasingly popular. In Chapter 1, we introduce a number of computer vision applications. Lower-level tasks such as object recognition and semantic segmentation make higher-level applications possible, for example intelligent surveillance systems, self-driving cars, and robots. We find that action recognition is a core technique for many of these applications. With action recognition, a surveillance camera can detect an emergency and notify the authorities, and a self-driving car can estimate a pedestrian's speed to decide whether to accelerate or stop. Because so many applications depend on action recognition, we propose a hardware architecture that addresses the problems it faces, with the aim of advancing the field of computer vision.

We survey related work on human action recognition and group the methods into three categories. From this survey, we identify two criteria for judging an action recognition system: its recognition accuracy and its processing time. Most action recognition algorithms achieve good accuracy, but they take so long to run that they cannot meet real-time requirements, even on low-resolution video. Chapters 1 and 2 explain these observations in detail, and they motivate us to implement action recognition on an ASIC. Our target is a high-frame-rate feature extraction engine for wearable devices. The hardware specification is defined in Chapter 2, and it is the highest among the comparable works.

After comparing feature extraction methods for action recognition, we choose the HOG3D descriptor as our feature. The original HOG3D algorithm, however, is not hardware-friendly. In Chapter 3, we therefore run experiments to select parameters and modify the algorithm to suit hardware implementation. Building on those results, we propose our architecture design in Chapter 4. The main contribution is removing the non-linear operations from the algorithm, which allows more data to be reused. In addition, parallel computing helps meet the real-time requirement, and sharing of computing resources reduces chip area. At the end of Chapter 4, we analyze the trade-off between on-chip memory and bus bandwidth and compare the four versions of the engine that we propose.
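
To make the idea of a histogram of 3D gradient orientations concrete, the following is a minimal Python sketch, not the architecture or the modified algorithm of the thesis. It assumes a tiny grayscale video stored as nested lists indexed `[t][y][x]`, and it quantizes each spatio-temporal gradient into six axis-aligned bins; the published HOG3D descriptor uses the faces of a regular polyhedron instead. The names `AXES`, `gradient_at`, and `orientation_histogram` are hypothetical. The sketch bins gradients using only multiplications, additions, and comparisons, which illustrates one way orientation binning can avoid trigonometric functions.

```python
# Hypothetical sketch: histogram of 3D gradient orientations for one video cell.
# The 6-bin axis set and all names are illustrative assumptions, not the thesis design.

# Bin directions: positive and negative x, y, and t axes (assumed for simplicity;
# the published HOG3D descriptor uses the faces of a regular polyhedron instead).
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def gradient_at(video, t, y, x):
    """Central-difference gradient (gx, gy, gt) at an interior voxel."""
    gx = video[t][y][x + 1] - video[t][y][x - 1]
    gy = video[t][y + 1][x] - video[t][y - 1][x]
    gt = video[t + 1][y][x] - video[t - 1][y][x]
    return gx, gy, gt


def orientation_histogram(video):
    """Accumulate the gradients of all interior voxels into a 6-bin histogram.

    Each gradient votes for the axis with the largest dot product, weighted by
    that dot product. Only multiplies, adds, and compares are used, so no
    arctangent or square root is needed.
    """
    hist = [0] * len(AXES)
    frames, rows, cols = len(video), len(video[0]), len(video[0][0])
    for t in range(1, frames - 1):
        for y in range(1, rows - 1):
            for x in range(1, cols - 1):
                g = gradient_at(video, t, y, x)
                dots = [a[0] * g[0] + a[1] * g[1] + a[2] * g[2] for a in AXES]
                best = max(range(len(AXES)), key=lambda i: dots[i])
                if dots[best] > 0:  # skip flat regions with zero gradient
                    hist[best] += dots[best]
    return hist


# A 3x3x3 volume whose intensity increases only along x.
ramp = [[[x for x in range(3)] for _ in range(3)] for _ in range(3)]
print(orientation_histogram(ramp))  # -> [2, 0, 0, 0, 0, 0]
```

In the example, the single interior voxel has gradient (2, 0, 0), so all of its weight falls into the +x bin.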

