
高幀率光流演算法引擎硬體架構設計

Architecture Design of High Frame Rate Optical Flow Engine

Advisor: 陳良基 (Liang-Gee Chen)

Abstract


With the rapid advance of technology, computer vision is gradually changing the way we live. Examples include the 3D movie experience; VR, which has opened up many application scenarios in gaming, healthcare, and personnel training; and devices such as Google Glass, which let us obtain relevant information immediately while looking at the scenery around us. We believe that in future development, surveillance systems around us could predict human actions and thereby prevent crimes before they happen; VR applications will move beyond passive viewing toward interaction with virtual objects and environments, where action prediction would give users a much smoother experience; and robots and self-driving cars, both currently under development, could use external cameras to predict human behavior, reacting in time when they detect that a person is about to fall or that dangerous behavior is imminent. To realize these future application scenarios, our team believes that accurate and fast action prediction will be an indispensable technology.

Action prediction can also be viewed as early action recognition: even without seeing the complete motion, we can easily predict how an action will unfold. Chapter 1 of this thesis reviews recent developments in action recognition and prediction and concludes that accuracy and computing speed are the two biggest challenges for action prediction. In current research, accuracy already approaches one hundred percent, but computing speed falls short of even the real-time requirement, let alone the specification needed for prediction. Among all stages, the optical flow computation in the action prediction pipeline is the most time-consuming, taking up more than half of the total runtime of the system. This thesis therefore proposes a "High Frame Rate Optical Flow Engine" that processes 240 HD frames per second, the highest-specification optical flow hardware accelerator to date. It not only far exceeds the requirements of the test data used for action prediction and recognition, but can also serve any other algorithm that relies on optical flow.

The main difficulty in optical flow computation is the aperture problem. A common remedy is grouping: matching a window of neighboring pixels together to increase confidence. However, this approach breaks down when the displacement vectors inside the matching window are inconsistent, which leads to errors. Chapter 2 introduces the many algorithms proposed in recent years to address this problem; these methods, however, are difficult to optimize in hardware. Chapter 3 compares our proposed algorithm with the alternatives: the difference in accuracy is small, but our algorithm allows substantial optimization when implemented in hardware.

Chapter 4 presents the area optimization of the hardware architecture. We propose computation reuse, data reuse, bit truncation, and lookup-table simplification. With only a slight loss of accuracy, these techniques reduce the hardware resources of our architecture to 20 percent of the original, greatly lowering manufacturing cost and power consumption so that the design can be mounted on mobile devices.
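To make the grouping idea above concrete, the C sketch below shows a minimal window-based matching flow estimator: a whole window of neighbors around a pixel is compared against displaced windows in the next frame, and the displacement with the smallest sum of absolute differences (SAD) is taken as the flow vector. The frame size, window size, and search range are hypothetical placeholders; this is a generic illustration of window-based matching, not the exact algorithm proposed in the thesis.

#include <stdlib.h>
#include <limits.h>

#define W      64   /* hypothetical frame width  */
#define H      64   /* hypothetical frame height */
#define BLK     8   /* matching window size      */
#define SEARCH  4   /* search range in pixels    */

/* Sum of absolute differences between the window at (x0, y0) in frame f0
 * and the window displaced by (dx, dy) in frame f1. Matching a whole
 * window of neighbors instead of a single pixel is the "grouping" step
 * used to fight the aperture problem. */
static int window_sad(const unsigned char f0[H][W], const unsigned char f1[H][W],
                      int x0, int y0, int dx, int dy)
{
    int sad = 0;
    for (int j = 0; j < BLK; ++j)
        for (int i = 0; i < BLK; ++i)
            sad += abs((int)f0[y0 + j][x0 + i] - (int)f1[y0 + j + dy][x0 + i + dx]);
    return sad;
}

/* Exhaustively search for the displacement with the smallest window SAD
 * and report it as the flow vector at (x0, y0). */
static void flow_vector(const unsigned char f0[H][W], const unsigned char f1[H][W],
                        int x0, int y0, int *best_dx, int *best_dy)
{
    int best = INT_MAX;
    *best_dx = 0;
    *best_dy = 0;
    for (int dy = -SEARCH; dy <= SEARCH; ++dy)
        for (int dx = -SEARCH; dx <= SEARCH; ++dx) {
            if (x0 + dx < 0 || y0 + dy < 0 ||
                x0 + dx + BLK > W || y0 + dy + BLK > H)
                continue;                    /* candidate window leaves the frame */
            int sad = window_sad(f0, f1, x0, y0, dx, dy);
            if (sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
}

If the true motion inside the window is not uniform (for example at an object boundary), the minimum-SAD displacement is a compromise between conflicting motions, which is exactly the failure case the grouping approach suffers from and that the algorithm of Chapter 3 is designed to handle.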

Abstract (English)


Computer vision has been developed for decades and has changed our lives profoundly. Thanks to technological progress, we have entered the era of big data and smart devices, and many new technologies, such as 3D printers, wearable devices, and light-field refocus cameras, have been invented in recent years. We introduce the technology trends of the past five years in Chapter 1, and we believe that action prediction could become one of the computing cores of many applications, such as home-care robots, autonomous cars, and automatic surveillance systems. Action prediction is irreplaceable in helping robots assist people and avoid accidents: predicting human actions can dramatically reduce the accident rate, for example, a self-driving car can stop promptly when it detects that a person is about to fall into the street. This is the motivation of this thesis; we hope our work can push current technology further.

We then survey the research on action prediction and profile the full system in Chapter 1. We find that both accuracy and computing speed are critical to bringing this technology into the real world. However, a full action prediction system may take more than 2 minutes to run, and the most time-consuming stage is the optical flow computation, which takes at least 1 minute for a VGA image, more than fifty percent of the full system. This speed is far too slow to determine an action early enough to count as prediction. Our work therefore provides a "High Frame Rate Optical Flow Engine Chip" to accelerate this basic but complicated computation. We introduce the difficulties of optical flow and the related work on both algorithms and architectures. The specification is explored in Chapter 4 and is the highest among comparable works in recent years.

In Chapter 3, we show the core idea that modifies the original optical flow algorithm to be hardware friendly without losing accuracy. We test many conditions to ensure that the accuracy of the modified algorithm matches the original. In short, we use a simple filter in the most complicated stage but a complex filter in the other stages to compensate for the results. We present the ideas and the details of mapping the algorithm to the architecture step by step in Chapter 4, and then optimize the architecture using computation reuse, weight quantization, a pipeline structure, and bit truncation. Computation reuse is the most influential optimization strategy of all: it reduces the area to 15 percent of the original, and it builds on the algorithm modification shown in Chapter 3. The final results are also reported in the thesis; the goal of the optimization is to build an area-efficient chip for this highly parallel architecture. To sum up, an area-efficient, high frame rate optical flow engine is designed. It can be used in many wearable devices with low-power and low-cost requirements, and it is a critical and necessary core for achieving action prediction.
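As an illustration of the computation reuse mentioned above, the following C sketch shows the generic sliding-window trick: when a BLK-wide matching window moves one pixel to the right, it shares all but one column with the previous position, so each new window sum needs only one addition and one subtraction instead of BLK additions. The array sizes are hypothetical, and this is only a sketch of the general reuse principle, not the datapath actually implemented in the thesis.

#define W   64   /* hypothetical row width */
#define BLK  8   /* matching window size   */

/* col_sum[x] holds the per-column absolute-difference sum over BLK rows;
 * win_sum[x] receives the window sum for the window starting at column x.
 * Neighboring windows overlap in BLK-1 columns, so their partial results
 * are reused rather than recomputed. */
void sliding_window_sums(const int col_sum[W], int win_sum[W - BLK + 1])
{
    int acc = 0;
    for (int i = 0; i < BLK; ++i)           /* first window: full BLK-term sum */
        acc += col_sum[i];
    win_sum[0] = acc;

    for (int x = 1; x <= W - BLK; ++x) {    /* later windows: add the entering column, drop the leaving one */
        acc += col_sum[x + BLK - 1] - col_sum[x - 1];
        win_sum[x] = acc;
    }
}

In hardware, this kind of overlap typically maps to shared partial sums and line buffers, which is in the spirit of the area reduction reported above when combined with weight quantization and bit truncation.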

