針對高效能智慧型視覺辨識系統之探討及設計

Exploration and Design of High-Efficiency Intelligent Visual Recognition Systems

Advisor: Liang-Gee Chen (陳良基)

Abstract


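Intelligent visual recognition plays an indispensable role in many of today's applications, such as smart vehicles, human-machine interaction systems, surveillance systems, and motion-sensing games. In this dissertation, we explore high-efficiency visual recognition systems developed with a specific machine-learning algorithm and with a generic sparse reconstruction algorithm. The dissertation is divided into two parts. In the first part, we present an intelligent vision-based vehicle detection and tracking recognition system built on computer vision and machine learning techniques. We derive the system specification from the requirements of vision-based intelligent vehicle applications. The system uses a machine-learning algorithm to achieve highly accurate detection. We propose an efficient feature tracking algorithm for multi-vehicle tracking. The proposed system suits a variety of automotive applications, achieving a detection rate above 90% in long-range applications and a tracking success rate of 99.1% in middle-range applications.
To meet real-time processing requirements, we implement the system in VLSI. We discuss the flow and methods of hardware design optimization, which reduce hardware resource requirements and consumption without affecting accuracy. We implement this intelligent vision automotive system chip in a 40nm semiconductor process. The die size is 3.0×3.1mm². The chip runs at a 220MHz clock, with core and I/O pin voltages of 0.9V and 2.5V, respectively. It achieves a power efficiency of 3.01TOPS/W and an area efficiency of 55.6GOPS/mm². The SoC supports simultaneous tracking of up to 64 objects. With the feature tracking processor, the chip improves power efficiency by 1.62x and the processing frame rate by at least 1.79x. For Haar-like object detection, the chip achieves a processing efficiency of 0.327fps/MHz at VGA resolution, which is 3.6 to 8.8 times that of current related object detection hardware architectures. To summarize the first part, we realize a system and its hardware architecture that reach 60fps at a 140-meter active distance and 300fps at a 60-meter active distance. The architecture supports image resolutions up to Quad-VGA (1280×960). The chip's average power consumption is 69mW, and it achieves a power efficiency of 354.2fps/W.
In the second part of the dissertation, we propose a sparse reconstruction algorithm and a hardware architecture for generic visual recognition systems. The algorithm uses the fundamental property of signal sparsity to represent objects sparsely. Signal sparsity can also be used for compressed sensing. The reconstruction kernels of sparse representation and compressed sensing can both be modeled as convex optimization problems. However, such a convex optimization problem incurs very high computational load and complexity. The quadratic form of this problem is known as the LASSO. For this problem, we propose an iterative reconstruction kernel algorithm based on the Homotopy algorithm. The method can take a previous reconstruction result as the starting point for recovering the current signal. This approach is called warm-start sparse reconstruction. It is suitable when temporal or spatial dependency exists between signals. The proposed method can rapidly reconstruct sparse signals under different dynamic changes. We also investigate algorithmic optimizations and fast computation methods based on the actual computation flow. Using the proposed reconstruction algorithm, we achieve object recognition and tracking, and we examine the processing-time benefit of the warm-start reconstruction algorithm.
We develop a versatile parallel hardware architecture for high-dimensional sparse signal reconstruction. The chip provides high-dimensional signal reconstruction capability for applications related to compressed sensing or sparse representation. It also delivers the real-time processing capability required by different visual recognition applications. The signal reconstruction platform is implemented in a 40nm semiconductor process. The die size is 3.7×3.7mm². Its average power consumption is 353.3mW at a 250MHz clock, with core and I/O pin voltages of 0.9V and 2.5V, respectively. We propose a high-throughput sensing matrix generation engine that delivers 4G coefficients/s (8Gbps). With this engine, the chip reduces the bandwidth requirement by more than 75% compared with reading the entire sensing matrix from off-chip, and it reduces total processing clock cycles by 77%. We propose a matrix factorization engine for solving linear equations. With this engine and the proposed incremental matrix updating flow, the time to solve linear equations is reduced by more than 57%. With 16 processing cores, the chip achieves a power efficiency of 401GFlops/W and an area efficiency of 10.4GFlops/mm². The chip supports different signal sparsity levels depending on the signal dimension. Compared with a software implementation, the chip achieves a 292x speedup for surveillance video reconstruction and at least a 200x speedup for visual object tracking. For Gaussian random signal reconstruction with a signal dimension of 2048, a measurement dimension of 1024, and a signal sparsity of 5%, the chip achieves a 1008x speedup.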

Parallel Abstract


Intelligent visual recognition plays an essential role in many applications today, such as smart automobiles, human-machine interaction, surveillance, and gaming. In this dissertation, we explore high-efficiency visual recognition systems developed with a specific machine-learning algorithm and with a generic sparse reconstruction algorithm. The dissertation is divided into two parts. In the first part, we present an intelligent vision-based recognition system that detects and tracks preceding on-road vehicles, built on computer vision and machine learning techniques. We discuss the system requirements and specification for vision-based automotive applications. Highly accurate detection is achieved with a machine-learning-based method. We present an efficient knowledge-based tracking algorithm for multi-vehicle tracking tasks. Our framework suits a wide range of automotive applications, yielding a detection rate above 90% at long range and a tracking success rate of 99.1% at middle range. To meet real-time requirements, we implement the system in VLSI. Architecture optimization is investigated to reduce hardware cost without significantly degrading accuracy. We present an intelligent vision SoC implemented in a 40nm CMOS process. The die size is 3.0×3.1mm². The chip achieves 3.01TOPS/W power efficiency and 55.6GOPS/mm² area efficiency. The system supports tracking of up to 64 objects. The proposed knowledge-based tracking processor improves power efficiency by 1.62x and increases the frame rate by at least 1.79x. For Haar-like object detection, the processing efficiency is 0.327fps/MHz normalized to VGA resolution, which is 3.6x to 8.8x higher than the state of the art. The architecture realizes a 140-meter active distance at 60fps and a 60-meter active distance at 300fps under Quad-VGA (1280×960) resolution. The chip achieves 354.2fps/W power efficiency with 69mW average power consumption.
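The Haar-like object detection mentioned above is commonly built on an integral image (summed-area table), which turns any rectangle sum into four table lookups. The following is a minimal sketch of that standard formulation. It is an assumption for illustration: the abstract does not say how the chip computes its features, and the function names and the toy 4×4 patch are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a zero first row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half (even width)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

# Hypothetical toy patch: bright left half, dark right half.
patch = np.array([[9, 9, 1, 1]] * 4, dtype=np.uint8)
ii = integral_image(patch)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
total = rect_sum(ii, 0, 0, 4, 4)
print(feature, total)
```

Because each feature costs a fixed number of lookups regardless of rectangle size, this formulation is a natural fit for cascaded detectors that evaluate many features per window.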
In the second part, we propose a sparse reconstruction algorithm and an architecture for generic visual recognition systems. The algorithm exploits the fundamental property of signal sparsity for sparse representation (SR) of object patterns and for decoding signals via compressed sensing (CS). The reconstruction kernels of both CS and SR can be modeled as convex optimization problems, which may incur high computational complexity. The quadratic form is known as the LASSO. We then develop a generic iterative reconstruction kernel based on the Homotopy algorithm. The method can recover a signal using a previously reconstructed result as the starting point, called warm-start, which is suitable for signals with temporal or spatial dependency. The proposed method can rapidly reconstruct a sparse signal under several dynamic modifications. We also explore algorithmic optimization methods for practical implementations. We then show a visual object tracking system that simultaneously performs object recognition. The system is designed using the proposed sparse reconstruction algorithm. The improvement in processing time from the warm-start algorithm is also examined. We develop a versatile universal architecture for high-dimensional sparse signal reconstruction. The chip supports high-dimensional sparse signal reconstruction for compressed sensing and sparse representation. It achieves real-time processing capability for various visual recognition applications. The versatile signal reconstruction platform is designed in a 40nm CMOS process. The die size is 3.7×3.7mm². It dissipates 353.3mW average power at 250MHz with 0.9V/2.5V core/IO voltage. A high-throughput sensing matrix generation engine delivering 4G entries/s (8Gbps) is proposed. With this engine, the chip reduces the bandwidth requirement by over 75% compared to loading the full sensing matrix from off-chip, and it reduces total processing cycles by 77%.
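The warm-start idea described above can be illustrated on the LASSO objective 0.5*||Ax - y||_2^2 + lambda*||x||_1. The sketch below is not the Homotopy-based kernel of the dissertation. It substitutes plain iterative soft-thresholding (ISTA) because that is short to write down, and it only shows that starting from a previous reconstruction takes fewer iterations than starting from zero when consecutive signals are similar. The dimensions, the value of `lam`, the 5% perturbation, and all names are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, y, lam, x_init, tol=1e-8, max_iter=20000):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1; return (x, iterations used)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = x_init.copy()
    for k in range(1, max_iter + 1):
        x_new = soft_threshold(x - step * (A.T @ (A @ x - y)), step * lam)
        if np.max(np.abs(x_new - x)) < tol:  # stop when the update is negligible
            return x_new, k
        x = x_new
    return x, max_iter

rng = np.random.default_rng(0)
n, m, k = 256, 128, 8  # signal length, measurements, nonzeros (assumed sizes)
A = rng.standard_normal((m, n)) / np.sqrt(m)

# Previous signal, and a current signal with the same support but slightly changed values.
support = rng.choice(n, size=k, replace=False)
x_prev_true = np.zeros(n)
x_prev_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
x_curr_true = x_prev_true.copy()
x_curr_true[support] *= 1.0 + 0.05 * rng.standard_normal(k)

lam = 0.01
x_prev_hat, _ = ista_lasso(A, A @ x_prev_true, lam, np.zeros(n))
y_curr = A @ x_curr_true
x_cold, cold_iters = ista_lasso(A, y_curr, lam, np.zeros(n))   # start from zero
x_warm, warm_iters = ista_lasso(A, y_curr, lam, x_prev_hat)    # start from previous result
print(f"cold start: {cold_iters} iterations, warm start: {warm_iters} iterations")
```

Both runs converge to the same LASSO solution. Only the starting point differs, which is why the saving depends on how strongly consecutive signals are correlated.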
A generic matrix factorization engine is proposed for solving linear equations. The proposed incremental matrix updating scheme reduces the processing time for solving linear equations by over 57%. The chip achieves 401GFlops/W power efficiency and 10.4GFlops/mm² area efficiency with the proposed 16 multiprocessing cores. The chip supports various sparsity levels according to signal dimensions. It yields a 292x speedup for surveillance video reconstruction and over 200x improvement in computing time for visual object tracking tasks, both compared to software implementations. For recovery of an arbitrary sparse signal with an identity basis and a Gaussian random sensing matrix, it achieves a 1008x speedup for a signal with N=2048, M=1024, and 5% sparsity.
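One common way to realize an incremental matrix update in sparse solvers is to extend a Cholesky factor when a single column joins the active set, instead of refactoring the whole Gram matrix. The sketch below shows that textbook update. It is an assumption for illustration only: the abstract does not state which factorization the engine uses, and `cholesky_append` and the test matrix are hypothetical.

```python
import numpy as np

def cholesky_append(L, A_active, a_new):
    """Extend the lower-triangular factor L of A_active.T @ A_active by one column a_new."""
    b = A_active.T @ a_new
    # Generic solve for brevity; a triangular solver would exploit the structure of L.
    w = np.linalg.solve(L, b)
    d = np.sqrt(a_new @ a_new - w @ w)  # positive when a_new is independent of A_active
    k = L.shape[0]
    L_new = np.zeros((k + 1, k + 1))
    L_new[:k, :k] = L
    L_new[k, :k] = w
    L_new[k, k] = d
    return L_new

rng = np.random.default_rng(1)
A = rng.standard_normal((64, 6))  # hypothetical active-set columns

# Grow the factor one column at a time, as an active set would grow.
L = np.linalg.cholesky(A[:, :1].T @ A[:, :1])
for j in range(1, A.shape[1]):
    L = cholesky_append(L, A[:, :j], A[:, j])

# Compare against factoring the full Gram matrix from scratch.
L_full = np.linalg.cholesky(A.T @ A)
max_diff = np.max(np.abs(L - L_full))
print(max_diff)
```

Each append costs on the order of k² operations for a k-column active set, whereas a fresh factorization costs on the order of k³, which is the kind of saving an incremental scheme aims for.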
