
Algorithm, VLSI Hardware Architecture and System Design for Smart Surveillance

Advisor: Shao-Yi Chien (簡韶逸)

Abstract


In recent years, surveillance systems have been widely deployed throughout our living environment, for example in airports, at intersections, in homes, and in office buildings, generating an enormous amount of surveillance video. Such a volume of video cannot be monitored by traditional human effort, so automated computer-based surveillance has become highly desirable. Moreover, given the rapid progress of semiconductor technology, a more efficient hardware computing platform and system architecture deserve to be designed and discussed: with a good design, a smart surveillance system can provide more accurate, more convenient, and more timely surveillance services at lower cost and with a smaller deployment footprint. This dissertation therefore discusses smart surveillance systems and contributes in four areas: architecture optimization of smart surveillance networks, computer vision algorithm design for smart surveillance, hardware architecture design of a high-performance vision processor for smart surveillance, and the design of a cooperative smart surveillance system.

For the architecture optimization of smart surveillance networks, we propose a content abstraction hierarchy for surveillance data and model smart surveillance networks as surveillance pipelines. With this model, we compare the possible network architectures and discuss their trade-offs in the timeliness of content analysis, the quality of the video available for analysis, system construction cost, deployment space, and so on. Based on this discussion, we choose the architecture built around smart surveillance cameras, that is, cameras with a built-in hardware engine for intelligent content analysis.

For the computer vision algorithms, according to the needs of video surveillance we study four important building blocks, namely video segmentation, video object tracking, video object description, and face detection with sharpness scoring; we identify the problems of existing algorithms and propose improvements. For video segmentation we adopt background subtraction, where handling dynamic backgrounds and deciding an accurate, robust threshold are the keys to segmentation quality. For these two problems we propose a multi-background registration method and a threshold decision algorithm suited to dynamic backgrounds; the two proposed methods also greatly reduce the memory required by the background model. For video object tracking, we observe that the apparent color features of an object often change with the lighting of the environment, and tracking an object accurately under such appearance changes is the key to applying tracking successfully in everyday surveillance environments. We therefore measure the similarity between the object model and the color histogram of each candidate region by computing the diffusion distance; to limit computation, one-dimensional color histograms are used. Experimental results show that the proposed method effectively overcomes illumination changes. In addition, we use the segmentation results (motion cues produced by moving objects) to strengthen tracking accuracy and to overcome background clutter, i.e., objects that resemble the background. For video object description, since conventional descriptors, such as the MPEG-7 descriptors, only describe video objects at a low level, we propose a higher-level description method: an object is first divided into meaningful parts, and features are then extracted from each part. A human object, for instance, can be divided into torso and limbs, with color features extracted from each part, so that a person can be described at a higher level such as "a person wearing a white shirt." Finally, for face detection and sharpness scoring, faces do not necessarily appear in a surveillance scene at all times, so a purely image-based face detector, such as AdaBoost with Haar-like features, wastes a large amount of computation. We therefore propose a face detection method that takes the segmentation results into account and runs detection only where foreground objects have been found in the frame. Facial features are then used to score the sharpness of the detected faces, so that the sharper faces can be kept as the final detection results.

For the hardware architecture of a high-performance vision processor, our analysis shows that a good embedded vision processor for smart surveillance applications must combine five properties: high performance, programmability, support for multiple data types, low cost, and low power consumption. We therefore propose an architecture with these five properties, a coarse-grained reconfigurable image stream processing architecture, together with the design techniques of subword-level parallelism, heterogeneous data streaming, and hardware sharing, to raise hardware performance while lowering power consumption and hardware cost. Chip implementation results show that our processor, the Reconfigurable Smart-camera Stream Processor, leads the state-of-the-art vision chips in the literature by large margins: 18.2x to 182x in on-chip memory usage, 3.8x to 74.2x in area efficiency, and 4.5x to 33.0x in power efficiency.

Finally, for the cooperative smart surveillance system, we present a prototype composed of fixed surveillance cameras and a mobile robot. A conventional surveillance system built only from fixed cameras tends to suffer from blind spots. In our system, the fixed cameras detect suspicious persons and, upon detection, send the person's appearance features and location to the robot, which then performs mobile, blind-spot-free tracking of that person. This reduces the blind spots of the surveillance system and further improves its security.
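As an illustration of the background-subtraction step described above, the following Python sketch keeps a running-average background model and derives the segmentation threshold from the mean absolute deviation of the current frame. The update rule, the scaling factor `k`, and the function names are illustrative assumptions only; they are not the multi-background registration and threshold decision algorithms proposed in the dissertation.

```python
def update_background(bg, frame, alpha=0.05):
    """Running-average background model (one value per pixel).
    alpha is an assumed learning rate, not the thesis's parameter."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def adaptive_threshold(frame, bg, k=2.0):
    """Illustrative threshold rule: mean absolute deviation scaled by k.
    Under a dynamic background the deviations grow, so the threshold
    rises automatically instead of being hand-tuned."""
    diffs = [abs(f - b) for f, b in zip(frame, bg)]
    return k * sum(diffs) / len(diffs)

def segment(frame, bg, threshold):
    """Mark pixels whose deviation from the background model
    exceeds the threshold as foreground."""
    return [abs(f - b) > threshold for f, b in zip(frame, bg)]
```

For example, with a flat background of gray value 10 and one bright pixel in the frame, only that pixel is labeled foreground, and the background model drifts slowly toward the new frame.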

Parallel Abstract (English)


In next-generation visual surveillance systems, content analysis tools will be integrated, and new design issues will arise related to system cost, deployment space, network loading, and system scalability. In this thesis, after a discussion in terms of surveillance pipelines, it is proposed to utilize a content abstraction hierarchy to relieve network loading and increase system scalability, and to integrate a hardware content analysis engine into a smart camera System-on-a-Chip (SoC) to reduce system cost and deployment space. As a result, the surveillance IP camera becomes a smart camera with embedded capabilities for automatic content analysis, and a network of such cameras becomes a smart surveillance network. Among the content analysis functions, video object segmentation and tracking are two important building blocks for smart surveillance, but several issues remain to be solved. First, threshold decision is a hard problem for background-subtraction-based video object segmentation. Second, several conditions make video object tracking hard to keep robust, such as non-rigid object motion, target appearance changes due to illumination changes, and background clutter. In this thesis, with the proposed improved threshold decision algorithm, the threshold for background-subtraction-based video object segmentation can be decided automatically and robustly under severe dynamic backgrounds. Moreover, the proposed threshold decision is based on a mechanism different from that of the segmentation itself, which prevents possible error propagation. For video object tracking, by using the diffusion distance for color histogram matching, the tracker can follow non-rigid moving objects under severe illumination changes, and, by using motion cues from video object segmentation, it is robust to background clutter.
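The diffusion-distance matching mentioned above can be sketched for one-dimensional histograms as follows: the histogram difference is repeatedly smoothed and downsampled, and the L1 norms at all scales are summed. This is a simplified toy rendition for illustration; the smoothing kernel, the layer count, and the function names are assumptions, not the exact formulation used in the thesis.

```python
def smooth_and_downsample(d):
    """Smooth with a [1, 2, 1]/4 kernel (edges replicated),
    then keep every other bin."""
    n = len(d)
    sm = []
    for i in range(n):
        left = d[i - 1] if i > 0 else d[i]
        right = d[i + 1] if i < n - 1 else d[i]
        sm.append((left + 2 * d[i] + right) / 4.0)
    return sm[::2]

def diffusion_distance(h1, h2, layers=3):
    """Sum of L1 norms of the histogram difference across scales.
    Unlike plain bin-to-bin L1, mass shifted to a nearby bin (as
    happens under illumination changes) is partly matched after
    smoothing, so nearby shifts cost less than distant ones."""
    d = [a - b for a, b in zip(h1, h2)]
    total = sum(abs(x) for x in d)
    for _ in range(layers):
        if len(d) < 2:
            break
        d = smooth_and_downsample(d)
        total += sum(abs(x) for x in d)
    return total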
The experimental results show that the presented algorithms are robust on several challenging sequences and that the proposed methods are effective approaches to these issues. Besides video object segmentation and tracking, two more content analysis functions are improved in this thesis: video object description, and face detection and scoring. For video object description, a new descriptor for human objects, the Human Color Structure Descriptor (HCSD), is proposed. Experimental results show that HCSD achieves better performance for human objects than the Scalable Color Descriptor and the Color Structure Descriptor of MPEG-7. For face detection and scoring, facial images with low resolution in surveillance sequences are hard to detect with traditional approaches. An efficient face detection and face scoring technique for surveillance systems is therefore proposed: it combines the spirit of image-based face detection with the essence of video object segmentation to filter out high-quality faces. The proposed face scoring technique, which is useful for surveillance video summarization and indexing, includes four scoring functions based on feature extraction and is integrated by a neural-network training system to select high-quality faces. Experiments show that the proposed algorithm effectively extracts low-resolution human faces, which traditional face detection algorithms cannot handle well, and that it can rank face candidates according to face scores, which reflect face quality. For the hardware content analysis engine, a 5.877 TOPS/W and 111.329 GOPS/mm^2 Reconfigurable Smart-camera Stream Processor (ReSSP) is implemented in 90 nm CMOS technology. A coarse-grained reconfigurable image stream processing architecture (CRISPA), along with the design techniques of heterogeneous stream processing (HSP) and subword-level parallelism (SLP), is implemented to accelerate the processing algorithms of smart-camera vision applications.
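Subword-level parallelism packs several narrow pixels into one wide datapath word so that one machine operation processes all of them. The following Python sketch emulates a 32-bit SWAR-style addition of four 8-bit pixel lanes using guard bytes; it illustrates the general technique only and is not the ReSSP datapath, whose mask constants and helper names here are assumptions.

```python
MASK_EVEN = 0x00FF00FF  # 8-bit lanes 0 and 2; lanes 1 and 3 act as carry guards
MASK_ODD = 0xFF00FF00   # 8-bit lanes 1 and 3; lanes 0 and 2 act as carry guards

def pack4(p0, p1, p2, p3):
    """Pack four 8-bit pixels into one 32-bit word."""
    return p0 | (p1 << 8) | (p2 << 16) | (p3 << 24)

def unpack4(word):
    """Recover the four 8-bit pixels from a 32-bit word."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]

def swp_add(a, b):
    """Add four 8-bit lanes with two 32-bit additions.
    Each masked addition leaves a zero guard byte above every active
    lane, so a carry out of one lane is masked away instead of
    corrupting its neighbour (lane results wrap modulo 256)."""
    even = ((a & MASK_EVEN) + (b & MASK_EVEN)) & MASK_EVEN
    odd = ((a & MASK_ODD) + (b & MASK_ODD)) & MASK_ODD
    return even | odd
```

A single `swp_add` thus replaces four independent 8-bit additions, which is the source of the throughput and memory savings that subword parallelism brings to pixel processing.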
With the CRISPA processor architecture and the HSP and SLP design techniques, ReSSP outperforms existing vision chips in many aspects of hardware performance. Moreover, the programmability of ReSSP enables it to support many high-level vision algorithms at high specifications, such as real-time full-HD video analysis. The implementation results show that the on-chip memory can be reduced by 94% with SLP memory sharing, and that the on-chip memory size, power efficiency, and area efficiency are 18.2x to 182x, 4.5x to 33.0x, and 3.8x to 74.2x better than those of state-of-the-art chips. Besides the algorithms and hardware proposed for a single smart camera, this thesis also presents a cooperative surveillance system with a cooperation scheme between fixed cameras and a mobile robot. The fixed cameras detect objects with background subtraction and locate them on a map with a homography transform. At the same time, the information about the target to track, including its position and appearance, is transmitted to the mobile robot. After a breadth-first search in a map represented as a Boolean array, the mobile robot finds the target in its view by a stochastic scheme using the given information, and then tracks the target to keep it in the robot's view wherever he or she goes. With this system, the dead spot problem of typical surveillance systems built only from fixed cameras is considered and resolved. Besides, the track initialization problem of typical tracking systems, i.e., how to decide which target of interest should be tracked, is also resolved at the system level by the proposed cooperation scheme.
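The robot's breadth-first search over a Boolean-array map can be sketched as follows. The 4-connected grid, the free/occupied convention, and the helper name `bfs_path` are assumptions made for illustration; the thesis does not specify these details in the abstract.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest 4-connected path on a Boolean occupancy grid.
    grid[r][c] is True when the cell is free; returns the list of
    cells from start to goal, or None when the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Because BFS expands cells in order of distance from the start, the first time the goal is dequeued the recovered path is a shortest one, which suits a robot moving over a uniform-cost floor map.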
