A Study on Applying Video Recognition Techniques to Intelligent Surveillance Systems

Vision Sensing Techniques for Intelligent Surveillance System

Advisor: Yi-Ping Hung (洪一平)

Abstract


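With the development of intelligent surveillance systems, image analysis and recognition have become their most important core technologies. Aiming to build a comprehensive intelligent surveillance system, this research proposes several front-end image recognition techniques: camera tampering detection, pedestrian and object detection, face alignment for pedestrians, abandoned object detection, pedestrian re-identification, and a human-machine interface for intelligent surveillance. An intelligent video surveillance system uses cameras as its main signal input and relies on computer vision to monitor automatically, so protecting the cameras is the first priority. We propose a real-time camera tampering detection technique that determines from the incoming video alone whether a camera has been deliberately occluded, redirected, defocused, or disconnected. The method detects key points in the image and monitors how they change; it has a low computational cost and produced stable results, with a low false alarm rate and high accuracy, on several real-world test videos.
With the cameras secured, we exploit a property of fixed-camera scenes: pedestrians appearing at a given position in the camera view share consistent features. By automatically sampling the pedestrians and objects seen by a camera, the method learns their discriminative features and trains multiple region-specific refined pedestrian detectors, each responsible for one local area of the camera view. Experiments show that this approach substantially improves detection accuracy over methods that use a single detector.
We also propose a pedestrian re-identification technique: given a photo of a suspect taken from one camera, it combines local and holistic appearance features to achieve high matching accuracy and quickly finds matching pedestrians among all pedestrian records in the camera network. Besides appearance, facial information is an indispensable visual cue. We propose a facial landmark alignment technique that builds a 3D face model offline from face training images with depth information and fits it to 2D images at detection time. Compared with current methods that use only planar 2D information, the additional 3D model information makes the alignment more accurate. Moreover, because a 3D face model is available, the head rotation is obtained directly after alignment, which gives the surveillance system more information about each pedestrian. For example, in an intelligent people-flow analysis system, face orientation and walking path can be used to estimate which areas and products pedestrians are paying attention to.
Beyond pedestrian detection and recognition, we also analyze human behavior in the camera view, taking abandoned object detection as an example. A finite-state machine over foreground/background states is built for every image pixel, and the pixel's state transitions over time determine whether a stationary foreground object has appeared in the scene. To analyze an abandonment event fully, we back-trace the trajectories of moving objects over a recent time window and verify whether the owner has actually moved away from the object, which reduces false alarms. On two public benchmark datasets (PETS2006 and AVSS2007) and our own NTU dataset, this method outperforms related work. Finally, building on these core techniques, we propose two advanced visualization methods that help operators quickly understand, view, and search all pedestrians and events across a multi-camera network.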
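The owner check described above can be illustrated with a small Python sketch. It assumes that a tracker already supplies time-stamped ground-plane trajectories for every person, that the moment the object was put down (`t_drop`) is known (for example, when the per-pixel state machine first flags the region), and that the owner is the person nearest the object at that moment. The abstract does not say how ownership or "moving away" is defined, so `find_owner`, `is_abandoned`, and every distance and time threshold here are hypothetical illustrations rather than the thesis's method.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
# Hypothetical track format: time-ordered (t_seconds, x_m, y_m) ground-plane samples.
Track = List[Tuple[float, float, float]]


def position_at(track: Track, t: float) -> Optional[Point]:
    """Last known position of a track at or before time t (None if not yet seen)."""
    pos = None
    for ti, x, y in track:
        if ti > t:
            break
        pos = (x, y)
    return pos


def find_owner(tracks: Dict[int, Track], obj: Point, t_drop: float,
               max_dist: float = 1.0) -> Optional[int]:
    """Back-trace to t_drop and pick the nearest person within max_dist (assumed rule)."""
    best_id, best_d = None, max_dist
    for tid, track in tracks.items():
        p = position_at(track, t_drop)
        if p is not None and math.dist(p, obj) <= best_d:
            best_id, best_d = tid, math.dist(p, obj)
    return best_id


def is_abandoned(tracks: Dict[int, Track], obj: Point, t_drop: float, t_now: float,
                 leave_dist: float = 3.0, leave_time: float = 30.0) -> bool:
    """True if the owner has stayed farther than leave_dist for at least leave_time."""
    owner = find_owner(tracks, obj, t_drop)
    if owner is None:
        return False  # no owner found: this sketch cannot verify, so it does not alarm
    away_since = None
    for t, x, y in tracks[owner]:
        if t < t_drop or t > t_now:
            continue
        if math.dist((x, y), obj) > leave_dist:
            if away_since is None:
                away_since = t  # start of the current "away" stretch
        else:
            away_since = None  # owner came back to the object; reset
    # If the owner's track ends while away (e.g. they left the scene), the stretch stays open.
    return away_since is not None and t_now - away_since >= leave_time


if __name__ == "__main__":
    tracks = {
        1: [(0, 0.0, 0.0), (10, 1.2, 0.2), (15, 5.0, 5.0), (20, 8.0, 8.0)],
        2: [(0, 20.0, 20.0), (10, 19.0, 19.0)],
    }
    print(is_abandoned(tracks, (1.0, 0.0), t_drop=10, t_now=50))
```

Back-tracing only to `t_drop` keeps ownership assignment cheap; a real system would also have to handle track fragmentation and several people standing near the object.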

Parallel Abstract


With the development of intelligent surveillance systems, video analysis and recognition technologies have become the most important core techniques in this field. To construct a more intelligent surveillance system, this research proposes a number of advanced video recognition technologies, including camera tampering detection, pedestrian detection, abandoned luggage detection, pedestrian re-identification, and an intelligent visualization interface. Video surveillance uses cameras as the primary input sensors for automatic monitoring, so protecting the cameras is the top priority. We propose a real-time camera tampering detection technique that uses video analysis to quickly detect whether a camera has been deliberately occluded, redirected, defocused, disconnected, or otherwise damaged. We first locate key points whose appearance is relatively stable; monitoring changes in these key points and in the scene structure then detects tampering events precisely and efficiently. Compared with existing methods, our method has a lower computational cost and achieves higher stability and accuracy. With the cameras protected, we propose scene-specific pedestrian detection and object classification. Our approach is location-based: it discovers scene-dependent discriminative features to identify foreground objects of different categories (e.g., pedestrians, bicycles, and vehicles). A similarity grouping procedure gathers more consistent training examples from a considerably larger neighboring area, and a specific pedestrian detector is trained for each grouped local area. Our approach significantly improves detection and classification compared with traditional generic object detectors and classifiers.
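The key-point idea above can be sketched as a survival test: record the stable reference key points once, then measure in each frame what fraction of them are still found at nearly the same position with nearly the same appearance. Occlusion, redirection, defocus, and a dead signal all wipe out most reference points, while a passerby hides only a few of them and only briefly. This is a minimal stdlib-only sketch that assumes key points and descriptors have already been extracted by some detector; `Keypoint`, `survival_ratio`, `TamperMonitor`, and all thresholds are hypothetical, and the sketch does not reproduce how the thesis distinguishes the individual tampering types.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Keypoint:
    x: float
    y: float
    desc: Tuple[float, ...]  # local appearance descriptor from any key-point detector (assumed)


def survival_ratio(reference: Sequence[Keypoint], current: Sequence[Keypoint],
                   max_shift: float = 3.0, max_desc_dist: float = 0.5) -> float:
    """Fraction of reference key points re-found near the same spot with similar appearance."""
    if not reference:
        return 1.0
    survived = sum(
        1 for r in reference
        if any(math.hypot(r.x - c.x, r.y - c.y) <= max_shift
               and math.dist(r.desc, c.desc) <= max_desc_dist
               for c in current)
    )
    return survived / len(reference)


class TamperMonitor:
    """Raise an alarm only after the survival ratio stays low for several consecutive frames."""

    def __init__(self, reference: Sequence[Keypoint], min_ratio: float = 0.4,
                 persist_frames: int = 15):
        self.reference = list(reference)
        self.min_ratio = min_ratio
        self.persist_frames = persist_frames
        self._low_frames = 0

    def update(self, current: Sequence[Keypoint]) -> bool:
        if survival_ratio(self.reference, current) < self.min_ratio:
            self._low_frames += 1
        else:
            self._low_frames = 0  # scene recovered (e.g. a passerby moved on)
        return self._low_frames >= self.persist_frames
```

Requiring several consecutive low-ratio frames (`persist_frames`) trades a little detection latency for fewer false alarms, and checking only a sparse set of key points keeps the per-frame cost low.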
We also propose an ensemble of invariant features (EIF) that properly handles color variations and changes in human pose and viewpoint when matching pedestrian images observed by different cameras. The proposed method is a direct method and requires no domain learning. The new features combine holistic and region-based features. The holistic features are extracted with a publicly available deep convolutional neural network (DCNN) pre-trained for generic object classification, whereas the region-based features are extracted with our proposed two-way Gaussian Mixture Model fitting (2WGMMF), which copes with self-occlusion and pose variations. In addition to appearance, facial information is an indispensable cue in video surveillance. We propose a 3D face alignment algorithm for 2D images based on the Active Shape Model (ASM). We train offline a 3D shape model with multiple view-based local texture models from a 3D database and then fit a face in a 2D image online with these models. The method brings additional depth information to the traditional 2D image alignment problem and achieves promising improvements over existing model-based and regression-based approaches. Because head poses and gaze directions are especially valuable to a surveillance system, the head pose is then estimated directly from the alignment result of the proposed 3D model. Building on robust pedestrian detection and re-identification, we also address event detection in surveillance cameras, taking abandoned luggage detection as an example because it is one of the most critical and challenging problems in video surveillance. We propose a complementary background model that combines short- and long-term background models to classify each pixel with a 2-bit code, where each bit marks the pixel as foreground or background under one of the two models.
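As an illustration of how holistic and region-based descriptors might be combined without any learned metric, in keeping with a direct method, the sketch below ranks a gallery by a weighted sum of two distances: cosine distance between holistic vectors (standing in for DCNN features) and the mean Euclidean distance between corresponding body-region descriptors (standing in for 2WGMMF outputs). Both descriptor types are assumed to be precomputed. The `Person` type, the fusion weight `alpha`, and dividing each distance by its per-query maximum are assumptions, since the abstract does not state how EIF merges its features.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Person:
    holistic: Tuple[float, ...]       # e.g. a precomputed DCNN embedding (assumed)
    regions: List[Tuple[float, ...]]  # one descriptor per body region, same order for everyone


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an empty descriptor as uninformative
    return 1.0 - dot / (na * nb)


def region_distance(a: List[Tuple[float, ...]], b: List[Tuple[float, ...]]) -> float:
    """Mean Euclidean distance between corresponding region descriptors."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def rank_gallery(probe: Person, gallery: Dict[str, Person], alpha: float = 0.5) -> List[str]:
    """Return gallery IDs from best to worst match under a hypothetical fusion rule."""
    if not gallery:
        return []
    ids = list(gallery)
    d_hol = [cosine_distance(probe.holistic, gallery[i].holistic) for i in ids]
    d_reg = [region_distance(probe.regions, gallery[i].regions) for i in ids]
    # Divide each distance by its per-query maximum so neither term dominates by units alone.
    max_h = max(d_hol) or 1.0
    max_r = max(d_reg) or 1.0
    fused = [alpha * h / max_h + (1.0 - alpha) * r / max_r for h, r in zip(d_hol, d_reg)]
    return [i for _, i in sorted(zip(fused, ids))]
```

Setting `alpha` closer to 1 trusts the holistic features more; region-wise matching mainly helps when part of the body is occluded or posed differently.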
We then introduce a finite-state machine framework that identifies static foreground regions from the temporal transitions of the code patterns and determines whether a selected region contains an abandoned object by analyzing the back-traced trajectories of the luggage owners. Experiments were conducted on video from the 2006 Performance Evaluation of Tracking and Surveillance (PETS2006) and 2007 Advanced Video and Signal-based Surveillance (AVSS2007) databases, and on an NTU data set we collected ourselves. The results show that the proposed approach is effective for detecting abandoned luggage and outperforms previous methods. Finally, based on the above core technologies, we propose two advanced visualization interfaces that let operators quickly observe and search pedestrian incidents within a camera network.
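The complementary background model and the per-pixel state machine can be sketched for a single grayscale pixel. This sketch assumes running-average backgrounds with a slow and a fast learning rate; the abstract does not say how the thesis builds its short- and long-term models, and the four states as well as `alpha_long`, `alpha_short`, `tau`, and `static_frames` are hypothetical choices. The high bit of the code marks foreground with respect to the long-term model and the low bit with respect to the short-term model, so a pixel covered by an object that has stopped moving settles into code `10`: the fast model has absorbed the object, the slow one has not.

```python
from enum import Enum


class State(Enum):
    BACKGROUND = 0
    MOVING = 1      # code 11: foreground for both models
    CANDIDATE = 2   # code 10 seen after motion, not yet held long enough
    STATIC = 3      # code 10 held for static_frames frames


class ComplementaryPixel:
    """Single-pixel sketch: long/short running-average backgrounds plus a small FSM."""

    def __init__(self, init: float, alpha_long: float = 0.001, alpha_short: float = 0.2,
                 tau: float = 30.0, static_frames: int = 20):
        self.b_long = float(init)   # slow model: remembers the empty scene for a long time
        self.b_short = float(init)  # fast model: quickly absorbs a newly stopped object
        self.alpha_long = alpha_long
        self.alpha_short = alpha_short
        self.tau = tau
        self.static_frames = static_frames
        self.state = State.BACKGROUND
        self.count = 0

    def code(self, value: float) -> int:
        """2-bit code: high bit = foreground vs. long-term model, low bit = vs. short-term."""
        long_fg = abs(value - self.b_long) > self.tau
        short_fg = abs(value - self.b_short) > self.tau
        # Update both backgrounds after classifying the current value.
        self.b_long += self.alpha_long * (value - self.b_long)
        self.b_short += self.alpha_short * (value - self.b_short)
        return (int(long_fg) << 1) | int(short_fg)

    def step(self, value: float) -> State:
        c = self.code(value)
        if c == 0b11:
            # Moving foreground. A confirmed static object stays static while occluded;
            # otherwise the pixel becomes MOVING and any candidate count restarts.
            if self.state is not State.STATIC:
                self.state, self.count = State.MOVING, 0
        elif c == 0b10:
            # Absorbed by the short-term model only: something stopped here.
            if self.state in (State.MOVING, State.CANDIDATE):
                self.count += 1
                self.state = State.STATIC if self.count >= self.static_frames else State.CANDIDATE
            # STATIC stays STATIC; from BACKGROUND, 10 without prior motion is treated as drift.
        else:
            # 00 (background) or 01 (uncovered background): nothing is left behind.
            self.state, self.count = State.BACKGROUND, 0
        return self.state
```

Requiring the pixel to pass through `MOVING` before counting `10` frames is what lets this state machine ignore slow drift such as gradual illumination change; the owner back-tracing step would then run only on regions whose pixels reach `STATIC`.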
