
The Structural Description of Pedestrians' Motion Behavior in Multi-Surveillance Videos

Advisor: 林春宏

Abstract


In surveillance systems that deploy large numbers of cameras, a large volume of unstructured video data is generated; these data are diverse and difficult to analyze. How to automatically analyze and reason over surveillance video will therefore be an important future research goal. The purpose of this study is to record the motion behavior of pedestrians in all surveillance videos, to use these records to convert pedestrian motion behavior into structured descriptions, and finally to build a relational table of all pedestrians' motion behavior, so that large volumes of surveillance video can be processed and analyzed easily in the future.

In the proposed structural description of pedestrian motion behavior in multiple surveillance videos, the videos are first calibrated, and then automatic pedestrian detection, tracking, re-identification, analysis, and description are performed. The system is divided into five processing stages. The first is camera calibration, which corrects image distortion. The second is object detection, which uses a convolutional neural network (CNN) to detect objects in the video and classify them as trucks, cars, scooters, bicycles, or pedestrians. The third is pedestrian tracking, which associates pedestrian objects across consecutive frames and depicts the trajectories of multiple objects in the surveillance footage. The fourth is pedestrian re-identification, which computes the similarity between sets of pedestrian objects appearing at different times and in different videos as the basis for re-identification. The fifth stage extracts each pedestrian's path in each camera, together with information such as movement or stillness, standing or not, moving path, direction, clothing, and walking time in each video, to build a relational table of pedestrian motion records across the multi-camera network.

To evaluate the effectiveness of the proposed method for pedestrian motion behavior in multiple surveillance videos, experiments and analysis are conducted on two image databases: the Multi-Camera Object Tracking (MCT) Challenge and the PETS 2009 Benchmark Data. The experiments include pedestrian detection, pedestrian identification, multi-pedestrian tracking under a single camera, and pedestrian re-identification across multiple surveillance videos; the resulting information is then organized and described as pedestrian motion behavior. The pedestrian detection experiments show that the CNN effectively overcomes problems such as object retention, lighting changes, similar colors, crowding, and incomplete objects. In the identification and re-identification experiments, accuracy is affected by differences in video resolution.

The proposed pedestrian detection method is more accurate than traditional moving-object detection methods. After tracking and re-identification, the relations of pedestrian objects across multiple surveillance cameras are integrated, and a relational table of pedestrian motion behavior in each surveillance video is built. Future work is expected to incorporate different neural network architectures to enhance the discriminative power of pedestrian re-identification and to analyze more pedestrian motion behaviors, reducing the manpower and time needed to find specific events.
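The fifth stage above builds a relational table of per-camera motion records. A minimal sketch of such a table is shown below; the `MotionRecord` fields and the grouping function are illustrative assumptions, not the thesis's actual schema.

```python
from dataclasses import dataclass

@dataclass
class MotionRecord:
    # One row of a (hypothetical) motion-record table: a single
    # pedestrian as observed by a single camera.
    pedestrian_id: int    # global ID assigned after re-identification
    camera_id: str        # which surveillance camera produced the record
    path: list            # sequence of (x, y) positions over time
    moving: bool          # False if the pedestrian stood still
    direction: str        # coarse heading, e.g. "north"
    clothing_color: str   # dominant clothing color
    start_time: float     # first appearance (seconds)
    end_time: float       # last appearance (seconds)

def build_association_table(records):
    """Group per-camera records by global pedestrian ID so one
    pedestrian's movements across the camera network can be queried."""
    table = {}
    for rec in records:
        table.setdefault(rec.pedestrian_id, []).append(rec)
    # sort each pedestrian's records chronologically across cameras
    for recs in table.values():
        recs.sort(key=lambda r: r.start_time)
    return table

records = [
    MotionRecord(7, "cam2", [(3, 4), (3, 6)], True, "north", "red", 12.0, 15.5),
    MotionRecord(7, "cam1", [(0, 0), (1, 2)], True, "north", "red", 3.0, 9.0),
]
table = build_association_table(records)
print([r.camera_id for r in table[7]])  # → ['cam1', 'cam2']
```

Keying the table on the re-identified global ID is what lets a single query recover one pedestrian's full route through the camera network.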

Abstract (English)


When a large number of surveillance cameras are deployed in a surveillance system, a large amount of unstructured data derived from the surveillance video is generated. These data do not follow a specified format and are therefore difficult to analyze. Determining how to achieve automated analysis and reasoning over surveillance video will thus become an important future research objective. The purpose of this research is to record the motion behavior of pedestrians in all surveillance videos, and then to use these records to convert the pedestrians' motion behavior into structured descriptions. Finally, a relational table of all pedestrians' motion behavior is built to facilitate future processing and analysis of pedestrian motion in a large surveillance video setup.

This paper describes the structured motion behavior of pedestrians in multi-surveillance videos. First, the surveillance videos are calibrated; then automatic detection, tracking, re-identification, analysis, and description of pedestrian movements are performed on the footage. The system is divided into five processing stages. The first processing stage is camera calibration, which corrects image distortion. The second is object detection, which utilizes a convolutional neural network to locate objects in the surveillance videos and classify them as trucks, cars, scooters, bicycles, or pedestrians. The third processing stage is pedestrian tracking, which associates pedestrian objects across consecutive frames and depicts the trajectories of multiple objects in the footage. The fourth processing stage is the re-identification of pedestrians in multiple surveillance videos, which computes the similarity between sets of pedestrian objects appearing at different times and in different cameras. In the final processing stage, each pedestrian's path in each camera is extracted, together with information such as whether the pedestrian is moving or standing, the walking path, direction, clothing color, and appearance time in each surveillance video, all geared toward building a correlation table of pedestrian motion records across the multi-camera surveillance network.

To evaluate the effectiveness of the proposed methods, two image databases are used for experiments and analysis: the Multi-Camera Object Tracking (MCT) Challenge and the PETS 2009 Benchmark Data. The experiments include pedestrian detection, pedestrian identification, multiple-pedestrian tracking under a single camera, and pedestrian re-identification across multiple cameras. All of this information is integrated, and the behavior of pedestrians in the multi-surveillance videos is described. The pedestrian detection experiments show that a convolutional neural network can effectively surmount such problems as object retention, changing lighting conditions, similar colors, crowded scenes, and incomplete objects. In the pedestrian identification and re-identification experiments, accuracy is affected by differences in the resolution of the test videos.

The proposed pedestrian detection method exhibits better accuracy than traditional dynamic-object detection methods. After pedestrian tracking and re-identification, the relationships between pedestrians across multiple surveillance videos are integrated, and the motion behaviors of pedestrians in the various videos are entered into a behavior association table. Future work is expected to incorporate different neural network architectures to enhance the discriminative power of pedestrian re-identification and to analyze more pedestrian behaviors, in order to reduce the manpower and time required to find specific events.
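The re-identification stage compares pedestrian detections across cameras by similarity. As a minimal illustration, assuming appearance feature vectors (e.g. color histograms) have already been extracted, cosine similarity can rank candidate matches; the thesis's actual features and similarity measure are not specified here.

```python
import math

def cosine_similarity(a, b):
    # Similarity of two appearance feature vectors; lies in [0, 1]
    # for non-negative features such as color histograms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_match(query, gallery):
    """Return the index of the gallery feature vector most similar to
    the query detection (a toy cross-camera re-identification step)."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return max(range(len(gallery)), key=scores.__getitem__)

# Query pedestrian seen in camera A; gallery of candidates from camera B.
query = [0.8, 0.1, 0.1]       # feature dominated by the "red" bin
gallery = [[0.1, 0.8, 0.1],   # mostly green
           [0.7, 0.2, 0.1],   # mostly red -> expected match
           [0.1, 0.1, 0.8]]   # mostly blue
print(best_match(query, gallery))  # → 1
```

In a full system this pairwise score would be computed between sets of detections accumulated over time, as the abstract describes, rather than between single frames.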

