Deep Learning-based Pedestrian Detection and Tracking for Fisheye Images

Advisor: Hsueh-Ming Hang (杭學鳴)

Abstract

Omnidirectional cameras (also known as 360-degree cameras) capture a much wider field of view than perspective cameras, which is useful for security surveillance and pedestrian flow analysis. A single fisheye camera can replace several ordinary cameras. However, fisheye lens distortion makes a fisheye image quite different from a perspective image. More importantly, the degree of distortion varies with the location of the object in the image. This distortion makes object detection considerably harder than in perspective images.

In this thesis, we adopt one of the state-of-the-art architectures, YOLOv3 [1], for pedestrian detection. To cope with fisheye distortion, we adopt the deformable convolution [27] technique and combine it with YOLOv3. We propose several modifications at different layers of the network and compare their performance. Our experiments use the PIROPO [2] dataset. Because this dataset does not provide bounding-box ground truth, we labeled about 6,000 of its images and call this subset the “PIROPO-mini” dataset. We train and test our proposed models on PIROPO-mini. With additional data augmentation steps, our best model achieves an AP (Average Precision) of about 90% to 92%. The experimental results show that our model significantly improves on the original YOLOv3 model for fisheye images.

Furthermore, we add a pedestrian tracking function to our system. We employ the Deep SORT algorithm, which combines Kalman filtering with a deep association metric. Because it is a tracking-by-detection method, we combine it with our proposed detection model. We evaluate the tracking system on the MW-18Mar dataset, where it shows promising pedestrian tracking results on fisheye videos.
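To make the AP figure above concrete, the following is a minimal sketch of how Average Precision can be computed for a single class such as "pedestrian". The abstract does not state the exact evaluation protocol, so the IoU threshold of 0.5, the all-point interpolation, and the names `iou` and `average_precision` are assumptions chosen for illustration, not the thesis's own code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Single-class AP with all-point interpolation (assumed protocol).

    detections:   list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0

    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp, fp = 0, 0
    precisions, recalls = [], []

    # Visit detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[img][best_idx]:
            matched[img][best_idx] = True
            tp += 1
        else:
            fp += 1  # poor overlap, or a duplicate hit on an already matched box
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_gt)

    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area under the resulting precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gt = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
    dets = [
        ("a", 0.9, (0, 0, 10, 10)),    # correct
        ("a", 0.8, (20, 20, 30, 30)),  # false positive
        ("b", 0.7, (1, 1, 10, 10)),    # correct, IoU 0.81
    ]
    print(average_precision(dets, gt))
```

In this toy case, two of the three detections match a ground-truth box, and the one false positive ranked between them pulls the AP below 1. A different IoU threshold or interpolation scheme would give somewhat different numbers.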

References


[1] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018.
[2] People in Indoor Rooms with Perspective and Omnidirectional Cameras (PIROPO) database. Available: https://sites.google.com/site/piropodatabase/
[3] Ricoh Theta V. Available: https://theta360.com/ct/about/theta/v.html
[4] Mirror Worlds Challenge. Available: https://icat.vt.edu/mirrorworlds/challenge/index.html
