
基於消失點轉換與深度學習對空拍影像進行行人偵測

Pedestrian Detection on Aerial Images Using Vanishing Point Transformation and Deep Learning

Advisors: 莊仁輝, 陳華總

Abstract


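As unmanned aerial vehicles (UAVs) become more common, more stable to operate, and more convenient, many applications now use them for high-altitude photography. However, few studies analyze UAV aerial images in depth. Objects occupy only a small proportion of an aerial image, and the pitch angle of the camera deforms them. Pedestrians are especially hard to detect: they are even smaller than other objects such as cars, and because of perspective projection, standing people appear tilted toward the two sides of the image. For aerial images, this thesis performs pedestrian detection with deep learning architectures, which currently perform well in image detection. Many stable and powerful architectures are available, such as Fast R-CNN, You Only Look Once (YOLO), and Single Shot MultiBox Detector (SSD), and all of them perform very well in the PASCAL Visual Object Classes (VOC) Challenges. Even with these architectures, however, some pedestrians remain undetected. To improve the detection results, we apply a vanishing point transformation to the image to correct the tilt of pedestrians, and we try partitioning the aerial image to address the small proportion that pedestrians occupy. Experimental analysis shows that these two pre-processing methods effectively improve the detection rate of deep learning.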
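
The tilt correction mentioned above can be illustrated with a projective warp. The sketch below is a minimal illustration, not the transformation used in the thesis: it assumes the vertical vanishing point `(vx, vy)` is already known in pixel coordinates with `vy != 0`, and it builds one textbook homography that sends that point to infinity along the y-axis, so every image line through it becomes vertical. The function names `rectifying_homography` and `apply_homography` and the sample coordinates are hypothetical.

```python
def rectifying_homography(vx, vy):
    """Return a 3x3 homography that maps the vanishing point (vx, vy, 1)
    to the ideal point (0, 1, 0), i.e. the vertical direction at infinity.

    Assumption: vy != 0. Points on the row y = 0 are left unchanged.
    """
    if vy == 0:
        raise ValueError("vy must be non-zero")
    return [
        [1.0, -vx / vy, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, -1.0 / vy, 1.0],
    ]


def apply_homography(H, x, y):
    """Map the pixel (x, y) through H and return Cartesian coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


# Hypothetical example: a vanishing point far below a 1920x1080 frame.
vx, vy = 960.0, 3000.0
H = rectifying_homography(vx, vy)

# Feet and head of a tilted pedestrian: both lie on a line through (vx, vy).
feet = (546.0, 480.0)
head = (500.0, 200.0)

feet_x, _ = apply_homography(H, *feet)
head_x, _ = apply_homography(H, *head)
# After the warp the two x-coordinates agree (up to rounding),
# so the pedestrian's axis is vertical in the warped image.
```

In practice the same matrix would be handed to an image-warping routine so that the detector sees the rectified frame; how the vanishing point itself is estimated is outside the scope of this sketch.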

Parallel Abstract


In recent years, drones have become much more stable to control, and because of their convenience, many applications now use drones for high-altitude shooting to obtain aerial images. However, little research analyzes aerial images. The main challenges of aerial image analysis are: (i) objects occupy a very small proportion of an aerial image, and (ii) objects in aerial images appear tilted because of perspective projection deformation. These challenges make it hard for object detection systems to locate objects. To deal with aerial images, this thesis uses a deep learning model to conduct pedestrian detection. There are many stable and robust deep learning models for object detection, such as Fast R-CNN, You Only Look Once (YOLO), and Single Shot MultiBox Detector (SSD). These deep learning models perform well in the PASCAL Visual Object Classes (VOC) Challenges. However, even when we use these models to detect pedestrians in aerial images, many pedestrians still cannot be detected. In this thesis, we use a vanishing point-based image transformation to correct the perspective projection deformation, and we use image partition to address the small proportion that objects occupy. Our experiments show that these two pre-processing methods effectively improve the detection rate of deep learning models.
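
Image partition, as described above, can be sketched as cutting the frame into overlapping tiles, running the detector on each tile, and shifting each resulting box back into full-image coordinates. The tile size, the overlap, and the names `tile_origins`, `tile_windows`, and `to_full_image` are illustrative assumptions; the abstract does not specify how the images were partitioned.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles of size `tile` covering [0, length) on one axis.

    Consecutive tiles share at least `overlap` pixels; the last tile is
    shifted back so that it ends exactly at the image border.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)
    return origins


def tile_windows(width, height, tile=640, overlap=128):
    """Return (x0, y0, x1, y1) windows covering a width x height image."""
    return [
        (x0, y0, min(x0 + tile, width), min(y0 + tile, height))
        for y0 in tile_origins(height, tile, overlap)
        for x0 in tile_origins(width, tile, overlap)
    ]


def to_full_image(box, window):
    """Shift a box (x0, y0, x1, y1) from tile to full-image coordinates."""
    dx, dy = window[0], window[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


windows = tile_windows(1920, 1080)
# A hypothetical detection inside the last tile, in tile coordinates.
box_in_tile = (10, 20, 30, 70)
box_in_image = to_full_image(box_in_tile, windows[-1])
```

Each tile shows pedestrians at a larger relative scale than the full frame does, which is the point of partitioning. A pedestrian in an overlap region may be reported by two tiles, so a merging step such as non-maximum suppression would typically follow.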

