Real-Time Pedestrian Tracking in Aerial Videos Based on Deep Learning

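In recent years, owing to their excellent maneuverability and their ability to fly at altitude, drones have become increasingly common in security surveillance and everyday photography, and pedestrian tracking is a fundamental and essential research direction within these applications. However, relatively little research has targeted aerial footage, and general computer vision algorithms often perform poorly on aerial images because of the different shooting angle. This paper adopts the tracking-by-detection pipeline, using the deep learning detection model YOLOv3 and the tracking framework SORT as its basic architecture. YOLOv3 is fine-tuned on an aerial image dataset that we built, yielding a pedestrian detection model suited to aerial images. In addition, by considering both target motion and appearance information, the method substantially improves SORT's ability to re-identify targets. The proposed method supports multi-object tracking and runs at about 30 FPS. The test set consists of the pedestrian tracking videos from UAV123, an aerial tracking benchmark, and experimental results show that the method successfully tracks nearly 90% of the targets in the test set.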
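The abstract says that motion and appearance are considered together when associating detections with existing tracks, but it does not give the exact cost function. The sketch below is therefore only an illustration under stated assumptions: each track's box is assumed to already be its motion-predicted box (as in SORT), each detection is assumed to carry an appearance embedding, and the weighted sum `lam * (1 - IoU) + (1 - lam) * (1 - cosine)` with the gate `max_cost` is a hypothetical choice, not the authors' formula. SORT solves the assignment with the Hungarian algorithm; here an exhaustive search over permutations stands in for it, which gives the same optimal matching but is only practical for a handful of tracks. All function names are hypothetical.

```python
import itertools
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u, v):
    """Cosine similarity of two appearance embeddings (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def min_cost_assignment(cost):
    """Exhaustive minimum-cost one-to-one matching of rows to columns.

    Stand-in for the Hungarian algorithm; exponential, so tiny inputs only.
    """
    n_rows = len(cost)
    n_cols = len(cost[0]) if n_rows else 0
    best_total, best_pairs = math.inf, []
    if n_rows <= n_cols:
        for cols in itertools.permutations(range(n_cols), n_rows):
            pairs = list(enumerate(cols))
            total = sum(cost[r][c] for r, c in pairs)
            if total < best_total:
                best_total, best_pairs = total, pairs
    else:
        for rows in itertools.permutations(range(n_rows), n_cols):
            pairs = [(r, c) for c, r in enumerate(rows)]
            total = sum(cost[r][c] for r, c in pairs)
            if total < best_total:
                best_total, best_pairs = total, pairs
    return sorted(best_pairs)


def associate(track_boxes, track_feats, det_boxes, det_feats,
              lam=0.5, max_cost=0.7):
    """Match tracks to detections with a combined motion + appearance cost.

    lam and max_cost are assumed values, not taken from the paper.
    """
    cost = [[lam * (1.0 - iou(tb, db))
             + (1.0 - lam) * (1.0 - cosine_similarity(tf, df))
             for db, df in zip(det_boxes, det_feats)]
            for tb, tf in zip(track_boxes, track_feats)]
    # Keep only assigned pairs whose combined cost passes the gate.
    matches = [(t, d) for t, d in min_cost_assignment(cost) if cost[t][d] <= max_cost]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(len(track_boxes)) if t not in matched_t]
    unmatched_dets = [d for d in range(len(det_boxes)) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    matches, lost, new = associate(
        track_boxes=[(0, 0, 10, 10), (20, 20, 30, 30)],
        track_feats=[(1.0, 0.0), (0.0, 1.0)],
        det_boxes=[(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)],
        det_feats=[(0.0, 1.0), (1.0, 0.0), (0.6, 0.8)],
    )
    print(matches, lost, new)  # [(0, 1), (1, 0)] [] [2]
```

In this toy run both tracks are re-matched to the detections that overlap them and share their appearance, while the distant third detection is left unmatched and would start a new track. Adding the appearance term is what allows a track whose predicted box has drifted away (IoU near zero) to still be matched when its embedding is very similar, which is the kind of re-identification gain the abstract describes.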

Keywords

Drone; Pedestrian tracking; Deep learning; Real-time; Aerial videos

Parallel Abstract (English)

Recently, drones have found a growing range of applications in surveillance systems and daily life due to their maneuverability and flying ability, and pedestrian tracking is a fundamental and necessary research field within these applications. However, research on aerial images and videos remains limited. Common computer vision algorithms tend to perform poorly on aerial images because of the different shooting angles. The proposed method follows the tracking-by-detection paradigm, combining YOLOv3, a deep learning based object detection model, and SORT, an object tracking algorithm, as our basic framework. First, we fine-tune YOLOv3 on our aerial dataset to improve its detection accuracy on aerial images. Second, we consider both motion and appearance information to increase the re-identification capability of SORT. The proposed method can track multiple objects at around 30 FPS. The test dataset consists of the pedestrian tracking videos from UAV123, a tracking dataset of video sequences captured from aerial viewpoints. Experimental evaluation shows that the proposed method successfully tracks about 90% of the targets in the test dataset.
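The motion information that SORT contributes comes from a Kalman filter with a constant-velocity model. In the original SORT formulation each track's state is [u, v, s, r, u̇, v̇, ṡ]: box centre (u, v), area s, aspect ratio r (held constant), and the velocities of u, v and s. The sketch below shows only the conversion between corner boxes and this state and the mean of the predict step, including SORT's guard that stops the predicted area from becoming non-positive; covariance propagation and the Kalman update are deliberately omitted, and the function names are hypothetical rather than taken from the paper or from any library.

```python
import math


def bbox_to_z(box):
    """(x1, y1, x2, y2) -> [u, v, s, r]: centre, area, aspect ratio w / h."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [x1 + w / 2.0, y1 + h / 2.0, float(w * h), w / float(h)]


def z_to_bbox(z):
    """[u, v, s, r, ...] -> (x1, y1, x2, y2); extra velocity terms are ignored."""
    u, v, s, r = z[:4]
    w = math.sqrt(s * r)  # since s = w * h and r = w / h
    h = s / w
    return (u - w / 2.0, v - h / 2.0, u + w / 2.0, v + h / 2.0)


def predict_mean(state):
    """Constant-velocity prediction of the state mean, one frame ahead.

    state = [u, v, s, r, du, dv, ds]; the aspect ratio r stays fixed.
    """
    u, v, s, r, du, dv, ds = state
    if s + ds <= 0:
        ds = 0.0  # stop the area velocity so the predicted area stays positive
    return [u + du, v + dv, s + ds, r, du, dv, ds]


if __name__ == "__main__":
    # A 10 x 20 box moving 2 px right and 1 px up per frame, constant size.
    state = bbox_to_z((0, 0, 10, 20)) + [2.0, -1.0, 0.0]
    print(z_to_bbox(predict_mean(state)))  # (2.0, -1.0, 12.0, 19.0)
```

The predicted box is what the association step compares against new YOLOv3 detections; parametrizing by area and aspect ratio rather than width and height lets the filter model scale change with a single velocity term.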

Parallel Keywords (English)

Drone; Pedestrian tracking; Deep learning; Real-time; Aerial videos
