
Temporal Action Detection Based on Hierarchical Object Detection Networks

Advisors: 蔡文錦, 陳華總

Abstract


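Temporal action detection means that, given a video containing multiple actions, one must not only identify which action classes occur but also precisely localize when each action happens, including its start and end times. With the development of deep learning, much research has moved from conventional computer vision methods to deep learning approaches, which has brought considerable progress to temporal action detection. Temporal action detection has many applications, such as video surveillance and video retrieval. In this thesis, we argue that information about the objects appearing in a frame is of great help in detecting actions. We therefore do not use three-dimensional convolutional networks to generate video features; instead, we propose an architecture built from two object detection networks: the first network detects the objects appearing in each frame, and the second network performs action detection. We also propose a data conversion method that stacks the first network's detection results along the time axis to form a new type of six-channel data carrying both spatial and temporal information, which serves as the input to the second network. Experiments confirm that our method achieves good results.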
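To make the task definition concrete, the following is a minimal sketch of what a temporal action detection output looks like and how its localization quality is commonly scored. The abstract does not state the evaluation metric; temporal intersection-over-union (tIoU) is assumed here because it is the usual choice for this task. The `ActionSegment` type, the `temporal_iou` helper, and the sample values are hypothetical and only for illustration.

```python
from typing import NamedTuple


class ActionSegment(NamedTuple):
    """One detected or annotated action (hypothetical type for illustration)."""
    label: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Overlap of two time intervals divided by the length of their union."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# Illustrative values only, not results from the thesis.
predicted = ActionSegment("HighJump", 12.0, 18.0)
annotated = ActionSegment("HighJump", 14.0, 20.0)
print(temporal_iou(predicted, annotated))  # 4 s overlap / 8 s union = 0.5
```

A prediction typically counts as correct when its label matches the annotation and its tIoU exceeds a chosen threshold.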

Parallel Abstract


With the development of deep learning, temporal action detection has made great progress. Many approaches now rely on deep learning rather than conventional computer vision methods. Temporal action detection has many applications, such as video surveillance and video retrieval. Considering that some actions can be recognized from information about the objects appearing and moving in a video, this thesis proposes a hierarchical model consisting of two object detection networks. The first network detects objects in each frame, and the second performs temporal action detection. We also propose a method that converts the object detection results of the first network into a new type of data that can be fed to the second network. This new type of data is a six-channel image carrying spatiotemporal information, which benefits temporal action detection. We conduct experiments on THUMOS14, a dataset for temporal action detection, and our approach achieves satisfactory performance.
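The conversion step could look roughly like the sketch below. The abstract does not say what the six channels contain or how the image axes are laid out, so everything specific here is an assumption: each detection is taken to be a YOLO-style tuple of six values (class id, confidence, box center x, box center y, width, height), rows index object slots, and columns index frames. The `stack_detections` function and its `max_objects` parameter are hypothetical names, not the authors' implementation.

```python
from typing import List, Tuple

# Assumed per-detection layout (not specified in the abstract):
# (class_id, confidence, x_center, y_center, width, height), coordinates in [0, 1].
Detection = Tuple[float, float, float, float, float, float]

NUM_CHANNELS = 6  # six channels as stated in the abstract; their meaning is assumed


def stack_detections(frames: List[List[Detection]],
                     max_objects: int) -> List[List[List[float]]]:
    """Stack per-frame detections into a nested list shaped
    (max_objects, num_frames, 6): rows are object slots, columns are time."""
    num_frames = len(frames)
    # Start from an all-zero image; slots with no detection stay zero.
    image = [[[0.0] * NUM_CHANNELS for _ in range(num_frames)]
             for _ in range(max_objects)]
    for t, detections in enumerate(frames):
        # Keep the most confident detections if a frame has too many.
        ranked = sorted(detections, key=lambda d: d[1], reverse=True)[:max_objects]
        for slot, detection in enumerate(ranked):
            image[slot][t] = list(detection)
    return image


# Three frames of made-up detections; the last frame has none.
frames = [
    [(0.0, 0.9, 0.50, 0.5, 0.2, 0.4)],
    [(0.0, 0.8, 0.55, 0.5, 0.2, 0.4), (3.0, 0.6, 0.10, 0.2, 0.1, 0.1)],
    [],
]
image = stack_detections(frames, max_objects=2)
print(len(image), len(image[0]), len(image[0][0]))  # prints: 2 3 6
```

Under this layout, an action appears as a horizontal pattern along the time axis, which is the kind of structure a second object detection network could localize as a start and end position.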

