Video Action Detection Based on Hierarchical Object Detection Networks

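Temporal action detection means that, given a video containing multiple actions, one must not only identify which action classes it contains but also precisely localize when each action occurs, including its start and end times. With the development of deep learning, many studies have moved from conventional computer vision methods to deep learning approaches, which has brought great progress to temporal action detection. Temporal action detection has many applications, such as video surveillance and video retrieval. In this thesis, we argue that information about the objects appearing in the images is very helpful for detecting actions. Therefore, instead of using a three-dimensional convolutional network to generate video features, we propose an architecture built from two levels of object detection networks: the first-level network detects the objects appearing in each frame, and the second-level network performs action detection. We also propose a data conversion method that stacks the first-level detection results along the temporal axis to form a new type of six-channel data carrying both spatial and temporal information, which serves as the input to the second-level network. Experiments confirm that our method achieves good results.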
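The data conversion step can be pictured with a small sketch. The abstract does not say what the six channels contain, so the layout below (box center x, box center y, width, height, confidence, class id) is purely an assumption made for illustration, as are the names `Detection`, `stack_detections`, and `max_objects`. The sketch only shows the general idea of stacking per-frame detections along the time axis into an image-like array; it is not the thesis's actual implementation.

```python
from typing import List, Tuple

# Hypothetical detection record (an assumption, not taken from the thesis):
# (x_center, y_center, width, height, confidence, class_id),
# with box coordinates normalised to [0, 1].
Detection = Tuple[float, float, float, float, float, float]

NUM_CHANNELS = 6  # the abstract states six channels; their meaning is assumed here


def stack_detections(
    frames: List[List[Detection]], max_objects: int
) -> List[List[List[float]]]:
    """Stack per-frame detections into a T x max_objects x 6 nested list.

    Rows follow the time axis (one row per frame), columns are detection
    slots, and each "pixel" holds the six values of one detection.
    """
    image: List[List[List[float]]] = []
    for detections in frames:
        # Keep the most confident detections so every row has a fixed width.
        kept = sorted(detections, key=lambda d: d[4], reverse=True)[:max_objects]
        row = [list(d) for d in kept]
        # Pad frames with fewer detections using all-zero pixels.
        row += [[0.0] * NUM_CHANNELS for _ in range(max_objects - len(row))]
        image.append(row)
    return image


# Usage: three frames with one, two, and zero detections respectively.
frames = [
    [(0.50, 0.50, 0.20, 0.40, 0.90, 0.0)],
    [(0.52, 0.50, 0.20, 0.40, 0.80, 0.0), (0.10, 0.10, 0.05, 0.05, 0.95, 3.0)],
    [],
]
image = stack_detections(frames, max_objects=2)
```

Under these assumptions, the resulting array has one row per frame, so a second detection network could treat spans of rows as the temporal extent of an action, in the same way a standard object detector treats regions of an image.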

Keywords

deep learning; temporal action detection; convolutional neural network

Parallel Abstract

With the development of deep learning, temporal action detection has made great progress. Many approaches now rely on deep learning rather than conventional computer vision techniques. Temporal action detection has many applications, such as video surveillance and video retrieval. Considering that some actions can be recognized from the objects that appear and move in a video, this thesis proposes a hierarchical model consisting of two object detection networks. The first network detects objects in each frame, and the second performs temporal action detection. We also propose a method that converts the object detection results of the first network into a new type of data that can be fed to the second network. This new data is a six-channel image carrying spatiotemporal information, which benefits temporal action detection. We conduct experiments on THUMOS14, a dataset for temporal action detection, and our approach achieves satisfactory performance.

Parallel Keywords

deep learning; temporal action detection; convolutional neural network

