Using Deep Learning to Build an Efficient Behavior Recognition Model

Recognizing human behavior in video is a challenging and important task with a wide range of applications, such as detecting abnormal events in automated surveillance systems, sports analysis, and video classification. This study optimizes and modifies the 3D ResNet-18 model to propose a simpler modular architecture with fewer hyperparameters. Experimental results on the KTH and UCF-101 datasets show that the proposed algorithm achieves Top-1 accuracies of 96.3% and 60.01%, respectively. Compared with the original 3D ResNet-18 model, the improved model extracts more effective features and recognizes human behaviors more accurately.

Keywords

Deep learning; 3D convolution; residual network; behavior recognition; video classification
