Applying Deep Learning Techniques to Build a Human Behavior Recognition Model

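Recognizing human behavior in video is a challenging task. Depending on how features are extracted, existing methods fall into two categories: those based on hand-crafted features and those based on automatically learned features. Human behavior recognition, as a branch of video recognition, is gradually being applied in everyday life, for example in abnormal-event detection for automated surveillance systems, sports analysis, and video retrieval and classification on video streaming platforms. While existing methods achieve a certain level of accuracy, they also suffer from cumbersome processing steps, time-consuming feature extraction, and inefficient execution. A three-dimensional convolutional neural network operates on the three-dimensional coordinates of the spatiotemporal domain. Because this matches the spatiotemporal structure of video itself, it can capture more features, which makes 3D convolutional neural networks effective for extracting spatial and temporal features from video. This study therefore takes 3D ResNet-18 as the base model and improves it with the split-transform-merge strategy of ResNeXt, proposing a simple modular architecture that requires little hyperparameter tuning. Experiments on the KTH and UCF-101 datasets show that the improved algorithm reaches Top-1 accuracies of 96.3% and 60.01%, respectively. Compared with the original 3D ResNet-18, the improved model extracts features of human behavior from raw video more effectively and improves recognition performance.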

Keywords

deep learning; 3D convolution; residual network; behavior recognition; image classification

Parallel Abstract

Recognizing human behavior in video is a challenging task. Methods can be divided into two categories according to how features are extracted: those based on hand-crafted features and those based on automatically learned features. Human behavior recognition is gradually being used in daily life, for example in abnormal-event detection in automated surveillance systems, sports analysis, and video retrieval and classification on video playback platforms. A 3D convolutional neural network operates on the three-dimensional coordinates of the spatiotemporal domain. Because this conforms to the spatiotemporal structure of video itself, it can capture more features, which shows that 3D convolutional neural networks are effective for spatial and temporal feature extraction. This research therefore optimizes and improves a model based on 3D ResNet-18, using the split-transform-merge strategy of ResNeXt to propose a simple modular architecture that requires little hyperparameter tuning. Experimental results on the KTH and UCF-101 datasets show that the improved algorithm achieves Top-1 accuracies of 96.3% and 60.01%, respectively. Compared with the original 3D ResNet-18, the improved model extracts features of human behavior from the original video more effectively and improves recognition performance.

Parallel Keywords

deep learning; 3D convolution; residual network; behavior recognition; image classification
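
The split-transform-merge strategy named in the abstract is commonly realized as a grouped convolution, where the input channels are split into independent groups, each group is transformed by its own filters, and the outputs are merged. The following is a minimal sketch of why this reduces the weight count of a 3D convolution layer. The function name, the channel width of 128, and the group count of 32 are illustrative assumptions and are not taken from the thesis; bias terms are ignored.

```python
def conv3d_weight_count(in_channels, out_channels, kernel=(3, 3, 3), groups=1):
    """Number of weights in a 3D convolution layer, excluding bias.

    With groups > 1, each output channel only sees in_channels / groups
    input channels, which is the "split" step of split-transform-merge.
    """
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    k_t, k_h, k_w = kernel  # temporal depth, height, width of the kernel
    return (in_channels // groups) * out_channels * k_t * k_h * k_w


# Hypothetical layer sizes chosen only for illustration.
dense_weights = conv3d_weight_count(128, 128)
grouped_weights = conv3d_weight_count(128, 128, groups=32)

print(dense_weights, grouped_weights, dense_weights // grouped_weights)
```

For the same input and output width, the grouped layer uses a fraction 1/groups of the weights of the dense layer, which leaves room to widen the block at a comparable parameter budget.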

