Attribute-based Siamese Temporal Network for Real-Time Emotion Recognition

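Predicting continuous, spontaneous changes in facial emotion is an important research topic in computer vision, because understanding real-time, subtle emotional changes benefits many applications in human-computer interaction and medical monitoring. In this paper, we focus on analyzing the temporal dynamics of two emotion dimensions, valence and arousal. We propose an attribute-based Siamese temporal network that comprises a discrete emotion CNN model and a stacked long short-term memory model (Stacked-LSTM). With these two models, we combine spatial facial feature information with long-term dynamics to improve prediction. The discrete emotion CNN model extracts emotion-related features that are unaffected by variations in pose and individual identity, while the Stacked-LSTM learns the dynamic dependency of emotions along the temporal domain. In addition, to stabilize training and thereby obtain smoother, more reliable long-term predictions, we feed two temporally shifted videos simultaneously into the Siamese (two-channel) network architecture. Experimental results on AVEC2012 show that the proposed method not only predicts in real time (40.1 frames per second on average) but also achieves the best results reported to date on the AVEC2012 dataset while using visual information alone.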

Keywords

Emotion Recognition; Temporal Network; Convolutional Neural Network

Parallel Abstract

Predicting continuous facial emotions is essential to many applications in human-computer interaction. In this paper, we focus on predicting two emotion dimensions, valence and arousal, to interpret dynamic yet subtle changes in facial emotion. We propose an Attribute-based Siamese Temporal Network (AST-Net), which includes a discrete emotion CNN model and a Stacked-LSTM, to incorporate both spatial facial attributes and long-term dynamics into the prediction. The discrete emotion CNN model aims to extract attribute-related but pose- and identity-invariant features, and the Stacked-LSTM is used to characterize the dynamic dependency along the temporal domain. Furthermore, to stabilize the training procedure and to derive smoother and more reliable long-term predictions, we propose to jointly learn the model from two temporally shifted videos under the Siamese network architecture. Experimental results on the AVEC2012 dataset show that the proposed AST-Net not only runs in real time (40.1 frames per second) but also achieves state-of-the-art performance even when using the vision modality alone.
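
As a rough illustration of the temporally shifted input mentioned in the abstract, the Python sketch below pairs each clip of a frame sequence with a copy that starts a few frames later, so that the two clips could be fed to the two branches of a Siamese network. The function name `shifted_clip_pairs` and the parameters `clip_len` and `shift` are hypothetical and not taken from the paper; how AST-Net actually samples and shifts its clips is not stated in the abstract.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shifted_clip_pairs(
    frames: Sequence[T], clip_len: int, shift: int
) -> List[Tuple[List[T], List[T]]]:
    """Pair each clip with a second clip that starts `shift` frames later.

    Assumption for illustration only: clips are taken at non-overlapping
    starts (stride = clip_len); the paper may sample them differently.
    """
    if clip_len <= 0 or shift <= 0:
        raise ValueError("clip_len and shift must be positive")

    pairs = []
    # Last start index for which the shifted clip still fits in the video.
    last_start = len(frames) - clip_len - shift
    for start in range(0, last_start + 1, clip_len):
        clip_a = list(frames[start:start + clip_len])
        clip_b = list(frames[start + shift:start + shift + clip_len])
        pairs.append((clip_a, clip_b))
    return pairs


# Integers stand in for video frames here.
pairs = shifted_clip_pairs(list(range(10)), clip_len=4, shift=2)
for clip_a, clip_b in pairs:
    print(clip_a, clip_b)
```

Each pair overlaps in time, which is what would let a training objective compare the two branches' predictions on the shared frames; the exact objective used by AST-Net is not described here.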

Parallel Keywords

Emotion Recognition; Temporal Network; Attribute Feature; Convolutional Neural Network; Affective Computing; Emotion Dimension; Facial Expression

