

A Study on Physiological-Signal-Based Emotion Recognition

Advisor: Hsi-Pin Ma


Abstract


To improve the interaction between humans and machines, we propose an emotion recognition system based on physiological signals. To study and verify this system, we collected a new emotional database, the National Taiwan University of Arts database (NTUA database), using a performance-based induction method to spontaneously elicit the subjects' emotional states. The database records each subject's performance video, audio, and physiological signals (electrocardiogram and respiration). In this work, we focus on the physiological signals and aim to make them a reliable channel for emotion recognition.

To recognize the subjects' emotions effectively, we analyze the emotional labels and use them to build a two-dimensional valence-arousal model. In addition to the prevailing functional features from heart rate variability (HRV) in the time and frequency domains, we introduce features encoded by a Gaussian mixture model (GMM) and the Fisher vector. The latter encoding is common in audio and video signal processing but has rarely been applied to physiological signals; it can faithfully reflect a sequence of physiological reactions and improves recognition accuracy.

A support vector machine (SVM) is used as the classifier to distinguish different levels of valence and arousal, and minimum-redundancy maximum-relevance (mRMR) feature selection is applied to select the most useful features. We analyzed the recognition results of each experimental step to verify that each process was sound, and compared the recognition accuracy of the HRV features against the Fisher-vector features. The final classification accuracy is 54.6% for valence and 61.1% for arousal. With feature-level fusion of the physiological and audio signals, the accuracy increases to 64.24% for valence and 63.31% for arousal.
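The two feature families described above — HRV time-domain statistics and Fisher-vector encoding over a GMM — can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names are hypothetical, the GMM parameters are assumed to be pre-trained with diagonal covariances, and the Fisher vector follows the standard mean/variance-gradient formulation with power and L2 normalization.

```python
import numpy as np

def hrv_time_features(rr_ms):
    """Common HRV time-domain features from a series of RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    return {
        "mean_rr": rr.mean(),
        "sdnn": rr.std(ddof=1),                 # overall variability
        "rmssd": np.sqrt(np.mean(diff ** 2)),   # short-term (beat-to-beat) variability
        "pnn50": np.mean(np.abs(diff) > 50.0),  # fraction of successive diffs > 50 ms
    }

def fisher_vector(X, weights, means, sigmas):
    """Fisher-vector encoding of descriptors X (T x D) under a diagonal-covariance
    GMM with K components; returns a 2*K*D vector (gradients w.r.t. means and sigmas)."""
    X = np.atleast_2d(X)
    T, D = X.shape
    K = len(weights)
    # Posterior responsibilities gamma[t, k], computed in log space for stability.
    log_prob = np.stack([
        -0.5 * np.sum(((X - means[k]) / sigmas[k]) ** 2
                      + np.log(2 * np.pi * sigmas[k] ** 2), axis=1)
        for k in range(K)], axis=1)
    a = log_prob + np.log(weights)
    a -= a.max(axis=1, keepdims=True)
    gamma = np.exp(a)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Accumulate normalized gradients per component.
    parts = []
    for k in range(K):
        u = (X - means[k]) / sigmas[k]
        g_mu = (gamma[:, k:k + 1] * u).sum(axis=0) / (T * np.sqrt(weights[k]))
        g_sig = (gamma[:, k:k + 1] * (u ** 2 - 1)).sum(axis=0) / (T * np.sqrt(2 * weights[k]))
        parts.extend([g_mu, g_sig])
    fv = np.concatenate(parts)
    # Power normalization followed by L2 normalization (the "improved" FV).
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

In a pipeline like the one the abstract outlines, the HRV statistics and the Fisher vector would be concatenated per recording, reduced with mRMR, and fed to an SVM; the Fisher vector's appeal here is that it summarizes a whole sequence of per-window physiological descriptors into one fixed-length feature regardless of recording duration.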

