A Novel Trajectory-based Spatial-Temporal Spectral Features for Speech Emotion Recognition

Abstract


Speech is one of the most natural forms of human communication. Recognizing emotion from speech continues to be an important research avenue for advancing human-machine interface design and the understanding of human behavior. In this work, we propose a novel set of features, termed trajectory-based spatial-temporal spectral features, to recognize emotions from speech. The core idea centers on deriving descriptors both spatially and temporally on speech spectrograms over a sub-utterance frame (e.g., 250 ms), an idea inspired by dense trajectory-based video descriptors. We conduct categorical and dimensional emotion recognition experiments and compare our proposed features to both the well-established set of prosodic and spectral features and state-of-the-art exhaustive feature extraction. Our experiments demonstrate that our features by themselves achieve comparable accuracies in the 4-class emotion recognition and valence detection tasks, and that they obtain a significant improvement in activation detection. We additionally show that our proposed features carry information complementary to the existing acoustic feature set, which can be used to obtain improved emotion recognition accuracy.
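
The idea of describing a spectrogram along both time and frequency within a sub-utterance frame can be illustrated with a minimal sketch. It is not the paper's trajectory-based descriptor, whose details the abstract does not give; it only shows the framing step and one simple stand-in descriptor per 250 ms block. The function names `spectrogram` and `block_descriptors`, the 25 ms window, the 10 ms hop, and the synthetic test tone are all assumptions made for illustration.

```python
import numpy as np

def spectrogram(signal, sr, win_ms=25, hop_ms=10):
    """Log-magnitude spectrogram with shape (n_frames, n_bins)."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def block_descriptors(spec, hop_ms=10, block_ms=250):
    """One vector per 250 ms block (assumed stand-in descriptor):
    mean absolute change along time, then along frequency."""
    per_block = block_ms // hop_ms          # spectrogram frames per block
    n_blocks = spec.shape[0] // per_block   # trailing partial block is dropped
    out = []
    for b in range(n_blocks):
        block = spec[b * per_block : (b + 1) * per_block]
        temporal = np.abs(np.diff(block, axis=0)).mean(axis=0)  # per frequency bin
        spatial = np.abs(np.diff(block, axis=1)).mean(axis=0)   # per adjacent-bin pair
        out.append(np.concatenate([temporal, spatial]))
    return np.array(out)

# Hypothetical input: one second of a 220 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t)

spec = spectrogram(signal, sr)
feats = block_descriptors(spec)
print(spec.shape, feats.shape)
```

Each row of `feats` summarizes one sub-utterance block; an utterance-level representation would still need some pooling or encoding over these rows, which this sketch leaves out.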
