A Novel Trajectory-based Spatial-Temporal Spectral Features for Speech Emotion Recognition

Speech is one of the most natural forms of human communication. Recognizing emotion from speech remains an important research avenue for advancing human-machine interface design and human behavior understanding. In this work, we propose a novel set of features, termed trajectory-based spatial-temporal spectral features, to recognize emotions from speech. The core idea, inspired by dense trajectory-based video descriptors, is to derive descriptors both spatially and temporally on speech spectrograms over a sub-utterance frame (e.g., 250 ms). We conduct categorical and dimensional emotion recognition experiments and compare our proposed features with both the well-established set of prosodic and spectral features and the state-of-the-art exhaustive feature extraction approach. Our experiments demonstrate that our features alone achieve accuracies comparable to these baselines in the 4-class emotion recognition and valence detection tasks, and a significant improvement in activation detection. We additionally show that our proposed features carry information complementary to the existing acoustic feature set, which can be exploited to obtain improved emotion recognition accuracy.
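The abstract describes the descriptors only at a high level, so the following Python sketch illustrates one possible way to compute spatial-temporal descriptors on a spectrogram; it is not the authors' implementation. It assumes a precomputed log mel-filter-bank energy matrix (rows are frames, columns are mel bands) with a 10 ms frame hop, so 25 frames span about 250 ms. In place of the paper's trajectory-based descriptors it uses gradient-orientation histograms along time and frequency, similar in spirit to the HOG descriptors used alongside dense trajectories in video. The names `block_descriptors` and `utterance_vector`, the 8-bin histogram, and the mean pooling are hypothetical choices.

```python
import math
from typing import List

# Rows are frames (time), columns are mel bands (frequency).
Matrix = List[List[float]]


def block_descriptors(log_mel: Matrix, frames_per_block: int = 25,
                      n_bins: int = 8) -> List[List[float]]:
    """Hypothetical HOG-style stand-in, not the paper's trajectory descriptor.

    Returns one L2-normalized gradient-orientation histogram per
    non-overlapping block. Assumes a 10 ms frame hop, so the default
    25 frames span about 250 ms. A trailing partial block is dropped.
    """
    n_frames = len(log_mel)
    n_bands = len(log_mel[0]) if n_frames else 0
    bin_width = 2 * math.pi / n_bins
    descriptors = []
    for start in range(0, n_frames - frames_per_block + 1, frames_per_block):
        hist = [0.0] * n_bins
        # Forward differences that stay inside the block and the band range.
        for t in range(start, start + frames_per_block - 1):
            for f in range(n_bands - 1):
                d_time = log_mel[t + 1][f] - log_mel[t][f]  # temporal change
                d_freq = log_mel[t][f + 1] - log_mel[t][f]  # spatial (spectral) change
                magnitude = math.hypot(d_time, d_freq)
                angle = math.atan2(d_freq, d_time) % (2 * math.pi)
                # Magnitude-weighted vote into the orientation bin.
                hist[int(angle / bin_width) % n_bins] += magnitude
        norm = math.sqrt(sum(h * h for h in hist))
        descriptors.append([h / norm for h in hist] if norm > 0 else hist)
    return descriptors


def utterance_vector(descriptors: List[List[float]]) -> List[float]:
    """Mean-pool block descriptors into one fixed-length utterance vector (assumed pooling)."""
    if not descriptors:
        return []
    n = len(descriptors)
    return [sum(col) / n for col in zip(*descriptors)]


if __name__ == "__main__":
    # Synthetic 50-frame x 4-band "spectrogram" whose energy rises over time only.
    rising = [[float(t)] * 4 for t in range(50)]
    blocks = block_descriptors(rising)
    print(len(blocks), blocks[0])  # 2 [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Because the synthetic energy changes only along time, every gradient points in the temporal direction and falls into bin 0. Mean pooling is the simplest way to turn a variable number of sub-utterance descriptors into a fixed-length vector for a classifier; such a vector could also be concatenated with a conventional prosodic and spectral feature vector to probe for complementary information.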

Keywords

Emotion Recognition; Speech Processing; Spatial-Temporal Descriptors; Mel-Filter Bank Energy
