

Music Emotion Analysis Based on Audio and Lyrics Using Deep Learning

Advisor: 魏世杰


Keywords

Multimodal music emotion classification, CNN, NLP

Abstract


Music emotion analysis has been an ever-growing field of research in music information retrieval. To solve the cold-start problem of content-based recommendation systems, a method for automatic music labeling is needed. Thanks to recent advances, neural networks can be used to extract audio features for a wide variety of tasks. When people listen to a song, it is the music and the lyrics that touch the heart most. Therefore, this study attempts to predict the type of music emotion from both the audio signal and the lyrics. For model building, convolutional neural networks (CNNs) are applied to the audio signals and natural language processing (NLP) models to the lyrics. A new dataset, ABP, is compiled from three datasets of Western pop music, in which each song carries human-judged valence and arousal values. The type of music emotion is categorized into the four quadrants formed by the valence and arousal axes. The experiments confirm that classifying a song's emotion using both audio and lyrics information achieves better performance than the audio-only learning methods of previous studies. Compared with a related work, this study improves the accuracy of the audio model and the lyrics model by 8 to 16%.
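The four-quadrant categorization can be sketched as a simple mapping from a song's valence and arousal scores. The zero-centered scale and the emotion labels attached to each quadrant below are assumptions drawn from the commonly used circumplex model of affect, not details given in the abstract; the ABP dataset's actual value ranges may require rescaling first.

```python
def quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair to one of the four quadrants.

    Assumes both values are centered at 0, with positive meaning
    high valence (pleasant) or high arousal (energetic).
    """
    if valence >= 0 and arousal >= 0:
        return "Q1: happy/excited"   # high valence, high arousal
    if valence < 0 and arousal >= 0:
        return "Q2: angry/tense"     # low valence, high arousal
    if valence < 0 and arousal < 0:
        return "Q3: sad/depressed"   # low valence, low arousal
    return "Q4: calm/relaxed"        # high valence, low arousal
```

For example, an upbeat, pleasant song with `valence = 0.7` and `arousal = 0.8` falls in Q1, while a slow, gloomy one with both values negative falls in Q3.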
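The combination of an audio branch and a lyrics branch can be illustrated with a minimal late-fusion sketch: each modality is reduced to an embedding vector, the vectors are concatenated, and a linear classifier predicts one of the four quadrants. The embedding sizes, random placeholder embeddings, and single linear layer here are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings standing in for the two modality encoders:
# e.g. a CNN over a mel-spectrogram for audio, and an NLP sentence
# encoder for lyrics. The dimensions are hypothetical.
audio_emb = rng.standard_normal(128)
lyrics_emb = rng.standard_normal(768)

# Late fusion: concatenate both embeddings, then apply a linear
# classifier over the four valence-arousal quadrants.
fused = np.concatenate([audio_emb, lyrics_emb])
W = rng.standard_normal((4, fused.size)) * 0.01
b = np.zeros(4)
logits = W @ fused + b

# Softmax turns the logits into a probability per quadrant.
probs = np.exp(logits) / np.exp(logits).sum()
pred = int(np.argmax(probs))  # predicted quadrant index, 0..3
```

In a trained system, `W` and `b` (and the encoders themselves) would be learned jointly from labeled songs; the sketch only shows how the two modalities feed one shared classifier.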
