From Audio to Topics: Learning Semantics with Convolutional Neural Networks

Music has become an important part of daily life, and the rise of cloud-based streaming services has made listeners rely on it even more. As a means of expressing emotion, music is rich in semantics. Previous work on genre and mood classification has shown that combining lyrics with audio features improves results, which suggests a latent relationship between audio and lyrics. Lyrics directly describe a song's topic, while audio amplifies its emotions. However, lyrics can be incomplete or missing. If topics can be learned from audio, we can infer the likely topics of a song without using its lyrics. We propose an unsupervised two-stage method. First, we learn the latent topics in lyrics with a topic model. Second, we map the audio signal to a topic distribution with a convolutional neural network. We show that this framework does learn a semantic representation from audio and can be applied directly to song retrieval. It supports searching not only songs that have lyrics but also songs without lyrics, e.g. classical pieces, for which it still returns reasonable results.
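The two stages and the retrieval step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the training loss or the similarity measure, so KL divergence is used for both, the three-topic vectors are invented, and the function names (`softmax`, `kl_divergence`, `rank_songs`) are hypothetical. The raw scores stand in for the output layer of a convolutional network, which is not implemented here.

```python
import math

def softmax(logits):
    """Turn raw network outputs into a probability distribution over topics."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two topic distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def rank_songs(query_topics, library):
    """Return song ids ordered from closest to farthest topic distribution."""
    scored = [(kl_divergence(query_topics, topics), song_id)
              for song_id, topics in library.items()]
    return [song_id for _, song_id in sorted(scored)]

# Stage 1 (assumed output): topic distribution a topic model gives the lyrics.
lyric_topics = [0.7, 0.2, 0.1]

# Stage 2 (assumed output): raw scores from a network that only saw the audio.
audio_logits = [2.0, 0.5, -0.3]
predicted_topics = softmax(audio_logits)

# One possible training signal: how far the audio prediction is from the
# lyric-derived target. This choice of loss is an assumption.
loss = kl_divergence(lyric_topics, predicted_topics)

# Retrieval: songs with lyrics carry topic-model distributions, while a song
# without lyrics gets its distribution from the audio model alone.
library = {
    "song_a": [0.8, 0.1, 0.1],
    "song_b": [0.1, 0.8, 0.1],
    "instrumental_c": softmax([1.5, 0.2, 0.1]),
}
ranking = rank_songs([0.75, 0.15, 0.10], library)
print(ranking)
```

Because every song ends up as a vector in the same topic space, a piece without lyrics can be ranked alongside songs that have them.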

Keywords

Convolutional Neural Network; LDA; topic model; audio signal

References

C. Laurier, J. Grivolla and P. Herrera, "Multimodal Music Mood Classification Using Audio and Lyrics," in ICMLA, 2008.

R. Neumayer and A. Rauber, "Integration of Text and Audio Features for Genre Classification in Music Information Retrieval," in ECIR, 2007.

R. Mayer, R. Neumayer and A. Rauber, "Rhyme and Style Features for Musical Genre Classification by Song Lyrics," in ISMIR, 2008.

R. Mayer, R. Neumayer and A. Rauber, "Combination of Audio and Lyrics Features for Genre Classification in Digital Audio Collections," in ACM Multimedia, 2008.

R. Mayer and A. Rauber, "Music Genre Classification by Ensembles of Audio and Lyrics Features," in ISMIR, 2011.
