A song description lets users quickly grasp the context of a song and judge whether it fits their current listening needs. Such descriptions can be generated from lyrics with text summarization models; however, the figurative nature of lyrics often prevents current summarization models from capturing their meaning, so they produce sentences unrelated to the content. In this thesis, we design a data processing method and a training objective to generate descriptions that match the theme of a song. Specifically, we collect song interpretations from online forums as our dataset and combine them with lyrics highlights. The lyrics highlight is an extractive summary, selected from the lyrics using a sentence-relation graph and the distribution of sentences in the embedding space; the song interpretation is an abstractive summary taken from online forums. Our training data combines these extractive and abstractive summaries to mitigate the effect of noise in the song interpretations. Our model is an attention-based neural network (Transformer) trained on paired lyrics and summaries to learn to generate abstractive summaries. The training objective combines maximum likelihood estimation with embedding similarity. Experimental results show that the proposed data processing method and training objective improve the performance of the text summarization model.
Music descriptions help users understand the context of a song at a glance. Given the figurative nature of song lyrics, current text summarization models often fail to capture the meaning expressed in a song and, as a result, generate imaginative but irrelevant descriptions. In this work, we propose a music description (or summary) generation scheme based on a novel data representation and training objective. Descriptions are generated with a Transformer-based model whose training objective incorporates semantic similarity into maximum likelihood estimation (MLE). To combat noise, the reference summary of a song combines an extractive component, a lyrics highlight obtained from graph-based ranking and embedding similarity, with an abstractive component, an interpretation of the song collected from online forums. This data representation serves as the pseudo-ground truth for sequence-to-sequence abstractive summarization. The effectiveness of the proposed method is evaluated with metrics such as ROUGE, BLEU, and BERTScore.
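As a concrete illustration, the combined objective can take a form such as the following. This is a hedged sketch: the cosine-similarity term and the weighting hyperparameter \(\lambda\) are assumptions for illustration, not the exact objective reported above.

\[
\mathcal{L}(\theta) \;=\; \underbrace{-\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid y_{<t}, x\right)}_{\text{MLE}} \;+\; \lambda \left(1 - \cos\!\big(\mathbf{e}(\hat{y}), \mathbf{e}(y)\big)\right),
\]

where \(x\) denotes the lyrics, \(y\) the reference summary, \(\hat{y}\) the generated summary, \(\mathbf{e}(\cdot)\) a sentence-embedding function, and \(\lambda\) a weighting hyperparameter.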
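For the extractive side, the sketch below shows one common way to realize graph-based ranking over sentence embeddings (TextRank-style) for selecting a lyrics highlight. It is a minimal illustration under stated assumptions, not the thesis implementation: the `embed` helper, the similarity threshold, and `top_k` are hypothetical.

```python
# Minimal sketch: select a lyrics highlight by PageRank over a
# sentence-similarity graph built from embedding cosine similarity.
# `embed` is a hypothetical helper returning unit-normalized vectors (n, d).
import numpy as np
import networkx as nx

def lyrics_highlight(lines, embed, top_k=3, sim_threshold=0.3):
    """Return the top-k lyric lines ranked by graph-based centrality."""
    vecs = embed(lines)                      # (n, d) unit-normalized sentence embeddings
    sim = vecs @ vecs.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)               # ignore self-similarity
    sim[sim < sim_threshold] = 0.0           # keep only sufficiently similar pairs
    graph = nx.from_numpy_array(sim)         # weighted, undirected sentence graph
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [lines[i] for i in sorted(ranked)]  # restore original line order
```

In this sketch, keeping the selected lines in their original order preserves the narrative flow of the lyrics when the highlight is concatenated with the forum interpretation to form the reference summary.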