A Study Applying Automatic Text Mining to Taiwanese Chinese-Language Rap Lyrics

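Since the turn of the millennium, rap has gradually entered the mainstream music market and become popular with young listeners. Rappers often write their own lyrics to express their feelings or to criticize society, so understanding rap lyrics also offers insight into contemporary culture and social attitudes. This study uses text mining to explore the thematic types that may exist in Taiwanese Chinese-language rap lyrics. It first conducts a word frequency analysis, observing the frequency of each keyword across the whole corpus and by period to understand the basic content of the lyrics and the distribution of word frequencies. It then runs clustering experiments with k-means clustering and affinity propagation, and uses the clustering results together with manual labels in classification experiments with support vector machines and the k-nearest neighbor algorithm. The study finds that music, love, and party have been the most common themes in Taiwanese Chinese-language rap lyrics over the past twenty years. For clustering, affinity propagation performed slightly better than k-means. For classification, the k-nearest neighbor algorithm performed slightly better than the support vector machine, and labels assisted by the clustering results trained a better binary classifier for music-themed lyrics than purely manual labels did. Music-themed lyrics do exist in Taiwanese Chinese-language rap lyrics; whether lyrics on other themes form categories of their own remains to be seen because of data imbalance. Future research could broaden the range of lyrics collected, try different dimensionality reduction methods, analyze word frequency from other perspectives, label lyrics together with experts or listeners, and use different clustering and classification methods.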
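The word frequency step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the function names, and the choice to group songs by decade (the abstract only says frequencies were observed overall and by period). Real Chinese lyrics would also need word segmentation first, which this sketch treats as already done.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (release year, pre-segmented lyric tokens).
# Not data from the study; segmentation is assumed to have happened already.
songs = [
    (2003, ["music", "mic", "party", "music"]),
    (2008, ["love", "night", "music"]),
    (2015, ["party", "love", "love", "city"]),
    (2019, ["music", "beat", "love"]),
]

def overall_frequency(corpus):
    """Count every token across the whole corpus."""
    counts = Counter()
    for _, tokens in corpus:
        counts.update(tokens)
    return counts

def frequency_by_decade(corpus):
    """Count tokens separately per decade (assumed grouping, e.g. 2000, 2010)."""
    by_decade = defaultdict(Counter)
    for year, tokens in corpus:
        by_decade[year // 10 * 10].update(tokens)
    return dict(by_decade)

overall = overall_frequency(songs)
per_decade = frequency_by_decade(songs)

print(overall.most_common(3))
for decade in sorted(per_decade):
    print(decade, per_decade[decade].most_common(2))
```

Comparing the per-decade counters side by side is one simple way to see whether a keyword's prominence shifts over time.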

Keywords

rap; text mining; word frequency analysis; clustering; classification

Parallel Abstract

Since the turn of the millennium, rap songs have gradually entered the mainstream music market and become very popular among young people. Rappers often use their own lyrics to express their emotions or to criticize society, so understanding the content of rap lyrics also helps in understanding contemporary culture and the social atmosphere. The purpose of this study is to explore possible thematic types in Chinese rap music lyrics in Taiwan through text mining. The study first conducted a word frequency analysis, calculating the total number of occurrences of keywords in the lyrics and observing the frequency of each keyword to understand the basic content of the lyrics and their word frequency distribution. It then used k-means and affinity propagation clustering to conduct unsupervised clustering experiments. Finally, it used the clustering results and manual labels with the support vector machine and the k-nearest neighbor algorithm to conduct a supervised binary classification experiment. The findings show that music, love, and party have been the most common themes of Chinese rap music lyrics in Taiwan over the past two decades. In terms of clustering effectiveness, affinity propagation performed slightly better than k-means. In terms of classification performance, the k-nearest neighbor algorithm slightly outperformed the support vector machine, and labels assisted by the clustering results trained a better binary classification model for music-themed lyrics than purely manual labels did. Lyrics with the theme of music do exist in Chinese rap music lyrics in Taiwan; whether other themes form distinct categories remains to be seen because of data imbalance. Future research could increase the coverage of lyrics, try different dimension reduction methods, analyze word frequency from different aspects, have experts or listeners label the lyric types, and use different clustering and classification methods.
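As a rough illustration of how cluster output might feed a classifier, the sketch below runs a plain k-means and then trains a k-nearest neighbor classifier on the resulting labels. It is not the study's pipeline: the abstract does not describe how clustering results and manual labels were combined, so treating cluster membership directly as the training label is an assumption. The two-dimensional toy features and all function names are hypothetical, and affinity propagation and the support vector machine are left out for brevity.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda xy: math.dist(xy[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical features per song: (share of music words, share of love words).
points = [(0.9, 0.1), (0.1, 0.8), (0.8, 0.2),
          (0.2, 0.9), (0.85, 0.15), (0.15, 0.7)]

assign, _ = kmeans(points, k=2)

# Assumption: a person inspects the cluster containing song 0 and names it
# "music"; every other song is labeled "other" (binary setup).
labels = ["music" if a == assign[0] else "other" for a in assign]

print(knn_predict(points, labels, (0.7, 0.3)))
print(knn_predict(points, labels, (0.1, 0.9)))
```

With an odd `k` and two classes the vote cannot tie, which keeps the binary prediction well defined.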

Parallel Keywords

rap; text mining; word frequency analysis; clustering; classification
