A Study Applying Automatic Text Mining to Chinese-Language Rap Music Lyrics in Taiwan

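Since the turn of the millennium, rap has gradually entered the mainstream music market and become popular among young listeners. Rappers often use self-written lyrics to express their feelings or to criticize society, so understanding rap lyrics also offers insight into contemporary culture and the social climate. This study uses text mining to explore the thematic types that may exist in Chinese-language rap music lyrics in Taiwan. It first performs a word frequency analysis, counting the total occurrences of keywords in the lyrics and examining their frequency from three perspectives (overall, by artist, and by era) to characterize the basic content and word frequency distribution of the lyrics. It then runs unsupervised clustering experiments with the K-means algorithm and affinity propagation, evaluating the clustering by silhouette coefficient and by close inspection of each cluster; this yields seven possible lyric themes: music, party, friendship, love, growing up, place, and society. Finally, it combines the clustering results and manual labels with support vector machines and the K-nearest neighbor algorithm in supervised binary classification experiments, and uses accuracy, precision, recall, and F1 score to evaluate how well the two classifiers perform on these lyrics across lyric themes and labeling methods. The study finds that over the past two decades music, love, and party have been the most common themes, and that more varied themes have emerged over time, such as everyday life, social issues, and school. For clustering, affinity propagation performs slightly better than K-means. For classification, K-nearest neighbor performs slightly better than the support vector machine, and labels assisted by the clustering results train a better binary classifier for music-themed lyrics than purely manual labels. Music-themed lyrics do exist in Chinese-language rap music lyrics in Taiwan, whereas it remains to be seen whether the other themes can stand as categories of their own, because of data imbalance. Future research could broaden the lyrics corpus, try different dimensionality reduction methods, analyze word frequency from other perspectives, involve experts or listeners in labeling, and use other clustering and classification methods.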

Keywords

Rap; Text Mining; Word Frequency Analysis; Clustering; Classification
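The word frequency step described in the abstract (counting keyword occurrences overall, by artist, and by era) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the record layout, the `count_terms` helper, the grouping of years into decades, and the toy tokens are hypothetical and are not taken from the study, which would also need a Chinese word segmenter before counting.

```python
from collections import Counter, defaultdict

# Hypothetical, already-tokenized toy records (not data from the study).
songs = [
    {"artist": "A", "year": 2005, "tokens": ["music", "party", "music"]},
    {"artist": "A", "year": 2015, "tokens": ["love", "music"]},
    {"artist": "B", "year": 2015, "tokens": ["society", "love", "love"]},
]


def count_terms(records, key=None):
    """Count total term occurrences, overall or grouped by key(record)."""
    if key is None:
        total = Counter()
        for rec in records:
            total.update(rec["tokens"])
        return total
    grouped = defaultdict(Counter)
    for rec in records:
        grouped[key(rec)].update(rec["tokens"])
    return dict(grouped)


# Three views of the same counts: overall, per artist, per decade.
overall = count_terms(songs)
by_artist = count_terms(songs, key=lambda r: r["artist"])
by_decade = count_terms(songs, key=lambda r: r["year"] // 10 * 10)

print(overall.most_common(2))
print(by_artist["B"].most_common(1))
print(sorted(by_decade))
```

Grouping through a `key` function keeps the counting logic in one place, so a further perspective (for example, by album) only needs a different key.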

Parallel Abstract

Since the turn of the millennium, rap songs have gradually entered the mainstream music market and have become very popular among young people. Rappers often express their emotions or criticize society through their own lyrics, so understanding the content of rap lyrics also helps in understanding contemporary culture and the social climate. The purpose of this study is to explore possible thematic types in Chinese rap music lyrics in Taiwan through text mining. The study first conducted a word frequency analysis, calculating the total number of occurrences of keywords in the lyrics and observing the frequency of each keyword from three perspectives (overall, by artist, and by era) to understand the basic content and word frequency distribution of the lyrics. It then used K-means and affinity propagation clustering to conduct unsupervised clustering experiments, and evaluated the clustering by calculating silhouette coefficients and inspecting each cluster in depth. As a result, seven possible lyric themes were found: music, party, friendship, love, growth, local place, and society. Finally, the study combined the results of the clustering experiments and manual labeling with the support vector machine and the K-nearest neighbor algorithm to conduct supervised binary classification experiments, and used accuracy, precision, recall, and F1 score to evaluate how well the two classification algorithms classify Chinese rap music lyrics in Taiwan under different lyric themes and different labeling methods. The findings show that music, love, and party have been the most common themes of Chinese rap music lyrics in Taiwan over the past two decades, and that more varied themes have appeared over the years, such as daily life, social issues, and school. In terms of clustering effectiveness, affinity propagation performed slightly better than K-means. In terms of classification performance, the K-nearest neighbor algorithm slightly outperformed the support vector machine, and labeling assisted by the clustering results trained a better binary classification model for music-themed lyrics than purely manual labeling. Lyrics with the theme of music do exist in Chinese rap music lyrics in Taiwan, while it remains to be seen whether the other themes can form categories of their own, because of data imbalance. Future research could broaden the coverage of lyrics, try different dimensionality reduction methods, analyze word frequency from other perspectives, have experts or listeners label the lyric types, and use different clustering and classification methods.

Parallel Keywords

Rap; Text Mining; Word Frequency Analysis; Clustering; Classification
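The evaluation measures named in the abstract (the silhouette coefficient for clustering, and accuracy, precision, recall, and F1 for binary classification) follow standard definitions, sketched below in plain Python. This is an illustration under assumptions, not the study's implementation: the function names, the Euclidean distance, the zero score for singleton clusters, and the toy inputs are all hypothetical choices.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        own = [math.dist(p, q)
               for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = math.inf
        for other in set(labels) - {lab}:
            dists = [math.dist(p, q)
                     for q, l in zip(points, labels) if l == other]
            b = min(b, sum(dists) / len(dists))
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy inputs (hypothetical): two well-separated clusters, five predictions.
sil = silhouette_score([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(round(sil, 3), metrics)
```

Reporting precision, recall, and F1 alongside accuracy is useful when one class is rare, as with the data imbalance the abstract mentions, because accuracy alone can look high even when the minority class is poorly recognized.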
