Multi-label classification methods applied to PTT data

Advisor: 陳景祥
Co-advisor: Pai-Ling Li (李百靈)

Abstract


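As social networks have spread, more and more people publish articles online to express their ideas. PTT is one of the most popular of these forums and has given rise to many distinctive forms of internet culture. Document classification is a common task in text analysis, but an article may carry more than one label, which places the problem in the multi-label setting. The multi-label approach adopted in this study is problem transformation: the multi-label problem is converted into single-label classification, which is then handled by conventional classifiers, and a neural network is added for comparison. Because previous research has found that taking the relationships among labels into account can effectively improve classification performance, this thesis also uses the Copy transformation and predicts labels from probabilities. The study performs multi-label classification on posts from the Movie board of the PTT forum. Three transformation methods are combined with classifiers, and a separate probability-based prediction approach is added, for eight methods in total. Six evaluation metrics are then used to assess the classification performance of each method.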
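The abstract names the Copy transformation with probability-based label prediction but does not spell out the mechanics. The following is a minimal sketch under stated assumptions. Each multi-label document is copied once per label so that an ordinary single-label classifier can be trained. A hand-rolled multinomial Naive Bayes supplies the posterior probabilities; this is a stand-in, since the abstract does not say which classifier the thesis paired with this method. A hypothetical threshold rule then turns the posteriors into a label set. The tokens, label names, function names and the 0.2 threshold are all illustrative and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def copy_transform(docs, label_sets):
    """Copy transformation: emit one (tokens, label) pair per label of each document."""
    return [(tokens, label) for tokens, labels in zip(docs, label_sets) for label in labels]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (a stand-in single-label classifier)."""

    def fit(self, pairs, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(label for _, label in pairs)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in pairs:
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_proba(self, tokens):
        """Posterior P(label | tokens) for every label seen in training."""
        n_pairs = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_pairs)  # log prior from the copied instances
            for t in tokens:
                if t in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][t] + self.alpha)
                                      / (self.total_words[c] + self.alpha * v))
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the max before exp for numerical stability
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


def predict_labels(model, tokens, threshold=0.2):
    """Hypothetical rule: keep labels with posterior >= threshold, else the single top label."""
    proba = model.predict_proba(tokens)
    chosen = {c for c, p in proba.items() if p >= threshold}
    return chosen or {max(proba, key=proba.get)}


# Toy, hypothetical data: tokenized posts and their label sets.
docs = [
    ["fight", "explosion", "chase"],
    ["kiss", "love", "wedding"],
    ["love", "fight", "chase"],
    ["joke", "laugh", "love"],
]
label_sets = [{"action"}, {"romance"}, {"action", "romance"}, {"comedy", "romance"}]

model = NaiveBayes().fit(copy_transform(docs, label_sets))
print(sorted(predict_labels(model, ["love", "joke"])))  # ['comedy', 'romance']
```

In plain Copy, every copied instance has weight 1. The copy-weight variant instead weights each copy by 1/|Y| for a document with label set Y; this sketch does not implement that variant.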

Parallel Abstract


With the popularity of social networks, more and more people publish articles on internet forums to express their opinions. Among these platforms, PTT is a popular forum in Taiwan that has fostered a unique online culture. Document classification is a common task in text analysis. However, a single article may belong to several categories at once, which makes the task a multi-label problem. The multi-label approach used in this thesis is problem transformation, which converts a multi-label classification problem into single-label problems that conventional classifiers can handle. We also apply a neural network classifier and compare it with the other methods. It is generally considered that information about the relationships among labels can effectively improve classification performance. Accordingly, we adopt the copy transformation and use posterior probabilities to predict the labels. Eight algorithm combinations are applied to classify posts from the PTT movie board, and six evaluation metrics are used to measure the performance of each method.
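The abstract does not name its six evaluation metrics. As an assumption, the sketch below computes six widely used example-based multi-label measures, averaged over documents: Hamming loss, subset accuracy, Jaccard accuracy, precision, recall and F1. These may differ from the thesis's choice. The function name, label vocabulary and data are hypothetical, and the handling of empty label sets is a convention chosen here, since libraries differ on it.

```python
def multilabel_metrics(y_true, y_pred, labels):
    """Average six example-based multi-label measures over documents."""
    names = ["hamming_loss", "subset_accuracy", "accuracy", "precision", "recall", "f1"]
    sums = dict.fromkeys(names, 0.0)
    for true, pred in zip(y_true, y_pred):
        true, pred = set(true), set(pred)
        inter, union = len(true & pred), len(true | pred)
        # Fraction of the label vocabulary that is wrongly included or missed.
        sums["hamming_loss"] += len(true ^ pred) / len(labels)
        # 1 only when the predicted set matches the true set exactly.
        sums["subset_accuracy"] += float(true == pred)
        # Assumed edge-case convention: empty vs empty scores 1, empty vs non-empty scores 0.
        sums["accuracy"] += inter / union if union else 1.0  # Jaccard index
        sums["precision"] += inter / len(pred) if pred else float(not true)
        sums["recall"] += inter / len(true) if true else float(not pred)
        sums["f1"] += 2 * inter / (len(true) + len(pred)) if union else 1.0
    return {name: total / len(y_true) for name, total in sums.items()}


labels = ["action", "romance", "comedy"]  # hypothetical label vocabulary
y_true = [{"action"}, {"romance", "comedy"}, {"action", "romance"}]
y_pred = [{"action"}, {"romance"}, {"comedy"}]
scores = multilabel_metrics(y_true, y_pred, labels)
print(round(scores["hamming_loss"], 4))  # 0.4444
```

Lower is better for Hamming loss; higher is better for the other five measures.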

