
以LDA機率模型進行PTT論壇文章主題分類並分析文章留言與文章主題之關聯

Using Latent Dirichlet Allocation Model for Topic Modeling with Articles of PTT Forum and Analyzing Relevance of Article Comments

Advisors: 江玥慧, 劉昭麟

Abstract


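With technology advancing rapidly, posting on online social platforms and forums has become increasingly common, and people from different countries and fields increasingly gather in one place to discuss and share opinions. Automatically classifying what each group of posters is discussing, however, is difficult. Building on existing classification methods, this study takes PTT, a well-known Taiwanese forum, as its data source. It uses the LDA (Latent Dirichlet Allocation) model to group articles into topics and a Word2Vec model to classify the discussion topics of the comments posted in response to the same article. It then examines how the comments relate to the article's topic, which can serve as a basis for better understanding how people interact within the forum.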

Parallel Abstract


With the rapid development of technology, interaction on social networking platforms has become increasingly common. People from different fields and countries gather in the same place to discuss and share opinions more and more frequently, but automatically classifying the topics they discuss remains difficult. This study uses PTT, a well-known online forum in Taiwan, as its data source and adopts the LDA (Latent Dirichlet Allocation) model to classify articles into topic groups. The model's results are then used to investigate whether the comments on an article are related to the article in terms of topic groups. Analyzing the association between comments and articles can serve as a basis for further understanding communication in the PTT forum.
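
The workflow outlined in the abstract (fit LDA on articles, then check whether a comment falls in the same topic group as its article) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes the text has already been tokenized (PTT posts would first need Chinese word segmentation), uses a small collapsed Gibbs sampler written for illustration, and scores comments with a simple per-word averaging heuristic. The toy corpus, the function names `train_lda`, `doc_topics`, and `comment_topics`, and all hyperparameter values are hypothetical.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional topic weights given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return {"doc_topic": doc_topic, "topic_word": topic_word,
            "topic_total": topic_total, "vocab_size": vocab_size,
            "alpha": alpha, "beta": beta}


def doc_topics(model, d):
    """Smoothed topic distribution of training document d."""
    counts, alpha = model["doc_topic"][d], model["alpha"]
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def comment_topics(model, tokens):
    """Heuristic topic distribution for an unseen comment:
    average the normalized per-topic word probabilities of its tokens."""
    k_range = range(len(model["topic_total"]))
    if not tokens:
        return [1.0 / len(k_range)] * len(k_range)
    beta, v = model["beta"], model["vocab_size"]
    scores = [0.0] * len(k_range)
    for w in tokens:
        probs = [(model["topic_word"][k][w] + beta)
                 / (model["topic_total"][k] + v * beta) for k in k_range]
        norm = sum(probs)
        for k in k_range:
            scores[k] += probs[k] / norm
    return [s / len(tokens) for s in scores]


# Hypothetical toy corpus: two "sports" articles and two "food" articles.
articles = [
    ["team", "game", "score", "player", "game"],
    ["player", "team", "coach", "score"],
    ["noodle", "soup", "restaurant", "taste"],
    ["restaurant", "taste", "noodle", "price"],
]
model = train_lda(articles, num_topics=2)

article_dist = doc_topics(model, 0)
comment_dist = comment_topics(model, ["game", "coach", "score"])
article_topic = article_dist.index(max(article_dist))
comment_topic = comment_dist.index(max(comment_dist))
related = article_topic == comment_topic
print("article topic:", article_topic, "comment topic:", comment_topic,
      "same topic group:", related)
```

With a real corpus, a library implementation of LDA would normally replace the hand-written sampler, and the Word2Vec-based comment classification mentioned in the Chinese abstract would replace the averaging heuristic used here.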

