
Applying Latent Class Analysis on Text Mining (潛在類別分析於文字探勘之應用)

Advisor: 江振東

Abstract

With internet use now mainstream, websites hold a large amount of information in text form, and text mining has become a popular method of data analysis. Latent Class Analysis (LCA) is a technique often used in the social sciences to reveal the latent classes underlying observed data. In this thesis, we apply LCA to text mining and assess whether it is a feasible method for that purpose. Two case studies are presented: the first is a similarity detection that compares two novels, Water Margin (水滸傳) and Romance of the Three Kingdoms (三國演義); the second is a classification of news articles by category, used to find important keywords. Conclusions and suggestions are provided.

