
Automatic Biomedical Literature Clustering System and Evaluations

Advisors: 翁昭旼, 蔣以仁

Abstract


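When heterogeneous data co-occur in documents, they form complex structures, and explaining the associations among them has long been a problem many researchers have tried to solve. The problem has grown with the Internet era. Most people are drawn to the convenience and speed of the Internet and increasingly use it as their main channel for finding and sharing information. As a result, the volume of electronic text has risen sharply: literature, web pages, news, and corporate documents are all growing exponentially, so managing these large document collections effectively has become an important issue. The main goal of this thesis is to develop an automatic biomedical literature clustering system that gathers documents on similar domain topics out of scattered literature, helping users facing a large body of medical literature to understand its knowledge structure quickly and effectively. We use association rules to implement the simplex concept of the Clique Percolation Method, and finally compare the system with Literature Clustering Search on two document classification benchmarks, Reuters-21578 and OHSUMED, evaluating the differences in Precision, Recall, Normalized Mutual Information, and Pairwise Testing.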
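The abstract does not say exactly how association rules feed the Clique Percolation Method, so the following is only a minimal sketch under stated assumptions. It assumes each document is reduced to a set of keywords, that two keywords are linked when the rule between them meets hypothetical `min_support` (absolute document count) and `min_confidence` thresholds, and it reads "simplex" as the usual view of a k-clique as a (k-1)-simplex, so communities are unions of k-cliques that share k-1 nodes. All function names, thresholds, and the toy data are illustrative, not the thesis's code.

```python
from collections import Counter, defaultdict
from itertools import combinations

def association_edges(docs, min_support=2, min_confidence=0.5):
    """Link terms a, b when rule a -> b or b -> a passes both thresholds.

    Assumption: support is an absolute document count and
    confidence(a -> b) = n(a, b) / n(a).
    """
    item_count = Counter()
    pair_count = Counter()
    for doc in docs:
        terms = sorted(set(doc))          # count each term once per document
        item_count.update(terms)
        pair_count.update(combinations(terms, 2))
    edges = set()
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue
        confidence = max(n_ab / item_count[a], n_ab / item_count[b])
        if confidence >= min_confidence:
            edges.add((a, b))
    return edges

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (without pivoting) of all maximal cliques."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return cliques

def clique_percolation(edges, k=3):
    """k-clique communities: maximal cliques of size >= k are merged when
    they share at least k - 1 nodes, i.e. when their k-cliques are adjacent."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))    # union-find over the cliques
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    communities = defaultdict(set)
    for i, clique in enumerate(cliques):
        communities[find(i)] |= clique
    return [frozenset(c) for c in communities.values()]

# Hypothetical toy "documents", each reduced to a keyword list.
EXAMPLE_DOCS = [
    ["gene", "protein", "expression"],
    ["gene", "protein", "expression"],
    ["gene", "protein", "mutation"],
    ["gene", "protein", "mutation"],
    ["gene", "expression", "mutation"],
    ["insulin", "glucose", "diabetes"],
    ["insulin", "glucose", "diabetes"],
]

if __name__ == "__main__":
    edges = association_edges(EXAMPLE_DOCS, min_support=2, min_confidence=0.5)
    for community in clique_percolation(edges, k=3):
        print(sorted(community))
```

On the toy data the pair expression/mutation co-occurs only once and is dropped. The two gene-related triangles still share the gene/protein edge, so they percolate into one community, while insulin/glucose/diabetes forms a separate one. Because communities are unions of cliques, a term may belong to more than one community, which is the overlapping behavior that distinguishes clique percolation from hard partitioning.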

Keywords

Cluster analysis, Community, Text mining

Parallel Abstract


The co-occurrence of items in data always induces a complex structure, and many researchers have tried to uncover it; however, heterogeneity makes such data hard to analyze. The problem has become more pressing with the arrival of the Internet era. Most people are strongly attracted to the convenience and efficiency of the Internet, and it has gradually become a major channel for searching for and sharing information. This has brought a large increase in electronic text: the number of research papers, web pages, news reports, and business documents is growing exponentially. How to organize this large amount of text effectively has therefore become a crucial issue. This thesis develops an automatic biomedical literature clustering system and evaluates it. The system is meant to arrange disorderly texts into an organized knowledge base, grouping them by theme and field, so that users faced with a large volume of medical literature can effectively grasp the structure and content of the knowledge they are looking for. We use association rules to implement the clique percolation method and its simplex concept. We then compare the system with Literature Clustering Search on two text categorization benchmarks, Reuters-21578 and OHSUMED, measuring the differences in precision, recall, normalized mutual information, and pairwise testing.
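The evaluation measures named above can be computed against the benchmark categories roughly as follows. This is a sketch under assumptions, not the thesis's evaluation code: it counts precision and recall over document pairs (a pair is positive when both documents share a benchmark category, and predicted positive when they land in the same cluster), assumes one category per document (OHSUMED documents can carry several), and normalizes mutual information by the geometric mean of the two entropies. The function names are hypothetical, and the thesis may use other definitions or normalizations.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt

def pairwise_precision_recall(labels_true, labels_pred):
    """Pair-counting precision and recall over all document pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1          # same category and same cluster
        elif same_pred:
            fp += 1          # clustered together, different categories
        elif same_true:
            fn += 1          # same category, split across clusters
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(labels_true, labels_pred):
    """Mutual information normalized by sqrt(H(true) * H(pred))."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = sum(c / n * log(c * n / (count_true[t] * count_pred[q]))
             for (t, q), c in joint.items())
    denom = sqrt(entropy(labels_true) * entropy(labels_pred))
    # Return 0.0 when either labeling has only one cluster (zero entropy).
    return mi / denom if denom else 0.0
```

For example, with gold categories `["a", "a", "a", "b"]` and clusters `[0, 0, 1, 1]`, one pair is correctly grouped, one is wrongly merged, and two are wrongly split, so pairwise precision is 0.5 and recall is 1/3. A clustering that matches the categories up to renaming gives NMI 1.0, and one that is statistically independent of them gives 0.0.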

Parallel Keywords

Cluster Analysis, Community, Text Mining


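Cited By

夏景岳 (2010). 應用文獻自動化分群系統對於臨床藥學文獻資料檢索服務之探討 [A study of applying an automatic literature clustering system to clinical pharmacy literature retrieval services] [Master's thesis, Taipei Medical University]. Airiti Library. https://doi.org/10.6831/TMU.2010.00002
王敦威 (2007). 文獻參考網路分析之單一主題參考文獻分析 [Single-topic reference analysis in literature reference network analysis] [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2007.03277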
