A Study of Document Classification for Topic-Term Corpora Using Co-occurrence Term Analysis: The Case of Polysemous Terms

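In this era of information explosion, the Internet resembles a richly stocked library that offers us abundant sources of information. Yet the information on the Internet is as excessive as it is abundant: when we query a search engine, we typically get back web pages from every field and domain that happen to contain the search term. These results are usually ordered by the weight the search engine computes for each document with respect to the search term, and are not grouped by document similarity. A user who is interested only in documents from one particular domain therefore has to browse many pages of results to collect all of them. In our experience, when deciding which results belong to the domain of interest, we usually rely on the snippet that the search engine excerpts from around the search term, and often on the relationship between the search term and the words near it. Following this principle, we use the keywords that appear near the search term to analyze their co-occurrence relations with it, classify the co-occurring terms on the basis of those relations, and then use the resulting term classes to classify the documents returned by the search engine. Documents from the domain of interest are thereby grouped into the same class, making them easier for the user to browse.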

Keywords

Co-occurrence term; polysemous term; document classification

Parallel Abstract

In this era of information explosion, the World Wide Web, like a library with an abundant collection, offers us a plentiful source of information. As a result, when we use a search engine we receive many web pages from every domain and field of knowledge that contain our search keyword. Because the results are ordered by the weight the search engine computes for the search word in each page, rather than by page similarity or domain, we may have to browse many pages of results to collect all the web pages in our domain of interest. We usually use the extract of each web page shown in the search results to judge whether it interests us, relying especially on the relation between the search word and the nearby words in the extract. Following the principle behind this way of judging, our study analyzes the relation between the search word and its nearby words to find the co-occurrence terms of the search word, and classifies those co-occurrence terms by relation weight. Finally, we classify the web pages retrieved for the search word according to the classification of the co-occurrence terms, and present the results grouped by page class.

Parallel Keywords

Co-occurrence Term; Polysemy Term; Text Classification
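
The pipeline summarized in the abstract (collect the words near the search term, group those co-occurrence terms, then classify the retrieved pages by group) can be illustrated with a small Python sketch. The abstract does not specify the window size, the weighting formula, or the grouping method, so everything here is an assumption made for illustration: the function names, the fixed token window, the rule that links two terms when they share a context at least twice, and the use of connected components as term classes. The English snippets for the polysemous query "apple" are toy data, not data from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def context_terms(tokens, query, window=3):
    """Collect the terms within `window` tokens of any occurrence of `query`."""
    found = set()
    for i, tok in enumerate(tokens):
        if tok == query:
            lo, hi = max(0, i - window), i + window + 1
            found.update(t for t in tokens[lo:hi] if t != query)
    return found


def group_cooccurrence_terms(contexts, min_pair_count=2):
    """Link two context terms when they share a context at least
    `min_pair_count` times (an assumed rule), then return the
    connected components as term groups."""
    pair_counts = Counter()
    for ctx in contexts:
        pair_counts.update(combinations(sorted(ctx), 2))

    parent = {}  # union-find forest over linked terms only

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in pair_counts.items():
        if n >= min_pair_count:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return list(groups.values())


def classify(contexts, groups):
    """Label each snippet with the index of the group it overlaps most,
    or None when it overlaps no group."""
    labels = []
    for ctx in contexts:
        overlaps = [len(ctx & g) for g in groups]
        best = max(range(len(groups)), key=overlaps.__getitem__, default=None)
        labels.append(best if best is not None and overlaps[best] > 0 else None)
    return labels


# Toy snippets for the polysemous query "apple" (illustrative only).
snippets = [
    "fresh apple juice and apple pie recipe",
    "apple pie with fresh juice",
    "apple iphone launch event",
    "new apple iphone launch",
    "apple orchard",
]
contexts = [context_terms(s.split(), "apple") for s in snippets]
groups = group_cooccurrence_terms(contexts)
labels = classify(contexts, groups)
```

On this toy input the two food snippets share one label, the two phone snippets share another, and the last snippet stays unlabeled because its only context term was never linked to a group. For Chinese text, the whitespace split would have to be replaced by a word segmenter, and the raw pair count by whatever relation weight the study actually defines.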

