Crawling and Gathering Specific Web Page Information Using a Hidden Markov Model

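Searching for information is the main activity on the Web today. Although current search engines are already quite usable, they still have shortcomings that need to be addressed. Many information needs are hard to satisfy with keyword-based queries alone, because such queries often fail to return the right results. In this paper, we therefore build a hidden Markov model to predict the most likely path through a site's pages and thereby gather the specific information being sought. The experimental results show that our system mitigates some of the shortcomings that search engines face.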

Keywords

Markov chain; information gathering

Parallel Abstract

Information search is the key activity for many users on the Web. Although search engines today are useful and powerful, they still have many drawbacks. Moreover, many information needs are hard to express as keyword-based queries. In this paper, we address composite information needs by building a Hidden Markov Model (HMM) that predicts the most likely path to the target information. We use the concept of focused crawling to follow links through a Web site in search of specific information. The experiments show good results for admission information and accepted papers.
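To make the idea of "predicting the most likely path" concrete, the following is a minimal sketch of Viterbi decoding over a toy HMM. It is an illustration under stated assumptions, not the model used in this paper: the hidden states (`far`, `near`, `target`), the observed page types (`home`, `department`, `admissions`), and all probabilities are hypothetical values chosen for the example.

```python
import math


def _log(p):
    # Log that tolerates zero probabilities (maps 0 to -infinity).
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_prob, path) for the most likely hidden-state sequence."""
    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               predecessor state on that path)
    best = [{
        s: (_log(start_p[s]) + _log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, pred = max(
                (prev[p][0] + _log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + _log(emit_p[s][obs]), pred)
        best.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return best[-1][last][0], path


# Hypothetical toy model: hidden state = how close a page is to the target.
states = ("far", "near", "target")
start_p = {"far": 0.8, "near": 0.15, "target": 0.05}
trans_p = {
    "far": {"far": 0.5, "near": 0.4, "target": 0.1},
    "near": {"far": 0.2, "near": 0.4, "target": 0.4},
    "target": {"far": 0.1, "near": 0.3, "target": 0.6},
}
emit_p = {
    "far": {"home": 0.7, "department": 0.25, "admissions": 0.05},
    "near": {"home": 0.2, "department": 0.6, "admissions": 0.2},
    "target": {"home": 0.05, "department": 0.15, "admissions": 0.8},
}

# Page types observed along one crawl path (hypothetical labels).
observed = ["home", "department", "admissions"]
log_prob, path = viterbi(observed, states, start_p, trans_p, emit_p)
print(path)  # ['far', 'near', 'target']
```

In a focused crawler, a decoded state sequence like this could be used to rank outgoing links, giving priority to pages whose estimated state is closest to the target. How the states and observations are actually defined would depend on the site and on the training data.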

Parallel Keywords

HMM; Information Gathering

