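With the growth of the Internet, new Web sites are continually being created and the number of Web pages is increasing rapidly. How to retrieve information from the Internet quickly and effectively, so as to obtain complete and highly relevant Web pages, has become an important research topic. Conventional information retrieval systems on the Web currently search by keywords, and the amount of information they return is overwhelming: thousands of page addresses are retrieved, and users who rely only on their own ability to filter cannot cope with this volume even if they spend an enormous amount of time, which leaves them burdened with "information overload". This study therefore combines techniques and theories from information retrieval, information filtering, information extraction, Chinese word segmentation, fuzzy theory, and parallel processing to construct a "key-page" (Keypages) meta-search agent. The user only needs to supply a single key page (a document or a Web page address) as input. The system extracts the content of the document, segments it into characters and words, and extracts the keywords from it to build a feature vector. It then retrieves related pages from the Web through existing search engines and uses the MD values of SimNet to identify the pages with high similarity, so that users obtain Web pages that are highly relevant to their information needs.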
Due to the development of the Internet, the number of Web sites and Web pages has grown rapidly. The efficiency of finding desired information on the Internet has attracted great attention from researchers because of the increasing number of electronic documents available online. Most currently available Internet search engines are keyword-based. With such keyword searches, it is not uncommon for hundreds or thousands of URLs to be returned. To find the desired information, users must go through these pages one by one. The burden of this time-consuming task is referred to as the problem of information overload. In this study, an approach is proposed to deal with the “information overload” problem on the Internet. A Chinese key-page-based search agent will be constructed by integrating techniques from Information Retrieval, Information Filtering, and Information Extraction, together with Computational Intelligence and Parallel Processing algorithms. The agent will help users find highly correlated documents by simply supplying an electronic document or a URL.
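The key-page idea can be illustrated with a minimal sketch: build a keyword feature vector from the key page, build one for each candidate page returned by a search engine, and rank the candidates by similarity. The sketch below makes several assumptions that are not part of the study. Plain cosine similarity stands in for the SimNet MD value, whose definition is not given here. A simple tokenizer (one token per CJK character, whole words for Latin text) stands in for real Chinese word segmentation. The function names and the candidate URLs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a stand-in for Chinese word segmentation.
    # Latin words and digits stay whole; each CJK character is one token.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def feature_vector(text, top_k=10):
    # Keep the top_k most frequent tokens as the "keywords" of the page.
    counts = Counter(tokenize(text))
    return dict(counts.most_common(top_k))


def cosine_similarity(a, b):
    # Assumption: cosine similarity replaces the SimNet MD value.
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(key_page, candidates, top_k=10):
    # candidates maps a page address to the page text already fetched.
    key_vec = feature_vector(key_page, top_k)
    scored = [
        (url, cosine_similarity(key_vec, feature_vector(text, top_k)))
        for url, text in candidates.items()
    ]
    # Most similar pages first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    key_page = "fuzzy search agent for chinese web pages"
    # Hypothetical candidate pages, as if returned by an existing search engine.
    candidates = {
        "http://example.com/a": "a fuzzy search agent ranks chinese web pages",
        "http://example.com/b": "recipes for baking bread at home",
    }
    for url, score in rank_candidates(key_page, candidates):
        print(f"{score:.3f}  {url}")
```

In a fuller system, the candidate texts would come from querying existing search engines with the extracted keywords, and fetching and scoring the pages is the step that parallel processing could speed up.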