Internet Information Extraction and Filtering System: A Chinese Key-page Meta-search Agent

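As the Internet develops, new Web sites keep appearing and the number of Web documents on the Internet is growing rapidly. Retrieving Internet information quickly and effectively, so as to obtain complete and highly relevant Web pages, has become an important research topic. Traditional information retrieval systems on the Web today all search by "keywords" and return an overwhelming amount of material, often thousands of Web page addresses. When users must filter these results using only their own judgment, they struggle to cope with such a volume of information even after spending enormous amounts of time, and they are left with the burden of "information overload." This study therefore combines information retrieval, information filtering, information extraction, Chinese word segmentation, fuzzy theory, parallel processing, and related techniques and theories to build a "Key-page" meta-search agent. The user only needs to supply one "key page" (a document or a Web page address) as input. The system extracts the document's content, segments it into characters and words, extracts its key terms to build a feature vector, retrieves related pages from the Web through existing search engines, and uses SimNet's MD value to identify highly similar pages, so that users receive pages highly relevant to their information needs.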
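The key-page pipeline described above (segment, extract key terms, build a feature vector, compare pages) can be sketched in Python as follows. Several pieces are assumptions that the source does not give: the tiny `LEXICON`, forward maximum matching as the segmentation method, raw term frequency as the feature weight, and cosine similarity used in place of SimNet's MD value, whose definition this abstract does not state. Fetching the key page and querying search engines are omitted; `candidates` maps hypothetical URLs to already-extracted page text.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the dictionary the system actually uses is not described.
LEXICON = {"資訊", "檢索", "資訊檢索", "過濾", "網頁", "搜尋", "引擎", "搜尋引擎", "關鍵頁"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text: str) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when no lexicon word matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens


def feature_vector(text: str, top_k: int = 10) -> dict[str, int]:
    """Term-frequency vector over the top_k most frequent lexicon words.
    Tokens outside the lexicon (e.g. single function characters) are dropped."""
    counts = Counter(tok for tok in segment(text) if tok in LEXICON)
    return dict(counts.most_common(top_k))


def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity; an assumed stand-in for SimNet's MD value."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_pages(key_page: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Score every candidate page against the key page, most similar first."""
    query = feature_vector(key_page)
    scored = [(url, cosine(query, feature_vector(body))) for url, body in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    key_page = "資訊檢索與資訊過濾"
    candidates = {  # hypothetical URLs with pre-extracted page text
        "http://example.com/a": "搜尋引擎與資訊檢索",
        "http://example.com/b": "網頁設計",
    }
    for url, score in rank_pages(key_page, candidates):
        print(f"{score:.3f}  {url}")
```

With this toy lexicon, the key page 資訊檢索與資訊過濾 segments into 資訊檢索 / 與 / 資訊 / 過濾. The first candidate shares one key term with it and scores 1/√6 ≈ 0.408, while the second shares none and scores 0.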

Keywords

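Information retrieval; information filtering; key term extraction; key page; document similarity comparison; personalized information services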

English Abstract

Due to the development of the Internet, the numbers of Web sites and Web pages have been growing rapidly. As the amount of electronic documents available on the Internet increases, the efficiency of finding desired information has attracted great attention from researchers. Most currently available Internet search engines are based on keywords, and it is not uncommon for a keyword search to return hundreds or thousands of URLs. To find the desired information, users need to go over these pages one by one; this time-consuming task is referred to as the problem of information overloading. This study proposes an approach to the "Information Overloading" problem on the Internet: a Chinese Key-page based search agent, constructed by integrating Information Retrieval, Information Filtering, and Information Extraction techniques, especially Computational Intelligence and Parallel Processing algorithms, that helps users find highly correlated documents by simply supplying an electronic document or a URL.
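The abstract lists parallel processing among the integrated techniques but does not say how it is applied. One plausible reading, sketched here purely as an assumption, is a meta-search fan-out: the same key-term query is sent to several search engines at once, and the returned URLs are merged before similarity filtering. `engine_a` and `engine_b` are hypothetical stubs standing in for real search-engine calls, and `meta_search` is an illustrative name, not the system's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A search engine is modeled as a function from a query string to a list of URLs.
SearchEngine = Callable[[str], list[str]]


def engine_a(query: str) -> list[str]:
    """Hypothetical stub; a real engine would issue an HTTP request."""
    return ["http://example.com/1", "http://example.com/2"]


def engine_b(query: str) -> list[str]:
    """Hypothetical stub; a real engine would issue an HTTP request."""
    return ["http://example.com/2", "http://example.com/3"]


def meta_search(query: str, engines: list[SearchEngine], max_workers: int = 4) -> list[str]:
    """Send the same query to every engine concurrently and merge the results,
    dropping duplicate URLs while keeping first-seen order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for urls in result_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    print(meta_search("資訊檢索 資訊過濾", [engine_a, engine_b]))
```

Threads suit this step because each real engine call would be I/O-bound, and because `pool.map` preserves input order, the merged list is deterministic regardless of which engine answers first.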

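Cited By

黃國倫 (2001). Design and construction of an Internet shopping agent [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-0112200611304061

張如瑩 (2001). Design and construction of a multilingual parallel key-page search engine [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-0112200611302285

吳志鴻 (2001). Research and design of a Q&A system applying key-page search and knowledge classification techniques [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-0112200611303228

林佩瑩 (2002). Distributed search servers for biomedical literature analysis [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-0112200611363547

黃思瑋 (2003). Application of a parallel search engine to protein-interaction literature [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-0112200611364868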
