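The purpose of this thesis is to improve the retrieval efficiency of full-text document data, so as to save the time and cost that information seekers waste on repeatedly screening and filtering information. With the spread of the Internet, the transmission of information has grown explosively. Information from many different knowledge domains is exchanged widely over the network, and information all over the world is continuously received and transmitted in the form of text, images, and sound. Of these, text accounts for the largest share. By its nature, text records the conceptual details that people describe in natural language. Given textual descriptions that have no structure, how can we search this unstructured information and extract what an individual needs? The problem is as hard as finding the information one needs in a data stream as vast as the ocean. To retrieve unstructured full-text data effectively, we propose a methodology that differs from traditional keyword retrieval, which we call concept indexing.

The core of concept indexing is concept classification. In addition, to support concept indexing, we use a multidimensional vector space to compute document similarity. Conceptual similarity between documents is then no longer just a matter of counting keyword frequencies: the relation, expansion, and contraction of concepts can also be obtained by computing distances between vectors in that space. The system design built on concept classification also adds a concept-topic retrieval function. Through a visual radar chart interface, it guides searchers in adjusting concepts visually and helps them optimize their retrieval request, with the aim of satisfying any searcher's needs on the first query and returning information that is useful to the searcher.

The system was implemented using electronic newspapers on the Internet as the experimental subject. This research nonetheless emphasizes the proposal of a methodology, and any full-text document data can apply this methodology to perform concept-based full-text retrieval. The results confirm that concept indexing can provide appropriate recall and precision in its responses according to the differing needs of the searcher. In addition, the concept-topic retrieval function allows the searcher's retrieval concepts to align with the concepts provided by the information system, which strengthens the effectiveness of retrieval.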
This thesis studies a procedure for improving query efficiency over text data, in order to save time and cost for those who seek information. Today, thanks to the widespread use of the Internet, information is no longer limited by geography. It is propagated widely over the Internet in the form of voice, pictures, and text. Compared with the other formats, text is the main medium of communication in human society. However, concepts described in text lack precision compared with a traditional database, which uses tuples to record data precisely. To make querying over text data efficient, this study proposes a methodology named concept indexing. It differs from traditional text indexing techniques, which usually require time-consuming re-screening of information during a query. Concept categories are the core of concept indexing. All keywords are mapped from the term space to the concept space, and document similarity is then calculated in the concept space using Euclidean distance in a vector space. This use of a vector space makes it possible to express relation, contraction, and dilation between concepts. Building on the concept categories, the study also incorporates the idea of visual text mining to address the subjects within a concept, helping the information seeker find useful results on the first query. Internet news was used to implement the system; because what is proposed is a methodology, other kinds of text data sources can also be adapted to it. The experimental results show that concept indexing can adjust recall and precision according to the requirements of the information seeker, and that the concept subjects of the information seeker and of the query system can be matched by using the concept indexing technique.
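
The mapping from term space to concept space and the distance-based ranking described above can be illustrated with a minimal sketch. It is not the system built in this study: the concept categories, the term-to-concept table `TERM_TO_CONCEPT`, the normalisation by concept-term count, and the function names are all assumptions made for illustration. The query is given directly as a concept vector, on the assumption that a searcher adjusts the weight of each concept by hand.

```python
import math
from collections import Counter

# Hypothetical concept categories and term-to-concept table (assumed for
# illustration; the categories used in the study are not given here).
CONCEPTS = ["finance", "politics", "sports"]
TERM_TO_CONCEPT = {
    "stock": "finance",
    "bank": "finance",
    "market": "finance",
    "election": "politics",
    "vote": "politics",
    "match": "sports",
    "team": "sports",
}


def concept_vector(text):
    """Project a document from term space into concept space.

    Each component is the share of recognised terms that belong to
    that concept (an assumed weighting scheme).
    """
    counts = Counter(
        TERM_TO_CONCEPT[token]
        for token in text.lower().split()
        if token in TERM_TO_CONCEPT
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CONCEPTS)
    return [counts[concept] / total for concept in CONCEPTS]


def euclidean_distance(u, v):
    """Euclidean distance between two concept vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_documents(query_vector, documents):
    """Return document ids ordered from nearest to farthest from the query."""
    scored = [
        (euclidean_distance(query_vector, concept_vector(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    return [doc_id for _, doc_id in sorted(scored)]


if __name__ == "__main__":
    documents = {
        "d1": "bank stock market rally",
        "d2": "election vote team",
        "d3": "team match",
    }
    # A query that asks only for the "finance" concept.
    print(rank_documents([1.0, 0.0, 0.0], documents))
```

In this sketch, moving weight in the query vector from one concept to another changes which documents are nearest, which is one simple way to picture the adjustment of recall and precision that the abstract describes.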