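Keyword-based search engines are one of the main means of retrieving relevant documents from large volumes of data. However, the returned search results (snippets) are not organized, and keywords serve as the only dimension for selection, so multi-faceted browsing is not possible. In the field of information retrieval, classification and clustering are two methods for automatically assigning a document collection to semantic categories. Classification must first be trained on part of the document collection to build a classification model before documents can be categorized automatically. Clustering instead uses statistical methods to compute the similarity between documents and groups them automatically. Because search results are dynamic and predefined categories are inflexible, this study adopts clustering as one of its tools. This study proposes a Virtual Document Warehouse System with multi-dimensional browsing as a way to browse search results from multiple facets. Working on top of existing search engines, it applies the HAC+P hierarchical clustering algorithm to build a semantic hierarchical structure, that is, a semantics-based concept hierarchy. Through repeated searching and clustering, users can build personal conceptual knowledge maps, which improves the browsing experience and helps them find relevant topics and document content more effectively.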
Keyword-based retrieval with search engines has limited ability to surface the most important and relevant knowledge. The retrieved search results are unorganized and offer only a single dimension for browsing. In the information retrieval (IR) field, text categorization, which comprises classification and clustering, has been investigated for many years as a way to organize search results automatically into corresponding categories. In this thesis, we propose and describe the Virtual Document Warehouse System, which provides an integrated interface for multi-dimensional analysis in support of knowledge management and decision-making. The system retrieves relevant documents through search engines and uses clustering algorithms to dynamically and automatically organize information from heterogeneous sources into hierarchical structures and to combine different concept hierarchies. Finally, we propose an approach that makes searching more convenient and multi-dimensional, and we present its application to personalized conceptual knowledge maps.
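To make the clustering step concrete, the following is a minimal sketch of hierarchical agglomerative clustering over search-result snippets. It is an illustration under stated assumptions, not the thesis's implementation: it uses plain average-link HAC and omits the additional processing that HAC+P performs, the term-frequency cosine similarity is one common choice rather than a measure confirmed by the abstract, and the function names, the `min_similarity` threshold, and the sample snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for one snippet."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors):
    """Average pairwise similarity between the members of two clusters."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hac(snippets, min_similarity=0.2):
    """Plain average-link HAC (assumption: not the thesis's HAC+P).

    Repeatedly merges the two most similar clusters until the best
    remaining similarity falls below min_similarity. Returns a list of
    trees: a leaf is a snippet index, an inner node is a 2-tuple.
    """
    vectors = [tf_vector(s) for s in snippets]
    # Each cluster is (member indices, tree built so far).
    clusters = [([i], i) for i in range(len(snippets))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_link(clusters[x][0], clusters[y][0], vectors)
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        sim, x, y = best
        if sim < min_similarity:
            break  # remaining clusters are too dissimilar to merge
        merged = (clusters[x][0] + clusters[y][0],
                  (clusters[x][1], clusters[y][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [tree for _, tree in clusters]


# Hypothetical snippets for an ambiguous query such as "jaguar".
snippets = [
    "jaguar car dealer price",
    "jaguar car engine review",
    "jaguar cat habitat rainforest",
    "big cat habitat conservation",
]
print(hac(snippets))  # [(0, 1), (2, 3)]
```

Each returned tree corresponds to one branch of a concept hierarchy: here the car-related and animal-related snippets end up in separate clusters. Lowering `min_similarity` to 0 would force everything into a single tree, so in this sketch the threshold controls how many top-level concepts are kept.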