A Study on Applying Ontology to Relevant Document Retrieval

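The problem of relevant document retrieval has been widely discussed, and a variety of methods and techniques have been proposed or deployed in production document retrieval systems. Most approaches let the user enter a query, preprocess the query string, and then run full-text matching to find relevant documents; alternatively, they offer queries on specific fields such as title, abstract, keywords, or references, and convert those fields into a particular representation for similarity computation, for example a vector space model with TF-IDF weighting to compute document similarity. Overall, these methods come mainly from the field of information retrieval. The Semantic Web is an emerging research area that has been combined with other fields, including knowledge management, agent communication, and web services, to produce a range of applications. The core concept of the Semantic Web is the ontology: using a markup language, an ontology can fully express the semantics of specific content in a form that is both human-readable and open to further machine processing. Most existing relevant document retrieval methods, however, handle the semantic properties of document content only to a limited extent, and few publications discuss applying ontology concepts to relevant document retrieval; these gaps motivated this study. In this study, an architecture that applies ontologies to relevant document retrieval is designed and a prototype system is implemented. The input to the system is a document, and the output is the set of documents related to it. Processing consists of the following main steps: (1) convert the input document into an ontology format; (2) if the input document already exists in the system, output its related documents directly; (3) if the input document does not exist in the system, compute the similarity between the input document and the documents already stored in the system. Two similarity calculation methods are designed for this purpose, and a genetic algorithm is used to compute the weight assigned to the result of each method, which yields the final similarity value.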
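The retrieval flow in steps (2) and (3) can be sketched roughly as follows. This is a minimal illustration, not the prototype described in the abstract: the `DocOntology` class, the `retrieve` function, and the cache are hypothetical names; Jaccard overlap stands in for both similarity methods, whose actual definitions are not given here; and the two weights are fixed constants rather than values found by a genetic algorithm. Step (1), converting text to an ontology format, is assumed to have already happened.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DocOntology:
    """Hypothetical stand-in for a document already converted to an ontology."""
    doc_id: str
    concepts: set[str]
    instances: set[str]


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; an assumed placeholder similarity measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(x: DocOntology, y: DocOntology,
                        w_concept: float, w_instance: float) -> float:
    """Weighted sum of the two similarity scores."""
    return (w_concept * jaccard(x.concepts, y.concepts)
            + w_instance * jaccard(x.instances, y.instances))


def retrieve(query: DocOntology, repository: list[DocOntology],
             cache: dict[str, list[str]],
             w_concept: float = 0.5, w_instance: float = 0.5,
             top_k: int = 3) -> list[str]:
    """Return ids of the documents most similar to the query."""
    # Step (2): a known document returns its stored result directly.
    if query.doc_id in cache:
        return cache[query.doc_id]
    # Step (3): otherwise score the query against every stored document.
    scored = [
        (combined_similarity(query, doc, w_concept, w_instance), doc.doc_id)
        for doc in repository
        if doc.doc_id != query.doc_id
    ]
    # Highest score first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    result = [doc_id for _, doc_id in scored[:top_k]]
    cache[query.doc_id] = result
    return result
```

In a fuller version, the weights `w_concept` and `w_instance` would be the quantities that the genetic algorithm tunes, for example by scoring candidate weight pairs against documents with known relevance judgments.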

Keywords

Relevant document retrieval; ontology extraction; ontology mapping; ontology

Parallel Abstract

Research on relevant document discovery is practical and attracts many researchers, and a range of solutions to the problem exist. Some have been adopted in real-world environments, such as by publishers of electronic articles. These publishers offer several search options, such as keyword, full-text, phrase, and Boolean expression search, for users to retrieve documents. Most relevant document discovery techniques originate in the domain of information retrieval. The core concept of the Semantic Web is ontology, which has been applied in various domains, such as web services, agent communication, and knowledge management. However, few papers have applied ontology to relevant document discovery. Therefore, in this paper, ontology is applied to the problem of relevant document discovery, and a prototype system is constructed to implement the proposed method. Given a user-selected document as input, the prototype system generates a number of closely related documents from those already stored in the repository. The process of the prototype system can be divided into the following main steps: (1) transforming the input text document into OWL format; (2) determining whether the input document already exists in the ontology repository of the system; (3) if the input document does not exist in the ontology repository, calculating the similarity between the input ontology and the documents already stored in the ontology repository, and retrieving the related documents with the higher similarity values. Ontology extraction and similarity calculation are the core components through which the concept of ontology is applied in the prototype system. The objective of ontology extraction is to transform TXT-format documents into OWL format according to the characteristics of ontology. Similarity calculation consists of two methods, concept similarity and instance similarity, both of which are proposed and implemented in the prototype system.

Parallel Keywords

Relevant Document Discovery; Ontology Extraction; Ontology Mapping; Ontology
