
Applying Ontology to Relevant Document Discovery

Advisor: 鄭裕勤

Abstract


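Relevant document retrieval has been widely discussed, and many methods and techniques have been proposed or deployed in production document retrieval systems. Most methods let the user enter a query; the system performs some processing on the query string and then runs full-text matching to find relevant documents. Alternatively, they let the user query specific fields, such as title, abstract, keywords, and references, and convert these fields into a particular model for similarity calculation, for example the vector space model combined with TF/IDF to compute document similarity. Overall, these methods come mainly from the field of Information Retrieval.

The Semantic Web is an emerging research field that has been combined with other fields to produce a range of applications, including knowledge management, agent communication, and web services. Its core concept is ontology. Following the characteristics of ontology, the semantics of specific content are expressed fully in a markup language, which is both human-readable and amenable to further processing by computer systems. Most existing relevant document retrieval methods remain limited in how they handle the semantics of document content, and little literature discusses applying ontology to relevant document retrieval; these two gaps motivate this study.

This study designs an architecture that applies ontology to relevant document retrieval and implements a prototype system. The system takes a document as input and outputs the documents related to it. Processing is divided into the following steps: (1) convert the input document into ontology format; (2) if the input document already exists in the system, output its related documents directly; (3) if the input document does not exist in the system, compute the similarity between the input document and the documents already stored in the system. For this step, the study designs two similarity calculation methods and uses a genetic algorithm to compute the weight assigned to each method's result, which together yield the final similarity value.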
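The abstract does not specify how the genetic algorithm encodes candidates or scores them, so the following is only a minimal sketch under stated assumptions. Each candidate is a single weight `w` for the concept-level score, the other score is assumed to receive `1 - w`, and fitness is assumed to be the negative mean squared error against a small set of document pairs with known relevance labels. The names `evolve_weight`, `fitness`, `combined_similarity`, and the toy `samples` are hypothetical and not taken from the thesis.

```python
import random


def combined_similarity(w, concept_sim, instance_sim):
    # Weighted sum of the two scores; the second weight is assumed to be 1 - w.
    return w * concept_sim + (1.0 - w) * instance_sim


def fitness(w, samples):
    # Hypothetical fitness: negative mean squared error against relevance
    # labels (1.0 = related, 0.0 = unrelated), so higher is better.
    errors = [(combined_similarity(w, c, i) - label) ** 2 for c, i, label in samples]
    return -sum(errors) / len(errors)


def evolve_weight(samples, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Search for the concept-similarity weight w in [0, 1] with a simple GA."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=lambda w: fitness(w, samples), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            child = alpha * a + (1.0 - alpha) * b  # blend crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1)  # Gaussian mutation
            children.append(min(1.0, max(0.0, child)))  # keep w inside [0, 1]
        population = parents + children
    return max(population, key=lambda w: fitness(w, samples))


# Toy (concept_sim, instance_sim, label) triples, invented for illustration.
samples = [(1.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.9, 0.2, 1.0), (0.1, 0.8, 0.0)]
best_w = evolve_weight(samples)
print(f"concept weight {best_w:.2f}, instance weight {1 - best_w:.2f}")
```

On this toy data the concept score alone already tracks the labels, so the search should settle close to `w = 1`. Keeping the better half unchanged each generation means the best weight found never gets worse; a real setup would need labeled pairs drawn from the actual repository.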

English Abstract


Relevant document discovery is a practical research topic that attracts many researchers, and various solutions to it have been proposed. Some have been adopted in real-world environments, such as electronic article publishers, which offer users different search options, including keyword, full-text, phrase, and Boolean expression search, for retrieving documents. Most relevant document discovery techniques originate from the field of information retrieval.

The core concept of the Semantic Web is ontology, which has been applied in various domains such as web services, agent communication, and knowledge management. However, few papers have applied ontology to relevant document discovery. Therefore, this paper applies ontology to relevant document discovery and constructs a prototype system that implements the proposed method.

Given a user-selected document as input, the prototype system returns a number of closely related documents already stored in its repository. The process consists mainly of the following steps: (1) transforming the input text document into OWL format; (2) determining whether the input document already exists in the system's ontology repository; and (3) if it does not, calculating the similarity between the input ontology and the documents stored in the ontology repository and retrieving the related documents with the highest similarity values. Ontology extraction and similarity calculation are the two core components through which the prototype system applies ontology. Ontology extraction transforms plain-text (TXT) documents into OWL format according to the characteristics of ontology. Similarity calculation consists of two proposed methods, concept similarity and instance similarity, both implemented in the prototype system.
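Steps (2) and (3) of the process above can be sketched as follows. This is a minimal illustration under loud assumptions: step (1), the TXT-to-OWL extraction, is skipped, and each document's ontology is reduced to a hypothetical `DocOntology` holding plain sets of concept and instance names. The abstract does not define concept similarity or instance similarity, so Jaccard overlap is swapped in for both, and the 0.5/0.5 weights are placeholders for the weights the thesis derives separately. `related_index` is a hypothetical cache of stored results for documents already in the repository.

```python
from dataclasses import dataclass, field


@dataclass
class DocOntology:
    # Hypothetical stand-in for one document's OWL ontology: the sets of
    # concept (class) names and instance (individual) names extracted from it.
    doc_id: str
    concepts: set = field(default_factory=set)
    instances: set = field(default_factory=set)


def jaccard(a, b):
    # Set overlap in [0, 1]; defined as 0.0 when both sets are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(x, y, w_concept=0.5, w_instance=0.5):
    # Weighted combination of a concept-level and an instance-level score
    # (Jaccard is an assumed substitute for the thesis's own measures).
    return (w_concept * jaccard(x.concepts, y.concepts)
            + w_instance * jaccard(x.instances, y.instances))


def related_documents(query, repository, related_index, top_k=3):
    # Step 2: a document already in the repository returns its stored related list.
    if query.doc_id in related_index:
        return related_index[query.doc_id][:top_k]
    # Step 3: otherwise score the query against every stored document.
    scored = [(similarity(query, doc), doc.doc_id) for doc in repository]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


# Toy repository, invented for illustration.
repository = [
    DocOntology("d1", {"Ontology", "OWL"}, {"Protege"}),
    DocOntology("d2", {"Retrieval", "TFIDF"}, {"Lucene"}),
    DocOntology("d3", {"Ontology", "Retrieval"}, {"Protege", "Lucene"}),
]
query = DocOntology("q", {"Ontology", "OWL", "Retrieval"}, {"Protege"})
print(related_documents(query, repository, related_index={}, top_k=2))  # ['d1', 'd3']
```

Here `d1` scores about 0.83 (two of three concepts and the single instance shared), `d3` about 0.58, and `d2` 0.125, so the two best matches are returned. Checking the cache first avoids recomputing similarities for documents the system has already processed.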

