Integrating Domain Content and Knowledge Using Web Information Extraction and Ontology Fusion

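This thesis applies Web mining methods to automatically collect large amounts of web page information scattered across websites and directories, and to distill from it the data that is highly relevant to a domain. We implement the proposed method as an intelligent system that analyzes and extracts the knowledge structures of domain-related websites (for example, sitemaps). From these simple domain ontologies, the system analyzes the relationships between "catalog and catalog", "catalog and object", and "object and object", and fuses them into a more complete Domain Ontology (DO). Besides supporting the construction of domain knowledge, the DO can be used to offer users a friendlier browsing path, so that they can find the results they want more quickly and accurately. With these motivations, this thesis designs and implements a system that integrates domain content and knowledge through Web information extraction and ontology fusion. The system automatically fuses the domain ontologies of different domain-related websites (a process called Ontology Fusion) and thereby provides a prototype for building domain knowledge automatically. It is built on the I3S (Intelligent Internet Information System) [8] platform and reaches this goal in three stages. First, I3DDC (Domain Data Collector) collects domain-related web pages quickly and effectively. Next, I3DME (Domain Metadata Extractor) extracts the important domain-related metadata and domain concepts. Then I3CF (Catalog Fusion), a component of I3DKF (Domain Knowledge Fusioner), merges and moves catalogs in two modes, Concept-Based CF and Object-Based CF. Finally, the proposed Ontology Fusion method analyzes the relatedness between catalogs and further extracts the relations between catalogs and objects, which extends the catalog structure into an ontology.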

Keywords

Ontology Fusion; Catalog Fusion; Information Extraction; Data Mining; Domain Portal

Parallel Abstract

In this thesis, we apply web mining techniques to automatically collect information from a large number of scattered websites and page directories in order to extract highly domain-related information. An intelligent system is then proposed to analyze the structures of websites and directories and to extract the structural knowledge as simple ontologies for the domain. Each simple ontology corresponds to a portal directory that consists of relationships between “catalog and catalog”, “catalog and object”, and “object and object”. Fusing these ontologies into the Domain Ontology (DO) is feasible because they are extracted from domain-related websites and directories. Based on these three types of relationships, “concept-based”, “object-based”, and “relation-based” fusion approaches are proposed and integrated into the ontology-fusion process. Experiments show that the fused DO not only serves as the largest domain portal directory but is also better than any portal site at organizing structural knowledge and at browsing for and finding objects.
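The concept-based and object-based fusion ideas can be illustrated with a small sketch. This is not the thesis's I3CF implementation. It assumes a flat catalog represented as a mapping from category name to a set of object identifiers, treats "same normalized name" as the concept match, and uses Jaccard overlap with a threshold of 0.5 as the object match. The names `fuse_catalogs`, `jaccard`, the threshold, and the sample data are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: category name -> identifiers of its objects.
Catalog = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two object sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse_catalogs(base: Catalog, other: Catalog, threshold: float = 0.5) -> Catalog:
    """Merge `other` into a copy of `base` (illustrative assumption only).

    Concept-based step: categories with the same normalized name are merged.
    Object-based step: otherwise a category is merged into the fused category
    whose objects overlap it most, if that overlap reaches `threshold`.
    Categories that match nothing are added as new entries.
    """
    fused: Catalog = {name: set(objs) for name, objs in base.items()}
    by_norm = {name.strip().lower(): name for name in fused}

    for name, objs in other.items():
        norm = name.strip().lower()
        if norm in by_norm:  # concept-based match on the category name
            fused[by_norm[norm]] |= objs
            continue

        best_name: Optional[str] = None
        best_score = 0.0
        for cand, cand_objs in fused.items():  # object-based match
            score = jaccard(objs, cand_objs)
            if score > best_score:
                best_name, best_score = cand, score

        if best_name is not None and best_score >= threshold:
            fused[best_name] |= objs
        else:  # no match: keep it as a new category
            fused[name] = set(objs)
            by_norm[norm] = name
    return fused


# Made-up sample data for two book sites.
site_a: Catalog = {"Novels": {"b1", "b2", "b3"}, "Computers": {"b4", "b5"}}
site_b: Catalog = {"novels": {"b3", "b6"}, "Programming": {"b4", "b5", "b7"}, "Travel": {"b8"}}
fused = fuse_catalogs(site_a, site_b)
print(sorted(fused))
```

In this sample, "novels" merges by name, "Programming" merges into "Computers" because the two share most of their objects, and "Travel" is kept as a new category. A relation-based step and hierarchical catalogs are outside the scope of this sketch.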

Parallel Keywords

Ontology Fusion; Catalog Fusion; Data Mining; Information Extraction

References

[1] Chinese MARC, http://catweb.ncl.edu.tw/.

[2] Chakrabarti, S., van den Berg, M. and Dom, B., “Focused crawling: A new approach to topic-specific web resource discovery,” Proceedings of the 8th World Wide Web Conference, Toronto, 1999.

[3] Glover, E., Flake, G., Lawrence, S., Birmingham, W. P., Kruger, A., Giles, C. L. and Pennock, D., “Improving category specific web search by learning query modifications,” Symposium on Applications and the Internet (SAINT), San Diego, CA, January 8–12, 2001.

[4] Dublin Core, “Dublin Core Metadata Initiative,” available at http://dublincore.org/.

[5] Findbook, http://findbook.tw/.
