

Conceptual Indexing and Knowledge Extraction Based on Automatic Construction of Ontology

Advisor: 林宣華


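As Web content grows rapidly, reading web pages, documents, and e-books online has become the main way most people learn and acquire knowledge. Take the currently popular e-book as an example: most e-books on the market merely move the presentation of text from paper to an electronic device. Users still have to read in the traditional way, going through the table of contents and chapter text word by word and relying on the index and search to find concepts. If an intelligent system could analyze an e-book's content in advance, automatically build the book's ontology to present the important nouns and concepts in it (people, events, times, places, and things), index and analyze them, and build a Knowledge Map of the concepts, novice readers could see the structure of the whole book at a glance. This thesis integrates data mining, search engine, and Web techniques to automatically analyze the content of digitized books, extract important concepts, explore the relations among concepts, build a preliminary ontology of the book's content, and extend it into a knowledge or learning map of the book. This gives readers a quick overview of the book. A graphical Knowledge Map interface based on SVG (Scalable Vector Graphics) further provides user-friendly index information: by browsing the relations among people, events, times, places, and things, readers can jump quickly to the corresponding places in the book.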


With the explosive growth of Web content, reading web pages, documents, and the currently popular e-books online has become one of the main ways people learn and acquire knowledge. However, e-book technology focuses on reproducing the traditional way of reading on an e-book reader, so readers still go through the book's content word by word. Once a book's content is digitized, more intelligent reading services become possible. If a system analyzes the content in advance, builds the book's semantic ontology, and creates indexing information, it can provide a Knowledge Map that lets readers grasp the overall structure of the book at a glance. In this thesis, we integrate data mining, search engine, and Web information processing techniques to develop an intelligent system that automatically analyzes a book's textual content, extracts significant concepts, explores the relations among them, and builds a preliminary ontology of the book, which is then extended into the book's knowledge or learning map. In this way, readers can quickly gain an overview of the book's content. Based on SVG (Scalable Vector Graphics), we provide a graphical Knowledge Map through which readers can explore the relations among people, events, times, places, and things.
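The abstract does not say which algorithms the system uses, so the following is only a minimal sketch of the general idea, under stated assumptions: concepts are given as a fixed list, two concepts are treated as related when they occur in the same sentence, and the resulting graph is drawn as a simple circular SVG layout. The function names `cooccurrence` and `knowledge_map_svg`, the sample sentences, and the substring matching are all hypothetical and stand in for the thesis's actual concept extraction and relation mining.

```python
import math
from collections import Counter
from itertools import combinations
from xml.sax.saxutils import escape


def cooccurrence(sentences, concepts):
    """Count how often each pair of concepts appears in the same sentence.

    Assumption: plain substring matching stands in for real concept extraction.
    """
    pairs = Counter()
    for sentence in sentences:
        present = sorted(c for c in concepts if c in sentence)
        pairs.update(combinations(present, 2))
    return pairs


def knowledge_map_svg(concepts, pairs, size=300):
    """Place concepts on a circle and draw one line per related pair."""
    cx = cy = size / 2
    radius = size * 0.4
    pos = {}
    for i, c in enumerate(concepts):
        angle = 2 * math.pi * i / len(concepts)
        pos[c] = (cx + radius * math.cos(angle), cy + radius * math.sin(angle))

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for (a, b), n in pairs.items():
        (x1, y1), (x2, y2) = pos[a], pos[b]
        # Line thickness reflects the co-occurrence count.
        parts.append(
            f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" '
            f'stroke="gray" stroke-width="{n}"/>'
        )
    for c, (x, y) in pos.items():
        parts.append(f'<text x="{x:.1f}" y="{y:.1f}" text-anchor="middle">{escape(c)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical sample data covering people, places, and times.
sentences = [
    "Alice met Bob in Paris.",
    "Bob returned to Paris in 1920.",
    "Alice wrote a letter.",
]
concepts = ["Alice", "Bob", "Paris", "1920"]

pairs = cooccurrence(sentences, concepts)
svg = knowledge_map_svg(concepts, pairs)
```

Writing `svg` to a file with a `.svg` extension and opening it in a browser shows the concepts as labels connected by lines. A fuller system would replace the substring match with named entity recognition and attach links from each label back to the matching passages in the book.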


