Machine-Learning-Based Information Extraction for Building Semantic Web Metadata

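With the information explosion on the Web, people need an effective way to extract the information they actually need. The Semantic Web adds a metadata layer on top of the current World Wide Web (WWW) and uses metadata to describe resources on the Web. It extends the current Web by giving information a well-defined meaning, which lets people and computers process information cooperatively. In this thesis, we design and implement a system that extracts information from Chinese documents and provides semantic services. The architecture consists of two parts: the Chinese extraction components and the service front end. The extraction components are a set of modules for extracting information from Chinese documents, and they use machine learning to refine the system's Chinese grammar structures. The service front end provides several semantic services. After building the complete system, we evaluate it by the results it obtains when extracting information from Chinese documents, and we investigate which factors affect those results.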
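The metadata layer described above is usually expressed in RDF, where each statement is a (subject, predicate, object) triple. The abstract does not say how the system stores or serves its extracted metadata, so the following is only a minimal Python sketch under stated assumptions. It assumes the back end emits each extracted record as a dict with an `id` key plus slot/value pairs. The hypothetical `record_to_triples` helper turns such records into triples, and the hypothetical `TripleStore` class answers simple wildcard pattern queries, roughly the kind of lookup a semantic-service front end might perform. The `ex:` prefix and the sample records are illustrative placeholders, not the thesis's vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def record_to_triples(record: dict[str, str]) -> list[Triple]:
    """Convert one extracted record into triples.

    Hypothetical input format: an "id" key naming the resource, plus
    slot/value pairs produced by the (assumed) extraction back end.
    """
    subject = record["id"]
    return [
        Triple(subject, f"ex:{slot}", value)
        for slot, value in record.items()
        if slot != "id"
    ]


class TripleStore:
    """A tiny in-memory triple store (illustrative, not the thesis's design)."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add_record(self, record: dict[str, str]) -> None:
        # Duplicate triples collapse because the store is a set.
        self._triples.update(record_to_triples(record))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        # None acts as a wildcard, like an unbound variable in a query pattern.
        hits = (
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )
        # Sort so results are deterministic despite set iteration order.
        return sorted(hits, key=lambda t: (t.subject, t.predicate, t.obj))


if __name__ == "__main__":
    store = TripleStore()
    # Hypothetical extraction output for two events.
    store.add_record({"id": "ex:event1", "type": "Meeting", "location": "Taipei"})
    store.add_record({"id": "ex:event2", "type": "Meeting", "location": "Tainan"})
    for t in store.match(predicate="ex:type", obj="Meeting"):
        print(t.subject)
```

A real system would more likely use an RDF library and a SPARQL endpoint; plain dataclasses keep the sketch dependency-free while showing the same subject/predicate/object structure.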

Keywords

Machine Learning; Information Extraction; Semantic Web

Parallel Abstract

With the information explosion on the Web, people need an efficient way to extract the information they actually need. The Semantic Web is an emerging technology that builds a metadata layer on top of the current Web and uses a metadata description language to describe resources on the WWW. It is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. In this thesis, we design and implement a system that extracts information from Chinese documents and provides semantic services. The architecture consists of two parts: the Chinese Extraction Components (the back end) and the Service Front End. The extraction components comprise several modules for extracting information from Chinese documents and use machine learning to build Chinese grammar structures. The Service Front End provides several semantic services. After building the whole system, we evaluate it by extracting events of a specific domain from relevant documents, and we identify the factors that influence the extraction results.
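The abstract does not name the evaluation metrics. If, as is common in information extraction, the extracted events were compared against a hand-labeled gold standard, precision, recall and F1 would be natural measures. The Python sketch below assumes that setup and represents each extracted fact as a hypothetical (event id, slot, value) tuple; the function names and sample data are illustrative only. Breaking the scores down per slot is one simple way to see which kinds of information an extractor misses, which relates to the goal of identifying factors that affect extraction quality.

```python
from collections.abc import Iterable

# Hypothetical representation of one extracted fact: (event id, slot, value).
Fact = tuple[str, str, str]


def precision_recall_f1(
    predicted: Iterable[Fact], gold: Iterable[Fact]
) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 of extracted facts against a gold standard."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def per_slot_scores(
    predicted: Iterable[Fact], gold: Iterable[Fact]
) -> dict[str, tuple[float, float, float]]:
    """Score each slot separately, which helps locate where extraction fails."""
    pred_set, gold_set = set(predicted), set(gold)
    slots = sorted({fact[1] for fact in pred_set | gold_set})
    return {
        slot: precision_recall_f1(
            (f for f in pred_set if f[1] == slot),
            (f for f in gold_set if f[1] == slot),
        )
        for slot in slots
    }


if __name__ == "__main__":
    # Hypothetical system output and gold annotations.
    predicted = {("e1", "location", "Taipei"), ("e1", "date", "2004"), ("e2", "location", "Tainan")}
    gold = {("e1", "location", "Taipei"), ("e2", "location", "Tainan"), ("e2", "date", "2005")}
    print(precision_recall_f1(predicted, gold))
    print(per_slot_scores(predicted, gold))
```

Here exact tuple equality decides whether a fact is correct; giving partial credit (for example, for overlapping text spans) would need a different comparison.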

Parallel Keywords

Machine Learning; Information Extraction; Semantic Web

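Cited By

趙堃廷 (2011). A Study of Protein Complex Classification Based on Machine Learning Methods [Master's thesis, National Formosa University]. Airiti Library. https://doi.org/10.6827/NFU.2011.00149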
