With the explosion of information on the Web, people need an effective way to extract the information they actually need. The Semantic Web builds a metadata layer on top of the current World Wide Web (WWW) and uses that metadata to describe the resources on the Web. It extends the current Web by giving information well-defined meaning, which better enables computers and people to work in cooperation.
In this thesis, we design and implement a system that extracts information from Chinese documents and provides semantic services. The architecture consists of two parts: the Chinese Extraction Components (the back end) and the Service Front End. The Chinese Extraction Components comprise several components for extracting information from Chinese documents and use machine learning to refine the system's Chinese grammar structures. The Service Front End provides several semantic services. After building the complete system, we evaluate it by extracting domain-specific events from relevant documents, and we examine which factors influence the extraction results.