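Documents come in many types and structures and hold a great deal of valuable information, and they make up a very large share of the information held by companies and government agencies. Finding data in these documents requires retrieval technology. The traditional approach relies on full-text retrieval, but full-text retrieval does not actually find the information we want. Current Chinese information retrieval systems can only take the search conditions (keywords) a user enters and return a set of related documents that may contain the desired information. The user must then filter those documents manually to find the answer. In most situations this does not meet users' needs, because they would rather the system answer their question directly than return a pile of documents. Chinese retrieval technology developed relatively late, and the Chinese language has its own particular characteristics. As a result, Chinese retrieval still differs considerably from the methods and techniques proposed by Western researchers, and those techniques may not transfer fully to a Chinese setting. To address the problem of Chinese information extraction, this study therefore proposes a new document characterization model. While it analyzes document content, the model also builds a document information index structure that meets the requirements of both information retrieval and information extraction. In addition, a new kind of concept tree is constructed so that document characteristics can be reorganized and reconstructed through the document schema and the concept tree. We also developed a Chinese natural-language question answering system built on top of existing information retrieval systems. We believe it is the first fully automated, domain-independent Chinese natural-language question answering system on the Internet. In our approach, we represent Chinese documents by their characteristic factors, convert them into Entity-Relation-Entity (ERE) relation lists, and then use these relation lists to search for and answer questions. The system is very simple but performs well. Experimental analysis also shows that its question-answering accuracy keeps rising as it analyzes more ERE relation-list data.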
Conventional Chinese text retrieval systems return a ranked list of documents in response to a user's request. A ranked list of documents can sometimes be an appropriate response for the user, but frequently it is not. Usually it would be better for the system to provide the answer itself instead of requiring the user to search for the answer in a set of documents. Chinese text retrieval developed relatively late, and the Chinese language has various specific characteristics. For these reasons, its retrieval approaches differ considerably from those proposed for Western languages. We therefore propose a document characterization model, EAVR, to address the Chinese text retrieval problem. In the EAVR conceptual model, an information index structure that satisfies the requirements of both information retrieval and information extraction is established during the content analysis stage. In addition, a new type of concept tree allows document characteristics to be reorganized and reconstructed through the document schema and the concept tree. We also developed an architecture that augments existing search engines so that they support Chinese natural language question answering. In this dissertation we describe a new approach to building a Chinese question answering system. We believe the result is the first general-purpose, fully automated Chinese question answering system available on the web. In our approach, we represent Chinese text by its characteristics, convert it into ERE (E: entity, R: relation) relation data lists, and then answer questions through the ERE relation model. The techniques it uses are simple, yet our system performs quite well. Experimental results show that question-answering accuracy improves substantially as more matching ERE relation data lists are analyzed. Simple ERE relation data extraction techniques work well in our system, making it efficient to use with many back-end retrieval engines.
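To make the idea of answering questions "through the ERE relation model" more concrete, here is a minimal, hypothetical sketch. It is not the system described in this dissertation. It assumes that Chinese word segmentation and ERE extraction have already produced (entity, relation, entity) triples. It also assumes a question can be written as a triple with exactly one unknown entity slot. The names `ERE` and `answer` are illustrative only. Candidate answers are ranked by how many matching triples support them. This simple voting scheme is an assumption chosen to mirror the observation that accuracy rises as more matching ERE lists are analyzed.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple, Optional


class ERE(NamedTuple):
    """One Entity-Relation-Entity triple (hypothetical representation)."""
    head: str
    relation: str
    tail: str


def answer(question: tuple[Optional[str], str, Optional[str]],
           facts: list[ERE]) -> list[tuple[str, int]]:
    """Rank candidate answers for a question with one unknown entity (None).

    Each triple whose relation and known entity match the question casts one
    vote for the entity filling the unknown slot, so candidates supported by
    more triples rank higher (an assumed scoring rule, for illustration).
    """
    head, relation, tail = question
    if (head is None) == (tail is None):
        raise ValueError("exactly one entity slot must be unknown")
    votes: Counter[str] = Counter()
    for fact in facts:
        if fact.relation != relation:
            continue  # relation must match exactly in this simple sketch
        if head is None and fact.tail == tail:
            votes[fact.head] += 1
        elif tail is None and fact.head == head:
            votes[fact.tail] += 1
    # Highest vote count first; ties keep first-seen order.
    return votes.most_common()


if __name__ == "__main__":
    facts = [
        ERE("台北101", "位於", "台北市"),
        ERE("台北101", "位於", "信義區"),
        ERE("台北101", "位於", "台北市"),
    ]
    print(answer(("台北101", "位於", None), facts))  # [('台北市', 2), ('信義區', 1)]
    print(answer((None, "位於", "台北市"), facts))   # [('台北101', 2)]
```

In this sketch, adding more triples that agree on an answer widens its lead over competing candidates. That is one plausible way more relation data can improve accuracy. A real system would also need relation synonym handling and partial entity matching, which are omitted here.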