
中文零代詞解析與應用

Chinese Zero Anaphora Resolution and Its Applications

Advisor: 葉慶隆

Abstract


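In everyday speech and writing, we often use an anaphor to stand in for an expression that has already been mentioned. Anaphora resolution is the task of finding, within the text, the antecedent to which an anaphor refers. Anaphora resolution also plays an important role in a number of natural language processing applications. In this thesis, we propose a Chinese zero anaphora resolution method based on centering theory and apply it to several natural language processing applications: text classification, information retrieval, and the construction of topic map information. The research on Chinese zero anaphora resolution proceeds in two stages. First, drawing on the Chinese linguistics literature that describes the zero anaphora phenomenon and on computational theories of anaphora resolution, we develop a resolution method for Chinese zero anaphora. Next, we implement a Chinese zero anaphora resolution system based on this method and run experiments on real news documents to verify the system's resolution ability.

To test the effect of Chinese zero anaphora resolution on natural language processing applications, we build a text classification system and an information retrieval system, and we run experiments using news from the China Times Express (中時晚報) and the Central Daily News (中央日報) as test documents. In the text classification experiments, we first apply zero anaphora resolution to the input query documents and then observe how much the classification accuracy improves. In the information retrieval experiments, we propose a sentence-level topic identification method based on zero anaphora resolution, use it to identify the topic of each sentence in the test documents, and then observe how much the recall rate and precision rate improve. Beyond these two applications, we also use the topic identification method to propose a way of building topic map information, attempting to create the topic information in a topic map by obtaining document topics automatically.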

Parallel Abstract


Anaphora resolution is the task of determining the antecedent of an anaphor, which may take a zero, pronominal, or nominal form. It plays an increasingly important role in a number of natural language processing (NLP) applications, including machine translation, information retrieval, and text summarization. In this thesis, we investigate the computational resolution of zero anaphora in Chinese text and apply the resolution method to NLP applications to examine its performance. The work on zero anaphora resolution is divided into two steps. First, we investigate the linguistic behavior of Chinese zero anaphora and computational approaches to anaphora resolution in order to develop a method for Chinese zero anaphora resolution. Second, we implement a zero anaphora resolution system according to the results of the first step and evaluate it on real news articles. Because zero anaphors are not expressed in the surface text, our method first detects the zero anaphors in each utterance and then identifies their antecedents in the preceding utterance. Having developed the resolution method, we adopt it as a basis for improving the accuracy of NLP applications. A text categorization system integrates the zero anaphora resolution process to recover omitted anaphors in the query text. An information retrieval system employs a topic identification method to recover the omitted topics of documents in the text collection and thereby create better indices. The topic identification method builds on the notion of the centering model and on the zero anaphora resolution method, and it is further used to create the metadata of XML Topic Maps. The experiments for these applications are conducted on text collections taken from several newspapers, such as China Times Express and Central Daily News.
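
The two-step procedure described above (detect zero anaphors in an utterance, then look for antecedents in the preceding utterance) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Entity` and `Utterance` types, the `ROLE_RANK` ordering, and the assumption that detection has already produced a list of unexpressed grammatical roles are all hypothetical, chosen only to show a centering-style preference for more salient entities.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical salience ordering of grammatical roles (lower = more salient),
# loosely in the spirit of centering theory. The abstract does not state the
# ranking actually used in the thesis.
ROLE_RANK = {"topic": 0, "subject": 1, "object": 2, "other": 3}


@dataclass
class Entity:
    text: str
    role: str  # expected to be a key of ROLE_RANK


@dataclass
class Utterance:
    entities: List[Entity]  # overt noun phrases in the utterance
    # Roles left unexpressed; assumed to come from a separate detection step.
    missing_roles: List[str] = field(default_factory=list)


def rank(role: str) -> int:
    """Return the salience rank of a role; unknown roles rank last."""
    return ROLE_RANK.get(role, len(ROLE_RANK))


def resolve(utterances: List[Utterance]) -> List[Tuple[int, str, Optional[str]]]:
    """Link each zero anaphor to the most salient unused entity of the
    preceding utterance. Returns (utterance index, role, antecedent text)."""
    results = []
    prev_entities: List[Entity] = []
    for index, utt in enumerate(utterances):
        candidates = sorted(prev_entities, key=lambda e: rank(e.role))
        carried = list(utt.entities)
        # Resolve the most salient missing role first.
        for role in sorted(utt.missing_roles, key=rank):
            antecedent = candidates.pop(0) if candidates else None
            results.append((index, role, antecedent.text if antecedent else None))
            if antecedent is not None:
                # A resolved zero anaphor stays available to the next utterance.
                carried.append(Entity(antecedent.text, role))
        prev_entities = carried
    return results


demo = [
    Utterance([Entity("張三", "subject"), Entity("一本書", "object")]),
    Utterance([Entity("朋友", "object")], ["subject"]),  # zero subject
    Utterance([], ["subject"]),                          # zero subject again
]
print(resolve(demo))  # [(1, 'subject', '張三'), (2, 'subject', '張三')]
```

In this sketch a zero anaphor with no preceding utterance is left unresolved (`None`), and carrying resolved entities forward lets a chain of zero subjects keep pointing at the same referent.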

