Definite Abstract Noun Anaphora Resolution in Chinese Texts

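In text, anaphora is a common form of lexical substitution used to refer to something mentioned earlier. In Chinese documents, anaphora takes the form of pronominal anaphora, zero anaphora, and nominal anaphora, and the referent may be an abstract description or a named entity. In this thesis, we address definite abstract noun anaphora and propose a clause-based resolution procedure. Anaphor recognition and feature extraction draw on resources including Tongyici Cilin (同義詞詞林), the Academia Sinica 80,000-entry lexicon (中研院八萬目詞辭典), and related terms obtained from web search. We build a finite state machine for anaphor recognition, which achieves 90% recognition accuracy on 1,538 instances. We extract ten features of four types (position, distance, lexical, and semantic) as the basis for selecting antecedents. We perform resolution with a support vector machine (SVM) classifier and with a weight-based scoring method, and use a genetic algorithm to find the best feature combination. On 241 abstract noun anaphora instances, the SVM classifier achieves 40.66% accuracy for clause-level matches and 68.46% for sentence-level matches; the weight-based method achieves 42.32% and 70.54%, respectively.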
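To make the recognition step concrete, the following is a minimal sketch of a finite state machine that spots definite noun phrases in a tagged token sequence. The abstract does not describe the actual states or transitions, so everything here is an assumption for illustration: the tag names (`DET`, `CL`, `N_ABS`, `OTHER`), the pattern "demonstrative, optional classifier, abstract noun", and the function name `find_anaphors` are all hypothetical.

```python
# Hypothetical tag set (not from the thesis):
#   DET   = demonstrative such as 這 / 該 / 此
#   CL    = classifier (measure word)
#   N_ABS = abstract noun
#   OTHER = anything else
# Transition table: (current state, tag) -> next state.
TRANSITIONS = {
    ("START", "DET"): "AFTER_DET",
    ("AFTER_DET", "CL"): "AFTER_CL",
    ("AFTER_DET", "N_ABS"): "ACCEPT",
    ("AFTER_CL", "N_ABS"): "ACCEPT",
}


def find_anaphors(tagged):
    """Return (start, end) token spans, end exclusive, accepted by the FSM."""
    spans = []
    i = 0
    while i < len(tagged):
        state = "START"
        j = i
        # Run the machine from position i until it accepts or gets stuck.
        while j < len(tagged):
            next_state = TRANSITIONS.get((state, tagged[j][1]))
            if next_state is None:
                break
            state = next_state
            j += 1
            if state == "ACCEPT":
                break
        if state == "ACCEPT":
            spans.append((i, j))
            i = j  # continue scanning after the matched phrase
        else:
            i += 1
    return spans


# Toy input: "政府 提出 這 項 計畫 ， 該 問題"
tagged = [
    ("政府", "OTHER"), ("提出", "OTHER"),
    ("這", "DET"), ("項", "CL"), ("計畫", "N_ABS"),
    ("，", "OTHER"),
    ("該", "DET"), ("問題", "N_ABS"),
]
spans = find_anaphors(tagged)
phrases = ["".join(tok for tok, _ in tagged[s:e]) for s, e in spans]
```

In this toy run the machine accepts 這項計畫 and 該問題. A real recognizer would presumably also consult the lexical resources named in the abstract to decide which nouns count as abstract; that lookup is stubbed out here by the hand-assigned `N_ABS` tag.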

Keywords

anaphora resolution; abstract anaphora

Parallel Abstract

Anaphora is a common phenomenon in written text, in which a term refers back to a previously mentioned entity. Chinese texts contain pronominal anaphora, zero anaphora, and nominal anaphora, and the referents can be abstract objects or entities. In this thesis, we focus on definite abstract noun anaphora and propose a clause-based anaphora resolution procedure. Anaphor identification and feature selection draw on CILIN (Tongyici Cilin), the CKIP lexicon, and Google search results. Anaphor recognition with a finite state machine achieves 90% precision on 1,538 instances. We extract four types of features for classifying candidate antecedents: position, distance, lexical, and semantic features. These features are used to build SVM classifiers and a weighted model for resolving anaphora, and the best feature set is found by a genetic algorithm. On 241 definite anaphora instances, the SVM classifier achieves 40.66% accuracy at the clause level and 68.46% at the sentence level, while the weighted method achieves 42.32% at the clause level and 70.54% at the sentence level.
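The weighted model can be pictured as a linear score over candidate clauses, with the highest-scoring clause chosen as the antecedent. The sketch below is only an illustration under stated assumptions: the abstract names the four feature types but not the individual features, their encodings, or the weights, so the field names, the `1 / (1 + distance)` decay, and the values in `WEIGHTS` are hypothetical placeholders. In the thesis the feature combination is tuned by a genetic algorithm, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    clause: str                # text of the candidate antecedent clause
    is_sentence_initial: bool  # position feature (hypothetical encoding)
    distance: int              # clauses between candidate and anaphor
    lexical_overlap: float     # lexical feature, assumed in [0, 1]
    semantic_sim: float        # semantic feature, assumed in [0, 1]


# Placeholder weights chosen by hand for the example, not taken from the thesis.
WEIGHTS = {"position": 1.0, "distance": 2.0, "lexical": 1.5, "semantic": 2.5}


def score(c, weights=WEIGHTS):
    """Weighted sum of one feature per feature type."""
    return (
        weights["position"] * (1.0 if c.is_sentence_initial else 0.0)
        + weights["distance"] * (1.0 / (1 + c.distance))  # nearer clauses score higher
        + weights["lexical"] * c.lexical_overlap
        + weights["semantic"] * c.semantic_sim
    )


def resolve(candidates, weights=WEIGHTS):
    """Pick the candidate clause with the highest weighted score."""
    return max(candidates, key=lambda c: score(c, weights))


near = Candidate("clause A", is_sentence_initial=False, distance=0,
                 lexical_overlap=0.0, semantic_sim=0.2)
far = Candidate("clause B", is_sentence_initial=True, distance=1,
                lexical_overlap=0.5, semantic_sim=0.8)
best = resolve([near, far])
```

With these placeholder numbers the farther clause wins because its lexical and semantic evidence outweighs the distance penalty, which is the kind of trade-off the weights are meant to control.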

Parallel Keywords

anaphora resolution; abstract anaphora

