  • Thesis

Chinese Explicit and Implicit Discourse Analysis (中文顯性和隱性語篇關係分析之研究)

Advisor: Hsin-Hsi Chen (陳信希)

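Abstract


In recent years, as natural language processing research at the character and word levels has matured, and as large-scale discourse relation corpora such as PDTB and RST-DT have become available, research on discourse relations has grown. Correctly predicting discourse relations helps in understanding the semantic relations across a whole text, and it also benefits natural language processing applications such as QA systems and automatic summarization. However, because corpus resources for Chinese are lacking, there is still little research on Chinese discourse relations. In this thesis, we first conduct a preliminary analysis of the HIT-CIR Chinese discourse relation corpus released by Harbin Institute of Technology in 2013. Because that dataset is sparse, we then switch to another large-scale pseudo dataset as the training set. The experimental results show that training the model on a large-scale corpus helps it predict relations in text from different sources. Finally, we further analyze classification performance on explicit and implicit discourse relations, and examine whether the non-primary discourse markers surrounding a discourse unit are related to the discourse relation of the sentence itself.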

Parallel Abstract


In recent years, research in natural language processing at the word and phrase levels has matured. Since large-scale manually annotated corpora of discourse relations such as PDTB and RST-DT have been released, the study of discourse relations has been increasing. Correctly predicting the relations between discourse units would help in understanding the semantics of a whole text, and would also benefit related natural language processing applications such as QA systems and automatic summarization. However, due to the lack of Chinese corpus resources, there are still few studies of Chinese discourse relations. In this work, we first make a preliminary analysis of the HIT-CIR Chinese Discourse Relations Corpus, released by Harbin Institute of Technology in 2013. Because that dataset is small, we turn to another large-scale pseudo dataset as the training set. Experimental results show that training the model on this large-scale corpus helps predict the discourse relations of text from different sources. Finally, we further analyze classification performance on implicit and explicit discourse relations, and analyze whether non-primary markers are relevant to the discourse relation of the sentence.

