準則式中文句子重組還原

A Principle Based Approach for Chinese Word Reordering

Advisor: 顏嗣鈞
Co-advisor: 許聞廉 (Wen-Lian Hsu)

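Abstract


In the sentence reordering problem, a fluent sentence is randomly shuffled into a bag-of-words representation, and the words must then be rearranged to recover a fluent sentence. The aim is to improve the grammaticality and fluency of machine-generated text. Although several studies address sentence reordering in English natural language processing, none has yet addressed it for Chinese. Building on the English-language work, this thesis changes the language model and departs from the beam search used previously: word-class (POS) templates built from the training corpus lower the search cost and integrate more closely with BERT, the language model we use. The approach obtains a BLEU score of 0.82 on a Chinese treebank dataset. During the research we also trained an EHowNet classifier that clusters BERT word vectors and projects them onto the correct category. This effectively handles out-of-vocabulary entries in the database and is very helpful for knowledge-based natural language processing.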

Parallel Abstract


The so-called sentence reordering problem consists of randomly scrambling the words of a well-ordered sentence into a bag-of-words model and then permuting the tokens to restore a well-ordered sentence. The purpose is to increase the fluency and grammaticality of machine-generated sentences. There has been a good deal of research on sentence reordering in English natural language processing (NLP), but none so far for Chinese. This thesis builds on the related work in English, modifies the language model, and replaces the existing beam search method. POS patterns established from the training corpus effectively reduce the search effort and combine well with the BERT language model, yielding a BLEU score of 0.82 on the Chinese Treebank. In addition, an EHowNet classifier built on clusters of BERT word vectors projects words onto the correct category, which addresses the out-of-vocabulary problem in the database and is a great help to knowledge-based NLP.
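The idea of using POS patterns to shrink the search space can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the two-sentence toy corpus, the coarse tags `N` and `V`, and the bigram-count scorer (a stand-in for the BERT language model) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations, product


def build_templates(tagged_corpus):
    """Count every POS sequence observed in the training sentences."""
    templates = Counter()
    for sentence in tagged_corpus:
        templates[tuple(tag for _, tag in sentence)] += 1
    return templates


def candidate_orders(bag, templates):
    """Yield only the word orders whose POS sequence matches a known template."""
    bag_tags = Counter(tag for _, tag in bag)
    by_tag = defaultdict(list)
    for word, tag in bag:
        by_tag[tag].append(word)
    tags = sorted(by_tag)
    for template in templates:
        # A template is usable only if it needs exactly the tags in the bag.
        if Counter(template) != bag_tags:
            continue
        # Try every way of assigning same-tag words to that tag's slots.
        for choice in product(*(permutations(by_tag[t]) for t in tags)):
            fillers = {t: iter(words) for t, words in zip(tags, choice)}
            yield [next(fillers[t]) for t in template]


def make_bigram_scorer(tagged_corpus):
    """Toy fluency scorer (assumption): counts bigrams seen in training."""
    bigrams = Counter()
    for sentence in tagged_corpus:
        words = [word for word, _ in sentence]
        bigrams.update(zip(words, words[1:]))
    return lambda order: sum(bigrams[pair] for pair in zip(order, order[1:]))


def reorder(bag, templates, score):
    """Return the best-scoring template-compatible order, or None if none fit."""
    return max(candidate_orders(bag, templates), key=score, default=None)


# Hypothetical toy data: "I like cats" and "he eats fish".
corpus = [
    [("我", "N"), ("喜歡", "V"), ("貓", "N")],
    [("他", "N"), ("吃", "V"), ("魚", "N")],
]
templates = build_templates(corpus)
score = make_bigram_scorer(corpus)

bag = [("貓", "N"), ("我", "N"), ("喜歡", "V")]  # shuffled input
candidates = list(candidate_orders(bag, templates))
best = reorder(bag, templates, score)
print(len(candidates), best)
```

In this toy case the single template `N V N` leaves two candidate orders to score instead of all six permutations of the three words, which is the kind of saving the abstract attributes to POS patterns. A bag containing repeated identical words would produce duplicate candidates in this sketch.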

