Assisting Non-Native Chinese Learners in Correcting Preposition Selection Errors

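The number of learners of Chinese is surging, and knowing Chinese vocabulary, understanding Chinese grammar, and being well versed in Chinese-language culture have clearly become advantages. This level of skill will be essential to future trends. A survey of natural language research, however, finds very few studies on Chinese grammar; moreover, the choice of preposition in Chinese determines the meaning that a sentence expresses. By contrast, research on English prepositions has grown considerably in recent years and is applied quite widely, so that learning English is no longer limited by space or time and has advanced to the point where artificial intelligence can assist. This study targets the deviations in meaning caused by wrong preposition choices when foreigners learn Chinese. It takes the HSK (Hanyu Shuiping Kaoshi, the Chinese proficiency test) corpus as its starting point, uses authentic language data from foreigners learning Chinese as the object of study, selects sentences whose main error is a preposition error, and corrects them with different language models. The reference corpus is CGW (Chinese Giga Word), a very large dataset collected by Academia Sinica, which covers a very wide range of Chinese text, including the Wall Street Journal, Central News Agency news, and so on. Using this large-scale data, language models are built with different strategies to form a Chinese language module dedicated to correcting preposition errors. For the prepositions the study focuses on, and without manual correction, the best module reaches an accuracy of nearly 68% in preposition selection on text written by native Chinese speakers, and corrects 45% of the sentences written by foreigners.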
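The abstract does not specify which language models or which prepositions were used, so the following is only a minimal sketch of the general idea of language-model-based preposition selection. It assumes an add-one smoothed bigram model, a hypothetical candidate set (`CANDIDATES`), and a tiny invented training corpus standing in for CGW; none of these come from the study itself.

```python
import math
from collections import Counter

# Hypothetical candidate set; the abstract does not list the target prepositions.
CANDIDATES = ["在", "從", "對", "給"]


def train_bigram(sentences):
    """Count unigrams and bigrams over pre-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def choose_preposition(tokens, index, unigrams, bigrams, candidates=CANDIDATES):
    """Try each candidate at position `index` and keep the highest-scoring one."""
    def score(candidate):
        replaced = tokens[:index] + [candidate] + tokens[index + 1:]
        return log_prob(replaced, unigrams, bigrams)
    return max(candidates, key=score)


# Toy stand-in for the reference corpus (invented sentences, already segmented).
corpus = [
    ["我", "在", "學校", "學習", "中文"],
    ["他", "在", "家", "看", "書"],
    ["她", "從", "台北", "來"],
    ["我", "給", "他", "一", "本", "書"],
]
unigrams, bigrams = train_bigram(corpus)

# Invented learner sentence with a questionable preposition at position 1.
learner = ["我", "從", "學校", "學習", "中文"]
best = choose_preposition(learner, 1, unigrams, bigrams)
print(best)  # 在, under this toy corpus
```

A model trained on a corpus the size of CGW would need word segmentation, higher-order n-grams, and better smoothing; the sketch only shows the substitute-and-rescore step.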

Keywords

Chinese grammar correction; Chinese preposition selection; HSK corpus; language model

Parallel Abstract

The number of Chinese language learners is growing rapidly. Knowing Chinese vocabulary, understanding Chinese grammar, and comprehending Chinese culture have become an advantage in the world, and these skills will become a necessary standard. However, little research has been dedicated to detecting and correcting Chinese grammatical errors. Moreover, in Chinese the choice of preposition largely determines the meaning a sentence conveys. Preposition research for English, by contrast, has been extensive and is widely applied, so that learning English is no longer restricted by time and space. This research focuses on preposition errors, mainly those made by learners of Chinese as a second language (CSL). The experimental dataset is extracted from the HSK Dynamic Composition Corpus built by Beijing Language and Culture University, which reflects real CSL writing; the sentences that contain preposition errors are selected. Using Chinese Giga Word as the standard dataset, we train different language models to choose the correct preposition. Without any rule-based model, our best model for selecting the proper preposition reaches nearly 68% accuracy in the L1 context and 45% on the L2 dataset.

Parallel Keywords

Chinese Grammar Correction; Chinese Preposition Selection; HSK Corpus; Language Model

