

Identifying Common Erroneous Patterns for Auto Editing

Advisor: Shou-De Lin (林守德)


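The goal of this study is to extract useful rules from a sentence-aligned corpus and to demonstrate how these rules can be used to build an automatic correction system. The proposed framework first computes the edit distance between sentences to locate the parts in which they differ, and then constructs correction rules from those differing parts. We believe these differing parts play an important role in the correction process. In addition, we remove unnecessary rules so that the remaining rules are more general. Finally, we use part-of-speech information so that the rules can be applied to sentences that are different but similar in their part-of-speech form. The proposed framework is not restricted to a particular language and can easily be ported to other languages. Our experiments show that, among the top 1,500 rules found by the system, 67.2% were labeled "correct" or "mostly correct" by three master's students majoring in English.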


This paper describes a framework for extracting effective correction rules from a sentence-aligned corpus and shows a practical application: auto-editing with the discovered rules. The framework computes the Levenshtein distance between paired sentences to identify the key parts of each rule, and then uses the editing corpus to filter, condense, and refine the rules. Rule candidates take the form A => B, where A is the erroneous pattern and B is the correct pattern. We also remove unnecessary rules so that the remaining ones are more general. Finally, we use POS (part-of-speech) information so that a rule can be applied to different sentences that share a similar POS form. The framework is language independent and can therefore be applied to other languages easily. An evaluation of the discovered rules shows that 67.2% of the top 1,500 ranked rules are annotated as correct or mostly correct by experts. Using these rules, we built an online auto-editing demo, available at http://mslab.csie.ntu.edu.tw/~kw/new_demo.html.
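The first step described above, locating the differing parts of a sentence pair and turning them into A => B candidates, can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes whitespace-tokenized sentences, a standard token-level Levenshtein alignment, and one token of surrounding context on each side of an edit. The function names `align` and `extract_rules` and the `context` parameter are hypothetical. The filtering, ranking, and POS generalization steps are not shown.

```python
def align(src, tgt):
    """Token-level Levenshtein alignment.

    Returns a list of (op, i, j) steps, where op is one of
    "keep", "sub", "del", "ins" and i, j index into src and tgt.
    """
    n, m = len(src), len(tgt)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete src token
                             dist[i][j - 1] + 1,         # insert tgt token
                             dist[i - 1][j - 1] + cost)  # keep or substitute
    # Walk back from the bottom-right corner to recover one optimal edit path.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            ops.append(("keep", i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("sub", i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", i - 1, j)); i -= 1
        else:
            ops.append(("ins", i, j - 1)); j -= 1
    ops.reverse()
    return ops


def extract_rules(src, tgt, context=1):
    """Turn each maximal run of edits into an (A, B) rule candidate.

    `context` (an assumption of this sketch) is the number of alignment
    steps taken on each side of the edited run as surrounding context.
    """
    ops = align(src, tgt)
    rules, k = [], 0
    while k < len(ops):
        if ops[k][0] == "keep":
            k += 1
            continue
        start = k
        while k < len(ops) and ops[k][0] != "keep":
            k += 1
        a, b = [], []
        for op, i, j in ops[max(0, start - context):k + context]:
            if op in ("keep", "sub", "del"):
                a.append(src[i])  # token appears on the erroneous side
            if op in ("keep", "sub", "ins"):
                b.append(tgt[j])  # token appears on the corrected side
        rules.append((tuple(a), tuple(b)))
    return rules


if __name__ == "__main__":
    wrong = "He go to school yesterday".split()
    right = "He went to school yesterday".split()
    for a, b in extract_rules(wrong, right):
        print(" ".join(a), "=>", " ".join(b))  # He go to => He went to
```

Keeping a token of context on each side makes a candidate such as "He go to => He went to" more specific than the bare substitution "go => went". How much context to keep, and how to condense rules that overlap, are exactly the choices that the filtering and refinement steps would have to settle.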


