Thesis


Web-Based Semantic Processing for Self-Paced Language Learning and Assessment

Advisor: 張俊盛


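Computer Assisted Item Generation is a research area in Natural Language Processing that has emerged only recently. It has considerable application potential and can supply the automated tools that Computer Assisted Language Learning urgently needs. Combining natural language processing with Web-as-Corpus techniques to support academic reading has also become a major research topic in recent years. Its goals are to make reading content richer and easier to absorb and to support the semi-automatic generation of items for Reading Comprehension Tests.

In this thesis, we propose a novel web-based approach to the Semantic Processing of Learned Genre texts. In the training stage, we first apply part-of-speech tagging and base-phrase chunking to a randomly selected reading passage. We then extract the passage's keywords and its verb-noun (V-N) collocations, which are used for Word Sense Disambiguation and Paraphrasing of the original text. We implemented the method and tested it on twenty randomly selected TOEFL reading passages, with separate evaluation schemes for noun sense disambiguation and verb paraphrasing. Under stricter evaluation conditions, noun sense disambiguation achieved a precision of nearly 60%, and verb paraphrasing achieved a coverage of 85%. The experimental results indicate that web-based semantic processing has considerable application potential. It avoids the data-sparseness problem that traditional corpus-based semantic analysis faces, and it can draw on rich, diverse, and continuously updated web resources to assist and enhance Self-Paced language learning.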
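The preprocessing step described above (part-of-speech tagging, base-phrase chunking, then V-N collocation extraction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes the passage has already been tagged with Penn Treebank tags by some tagger. It approximates chunking with a simple determiner/adjective/noun pattern. The function name `extract_vn_collocations` is hypothetical.

```python
from typing import List, Tuple

# Assumed input: (word, Penn Treebank tag) pairs produced by any POS tagger.
TaggedToken = Tuple[str, str]

# Tags allowed between a verb and its object's head noun (a crude stand-in for base-phrase chunking).
MODIFIER_TAGS = {"DT", "PDT", "JJ", "JJR", "JJS", "CD", "PRP$", "POS"}


def extract_vn_collocations(tokens: List[TaggedToken]) -> List[Tuple[str, str]]:
    """Return (verb, noun) pairs where a noun phrase directly follows a verb.

    The head noun is taken to be the last noun in the contiguous noun run,
    e.g. 'the interest rate' -> 'rate'. No lemmatization is done here.
    """
    pairs = []
    for i, (word, tag) in enumerate(tokens):
        if not tag.startswith("VB"):
            continue
        j = i + 1
        # Skip determiners, adjectives, numbers and possessives before the noun.
        while j < len(tokens) and tokens[j][1] in MODIFIER_TAGS:
            j += 1
        head = None
        # Walk the noun run; the last noun seen is treated as the head.
        while j < len(tokens) and tokens[j][1].startswith("NN"):
            head = tokens[j][0]
            j += 1
        if head is not None:
            pairs.append((word.lower(), head.lower()))
    return pairs


if __name__ == "__main__":
    sentence = [("Scientists", "NNS"), ("conducted", "VBD"), ("a", "DT"),
                ("careful", "JJ"), ("study", "NN"), ("of", "IN"),
                ("ancient", "JJ"), ("climates", "NNS"), (".", ".")]
    print(extract_vn_collocations(sentence))  # [('conducted', 'study')]
```

A fuller pipeline would also lemmatize the verb ("conducted" to "conduct") before building web queries, and would use a real chunker rather than this tag pattern.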


Keywords: semantic processing, word sense disambiguation, paraphrasing


There has been increasing interest in exploiting Natural Language Processing (NLP) technology in Computer Assisted Language Learning (CALL). Advances have been made in the automatic rating of essays in standardized tests. There is also a need for programs that automatically generate test items which, after minor post-editing, can be used in self-paced learning and low-stakes testing. This paper presents a novel NLP-based approach that facilitates reading in self-paced online learning and assists the semi-automatic generation of test items for reading comprehension tests (RCTs). The method involves identifying key words and key sentences, disambiguating the senses of the key words, paraphrasing parts of the sentences, and displaying the disambiguated keyword definitions and alternative paraphrases of verb phrases. To this end, word senses are transformed into a set of sense-related queries, which are combined with context information to collect disambiguation evidence or paraphrase data from the Web. We implement the proposed method based on the concept of Web-as-Corpus (WAC) for two semantic processing tasks: word sense disambiguation and paraphrasing. Evaluation on a set of official TOEFL reading passages suggests that the procedure is effective in terms of time, labor, and quality. The methodology shows potential for exploiting web-based data, turning authentic texts into enriched reading materials, and assisting the generation of effective test items for reading comprehension tests.
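The idea of turning word senses into sense-related web queries, combined with context, can be sketched as below. This is a hedged illustration, not the thesis's actual scoring method. Several pieces are assumptions: `hit_count` stands in for any search engine's result-count lookup, each sense is represented by a few hand-picked substitute terms, and the context is the verb of a V-N collocation. Scoring by average phrase hit count is also an illustrative choice. The same lookup is reused to rank candidate verbs as paraphrases for a given object noun. All function names and the counts in the usage example are hypothetical and made up.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a web search engine's result-count lookup.
HitCounter = Callable[[str], int]


def phrase_query(*words: str) -> str:
    """Quote the words so a search engine would match them as an exact phrase."""
    return '"' + " ".join(words) + '"'


def disambiguate(context_verb: str,
                 senses: Dict[str, List[str]],
                 hit_count: HitCounter) -> str:
    """Choose the noun sense whose sense-related terms co-occur most with the context verb.

    Each sense is represented by substitute terms (e.g. synonyms or gloss
    words); each term is paired with the verb to form one phrase query.
    """
    scores = {}
    for sense_id, terms in senses.items():
        counts = [hit_count(phrase_query(context_verb, t)) for t in terms]
        # Average rather than sum so senses listed with more terms are not favored.
        scores[sense_id] = sum(counts) / len(counts) if counts else 0.0
    # max() returns the first-listed sense when scores tie.
    return max(scores, key=scores.get)


def rank_paraphrases(noun: str,
                     candidate_verbs: List[str],
                     hit_count: HitCounter) -> List[str]:
    """Order candidate verbs by how often they occur with the noun; drop unattested ones."""
    counts = {v: hit_count(phrase_query(v, noun)) for v in candidate_verbs}
    # sorted() is stable, so tied candidates keep their input order.
    ranked = sorted(candidate_verbs, key=lambda v: counts[v], reverse=True)
    return [v for v in ranked if counts[v] > 0]


if __name__ == "__main__":
    # Made-up counts standing in for real search results.
    fake_hits = {
        '"rob financial institution"': 300, '"rob lender"': 1200,
        '"rob riverside"': 5, '"rob shore"': 40,
        '"perform research"': 700, '"carry out research"': 900,
        '"do research"': 2000,
    }
    hits = lambda q: fake_hits.get(q, 0)
    senses = {"bank/finance": ["financial institution", "lender"],
              "bank/river": ["riverside", "shore"]}
    print(disambiguate("rob", senses, hits))  # bank/finance
    print(rank_paraphrases("research", ["perform", "carry out", "do", "make"], hits))
    # ['do', 'carry out', 'perform']
```

Injecting `hit_count` as a function keeps the sketch independent of any particular search API and makes it testable with fixed counts. Real phrase queries would usually also need determiners (e.g. "rob a bank") and some normalization for overall term frequency.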


