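Author (Chinese): 楊媛茜
Author (English): Yuan-Chien Yang
Thesis title (Chinese): 應用於語言學習與測驗之網路為本語意分析
Thesis title (English): Web-Based Semantic Processing for Self-Paced Language Learning and Assessment
Advisor (Chinese): 張俊盛
Advisor (English): Jason S. Chang
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 936724
Year of publication (ROC calendar): 95
Graduation academic year (ROC calendar): 94
Language: English
Number of pages: 70
Keywords (Chinese): 語意分析; 語義辨析; 重述
Keywords (English): Semantic Processing; Word Sense Disambiguation; Paraphrasing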
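Computer Assisted Item Generation is a recently emerging research area within Natural Language Processing (NLP). It has considerable application potential and can supply the automated tools that Computer Assisted Language Learning (CALL) urgently needs. Combining NLP with Web-as-Corpus techniques to support academic reading has also become a major research focus in recent years. The goal is to make reading content richer and easier to absorb, and also to support the semi-automatic generation of items for reading comprehension tests (RCTs).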

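In this thesis, we propose a novel Web-based approach to the semantic processing of articles in the learned genre. In the training stage, we first apply part-of-speech analysis and base-phrase analysis to a randomly selected reading passage, and then extract its key words and verb-noun (V-N) collocations in order to perform word sense disambiguation and paraphrasing on the original text. We implemented the method as a program, tested it on twenty randomly selected TOEFL reading passages, and designed separate evaluation schemes for noun sense disambiguation and verb paraphrasing. Under the stricter test condition, noun sense disambiguation achieved a precision of nearly 60%, and verb paraphrasing achieved a coverage of 85%. The experimental results show that Web-based semantic processing does have considerable application potential: it avoids the insufficient-information problem faced by traditional database-driven semantic processing, and it can make good use of rich, varied, and continuously updated Web resources to support and enhance self-paced language learning.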
There has been increasing interest in exploiting Natural Language Processing (NLP) technology in Computer Assisted Language Learning (CALL). Advances have been made in automatic rating of essays in standardized tests. There is also a need for automatic programs that generate test items that, after minor post-editing, are applicable in self-paced learning and low-stakes testing situations. This paper presents a novel NLP-based approach to facilitate the reading process of self-paced online learning, and to assist the semi-automatic generation of test items for reading comprehension tests (RCTs).
The method involves identifying key words and key sentences, disambiguating the senses of the key words, paraphrasing parts of the sentences, and displaying the disambiguated keyword definitions and the paraphrased verb-phrase alternatives. To do so, the senses of a word are transformed into a set of sense-related queries, which are combined with context information to collect disambiguation information or paraphrase data from the Web. We implement the proposed method based on the concept of Web-as-Corpus (WAC) for the semantic processing tasks of word sense disambiguation and paraphrasing. Evaluation on a set of official TOEFL reading passages suggests that the procedure is effective in terms of time, labor, and quality. Our methodology shows potential for exploiting Web-based data, turning authentic texts into enriched reading materials, and assisting the generation of effective test items for reading comprehension tests.
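The query-based disambiguation idea described above can be illustrated with a minimal sketch. The abstract does not specify the query format, the scoring rule, or the search back end, so everything below is an assumption made for illustration: the function names `build_queries` and `disambiguate`, the quoted-phrase query format, the averaging of hit counts, and the toy lookup table that stands in for a real Web search.

```python
from typing import Callable, Dict, List


def build_queries(word: str, sense_cues: List[str], context: List[str]) -> List[str]:
    """Combine the target word with each sense cue and each context word.

    The quoted-phrase format is an assumption, not the thesis's query format.
    """
    return [f'"{word}" "{cue}" "{ctx}"' for cue in sense_cues for ctx in context]


def disambiguate(
    word: str,
    senses: Dict[str, List[str]],
    context: List[str],
    hit_count: Callable[[str], int],
) -> str:
    """Return the sense whose queries get the highest average hit count."""
    scores: Dict[str, float] = {}
    for sense_id, cues in senses.items():
        queries = build_queries(word, cues, context)
        # Average rather than sum, so a sense with many cues is not favoured.
        total = sum(hit_count(q) for q in queries)
        scores[sense_id] = total / max(len(queries), 1)
    return max(scores, key=lambda s: scores[s])


# Toy stand-in for Web hit counts (hypothetical numbers, not real data).
TOY_COUNTS = {
    '"bank" "river" "water"': 120,
    '"bank" "slope" "erosion"': 40,
    '"bank" "deposit" "water"': 5,
    '"bank" "loan" "water"': 3,
}


def toy_hit_count(query: str) -> int:
    return TOY_COUNTS.get(query, 0)


# Sense cues would come from dictionary information; these are made up.
senses = {
    "bank#river": ["slope", "river"],
    "bank#finance": ["deposit", "loan"],
}
context = ["water", "erosion"]
best = disambiguate("bank", senses, context, toy_hit_count)
print(best)
```

With the toy table, the river sense wins because its queries co-occur far more often with the context words. A real system would replace `toy_hit_count` with calls to a search service and would need to handle rate limits and noisy counts.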
Abstract (in Chinese) i
ABSTRACT ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
1.1 Computer Assisted Extensive Reading 1
1.2 Computer Assisted Item Generation 3
1.3 Organization 9
Chapter 2 Related Work 10
Chapter 3 Web-Based Semantic Processing 15
3.1 Problem Statement 15
3.2 Transform Dictionary Information into Effective Queries 17
3.2.1 Prepare a Query Table for Disambiguating Key Noun Phrases 18
3.2.2 Prepare a Query Table for Paraphrasing Verb Phrases 22
3.3 A Web-based Procedure for Semantic Processing 25
3.3.1 Preprocess the Given Context 26
3.3.2 Disambiguate Word Sense of the Key Terms 27
3.3.3 Paraphrase Part of the Key Sentences 32
3.3.4 Output of the Alternative Sentences for the Extracted Key Sentences 35
Chapter 4 Experiments and Analysis 38
4.1 Training the Semantic Processing Query Tables 38
4.2 Evaluation Metrics 40
4.2.1 Metric for Key Word Sense Disambiguation 41
4.2.2 Metric for Verb Phrase Paraphrasing 41
4.3 Evaluation Results 43
4.3.1 Evaluation Result of Key Word Sense Disambiguation 43
4.3.2 Evaluation Result of Verb Phrase Paraphrasing 45
4.4 Discussion 47
4.4.1 Limitation and Future Development of the WSD Procedure 47
4.4.2 Limitation and Future Development of the Paraphrasing Procedure 53
Chapter 5 Conclusion and Future Work 56
5.1 Future Work 56
5.2 Conclusion 57
References 58
Appendix A – WordNet Glossary of Terms 62
Appendix B – An Example Article in the Learned Genre 63
Appendix C – Enriched Text Given in Appendix B 65
Appendix D – Verb Frames in WordNet 2.0 67
Appendix E – Evaluation Result of Noun Disambiguation (Sample of All-Word Setting) 68
Appendix F – Evaluation Result of Noun Disambiguation (Sample of Sample-Word Setting) 69
Appendix G – Evaluation Result of Verb Paraphrasing (Sample) 70