
Student: Chao, Wei-Cheng (趙偉成)
Title: Investigating Pretraining-Based Neural Networks for Machine Comprehension of Spoken Content (探討預訓練神經網路於語音內涵之機器閱讀理解)
Advisor: 陳柏琳
Degree: Master
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Year of Publication: 2019
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 57
Chinese Keywords: 口語問答、深度學習、遷移學習、多任務學習、模型壓縮
English Keywords: Spoken Question Answering, Deep Learning, Transfer Learning, Multi-Task Learning, Model Compression
DOI URL: http://doi.org/10.6345/NTNU201900461
Thesis Type: Academic thesis
    Research and development of artificial-intelligence personal assistants such as Alexa, Siri, Google Assistant, and Cortana have surged recently, along with many use cases around shopping, music, and more. With the growing demand for voice interfaces on mobile and virtual-reality devices, spoken language understanding has attracted considerable research attention. This thesis studies how to build systems that read a passage of text and answer comprehension questions. We regard reading comprehension as an important task for evaluating how well a system understands human language: if we can build high-performing reading comprehension systems, they will become a key technology for applications such as question answering and dialogue systems.
    Even though the user interface of a spoken language understanding system is a spoken query, most such systems assume that the required text is available independently, and the language understanding model is usually optimized independently of the speech recognizer. Although the accuracy of automatic speech recognition has improved in recent years, recognition errors still degrade language understanding performance. The problem becomes even more severe on AI devices, whose interactions tend to be more conversational.
    We aim to cover the essentials of neural reading comprehension and to present our efforts toward building effective neural reading comprehension models; more importantly, we examine what these models actually learn and how much depth of language understanding is required to solve the current tasks. We also summarize recent progress and discuss future directions and open problems in the field. In particular, we pursue three new research directions: multi-task models; using masked language models to mitigate the impact of speech recognition errors; and model compression via knowledge distillation. We implemented these ideas on Chinese listening (spoken-content) comprehension and demonstrated their effectiveness.
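    The knowledge-distillation objective behind the model-compression direction can be written as a temperature-softened soft-target term mixed with the usual hard-label loss. The sketch below is a minimal illustration of that standard formulation; the function name, temperature, and mixing weight are illustrative assumptions, not the exact configuration reported in the thesis.

    # Minimal sketch of a knowledge-distillation loss for compressing a large
    # "teacher" reader into a smaller "student" (after Hinton et al., 2015).
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft-target term: match the teacher's temperature-softened distribution.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-target term: ordinary cross-entropy against the gold labels.
        hard = F.cross_entropy(student_logits, labels)
        # alpha balances imitating the teacher against fitting the gold labels.
        return alpha * soft + (1.0 - alpha) * hard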

    While a future of interacting verbally with pervasive computers has not yet arrived, many strides toward it have been made in recent years. Intelligent assistants such as Alexa, Siri, and Google Assistant are becoming increasingly common. For assistants to be truly intelligent and help us with myriad daily tasks, AI should be able to answer a wide variety of questions beyond straightforward factual queries such as “Which artist sings this song?”.
    This thesis tackles the problem of reading comprehension: how to build computer systems that read a passage of text and answer comprehension questions. On the one hand, reading comprehension is an important task for evaluating how well computer systems understand human language. On the other hand, high-performing reading comprehension systems for spoken content would be a crucial technology for applications such as spoken question answering and dialogue systems. Language-model pretraining has led to significant performance gains, but automatic speech recognition errors and inference speed remain problems. To tackle these challenges, we propose multi-task fine-tuning and model compression. The experimental results show that our method significantly outperforms the baseline methods, along with a significant speedup in model inference.
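    A minimal sketch of the multi-task fine-tuning idea follows: one shared pretrained encoder feeds separate task-specific output heads, so the extractive and multiple-choice objectives can regularize each other. The class name, head names, and dimensions are assumptions for illustration rather than the thesis's actual architecture.

    # Minimal sketch: shared encoder with per-task heads for multi-task fine-tuning.
    import torch.nn as nn

    class MultiTaskReader(nn.Module):
        def __init__(self, encoder, hidden_size=768, num_choices=4):
            super().__init__()
            self.encoder = encoder                      # shared pretrained encoder (e.g. a BERT-style model)
            self.span_head = nn.Linear(hidden_size, 2)  # start/end logits for extractive QA
            self.choice_head = nn.Linear(hidden_size, num_choices)  # logits for multiple-choice QA

        def forward(self, embeddings, task):
            hidden = self.encoder(embeddings)           # (batch, seq_len, hidden_size)
            if task == "span":
                return self.span_head(hidden)           # per-token start/end scores
            return self.choice_head(hidden[:, 0])       # classify from the first ([CLS]) position

    # Example with a generic Transformer encoder standing in for the pretrained model:
    # layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
    # model = MultiTaskReader(nn.TransformerEncoder(layer, num_layers=2))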

    Chapter 1  Introduction 1
      1.1  Motivation 1
      1.2  Thesis Organization 6
      1.3  Contributions 7
    Chapter 2  Literature Review and Methods 8
      2.1  History of Reading Comprehension 8
      2.2  Task Description 15
      2.3  Reading Comprehension and Question Answering 18
      2.4  Datasets and Models 19
    Chapter 3  Neural Reading Comprehension Models 21
      3.1  Earlier Approaches: Feature-Based Models 21
      3.2  Pretrained Models 24
      3.3  Multi-Task Learning 24
      3.4  Masked Language Models 29
      3.5  Ensemble Learning and Knowledge Distillation 30
    Chapter 4  Experiments 35
      4.1  Experimental Setup 35
      4.2  Experimental Results 38
    Chapter 5  Conclusion 47
    References 49

