  • Thesis

Toward Accurate and Robust Question Answering Systems

Advisor: 陳縕儂 (Yun-Nung Chen)

Abstract


The main purpose of this thesis is to address problems in question answering (QA), a task researchers commonly use to test a model's natural language understanding and reasoning ability. We tackle two problems. First, we propose a simple and effective module that extends models originally limited to single-turn QA to the multi-turn setting. Second, we improve the robustness of QA models against adversarial attacks by designing a regularizer based on mutual information maximization.

Conversational multi-turn QA requires the model to understand the dialogue flow more deeply, and prior work improved performance by implicitly modeling the model's reasoning process. The first part of this thesis goes a step further: we propose to explicitly model the reasoning process so that the model can better capture information useful for answering questions. The model performs well on three datasets, QuAC, CoQA, and SCONE, significantly improving performance and demonstrating that the module can be applied to different kinds of models.

The second part of this thesis focuses on improving model robustness to adversarial examples. Although current QA models achieve excellent scores on conventional metrics, they are still easily fooled by specially crafted distractor sentences, casting doubt on whether these models truly understand the questions. To address this problem, we first focus on single-turn QA datasets and propose a regularizer that maximizes the mutual information among the question, the answer, and the passage. Our regularizer helps the model avoid answering questions merely through superficial correlations present in the dataset. Experimental results show that the model achieves state-of-the-art performance on the Adversarial-SQuAD dataset.

As for future work, incorporating vision, speech, and commonsense into QA models is an important direction, and further study of how to defend against adversarial attacks can help models gain a deeper understanding of questions and passages. In addition, semi-supervised and self-supervised learning for QA is an important research topic, since even a child does not need the massive datasets that current models require in order to learn to solve simple reading comprehension tests. Our future direction is to develop efficient, robust, and generalizable QA systems.
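The idea of explicitly modeling the reasoning process across dialogue turns can be sketched, very roughly, as comparing consecutive context representations and feeding the difference back to the answer predictor. The sketch below is an illustrative assumption only: it uses plain NumPy and a fixed subtraction rather than the thesis's learned module, and the names `information_gain` and `augment_with_gain` are hypothetical.

```python
import numpy as np

def information_gain(turn_reprs):
    """Model the information gained at each dialogue turn as the change
    between consecutive context representations.

    turn_reprs: (num_turns, seq_len, hidden) array of per-turn encodings.
    Returns an array of the same shape where gain[t] = repr[t] - repr[t-1],
    and gain[0] = repr[0] since there is no previous turn.
    """
    gains = np.empty_like(turn_reprs)
    gains[0] = turn_reprs[0]
    gains[1:] = turn_reprs[1:] - turn_reprs[:-1]
    return gains

def augment_with_gain(turn_reprs):
    """Concatenate each turn's representation with its explicit gain,
    doubling the hidden dimension fed to the answer predictor."""
    return np.concatenate([turn_reprs, information_gain(turn_reprs)], axis=-1)
```

In the actual model the comparison would be a learned function and the augmented representations would feed the span predictor; the subtraction here only illustrates where an explicit "information gain" signal could enter the architecture.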

Parallel Abstract (English)


The main purpose of this thesis is to solve problems related to question answering (QA), as QA is widely used for training and testing machine comprehension and reasoning. We focus on two problems concerning the generalization of single-turn QA models. First, we propose a simple and effective module that models the information gain in the reasoning process to extend single-turn QA models to the multi-turn setting. Second, we aim to improve the robustness of QA models to adversarially generated examples by designing a novel regularizer that utilizes mutual information maximization to guide the training process.

Multi-turn question answering in dialogue requires a deep understanding of the dialogue flow, and prior work proposed FlowQA to implicitly model the context representation during reasoning for better understanding. The first part of this thesis proposes to explicitly model the information gain through dialogue reasoning so that the model can focus on more informative cues. The proposed module is evaluated on two conversational QA datasets, Question Answering in Context (QuAC) and the Conversational Question Answering Challenge (CoQA), and one sequential instruction understanding dataset, Sequential Context-dependent Execution (SCONE), to show its effectiveness. The proposed approach achieves significant improvements over baselines on all three datasets and demonstrates its capability to generalize to different QA models and tasks.

The second part of this thesis focuses on improving the robustness of QA models to adversarial examples. Standard accuracy metrics indicate that modern reading comprehension systems have achieved strong performance on many question answering datasets. However, the extent to which these systems truly understand language remains unknown, and existing systems are not good at distinguishing distractor sentences, which look related but do not actually answer the question. To address this problem, we first focus on models trained on single-turn extractive QA datasets and propose QAInfomax, a regularizer for reading comprehension systems that maximizes the mutual information among passages, questions, and answers. QAInfomax regularizes the model so that it does not simply learn superficial correlations for answering questions. Experiments show that the proposed QAInfomax achieves state-of-the-art performance on the benchmark Adversarial-SQuAD dataset.

As for future work, QA can be extended to incorporate commonsense and features in multiple modalities, and studying how to defend against adversarial attacks in QA can lead models to a deeper understanding of questions and paragraphs. Moreover, semi-supervised and self-supervised approaches to QA are worth exploring, as even children do not need so much training data to learn how to solve these simple questions. Efficient, robust, and generalizable QA systems are our most important research direction.
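A mutual-information regularizer in this spirit can be sketched with a Jensen-Shannon-style estimator, as in Deep InfoMax: a critic scores aligned (context, answer) representation pairs against mismatched in-batch pairs, and the resulting estimate is maximized alongside the task objective. Everything below — the bilinear critic `W`, the shapes, and the negative-sampling scheme — is an illustrative assumption, not the thesis's actual QAInfomax implementation.

```python
import numpy as np

def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def js_mi_regularizer(answer_reprs, context_reprs, W):
    """Jensen-Shannon-style mutual-information estimate between answer
    and passage/question representations.

    answer_reprs, context_reprs: (n, d) arrays; row i of each comes from
    the same training example. W: (d, d) bilinear critic. Returns a
    scalar to be maximized, i.e. subtracted from the task loss.
    """
    n = answer_reprs.shape[0]
    scores = answer_reprs @ W @ context_reprs.T   # critic score for every pair
    idx = np.arange(n)
    pos = scores[idx, idx]                        # aligned (positive) pairs
    neg = scores[idx, (idx + 1) % n]              # in-batch mismatched negatives
    # JS lower bound: E_pos[-softplus(-T)] - E_neg[softplus(T)]
    return (-softplus(-pos)).mean() - softplus(neg).mean()
```

Maximizing this term pushes the critic to separate genuine pairs from mismatched ones, discouraging the reader from relying on superficial cues alone; a combined objective would look roughly like `span_loss - lam * js_mi_regularizer(...)`.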
