  • Thesis


Question Generation from Knowledge Base Using Deep Learning Model

Advisor: 孫春在


With the rise of data-driven models in recent years, the scarcity of corpora has become a major obstacle for natural language processing research. Publicly available Mandarin corpora are even scarcer than English ones. This thesis proposes to address the problem by combining the few existing question answering resources with a knowledge base to build a new Mandarin question answering corpus. We collect knowledge base data from CN-DBpedia and Mandarin question answering data from WebQA and a web crawler, and propose a method to combine the two sources in the form of pairs as training data. A sequence-to-sequence model is then trained to generate questions from the knowledge base, and each generated question is paired with the corresponding knowledge base entity as its answer to form the new corpus. In our experiments, we build a template-based question generation baseline and compare the two methods by human evaluation. Without any hand-crafted grammar rules, our model achieves results comparable to the template-based baseline and outperforms it in question complexity and in the plausibility of the question-answer pairs.
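The template-based baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual templates, so the `Triple` type, the `TEMPLATES` table, the fallback pattern, and the `generate_qa` helper below are all hypothetical, and English strings stand in for the Mandarin ones for readability. The sketch only shows the general idea of filling a per-predicate pattern with the subject of a knowledge base triple and using the object entity as the answer.

```python
from typing import NamedTuple, Tuple


class Triple(NamedTuple):
    """A knowledge base fact: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical hand-written templates, one per predicate.
# These are illustrative assumptions, not the templates used in the thesis.
TEMPLATES = {
    "capital": "What is the capital of {subject}?",
    "author": "Who is the author of {subject}?",
}

# Generic pattern used when no predicate-specific template exists (assumption).
FALLBACK = "What is the {predicate} of {subject}?"


def generate_qa(triple: Triple) -> Tuple[str, str]:
    """Return a (question, answer) pair for one triple.

    The question is produced by filling a template with the subject
    (and, for the fallback, the predicate); the answer is the object entity.
    """
    template = TEMPLATES.get(triple.predicate, FALLBACK)
    question = template.format(subject=triple.subject, predicate=triple.predicate)
    return question, triple.obj


if __name__ == "__main__":
    triples = [
        Triple("Japan", "capital", "Tokyo"),
        Triple("Hamlet", "author", "William Shakespeare"),
        Triple("Taiwan", "area", "36,197 square kilometers"),
    ]
    for question, answer in map(generate_qa, triples):
        print(question, "->", answer)
```

A sequence-to-sequence model would replace the lookup in `generate_qa` with a learned mapping from the triple to a question, which is why it needs no hand-written patterns.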


