
Methods and Comparison of Visual Question Generation with Textual Assistance

Evaluation of Visual Question Generation With Captions

Advisor: 莊永裕

Abstract


The main goal of this thesis is to generate a natural question about a given image. The inspiration comes from the abundance of multimedia on the web, such as videos on educational websites: if a computer could automatically generate questions after processing multimedia content, those questions could be given to students, and their answers would reveal how well they have learned. We chose to begin with still images, since success on images should later carry over to video applications. In the field of visual question generation, this thesis proposes a new pipeline for training deep learning models that combines an image with its textual caption, so that the model learns to ask questions the way a person would. Some prior work has fused image and caption data, but none of it addressed visual question generation; the first paper on visual question generation, published in 2016, did not incorporate text. In our experiments, we compare previous methods against our approaches that fuse images with captions, and the results show that adding textual features enables the model to ask more natural questions.

Parallel Abstract (English)


Over the last few years, there has been a great deal of research in the vision-and-language community, spanning many popular topics such as image captioning, video transcription, question answering about images or videos, Image-Grounded Conversation (IGC), and Visual Question Generation (VQG). In this thesis, we focus on question generation about images. Because images are so popular on social media, people routinely upload them with accompanying descriptions, and we believe such captions can help an Artificial Intelligence (AI) system learn to ask more natural questions. We propose new pipeline models for fusing visual and textual features, conduct experiments with different models, and compare the generated questions. Our experimental results show that captions are indeed useful for visual question generation.
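The fusion of visual and textual features described above can be sketched as follows. This is a minimal illustrative sketch only: the feature dimensions, the concatenation-plus-projection scheme, and the feature extractors (e.g. a CNN for the image, averaged word embeddings for the caption) are assumptions for illustration, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(image_feat, caption_feat, w):
    """Concatenate visual and textual features, then project them
    into a joint embedding that a question decoder could consume."""
    fused = np.concatenate([image_feat, caption_feat])
    return np.tanh(w @ fused)

# Toy dimensions (assumed): a 4096-d CNN image feature and a
# 300-d caption embedding, projected to a 512-d joint vector.
image_feat = rng.standard_normal(4096)   # e.g. CNN fc-layer activations
caption_feat = rng.standard_normal(300)  # e.g. averaged word embeddings
w = rng.standard_normal((512, 4096 + 300)) * 0.01

joint = fuse_features(image_feat, caption_feat, w)
print(joint.shape)  # (512,)
```

In practice, the joint vector would initialize or condition a recurrent decoder that emits the question word by word; the caption branch is what lets textual context influence the generated question.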

