  • Thesis

利用跨多媒體擬似配對資料於影像群生成食物評論

Food Review Generation for a Set of Images by Leveraging Cross-Media Pseudo Pairs

Advisor: 徐宏民

Abstract


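Because review information has become so important in recent years, how to provide important and informative reviews is a significant problem. To this end, we propose the idea of structural reviews, in which a passage of review text is paired with its corresponding images to provide more informative reviews, and we collect a large dataset. However, both the text reviews and the image data are quite noisy, so filtering out relatively unimportant information is an indispensable step. In addition, we use cluster analysis rather than classification to separate different foods, because no currently available dataset is suitable for training a classifier for our problem. After filtering out the unimportant information, we propose a two-stage training method that uses pseudo-paired data to avoid the cross-domain problem, and we use different fusion methods to handle inputs consisting of multiple images. With our method, the generated reviews are relatively stable and reasonable, with a relative improvement of 36% in food accuracy. Our method also shows a relative improvement when evaluated with BLEU, a metric of text quality.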

Keywords

Review, Two-stage Training, Pseudo Pair

Parallel Abstract


Reviews have become increasingly important in recent years, so providing users with informative reviews is an essential problem. We therefore propose structural reviews, which pair each review with its corresponding images to convey more information, and we collect a large dataset for this task. However, both the images and the text reviews are noisy, so filtering out relatively uninformative content is an essential step. In addition, we group images of the same food type by clustering rather than classification, because no existing food dataset is suitable for training a classifier for our task. After filtering the noise, we propose a two-stage training method with pseudo pairs to avoid the cross-domain issue, and we apply different fusion methods to handle inputs consisting of multiple images. With our method, the generated reviews are more stable, and food accuracy improves by about 36% in relative terms. Our method also achieves better scores on BLEU, a metric of text quality.
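The abstract does not say which fusion methods are used for multi-image input. As a rough illustration only, the sketch below shows two common choices, element-wise mean and element-wise max over per-image feature vectors. The function names `fuse_mean` and `fuse_max`, and the assumption that each image has already been encoded as a fixed-length list of floats, are hypothetical and not taken from the thesis.

```python
from typing import List, Sequence


def _check(features: Sequence[Sequence[float]]) -> int:
    """Validate the input and return the shared feature dimension."""
    if not features:
        raise ValueError("need at least one image feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    return dim


def fuse_mean(features: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion: average the per-image vectors element-wise."""
    dim = _check(features)
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(dim)]


def fuse_max(features: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion: keep the element-wise maximum across images."""
    dim = _check(features)
    return [max(f[i] for f in features) for i in range(dim)]


if __name__ == "__main__":
    # Two made-up 3-dimensional image features (illustrative values only).
    image_features = [[1.0, 0.0, 4.0], [3.0, 2.0, 0.0]]
    print(fuse_mean(image_features))
    print(fuse_max(image_features))
```

Either function maps a set of any size to a single vector of the same dimension, which is what lets a single-input review generator accept a variable number of images.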

Parallel Keywords

Review, Two-stage Training, Pseudo Pair

