

Netizen-Style Commenting on Fashion Photos - Dataset and Diversity Measures

Advisor: 徐宏民

Abstract


In recent years, image caption generation has achieved fairly good results. However, current methods still have shortcomings: they can only describe surface-level attributes of objects, such as style or color, and thus lack practical value, producing canned sentences devoid of the emotional connection found between people. We therefore propose Netizen Style Commenting (NSC) to generate distinctive comments for fashion photos. We strive to give the comments the vivid, lively style of Internet netizens, in the hope of strengthening the emotional connection with users. Our work consists of three main parts: first, a large-scale outfit-comment dataset; second, measures for evaluating diversity; and finally, a combination of a topic model with neural networks that makes up for the shortcomings of conventional methods.

Keywords

fashion; image captioning; commenting; deep learning; topic model

Abstract (English)


Recently, image captioning has achieved promising results. However, current works have several deficiencies: they have low utility, simply generating "vanilla" sentences that only describe shallow appearances (e.g., types, colors) in photos, lacking engagement and user intention. Therefore, we propose Netizen Style Commenting (NSC) to generate characteristic comments for a user-contributed fashion photo. We are devoted to modulating the comments in a vivid "netizen" style that reflects the culture of a designated social community, hoping to facilitate more engagement with users. In this work, we design a novel framework that consists of three major components: (1) we construct a large-scale clothing dataset named NetiLook to discover netizen-style comments; (2) we propose three unique measures to estimate the diversity of comments; (3) we bring diversity by marrying a topic model with neural networks to make up for the insufficiency of conventional image captioning works.
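The abstract does not specify the three diversity measures themselves. As a generic illustration of the kind of metric such work typically involves, the following sketch computes a distinct-n ratio (unique n-grams over total n-grams) across a set of generated comments; the function name and sample comments are hypothetical and are not the thesis's actual measures:

```python
def distinct_n(comments, n=2):
    """Ratio of unique n-grams to total n-grams across generated comments.

    A common, generic diversity proxy for text generation. This is an
    illustrative assumption, not one of the three measures proposed in
    the thesis (which the abstract leaves unspecified).
    """
    ngrams = []
    for comment in comments:
        tokens = comment.split()
        # Collect every contiguous n-gram from this comment.
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Hypothetical sample: repeated "canned" comments drag the score down,
# while varied comments push it toward 1.0.
comments = [
    "love this outfit so much",
    "love this outfit so much",
    "the color combo is amazing",
]
print(distinct_n(comments, n=2))
```

Repeated "vanilla" comments contribute no new n-grams, so a low distinct-n score flags exactly the kind of canned output the work aims to avoid.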

