
Analysis of Image Recall on Image-Text Intertwined Lifelog

Advisor: 陳信希

Abstract


Thanks to advances in technology, people can record their lives anytime and anywhere by taking photos with cameras or smartphones. However, photos alone cannot preserve complete information; text is needed to record the whole story and to keep specific details that complement the images. Many people therefore write image-text intertwined blogs to preserve their life memories. Yet popular blog platforms such as PIXNET offer no photo-recall function, and although Google Photos provides basic photo search, it does not support searching photos by the story context surrounding them. To the best of our knowledge, this is the first study of image recall on image-text intertwined lifelogs. We collect an image-text intertwined lifelog dataset, "Blog-travel", from PIXNET, and annotate it for image recall from five different points of view, imitating how people recall images. We also collect a larger dataset from PIXNET, "Blog-travel-large", for further training and comparison. In addition, we compare several image and text embedding encoders and propose an Image model and a Story model for image-recall retrieval. The Image model learns image-text embeddings without supervision, mapping images and text into the same space so that images can be retrieved by text. The Story model simply performs text-to-text retrieval over the stories near each image and maps the results to the neighboring images, achieving text-to-image retrieval. Because the two models are complementary, we combine them into a single Image-story model, which, when evaluated for image recall on Blog-travel, outperforms both Google Image Search and the best-performing image-text embedding model trained on the MSCOCO dataset. We further observe that different queries produce different distances between an image and its related stories, and propose an Image-story attention model that improves performance further.

Parallel Abstract


Benefiting from advances in science and technology, people can easily take photos with cameras or smartphones anytime, anywhere to record their lives. However, photos cannot keep complete information; text complements them by describing the whole story and keeping specific details. Writing an image-text intertwined lifelog is therefore a popular way to preserve life memories, and retrieving an image precisely among tons of images with context information in lifelogs becomes a major issue. Popular blog websites such as PIXNET do not offer a photo-recall function, and online photo storage services such as Google Photos provide basic photo search but do not support searching photos by their related story information. To the best of our knowledge, this is the first research addressing image recall on image-text intertwined lifelogs. We collect an image-text intertwined lifelog dataset, "Blog-travel", from PIXNET, and annotate it for image recall from five different points of view, imitating how people recall images. Furthermore, we collect a larger dataset, "Blog-travel-large", for more training and comparison. We compare several image and sentence encoders and propose an Image model and a Story model for image-recall retrieval. The Image model maps images and text into the same embedding space through unsupervised learning, so that images can be retrieved by text. The Story model simply uses the story near each image to compute a text-text similarity score and assigns that score to the image, making image retrieval possible. Since the two models are complementary, we combine them into the Image-story model. This model outperforms Google Image Search on the image-recall task on Blog-travel, and also outperforms the state-of-the-art model trained on the MSCOCO dataset. Moreover, we notice that the distance between an image and its related stories varies with the query. We therefore propose the Image-story attention model, which combines Image-story models that consider different image-story distances to achieve better performance.
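The per-image score fusion described above can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function names, the interpolation weight `alpha`, and the use of cosine similarity are assumptions for the sketch; the Image model supplies a query embedding in the joint image-text space, while the Story model supplies a query embedding in a text space, scored against the story nearest each image.

```python
import numpy as np

def cosine(query, matrix):
    # Cosine similarity between a query vector and each row of a matrix.
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

def image_story_score(q_joint, image_embs, q_text, story_embs, alpha=0.5):
    """Fuse Image-model and Story-model scores, one score per image.

    q_joint    -- query embedded in the joint image-text space (Image model)
    image_embs -- image embeddings in that space, one row per image
    q_text     -- query embedded in a text space (Story model)
    story_embs -- embedding of the story nearest each image, one row per image
    alpha      -- interpolation weight (hypothetical; a real system would tune it)
    """
    s_image = cosine(q_joint, image_embs)  # text-to-image retrieval score
    s_story = cosine(q_text, story_embs)   # text-to-text score assigned to the image
    return alpha * s_image + (1 - alpha) * s_story
```

Images are then ranked by the fused score; the Image-story attention model can be viewed as weighting several such combinations, each assuming a different image-story distance.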

