Automatic Annotation and Retrieval of People's Facial Images in Photos Using Voice

Annotation is important for managing and retrieving large photo collections, but it is generally labor-intensive and time-consuming. Speaking while taking photos, by contrast, is straightforward and effortless, and annotating by voice is faster than typing. To reduce the manual cost of annotating photos, we propose a novel framework that uses the sparse spoken annotations recorded at capture time as voice labels and automatically labels every facial image in the photo collection. To accomplish this, we employ a probabilistic graphical model that integrates voice labels and visual appearance for inference. Combined with group prior estimation and gender attribute association, the framework achieves strong performance on the proposed synthesized group photo collections.

Keywords

Photo Annotation; Speech Retrieval

