Thesis

An Image Retrieval System Using Music as Query
(利用音樂查詢之影像檢索系統)

Advisor: 鄭士康

Abstract


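This thesis proposes a new image retrieval method that uses music as the query. Most image retrieval methods use keywords or other images as the query; the proposed system is instead a cross-media retrieval system. On the web, both images and music are accompanied by rich textual information (metadata), and in our method this textual information serves as the link between music and images. A ranking function derived from Okapi BM25 computes the degree of relevance between music and images from their textual information. Probabilistic Latent Semantic Analysis (PLSA) is then used to compute the hidden semantic features (HSF) of music and images, and a neural network is trained to learn a mapping function from music audio features to HSF. In the image retrieval phase, the HSF and textual information of music and images serve as the basis for computing their relevance. Finally, user relevance feedback is used to improve retrieval through short-term learning and long-term learning: the former reranks the images (image reranking), and the latter updates the music-image descriptive word map. To evaluate the system, 4000 images and their textual information were collected from Flickr, and 2000 songs were collected together with their textual information from AMG (All Music Guide). The experimental results show that the system performs quite well.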
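The abstract scores music-image relevance from metadata with a ranking function derived from Okapi BM25, but the derived variant itself is not given here. As a minimal sketch, assuming the music's descriptive words act as the query and each image's Flickr tags act as a document, the standard BM25 formula (common defaults k1 = 1.2, b = 0.75, and an IDF with +1 inside the log so it stays non-negative) looks like this. The function name `bm25_scores` and the example tags are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tag list in `documents` against `query_terms` with standard Okapi BM25.

    Hypothetical helper: plain BM25, not the thesis's derived ranking function.
    """
    n_docs = len(documents)
    avgdl = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: how many tag lists contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_terms):  # repeated query words are counted once
            if tf[term] == 0:
                continue
            # IDF with +1 inside the log so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical example: music metadata words as the query, image tags as documents.
music_words = ["calm", "night", "piano"]
image_tags = [
    ["night", "city", "lights"],
    ["beach", "sunny", "summer"],
    ["calm", "lake", "night", "moon"],
]
print(bm25_scores(music_words, image_tags))
```

In this toy example the third image shares two words with the query and scores highest, the first shares one, and the beach image shares none and scores 0.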

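PLSA is applied to a music-image semantic matrix to obtain each item's hidden semantic feature. The matrix layout is not spelled out in the abstract; this minimal sketch assumes rows are items (songs or images), columns are descriptive words, and entries are co-occurrence counts, and it takes P(z|d) as the HSF. It fits the standard PLSA model with EM in NumPy; the function name `plsa` and its parameters are hypothetical.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA with EM on a nonnegative count matrix (rows: items d, columns: words w).

    Returns P(z|d), shape (n_items, n_topics), used here as the HSF,
    and P(w|z), shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) P(w|z); shape (docs, words, topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-12)
        # Expected counts n(d,w) P(z|d,w).
        weighted = counts[:, :, None] * post
        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts.
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=1)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_z_d, p_w_z


# Hypothetical toy matrix: two groups of items that use disjoint words.
counts = np.array([[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]], dtype=float)
hsf, word_dist = plsa(counts, n_topics=2)
print(np.round(hsf, 3))
```

Each EM iteration never decreases the log-likelihood, but PLSA can settle in a local optimum, so in practice several random seeds are often tried.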
English Abstract


In this thesis, a novel image retrieval approach is proposed. Unlike traditional image retrieval approaches, which generally use keywords or example images as the query, the proposed system allows the user to search for images using music as the query; that is, it is a music-image cross-media retrieval system. Music and images on the web are accompanied by rich textual information, and in this research that textual information is used to bridge the semantic gap between music and images. The relevance between music and images is measured by a ranking function derived from Okapi BM25. A music-image semantic matrix is constructed from the textual information of music and images, and PLSA (Probabilistic Latent Semantic Analysis) is applied to it to derive the HSF (hidden semantic feature) of music and images. A neural network is trained to learn a mapping function from music audio features to HSF. In the retrieval phase, music-image relevance is computed from the HSF and textual features. Finally, user relevance feedback is used for image reranking (short-term learning) and for updating the music-image descriptive word map (long-term learning) to improve the retrieval results. To evaluate the system, 4000 images with textual information (metadata) were collected from Flickr, and 1836 songs were collected together with their textual information (metadata) from AMG (All Music Guide). The experimental results show that the system achieves good performance.
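The abstract states that a neural network learns a mapping from music audio features to HSF, presumably so that a query song can be placed in the hidden semantic space from its audio alone. The architecture, audio features, and loss are not given here; this sketch assumes one tanh hidden layer, a softmax output (so predictions are distributions like P(z|d)), and cross-entropy against the PLSA targets, trained by full-batch gradient descent in NumPy. `train_mapping` and all hyperparameters are hypothetical.

```python
import numpy as np


def train_mapping(audio_feats, hsf_targets, hidden=16, lr=0.5, epochs=2000, seed=0):
    """Fit a one-hidden-layer network from audio feature vectors to HSF distributions.

    Hypothetical sketch: architecture, loss and hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = audio_feats.shape
    k = hsf_targets.shape[1]
    w1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, k))
    b2 = np.zeros(k)

    def forward(x):
        h = np.tanh(x @ w1 + b1)
        logits = h @ w2 + b2
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return h, p / p.sum(axis=1, keepdims=True)

    for _ in range(epochs):
        h, pred = forward(audio_feats)
        # Softmax + mean cross-entropy: gradient w.r.t. the logits is (pred - target) / n.
        g_logits = (pred - hsf_targets) / n
        g_w2 = h.T @ g_logits
        g_b2 = g_logits.sum(axis=0)
        g_h = (g_logits @ w2.T) * (1.0 - h ** 2)  # back through tanh
        g_w1 = audio_feats.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # In-place updates so `forward` always sees the current weights.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    return lambda x: forward(x)[1]


# Hypothetical toy data: 2-D "audio features" and 2-topic HSF targets.
x = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
y = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]])
predict = train_mapping(x, y)
print(np.round(predict(np.array([[1.0, 0.0], [0.0, 1.0]])), 2))
```

A softmax output keeps each predicted HSF a valid probability vector, which makes it directly comparable with the PLSA features of the images.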

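In the retrieval phase, relevance is computed from both the HSF and textual features, and relevance feedback then reranks the images (short-term learning). The abstract does not give the combination rule or the reranking update, so this sketch assumes a weighted sum of HSF cosine similarity and max-normalized text scores, and substitutes the classic Rocchio query update for the feedback step. The weights `alpha`, `beta`, `gamma` and both function names are hypothetical; the long-term update of the music-image descriptive word map is not shown.

```python
import numpy as np


def combined_scores(query_hsf, image_hsf, text_scores, alpha=0.5):
    """Blend HSF cosine similarity with text relevance; `alpha` weights the HSF part."""
    q = query_hsf / np.linalg.norm(query_hsf)
    imgs = image_hsf / np.linalg.norm(image_hsf, axis=1, keepdims=True)
    hsf_sim = imgs @ q
    text = np.asarray(text_scores, dtype=float)
    if text.max() > 0:
        text = text / text.max()  # rescale text scores to [0, 1]
    return alpha * hsf_sim + (1.0 - alpha) * text


def rocchio_update(query_hsf, image_hsf, relevant, nonrelevant, beta=0.75, gamma=0.25):
    """Rocchio-style update: move the query toward relevant images, away from non-relevant ones."""
    q = query_hsf.astype(float).copy()
    if relevant:
        q += beta * image_hsf[relevant].mean(axis=0)
    if nonrelevant:
        q -= gamma * image_hsf[nonrelevant].mean(axis=0)
    return np.clip(q, 0.0, None)  # keep the vector nonnegative like an HSF


# Hypothetical example: three images, 2-topic HSF, and BM25-style text scores.
image_hsf = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
text_scores = [0.0, 1.0, 0.0]
query = np.array([0.6, 0.4])  # e.g. the HSF predicted from the query song's audio

before = np.argsort(-combined_scores(query, image_hsf, text_scores))
# The user marks image 2 as relevant and image 0 as non-relevant.
query = rocchio_update(query, image_hsf, relevant=[2], nonrelevant=[0])
after = np.argsort(-combined_scores(query, image_hsf, text_scores))
print(before, after)
```

With these numbers the initial ranking is [1 0 2]; after one round of feedback, image 2 moves above image 0, giving [1 2 0].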
