Content-based Automatic Annotation and Preference Learning for Music Information Retrieval

Advisor: 張智星 (J.-S. Roger Jang)

Abstract


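Music information retrieval has attracted growing attention over the past decade. In this thesis, we study two of the main ways people search for songs, artists, and albums: automatic annotation and preference learning. Compared with query by example, searching for music with semantic concept words is more natural for people. This style of search, also called query by semantic description, requires an accurate and automatic method for tagging audio files. To this end, we propose an automatic annotation system based on anti-word models. For each annotation word we build a corresponding anti-word set, made up of annotation words whose meaning is semantically opposite to it. By modeling both each annotation word and its anti-words, our system annotates better than the original system, and the system with anti-word models also performs better in retrieval. The other way people discover music they are interested in is through recommender systems. Commercial systems such as Amazon, TiVo, and Netflix use collaborative filtering to help users discover items of interest, but this method suffers from the cold-start problem. Content-based methods, by contrast, recommend on the basis of features of the music itself rather than users' past transaction records, and so can alleviate this problem. In the second part of this thesis, we propose a content-based artist recommendation system that predicts user preferences. We first build a universal background model (UBM) from all songs, then use maximum a posteriori (MAP) adaptation to derive the acoustic features of each artist. These acoustic features and the users' preference scores are used to train a ranking function by ordinal regression. We propose an order preserving projection (OPP) algorithm, whose performance is similar to that of the ordinal regression method PRank. In addition, the proposed OPP algorithm can be kernelized so that it can learn nonlinear ranking functions. With the kernel method we can also effectively fuse acoustic features with symbolic features, which are built from annotation words. Experimental results show that our system predicts user preferences effectively, and that using a nonlinear ranking function or fusing acoustic and symbolic features each improves performance further.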
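The UBM and MAP adaptation step can be illustrated with a minimal sketch. The abstract does not give the thesis's exact recipe, so this assumes the common mean-only MAP adaptation of a diagonal-covariance Gaussian mixture; the function names, the `relevance` factor, and the toy data are all hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log-density of each frame under each diagonal Gaussian: (T, D) -> (T, K)."""
    diff = X[:, None, :] - means[None, :, :]            # (T, K, D)
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to one artist's frames X (assumed recipe)."""
    # Posterior responsibility of each UBM component for each frame.
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics.
    n_k = post.sum(axis=0)                              # (K,)
    e_k = (post.T @ X) / np.maximum(n_k, 1e-10)[:, None]
    # Components that saw more data move further from the UBM mean.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1.0 - alpha) * means

# Toy UBM with two components; the artist's frames all sit near the first one.
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_vars = np.ones((2, 2))
artist_frames = np.full((16, 2), 1.0)
artist_means = map_adapt_means(artist_frames, ubm_weights, ubm_means, ubm_vars)
```

In this toy case the first component moves halfway toward the artist's frames, because its soft count equals the relevance factor, while the second component, which sees almost no data, stays at the UBM mean. Stacking the adapted means into one vector is one common way to obtain a fixed-length per-artist feature, though the abstract does not say whether the thesis does exactly that.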

Parallel Abstract


Music information retrieval has received growing attention in recent decades. Its goal is to find songs, artists, or albums that match users' interests. In this thesis, we focus on two major retrieval approaches: automatic annotation and preference-learning recommendation systems. Compared with query-by-example (QBE) techniques, searching audio files with a set of semantic concept words is a more natural way to describe music. Such an approach, called query-by-semantic-description (QBSD), needs an accurate and automatic way to help people tag large numbers of audio files. To meet this need, we propose an automatic annotation system that uses anti-words for each annotation word, based on the concept of supervised multi-class labeling (SML). More specifically, words that are highly associated with the opposite semantic meaning of a word constitute its anti-word set. By modeling both a word and its anti-word set, our annotation system achieves higher mean per-word precision and recall than the original SML model. Moreover, explicitly constructing the anti-word models also significantly improves retrieval performance. Another major way for people to discover music is through recommendation, which is common in daily life. Recommenders such as Amazon, TiVo, and Netflix adopt collaborative filtering (CF), which often suffers from the so-called cold-start problem. A content-based approach can alleviate this problem, since it relies on audio content instead of users' past transactions. In the second part of this thesis, we propose a content-based artist recommendation system that predicts a user's tastes well. In particular, each artist is characterized by an acoustic model adapted from a universal background model (UBM) through maximum a posteriori (MAP) adaptation.
These acoustic features, together with their preference rankings, are then used to train an ordinal regression algorithm, which seeks a ranking rule that can predict the rank of a new instance. Moreover, we propose an order preserving projection (OPP) algorithm and show that its results are comparable to those of an existing ordinal regression algorithm, PRank. The proposed linear OPP can also be kernelized to learn a potentially nonlinear relationship between music content and users' artist rank orders. By introducing the kernel method, we can also efficiently fuse acoustic and symbolic features, i.e., annotation words, under the proposed framework. Experimental results show that the system can successfully predict users' tastes, and that performance improves further with either the nonlinear version of OPP or the fusion of acoustic and symbolic features.
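To make the ordinal regression step concrete, here is a minimal sketch of PRank, the baseline named above, written from the standard perceptron-style formulation of that algorithm. It is not the thesis's OPP method, and the function names, the epoch count, and the one-dimensional toy data are assumptions for illustration only.

```python
import numpy as np

def prank_predict(w, b, x):
    """Return the smallest rank r whose threshold b[r-1] exceeds the score w.x."""
    score = w @ x
    for r, threshold in enumerate(b, start=1):
        if score < threshold:
            return r
    return len(b) + 1          # the top rank has an implicit threshold of +infinity

def prank_train(X, y, n_ranks, epochs=10):
    """Train PRank on features X (n, d) and integer ranks y in 1..n_ranks."""
    w = np.zeros(X.shape[1])
    b = np.zeros(n_ranks - 1)  # one threshold between each pair of adjacent ranks
    ranks = np.arange(1, n_ranks)
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            if prank_predict(w, b, x_t) == y_t:
                continue
            score = w @ x_t
            # Desired side of each threshold: -1 if the true rank is at or below r.
            side = np.where(y_t <= ranks, -1.0, 1.0)
            # Only thresholds on the wrong side of the score are updated.
            tau = np.where((score - b) * side <= 0, side, 0.0)
            w += tau.sum() * x_t
            b -= tau
    return w, b

# Toy data: a single feature that grows with the preference rank.
X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([1, 2, 3])
w, b = prank_train(X_toy, y_toy, n_ranks=3)
predicted = [prank_predict(w, b, x) for x in X_toy]
```

In the setting described above, each row of `X_toy` would be replaced by an artist's acoustic feature vector and each label by the user's preference rank for that artist.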

