Improving Audio Fingerprinting with Bi-directional Retrieval and Learning to Rank

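Audio fingerprinting is a fast and mature approach to music retrieval. The user submits a short audio clip recorded through a microphone; the system extracts features from the clip, compares them against the features of the songs in its database, and returns the best match to the user. This thesis uses clips recorded at random in noisy environments as queries, simulating how audio is recorded in real life, and looks for ways to improve the feature-extraction step. Working with a landmark-based audio fingerprinting system, we change both how landmarks are formed and what they contain, and we further improve how landmarks are retrieved. The more information the hash table holds, the more conditions can be used for filtering, so fewer landmarks need to be compared and match results are returned faster. We also improve landmark retrieval: bi-directional retrieval yields more information that helps recognition, which we use to rescore the initial match results and obtain a first gain in recognition accuracy. A learning-to-rank algorithm then reorders the rescored results, raising the recognition rate further.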

Keywords

music retrieval; audio fingerprinting system; landmark; bi-directional retrieval; learning to rank; AdaRank

Parallel Abstract

Audio fingerprint (AFP) recognition is well known as a fast and mature strategy in audio information retrieval. The end user records an audio snippet as the input to the AFP system. The system extracts the features of the snippet and compares them with the features in a database built from a selected audio data set (known as the ground truth). Finally, the system returns the most likely match from the database, together with details such as the song name and author. In this thesis, we record audio at random in highly noisy environments and use these snippets as queries. The queries simulate audio recorded in real life, and we use them to look for ways to improve feature extraction. Starting from an AFP system that uses landmarks as its basic feature, we change the content of the landmark to form different kinds of hash tables. The more information a hash table contains, the more criteria we can use to filter the landmarks, so fewer landmarks need to be checked to obtain a match, which reduces query time. We also improve the method for retrieving from the hash table. With bi-directional retrieval, we obtain more useful information from the same hash table, use it to re-rank the match results, and increase their accuracy. Furthermore, we apply a learning-to-rank algorithm to re-rank the match results again and obtain still better accuracy.
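The landmark lookup described in the abstract can be illustrated with a minimal sketch in the style of a generic peak-pair fingerprinter. It is not the thesis implementation: the key layout `(f1, f2, dt)`, the parameters `fan_out` and `max_dt`, and the function names are all assumptions made for illustration, and bi-directional retrieval and learning to rank are not shown. The sketch only shows why a richer hash key filters candidates: a query landmark is compared only against database landmarks that share its whole key, and matches are then scored by how many agree on one time offset.

```python
from collections import Counter, defaultdict


def landmarks(peaks, fan_out=3, max_dt=64):
    """Pair each spectral peak (t, f) with up to `fan_out` later peaks.

    Yields (key, t1), where key = (f1, f2, dt) is a hypothetical hash key
    and t1 is the time of the anchor peak.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt <= 0:
                continue  # skip peaks in the same time frame
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are too far
            yield (f1, f2, dt), t1
            paired += 1
            if paired == fan_out:
                break


def build_index(songs):
    """Map each landmark key to the (song_id, anchor_time) pairs that produce it."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for key, t1 in landmarks(peaks):
            index[key].append((song_id, t1))
    return index


def match(index, query_peaks):
    """Rank songs by the largest number of landmarks sharing one time offset."""
    votes = Counter()
    for key, t_query in landmarks(query_peaks):
        for song_id, t_song in index.get(key, ()):
            votes[(song_id, t_song - t_query)] += 1
    scores = Counter()
    for (song_id, _offset), count in votes.items():
        scores[song_id] = max(scores[song_id], count)
    return scores.most_common()
```

Adding more fields to the key makes each bucket smaller, which is the filtering effect the abstract refers to; the trade-off, under this sketch's assumptions, is that a noisy query must reproduce every field exactly to hit the bucket.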

Parallel Keywords

audio retrieval; audio fingerprint system; landmark; bi-directional retrieval; learning to rank; AdaRank

