A GPU-Accelerated Large-Scale Audio Fingerprinting System

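In this thesis, we build an audio fingerprinting (AFP) system on a large-scale database of 750,000 songs and parallelize its computation on a GPU (graphics processing unit). The system lets a user record, with a mobile phone, a song heard at any time and in any place, and uses the recorded clip as a query to retrieve the most similar song and its related information from the GPU-accelerated audio fingerprinting system. To overcome the algorithm's limits on song length and on the total number of songs, we improve the landmark extraction step of the AFP computation. Where a song's length exceeds the algorithm's max time limit and produces discontinuous landmark segments, we overlap those segments: specific landmarks are copied and their time points shifted so that they avoid the discontinuity before being inserted into the database. Under different max time settings, this method restores the number of matched landmarks for a song to its normal level, so the large-scale database retains its recognition performance. Next, so that the large volume of data can be processed within the limited memory of the CPU and GPU, we split the single database into several sub-databases and improve the method of loading the database, which reduces the CPU and GPU memory requirements by 99.84% and 80%, respectively. The size of the database is therefore no longer bounded by memory, and an audio fingerprinting system with a large-scale database can run on an ordinary personal computer. Finally, compared with the original system, the improved system spends more time reading from disk, so we store the database on an SSD (solid-state drive), which reads nearly 6 times faster than the HDD (hard disk drive) used originally and shortens the time spent on reading.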
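The landmark overlap step can be illustrated with a minimal sketch. It is only one possible reading of the description above, not the thesis's actual procedure: it assumes a long song is stored as consecutive segments of at most `max_time` frames, that neighbouring segments share `overlap` frames, and that each landmark in a shared region is copied into the preceding segment with a shifted local time. The names `overlap_segments`, `Landmark`, `Entry`, and `overlap` are hypothetical.

```python
from typing import List, Tuple

Landmark = Tuple[int, int]    # (start_time_in_frames, hash_value)
Entry = Tuple[int, int, int]  # (segment_index, local_time, hash_value)


def overlap_segments(landmarks: List[Landmark],
                     max_time: int,
                     overlap: int) -> List[Entry]:
    """Assign landmarks to overlapping segments of length max_time.

    Assumption: segment k covers global frames
    [k * step, k * step + max_time), where step = max_time - overlap.
    A landmark in the first `overlap` frames of segment k is also
    copied into segment k - 1 with its time shifted by `step`, so a
    query shorter than `overlap` always lies inside a single segment.
    """
    if not 0 < 2 * overlap <= max_time:
        raise ValueError("overlap must satisfy 0 < 2 * overlap <= max_time")
    step = max_time - overlap
    entries: List[Entry] = []
    for t, h in landmarks:
        seg, local = divmod(t, step)
        entries.append((seg, local, h))
        if seg > 0 and local < overlap:
            # Copy the landmark and shift its time into the previous segment.
            entries.append((seg - 1, local + step, h))
    return entries
```

With `max_time=100` and `overlap=20`, a landmark at frame 85 is stored once in segment 1 at local time 5 and once in segment 0 at local time 85, so every stored local time stays below `max_time`.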

Keywords

music retrieval; audio fingerprinting; memory; solid-state drive

Parallel Abstract

The goal of this research is to implement an audio fingerprinting (AFP) system that works on a large-scale database of 750 thousand songs and performs parallel computing on a GPU (graphics processing unit). Audio fingerprinting is a fast and robust music retrieval method that allows a user to retrieve an intended song and its related information by recording a snippet of the song, even in a noisy environment. To handle the algorithm's limitations on maximum song length and on the number of songs, we improve the landmark extraction step of the AFP computation. If the length of a song exceeds the maximum time limit and causes a discontinuity in the start times of its landmarks, we copy the landmarks that are close to the maximum time and shift the copies to avoid the discontinuity; the shifted landmarks are then added to the database. This method maintains the number of matched landmarks under different maximum time settings and thus preserves recognition performance on a large-scale database. In addition, we split the database into several subsets and improve the data loading method so that the system can work with a large-scale database in limited memory. With our method, the CPU and GPU memory requirements are decreased by 99.84% and 80%, respectively. The system is therefore no longer limited by the capacity of the available memory and can run on an ordinary personal computer. Finally, our system is slower than the baseline system because of frequent reads from the database. To speed up the reading process, we store the database on an SSD (solid-state drive), which reads nearly 6 times faster than an HDD (hard disk drive).
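The idea of splitting the database so that only a small part is resident in memory at any time can be sketched as follows. This is a hedged illustration rather than the system described here: it runs on the CPU only, shards by `song_id % num_shards`, stores each sub-database as a pickle file, and scores candidates with a simple time-offset vote. The function names, the file layout, and the scoring rule are all assumptions made for the example.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def build_shards(songs, num_shards, out_dir):
    """Write num_shards sub-databases to disk.

    songs maps song_id -> list of (time, hash_value) landmarks.
    Each shard maps hash_value -> list of (song_id, time).
    """
    shards = [dict() for _ in range(num_shards)]
    for song_id, landmarks in songs.items():
        table = shards[song_id % num_shards]  # assumed sharding rule
        for t, h in landmarks:
            table.setdefault(h, []).append((song_id, t))
    paths = []
    for i, table in enumerate(shards):
        path = Path(out_dir) / f"shard_{i}.pkl"
        with open(path, "wb") as f:
            pickle.dump(table, f)
        paths.append(path)
    return paths


def search(query, shard_paths):
    """Return (best_song_id, votes), loading one shard at a time.

    query is a list of (time, hash_value) landmarks from the recording.
    Peak memory is bounded by the largest shard, not the full database.
    """
    best = (None, 0)
    for path in shard_paths:
        with open(path, "rb") as f:
            table = pickle.load(f)  # only this sub-database is in memory
        votes = Counter()
        for tq, h in query:
            for song_id, ts in table.get(h, ()):
                votes[(song_id, ts - tq)] += 1  # consistent offset => match
        if votes:
            (song_id, _), count = votes.most_common(1)[0]
            if count > best[1]:
                best = (song_id, count)
    return best


if __name__ == "__main__":
    songs = {0: [(0, 11), (1, 12)], 1: [(0, 21), (1, 22)]}
    with tempfile.TemporaryDirectory() as d:
        print(search([(0, 21), (1, 22)], build_shards(songs, 2, d)))
```

The trade-off matches the one noted in the abstract: memory use drops because only one sub-database is loaded at a time, while every query now pays for repeated reads from storage, which is why the speed of the storage device matters.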

Parallel Keywords

music retrieval; audio fingerprinting; memory; SSD; landmark; GPU
