使用GPU加速哼唱選歌比對

哼唱選歌 (Query by Singing/Humming, QBSH) 是一種將使用者的哼唱聲做為輸入，再從歌曲資料庫中找出最符合使用者哼唱歌曲的技術。本論文將研究如何改善哼唱選歌系統中的比對演算法，利用GPU平行運算的特性，加速比對速度，並且同時提高其辨識率。目前本實驗室的哼唱選歌系統，使用兩種比對演算法，一種是Linear Scaling (LS)，另一種是時間複雜度較高的Dynamic Time Warping (DTW) 比對方法，本論文將DTW演算法由CPU平台移植到GPU平台，並且提出一種改善演算法的作法，讓DTW的運算可以高度平行化，有效利用GPU的運算能力。當只使用原調做為輸入音高進行測試，不產生其他移調的情況下，使用CPU對整個資料庫做DTW比對查詢，需要149秒的運算時間，而採用GPU的DTW演算法只需耗費0.4至0.8秒，兩相比較之下，藉由GPU的加速可達200~350倍。測試用輸入語料來自於MIR實驗室，取自2011年清大資工科學計算課程學生所錄製的哼唱片段，總共有818 組長度約八到十秒的錄音片段，經本論文實作的DTW比對查詢，Top-10的辨識率可以達到69.9%。將DTW的比對時間大幅縮短後，就能經由更多次的移調，來提高辨識率，且維持讓使用者能夠接受的系統反應速度，讓使用DTW演算法的哼唱選歌系統能夠同時服務更多使用者。

關鍵字

音樂檢索；哼唱選歌

並列摘要

A query-by-singing/humming (QBSH) system is a technique that takes the user’s humming sound as input to find the most matching song from the song database. This study intends to discuss how to improve the matching algorithm in a QBSH system using GPU parallel computing features more efficiency, and at the same time to improve the recognition rate. The current version of QBSH system discussed in this study adopts two types of matching algorithms: one method is called linear scaling (LS), and the other method is the more time-costly dynamic time warping (DTW). This study ports the DTW algorithms from the original CPU environment to the GPU environment and proposes a method to improve the algorithm so that the DTW computation can utilize the computing power of GPU effectively to achieve a higher level of parallel computing. By using the original tune as input, without using any key transposition, it would take 149 seconds for the CPU to search through the entire song database; on the other hand, DTW algorithm running on GPU only takes 0.4 to 0.8 second, rendering a computing time reduction ratio around 1/200 to 1/350. The test corpus was constructed by MIR laboratory, using the humming fragments recorded by students of the 2011 scientific computing class in National Tsing Hua University. A total of 818 8-to-10-seconds audio clips was collected. Experimental result shows that our QBSH system can achieve 69.9% top-10 recognition rate. Since the computation time for DTW is greatly reduced, more key transposition trials can be adopted to achieve a higher recognition rate while keeping the response time of the QBSH system within an acceptable limit, and thus the system can provide services to more users simultaneously.

並列關鍵字

Music retrieval ； Query-by-singing/humming ； GPU ； CUDA ； DTW

參考文獻

[1] J.-S. R. Jang, H.-R. Lee, and M.-Y. Kao, “Content-based Music Retrieval Using Linear Scaling and Branch-and-Bound Tree search,” in Proc. of IEEE International Conference on Multimedia and Expo, August 2001.

[3] G. Poli, A. L. M. Levada, J. F. Mari, J. H. Satio, “Voice Command Recognition with Dynamic Time Warping (DTW) using Graphics Processing Units (GPU) with Compute Unified Device Architecture (CUDA),” in Proceedings of the 19th International Symposium on Computer Architecture and High Performance Computing , SBAC-PAD 2007, Brazil, pp. 19–25, 2007.

[5] D. Sart, A. Mueen, W. Najjar, E. Keogh, and V. Niennattrakul, “Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs,” in ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining, pp. 1001-1006, 2010.

[6] Yaodong Zhang, Kiarash Adl, James Glass, "Fast Spoken Query Detection Using Lower-Bound Dynamic Time Warping on Graphical processing Units", ICASSP 2012.

[8] Yi-Huan Li, ”An Extended Dynamic Time Warping Algorithm and Architecture”, 2008

國際替代計量

使用GPU加速哼唱選歌比對

全文下載

主題瀏覽