A Study of Vocal Removal Techniques for Popular CD Music

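Most current karaoke equipment relies on MIDI-synthesized music or VCD/DVD accompaniment tracks with real recorded instruments, so users who want new songs must keep purchasing additional accompaniment titles. To let users sing whenever they wish, this thesis develops a technique for producing accompaniment music themselves: converting songs from CDs they already own into accompaniment tracks without vocals. We call this technique "vocal removal". A CD song typically has two similar-sounding channels, each a mixture of the singer's voice and the accompaniment, which is why people generally do not use CD songs directly for karaoke. The first vocal removal method used in this thesis is based on how CDs are produced. The stereo effect of a recorded song usually arises because the mixing engineer keeps the singing voice identical in both channels while altering the accompaniment so that the two channels give a surround-like impression; we therefore subtract one channel's signal from the other to cancel the vocals. Experiments show that this does yield good accompaniment for some CD songs, but most songs still give unsatisfactory results. We therefore take into account the difference in vocal volume between the two channels and use the least-square-error criterion to find the optimal weight for the subtraction. In addition, considering the difference between the spectral distributions of the accompaniment and the singing voice, we use subband splitting to filter out the higher-energy vocals while preserving the drum sounds that plain subtraction tends to remove. Separately, this thesis applies blind signal separation to vocal removal: we examine how to use a real-valued independent component analysis (ICA) algorithm for time-domain separation and a complex-valued ICA algorithm for frequency-domain separation. Because no ground truth exists for vocal removal results against which to evaluate the methods, we implemented an online system that lets a large number of users upload song excerpts, receive the vocal-removed accompaniment, and rate its quality. Preliminary experiments show that most users agree that the proposed vocal removal methods are feasible.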
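The abstract does not give the exact least-square-error formulation, so the following is only a minimal sketch of one plausible reading: choose a single weight w that minimizes the energy of the residual `left - w * right`, then subtract. The function names (`direct_subtract`, `ls_weight`, `weighted_subtract`) are hypothetical and are not taken from the thesis. Note that when both channels carry strong accompaniment, this w is only an approximation of the true vocal gain ratio.

```python
import numpy as np

def direct_subtract(left, right):
    """Plain channel subtraction: cancels whatever is identical in both channels."""
    return np.asarray(left, dtype=float) - np.asarray(right, dtype=float)

def ls_weight(left, right):
    """Weight w minimizing sum((left - w * right) ** 2) (closed-form least squares)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    denom = np.dot(right, right)
    # Guard against a silent right channel (assumption: fall back to w = 0).
    return float(np.dot(left, right) / denom) if denom > 0 else 0.0

def weighted_subtract(left, right):
    """Subtract the right channel scaled by the least-squares weight."""
    w = ls_weight(left, right)
    residual = np.asarray(left, dtype=float) - w * np.asarray(right, dtype=float)
    return residual, w
```

In a toy case where the vocal is 0.8 times as loud in the left channel as in the right, direct subtraction leaves 20% of the vocal behind in the residual, whereas the weighted version can cancel it when the accompaniment is uncorrelated with the vocal.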

Keywords

Vocal removal; least square error; independent component analysis algorithm.

Parallel Abstract

Because most current karaoke equipment uses MIDI music or dedicated VCD/DVD music that separates the singer's voice from the accompaniment into different tracks or channels, users cannot add new songs directly from their regular CD music. This study attempts to develop a technique for removing or suppressing vocals in regular CD music, so that anyone can produce karaoke music on their own. The technique is simply called “de-vocal”. In general, a track of regular CD music consists of two similar channels for stereo, each containing a mix of the vocal signal and the accompaniment signal. The stereo effect is usually created artificially by placing almost the same vocals in both channels while varying the accompaniment between them, so that the resulting music sounds stereophonic. Motivated by this fact, we subtract one channel’s signal from the other’s in an attempt to remove the vocal common to both channels. Experiments show that this method performs well on some popular songs but fails on the vast majority. Hence, instead of direct subtraction, we propose a weighted subtraction scheme optimized using the least square error criterion. In addition, to avoid removing drum or bass sounds during the subtraction, we develop a subband de-vocal method. Separately, this study applies blind signal separation techniques to the de-vocal problem. We investigate how to use real-valued and complex-valued Independent Component Analysis in the time domain and the frequency domain, respectively. Furthermore, recognizing the lack of ground truth for evaluating de-vocal performance, we implement an online system that allows a large number of users to create karaoke music by uploading their music pieces. This online system enables us to evaluate our de-vocal methods based on users’ feedback from subjective listening tests. Preliminary results show the feasibility of our de-vocal system.
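The abstract does not specify how the subband method splits the spectrum, so the sketch below is an illustration under stated assumptions rather than the thesis's algorithm. It applies channel subtraction only inside a hypothetical "vocal band" and keeps the mono mix outside that band, so that center-panned bass or drum energy below the band survives. The function name `subband_devocal` and the default band limits (200 Hz to 8 kHz) are assumptions chosen for illustration.

```python
import numpy as np

def subband_devocal(left, right, sample_rate, vocal_band=(200.0, 8000.0)):
    """Subtract channels inside vocal_band; keep the mono mix elsewhere.

    vocal_band is a hypothetical (low_hz, high_hz) range, not a value
    taken from the thesis.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    # Work on the whole signal at once for simplicity; a practical system
    # would more likely process short overlapping frames.
    spec_left = np.fft.rfft(left)
    spec_right = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    in_band = (freqs >= vocal_band[0]) & (freqs <= vocal_band[1])
    # Inside the band: the difference cancels content common to both channels.
    # Outside the band: the average preserves content common to both channels.
    spec_out = np.where(in_band, spec_left - spec_right,
                        0.5 * (spec_left + spec_right))
    return np.fft.irfft(spec_out, n=n)
```

With a center-panned 60 Hz bass tone, a center-panned 1 kHz "vocal" tone, and a 500 Hz tone present only in the left channel, plain subtraction would remove both the vocal and the bass, whereas this sketch removes the vocal and keeps the bass.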

Parallel Keywords

De-vocal; Least Square Error; Independent Component Analysis.
