A Study of Vocal Removal Techniques for Popular CD Music

Advisor: 蔡偉和




Most current Karaoke equipment uses MIDI music or dedicated VCD/DVD music in which the singer's voice and the accompaniment are stored in separate tracks or channels, so users cannot add new songs directly from their regular CD music. This study attempts to develop a technique for removing or suppressing vocals in regular CD music, so that anyone can produce Karaoke music on their own. We refer to this technique simply as “de-vocal”. In general, a track of regular CD music consists of two similar stereo channels, each containing a mix of a vocal signal and an accompaniment signal. The stereo effect is usually created artificially by placing almost the same vocal in both channels while varying the accompaniment between them, so that the resulting music sounds stereophonic. Motivated by this fact, we subtract one channel's signal from the other's in an attempt to cancel the vocal common to the two channels. Experiments show that this method performs well on some popular songs but fails on the vast majority of songs. Hence, instead of direct subtraction, we propose a weighted subtraction scheme whose weight is optimized by minimizing the least-squares error. In addition, to prevent drum or bass sounds from being removed during the subtraction, we develop a subband de-vocal method. This study also applies blind signal separation techniques to the de-vocal problem: we investigate how to use real-valued and complex-valued Independent Component Analysis in the time domain and the frequency domain, respectively. Furthermore, recognizing the lack of ground-truth music data for evaluating de-vocal performance, we implement an online system that allows a large number of users to create Karaoke music by uploading their own music pieces. This online system enables us to evaluate our de-vocal methods based on users' feedback from subjective listening tests. Preliminary results show the feasibility of our de-vocal system.
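
The three subtraction-based ideas described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the single-weight least-squares formulation (choosing the weight w that minimizes the energy of left - w * right), and the 200 Hz cutoff for the protected low band are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def direct_devocal(left, right):
    """Direct subtraction: cancels whatever is identical in both channels."""
    return left - right


def weighted_devocal(left, right):
    """Weighted subtraction with a single least-squares weight (assumed form).

    Picks w to minimize sum((left - w * right) ** 2), which has the
    closed-form solution w = <left, right> / <right, right>.
    """
    denom = np.dot(right, right)
    w = np.dot(left, right) / denom if denom > 0 else 0.0
    return left - w * right, w


def subband_devocal(left, right, sample_rate, cutoff_hz=200.0):
    """Subband variant (assumed form): subtract only above cutoff_hz.

    Below the cutoff, where bass and kick drum usually sit, the two
    channels are averaged instead of subtracted, so centered
    low-frequency instruments are not cancelled along with the vocal.
    """
    n = len(left)
    spec_l = np.fft.rfft(left)
    spec_r = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    low = freqs < cutoff_hz

    out = spec_l - spec_r                        # de-vocal in the upper band
    out[low] = 0.5 * (spec_l[low] + spec_r[low])  # keep the low band as mono
    return np.fft.irfft(out, n=n)
```

With `left` and `right` as equal-length one-dimensional float arrays holding the two channels of a track, `weighted_devocal(left, right)` returns the residual signal together with the fitted weight. Because a weight of 1 is one of the candidates considered, the residual energy is never larger than that of direct subtraction. Note that the low band preserved by `subband_devocal` also retains any vocal energy below the cutoff.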


