

Singing Voice Separation using Spectro-Temporal Modulations

Advisor: 冀泰石

Abstract


Over the past decade, advances in digital audio technology have brought growing attention to singing voice separation. In the field of music information retrieval, separated vocal signals and background music signals serve many purposes, such as singer identification, pitch extraction, and music genre classification. In most cases, however, the singing voice is mixed with background music, making clean vocal signals difficult to obtain; separating the singing voice from the background music has therefore become an important task. In this thesis, we use an auditory perception model to extract spectro-temporal modulation feature sets, and with these feature sets we perform a two-stage unsupervised clustering analysis using the EM algorithm. We mix the vocals and background music of the MIR-1K database at different signal-to-noise ratios (SNR) for testing, and compare the proposed spectro-temporal modulation algorithm with other well-known algorithms under the various SNR conditions. Experiments show that the proposed algorithm achieves the best separation performance under low-SNR conditions.
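
To make the clustering step concrete, the sketch below groups time-frequency units by their modulation features with a Gaussian mixture fitted via the EM algorithm. It is only a minimal illustration under stated assumptions, not the thesis implementation: it assumes the spectro-temporal modulation features have already been extracted, it uses scikit-learn's GaussianMixture as the EM-based clusterer, and the function name cluster_tf_units and the random stand-in features are hypothetical.

    # Illustrative sketch only (not the thesis implementation): cluster
    # time-frequency (T-F) units by their modulation features with a
    # Gaussian mixture fitted via the EM algorithm, and use the cluster
    # labels as a crude vocal/accompaniment assignment.
    import numpy as np
    from sklearn.mixture import GaussianMixture  # EM-based GMM

    def cluster_tf_units(features, n_clusters=2, seed=0):
        # features: (num_tf_units, num_modulation_dims), one row of
        # spectro-temporal modulation descriptors per T-F unit
        gmm = GaussianMixture(n_components=n_clusters,
                              covariance_type="full", random_state=seed)
        return gmm.fit_predict(features)  # EM fit + hard assignment

    # Hypothetical usage with random stand-in features
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 12))   # 1000 T-F units, 12-dim features
    labels = cluster_tf_units(feats)
    vocal_mask = labels == 0              # which cluster is "vocal" must be
                                          # decided by a separate heuristic

Deciding which cluster corresponds to the singing voice is a separate step; the sketch leaves it as a placeholder comment.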

Keywords

singing voice, separation, modulation, auditory model

Parallel Abstract


Over the past decade, the task of singing voice separation has gained much attention due to improvements in digital audio technologies. In the research field of music information retrieval (MIR), separated vocal signals or accompanying music signals can be of great use in many applications, such as singer identification, pitch extraction, and music genre classification. In most cases, however, the singing voice is mixed with the accompanying music, which makes it difficult to obtain a clean singing voice signal. Thus, separating the singing voice from the accompanying music has become an important task. In this thesis, two singing voice separation methods are proposed. Spectro-temporal modulations are extracted from a two-stage auditory model and used as modulation feature sets in one-stage and two-stage unsupervised clustering systems based on the EM algorithm. The proposed systems are tested on the MIR-1K database under different signal-to-noise ratio (SNR) conditions. The experimental results are compared with those of several state-of-the-art unsupervised singing voice separation algorithms and show better performance in low-SNR conditions.
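
As a concrete picture of the evaluation setup, the following sketch mixes a clean vocal track with its accompaniment at a chosen SNR, in the spirit of forming MIR-1K test mixtures at different SNR conditions. It is an assumed illustration rather than the thesis's evaluation script: the function name mix_at_snr and the random stand-in waveforms are hypothetical, and only NumPy is assumed.

    # Assumed evaluation setup (illustrative only): scale the accompaniment
    # so that the vocal-to-accompaniment power ratio equals a target SNR
    # in dB, then add the two signals.
    import numpy as np

    def mix_at_snr(vocal, accomp, snr_db):
        # Choose gain so that 10*log10(P_vocal / P_accomp_scaled) == snr_db.
        p_vocal = np.mean(vocal ** 2)
        p_accomp = np.mean(accomp ** 2)
        gain = np.sqrt(p_vocal / (p_accomp * 10.0 ** (snr_db / 10.0)))
        return vocal + gain * accomp

    # Hypothetical usage: -5 dB, 0 dB, and 5 dB mixing conditions
    rng = np.random.default_rng(1)
    vocal = rng.normal(size=16000)    # stand-ins for real MIR-1K waveforms
    accomp = rng.normal(size=16000)
    mixtures = {snr: mix_at_snr(vocal, accomp, snr) for snr in (-5, 0, 5)}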

Parallel Keywords

singing voice, separation, modulation, auditory model
