A Study of Bayesian Nonnegative Matrix Factorization for Single-Channel Audio Separation

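This thesis proposes a Bayesian nonnegative matrix factorization (NMF) algorithm and applies it to single-channel separation of speech and music signals, as well as separation of singing voice and background accompaniment. We build the NMF model with a Poisson distribution as the likelihood function and exponential distributions as the priors on the basis matrix and the weight matrix. A variational Bayesian (VB) expectation-maximization (EM) algorithm then yields closed-form solutions for the variational parameters and the model parameters of the Bayesian NMF model. A distinctive feature of this model is that the parameters of the exponential priors control the sparsity of the basis matrix. In variational Bayesian inference, the variational lower bound takes the marginal likelihood as its criterion to address model regularization, and it is used to determine automatically a suitable number of bases for each mixed signal. We evaluate two single-channel audio separation tasks: supervised speech and music separation, and unsupervised separation of singing voice and background accompaniment. In the supervised experiments, we verify that selecting the number of bases automatically with Bayesian NMF gives better separation than conventional NMF with a fixed number of bases. Unsupervised learning additionally requires clustering the factorized basis vectors; the proposed NMF-based clustering method combined with Bayesian source separation is shown to outperform the other methods reported in the current literature.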
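As a reading aid, the model described above can be written out as follows. This is a sketch in hypothetical notation, not the thesis's own: X is the M x N nonnegative magnitude spectrogram, B the M x K basis matrix, W the K x N weight matrix, and the per-entry prior rates \lambda^{b}_{mk}, \lambda^{w}_{kn} (collected in \Theta) are an assumption about how the exponential priors are parameterized. The last line is the standard variational lower bound on the log marginal likelihood for any distribution q over B and W.

```latex
% Hypothetical notation; the parameterization of the priors is an assumption.
\begin{align}
X_{mn} \mid B, W &\sim \mathrm{Poisson}\Big(\textstyle\sum_{k=1}^{K} B_{mk} W_{kn}\Big), \\
B_{mk} &\sim \mathrm{Exponential}\big(\lambda^{b}_{mk}\big), \qquad
W_{kn} \sim \mathrm{Exponential}\big(\lambda^{w}_{kn}\big), \\
\log p(X \mid \Theta) &\ge \mathbb{E}_{q}\big[\log p(X, B, W \mid \Theta)\big]
  - \mathbb{E}_{q}\big[\log q(B, W)\big].
\end{align}
```

A larger rate \lambda^{b}_{mk} concentrates the prior on B_{mk} near zero, which is how the prior parameter can encourage sparse bases.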

Keywords

nonnegative matrix factorization; model selection; Bayesian learning; single-channel source separation; singing voice separation

Parallel Abstract

This paper proposes a new Bayesian nonnegative matrix factorization (NMF) approach for speech and music separation as well as for separation of singing voice from background music accompaniment. In this approach, the NMF reconstruction error is modeled by a Poisson distribution, and the NMF parameters, consisting of the basis and weight matrices, are characterized by exponential priors. A variational Bayesian (VB) expectation-maximization (EM) algorithm is developed to obtain efficient closed-form solutions for the variational parameters and model parameters in monaural audio source separation. Importantly, the exponential prior parameter controls the sparseness of the basis representation. The variational lower bound in the VB-EM procedure is derived as an objective for adaptive basis selection across mixed signals that vary in speaker, singer, instrument, and background accompaniment. Model regularization is handled through uncertainty modeling via variational inference, based on maximization of the marginal likelihood. Experiments on supervised single-channel speech/music separation show that the adaptive basis representation in Bayesian NMF performs better than NMF with a fixed number of bases in terms of signal-to-distortion ratio. In addition, we apply the proposed Bayesian NMF to unsupervised monaural singing voice separation, where an additional grouping of the factorized basis vectors is performed. The two resulting groups of basis vectors are used to reconstruct the source signals of the singing voice and the background accompaniment. Experimental results on the MIR-1K database demonstrate that Bayesian NMF performs better than other unsupervised separation algorithms in terms of the global normalized source-to-distortion ratio.
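To make the roles of the Poisson likelihood and the exponential priors concrete, here is a minimal sketch of a simpler point-estimate relative of the model: MAP estimation, which reduces to KL-divergence NMF with an L1 penalty and can be solved with multiplicative updates. This is not the VB-EM algorithm proposed in this work; it does not compute the variational lower bound and does not select the number of bases. The function names, the scalar rates `rate_b` and `rate_w`, and the default values are assumptions for illustration only.

```python
import numpy as np


def neg_log_posterior(X, B, W, rate_b, rate_w, eps=1e-12):
    """Negative log posterior (up to a constant) for a Poisson likelihood
    with exponential priors of scalar rates rate_b and rate_w."""
    V = B @ W + eps
    return float(np.sum(V - X * np.log(V)) + rate_b * B.sum() + rate_w * W.sum())


def map_poisson_nmf(X, num_bases, rate_b=0.1, rate_w=0.1,
                    num_iters=200, seed=0, eps=1e-12):
    """MAP sketch: multiplicative updates for L1-penalized KL-divergence NMF.

    X is a nonnegative (M, N) array, e.g. a magnitude spectrogram.
    Returns nonnegative B with shape (M, num_bases) and W with shape (num_bases, N).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    B = rng.random((M, num_bases)) + eps
    W = rng.random((num_bases, N)) + eps
    ones = np.ones((M, N))
    for _ in range(num_iters):
        # Update bases: the prior rate enters the denominator and shrinks B.
        R = X / (B @ W + eps)
        B *= (R @ W.T) / (ones @ W.T + rate_b)
        # Update weights in the same way with the refreshed reconstruction.
        R = X / (B @ W + eps)
        W *= (B.T @ R) / (B.T @ ones + rate_w)
    return B, W
```

In this sketch the number of bases `num_bases` is fixed by hand, which is exactly the limitation that the variational lower bound is meant to remove by comparing candidate model sizes.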

Parallel Keywords

nonnegative matrix factorization; model selection; Bayesian learning; monaural source separation; singing voice separation

