Signal Separation with an Unknown Number of Sources Using a Generalized Gaussian Model

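This thesis investigates a method for separating speech when the number of sound sources is unknown. In recent years, many studies have used masking to separate signals, but most of these methods run only when the number of sources is known in advance, which is impractical. In our method, we use a generalized Gaussian mixture model (GGMM) to model the histogram of estimated directions of arrival (DOA), and from it obtain both the number of sources in the mixture and the direction of each source. The parameters of the generalized Gaussian model are computed with the EM algorithm. For separation, we use DOA-weight masking, with the DOA as the speech feature. Based on the DOA of each T-F unit, the corresponding position in each source's mask is given a different weight, and the weight represents how much of that unit is attributed to each source. In our simulations, the signals at the two microphones are given different delays and attenuations to simulate sound arriving from different directions, and the differences between the mixtures received by the two microphones are then used for separation. Besides the generalized Gaussian distribution, the thesis also uses the Gaussian and Laplace distributions to estimate the number of sources, and compares the results with NPCM and DOA-NPCM. For separation, binary masking, DOA-weight masking, NPCM, and DOA-NPCM are tested. Experimental results show that the five methods perform similarly in spatial resolution, but NPCM, DOA-NPCM, and the generalized Gaussian model achieve better accuracy. In the separation comparison, DOA-weight masking gives the best SDR in every condition.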
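The two-microphone simulation described above can be sketched as follows. This is a minimal illustration and not the thesis's code: it assumes an anechoic model with integer-sample delays, and the function name `mix_two_mics` and its parameters are hypothetical.

```python
def mix_two_mics(sources, delays, gains):
    """Return (x1, x2) for a simple anechoic two-microphone mixture.

    Assumptions (not from the thesis): microphone 1 receives the plain sum
    of the sources; microphone 2 receives each source shifted by an integer
    number of samples and scaled by a per-source gain.
    """
    n = max(len(s) for s in sources)
    x1 = [0.0] * n
    x2 = [0.0] * n
    for s, d, g in zip(sources, delays, gains):
        for t, v in enumerate(s):
            x1[t] += v
            # Samples shifted outside the buffer are dropped.
            if 0 <= t + d < n:
                x2[t + d] += g * v
    return x1, x2


# Two toy "sources" with different delay/attenuation pairs.
x1, x2 = mix_two_mics([[1, 0, 0, 0], [0, 2, 0, 0]], delays=[1, -1], gains=[0.5, 2.0])
```

Each delay/gain pair stands in for one source direction; the per-unit difference between `x1` and `x2` in the time-frequency domain is what a DOA estimate would be derived from.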

Keywords

generalized Gaussian mixture model; sound source location; unknown number of sources; source separation

Parallel Abstract

In this thesis, we propose a method to separate speech signals from spectrograms of sound mixtures with an unknown number of sources. Recently, many sparse source separation algorithms using time-frequency masking have been proposed. However, most of these algorithms require the number of mixed sources to be known in advance, which is impractical. In our proposed method, we first model the histogram of estimated direction-of-arrival (DOA) angles with a generalized Gaussian mixture model (GGMM) to detect the number of sources and their locations. The GGMM parameters are estimated using the expectation-maximization (EM) algorithm. Based on the DOA information of each time-frequency (T-F) unit of the mixed spectrogram, a DOA-weight mask is estimated for each speech signal. The spectrogram of each speech signal is then extracted using the corresponding mask. In our simulations, speech signals are given different delays and amplitudes at two microphones to produce DOA information for different locations. In addition to the generalized Gaussian distribution, the Gaussian distribution and the Laplace distribution are also investigated for modeling the DOA histogram. Two kinds of masks, the binary mask and the DOA-weight mask, are investigated for segregating signals from the mixture. Simulation results are compared with the outputs of NPCM and DOA-NPCM. Results show that all methods perform equivalently in tests of spatial resolution. On the other hand, NPCM, DOA-NPCM, and GGMM have higher accuracy in estimating the DOA. For segregation, the DOA-weight mask performs best in most test conditions.
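As a rough illustration of the modeling step, the sketch below evaluates a generalized Gaussian density and turns a fitted mixture into per-source weights for one T-F unit. The density is the standard form, in which shape `beta = 2` gives a Gaussian and `beta = 1` gives a Laplace distribution. Treating the DOA-weight mask as the normalized mixture responsibilities is an assumption made here for illustration; the abstract does not give the exact weighting rule. The names `gg_pdf` and `doa_weights` are hypothetical, and the mixture parameters are taken as already estimated (for example by EM).

```python
import math


def gg_pdf(x, mu, alpha, beta):
    """Generalized Gaussian density with location mu, scale alpha, shape beta."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def doa_weights(theta, weights, mus, alphas, betas):
    """Share of each mixture component at DOA angle theta (sums to 1).

    Assumption: the soft mask weight of a source equals its normalized
    mixture responsibility at the unit's DOA.
    """
    scores = [w * gg_pdf(theta, m, a, b)
              for w, m, a, b in zip(weights, mus, alphas, betas)]
    total = sum(scores)
    if total == 0.0:
        # All densities underflowed; fall back to equal shares.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


# A T-F unit whose DOA sits on the first of two hypothetical sources.
w = doa_weights(-30.0, [0.5, 0.5], [-30.0, 40.0], [5.0, 5.0], [1.5, 1.5])
```

A binary mask would correspond to rounding the largest of these weights to 1 and the rest to 0, whereas the soft weights let a unit be shared between sources.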

Parallel Keywords

generalized Gaussian mixture model; sound locations; unknown number of sources; source separation

