Sound Source Separation in the Frequency Domain: Solving the Permutation Problem with a Sliding k-Means Algorithm

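This thesis performs sound source separation in the frequency domain. In real environments, the sources reach the microphones as convolutive mixtures. To reduce computational complexity, previous work transforms the mixed signals into the time-frequency domain with the short-time Fourier transform (STFT) and then separates the signals in each frequency bin with independent component analysis (ICA). The separated signals, however, suffer from the scaling problem and the permutation problem. The permutation problem is the more complicated of the two and is therefore the focus of this thesis. For the permutation problem, previous work developed a correlation algorithm based on the observation that the energy envelopes of different frequencies of the same source are highly correlated. The sliding k-means algorithm proposed in this thesis is compared against it and analyzed. After ICA and after resolving the two follow-up problems, we obtain the unmixing matrix for each frequency bin, and we compute the mixing matrix from the measured frequency response of the environment to serve as the ground truth. In theory, the two matrices should be inverses of each other. This thesis therefore devises a scoring system that examines how strongly the product of the two matrices is concentrated on its diagonal, and defines two objective indices to quantify and evaluate the separation results. In the experiments, the singers are divided into three groups according to gender combination. The k-means algorithm achieves a permutation accuracy of 90.5%, and adding the sliding process generally raises the accuracy by a further 1% to 3%. The correlation algorithm from previous work can reach higher permutation accuracy, but it is easily affected by parameter settings and is therefore less stable. These results show that the algorithm proposed here for the permutation problem performs comparably to the previous method while being more stable.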
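The correlation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `align_by_envelope_correlation`, the array layout, and the choice to compare each bin only with the previously aligned bin are all assumptions made for illustration. The sketch picks, for every frequency bin, the ordering of the separated outputs whose amplitude envelopes correlate best with those of the bin below it.

```python
import itertools
import numpy as np

def align_by_envelope_correlation(env):
    """Hypothetical helper: align source order across frequency bins.

    env has shape (n_bins, n_sources, n_frames) and holds the amplitude
    envelopes |Y_i(f, t)| of the separated outputs in each bin.
    Returns (perms, aligned): the permutation chosen for every bin and
    the envelopes reordered accordingly.
    """
    n_bins, n_src, _ = env.shape
    perms = [tuple(range(n_src))]      # bin 0 defines the reference order
    aligned = env.copy()
    for f in range(1, n_bins):
        ref = aligned[f - 1]           # already aligned neighbouring bin
        best, best_score = None, -np.inf
        # Exhaustive search over orderings; feasible only for few sources.
        for p in itertools.permutations(range(n_src)):
            score = sum(
                np.corrcoef(ref[i], env[f, p[i]])[0, 1] for i in range(n_src)
            )
            if score > best_score:
                best, best_score = p, score
        perms.append(best)
        aligned[f] = env[f, list(best)]
    return perms, aligned
```

Chaining bin to bin like this lets one misaligned bin propagate to all later bins, which is one plausible reason a correlation-based approach can be sensitive to its settings; the abstract itself only reports the sensitivity, not its cause.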

Keywords

independent component analysis; scaling problem; permutation problem

Parallel Abstract

This thesis addresses the source separation problem in the frequency domain. In a real environment, the source signals reach the microphones as convolutive mixtures. Previous work indicates that convolutive mixtures are easier to separate in the two-dimensional time-frequency domain, after the short-time Fourier transform (STFT) has been applied to the signals. Independent component analysis (ICA) is then used to separate the sources in each frequency bin. This leaves two ambiguities to resolve, namely the scaling problem and the permutation problem. The latter is the focus of this thesis. For the permutation problem, the correlation method from previous work and the proposed sliding k-means method are compared; both rest on the assumption that the temporal envelopes of neighboring frequency bins from the same source are highly correlated. After ICA and after solving these two problems, the unmixing matrix can be calculated. To evaluate performance, we measured the frequency response of the environment and obtained the mixing matrix, which serves as the ground truth. A scoring system that combines both matrices, together with two objective indices, is then defined to quantify and evaluate the separation performance. In our experiments, the singers are divided into three groups (male+male, female+female, male+female). Across the three groups, the permutation accuracy of the k-means method reaches at least 90.5% over the parameter settings tested. After the "sliding process" is introduced, the permutation accuracy generally rises by 1% to 3%. The correlation method can reach higher permutation accuracy than the k-means method, but it is sensitive to parameter variations and is considerably less stable. The results show that the new approach is stable and yields comparable performance.
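As a rough illustration of scoring against a measured mixing matrix, the sketch below checks how much of the energy of the product W(f)H(f) lies on its diagonal in each frequency bin. The abstract does not define the thesis's scoring system or its two indices, so the function names, the energy-ratio measure, and the 0.5 threshold (which only makes sense for two sources) are assumptions. If the unmixing matrix is the inverse of the mixing matrix up to scaling, the product is diagonal and the ratio is 1; if the outputs are swapped, the ratio falls toward 0.

```python
import numpy as np

def diagonal_concentration(W, H):
    """Hypothetical measure: share of the energy of W[f] @ H[f] on the diagonal.

    W, H: complex arrays of shape (n_bins, n, n) holding the unmixing
    and mixing matrices of every frequency bin.
    Returns an array of shape (n_bins,) with values in [0, 1].
    """
    P = np.abs(W @ H) ** 2                    # element-wise energy of the product
    diag = np.trace(P, axis1=1, axis2=2)      # diagonal energy per bin
    total = P.sum(axis=(1, 2))                # total energy per bin
    return diag / total

def permutation_accuracy(W, H, threshold=0.5):
    """Fraction of bins whose diagonal share exceeds the (assumed) threshold."""
    return float(np.mean(diagonal_concentration(W, H) > threshold))
```

The ratio is unaffected by per-output scaling, so it isolates the permutation question from the scaling problem.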

Parallel Keywords

independent component analysis; k-means; permutation
