
一個加速時頻域遮罩之盲訊號分離演算法

Blind Source Separation Using a Fast Time Frequency Mask Technique

Advisor: 蔡宗漢

Abstract


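Blind source separation (BSS) mainly addresses the cocktail party problem. The idea is that at a party where several people speak at once, we can still follow one person's conversation easily despite the surrounding interference, because the human brain separates signals naturally. For digital circuits, however, this process is complex. The goal of BSS is to record simultaneously in a room with several microphones placed at different positions and to recover the original sound sources from these recordings.

Applications are broad and include mobile communications, multiuser communication systems, and speech enhancement in noisy environments.

BSS is treated here as a signal reconstruction technique that assumes a convolutive mixing model. The mixtures are transformed into the frequency domain with the short-time Fourier transform (STFT). Because the sources are sparse, the time-frequency points can be clustered according to their spatial features. The key idea behind feature extraction is that two stationary sources each emit sound waves that travel to a pair of microphones. Because the microphones lie at different distances from each source, the waves reach the two microphones at different times. In general, the phase difference and the level ratio observed between the two microphones for each source can serve as spatial features. The spatial features are represented as complex numbers scattered over the complex plane. The K-means algorithm then divides the feature points into N classes, each representing one source. Next, a binary time-frequency mask marks the classified time-frequency points: a point is set to 1 if it belongs to the target speech and to 0 otherwise. Finally, the complete mask is multiplied element-wise with the mixture to obtain the separated signal, which is returned to the time domain with the inverse STFT.

To solve the convolutive BSS problem, this thesis proposes a BSS algorithm based on an accelerated time-frequency mask. We first define two feature parameters, the level ratio and the phase difference of the signals. We then reduce the variance of the data so that the two features have similar variances, which helps K-means converge, and apply K-means to cluster the data in each frequency band. Finally, the time-frequency mask is computed from the clustered feature points.

In a real environment, we separate the signals directly from the microphone recordings and evaluate signal quality with the signal-to-distortion ratio (SDR) and the signal-to-interference ratio (SIR). The method speeds up clustering without degrading signal quality, and the algorithm is simple.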
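The mask-building steps described in the abstract (level ratio and phase difference, variance equalization, per-band K-means, binary mask) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the inputs `X1` and `X2` are assumed to be STFT matrices of the two microphone signals with shape (frequency bins, frames), the features are handled as 2-D real vectors rather than complex numbers, the feature scaling is plain standardization, and the farthest-point initialization of K-means is a choice made here for determinism. All function names are hypothetical.

```python
import numpy as np

def spatial_features(X1, X2, eps=1e-12):
    """Level ratio and phase difference between two microphone STFTs."""
    level_ratio = np.abs(X2) / (np.abs(X1) + eps)
    phase_diff = np.angle(X2 * np.conj(X1))
    return level_ratio, phase_diff

def standardize(feature, eps=1e-12):
    """Zero mean, unit variance, so both features have a similar spread."""
    return (feature - feature.mean()) / (feature.std() + eps)

def kmeans(points, n_clusters, n_iter=50):
    """Plain K-means on an array of shape (n_points, n_dims)."""
    # Assumption: deterministic farthest-point initialization.
    centroids = [points[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels

def binary_masks(X1, X2, n_sources):
    """Cluster each frequency bin separately; returns (n_sources, bins, frames)."""
    ratio, phase = spatial_features(X1, X2)
    n_bins, n_frames = X1.shape
    masks = np.zeros((n_sources, n_bins, n_frames))
    for f in range(n_bins):
        points = np.column_stack([standardize(ratio[f]), standardize(phase[f])])
        labels = kmeans(points, n_sources)
        for k in range(n_sources):
            masks[k, f] = (labels == k)  # 1 where the point belongs to cluster k
    return masks
```

A separated spectrogram for cluster `k` would then be `masks[k] * X1`, followed by an inverse STFT. Because each bin is clustered independently, cluster indices are not guaranteed to refer to the same source across bins; this sketch does not align them.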

Parallel Abstract


The goal of blind source separation (BSS) is to solve the cocktail party problem. Imagine a room with a number of people and several microphones for recording. When people speak at the same time, each microphone registers a different mixture of the individual speakers' audio signals, and the task of BSS is to untangle these mixtures into their sources. Applications include mobile telephony, multiuser communication systems, and speech enhancement in noisy environments. The mixtures recorded by the microphones are transformed to the frequency domain with the short-time Fourier transform (STFT). Owing to the sparseness and disjointness of the source signals, we can extract spatial features from the mixtures in the feature extraction step. The features are represented as complex numbers. Afterwards, using the K-means algorithm, we divide the features into N groups, where N is the number of sources. Before transforming the separated signal back to the time domain, we design a mask that labels the target signal: a time-frequency point that belongs to the target speech is labeled one, and zero otherwise.

To solve the convolutive BSS problem, this thesis presents a new method that uses a fast time-frequency mask technique. We first define two features, the normalized level ratio and the phase difference. Next, we reduce the variance of the features so that K-means clustering needs fewer iterations. Afterwards, with the K-means algorithm, we cluster the features by assigning each to the nearest group. Finally, a time-frequency mask is generated from the clustered features. The method is simple and faster, and it does not reduce the quality of the target signal. In a real environment, we also evaluate the separated signals in terms of the signal-to-distortion ratio (SDR) and the signal-to-interference ratio (SIR).
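The abstract names SDR and SIR as evaluation metrics but does not give their formulas. The sketch below uses simplified energy-ratio definitions as an assumption: SDR compares a reference source with the total error of its estimate, and SIR compares the target component of an output with the interference component that leaked into it. The function names and the requirement that the true components be known (as in a simulation) are assumptions made here, not details from the thesis.

```python
import numpy as np

def energy_ratio_db(numerator, denominator, eps=1e-12):
    """10*log10 of the energy ratio of two signals (real or complex)."""
    num = np.sum(np.abs(numerator) ** 2) + eps
    den = np.sum(np.abs(denominator) ** 2) + eps
    return 10.0 * np.log10(num / den)

def sdr_db(reference, estimate):
    """Simplified SDR: reference energy over the energy of (reference - estimate)."""
    return energy_ratio_db(reference, reference - estimate)

def sir_db(target_part, interference_part):
    """Simplified SIR: target energy over leaked interference energy."""
    return energy_ratio_db(target_part, interference_part)

# Worked example: the error has 1/100 of the reference energy.
reference = np.array([3.0, 4.0])
leak = np.array([0.3, 0.4])
print(round(sdr_db(reference, reference + leak), 3))  # 20.0
print(round(sir_db(reference, leak), 3))              # 20.0
```

Full evaluation toolkits further decompose the error into interference, noise, and artifact terms; the version above only illustrates the ratio-in-decibels idea.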

