具有預測理解度之預訓練模型和零點控制波束成形技術的雙通道語音增強系統

A Two-channel Speech Enhancement System with a Pre-trained Intelligibility Prediction Model and Null-steering Beamforming

Advisor: 蘇柏青

Abstract

Beamforming is widely used in multi-channel speech enhancement systems to suppress directional interference that degrades speech intelligibility. Traditional beamformers are usually optimized from accurate estimates of parameters such as the direction of arrival (DOA), power spectral densities (PSD), relative transfer functions (RTF), and covariance matrices. However, estimating these parameters accurately can be difficult. In this thesis, a novel beamforming framework is proposed that enhances the intelligibility of noisy speech signals using STOI-Net, a pre-trained model that predicts short-time objective intelligibility (STOI). The framework is referred to as intelligibility-aware null-steering beamforming (IANS). The noisy speech signal is first passed through a set of null-steering beamformers to generate a set of candidate signals. These candidates are then passed to STOI-Net, which selects the one with the highest predicted intelligibility. Experimental results show that the proposed method, using a two-microphone array, can generate intelligibility-enhanced speech signals in multiple scenarios. The resulting STOI scores are similar to those obtained with beamforming methods that are given the DOAs of the target speech and the interfering signals.
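The candidate-and-select procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes a free-field, two-microphone array with spacing `d`, a delay-and-subtract null-steering beamformer, and a uniform grid of candidate null directions. The names `null_steer`, `ians_select`, and `score_fn` are hypothetical; `score_fn` stands in for the STOI-Net predictor, whose interface is not given here.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s


def null_steer(x, fs, d, theta_null):
    """Delay-and-subtract beamformer for two microphones (illustrative).

    x: array of shape (2, n); fs: sample rate in Hz; d: mic spacing in m;
    theta_null: null direction in radians, measured from the array axis.
    Assumes a free-field plane wave reaches mic 2 d*cos(theta)/C seconds
    after mic 1. The delay is applied circularly via the FFT.
    """
    n = x.shape[1]
    f = np.fft.rfftfreq(n, 1.0 / fs)
    tau = d * np.cos(theta_null) / C
    X = np.fft.rfft(x, axis=1)
    # Delay mic 1 by tau and subtract mic 2: a source at theta_null cancels.
    Y = X[0] * np.exp(-2j * np.pi * f * tau) - X[1]
    return np.fft.irfft(Y, n=n)


def ians_select(x, fs, d, candidates, score_fn):
    """Run one beamformer per candidate null direction and keep the output
    that score_fn (a stand-in for an intelligibility predictor) rates highest.
    """
    outputs = [null_steer(x, fs, d, th) for th in candidates]
    scores = [float(score_fn(y)) for y in outputs]
    best = int(np.argmax(scores))
    return candidates[best], outputs[best], scores
```

With a trained predictor, `score_fn` would wrap the model's inference call. The selection step only requires that a higher score means higher predicted intelligibility, so no DOA estimate is needed.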

Keywords

beamforming; null-steering; STOI; STOI-Net
