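A Speech Enhancement Algorithm That Simulates Auditory Attention Using a Neural Network with an Embedded Multi-Resolution Auditory Model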

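In this thesis, we draw on two findings: the attentional hearing phenomenon reported in neurobiological research, and the patterns of neural activity in the auditory cortex observed in biological hearing experiments. Combining these with currently popular neural network learning, we devise a distinctive neural network model and apply it to speech enhancement, in the hope that knowledge from neurophysiology can effectively solve an engineering problem. The model is built on a basic convolutional neural network (CNN) with minor modifications. Notably, we embed the auditory model proposed by NSL and use it to simulate area A1 of the cortex: we design filters that analyze temporal and spectral information jointly and place them in the convolutional layer of the CNN as initial values. During training, the model automatically fine-tunes these parameters according to the chosen objective so that the input data are mapped to the target; for speech enhancement, the target is the clean speech parameters. After training, the filters embedded in the convolutional layer have been adjusted from their initial values into a form that maps to the clean speech parameters, which amounts to automatic noise removal. We regard this fine-tuning as closely analogous to the attentional hearing response in neurobiology: when a specific goal must be achieved, the filters formed in the cortex differ from those used in a quiet environment. We design several variant models for comparison and also compare them with conventional neural network models. When training data are severely limited, our models outperform all the others, that is, they reach convergence quickly.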

Keywords

speech enhancement; auditory model; attentional hearing phenomenon

Parallel Abstract

In this thesis, we propose a neural network that emulates auditory attention for speech enhancement. The proposed system integrates a spectro-temporal analytical auditory model with a multi-layer fully-connected network to form a quasi-CNN structure. The initial kernels of the convolutional layer are derived from the neuro-physiological auditory model. To simulate the plasticity of cortical neurons, the kernels are allowed to adapt to the task at hand. For speech enhancement, the Fourier spectrogram, rather than the auditory spectrogram, is used as input to the proposed system so that the speech signal can be well reconstructed. The proposed system performs comparably with standard DNN and CNN systems when plentiful resources are available. Under the limited-resource condition, however, the proposed system outperforms the standard systems in all test settings.
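The idea of seeding a convolutional layer with spectro-temporal filters can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Gabor-like kernels are a generic stand-in, not the actual filters of the auditory model used in the thesis, and the names `spectro_temporal_kernel`, `init_kernel_bank`, and `conv_layer`, as well as the kernel size and the rate and scale values, are hypothetical. Training, which would subsequently adjust these kernels toward the clean-speech target, is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spectro_temporal_kernel(rate, scale, n_frames=9, n_bins=9):
    """Hypothetical Gabor-like kernel tuned to one (rate, scale) pair.

    rate  : temporal modulation in cycles per frame (sign sets sweep direction)
    scale : spectral modulation in cycles per frequency bin
    """
    t = np.arange(n_frames) - n_frames // 2          # centered time axis (frames)
    f = np.arange(n_bins) - n_bins // 2              # centered frequency axis (bins)
    tt, ff = np.meshgrid(t, f, indexing="ij")
    carrier = np.cos(2 * np.pi * (rate * tt + scale * ff))
    envelope = np.outer(np.hanning(n_frames), np.hanning(n_bins))
    kernel = carrier * envelope
    kernel -= kernel.mean()                          # zero mean: ignore flat input
    return kernel / (np.linalg.norm(kernel) + 1e-12)  # unit energy


def init_kernel_bank(rates=(0.05, 0.1), scales=(0.05, 0.1)):
    """Stack kernels for every rate/scale pair and both sweep directions."""
    bank = [
        spectro_temporal_kernel(sign * r, s)
        for r in rates
        for sign in (+1, -1)
        for s in scales
    ]
    return np.stack(bank)                            # (n_kernels, n_frames, n_bins)


def conv_layer(spectrogram, kernels):
    """Valid 2-D cross-correlation of a (time, freq) spectrogram with each kernel."""
    windows = sliding_window_view(spectrogram, kernels.shape[1:])
    return np.einsum("tfij,kij->ktf", windows, kernels)


# Assumed usage: a random array stands in for a noisy Fourier spectrogram.
rng = np.random.default_rng(0)
noisy_spec = rng.random((20, 30))                    # 20 frames, 30 frequency bins
feature_maps = conv_layer(noisy_spec, init_kernel_bank())
print(feature_maps.shape)
```

In a trainable framework, the array returned by `init_kernel_bank` would be copied into the convolutional layer's weights before training instead of a random initialization, and the optimizer would then be free to reshape the kernels for the task.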

Parallel Keywords

speech enhancement; auditory model; attentional hearing

