
Reinforcement Learning Based Speech Enhancement for Improving Speech Quality

Advisor: 冀泰石

Abstract

Removing environmental noise has long been an important topic in speech signal processing. Koizumi's research group proposed deep-neural-network-based reinforcement learning (DNN-RL) that enhances speech by directly targeting a speech quality metric. Their method is reported to use limited training data efficiently and to outperform the DNN-mapping method. Building on it, we propose three optimization techniques to further improve performance. First, we propose two alternative procedures for defining the actions, and we additionally compare two settings for the number of actions (templates). Second, we extend the one-level DNN-RL that yields the best speech quality score to a two-level DNN-RL that enhances the low-frequency and high-frequency regions separately. Third, we combine the Monte Carlo method with the proposed DNN-RLs to stabilize the algorithm. To evaluate these three techniques, experiments measure the changes in speech intelligibility and speech quality under different noise types and noise levels. The results show that the proposed action-definition procedures achieve higher speech quality scores than the original procedure, while the number of actions has little effect on speech quality. In addition, the two-level DNN-RL further improves speech quality over the one-level DNN-RL, and combining DNN-RL with the Monte Carlo method also benefits speech quality. As future work, applying the optimized DNN-RL method to regression-based speech enhancement is expected to yield better results.
