
以理解度為基礎之雜訊環境語音合成

Intelligibility-Based Speech Synthesis for Noise Environment

Advisor: 廖元甫

Abstract


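This study aims to make synthesized speech more intelligible in noisy environments. In everyday conversation, we hear our own voice, judge whether what we said was clear, and adjust the way we speak accordingly. This study gives the synthesis system a similar recursive (feedback) mechanism for adjusting the speech it produces: the system measures intelligibility itself and adjusts its parameters so that speech synthesized under noise is easier to understand. To implement this, we use automatic speech recognition (ASR) to recognize the synthesized speech in noise, and we pass the resulting intelligibility measure to the minimum classification error (MCE) algorithm, which updates the model parameters recursively. We evaluated the method with an intelligibility preference test and a subjective intelligibility test under different types of noise. In the preference test, 63.75% of listeners rated our adaptation method as better than the previous approach, and the subjective intelligibility test showed a 7.64% reduction in error rate.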

Parallel Abstract


The goal of this research is to make synthesized speech more intelligible in noisy environments. In daily conversation, we hear our own voice, judge whether the content is clear, and then adjust the way we speak. We want the system to adjust its own synthesized speech recursively, in the same way that people respond to hearing their own voice. We therefore use a recursive scheme that measures intelligibility and adjusts the system model, so that the speech synthesis system produces more intelligible output in noise. To implement this, we use automatic speech recognition (ASR) to recognize synthesized sentences that contain noise and to compute their intelligibility. Finally, we update the model recursively with the minimum classification error (MCE) algorithm. We evaluated the method with an intelligibility preference test and a subjective intelligibility test. In the preference test, 63.75% of listeners judged our method better than the baseline, and in the subjective intelligibility test our method reduced the error rate by 7.64% relative to the baseline.
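
The abstract does not give the exact objective used for the MCE update, so the following is only a minimal sketch of the standard MCE formulation: a misclassification measure that compares the score of the correct class against a soft maximum over its rivals, passed through a sigmoid to give a smooth, differentiable error count. The function names, the parameters `eta`, `gamma`, and `theta`, and the idea of feeding it per-class recognizer scores are assumptions made for illustration, not details taken from the thesis.

```python
import math


def misclassification_measure(scores, correct, eta=1.0):
    """Standard MCE misclassification measure (hypothetical helper).

    d = -g_correct + (1/eta) * log( mean over rivals j of exp(eta * g_j) )
    d < 0 means the correct class wins; d > 0 means it is misrecognized.
    """
    rivals = [g for j, g in enumerate(scores) if j != correct]
    if not rivals:
        raise ValueError("need at least one competing class")
    m = max(rivals)  # subtract the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (g - m)) for g in rivals) / len(rivals)
    soft_max_rival = m + math.log(mean_exp) / eta
    return -scores[correct] + soft_max_rival


def mce_loss(scores, correct, eta=1.0, gamma=1.0, theta=0.0):
    """Sigmoid-smoothed error: near 0 when correct, near 1 when wrong."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-(gamma * d + theta)))


# Toy recognizer scores (assumed values) for a two-class decision.
print(mce_loss([2.0, 0.0], correct=0))  # correct class scores higher: loss below 0.5
print(mce_loss([0.0, 2.0], correct=0))  # rival scores higher: loss above 0.5
```

In a feedback loop of the kind the abstract describes, a loss like this would be computed from the recognizer's output on noisy synthesized speech, and its gradient would drive the parameter update. How the scores depend on the synthesis model parameters is not specified in the abstract and is left out here.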

