
IMBE語音編碼器合成音質改進之研究

ON IMPROVING SYNTHETIC SPEECH QUALITY FOR THE IMBE VOCODER

Advisor: 李清坤

Abstract


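The IMBE (Improved Multi-Band Excitation) vocoder is used mainly in voice communication and has been adopted as a speech coding standard by the U.S. Telecommunications Industry Association and the Association of Public Safety Communications. It offers high-quality synthetic speech at a low bit rate with strong noise immunity, which makes it very cost-effective.

The main difference between the IMBE speech coding model and traditional speech coding models lies in how the excitation source is handled. A traditional model classifies each speech frame as either a voiced frame or an unvoiced frame. The IMBE model instead divides each frame in the frequency domain into multiple subbands and classifies each subband separately as voiced or unvoiced. The excitation of a single frame can therefore combine the periodic character of voiced speech with the noise-like character of unvoiced speech, so the synthetic speech can be reconstructed closer to natural sound.

This thesis examines the accuracy of the per-subband voiced/unvoiced decisions made by the IMBE vocoder and their effect on synthetic speech quality. We first run baseline tests on single sustained sounds to see whether voiced/unvoiced decision errors occur. We then test natural sentences, mainly to observe the errors and error patterns that arise at voiced/unvoiced transitions. From these experiments we identify the factors behind the decision errors and propose improved decision rules that aim to reconstruct synthetic speech closer to natural quality.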

Keywords

fundamental frequency, harmonics, spectrum, speech frame, unvoiced, voiced, IMBE

Parallel Abstract


The IMBE (Improved Multi-Band Excitation) speech coder is widely used in digital voice communication and has been approved as a speech coding standard by the Telecommunications Industry Association (TIA) and the American Association of Public Safety Communications. Its advantages include high-quality synthetic speech, a low bit rate, high noise immunity, and good cost-effectiveness.

A primary difference between traditional speech models and the IMBE speech model lies in the excitation signal. Conventional speech models make a single voiced/unvoiced (V/UV) decision for each speech frame. In contrast, the IMBE speech model divides the excitation spectrum into a number of non-overlapping frequency bands (called subbands) and makes a V/UV decision for each subband. This allows the excitation signal for a particular speech frame to be a mixture of periodic (voiced) and noise-like (unvoiced) energy, which makes it possible to reconstruct more natural-sounding synthetic speech.

The main research goal of this thesis is to investigate the accuracy of the V/UV decisions in the IMBE speech coder and their effect on synthetic speech quality. We first tested sustained phonemes and observed the error patterns. We then tested natural sentences to observe V/UV decision performance, especially in V/UV transition regions. Based on an analysis of the experimental results, we propose modified V/UV decision rules intended to improve synthetic speech quality.
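The per-subband V/UV decision described above can be illustrated with a minimal sketch. It is not the thesis's method or the standardized IMBE algorithm. It assumes the fundamental frequency is already known, groups three harmonics into each subband, fits a Hamming-window spectrum to each harmonic by least squares, and compares the normalized fit error with a fixed threshold. The names `dtft`, `band_vuv_decisions`, and `make_mixed_frame`, the threshold of 0.2, and all other parameter values are hypothetical choices for illustration. Plain Python is used so the example has no dependencies.

```python
import cmath
import math
import random


def dtft(samples, cycles_per_sample):
    """Evaluate the discrete-time Fourier transform at one normalized frequency."""
    step = -2j * math.pi * cycles_per_sample
    return sum(x * cmath.exp(step * n) for n, x in enumerate(samples))


def band_vuv_decisions(frame, f0, fs, harmonics_per_band=3, threshold=0.2, grid=1024):
    """Return (decisions, errors): a voiced flag and a normalized fit error per band.

    Assumptions of this sketch: f0 is known, the threshold is fixed, and each
    band holds `harmonics_per_band` harmonics (the last band may hold fewer).
    """
    size = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]
    windowed = [x * w for x, w in zip(frame, window)]
    df = fs / grid  # spacing of the frequency grid in Hz
    num_harmonics = int(fs / (2 * f0) - 0.5)  # keep every harmonic region below fs/2
    fit_error, energy = [], []
    for h in range(1, num_harmonics + 1):
        first = math.ceil((h - 0.5) * f0 / df)
        last = math.ceil((h + 0.5) * f0 / df)
        freqs = [k * df for k in range(first, last)]
        observed = [dtft(windowed, f / fs) for f in freqs]
        # Voiced template: the window spectrum shifted to the harmonic h * f0.
        template = [dtft(window, (f - h * f0) / fs) for f in freqs]
        # Least-squares complex amplitude of the template within this region.
        num = sum(t.conjugate() * s for t, s in zip(template, observed))
        den = sum(abs(t) ** 2 for t in template)
        amplitude = num / den
        fit_error.append(sum(abs(s - amplitude * t) ** 2 for t, s in zip(template, observed)))
        energy.append(sum(abs(s) ** 2 for s in observed))
    decisions, errors = [], []
    for start in range(0, num_harmonics, harmonics_per_band):
        stop = start + harmonics_per_band
        d = sum(fit_error[start:stop]) / max(sum(energy[start:stop]), 1e-12)
        errors.append(d)
        decisions.append(d < threshold)  # small fit error: the band looks periodic
    return decisions, errors


def make_mixed_frame(f0=200.0, fs=8000.0, size=256, voiced_harmonics=9, seed=0):
    """Hypothetical test frame: harmonics of f0 at low frequencies, Gaussian noise above."""
    rng = random.Random(seed)
    frame = [0.0] * size
    for h in range(1, voiced_harmonics + 1):
        phase = rng.uniform(0.0, 2 * math.pi)
        for n in range(size):
            frame[n] += math.cos(2 * math.pi * h * f0 * n / fs + phase)
    cutoff = (voiced_harmonics + 0.5) * f0  # noise occupies frequencies above this
    for k in range(1, size // 2):
        if k * fs / size < cutoff:
            continue
        a, b = rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)
        for n in range(size):
            angle = 2 * math.pi * k * n / size
            frame[n] += a * math.cos(angle) - b * math.sin(angle)
    return frame


frame = make_mixed_frame()
decisions, errors = band_vuv_decisions(frame, f0=200.0, fs=8000.0)
for index, (voiced, err) in enumerate(zip(decisions, errors)):
    print(index, "voiced" if voiced else "unvoiced", round(err, 3))
```

A conventional single-decision coder would have to label this whole frame either voiced or unvoiced. The band-wise test lets the low bands, which hold the harmonics, and the high bands, which hold the noise, receive different labels within the same frame.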

