A Study of Audio Capture Applied to Speech Synthesis

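This study proposes a system for freely recording speech-synthesis corpora, giving users a convenient way to record the corpus that speech synthesis requires. The stored, customized corpus can then be used in any system with a TTS function, for example e-mail reading, guidance systems for the blind, children's audiobooks, and other systems that need speech output. For speech recognition, HTK (Hidden Markov Model Toolkit) is used as the development tool, with hidden Markov models as the recognition model. A Viterbi decoder performs phoneme segmentation (forced alignment), cutting out the recognized audio and storing it for later text-to-speech use. Volume adjustment, smoothing, and duration adjustment are then applied to make the output speech more natural. With HTK as the recognizer, the recognition rate reaches 94.29% for digits and 95.69% for commands. To segment all commonly used Chinese sounds, we compiled 1166 syllables, for which the overall recognition rate is 64.01%.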

Keywords

Speech synthesis; Speech recognition; HMM; TTS

Parallel Abstract

This study proposes a system for freely recording and producing speech corpora, giving users a convenient way to record the corpus needed by a text-to-speech system. Users can store the customized speech corpus and apply it to any system with a TTS function. It can be applied to systems that need voice output, such as e-mail reading systems, guidance systems for the blind, and children's audiobooks. This research uses HTK (Hidden Markov Model Toolkit) to develop the speech recognition system, with hidden Markov models (HMMs) as the recognition model. The Viterbi decoder provided by HTK performs forced alignment, and the recognized segments are cut out and stored for later text-to-speech use. Volume adjustment, voice smoothing, and duration adjustment are then applied to make the output voice more natural. With HTK used for speech recognition, the recognition rate reaches 94.29% for digits and 95.69% for voice commands. To segment all Chinese syllables commonly used in daily life, the research collects 1166 syllables for recognition, and the overall recognition rate is 64.01%.
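The forced-alignment step described above can be illustrated with a minimal sketch. It is not HTK's implementation and not the method of this study: it assumes one state per unit (real HMM systems align over several states per phone), and the input `frame_scores`, the function names `force_align` and `segments`, and the toy values are all hypothetical. Given per-frame log-likelihoods for each unit of a known transcript, a Viterbi pass finds the best left-to-right path, and the points where the path changes unit are the cut boundaries.

```python
import math


def force_align(frame_scores):
    """Viterbi alignment of frames to a known, ordered unit sequence.

    frame_scores[t][k] is the (hypothetical) log-likelihood of frame t
    under the k-th unit of the transcript. The path starts in unit 0,
    ends in the last unit, and at each frame either stays or advances by one.
    Returns the unit index assigned to every frame.
    """
    n_frames = len(frame_scores)
    n_units = len(frame_scores[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")

    neg_inf = -math.inf
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    advanced = [[False] * n_units for _ in range(n_frames)]
    best[0][0] = frame_scores[0][0]

    for t in range(1, n_frames):
        for k in range(n_units):
            stay = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else neg_inf
            if advance > stay:
                best[t][k] = advance + frame_scores[t][k]
                advanced[t][k] = True
            else:
                best[t][k] = stay + frame_scores[t][k]

    # Backtrack from the last unit at the last frame.
    path = [0] * n_frames
    k = n_units - 1
    for t in range(n_frames - 1, -1, -1):
        path[t] = k
        if advanced[t][k]:
            k -= 1
    return path


def segments(path):
    """Collapse a frame-level path into (unit, start_frame, end_frame) spans,
    with end_frame exclusive."""
    spans = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            spans.append((path[start], start, t))
            start = t
    return spans


# Toy example: 6 frames, 3 units; values are made-up log-likelihoods.
HIGH, LOW = -0.1, -3.0
toy_scores = [
    [HIGH, LOW, LOW],
    [HIGH, LOW, LOW],
    [LOW, HIGH, LOW],
    [LOW, HIGH, LOW],
    [LOW, HIGH, LOW],
    [LOW, LOW, HIGH],
]
toy_path = force_align(toy_scores)
toy_segments = segments(toy_path)
```

Multiplying each span's frame indices by the frame shift would give the sample positions at which the recording could be cut into per-unit segments.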

Parallel Keywords

Speech synthesis; Speech recognition; HMM (Hidden Markov Models); TTS (Text-to-speech)

