運用語音合成器對人聲自然特性之分析與探討

Human Speech Analysis and Investigation by Utilizing Speech Formant Synthesizer

Advisor: 陳永耀

Abstract


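Human perception of speech comprises loudness, pitch, and timbre. Loudness reflects the magnitude of sound pressure, and pitch describes how high or low the fundamental frequency of speech is. Timbre is the most complex and difficult of the three to describe; it is embedded in the harmonic components of the sound waveform. Timbre represents the characteristics of a speaker, and listeners rely on it to distinguish one voice from another. Timbre is an important factor in speech recognition, yet its physical meaning has not been explored in depth in the previous literature. Conventional speech synthesis techniques fall into two main approaches. The first cuts pre-recorded speech into small units and recombines those units to synthesize the output, which limits the ability to modulate the speech. The other is formant synthesis, which first builds a mathematical model of the vocal tract and then synthesizes the output through parametric control. This thesis uses a formant speech synthesizer to compare the vocal tract models of different speakers, proposes a method for analyzing timbre, and then investigates the consistency of the human voice. Through the analysis, imitation, and creation of speech, it seeks a deeper understanding of the nature of sound.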

Parallel Abstract


Human perception of sound includes loudness, pitch, and timbre. Loudness reflects the amplitude of the sound pressure, and pitch describes the fundamental frequency of speech. Timbre, however, is the most difficult to describe. Its characteristics are hidden in the harmonic components of a sound wave. Timbre represents the features of a speaker, and listeners can tell speakers apart by it. Timbre is a dominant component in speech recognition, but its physical meaning has received little investigation in the literature. Conventional speech synthesis techniques can be divided into two broad approaches. The first segments recorded speech into small units and combines these units to synthesize speech, which limits the variety of speech that can be produced. The other is formant synthesis, which establishes a vocal tract model beforehand and uses parameters to control the output speech. This thesis uses a speech formant synthesizer to compare the vocal tract models of different speakers and proposes an algorithm for analyzing timbre, in order to study the consistency of the human voice. Through speech analysis, imitation, and creation, it aims to uncover the intrinsic features of speech.
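The formant-synthesis approach described above (a vocal tract model driven by a small set of parameters) can be illustrated with a minimal sketch. This is not the synthesizer used in the thesis: it assumes a simple cascade of standard second-order digital resonators excited by an impulse train, and the function names, formant frequencies, and bandwidths for the two "speakers" are hypothetical values chosen only for illustration.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients of a second-order digital resonator.

    Implements y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with the
    coefficients set from a centre frequency and a bandwidth.
    """
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain to 1 at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, fs, n_samples):
    """Crude voiced source: one unit impulse per pitch period."""
    period = int(round(fs / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def synthesize_vowel(formants, f0_hz=120.0, fs=16000, duration_s=0.2):
    """Cascade the source through each (frequency, bandwidth) resonator."""
    n_samples = int(round(fs * duration_s))
    signal = impulse_train(f0_hz, fs, n_samples)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs)
    return signal


# Hypothetical (frequency Hz, bandwidth Hz) pairs, assumed for illustration only.
SPEAKER_A = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
SPEAKER_B = [(800.0, 80.0), (1400.0, 90.0), (2900.0, 120.0)]

wave_a = synthesize_vowel(SPEAKER_A)
wave_b = synthesize_vowel(SPEAKER_B)
```

With the same pitch and source, the two waveforms differ only through the resonator parameters, which is the sense in which a formant synthesizer lets the vocal tract models of different speakers be compared.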
