  • Thesis

強健性語音辨識上關於特徵正規化與其它改良技術的研究

A Study on Feature Normalization and Other Improved Techniques for Robust Speech Recognition

Advisor: Dr. 陳柏琳

Abstract


Over thousands of years, human beings have continuously acquired and accumulated knowledge from daily life, so for most of that time civilization advanced at roughly the same pace as human evolution. Today, however, technology is developing far faster. Huge quantities of multimedia information, such as broadcast radio and television programs, voice mails, and digital archives, are continuously growing and filling our computers, networks, and lives. Accessing multimedia information anytime and anywhere from small handheld mobile devices is therefore receiving more and more emphasis. Speech is the primary and most convenient means of communication between people, and in the near future it is expected to play a more active role as the major human-machine interface between people and different kinds of smart devices. It would thus be far more convenient if speech could be used as the human-machine interface, and if multimedia could be automatically transcribed, retrieved, and summarized using the speech information inherent in it. However, speech recognition is usually degraded by complicating factors such as background and channel noise and speaker and linguistic variation, which leave current state-of-the-art recognition systems still far from perfect.

With these observations in mind, this thesis makes several attempts to improve current speech robustness techniques and to find a way to integrate them. The experiments were carried out on the Aurora 2.0 database and on Mandarin broadcast news speech collected in Taiwan. Considering the phonetic characteristics of the Chinese language, a modified histogram equalization (MHEQ) approach was first proposed, in which separate reference histograms were established for the silence and speech segments (MHEQ-2) or, more precisely, for the silence, INITIAL, and FINAL segments of Chinese (MHEQ-3). The proposed approach yields relative improvements of more than 5.75% and 4.04% over the baseline system and the conventional table-based histogram equalization (THEQ) approach, respectively, in clean environments. Furthermore, spectral entropy features obtained after Linear Discriminant Analysis (LDA) were used to augment the Mel-frequency cepstral features, and initial results indicated considerable improvements. Finally, fusion of the proposed approaches was also investigated, with very promising results.
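
The idea of equalizing features against separate reference histograms per segment class can be illustrated with a minimal sketch. This is not the thesis's MHEQ implementation: the function names (`build_reference`, `equalize`, `class_wise_equalize`), the quantile-table representation of the reference histogram, and the assumption that each frame already carries a class label (for example "sil" or "sp" from some segmentation step) are all hypothetical choices made for illustration. The sketch operates on a single feature dimension; in practice it would be applied to each dimension independently.

```python
import numpy as np


def build_reference(train_values, n_bins=100):
    """Summarize training values as a quantile table (a tabulated inverse CDF)."""
    probs = (np.arange(n_bins) + 0.5) / n_bins
    return probs, np.quantile(train_values, probs)


def equalize(values, ref_probs, ref_quantiles):
    """Map values through their own empirical CDF, then the reference inverse CDF."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = np.argsort(np.argsort(values))  # rank of each value, 0..n-1
    cdf = (ranks + 0.5) / n                 # empirical CDF estimate in (0, 1)
    return np.interp(cdf, ref_probs, ref_quantiles)


def class_wise_equalize(values, labels, references):
    """Equalize each class of frames against its own reference table.

    `labels` is assumed to be given per frame; frames whose label has no
    reference table are left unchanged.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    out = values.copy()
    for label, (probs, quantiles) in references.items():
        mask = labels == label
        if mask.any():
            out[mask] = equalize(values[mask], probs, quantiles)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical "clean" training statistics for two classes.
    references = {
        "sil": build_reference(rng.normal(-2.0, 0.5, 2000)),
        "sp": build_reference(rng.normal(3.0, 1.0, 2000)),
    }
    # Hypothetical distorted test features with known labels.
    test = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 2.0, 300)])
    labels = np.array(["sil"] * 200 + ["sp"] * 300)
    restored = class_wise_equalize(test, labels, references)
    print(restored[labels == "sil"].mean(), restored[labels == "sp"].mean())
```

Using a separate table per class keeps the silence frames from being pulled toward the speech distribution (and vice versa), which is the intuition behind the two-class and three-class variants described above; how the thesis actually obtains the class boundaries is not stated in the abstract.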

