
Modeling Pronunciation Variation for Bi-Lingual Mandarin/Taiwanese Speech Recognition


This paper describes a bi-lingual large-vocabulary speech recognition experiment based on modeling pronunciation variations. The two languages under study are Mandarin Chinese and Taiwanese (Min-nan). Although the two languages are largely mutually unintelligible, they share many words that are written with the same Chinese characters and carry the same meanings but are pronounced differently. From observation of the bi-lingual corpus, we identified five types of pronunciation variations for Chinese characters. We developed a one-pass, three-layer recognizer that combines bi-lingual acoustic models, an integrated pronunciation model, and a tree-structured search network. The recognizer's performance was evaluated under three different pronunciation models. The results showed that the character error rate with the integrated pronunciation model was lower than that with pronunciation models built using either the knowledge-based or the data-driven approach alone. The relative frequency ratio was also used as a measure to choose the best number of pronunciation variations for each Chinese character. Finally, the best character error rates on the Mandarin and Taiwanese test sets were 16.2% and 15.0%, respectively, obtained when the average number of pronunciations per Chinese character was 3.9.
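As an illustration of how a relative frequency ratio might be used to limit the number of pronunciation variants per character, the following is a minimal sketch under stated assumptions. The abstract does not define the ratio, so the sketch assumes it is the count of a variant divided by the count of the most frequent variant of the same character. The function names, the threshold value, and the pronunciation labels and counts are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def prune_pronunciations(counts, threshold):
    """Keep variants whose relative frequency ratio meets the threshold.

    Assumption (not from the paper): the ratio of a variant is its count
    divided by the count of the most frequent variant of the same character.
    """
    if not counts:
        return []
    top = max(counts.values())
    kept = [(pron, c / top) for pron, c in counts.items() if c / top >= threshold]
    # Sort by descending ratio, then alphabetically for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [pron for pron, _ in kept]


def average_variants(lexicon):
    """Average number of retained pronunciations per character."""
    return sum(len(prons) for prons in lexicon.values()) / len(lexicon)


# Hypothetical observation counts: character -> pronunciation label -> count.
observed = {
    "我": Counter({"wo3": 90, "gua2": 60, "ngo2": 3}),
    "人": Counter({"ren2": 80, "lang5": 50, "jin5": 20, "len2": 2}),
}

# Hypothetical threshold; a real system would tune it on held-out data.
lexicon = {ch: prune_pronunciations(c, threshold=0.1) for ch, c in observed.items()}

for ch, prons in lexicon.items():
    print(ch, prons)
print("average variants per character:", average_variants(lexicon))
```

Raising the threshold shrinks the lexicon and reduces confusability between characters, while lowering it covers more variation at the cost of a larger search space. The average number of variants per character is the quantity the abstract reports as 3.9 at the best operating point.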




