Modeling Pronunciation Variation for Bi-Lingual Mandarin/Taiwanese Speech Recognition

This paper describes a bi-lingual large-vocabulary speech recognition experiment based on modeling pronunciation variations. The two languages under study are Mandarin Chinese and Taiwanese (Min-nan). Although the two languages are basically mutually unintelligible, they share many words that are written with the same Chinese characters and have the same meanings but are pronounced differently. Observing the bi-lingual corpus, we found five types of pronunciation variations for Chinese characters. A one-pass, three-layer recognizer was developed that combines bi-lingual acoustic models, an integrated pronunciation model, and a tree-structured search network. The recognizer's performance was evaluated under three different pronunciation models. The results showed that the character error rate with integrated pronunciation models was lower than that with pronunciation models built using either the knowledge-based or the data-driven approach alone. The relative frequency ratio was also used as a measure to choose the best number of pronunciation variations for each Chinese character. Finally, the best character error rates on the Mandarin and Taiwanese test sets were 16.2% and 15.0%, respectively, obtained when the average number of pronunciations per Chinese character was 3.9.
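The abstract does not define the relative frequency ratio, so the following is only a minimal sketch of how such a measure might be used to prune pronunciation variants. It assumes the ratio is a variant's corpus count divided by the count of the most frequent variant of the same character. The function name `select_pronunciations`, the threshold `min_ratio`, and the romanized pronunciations and counts are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_pronunciations(counts, min_ratio=0.1):
    """Return the pronunciation variants to keep for one character.

    Assumption (not from the paper): a variant's relative frequency ratio
    is its count divided by the count of the most frequent variant.
    Variants whose ratio falls below min_ratio are dropped.
    """
    if not counts:
        return []
    top = max(counts.values())
    kept = [(p, c / top) for p, c in counts.items() if c / top >= min_ratio]
    # Most frequent first; ties broken alphabetically for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [p for p, _ in kept]


# Made-up counts for a single character observed in a bi-lingual corpus.
observed = Counter({"ren2": 80, "lang5": 60, "jin5": 15, "len2": 2})
selected = select_pronunciations(observed, min_ratio=0.1)
```

Raising `min_ratio` shrinks the lexicon and lowering it admits rarer variants, so under these assumptions the threshold controls the average number of pronunciations per character.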

Keywords

Bi-lingual; One-pass ASR; Pronunciation Modeling


