Mandarin Speech Recognition Using Tri-Phones as Units

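This paper uses an acoustic model built on tri-phone units in place of the conventional acoustic model built on the 411 Mandarin syllables, with the aim of improving recognition accuracy for large-vocabulary speech recognition. With cross-syllable tri-phone models, however, the number of models becomes so large that the training corpus is insufficient, which gives rise to the problem of unseen tri-phone models, that is, models that never occur in the training data. To address this problem, we enlarge the training corpus with Mandarin speech data from mainland China and apply decision-tree-based state tying (state-level parameter sharing), so that the tri-phone models achieve better recognition results.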

Keywords

Speech recognition; tri-phone; decision tree; state tying

Parallel Abstract

In this paper, a Mandarin speech recognition system based on tri-phone models is constructed. Several practical problems arise, however, when tri-phone models are applied in a speech recognition system. First, many tri-phone models occur only a few times in the training data, so there is not enough data for robust parameter estimation of these rarely seen models. Second, a large number of tri-phone models are missing from the training corpus altogether; such unseen tri-phone models are unavoidable when building cross-word tri-phone systems. We use decision trees and additional training data to address these problems.
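
The two data-sparsity problems named in the abstract (rarely seen and unseen tri-phones) can be illustrated with a minimal sketch. It is not the authors' implementation: the phone labels, the `sil` boundary padding, the `left-centre+right` naming convention, and the helper names `to_triphones` and `coverage` are all assumptions made for illustration.

```python
from collections import Counter


def to_triphones(phones):
    """Expand a phone sequence into context-dependent tri-phone labels.

    Labels use a hypothetical 'left-centre+right' convention, and the
    utterance is padded with an assumed silence phone 'sil' at both ends.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def coverage(train_utts, test_utts, min_count=1):
    """Split the tri-phones needed by test_utts into unseen and rare sets.

    unseen: tri-phones with zero occurrences in the training utterances.
    rare:   tri-phones seen at least once but fewer than min_count times.
    """
    counts = Counter(t for utt in train_utts for t in to_triphones(utt))
    needed = {t for utt in test_utts for t in to_triphones(utt)}
    unseen = sorted(t for t in needed if counts[t] == 0)
    rare = sorted(t for t in needed if 0 < counts[t] < min_count)
    return unseen, rare


if __name__ == "__main__":
    # Toy phone sequences; not taken from any real corpus or phone set.
    train = [["n", "i", "h", "ao"]]
    test = [["n", "i", "m", "en"]]
    unseen, rare = coverage(train, test, min_count=2)
    print("unseen:", unseen)
    print("rare:", rare)
```

In a sketch like this, the `unseen` list corresponds to the models that decision-tree state tying would need to synthesize from shared states, while the `rare` list corresponds to the models whose parameters cannot be estimated robustly without either tying or more training data.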

Parallel Keywords

Speech Recognition; Tri-Phone; Decision Tree; State-Tying
