A Study on Estimating the Speech Recognition Rate from Speakers' Pronunciation Characteristics

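In this study, we found that speech recognition performance differs markedly from speaker to speaker, even among speakers included in the training corpus of the automatic speech recognizer. Because the speech recognition rate is tied to the speaker in this way, this paper attempts to estimate a test speaker's speech recognition rate from that speaker's pronunciation characteristics. In our experiments, we use speaker recognition techniques to estimate speech recognition accuracy: the speaker recognition scores are converted into weights, and the test speaker's speech recognition accuracy is estimated as a weighted sum. The experiments show that, when the test speaker is a known speaker in the training corpus and techniques commonly used in speaker identification are added, in particular the universal background model (UBM), the error in the estimated speech recognition accuracy is 6.37%, 4.57%, and 4.44%, depending on the number of words in the test utterance.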

Keywords

Speech recognition rate; speaker recognition; speaker verification; background model

Parallel Abstract

In this study, we show that the performance of automatic speech recognition (ASR) is inherently speaker-dependent, and that this holds even for speakers in the ASR training set. Exploiting this dependency, we propose a method for estimating the speech recognition rate from the attributes of speakers. We estimate the speech recognition rate using the results of speaker recognition technology: the log-likelihood scores derived from speaker recognition are converted into weights, and the estimate is obtained as the weighted sum of the speech recognition rates of the speakers in the training set. In our experiments, we found that the speech recognition rate of speakers who are known in the training set can be estimated well. When we applied a technique from speaker verification, the universal background model (UBM), we obtained estimation errors of 6.37%, 4.57%, and 4.44% for test utterances of 2, 3, and 4 words, respectively.
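The weighted-sum estimate described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how log-likelihood scores are mapped to weights, so the softmax mapping, the `temperature` parameter, and the function names `scores_to_weights` and `estimate_recognition_rate` are all assumptions made here. UBM-based score normalization is left out, because subtracting one constant from every score would cancel inside a softmax.

```python
import math


def scores_to_weights(log_likelihoods, temperature=1.0):
    """Map per-speaker log-likelihood scores to weights that sum to 1.

    Assumption: a softmax mapping. The abstract only says that scores
    are "converted into weights" without giving the formula.
    """
    scaled = [score / temperature for score in log_likelihoods]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_recognition_rate(log_likelihoods, train_rates, temperature=1.0):
    """Estimate a test speaker's recognition rate as a weighted sum of
    the known recognition rates of the training speakers."""
    if len(log_likelihoods) != len(train_rates):
        raise ValueError("need one recognition rate per training speaker")
    weights = scores_to_weights(log_likelihoods, temperature)
    return sum(w * r for w, r in zip(weights, train_rates))


if __name__ == "__main__":
    # Hypothetical numbers: the test utterance scores highest against
    # the first training speaker's model.
    scores = [-10.0, -12.0, -15.0]
    rates = [0.90, 0.70, 0.60]
    print(estimate_recognition_rate(scores, rates))
```

With this mapping, training speakers whose models score higher on the test utterance contribute more to the estimate, so the result is pulled toward the recognition rates of the most similar speakers.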

Parallel Keywords

Speech Recognition Rate; Speaker Recognition; Speaker Verification; Universal Background Model; UBM
