This thesis improves task-specific English speech recognition by selecting task-relevant training data from an existing corpus and using it, together with existing hidden Markov models (HMMs), to perform discriminative feature transformation. The feature transformation is implemented with heteroscedastic linear discriminant analysis (HLDA). The thesis has two parts: discriminative feature transformation with task-specific data, and feature mergence. The first part comprises two methods. The first method assumes that only a small amount of task-specific data is available and uses that data with the existing HMMs to perform the transformation. The second method instead selects a task-relevant subset of the existing training corpus and uses it with the existing HMMs. The two methods differ in whether task-specific data is used. The second part, feature mergence, concatenates the features of neighboring frames to enrich the temporal context carried by each frame, and then applies feature dimensionality reduction before training the HMMs. The proposed methods are evaluated by sentence recognition rate. The experiments show that discriminative feature transformation using HLDA yields better recognition performance, and that feature mergence also outperforms the baseline acoustic models. Combining the two methods achieves a recognition rate of 97.49%, the best result in this thesis.
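As a rough illustration of the feature mergence idea, the sketch below concatenates each frame with its neighbors and then applies a linear projection to reduce the dimension. It is a minimal sketch under stated assumptions, not the implementation used in this thesis: the names `splice_frames` and `project`, the context width, the edge padding by frame repetition, and the toy projection matrix are all hypothetical. In the thesis the projection would be estimated (for example with HLDA), whereas here it is a fixed matrix chosen only to show the shapes involved.

```python
from typing import List

Frame = List[float]


def splice_frames(frames: List[Frame], context: int) -> List[Frame]:
    """Concatenate each frame with `context` neighbours on each side.

    Edge frames are padded by repeating the first or last frame, so the
    output has the same number of frames as the input (an assumption;
    other padding schemes are possible).
    """
    if context < 0:
        raise ValueError("context must be non-negative")
    if not frames:
        return []
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        merged: Frame = []
        for offset in range(-context, context + 1):
            # Clamp the index so that edges reuse the boundary frame.
            idx = min(max(t + offset, 0), last)
            merged.extend(frames[idx])
        spliced.append(merged)
    return spliced


def project(frames: List[Frame], matrix: List[List[float]]) -> List[Frame]:
    """Apply a linear transform; each row of `matrix` gives one output dimension."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in matrix]
            for frame in frames]


# Toy example: 4 frames of 2-dimensional features, context of 1 frame.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
merged = splice_frames(toy, context=1)  # each frame is now 6-dimensional

# Hypothetical 2x6 projection that simply keeps the centre frame.
keep_centre = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
reduced = project(merged, keep_centre)  # back to 2 dimensions per frame
```

With a context of one frame, a 2-dimensional feature becomes 6-dimensional after splicing, and the projection maps it back to a lower dimension. A learned transform would replace `keep_centre` so that the reduced features retain the discriminative information from the neighboring frames.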