Using Heteroscedastic Linear Discriminant Analysis on Task-Specific Corpora to Improve Task-Specific Speech Command Recognition

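This thesis focuses on selecting task-specific data from an existing corpus and, together with existing HMMs, performing discriminative feature transformation to improve English speech recognition for a specific application. The feature transformation is implemented with heteroscedastic linear discriminant analysis (HLDA), and the work has two parts: (1) discriminative feature transformation using task-specific data, and (2) feature merging. The first part comprises two methods: discriminative feature transformation using a small amount of task-specific data, and discriminative feature transformation using data selected for the task. The first method assumes that task-specific data is scarce and uses the existing HMMs with that small amount of data to perform the transformation; the second selects task-relevant data from the existing corpus and performs the transformation with the existing HMMs. The two differ in whether task-specific data is used. Feature merging concatenates frame features to enrich the temporal information carried by each frame, then applies feature dimensionality reduction before training HMMs, to improve the English speech recognition system. To evaluate the proposed methods, we use whole-sentence recognition as the measure. Experiments show that discriminative feature transformation with HLDA yields better recognition performance, and that feature merging also outperforms the baseline acoustic models. Combining the two methods gives a recognition rate of 97.49%, the best result in this thesis.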

Keywords

Mel-frequency cepstral coefficients; heteroscedastic linear discriminant analysis; hidden Markov model; speech recognition

Parallel Abstract

This research focuses on selecting task-specific training data from an existing corpus and using it, together with existing hidden Markov models (HMMs), to perform a discriminative feature transformation that improves task-specific English speech recognition. The thesis has two parts: the first is discriminative feature transformation on task-specific data using heteroscedastic linear discriminant analysis (HLDA); the second is feature merging. The first part uses two methods. The first method assumes that task-specific training data is limited and uses the existing HMMs to perform the discriminative feature transformation with that small amount of data. The second method selects a task-relevant subset of the existing training corpus and performs the discriminative feature transformation with the existing HMMs. The difference between the two methods lies in whether a task-specific corpus is used. The second part of the thesis, feature merging, enriches the contextual information of each frame in the time domain by concatenating the features of neighboring frames. HMMs are then trained with different feature extraction techniques to improve the English speech recognition system. To evaluate the proposed methods, the thesis uses the sentence recognition rate as the performance measure. The experimental results show that discriminative feature transformation using HLDA improves recognition performance, and that feature merging also outperforms the baseline acoustic HMMs. Combining the two methods achieves a recognition rate of 97.49%, the best result in this research.
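The feature-merging step described above (concatenating neighboring frames, then reducing the dimension) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the thesis's implementation: the function names `splice_frames` and `pca_projection`, the 13-dimensional input, the context of 4 frames, and the 39-dimensional output are hypothetical, and plain PCA stands in for the dimensionality reduction because the abstract does not state the exact procedure (it is not HLDA).

```python
import numpy as np


def splice_frames(features: np.ndarray, context: int) -> np.ndarray:
    """Concatenate each frame with `context` neighbours on each side.

    features: (num_frames, dim) array, e.g. one MFCC vector per frame.
    Edge frames are padded by repeating the first/last frame.
    Returns an array of shape (num_frames, dim * (2 * context + 1)).
    """
    num_frames, _ = features.shape
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    # Column block k holds the frame at offset (k - context) from the centre.
    blocks = [padded[k:k + num_frames] for k in range(2 * context + 1)]
    return np.hstack(blocks)


def pca_projection(features: np.ndarray, out_dim: int):
    """Return (mean, basis) for a PCA projection to `out_dim` dimensions.

    PCA is a stand-in here; it is NOT the HLDA transform used in the thesis.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:out_dim]     # keep the largest ones
    return mean, eigvecs[:, order]


# Hypothetical data: 100 frames of 13-dimensional features.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))

spliced = splice_frames(mfcc, context=4)            # shape (100, 117)
mean, basis = pca_projection(spliced, out_dim=39)
reduced = (spliced - mean) @ basis                  # shape (100, 39)
```

In a real system the projection would be estimated on training data only and then applied unchanged to test utterances before HMM training and decoding.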

Parallel Keywords

Mel-frequency cepstral coefficients; heteroscedastic linear discriminant analysis; hidden Markov models; speech recognition

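Cited By

劉承泰 (2013). 嵌入式語音命令系統的設計與改進 [Design and Improvement of an Embedded Speech Command System] (Master's thesis, National Tsing Hua University). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2511201311364897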
