
基於分類錯誤之線性鑑別式特徵轉換應用於大詞彙連續語音辨識

Classification Error-based Linear Discriminative Feature Transformation for Large Vocabulary Continuous Speech Recognition

Advisor: 陳柏琳

Abstract


Linear discriminant analysis (LDA) seeks a linear transformation that projects the original data onto a lower-dimensional feature space while preserving the geometric separability between classes. However, LDA cannot always guarantee higher classification accuracy. One possible reason is that its objective function is not directly tied to the classification error rate, so it does not necessarily suit the decision rule governed by a particular classifier; automatic speech recognition (ASR) is a good example. In this thesis, we extend conventional LDA by exploring the relationship between the empirical classification error rate and the Mahalanobis distance for every pair of easily confused phone classes, and we modify the between-class scatter matrix so that it is estimated from the pairwise empirical classification accuracy of each class pair rather than from the Euclidean distance between the class means. The new method retains the lightweight solvability of LDA while requiring no assumption about the underlying data distribution. In addition, we propose a novel linear discriminative feature extraction method, called generalized likelihood ratio discriminant analysis (GLRDA), which seeks a lower-dimensional feature space based on the concept of the likelihood ratio test. GLRDA not only accounts for the heteroscedasticity of the data, that is, the covariance matrices of the classes may be treated as different, but also obtains a lower-dimensional feature subspace that benefits classification by minimizing the probability of the most confusable situation among classes, as described by the null hypothesis. We further show that LDA and heteroscedastic linear discriminant analysis (HLDA) can be regarded as two special cases of GLRDA. Moreover, to improve the robustness of the speech features, GLRDA can be combined with the empirical confusion information provided by the recognizer. Experimental results show that, on a Mandarin Chinese large vocabulary continuous speech recognition task, the proposed methods outperform LDA and other existing extensions such as HLDA.
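To make the modified criterion concrete, the following is a minimal sketch in standard LDA notation; the symbols $W$, $S_w$, $S_b$, $\mu_i$, $p_i$ and the weight $w_{ij}$ are illustrative, and the exact weighting function used in the thesis is not reproduced here. Classical LDA maximizes

\[
\mathcal{F}(W) \;=\; \operatorname{tr}\!\bigl((W^{\top} S_w W)^{-1}\, W^{\top} S_b W\bigr),
\qquad
S_b \;=\; \sum_{i<j} p_i\, p_j\, (\mu_i - \mu_j)(\mu_i - \mu_j)^{\top},
\]

where the pairwise form of $S_b$ shows that each class pair contributes according to the prior-weighted squared Euclidean distance between its means. The modification described above would instead weight each pair by a term $w_{ij}$ derived from its empirical classification error or accuracy, in line with the error rate's relationship to the pairwise Mahalanobis distance:

\[
\tilde{S}_b \;=\; \sum_{i<j} p_i\, p_j\, w_{ij}\, (\mu_i - \mu_j)(\mu_i - \mu_j)^{\top}.
\]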

Parallel Abstract (English)


The goal of linear discriminant analysis (LDA) is to seek a linear transformation that projects an original data set into a lower-dimensional feature subspace while retaining geometric class separability. However, LDA cannot always guarantee higher classification accuracy. One possible reason is that its criterion is not directly associated with the classification error rate, so it does not necessarily suit the allocation rule imposed by a given classifier, such as the one employed in automatic speech recognition (ASR). In this thesis, we extend classical LDA by leveraging the relationship between the empirical phone classification error rate and the Mahalanobis distance for each phone class pair. To this end, we modify the original between-class scatter from a measure of the Euclidean distance to the pairwise empirical classification accuracy for each class pair, while preserving LDA's lightweight solvability and making no assumption about the underlying class distributions. Furthermore, we present a novel discriminative linear feature transformation, named generalized likelihood ratio discriminant analysis (GLRDA), based on the likelihood ratio test (LRT). It seeks a lower-dimensional feature subspace by making the most confusing situation, described by the null hypothesis, as unlikely to happen as possible, without the homoscedasticity assumption on class distributions. We also show that classical LDA and its well-known extension, heteroscedastic linear discriminant analysis (HLDA), are two special cases of the proposed method. The empirical class confusion information can be further incorporated into GLRDA for better recognition performance. Experimental results demonstrate that our approaches yield moderate improvements over LDA and other existing methods, such as HLDA, on a Chinese large vocabulary continuous speech recognition (LVCSR) task.
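For reference, the sketch below shows the classical LDA baseline that both proposed transforms extend and are compared against, implemented with NumPy/SciPy; the function and variable names are illustrative, and this is not the thesis implementation. It estimates the within- and between-class scatter matrices from labeled feature vectors and keeps the leading generalized eigenvectors as the projection.

    import numpy as np
    from scipy.linalg import eigh

    def lda_transform(features, labels, n_dims):
        """Estimate a classical LDA projection matrix with n_dims columns."""
        classes = np.unique(labels)
        d = features.shape[1]
        global_mean = features.mean(axis=0)
        S_w = np.zeros((d, d))  # within-class scatter
        S_b = np.zeros((d, d))  # between-class scatter
        for c in classes:
            X_c = features[labels == c]
            mean_c = X_c.mean(axis=0)
            centered = X_c - mean_c
            S_w += centered.T @ centered
            diff = (mean_c - global_mean)[:, None]
            S_b += X_c.shape[0] * (diff @ diff.T)
        # Generalized eigenproblem S_b v = lambda * S_w v; eigh returns the
        # eigenvalues in ascending order, so keep the last n_dims eigenvectors.
        _, eigvecs = eigh(S_b, S_w)
        return eigvecs[:, -n_dims:][:, ::-1]

    # Usage sketch: project 39-dimensional acoustic feature vectors onto a
    # 32-dimensional subspace (the dimensions here are illustrative).
    # W = lda_transform(X, y, 32)
    # X_low = X @ W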


