The purpose of this research is to improve the performance of our lab's embedded voice command system. The main goal is to speed up processing while limiting the additional recognition errors that the speed-up introduces. To this end, we propose two methods for reducing the feature dimension that the system must process. The first directly reduces the dimension of the original 39-dimensional Mel-frequency cepstral coefficient (MFCC) features. The second concatenates feature vectors, which increases the dimension, and then applies heteroscedastic linear discriminant analysis (HLDA) to reduce it. The floating-point transform matrix is then converted to a fixed-point (integer) version by multiplying it by a scale factor, and it is stored in the system so that features can be transformed at runtime. Building on the second method, we run further experiments to compare recognition rates under different settings. The experimental results show that HLDA-based reduction not only speeds up processing but also effectively lowers the error rate. Its overall recognition rate is higher than that of direct dimension reduction, and under some conditions it even exceeds that of the original 39-dimensional features. However, recognition performance varies with the number of mixture components in the acoustic models and with the way the analysis is carried out. The HLDA configuration can therefore be chosen to suit the requirements at hand in order to obtain the best performance.
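
The fixed-point conversion of the transform matrix can be illustrated with a minimal sketch. It is not the system's actual implementation: the scale factor of 2^10, the toy 2x4 matrix (standing in for a trained HLDA projection), the integer input features, and all function names are assumptions made only for illustration.

```python
# Illustrative sketch only: scale factor, matrix values, and names are assumed.
SCALE_BITS = 10              # assumed scale factor of 2**10 = 1024
SCALE = 1 << SCALE_BITS


def quantize_matrix(matrix, scale=SCALE):
    """Multiply each float coefficient by the scale factor and round to an integer."""
    return [[int(round(a * scale)) for a in row] for row in matrix]


def transform_fixed(int_matrix, int_features, scale_bits=SCALE_BITS):
    """Project an integer feature vector using integer arithmetic only."""
    out = []
    for row in int_matrix:
        acc = 0
        for a, x in zip(row, int_features):
            acc += a * x
        # Add half the scale before shifting so the result is rounded, not floored.
        out.append((acc + (1 << (scale_bits - 1))) >> scale_bits)
    return out


def transform_float(matrix, features):
    """Floating-point reference projection, used here only for comparison."""
    return [sum(a * x for a, x in zip(row, features)) for row in matrix]


# Toy 2x4 matrix (made-up values) reducing a 4-dimensional vector to 2 dimensions.
A = [[0.50, -0.25, 0.125, 1.0],
     [-1.5, 0.75, 0.0, 0.3]]
x = [120, -64, 32, 8]        # assumed integer (fixed-point) feature vector

A_int = quantize_matrix(A)
y_fixed = transform_fixed(A_int, x)   # [88, -226]
y_float = transform_float(A, x)       # approximately [88.0, -225.6]
```

In this sketch a power-of-two scale factor lets the final division be a bit shift, which is cheap on embedded processors. A larger scale factor reduces rounding error in the coefficients but needs a wider accumulator.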