Design and Improvement of an Embedded Voice Command System

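The aim of this thesis is to improve the performance of our laboratory's embedded voice command system. The main focus is to speed up processing while limiting the additional errors that the speed-up introduces. The general approach is to reduce the feature dimension the system must process, and two methods are proposed. The first directly reduces the dimension of the 39-dimensional Mel-frequency cepstral coefficients. The second concatenates features and then applies heteroscedastic linear discriminant analysis to reduce the dimension; the transform matrix is converted to integers by a scale factor and embedded in the system so that features are transformed in real time. We also run further experiments based on the second method, comparing recognition rates under different settings. The experimental results show that dimension reduction with heteroscedastic linear discriminant analysis not only speeds up processing but also effectively lowers the error rate. Its overall recognition rate is better than that of direct dimension reduction and, under some conditions, even exceeds that of the original 39-dimensional features. However, recognition performance varies with the number of mixture components in the acoustic model and with how the analysis is carried out. We can therefore choose the most suitable way of performing heteroscedastic linear discriminant analysis for a given requirement to obtain the best result.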

Keywords

Mel-frequency cepstral coefficients; heteroscedastic linear discriminant analysis; speech recognition

Parallel Abstract

The purpose of this research is to improve the performance of our lab's embedded voice command system. The goal is to shorten processing time while reducing the additional errors that the speed-up introduces. To do so, we propose two methods for reducing the feature dimension required by the system. The first directly reduces the original 39 dimensions. The second applies heteroscedastic linear discriminant analysis after increasing the dimension of the original feature vectors. We then convert the floating-point transform matrix to a fixed-point version through a scale factor and store it in the system so that features can be transformed at runtime. Based on the second method, different parameter settings are tested. The final experimental results show that the second method (heteroscedastic linear discriminant analysis) outperforms the first method (direct feature reduction). In some cases the second method even performs better than the original system with 39-dimensional features. This result indicates that heteroscedastic linear discriminant analysis can effectively shorten recognition time while also reducing the error rate. However, the experimental results also show that performance changes with the number of mixture components in the acoustic models and with the analysis method. We can therefore choose the most suitable way to perform the analysis for the best performance.
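The fixed-point conversion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the scale factor, word size, or rounding rule actually used, so the value `SCALE_BITS = 12`, the helper names `quantize_matrix` and `transform_fixed`, and the toy 2x3 matrix are all assumptions made for illustration only.

```python
# Hypothetical sketch of a fixed-point feature transform.
# SCALE_BITS, quantize_matrix and transform_fixed are illustrative names,
# not taken from the thesis.

SCALE_BITS = 12  # assumed scale factor of 2**12


def quantize_matrix(matrix, scale_bits=SCALE_BITS):
    """Multiply each coefficient by 2**scale_bits and round to an integer."""
    scale = 1 << scale_bits
    return [[int(round(a * scale)) for a in row] for row in matrix]


def transform_fixed(int_matrix, int_features, scale_bits=SCALE_BITS):
    """Apply the integer matrix to an integer feature vector.

    Products are accumulated at full precision, then the scale factor is
    removed with a rounding right shift.
    """
    half = 1 << (scale_bits - 1)
    out = []
    for row in int_matrix:
        acc = sum(a * x for a, x in zip(row, int_features))
        out.append((acc + half) >> scale_bits)
    return out


# Toy 2x3 transform (reduces 3 dimensions to 2); the values are made up.
A = [[0.5, -0.25, 0.125],
     [1.0, 0.0, -0.5]]
A_fixed = quantize_matrix(A)
x = [100, 200, -40]
y = transform_fixed(A_fixed, x)
print(y)  # [-5, 120]
```

Because only integer multiplies, adds, and a shift occur at runtime, a transform of this shape suits processors without floating-point hardware. A larger scale factor preserves more precision but raises the risk of overflow in a fixed-width accumulator, which Python's unbounded integers hide in this sketch.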

Parallel Keywords

Mel-frequency cepstral coefficients; heteroscedastic linear discriminant analysis; speech recognition
