A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features

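This thesis studies feature extraction and feature-parameter compensation in speech recognition systems. The feature-extraction work combines different speech features; among the combinations tested, joining linear prediction cepstral coefficients (LPCC) with Mel-frequency cepstral coefficients (MFCC) performs best. The MFCCs are computed with a Gaussian-shaped Mel filter bank in place of the triangular filter bank of standard MFCC. Experiments confirm that combining LPCC and MFCC at a 1:1 ratio gives the best results. Beyond combining features, the thesis applies cepstral mean and variance normalization (CMVN) to compensate the cepstral coefficients and improve the recognition performance of the overall system.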
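The Gaussian-shaped Mel filter bank mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not give the filter parameters, so the center placement (equally spaced on the Mel scale), the width rule (`sigma_scale` times the center spacing), the Hz-to-Mel formula, and all function and parameter names are assumptions.

```python
import math

def hz_to_mel(f_hz):
    # A commonly used Hz-to-Mel mapping; the thesis may use a different variant.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def gaussian_mel_filterbank(n_filters, n_fft, sample_rate, sigma_scale=0.5):
    """Return n_filters rows of Gaussian weights over the n_fft // 2 + 1 FFT bins.

    Assumptions (not from the thesis): centers are equally spaced on the
    Mel scale between 0 and the Nyquist frequency, and every filter has
    standard deviation sigma_scale * (spacing between adjacent centers).
    """
    mel_max = hz_to_mel(sample_rate / 2.0)
    step = mel_max / (n_filters + 1)
    centers = [step * (k + 1) for k in range(n_filters)]
    sigma = sigma_scale * step
    n_bins = n_fft // 2 + 1
    # Mel value of each FFT bin's center frequency.
    bin_mels = [hz_to_mel(i * sample_rate / n_fft) for i in range(n_bins)]
    return [
        [math.exp(-0.5 * ((m - c) / sigma) ** 2) for m in bin_mels]
        for c in centers
    ]
```

Each row would be multiplied with a frame's power spectrum and summed to give one filter-bank energy, taking the place of the triangular weights in a standard MFCC pipeline.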

Keywords

speech recognition; feature combination; Mel-frequency cepstral coefficients; keyword spotting

Parallel Abstract

This thesis studies speech feature extraction and feature compensation in speech recognition. Several speech features are selected for combination. The best combination concatenates Linear Prediction Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC). The MFCCs used here are obtained with a Gaussian Mel-frequency filter bank instead of a triangular filter bank. Experiments show that the best combination ratio of LPCC to MFCC is 1:1. The thesis also shows that performance improves further when Cepstral Mean and Variance Normalization (CMVN) is added.
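The feature combination and CMVN steps can be sketched as below. This is an illustrative sketch only: it assumes that "1:1" means an equal number of LPCC and MFCC coefficients per frame, and that CMVN statistics are computed per utterance. Neither detail is stated in the abstract, and the function names are hypothetical.

```python
from math import sqrt

def combine_features(lpcc_frames, mfcc_frames):
    """Concatenate LPCC and MFCC vectors frame by frame.

    Assumption: a 1:1 ratio means both inputs carry the same number of
    coefficients per frame; this function itself only requires equal
    frame counts.
    """
    if len(lpcc_frames) != len(mfcc_frames):
        raise ValueError("LPCC and MFCC must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(lpcc_frames, mfcc_frames)]

def cmvn(frames, eps=1e-10):
    """Normalize each feature dimension to zero mean and unit variance.

    Statistics are taken over all frames passed in (assumed here to be
    one utterance); eps guards against division by zero for a constant
    dimension.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]
```

A typical use would be `cmvn(combine_features(lpcc, mfcc))`, so that the normalized vectors fed to the recognizer have comparable scale across the LPCC and MFCC dimensions.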

Parallel Keywords

speech recognition; feature combination; MFCC; keyword spotting
