Recognition of Mandarin Monosyllables Using the Common Vector Approach and the Pitch Contour Method

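This thesis applies the common vector approach to the recognition of 1,391 Mandarin monosyllables. The common vector approach is a linear subspace classifier. For each Mandarin vowel class, the feature vectors of all training utterances are projected onto a single, unique common vector, which serves as the speech model for that class. Its advantage is that testing on the training utterances yields a 100% recognition rate. The approach is simple and easy to apply, and besides speech recognition it can also be used for face recognition. In this thesis, Mel-frequency cepstral coefficients (MFCCs) are used as the speech features, and the study is divided into two parts. The first part addresses Mandarin vowel and tone recognition. Vowels are recognized with the common vector approach, and the pitch contour method is added to match tones, in order to see whether this improves the vowel recognition rate. The second part addresses Mandarin monosyllable recognition. Each monosyllable is segmented into a consonant part and a vowel part, and a common vector model is built for each part. Different parameters are then varied to find which combination gives the best recognition rate, such as the consonant/vowel weight ratio and the number of samples used to construct the speech model. The methods were tested on a laboratory database of ten different speakers, containing a total of 139,100 Mandarin monosyllables. In the first part, the vowel recognition rate was 91.2% without tone distinction, 86.7% with tone distinction, and 82.8% when the pitch contour was added. In the second part, the best monosyllable recognition rate, 82%, was obtained with weights of (0.5, 0.5) and 4 samples. Finally, the thesis also examines vowel recognition in continuous speech.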
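The class model described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, following the usual formulation of the common vector approach:

- Each class has m training vectors of dimension n, with m < n.
- The differences from the first training vector span a "difference subspace".
- Removing each vector's projection onto that subspace leaves the same residual for every training vector of the class. That residual is the class's common vector.
- A test vector is assigned to the class whose common vector is closest to the test vector's own residual.

The 100% training recognition rate follows directly: every training vector's residual equals its class's common vector exactly, so its distance to its own class is zero.

The helper names (`train_class`, `classify`, and so on) and the toy 4-dimensional vectors are hypothetical. The abstract does not say how MFCC frames are arranged into one feature vector per utterance.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# (orthonormal basis of the difference subspace, common vector)
Model = Tuple[List[Vector], Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def subtract(u: Vector, v: Vector) -> Vector:
    return [a - b for a, b in zip(u, v)]


def gram_schmidt(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Orthonormal basis for the span of `vectors`; near-dependent vectors are dropped."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for z in basis:
            c = dot(w, z)
            w = [wi - c * zi for wi, zi in zip(w, z)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def remove_projection(x: Vector, basis: List[Vector]) -> Vector:
    """Component of x orthogonal to the span of an orthonormal basis."""
    residual = list(x)
    for z in basis:
        c = dot(x, z)
        residual = [ri - c * zi for ri, zi in zip(residual, z)]
    return residual


def train_class(samples: List[Vector]) -> Model:
    """Build one class model from m training vectors of dimension n (assumes m < n)."""
    first = samples[0]
    differences = [subtract(a, first) for a in samples[1:]]
    basis = gram_schmidt(differences)
    # Every training vector of this class leaves the same residual: the common vector.
    common = remove_projection(first, basis)
    return basis, common


def train(data: Dict[str, List[Vector]]) -> Dict[str, Model]:
    return {label: train_class(samples) for label, samples in data.items()}


def distance(x: Vector, model: Model) -> float:
    """Squared distance between x's residual and the class's common vector."""
    basis, common = model
    diff = subtract(remove_projection(x, basis), common)
    return dot(diff, diff)


def classify(x: Vector, models: Dict[str, Model]) -> str:
    """Pick the class whose common vector is closest to x's residual."""
    return min(models, key=lambda label: distance(x, models[label]))


if __name__ == "__main__":
    models = train({
        "a": [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]],
        "b": [[0, 0, 0, 2], [0, 0, 1, 2]],
    })
    print(classify([0.9, 0.5, 0.2, 0.1], models))  # prints "a"
```

Because the decision uses only the residual outside each class's difference subspace, the common vector approach needs no iterative training. This is one reason the abstract can describe it as simple and easy to apply.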

Keywords

Mel-frequency cepstral coefficients; feature extraction; common vector approach; pitch contour

Parallel Abstract

This thesis investigates the recognition of 1,391 Mandarin monosyllables in a speaker-dependent system, using the common vector approach. The method is simple and easy to apply, not only in speech recognition but also in face recognition. The common vector approach is a linear subspace classifier. For each class, it projects all training feature vectors onto a unique common vector, which serves as the model for that class. Its advantage is that it achieves a 100% recognition rate on the training dataset. In this work, Mel-frequency cepstral coefficients (MFCCs) are used as the recognition features. The first part of the work performs vowel and tone recognition, using the pitch contour method for tones. The second part performs monosyllable recognition. Each Mandarin monosyllable is divided into a consonant part and a vowel part, and a common vector model is constructed for each part. Different weights are then assigned to the consonant and vowel parts as recognition parameters, and the number of training samples is also treated as a parameter. The test corpus consists of recorded speech from ten speakers. In the first part, the vowel recognition rate is 91.2% without tone distinction, 86.7% with tone distinction, and 82.8% when the pitch contour method is used. In the second part, with consonant and vowel weights of (0.5, 0.5) and 4 training samples, the best monosyllable recognition rate reaches 82%. Finally, vowel recognition in connected speech is also investigated.
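In the monosyllable experiments, a consonant/vowel weight pair such as (0.5, 0.5) combines the scores of the two parts. This minimal sketch makes the following assumptions:

- Each part of the test syllable has already been scored against its own models, for example by a common-vector distance, where smaller is better.
- The two distance scales are comparable.
- Each candidate syllable's score is a linear weighted sum of its consonant and vowel distances.

The abstract does not state how the part scores are normalized or combined beyond the weights. The linear sum, the function name `rank_syllables`, and the toy syllable inventory and distances are therefore hypothetical.

```python
from typing import Dict, List, Tuple


def rank_syllables(
    consonant_dist: Dict[str, float],
    vowel_dist: Dict[str, float],
    syllables: Dict[str, Tuple[str, str]],
    weights: Tuple[float, float] = (0.5, 0.5),
) -> List[Tuple[str, float]]:
    """Rank candidate syllables by a weighted sum of part distances (smaller is better).

    consonant_dist / vowel_dist map each consonant or vowel class to the distance
    between the corresponding segment of the test utterance and that class model.
    syllables maps each syllable label to its (consonant, vowel) class pair.
    """
    w_consonant, w_vowel = weights
    scored = [
        (name, w_consonant * consonant_dist[cons] + w_vowel * vowel_dist[vow])
        for name, (cons, vow) in syllables.items()
    ]
    # Ascending order: the candidate with the smallest combined distance comes first.
    scored.sort(key=lambda item: item[1])
    return scored


if __name__ == "__main__":
    # Hypothetical part distances for one test utterance.
    consonant_dist = {"b": 0.2, "p": 0.6, "m": 1.0}
    vowel_dist = {"a": 0.3, "o": 0.1}
    syllables = {"ba": ("b", "a"), "bo": ("b", "o"), "pa": ("p", "a"),
                 "po": ("p", "o"), "ma": ("m", "a")}
    best, score = rank_syllables(consonant_dist, vowel_dist, syllables)[0]
    print(best, round(score, 2))  # prints "bo 0.15"
```

With (0.5, 0.5), both parts count equally. Shifting the weight toward one part makes the decision rely more on that segment's model. Varying the weight pair is how the thesis searches for the best-performing combination.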

Parallel Keywords

Mel-frequency cepstrum coefficient; feature extraction; common vector; pitch contour

