Multiband Approach to Robust Text-Independent Speaker Identification

This paper presents an effective method for improving the performance of a speaker identification system. Using the multiresolution property of the wavelet transform, we decompose the input speech signal into several frequency bands so that noise distortions are not spread over the entire feature space. To capture the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCCs) of each band are calculated. Cepstral mean normalization is then applied to all computed features so that the parameter statistics are similar across acoustic environments. To make effective use of these multiband speech features, we apply feature recombination and likelihood recombination methods to the task of text-independent speaker identification. The feature recombination scheme concatenates the cepstral coefficients of all bands into a single feature vector, which is used to train a Gaussian mixture model (GMM). The likelihood recombination scheme combines the likelihood scores of independent GMMs, one per band. Experimental results show that both proposed methods outperform GMMs trained on full-band LPCCs and on mel-frequency cepstral coefficients (MFCCs) when speaker identification is evaluated in both clean and noisy environments.
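The two recombination schemes can be illustrated with a minimal NumPy sketch. It assumes the per-band LPCC matrices (frames by coefficients) have already been computed; the wavelet decomposition and LPCC extraction are not shown. All function names are hypothetical. The abstract does not state how the per-band likelihood scores are combined, so summing the per-band log-likelihoods is an assumption made here for illustration, as is the use of diagonal covariances and already-trained GMM parameters.

```python
import numpy as np


def cepstral_mean_normalize(features):
    """Subtract the mean of each coefficient over all frames (rows)."""
    return features - features.mean(axis=0, keepdims=True)


def gmm_log_likelihood(features, weights, means, variances):
    """Total log-likelihood of all frames under a diagonal-covariance GMM.

    features: (T, D); weights: (M,); means and variances: (M, D).
    Diagonal covariances are an assumption of this sketch.
    """
    diff = features[:, None, :] - means[None, :, :]               # (T, M, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                             # (T, M)
    log_weighted = np.log(weights) + log_gauss
    # Log-sum-exp over mixture components, stabilised by the per-frame max.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.sum())


def feature_recombination(band_features):
    """Concatenate the per-band cepstra of each frame into one vector."""
    return np.concatenate(band_features, axis=1)


def likelihood_recombination(band_features, band_models):
    """Combine per-band GMM scores by summing log-likelihoods.

    The summation rule is an assumption; the source only says the
    per-band likelihood scores are combined.
    """
    return sum(
        gmm_log_likelihood(x, *model)
        for x, model in zip(band_features, band_models)
    )


def identify(speaker_scores):
    """Return the index of the speaker with the highest score."""
    return int(np.argmax(speaker_scores))
```

In use, each enrolled speaker would have either one GMM over the concatenated vectors (feature recombination) or one GMM per band (likelihood recombination); the test utterance is scored against every speaker and `identify` picks the highest score.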

Keywords

speaker identification; wavelet transform; linear predictive cepstral coefficient (LPCC); mel-frequency cepstral coefficient (MFCC); Gaussian mixture model (GMM)


