A Study of Feature Parameters for Speaker Age-Group Classification

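In automatic spoken dialogue systems, detecting the user's age group from the speech signal provides important reference information for intelligent responses. Moreover, because the acoustic models of most speech recognition systems are trained on speech from young and middle-aged adult speakers, recognition accuracy drops when the user is a child or an elderly person, owing to acoustic model mismatch. If the user's age is known before recognition, loading a suitable acoustic model or applying speaker adaptation can effectively reduce the recognition errors caused by this mismatch. In this paper, the feature parameters are taken from Mel-frequency cepstral coefficients (MFCC), delta Mel-frequency cepstral coefficients (ΔMFCC), and formant bandwidths, and Gaussian mixture models are used to build statistical models of two age groups: young and middle-aged adults, and the elderly. Experimental results show that using the proposed feature parameters as the basis for age-group classification yields good classification results.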

Keywords

Speaker age-group classification; Gaussian mixture model; Mel-frequency cepstral coefficients; formant bandwidth

English Abstract

Speaker age classification can provide crucial information to an intelligent automatic speech server when it interacts with users. In addition, the acoustic models of a conventional automatic speech recognizer are mostly trained on speech from adult speakers. For children and elderly users, this acoustic model mismatch degrades speech recognition performance. If a speaker's age is known in advance, the degradation can be reduced by loading a suitable acoustic model on the fly or by applying speaker adaptation techniques. In this paper, Mel-frequency cepstral coefficients (MFCC), delta Mel-frequency cepstral coefficients (ΔMFCC), and formant bandwidths are used as feature parameters. Gaussian mixture models are then employed to model the feature vectors extracted from adult and elderly speakers. Experimental results show that the proposed method classifies speakers by age group successfully.
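The decision step described in the abstract, scoring an utterance against one Gaussian mixture model per age group and choosing the better-scoring group, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes diagonal covariances and a maximum-likelihood decision over all frames, and the function names, the two-dimensional feature vectors, and the values in `TOY_MODELS` are hypothetical stand-ins for trained models over MFCC, ΔMFCC, and formant-bandwidth features.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a sequence of frames under one GMM.

    `gmm` is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total


def classify_age_group(frames, models):
    """Return the label whose GMM gives the highest log-likelihood, plus all scores."""
    scores = {label: gmm_log_likelihood(frames, gmm) for label, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models over 2-D features (assumption: real models would be
# trained with EM on much higher-dimensional feature vectors).
TOY_MODELS = {
    "adult": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 1.0], [1.0, 1.0])],
    "elderly": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 3.0], [1.0, 1.0])],
}

frames = [[0.2, -0.1], [0.9, 1.1], [0.1, 0.4]]  # made-up feature frames
label, scores = classify_age_group(frames, TOY_MODELS)
print(label)
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification in GMM-based speaker classification; whether the paper uses this exact decision rule is not stated in the abstract.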

English Keywords

Speaker's Age Classification; Gaussian Mixture Model; Mel-Frequency Cepstral Coefficient; MFCC; Formant Bandwidth

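Cited By

王麒讚 (2008). A real-time access control and monitoring system based on image and speech recognition techniques [Master's thesis, Kun Shan University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0025-1208200814315500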
