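The next generation of speech recognition technology is expected to combine knowledge-based approaches with data-driven models, building recognition systems that incorporate phonetic and linguistic knowledge and that use multiple sets of feature parameters, rather than a single set, as the acoustic features for recognition. Beyond the conventional spectral parameters, multiple sets of speech features can be extracted to detect speech events such as manner of articulation, place of articulation, and the shape of the oral cavity.

Based on this concept, this thesis works on continuous Mandarin speech: it first detects silence, sonorants, and obstruents, and then studies the classification and recognition of the obstruents. Mandarin has 18 obstruent phonemes, which can be divided by manner of articulation into three types: stops, fricatives, and affricates. Stops and affricates are each further divided into unaspirated and aspirated sounds, according to whether air is released during articulation, and each type contains six phonemes at three places of articulation. Fricatives are divided into voiceless and voiced sounds, according to whether the vocal folds vibrate, and comprise six phonemes at five places of articulation. This thesis investigates feature parameters that can distinguish manner and place of articulation, uses labeled training data to compute the distribution of each feature, and from these distributions determines the thresholds used for speech event detection and recognition.

The speech features in this thesis are derived from an auditory model, which simulates how the human auditory nerve processes sound signals. The thesis adopts the Seneff auditory model as the front-end processor and computes the features from two of its outputs: the envelope spectrum and the synchrony spectrum.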
This thesis presents a study of acoustic-phonetic features for obstruent detection and classification, based on knowledge of Mandarin speech. The Seneff auditory model is used as the front-end processor for extracting the acoustic-phonetic features. These features carry rich information and are used in a hierarchical decision process to detect and classify Mandarin obstruents. Preliminary experiments show that the accuracy of obstruent detection is about 84%. An algorithm based on feature distributions is then applied to classify the obstruents into stops, fricatives, and affricates, with an average accuracy of about 80%. The proposed distribution-based approach is simple and effective, and it could be a promising method for finding acoustic-phonetic features for phone recognition in continuous speech recognition.
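The idea of deriving a decision threshold from the distribution of a feature over labeled training data can be illustrated with a minimal sketch. The abstract does not state how the thresholds are chosen, so the minimum-training-error rule below is an assumption, and the function names, class labels, and toy feature values are hypothetical and do not come from the thesis.

```python
from typing import Sequence, Tuple


def choose_threshold(low_class: Sequence[float],
                     high_class: Sequence[float]) -> Tuple[float, float]:
    """Pick a threshold on one feature that best separates two labeled classes.

    Values greater than the threshold are assigned to the "high" class.
    Assumption: the threshold with the highest training accuracy is chosen.
    Returns (threshold, training_accuracy).
    """
    values = sorted(set(low_class) | set(high_class))
    # Candidate thresholds are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct feature values")

    total = len(low_class) + len(high_class)
    best_threshold, best_accuracy = candidates[0], -1.0
    for t in candidates:
        correct = (sum(v <= t for v in low_class)
                   + sum(v > t for v in high_class))
        accuracy = correct / total
        if accuracy > best_accuracy:  # keep the first best candidate
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


def detect(feature_value: float, threshold: float) -> str:
    """Apply one threshold decision, i.e. one node of a hierarchical process."""
    return "high" if feature_value > threshold else "low"


# Made-up values of one hypothetical feature for two labeled classes.
obstruent_frames = [0.125, 0.25, 0.5]
sonorant_frames = [0.75, 1.0, 1.5]
threshold, accuracy = choose_threshold(obstruent_frames, sonorant_frames)
```

In a hierarchical scheme of the kind described, one such threshold decision per feature could be chained, for example first separating silence from speech, then sonorants from obstruents, and then the obstruent types; the order and the features used here are illustrative only.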