
情緒化語音訊號處理之管線化經驗模態分解技術

A Pipelining Empirical Mode Decomposition for Emotionalized Speech Signal Processing

Advisor: 周復華

Abstract


This thesis integrates a pipelining technique into the empirical mode decomposition (EMD) of nonlinear signals and applies it to the front-end processing of emotionalized spontaneous speech. The pipelined EMD effectively shortens the time needed to decompose emotionalized spontaneous speech, so that EMD, which is not inherently a real-time technique, can be used in speech recognition systems with online real-time requirements. Although the pipelined processing introduces slight distortion into the speech signal, the experiments in this thesis confirm that the recognition error caused by this distortion is below 3%. The experiments also compare the decomposed signals with and without pipelining, and pass the decomposed emotionalized speech through a speaker recognition system to compare the resulting recognition rates.

The architecture of the emotionalized spontaneous speaker recognition technique consists of four major parts: speech signal processing, pipelined empirical mode decomposition, feature extraction, and dual-model recognition. The distinctive feature of the system is that the pipelined EMD extracts the set of intrinsic mode functions that carries the speaker's inherent voiceprint characteristics, and from this set both a speaker acoustic model and a lexical vocabulary model are built. In operation, the system first identifies the speaker from the vocal-cord characteristics, and then retrieves that speaker's personal vocabulary to recognize the spoken voice commands. This makes speech recognition more accurate, while the word models in each personal vocabulary simultaneously carry the speaker's individual characteristics. For digital home technologies that demand a high degree of personalization and intelligence, this design represents an important breakthrough.
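
The abstract describes empirical mode decomposition applied frame by frame so that later stages can start working before the whole utterance has been processed. The following is a minimal Python sketch of that idea, not the thesis's implementation: the cubic-spline envelopes and Cauchy-type stopping criterion are the standard EMD formulation, while the frame length, hop size, and thresholds are illustrative placeholders.

import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x):
    # One sifting step: subtract the mean of the upper and lower envelopes.
    t = np.arange(len(x))
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: the input is a residue, not an IMF
    upper = CubicSpline(np.r_[0, maxima, len(x) - 1],
                        np.r_[x[0], x[maxima], x[-1]])(t)
    lower = CubicSpline(np.r_[0, minima, len(x) - 1],
                        np.r_[x[0], x[minima], x[-1]])(t)
    return x - (upper + lower) / 2.0

def emd(x, max_imfs=6, sd_stop=0.25, max_sift=30):
    # Decompose x into intrinsic mode functions (IMFs) plus a residue.
    imfs, residue = [], x.astype(float).copy()
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(max_sift):
            h_new = sift_once(h)
            if h_new is None:
                return imfs, residue  # nothing left to sift
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_stop:  # illustrative stopping threshold
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue

def framewise_emd(signal, frame_len=2048, hop=1024):
    # Frame-by-frame decomposition: each frame's IMFs become available as soon
    # as that frame arrives, so downstream stages need not wait for the full signal.
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield start, emd(signal[start:start + frame_len])

In a pipelined setting of this kind, feature extraction and recognition consume the per-frame IMFs while later frames are still being decomposed; the small frame-boundary distortion this introduces is what the abstract bounds at under a 3% difference in recognition rate.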

English Abstract


In this thesis, a pipelining technique is integrated into the empirical mode decomposition (EMD) of nonlinear signals. The resulting pipelined EMD is used to build the front-end processing unit of an emotionalized spontaneous speaker and speech recognition system. This approach reduces the computing time of the decomposition, which allows the recognition system to satisfy its real-time requirement. Experiments show that the difference in speech recognition rate between pipelined and non-pipelined voice signals is less than 3%, even though the pipelined signals carry some distortion. The final part of this thesis presents detailed comparisons between the pipelined and non-pipelined signals and their speech recognition rates. The architecture of the emotionalized spontaneous speech recognition system comprises four major parts: speech signal processing, pipelined empirical mode decomposition, feature extraction, and dual-model identification. Its distinguishing feature is that the pipelined EMD extracts the speaker's voice characteristics online and identifies the speaker first; the speaker's personal vocabulary model is then retrieved, and the spoken voice commands are recognized against that model. This design makes voice-command recognition more accurate, and the stored vocabulary models simultaneously carry the individual characteristics of the specific speaker.
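
As a complement, the sketch below illustrates only the ordering of the dual-model step described above: identify the speaker first, then decode the command against that speaker's personal vocabulary. The feature (log-energy per IMF) and the nearest-template matching are toy stand-ins for illustration, not the acoustic and lexical models actually used in the thesis.

import numpy as np

def imf_log_energy(imfs):
    # Toy feature vector: log-energy of each intrinsic mode function.
    return np.log(np.array([np.mean(m ** 2) for m in imfs]) + 1e-12)

def nearest(query, templates):
    # Return the key whose template vector is closest to the query.
    return min(templates, key=lambda k: np.linalg.norm(query - templates[k]))

def dual_model_identify(imfs, speaker_templates, vocab_templates):
    # Stage 1: speaker identification from the voiceprint-bearing IMFs.
    # Stage 2: command recognition against that speaker's own vocabulary.
    feat = imf_log_energy(imfs)
    speaker = nearest(feat, speaker_templates)
    command = nearest(feat, vocab_templates[speaker])
    return speaker, command

Any per-speaker acoustic and lexical models could replace the nearest-template matching here; the essential point is the ordering, with speaker identification preceding recognition against that speaker's personalized vocabulary.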
