
情緒化語音訊號處理之管線化經驗模態分解技術

A Pipelining Empirical Mode Decomposition for Emotionalized Speech Signal Processing

Advisor: 周復華

Abstract


This thesis integrates a pipelining technique into the empirical mode decomposition (EMD) of nonlinear signals and applies it to the front-end processing of emotionalized spontaneous speech. The pipelined EMD effectively shortens the time needed to decompose emotionalized spontaneous speech, so that EMD, which is not inherently a real-time technique, can be used in speech recognition systems with online real-time requirements. Although the pipelined processing introduces slight distortion into the speech signal, the experiments in this thesis confirm that the recognition error caused by this distortion is below 3%. The experiments also compare the decomposed signals with and without pipelining, and pass the decomposed emotionalized speech through a speaker recognition system to compare the resulting recognition rates.

The architecture of the emotionalized spontaneous speaker recognition technique consists of four major parts: speech signal processing, pipelined empirical mode decomposition, feature extraction, and dual-model recognition. The distinctive feature of the system is that the pipelined EMD extracts the set of intrinsic mode functions that carries the speaker's inherent voiceprint characteristics, and from this set both a speaker acoustic model and a lexical vocabulary model are built. In operation, the system first identifies the speaker from the vocal-cord characteristics, and then retrieves that speaker's personal vocabulary to recognize the spoken voice commands. This makes speech recognition more accurate, while the word models in each personal vocabulary simultaneously carry the speaker's individual characteristics. For digital home technologies that demand a high degree of personalization and intelligence, this design represents an important breakthrough.
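
The abstract describes empirical mode decomposition applied frame by frame so that later stages can start working before the whole utterance has been processed. The following is a minimal Python sketch of that idea, not the thesis's implementation: the cubic-spline envelopes and Cauchy-type stopping criterion are the standard EMD formulation, while the frame length, hop size, and thresholds are illustrative placeholders.

import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x):
    # One sifting step: subtract the mean of the upper and lower envelopes.
    t = np.arange(len(x))
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: the input is a residue, not an IMF
    upper = CubicSpline(np.r_[0, maxima, len(x) - 1],
                        np.r_[x[0], x[maxima], x[-1]])(t)
    lower = CubicSpline(np.r_[0, minima, len(x) - 1],
                        np.r_[x[0], x[minima], x[-1]])(t)
    return x - (upper + lower) / 2.0

def emd(x, max_imfs=6, sd_stop=0.25, max_sift=30):
    # Decompose x into intrinsic mode functions (IMFs) plus a residue.
    imfs, residue = [], x.astype(float).copy()
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(max_sift):
            h_new = sift_once(h)
            if h_new is None:
                return imfs, residue  # nothing left to sift
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_stop:  # illustrative stopping threshold
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue

def framewise_emd(signal, frame_len=2048, hop=1024):
    # Frame-by-frame decomposition: each frame's IMFs become available as soon
    # as that frame arrives, so downstream stages need not wait for the full signal.
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield start, emd(signal[start:start + frame_len])

In a pipelined setting of this kind, feature extraction and recognition consume the per-frame IMFs while later frames are still being decomposed; the small frame-boundary distortion this introduces is what the abstract bounds at under a 3% difference in recognition rate.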

English Abstract


In this thesis, a pipelining technique is integrated into the empirical mode decomposition (EMD) of nonlinear signals. The resulting pipelined EMD is used to build the front-end processing unit of an emotionalized spontaneous speaker and speech recognition system. This approach reduces the computing time of the decomposition, which allows the recognition system to satisfy its real-time requirement. Experiments show that the difference in speech recognition rate between pipelined and non-pipelined voice signals is less than 3%, even though the pipelined signals carry some distortion. The final part of this thesis presents detailed comparisons between the pipelined and non-pipelined signals and their speech recognition rates. The architecture of the emotionalized spontaneous speech recognition system comprises four major parts: speech signal processing, pipelined empirical mode decomposition, feature extraction, and dual-model identification. Its distinguishing feature is that the pipelined EMD extracts the speaker's voice characteristics online and identifies the speaker first; the speaker's personal vocabulary model is then retrieved, and the spoken voice commands are recognized against that model. This design makes voice-command recognition more accurate, and the stored vocabulary models simultaneously carry the individual characteristics of the specific speaker.
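
As a complement, the sketch below illustrates only the ordering of the dual-model step described above: identify the speaker first, then decode the command against that speaker's personal vocabulary. The feature (log-energy per IMF) and the nearest-template matching are toy stand-ins for illustration, not the acoustic and lexical models actually used in the thesis.

import numpy as np

def imf_log_energy(imfs):
    # Toy feature vector: log-energy of each intrinsic mode function.
    return np.log(np.array([np.mean(m ** 2) for m in imfs]) + 1e-12)

def nearest(query, templates):
    # Return the key whose template vector is closest to the query.
    return min(templates, key=lambda k: np.linalg.norm(query - templates[k]))

def dual_model_identify(imfs, speaker_templates, vocab_templates):
    # Stage 1: speaker identification from the voiceprint-bearing IMFs.
    # Stage 2: command recognition against that speaker's own vocabulary.
    feat = imf_log_energy(imfs)
    speaker = nearest(feat, speaker_templates)
    command = nearest(feat, vocab_templates[speaker])
    return speaker, command

Any per-speaker acoustic and lexical models could replace the nearest-template matching here; the essential point is the ordering, with speaker identification preceding recognition against that speaker's personalized vocabulary.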
