Estimating Oral and Nasal Cavity Cross-Sectional Areas During Phonation from Speech Signals

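Recovering the vocal tract area function (VTAF) from a speech signal is an inverse scattering problem. In medicine, VTAF estimates can support the clinical analysis of aphasia and dysarthria, speech training for the hearing-impaired, language learning, and speech correction, because they give visual feedback on sound. Wakita's analysis method shows that the inverse digital filter of signal analysis and the acoustic tube model of physics perform the same filtering process. This equivalence can be explained through the conversion between linear prediction coding and the analysis lattice structure, and the Levinson-Durbin recursive algorithm makes the computation efficient. Because this approach restricts the filter's transfer function to the all-pole type, phonation is described as a process in which the vibrating vocal folds act as the source and the sound wave travels from the throat to the oral cavity and radiates from the lips; the effect of nasal resonance is not considered. This thesis uses the method proposed by Schnell and Lacroix, which is based on the Burg lattice, to describe recursively the transfer function of the whole vocal tract resonator in pole-zero form. Then, from this transfer function and the anatomy and approximate dimensions of the human vocal tract, we set suitable initial and boundary conditions and describe the tract as a three-branched model composed of the main tract, the oral cavity, and the nasal cavity. Finally, we use simple polynomial factorization together with an exhaustive search for the best solution to attempt a simultaneous estimate of the oral and nasal cross-sectional areas with nasal resonance taken into account. We test the approach on nasalized and non-nasalized vowels to verify the plausibility of the acoustic tube model and of the inverse scattering solution.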

Keywords

linear prediction; inverse scattering; analysis lattice structure

Parallel Abstract

Estimating the vocal tract area function (VTAF) from speech signals is an inverse scattering problem. Medical applications of the VTAF include the clinical analysis of aphasia and dysarthria, language training, and phoniatrics for the hearing-impaired. VTAF estimates provide visual feedback on what is otherwise perceived only by ear. According to Wakita's method, the inverse digital filter of signal analysis and the acoustic tube model of physics perform identical filtering. This equivalence follows from the relationship between linear prediction coding and the analysis lattice structure, and the Levinson-Durbin recursive algorithm makes the computation efficient. Because the method is restricted to all-pole transfer functions, Wakita modeled speech production as follows: sound waves produced at the glottis travel from the throat to the oral cavity and radiate at the lips. This procedure does not take nasal resonance into account. In this thesis, we use an iterative procedure proposed by Schnell and Lacroix to obtain pole-zero transfer functions. We then set initial and boundary conditions based on the typical size and shape of the human vocal tract. Finally, with the help of polynomial factorization and exhaustive search, we describe the vocal tract as a three-branched model consisting of the main tract, the oral cavity, and the nasal cavity, which lets us estimate the VTAF with nasal resonance taken into account. We use nasalized and non-nasalized vowels to verify the validity of the acoustic tube model and of the solution to the inverse scattering problem.
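The all-pole step attributed to Wakita above can be illustrated with a minimal sketch. It is not the thesis's implementation and does not cover the pole-zero, three-branched extension. The function names `levinson_durbin` and `areas_from_reflection`, the `lip_area` parameter, and the sign and direction convention for the area ratio are assumptions made for illustration; texts differ on whether the ratio is (1 - k)/(1 + k) or its inverse and on whether sections are indexed from the lips or from the glottis.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations by Levinson-Durbin recursion.

    r     : autocorrelation values r[0..order]
    order : prediction order p
    Returns (a, k, err) where a = [1, a1, ..., ap] are the coefficients of the
    inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p, k holds the reflection
    coefficients, and err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current residual and the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        prev = a[:]
        # Order update of the predictor coefficients.
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki
    return a, k, err


def areas_from_reflection(k, lip_area=1.0):
    """Turn reflection coefficients into relative tube-section areas.

    ASSUMED convention: sections are listed starting at the lips, and each
    area ratio is (1 - k) / (1 + k). Only relative areas are meaningful,
    since lip_area is an arbitrary normalization.
    """
    areas = [lip_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas


if __name__ == "__main__":
    # Autocorrelation of a first-order process with r[m] = 0.5 ** m.
    a, k, err = levinson_durbin([1.0, 0.5, 0.25], 2)
    print(a, k, err)
    print(areas_from_reflection(k))
```

For the autocorrelation sequence 1, 0.5, 0.25 the recursion gives a first reflection coefficient of -0.5 and a second of zero, so under the assumed convention the sketch produces one area step followed by a section of unchanged area. In practice the autocorrelation would come from a windowed, pre-emphasized speech frame, which this sketch leaves out.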

Parallel Keywords

linear prediction; inverse scattering; analysis lattice structure

