

Prediction of Glottal Parameters Obtained from Stroboscopic Images Using Voice Recordings

Advisor: 劉奕汶


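People communicate with one another by speaking, but with age, prolonged overuse of the vocal folds, or vocal-fold pathology, some people lose the ability to phonate normally. Methods for examining vocal-fold abnormality fall broadly into auditory and visual approaches. On the auditory side, the patient's voice can be recorded and analyzed with the Multi-Dimensional Voice Program (MDVP), which computes acoustic measures to assess whether phonation is normal; alternatively, a physician listens to the patient's voice directly and rates it on the GRBAS scale. On the visual side, stroboscopy is used to film the dynamic motion of the vocal folds, and a physician then judges whether their appearance and movement are abnormal.

In this study, we collected the audio files recorded with MDVP and the acoustic measures it computes from them, including Jitter, Shimmer and NHR, together with GRBAS ratings. We then processed the voice signals in these files: we computed the prediction error signal by linear prediction, took the correlation coefficients of that error signal, and from them defined a measure called Pitch Amplitude (PA).

Stroboscopy usually only helps the physician make a rough judgment of the severity of vocal-fold damage from images of vocal-fold vibration, and it provides no quantitative standard. We therefore quantified several common glottal physiological parameters, namely periodicity, open quotient, symmetry and smoothness, by applying image processing to the videos captured by stroboscopy. The ultimate goal of this study is to use acoustic measures to estimate glottal physiological parameters that can otherwise be observed only from images. Stroboscopy is an invasive examination that may cause the patient some discomfort, so we hope that acoustic measures can help physicians make an initial, non-invasive assessment of how severely the physiological appearance and dynamics of the vocal folds are impaired.

We use linear regression: a linear combination of the acoustic measures above predicts the quantified glottal physiological parameters. Prediction performance is reported as R-squared and reaches Aperiodicity = 0.907, OQ = 0.774, Symmetry = 0.783 and Roughness = 0.833.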


Keywords: stroboscopy, glottal parameters


The most common way for people to communicate is by talking. Some people suffer from voice disorders because of overuse of the vocal folds or other vocal-fold pathology. In an ear-nose-throat clinic, vocal-fold pathology is diagnosed either by listening to the patient's voice or by watching the movement of the patient's vocal folds. The former can be done objectively, by recording the voice and analyzing acoustic parameters to examine whether phonation is normal, or subjectively, with the doctor grading the patient's voice by ear on the Grade, Roughness, Breathiness, Asthenia and Strain (GRBAS) scale. The latter uses stroboscopy to observe the dynamic variation of the vocal folds, after which the doctor evaluates whether their appearance and movement are normal. Stroboscopy is usually used by doctors to judge the severity of vocal-fold damage subjectively, and there is currently no standard way to automate this process. We therefore attempt to quantify common glottal physiological parameters (GPPs), including Aperiodicity, Opening Quotient (OQ), Symmetry and Roughness, by image processing. The ultimate purpose of this research is to use acoustic parameters to predict the vocal-fold damage severity observed in stroboscopic images. Our motivation is that stroboscopy is an invasive examination that may cause the patient discomfort. In this research, we collected recorded voice files, acoustic parameters including Jitter, Shimmer and Noise-to-Harmonic Ratio (NHR), and GRBAS ratings from 15 patients before they underwent vocal-fold surgery. We then applied linear prediction to the voice files and calculated their prediction error signals. Finally, we calculated the correlation function of the prediction error signals and defined a parameter called Pitch Amplitude (PA).
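The chain from voice signal to PA described above can be illustrated with a short sketch. The abstract does not state the exact definition of PA, so this example assumes one common reading: PA is the largest value of the normalized autocorrelation of the linear-prediction error within a plausible pitch-lag range. The function names, the prediction order of 12, and the 60 to 400 Hz search range are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def lpc_coefficients(x, order):
    """Autocorrelation-method linear prediction: solve R a = r for a."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.correlate(x, x, mode="full")[n - 1 : n + order]
    idx = np.arange(order)
    # Toeplitz matrix built from lags |i - j|
    R = r[np.abs(np.subtract.outer(idx, idx))]
    # Small ridge term (an assumption) to keep the system well conditioned
    R = R + 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1 : order + 1])


def prediction_error(x, a):
    """e[n] = x[n] - sum_k a[k] * x[n - k], for k = 1..order."""
    x = np.asarray(x, dtype=float)
    inverse_filter = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(x, inverse_filter)[: len(x)]


def pitch_amplitude(x, fs, order=12, f_min=60.0, f_max=400.0):
    """Return (PA, estimated f0 in Hz) under the assumed definition of PA."""
    e = prediction_error(x, lpc_coefficients(x, order))
    e = e - e.mean()
    # One-sided autocorrelation of the error signal, normalized by lag 0
    ac = np.correlate(e, e, mode="full")[len(e) - 1 :]
    ac = ac / ac[0]
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    lag = lag_min + int(np.argmax(ac[lag_min : lag_max + 1]))
    return float(ac[lag]), fs / lag
```

Under this reading, a strongly periodic voice yields a PA close to 1, while a noisy or aperiodic voice yields a PA close to 0, which is why such a measure could plausibly carry information about vocal-fold vibration.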
Combinations of these parameters were then used to construct a linear model that gives the best least-squares prediction of the GPPs. The R-squared value, used to evaluate prediction performance for each GPP, reaches 0.907 for Aperiodicity, 0.774 for OQ, 0.783 for Symmetry and 0.833 for Roughness. The GPPs can thus be predicted well by a linear combination of acoustic parameters, which are obtained non-invasively.
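A least-squares linear model and its R-squared can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the function names are hypothetical, the inclusion of an intercept term is an assumption, and R-squared is computed on the same data used for fitting, which is one reading of "best prediction in terms of least-squares approximation".

```python
import numpy as np


def fit_linear_model(X, y):
    """Least-squares fit of y ~ b0 + X @ b; returns [b0, b1, ..., bp]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones for the intercept (an assumption)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate the fitted linear model on the rows of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In this setting, each row of `X` would hold one patient's acoustic parameters (for example Jitter, Shimmer, NHR, GRBAS ratings and PA) and `y` would hold one quantified GPP, so a separate model is fitted per GPP.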




