Thesis

不同發音方式所測得之電聲門圖與音訊綜合比較與分析

Joint comparison and analysis of EGG and audio signals measured during different types of phonation

Advisor: 劉奕汶

Abstract


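This study investigates how the vocal folds, which act as the sound source of the human voice, vibrate during voice production. Phonation relies on the finely coordinated interaction of the laryngeal structures, which lets the vocal fold mucosa oscillate in different ways and produce a rich variety of sounds. Using an electroglottograph (EGG) to measure the vocal folds while simultaneously recording the audio signal, we collected three types of phonation: breathy, modal, and pressed. For each type, we recorded notes at different pitches on the five main single vowels. In the physiological EGG signal, we found that the waveform differs from the typical smooth EGG waveform reported in the literature: for each vowel, the EGG waveform carries ripples with distinct characteristics, and a K-nearest neighbor (KNN) classifier distinguished the vowels with a cross-validated accuracy of 0.536. Across phonation types, the closed quotient (CQ) separates the three types, and KNN reached a cross-validated accuracy of 0.899, indicating that EGG captures vocal fold vibration under different phonation types to a certain extent. To examine the glottal information carried by the audio signal, this thesis also applies glottal flow model iterative adaptive inverse filtering (GFM-IAIF) to extract the glottal source signal from the audio. The ripples in the extracted glottal signal share similar features with the EGG waveform at the corresponding positions.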
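The closed quotient is the fraction of each glottal cycle during which the vocal folds are in contact. The abstract does not state how it was measured, so the following is only a minimal sketch assuming a common criterion-level method: within each cycle the EGG is normalized to its peak-to-peak range, and samples above a threshold (an assumed 25% here) count as the closed phase. It further assumes that larger EGG values mean more contact and that cycles have a known, fixed length in samples. The helpers `cycle_closed_quotient`, `closed_quotients`, and `synthetic_egg` are hypothetical, and the demo signal is synthetic, not thesis data.

```python
import math
from typing import List, Sequence


def cycle_closed_quotient(cycle: Sequence[float], threshold: float = 0.25) -> float:
    """Closed quotient of one glottal cycle by the criterion-level method.

    Assumptions (not taken from the thesis): a larger EGG value means more
    vocal fold contact, `cycle` spans exactly one period, and `threshold` is
    the criterion level as a fraction of the cycle's peak-to-peak amplitude.
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("flat cycle: closed quotient is undefined")
    level = lo + threshold * (hi - lo)
    # Samples above the criterion level are treated as the closed (contact) phase.
    closed = sum(1 for x in cycle if x > level)
    return closed / len(cycle)


def closed_quotients(egg: Sequence[float], period: int, threshold: float = 0.25) -> List[float]:
    """Split `egg` into consecutive cycles of `period` samples; return one CQ per cycle.

    Hypothetical helper: real recordings need cycle-by-cycle period detection,
    because the period varies from one glottal cycle to the next.
    """
    n_cycles = len(egg) // period
    return [
        cycle_closed_quotient(egg[i * period:(i + 1) * period], threshold)
        for i in range(n_cycles)
    ]


def synthetic_egg(n_cycles: int, period: int, contact_len: int) -> List[float]:
    """Hypothetical test signal: a sin^2 contact pulse of `contact_len` samples per cycle, zero elsewhere."""
    one_cycle = [
        math.sin(math.pi * n / contact_len) ** 2 if n < contact_len else 0.0
        for n in range(period)
    ]
    return one_cycle * n_cycles
```

For the synthetic signal, `closed_quotients(synthetic_egg(3, 100, 40), period=100)` returns `[0.27, 0.27, 0.27]`: 27 of the 40 pulse samples lie above the 25% level. The result depends strongly on the threshold, so CQ values obtained with different criteria are not directly comparable.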

Parallel Abstract


This thesis explores how the vocal folds vibrate during voice production. In human phonation, the complex structures around the glottis are adjusted and the laryngeal muscles control the vibration of the vocal folds. The literature suggests that when singers use different techniques, the vocal fold mucosa vibrates differently. In this research, vocal fold vibration is recorded via electroglottography (EGG). The EGG signal and the audio signal were recorded simultaneously, and we collected breathy, modal, and pressed phonation of notes sung at different pitches on five distinct single vowels. Qualitatively, the EGG waveforms we observed are not as smooth as those reported in several papers: within each period, the waveform shows ripples whose shape differs among the five vowels. We therefore applied the K-nearest neighbor (KNN) algorithm to classify the waveforms by vowel. The cross-validated accuracy is 0.536, well above random guessing (0.2). We also found that the EGG signals of different phonation types can be distinguished by the closed quotient (CQ); KNN classification of phonation type reaches a cross-validated accuracy of 0.899. This result shows that the phonation type can be inferred from the EGG signal. Finally, we use glottal flow model iterative adaptive inverse filtering (GFM-IAIF) to extract the glottal source signal from the audio signal. The extracted glottal source waveform shows characteristic ripples that resemble those of the EGG waveform.
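The abstract reports KNN accuracies after cross-validation but does not give the features, the value of k, or the fold scheme. As an illustration only, the following sketch implements plain KNN with majority voting and shuffled k-fold cross-validation in standard-library Python, assuming each example is a feature vector (for instance, a single closed-quotient value) labeled with its phonation type. The generator `synthetic_cq_dataset`, the choice k = 3, and the five folds are hypothetical; the toy CQ values are made up and deliberately well separated, so they say nothing about the 0.899 reported in the thesis.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[str], x: Vector, k: int = 3) -> str:
    """Majority vote among the k training points nearest to `x` (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, tx) for tx in train_x), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    # Counter keeps insertion order (closest first), so a tied vote goes to
    # the label whose nearest member is closer to `x`.
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs: Sequence[Vector], ys: Sequence[str], k: int = 3,
                       n_folds: int = 5, seed: int = 0) -> float:
    """KNN accuracy under shuffled k-fold cross-validation; every example is tested exactly once."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        tr_x = [xs[i] for i in train]
        tr_y = [ys[i] for i in train]
        correct += sum(knn_predict(tr_x, tr_y, xs[i], k) == ys[i] for i in fold)
    return correct / len(xs)


def synthetic_cq_dataset() -> Tuple[List[Vector], List[str]]:
    """Hypothetical toy data: 10 made-up CQ values per phonation type, one feature each."""
    centers = {"breathy": 0.30, "modal": 0.50, "pressed": 0.70}
    xs: List[Vector] = []
    ys: List[str] = []
    for label, base in centers.items():
        for i in range(10):
            xs.append((base + 0.01 * i,))
            ys.append(label)
    return xs, ys
```

On the toy data, `cross_val_accuracy(*synthetic_cq_dataset())` returns `1.0`, because every class keeps at least four training points closer than any point of another class. A shuffled split is only one option: when many cycles come from the same note or recording, a grouped split that keeps them in the same fold gives a less optimistic estimate.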

