Thesis


Investigation on the Characteristics of Long Term Average Spectrum from Human Speech

Advisor: 陳永耀


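Humans can tell apart the speech of different people by its timbre, and many current algorithms also try to extract timbre as a parameter. The long-term average spectrum (LTAS) is one widely used tool for this purpose. As its name suggests, it is obtained by averaging the spectra of a long speech signal, and it serves as the basis of analysis in many discussions and applications concerning the characteristics of the human voice. Previous studies have shown that the LTAS becomes stable when the signal is sufficiently long. Another study pointed out that a key property of the LTAS is that its averaging process removes the influence of speech content and retains only the components related to the speaker's voice characteristics. Because of these two properties, the LTAS has been widely used in speaker identification systems and in discussions of voice characteristics.

In this thesis, we carry out a series of analyses of the properties of the LTAS. The experimental results, however, show that the previous understanding of the LTAS may be partly mistaken. We find that the two claims about the LTAS hold only under certain specific conditions; that is, the LTAS cannot completely remove the influence of speech content. Further verification with specific examples shows that the LTAS is a combined result of content effects and speaker characteristics.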
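The averaging described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the frame length, hop size, Hann window, and dB conversion are assumed choices, and practical LTAS pipelines often also drop silent frames, which this sketch does not do.

```python
import numpy as np


def ltas(signal, fs, frame_len=1024, hop=512):
    """Long-term average spectrum: mean power spectrum over all frames, in dB.

    frame_len, hop, and the Hann window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Accumulate the power spectrum of each windowed frame.
    acc = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        acc += np.abs(np.fft.rfft(frame)) ** 2
    mean_power = acc / n_frames
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    # A small floor avoids log10(0) in empty bins.
    return freqs, 10.0 * np.log10(mean_power + 1e-12)


# Usage: a 2-second 440 Hz tone plus weak noise; the LTAS peaks near 440 Hz.
fs = 16000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
freqs, spec = ltas(x, fs)
print(freqs[np.argmax(spec)])
```

Averaging power before taking the logarithm keeps the result an average spectrum rather than an average of log spectra; the longer the signal, the more frames contribute and the less any single frame moves the result.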


The unique timbre of each speaker makes different people's speech distinguishable, and many algorithms have attempted to quantify timbre characteristics for speaker identification systems. The Long Term Average Spectrum (LTAS), the spectrum averaged over a long segment of human speech, is one of the most popular techniques for analyzing speakers' characteristics. LTAS is believed to remove the influence of speech content while keeping only the speaker's characteristics, and it has been used in many applications of human speech analysis and recognition. In this thesis, the characteristics of the LTAS are analyzed. Experiments demonstrate that the previous arguments about LTAS might hold only in particular situations: in general, LTAS cannot completely remove the influence of content. It is therefore unsuitable for speaker identification unless the compared speech samples share the same content distribution. LTAS does partly represent a speaker's characteristics, but the content distribution must be considered at the same time, so previous applications based on the LTAS might be improved.
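Why content can leak into the LTAS can be seen with a toy model. The sketch below assumes, purely for illustration, that each frame's power spectrum is a speaker response multiplied by a content spectrum; the speaker curves, content spectra, and mixing proportions are all hypothetical and do not reproduce the thesis's experiments. Under that assumption, the same speaker with different content mixes yields different LTAS curves, while different speakers with an identical content mix differ only by their speaker responses.

```python
import numpy as np

# Normalized frequency axis (0 = DC, 1 = Nyquist); every shape below is hypothetical.
f = np.linspace(0.0, 1.0, 64)

# Hypothetical speaker power responses |H(f)|^2 for two speakers.
speaker_1 = 1.0 / (1.0 + 4.0 * f)
speaker_2 = 1.0 / (1.0 + 1.0 * f)

# Hypothetical content spectra: one low-frequency heavy, one high-frequency heavy.
low_content = np.exp(-((f - 0.2) / 0.1) ** 2) + 0.01
high_content = np.exp(-((f - 0.7) / 0.1) ** 2) + 0.01


def frames(speaker, p_low, n_frames=100):
    """Frame power spectra (speaker x content) with a fraction p_low of low-content frames."""
    n_low = int(round(p_low * n_frames))
    contents = np.vstack([np.tile(low_content, (n_low, 1)),
                          np.tile(high_content, (n_frames - n_low, 1))])
    return speaker * contents


def ltas_db(power_frames):
    """LTAS in dB: average the frame power spectra, then convert to decibels."""
    return 10.0 * np.log10(power_frames.mean(axis=0))


# Same speaker, different content distributions.
same_speaker_diff = ltas_db(frames(speaker_1, 0.8)) - ltas_db(frames(speaker_1, 0.2))
# Different speakers, identical content distribution.
same_content_diff = ltas_db(frames(speaker_2, 0.5)) - ltas_db(frames(speaker_1, 0.5))

print(np.abs(same_speaker_diff).max())  # several dB, caused only by the content mix
print(np.allclose(same_content_diff, 10 * np.log10(speaker_2 / speaker_1)))
```

In this toy model the speaker factor cancels only when the content distribution is held fixed, which mirrors the condition stated in the abstract for using LTAS in speaker identification.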


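Tsai, Z. S. (2012). 長時間平均頻譜與語音內容分布的語者辨識系統 [Speaker identification system based on the long-term average spectrum and speech content distribution] [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2012.01641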
