

Deep Neural Network based Factor Analysis for Robust Speaker Verification System

Advisor: 廖元甫


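The goal of this thesis is to build a robust speaker verification model. In speaker verification, recognition performance is often degraded by factors such as channel conditions, noise, and session variability. For speaker modeling, the robust approaches most commonly used in recent years are i-Vector + Linear Discriminant Analysis (LDA) and i-Vector + Probabilistic Linear Discriminant Analysis (PLDA). The PLDA speaker model rests on the strong assumption that the data follow a Gaussian distribution, and because of the variability of real data this assumption does not always hold. We therefore propose a variant system based on a Deep Neural Network, using neural-network-based discriminative feature analysis (i-Vector + Factor Analysis based on Deep Neural Network; FA-DNN). The i-Vector first reduces the Universal Background Model (UBM) feature vector to a low dimension, and FA-DNN then handles the interference caused by channel or environment mismatch. In addition, we use the FA-DNN model to divide the highly representative hidden layer into speaker nodes and non-speaker nodes, and at test time only the contribution of the speaker nodes is considered. We implemented the LDA, PLDA, and FA-DNN methods on NIST SRE14. Evaluated by min DCF, the FA-DNN system achieves a relative improvement of 9.84% in min DCF and 13.25% in EER over the baseline system (cosine distance).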
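The cosine-distance baseline mentioned above scores a trial by the cosine of the angle between the enrollment and test vectors. The following is a minimal Python sketch of that scoring. The function names, the `speaker_nodes` index list, and the decision threshold are illustrative assumptions, not details taken from the thesis. Restricting the score to a subset of dimensions is only one possible reading of "considering only the speaker nodes" and is not confirmed as the thesis's scoring rule.

```python
import math
from typing import Optional, Sequence


def cosine_score(enroll: Sequence[float],
                 test: Sequence[float],
                 speaker_nodes: Optional[Sequence[int]] = None) -> float:
    """Cosine similarity between two vectors.

    If speaker_nodes is given (hypothetical parameter), only those
    dimensions contribute to the score; all others are ignored.
    """
    if len(enroll) != len(test):
        raise ValueError("vectors must have the same dimension")
    if speaker_nodes is not None:
        enroll = [enroll[i] for i in speaker_nodes]
        test = [test[i] for i in speaker_nodes]
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = (math.sqrt(sum(a * a for a in enroll))
            * math.sqrt(sum(b * b for b in test)))
    if norm == 0.0:
        raise ValueError("cannot score a zero-norm vector")
    return dot / norm


def verify(enroll: Sequence[float],
           test: Sequence[float],
           threshold: float,
           speaker_nodes: Optional[Sequence[int]] = None) -> bool:
    """Accept the claimed identity when the score reaches the threshold."""
    return cosine_score(enroll, test, speaker_nodes) >= threshold
```

In this sketch, a dimension that differs strongly between the two vectors lowers the full-vector score but has no effect once it is excluded from `speaker_nodes`, which is the intuition behind ignoring non-speaker nodes at test time.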


The goal of this study is to build a robust speaker verification model. In speaker verification, performance is degraded by factors such as noise, channel and environment conditions, and session variability. i-Vector + Linear Discriminant Analysis (LDA) and i-Vector + Probabilistic Linear Discriminant Analysis (PLDA) systems have become the state-of-the-art techniques in the field. However, the PLDA speaker model is based on the strong assumption that the data follow a Gaussian distribution, and because of the variability of the data this assumption does not always hold. We therefore propose a variant system based on a Deep Neural Network, Factor Analysis based on Deep Neural Network (FA-DNN). In the FA-DNN model, the highly representative hidden layer is divided into speaker nodes and non-speaker nodes, and at test time only the contribution of the speaker nodes is considered. In this thesis, the three methods (LDA, PLDA, and FA-DNN) are evaluated on NIST SRE14. The experimental results show that, relative to the cosine-distance baseline, FA-DNN achieves a relative gain of 9.84% in min DCF and 13.25% in EER.
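The results above are stated as relative gains in two standard verification metrics, the equal error rate (EER) and the minimum detection cost function (min DCF). The Python sketch below shows one simple way to compute them from lists of target and non-target trial scores. All function names are hypothetical. The cost parameters `p_target`, `c_miss`, and `c_fa` are left as inputs because the abstract does not state the values used, and the DCF here is unnormalized. The EER is approximated by sweeping the observed scores as thresholds.

```python
import math
from typing import Sequence, Tuple


def error_rates(target: Sequence[float], nontarget: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (miss rate, false-alarm rate) when accepting scores >= threshold."""
    miss = sum(s < threshold for s in target) / len(target)
    false_alarm = sum(s >= threshold for s in nontarget) / len(nontarget)
    return miss, false_alarm


def equal_error_rate(target: Sequence[float],
                     nontarget: Sequence[float]) -> float:
    """Approximate EER: pick the threshold where miss and false-alarm
    rates are closest, and return their average at that point."""
    best_gap, best_eer = math.inf, 1.0
    for t in sorted(set(target) | set(nontarget)):
        miss, fa = error_rates(target, nontarget, t)
        if abs(miss - fa) < best_gap:
            best_gap, best_eer = abs(miss - fa), (miss + fa) / 2
    return best_eer


def min_dcf(target: Sequence[float], nontarget: Sequence[float],
            p_target: float, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Minimum (unnormalized) detection cost over all candidate thresholds."""
    # math.inf covers the "reject every trial" operating point.
    thresholds = sorted(set(target) | set(nontarget)) + [math.inf]
    costs = []
    for t in thresholds:
        miss, fa = error_rates(target, nontarget, t)
        costs.append(c_miss * miss * p_target + c_fa * fa * (1.0 - p_target))
    return min(costs)


def relative_gain(baseline: float, system: float) -> float:
    """Relative improvement in percent; positive when the system's
    error metric is lower than the baseline's."""
    return 100.0 * (baseline - system) / baseline
```

Under this definition, a reported relative gain of 9.84% means the system's min DCF equals the baseline's min DCF multiplied by (1 - 0.0984). The absolute values are not given in the abstract.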


