
基於深度學習的端到端語者驗證系統之損失函數的研究

A Study on Loss Functions in End-to-end DNN-based Speaker Verification

Advisor: 張智星

Abstract


Deep learning combined with metric learning has proven effective at discriminating facial features; in particular, the success in computer vision of the Additive Angular Margin Softmax Loss, which imposes an angular-margin constraint, has also driven progress in speaker verification. This thesis trains end-to-end speaker verification models with the angular-margin-based Angular Triplet Loss and Angular Triplet Center Loss, and evaluates them on the public Mandarin speech corpus AISHELL-1 using Equal Error Rate and the Cprimary metric defined by NIST SRE. The best model in this study achieves a 7.4% relative improvement in average Equal Error Rate and a 6.1% relative improvement in average Cprimary over the Additive Angular Margin Softmax Loss model.
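As a reading aid, here is a minimal NumPy sketch of the additive angular margin softmax loss the abstract refers to. This is not code from the thesis; the scale `s` and margin `m` values are illustrative defaults, not the thesis's settings.

```python
import numpy as np

def aam_softmax_loss(embeddings, weights, labels, s=30.0, m=0.2):
    """Additive angular margin (AAM) softmax loss.

    embeddings: (N, D) speaker embeddings
    weights:    (C, D) class (speaker) weight vectors
    labels:     (N,)   integer speaker ids
    s: feature scale, m: additive angular margin in radians
    """
    # L2-normalize both sides so the logits become cosines of the
    # angles between embeddings and class centers.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T                                   # (N, C) = cos(theta)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))      # angles in radians
    idx = np.arange(len(labels))
    # Add the margin m only to the target-class angle, which makes the
    # target logit harder to satisfy and enforces angular separation.
    logits = s * cos
    logits[idx, labels] = s * np.cos(theta[idx, labels] + m)
    # Standard numerically-stable cross-entropy over adjusted logits.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, labels].mean()
```

With m = 0 this reduces to ordinary normalized softmax cross-entropy; a positive margin strictly increases the loss for correctly classified samples, which is what drives the tighter angular clustering.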

Abstract (English)


Deep metric learning has proven to be an effective way to learn discriminative embeddings for face recognition. The success of a modified softmax loss, the additive angular margin softmax loss, in computer vision has also driven progress in speaker recognition. We introduce angular triplet loss and angular triplet center loss into end-to-end speaker verification. Experiments are conducted on the AISHELL-1 dataset, with performance measured by equal error rate (EER) and Cprimary. By testing combinations of loss functions with angular triplet loss and angular triplet center loss, our best model shows a relative improvement of 7.4% in average EER and 6.1% in average Cprimary over the additive angular margin softmax loss.
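For orientation, a minimal sketch of one common formulation of an angular triplet loss: the anchor-positive angle is pushed to be at least a margin (in radians) smaller than the anchor-negative angle. The exact variant trained in the thesis may differ, and the margin value here is purely illustrative.

```python
import numpy as np

def angular_triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss on angles rather than Euclidean distances:
    max(0, angle(a, p) - angle(a, n) + margin).
    Zero loss once the negative is at least `margin` radians
    further (in angle) from the anchor than the positive is.
    """
    def angle(a, b):
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(cos, -1.0, 1.0))
    return max(0.0, angle(anchor, positive) - angle(anchor, negative) + margin)
```

Because the comparison happens on angles of length-normalized embeddings, the objective matches the cosine-similarity scoring typically used at speaker-verification test time, which is the usual motivation for angular rather than Euclidean margins.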

