Thesis

Speaker Adaptation over Deep Neural Network by Clustering Identity Vectors

Advisor: Lin-shan Lee (李琳山)

Abstract


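Speech recognition is becoming increasingly widespread and is deployed in a growing range of application environments, which makes speaker adaptation ever more important. Deep neural networks (DNNs) have meanwhile become the mainstream acoustic model. This thesis clusters the speakers' average identity vectors (i-vectors), trains a cluster-specific DNN model for each speaker cluster, and then uses these pre-trained models to perform speaker adaptation. Two approaches are proposed. The first uses the test speaker's i-vector as the selection criterion to pick the most suitable speaker-cluster model. The second learns a combination vector with a supervised method to integrate the outputs of all the cluster models. Experiments on a highly colloquial, personalized, and bilingual corpus show that the proposed framework rapidly improves recognition accuracy when only a small amount of adaptation data is available, and still performs well as the amount of adaptation data increases.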
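The abstract describes the pipeline only at a high level, so the following Python sketch is one possible illustration of its three steps, not the thesis's implementation. Several choices here are assumptions that do not come from the source: plain k-means for clustering the per-speaker average i-vectors, nearest centroid by squared Euclidean distance as the selection criterion (approach 1), and, for the "combination vector" (approach 2), a softmax-normalised weight vector trained by gradient descent on frame-level cross-entropy. All function names (`mean_vector`, `kmeans`, `select_cluster`, `learn_weights`, `combine`) are hypothetical. The cluster DNNs are replaced by made-up per-model state posteriors, and the 2-D "i-vectors" are toy data.

```python
import math
import random

# --- Step 1: cluster speakers' average i-vectors (k-means is an ASSUMED choice) ---

def mean_vector(vectors):
    """Average a speaker's utterance-level i-vectors into one speaker-level vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns (centroids, labels). An empty cluster keeps its old centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return centroids, labels

# --- Approach 1: pick one cluster model using the test speaker's i-vector ---

def select_cluster(test_ivector, centroids):
    """Index of the centroid nearest to the test speaker's i-vector (ASSUMED metric)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(test_ivector, centroids[c]))

# --- Approach 2: learn a combination vector over all cluster-model outputs ---

def softmax(a):
    m = max(a)
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]

def combine(posteriors, weights):
    """Weighted sum of the per-cluster state posteriors for one frame."""
    return [sum(w * p[s] for w, p in zip(weights, posteriors))
            for s in range(len(posteriors[0]))]

def learn_weights(frames, n_models, lr=0.5, epochs=100):
    """Fit softmax-normalised weights by gradient descent on frame cross-entropy
    (an ASSUMED training objective). frames: list of (per-model posteriors, target state)."""
    a = [0.0] * n_models
    for _ in range(epochs):
        w = softmax(a)
        grad = [0.0] * n_models
        for posteriors, y in frames:
            q_y = sum(w[k] * posteriors[k][y] for k in range(n_models))
            for j in range(n_models):
                # d(-log q_y)/d a_j = w_j * (1 - p_{j,y} / q_y)
                grad[j] += w[j] * (1.0 - posteriors[j][y] / q_y)
        a = [a_j - lr * g / len(frames) for a_j, g in zip(a, grad)]
    return softmax(a)

# --- Toy usage: made-up 2-D "i-vectors" and hypothetical 3-state posteriors ---
speakers = {
    "s1": [[1.0, 0.1], [0.9, -0.1]],
    "s2": [[1.1, 0.0], [1.0, 0.2]],
    "s3": [[-1.0, 0.0], [-0.8, 0.1]],
    "s4": [[-1.2, -0.1], [-1.0, 0.0]],
}
means = [mean_vector(v) for v in speakers.values()]
centroids, labels = kmeans(means, k=2)
chosen = select_cluster([0.8, 0.0], centroids)

# Each frame: (posteriors from cluster model 0, from cluster model 1), correct state index.
frames = [
    ([[0.8, 0.1, 0.1], [0.3, 0.4, 0.3]], 0),
    ([[0.1, 0.8, 0.1], [0.4, 0.3, 0.3]], 1),
    ([[0.1, 0.1, 0.8], [0.3, 0.4, 0.3]], 2),
]
weights = learn_weights(frames, n_models=2)
print(labels, chosen, weights)
```

In this sketch, approach 1 commits to a single cluster model per test speaker. Approach 2 keeps every cluster model and lets labelled adaptation data decide how much each one contributes. Here the weight on model 0 grows because model 0 assigns higher posteriors to the correct states.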

