Speaker Adaptation Based on I-vector Clustering and Deep Neural Networks

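Speech recognition is used ever more widely and appears in a growing range of application environments, which makes speaker adaptation increasingly important. Deep neural networks (DNNs) have also become the mainstream acoustic model. In this thesis, the mean i-vectors of the speakers are clustered, a dedicated DNN model is trained for each speaker cluster, and these pre-trained models are then used for speaker adaptation. Two approaches are proposed. The first uses the test speaker's i-vector as the selection criterion to pick the best-matching speaker-cluster model. The second learns a combination vector in a supervised manner to integrate the outputs of the individual models. Tests on a corpus that is highly colloquial, personalized, and bilingual show that the proposed framework quickly improves recognition accuracy when little adaptation data is available, and that it also performs well as the amount of adaptation data increases.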
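The two approaches described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the similarity measure, or how the combination vector is trained, so everything below is an assumption made for illustration: spherical k-means for clustering, cosine similarity for model selection, and a softmax-parameterized weight vector fitted by gradient descent on frame-level cross-entropy. All function names are hypothetical, and the per-cluster DNNs are represented only by their output posteriors.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cluster_speakers(mean_ivectors, k, iters=20, seed=0):
    """Spherical k-means over per-speaker mean i-vectors (assumed clustering)."""
    rng = np.random.default_rng(seed)
    x = normalize(np.asarray(mean_ivectors, dtype=float))
    # Initialize centroids from k distinct speakers (fancy indexing copies).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(x @ centroids.T, axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:
                centroids[c] = normalize(members.mean(axis=0))
    return centroids, np.argmax(x @ centroids.T, axis=1)


def select_model(test_ivector, centroids):
    """Approach 1: pick the cluster whose centroid is closest in cosine similarity."""
    v = normalize(np.asarray(test_ivector, dtype=float))
    return int(np.argmax(centroids @ v))


def learn_combination(posteriors, frame_labels, steps=200, lr=0.5):
    """Approach 2: fit combination weights on labeled adaptation frames.

    posteriors: array (k, frames, states), one slice per cluster model.
    frame_labels: array (frames,) of reference state indices.
    Assumed objective: cross-entropy of the weighted mixture of posteriors.
    """
    k, frames, _ = posteriors.shape
    # Probability each model assigns to the reference state, shape (k, frames).
    picked = posteriors[:, np.arange(frames), frame_labels]
    logits = np.zeros(k)
    for _ in range(steps):
        w = np.exp(logits - logits.max())
        w /= w.sum()
        mixed = w @ picked + 1e-12
        grad_w = -(picked / mixed).mean(axis=1)
        # Chain rule through the softmax parameterization.
        logits -= lr * w * (grad_w - w @ grad_w)
    w = np.exp(logits - logits.max())
    return w / w.sum()


def combine_posteriors(posteriors, weights):
    """Weighted sum of the per-model posteriors, shape (frames, states)."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), posteriors, axes=1)
```

In this sketch, selection needs only the test speaker's i-vector, while the combination weights need labeled adaptation frames, which is why the second approach is the supervised one.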

Keywords

deep neural network; i-vector; speaker adaptation; vector combination; speaker clustering

