
Deep Long Short-Term Memory Networks for Speech Recognition

Advisor: Jen-Tzung Chien

Abstract


Speech recognition systems based on deep learning techniques have been shown to significantly improve recognition accuracy. Feedforward neural networks (FNN) and recurrent neural networks (RNN) have become common approaches in recent years for realizing deep learning and building acoustic models. An FNN extracts deep, abstract, and invariant features through multiple layers of nonlinear transformations, whereas an RNN captures the latent information in temporal sequence data through recurrence. The long short-term memory (LSTM) model can effectively store historical information and has been shown to handle long-range dependencies in sequence data more effectively than the conventional RNN. This thesis combines the strengths of feedforward and recurrent neural networks and proposes a novel deep long short-term memory network architecture, implementing different cascade modules including FNN-LSTM, LSTM-FNN, LSTM-FNN-FNN, and LSTM-FNN-LSTM, and stacking these cascade modules into deeper network architectures. In the experimental evaluation, we implement the proposed deep architectures with the Kaldi speech recognition toolkit. Experimental results on the 3rd CHiME Challenge and the Aurora-4 corpus show that deep architectures combining the FNN and the LSTM model effectively improve speech recognition accuracy in noisy environments.

Parallel Abstract (English)


Speech recognition has been significantly improved by applying acoustic models based on the deep neural network (DNN), which can be realized as the feedforward neural network (FNN) or the recurrent neural network (RNN). The FNN is capable of projecting the observations onto a deep invariant feature space, while the RNN is beneficial for capturing the temporal information in sequence data. The RNN based on the long short-term memory (LSTM) is capable of memorizing the inputs over a long time period and thus exploiting a self-learnt amount of long-range temporal context. By considering the complementary modeling capabilities of the FNN and the RNN, we present a new DNN architecture which is constructed by cascading LSTM and FNN layers in different ways and stacking the cascades of (1) FNN-LSTM, (2) LSTM-FNN, (3) LSTM-FNN-FNN, and (4) LSTM-FNN-LSTM in a deep model structure. Through the cascade of the LSTM cells and the fully-connected feedforward units, we build the deep long short-term memory network, which explores the temporal patterns and summarizes the long history of previous inputs in a deep learning machine. In the experiments, different architectures and topologies are investigated using the open-source Kaldi toolkit. The experiments on the 3rd CHiME challenge and Aurora-4 show that the stacks of hybrid LSTM and FNN outperform the stand-alone FNN, the stand-alone LSTM, and the other hybrid systems for noisy speech recognition.
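The cascade structure described above can be illustrated with a minimal sketch. The code below is not the thesis implementation (which uses Kaldi); it is a toy NumPy forward pass showing how an LSTM layer (recurrent over frames) and an FNN layer (frame-wise affine map plus nonlinearity) compose into a cascade such as LSTM-FNN-LSTM. All layer sizes, the tanh activation, and the random initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FNNLayer:
    """Fully-connected feedforward layer applied independently to each frame."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.1
        self.b = np.zeros(out_dim)

    def forward(self, xs):
        # Same affine map + tanh for every frame; no temporal recurrence.
        return [np.tanh(self.W @ x + self.b) for x in xs]

class LSTMLayer:
    """Minimal LSTM layer: gates computed from [input frame; previous hidden]."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        z = in_dim + hid_dim
        # One weight matrix per gate: input, forget, output, cell candidate.
        self.Wi = rng.standard_normal((hid_dim, z)) * 0.1
        self.Wf = rng.standard_normal((hid_dim, z)) * 0.1
        self.Wo = rng.standard_normal((hid_dim, z)) * 0.1
        self.Wg = rng.standard_normal((hid_dim, z)) * 0.1

    def forward(self, xs):
        h = np.zeros(self.hid_dim)  # hidden state
        c = np.zeros(self.hid_dim)  # cell state carries long-range history
        out = []
        for x in xs:
            zvec = np.concatenate([x, h])
            i = sigmoid(self.Wi @ zvec)   # input gate
            f = sigmoid(self.Wf @ zvec)   # forget gate
            o = sigmoid(self.Wo @ zvec)   # output gate
            g = np.tanh(self.Wg @ zvec)   # cell candidate
            c = f * c + i * g
            h = o * np.tanh(c)
            out.append(h)
        return out

def cascade(layers, xs):
    """Feed the output sequence of each layer into the next one."""
    for layer in layers:
        xs = layer.forward(xs)
    return xs

# Build the LSTM-FNN-LSTM cascade from the abstract (dimensions are made up).
rng = np.random.default_rng(0)
net = [LSTMLayer(13, 32, rng), FNNLayer(32, 32, rng), LSTMLayer(32, 32, rng)]
frames = [rng.standard_normal(13) for _ in range(5)]  # 5 MFCC-like frames
outputs = cascade(net, frames)
```

Stacking several such cascades yields the deeper architectures evaluated in the thesis; the FNN layers transform each frame into an invariant feature space, while the LSTM layers propagate context across frames.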

