
遞迴式類神經網路語言模型使用額外資訊於語音辨識之研究

Recurrent Neural Network-based Language Modeling with Extra Information Cues for Speech Recognition

Advisor: Berlin Chen (陳柏琳)

Abstract


A language model, trained on large amounts of text, can capture the regularities of natural language and discriminate what the next word should be given the preceding word history; it therefore plays an indispensable role in automatic speech recognition (ASR) systems. The conventional statistical N-gram language model is the most common: it predicts the likelihood of the next word based on the preceding N-1 known words. When N is small, the model lacks long-span information; when N is large, it suffers from data sparseness because the training corpus is insufficient. In recent years, with the rise of neural networks, many related lines of research have emerged, and the neural network language model is one example. Interestingly, neural network language models can alleviate the data sparseness problem: they estimate the probability of the next word by mapping word sequences into a continuous space, so they do not stumble over word-sequence combinations never observed in the training corpus. Beyond the conventional feed-forward neural network language model, researchers have recently also built language models with recurrent neural networks, which store history information recursively and can thereby capture long-span information. This thesis studies recurrent neural network language models for Mandarin large vocabulary continuous speech recognition, explores the additional use of relevance information to capture long-span information more effectively, and dynamically adapts the language model to the characteristics of each utterance. Experimental results show that incorporating relevance information into the recurrent neural network language model yields a considerable improvement in large vocabulary continuous speech recognition performance.
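To make the data sparseness problem described above concrete, the following is a minimal sketch (not from the thesis itself) of a maximum-likelihood bigram model; the toy corpus and all names are invented for illustration. Any bigram never seen in training receives probability zero, which is exactly the sparseness issue that grows worse as N increases.

```python
from collections import Counter

# Toy corpus; a real language model would be trained on a large text collection.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and their left contexts for a maximum-likelihood bigram model.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def bigram_prob(prev, word):
    """P(word | prev) by maximum likelihood; unseen pairs get zero (sparseness)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

print(bigram_prob("the", "cat"))  # seen bigram -> 0.25 ("the" occurs 4 times as a context)
print(bigram_prob("cat", "dog"))  # unseen bigram -> 0.0, the data sparseness problem
```

Smoothing or backoff can redistribute mass to unseen bigrams, but neural language models sidestep the problem differently, by estimating probabilities in a continuous space.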

Parallel Abstract (English)


Language modeling (LM) aims to capture the regularities of natural languages. A language model is trained on large amounts of text so as to predict the most likely upcoming word given a word history; it therefore plays an indispensable role in automatic speech recognition (ASR). The N-gram language model, which determines the probability of an upcoming word given its preceding N-1 words, is the most prominently used. When N is small, a typical N-gram language model lacks the ability to render long-span lexical information. On the other hand, when N becomes larger, it suffers from the data sparseness problem because of insufficient training data. With this acknowledged, research on the neural network-based language model (NNLM), or more specifically the feed-forward NNLM, has attracted considerable attention from researchers and practitioners in recent years. This is attributed to the fact that the feed-forward NNLM can mitigate the data sparseness problem when estimating the probability of an upcoming word given its word history, by mapping them into a continuous space. In addition to the feed-forward NNLM, a recent trend is to use the recurrent neural network-based language model (RNNLM) for ASR, which can make efficient use of the long-span lexical information inherent in the word history in a recursive fashion. In this thesis, we not only investigate leveraging extra information cues relevant to the word history for the RNNLM, but also devise a dynamic model estimation method to obtain an utterance-specific RNNLM. We experimentally observe that the proposed methods show promise and perform well compared to existing LM methods on a large vocabulary continuous speech recognition (LVCSR) task.
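The recursive use of the word history that the abstract attributes to the RNNLM can be sketched as a single Elman-style forward step. This is a hedged illustration only: the tiny vocabulary, hidden size, and hand-fixed weights below are invented for the demo and are not the thesis's actual model, training setup, or extra-information mechanism.

```python
import math

# Minimal Elman-style RNN language model forward pass (illustrative weights).
vocab = ["<s>", "hello", "world"]
V, H = len(vocab), 2  # vocabulary size, hidden-layer size

W_in = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]   # V x H: word embedding rows
W_rec = [[0.2, 0.1], [-0.1, 0.3]]               # H x H: recurrent weights
W_out = [[0.4, -0.1], [0.2, 0.3], [-0.2, 0.5]]  # V x H: one output row per word

def step(word_id, hidden):
    """One RNNLM time step: update the hidden state, return P(next word | history)."""
    # The new hidden state mixes the current word's embedding with the previous
    # hidden state; this recursion is how the model accumulates long-span history.
    new_h = [math.tanh(W_in[word_id][j]
                       + sum(W_rec[i][j] * hidden[i] for i in range(H)))
             for j in range(H)]
    logits = [sum(W_out[w][j] * new_h[j] for j in range(H)) for w in range(V)]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax over the whole vocabulary
    return new_h, probs

h = [0.0, 0.0]                        # zero initial history
h, p = step(vocab.index("<s>"), h)    # feed the sentence-start token
h, p = step(vocab.index("hello"), h)  # hidden state now encodes "<s> hello"
assert abs(sum(p) - 1.0) < 1e-9      # a proper distribution over next words
```

Unlike the N-gram model, no word-sequence count is ever zero here: every history, seen or unseen, is compressed into the continuous hidden state, which is the sparseness-mitigation property both abstracts describe.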

