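An Introduction to Weighted Finite-State Transducer-Based Mandarin Large Vocabulary Continuous Speech Recognition and the Challenges of Mandarin Speech Recognition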

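Large vocabulary continuous speech recognition (LVCSR) is the pinnacle of speech recognition technology. Built upon and combined with language understanding and dialogue management, it makes possible many functional and entertaining intelligent applications and services that make human-machine interaction more convenient. This article introduces several essential functional modules of a weighted finite-state transducer (WFST) based Mandarin LVCSR system, covering acoustic feature extraction, acoustic model training algorithms, language model training algorithms, and decoder architecture. It also discusses the performance of Mandarin LVCSR systems in the GALE project, as well as the particular problems and challenges that Mandarin recognition must face.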

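Keywords

Large vocabulary continuous speech recognition; weighted finite-state transducer; Mel-frequency cepstral coefficients; perceptual linear prediction; modified Kneser-Ney smoothing; Mandarin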

Parallel Abstract

Large vocabulary continuous speech recognition (LVCSR) is the ultimate goal of speech recognition. When combined with other techniques such as language understanding and spoken dialogue management, it enables a variety of speech-based intelligent applications and services that enrich interaction between humans and machines. This article gives readers a brief introduction to the modules of a weighted finite-state transducer (WFST) based LVCSR system, including acoustic feature extraction, acoustic model training algorithms, language model training algorithms, and decoder architecture. In addition, the recent performance of Mandarin LVCSR systems in the GALE project and some issues specific to Mandarin are addressed.

Parallel Keywords

Large Vocabulary Continuous Speech Recognition (LVCSR); Weighted Finite-state Transducer (WFST); Mel-frequency Cepstral Coefficients (MFCC); Perceptual Linear Prediction (PLP); Modified Kneser-Ney Smoothing; Mandarin
