Graduate student: 吳佳樺
Wu, Chia-Hua
Thesis title: 探究有效偵測及修正語音辨識錯誤技術之研究
A Study on Effective Detection and Correction Techniques for Speech Recognition Errors
Advisor: 陳柏琳
Chen, Berlin
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, i.e. 2019-2020)
Language: Chinese
Number of pages: 38
Keywords (Chinese): 語音辨識, 辨識錯誤, 錯誤偵測, 錯誤修正, 未知詞
Keywords (English): Speech Recognition, Recognition Errors, Error Detection, Error Correction, Out-of-Vocabulary Words
DOI URL: http://doi.org/10.6345/NTNU202000395
Thesis type: Academic thesis
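  • This thesis focuses on several important aspects of speech recognition errors, in particular the out-of-vocabulary (OOV) word problem that arises when a general-purpose speech recognition system is applied to a specialized domain. To this end, we propose a two-stage method comprising speech recognition error detection and error correction. In the error detection stage, we compare several sequence labeling methods for detecting different types of errors. In the error correction stage, the errors detected in the previous stage are corrected by phone-level matching against a domain-specific keyword list. A series of experiments was conducted in four application domains: educational issues, interviews on industrial technology, voice memos, and meeting recordings. The results show that the proposed method improves the performance of a general-purpose speech recognition system in these domains to some extent.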

    This thesis studies several important aspects of speech recognition errors, in particular the out-of-vocabulary (OOV) word problem that arises when a generic speech recognition system is used in a specific application domain. To this end, a two-stage processing method, consisting of error detection followed by error correction, is proposed. For error detection, we explore and compare several sequence labeling methods for detecting errors of different types. In the error correction stage, a phone-level matching mechanism, together with a domain-specific keyword list, is used to correct the errors detected in the previous stage. Experiments conducted on four application domains (educational issues, industrial technology-related interviews, speech memos, and meeting recordings) show that the proposed methods can boost the performance of a given general speech recognition system on these domains to some extent.
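The correction stage described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not specify the matching algorithm, so plain Levenshtein distance over phone sequences stands in for the "phone-level matching mechanism". The function names, the ARPAbet-style phone symbols, the sample keyword list, and the 0.4 threshold are all assumptions made for illustration. The error flags are taken as given, as if produced by the detection stage.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (word, phones, flagged): `flagged` is assumed to come from the detection stage.
Token = Tuple[str, List[str], bool]


def phone_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences, unit costs."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,           # delete a phone from a
                            curr[j - 1] + 1,       # insert a phone from b
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def correct_span(span_phones: Sequence[str],
                 keyword_phones: Dict[str, List[str]],
                 max_norm_dist: float = 0.4) -> Optional[str]:
    """Return the closest keyword, or None if nothing is close enough."""
    best_word, best_score = None, None
    for word, phones in keyword_phones.items():
        dist = phone_edit_distance(span_phones, phones)
        # Normalize by the longer sequence so scores fall in [0, 1].
        score = dist / max(len(span_phones), len(phones), 1)
        if best_score is None or score < best_score:
            best_word, best_score = word, score
    if best_score is not None and best_score <= max_norm_dist:
        return best_word
    return None


def correct_hypothesis(tokens: Sequence[Token],
                       keyword_phones: Dict[str, List[str]],
                       max_norm_dist: float = 0.4) -> List[str]:
    """Replace each contiguous run of flagged words with a matching keyword."""
    output: List[str] = []
    i = 0
    while i < len(tokens):
        if not tokens[i][2]:
            output.append(tokens[i][0])
            i += 1
            continue
        # Merge the contiguous flagged run into one span of phones.
        j, span_words, span_phones = i, [], []
        while j < len(tokens) and tokens[j][2]:
            span_words.append(tokens[j][0])
            span_phones.extend(tokens[j][1])
            j += 1
        replacement = correct_span(span_phones, keyword_phones, max_norm_dist)
        # Keep the original words when no keyword is close enough.
        output.extend(span_words if replacement is None else [replacement])
        i = j
    return output


# Hypothetical domain keyword list with illustrative pronunciations.
KEYWORDS = {
    "Kaldi": ["k", "aa", "l", "d", "iy"],
    "lattice": ["l", "ae", "t", "ih", "s"],
}

hypothesis = [
    ("we", ["w", "iy"], False),
    ("use", ["y", "uw", "z"], False),
    ("cold", ["k", "ow", "l", "d"], True),   # flagged by the detector
    ("tea", ["t", "iy"], True),              # flagged by the detector
    ("toolkit", ["t", "uw", "l", "k", "ih", "t"], False),
]

print(correct_hypothesis(hypothesis, KEYWORDS))
```

Merging adjacent flagged words before matching reflects the fact that a single OOV word is often misrecognized as several in-vocabulary words. The threshold keeps the sketch from forcing a keyword onto a span that resembles none of them.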

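    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Content and Contributions of This Thesis
        1.3 Thesis Organization
    Chapter 2 Literature Review
        2.1 Automatic Speech Recognition
            2.1.1 Speech Recognition Pipeline
            2.1.2 Current Developments and Applications of Speech Recognition
        2.2 Handling Speech Recognition Errors
            2.2.1 Types of Speech Recognition Errors
            2.2.2 Impact of Speech Recognition Errors
            2.2.3 Related Work on Handling Speech Recognition Errors
        2.3 Speech Recognition Error Detection
            2.3.1 Confidence-Measure-Based Approaches
            2.3.2 Out-of-Vocabulary Word Detection
        2.4 Speech Recognition Error Correction
            2.4.1 Methods for Speech Recognition Error Correction
            2.4.2 Workflow for Speech Recognition Error Correction
    Chapter 3 Methods for Speech Recognition Error Detection and Correction
        3.1 Classification Models for Speech Error Detection
        3.2 Speech Recognition Error Correction
    Chapter 4 Features for Recognition Error Detection
        4.1 Prosodic Features
            4.1.1 Energy
            4.1.2 Zero-Crossing Rate
            4.1.3 Fundamental Frequency (F0)
            4.1.4 Duration
        4.2 Linguistic Features
        4.3 Word Embedding
    Chapter 5 Experimental Framework and Setup
        5.1 Experimental Corpora
            5.1.1 Meeting Conversation Corpus
            5.1.2 MATBN Broadcast News Corpus
        5.2 Evaluation Methods
        5.3 Corpus Annotation Scheme
    Chapter 6 Experimental Results and Discussion
        6.1 Error Detection Models
        6.2 Speech Recognition Correction
    Chapter 7 Conclusion and Future Work
    References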

