
Improving Japanese Acoustic Models by Sentence-end Vowel Models and Bidakuon Allophones

Original title: 以句尾母音模型與鼻濁音發音變異來改善日語語音模型

Advisor: 張智星 (Jyh-Shing Roger Jang)

Abstract


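Sentence-end vowel devoicing and bidakuon (nasalized voiced consonants) are problems frequently encountered in Japanese speech recognition. This thesis handles sentence-end vowel devoicing by adding dedicated sentence-end vowel models, and corrects the transcription of bidakuon with an automatic pronunciation-variation correction mechanism. Assessment-related measures are then used to compare the improved models against the unimproved ones. We extract features from the training corpus using mel-frequency cepstral coefficients (MFCCs) and log energy, and train with a dedicated vowel model added at the end of every sentence in the training corpus to improve performance at the ends of Japanese sentences. For bidakuon pronunciation variation, the automatic correction mechanism works iteratively: a threshold set on rank-based scores decides whether a label is corrected, so that the training corpus is systematically and gradually revised toward the transcription closest to the actual pronunciation. To test how the models behave in pronunciation assessment, we use three assessment-related test methods: a rank-based confidence measure, free-mora decoding, and whole-sentence recognition. In the experiments, the model trained with both bidakuon transcription correction and dedicated sentence-end vowel models outperforms the baseline model.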
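The sentence-end vowel idea can be pictured as a relabeling step applied to each training transcription before the models are trained. The following is only a minimal sketch under stated assumptions: the phone labels, the `_end` suffix, the `sil` silence label, and the function name `mark_sentence_end_vowel` are all hypothetical and are not taken from the thesis.

```python
# Hypothetical label conventions; the thesis's actual phone set is not given here.
VOWELS = {"a", "i", "u", "e", "o"}
SILENCE = {"sil", "sp"}           # assumed silence / short-pause labels
SENTENCE_END_SUFFIX = "_end"      # assumed naming for the dedicated models


def mark_sentence_end_vowel(phones):
    """Return a copy of `phones` in which a sentence-final vowel gets a dedicated label.

    Trailing silence labels are skipped, so the last spoken phone is the one
    that is inspected. Transcriptions that do not end in a vowel are unchanged.
    """
    relabeled = list(phones)
    # Walk backwards to the last non-silence phone.
    index = len(relabeled) - 1
    while index >= 0 and relabeled[index] in SILENCE:
        index -= 1
    if index >= 0 and relabeled[index] in VOWELS:
        relabeled[index] = relabeled[index] + SENTENCE_END_SUFFIX
    return relabeled


# "desu": the final /u/ is a typical candidate for devoicing.
print(mark_sentence_end_vowel(["d", "e", "s", "u", "sil"]))
```

With labels rewritten this way, a sentence-final vowel is trained as a separate model from the same vowel elsewhere in the sentence, which is one plausible way to let the acoustic model absorb devoiced realizations.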

Parallel Abstract


Sentence-end vowel devoicing and bidakuon allophones are common problems in Japanese speech recognition. This thesis proposes specialized models for sentence-end vowel phones to overcome the devoicing problem, and an automatic transcription correction framework for bidakuon allophones. Mel-frequency cepstral coefficients (MFCCs) and log energy are used as features for training the speech recognition models. Sentence-end vowel models are applied to every sentence during the training phase in order to improve recognition performance at the end of the sentence. The bidakuon allophone problem is addressed with an automatic transcription correction framework that works iteratively, using thresholds trained from the ranking scores. The transcription is corrected gradually towards the actual pronunciation recorded in the training data. We use three performance measures to evaluate the effectiveness of the proposed methods: a confidence measure based on phone model ranking, free-mora decoding, and sentence recognition. The experimental results show that using both of the proposed methods together improves recognition performance over the baseline system.
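The iterative, rank-based correction of bidakuon labels might look roughly like the sketch below. It is an illustration under several assumptions, not the thesis's implementation: the labels `g` and `ng`, the linear rank-to-score mapping in `rank_score`, the fixed `threshold` (the thesis trains its thresholds), and the `score_fn` callable that stands in for retraining the models and rescoring each segment are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labels for plosive [g] and its nasal allophone.
ALLOPHONES = {"g": "ng", "ng": "g"}


def rank_score(likelihoods: Dict[str, float], phone: str) -> float:
    """Map the rank of `phone` among all phone models to [0, 1] (1.0 = top rank)."""
    ordered = sorted(likelihoods, key=likelihoods.get, reverse=True)
    if len(ordered) < 2:
        return 1.0
    return 1.0 - ordered.index(phone) / (len(ordered) - 1)


def correct_labels(labels: List[str],
                   segment_likelihoods: List[Dict[str, float]],
                   threshold: float) -> Tuple[List[str], int]:
    """One pass: swap g/ng when the alternative beats the label by more than `threshold`."""
    corrected, changes = [], 0
    for label, likelihoods in zip(labels, segment_likelihoods):
        alternative = ALLOPHONES.get(label)
        if alternative is not None:
            gain = rank_score(likelihoods, alternative) - rank_score(likelihoods, label)
            if gain > threshold:
                corrected.append(alternative)
                changes += 1
                continue
        corrected.append(label)
    return corrected, changes


def iterative_correction(labels: List[str],
                         score_fn: Callable[[List[str]], List[Dict[str, float]]],
                         threshold: float = 0.2,
                         max_iters: int = 5) -> List[str]:
    """Repeat scoring and correction until no label changes or `max_iters` is reached."""
    for _ in range(max_iters):
        # score_fn stands in for retraining on the current labels and rescoring.
        segment_likelihoods = score_fn(labels)
        labels, changes = correct_labels(labels, segment_likelihoods, threshold)
        if changes == 0:
            break
    return labels


# Toy log-likelihoods per segment; a real system would obtain these from the models.
toy_scores = [
    {"k": -1.0, "g": -5.0, "ng": -6.0},
    {"k": -9.0, "g": -4.0, "ng": -2.0},   # nasal model fits better than the label "g"
    {"k": -9.0, "g": -2.0, "ng": -3.0},
]
result = iterative_correction(["k", "g", "g"], lambda current: toy_scores)
print(result)
```

In this toy run only the second segment is relabeled, because it is the only one where the alternative allophone outranks the transcribed label by more than the threshold. Using ranks rather than raw likelihoods keeps the decision comparable across segments of different length and quality.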

