The aim of this work is to improve Japanese pronunciation assessment by adding pitch information, and to evaluate the improved system with assessment-oriented performance measures. First, baseline acoustic models are trained on Mel-frequency cepstral coefficients (MFCCs) and log energy. Because of properties specific to Japanese pronunciation, the training transcriptions are adjusted systematically so that they match the actual pronunciation more closely. Next, pitch is appended to the MFCC and log-energy features to train improved acoustic models, called pitch-added models. Two pitch-tracking methods are used for pitch extraction. ACF (autocorrelation function) yields a broken pitch contour, and UPDUDP (unbroken pitch determination using dynamic programming) yields an unbroken pitch contour. The models are evaluated on two assessment tasks: a ranking-based confidence measure and pronunciation error detection. Experimental results show that the pitch-added models outperform the baseline models overall. However, not all phonemes benefit from pitch features; unvoiced phonemes, in particular, have no pitch values. We therefore load either the pitch-added model or the baseline model selectively for each phoneme. This selective approach yields a slight further improvement over the non-selective approach.
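To illustrate why ACF produces a *broken* contour, here is a minimal sketch of frame-level autocorrelation pitch estimation. It is not the thesis's implementation. The search range (60 to 400 Hz), the voicing threshold of 0.3, the 512-sample frame and the 256-sample hop are illustrative assumptions. Frames with weak periodicity are reported as 0.0, and these are the gaps that make the contour "broken".

```python
import math

def acf_pitch(frame, sample_rate, f_min=60.0, f_max=400.0, voicing_threshold=0.3):
    """Return the F0 estimate (Hz) of one frame, or 0.0 if the frame is judged unvoiced.

    f_min, f_max and voicing_threshold are illustrative defaults (assumptions).
    """
    n = len(frame)
    if n == 0:
        return 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]                    # remove DC offset
    energy = sum(s * s for s in x)                   # r(0), used for normalization
    if energy == 0.0:
        return 0.0                                   # silence: no pitch
    lag_min = max(1, int(sample_rate / f_max))       # shortest period searched
    lag_max = min(n - 1, int(sample_rate / f_min))   # longest period searched
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # normalized autocorrelation r(lag) / r(0)
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0                                   # weak periodicity: treat as unvoiced
    return sample_rate / best_lag

def broken_pitch_contour(signal, sample_rate, frame_len=512, hop=256):
    """Apply acf_pitch frame by frame; unvoiced frames show up as 0.0 gaps."""
    return [acf_pitch(signal[start:start + frame_len], sample_rate)
            for start in range(0, len(signal) - frame_len + 1, hop)]

# Usage: a 250 Hz tone followed by silence, sampled at 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 250 * t / sr) for t in range(2048)]
contour = broken_pitch_contour(tone + [0.0] * 1024, sr)
# contour[0] → 250.0 (voiced), contour[-1] → 0.0 (silent frame, a gap in the contour)
```

Picking only the single highest autocorrelation peak in each frame makes the estimator vulnerable to octave errors. Continuity-based trackers such as the dynamic-programming approach address exactly this weakness.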
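The abstract does not describe UPDUDP's candidate generation or cost functions. The sketch below is therefore a generic dynamic-programming (Viterbi-style) pitch tracker, not UPDUDP itself. It only shows why a DP tracker gives an *unbroken* contour: exactly one candidate is chosen in every frame, including frames with no clear periodicity. The log-frequency jump penalty and `jump_weight` are assumptions.

```python
import math

def dp_pitch_track(candidates, local_costs, jump_weight=1.0):
    """Choose one F0 candidate per frame by minimizing total cost with dynamic programming.

    candidates[t]:  candidate F0 values (Hz, all > 0) for frame t
    local_costs[t]: matching costs for those candidates (lower = more plausible)
    The transition cost jump_weight * |log2(f_prev / f_cur)| (an assumption)
    penalizes large jumps such as octave errors. Every frame receives a value,
    so the returned contour has no gaps.
    """
    if not candidates:
        return []
    acc = list(local_costs[0])                 # best accumulated cost per candidate
    backptr = [[-1] * len(candidates[0])]      # predecessor indices for backtracking
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_acc, ptrs = [], []
        for f_cur, cost in zip(candidates[t], local_costs[t]):
            totals = [acc[k] + jump_weight * abs(math.log2(prev[k] / f_cur))
                      for k in range(len(prev))]
            k_best = min(range(len(prev)), key=totals.__getitem__)
            new_acc.append(totals[k_best] + cost)
            ptrs.append(k_best)
        acc = new_acc
        backptr.append(ptrs)
    # Backtrack from the cheapest candidate in the last frame.
    j = min(range(len(acc)), key=acc.__getitem__)
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = backptr[t][j]
    return path[::-1]

# Frame 1 offers an octave-error candidate (105 Hz) with the lowest local cost,
# but the continuity penalty keeps the track near 200 Hz.
track = dp_pitch_track([[200, 400], [210, 105], [205, 410]],
                       [[0.0, 0.1], [0.05, 0.0], [0.0, 0.2]])
# track → [200, 210, 205]
```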
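One common way to build a ranking-based confidence measure works as follows. Score the speech segment aligned to the expected (canonical) phone against every phone model, and use the rank of the canonical phone as the confidence. A low rank then doubles as a pronunciation-error detector. The sketch below assumes this formulation; the abstract does not give the exact definition. The per-model log-likelihoods and the `max_rank` threshold are hypothetical inputs.

```python
def canonical_rank(scores, canonical):
    """1-based rank of the expected phone among all phone models (1 = best match).

    scores: hypothetical dict mapping phone -> log-likelihood of the segment.
    Ties are broken by the dict's insertion order (sorted() is stable).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(canonical) + 1

def is_mispronounced(scores, canonical, max_rank=1):
    """Flag the segment when the expected phone is not within the top max_rank models."""
    return canonical_rank(scores, canonical) > max_rank

# A segment that should be /a/ but is matched better by the /u/ model.
scores = {"a": -10.0, "i": -12.0, "u": -9.5}
rank = canonical_rank(scores, "a")          # → 2
flagged = is_mispronounced(scores, "a")     # → True with the strict max_rank=1
```

Relaxing `max_rank` trades missed errors for fewer false alarms. A detection threshold of this kind would typically be tuned on held-out data.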
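Selective loading can be pictured as a per-phoneme lookup table. The sketch assumes the choice is driven by a per-phone score measured on held-out data, such as error-detection accuracy. The abstract states only that models are loaded selectively, so both the criterion and the placeholder model objects below are hypothetical.

```python
def select_model_per_phone(baseline_score, pitch_score):
    """Map each phone to the model set with the higher held-out score (ties keep the baseline).

    baseline_score, pitch_score: hypothetical dicts phone -> score (higher is better).
    """
    return {phone: "pitch" if pitch_score[phone] > baseline_score[phone] else "baseline"
            for phone in baseline_score}

def assemble_models(selection, baseline_models, pitch_models):
    """Build the phone -> model table that the recognizer would load."""
    return {phone: pitch_models[phone] if choice == "pitch" else baseline_models[phone]
            for phone, choice in selection.items()}

# Pitch helps the vowel /a/ but hurts the unvoiced consonant /k/.
selection = select_model_per_phone({"a": 0.80, "k": 0.90}, {"a": 0.85, "k": 0.70})
models = assemble_models(selection, {"a": "A_base", "k": "K_base"},
                         {"a": "A_pitch", "k": "K_pitch"})
# selection → {"a": "pitch", "k": "baseline"}; models → {"a": "A_pitch", "k": "K_base"}
```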