Automatic Segmentation of Confusable and Coupled Phones in Mandarin

Abstract

Automatic phoneme segmentation is currently performed mainly by forced alignment. Its advantage is that it can label phoneme boundaries quickly across a large number of audio files. However, forced alignment often places boundaries inaccurately or incorrectly. Analysis of these error cases shows that they are usually caused by a noisy recording environment, inadequate recording equipment, or mispronunciation by speakers who are unfamiliar with the language being recorded. In addition, conventional segmentation performs poorly on terms in which the last phoneme of one character is the same as the first phoneme of the next, such as 蘇武 (sū wǔ), 回憶 (huí yì), and 記憶 (jì yì). This thesis proposes two methods to address these problems. For inaccuracy caused by a strong first-language accent, it proposes Automatic Generation of Pronunciation Confusion Network (AGPCN), which combines forced alignment with a pronunciation confusion network (PCN). For adjacent characters joined by the same phoneme, it adds tonal features to forced alignment. Experiments show that segmentation accuracy increases when the two proposed methods are applied.
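A minimal sketch of the idea, not the thesis's implementation: assuming per-frame phone log-likelihoods are already available from some acoustic model (the matrix `loglik`, the helper `viterbi_align`, and the toy scores are all hypothetical), forced alignment can be written as Viterbi decoding over a left-to-right chain of phone slots. Allowing a slot to hold several confusable phones turns the chain into a simple pronunciation confusion network, so the decoder also chooses which variant was actually spoken. Each phone here is a single state with no transition penalties, a simplification of the multi-state HMMs normally used.

```python
import numpy as np


def viterbi_align(loglik, network, phone_index):
    """Align frames to a sequence of phone slots with Viterbi decoding.

    loglik      -- (T, P) array of per-frame log-likelihoods, one column per phone
                   model (assumed to come from an acoustic model not shown here).
    network     -- list of slots; each slot lists the phones allowed at that position.
                   One phone per slot is plain forced alignment; several phones per
                   slot form a simple pronunciation confusion network.
    phone_index -- maps each phone label to its column in loglik.
    Returns ([(phone, start_frame, end_frame_exclusive), ...], best_log_score).
    """
    T = loglik.shape[0]
    if T < len(network):
        raise ValueError("need at least one frame per slot")
    # Expand the network into states: one state per (slot, candidate phone).
    states = [(n, p) for n, slot in enumerate(network) for p in slot]
    slot_of = np.array([n for n, _ in states])
    emit = loglik[:, [phone_index[p] for _, p in states]]  # (T, S) emission scores
    S = len(states)
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    # The path must start in the first slot.
    delta[0, slot_of == 0] = emit[0, slot_of == 0]
    for t in range(1, T):
        for j in range(S):
            # Predecessors: stay in the same state, or advance from any
            # candidate phone of the previous slot.
            preds = [j] + [i for i in range(S) if slot_of[i] == slot_of[j] - 1]
            scores = delta[t - 1, preds]
            k = int(np.argmax(scores))
            delta[t, j] = scores[k] + emit[t, j]
            back[t, j] = preds[k]
    # The path must end in the last slot.
    final = [j for j in range(S) if slot_of[j] == len(network) - 1]
    j = final[int(np.argmax(delta[T - 1, final]))]
    best = float(delta[T - 1, j])
    # Trace the best state sequence backwards.
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t, j]
        path.append(j)
    path.reverse()
    # Collapse runs of identical states into (phone, start, end) segments.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((states[path[start]][1], start, t))
            start = t
    return segments, best


if __name__ == "__main__":
    # Hypothetical toy input: 5 frames scored against phone models "sh", "s", "u".
    phones = {"sh": 0, "s": 1, "u": 2}
    loglik = np.array([[-3, -1, -5], [-3, -1, -5],
                       [-5, -4, -1], [-5, -4, -1], [-5, -4, -1]], dtype=float)
    print(viterbi_align(loglik, [["sh", "s"], ["u"]], phones))  # ([('s', 0, 2), ('u', 2, 5)], -5.0)
    print(viterbi_align(loglik, [["sh"], ["u"]], phones))       # ([('sh', 0, 2), ('u', 2, 5)], -9.0)
```

With these toy scores the confusion network labels the first segment `s` (score -5.0), while the canonical-only network `[["sh"], ["u"]]` is forced to label it `sh` (score -9.0). The `sh`/`s` substitution is only an illustrative confusion pair, not one taken from the thesis.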

Keywords

confusable phones; coupled phones; automatic segmentation
