  • Thesis

Augmentation of Character Pronunciations for Resource-poor Chinese Dialects (增補資源匱乏漢語方言之漢字發音)

Advisor: 許永真
Co-advisor: 蔡宗翰 (Richard Tzong-han Tsai)

Abstract


Most spoken Chinese dialects lack comprehensive digital pronunciation databases, which are crucial for speech processing tasks. Given complete pronunciation databases for related dialects, one can use supervised learning techniques to predict a Chinese character's pronunciation in a target dialect from the character's rime-book features and its pronunciation in the related dialects. Unfortunately, Chinese dialect pronunciation databases are far from complete. We propose a novel generative model that uses both existing dialect pronunciation data and medieval rime books to discover phonological patterns shared across multiple dialects. The proposed model fills in missing dialectal pronunciations from existing dialect pronunciation tables, even incomplete ones, together with the pronunciation data recorded in rime books, yielding a complete pronunciation database. The augmented database can then be used in conventional supervised learning settings to predict character pronunciations in a given dialect. We evaluate prediction accuracy in terms of phonological features such as tone, initial phoneme, and final phoneme. For each character, the features are also evaluated as a whole, a measure we call overall pronunciation feature accuracy (OPFA). Our first experiment shows that adding features from dialectal pronunciation data to the baseline rime-book features dramatically improves the OPFA of a support vector machine (SVM) classifier. In the second experiment, we compare the performance of the SVM model using phonological features from closely related dialects with that of the model using features from distantly related dialects; features from closely related dialects yield higher accuracy. In the third experiment, we show that using the proposed data augmentation model to fill in missing data can increase the SVM model's OPFA by up to 4.9%.
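The evaluation described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not define OPFA precisely, so the code assumes one plausible reading, in which a character counts as correct only when every one of its phonological features is predicted correctly. The feature names, the function names, and the toy data are all hypothetical.

```python
from typing import Dict, List

# Hypothetical feature set; the abstract names tone, initial, and final as examples.
FEATURES = ("initial", "final", "tone")

Pronunciation = Dict[str, str]


def _check(gold: List[Pronunciation], pred: List[Pronunciation]) -> None:
    """Reject empty or mismatched inputs before computing any ratio."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def per_feature_accuracy(gold: List[Pronunciation],
                         pred: List[Pronunciation]) -> Dict[str, float]:
    """Fraction of characters whose predicted value matches gold, per feature."""
    _check(gold, pred)
    return {
        f: sum(g[f] == p[f] for g, p in zip(gold, pred)) / len(gold)
        for f in FEATURES
    }


def overall_accuracy(gold: List[Pronunciation],
                     pred: List[Pronunciation]) -> float:
    """ASSUMED reading of OPFA: a character is correct only if all features match."""
    _check(gold, pred)
    hits = sum(all(g[f] == p[f] for f in FEATURES) for g, p in zip(gold, pred))
    return hits / len(gold)


# Made-up toy data: four characters in an unnamed target dialect.
gold = [
    {"initial": "k", "final": "a", "tone": "1"},
    {"initial": "t", "final": "ong", "tone": "2"},
    {"initial": "s", "final": "i", "tone": "3"},
    {"initial": "l", "final": "am", "tone": "5"},
]
pred = [
    {"initial": "k", "final": "a", "tone": "1"},     # fully correct
    {"initial": "t", "final": "ong", "tone": "5"},   # wrong tone
    {"initial": "ts", "final": "i", "tone": "3"},    # wrong initial
    {"initial": "l", "final": "am", "tone": "5"},    # fully correct
]

print(per_feature_accuracy(gold, pred))
print(overall_accuracy(gold, pred))
```

Under this reading the whole-character score can never exceed the lowest per-feature accuracy, since a single wrong feature makes the character count as wrong.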

