A Preliminary Study of Automatic Phonetic Annotation of Taiwanese Written in LangGeh Orthography (讓格書寫的台語自動標音初探)

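This paper investigates automatic phonetic annotation of Taiwanese text written in LangGeh orthography. Building on word segmentation and annotation by forward maximal matching (FMM), we discuss improvements obtained from high-frequency pronunciations and from positional correction. We then use a Taiwanese–Mandarin parallel corpus annotated with romanization and, through parallel segmentation, automatically extract new lexical entries together with their romanization (the PSi dictionary) to improve annotation performance. Finally, we compare annotation results for text written in LangGeh orthography and for text written without spaces.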
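The FMM baseline can be sketched as follows. This is a minimal illustration, not the paper's implementation, and it rests on several assumptions: LangGeh phrases are separated by whitespace and matching runs inside each phrase; the toy `LEXICON` stands in for a phonetic dictionary such as Daiim and uses numeric-tone romanization purely for illustration; the names `fmm_annotate` and `annotate_text`, the `max_len` limit, and the `?` placeholder for unknown characters are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a phonetic dictionary such as Daiim.
# Readings use numeric-tone romanization purely for illustration.
LEXICON: Dict[str, str] = {
    "台語": "tai5-gi2",
    "台": "tai5",
    "語": "gi2",
    "文": "bun5",
}


def fmm_annotate(phrase: str, lexicon: Dict[str, str], max_len: int = 4) -> List[Tuple[str, str]]:
    """Segment one LangGeh phrase by forward maximal matching and attach readings."""
    result: List[Tuple[str, str]] = []
    i = 0
    while i < len(phrase):
        # Try the longest candidate first, then shrink by one character.
        for length in range(min(max_len, len(phrase) - i), 0, -1):
            word = phrase[i:i + length]
            if word in lexicon:
                result.append((word, lexicon[word]))
                i += length
                break
        else:
            # Not even the single character is in the lexicon: mark it unknown.
            result.append((phrase[i], "?"))
            i += 1
    return result


def annotate_text(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Annotate LangGeh text, assuming phrases are separated by whitespace."""
    annotated: List[Tuple[str, str]] = []
    for phrase in text.split():
        annotated.extend(fmm_annotate(phrase, lexicon))
    return annotated


print(annotate_text("台語 文", LEXICON))  # [('台語', 'tai5-gi2'), ('文', 'bun5')]
```

Because the longest match wins, "台語" is read as one word rather than as the two single-character entries "台" and "語"; characters left over as single-character segments are where the frequency and position cues discussed in the paper would come into play.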

Keywords

Taiwanese text; automatic phonetic annotation; parallel segmentation; parallel corpus; LangGeh; alignment

Abstract (English)

We study automatic phonetic annotation of Taiwanese text written in LangGeh orthography. Starting from a baseline that combines the Daiim phonetic dictionary (Chiang 2002) with forward maximal matching, we examine several possible improvements that draw on information extracted from a corpus: the high-frequency pronunciations of single characters and of multisyllabic words, the position of a character within a LangGeh phrase, and an additional phonetic dictionary extracted from a parallel corpus. Because of the limited corpus size, only the high-frequency pronunciations of characters yielded a significant improvement in our experiments.
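The abstract names two corpus-derived cues for choosing among a character's readings: its overall most frequent pronunciation and its position within a LangGeh phrase. A minimal sketch of combining them follows, assuming an annotated corpus in which every LangGeh phrase comes with exactly one romanized syllable per character. The four position labels (S/B/M/E), the back-off order (position-specific count first, then overall count), the names `CORPUS`, `position_of`, `train` and `pick_reading`, and the toy readings are all illustrative assumptions, not the paper's reported method.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy corpus: (LangGeh phrase, one romanized syllable per character).
# Readings are illustrative only.
CORPUS: List[Tuple[str, List[str]]] = [
    ("好人", ["ho2", "lang5"]),
    ("大人", ["tua7", "lang5"]),
    ("人民", ["jin5", "bin5"]),
]


def position_of(i: int, n: int) -> str:
    """Label character i of an n-character phrase: S(ingle), B(egin), M(iddle), E(nd)."""
    if n == 1:
        return "S"
    if i == 0:
        return "B"
    if i == n - 1:
        return "E"
    return "M"


def train(corpus: List[Tuple[str, List[str]]]) -> Tuple[Dict[str, Counter], Dict[Tuple[str, str], Counter]]:
    """Count each character's readings overall and per position."""
    overall: Dict[str, Counter] = defaultdict(Counter)
    by_pos: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for phrase, syllables in corpus:
        if len(phrase) != len(syllables):
            continue  # skip phrases without a one-to-one character/syllable alignment
        for i, (ch, syl) in enumerate(zip(phrase, syllables)):
            overall[ch][syl] += 1
            by_pos[(ch, position_of(i, len(phrase)))][syl] += 1
    return overall, by_pos


def pick_reading(ch: str, i: int, n: int,
                 overall: Dict[str, Counter],
                 by_pos: Dict[Tuple[str, str], Counter]) -> str:
    """Most frequent reading at this position, else overall most frequent, else '?'."""
    pos_counts = by_pos.get((ch, position_of(i, n)))  # .get avoids inserting empty keys
    if pos_counts:
        return pos_counts.most_common(1)[0][0]
    if ch in overall:
        return overall[ch].most_common(1)[0][0]
    return "?"


overall, by_pos = train(CORPUS)
print(pick_reading("人", 0, 2, overall, by_pos))  # jin5 (phrase-initial)
print(pick_reading("人", 1, 2, overall, by_pos))  # lang5 (phrase-final)
print(pick_reading("人", 0, 1, overall, by_pos))  # lang5 (no "S" data, overall back-off)
```

Backing off to the overall count keeps a reading available when a position was never observed. With a small corpus the positional table stays sparse, so this sketch would often fall back to plain character frequency, which is consistent with the abstract's report that only that cue gave a significant gain.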

Keywords (English)

Taiwanese; Phonetic Transcription; Automatic Phonetic Annotations; Parallel Segmentation; Parallel Corpus; LangGeh; Alignment

References

[1] CKIP詞典 (CKIP Dictionary).

[9] 林淑卿 (2009). 從台華平行語料庫擷取對應詞組典. Hsinchu: National Tsing Hua University.

[10] 楊佩琦 (2009). 讓格書寫下統計式台華翻譯初探. Hsinchu: National Tsing Hua University, 統…

"A Statistical Approach to Machine Translation," Computational Linguistics, Volume 16, Number 2, June 1990.


Cited by

吳戴任 (2011). 論前音節輸入法 [Master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2011.00667

王建傑 (2013). 讓格書寫下之斷詞探討 [Master's thesis, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2511201311361262
