Automatic Alignment of Taiwanese-Mandarin Parallel LangGeh Corpora

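For both written Taiwanese and written Mandarin, "LangGeh writing" (讓格書寫) is a newly proposed orthography whose main feature is that text is written as a sequence of separated short phrases. Following this orthography, we built a Taiwanese-Mandarin LangGeh parallel corpus. This thesis adopts the alignment notation of Brown et al. (1990) and aligns the short phrases of Taiwanese with those of Mandarin. Written Taiwanese and Mandarin share two properties: first, they have many Han-character words in common; second, their word orders are similar. 林淑卿 (2009) relied on these two properties and used the longest common subsequence (LCS) method to align Taiwanese and Mandarin automatically. We go one step further: the short phrases are first expanded through dictionaries into candidate sausages, and the LCS method is then applied to those sausages to perform the automatic alignment.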
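The LCS idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `lcs_align` and the two example sentences are assumptions made here for illustration, and phrases are matched by exact string equality only.

```python
def lcs_align(src, tgt):
    """Return index pairs (i, j) of one longest common subsequence."""
    n, m = len(src), len(tgt)
    # table[i][j] holds the LCS length of src[i:] and tgt[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == tgt[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the aligned positions.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if src[i] == tgt[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Hypothetical sentence pair, already split into short phrases.
taiwanese = ["我", "欲", "去", "學校"]
mandarin = ["我", "要", "去", "學校"]
print(lcs_align(taiwanese, mandarin))  # [(0, 0), (2, 2), (3, 3)]
```

Because the alignment is monotone, it only works well when the two languages keep a similar phrase order, which is the second shared property noted above. The phrase pair 欲/要 stays unaligned here because the two strings differ.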

Keywords

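Written Taiwanese; Taiwanese; Chinese; Mandarin; parallel corpus; LangGeh; alignment; parallel segmentation; forward maximum matching (long-word-first); longest common subsequence; candidate sausage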

Parallel Abstract

The alignment of parallel Taiwanese and Mandarin sentences written in the LangGeh orthography has been studied before (Lin 2009). By substituting a few common Taiwanese words with their Mandarin counterparts, the LCS (longest common subsequence) algorithm achieves a recall of about 70% while keeping the aligned pairs highly accurate (in that experiment they were all correct). This thesis continues the study of alignment by constructing sausage nets from the Taiwanese sentences and from the Mandarin sentences using various parallel dictionaries, and then applying the LCS algorithm. The sausage-net approach yields recall rates of 85% to 90% on various corpora while still retaining nearly perfect correctness for the pairs marked as aligned.
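One way to picture the sausage-net step is sketched below. This is an illustrative reading, not the thesis's code: the names `build_sausage` and `align_sausages`, the tiny `lexicon`, and the rule that two slots match when their candidate sets share at least one entry are all assumptions made here.

```python
def build_sausage(phrases, lexicon):
    """One slot per phrase: the phrase itself plus its dictionary candidates."""
    return [{p} | set(lexicon.get(p, ())) for p in phrases]


def align_sausages(src_slots, tgt_slots):
    """LCS over slots; two slots match when they share a candidate."""
    n, m = len(src_slots), len(tgt_slots)
    # table[i][j] holds the LCS length of src_slots[i:] and tgt_slots[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src_slots[i] & tgt_slots[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Recover the aligned slot positions from the table.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if src_slots[i] & tgt_slots[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Hypothetical Taiwanese-to-Mandarin dictionary entry and sentence pair.
lexicon = {"欲": ["要"]}
taiwanese = build_sausage(["我", "欲", "去", "學校"], lexicon)
mandarin = build_sausage(["我", "要", "去", "學校"], {})
print(align_sausages(taiwanese, mandarin))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

With an empty dictionary the same call falls back to plain string matching and leaves the 欲/要 pair unaligned, which is consistent with the abstract's report that dictionary candidates raise recall.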

Parallel Keywords

Taiwanese; Mandarin; Parallel Corpus; LangGeh; Alignment; FMM algorithm; Parallel Segmentation; LCS algorithm; Sausage net

References

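[4] 林淑卿 (2009). "Extracting a Dictionary of Corresponding Phrases from a Taiwanese-Mandarin Parallel Corpus" (從台華平行語料庫擷取對應詞組典), master's thesis, Institute of Statistics, National Tsing Hua University, 2009.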

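[5] 楊佩琦 (2009). "A Preliminary Study of Statistical Taiwanese-Mandarin Translation under LangGeh Writing" (讓格書寫下統計是台華翻譯初探), master's thesis, Institute of Statistics, National Tsing Hua University, 2009.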

[1] CKIP Dictionary (CKIP詞典).


[2] Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin (1990). "A Statistical Approach to Machine Translation," Computational Linguistics, Volume 16, Number 2, June 1990.


[3] Python 3.1 (2009). http://www.python.org/.


Cited by

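游聲峰 (2014). A Study of Speech-Recognition-Assisted Methods for Collecting Taiwanese Corpora (語音辨識輔助的台語語料庫收集方法探討) [master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2014.00126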

Hsu, H. P. (2012). A Preliminary Study of Phrase Alignment in Mandarin-English Parallel Sentences (華英平行句的詞組對齊初探) [master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2012.00054

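李柏宏 (2011). Part-of-Speech Tagging of Taiwanese Short Phrases in a Taiwanese-Mandarin Parallel Corpus (台華平行語料中台語簡短詞組的詞類標記) [master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2011.00666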

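林佩儀 (2009). Exploring the Loneliness of Widowed Elderly Women and Their Coping Process (探討喪偶老年婦女之孤寂感及其因應歷程) [master's thesis, Central Taiwan University of Science and Technology]. Airiti Library. https://doi.org/10.6822/CTUST.2009.00022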
