A Novel Approach for Handling Unknown Word Problem in Chinese-Vietnamese Machine Translation

For languages in which spaces do not mark word boundaries, such as Chinese and Vietnamese, word segmentation is always the first step in a statistical machine translation (SMT) system. Word segmentation improves translation quality, but it also produces many unknown words (UKWs) in the translation output. In this paper, we present a novel approach to translating UKWs. Exploiting the semantic relationship between Chinese and Vietnamese, we build a model based on the meanings of the characters that form each UKW and then translate the UKW through this model. Experiments show that our method significantly improves the performance of the SMT system.
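To make the idea of translating a UKW through the meanings of its characters more concrete, here is a minimal sketch of one simple, assumed variant: a character-by-character Sino-Vietnamese fallback applied to segmented tokens that are missing from the phrase table. It is not the authors' SVBUT or PVBUT model. The table SINO_VIET, the functions translate_unknown and translate_tokens, and the tiny dictionaries are hypothetical and serve only as illustration. Chinese words whose usual Vietnamese equivalent is a Pure-Vietnamese word would need a different mapping than the one shown.

```python
from typing import Dict, List, Optional

# Hypothetical character-to-Sino-Vietnamese reading table: a tiny
# illustrative sample, not the resource used in the paper.
SINO_VIET: Dict[str, str] = {
    "電": "điện",
    "話": "thoại",
    "國": "quốc",
    "家": "gia",
    "學": "học",
    "生": "sinh",
}


def translate_unknown(word: str, table: Dict[str, str]) -> Optional[str]:
    """Render an unknown Chinese word character by character.

    Returns None if any character is missing from the table, so the
    caller can decide on another fallback.
    """
    readings: List[str] = []
    for ch in word:
        reading = table.get(ch)
        if reading is None:
            return None
        readings.append(reading)
    # Sino-Vietnamese syllables keep the Chinese character order.
    return " ".join(readings)


def translate_tokens(
    tokens: List[str],
    phrase_table: Dict[str, str],
    char_table: Dict[str, str],
) -> List[str]:
    """Translate segmented tokens; unknown tokens use the character fallback."""
    out: List[str] = []
    for tok in tokens:
        if tok in phrase_table:
            out.append(phrase_table[tok])
        else:
            fallback = translate_unknown(tok, char_table)
            # If the fallback fails too, copy the source token through unchanged.
            out.append(fallback if fallback is not None else tok)
    return out
```

For example, translate_tokens(["我", "是", "學生"], {"我": "tôi", "是": "là"}, SINO_VIET) returns ["tôi", "là", "học sinh"]. Returning None from translate_unknown, rather than a partial string, leaves the choice of last-resort behavior to the caller; here translate_tokens simply copies the untranslatable token.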

Keywords

Chinese-Vietnamese SMT; Unknown Word; Sino-Vietnamese; Pure-Vietnamese; SVBUT Model; PVBUT Model

