Aligning More Words with High Precision for Small Bilingual Corpora

In this paper, we propose an algorithm for aligning each word in a sentence with its translations in the corresponding translated sentence. Previously proposed methods require enormous amounts of bilingual data to train statistical word-by-word translation models. By taking a word-based approach, these methods align frequent words that have consistent translations with high precision. However, less frequent words, or words with diverse translations, generally lack statistically significant evidence for confident alignment, so incomplete or incorrect alignments occur. Here, we attempt to improve coverage by using class-based rules, and we describe an automatic procedure for acquiring such rules. Experimental results confirm that the algorithm can align over 85% of word pairs while maintaining a comparably high precision rate, even when a small corpus is used for training.

Keywords

Word alignment; machine-readable dictionary and thesaurus; bilingual corpus; word sense disambiguation
