Learning to Use a Knowledge Base to Improve Machine Translation

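We propose a machine translation method for sentences that contain named entities. In our approach, named entities and their translations are extracted from a bilingual knowledge base, so that named entities that are under-represented in the parallel corpus can be translated correctly. The method involves detecting, linking, and replacing named entities in input sentences, identifying their named entity types, and training a neural machine translation model that generates translations based on those types. At run time, the system accepts a text passage, replaces named entities with their named entity types, and then generates a translation using the neural machine translation model and the bilingual knowledge base. We applied the method to a parallel corpus and a bilingual knowledge base to build a prototype translation system. Our evaluation on sentence translation shows that the model yields a marked improvement in named entity translation and in the fluency of target sentences.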
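The training-data preparation step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the toy knowledge base `KB`, the label format (`<PERSON>`, `<LOCATION>`), the function name `mask_pair`, and the English-German example pair are all hypothetical, and entity linking is reduced to exact string matching.

```python
# Toy bilingual knowledge base: (source name, target name, NE type).
# Entries, label format, and language pair are illustrative assumptions.
KB = [
    ("Aristotle", "Aristoteles", "PERSON"),
    ("Athens", "Athen", "LOCATION"),
]

def mask_pair(src, tgt, kb=KB):
    """Replace an NE with its NE-type label when it occurs on BOTH sides of a pair."""
    for src_name, tgt_name, ne_type in kb:
        if src_name in src and tgt_name in tgt:
            label = f"<{ne_type}>"
            src = src.replace(src_name, label)
            tgt = tgt.replace(tgt_name, label)
    return src, tgt

src, tgt = mask_pair("Aristotle taught in Athens .",
                     "Aristoteles lehrte in Athen .")
print(src)
print(tgt)
```

Sentence pairs processed this way mix regular tokens with NE-type labels, so a model trained on them would learn where a label belongs in the target sentence rather than how to translate each individual name.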

Keywords

Knowledge base; machine translation

Parallel Abstract

We introduce a method for learning to translate sentences that may contain rare named entities (NEs). In our approach, NEs and their translations are extracted from a bilingual knowledge base, with the aim of maximizing correct translations for NEs that are under-represented in a parallel corpus. The method involves linking NEs in the bilingual training sentences, replacing NEs with NE-type labels, and training a neural machine translation (NMT) model on the resulting partially lexicalized training data, which mixes regular tokens and NE-type labels. At run time, the system accepts a text passage, links NEs and replaces them with NE-type labels, translates the text using the trained NMT model, and translates the NE-type labels using a bilingual knowledge base. We present a prototype system, WikiTrans, that applies the method to a parallel corpus and Wikipedia. Evaluation on a set of sentences shows that the method achieves reasonably good performance in generating high-quality NE translations and enhancing the fluency of target sentences.
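The run-time procedure can be sketched in a few lines. This is a hedged illustration rather than the WikiTrans implementation: the knowledge base entries, the indexed label format (`<PERSON_0>`), the function names, and the English-German example are assumptions, entity linking is reduced to exact string matching, and `toy_nmt` is a lookup table standing in for a trained NMT model.

```python
import re

# Toy bilingual knowledge base: source name -> (NE type, target name).
# All entries are illustrative assumptions.
KB = {
    "Aristotle": ("PERSON", "Aristoteles"),
    "Athens": ("LOCATION", "Athen"),
}

def mask_entities(text, kb=KB):
    """Replace known NEs with indexed NE-type labels and record their translations."""
    if not kb:
        return text, []
    slots = []
    # Try longer names first so a short name cannot split a longer one.
    names = sorted(kb, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))

    def repl(match):
        ne_type, translation = kb[match.group(0)]
        label = f"<{ne_type}_{len(slots)}>"
        slots.append((label, translation))
        return label

    return pattern.sub(repl, text), slots

def restore_entities(translated, slots):
    """Fill each label in the NMT output with the knowledge base translation."""
    for label, translation in slots:
        translated = translated.replace(label, translation)
    return translated

def translate(text, nmt, kb=KB):
    masked, slots = mask_entities(text, kb)
    return restore_entities(nmt(masked), slots)

def toy_nmt(masked):
    # Stand-in for a trained NMT model: a one-entry lookup table.
    table = {"<PERSON_0> taught in <LOCATION_1> .":
             "<PERSON_0> lehrte in <LOCATION_1> ."}
    return table.get(masked, masked)

print(translate("Aristotle taught in Athens .", toy_nmt))
```

Indexing the labels is a design choice made for this sketch: it keeps each label tied to its own knowledge base translation even if the model reorders labels in the output.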

Parallel Keywords

Knowledge Base; Machine Translation

