Equation Generation Strategy for Numerical Reasoning in Cloze Test and Headline Generation Tasks

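In recent years, numerical reasoning has become a widely studied and challenging research area in natural language processing (NLP). With the rise of financial technology (FinTech), enabling machines to understand numbers and to perform numerical computation and reasoning automatically is a growing trend in NLP. At present, numerical reasoning is applied mainly in machine reading comprehension (MRC), where the goal is for a machine to answer number-related questions after reading a passage. Since the introduction of large-scale pre-trained models such as BERT and RoBERTa, performance on number-related questions in MRC has gradually approached human level. Numerical reasoning also matters in natural language generation: to summarize financial news or financial statements automatically, for example, a machine must understand the numerical relationships involved in order to capture the important information and generate the summary people expect. However, few previous studies have combined numerical reasoning with natural language generation, and this thesis makes a new attempt in that direction. The thesis makes two main contributions. First, we propose a large-scale dataset that can be used for cloze tests and also for training models and evaluating their numerical reasoning ability in headline generation. Second, we propose a method that uses an equation generation strategy so that the model can reason about numbers while it generates a headline. Experiments show that the equation generation strategy helps the model reason about numbers in both the cloze test and headline generation tasks. The generated headlines also achieve good results in terms of readability and the correctness of the generated numbers.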

Keywords

Natural Language Generation; Numerical Reasoning; Headline Generation; Cloze Test

Parallel Abstract

In recent years, numerical reasoning has become a notable and challenging research field in natural language processing (NLP). With the development of financial technology (FinTech), enabling machines to understand numbers better and to perform numerical reasoning automatically has become a trend in NLP. At present, numerical reasoning is widely used in machine reading comprehension (MRC), which aims to have machines solve math word problems after reading an article. Since large-scale pre-trained models such as BERT and RoBERTa were proposed, performance on math word problems in MRC has gradually approached human performance. Numerical reasoning is also very important in natural language generation. For example, in the automatic summarization of financial news and financial statements, machines must understand the numerical relationships in order to capture important information and generate the expected summaries. However, only a few previous studies have combined numerical reasoning with natural language generation tasks. This thesis addresses numerical reasoning and natural language generation together. It makes two main contributions. First, we propose a large-scale dataset that can be used for cloze tests and also to train a summarization model and test its numerical reasoning ability in headline generation tasks. Second, we propose a method that gives the model numerical reasoning ability during headline generation through an equation generation strategy. Experimental results show the effectiveness of the equation generation strategy for numerical reasoning in both cloze test and headline generation tasks. The generated headlines also achieve good results in terms of readability and the correctness of the generated numbers.

Parallel Keywords

Natural Language Generation; Numerical Reasoning; Headline Generation; Cloze Test

