
Strengthening Information Consistency and Key-Point Coverage in Abstractive Summarization

Boosting Factual Consistency and High Coverage in Unsupervised Abstractive Summarization

Advisor: 陳宜欣

Abstract


Abstractive summarization has gradually become the mainstream approach to summarization tasks thanks to the rapid growth of pre-trained models, but the problem of summaries that are inconsistent with the source document has also become more pronounced: a summary must stay faithful to the source and must not fabricate content. Building on prior work in unsupervised abstractive summarization, this thesis adds a factual-consistency scoring mechanism to strengthen the consistency between the summary and the source document. In addition, we propose a new keyword-extraction method that uses a dependency parser (dependency parsing) to find the keywords that receive the most modifiers; these keywords are then used to guide the information the unsupervised summarizer needs to cover. Evaluated with FEQA and ROUGE, the experimental results show significant improvements in both information consistency and key-point coverage.
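A minimal sketch of the idea of ranking keywords by how heavily they are modified in a dependency parse, assuming spaCy and its small English pipeline; the function name, part-of-speech filter, and sample text are illustrative assumptions, not the thesis's actual implementation.

```python
import spacy
from collections import Counter

# Assumes the small English pipeline is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def rank_keywords(text: str, top_k: int = 10) -> list[str]:
    """Rank candidate keywords by how many dependents modify them in the parse tree."""
    doc = nlp(text)
    modifier_counts = Counter()
    for token in doc:
        # Consider only content words as keyword candidates.
        if token.is_stop or token.is_punct or token.pos_ not in {"NOUN", "PROPN"}:
            continue
        # The number of children of a token approximates how heavily it is modified.
        modifier_counts[token.lemma_.lower()] += sum(1 for _ in token.children)
    return [word for word, _ in modifier_counts.most_common(top_k)]

if __name__ == "__main__":
    sample = ("The proposed unsupervised summarization model rewards summaries "
              "whose claims remain consistent with the original source document.")
    print(rank_keywords(sample, top_k=5))
```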

Parallel Abstract


Abstractive summarization has gradually gained importance because of the rapid growth of pre-trained language models. However, the models sometimes generate summaries containing information that is inconsistent with the original document. Presenting information that differs from the original document is a critical problem in summarization, which we label factual inconsistency. This research proposes an unsupervised abstractive summarization method based on reinforcement learning that improves both factual consistency and coverage. It includes a novel mechanism designed to maintain factual consistency between the generated summary and the original document, as well as a novel method for ranking keywords; these keywords support the model and track how much of the source information the summary covers. The results validate the approach and show that it outperforms existing methods.
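A minimal sketch of how coverage can be measured with ROUGE, assuming the `rouge-score` Python package; the FEQA factual-consistency metric is not reproduced here, and the reference and generated texts are illustrative only.

```python
# Requires: pip install rouge-score
from rouge_score import rouge_scorer

# Score a generated summary against a reference text (or, in an unsupervised
# setting, against key content extracted from the source document).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the model improves factual consistency and coverage of summaries"
generated = "the model improves the coverage and consistency of generated summaries"

scores = scorer.score(reference, generated)
for name, s in scores.items():
    # Each entry reports precision, recall, and F1 for that ROUGE variant.
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
```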

