
An Improved GPT-2 Model for News Title Generation

Advisor: 鄭卜壬

Abstract


In the digital information age, people are accustomed to conveniently obtaining real-time news online. On today's social media, many short news items use eye-catching titles that do not match the body text, making it difficult for readers to tell from the title whether the content interests them. In recent years, with the rise of deep learning, research on news title generation has progressed from RNN-based to Transformer-based methods. However, existing work still has the following problems. First, many models understand the news content insufficiently and struggle to make full use of the effective information it contains. Second, since a news article often has multiple key points, many current models are not necessarily able to capture the information best suited to serve as the title. To address these two problems, this study takes GPT-2 as the base architecture, proposes corresponding improvements, and designs a two-stage training procedure. In the pre-training stage, we redesign the attention mask to address the under-utilization of article information; in the fine-tuning stage, the model learns context and language modeling jointly, further guiding it to attend to the parts of the article most relevant to the title. Finally, in the experiments, we compare an attention-based sequence-to-sequence model, a pointer network, the base GPT-2 model, and the improved GPT-2 model proposed in this study on the title generation task. Both automatic and human evaluation verify that the improved GPT-2 model can generate high-quality news titles that are faithful to the article.
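The abstract does not spell out the redesigned attention mask. One common way to let a decoder-only model read the source more fully without leaking the target is a UniLM-style mask: bidirectional attention within the article span, causal attention over the title span. The following is a minimal PyTorch sketch under that assumption only; the function name and length parameters are hypothetical, not taken from the thesis.

import torch

def build_title_gen_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    # Hypothetical mask layout: 1 = may attend, 0 = blocked.
    total = src_len + tgt_len
    mask = torch.zeros(total, total)
    # Article tokens attend bidirectionally within the article
    # and never see the title (no title leakage in pre-training).
    mask[:src_len, :src_len] = 1.0
    # Title tokens attend to the full article...
    mask[src_len:, :src_len] = 1.0
    # ...and causally to themselves and earlier title tokens.
    mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len))
    return mask

# Example: a 5-token article followed by a 3-token title.
print(build_title_gen_mask(5, 3))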

Abstract (English)


With the rapid development of the Internet, people have become accustomed to obtaining instant news conveniently through social software. Many short news articles on social media use eye-catching titles that are inconsistent with the facts. In the past few years, neural text summarization has developed from RNN-based to Transformer-based methods. However, some existing works still have the following problems: (1) these models often suffer from an insufficient understanding of the content, which prevents them from reaching their full performance; (2) an article often contains multiple key points, and many current models are not necessarily capable of capturing the information most suitable for the title. In view of these two problems, we improve the original GPT-2 architecture and design a two-stage training scheme. In the pre-training phase, we redesign the attention mask to improve content understanding without disclosing title information. In the fine-tuning phase, the model is trained to learn both context and language prediction. Finally, we compare the performance of an RNN-based Seq2Seq model, the Pointer-Generator, the base GPT-2 model, and the improved GPT-2 model on the title generation task. Both machine and human evaluation verify that the improved GPT-2 model is able to generate high-quality news titles.
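"Learning both context and language prediction" in the fine-tuning phase is likewise not detailed here. One plausible reading is a loss that mixes next-token prediction over the article span (context modeling) with next-token prediction over the title span. The sketch below is purely illustrative under that assumption; joint_finetune_loss, the span split, and alpha are invented names, not the thesis's formulation.

import torch
import torch.nn.functional as F

def joint_finetune_loss(logits: torch.Tensor,
                        targets: torch.Tensor,
                        src_len: int,
                        alpha: float = 0.5) -> torch.Tensor:
    # Hypothetical joint objective: cross-entropy over the article span
    # plus cross-entropy over the title span, mixed by weight alpha.
    # logits: (seq_len, vocab_size); targets: (seq_len,) next-token ids.
    context_loss = F.cross_entropy(logits[:src_len], targets[:src_len])
    title_loss = F.cross_entropy(logits[src_len:], targets[src_len:])
    return alpha * context_loss + (1.0 - alpha) * title_loss

# Example with random tensors: 8 positions total, the first 5 are article tokens.
logits = torch.randn(8, 100)
targets = torch.randint(0, 100, (8,))
print(joint_finetune_loss(logits, targets, src_len=5))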
