
A News Event Summarization Method That Computes Sentence Centrality from a Sentence Relationship Network

Extraction-based News Summarization Using Sentence Centrality in the Sentence Similarity Network

Abstract


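The core of extractive summarization is to assess how well each sentence represents the summary, and to use that assessment as the basis for ranking sentences for extraction. This study treats sentences as nodes and uses sentence similarity to decide whether a link exists between two nodes, thereby constructing a sentence relationship network model. It then measures each node's importance in the network, or its influence on the nodes connected to it, and proposes five node centrality analyses: (1) Degree Centrality, (2) Normalized Similarity-based Degree Centrality, (3) HITS Centrality, (4) PageRank Centrality, and (5) iSpreadRank Centrality. Sentence centrality is taken as the sentence's summary representativeness, which yields the sentence ranking. Finally, CSIS (Cross-Sentence Information Subsumption) is introduced to filter redundant information, and sentences are extracted in rank order to compose the summary. Experiments on the DUC 2004 dataset verify the feasibility of the method. Under the ROUGE-1 metric, summarization performance with the different sentence centralities ranks as: iSpreadRank > Normalized Similarity-based Degree > PageRank > Degree > HITS. Overall, the experiments show that a summarization method based on sentence centrality computed from a sentence relationship network is indeed feasible.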
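The network construction and two of the centrality measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure or the link threshold, so the bag-of-words cosine similarity, the `threshold` value of 0.2, and the function names `build_network`, `degree_centrality`, and `pagerank` are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(sentences, threshold=0.2):
    """Return a symmetric weight matrix for the sentence network.

    Assumption: two sentences are linked only when their cosine
    similarity reaches `threshold`; the edge weight is the similarity.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(bags[i], bags[j])
            if sim >= threshold:
                weights[i][j] = weights[j][i] = sim
    return weights


def degree_centrality(weights):
    """Number of linked neighbours for each sentence node."""
    return [sum(1 for x in row if x > 0) for row in weights]


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    Isolated nodes pass no rank to others in this sketch, so the
    scores need not sum to exactly 1 when such nodes exist.
    """
    n = len(weights)
    rank = [1.0 / n] * n
    out_weight = [sum(row) for row in weights]
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(
                rank[j] * weights[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return rank
```

Sorting sentence indices by either score in descending order gives a ranking of the kind the abstract uses for extraction. The other measures named above (Normalized Similarity-based Degree, HITS, iSpreadRank) are not shown because the abstract does not define them.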

Parallel Abstract


Purpose: One widely adopted summarization paradigm, sentence extraction, aims at extracting important sentences and composing them into a summary. The foundation of sentence extraction is to assess the importance of each sentence to the summary so that sentences can be ranked for extraction. This paper employs graph-based text analysis to model documents and investigates measures of graph-based centrality as sentence salience in summarization. Design/methodology/approach: This paper models documents on the same (or a related) topic as a sentence similarity network, in which a sentence is regarded as a node and a relationship between two sentences exists only if they are semantically related. Several methods for evaluating the importance of a node (i.e., a sentence) in the network are then proposed, namely: (1) Degree Centrality; (2) Normalized Similarity-based Degree Centrality; (3) HITS Centrality; (4) PageRank Centrality; and (5) iSpreadRank Centrality. All are designed on the basis of the idea that the importance of a node is determined not only by the number of nodes to which it connects, but also by the importance of its connected nodes. For summary generation, CSIS (Cross-Sentence Information Subsumption) is employed for anti-redundancy while sentences are extracted according to the ranking produced from their centrality. Findings: The proposed summarization method was evaluated using the ROUGE evaluation suite on the DUC 2004 news stories collection. Experimental results show that, under the ROUGE-1 metric, the performance ranking is: iSpreadRank > Normalized Similarity-based Degree > PageRank > Degree > HITS. Another experiment, which combined sentence centrality with surface-level features, also produced results competitive with the best participant in the DUC 2004 evaluation.
Research limitations/implications: Directions for future research are: (1) instead of symbolic-level analysis, to take semantics into account, such as synonymy, polysemy, and term dependency, when determining whether two sentences are semantically related; (2) to investigate graph-based centrality measures developed in social network analysis for evaluating sentence salience in summarization; (3) to improve the cohesion and coherence of summaries using natural language processing techniques, such as sentence planning and generation. Practical implications: The proposed summarization method is unsupervised; thus no training dataset is required. Since no domain-specific knowledge or deep linguistic analysis is exploited, the method is domain- and language-independent. However, it may understand the input texts poorly and produce poor summaries, because it performs no deep natural language analysis, considers no discourse structure, and uses no domain-specific knowledge during summarization. Originality/value: The contributions of this work are threefold. First, this paper offers a sentence similarity network to model topic-related documents. Second, novel graph-based sentence ranking methods are explored to rank the importance of sentences for extraction. Finally, the proposed method proved successful in a case study with the DUC 2004 benchmark dataset.
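The extraction step, in which ranked sentences are taken in order while redundant ones are skipped, might look like the sketch below. The abstract names CSIS but does not define how subsumption is measured, so the word-overlap test here is only a stand-in for it; the function name `select_summary` and the parameters `max_sentences` and `overlap_limit` are hypothetical.

```python
def select_summary(sentences, scores, max_sentences=2, overlap_limit=0.5):
    """Greedy extraction with a simple redundancy filter.

    Assumption: a candidate counts as redundant when more than
    `overlap_limit` of its distinct words already appear in a single
    previously selected sentence. This is a stand-in for CSIS, not
    its actual definition.
    """
    # Visit sentences from highest to lowest centrality score.
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    word_sets = [set(s.lower().split()) for s in sentences]
    chosen = []
    for i in order:
        if len(chosen) == max_sentences:
            break
        words = word_sets[i]
        if not words:
            continue  # skip empty sentences
        redundant = any(
            len(words & word_sets[j]) / len(words) > overlap_limit
            for j in chosen
        )
        if not redundant:
            chosen.append(i)
    # Restore original document order for readability.
    return [sentences[i] for i in sorted(chosen)]
```

With scores taken from any of the centrality measures, a near-duplicate of an already selected sentence is passed over in favour of the next-ranked sentence that adds new words.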

