A Study of Comparative Opinion Summarization Techniques for Products

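This thesis uses collections of opinionated review sentences from web forums as its research data and investigates how to automatically summarize representative comparative sentence pairs from them. Because users discussing a product mostly comment on specific functions or features, and functions and features are mostly nouns, the thesis first builds a feature vector for each opinion sentence from the nouns it contains and then clusters the sentences, so that sentences likely to discuss the same topic are grouped together. Next, the sentences in each cluster are separated into positive and negative opinions; within each class, an association graph model of the sentences is built from the pairwise similarity of same-class sentences, and the importance score of each sentence within its cluster and class is computed. Candidate comparative sentence pairs are then formed by pairing positive and negative sentences from each cluster. For each candidate pair, a comparative score is computed as a weighted combination of the importance scores of the two sentences and the similarity between them, and this score is the basis for selecting comparative sentence pairs. In addition, we propose an algorithm for dynamically updating the sentence clusters: when data are added, the new sentences are inserted dynamically into the existing opinion-sentence clusters, and comparative sentence extraction is needed only for the clusters that were updated. Experimental results show that the proposed comparative opinion summarization technique extracts comparative sentence pairs more effectively than the methods proposed in related work, and that the dynamic cluster-updating algorithm markedly improves the efficiency of processing newly added opinion sentences.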
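The scoring steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual implementation: the abstract does not name the similarity measure, the graph-ranking method, or the weight, so the sketch assumes cosine similarity over noun count vectors, a PageRank-style iteration on the similarity graph, and a hypothetical weight `alpha`. The function names are likewise hypothetical, and each sentence is assumed to be already reduced to its list of nouns.

```python
import math
from collections import Counter
from itertools import product


def cosine(a, b):
    """Cosine similarity between two sparse noun-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def importance_scores(vectors, damping=0.85, iters=50):
    """PageRank-style scores on the similarity graph of one polarity group.

    Assumption: the thesis only says an association graph is used; the
    ranking rule here is a stand-in.
    """
    n = len(vectors)
    if n == 0:
        return []
    # Edge weight = similarity between two different sentences.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * sim[j][i] / out[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def best_comparative_pair(pos, neg, alpha=0.5):
    """Return (score, pos_index, neg_index) for the top pair in one cluster.

    pos, neg: lists of noun lists for the positive / negative sentences.
    alpha: hypothetical weight between importance and cross-pair similarity.
    """
    pos_vecs = [Counter(s) for s in pos]
    neg_vecs = [Counter(s) for s in neg]
    pos_imp = importance_scores(pos_vecs)
    neg_imp = importance_scores(neg_vecs)
    best = None
    for i, j in product(range(len(pos_vecs)), range(len(neg_vecs))):
        score = (alpha * (pos_imp[i] + neg_imp[j]) / 2
                 + (1 - alpha) * cosine(pos_vecs[i], neg_vecs[j]))
        if best is None or score > best[0]:
            best = (score, i, j)
    return best
```

For example, `best_comparative_pair([["battery", "life"], ["battery"]], [["battery", "life"], ["price"]])` pairs the first positive sentence with the first negative one, since they share the most nouns and the positive sentence is also central in its group.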

Keywords

comparative opinion summarization; association graph model; importance score; comparative score; dynamic sentence-cluster updating

Parallel Abstract

In this thesis, opinionated reviews from web forums are used as the data source. Our goal is to provide an effective approach for automatically summarizing comparative sentence pairs from contrastive opinionated text. Users usually comment on a product's features or functions, which are usually expressed as nouns. Accordingly, each opinionated sentence is characterized by a noun feature vector built from the nouns appearing in the sentence. To gather sentences that describe the same topic, the opinionated sentences are clustered according to their noun feature vectors. Then, for each cluster, the positive and negative sentences are separated into two groups. In each group, after the association graph of sentences is constructed according to their similarity degrees, the representative score of each sentence is computed. For each pair of a positive and a negative sentence selected from a cluster, the comparative score of the pair is obtained as a weighted sum that combines the representative scores of the two sentences and the similarity degree between them. The pair with the highest comparative score in a cluster is selected as a comparative sentence pair. Moreover, we propose an efficient updating algorithm that incrementally inserts a new opinionated sentence into the existing clusters of sentences, so that comparative sentence pair selection needs to be performed only on the updated cluster. The experimental results show that the comparative sentence pair extraction method proposed in this thesis is more effective than the methods of related work. In particular, the proposed cluster updating algorithm significantly improves execution efficiency when processing newly inserted opinionated sentences.
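The incremental update mentioned in the abstract might look roughly like the sketch below. The abstract does not describe the assignment rule, so everything here is an assumption made for illustration: a new sentence joins the cluster whose centroid is most similar under cosine similarity, a hypothetical `threshold` decides when to open a new cluster instead, and the `add_sentence` helper and cluster layout are invented names. The returned index identifies the only cluster whose comparative pairs would need to be re-extracted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse noun-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def add_sentence(clusters, nouns, threshold=0.3):
    """Insert one new sentence and return the index of the touched cluster.

    clusters: list of {"centroid": Counter, "members": [noun lists]}.
    nouns: the nouns of the new sentence.
    threshold: hypothetical minimum similarity for joining a cluster.
    """
    vec = Counter(nouns)
    best_idx, best_sim = None, 0.0
    for idx, cluster in enumerate(clusters):
        s = cosine(vec, cluster["centroid"])
        if s > best_sim:
            best_idx, best_sim = idx, s
    if best_idx is None or best_sim < threshold:
        # No cluster is close enough: start a new one.
        clusters.append({"centroid": Counter(vec), "members": [list(nouns)]})
        return len(clusters) - 1
    # Summed counts stand in for the mean; cosine ignores the scale.
    clusters[best_idx]["centroid"].update(vec)
    clusters[best_idx]["members"].append(list(nouns))
    return best_idx
```

Only the cluster at the returned index changes, which is what allows the pair selection step to be rerun on that cluster alone rather than on the whole collection.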

Parallel Keywords

summarize comparative sentence pairs; association graph; updating algorithm

