  • Thesis

整合頻繁項目集探勘與潛在語意分析於萃取式中文文件摘要

Integrating Frequent Itemset Mining and Latent Semantic Analysis for Extractive Chinese Document Summarization

Advisor: 吳宜鴻

Abstract


As network and computer technology advances, demand for text summarization keeps growing. Text summarization condenses an original article into a short passage, helping people quickly decide what to read or obtain key information. Automatic text summarization falls into two categories: extractive summarization assembles a summary from important sentences selected from the original article, whereas abstractive summarization generates new sentences after analyzing it. Because changing words or sentence structure may distort the original meaning and cause unnecessary misunderstanding, we focus on extractive summarization. The main idea is to add a synonym dictionary built from word vectors to the existing Enhanced Latent Semantic Analysis (ELSA) method, with the aim of mitigating the problem that semantic relatedness between sentences is overlooked because of synonym ambiguity. In our experiments, we tried various parameter settings and itemset types and found that, for article collections with more dispersed topics, summary quality remains good even when only a small number of itemsets are mined.
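
The synonym idea described in the abstract can be illustrated with a minimal sketch. This is not the ELSA method itself: it omits frequent itemset mining entirely and uses plain LSA with one common sentence-scoring heuristic (the length of each sentence vector in the top-k latent space, weighted by singular values). The function names, the hand-written `synonyms` mapping, and the pre-tokenized English sentences are all assumptions made for illustration; the thesis builds its dictionary from word vectors, and Chinese text would first need word segmentation.

```python
import numpy as np

def normalize(tokens, synonyms):
    """Map each token to a canonical form; tokens not in the dictionary are kept."""
    return [synonyms.get(t, t) for t in tokens]

def term_sentence_matrix(sentences):
    """Build a term-by-sentence count matrix from tokenized sentences."""
    vocab = sorted({t for s in sentences for t in s})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for t in s:
            A[index[t], j] += 1.0
    return A, vocab

def lsa_scores(A, k):
    """Score each sentence by its vector length over the top-k latent topics."""
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    weighted = S[:k, None] * Vt[:k, :]  # scale each topic row by its singular value
    return np.sqrt((weighted ** 2).sum(axis=0))

def summarize(sentences, synonyms, n=2, k=2):
    """Return indices of the n highest-scoring sentences, in document order."""
    normalized = [normalize(s, synonyms) for s in sentences]
    A, _ = term_sentence_matrix(normalized)
    scores = lsa_scores(A, k)
    top = np.argsort(-scores)[:n]
    return sorted(int(i) for i in top)

# Hypothetical toy input: "automobile" is treated as a synonym of "car".
sentences = [
    ["car", "red", "fast"],
    ["automobile", "red"],
    ["weather", "sunny"],
]
synonyms = {"automobile": "car"}
selected = summarize(sentences, synonyms, n=2, k=2)
```

Without the dictionary, "car" and "automobile" occupy separate rows of the matrix, so the first two sentences share only "red"; after normalization they also share "car", which is the kind of overlooked relatedness the abstract refers to.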

