Using Redundancy Reduction in Summarization to Improve Text Classification by SVMs

In this paper, we investigate the use of summarization techniques to improve text classification. Because summarization inherently assigns more weight to the more important sentences in an article, it may improve the accuracy with which the article is classified. Redundancy in the summaries was reduced to different levels, and its effect on classification performance was investigated. The classification algorithm used was the Support Vector Machine (SVM), which has proven to be very effective and robust for text classification problems. Experimental results showed that summaries with the lowest redundancy improved classification performance on the Reuters corpus, with an increase of more than 6% in average F1 measure. To explain why summarization can improve performance while feature selection makes no sense for SVMs, a further experiment was conducted to demonstrate the difference between summarization and traditional feature selection techniques.

Keywords

text classification; text summarization; support vector machines; maximal marginal relevance; text mining
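
The redundancy reduction mentioned in the abstract, together with the keyword "maximal marginal relevance" (MMR), can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption for illustration: the function names (`tokenize`, `cosine`, `mmr_summary`), the use of raw term-frequency cosine similarity, the use of the whole document as the relevance query, and the trade-off parameter `lam`. Lower values of `lam` penalize redundancy more strongly.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Lowercase whitespace tokens and strip surrounding punctuation.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_summary(sentences, k, lam=0.7):
    # Hypothetical helper: greedily pick k sentences, trading relevance to
    # the whole document against similarity to sentences already chosen.
    vecs = [Counter(tokenize(s)) for s in sentences]
    doc = Counter()
    for v in vecs:
        doc.update(v)  # assumption: the full document acts as the query

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], doc)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]
```

In a pipeline like the one the abstract describes, the selected sentences would stand in for the full article when building features for the SVM classifier; how the paper weights or represents the summary is not stated here.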

