Text Categorization using Reduced Training Set

Machine learning approaches to text categorization teach a system to classify documents from labeled samples. In real applications, collecting labeled training samples to build a classifier is often difficult because the labeling process is complex and costly. One possible solution is to exploit the large number of unlabeled samples that are available on the web at no cost. Active learning aims to reduce the labeling effort while retaining accuracy by intelligently selecting which samples to label. This study presents a novel active learning method for text classification that selects a batch of informative samples for manual labeling by an expert. The proposed method uses the posterior probability output of a multi-class SVM. Experiments on two well-known datasets show that the proposed active learning method can significantly reduce the need for labeled training data.

Keywords

Active learning; pairwise coupling; pool-based active learning; support vector machine; text classification
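
The batch selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the smallest-margin rule used here (the gap between the two largest class posteriors) is an assumption and not necessarily the authors' method. The function names `margin` and `select_batch` and the posterior values are hypothetical; in practice the posteriors would come from a multi-class SVM with probability estimates.

```python
from typing import List, Sequence


def margin(posteriors: Sequence[float]) -> float:
    """Gap between the two largest class posteriors (small gap = uncertain).

    Assumes at least two classes.
    """
    top_two = sorted(posteriors, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_batch(pool_posteriors: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the batch_size pool samples with the smallest margins.

    Assumed criterion: the abstract only says posterior probabilities are used.
    """
    ranked = sorted(range(len(pool_posteriors)), key=lambda i: margin(pool_posteriors[i]))
    return ranked[:batch_size]


# Hypothetical posteriors for four unlabeled documents over three classes.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.34, 0.33, 0.33],  # most uncertain
    [0.70, 0.20, 0.10],  # fairly confident
]

# Indices of the two documents that would be sent to the expert for labeling.
picked = select_batch(pool, batch_size=2)
```

In a pool-based loop, the selected documents would be labeled, moved from the pool into the training set, and the classifier retrained before the next selection round.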
