Abstract


Short texts contain few words, considerable noise, and sparse features, so traditional text classification methods perform poorly on them. To improve the classification accuracy of short texts, a feature extension method based on Wikipedia word vectors is proposed. First, word vectors are trained on a Wikipedia corpus. Then, the word vectors are combined with document vectors for feature selection. Finally, each text is extended with words that are highly similar to its selected feature terms, and the extended text is classified by a traditional classifier. Experimental results show that the proposed method achieves higher short-text classification accuracy than other text feature extension algorithms.
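The extension step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the function name `expand_features`, and the `threshold` and `top_k` parameters are all assumptions made for illustration. In the method described above, the vectors would come from a model trained on a Wikipedia corpus, and the tokens being extended would be the features kept after the feature selection step, which is not reproduced here.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values). In the described
# method these would come from a model trained on a Wikipedia corpus.
TOY_VECTORS = {
    "football": [0.9, 0.1, 0.0],
    "soccer": [0.85, 0.15, 0.05],
    "goal": [0.7, 0.2, 0.1],
    "stock": [0.0, 0.9, 0.2],
    "market": [0.1, 0.85, 0.3],
    "weather": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_features(tokens, vectors, threshold=0.95, top_k=2):
    """Append to `tokens` the vocabulary words most similar to each token.

    `threshold` and `top_k` are assumed tuning parameters: a candidate is
    added only if it is among the `top_k` nearest words to a token and its
    cosine similarity is at least `threshold`.
    """
    expanded = list(tokens)
    seen = set(tokens)
    for token in tokens:
        if token not in vectors:
            continue  # out-of-vocabulary tokens cannot be extended
        scored = [
            (cosine(vectors[token], vec), word)
            for word, vec in vectors.items()
            if word not in seen
        ]
        scored.sort(reverse=True)
        for score, word in scored[:top_k]:
            if score >= threshold:
                expanded.append(word)
                seen.add(word)
    return expanded


if __name__ == "__main__":
    print(expand_features(["football"], TOY_VECTORS))
    print(expand_features(["stock"], TOY_VECTORS))
```

The extended token list would then be vectorized and passed to a conventional classifier. A stricter `threshold` adds fewer but more reliable terms, which matters for short texts because every added word carries a large share of the feature weight.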
