Method of Short Text Classification based on TF‐IWF Feature Selection

[Objective] The TF-IDF algorithm avoids dependence on an external corpus in short text classification. However, the feature weights it computes tend to be concentrated, so it discriminates poorly between texts. This paper therefore proposes a short text classification method based on chi-square statistics and the TF-IWF algorithm. [Method] Feature words are extracted from the training set with chi-square statistics and weighted with the TF-IWF algorithm. The texts are then classified with an SVM classifier. [Results] Experimental results show that combining chi-square statistics with TF-IWF improves classification accuracy by 3.1%, recall by 5.2%, and F1 score by 3.7%. [Conclusion] The method widens the range of feature-word weights and increases the variance of the weights across the text set. This partly alleviates the sparsity of short text content and thereby improves short text classification performance.
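The Python sketch below illustrates the first two steps of the pipeline described in the abstract: chi-square feature selection followed by TF-IWF weighting. It uses only the standard library. The abstract does not give the exact formulas, so the sketch assumes two common definitions.

For chi-square, it uses χ²(t, c) = N(AD - BC)² / ((A+C)(B+D)(A+B)(C+D)), where A, B, C and D are document counts for term t and class c. Each term is scored by its maximum over classes.

For weighting, it uses TF-IWF(t, d) = tf(t, d) × log(total tokens in corpus / occurrences of t in corpus).

The function names (`chi_square_scores`, `select_features`, `tf_iwf_vectors`) and the max-over-classes aggregation are illustrative choices, not the authors' implementation.

```python
import math
from collections import Counter

# Assumed definitions: the paper's exact chi-square aggregation and TF-IWF
# formula are not stated in the abstract; these follow common usage.

def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes.

    docs: list of token lists; labels: class label for each document.
    """
    n = len(docs)
    class_counts = Counter(labels)
    # term -> Counter(class -> number of documents of that class containing the term)
    term_class_df = {}
    for terms, label in zip((set(d) for d in docs), labels):
        for t in terms:
            term_class_df.setdefault(t, Counter())[label] += 1

    scores = {}
    for t, per_class in term_class_df.items():
        df_t = sum(per_class.values())
        best = 0.0
        for c, n_c in class_counts.items():
            a = per_class[c]        # in class c, contains t
            b = df_t - a            # not in class c, contains t
            cc = n_c - a            # in class c, lacks t
            d = n - n_c - b         # not in class c, lacks t
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                best = max(best, n * (a * d - b * cc) ** 2 / denom)
        scores[t] = best
    return scores

def select_features(docs, labels, k):
    """Return the k terms with the highest chi-square scores."""
    scores = chi_square_scores(docs, labels)
    # Descending score, then alphabetical order for a deterministic tie-break.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]

def tf_iwf_vectors(docs, vocab):
    """Weight each selected term by tf(t, d) * log(total tokens / corpus count of t)."""
    corpus_counts = Counter(tok for d in docs for tok in d)
    total = sum(corpus_counts.values())
    iwf = {t: math.log(total / corpus_counts[t]) for t in vocab if corpus_counts[t]}
    vectors = []
    for d in docs:
        counts = Counter(d)
        length = len(d) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / length * iwf.get(t, 0.0) for t in vocab])
    return vectors

if __name__ == "__main__":
    docs = [["good", "great", "movie"], ["great", "plot"],
            ["bad", "boring", "movie"], ["bad", "plot"]]
    labels = ["pos", "pos", "neg", "neg"]
    vocab = select_features(docs, labels, k=2)
    print(vocab)
    print(tf_iwf_vectors(docs, vocab))
```

The resulting vectors would then be passed to an SVM, for example scikit-learn's `LinearSVC().fit(X, labels)`. In practice, the IWF statistics should be computed on the training set only and then reused when weighting test documents.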

Keywords

Short Text; TF-IWF Algorithm; Feature Selection; Sentiment Classification
