Explicit Use of Term Occurrence Probabilities for Term Weighting in Text Categorization

In this paper, the behavior of leading symmetric and asymmetric term weighting schemes is analyzed in the context of text categorization. The analysis covers their weighting patterns in the two-dimensional term occurrence probability space and the dynamic ranges of the weights they generate. In addition, multi-class odds ratio, a recently proposed term selection scheme, is considered as a potential symmetric weighting scheme. Based on the findings of this study, a novel symmetric weighting scheme, derived as a function of term occurrence probabilities, is proposed. Experiments on the Reuters-21578 ModApte Top10, WebKB, 7-Sectors and CSTR2009 datasets indicate that the proposed scheme outperforms the other leading schemes in terms of both macro-averaged and micro-averaged F1 scores.
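To make the "term occurrence probability space" concrete, here is a minimal sketch. It assumes the usual one-vs-rest setting, where each term is summarized by p = P(term | positive class) and q = P(term | negative class), both estimated as document-frequency ratios from a 2x2 contingency table. It contrasts two well-known schemes: relevance frequency (rf), which is asymmetric, and the absolute log odds ratio, which is symmetric. Neither is the scheme proposed in this paper, because the abstract does not give that formula. The function names, the smoothing constant `eps`, and the example counts are all hypothetical.

```python
import math


def occurrence_probabilities(a, b, c, d):
    """Estimate (p, q) from hypothetical 2x2 document counts.

    a: positive-class documents containing the term
    b: positive-class documents not containing the term
    c: negative-class documents containing the term
    d: negative-class documents not containing the term
    """
    p = a / (a + b)  # P(term | positive class)
    q = c / (c + d)  # P(term | negative class)
    return p, q


def relevance_frequency(a, c):
    # Asymmetric: the weight grows only when the term favours the positive
    # class; a term concentrated in the negative class stays close to 1.
    return math.log2(2 + a / max(1, c))


def abs_log_odds_ratio(p, q, eps=1e-6):
    # Symmetric: |log OR| is large when the term favours either class.
    # Probabilities are clamped to (eps, 1 - eps) to avoid log(0).
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return abs(math.log((p * (1 - q)) / ((1 - p) * q)))


if __name__ == "__main__":
    # Hypothetical corpus: 100 positive and 900 negative documents.
    terms = {
        "positive-indicative": (40, 60, 5, 895),
        "negative-indicative": (1, 99, 360, 540),
    }
    for name, (a, b, c, d) in terms.items():
        p, q = occurrence_probabilities(a, b, c, d)
        print(f"{name}: p={p:.3f} q={q:.3f} "
              f"rf={relevance_frequency(a, c):.2f} "
              f"|logOR|={abs_log_odds_ratio(p, q):.2f}")
```

With these counts, rf gives about 3.32 to the positive-indicative term but only about 1.00 to the negative-indicative one. The absolute log odds ratio gives both terms a high weight (about 4.78 and 4.19). This is the asymmetric versus symmetric distinction the abstract refers to.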

Keywords

text categorization; supervised term weighting; symmetric schemes; term occurrence probabilities; support vector machines

