
個案公司E-mail文本分級探討

Protective Email Classification: A Case Study

Advisors: 柯士文, 鍾斌賢

Abstract


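In this experiment, text is represented as vectors using Word2vec and bag-of-words (TF-IDF). These representations are combined with four traditional machine learning classifiers, namely support vector machine (SVM), k-nearest neighbors (KNN), gradient boosting decision tree (GBDT), and random forest, and with one deep learning model, long short-term memory (LSTM), to classify the email text collected by the case company into two classes (approval required, approval not required). The results produced by each combination of vector representation and classifier are then examined and compared. Based on the experimental results, the combination that best fits the case company's current situation is recommended to the company: the Word2vec vector representation paired with the SVM algorithm.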

Parallel Abstract


In this study we represent the email text as vectors using Word2vec and bag-of-words (TF-IDF), and combine these representations with four machine learning algorithms (SVM, KNN, GBDT, and Random Forest) as well as one deep learning model, LSTM. We use these tools to classify the email text into two classes (security and normal), and then examine and compare the results of each combination of vector representation and classifier. Based on the results, we recommend the combination of Word2vec and SVM to the company.
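
The general shape of the pipeline described in the abstract (turn each email into a vector, then hand the vectors to a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes TF-IDF weighting with a smoothed IDF, a cosine-similarity KNN classifier with k = 3, and a handful of invented toy emails; the labels `approval_required` and `no_approval` and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy emails, invented for illustration only.
# The case company's data set is not reproduced here.
TRAIN = [
    ("please approve the attached contract before sending", "approval_required"),
    ("customer price quotation attached needs manager approval", "approval_required"),
    ("confidential contract draft for manager review", "approval_required"),
    ("lunch menu for friday team gathering", "no_approval"),
    ("reminder team meeting moved to friday", "no_approval"),
    ("office printer is fixed thanks team", "no_approval"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real email text needs more care)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, which equals their cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, idf, k=3):
    """Majority vote among the k training emails most similar to the query."""
    query = tfidf(text, idf)
    ranked = sorted(train_vecs, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


idf = fit_idf([text for text, _ in TRAIN])
train_vecs = [(tfidf(text, idf), label) for text, label in TRAIN]

print(knn_predict("contract attached for manager approval", train_vecs, idf))
print(knn_predict("team lunch on friday", train_vecs, idf))
```

KNN over TF-IDF is used here only because it fits in a few lines of standard-library Python. The combination the study recommends, Word2vec with SVM, would need a trained embedding model and an SVM solver, which are outside the scope of this sketch.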

