
Anti-Spam Filter Based on Data Mining and Statistical Test


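Abstract


With the spread of the Internet and the widespread use of e-mail, the volume of spam keeps growing and inconveniences e-mail users. Current spam research focuses mostly on filtering algorithms: using various artificial intelligence or data mining techniques to generate spam classification rules. As these rules accumulate over the years, however, a mail server may come to hold outdated or useless rules, and applying outdated rules can raise the misclassification rate. In addition, an excessive number of classification rules degrades the filtering performance of the mail server. This study combines data mining and statistical testing into a complete solution for spam prevention: data mining techniques generate the spam classification rules, and a statistical test decides whether each rule is used to classify mail. Through derivation of the statistical model, all adopted rules can be guaranteed to be highly accurate and highly stable, which improves both the effectiveness and the efficiency of spam filtering.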

Parallel Abstract


Because of the popularity of the Internet and the wide use of e-mail, the volume of spam mail keeps growing rapidly. This growing volume annoys users and significantly reduces work efficiency. Most previous research has focused on developing spam filtering algorithms, using statistical or data mining approaches to derive precise spam rules. However, mail servers may generate new spam rules constantly, so a server ends up carrying an ever larger rule set. Because spam evolves continuously, some of these rules become out of date or imprecise, and applying them may cause misclassification. In addition, too many rules on a mail server may degrade the performance of its mail filters. In this research, we propose an anti-spam approach that combines data mining with statistical testing. We use data mining to generate spam rules and a statistical test to evaluate their effectiveness. Only rules found to be significant are used to classify e-mail; the remaining rules can be eliminated to improve performance.
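The abstract does not say which statistical test is applied to the rules. As a minimal sketch of the general idea, the example below assumes an exact one-sided binomial test on each rule's precision: a rule is kept only if the data reject the hypothesis that its precision is at or below a threshold. The function names, the rule names, the counts, the 0.9 threshold, and the 0.05 significance level are all hypothetical and are not taken from the paper.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def keep_rule(hits, fired, min_precision=0.9, alpha=0.05):
    """Decide whether a spam rule is significant enough to stay active.

    hits  : number of messages the rule flagged that really were spam
    fired : total number of messages the rule flagged
    Null hypothesis (assumed here): true precision <= min_precision.
    The rule is kept only if that hypothesis is rejected at level alpha.
    """
    if fired == 0:
        return False  # no evidence at all, so the rule is not used
    p_value = binomial_tail(hits, fired, min_precision)
    return p_value < alpha


# Hypothetical rule statistics: rule name -> (hits, fired)
rules = {
    "subject_contains_free": (198, 200),
    "has_html_only_body": (45, 60),
    "sender_not_in_contacts": (9, 10),
}

# Only rules that pass the test are used for classification.
active = {name for name, (hits, fired) in rules.items() if keep_rule(hits, fired)}
```

In this sketch the third rule is dropped even though its observed precision is 90%, because ten observations are too few to reject the null hypothesis. That illustrates how a test can filter out rules whose apparent accuracy is not yet stable, in the spirit of the abstract, under the assumptions stated above.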

Keywords

Spam mail; Data Mining; Statistical Test

