  • Thesis


Design and Implementation of a Clustering-Based E-mail Filtering System

Advisor: 李良德


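In today's e-mail environment, the rapid and massive proliferation of spam is a serious problem. The flood of spam wastes network bandwidth, overloads e-mail servers and sharply degrades their performance, and can lead users to delete legitimate e-mail by mistake. Moreover, high-performance e-mail filtering systems currently on the market often cost about a million dollars. In addition, in existing single-machine e-mail filtering systems, the e-mail server (Mail Transfer Agent, MTA) inherently has only one incoming queue; even if several e-mail filtering processes run inside the server, they must all read pending e-mail from this one queue. When a large volume of e-mail is waiting in that queue, mail delivery time is delayed considerably. In this thesis we discuss an architecture for building a low-cost, high-performance, and highly accurate e-mail filtering system. The architecture is based on a cluster system and uses SpamAssassin as the e-mail filter. The cluster achieves an effect similar to multiple incoming queues and distributes the processing of large volumes of e-mail, so that the e-mail server can operate at its best performance and mail delivery time is reduced.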


Cluster system; spam mail


In the current e-mail environment, the rapid and massive proliferation of spam is a serious problem. Spam wastes network bandwidth and the CPU time of e-mail servers, which significantly reduces server performance. Faced with large volumes of spam, users may also delete legitimate e-mail by mistake. Moreover, most high-capacity, high-efficiency commercial off-the-shelf e-mail filtering systems are expensive, usually costing about a million dollars at present. In addition, in a single-computer e-mail filtering system the MTA (Mail Transfer Agent) of the mail server has only one incoming mail queue, which limits the benefit of running multiple e-mail filtering processes on the server. Because all waiting e-mail must be read from this single queue, mail delivery time can be prolonged significantly. In this thesis, a clustering-based spam filtering system is proposed in order to build a cost-effective, high-performance, and highly accurate e-mail filtering system. The proposed clustering-based architecture uses SpamAssassin for e-mail filtering. Multiple incoming queues are implemented in the clustering system so that large volumes of e-mail can be processed in parallel. The experimental results show that the performance of the e-mail server can be improved and the mail delivery time can be reduced significantly.


Spam mail; clustering system


