Spam is a serious and rapidly growing problem in today's e-mail environment. Large volumes of spam waste network bandwidth and overload e-mail servers, sharply degrading server performance, and users may mistakenly delete legitimate messages buried among the spam. High-performance commercial off-the-shelf e-mail filtering systems are also expensive, often costing on the order of a million dollars. In addition, in a single-machine filtering system the e-mail server (Mail Transfer Agent, MTA) inherently has only one incoming queue. Even when several filtering processes run on the server, they must all read pending messages from that single queue, so when a large number of e-mails are waiting in it, mail delivery time is prolonged significantly. This thesis proposes a cluster-based architecture for building a low-cost, high-performance, and highly accurate e-mail filtering system. The architecture uses SpamAssassin as the e-mail filter and uses the cluster to achieve the effect of multiple incoming queues, so that large volumes of e-mail can be processed in parallel. The experimental results show that the performance of the e-mail server is improved and that mail delivery time is reduced significantly.
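The idea of spreading incoming mail over several queues can be illustrated with a minimal sketch. It is not the system built in this thesis: threads and in-memory queues stand in for cluster nodes, messages are assigned round-robin (an assumed dispatch policy), and `toy_spam_score`, `filter_worker`, and `filter_with_cluster` are hypothetical names, with the keyword score serving only as a placeholder for a real filter such as SpamAssassin.

```python
import queue
import threading
from itertools import cycle


def toy_spam_score(message: str) -> float:
    """Hypothetical placeholder for a real filter: one point per keyword found."""
    keywords = ("free", "winner", "click here")
    text = message.lower()
    return sum(1.0 for keyword in keywords if keyword in text)


def filter_worker(inbox, verdicts, lock, threshold):
    """Drain one node's incoming queue until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        index, message = item
        is_spam = toy_spam_score(message) >= threshold
        with lock:  # the verdict table is shared by all workers
            verdicts[index] = is_spam


def filter_with_cluster(messages, num_nodes=3, threshold=1.0):
    """Assign messages round-robin to per-node queues and filter them in parallel.

    Returns one boolean per message, in input order (True means spam).
    """
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    verdicts = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=filter_worker, args=(inbox, verdicts, lock, threshold))
        for inbox in inboxes
    ]
    for worker in workers:
        worker.start()
    # Round-robin dispatch: each simulated node receives its own incoming queue.
    for item, inbox in zip(enumerate(messages), cycle(inboxes)):
        inbox.put(item)
    for inbox in inboxes:
        inbox.put(None)  # sentinel: no more mail for this node
    for worker in workers:
        worker.join()
    return [verdicts[i] for i in range(len(messages))]
```

Calling `filter_with_cluster(mails, num_nodes=2)` on a list of message strings returns a spam verdict per message. Each worker reads only from its own queue, so no single queue becomes the shared point of contention described above.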