The Application of Back-Propagation Network in E-mail Classification

Advisor: 黃有評


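With the spread of the Internet and the continual increase in network bandwidth, the first thing people usually do after turning on a computer is go online to check their mail. Just as a household mailbox is often stuffed with advertising flyers, traces of such spam are frequently found in everyone's e-mail inbox. Excessive spam has become the biggest nuisance for users receiving mail. The use of e-mail will not stop because spam exists, but the proliferation of spam annoys users: receiving and reading mail takes longer, and time and mental energy are wasted on judging and handling messages. Moreover, large amounts of spam often occupy mailbox space; if it is not cleared promptly, even normal mail cannot be received. The main purpose of this study is to develop an e-mail classification system based on a back-propagation network, using automatic document classification techniques. Important feature values are first extracted from the mail, and Chinese word segmentation is used to segment the mail subject and content. Keywords are then filtered and assigned weights; once the important keywords of a message have been found, their similarity to the mail categories is calculated. Finally, the automatic learning of the back-propagation network is incorporated to classify e-mail and automatically filter out spam. Experimental results show that the system achieves a certain degree of success in e-mail classification and a good recognition rate in spam detection. This study can indeed help lighten users' burden after receiving mail and keep network resources flowing smoothly, reducing the time users spend processing mail and lowering the amount of spam.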


With the spread of the Internet and steadily increasing network bandwidth, the first thing many people do after turning on a computer is read their e-mail. Just as a household mailbox is often stuffed with advertising flyers, an e-mail inbox frequently fills with spam. Excessive spam has become the biggest nuisance users face when receiving e-mail. People will not stop using e-mail because spam exists, but the flood of spam is a constant annoyance: it lengthens the time needed to receive and read mail, and users must spend additional effort filtering and deleting unwanted messages. Large volumes of junk e-mail also take up mailbox space; if the mailbox is not cleaned up promptly, even legitimate mail can no longer be received. The main purpose of this research is to develop an e-mail classification system based on a Back-Propagation Network, using automatic text categorization techniques. We first extract the important features from each message, then apply a Chinese word segmentation algorithm to the message subject and content. Next, we use a keyword selection and weighting algorithm to identify the important keywords of each message and calculate their similarity to each mail category. Finally, we combine the similarity values with a Back-Propagation Network to classify e-mail and automatically filter out spam. The experimental results show that the system can perform the classification task and achieves good recall and precision in spam filtering. The system is intended to lighten users' burden after receiving mail and to reduce the consumption of network resources; in our experiments it reduced both e-mail processing time and the amount of spam.
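
The pipeline summarized above (keyword weighting, similarity to each mail category, then a Back-Propagation Network) can be illustrated with a minimal sketch. It is not the system built in this research: the abstract does not state the weighting scheme, the similarity measure, or the network layout, so the smoothed TF-IDF weights, cosine similarity, category centroids, and the single hidden layer of three units below are all assumptions. The names `fit_idf`, `weight`, `centroid`, `cosine`, `BackPropNet`, and `build_classifier` are hypothetical, and the toy English tokens stand in for the output of a Chinese word segmenter.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency over tokenised training messages (assumed scheme)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def weight(tokens, idf):
    """TF-IDF keyword weights; tokens unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] for t, c in tf.items()} if total else {}


def centroid(vectors):
    """Average keyword-weight vector representing one mail category."""
    acc = Counter()
    for v in vectors:
        for t, w in v.items():
            acc[t] += w
    return {t: w / len(vectors) for t, w in acc.items()}


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (assumed similarity measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BackPropNet:
    """One hidden layer, one sigmoid output, trained by per-sample gradient descent."""

    def __init__(self, n_in, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append the bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return xb, h, y

    def train(self, samples, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, target in samples:
                xb, h, y = self.forward(x)
                d_out = (y - target) * y * (1.0 - y)  # squared-error gradient at the output
                # Hidden deltas use the output weights before they are updated.
                d_hid = [d_out * self.w2[j] * h[j] * (1.0 - h[j])
                         for j in range(len(h))]
                for j, hv in enumerate(h + [1.0]):
                    self.w2[j] -= lr * d_out * hv
                for j, row in enumerate(self.w1):
                    for i, xv in enumerate(xb):
                        row[i] -= lr * d_hid[j] * xv


def build_classifier(spam_docs, ham_docs):
    """Return a function mapping a token list to a spam score in (0, 1)."""
    idf = fit_idf(spam_docs + ham_docs)
    spam_c = centroid([weight(d, idf) for d in spam_docs])
    ham_c = centroid([weight(d, idf) for d in ham_docs])

    def features(tokens):
        v = weight(tokens, idf)
        return [cosine(v, spam_c), cosine(v, ham_c)]  # similarity to each category

    net = BackPropNet(n_in=2)
    net.train([(features(d), 1.0) for d in spam_docs] +
              [(features(d), 0.0) for d in ham_docs])
    return lambda tokens: net.forward(features(tokens))[2]


# Toy, made-up data; real input would be word-segmented subject and body text.
SPAM = [["free", "offer", "win", "prize"], ["cheap", "offer", "buy", "now"],
        ["win", "free", "money", "now"]]
HAM = [["meeting", "schedule", "project", "report"],
       ["project", "deadline", "report", "review"],
       ["lunch", "meeting", "tomorrow", "schedule"]]

spam_score = build_classifier(SPAM, HAM)
print(spam_score(["free", "prize", "now"]))          # score for a spam-like message
print(spam_score(["project", "meeting", "report"]))  # score for a normal message
```

In this sketch the network sees only two inputs per message, its similarity to the spam category and to the normal category, so the learned decision is easy to inspect; a score above 0.5 is read as spam. Feeding the network more features, such as sender or header attributes, would only change the `features` function.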


