決策樹法在垃圾郵件過濾之應用

Application of Decision Tree Methods on Spam Filtering

Advisor: 陳景祥

Abstract


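Advances in computer technology and the growth of the Internet have made e-mail an important medium of everyday communication. Because e-mail is easy to send, it has become a prime marketing channel for product advertising, which has given rise to the problem of spam. Spam is growing rapidly; it not only consumes network resources and burdens systems but also wastes recipients' time. As a result, spam filtering has become an active research area in recent years. This study uses three decision tree methods from data mining to classify e-mails as spam or legitimate on the basis of 14 e-mail features, and compares them with the Bayes classifier, currently the method most commonly used for spam filtering. We find that, when both classification results and risk costs are considered, the C4.5 decision tree performs best, and its classification time is also shorter than that of the other two decision tree methods. We also find that applying a white list before classification reduces the probability that legitimate mail is misclassified.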

Parallel Abstract


With advances in computer science and the growth of the Internet, e-mail has become an important communication medium in daily life. E-mail advertising has become one of the most efficient marketing techniques, and this has given rise to the problem of spam. The volume of spam is increasing quickly; it not only consumes network resources and burdens systems but also wastes recipients' time. Spam filtering has therefore become a popular research topic in recent years. In this study, we use three decision tree methods from data mining to classify e-mails as “spam” or “legitimate” based on fourteen e-mail characteristics. The three decision tree methods are compared with the Bayes classifier, which is currently the method most often used in spam filtering. When classification performance and misclassification costs are both considered, the C4.5 method gives the best outcome in our case study of spam mail, and it has the shortest test time of the three decision tree methods. Our study also shows that using a white list before classification reduces the chance of misclassifying legitimate mail.
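The two ideas that close the abstract, checking a white list before classification and weighing misclassification costs, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the feature name `num_links`, the one-rule `toy_classifier` standing in for a trained decision tree, the sample mails, and the cost ratio `LEGIT_COST` (the abstract gives neither the 14 features nor the cost values).

```python
from typing import Callable, Iterable, Tuple

# Hypothetical cost ratio: one blocked legitimate mail counts as much as
# LEGIT_COST missed spam mails. The value 9 is illustrative only.
LEGIT_COST = 9


def filter_mail(sender: str, features: dict, whitelist: set,
                classify: Callable[[dict], str]) -> str:
    """Return 'legitimate' for whitelisted senders, else defer to the classifier."""
    if sender.lower() in whitelist:
        return "legitimate"
    return classify(features)


def weighted_cost(pairs: Iterable[Tuple[str, str]],
                  legit_cost: int = LEGIT_COST) -> int:
    """Sum misclassification costs over (actual, predicted) label pairs."""
    cost = 0
    for actual, predicted in pairs:
        if actual == "legitimate" and predicted == "spam":
            cost += legit_cost  # legitimate mail blocked: the expensive error
        elif actual == "spam" and predicted == "legitimate":
            cost += 1           # spam let through: the cheap error
    return cost


def toy_classifier(features: dict) -> str:
    """Stand-in for a trained decision tree; the single rule is made up."""
    return "spam" if features.get("num_links", 0) > 3 else "legitimate"


# Made-up sample: (sender, features, actual label).
whitelist = {"colleague@example.com"}
mails = [
    ("colleague@example.com", {"num_links": 5}, "legitimate"),
    ("promo@example.net", {"num_links": 8}, "spam"),
    ("friend@example.org", {"num_links": 1}, "legitimate"),
]

without = [(actual, toy_classifier(f)) for _, f, actual in mails]
with_wl = [(actual, filter_mail(s, f, whitelist, toy_classifier))
           for s, f, actual in mails]
print(weighted_cost(without), weighted_cost(with_wl))
```

In this toy sample the classifier alone blocks the link-heavy mail from the whitelisted colleague, while the white list lets it through before the classifier is consulted, so the weighted cost falls. The sketch shows only the mechanism, not the size of the effect reported in the study.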

Parallel Keywords

Spam; Decision Tree; C4.5; C&RT; QUEST; Bayes Classifier

Cited By


李冠諭 (2017). 我國營所稅稽核之研究 [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2017.00959
陳宇邦 (2011). 順序型變數轉換在決策樹之應用 [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2011.00383
陳婷婷 (2009). 以資料探勘技術分析拍賣網站數位相機購物消費行為 [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2009.00396
吳泳慶 (2007). 中文垃圾郵件客製化過濾系統之研究 [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2007.00125
葉采羚 (2006). 垃圾郵件過濾:資料採礦與中文斷詞技術之應用 [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2006.00611
