This thesis proposes an image spam classifier that remains effective under image rotation, scaling, and background interference. The method applies color layering to the image to reduce interference from background colors, performs recognition with seven parameters that experiments show to be resistant to scaling, and represents each extracted image object as an ellipse. Rotation of the ellipse is handled by transforming its coordinate axes, which keeps the error introduced by rotating the picture to a minimum. This thesis also proposes a hybrid image spam filter with two parts. The first part uses Optical Character Recognition (OCR) to extract the text embedded in the image and checks it against a keyword list to identify spam. The second part uses the proposed rotation- and scaling-resistant SVM classifier to filter spam images that have been rotated or scaled. Experimental results show that the rotation- and scaling-resistant classifier achieves good classification performance.
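The abstract does not list the seven scale-resistant parameters or give the exact axis transformation, so the following is only a minimal sketch of the general idea and not the thesis's actual method. It assumes that an image object is available as a list of foreground pixel coordinates and that the ellipse is the moment-equivalent ellipse of those pixels. The function names `ellipse_from_pixels`, `to_ellipse_axes`, and `axis_ratio` are hypothetical and introduced only for illustration.

```python
import math


def ellipse_from_pixels(pixels):
    """Return the moment-equivalent ellipse of a list of (x, y) pixels.

    Assumption: this moment-based fit stands in for the thesis's own
    ellipse representation, which the abstract does not specify.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("no foreground pixels")
    # Centroid of the object.
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments (the covariance of the pixel positions).
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Orientation of the major axis relative to the x axis.
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # Eigenvalues of the covariance matrix give the variance along each axis.
    spread = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lam_major = (mu20 + mu02 + spread) / 2
    lam_minor = max((mu20 + mu02 - spread) / 2, 0.0)
    return {
        "center": (cx, cy),
        "theta": theta,
        # For a uniformly filled ellipse the variance along a semi-axis a is a^2 / 4.
        "semi_major": 2 * math.sqrt(lam_major),
        "semi_minor": 2 * math.sqrt(lam_minor),
    }


def to_ellipse_axes(point, ellipse):
    """Express a point in the ellipse's own coordinate axes.

    Translating to the center and rotating by -theta removes the effect of
    the object's orientation in the image.
    """
    cx, cy = ellipse["center"]
    dx, dy = point[0] - cx, point[1] - cy
    c, s = math.cos(ellipse["theta"]), math.sin(ellipse["theta"])
    return (c * dx + s * dy, -s * dx + c * dy)


def axis_ratio(ellipse):
    """Minor-to-major axis ratio: one example of a scale-resistant parameter."""
    if ellipse["semi_major"] == 0:
        return 1.0
    return ellipse["semi_minor"] / ellipse["semi_major"]


# A 40x10 block of pixels, the same block rotated by 90 degrees,
# and the same shape at twice the size.
block = [(x, y) for x in range(40) for y in range(10)]
rotated = [(y, x) for x, y in block]
scaled = [(x, y) for x in range(80) for y in range(20)]

ratios = [axis_ratio(ellipse_from_pixels(p)) for p in (block, rotated, scaled)]
```

In this sketch the axis ratio is unchanged by the 90-degree rotation and changes only slightly under scaling (the small difference comes from pixel discretization), which is the kind of property a rotation- and scaling-resistant feature needs. A feature vector built from such values could then be passed to an SVM classifier.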