Websites have become increasingly important as sources of information and services. In schools, most administrative systems use the web as the main channel of communication with students, so site administrators must safeguard both the security of the site and the integrity of its information. Web robots typically operate hidden within genuine traffic, and malicious web spam robots spread advertising comments, junk messages, or links to phishing sites. This study targets web spam robots that post advertising comments. We collect the web server logs of the Graduate School of Criminology, National Taipei University, pre-process the data, and draw on prior web robot research to identify features suited to detecting comment-spam robots. We then train a neural network model with the multilayer back-propagation learning algorithm. The extracted features fall into four categories (resource acquisition method, time features, resource request method, and server response status), from which we select nine effective features. To prepare training data for the neural network, we propose three methods for pre-classifying sessions. The experiments were run with the neural network classifier in Weka using 10-fold cross-validation. The classifier correctly identified 25,697 sessions, with 99.2% precision and 88.6% recall, which indicates a reasonable level of detection capability. The trained classifier can support defenses against comment-spam robots by adjusting CAPTCHA difficulty dynamically: suspected spam robots receive more complex images, while ordinary users receive simpler ones, which gives a friendlier commenting interface.
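To make the four feature categories above more concrete, the following Python sketch groups log records into sessions and computes one illustrative feature per category. It is not the feature set used in this study: the nine selected features are not listed in the abstract, so the feature names (`image_ratio`, `mean_gap_seconds`, `post_ratio`, `error_ratio`), the per-IP session definition, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical, simplified log records: (client_ip, timestamp, method, path, status).
# A real log would first have to be parsed from the server's own log format.
LOG = [
    ("10.0.0.1", "2012-03-01 10:00:00", "GET", "/index.html", 200),
    ("10.0.0.1", "2012-03-01 10:00:20", "GET", "/img/logo.png", 200),
    ("10.0.0.1", "2012-03-01 10:01:30", "POST", "/guestbook", 200),
    ("10.0.0.2", "2012-03-01 10:00:00", "POST", "/guestbook", 200),
    ("10.0.0.2", "2012-03-01 10:00:01", "POST", "/guestbook", 200),
    ("10.0.0.2", "2012-03-01 10:00:02", "POST", "/guestbook", 403),
    ("10.0.0.2", "2012-03-01 10:00:03", "POST", "/guestbook", 403),
]

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")


def group_sessions(records):
    """Group records by client IP (an assumed, simplistic session definition)."""
    sessions = defaultdict(list)
    for ip, ts, method, path, status in records:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        sessions[ip].append((when, method, path, status))
    for requests in sessions.values():
        requests.sort(key=lambda r: r[0])  # chronological order within a session
    return dict(sessions)


def session_features(requests):
    """Compute one illustrative feature for each of the four categories."""
    n = len(requests)
    times = [r[0] for r in requests]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        # Resource acquisition: share of requests that fetch images.
        "image_ratio": sum(r[2].lower().endswith(IMAGE_SUFFIXES) for r in requests) / n,
        # Time: mean number of seconds between consecutive requests.
        "mean_gap_seconds": mean(gaps) if gaps else 0.0,
        # Resource request method: share of POST requests.
        "post_ratio": sum(r[1] == "POST" for r in requests) / n,
        # Server response status: share of 4xx responses.
        "error_ratio": sum(400 <= r[3] < 500 for r in requests) / n,
    }


features = {ip: session_features(reqs) for ip, reqs in group_sessions(LOG).items()}
```

Each resulting dictionary could serve as one input vector for a classifier. In this made-up sample, the second client posts once per second and never requests an image, which is the kind of pattern such features are meant to expose.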