
應用多特徵與倒傳遞類神經網路於網路廣告機器人之偵測

Combination of Multiple Feature and Back-Propagation Neural Network for Web Spam Robot Detection

Advisor: 曾俊元

Abstract


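In recent years, website information and applications have become increasingly important. In schools, most administrative systems use websites as the bridge for communicating with students, so website administrators must safeguard both site security and the correctness of information. Web robots often operate hidden within genuine traffic, and malicious web spam robots frequently spread advertising comments, junk messages, or phishing sites across the web. This study targets web spam robots that post advertising comments. We collected the web log files of the Graduate School of Criminology at National Taipei University, preprocessed the data, consulted related web robot literature to identify features suited to comment-spamming robots, and trained a neural network model with the multilayer back-propagation learning algorithm. The extracted features fall into four categories: resource acquisition methods, time features, resource request methods, and server response status. From these, nine effective features were selected. To train the neural network, three effective methods are proposed for pre-classifying sessions. The experiments were run with the Weka neural network classifier using 10-fold cross-validation. The results show that 25697 records were successfully identified, with a precision of 99.2% and a recall of 88.6%, indicating a reasonable level of discrimination. The trained classifier can support defenses against comment-spamming robots by flexibly adjusting the CAPTCHA: suspected spam robots are given more complex images while ordinary users are given simpler ones, providing a friendlier commenting interface.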
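To make the idea of per-session features concrete, the following is a minimal sketch in Python. It does not reproduce the nine features of the thesis, which the abstract does not list. The `Request` record, the function `session_features`, and the four feature names are all hypothetical, chosen only to give one illustrative feature loosely related to each of the four categories named above. In particular, using the referrer as a stand-in for the resource acquisition category is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    """One parsed web log entry (hypothetical layout, not the thesis's schema)."""
    timestamp: float  # seconds since the start of the log
    method: str       # HTTP method, e.g. "GET" or "POST"
    status: int       # HTTP response status code
    referrer: str     # empty string when no referrer was logged


def session_features(requests):
    """Compute illustrative features for one session (assumed, not the thesis's nine)."""
    if not requests:
        raise ValueError("a session needs at least one request")
    n = len(requests)
    times = sorted(r.timestamp for r in requests)
    # Gaps between consecutive requests; empty for a single-request session.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        # Resource request method: share of POST requests.
        "post_ratio": sum(r.method == "POST" for r in requests) / n,
        # Server response status: share of 4xx responses.
        "error_ratio": sum(400 <= r.status < 500 for r in requests) / n,
        # Resource acquisition (assumed proxy): share of requests with no referrer.
        "empty_referrer_ratio": sum(r.referrer == "" for r in requests) / n,
        # Time feature: mean gap between consecutive requests, in seconds.
        "mean_gap_seconds": mean(gaps) if gaps else 0.0,
    }


# Made-up session for illustration only; not data from the thesis.
session = [
    Request(0.0, "GET", 200, ""),
    Request(1.0, "POST", 200, ""),
    Request(2.0, "POST", 404, ""),
    Request(4.0, "POST", 200, "http://example.com/board"),
]
features = session_features(session)
```

Each session's feature dictionary could then be turned into one input vector for a classifier such as the back-propagation network described above.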

Parallel Abstract


School web applications provide a great deal of useful information and services and have become major communication platforms between students and the school, so website managers need to ensure web application security and information integrity. Web robots usually operate hidden within real traffic, and malicious web robots usually spread spam, junk, or phishing messages. This paper targets web robots that spread spam messages, using web logs collected from the website of the Graduate School of Criminology, National Taipei University. We preprocess the web log data, consult related research, identify appropriate features for web spam robots, and use the back-propagation algorithm to train a neural network for web spam robot detection. Based on resource gathering methods, time patterns, resource request methods, and response status, we select nine effective features, and we propose three methods for pre-classifying sessions. Using 10-fold cross-validation, our experimental results show that 25697 sessions are detected with 99.2% accuracy and an 88.6% recall rate. Our trained classifier can assist current web spam robot detectors by dynamically presenting a complicated CAPTCHA to suspicious spam users and a simple CAPTCHA to normal users, improving the user interface.
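The adaptive CAPTCHA idea in the last sentence can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the function name `choose_captcha`, the score range, and the default threshold of 0.5 are all hypothetical, and the abstract does not say how the classifier output is mapped to a CAPTCHA difficulty.

```python
def choose_captcha(robot_score, threshold=0.5):
    """Pick a CAPTCHA difficulty from a classifier score in [0, 1].

    robot_score is assumed to be the classifier's estimate that the session
    belongs to a spam robot. The threshold is an illustrative default.
    """
    if not 0.0 <= robot_score <= 1.0:
        raise ValueError("robot_score must lie in [0, 1]")
    # Suspicious sessions get the harder challenge, everyone else the easier one.
    return "complex" if robot_score >= threshold else "simple"


suspicious = choose_captcha(0.9)  # session the classifier considers robot-like
ordinary = choose_captcha(0.1)    # session the classifier considers human-like
```

Lowering the threshold would show the harder CAPTCHA to more sessions, trading convenience for ordinary users against a smaller chance that a robot receives the easy challenge.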

