

Query-based Backpropagation Neural Networks and its Application in Anomaly Detection

Advisors: 張瑞益, 郭真祥


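Anomaly detection is an important problem that has been studied widely across many academic and application domains, and neural networks have given rise to many anomaly detection techniques. An anomaly usually refers to a rarely occurring event or behavior: data from normal events make up the overwhelming majority, while data from anomalous events account for only a small fraction. Neural networks are a powerful technique for solving many real-world problems; however, most data available in the real world are imbalanced. Empirical studies of the backpropagation algorithm show that, for class-imbalanced problems, computing the mean square error during network training increases computational complexity and slows network convergence. Consequently, when faced with complex, imbalanced data, the backpropagation algorithm cannot adequately represent the characteristics of the data distribution, which leads to incorrect classification. This study attempts to address these problems by using a query-based learning technique to reduce the number of training samples required and the dimensionality of the feature space. The proposed method improves the convergence time of the backpropagation algorithm and its generalization ability. Classification accuracy is maintained or improved by reducing or removing uninformative training samples and irrelevant data attributes. The experimental data are the well-known KDD 1999 network intrusion detection competition data set and benchmark data sets from the UCI machine learning repository. The experimental results show that a backpropagation neural network that applies the query-based method, combining training-data selection and feature selection, performs better than a conventional backpropagation neural network; the trained network performs very well in convergence time and generalization ability, and effectively mitigates the imbalanced-data problem.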


Anomaly detection is an important problem that has been widely studied within diverse research areas and application domains. Many anomaly detection techniques have been developed in the neural network community. Because anomalies are rare events, the imbalanced data problem arises: positive instances make up a very small percentage of the data, while the large number of negative instances dominates the detection model during training. Neural networks are a powerful technique for solving many problems; however, in many real-world domains, the available data sets are imbalanced. Empirical studies of the backpropagation algorithm show that class imbalance causes the classes to contribute unequally to the mean square error in the training phase. Therefore, when presented with complex imbalanced data sets, the algorithm fails to represent the distributive characteristics of the data and consequently yields unfavorable accuracies across the classes. This study attempts to overcome the problem by applying a query-based learning technique to reduce both the number of training samples required and the number of feature space dimensions. To achieve this goal, we developed a new method that improves backpropagation's convergence time and generalization capability. By removing uninformative training samples and irrelevant features, the method preserves classification accuracy while increasing overall execution efficiency. The data used in the experiments are the well-known KDD Cup 1999 intrusion detection data set and benchmark data sets from the UCI machine learning repository. The experimental results show that our method, which combines queried samples and feature selection, performs better than conventional backpropagation neural networks. The trained network performs very well in convergence time and generalization ability, and mitigates the imbalance problem.
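The unequal contribution of each class to the mean square error can be illustrated with a small sketch. It is not the method developed in the thesis. The function name `class_mse_contributions`, the 95/5 class split, and the constant network output of 0.5 are all assumptions chosen for illustration.

```python
def class_mse_contributions(targets, outputs):
    """Split the mean square error into per-class shares.

    targets: list of 0/1 labels (1 = anomaly, the rare class)
    outputs: list of network outputs in [0, 1]
    Returns {label: part of the overall MSE contributed by that class};
    the parts sum to the overall MSE.
    """
    n = len(targets)
    sums = {0: 0.0, 1: 0.0}
    for t, y in zip(targets, outputs):
        sums[t] += (t - y) ** 2  # squared error of one sample
    return {label: s / n for label, s in sums.items()}


# Hypothetical toy data (an assumption, not taken from the thesis):
# 95 normal samples, 5 anomalies, and an untrained network that
# outputs 0.5 for every input.
targets = [0] * 95 + [1] * 5
outputs = [0.5] * 100

contrib = class_mse_contributions(targets, outputs)
total = sum(contrib.values())
print("normal class share of MSE: ", contrib[0])
print("anomaly class share of MSE:", contrib[1])
print("overall MSE:               ", total)
```

In this toy setting every sample has the same squared error, so the normal class accounts for 95% of the overall error purely because of its size. A gradient step that lowers the overall error is therefore driven mostly by the majority class.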




