
系統日誌異常檢測方法的效能評估

Performance Evaluation of Anomaly Detection Methods for System Log Data

Advisor: 洪士灝

Abstract

Almost all systems produce system logs, which record rich runtime information such as startup, shutdown, login, logout, and abnormal events. Administrators can analyze these logs to diagnose whether a system is behaving abnormally, and in the security context, log messages can also be used to raise warnings about suspicious attack behavior. As systems grow larger and more complex, however, the volume of logs they generate grows substantially, so an automatic anomaly detection mechanism is needed to replace manual inspection. Previous research has shown that anomaly detection models built on Long Short-Term Memory (LSTM) can effectively detect abnormal behavior. Such models use a Top-g parameter to determine the size of the candidate list of predicted results. A larger value is a looser condition that helps improve precision but reduces recall; a smaller value is a stricter condition that improves recall but reduces precision. Users therefore often have to trade precision against recall and cannot optimize both at once. This study proposes a dynamic Top-g method that classifies sequences according to whether they appear in the normal and abnormal datasets, so that when the candidate list is computed, the Top-g parameter is set dynamically according to the category the sequence belongs to. In the experiments, with the dynamic Top-g setting, precision reaches 92% and recall reaches 99%.
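The Top-g decision rule and its dynamic variant can be illustrated with a minimal Python sketch. It is not the thesis implementation: the category names, the per-category g values in `G_BY_CATEGORY`, and the direction of the adjustment (looser for histories seen only in normal data, stricter otherwise) are hypothetical choices made for illustration, and the probability vector stands in for the output of an LSTM next-log-key predictor.

```python
from typing import Dict, Sequence, Set, Tuple

History = Tuple[int, ...]  # a window of preceding log keys


def top_g_candidates(probs: Sequence[float], g: int) -> Set[int]:
    """Return the g log keys with the highest predicted probability."""
    ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return set(ranked[:g])


def is_anomalous(probs: Sequence[float], actual_key: int, g: int) -> bool:
    """Flag an anomaly when the observed key is not among the top-g candidates."""
    return actual_key not in top_g_candidates(probs, g)


def categorize(history: History,
               normal_seen: Set[History],
               abnormal_seen: Set[History]) -> str:
    """Classify a history by the datasets it appears in (hypothetical categories)."""
    in_normal = history in normal_seen
    in_abnormal = history in abnormal_seen
    if in_normal and in_abnormal:
        return "both"
    if in_normal:
        return "normal_only"
    if in_abnormal:
        return "abnormal_only"
    return "unseen"


# Hypothetical values: a larger g is looser, a smaller g is stricter.
G_BY_CATEGORY: Dict[str, int] = {
    "normal_only": 3,
    "both": 2,
    "abnormal_only": 1,
    "unseen": 1,
}


def is_anomalous_dynamic(probs: Sequence[float],
                         actual_key: int,
                         history: History,
                         normal_seen: Set[History],
                         abnormal_seen: Set[History]) -> bool:
    """Apply the top-g test with g chosen from the history's category."""
    g = G_BY_CATEGORY[categorize(history, normal_seen, abnormal_seen)]
    return is_anomalous(probs, actual_key, g)


if __name__ == "__main__":
    # Stand-in for a model's predicted distribution over four log keys.
    probs = [0.5, 0.3, 0.15, 0.05]
    print(is_anomalous(probs, actual_key=2, g=1))  # True: strict list is {0}
    print(is_anomalous(probs, actual_key=2, g=3))  # False: loose list is {0, 1, 2}
```

With a fixed g, the same observed key is flagged or accepted depending only on the list size, which is the precision/recall trade-off described in the abstract. Choosing g per category lets the check be strict for some sequences and loose for others.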

