幾乎所有的系統都會有系統日誌,內容記錄了系統執行時期豐富的資訊,包含開機、關機、登入、登出以及異常事件等資訊,管理人員可透過對系統日誌的分析,診斷系統是否有出現異常的情況,此外,在資安事件方面,也可藉由系統日誌的訊息,對可疑的攻擊行為發出警告。但隨著系統變得龐大且複雜,所產生的系統日誌數量也大幅成長,因此,需要透過自動異常偵測機制來取代人工查找的方式。而根據過去的研究,透過長短期記憶 (Long Short-Term Memory, LSTM) 建立的異常偵測模型,可有效偵測出異常的行為。但由於這類型異常偵測模型是透過Top-g參數的設定,決定出預測結果的候選清單數量,設定較大的數值代表較寬鬆的條件,有助於提升精確度(precision),但會降低召回率(recall),而較小的數值代表較嚴格的條件,能提升召回率,但會降低精確度,使用者常需要權衡準確度及召回率,無法同時兼顧。本研究提出動態Top-g的方法,將序列依照出現在正常及異常資料集的狀況作分類,於計算候選清單時,讓Top-g參數可以依照序列資料所屬的類別作動態設定,實驗結果發現,透過動態Top-g設定參數,精確度可達到92%,召回率可達到99%。
Almost all systems produce system logs, which record rich runtime information such as startup, shutdown, login, logout, and error events. Administrators can analyze these logs to diagnose whether a system is behaving abnormally and, for information security, to raise warnings about suspicious attack behavior. As systems grow larger and more complex, however, the volume of logs they generate has grown substantially, so an automatic anomaly detection mechanism is needed to replace manual inspection. Previous research has shown that anomaly detection models built on Long Short-Term Memory (LSTM) can detect abnormal behavior effectively. Such a model uses a Top-g parameter to determine the size of the candidate list of predictions. A larger value is a looser condition that helps improve precision but reduces recall; a smaller value is a stricter condition that improves recall but reduces precision. Users must therefore trade precision off against recall and cannot maximize both at once. This study proposes a dynamic Top-g method that classifies sequences according to whether they appear in the normal and abnormal datasets. When the candidate list is computed, the Top-g parameter is set dynamically according to the category to which the sequence belongs. Experimental results show that with dynamic Top-g settings, precision can reach 92% and recall can reach 99%.
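The dynamic Top-g idea described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the four category names, the rule for assigning a sequence to a category, and the g values in `G_BY_CATEGORY` are all hypothetical, and `probs` stands in for the next-log-key probabilities that a trained LSTM would output.

```python
from typing import Dict, List, Sequence, Set, Tuple

Window = Tuple[int, ...]  # a fixed-length sequence of log-key ids

# Hypothetical g per category; the abstract does not state the actual values.
G_BY_CATEGORY: Dict[str, int] = {
    "normal_only": 3,    # seen only in normal data: looser check
    "both": 2,           # seen in normal and abnormal data
    "abnormal_only": 1,  # seen only in abnormal data: strictest check
    "unseen": 2,         # not seen in either dataset
}


def top_g_candidates(probs: Sequence[float], g: int) -> List[int]:
    """Return the ids of the g most probable next log keys."""
    ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return ranked[:g]


def categorize(window: Window, normal: Set[Window], abnormal: Set[Window]) -> str:
    """Assign a window to a category by where it occurred (assumed rule)."""
    in_normal, in_abnormal = window in normal, window in abnormal
    if in_normal and in_abnormal:
        return "both"
    if in_normal:
        return "normal_only"
    if in_abnormal:
        return "abnormal_only"
    return "unseen"


def is_anomalous(window: Window, next_key: int, probs: Sequence[float],
                 normal: Set[Window], abnormal: Set[Window]) -> bool:
    """Flag next_key as anomalous if it is outside the Top-g candidate list,
    where g depends on the category of the preceding window."""
    g = G_BY_CATEGORY[categorize(window, normal, abnormal)]
    return next_key not in top_g_candidates(probs, g)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]  # stand-in for LSTM output over 4 log keys
    normal, abnormal = {(1, 2, 3)}, {(3, 2, 1)}
    print(is_anomalous((1, 2, 3), 2, probs, normal, abnormal))
    print(is_anomalous((3, 2, 1), 2, probs, normal, abnormal))
```

In this sketch the same observed key (id 2) is judged differently depending on the window that precedes it: the window seen only in normal data gets a longer candidate list, while the window seen only in abnormal data gets a shorter one, which is the kind of per-category adjustment a fixed Top-g cannot make.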