A Data Mining Analysis Framework for System Logs with Interleaved Events

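We consider a scenario in which engineers analyze system logs sent by users in order to resolve faults. Users typically run into trouble because the system executes a defective program path, which produces a sequence of templates called a "trouble event." The engineers' goal is to explore the nature of unknown trouble events and to determine whether a system log contains any known trouble event. The main challenge is that logs from different tasks are interleaved within the system log and, in addition, large-scale system services generate a wide variety of logs. These factors make troubleshooting extremely time-consuming, because engineers must check how each line of the system log relates to the others. In this thesis, we first propose a new troubleshooting framework, template-pattern-event, which reduces the complexity of the system log by aggregating logs that represent the same behavior into a single pattern. Second, we propose a template clustering algorithm that learns patterns from system log data with interleaving characteristics. Third, we introduce an event trace algorithm that identifies the positions of trouble events in the system log. With the proposed framework, the troubleshooting process becomes simpler and more efficient.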

Keywords

System log analysis; Debugging; Event interleaving

Parallel Abstract

We consider the scenario where engineers analyze system logs sent by users for troubleshooting. Users typically encounter trouble when the system runs a defective program path, which generates a sequence of templates called a "trouble event." The engineers' goal is to explore the nature of unknown trouble events and to determine whether a system log contains any known trouble events. The main challenge is that logs from different tasks are interleaved in the system log and, additionally, large-scale system services generate a wide variety of logs. These factors make troubleshooting extremely time-consuming, as engineers need to check the relevance between the lines of the system log. In this thesis, we first propose a new troubleshooting framework, template-pattern-event, which reduces the complexity of the system log by aggregating logs that represent the same system behavior into the same pattern. Second, we propose an algorithm, Template Clustering, to learn patterns from system log data with interleaving characteristics. Third, we introduce the Event Trace algorithm to identify the positions of trouble events in the system log. With the proposed framework, the troubleshooting process becomes simpler and more efficient.

Parallel Keywords

System log analysis; Troubleshooting; Interleaved system log
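
As a rough illustration of what locating a known trouble event in an interleaved log can involve, the sketch below searches for a sequence of templates as an in-order subsequence of the log, skipping lines that belong to other tasks. It is not the Event Trace algorithm of this thesis, which the abstract does not describe in detail. The function name `find_event`, the template identifiers, and the greedy first-match strategy are all assumptions made for illustration.

```python
from typing import List, Optional


def find_event(log_templates: List[str], event: List[str]) -> Optional[List[int]]:
    """Return the positions of the first in-order occurrence of `event`.

    `log_templates` is the system log reduced to one template id per line
    (hypothetical ids). Lines from other tasks may sit between the lines of
    the event, so the event is matched as a subsequence, not a substring.
    Returns None when the event does not occur in order.
    """
    positions: List[int] = []
    i = 0
    for template in event:
        # Skip interleaved lines until the next template of the event appears.
        while i < len(log_templates) and log_templates[i] != template:
            i += 1
        if i == len(log_templates):
            return None
        positions.append(i)
        i += 1
    return positions


# Hypothetical interleaved log: task A and task B write to the same file.
log = ["A1", "B1", "A2", "C1", "B2", "A3"]
event = ["A1", "A2", "A3"]
print(find_event(log, event))  # [0, 2, 5]
```

Greedy matching reports only the earliest occurrence and does not use task identifiers or timing, so it should be read as a baseline for the problem statement rather than as a substitute for the proposed method.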

