System Log Threat Detection: Building a Graph Knowledge Base and Deep Learning Models

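As cybersecurity threats grow more severe, enterprises are placing increasing importance on effective threat hunting. Traditional threat hunting relies mainly on the expertise of security professionals to write detection rules, but with the development of artificial intelligence, research on threat hunting with machine learning and deep learning has gradually emerged. This thesis proposes a novel threat-hunting method that explores the application of deep learning models in this field in order to reduce the dependence on security experts' knowledge. The work has two stages: building a graphical threat knowledge base, and developing deep learning models that identify attack patterns in unknown graphs. In the first stage, we use Caldera, MITRE's open-source automated attack simulation platform, and Process Monitor, a system process monitoring tool for the Windows operating system, to record the system logs of the victim host. From these logs we extracted 78 attack techniques and constructed 167 attack pattern graphs, which are integrated into a graphical threat knowledge base. This knowledge base not only addresses the shortage of attack data for model training but is also easy to understand and analyze, in line with the emphasis on provenance graph analysis in threat-hunting research. In the second stage, we convert the graph and text-based data into numerical embeddings suitable for deep learning models. We evaluate four embedding models: the TransX family (TransE, TransH, TransR) and SecureBERT, a language model pre-trained on a cybersecurity corpus. We also implement three deep learning models: MLP, RNN, and GraphSAGE. Experimental results show that the deep learning models outperform the open-source Sigma rules on the threat-hunting task. In addition, we analyze the detection coverage and detection effectiveness of the Sigma rules, offering a new perspective for subsequent research.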

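Keywords

Threat Hunting; Deep Learning; Provenance Graph; Knowledge Base; Multilayer Perceptron; Recurrent Neural Network; Graph Neural Network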

Parallel Abstract

With the increasing severity of cybersecurity threats, enterprises are paying growing attention to effective threat-hunting methods. Traditional threat-hunting methods rely mainly on domain-specific expertise to formulate detection rules. However, with the development of artificial intelligence, research on using machine learning and deep learning for this purpose has gradually emerged. This thesis proposes a novel two-stage threat-hunting method that explores the application of deep learning models in this field in order to reduce the reliance on security expert knowledge. The two stages are the construction of a graphical threat knowledge base and the implementation of deep learning models that identify Attack Patterns (APs) in unknown graphs. In the first stage, we use Caldera, an open-source automated attack simulation platform from MITRE, and Process Monitor, a system process monitoring tool for Windows operating systems, to record the system logs of the victim host. From these logs, we extracted 78 ATT&CK Techniques and constructed 167 Attack Pattern Graphs (APGs). These APGs are integrated into a graphical threat knowledge base. This knowledge base not only addresses the shortage of attack data for model training but is also easy to interpret, which is in line with the emphasis on provenance graph analysis in threat-hunting research. In the second stage, we convert the graph and text-based data into numerical embeddings suitable for deep learning models. We evaluate four embedding models: the TransX family (TransE, TransH, TransR) and SecureBERT, a language model pre-trained on a cybersecurity corpus. We then implement three deep learning models (MLP, RNN, and GraphSAGE) to explore the distinct capabilities and features of each. Experimental results show that the deep learning models outperform open-source Sigma rules in our threat-hunting tasks. In addition, we analyze the coverage and detection effectiveness of Sigma rules, providing a new perspective for follow-up research.
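To make the embedding step more concrete, the following minimal Python sketch shows how log events could be viewed as (subject, operation, object) triples in a small graph and how a TransE-style distance scores one triple. It is an illustration under stated assumptions, not the implementation used in this thesis: the event tuples, the function names `build_graph`, `init_embeddings`, and `transe_score`, the embedding dimension, and the random initialization are all hypothetical, and no training is performed.

```python
import math
import random

# Hypothetical log events as (subject, operation, object) triples.
# These values are made up for illustration, not taken from the thesis data.
events = [
    ("powershell.exe", "Process Create", "whoami.exe"),
    ("powershell.exe", "WriteFile", r"C:\Users\Public\out.txt"),
    ("whoami.exe", "ReadFile", r"C:\Windows\System32\config"),
]


def build_graph(triples):
    """Build an adjacency list: node -> list of (relation, neighbor)."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
        graph.setdefault(tail, [])  # make sure sink nodes also appear
    return graph


def init_embeddings(names, dim, rng):
    """Assign each name a random vector (a trained model would learn these)."""
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in names}


def transe_score(h, r, t):
    """TransE distance ||h + r - t||_2; a lower value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


graph = build_graph(events)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
entity_emb = init_embeddings(sorted(graph), dim=8, rng=rng)
relation_emb = init_embeddings(sorted({rel for _, rel, _ in events}), dim=8, rng=rng)

# Score every edge; with untrained embeddings the values carry no meaning yet.
scores = {
    (h, r, t): transe_score(entity_emb[h], relation_emb[r], entity_emb[t])
    for h, r, t in events
}
```

In TransE, training pulls `h + r` toward `t` for observed triples, so the resulting entity vectors can serve as node features for a downstream classifier. TransH and TransR follow the same idea but first project entities onto a relation-specific hyperplane or space.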

Parallel Keywords

Threat Hunting; Deep Learning; Provenance Graph; Knowledge Base; Multilayer Perceptron; Recurrent Neural Network; Graph Neural Network
