Cybersecurity risks and threats are among the most pressing issues of the digital age. Proactive defense, built on effective detection of malicious activity and early warning, is a goal shared across countries and sectors worldwide. We believe the key to reaching it is a deep understanding of the characteristics of malicious activity and of how that activity accesses and operates on system and network resources. In this study, we executed attack scripts, collected the resulting system event logs, and labeled the key attack Techniques and behavioral features in them to build a knowledge base for recognizing MITRE ATT&CK Techniques. We then used this knowledge base to train deep learning models for Technique-based threat hunting.

The research is divided into two stages. The first stage builds the knowledge base for recognizing MITRE ATT&CK Techniques. The second stage trains deep learning models on the knowledge base to perform Technique-based threat hunting.

In the first stage, we collected a dataset of audit logs associated with MITRE ATT&CK Techniques. The CALDERA platform and the APT29 Evaluation provide attack scripts (abilities) labeled with the MITRE ATT&CK Techniques they implement. We executed each script, collected its audit log with Process Monitor on Windows, and labeled the key behavioral features of each Technique. To address the limited volume of audit log data, we augmented the dataset by substituting the artifacts that appear in those behavioral features (e.g., user names, file names, C2 server IPs). In addition, Sigma rules, a standard for detecting threats in Windows system logs, were incorporated into the knowledge base.

The second stage develops a deep learning model that identifies MITRE ATT&CK Techniques in audit logs. We first constructed provenance graphs from the audit logs, which make it possible to trace and understand the order and causal relationships of events.
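As a rough illustration of the provenance graph construction described above, the sketch below turns a list of audit events into a time-ordered adjacency list. The `Event` fields, the `READ_LIKE` set, and the function name are assumptions made for this example and are not taken from the study; only the operation names follow Process Monitor's naming.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float      # event timestamp in seconds (assumed field)
    process: str     # acting process, e.g. "powershell.exe:4321"
    operation: str   # Process Monitor-style operation name
    target: str      # file path, registry key, address, or child process


# Assumed classification: operations where data flows INTO the process.
READ_LIKE = {"ReadFile", "RegQueryValue", "TCP Receive"}


def build_provenance_graph(events):
    """Return {source node: [(time, operation, destination node), ...]}.

    Edges point in the direction of information flow and are stored in
    time order, so causal chains can be followed forward from any node.
    """
    graph = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.time):
        if ev.operation in READ_LIKE:
            src, dst = ev.target, ev.process   # object -> process
        else:
            src, dst = ev.process, ev.target   # process -> object
        graph[src].append((ev.time, ev.operation, dst))
    return dict(graph)


if __name__ == "__main__":
    sample = [
        Event(2.0, "powershell.exe:4321", "WriteFile", r"C:\Users\alice\payload.ps1"),
        Event(1.0, "powershell.exe:4321", "ReadFile", r"C:\Users\alice\notes.txt"),
        Event(3.0, "powershell.exe:4321", "Process Create", "cmd.exe:5000"),
    ]
    for node, edges in build_provenance_graph(sample).items():
        print(node, "->", edges)
```

Storing edges in timestamp order is one simple way to keep the event sequence available alongside the causal structure; a real pipeline would likely also record node types and event attributes.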
To reduce the complexity of these provenance graphs, we applied Causality Preserved Reduction (CPR) for data reduction. The reduced data were then converted into embeddings with SecureBERT and used as model input. Our model combines a sequence model with an attention mechanism to perform the threat hunting task. Furthermore, this study explores how Sigma rules, which are composed of regular expressions (RE), can be combined with the deep learning model to strengthen its ability to identify Techniques.

The results show that our model identifies MITRE ATT&CK Techniques in audit logs well, and that combining Sigma rules with the deep learning model improves its recognition performance on specific Techniques.
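The reduction step mentioned above can be illustrated with a simplified sketch of the idea behind causality-preserving reduction: repeated events on the same edge are merged only when no interleaving event could change what a causal trace through that edge would find. The tuple layout, the merge condition as written here, and the function name are assumptions for illustration, not the exact procedure used in the study.

```python
def reduce_events(events):
    """Merge repeated events on the same (src, dst, op) edge.

    events: iterable of (time, src, dst, op) tuples.
    Returns a list of (start, end, src, dst, op) tuples.

    Simplified rule (an assumption of this sketch): a new event is folded
    into the previous event on the same edge only if, since that previous
    event ended, nothing has flowed into `src` and nothing has flowed out
    of `dst`. Otherwise it is kept as a separate event.
    """
    reduced = []          # each entry: [start, end, src, dst, op]
    last_on_edge = {}     # (src, dst, op) -> index into `reduced`
    last_incoming = {}    # node -> time of the latest event flowing into it
    last_outgoing = {}    # node -> time of the latest event flowing out of it
    never = float("-inf")

    for time, src, dst, op in sorted(events):
        key = (src, dst, op)
        idx = last_on_edge.get(key)
        mergeable = False
        if idx is not None:
            end = reduced[idx][1]
            mergeable = (last_incoming.get(src, never) <= end
                         and last_outgoing.get(dst, never) <= end)
        if mergeable:
            reduced[idx][1] = time            # extend the merged time window
        else:
            last_on_edge[key] = len(reduced)
            reduced.append([time, time, src, dst, op])
        last_outgoing[src] = time
        last_incoming[dst] = time

    return [tuple(entry) for entry in reduced]


if __name__ == "__main__":
    log = [
        (1, "A", "f", "WriteFile"),
        (2, "A", "f", "WriteFile"),   # merged with the event at t=1
        (3, "x", "A", "ReadFile"),    # new information flows into A
        (4, "A", "f", "WriteFile"),   # kept separate: A changed at t=3
    ]
    for entry in reduce_events(log):
        print(entry)
```

In this example the two writes at t=1 and t=2 collapse into one edge, while the write at t=4 stays separate because process `A` received new input in between.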