A Malware Family Classification System Based on Important Execution Sequences Generated by a Self-Attention Mechanism

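In recent years, the volume of newly produced malware has grown rapidly, causing heavy losses to individuals and enterprises worldwide. Understanding the intent and purpose of malware and extracting its key execution behaviors greatly aids malware detection and defense. This thesis proposes an automated system for identifying the important execution-sequence behaviors of malware. Built on a recurrent neural network combined with a self-attention mechanism, the system analyzes the sequence of Windows API call invocations recorded while malware runs, learns and captures the relationships within the sequence, and automatically identifies whether each API call invocation is a key element of the malware's characteristic execution activity and thus reflects its malicious intent. The system comprises three functional modules of our own design: the Embedder, which encodes API call invocations; the Encoder, which computes the importance of each API call invocation within the malware's execution sequence; and the Filter, which selects the malware's important execution sequence. With these three modules we build a pipeline for malware analysis and for classifying family behavioral patterns. The important execution sequences output by the system let security researchers quickly obtain a semantic interpretation of a malware sample's characteristic execution pattern and correctly judge whether a running program is malicious; they also allow the similarity between the characteristics of different malware to be compared, so that malware can be classified or clustered. Our experimental results not only demonstrate the effectiveness of each functional module relative to alternative designs, but also show the system's ability to recognize behavioral characteristics, effectively classifying unknown malware into behavioral patterns and malware families. In addition, we visualize the important execution sequences of malware and analyze the relationships between different behavioral patterns and the characteristic execution patterns of malware families, revealing behavioral diversity among variants of the same family and the sharing of the same behaviors across different families.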

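Keywords

Malware; dynamic behavior analysis; execution sequence analysis; family classification; self-attention mechanism; recurrent neural network; important execution sequence visualization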

Parallel Abstract

In recent years, the amount of malicious software (malware) has increased rapidly, causing heavy losses for individuals and businesses around the world. Understanding the intention of malware and extracting its key execution behaviors considerably helps in detecting and defending against it. This research proposes an automated system for identifying important execution sequence behaviors. The architecture is based on a recurrent neural network combined with a self-attention mechanism. It analyzes the sequence of Windows API call invocations recorded at runtime and captures the relationships between API call invocations, in order to automatically identify whether each API call invocation is a characteristic API call in the malicious behavioral activity and therefore reflects the malware's malicious intent. The proposed system contains three functional modules: the Embedder, which vectorizes API call invocations; the Encoder, which calculates the importance of each API call invocation in the execution profile; and the Filter, which extracts the important API call invocations from the malware. Through these three modules, we establish a pipeline for malware analysis and family classification. The important API call invocations output by the system allow security analysts to quickly obtain a semantic interpretation of the characteristic execution pattern and to classify or cluster malware by calculating a similarity score. Our experiments not only demonstrate the effectiveness of the proposed functional modules compared with other methods, but also show the system's ability to recognize behavioral features, which lets it classify unseen malware correctly into its family. Additionally, we visualize the important API call invocations of the malware and analyze the relationship between different behavioral patterns and family characteristic execution patterns. We found that variants within a malware family exhibit diverse behaviors, and that the same behavioral patterns can appear in many different families.
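
The Embedder, Encoder, and Filter stages described above can be illustrated with a minimal sketch. This is not the thesis implementation: the embedding table, the context vector, the dot-product scoring, the top-k selection rule, and the Jaccard similarity are all assumptions made for illustration, and the recurrent network that the abstract places before the attention step is omitted. The sketch only shows the general idea of assigning an attention weight to each API call invocation, keeping the highest-weighted ones in execution order, and comparing the resulting sequences.

```python
import math

# Hypothetical 2-dimensional embeddings standing in for the Embedder's output.
EMBEDDINGS = {
    "CreateFileW": [0.1, 0.2],
    "Sleep": [0.0, 0.1],
    "WriteProcessMemory": [0.9, 0.8],
    "CreateRemoteThread": [0.8, 0.9],
}
# Hypothetical context vector; in a trained model this would be learned.
CONTEXT = [1.0, 1.0]


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors, context):
    """Stand-in for the Encoder: dot-product score per call, then softmax."""
    scores = [sum(x * c for x, c in zip(vec, context)) for vec in vectors]
    return softmax(scores)


def filter_important(calls, weights, k):
    """Stand-in for the Filter: keep the k highest-weighted calls in execution order."""
    top = sorted(range(len(calls)), key=lambda i: weights[i], reverse=True)[:k]
    return [calls[i] for i in sorted(top)]


def jaccard(a, b):
    """Set-based similarity between two important-call sequences (an assumed metric)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


trace = ["CreateFileW", "Sleep", "WriteProcessMemory", "CreateRemoteThread"]
weights = attention_weights([EMBEDDINGS[c] for c in trace], CONTEXT)
important = filter_important(trace, weights, k=2)
```

In this toy trace the two process-injection calls receive the largest weights and survive the filter. Comparing `important` against the filtered sequence of another sample with `jaccard` gives a score that a nearest-neighbor classifier or a clustering step could use.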

Parallel Keywords

Self-Attention; Recurrent Neural Network; Behavioral pattern; Malware family classification; Malware analysis; Dynamic analysis; Important API calls visualization

References

McAfee. (2018). McAfee Labs Threats Report. Retrieved from https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-jun-2018.pdf

AV-TEST. (2019). Malware. Retrieved from https://www.av-test.org/en/statistics/malware/

Sun, Y. S., Chen, C.-C., Hsiao, S.-W., & Chen, M. C. (2018). ANTSdroid: Automatic Malware Family Behaviour Generation and Analysis for Android Apps. https://doi.org/10.1007/978-3-319-93638-3_48