An Interpretable Ensemble Deep Learning Model for Malware Classification

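With the rapid development of science and the Internet, the fast spread of malware has become a problem that cannot be ignored. Traditional malware detection techniques require a great deal of time and extensive security domain knowledge, so detecting malware quickly, efficiently, and accurately has become an important issue. In recent years, with the rise of artificial intelligence, many researchers have recast malware classification as a problem in mature fields (such as computer vision and natural language processing) in order to build deep learning models, and have achieved good results. However, these methods lack interpretability: beyond classifying malware, they cannot help security experts carry out further analysis. This thesis proposes an attention-based ensemble deep learning classification model that models the different sections of a malware program separately and then combines the resulting models. Validated on two malware datasets, the proposed model achieves good classification results on both and, while classifying malware accurately and efficiently, helps security experts locate suspicious malicious code segments.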

Keywords

malware classification; ensemble model; explainable AI

Parallel Abstract

As society becomes increasingly dependent on technology, malware attacks are becoming more common and more dangerous. Traditionally, experts have relied mainly on static and dynamic analysis to deal with malware. These methods, although effective, require a great deal of time, effort, and domain knowledge to execute properly. Meanwhile, malware authors are getting better at finding loopholes, and the number of new malware samples is growing rapidly. Recently, with the rise of artificial intelligence, many researchers have successfully applied computer vision and natural language processing techniques to malware classification using deep learning models. These methods, however, lack interpretability: although they can classify malware, they cannot assist security experts in any further analysis. In this thesis, we propose an interpretable ensemble deep learning model. We explore the structure of malware programs, including the data section and the code section, build a separate classification model for each section, and combine these models to classify the malware. In summary, the proposed model classifies malware accurately and also helps security experts highlight the parts of the assembly code that differ between malware families, which provides potential interpretability. The experimental results support this claim.
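The abstract does not spell out how the section-level models are combined, so the following is only a minimal sketch of one plausible reading: each section classifier emits class probabilities, and attention-style weights decide how much each section contributes. The function names, the section names `code` and `data`, and all numbers are hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_ensemble(section_probs, section_scores):
    """Combine per-section class probabilities using attention weights.

    section_probs:  dict, section name -> list of class probabilities
    section_scores: dict, section name -> unnormalized attention score
    Returns (combined class probabilities, attention weight per section).
    """
    names = sorted(section_probs)
    # Normalize the raw scores so the section weights sum to 1.
    weights = softmax([section_scores[n] for n in names])
    num_classes = len(section_probs[names[0]])
    combined = [0.0] * num_classes
    for name, w in zip(names, weights):
        for k, p in enumerate(section_probs[name]):
            combined[k] += w * p
    return combined, dict(zip(names, weights))


# Hypothetical outputs of two section-level classifiers for three families.
probs = {"code": [0.7, 0.2, 0.1], "data": [0.3, 0.4, 0.3]}
scores = {"code": 1.0, "data": 0.0}  # assumed attention scores
combined, weights = attention_ensemble(probs, scores)
predicted = max(range(len(combined)), key=combined.__getitem__)
```

In a sketch like this, the returned `weights` are the part a security analyst could inspect: a section with a large weight is one the ensemble leaned on for its decision. How the thesis actually derives and presents such signals is not stated in the abstract.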

Parallel Keywords

malware classification; ensemble model; explainable AI

