A Function Similarity Comparison System Combining Deep Learning Methods and Dynamic Analysis

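A system that compares function similarity can help us understand the behavior and intent of unknown functions. For example, with such a system we can compare the functions of known encryption algorithms against each function in a ransomware sample; the function with the highest similarity is very likely the one responsible for encryption. This approach helps security analysts quickly locate the key parts of ransomware. In research on function similarity comparison, neural network approaches are increasingly popular, but most studies learn only from information obtained through static analysis. For samples that use binary packing or other obfuscation techniques, static analysis can extract little information. We therefore propose a model that combines dynamic analysis with a neural network to solve the function similarity matching problem. During dynamic analysis, in addition to recording the executed assembly instructions, we detect loops on the fly and record loop information. The recorded information is fed into our neural network model function by function; the network also integrates interprocedural call information and finally produces an embedding vector that represents the function. These vectors are compared using cosine similarity. We train our model on 118,529 functions from 2,668 executables, which were compiled with two different compilers and four different optimization settings. We compare against other studies that also use neural networks: the accuracy of our model is 0.2% and 15.9% higher than that of two state-of-the-art approaches, respectively. Our results also show that our model successfully identifies functions in packed programs, whereas the approaches based on static analysis cannot.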

Keywords

Dynamic analysis; function similarity; neural network; natural language processing; binary packing

Parallel Abstract

Comparing function similarity can help us understand the behavior and intent of unknown functions. For example, we can compare cryptographic functions against the functions in a ransomware sample; the function with the highest similarity is likely to be the one responsible for encryption. This method can help security researchers locate the key parts of ransomware more efficiently. In research on function similarity comparison, neural networks are becoming increasingly popular. However, most studies learn only from data collected through static analysis and therefore perform poorly on samples that use binary packing or other obfuscation techniques. To achieve a more in-depth analysis, we propose a model that combines dynamic analysis with a neural network to solve the problem of function similarity comparison. During dynamic analysis, besides recording the executed assembly instructions, we also detect loops on the fly. The recorded data are fed into our neural network model function by function. Our neural network also integrates information about interprocedural calls. Finally, the model generates an embedding vector that represents each function, and these embedding vectors are compared using cosine similarity. We train our model on 118,529 functions from 2,668 executable files. The executable files are compiled with two different compilers and four different optimization settings. Compared with other neural-network-based approaches, the accuracy of our model is 0.2% and 15.9% higher than that of two state-of-the-art approaches, respectively. Our research shows that our model can successfully identify functions in binary-packed malware, while state-of-the-art approaches based on static analysis fail to identify them.

Parallel Keywords

Dynamic Analysis; Function Similarity; Neural Network; Natural Language Processing; Binary Packing
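
The final comparison step described in the abstract, ranking candidate functions by the cosine similarity of their embedding vectors, can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function names, the three-dimensional vectors, and the helper names `cosine_similarity` and `rank_by_similarity` are all hypothetical, and real embeddings would come from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings, made up for illustration only.
reference_encryption_fn = [0.9, 0.1, 0.4]
unknown_functions = {
    "sub_401000": [0.1, 0.8, 0.2],
    "sub_401a30": [0.85, 0.15, 0.45],
    "sub_402200": [-0.3, 0.5, 0.1],
}

ranking = rank_by_similarity(reference_encryption_fn, unknown_functions)
best_name, best_score = ranking[0]
print(best_name, round(best_score, 3))
```

In the ransomware scenario from the abstract, the top-ranked entry would be the candidate an analyst inspects first. Cosine similarity depends only on the direction of the vectors, so embeddings of different magnitude can still be compared directly.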
