以三值樹狀自動機為基礎之惡意程式分析

網路上存在許多不同的資安威脅，其中最惡名昭彰的就是惡意程式。惡意程式指的是那些帶有惡意企圖並且有惡意行為的程式。典型的惡意程式包含了病毒、蠕蟲、木馬與間諜軟體。惡意程式偵測軟體可降低我們被惡意程式攻擊的風險。不同的偵測軟體有其各別的偵測方法，但在現今的偵測軟體中最基本也最普遍被應用的方法是靜態特徵碼比對。然而這種偵測方法已經普遍的被認為無法對抗現今更為進階的惡意程式。進階的惡意程式使用程式混淆的手法改變程式本身的結構，也因此能夠非常輕易的避過偵查軟體。幸運的是，程式本身的語意在經過混淆後通常仍會保持一致，因此一個可行的對策就是設計一個以程式語意為基礎的偵測方法。在這篇論文裡，我們提出一個以程式語意為基礎的惡意程式偵測方法。觀察近年來被提出的各種辦法，我們發覺字串仍然被廣泛的使用為一種特徵碼的形式。我們可以將字串擴充為樹，樹不但更為一般化而且帶有更多的語意資訊。因此，我們使用樹做為我們偵測軟體的特徵碼形式。我們的偵測軟體需要一組惡意程式跟一組正常程式做為輸入。這些程式的語意會以系統呼叫的資料相依圖表示。接著，我們將這些相依圖解析為樹。使用文法推論的方法，我們最後可以得到一個三值的樹狀自動機。一個三值的樹狀自動機有三個互斥的最終狀態：接受、拒絕、與未知。如果我們使用三值樹狀自動機做為我們的惡意程式偵測軟體，根據輸入的程式他會有三種可能的輸出值。如果輸入的程式是一個惡意程式，偵測軟體會輸出是。如果輸入的程式是一個正常程式，偵測軟體會輸出否。其餘的狀況下他會輸出未知。根據我們的實驗結果，我們的偵測軟體有著相當低的誤報。然而，相對的代價是許多程式經過軟體檢查後的結果是未知。

關鍵字

惡意程式分析；惡意程式偵測軟體；文法推論；三值自動機；程式語意；系統呼叫

並列摘要

There exist many security threats on the Internet, and the most notorious is malware. Malware (malicious software) refers to programs that have malicious intention and per- form some harmful actions. Typical malware includes viruses, worms, trojan horses, and spyware. The rst line of defense to deter malware is malware detector. Each malware detector has its own analysis method. The most basic and prevalent methods used in commercial malware detectors are based on syntactic signature matching. It is widely recognized that this detection mechanism cannot cope with advanced malware. Advanced malware uses program obfuscation to alter program structures and therefore can evade the detection easily. However, the semantics of a malware instance is usually preserved after obfuscation. So, it is feasible to develop a malware detector that is based on pro- gram semantics. In this thesis, we propose a semantics-based approach to malware analysis. Observing recently proposed methods for malware detection, we notice that string-based signatures are still used widely. It is natural to extend from string to tree, which is more general and can carry more semantics. Therefore, we use trees as signatures. Our malware detector requires a set of malware instances and a set of benign programs. The semantics of each input program is extracted and represented as a system call dependence graph. The graph is then transformed into a tree. With the set of trees generated from malware and benign programs, we use the method of grammatical inference to learn a 3-valued deter- ministic nite tree automaton (3DFT). A 3DFT has three di erent nal states: accept, reject, and unknown. If we take this 3DFT as the malware detector, it outputs three di erent values. If an input program is a malware instance, the detector outputs true. If an input program is a benign program, the detector outputs false. Otherwise, it outputs unknown. According to our experiments, our detector exhibits very low false positives. However, there is a tradeo that many programs are identi ed as unknown.

並列關鍵字

Malware Analysis ； Malware Detector ； Grammatical Inference ； 3-Valued Automata ； Program Semantics ； System Call

參考文獻

[15] Matt Fredrikson, Somesh Jha, Mihai Christodorescu, Reiner Sailer, and Xifeng Yan.

[18] Clemens Kolbitsch, Paolo Milani Comparetti, Christopher Kruegel, Engin Kirda,

[19] Lorenzo Martignoni, Elizabeth Stinson, and John C. Mitchell. A layered architecture

[8] Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion. Architecture of a

[4] Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Com-

國際替代計量

以三值樹狀自動機為基礎之惡意程式分析

主題瀏覽