網路應用程式中,許多重大的資安漏洞可以歸因於字串操作的問題,而通常可以藉由字串分析來避免這些漏洞。因此字串分析的工具是不可或缺,且正在蓬勃發展的。目前,字串分析的方法主要分為兩種:基於自動機的方法與基於可滿足性的方法。而這兩種方法各有各的優缺點。具體來說,基於自動機的方法不容易產生實際的攻擊例子,而基於可滿足性的方法則不適用於產生可以即時過濾可疑輸入字串的過濾器。在這篇論文中,我們提出了一個新的字串分析的方法,使用邏輯電路表示有限自動機以支援許多字串與自動機的操作。我們的方法可以同時支援攻擊實例生成與過濾器的合成。而藉由這個新的資料結構,我們可以有效地表示具有相當大的態空間與字母的自動機。在針對相當多的開放原始法網路應用程式與著名的攻擊手法進行實驗後,可以看出我們提出的方法相較於之前的字串分析工具是有其優勢的。
Many severe security vulnerabilities in web applications can be attributed to string manipulation mistakes, which can often be avoided through formal string analysis. String analysis tools are indispensable and under active development. Prior string analysis methods are primarily automata-based or satisfiability-based. The two approaches exhibit distinct strengths and weaknesses. Specifically, existing automata-based methods have difficulty in generating counterexamples at system inputs to witness vulnerability, whereas satisfiability-based methods are inadequate to produce filters amenable for firmware or hardware implementation for real-time screening of malicious inputs to a system under protection. In this thesis, we propose a new string analysis method based on a scalable logic circuit representation for (nondeterministic) finite automata to support various string and automata manipulation operations. It enables both counterexample generation and filter synthesis in string constraint solving. By using the new data structure, automata with large state spaces and/or alphabet sizes can be efficiently represented. Empirical studies on a large set of open source web applications and well-known attack patterns demonstrate the unique benefits of our method compared to prior string analysis tools.