An LLVM-Based Binary Translator Framework, Using RISC-V as an Example

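A retargetable binary translation system uses the LLVM compiler infrastructure to translate binary code into LLVM IR and then emit it for different instruction set architectures (ISAs). Combining dynamic binary translation (DBT) with static binary translation (SBT) yields a hybrid binary translation (HBT) system, which can apply LLVM optimizations in static mode and rely on dynamic translation to avoid complex analysis. However, there is no good guideline on how to translate binary code into LLVM IR when building a new binary translator. Differences in the generated LLVM IR can affect LLVM's subsequent optimizations, and different LLVM IR patterns must be analyzed in different ways. As a result, analyses that should be independent of the source ISA are difficult to reuse across binary translation systems, which hinders the development of binary translation. We believe that a frontend interface that separates support for different source ISAs from the development of the binary translation system itself will help future development. To this end, we designed Rabbit, an LLVM-based binary translator framework. The framework supports different source ISAs through frontend plugins, provides a way of writing plugins that simplifies the code translating instructions into LLVM IR, and supports both static and dynamic binary translation. This lowers the difficulty of developing a binary translator frontend and decouples frontend development from the binary translation system, so that developers of the system can focus on binary translation itself. In addition, we mitigate problems of earlier hybrid binary translation systems, namely long optimization time, excessive memory usage, and the high cost of switching between dynamic and static translation, while avoiding performance loss.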

Keywords

binary translation; hybrid binary translation; static binary translation; division translation; multi-entry function

Parallel Abstract

To share software components, such as optimization passes, among several binary translators, we need to establish a convention, such as a calling convention, for the generated LLVM IR. A common convention makes the development of frontends for different ISAs independent of that of the backends. We first defined such a convention. Based on it, we designed and implemented an LLVM-based binary translator framework called Rabbit. Rabbit provides an easy-to-use frontend template and uses frontend plugins to support different source ISAs. The framework supports both dynamic binary translation (DBT) and static binary translation (SBT). Furthermore, we proposed a division method (see Section 6.1) that significantly reduces optimization time (see Fig. 8.3).

Parallel Keywords

binary translation; hybrid binary translation; static binary translation; LLVM; division translation; multi-entry function; RISC-V
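
The frontend-plugin idea described in the abstract can be illustrated with a minimal sketch. This is not Rabbit's actual interface, which the abstract does not specify. The registry, the function names (`register_frontend`, `lift_riscv32`, `translate`), and the convention that guest register `xN` is stored in an LLVM global `@xN` are all assumptions made for illustration. The sketch handles only the RV32I `ADDI` instruction and emits textual LLVM IR, so it runs without LLVM installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: maps a source-ISA name to a function that lifts one
# 32-bit instruction word into textual LLVM IR lines.
FRONTENDS: Dict[str, Callable[[int], List[str]]] = {}


def register_frontend(isa: str):
    """Decorator that registers a lifter under the given ISA name."""
    def wrap(lifter):
        FRONTENDS[isa] = lifter
        return lifter
    return wrap


@dataclass
class Addi:
    rd: int
    rs1: int
    imm: int


def decode_addi(word: int) -> Addi:
    """Decode an RV32I ADDI (opcode 0x13, funct3 0); reject anything else."""
    opcode = word & 0x7F
    funct3 = (word >> 12) & 0x7
    if opcode != 0x13 or funct3 != 0:
        raise NotImplementedError(f"unsupported instruction word {word:#010x}")
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:  # sign-extend the 12-bit immediate
        imm -= 0x1000
    return Addi(rd=(word >> 7) & 0x1F, rs1=(word >> 15) & 0x1F, imm=imm)


@register_frontend("riscv32")
def lift_riscv32(word: int) -> List[str]:
    """Lift one instruction under the assumed convention: guest register xN
    is the LLVM global @xN, and x0 always reads as zero."""
    insn = decode_addi(word)
    if insn.rd == 0:  # writes to x0 are discarded
        return []
    ir: List[str] = []
    if insn.rs1 == 0:
        src = "0"
    else:
        src = "%t0"
        ir.append(f"{src} = load i32, ptr @x{insn.rs1}")
    ir.append(f"%t1 = add i32 {src}, {insn.imm}")
    ir.append(f"store i32 %t1, ptr @x{insn.rd}")
    return ir


def translate(isa: str, word: int) -> List[str]:
    """ISA-independent entry point: dispatch to the registered frontend."""
    return FRONTENDS[isa](word)


if __name__ == "__main__":
    # 0x00508113 encodes `addi x2, x1, 5`
    print("\n".join(translate("riscv32", 0x00508113)))
```

Because every frontend in this sketch would emit the same register convention, a later analysis or optimization step could be written once against `@xN` globals and reused for any source ISA, which is the kind of reuse the abstract argues a shared convention enables.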

