
Improving x86 Processor Performance via Extended Registers (使用擴增暫存器增進 x86 處理器之效能)

Advisor: 徐慰中

Abstract


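IA-32, the 32-bit version of x86, is a widely used instruction set architecture. Its instructions are functionally rich, but it has relatively few registers. If more registers were available in this architecture, performance could be improved by keeping more variables in registers or by increasing parallelism in instruction scheduling. Although Intel64, the 64-bit version, has more registers, existing 32-bit applications cannot make use of them. Many embedded or desktop applications tend to stay 32-bit to avoid the additional data volume. In addition, the operating system or runtime libraries may need to be recompiled or even redeveloped.

In this thesis, we design a method that gives 32-bit programs the opportunity to use these extended registers. In our design, the IA-32 register file is enlarged to 16 registers; we call this architecture RegX16. A 32-bit program can be compiled to RegX16 yet still be linked directly with legacy IA-32 libraries. The result is called a mixed-mode binary, which contains instructions of both the original and the extended architecture. Such an architecture needs a processor mode to identify the architecture of the current instruction so that it is decoded correctly. At run time, the processor mode must be switched when execution passes from IA-32 instructions to RegX16 instructions. We implement a compiler that uses the extended registers, benefits from them in register allocation and instruction scheduling, and automatically inserts mode-switching instructions. Mode switching follows a method of our own design and has been optimized to reduce its overhead.

We use the EEMBC benchmark suite to evaluate the performance improvement of RegX16. In pure RegX16 mode, the largest improvement is 19.5% and the average is 10.9%. When the benchmarks are linked with legacy 32-bit libraries, the average improvement is 5.1% because of the extra mode-switching overhead, with unnecessary mode switches already removed by link-time optimization (LTO); for some programs, mixed mode can still achieve a speedup of 21.2%. We also evaluate how extended registers improve instruction scheduling. We design our instruction scheduling precisely according to the architecture of the RegX16 processor in order to fully exploit the advantages of the extended registers. Without the extended registers, instruction scheduling yields only a 3.9% speedup; with them, the overall performance improvement is 9.7%.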

Keywords

extended registers

Parallel Abstract


IA-32, the 32-bit version of x86, is a commonly used ISA (Instruction Set Architecture) that has a feature-rich instruction set but only a few registers. If more general-purpose architectural registers were defined in the ISA, performance could be improved by promoting more variables to registers, holding more temporaries in registers, and exposing more ILP (Instruction Level Parallelism) for code scheduling. Although the 64-bit version, Intel64, has been extended with more registers, these extended registers cannot be exploited by 32-bit applications. Many embedded and desktop applications prefer to stay in 32-bit mode to avoid an increased data working set. In addition, the operating system and runtime libraries have to be recompiled or even redeveloped for such a new architecture. In this thesis, we design a mechanism that gives 32-bit applications an opportunity to exploit the extended registers. In our design, the general-purpose register file of the original IA-32 is extended to 16 registers. We call this extended architecture RegX16. A 32-bit application can be recompiled to RegX16 yet still be linked with the legacy IA-32 libraries (in executable format). Such an application binary is called a mixed-mode binary, which consists of instructions from both the original and the extended ISAs. A processor mode is introduced to identify which ISA is in use so that the current instruction can be decoded correctly. During execution, the processor mode has to be switched explicitly when transitioning between the IA-32 and RegX16 modes. We implement a compiler that automatically takes advantage of the extended registers in both the register allocation and the code scheduling phases. Furthermore, our compiler automatically inserts mode-switching instructions into mixed-mode binaries according to our mode-switching mechanism. Optimizations to reduce mode-switching overhead are also in place.
The EEMBC benchmark suite is used to evaluate the performance improvement of RegX16. For the pure RegX16 binary, the greatest improvement observed is 19.5%, with an average speedup of 10.9%. If the benchmarks have to be linked with legacy 32-bit libraries as mixed-mode binaries, the improvement drops to 5.1% on average because of the added mode-switching overhead. In these experiments, we use link-time optimization (LTO) to eliminate unnecessary mode switching. For some applications, LTO is quite effective: in one case, the mixed-mode application still obtains a 21.2% performance gain from the extended registers. Furthermore, we evaluate the performance improvement that extended registers bring to code scheduling. We model the RegX16 micro-architecture more accurately in order to fully exploit the extended registers in code scheduling. Our revised code scheduling model improves performance by 3.9% without the extended registers. When the extended registers are used, the average performance gain increases to 9.7%.
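
To make the notion of mode-switching overhead concrete, the following is a minimal sketch of a toy model that counts how often the processor mode would have to change along an execution trace. It is not the thesis's mechanism: the names `Mode` and `count_mode_switches`, the trace representation, and the assumption that execution starts in IA-32 mode are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Mode(Enum):
    IA32 = "IA-32"      # legacy code, e.g. precompiled libraries
    REGX16 = "RegX16"   # code recompiled to use the extended registers


def count_mode_switches(trace, initial=Mode.IA32):
    """Return how many times the processor mode changes while executing
    `trace`, a sequence of Mode values (one per executed code region).

    The starting mode is an assumption of this toy model.
    """
    switches = 0
    current = initial
    for mode in trace:
        if mode is not current:
            switches += 1
            current = mode
    return switches


# Hypothetical trace: RegX16 application code that calls an IA-32 library
# routine three times, returning to RegX16 code after each call.
mixed = [Mode.REGX16, Mode.IA32] * 3 + [Mode.REGX16]
# Hypothetical trace of the same length that never leaves RegX16 code.
pure = [Mode.REGX16] * 7

print(count_mode_switches(mixed))  # 7
print(count_mode_switches(pure))   # 1
```

In this model, every crossing between RegX16 code and legacy IA-32 code costs one switch, which is why a mixed-mode trace that calls into a legacy library frequently accumulates far more switches than a pure RegX16 trace.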

Parallel Keywords

x86; extended register; mixed mode; mode switching

