
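Implementation and Design of a Low-Complexity Out-of-Order Execution Processor Architecture Based on the MIPS Instruction Set Architecture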

Design and implementation of a low complexity out-of-order processor based on MIPS instruction architecture

Advisor: 許雅三

Abstract


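In recent years, out-of-order execution has been widely adopted in many high-performance processors. Earlier technology could not effectively reduce its energy consumption or the high complexity of its architecture, so it was uncommon in the early market. Compared with ordinary in-order execution, out-of-order execution avoids the situation in which a data dependency also delays the instructions that follow it: mutually independent instructions do not affect one another, which raises the number of instructions the system executes per cycle. Tomasulo's algorithm is an early complete out-of-order execution algorithm. It uses a storage space to record and analyze the relationships among instructions, and multiple functional units so that the stored instructions can execute in parallel; as a result, implementing the architecture is far more complex than ordinary in-order execution. This thesis implements and modifies the hardware of an out-of-order execution architecture based on Tomasulo's algorithm. To support precise interrupts, we use a Reorder Buffer to record the original program order. I also introduce the Unified Physical Register technique to improve performance. Beyond introducing these structures, this thesis proposes some improvements to the Tomasulo architecture: replacing actual data values with register pointers reduces the hardware area of this part of the architecture, lowers wiring complexity, and reduces the overall power consumption. In addition, these techniques are usually introduced with only a brief description of how they operate; this thesis describes their hardware implementation in detail, including the communication among components. I designed the entire system in Verilog, and synthesis targets the TSMC 0.13 um process.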

Keywords

Out-of-order execution; processor

Parallel Abstract


Out-of-order execution is now widely used in many high-performance processors. In the past, its power consumption and high complexity could not be reduced effectively, so it was uncommon in the market. Compared with in-order execution, out-of-order execution avoids stalling the machine when a RAW hazard occurs, so independent instructions can overlap and the IPC of the system improves. The Tomasulo algorithm was the first algorithm to provide full out-of-order execution. It uses a storage space to record and analyze the relationships between instructions, and multiple functional units to execute them in parallel. The complexity of this architecture is therefore far higher than that of in-order execution. In this thesis, we implement and modify the hardware of an out-of-order execution system based on the Tomasulo algorithm. To support precise interrupts, we use a reorder buffer to record the program order. We also implement a Unified Physical Register File to improve performance. In addition, we propose improvements to the algorithm that replace true values with register indices. This reduces hardware area and wiring complexity, and therefore power consumption. We describe the hardware implementation of this architecture in detail, including the communication between submodules. I implemented the design in Verilog, and the system is synthesized with the TSMC 0.13 um cell library.
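
The idea of holding register indices instead of data values can be illustrated with a small behavioral model. The following Python sketch is not the thesis's Verilog design; it only assumes a generic unified physical register file with a map table, a free list, and a reorder buffer. All names (`Renamer`, `RSEntry`, `NUM_ARCH_REGS`, `NUM_PHYS_REGS`) and the register counts are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_ARCH_REGS = 4   # assumed size, for illustration only
NUM_PHYS_REGS = 8   # assumed size, for illustration only


@dataclass
class RSEntry:
    """Reservation-station entry that stores register indices, not values."""
    op: str
    dst: int    # physical register that will receive the result
    src1: int   # physical register index of the first operand
    src2: int   # physical register index of the second operand


class Renamer:
    """Hypothetical model of renaming onto a unified physical register file."""

    def __init__(self) -> None:
        # Architectural register i initially maps to physical register i.
        self.map_table: List[int] = list(range(NUM_ARCH_REGS))
        self.free_list: List[int] = list(range(NUM_ARCH_REGS, NUM_PHYS_REGS))
        self.ready: List[bool] = [i < NUM_ARCH_REGS for i in range(NUM_PHYS_REGS)]
        # Reorder buffer in program order: (new physical reg, previous mapping).
        self.rob: List[Tuple[int, int]] = []

    def rename(self, op: str, rd: int, rs: int, rt: int) -> RSEntry:
        if not self.free_list:
            # Real hardware would stall here; the sketch just raises.
            raise RuntimeError("no free physical register")
        src1, src2 = self.map_table[rs], self.map_table[rt]
        new = self.free_list.pop(0)
        old = self.map_table[rd]
        self.map_table[rd] = new
        self.ready[new] = False
        self.rob.append((new, old))
        return RSEntry(op, new, src1, src2)

    def can_issue(self, entry: RSEntry) -> bool:
        # An instruction may issue once both source registers hold results.
        return self.ready[entry.src1] and self.ready[entry.src2]

    def writeback(self, entry: RSEntry) -> None:
        # Only the ready bit is broadcast; the value stays in the register file.
        self.ready[entry.dst] = True

    def commit(self) -> bool:
        # Retire in program order; free the mapping this instruction replaced.
        if not self.rob or not self.ready[self.rob[0][0]]:
            return False
        _, old = self.rob.pop(0)
        self.free_list.append(old)
        return True
```

In this model, renaming `add r1, r2, r3`, then `sub r2, r1, r3`, then `or r1, r0, r3` gives the third instruction a fresh destination register, so it can issue before the first completes, while the second waits on the index produced by the first. Because entries carry only indices, each entry needs a few bits per operand instead of a full data word, which is the kind of area and wiring saving the abstract describes.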

