Register Allocation Techniques for Embedded Processors

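With advances in technology, embedded systems are now used in a wide range of applications, and optimizations targeting them are continually proposed and refined. These optimizations may be implemented in hardware, in software, or as a hybrid of the two. Software optimizations can be applied automatically by a compiler, which generates the optimized code. Register allocation is one of the most important compiler optimizations, but it is constrained by the hardware design of the registers. This thesis therefore proposes several register allocation methods, each addressing a different problem on embedded processors. The first addresses distributed register file architectures. Our target is a VLIW (very long instruction word) processor with a distributed register file architecture, on which a conventional register allocator cannot choose suitable registers effectively. We propose assigning a suitable register file according to instruction type before register allocation, so that the existing allocator produces better results. Second, increasing the number of registers has long been a research goal, but because doing so has a large impact on the instruction set architecture (ISA), previous work has relied on special mechanisms. We propose a simple scheme that combines software encoding with simple hardware decoding, which extends the number of registers with only a slight ISA modification and at little cost. To improve performance further, we pair this scheme with a sliding register allocation that exploits the properties of the encoding. Finally, temperature has a significant effect on embedded systems, and we observe that register files often experience temperature peaks. Careful analysis shows that poor register allocation is the cause. We therefore propose a thermal-aware register allocation method that lowers the peak temperature of the register file and makes the system more stable. We implemented all three methods, and the experimental results show that each achieves its intended goal. On the distributed register file architecture, program performance improves; with register extension, the additional registers yield better performance, and existing programs need only be recompiled to support the hardware change; and thermal-aware register allocation does reduce the peak temperature.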
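The first method, choosing a register file by instruction type before register allocation runs, can be illustrated with a minimal sketch. The abstract does not give the actual heuristic, so everything here is an assumption made for illustration: the instruction classes, the `PREFERRED_FILE` table, the register file names, and the majority-vote rule are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from instruction class to the register file it prefers.
PREFERRED_FILE = {"alu": "RISC", "load": "RISC", "mac": "DSP", "simd": "DSP"}

def assign_register_files(instructions):
    """Pick a register file for each virtual register by majority vote.

    instructions: list of (instruction_class, [virtual registers used]).
    Returns {virtual register: register file name}.
    """
    votes = defaultdict(Counter)
    for cls, vregs in instructions:
        for v in vregs:
            votes[v][PREFERRED_FILE[cls]] += 1
    # Most votes wins; ties go to the alphabetically first file name
    # so that the result is deterministic.
    return {v: min(c, key=lambda f: (-c[f], f)) for v, c in votes.items()}

program = [
    ("load", ["v0"]),
    ("mac", ["v1", "v0", "v2"]),
    ("mac", ["v1", "v0", "v3"]),
    ("alu", ["v4", "v2"]),
]
print(assign_register_files(program))
```

An ordinary register allocator could then be run once per register file, each time seeing only the virtual registers assigned to that file.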

Keywords

Embedded processor; Register allocation; Compiler

Parallel Abstract

Embedded systems have been adopted in many areas in recent years as technology has advanced, so optimizing them is an important research topic. Applying optimizations by hand-writing code is complicated and time-consuming. Compilers offer a simpler alternative: the code is optimized just by recompiling the program. Register allocation is one of the most important passes in compiler optimization, and it depends on the hardware design. This thesis presents three optimizations, each focused on a different problem on a different architecture. First, we propose an approach to a performance issue on a heterogeneous dual-core processor. The processor, called UniDual, is a two-way VLIW (very long instruction word) unified RISC (reduced instruction set computer)/DSP (digital signal processor) core with a shared-based clustered register architecture. We present a scheduling and instruction transformation approach to support this processor. The approach schedules instructions and then transforms overlapped instructions into RISC and DSP instructions, taking communication overhead and hardware limitations into account. Second, the number of processor registers is very important for compiler optimizations that exploit ILP (instruction-level parallelism), but it is limited by the width of the register fields in the ISA (instruction set architecture). Adding an extra bit to a register field may lengthen the instruction, which directly affects code size and instruction fetch and also complicates decoding in the pipeline. We propose a new encoding and decoding method that extends the number of registers without additional cost to the ISA. The method has some restrictions, so we implement a new register allocation with optimizations to overcome them and improve performance. Finally, the register file has been shown to be a thermal hotspot while executing workloads. To avoid the side effects of high temperature, reducing the temperature of the register file is an effective measure. We focus on a compilation optimization, based on loop unrolling and requiring no hardware support, that addresses this issue. The approach can distribute the temperature evenly across all registers with respect to a workload.
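The idea of spreading register accesses across the register file through loop unrolling can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: it uses per-register access counts as a crude stand-in for temperature, and the rotation rule, the `stride` choice, and the function name `access_counts` are hypothetical.

```python
from collections import Counter

def access_counts(body, iterations, num_regs, rotate):
    """Count accesses per physical register over a loop.

    body: virtual register indices touched in one iteration.
    rotate: if True, iteration i maps virtual register v to physical
            register (v + i * stride) % num_regs, as renamed copies of an
            unrolled loop body could; otherwise v always maps to v.
    """
    # Hypothetical choice: shift each copy by the body's register footprint.
    stride = len(set(body))
    counts = Counter()
    for i in range(iterations):
        offset = (i * stride) % num_regs if rotate else 0
        for v in body:
            counts[(v + offset) % num_regs] += 1
    return counts

body = [0, 1, 0, 2]  # one iteration touches r0 twice, r1 and r2 once
baseline = access_counts(body, iterations=8, num_regs=12, rotate=False)
spread = access_counts(body, iterations=8, num_regs=12, rotate=True)
print(max(baseline.values()), max(spread.values()))  # prints "16 4"
```

The total number of accesses is the same in both cases; only the busiest register's share drops, which is the effect an even thermal distribution aims for.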

Parallel Keywords

Embedded processor; Register allocation; Compiler
