
A Register Allocation Algorithm for Shared and Clustered Hybrid Register Files Organization

Advisor: 鍾葉青

Abstract


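In very long instruction word (VLIW) processor design, a clustered register file architecture provides better hardware efficiency. However, it introduces additional inter-cluster communication overhead in execution cycles. We propose the shared and clustered hybrid register file (SCRF) architecture and the SCRF register allocation algorithm to reduce this inter-cluster communication overhead. The SCRF architecture consists of one shared register file and multiple clustered register files. Allocating heavily used variables to the shared register file effectively reduces inter-cluster communication overhead. The SCRF register allocation algorithm exploits the characteristics of the SCRF architecture to reduce both inter-cluster communication and register spill overhead. We implemented the SCRF architecture and the SCRF register allocation algorithm in Trimaran, a compilation and simulation environment, and used benchmark programs from MediaBench to analyze the impact of the SCRF architecture on execution cycles and code size. According to the experimental results, the SCRF architecture outperforms the clustered register file architecture on every metric. Execution cycles, inter-cluster communication overhead, register spill overhead, and code size are reduced by an average of 11.6%, 55.6%, 52.7%, and 18.2%, respectively.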
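To make the shared-register-file trade-off concrete, the following Python sketch models a deliberately simplified cost: each use of a variable from a cluster other than its home cluster counts as one inter-cluster copy, unless the variable lives in the shared register file, which every cluster can read directly. The `Variable` class, the `icc_cost`, `total_icc`, and `choose_srf` helpers, the use counts, and the one-copy-per-use rule are hypothetical illustrations, not the thesis's cost model. The sketch also treats the shared file as holding `srf_size` variables regardless of live ranges, and a real compiler could reuse one copy for several uses.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    home: int  # cluster whose RF would hold the value if it is not in the SRF
    uses: dict[int, int] = field(default_factory=dict)  # cluster -> weighted use count


def icc_cost(v: Variable) -> int:
    """Weighted uses from clusters other than the home cluster.

    Simplifying assumption (hypothetical): each such use needs its own copy."""
    return sum(n for cluster, n in v.uses.items() if cluster != v.home)


def total_icc(variables: list[Variable], in_srf: set[str]) -> int:
    # Every cluster can read the shared RF directly, so SRF variables cost no ICC.
    return sum(icc_cost(v) for v in variables if v.name not in in_srf)


def choose_srf(variables: list[Variable], srf_size: int) -> set[str]:
    # Greedy: give the limited SRF slots to the variables with the highest ICC cost.
    ranked = sorted(variables, key=icc_cost, reverse=True)
    return {v.name for v in ranked[:srf_size] if icc_cost(v) > 0}


# Toy input on a two-cluster machine (made-up use counts).
vs = [
    Variable("a", home=0, uses={0: 2, 1: 10}),
    Variable("b", home=1, uses={1: 5}),
    Variable("c", home=0, uses={1: 3}),
    Variable("d", home=1, uses={0: 7, 1: 1}),
]
srf = choose_srf(vs, srf_size=2)
print(total_icc(vs, set()), sorted(srf), total_icc(vs, srf))  # 20 ['a', 'd'] 3
```

In this toy input, moving the two most cross-cluster-heavy variables into the shared file cuts the modeled copy count from 20 to 3; purely local variables such as `b` gain nothing from the shared file and stay out of it.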

English Abstract


In VLIW processor design, clustered architectures have become a popular way to improve hardware efficiency. However, inter-cluster communication (ICC) adds overhead to execution cycles. In this thesis, we propose a shared cluster register file (SCRF) architecture and an SCRF register allocation algorithm to reduce the ICC overhead. The SCRF architecture is a hybrid register file (RF) organization composed of a shared RF (SRF) and clustered RFs (CRFs). By placing frequently used variables that would otherwise require ICCs in the SRF, we reduce the number of inter-cluster data transfers and thus the ICC overhead. The SCRF register allocation algorithm, a heuristic based on graph coloring, exploits this architectural feature to reduce ICCs and balance spill code. To evaluate the proposed architecture and allocation algorithm, the commonly used two-cluster architecture is simulated with and without the SRF on Trimaran, a compiler framework, using a set of multimedia programs from MediaBench as benchmarks. The simulation results show that the SCRF architecture outperforms the clustered RF architecture on every test program and every measured metric. With macro registers defined in the SRF, the SCRF architecture reduces execution cycles, ICC overhead, spill code overhead, and code size by 11.6%, 55.6%, 52.7%, and 18.2% on average, respectively.
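The allocation idea can be sketched as a greedy, priority-ordered graph coloring in which the available "colors" are SRF registers (usable from any cluster) and per-cluster CRF registers. This is a simplified stand-in, not the thesis's heuristic and not Chaitin's full simplify/select procedure. The `allocate` function, its priority order (highest ICC weight first), and its fallback order (SRF for cross-cluster variables, then the home CRF, then the SRF as overflow, then spill) are assumptions made for illustration.

```python
def allocate(interference, home, icc, srf_regs, crf_regs):
    """Hypothetical greedy coloring over a shared RF plus per-cluster RFs.

    interference: var -> set of vars simultaneously live (must be symmetric)
    home:         var -> cluster that executes the var's definition
    icc:          var -> weighted uses from other clusters (0 = purely local)
    Returns var -> ("SRF", reg) | ("CRF", cluster, reg) | ("SPILL",)
    """
    assignment = {}
    # Highest ICC weight first, so cross-cluster variables get first pick of the SRF.
    order = sorted(interference, key=lambda v: icc[v], reverse=True)
    for v in order:
        # Registers already held by interfering neighbors that have been colored.
        taken = {assignment[n] for n in interference[v] if n in assignment}
        srf_free = [("SRF", r) for r in range(srf_regs) if ("SRF", r) not in taken]
        crf_free = [("CRF", home[v], r) for r in range(crf_regs)
                    if ("CRF", home[v], r) not in taken]
        if icc[v] > 0 and srf_free:
            assignment[v] = srf_free[0]   # shared register: no inter-cluster copy
        elif crf_free:
            assignment[v] = crf_free[0]   # register in the home cluster's RF
        elif srf_free:
            assignment[v] = srf_free[0]   # SRF used as overflow instead of spilling
        else:
            assignment[v] = ("SPILL",)    # no register left: spill to memory
    return assignment


# Toy input (made up): a, b, c are computed in cluster 0 and d in cluster 1.
interference = {"a": {"b", "c", "d"}, "b": {"a", "c"},
                "c": {"a", "b", "d"}, "d": {"a", "c"}}
home = {"a": 0, "b": 0, "c": 0, "d": 1}
icc = {"a": 10, "b": 0, "c": 3, "d": 0}

small = allocate(interference, home, icc, srf_regs=1, crf_regs=1)
large = allocate(interference, home, icc, srf_regs=2, crf_regs=1)
print(small)  # b is spilled; c falls back to cluster 0's RF
print(large)  # a and c share the SRF; no spills
```

In this toy input, growing the SRF from one to two registers lets both cross-cluster variables avoid copies and removes the spill. Because a CRF color carries its cluster index, variables in different clusters never compete for the same CRF register, which is what lets `d` reuse register 0 of cluster 1 while `c` or `b` holds register 0 of cluster 0.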

