
應用於具有分散式暫存器組的超長指令集數位訊號處理器之全域性編譯器最佳化

Global Optimizations in Compilers for VLIW DSP Processors with Distributed Register Files

Advisor: Jenq-Kuen Lee (李政崑)

Abstract


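Digital signal processors (DSPs) with very long instruction word (VLIW) architectures are increasingly used in embedded systems with multimedia requirements. When developing a new VLIW DSP, engineers must weigh design complexity, die size, and power consumption. Conventional, widely used designs are therefore often a poor fit for embedded systems; distributed and multi-bank register file designs are increasingly adopted instead to reduce the number of read/write ports. Although such varied register file architectures and irregular designs can meet the demands for high performance and low power consumption, they pose a major challenge for compiler optimization.

The goal of compiler optimization is to generate more efficient code, and optimizations can be broadly divided into local and global ones. Local optimizations usually operate only on small blocks of code, so irregular hardware designs affect them relatively little. Global optimizations, by contrast, usually scan the entire program and must make good use of limited hardware resources, so irregular hardware designs and constraints often keep them from achieving the expected results.

The contribution of this dissertation is a discussion of how global compiler optimizations can be properly applied and implemented on VLIW DSPs with distributed register files. We use a real processor, the PAC DSP, as our example; its register file design is highly distributed and has strict read/write port constraints. Our experience developing a compiler for this VLIW DSP may serve as a reference for compiler developers targeting other VLIW DSPs.

For the experiments, we built the PAC DSP compiler on the Open64 compiler. With our proposed methods incorporated, results on benchmarks such as EEMBC and MiBench show that, compared with traditional optimization approaches, we do improve the effectiveness of global optimizations under this kind of register file design.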

Parallel Abstract


Digital signal processors (DSPs) with very long instruction word (VLIW) data-path architectures are increasingly being deployed on embedded devices for multimedia processing applications. When developing new VLIW DSP processors, engineers must take complexity, die size, and power dissipation into consideration. As a result, some popular and traditional designs may not be feasible for embedded systems. Instead, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports associated with register files. Although this wide variety of register file architectures and irregular designs can meet high-performance and low-power criteria, it also presents challenges for devising compiler optimization schemes.

Compiler optimizations, which aim to generate more efficient code, can be conceptually classified into local and global optimizations. Local optimizations take place only within small code fragments, so the impact of irregular designs on them is minor. In contrast, global optimizations usually traverse an entire procedure and try to use resources as effectively as possible, so irregular designs and distributed register files make it difficult for global optimizations to deliver the expected improvement.

This dissertation contributes to the development and discussion of global optimizations in compilers for a novel VLIW DSP with distributed register files. The target architecture, known as the PAC DSP core, is designed with distinctively banked register files and highly restricted port access. Our experience developing global optimizations in compilers for the PAC DSP may also be of interest to those developing compilers for similar architectures.

Experiments were performed on the PAC VLIW DSP with distributed register files by incorporating our proposed optimization schemes into an Open64-based compiler. Several benchmarks, including EEMBC and MiBench, were used to evaluate the improvement gained from exploiting the features of this specific register file architecture. The results show that a VLIW DSP compiler using our global optimization schemes outperforms traditional strategies.

