
Research on Compiler Design and Optimization for VLIW DSP Architectures with Distributed and Irregular Designs

Compilers for VLIW DSP Architectures with Distributed and Irregular Designs

Advisor: 李政崑

Abstract


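VLIW architectures have become the mainstream design for modern high-end processors in recent years, offering greater instruction-level parallelism and performance. Thanks to advances in VLSI technology, it is now possible to design chips that are more powerful and faster than ever, but complexity, die size, and power consumption have become additional concerns in designing a new VLIW processor. For the embedded-system market, a successful processor design must combine high performance, low power consumption, low cost, and a shorter time-to-market than competitors. Consequently, some of the popular, elaborate, and fancy designs used to boost the performance of general-purpose VLIW processors are not suitable for an embedded processor that also needs high performance.

In recent years, many register file architectures and irregular designs have been developed for embedded processors in order to reduce complexity, power consumption, and die size relative to traditional high-performance processor architectures. Because the compiler is generally regarded as the most important system-software component in determining whether a processor design succeeds, developing code generation and optimization techniques that effectively support architectures with such irregular designs is of considerable interest. Moreover, VLIW architectures need effective compiler support to improve programming productivity.

In this dissertation, we present the results of our research on compiler design and optimization for a novel VLIW DSP with irregular designs. The target processor, the Parallel Architecture Core DSP (PAC DSP), is designed with partitioned register files whose port access is highly restricted. In addition, the PAC DSP uses a heterogeneous distributed data-path architecture to achieve an efficient design with low complexity, small size, and potentially low power consumption. We believe the PAC DSP offers a promising architecture model that can deliver in practice the high parallelism that applications demand while mitigating the growing problems of complexity, size, and power consumption. The compiler techniques and experience we developed for the PAC DSP may also benefit developers building compilers for similar architectures.

We describe how the Open Research Compiler infrastructure is used to implement code generation for the irregular register file architecture of a novel VLIW DSP. We also introduce a new register allocation framework that effectively supports the generation of high-quality code on this architecture. We propose several register allocation methods that make effective use of the irregular register file architecture, and we describe other compiler techniques that support optimization on the PAC DSP.

All experimental results obtained with the compiler we developed for the PAC DSP show that our compiler techniques for this architecture significantly improve the performance of the generated code. Furthermore, our compiler makes more efficient use of the PAC DSP's special register file architecture and irregular designs.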

English Abstract


VLIW architectures have become the mainstream design for modern high-end processors in recent years because they support more instruction-level parallelism (ILP) and higher potential performance than traditional single-issue CISC/RISC machines. Advances in VLSI technology make it possible to build more powerful and faster chips than ever, but they also raise additional issues for the design of a new VLIW processor: complexity, die size, and power dissipation. For the embedded-system market, a successful processor design must not only provide ample performance but also offer low power consumption, low cost, and a short time-to-market. Therefore, some popular, fancy, and sophisticated design techniques that enhance the performance of a general-purpose VLIW processor may not be feasible for an embedded processor that also demands high performance.

In recent years, a wide variety of register file architectures and irregular designs have been developed for embedded processors with the aim of reducing complexity, power dissipation, and die size, in contrast to the traditional architectures of high-performance processors. There has been considerable interest in techniques that effectively support code generation and optimization for architectures with such irregular designs, because the compiler is generally regarded as the most important system-software component in making a processor design succeed. Adequate compiler support is also essential for VLIW architectures so that programming efficiency can be improved dramatically.

This dissertation contributes to the design and development of an effective compiler for a novel VLIW DSP with irregular designs. The target DSP architecture, known as the PAC DSP core, is designed with distinctively partitioned register files in which port access is highly restricted. Moreover, the PAC DSP uses a heterogeneous distributed data-path architecture to attain an efficient design with low complexity, small size, and potentially low power consumption. We believe that the PAC DSP employs a promising architecture model that pragmatically supports the high parallelism demanded by DSP applications while curbing the growth of chip complexity, die size, and power dissipation. Our experience in designing compiler support for the PAC DSP may also be of interest to those developing compilers for similar architectures with such irregular designs.

Our major contributions in this dissertation are as follows:

1. We present our application of the Open64/ORC infrastructure to a novel VLIW DSP and the specific design for handling its register file architecture. As part of an effort to overcome the new challenges of code generation for the PAC DSP, we have developed a new register allocation framework and other retargeting optimization phases that allow the effective generation of high-quality code.

2. We propose a novel heuristic algorithm, named ping-pong aware local favorable (PALF) register allocation, to obtain advantageous register allocations that are expected to better utilize irregular register file architectures. We also propose an alternative register allocation scheme using a simulated-annealing (SA) approach, and a hybrid optimization procedure that integrates PALF and SA. Furthermore, an associated global register allocation strategy is presented and discussed.

3. Advanced subjects that support generating optimized code for PAC DSP architectures are also discussed in this dissertation and preliminarily developed in our compilation infrastructure.

The results of all experiments performed with our optimizing compiler, which is based on the Open Research Compiler (Open64/ORC), show significant performance improvement over the primitive code generation. Our preliminary experimental results also indicate that the compiler efficiently uses the features of the specific register file architectures and irregular designs in the PAC DSP.
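
The abstract names a simulated-annealing (SA) register allocation scheme but gives no details. As a rough illustration only, the general idea of annealing over register-bank assignments might be sketched as follows. Everything here is an assumption made for the example and is not taken from the dissertation: the two-bank model, the toy cost function `conflict_cost`, the function `anneal_banks`, and all parameter values are hypothetical.

```python
import math
import random


def conflict_cost(bundles, bank_of):
    """Toy cost (hypothetical): penalize a slot whose operands span both
    banks, and penalize two slots of one bundle touching the same bank."""
    cost = 0
    for slot_a, slot_b in bundles:
        banks_a = {bank_of[v] for v in slot_a}
        banks_b = {bank_of[v] for v in slot_b}
        if len(banks_a) > 1:
            cost += 1
        if len(banks_b) > 1:
            cost += 1
        if banks_a & banks_b:
            cost += 1
    return cost


def anneal_banks(vregs, bundles, steps=2000, t0=2.0, alpha=0.995, seed=0):
    """Generic simulated annealing over a 2-bank assignment.

    Each move flips one virtual register to the other bank. Worse moves
    are accepted with probability exp(-delta / t), and t cools by alpha.
    Returns the best assignment seen and its cost.
    """
    rng = random.Random(seed)
    bank_of = {v: rng.randrange(2) for v in vregs}
    cost = conflict_cost(bundles, bank_of)
    best, best_cost = dict(bank_of), cost
    t = t0
    for _ in range(steps):
        v = rng.choice(vregs)
        bank_of[v] ^= 1  # propose: move v to the other bank
        new_cost = conflict_cost(bundles, bank_of)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost  # accept the move
            if cost < best_cost:
                best, best_cost = dict(bank_of), cost
        else:
            bank_of[v] ^= 1  # reject: undo the flip
        t = max(t * alpha, 1e-3)  # cool down, with a floor to avoid t == 0
    return best, best_cost


# Hypothetical input: each bundle pairs the operands of two issue slots.
vregs = ["a", "b", "c", "d"]
bundles = [(["a", "b"], ["c", "d"]), (["a"], ["c"]), (["b"], ["d"])]
assignment, cost = anneal_banks(vregs, bundles)
print(assignment, cost)
```

A real allocator for a partitioned register file would also have to model physical register counts, live ranges, spill code, and the actual port constraints of the target, none of which this sketch attempts.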

Keywords

compiler, DSP, VLIW, distributed architecture, irregular design, PAC
