Compiler Optimization Supporting Sub-word Instructions for Digital Signal Processors

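With the rapid development of multimedia applications, the computation they require grows by the day. To achieve high-performance computing at low development cost, most embedded systems adopt digital signal processors (DSPs). To raise computing performance, today's DSPs include many optimizations aimed at multimedia applications. One widely adopted trend in their hardware design is to add vector instructions with short operand lengths to the DSP instruction set. These instructions, also called sub-word instructions, complete operations on multiple data items in a single instruction, for example four 8-bit additions in one instruction. Because multimedia applications often use low-precision operations on short data types, sub-word instructions offer them considerable performance gains. Despite these gains, the instructions are still not widely used. The root cause is that using them requires assembly language, libraries, or compiler-provided intrinsic functions. For programs written in a high-level language, the compiler does not exploit sub-word instructions, so the expected performance gain is lost. To make good use of these instructions, auto-vectorization in the compiler was proposed. This compiler technique finds the parallelism in programs written in a high-level language and generates assembly code containing sub-word instructions. The laborious and tedious process of writing assembly by hand can thus be avoided, while multimedia applications still obtain good performance. In this thesis, we present how to use this compiler technique to generate sub-word instructions for our DSP. The DSP used in this thesis is a very long instruction word (VLIW) DSP developed by the SoC Technology Center of the Industrial Technology Research Institute (ITRI). In the experiments, we use a set of representative, practically useful DSP programs to verify whether our compiler generates higher-performance assembly code. Preliminary experimental data show that our compiler does improve the performance of some of the programs by 1.3 to 2.1 times.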
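To make the "four 8-bit additions in one instruction" idea concrete, the following is a minimal sketch in portable C, not the thesis's compiler output and not PAC DSP code. The function names (`add_u8_scalar`, `add4_u8`, `add_u8_subword`) are hypothetical, and `add4_u8` only emulates in software what a real sub-word add instruction would do in hardware. It assumes wrap-around (modulo 256) arithmetic in each 8-bit lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar loop as a programmer would write it in C:
 * one 8-bit addition per iteration, wrapping modulo 256. */
void add_u8_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Hypothetical software stand-in for a sub-word add instruction:
 * adds four independent 8-bit lanes packed into one 32-bit word.
 * The low 7 bits of each lane are added first so that no carry can
 * cross a lane boundary; the top bit of each lane is then fixed up. */
uint32_t add4_u8(uint32_t x, uint32_t y)
{
    uint32_t low7 = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low7 ^ ((x ^ y) & 0x80808080u);
}

/* The shape a vectorizer aims for: process four elements per step,
 * then finish any leftover elements with the scalar form. */
void add_u8_subword(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t x, y, r;
        memcpy(&x, a + i, sizeof x);   /* pack four bytes into one word */
        memcpy(&y, b + i, sizeof y);
        r = add4_u8(x, y);             /* one "instruction", four additions */
        memcpy(dst + i, &r, sizeof r); /* unpack the four results */
    }
    for (; i < n; i++)                 /* scalar epilogue for the remainder */
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Both functions produce the same results; the second does a quarter of the loop iterations for the bulk of the data. Auto-vectorization aims to derive that form from the first without the programmer writing it by hand.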

Keywords

Auto-vectorization; Compiler; Single Instruction, Multiple Data (SIMD); Sub-word instructions; Digital signal processor

English Abstract

With the rapid growth and evolution of multimedia applications, the computation they demand puts increasing pressure on the processing capability of modern processors. Aiming at high performance and relatively low cost, a majority of embedded systems have adopted Digital Signal Processors (DSPs) as their solution. To achieve higher computing capability, DSPs have been equipped with many hardware optimizations specific to multimedia applications. One trend in DSP design is to augment the instruction set with short vector instructions, called sub-word instructions. Sub-word instructions operate on sets of data in a Single Instruction, Multiple Data (SIMD) manner. They benefit multimedia processing substantially because they accelerate processing of adjacent data with short data types. However, access to these sub-word instructions is limited to in-line assembly, libraries, and compiler intrinsic functions; general C language constructs cannot use them. To take advantage of sub-word instructions in both old and new programs, auto-vectorization was proposed to generate sub-word instructions automatically in compilers. The goal of auto-vectorization is to exploit the parallelism implicit in user programs and to leverage sub-word instructions in code generation. With auto-vectorization, much of the effort of parallel programming and legacy program rewriting is saved. Moreover, the performance of multimedia applications can be improved several-fold, depending on the data types used. In this thesis, we present a flow that enables auto-vectorization in C compilers by utilizing sub-word instructions. The vectorizing compiler identifies data-level parallelism implicit in C programs and automatically generates assembly with sub-word instructions whenever possible. The target architecture in our experiments is based on PAC VLIW DSP processors.
The performance of the vectorized programs is evaluated using a set of DSP loop kernels that are typical and representative of digital signal processing. The preliminary results show that our vectorizing compiler generates efficient code. The speedup ranges from 1.3 to 2.1 compared with code generated without our proposed optimizations.

English Keywords

Auto-vectorization; Compiler; SIMD; Sub-word Instructions; PAC DSP

