
針對TI C64x數位訊號處理器之可變長度超長指令字元編碼

Variable-Length VLIW Encoding for TI C64x DSP Processors

Advisor: 林泰吉

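Abstract


The Very Long Instruction Word (VLIW) architecture has been shown, across embedded applications such as signal, image, and audio processing, to deliver better performance, lower energy consumption, and lower design cost than reduced instruction set (RISC) and superscalar architectures. However, when instruction-level parallelism is low, NOP (No Operation) instructions must be placed in the functional units that have no useful instruction to execute, so program code size is much larger than on other architectures. The TI C64x is one of the most successful VLIW DSPs to date. Although it applies NOP removal to eliminate most NOPs and improve code density, it uses fixed-length instructions and targets only high performance, so programs generally remain too large. This study finds that, in the C64x instruction encoding, some bits of fields such as (1) conditional execution, (2) registers, (3) immediate values, and (4) function codes are not used during execution. We therefore apply a variable-length VLIW encoding that integrates variable-length instruction encoding, an adaptive instruction grouping and dispatch mechanism, and fixed-length instruction bundles to raise code density. We analyze the benefit and cost of each method as well as the overall instruction compression result. Finally, this thesis proposes a parallel instruction decoding hardware architecture suited to our method and examines the delay and hardware complexity of implementing it in the C64 pipeline front end. The improvements proposed for the C64 achieve an instruction compression ratio of 74~84% even when instruction-level parallelism is low.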

Parallel Abstract


VLIW architectures have been shown, in embedded applications such as signal, image, and audio processing, to provide better performance and lower design cost than traditional superscalar and RISC architectures. However, compiled code often contains many NOP instructions, which occur because there is not enough ILP to completely fill an execute packet with useful instructions, so program size is much larger than on other architectures. The TI C64x is one of the most successful VLIW DSPs. Although it applies NOP removal, it uses fixed-length instructions and targets only high performance, so program size is still too large. This study finds that some bits in instruction fields such as (1) conditional execution, (2) registers, (3) immediate values, and (4) function codes are not used during execution. We therefore propose a variable-length VLIW encoding that integrates variable-length instruction encoding, an adaptive instruction grouping and dispersal scheme (CAP), and fixed-length instruction bundles. We analyze the proportion of valid bits used in the various fields of C64 instructions and complete the instruction encoding to improve encoding density. We also propose a decompression hardware architecture and analyze the overhead of each method. Finally, this thesis discusses the delay and hardware complexity of the C64 pipeline front end under our approach. Even when instruction parallelism is low, the proposed method reaches an instruction compression ratio of 74 to 84 percent, with only about 2 ns of delay in instruction decoding.
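The idea of dropping unused field bits can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the field widths in `FIELD_BITS` are a simplified 32-bit layout rather than the real C64x formats, the one-flag-per-field header is a hypothetical way to mark which fields are present, and the compression ratio is taken to mean compressed size divided by original size, which the abstract does not define.

```python
# Hypothetical, simplified 32-bit instruction layout (NOT the real C64x formats).
FIELD_BITS = {"cond": 4, "dst": 5, "src1": 5, "src2": 5, "opcode": 13}
FIXED_BITS = sum(FIELD_BITS.values())  # 32 bits per fixed-length instruction

# Assumed header: one presence flag per field, so a decoder knows what follows.
HEADER_BITS = len(FIELD_BITS)


def encoded_bits(used_fields):
    """Bits needed for one instruction when only the used fields are stored."""
    unknown = set(used_fields) - FIELD_BITS.keys()
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return HEADER_BITS + sum(FIELD_BITS[name] for name in set(used_fields))


def compression_ratio(program):
    """Compressed size over fixed-length size (assumed definition)."""
    if not program:
        raise ValueError("program must contain at least one instruction")
    compressed = sum(encoded_bits(fields) for fields in program)
    return compressed / (FIXED_BITS * len(program))


# Three made-up instructions, described by the fields each one actually uses.
example_program = [
    {"opcode", "dst", "src1", "src2"},  # unconditional three-operand operation
    {"opcode", "dst", "src1"},          # unconditional two-operand operation
    {"opcode"},                         # operation with no operands
]

if __name__ == "__main__":
    print(f"{compression_ratio(example_program):.3f}")
```

In this toy layout the first instruction grows from 32 to 33 bits because of the header, which shows why the benefit and the overhead of each technique need to be weighed together.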

