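Author (Chinese): 陳世昌
Author (English): Shih-Chang Chen
Thesis title (Chinese): 在超長指令平行架構核心以及數位訊號處理器上驅動軟體管線之研究
Thesis title (English): Enabling Software Pipelining for PAC VLIW DSP Processors
Advisor (Chinese): 李政崑
Advisor (English): Jenq-Kuen Lee
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 934347
Year of publication (ROC calendar): 95 (2006)
Academic year of graduation: 94
Language: English
Number of pages: 66
Keywords (Chinese): 軟體管線化; ORC; Itanium六十四位元; 群集; 旋轉暫存器; Modulo Variable Expansion; 指令集模擬器; DSPstone
Keywords (English): Software Pipelining; ORC; Itanium Architecture - 64 bits; Cluster; Rotating Register; Modulo Variable Expansion; Instruction Set Simulator; DSPstone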
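In compilers, software pipelining is a powerful technique. It overlaps the execution of adjacent loop iterations and thereby improves program performance. To achieve this, however, a number of constraints must be taken into account during scheduling. Many software pipelining algorithms have been proposed. I propose a method suited to the PAC processor architecture. We build on the ORC compiler. ORC was originally a compiler for the Itanium 64-bit hardware architecture, so we had to modify it. The PAC processor differs from the Itanium 64-bit architecture in three main respects. First, the PAC processor is a clustered architecture: instructions must be assigned to different clusters, and communication between instructions in different clusters must be handled. Second, the PAC processor has no hardware support for rotating registers. As a result, the code generation part of ORC no longer applies to the PAC processor and must be modified; otherwise programs would execute incorrectly. Drawing on earlier research, we use Modulo Variable Expansion to solve this problem. Third, access to the PAC processor's global registers is subject to special restrictions that the Itanium 64-bit architecture does not have. We use our own data structures to account for these restrictions and modify the scheduling part of the original software pipelining. In the experiments, we use the PAC instruction set simulator, with DSPstone as our test programs. We compare results across optimization levels, results with software pipelining enabled on different numbers of clusters, and results with and without software pipelining. The experiments show that, on average, code with software pipelining enabled at optimization level O1 is more than twice as fast as code without software pipelining at optimization level O0.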
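The overlap of adjacent loop iterations described in the abstract can be illustrated with a small cycle-count model. This is a minimal sketch, not code from the thesis: it assumes a hypothetical loop body of single-cycle stages and a fixed initiation interval (II, the number of cycles between the starts of consecutive iterations), and all function names are illustrative.

```python
def sequential_cycles(iterations, body_latency):
    """Cycles needed when each iteration starts only after the previous one ends."""
    return iterations * body_latency


def pipelined_cycles(iterations, body_latency, ii):
    """Cycles needed when a new iteration starts every `ii` cycles (assumed model)."""
    if iterations == 0:
        return 0
    # The last iteration starts at (iterations - 1) * ii and then runs to completion.
    return (iterations - 1) * ii + body_latency


def schedule(iterations, stages, ii):
    """Map each cycle to the (iteration, stage) pairs active in it.

    Assumes every stage takes exactly one cycle (a simplification).
    """
    table = {}
    for it in range(iterations):
        for offset, name in enumerate(stages):
            table.setdefault(it * ii + offset, []).append((it, name))
    return table


stages = ["load", "mul", "store"]  # hypothetical three-stage loop body
table = schedule(3, stages, ii=1)
for cycle in sorted(table):
    print(cycle, table[cycle])
```

In this model, once the pipeline is full (cycle 2 in the example), one stage from each of three different iterations executes in the same cycle, which is the steady-state kernel that a software pipeliner tries to construct.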
Software pipelining is a powerful loop optimization technique in compilers. It overlaps the execution of adjacent loop iterations to improve performance. To do so, however, it must satisfy many constraints in the scheduling phase. Many software pipelining algorithms have been published; we propose a method for a clustered VLIW DSP processor known as the PAC platform. We enable software pipelining on the PAC platform using ORC. ORC was originally built for IA-64 architectures and does not support PAC, so we modify it to fit the PAC architecture. There are three main differences between the PAC and IA-64 architectures. First, the VLIW data paths of PAC are clustered, so we have to assign instructions to appropriate clusters and handle communication between clusters. Second, PAC has no hardware support for rotating registers, so the code generation of ORC must be modified; otherwise the generated code may be incorrect. We adopt an earlier technique, modulo variable expansion, to solve this problem. Third, accesses to the global register files of PAC are subject to ping-pong constraints. We use our own data structures to model this constraint and modify the modulo scheduling step of software pipelining accordingly. We run the experiments on the Instruction Set Simulator for the PAC DSP architecture, with the DSPstone suite as our benchmark. We compare the results for different optimization levels and for software pipelining with different numbers of clusters. The results show a speedup of at least 2x for every benchmark case when our scheme is applied, relative to -O0 code generation.
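The role of modulo variable expansion on a machine without rotating registers can be sketched as follows. This is a hedged illustration of the general technique as it is usually described in the software pipelining literature, not the implementation in the thesis: the variable lifetimes, the initiation interval `ii`, and every function name below are assumptions made for the example. The idea is that a value that stays live for longer than II cycles would be overwritten by the next iteration, so the kernel is unrolled and each copy uses a differently named register.

```python
import math


def copies_needed(lifetime, ii):
    """Number of simultaneously live instances of a value (at least one)."""
    return max(1, math.ceil(lifetime / ii))


def unroll_factor(lifetimes, ii):
    """Kernel unroll factor: the largest copy count over all variables."""
    return max(copies_needed(life, ii) for life in lifetimes.values())


def registers_for(copies, unroll):
    """Smallest divisor of `unroll` that is >= `copies`, so names repeat cleanly."""
    return next(d for d in range(copies, unroll + 1) if unroll % d == 0)


def rename(var, lifetime, ii, unroll):
    """Register name used for `var` in each unrolled copy of the kernel."""
    n = registers_for(copies_needed(lifetime, ii), unroll)
    return [f"{var}{k % n}" for k in range(unroll)]


# Hypothetical loop: `t` is live for 3 cycles, `a` for 1 cycle, with II = 1.
lifetimes = {"t": 3, "a": 1}
ii = 1
u = unroll_factor(lifetimes, ii)
names = {var: rename(var, life, ii, u) for var, life in lifetimes.items()}
print(u, names)
```

With these assumed lifetimes, `t` needs three registers across three kernel copies, while `a` can reuse a single register in every copy. Rounding each variable's register count up to a divisor of the unroll factor keeps the renaming periodic at the cost of some extra registers.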
Acknowledgements
Abstract
Contents
1. Introduction
2. PAC DSP Architecture
3. Basic Concepts of Software Pipelining
4. Software Pipelining for PAC
5. Experiment
6. Conclusion
Bibliography
(Full text is available for internal viewing only)