Title

GPU平台下的高度平行化繞線演算法與其應用

Translated Titles

Massively Parallel Routing Algorithms on GPU Platform and Their Applications

DOI

10.6842/NCTU.2011.00237

Authors

羅勻鍵

Key Words

繞線演算法 ; 電子設計自動化 ; 平行化 ; 全域繞線器 ; GPU ; CUDA ; maze routing algorithm ; EDA ; parallel programing ; global router

PublicationName

交通大學資訊科學與工程研究所學位論文

Volume or Term/Year and Month of Publication

2011年

Academic Degree Category

碩士

Advisor

李毅郎

Content Language

英文

Chinese Abstract

GPU原為設計用來處理圖形方面應用之處理器,因為其多核心與單指令流多執行緒架構,已有越來越多演算法被平行化並使用GPU來加速。本研究使用GPU來加速電子自動化設計中常需使用到的演算法 – 在以格線為基礎的圖形內的繞線演算法。我們提出了每一個節點由一個執行緒控制為基礎的演算法,並使用三個方法來加速,分別是判斷並只呼叫需要更新的區域、減少分歧路線、避免資料讀取衝突,之後我們將此演算法應用至以下三個問題中 – 全點對最短路徑問題、使用緩衝器ECO來解決時間優化問題、三維全域繞線器。實驗結果指出在全點對最短路徑問題中,GPU最多可加速至CPU的31.2倍,ECO問題亦可透過GPU加速至平均13.7倍,三維全域繞線器與GLADE [1] 比較之結果,可繞出合法解之平均總線長短1.6%,但時間花費為GLADE的9.84倍且解不出的overflow亦比GLADE還要高,但若與CPU相比,GPU的時間比CPU提升了11.52倍。

English Abstract

GPU was originally a processor designed to handle the graphic applications, because of its multi-core and Single Instruction, Multiple Thread (SIMT) architecture, more and more algorithms are parallel and use GPU to accelerate. This study uses GPU to accelerate one of the electronic design automation algorithms – maze routing algorithm in grid-based graph. We proposed a algorithm that each node was controlled by a thread, and used the following three methods to speed up, respectively, check which sub-regions need to be updated, minimize divergent warps, and avoid bank conflicts, then we apply this algorithm to the following three problems – all pairs shortest path, maze routing part in buffer insertion for ECO timing optimization, 3D global router with layer directive. Experimental results indicated that GPU can speedup to 31.2 times with CPU in APSP problem, ECO problem can also be accelerated to an average of 13.7 times with CPU, 3D global router compared with the results of GLADE [1], the legal solutions can be around 1.6% shorter on total wire length than GLADE, but the time spent was 9.84 times than GLADE and the overflow solution is also higher than GLADE, but if compared with the CPU, GPU is 11.52 times faster than CPU.

Topic Category 基礎與應用科學 > 資訊科學
資訊學院 > 資訊科學與工程研究所
Reference
  1. [1] Yen-Jung Chang, Tsung-Hsien Lee, and Ting-Chi Wang, “GLADE: A Modern Global Router Considering Layer Directives,” in Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 319-323, 2010.
    連結:
  2. [5] Tomohiro Okuyama, Fumihiko Ino, and Kenichi Hagihara, “A Task Parallel Algorithm for Computing the Costs of All-Pairs Shortest Paths on the CUDA-compatible GPU”, in Proceedings of 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications, pp. 284-291, 2008
    連結:
  3. [6] Quoc-Nam Tran, “Designing Efficient Many-core Parallel Algorithms for All-Pairs Shortest-Paths using CUDA,” 2010 Seventh International Conference on Information Technology, 2010.
    連結:
  4. [7] Shucai Xiao, and Wu-chun Feng, “Inter-Block GPU Communication via Fast Barrier Synchronization,” in Proceedings of 2010 IEEE International Symposium on Parallel and Distributed Processing, pp. 1, 2010
    連結:
  5. [8] Zhuo Feng, and Zhiyu Zeng, “Parallel Multigrid Preconditioning on Graphics Processing Units (GPUs) for Robust Power Grid Analysis,” in Proceedings of the 47th Design Automation Conference, DAC ’10, pp. 661–666, New York, NY, USA. ACM.
    連結:
  6. [9] Christopher I. Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, and Wen-Mei W. Hwu, “GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications,” in ACM International Conference on Computing Frontiers, 2008, pp. 273-282
    連結:
  7. [10] Jian-Syun Tong, “Topology-aware Buffer Insertion and GPU-based Massively Parallel Rerouting for ECO Timing Optimization,” National Chiao Tung University, Master, October 2010
    連結:
  8. [11] Ke-Ren Dai, Wen-Hao Liu, and Yih-Lang Li, “NCTU-GR: Efficient Simulated Evolution-Based Rerouting and Congestion-Relaxed Layer Assignment on 3-D Global Routing,” in Proceedings of Asia and South Pacific Design Automation Conference, Tokohama, Japan, 2009, pp. 1066-1077, 2008.
    連結:
  9. [2] NVIDIA Corporation, NVIDIA CUDA Programming Guide, February, 2010, version 3.0.
  10. [3] Lijuan Luo, Martin Wong, and Wen-mei Hwu, “An Effective GPU Implementation of Breadth-First Search,” in Proceedings of the 47th Design Automation Conference, pp. 52, 2010.
  11. [4] Gary J. Katz, and T. Kider Jr, “All-Pairs Shortest-Paths for Large Graphs on the GPU,” in Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, pp. 47-55
  12. [12] ISPD 2008 Global Routing Contest. [Online]. Available: http://archive.sigda.org/
  13. ispd2008/contests/ispd08rc.html