GPU平台下的高度平行化繞線演算法與其應用

GPU原為設計用來處理圖形方面應用之處理器，因為其多核心與單指令流多執行緒架構，已有越來越多演算法被平行化並使用GPU來加速。本研究使用GPU來加速電子自動化設計中常需使用到的演算法 – 在以格線為基礎的圖形內的繞線演算法。我們提出了每一個節點由一個執行緒控制為基礎的演算法，並使用三個方法來加速，分別是判斷並只呼叫需要更新的區域、減少分歧路線、避免資料讀取衝突，之後我們將此演算法應用至以下三個問題中 – 全點對最短路徑問題、使用緩衝器ECO來解決時間優化問題、三維全域繞線器。實驗結果指出在全點對最短路徑問題中，GPU最多可加速至CPU的31.2倍，ECO問題亦可透過GPU加速至平均13.7倍，三維全域繞線器與GLADE [1] 比較之結果，可繞出合法解之平均總線長短1.6%，但時間花費為GLADE的9.84倍且解不出的overflow亦比GLADE還要高，但若與CPU相比，GPU的時間比CPU提升了11.52倍。

關鍵字

繞線演算法；電子設計自動化；平行化；全域繞線器

並列摘要

GPU was originally a processor designed to handle the graphic applications, because of its multi-core and Single Instruction, Multiple Thread (SIMT) architecture, more and more algorithms are parallel and use GPU to accelerate. This study uses GPU to accelerate one of the electronic design automation algorithms – maze routing algorithm in grid-based graph. We proposed a algorithm that each node was controlled by a thread, and used the following three methods to speed up, respectively, check which sub-regions need to be updated, minimize divergent warps, and avoid bank conflicts, then we apply this algorithm to the following three problems – all pairs shortest path, maze routing part in buffer insertion for ECO timing optimization, 3D global router with layer directive. Experimental results indicated that GPU can speedup to 31.2 times with CPU in APSP problem, ECO problem can also be accelerated to an average of 13.7 times with CPU, 3D global router compared with the results of GLADE [1], the legal solutions can be around 1.6% shorter on total wire length than GLADE, but the time spent was 9.84 times than GLADE and the overflow solution is also higher than GLADE, but if compared with the CPU, GPU is 11.52 times faster than CPU.

並列關鍵字

GPU ； CUDA ； maze routing algorithm ； EDA ； parallel programing ； global router

參考文獻

[1] Yen-Jung Chang, Tsung-Hsien Lee, and Ting-Chi Wang, “GLADE: A Modern Global Router Considering Layer Directives,” in Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 319-323, 2010.

[5] Tomohiro Okuyama, Fumihiko Ino, and Kenichi Hagihara, “A Task Parallel Algorithm for Computing the Costs of All-Pairs Shortest Paths on the CUDA-compatible GPU”, in Proceedings of 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications, pp. 284-291, 2008

[6] Quoc-Nam Tran, “Designing Efficient Many-core Parallel Algorithms for All-Pairs Shortest-Paths using CUDA,” 2010 Seventh International Conference on Information Technology, 2010.

[7] Shucai Xiao, and Wu-chun Feng, “Inter-Block GPU Communication via Fast Barrier Synchronization,” in Proceedings of 2010 IEEE International Symposium on Parallel and Distributed Processing, pp. 1, 2010

[8] Zhuo Feng, and Zhiyu Zeng, “Parallel Multigrid Preconditioning on Graphics Processing Units (GPUs) for Robust Power Grid Analysis,” in Proceedings of the 47th Design Automation Conference, DAC ’10, pp. 661–666, New York, NY, USA. ACM.

國際替代計量

GPU平台下的高度平行化繞線演算法與其應用

全文下載

主題瀏覽