GPU實作增廣塊狀西門諾分散式演算法解三對角矩陣

解三對角矩陣在現在自然科學及工程學裡的問題，例如微分方程、偏微分方程、或是流體動力學等問題，已成為不可或缺的一環。因為三對角矩陣本身的特殊稀疏架構，有很多針對此架構的演算法被提出，在這之前，主流是運用對角旋轉(Diagonal Pivoting)的方式解決準確性的問題。然而對角旋轉有它的限制，在某些矩陣的測試下會得到錯誤的答案。此時，找尋不同於主流或改良主流演算法的必要性與日俱增，而增廣塊狀西門諾分散式演算法(Augmented Block Cimmino Distributed Algorithm)提供了一個不同於主流的演算法。在這篇論文中，我們研究並將增廣塊狀西門諾分散式演算法實作在GPU上。其中針對GPU的計算架構，我們應用了CUDA程式設計實作時需遵循的基本原則，例如透過轉換資料儲存格式達成GPU連續記憶體存取來增進效能，並提出了邊界填補修正方法，此方法能夠弭平大幅降低GPU效能的計算分歧，或是降低計算分歧的影響，進而使得GPU的效能獲得顯著提升。在實驗部分，我們比較了ABCD在Matlab上的準確性、ABCD在GPU上的準確性以及主流演算法的準確性，並比較了不同演算法的計算時間。

關鍵字

ABCD ； GPU ；平行；三對角矩陣；演算法； solver

並列摘要

The tridiagonal solver nowadays appears as a fundamental component in scientific and engi-neering problems, such as Alternating Direction Implicit methods (ADI), fluid Simulation, and Poisson’s equation. Due to the particular sparse format of tridiagonal matrix, many algorithms of solving the system are conceived. Previously, the main stream of solving the system is by using Diagonal Pivoting to reduce the accuracy issue. But, Diagonal Pivoting has its limits and will lead to error solution while the condition number increases. Augmented Block Cimmino Distributed (ABCD) algorithm serves as another option when trying to resolve the problem accurately. In this thesis, we study and implement the ABCD algorithm on GPU. Because of the spe-cial structure of tridiagonal matrices, we investigate the boundary padding technique to eliminate the execution branches on GPU for better performance. In addition, our implementation incorpo-rates various performance optimization techniques, such as memory coalesce, to further enhance the performance. In the experiments, we evaluate the accuracy and performance of our GPU im-plementation against CPU implementation, and analyze the effectiveness of each performance op-timization technique. The performance of GPU version is about 15 times faster than that of the CPU version.

並列關鍵字

ABCD ； GPU ； parallel ； tridiagonal matrix ； algorithm ； solver

參考文獻

[2] R. W. Hockney, “A fast direct solution of Poisson’s equation using Fourier analysis,” J. ACM, vol. 12, pp. 95–113, January 1965.

[4] E. Polizzi and A. H. Sameh, “A parallel hybrid banded system solver: The SPIKE algo-rithm,” Parallel Computing, vol. 32, no. 2, pp. 177–194, 2006.

[6] Harold S. Stone, “An efficient parallel algorithm for the solution of a tridiagonal linear system of equations,” Journal of ACM,Vol. 20, No. 1, pp. 27-38, January 1973.

[7] J. B. Erway, R. F. Marcia, and J. Tyson, “Generalized diagonal pivoting methods for tridi-agonal systems without interchanges,” IAENG International Journal of Applied Mathematics, vol. 4, no. 40, pp. 269–275, 2010.

[8] Ali Cevahir, Akira Nukada, Satoshi Matsuoka, “ High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning,” Computer Science - Research and Development May 2010, Volume 25, Issue 1-2, pp 83-91

被引用紀錄

吳姿瑤（2008）。中國大陸婦女就業困境之研究〔碩士論文，國立臺灣大學〕。華藝線上圖書館。https://doi.org/10.6342/NTU.2008.00384

陳姿如（2010）。台灣民俗藝陣發展之研究─以西港刈香為例〔碩士論文，長榮大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0015-1607201009214400

國際替代計量

GPU實作增廣塊狀西門諾分散式演算法解三對角矩陣

全文下載

主題瀏覽