對稱正定矩陣Cholesky分解之GPU加速研究

本研究旨在利用近期發展之圖形處理單元(GPU)硬體，提昇特定矩陣運算之效能。在求解線性最小平方差問題中，常需要計算共變異矩陣之反矩陣。這時利用共變異矩陣滿足對稱正定矩陣之特性，求解過程可使用Cholesky分解，其相較於適用一般矩陣之LU分解，速度還快上一倍。近年來隨著技術進步，一片圖形卡中常可置入數百核作運算，相較於CPU之少數8核或16核，若能將圖形卡視為一般用途圖形處理單元(GPGPU)，將可加速執行一般可平行化運算。目前文獻中已有許多矩陣運算平行化之研究，本文將針對求對稱正定矩陣逆運算常用之Cholesky分解，找尋現有網路上公開之GPU實作原始碼，分析其結構，評比其表現，並提出改良方法。經過測試，相較於純CPU執行之Intel數學核心函式庫(Math Kernel Library，MKL)版本，本文改良法可透過GPU提昇維度10000方陣之Cholesky分解速度達3.5倍快程度。

關鍵字

Cholesky分解；圖形處理單元(GPU) ； Intel數學核心函式庫(MKL)

並列摘要

This work aims to apply the recently developed Graphics Processing Unit (GPU) to performance enhancement of a specific matrix operation. When solving the linear least squares problem, it is often necessary to compute the inverse of a covariance matrix. As the covariance matrix satisfies the condition of a symmetric positive definite matrix, the Cholesky decomposition can be used which is twice as fast as the LU decomposition on general matrices. In recent years, with the advances in technology, a graphics card can accommodate hundreds of cores compared to the small number of 8 or 16 cores on CPU. Therefore a trend is seen to use the graphics card as a general purpose graphics processing unit (GPGPU) for parallel computation. There are already many works on parallel matrix operations in the literature. This work will focus on the Cholesky decomposition commonly used in computing the inverse of a symmetric positive definite matrix. First, several open source GPU-based Cholesky decomposition programs on the Internet were located, analyzed, and evaluated. Then several strategies for performance improvement were proposed and tested. After experiments, compared to the CPU version using the Intel Math Kernel Library (MKL), our proposed GPU improvement strategy can achieve a speedup of 3.5x on the Cholesky decomposition of a square matrix of dimension 10,000.

並列關鍵字

Cholesky Decomposition ； Graphics Processing Units ； Intel Math Kernel Library

參考文獻

[11] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery, Numerical Recipes in C - The Art of Scientific Computing, 2nd Edition, Cambridge University Press, 1992.

[3] Gene H. Golub, and Charles F. Van Loan, Matrix Computations, Third Edition, The Johns Hopkins University Press, 1996.

[4] Hatem Ltaief, Stanimire Tomov, Rajib Nath, and Jack Dongarra, "Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators."

[5] IntelR, IntelR Math Kernel Library Documentation.

[7] MAGMA, Matrix Algebra on GPU and Multicore Architectures.

國際替代計量

對稱正定矩陣Cholesky分解之GPU加速研究

全文下載

主題瀏覽