Block Size Tuning for QR Factorization on CPU-GPU Hybrid Systems

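On CPU-GPU hybrid systems, the fixed block size used by MAGMA's QR factorization leaves the CPU idle. To improve performance, we propose a method that tunes the block size automatically. First, we build separate regression models for the subroutines that run on the CPU and on the GPU. Next, we use an optimization method to determine the best block size; the objective function is designed to reduce the performance loss caused by CPU and GPU idle time. Finally, we present numerical results that demonstrate the performance gains of our method.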
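The abstract does not state which regression form is used. The following is a minimal sketch, assuming that each device's time is affine in the floating-point operation count of its routine: the CPU panel factorization of an m × nb panel (about 2·m·nb² − (2/3)·nb³ flops) and the GPU application of an nb-wide block reflector to an m × n trailing matrix (about 4·m·n·nb flops, leading order). The names `panel_flops`, `update_flops`, `AffineModel`, and `fit_affine` are hypothetical, and the timings in the usage section are synthetic, not measurements.

```python
from dataclasses import dataclass
from typing import Sequence


def panel_flops(m: int, nb: int) -> float:
    """Approximate flops of Householder QR on an m x nb panel (CPU side)."""
    return 2.0 * m * nb * nb - (2.0 / 3.0) * nb ** 3


def update_flops(m: int, n: int, nb: int) -> float:
    """Leading-order flops of applying an nb-wide block reflector to an m x n matrix (GPU side)."""
    return 4.0 * m * n * nb


@dataclass
class AffineModel:
    """Hypothetical cost model: time = intercept + slope * flops."""
    intercept: float
    slope: float

    def predict(self, flops: float) -> float:
        return self.intercept + self.slope * flops


def fit_affine(xs: Sequence[float], ys: Sequence[float]) -> AffineModel:
    """Ordinary least-squares fit of y = c0 + c1 * x (closed form for one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return AffineModel(intercept=my - slope * mx, slope=slope)


if __name__ == "__main__":
    # Synthetic training data from made-up device rates (NOT real measurements):
    # CPU ~ 50 GFLOP/s plus 0.1 ms overhead, GPU ~ 500 GFLOP/s plus 0.02 ms overhead.
    sizes = [(m, nb) for m in (1024, 2048, 4096) for nb in (32, 64, 128)]
    cpu_x = [panel_flops(m, nb) for m, nb in sizes]
    cpu_y = [1e-4 + x / 50e9 for x in cpu_x]
    gpu_x = [update_flops(m, m // 2, nb) for m, nb in sizes]
    gpu_y = [2e-5 + x / 500e9 for x in gpu_x]

    cpu_model = fit_affine(cpu_x, cpu_y)  # independent model for the CPU routine
    gpu_model = fit_affine(gpu_x, gpu_y)  # independent model for the GPU routine
    print(cpu_model, gpu_model)
```

A single flop-count feature keeps each fit to a closed-form slope and intercept; a practical tuner would probably need richer features (for example, to capture GPU efficiency that varies with nb), which this sketch does not attempt.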

Keywords

GPU; QR factorization; auto-tuning

Parallel Abstract

In CPU-GPU hybrid systems, the QR factorization in MAGMA leaves the CPU idle because of its fixed block size. To improve the computational efficiency of MAGMA QR factorization, we propose a dynamic block size auto-tuning scheme for CPU-GPU hybrid systems. Our approach is data-driven. First, we model the CPU and GPU costs in MAGMA QR factorization with two independent regression models fitted to collected training data. Next, based on these fitted models, we propose a block size optimization scheme that tunes the block size adaptively so as to minimize a cost objective function. The objective function is designed to balance the workloads between the CPU and GPU according to the performance models. Numerical results demonstrate the performance gains achieved by the proposed QR factorization algorithm.
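The abstract does not name the optimizer. One way to turn two fitted cost models into a sequence of block sizes is dynamic programming over the number of columns already factored; this is a swapped-in technique for illustration, not necessarily the method used here. The sketch assumes a simplified schedule in which, at each step, the CPU panel factorization and the GPU trailing update run concurrently, so a step lasts max(T_cpu, T_gpu) and the gap |T_cpu − T_gpu| is idle time on the faster device; minimizing the sum of step times therefore penalizes imbalance. The models `example_cpu` and `example_gpu` are hypothetical affine-in-flops models with made-up coefficients, and all function names are illustrative.

```python
from functools import lru_cache
from typing import Callable, List, Sequence, Tuple

CpuModel = Callable[[int, int], float]       # (rows, nb) -> seconds for the panel
GpuModel = Callable[[int, int, int], float]  # (rows, cols, nb) -> seconds for the update


def example_cpu(rows: int, nb: int) -> float:
    # Hypothetical fitted model: 0.1 ms overhead + panel flops at 50 GFLOP/s.
    return 1e-4 + (2.0 * rows * nb * nb - (2.0 / 3.0) * nb ** 3) / 50e9


def example_gpu(rows: int, cols: int, nb: int) -> float:
    # Hypothetical fitted model: 0.02 ms overhead + update flops at 500 GFLOP/s.
    return 2e-5 + 4.0 * rows * cols * nb / 500e9


def step_time(rows: int, cols: int, nb: int, cpu: CpuModel, gpu: GpuModel) -> float:
    """One step: CPU panel and GPU update overlap, so the slower one sets the pace."""
    update = gpu(rows, cols, nb) if cols > 0 else 0.0  # no trailing matrix left
    return max(cpu(rows, nb), update)


def plan_time(m: int, n: int, plan: Sequence[int], cpu: CpuModel, gpu: GpuModel) -> float:
    """Total modeled time for factoring an m x n matrix with the given block sizes."""
    total, j = 0.0, 0
    for nb in plan:
        total += step_time(m - j, n - j - nb, nb, cpu, gpu)
        j += nb
    if j != n:
        raise ValueError("block sizes must sum to n")
    return total


def fixed_plan(n: int, nb: int) -> List[int]:
    """Constant block size (the last block may be narrower)."""
    return [min(nb, n - j) for j in range(0, n, nb)]


def plan_block_sizes(m: int, n: int, candidates: Sequence[int],
                     cpu: CpuModel, gpu: GpuModel) -> Tuple[float, List[int]]:
    """Dynamic programming over j = columns already factored; returns (cost, plan)."""
    if m < n or not candidates or min(candidates) <= 0:
        raise ValueError("need m >= n and positive candidate block sizes")

    @lru_cache(maxsize=None)
    def best(j: int) -> Tuple[float, Tuple[int, ...]]:
        if j == n:
            return 0.0, ()
        best_cost, best_plan = float("inf"), ()
        for nb in sorted({min(c, n - j) for c in candidates}):  # clip at the last panel
            rest_cost, rest_plan = best(j + nb)
            cost = step_time(m - j, n - j - nb, nb, cpu, gpu) + rest_cost
            if cost < best_cost:
                best_cost, best_plan = cost, (nb,) + rest_plan
        return best_cost, best_plan

    cost, plan = best(0)
    return cost, list(plan)


if __name__ == "__main__":
    cost, plan = plan_block_sizes(4096, 4096, [32, 64, 128, 256], example_cpu, example_gpu)
    print(f"adaptive: {cost:.4f} s, block sizes {plan}")
    for nb in (32, 64, 128, 256):
        t = plan_time(4096, 4096, fixed_plan(4096, nb), example_cpu, example_gpu)
        print(f"fixed nb={nb}: {t:.4f} s")
```

Because every fixed-size schedule built from the candidates is one path the recursion explores, the modeled cost of the returned plan is never worse than any of them; whether that carries over to real hardware depends on how closely the fitted models track measured timings.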

Parallel Keywords

GPU; QR Factorization; Auto Tuning

