
Scalable Design and Optimization of Big-Data Density Analysis Algorithms on Heterogeneous Many-core Platforms

Scalable Designs of Bayesian Sequential Partitioning on Heterogeneous Many-core Platforms

Advisor: 賴伯承

Abstract


With the growing prevalence of high-dimensional data, density estimation for such data, and the extraction of information from it, have become important tasks in computational science. Bayesian Sequential Partitioning (BSP) is a recently developed density estimation method that is statistically efficient for analyzing high-dimensional data. However, BSP is both computation-intensive and data-intensive: its statistical model requires heavy computation and repeatedly accesses large numbers of samples in a high-dimensional sample space. These characteristics make BSP a significant challenge for computing-platform design as data volumes continue to grow. This thesis proposes scalable designs of BSP on heterogeneous many-core platforms, extending a single-GPGPU design to multiple GPGPUs to overcome the limited memory on a single device. It further goes beyond this system constraint, enabling BSP analysis of even larger data sets on systems equipped with fewer GPGPUs. Experimental results show that the proposed techniques effectively improve on the single-GPGPU design.
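The data-intensive step the abstract refers to is the repeated counting of samples that fall inside the hyper-regions produced by each partition cut. The following is a minimal, hypothetical Python sketch of that counting kernel (function and variable names are illustrative, not from the thesis; the actual design runs this on GPGPUs):

```python
import random

def count_in_region(samples, lower, upper):
    """Count samples inside the axis-aligned hyper-region [lower, upper).

    samples: list of d-dimensional points; lower/upper: d-dimensional bounds.
    """
    return sum(
        all(lo <= x < hi for x, lo, hi in zip(s, lower, upper))
        for s in samples
    )

# Illustrative 3-D example: split the unit cube along dimension 0 at 0.5
random.seed(0)
samples = [tuple(random.random() for _ in range(3)) for _ in range(10000)]
left = count_in_region(samples, (0.0, 0.0, 0.0), (0.5, 1.0, 1.0))
right = count_in_region(samples, (0.5, 0.0, 0.0), (1.0, 1.0, 1.0))
assert left + right == len(samples)  # the cut partitions the cube exactly
```

Because every candidate cut re-scans the sample set, the cost grows with both the number of samples and the number of sequential partitioning steps, which is why the thesis targets throughput-oriented GPGPU platforms.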

Parallel Abstract


Due to the prevalence of feature-rich data, high-dimensional density estimation has become an important and effective machine-learning analysis technique. Bayesian Sequential Partitioning (BSP) is a recently developed density estimation algorithm that has been shown to be statistically effective. However, BSP is known to be both computation-intensive and data-intensive: its statistical models require heavy algorithmic computing, while the iterative counting of the massive number of samples within specific hyper-regions requires access to large volumes of data. These attributes pose great design challenges to attaining superior performance. This thesis proposes scalable designs of BSP on heterogeneous many-core platforms. It first introduces a Distributed Execution Mode (DEM) of the BSP algorithm to exploit the GPGPUs in a system; DEM combines a fully distributed data structure with an execution-management scheme to leverage the computing capability of multiple GPGPUs. It then proposes a Collaborative Processing Mode (CPM) to efficiently orchestrate execution between the host and the devices when the data size exceeds the total memory capacity of the GPGPUs. The proposed DEM demonstrates up to a 3.09x runtime improvement with four GPGPUs, and CPM achieves a 9.97x speedup for the After-Copula phase of BSP.
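The host/device orchestration behind CPM can be pictured as streaming the data set through device memory in chunks and reducing the partial results on the host. The sketch below simulates this on the CPU; the function name, capacity parameter, and kernel are hypothetical placeholders, not the thesis's actual interface:

```python
def process_in_chunks(samples, device_capacity, kernel):
    """Stream `samples` through a device whose memory holds only
    `device_capacity` samples at a time, accumulating partial results
    on the host (the CPM idea, simulated on the CPU)."""
    total = 0
    for start in range(0, len(samples), device_capacity):
        chunk = samples[start:start + device_capacity]  # host -> device copy
        total += kernel(chunk)                          # device-side computation
    return total                                        # host-side reduction

data = list(range(10))
result = process_in_chunks(data, 4, sum)  # chunks of 4, 4, and 2
# result == sum(range(10)) == 45
```

The design choice this illustrates is that each chunk's result must be combinable on the host, so the device never needs to hold the whole data set, which is what lets a system with fewer GPGPUs analyze data sets larger than total device memory.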

Keywords

GPGPU; scalable design; BSP; multi-GPGPU

