透過您的圖書館登入
IP:18.119.123.252
  • 學位論文

在叢集伺服器上高效能多維奇異值分解及基於閉曲線積分的特徵值分解

Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster

指導教授 : 王偉仲
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


我們改進了"多維奇異值分解"及"基於閉曲線積分的特徵值分解"。在"多維奇異值分解"中,我們使用不同的方式實作兩個演算法,並提升解決巨量資料的能力。隨著資料越來越多樣,找到方法去快速且有效得壓縮和分析資料中的多角關係對於高效能計算中是非常重要的。我們實做了兩個已經存在的演算法:多維奇異值分解及逐步降維多維奇異值分解將多維矩陣分解,利用 GPU 來加速他們是非常困難的,因為我們幾乎無法把資料一次放進 GPU 的記憶體當中。所以我們利用 QR 和 Gram 來協助我們縮減問題的大小,而我們實作 QR 和 Gram 是將資料分成一部份一部份來解決,這樣不但能讓 GPU 有能力處理,還能利用計算的部分遮蓋掉資料搬移的時間,最後我們相對於原先利用 cuda 函式庫時做的能加速 163.21 倍,在未來希望能夠將這個方法使用在實際問題上。在"基於閉曲線積分的特徵值分解"中,我們藉由基於閉曲線積分的特徵值分解,展示了一個分治法的處理方式去解決區域中有太多的特徵對的問題,並應用它在有機材料模擬。在許多應用上,解特徵值分解扮演了重要的角色,這些矩陣通常都是稀疏的大矩陣,但通常實際上我們只需要一小部分的特徵對,現在有 FEAST 或 CIRR 能夠幫忙解決這類的問題,然而當區域中有許多特徵對時,會變得非常慢,所以必須將問題切成許多小問題。決定分區雖然困難但非常重要,要讓每個小問題都能被輕鬆解決,另外當有些特徵對收料的比較早時,原先的方法還是會花時間繼續迭代他們。我們介紹了兩種分區方式:藉由預測特徵對的數量來分解、藉由先備知識來分解。我們提升解決區域中有過多的特徵對問題,且藉由恰當的分區,分治法會比較原先的還快,也利用凍結技巧去讓程式不要花費時間在已經收斂的特徵對上。之後想設計一個方法能夠讓各個分區解決的時間差不多。

並列摘要


We implement, accelerate, and improve "Higher-Order Singular Value Decomposition" and "Contour Integral based Eigen Decomposition". In "Higher-Order Singular Value Decomposition", we implemented two methods with different strategies and improved the ability to solve the large tensor problem. With the explosion of big data, finding ways of compressing and analyzing large data sets with the multi-way relationship - i.e., tensors - quickly and efficiently have become critical in High-Performance Computing. We implement two existed methods which are Higher-Order Singular Value Decomposition and Sequential Truncated Higher-Order Singular Value Decomposition to achieve Tucker Decomposition. Implementing them with GPU is very difficult because we usually can not store the whole tensor into GPU memory. We use QR method and Gram method to reduce the problem size to make its size allowed by GPU memory. We also implemented QR method and Gram by part-by-part. It can help us to solve the large data problem and use computing to cover data transferring. Finally, We achieve 163.21x speedup over a CUDA library-based solution. In the future, we want to apply it to the real application. In "Contour Integral based Eigen Decomposition", we proposed a divide-and-conquer flow to solving the certain eigenpairs in the specific region containing many eigenpairs with eigensolver based on contour integral with the locking technique, and use it to solve the generalized eigenvalue problem from the organic material simulation. Solving eigenvalue problems is an essential part of many applications. Those matrices are often large and sparse, but the eigenpairs only are required in the region of interest. Several solvers can solve the eigenpairs in the selected region such as FEAST and CIRR. When there are many eigenpairs in the selected region, the performance is slow, so the partition of the region is needed. Deciding the partition is very difficult but critical such that solving each sub-region should be efficient. When some eigenvector is converged early, the solver still spends time on them. We introduce the two partition method, uniform dividing by the estimated eigenvalue number and dividing by domain acknowledgment. We increase the eigensolver ability to solve the region containing many eigenpairs and get better performance with the proper partition. We also use the locking technique to avoid spending the time on converged eigenpairs. In the future, we would like to design an automatic flow to generate the partition whose sub-region spends almost the same executing time.

參考文獻


[1]  B. W. Bader and T. G. Kolda. Algorithm 862: Matlab tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software (TOMS), 32(4):635–653, 2006.
[2]  L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253– 1278, 2000.
[3]  J. Demmel, L. Grigori, M. Hoemmen, and J. Langou. Communication-optimal parallel and sequential qr and lu factorizations. SIAM Journal on Scientific Computing, 34(1):A206–A239, 2012.
[4]  J. W. Demmel. Applied numerical linear algebra, volume 56. Siam, 1997.
[5]  E. Di Napoli, E. Polizzi, and Y. Saad. Efficient estimation of eigenvalue counts in 
an interval. Numerical Linear Algebra with Applications, 23(4):674–692, 2016.

延伸閱讀