類Locally Linear Embedding資料降維演算法GPU加速框架

近年來隨著資料的多元化與多樣化，還有資料量也以倍數亦或指數的方式來進行成長，雖然人們可以很輕鬆的獲取到任何形式的資料，但人們要從這些取得的資料獲得更深一資料隱藏的關係或意義要進行資料分析、資料探勘亦或機器學習，以上的技術若要直接經由原始的資料來進行運算是十分困難的一件事，因為目前存在的原始資料集都是存在高維度的空間中，而如果要從高維度的空間中表現出資料間的關係，不只在空間表現上有困難，就算表現出來要用這些資料進行後續處理或讓人們理解這些資料間的關係都是不可能，因此，我們需要將把取得到的高維度資料集合進行資料降維，用資料降維技術來將資料內存在雜訊與多餘的資訊去掉，這樣一來就可以在低維度的空間中呈現這些資料，在低維度的空間來呈現就可以比較容易理解資料間的關係，而目前有很多種資料降維的研究是在研究降維的準確性及新的降維運算，但比較少的論文在探討將資料降維技術進行平行運算，所以本篇論文將探討平行資料降維技術。目前一般的空間集合中資料集合是成非線性的分佈，所以我們將針對非線性的資料降維來進行平行加快的運算。在非線性的資料降維技術中屬LLE這類的演算法可平行度最高，因此，我們將採用這個資料降維演算法進行平行加速，但如果單純的把資料降維技術平行在CPU的計算平台上以目前的龐大資料量會造成龐大的運算量加速效果是十分有限，所以除了CPU平行之外還有另一個平行的平台運算技術，就是圖形運算處理器（GPU）平行運算，本篇論文將把LLE這個資料降維技術上的KNN搜尋演算法以及大型稀疏矩陣解特徵值的這兩個部份平行在GPU的運算平台上，並透過這個方法來加快所有基於LLE為基礎而發展出來的資料降維技術其執行時間。本篇論文將KNN搜尋演算法以及大型稀疏矩陣解特徵值的這兩個部份平行在GPU的運算平台（CUDA）上後得到了不錯的加速效果，在KNN的方面本篇的整體加速到達了四十至五十倍，在解大型稀疏矩陣特徵值的部份加速至十倍左右，至於整體的資料降維演算法加速十倍左右，在實驗結果得知我們有效的運用GPU平行來加速LLE這類的演算法。

關鍵字

圖形運算處理器； CUDA ； LLE演算法； KNN搜尋；大型稀疏矩陣解特徵值

並列摘要

There are more and more data format in the information wor. And there are many algorithms or techniques to present data’s relationship, such as data mining、data analysis. But the data always have high dimension data structure in real world. It’s so difficult implement these data’s relation presentation algorithms or techniques, because the data dimension is multi-dimension. It will take much effort to present data’s relation and let use to realize the graph. So we need a technique to reduce data dimension. There are many data dimension reduction algorithms, such as PCA, MDS, Isomap and LLE…etc. And there are many research papers to discuss the issue which are like “how can we reduce the data dimension accurately” or “how can we modify some algorithms flow to increase the algorithm’s precision”. But there are few papers to talk about these algorithms efficiency or speed up these algorithms. This thesis will increase a dimension reduction algorithm’s computation speed through parallelize that algorithm and different computation platform. First, we chose one dimension reduction algorithm as our improve target. And the algorithm is Locally Linear Embedded（LLE）. Because, the data set always form a nonlinear graph in real world. And LLE is a nonlinear dimension reduction algorithm. Second reason, the LLE algorithm have high parallelize computation ability. So, our target is how improve LLE algorithm in parallel computation. GPU computing architecture is so hot in recently. It has powerful float computation ability and high parallel computation capability. So, we use GPU computing architecture to execute our parallel LLE algorithm. We only port KNN search algorithm and large sparse eigen solution (LSES) functions to GPU. Because KNN search algorithm and LSES have heaving calculation loading. We get good performance after we port KNN and LSES. The parallel GPU KNN algorithm speedup 40X~50X performance. And LSES increase 10X performance.

並列關鍵字

LLE ； GPU ； KNN ； Dimension reduction

參考文獻

[1]. Roweis, Sam T. and Saul, Lawrence K. Nonlinear Dimensionality Reduction by Locally Linear Embedding. SCIENCE. 2000, Vol. 290.

[4]. Visualization using Locally Linear Embedding method. Nhut, Nguyen Minh. 2006.

[5]. Kruskal, Joseph B. and Wish, Myron. Multidimensional scaling. s.l. : Sage Publications, Inc, 1978. 0803909403.

[6]. Tenenbaum, Joshua B., Silva, Vin de and Langford, John C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. SCIENCE. 2000, Vol. 290.

[7]. Saul, Lawrence K. and Roweis, Sam T. An Introduction to Locally Linear Embedding.

國際替代計量

類Locally Linear Embedding資料降維演算法GPU加速框架

主題瀏覽