
Accelerating Gaussian Process Regression with a Clustering Algorithm

Scaling Gaussian Process Regression for Big Data

Advisor: 盧信銘

Abstract


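Gaussian process regression is a supervised learning method in machine learning. It predicts well but has high time complexity: training requires inverting a matrix whose numbers of rows and columns grow linearly with the size of the training set, so training takes too long when the data set is large. This thesis proposes a method that uses a clustering algorithm to shorten Gaussian process training time. Experiments show that, with a training set of forty thousand data points, the method speeds up training by more than seventy times while reducing predictive performance only slightly, outperforming other representative acceleration methods.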
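The cost argument above can be written out as a short worked equation. This is a sketch in standard GP notation, none of which appears in the abstract: $K$ is the $n \times n$ kernel matrix, $\sigma_n^2$ the noise variance, $\mathbf{k}_*$ the kernel vector between a test point and the training points, and $m$ a hypothetical number of clusters. The second line assumes equal-sized clusters, each fitted with its own independent GP, which may differ from the scheme the thesis actually uses.

```latex
\begin{align}
  \bar{f}_* &= \mathbf{k}_*^{\top}\,\bigl(K + \sigma_n^2 I\bigr)^{-1}\,\mathbf{y},
  \qquad K \in \mathbb{R}^{n \times n}
  \;\Rightarrow\; \text{training cost } O(n^3) \\
  m \cdot O\!\bigl((n/m)^3\bigr) &= O\!\bigl(n^3 / m^2\bigr)
  \qquad \text{($m$ equal-sized clusters, one GP per cluster)}
\end{align}
```

Under these assumptions, splitting the data into $m$ groups reduces the dominant cubic term by a factor of about $m^2$.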

Parallel Abstract (English)


Gaussian process (GP) regression is a non-parametric supervised learning method in machine learning. GP methods have excellent predictive performance but take too long to train, because training must invert a square matrix whose number of rows and columns grows linearly with the number of training data points, resulting in cubic time complexity. We propose a method that uses a clustering algorithm to speed up the training phase and approximate the prediction. Experiments show that, for a training set of forty thousand data points, our method takes less than one seventieth of the time of the original GP while the error does not grow much. Compared with other approximation methods, our method uses less time and produces predictions with lower error.
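To make the idea concrete, here is a minimal pure-Python sketch of one way clustering can cut GP training cost: partition the training points with k-means, fit an independent exact GP on each partition, and answer a query with the model of the nearest cluster center. The abstract does not describe the thesis's actual algorithm, so every design choice here is an assumption for illustration: the k-means routine, the squared-exponential kernel, the fixed hyperparameters, the nearest-center routing rule, and all function names.

```python
import math
import random

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two points given as tuples."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def fit_gp(X, y, noise=1e-2):
    """Exact GP fit: solve (K + noise*I) alpha = y by Cholesky, O(n^3)."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # Cholesky factorisation K = L L^T
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):                      # forward substitution: L z = y
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):            # back substitution: L^T alpha = z
        alpha[i] = (z[i] - sum(L[k][i] * alpha[k]
                               for k in range(i + 1, n))) / L[i][i]
    return alpha

def predict_gp(X, alpha, x_new):
    """Posterior mean at x_new: k_*^T alpha."""
    return sum(a * rbf(x, x_new) for x, a in zip(X, alpha))

def nearest(centers, x):
    """Index of the center closest to x in squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(centers[i], x)))

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd-style k-means (assumed here); returns the final centers."""
    centers = random.Random(seed).sample(X, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(centers, x)].append(x)
        # An empty group keeps its previous center.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def fit_clustered_gp(X, y, k):
    """Fit one independent GP per non-empty k-means cluster."""
    centers = kmeans(X, k)
    parts = [([], []) for _ in range(k)]
    for x, t in zip(X, y):
        i = nearest(centers, x)
        parts[i][0].append(x)
        parts[i][1].append(t)
    return [(c, Xc, fit_gp(Xc, yc)) for c, (Xc, yc) in zip(centers, parts) if Xc]

def predict_clustered_gp(model, x_new):
    """Route the query to the GP whose cluster center is nearest."""
    i = nearest([c for c, _, _ in model], x_new)
    _, Xc, alpha = model[i]
    return predict_gp(Xc, alpha, x_new)

# Toy usage: noise-free samples of sin(x) on [0, 6).
X = [(0.1 * i,) for i in range(60)]
y = [math.sin(x[0]) for x in X]
model = fit_clustered_gp(X, y, k=3)
pred = predict_clustered_gp(model, (1.0,))
exact = predict_gp(X, fit_gp(X, y), (1.0,))   # full GP for comparison
```

Comparing `pred` with `exact` on held-out points gives a rough sense of how much accuracy the partitioning costs. Each local fit factorises a much smaller matrix than the full GP, which is where the time saving comes from in this sketch.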

