Analysis and Implementation of Large-scale Linear RankSVM in Distributed Environments

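In learning to rank, the linear ranking support vector machine (rankSVM) is a useful method for quickly obtaining a baseline model for comparison. Although its parallelization has been studied and implemented on GPUs, that implementation may be unable to handle large-scale data sets. In this thesis, we propose two parallel schemes that train L2-loss linear rankSVM with a distributed Newton method. We carefully discuss techniques for reducing communication cost and speeding up computation, and compare the strengths and weaknesses of the two parallel schemes on dense and sparse data sets. Experiments show that the proposed methods are much faster than single-machine training on two kinds of data sets: those with far more instances than features, and those with far more features than instances.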

Keywords

Large-scale learning; Ranking support vector machines; Distributed Newton method

Parallel Abstract

Linear rankSVM is a useful method for quickly producing a baseline model in learning to rank. Although its parallelization has been investigated and implemented on GPUs, that implementation may not handle large-scale data sets. In this thesis, we propose a distributed trust region Newton method for training L2-loss linear rankSVM with two kinds of parallelization. We carefully discuss techniques for reducing the communication cost and speeding up the computation, and compare the two kinds of parallelization on dense and sparse data sets. Experiments show that our distributed methods are much faster than the single-machine method on two kinds of data sets: one with far more instances than features, and the other with far more features than instances.
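As a rough illustration of what is being trained, the sketch below evaluates the commonly used L2-loss linear rankSVM objective, f(w) = 0.5 * w'w + C * sum over preference pairs of max(0, 1 - w'(x_i - x_j))^2, together with its gradient. The abstract does not state this formula, so its exact form, the brute-force pair enumeration, the function names, and the split of the data by query are all assumptions made for illustration. The split is not claimed to match either of the two parallelizations studied in the thesis, and the trust region Newton step itself is omitted.

```python
import numpy as np

def pairwise_loss_grad(w, X, y, q, C=1.0):
    """Squared-hinge pairwise loss and its gradient (regularizer excluded).

    Brute-force enumeration of preference pairs (same query, y[i] > y[j]);
    fine for illustration, far too slow for large-scale data.
    """
    scores = X @ w
    loss, grad = 0.0, np.zeros_like(w)
    for i in range(len(y)):
        for j in range(len(y)):
            if q[i] == q[j] and y[i] > y[j]:
                margin = 1.0 - (scores[i] - scores[j])
                if margin > 0:
                    loss += C * margin ** 2
                    grad += -2.0 * C * margin * (X[i] - X[j])
    return loss, grad

def objective(w, X, y, q, C=1.0):
    """Single-machine objective value and gradient."""
    loss, grad = pairwise_loss_grad(w, X, y, q, C)
    return 0.5 * (w @ w) + loss, w + grad

def distributed_objective(w, parts, C=1.0):
    """Same quantities when the data is split across hypothetical workers.

    Each element of `parts` is an (X, y, q) tuple held by one worker, and no
    query is assumed to span two workers. The Python sums below stand in for
    an allreduce of the local losses and gradients.
    """
    results = [pairwise_loss_grad(w, Xp, yp, qp, C) for Xp, yp, qp in parts]
    loss = sum(r[0] for r in results)
    grad = sum(r[1] for r in results)
    return 0.5 * (w @ w) + loss, w + grad
```

Because preference pairs are formed only within a query, splitting the instances by query keeps every pair on a single worker, so the summed local results equal the single-machine values and only a scalar and a vector the length of w need to be communicated per evaluation.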

Parallel Keywords

Learning to rank; Ranking support vector machines; Large-scale learning; Linear model; Distributed Newton method

