Analysis and Implementation of Large-scale Linear RankSVM in Distributed Environments

Advisor: Chih-Jen Lin

Abstract


Linear rankSVM is a useful method for quickly producing a baseline model in learning to rank. Although its parallelization has been investigated and implemented on GPUs, that implementation may not be able to handle large-scale data sets. In this thesis, we propose a distributed trust region Newton method for training L2-loss linear rankSVM, with two kinds of parallelization. We carefully discuss techniques for reducing the communication cost and speeding up the computation, and we compare the two kinds of parallelization on dense and sparse data sets. Experiments show that our distributed methods are much faster than the single-machine method on two kinds of data sets: those in which the number of instances is much larger than the number of features, and those in which the number of features is much larger than the number of instances.
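
For readers unfamiliar with the model, the following is a minimal sketch of the standard L2-loss linear rankSVM objective, 0.5 * ||w||^2 + C * sum over preference pairs of max(0, 1 - w.(x_i - x_j))^2. It is not the thesis's implementation: it is a naive single-machine evaluation that enumerates all pairs, and it shows neither the trust region Newton solver nor either of the distributed parallelizations. The function name, argument names, and toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def l2_ranksvm_objective(w, X, y, q, C=1.0):
    """Naive evaluation of the L2-loss linear rankSVM objective.

    w: weight vector, X: list of feature vectors, y: relevance labels,
    q: query ids. A preference pair is two instances from the same query
    whose relevance labels differ. (All names here are illustrative.)
    """
    # Linear scores w . x for every instance.
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    loss = 0.0
    for i, j in combinations(range(len(X)), 2):
        if q[i] != q[j] or y[i] == y[j]:
            continue  # not a preference pair
        # Order the pair so that `hi` is the more relevant instance.
        hi, lo = (i, j) if y[i] > y[j] else (j, i)
        margin = 1.0 - (scores[hi] - scores[lo])
        if margin > 0.0:
            loss += margin * margin  # squared hinge (L2) loss
    return 0.5 * sum(wk * wk for wk in w) + C * loss


# Toy data: three instances in one query, one feature each.
X = [[2.0], [0.0], [1.0]]
y = [2, 1, 1]
q = [0, 0, 0]
print(l2_ranksvm_objective([0.0], X, y, q))
print(l2_ranksvm_objective([1.0], X, y, q))
```

Enumerating pairs costs time quadratic in the number of instances per query, which is why this form is only suitable for illustrating the objective on small inputs.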

