
Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space

Advisor: Shou-De Lin (林守德)

Abstract


In this thesis we propose the Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space. Roughly, the algorithm minimizes an ℓ1-regularized objective over the Hilbert space induced by the kernel, and after several rounds of Randomized Coordinate Descent (RCD) training it yields a sparse non-linear predictor. By interpreting the algorithm as coordinate descent over the infinite-dimensional space of random features, we show that the proposed method converges to a solution within ϵ-precision of the exact kernel solution: drawing only O(1/ϵ) random features suffices to bring the expected error below ϵ, whereas the Monte-Carlo analysis used in the current random feature literature requires O(1/ϵ^2) random features to reach the same accuracy. In our experiments, the Sparse Random Feature algorithm obtains a sparse solution, uses less memory during training and less prediction time, and maintains performance comparable to kernel machines on both regression and classification tasks. Moreover, for the infinite-dimensional ℓ1-regularized problem, when the boosting approach cannot perform its greedy step exactly, our approximate solver based on the randomized approach obtains a better solution than boosting.
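To make the minimized objective concrete, the following is a minimal LaTeX sketch of the finite-dimensional surrogate that arises once D random features have been drawn; the symbols (weights w_j, feature map φ(x; ω_j), loss L, regularization parameter λ) are reconstructed notation for illustration and are not taken verbatim from the thesis.

% l1-regularized objective over D sampled random features (reconstructed notation):
\[
  \min_{\mathbf{w}\in\mathbb{R}^{D}} \;
  \lambda\,\lVert \mathbf{w} \rVert_{1}
  \;+\; \frac{1}{n}\sum_{i=1}^{n}
        L\!\left(\sum_{j=1}^{D} w_{j}\,\phi(\mathbf{x}_{i};\omega_{j}),\; y_{i}\right)
\]
% Each round of randomized coordinate descent draws new features ω_j and updates
% their weights w_j; the l1 penalty keeps most coordinates at zero, which is what
% makes the final predictor sparse.

Under this reading, the O(1/ϵ) rate is a statement about how many random features (coordinates) must be drawn before the solution is within ϵ of the exact kernel solution.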

English Abstract


In this paper, we propose a Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space, which learns a sparse non-linear predictor by minimizing an ℓ1-regularized objective function over the Hilbert space induced by the kernel function. By interpreting the algorithm as Randomized Coordinate Descent in an infinite-dimensional space, we show that the proposed approach converges to a solution within ϵ-precision of the exact kernel method by drawing O(1/ϵ) random features, in contrast to the O(1/ϵ^2)-type convergence achieved by the Monte-Carlo analysis in the current Random Feature literature. In our experiments, the Sparse Random Feature algorithm obtains a sparse solution that requires less memory and prediction time while maintaining comparable performance on regression and classification tasks. In the meantime, as an approximate solver for the infinite-dimensional ℓ1-regularized problem, the randomized approach converges to a better solution than the Boosting approach when the greedy step of Boosting cannot be performed exactly.
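The Python sketch below illustrates one way to realize this idea under simplifying assumptions: an RBF kernel approximated by random Fourier features, squared loss, one new feature drawn per iteration, and no re-optimization of previously added coordinates. All names and parameters here (sparse_random_features, gamma, lam, n_iters) are hypothetical illustrations, not the thesis's exact procedure.

import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of the l1 norm: shrink z toward zero by tau.
    return np.sign(z) * max(abs(z) - tau, 0.0)

def sparse_random_features(X, y, n_iters=2000, gamma=1.0, lam=0.1, seed=0):
    # Randomized coordinate descent over random Fourier features for
    # l1-regularized least squares: each iteration samples one new feature
    # (one coordinate of the infinite-dimensional problem) and solves the
    # one-dimensional prox problem for its weight; zero weights are discarded.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    residual = y.astype(float).copy()        # r = y - f(X) for the current predictor f
    omegas, biases, weights = [], [], []
    for _ in range(n_iters):
        w = rng.normal(scale=np.sqrt(2.0 * gamma), size=d)  # spectral sample for exp(-gamma*||x-x'||^2)
        b = rng.uniform(0.0, 2.0 * np.pi)
        phi = np.sqrt(2.0) * np.cos(X @ w + b)               # values of the new feature on the data
        L = phi @ phi / n                                    # coordinate-wise Lipschitz constant
        grad = -(phi @ residual) / n                         # gradient of the squared loss at weight 0
        c = soft_threshold(-grad / L, lam / L)               # closed-form coordinate minimizer
        if c != 0.0:
            omegas.append(w); biases.append(b); weights.append(c)
            residual -= c * phi                              # keep r = y - f(X) consistent
    return np.array(omegas), np.array(biases), np.array(weights)

The resulting model predicts with f(x) = sum_j weights[j] * sqrt(2) * cos(omegas[j] . x + biases[j]); only the features whose weights are nonzero need to be stored, which is where the memory and prediction-time savings described above come from.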
