Stochastic Active-Set Proximal Quasi-Newton Method for L1 and Group-L1 Regularized Learning

Advisor: Shou-De Lin (林守德)

Abstract


L1 regularization has become one of the standard methods for feature selection. When additional structural information about the features is available, feature selection can be extended to selecting predefined groups of features, which can be achieved with group-L1 regularization. For high-dimensional machine learning problems, shrinking heuristics and greedy methods are commonly used to maintain a working set that reduces the size of the problem. However, shrinking is too conservative, while greedy methods are too expensive. In this thesis, we propose a Stochastic Active-Set method that maintains an active set through an unbiased approximation of the greedy search and reaches convergence with sufficiently high probability. Through a double sampling strategy, we balance the cost of exploring the feature space against the effectiveness of the active set. In addition, to handle computationally expensive loss functions, we construct a new variant of the quasi-Newton method tailored to our active-set approach, so that the full gradient of the loss never has to be computed. In our experiments on L1- and group-L1-regularized problems with computationally expensive loss functions, the proposed method shows significant speed-ups over state-of-the-art solvers.
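
For concreteness, the two problem classes referred to here are presumably the usual composite objectives below, where f denotes the smooth loss, λ > 0 the regularization weight, 𝒢 a predefined partition of the coordinates into groups, and w_g the sub-vector of w on group g; p = 2 or p = ∞ gives the L_{2,1} and L_{∞,1} penalties mentioned in the English abstract:

    \min_{w \in \mathbb{R}^d} \; f(w) + \lambda \lVert w \rVert_1
    \qquad\text{and}\qquad
    \min_{w \in \mathbb{R}^d} \; f(w) + \lambda \sum_{g \in \mathcal{G}} \lVert w_g \rVert_p .

Both regularizers are non-smooth but admit cheap closed-form proximal operators (soft-thresholding and its group-wise analogue), which is what makes proximal and proximal quasi-Newton methods applicable.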

Parallel Abstract (English)


L1 regularization has become one of the standard techniques for feature selection. Furthermore, when structural prior knowledge is available, the feature selection problem can be extended to selecting (predefined) groups of variables, which can be achieved by solving an L_{p,1}-regularized problem. To deal with high-dimensional learning tasks, shrinking heuristics and greedy methods are often exploited to maintain a relatively small working set. However, in practice, shrinking can be too conservative, while greedy methods can be too expensive in their search for the greedy variable. In this work, we propose a Stochastic Active-Set method that maintains an active set via an approximate yet unbiased greedy search, which finds active variables with a probability high enough to ensure fast convergence. A balance between exploiting the active set and exploring the high-dimensional space is achieved via a double sampling strategy. In addition, to handle loss functions with expensive evaluation cost, we construct a new variant of quasi-Newton approximation for our active-set method that does not require computing the whole gradient vector. In our experiments, the proposed algorithm yields significant speed-ups over state-of-the-art solvers on L1-, L_{2,1}-, and L_{∞,1}-regularized problems with computationally expensive loss functions.
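
As a rough illustration of the ideas described above (not the thesis algorithm itself), the sketch below applies the active-set idea to L1-regularized least squares: each iteration scores a random sample of coordinates outside the current active set by their partial gradients, adds the optimality violators to the active set, and then takes a proximal (soft-thresholding) step on the active coordinates only. The function name, the sampling rule, the plain proximal gradient step (standing in for the proximal quasi-Newton step), and all default parameters are assumptions made for illustration.

import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1 (elementwise soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def stochastic_active_set_l1(A, b, lam, n_iters=100, sample_size=50,
                             step=None, rng=None):
    # Illustrative sketch for  min_w 0.5*||A w - b||^2 + lam*||w||_1 :
    # grow an active set from a random sample of outside coordinates,
    # then take proximal gradient steps restricted to that set.
    rng = np.random.default_rng(rng)
    n, d = A.shape
    w = np.zeros(d)
    active = set()
    if step is None:
        # crude Lipschitz-based step size for the quadratic loss
        step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    for _ in range(n_iters):
        residual = A @ w - b
        # --- exploration: sample coordinates outside the active set ---
        outside = np.array([j for j in range(d) if j not in active])
        if outside.size > 0:
            sample = rng.choice(outside, size=min(sample_size, outside.size),
                                replace=False)
            # partial gradients only on the sampled coordinates
            g_sample = A[:, sample].T @ residual
            # a zero coordinate violates optimality if |grad_j| > lam
            violators = sample[np.abs(g_sample) > lam]
            active.update(violators.tolist())
        if not active:
            continue
        # --- exploitation: proximal gradient step on the active set ---
        idx = np.fromiter(active, dtype=int)
        g_active = A[:, idx].T @ residual
        w[idx] = soft_threshold(w[idx] - step * g_active, step * lam)
        # drop coordinates that returned to zero
        active = set(idx[w[idx] != 0.0].tolist())
    return w

The point of the sketch is only that both the scoring step and the update step touch a small subset of the columns of A, so the full gradient is never formed; the method described in the abstract additionally replaces the plain gradient step with a quasi-Newton approximation built from the same restricted information.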

