
A Comparison of Optimization Methods for Large-scale L1-regularized Logistic Regression

Advisor: Chih-Jen Lin (林智仁)

Abstract


Logistic regression is a technique commonly applied to document classification and computational linguistics. L1-regularized logistic regression can be viewed as a form of feature selection, but its non-differentiability makes the optimization problem more difficult. In recent years a variety of optimization methods have been applied to this problem, yet rigorous comparisons among them are lacking. In this thesis we propose a trust region Newton method and compare it with several existing optimization methods. Experimental results show that the proposed method is competitive with state-of-the-art solvers. A further experiment compares L1- and L2-regularized logistic regression and confirms that, at similar accuracy, L1-regularized logistic regression yields a sparser solution vector than its L2-regularized counterpart.
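For reference, the problem discussed above is commonly written as the following unconstrained optimization problem (a standard formulation of L1-regularized logistic regression; the notation here is illustrative, not copied from the thesis). Given training instances $x_i$ with labels $y_i \in \{-1, +1\}$, one solves

\min_{w} \; \|w\|_1 + C \sum_{i=1}^{l} \log\left(1 + e^{-y_i w^T x_i}\right),

where $C > 0$ balances the logistic loss against the regularizer. The $\|w\|_1$ term is non-differentiable wherever a component of $w$ is zero; this is the difficulty mentioned above, and it is also what drives individual weights exactly to zero, producing the feature-selection effect. A trust region Newton method, generically, minimizes at each step a quadratic model of the objective within a ball $\|s\| \le \Delta_k$ around the current iterate; the thesis's exact treatment of the non-smooth term is not reproduced here.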

Parallel Abstract


Large-scale logistic regression is useful for document classification and computational linguistics. The L1-regularized form can be used for feature selection, but its non-differentiability makes training more difficult. Various optimization methods have been proposed in recent years, but no serious comparison among them has been made. In this thesis we propose a trust region Newton method and compare it with several existing methods. Results show that our method is competitive with state-of-the-art L1-regularized logistic regression solvers. To investigate the applicability of L1-regularized logistic regression, we also conduct an experiment showing that, compared to L2-regularized logistic regression, a sparser solution is obtained with similar accuracy.
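The L1-versus-L2 sparsity comparison can be illustrated in miniature with off-the-shelf tools. The sketch below uses scikit-learn's LogisticRegression, whose liblinear solver supports both penalties, on synthetic high-dimensional data; it demonstrates the phenomenon only and involves none of the solvers or datasets benchmarked in the thesis.

# Minimal sketch of the L1-vs-L2 sparsity comparison described above.
# Uses scikit-learn's liblinear backend; not the thesis's own solvers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with many features but few informative ones,
# loosely imitating a document-classification setting.
X, y = make_classification(n_samples=2000, n_features=500,
                           n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, solver="liblinear", C=1.0)
    clf.fit(X_train, y_train)
    nonzero = np.count_nonzero(clf.coef_)
    print(f"{penalty}: accuracy = {clf.score(X_test, y_test):.3f}, "
          f"nonzero weights = {nonzero}/{clf.coef_.size}")

With these settings the L1 run typically zeroes out most of the 500 weights while reaching test accuracy close to the L2 run, mirroring the conclusion stated in the abstract.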

