Large-scale linear classification is useful in areas such as document classification and computational linguistics. The L1-regularized form can be used for feature selection, but its non-differentiability makes training more difficult. Various optimization methods have been proposed in recent years, yet no serious comparison among them has been made. In this paper, we carefully address implementation issues of some representative methods and conduct a comprehensive comparison. Results show that coordinate descent methods may be the most suitable in general situations, although Newton-type methods have the fastest final convergence.
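To illustrate why the L1-regularized form induces feature selection despite its non-differentiability, the following is a minimal sketch (not one of the solvers compared in the paper) of proximal gradient descent, i.e. ISTA, for L1-regularized logistic regression. The non-smooth L1 term is handled by a soft-thresholding step, which sets coefficients exactly to zero; all function and parameter names here are illustrative assumptions.

```python
import numpy as np

def l1_logreg_ista(X, y, lam=0.1, lr=0.5, iters=1000):
    """Proximal gradient (ISTA) sketch for L1-regularized logistic regression:

        min_w  (1/n) * sum_i log(1 + exp(-y_i * x_i @ w))  +  lam * ||w||_1

    with labels y in {-1, +1}.  Illustrative only, not a tuned solver.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        z = y * (X @ w)
        # gradient of the smooth (mean logistic) part of the objective
        grad = X.T @ (-y / (1.0 + np.exp(z))) / n
        w = w - lr * grad
        # soft-thresholding: proximal operator of lam*||.||_1;
        # this step zeros out small coefficients, giving feature selection
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```

On synthetic data where only a few features determine the label, the returned weight vector is sparse: irrelevant features receive exactly zero weight, which is the feature-selection effect the abstract refers to.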