Large-Scale Matrix Factorization and Its Generalized Models

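Matrix factorization plays an important role in many fields, such as online product recommendation and social network analysis. This thesis investigates some of the difficulties that large-scale matrix factorization encounters in practice and proposes corresponding solutions. First, among matrix factorization techniques, the stochastic gradient method is currently one of the most important algorithms, but adjusting its learning rate efficiently remains a major challenge. We propose an effective learning-rate adjustment strategy for the stochastic gradient method applied to matrix factorization and thereby improve its convergence. Second, because most current matrix factorization software does not support parallel computation, users of these packages can hardly benefit from the strong parallel computing power of today's shared-memory, multi-core platforms. Based on our recently developed parallel stochastic gradient method, we designed a new matrix factorization library, LIBMF, and released it for public use. Within the LIBMF framework, several different matrix factorization problems can be solved. Finally, we discuss a generalized matrix factorization model, the field-aware factorization machine; this model is known to achieve very good results on classification problems with highly sparse data.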

Keywords

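matrix factorization; stochastic gradient method; parallel computation; factorization machine; field-aware factorization machine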

Parallel Abstract

Matrix factorization (MF) is a popular technique in many applications, including online recommendation and social network analysis. Our work addresses several issues that stand in the way of making MF practically useful in large-scale settings. The first issue is the learning rate of stochastic gradient (SG) methods for MF. SG methods are currently among the most important training methods for MF, but adjusting the learning rate effectively remains challenging. We propose a scheme for adjusting the learning rate that improves the convergence of SG methods for MF. Second, MF users do not benefit from recent advances in shared-memory systems with multi-core CPUs, because most existing packages do not support parallel training. Based on our recently developed parallel SG algorithms, we create a new MF library, LIBMF, for public use. LIBMF can solve several MF problems in a unified way. In the third part of this thesis, we investigate an extension of MF called the field-aware factorization machine (FFM), which is useful for classification problems with highly sparse data.
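As a rough illustration of the role the learning rate plays in SG training for MF, the following minimal sketch fits a rank-k factorization to a handful of observed ratings using a fixed learning rate. It is not the learning-rate schedule proposed in the thesis, it is serial rather than parallel, and it does not use the LIBMF API; the function names `sgd_mf` and `rmse`, the parameters `lr` and `reg`, and the toy data in the usage note are all assumptions made for this example.

```python
import random


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization over the observed entries."""
    total = 0.0
    for u, v, r in ratings:
        pred = sum(a * b for a, b in zip(P[u], Q[v]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


def sgd_mf(ratings, m, n, k=8, lr=0.05, reg=0.05, epochs=200, seed=0):
    """Plain SG for MF with a FIXED learning rate (illustrative assumption).

    ratings: list of (row, column, value) triples for the observed entries.
    Returns factor matrices P (m x k) and Q (n x k) as lists of lists.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(m)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed entries in random order
        for u, v, r in data:
            pu, qv = P[u], Q[v]
            err = r - sum(a * b for a, b in zip(pu, qv))
            for d in range(k):
                pu_d = pu[d]  # keep the old value for the update of qv[d]
                pu[d] += lr * (err * qv[d] - reg * pu_d)
                qv[d] += lr * (err * pu_d - reg * qv[d])
    return P, Q
```

Calling `sgd_mf(ratings, m, n)` on a small list of `(row, column, value)` triples and comparing `rmse` before and after training shows the effect of `lr`: in this fixed-rate sketch, a value that is too small makes progress slow and one that is too large can make the iterates diverge, which is the tuning difficulty the abstract refers to.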

Parallel Keywords

matrix factorization; stochastic gradient methods; parallel computation; factorization machine; field-aware factorization machine
