Implementing a Distributed Collaborative Recommendation System with MapReduce

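Many applications in today's information society depend on large volumes of data, and the recommendation system is one example. Recommendation systems have been widely adopted in e-commerce in recent years, mainly to suggest items that a user may find interesting, in the hope of encouraging users to explore the site further. They are broadly useful and are an important research topic in data processing. As a business grows, the amount of data a recommendation system must process grows with it, and the computation time grows by multiples. Apache Mahout implements recommendation computation on the MapReduce framework for large-scale data processing, with the aim of making the computation more efficient and more accurate. MapReduce is a programming model for distributed computation over large data sets; deployed on a cluster, it spreads the data across several machines for processing. Google first implemented it and used it heavily for large-scale data processing, and Apache Hadoop later followed with its own MapReduce project and extended it to a wider range of applications. After analyzing the Mahout Distributed Item Based Recommendation System, we modified the part of that implementation that takes the most execution time, the computation of the similarity matrix, replacing it with Stochastic SVD combined with the related mathematical derivation, and implemented two Distributed Collaborative-based Recommendation Systems. In this thesis we set up two clusters with Apache Hadoop, ran experiments on both the Mahout Distributed Item Based Recommendation System and the Distributed SSVD Recommendation System, and compared their overall performance.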
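To make the MapReduce programming model and the item-to-item similarity step more concrete, the following is a minimal single-process sketch in Python. It is not Mahout's implementation and does not run on Hadoop: the function names (`map_user`, `shuffle`, `reduce_pair`, `cooccurrence`) are hypothetical, and plain co-occurrence counting stands in for whatever similarity measure the actual system uses.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, items):
    """Map step: emit ((item_a, item_b), 1) for each pair of items one user rated."""
    for a, b in combinations(sorted(set(items)), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_pair(key, values):
    """Reduce step: sum the partial counts for one item pair."""
    return key, sum(values)


def cooccurrence(user_items):
    """Simulate map -> shuffle -> reduce in one process (assumption: no real cluster)."""
    emitted = (
        kv
        for user, items in user_items.items()
        for kv in map_user(user, items)
    )
    return dict(reduce_pair(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    # Toy preference data; the user and item names are made up for illustration.
    data = {"u1": ["A", "B", "C"], "u2": ["A", "B"], "u3": ["B", "C"]}
    print(cooccurrence(data))
```

Because each user's record is mapped independently and each item pair is reduced independently, both steps can be spread over many machines. The number of emitted pairs grows quadratically with the number of items a user has rated, which gives some intuition for why the similarity matrix dominates the running time.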

Keywords

Recommendation system; collaborative; MapReduce; Mahout

Parallel Abstract

Recommendation systems have been widely used in electronic commerce in recent years. To encourage users to explore a website, a recommendation system suggests items that users might be interested in. Large e-commerce sites now often have millions of items and users, which rapidly increases the computational workload of the recommendation system. Apache Mahout, an open-source machine learning library that uses the MapReduce framework to implement Collaborative Filtering Recommendation Systems, is designed to make large-scale data processing more efficient. MapReduce is a programming model for distributed computation. It was first proposed by Google and applied in the development of many of Google’s services. Apache Hadoop then developed its own MapReduce project, which has found more widespread application. MapReduce is mainly used for large-scale data processing, distributing data and computation across the nodes of a cluster. In this thesis we analyze the processing of the Mahout Distributed Item Based Recommendation System and improve its most time-consuming part, the computation of the similarity matrix. Two new algorithms for a Distributed Collaborative-based Recommendation System are proposed and implemented using Stochastic SVD. We also conducted experiments on two different Apache Hadoop clusters to compare the performance and accuracy of these algorithms. The experimental evaluation shows that our algorithms and implementation improve the performance of the Mahout Distributed Item Based Recommendation System by a factor of 2.5 and improve its accuracy through Stochastic SVD features.
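The abstract does not spell out the Stochastic SVD derivation used in the thesis. As a rough orientation only, the generic randomized SVD scheme can be written as below; the symbols ($A$ for the $m \times n$ user-item rating matrix, $k$ for the target rank, $p$ for the oversampling parameter, $\Omega$ for a random Gaussian test matrix) are assumptions introduced here, not notation from the thesis.

```latex
\begin{align}
Y &= A\,\Omega, & \Omega &\in \mathbb{R}^{n \times (k+p)} \\
Y &= Q R, & Q &\in \mathbb{R}^{m \times (k+p)},\ Q^{\top} Q = I \\
B &= Q^{\top} A, & B &\in \mathbb{R}^{(k+p) \times n} \\
B &= \tilde{U}\,\Sigma\,V^{\top} \\
A &\approx Q Q^{\top} A = (Q \tilde{U})\,\Sigma\,V^{\top}
\end{align}
```

Only the small matrix $B$ needs a full SVD, while the products $A\Omega$ and $Q^{\top}A$ are large matrix multiplications that lend themselves to distributed execution.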

Parallel Keywords

Recommendation System; Collaborative Filtering


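Cited By

胡維哲 (2016). A Dynamic Data-Driven Prediction Model Based on Evolutionary Algorithms and Cluster Computing [Master's thesis, Chung Yuan Christian University]. Airiti Library. https://doi.org/10.6840/cycu201600926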
