A Recommendation System Based on Reinforcement Learning for User Preferences

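In recent years, the demand for and application of recommendation systems have grown steadily. On every kind of platform, whether online shopping, video streaming, social media, or even online news, users face more and more choices: a platform may have only a few hundred thousand users while its catalog runs to hundreds of millions of items. Faced with this volume of options, reducing the time users spend choosing and improving the accuracy of recommending items they actually like has become an important problem. In this paper we work with a movie platform. Using collaborative filtering, content-based filtering, and matrix factorization, we analyze users' habits from their interests, past behavior, and movie ratings. The system then recommends items liked by other, similar users and identifies the movies a user is likely to enjoy, achieving personalized recommendation.

This paper adopts a recommendation method based on reinforcement learning (RL). Traditional recommendation methods are item-based or user-based, but a user's behavior is more than a set of movie ratings: the rating standard may be shaped by many hidden, secondary factors such as the director, the cast, and the plot. Compared with traditional methods that simply compare a user's ratings with those of other users and recommend on that similarity, using reinforcement learning to have the model learn and predict the user's future behavior brings better and longer-term benefits to the system.

As a preprocessing step we use singular value decomposition (SVD), a mature matrix factorization (MF) technique. Matrix factorization decomposes the user-item rating matrix into smaller matrices and tries to recover the latent feature matrices that relate users to movies. On e-commerce, movie, and e-book platforms alike, the user-item matrix is usually very sparse: items outnumber users, and each user has rated only a small fraction of the catalog. Matrix factorization therefore decomposes the rating and item matrices, projects them onto lower-dimensional matrices, and uses the movies a user has rated to find the main factors that drive the ratings. In addition, movies are correlated with one another, so the amount of information does not grow linearly with the number of movies, and the low-dimensional representation effectively reduces computational complexity and training time. We further apply a topic model (TM) to the SVD feature matrices for feature analysis, so that each feature vector, which in SVD blends several underlying features, receives a clear label.

Finally, reinforcement learning is used to simulate the user's behavior: the model imitates the user's likely actions and rating habits and captures how the user's preferences change over time, which improves prediction accuracy. This approach effectively addresses data sparsity, personalization, and the cold-start problem.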
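As a rough illustration of the matrix factorization step described above, the following sketch applies a truncated SVD to a tiny user-item rating matrix and uses the low-rank reconstruction to estimate missing ratings. It is not the implementation used in this work: the toy `ratings` data, the helper name `low_rank_predict`, the choice of `k`, and the treatment of unrated entries as each user's mean rating are all assumptions made for the example.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: movies).
# 0 marks "not rated". All values are made up for illustration.
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 0, 0, 1, 1],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
], dtype=float)


def low_rank_predict(ratings, k):
    """Estimate ratings from a rank-k truncated SVD of the mean-centered matrix.

    Assumption for this sketch: unrated entries are treated as the
    user's mean rating before factorizing.
    """
    rated = ratings > 0
    # Per-user mean over observed ratings only.
    user_mean = ratings.sum(axis=1) / rated.sum(axis=1)
    # Center observed entries; unrated entries become 0 (the user's mean).
    centered = np.where(rated, ratings - user_mean[:, None], 0.0)
    # Thin SVD: centered = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the k largest singular values (latent factors).
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Add the user means back to return to the rating scale.
    return approx + user_mean[:, None]


predicted = low_rank_predict(ratings, k=2)
print(np.round(predicted, 2))
```

Here the rows of `U[:, :k] * s[:k]` play the role of user factors and the columns of `Vt[:k, :]` the role of movie factors. With a catalog of realistic size, a sparse solver and a factorization fitted only on observed ratings would be more appropriate than a dense SVD of a mean-filled matrix.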

Keywords

Recommendation system; reinforcement learning; singular value decomposition; personalization; personal preference

Parallel Abstract

In recent years, the demand for and application of recommendation systems have increased. On any platform, such as online shopping, video streaming, social media, or even online news, users face more and more choices. There may be only hundreds of thousands of users but hundreds of millions of items. In the face of this huge amount of selectable information, how to reduce selection time and improve the accuracy of recommending items that customers like has become an important subject. In this paper, working with a movie platform, we analyze users' habits from their interests, past behavior, and movie ratings through collaborative filtering, content-based filtering, and matrix factorization. The system then recommends items liked by similar users and finds the movies a user likes, achieving a personalized recommendation effect. This paper uses a recommendation method based on Reinforcement Learning (RL). Traditional recommendation methods are item-based or user-based, but a user's behavior pattern is not limited to movie ratings; the rating standard may be affected by many hidden, secondary factors such as directors, actors, and plots. Compared with the traditional approach of comparing a user's rating similarity with that of other users and recommending accordingly, using reinforcement learning to let the model learn and predict the user's future behavior can bring better and longer-term benefits to the system. We use Singular Value Decomposition (SVD), a mature Matrix Factorization (MF) technique, for preprocessing. Matrix factorization decomposes the user-item rating matrix (User-item Matrix) into matrices of smaller dimension for computation and tries to find the feature matrices relating users and movies. On e-commerce, movie, and e-book platforms, the User-item Matrix is usually very sparse, because items outnumber users and each user may have rated only a few of the many items. Matrix factorization therefore decomposes the Rating matrix and the Item matrix, projects them onto lower-dimensional matrices, and finds the main factors that affect ratings from the movies a user has rated. In addition, in movie recommendation there are correlations between movies, so the amount of information does not increase linearly with the number of movies, which effectively reduces computational complexity and training time. We also apply a Topic Model (TM) to the SVD feature matrices for feature analysis, so that feature vectors that combine multiple features in SVD receive a clear label. Finally, reinforcement learning is used to simulate the user's behavior pattern: the model imitates the user's likely behavior and rating habits and captures the user's changes over time to provide better prediction accuracy. This method can effectively deal with data sparsity, personalization, and cold-start problems.

Parallel Keywords

Recommendation system; Reinforcement learning; Singular value decomposition; Personalization; Personal preference

