A Study of Recommendation Diversity and Collaborative Filtering over Heterogeneous Data: A Machine Learning Approach

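This dissertation investigates two interesting and important topics in recommender system research: recommendation diversity and collaborative filtering over heterogeneous data. Collaborative filtering (CF), one of the most successful approaches to building recommender systems, has reached fairly good accuracy in recent years, but recent studies show that accuracy alone is not enough. The performance of a recommender system should be assessed along several dimensions, such as the diversity of its recommendations and its adaptability to heterogeneous data, if it is to serve the wide variety of recommendation scenarios of the future. In this dissertation I study both topics thoroughly and propose two new methods: one that effectively diversifies recommendation results, and one that flexibly exploits heterogeneous data to produce more accurate and more stable recommendations.

Previous studies indicate that selling more items from the long tail can benefit both businesses and customers. Nevertheless, most CF-based systems still tend to recommend best-selling items. I propose a novel method that diversifies recommendations by treating the "number of recommendations" as a "resource" and allocating that resource to items according to the "relative preference" among users. The method helps promote more long-tail items and increases aggregate recommendation diversity while maintaining a reasonable level of recommendation accuracy. Experimental results show that it can effectively discover items worth recommending in the long tail.

In addition, because CF algorithms cannot reliably produce accurate recommendations when the data are extremely sparse, hybrid methods use side information, such as product descriptions and user profiles, to produce better recommendations. However, I find that most hybrid methods share three limitations: 1) Adaptivity: they lack a suitable interface for accepting additional heterogeneous side information beyond what is already known. 2) Flexibility: changing the model architecture requires intensive expert knowledge, so the model cannot easily be modified for different inputs or recommendation tasks. 3) Generality: the model architecture is designed around correlations in the input data, which limits the freedom of parameter learning. These three limitations prevent currently available methods from effectively using the large-scale heterogeneous data expected from the Internet of Things, such as data from various sensors, and the user-generated information on social media, to produce recommendations that are more effective and better matched to context.

I propose an end-to-end deep learning architecture for hybrid recommender systems. By formulating user preference prediction mathematically as an embedding learning process, my method provides modular interfaces for heterogeneous data inputs and a very flexible model structure for real-world recommendation scenarios. Moreover, the method does not rely on assumptions about data correlation, and it can learn finer-grained features on top of the features extracted from each input. I use two heterogeneous datasets, MovieLens and MoviePilot, to test the method under two different recommendation scenarios. The experimental results show that the proposed architecture adapts flexibly to different data inputs and recommendation scenarios and achieves state-of-the-art recommendation accuracy.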
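The idea of treating recommendation slots as a resource allocated by relative preference can be illustrated with a small sketch. The abstract does not specify the allocation rule, so everything here is an assumption made for illustration: relative preference is taken to be a user's predicted score for an item divided by the total predicted score that item receives from all users, each item gets the same fixed budget of slots (`item_budget`), and slots are handed out greedily. The function names `top_k_by_score`, `allocate_recommendations`, and `aggregate_diversity` are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def top_k_by_score(scores, k):
    """Baseline: recommend each user's k highest-scored items."""
    recs = {}
    for user, user_scores in scores.items():
        ranked = sorted(user_scores.items(), key=lambda pair: (-pair[1], pair[0]))
        recs[user] = [item for item, _ in ranked[:k]]
    return recs


def allocate_recommendations(scores, k, item_budget):
    """Greedy allocation of recommendation slots (illustrative assumption).

    scores: dict mapping user -> dict mapping item -> positive predicted score.
    k: maximum number of items recommended to each user.
    item_budget: maximum number of times any single item may be recommended.
    """
    # Total predicted score each item receives across all users.
    item_total = defaultdict(float)
    for user_scores in scores.values():
        for item, score in user_scores.items():
            item_total[item] += score

    # Assumed definition of relative preference: a user's share of the
    # item's total predicted score.
    candidates = []
    for user, user_scores in scores.items():
        for item, score in user_scores.items():
            if item_total[item] > 0:
                candidates.append((score / item_total[item], user, item))

    # Highest relative preference first; ties broken by user and item name.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    recs = {user: [] for user in scores}
    remaining = defaultdict(lambda: item_budget)
    for _, user, item in candidates:
        if len(recs[user]) < k and remaining[item] > 0:
            recs[user].append(item)
            remaining[item] -= 1
    return recs


def aggregate_diversity(recs):
    """Number of distinct items recommended to at least one user."""
    return len({item for items in recs.values() for item in items})
```

With a popular item that every user scores highly, the baseline recommends it to everyone, while the budgeted allocation pushes users toward the niche items they prefer more strongly than other users do. If the budgets are too small relative to the number of users, some users may receive fewer than `k` items in this sketch.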

Keywords

Recommender System; Deep Learning; Diversity; Heterogeneous Data; Neural Network; Long Tail

Parallel Abstract

This dissertation focuses on two interesting and important topics in recommender systems (RS): recommendation diversity and collaborative filtering over heterogeneous data. Collaborative filtering (CF) methods, the most popular and successful approaches to building RS, are becoming insufficient to serve the variety of contexts in which recommender systems are used. Other factors, such as the ability to provide more diverse recommendations and the capability to adapt to heterogeneous information, should also be taken into consideration. In this work, I proposed two novel approaches, one for diversifying recommendation results and one for learning user preferences from heterogeneous data.

On the one hand, studies have shown that increasing the sales of long-tail items can benefit both customers and some business models. However, the majority of CF-based methods tend to recommend popular, best-selling items. I proposed a novel approach that diversifies the results of recommender systems by treating “recommendations” as resources to be allocated to items according to the “relative preference” between users. My approach enhances aggregate recommendation diversity by promoting long-tail items while maintaining a reasonable level of accuracy. The experimental results show that this approach can discover more items worth recommending in the long tail and improves user experience.

On the other hand, since CF-based methods often suffer from the sparsity problem, hybrid methods use side information, such as product descriptions and user profiles, to provide more robust recommendations. However, I noticed three common constraints among available hybrid methods: 1) Adaptivity: they offer no interfaces for additional heterogeneous side information; 2) Flexibility: modifying the model is an expertise-intensive process; and 3) Generality: the model design depends on correlations between data sources, which limits inter-source parameter learning. These three constraints make previous approaches insufficient for leveraging large-scale heterogeneous data (e.g., sensory data from the Internet of Things and all kinds of user-generated data from social media), which will become increasingly accessible in the near future.

I proposed an end-to-end deep learning framework for hybrid RS that provides modular interfaces for additional inputs and a flexible model structure for various recommendation scenarios and heterogeneous inputs. Moreover, my approach is able to learn more sophisticated features by considering the interactions between data sources. I evaluated the proposed approach under two different real-life scenarios, individual recommendation and group recommendation, on two real-world heterogeneous datasets. The experimental results demonstrate that my approach has the above-mentioned properties and that its performance surpasses state-of-the-art approaches.
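The notion of modular interfaces for heterogeneous inputs can be sketched in a few lines. This is a toy illustration under stated assumptions, not the framework from the dissertation: each data source is represented by a plain function that returns a fixed-length vector (standing in for a trainable neural encoder), the per-source vectors are concatenated into one user embedding and one item embedding, and the preference score is their dot product. Nothing is trained here, and the class name `ModularHybridScorer` and its methods are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]
Encoder = Callable[[object], Vector]


class ModularHybridScorer:
    """Toy scorer whose input sources can be added without changing the core."""

    def __init__(self) -> None:
        # Encoders are kept in registration order (dicts preserve insertion order).
        self.user_encoders: Dict[str, Encoder] = {}
        self.item_encoders: Dict[str, Encoder] = {}

    def add_user_source(self, name: str, encoder: Encoder) -> None:
        self.user_encoders[name] = encoder

    def add_item_source(self, name: str, encoder: Encoder) -> None:
        self.item_encoders[name] = encoder

    @staticmethod
    def _embed(encoders: Dict[str, Encoder], inputs: Dict[str, object]) -> Vector:
        # Concatenate the embedding produced by every registered source.
        embedding: Vector = []
        for name, encoder in encoders.items():
            embedding.extend(encoder(inputs[name]))
        return embedding

    def score(self, user_inputs: Dict[str, object], item_inputs: Dict[str, object]) -> float:
        user_vec = self._embed(self.user_encoders, user_inputs)
        item_vec = self._embed(self.item_encoders, item_inputs)
        if len(user_vec) != len(item_vec):
            raise ValueError("user and item embeddings must have the same length")
        # Assumed scoring rule: dot product of the two concatenated embeddings.
        return sum(u * v for u, v in zip(user_vec, item_vec))
```

In this sketch, adding a new kind of side information means registering one more encoder on each side; the scoring code stays unchanged. In a real deep learning system the encoders and the scoring function would be learned jointly end to end.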

Parallel Keywords

Recommender System; Deep Learning; Diversity; Heterogeneous Data; Neural Network; Long Tail

