Design of a Recommendation System Based on Concept Spaces Adjustable to User Preferences

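This thesis proposes a recommendation system design that draws on the strengths of user-based collaborative filtering (CF), item-based CF, and content-based filtering. User-based and item-based CF work in user or item spaces of high dimension. Content-based filtering can handle the cold-start problem of CF, but it is relatively ineffective at uncovering items a user might be interested in. The proposed system instead adopts a user-based concept space and an item-based concept space. Their dimension, that is, the number of concepts, is increased only when necessary. The system can also generate a separate vector dimension for articles in order to handle the cold-start problem. Moreover, the recommendations evolve over time, because with information increasing rapidly it is necessary to avoid repeating the whole process. The dimensions of the item concept space are set by the features of the items, and a "concept" is a cluster in the item concept space, that is, a cluster of articles. The user concept space is the item concept space after adjustment according to user behavior. For example, if a user reads two items, we assume the two items are related. The two concept spaces evolve together, and the system then makes recommendations with the final evolved concepts.

The system was implemented and tested on article recommendation. In our case, the items are articles, the item features are the words in the articles, and the user behavior is the readers' article-reading history. In the experiment, the dimensions of user-based CF and item-based CF are about 30,000 and 3,000, respectively, as determined by the numbers of users and articles. The proposed system instead works with concepts and takes the number of concepts as its dimension. This dimension starts at 5, may grow at each iteration, and converges at the 12th iteration with a value of 87. The system also dynamically adjusts the dimension of the vectors that represent articles. Articles are clustered by computing similarities between these vectors. The final vector length is 44, and new articles can be assigned to clusters and recommended using the same dimensions. The precision-recall curves show that users click on a larger proportion of the articles recommended by the proposed system than of those recommended by user-based CF, item-based CF, or content-based filtering. The system's average precision also grows with the number of iterations and exceeds that of the other recommendation systems. We hope the concept-space idea can be extended to other items that have extractable features and interactions with users.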
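The item-side concept step and the cold-start case can be pictured with a small sketch. It rests on several assumptions:

- Articles are represented as sparse bag-of-words term vectors.
- Similarity is cosine similarity.
- Concepts come from a k-means-style loop with a fixed number of concepts.

The abstract does not name the clustering algorithm, so this choice is illustrative only. The sketch does not reproduce the thesis's concept growth or the co-evolution of the two spaces. The function names `cosine`, `centroid`, `cluster_concepts`, `assign_new_article` and `co_read_pairs` are hypothetical. The last function counts co-read article pairs, which is the user-behavior signal the abstract describes ("if a user reads two items, we assume the two items are related").

```python
import math
from collections import Counter
from itertools import combinations

# Assumed representation: sparse bag-of-words vector, term -> weight.
Vector = dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: list[Vector]) -> Vector:
    """Element-wise mean of sparse vectors: the representative of one concept."""
    total: Vector = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_concepts(docs: list[Vector], k: int,
                     max_iters: int = 20) -> tuple[list[int], list[Vector]]:
    """Hypothetical k-means-style clustering of articles into k concepts.

    Assigns each article to the most cosine-similar centroid, then recomputes
    centroids as means. Seeds are the first k articles, so results are
    deterministic. The number of concepts is fixed here (the thesis grows it).
    """
    if not 0 < k <= len(docs):
        raise ValueError("k must be between 1 and the number of articles")
    centers = [dict(d) for d in docs[:k]]
    labels: list[int] = []
    for _ in range(max_iters):
        new_labels = [max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # an empty concept keeps its previous centroid
                centers[c] = centroid(members)
    return labels, centers


def assign_new_article(doc: Vector, centers: list[Vector]) -> int:
    """Cold start: place an unseen article in its most similar concept."""
    return max(range(len(centers)), key=lambda c: cosine(doc, centers[c]))


def co_read_pairs(histories: dict[str, list[str]]) -> Counter:
    """Count, for each pair of articles, how many users read both.

    Mirrors the assumption that two articles read by the same user are related.
    """
    pairs: Counter = Counter()
    for articles in histories.values():
        for a, b in combinations(sorted(set(articles)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Here is a worked case. Take four term vectors, two about finance and two about sports, with the first two seeds being one of each. With k = 2, the loop separates the two topics. A new article containing "soccer" and "goal" is then assigned to the sports concept without any reading history.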

Keywords

Recommendation system; collaborative filtering; content-based information filtering; user reading behavior

Parallel Abstract

This thesis proposes a recommendation system (RS) that incorporates the advantages of user-based and item-based collaborative filtering (CF) and content-based filtering. In user/item-based CF, the user/item spaces are high-dimensional. The proposed RS instead uses user-based and item-based concept spaces whose dimension, or number of concepts, is increased only when necessary. In addition, the proposed system can deal with the cold-start problem by producing another kind of dimension for items. Because it modifies its existing clustering results instead of repeating the whole process, it can keep producing recommendations as information increases rapidly. The dimensions of the item-based concept space are defined by the features of the items, and the concepts are the clustering result of the item-based concept space. The user-based concepts result from adjusting the item-based concepts with information about users' behaviors, such as whether or not a user is interested in both items in a concept. The user-based and item-based concepts co-evolve iteratively in this manner. Finally, the proposed RS uses the learned concepts, combined with the reading dependence, to perform recommendation. The proposed techniques are demonstrated on article recommendation. In this case, the features of an item correspond to the segmented contents of an article, and users' behaviors correspond to users' reading preferences. In the experiment, the item-based/user-based CF dimensions are about 30,000 and 3,000, respectively. In contrast, the number of concepts in the proposed RS starts at 5 and ends at merely 87 after 12 iterations. The proposed RS also dynamically adjusts the dimension of the article vectors. This dimension is 44 in the end and is used for clustering articles, so new articles can be clustered and recommended as well. The precision-recall curves indicate that the proposed RS achieves more hits than user-based CF, item-based CF, and content-based filtering.
The average precision-recall curves and the mean average precision of the proposed system grow with the number of iterations and exceed those of the other methods. The idea of two co-evolving concept spaces can be extended to other settings where items have extractable features that serve as dimensions and users interact with those items.
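The evaluation metrics named above can be made concrete with a short sketch. It treats an article as relevant for a user if the user clicked it, which is an assumption, since the abstract does not spell out the protocol. It then computes:

- precision and recall at a cutoff k;
- the points of a precision-recall curve;
- average precision (AP), using the common definition: sum the precision at each rank holding a relevant item, then divide by the number of relevant items;
- mean average precision (MAP), which averages AP over users.

The function names are hypothetical, and the thesis's exact cutoffs and averaging may differ.

```python
def precision_recall_at_k(ranked: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    """Precision and recall of the top-k recommendations for one user."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pr_curve(ranked: list[str], relevant: set[str]) -> list[tuple[float, float]]:
    """(precision, recall) points for cutoffs k = 1 .. len(ranked)."""
    return [precision_recall_at_k(ranked, relevant, k) for k in range(1, len(ranked) + 1)]


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant item, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: average of per-user AP over all (ranked list, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For example, suppose a user clicked articles a and c, and the system ranked [a, b, c, d]. Precision at rank 1 is 1/1 and precision at rank 3 is 2/3, so AP is (1/1 + 2/3) / 2 = 5/6. A system whose clicked articles appear earlier in its ranked lists therefore scores a higher MAP.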

Parallel Keywords

Recommendation system; collaborative filtering; content-based filtering; users' reading behaviors
