
Applying Machine Learning for Recommendation System in Content Website (應用機器學習於內容網站推薦系統研究)

Advisor: 張瑞益

Abstract


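Machine learning is now widely used to build recommendation systems for content websites, and a variety of recommendation systems have been developed for different website content and service needs. The quality of a recommendation system directly affects user stickiness. Earlier content-based recommendation systems compare the similarity of content and recommend related content according to the similarity score. This approach cannot handle content that uses the same words but expresses a different main idea, nor does it consider what information the user is looking for. In addition, recommendation systems on website homepages and on mobile phones face space limits: constrained by layout size, they cannot present a large amount of content at once the way a desktop browser can. If users cannot see a content website's strongest content within a short time, they become less willing to return. In this dissertation, we propose two recommendation systems to address these problems: a subject-keyphrase-based recommendation system and a mobile-oriented catalog recommendation system. Compared with ordinary keywords, the subject-keyphrase concept places more emphasis on user intent: an article that a user finds by searching for a subject-keyphrase in a search engine should describe that subject-keyphrase completely. We then develop a subject-keyphrase extraction technique using particle swarm optimization and features related to the definition-use chain. A definition-use chain is a data structure made up of a definition component and use components; it contains the definition of a variable and all uses of that variable. Applied here, the subject-keyphrase plays the role of the defined variable, which is then described by other words, clauses, or phrases. For the mobile-oriented catalog recommendation system, we propose a query-based-learning genetic algorithm that constructs mobile-oriented catalogs so that, with a limited number of catalogs, the catalogs attract the most users. The algorithm has three types of oracle: preference modeling, product vectors, and transaction vectors. Through the oracle, highly attractive products can be added to the mobile-oriented catalog repeatedly. The experimental results show that the subject-keyphrase technique effectively identifies the core concept of an article and thereby improves the quality of similar-resource recommendation. For the mobile-oriented catalog recommendation system, the catalogs built by the query-based-learning genetic algorithm attract users better than the current best method. Finally, we examine how to apply both techniques to a real content website, the Taiwan Open Platform for Educational Resources (教育大市集), and the results show that they do increase content usage.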

Parallel Abstract


Machine learning is increasingly used to build recommendation systems for content websites, and various types of recommendation systems have been developed for different website content and service demands. The quality of a recommendation system directly affects user stickiness. Earlier content-based recommendation systems recommend relevant content according to similarity scores obtained by comparing content. However, this approach accounts neither for the same words carrying different meanings in different content nor for what information users want to find. In addition, recommendation systems for website homepages and mobile phones are subject to space constraints: because page space is limited, they cannot display as much content at once as a PC can. If users cannot browse a content website's strongest content in a short time, their willingness to revisit declines. In this dissertation, we propose two recommendation systems to solve these problems: a subject-keyphrase-based system and a Mobile-Oriented Catalog (MOC) based system. A subject-keyphrase focuses more on user intention than a general keyword does. We expect that when users search for a subject-keyphrase in a search engine, the articles they retrieve describe that subject-keyphrase fully. We then develop a subject-keyphrase extraction technique based on Particle Swarm Optimization (PSO) and the Definition-Use Chain (DU Chain). A DU Chain is a data structure that comprises a definition (D-component) of a variable and all the uses (U-components) reachable from that definition. Applied to text, the subject-keyphrase plays the role of the variable's definition, and the other words, clauses, or phrases that describe it are the U-components.
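As a minimal illustration of the DU Chain idea described above, the following Python sketch stores one candidate phrase as the D-component and collects the sentences that mention it as U-components. The names `DUChain` and `build_chains`, the substring-matching heuristic, and the sample sentences are all assumptions made for illustration; the abstract does not specify how the dissertation detects uses or which DU Chain features it feeds to PSO.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DUChain:
    """One definition (D-component) plus every use (U-component) attached to it."""
    definition: str                                  # candidate subject-keyphrase
    uses: List[str] = field(default_factory=list)    # text that describes the phrase

    def add_use(self, description: str) -> None:
        self.uses.append(description)

    def use_count(self) -> int:
        return len(self.uses)


def build_chains(sentences: List[str], candidates: List[str]) -> Dict[str, DUChain]:
    """Attach each sentence to every candidate it mentions.

    Hypothetical heuristic: a sentence counts as a use when it contains
    the candidate phrase as a case-insensitive substring.
    """
    chains = {c: DUChain(c) for c in candidates}
    for sentence in sentences:
        lowered = sentence.lower()
        for c in candidates:
            if c.lower() in lowered:
                chains[c].add_use(sentence)
    return chains


sentences = [
    "Particle swarm optimization is a population-based search method.",
    "Each particle in particle swarm optimization tracks its best position.",
    "The method is simple to implement.",
]
candidates = ["particle swarm optimization", "search method"]
chains = build_chains(sentences, candidates)
# Rank candidates by how many U-components describe them.
ranked = sorted(candidates, key=lambda c: chains[c].use_count(), reverse=True)
```

In this sketch a phrase with more U-components is treated as more thoroughly described; a real extractor would likely combine several such features rather than rely on a raw count.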
For the MOC-based recommendation system, we propose the Query-Based-Learning Genetic Algorithm (QBLGA), which constructs MOCs that attract the most users when the number of catalogs is limited. QBLGA has three main types of oracle: preference modeling, Product2Vec, and Transaction2Vec. The QBLGA oracle can actively and repeatedly add highly attractive products to MOCs so that they cover more customers. The experimental results show that the subject-keyphrase technique effectively determines the core concept of an article and thereby improves the quality of similar-resource recommendations. For the MOC-based recommendation system, the catalogs built with QBLGA are more attractive to users than those of the state-of-the-art method. Finally, we apply both recommendation techniques to a real content website, the Taiwan Open Platform for Educational Resources (TOPER); the results show that these techniques improve content usage.
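The catalog construction problem above can be sketched as a small genetic algorithm with a query step. This is not QBLGA itself: the coverage definition (a customer is covered when the catalog holds at least one product the customer wants), the greedy `oracle_suggest` function, the parameter values, and the toy data are assumptions for illustration, and the dissertation's oracles (preference modeling, Product2Vec, Transaction2Vec) are not reproduced here.

```python
import random


def coverage(catalog, customers):
    """Number of customers with at least one wanted product in the catalog."""
    return sum(1 for wanted in customers if wanted & catalog)


def oracle_suggest(catalog, customers, products):
    """Hypothetical oracle: the product wanted by the most uncovered customers."""
    uncovered = [w for w in customers if not (w & catalog)]
    outside = [p for p in products if p not in catalog]
    if not uncovered or not outside:
        return None
    return max(outside, key=lambda p: sum(1 for w in uncovered if p in w))


def evolve_catalog(customers, products, k, generations=30, pop_size=12, seed=0):
    """Search for a k-product catalog that covers many customers."""
    rng = random.Random(seed)
    population = [frozenset(rng.sample(products, k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: coverage(c, customers), reverse=True)
        parents = population[: pop_size // 2]      # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(a | b)                   # crossover: draw k items from both parents
            child = set(rng.sample(pool, k))
            hint = oracle_suggest(child, customers, products)
            if hint is not None:                   # query step: swap in the oracle's suggestion
                child.remove(rng.choice(sorted(child)))
                child.add(hint)
            children.append(frozenset(child))
        population = parents + children
    return max(population, key=lambda c: coverage(c, customers))


products = ["A", "B", "C", "D", "E"]
customers = [{"A"}, {"A", "B"}, {"C"}, {"D"}, {"C", "E"}, {"B"}]
best = evolve_catalog(customers, products, k=2)
covered = coverage(best, customers)
```

Keeping the better half of each generation means the best catalog found so far is never lost, while the query step injects products that the current catalog is missing; both choices are simplifications rather than the published design.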

