  • Thesis

結合購物歷史和推車資料做即時個人偏好之推理研究

Combining Purchase History data and Shopping Cart data to make Real-time Personalized Preference Inference Engine

Advisor: 張時中
Co-advisor: 黃亭凱

Abstract


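A growing number of consumers shop online, and the consumer data that companies process every day is both large and fast-moving. To attract target customers more effectively and to offer more personalized service, automatically extracting personal preferences from this real-time information to make real-time recommendations has become increasingly important in today's competitive market. With advances in data analysis, it is now common to use high-speed computing to mine online shopping data for associations between products and consumers, and for consumer preferences, as a basis for marketing recommendations.

However, current analysis methods for online shopping data rely mainly on historical transaction records to understand consumer preferences and rarely consider the items currently placed in the shopping cart. According to research by Ayala Arad, what a consumer wants to buy next is related not only to past records but also to the items currently in the cart. The question, therefore, is how to combine a consumer's past transaction records with the items currently in the cart to infer that consumer's preference for the next item to purchase. Answering it raises the following challenges:

(1) When a consumer places a previously purchased item in the cart, how can the consumer's personal purchasing preference pattern be found in the purchase history and used to infer the preference for the next item?
(2) When a consumer places a never-before-purchased item in the cart, so that past purchasing habits cannot serve as the basis for inference, how can the preference for the next item be predicted?
(3) How can existing tools be used to infer the preference for the next item?

To address these challenges, this thesis designs a preference analysis method based on the individual's purchasing preference pattern and the individual's reference group, and integrates it into the Transaction-Data Based Real-time Preference Inference Engine (TRPIE). The system is developed and its inference ability tested on the transaction records that Instacart provided to the Kaggle data modeling and data analysis competition platform. The design contains the following parts, which correspond to the challenges above.

D1. Extracting the personal preference pattern for the next item from the order information in online purchase history
Consumer purchasing is a dynamic process that gradually forms habitual purchasing behavior, yet the Apriori algorithm used in current market basket analysis does not consider the sequence in which items are purchased. We therefore use the order in which a consumer purchases items to convert the purchase history into temporally ordered data, and use a two-layer recurrent neural network (RNN) designed for this problem to extract each consumer's purchasing preference pattern from the history. The pattern is then used to infer preference probabilities for the next item.

D2. Preference inference based on the reference group
When a consumer places a never-before-purchased item in the cart, the consumer's past purchasing habits cannot be used effectively for inference. However, a consumer's purchasing behavior is closely related to that of groups with similar preferences. We therefore use the existing K-means clustering method to find, from all consumers' purchase histories, the people who buy similar items, and treat them as the consumer's reference group. Market basket analysis over the purchase histories of this reference group then identifies the items most frequently bought together with the never-before-purchased item, from which we extrapolate the individual consumer's preference for the next item.

This thesis provides a reference implementation of the TRPIE design. Based on D1 and D2, it is written in Python and comprises: C1, converting historical data into sequential training data; and C2, an RNN model architecture designed for this problem, integrating the existing tools Keras, TensorFlow, sklearn, and Mlxtend.

The system was implemented, tested, and evaluated on real transaction data from 1,374 people with 100 transaction records each. Experiment 1 uses short-term prediction, as defined by Robin Devooght, as the evaluation metric for TRPIE's next-item preference inference when the cart item has been purchased before. Random prediction averages 0.5% in this experiment. The personal purchasing preference pattern that considers purchase order achieves a mean short-term prediction accuracy of 36.46% (standard deviation 20.66%), significantly higher than the 18.17% (standard deviation 17.98%) obtained using only the individual's purchase history while ignoring purchase sequence. Experiment 2 evaluates reference-group-based inference for never-before-purchased items. Because this method does not consider purchase order, it uses long-term prediction, as defined by Robin Devooght, as the evaluation metric. The mean long-term prediction accuracy is 61.76%, significantly higher than random guessing.

The contribution of this thesis is the novel use of consumers' purchase order to extract personal purchasing preference patterns, and the design of TRPIE, which combines this pattern with the items currently in the cart to predict the preference for the next item in real time. The specific contributions are:

(1) Converting the items in an individual's purchase history into temporally ordered training data according to the consumer's purchase order.
(2) Designing and adopting a two-layer RNN, suited to sequential data, to learn from this training data and find the personal purchasing preference pattern.
(3) Using K-means clustering over all consumers' past purchases to find reference groups that buy similar items, and using market basket analysis over the reference group's purchase history to find the items most frequently bought together with a never-before-purchased item, in order to extrapolate the individual consumer's preference for the next item.
(4) Implementing the D1 and D2 designs and integrating Keras, TensorFlow, sklearn, and Mlxtend into a reference implementation of the TRPIE system.
(5) Validating on real transaction data that TRPIE, implemented with the methods above, achieves a mean short-term prediction accuracy of 36.46% for personal preference inference, significantly higher than the 18.17% of the reference baseline that uses the Apriori algorithm.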
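The data preparation step of D1 can be illustrated with a minimal sketch. It assumes each customer's history is a list of orders, with the items in each order listed in the sequence they were purchased. The function names, the window length, and the toy product names are hypothetical and not taken from the thesis. The sketch only builds (preceding items, next item) training pairs and scores a recommendation list with a hit-or-miss check in the spirit of short-term prediction; it does not reproduce the two-layer RNN.

```python
from typing import Hashable, List, Sequence, Tuple

Item = Hashable


def to_training_pairs(
    orders: Sequence[Sequence[Item]], window: int = 3
) -> List[Tuple[Tuple[Item, ...], Item]]:
    """Turn ordered carts into (preceding items, next item) pairs.

    Assumption: items inside each order are listed in purchase sequence.
    """
    pairs = []
    for order in orders:
        for i in range(1, len(order)):
            # Context is up to `window` items that precede position i.
            context = tuple(order[max(0, i - window):i])
            pairs.append((context, order[i]))
    return pairs


def short_term_hit(recommended: Sequence[Item], actual_next: Item) -> int:
    """Return 1 if the item actually bought next is in the recommendations."""
    return int(actual_next in recommended)


# Toy history for one hypothetical customer: two orders.
history = [["milk", "eggs", "bread"], ["milk", "bread"]]
pairs = to_training_pairs(history, window=2)
for context, target in pairs:
    print(context, "->", target)
```

A sequence model would be trained on pairs of this kind, and averaging `short_term_hit` over held-out pairs gives an accuracy figure comparable in form, though not necessarily in definition, to the one reported in Experiment 1.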

Parallel Abstract


As more and more people shop online, the customer data that companies handle every day is not only large but also fast-moving. To attract target customers more effectively and to provide more personalized services, automatically extracting personal preferences from real-time information in order to make real-time recommendations has become increasingly important for businesses in a competitive market. With advances in data analysis, it is now common to use modern computing power to find relationships between customers and products, and customers' preferences, as a basis for marketing strategies.

However, current data analysis methods for online shopping rely mainly on historical transaction records and rarely consider shopping cart data. According to Ayala Arad, the next item a customer wants to buy is related not only to past records but also to the items currently being put into the shopping cart. The question, therefore, is how to combine a customer's past transaction records with the items currently in the cart to infer the preference for the next item. The challenges in addressing this question are as follows:

(1) When a customer puts a previously purchased item into the cart, how can the customer's personal purchasing preference pattern be extracted from the purchase history to infer the preference for the next item?
(2) When a customer puts a never-purchased item into the cart, so that inference from past habits is not possible, how can the preference for the next item be predicted?
(3) How can existing tools be exploited to infer the preference for the next item?

This thesis designs a methodology based on personal purchasing preference patterns and reference groups, the Transaction-Data Based Real-time Preference Inference Engine (TRPIE). It uses the transaction data that Instacart provided for a Kaggle competition to design and develop a system that responds to the challenges above.

D1. Extraction of the personal purchasing preference pattern from the order of purchased items in the purchase history
A customer's purchasing behavior is a dynamic process that gradually forms a personal purchasing habit. The Apriori algorithm used in basket analysis does not consider this sequentiality. We therefore use the order in which items are purchased to turn the purchase history into a temporal sequence, and design a two-layer recurrent neural network (RNN) that extracts the personal purchasing preference pattern and infers the preference for the next item.

D2. Preference inference based on the reference group
When a customer puts a never-purchased product into the cart, the preference cannot be inferred directly from past purchasing habits. However, a customer's purchasing behavior is largely related to that of a group with similar preferences. We therefore use the existing K-means clustering method to find each person's reference group from all customers' purchase histories. Through basket analysis, we find the top m products that the customer's reference group is most likely to buy together with this never-purchased product, and treat them as the group preference. We then extrapolate the customer's preference for the next item from the group preference.

A reference implementation realizes the TRPIE methodology as a system by exploiting existing tools, including Keras, TensorFlow, sklearn, and Mlxtend. To develop, test, and evaluate the system, we use real data from 1,374 people with 100 orders each. Experiment 1 uses short-term prediction, as defined by Robin Devooght, as the performance measure for TRPIE's next-item preference inference when the cart item has been purchased before. A random guess averages 0.5% in this experiment. The personal purchasing preference pattern that considers the order of purchased items achieves an average short-term prediction accuracy of 36.46% (standard deviation 20.66%), significantly higher than the baseline that considers only the purchase history and ignores the sequence of purchased items (average accuracy 18.17%, standard deviation 17.98%). Experiment 2 uses long-term prediction, as defined by Robin Devooght, as the evaluation metric, because the D2 method does not consider the order of purchased items. In Experiment 2, reference-group-based inference for never-purchased products achieves an average long-term prediction accuracy of 61.76%, significantly higher than a random guess.

The contribution of this thesis is to exploit the customer's order of purchased items to extract personal purchasing preference patterns, and to design TRPIE, which combines this pattern with the item currently being put into the cart to predict the preference for the next item in real time. Specifically, the contributions are:

(1) Converting the purchase history into temporal training data for the RNN model, based on the customer's order of purchased items.
(2) Adopting a two-layer RNN, which is naturally suited to sequential data, to find the personal purchasing preference pattern for this problem.
(3) Adopting K-means clustering to find the reference group with similar purchasing behavior, and using basket analysis to find the items most commonly purchased together with a never-purchased item, in order to extrapolate the individual customer's preference for the next item.
(4) Implementing the D1 and D2 methods and integrating Keras, TensorFlow, sklearn, and Mlxtend to build the TRPIE system.
(5) Validating TRPIE on real data: the personal purchasing preference inference achieves an accuracy of 36.46%, significantly higher than the reference baseline that uses the Apriori algorithm (18.17%).
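The group-based inference in D2 can be sketched under stated assumptions. The sketch takes the reference group as already known (in the thesis it is obtained with K-means clustering over purchase histories) and stands in for full basket analysis with plain co-occurrence counting. The function name `top_m_co_purchased`, the toy orders, and the value of m are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence

Item = Hashable


def top_m_co_purchased(
    group_orders: Iterable[Sequence[Item]], new_item: Item, m: int = 2
) -> List[Item]:
    """Rank items by how often they share an order with `new_item`.

    `group_orders` holds the orders of everyone in the customer's
    reference group (assumed to be precomputed, e.g. by k-means).
    """
    counts: Counter = Counter()
    for order in group_orders:
        basket = set(order)
        if new_item in basket:
            counts.update(basket - {new_item})
    # Sort by descending count, then by string form for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [item for item, _ in ranked[:m]]


# Toy orders from a hypothetical reference group.
reference_group_orders = [
    ["tofu", "soy sauce", "rice"],
    ["tofu", "rice"],
    ["tofu", "ginger"],
    ["milk", "rice"],
]
suggestions = top_m_co_purchased(reference_group_orders, "tofu", m=2)
print(suggestions)
```

Raw co-occurrence counts favor globally popular items; association rule measures such as support, confidence, or lift are the usual refinement, and the abstract does not specify which measure TRPIE uses.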

References


[ACC16] Aggarwal, Charu C. Recommender Systems: The Textbook. Springer, 2016.
[ARL17] Wikipedia. "Association rule learning."
[Ara12] Arad, Ayala. "Past Decisions Do Affect Future Choices: An Experimental Demonstration."
[ARR04] Agrawal, Rakesh, and Ramakrishnan Srikant. "Fast algorithms for mining association rules." Proc. 20th Int. Conf. Very Large Data Bases (VLDB). Vol. 1215. 1994.
[BeC15] Berkowitz, John, and Zachary C. Lipton. "A Critical Review of Recurrent Neural Networks for Sequence Learning." 2015.
