
Mining High Utility Sequential Patterns with Duration Constraints (期間限制探勘於高效用序列樣式)

Advisor: 胡雅涵

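Abstract

High utility sequential pattern mining is an important application in the field of data mining, and it is widely used in purchase behavior analysis. Through it, businesses can gain insight into customers' purchasing habits, learn how products relate to one another and which high-priced items most customers buy, and set sales policy accordingly. In previous studies, however, the mined patterns include ones in which the customer's purchases stretch over a very long time; such patterns carry little meaning as purchase relationships. To find more meaningful patterns, we add time conditions to filter the patterns being sought. In this work, we add a duration constraint and gap constraints. For the duration constraint we propose a maximum span length, so that the patterns found occur within a specific period of time; for the gap constraints we propose a maximum gap (maxgap) and a minimum gap (mingap). This study proposes the HUD algorithm, which integrates the time constraints and adapts the PrefixSpan algorithm to mine utility sequential patterns. In the experiments, we measure runtime, number of patterns, average value per pattern, precision, recall, and F-measure to compare our method with the traditional method.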

Parallel Abstract


Utility sequential pattern mining (utility SPM) is one of the most important data mining techniques, and it is widely used in customer behavior analysis. Through utility SPM, organizations can explore customers' purchase habits, understand the relationships between products, and learn which high-priced items most customers buy, and then develop sales policy accordingly. However, when the average length of the sequences in the database is long, conventional utility SPM algorithms often generate too many long sequential patterns, which say little about purchase relationships. To find more meaningful patterns, we add time constraints to the mining process. In this paper, we incorporate duration constraints into utility SPM. Specifically, we propose a maximum span length constraint, which requires patterns to occur within a specific period of time. In addition, maxgap and mingap constraints confine the time interval between adjacent events to a reasonable range. We propose a new algorithm, High Utility sequential pattern mining with Duration constraints (HUD), which mines high utility sequential patterns by integrating these constraints. In the experiments, we measure runtime, number of patterns, value per pattern, precision, recall, and F-measure to compare our method with traditional utility SPM.
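
The three time constraints can be illustrated with a minimal Python sketch. The abstract does not give the formal definitions, so the following are assumptions: the span is the time between the first and last matched events, a gap is the time between two adjacent matched events, and all bounds are inclusive. The function name `satisfies_time_constraints` and its parameters are hypothetical and are not taken from the HUD algorithm itself.

```python
from typing import Sequence


def satisfies_time_constraints(
    timestamps: Sequence[float],
    mingap: float,
    maxgap: float,
    max_span: float,
) -> bool:
    """Check one occurrence of a pattern against the three time constraints.

    `timestamps` holds the times of the matched events, in pattern order.
    Definitions of span and gap are assumptions (see the lead-in text).
    """
    if not timestamps:
        return False
    # Duration constraint: the first-to-last distance must fit in the window.
    if timestamps[-1] - timestamps[0] > max_span:
        return False
    # Gap constraints: each pair of adjacent events must be neither
    # too close together (mingap) nor too far apart (maxgap).
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap < mingap or gap > maxgap:
            return False
    return True


# Hypothetical occurrence: purchases on days 1, 3 and 6 (span 5, gaps 2 and 3).
print(satisfies_time_constraints([1, 3, 6], mingap=1, maxgap=4, max_span=7))  # True
print(satisfies_time_constraints([1, 3, 6], mingap=1, maxgap=2, max_span=7))  # False
print(satisfies_time_constraints([1, 3, 6], mingap=1, maxgap=4, max_span=4))  # False
```

In a PrefixSpan-style miner, a check of this kind would typically be applied while extending a prefix, so that occurrences violating the constraints are discarded before their utility is counted; how HUD actually integrates the check is not stated in the abstract.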

