A Data Clustering Method Based on the Results of Association Rule Analysis: Application to Web Browsing Log Analysis

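Analyzing how users browse a website is both important and necessary for site operators. In practice, however, the need for such analysis is rarely considered when a site is first built, so adding an analysis system to an existing site is difficult. Taking web log analysis as an example, determining which log records come from the same user is a major challenge. This thesis proposes using an ISAPI filter together with cookies to identify individual site users in a way that is both practical and accurate, without modifying the existing system or its code.

Previous research applying association rules to web mining has mostly focused on finding associations among web pages in order to produce meaningful rules. This thesis instead builds on the results of association rule analysis: it merges itemsets that are closely associated and excludes itemsets whose internal associations are loose, producing clusters whose members are highly associated with one another. During the analysis, weakly associated data is also excluded from the clusters, which keeps data quality consistent. The resulting clusters also have characteristics different from those produced by traditional clustering methods based on distance measures.

The experimental results show that the clusters obtained do group pages with highly related browsing behavior into the same cluster. Compared with the results of Association Rule Hypergraph Partitioning, the clustering results in this thesis are more accurate and of better data quality.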
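The user-identification idea above can be illustrated with a minimal sketch. It assumes the filter has already injected an identifier cookie and that each log record carries the raw cookie header and the requested URL. The cookie name `uid`, the record layout, and the function names are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def parse_uid(cookie_header, name="uid"):
    """Return the value of the cookie `name`, or None if it is absent.

    `uid` is a hypothetical cookie name chosen for this sketch.
    """
    for part in cookie_header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key == name:
            return value
    return None


def pages_by_user(records):
    """Group requested URLs by the user identifier found in the cookie.

    `records` is an iterable of (cookie_header, url) pairs; records
    without the identifier cookie are skipped.
    """
    visits = defaultdict(list)
    for cookie_header, url in records:
        uid = parse_uid(cookie_header)
        if uid is not None:
            visits[uid].append(url)
    return dict(visits)


sample_records = [
    ("uid=a1; theme=dark", "/index"),
    ("uid=b2", "/news"),
    ("theme=dark", "/index"),
    ("uid=a1", "/products"),
]
user_pages = pages_by_user(sample_records)
```

Each user's list of pages can then serve as one transaction for association rule analysis.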

Keywords

association rule; clustering algorithm; hypergraph partitioning; web mining; web usage mining

Parallel Abstract

Analyzing and understanding how users browse a web site is an important issue in web site development. However, this capability is seldom an integral part of the design process when the web site is built. Adding it to an existing, running web server is challenging because of the engineering effort of modifying a potentially large number of web pages. This thesis uses an ISAPI filter to inject cookies into HTTP transactions in order to identify individual users. This method can be applied to an existing system with minor modifications. The main goal of data clustering is to partition a data set into clusters so that the data in each cluster share some common trait. This thesis proposes a method that clusters data items based on the large itemsets produced by association rule analysis, instead of a commonly used distance measure. Empirical data are collected from an existing web server, and the resulting clusters are analyzed and compared with those of the commonly used “Association Rule Hypergraph Partitioning”. The experiment shows that the proposed method yields more pertinent results than “Association Rule Hypergraph Partitioning” and, at the same time, prunes infrequent data items.
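As a rough illustration of clustering from large itemsets rather than from a distance measure, the sketch below enumerates frequent itemsets by brute force and then greedily merges itemsets that overlap strongly. It is not the thesis's algorithm: the Jaccard overlap test, the `min_overlap` threshold, and all function names are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force search for itemsets contained in at least
    `min_support` transactions (each transaction is a set of pages)."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                result[itemset] = count
    return result


def merge_itemsets(itemsets, min_overlap=0.5):
    """Greedily merge itemsets whose Jaccard overlap with an existing
    cluster is at least `min_overlap` (an assumed criterion)."""
    clusters = []
    for itemset in sorted(itemsets, key=len, reverse=True):
        if len(itemset) < 2:
            continue  # a single page expresses no association
        for i, cluster in enumerate(clusters):
            if len(itemset & cluster) / len(itemset | cluster) >= min_overlap:
                clusters[i] = cluster | itemset
                break
        else:
            clusters.append(set(itemset))
    return clusters


transactions = [
    {"/a", "/b", "/c"},
    {"/a", "/b", "/c"},
    {"/a", "/b"},
    {"/x", "/y"},
    {"/x", "/y"},
    {"/z"},
]
large = frequent_itemsets(transactions, min_support=2)
clusters = merge_itemsets(large)
```

In this toy data the page `/z` appears in only one transaction, so it never enters a large itemset and is left out of every cluster, which mirrors the pruning of infrequent items mentioned above.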

Parallel Keywords

association rule; clustering algorithm; ISAPI; cookie; hypergraph partitioning; web mining; web usage mining

