Discovering Web Site Structure and Page Relationships from Web Server Logs

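Web usage mining (WUM) applies data mining techniques to extract knowledge from Web server logs. The results can be used to improve Web site design, predict user behavior, and personalize Web sites. WUM consists of three main stages: data preprocessing, pattern discovery, and pattern analysis. Of these, data preprocessing is the most time-consuming, accounting for more than 60% of the whole process. Cooley et al. further divided data preprocessing into four steps plus one optional step: data cleaning, user/session identification, path completion, and page view identification, with transaction identification as the optional step. To date, data preprocessing in WUM has required external domain knowledge, such as the Web structure and the classification of Web content, which significantly limits the application of WUM. Analysts must spend considerable time becoming familiar with a site's structure and content, and Web administrators must weigh data confidentiality before handing a detailed site structure to an analyst. We therefore argue that the WUM process should include a platform that helps analysts and Web administrators communicate more effectively. This thesis proposes a mechanism that reconstructs the Web site structure and discovers page relationships from information implicit in Web server logs. Experimental results show that both the reconstruction of the site structure and the discovery of page relationships achieve a precision above 90%. The method can be easily embedded in current preprocessing steps and is a practical alternative.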
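The data cleaning and user/session identification steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes logs in the Apache Combined Log Format, approximates a user by the (IP address, user agent) pair, and splits sessions after 30 minutes of inactivity, a common heuristic rather than a value given in the abstract. `IGNORED_SUFFIXES`, `SESSION_TIMEOUT`, and the function names are hypothetical. Path completion is left out because it needs knowledge of the site's link structure.

```python
import re
from datetime import datetime, timedelta

# Apache Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical cleaning rule: embedded resources are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Hypothetical session boundary: 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def clean(records):
    """Data cleaning: keep successful GET requests for pages only."""
    return [
        r for r in records
        if r["method"] == "GET"
        and r["status"] == "200"
        and not r["path"].lower().endswith(IGNORED_SUFFIXES)
    ]


def sessionize(records):
    """User/session identification.

    A user is approximated by the (IP, user agent) pair; a new session starts
    when the gap between two consecutive requests exceeds SESSION_TIMEOUT.
    Returns a list of sessions, each a list of requested paths.
    """
    by_user = {}
    for r in sorted(records, key=lambda r: r["time"]):
        by_user.setdefault((r["ip"], r["agent"]), []).append(r)

    sessions = []
    for reqs in by_user.values():
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur["time"] - prev["time"] > SESSION_TIMEOUT:
                sessions.append([x["path"] for x in current])
                current = []
            current.append(cur)
        sessions.append([x["path"] for x in current])
    return sessions


# Made-up sample data for illustration only.
SAMPLE_LOG = [
    '1.2.3.4 - - [10/Oct/2004:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0"',
    '1.2.3.4 - - [10/Oct/2004:13:55:37 +0800] "GET /logo.gif HTTP/1.1" 200 512 "http://example.com/index.html" "Mozilla/4.0"',
    '1.2.3.4 - - [10/Oct/2004:13:56:10 +0800] "GET /a.html HTTP/1.1" 200 1000 "http://example.com/index.html" "Mozilla/4.0"',
    '1.2.3.4 - - [10/Oct/2004:15:00:00 +0800] "GET /b.html HTTP/1.1" 200 1000 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    records = [r for r in map(parse_line, SAMPLE_LOG) if r]
    print(sessionize(clean(records)))
    # [['/index.html', '/a.html'], ['/b.html']]
```

The image request is dropped during cleaning, and the gap of just over an hour before `/b.html` starts a second session for the same user.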

Keywords

Data preprocessing; Web structure; page relationship; frameset

Parallel Abstract (English)

Web usage mining, an application of data mining, extracts knowledge from Web server logs. The mining results can be used to improve Web design, predict user behavior, and personalize Web sites. Web usage mining has three major stages: data preprocessing, pattern discovery, and pattern analysis. Data preprocessing, which normally accounts for more than 60% of the whole mining process, is the most time-consuming stage. Cooley et al. divided data preprocessing into four steps plus one optional step: data cleaning, user/session identification, path completion, and page view identification, with transaction identification as the optional step. Until now, preprocessing for Web usage mining has required external domain knowledge, such as the Web structure and Web content classification, which greatly limits its application. Analysts need considerable time to become familiar with a site's structure and content, and Web administrators may have confidentiality concerns when giving a detailed Web structure to an analyst. We therefore address the problem by creating a platform that helps analysts and Web administrators communicate better during the Web usage mining process. In this thesis, we propose a framework that reconstructs the Web structure and discovers page relationships from information implicit in Web server logs. Experimental results show that Web site reconstruction and page relationship discovery achieve a precision of more than 90%. The method can be easily embedded in the common preprocessing stage and is a workable, practical alternative.
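The abstract does not describe the reconstruction algorithm itself. As one hedged illustration of the kind of implicit information a log carries, the sketch below assumes the log records a Referer field (as the Combined Log Format does) and treats each same-site (referring page, requested page) pair as evidence of a hyperlink. It also shows how precision, the metric the abstract reports, would be computed against a known link set. `infer_links`, `site_host`, `min_support`, and the sample data are hypothetical; this is not the thesis's method.

```python
from collections import Counter
from urllib.parse import urlparse


def infer_links(requests, site_host, min_support=1):
    """Infer hyperlink edges (source page -> target page) from Referer values.

    requests: iterable of (path, referer) pairs taken from cleaned log records.
    site_host: host name of the analysed site; referers from other hosts are
        external entry points and are ignored.
    min_support: minimum number of observations needed to keep an edge.
    """
    counts = Counter()
    for path, referer in requests:
        if referer in ("", "-"):
            continue  # direct entry, bookmark, or missing referer
        ref = urlparse(referer)
        if ref.netloc != site_host:
            continue  # link followed from another site
        source = ref.path or "/"
        target = urlparse(path).path
        if source != target:
            counts[(source, target)] += 1
    return {edge for edge, n in counts.items() if n >= min_support}


def precision(inferred, actual):
    """Fraction of inferred edges that also appear in the actual link set."""
    if not inferred:
        return 0.0
    return len(inferred & actual) / len(inferred)


# Made-up sample data for illustration only.
SAMPLE_REQUESTS = [
    ("/a.html", "http://example.com/index.html"),
    ("/a.html", "http://example.com/index.html"),
    ("/b.html", "http://example.com/a.html"),
    ("/c.html", "http://www.google.com/search?q=x"),
    ("/index.html", "-"),
    ("/d.html", "http://example.com/b.html"),
]
ACTUAL_LINKS = {
    ("/index.html", "/a.html"),
    ("/a.html", "/b.html"),
    ("/index.html", "/c.html"),
}

if __name__ == "__main__":
    edges = infer_links(SAMPLE_REQUESTS, "example.com")
    print(sorted(edges))
    # [('/a.html', '/b.html'), ('/b.html', '/d.html'), ('/index.html', '/a.html')]
    print(f"{precision(edges, ACTUAL_LINKS):.3f}")  # 0.667
```

Precision only penalizes edges that do not exist on the site; links that visitors never click are simply missing from the result, so recall would need to be measured separately.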

Parallel Keywords (English)

preprocessing; Web structure; page relationship; frameset


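Cited By

洪范文 (2010). 以網站日誌探勘建立網站架構 [Building a Web site architecture through Web log mining] (Master's thesis, National Taiwan Normal University). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315203559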
