
A Study on Predicting Web Page Update Rules with Data Mining Methods (以資料挖礦法則預測網頁更新規則之研究)

Discovering Web Page Update Patterns with Data Mining

Abstract


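In the e-commerce era, many kinds of software agents search the network for information in order to build websites of all sorts. Because the volume of data is usually very large, deciding when such software should refresh the information it has collected becomes an important decision for the system administrator. The usual approach today is fixed-interval updating, in which the time between updates is a fixed period set by the user. If that interval is chosen poorly, however, the fetched pages may be identical to the previous copies (interval too short), or the pages may already have been updated several times (interval too long), so that either network resources are wasted or the data is stale. This paper therefore applies the data mining method for generating sequential association rules to find the update pattern of a web page's update times, and validates the pattern by actually fetching pages according to it. Because a page's update pattern may itself change over time, a static prediction pattern gradually loses accuracy. This study therefore also proposes an incremental method for updating the prediction rules, so that the rules reflect current conditions in a timely way without consuming excessive computing resources.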
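The trade-off between an interval that is too short and one that is too long can be illustrated with a small simulation. This is only a sketch under assumed definitions: the function name `fixed_interval_cost`, the notion of discrete time slots, and the two cost counters are illustrative choices, not taken from the paper.

```python
def fixed_interval_cost(update_times, interval, horizon):
    """Count wasted fetches and missed updates for a fixed fetch interval.

    update_times: time slots at which the page actually changed (assumed known).
    interval: number of slots between consecutive fetches.
    horizon: total number of slots simulated.
    """
    updates = set(update_times)
    wasted = 0   # fetches that returned an unchanged page
    missed = 0   # updates overwritten before any fetch saw them
    pending = 0  # updates that happened since the last fetch
    for t in range(1, horizon + 1):
        if t in updates:
            pending += 1
        if t % interval == 0:      # a fetch happens in this slot
            if pending == 0:
                wasted += 1
            else:
                missed += pending - 1
            pending = 0
    return wasted, missed


# Hypothetical page that changes every third slot.
changes = [3, 6, 9, 12]
print(fixed_interval_cost(changes, interval=1, horizon=12))  # (8, 0): too short
print(fixed_interval_cost(changes, interval=6, horizon=12))  # (0, 2): too long
print(fixed_interval_cost(changes, interval=3, horizon=12))  # (0, 0): matches the pattern
```

Only the interval that happens to match the page's real update pattern avoids both costs, which is the motivation for learning that pattern from history rather than fixing it by hand.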

Parallel Abstract


In the e-commerce era, many agents roam the Internet to find the best prices, cluster related product information, and so on. Agents have to visit targeted web pages periodically to update their information. If agents visit pages too frequently, they end up reloading information they already hold. On the other hand, if agents visit web pages too infrequently, the collected data may be out of date. To minimize out-of-date errors, agents tend to visit a site as soon as possible. However, to minimize network traffic and database update cost, system administrators tend to reduce visits as much as possible. To the best of our knowledge, no research has been directed at finding a scientific approach to resolve this dilemma. In this paper, we propose visiting web pages according to their past update patterns. That is, a page should be visited as soon as it is expected to have changed, but should not be visited at any other time. To discover the update patterns, we propose using the sequential association rules of data mining methodology. Association rules can find patterns implicitly associated with temporal update patterns. In this paper, each web page is associated with a sequence of binary digits denoting whether the page was updated in the last agent fetching slot. We designed an algorithm to mine patterns from the sequence of binary digits. The patterns are composed of large item sequences and related association rules. Each rule states that, under some preconditions, the web page will change in the next time slot. If a precondition matches the current situation, an agent is sent to fetch the page. Besides computing patterns for existing pages, the system also updates its database dynamically to account for newly inserted and deleted pages.
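The idea of mining "precondition implies update in the next slot" rules from a binary update sequence can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes preconditions are fixed-length windows of the most recent `k` slots, and the names `mine_update_rules`, `should_fetch`, `min_support`, and `min_confidence` are hypothetical.

```python
from collections import Counter


def mine_update_rules(history, k=3, min_support=2, min_confidence=0.8):
    """Return {precondition: confidence} for rules 'precondition -> update next slot'.

    history: string of '0'/'1', one character per fetch slot
             ('1' means the page had changed in that slot).
    k: length of the precondition window (an assumed simplification).
    """
    seen = Counter()      # how often each k-slot window occurred
    followed = Counter()  # how often that window was followed by an update
    for i in range(len(history) - k):
        window = history[i:i + k]
        seen[window] += 1
        if history[i + k] == "1":
            followed[window] += 1
    rules = {}
    for window, count in seen.items():
        confidence = followed[window] / count
        if count >= min_support and confidence >= min_confidence:
            rules[window] = confidence
    return rules


def should_fetch(history, rules, k=3):
    """Send an agent only if the latest k slots match a mined precondition."""
    return history[-k:] in rules


# Hypothetical page that changes every third slot.
rules = mine_update_rules("001001001001", k=2)
print(rules)                             # {'00': 1.0}
print(should_fetch("00100", rules, 2))   # True: two quiet slots, update expected
print(should_fetch("0010", rules, 2))    # False: page changed one slot ago
```

The support threshold keeps rare windows from producing rules, and the confidence threshold keeps only preconditions that were reliably followed by an update; both values here are arbitrary defaults.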

Keywords

web page update; data mining; pattern discovery; WWW


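Cited by


趙瑋蕾 (2006). 運用延遲差異建置可客製化網站之研究 [A study on building customizable websites using delay differences] (Master's thesis, Asia University). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916284661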
