以鏈結為基礎之自動化網頁分類

Link-based Automatic Web Page Classification

Advisor: Dr. 楊正仁

Abstract


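With the rapid development of the Internet, the amount of information stored online has grown enormously.

With the help of Web directories and search engines, users can easily find the information they want on the Web. Because the services they provide are so convenient and fast, Web directories and search engines have become indispensable tools on the Internet. However, for both Web directories and search engines, a supporting mechanism would make classification more accurate and search more efficient, and automatic Web page classification provides such a mechanism.

As the volume of information on the Internet keeps growing, automatic Web page classification will become the mainstream approach to Web page classification. Current automatic methods, however, classify some pages inaccurately and cannot classify others at all because the pages contain too little information. This thesis therefore proposes a new integrated method that analyzes the links on the page being classified, uses this link information to find highly related pages, and analyzes the content of those pages together with the original page. By exploiting the relatedness between pages, we enlarge the usable text and tag information, improve the accuracy of automatic classification, and raise the classifiable rate by reducing the number of unclassifiable pages, so that a directory can accurately include more of the pages that belong to each category.

We implemented both the classification method that uses link information and the classification method of Jenkins & Inman, trained and tested them on an existing Web directory, and verified that the proposed method indeed improves both the accuracy and the classifiable rate of automatic Web page classification.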

Keywords

Web page classification; link analysis

Parallel Abstract


As the Internet develops rapidly, the amount of available information grows enormously. Web search engines and Web directories help users find important information quickly and effectively, and they have therefore become two important services on the Internet. However, both Web search engines and Web directories need supporting mechanisms that classify Web pages precisely in order to improve their effectiveness. Automatic Web page classification is one such mechanism. Because the amount of information on the Internet is too large to be classified manually, automatic Web page classification is becoming the mainstream of Web page classification. However, two problems need further discussion: how to improve classification accuracy, and how to reduce the ratio of pages that cannot be classified at all. This thesis proposes a new approach, called link-based automatic Web page classification, to alleviate these problems. We improve a tag-weighted approach (Jenkins & Inman) by incorporating link analysis, which picks out the authority links from the Web page being classified and analyzes the contents pointed to by those links. We conducted experiments to compare our approach with the Jenkins & Inman approach, using a set of classified Yahoo! Web pages for training and verification. The experimental results show that link-based automatic Web page classification indeed improves the classification correctness rate and reduces the number of Web pages that cannot be classified by the Jenkins & Inman approach.
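The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the tag weights, the `link_factor` discount, the scoring rule, and all function names are assumptions chosen for illustration, and the selection of authority links is assumed to have been done already (the sketch simply receives the HTML of the selected linked pages).

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical tag weights; the abstract does not state the actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5}
DEFAULT_WEIGHT = 1.0


class WeightedTermParser(HTMLParser):
    """Accumulate term weights, scaled by the innermost open weighted tag."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags that carry a weight
        self.terms = Counter()   # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = TAG_WEIGHTS[self.stack[-1]] if self.stack else DEFAULT_WEIGHT
        for word in re.findall(r"[a-z]+", data.lower()):
            self.terms[word] += weight


def term_weights(html):
    """Tag-weighted term counts for a single page."""
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return parser.terms


def merged_term_weights(page_html, linked_htmls, link_factor=0.5):
    """Add the down-weighted terms of linked pages to the page's own terms."""
    merged = Counter(term_weights(page_html))
    for html in linked_htmls:
        for word, weight in term_weights(html).items():
            merged[word] += link_factor * weight
    return merged


def classify(terms, profiles, threshold=0.0):
    """Return the best-scoring category, or None if no score exceeds threshold."""
    best, best_score = None, threshold
    for category, profile in profiles.items():
        score = sum(w * profile.get(word, 0.0) for word, w in terms.items())
        if score > best_score:
            best, best_score = category, score
    return best


# Toy category profiles and pages, made up for this example.
profiles = {
    "sports": {"football": 1.0, "league": 1.0},
    "music": {"guitar": 1.0, "concert": 1.0},
}
sparse_page = "<html><title>Home</title><body><a href='x'>Enter</a></body></html>"
linked = ["<html><title>Football league</title><body>Football scores</body></html>"]

print(classify(term_weights(sparse_page), profiles))                 # None
print(classify(merged_term_weights(sparse_page, linked), profiles))  # sports
```

In this toy case the page alone has too little text to match any category, so it is unclassifiable. Once the terms of the linked page are merged in, it is assigned to a category, which mirrors the improvement in classifiable rate that the abstract reports.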

References


[Chekuri 1997]. C. Chekuri, M. H. Goldwasser, and P. Raghavan, “Web search using automatic classification,” in Proceedings of the 6th International World Wide Web Conference, Santa Clara, California, April 1997.
[Jenkins 1998]. C. Jenkins, M. Jackson, P. Burden, and J. Wallis, “Automatic classification of Web resources using Java and Dewey decimal classification,” Computer Networks and ISDN Systems, vol. 30, pp. 646-648, 1998.
[Jenkins 2000]. C. Jenkins and D. Inman, “Adaptive automatic classification on the Web,” in Proceedings of the 11th International Workshop on Database and Expert Systems Applications, pp. 504-511, 2000.
[Selberg 1997]. E. Selberg and O. Etzioni, “The MetaCrawler architecture for resource aggregation on the Web,” IEEE Expert, vol. 12, no. 1, pp. 11-14, Jan.-Feb. 1997.
[Thompson 1997]. R. Thompson, K. Shafer, and D. Vizine-Goetz, “Evaluating Dewey concepts as a knowledge base for automatic subject assignment,” Online Computer Library Center (OCLC), http://orc.rsch.oclc.org:6109/eval_dc.html, 1997.
