Implementation Design of Browser Web Page Content Collection

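The Internet is an inexhaustible treasure trove. With a browser, we can connect over the Internet to a remote search engine host, enter keywords, and easily find large numbers of the web pages we want. Some of this content is highly valuable to an individual, and how to preserve it is the direction of the implementation design studied in this thesis. Collecting and saving web page content can be regarded as a "new type of value-added online storage service" on the Internet. This thesis proposes a web page content storage architecture that lets ordinary users easily store such content on a server host on the remote Internet. It also introduces, analyzes, and compares the current state of development of this kind of service. It covers four mature web page content collection tools and services, domestic and international: (1) Google Notebook, (2) Scrapbook for Mozilla Firefox, (3) the CyberArticle e-library, and (4) the Code Library .Net knowledge base manager. These serve as a reference for choosing among them. The research has four main motivations and objectives: (1) browser-based web content collection is not yet widespread and has few users; (2) few similar tools and services are currently available; (3) to provide knowledge workers with an effective tool; and (4) to implement a working system platform that achieves the goal of web page content collection. The research method starts from a user requirements analysis of the overall functionality. From it, the thesis designs a server-side, personalized web page content collection platform architecture called the "Personal Pages Content Collector" (PPCC); this abbreviation is used throughout the thesis. The system uses program components built into the IE browser together with the principles of the HTTP protocol. With them, it sends web page content that the user considers valuable to a remote PPCC server host. Afterwards, the user only needs to connect to the PPCC server to retrieve the collected content easily, which is the ultimate goal of this research.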
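The following minimal Python sketch illustrates the transfer step only. It is not the thesis's IE-based implementation. It builds the HTTP/1.0 POST request that a client could send to carry one collected page to a PPCC server. The host name, the CGI path `/ppcc/collect.cgi`, and the form fields `url`, `title`, and `content` are hypothetical assumptions, not details taken from the PPCC design.

```python
from urllib.parse import urlencode


def build_collect_request(host: str, page_url: str, title: str, html: str) -> bytes:
    """Build a raw HTTP/1.0 POST request that carries one collected page.

    Hypothetical: the CGI path and the form-field names are placeholders,
    not taken from the PPCC design.
    """
    # Form-encode the collected fragment so arbitrary HTML (and non-ASCII
    # titles, as UTF-8 percent-escapes) survives transport as plain ASCII.
    body = urlencode({"url": page_url, "title": title, "content": html}).encode("ascii")
    # RFC 1945 requires a valid Content-Length on HTTP/1.0 POST requests.
    head = (
        "POST /ppcc/collect.cgi HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


if __name__ == "__main__":
    request = build_collect_request(
        "ppcc.example.com", "http://example.com/article", "Example", "<p>Saved text</p>"
    )
    print(request.decode("ascii"))
```

Form encoding is chosen here because it is what a plain HTML form or a simple CGI program expects. A real client could equally send multipart data or the raw HTML.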

Keywords

Web page collection; Plug-ins; Personalization; Knowledge base; CGI program

Parallel Abstract (English)

The Internet is an inexhaustible treasure. Through a web browser and a search engine, people can find vast amounts of information just by entering keywords. Some of the search results are useful to users, so how to help users keep this valuable and useful content is the direction of this research. Web page content collection is one of the new concepts in Internet value-added storage services. This research provides a web content storage architecture for saving web page content on a remote Internet server. It also introduces and analyzes the functions and development status of the four most popular web page content collection tools: first, Google Notebook; second, Scrapbook for Mozilla Firefox; third, CyberArticle; and fourth, Code Library .Net. Users can choose among these four tools on the basis of this research. There are four major reasons for this research. First, such tools are not yet popular, and only a few people use them. Second, only a few tools are available. Third, web page content collection is a powerful tool for knowledge workers. Fourth, the research implements a system platform that achieves the goal of web page content collection. The methodology is based on an analysis of overall usage requirements. From it, the research designs a server-side personalized web page content collection system platform called the "Personal Pages Content Collector" (PPCC). Using program components built into Internet Explorer and the HTTP protocol, users can send valuable web page content to the remote PPCC server and store it there. Afterwards, users can easily find this information online simply by connecting to the remote PPCC server. This is the ultimate goal of this research.
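On the server side, storing collected pages and later listing them could look like the following minimal Python sketch. The keywords mention CGI programs. To keep the example self-contained, it swaps in Python's built-in `http.server` module instead of a CGI script and an in-memory list instead of a database. The paths `/ppcc/collect.cgi` and `/ppcc/list.cgi` and the field names `url`, `title`, and `content` are hypothetical assumptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical in-memory store standing in for the PPCC server's database.
COLLECTED: list = []
# HTTPServer handles one request at a time; the lock only matters if the
# demo is switched to ThreadingHTTPServer.
STORE_LOCK = threading.Lock()


class CollectorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Save one collected page sent as a form-encoded POST body.
        if self.path != "/ppcc/collect.cgi":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("utf-8"))
        record = {key: fields.get(key, [""])[0] for key in ("url", "title", "content")}
        with STORE_LOCK:
            COLLECTED.append(record)
            record_id = len(COLLECTED) - 1
        self._send_json(201, {"id": record_id})

    def do_GET(self):
        # List what has been collected so far, without the page bodies.
        if self.path != "/ppcc/list.cgi":
            self.send_error(404)
            return
        with STORE_LOCK:
            listing = [
                {"id": i, "url": r["url"], "title": r["title"]}
                for i, r in enumerate(COLLECTED)
            ]
        self._send_json(200, listing)

    def _send_json(self, status, payload):
        data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this demo


def start_server(port=0):
    """Start the collector on 127.0.0.1 in a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), CollectorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client POSTs form-encoded `url`, `title`, and `content` fields to the collect path. Later, it GETs the list path to see what it has saved. A production version would add per-user accounts and persistent storage, which this sketch deliberately omits.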

Parallel Keywords (English)

Web page collection; Plug-ins; Personalization; Knowledge base; CGI program
