An Intelligent Web Robot Focused on Aggregating and Extracting Specific Website Data

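In today's technologically advanced world, going online has become a routine part of daily life, and websites, like shops with dazzling displays, compete for users' clicks and visits. In such a competitive environment, businesses must put considerable effort into studying their competitors' websites, yet many companies still collect, file, and analyze this data by hand. Because the number of websites to monitor is large, doing so manually inevitably requires substantial manpower, time, and resources. This thesis develops an intelligent web robot (Web Robot or Web Crawler) technique that collects Web data quickly, efficiently, and accurately. Crawler robots are deployed automatically to the target websites according to the user's requirements to extract data, greatly relieving enterprises of the burden of manual data collection. The intelligent web robot system (imBot, intelligent meta roBot) integrates with the browser through Chrome Extension Apps, so that when users come across data they want to collect during everyday browsing, they can easily create a task directory according to their needs. The system automatically retrieves the relevant page information and analyzes the website's domain metadata; once the user confirms and labels the fields, the system extracts the metadata, analyzes it, and outputs the results as infographics.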

Keywords

Data extraction; web crawler; shared annotation

Parallel Abstract

In this age of advanced technology, browsing the Internet to get information from websites or apps has become part of people's daily life. For commercial information in particular, corporations frequently spend manual effort collecting data from competitors' business websites. There are numerous websites but only a few hands. To address this drawback, we develop the Intelligent Metadata Robot (imBot), which lets users collect data and extract useful metadata from various websites. Through a Chrome Extension App, users can easily add websites or pages of interest and assign them to customized categories maintained on the imBot web platform. By integrating an intelligent Internet robot (Web Robot or Web Crawler), imBot helps users accumulate useful metadata transparently while they browse the Web. The imBot platform also supports shared annotations: once a user has labelled a website for metadata extraction, those labels can be shared with other users. Finally, imBot presents the collected metadata to users as informative graphs.
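
The label-then-extract step described above can be illustrated with a minimal sketch. It is not imBot's implementation: it assumes, purely for illustration, that a user's annotation maps each field name to the CSS class of the page element holding that field, and it uses only Python's standard `html.parser`. The names `FieldExtractor`, `extract_metadata`, and the sample field names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose CSS class was labelled as a field."""

    def __init__(self, annotations):
        super().__init__()
        # Assumed annotation format: {field name: CSS class labelled by the user}.
        self._class_to_field = {css: field for field, css in annotations.items()}
        self.values = {field: [] for field in annotations}
        self._field = None   # field currently being captured, if any
        self._depth = 0      # nesting depth inside the captured element
        self._buffer = []    # text fragments of the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1  # nested element inside a labelled one
            return
        classes = (dict(attrs).get("class") or "").split()
        for css in classes:
            if css in self._class_to_field:
                self._field = self._class_to_field[css]
                self._depth = 1
                self._buffer = []
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join fragments and normalize whitespace before storing.
            text = " ".join("".join(self._buffer).split())
            self.values[self._field].append(text)
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)


def extract_metadata(html, annotations):
    """Return {field name: [extracted values]} for one page."""
    parser = FieldExtractor(annotations)
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    page = """
    <ul>
      <li><span class="name">Widget A</span> <span class="price">120</span></li>
      <li><span class="name">Widget <b>B</b></span> <span class="price">95</span></li>
    </ul>
    """
    print(extract_metadata(page, {"product": "name", "price": "price"}))
```

Because the annotation is a plain mapping that is independent of any one user, the same mapping could in principle be stored once and reused by others, which is the idea behind the shared annotations mentioned in the abstract.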