Destination Image Representation on the Web by Content Analysis

Advisor: 項潔


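With the development of Web technology, many free online resources have become available, and travelers can easily consult the information conveyed by websites that introduce tourist attractions. Such information may come from official websites, dedicated introductory websites, and blogs. In recent years, content-analysis research on destination image has also turned to the meaning conveyed by travel webpages. Related studies show that current work collects pages manually and uses supporting software to analyze phrase frequencies and phrase clustering in travel webpages, from which the destination image presented by a tourist location can be observed.
Our review found that research on automatic text extraction, clustering applications, and the analysis and management of large volumes of data is relatively scarce. We therefore developed methods and system tools suited to Chinese, designed and implemented them, and, after verifying their feasibility, integrated them into an information system backed by a database design so that large volumes of data can be analyzed and managed. The destination image system proposed in this thesis can observe the differences that arise over time, compare analyses computed on a full-text basis with those computed on a sentence basis, and support multi-faceted observation of data sources for multiple attractions.
For text extraction, we retrieve pages from websites and blogs using two retrieval modes: one retrieves data from dedicated introductory websites, and the other retrieves blog pages through a search engine, so that data about an attraction from different sources can be analyzed and compared. After the pages are retrieved, a text extraction mechanism applicable to different kinds of source websites obtains the text they contain. For clustering, travel phrase analysis of the extracted text yields frequency statistics for travel phrases, and phrase co-occurrence analysis based on correlation yields their semantic network. We apply and adapt several clustering algorithms for analysis and comparison, including a neural network algorithm (Neural Network), hierarchical clustering (Agglomerative Clustering), and spanning tree clustering (Spanning Tree Clustering). For the system, we propose a destination image system whose functions include adding attractions, website text retrieval, blog text retrieval, importing text for analysis, online text extraction, browsing analysis results, phrase management, and destination image clustering.
The proposed destination image system can observe the destination images of multiple attractions at the same time. We applied it to several attractions, including Tamsui, Alishan, Sun Moon Lake, Kenting, Qingjing, and Pingxi, and analyzed the differences between the destination images conveyed by blogs and by official websites, from which changes in the destination images of attractions in Taiwan can be observed. The system can be applied to website usability evaluation and to observing the thoughts and opinions that visitors post on blogs, and can serve as a tool for tourism operators or the competent authorities to review service performance.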


With the rapid development of the Web, free Web resources have become the primary source of information for many people. When planning travel, for instance, one may consult official websites of places, commercial websites devoted to specific destinations, or weblogs. Research on destination image representation on the Web through content analysis has attracted much attention in recent years. Most of these works obtain Web content manually and then use software to analyze phrase frequency and phrase clustering in order to characterize the destination images. The tools currently available, however, are mainly designed for Western languages and are not suitable for Chinese content. We have also found that existing methods need improvement in automatic Web content extraction, clustering, and content analysis and management.
In this thesis we develop a system architecture that integrates these capabilities into a single management system so that destination images can be analyzed more effectively. Our method can also distinguish temporal aspects of the extracted data and can use different segmentation methods to provide analysis along multiple dimensions.
We have developed two kinds of automatic Web content extraction mechanisms, whose purpose is to separate the meaningful content of a webpage from nonessential parts such as headers and advertisements. The first is designed for specific websites, and the second is for blogs obtained through keyword search. Parsing and cleaning techniques are also developed to extract the meaningful content of the webpages as plain text. By segmenting the text, we identify travel-related phrases together with their frequencies and co-occurrence relations. We have also developed several clustering algorithms, based on neural networks, spanning tree clustering, and agglomerative clustering, to cluster the phrases and find the destination images. These functions are integrated into a system backed by a database.
To demonstrate the effectiveness of our method, we have applied our system to a number of popular tourist destinations, including Alishan, Sun Moon Lake, Kenting, Tamsui, Pingxi, and Qingjing. We use the system to analyze the differences between the destination images conveyed by the official website and by weblogs for each location. The results also show the similarities and differences in perception among the tourist locations. The system can be used to evaluate the effectiveness of official tourism websites, to identify subtle seasonal differences in tourism at the same location, and as a reference for promotional strategies in the tourism industry.
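
The phrase frequency, co-occurrence, and spanning tree clustering steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the implementation developed in this thesis: the function names, the sample sentences, the vocabulary, and the `min_weight` threshold are all hypothetical, phrases are matched by plain substring search rather than by Chinese word segmentation, and the clustering is a simple Kruskal-style procedure that stops joining phrases once the co-occurrence weight falls below the threshold.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences, vocabulary):
    """Count, per phrase, the sentences containing it, and per phrase pair,
    the sentences containing both (sentence-level co-occurrence)."""
    freq = Counter()
    pairs = Counter()
    for sentence in sentences:
        # Substring matching is an assumption made for brevity.
        present = sorted({p for p in vocabulary if p in sentence})
        freq.update(present)
        pairs.update(combinations(present, 2))
    return freq, pairs


def cluster_by_spanning_tree(phrases, pairs, min_weight):
    """Grow a maximum spanning forest over co-occurrence edges (Kruskal),
    ignoring edges lighter than min_weight; each tree is one cluster."""
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Visit the heaviest edges first.
    for (a, b), weight in sorted(pairs.items(), key=lambda kv: -kv[1]):
        if weight < min_weight:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical sample data, in English for readability.
sentences = [
    "sunrise and forest railway at Alishan",
    "the forest railway climbs through cedar",
    "sunrise over the sea of clouds",
    "seafood at the old street",
    "old street snacks and seafood",
]
vocabulary = ["sunrise", "forest railway", "sea of clouds",
              "seafood", "old street", "cedar"]

freq, pairs = cooccurrence(sentences, vocabulary)
clusters = cluster_by_spanning_tree(vocabulary, pairs, min_weight=1)
print(clusters)
```

With `min_weight=1`, the mountain-related phrases and the street-food phrases in this toy data fall into two separate clusters. Raising the threshold keeps only the more strongly co-occurring phrases together, which is one way to vary the granularity of the resulting image.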


