As the Internet becomes ever more widespread, it is increasingly easy to browse the reviews that consumers leave online. Many people now habitually look up reviews before shopping, booking a room, or reserving a table, and make their final decision only afterwards, hoping that what they buy will meet their needs. Businesses, in turn, hope that customers will leave feedback online after a purchase or visit, since reviews attract attention from more potential customers and show the business where to maintain quality and where to improve. Each review carries its author's opinions and usually touches on several different aspects. The primary goal of this study is therefore to automatically identify, from a large number of user reviews, the aspects that reviewers focus on for each business. The second goal is to automatically classify aspect terms using a variety of features, so that consumers can see which aspect category each key aspect term in a business's reviews belongs to.
The data used in this thesis come from TripAdvisor, an international travel review website. The training data are drawn from seven well-known hotels in Changhua City, and the test data from two hotels in Taipei. The reviews are first segmented with the Academia Sinica Chinese word segmentation system, and key features are then extracted from the segmented reviews. The study has two objectives. The first is to extract the aspect terms and adjectives from each review and automatically assign them to four aspect categories, then to count which categories the aspects mentioned for each hotel fall into, so that consumers can quickly see each hotel's strengths. The second is to classify aspect terms automatically: all aspect terms that appear in the reviews are identified and mapped into a vector space, together with the adjectives that co-occur near each term.
A user who wants to read reviews about a particular aspect then does not need to read every review. Instead, the user can rely on the four automatically derived aspect categories to quickly find the reviews that discuss that aspect, and can examine each aspect in more detail. For aspect category classification, this study trains an SVM model and uses it for prediction, and it obtains good accuracy.
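The first objective (pairing aspect terms with nearby adjectives and counting aspect categories per hotel) can be illustrated with a minimal sketch. It is not the thesis's implementation: the POS tags (`Na`, `VH`, `Dfa`) are hand-assigned in the style of the Academia Sinica tagset, the lexicon, the four category names, the window size, and the toy reviews are all hypothetical, and a fixed lexicon lookup stands in for the SVM classifier, which is not shown.

```python
from collections import Counter, defaultdict

# Assumed tag sets (hand-assigned for illustration, not system output):
# "Na" marks common nouns, "VH" marks stative verbs used like adjectives.
NOUN_TAGS = {"Na"}
ADJ_TAGS = {"VH"}

# Hypothetical lexicon mapping aspect terms to four placeholder categories.
ASPECT_LEXICON = {
    "房間": "room",
    "服務": "service",
    "早餐": "food",
    "交通": "location",
}


def pair_terms_with_adjectives(tokens, window=2):
    """Return (aspect_term, adjective) pairs in which the adjective lies
    within `window` tokens of a noun found in the lexicon."""
    pairs = []
    for i, (word, tag) in enumerate(tokens):
        if tag not in NOUN_TAGS or word not in ASPECT_LEXICON:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            other, other_tag = tokens[j]
            if j != i and other_tag in ADJ_TAGS:
                pairs.append((word, other))
    return pairs


def category_counts(reviews_by_hotel, window=2):
    """Count, per hotel, how many term-adjective pairs fall in each category."""
    counts = defaultdict(Counter)
    for hotel, reviews in reviews_by_hotel.items():
        for tokens in reviews:
            for term, _adj in pair_terms_with_adjectives(tokens, window):
                counts[hotel][ASPECT_LEXICON[term]] += 1
    return counts


# Made-up reviews, already segmented and tagged as (word, tag) pairs.
reviews_by_hotel = {
    "hotel_a": [
        [("房間", "Na"), ("很", "Dfa"), ("乾淨", "VH")],
        [("早餐", "Na"), ("很", "Dfa"), ("好吃", "VH")],
        [("房間", "Na"), ("很", "Dfa"), ("寬敞", "VH")],
    ],
    "hotel_b": [
        [("服務", "Na"), ("很", "Dfa"), ("親切", "VH")],
        [("交通", "Na"), ("很", "Dfa"), ("方便", "VH")],
    ],
}

counts = category_counts(reviews_by_hotel)
for hotel, counter in counts.items():
    print(hotel, counter.most_common())
```

In a fuller pipeline, the lexicon lookup would be replaced by a trained classifier, and the per-hotel counts would indicate which aspect categories each hotel's reviewers mention most.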