  • Thesis

A Cuisine-Opinion Word Pairing Algorithm Based on Discourse Segment Analysis of Online Blog Food Reviews

Discourse-Based Cuisine-Opinion Pair Identification from Online Restaurant Reviews

Advisor: 蔡宗翰

Abstract


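Blog food reviews convey their authors' first-hand experience of a restaurant's dishes, and many internet users therefore rely on them when choosing a restaurant. Users often spend a great deal of time searching for a restaurant's popular dishes, so we aim to provide them with a restaurant's recommended dishes accurately and quickly. Unlike traditional approaches, which pair opinion words with target words on the basis of sentences and distance, this thesis proposes a new method for opinion-target pairing that serves as the basis for computing a recommendation score for each dish name.

Although online blog food reviews are unstructured documents with no prescribed or consistent format or length, authors describe items and the information about them in sequence. Exploiting this property, we first split each review into discourse segments. Each segment consists of several sentences and contains one dish together with the description of, or comments on, that dish. We then pair opinion words with target words on a per-segment basis and use a maximum entropy model to compute the recommendation score of each dish name mentioned in the review.

Experimental results confirm that, with discourse segmentation, the MAP score of the generated ranked list of recommended dishes is 5% higher than without it. Adding opinion-word frequency information raises the MAP score further, to 55%.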
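The contrast between sentence-based and discourse-based pairing can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `segment_by_dish` and `pair_opinions`, the rule that a new segment opens at each sentence naming a dish, the plain substring matching, and the toy English data are all assumptions made for illustration. The thesis itself works on Chinese text with its own patterns and sentiment dictionary.

```python
from typing import Dict, List, Optional


def segment_by_dish(sentences: List[str], dish_names: List[str]) -> Dict[str, List[str]]:
    """Open a new segment at each sentence that names a dish (assumed rule)."""
    segments: Dict[str, List[str]] = {}
    current: Optional[str] = None  # dish that owns the segment being built
    for sentence in sentences:
        for dish in dish_names:
            if dish in sentence:
                current = dish
                segments.setdefault(dish, [])
                break
        if current is not None:
            # Sentences before the first dish mention are skipped.
            segments[current].append(sentence)
    return segments


def pair_opinions(segments: Dict[str, List[str]], opinion_words: List[str]) -> Dict[str, List[str]]:
    """Pair every lexicon word found anywhere in a segment with that segment's dish."""
    pairs: Dict[str, List[str]] = {}
    for dish, segment in segments.items():
        text = " ".join(segment)
        pairs[dish] = [word for word in opinion_words if word in text]
    return pairs


# Toy data, invented for illustration only.
sentences = [
    "We started with the beef noodle soup.",
    "The broth was rich.",
    "Then came the mango shaved ice.",
    "It was refreshing but a little too sweet.",
]
dish_names = ["beef noodle soup", "mango shaved ice"]
opinion_words = ["rich", "refreshing", "sweet", "bland"]

segments = segment_by_dish(sentences, dish_names)
pairs = pair_opinions(segments, opinion_words)
print(pairs)
```

In this toy review, neither opinion sentence names a dish, so pairing within a single sentence would find no target for "rich" or "refreshing". Grouping sentences into segments first lets those opinion words attach to the dish introduced just before them.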

Parallel Abstract


Online blog reviews are a useful source of information for creating restaurant directories. An important step in mining restaurant review blogs is extracting recommended cuisines. In this paper, we propose a novel method for extracting recommended cuisines from Chinese-language blog reviews. We observe that when users introduce a cuisine in a blog review, they usually mention the full cuisine name first and then describe how the cuisine tastes. The picture of the cuisine may appear before the full cuisine name or at the end of the discourse introducing the cuisine. Based on this structure, we identify the full cuisine name and check whether a picture appears nearby. We develop several patterns to identify the discourse for a given cuisine name. We compile a sentiment dictionary and design several effective syntactic patterns to identify the opinion words corresponding to a given cuisine name. In addition, opinion words associated with a cuisine's ingredients are aggregated to that cuisine. Finally, we calculate the recommendation score of the given cuisine using a Maximum Entropy model in which the opinion words associated with the cuisine are used as features. Experimental results show that the performance of generating the list of recommended cuisines is significantly improved, by 5%. By considering frequency information, the MAP score can be further improved by 4%.
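For readers unfamiliar with the MAP score, the following sketch computes mean average precision under its standard information-retrieval definition. The abstract does not spell out the exact evaluation protocol, so this definition, the function names, and the toy rankings are assumptions for illustration rather than the thesis's evaluation code.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalizing by the number of relevant items penalizes relevant
    # dishes that never appear in the ranked list.
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over all (ranked list, relevant set) pairs."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Toy rankings, invented for illustration only.
runs = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),            # AP = (1/2) / 1
]
print(mean_average_precision(runs))
```

Here each ranked list stands for the recommended cuisines produced for one evaluation unit, and the relevant set holds the cuisines judged to be truly recommended. A higher MAP means that recommended cuisines tend to appear nearer the top of the lists.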

