
應用潛藏面相評分分析於中文評論:使用局部潛藏狄利克雷分配方法

Latent Aspect Rating Analysis on Chinese Reviews: A Local LDA Based Approach

Advisor: 盧信銘

Abstract


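With the rapid development of Internet technology, the web is full of reviews of every kind, and analyzing this unstructured data is becoming increasingly important. In reviews of services or products, however, users often leave only an overall rating. They neither rate the individual topical aspects of the service or product nor reveal the weight they place on each aspect, which limits how helpful the reviews are to other users. The problem of inferring a document's topical aspect ratings and their weights is called Latent Aspect Rating Analysis (LARA).

This study attempts to apply LARA to Chinese reviews using Local Latent Dirichlet Allocation (Local LDA) and the Latent Rating Regression (LRR) model. The experiment uses a two-stage model. In the first stage, Local LDA performs aspect segmentation and aspect extraction on the preprocessed review text. In the second stage, the LRR model uses an EM-like algorithm to infer each document's topical aspect ratings and their weights.

The datasets are travel reviews from Ctrip (攜程網), the largest Chinese-language travel website, and from TripAdvisor, the world's largest travel review website. The Ctrip data was collected with a web crawler and then cleaned. The experiments show that Local LDA performs relatively better than the Bootstrap method. Because Local LDA is unsupervised, it requires no manually specified seed keywords, which makes the approach more broadly applicable.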
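As a rough illustration of the first stage, the sketch below treats every sentence as its own document, fits LDA with a small collapsed Gibbs sampler, and labels each sentence with its dominant aspect. This is the general Local LDA idea as it is commonly described, not the thesis's implementation. The function names, the hyperparameters (`alpha`, `beta`, `iters`), and the toy tokens are all assumptions, and the input is assumed to be already word-segmented.

```python
import random
import re


def split_sentences(review):
    """Split a review into sentences on common Chinese/Western end punctuation."""
    parts = re.split(r"[。!?!?;;\n]+", review)
    return [p.strip() for p in parts if p.strip()]


def local_lda(sentences, num_aspects, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA where each *sentence* is a document.

    `sentences` is a list of token lists (hypothetical, pre-segmented input).
    Returns one aspect id per sentence: the aspect holding most of its tokens.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_aspects
    docs = [[word_id[w] for w in s] for s in sentences]

    doc_topic = [[0] * K for _ in docs]       # tokens of sentence d in aspect k
    topic_word = [[0] * V for _ in range(K)]  # occurrences of word w in aspect k
    topic_total = [0] * K                     # total tokens in aspect k

    def bump(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial aspect assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            bump(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                bump(d, w, z[d][n], -1)  # remove the token's current assignment
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                z[d][n] = rng.choices(range(K), weights=weights)[0]
                bump(d, w, z[d][n], +1)

    return [max(range(K), key=lambda k: doc_topic[d][k]) for d in range(len(docs))]


# Toy, hand-segmented sentences (illustrative only).
sentences = [
    ["房間", "乾淨", "床", "舒服"],
    ["服務", "態度", "親切"],
    ["房間", "床", "寬敞"],
    ["服務", "櫃台", "親切"],
]
labels = local_lda(sentences, num_aspects=2)
print(labels)
```

The aspect ids are arbitrary cluster labels, so which id ends up meaning "room" or "service" depends on the random seed. In practice the sentences grouped under each id would be inspected, along with the top words per aspect, to name the aspects.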

Parallel Abstract


With the growth of web technology, mining the detailed information in online reviews has become an important task. Most reviewers give an entity only an overall rating, which is not enough for users to learn much from the reviews. This has led to a text mining problem called Latent Aspect Rating Analysis (LARA), which infers latent aspect ratings and latent aspect weights simultaneously. In this research, we apply LARA to Chinese reviews, using Local LDA (an unsupervised method) and the LRR model to analyze online reviews. In the first stage, after preprocessing, we apply Local LDA to the review text to perform aspect segmentation, which yields the aspects and their representative words. In the second stage, we use the LRR model to infer the latent aspect ratings and latent aspect weights. Our experiments use online reviews from Ctrip and TripAdvisor as the dataset. The results show that the Local LDA + LRR method has some advantages for LARA on Chinese reviews.
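To make the second stage concrete, the sketch below shows only the forward computation that an LRR-style model assumes: each aspect rating is a sentiment-weighted sum of normalized word frequencies within that aspect's text, and the overall rating is the weight-averaged aspect rating. It is a simplified reading of the LRR formulation, not the thesis's code. The names `aspect_rating` and `rate_review` and every number in the example are made up for illustration. The actual model has to infer the word sentiments and aspect weights from overall ratings alone, which this sketch does not attempt.

```python
from collections import Counter


def aspect_rating(tokens, word_sentiment):
    """Rating of one aspect: sum over words of sentiment * normalized frequency."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return sum(word_sentiment.get(w, 0.0) * c / total for w, c in counts.items())


def rate_review(aspect_tokens, sentiments, weights):
    """Return (per-aspect ratings, overall rating) for one segmented review.

    aspect_tokens: aspect name -> tokens assigned to that aspect
    sentiments:    aspect name -> {word: sentiment polarity} (assumed values)
    weights:       aspect name -> aspect weight, assumed to sum to 1
    """
    ratings = {a: aspect_rating(toks, sentiments[a]) for a, toks in aspect_tokens.items()}
    overall = sum(weights[a] * ratings[a] for a in ratings)
    return ratings, overall


# Illustrative inputs only; none of these values come from the thesis.
aspect_tokens = {
    "room": ["clean", "clean", "clean", "small"],
    "service": ["friendly"],
}
sentiments = {
    "room": {"clean": 5.0, "small": 2.0},
    "service": {"friendly": 5.0},
}
weights = {"room": 0.5, "service": 0.5}

ratings, overall = rate_review(aspect_tokens, sentiments, weights)
print(ratings, overall)
```

In the full model these quantities are latent: only the review text and the overall rating are observed, and an EM-like procedure alternates between estimating each review's aspect weights and updating the shared word sentiments.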

Cited By


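羅子修 (2017). 應用文字探勘技術於消費者產品使用狀況之研究-以手機遊戲線上評論為例 [A study applying text mining techniques to consumers' product usage: the case of online reviews of mobile games] [Master's thesis, Chung Yuan Christian University]. Airiti Library. https://doi.org/10.6840/cycu201700232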
