  • Thesis

A Semi-supervised Approach for Profiling Online Reviewers

Advisor: 魏志平 (Chih-Ping Wei)

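Abstract


In this era of information explosion, people can easily obtain all kinds of information. Online reviews are one important source of information and strongly influence users' decisions. Knowing more about reviewers' personal profiles would be very helpful to both consumers and e-commerce businesses. However, most online review websites do not provide this information because of personal privacy, so reviewers' personal information can only be inferred from the review content. The research field of "user profiling" specializes in extracting users' characteristics from text or other information, but the ground truth needed to train classifiers is usually hard to obtain. Many researchers therefore ask experts to annotate the data, but manual annotation is time-consuming and labor-intensive. In this paper, we propose a semi-supervised approach that aims to obtain ground-truth labels for training classifiers without relying on manual annotation. We conduct several sets of experiments to compare the performance of our approach with that of the traditional approach, and describe the phenomena we observed. We hope that our approach can one day be applied in practice to user profiling, saving researchers the time spent collecting ground truth so that they can focus on feature extraction and classifier training.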

English Abstract


In an age when anyone can easily access a wide variety of information, online reviews have become an important source that strongly influences people's decisions. Knowing reviewers' profiles helps both customers and online retailers in many ways. However, most online review websites do not provide reviewers' personal information because of privacy concerns, so the only available clue is the content of the reviews. The research field of "user profiling" focuses on extracting user-profile attributes from text corpora by training classifiers on labeled datasets. Nevertheless, gold-standard datasets are hard to obtain because ground truth is lacking. As a result, many researchers have asked experts to label datasets, but manual annotation is time-consuming and laborious. In this paper, we propose a semi-supervised approach that aims to obtain labeled datasets without manual annotation. We conduct experiments to evaluate the performance of our approach, compare it with the ideal performance, and describe our observations. We hope that our method can one day be applied in user profiling, helping researchers save time on collecting gold-standard datasets so that they can focus on feature extraction and classifier building.

