Title

線上薦證餐廳評論之分析與偵測

Translated Titles

Analysis and Detection of Online Paid Restaurant Reviews

DOI

10.6342/NTU201602005

Authors

高文俊

Key Words

opinion spam ; fake reviews ; blog ; sponsored posts ; paid writers ; testimonial advertising ; testimonial

Publication Name

Thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University

Volume or Term/Year and Month of Publication

2016

Academic Degree Category

Master's

Advisor

陳信希

Content Language

English

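Chinese Abstract (English translation)

In recent years, people have grown used to sharing their opinions and experiences online, and these opinions influence other people's decisions. In shopping, for example, most people consult others' reviews online before they buy. Unscrupulous businesses exploit fake reviews to manipulate online opinion, and the practice has spread to social networks and blogs. This thesis collects paid reviews on Pixnet to understand such marketing campaigns. We identify several characteristics of paid reviews and paid writers. Based on these observations, we propose a set of features for paid reviews and paid writers and detect both with supervised machine learning; the results show that the features are effective and that detection performance far exceeds random performance. Finally, a collective detection method based on Markov random fields integrates the detection of reviews and writers. This method exploits the relations between reviews and users, and it outperforms separate detection.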

English Abstract

In recent years, people have become accustomed to sharing their opinions and experiences on the Internet. These opinions greatly influence our decisions. For example, most people read online reviews before they make a purchase. Malicious companies and individuals use fake reviews to manipulate opinion on social media and blogs. In this thesis, we collect paid reviews on Pixnet to study this type of promotional campaign, and we identify several characteristics of paid reviews and paid writers. Based on these observations, we propose a set of features and detect paid reviews and writers using supervised machine learning techniques. Our results demonstrate the effectiveness of the features and outperform a random baseline by a large margin. Furthermore, we propose a collective detection method using Markov Random Fields that detects paid reviews and writers jointly. The collective method exploits the relations among review and user instances, and it outperforms detecting reviews and writers separately.

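Topic Category

Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Graduate Institute of Computer Science and Information Engineering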