Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering


Sam Weng; Chun-Kai Wu; Yu-Chun Wang; Richard Tzong-Han Tsai

Key Words

Question Retrieval; QR; Community-based Question and Answer; CQA



Volume or Term/Year and Month of Publication

Vol. 22, No. 2 (2017/12/01)

Page #

17 - 29

Content Language


Abstract

In recent years, community-based question and answer (CQA) sites have grown rapidly in number and size. These sites are a valuable source of online knowledge, but they often suffer from duplicate questions. The task of question retrieval (QR) is to find previously answered, semantically similar questions in CQA archives. However, synonymous lexical variation (different words used to express the same meaning) poses a major challenge for question retrieval. Some QR approaches address this issue by estimating the probability of correlation between new questions and archived questions, and much recent research has also focused on surface string similarity among questions. In this paper, we propose a method that first builds a continuous bag-of-words (CBoW) model from data on Asus's Republic of Gamers (ROG) forum and then determines the similarity between a given new question and the Q&As in our database. Unlike most other methods, we compute the similarity between the new question and the archived questions, and between the new question and the archived descriptions, separately, as two different features. In addition, we factor user reputation into our ranking model. Our experimental results on the ROG forum dataset show that our CBoW model with reputation features outperforms other leading methods.
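The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes that each text is represented by the average of its CBoW word vectors, that the two similarity features are cosine similarities between the new question and (a) the archived question and (b) its description, and that user reputation is a value already normalized to [0, 1] and combined with the similarities by a fixed linear weighting. The toy embeddings, the weights `w_question`, `w_desc`, `w_rep`, and the helper names are all hypothetical; in practice the vectors would come from a CBoW model trained on the forum data, and the paper's actual aggregation and ranking model may differ.

```python
import math

# Hypothetical toy vectors standing in for embeddings that a CBoW model
# trained on forum posts would produce; values are made up for illustration.
EMBEDDINGS = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.85, 0.15, 0.05],
    "overheating": [0.1, 0.9, 0.1],
    "hot": [0.15, 0.85, 0.2],
    "fan": [0.2, 0.7, 0.3],
    "driver": [0.0, 0.2, 0.9],
    "install": [0.05, 0.1, 0.95],
}


def text_vector(tokens, embeddings=EMBEDDINGS):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score(query_tokens, item, w_question=0.5, w_desc=0.3, w_rep=0.2):
    """Hypothetical linear ranking score: two separate similarity features
    (question, description) plus a user reputation assumed to lie in [0, 1]."""
    q = text_vector(query_tokens)
    sim_question = cosine(q, text_vector(item["question"]))
    sim_desc = cosine(q, text_vector(item["description"]))
    return w_question * sim_question + w_desc * sim_desc + w_rep * item["reputation"]


def rank(query_tokens, archive):
    """Return archived items sorted from most to least relevant."""
    return sorted(archive, key=lambda item: score(query_tokens, item), reverse=True)


# Hypothetical archive entries (pre-tokenized) with normalized reputation.
ARCHIVE = [
    {"id": "q1", "question": ["notebook", "hot"],
     "description": ["fan", "overheating"], "reputation": 0.8},
    {"id": "q2", "question": ["driver", "install"],
     "description": ["install", "driver"], "reputation": 0.9},
]

if __name__ == "__main__":
    for item in rank(["laptop", "overheating"], ARCHIVE):
        print(item["id"], round(score(["laptop", "overheating"], item), 3))
```

For the query "laptop overheating", the overheating thread `q1` ranks above the driver thread `q2`, even though it shares no exact words with the query and its user has slightly lower reputation; the embedding similarity bridges "laptop"/"notebook" and "overheating"/"hot". Keeping the question and description similarities as separate features, rather than concatenating the two texts, lets a ranker weight a short, focused question differently from a longer, noisier description; a real system would presumably tune or learn these weights rather than fix them by hand as done here.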

Topic Category

Humanities > Library and Information Science
Basic and Applied Sciences > Information Science
Engineering > Electrical Engineering