Word Co-occurrence Augmented Topic Model in Short Text

Abstract


The large volume of text on the Internet makes it hard for people to grasp its meaning in a limited time. Topic models (e.g., LDA and PLSA) have been proposed to summarize long texts into a handful of topic terms. In recent years, short-text media such as tweets have become very popular. However, directly applying a traditional topic model to a short-text corpus usually yields incoherent topics, because a short document does not contain enough words to reveal word co-occurrence patterns. The biterm topic model (BTM) has been proposed to address this problem. However, BTM considers only simple biterm frequency, so the generated topics are dominated by common words. In this paper, we address the problem of frequent biterms in BTM by proposing an improved word co-occurrence method that enhances topic models, and we apply this word co-occurrence information to BTM. The experimental results show that our PMI-β-BTM performs well on both regular short news titles and noisy tweets. Moreover, our method has two advantages: it needs no external data, and it builds on the original topic model without modifying the model itself, so it can easily be applied to other existing BTM-based models.
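The abstract does not give the formula behind PMI-β-BTM, so the following is only a minimal sketch of the general idea it points to: scoring biterms by pointwise mutual information (PMI) rather than by raw frequency, so that pairs made of common words stop dominating. The function names `biterms` and `pmi_scores`, the toy corpus, and the document-level probability estimates are all assumptions made for illustration; how the paper feeds such scores into BTM is not described here.

```python
import math
from collections import Counter
from itertools import combinations


def biterms(doc):
    """Return the unordered word pairs (biterms) of one short document.

    Simplification (assumption): each distinct word is counted once per
    document, and each pair is stored in alphabetical order.
    """
    words = sorted(set(doc))
    return list(combinations(words, 2))


def pmi_scores(docs):
    """Compute document-level PMI for every biterm seen in the corpus.

    PMI(w1, w2) = log( P(w1, w2) / (P(w1) * P(w2)) ), where each
    probability is estimated as a fraction of documents.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each biterm
    for doc in docs:
        word_df.update(set(doc))
        pair_df.update(biterms(doc))

    scores = {}
    for (w1, w2), df in pair_df.items():
        p_pair = df / n_docs
        p_w1 = word_df[w1] / n_docs
        p_w2 = word_df[w2] / n_docs
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus of tokenized short texts (hypothetical data).
docs = [
    ["apple", "iphone", "new"],
    ["apple", "iphone", "launch"],
    ["new", "movie", "launch"],
    ["new", "movie", "review"],
]
scores = pmi_scores(docs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy corpus, "new" appears in three of the four documents, so the pair ("apple", "new") receives a negative PMI, while ("apple", "iphone"), which always co-occur, receives a positive one. A purely frequency-based count would not separate the two as sharply.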

References


Hofmann, T. (1999). Probabilistic latent semantic analysis. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence.
Blei, D. M., Ng, A. Y., Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
Divya, M., Thendral, K., Chitrakala, S. (2013). A survey on topic modeling. International Journal of Recent Advances in Engineering & Technology, 1, 57-61.
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., McCallum, A. (2011). Optimizing semantic coherence in topic models. Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Yan, X., Guo, J., Lan, Y., Cheng, X. (2013). A biterm topic model for short texts. Proceedings of the 22nd International Conference on World Wide Web.
