
A Pretrained YouTuber Embeddings for Improving Sentiment Classification of YouTube Comments


Technology is changing the way we consume information and entertainment. YouTube's streaming video service provides a comment function that lets video publishers learn what matters most to the people they want to win over to their brand. Through comments, video publishers can better understand their audience's thoughts and improve the quality of their videos. We propose classifiers based on machine learning and BERT to automatically detect YouTuber preference, video preference, and excitement level. To improve model performance, we use pretrained YouTuber embeddings, trained in advance on roughly 175,000 video comments that contain YouTubers' names. YouTuber embeddings can capture some of the semantics and characteristics of the relations between YouTubers. Experimental results show that machine learning-based models with YouTuber embeddings achieve higher overall accuracy and F1-score on all sentiment classification tasks. This result validates that YouTuber embedding training is significantly helpful for detecting audience sentiment towards YouTubers. In contrast, the BERT model does not handle the polarity classification tasks well when using YouTuber embeddings. However, the BERT architecture is better suited to multi-dimensional classification tasks, such as the five-label classification task used in this study. In conclusion, sentiment detection on YouTube can be improved by the proposed multi-dimensional sentiment indicators and by our modifications to the classifier structure.
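
The abstract does not state how the YouTuber embeddings are fed to the classifiers. As an illustration only, the sketch below assumes the simplest option: the embedding of each YouTuber named in a comment is averaged and concatenated with ordinary text features before classification. The embedding values, the vocabulary, and all function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical toy embeddings; the paper trains its embeddings on
# roughly 175,000 comments, and the real vectors are not given here.
YOUTUBER_EMBEDDINGS: Dict[str, List[float]] = {
    "youtuber_a": [0.9, 0.1, 0.0],
    "youtuber_b": [0.8, 0.2, 0.1],
}
EMBEDDING_DIM = 3

# Hypothetical sentiment vocabulary used for bag-of-words counts.
VOCABULARY = ["love", "great", "boring", "bad"]


def text_features(comment: str) -> List[float]:
    """Count how often each vocabulary word occurs in the comment."""
    tokens = comment.lower().split()
    return [float(tokens.count(word)) for word in VOCABULARY]


def youtuber_features(comment: str) -> List[float]:
    """Average the embeddings of all known YouTubers named in the comment."""
    tokens = comment.lower().split()
    mentioned = [YOUTUBER_EMBEDDINGS[t] for t in tokens if t in YOUTUBER_EMBEDDINGS]
    if not mentioned:
        # No known YouTuber is named: fall back to a zero vector.
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(mentioned) for column in zip(*mentioned)]


def build_features(comment: str) -> List[float]:
    """Concatenate text features with YouTuber features (assumed design)."""
    return text_features(comment) + youtuber_features(comment)
```

The resulting vector could then be passed to any standard classifier. Concatenation is only one possible design; the paper's actual classifier structure may differ.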


