Discovering the Latent Writing Style from Articles: A Contextualized Feature Extraction Approach

With the growth of the Internet, the ease of accessing and generating online information has made it difficult to determine how accurate or truthful that information is. Information is generated too quickly for manual filtering to be feasible; hence, there is a need for mechanisms that automatically recognize and filter unreliable data. This research aimed to create a method for distinguishing vendor-sponsored reviews from customer product reviews using real-world online forum datasets. However, the lack of labeled sponsored reviews makes end-to-end training difficult, and many existing approaches rely on lexicon-based features that can be easily manipulated by substituting words. To avoid this word manipulation, we derived a graph-based method for extracting latent writing style patterns. This work thus proposes a Contextualized Affect Representation for Implicit Style Recognition framework, namely CARISR. A transfer learning architecture was also adapted to improve the model's learning from weakly labeled data. In comprehensive experiments using the limited available data, the proposed approach recognized sponsored reviews with 70% accuracy.

Keywords

Reliability; Transfer Learning; Writing Style; Text Classification; Natural Language Processing

