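With the development of technology worldwide, communication through online social media has grown rapidly. The hate speech that has come with it, however, has gradually become an issue that cannot be ignored. In this study, we examine offensive language across different corpora, phrases, and slang in order to identify descriptive linguistic patterns that can identify hate speech on Twitter. Studying hate speech is not easy: hate speech can change over time and with the seasons, whereas the descriptive patterns used in hate speech tend to stabilize. Our results show that although hate speech changes with language and word choice, its basic core concepts do not. Accordingly, we propose a new framework that predicts potential hate speech from the linguistic patterns in the content posted by Twitter users.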
Amid the rise of technology, communication via social media platforms has grown rapidly. With more users working and communicating solely online, ensuring that hate speech is flagged properly is of the utmost importance. In this thesis, we attempt to identify descriptive linguistic patterns that indicate the presence of hate speech in a tweet. Current research in the hate speech domain focuses on identifying hate speech dictionaries, code words, slang, and offensive language associated with hate. However, keeping up with the new words used to express hate is a daunting task. Hate speech vocabulary can be seasonal and changes over time, but the descriptive patterns commonly used in hate speech are more resilient. We present a novel approach demonstrating that although hate words change over time, the intensity patterns tend to remain the same. We propose a framework that predicts such cases by identifying linguistic patterns in user-generated Twitter content that lead to hate speech.