The main goal of this thesis is to explore the ability of large language models (such as GPT-4o) to detect the emotions of emojis in social media, and to evaluate whether their understanding of emoji emotions aligns with that of human annotators. Previous research has shown that large language models have achieved notable success in understanding the semantics, pragmatics, sentiment, and usage intentions of emojis; however, research specifically targeting emoji emotion detection remains scarce. This study uses a dataset of Instagram posts, exploits repeated occurrences of the same emoji to distinguish differences in emotional intensity, and leverages the inherent positional structure of Plutchik's wheel of emotions to calculate agreement. The performance of GPT-4o and human annotators on the emotion annotation task was compared to assess the feasibility of the model replacing humans in emotion detection. Descriptive statistical analysis was used to examine the distribution and variability of the emotion detection results, GPT-4o's precision, recall, and accuracy on different emojis were calculated to evaluate its effectiveness, and a misclassification analysis was conducted. The results show that both human annotators and GPT-4o performed well at identifying positive emotions. While human annotators exhibited a broader range of secondary emotions, GPT-4o was more consistent and focused. The visual appearance of an emoji itself also influenced the model's emotion detection judgments. These findings indicate that GPT-4o's performance on emotion detection tasks is comparable to that of human annotators, but it still falls short in handling subtle emotional nuances. This study provides an important reference for the future development of social media emotion analysis technologies.
The main goal of this thesis is to explore the emotion detection capabilities of large language models, such as GPT-4o, on emojis in social media and to evaluate whether their understanding of emoji emotions aligns with that of human annotators. Previous research has shown that large language models achieve significant success in understanding the semantics, pragmatics, sentiment, and user intentions of emojis; however, research specifically focused on emoji emotion detection remains scarce. This study aims to fill that gap by using Instagram text datasets and proposing a new annotation scheme based on repeated-emoji features and Plutchik's model. The method introduces a way to calculate agreement using the inherent positional structure of Plutchik's wheel of emotions. The performance of GPT-4o and human annotators on emotion annotation tasks was compared to evaluate the feasibility of the model replacing humans in emotion detection. Descriptive statistical analysis provided insight into the distribution and variability of the emotion detection results. In addition, GPT-4o's precision, recall, and accuracy were calculated to assess its effectiveness across different emojis, and a misclassification analysis was conducted to identify the causes of its classification errors. The results show strong agreement between human annotators and GPT-4o in identifying positive emotions. While human annotators exhibited a broader range of secondary emotions, GPT-4o was more consistent and focused. The visual appearance of an emoji also influenced the model's emotion detection judgments. These findings indicate that GPT-4o's performance on some emotion detection tasks is comparable to that of human annotators, but it still falls short in handling subtler emotional distinctions. This study provides an important reference for the development of future social media emotion analysis technologies.
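The abstract does not spell out how agreement is computed from positions on Plutchik's wheel. One plausible reading — scoring a pair of labels by their step distance around the eight-emotion circle, with partial credit for adjacent emotions — can be sketched as follows. The function names, the linear decay of the score, and the exact wheel ordering used here are illustrative assumptions, not the thesis's actual formula:

```python
# Hypothetical sketch of positional agreement on Plutchik's wheel.
# The eight primary emotions sit at fixed positions on a circle, so two
# labels can be compared by step distance instead of exact match only.
PLUTCHIK_WHEEL = ["joy", "trust", "fear", "surprise",
                  "sadness", "disgust", "anger", "anticipation"]

def wheel_distance(a, b):
    """Smallest number of steps between two emotions on the wheel (0-4)."""
    i, j = PLUTCHIK_WHEEL.index(a), PLUTCHIK_WHEEL.index(b)
    d = abs(i - j)
    return min(d, len(PLUTCHIK_WHEEL) - d)

def positional_agreement(a, b, max_dist=4):
    """Agreement in [0, 1]: 1.0 for identical labels, decreasing
    linearly to 0.0 for opposite emotions (assumed weighting)."""
    return 1.0 - wheel_distance(a, b) / max_dist
```

Under this scheme, identical labels score 1.0, adjacent emotions such as joy/trust score 0.75, and opposites such as joy/sadness score 0.0, which lets near-miss annotations count as partial agreement rather than outright disagreement.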
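The per-emoji precision, recall, and accuracy mentioned above follow from standard one-vs-rest counts of true/false positives and negatives. A minimal self-contained sketch (the function name and binary framing are assumptions; the thesis's exact evaluation setup is not specified here):

```python
def per_label_metrics(gold, pred, label):
    """Precision, recall, and accuracy for one emotion label,
    treating it as a binary task against all other labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    tn = len(gold) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(gold)
    return precision, recall, accuracy
```

Applying this per emoji to the human labels (as gold) and the model's labels (as predictions) yields the per-emoji scores described above, and the false-positive/false-negative cases it identifies are the natural input to a misclassification analysis.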