  • Conference paper

PTT推文之詞組規則研究

Study on the phrase rule of PTT push messages

Abstract

With the arrival of the big data era, mining the deeper meaning behind unstructured data has become increasingly important, alongside the analysis of large volumes of structured data. PTT is one of Taiwan's most representative large online forums. Because it has many users and does not require them to disclose much personal information, heated exchanges of opposing opinions on its discussion topics can be seen every day, which should make PTT a rich resource for text mining and sentiment analysis. Several studies have already applied text mining techniques to analyze online sentiment on PTT. A review of these studies shows, however, that the push messages attached to PTT articles are either skipped entirely or measured only with simple indicators such as the number of push messages, because of technical problems such as their irregular ordering, very short sentences, and use of Internet slang. This study identifies the reasons push messages are difficult to analyze and recommends merging the push messages written by the same author, building a database of Internet slang, and taking into account strong interjections at the beginning of a push message. It also proposes two phrase rules designed specifically for push messages.
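The first recommendation, merging push messages written by the same author, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `(tag, author, text)` tuple layout, the function name `merge_pushes_by_author`, and the sample data are all assumptions made for illustration, and the sketch simply joins each author's messages in order of appearance.

```python
def merge_pushes_by_author(pushes):
    """Join each author's push messages in their order of appearance.

    `pushes` is an iterable of (tag, author, text) tuples. This layout is
    an assumption for illustration, not a format defined by the paper.
    """
    grouped = {}
    for _tag, author, text in pushes:
        # dict preserves insertion order, so authors stay in first-seen order
        grouped.setdefault(author, []).append(text.strip())
    return {author: " ".join(parts) for author, parts in grouped.items()}


# Hypothetical sample: user_a's comment is split across two push messages,
# with another user's message in between.
pushes = [
    ("推", "user_a", "I agree with the first point"),
    ("噓", "user_b", "not convinced"),
    ("→", "user_a", "but the second one needs a source"),
]
merged = merge_pushes_by_author(pushes)
```

Merging in this way turns fragments that are too short to analyze on their own into one longer text per author. The sketch ignores the push tag; a fuller version might keep it as an additional sentiment signal.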

Keywords

Text mining; Semantic analysis; Social media; PTT
