Thesis

運用文字探勘技術分析社群網站之半結構化資料

Applying Text Mining Technology for the Analysis of Semi-structured Data in the Social Network Sites

Advisor: 吳昌憲

Abstract


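Technology has advanced steadily in recent years and networks have become ubiquitous, so the volume of data has grown accordingly. With the rise of "big data" as a popular topic, both industry and academia have begun to follow developments in this field. Related research covers government open data (Open Data), social networking sites, news, and other sources, and it has produced many successful applications such as weather forecasting and stock market trend analysis. Social networking sites have drawn particular attention. In the 2014 Taipei mayoral election, for example, analysts examined the discussions of social network users to learn which issues those users cared about. Candidates could then formulate campaign strategies and consolidate their electoral position.

The discussions on these social networking sites hold many possibilities and can be processed with suitable analysis tools. The main drawback of traditional market surveys is cost. Both time and money can limit a study and prevent it from reaching the expected results. Automated computer processing can make the analysis more precise and faster while also addressing the cost problem, and it is better suited to the information age. Compared with traditional market surveys, its results can also reflect the actual situation more closely.

This study therefore collects users' discussion messages from social networking sites and uses natural language processing tools to segment the text into words. It then analyzes the weights of these terms to derive keywords. Next, Latent Semantic Analysis (LSA) compares the relevance between terms and documents to obtain their latent semantic associations. These associations capture the meaning of each term and are used to retrieve all potentially related articles. This approach addresses the shortcomings of keyword-only search, improves search results, and ultimately achieves the goal of text mining.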
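
The abstract says that term weights are analyzed to derive keywords, but it does not name the weighting scheme. The sketch below is for illustration only. It assumes TF-IDF weighting over documents that have already been segmented into words. The segmentation tool is not specified here, and the function names `tfidf_weights` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return a {term: weight} dict for each pre-segmented document.

    tf  = count of the term in the document / document length
    idf = log(N / df), N = number of documents, df = documents containing the term
    (TF-IDF is an assumed weighting scheme, not one confirmed by the thesis.)
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:  # empty document: nothing to weight
            weights.append({})
            continue
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(docs: list[list[str]], k: int = 3) -> list[str]:
    """Return the k terms with the highest TF-IDF weight in any single document.

    Ties are broken alphabetically so the output is deterministic.
    """
    best: dict[str, float] = {}
    for doc_weights in tfidf_weights(docs):
        for term, weight in doc_weights.items():
            best[term] = max(best.get(term, weight), weight)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

Take three toy documents: `["mrt", "fare", "increase"]`, `["mayor", "mrt", "policy"]`, and `["mayor", "election", "debate"]`. Terms shared by two documents ("mrt", "mayor") receive lower weights than terms unique to one document. A term that appears in every document gets an IDF of zero, which is the usual reason to prefer TF-IDF over raw counts for keyword extraction.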

Parallel Abstract


In recent years, technology has advanced, networks have become more developed, and data has grown rapidly as a result. "Big data" has become a hot topic that draws the attention of both industry and academia. Research on government open data, social networking sites, and news has produced several successful applications, such as weather forecasting and stock market trend analysis. In the 2014 Taipei mayoral election, the content of user discussions on social networking sites was analyzed to help candidates develop campaign strategies. These online discussions offer many opportunities, and suitable analysis tools can be used to process them. In the past, the biggest problem with traditional market surveys was the cost in time and money, which limited research and affected the desired results. Automated processing by computer might resolve these problems. Compared with traditional market surveys, its results can be expected to reflect the real situation more closely and to be more accurate and efficient. This study therefore collects discussions from social networking sites, uses natural language processing tools to perform word segmentation, and extracts the keywords. Latent Semantic Analysis is then applied to compare the correlation between words and documents and obtain their latent semantic connections. All documents associated with these words are retrieved, which improves the search results and ultimately achieves the goal of text mining.
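
Here is a minimal sketch of the LSA step. The abstract states none of the following choices, so each is an assumption: raw term counts fill the term-document matrix, the rank `k` is chosen by hand, and cosine similarity ranks the documents. `lsa_scores`, `lsa_search`, and `term_document_matrix` are hypothetical names. Documents and the query are projected onto the same `k` left singular vectors of the matrix. As a result, a document can score well without containing the query term.

```python
import numpy as np


def term_document_matrix(docs):
    """Build a term-document count matrix: rows are terms, columns are documents."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            A[index[term], j] += 1.0
    return A, index


def lsa_scores(docs, query, k=2):
    """Cosine similarity between the query and each document in a rank-k LSA space."""
    A, index = term_document_matrix(docs)
    # Thin SVD A = U diag(s) Vt; keep the k largest singular directions.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Project documents and query onto the same k latent axes.
    # For documents, A.T @ U_k equals V_k diag(s_k).
    doc_vecs = A.T @ U_k
    q = np.zeros(A.shape[0])
    for term in query:
        if term in index:  # terms outside the vocabulary are ignored
            q[index[term]] += 1.0
    q_vec = q @ U_k
    dots = doc_vecs @ q_vec
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    # Return 0 instead of dividing by zero when a vector projects to the origin.
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 1e-12)


def lsa_search(docs, query, k=2):
    """Return document indices ordered from most to least related to the query."""
    scores = lsa_scores(docs, query, k)
    return [int(i) for i in np.argsort(-scores, kind="stable")]
```

Take the toy corpus `[["mayor", "election"], ["mayor", "campaign"], ["campaign", "strategy"], ["weather", "forecast", "rain"]]` with `k=3`. The query `["election"]` ranks document 1 second with a positive score, even though that document never mentions "election", because it shares "mayor" with document 0. A plain keyword match would score document 1 as zero. This illustrates the keyword-search shortcoming that the abstract describes.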

