Mining Customer Knowledge from Social Media: Development of an Opinion Mining Technique

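The spread of information technology and the Internet has driven the rapid growth of many emerging applications, and large volumes of diverse data are accumulating quickly. The concept of big data analytics arose from the need to mine interesting knowledge from such data effectively. Opinion mining is a core big data analytics technique: its goal is to analyze, from large volumes of user-generated content, users' subjective views (for example, opinions, sentiments, and evaluations) on entities of interest (for example, products and services), and to summarize this information into structured customer knowledge. This study focuses on the opinion sentence identification task within opinion mining. To reduce the substantial labor and time that conventional supervised learning requires for preparing training data, this study proposes an approach in which the user supplies only a small number of keywords. Together with unannotated user-generated content crawled from social media, these keywords enable semi-supervised learning that yields mining results comparable to, or even better than, those of supervised learning. Specifically, this study adopts a class association rule algorithm, paired with a semi-supervised learning method designed in this study, to propose a rule-based opinion sentence identification (R-OSI) technique. According to the experimental evaluation, the R-OSI technique achieves performance close to, or even better than, that of supervised methods.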

Keywords

opinion sentence identification; opinion mining; user-generated content; social media analytics; big data analytics

Parallel Abstract

With the popularization of information and network technology, many emerging and interesting applications have been developed. The volume and variety of data are growing rapidly, and these data are considered vital assets for supporting crucial business intelligence applications. To better manage and use such valuable data, big data analytics has become a crucial research issue; it is the process of examining large datasets containing a variety of data types to uncover hidden, previously unknown, and potentially useful patterns and knowledge. In this study, we concentrate on an important big data analytics task, namely opinion mining. We propose a rule-based opinion sentence identification (R-OSI) technique, which retrieves, from a large volume of consumer reviews, the review sentences relevant to a specific product feature of interest. The novelty of the proposed technique is that it adopts a semi-supervised learning approach in which a user provides keywords that describe the target product feature. In addition, a set of unannotated consumer reviews is retrieved from various social media websites. On the basis of the user-provided keywords and the set of unannotated consumer reviews, the class association rule mining algorithm is applied to learn a set of opinion sentence identification rules for the target product feature. Our empirical evaluation results suggest that the proposed R-OSI technique achieves promising performance in opinion sentence identification, even when a supervised learning approach is adopted as the performance benchmark.
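The general idea described above (seed keywords plus unannotated sentences feeding class association rule mining) can be illustrated with a minimal sketch. This is not the R-OSI technique itself: the abstract does not give its details, so everything here is an assumption made for illustration, including the function names `pseudo_label`, `mine_class_rules`, and `is_relevant`, the whitespace tokenizer, the toy reviews, the thresholds, and the choice of brute-force itemset enumeration in place of an Apriori-style search. Sentences containing a seed keyword are treated as pseudo-positive, and word sets that co-occur with them often enough become rules of the form "itemset implies relevant".

```python
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    """Naive whitespace tokenizer (an assumption; real text needs more care)."""
    return set(sentence.lower().split())


def pseudo_label(sentences, seeds):
    """Mark a sentence as relevant when it contains any seed keyword."""
    return [(tokenize(s), bool(tokenize(s) & seeds)) for s in sentences]


def mine_class_rules(labeled, seeds, min_support=2, min_confidence=0.8, max_len=2):
    """Learn rules 'itemset -> relevant' by brute-force enumeration.

    Support is the number of pseudo-positive sentences containing the itemset;
    confidence is that count divided by all sentences containing the itemset.
    """
    total, positive = Counter(), Counter()
    for tokens, relevant in labeled:
        # Seed words are excluded so that rules capture other vocabulary.
        candidates = sorted(tokens - seeds)
        for size in range(1, max_len + 1):
            for items in combinations(candidates, size):
                key = frozenset(items)
                total[key] += 1
                if relevant:
                    positive[key] += 1
    return {
        items: count / total[items]
        for items, count in positive.items()
        if count >= min_support and count / total[items] >= min_confidence
    }


def is_relevant(sentence, seeds, rules):
    """A sentence matches if it has a seed word or satisfies any learned rule."""
    tokens = tokenize(sentence)
    return bool(tokens & seeds) or any(items <= tokens for items in rules)


# Hypothetical toy data: the target product feature is battery life.
reviews = [
    "the battery lasts all day",
    "battery drains fast while charging",
    "charging takes hours and the battery drains",
    "the screen is bright",
    "the screen cracked",
    "it drains fast",
]
seeds = {"battery"}

rules = mine_class_rules(pseudo_label(reviews, seeds), seeds)
for items, confidence in rules.items():
    print(sorted(items), "-> relevant, confidence", confidence)
print(is_relevant("charging is slow", seeds, rules))
```

In this sketch a sentence with no seed keyword can still be retrieved when it satisfies a learned rule, which is the practical benefit of learning rules rather than matching keywords alone. A single labeling pass is used here; whether the published method iterates or filters rules differently is not stated in the abstract.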

Parallel Keywords

opinion sentence identification; opinion mining; user-generated content; social media analytics; big data analytics


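Cited By

張益誠、張育傑、余泰毅 (2021). Exploring automatic document classification techniques for environmental education papers: The case of abstracts from the 2013-2018 environmental education conferences. 環境教育研究, 17(1), 85-128. https://doi.org/10.6555/JEER.17.1.085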
