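Many dictionary-based opinion analysis studies use articles from different domains as their research corpora. However, the opinion dictionaries commonly used in academic research, such as HowNet and NTUSD, target general Chinese text and lack domain-specific opinion words. As a result, many researchers working with domain-specific corpora expand the opinion vocabulary manually. Although the resulting words are highly accurate, raising opinion word coverage this way demands considerable labor and is inefficient. This study builds a prototype vocabulary expansion system based on part-of-speech (POS) combinations to implement an opinion word expansion process, using product reviews from the food and cosmetics domains as the research corpus. The system extracts candidate domain opinion words through POS combinations, clusters them by word similarity, and then filters out invalid candidates using representative words computed for the target domain. Finally, to determine polarity, the study selects positive and negative sentiment seed words from HowNet and NTUSD and performs a corpus-based SO-PMI computation weighted by the average distance between each opinion word and the seed words. This decides the positive or negative sentiment orientation of each candidate and yields a domain opinion dictionary. The coverage and document sentiment classification experiments show that combining HowNet and NTUSD with the expanded opinion words raises opinion word coverage by about 11% and classification accuracy by about 5% in the food domain, and raises coverage by about 17% and accuracy by about 6% in the cosmetics domain. Taken together, the results show that higher opinion word coverage has a positive effect on opinion analysis. We hope this work can improve vocabulary coverage and the effectiveness of opinion analysis when domain-specific articles are used as the research corpus.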
Many dictionary-based opinion analysis studies draw on articles from diverse domains. However, few Chinese opinion dictionaries cover specific domains, so many researchers expand Chinese opinion dictionaries manually. Although manual expansion achieves high accuracy, raising coverage this way is costly and inefficient. This study develops a prototype based on part-of-speech (POS) combinations to expand opinion words, using reviews from iPeen, TripAdvisor, Yelp, UrCosme, and FashionGuide. We extract candidate domain opinion words by their POS combinations and use the average distance between opinion words and seed words as a weight in the SO-PMI calculation. The result determines each candidate's sentiment orientation and generates the domain opinion words. We expect this work to improve coverage and opinion analysis for articles from specific domains.
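The distance-weighted SO-PMI step can be sketched as follows. This is a minimal illustration and not the study's implementation: the abstract does not give the exact formula, so the document-level co-occurrence counts, the weight `1 / (1 + average token distance)`, the zero score for seeds that never co-occur with the candidate, and the function name `so_pmi` are all assumptions made here.

```python
import math
from collections import Counter
from itertools import product


def so_pmi(candidate, pos_seeds, neg_seeds, docs):
    """Distance-weighted SO-PMI of `candidate` (hypothetical sketch).

    docs is a list of tokenized reviews (lists of strings).
    A positive score suggests positive orientation, a negative score
    suggests negative orientation.
    """
    n_docs = len(docs)
    seeds = set(pos_seeds) | set(neg_seeds)
    df = Counter()        # number of documents containing each word
    co = Counter()        # number of documents containing candidate and seed
    dist_sum = Counter()  # sum of per-document mean candidate-seed distances

    for tokens in docs:
        positions = {}
        for i, tok in enumerate(tokens):
            positions.setdefault(tok, []).append(i)
        for word in positions:
            df[word] += 1
        if candidate not in positions:
            continue
        for seed in seeds & positions.keys():
            gaps = [abs(i - j)
                    for i, j in product(positions[candidate], positions[seed])]
            co[seed] += 1
            dist_sum[seed] += sum(gaps) / len(gaps)

    def weighted_pmi(seed):
        # Assumption: a seed that never co-occurs contributes nothing.
        if co[seed] == 0:
            return 0.0
        # PMI = log2( P(candidate, seed) / (P(candidate) * P(seed)) )
        pmi = math.log2(co[seed] * n_docs / (df[candidate] * df[seed]))
        avg_dist = dist_sum[seed] / co[seed]
        # Assumption: seeds that appear closer to the candidate weigh more.
        return pmi / (1.0 + avg_dist)

    positive = sum(weighted_pmi(s) for s in pos_seeds)
    negative = sum(weighted_pmi(s) for s in neg_seeds)
    return positive - negative


# Toy corpus with English tokens, for illustration only.
reviews = [
    ["crispy", "good"],
    ["crispy", "very", "good"],
    ["soggy", "bad"],
    ["bad", "service"],
]
crispy_score = so_pmi("crispy", ["good"], ["bad"], reviews)
soggy_score = so_pmi("soggy", ["good"], ["bad"], reviews)
```

In this toy corpus `crispy_score` is positive and `soggy_score` is negative, so the two candidates would enter the domain dictionary with opposite polarities. A real corpus would need Chinese word segmentation and POS tagging first, and the seed lists would come from HowNet and NTUSD as the abstract describes.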