
意見分析之研究與應用

A Study on Opinion Analysis and its Applications

Advisor: Hsin-Hsi Chen (陳信希)

Abstract


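The main goal of opinion analysis is to use computers to analyze the opinions that people express. This topic has become increasingly important in the age of information explosion. The primary task of opinion analysis is to mine and extract opinions, then organize and apply them so that users ultimately obtain useful information.

To reach this goal, we must first design algorithms for mining opinions. Developing such algorithms requires finding useful knowledge for them to learn from and preparing a sound evaluation environment so that the algorithms can be improved appropriately during development. We prepared an opinion dictionary for the opinion mining algorithms and collected news and blog articles as their test set. Because opinions are inherently uncertain, we also developed a method for setting the evaluation standard from the answers provided by multiple annotators.

The algorithms we propose approach the problem from two different perspectives: micro and macro. They look for opinions at three levels: words, sentences, and documents. In the macro algorithms, we are concerned with the opinion polarity of each component, and we believe that the polarities of the components determine the polarity of the whole: the polarity of a word is determined by its characters, the polarity of a sentence by its words, the polarity of a document by its sentences, and so on. In the micro algorithms, we start from a complete message and consider the relations and mutual influence among its components in order to determine the polarity of the whole message. To account for how the components influence one another, we introduce information such as word morphology and sentence structure. We expect the micro algorithms to compensate for the shortcomings of the macro algorithms.

Having developed opinion mining algorithms that can find opinion information, we propose several possible applications, including opinion summarization, opinion tracking, opinion question answering, and opinion-based relationship discovery. Opinion summarization and opinion tracking offer different ways of organizing opinions and presenting them to users. Opinion summarization provides an overall opinion expressed in text and quantitative data, whereas opinion tracking provides a trend chart showing how positive and negative comments rise and fall over time. Opinion question answering goes further by using the mined opinions as background knowledge; unlike traditional factual question answering systems, it can answer opinion-related questions. Opinion-based relationship discovery moves beyond the usual definition of "having some relation" and directly finds pairs that influence each other. These are all distinctive and practical applications.

Our experiments show that both the algorithms and the proposed applications perform satisfactorily. On this basis we developed a Chinese opinion analysis system, CopeOpi, which is the first Chinese opinion analysis system. Even compared with opinion analysis systems developed abroad, our system provides a considerable number of functions, useful information, and efficient ways of organizing data, and it is one of the most advanced opinion analysis systems currently available.

After obtaining the experimental results and research findings for Chinese, we extended the scope of the experiments from monolingual to multilingual. We start from two major international languages, English and Japanese, use corpora from the NTCIR international evaluation, and discuss how structural information performs in different languages and how machine translation performance affects opinion analysis. Integrating and analyzing human opinions quickly and accurately is our ultimate goal. We hope that our research results on monolingual and multilingual opinion analysis can provide a solid research foundation.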
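The macro view in the abstract, where the polarity of a whole is composed from the polarities of its parts, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the tiny character lexicon `CHAR_SCORES`, the averaging and summing rules, and the function names are hypothetical and are not taken from the dissertation or from its opinion dictionary.

```python
# Hypothetical toy lexicon: character -> sentiment score.
# These entries are illustrative only, not drawn from the thesis's dictionary.
CHAR_SCORES = {"好": 1.0, "美": 1.0, "壞": -1.0, "惡": -1.0}


def word_score(word, char_scores=CHAR_SCORES):
    """Score a word as the mean score of its characters (unknown characters count as 0)."""
    if not word:
        return 0.0
    return sum(char_scores.get(ch, 0.0) for ch in word) / len(word)


def sentence_score(words):
    """Score a sentence (a list of words) as the sum of its word scores."""
    return sum(word_score(w) for w in words)


def document_score(sentences):
    """Score a document (a list of sentences) as the sum of its sentence scores."""
    return sum(sentence_score(s) for s in sentences)


def polarity(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    doc = [["美好", "天"], ["好"]]
    print(polarity(document_score(doc)))
```

The same bottom-up composition is applied at every level, which is what distinguishes this sketch of the macro view from the micro view, where relations among the components would also be considered.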

Parallel Abstract


Opinion analysis consists of two main parts: opinion mining and its applications. Opinion mining identifies opinion holders, extracts the relevant opinion sentences, and decides their polarity. We first build a Chinese opinion dictionary, NTUSD, for mining opinions. Moreover, since there are no commonly applied methods for creating evaluation corpora, we introduce a method for developing reliable opinion corpora involving multiple annotators. We develop algorithms for opinion mining from the macro (non-structural) view and the micro (structural) view. To demonstrate and evaluate the opinion mining algorithms developed from the macro view, news and blog articles are adopted. Documents in the evaluation corpora are tagged at three levels: words, sentences, and documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% for verbs and 63.75% for nouns. Using the mined sentiment words together with topical words, we achieve an f-measure of 62.16% at the sentence level and 74.37% at the document level. From the micro view, we further learn the polarity of Chinese words by classifying their word structures. Chinese words are classified into eight types based on morphological information. Experiments show that adding morphological information makes a difference in word polarity identification. Given the morphological types of words, an f-score of 0.610 is achieved in word polarity prediction without using any word thesauri, a relative improvement of 8.93% over the f-score of 0.56 of the bag-of-characters approach. If only words that can bear opinions (nouns, verbs, adjectives, and adverbs) are considered and all other words are treated as non-opinionated, word polarity prediction achieves 0.62 when morphological types are employed.
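As a quick check on the reported figures, the following Python sketch computes a balanced F-measure and the gain of one score over another. The abstract does not say which F variant was used, so the balanced (F1) form is an assumption, and the function names are hypothetical. The 8.93% figure is consistent with a relative gain of 0.610 over the 0.56 baseline.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract does not state which F variant it reports.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Scores quoted in the abstract: 0.610 with morphological types, 0.56 for bag-of-characters.
    print(round(relative_improvement(0.610, 0.56), 2))
```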
With the algorithm from the micro view, performance reaches 0.77 when the opinionated word dictionary NTUSD is incorporated. We extend this idea from relations among characters in words to relations among words in sentences and also achieve a large improvement. Several applications are proposed in this dissertation. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and non-supportive evidence. Opinion tracking monitors the development of opinions along spatial and temporal dimensions. An opinion tracking plot is generated to show the variation of opinions. Opinion question answering and relationship discovery are two further applications discussed in more detail. People are interested not only in factual questions but also in opinions. We discuss question analysis and answer passage retrieval in opinion QA systems. For question analysis, six opinion question types are defined, and a two-layered framework using two question type classifiers is proposed. It achieves 87.8% in general question classification and 92.5% in opinion question classification. For answer passage retrieval, three components are introduced: Focus Detection, Opinion Scope Identification, and Polarity Detection. Each retrieved relevant sentence is checked for whether the question focus lies within the scope of an opinion and, if so, whether the polarity of that scope matches the polarity of the question. The best model achieves an F-measure of 40.59%. With relevance issues removed, the F-measure of the best model rises to 84.96%. Objects that show similar opinion tendencies over a certain time period may be correlated because of latent causal events. We discover relationships among objects based on their opinion tracking plots and collocations. We collected 1.3M economics-related documents from 93 Web sources over 22 months for experiments, and proposed collocation-based, opinion-based, and hybrid models.
We treat company pairs that show similar stock price variations as correlated and use these pairs as the gold standard for evaluation. Results show that opinion-based and collocation-based models complement each other, and that integrated models perform best. One outcome of our research is the Chinese opinion analysis system CopeOpi, which extracts opinions about specific targets from the Web, summarizes the polarity and strength of these opinions, and tracks opinion variations over time. It demonstrates the approaches described above, and its user interface provides an example for other opinion analysis systems. We extend the research to the English language in order to examine the research issues in different languages and to broaden the applicability of this work. An English parser is applied to extract structural information from the English experimental materials. A discussion of how translation affects opinion analysis is also included, and some interesting results are reported.
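The opinion-based relationship discovery described in this abstract compares how opinions about different objects vary over time. A minimal Python sketch of that idea follows. It is not the dissertation's model: the record format, the net-opinion series (positive minus negative counts per period), the use of Pearson correlation as the similarity measure, and all names are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def tracking_series(records, periods):
    """Build a net-opinion series for one object.

    records: iterable of (period, polarity) pairs, polarity being "pos" or "neg".
    periods: ordered list of period labels; the result has one value per period.
    """
    net = Counter()
    for period, pol in records:
        if pol == "pos":
            net[period] += 1
        elif pol == "neg":
            net[period] -= 1
    return [net[p] for p in periods]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical period labels and opinion records for two objects.
    periods = ["t1", "t2", "t3"]
    a = tracking_series([("t1", "pos"), ("t1", "pos"), ("t2", "neg"), ("t3", "pos")], periods)
    b = tracking_series([("t1", "pos"), ("t2", "neg"), ("t2", "neg"), ("t3", "pos")], periods)
    print(a, b, round(pearson(a, b), 3))
```

Pairs whose series correlate strongly would be candidate related objects; in the evaluation described above, such candidates are judged against company pairs with similar stock price variations.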
