The Internet has developed into a new medium in its own right. Audiences no longer receive information passively; they choose the information they want, and they can publish their own opinions and comments about subjects such as companies, organizations, people, and products. These comments strongly influence how others evaluate a subject, so understanding the polarity and the aspects of what is posted helps reveal what the public likes and dislikes, and why.
Online comments, however, are produced so quickly and in such volume that manual analysis is no longer feasible. Most current opinion mining systems work at the document level, counting positive and negative comments across the volume of mentions of a subject. The results have some reference value, but without aspect analysis they cannot reveal the details behind the opinions. We therefore develop a Chinese aspect-level opinion mining system that uses a complete-sentence algorithm to extract opinions at the aspect level.
An opinion mining system consists of three main modules: a crawling module, an analysis module, and a report module. This research improves and refines all three. The crawling module uses "exclude keywords" to raise crawling accuracy and reduce the noise from irrelevant articles. The analysis module applies our proposed "evaluation scoring algorithm" to balance special cases in how positive and negative evaluations within a post are counted, so that the mining results are closer to the truth. The report module improves the user interface and report presentation so that users can more easily understand each day's positive and negative comments and the aspects they concern. In addition, we developed a poster tracking feature, which contributes substantially to judging whether a poster's opinion has reference value (e.g. opinions from paid writers) or whether the poster genuinely needs special assistance from the subject (e.g. a problem that has long gone unresolved).
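As a rough illustration of the "exclude keywords" idea in the crawling module, the sketch below keeps a post only when it mentions a subject keyword and contains none of the exclusion keywords. The abstract does not give the actual matching rule, so the function names (`is_relevant`, `filter_posts`), the plain substring matching, and the sample posts are all assumptions made for illustration, not the system's implementation.

```python
def is_relevant(text, subject_keywords, exclude_keywords):
    """Return True if the post mentions the subject and no exclusion keyword.

    Hypothetical helper: substring matching is assumed here because Chinese
    text has no whitespace word boundaries; the real system may differ.
    """
    mentions_subject = any(k in text for k in subject_keywords)
    is_excluded = any(k in text for k in exclude_keywords)
    return mentions_subject and not is_excluded


def filter_posts(posts, subject_keywords, exclude_keywords):
    """Drop crawled posts that are likely about something other than the subject."""
    return [p for p in posts if is_relevant(p, subject_keywords, exclude_keywords)]


# Made-up sample data: the subject is the phone brand "蘋果" (Apple),
# and "蘋果派" (apple pie) marks posts about food rather than the brand.
posts = [
    "蘋果手機的電池續航很差",  # about the phone: kept
    "蘋果派的食譜分享",        # about apple pie: excluded
    "今天天氣很好",            # no subject keyword: dropped
]
kept = filter_posts(posts, ["蘋果"], ["蘋果派"])
```

With an empty exclusion list the second post would also be kept, which is the kind of noise the exclusion keywords are meant to remove.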