

Comment Extraction from Blog Posts and Its Application to Opinion Mining

Advisor: 陳信希 (Hsin-Hsi Chen)


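In recent years, the sharp growth in the number of blogs and bloggers has changed the way people communicate with one another on the Internet. For example, bloggers can add blogs they recommend to their blogroll, which creates relationships between blogs; when publishing a post, a blogger can cite other blog posts, which creates relationships between posts. In addition, the biggest difference between blog posts and ordinary web pages is that readers, after browsing a post, can interact with the author (that is, the blogger) by leaving comments.

Because blog posts often contain the author's personal experiences and views on particular subjects, they are valuable material for opinion mining research. Comments, in particular, convey not only readers' impressions of the targets described in a post but possibly also their support for or opposition to the post itself. However, previous studies have mostly focused on finding the author's opinions in blog posts and have ignored the reader opinions hidden in the comments. We infer that the reason may be that comment extraction is genuinely challenging. Each blog site provides its own templates for presenting comments, with no fixed specification, and even a single blog site may use several different templates. Different templates mean that comments can differ greatly from one another. Correctly segmenting individual comments from blog posts drawn from a wide variety of sources is clearly not a simple problem.

In this thesis, we analyze the characteristics of comments in blog posts, propose methods for solving the comment extraction problem, and implement a blog opinion search system that applies the comment extraction results to blog opinion mining. Finally, we summarize the experimental results of comment extraction and the outcomes of applying it to opinion mining, and we propose a number of interesting issues and directions for future research.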


In recent years, the growing number of blogs and bloggers has changed the style of communication on the Internet. For example, bloggers may add the blogs they recommend to their blogrolls, which forms relationships between blogs. When a blogger writes a post, he or she may cite other blog posts, which establishes relationships between blog posts. In addition, one of the biggest differences between blogs and standard web pages is that blogs allow readers to interact with bloggers by placing comments on specific blog posts.

Because blog posts usually contain many personal experiences or perspectives on specific subjects, they are useful material for opinion mining. Moreover, the comments on a blog post carry readers' viewpoints on the targets described in the post, or their supportive or unsupportive attitudes toward the post itself. However, previous work on opinion mining focuses only on the author's opinions; mining the opinions of readers in the comment region has been largely ignored. The reason may be the challenges of comment extraction. Each blog service provider presents comments with its own templates, no common template specification exists across providers, and even a single provider may use different templates. Correctly extracting each comment from blog posts of different sources is clearly not an easy task.

In this thesis, we analyze comments in blog posts, propose methods for comment extraction, and apply our comment extraction results to blog opinion mining. Finally, we conclude with the experimental results of comment extraction and the achievements in blog opinion mining, and state some interesting issues for future work.




