
中文Blog Comment Spam偵測技術之研究

Detecting Comment Spam in Chinese Blog

Advisor: 林志麟

Abstract


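In recent years, blogs have become a mainstream Internet application. Through blog-platform features such as post categorization, comments, and syndication, blogs have become an important channel for conveying information in online communication, knowledge learning, and community interaction. Blog comments can supplement a post's content, point to outside information, and even serve as strong encouragement for the blog's author. Recently, however, blog comment spam has become rampant: many comments do nothing to help readers understand the post, cause considerable trouble for authors in managing and maintaining their posts, and lower readers' impression of and liking for the blog site.

This study proposes a method based on data mining classification techniques that collects blog pages from the Web and, through term extraction and feature selection, builds a classification model capable of detecting blog comment spam. Experimental results show that the detection method performs well when applied to filtering blog comment spam.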

Parallel Abstract


Over the past few years, blogs have gradually become a mainstream application on the Internet. With blog-platform functions such as article categorization, comments, and broadcasting, blogs serve as an important message-transmission channel for network communication, knowledge learning, and social communication. Blog comments not only supplement a post's content or provide references to additional information, but also encourage the blogger. Recently, however, blog comment spam has grown out of control. Many comments do not help readers understand the post and also burden the blogger's management and maintenance of articles, which in turn lowers viewers' impression of and preference for the blog site.

In this research we propose a method that collects blog pages from the Internet, splits the articles into words, evaluates attributes, and uses a data mining classifier to train a classification model for detecting blog comment spam. Applied to blog comment spam detection, the model performs well.
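The pipeline outlined in the abstract (term extraction, feature selection, classifier training) can be illustrated with a minimal sketch. The abstract does not name the specific techniques used, so everything concrete here is an assumption made for illustration: overlapping character bigrams stand in for Chinese term extraction, a chi-square score stands in for feature selection, and a multinomial Naive Bayes model with add-one smoothing stands in for the classifier. The function names and the four toy comments are invented and are not data from the thesis.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split text into overlapping character bigrams (assumed, dictionary-free term extraction)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def select_features(docs, labels, k):
    """Keep the k bigrams with the highest chi-square score against the spam label (assumed criterion)."""
    n = len(docs)
    n_spam = sum(labels)
    df_spam, df_ham = Counter(), Counter()
    for text, spam in zip(docs, labels):
        # Count each bigram at most once per comment (document frequency).
        (df_spam if spam else df_ham).update(set(char_bigrams(text)))
    scores = {}
    for term in set(df_spam) | set(df_ham):
        a, b = df_spam[term], df_ham[term]      # spam / ham comments containing the term
        c, d = n_spam - a, (n - n_spam) - b     # spam / ham comments lacking the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Ties are broken by the term itself so the result is deterministic.
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def train_naive_bayes(docs, labels, vocab):
    """Fit a multinomial Naive Bayes model with add-one smoothing; both classes must be present."""
    counts = {True: Counter(), False: Counter()}
    class_docs = Counter()
    for text, spam in zip(docs, labels):
        spam = bool(spam)
        class_docs[spam] += 1
        counts[spam].update(t for t in char_bigrams(text) if t in vocab)
    model = {}
    for cls in (True, False):
        total = sum(counts[cls].values()) + len(vocab)
        model[cls] = {
            "prior": math.log(class_docs[cls] / len(docs)),
            "loglik": {t: math.log((counts[cls][t] + 1) / total) for t in vocab},
        }
    return model


def predict_spam(model, text):
    """Return True when the spam class has the higher log-posterior score."""
    scores = {}
    for cls, params in model.items():
        loglik = params["loglik"]
        scores[cls] = params["prior"] + sum(
            loglik[t] for t in char_bigrams(text) if t in loglik
        )
    return scores[True] > scores[False]


# Invented toy comments for illustration only (True = spam).
docs = ["免費贈品點擊連結", "點擊連結免費下載", "這篇文章寫得很好", "謝謝分享很有幫助"]
labels = [True, True, False, False]

vocab = select_features(docs, labels, k=10)
model = train_naive_bayes(docs, labels, vocab)
```

A new comment is then scored with `predict_spam(model, comment_text)`. Character bigrams are used only because they need no segmentation dictionary; a word segmenter or a different scoring function could be substituted without changing the overall structure.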


Cited By


林銘笙 (2010). 中文部落格評論之分類 [Master's thesis, National Taipei University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0006-2007201011391200
