In recent years, blogs have become a mainstream Internet application. Through platform features such as article categorization, comments, and syndication, blogs now serve as an important channel for online communication, knowledge sharing, and community interaction. Blog comments can supplement an article, point readers to external information, and encourage the blogger. Recently, however, blog comment spam has become rampant. Spam comments do nothing to help readers understand an article, impose a heavy management and maintenance burden on the blogger, and lower readers' opinion of the blog site. This study proposes a method based on data mining classification techniques: blog pages are collected from the Web, terms are extracted and features are selected, and a classification model is trained to detect blog comment spam. Experimental results show that the method performs well at filtering blog comment spam.
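
The pipeline outlined in the abstract (term extraction, feature selection, classifier training) can be illustrated with a minimal sketch. The abstract does not name the feature-selection measure or the classifier, so the chi-square score and the multinomial naive Bayes model used here are assumptions chosen for illustration, not the study's actual method. The function names and the toy comments in `TOY_COMMENTS` are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (not from the study): (comment text, label), 1 = spam, 0 = legitimate.
TOY_COMMENTS = [
    ("Buy cheap pills online now, visit our casino site", 1),
    ("Cheap casino bonus, click here to win money", 1),
    ("Win money fast, cheap pills, click now", 1),
    ("Thanks for the article, the example helped me a lot", 0),
    ("I disagree with the second point, but great article", 0),
    ("Great example, could you share the source code", 0),
]

def tokenize(text):
    """Term extraction: lowercase word tokens (a stand-in for the study's extraction step)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def select_features(docs, labels, k):
    """Feature selection (assumed measure): keep the k terms with the highest chi-square score."""
    n, n_spam = len(docs), sum(labels)
    df_all, df_spam = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for term in set(tokenize(doc)):
            df_all[term] += 1
            if y == 1:
                df_spam[term] += 1

    def chi2(term):
        a = df_spam[term]        # spam comments containing the term
        b = df_all[term] - a     # legitimate comments containing the term
        c = n_spam - a           # spam comments without the term
        d = n - n_spam - b       # legitimate comments without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        return n * (a * d - b * c) ** 2 / denom if denom else 0.0

    # Ties are broken alphabetically so the selection is deterministic.
    ranked = sorted(df_all, key=lambda t: (-chi2(t), t))
    return set(ranked[:k])

def train(docs, labels, k=10):
    """Train a multinomial naive Bayes model (assumed classifier) on the selected terms.

    Assumes both classes (0 and 1) appear at least once in labels.
    """
    vocab = select_features(docs, labels, k)
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(t for t in tokenize(doc) if t in vocab)
    priors = {y: math.log(labels.count(y) / len(labels)) for y in (0, 1)}
    return vocab, counts, priors

def predict(model, text):
    """Return 1 if the comment is classified as spam, else 0."""
    vocab, counts, priors = model
    terms = [t for t in tokenize(text) if t in vocab]
    scores = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)  # Laplace-smoothed denominator
        scores[y] = priors[y] + sum(math.log((counts[y][t] + 1) / total) for t in terms)
    return max(scores, key=scores.get)

texts = [text for text, _ in TOY_COMMENTS]
labels = [label for _, label in TOY_COMMENTS]
model = train(texts, labels, k=10)
print(predict(model, "cheap pills and casino bonus"))
print(predict(model, "great article, thanks for the example"))
```

In practice the training set would come from crawled blog pages with labeled comments, and the number of selected terms `k` would be tuned on held-out data. Six comments are far too few to say anything about real-world performance.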