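In recent years, the rapid development of information technology has driven growth in both database applications and data volume, and data mining techniques can extract useful hidden information from these data. Clustering is among the most common and most widely discussed data mining techniques; it partitions data into groups such that similarity is high within each group and low between groups. Many researchers have proposed algorithms in recent years, aiming to improve the effectiveness and efficiency of existing methods so that they can be applied in future information environments. These algorithms fall into four main categories: partitioning, density-based, hierarchical, and grid-based. Current methods, however, still fall short in efficiency and accuracy. This thesis proposes a new grid-based algorithm named GCCR (Grid-based Clustering with Cross Relation). It first partitions the data space using a grid scale and assigns each data point to a grid cell, then uses the concept of density to filter out noise cells. Once these steps are complete, it merges cells by scanning for relations between grid rows and columns, which yields the clustering result quickly. Experiments confirm that the method, built on a simple framework, can be applied to datasets of arbitrary shape and substantially reduces the time cost compared with existing algorithms. The proposed method is therefore both highly efficient and highly feasible.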
Recently, data mining has found many applications in enterprises. K-means and DBSCAN are the best-known clustering approaches in this field, yet neither resolves the problems of high execution cost and low accuracy on datasets of arbitrary shape: DBSCAN is highly accurate but expensive to run, while K-means has poor accuracy. Many clustering algorithms have been developed to improve efficiency and effectiveness. This thesis presents a grid-based clustering algorithm called “GCCR”, which uses the relationships between grid rows and columns to complete the clustering. Experimental results show that GCCR outperforms the compared approaches, which include the well-known K-means, DBSCAN, CLIQUE, ANGEL, and GOD-CS. In conclusion, GCCR is simple to implement and is both efficient and effective.
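The pipeline outlined in the abstract (grid partitioning, density-based filtering of noise cells, and merging of the remaining cells) can be illustrated with a minimal Python sketch. This is not the GCCR implementation: the abstract does not specify the cross-relation scan, so the merge step here is a generic union-find over dense cells that are adjacent along a row or a column. The function name `grid_cluster` and the parameters `cell_size` and `min_points` are hypothetical and chosen only for illustration; the sketch also assumes 2-D points.

```python
from collections import defaultdict


def grid_cluster(points, cell_size, min_points):
    """Illustrative grid clustering of 2-D points; returns one label per point (-1 = noise)."""
    # Step 1: assign each point to a grid cell identified by (row, col).
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(y // cell_size), int(x // cell_size))].append(idx)

    # Step 2: keep only dense cells; points in sparse cells are treated as noise.
    dense = {cell for cell, members in cells.items() if len(members) >= min_points}

    # Step 3 (assumption): merge dense cells that are adjacent along a row or a
    # column, using union-find. This stands in for the thesis's merge scan.
    parent = {cell: cell for cell in dense}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    for row, col in sorted(dense):
        # Checking only the left and upper neighbours covers every adjacent pair once.
        for neighbor in ((row, col - 1), (row - 1, col)):
            if neighbor in dense:
                parent[find((row, col))] = find(neighbor)

    # Step 4: label every point with the id of its cell's merged group.
    labels = [-1] * len(points)
    cluster_ids = {}
    for cell in sorted(dense):
        label = cluster_ids.setdefault(find(cell), len(cluster_ids))
        for idx in cells[cell]:
            labels[idx] = label
    return labels


if __name__ == "__main__":
    sample = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
              (5.1, 5.2), (5.3, 5.4), (9.5, 0.5)]
    print(grid_cluster(sample, cell_size=1.0, min_points=2))
```

In this sketch each point is touched once during cell assignment and the merge step works on cells rather than on points, which reflects the general reason grid-based methods tend to be fast. The result depends on the choice of `cell_size` and `min_points`.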