An Effective and Efficient Grid-Based Clustering Algorithm

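In recent years, with the rapid development of information technology, the volume of data has grown along with the use of databases, and data mining techniques can uncover the useful information hidden in it. Data clustering is among the most common and most discussed techniques in data mining; it partitions the data into groups with high similarity within each group and high dissimilarity between groups. Over the past few years, many researchers have proposed algorithms intended to improve the effectiveness and efficiency of existing ones so that they can be applied in future information environments. These algorithms fall into four main categories: partitioning, density-based, hierarchical, and grid-based. Current methods, however, still fall short in efficiency and accuracy. This thesis proposes a new grid-based algorithm named GCCR (Grid-based Clustering with Cross Relation). It first partitions the space into grid cells at a given grid scale and assigns each data point to a cell, then uses the notion of density to filter out noise cells. Once this is done, it merges cells by scanning along the correlations between grid rows and columns, which yields the clustering result quickly. Experiments confirm that the method, which runs on a simple framework, can be applied to datasets of arbitrary shape and substantially reduces the time cost compared with existing algorithms. The proposed method is therefore both highly efficient and highly feasible.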
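The pipeline described above (grid partitioning, density-based noise filtering, then merging by scanning rows and columns) can be illustrated with a minimal sketch. The abstract does not give the details of GCCR's cross-relation merge, so the merge step here is an assumption: a plain row-by-row scan that links each dense cell to its left and lower neighbors with union-find. The names `grid_cluster`, `cell_size`, and `min_pts` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def grid_cluster(points, cell_size, min_pts):
    """Cluster 2-D points with a simple grid scheme (illustrative, not GCCR itself)."""
    # Step 1: assign each point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # Step 2: treat sparse cells as noise and keep only the dense ones.
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    # Step 3 (assumed merge rule): scan row by row and join each dense cell
    # with its left and lower neighbors, which the scan has already visited.
    parent = {c: c for c in dense}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for col, row in sorted(dense, key=lambda c: (c[1], c[0])):
        for neighbor in ((col - 1, row), (col, row - 1)):
            if neighbor in dense:
                parent[find((col, row))] = find(neighbor)

    # Step 4: label every point by its cell's root; noise points get -1.
    label_of_root = {}
    labels = [-1] * len(points)
    for cell in sorted(dense):
        cluster_id = label_of_root.setdefault(find(cell), len(label_of_root))
        for idx in cells[cell]:
            labels[idx] = cluster_id
    return labels
```

Because each point is binned once and each dense cell inspects only two neighbors, the cost of this sketch is dominated by sorting the dense cells rather than by pairwise distance computations, which is the general reason grid-based methods tend to be fast. It merges only cells that share an edge; whether GCCR also considers diagonal neighbors is not stated in the abstract.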

Keywords

data mining; data clustering; grid-based clustering; cluster analysis

Parallel Abstract

Data mining has recently found many applications in enterprise. K-means and DBSCAN are the best-known clustering approaches in this field, yet neither resolves the problems of high execution cost and low accuracy on arbitrarily shaped datasets: DBSCAN is accurate but computationally expensive, while K-means has poor accuracy. Many clustering algorithms have been developed to improve efficiency and effectiveness. This thesis presents a clustering algorithm called "GCCR". GCCR is a grid-based clustering algorithm that uses the relationships between grid rows and columns to complete clustering. Experimental results show that GCCR achieves better clustering performance than the approaches it was compared with, namely the well-known K-means, DBSCAN, CLIQUE, ANGEL, and GOD-CS. In conclusion, GCCR is simple to implement and is a highly efficient and effective clustering algorithm.

Parallel Keywords

data clustering; data mining; grid-based clustering

References

[7] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. In: ACM-SIGMOD Int. Conf. Management of Data, pp. 94-105 (1998)

[8] Borah, B., Bhattacharyya, D.K.: An Improved Sampling-Based DBSCAN for Large Spatial Databases. In: Proceedings of International Conference on Intelligent Sensing and Information Processing, pp. 92-96. Chennai, India (2004)

[12] Karypis, G., Han, E.H., Kumar, V.: CHAMELEON: A Hierarchical Clustering Using Dynamic Modeling. IEEE Computer: Special Issue on Data Analysis and Mining, vol. 32, no. 8, pp. 68-75 (1999)

[13] MacQueen, J.B.: Some Methods for Classification and Analysis of Multivariate Observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297 (1967)

[14] Tsai, C.F., Yen, C.C.: ANGEL: A New Effective and Efficient Hybrid Clustering Technique for Large Databases. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 817-824. Springer, Heidelberg (2007)

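Cited By

陳而設 (2016). A new, highly efficient grid clustering algorithm based on an index-value-oriented approach [Master's thesis, National Pingtung University of Science and Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0042-1805201714165802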
