Clustering Non-Ordered Discrete Data

Clustering in continuous vector data spaces is a well-studied problem. In recent years there has been a significant amount of research on clustering categorical data. However, most of this work deals with market-basket-type transaction data and is not specifically optimized for high-dimensional vectors. Our focus in this paper is on efficiently clustering high-dimensional vectors in non-ordered discrete data spaces (NDDS). We define several necessary geometrical concepts in NDDS, which form the basis of our clustering algorithm, and employ several new heuristics that exploit the characteristics of vectors in NDDS. Experimental results on large synthetic datasets demonstrate that the proposed approach is effective in terms of cluster quality, robustness, and running time. We have also applied our clustering algorithm to real datasets, with promising results.

Keywords

clustering; data mining; categorical data; non-ordered discrete data; vector data
