
A Study of Data Classifying, Clustering and Their Integration: A Data Mining Perspective

Abstract


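Data mining extracts valuable knowledge from large volumes of data. It lets enterprises obtain knowledge that was previously hard to acquire, such as business rules and customer behavior mined from data, and thereby create a competitive advantage. Two important and fundamental areas of data mining are classifying and clustering, both of which are set-partitioning techniques. Classifying seeks classification rules from examples whose classes are known, and it emphasizes achieving a perfect partition through those rules. Clustering seeks measures of similarity and dissimilarity among examples whose classes are unknown, and it emphasizes achieving a perfect partition through high within-cluster similarity and high between-cluster dissimilarity. In past research, classifying and clustering have mostly been applied separately, each with its own methods. However, some problems are not well suited to classifying or clustering alone and instead require integrating the two. This paper surveys the current state of research on classifying, clustering, and their integration. It then proposes a taxonomy of research on integrating classifying and clustering, offers suggestions on how to apply it, and identifies three integration approaches: tree-classifying clustering, clustering-then-classifying, and simultaneous classifying and clustering.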
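To make the contrast between the two set-partitioning styles concrete, the Python sketch below applies both to the same toy 1-D data. It is an illustration under stated assumptions, not the paper's method: a one-threshold decision stump stands in for classifying (a rule learned from examples with known classes), and two-means clustering stands in for clustering (grouping without labels, judged by within-cluster closeness and between-cluster separation). The function names `learn_threshold_rule`, `two_means_1d`, and `within_between`, and the data, are hypothetical.

```python
# Hypothetical illustration: "classifying" (classes known, learn a rule) versus
# "clustering" (classes unknown, group by similarity) on the same 1-D data.


def learn_threshold_rule(points, labels):
    """Classifying: learn a one-threshold rule (a decision stump) from labelled
    examples, returning (threshold, left_label, right_label) with fewest errors."""
    values = sorted(set(points))
    classes = sorted(set(labels))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate cut between two adjacent distinct values
        for left in classes:
            for right in classes:
                errors = sum(
                    1 for x, y in zip(points, labels)
                    if (left if x <= t else right) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    if best is None:
        raise ValueError("need at least two distinct values")
    return best[1:]


def two_means_1d(points, iters=20):
    """Clustering: split unlabelled 1-D points into two groups with Lloyd's
    k-means (k = 2), starting from the minimum and maximum values."""
    c0, c1 = min(points), max(points)
    for _ in range(iters):
        g0 = [x for x in points if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in points if abs(x - c0) > abs(x - c1)]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return [0 if abs(x - c0) <= abs(x - c1) else 1 for x in points]


def within_between(points, assignment):
    """Score a partition: mean distance to the point's own cluster center
    (within, smaller is better) and the smallest gap between centers
    (between, larger is better)."""
    groups = {}
    for x, g in zip(points, assignment):
        groups.setdefault(g, []).append(x)
    centers = {g: sum(v) / len(v) for g, v in groups.items()}
    within = sum(abs(x - centers[g]) for x, g in zip(points, assignment)) / len(points)
    cs = sorted(centers.values())
    between = min((b - a for a, b in zip(cs, cs[1:])), default=0.0)
    return within, between


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["A", "A", "A", "B", "B", "B"]  # known classes: only the classifier sees these

rule = learn_threshold_rule(points, labels)
clusters = two_means_1d(points)  # class labels are not used here
print(rule, clusters, within_between(points, clusters))
```

Both routines end up partitioning the data the same way here, but they get there differently: the stump needs the known labels and yields an explicit rule, while two-means uses only distances and yields unnamed group ids.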

Keywords

Classifying; Clustering; Classifying and Clustering; Data Mining

Parallel Abstract


Data mining discovers valuable knowledge in very large datasets, enabling enterprises to formulate business rules and understand customer behaviors that were previously difficult to discover. Classifying and clustering, both set-partitioning techniques, are two important and fundamental areas of data mining. Classifying derives rules from examples of known classes and emphasizes finding rules that produce a perfect partition. Clustering measures the similarity and dissimilarity among examples of unknown classes and emphasizes high similarity within each cluster and high dissimilarity between clusters to produce a perfect partition. In the past, classifying and clustering were usually applied independently. However, some problems cannot be solved by classifying or clustering alone and require an integration of the two. This paper synthesizes the current state of research on classifying, clustering, and their integration, and proposes a research taxonomy and suggestions for integrating classifying and clustering. We organize the integration approaches into three types: tree-classifying clustering, clustering-then-classifying, and the simultaneous clustering-classifying method.
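The abstract does not say which algorithms each integration approach uses. As a hedged illustration of the clustering-then-classifying idea only, the Python sketch below assumes k-means for the clustering step and a one-split decision stump for the classifying step: records without class labels are first grouped, and the resulting cluster ids are then used as class labels to learn a readable rule that can assign new records. The function names `kmeans`, `learn_rule`, and `apply_rule`, and the toy records, are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of "clustering-then-classifying": cluster unlabelled
# records first, then learn a readable rule that describes the clusters.


def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Step 1 (clustering): Lloyd's k-means with a deterministic
    farthest-point initialization; returns a cluster id per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def learn_rule(points, labels):
    """Step 2 (classifying): fit a one-split decision stump on the cluster ids,
    returning (feature, threshold, left_label, right_label)."""
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            l_major = max(set(left), key=left.count)   # majority label on each side
            r_major = max(set(right), key=right.count)
            errors = sum(y != l_major for y in left) + sum(y != r_major for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_major, r_major)
    if best is None:
        raise ValueError("all points are identical")
    return best[1:]


def apply_rule(rule, point):
    """Assign a new record to a cluster using the learned rule."""
    f, t, left, right = rule
    return left if point[f] <= t else right


records = [(1.0, 5.0), (1.5, 4.0), (1.2, 6.0), (8.0, 5.5), (8.5, 4.5), (7.8, 6.2)]
cluster_ids = kmeans(records, k=2)  # no class labels are given
rule = learn_rule(records, cluster_ids)
f, t, left, right = rule
print(f"if x[{f}] <= {t:.2f} then cluster {left} else cluster {right}")
print(apply_rule(rule, (2.0, 5.0)))
```

The design choice here is that clustering supplies the groups and classifying supplies an interpretable description of them; a full decision tree could replace the single split when one threshold is not enough.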

