可修正的二分群集法

在資料探勘（Data mining）的領域中，群集技術（Clustering）的相關研究一直是相當重要的研究課題，它主要的目的就是將資料中具有相似特性的資料聚成同一群，使得同群之間同質性高，不同群之間有顯著性的差異。由於此種方法的實用價值相當高，所以被廣泛的應用在不同的領域裡，如：社會科學、遺傳學、生物科學、商業和教育等。群集有很多種方法，其中最常見的是階層式群集法（Hierarchical clustering）與分割式群集法（Partitional clustering）兩大類，其中階層式群集法所呈現的是一種樹狀架構，由於此種方法比較能符合資料本身的特性進行群集，所以相較於分割式群集法擁有較好的群集品質，但它最為人詬病的是執行的時間複雜度高，而相反的分割式群集法擁有較低時間複雜度。因此便有學者提出結合兩類演算法優點的群集演算法，稱為二分法群集法（Bisecting k-mean clustering），此群集法擁有分割式群集法的時間複雜度低及階層式群集法分群品質較佳的優點，但是此群集法其二分的特性，使得資料點在一開始若分錯了類別，便無法在之後執行的過程中被修正回來，因此，本研究便是針對此缺點做了深入的探討，並提出改良的方法，我們的方法命名為可修正的二分群集法（Adjustable Bisecting K-mean clustering），此法在群集的過程中利用『合併』的技術來提昇群集的品質，使得原本分錯的資料點能在之後群集的過程中被『合併』修正回來，最後，並透過嚴謹的實驗來驗證我們群集法的可行性。

關鍵字

資料探勘；群集演算法；分割式群集；二分群集法；階層式群集法ㄝ

並列摘要

In knowledge discovery and data mining, clustering is one of the important issues. Its main feature is 『A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.』Clustering is useful in a number of different areas, including Social Science, genetics, bioscience, business and education. Two of the most popular methods among many clustering methods are Hierarchical clustering and Partitional clustering. Hierarchical clustering is often portrayed as the better quality clustering approach, but limited because of its quadratic time complexity. Partitional clustering’s advantage is lower time complexity. In past, researcher offer the clustering algorithm called Bisecting K-mean clustering that both have the advantage of Hierarchical clustering that have a better cluster qulity and Partitional clustering that is lower time complexity. But this method’s main drawback is that when some data have divided into wrong cluster in the beginning, they can’t be adjusted in following step due to it’s bisecting step. Therefore this research focus on this “mistake”, and offer a method called Adjustable Bisecting K-mean clustering, this method use “aggregation” to improve the result of Clustering quality and let the “mistakes” can be adjusted in following step. At last, we propose an explanation to verify that the feasibility of this method.

並列關鍵字

k-means ； Bisecting k-means clustering ； Clustering Algorithm ； Hierarchical clustering ； Partitional clustering

參考文獻

[6] MCQUEEN, J. 1967. Some methods for classificationand analysis of multivariate observations.In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281–297.

[7] Bjorner Larsen and Chinatsu Aone, Fast and Effective Text Mining Using Linear-time Document Clustering, KDD-99, San Diego, California, 1999.

[9] C.J. van Rijsbergen, "Information Retrieval". Buttersworth, 1979.

[12] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331

[15] Douglass Cutting, David Karger, Jan Pedersen, and John W. Tukey. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, Proceedings of the 15th Annual International ACM/SIGIR Conference, Copenhagen, 1992.

被引用紀錄

白裕猷（2010）。群集分析於空間破壞機制之資料探勘〔碩士論文，國立臺北科技大學〕。華藝線上圖書館。https://doi.org/10.6841/NTUT.2010.00591

林進富（2008）。應用可攜式腦波機於睡眠呼吸阻斷與良導絡相關性之研究〔博士論文，國立臺灣大學〕。華藝線上圖書館。https://doi.org/10.6342/NTU.2008.00256

邱瑞民（2007）。未知群數之可重新分配的分裂導向模糊分群〔碩士論文，大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-0607200917241788

陳信夫（2011）。基於字詞關係動態建立階層分群〔碩士論文，國立中央大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0031-1903201314413147

國際替代計量

可修正的二分群集法

未授權

主題瀏覽