Prototype Selection: A Method Combining CLU and PSR

Data reduction is important in data mining when dealing with large datasets, because it improves storage efficiency and shortens the run time of the mining process. One way to reduce the volume of data is to selectively retain a subset of the dataset as a representation of the original. This approach is known as prototype selection (Olvera et al., 2008). Prototype selection aims to discard superfluous instances from the training set, because such instances affect data mining results. In this thesis, we propose a hybrid prototype selection method, called the CLU-R algorithm, that applies a clustering algorithm and then selects the most relevant subset of the cluster members. The results show that the CLU-R algorithm performs better than the original methods used individually.

Keywords

data mining; data reduction; prototype selection methods; fuzzy c-means clustering
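The abstract gives only the outline of the hybrid idea (cluster first, then keep the most relevant cluster members), so the following is a minimal sketch under stated assumptions, not the CLU-R algorithm itself. It substitutes plain k-means for the fuzzy c-means clustering named in the keywords, and it assumes a hypothetical relevance score (negative mean distance to the other same-class members of the same cluster). The names `kmeans`, `relevance`, and `select_prototypes`, and the `keep` parameter, are illustrative and do not come from the source.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20):
    """Plain k-means (an assumption; stands in for fuzzy c-means).

    Initial centers are taken at evenly spaced indices so the run is
    deterministic. Returns the cluster index assigned to each point.
    """
    n = len(points)
    centers = [list(points[(i * n) // k]) for i in range(k)]
    assign = [0] * n
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def relevance(i, group, points):
    """Hypothetical relevance: higher when point i lies close, on average,
    to the other members of its (cluster, class) group."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return -sum(dist(points[i], points[j]) for j in others) / len(others)


def select_prototypes(points, labels, k, keep=1):
    """Cluster the data, then keep the `keep` most relevant instances
    of every class inside every cluster. Returns sorted indices."""
    assign = kmeans(points, k)
    groups = defaultdict(list)
    for i, (c, y) in enumerate(zip(assign, labels)):
        groups[(c, y)].append(i)
    selected = []
    for group in groups.values():
        ranked = sorted(group, key=lambda i: relevance(i, group, points), reverse=True)
        selected.extend(ranked[:keep])
    return sorted(selected)


# Toy data: two well-separated groups, one class each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
labels = ["a", "a", "a", "b", "b", "b"]
print(select_prototypes(points, labels, k=2))  # prints [1, 3]
```

On this toy data the six instances are reduced to two prototypes, one per cluster. Grouping by both cluster and class is a design choice of the sketch: it keeps at least one representative of every class that appears in a mixed cluster.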

References

[1] Lopez, J. A. O. (2010). Prototype selection methods, Computation System, 13(4), 449-462.

[2] Lopez, J. A. O., Ochoa, J. A. C., Trinidad, J. F. M., & Kittler, J. (2010). A review of instance selection methods. Artif Intell Rev, 34, 133-143.

[3] Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Trans Inf Theory, 13, 21-27.

[5] Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE Trans Inf Theory, 14, 515-516.

[6] Ritter, G. L., Woodruff, H. B., Lowry, S. R., & Isenhour, T. L. (1975). An algorithm for a selective nearest neighbor decision rule. IEEE Trans Inf Theory, 21(6), 665-669.
