As data volumes grow ever larger, classification methods that use every dimension of the data become infeasible, so the idea of constructing classifiers from subspaces has drawn increasing attention. Because prior work classified using only random or partial subspaces, this thesis proposes a "frequent subspace classifier" that generates all potential subspaces and uses them to build a classifier. The proposed method consists of three phases. First, we apply the wavelet transform to reduce the dimensionality of the data. Next, we use a threshold to filter out all potential frequent two-dimensional subspaces, and apply depth-first search to find higher-dimensional frequent subspaces. Finally, we use AdaBoost to select the important subspaces and construct an ensemble classifier. Because our method takes all potential subspaces into consideration, it has more opportunities to build an effective classifier. Experimental results show that, on both UCI and stock datasets, the proposed method outperforms SVM and LogitBoost.
With the amount of data increasing rapidly, it is infeasible to consider all the dimensions of the data when performing classification. Thus, constructing a classifier based on subspaces has attracted more and more attention. However, previously proposed methods use only randomly generated subspaces, or a subset of the subspaces, to construct a classifier. Therefore, in this thesis, we propose a classification method, called FSC (Frequent Subspace Classifier), which generates all potential subspaces and utilizes them to construct a classifier. Our proposed method consists of three phases. First, we apply the discrete wavelet transform to reduce the dimensionality of the feature vectors. Next, we mine the frequent subspaces: a threshold filters out all potential frequent two-dimensional subspaces, and a depth-first search extends them to higher-dimensional frequent subspaces. Finally, we exploit AdaBoost to select the significant subspaces from those derived and construct an ensemble classifier. Since the FSC generates all potential subspaces and selects among them based on the maximum entropy reduction, it provides more opportunities to construct an effective classifier. The experimental results show that the FSC outperforms both SVM and LogitBoost on the UCI and stock datasets.
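To make the first phase concrete, the following is a minimal sketch (not the thesis implementation) of a one-level Haar discrete wavelet transform, the simplest DWT, which halves the dimensionality of a feature vector by keeping only the approximation coefficients. The function name and the sample feature vector are illustrative assumptions.

```python
def haar_dwt(vector):
    """One-level Haar DWT: return the approximation coefficients,
    halving the dimension of an even-length feature vector."""
    if len(vector) % 2 != 0:
        raise ValueError("expected an even-length vector")
    scale = 2 ** 0.5
    # Scaled pairwise sums keep the low-frequency summary of the signal;
    # the detail (high-frequency) coefficients are discarded for reduction.
    return [(vector[i] + vector[i + 1]) / scale
            for i in range(0, len(vector), 2)]

# Hypothetical 8-dimensional feature vector reduced to 4 dimensions.
features = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 6.0]
reduced = haar_dwt(features)
print(len(features), "->", len(reduced))  # prints "8 -> 4"
```

Applying the transform repeatedly would reduce the dimensionality further, at the cost of discarding more high-frequency detail.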