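Classification is the process of assigning objects to classes according to their characteristics, and it is one of the most frequently studied problems in data mining. Many methodologies exist for classifying large volumes of data, each suited to different situations and data properties. To safeguard classification quality, most classification methods recommend performing feature selection first, so that redundant or irrelevant features do not degrade classification accuracy. Reducing the number of features through a variety of feature selection methods to improve accuracy has given rise to many different classification models, among which compound data mining classification methods are often used to build effective models. This study builds compound classification prediction models using data mining classification methodology. Five feature selection techniques, namely Linear Discriminant Analysis (LDA), the Rough Sets Theory approach (RST), Decision Tree (DT), F-score, and Grey Relational Analysis (GRA), are used to screen the important attributes that affect classification. The selected attributes are then combined with four classification methods, namely Neural Network (NN), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Logistic Regression (LR), to form different compound classification models. The average classification accuracy on test samples is computed for each compound model, and the predicted classes are compared with the actual classes. The study also compares different feature selection methods applied to the same classifier, and the same feature selection method applied to different classifiers. In addition, the nonparametric Wilcoxon signed-rank test is applied to the classification results to examine whether the models differ significantly. The results show that the compound models obtained after feature selection classify significantly better than the original models, and that the relative importance rankings of the important features retained by the compound models are similar. The compound models that use F-score for feature selection, combined with the different classifiers, achieve the best average classification accuracy.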
Classification is the process of assigning objects to classes according to their attributes, and it is among the most widely discussed problems in data mining. Many classification methodologies exist for handling large volumes of data, each suited to different situations and data characteristics. Most of them recommend feature selection as a first step to safeguard classification quality, so that redundant or irrelevant features do not reduce accuracy. Reducing the feature set in different ways yields a variety of classification models that improve on the accuracy of the original models, and compound data classification methods are often used to build effective ones. This research builds classification prediction models using data mining methodology. Important attributes are extracted by five feature selection approaches, each combined with four different classifiers, to optimize the feature space. The average accuracy of each feature selection approach is compared across the classifiers, and the nonparametric Wilcoxon signed-rank test is used to determine whether the models differ significantly. The experimental results show that the proposed compound models outperform the original methods and that F-score is a promising feature selection approach for data mining.
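As an illustration of the feature selection step, the following is a minimal sketch of F-score ranking for a two-class problem. The abstract does not define the F-score, so this sketch assumes the common discriminative form: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function names `f_score` and `rank_features` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature for a two-class problem (labels are 0 or 1).

    Assumed definition: between-class separation of the class means
    divided by the summed within-class sample variances.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    # A feature that is constant inside both classes has zero within-class variance
    return between / within if within > 0 else float("inf")


def rank_features(rows, labels):
    """Return feature indices ordered from highest to lowest F-score, plus the scores."""
    n_features = len(rows[0])
    scores = [f_score([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not
rows = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
labels = [0, 0, 0, 1, 1, 1]
order, scores = rank_features(rows, labels)
print(order)
print(scores)
```

In a compound model of the kind described, the top-ranked features would then be passed to a classifier such as KNN or SVM; how many features to keep is a separate choice that this sketch does not address.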