  • Thesis


Applying Support Vector Data Description For Data Classification

Advisor: 許俊欽


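Support Vector Data Description (SVDD) was proposed by Tax and Duin in 1999. Its main purpose is to find a description of the region occupied by a dataset: the target data are mapped into a high-dimensional feature space, where SVDD seeks the minimum-volume hypersphere that encloses all, or as many as possible, of the training samples. The boundary of this hypersphere is determined by only a small number of target data points, the support vectors (SVs). This mapping allows SVDD to obtain a more suitable and more precise data description, and when new unknown samples arrive, the constructed region can be used for outlier detection or classification. The advantages of SVDD are that it places no restriction on the data type, rests on an optimization formulation, encloses the training points tightly with a hypersphere, and uses the support vectors to set the classification threshold. In recent years SVDD has been widely applied in practice, for example in image feature classification, machine fault detection, and speech recognition, with notable practical results.

The purpose of this study is to evaluate whether data preprocessing methods affect the classification efficiency of SVDD. The preprocessing methods evaluated are commonly used multivariate dimension reduction techniques, namely Principal Component Analysis (PCA) and Independent Component Analysis (ICA).

Three case studies are carried out. The gender data case and the mobile phone manufacturing process case involve continuous data; the remaining case, nosocomial infection, involves discrete data. SVDD with PCA and ICA preprocessing is compared with conventional SVDD. The results show that although PCA-SVDD and ICA-SVDD both reduce the number of feature variables, the classification accuracy of SVDD with preprocessing is no better than that of conventional SVDD using the original variables as input, and Kappa analysis shows that conventional SVDD also has higher classification consistency and a lower misclassification rate. We therefore find that preprocessing has no direct effect on the classification performance of SVDD.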
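For reference, the minimum-volume hypersphere described above is usually written as the following optimization problem. This is the standard soft-margin SVDD formulation, not an equation quoted from the thesis; the symbols (center a, radius R, slack variables \xi_i, trade-off constant C, feature map \phi, kernel K) are notation assumed here for illustration.

```latex
% Primal problem: smallest hypersphere (center a, radius R) with slacks \xi_i
\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad \|\phi(x_i)-a\|^2 \le R^2+\xi_i,\;\; \xi_i \ge 0,\;\; i=1,\dots,n

% Dual problem, with kernel K(x_i,x_j)=\langle\phi(x_i),\phi(x_j)\rangle
\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i K(x_i,x_i)-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(x_i,x_j)
\quad\text{s.t.}\quad 0\le\alpha_i\le C,\;\; \sum_{i=1}^{n}\alpha_i=1
```

Training points with \alpha_i > 0 are the support vectors, and the center is a = \sum_i \alpha_i \phi(x_i). A new sample z is accepted as belonging to the target class when \|\phi(z)-a\|^2 \le R^2.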


Support Vector Data Description (SVDD) was developed by Tax and Duin in 1999. The objective of SVDD is to obtain a spherically shaped decision boundary with minimum volume around a dataset. SVDD was first developed for outlier detection; in this study, it is adopted as a classification tool. SVDD places no restrictive assumptions on the data. Moreover, its decision boundary is formed by support vectors (SVs), which are obtained by solving a convex quadratic programming problem. This study evaluates the impact of preprocessing methods on the classification efficiency of SVDD. The preprocessing methods evaluated are two widely used dimension reduction techniques, Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Three real cases are examined. Two of them, gender prediction and a mobile phone manufacturing process, use continuous data. The third, nosocomial infection detection, uses data from Taichung Veterans General Hospital and is a discrete dataset. Kappa analysis shows that SVDD without preprocessing achieves higher classification consistency and lower misclassification rates.
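The Kappa analysis mentioned above measures how far predicted labels agree with the true labels beyond what chance alone would produce. The following is a minimal sketch, assuming the statistic meant is Cohen's kappa (the abstract does not name the variant); the function name `cohen_kappa` and the sample labels are illustrative and not taken from the thesis.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(y_true)

    # Observed agreement p_o: fraction of samples where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement p_e: sum over classes of the product of marginal rates.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    # Degenerate case (assumed convention): both sequences contain one
    # identical class, so agreement is perfect and 1 - p_e would be zero.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels: 1 = target class, 0 = outlier.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(cohen_kappa(y_true, y_pred))  # 0.5
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance. Comparing kappa across classifiers, as the study does for SVDD with and without preprocessing, corrects raw accuracy for class imbalance.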


