  • Thesis

Applying Support Vector Data Description for Data Classification (應用支援向量資料描述法於資料之分類)

Advisor: 許俊欽

Abstract


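Support Vector Data Description (SVDD) was proposed by Tax and Duin in 1999. Its main purpose is to find a description of the region occupied by a dataset: the target data are mapped into a high-dimensional feature space, where SVDD seeks the optimal minimum-volume hypersphere that encloses all, or as many as possible, of the training samples. The boundary of this hypersphere is determined by only a small number of target data points, the support vectors (SVs). This transformation allows SVDD to obtain a more suitable and more precise data description, and when new, unknown samples arrive, the resulting region can be used for outlier detection or classification. The advantages of SVDD are that it places no restriction on the data type, it is based on an optimization formulation, it encloses the training data tightly with a hypersphere, and it uses the support vectors to set the classification threshold. In recent years SVDD has been widely applied in practical fields such as image feature classification, machine fault detection, and speech recognition, with notable practical success.

The purpose of this study is to evaluate whether data preprocessing methods affect the classification efficiency of SVDD. The preprocessing methods evaluated are two commonly used multivariate dimension-reduction techniques: Principal Component Analysis (PCA) and Independent Component Analysis (ICA).

Three real cases are examined. The gender data case and the mobile phone manufacturing process case involve continuous data; the third, a nosocomial infection case, involves discrete data. SVDD with PCA or ICA preprocessing was compared with conventional SVDD. The results show that although PCA-SVDD and ICA-SVDD both reduce the number of feature variables, their classification accuracy is no better than that of conventional SVDD using the original variables as input, and Kappa analysis shows that conventional SVDD also has higher classification consistency and a lower misclassification rate. We therefore find that preprocessing has no direct effect on SVDD classification performance.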
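
For readers unfamiliar with the method, the minimum-volume hypersphere described in the abstract is usually written as the optimization problem below. This is the standard soft-margin SVDD formulation from the literature, not an equation reproduced from the thesis; the symbols (center a, radius R, slack variables ξ_i, penalty C, feature mapping Φ) are notational assumptions.

```latex
\begin{aligned}
\min_{R,\,\mathbf{a},\,\boldsymbol{\xi}} \quad & R^{2} + C \sum_{i=1}^{n} \xi_{i} \\
\text{s.t.} \quad & \lVert \Phi(\mathbf{x}_{i}) - \mathbf{a} \rVert^{2} \le R^{2} + \xi_{i},
\quad \xi_{i} \ge 0, \quad i = 1, \dots, n .
\end{aligned}
```

Under this formulation a new sample z is accepted as target data when ‖Φ(z) − a‖² ≤ R² and flagged as an outlier otherwise. The support vectors are the training points that end up on or outside the boundary.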

Parallel Abstract


Support Vector Data Description (SVDD) was developed by Tax and Duin in 1999. The objective of SVDD is to obtain a decision boundary of minimum volume around a dataset. SVDD was first developed for detecting outliers; in this study it is adopted as a classification tool. SVDD places no restrictions on the type of data. Moreover, the decision boundary is formed by support vectors (SVs), which are obtained by solving a convex quadratic programming problem. This study aims to evaluate the impact of preprocessing methods on the classification efficiency of SVDD. The preprocessing methods evaluated are two widely used dimension-reduction techniques: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Three real cases are examined. Two of them, gender prediction and a mobile phone manufacturing process, are continuous datasets. The third, nosocomial infection detection, uses data from Taichung Veterans General Hospital and is a discrete dataset. Kappa analysis shows that SVDD without preprocessing achieves higher classification consistency and a lower misclassification rate.
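
The Kappa analysis mentioned above can be illustrated with a minimal sketch of Cohen's kappa, the usual chance-corrected agreement statistic between true and predicted labels. The abstract does not say which Kappa variant or software the study used, so the function name `cohen_kappa` and the toy labels below are illustrative assumptions, not data or code from the thesis.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(y_true)
    # Observed agreement: fraction of samples on which the two labelings match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (both labelings are one identical constant label);
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels (not thesis data): 8 of 10 samples are classified correctly.
y_true = ["target"] * 5 + ["outlier"] * 5
y_pred = ["target"] * 4 + ["outlier"] + ["outlier"] * 4 + ["target"]

kappa = cohen_kappa(y_true, y_pred)
print(f"kappa = {kappa:.3f}")
```

A kappa near 1 indicates that the classifier agrees with the true labels far more often than chance would predict, while a value near 0 indicates agreement no better than chance. This is why kappa can separate two classifiers whose raw accuracies look similar.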

