Wrapper Feature Selection with MinMax Scaling and SVM with Grid Search for Cardiovascular Disease Classification

Advisor: 王國禎

Chinese Abstract


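Cardiovascular disease (CVD) is one of the leading causes of death worldwide, and detecting it early is an important issue in healthcare. CVD measurement data contain a large number of features, which tends to reduce classification accuracy. We therefore first propose a feature selection method to find a suitable feature set: a wrapper-based feature selection mechanism using random forest (WFS-RF). Next, we use MinMax normalization to scale the feature values to a specific range. For classification, the support vector machine (SVM) is widely recognized as an effective supervised learning method, but its accuracy still depends on how its parameters are chosen. We therefore use grid search to tune the SVM parameters and improve classification performance; the overall approach is called WMSG-CDC. Finally, the tuned SVM is combined with the one-against-one, one-against-all, and error-correcting code strategies to classify heart rhythms as normal or abnormal. We evaluate the proposed algorithm in terms of accuracy, sensitivity, precision, and F1 score. Experimental results show that the proposed SVM with one-against-one classifies CVD more effectively than the main related works.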
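The scaling and feature selection stages can be sketched in plain Python. MinMax scaling maps each feature to [0, 1] via x' = (x - min) / (max - min), with min and max taken from the training data. The wrapper step is shown as greedy forward selection. A classifier is retrained on each candidate feature subset, and the subset grows only while its score improves. The abstract does not specify the search strategy, so forward selection is an assumption. The thesis wraps a random forest. To keep this sketch dependency-free, a leave-one-out nearest-centroid classifier stands in for it; this is a hypothetical substitute, not the authors' evaluator. All function names are illustrative. The stand-in is distance-based, so the sketch scales before selecting. A random forest is insensitive to per-feature monotonic rescaling, so the abstract's order (select, then scale the selected features) works equally well with it.

```python
"""Sketch: MinMax scaling plus greedy forward wrapper feature selection.

Assumption: a leave-one-out nearest-centroid classifier stands in for the
random forest wrapped by WFS-RF; any (rows, labels, features) -> score works.
"""


def minmax_fit(rows):
    """Return per-feature (min, max); fit on training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def minmax_transform(rows, params, lo=0.0, hi=1.0):
    """Apply x' = lo + (x - min) * (hi - lo) / (max - min) to every feature.

    Constant features (max == min) map to lo to avoid division by zero.
    """
    scaled = []
    for row in rows:
        new_row = []
        for x, (mn, mx) in zip(row, params):
            span = mx - mn
            new_row.append(lo + (x - mn) * (hi - lo) / span if span else lo)
        scaled.append(new_row)
    return scaled


def loo_nearest_centroid_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Hypothetical stand-in for the random-forest evaluator in WFS-RF.
    """
    correct = 0
    for i, (x, y_true) in enumerate(zip(rows, labels)):
        sums, counts = {}, {}
        for j, (row, y) in enumerate(zip(rows, labels)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(y, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[y] = counts.get(y, 0) + 1
        centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        pred = min(centroids, key=lambda y: sum(
            (x[f] - c) ** 2 for f, c in zip(features, centroids[y])))
        correct += pred == y_true
    return correct / len(rows)


def forward_wrapper_selection(rows, labels, evaluate, max_features=None):
    """Greedily add the feature that most improves the wrapped score.

    Stops when no remaining feature improves it; ties go to the higher index.
    """
    remaining = list(range(len(rows[0])))
    selected, best_score = [], float("-inf")
    limit = max_features or len(remaining)
    while remaining and len(selected) < limit:
        score, feature = max(
            (evaluate(rows, labels, selected + [f]), f) for f in remaining)
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


if __name__ == "__main__":
    # Synthetic toy data: feature 0 separates the classes, features 1-2 are noise.
    X = [[1.0, 5.0, 300.0], [1.2, 1.0, 100.0], [0.9, 3.0, 200.0],
         [5.0, 4.0, 150.0], [5.2, 2.0, 250.0], [4.8, 5.0, 120.0]]
    y = [0, 0, 0, 1, 1, 1]
    X_scaled = minmax_transform(X, minmax_fit(X))
    print(forward_wrapper_selection(X_scaled, y, loo_nearest_centroid_accuracy))
    # expected: ([0], 1.0)
```

To use a real random forest, pass a different `evaluate` callable, for example one returning cross-validated accuracy on the chosen columns.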

English Abstract


Cardiovascular disease (CVD) is one of the leading causes of death worldwide, and detecting it at an early stage is crucial in healthcare. The large number of features in an arrhythmia dataset lowers the accuracy of multiclass classification. An effective feature selection method is therefore needed to find appropriate features and thereby improve classification performance. The value ranges of features can also differ widely, so normalization is needed to rescale feature values to a common range. The support vector machine (SVM) is widely regarded as an efficient supervised learning method for classification. However, its success depends on a good choice of its parameters, which should therefore be tuned with an optimization method to maximize classification effectiveness. The proposed approach, wrapper feature selection with MinMax scaling and SVM with grid search for cardiovascular disease classification (WMSG-CDC), addresses these issues. We apply wrapper feature selection using random forest (WFS-RF) to determine the appropriate features for classification from a given dataset. In data preprocessing, MinMax scaling rescales the selected features to a fixed range. Grid search is then used to optimize the SVM parameters and increase classification performance. Finally, the SVM is combined with the one-against-one (OAO), one-against-all (OAA), and error-correcting code (ECC) methods to categorize data into normal and abnormal arrhythmia classes. We evaluated the performance of the proposed approach in terms of accuracy, sensitivity, precision, and F1 score. Experimental results show that SVM with one-against-one achieves higher CVD classification performance than related works.
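The classification stage wraps binary SVMs in a multiclass strategy and tunes them by grid search. The sketch below abstracts the binary SVMs away and shows only the surrounding logic:

- OAO trains one classifier per class pair, k(k-1)/2 in total, and combines them by majority vote.
- OAA trains k class-versus-rest classifiers and takes the largest decision value.
- ECC gives each class a binary codeword and decodes by minimum Hamming distance, so some wrong binary outputs can still be corrected.
- `grid_search` scores every combination of candidate values with a caller-supplied cross-validation scorer. Typical candidates are the penalty C and, assuming an RBF kernel, γ.

The tie-breaking rules, the 7-bit example code matrix, the function names, and the scorer are assumptions for illustration. The metric functions use the standard definitions of accuracy, sensitivity (recall), precision, and F1.

```python
"""Sketch of multiclass decision rules, grid search, and metrics.

Binary SVM training is abstracted away; all names are hypothetical.
"""
from collections import Counter
from itertools import combinations, product


def oao_pairs(classes):
    """One-against-one: one binary SVM per unordered class pair, k(k-1)/2 total."""
    return list(combinations(sorted(classes), 2))


def oao_vote(pair_winners):
    """Majority vote over pairwise winners; ties go to the smallest label (assumed)."""
    counts = Counter(pair_winners)
    return min(counts, key=lambda c: (-counts[c], c))


def oaa_decide(decision_values):
    """One-against-all: pick the class whose class-vs-rest SVM scores highest."""
    return max(decision_values, key=decision_values.get)


def ecc_decode(bits, code_matrix):
    """ECC: predict the class whose codeword is closest in Hamming distance."""
    def hamming(c):
        return sum(b != w for b, w in zip(bits, code_matrix[c]))
    return min(code_matrix, key=lambda c: (hamming(c), c))


def grid_search(param_grid, cv_score):
    """Score every parameter combination exhaustively; return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_score(params)  # e.g. mean k-fold accuracy of an SVM with params
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_metrics(y_true, y_pred, positive):
    """Sensitivity (recall), precision and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


if __name__ == "__main__":
    print(len(oao_pairs([0, 1, 2, 3])))  # 6 pairwise SVMs for 4 classes
    # Example 7-bit code matrix for 4 classes: every pair of codewords differs
    # in 4 bits, so any single wrong binary output is corrected.
    code = {0: (1, 1, 1, 1, 1, 1, 1), 1: (0, 0, 0, 0, 1, 1, 1),
            2: (0, 0, 1, 1, 0, 0, 1), 3: (0, 1, 0, 1, 0, 1, 0)}
    print(ecc_decode((1, 0, 1, 1, 0, 0, 1), code))  # class 2, first bit flipped -> 2
    print(class_metrics([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 0, 2], positive=1))
```

A wider grid costs proportionally more cross-validation runs, one per combination. Coarse logarithmic steps (for example powers of 2) are a common way to keep the search affordable before refining around the best cell.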
