
An Improved Feature Selection Method toward Precise Disease Classification

Advisor: 薛幼苓

Abstract


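In emergency medicine, correctly and quickly assessing a patient's condition and deciding on treatment is an important issue. When an emergency room receives many patients at once and their number exceeds what the emergency physicians can handle, the patients must be quickly triaged by severity so that treatment is as efficient as possible. In an emergency, however, doctors have only limited information about each patient, and many symptoms are very similar, so judging a patient's severity on the spot is very difficult. This study therefore applies data mining techniques to electronic health records to build a classification model that helps doctors diagnose quickly. The data are emergency room visit records from Chiayi Christian Hospital between 2012 and 2013, covering 15,775 chest pain patients, 338 of whom were critical. The study focuses on building the classification model more efficiently so that it can help doctors identify critical chest pain patients. To this end, we propose an improved fast clustering-based feature selection algorithm (i-FAST) that removes irrelevant and redundant features. The algorithm has two parts. In the first part, the ReliefF algorithm removes irrelevant features. In the second part, the correlation between every pair of features is computed with each of three mutual information measures, (1) symmetric uncertainty, (2) information gain, and (3) gain ratio; a minimum spanning tree is then built from each set of correlations and partitioned to find representative features. Finally, data mining techniques are applied to the selected features to build a classifier for chest pain patients.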
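To make the first part concrete, here is a minimal pure-Python sketch of Kononenko's multi-class ReliefF estimator. It is an illustration, not the thesis's own code. Assumptions: features are numeric, per-feature differences are normalised by the observed value range, neighbours are found with the sum of those normalised differences, and the function name `relieff` and its parameters `k`, `m`, and `seed` are hypothetical. The abstract does not state which weight threshold i-FAST uses to discard features.

```python
import random
from collections import Counter


def relieff(X, y, k=3, m=None, seed=0):
    """Multi-class ReliefF weights, one per feature (higher means more relevant).

    X is a list of numeric feature vectors and y the class labels. m instances are
    sampled at random (all instances when m is None); for each sampled instance the
    k nearest hits and the k nearest misses from every other class are used.
    """
    n, d = len(X), len(X[0])
    # Normalise per-feature differences by the observed value range (1.0 if constant).
    lo = [min(row[a] for row in X) for a in range(d)]
    hi = [max(row[a] for row in X) for a in range(d)]
    span = [(hi_a - lo_a) or 1.0 for lo_a, hi_a in zip(lo, hi)]

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):
        return sum(diff(a, i, j) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    samples = list(range(n)) if m is None else random.Random(seed).sample(range(n), m)
    m = len(samples)
    w = [0.0] * d
    for i in samples:
        # k nearest neighbours of instance i within each class, excluding i itself.
        near = {
            c: sorted((j for j in range(n) if j != i and y[j] == c),
                      key=lambda j: dist(i, j))[:k]
            for c in prior
        }
        for a in range(d):
            # Penalise features that differ between i and its nearest hits.
            w[a] -= sum(diff(a, i, j) for j in near[y[i]]) / (m * k)
            # Reward features that differ from nearest misses, weighted by class prior.
            for c in prior:
                if c != y[i]:
                    miss = sum(diff(a, i, j) for j in near[c])
                    w[a] += prior[c] / (1 - prior[y[i]]) * miss / (m * k)
    return w
```

For instance, `relieff([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 1, 1], k=1)` returns `[1.0, -1.0]`: feature 0 determines the class and receives the highest weight, while feature 1 is noise. A filter such as keeping only features with a positive weight (a hypothetical threshold) would then discard feature 1.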

English Abstract


In a hospital emergency room, patients need to be diagnosed quickly so that doctors can decide on the required treatment, and doctors must order treatment according to each patient's level of emergency. However, it is hard to diagnose a disease immediately when a patient arrives at the emergency room, because different diseases may present with similar symptoms. In this research, we use data mining techniques to analyze electronic health records (EHRs) to help doctors diagnose patients promptly. The dataset used in this research was collected from the emergency room of Chiayi Christian Hospital, Chiayi City, Taiwan, and contains medical records from 2012 to 2013. The objective is to build a classifier that identifies critical chest pain patients. For this purpose, we design a feature selection algorithm, the improved fast clustering-based feature subset selection algorithm (i-FAST), which can be combined with any existing classifier. i-FAST removes irrelevant and redundant features and retains the features that matter for classifier construction. First, irrelevant features are removed by ReliefF. Second, the distances between features are calculated from three mutual information measures: symmetric uncertainty, information gain, and gain ratio. We then construct a minimum spanning tree (MST) over the features using these distances and partition the tree to select representative features. Finally, a classifier is built on the selected features to identify chest pain patients. The experimental results show that our classifier integrated with the i-FAST algorithm outperforms the classifier integrated with the FAST algorithm.
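The second part can be sketched in the same hedged way. The code below is a minimal sketch assuming discrete (already discretised) feature columns. It implements the three measures, builds an MST with Prim's algorithm on the complete feature graph using `1 - measure` as the distance (our assumption; the abstract does not give the exact distance), removes every tree edge whose two endpoints are each more correlated with the class than with each other (following the partition rule described for the original FAST algorithm), and keeps the most class-relevant feature of each resulting subtree. Names such as `select_representatives` are hypothetical, and because gain ratio depends on argument order, the sketch averages it over both directions.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())


def info_gain(xs, ys):
    """Information gain IG = H(X) - H(X | Y), i.e. the mutual information I(X; Y)."""
    n = len(ys)
    groups = {}
    for x, yv in zip(xs, ys):
        groups.setdefault(yv, []).append(x)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())  # H(X | Y)
    return entropy(xs) - cond


def gain_ratio(xs, ys):
    """Information gain normalised by the split information H(Y)."""
    h = entropy(ys)
    return info_gain(xs, ys) / h if h > 0 else 0.0


def symmetric_uncertainty(xs, ys):
    """SU = 2 * IG / (H(X) + H(Y)), which lies in [0, 1]."""
    h = entropy(xs) + entropy(ys)
    return 2 * info_gain(xs, ys) / h if h > 0 else 0.0


def select_representatives(columns, labels, measure=symmetric_uncertainty):
    """FAST-style step: MST over features, cut weak edges, keep one feature per subtree."""
    d = len(columns)
    rel = [measure(col, labels) for col in columns]  # feature-class relevance
    raw = [[measure(columns[i], columns[j]) for j in range(d)] for i in range(d)]
    # Symmetrise, since gain ratio depends on argument order.
    corr = [[(raw[i][j] + raw[j][i]) / 2 for j in range(d)] for i in range(d)]

    # Prim's algorithm on the complete graph, with distance = 1 - correlation (assumed).
    best = {v: (1 - corr[0][v], 0) for v in range(1, d)}  # vertex -> (distance, tree endpoint)
    edges = []
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, src = best.pop(v)
        edges.append((src, v))
        for u in best:
            if 1 - corr[v][u] < best[u][0]:
                best[u] = (1 - corr[v][u], v)

    # Partition: drop an edge when both endpoints are more relevant to the class
    # than they are correlated with each other.
    kept = [(i, j) for i, j in edges if not (corr[i][j] < rel[i] and corr[i][j] < rel[j])]

    # Union-find over the kept edges to recover the subtrees (feature clusters).
    root = list(range(d))

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]
            v = root[v]
        return v

    for i, j in kept:
        root[find(i)] = find(j)
    clusters = {}
    for v in range(d):
        clusters.setdefault(find(v), []).append(v)

    # Each cluster is represented by its most class-relevant feature.
    return sorted(max(c, key=lambda v: rel[v]) for c in clusters.values())
```

For example, with columns `[0, 0, 1, 1]`, an exact duplicate `[0, 0, 1, 1]`, and an independent column `[0, 1, 0, 1]`, and labels `[0, 0, 0, 1]` (the AND of the two distinct columns), the duplicate falls into the first feature's cluster and the call returns `[0, 2]`. Passing `measure=info_gain` or `measure=gain_ratio` swaps in the other two measures.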
