
Prediction of Diabetes Mellitus and Prediabetes Based on Machine Learning Algorithms

Advisor: 蘇家玉

Abstract


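Approximately 360 million people worldwide have diabetes, and 2009 data from the International Diabetes Federation indicate that the number may reach 550 million by 2030. The diabetic population in Taiwan has also risen steadily in recent years. According to a 2012 announcement by the Ministry of Health and Welfare, diabetes has ranked among the top five of Taiwan's ten leading causes of death since 1987, with a mortality rate of about 26.5%, and the number of patients has grown by 7% per year since 2000. According to the American Diabetes Association's 2014 recommendations, asymptomatic adults who are overweight (BMI ≥ 24 kg/m²) and have any one of its ten listed risk factors, such as physical inactivity, hyperlipidemia, or hypertension, are advised to undergo diabetes screening. Studies abroad have also applied machine learning algorithms to diabetes prediction. This study therefore takes laboratory test items commonly ordered in Taiwanese outpatient clinics and applies several machine learning algorithms, including classification techniques such as artificial neural networks and support vector machines, to compare how accurately each method predicts diabetes.

The study sample consists of laboratory test data from returning patients aged 20 or older across all departments of a teaching hospital in Taipei, collected from January 2012 to December 2013. Data preprocessing was performed with SAS Enterprise Guide 5.1 in SAS 9.3. After cases with incomplete test reports were excluded, 339 records remained: 86 diabetes mellitus (DM), 111 non-diabetes (Non-DM), and 142 prediabetes (Pre-DM). The records were divided into two groups: Group 1 (DM vs. Non-DM) and Group 2 (Pre-DM vs. Non-DM). Using SAS Enterprise Miner 12.1, 80% of each group was used as the training set, 10% as the validation set, and the remaining 10% as the test set. The built-in decision tree, artificial neural network, logistic regression, and support vector machine models were then applied to both groups for predictive analysis.

The results show that the approach achieves accurate prediction. In Group 1, logistic regression achieved the best accuracy (85.8%) and AUC (0.898). In Group 2, the decision tree achieved the best accuracy (77.8%), while the support vector machine achieved the best AUC (1.000). In addition, the variables included in this study match the items used clinically to support a diagnosis of diabetes, and the method can identify variables consistent with the characteristics of diabetes and prediabetes.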
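The 80/10/10 partition described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the SAS Enterprise Miner procedure used in the thesis: the function name `stratified_split`, the fixed seed, and the choice to split class by class (stratification) are assumptions, and the labels below reproduce only the Group 1 class counts, not any patient data.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Partition record indices into train/validation/test, class by class.

    Hypothetical helper: the abstract does not say whether the split
    was stratified, so stratification is an assumption made here.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_validation = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        validation.extend(indices[n_train:n_train + n_validation])
        # Whatever remains after the first two slices goes to the test set.
        test.extend(indices[n_train + n_validation:])
    return train, validation, test


# Group 1 class counts from the abstract: 86 DM and 111 Non-DM records.
group1_labels = ["DM"] * 86 + ["Non-DM"] * 111
train, validation, test = stratified_split(group1_labels)
print(len(train), len(validation), len(test))
```

With only 197 records in Group 1, a 10% test set holds roughly 20 cases, so each test case moves the reported accuracy by several percentage points.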

Parallel Abstract


It has been estimated that approximately 360 million people worldwide have diabetes mellitus, and that this number may reach 552 million by 2030. In Taiwan, diabetes mellitus ranks fifth among the ten leading causes of death, with a 26.5% mortality rate, and the diabetic population has increased by 7% annually since 2000. The American Diabetes Association also suggests that screening and early detection of undiagnosed diabetes mellitus are crucial in preventive medicine, especially for overweight adults with other risk factors, including physical inactivity, hyperlipidemia, and hypertension. Several previous studies have applied machine learning algorithms to the prediction of diabetes mellitus, with an average accuracy of around 79.0%.

In this study, we developed a novel prediction method in which discriminative features from physiological and blood biochemical data are incorporated into various machine learning algorithms. The patient data were collected from January 2012 to December 2013 at a medical center in northern Taiwan. In data preprocessing, we used SAS Enterprise Guide version 5.1 to exclude patients with missing values and to retain only those aged 20 or older. The final data set, composed of 339 individuals (86 with confirmed diabetes, 111 without diabetes, and 142 with prediabetes), was used as input for SAS Enterprise Miner version 12.1. We organized the data set into two groups: Group 1 (DM vs. Non-DM) and Group 2 (Pre-DM vs. Non-DM). Each group was further divided into a training set (80%) for model construction, a validation set (10%) for parameter selection, and a test set (10%) for performance evaluation. We applied four machine learning algorithms to construct predictive models: decision trees (DT), artificial neural networks (ANN), logistic regression (LR), and support vector machines (SVM).

Experimental results show that our method achieves accurate predictive performance. For prediction of diabetes in Group 1, LR attained the highest accuracy (85.8%) and area under the curve (AUC, 0.898). For prediction of prediabetes in Group 2, DT obtained the best accuracy (77.8%) and SVM the best AUC (1.000). Moreover, the biomedical features used in our method correspond well with medical domain knowledge. This demonstrates that our approach is able to identify discriminative features that distinguish diabetes and prediabetes patients.
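The abstract reports accuracy and AUC separately, and the best model can differ between the two. A minimal sketch of how the two metrics are computed may help explain why. The functions `accuracy` and `auc`, the 0.5 threshold, and the toy scores are all assumptions for illustration; they are not taken from the thesis and say nothing about how its reported values arose.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed score threshold."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as one half)."""
    positives = [s for s, t in zip(scores, y_true) if t == 1]
    negatives = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: every positive outranks every negative,
# but two positives fall below the 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.45, 0.4, 0.3, 0.2, 0.1]

print(f"accuracy = {accuracy(y_true, scores):.3f}")
print(f"AUC      = {auc(y_true, scores):.3f}")
```

In this toy case the ranking is perfect, so the AUC is 1.0, while accuracy at the 0.5 threshold is only 4/6. AUC measures ranking quality across all thresholds, whereas accuracy depends on one chosen cutoff, which is one reason the two metrics can favor different models.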

