  • Thesis


The Risk Factors of Pre-diabetes to Predict by Data Mining — Take a Regional Hospital in Pingtung as an Example

Advisor: 蔡正發




With socio-economic prosperity and changing lifestyles, the global prevalence of diabetes has been rising year by year, and diabetes is one of the ten leading causes of death in Taiwan. Its complications include foot neuropathy that may require amputation, retinopathy that can lead to blindness, and renal failure; it also raises the risk of heart disease and stroke. If people at high risk are identified early and given appropriate diet and exercise advice, progression to diabetes can be prevented. This study applied data mining techniques to adult health examination reports from a regional hospital, with the aim of developing a predictive model for pre-diabetes and examining its risk factors. The study samples were first extracted with the Simple K-means clustering algorithm, and six classifiers (decision tree, Logistic Regression, Multilayer Perceptron, SMO, Naive Bayes, and RBF Network) were then compared to find the best predictive model. The results showed that: (1) the Logistic Regression model performed best, with an accuracy of 99.8%, a sensitivity of 98.7%, a specificity of 99.9%, and a positive predictive value (PPV) of 99.1%; (2) the experiments identified five risk factors for pre-diabetes: systolic blood pressure, diastolic blood pressure, body mass index (BMI), cholesterol, and fasting blood glucose; (3) the decision tree yielded 11 rules that are practically useful and easy to interpret. Before the disease develops, the model can serve as a diagnostic aid for medical personnel or as a tool for personal disease prevention, so that people at risk are identified early, receive appropriate diet and exercise advice, reduce their risk factors, and avoid progressing to diabetes.
Keywords: Pre-diabetes, Predictive model, Risk factor, Data mining
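
The four evaluation measures reported in the abstract (accuracy, sensitivity, specificity, and PPV) all derive from a binary confusion matrix. The sketch below shows the standard definitions in Python, treating pre-diabetes as the positive class. It is an illustration only: the record does not show how the study computed these values, the names `ConfusionMatrix` and `confusion_from_labels` are hypothetical, and the ten labels are made up rather than taken from the hospital data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts; the positive class is pre-diabetes."""
    tp: int  # pre-diabetic, predicted pre-diabetic
    fp: int  # not pre-diabetic, predicted pre-diabetic
    tn: int  # not pre-diabetic, predicted not pre-diabetic
    fn: int  # pre-diabetic, predicted not pre-diabetic

    # Each property raises ZeroDivisionError if its denominator is zero
    # (for example, sensitivity when there are no positive cases).

    @property
    def accuracy(self) -> float:
        # Share of all cases classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def sensitivity(self) -> float:
        # Share of true pre-diabetes cases that the model flags.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of non-pre-diabetes cases that the model clears.
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Share of positive predictions that are truly pre-diabetic.
        return self.tp / (self.tp + self.fp)


def confusion_from_labels(actual: Sequence[int],
                          predicted: Sequence[int]) -> ConfusionMatrix:
    """Tally counts from paired 0/1 labels (1 = pre-diabetes)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        elif a == 1 and p == 0:
            fn += 1
        else:
            raise ValueError(f"labels must be 0 or 1, got {a!r} and {p!r}")
    return ConfusionMatrix(tp, fp, tn, fn)


# Made-up labels for ten hypothetical patients, not data from the study.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
cm = confusion_from_labels(actual, predicted)
print(f"accuracy={cm.accuracy:.3f} sensitivity={cm.sensitivity:.3f} "
      f"specificity={cm.specificity:.3f} ppv={cm.ppv:.3f}")
```

Reporting sensitivity and PPV alongside accuracy is useful for a screening model: when positive cases are a minority, a classifier can reach high accuracy while still missing many of them.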




