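Applying Machine Learning Algorithms to Build a Prediction Model for High-Risk Metabolic Syndrome among Taiwanese Adults Aged 65 and Over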

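Adult preventive health care services are currently delivered in two phases. The first phase covers basic information, a questionnaire (disease history, lifestyle habits, etc.), a physical examination (height, weight, blood pressure, body mass index, waist circumference), and laboratory tests (blood biochemistry). The second phase consists mainly of interpreting the laboratory results, patient education, and health counseling. However, a certain number of working days must pass after the first phase before the second phase can be carried out. To shorten the time examinees wait for the second phase and to give high-risk individuals more opportunities for health education, this study targets people aged 65 and over who attend the first phase of adult preventive health care services and seeks the best model for predicting high risk of metabolic syndrome, so that health education can be offered to high-risk individuals already during the first phase. The study subjects were older adults (aged 65 and over) who received adult preventive health care services in 2013. The input variables were first-phase items: the physical examination (height, weight, blood pressure, body mass index, waist circumference), disease history, and lifestyle habits. Three machine learning algorithms were used to build prediction models: Artificial Neural Network (ANN), Random Forest (RF), and Support Vector Machine (SVM). The models were compared by sensitivity and specificity from the confusion matrix and by the area under the curve (AUC). The sensitivities of ANN, RF, and SVM were 64.27%, 33.16%, and 59.64%, respectively; the specificities were 89.73%, 92.94%, and 79.82%, respectively. In terms of AUC, ANN (0.885) performed best, followed by RF (0.793), with SVM (0.756) lowest. Overall, when first-phase data are used to predict older adults at high risk of metabolic syndrome, ANN performed best in the confusion-matrix comparison and showed excellent discriminative ability.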

Keywords

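Metabolic syndrome; Prediction model; Machine learning algorithm; Preventive health care for older adults; Confusion matrix; Artificial neural network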

Parallel Abstract

At present, adult preventive health care services are delivered in two phases. The first phase includes collecting basic information, completing questionnaires (disease history, lifestyle, etc.), a physical examination (height, weight, blood pressure, body mass index (BMI), waist circumference), and a blood test (blood biochemistry). The second phase comprises interpretation of the laboratory tests, patient education, and health consultation. A certain number of days must pass between the first and second phases. This study aims to reduce patients' waiting time and to increase the opportunity for high-risk patients to receive health education. It targets people aged 65 and over who participate in the first phase of preventive health care services and seeks the best predictive model for identifying those at high risk of metabolic syndrome. The study population consisted of people aged 65 and over who received preventive health care services in 2013. The input variables were the items checked in the first phase of the service, including the physical examination (height, weight, blood pressure, BMI, waist circumference), disease history, and lifestyle. Three machine learning algorithms were used: Artificial Neural Network (ANN), Random Forest (RF), and Support Vector Machine (SVM). The models were compared using sensitivity and specificity from the confusion matrix and the area under the curve (AUC). Among the three prediction models, the sensitivities of ANN, RF, and SVM were 64.27%, 33.16%, and 59.64%, respectively, and the specificities were 89.73%, 92.94%, and 79.82%, respectively. In terms of AUC, ANN (0.885) performed best, followed by RF (0.793), and SVM (0.756) scored lowest.
In conclusion, when first-phase data from preventive health care services are used to predict people at high risk of metabolic syndrome, the confusion-matrix comparison shows that ANN performed best overall.
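The three comparison metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = high risk of metabolic syndrome, 0 = not high risk) and a model that outputs a risk score. The toy labels, the scores, the 0.5 threshold, and all function names are hypothetical and are not taken from the study's data or code.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels where 1 = high risk."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly high-risk people that the model flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-high-risk people that the model correctly leaves unflagged."""
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 high-risk and 6 non-high-risk examinees.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.35, 0.2, 0.2, 0.1, 0.05]
threshold = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
print(f"AUC         = {auc(y_true, scores):.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is one reason a model such as RF can combine the highest specificity with a low sensitivity.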

Parallel Keywords

Metabolic syndrome; Prediction model; Machine learning; Adult preventive health care aged 65 and over; Confusion matrix; Artificial neural network