According to statistics from the Ministry of Health and Welfare (MOHW), Executive Yuan, R.O.C. (Taiwan), chronic diseases dominated Taiwan's ten leading causes of death in 2014. The main factors underlying these causes of death are closely tied to the onset of hypertension, hyperlipidemia, and diabetes, which account for a mortality rate of 219.2 deaths per 100,000 population. Chronic diseases cannot be taken lightly and have become the number one threat to public health. As a preventive measure, it is urgent to apply data mining techniques to health examination data in order to uncover the warning signs that precede chronic disease, and thereby prevent or delay its onset.

This study uses the health examination database of a regional teaching hospital in central-southern Taiwan as its data source, drawing on both health examination data and lifestyle data. Several data mining techniques were built and tested, and classification techniques were used to construct prediction models for hypertension, hyperlipidemia, and diabetes. Repeated examinations of the same individual at the sample hospital were included in the analysis, so that the disease prediction models are built from health examination data collected at different times. In the experiments, the prediction model built with the random forest classifier performed best, with an AUC of 76.60%, making it the best classifier in this study. The attribute association analysis found that age and a history of diabetes are highly correlated with developing hypertension, hyperlipidemia, and diabetes. We hope these results will remind people aged 40 and over to use periodic health examinations to monitor their health and correct unhealthy lifestyle habits. The results may also help healthcare professionals recognize the warning signs that precede chronic disease and provide appropriate health counseling, encouraging the public to take measures that reduce their risk of disease.
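The classifiers here are compared by AUC, the area under the ROC curve. As a minimal sketch of what that number measures, and not the study's actual evaluation code, AUC can be read as the probability that a randomly chosen patient who has the disease receives a higher predicted risk score than a randomly chosen patient who does not. The function name `auc_score` and the labels and scores below are hypothetical illustrations, not data from the study.

```python
def auc_score(labels, scores):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney U form).

    labels: iterable of 0/1, where 1 marks a case with the target disease.
    scores: predicted risk scores, higher meaning more likely to have the disease.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = developed the disease, 0 = did not.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(f"AUC = {auc_score(labels, scores):.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 76.60% means the model ranks a diseased case above a non-diseased one in roughly three out of four such pairs. The pairwise loop is quadratic in the number of records; for a full examination database a rank-based implementation from an established library would be the more practical choice.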