Thesis

Using Machine Learning Techniques to Predict the Survival of Maintenance Hemodialysis Patients in Taiwan

Advisor: 李友專

Abstract


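Background: In Taiwan, patients treated with hemodialysis accounted for 89.6% of patients with end-stage renal disease in 2010. Multiple comorbidities affect the survival of hemodialysis patients, and predicting their survival accurately is an important issue because it bears on physicians' decisions, the national financial burden, and patients' choices. Taiwan's National Health Insurance Research Database was released in 2000 and has been used in a great many studies. Weka is open-source data mining software that provides the workflows of commonly used machine learning methods, so users can easily load their data for analysis. In this study, we used hemodialysis patients' data from the National Health Insurance Research Database and the machine learning techniques in Weka to build a good model for predicting their survival.

Method: We extracted data covering 1997 to 2008 from the one-million-person National Health Insurance database. We selected patients older than 20 years who had been on hemodialysis for at least 90 days, with no record of kidney transplantation and no record of peritoneal dialysis. For these patients we collected age, sex, survival time, and comorbidities: diabetes mellitus, heart failure, stroke, chronic obstructive pulmonary disease, hepatitis C virus infection, cancer, arrhythmia, anxiety, atherosclerotic heart disease, bone fracture, gastrointestinal bleeding, liver cirrhosis, hepatitis B virus infection, and a history of parathyroidectomy. Patients were divided into three groups by the survival horizon to be predicted: 2, 6, and 10 years, respectively. Each group was further divided into two subgroups, one whose survival was shorter than the horizon and one whose survival was longer. We then built prediction models from these data with Naive Bayes, support vector machines, multilayer perceptron, logistic regression, and random forest in Weka, and judged the quality of each model by the area under the receiver operating characteristic curve.

Result: A total of 2591 hemodialysis patients were included. Group 1 had 2114 patients, Group 2 had 1348, and Group 3 had 998. The longer-survival subgroups were younger in every group, and the differences were statistically significant. Women were more heavily represented in the longer-survival subgroups. Diabetes mellitus, heart failure, stroke, and chronic obstructive pulmonary disease were more common in the shorter-survival subgroups, with statistical significance. A record of parathyroidectomy was more common in the longer-survival subgroups. Conversely, in Group 1 and Group 2, hepatitis C virus infection was more common in the longer-survival subgroups. Atherosclerotic heart disease, bone fracture, gastrointestinal bleeding, and liver cirrhosis were more common in the shorter-survival subgroups, but the differences were statistically significant only in Group 2 and Group 3. Except in Group 3, cancer was more common in the shorter-survival subgroup (Group 1 and Group 2). After attribute selection, we obtained for each of the three groups a ranking of the attributes by their contribution to the prediction model; age and diabetes mellitus ranked first and second, respectively, in all three groups. In terms of the area under the receiver operating characteristic curve, age alone was enough to capture the predictive power of the whole model, and the other attributes had little influence.

Conclusion: Age by itself is a very strong predictor of survival in hemodialysis patients in Taiwan, and machine learning techniques can be widely applied in the medical field.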
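The abstract reports which comorbidities differed significantly between the shorter- and longer-survival subgroups, but it does not name the statistical test. A common test for comparing one proportion across two subgroups is Pearson's chi-square test on a 2x2 table. The minimal Python sketch below assumes that test; the function name `chi_square_2x2` and the counts in the example are hypothetical and are not study data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no Yates correction) on a 2x2 table.

    Hypothetical layout:
        rows    = longer-survival subgroup, shorter-survival subgroup
        columns = comorbidity present, comorbidity absent
        [[a, b],
         [c, d]]
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the statistic is the square of a standard normal,
    # so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts only, not study data.
    stat, p = chi_square_2x2(30, 70, 50, 50)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

For example, 30 of 100 longer-survival patients versus 50 of 100 shorter-survival patients with a given comorbidity gives a statistic of about 8.33 and a p-value below 0.05. When expected cell counts are small, Fisher's exact test is the usual alternative.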

Keywords

Hemodialysis, Survival, Machine Learning

Parallel Abstract (English)


Background: End-stage renal disease is highly prevalent in Taiwan, and 89.6% of patients with end-stage renal disease were on hemodialysis in 2010. Multiple comorbidities can influence the survival of hemodialysis patients. Predicting survival is important because it bears on physicians' decision making, the national financial burden, and patients' choices. The National Health Insurance Research Database (NHIRD) of Taiwan was released in 2000 and has been widely used in many studies. Weka is open-source software for data mining tasks; it provides popular machine learning algorithms that can easily be applied to a dataset. In this study, we used hemodialysis patients' data from the NHIRD to construct an optimal prediction model for their survival with the machine learning techniques in Weka.

Method: We extracted data on approximately one million patients from the NHIRD of Taiwan covering 1997 to 2008. We recruited patients who had been on hemodialysis for more than 90 days and were older than 20 years. Patients who had received a kidney transplant or had undergone peritoneal dialysis were excluded. Their gender, age, survival length, and comorbidities, namely diabetes mellitus, congestive heart failure, cerebrovascular accident, chronic obstructive pulmonary disease, hepatitis C virus infection, cancer, arrhythmia, anxiety, atherosclerotic heart disease, bone fracture, gastrointestinal bleeding, liver cirrhosis, hepatitis B virus infection, and parathyroidectomy history, were recorded. Three groups were created according to the survival horizon to be predicted. Group 1 was divided into two subgroups according to whether the patient's survival length was longer than 2 years. Groups 2 and 3 were likewise each divided into two subgroups according to whether the patient's survival length was longer than 6 and 10 years, respectively.
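The inclusion criteria and the two-subgroup split for each horizon can be expressed as a small filter. This is a minimal Python sketch under stated assumptions: the `Patient` record and its field names are hypothetical (the NHIRD does not provide records in this form), age is assumed to be measured at the start of hemodialysis, and a patient who was still alive but followed for less than the horizon is treated as unclassifiable and dropped. That censoring rule is an assumption the abstract does not state. The Chinese abstract says "at least 90 days" of hemodialysis while the English one says "more than 90 days"; the sketch follows the former.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical per-patient record derived from claims data."""
    patient_id: str
    age: float                     # years, assumed measured at hemodialysis start
    hd_days: int                   # days on maintenance hemodialysis
    had_transplant: bool           # any kidney transplant record
    had_peritoneal_dialysis: bool  # any peritoneal dialysis record
    survival_years: float          # follow-up from hemodialysis start
    died: bool                     # death observed within follow-up


def eligible(p: Patient) -> bool:
    """Inclusion/exclusion criteria stated in the abstract."""
    return (
        p.age > 20
        and p.hd_days >= 90  # Chinese abstract: "at least 90 days"
        and not p.had_transplant
        and not p.had_peritoneal_dialysis
    )


def label(p: Patient, horizon_years: float) -> Optional[int]:
    """1 = survived beyond the horizon, 0 = died within it, None = censored (assumption)."""
    if p.survival_years > horizon_years:
        return 1
    if p.died:
        return 0
    return None  # alive but followed for less than the horizon: drop


def build_group(patients: list[Patient], horizon_years: float) -> list[tuple[Patient, int]]:
    """Eligible patients paired with their subgroup label for one horizon."""
    group = []
    for p in patients:
        if eligible(p):
            y = label(p, horizon_years)
            if y is not None:
                group.append((p, y))
    return group
```

Calling `build_group(patients, 2)`, `build_group(patients, 6)`, and `build_group(patients, 10)` would produce Groups 1 to 3. Under the censoring assumption, a longer horizon can only leave the same number of classifiable patients or fewer.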
The data were entered into the machine learning tools in Weka: Naive Bayes, support vector machines, multilayer perceptron, logistic regression, and random forests. We identified the best prediction model according to the area under the receiver operating characteristic curve (AUROC).

Result: A total of 2591 hemodialysis patients were included in this study, with 2114, 1384, and 998 patients in Group 1, Group 2, and Group 3, respectively. In all three groups, patients in the longer-survival subgroups were significantly younger. The proportion of female patients was higher in all longer-survival subgroups. Congestive heart failure, cerebrovascular accident, chronic obstructive pulmonary disease, and diabetes mellitus were more prevalent in the shorter-survival subgroups, with statistical significance in all three groups. A history of parathyroidectomy was more common in the longer-survival subgroups, again with statistical significance in all three groups. The longer-survival subgroups had a higher proportion of hepatitis C virus-infected patients than the shorter-survival subgroups, which was statistically significant in Group 1 (p=0.043) and Group 2 (p=0.014). Atherosclerotic heart disease, bone fracture, gastrointestinal bleeding, and liver cirrhosis were more prevalent in the shorter-survival subgroups, but the differences were statistically significant only in Group 2 and Group 3. The proportion of patients with cancer was higher in the shorter-survival subgroups in Group 1 and Group 2, but not in Group 3. Machine learning with attribute selection yielded a ranking of the attributes by their contribution to the prediction model: age ranked first and diabetes mellitus second in all groups. In Group 1, the logistic regression model reached an AUROC of 0.708 with age as the only attribute and its highest AUROC, 0.715, with the top 8 attributes.
In Group 2, the logistic regression model reached an AUROC of 0.752 with age alone and its highest AUROC, 0.778, with the top 12 attributes. In Group 3, it reached an AUROC of 0.809 with age alone and its highest AUROC, 0.858, with the top 3 attributes. These findings indicate that age plays an important role in predicting the survival of hemodialysis patients in Taiwan.

Conclusion: Age per se is the strongest attribute in the prediction model for the survival of maintenance hemodialysis patients in Taiwan. Machine learning techniques can be widely used in the medical field.
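The AUROC comparisons reported above (age alone versus the top-k ranked attributes in logistic regression) can be illustrated with a small, self-contained Python sketch. It is not Weka's `weka.classifiers.functions.Logistic`, which uses a ridge estimator; instead it fits an unregularized logistic regression by batch gradient descent on standardized attributes, scores a held-out split, and computes AUROC as the Mann-Whitney probability that a randomly chosen long-survival patient scores higher than a randomly chosen short-survival patient, with ties counted as 1/2 (the quantity Weka reports through `weka.classifiers.Evaluation.areaUnderROC`). The synthetic data generator, its effect sizes, the attribute names, the ranking, and the simple train/test split are hypothetical stand-ins; the abstract does not state how the models were validated.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(positive scores higher than negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute AUROC")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def standardize(train: list[list[float]], test: list[list[float]]):
    """Scale each column with the training mean and standard deviation."""
    d = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / len(train)) or 1.0
        for j in range(d)
    ]

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

    return scale(train), scale(test)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def top_k_auroc(train, test, ranked_attrs, k):
    """Fit on the k highest-ranked attributes and return held-out AUROC."""
    cols = ranked_attrs[:k]
    X_tr = [[rec[c] for c in cols] for rec, _ in train]
    X_te = [[rec[c] for c in cols] for rec, _ in test]
    X_tr, X_te = standardize(X_tr, X_te)
    w, b = fit_logistic(X_tr, [y for _, y in train])
    scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x in X_te]
    return auroc(scores, [y for _, y in test])


def make_synthetic(n: int, seed: int = 0):
    """Hypothetical synthetic cohort (NOT NHIRD data): label 1 = survived past the horizon."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        age = rng.uniform(20, 90)
        dm = 1.0 if rng.random() < 0.4 else 0.0
        # Assumed effect sizes: older age and diabetes lower the odds of long survival.
        logit = 2.5 - 0.07 * (age - 20) - 1.0 * dm
        data.append(({"age": age, "diabetes": dm}, 1 if rng.random() < sigmoid(logit) else 0))
    return data


if __name__ == "__main__":
    data = make_synthetic(600)
    train, test = data[:400], data[400:]
    ranking = ["age", "diabetes"]  # hypothetical ranking echoing the abstract's top two
    for k in range(1, len(ranking) + 1):
        print(f"top-{k} AUROC: {top_k_auroc(train, test, ranking, k):.3f}")
```

Because a one-attribute logistic model is a strictly monotone function of that attribute whenever its fitted weight is non-zero, its AUROC equals the AUROC of simply ranking patients by the attribute (here, by younger age). That is why an age-only model is a natural baseline against which to measure what the additional ranked attributes contribute.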

Parallel Keywords (English)

Hemodialysis (HD), Survival, Machine Learning

