  • Thesis

透過數據分析建立疾病風險預測模式

Establishment of Disease Risk Prediction Models Based on Data Analysis

Advisor: 趙榮耀

Abstract


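As medical technology continues to advance, average human life expectancy has increased year by year, yet disease remains the leading cause of death. In Taiwan, malignant tumors have ranked first among the ten leading causes of death throughout the past decade, and lung cancer ranks first among malignant tumors. Early diagnosis of cancer is very important: cancer diagnosed early can usually be cured by surgery and adjuvant therapy. Many explanations have been proposed for the causes of cancer, including comorbidities, genes, and dietary and lifestyle habits. Over recent decades, many physicians and scientists have searched for the causes of disease, but there is not yet conclusive evidence that the onset of a latent disease can be precisely identified. Disease prevention and early cancer diagnosis are therefore becoming increasingly important. Supported by scientific evidence, data analysis can now identify relationships between different diseases, so that when certain symptoms appear, cancer can be detected before it progresses and treated immediately to improve the prognosis.

This study aims to establish a precise framework for applying and analyzing medical data and, from it, to develop disease prediction models. Using Taiwan's health insurance database, we apply big-data analysis to search for potential factors linking different diseases to lung cancer, and we compare them with evidence-based medical research to confirm the correlations among the factors. Then, using the least absolute shrinkage and selection operator (LASSO) and deep neural networks (DNN), we design a new data-science-based process for building prediction models.

Finally, this study uses that process to build models for two different cases. The first is a ten-year lung cancer risk prediction model: a deep neural network calculates the likelihood of developing lung cancer from 13 different diseases, helping potential patients detect lung cancer earlier. The model achieves an accuracy of 85.4%, a sensitivity of 72.4%, a specificity of 85%, and an area under the ROC curve of 87.4% (95% CI, 0.8604-0.8885). The second is a prediction model for three-year survival after lung cancer treatment under different treatment methods, built from five factors using logistic regression and artificial neural networks. The best model in our study is the neural network model, with an accuracy of 82.7%, a sensitivity of 77.6%, a specificity of 76.8%, and an AUROC of 81%. Both models proposed in this study are more accurate than previous models. The first, grounded in scientific data analysis, provides a highly accurate disease prediction model. The second can serve as a decision-making reference for choosing among treatments, and it revealed that regular use of antihypertensive medication may be a protective factor in lung cancer treatment.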
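
For readers unfamiliar with LASSO, the textbook least-squares form of its objective is sketched below. This is the general formulation, not necessarily the variant used in this study: with a binary outcome such as a lung cancer diagnosis, the same L1 penalty is often attached to a logistic loss instead. The symbols are assumptions introduced here for illustration: $n$ patients, $p$ candidate factors, $y_i$ the outcome, $x_i$ the factor vector, and $\lambda \ge 0$ the penalty strength.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
\left\{ \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^{2}
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Larger values of $\lambda$ drive more coefficients $\beta_j$ exactly to zero, which is why LASSO can act as a factor-selection step before a predictive model is fitted.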

Parallel Abstract


With the continuous development of medical technology, the average human lifespan has been increasing year by year. However, disease is still the main cause of human death, and in Taiwan cancer has been the leading cause of death in recent decades. Cancer is usually curable by surgery and adjunctive therapy when diagnosed at an early stage. Early cancer can usually be operated on, but elderly patients may recover slowly from treatment. Being in bed for a few weeks affects the general condition of the elderly and can prevent them from fully recovering. Weighing the pros and cons of treatment for the elderly therefore requires balancing over-treatment against under-treatment, and early diagnosis and disease prevention are becoming more and more important. The relationships between different diseases can be identified through medical data analysis. When certain symptoms appear, cancer can be found before it is advanced, and immediate treatment leads to a better prognosis.

This study aims to establish an architecture for medical data analysis and to design a disease prediction model. Based on the National Health Insurance Research Database, we attempt to find potential correlates of disease and compare them with evidence-based medical research in order to confirm the correlations among factors. Finally, by employing the least absolute shrinkage and selection operator (LASSO) and deep neural network methods, we design a new approach to building prediction models.

Two models are established in this study using different methods. The first is a prediction model for lung cancer. A deep neural network was created to calculate the probability of lung cancer from different pre-diagnosed diseases, enabling earlier detection of lung cancer in potential patients. Based on only 13 factors, the model shows an accuracy of 85.4%, a sensitivity of 72.4%, and a specificity of 85%, as well as an area under the ROC curve (AUROC) of 87.4% (95% CI, 0.8604-0.8885). The second is a prediction model for the survival rate of lung cancer patients under different treatments. Based on only 5 factors, the model shows an accuracy of 82.7%, a sensitivity of 77.6%, and a specificity of 76.8%, as well as an AUROC of 81%. Both models perform better than those in previous studies. The first model is based on scientific data analysis and provides a highly accurate lung cancer prediction model. The second model can be used as a reference for decision-making among different treatment options. In addition, this study found that lung cancer patients with hypertension tend to have a lower death rate.
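
The four performance measures quoted above have standard definitions, and the following minimal sketch shows how they are commonly computed for a binary classifier. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = disease and 0 = no disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Fraction of true disease cases that the model flags (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of disease-free cases that the model clears (true negative rate)."""
    return tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy   ", accuracy(tp, fp, tn, fn))
print("sensitivity", sensitivity(tp, fn))
print("specificity", specificity(tn, fp))
print("AUROC      ", auroc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why abstracts like this one typically report both kinds of measure.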

