Application of Data Mining Techniques to Predicting Patient Survival Status

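In recent years, the mortality rate in Taiwan has been rising gradually, and the standardized mortality rates of the leading causes of death have all increased compared with previous years. Understanding the current occurrence of diseases and the composition of causes of death has therefore long been an important task in medical research. When data mining classification techniques are used to determine whether an individual with a particular disease has died, a model usually has to be built from factors related to that disease before it performs reasonably well. Working within limited resources, this study uses four years of detailed National Health Insurance (NHI) medical claims records. It introduces four data complexity indices as the basis for selecting classifiers, and it evaluates how six commonly used classification techniques perform on the NHI data using classification accuracy, sensitivity, specificity, and related measures. The results show that support vector machines (SVM) and linear discriminant analysis (LDA) achieve better classification accuracy, which indicates that they can more accurately predict whether an individual has died based solely on that individual's medical-visit records. We hope the findings can serve as a reference for medical research and help hospitals allocate medical resources sensibly each year and formulate preventive management measures.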
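The abstract does not say which four data complexity indices were used. As an illustration only, the sketch below assumes one widely used measure, the maximum Fisher's discriminant ratio (often labelled F1). For each feature it compares the squared gap between the two class means with the sum of the two class variances, then keeps the largest value. The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import fmean, pvariance


def fisher_ratio(values_a, values_b):
    """Fisher's discriminant ratio for one feature and two classes:
    (mean_a - mean_b)**2 / (var_a + var_b), using population variances."""
    mean_gap = fmean(values_a) - fmean(values_b)
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        # Both classes are constant on this feature: the feature separates
        # them perfectly if the means differ and carries no signal otherwise.
        return float("inf") if mean_gap != 0 else 0.0
    return mean_gap ** 2 / spread


def max_fisher_ratio(rows, labels):
    """F1-style complexity index: the largest per-feature Fisher ratio.

    A larger value means at least one feature alone separates the two
    classes well, i.e. the data set looks less complex to a classifier.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    class_a, class_b = classes
    ratios = []
    for j in range(len(rows[0])):
        col_a = [row[j] for row, y in zip(rows, labels) if y == class_a]
        col_b = [row[j] for row, y in zip(rows, labels) if y == class_b]
        ratios.append(fisher_ratio(col_a, col_b))
    return max(ratios)


# Hypothetical toy data: two features per patient, label 1 = died, 0 = survived.
rows = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.5], [8.0, 5.2], [9.0, 6.1], [10.0, 5.8]]
labels = [0, 0, 0, 1, 1, 1]
print(max_fisher_ratio(rows, labels))
```

In this toy example the first feature separates the classes almost completely (its ratio is about 36.75), while the second gives a ratio near 0.13, so the index reports the first. A real study would compute several such indices and compare them across data sets before choosing classifiers.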

Keywords

classifier; classification accuracy; data complexity; data mining

English Abstract

Mortality has increased gradually in recent years, and the standardized mortality rates of the leading causes of death have also risen compared with previous years. Understanding the current occurrence of diseases and the composition of causes of death has therefore become an important task in healthcare research. When data mining classification techniques are used to determine whether an individual with a certain disease has died, a model usually has to be built from factors related to that disease to achieve good results. This research uses four years of National Health Insurance data, applies four data complexity indices to screen six commonly used classification techniques, and evaluates their classification performance on the National Health Insurance data using measures such as classification accuracy, sensitivity, and specificity. The results show that support vector machines and linear discriminant analysis achieve better classification accuracy, meaning that whether an individual has died can be predicted more accurately from that individual's own medical records alone. In future work, we hope to provide further reference for medical research and to help hospitals allocate medical resources reasonably and plan management measures each year.
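The evaluation measures named in the abstract can all be derived from a binary confusion matrix. The sketch below is a minimal illustration that assumes "died" is coded as the positive class (label 1) and every other label means "survived"; the function and field names are hypothetical and the labels are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float     # share of all cases classified correctly
    sensitivity: float  # true positive rate: deaths predicted as deaths
    specificity: float  # true negative rate: survivors predicted as survivors


def safe_ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent from y_true.
    return numerator / denominator if denominator else float("nan")


def evaluate(y_true, y_pred, positive=1):
    """Build a 2x2 confusion matrix and derive the three measures.

    `positive` is the label treated as 'died'; every other label counts
    as 'survived'.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # death correctly predicted
            else:
                fn += 1  # death missed
        elif predicted == positive:
            fp += 1      # survivor wrongly flagged as death
        else:
            tn += 1      # survivor correctly predicted
    return BinaryMetrics(
        accuracy=(tp + tn) / len(y_true),
        sensitivity=safe_ratio(tp, tp + fn),
        specificity=safe_ratio(tn, tn + fp),
    )


# Hypothetical labels: 3 deaths and 5 survivors.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(evaluate(y_true, y_pred))
```

Here the classifier finds 2 of 3 deaths and 4 of 5 survivors, giving accuracy 0.75, sensitivity 2/3, and specificity 0.8. When deaths are much rarer than survivals, a classifier can reach high accuracy while missing most deaths, which is why sensitivity and specificity are worth reporting alongside accuracy.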

English Keywords

classifiers; classification accuracy; data complexity; data mining

