Stroke has ranked among the top three leading causes of death in Taiwan for ten consecutive years, making it one of the country's most important diseases. This study therefore investigates the factors associated with death among stroke patients in the National Health Insurance Research Database (NHIRD). The data source is the Longitudinal Health Insurance Database 2005 (LHID2005), released by the National Health Research Institutes, from which stroke patients recorded between 2005 and 2009 were selected for analysis. We establish a standard operating procedure for analyzing the NHIRD and apply data mining techniques (decision tree, logistic regression, neural network, random forest, and support vector machine), comparing the accuracy of each method in order to identify the factors that influence death after stroke onset. The empirical results show that random forest performed best among the models considered, predicting death after stroke onset more accurately than the others. We hope that the standard operating procedure established in this study can serve as a reference for medical research.
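The model comparison described above rests on scoring each candidate model against known outcomes. The following is a minimal sketch of that scoring step only, not the study's actual pipeline. The function names, the placeholder model names (`model_a`, `model_b`, `model_c`), and every label and prediction are made up for illustration; none of them come from LHID2005 or from the study's results.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = died and 0 = survived."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose outcome was predicted correctly."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, _fp, tn, _fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def rank_models(y_true: Sequence[int], predictions: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Score every model on the same held-out labels and sort by accuracy, best first."""
    scores = [(name, accuracy(y_true, pred)) for name, pred in predictions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical held-out outcomes and predictions (illustrative values only).
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 0, 0, 1, 0],
    "model_b": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "model_c": [1, 0, 0, 1, 0, 1, 0, 1, 1, 0],
}

ranking = rank_models(y_true, toy_predictions)
for name, score in ranking:
    print(f"{name}: accuracy = {score:.2f}")
```

In a real analysis the predictions would come from the five fitted models evaluated on held-out patients. The confusion counts are kept separate from the accuracy so that errors on deaths and on survivals can be inspected individually.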