According to vital statistics from the Department of Health, Executive Yuan, liver cancer was the second leading cause of cancer death in Taiwan in 2010, accounting for 18.9% of all cancer deaths. Recent epidemiological statistics put the prevalence of non-alcoholic fatty liver in the general population at 12~37%. Because the incidence of liver disease in Taiwan keeps rising and patients are getting younger, it is necessary to raise public awareness of liver disease and to emphasize the importance of preventive medicine. This study drew on the health examination database of the health management center at a regional teaching hospital in south-central Taiwan. Logistic regression and two data mining algorithms, decision tree and neural network, were used to build prediction models for non-alcoholic fatty liver. The models were evaluated with 10-fold cross validation and ROC curves in order to identify the best prediction model as a reference for physicians making clinical decisions. The neural network model performed best, with an average accuracy of 89.1614% under 10-fold cross validation and an area under the ROC curve of 92.1%. The decision tree followed, with an accuracy of 88.3991% and an area under the ROC curve of 90.7%. Logistic regression ranked last, with an accuracy of 83.825% and an area under the ROC curve of 89.8%. The Gain Ratio attribute evaluator under Select Attributes was used to assess the importance of each variable for non-alcoholic fatty liver. The main factors, in descending order of importance, were TG, ALT, UA, TCHO, AST, AC GLU, HDLC, BMI, and LDLC, a ranking consistent with the structure of the decision tree. In clinical practice, the prediction models built in this study can help health care providers offer appropriate treatment or health education in a timely manner when making medical decisions, benefiting both the public and health care providers.
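The abstract names three computational building blocks without spelling them out: splitting the data into 10 folds, summarizing an ROC curve as a single number, and ranking attributes by gain ratio. The following is a minimal Python sketch of those three pieces, not the study's actual implementation. The function names (`k_fold_indices`, `roc_auc`, `gain_ratio`) and the toy values in the demo are hypothetical. The sketch also assumes that attributes are already discretized (for example, TG binned into "high" and "normal"), that folds are taken without shuffling or stratification, and that the ROC figure is the area under the curve.

```python
import math
from collections import Counter


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross validation.

    Assumption: records are dealt into folds round-robin, with no
    shuffling or stratification, to keep the sketch deterministic.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity:
    the fraction of (positive, negative) pairs in which the positive case
    receives the higher score, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


if __name__ == "__main__":
    # Hypothetical toy data: 1 = fatty liver present, 0 = absent.
    fatty_liver = [1, 1, 1, 0, 0, 0, 0, 1]
    tg_high = [1, 1, 1, 0, 0, 0, 1, 0]   # binned triglyceride flag
    bmi_high = [1, 0, 1, 0, 1, 0, 1, 0]  # binned BMI flag
    ranking = sorted(
        {"TG": tg_high, "BMI": bmi_high}.items(),
        key=lambda kv: gain_ratio(kv[1], fatty_liver),
        reverse=True,
    )
    print([name for name, _ in ranking])
    print(roc_auc(fatty_liver, [0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.6, 0.4]))
    print(sum(1 for _ in k_fold_indices(len(fatty_liver), k=4)))
```

In a full pipeline, each classifier would be fitted on the `train` indices and scored on the `test` indices of every fold, and the per-fold accuracies averaged. That averaging step is omitted here because the abstract does not describe the model configurations.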