In cancer epidemiology, data mining techniques can be used to predict patient prognosis and to identify possible risk factors for cancer. Applying these techniques to diseases that are common locally can help improve treatment and prevention and, in turn, reduce medical costs. Breast cancer is the most common cancer among women worldwide, and its mortality rate has been rising in recent years. It is now the fourth leading cause of cancer death among women in Taiwan.

This study used data from a regional hospital in central Taiwan as its sample. After the data were collected and the study variables consolidated, 5-fold cross-validation was used to construct the training and test sets. Breast cancer prognosis models were then built with four data mining methods: artificial neural networks, decision trees, Bayesian classifiers, and support vector machines (SVM). The four models were evaluated and compared using accuracy, sensitivity, specificity, and the area under the ROC (receiver operating characteristic) curve (AUC).

The results show that the artificial neural network and the Bayesian classifier outperformed the other two methods, with accuracies of 95.93% and 94.41% and AUCs of 0.894 and 0.911, respectively. All four models can be used to predict whether a breast cancer patient ultimately survives. Clinically, they can provide survival predictions for breast cancer patients, with the aim of giving physicians a reference for diagnosis, treatment, and prognostic assessment.
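The abstract names 5-fold cross-validation and four evaluation measures without showing how they are computed. The following is a minimal, standard-library-only Python sketch of those steps. It is not the study's code and rests on several assumptions:

* Labels are binary, with 1 marking the positive class (for example, a patient who did not survive). The study's actual class coding is not stated.
* Each model outputs a score, and a 0.5 cut-off turns that score into a predicted class.
* The folds come from a single seeded shuffle without stratification.

The function names `kfold_indices`, `confusion_metrics`, and `auc` are hypothetical, and the toy data are made up.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once, then cut them into k folds.

    Each fold is the test set exactly once; the remaining indices form the
    training set. Fold sizes differ by at most one when n is not divisible by k.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)  # fixed seed -> reproducible folds
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = sorted(order[start:start + size])
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        splits.append((train, test))
        start += size
    return splits


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity), treating label 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half). O(P*N), adequate for small samples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy labels and model scores, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    acc, sens, spec = confusion_metrics(y_true, y_pred)
    print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f} "
          f"AUC={auc(y_true, scores):.3f}")
    for fold, (train, test) in enumerate(kfold_indices(len(y_true), k=5, seed=42), 1):
        print(f"fold {fold}: train={train} test={test}")
```

In a full 5-fold run, each of the four classifiers would be trained on `train` and scored on `test` for every fold. The metrics would then be averaged or pooled across the five folds. The abstract does not say which aggregation the study used.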