Breast cancer is one of the most common cancers among women worldwide. Thanks to advances in medicine, the average 10-year survival rate reaches 60% when the disease is diagnosed early and treated appropriately; the survival rate exceeds 80% for Stage 1 breast cancer and approaches 100% for Stage 0. With technological progress and the widespread use of computers, large volumes of hospital patient data can be acquired and analyzed quickly. Data mining methods can perform prediction and classification on such data in a short time, giving doctors a reference for diagnosis.
To build a computer-aided breast cancer diagnosis model, this study applies back-propagation neural networks (BPN), decision trees (C5.0), Bayesian networks (BN), support vector machines (SVM), logistic regression (LR), discriminant analysis (DA), case-based reasoning (CBR), and multivariate adaptive regression splines (MARS) to breast fine-needle aspiration (FNA) examination data. Two families of models are developed: single breast cancer diagnosis models, comprising the single diagnosis model and the reconfirmation diagnosis model, and multiple breast cancer diagnosis models, comprising the inconsistency-based diagnosis model and the voting combination diagnosis model. A genetic algorithm (GA) is also used to search the many candidate combinations quickly for the best-performing combination of diagnosis models. The results show that the reconfirmation, inconsistency-based, and voting combination models all outperform the single diagnosis model; the voting combination model performs best, with an accuracy of 98.82%. Using GA reduces the time needed to build the models and identifies the best combination for breast cancer diagnosis, so that predictions can be made in a short time, giving doctors a reference for diagnosis and improving diagnostic accuracy.
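The abstract does not specify the voting rule or the GA encoding, so the following is only a minimal sketch of the general idea, under stated assumptions: each classifier's predictions are combined by plurality vote, and a small genetic algorithm searches over bit masks that mark which classifiers join the vote. All names (`majority_vote`, `ensemble_accuracy`, `ga_select`), the parameter values, and the toy labels are hypothetical illustrations, not the study's implementation or data.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality vote.

    predictions: list of equal-length label lists, one list per model.
    On a tie, the label that appears first among the models wins.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def ensemble_accuracy(mask, predictions, actual):
    """Accuracy of the vote among the models whose mask bit is 1."""
    chosen = [p for keep, p in zip(mask, predictions) if keep]
    if not chosen:  # an empty ensemble cannot predict anything
        return 0.0
    return accuracy(majority_vote(chosen), actual)


def ga_select(predictions, actual, pop_size=12, generations=20,
              mutation_rate=0.1, seed=0):
    """Search for a good subset of models with a simple elitist GA.

    Assumptions of this sketch: bit-mask encoding, one-point crossover,
    bit-flip mutation, and an initial population that includes the
    'use every model' mask as a baseline. Needs at least 2 models.
    """
    rng = random.Random(seed)
    n = len(predictions)

    def fitness(mask):
        return ensemble_accuracy(mask, predictions, actual)

    population = [tuple([1] * n)]
    while len(population) < pop_size:
        population.append(tuple(rng.randint(0, 1) for _ in range(n)))

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # top half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover position
            child = a[:cut] + b[cut:]
            # flip each bit independently with probability mutation_rate
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = parents + children

    best = max(population, key=fitness)
    return best, fitness(best)


# Toy data (hypothetical): 1 = malignant, 0 = benign; each model errs once.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_outputs = [
    [0, 0, 1, 1, 0, 0, 1, 0],  # hypothetical model A, wrong on sample 0
    [1, 0, 1, 0, 0, 0, 1, 0],  # hypothetical model B, wrong on sample 3
    [1, 0, 1, 1, 0, 1, 1, 0],  # hypothetical model C, wrong on sample 5
]
best_mask, best_score = ga_select(model_outputs, true_labels)
print(best_mask, best_score)
```

In this toy setting the three models err on different samples, so the vote corrects each individual mistake, which is the intuition behind a voting combination outperforming any single model. In practice the fitness would be measured on held-out validation data rather than on the samples used to fit the classifiers.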