
Predicting Cervical Cancer Survivability: A Comparison of Three Data Mining Methods

Abstract


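This study explores the use of artificial intelligence methods and data mining techniques in prediction models for cervical cancer. Three algorithms were applied: artificial neural network, decision tree, and logistic regression. The algorithms were evaluated on their prediction accuracy and on how well their predictions can be interpreted. The analysis used 433,272 records and 72 variables from the 1973-2000 cancer registry database (CIPUD, Cancer Incidence Public-Use Database) of the U.S. SEER (Surveillance, Epidemiology, and End Results) program, and 10-fold cross-validation was used to compare the accuracy of the three algorithms in predicting survival. The reported prediction accuracies were 0.8974 for the logistic regression model (sensitivity 0.9047, specificity 0.8830), 0.8732 for the decision tree model (C5) (sensitivity 0.8639, specificity 0.8966), and 0.7406 for the artificial neural network model (sensitivity 0.7394, specificity 0.7473). Logistic regression produced extreme accuracy values of 1.0 (100%) and 0.9942 (99.42%), clearly above its mean accuracy of 0.8981. The decision tree model's results were generally higher than those of logistic regression, although the difference was small. The artificial neural network model had a mean accuracy of 0.7776, clearly lower than logistic regression and decision tree, and its accuracy across the 10 folds was unstable, with a standard deviation of 0.0786, the highest of the three models. In terms of mean prediction accuracy, logistic regression (0.8981) and decision tree (0.8926) outperformed the artificial neural network (0.7776), which also showed the largest spread across folds, indicating weaker predictive ability than the other two models.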
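The three measures reported above (accuracy, sensitivity, specificity) all derive from the same confusion-matrix counts. The following is a minimal sketch of that arithmetic, not the study's own code. The label coding (1 as the positive class), the function names, and the toy labels are all assumptions for illustration; the abstract does not state which outcome was treated as positive.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false negatives, true negatives, false positives."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def summarize(actual, predicted, positive=1):
    """Return accuracy, sensitivity and specificity.

    Assumes both classes occur in `actual`; otherwise a denominator is zero.
    """
    tp, fn, tn, fp = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # share of all cases predicted correctly
        "sensitivity": tp / (tp + fn),                # share of actual positives detected
        "specificity": tn / (tn + fp),                # share of actual negatives detected
    }


# Toy labels (hypothetical, not SEER data): 6 positive cases, 4 negative cases.
actual = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

metrics = summarize(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because accuracy pools both classes, it always lies between sensitivity and specificity, weighted by how common each class is in the evaluated data.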

Keywords

None listed

Parallel Abstract


Objective: This study investigated the use of artificial intelligence methods and data mining technology for predicting cervical cancer survivability. Three models (artificial neural network, decision tree, and logistic regression) were compared on their accuracy in predicting cervical cancer survivability. Methods and material: The three prediction models were developed on the Surveillance, Epidemiology, and End Results (SEER) dataset, a large cancer registry dataset. Two of the models, artificial neural network and decision tree, are popular data mining algorithms; the third, logistic regression, is a common statistical model. 10-fold cross-validation was used to obtain unbiased estimates of the three models' prediction performance for comparison. Results: Accuracy was 0.8981 for logistic regression, 0.8930 for decision tree, and 0.7776 for artificial neural network. Logistic regression at times reached accuracies of 1.0 and 0.9942. In the 10-fold cross-validation, the standard deviation of accuracy for the artificial neural network was 0.0786, the largest among the three prediction models. Conclusions: In this study, the artificial neural network predicted cervical cancer survivability less well than logistic regression and decision tree, with the lowest prediction accuracy and the largest variation in accuracy across the 10 folds.
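The 10-fold cross-validation procedure, and the mean and standard deviation of per-fold accuracy used to compare the models, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the threshold classifier is a hypothetical stand-in for the three models, the data are synthetic, and the sample standard deviation is used because the abstract does not say which form was reported.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the record indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, labels, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, return one accuracy per fold."""
    accuracies = []
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        train = [i for i in range(len(records)) if i not in held]
        model = fit([records[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, records[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Hypothetical stand-in model: threshold halfway between the two class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic one-feature data (not SEER records): class 1 is centred on 2.0, class 0 on 0.0.
rng = random.Random(42)
labels = [i % 2 for i in range(200)]
records = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in labels]

fold_accuracies = cross_validate(records, labels, fit_threshold, predict_threshold)
mean_accuracy = statistics.mean(fold_accuracies)
sd_accuracy = statistics.stdev(fold_accuracies)  # sample SD across the 10 folds
print(f"mean accuracy = {mean_accuracy:.4f}, SD = {sd_accuracy:.4f}")
```

A model whose per-fold accuracies vary widely, as reported for the artificial neural network, yields a larger standard deviation here even when its mean looks acceptable, which is why the abstract reports both figures.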


