Comparing Data Mining Methods for Predicting Prognostic Factors and Survivability of Non-small Cell Lung Cancer Patients

Abstract


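This study used three data mining algorithms, decision tree (DT), artificial neural network (ANN), and logistic regression (LR), to investigate prognostic factors for non-small cell lung cancer (NSCLC) and the factors that affect model performance. The subjects were 131,257 patients diagnosed with NSCLC in the United States cancer registry data (Surveillance, Epidemiology, and End Results, SEER), divided by cause of death into those who died of lung cancer (N = 123,972) and those who died of metastatic cancer (N = 7,285); because of space limits, this paper discusses only the patients who died of lung cancer. The model evaluation measures were accuracy (ACC), area under the ROC curve (AUC), and external generalization, estimated with 10-fold cross-validation. Combining the rankings of prognostic variables for 1-, 3-, and 5-year survival across the three models, the top three prognostic factors for NSCLC patients who died of lung cancer were surgery type, clinical stage, and extent of tumor spread. ANN showed better predictive power, and LR showed better external generalization. A sample size of at least 3,500 is recommended: the LR model was the most susceptible to small samples, while DT could not grow a tree when the information provided was insufficient. For the hybrid model, when the DT's test-set ACC was better, the hybrid model's test-set AUC improved. The results therefore suggest that ANN offers better predictive power and LR better external generalization, that the sample size should exceed 3,500 when using LR, and that a hybrid model may be used when DT's predictive power is better.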

Parallel Abstract


The purpose of this study was to investigate the effectiveness of decision tree (DT), artificial neural network (ANN), and logistic regression (LR) models for predicting prognostic factors and survivability of patients with non-small cell lung cancer (NSCLC). The study samples were patients in the United States diagnosed with NSCLC between 1973 and 2004, drawn from the SEER (Surveillance, Epidemiology, and End Results) databank. The dataset consists of 131,257 patients, of whom 123,972 died of lung cancer and 7,285 died of metastasis within five years. Because of the page limit, we present only the results for those who died of lung cancer. Model performance was evaluated in terms of accuracy (ACC), area under the ROC curve (AUC), and external generalization (ΔACC, ΔAUC). Ten-fold cross-validation was used to obtain unbiased estimates of these measures. Synthesizing the DT, ANN, and LR models, the top three prognostic factors for 1-, 3-, and 5-year survivability of patients who died of lung cancer were surgery type, clinical stage, and the extension of cancer. The top three prognostic factors for patients who died of metastasis were surgery type, clinical stage, and the number of examined lymph nodes. The ANN model had the highest ACC, while LR had the lowest. A decision tree for 5-year NSCLC survivability could not be constructed because of inadequate information. Sample size significantly affected the performance of LR. Because LR generalized stably once the sample exceeded 3,500, a sample size of at least 3,500 patients is recommended. The hybrid model can improve performance (AUC and ACC) when DT performs better than ANN or LR.
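The evaluation measures named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, the fold assignment is a simple interleaved split, and the abstract does not define ΔACC or ΔAUC, so the gap below (training-set metric minus test-set metric) is only an assumed reading of "external generalization".

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ACC: fraction of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC: probability that a random positive case receives a higher
    score than a random negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits


def generalization_gap(train_metric: float, test_metric: float) -> float:
    """Assumed definition of the delta measures: training-set metric
    minus test-set metric (smaller means better generalization)."""
    return train_metric - test_metric


# Toy data (made up for illustration; not SEER records).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

toy_acc = accuracy(y_true, y_pred)
toy_auc = auc(y_true, scores)
toy_splits = k_fold_indices(25, k=10)
```

In a 10-fold run, a model would be fit on each `train` index list and scored on the matching `test` list, and the per-fold ACC and AUC values would then be averaged.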
