
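Using Machine Learning to Analyze Risk Factors for Synchronous/Metachronous Second Colorectal Cancer in Colorectal Cancer Patients: A Case Study of Cancer Registry Data from Three Hospitals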

Using Machine Learning to Analyze Risk Factors for Synchronous/Metachronous Colorectal Cancer Patients with Colorectal Cancer: A Case Study of Three Hospitals' Cancer Registrations

Advisor: 張啟昌
Full text will be available for download on 2024/11/12.

Abstract


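Background: Cancer screening and advances in treatment have improved treatment outcomes and patient survival, but they have also increased the number of second primary cancer cases. This study uses machine learning, specifically classification techniques and association rules, to identify analyzable risk factors and clinical features and thereby develop a predictive model for colorectal cancer.

Methods: The clinical dataset came from the cancer registries of three hospitals and comprised 4,287 valid records. Based on input from three clinicians and a literature review, 14 independent variables were selected as candidate risk factors for second cancer. The classification techniques, run in WEKA (Waikato Environment for Knowledge Analysis), were Naive Bayes, logistic regression, K-Star, random committee, randomizable filtered classifier, random forest, and random tree. Performance was evaluated by classification accuracy, sensitivity, specificity, F-measure, and precision.

Results: The standardized incidence rate of second colorectal cancer in this study was 1.10, slightly lower than figures reported abroad. Overall, the risk factors for second colorectal cancer were, in order, combined stage, tumor size, chemotherapy, and grade/differentiation; Naive Bayes had the highest overall accuracy (89.88%). The most important risk factor was body mass index (BMI) in the early stage and tumor size in the advanced stage; Naive Bayes had the highest accuracy in both the early (88.03%) and advanced (93.62%) stages. For both synchronous and metachronous second cancers, the risk factors were, in order, combined stage and tumor size; Naive Bayes had the highest accuracy for synchronous (90.03%) and metachronous (92.78%) cases. Among patients with synchronous second cancer, the K-Star classifier had the highest accuracy in the early stage (87.07%) and Naive Bayes in the advanced stage (92.88%); in both stages the risk factors were tumor size and chemotherapy. Among patients with metachronous second cancer, Naive Bayes had the highest accuracy in both the early stage (89.97%) and the advanced stage (95.36%); in both stages the risk factors were, in order, tumor size and grade/differentiation.

Conclusion: Over the past decade, colorectal cancer has been the malignancy with the largest number of new cases in Taiwan. Its incidence rate is second only to female breast cancer, its mortality rate ranks third after lung and liver cancer, and it accounts for the third-highest national health insurance expenditure among cancers. This study is the first to analyze risk factors for synchronous and metachronous second colorectal cancer among colorectal cancer survivors in Taiwan. The results show that the overall risk factors are combined stage, tumor size, chemotherapy, and grade/differentiation. Among these, chemotherapy-induced synchronous second cancer and the effect of clinical grade/differentiation on metachronous second cancer are important indicators that merit observation.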

Parallel Abstract


Background: Cancer screening and advanced treatments have improved treatment outcomes and patient survival rates, but they have also led to an increase in the number of second primary cancers (SPCs). This study used machine learning techniques, namely classification and association rules, to identify analyzable risk factors and clinical features and to develop a predictive model of colorectal cancer.

Method: The clinical dataset was obtained from the cancer registries of three hospitals and contained a total of 4,287 valid records. Based on the judgment of three clinicians and the literature, 14 independent variables were used as candidate risk factors for analyzing SPCs. The classification techniques, run in WEKA (Waikato Environment for Knowledge Analysis), were as follows: Naive Bayes, logistic regression, K-Star, random committee, randomizable filtered classifier, random forest, and random tree. The performance indicators were accuracy, sensitivity, specificity, F-measure, and precision.

Results: The standardized incidence rate of second primary colorectal cancer in this study was 1.10, which was slightly lower than in other countries. The overall risk factors for second primary colorectal cancer were, in order, combined stage, tumor size, chemotherapy, and grade/differentiation, and the Naive Bayes method had the highest accuracy (89.88%). The most important risk factors for patients in the early and advanced stages were body mass index and tumor size, respectively; the Naive Bayes method had the highest accuracy in both the early (88.03%) and advanced (93.62%) stages. The risk factors for both synchronous and metachronous SPCs were combined stage and tumor size, and the Naive Bayes method had the highest accuracy for synchronous (90.03%) and metachronous (92.78%) SPCs. For synchronous SPCs, the highest accuracy was achieved by the K-Star classifier (87.07%) in the early stage and by Naive Bayes (92.88%) in the advanced stage; the risk factors in both stages were tumor size and chemotherapy. For metachronous SPCs, Naive Bayes had the highest accuracy in both the early stage (89.97%) and the advanced stage (95.36%); the risk factors in both stages were tumor size and grade/differentiation.

Conclusion: Colorectal cancer has been the most frequently diagnosed malignancy in Taiwan over the past decade, with an incidence rate second only to female breast cancer. Its mortality rate ranks third, after lung cancer and liver cancer, and it also accounts for the third-highest medical expenditure among cancers. This study is the first to analyze the risk factors for synchronous and metachronous second primary colorectal cancer among colorectal cancer survivors in Taiwan. The results showed that the overall risk factors were combined stage, tumor size, chemotherapy, and grade/differentiation. Among them, chemotherapy for synchronous SPCs and grade/differentiation for metachronous SPCs are both important factors to observe.
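The five performance indicators named in the Method paragraph all follow from a binary confusion matrix. The sketch below illustrates the standard definitions only. It is not the thesis's code: the study ran its classifiers in WEKA, and the class `ConfusionCounts`, the function `evaluate`, and the counts in the usage lines are hypothetical values chosen for illustration, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container).

    'Positive' is assumed to mean 'patient developed a second primary cancer'.
    """
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def evaluate(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, specificity, precision, and F-measure.

    Assumes every denominator is non-zero, i.e. both classes occur in the
    data and the classifier predicts at least one true positive.
    """
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the positive class
    specificity = c.tn / (c.tn + c.fp)   # recall on the negative class
    precision = c.tp / (c.tp + c.fp)     # share of predicted positives that are correct
    # F-measure (F1): harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Illustrative counts only; these are NOT taken from the study's data.
example = ConfusionCounts(tp=40, fp=10, tn=140, fn=10)
metrics = evaluate(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

When second primary cancers are rare in the registry, accuracy on its own can look high even if few positive cases are detected, which is one reason to read it alongside sensitivity and specificity.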

