
肝癌病患經治療後腫瘤復發分析與預測模式的建立

Prediction Model Establishment and Analysis of Post-Treatment Recurrence in Hepatocellular Carcinoma Patients

Advisor: 陳培哲
Co-advisor: 楊培銘 (Pei-Ming Yang)

Abstract


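Hepatocellular carcinoma (HCC) is an important disease and a major cause of cancer death in Taiwan, where chronic hepatitis is prevalent. Even when a tumor is accurately diagnosed and effectively eradicated, the recurrence rate after treatment remains high. Identifying which patients are prone to recurrence has therefore become an important research direction, since it could support more effective screening and earlier detection and treatment of recurrent tumors.

This study began by building a complete HCC database at National Taiwan University Hospital and developing a recurrence prediction model for patients who underwent curative radiofrequency ablation (RFA) for HCC. Once the database was built, a query system was implemented and tested, and the database was searched for a rare complication, HCC with duodenal invasion. The retrieved cases were used to verify the accuracy and validity of the database, and a complete case series report was produced.

Clinically, patients under follow-up routinely undergo examinations including ultrasound and blood tests. The resulting data have several characteristics: they are longitudinal, a given laboratory item is measured at irregular intervals, the items tested for a given patient differ from visit to visit, and multiple parameters are assessed. These factors often interfere with the analysis of important prognostic factors. Subsequent work therefore combined these prognostic factors with a support vector machine to build the prediction model. By integrating data from multiple follow-up measurements, the study proposes a data processing method, applies it to building the prediction model, and uses it to improve the model's predictive performance.

Data were included from patients newly diagnosed with early-stage HCC at National Taiwan University Hospital between 2007 and 2013 and treated with curative RFA. The first stage used patient data from 2007 to 2009. After a review of the literature, 16 important predictors were chosen for model building. Five feature selection methods (genetic algorithm [GA], simulated annealing [SA], random forest [RF], and the hybrid methods GA+RF and SA+RF) were used to select an important subset of predictors. These were then combined with a support vector machine (SVM) to build a model predicting recurrence one year after treatment. Five-fold cross-validation was used to build and test the model.

The follow-up study increased the number of patients and incorporated multiple pre-treatment measurements. Dynamic period slicing (DPS) was used to find the slicing that leaves the fewest missing values. The multiple-measurement data were then processed with a quantitative temporal abstraction (QTA) algorithm, and an SVM was again used to build a model predicting recurrence one year after treatment. Models built without DPSQTA and with DPSQTA were compared on accuracy and predictive ability. The same data were also used for comparison with a case-based reasoning approach.

Conclusion: Building a dedicated HCC database allows electronic medical records and data to be put to effective use. An SVM-based approach can produce an effective model for predicting HCC recurrence after treatment and can help screen for high-risk patients. Combining dynamic period slicing with quantitative temporal abstraction allows the information in multiple follow-up measurements to be incorporated into the analysis. The results also indicate that multiple follow-up data remain important for monitoring recurrence after treatment.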
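As an illustration only of the period-slicing idea described in the abstract, the Python sketch below picks, from a few candidate period lengths, the one that leaves the fewest empty (patient, period) cells, and then summarizes each period by its mean. The function names, the candidate lengths, the sample data, the way "fewest missing values" is scored, and the use of the mean as the temporal summary are all assumptions made for this sketch. The abstract does not specify the actual DPS or QTA algorithms, so this should not be read as the thesis's method.

```python
from statistics import mean

def slice_periods(measurements, period_days, horizon_days):
    """Group (day, value) pairs into consecutive periods of period_days.

    Returns one entry per period: the mean of the values that fall in it,
    or None when the period contains no measurement (a missing value).
    The mean is an assumed stand-in for a temporal summary.
    """
    n_periods = horizon_days // period_days
    buckets = [[] for _ in range(n_periods)]
    for day, value in measurements:
        idx = day // period_days
        if 0 <= idx < n_periods:
            buckets[idx].append(value)
    return [mean(b) if b else None for b in buckets]

def missing_fraction(patients, period_days, horizon_days):
    """Fraction of (patient, period) cells that have no measurement."""
    total = 0
    missing = 0
    for measurements in patients.values():
        row = slice_periods(measurements, period_days, horizon_days)
        total += len(row)
        missing += row.count(None)
    return missing / total

def choose_period(patients, candidates, horizon_days):
    """Pick the candidate period length with the fewest missing cells.

    On ties, min() keeps the first candidate, so listing candidates from
    shortest to longest prefers the finer time resolution.
    """
    return min(candidates,
               key=lambda p: missing_fraction(patients, p, horizon_days))

# Hypothetical data: day offsets within a 90-day window and one lab value.
patients = {
    "A": [(5, 40.0), (35, 44.0), (70, 50.0)],
    "B": [(10, 30.0), (80, 36.0)],
}
best = choose_period(patients, [15, 30, 45], horizon_days=90)
print(best, slice_periods(patients["A"], best, 90))
```

Longer periods always leave fewer gaps but discard temporal detail, so in practice the candidate list has to be bounded by what is clinically meaningful. That trade-off is outside the scope of this sketch.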

Parallel Abstract


Background and objective: Hepatocellular carcinoma (HCC) is a leading cancer in Taiwan, where hepatitis is highly prevalent. Even after effective treatment with tumor eradication, recurrence remains common and is an important issue in patient care. Identifying patients at high risk of recurrence may allow more efficacious screening and earlier detection of tumor recurrence. The first aim of this study was to establish a hospital-based HCC database and to develop recurrence prediction models for HCC patients who received treatment. After the database was established, a case series on a rare duodenal complication of HCC was compiled to test the efficiency and accuracy of the database. Medical data sampled repeatedly at varying frequencies, such as laboratory data, typically have the following characteristics: they are longitudinal, a given laboratory item is measured irregularly, a given patient is measured irregularly, and multiple parameters are involved. The second aim was to propose a data processing method that handles data with these characteristics and to apply it to the development of a clinical prediction model.

Methods: An HCC cancer registry database was established at National Taiwan University Hospital. Patients with newly diagnosed early-stage HCC who received radiofrequency ablation (RFA) as their first treatment were enrolled, covering 2007 to 2009 in the first-stage study and 2007 to 2013 in the second-stage study. In the first-stage study, five feature selection methods, namely genetic algorithm (GA), simulated annealing (SA), random forest (RF), and the hybrid methods GA+RF and SA+RF, were used to select an important subset from a total of 16 clinical features. These feature selection methods were combined with a support vector machine (SVM) to develop predictive models with better performance. Five-fold cross-validation was used to train and test the SVM models. In the second-stage study, a dynamic period slicing (DPS) method combined with a quantitative temporal abstraction algorithm (the DPSQTA algorithm) was proposed to process longitudinal, irregular, multi-parameter data. DPSQTA and a baseline method were compared on the performance of the resulting predictive models, measured by sensitivity, specificity, balanced accuracy (BAC), accuracy, positive predictive value (PPV), and negative predictive value (NPV).

Results: The HCC cancer registry database was established and implemented with a high-accuracy query system, providing a basis for studies such as the one on rare duodenal invasion of HCC. A predictive model for HCC recurrence after treatment could be developed using SVM with hybrid feature selection methods and 5-fold cross-validation. In the early (first-stage) study, the average sensitivity, specificity, accuracy, PPV, NPV, and area under the ROC curve were 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. In the second-stage study, DPSQTA improved on the baseline method in sensitivity, BAC, accuracy, PPV, and NPV, although the differences were not statistically significant.

Conclusions: Building on the established cancer registry database, an effective model for predicting HCC recurrence after RFA was developed using an SVM. Patients at high risk of recurrence could be identified for close follow-up. With DPSQTA added, longitudinal, irregular, multi-parameter data could be processed, and the accuracy of the predictive model might be improved.
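For readers unfamiliar with the performance measures named in the abstract, the following Python sketch shows how they are conventionally computed from the four cells of a binary confusion matrix, with recurrence as the positive class. The function name and the example counts are hypothetical and are not taken from the thesis. The sketch assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    tp: recurrences correctly predicted     fp: non-recurrences flagged
    tn: non-recurrences correctly cleared   fn: recurrences missed
    Assumes each denominator below is nonzero.
    """
    sensitivity = tp / (tp + fn)   # share of true recurrences detected
    specificity = tn / (tn + fp)   # share of non-recurrences cleared
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),     # negative predictive value
    }

# Hypothetical counts for illustration only, not results from the study.
metrics = classification_metrics(tp=20, fp=10, tn=60, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages sensitivity and specificity, which makes it less sensitive than plain accuracy to an imbalance between recurrent and non-recurrent patients.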

