
Biomedical Applications of Deep Learning Using Imaging and Multi-Omics Data

Advisor: 莊曜宇

Abstract


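Over the past decade, the development and use of artificial intelligence (AI; machine learning (ML), optimization, forecasting, and image recognition) research pipelines has advanced rapidly, and these pipelines have been widely applied across research fields. Applying AI methods to biomedical research on high-throughput data, such as imaging and multi-omics data, enables a deeper understanding of such large datasets and can help address many public health issues. Deep learning is one of the newest subfields of machine learning and aims to bring machine learning closer to the underlying concept of artificial intelligence. This research seeks to establish high-performance deep learning methods for classifying and identifying targets, and develops three deep learning pipelines using pathological slide images and DNA sequencing data.

In the first study, a novel deep-learning-based prediction pipeline uses DNA sequencing data from mixtures of different individuals to detect and classify which individual each sequence came from. To show that the technique also applies to other kinds of data, the model was developed on datasets generated by different sequencing technologies: (1) targeted sequencing and (2) whole-exome sequencing (WES). For the first dataset, mixed DNA samples from different individuals were prepared using 27 targeted short tandem repeats and 94 single nucleotide polymorphisms, and the pipeline distinguished each individual with 95-97% accuracy. The second dataset used WES data from breast cancer patients, and the pipeline predicted every patient's disease subtype correctly (100%). In addition, to overcome the variation in length between sequences, this study uses a new sliding window approach, which substantially improved model performance. In summary, this study proposes a complete workflow applicable to different next-generation sequencing platforms, covering both sequence data processing and prediction with deep learning.

To combine the ready availability of pathological slide images with the recurrence risk score derived from a 70-gene signature in breast cancer patients, the second study proposes a deep learning model that predicts breast cancer recurrence from pathological slide images alone. This provides a rapid, low-cost, and robust tool for predicting recurrence that helps physicians evaluate treatment plans. Six pretrained models (VGG16, ResNet50, ResNet101, Inception_ResNet, EfficientB5, and Xception) were used for transfer learning, and model performance was evaluated using accuracy, precision, recall, F1 score, confusion matrix, and AUC. On the validation data, Xception performed best, with an accuracy of 0.87 in the patch-wise approach; in the patient-wise approach, the high-risk and low-risk classes reached accuracies of 0.90 and 1.00, respectively. In summary, this study demonstrated that a high-performance AI model for predicting cancer recurrence can be built without annotating specific regions of the pathological slide images.

Predicting breast cancer molecular subtypes with deep learning can provide a convenient diagnostic strategy and can reduce the cost of identifying subtypes through mRNA expression analysis and immunohistochemical staining. For our final study, we aim to take the model weights trained on the 70-gene-signature images in the previous study and apply them to pathological slide images through two-stage transfer learning. We used weights from four pretrained models (VGG16, ResNet50, ResNet101, and Xception) together with the TCGA-BRCA dataset to build prediction models for four breast cancer subtypes. In addition, a ResNet101 model initialized with ImageNet weights was used for comparison with the models above. The two-stage transfer learning approach performed well in classification, with ResNet101 reaching a slide-wise prediction accuracy of 0.913. The deep learning model was also compared with Genefu, another commonly used breast cancer classification tool; it performed comparably to Genefu and predicted certain breast cancer subtypes better.

Deep learning has been used in many studies and has been integrated into today's healthcare systems to improve disease diagnosis and the determination of prognosis. The U.S. Food and Drug Administration has established machine learning protocols for governing the application of deep learning and AI tools, and these have become the gold standard for model development, dataset construction, and deployment in hospitals. Ultimately, AI tools of this kind will make the healthcare system as a whole less vulnerable to emergencies that are less likely to receive the best handling under the current system.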

Parallel Abstract


The last decade has witnessed accelerating development and use of artificial intelligence (AI; machine learning (ML), optimization, forecasting, and image recognition) pipelines, which have been widely applied across research fields. Applying such methods to biomedical research on high-throughput data, such as imaging data and next-generation sequencing (NGS) data, has allowed a deeper understanding of these data and supports the expansion of precision medicine and the improvement of public health. Deep learning (DL) is the latest sub-branch of ML and was introduced with the aspiration of bringing ML closer to AI. With the ultimate goal of developing high-performance DL pipelines that are broadly applicable to classification and identification tasks, this work developed three pipelines to process pathological images and DNA sequencing data.

For the first study, a novel DL pipeline was proposed that uses DNA sequencing data to detect and classify different individuals. To demonstrate its general applicability, the pipeline was applied to datasets generated with different sequencing technologies: (i) targeted sequencing and (ii) whole-exome sequencing (WES). In the first application, individuals were identified with 95-97% accuracy from mixtures of DNA samples prepared using 27 targeted short tandem repeats and 94 single nucleotide polymorphisms. WES data from breast cancer patients were used for the second application, and the pipeline correctly classified all patients (100%) into subtypes. A new sliding window approach was proposed and applied to overcome the sequence length variation in sequencing data, and it substantially improved model performance. Overall, a complete pipeline, including sequencing data processing steps and DL steps, is proposed that is applicable across different NGS platforms.
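The sliding window idea mentioned for the first study can be illustrated with a minimal sketch. The abstract does not give the window length, stride, or how a short final window is handled, so the function name `sliding_windows`, its parameters, and the choice to right-pad with `N` are all assumptions made for illustration and should not be read as the thesis's actual implementation.

```python
from typing import List


def sliding_windows(seq: str, window: int, stride: int, pad: str = "N") -> List[str]:
    """Cut a variable-length sequence into fixed-length windows.

    Hypothetical helper: window size, stride, and 'N' padding are
    illustrative assumptions, not values taken from the thesis.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    start = 0
    while True:
        chunk = seq[start:start + window]
        # Right-pad the last (possibly short) window so every window has equal length.
        windows.append(chunk.ljust(window, pad))
        if start + window >= len(seq):
            break  # this window already reaches the end of the sequence
        start += stride
    return windows


if __name__ == "__main__":
    # Reads of different lengths all yield windows of length 4.
    for read in ["ACGTACGTAC", "ACGTA", "AC"]:
        print(read, sliding_windows(read, window=4, stride=3))
```

Because every window has the same length, reads of different lengths can be fed to a model with a fixed input size; overlapping windows (stride smaller than the window) keep context that would otherwise be cut at window boundaries.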
To leverage the availability of whole-slide image data and the recurrence risk score provided by a 70-gene signature from breast cancer patients, a DL model was proposed for the second study to predict breast cancer recurrence status using only pathological images. This provides a rapid, cost-effective, and robust predictive tool that would assist physicians in making treatment recommendations. Six pretrained models (VGG16, ResNet50, ResNet101, Inception_ResNet, EfficientB5, and Xception) were used for transfer learning, and their performance was evaluated based on accuracy, precision, recall, F1 score, confusion matrix, and AUC. Xception demonstrated the highest validation performance, with an overall accuracy of 0.87 for the patch-wise approach and accuracies of 0.90 and 1.00 for the high-risk and low-risk groups, respectively, for the patient-wise approach. Taken together, this study demonstrated the feasibility and high performance of artificial intelligence models trained without region-of-interest labeling for predicting cancer recurrence.

Deciphering breast cancer molecular subtypes with DL approaches could provide a convenient method for diagnosing breast cancer patients. It could also reduce the costs associated with transcriptional profiling and with subtyping discrepancies between immunohistochemistry (IHC) assays and mRNA expression. Therefore, for our final study, we aim to develop a highly versatile two-step transfer learning pipeline for pathological images using weights obtained from a model trained on the 70-gene-signature images. Weights from four pretrained models, namely VGG16, ResNet50, ResNet101, and Xception, were used to train models on the TCGA-BRCA dataset to predict four intrinsic breast cancer subtypes. Furthermore, a ResNet101 model was trained with ImageNet weights for comparison with the aforementioned models. The two-step DL models showed promising classification results, with an overall slide-wise prediction accuracy of 0.913 for the ResNet101 model. The DL model was additionally benchmarked against Genefu, a common tool for breast cancer classification. The results demonstrated that the performance of the DL model is comparable to that of Genefu and even superior for certain breast cancer subtypes.

DL technology is applied routinely in the laboratory and is integrated into the current healthcare system to facilitate diagnosis and the determination of prognosis. A good machine learning protocol has also been released by the U.S. FDA for managing applications of DL and artificial intelligence tools, and it serves as a gold standard for model development, dataset preparation, and deployment in hospitals. Eventually, artificial intelligence tools would make the healthcare system less vulnerable to emergent situations that are otherwise not handled well under current healthcare protocols.
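The distinction between patch-wise and patient-wise accuracy in the second study implies some step that turns many patch predictions into one label per patient. The abstract does not say how this is done, so the sketch below assumes a simple majority vote; the function names, patient identifiers, and labels are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def patient_level_labels(patch_preds: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Aggregate (patient_id, predicted_label) pairs by majority vote.

    Assumption: majority voting is used here for illustration only.
    Ties go to the label seen first for that patient.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for patient_id, label in patch_preds:
        votes[patient_id][label] += 1
    return {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}


def per_class_accuracy(pred: Dict[str, str], truth: Dict[str, str]) -> Dict[str, float]:
    """Fraction of patients in each true class whose predicted label is correct."""
    correct: Counter = Counter()
    total: Counter = Counter()
    for patient_id, true_label in truth.items():
        total[true_label] += 1
        if pred.get(patient_id) == true_label:
            correct[true_label] += 1
    return {label: correct[label] / total[label] for label in total}


if __name__ == "__main__":
    # Made-up patch predictions for three hypothetical patients.
    patches = [
        ("p1", "high"), ("p1", "high"), ("p1", "low"),
        ("p2", "low"), ("p2", "low"),
        ("p3", "low"), ("p3", "high"), ("p3", "high"),
    ]
    truth = {"p1": "high", "p2": "low", "p3": "low"}
    predicted = patient_level_labels(patches)
    print(predicted)
    print(per_class_accuracy(predicted, truth))
```

Reporting accuracy per true class, as the abstract does for the high-risk and low-risk groups, makes it visible when a model performs well on one group but poorly on the other, which a single overall accuracy would hide.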

