  • Thesis

通過整合核糖核酸測序和臨床數據使用圖神經網絡預測癌症預後

Predicting Cancer Prognosis Using Graph Neural Networks by Integrating RNA-Sequencing and Clinical Data

Advisor: 林澤

Abstract


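Breast cancer is the most common cancer type among women and a leading cause of cancer-related death. Its risk factors include age, obesity, family history, and radiation exposure, and every woman faces some risk of developing it. When the disease is detected early, the survival rate after treatment can exceed 90%, so accurate detection and early treatment are central to combating breast cancer. Cancer prognosis is strongly associated with a patient's genomic features, which are inherently high-dimensional. Genetic data typically share three characteristics: high dimensionality, small sample size, and sparsity, which makes it challenging to extract informative features for predicting prognosis. Supplying additional information is a reasonable remedy, so we introduce gene interaction networks (GINs) to reveal latent relationships between genes. Graph neural networks (GNNs) have recently emerged as a deep learning method with remarkable expressive power in fields such as drug discovery, recommendation systems, fraud detection, and bioinformatics. In this study, we use a Systems Biology Feature Selector for dimensionality reduction, selecting from high-dimensional RNA sequencing (RNA-Seq) data 20 prognostic biomarkers considered closely related to breast cancer prognosis. We also implement a bimodal graph neural network that helps the model capture complex interactions between genes and extract useful information from both RNA-Seq and clinical data. With the help of the GIN, our model outperforms all baseline models, most notably in the area under the precision-recall curve (AUPRC), by as much as 22%. The experimental results show that GNNs can successfully extract the high-dimensional, complex interactions in genetic data. We hope this work will be of substantial help to future research on cancer prognosis prediction.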
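The abstract does not specify the network architecture, so the following is only a minimal sketch of how a graph layer can propagate expression values along a gene interaction network. It assumes a widely used normalized graph-convolution rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), which may differ from the layer used in the thesis. The function name `gcn_layer`, the three-gene toy network, and all numeric values are hypothetical.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    n_in, n_out = len(weights), len(weights[0])
    # Add self-loops so each gene keeps its own expression signal.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalisation keeps highly connected genes from dominating.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate each gene's features with those of its interaction partners.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Hypothetical toy network: gene 1 interacts with genes 0 and 2.
toy_adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
toy_expression = [[1.0], [2.0], [3.0]]  # one made-up expression value per gene
toy_weights = [[1.0]]                   # identity weight, for readability

gene_embeddings = gcn_layer(toy_adjacency, toy_expression, toy_weights)
```

In a bimodal setting, gene embeddings of this kind would typically be pooled and combined with a clinical feature vector before classification. That fusion step is omitted here because the abstract gives no details about it.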

Parallel Abstract


In women, breast cancer is the most common cancer type and a leading cause of cancer-related death. Its risk factors include age, obesity, family history, and exposure to radiation, and every woman faces some risk of developing the disease. When breast cancer is identified early, the survival rate after treatment is 90% or higher, so accurate detection and early treatment are key to combating it. Cancer prognosis is closely related to patients' genomic features, which are high-dimensional in nature. Genetic data usually share three characteristics: high dimensionality, small sample size, and sparsity. These make it challenging to extract informative features from genomic data to predict cancer prognosis, and providing additional information is a reasonable remedy. We therefore introduce gene interaction networks (GINs) to reveal the underlying relationships between genes. Graph neural networks (GNNs) have recently emerged as a deep learning method with strong expressive power in many research areas, such as drug discovery, fraud detection, recommendation systems, and bioinformatics. In this study, we use a systems biology feature selector for dimensionality reduction, selecting from high-dimensional RNA sequencing (RNA-Seq) data 20 prognostic biomarkers considered closely related to breast cancer prognosis. Furthermore, we implement a bimodal graph neural network that helps the model capture the complicated interactions between genes and extract useful information from both RNA-Seq and clinical data. With the help of GINs, our model outperforms all baseline models, most notably in the area under the precision-recall curve (AUPRC), by as much as 22%.
The results demonstrate that our GNN approach can successfully extract high-dimensional, complicated interactions within genomic data. We believe our research can provide crucial insights for future studies on cancer prognosis using genomic data.
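Since the headline result is reported as AUPRC, a short illustration of how that metric can be estimated may help. The sketch below uses the step-wise average-precision estimate: the precision at each correctly ranked positive, weighted by the recall gained there. It assumes binary labels and no tied scores. The function name `average_precision`, the labels, and the scores are hypothetical and are not taken from the thesis experiments.

```python
def average_precision(y_true, y_score):
    """Step-wise estimate of the area under the precision-recall curve."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples from the highest predicted score to the lowest.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Recall rises by 1/n_pos at each positive; weight precision by that step.
            area += (true_pos / rank) / n_pos
    return area


# Made-up labels (1 = poor prognosis) and predicted scores for six patients.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
auprc = average_precision(labels, scores)  # (1 + 2/3 + 1/2) / 3 = 13/18
```

Unlike the area under the ROC curve, the chance level for AUPRC equals the positive-class prevalence (0.5 in this toy example), which is one reason the metric is often preferred for imbalanced outcome data.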
