A Hybrid Analysis Method for Construction of Heterogeneous Network from Multi-Omics Data

Advisor: 莊曜宇
Co-advisor: 蕭自宏 (Tzu-Hung Hsiao)

Abstract


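Tumorigenesis is known to result from the accumulation of alterations across many different layers of biomolecules, so integrated analysis of multiple omics data has attracted increasing attention. However, because these omics data are heterogeneous, large-scale, and mutually correlated, an effective integrative method for systematically exploring the associations between cancer and biomolecules, and the interactions among biomolecules, is still lacking.

In this thesis, we propose an integrative analysis method that combines multiple omics data to systematically identify cancer features associated with specific clinical outcomes, such as drug sensitivity and patient prognosis. These features are then presented as a network in which each node represents a feature and an edge between two nodes indicates that they are correlated. The method also groups highly correlated features into modules, pointing to possible interactions between biomolecules. The method consists of four steps. First, the omics data are collected and normalized, converting the heterogeneous datasets onto appropriate numerical scales. Second, candidate feature nodes are preselected to reduce dimensionality. Third, the least absolute shrinkage and selection operator (Lasso) estimator is used to select the most representative factors, that is, the representative node of each module. Finally, correlation tests identify the remaining factors in each module. To verify the feasibility of the method, we applied it to a series of simulated datasets and to two real datasets, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA).

The simulation results confirmed the feasibility of using Lasso within the integrative method and showed that analyzing multiple omics data simultaneously improves the accuracy of the results. In the CCLE dataset, we constructed a heterogeneous network associated with paclitaxel sensitivity in cancer cell lines, comprising 2,033 nodes from 5 layers of molecular features grouped into 98 modules. One of these nodes is ABCB1, the gene encoding a multidrug resistance transporter; its mRNA expression was associated with paclitaxel resistance and differed across 9 cancer types. We also identified the gene set "MICROTUBULE POLYMERIZATION OR DEPOLYMERIZATION", which is known to participate in microtubule assembly and disassembly, reconfirming the link between paclitaxel, microtubule function, and cell movement. Applying the method to the TCGA dataset, we identified a 266-node heterogeneous network associated with prognosis in colon cancer patients, in which 5 layers of molecular features jointly form 61 modules. The mutation status of the sulfatase-modifying factor gene SUMF1 and of the potassium channel gene KCNK5 were the two factors with the greatest influence on prognosis. Furthermore, DNA copy-number loss on the short arm of chromosome 1 and methylation of CpG sites on chromosome 7, including CpG sites within HOXA13, were closely associated with poorer prognosis. We expect these results to improve our understanding of drug resistance in cancer and of the mechanisms of tumor development and progression.

In summary, the integrative analysis method we developed can be used effectively to construct heterogeneous networks associated with drug response and patient prognosis, and the simulation study and applications to two real datasets validated its robustness and feasibility. We hope the method can be applied to more omics data to help us understand highly heterogeneous cancers.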
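The first two steps of the method (normalizing heterogeneous layers onto comparable scales, then preselecting features to reduce dimensionality) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the thesis implementation: the data layout (`{layer: {feature: values}}`), z-scoring continuous layers while keeping 0/1 mutation calls as indicators, the |Pearson r| cutoff of 0.3, and all function and feature names (`normalize_layers`, `preselect`, `GENE_A`, and so on) are hypothetical, and the toy values are made up.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Standardize one feature across samples (mean 0, population SD 1)."""
    mu, sd = fmean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # a constant feature carries no signal
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson r computed as the mean product of population z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def normalize_layers(layers, continuous_layers):
    """Step 1 (hypothetical sketch): flatten {layer: {feature: values}} into one
    table keyed "layer:feature". Layers in continuous_layers are z-scored; the
    others (e.g. 0/1 mutation calls) are kept as float indicators."""
    table = {}
    for layer, feats in layers.items():
        for name, values in feats.items():
            key = f"{layer}:{name}"
            if layer in continuous_layers:
                table[key] = zscore(values)
            else:
                table[key] = [float(v) for v in values]
    return table


def preselect(table, outcome, min_abs_r=0.3):
    """Step 2 (hypothetical sketch): keep features whose |r| with the outcome
    is at least min_abs_r."""
    return {k: v for k, v in table.items() if abs(pearson(v, outcome)) >= min_abs_r}


# Made-up toy data: six samples, two mRNA features and one mutation feature.
layers = {
    "mRNA": {
        "GENE_A": [5.1, 7.9, 2.2, 9.4, 3.0, 8.1],
        "GENE_B": [1.0, 1.2, 1.1, 0.9, 0.8, 1.0],
    },
    "mutation": {"GENE_C": [0, 1, 0, 1, 0, 1]},
}
drug_response = [0.2, 0.8, 0.1, 0.9, 0.3, 0.7]  # hypothetical outcome per sample

table = normalize_layers(layers, continuous_layers={"mRNA"})
selected = preselect(table, drug_response)
print(sorted(selected))  # ['mRNA:GENE_A', 'mutation:GENE_C']
```

Pearson's r is unchanged by z-scoring, so in this sketch normalization matters mainly for a later penalized step such as Lasso, where features on very different scales would otherwise be penalized unequally.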

Parallel Abstract


It has been established that tumorigenesis results from the accumulation of perturbations across multiple layers of biomolecules. Consequently, there has been growing interest in the integrated analysis of multilayer omics data. Because omics data are high-dimensional, heterogeneous, and interdependent, an effective hybrid-analysis method is needed to systematically explore the associations and interactions between layers, yet such a method is still lacking. In the present study, we aimed to develop a hybrid-analysis method that incorporates multi-omics data to systematically identify omics features related to a specific outcome, such as drug responsiveness or patient prognosis. The identified features are presented as a network in which each node represents a feature and each edge represents a correlation between two features. In addition, the method clusters groups of highly correlated omics features into modules, suggesting putative interactions among biomolecules. The proposed method consists of four steps. First, omics data were collected and normalized to transform each dataset onto an appropriate scale. Second, features of interest were preselected to reduce dimensionality. Third, the least absolute shrinkage and selection operator (Lasso) estimator was applied to identify the representative node of each module. Finally, the remaining members of each module were identified by correlation analyses. To test the feasibility of the method, we conducted a simulation study and applied the method to two public datasets, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA). The simulation study demonstrated the feasibility of using the Lasso estimator in the hybrid-analysis method and suggested that performance improves when all layers of data are integrated simultaneously. Two feature networks were constructed, related to paclitaxel response and survival, respectively.
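Step three, in which Lasso picks a representative node for each module, can be illustrated with a plain cyclic coordinate-descent Lasso for a continuous outcome such as drug response. This is a minimal sketch, assuming standardized feature columns, a centered outcome, no intercept, and a hand-picked penalty `lam`; the abstract does not say how the penalty was chosen or which solver was used, and a survival outcome (as in the TCGA analysis) would need a different loss that this sketch does not cover. Function and feature names are hypothetical and the data are made up.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, max_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for
        min_b (1/(2n)) * ||y - sum_j b_j x_j||^2 + lam * sum_j |b_j|.
    X is a list of feature columns (each of length n), assumed standardized;
    y is assumed centered, so no intercept is fitted."""
    n = len(y)
    beta = [0.0] * len(X)
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(v * v for v in col) / n for col in X]
    for _ in range(max_iter):
        max_delta = 0.0
        for j, col in enumerate(X):
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j^T (partial residual that excludes feature j)
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                # keep the residual in sync with the updated coefficient
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return beta


def select_representatives(names, X, y, lam):
    """Hypothetical helper: features with a nonzero Lasso coefficient."""
    beta = lasso_cd(X, y, lam)
    return [name for name, b in zip(names, beta) if b != 0.0], beta


names = ["GENE_A", "GENE_B", "GENE_C"]
X = [
    [1, 1, -1, -1],          # GENE_A: drives the toy outcome
    [1.4, 0.2, -0.2, -1.4],  # GENE_B: correlated with GENE_A (r = 0.8)
    [1, -1, -1, 1],          # GENE_C: unrelated to the outcome
]
y = [1, 1, -1, -1]  # centered toy outcome, equal to GENE_A

reps, beta = select_representatives(names, X, y, lam=0.1)
print(reps, [round(b, 3) for b in beta])  # ['GENE_A'] [0.9, 0.0, 0.0]
```

Because `GENE_B` is correlated with `GENE_A` but `GENE_A` alone explains the toy outcome, the L1 penalty keeps `GENE_A` and sets `GENE_B` exactly to zero. In the method described above, such a feature is not discarded outright; it is the kind of feature the subsequent correlation step would attach to the representative's module.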
The former network comprised 98 modules formed by 2,033 features from 5 data types. Among them, the expression of ABCB1, which encodes a multidrug transporter, was the factor most relevant to drug resistance and was differentially expressed across nine cancer types. In addition, we identified the gene set "MICROTUBULE POLYMERIZATION OR DEPOLYMERIZATION", which governs the assembly and disassembly of microtubules, reconfirming the link between paclitaxel, microtubule function, and cell movement. In the second network, we identified 266 features that jointly formed 61 modules correlated with the risk of colon cancer. The mutation statuses of sulfatase-modifying factor 1 (SUMF1) and potassium two-pore domain channel subfamily K member 5 (KCNK5) were the two most influential factors. Moreover, loss of chromosome 1p and hypermethylation of multiple CpG loci on chromosome 7, including sites in HOXA13, were associated with poor prognosis. We expect these results to advance the understanding of drug resistance mechanisms and of tumor development and progression. In summary, we developed an effective and robust hybrid-analysis method for investigating multi-omics networks relevant to drug response and prognosis in cancer. Its performance was corroborated by a simulation study and two real datasets. We anticipate that the method can be applied to other omics data and will facilitate the exploration of highly heterogeneous cancers.
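The final step, which turns representative nodes into modules and edges like those in the two networks above, can be sketched as follows: each non-representative feature is linked to the representative it correlates with most strongly, provided the correlation test is significant, and each such link becomes an edge. The abstract does not name the correlation test, so this sketch assumes Pearson correlation with a two-sided Fisher z approximation at a hypothetical threshold `alpha = 0.01`, assigns each feature to at most one module, and uses made-up data and hypothetical names (`build_modules`, `REP_1`, and so on).

```python
import math
from statistics import NormalDist, fmean


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 for a constant input."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def corr_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 via Fisher's z approximation (n > 3)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def build_modules(features, representatives, alpha=0.01):
    """Attach each non-representative feature to the representative it correlates
    with most strongly (by |r|), if that correlation is significant at alpha.
    Returns (modules, edges): modules maps representative -> member list and
    edges is a list of (representative, member, r) tuples."""
    modules = {rep: [] for rep in representatives}
    edges = []
    for name, values in features.items():
        if name in modules:
            continue  # representatives anchor modules; they are not members
        best = None
        for rep in representatives:
            r = pearson(values, features[rep])
            if corr_pvalue(r, len(values)) < alpha and (best is None or abs(r) > abs(best[1])):
                best = (rep, r)
        if best is not None:
            rep, r = best
            modules[rep].append(name)
            edges.append((rep, name, r))
    return modules, edges


# Made-up toy data: eight samples, two representatives chosen in step three.
features = {
    "REP_1": [1, 2, 3, 4, 5, 6, 7, 8],
    "REP_2": [2, 1, 2, 1, 2, 1, 2, 1],
    "MEMBER_1": [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.9, 8.0],  # tracks REP_1
    "MEMBER_2": [1.0, 2.1, 0.9, 2.0, 1.1, 1.9, 1.0, 2.0],  # mirrors REP_2
    "NOISE": [3, 1, 4, 1, 5, 9, 2, 6],
}
modules, edges = build_modules(features, ["REP_1", "REP_2"])
print(modules)  # {'REP_1': ['MEMBER_1'], 'REP_2': ['MEMBER_2']}
```

Here `MEMBER_2` joins its module through a strong negative correlation, and `NOISE` stays unassigned because neither of its correlations with the representatives is significant with only eight samples. Allowing a feature to join several modules would be an equally plausible design; the single-module rule is just a simplifying assumption.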

