
利用整合式生物資訊方法分析動態時間序列與穩定態基因表現微陣列晶片資料

Integrative Bioinformatics Approaches for Dynamic Time Series and Steady State Transcriptome Microarray Data

Advisor: 莊曜宇

Abstract


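Over the past decade or more, microarray technology has been widely used in biological and medical research. Its high-throughput nature not only speeds up the exploration of cellular functions affected by experimental manipulation, but also allows candidate target genes to be picked out quickly from large gene sets for subsequent experimental validation. Faced with such massive data, however, processing it effectively and obtaining accurate analytical results has become an important issue. Many statistical methods and mathematical models have been developed for this purpose in the hope of achieving better analyses. This dissertation comprises four parts, which develop different bioinformatics methods to study two sets of microarray data: gene expression in three human lymphoblastoid cell lines after radiation exposure, and specimens from non-smoking female lung cancer patients in Taiwan.
In the first part, dynamic time series analysis is used to examine whether cell lines with different p53 status respond differently to high-dose and low-dose radiation. Template-based clustering and tight clustering were first used to find differentially expressed genes, and the results show that the three cell lines switch on different signaling pathways after high-dose and low-dose exposure. After 10 Gy irradiation, TK6 activates a p53-driven signaling pathway, whereas WTK1, which lacks functional p53 protein, uses an NFkB-driven signaling pathway. After irradiation at iso-survival doses, the expression of E2F4-related genes decreased in all cell lines regardless of p53 status, so this pathway may play an important regulatory role in the response to low-dose radiation.
In the second part, tumor and adjacent normal tissue samples from 60 patients are used to examine changes in the gene expression profiles of non-smoking female lung cancer patients. A paired t-test first identified 687 genes with significant expression changes in tumor tissue, and these genes are broadly involved in the axon guidance signaling pathway. Comparing these genes with two publicly available lung cancer microarray datasets that also contain paired samples revealed highly similar changes, which indicates that the expression of these 687 genes is indeed altered during lung tumorigenesis. Among these strongly altered genes, both the RNA and the protein expression of SEMA5A were markedly reduced in tumor tissue, and its expression was highly correlated with patient survival. SEMA5A might therefore serve as a new biomarker for non-smoking female lung cancer patients.
In the third part, an integrated study of copy number variations and gene expression is carried out on 42 pairs of samples from non-smoking female lung adenocarcinoma patients. Copy number variation analysis first identified the chromosomal regions in which copy number variations occur frequently, and statistical tests then found 475 differentially expressed genes associated with copy number variation in these regions. Functional analysis identified the signaling pathways in which these differentially expressed genes are broadly involved, including two major cellular regulatory mechanisms: control of cell survival through AKT signaling, and disassembly and assembly of the cytoskeleton. Survival prediction analysis based on these pathways showed effective predictive ability in three independent lung cancer microarray datasets. These genes and pathways, which show both copy number variation and expression changes, might therefore serve as biomarkers of lung tumorigenesis.
In the fourth part, a comprehensive genomic and transcriptomic study is carried out on specimens from 32 pairs of non-smoking female lung adenocarcinoma patients, including single-nucle…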

Parallel Abstract


Microarray technology has been widely used in biological and medical research over the past two decades. Its high-throughput nature facilitates the exploration of cellular functions dysregulated by experimental manipulation and the identification of candidate genes for further validation. However, handling such massive data poses the challenge of performing an efficient and accurate analysis. To address this issue, various statistical algorithms and mathematical models have been developed. In this dissertation, four bioinformatics approaches are presented and applied to two microarray datasets: three human lymphoblastoid cell lines exposed to radiation, and non-smoking female lung cancer patients in Taiwan.
The first approach was a dynamic time series analysis, which compared the radiation-induced effects of higher and lower doses in cells with different p53 status. Template-based clustering and tight clustering were performed to identify differentially expressed genes, and the results showed that distinct signaling pathways were engaged in the three cell lines after 10 Gy and iso-survival radiation exposures. After 10 Gy irradiation, the p53 signaling pathway was triggered in TK6, whereas the NFkB signaling pathway was activated in WTK1, which lacks functional p53 protein. In contrast, irradiation at iso-survival doses down-regulated many E2F4-related genes in all cell lines regardless of p53 status, which indicates that the E2F4 signaling pathway might serve as an important regulator of the response to lower-dose radiation.
The second approach investigated the gene expression profiles of non-smoking female lung cancer patients in Taiwan. This dataset was composed of 60 pairs of tumor and adjacent normal tissue specimens. A paired t-test identified 687 differentially expressed genes in tumor tissue, and these genes were significantly enriched in the axon guidance signaling pathway. Their expression changes were highly similar to those in two public lung cancer datasets that also contain tumor and normal tissue from the same individuals, which supports the conclusion that these dysregulated genes are involved in lung tumorigenesis. Among them, the downregulation of SEMA5A in tumor tissue, at both the transcriptional and translational levels, was associated with poor survival outcomes. The results suggest that SEMA5A might be used as a novel biomarker for non-smoking female lung cancer patients.
In the third approach, concurrent analyses of gene expression and copy number variations (CNVs) were performed on 42 pairs of samples from non-smoking women with lung adenocarcinoma. The results revealed the genomic landscape of regions with recurrent copy number variation and 475 differentially expressed genes associated with CNVs. Among these CNV-driven genes, two important functions were significantly enriched: survival regulation via AKT signaling and cytoskeleton reorganization. Survival analyses based on these enriched pathways gave effective predictions in three independent microarray datasets, which suggests that the identified genes and pathways, with concordant changes in both gene expression and CNV, might be used as prognostic biomarkers for lung tumorigenesis.
In the fourth approach, a comprehensive analysis was conducted on 32 pairs of samples from non-smoking female lung adenocarcinoma patients to investigate SNPs, CNVs, methylation alterations, and gene expression simultaneously. Associated co-varying patterns were observed between genetic modifications and transcriptional dysregulation. Three statistical approaches identified 617 SNP alleles related to CNVs or methylation alterations, and among them, the Kruskal-Wallis test indicated 13 SNPs with downstream gene expression changes. These SNPs, with concordant changes at both the DNA and RNA levels, therefore deserve further research to elucidate their roles in lung cancer.
In conclusion, these four bioinformatics approaches were effective in addressing biomedical questions, and their results could be confirmed in external datasets or biological experiments.
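
The paired t-test used in the second approach can be illustrated with a minimal sketch. It is not the dissertation's pipeline: the data are synthetic, the gene labels are hypothetical, and the cutoff on |t| is an arbitrary assumption standing in for a proper p-value with multiple-testing correction. Only the Python standard library is used, so the code computes the test statistic and leaves p-values aside.

```python
import math
import random
from statistics import mean, stdev


def paired_t_statistic(tumor, normal):
    """Return (t, degrees_of_freedom) for paired tumor/normal measurements."""
    if len(tumor) != len(normal):
        raise ValueError("paired samples must have equal length")
    # The test works on per-patient differences, not on the two groups separately.
    diffs = [t - n for t, n in zip(tumor, normal)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def rank_genes(expression, threshold):
    """Keep genes with |t| >= threshold, sorted by decreasing |t|.

    expression maps a gene label to (tumor_values, normal_values).
    """
    hits = []
    for gene, (tumor, normal) in expression.items():
        t, _ = paired_t_statistic(tumor, normal)
        if abs(t) >= threshold:
            hits.append((gene, t))
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Synthetic log-scale expression values for 60 hypothetical patient pairs.
rng = random.Random(0)
n_pairs = 60
normal_a = [rng.gauss(8.0, 0.5) for _ in range(n_pairs)]
tumor_a = [x - 1.0 + rng.gauss(0.0, 0.3) for x in normal_a]  # shifted down in tumor
normal_b = [rng.gauss(8.0, 0.5) for _ in range(n_pairs)]
tumor_b = [x + rng.gauss(0.0, 0.3) for x in normal_b]  # no systematic shift

demo = {
    "GENE_A_hypothetical": (tumor_a, normal_a),
    "GENE_B_hypothetical": (tumor_b, normal_b),
}

# The cutoff of 5.0 is an illustrative assumption, not a value from the source.
hits = rank_genes(demo, threshold=5.0)
for gene, t in hits:
    print(f"{gene}: t = {t:.2f}")
```

Pairing matters here because each tumor specimen is compared with adjacent normal tissue from the same patient, so between-patient variation cancels out in the differences. A real analysis would typically obtain p-values from a statistics library and correct for the number of genes tested.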

