
利用深度學習建構跨細胞株模型預測增強子之細胞株特異性活性

Predicting cell type-specific enhancer activities by cross-cell type modeling with deep learning

Advisor: 陳倩瑜
Co-advisor: 歐陽彥正 (Yen-Jen Oyang)

Abstract


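Enhancers are an important class of regulatory elements, and many previous studies have shown that they play a key role in assisting promoters to regulate gene expression. At present, the number of enhancers in the human genome and their activities in different cells remain largely unknown. Previous studies have found that enhancer activity correlates with functional data such as histone modifications, sequence features, and chromatin structure and accessibility. In this thesis, we built a deep learning model mainly from DNase and other histone modification data, and used H3K27ac peaks as the enhancer locations in the selected cell types for training and prediction. In addition, this study designed joint training that combines multiple cell types to improve prediction performance. With the proposed deep learning model, accuEnhancer, we demonstrate the feasibility of using a complete feature dataset and deep learning to predict enhancer activity within a single cell line, where accuracy and F1 reach 0.97 and 0.9, respectively. To go further and predict enhancer activity across cell lines, this thesis proposes integrating data from different cell types to improve cross-cell type prediction. As more training data from different cell types were combined, the F1 for predicting an independent cell type rose from 0.3 to 0.80. The results show that by incorporating more cross-cell type datasets, deep learning models can capture complex regulatory patterns and deliver better performance. Finally, this study tested the effectiveness of accuEnhancer in predicting the experimentally validated enhancers in the VISTA database. The results show that accuEnhancer outperforms previous methods in predicting experimentally validated enhancers; this thesis therefore explores the feasibility of cross-cell line, and even cross-species, prediction.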
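The abstract reports both accuracy and F1, and the two can diverge when active enhancers are a small fraction of the genomic regions being scored. The following is a minimal sketch of how the two metrics are computed from binary labels. It is not the thesis's evaluation code: the function name `accuracy_and_f1` and the toy labels are assumptions made for illustration only.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 marks an active enhancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # enhancers correctly called
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # background correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # background called as enhancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # enhancers missed

    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 ignores true negatives entirely
    return accuracy, f1


# Hypothetical toy data: 100 regions, 10 of them true enhancers.
# The predictor recovers 8 enhancers and raises 2 false alarms.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 88

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Because true negatives dominate in this toy set, accuracy stays high while F1 reflects only how well the positive class is recovered, which is why F1 is the more informative figure for cross-cell type comparisons.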

Parallel Abstract


Enhancers are a class of regulatory elements that act as key components in assisting promoters to modulate gene expression in living cells. At present, the number of enhancers in the human genome and their activities in different cell types remain largely unknown. Previous studies have shown that enhancer activity is associated with functional data such as histone modifications, sequence motifs, and chromatin accessibility. This study used DNase data to build a deep learning model that predicts H3K27ac peaks, which are taken as active enhancers. Moreover, this thesis proposes joint training on multiple cell types to boost model performance. The analyses first demonstrate the general feasibility of using accuEnhancer to predict within-cell type enhancer activities, where the accuracy and the F1 score reach 0.97 and 0.9, respectively. To further predict cell type-specific enhancers by cross-cell type modeling, we integrated training data from different cell types. The F1 score increased from 0.3 to 0.80 as the model combined more training data from different cell types. The results demonstrate that, by incorporating more datasets across cell types, deep neural networks can capture complex regulatory patterns and deliver better performance. Lastly, this study tested the effectiveness of the model in predicting experimentally validated enhancers in the VISTA database. The results indicate that accuEnhancer outperforms previous work in predicting cell type-specific enhancer activities by cross-cell type modeling.
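The cross-cell type setup described above, in which data from several cell types are pooled for training and an independent cell type is held out for evaluation, can be sketched as follows. This is an illustrative assumption about the data split only, not the accuEnhancer implementation: the function `joint_training_split`, the cell type names, and the feature values are all hypothetical placeholders.

```python
def joint_training_split(datasets, held_out):
    """Pool every cell type except `held_out` for training; test on `held_out`.

    `datasets` maps a cell type name to a list of (features, label) examples.
    """
    if held_out not in datasets:
        raise KeyError(f"unknown cell type: {held_out}")
    train = [
        example
        for name, examples in datasets.items()
        if name != held_out
        for example in examples
    ]
    test = list(datasets[held_out])
    return train, test


# Hypothetical toy data: each example is ([DNase signal], H3K27ac-derived label).
datasets = {
    "cell_A": [([0.9], 1), ([0.1], 0)],
    "cell_B": [([0.8], 1), ([0.2], 0), ([0.7], 1)],
    "cell_C": [([0.3], 0), ([0.95], 1)],
}

train, test = joint_training_split(datasets, held_out="cell_C")
print(len(train), "training examples;", len(test), "held-out examples")
```

Adding more entries to `datasets` enlarges the pooled training set without touching the held-out cell type, which mirrors the abstract's observation that F1 on an independent cell type improved as more cell types were combined.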

