
開發基於圖卷積網路的單細胞核酸資料插補方法

Development of a scRNA-seq imputation method based on graph convolution networks

Advisor: 陳倩瑜

Abstract


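In recent years, advances in single-cell sequencing have given biologists a powerful tool for studying cell clusters, differential gene expression, and developmental trajectories in biological tissue with greater precision. Conventional sequencing measures many cells in a tissue at once, whereas single-cell sequencing isolates individual cells and sequences each one independently, making it possible to examine gene expression in heterogeneous tissue at unprecedented resolution. However, single-cell sequencing also brings new challenges: because a single cell contains little nucleic acid, the resulting sequencing data contain more noise and missing values, which adversely affects downstream analysis tasks. To address this problem, many tools have been developed over the past several years to impute single-cell sequencing data, yet few methods achieve satisfactory performance across diverse datasets and analyses. This thesis proposes "ERGCN", a data imputation method based on graph neural networks. It first constructs a nearest neighbor graph from the correlation between the gene expression of different cells, and then uses graph convolution to aggregate information from similar cells and reconstruct the gene expression of each individual cell. To verify the effectiveness of the method, this thesis uses several single-cell sequencing datasets and compares the gene expression after processing as well as the effect on cell clustering tasks. In experiments comparing it against seven other imputation methods, ERGCN consistently obtained good clustering results across datasets and with two clustering algorithms.

Parallel Abstract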


In recent years, single-cell RNA sequencing (scRNA-seq) has become a useful tool that enables biological researchers to study cell clusters, differential gene expression, and cell development trajectories. While bulk RNA sequencing measures the expression levels of numerous cells in a tissue simultaneously, scRNA-seq protocols separate the cells and measure the expression of each one individually. Heterogeneous tissue can therefore be studied at single-cell resolution. However, scRNA-seq technology brings new challenges along with new opportunities. Since an individual cell contains only a relatively small amount of RNA, the sequencing data tend to be noisy and to contain more missing values, which usually has a negative impact on downstream analysis. To address this problem, considerable attention has been paid to developing imputation software tools for scRNA-seq data, but few tools consistently achieve satisfactory performance across various types of datasets and analysis tasks. This thesis proposes ERGCN (expression recovery graph convolution network), a novel method based on a graph neural network. The correlation between cell expression profiles is used to construct a nearest neighbor graph. A graph convolution network then extracts information from similar cells and reconstructs gene expression. To verify the effectiveness of ERGCN, this study compares it with seven imputation tools on multiple datasets. The results demonstrate that ERGCN exhibits competitive performance across datasets in recovering expression profiles and in enhancing cell clustering when combined with two clustering algorithms.
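The two steps described in the abstract (a correlation-based nearest neighbor graph, followed by graph convolution over similar cells) can be illustrated with a minimal NumPy sketch. This is not the ERGCN implementation: the function names, the toy matrix, the choice of k, and the use of a single untrained propagation step with symmetric normalization are all assumptions made for illustration. A real graph convolution network would add learned weights and a training objective.

```python
import numpy as np


def knn_graph_from_correlation(expr, k):
    """Build a symmetric k-nearest-neighbor adjacency matrix.

    expr: (cells x genes) expression matrix. Cells with constant
    expression would give NaN correlations and are assumed absent.
    """
    corr = np.corrcoef(expr)            # cell-by-cell Pearson correlation
    np.fill_diagonal(corr, -np.inf)     # a cell is not its own neighbor
    n_cells = expr.shape[0]
    neighbors = np.argsort(-corr, axis=1)[:, :k]   # k most correlated cells
    adj = np.zeros((n_cells, n_cells))
    rows = np.repeat(np.arange(n_cells), k)
    adj[rows, neighbors.ravel()] = 1.0
    return np.maximum(adj, adj.T)       # symmetrize


def graph_conv_smooth(expr, adj):
    """One untrained propagation step: D^-1/2 (A + I) D^-1/2 X."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm_adj @ expr


# Hypothetical toy data: 6 cells x 5 genes, two cell groups.
# The zero at expr[1, 2] stands in for a dropout event.
expr = np.array([
    [5, 0, 3, 8, 1],
    [4, 1, 0, 9, 1],
    [6, 0, 4, 7, 0],
    [0, 6, 1, 0, 7],
    [1, 5, 0, 0, 8],
    [0, 7, 1, 1, 0],
], dtype=float)

adj = knn_graph_from_correlation(expr, k=2)
smoothed = graph_conv_smooth(expr, adj)
print(np.round(smoothed, 2))
```

In this toy case each cell's neighbors fall within its own group, so the zero entry is replaced by a value borrowed from similar cells. Aggregation of this kind is what the graph convolution step relies on, here without any learned parameters.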

Keywords

scRNA-seq; imputation; gene expression; clustering; GNN
