A preliminary study of RNA-seq normalization strategies

Advisor: 謝文萍

Abstract

RNA-seq has become an important technology for sequencing-based transcriptome surveys. To reveal the biological effects of changes in expression level, any measurement of RNA levels should reflect the true relative activities across different genes and different samples. As with every high-throughput platform, RNA-seq data raise issues of normalization. In this study, we consider two well-known studies that propose interesting normalization ideas for RNA-seq data. The first is a re-weighting scheme proposed by Hansen et al. (2010), which corrects the bias introduced by library preparation and random hexamer priming. The second, by Li et al., proposes two normalization models, one based on Poisson regression and one based on Multiple Additive Regression Trees (MART). Both model the variation in read counts using the sequence composition at each position considered. We evaluated the methods on two datasets using several criteria, including the uniformity of the normalized signals and the consistency of the relative expression levels before and after normalization. Our results show that, among the methods compared, the MART model achieves the best signal uniformity after adjustment while preserving the correct relative expression levels.
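As an illustration only, the re-weighting idea for random hexamer priming bias can be sketched in Python. As described by Hansen et al. (2010), each read receives a weight equal to the frequency of its starting heptamer in a "background" set of positions divided by that heptamer's frequency at read starts, so that start heptamers over-represented by priming are down-weighted. This is a minimal sketch under our own assumptions: reads are plain strings with known alignment start positions, the default background window (0-based offsets 23 to 28 inside each read) is an assumed choice, and the function names `heptamer_weights` and `reweighted_start_counts` are hypothetical, not taken from the thesis or from any package.

```python
from collections import Counter


def heptamer_weights(reads, k=7, bg_start=23, bg_end=29):
    """Per-k-mer weights in the spirit of the Hansen et al. (2010) scheme.

    Foreground: frequency of the k-mer at the very start of each read.
    Background: frequency of k-mers starting at 0-based offsets
    bg_start..bg_end-1 inside the reads (ASSUMED window, placed
    downstream of the priming-biased read start).
    Weight for a start k-mer h: p_background(h) / p_foreground(h).
    """
    fg = Counter(read[:k] for read in reads if len(read) >= k)
    bg = Counter()
    for read in reads:
        for start in range(bg_start, bg_end):
            kmer = read[start:start + k]
            if len(kmer) == k:  # skip windows that run past the read end
                bg[kmer] += 1
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    weights = {}
    for kmer, count in fg.items():
        p_fg = count / n_fg
        # A start k-mer never seen in the background gets weight 0.
        p_bg = bg[kmer] / n_bg if n_bg else 0.0
        weights[kmer] = p_bg / p_fg
    return weights


def reweighted_start_counts(read_starts, reads, weights, k=7):
    """Sum read weights at each start position (hypothetical helper)."""
    counts = Counter()
    for pos, read in zip(read_starts, reads):
        counts[pos] += weights.get(read[:k], 0.0)
    return dict(counts)


if __name__ == "__main__":
    # Toy data with k=2 so the arithmetic can be checked by hand.
    reads = ["AAACG", "AACCC", "CCAAA", "CCAAC"]
    w = heptamer_weights(reads, k=2, bg_start=2, bg_end=4)
    print(w)  # {'AA': 0.75, 'CC': 0.5}
    print(reweighted_start_counts([100, 100, 250, 250], reads, w, k=2))
    # {100: 1.5, 250: 1.0}
```

In the toy run, "AA" starts half of the reads but makes up 3/8 of the background k-mers, so its weight is 0.375 / 0.5 = 0.75; "CC" gets 0.25 / 0.5 = 0.5. Summing weights per start position yields a re-weighted coverage track, which is the kind of signal whose uniformity the evaluation above compares.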

Keywords

RNA-seq

References

Barski, A., et al. (2007) High-resolution profiling of histone methylations in the human genome, Cell, 129, 823-837.
Flicek, P. and Birney, E. (2009) Sense from sequence reads: methods for alignment and assembly, Nat Methods, 6, S6-S12.
Friedman, J.H. (2002) Stochastic gradient boosting, Comput Stat Data An, 38, 367-378.
Gilad, Y., Pritchard, J.K. and Thornton, K. (2009) Characterizing natural variation using next-generation sequencing technologies, Trends Genet, 25, 463-471.
Hansen, K.D., Brenner, S.E. and Dudoit, S. (2010) Biases in Illumina transcriptome sequencing caused by random hexamer priming, Nucleic Acids Res, 38, e131.
