
Evaluation of Missing Value Estimation for Microarray Data


Microarray gene expression data contains missing values (MVs). However, some methods for downstream analyses, including some prediction tools, require a complete expression data matrix. Current methods for estimating the MVs include sample mean and K-nearest neighbors (KNN). Whether the accuracy of estimation (imputation) methods depends on the actual gene expression has not been thoroughly investigated. Under this setting, we examine how the accuracy depends on the actual expression level and propose new methods that provide improvements in accuracy relative to the current methods in certain ranges of gene expression. In particular, we propose regression methods, namely multiple imputation via ordinary least squares (OLS) and missing value prediction using partial least squares (PLS).

Mean estimation of MVs ignores the observed correlation structure of the genes and is highly inaccurate. Estimating MVs using KNN, a method which incorporates pairwise gene expression information, provides substantial improvement in accuracy on average. However, the accuracy of KNN across the wide range of observed gene expression is unlikely to be uniform and this is revealed by evaluating accuracy as a function of the expression level.
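The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover the proposed OLS or PLS methods. It only contrasts row-mean imputation with a simple KNN imputation on synthetic data, then reports root mean squared error (RMSE) within quantile bins of the true expression level. The synthetic data model, the function names `mean_impute`, `knn_impute` and `rmse_by_level`, the choice of k = 5, the distance measure, the 5% missing rate and the four bins are all assumptions made for illustration.

```python
import numpy as np

def mean_impute(X):
    """Fill each missing entry with the mean of the observed values of its gene (row)."""
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled

def knn_impute(X, k=5):
    """Fill each missing entry with the average of that sample (column) over the
    k nearest genes. Distance is the root mean squared difference over samples
    observed in both genes (an assumed, simple choice)."""
    filled = X.copy()
    missing = np.isnan(X)
    row_means = np.nanmean(X, axis=1)
    for i in np.where(missing.any(axis=1))[0]:
        both = ~missing[i] & ~missing              # samples observed in gene i and in each other gene
        diff = np.where(both, X - X[i], 0.0)
        n_shared = both.sum(axis=1)
        dist = np.full(X.shape[0], np.inf)
        ok = n_shared > 0
        dist[ok] = np.sqrt((diff[ok] ** 2).sum(axis=1) / n_shared[ok])
        dist[i] = np.inf                           # a gene is not its own neighbor
        for j in np.where(missing[i])[0]:
            cand = np.where(~missing[:, j] & np.isfinite(dist))[0]
            if cand.size == 0:                     # no usable neighbor: fall back to the row mean
                filled[i, j] = row_means[i]
                continue
            nearest = cand[np.argsort(dist[cand])[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled

def rmse_by_level(truth, estimate, mask, n_bins=4):
    """RMSE of the imputed entries within quantile bins of the true expression level."""
    t, e = truth[mask], estimate[mask]
    edges = np.quantile(t, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, n_bins - 1)
    return [float(np.sqrt(np.mean((e[bins == b] - t[bins == b]) ** 2)))
            for b in range(n_bins)]

# Synthetic data (assumption): 10 clusters of 20 co-expressed genes, 20 samples.
rng = np.random.default_rng(0)
n_clusters, genes_per_cluster, n_samples = 10, 20, 20
base = rng.uniform(4.0, 12.0, size=(n_clusters, 1))            # cluster expression level
profile = rng.normal(0.0, 1.0, size=(n_clusters, n_samples))   # shared cluster profile
truth = np.repeat(base + profile, genes_per_cluster, axis=0)
truth = truth + rng.normal(0.0, 0.2, size=truth.shape)         # gene-specific noise

mask = rng.random(truth.shape) < 0.05                          # entries hidden as missing
observed = truth.copy()
observed[mask] = np.nan

mean_filled = mean_impute(observed)
knn_filled = knn_impute(observed, k=5)

rmse_mean = float(np.sqrt(np.mean((mean_filled[mask] - truth[mask]) ** 2)))
rmse_knn = float(np.sqrt(np.mean((knn_filled[mask] - truth[mask]) ** 2)))
print("overall RMSE  mean:", rmse_mean, " KNN:", rmse_knn)
print("RMSE by level mean:", rmse_by_level(truth, mean_filled, mask))
print("RMSE by level KNN: ", rmse_by_level(truth, knn_filled, mask))
```

Binning by the true value is possible here only because the hidden entries are known. With real microarray data the same evaluation requires masking entries that were actually observed, then comparing the estimates against them.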


