BIRS Workshop Lecture Videos
Principal Component Analysis for microbiome data by correcting the measurement errors and sequencing depths
Gu, Hong
Exploratory data analysis methods, such as Principal Component Analysis (PCA), cannot be applied directly to microbiome data because of sampling errors and differences in sequencing depth. Under the assumption of Poisson sampling errors, we study the problem of computing a PCA of the underlying Poisson means or of a nonlinear transformation of the latent Poisson means. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to the log transformation) Poisson means, without any assumptions on the underlying distribution of these means. Furthermore, we incorporate methods for correcting for different exposures or sequencing depths in the data. In addition to identifying the principal components, we also address the non-trivial problem of computing the principal scores in this semiparametric framework. Most previous approaches take a more parametric line, for example the Poisson-log-normal (PLN) model approach. We compare our method with the PLN approach and find that our method is better at identifying the main principal components of the latent log-transformed Poisson means and, as a further major advantage, takes far less time to compute. Comparing the methods on real data, we see that our method also appears to be more robust to outliers than the parametric method.
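To illustrate why the variance estimators need a bias correction, here is a minimal sketch for the untransformed case only. It assumes counts X_ij | Λ_ij ~ Poisson(s_i Λ_ij) with known depths s_i, and uses the law of total variance: the variance of X_ij / s_i exceeds the variance of Λ_ij by E[Λ_ij / s_i], which X_ij / s_i² estimates without bias. The function names `corrected_covariance` and `principal_components` are hypothetical, and this moment-based correction is not claimed to be the estimator presented in the talk; the log-transformed case needs a different correction that is not reproduced here.

```python
import numpy as np


def corrected_covariance(counts, depths):
    """Estimate the covariance of latent Poisson means (untransformed case).

    Assumes counts[i, j] ~ Poisson(depths[i] * lam[i, j]) with known depths.
    This is an illustrative moment correction, not the talk's estimator.
    """
    counts = np.asarray(counts, dtype=float)
    depths = np.asarray(depths, dtype=float)

    # Depth-normalised counts are unbiased for the latent means.
    scaled = counts / depths[:, None]
    naive = np.cov(scaled, rowvar=False)

    # Poisson noise inflates each diagonal entry by E[lam_ij / s_i];
    # counts / depths**2 is an unbiased estimate of that term.
    noise = (counts / depths[:, None] ** 2).mean(axis=0)
    return naive - np.diag(noise)


def principal_components(cov, k):
    """Return the k largest eigenvalues and their eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]
```

Because the correction subtracts from the diagonal, the resulting matrix is not guaranteed to be positive semidefinite in small samples, so small or negative eigenvalues should be interpreted with care.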
Attribution-NonCommercial-NoDerivatives 4.0 International