UBC Undergraduate Research

Batch Correction and Integration of Single-Cell RNA Sequencing Data from Multiple Lymphoma Subtypes

Winata, Helena

Abstract

Single-cell RNA-sequencing (scRNA-seq) is a relatively novel tool that allows the study of complex, heterogeneous systems such as those found in many cancers. However, integrating scRNA-seq datasets with variable biological properties remains a major bioinformatics challenge because of technical biases (batch effects). In lymphoma, the composition of the tumour microenvironment is predictive of clinical outcomes and varies between subtypes. This thesis evaluates four batch-correction algorithms, FastMNN, Seurat, Harmony, and LIGER, in terms of minimizing technical noise while preserving biological signal in classical Hodgkin lymphoma and follicular lymphoma data. Performance is evaluated using UMAP visualization and four assessment metrics: kBET, ARI, ASW, and LISI. Our results reveal that Harmony performs well in terms of batch mixing, as well as in preserving cell purity in larger datasets. Notably, FastMNN ranks higher in cell type preservation but is inferior to Harmony in terms of batch mixing. The complexity of batch-effect assessment is reflected in the incongruent results across metrics. Lastly, the results highlight the importance of understanding the biological aspects of the dataset to prevent over-correction.
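
As an illustration of one of the four metrics named above, the following is a minimal sketch of the adjusted Rand index (ARI), which scores the agreement between two labelings of the same cells. It is not the thesis's implementation: the function name `adjusted_rand_index` and the toy labels are assumptions made for this example. In batch-correction benchmarks, ARI is commonly computed between cluster assignments and cell type labels (higher suggests better cell type preservation) and between cluster assignments and batch labels (values near zero suggest better batch mixing); how the thesis applies it is not stated in this abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    Hypothetical helper for illustration; it follows the standard
    pair-counting definition of ARI.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical

    # Contingency counts: items sharing a (label_a, label_b) combination.
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_a * sum_b / total_pairs
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings have one group
    return (sum_joint - expected) / (maximum - expected)


# Toy example (made-up labels): clusters that match cell types exactly.
cell_types = ["B", "B", "T", "T", "NK", "NK"]
clusters = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(cell_types, clusters))
```

Because ARI is corrected for chance, it can be negative when two labelings agree less than random assignment would, which is one reason it is usually read alongside the other metrics rather than on its own.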

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International