UBC Theses and Dissertations

Evaluating the performance of between-sample heterogeneity identification algorithms in large-scale flow cytometry data analysis

Chen, Yixuan

Abstract

The development and application of machine learning (ML) models for automated flow cytometry (FCM) data analysis require efficient handling of large datasets. Detecting heterogeneity and removing highly redundant information from FCM training sets are crucial for two reasons: doing so decreases the computational time of ML training, and it improves the performance of ML algorithms by reducing overfitting. Our research introduces "flowTypeFilter" and "flowSim," novel computational tools aimed at improving heterogeneity detection and reducing data redundancy in FCM training sets. To address shortcomings in the current state of the art, I contributed to the development and evaluation of flowTypeFilter and to the evaluation and application of flowSim. By optimizing the flowType algorithm into flowTypeFilter, we enhanced cell subset detection in line with the automated gating requirements of the HIPC Project. Additionally, flowSim is designed for near-duplicate detection (NDD) in bivariate FCM images, focusing on the heterogeneity that is crucial for cell population identification. In tests on annotated and co-mixed datasets, following extensive data collection, our algorithms demonstrated high accuracy, sensitivity, and efficiency. Notably, flowSim's filtering flagged 92.6% of half a million entries as redundant and removed them, underscoring the importance of these computational strategies in large-scale FCM analysis.
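The abstract does not say how flowSim measures similarity between bivariate FCM images. As a rough illustration of the general idea of near-duplicate detection, the following minimal sketch bins each two-marker scatter of events into a normalized 2-D histogram, compares histograms by cosine similarity, and greedily keeps an image only if it is not too similar to any image already kept. The binning scheme, the cosine metric, the 0.95 threshold, and the helper names (`density_image`, `cosine_similarity`, `filter_near_duplicates`) are all hypothetical choices for illustration, not flowSim's actual method.

```python
import numpy as np


def density_image(x, y, bins=32, range_=((0.0, 1.0), (0.0, 1.0))):
    """Bin a bivariate scatter of events into a 2-D histogram that sums to 1.

    Hypothetical stand-in for a "bivariate FCM image"; events outside
    range_ are dropped by np.histogram2d.
    """
    hist, _, _ = np.histogram2d(x, y, bins=bins, range=range_)
    total = hist.sum()
    return hist / total if total > 0 else hist


def cosine_similarity(a, b):
    """Cosine similarity between two images, treated as flat vectors."""
    a, b = a.ravel(), b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def filter_near_duplicates(images, threshold=0.95):
    """Greedy filter (assumed, not flowSim's): keep an image only if its
    similarity to every previously kept image is below the threshold."""
    kept = []
    for i, img in enumerate(images):
        if all(cosine_similarity(img, images[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Usage with synthetic data: image 1 is a slightly jittered copy of image 0,
# while image 2 comes from a clearly different population.
rng = np.random.default_rng(0)
base = rng.normal(0.5, 0.1, size=(2, 5000))
jittered = base + rng.normal(0.0, 0.005, size=base.shape)
shifted = rng.normal(0.3, 0.05, size=(2, 5000))

images = [density_image(*base), density_image(*jittered), density_image(*shifted)]
kept = filter_near_duplicates(images, threshold=0.95)
print(kept)
```

In this toy run the jittered copy should be flagged as a near duplicate of the first image and dropped, while the shifted population is kept. A greedy pairwise pass like this costs time proportional to the number of images times the number kept, so at the scale of half a million entries an approximate indexing scheme would likely be needed instead.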

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International