UBC Theses and Dissertations
Evaluating feature selection techniques for identifying a microbial signature in 16S microbiome sequencing data
Nguyen, Thuy Tien
The goal of understanding which microbes are responsible for shifts across environmental and clinical conditions has motivated the development of a growing number of custom feature selection techniques for microbiome data. When choosing a feature selection method, researchers often face the question of whether to apply a phylogenetic approach, yet in many cases it is not possible to know a priori which approach is more suitable. The motivation behind a phylogenetic approach is biological: the features in microbiome data embody an inherent hierarchical structure that may contain signal from trait conservation. Microbiome shifts correlated with host outcome could be driven by groups of taxa that are closely related phylogenetically. As such, techniques that leverage phylogenetic information seem highly fitting. Recent studies have reported promising results suggesting that a phylogenetic approach could be beneficial; however, fewer have sought to thoroughly evaluate the robustness and applicability of the phylogenetic approach across different host outcomes. The current literature offers researchers little clear guidance on when a phylogenetic approach is applicable. In this work, we assessed feature selection methods in order to understand how different classes of methods perform relative to one another on microbiome data. We evaluated whether a phylogenetic approach would be more powerful in finding ground-truth features with phylogenetic correlation structures, and found that non-phylogenetic methods are the best all-around methods both in the presence and absence of strong phylogenetic signal. Some evidence nonetheless indicates that there is still merit in the phylogenetic approach, such as in scenarios where the phylogenetic signal is very strong. Our observations and findings provide insights into strategies for testing for a phylogenetic signal using a combination of techniques.
Attribution-NonCommercial-NoDerivatives 4.0 International