UBC Theses and Dissertations

Linking function and phylogeny in microbiomes using TreeSAPP

Morgan-Lang, Connor

Abstract

Microorganisms within the domains Bacteria and Archaea are the most ancient and abundant forms of life in Earth’s biosphere. Their profound diversity underpins resilient communities that extend into virtually every conceivable niche. Over the past decade, high-throughput DNA sequencing of whole communities has transformed our perception of these branches of life, illuminating uncultivated lineages and conceptually linking microorganisms at various levels of biological organization to global-scale biogeochemical cycles. Early in life’s history, a core set of genes evolved that mediate the transformations of energy and matter driving a wide range of ecosystem services and functions. Yet identifying these functional and phylogenetic anchor genes amid a veritable haystack of sequence information remains a challenging endeavour. Beyond common limitations such as inefficient and overly permissive homology search methods, contemporary annotation tools fail to account for the paltry representation of extant diversity in natural and engineered environments, leading to biased interpretations. To this end, I have developed the Tree-based Sensitive and Accurate Phylogenetic Profiler (TreeSAPP), a Python package for gene-centric analysis of microbial communities. TreeSAPP creates, updates, and leverages structured data objects called reference packages for homology search, phylogenetic placement, and taxonomic assignment of protein sequences. Compared with related tools, TreeSAPP exhibits better classification performance and a broader suite of functions relevant to microbial ecology. I showcase recent improvements to the classification pipeline and introduce a supervised phylogeny partitioning algorithm useful for defining operational protein clusters compatible with common diversity estimation metrics.
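Once query sequences have been assigned to operational protein clusters, the per-cluster counts can be fed to standard alpha-diversity metrics. The sketch below is illustrative only: it is not TreeSAPP's API, it does not implement the partitioning algorithm itself, and the cluster labels (`mcrA_c1` and so on) are hypothetical. It computes the Shannon index and the Gini-Simpson index from cluster abundances.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping empty clusters."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2): chance two random draws differ in cluster."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical assignments of query sequences to operational protein clusters
assignments = ["mcrA_c1", "mcrA_c1", "mcrA_c2", "mcrA_c3", "mcrA_c3", "mcrA_c3"]
counts = list(Counter(assignments).values())
print(f"clusters={len(counts)} "
      f"H'={shannon_index(counts):.3f} "
      f"Simpson={simpson_index(counts):.3f}")
```

Because the metrics consume only abundance counts, any clustering that yields discrete labels per sequence can be summarized this way. That is the sense in which operational clusters are "compatible" with these estimators.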
Example workflows are provided to construct accurate and inclusive reference packages in a principled manner and to quantify marker gene sequences in environmental datasets. Finally, I demonstrate the capabilities of TreeSAPP in a census of methane-cycling and alkane-transforming archaea, revealing expanded ecosystem ranges and support for numerous novel lineages that encode methyl-coenzyme M reductase. The resulting reference packages and data products provide a framework for developing molecular tools with which to probe and enrich for microbial agents driving selected environmental transformations, and for informing gene-centric modeling efforts to predict microbial community responses to environmental perturbation.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International