Open Collections
UBC Undergraduate Research
Phylogenetic classification of long read sequences
Chan, Kevin Chin-Wei
Abstract
TreeSAPP (Tree-based Sensitive and Accurate Protein Profiler) is an analysis pipeline designed to functionally and taxonomically classify protein and nucleotide sequences using marker genes and phylogenetic methods. Currently, TreeSAPP supports short read sequencing data (e.g. Illumina) but not long reads from newer sequencing platforms (e.g. Nanopore). To address this, ten isolate datasets sequenced using Oxford Nanopore Technologies were aligned to reference sequences of five single-copy phylogenetic marker genes. Of the four aligners tested (minimap2, GraphMap, LAST and SNAP), minimap2 performed best when judged by raw and weighted averages of the taxonomic distance of alignments to their optimal placements, which is crucial for phylogenetic inference. Minimap2 was subsequently integrated into the long read workflow of TreeSAPP and tested on the same datasets and a mock community. While the workflow performed well with the isolate datasets, recall was poor with the mock community, suggesting that TreeSAPP’s linear model for taxonomic inference needs improvement or that higher-resolution nucleotide reference packages are required.
Importance
Short read sequencing data pose several challenges for downstream bioinformatic analyses, such as sequencing error, non-uniform coverage of samples, computational time complexity and the difficulty of resolving repetitive regions. With the advent of cost-effective long read sequencing technologies, many of these problems are alleviated through contiguous sequences encoding full-length open reading frames. Despite this benefit, long reads have higher error and insertion/deletion rates than short reads, which could limit their utility in marker gene classification. To resolve this dilemma, TreeSAPP requires a separate workflow for long read sequences.
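The aligner comparison described in the abstract ranks tools by raw and weighted averages of taxonomic distance. The abstract does not say how the weights were chosen, so the following Python sketch is only an illustration under stated assumptions: each alignment carries a taxonomic distance (the number of ranks separating its placement from the optimal one) and a weight, which is assumed here to be the aligned length. The aligner names and all numbers are hypothetical and are not results from the study.

```python
def raw_mean(distances):
    """Unweighted average taxonomic distance across alignments."""
    return sum(distances) / len(distances)


def weighted_mean(distances, weights):
    """Average taxonomic distance where each alignment counts in
    proportion to its weight (assumed here: aligned length)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(d * w for d, w in zip(distances, weights)) / total


# Hypothetical data: one entry per alignment for two made-up aligners.
results = {
    "aligner_a": {"distances": [0, 1, 2, 0], "weights": [1200, 300, 100, 900]},
    "aligner_b": {"distances": [1, 1, 3, 2], "weights": [1000, 800, 200, 500]},
}

# Summarise each aligner by both averages; lower distance is better.
summary = {
    name: (
        raw_mean(r["distances"]),
        weighted_mean(r["distances"], r["weights"]),
    )
    for name, r in results.items()
}

# Pick the aligner with the smallest weighted average distance.
best = min(summary, key=lambda name: summary[name][1])

for name, (raw, weighted) in summary.items():
    print(f"{name}: raw={raw:.2f} weighted={weighted:.2f}")
print("best by weighted mean:", best)
```

Reporting both averages is useful because the raw mean treats every alignment equally, while the weighted mean lets longer (or otherwise more informative) alignments dominate, so the two can rank aligners differently.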
Item Metadata
Title | Phylogenetic classification of long read sequences
Creator | Chan, Kevin Chin-Wei
Date Issued | 2019-04
Language | eng
Date Available | 2019-04-24
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0378444
Peer Review Status | Unreviewed
Scholarly Level | Undergraduate
Aggregated Source Repository | DSpace