Phylogenetic classification of long read sequences
Chan, Kevin Chin-Wei
Abstract
TreeSAPP (Tree-based Sensitive and Accurate Protein Profiler) is an analysis
pipeline designed to functionally and taxonomically classify protein and nucleotide sequences
using marker genes and phylogenetic methods. Currently, TreeSAPP supports short read
sequencing data (e.g. Illumina), but does not support long reads from newer sequencing
platforms (e.g. Nanopore). To identify an aligner suitable for long reads, ten isolate datasets
sequenced using Oxford Nanopore Technologies were aligned to reference sequences of five
single-copy phylogenetic marker genes. Of the four aligners tested (minimap2, GraphMap, LAST
and SNAP), minimap2 performed best when judged by the raw and weighted averages of the
taxonomic distance between alignments and their optimal placements, a measure that is crucial
for phylogenetic inference. Minimap2 was subsequently integrated into the long read workflow
of TreeSAPP and tested on the same datasets and on a mock community. While the workflow
performed well with the isolate datasets, recall was poor with the mock community, suggesting
that TreeSAPP’s linear model for taxonomic inference needs improvement, or that
higher-resolution nucleotide reference packages are required.
Importance
Short read sequencing data pose several challenges for downstream bioinformatic analyses,
such as sequencing error, non-uniform coverage of samples, computational time complexity and
difficulty resolving repetitive regions. With the advent of cost-effective long read
sequencing technologies, many of these problems are alleviated through contiguous sequences
encoding full-length open reading frames. Despite this benefit, long reads have higher error
and insertion/deletion rates than short reads, which may limit their utility in marker gene
classification. To address this trade-off, TreeSAPP requires a separate workflow for long
read sequences.
Item Metadata
| Title | Phylogenetic classification of long read sequences |
| Creator | |
| Date Issued | 2019-04 |
| Genre | |
| Type | |
| Language | eng |
| Series | |
| Date Available | 2019-04-24 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
| DOI | 10.14288/1.0378444 |
| URI | |
| Affiliation | |
| Campus | |
| Peer Review Status | Unreviewed |
| Scholarly Level | Undergraduate |
| Rights URI | |
| Aggregated Source Repository | DSpace |