Open Collections
UBC Theses and Dissertations
Copy number estimation for high-throughput short read shotgun sequencing de novo whole-genome assembly contigs
Lim, Yee Fay
Abstract
High-throughput short shotgun sequencing reads, also known as second-generation sequencing (SGS) reads, continue to be prevalent for de novo whole-genome assembly, whether alone or in combination with long-range information. Knowledge of contig multiplicity (copy number) is acknowledged to improve assembly correctness, contiguity, and coverage for SGS reads. Despite that, a principled, general solution for contig copy number estimation in de novo whole-genome SGS assembly has been unavailable. In the literature, the problem is generally unaddressed or given heuristic treatment. In this work, we introduce a novel, versatile statistically informed contig copy number estimator, based on mixture models, for high-throughput short read shotgun sequencing de novo whole-genome assembly. In particular, this tool targets de Bruijn graph assembly, the dominant paradigm for de novo whole-genome SGS assembly. We show that it performs reliably at resolving multiplicities up to low repeat copy numbers; it is also robust over a range of genome characteristics, sequencing coverage levels, and assembly settings. Moreover, it is far more versatile than the closest existing alternative tools and usually outperforms them, often by a wide margin. At the same time, somewhat reduced though still robust performance in a limited set of experiments using real sequencing data suggests fundamental limitations to its usage of only length and read coverage data; incorporating other types of information, e.g. GC content, may be necessary to improve performance. Our code is publicly available at https://github.com/bcgsc/wgs-copynum-est; we hope this effort will provide a useful reference for similar future work.
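The abstract describes the estimator only as "based on mixture models" over contig length and read coverage, so the following is a minimal illustrative sketch and not the thesis's method or the code in the linked repository. It assumes a Gaussian mixture over per-contig mean coverage in which component k is centred at k times a shared single-copy coverage, fitted by EM; contig length is ignored. The function name `estimate_copy_numbers`, its parameters, and the variance model are all hypothetical.

```python
import math
import statistics


def estimate_copy_numbers(coverages, max_copies=4, iterations=50):
    """Assign a copy number to each contig from its mean read coverage.

    Assumed model (illustrative only): component k is a Gaussian with
    mean k * mu and variance k * var, where mu is the single-copy coverage.
    """
    n = len(coverages)
    ks = list(range(1, max_copies + 1))
    mu = statistics.median(coverages)   # assumes most contigs are single-copy
    var = (0.25 * mu) ** 2              # rough initial spread
    weights = [1.0 / max_copies] * max_copies
    resp = []

    for _ in range(iterations):
        # E-step: responsibility of each copy-number component for each contig.
        resp = []
        for x in coverages:
            logs = []
            for w, k in zip(weights, ks):
                v = k * var
                log_pdf = -0.5 * math.log(2 * math.pi * v) - (x - k * mu) ** 2 / (2 * v)
                logs.append(math.log(w) + log_pdf)
            top = max(logs)
            probs = [math.exp(value - top) for value in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])

        # M-step: closed-form updates for the shared mean, variance and weights.
        expected_copies = sum(r[j] * k for r in resp for j, k in enumerate(ks))
        mu = sum(coverages) / expected_copies
        var = sum(
            r[j] * (x - k * mu) ** 2 / k
            for x, r in zip(coverages, resp)
            for j, k in enumerate(ks)
        ) / n
        var = max(var, 1e-6)            # guard against a collapsed variance
        weights = [max(sum(r[j] for r in resp) / n, 1e-12) for j in range(max_copies)]

    # Report the most probable copy number per contig and the fitted mu.
    copies = [ks[max(range(max_copies), key=lambda j: r[j])] for r in resp]
    return copies, mu
```

Calling `estimate_copy_numbers([29, 31, 30, 61, 90])` returns a list with one copy number per contig together with the fitted single-copy coverage. A practical estimator would also need to handle contig length, coverage bias such as GC content, and model selection for the number of components, all of which this sketch leaves out.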
Item Metadata
Title | Copy number estimation for high-throughput short read shotgun sequencing de novo whole-genome assembly contigs
Creator | Lim, Yee Fay
Publisher | University of British Columbia
Date Issued | 2021
Language | eng
Date Available | 2021-04-22
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-ShareAlike 4.0 International
DOI | 10.14288/1.0396908
Degree Grantor | University of British Columbia
Graduation Date | 2021-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace