Open Collections
UBC Faculty Research and Publications
The utility of SARS-CoV-2 genomic data for informative clustering under different epidemiological scenarios and sampling
Sobkowiak, Benjamin; Haghmaram, Pouya; Prystajecky, Natalie; Zlosnik, James E. A.; Tyson, John R.; Hoang, Linda M. N.; Colijn, Caroline
Abstract
Objectives: Clustering pathogen sequence data is a common practice in epidemiology to gain insights into the genetic diversity and evolutionary relationships among pathogens. Clustering can identify groups of cases with a shared transmission history and common origin, as well as transmission hotspots. Motivated by the experience of clustering SARS-CoV-2 cases using whole genome sequence data during the COVID-19 pandemic to aid public health investigation, we investigated how differences in epidemiology and sampling can influence the composition of the clusters that are identified.
Methods: We performed genomic clustering on simulated SARS-CoV-2 outbreaks produced with different transmission rates and levels of genomic diversity, while varying the proportion of cases sampled.
Results: In single outbreaks with a low transmission rate, decreasing the sampling fraction resulted in multiple, separate clusters being identified where intermediate cases in transmission chains were missed. Outbreaks simulated with a high transmission rate were more robust to changes in the sampling fraction and largely resulted in a single cluster that included all sampled outbreak cases. When considering multiple outbreaks in a sampled jurisdiction seeded by different introductions, low genomic diversity between introduced cases caused outbreaks to be merged into large clusters. If the transmission rate, the sampling fraction, and the diversity between introductions were all low, a combination of the spurious break-up of outbreaks and the linking of closely related cases in different outbreaks resulted in clusters that may appear informative but did not reflect the true underlying population structure. Conversely, genomic clusters matched the true population structure when there was relatively high diversity between introductions and a high transmission rate.
Conclusion: Differences in epidemiology and sampling can impact our ability to identify genomic clusters that describe the underlying population structure. These findings can help to guide recommendations for the use of pathogen clustering in public health investigations.
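The effect of sampling described in the Results can be illustrated with a toy example. The sketch below is not the authors' simulation or clustering pipeline, which the abstract does not specify. It assumes single-linkage clustering with a fixed SNP-distance threshold, and it uses two hand-built toy outbreaks: a linear transmission chain standing in for slow spread, and a star of cases infected by one source standing in for rapid spread. The function names, the threshold of 2 SNPs, and both topologies are hypothetical choices made only for illustration.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count the sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(seqs, threshold):
    """Single-linkage clusters: link two cases if their SNP distance <= threshold.

    This is an assumed clustering rule, not the one used in the paper.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent pointers to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(c) for c in clusters.values())


GENOME_LENGTH = 12
THRESHOLD = 2  # hypothetical SNP cutoff

# Toy chain: each case gains one new SNP relative to its infector,
# so cases i and j differ at |i - j| sites.
chain = {f"c{i}": "T" * i + "A" * (GENOME_LENGTH - i) for i in range(7)}

# Sparse sampling of the chain: only every third case is sequenced.
sparse_chain = {name: chain[name] for name in ("c0", "c3", "c6")}

# Toy star: five cases infected by one unsampled source, each carrying
# a single private SNP, so any two sampled cases differ at 2 sites.
star = {
    f"s{i}": "A" * i + "T" + "A" * (GENOME_LENGTH - i - 1) for i in range(5)
}

full_chain_clusters = threshold_clusters(chain, THRESHOLD)
sparse_chain_clusters = threshold_clusters(sparse_chain, THRESHOLD)
star_clusters = threshold_clusters(star, THRESHOLD)

print(len(full_chain_clusters), len(sparse_chain_clusters), len(star_clusters))
```

Under these assumptions the fully sampled chain forms one cluster, the sparsely sampled chain splits into three singletons because the missing intermediate cases leave gaps wider than the threshold, and the star stays in one cluster even though its source is unsampled.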
Item Metadata
Title | The utility of SARS-CoV-2 genomic data for informative clustering under different epidemiological scenarios and sampling
Publisher | Elsevier
Date Issued | 2023-07-31
Language | eng
Date Available | 2023-12-22
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0438340
Citation | Sobkowiak, B., Haghmaram, P., Prystajecky, N., Zlosnik, J. E. A., Tyson, J., Hoang, L. M. N., & Colijn, C. (2023). The utility of SARS-CoV-2 genomic data for informative clustering under different epidemiological scenarios and sampling. Infection, Genetics and Evolution, 113, 105484.
Publisher DOI | 10.1016/j.meegid.2023.105484
Peer Review Status | Reviewed
Scholarly Level | Faculty; Postdoctoral; Graduate
Aggregated Source Repository | DSpace