Has anyone seen my plasmid? Probing the dark corners of metagenome-assembled genomes

BIRS Workshop Lecture Videos

Beiko, Robert

Description

Metagenomic analyses typically produce millions of short reads, sampled from the entire diversity of genomes present in a particular sample. While direct analysis of these reads can yield useful information about the diversity of microorganisms and functions present, a great deal more can be learned by merging short reads into longer assemblies. Algorithms to reconstruct metagenome-assembled genomes (MAGs) draw on different types of evidence, including the relative abundance of particular reads in a sample and the similarity of "words" of length k (known as k-mers). Reconstruction of MAGs has shed new light on heretofore unknown deep lineages of bacteria, and revealed the degree of diversity of closely related organisms in different habitats. MAGs can also be very useful for the reconstruction of entire metabolic pathways and networks. However, the effectiveness of MAG assembly is not uniform, and stretches of DNA that deviate from the expected frequency or k-mer distribution can be difficult or impossible to assign correctly. This problem is especially acute for unusual constituents of the genome such as plasmids and genomic islands (GIs); since these elements often harbour useful information about antimicrobial resistance and other important pathways, their absence from a MAG can lead to underestimation of their abundance. We assessed the extent of the problem using a simulated 250 base-pair paired-end metagenome of 30 genomes displaying a broad range of GI abundance and numbers of plasmids. Across a range of methods, a median of 66.2% of all chromosomal sequence was binned into MAGs; however, only 23.1% of plasmids and 31.7% of GIs were similarly present in any bin. Performance is even worse when measured as the percentage of GIs and plasmids that were correctly assigned to the same bin as the rest of their source genome (median 32.5% of GIs and 6.9% of plasmids).
These results on a relatively simple simulated community point to (possibly fundamental) limitations of existing methods in assigning exotic elements to their correct source genome. Although further improvements will undoubtedly be realized through better algorithms and statistics, high accuracy may depend on the integration of additional DNA sequencing data, and better use of known reference genomes.
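
As a rough illustration of the k-mer evidence mentioned in the description, the sketch below computes normalized k-mer frequency profiles and compares them with a Euclidean distance. It is not the method evaluated in the talk: the function names `kmer_profile` and `profile_distance`, the choice of k = 4, and the toy sequences are all assumptions made for illustration, and real binning tools typically combine composition with read abundance.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    Hypothetical helper for illustration; k-mers containing other
    characters (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


def profile_distance(p, q):
    """Euclidean distance between two profiles built with the same k."""
    return math.sqrt(sum((p[m] - q[m]) ** 2 for m in p))


# Toy sequences (invented, not real genomes): two GC-rich "chromosomal"
# fragments with similar composition and one AT-rich "plasmid-like" fragment.
chrom_a = "ACGTGCGCGGCC" * 40
chrom_b = "ACGTGCGCGGCC" * 30 + "GCGCGGCCACGT" * 10
plasmid_like = "ATATTTAAATTA" * 40

d_chrom = profile_distance(kmer_profile(chrom_a), kmer_profile(chrom_b))
d_plasmid = profile_distance(kmer_profile(chrom_a), kmer_profile(plasmid_like))
print(d_chrom < d_plasmid)
```

In this toy case the plasmid-like fragment sits far from the chromosomal fragments in composition space, which is the kind of deviation that could leave an element unbinned or assigned to the wrong bin when composition is used as evidence.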

Item Metadata

Title	Has anyone seen my plasmid? Probing the dark corners of metagenome-assembled genomes
Creator	Beiko, Robert
Publisher	Banff International Research Station for Mathematical Innovation and Discovery
Date Issued	2019-09-17T11:38
Extent	41.0 minutes
Subject	Mathematics; Statistics; Biology and other natural sciences; Applied statistics
Type	Moving Image
File Format	video/mp4
Language	eng
Notes	Author affiliation: Dalhousie University
Series	BIRS Workshop Lecture Videos (Banff, Alta)
Date Available	2020-03-16
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0389576
URI	http://hdl.handle.net/2429/73746
Affiliation	Non UBC
Peer Review Status	Unreviewed
Scholarly Level	Faculty
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Item Media

201909171138-Beiko_lrv.mp4 -- 289.08MB
