BIRS Workshop Lecture Videos

Has anyone seen my plasmid? Probing the dark corners of metagenome-assembled genomes
Beiko, Robert

Description

Metagenomic analyses typically produce millions of short reads, sampled from the entire diversity of genomes present in a particular sample. While direct analysis of these reads can yield useful information about the diversity of microorganisms and functions present, a great deal more can be learned by merging short reads into longer assemblies. Algorithms to reconstruct metagenome-assembled genomes (MAGs) draw on different types of evidence, including the relative abundance of particular reads in a sample and the similarity of "words" of length k (known as k-mers). Reconstruction of MAGs has shed new light on heretofore unknown deep lineages of bacteria, and revealed the degree of diversity of closely related organisms in different habitats. MAGs can also be very useful for the reconstruction of entire metabolic pathways and networks. However, the effectiveness of MAG assembly is not uniform, and stretches of DNA that deviate from the expected frequency or k-mer distribution can be difficult or impossible to assign correctly. This problem is especially acute for unusual constituents of the genome such as plasmids and genomic islands (GIs); since these elements often harbour useful information about antimicrobial resistance and other important pathways, their absence from a MAG can lead to underestimation of their abundance. We assessed the extent of the problem using a simulated 250 base-pair paired-end metagenome of 30 genomes displaying a broad range of GI abundance and numbers of plasmids. Across a range of methods, a median of 66.2% of all chromosomal sequence was binned into MAGs; however, only 23.1% of plasmids and 31.7% of GIs were similarly present in any bin. When assessing the percentage of GIs and plasmids that were correctly assigned to the same bin as the rest of their source genome, performance is even worse (median 32.5% of GIs and 6.9% of plasmids).
These results on a relatively simple simulated community point to (possibly fundamental) limitations of existing methods in assigning exotic elements to their correct source genome. Although further improvements will undoubtedly be realized through better algorithms and statistics, high accuracy may depend on the integration of additional DNA sequencing data, and better use of known reference genomes.
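
To make the k-mer evidence mentioned above more concrete, the following is a minimal Python sketch of how a k-mer frequency profile might be computed and compared between sequences. It is an illustration only, not the method used in the talk or by any particular binning tool: the function names (kmer_profile, profile_distance), the choice of k = 4, the Euclidean distance, and the toy sequences are all assumptions made for this example. Real binners typically combine composition with read-abundance (coverage) information and use more careful statistics.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    The result is a list with 4**k entries in a fixed (lexicographic)
    k-mer order. Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def profile_distance(p, q):
    """Euclidean distance between two k-mer profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy, hypothetical sequences: two GC-rich "chromosomal" fragments and
# one AT-rich "plasmid-like" fragment with a different composition.
chromosome_a = "GCGCGGCCGCGC" * 20
chromosome_b = "GCGGCGCCGGCG" * 20
plasmid_like = "ATTATAATTTAA" * 20

d_chrom = profile_distance(kmer_profile(chromosome_a),
                           kmer_profile(chromosome_b))
d_plasmid = profile_distance(kmer_profile(chromosome_a),
                             kmer_profile(plasmid_like))

print(f"chromosome A vs chromosome B: {d_chrom:.3f}")
print(f"chromosome A vs plasmid-like: {d_plasmid:.3f}")
```

In this toy case the plasmid-like fragment lies farther from chromosome A in k-mer space than chromosome B does. A composition-based binner relying on this kind of signal would therefore tend to separate such a fragment from its host genome, which is one intuition for why elements with atypical k-mer distributions are hard to assign.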

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International