Data from: Morphological identification and single-cell genomics of marine diplonemids
Gawryluk, Ryan M. R.; del Campo, Javier; Okamoto, Noriko; Strassert, Jurgen F. H.; Lukes, Julius; Richards, Thomas A.; Worden, Alexandra Z.; Santoro, Alyson E.; Keeling, Patrick J.
Recent global surveys of marine biodiversity have revealed that a group of organisms known as “marine diplonemids” constitutes one of the most abundant and diverse planktonic lineages. Though discovered over a decade ago [2, 3], their potential importance was unrecognized, and our knowledge remains restricted to a single gene amplified from environmental DNA, the 18S rRNA gene (small subunit [SSU]). Here, we use single-cell genomics (SCG) and microscopy to characterize ten marine diplonemids, isolated from a range of depths in the eastern North Pacific Ocean. Phylogenetic analysis confirms that the isolates reflect the entire range of marine diplonemid diversity, and comparisons to environmental SSU surveys show that sequences from the isolates range from rare to superabundant, including the single most common marine diplonemid known. SCG generated a total of ∼915 Mbp of assembled sequence across all ten cells and ∼4,000 protein-coding genes with homologs in the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology database, distributed across categories expected for heterotrophic protists. Models of highly conserved genes indicate a high density of non-canonical introns, lacking conventional GT-AG splice sites. Mapping metagenomic datasets to SCG assemblies reveals virtually no overlap, suggesting that nuclear genomic diversity is too great for representative SCG data to provide meaningful phylogenetic context to metagenomic datasets. This work provides an entry point to the future identification, isolation, and cultivation of these elusive yet ecologically important cells. The high density of nonconventional introns, however, also portends difficulty in generating accurate gene models and highlights the need for the establishment of stable cultures and transcriptomic analyses.

Usage notes
Single-cell genomic scaffolds from 10 'wild-caught' marine diplonemids
FASTA-format single-cell genomic scaffolds of 10 marine diplonemid (protist) cells are presented. Scaffolds were generated with the SPAdes assembler; contaminating sequences were removed, as described in the publication. Each FASTA file is derived from a single cell. Cells are referred to by the numbers used in the publication (i.e., cells 3, 13, 21, 27, 37, 47, 1sb, 4sb, 9sb, 21sb) as no species names exist.
marine_diplonemid_SAGs.zip

Figure S1 (related to Figure 1). Taxon-annotated GC plots demonstrate the effectiveness of our decontamination procedure.
Plots were generated using blobtools (https://github.com/DRL/blobtools) for each SCG assembly before and after decontamination using the megablast/blastx protocol described in Experimental Procedures. Plots are based on megablast queries of the NCBI nt database according to taxonomic Order.
FigS1.pdf
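
As a rough illustration of how the per-cell scaffold files in marine_diplonemid_SAGs.zip might be inspected, the sketch below tallies scaffold count, total length, and GC fraction for each FASTA file in a zip archive. It is a minimal sketch and not part of the published workflow: the function names (`parse_fasta`, `assembly_summary`, `summarize_zip`) are hypothetical, and because the names of the files inside the archive are not stated here, the code simply treats every non-directory archive member as a FASTA file.

```python
import io
import zipfile


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assembly_summary(handle):
    """Return scaffold count, total length (bp), and GC fraction."""
    n_scaffolds = total_bp = gc_bp = 0
    for _, seq in parse_fasta(handle):
        seq = seq.upper()
        n_scaffolds += 1
        total_bp += len(seq)
        gc_bp += seq.count("G") + seq.count("C")
    gc_fraction = gc_bp / total_bp if total_bp else 0.0
    return {"scaffolds": n_scaffolds, "total_bp": total_bp, "gc_fraction": gc_fraction}


def summarize_zip(path_or_file):
    """Summarize every member of a zip archive.

    Assumption: each non-directory member is a plain-text FASTA file.
    """
    results = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="ascii", errors="replace")
                results[name] = assembly_summary(text)
    return results


# Toy input standing in for a real scaffold file.
toy = io.StringIO(">scaffold_1\nGGCC\nAT\n>scaffold_2\nAATT\n")
toy_summary = assembly_summary(toy)
```

With a local copy of the archive, `summarize_zip("marine_diplonemid_SAGs.zip")` would return one summary per file. Note that GC fraction here is computed over all characters, including any N runs in the scaffolds, so it may differ slightly from values reported by dedicated assembly-statistics tools.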