A multi-omics approach to microbial nitrogen and sulfur cycling in the oxygen starved ocean
by
Alyse Kathleen Hawley
B.Sc., Biochemistry, The University of Victoria, 2004

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies
(Microbiology and Immunology)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
September 2018
© Alyse Kathleen Hawley 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:
A multi-omics approach to microbial nitrogen and sulfur cycling in the oxygen starved ocean
Submitted by Alyse K. Hawley in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Microbiology and Immunology

Examining Committee:
Dr. Steven J. Hallam, Microbiology and Immunology
Supervisor
Dr. Philippe Tortell, Earth, Ocean and Atmospheric Sciences
Supervisory Committee Member
Dr. Julian Davies, Microbiology and Immunology
University Examiner
Dr. Rosemary J. Redfield, Zoology
University Examiner
Chair

Additional Supervisory Committee Members:
Dr. William W. Mohn, Microbiology and Immunology
Supervisory Committee Member
Dr. Lindsay Eltis, Microbiology and Immunology
Supervisory Committee Member
Dr. Leonard Foster, Biochemistry and Molecular Biology
Supervisory Committee Member

Abstract
Microbial communities mediate biogeochemical processes of carbon (C), nitrogen (N) and sulfur (S) cycling in the ocean on global scales. Oxygen (O2) availability is a key driver in these processes and shapes microbial community structure and metabolisms. As O2 decreases, microbes utilize alternative terminal electron acceptors, nitrate (NO3−), nitrite, sulfate and carbon dioxide, depleting biologically available nitrogen and producing greenhouse gases nitrous oxide (N2O) and methane (CH4).
Marine oxygen minimum zones (OMZs) are areas of O2-depletion (O2 < 20 µM) in sub-surface waters due to the respiration of organic matter from the surface. In areas of acute O2-depletion or where OMZs contact underlying sediments, hydrogen sulfide (H2S) and CH4 accumulate within OMZ waters, drastically altering microbial community structure and metabolism. In this thesis, I explore microbial cycles along defined gradients of O2, NO3− and H2S in Saanich Inlet, a seasonally anoxic fjord on the coast of British Columbia, Canada. I develop a time-resolved multi-omic dataset consisting of small subunit ribosomal RNA amplicon sequences, single-cell amplified genomes (SAGs), metagenomes, metatranscriptomes and metaproteomes, coupled with geochemical measurements, enabling robust microbial metabolic reconstruction at the individual, population and community levels of organization. Using metaproteomics, I construct a conceptual model of metabolic interactions involving N and S cycling and carbon fixation, forming the basis for a collaborative effort to build a gene-centric numerical model, identifying an unrecognised niche for N2O reduction. Using SAGs from Saanich Inlet, I identify genes for N2O reduction (nosZ) within the dark matter phylum Marinimicrobia, clade SHBH1141, filling the proposed niche of non-denitrifying N2O-reducers. Using globally sourced Marinimicrobia SAGs, I further analyze the energy metabolism and biogeography of several Marinimicrobia clades, revealing roles in C, N and S cycling along eco-thermodynamic gradients throughout the ocean. Finally, I chart the global abundance and distribution of nosZ genes and transcripts within the ocean, identifying previously unappreciated potential sinks for N2O. As OMZs continue to expand and intensify due to climate change, defining metabolic processes and interactions along gradients of O2-depletion becomes increasingly important.
This thesis provides foundational knowledge related to the microbial communities driving coupled biogeochemical cycling in OMZs.

Lay Summary
Communities of microorganisms in the ocean are crucial for cycling carbon (C), nitrogen (N) and sulfur (S), elements essential for life. Climate change is depleting oxygen in much of the ocean, causing microbes to use nitrogen compounds instead, removing nitrogen available to other organisms for growth and producing the greenhouse gas nitrous oxide (N2O). To better understand the impacts of oxygen depletion on C, N and S cycling, I looked at microbial genes, transcripts and proteins along gradients of oxygen depletion in Saanich Inlet, a fjord on the British Columbia coast. With this combined dataset I constructed models for the exchange of nitrogen- and sulfur-based molecules between microbial groups under different oxygen conditions. I further identified microbes that consume N2O and modeled interactions with other microbes involved in this process, improving our collective understanding of how oxygen depletion affects coupled nitrogen, sulfur and carbon cycles in the ocean.

Preface
A number of sections of this work are partly or wholly published, in press or accepted. Copyright licences to all works were obtained and are listed where appropriate.
• Chapter 1: Text was written by Alyse Hawley. Figures were either used from other publications with permission or generated by Alyse Hawley as indicated.
• Chapter 2: Chapter 2 was written by Alyse Hawley with input from Steven Hallam. Saanich Inlet datasets are the result of the hard work of many students, technicians, volunteers and post-doctoral fellows.
Specifically, for the generation of chemical and physical datasets: both Alyse Hawley and Mónica Torres-Beltrán held the position of Chief Scientist on board the Strickland for several years and oversaw sample collection, quality control and data curation. The Chief Scientist position was also held by Steven Hallam, Elena Zaikova, Olena Shevchuk, Craig Miews and Jade Shiller during this time. Sea-going technicians Chris Payne and Laurisa Pakhomova operated the CTD, collected samples for oxygen calibration and salinity, and curated these datasets. Dissolved gas measurements and associated quality control were carried out by David Capelle.
The generation of multi-omic datasets was the hard work of many people. Generation of metagenomic datasets, including DNA extractions, was carried out by Alyse Hawley, Mónica Torres-Beltrán, Melanie Scofield, Payal Sipahimalani, Elena Zaikova, Olena Shevchuk and Steven Hallam. Fosmid libraries were constructed by Steven Hallam and David Walsh. Sequencing, including library production and quality control, was carried out at the Joint Genome Institute (JGI). The generation of metatranscriptomic datasets included extractions with a protocol designed and carried out by Alyse Hawley. cDNA library construction and sequencing, including quality control, were carried out at the JGI. The generation of metaproteomic data included extractions with a protocol designed by Alyse Hawley, with aid from Heather Brewer. Extractions were carried out by Alyse Hawley with aid from Jinshu Yang and Heather Brewer. Generation of peptide spectra was carried out by Heather Brewer at the Environmental Molecular Sciences Laboratory (EMSL) at Pacific Northwest National Laboratory (PNNL), and spectra were mapped to the protein database by Angela Norbeck. Preparation of samples for small subunit ribosomal tag sequencing was carried out by Melanie Scofield and Mónica Torres-Beltrán; sequencing was carried out at the JGI or at Genome Quebec.
Tag sequence quality control and processing were carried out by Mónica Torres-Beltrán and Kishori Konwar. The multi-omics datasets were processed through MetaPathways, designed and built by Niels Hanson, Kishori Konwar and Steven Hallam.
  ◦ Portions of this text and protocols were published in the Methods in Enzymology chapter: Hawley, A. K., Kheirandish, S., Mueller, A., Leung, H. T., Norbeck, A. D., Brewer, H. M., Paša-Tolić, L., Hallam, S. J. 2013. Molecular tools for investigating microbial community structure and function in oxygen-deficient marine waters. Methods in Enzymology, Microbial Metagenomics, Metatranscriptomics, and Metaproteomics 531:305-329.
  ◦ Datasets and descriptions were published in Scientific Data as: Hawley, A. K., Torres-Beltrán, M., Bhatia, M., Zaikova, E., Walsh, D. A., et al. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data 4:170160; and
  ◦ Torres-Beltrán, M., Hawley, A. K., Capelle, D., Zaikova, E., Walsh, D. A., et al. 2017. A compendium of geochemical information from the Saanich Inlet water column. Scientific Data 4:170159.
  ◦ An additional manuscript using metagenomes and single-cell amplified genomes is published in eLife as: Roux, S., Hawley, A. K., Torres-Beltrán, M., Scofield, M., Schwientek, P., Stepanauskas, R., Woyke, T., Hallam, S. J., Sullivan, M. B. 2014. Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell and meta-genomics. eLife 3:e03125.
  ◦ An additional manuscript describing application of the MetaPathways annotation pipeline to metagenomic datasets is published in BMC Genomics as: Hanson, N. W., Konwar, K. M., Hawley, A. K., Altman, T., Karp, P. D., Hallam, S. J. 2014. Metabolic pathways for the whole community. BMC Genomics 15:619.
• Chapter 3: Chapter 3 analysis and writing were carried out by Alyse Hawley with input from Steven Hallam. Sample preparation was carried out by Alyse Hawley and Heather Brewer at EMSL.
Matching of peptide spectra to protein sequences and calculation of the false discovery rate were carried out by Angela Norbeck at EMSL. SSU rRNA tag sequences were sequenced at the JGI and processed by Kishori Konwar. Identification of protein taxonomy and function, as well as calculation of the normalised spectral abundance factor, was carried out by Alyse Hawley.
  ◦ A version of this chapter is published in Proceedings of the National Academy of Sciences as: Hawley, A. K., Brewer, H. M., Norbeck, A. D., Paša-Tolić, L., Hallam, S. J. 2014. Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes. Proceedings of the National Academy of Sciences USA 111:31 11395-11400.
  ◦ An additional manuscript based on these analyses, with further modeling of geochemical and multi-omic datasets, is published in Proceedings of the National Academy of Sciences as: Louca, S., Hawley, A. K., Katsev, S., Torres-Beltrán, M., Bhatia, M. P., Kheirandish, S., Michiels, C., Capelle, D., Lavik, G., Doebeli, M., Crowe, S. A., Hallam, S. J. 2016. Integrating biogeochemistry with multi-omic sequence information in a model oxygen minimum zone. Proceedings of the National Academy of Sciences USA 113:40 E5925-E5933.
• Chapter 4: Chapter 4 analysis and writing were carried out by Alyse Hawley with contributions from Masaru Nobu and Jody Wright and input from Steven Hallam. Collection of samples for single-cell amplified genomes (SAGs) from Saanich Inlet was carried out by Alyse Hawley and Mónica Torres-Beltrán, and from North Eastern Subarctic Pacific Ocean (NESAP) waters by Jody Wright. SAGs from other locations were collected for previous publications as indicated. Sequencing of SAGs from Saanich Inlet was carried out at the Genome Sciences Centre; sequencing of SAGs from NESAP was carried out at the JGI. Assembly and decontamination of SAGs were carried out at the JGI. Genome reduction analysis was carried out by Masaru Nobu.
Phylogenetic analysis and associated figure production were carried out by Jody Wright, Brent Sage and Keith Mewis. Generation of population genome bins was done by Evan Durno. Global metagenome fragment recruitment analysis was carried out by Alyse Hawley, aided by Connor Morgan-Lang. Metabolic analysis was carried out by Masaru Nobu, Alyse Hawley and Jody Wright. Expression analysis and global nosZ distribution analysis were carried out by Alyse Hawley.
  ◦ A version of this chapter is published in Nature Communications as: Hawley, A. K., Nobu, M. K., Wright, J. J., Durno, W. E., Morgan-Lang, C., Sage, B., Schwientek, P., Swan, B. K., Rinke, C., Torres-Beltrán, M., Mewis, K., Liu, W., Stepanauskas, R., Woyke, T., Hallam, S. J. 2017. Diverse Marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic gradients. Nature Communications 8:1507.
• Chapter 5: Chapter 5 analysis and writing were carried out by Alyse Hawley with input from Steven Hallam. Collection of samples for single-cell amplified genomes (SAGs) from Saanich Inlet was carried out by Steven Hallam, Alyse Hawley and Mónica Torres-Beltrán. SAGs were sequenced at the Genome Sciences Centre and assembled and decontaminated by Connor Morgan-Lang. Identification of nitrogen cycling genes in SAGs and global metagenomes, and expression analysis, were carried out by Alyse Hawley. The phylogenetic tree for NosZ was constructed by Connor Morgan-Lang and nitrous oxide measurements were done by David Capelle.
  ◦ A manuscript detailing the N2O dynamics in Saanich Inlet is published in Limnology and Oceanography as: Capelle, D. W., Hawley, A. K., Hallam, S. J., Tortell, P. D. 2018. A multi-year time-series of N2O dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Limnology and Oceanography 63:2 524-539.
None of the work encompassing this dissertation required consultation with the UBC Research Ethics Board.

Table of Contents
Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Global oxygen minimum zones
    1.1.1 OMZ formation and global distribution
    1.1.2 OMZ expansion and intensification
    1.1.3 Redox gradients and redox driven niche partitioning
    1.1.4 OMZ microbial community overview
  1.2 Biogeochemical cycles in OMZs
    1.2.1 Biogeochemical cycling of nitrogen in OMZs
    1.2.2 Biogeochemical cycling of sulfur in OMZs
    1.2.3 Carbon fixation in OMZs
  1.3 Microbes in community
    1.3.1 Co-metabolic interactions in microbial communities
    1.3.2 Using multi-omics to study microbial communities
  1.4 Saanich Inlet as a model OMZ
    1.4.1 Saanich Inlet microbial community
    1.4.2 The SUP05 Gammaproteobacteria group in Saanich Inlet
    1.4.3 Saanich Inlet as a model OMZ system
  1.5 Thesis objectives and overview
2 Methodologies and workflows for generating and processing multi-omic datasets
  2.1 Introduction
  2.2 Sampling and multi-omic dataset overview
  2.3 Establishing water column chemistry
  2.4 Exploring oxygen minimum zone community structure
    2.4.1 Sample collection and DNA extraction for metagenomics and SSU rRNA tag sequencing
    2.4.2 SSU rRNA tag sequencing
    2.4.3 DNA sequencing
    2.4.4 Metagenomic samples
  2.5 Exploring oxygen minimum zone microbial community transcriptome
    2.5.1 Sample collection and RNA extraction
    2.5.2 RNA sequencing
    2.5.3 Metatranscriptomic samples
  2.6 Exploring oxygen minimum zone microbial community proteome
    2.6.1 Sample collection and protein extraction
    2.6.2 Protein sequencing
    2.6.3 Taxonomic binning and visualization of expressed proteins
    2.6.4 Metaproteomic samples
  2.7 Single-cell amplified genomes
  2.8 Annotation of meta-omics datasets by MetaPathways
  2.9 Conclusion and application
3 Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes
  3.1 Introduction
  3.2 Results and Discussion
    3.2.1 Water column chemistry and molecular sampling
    3.2.2 Patterns of redox-driven niche partitioning
    3.2.3 Differential gene expression patterns
    3.2.4 Regulated gene expression
    3.2.5 Metabolic coupling model
  3.3 Conclusions and future implications
  3.4 Methods
    3.4.1 Sample collection
    3.4.2 Environmental DNA extraction, sequencing and assembly
    3.4.3 PCR amplification of SSU rRNA gene for pyrotag sequencing and analysis
    3.4.4 Environmental protein extraction and identification
    3.4.5 Functional and taxonomic assignment of metagenome and metaproteome
    3.4.6 Hierarchical clustering of metaproteomic samples
4 Diverse Marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic gradients
  4.1 Introduction
  4.2 Results and Discussion
    4.2.1 Marinimicrobia single-cell amplified genomes and phylogeny
    4.2.2 Biogeography of Marinimicrobia clades
    4.2.3 Metabolic reconstruction and gene model validation
  4.3 Discussion
  4.4 Methods
    4.4.1 SAG collection, sequencing, assembly, and decontamination
    4.4.2 Phylogenomic analysis of SAGs
    4.4.3 Metagenome fragment recruitment
    4.4.4 Saanich Inlet and NESAP metagenomes and metatranscriptomes
    4.4.5 Marinimicrobia genome streamlining
    4.4.6 Annotation and identification of metabolic genes of interest
    4.4.7 Gene expression mapping
    4.4.8 Global distribution and expression of nosZ
5 A niche for NosZ?
  5.1 Introduction
  5.2 Results
    5.2.1 Inventory of single-cell amplified genomes
    5.2.2 Clustering & genomic neighbourhood analysis
    5.2.3 NosZ phylogeny and abundance
    5.2.4 nosZ global distribution
    5.2.5 NosZ time resolved multi-omic dynamics in Saanich Inlet
    5.2.6 nosZ global niches
    5.2.7 Completing the tree with additional clades
  5.3 Discussion
  5.4 Conclusions
  5.5 Methods
    5.5.1 Single-cell amplified genome collection, sequencing and annotation
    5.5.2 Multi-omics datasets
    5.5.3 Identification and clustering of nosZ sequences
    5.5.4 Generation of NosZ phylogenetic tree
    5.5.5 Gene, transcript and protein abundance mapping
    5.5.6 Denitrification and anammox rate measurements
6 Conclusions
  6.1 Advantages and limitations with multi-omics approaches
  6.2 Methodological and analytical developments
    6.2.1 Field work and sampling
    6.2.2 Analysis and methodologies
  6.3 Expanding the Saanich Inlet model to global OMZs
    6.3.1 Questions about ecology and global implications
    6.3.2 SUP05 sub-clade metabolism, population dynamics and biogeography
  6.4 Themes in microbial interactions along eco-thermodynamic gradients
  6.5 Closing
Bibliography
Appendices
A Chapter 2: Supplementary material
  A.1 RNA extraction and isolation protocol
  A.2 Protein extraction and isolation protocol
  A.3 Protein sequencing protocol
  A.4 Taxonomic binning and visualization of expressed proteins
B Chapter 3: Supplementary material
C Chapter 4: Supplementary material
D Chapter 5: Supplementary material

List of Tables
2.1 Physical and chemical parameters and protocols
4.1 Genomic features of Marinimicrobia SAGs
4.2 Genomic features of Marinimicrobia population genomes
A.1 Metagenome inventory
A.2 Metatranscriptome inventory
A.3 Metaproteome inventory
B.1 Number of detected peptides and proteins
B.2 Taxonomic breakdown for April 2008 metagenome
B.3 Taxonomic breakdown for September 2009 metaproteome
B.4 Protein naming key
C.1 Metagenome inventory for global fragment recruitment
C.2 Metagenome fragment recruitment summary
C.3 Genomic features of Marinimicrobia population genome bin
C.4 Summary of central metabolism in Marinimicrobia lineages
D.1 Summary of CheckM statistics for SAGs with taxonomies containing nosZ
D.2 Metagenome and metatranscriptome RPKM for clades and nodes by chemistry
D.3 Total clade abundance and expression

List of Figures
1.1 Global OMZ distribution
1.2 OMZ types
1.3 OMZ expansion in North Eastern Subarctic Pacific
1.4 Electron tower
1.5 Nitrogen cycle
1.6 Saanich Inlet topology and bathymetry
2.1 Multi-omics sampling overview
2.2 SSU rRNA tag sequencing validation
2.3 Metagenomic sequencing validation
2.4 RNA sequencing validation
2.5 Protein validation
2.6 Least common ancestor distribution of detected proteins
2.7 Single-cell amplified genomes
2.8 Nitrogen cycling pathways in MetaPathways
3.1 Saanich Inlet bathymetry and chemistry
3.2 Sample hierarchical clustering
3.3 Taxonomic distribution of metagenome, metaproteome and pyrotags
3.4 Nitrogen, sulfur and carbon cycling proteins
3.5 SUP05 gene expression regulation
3.6 Metabolic model
4.1 Marinimicrobia tree, clade mapping to electron tower
4.2 Phylogeny and electron donors of Marinimicrobia and biogeographic distribution
4.3 Energy metabolism of Marinimicrobia population genome bins
4.4 Expression of selected Marinimicrobia energy metabolism genes
4.5 Proposed co-metabolic model along eco-thermodynamic gradients
5.1 Saanich Inlet SAG inventory and taxonomy
5.2 Genome neighbourhood analysis for nosZ containing SAGs
5.3 NosZ phylogenetic tree with global abundance and expression
5.4 Abundance and expression of nosZ in global systems
5.5 Peruvian and ETSP nosZ clade distribution
5.6 Saanich Inlet time series chemical profiles and nosZ multi-omic dynamics
5.7 Metatranscriptome expression dynamics of nosZ subclades in Saanich Inlet
5.8 Saanich Inlet denitrification and anammox rates
5.9 nosZ clade distribution and expression along chemical gradients
5.10 NosZ tree with additional environmental sequences
6.1 Motifs for metabolic interactions
B.1 Detected nitrogen cycling proteins
B.2 Detected sulfur and hydrogen cycling proteins
B.3 Detected proteins in carbon fixation pathways
C.1 Genomic streamlining in Marinimicrobia clades
C.2 Marinimicrobia phylogenomic tree
C.3 Global prevalence of Marinimicrobia in surveyed metagenomes
C.4 Saanich Inlet water column chemistry
C.5 Origin, length and abundance of contigs in population genomes
C.6 Expression of energy metabolism enzyme subunits
C.7 Marinimicrobia nosZ genes and expression in Saanich Inlet time series
C.8 Differential expression of enzymes involved in electron transfer
C.9 Energy metabolism summary and operons
D.1 SUP05 phylogenetic tree
D.2 Proportions of nosZ clades in Saanich Inlet metagenome and metatranscriptome
D.3 Proportions of nosZ subclades in metatranscriptome
D.4 Abundance of nosZ clades for Knorr Cruise
D.5 Abundance of nosZ clades along TARA Oceans cruise track

Acknowledgements
First of all I would like to thank my supervisor Dr. Steven Hallam and acknowledge the chance he took in bringing me into the lab, and thank him for always pushing me to do better, to think deeply about detail and to think big about global processes. Thank you for introducing me to microbial ecology; you have changed the way I see the world and I would never go back. Mostly I would like to thank you, Dr. Hallam, for a challenging and rewarding 10 years of working with you! I would like to thank my committee, Drs. Philippe Tortell, Bill Mohn, Leonard Foster and Lindsay Eltis, for their support and input, and in particular Dr. Philippe Tortell for my introduction to oceanography - it has begun quite a love affair! I would like to thank all of the members of the Hallam lab, past and present, for their unwavering support, both emotional and logistical. Mónica Torres-Beltrán, thank you for being my partner in crime and my comrade in Saanich; you have been a tremendous support and source of insight! Maya Bhatia, thank you for your advice and support and a kick in the butt when I needed one. Elena Zaikova and Esther Gies, thank you for your friendship and support and willingness to go to Saanich and help with unending field work. Evan Durno and Connor Morgan-Lang, thank you for writing and running so much code for me and for your willingness to help me any time I needed it! Aria Hahn, thank you for consultations on data processing and visualization. I would also like to thank all of those who have worked in Saanich Inlet over the years with me, both as chief scientists and as support. I would like to thank Chris Payne and Laura Pakhomovitch for their hard work and accompaniment in Saanich. And to Captain Ken and the Strickland crew, thank you for many fond memories of Saanich Inlet.
I would also like to thank my collaborators at the Joint Genome Institute for sequencing so many samples, and at the Environmental Sciences Laboratory at Pacific Northwest National Labs for support in proteomics; none of this work would have been possible without you. I also thank my parents, Kathy and Jan, for endless encouragement and support, and my in-laws, Doreen and Ken, for both financial support and child care. Lastly, I would also like to acknowledge the sacrifices of my husband, Michael Hawley, and my family over the past years as I dedicated so much time to this work. Michael, it would not have happened without all of your support.

Dedication

To Mike who reminds me to be who I am
To Avery and Riley who remind me of what is to come
To my parents who supported me to become the person I am

Chapter 1

Introduction

'When we try to pick out anything by itself, we find it hitched to everything else in the universe.'
John Muir (1911)

1.1 Global oxygen minimum zones

Oceans occupy over two-thirds of the planet's surface and host a vast diversity of microbial life. The metabolic activity of this microbial life is responsible for major global biogeochemical processes such as oxygen (O2) production, carbon fixation (or primary production), and the cycling of nitrogen, sulfur, phosphorus and many other elements and nutrients. Most of the ocean is moderately oxygenated (100-200 µM) [1], providing a ready terminal electron acceptor for the breakdown and respiration of the large amounts of organic matter produced by primary production in the surface waters. However, ~7% of the global ocean is depleted in oxygen [2, 3], with concentrations dropping below 20 µM. Areas of O2-depletion, called oxygen minimum zones (OMZs), host unique microbial communities with metabolic activity adapted to thrive in an O2-depleted environment [2, 4].
In OMZs, O2-depletion shifts the microbial communities and their metabolisms to the use of alternative terminal electron acceptors such as nitrate (NO3−), nitrite (NO2−) and carbon dioxide (CO2) [5], changing the biogeochemical processes and resulting in the production of the greenhouse gases nitrous oxide (N2O) and methane (CH4) and the loss of biologically available nitrogen [6], with implications for global climate and nutrient availability.

1.1.1 OMZ formation and global distribution

Oxygen minimum zones form in subsurface waters, resulting from a combination of respiration of organic matter from primary production in surface waters and restricted ventilation or mixing, which prevents re-oxygenation at the surface [2, 7]. Oxygen minimum zones most often occur on western continental margins (Figure 1.1). As winds blow toward the equator along western coasts, the Coriolis effect moves the surface waters away from the coast, causing deeper, nutrient-rich (nitrate, phosphate, silicate) waters to move upward in a process called upwelling. Once exposed to sunlight, these nutrient-rich waters support rapid photosynthesis by phytoplankton, causing a high influx of organic matter in the surface waters. As phytoplankton die off, the organic matter rains down into the subsurface waters, where heterotrophic bacteria degrade it, respiring the O2 present. A lack of ventilation in the subsurface waters prevents re-oxygenation, resulting in OMZ formation [6, 7]. In the open ocean, OMZs form in a similar manner, but with little upwelling to fuel rapid photosynthesis, O2-depletion is less intense (Figure 1.2).

Figure 1.1: Global OMZ distribution.
Global distribution of OMZs including: Northeastern Subarctic Pacific Ocean (NESAP), Saanich Inlet (SI), Eastern Tropical North Pacific (ETNP), Cariaco Basin (CB), Baltic and Black Seas, Hawaii Ocean Time-series (HOT), Eastern Tropical South Pacific (ETSP), Peruvian, Chilean, Namibian (NAM), Arabian and Bay of Bengal (BB). Oxygen concentration shown at depth of minimum oxygen concentration. Figure was modified from Wright et al. 2012.

The global distribution of OMZs is substantial (Figure 1.1), with major OMZs including open ocean OMZs (Figure 1.2A): North Eastern Subarctic Pacific (NESAP), Eastern Tropical North Pacific (ETNP), Eastern Tropical South Pacific (ETSP) and the Bay of Bengal; and coastal and semi-enclosed inlet and basin OMZs: Saanich Inlet (SI), Peruvian and Chilean upwelling systems, Baltic Sea, Black Sea, Namibian shelf and Cariaco Basin. In semi-enclosed inlets, basins and coastal areas, geography restricts mixing and promotes a stratified water column with oxygenated surface waters and O2-depleted bottom waters [8]. Coastal systems can also experience localized eutrophication (run-off of nutrients from terrestrial sources), which, like upwelling, delivers large amounts of nutrients [5, 7]. Within inlets and basins, and occasionally in OMZs in close proximity to the shore, OMZ waters can contact the underlying sediments. Sufficient O2-depletion in these areas can allow reduced compounds, such as hydrogen sulfide (H2S) and CH4, to efflux from the sediments and accumulate in the overlying water column, creating a highly reduced environment termed a sulfidic OMZ (Figure 1.2C). These sulfidic OMZs are often characterized by steep gradients of O2, NO3− and H2S occurring over a few to tens of meters within the water column [4, 9].

Figure 1.2: OMZ types. Characteristic chemical profiles in OMZs. (A) Open-ocean OMZ. (B) Anoxic OMZ. (C) Sulfidic basin. Figure from Pelagic Oxygen minimum zone microbial communities. 2013.
By Ulloa, O., Wright, J.J., Belmar, L., Hallam, S.J.

1.1.2 OMZ expansion and intensification

Currently, climate change and other anthropogenic forces are causing an expansion and intensification of OMZs globally [3, 7, 10, 11]. Rising ocean surface temperature increases stratification of the surface water column, leading to decreased mixing and consequently to an intensification of O2-depletion as well as shallower OMZ depths [3]. Indeed, over the past 60 years O2 concentrations in the NESAP have dropped 22% [10] (Figure 1.3). Recent modeling efforts forecast intensification of O2-depletion globally in the coming decades [11]. As OMZs play key roles in the global nitrogen and carbon cycles [12, 13] and in greenhouse gas production [6], there is a pressing need to understand the biogeochemical cycles in OMZs and the contributions of their microbial metabolisms to global processes, including greenhouse gas production and carbon sequestration.

Figure 1.3: OMZ expansion in North Eastern Subarctic Pacific. Oxygen concentration at Ocean Station Papa (50.1°N, 144.9°W) over the past 60 years.

1.1.3 Redox gradients and redox driven niche partitioning

In OMZs, as O2 is depleted, microbes begin to respire NO3−, producing NO2− (and potentially N2O and N2) [12]. Gradients of O2, NO3− and NO2− (Figure 1.2) form an overall redox gradient that shapes the microbial community and associated metabolic activities [4]. Furthermore, in sulfidic OMZs the redox gradient extends into the more highly reduced sulfidic environment, providing additional niches for the microbial community and associated metabolic activities. As terminal electron acceptors are depleted in sequence from O2 through NO3−, NO2− and SO42− to CO2 in respiratory processes (Figure 1.4), O2-depletion levels can be classified as oxic (>90 µM O2), dysoxic (20-90 µM O2), suboxic (1-20 µM O2), anoxic (<1 µM O2) and sulfidic. These classifications provide a framework for discussing microbial ecology within OMZ systems.
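The O2-depletion classes listed above can be written as a simple lookup. The Python sketch below is illustrative only and is not part of the thesis workflow. The function name `classify_redox_zone` is hypothetical. The treatment of values that fall exactly on a boundary is an assumption, because the stated ranges share endpoints. Treating any detectable H2S as "sulfidic" is also an assumption, since no sulfide threshold is given in the text.

```python
def classify_redox_zone(o2_um: float, h2s_um: float = 0.0) -> str:
    """Classify a water sample by dissolved O2 (µM) and H2S (µM).

    Thresholds follow the classes in the text: oxic (>90), dysoxic (20-90),
    suboxic (1-20), anoxic (<1) and sulfidic.
    Assumptions: boundary values 20 and 90 are assigned to 'dysoxic',
    1 is assigned to 'suboxic', and any H2S above zero means 'sulfidic'.
    """
    if o2_um < 0 or h2s_um < 0:
        raise ValueError("concentrations must be non-negative")
    if h2s_um > 0:
        return "sulfidic"
    if o2_um > 90:
        return "oxic"
    if o2_um >= 20:
        return "dysoxic"
    if o2_um >= 1:
        return "suboxic"
    return "anoxic"


# Hypothetical depth profile: (depth in m, O2 in µM, H2S in µM)
profile = [(10, 250.0, 0.0), (100, 45.0, 0.0), (135, 5.0, 0.0),
           (150, 0.2, 0.0), (200, 0.0, 8.0)]
zones = [(depth, classify_redox_zone(o2, h2s)) for depth, o2, h2s in profile]
```

Applied to a measured depth profile, a lookup like this labels each sample with its redox class, which is one way to group multi-omic samples along the gradient.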
Redox gradients provide conditions for redox driven niche partitioning, in which microbes partition into niches along the gradient according to their metabolic capabilities [2]. Notably, the presence of reduced sulfur compounds such as H2S, as well as CH4, can provide additional electron donors and additional niches for chemotrophic metabolisms. While O2, NO3−, NO2− and other chemical parameters can be easily measured, characterizing the microbial community and its associated metabolic activities along these gradients requires more involved sampling and analysis. While several of the more abundant microbial groups along OMZ redox gradients are known, their metabolic capabilities and thresholds (e.g. the O2 concentrations at which NO3− respiration takes over or N2O production occurs) remain undetermined for the majority.

Figure 1.4: Electron tower. Electron potentials (E′0, in V) of various redox couples possible in OMZs. Figure from Lam and Kuypers 2011.

Particle associated niches

While marine waters, including OMZs, are generally considered a homogeneous mixture, micro-environments can exist on particles formed from the dead matter of phytoplankton (or detritus) and fecal pellets from larger organisms such as copepods. These micro-environments provide additional sites of redox gradients, as microbial respiration within the particle depletes O2 and diffusion on this small scale cannot re-supply O2 to internal spaces. As such, particles develop internal redox gradients with anoxic and even sulfidic micro-environments [2, 14]. Within OMZs, the decreased O2 concentrations in the bulk water column leave even less O2 available for diffusion into particles, resulting in strong redox gradients within the particles and potentially larger shifts in the particle associated microbial community and metabolic activity.
While some research has been carried out on the particle associated microbial community [14–16], technical challenges around isolating particles remain substantial, and the contribution of particle associated microbial communities to biogeochemical processes on both local and global scales remains an open question.

1.1.4 OMZ microbial community overview

Surveys of the small subunit ribosomal RNA (SSU rRNA) gene in OMZs have shown the partitioning of similar microbial groups (defined at varying degrees between sub-phylum and genus) along defined redox gradients [2, 9, 17, 18]. These results have been detailed in Wright et al. 2013 and are summarised below. As with many natural environments, numerous microbial groups remain uncultured and are known only on the basis of SSU rRNA gene sequences. These so-called microbial dark matter (MDM) groups [19] represent unknown metabolic potential and interact with known taxa to define the metabolic networks that drive nutrient and energy cycling along OMZ redox gradients.

Prevalent sequences found in oxygenated waters overlying OMZs include: Alphaproteobacteria affiliated with SAR11, Betaproteobacteria from the order Methylophilales, Gammaproteobacteria affiliated with SAR86 and Arctic96B-1, Actinobacteria affiliated with OM1, Bacteroidetes affiliated with the genus Polaribacter, Arctic96A-17, and Cyanobacteria. These groups are largely heterotrophic, remineralizing organic matter from surface waters. Members of the SAR11 clade are among the most abundant bacteria in the ocean [20] and have multiple cultured sub-clades, with representatives typically exhibiting reduced genome sizes (around 1 Mb).
With highly streamlined genomes, the metabolic potential of SAR11 clades can vary significantly [21], with genomic evidence for phototrophy via bacteriorhodopsin [22], as well as for one-carbon metabolism [23] and the transformation of various sulfur compounds [24].

Prevalent sequences found in dysoxic and suboxic OMZ waters include: Alphaproteobacteria affiliated with SAR11 (with distinct SAR11 clades in oxic versus dysoxic waters), Gammaproteobacteria affiliated with agg47, ZD0417, ZA3412c and Arctic96BD-19, Deltaproteobacteria affiliated with SAR324 and Nitrospina, Actinobacteria affiliated with Microthrixinea, Planctomycetes, Chloroflexi, Verrucomicrobia and Marine Group A. In addition to these bacterial groups, Archaea affiliated with Thaumarchaeota are also present. These groups are a mixture of heterotrophs and chemoautotrophs and represent a transition toward a capacity for anaerobic metabolisms. For example, within dysoxic water, genomic bins affiliated with SAR11 harbour genes involved in nitrate reduction, potentially contributing to nitrogen loss within OMZs [25]. Similarly, both Arctic96BD-19 and SAR324 have been implicated in sulfur cycling processes [26, 27]. Several other groups play different roles in the nitrogen cycle: Nitrospina carry out nitrite oxidation [28], Planctomycetes carry out anammox [29], and Thaumarchaeota carry out ammonia oxidation [30]. The candidate phylum Marine Group A (also known as Marinimicrobia), while numerically abundant in places like the NESAP OMZ [31], remains enigmatic, although limited genomic information points to a role in sulfur cycling [32].

Prevalent sequences found in suboxic, anoxic and sulfidic OMZ waters are primarily affiliated with SUP05 within the Gammaproteobacteria.
Additional sequences found in anoxic and sulfidic OMZ waters include: Deltaproteobacteria affiliated with Desulfobacteraceae, Epsilonproteobacteria affiliated with Arcobacter, Bacteroidetes affiliated with VC21_Bac22, various Gemmatimonadetes and Lentisphaerae, and the MDM phyla Marine Group A, OD1 and OP11. Genomic evidence suggests that many of these groups are chemoautotrophic, with the potential to couple different aspects of the carbon, nitrogen and sulfur cycles. SUP05 has been implicated in partial denitrification coupled to sulfide oxidation, using the resulting energy to drive dark carbon fixation [33, 34]. A recent study in the Peruvian upwelling system identified a complete denitrification pathway in a metagenome-assembled genome (MAG) [35] assigned to SUP05, indicating the potential for functional specialization within the clade [36]. Similar to SUP05, Arcobacter have been implicated in denitrification coupled to sulfide oxidation, although the two groups appear to occupy different energetic niches in OMZs [37].

1.2 Biogeochemical cycles in OMZs

Biogeochemical cycles are the biological, geological and chemical processes responsible for moving elements through both biotic and abiotic systems, recycling elements for availability to living organisms. Microorganisms play a significant role in many of these cycles, as their metabolic activities drive biogeochemical cycles [38]. Primary biogeochemical cycles of interest include carbon, nitrogen, sulfur and phosphorus, as these elements are essential in biological molecules such as nucleic acids, proteins and lipids. Many biogeochemical cycles are carried out through multi-step processes involving multiple guilds of organisms, each with a specialised role within the cycle.
Within OMZs, the O2-depleted and anoxic environments provide essential niches for O2-sensitive enzymatic reactions and reactions involving alternative electron acceptors. The microbial communities present along the redox gradients in OMZs play essential roles in biogeochemical cycles, particularly in the nitrogen and sulfur cycles, as well as the carbon cycle.

1.2.1 Biogeochemical cycling of nitrogen in OMZs

Nitrogen is an essential nutrient, integral to DNA, RNA and protein, and thus essential for cell growth and carbon fixation. However, the majority of the Earth's nitrogen exists as inert dinitrogen (N2) gas, which must be 'fixed' into biologically available nitrogen species such as ammonium (NH4+), a process carried out only by specific clades of microbes found in relatively low abundances [39] (as well as by industrial chemical processes). All other organisms then obtain their nitrogen by taking up these fixed nitrogen species, NH4+ or NO3− and NO2−, directly or through heterotrophy of organic matter. Within OMZs, biologically available nitrogen species play additional roles as electron acceptors (NO3− and NO2−) and donors (NH4+) that directly fuel energy metabolism via dissimilatory processes. Nitrogen based energy metabolisms include the aerobic processes of ammonia oxidation and nitrification and the anaerobic processes of denitrification, anaerobic ammonium oxidation (or anammox) and dissimilatory nitrate reduction to ammonium (DNRA) (Figure 1.5). The anaerobic processes of denitrification and anammox are termed nitrogen loss processes, as they remove biologically available nitrogen from the environment, completing the cycle by returning nitrogen to the atmosphere as N2 or N2O. In any given system there must be enough biologically available nitrogen to support growth: while nitrogen loss is a key process in completing the nitrogen cycle, too much nitrogen loss will limit growth and primary production.
Generally, the availability of O2 and of other electron donors and acceptors within a given niche is the predominant influence on the occurrence of different nitrogen based energy metabolisms, but much remains to be understood about the balance between different nitrogen cycling processes along redox gradients in OMZs.

Figure 1.5: Nitrogen cycle. Biogeochemical transformations involved in the nitrogen cycle, indicating aerobic processes (green) and anaerobic processes (blue). Figure modified from Nitrogen in the marine environment 2008, by Capone.

Nitrogen fixation

Although nitrogen fixation is not a focus of this thesis, it is important to note its occurrence in the ocean and the current state of research in OMZs. The enzyme responsible for nitrogen fixation is nitrogenase, of which NifH is a subunit. Nitrogen fixation requires significant energy and is therefore most often found coupled with photosynthetic processes in surface waters, and is carried out at higher rates in warm tropical waters [39, 40]. However, more recent studies document nitrogen fixation by heterotrophic diazotrophs from many diverse taxa present throughout the ocean, including dysoxic, suboxic and sulfidic waters [41]. The potential for nitrogen fixation within O2-deficient waters has implications for the extent of N-loss by denitrification and anammox (discussed below), as it could both fuel N-loss processes and re-supply biologically available nitrogen for growth [42, 43].

The extent of production of biologically available nitrogen by heterotrophic diazotrophs within OMZ waters appears to be variable. Studies from both the Baltic Sea [41] and the Peruvian upwelling [44] report nitrogen fixation rates ranging from 0.1 to 3.4 nmol l−1 d−1. Within the Peruvian upwelling, depth integrated rates measured in different years were reported to range from 7.5 to 190 µmol m−2 d−1, with the highest rates measured within the oxycline and OMZ core [44].
Furthermore, NO2− and PO43− appear to be factors in the distribution of heterotrophic diazotrophs throughout OMZ waters [45]. Studies of taxa carrying out heterotrophic diazotrophy in the Eastern Tropical South Pacific Ocean indicate a predominance of Alpha- and Gammaproteobacteria as well as sulfate-reducing Deltaproteobacteria, Clostridium and Vibrio species [46]. These smaller inputs of biologically available nitrogen into OMZs may impact the balance between different nitrogen based energy metabolisms along OMZ redox gradients, with implications for global nitrogen budgets.

Nitrification

Nitrification is the process of oxidizing NH4+, released from the degradation of organic matter, to NO3− (Equation 1.1). In most environments nitrification is split between two different groups of organisms [47]: ammonium oxidizers, which oxidize NH4+ to NO2−, and nitrite oxidizers, which oxidize NO2− to NO3−. Both NH4+ and NO2− oxidation are carried out in subsurface waters, as sunlight is inhibitory to NH4+ oxidation [48] (Figure 1.2). Thus, nitrification is a dominant nitrogen cycling process within the dysoxic waters of OMZs.

$$\mathrm{NH_4^+ + O_2 \longrightarrow NO_2^- \longrightarrow NO_3^-} \tag{1.1}$$

Within marine systems, ammonium oxidation is carried out nearly exclusively by ammonium oxidizing archaea (AOA) of the Thaumarchaeota lineage [49]. The first isolated archaeal ammonia oxidizer, Nitrosopumilus maritimus, was isolated from aquarium gravel and observed to grow chemoautotrophically on NH4+ [50]. The enzyme responsible for NH4+ oxidation, ammonia monooxygenase (AmoA), carries out the oxidation of NH4+ to hydroxylamine (NH2OH), which is then oxidized to NO2− [30, 51, 52]; however, the specific enzymes responsible for NH2OH oxidation to NO2− are undetermined in AOA [53]. Additional studies have found the archaeal amoA gene to be widely distributed and abundant throughout the global ocean [30, 39].
While certain bacteria are also known to harbour amoA, in the ocean the process of NH4+ oxidation is carried out predominantly by archaea [39, 40, 49].

Nitrite oxidation is carried out by a few different bacterial lineages; within marine environments these are predominantly members of Nitrococcus [54], Nitrospina [28] and Nitrospira [55]. Members of these lineages have been isolated from OMZs, and genomic evidence suggests a chemoautotrophic lifestyle [28, 54, 55]. The enzyme responsible for NO2− oxidation to NO3− is nitrite oxidoreductase (Nxr). Activity of nitrite oxidizing bacteria has been observed in OMZs at O2 concentrations as low as 1 µM, suggesting a broad niche for this biogeochemical process and a coupling of nitrification with anaerobic nitrogen loss processes [56]. Recent studies suggest Nitrococcus to be numerically dominant within OMZ waters [56], and cultured Nitrococcus are further observed to oxidize sulfide, reduce NO3− and produce N2O [54]. These capacities further expand the metabolic roles of nitrite oxidizing bacteria in O2-deficient waters.

Denitrification

Denitrification is a nitrogen loss process, removing biologically available nitrogen in the form of N2 or N2O gas. Denitrification involves the successive reduction of NO3− to NO2−, to nitric oxide (NO), to N2O, to N2 (Equation 1.2). All steps can be carried out by a single organism, or split across several different groups of organisms in a distributed metabolic process [57]. The individual modular steps of denitrification can be taxonomically diverse, with NO3− reduction carried out by organisms from multiple domains, including fungi [57, 58], and N2O reduction observed in diverse bacterial lineages including multiple proteobacteria, as well as Verrucomicrobia, Bacteroidetes, Chlorobi and halophilic archaea [59].

Initial work on denitrification in OMZs involved primarily rate measurements and gene counts for individual genes in the denitrification pathway (see below) using quantitative PCR [60–62].
However, these methods yielded little information on the taxonomy of the organisms carrying out these reactions [12, 63]. With the introduction of next generation sequencing and single-cell amplified genome technologies, combined with traditional culturing techniques, some taxonomic information about denitrifying organisms in OMZs has become available. Specifically, the Epsilonproteobacterium Sulfurimonas gotlandica (isolated from the Baltic Sea) is seen to carry out complete denitrification [64]. Gammaproteobacteria of the SUP05 lineage (draft population genome from Saanich Inlet), including Candidatus Thioglobus autotrophicus (isolated from the sulfidic basin Effingham Inlet, BC), are seen to carry out incomplete denitrification, producing N2O [34]. However, a recent metagenome-assembled genome for the SUP05 group ᵁThioglobus perditus (ᵁ indicating uncultured) from the Peruvian upwelling indicates the capacity for complete denitrification, including N2O reduction. The full taxonomic range of denitrifying organisms in OMZs remains to be determined and is a central focus of this thesis.

The enzyme complex responsible for the first step of denitrification (reduction of NO3− to NO2−) can be either the nitrate reductase NarDGHIJ (active subunit NarG) or the periplasmic nitrate reductase NapAB. While NarG has a wide taxonomic diversity [58], NapA has only been observed within the proteobacteria [58]. The second step of denitrification (reduction of NO2− to NO) can be carried out by either the copper containing nitrite reductase NirK or the cytochrome cd1 containing nitrite reductase NirS. The third step of denitrification (reduction of NO to N2O) is carried out by nitric oxide reductase NorCB, and the final step of denitrification (reduction of N2O to N2) is carried out by nitrous oxide reductase NosZ.
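The modular structure of the pathway described above can be sketched as an ordered table of steps and alternative enzymes. The following Python example is a minimal illustration under stated assumptions, not a method used in this thesis: the names `DENITRIFICATION_STEPS` and `end_product` are hypothetical, enzyme complexes are represented by a single label each (NarG, NapA, NirK, NirS, NorCB, NosZ), and the enzyme sets passed in are invented for illustration rather than taken from any genome.

```python
from typing import Iterable

# Each step: (substrate, product, enzymes able to catalyse it), in pathway order.
# Enzyme labels follow the text; one label stands in for each complex (assumption).
DENITRIFICATION_STEPS = [
    ("NO3-", "NO2-", {"NarG", "NapA"}),
    ("NO2-", "NO", {"NirK", "NirS"}),
    ("NO", "N2O", {"NorCB"}),
    ("N2O", "N2", {"NosZ"}),
]


def end_product(enzymes: Iterable[str], start: str = "NO3-") -> str:
    """Return the furthest product reachable from `start` with the given enzymes.

    The walk stops at the first step for which no enzyme is present, which is
    where a partial denitrification product would be released.
    """
    present = set(enzymes)
    current = start
    for substrate, product, options in DENITRIFICATION_STEPS:
        if substrate == current and present & options:
            current = product
    return current


# Hypothetical enzyme complements, for illustration only.
incomplete = end_product({"NarG", "NirK", "NorCB"})      # stops at N2O
n2o_reducer = end_product({"NosZ"}, start="N2O")          # reduces N2O to N2
```

An organism lacking NosZ ends at N2O in this sketch, while an organism carrying only NosZ can still act on N2O supplied by others, which is the kind of distributed pathway discussed in the text.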
Due to its modular nature, complete denitrification may not occur within a given redox niche, resulting in the accumulation of partial denitrification products such as NO2− or N2O at points along the redox gradient (the inherent instability of nitric oxide does not allow for significant accumulation within the water column). Furthermore, the diversity of organisms carrying out the various steps of denitrification makes it difficult to determine the contribution of a given taxon to measured denitrification rates and the contribution of that taxon to nitrogen loss across different OMZ systems.

$$\mathrm{NO_3^- \longrightarrow NO_2^- \longrightarrow NO \longrightarrow N_2O \longrightarrow N_2} \tag{1.2}$$

Anammox

Anaerobic ammonium oxidation, or anammox, is an additional nitrogen loss process carried out exclusively by the Brocadia lineage of Planctomycetes, which use NO2− to oxidize NH4+ to N2 (Equation 1.3). Members of the Brocadia have been isolated primarily from wastewater treatment facilities [65] and appear to be capable of a chemoautotrophic lifestyle [66, 67]. Planctomycetes of the Scalindua clade within the Brocadia appear in OMZs globally [65], indicating the global importance of this nitrogen loss process. There are three primary enzymes responsible for anammox: NirS, which reduces NO2− to NO; hydrazine hydrolase (HH), which converts NO to hydrazine (N2H4) with the addition of NH4+; and hydrazine dehydrogenase (HZO), which converts N2H4 to N2 [66, 67]. Anammox is generally carried out under anoxic conditions, but several lines of evidence suggest a broader niche for this process. For example, anammox rates have been observed in OMZs at O2 concentrations up to 20 µM [68], and Scalindua sp. have been shown to be particle associated, suggesting an additional potential niche for this process [14]. Additionally, anammox-type nirS transcripts have been detected in water samples from the Black Sea with 2 µM H2S [69], suggesting an additional, more reduced niche for anammox in sulfidic OMZs.
In all, anammox has been estimated to contribute up to 50% of nitrogen loss globally (including processes occurring in the sediments) [70, 71], but much uncertainty remains around the conditions that govern the balance between anammox and denitrification in OMZs [12, 60, 61, 72].

$$\mathrm{NO_2^- + NH_4^+ \longrightarrow N_2} \tag{1.3}$$

Dissimilatory nitrate reduction to ammonium

Dissimilatory nitrate reduction to ammonium (DNRA) is an additional nitrogen transformation that does not involve loss of biologically available nitrogen. As DNRA was originally thought to occur only in sediments and soils, and was only recently discovered in the water column [29], its distribution in marine systems is still being determined. Observations of DNRA in OMZs include the Peruvian [61] and Arabian [73] OMZs, but additional studies have yet to further constrain its distribution globally. Taxonomic lineages found to carry out DNRA include the Gamma-, Delta- and Epsilonproteobacteria [12]. The enzyme responsible for DNRA, cytochrome c nitrite reductase (NrfA), carries out a six electron reduction of NO2− to NH4+ (Equation 1.4). While only sparingly observed in the marine water column thus far, DNRA has substantial implications for nitrogen cycling, as the use of NO2− to drive NH4+ production offsets or competes with nitrogen loss via denitrification and/or anammox.

$$\mathrm{NO_2^- \longrightarrow NH_4^+} \tag{1.4}$$

1.2.2 Biogeochemical cycling of sulfur in OMZs

Sulfur is essential for protein production and thus for cell growth. Sulfur is generally readily available in marine systems in the form of SO42−, released upon the degradation of organic matter, and is taken up directly or heterotrophically through organic matter. Within OMZs, as O2 is depleted, SO42− potentially becomes a viable electron acceptor, producing reduced sulfur compounds such as thiosulfate (S2O32−) and H2S, which can then be oxidized, with both reduction and oxidation reactions fuelling chemotrophic energy metabolism.
SO42− reducers in marine systems are generally thought to be predominantly Deltaproteobacteria, while sulfide oxidizers have a broader taxonomic distribution, including many proteobacterial lineages, the Chromatiaceae and Chlorobi lineages, and some Archaea [74]. In OMZs, potential sulfur oxidizing microbial taxa have been identified as Candidatus Thioglobus autotrophicus [34] and ᵁT. perditus within the Gammaproteobacteria SUP05 group [33, 75, 76], the Epsilonproteobacterium Sulfurimonas gotlandica, and Deltaproteobacteria SAR324 [27, 77].

Many of the enzymes responsible for sulfate reduction, such as dissimilatory sulfite reductase (Dsr) and adenylyl-sulfate reductase (Apr), are reversible and can carry out the later steps of H2S oxidation to SO42− as well. Furthermore, the detection and measurement of many biologically active sulfur molecules is challenging due to extreme O2 sensitivity and low throughput methodologies. Detection of genes involved in H2S oxidation in non-sulfidic OMZ waters [18, 78, 79] suggests the presence of H2S oxidizers, as well as SO42− reducers (to supply the reduced sulfur compounds), within suboxic waters. Measurements of SO42− reduction and H2S oxidation rates in non-sulfidic OMZs have detected both activities occurring concurrently within the water column [78]. However, identification of canonical SO42− reducing bacteria in OMZs has been elusive. These results support the presence of a cryptic sulfur cycle within (non-sulfidic) OMZ waters, in which reduced sulfur compounds are biologically produced and oxidized so rapidly that they are unable to accumulate within the water column [78]. These redox reactions serve to fuel chemotrophic energy metabolisms and support chemoautotrophic activities, linking the sulfur and carbon cycles within OMZs [78]. Indeed, high abundance of SUP05 Gammaproteobacteria ᵁT.
perditus within O2-deficient advected shelf water off the coast of Peru coincides with high abundance of elemental sulfur, the first product of sulfide oxidation, detected in the water column [36]. Rates of carbon fixation by ᵁT. perditus were highest when the sulfide content of the individual cells was also high, supporting a strong coupling of carbon fixation and sulfide oxidation [36].

1.2.3 Carbon fixation in OMZs

Most OMZs exist below the photic zone, such that sunlight is not available to support carbon fixation by photosynthesis. However, the abundance of redox active molecules serves as a source of energetic substrates to support several pathways for chemoautotrophic carbon fixation, collectively termed dark carbon fixation. Within OMZs different taxa utilize different carbon fixation pathways; however, information on the taxonomic distribution of pathways is limited. Ammonia oxidizing archaea (Thaumarchaeota) use the 3-hydroxypropionate/4-hydroxybutyrate cycle [52, 80]; the NO2− oxidizers Nitrospira and Nitrospina use the reverse tricarboxylic acid cycle [28, 55]; Planctomycetes use the reductive acetyl-CoA pathway [66]; and denitrifying sulfur oxidizing Gammaproteobacteria of the SUP05 lineage, including Ca. T. autotrophicus and ᵁT. perditus, use the Calvin Benson Bassham cycle [33, 34, 36]. In principle, the structure of the microbial community and the available energetic substrates along redox gradients dictate the amount of carbon fixed, and by which pathway and taxonomic group. While rates of dark carbon fixation have been observed for several different OMZ systems [81–84], the extent to which each of these chemoautotrophic groups is actively fixing carbon, and under what conditions, remains to be determined.

1.3 Microbes in community

Beyond the microbes studied in the laboratory setting of isolated cultures, microbes exist in communities, with various organisms carrying out a vast diversity of metabolic processes tuned to the geochemical conditions of their environment [85].
Within microbial communities the metabolisms of individual organisms or taxa overlap, with the community as a whole having functional redundancy that promotes co-metabolic interactions in which the individual steps of metabolic pathways are distributed across multiple taxa [2, 38, 86–89]. These co-metabolic interactions can take the form of discrete interactions, where a metabolic intermediate made by one taxon is shared with only one other taxon, or can occur much more generally, as some taxa share with the community as a whole [90–96].

1.3.1 Co-metabolic interactions in microbial communities

Discrete co-metabolic interactions often manifest in symbiotic associations or obligate syntrophic interactions. One example is the symbiotic association between the spittlebug (Clastoptera arizonana) and its symbionts Sulcia muelleri CARI and Candidatus Zinderia insecticola. Ca. Z. insecticola makes and shares the amino acids tryptophan, methionine and histidine and no others, while S. muelleri CARI makes the remaining 7 of 10 essential amino acids and shares them with Ca. Z. insecticola [97]. Discrete co-metabolic interactions are also seen in obligate syntrophic interactions, such as that between the anaerobic methanotrophic archaea of the ANME-2 lineage and sulfate reducing Deltaproteobacteria, in which electrons from the anaerobic oxidation of methane in the ANME-2 are suggested to be shuttled directly to the Deltaproteobacteria to reduce sulfate [98].

Co-metabolic interactions can also be much more general and widespread in the community, such as in the production of public goods, where a product (a metabolite or enzyme) is made by only a subset of the community and used by all others. For example, the peroxide-degrading catalase enzyme in surface ocean waters protects not only the organisms carrying the catalase gene but also numerous Prochlorococcus cells lacking this important protective gene [99].
This becomes particularly relevant in environmental conditions where resources are limited and individual species cannot bear the metabolic burden of a complete pathway [91, 100]. Under the Black Queen Hypothesis, it becomes advantageous to lose supposedly essential metabolic genes if other members of the community can provide that resource [91, 101]. Additionally, in environments with highly limited energetic conditions, such as anoxic or sulfidic environments, it may be thermodynamically unfavourable for a single cell to carry out all necessary metabolic reactions, and metabolic pathways may therefore become distributed across multiple members of the community [102]. For example, in highly reduced methanogenic conditions within a bioreactor, members of the candidate phylum Marinimicrobia are suggested to degrade protein and amino acids produced by the rest of the microbial community, some members of which lack amino acid degradation pathways [103].

Within OMZs, the multitude of nitrogen-based energy metabolisms (discussed above) often overlap, with one organism feeding another, as when AOA supply NO2⁻ to nitrite oxidizing bacteria. Alternatively, organisms may compete for energetic substrates, as anammox and denitrification do for NO2⁻ [61]. Additionally, metabolic pathways may be distributed across the community, with different taxa carrying out different steps. The nitrogen and sulfur based energy metabolisms present in OMZ microbial communities represent an excellent platform on which to investigate these co-metabolic interactions as they are separate from carbon pathways and the multitude of associated carbon-based metabolites.
The gradients found in OMZs offer an opportunity to observe how these interactions may change along those gradients.

1.3.2 Using multi-omics to study microbial communities

The nature of co-metabolic interactions occurring within microbial communities remains to be fully revealed, as only recently have technological advances supported the production of data that permit the identification of such interactions at the level of the microbial community. As individual cells cannot be isolated from the community (by culturing or physical mechanisms) without changing gene expression, the cultivation-independent study of microbial communities in natural (and engineered) environments utilizes a set of tools referred to as multi-omics. Multi-omics follows the flow of biological information set forth in the central dogma of biology by sequencing DNA, RNA and protein, producing metagenomes, metatranscriptomes and metaproteomes respectively. Metagenomes give information about microbial community structure and metabolic potential, while metatranscriptomes and metaproteomes give information about gene and protein expression, and thus about metabolic activities occurring under a given condition. Coupled to chemical and physical data from the environment, multi-omic datasets can provide insight into biogeochemical cycles and co-metabolic interactions along redox gradients. A drawback of the multi-omic approach is the loss of taxonomic resolution, that is, the pairing of genes, transcripts and proteins with specific taxonomic origins. While gene sequences can be mapped back to related cultured (and sequenced) representatives, precise information about taxonomic origins requires additional tools and/or methodologies that remain computationally intensive.
While there are limitations to multi-omic analyses, applications involving comparisons between different conditions, along gradients or over time are particularly useful in uncovering microbial community responses to change [104].

Additional technological and computational advances are providing ways to better link taxonomy and function by reconstructing genomes of individual cells and microbial populations, respectively. Technological advances are supporting the construction of single cell amplified genomes (SAGs) by physical isolation of cells directly from the environment, followed by whole genome amplification and sequencing of the product [105]. The resulting SAGs are partial genomes (as whole genome amplification has some bias) of individual organisms, with all functional genes directly linked to the taxonomy of that cell. Computational advances are supporting the construction of metagenome-assembled genomes (MAGs) by assembly of metagenomic contigs with overlapping sequences and similar k-mer frequencies [35]. The resulting MAGs represent the genome of a population of phylogenetically similar cells from a given environment. Both SAGs and MAGs can be powerful tools in linking taxonomy and function and better understanding the metabolic roles played by different taxa.

The tremendous capacity of next generation DNA sequencing, coupled to both computational and technological advances in other areas (SAGs, MAGs, proteomics, metabolomics, microscopy etc.), is ushering in a new era of microbial ecology [106], supporting the study of microbial communities and novel methods for uncovering co-metabolic interactions and their role in biogeochemical cycles on a global scale.

Within OMZs, microbial communities along redox gradients carry out different metabolic activities, providing excellent systems to study how the microbial community and potential co-metabolic interactions shift across redox gradients, informed by the availability of energetic substrates.
As co-metabolic interactions are often an emergent property of the community and its environment, it is necessary to study these processes in vivo within the environment rather than in vitro in a synthetic system in the laboratory. Thus a model natural environmental system is required for the application of multi-omics datasets within OMZs, providing both redox gradients and a natural microbial community. Using multi-omics to study microbial community structure and metabolic activities along environmental gradients can help to uncover fundamental principles underlying microbial community metabolism and co-metabolic interactions.

1.4 Saanich Inlet as a model OMZ

Saanich Inlet is a seasonally anoxic fjord on the east coast of Vancouver Island, British Columbia, Canada. A shallow sill at the entrance to the inlet, at a depth of 75 m, restricts circulation in the basin waters, which reach a depth of ~230 m (Figure 1.6). The reduced circulation results in O2-depletion in basin waters and permits the accumulation of H2S and CH4 from underlying sediments in the water column [107–109], creating redox gradients of O2, NO3⁻ and H2S like those found in sulfidic OMZs (Figure 1.2) [4]. Saanich Inlet has long been the site of oceanographic research and houses the Institute of Ocean Sciences (IOS), providing shore-based resources for ongoing research.

Figure 1.6: Saanich Inlet topology and bathymetry. Saanich Inlet, on the east coast of Vancouver Island, indicating sampling station S3 and the Institute of Ocean Sciences. Figure from Zaikova et al. 2010, Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia, in Environmental Microbiology.

Decades of research and environmental monitoring dating back to the 1930s [110, 111] have documented a seasonal cycle of stratification, with intensifying O2-depletion in basin waters through spring and summer, followed by deep water renewal as oxygenated waters creep into the basin during the fall months [8, 107, 111–113].
Through this cycle of stratification and renewal we see recurring gradients of O2, NO3⁻ and H2S supporting different microbial communities along an overall redox gradient [8]. Furthermore, the O2 profile of the Saanich Inlet water column through winter and into spring shows the gradual depletion of O2 and NO3⁻, strongly resembling open-ocean OMZs, with O2 concentrations ranging from 3 to 100 µM and average NO3⁻ concentrations from 0 to 20 µM. In the later winter months through early summer the water column shows further depletion of O2 and NO3⁻ and accumulation of H2S in the deep anoxic basin water, resembling coastal sulfidic OMZs, with H2S concentrations ranging from 2 to 20 µM (Figure 1.2). Thus, Saanich Inlet serves as a model system for multiple OMZ types.

In efforts to understand the microbial community structure, metabolism and co-metabolic interactions in relation to biogeochemical cycles, an environmental and microbial monitoring time series has been underway in Saanich Inlet since 2006 (see Chapter 2) [114, 115]. An environmental time series can provide an extensive and robust dataset of multi-omic and environmental monitoring data such that environmental perturbations are observable in the microbial community structure and metabolic processes [92, 104]. The Saanich Inlet time series programme aims to monitor physical and chemical oceanographic characteristics and microbial community structure and metabolic activity within the context of OMZ nitrogen, sulfur and carbon cycles.

1.4.1 Saanich Inlet microbial community

The first study in the Saanich Inlet time series, by Zaikova et al., used fosmid clone libraries and denaturing gradient gel electrophoresis of amplified small subunit ribosomal RNA genes to profile changes in the microbial community along redox gradients and over time [8]. These results show an overall similarity to the structure of other OMZ microbial communities [2, 9, 116].
Ammonia oxidizing Archaea of the Thaumarchaeota (then termed Crenarchaea) were observed throughout the water column. Within the dysoxic waters at 100 m, the heterotrophic SAR11 group was relatively abundant, along with the NO2⁻ oxidizer Nitrospina and several other taxonomic groups. Deeper into the basin waters, as H2S began to accumulate, members of the Gammaproteobacteria SUP05 group became increasingly dominant, making up as much as 95% of the clone libraries in 200 m samples. Overall, the patterns in microbial community structure along the Saanich Inlet redox gradient are similar to those in other OMZs, supporting the use of Saanich Inlet as a model oxygen minimum zone [2, 8, 9].

1.4.2 The SUP05 Gammaproteobacteria group in Saanich Inlet

The high abundance of the Gammaproteobacteria SUP05 group in Saanich Inlet basin waters has supported the assembly of a draft population genome [33]. Members of the SUP05 group were originally detected in the Suiyo Seamount hydrothermal plume, where 88-90% of bacterial cells in the plume layer were identified as SUP05 by fluorescence in situ hybridization [117]. Members of the SUP05 group also include sulfur oxidizing symbionts of deep sea clams and mussels living near hydrothermal vents and cold seeps [33, 117], where they act to detoxify reduced sulfur compounds and fix carbon for their hosts [118, 119]. Surveys of OMZs globally have found free-living SUP05 in many anoxic and sulfidic OMZs [2], including the Baltic Sea [76, 120], the Black Sea [121], the ETSP [18], Guaymas Basin [122], the Namibian upwelling [75] and other inlets on the coast of British Columbia [34, 123].
In many of these systems, most notably in Saanich Inlet and the Namibian upwelling, SUP05 is observed at highest abundance in the water column at the interface between NO3⁻ and H2S rich waters [33, 75], suggesting a redox driven energy metabolism.

The draft population genome for SUP05 from Saanich Inlet metagenomes indicated several energy metabolism pathways enabling it to thrive at the sulfide-nitrate interface in Saanich Inlet as well as in other anoxic and sulfidic OMZs [18, 75, 76, 120–122, 124, 125]. Further, a recently sequenced SUP05 isolate, Ca. Thioglobus autotrophicus, has similar energy metabolism pathways [34]. Analysis of available SUP05 genomes revealed multiple pathways for denitrification and sulfide oxidation. With respect to denitrification, SUP05 harbours two interesting features. One is the apparent absence of the nitrous oxide reductase gene, nosZ, whose product catalyzes the reduction of N2O to N2, implicating SUP05 in the production of N2O. The second is the presence of two functionally analogous enzymes mediating nitrate reduction, NarG and NapA. The membrane bound nitrate reductase NarG is common in a wide range of organisms and has been well characterized to function under anoxic, high nitrate conditions [126]. In contrast, the periplasmic nitrate reductase NapA has so far been found only in Proteobacteria, most often together with NarG. It is less well characterized, but studies in E. coli suggest that it functions under hypoxic, low nitrate conditions [127]. With respect to sulfur oxidation pathways, SUP05 harbours a reverse dissimilatory sulfite reductase pathway (rDsr) and the Sox sulfur oxidation system. The absence of the soxC subunit implicates SUP05 in the formation of globules of elemental sulfur (S⁰) [74]. S⁰ may be stored and oxidized as needed to sulfite (SO3²⁻) by rDsr, allowing for an adaptive energy metabolism [74].
In addition, two enzymes mediating the initial step of H2S oxidation to S⁰, flavocytochrome/sulfide dehydrogenase (FccAB) and sulfide:quinone oxidoreductase (Sqr), were identified. In some organisms FccAB has been observed to function under low sulfide conditions [74], while Sqr is believed to function under fully anoxic conditions [74]. These observations echo the nitrate reduction pathway in providing alternative routes to fuel SUP05 energy metabolism under changing water column conditions.

1.4.3 Saanich Inlet as a model OMZ system

The similarity of the Saanich Inlet microbial community to those of other OMZs [2, 4, 8] and the recurring gradients of O2, NO3⁻ and H2S provide an excellent model system in which to apply a multi-omics approach to the study of OMZ biogeochemical cycles and co-metabolic interactions along redox gradients. An environmental time series can provide an extensive and robust dataset of multi-omic and environmental monitoring data such that environmental perturbations are observable in the microbial community structure and metabolic processes [92, 104]. Indeed, while Saanich Inlet is stratified throughout the summer months, annual renewal with oxygenated waters results in a chemical profile of the water column in winter and early spring similar to that of open-ocean or anoxic OMZs (Figure 1.2) [4, 8].

1.5 Thesis objectives and overview

Using Saanich Inlet as a model OMZ and multi-omic datasets, I explore the energy metabolism and carbon fixation pathways of the microbial community along redox gradients. I identify potential co-metabolic interactions and how the availability of energetic substrates may serve to inform those interactions. I gain insight into taxonomic groups governing key processes in the nitrogen and sulfur cycles, including novel taxonomic lineages carrying and expressing an otherwise unidentified N2O reductase gene.
Following these investigations in Saanich Inlet, I compare selected results to other marine and OMZ systems, providing a global context for biogeochemical cycles and key microbial players.

Chapter 2: Methodologies and workflows for generating and processing multi-omic datasets

In Chapter 2 I detail the multiple datasets used throughout this thesis, including chemical and physical, SSU rRNA amplicon, metagenomic, metatranscriptomic, metaproteomic and single-cell amplified genome datasets. I include both wet lab and in silico protocols developed through the course of the thesis project. I also include notes on the application of the MetaPathways annotation pipeline to multi-omic and next generation sequencing datasets.

Chapter 3: Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes

In Chapter 3 I use metagenomic and metaproteomic workflows outlined in Chapter 2 to generate and analyse SSU rRNA tag, metagenome (specifically Sanger sequenced fosmid libraries) and metaproteomic datasets. Using these data from multiple stations and time points in Saanich Inlet, I uncover shifts in the microbial community structure and associated metabolic activities along redox gradients for dominant nitrogen and sulfur transformations and carbon fixation pathways. Additionally, I detect evidence of gene regulation within the Gammaproteobacteria SUP05 group for a combined nitrogen, sulfur and carbon fixation operon and extend these results to SUP05's potential impact on carbon budgets in OMZs globally.

Chapter 4: Diverse Marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic gradients

In Chapter 4 I explore the global distribution and metabolic capacity of the dark matter phylum Marinimicrobia.
To do this, I use metagenomes and metatranscriptomes and associated methodologies outlined in Chapter 2, as well as 24 single-cell amplified genomes (SAGs) from seven different environments globally, representing 10 distinct clades. Comparison to over 200 environmental metagenomes globally indicates both cosmopolitan and endemic clades, as well as one clade restricted to sulfidic OMZ waters. I find the evolutionary diversification of major Marinimicrobia clades to be closely related to energy yields, with increased co-metabolic interactions in more deeply branching clades. Several of these clades participate in the biogeochemical cycling of sulfur and nitrogen, filling previously unassigned niches in the ocean. Notably, two Marinimicrobia clades, occupying different energetic niches, express nitrous oxide reductase, potentially acting as a global sink for the greenhouse gas nitrous oxide.

Chapter 5: A niche for NosZ?

In Chapter 5 I take a closer look at the distribution and expression of the nitrous oxide reductase (nosZ) gene both in Saanich Inlet and globally. I use the collection of SAGs from Saanich Inlet to phylogenetically anchor 7 distinct nosZ sequences and map their global abundance and expression using available metagenomes and metatranscriptomes from other marine environments in addition to Saanich Inlet. Additionally, I explore the seasonal dynamics of these nosZ types in Saanich Inlet time series multi-omic datasets.

Chapter 2

Methodologies and workflows for generating and processing multi-omic datasets¹

This chapter describes the methodologies and workflows for generating multi-omic datasets, including SSU rRNA amplicon (tag) sequencing, metagenomes, metatranscriptomes, metaproteomes and single-cell amplified genomes (SAGs), as well as the processing of metaproteomic datasets and use of the MetaPathways annotation pipeline.
These methodologies form the basis of multi-omic analyses carried out throughout this thesis in the investigation of microbial energy metabolism and co-metabolic interactions along redox gradients in oxygen-depleted waters.

¹ Selected methodologies presented in this chapter have been published in Methods in Enzymology as Molecular tools for investigating microbial community structure and function in oxygen-deficient marine waters in 2013 by Hawley, A. K., Kheirandish, S., Mueller, A., Leung, H. T., Norbeck, A. D., Brewer, H. M., Pasa-Tolic, L. and Hallam, S. J. Selected datasets presented in this chapter are published in part or in whole in Scientific Data as A compendium of water column multi-omic sequence information from a seasonally anoxic fjord Saanich Inlet in 2017 by Hawley, A. K., Torres-Beltrán, M., Bhatia, M., Zaikova, E., Walsh, D. A. et al. and as A compendium of water column chemistry from the seasonally anoxic fjord Saanich Inlet in 2017 by Hawley, A. K., Torres-Beltrán, M., Bhatia, M. P., Zaikova, E. and Walsh, D. A.

2.1 Introduction

Microbial communities are the primary engines driving biogeochemical cycles on our planet, playing essential roles in carbon, nutrient and energy cycling and oxygen (O2) production [38, 128]. Many biogeochemical cycles have key reactions occurring within the O2-depleted waters of marine oxygen minimum zones (OMZs). OMZs are areas of O2-depleted subsurface waters that form as a result of the combination of microbial respiration of organic matter from surface waters and decreased ventilation or mixing [2, 6]. Under O2-depleted conditions microbes use alternative terminal electron acceptors such as nitrate (NO3⁻), nitrite (NO2⁻), sulfate (SO4²⁻) and carbon dioxide (CO2), producing potent greenhouse gases such as nitrous oxide (N2O) and methane (CH4) and toxic hydrogen sulfide (H2S).
Currently, OMZs are expanding and intensifying on a global scale [10, 11, 129], making it increasingly important to characterise their microbial communities and associated metabolic activities and biogeochemical cycles.

Saanich Inlet, a seasonally anoxic fjord in British Columbia, Canada, is a model OMZ with an ongoing 10 year time series monitoring geochemical parameters, microbial community structure and activity, as well as viral populations [8, 33, 115, 130, 131]. Annual cycles of O2-depletion and development of anoxic-sulfidic basin waters, followed by influx of oxygenated nutrient rich waters, provide recurring gradients of O2, NO3⁻ and H2S, making Saanich Inlet a model system for studying open-ocean, anoxic and sulfidic OMZs (Figure 1.2) [4]. An overall redox gradient, created by gradients of O2, NO3⁻ and H2S, supports microbial communities involved in biogeochemical cycles and co-metabolic interactions reminiscent of symbiotic associations [38, 86–89, 95, 96]. Although current research efforts are increasingly focused on defining interaction networks within microbial communities [92], many open questions remain regarding the regulatory and ecological dynamics modulating microbial community structure, function and activity, both along gradients of O2-depletion and over time.
Temporally resolved datasets are necessary for the development of robust metabolic and climate models that incorporate environmental sequence information to predict future responses to OMZ expansion [3, 129, 132], as evidenced in current efforts by the Scientific Committee on Oceanic Research (SCOR) Working Group 144, Microbial Community Responses to Ocean Deoxygenation (http://www.scor-int.org/SCOR_WGs_WG144.htm).

Investigations into microbial community structure, biogeochemical cycling and co-metabolic interactions introduced in Chapter 1 have all benefited from recent advances in sequencing technologies, enabling the study of microbial communities at the molecular level in a cultivation-independent manner. Amplicon or tag sequencing uses primers to amplify and subsequently sequence individual genes, often taxonomic markers such as specific regions of the small or large subunit rRNA genes (SSU or LSU). These tags provide a high-throughput method for fingerprinting microbial community structure, which can be used to compare across samples and environments. Metagenomics, sequencing DNA from environmental microbial communities and assembling sequence reads into contiguous regions containing multiple genes from a given population of organisms, provides an inventory of the metabolic potential present in a microbial community. However, metabolic potential does not directly correlate with metabolic activity, as different genes may be expressed by the same microbial population under different environmental conditions [133]. Metatranscriptomics, sequencing cDNA generated from RNA from environmental microbial communities and either assembling de novo or aligning to a corresponding metagenome, more closely represents the gene expression of the microbial community found in that specific environment. Moreover, additional levels of regulation occur after transcription, at the level of protein production and degradation.
Metaproteomics, sequencing the proteins from environmental microbial communities via tandem mass spectrometry, offers yet another perspective on microbial gene expression. Single-cell amplified genomes (SAGs), genomes of single cells collected from an environmental sample, can be used to directly link taxonomy with functional genes, providing a more complete picture of co-metabolic interactions in the environment. Throughout all multi-omic analyses, consistent and thorough determination of genes or open reading frames and functional annotation of those genes is paramount to the identification of biogeochemical cycles carried out by microbial communities along redox gradients present in OMZs and other environments [79, 94, 134].

2.2 Sampling and multi-omic dataset overview

Sampling was carried out monthly on board the MSV John Strickland in Saanich Inlet at station S3 (48° 35.500′ N, 123° 30.300′ W) (Figure 1.6) and occasionally at other stations where stated. Station S3 is one of the deepest points in the inlet, providing recurring gradients of O2, NO3⁻ and H2S along which to sample the geochemical parameters and the associated microbial community and metabolic activity. Standard physical oceanographic parameters were measured with probes via a Conductivity-Temperature-Depth (CTD) instrument attached to a line that was lowered into the inlet and provided readings for every meter. Discrete samples for chemical analysis and microbial community structure by pyrotags (amplicon sequencing of the V6-V8 region of the SSU rRNA gene) were taken at 16 high resolution (HR) depths along the oxycline (10, 20, 40, 60, 75, 80, 90, 97, 100, 110, 120, 135, 150, 165, 185 and 200 meters). Samples for multi-omics were taken at six major large volume (LV) depths spanning the oxycline (10, 100, 120, 135, 200 m) (Figure 2.1). Details of sample collection and processing are given in the following text.

Eight years of monthly sampling have generated a dataset of matched multi-omic and geochemical measurements.
In its entirety, the Saanich Inlet time series consists of 100 time points of geochemical profiles, 412 SSU rRNA pyrotag samples, 82 SSU rRNA iTag (V4 region) samples, 90 metagenomes, 62 metatranscriptomes (46 unique samples and 16 replicates), 68 metaproteomes (64 unique samples) and 378 single-cell amplified genomes. Additional viral fraction metagenomes and fosmid libraries were also generated (Figure 2.1). Together these datasets serve as a primary resource for observing shifts in microbial community structure and metabolic activities in response to changing environmental parameters along redox gradients. These datasets of coupled multi-omic sequence information span five years of the Saanich Inlet time series, serve as a much needed resource for microbial ecology and environmental modelling efforts, and contribute a significant volume of time series data, similar to other time-series datasets such as the Hawaii Ocean Time Series and Bermuda Ocean Time Series.

[Figure 2.1 image. Panel A: oxygen (µM) contoured by depth (m) and sampling date, 2006 to 2014. Panel B: sample inventory by cruise ID for CTD data, gases, nutrients, HR-DNA and LV-DNA SSU libraries (pyrotags, iTags), fosmid libraries, metagenomes, viromes, metatranscriptomes, metaproteomes and SAGs. Legend: published datasets; published in Chapter 3 manuscript; published in Chapter 4 manuscript; unpublished (data analysis in progress).]

Figure 2.1: Multi-omics sampling overview. (A) Oxygen concentration contour for CTD data (February 2008 onward), with points for the 16 sampling depths for water chemistry and high-resolution (HR) DNA samples for SSU libraries (small black dots) and the six major depths for large volume (LV) samples used for metagenomics, metatranscriptomics, metaproteomics and LV SSU libraries (large black dots).
(B) Sample inventory from February 2006 to October 2014 indicating multi-molecular datasets included in this manuscript (solid black), in previous publications (gray) and accompanying datasets currently in analysis (open gray).

2.3 Establishing water column chemistry

The chemistry and redox gradients along the depth of the water column are the primary drivers of microbial community structure and associated metabolic activity. Measurements of several geochemical parameters along the water column were taken in order to provide environmental context for the metabolic activity of the microbial community. Standard physical oceanographic parameters were measured with probes mounted on a Conductivity-Temperature-Depth (CTD) instrument. The CTD was attached to a line that was lowered into the inlet and provided readings for every meter, and samples were taken from Niskin and Go-Flo bottles attached to the same line. Samples for chemical parameters were taken at the 16 HR depths. Table 2.1 lists the measured physical and chemical parameters; the referenced methodologies and complete descriptions with data are available in Torres-Beltrán and Hawley et al. [115], and an online visualised protocol is available at http://www.jove.com/video/1159/seawater-sampling-and-collection.

The recurring annual cycle of stratification and renewal in Saanich Inlet provides patterns of O2 depletion followed by NO3⁻ depletion and development of H2S in basin waters. These recurring patterns provide a baseline for environmental monitoring and for observations of microbial community structure and activity, as well as a means of identifying environmental conditions that deviate from normal cycles. Indeed, a weak renewal in September 2009 and subsequent build up of H2S to higher concentrations and shallower depths throughout 2010 had impacts on microbial community structure and function.
It is such perturbations and associated changes in biogeochemical processes that the Saanich Inlet time series seeks to chart and utilise for understanding microbial community dynamics and for future climate modelling efforts.

Table 2.1: Physical and chemical parameters and protocols.

Parameter | Instrument or Protocol | Reference
Physical Parameters | |
Temperature | CTD |
Density | CTD |
Salinity | CTD |
Transmissivity | CTD |
Conductivity | CTD |
Dissolved Oxygen | CTD |
Chemical Parameters | |
Nitrate | Bran Luebbe AutoAnalyser | Armstrong et al. 1967 [135]
Nitrite | Cary 60 spectrometer | Armstrong et al. 1967 [135]
Ammonium | Varioskan Flash (Thermo Scientific) | Holmes et al. 1999 [136]
Hydrogen Sulfide | Hach kit reagents, Varioskan Flash (Thermo Scientific) | Cline 1969 [137]
Silicic acid | Bran Luebbe AutoAnalyser | Armstrong et al. 1967 [135]
Phosphate | Bran Luebbe AutoAnalyser | Murphy and Riley 1962 [138]
Dissolved Oxygen | Winkler | Winkler 1888 [139]
Dissolved Gases | |
Oxygen | headspace GC/MS | Zaikova et al. 2010 [8]
Dinitrogen | headspace GC/MS | Zaikova et al. 2010 [8]
Carbon Dioxide | headspace GC/MS | Zaikova et al. 2010 [8]
Nitrous Oxide | headspace GC/MS and purge and trap GC/MS | Zaikova et al. 2010 [8] and Capelle et al. 2015 [140]
Methane | headspace GC/MS and purge and trap GC/MS | Zaikova et al. 2010 [8] and Capelle et al. 2015 [140]

2.4 Exploring oxygen minimum zone community structure

Several methods for investigating microbial community structure have been used with varying degrees of taxonomic resolution and quantitative power in O2-depleted waters. These include amplicon-based methods such as terminal restriction fragment length polymorphism (T-RFLP) or denaturing gradient gel electrophoresis (DGGE) of SSU rRNA genes [8, 141–144], SSU rRNA gene clone library sequencing [8, 17, 145–149], and massively parallel tag sequencing [31, 150–152]. While T-RFLP and DGGE are inexpensive and amenable to automation, peak resolution is limited and taxonomic identification requires secondary purification and sequencing steps.
Clone libraries can provide significantly more taxonomic information per read. However, quantitative power is limited by the cost of paired-end Sanger sequencing. Conversely, small subunit rRNA (SSU rRNA) tag sequencing currently provides less taxonomic information per read than clone libraries but significantly more quantitative power. Catalyzed reporter deposition (CARD) and fluorescent in situ hybridization (FISH) have also been used in community composition studies [141, 142, 146, 153] and group-specific studies targeting SUP05 [75, 154], Marine Group A [31] and Planctomycetes [14, 65]. While CARD-FISH provides effective group-specific quantitation, probe development and optimization can be cost prohibitive and technically demanding when profiling large numbers of samples. Finally, plurality sequencing methods (i.e. metagenomics and metatranscriptomics) have also been effectively used to derive taxonomic information from O2-depleted waters [4, 18, 78, 155].

Quantitative polymerase chain reaction (qPCR) using dye assay chemistry such as SYBR Green® or EvaGreen® can be a specific alternative or adjunct to sequencing and CARD-FISH methods. For example, qPCR using 5′ endonuclease probe-based chemistry (TaqMan) or dye assay chemistry (SYBR Green®) has been successfully adapted for rapid and high-throughput quantification of microbial populations in seawater [156–158].
In O2-depleted waters, qPCR has been used successfully to quantify functional gene expression of nitrogen cycling genes [60, 61]. Moreover, domain-specific primers targeting total bacteria and archaea, and group-specific primers targeting SUP05 and Arctic96BD-19 SSU rRNA gene copy number, have been used in SYBR Green®-based qPCR assays to monitor secular changes in microbial community structure [8, 149]. The use of group-specific primers provides the quantitative assessments of taxon abundance needed to accurately describe and monitor population dynamics in response to changing levels of water column O2-depletion.

2.4.1 Sample collection and DNA extraction for metagenomics and SSU rRNA tag sequencing

Multiple methods for environmental DNA (eDNA) extraction from seawater exist, and no single method will unfailingly provide ultraclean nucleic acids in sufficient quantity and quality to support multiple sequencing platforms (both metagenomic and tag sequencing) and qPCR applications without prior optimization. Throughout the Saanich Inlet time series, I have used methods involving a peristaltic pump to concentrate biomass from seawater onto a 0.2 µm Sterivex filter using an in-line 2.7 µm polycarbonate pre-filter to remove the bulk of the eukaryotic organisms. Samples for SSU rRNA tag sequencing were collected from 1 to 2 L of seawater at the 16 high resolution depths with no pre-filter. Samples for SSU rRNA and metagenomics were collected from 20 L of seawater at six large volume depths and included pre-filtering. This method had been previously developed and documented in Zaikova et al. [8] and in the Journal of Visualized Experiments (JoVE) at http://www.jove.com/video/1161/large-volume-20l-filtration-of-coastal-seawater-samples [159]. Extraction of DNA from Sterivex filters for the Saanich Inlet time series was previously developed and documented in Zaikova et al.
[8] and in JoVE at http://www.jove.com/video/1352/dna-extraction-from-022-m-sterivex-filters-cesium-chloride-density [160].

2.4.2 SSU rRNA tag sequencing

SSU rRNA tags (Pyrotags or iTags) were generated from extracted environmental DNA. Pyrotag datasets from high resolution (HR) and large volume (LV) samples were generated by PCR amplification using universal three-domain forward and reverse bar-coded primers targeting the V6-V8 hypervariable region of the 16S or 18S rRNA genes: 926F (5'-AAA CTY AAA KGA ATT GRC GG-3') and 1392R (5'-ACG GGC GGT GTG TRC-3'). Samples were purified using the QIAquick PCR Purification Kit (Qiagen) and sequenced by 454 pyrosequencing at the Department of Energy Joint Genome Institute (JGI) in California, USA, or the Génome Québec Innovation Centre at McGill University. iTag datasets from HR and LV samples were generated by PCR amplification using forward and reverse bar-coded primers targeting the V4-V5 hypervariable region of the bacterial 16S rRNA gene: 515F (5'-Y GTG YCA GCM GCC GCG GTAA-3') and 806R (5'-CCG YCA ATT YMT TTR AGT TT-3') [161, 162]. Samples were sequenced according to the standard operating protocol on an Illumina MiSeq platform at the JGI. Quality control protocols were similar for both sequencing centers and generally followed the manufacturers' specifications for the respective sequencing platforms. For 454 pyrotag datasets from both HR and LV samples, a histogram of raw read counts versus read length (Figure 2.2) is used to determine the success of a run. A successful run will have the majority of reads >450 base pairs.
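As an illustration of the run check described above, the following Python sketch computes the fraction of reads longer than 450 base pairs from a list of read lengths and flags a run as successful when that fraction exceeds one half. The function names, the default threshold argument and the reading of "majority" as more than 50% of reads are assumptions made for this example; they are not part of the sequencing centre protocols.

```python
def fraction_longer_than(read_lengths, min_length=450):
    """Return the fraction of reads strictly longer than min_length (bp)."""
    if not read_lengths:
        raise ValueError("read_lengths must contain at least one read")
    long_reads = sum(1 for length in read_lengths if length > min_length)
    return long_reads / len(read_lengths)


def run_is_successful(read_lengths, min_length=450):
    """Assumed criterion: more than half of the reads exceed min_length."""
    return fraction_longer_than(read_lengths, min_length) > 0.5


if __name__ == "__main__":
    # Hypothetical read lengths for a single pyrotag sample.
    example_lengths = [500, 480, 460, 200]
    print(fraction_longer_than(example_lengths))
    print(run_is_successful(example_lengths))
```

In practice the read lengths would be taken from the raw sequence files for each sample; the histogram itself remains the more informative diagnostic because it also shows the shape of the length distribution.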
A plot of read counts versus read length for all HR and LV samples is provided in Figure 2.2.

[Figure 2.2 plot panels: HR Pyrotags and LV Pyrotags; x-axis: Read Length; y-axis: Raw read counts]

Figure 2.2: SSU rRNA tag sequencing validation. 454 Pyrotags for the small subunit rRNA gene showing number of raw reads versus read length for large volume samples (99 samples in total) (left) and high resolution samples (311 samples in total) (right).

2.4.3 DNA sequencing

Illumina metagenome shotgun libraries from LV samples were generated at the JGI and paired-end sequenced on the Illumina HiSeq platform. The JGI quality control protocol for metagenomic sequences prior to assembly entails rolling QC, an in-house sequence QC pipeline that performs a set collection of analyses and produces a summary report for each lane of Illumina data produced by the sequencing group. This set of analyses calculates read quality, measures sequence uniqueness, and detects abnormal sequence motifs. An assembly, using Velvet, is used to measure coverage and detect contamination [163]. For individual sample assemblies, the average fold coverage is plotted against contig length (Figure 2.3). The plot should have a distinct shape for each sample, with peaks in contig length at a specific coverage representing a given closely related microbial population.
Additionally, the percent GC versus the average fold coverage can be plotted, again with distinct shapes for different samples and clusters representing closely related microbial populations.

2.4.4 Metagenomic samples

A total of 90 metagenomic samples (Table A.1) were generated covering 14 time points, including two renewal periods in August/September and samples from multiple stations in September 2009. These metagenomes provide a means to explore metabolic pathways, taxonomic affiliations and gene abundances (see section on MetaPathways annotation pipeline below) along redox gradients.

[Figure 2.3 plot panels: 10 m and 135 m samples; axes: Contig length (bp), Average fold coverage, GC (%)]

Figure 2.3: Metagenomic sequencing validation. Metagenomic assemblies for two samples from different depths showing average fold coverage versus contig length and percentage GC versus average fold coverage for contigs.

2.5 Exploring oxygen minimum zone microbial community transcriptome

Exploration of microbial community transcriptional activity has generally taken two forms: reverse transcriptase quantitative polymerase chain reaction (RT-qPCR) to quantify the number of copies of a given transcript, or community-wide metatranscriptomic studies. Both methods require the collection and extraction of high quality RNA from the microbial community. Within laboratory-grown cells, the total RNA pool in any given microbe consists predominantly of ribosomal RNA (rRNA), with low numbers of messenger RNA (mRNA) molecules. Within cells under environmental conditions of the ocean, the ratio of mRNA to rRNA is even lower, with an estimated 200 mRNA molecules per cell [133].
Given this low abundance and the short half-life of most mRNA molecules, sampling time and extraction efficiency become critical for metatranscriptomics. Sampling and extraction protocols have been designed to maximise RNA yield while minimizing degradation and time between collection and freezing.

2.5.1 Sample collection and RNA extraction

Similar to DNA extraction, multiple methods for RNA extraction from seawater exist and are used based on the particular samples and needs of the user. Sample collection focuses on minimizing the time between collection, filtration and freezing, while extraction focuses on extraction efficiency as well as limiting degradation. Seawater sampling and concentration of biomass onto Sterivex filters is nearly identical to the metagenomic methods, including use of an in-line 2.7 µm filter to remove the bulk of eukaryotic organisms. The RNA extraction protocol was based on Shi et al. 2009 [164], where total RNA was extracted from Sterivex filters using a mirVana RNA isolation kit (Ambion). A detailed protocol developed to maximise extraction efficiency and RNA quality is found in Appendix A.1 RNA extraction and isolation protocol.

2.5.2 RNA sequencing

Purified total RNA was used to generate Illumina metatranscriptome libraries at the JGI, which were paired-end sequenced there on the HiSeq and MiSeq platforms. The quality of purified RNA was verified on the Bioanalyzer using an RNA nano Analysis Kit (Agilent Technologies) in order to check RNA integrity and sample quantitation before cDNA library production and sequencing. The JGI quality control protocol for metatranscriptomic sequencing preparation follows the TruSeq Stranded Total RNA Sample Preparation Guide (Illumina). Briefly, this protocol entails the removal of ribosomal RNA (rRNA) with Ribo-Zero, followed by RNA fragmentation for first strand cDNA synthesis.
This is followed by second strand synthesis and the subsequent ligation of the adapters. After PCR amplification, library quality is checked using the Bioanalyzer for fragment size (260 bp) and purity. Indexed (barcoded) libraries are normalized to 10 nM and pooled in equal volumes. Transcriptomes are assembled de novo and/or mapped to a corresponding metagenome. For additional quality assessment of the sequencing run for each sample, histograms of percentage of reads versus average read quality and of reads per percent GC are generated (Figure 2.4).

[Figure 2.4 plot panels: 10 m and 135 m samples; left: Percentage of Reads versus Average Read Quality; right: Reads versus %GC]

Figure 2.4: Metatranscriptomic reads for two samples from different depths showing distribution of reads over read quality (left) and percentage GC (right).

2.5.3 Metatranscriptomic samples

A total of 62 metatranscriptomic samples (46 unique samples and 16 replicates) (Table A.2) were generated covering 8 time points, including two renewal periods in August/September. These metatranscriptomes provide a means to explore differential gene expression for genes involved in energy production and other metabolic activities.

2.6 Exploring oxygen minimum zone microbial community proteome

Environmental proteomics, also known as metaproteomics, was first used to describe microbial community gene expression in an acid mine drainage ecosystem [165]. Reduced community complexity in the acid mine milieu enabled the identification of key metabolic activities and metabolic partitioning between community members. Since that time, metaproteomic approaches have been successfully applied to a wide range of natural and human-engineered ecosystems including soils [166, 167], leaf surfaces [168], human guts [169], naphthalene-degrading enrichment cultures [170], and wastewater treatment plants [171, 172].
Although no metaproteomes for OMZ microbiota have been reported, surface ocean surveys have provided insight into microbial community responses to nutrient conditions along a coastal to open-ocean transect in the South Atlantic [173], coastal northeast Pacific Ocean upwelling [174], and winter to summer transitions off the Antarctic Peninsula [175]. Indeed, metaproteomics opens a functional window into microbial community metabolism and coupled biogeochemical cycles needed to monitor microbial community responses to changing levels of water column O2-depletion.

2.6.1 Sample collection and protein extraction

Protein extraction yields per unit volume of seawater need to be considered prior to large-scale sample collection to ensure sufficient biomass is filtered for downstream processing and detection steps. Empirical observations suggest that a minimum of 10^8 cells is needed to reliably detect abundant proteins under these water column conditions using nano-high-performance liquid chromatography coupled to a Thermo Electron LTQ-Orbitrap mass spectrometer with electrospray ionization. Protocols have been designed to minimize the time between sample collection, processing and freezing so as not to alter the proteome of the microbial community. A detailed protocol for total protein extraction and peptide detection is provided in Appendix A.2 Protein extraction and isolation protocol.

2.6.2 Protein sequencing

Tandem mass spectrometry and peptide identification

While the detection and quantification of potential key microbial players in O2-depleted waters provides insight into community structure and dynamics, additional methods for profiling environmental gene expression are needed for gene model and pathway validation. The application of high-pressure liquid chromatography (HPLC) coupled tandem mass spectrometry to identify expressed protein sequences from O2-depleted waters offers a rapid and high-throughput profiling solution.
The most effective peptide matching relies on the availability of environmental sequence information derived from the ecosystem under study, although a standard reference database compiled from cultured isolates and publicly available marine metagenomes can also be utilized. For analyses presented in this thesis I utilized a database of conceptually translated protein sequences from Saanich Inlet metagenomes (Chapter 3 utilizes protein sequences from paired-end fosmid and whole-genome shotgun sequences; Chapter 5 utilizes protein sequences from Illumina platform sequenced metagenomes encompassing over 23 million protein sequences) (Figure 2.1). Programs for matching peptide spectra to protein sequences varied depending on the analysis (Chapter 3 utilizes the SEQUEST™ program; Chapter 5 utilizes the search tool MSGFDBPlus [176]). A detailed protocol developed to maximise protein identification from environmental samples with a protein sequence database from the same environment for SEQUEST™ is found in Appendix A.3 Protein sequencing protocol. For MSGFDBPlus peptide mapping to full length protein sequences, the False Discovery Rate was calculated using the spectrum-to-peptide matches that resulted in reversed hits from the on-the-fly reversed database search and a filter on the MSGF value. The number of peptides and proteins detected varies between samples (Figure 2.5).

[Figure 2.5 plot panels: Detected Peptides (top) and Detected Proteins (bottom) by depth (10, 60, 97, 100, 120, 130, 135, 150, 165 and 200 m); colour legend by cruise: SI020, SI037, SI038, SI042, SI044, SI046, SI047, SI048, SI053, SI054]

Figure 2.5: Protein validation. Metaproteome of identified peptides (top) and detected proteins (bottom) for each depth sample, colour coded by cruise ID, for peptides matched to the Saanich Inlet Illumina sequenced metagenomic database. The higher number of detected proteins than peptides is due to the sequence redundancy in the metagenomic database used to identify proteins.
Due to the large size of the metagenomic dataset used and redundancy in protein sequences arising from repeated sampling of the same environment in the Saanich Inlet time series, most peptides map to multiple identical proteins, resulting in a greater number of proteins identified than peptides.

2.6.3 Taxonomic binning and visualization of expressed proteins

There are many ways to visualize taxonomic distributions in environmental sequence data including heat maps, histograms, bubble plots, and trees. A composite visualization method using BLAST and the least common ancestor (LCA) algorithm implemented in MEGAN [177] can be superimposed on the Interactive Tree of Life (iTOL) (http://itol.embl.de/) [178, 179] to visualize the taxonomic distribution of expressed proteins from the Saanich Inlet water column (Figure 2.6). Protein abundance information is mapped onto these tree structures using the normalized spectral abundance factor (NSAF) [180]. The NSAF values for a given protein can then be compared between samples for a more accurate representation of gene expression between environmental samples. A detailed protocol of NSAF calculation and taxonomic binning and visualization can be found in Appendix A.4 Taxonomic binning and visualization of expressed proteins.
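The NSAF normalization mentioned above can be sketched as follows. This minimal Python example assumes the standard definition of NSAF, in which the spectral count of each protein is divided by its length and the result is then divided by the sum of these length-normalized counts over all proteins detected in the sample. The function name and the input dictionaries are hypothetical and do not reproduce the protocol in Appendix A.4.

```python
def nsaf(spectral_counts, protein_lengths):
    """Return NSAF values for the proteins detected in one sample.

    spectral_counts: dict mapping protein ID to spectral count
    protein_lengths: dict mapping protein ID to length in amino acids
    """
    # Spectral abundance factor: spectral count normalized by protein length.
    saf = {}
    for protein, count in spectral_counts.items():
        length = protein_lengths[protein]
        if length <= 0:
            raise ValueError(f"non-positive length for {protein}")
        saf[protein] = count / length

    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned in this sample")

    # Dividing by the sample total makes values comparable between samples.
    return {protein: value / total for protein, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical proteins and counts for illustration only.
    counts = {"protein_A": 10, "protein_B": 20}
    lengths = {"protein_A": 100, "protein_B": 400}
    for protein, value in nsaf(counts, lengths).items():
        print(protein, round(value, 3))
```

Because NSAF values sum to one within a sample, values for proteins assigned to the same taxon can be added to give the share of the metaproteome attributed to that taxon.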
[Figure 2.6 tree panels: 100 m, 130 m, 150 m and 200 m; NSAF scale: 1, 15, 30; tree scale bar: 0.2. Leaf labels in each panel: no hits/not assigned, Nitrosopumilus maritimus, Candidatus Nitrosoarchaeum, Bacteroidetes, Candidatus Nitrospira defluvii, Bacillus, SAR324 cluster bacterium JCVI-SC AAA005, delta proteobacterium NaphS2, Desulfococcus oleovorans Hxd3, Cystobacterineae, Methylophaga, ARCTIC96BD-19, Uncultured SUP05 bacterium, marine gamma proteobacterium HTCC2143, Marinobacterium stanieri S30, Methylococcaceae, Chromatiales, Candidatus Pelagibacter sp. HTCC7211, Candidatus Pelagibacter ubique HTCC1062, Pelagibaca bermudensis HTCC2601, Rhodobacterales bacterium HTCC2255, Neisseria, Candidatus Scalindua profunda, Streptomyces, Thermovibrio ammonicans HB-1, Caldithrix abyssi DSM 13497]

Figure 2.6: Least common ancestor distribution of detected proteins. The taxonomic distribution of detected protein sequences in Saanich Inlet Station S3 determined by the MEGAN least common ancestor algorithm at 100 (green), 130 (teal), 150 (blue), and 200 (purple) m sampling intervals. Taxon abundance in the metaproteome is shown using NSAF values. The NSAF values for detected proteins not assigned to the NCBI hierarchy or with no BLAST hit (no hits/not assigned) are shown at the base of the tree for each depth interval.

2.6.4 Metaproteomic samples

A total of 68 metaproteomic samples (Table A.3) were generated covering 14 time points, including two renewal periods in August/September. These metaproteomes provide a means to validate proposed metabolic pathways, energy metabolisms and co-metabolic interactions.

2.7 Single-cell amplified genomes

While multi-omic analysis is a powerful tool for exploring microbial community metabolism and expression, a concrete link between identified metabolic functions and taxonomy remains elusive with omics alone. Recent technological advances have allowed for the production of single-cell amplified genomes (SAGs) [105], generated by physically isolating individual cells from environmental samples and carrying out whole genome amplification and sequencing. Thus, genomic sequence data is generated from an individual cell, irrefutably linking the genes with the taxonomy of the cell. This is particularly useful for taxa that do not have any closely related cultured and sequenced representatives, such as microbial dark matter phyla [19].
The sorting of cells into individual wells using flow cytometry also serves to reflect the natural abundance in the environment [105], providing a way to survey microbial communities without amplification bias. Isolated cells can be screened for taxonomy by PCR amplification and sequencing of the SSU rRNA gene, and cells from a desired taxonomy chosen for additional amplification and sequencing. Coupling SAGs with metagenomes from the same environment by stringent alignment searches and k-mer frequency analysis [181] can phylogenetically anchor genes and whole metagenomic contigs to a given taxonomy, greatly increasing the metabolic insights into specific taxonomic groups involved in biogeochemical transformations as well as potential co-metabolic interactions. Mapping metatranscriptomic reads onto SAGs can similarly provide insight into transcriptional activity of specific taxonomic groups and individual cell populations with greater phylogenetic resolution.

In Saanich Inlet, samples for SAGs were taken at a single time point, August 2012, at station S3 at three depths spanning the oxycline (100, 150 and 185 m) in an effort to capture the microbial community at particular points along the redox gradient (dysoxic, anoxic and sulfidic, respectively). Screening of the SSU rRNA gene against the Silva database (Quast13) showed an overall decrease in diversity along the redox gradient. In dysoxic waters, Miscellaneous Gammaproteobacteria were the most dominant at 14.8% of collected SAGs, followed by Flavobacteriales at 10.6% and several other taxa in similar abundances (Figure 2.7). In anoxic waters, SUP05 Gammaproteobacteria were the dominant taxon with 54.5% of collected SAGs. In sulfidic waters, Arcobacter Epsilonproteobacteria were the dominant taxon with 40.7% of detected SAGs, followed closely by SUP05 at 39.8%.
After screening, SAGs were chosen for additional amplification and sequencing based primarily on the efficiency of the whole genome amplification as well as desired taxonomy (Figure 2.7).

[Figure 2.7 chart: number of SAGs by taxon for the Dysoxic, Anoxic and Sulfidic samples, with the total number of SAGs sequenced; bubble scale: 1, 25, 50, 100. Taxon groups: Archaea, Bacteria, Bacteroidetes, Proteobacteria (α, δ, ε, γ), Verrucomicrobia, Candidate phyla. Taxa: Thaumarchaeota, Misc. Archaea, ABY1 OD1, Actinobacteria, Misc. Bacteroidetes, Flavobacteriales, Chloroflexi, Marine group A, Planctomycetes, Pelagibacter SAR11, Misc. Alpha, Rhodobacter, Misc. Delta, Nitrospina, SAR324, Arcobacteraceae, Misc. Gamma, SUP05, Unclass. Gamma, Spirochaetes, Opitutae, Misc. Verrucomicrobia, Other Verrucomicrobia, Unclassified Bacteria]

Figure 2.7: Single-cell amplified genomes. Taxonomic distribution and number sequenced of single-cell amplified genomes (SAGs) collected from Saanich Inlet in August 2012. Taxonomy was assigned by screening of the small subunit rRNA gene for each SAG collected.

2.8 Annotation of meta-omics datasets by MetaPathways

Multi-omics datasets are powerful tools for investigating microbial community structure and metabolic expression, and a key step in generating robust datasets is annotation. Annotation involves the identification of open reading frames (ORFs) or genes and assignment of gene function based on sequence homology searches of sequence databases. An important aspect of the multi-omics approaches applied in this thesis is the use of a consistent annotation pipeline: MetaPathways [94, 182]. MetaPathways is a modular bioinformatics pipeline for multi-omic annotation developed in the Hallam lab. Input files are sequence files in .fasta or .fastq format, and the pipeline uses the Prokaryotic Dynamic Programming Genefinding Algorithm (Prodigal) to identify ORFs, including incomplete or fragmented ORFs [183], thus maximizing ORF recovery for environmental sequence data types where genomic information is not exhaustively sequenced.
Amino acid translated ORFs are then searched against a suite of functional databases using the LAST [184] or FAST [185] algorithm and a BLAST-score ratio cut-off [186] for assignment of function from any one database. The use of multiple functional databases including KEGG [187], COG [188], RefSeq and MetaCyc [189] provides a robust functional annotation. Further, MetaPathways leverages Pathway-Tools functionality to identify metabolic pathways [189], thus providing additional insight into the metabolic potential and activity of microbial communities.

MetaPathways is also designed to scale with next generation sequencing platforms such as Illumina. Assembly of metagenomic and metatranscriptomic data into contigs results in a loss of the read-depth information, such that the number of copies of a given sequence of DNA, reflective of the number of organisms carrying that sequence, is not incorporated into the assembly information. The reads can be mapped back to the assembly using alignment algorithms [190] and accounted for using the reads per kilobase per million mapped (RPKM) (Equation 2.1). The RPKM value for an ORF reflects the number of reads mapped to an ORF while accounting for ORF length and total number of reads in a sample [190, 191].

\[
\mathrm{RPKM} = \frac{\dfrac{\text{Reads Mapped to ORF}}{\text{ORF Length (bp)}}}{\dfrac{\text{Reads Mapped to Sample}}{10^{6}}} \tag{2.1}
\]

Furthermore, RPKM values are additive such that for a given functional gene, e.g. sulfide oxidase, the RPKM for each sulfide oxidase in a metagenome (or transcriptome) can be summed to give the total relative abundance of sulfide oxidase in a given sample and compared to other samples. RPKM values can also be summed for a given taxonomy to provide the relative abundance of genes from that taxon.

Metabolic pathways of importance for microbial communities along redox gradients tend to converge around nitrogen and sulfur based reactions such as ammonia oxidation, nitrification, denitrification, anammox, and reduction and oxidation of sulfur compounds.
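A minimal Python sketch of the RPKM calculation and of summing RPKM values by functional annotation is given here for illustration. It assumes the conventional form of RPKM in which ORF length is expressed in kilobases (length in bp divided by 1000) and the number of mapped reads in millions; this differs from a per-base-pair normalization only by a constant factor of 1000, which does not affect comparisons between ORFs or samples. The function names, the tuple layout and the example annotation labels are hypothetical and are not taken from MetaPathways.

```python
from collections import defaultdict


def rpkm(reads_mapped_to_orf, orf_length_bp, reads_mapped_to_sample):
    """Reads per kilobase of ORF per million reads mapped in the sample."""
    if orf_length_bp <= 0 or reads_mapped_to_sample <= 0:
        raise ValueError("ORF length and sample read count must be positive")
    length_kb = orf_length_bp / 1000.0            # assumed kilobase convention
    mapped_millions = reads_mapped_to_sample / 1e6
    return reads_mapped_to_orf / (length_kb * mapped_millions)


def sum_rpkm_by_label(orfs, reads_mapped_to_sample):
    """Sum RPKM over ORFs sharing a label (a function or a taxon).

    orfs: iterable of (label, reads_mapped_to_orf, orf_length_bp) tuples
    """
    totals = defaultdict(float)
    for label, reads, length_bp in orfs:
        totals[label] += rpkm(reads, length_bp, reads_mapped_to_sample)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical ORFs: (annotation, reads mapped, ORF length in bp).
    example_orfs = [
        ("sulfide oxidase", 100, 1000),
        ("sulfide oxidase", 50, 500),
        ("nosZ", 30, 1500),
    ]
    print(sum_rpkm_by_label(example_orfs, reads_mapped_to_sample=1_000_000))
```

The same summation can be keyed on taxonomic assignment instead of function, which mirrors the additive use of RPKM described above.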
While most of these individual reactions exist in the MetaCyc and Pathway-Tools databases, upstream annotation of specific proteins and the known taxonomic breadth of certain pathways (e.g. anammox) are not taken into account by Pathway-Tools during identification of several nitrogen and sulfur-based pathways. For example, in a metagenome from Hawaii Station ALOHA OMZ waters, the major nitrate reduction pathways (denitrification, dissimilatory nitrate reduction and intra-aerobic nitrite reduction) were identified by MetaPathways (Figure 2.8A). However, a detailed look at the enzymes present indicated that only genes involved in the denitrification pathway were present in the sample (Figure 2.8A). Further analysis of the taxonomic affiliation of the detected genes indicated the presence of genes from known nitrite and ammonia oxidizing organisms, particularly in the RNA, as well as genes from denitrifying organisms (Figure 2.8B). Genes assigned to the nitric oxide reduction step were all annotated as regulatory proteins with no enzymatic activity to produce N2O. These nuances can be subtle but significant to the interpretation of biogeochemical cycles occurring in OMZs. Thus, throughout the course of multi-omic analysis, annotation of functional genes was carefully scrutinised and included information about their taxonomic affiliation.

Figure 2.8: Nitrogen cycling pathways in MetaPathways. Example of taxonomic and functional breakdown of nitrogen cycling pathways from Hawaii Station ALOHA. (A) Nitrogen cycling pathways and reactions assigned by PathoLogic. Arrow color indicates pathway: nitrate reduction I (denitrification) (brown), nitrate reduction IV (dissimilatory) (yellow), and intra-aerobic nitrite reduction (red). Grey numbers adjacent to arrows indicate number of reads assigned to the reaction in the DNA and RNA (RNA in parentheses).
Overlapping circles indicate the distribution of reads across multiple pathways. (B) BLAST-based functional and taxonomic breakdown of reads assigned to reactions in given pathways as indicated by letters A-E. Function was determined by the top RefSeq BLAST hit, reported by the MetaPathways pipeline, and indicated by reaction arrows, with color corresponding to taxa or taxonomic group with known activity: taxa with nitrate and nitrite reducing activity (blue), nitrite oxidizing activity (green), and ammonia oxidizing activity (purple). Grey reactions indicate no reads for enzymatic activity were detected, only regulatory proteins that may be involved in gene expression regulation (*).

2.9 Conclusion and application

Environmental multi-omic datasets can be used to uncover biogeochemical cycling, reconstruct metabolic pathways, and identify different patterns of co-metabolic interactions along redox gradients. Within the Saanich Inlet time series I focused on pathways of nitrogen and sulfur cycling and carbon fixation along redox gradients. The utility of a time series in environmental multi-omic datasets is three-fold. Firstly, it provides a mechanism to monitor the microbial community and associated metabolic activities throughout the process of O2 depletion and associated development of H2S in basin waters, extensible to open-ocean, anoxic and sulfidic OMZs. Secondly, it permits tracking of shifts in microbial populations and their co-metabolic interactions over time and in response to changing environmental conditions. Thirdly, it begins to address the lack of replication in environmental omics datasets by providing pseudo-replicates, multiple samples with highly similar chemical conditions, under which to study the microbial community and associated metabolic activities.
While there remain limits to using environmental multi-omics that can only be addressed through culturing and isolation, multi-omic analyses of microbial communities along redox gradients can still provide valuable knowledge and insights into biogeochemical cycling and co-metabolic interactions that will shed light on the fundamental principles shaping microbial communities and metabolisms.

Chapter 3

Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes²

This chapter represents the first metaproteomic analysis to chart spatial and temporal patterns of gene expression along defined redox gradients in oxygen deficient waters. Using methodologies from Chapter 2 for small subunit ribosomal RNA (SSU rRNA) tags, metagenomics and metaproteomics, I establish microbial community structure, metabolic capacity and protein expression associated with key nitrogen, sulfur and carbon biogeochemical cycles outlined in Chapter 1.2. Metabolic pathway components for nitrification, anaerobic ammonium oxidation (anammox), denitrification, and inorganic carbon fixation were differentially expressed across the redoxcline and co-varied with distribution patterns of ubiquitous OMZ microbes. The numerical abundance of SUP05 proteins mediating inorganic carbon fixation under anoxic conditions suggests that SUP05 will become increasingly important in global ocean carbon and nutrient cycling as OMZs expand. The exploration of multiple stations and time points reinforces the reproducibility of the metaproteome under similar redox conditions with respect to relative abundance of energy cycling proteins.
This work is a basis for interpreting microbial community function and expression and serves as a framework for co-metabolic interactions along redox gradients in OMZs.

² A version of this chapter has been published in Proceedings of the National Academy of Sciences as Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes in 2014 by Hawley, A. K., Brewer, H. M., Norbeck, A. D., Paša-Tolić, L. and Hallam, S. J.

3.1 Introduction

Marine oxygen (O2) minimum zones (OMZs) are widespread and naturally occurring water column features that arise when respiratory O2 demand during decomposition of organic matter exceeds O2 availability in stratified waters. Operationally defined by dissolved O2 concentrations <20 µM, OMZs promote the use of alternative terminal electron acceptors (TEAs) in microbial energy metabolism, resulting in climate active gas production including carbon dioxide (CO2), nitrous oxide (N2O) and methane (CH4) [6]. Currently OMZs constitute ∼7% of global ocean volume [2, 10]. However, global climate change promotes conditions for OMZ expansion and intensification, e.g. reduced O2 solubility and increased stratification, with resulting feedback on the climate system, the extent of which has yet to be determined [10, 129].

Within OMZs, the use of nitrate (NO3–) and nitrite (NO2–) as TEAs in dissimilatory nitrate reduction (denitrification) and anaerobic ammonium oxidation (anammox) results in fixed nitrogen loss in the form of N2O and dinitrogen gas (N2) respectively [60, 192]. Because OMZs account for up to 50% of oceanic N2 production, they have the potential to limit primary production in overlying surface waters [192, 193].
A recent model suggests that nitrogen fixation in proximity to OMZ waters can balance nitrogen loss processes [194], and several studies along redoxclines in the Eastern Tropical South Pacific and Baltic Sea have measured nitrogen fixation rates that support a close spatial coupling between nitrogen loss and nitrogen fixation consistent with this model [41, 44, 45]. Moreover, recent studies have begun to link the oxidation of reduced sulfur compounds including thiosulfate (S2O32−) and hydrogen sulfide (H2S) to nitrogen transformations in non-sulfidic OMZs, providing evidence for a cryptic sulfur cycle with the potential to drive inorganic carbon fixation processes [78]. Indeed, many of the key microbial players implicated in nitrogen and sulfur transformations in OMZs, including Thaumarchaeota, Nitrospina, Nitrospira, Planctomycetes and SUP05/ARCTIC96BD-19 Gammaproteobacteria, have the metabolic potential for inorganic carbon fixation [33, 52, 55, 66, 77, 80], and previous process rate measurements in OMZs point to high rates of dark primary production [81–84]. However, the relative contribution of each player to coupled carbon (C), nitrogen (N) and sulfur (S) biogeochemistry as a function of redox zonation and in response to perturbation remains to be determined. These contributions have important implications for understanding the long-term ecological and biogeochemical impacts of OMZ expansion and intensification on carbon and nutrient cycling in the global ocean.

Here I investigate changes in microbial community structure and function in a seasonally stratified fjord, Saanich Inlet on Vancouver Island, British Columbia, Canada, to better understand metabolic coupling along defined redox gradients. I combine cultivation-independent molecular approaches including small subunit ribosomal RNA gene pyrosequencing, metagenomics and metaproteomics to chart the progression of microbial community structure and gene expression along the redoxcline.
I then construct a conceptual model linking different modes of inorganic carbon fixation with distributed nitrogen and sulfur-based energy metabolism.

3.2 Results and Discussion

3.2.1 Water column chemistry and molecular sampling

To evaluate changes in water column redox gradients associated with different stages of stratification and renewal, samples were collected from the Saanich Inlet water column at station S3 on April 9, 2008 (Apr08) and at multiple stations along the transect from the mouth of the inlet (S4) through the midpoint (S3) to the back (S2) on September 1, 2009 (Sep09) (Figure 3.1A and B), corresponding to metaproteomic datasets in Chapter 2. Water column chemistry profiles indicated four redox zones: upper oxycline (UO), lower oxycline (LO), sulfide nitrate transition zone (SNTZ), and sulfidic zone (SZ), generally corresponding to dysoxic (20-90 µM O2), suboxic (1-20 µM O2), anoxic (<1 µM O2) and anoxic sulfidic water column conditions (Figure 3.2). Water column redox zonation and associated microbial community structure were consistent with other OMZs [2, 8], making Saanich Inlet a tractable model ecosystem for studying microbial community responses to changing levels of water column oxygen-deficiency.

[Figure 3.1 panels: (A) Saanich Inlet transect in Sep09, depth (m) against distance from back of inlet (km), with stations S4, S3 and S2; (B) depth profiles of O2 (µM/kg), NO3− (µM), NO2− (µM), NH4+ and HS− (µM) at stations 2, 3 and 4 for Apr08 and Sep09.]

Figure 3.1: Saanich Inlet bathymetry and chemistry. (A) Cross section of Saanich Inlet showing sampling station locations. (B) Chemical profile of the water column in Saanich Inlet at indicated stations and sampling times, showing oxygen (O2), nitrate (NO3−), nitrite (NO2−), ammonium (NH4+), and hydrogen sulfide (H2S) concentrations. Colors indicate region of water column including upper oxycline (green), lower oxycline (teal), S/N transition zone (blue), and sulfidic zone (purple), and colored bars indicate sample depths for SSU rRNA gene pyrotags, metagenomics, and metaproteomics. [Reprinted with permission from Zaikova et al. 2010 (Copyright 2009, Wiley & Sons).]

[Figure 3.2 panels: dendrogram (axis: height) of Sep09 samples from stations S2, S3 and S4 at 100 to 200 m, grouped by water column compartment, with O2, NO3− and H2S sparklines (µM).]

Figure 3.2: Sample hierarchical clustering. Hierarchical clustering of metaproteome by NSAF (see Methods) for detected proteins from Sep09 S2, S3, and S4 indicating compartments of the water column, with adjacent sparklines for oxygen (O2), nitrate (NO3−), and hydrogen sulfide (H2S) for each sample.

To explore changes in microbial community structure and function along water column redox gradients, I analyzed paired metagenomic and metaproteomic datasets from Apr08 and paired small subunit ribosomal RNA gene pyrosequencing and metaproteomic datasets from Sep09 (Figure 3.1B). Sanger end sequencing of small insert clone libraries from the three Apr08 samples yielded a total of 54,701 ORFs, with an average of 18,234 ORFs per sample. Small subunit ribosomal RNA (SSU rRNA) gene pyrosequencing of the 12 Sep09 samples yielded 87,138 sequences that clustered into 3,385 non-singleton operational taxonomic units at the 97% identity threshold. Tandem MS-coupled LC (LC/MS/MS) metaproteomic sequencing identified a total of 5,019 unique proteins (Table B.1), a number comparable to previous marine metaproteomic studies [195]. A consistent number of proteins were identified across the Sep09 samples, averaging 695 unique proteins per sample (Table B.2).
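The four redox zones defined at the start of this section amount to a threshold rule on dissolved O2 and H2S. The following sketch illustrates that rule only; the function name, the handling of boundary values, the "oxic" label above 90 µM and the use of any detectable H2S to separate the SZ from the SNTZ are assumptions made for the example, and the sample values are hypothetical.

```python
def classify_redox_zone(o2_um, h2s_um=0.0):
    """Assign a water-column sample to a redox zone from O2 and H2S (both in µM).

    Thresholds follow the zone definitions in the text (dysoxic 20-90, suboxic 1-20,
    anoxic <1 µM O2). Boundary handling and the H2S > 0 criterion are assumptions.
    """
    if o2_um < 0 or h2s_um < 0:
        raise ValueError("concentrations must be non-negative")
    if o2_um > 90:
        return "oxic"
    if o2_um >= 20:
        return "upper oxycline (dysoxic)"
    if o2_um >= 1:
        return "lower oxycline (suboxic)"
    if h2s_um > 0:
        return "sulfidic zone (anoxic sulfidic)"
    return "sulfide nitrate transition zone (anoxic)"


# Hypothetical samples: (depth in m, O2 in µM, H2S in µM)
samples = [(100, 45.0, 0.0), (130, 8.0, 0.0), (150, 0.5, 0.0), (200, 0.0, 12.0)]
zones = {depth: classify_redox_zone(o2, h2s) for depth, o2, h2s in samples}

for depth, zone in zones.items():
    print(f"{depth} m: {zone}")
```

In practice the zones were assigned from the full chemical profiles (Figure 3.1B), so a rule of this kind is best read as a summary of the definitions rather than as the procedure used.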
Although variability in protein detection in the Apr08 samples was considerable, the high number of unique proteins detected in the Apr08 200 m sample (4,344) enabled identification of more complete metabolic pathways.

[Figure 3.3 panels: relative abundance of taxonomic groups (Archaea; Bacteria, including Alpha-, Delta- and Gammaproteobacteria and candidate divisions; Eukaryota; viruses; no hit/below cutoff) for Apr08 samples at 100, 120 and 200 m (WGS and metaproteome) and Sep09 samples from S2, S3 and S4 at 100 to 200 m (pyrotags and metaproteome). Abundance scale from <0.01 to 50; oxygen status shown as oxic, dysoxic (20-90 µM O2), suboxic (1-20 µM O2) or anoxic.]

Figure 3.3: Taxonomic distribution of metagenome, metaproteome and pyrotags.
Taxonomic distribution and relative abundance of metagenome for Apr08 (gray), SSU rRNA pyrotags for Sep09 (gray), and metaproteome in upper oxycline (green), lower oxycline (teal), S/N transition zone (blue), and sulfidic zone (purple). Relative abundance is shown for selected taxonomic groups, including any representing >2% of the SSU rRNA gene pyrotag, metagenome, or metaproteome datasets. Metagenome percentages were determined as the sum of all ORFs in unassembled metagenomic reads with a hit to a given taxon, normalized to the total number of ORFs over 30 residues long. Metaproteome percentages were determined as the sum of NSAF values for all detected proteins with a hit to a given taxon. Hierarchical clustering of detected protein abundance is shown above, with color indicating oxygen status of the water column at time of sampling.

3.2.2 Patterns of redox-driven niche partitioning

To determine patterns of redox-driven niche partitioning along redox gradients in metagenomic and metaproteomic datasets, I compared community composition with ORF counts and protein normalized spectral abundance factor (NSAF, see Methods and Chapter 2) values between the UO, LO, SNTZ and SZ (Figure 3.3). Hierarchical clustering of NSAF values was consistent with redox zonation (Figure 3.2). Clear trends in protein abundance were observed in relation to redox zonation that were not reflected in pyrotag and metagenomic datasets, consistent with alternative forms of coupling or regulated gene expression. Ammonia oxidizing Thaumarchaeota, mediating the first step of nitrification, dominated UO and LO samples and decreased in abundance within the SNTZ and SZ. Similar trends were observed with respect to ORF counts and NSAF values (Tables B.2, B.3). The nitrite oxidizing bacterium Nitrospina gracilis [28], mediating the second step of nitrification, was abundant in UO and LO samples and decreased in abundance within the SNTZ and SZ.
A second nitrite oxidizing bacterium, Nitrospira defluvii [55], although absent from the pyrotag datasets, exhibited high NSAF values with a distribution pattern similar to that of Thaumarchaeota and N. gracilis. Anammox bacteria affiliated with the Planctomycetes (Tables B.2 - B.4) exhibited intermediate abundance (∼1%) in the UO and LO samples, decreasing in abundance within the SNTZ before increasing again in the SZ. Planctomycete ORF abundance increased along the redoxcline while protein NSAF values were high in the UO and LO, decreasing to intermediate values within the SNTZ and SZ. These patterns of protein expression confirm previous reports of coupled nitrification and anammox observed in OMZs based on process rate and functional marker gene abundance [56, 62].

In addition to known players in the nitrogen cycle summarised in Section 1.2.1, taxa involved in sulfur cycling or coupled nitrogen and sulfur cycling were also abundant and active in the water column. Multiple lineages affiliated with SAR11 within the Alphaproteobacteria, mediating dimethylsulfoniopropionate (DMSP) oxidation [23], were abundant in the UO and LO samples, decreasing in abundance within the SNTZ and SZ (Figure 3.3). Similar trends were observed with respect to ORF counts and NSAF values (Tables B.2 - B.4). Multiple lineages affiliated with SUP05/ARCTIC96BD-19 and symbiont-related Gammaproteobacteria (Tables B.2 - B.4), mediating oxidation of reduced sulfur compounds using O2 [77] or NO3− [33] as TEAs, were also abundant. The ORFs for ARCTIC96BD-19, SUP05 and symbionts exhibited reciprocal distribution patterns, with ARCTIC96BD-19 ORFs decreasing and SUP05 and symbiont ORFs increasing in abundance within the SNTZ and SZ. A similar pattern was observed with respect to NSAF values, with high SUP05 NSAF values in the LO, SNTZ and SZ.
These distribution patterns support previous reports of ARCTIC96BD-19 and SUP05 population structure [2, 33, 77, 149, 196].

Collectively, Thaumarchaeota, Nitrospina, Nitrospira, Planctomycetes, SAR11, and SUP05/ARCTIC96BD-19 and symbiont-related Gammaproteobacteria comprised on average 48% of pyrotag, 41% of metagenomic, and 64% of metaproteomic datasets (Tables B.2, B.3). Several taxonomic groups that were abundant based on pyrotags (≥1%), including Marine Group II Euryarchaeota, Crenarchaeota, Acidimicrobiales, Bacteroidetes, Chloroflexi, Flavobacteria, Desulfobacteraceae and candidate divisions OD1, OP11, Marine Group A and SBR1093, were not well represented in metagenomic or metaproteomic datasets (Figure 3.3). Lack of indigenous reference genomes likely caused many sequences originating from these groups to be classified as no hit or below cutoff (see Methods). Consistent with this observation, BLAST queries against the Genomic Encyclopedia of Bacteria and Archaea Microbial Dark Matter (GEBA-MDM) single-cell genome collection [19] yielded only 23 additional protein sequences that had otherwise been classified as below cutoff or no hit. Conversely, several taxonomic groups including N. defluvii and ARCTIC96BD-19 that were absent in pyrotag datasets exhibited intermediate ORF counts and NSAF values. This discrepancy was likely due to incomplete taxonomic resolution for these groups within the Greengenes database. Approximately 1% of pyrotag and 10% of metagenomic and metaproteomic datasets remained unaffiliated with any taxonomic group.
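The metaproteome percentages quoted above rest on summing NSAF values over all proteins assigned to a taxon. A minimal sketch of that bookkeeping follows, assuming the standard NSAF definition (spectral count divided by protein length, normalized by the sum of that ratio over all detected proteins in a sample). The function names, record layout and all numbers are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict


def nsaf(proteins):
    """Return {protein id: NSAF} for one sample.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral
    count and L the protein length in residues (assumed standard definition).
    """
    saf = {p["id"]: p["spectral_count"] / p["length"] for p in proteins}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra detected in this sample")
    return {pid: value / total for pid, value in saf.items()}


def percent_by_group(proteins, key):
    """Sum NSAF over proteins sharing the same value of `key`, as a percentage."""
    values = nsaf(proteins)
    totals = defaultdict(float)
    for p in proteins:
        totals[p[key]] += 100.0 * values[p["id"]]
    return dict(totals)


# Hypothetical detections for one sample (identifiers, counts and lengths are made up).
proteins = [
    {"id": "p1", "taxon": "SUP05", "spectral_count": 30, "length": 300},
    {"id": "p2", "taxon": "SUP05", "spectral_count": 10, "length": 200},
    {"id": "p3", "taxon": "Thaumarchaeota", "spectral_count": 20, "length": 400},
    {"id": "p4", "taxon": "Planctomycetes", "spectral_count": 50, "length": 1000},
]
shares = percent_by_group(proteins, "taxon")
```

Grouping by a different key, for example a pathway component label instead of the taxon, gives the per-component sums of NSAF values in the same way.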
Taken together, these results indicate that active nitrogen and sulfur cycling microorganisms are the primary contributors to both genetic potential and gene expression along the redoxcline in Saanich Inlet.

3.2.3 Differential gene expression patterns

To investigate patterns of gene expression driving carbon and energy metabolism along the redoxcline in Saanich Inlet, I identified nitrification, anammox, denitrification, sulfur oxidation and inorganic carbon fixation pathway components in metagenomic and metaproteomic datasets using BLAST. By summing the NSAF values for each component, I observed differential patterns of gene expression and metabolic coupling along the redoxcline (Figure 3.4). Expression of these pathways was remarkably stable under similar redox conditions in space (Sep09 S2-S4) and time (Apr08 to Sep09) (Figures B.1 - B.3).

Expressed pathways for nitrogen-based energy metabolism progressed from ammonia oxidation and nitrification in the UO and LO to denitrification in the SNTZ and SZ (Figure 3.4, B.1). Proteins catalyzing the first step of nitrification, ammonia monooxygenase subunits B and C (Amo), from Thaumarchaeota, were detected in the UO and LO and decreased along the redoxcline. Proteins catalyzing the second step of nitrification, nitrite oxidoreductase (NXR), from N. gracilis and N. defluvii followed the same pattern of expression as Amo. Moreover, the detection of both Amo and NXR from nitrifying taxa, albeit at lower NSAF values in the SNTZ and SZ, supports recent observations of NO2− oxidation in the Namibian OMZ with implications for NO3− supply for reduction via denitrification [56, 197]. Nitrite oxidoreductase from Planctomycetes (NXR) (Figure 3.4) [66] had the highest NSAF values of any protein in the UO and LO and exhibited a similar expression profile to Amo and NXR originating from N. gracilis and N. defluvii (Figure 3.4, B.1).
Conversely, proteins catalyzing anammox, including hydrazine and hydroxylamine oxidoreductases (Anx) from Planctomycetes (Figure 3.4, B.1), exhibited opposing expression patterns, with low NSAF values in the UO and LO that increased in the SNTZ and SZ. Contrasting patterns of NXR and Anx expression from Planctomycetes could reflect a metabolic response to O2 resulting in a shift from maintenance energy production in the UO and LO to anammox for growth under more favorable redox conditions in the SNTZ and SZ. Alternatively, close sequence similarity between Planctomycetes, N. gracilis and N. defluvii NXR could confound BLAST-based taxonomic assignment.

[Figure 3.4 panels: Apr08 and Sep09 samples by depth, with rows for Thaumarchaeota, Nitrospira defluvii, Nitrospina gracilis, Planctomycetes, ARCTIC96BD-19, SUP05, symbionts and other taxa, and columns for nitrogen (Amo, Anx, Nar, NXR, Nap, NirS, NirK, Nor, NosZ, HAO-like), sulfur (Sqr, Fcc, Sox, Dsr, Apr, Sat) and carbon (3HP-4HB, CBB, rACoA) proteins. Scale from <0.01 to 9 for metagenome abundance and metaproteome NSAF.]

Figure 3.4: Nitrogen, sulfur and carbon cycling proteins. Distribution and NSAF values of proteins involved in nitrogen and sulfur-based energy metabolism and inorganic carbon fixation for taxa abundant in the metaproteome, shown for metagenome (gray, Apr08 only) and metaproteome in upper oxycline (green), lower oxycline (teal), S/N transition zone (blue), and sulfidic zone (purple). See Table C.3 for full list of protein names; Anx indicates anammox hydroxylamine oxidoreductase and hydrazine oxidoreductase proteins.

Proteins mediating the partial denitrification pathway from SUP05, including dissimilatory nitrate reductase subunits G and H (Nar), periplasmic nitrate reductase subunits A and B (Nap), and nitrite reductase (NirK), were detected in the UO and increased in abundance along the redoxcline (Figure 3.4, B.1). Protein NSAF values for SUP05 Nar increased relative to Planctomycetes NXR in the SNTZ and SZ. Additional proteins for SUP05 nitric oxide reductase subunits B and C (Nor) were detected with similar NSAF values in the LO, SNTZ and SZ. Although denitrification pathway components from other taxonomic groups were detected in the water column, SUP05 was the only group to express consecutive proteins in the denitrification pathway, making up 50% of the total NSAF value of all denitrification proteins in the Saanich Inlet water column; the remaining 50% was distributed among additional taxa. In addition to SUP05, Nap associated with Magnetococcus marinus and Sulfuricella denitrificans from the Alpha- and Betaproteobacteria, respectively, was observed with high NSAF values within the LO, SNTZ and SZ. The NirS protein from other taxa, including Colwellia psychrerythraea and Silicibacter lacuscaerulensis with a smaller contribution from Ruegeria sp. TW15, was also observed with high NSAF values in the LO, SNTZ and SZ.
These observations point to SUP05 as the dominant player in nitrogen-based energy metabolism in the SNTZ and SZ. The detection of SUP05 Nap and NirK in the UO and LO, where O2 concentrations approached 120 µM, was unexpected given that 20 µM O2 is a commonly accepted threshold for denitrification [12], and may have implications for the O2 threshold for nitrogen loss processes in other OMZs. Additionally, the detection of SUP05 Nor, together with the absence of nitrous oxide reductase (NosZ) in the LO and its low abundance in the SNTZ and SZ, suggests SUP05 as a source of N2O. Recent observations of enrichment of nor and nosZ genes on particles within OMZs suggest a distributed denitrification pathway across particle and non-particle niches [16] and may account for the low NSAF values observed for NorCB and NosZ.

Expressed pathways for sulfur-based energy metabolism were detected in the UO and increased in NSAF value along the redoxcline (Figure 3.4, B.2). Proteins catalyzing sulfide oxidation predominantly originated from SUP05/ARCTIC96BD-19 and symbiont-related Gammaproteobacteria. With the exception of ARCTIC96BD-19 adenylylsulfate reductase (Apr), the vast majority of proteins originated from SUP05 and symbionts. With respect to SUP05, flavocytochrome c (Fcc), sulfur oxidation proteins (Sox), dissimilatory sulfite reductase (Dsr) and adenylylsulfate reductase (Apr) were detected in the UO and increased in NSAF value along the redoxcline. In addition, SUP05 ATP sulfurylase (Sat) was detected in the LO, SNTZ and SZ, and SUP05 sulfide:quinone oxidoreductase (Sqr) in the SNTZ and SZ. These results are consistent with recent SUP05 protein expression profiles observed in hydrothermal plume and overlying waters [195]. With the exception of Sox, symbiont proteins catalyzing sulfide oxidation followed the same expression pattern as SUP05. The expression of sulfur oxidation pathway components from SUP05/ARCTIC96BD-19 in the UO and LO is consistent with a cryptic sulfur cycle.
However, no proteins from defined sulfate (SO42−) reducing bacteria were identified in the metaproteome [78]. This observation could reflect a bias against particle-associated microorganisms capable of SO42− reduction during sample processing or the use of alternative electron donors including DMSP, elemental sulfur, thiosulfate or polysulfide in the UO and LO. Additionally, proteins with BLAST hits to the hydrogenase subunit HupL originating from Guaymas Basin SUP05 metagenomes [122] were detected in the SNTZ with NSAF values comparable to SUP05 NapA (Figure B.2), expanding the range of potential substrates for SUP05 energy metabolism in the Saanich Inlet water column.

Proteins for three inorganic carbon fixation pathways, the 3-hydroxypropionate/4-hydroxybutyrate (3HP-4HB) cycle from Thaumarchaeota, the reductive acetyl-CoA (rACoA) pathway from Planctomycetes and the Calvin-Benson-Bassham (CBB) cycle from SUP05, were differentially expressed along the redoxcline (Figure 3.4, B.3). Unlike proteins mediating nitrogen and sulfur-based energy metabolism, ORFs encoding carbon fixation pathway components were found in higher relative abundance in the metagenome (Figure 3.4).

Proteins catalyzing the 3HP-4HB pathway in Thaumarchaeota, including 4-hydroxybutyryl-CoA dehydratase, acetyl-CoA carboxylase and propionyl-CoA carboxylase, were detected predominantly in the UO [52, 80, 198]. Similar expression patterns were observed for Amo and other ammonia oxidation pathway components, providing evidence of inorganic carbon fixation coupled to ammonia oxidation by Thaumarchaeota in the UO. Consistent with previous reports, proteins catalyzing a putative Planctomycete rACoA pathway were detected in the SZ in Apr08 along with Anx proteins, providing evidence for inorganic carbon fixation coupled to anammox under sulfidic conditions (2.1 µM) [199].
Protein NSAF values for SUP05 CBB pathway components increased relative to other bacteria in the SNTZ and SZ, providing compelling evidence for inorganic carbon fixation coupled to sulfide oxidation and partial denitrification by SUP05. Indeed, CBB pathway components had the highest ORF counts and protein NSAF values of all carbon fixation pathways, composing 47% of all carbon fixation proteins within the SNTZ and SZ. In addition to inorganic carbon fixation pathways, the abundance of SAR11 DNA and protein in the UO and LO (Figure 3.3, Tables B.2, B.3) suggests that heterotrophic remineralization of dissolved organic matter (DOM) is an active process in the UO and LO. Specifically, ABC transporter proteins for uptake of glycine betaine, spermidine/putrescine and taurine (sources of carbon, nitrogen and sulfur, respectively) were detected with moderate NSAF values within the UO and LO. In addition to consuming molecular oxygen, remineralization of DOM by SAR11 and other heterotrophic groups in the UO and LO could act as a source of metabolic substrates including NH4+, SO42− and CO2.

3.2.4 Regulated gene expression

Given the numerical abundance of SUP05 in the Saanich Inlet water column, I was able to resolve changes in protein expression originating from a metabolic island integrating nitrogen and sulfur-based energy metabolism with inorganic carbon fixation [33] (Figure 3.5). Specifically, NSAF values for SUP05 Sqr, NarH and NarG subunits appeared to vary as a function of O2 concentration while FccAB and NapAB subunits remained relatively constant in the LO, SNTZ and SZ (Figures 3.4, B.1 - B.3). Close proximity and similar expression profiles for napAB and fccAB are consistent with regulated gene expression along the redoxcline.
Indeed, two ORFs encoding Crp/Fnr transcriptional regulators implicated in redox sensing [200] are located on either side of the nap/fcc gene cluster with the potential to modulate gene expression, and Crp/Fnr proteins (SUP05 0428) were detected in the Apr08 SZ (Figure 3.5).

[Figure 3.5 panels: gene map of the SUP05 fosmids (narH, narK, narK2, cbbO, cbbQ, cbbM, narJ, narI, dsrC-like, narG, crp/fnr, hao, napG, napH, napA, napB, napF, fccA, fccB, crp/fnr) with metagenome abundance (%) and NSAF for Apr08 (S3 at 100, 120 and 200 m) and Sep09 (S2, S3 and S4 at 100 to 200 m). Scale from <0.02 to 6; O2, NO3− and H2S sparklines in µM.]

Figure 3.5: SUP05 gene expression regulation. Relative abundance of SUP05 genes and proteins in two overlapping SUP05 fosmid sequences (GQ351266 and GQ351267) [33]. Metagenome (gray, Apr08 only) and metaproteome for the upper oxycline (green), lower oxycline (teal), S/N transition zone (blue), and sulfidic zone (purple). Selected SUP05 genes involved in denitrification (dark gray shading), sulfide oxidation (black shading), and putative hydroxylamine oxidoreductase (diagonal lines) are indicated. Protein abundance is shown as summed NSAF values for all detected ORFs with top hit to a given SUP05 protein. Metagenome abundances are shown as percentage of ORFs with top hit to a given SUP05 gene, with sparklines for oxygen (O2), nitrate (NO3−), and hydrogen sulfide (H2S) for each sample.

Protein NSAF values for CbbM, a ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) subunit located in proximity to the nar gene cluster, increased between the UO and LO and remained relatively constant in the SNTZ and SZ. These results provide functional evidence in support of previous genomic observations positing a highly integrated and redox-sensitive energy metabolism in SUP05 with direct implications for energy supply to inorganic carbon fixation.
In addition to coordinated Fcc, Sqr, Nar, Nap and CbbM expression, an ORF encoding a hydroxylamine oxidoreductase homolog (HAO-like) located in the nap/fcc gene cluster was among the most abundant SUP05 proteins detected in the SZ (Figures 3.4, 3.5, B.1). SUP05 hao is closely related to genes found in the sulfur oxidizing endosymbionts as well as the anammox bacterium Ca. Kuenenia stuttgartiensis. All four HAO homologs contain eight CxxCH multi-heme motifs similar to those found in NrfA, a nitrite reductase mediating dissimilatory nitrate reduction to ammonia (DNRA) [201].

3.2.5 Metabolic coupling model

Although bulk inorganic carbon fixation rates within OMZs have been measured [81–84], few studies have directly linked inorganic carbon fixation with energy metabolism of defined OMZ microbes [154]. With this linkage in mind, I construct a metabolic model describing taxonomic and metabolic networks coupling pathways of nitrogen and sulfur-based energy metabolism and inorganic carbon fixation along the redoxcline in Saanich Inlet based on metaproteomic datasets (Figure 3.6). In this model, heterotrophic remineralization of DOM releases CO2, SO42− and NH4+ within the UO and LO. Thaumarchaeota couple oxidation of NH4+ to inorganic carbon fixation via the 3HP-4HB pathway within the UO, producing NO2−, a process that has been demonstrated both in culture [50] and in situ [202]. Nitrous oxide is also produced as a by-product of ammonia oxidation [202, 203]. Nitrite produced via NH4+ oxidation is oxidized in turn by N. defluvii, N. gracilis and Planctomycetes in the UO [56, 66]. The extent to which this process is coupled to inorganic carbon fixation within these groups remains to be determined. Ammonia oxidation attenuates as O2 levels decline in the LO, SNTZ and SZ, accompanied by a transition to partial denitrification and anammox.
In the LO, SUP05 begins to couple oxidation of reduced sulfur compounds, and possibly hydrogen, with NO3− reduction to N2O to fix inorganic carbon via the CBB pathway, a trend that increases in the SNTZ and SZ [33, 75, 122, 134]. In parallel, Planctomycetes couple anammox to inorganic carbon fixation via the rACoA pathway [134], although the broader occurrence of this process in the water column remains to be determined. Competition between SUP05 and Planctomycetes for oxidants could help explain variations in spatial and temporal dynamics of nitrogen loss processes observed in different OMZs. Alternatively, potential DNRA by SUP05 could supply NH4+ for anammox, resulting in a cometabolic linkage. Overall, the interactions described in the model are dynamic and reflect patterns of redox-driven niche partitioning regulating nitrogen loss processes and carbon flux through ubiquitous OMZ microbes.

[Figure 3.6 panels: schematic for the upper oxycline, lower oxycline, S/N transition zone and sulfidic zone showing Thaumarchaeota, Nitrospira defluvii, Nitrospina gracilis, the SAR11 clade, Planctomycetes, SUP05 and an uncultured bacterium, with arrows for nitrogen transformation, sulfur oxidation, hydrogen oxidation, heterotrophic remineralization, energy coupling and carbon fixation (3HP-4HB, rACoA, CBB).]

Figure 3.6: Metabolic model. Proposed metabolic model based on metaproteomic observations for heterotrophic remineralization (brown) and energetic coupling (yellow dashed lines) of nitrogen (green), sulfur (red), and hydrogen (orange) based chemolithotrophic energy metabolism with carbon fixation (yellow star) for taxa abundant in the metaproteome.
Line weight and arrow size indicate magnitude of metabolic activity. Gray lines, activity not occurring under given conditions; light gray taxa, reduced abundance and metabolic activity.

Energy for inorganic carbon fixation within the LO, SNTZ and SZ is derived in large part from denitrification and anammox nitrogen loss processes, with the balance between these processes impacting energy flow to either SUP05 or Planctomycetes with concomitant feedback on growth rates. The numerical dominance of SUP05 DNA and protein in the LO, SNTZ and SZ relative to Planctomycetes suggests that partial denitrification outcompetes anammox from a bioenergetic perspective. Indeed, the difference in free energy yield between the two processes (denitrification coupled to sulfide oxidation yields ∼3.5 times the Gibbs free energy of anammox under standard conditions) is consistent with lower cell abundance and biomass for anammox bacteria, even though anammox is observed more frequently than denitrification in many OMZs [12]. As OMZs expand, the contribution of SUP05 to inorganic carbon fixation may have significant impact on global ocean carbon cycling if sufficient energetic substrates are available. However, the fate of carbon fixed in OMZ waters is largely unknown, as the balance between carbon transport and heterotrophic remineralization processes remains to be constrained.

3.3 Conclusions and future implications

This chapter represents the first metaproteome of an O2-deficient water column encompassing the range of redox conditions, from dysoxic to anoxic sulfidic, found in OMZs globally. Although a recent numerical model by Reed and colleagues attempted to integrate geochemical processes and functional gene markers in the Arabian Sea OMZ [204], my conceptual model uses protein expression to describe differential metabolic coupling among ubiquitous OMZ microbes. The Reed model implicitly assumes reaction rates scale linearly with gene abundance.
Thus, the model does not account for biological information flow from DNA to RNA to protein, a regulated process resulting in assembly of pathways driving real world process rates. Incorporation of protein expression information into the model could be used to convert gene abundance into protein abundance or protein production rates, resulting in more accurate predictions.

Additional modeling efforts by Louca et al. [132], based on the conceptual model put forth here, incorporate Saanich Inlet multi-omics datasets outlined in Chapter 2 including metagenomes, -transcriptomes and -proteomes as well as rate measurements for anammox and denitrification. Louca's mathematical model reproduces transcript and protein concentration profiles along the Saanich Inlet redox gradient based on fluxes of energetic substrates O2, NO3−, NO2− and H2S. The prediction of gene expression levels solely from geochemical fluxes suggests that, for genes involved in energy metabolism, geochemistry is a robust predictor of microbial community structure and gene expression. Indeed, the Louca model predicts a metabolic niche for N2O reduction within Saanich Inlet, which is addressed in Chapter 4 with the addition of the dark matter phylum Marinimicrobia. The Louca model suggests a central role of SUP05 Gammaproteobacteria in the production of NO2− fuelled by sulfide oxidation, a trait observed in the recently cultured Candidatus Thioglobus autotrophicus [34] and with the potential to feed both further denitrification and anammox driven nitrogen loss. Furthermore, the Louca model strongly supports the finding of my conceptual model that SUP05 is a dominant contributor to inorganic carbon fixation within OMZs, providing insight into ocean carbon and nutrient cycling.

Global climate models predict future expansion and intensification of OMZs, with concomitant shoaling and stabilization of sulfidic zones [4, 199].
Such a scenario would provide an increased habitat for SUP05, supporting inorganic carbon fixation via direct oxidation of reduced sulfur compounds and cryptic sulfur cycling in oxygen-deficient waters, resulting in increased primary production and potentially increased carbon sedimentation [13]. Given an estimate of 4.61 × 10¹⁸ L of O2-deficient marine waters [3] and the range of observed dark carbon fixation rates from various OMZs of 2.5 – 0.2 µM L⁻¹ day⁻¹ [81–84], I estimate 0.45 Pg C y⁻¹ fixed in OMZs globally. This number represents up to 10% of surface primary production (using 48.5 Pg C y⁻¹ [43]) and will continue to increase with OMZ expansion. With 47% of observed carbon fixation proteins originating from SUP05, I suggest that SUP05 is responsible for 0.2 – 2.4 Pg C y⁻¹, representing up to 5% of surface primary productivity. Although OMZ expansion is a predicted consequence of global warming, negative feedback loops may ultimately lead to increased drawdown of atmospheric CO2 driven in large part by blooming SUP05 populations.

3.4 Methods

3.4.1 Sample collection

Sample collection was carried out as described in Chapter 2 for SSU rRNA, metagenomes and metaproteomes, on board the MSV John Strickland in Saanich Inlet on April 9, 2008 at station S3 (48°35.30 N, 123°30.22 W) (Apr08), and on September 1, 2009 at station S2 (48°33.106 N, 123°32.081 W), station S3 and station S4 (48°38.310 N, 123°30.007 W) (Sep09) (Figure 3.1A) using a combination of 12 L Niskin and 8 L Go-Flo bottles. Samples for metagenomics, metaproteomics and small subunit ribosomal RNA gene pyrosequencing were collected as described in Zaikova et al. (2009) and Hawley et al. (2016) [8, 114], with the exception of the 1.0 L Apr08 metaproteomic samples, where RNAlater (Ambion) was used instead of lysis buffer. Multiple depths along the oxycline at all stations and dates were sampled for NO3−, NO2−, NH4+ and H2S as previously described in Zaikova et al.
(2009) and Torres-Beltrán [8, 115], and an SBE O2 sensor on the CTD was used to monitor O2 concentrations.

3.4.2 Environmental DNA extraction, sequencing and assembly

Environmental DNA extraction was carried out as previously described in Hawley et al. (2013), Zaikova et al. (2009) and in video format (http://www.jove.com/video/1161/). Metagenomic samples were sequenced at the Department of Energy Joint Genome Institute (Walnut Creek, CA) by Sanger shotgun sequencing. Sequences were annotated and translated into amino acid sequences using the FGENESB pipeline from Softberry (www.softberry.com/berry.phtml) as described in the supplemental material of Walsh et al. (2009). Sanger end sequencing of small insert clone libraries from the three Apr08 samples yielded a total of 54,701 open reading frames (ORFs), with an average of 18,234 ORFs per sample. The metagenomic library used for peptide identification was assembled from DNA sequences from 16 fosmid libraries sourced from Saanich Inlet samples collected between February 2006 and February 2007 (accession: LIBGSS 03912-17) [32], with the addition of small insert clone libraries from the April 2008 samples described in the current study. Libraries were prepared as described in the supplement of Walsh et al. (2009) [33]. Assembly was carried out separately on libraries from 10 m and on libraries from 100 to 200 m using Phrap (minmatch 30, maxmatch 55, minscore 55, max subclone size 50000, revise greedy vector bound 20) [205] to yield 5,620 scaffolds from 10 m libraries and 37,657 scaffolds from 100 - 200 m libraries, with a remaining 141,349 unassembled sequences.
Scaffolds were subsequently annotated and translated with the FGENESB pipeline from Softberry (www.softberry.com/berry.phtml) using a cutoff of 30 amino acids for minimum protein length to yield 56,476 ORFs from 10 m libraries, 57,674 ORFs from 100 - 200 m libraries and 112,828 ORFs from unassembled sequences.

3.4.3 PCR amplification of SSU rRNA gene for pyrotag sequencing and analysis

Pyrosequencing of Sep09 samples was carried out as described in Allers et al. (2013) [31]. Briefly, small subunit ribosomal RNA (SSU rRNA) gene pyrosequencing of the 12 Sep09 samples yielded 87,138 sequences that clustered into 3,385 non-singleton operational taxonomic units at the 97% identity threshold.

3.4.4 Environmental protein extraction and identification

Total protein was extracted from Sterivex filters as described in Hawley et al. (2013) [206] and in Chapter 2. Briefly, BugBuster (Novagen) was added to Sterivex filters to lyse cells, and lysate was extruded. Buffer exchange with 100 mM NH4HCO3 was carried out using a 10K Amicon (Millipore), and purified proteins were subjected to overnight trypsin digestion followed by cleanup on C18 (Sigma-Aldrich) and strong cation exchange solid phase extraction columns. Digested protein concentration was determined by bicinchoninic acid assay. Environmental peptides were analyzed by tandem mass spectrometry (MS/MS) at the Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory (Richland, WA) as described in Hawley et al. (2013) [206] using on-line capillary liquid chromatography-tandem mass spectrometry on a Thermo LTQ ion trap or LTQ-Orbitrap using data-dependent fragmentation. Detected peptides were identified from MS/MS using SEQUEST™ with a mass spectra generating function (MS-GF) cutoff value below 10⁻¹¹, corresponding to a false discovery rate of less than 2% [207].
Peptides were searched against the Saanich Inlet metagenomic database, consisting of an assembly of metagenomic sequences from previously sequenced fosmid end libraries of environmental DNA collected in Saanich Inlet from multiple depths between February 2006 and April 2007 [32] and April 2008 metagenomic samples, comprising a total of 176,978 protein sequences. Only peptides matched to protein sequences with a peptide prophet probability (PPP) score ≥ 0.95 were used in further analysis. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) metaproteomic sequencing identified a total of 5,019 unique proteins, a number comparable to previous marine metaproteomic studies [208]. A consistent number of proteins was identified across the Sep09 samples, with an average of 695 unique proteins per sample (Table B.1). While variability in protein detection in the Apr08 samples was considerable, the high number of unique proteins detected in the Apr08 200 m sample (4,344) was exceptional, enabling identification of more complete metabolic pathways.

3.4.5 Functional and taxonomic assignment of metagenome and metaproteome

Taxonomy and function for all metagenomic (translated) and metaproteomic amino acid sequences were assigned as the top hit from BLASTP [209] against the NCBI RefSeq database (Nov. 7, 2012) augmented to include the SUP05 metagenome [33], Candidatus Kuenenia stuttgartiensis, and Candidatus Scalindua profunda [208], using a BLAST score ratio of 0.4 as a cutoff [186]. Relative abundance of genes in the metagenome was reported as the sum of all sequence reads with a hit to a given accession divided by the total number of sequence reads over 30 amino acids in length in a given sample. In the metaproteome, the relative abundance of a detected protein was reported as the normalized spectral abundance factor (NSAF), as described in Hawley et al. 2013 [206].
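As an illustration of the NSAF normalization used for the metaproteome (scan counts of shared peptides split evenly among the proteins they match, per-protein counts divided by protein length, and the resulting values scaled to sum to one within a sample), a minimal Python sketch is given here. The function name and the three input dictionaries are hypothetical and chosen for illustration only; they do not correspond to the software used in this study.

```python
from collections import defaultdict


def nsaf(peptide_scans, peptide_to_proteins, protein_lengths):
    """Return {protein: NSAF} for a single sample.

    peptide_scans:       {peptide: scan count}            (hypothetical input)
    peptide_to_proteins: {peptide: [matching proteins]}   (hypothetical input)
    protein_lengths:     {protein: length in amino acids} (hypothetical input)
    """
    counts = defaultdict(float)
    for peptide, scans in peptide_scans.items():
        proteins = peptide_to_proteins[peptide]
        # A peptide matching several proteins contributes an equal share to each.
        share = scans / len(proteins)
        for protein in proteins:
            counts[protein] += share

    # Spectral abundance factor: scan count normalized by protein length.
    saf = {p: c / protein_lengths[p] for p, c in counts.items()}

    # Normalize so that all values in the sample sum to one.
    total = sum(saf.values())
    return {p: s / total for p, s in saf.items()}


example = nsaf(
    peptide_scans={"pepA": 10, "pepB": 4},
    peptide_to_proteins={"pepA": ["P1"], "pepB": ["P1", "P2"]},
    protein_lengths={"P1": 100, "P2": 200},
)
```

Because the values are normalized within each sample, NSAF can be compared between samples with different total numbers of spectra, but it remains a relative rather than absolute measure of protein abundance.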
Within a given sample, peptide scan counts for each protein were summed, and in cases where peptide sequences matched to multiple proteins, the scan count for the peptide was divided by the number of proteins it matched. The scan count for each protein was then divided by protein length (in amino acids) to give the spectral abundance factor (SAF), which was then divided by the sum of all SAFs for a given sample to yield the normalized spectral abundance factor (NSAF) for a given protein.

3.4.6 Hierarchical clustering of metaproteomic samples

The NSAF values for all detected proteins with a PPP ≥ 0.95 were used in the calculation of a Sørensen distance matrix using PC-ORD software, and a group average method was used for grouping in construction of clusters.

Chapter 4

Diverse Marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic gradients³

As introduced in Chapter 1, microbial communities drive biogeochemical cycles through networks of metabolite exchange that are structured along redox and energy gradients. As energy yields become limiting, these networks favor co-metabolic interactions to maximize energy yield. In this chapter, I apply single-cell genomics and use metagenomic and metatranscriptomic datasets described in Chapter 2 to study populations of the abundant microbial dark matter group Marinimicrobia along defined energy gradients. I show that evolutionary diversification of major Marinimicrobia clades appears to be closely related to energy yields, with increased co-metabolic interactions in more deeply branching clades.
Several of these clades participate in the biogeochemical cycling of sulfur and nitrogen, filling previously unassigned niches in the ocean. Notably, two Marinimicrobia clades, occupying different energetic niches, express nitrous oxide reductase, potentially acting as a global sink for the greenhouse gas nitrous oxide.

³ A version of this chapter has been published in Nature Communications as Diverse Marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic gradients in 2017 by Hawley, A. K., Nobu, M. K., Wright, J. J., Durno, W. E., Morgan-Lang, C., Sage, B., Schwientek, P., Swan, B. K., Rinke, C., Torres-Beltrán, M., Mewis, K., Liu, W. T., Stepanauskas, R., Woyke, T., Hallam, S. J.

4.1 Introduction

The laws of thermodynamics apply to all aspects of life, governing energy flow in both biotic and abiotic regimes. Nicholas Georgescu-Roegen was the first to directly apply the laws of thermodynamics to economic theory, bringing to the forefront the constraints that limited natural resources place on sustainable growth [210]. Robert Ayres used the term eco-thermodynamics to describe the application of thermodynamics and energy flow to economic models, with the controversial conclusion that future economic growth would necessitate the recycling of goods [211]. Within microbial ecology there is an emerging consensus that these same organizing principles structure microbial community interactions and growth with feedback on global nutrient and energy cycling [38, 132, 204, 212]. Indeed, recycling in the common sense may be analogous to metabolite exchange or use of public goods [213], as the goods from one production stream become available for growth of another. Microbial communities living near thermodynamic limits, where high potential electron acceptors are scarce, tend to utilize differential modes of metabolic coupling including obligate syntrophic interactions, maximizing any chemical disequilibria to yield energy for growth [90, 102].
Thus, the term eco-thermodynamics takes on new meaning in the context of microbial ecology where thermodynamic constraints directly shape the structure and activity of microbial interaction networks.

Eco-thermodynamic gradients are formed by the distribution of available electron donors and acceptors within the physical environment, creating metabolic niches that are occupied by diverse microbial partners playing recurring functional roles [214, 215]. Marine oxygen minimum zones (OMZs) provide a vivid example of eco-thermodynamic gradients shaping differential modes of metabolic coupling at the intersection of carbon, nitrogen and sulfur cycling in the ocean [2, 79]. For example, OMZ microbial communities manifest a modular denitrification pathway that links oxidation of reduced sulfur compounds to nitrate reduction and nitrous oxide (N2O) production [33, 79, 134]. While many of the most abundant interaction partners are known, recent modeling efforts point to a novel metabolic niche for the terminal step in the denitrification pathway (nitrous oxide reduction to dinitrogen gas) occupied by unidentified community members [132]. By defining the interaction networks coupling microbial processes along eco-thermodynamic gradients, it becomes possible to more accurately model nutrient and energy flow at ecosystem scales.

Recent advances in sequencing technologies have opened a genomic window on uncultivated microbial diversity, illuminating the metabolic potential of numerous candidate divisions also known as microbial dark matter (MDM) [19, 212, 216]. Many MDM organisms occupy low energy environments where high energy terminal electron acceptors are scarce, and appear to form obligate metabolic dependencies that could help explain resistance to traditional isolation methods. Marine Marinimicrobia have been previously implicated in sulfur cycling via a polysulfide reductase gene cluster [31, 32].
In studies of a methanogenic bioreactor, Marinimicrobia have also been found to rely on syntrophic interactions with metabolic partners to accomplish degradation of amino acids [217]. The global distribution of Marinimicrobia clades indicates a much wider diversity of metabolic functions and interacting partners than currently described. Here we use shotgun metagenomics, metatranscriptomics and single-cell genomics to investigate energy metabolism within the Marinimicrobia to reveal novel modes of metabolic coupling with important implications for nutrient and energy cycling in the ocean.

4.2 Results and Discussion

4.2.1 Marinimicrobia single-cell amplified genomes and phylogeny

A total of 25 Marinimicrobia single-cell amplified genomes (SAGs) from sources along eco-thermodynamic gradients were identified globally by flow sorting, whole-genome amplification and sequencing (Table 4.1). SAG de novo assemblies ranged in size from 0.39 to 2.01 million bases (Mb) with estimated genome completeness ranging from <10% to >90% (average 45%) (Table 4.1). Most Marinimicrobia SAGs manifested streamlined genomes, with high coding base percentage (89.99 − 97.13%) and low cluster of orthologous group (COG) redundancy (1.08 − 1.16) (Figure C.1). PhyloPhlAn analysis [218] using concatenated sequences of conserved marker genes placed Marinimicrobia SAGs within the bacterial domain, branching deeply from the closest cultured thermophilic representative Caldithrix abyssi (Figure C.2). To determine phylogenetic diversity within the Marinimicrobia, we constructed a comprehensive SSU rRNA gene tree from identified Marinimicrobia SSU rRNA genes, resolving 17 clades (Figure 4.1).
SAG sequences were affiliated with 10 clades spanning the entire breadth of the Marinimicrobia tree (Figures 4.1 and 4.2 A and B), providing a broad phylogenetic range with which to assess distribution patterns and energy metabolism within the phylum.

Table 4.1: Genomic Features of SAGs

| SAG | Sampling depth (m) | Oxygen status* | Clade | Assembly size (bp) | Estimated genome completeness (%) | Estimated genome size (bp) | Number of contigs | Largest contig (bp) | Largest contig as % of assembly | GC content (%) | # predicted genes | # protein-coding genes | # RNA genes | # CRISPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gulf of Maine (43°84′N, 69°64′W) | | | | | | | | | | | | | | |
| AAA160-I06 | 1 | oxic | ZA3312c-A | 884,929 | 75.8 | 900,037 | 32 | 107,580 | 12.2 | 32.4 | 1,007 | 971 | 36 | 1 |
| AAA160-C11 | 1 | oxic | ZA3312c-A | 824,595 | 78.0 | 955,155 | 17 | 190,175 | 23.1 | 32.7 | 883 | 852 | 31 | 1 |
| AAA076-M08 | 1 | oxic | ZA3312c-A | 392,252 | 41.2 | 577,302 | 18 | 71,263 | 18.2 | 32.6 | 440 | 427 | 13 | 1 |
| AAA160-B08 | 1 | oxic | ZA3312c-A | 922,550 | 78.3 | 1,296,752 | 26 | 21,880 | 2.4 | 33.0 | 995 | 965 | 30 | 1 |
| HOT Station ALOHA (22°45′N, 158°00′W) | | | | | | | | | | | | | | |
| AAA0298-D23 | 25 | oxic | ZA3312c-B | 979,176 | 92.3 | 979,176 | 10 | 407,043 | 41.6 | 31.0 | 1,055 | 1,016 | 39 | 1 |
| South Atlantic Gyre (12°29′S, 4°59′W) | | | | | | | | | | | | | | |
| AAA003-E22 | 800 | oxic | HF770D10 | 888,773 | 33.5 | 1,737,273 | 53 | 67,111 | 7.6 | 37.0 | 788 | 764 | 24 | 2 |
| AAA003-L8 | 800 | oxic | ZA3648c | 1,265,631 | 72.4 | 2,004,182 | 84 | 138,629 | 11.0 | 30.0 | 1,258 | 1,231 | 27 | 2 |
| Saanich Inlet | | | | | | | | | | | | | | |
| AB-746 P06AB-902 | 100 | dysoxic | Arctic96B-7-B | 1,538,315 | 77.6 | 1,848,863 | 69 | 139,459 | 9.1 | 32.0 | 1,547 | 1,522 | 25 | 1 |
| AB-746 N13AB-902 | 100 | dysoxic | Arctic96B-7-A | 1,899,548 | 67.0 | 2,255,438 | 72 | 115,617 | 6.1 | 39.1 | 1,882 | 1,843 | 39 | 1 |
| AB-747 F21AB-903 | 100 | dysoxic | Arctic96B-7-A | 751,298 | 8.6 | 8,976,777 | 55 | 64,402 | 8.6 | 38.9 | 799 | 789 | 10 | 1 |
| AB-750 L13AB-904 | 150 | anoxic | SHBH1141 | 2,009,596 | 57.1 | 2,659,593 | 112 | 86,252 | 4.3 | 43.0 | 1,921 | 1,885 | 36 | 1 |
| AB-750 A02AB-903 | 150 | anoxic | SHBH1141 | 1,925,097 | 45.8 | 3,701,687 | 136 | 88,465 | 4.6 | 43.1 | 1,762 | 1,741 | 21 | 1 |
| AB-751 D09AB-904 | 150 | anoxic | SHBH1141 | 569,612 | 11.5 | 3,917,216 | 45 | 50,480 | 8.9 | 43.2 | 528 | 517 | 11 | 1 |
| AB-755 M21D07 | 185 | sulfidic | SHAN400 | 999,835 | TBD | 2,877,074 | 70 | 53,898 | 5.4 | 37.0 | 971 | 953 | 18 | 1 |
| AB-755 E16C12 | 185 | sulfidic | SHBH1141 | 781,004 | 18.3 | 4,196,372 | 75 | 77,534 | 9.9 | 42.0 | 819 | 806 | 13 | 1 |
| Northeast subarctic Pacific (48°52′N, 130°40′W) | | | | | | | | | | | | | | |
| JGI 0000113-D11 | 2,000 | dysoxic | Arctic96B-7-B | 645,847 | 48.3 | 2,643,012 | 91 | 33,298 | 5.2 | 32.0 | 756 | 727 | 29 | 1 |
| Terephthalate degrading reactor (40°6′N, 88°13′W) | | | | | | | | | | | | | | |
| 0000039-E15 (TAsludge) | na | 0.0 | HMTAb91-B | 493,984 | 32.8 | 1,343,421 | 30 | 58,668 | 11.9 | 47.2 | 461 | 447 | 14 | 1 |
| 0000059-E23 (TAbiofilm) | na | 0.0 | HMTAb91-B | 618,060 | 17.2 | 3,865,965 | 41 | 75,054 | 12.1 | 47.3 | 583 | 572 | 11 | 1 |
| 0000059-D20 (TAbiofilm) | na | 0.0 | HMTAb91-B | 772,719 | 41.3 | 1,534,399 | 34 | 88,615 | 11.5 | 47.7 | 718 | 696 | 22 | 1 |
| 0000059-L03 (TAbiofilm) | na | 0.0 | HMTAb91-B | 597,650 | 27.5 | 1,115,912 | 41 | 55,420 | 9.3 | 47.8 | 585 | 579 | 6 | 1 |
| 0000077-B04 (TAbiofilm) | na | 0.0 | HMTAb91-B | 496,915 | 15.5 | 3,885,254 | 28 | 66,910 | 13.5 | 47.4 | 479 | 469 | 10 | 1 |
| 0000039-O11 (TAsludge) | na | 0.0 | HMTAb91-B | 828,046 | 34.5 | 2,959,672 | 75 | 32,891 | 4.0 | 47.0 | 749 | 729 | 20 | 1 |
| 0000039-D08 (TAsludge) | na | 0.0 | HMTAb91-B | 978,259 | 41.8 | 1,549,116 | 56 | 61,500 | 6.3 | 47.4 | 894 | 878 | 16 | 1 |
| 0000090-C20 | na | 0.0 | HMTAb91-B | 566,695 | 27.6 | 2,726,674 | 46 | 37,004 | 6.5 | 47.6 | 511 | 494 | 17 | 1 |
| Etoliko Lagoon Sediment (38°47′N, 21°33′E) | | | | | | | | | | | | | | |
| AAA257-N23 | na | 0.0 | HMTAb91-A | 948,616 | 27.4 | 2,239,091 | 71 | 50,692 | 5.3 | 39.0 | 912 | 898 | 14 | 1 |

*oxic (>90 µM O2), dysoxic (90 µM > O2 > 20 µM), suboxic (20 µM > O2 > 2 µM), anoxic (<2 µM O2), sulfidic.

[Figure 4.1 image not reproduced: SSU rRNA gene tree of SAG, fosmid and clone sequences with bootstrap values, resolving the clades Arctic95A-2, SHAN400, SHBH391, HMTAb91-A, HMTAb91-B, SAR406, A714018, Arctic96B-7-A, Arctic96B-7-B, P262000N21, P41300E03, ZA3312c-A, ZA3312c-B, P262000D03, ZA3648c, SHBH1141 and HF770D10, alongside an electron tower of redox pairs with E°′ (mV) values.]

Figure 4.1: Maximum likelihood small subunit rRNA tree
and energy metabolism redox pairs for Marinimicrobia lineages. Maximum likelihood phylogenetic tree of small subunit ribosomal RNA (SSU rRNA) genes from all available studies. SSU rRNA genes from SAGs used in this study are in bold and coloured to indicate their membership in population genome bins. Energy metabolism redox pairs for each lineage explored in this publication are mapped to the electron tower on the right of the tree. The bar represents 1% estimated sequence divergence. Bootstrap values below 50% are not shown.

4.2.2 Biogeography of Marinimicrobia clades

Using this phylogenetic information, we determined the global biogeographic distribution of Marinimicrobia and specific SAG-affiliated clades along eco-thermodynamic gradients spanning oxic (>90 µM O2), dysoxic (20 − 90 µM O2), suboxic (1 − 20 µM O2), anoxic (<1 µM O2), sulfidic and methanogenic conditions. Estimates of Marinimicrobia total abundance and clade distribution were carried out by a robust survey of 594 globally sourced metagenomes (549 assembled Illumina data sets and 45 unassembled 454 data sets) across terrestrial and marine ecosystems, including Northeastern Subarctic Pacific (NESAP, n = 43), Saanich Inlet (SI, n = 90), Eastern Tropical South Pacific (ETSP, n = 6), Peruvian (n = 17), and Guaymas Basin (n = 2) OMZs; TARA Oceans (n = 243) and several other marine (n = 141) and terrestrial sites (n = 52) (Table C.1), totalling 127 Gigabases (Gb) of sequence information. To estimate total abundance, we used sequence similarity recruitment with a cutoff of >70% nucleotide identity over >70% of the metagenomic contig. Globally we recovered 1.3 Gb of Marinimicrobia-affiliated sequence, or ∼1,300 genome equivalents (assuming 1 Mb average genome), representing ∼1% of surveyed data. The recovery of Marinimicrobia-affiliated sequences was highest in coastal OMZs, increasing in relation to decreasing O2 concentration (Figure C.3A).
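The total-abundance recruitment criterion (>70% nucleotide identity over >70% of the metagenomic contig) can be sketched as a simple filter over alignment records. The following Python sketch is illustrative only: the `Hit` record, its field names, and both function names are hypothetical and do not reflect the recruitment software used in this study. The genome-equivalent helper simply divides recruited bases by an assumed mean genome size.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a metagenomic contig to a reference (hypothetical record)."""
    contig_id: str
    contig_length: int      # bp
    aligned_length: int     # bp of the contig covered by the alignment
    percent_identity: float


def recruited_bases(hits, min_identity=70.0, min_coverage=0.70):
    """Sum the lengths of contigs passing both thresholds, counting each contig once."""
    passing = {}
    for hit in hits:
        coverage = hit.aligned_length / hit.contig_length
        if hit.percent_identity > min_identity and coverage > min_coverage:
            passing[hit.contig_id] = hit.contig_length
    return sum(passing.values())


def genome_equivalents(total_bp, mean_genome_bp=1_000_000):
    """Convert recruited bases to genome equivalents for an assumed mean genome size."""
    return total_bp / mean_genome_bp


hits = [
    Hit("c1", 1000, 800, 85.0),   # passes: 80% coverage, 85% identity
    Hit("c2", 2000, 1000, 95.0),  # fails: only 50% coverage
    Hit("c3", 500, 500, 65.0),    # fails: identity below threshold
    Hit("c1", 1000, 900, 90.0),   # second hit to c1, not counted twice
]
total = recruited_bases(hits)
```

Passing different threshold values to `recruited_bases` allows the same filter to be reused for stricter or looser recruitment criteria.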
Recovery was more variable in other marine locations and minimal in terrestrial locations. To more fully resolve this sequence information at the level of specific Marinimicrobia clades, we conducted a more stringent recruitment of ≥95% nucleotide identity across ≥200 bp intervals. On a global scale, three clades constituted 75% of observed Marinimicrobia, with the remaining seven clades making up the difference (Figure C.3B). Consistent with previous results, predominantly marine sites were recruited, with two hits from terrestrial locations. Sakinaw Lake, a meromictic lake with high methane concentrations [216], was the only geographic location with recruitment to the HMTAb91 clade. Within marine systems, SAGs recruited sequences from cognate environments and conditions consistent with observed tree branching patterns (Figure 4.2, Table C.2). Overall, trends indicated that specific clades inhabit particular energetic niches with potential for metabolic coupling within a given niche.

[Figure 4.2 image not reproduced: (A) unrooted tree with numbered SAGs, colored by clade (HMTAb91-B, Arctic96B-7-B, HMTAb91-A, Arctic96B-7-A, SHBH1141, SHAN400, ZA3648c, ZA3312c-A, HF770D10, ZA3312c-B, related taxa); (B) circular plot linking clades to redox pairs N2O/N2 (1359 mV), O2/H2O (810 mV), NO3−/NO2− (420 mV), NO2−/NO (374 mV), Sn2−/H2S (−249 mV) and H+/H2 (−410 mV); (C) recruitment by depth for NESAP, SI, ETSP, Peru and TARA Oceans samples.]

Figure 4.2: Phylogeny and electron donors of Marinimicrobia and biogeographic distribution. (A) Unrooted phylogenetic tree based on the small subunit rRNA (SSU rRNA) gene in single-cell amplified genomes (SAGs) showing the phylogenetic affiliation of Marinimicrobia SAGs; each dot represents a SAG in Table S1 with the corresponding number. (B) Circular plot indicating the terminal electron acceptors used and their respective E°′ (mV) value (right) by the different Marinimicrobia clades (left). (C) Global distribution of Marinimicrobia SAG-related microorganisms, as determined by metagenomic fragment recruitment using FAST (Online) with 594 global metagenomes with a threshold of ≥95% nucleotide sequence identity and alignments ≥200 bp. Recruited contig lengths were normalized by the length of each SAG assembly in mega base pairs (Mbp) and to the size of the metagenome of origin in Mbp.

Population genome bin construction

To determine the energy metabolism of Marinimicrobia clades and overcome low genome completion of some SAGs, we leveraged extensive metagenomic and metatranscriptomic resources from NESAP and Saanich Inlet time series [114, 219] to construct population genome bins, improving estimated genome completion to an average of 87% (Table C.3). Metagenomic contigs >5000 bp with >95% identity to SAGs were identified, followed by tetra-nucleotide frequency analysis to resolve specific clades (Figure 4.3a). A total of five population genomes for Marinimicrobia clades ZA3312c-A/B, HF770D10, Arctic96B-7-A/B, SHAN400, and SHBH1141 spanning oxic, dysoxic, suboxic, anoxic, and anoxic-sulfidic conditions were resolved from Saanich Inlet and NESAP metagenomes, enabling more complete metabolic reconstruction within each clade (Figure 4.3a, b). A sixth clade (HMTAb91-A), endemic to a methanogenic bioreactor and branching near the base of the Marinimicrobia radiation, was included in downstream comparisons of metabolic potential to encompass the complete range of electron donor-acceptor pairs. Energy metabolism of Marinimicrobia population genomes was examined in relation to tree branching patterns and environmental disposition. A total of 18 metatranscriptomes from six depths and three time points (Figure 4.4 a and b) were used to explore Marinimicrobia gene expression over defined energy gradients, including a deep water renewal event resulting in the influx of oxygenated, nutrient-rich waters into the Saanich Inlet basin.
This enabled the resolution of metabolic niches and indicated potential modes of metabolic coupling within specific Marinimicrobia clades.

4.2.3 Metabolic reconstruction and gene model validation

Marinimicrobia clades ZA3312c-A/B and HF770D10 were most abundant under oxic water column conditions, with extensive genome streamlining comparable to Ca. Pelagibacter ubique (Figure C.1A). All three clades harbored genes encoding aerobic respiration and heterotrophy, with no indication of autotrophic CO2 fixation. ZA3312c clades also encoded the oxidative tricarboxylic acid (TCA) cycle (Table C.4) and proteorhodopsin, a proton pump used to harness light energy (Figure 4.3b) [220]. ZA3312c proteorhodopsin transcripts were highly expressed in oxic surface waters of Saanich Inlet, suggesting that ZA3312c are capable of supplementing organotrophy with phototrophy in surface waters, a trait well suited to open-ocean oligotrophic environments (Figure C.6A). Interestingly, ZA3312c-A also encoded nitrous oxide reductase (nosZ) and associated maturation factors (nosL, nosD, and nosY) that drive the reduction of N2O to N2 in the terminal step of denitrification. Transcripts for nosZ were expressed throughout the Saanich Inlet water column (Figures 4.4A, C.7) and indicate potential coupling to ammonia-oxidizing Thaumarchaea that produce N2O as a byproduct of ammonia oxidation [203]. ZA3312c-A nosZ transcripts were also detected in suboxic waters of the NESAP, Peru, and ETSP OMZs, and four TARA Oceans metagenomes contained ZA3312c-A nosZ sequences (>80% nucleotide identity) (Figure 4.4B), reinforcing a global distribution pattern with functional implications for marine nitrogen budgets and greenhouse gas cycling.

Marinimicrobia clades Arctic96B-7-A and B were widespread in dysoxic ocean waters. Arctic96B-7 clades harbored genes encoding aerobic respiration, organotrophy and the oxidative TCA cycle, with no indication of proteorhodopsin or autotrophic CO2 fixation (Table C.4).
Arctic96B-7 clades may supplement energy generation in a similar manner to proteorhodopsin through catabolism of the common ocean compound oxalate [221], coupling a unique oxalate:formate antiporter and oxalate decarboxylase [222]. The Arctic96B-7-A clade also encoded nitrate reductase (narG) and polysulfide (polyS) reductase (psrABC) (Figures 4.3b, 4.4a) that were expressed throughout the Saanich Inlet water column. Peak expression corresponded to depths with low NO3− and no detectable H2S. Interestingly, the PsrABC enzyme complex can use H2S as an auxiliary electron donor through PsrABC-mediated H2S oxidation to polyS, and stored polyS can serve as an alternative electron sink, regenerating H2S. The combination of narG and psrABC provides Arctic96B-7 clades with versatile energy metabolism with potential coupling to both sulfur-oxidizing bacteria (ARCTIC96-BD19, SUP05) by regenerating H2S under non-sulfidic conditions, and anaerobic ammonium-oxidizing (Planctomycetes) and nitrite-oxidizing (Nitrospina) bacteria through the production of NO2− in dysoxic, suboxic, and anoxic waters (Figure 4.5A).
Thus, Arctic96B-7 clades may form supportive metabolic partnerships with major primary producers in OMZs critical to the biogeochemical cycling of carbon, nitrogen, and sulfur [79].

Table 4.2: Genomic features of Marinimicrobia population genomes

| Lineage | Single Cell Genome Identity | Population Genome Size (Mbp) | Estimated Completeness (%) | Number of contigs | N50 | GC content (%) | Number of single copy marker genes* | Strain Heterogeneity* |
|---|---|---|---|---|---|---|---|---|
| ZA3312c-A | AAA160-I06, AAA160-C11, AAA076-M08, AAA160-B08 | 11 | 95.8 | 531 | 35213 | 32.8% | 56 | 94.33 |
| ZA3312c-B | AAA0298-D23 | 1 | 93.4 | 41 | 236078 | 31.6% | 147 | 28 |
| HF770D10 | AAA003-E22 | 1.4 | 41.2 | 118 | 15724 | 36.6% | 104 | 100 |
| Arctic96B7 A | AB-746 N13AB-902, AB-747 F21AB-903 | 50.9 | 100.0 | 3423 | 18609 | 39.4% | 56 | 70.67 |
| Arctic96B7 B | AB-746 P06AB-902, JGI 0000113-D11 | 6.0 | 96.6 | 583 | 13227 | 32.6% | 104 | 63.36 |
| SHAN400 | AB-755 M21D07 | 32.2 | 87.5 | 2196 | 19252 | 37.4% | 56 | 99.64 |
| SHBH1141 | AB-750 L13AB-904, AB-755 E16C12, AB-751 D09AB-904 | 65.6 | 91.7 | 3127 | 35279 | 43.5% | 56 | 96.11 |

* Parks et al. Genome Research 2015

[Figure 4.3 image not reproduced: (A) principal component plots of population genome bins; (B) schematic of energy metabolism for ZA3312c, Arctic96B-7-A, SHAN400, SHBH1141 and HMTAb91, covering aerobic respiration, partial denitrification, nitrous oxide respiration, polysulfide respiration, hydrogenogenesis, heterotrophic metabolism and carbon fixation via the rTCA cycle, with redox pairs and E°′ (mV) values. Legend: electron donors and acceptors for each redox pair; electron carriers Fd = ferredoxin, Q/QH2 = quinone/quinol.]

Figure 4.3: Energy metabolism of Marinimicrobia population genome bins. (A) Binning of Marinimicrobia population genomes by k-mer frequency principal component analysis; two rotations of the three-dimensional plot are shown, in which clouds of color-coded genome bins are apparent. (B) Summary of co-metabolic and energy metabolism and conservation strategies of Marinimicrobia population genomes from along eco-thermodynamic gradients, for nitrogen (blue), sulfur (pink), and hydrogen (green). Enzymes include: proteorhodopsin (PR); sulfur: polysulfide reductase (PsrAB, PsrC); nitrogen: nitrite reductase (Nir), nitrate reductase (Nar), nitrate/nitrite antiporter (NirK), nitrous oxide reductase (Nos); hydrogen metabolism: Ni,Fe hydrogenase (Ni,Fe Hyd), hydrogenase complex (HydBD); respiratory elements: cytochrome bc1 complex (Cytbc1), NADH dehydrogenase (Ndh); energy-conserving putative electron transfer mechanisms: putative ion-translocating ferredoxin:NADH oxidoreductase (IfoAB); oxalate transporter (OxlT); Rhodobacter nitrogen fixation complex (Rnf). Oxidation and reduction are indicated by solid or dotted arrows, respectively.

Marinimicrobia clade SHAN400 appears to be endemic to Saanich Inlet, where it is most abundant below the oxycline (Figure C.5). SHAN400 harbored genes encoding aerobic and anaerobic respiration, heterotrophy and the oxidative TCA cycle. SHAN400 also encoded ferredoxin, pyruvate metabolism, and NADH dehydrogenase (Figures 4.3b, C.9, C.8), potentially providing additional electron shuttles for energy metabolism under anoxic conditions. Like Arctic96B-7, SHAN400 encoded narG and psrABC, potentially linking its energy metabolism to both sulfur-oxidizing bacteria (SUP05) and anaerobic ammonium- (Planctomycetes) and nitrite- (Nitrospina) oxidizing bacteria in anoxic waters (Figures 4.3b, 4.4a, C.6ab).
In contrast to Arctic96B-7, SHAN400 transcripts for heme/copper-type cytochrome and NADH dehydrogenase were most highly expressed in anoxic waters (Figure C.8). This is consistent with redox-driven niche partitioning between Arctic96B-7 and SHAN400 clades in the Saanich Inlet water column.

Marinimicrobia clade SHBH1141 was prevalent in anoxic and anoxic-sulfidic OMZ waters (Figure C.5). SHBH1141 harbored genes encoding aerobic and anaerobic respiration, autotrophic CO2 fixation via the reductive TCA cycle (citrate lyase and ferredoxin-dependent 2-ketoacid oxidoreductases), and the Rhodobacter nitrogen fixation (Rnf) complex to produce reduced ferredoxin to drive endergonic reductive carboxylation steps, indicating a capacity to carry out autotrophy (Figures C.9, C.8). In addition, SHBH1141 encoded psrABC, class I [Ni,Fe] hydrogenases (hybOABCD) and nosZ with associated maturation factors nosL and nosD (Figures 4.3b, C.6, C.9). Gene expression for psrABC, hybOABCD, and nosZ was elevated under anoxic to sulfidic conditions (120 m in July 2010, and 150 m in July and August 2010; Figure 4.4a).

[Figure 4.4 image: (A) bubble plots of NarG, NosZ, PsrAB and HyaAB expression (RPKM) for ZA3312c-A, Arctic96B-7-A, SHAN400 and SHBH1141 between 100 and 200 m depth in July 2010, August 2010 and Feb 2011, with N2O (µM) profiles and chemical status (dysoxic, suboxic, anoxic, sulfidic); (B) average RPKM of SHBH1141 and ZA3312c nosZ in metagenomes and metatranscriptomes from SI, NESAP, Peru, ETSP and TARA by water column classification.]

Figure 4.4: Expression of selected Marinimicrobia energy metabolism genes in Saanich Inlet. (A) Expression of selected genes involved in Marinimicrobia energy metabolism from Saanich Inlet station SI03 at three time points between 100 and 200 m.
Size of circle represents reads per kilobase per million mapped (RPKM) (see methods) for metatranscriptomic reads mapped to the selected genes for the indicated population genomes. Water column redox status for each time point encoded on left axis and nitrous oxide concentration profile for each time point on left. Genes shown: nitrate reductase (narG), nitrous oxide reductase (nosZ), polysulfide reductase subunits A and B (psrAB) and Ni-Fe hydrogenase subunits A and B (hyaAB). (B) Detected genes and transcripts for Marinimicrobia ZA3312c and SHBH1141 nosZ along eco-thermodynamic gradients from oxic (>90 µM O2), dysoxic (20-90 µM O2), suboxic (1-20 µM O2), anoxic (<2 µM O2), and sulfidic conditions in Saanich Inlet time series (SI), Northeastern Subarctic Pacific (NESAP), Peru, Eastern Tropical South Pacific (ETSP) and TARA Oceans (no transcriptomes available) data sets. For SI and NESAP, dot size represents average reads per kilobase per million mapped (RPKM), summed for a given nosZ type in each metagenome or metatranscriptome and averaged by the total number of metagenomes or metatranscriptomes for a given water column classification. For ETSP, Peru and TARA, bubble size is the number of reads (ETSP and Peru) or contigs (TARA) with nosZ, averaged per number of metagenomes or metatranscriptomes for a given water column classification.

SHBH1141 class I [Ni,Fe] hydrogenase is proposed to operate bidirectionally based on observations in Escherichia coli and Salmonella enterica, with proposed hydrogen production under more oxidizing conditions [223]. SHBH1141 nosZ was recovered on a global scale and expressed under both sulfidic conditions in Peru and suboxic conditions in the ETSP as well as Saanich Inlet (Figure 4.4b), suggesting a central role for SHBH1141 in OMZ N2O reduction.
The expression of these genes in nonsulfidic waters points to a new mode of dynamic metabolic mutualism in which SHBH1141 may rely on SUP05 N2O generation in anoxic and sulfidic waters [34, 79] to store polyS and re-evolve H2S from polyS to stimulate SUP05 N2O production (Figure 4.5b). This would in turn support autotrophic carbon fixation in both partners and sustain N and S biogeochemical cycling under dynamic or unfavorable conditions (e.g., limited H2S bioavailability; Figure C.4). Such mutualism would be highly dependent on either (a) migration along the eco-thermodynamic gradient or (b) seasonal/temporal changes such as renewal or upwelling events.

Marinimicrobia clades HMTAb91-A/B are prevalent in methanogenic locations at the base of the electron tower. HMTAb91-A/B did not harbor genes for aerobic respiration and had an incomplete TCA cycle. HMTAb91-A encoded the Embden-Meyerhof-Parnas pathway (Table C.4) and both HMTAb91-A/B encoded energy-conserving H+ respiration through electron-confurcating hydrogenases (reverse electron transport), the energy-conserving Rnf complex, and putative syntrophic amino-acid metabolism through the ion-translocating ferredoxin:NADH oxidoreductase (ifoAB) (Figure 4.3b) [217]. Within the methanogenic reactor where it was initially described, HMTAb91-A is postulated to accomplish thermodynamically unfavorable amino-acid degradation supporting methanogenesis [217].
HMTAb91-A/B clades appear restricted to methanogenic ecosystems, as no metagenomic or metatranscriptomic sequences were recruited from non-methanogenic locations.

[Figure 4.5 image: (A) Supporting chemolithotrophic primary production; (B) Dynamic metabolic mutualism; (C) Community level metabolic coupling along eco-thermodynamic gradient (dysoxic, suboxic, anoxic, sulfidic). Taxa shown: Planctomycetes, SUP05 uncultured bacterium, Nitrosopumilaceae, Nitrospira defluvii, ZA3312c-A, SHAN400, Arctic96B-7-B and SHBH1141. Legend: nitrogen cycling, sulfur cycling, hydrogen redox; interaction, metabolism, oxidation, reduction, bioavailable.]

Figure 4.5: Proposed co-metabolic model along eco-thermodynamic gradients in Saanich Inlet. (A) Proposed coupling where Arctic96B-7 clades support the chemolithotrophic primary production of SUP05 and Planctomycetes. (B) Proposed dynamic metabolic mutualism between SUP05 and SHBH1141. (C) Overall proposed model for Marinimicrobia co-metabolic activity with other dominant microbial groups in Saanich Inlet along eco-thermodynamic gradients. Interactions based on expression data for sulfur (pink), nitrogen (blue) and hydrogen (green) for dominant Marinimicrobia lineages in Saanich Inlet as well as metabolic partners Nitrosopumilaceae sp., Planctomycetes, and SUP05 uncultured bacterium.

4.3 Discussion

Co-metabolic functions encoded and expressed within globally distributed Marinimicrobia clades would fill several hitherto unassigned niches in the nitrogen and sulfur cycles and support recent modeling efforts integrating biogeochemical and multi-omic sequence information in the Saanich Inlet water column [113–115].
The N2O reductase expressed on a global basis by ZA3312c-A and SHBH1141 clades has the potential to act as a biological filter for N2O produced by the ubiquitous marine processes of ammonia oxidation (e.g., Thaumarchaeota) [203] and partial denitrification (e.g., SUP05) [34, 79]. In contrast, nitrate reduction to NO2– by other Marinimicrobia clades (i.e., Arctic96B-7-A and SHAN400) has potential to provide NO2– to anaerobic ammonium-oxidizing (Planctomycetes) and nitrite-oxidizing (Nitrospina) bacteria in dysoxic, suboxic, and anoxic waters. The polysulfide reductase expressed by multiple Marinimicrobia clades (e.g., Arctic96B-7, SHAN400, and SHBH1141) has potential to provide an energy storage mechanism via accumulation of polyS that can be reduced or oxidized under changing water column redox conditions and support both cooperative and dynamic interactions, including cryptic sulfur cycling and dark carbon fixation [78].

The application of eco-thermodynamic principles to microbial ecology provides perspective on how thermodynamic constraints serve to shape microbial community structure and the nature of co-metabolic interactions along energy gradients. Indeed, phylogenetic branching patterns often coincided with energy yields of redox pairs for identified clade energy metabolism, with deeper branching clades near the base of the electron tower where lower energy yields would increase potential for metabolic coupling. Additionally, many Marinimicrobia clades encoded enzyme systems tied to both nitrogen and sulfur cycling, suggesting extensive specialization for metabolic cooperation within and between biogeochemical cycles. Such dependencies likely confound isolation efforts within the phylum and point to an ancestral state primed for co-existence.
The extent to which this reflects the diversification of other phyla, particularly MDM across the Tree of Life, is an interesting area of research with implications for understanding and directing the evolution of metabolic networks driving Earth's biogeochemical cycles.

4.4 Methods

4.4.1 SAG collection, sequencing, assembly, and decontamination

SAGs from Gulf of Maine, HOT station ALOHA, South Atlantic Gyre, the terephthalate-degrading bioreactor and Etoliko Lagoon sediment were included in Rinke et al. [19], and collection, assembly and decontamination follow accordingly. See Supplementary Data 1 for details on SAG genomics. SAGs from the Northeast subarctic Pacific (NESAP) and Saanich Inlet were processed using the following protocol. Replicate 1 mL aliquots of sea water collected for single-cell analyses were cryopreserved with 6% glycine betaine (Sigma-Aldrich), frozen on dry ice and stored at -80 °C. Single-cell sorting, whole-genome amplification, real-time PCR screens, and PCR product sequence analyses were performed at the Bigelow Laboratory for Ocean Sciences Single Cell Genomics Center (www.bigelow.org/scgc), as described by Stepanauskas and Sieracki [105]. SAGs from the NESAP were generated at the DOE Joint Genome Institute (JGI) using the Illumina platform as described in Rinke et al. [19]. SAGs from Saanich Inlet were sequenced at the Genome Sciences Centre, Vancouver BC, Canada, as described in Roux et al. [224]. All SAGs were assembled at JGI as described in Rinke et al. 
[19, 224].

The following steps were performed for SAG assembly: (1) filtered Illumina reads were assembled using Velvet version 1.1.04 using the VelvetOptimiser script (version 2.1.7) with parameters (–v –s 51 –e 71 –i 4 –t 1 –o -ins_length 250 -min_contig_lgth 500); (2) wgsim (-e 0 -1 100 -2 100 -r 0 -R 0 -X 0); (3) Allpaths-LG (prepareAllpathsParams: PHRED_64 = 1 PLOIDY = 1 FRAG_COVERAGE = 125 JUMP_COVERAGE = 25 LONG_JUMP_COV = 50; runAllpathsParams: THREADS = 8 RUN = std_pairs TARGETS = standard VAPI_WARN_ONLY = True OVERWRITE = True). SAG prediction analysis and functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [225] (http://img.jgi.doe.gov) developed by the Joint Genome Institute, Walnut Creek, CA, USA.

4.4.2 Phylogenomic analysis of SAGs

The PhyloPhlAn pipeline was used to determine relationships among Marinimicrobia SAGs [218] (Figure C.3) as well as the phylogenetic placement of Marinimicrobia within the bacterial domain (Figure C.2). In both cases, fasta files for the 25 SAGs and related genomes were passed to PhyloPhlAn and resulting trees were visualized and drawn using GraPhlAn. The 25 Marinimicrobia SAGs and related genomes were inserted into the already built PhyloPhlAn microbial Tree of Life containing 3737 genomes using the insert functionality, and a de novo phylogenetic tree was created using the user functionality based solely on the 25 Marinimicrobia SAG and related genome fasta files. Default parameters were used in each case with the exception of a custom annotation file used in GraPhlAn to colour the leaves based on phylum in the microbial Tree of Life, and subgroup in the de novo phylogenetic tree.

4.4.3 Metagenome fragment recruitment

The proportion of Marinimicrobia represented in the 594 globally distributed metagenomes (Figure C.3) was determined by SAG nucleotide sequence alignment to individual metagenomes using FAST [185].
Parameters of 70% nucleotide identity cutoff over 70% of the contig length (or 454-read, where applicable) were employed to encompass the Marinimicrobia phylum [226]. The small subunit ribosomal RNA (SSU rRNA) gene was removed from SAG sequences before alignment searches to prevent cross-recruitment to non-Marinimicrobia sequences. The total length of contigs passing the cutoff for a given metagenome was summed and divided by the total contig length for that metagenome to calculate percentage of Marinimicrobia. Where data on O2 concentration were available, for Saanich Inlet, NESAP, ETSP [134], and Peruvian upwelling [199], O2 status of the sample was used as indicated. Data on O2 concentration were unavailable for Marine-Misc. and terrestrial samples.

Biogeography of Marinimicrobia SAG-affiliated clades was similarly determined using alignment parameters of 95% identity cutoff and >200 base pairs (bp) alignment length to ensure that only contigs with high sequence similarity were recruited while maintaining clade resolution. Metagenomic contigs mapping to more than one Marinimicrobia clade were assigned to the clade with greatest percent identity and in the event of a tie were assigned to the clade with the greatest alignment length. Overall abundance was calculated for each metagenome by summing the total lengths of all contigs with hits to a given Marinimicrobia clade divided by the total size of the SAG and the total size of the assembled metagenome in base pairs. Results by metagenome and clade were then summed in Figure 4.2 and itemized in Table C.2.
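
As a rough illustration of the clade assignment and abundance normalization described above, the following Python sketch shows one way these rules could be expressed. It is not the code used in this work. The function names, the tuple layout of an alignment hit, the use of an inclusive 95% identity cutoff, and the reading of "divided by the total size of the SAG and the total size of the assembled metagenome" as division by the product of the two sizes are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_clade(hits):
    """Choose one clade for a contig.

    hits: list of (clade, percent_identity, alignment_length) tuples.
    The highest percent identity wins; alignment length breaks ties.
    """
    return max(hits, key=lambda h: (h[1], h[2]))[0]


def clade_abundance(contig_hits, contig_lengths, sag_sizes, metagenome_size,
                    min_identity=95.0, min_aln_len=200):
    """Length-normalized abundance of each clade in one metagenome.

    contig_hits:     {contig_id: [(clade, percent_identity, aln_length), ...]}
    contig_lengths:  {contig_id: contig length in bp}
    sag_sizes:       {clade: SAG size in bp}
    metagenome_size: total assembled metagenome size in bp
    """
    recruited = defaultdict(int)
    for contig, hits in contig_hits.items():
        # Keep hits at or above the identity cutoff and longer than 200 bp.
        passing = [h for h in hits
                   if h[1] >= min_identity and h[2] > min_aln_len]
        if not passing:
            continue
        recruited[assign_clade(passing)] += contig_lengths[contig]
    # Assumed normalization: divide by SAG size and by metagenome size.
    return {clade: total / (sag_sizes[clade] * metagenome_size)
            for clade, total in recruited.items()}
```

Summing the returned values per clade across metagenomes would then give a global figure of the kind reported for each clade, under the same assumptions.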
Global relative abundance of Marinimicrobia clades shown in Figure C.3 was calculated similarly by summing the total lengths of all contigs with hits to a given Marinimicrobia clade divided by the total size of the SAG and the total size of the assembled metagenome in base pairs and then summing for all hits to a given clade.

4.4.4 Saanich Inlet and NESAP metagenomes and metatranscriptomes

Saanich Inlet metagenomes and metatranscriptomes were collected, sequenced, and assembled as described in Hawley et al. [114] and cognate chemical and physical measurements can be found in Torres-Beltrán et al. [115]. Briefly, Saanich Inlet samples for metagenomic and metatranscriptomic sequencing were collected by Niskin or Go-Flo bottles on line with CTD. Samples for metatranscriptomics, 2 L, were filtered by peristaltic pump with in-line 2.7 µm prefilter onto a Sterivex filter with 1.8 mL RNALater added and frozen on dry ice within 20 min of bottle on-deck. Metagenomic samples, 20 L, were filtered within 8 h of collection by peristaltic pump with in-line 2.7 µm prefilter onto a Sterivex filter with 1.8 mL lysis buffer added and frozen at -80 °C. Metagenomic and metatranscriptomic samples were processed, sequenced, and assembled according to Hawley et al. [114] at the JGI using the Illumina HiSeq platform.

Sampling in the NESAP was conducted via multiple hydrocasts using a Conductivity, Temperature, Depth (CTD) rosette water sampler aboard the CCGS John P. Tully during three Line P cruises: 2009-09 [June 2009, major stations P4 (48°39.0′ N, 126°4.0′ W, 7 June), P12 (48°58.2′ N, 130°40.0′ W, 9 June), and P26 (50° N, 145° W, 14 June)], 2009-10 [August 2009, major stations P4 (21 August), P12 (23 August) and P26 (27 August)], and 2010-01 [February 2010, major stations P4 (4 February) and P12 (11 February)].
At these stations, large volume (20 L) samples for DNA isolation were collected from the surface (10 m), while 120 L samples were taken from three depths spanning the OMZ core and upper and deep oxyclines (500, 1000, 1300 m at station P4; 500, 1000, 2000 m at station P12). Sequencing and assembly were carried out as described above for Saanich Inlet and accession numbers are available in C.1.

Construction and validation of population genome bins. Marinimicrobia population genome bins were constructed by identifying metagenomic contigs from Saanich Inlet and NESAP metagenomes mapping to specific SAG(s) using a supervised binning method based in part on methodologies developed by Dodsworth et al. [181] in the construction of OP9 population genome bins. Initially, the membership of individual SAGs in the SAG-clusters making up a given phylogenetic clade was determined. SAG tetranucleotide frequencies were then calculated and converted to z-scores with TETRA (http://www.megx.net/tetra) [227, 228]. Z-scores were reduced to three dimensions with principal component analysis (PCA) using PRIMER v6.1.13 [229] and hierarchical cluster analysis of the z-score PCA with Euclidean distance (also performed in PRIMER) was carried out to generate SAG-clusters. These SAG-clusters reflected phylogenetic placement of the SAGs by SSU rRNA gene analysis. For construction of population genome bins, metagenomic contigs from NESAP and SI data sets were aligned to SAG contigs with >95% nucleotide identity using BLAST [209] and a minimum of 5 kilobase pairs alignment length. Tetranucleotide frequencies of all metagenomic contigs passing this identity and length threshold were calculated and converted to z-scores. SAG-supervised binning as described in Dodsworth et al. 
using linear discriminant analysis was carried out using all z-scores with the SAG-bins as training data to classify the metagenomic contigs as making up a given population-genome bin.

Individual SAGs and population genome bins were analyzed for completeness and strain heterogeneity using CheckM v1.0.5 [230]. Specifically, the lineage_wf workflow was used with default parameters. The lineage_wf workflow includes determination of the probable phylogenetic lineage based on detected marker genes. The determined lineage then dictates the set of marker genes that is most relevant for estimating a given genome's completeness and other statistics. The strain heterogeneity metric is highly informative for population genome bins as it is essentially the average amino-acid identity for pairwise comparisons of the (lineage appropriate) redundant single-copy marker genes within a population genome bin (Table 4.2). For population genome bins, the higher the strain heterogeneity value, the more similar the amino acid identity of the redundant marker genes, indicating that the sequences in the bin originate from a closely related, if not identical, phylogenetic source.

4.4.5 Marinimicrobia genome streamlining

Gene-coding bases and COG-based gene redundancy shown in Figure C.1 were calculated using cluster of orthologous group (COG)-based genome redundancy as described in Rinke et al. [19]. Each gene's COG category was predicted through the JGI IMG pipeline. COG redundancy was calculated by averaging the occurrence of each COG in the genome. The percentage of gene-coding bases was calculated by dividing the number of bases contributing to protein and RNA-coding genes by the total genome size.
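
The two streamlining metrics described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the IMG or Rinke et al. implementation: the function names and input layout are hypothetical, and COG redundancy is taken here to mean the mean number of genes per distinct COG observed in the genome.

```python
from collections import Counter


def cog_redundancy(gene_cogs):
    """Mean number of occurrences per distinct COG.

    gene_cogs: one COG identifier per annotated gene, e.g. ["COG0001", ...].
    """
    counts = Counter(gene_cogs)
    return sum(counts.values()) / len(counts)


def percent_coding(coding_bases, genome_size):
    """Percentage of the genome covered by protein- and RNA-coding genes."""
    return 100.0 * coding_bases / genome_size
```

A value of `cog_redundancy` close to 1 would indicate that most COGs occur once, which is the pattern expected for a streamlined genome.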
For SAGs, the length of the assembled genome was used rather than the estimated genome size.

4.4.6 Annotation and identification of metabolic genes of interest

Genes of interest were identified in the SAGs and in IMG/M (https://img.jgi.doe.gov/cgi-bin/m/main.cgi) [231] for the metagenomic contigs which made up the population genome bins. Contigs making up Marinimicrobia population genome bins were run through MetaPathways 2.5 [94, 182] to annotate open reading frames (ORFs) and reconstruct metabolic pathways. As the population genome bins were constructed from multiple metagenomes, they contained redundant sequence information. BLASTp [209] (amino-acid identity cutoff >75%) was therefore used to identify all copies of a given gene of interest in each population genome bin, which were then used in gene model validation and expression mapping.

4.4.7 Gene expression mapping

Metatranscriptomes from three time points in the Saanich Inlet time series [114] were used to investigate changes in gene expression along water column redox gradients over time for selected ORFs involved in energy metabolism and electron shuttling. Quality controlled reads from metatranscriptomes were mapped to identified ORFs of interest using BWA-MEM [190] and reads per kilobase per million mapped (RPKM) per ORF was calculated using the RPKM calculation in MetaPathways 2.5 [191]. For each population genome bin, RPKM values for a given sample were summed for ORFs with the same functional annotation to yield an RPKM for a given functional gene. For other taxonomic groups in Saanich Inlet shown in Supplementary Figure C.6B, genes were identified by sequence alignment searches of Saanich Inlet metatranscriptomes (bioSample indicated above) assembled and conceptually translated using BLASTp against selected nitrogen and sulfur cycling genes from Hawley et al. 
[79] and RPKM values calculated as described above.

4.4.8 Global distribution and expression of nosZ

Further analysis was carried out to determine the global distribution of Marinimicrobia nosZ in 594 metagenomes. The nosZ nucleotide sequences from SHBH1141 and ZA3312c, which exhibited 65% nucleotide identity to each other by BLAST, were clustered at 95% identity using the USEARCH cluster_fast algorithm [232], resulting in three clusters, two SHBH1141 and one ZA3312c. Nucleotide sequence alignment was carried out using FAST [185], with parameters of >80% nucleotide identity and >60 bp alignment length against 594 metagenomes. For Saanich Inlet and NESAP data sets, abundance of nosZ in a given metagenome or metatranscriptome was determined by summing the RPKM value for ORF hits to either SHBH1141 or ZA3312c for a given metagenome or metatranscriptome. For 454 sequenced metagenomes and metatranscriptomes [134, 199], the number of reads which hit to either SHBH1141 or ZA3312c was summed for a given metagenome. For the TARA Oceans data set [233], the number of genes identified in an assembled metagenome was summed. Metatranscriptomic data for TARA were unavailable at this time.

Chapter 5

A niche for NosZ?⁴

5.1 Introduction

Nitrous oxide (N2O) is a potent greenhouse gas with 298 times the atmospheric heat-trapping activity of carbon dioxide and is also currently the most dominant ozone depleting substance [234, 235]. Marine ecosystems are estimated to account for one third of total global N2O production [234], with lower oxygen (O2) concentrations correlating with increased N2O production [113]. Oxygen minimum zones (OMZs) and coastal upwelling regions are seen as the dominant marine sources of N2O, where it is produced by microorganisms under low O2 or anoxic conditions [236, 237]. However, the organisms responsible for N2O production and consumption and their dynamics within the marine environment have yet to be constrained.
As concentrations of O2 in the Global Ocean are expected to decrease significantly in the coming decades [11], identification of marine N2O sources and sinks is a pressing area of interest.

⁴ Portions of this chapter, namely N2O concentrations in Saanich Inlet, have been previously published in Limnology and Oceanography as "A multi-year time-series of N2O dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia" in 2017 by Capelle, D. W., Hawley, A. K., Hallam, S. J. and Tortell, P. D.

The only known biological sink for N2O within both marine and terrestrial environments is the nitrous oxide reductase enzyme NosZ (encoded by the nosZ gene). Nitrous oxide reductase typically catalyzes the final step in the denitrification pathway, which reduces nitrate (NO3–) to nitrite (NO2–), to nitric oxide (NO), then N2O and finally to nitrogen gas (N2), a process that occurs largely under O2-depleted to anoxic conditions (i.e. <20 µM). However, approximately 40% of organisms carrying genes for NO3– reduction do not carry the nosZ gene [238], resulting in incomplete denitrification and the production of N2O [239], as in the case of the prevalent OMZ Gammaproteobacteria, SUP05 [33, 34]. Furthermore, the high sensitivity of NosZ to O2, causing inhibition of N2O reduction, can result in N2O production from complete denitrifiers under microaerophilic conditions [240]. Nitrous oxide is also produced by several other processes including ammonia oxidation, nitrifier denitrification (where nitric oxide reductase reduces NO formed upon NO2– reduction) and comammox [241].
With these abundant and diverse sources of environmental N2O, recent studies including pure cultures [240, 242, 243], enrichments [244], molecular [245], genomic and metagenomic surveys [59, 246] and multi-omic models [132] suggest multiple niches for non-denitrifying N2O-reducing organisms within O2-depleted environments. While organisms capable of surviving on N2O as the sole electron acceptor have been known for decades [243], two clades of nosZ have only recently been observed [59] and much remains to be understood regarding the dynamics of different N2O-reducing organisms within the Global Ocean.

Recent work has established two different types of nosZ [59]. Type I are typically found in Alpha-, Beta- and Gammaproteobacteria, 83% of which carry some genes for the complete denitrification pathway [238]. Type II are found in a broad taxonomic distribution of organisms, 51% of which do not carry any other genes for the denitrification pathway [59, 238, 246, 247]. While Type I was thought to be the dominant NosZ, work in terrestrial systems has found Type II to be up to an order of magnitude more abundant [238]. Differences in the structure of the nos gene cluster (NGC) indicate possible differences in energy metabolism between the two types. Type I nosZ NGC, with a twin-arginine signal sequence, typically occurs in a nosRZDFYL cluster, encoding proteins responsible for passing electrons from quinol to NosZ via cytochrome bc1 and cytochrome c complex and NosR, a periplasmic flavin mononucleotide binding protein [115]. Type II nosZ NGC, with a sec signal sequence, typically occurs in a nosZBDFYL cluster, though greater variation exists than in Type I NGC and additional nosGH may also be present [115].
The Type II NGC found in Epsilonproteobacteria, for example, encodes proteins proposed to pass electrons from menaquinone/menaquinol to NosZ via cytochrome bc1 complexes, NosGH or NosB proteins. Notably, NosGH and NosB may be capable of generating a proton motive force by acting as a menaquinol-reactive proton pump [115], supplying additional energy for Type II N2O-reducing organisms.

Differences in the abundance and distribution of Type I and II nosZ as well as complete vs incomplete denitrification have been the focus of recent studies, with results pointing to several possible environmental factors that may be mediating these dynamics. Ample carbon sources seem to support complete denitrifiers and organisms with Type I nosZ. With no limits on carbon source, complete denitrifiers grow faster than organisms which only respire N2O [244]. However, with ample carbon supply and sufficient source of electron acceptors, denitrifying organisms may carry out incomplete denitrification, reducing NO3– and NO2– but halting at NO reduction and producing N2O. Indeed, NOx– limitation is seen to favour complete denitrifiers, as the Type I nosZ generally has a higher µmax/Ks and affinity for N2O to maximise energy yield from dwindling NOx–, making them resistant to scavenging by Type II [242, 244, 245]. Finally, O2 concentrations also appear to shape the abundance and distribution of Type I vs Type II nosZ as O2 sensitivity may inhibit N2O reduction. However, recent research indicates some Type II appear to recover more quickly from O2 exposure and may operate under microaerophilic conditions [240], presenting a unique niche for nosZ. Indeed, recent studies have found evidence of association of organisms carrying Type II nosZ with incomplete denitrifiers within agricultural soils and marine sediments [245, 248]. A similar association has also been proposed in Chapter 4 between the incomplete denitrifier SUP05 and the N2O-reducer Marinimicrobia SHBH1141 [249].
Much of the research on Type I and Type II nosZ abundance and distribution has been carried out in agricultural soils and wastewater treatment with the aim of mitigating N2O release; however, comprehensive research in marine systems has been lacking.

To date, identification of N2O-reducing organisms within marine environments has been limited to complete denitrifiers: Sulfurimonas gotlandica, isolated from the Baltic Sea [37] and SUP05 group member UThioglobus perditus (U to indicate uncultured), a metagenome assembled genome from the Peruvian upwelling [36]; and non-denitrifying N2O-reducers: Marinimicrobia SHBH1141, a single cell amplified genome from Saanich Inlet, and Marinimicrobia ZA3312c, a metagenomic population genome bin from Saanich Inlet and Gulf of Maine [249]. In order to better phylogenetically constrain nosZ genes within OMZs and the Global Ocean and survey the relative abundance and distribution of different nosZ clades, I utilize single cell genomes from Saanich Inlet, a seasonally anoxic fjord and model OMZ system on the coast of British Columbia. I define 13 NosZ clades encompassing both Type I and Type II and phylogenetically anchor an additional six nosZ genes from OMZs. I further chart the distribution of these clades in Global Ocean metagenomes and explore metatranscriptomes in the OMZs of Peru, Eastern Tropical South Pacific and Saanich Inlet (metaproteomes only for Saanich Inlet). I explore the seasonal dynamics of nosZ clades in Saanich Inlet over time and with accompanying rate measurements of the nitrogen loss processes denitrification and anammox.
Finally, I explore abundance and expression of nosZ clades along gradients of O2, NO3– and hydrogen sulfide (H2S), exploring possible niches for respective clades.

5.2 Results

5.2.1 Inventory of single-cell amplified genomes

In order to better identify the taxonomic origins of nosZ genes within the Global Ocean, samples for single cell amplified genomes (SAGs) [105] from Saanich Inlet were collected and sequenced, and nosZ genes were identified. Samples for SAGs were collected in August 2011 from station S3 in Saanich Inlet (48°35.500′ N, 123°30.300′ W) at three depths to capture different chemical conditions: dysoxic (20-90 µM O2) at 100 m, suboxic (2-20 µM O2) at 120 m and sulfidic at 185 m. From the collected samples, three 96 well plates were sorted for each depth, resulting in 864 sorted wells. Of those wells, a total of 645 had a small subunit ribosomal RNA (SSU rRNA) gene that could be amplified and assembled as described in Stepanauskas and Sieracki, 2007 and Swan et al., 2013 [77, 105], resulting in SSU rRNA genes from a range of Bacteria and a few Archaea. From these, a total of 371 SAGs were chosen for sequencing based on taxonomy and amplification efficiency (Figure 5.1). Annotation of sequenced SAGs by MetaPathways [182] revealed 51 nosZ gene sequences in six different taxonomic groups (21 Arcobacteraceae, 12 Bacteroidales VC21, three Ectothiorhodospirales, two Marinimicrobia SHBH1141, two SAR324 and 10 SUP05 1a; see Figure D.1 for SUP05 phylogenetic tree).
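
The water column classes used to choose these sampling depths can be written as a simple threshold rule. The Python sketch below is illustrative only. The dysoxic and suboxic ranges are those given in this section, the oxic (>90 µM O2) and anoxic (<2 µM O2) limits are taken from the caption of Figure 4.4, and the handling of values that fall exactly on a boundary, as well as treating any detectable H2S as sulfidic, are assumptions. Note that the Figure 4.4 caption lists suboxic as 1-20 µM O2; the 2 µM boundary used here follows this section.

```python
def classify_water(o2_um, h2s_um=0.0):
    """Return the redox class of a sample.

    o2_um:  dissolved O2 concentration in micromolar
    h2s_um: hydrogen sulfide concentration in micromolar
    """
    if h2s_um > 0:
        # Assumption: any detectable sulfide marks the sample as sulfidic.
        return "sulfidic"
    if o2_um > 90:
        return "oxic"
    if o2_um >= 20:
        return "dysoxic"
    if o2_um >= 2:
        return "suboxic"
    return "anoxic"
```

Applied to a depth profile, a rule of this kind assigns each sample to one class so that gene abundance or expression can be averaged per class.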
On average, SAGs containing nosZ were 62% complete with 2.98% contamination (see Table D.1 for CheckM statistics on SAG completion and contamination [230]).

[Figure 5.1 image: number of SAGs collected, sequenced and containing nosZ per taxon (Archaea and Bacteria, including Bacteroidetes, Marinimicrobia, Alpha-, Delta-, Epsilon- and Gammaproteobacteria and the SUP05 group) at 100 m, 150 m and 185 m. nosZ/total labels: 12/18, 2/4, 2/18, 21/52, 3/12, 10/48.]

Figure 5.1: Saanich Inlet SAG inventory and taxonomy. Taxonomy of Saanich Inlet SAGs collected (grey) and sequenced (coloured) from 100 m (green), 150 m (teal) and 185 m (purple) on August 10th 2011. Red star indicates SAGs containing the nosZ gene, and the number of SAGs with nosZ out of the total number of SAGs for that taxon is shown on the right (nosZ/total). Only SAGs with an amplified and assembled small subunit ribosomal RNA gene are shown.

5.2.2 Clustering & genomic neighbourhood analysis

In order to confirm that the SAG nosZ sequences were consistent within a given taxonomy, the SAG nosZ nucleotide sequences as well as nosZ previously identified as Marinimicrobia ZA3312c [249] were clustered at 95% identity. Sequences from the same taxonomic group clustered together with the exception of the Arcobacteraceae, which had two additional singletons (which clustered together at 90% identity).
Within each taxonomy, the genomic region surrounding nosZ (as compared among SAG contigs) was consistent and differed across taxonomic groups (Figure 5.2). The Arcobacteraceae, Ectothiorhodospirales and SUP05 1a all carried additional genes in the denitrification pathway on the same contig, often interspersed within the nosZ gene cluster. Notably, nosR was found in the Type I nosZ NGC but not the Type II. However, many genes were annotated as hypothetical, and better annotation may reveal more consistencies or differences across NGCs of different clades.

5.2.3 NosZ phylogeny and abundance

To phylogenetically place the nosZ genes from Saanich Inlet SAGs within the context of isolated N2O-reducing organisms, NosZ protein sequences from Sanford et al. 2012, Saanich Inlet SAGs and Marinimicrobia ZA3312c were clustered at 95% identity and a maximum-likelihood tree was constructed (Figure 5.3). The resulting tree retained the previously observed topology of Type I and Type II NosZ. Of the Saanich Inlet SAGs, only SUP05 1a NosZ clustered in the Type I portion of the tree. Thirteen clades were resolved as indicated by clustering patterns. Saanich Inlet SAG NosZ sequences clustered into five pre-existing clades: Bacteroidales VC21 and Marinimicrobia ZA3312c NosZ sequences clustered closely together in Clade 5, consisting primarily of other members of the Fibrobacteres-Chlorobi-Bacteroidetes superphylum (FCB); Marinimicrobia SHBH1141 NosZ sequences clustered in Clade 6, also with members of the FCB and Aquificae; SAR324 and Ectothiorhodospirales NosZ sequences clustered in Clade 9 with several Proteobacterial lineages, as did Arcobacteraceae NosZ in Clade 10. The SUP05 1a NosZ sequence clustered with Alphaproteobacteria in Clade 13.
Figure 5.2: Genome neighbourhood analysis for nosZ SAGs. Genomic neighbourhood for Saanich Inlet SAG contigs containing the nosZ gene (red), other nos gene cluster genes (green) and other genes in the denitrification pathway (blue).
Figure 5.3: NosZ phylogenetic tree with global abundance and expression. Phylogenetic tree of nosZ gene for cultured isolates [59] and SAGs from Saanich Inlet with clade labels (far right). Abundance (by RPKM) of clades (or specific SAG sequences within a clade where connected by a thick black line) in globally sourced metagenomes (grey) and metatranscriptomes (coloured) for the indicated chemical conditions: dysoxic (20-90 µM, green), suboxic (2-20 µM, teal), anoxic (0-2 µM, blue) and sulfidic (purple), shown to the right. Bubble size represents the RPKM averaged over the number of samples from each chemical condition. Sequences mapping not to leaves but to internal nodes made up 2.2% and 0.6% of metagenomic and metatranscriptomic sequences, respectively.

With the inclusion of OMZ-origin NosZ sequences in the phylogenetic tree with isolated organisms, I explored the abundance and expression of nosZ clades in Saanich Inlet and the Global Ocean under different O2 regimes: oxic (>90 µM), dysoxic (20-90 µM), suboxic (2-20 µM), anoxic (0-2 µM) and sulfidic. Metagenomic datasets were searched by protein sequence similarity to NosZ SAG sequences using LAST+ [185]. Sequences were identified in Saanich Inlet [114], other OMZs (Peru [199] and Eastern Tropical South Pacific (ETSP) [134]), the North Eastern Subarctic Pacific (NESAP), the Eastern South Atlantic (Knorr cruise), the TARA Global Oceans survey [233] and a collection of >200 marine metagenomes sourced globally (see Table C.1 for list).
Identified NosZ sequences were then mapped to the phylogenetic tree using MLTreeMap [250], and calculated RPKM values (or relative abundance for 454-sequenced datasets) were summed for hits to individual SAG sequences or clusters.

The most abundant nosZ clades in the surveyed metagenomes were Clade 6, containing Marinimicrobia SHBH1141; Clade 9, containing the Ectothiorhodospirales and SAR324; and Clade 5, containing Bacteroidales VC21 and Marinimicrobia ZA3312c; followed by Clade 13, containing SUP05 1a (Figure 5.3, Table D.2). Within other OMZ environments, particularly the ETSP [134] and the Peruvian upwelling [199], the distribution of nosZ clades in the metagenomes was similar to Saanich Inlet. While the most highly abundant clades in Saanich Inlet correspond to the collected SAGs, non-OMZ environments, represented in the TARA, Knorr and Global datasets, showed broader distribution among the different clades (Figure 5.4).

Where available, metatranscriptomic datasets (de novo assemblies for Saanich Inlet, NESAP and Knorr, and 454-sequenced reads for Peru) were also searched for nosZ gene expression by protein sequence homology to Saanich Inlet NosZ SAG sequences using LAST+ [185]. Note that low-abundance sequences in the transcriptome may not assemble and as such would not be detected with this method; unassembled reads may still be present but not detected here. The most abundantly expressed clades in the metatranscriptomes were similarly dominated by Type II NosZ sequences and were similar to the metagenomes, although in a slightly different order (Table D.3). Clade 6 Marinimicrobia SHBH1141 showed the highest levels of expression, followed by Clade 5 containing Marinimicrobia ZA3312c and Bacteroidales VC21.
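The per-clade RPKM summation and the O2 regime binning used in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline actually used: RPKM follows the standard definition (reads per kilobase of gene per million reads), the handling of regime boundaries and the rule that any detectable H2S makes a sample sulfidic are assumptions, and all function names and input values are hypothetical. Relative abundance for 454-sequenced datasets is not covered.

```python
from collections import defaultdict


def rpkm(read_count, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million reads in the library."""
    return read_count * 1e9 / (gene_length_bp * total_reads)


def oxygen_regime(o2_um, h2s_um=0.0):
    """Bin a sample by the regimes in the text (concentrations in µM).

    Assumptions: any detectable H2S is treated as sulfidic, and values on a
    shared boundary (20 or 2 µM) fall into the more oxygenated bin.
    """
    if h2s_um > 0:
        return "sulfidic"
    if o2_um > 90:
        return "oxic"
    if o2_um >= 20:
        return "dysoxic"
    if o2_um >= 2:
        return "suboxic"
    return "anoxic"


def clade_rpkm(hits, total_reads):
    """Sum RPKM over all hits assigned to the same clade within one sample.

    hits: iterable of (clade, read_count, gene_length_bp) tuples.
    """
    totals = defaultdict(float)
    for clade, read_count, gene_length_bp in hits:
        totals[clade] += rpkm(read_count, gene_length_bp, total_reads)
    return dict(totals)


def mean_rpkm_by_regime(samples):
    """Average each clade's RPKM over the number of samples in each regime.

    samples: iterable of (regime, {clade: rpkm}) pairs. A clade absent from a
    sample contributes zero to that regime's average.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for regime, totals in samples:
        counts[regime] += 1
        for clade, value in totals.items():
            sums[regime][clade] += value
    return {
        regime: {clade: total / counts[regime] for clade, total in clades.items()}
        for regime, clades in sums.items()
    }


# Hypothetical example: two sulfidic samples with 1,000,000 reads each.
sample_a = clade_rpkm(
    [("Clade 6", 30, 2000), ("Clade 6", 10, 1000), ("Clade 5", 5, 1000)],
    total_reads=1_000_000,
)
sample_b = clade_rpkm([("Clade 6", 10, 2000)], total_reads=1_000_000)

averaged = mean_rpkm_by_regime(
    [
        (oxygen_regime(0.0, h2s_um=12.0), sample_a),
        (oxygen_regime(0.0, h2s_um=3.0), sample_b),
    ]
)
print(averaged)
```

Averaging over the number of samples in each regime, rather than summing, keeps regimes with many samples from appearing more abundant simply because they were sampled more often.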
Further differences in nosZ gene expression are explored later in the text.

Figure 5.4: Abundance and expression of nosZ in global systems. Abundance (in RPKM) of nosZ clades in various datasets globally, for metagenomes (MetaG) and metatranscriptomes (MetaT) (where available, indicated by a vertical line with MetaT at the base), normalized to the total number of samples containing nosZ in a given dataset. Saanich Inlet abundance is shown decreased by a factor of 100 for the purpose of visualization. Internal Node indicates nosZ RPKM mapped to internal nodes of the tree, rather than specific clades. Global refers to a collection of >300 globally sourced metagenomes.

5.2.4 nosZ global distribution

The abundance and distribution of nosZ clades in other OMZs and in global ocean metagenomes showed substantial diversity of nosZ clades, with biogeographic patterns emerging for some specific regions such as the Eastern South Atlantic (Knorr) (Figure D.4) as well as regions within the TARA Oceans global cruise track (Figure D.5). Notably, Clade 2 was more abundant in the Knorr samples and also apparent in the deep chlorophyll maximum (DCM) and mesopelagic in the TARA samples. Knorr deep water samples (>4000 m) showed an abundance of sequences that mapped to internal nodes; further placement of these sequences in the NosZ phylogenetic tree may resolve as yet unidentified clades.
Points where the TARA cruise track passes through OMZs (as indicated by a coloured dot at the base of the stacked bar) generally showed a shift in clade structure within the DCM and mesopelagic samples, notably the inclusion of Clades 9 and 8.

Within other OMZs such as ETSP and Peru, where multi-omic samples were available, there was a similar clade distribution to Saanich Inlet (Figures 5.5, 5.6), with an overall dominance of Clade 6 in the metagenome. Indeed, Clade 6 appears to be primarily constrained to sulfidic systems and does not appear in TARA or Knorr samples. This is consistent with the distribution of the Marinimicrobia SHBH1141 clade in Chapter 4. Interestingly, there was a notable inconsistency in the metatranscriptome for Peru at 20 m, where Clades 11 and 12 were more highly expressed. Both Clades 11 and 12, found within the Type I portion of the NosZ tree, likely belong to complete denitrifiers. The chemistry of the water column at 20 m indicated a depletion of both NO3- and NO2- [199] consistent with denitrification, pointing to a possible niche for complete denitrifiers within OMZs as NOx- becomes scarce.

Figure 5.5: Peruvian and ETSP nosZ clade distribution. Distribution of nosZ clades in metagenome and/or metatranscriptome over depth in the Peruvian upwelling (PERU) (12°21.88′S, 77°0.00′W, [199]) and Eastern Tropical South Pacific (ETSP) (20°07′S, 70°23′W, [18]). Abundance in RPKM of nosZ clades in metagenome and metatranscriptome (Peru only) at indicated depths as found in previous studies. Chemical condition of each sample is indicated by the coloured dot at the base of each stacked bar: oxic (>90 µM, yellow), dysoxic (20-90 µM, green), suboxic (2-20 µM, teal) and anoxic (0-2 µM, blue).

5.2.5 NosZ time-resolved multi-omic dynamics in Saanich Inlet

To further identify specific niches for nosZ clades, I explored clade dynamics in the Saanich Inlet multi-omics dataset outlined in Chapter 2 [114, 115]. By charting multiple levels of information flow at the DNA, RNA and protein levels, patterns of expression along gradients of O2, NO3- and H2S were observed. In general, nosZ abundance increased with depth, generally showing peak abundance at 200 m but also at 120 m and 135 m following renewal in cruise SI075, consistent with shoaling of 200 m waters in the renewal process (Figures 5.6, D.2) [8]. Clade abundances were consistent with the global analysis in the most abundant clades (6, 5, 13, 9 and 8), with Clade 2 also showing intermittent low levels of abundance. Generally, Clades 9 and 8 were more abundant in 100 m samples and Clade 2 more abundant at 100 and 120 m and during renewal at 135 m and 150 m in samples SI072, SI073 and SI074.

Saanich Inlet metatranscriptomes showed lower diversity compared to the metagenomes (Figure 5.6). The most abundant clades in the metatranscriptome included 6, 5 and 13 (Figure D.2). Total nosZ expression was highest during renewal in August 2010, cruise SI048, from 120 m down, peaking at 150 m. Expression from cruise SI048 varied among clades, with Clades 6 and 13 dominating at 120 and 135 m, Clade 6 dominating at 150 m and Clade 5 dominating at 200 m.
Interestingly, different taxa within some clades showed differential expression both over time and in the water column. Within Clade 5, Bacteroidales VC21 dominated expression in sulfidic waters, while Marinimicrobia ZA3312c was more often expressed in anoxic and suboxic waters (Figure 5.7). Clade 9 Ectothiorhodospirales was more highly expressed in sulfidic waters and following renewal at 135 m in cruise SI075, possibly due to shoaling of sulfidic basin waters, while SAR324 was expressed predominantly in dysoxic waters. Other members of Clade 9 also showed expression in predominantly sulfidic waters for cruises SI047, SI048 and SI054, times when the water column was highly stratified and sulfide accumulated at shallower depths than usual. Clade 13 was primarily dominated by SUP05 1a expression in anoxic waters.
Figure 5.6: Saanich Inlet time series chemical profiles and nosZ multi-omic dynamics. Saanich Inlet water column chemistry for oxygen (O2), nitrate (NO3-), nitrous oxide (N2O) and hydrogen sulfide (H2S) over seven years. Abundance of nosZ clades in the metagenome, metatranscriptome and metaproteome for multiple time points (month and year on top x-axis, cruise ID on bottom x-axis) and depths (y-axis) in Saanich Inlet. Chemical condition of each sample is indicated by the coloured dot at the base of each stacked bar: oxic (>90 µM, yellow), dysoxic (20-90 µM, green), suboxic (2-20 µM, teal), anoxic (0-2 µM, blue) and sulfidic (purple). Absence of a coloured bar and chemistry dot indicates that no sample was available for analysis. Outline of a bar with the chemistry dot indicates that no nosZ was detected. (The chemical profiles portion of this figure was previously published in Capelle et al. 2016, used with permission.)
Figure 5.7: Metatranscriptome expression dynamics of nosZ subclades in Saanich Inlet. Metatranscriptome RPKM for indicated nosZ subclades in Saanich Inlet over multiple time points and depths. No expression was detected in 10 m samples, which are thus not included in this figure.

Saanich Inlet metaproteome NosZ was nearly completely dominated by Clade 6, affiliated with Marinimicrobia SHBH1141. NosZ protein was most abundant in cruise SI047, when the water column had not been renewed in over two years and was highly stratified. Interestingly, nosZ expression in the metatranscriptome was highest the following month in cruise SI048, following the influx of renewal waters (as indicated by a drop in H2S concentration in basin waters; see Figure 5.8). Additionally, SI048 200 m showed protein expression from Clade 5 in relatively high abundance, consistent with Clade 5 expression in the metatranscriptome for the same time point and depth. In general, expression of NosZ in the proteome did not correspond to potential rates or concentrations of N2O in the water column (Figure 5.8). However, abundance of NosZ protein would likely result in low N2O concentrations, confounding the expected correlation between N2O and NosZ expression.

To further investigate the activity of NosZ and nitrogen-loss pathways including denitrification and anammox in Saanich Inlet, I carried out process rate measurements with 15NO3- and 15NH4+, respectively. Overall, anammox appeared to be the dominant nitrogen-loss process, with high potential rates of denitrification (N2 production) measured only during renewal in August 2010 (SI048) (Figure 5.8). While high potential rates of denitrification coincided with maximal expression of nosZ in the metatranscriptome at cruise SI048, they did not coincide with peak NosZ protein expression.
In fact, peak protein expression of NosZ was at 120 m at SI053 (unfortunately, no metatranscriptome was available for that time point). NosZ protein was detected in medial amounts for all samples where denitrification was measured. Interestingly, potential N2O production was observed under anoxic, sulfidic conditions with no detected NO3- or NO2- (SI047 200 m; SI048 135 m, 150 m, 200 m), though at SI048 the observation of N2O production did correspond to high rates of denitrification.

Figure 5.8: Saanich Inlet denitrification and anammox rates. Potential rates for denitrification (orange) and anammox (blue), taken as production of 30N2 from addition of 15NO3- and 29N2 from addition of 15NH4+, respectively. Relative magnitude of N2O production is shown by a purple circle; an empty circle indicates no N2O production measured and no circle indicates no measurement taken. Horizontal black line indicates depths at which rate measurements were taken. Absence of a coloured bar indicates no rate detected. Panels: SI047 (July 2010), SI048 (August 2010), SI053 (January 2011) and SI054 (February 2011).

5.2.6 nosZ global niches

Mapping NosZ clades and abundance in available metagenomes and metatranscriptomes onto chemical parameters of O2, H2S and NO3- revealed patterns of distribution and expression along environmental gradients. Within the metagenome, Clades 2, 4, 5, 7, 8, 9, 12 and 13 showed a distribution that included higher O2 concentrations and a range of NO3- concentrations. Clades 3, 6, 10 and 11 were present at O2 concentrations much closer to zero, though a range of NO3- was still apparent. All clades were apparent under sulfidic conditions, though some appeared in higher abundance in anoxic waters. Expression was not observed to a large extent under higher O2 conditions, and nosZ was most abundantly expressed under sulfidic conditions (with zero NO3-) from Clades 5, 6 and 13.
Some expression was seen at low to 0 µM O2 concentrations and higher NO3- conditions. Overall, Clade 6 appeared to be most dominantly expressed in the metatranscriptome, primarily under sulfidic conditions.

Figure 5.9: nosZ clade distribution and expression along chemical gradients. Distribution and expression of nosZ clades along oxygen (O2), hydrogen sulfide (H2S) and nitrate (NO3-) gradients from all datasets, where size of symbol corresponds to RPKM in metagenomes (X) and metatranscriptomes (coloured symbols); Clade 5: Bacteroidetes, Marinimicrobia ZA3312c; Clade 6: Marinimicrobia SHBH1141; Clade 9: Ectothiorhodospirales, SAR324; Clade 10: Arcobacter; Clade 13: SUP05 1a.

5.2.7 Completing the tree with additional clades

Several NosZ sequences could not be accurately mapped to specific taxa and were consequently mapped to internal nodes by MLTreeMap (Figure 5.10) [250]. The tree was rebuilt including these sequences to determine whether they were new nodes or variants of existing leaves in the tree. After removing sequences shorter than 30 amino acids, 13 new sequences were added to the tree, resulting in two new clades and the re-assortment of another.

A new clade was formed from three sequences branching deeply between the halophilic Archaea making up Clade 1 and the remainder of the Type II portion of the tree. These sequences were assigned a taxonomy of Bacteria using the TreeSAPP algorithm to determine likely taxonomy based on the surrounding sequences in the tree [251].
These sequences were from geographically different locations: the Eastern Tropical North Pacific OMZ and the Arabian upwelling OMZ (branching together), and a third from a Juan de Fuca hydrothermal vent.
Figure 5.10: NosZ tree with additional environmental sequences. Phylogenetic tree of NosZ protein sequences from Figure 5.3 including metagenomic sequences that were previously mapped to internal nodes. Clades that were unchanged and do not harbour SAG or metagenomic sequences were collapsed. Blue sequence names indicate sequences from SAGs; green sequence names indicate new metagenomic sequences.

The second new clade is formed from a single sequence branching between Clades 8 and 9, taxonomically assigned to Epsilonproteobacteria and originating from Yellowstone thermal springs. Two additional sequences were also added to Clade 9: a Campylobacterales from a freshwater gorge in Frasassi, Italy and a Betaproteobacteria from groundwater in Colorado, USA. These taxonomic assignments are consistent with Clades 9 and 10 containing NosZ from various Proteobacteria.
Clade 9 is further rearranged with the addition of NosZ from UThioglobus perditus, a metagenome-assembled genome from the Peruvian upwelling [36], which is part of the SUP05 clade. Interestingly, the UT. perditus NosZ does not cluster with the Saanich Inlet SAG NosZ in the Type I portion of the tree. These additions to the tree further rearrange Clades 9 and 10, with SAR324, and an additional sequence assigned to SAR324 from Saanich Inlet, moving to Clade 10.

Within the Type I portion of the tree, a deep-branching node at the base of Clade 12 is formed by two sequences taxonomically assigned to Burkholderiales (Betaproteobacteria). One sequence is from the Eastern Tropical North Pacific (ETNP) and one from Saanich Inlet. Also within Clade 12, an additional sequence was added branching with Thioalkalivibrio and assigned to Gammaproteobacteria, from a freshwater sulfidic gorge in Frasassi, Italy. Within Clade 13, an additional, deeper-branching node was formed by two sequences taxonomically assigned to Rhodobacteraceae. One sequence is from the South Pacific Tropical Gyre (SPTG) and another from Saanich Inlet.

5.3 Discussion

Using SAGs, I was able to phylogenetically anchor environmental metagenomic NosZ sequences, identify previously unknown taxonomic groups with the potential metabolic capacity to reduce N2O and append additional clades to the NosZ reference tree. Furthermore, assessment of the abundance of identified NosZ clades confirmed the recently found global distribution of Marinimicrobia SHBH1141 nosZ and established it as the dominant nosZ in OMZ waters. Taxonomies not previously known to carry nosZ, such as Bacteroidales VC21, SAR324 and the Gammaproteobacterial Ectothiorhodospirales, were identified. Saanich Inlet SUP05 1a SAGs appear to be similar to the SUP05 group UT. perditus (uncultured), recently binned from Peruvian OMZ metagenomes, as both are seen to carry a nosZ gene cluster [36].
However, the different location of these two SUP05 NosZ sequences in the phylogenetic tree raises several questions about SUP05 metabolic flexibility and convergent evolution. While little is known about SAR324, SAGs from the deep ocean [77] and a metagenome-assembled genome from the Guaymas Basin [27] indicate capacity for C1 metabolism and sulfur oxidation, as well as genes for initial steps of denitrification. Further investigation of SAR324 Saanich Inlet SAGs should determine whether SAR324 is a complete denitrifier or a non-denitrifying N2O reducer and also identify linkages between proposed nitrogen, sulfur and carbon metabolisms in OMZs. No other genomic information is available for Bacteroidales VC21, a group often found in OMZs [2] and hydrothermal vents [252]; further metabolic analysis of the Saanich Inlet Bacteroidales VC21 SAGs may provide important information with respect to nitrogen cycling in the Global Ocean.

Samples from the deep South Atlantic (>4000 m) from the Knorr cruise and globally sourced metagenomes showed several sequences mapping to internal nodes on the NosZ tree, which were then incorporated into a new tree, adding a few new leaves and two new clades. The addition of new clades from relatively unexplored environments suggests that metagenomic analysis of environments such as deep sea sediments and hydrothermal vents may yield additional novel nosZ clades and unanticipated potential N2O sinks.

Some trends are apparent in the biogeography of individual nosZ clades. Clade 2, consisting predominantly of Firmicutes, appears to represent an open ocean clade given its consistent abundance within the TARA and Knorr datasets and relatively low abundance in OMZ datasets. The apparent restriction of Clade 6 to systems with sulfide (Saanich Inlet, Peruvian upwelling) or active cryptic sulfur cycling (ETSP) [78] confirms Clade 6 Marinimicrobia SHBH1141 to be the primary active N2O-reducer within OMZs.
The appearance of Clade 5 in higher abundances in more oxygenated waters, both in the open ocean and OMZ samples as well as in Saanich Inlet renewal samples, suggests a higher O2 tolerance for this clade. Clade 5 also shows interesting expression dynamics between clade members Bacteroidales VC21 and Marinimicrobia ZA3312c along the Saanich Inlet water column over time, including a higher abundance of Bacteroidales VC21 during renewal, which may indicate its presence in oxygenated renewal waters. Assessment of microbial community and nosZ clade structure within renewal waters would be necessary to confirm this hypothesis.

Interestingly, inconsistencies in expression dynamics within certain clades in Saanich Inlet suggest that organisms within a given clade may not behave similarly, and thus clade membership may not be the best predictor of expression patterns or global distribution. This points to regulatory factors, likely within the NGC, which may differ within clades. Further analysis of elements within the NGC, including phylogenetic trees, may help to identify regulatory elements and possibly identify horizontal gene transfer either within or between clades leading to variation in expression patterns. Ultimately, understanding the environmental conditions leading to expression of NosZ and N2O reduction will aid in understanding N2O consumption within the ocean.

The prevalence of nosZ within the surface waters of the TARA Global Oceans Survey as well as throughout the water column in Knorr metagenomes is puzzling, as N2O reduction is an anaerobic process. Presence in oxygenated waters may reflect particle association of nosZ-carrying organisms, as observed in the ETSP subsurface OMZ waters [15]. The formation of anoxic micro-niches within particles may serve to support anaerobic processes in otherwise oxygenated bulk water [253], providing an additional niche for N2O production and consumption.
Recently, N2O production was observed from Nitrococcus [54], a nitrite oxidizing bacterium isolated from OMZ waters, pointing to a potential source of N2O for N2O-reducers in dysoxic and oxygenated waters. Additional work on rate measurements of N2O production and consumption within open ocean surface waters and on particles may identify a previously unaccounted for N2O reducing capacity.

The dynamics of the NosZ proteins in Saanich Inlet, coupled to rate measurements, indicate a discrepancy between RNA and protein expression. Reasons behind this most likely lie in sampling methodology. Although care was taken to reduce wait times between sample collection and filtering for RNA and protein, the microbial community may respond to trace amounts of O2 introduced during collection, altering expression profiles. Lack of detected 30N2 production in many samples may also indicate reduction of NOx– to ammonium (NH4+) via dissimilatory nitrate reduction to ammonium (DNRA) rather than to N2 via denitrification. Future rate measurements including DNRA and N2O production, as well as analysis of the metatranscriptomes and metaproteomes for genes involved in other transformations in the nitrogen cycle and housekeeping genes/proteins from various groups of interest, may provide additional insight into nitrogen cycling processes at work in Saanich Inlet.

It is intriguing that Saanich Inlet showed a low abundance of Type I nosZ, which are generally associated with complete denitrifiers. Type I nosZ were seen more abundantly at specific depths in the Peru, ETSP and TARA data sets. Low occurrence of Type I nosZ in Saanich Inlet suggests that the beneficial mutualism between incomplete denitrifying type SUP05 and Marinimicrobia SHBH1141 proposed in Chapter 4 [249] may serve to outcompete individual Type I complete denitrifiers.
Particularly within Saanich Inlet, the fluctuations in NOx– and H2S may select for organisms with robust metabolisms such as SUP05 with multiple NO3– reductases and multiple H2S oxidases [33], as well as Marinimicrobia capable of storing polysulfide for reduction/oxidation under energy stress [249]. The proposed mutualism between these organisms would further their resilience beyond an individual complete denitrifier such as S. gotlandica, similar to Arcobacteraceae found in Saanich Inlet. Interestingly, the presence of nosZ in SUP05 1a also raises the possibility of a similar mutualism between incomplete and complete denitrifying SUP05 clades, which could also account for the success of SUP05 in Saanich Inlet. The extent to which these partnerships occur and the ramifications for N2O production and consumption in Global Ocean OMZs have yet to be determined.

The presence of different nosZ sequences in the Saanich Inlet SUP05 SAGs and in the Peruvian upwelling UT. perditus metagenome assembled genome (MAG) brings up questions about SUP05 metabolic flexibility and convergent evolution. It may also call into question the validity of MAGs to reflect natural populations. Metagenome assembled genomes have recently come into wide use in the field of metagenomics as a method for automated generation of population genomes. However, the methods of validating their completeness and contamination have yet to be thoroughly vetted. When considering reasons for different nosZ sequences in these two SUP05 genomes, the possibility that the nosZ in UT. perditus is a contaminant is raised; however, its location on a long contig (>10,000 base pairs) makes it unlikely that the UT. perditus nosZ is a contaminant. The presence of nosZ in 10 out of 48 SUP05 1a SAGs (several of which also reside on contigs >10,000 base pairs) is fairly conclusive evidence that the SUP05 Type I nosZ is contained in at least some of the SUP05 1a population in Saanich Inlet.
Interestingly, the Ectothiorhodospirales-type nosZ found on a single SUP05 1c SAG (contig size >45,000 base pairs), thought to be a contaminant, may reflect the ability of SUP05 to pick up a nosZ NGC from the environment when it is advantageous to do so. This supports an idea of convergent evolution of two different SUP05 groups, gaining the same function, i.e. N2O-reduction, from different sources to occupy a similar niche. Further investigation of the structure and similarity of the NGC from the Saanich Inlet SAGs for SUP05 1a, SUP05 1c and Ectothiorhodospirales, and UT. perditus would be very informative as to the location and method of potential gene transfer. Further analysis of Saanich Inlet population dynamics of SUP05 groups carrying respective NGCs, as well as the dynamics of the UT. perditus NGC within Peruvian upwelling metagenomes, may highlight conditions under which SUP05 N2O-reducing capacity would be advantageous. Such information would be valuable for understanding the contribution of this abundant OMZ group to N2O production/consumption.

Furthermore, the potential of both complete and incomplete denitrifying SUP05 clades and interactions with N2O-reducing Marinimicrobia SHBH1141 brings up questions regarding gene loss/acquisition and fitness of individual organisms vs fitness of metabolically coupled partners/communities. Incomplete denitrifying SUP05 has apparently made a living by creating the nosZ niche for other organisms by producing N2O. Further investigation of SUP05 genomes both from Saanich Inlet and other OMZs would help to address fundamental questions about fitness and gene loss within the context of microbial communities and distributed metabolisms.

5.4 Conclusions

Overall, marine systems show a significantly greater abundance of Type II nosZ over Type I, similar to surveyed terrestrial systems.
The abundance of Type II nosZ, associated with non-denitrifying N2O-reducers, indicates globally distributed niches for non-denitrifying N2O-reducing organisms along varying concentrations of O2, NO3– and H2S. The distribution of the nitrous oxide reductase gene nosZ in the Global Ocean, with broad taxonomic associations, points to unappreciated sources and sinks of the potent greenhouse gas N2O. While coastal areas and OMZs are stronger sources for N2O production [236], the expression of nosZ in OMZ metatranscriptomes and metaproteomes also points to N2O consumption, but the factors regulating the balance between production and consumption have yet to be constrained. The presence of nosZ within oxic surface waters suggests potential N2O sources in the surface ocean. However, the flux of N2O in many areas of the ocean has yet to be measured and thus the balance of N2O production vs. consumption has yet to be constrained. As Global Ocean O2 concentrations continue to decline, increasing the production of N2O, the capacity of marine waters to consume N2O is of critical importance for future climate models.

5.5 Methods

5.5.1 Single-cell amplified genome collection, sequencing and annotation

SAGs were collected from Saanich Inlet as described in Hawley et al. 2017 and in Chapter 4, and sequenced on Illumina HiSeq. Samples for SAGs were collected from Saanich Station S3 on August 10th, 2011 at 100 m, 150 m and 185 m, using 1 mL of sample water into 143 µL 48% Betaine, frozen on dry ice and stored at -80°C. Sequences were assembled in SPAdes 3.9 [254] and functional annotation carried out in the Metapathways pipeline [191].
SAGs containing the nitrous oxide reductase gene nosZ were identified and used in further analysis. SAGs containing nosZ were manually decontaminated based on visual analysis of k-mer frequency principal component analysis of contigs containing genes of interest in the denitrification pathway, ensuring all contigs containing genes of interest shared k-mer space with the majority of the SAG.

5.5.2 Multi-omics datasets

Saanich Inlet and NESAP metagenomes, metatranscriptomes and metaproteomes were generated as described in Chapter 2. Metagenomes and metatranscriptomes from other geographic areas were from other publications detailed in Table C.1.

5.5.3 Identification and clustering of nosZ sequences

Nitrous oxide reductase genes were identified in the SAGs based on functional annotation in Metapathways [94, 182] and validated by sequence homology searches to RefSeq. SAG NosZ amino acid sequences were then clustered at 95% identity and representative sequences used in sequence homology comparisons against environmental metagenomes and metatranscriptomes using the FAST algorithm [185] with a cutoff of 30% identity and alignment length of over 50 amino acids. For each metagenomic dataset (i.e. Saanich Inlet/NESAP, ETSP, PERU, TARA, KNORR) identified NosZ amino acid sequences were clustered at 85% identity using USEARCH cluster_fast with default settings and removing sequences <90 amino acids long. Representative sequences from each clustered dataset were then clustered together at 85% identity and mapped to the phylogenetic reference tree using MLTreemap [250].

5.5.4 Generation of NosZ phylogenetic tree

A reference tree of NosZ protein sequences was generated using sequences from Sanford et al. 2012 clustered at 95% (USEARCH cluster_fast) to collapse sequence redundancy, totalling 113 representative sequences from an original 136. RAxML was used to construct the tree and MLTreemap [250] was used to add SAG NosZ sequences.
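The hit-filtering thresholds stated in Section 5.5.3 (30% identity, alignment length over 50 amino acids, removal of sequences shorter than 90 amino acids) can be illustrated with a minimal sketch. This is not the pipeline used in this chapter: the `Hit` record, its field names and the helper functions are hypothetical, and treating the 30% cutoff as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from Section 5.5.3; everything else here is illustrative.
MIN_IDENTITY = 30.0   # percent identity cutoff (assumed inclusive)
MIN_ALIGN_LEN = 50    # alignment must be longer than 50 amino acids
MIN_SEQ_LEN = 90      # sequences shorter than 90 amino acids are removed


@dataclass
class Hit:
    """Hypothetical record for one homology-search hit."""
    query_id: str
    identity: float   # percent identity of the alignment
    align_len: int    # aligned length in amino acids
    seq_len: int      # full length of the matched sequence in amino acids


def passes_thresholds(hit: Hit) -> bool:
    """Return True if a hit satisfies all three cutoffs."""
    return (
        hit.identity >= MIN_IDENTITY
        and hit.align_len > MIN_ALIGN_LEN
        and hit.seq_len >= MIN_SEQ_LEN
    )


def filter_hits(hits: List[Hit]) -> List[Hit]:
    """Keep only hits that pass the identity and length cutoffs."""
    return [h for h in hits if passes_thresholds(h)]
```

In practice the surviving sequences would then be passed to clustering (USEARCH cluster_fast in this chapter), which is not reproduced here.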
Metagenomic and metatranscriptomic sequences were then mapped on using MLTreemap.

5.5.5 Gene, transcript and protein abundance mapping

For metagenomes and metatranscriptomes from Saanich Inlet, NESAP, KNORR and TARA, reads per kilobase per million mapped reads (RPKM) were calculated through the Metapathways pipeline [191] and summed for a given nosZ clade and/or SAG sequence, such as Clade 5 Marinimicrobia ZA3312c and Clade 5 Bacteroidales VC21. For all other datasets the number of genes found for a given dataset and cluster were summed for a given nosZ clade. For metaproteomes from Saanich Inlet, normalized spectral abundance factor (NSAF) values were calculated as described in Chapter 2.

5.5.6 Denitrification and anammox rate measurements

Rates of denitrification and anammox were measured as described in Holtappels et al. 2011 [255].

Chapter 6

Conclusions

Throughout this thesis, multi-omics approaches were used to chart microbial community structure, identify nitrogen (N) and sulphur (S)-based energy metabolisms and carbon (C) fixation pathways as well as to propose metabolic interactions along redox gradients in marine oxygen minimum zones (OMZs). Metaproteomics was used in the development of a conceptual model for C, N and S-based metabolic interactions between dominant taxa along the redox gradient in Saanich Inlet, illustrating energetic coupling of denitrification and anammox to sulfur oxidation and carbon fixation within these taxa. These findings formed the basis for a collaborative work in Louca et al. to build a steady state multi-omic mathematical model, confirming the conceptual model and identifying a previously unrecognized niche for nitrous oxide (N2O) reduction [132]. Genes for N2O reduction (nosZ) were identified within single cell amplified genomes (SAGs) from the dark matter phylum Marinimicrobia SHBH1141 clade collected from Saanich Inlet sulfidic basin waters, filling the niche of non-denitrifying N2O-reducers proposed by Louca et al.
Analysis of energy metabolism and biogeography of additional Marinimicrobia clades, defined by globally sourced SAGs, found several clades to play additional roles in C, N and S cycling along redox gradients in the Global Ocean. Finally, using SAGs from Saanich Inlet, I phylogenetically anchored multiple nosZ genes, placing the Marinimicrobia SHBH1141 as the dominant N2O-reducing group within anoxic and sulfidic OMZs globally, and further explored the distribution and abundance of different nosZ clades in the Global Ocean. As OMZs continue to expand and intensify due to climate change, the metabolic processes and interactions involved in N-loss and greenhouse gas production/consumption within O2-depleted systems become increasingly important to nutrient and energy flow and ecosystem services. The findings in this thesis add knowledge about key microbial taxa involved in N and S-cycling and carbon fixation, adding insights into metabolic interactions on the level of the microbial community in OMZs and the Global Ocean.

This final chapter discusses some of the advantages and limitations of multi-omic approaches and identifies improvements to expand these approaches both in the field and in silico. This chapter also looks at expanding findings from Saanich Inlet to other global OMZs with respect to the ecology of dominant organisms across multiple OMZs and further explores the themes in eco-thermodynamics in relation to microbial community structure and metabolic interactions.

6.1 Advantages and limitations with multi-omics approaches

Multi-omic approaches facilitate the culture independent study of microbial communities within natural and engineered environments and support the construction of hypotheses about metabolic networks and interactions. However, reconstructing microbial interactions from environmental samples is far from a perfect practice, with many assumptions and limitations.
While next generation sequencing has greatly reduced sampling bias compared to traditional methods involving preparation of Sanger sequencing libraries, assembly of contiguous genomic sequences for a given taxonomic group from millions of short reads generated by sequencing platforms such as Illumina may lead to construction of chimeras, confounding taxonomic assignment and metabolic reconstruction efforts. Once sequences are assembled, computational tools can predict open reading frames and assign function, but functional annotation inherently relies on sequence similarity to existing databases, essentially making such functional annotations hypotheses. While manually checking annotation against multiple hits and databases to address mis-annotation is possible for selected genes of interest, it is not feasible for entire metagenomes. Additional manual annotation techniques, such as building phylogenetic trees, as done for nosZ, and exploring protein structures, while highly informative, are possible for only a limited number of genes. Furthermore, functional annotation does not take into account enzyme activity and/or regulation, which cannot be uncovered without further laboratory experimentation. Despite these inherent limitations, multi-omic analysis remains the best approach currently available to assess the overall metabolic potential of a microbial community at the individual, population and community level.

6.2 Methodological and analytical developments

6.2.1 Field work and sampling

Oxygen measurements

Throughout this thesis, gradients of O2 and NO3– are seen to impact processes of H2S oxidation, nitrogen loss and production/consumption of N2O, particularly relating to O2 inhibition of NosZ and other nitrogen cycling processes. Accurate measurement of O2 concentrations directly within the sampling environment is critical to understand the impact of these gradients on metabolic processes.
Furthermore, several recent studies have uncovered evidence of cryptic oxygen cycling where nanomolar concentrations of oxygen (O2) are generated within the OMZ [256, 257], with implications for nitrogen loss and greenhouse gas production as many enzymes involved in denitrification and anammox are oxygen sensitive. The ability to detect nanomolar concentrations of O2 requires a switchable trace oxygen (STOX) sensor [258] and adaptations to sampling protocols. Currently, the Saanich Inlet time series is limited to the 2-3 µM detection range provided by an O2 optode and coupled Winkler titration method. Implementation of the STOX sensor could address some of the inconsistencies seen in the Saanich Inlet time series data sets, such as the presence of proteins involved in aerobic processes, i.e. nitrification, in anoxic waters seen in Chapters 3 and 5, or expression of different nosZ clades in the proteome under varying conditions. Incorporation of more precise O2 measurements into the Saanich Inlet time series could contribute to more meaningful interpretations of multi-omic datasets and more accurate modeling of N and S energy metabolism and dark carbon fixation extensible to other O2-depleted systems.

6.2.2 Analysis and methodologies

Linking taxonomy and function

One of the main challenges in multi-omics analysis is the ability to match taxonomy to function, identifying the microbial groups within a community responsible for specific metabolic processes. Within this thesis, I used sequence homology to previously isolated and sequenced organisms or manually curated metagenome assembled genomes (MAGs) and single cell amplified genomes (SAGs), for SUP05 and Marinimicrobia respectively. Advances in both sequencing technologies and computational analyses are making matching taxonomy and function more accurate and more accessible for high-throughput metagenomic applications.
On the technological front, SAGs and long-read technologies such as Oxford Nanopore produce sequence data from a single organism, ideally with enough sequence data to provide phylogenetic and taxonomic anchors (such as ribosomal genes and single copy marker genes [230]). Both SAG sequences and long reads can serve as nucleation points for further recruiting reads and/or contigs from short-read metagenomes (e.g. Illumina) and provide information on abundance of sequenced organisms within an environment. On the analytical front, there is a growing collection of binning algorithms and software to produce MAGs en masse from metagenomic sequence data [35]. While MAGs may provide insight into matching function to taxonomy, the phylogenetic level at which they operate is not yet well defined, in that whether a bin reflects a strain, species or genus seems to be variable. Phylogenetic anchoring of key functional genes, such as the N2O reductase nosZ and other genes involved in the nitrogen cycle, is key to building comprehensive understanding of microbial interactions as well as extending that understanding to global processes and models.

SAG-EXtrapolator

Mapping metagenomic contigs to long reads (including Sanger-sequenced full fosmids) and SAGs with high stringency may provide a way to exploit these phylogenetically anchored sequences to build population level genomic bins. SAGExtrapolator (SAGEX), put forth by previous Hallam Lab Masters student W. Evan Durno [259], automates an approach originally conceived by Dodsworth et al., 2013 to recruit metagenomic contigs to SAGs from the microbial dark matter phylum OP9 [181]. SAGEX recruits metagenomic contigs to a SAG at high sequence similarity and validates by tetranucleotide frequency [227], effectively extrapolating the genomic sequence data beyond the SAG. SAGEX offers a high-throughput supervised binning algorithm based on the approach used in Chapter 4 to bin metagenomic contigs to Marinimicrobia SAGs.
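The idea of validating a recruited contig by tetranucleotide frequency can be sketched as follows. This is a simplified illustration and not the SAGEX implementation: the function names, the use of a Pearson correlation between profiles, and the `min_r` cutoff of 0.9 are all assumptions made for the example, and reverse-complement strands are not merged as they often are in practice.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Return the tetranucleotide frequency profile of a DNA sequence.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def contig_matches_sag(sag_seq, contig_seq, min_r=0.9):
    """Hypothetical check: does the contig's profile correlate with the SAG's?"""
    return pearson(tetra_freq(sag_seq), tetra_freq(contig_seq)) >= min_r
```

A contig recruited by sequence similarity would be retained only if `contig_matches_sag` returned True; a real workflow would choose the cutoff empirically rather than fix it in advance.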
SAGEX-generated population genome bins have an advantage over purely algorithm-generated MAGs in that SAGEX bases binning on sequence data directly from an individual organism as opposed to only assembled contigs within the metagenome.

Including abundance information such as RPKM for recruited contigs can additionally provide valuable information about the abundance of various microbial populations. Used across geographic areas, SAGEX can facilitate comparative genomics for various taxa, addressing questions of endemism vs cosmopolitanism and correlation of specific groups with environmental conditions such as O2 concentrations, providing a more complete picture of metabolic processes and coupling in O2-depleted waters.

6.3 Expanding the Saanich Inlet model to Global OMZs

Saanich Inlet serves as a model OMZ, but the extent to which the information uncovered in this model informs what is known about other OMZ systems, both coastal and open ocean, in many cases remains to be explored and validated. Ongoing research both in Saanich Inlet and other OMZs continually adds new knowledge and insights to microbial processes operating under O2-depletion. Integration of this knowledge, such as using genomic information for SAGs in Saanich Inlet to phylogenetically anchor functional genes in other OMZs, can address large scale ecological questions.

6.3.1 Questions about ecology and global implications

Fundamental questions about microbial community organization arise when comparing Saanich Inlet and other OMZs on a global scale. For example, the question of endemism vs cosmopolitanism: which species or clades may be found only in Saanich Inlet and which ones are found in other OMZs? Are the endemic groups perhaps keystone members, carrying out functions essential and specific to that environment (e.g. trace metal redox reactions or detoxification)? Do cosmopolitan taxa shift in abundance in the same manner along environmental gradients in different systems?
While these questions hold across all ecosystems, the unique redox gradients found in OMZs (and O2-deficient waters) support investigation of a range of micro-niches within a larger environment. The assembly of microbial communities and associated metabolic interactions along these gradients within different geographic locations offers an opportunity to address specific questions of community organization and population dynamics within specific clades. Within taxa contributing to global nitrogen, sulphur and carbon cycles such as SUP05 and Marinimicrobia SHBH1141, understanding population dynamics is key for modeling global processes such as N2O production/consumption.

6.3.2 SUP05 sub-clade metabolism, population dynamics and biogeography

The SUP05 group of denitrifying, sulfur oxidizing Gammaproteobacteria is an obvious group for investigation of population dynamics both in Saanich Inlet and other OMZs. The SAGs collected in Saanich Inlet indicate the presence of two predominant clades, SUP05 1a and SUP05 1c (Figure D.1), but questions remain about their metabolic capacities and niche differentiation, specifically with respect to N2O reduction. The Saanich Inlet time series from Chapter 2 offers a unique opportunity to address time-resolved population dynamics and changes in metabolic capacity along defined redox gradients by mapping sequencing reads from the time series metagenomes to the SAGs within the respective clades. Expression from the respective clades could be further observed by mapping reads from the metatranscriptomes in the same way. Clade abundance or expression over time and along the redox gradient could possibly identify correlations of clades with specific environmental factors (O2, NO3–, NO2– etc.)
and aid in the definition of niches for the two clades.

A further extension of SUP05 population dynamics in Saanich Inlet would be a comparative genomics study across SUP05 genomes from multiple environments, including SUP05 species that have been recently sequenced from Effingham Inlet, Canada, Candidatus Thioglobus autotrophicus (cultured isolate) [34], and the Peruvian upwelling system, UThioglobus perditus (MAG) [36]. Together, population dynamics and comparative genomics could provide a greater understanding of the role of different clades of the abundant SUP05 group with respect to N2O production/consumption, sulfide oxidation and detoxification [75] and carbon fixation [36, 79] in different OMZs.

6.4 Themes in microbial interactions along eco-thermodynamic gradients

Using an eco-thermodynamic approach to microbial ecology, I explore how energy available to a microbial community flows through community members in the form of metabolite exchange, shaping community structure under a given energy regime. Eco-thermodynamic gradients, the availability of electron donors (e.g. reduced sulphur compounds) and acceptors (e.g. NO3–, NO2–, N2O) within the physical environment, such as those observed along redox gradients in Saanich Inlet, appear to govern N and S-based metabolic interactions. While direct evidence of metabolic coupling is difficult to obtain in uncultured systems, particularly so for dissimilatory processes, both conceptual and mathematical models support such interactions [79, 132, 204, 249] and suggest energy availability to be a strong organizing principle for microbial community structure and metabolic processes.

The exploration of metabolic interactions along eco-thermodynamic gradients within this thesis suggests different motifs for metabolic interactions exist under different energy regimes (Figure 6.1).
Within the energy replete, highly oxidized, surface ocean, a model of public goods is proposed where one species makes a product that is released into the environment and used by many other species. For example, cyanobacteria release vitamin B12 that is used by many other bacteria as an enzyme co-factor [91]. Within energy deficient, reduced, deep-sea sediments a model of discrete exchange is proposed where metabolites are passed directly between two species. For example, methane oxidizing Archaea pass electrons to sulfate reducing bacteria [260, 261]. Within moderately reduced environments, such as marine oxygen minimum zones and tidal sediments [262], a model of selective exchange is proposed where metabolites are shared among a limited number of taxonomic groups. For example, NO2– produced by the SUP05 group is taken up by Planctomycetes and Nitrospira for anammox and nitrification respectively [132]. Taken together, these observations point to a continuum of metabolite exchange from public goods through selective exchange to discrete exchange along an eco-thermodynamic gradient from oxic to reduced environments. I further construct a hypothesis where, as energy available to the community decreases (in the form of Gibbs free energy between electron donors and acceptors), metabolic couplings become increasingly discrete as the specificity of each interaction is under greater and greater selective pressure to optimize energetic gain from individual metabolite exchanges over welfare of the community as a whole.

[Figure 6.1 panel labels: Public-Goods (e.g. oxic environment); Selective Exchange; Discrete Exchange (e.g. reduced environment). Axes: Energy Availability; Tightness of metabolic coupling.]

Figure 6.1: Motifs for metabolic interactions.
Diagram showing various motifs for metabolic interactions along an energy gradient.

6.5 Closing

As technological and analytical tools have advanced to the point of providing sequence information with taxonomic association from a diverse range of environments, it is apparent that microbial communities are truly the engines that drive Earth’s biogeochemical cycles [38]. Within OMZs, the microbial community and associated biogeochemical cycles play key roles in nitrogen, sulfur and carbon cycling on a global scale. The redox gradients within OMZs offer an opportunity to chart how decreases in energy availability, in the form of high-energy electron acceptors, shape the microbial community and metabolic interactions carrying out these processes. Throughout this thesis, Saanich Inlet serves as a model OMZ to study the microbial communities along these redox gradients and provides a crucial framework for the development of multi-omics approaches to studying microbial communities over spatial, temporal and energetic gradients. Findings from Saanich Inlet and Global Ocean genomic surveys in this thesis reveal the impacts of energy availability on processes that are critical to climate change on our planet, including greenhouse gas production/consumption, loss of biologically available nitrogen, and dark carbon fixation. As O2 concentrations in the Global Ocean continue to decrease, causing OMZ expansion and intensification, this thesis provides an essential knowledge base of the dominant microbial players and processes, providing important information for global modeling and environmental monitoring efforts. Further, this thesis builds hypotheses about the influence of energy availability on microbial metabolic interactions, offering important insights into factors that may shape the nature of interactions within microbial communities along eco-thermodynamic gradients.
As more information about microbial communities and metabolism is revealed, we come to see a profound connectedness at the smallest levels of life, bringing home the necessity of community interactions in order to carry out global processes.

Bibliography

[1] P. K. Weyl. On the oxygen supply of the deep pacific ocean. Limnology and Oceanography, 10(2):215–219, 1965.
[2] J. J. Wright, K. M. Konwar, and S. J. Hallam. Microbial ecology of expanding oxygen minimum zones. Nat Rev Microbiol, 10(6):381–94, 2012.
[3] R. E. Keeling, A. Kortzinger, and N. Gruber. Ocean deoxygenation in a warming world. Ann Rev Mar Sci, 2:199–229, 2010.
[4] O. Ulloa, D. E. Canfield, E. F. DeLong, R. M. Letelier, and F. J. Stewart. Microbial oceanography of anoxic oxygen minimum zones. Proc Natl Acad Sci U S A, 109(40):15996–6003, 2012.
[5] R. J. Diaz and R. Rosenberg. Spreading dead zones and consequences for marine ecosystems. Science, 321(5891):926–9, 2008.
[6] A. Paulmier and D. Ruiz-Pino. Oxygen minimum zones (omzs) in the modern ocean. Progress in Oceanography, 80(3-4):113–128, 2009.
[7] Douglas G. Capone and David A. Hutchins. Microbial biogeochemistry of coastal upwelling regimes in a changing ocean. Nature Geoscience, 6, 2013.
[8] E. Zaikova, D. A. Walsh, C. P. Stilwell, et al. Microbial community dynamics in a seasonally anoxic fjord: Saanich inlet, british columbia. Environ Microbiol, 12(1):172–91, 2010.
[9] Osvaldo Ulloa, Jody J. Wright, Lucy Belmar, and Steven J. Hallam. Pelagic oxygen minimum zone microbial communities. pages 113–122, 2013.
[10] F. A. Whitney, H. J. Freeland, and M. Robert. Persistently declining oxygen levels in the interior waters of the eastern subarctic pacific. Progress in Oceanography, 75(2):179–199, 2007.
[11] M. C. Long, C. Deutsch, and T. Ito. Finding forced trends in oceanic oxygen. Global Biogeochemical Cycles, 30(2):381–397, 2016.
[12] P. Lam and M. M. Kuypers. Microbial nitrogen cycling processes in oxygen minimum zones. Ann Rev Mar Sci, 3:317–45, 2011.
[13] A. H. Devol and H.
E. Hartnett. Role of the oxygen-deficient zone in transfer of organic carbon to the deep ocean. Limnology and Oceanography, 46(7):1684–1690, 2001.
[14] D. Woebken, B. M. Fuchs, M. M. M. Kuypers, and R. Amann. Potential interactions of particle-associated anammox bacteria with bacterial and archaeal partners in the namibian upwelling system. Applied and Environmental Microbiology, 73(14):4648–4657, 2007.
[15] S. Ganesh, L. A. Bristow, M. Larsen, et al. Size-fraction partitioning of community gene transcription and nitrogen metabolism in a marine oxygen minimum zone. ISME J, 2015.
[16] S. Ganesh, D. J. Parris, E. F. DeLong, and F. J. Stewart. Metagenomic analysis of size-fractionated picoplankton in a marine oxygen minimum zone. ISME J, 8(1):187–211, 2014.
[17] H. Stevens and O. Ulloa. Bacterial diversity in the oxygen minimum zone of the eastern tropical south pacific. Environmental Microbiology, 10(5):1244–1259, 2008.
[18] Frank J. Stewart, Osvaldo Ulloa, and Edward F. DeLong. Microbial metatranscriptomics in a permanent marine oxygen minimum zone. Environmental Microbiology, pages no–no, 2011.
[19] C. Rinke, P. Schwientek, A. Sczyrba, et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature, 499(7459):431–7, 2013.
[20] R. M. Morris, C. D. Frazar, and C. A. Carlson. Basin-scale patterns in the abundance of sar11 subclades, marine actinobacteria (om1), members of the roseobacter clade and ocs116 in the south atlantic. Environ Microbiol, 14(5):1133–44, 2012.
[21] J. Viklund, T. J. Ettema, and S. G. Andersson. Independent genome reduction and phylogenetic reclassification of the oceanic sar11 clade. Mol Biol Evol, 29(2):599–615, 2012.
[22] S. J. Giovannoni, L. Bibbs, J. C. Cho, et al. Proteorhodopsin in the ubiquitous marine bacterium sar11. Nature, 438(7064):82–5, 2004.
[23] J. Sun, L. Steindler, J. C. Thrash, et al. One carbon metabolism in sar11 pelagic marine bacteria. PLoS One, 6(8):e23973, 2011.
[24] R. R. Malmstrom, R. P. Kiene, M. T.
Cottrell, and D. L. Kirchman. Contribution of sar11 bacteria to dissolved dimethylsulfoniopropionate and amino acid uptake in the north atlantic ocean. Appl Environ Microbiol, 70(7):4129–35, 2004.
[25] D. Tsementzi, J. Wu, S. Deutsch, et al. Sar11 bacteria linked to ocean anoxia and nitrogen loss. Nature, 536:179–183, 2016.
[26] K. T. Marshall and R. M. Morris. Isolation of an aerobic sulfur oxidizer from the sup05/arctic96bd-19 clade. ISME J, 7(2):452–5, 2013.
[27] C. S. Sheik, S. Jain, and G. J. Dick. Metabolic flexibility of enigmatic sar324 revealed through metagenomics and metatranscriptomics. Environ Microbiol, 16(1):304–17, 2014.
[28] S. Lücker, B. Nowka, T. Rattei, E. Spieck, and H. Daims. The genome of nitrospina gracilis illuminates the metabolism and evolution of the major marine nitrite oxidizer. Front Microbiol, 4:27, 2013.
[29] B. Kartal, M. M. Kuypers, G. Lavik, et al. Anammox bacteria disguised as denitrifiers: nitrate reduction to dinitrogen gas via nitrite and ammonium. Environ Microbiol, 9(3):635–42, 2007.
[30] C. A. Francis, K. J. Roberts, J. M. Beman, A. E. Santoro, and B. B. Oakley. Ubiquity and diversity of ammonia-oxidizing archaea in water columns and sediments of the ocean. Proc Natl Acad Sci U S A, 102(41):14683–8, 2005.
[31] E. Allers, J. J. Wright, K. M. Konwar, et al. Diversity and population structure of marine group a bacteria in the northeast subarctic pacific ocean. ISME J, 7(2):256–68, 2013.
[32] J. J. Wright, K. Mewis, N. W. Hanson, et al. Genomic properties of marine group a bacteria indicate a role in the marine sulfur cycle. ISME J, 8(2):455–68, 2014.
[33] D. A. Walsh, E. Zaikova, C. G. Howes, et al. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science, 326(5952):578–82, 2009.
[34] V. Shah, B. X. Chang, and R. M. Morris. Cultivation of a chemoautotroph from the sup05 clade of marine bacteria that produces nitrite and consumes ammonium. ISME J, 2016.
[35] R. Knight, A. Vrbanac, B. C. Taylor, et al.
Best practices for analysing microbiomes. Nature ReviewsMicrobiology, 16(7):410–422, 2018.[36] C. M. Callbeck, G. Lavik, T. G. Ferdelman, et al. Oxygen minimum zone cryptic sulfur cyclingsustained by offshore transport of key sulfur oxidizing bacteria. Nature Communications, 9(1), 2018.[37] M. Labrenz, J. Grote, K. Mammitzsch, et al. Sulfurimonas gotlandica sp. nov., a chemoautotrophicand psychrotolerant epsilonproteobacterium isolated from a pelagic redoxcline, and an emendeddescription of the genus sulfurimonas. Int J Syst Evol Microbiol, 63(Pt 11):4141–8, 2013.[38] P. G. Falkowski, T. Fenchel, and E. F. Delong. The microbial engines that drive earth’s biogeochemicalcycles. Science, 320(5879):1034–9, 2008.[39] J. P. Zehr and R. M. Kudela. Nitrogen cycle of the open ocean: from genes to ecosystems. Ann RevMar Sci, 3:197–225, 2011.[40] J. A. Sohm, E. A. Webb, and D. G. Capone. Emerging patterns of marine nitrogen fixation. Nat RevMicrobiol, 9(7):499–508, 2011.[41] H. Farnelid, M. Bentzon-Tilia, A. F. Andersson, et al. Active nitrogen-fixing heterotrophic bacteria atand below the chemocline of the central baltic sea. ISME J, 7(7):1413–23, 2013.[42] T. Grosskopf, W. Mohr, T. Baustian, et al. Doubling of marine dinitrogen-fixation rates based on directmeasurements. Nature, 488(7411):361–4, 2012.[43] C. B. Field, M. J. Behrenfeld, J. T. Randerson, and P. Falkowski. Primary production of the biosphere:Integrating terrestrial and oceanic components. Science, 281(5374):237–240, 1998.[44] C. Fernandez, L. Farı´as, and O. Ulloa. Nitrogen fixation in denitrified marine waters. PlosOne,6(6):e20539, 2011.[45] C. R. Loescher, T. Grosskopf, F. D. Desai, et al. Facets of diazotrophy in the oxygen minimum zonewaters off peru. ISME J, 8(11):2180–92, 2014.[46] S. Bonnet, J. Dekaezemacker, K. A. Turk-Kubo, et al. Aphotic n2 fixation in the eastern tropical southpacific ocean. PLOS ONE, 8(12):1–14, 12 2013.[47] E. Costa, J. Perez, and J. U. Kreft. 
Why is metabolic labour divided in nitrification? Trends Microbiol,14(5):213–9, 2006.[48] S. N. Merbt, D. A. Stahl, E. O. Casamayor, et al. Differential photoinhibition of bacterial and archaealammonia oxidation. FEMS Microbiol Lett, 327(1):41–6, 2012.[49] C. Wuchter, B. Abbas, M. J. Coolen, et al. Archaeal nitrification in the ocean. Proc Natl Acad Sci U S A,103(33):12317–22, 2006.[50] M. Konneke, A. E. Bernhard, J. R. de la Torre, et al. Isolation of an autotrophic ammonia-oxidizingmarine archaeon. Nature, 437(7058):543–6, 2005.[51] C. A. Schleper, R. V. Swanson, E. J. Mathur, and E. F. DeLong. Characterization of a dna polymerasefrom the uncultivated psychrophilic archaeon cenarchaeum symbiosum. J Bacteriol, 179(24):7803–7811,1997.[52] S. J. Hallam, T. J. Mincer, C. Schleper, et al. Pathways of carbon assimilation and ammonia oxidationsuggested by environmental genomic analyses of marine crenarchaeota. PLoS Biol, 4(4):e95, 2006.137[53] N. Vajrala, W. Martens-Habbena, L. A. Sayavedra-Soto, et al. Hydroxylamine as an intermediate inammonia oxidation by globally abundant marine archaea. Proc Natl Acad Sci U S A, 110(3):1006–1011,2013.[54] J. Fu¨ssel, S. Lu¨cker, P. Yilmaz, et al. Adaptability as the key to success for the ubiquitous marinenitrite oxidizer nitrococcus. Science Advances, 3(11), 2017.[55] S. Lucker, M. Wagner, F. Maixner, et al. A nitrospira metagenome illuminates the physiology andevolution of globally important nitrite-oxidizing bacteria. Proc Natl Acad Sci U S A, 107(30):13479–84,2010.[56] J. Fu¨ssel, P. Lam, G. Lavik, et al. Nitrite oxidation in the namibian oxygen minimum zone. ISME J,6(6):1200–9, 2012.[57] D. E. Canfield, A. N. Glazer, and P. G. Falkowski. The evolution and future of earth’s nitrogen cycle.Science, 330(6001):192–6, 2010.[58] J. Simon and M. G. Klotz. Diversity and evolution of bioenergetic systems involved in microbialnitrogen compound transformations. Biochim Biophys Acta, 1827(2):114–35, 2013.[59] R. A. Sanford, D. D. 
Wagner, Q. Wu, et al. Unexpected nondenitrifier nitrous oxide reductase genediversity and abundance in soils. Proc Natl Acad Sci U S A, 109(48):19709–14, 2012.[60] B. B. Ward, A. H. Devol, J. J. Rich, et al. Denitrification as the dominant nitrogen loss process in thearabian sea. Nature, 461(7260):78–81, 2009.[61] P. Lam, G. Lavik, M. M. Jensen, et al. Revising the nitrogen cycle in the peruvian oxygen minimumzone. Proceedings of the National Academy of Sciences, 106(12):4752–4757, 2009.[62] P. Lam, M. M. Jensen, G. Lavik, et al. Linking crenarchaeal and bacterial nitrification to anammox inthe black sea. Proc Natl Acad Sci U S A, 104(17):7104–9, 2007.[63] A. Jayakumar, G. D. OMullan, S. W. A. Naqvi, and B. B. Ward. Bacterial community compositionchanges associated with stages of denitrification in oxygen minimum zones. Microbial Ecology,52(2):350–626, 2009.[64] C. G. Bruckner, K. Mammitzsch, G. Jost, et al. Chemolithoautotrophic denitrification of epsilonpro-teobacteria in marine pelagic redox gradients. Environ Microbiol, 15(5):1505–13, 2013.[65] D. Woebken, P. Lam, M. M. M. Kuypers, et al. A microdiversity study of anammox bacteria reveals anovel candidatusscalindua phylotype in marine oxygen minimum zones. Environmental Microbiology,10(11):3106–3119, 2008.[66] M. Strous, E. Pelletier, S. Mangenot, et al. Deciphering the evolution and metabolism of an anammoxbacterium from a community genome. Nature, 440(7085):790–4, 2006.[67] M. S. Jetten, Lv Niftrik, M. Strous, et al. Biochemistry and molecular biology of anammox bacteria.Crit Rev Biochem Mol Biol, 44(2-3):65–84, 2009.[68] T. Kalvelage, M. M. Jensen, S. Contreras, et al. Oxygen sensitivity of anammox and coupled n-cycleprocesses in oxygen minimum zones. PLoS One, 6(12):e29299, 2011.[69] J. B. Kirkpatrick, C. A. Fuchsman, E. Yakushev, J. T. Staley, and J. W. Murray. Concurrent activity ofanammox and denitrifying bacteria in the black sea. Front Microbiol, 3:256, 2012.[70] B. Thamdrup and T. Dalsgaard. 
Production of n2 through anaerobic ammonium oxidation coupled tonitrate reduction in marine sediments. Applied and Environmental Microbiology, 68(3):1312–1318, 2002.138[71] M. A. Azhar, D. E. Canfield, K. Fennel, B. Thamdrup, and C. J. Bjerrum. A model-based insightinto the coupling of nitrogen and sulfur cycles in a coastal upwelling system. Journal of GeophysicalResearch: Biogeosciences, 119:264–285, 2014.[72] M. Voss and J. P. Montoya. Nitrogen cycle: Oceans apart. Nature, 461:49–50, 2009.[73] P. Lam, M. M. Jensen, A. Kock, et al. Origin and fate of the secondary nitrite maximum in the arabiansea. Biogeosciences, 8(6):1565–1577, 2011.[74] W. Ghosh and B. Dam. Biochemistry and molecular biology of lithotrophic sulfur oxidation bytaxonomically and ecologically diverse bacteria and archaea. FEMS Microbiol Rev, 33(6):999–1043,2009.[75] G. Lavik, T. Stuhrmann, V. Bruchert, et al. Detoxification of sulphidic african shelf waters by bloomingchemolithotrophs. Nature, 457(7229):581–4, 2009.[76] S. Glaubitz, K. Kiesslich, C. Meeske, M. Labrenz, and K. Jurgens. Sup05 dominates the gammapro-teobacterial sulfur oxidizer assemblages in pelagic redoxclines of the central baltic and black seas.Appl Environ Microbiol, 79(8):2767–76, 2013.[77] B. K. Swan, M. Martinez-Garcia, C. M. Preston, et al. Potential for chemolithoautotrophy amongubiquitous bacteria lineages in the dark ocean. Science, 333(6047):1296–300, 2011.[78] D. E. Canfield, F. J. Stewart, B. Thamdrup, et al. A cryptic sulfur cycle in oxygen-minimum-zonewaters off the chilean coast. Science, 330(6009):1375–8, 2010.[79] A K. Hawley, H.M. Brewer, A. D. Norbeck, L. Paa-Toli c, and S. J. Hallam. Metaproteomics revealsdifferential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes. ProcNatl Acad Sci U S A, 11(31):11395–11400, 2014.[80] C. B. Walker, J. R. de la Torre, M. G. Klotz, et al. 
Nitrosopumilus maritimus genome reveals uniquemechanisms for nitrification and autotrophy in globally distributed marine crenarchaea. Proc NatlAcad Sci U S A, 107(19):8818–8823, 2010.[81] Y. I. Sorokin, P. Y Sorokin, V. A. Avdeev, D. Y. Sorokin, and S. V. Ilchenkol. Biomass, production andactivity of bacteria in the black sea, with special reference to chemosynthesis and the sulfur cycle.Hydrobiologia, 308(1):61–76, 1995.[82] G. Jost, M. V. Zubkov, E. Yakushev, M. Labrenz, and K. Jrgens. High abundance and dark co2 fixationof chemolithoautotrophic prokaryotes in anoxic waters of the baltic sea. Limnology and Oceanography,53(1):14–22, 2008.[83] B. B. Ward, H. E. Glover, and F. Lipschultz. Chemoautotrophic activity and nitrification in the oxygenminimum zone off peru. Deep Sea Research, 36(7):1031–1051, 1989.[84] G. T. Taylor, M. Iabichella, T. Ho, et al. Chemoautotrophy in the redox transition zone of the cariacobasin: A significant midwater source of organic carbon production. Limnology and Oceanography,46(1):149–163, 2001.[85] S. Louca, M. P. Polz, F. Mazel, et al. Function and functional redundancy in microbial systems. NatureEcology & Evolution, pages 2397–334X, 2018.[86] E. F. DeLong. Microbial community genomics in the ocean. Nat Rev Microbiol, 3(6):459–69, 2005.[87] DeLongE. F., C. M. Preston, T. Mincer, et al. Community genomics amoung stratified microbialassemblages in the ocean’s interior. Science, 331, 2006.139[88] J. A. Fuhrman. Microbial community structure and its functional implications. Nature, 459(7244):193–9,2009.[89] S. L. Strom. Microbial ecology of ocean biogeochemistry: a community of perspective. Science,320(5879):1043–4045, 2008.[90] B. E. Morris, R. Henneberger, H. Huber, and C. Moissl-Eichinger. Microbial syntrophy: interactionfor the common good. FEMS Microbiol Rev, 37(3):384–406, 2013.[91] S. J. Giovannoni. Vitamins in the sea. Proc Natl Acad Sci U S A, 109(35):13888–9, 2012.[92] J. A. Fuhrman, J. A. Cram, and D. M. Needham. 
Marine microbial community dynamics and theirecological interpretation. Nat Rev Microbiol, 13(3):133–46, 2015.[93] M. T. Mee, J. J. Collins, G. M. Church, and H. H. Wang. Syntrophic exchange in synthetic microbialcommunities. Proc Natl Acad Sci U S A, 111(20):E2149–E2156, 2014.[94] N. W. Hanson, K. M. Konwar, A. K. Hawley, et al. Metabolic pathways for the whole community.BMC Genomics, 15(619), 2014.[95] S. J. Hallam and J. P. McCutcheon. Microbes don’t play solitaire: how cooperation trumps isolationin the microbial world. Environ Microbiol Rep, 7(1):26–8, 2015.[96] K. Anantharaman, C. T. Brown, L.A. Hug, et al. Thousands of microbial genomes shed light oninterconnected biogeochemical processes in an aquifer system. Nature Communications, 7:13219, 2016.[97] J. P. McCutcheon and N. A. Moran. Functional convergence in reduced genomes of bacterialsymbionts spanning 200 my of evolution. Genome Biol Evol, 2, 2010.[98] S. Scheller, H. Yu, G. L. Chadwick, S. E. McGlynn, and V. J. Orphan. Artificial electron acceptorsdecouple archaeal methane oxidation from sulfate reduction. Science, 351(6274):703–707, 2016.[99] J. J. Morris, Z. I. Johnson, M. J. Szul, M. Keller, and E. R. Zinser. Dependence of the cyanobacteriumprochlorococcus on hydrogen peroxide scavenging microbes for growth at the ocean’s surface. PLoSOne, 6(2):e16805, 2011.[100] S. J. Giovannoni, H. J. Tripp, S. Givan, et al. Genome streamlining in a cosmopolitan oceanic bacterium.Science, 309(5738):1242–5, 2005.[101] J. J. Morris, R. E. Lenski, and E. R. Zinserc. The black queen hypothesis: Evolution of dependenciesthrough adaptive gene loss. MBio, 3(2):e00036–12, 2012.[102] E. F. DeLong. Life on the thermodynamic edge. Science, 317:327–328, 2007.[103] M. K. Nobu, H. Tamaki, K. Kubota, and W. T. Liu. Metagenomic characterization of ’candidatusdefluviicoccus tetraformis strain tfo71’, a tetrad-forming organism, predominant in an anaerobic-aerobic membrane bioreactor with deteriorated biological phosphorus removal. 
Environ Microbiol,16(9):2739–51, 2014.[104] S. J. Giovannoni and K. L. Vergin. Seasonality in ocean microbial communities. Science, 335(6069):671–676, 2012.[105] R. Stepanauskas and M. E. Sieracki. Matching phylogeny and metabolism in the uncultured marinebacteria, one cell at a time. Proc Natl Acad Sci U S A, 104(21):9052–7, 2007.[106] A. S. Hahn, K. M. Konwar, S. Louca, N. W. Hanson, and S. J. Hallam. The information science ofmicrobial ecology. Curr Opin Microbiol, 31:209–16, 2016.140[107] R. H. Herlinveaux. Oceanography of saanich inlet in vancouver island, british columbia. Journal ofthe Fisheries Research Board of Canada, 19:1–37, 1962.[108] M. D. Lilley, J. A. Baross, and L. I. Gordon. Dissolved hydrogen and methane in saanich inlet, britishcolumbia. Deep Sea Research Part A. Oceanographic Research Papers, 29(1471):1484, 1982.[109] B. B. Ward and K. A. Kilpatrick. Relationship between substrate concentration and oxidation ofammonium and methane in a stratified water column. Continental Shelf Research, 10:1193–1208, 1990.[110] N. M. Carter. The oceanography of the fjords of southern british columbia. Fish. Res. Bd. Canada Prog.Rept. Pacific Coast Sta., 12:7–11, 1932.[111] N. M. Carter. Physiography and oceanography of some british columbia fjords. Proc. Fifth. Pacific Sci.Cong., 1:721, 1934.[112] J. J. Anderson and A. H. Devol. Deep water renewal in saanich inlet, an intermittently anoxic basin.Estuarine and Coastal and Marine Science, 1:1–10, 1973.[113] D. W. Capelle, A. K. Hawley, S. J. Hallam, and P. D. Tortell. A multi-year time-series of n2o dynamicsin a seasonally anoxic fjord: Saanich inlet, british columbia. Limnology and Oceanography, 63(2):524–539,2017.[114] A K. Hawley, M. Torres Beltra´n, M. P. Bhatia, et al. A compendium of water column multi-omicsequence information from a seasonally anoxic fjord saanich inlet. Scientific Data, submitted, 2017.[115] M. Torres-Beltra´n, A. K. Hawley, D. Capelle, et al. 
A compendium of water column chemistry fromthe seasonally anoxic fjord saanich inlet. Scientific Data, Submitted, 2017.[116] Osvaldo Ulloa and Silvio Pantoja. The oxygen minimum zone of the eastern south pacific. Deep SeaResearch Part II: Topical Studies in Oceanography, 56(16):987–991, 2009.[117] M. Sunamura, Y. Higashi, C. Miyako, J. Ishibashi, and A. Maruyama. Phylotypes are predominant inthe suiyo seamount hydrothermal plume. Appl Environ Microbiol, 70(2):1190–1198, 2004.[118] I. L. Newton, T. Woyke, T. A. Auchtung, et al. The calyptogena magnifica chemoautotrophic symbiontgenome. Science, 315(5814):998–1000, 2007.[119] M. Harada, T. Yoshida, H. Kuwahara, et al. Expression of genes for sulfur oxidation in the intracellularchemoautotrophic symbiont of the deep-sea bivalve calyptogena okutanii. Extremophiles, 13(6):895–903,2009.[120] S. Glaubitz, M. Labrenz, G. Jost, and K. Jurgens. Diversity of active chemolithoautotrophic prokaryotesin the sulfidic zone of a black sea pelagic redoxcline as determined by rrna-based stable isotopeprobing. FEMS Microbiol Ecol, 74(1):32–41, 2010.[121] C. A. Fuchsman, J. B. Kirkpatrick, W. J. Brazelton, J. W. Murray, and J. T. Staley. Metabolic strategiesof free-living and aggregate-associated bacterial communities inferred from biologic and chemicalprofiles in the black sea suboxic zone. FEMS Microbiol Ecol, 78(3):586–603, 2011.[122] K. Anantharaman, J. A. Breier, C. S. Sheik, and G. J. Dick. Evidence for hydrogen oxidation andmetabolic plasticity in widespread deep-sea sulfur-oxidizing bacteria. Proc Natl Acad Sci U S A,110(1):330–335, 2012.[123] J. Schmidtova, S. J. Hallam, and S. A. Baldwin. Phylogenetic diversity of transition and anoxic zonebacterial communities within a near-shore anoxic basin: Nitinat lake. Environ Microbiol, 11(12):3233–51,2009.141[124] R. A. Lesniewski, S. Jain, K. Anantharaman, P. D. Schloss, and G. J. Dick. 
The metatranscriptome of adeep-sea hydrothermal plume is dominated by water column methanotrophs and lithotrophs. ISMEJ, 6(12):2257–68, 2012.[125] B. J. Baker, C. S. Sheik, C. A. Taylor, et al. Community transcriptomic assembly reveals microbes thatcontribute to deep-sea carbon and nitrogen cycling. ISME J, 7(10):1962–73, 2013.[126] D. J. Richardson, B. C. Berks, D. A. Russell, S. Spiro, and C. J. Taylor. Functional, biochemical andgenetic diversity of prokaryotic nitrate reductase. Cellular and Molecular Life Sciences, 58:165–178, 2001.[127] V. Stewart, Y. Lu, and A. J. Darwin. Periplasmic nitrate reductase (napabc enzyme) supports anaerobicrespiration by escherichia coli k-12. Journal of Bacteriology, 184:1314–1323, 2002.[128] M.A. Moran. The global ocean microbiome. Science, 350(6266):aac8455, 2015.[129] P. G. Falkowski, T. Algeo, L. Codispoti, et al. Ocean deoxygenation: Past, present, and future. EOS,Trans AGU, 92(46):409–410, 2011.[130] J. M. Labonte, S. J. Hallam, and C. A. Suttle. Previously unknown evolutionary groups dominate thessdna gokushoviruses in oxic and anoxic waters of a coastal marine environment. Front Microbiol,6:315, 2015.[131] C. E. Chow, D. M. Winget, 3rd White, R. A., S. J. Hallam, and C. A. Suttle. Combining genomicsequencing methods to explore viral diversity and reveal potential virus-host interactions. FrontMicrobiol, 6:265, 2015.[132] S. Louca, A. K. Hawley, S. Katsev, et al. Integrating biogeochemistry with multi-omic sequenceinformation in a model oxygen minimum zone. Proc Natl Acad Sci U S A, In press, 2016.[133] M. A. Moran, B. Satinsky, S. M. Gifford, et al. Sizing up metatranscriptomics. ISME J, 7(2):237–43,2013.[134] F. J. Stewart, O. Ulloa, and E. F. DeLong. Microbial metatranscriptomics in a permanent marineoxygen minimum zone. Environ Microbiol, 14(1):23–40, 2012.[135] F. A. J. Armstrong, C. R. Stearns, and J. D. H. Strickland. 
The measurement of upwelling andsubsequent biological process by means of the technicon autoanalyzer and associated equipment.Deep Sea Research and Oceanographic Abstracts, 14:381–389, 1967.[136] R. M. Holmes, A. Aminot, R. Krouel, B. A. Hooker, and B. J. Peterson. A simple and precise methodfor measuring ammonium in marine and freshwater ecosystems. Canadian Journal of Fisheries andAquatic Sciences, 59(10):1801–1808, 1999.[137] J. D. Cline. Spectrophotometric determination of hydrogen sulfide in natural waters. Limnology andOceanography, 14:454–458, 1969.[138] J. Murphy and J. P. Riley. A modified single solution method for the determination of phosphate innatural waters. Analytica Chemica Acta, 27:31–36, 1962.[139] L. W. Winkler. Die bestimmung des im wasser gelsten sauerstoffes. Berichte der deutschen chemischenGesellschaft, 21:2843–2854, 1888.[140] D. Capelle, J. Dacey, and P. D Tortell. An automated, high-throughput method for accurate andprecise measurements of dissolved nitrous-oxide and methane concentrations in natural seawaters.Limnology and Oceanography: Methods, in review, 2015.[141] X. J. Lin, M. I. Scranton, A. Y. Chistoserdov, R. Varela, and G. T Taylor. Spatiotemporal dynamics ofbacterial populations in the anoxic cariaco basin. Limnology and Oceanography, 53(1):37–51, 2008.142[142] X. J. Lin, M. I Scranton, R. Varela, A. Chistoserdov, and G. T. Taylor. Compositional responses ofbacterial communities to redox gradients and grazing in the anoxic cariaco basin. Aquatic MicrobialEcology, 47(1):57–72, 2007.[143] M. J. Rodriguez-Mora, M. I. Scranton, G. T. Taylor, and A. Y. Chistoserdov. Bacterial communitycomposition in a large marine anoxic basin: a cariaco basin time-series survey. FEMS Microbiol Ecol,84(3):625–39, 2013.[144] C. Vetriani, H. V. Tran, and L. J. Kerkhof. Fingerprinting microbial assemblages from the oxic/anoxicchemocline of the black sea. Applied and environmental microbiology, 69(11):6481–6488, 2003.[145] V. Edgcomb, W. Orsi, C. 
Leslin, et al. Protistan community patterns within the brine and halocline ofdeep hypersaline anoxic basins in the eastern mediterranean sea. SExtremophiles, 13(1):151–167, 2009.[146] B. M. Fuchs, D. Woebken, M. V. Zubkov, P. Burkill, and R. Amann. Molecular identification ofpicoplankton populations in contrasting waters of the arabian sea. Aquatic Microbial Ecology, 39(2):145–157, 2005.[147] W. Orsi, Y. C. Song, S. Hallam, and V. Edgcomb. Effect of oxygen minimum zone formation oncommunities of marine protists. The ISME journal, 6(8):1586–601, 2012.[148] T. Stoeck, B. Hayward, G. T. Taylor, R. Varela, and S. S. Epstein. A multiple pcr-primer approach toaccess the microeukaryotic diversity in environmental samples. Protist, 157(1):31–43, 2006.[149] D. A. Walsh and S. J. Hallam. Bacterial community structure and dynamics in a sea- sonally anoxic fjord:Saanich Inlet, British Columbia, pages 253–267. Wiley-Blackwell, Hoboken, NJ, 2011.[150] D. P. Herlemann, M. Labrenz, K. Jurgens, et al. Transitions in bacterial communities along the 2000km salinity gradient of the baltic sea. ISME J, 5(10):1571–1579, 2011.[151] T. Stoeck, D. Bass, M. Nebel, et al. Multiple marker parallel tag environmental dna sequencing revealsa highly complex eukaryotic community in marine anoxic water. Mol Ecol, 19, 2010.[152] T. Stoeck, A. Behnke, R. Christen, et al. Massively parallel tag sequencing reveals the complexity ofanaerobic marine protistan communities. BMC Biol, 7:72, 2009.[153] S. Wakeham, R. Amann, K. Freeman, et al. Microbial ecology of the stratified water column of theblack sea as revealed by a comprehensive biomarker study. Organic Geochemistry, 38(12):2070–2097,2007.[154] S. Glaubitz, T. Lueders, W. R. Abraham, et al. 13c-isotope analyses reveal that chemolithoautotrophicgamma- and epsilonproteobacteria feed a microbial food web in a pelagic redoxcline of the centralbaltic sea. Environ Microbiol, 11(2):326–37, 2009.[155] J. A. Bryant, F. J. Stewart, J. M. Eppley, and E. F. 
DeLong. Microbial community phylogenetic andtrait diversity declines with depth in a marine oxygen minimum zone. Ecology, 93(7):1659–1673, 2012.[156] T. J. Mincer, M. J. Church, L. T. Taylor, et al. Quantitative distribution of presumptive archaeal andbacterial nitrifiers in monterey bay and the north pacific subtropical gyre. Environmental Microbiology,9(5):1162–1175, 2007.[157] M. T. Suzuki, L. T. Taylor, and E. F. DeLong. Quantitative analysis of. Appl Environ Microbiol,66(11):4605–4614, 2000.[158] K. Takai and K. Horikoshi. Rapid detection and quantification of members of the archaeal communityby quantitative pcr using fluorogenic probes. Applied and environmental microbiology, 2000(66):11, 2000.143[159] D. A. Walsh, E. Zaikova, and S. J. Hallam. Small volume (1-3l) filtration of coastal seawater samples.JoVE, e1163, 2009.[160] J. J. Wright, S. Lee, E. Zaikova, D. A. Walsh, and S. J. Hallam. Dna extraction from 0.22 micronsterivex filters and cesium chloride density gradient centrifugation. Journal of Vissualized Experiments,page e1352, 2009.[161] J. A. Cram, C. E. Chow, R. Sachdeva, et al. Seasonal and interannual variability of the marinebacterioplankton community throughout the water column over ten years. ISME J, 9(3):563–80, 2015.[162] J. Tremblay, K. Singh, A. Fern, et al. Primer and platform effects on 16s rrna tag sequencing. FrontMicrobiol, 6:771, 2015.[163] C. Daum, J. Han, M. Zane, et al. Illumina ga iix & hiseq 2000 production sequencing and qc analysispipelines at the doe joint genome institute (advances of genome biology and technology meeting2011), 2011.[164] Y. Shi, G. W. Tyson, and E. F. DeLong. Metatranscriptomics reveals unique microbial small rnas inthe ocean’s water column. Nature, 459(7244):266–9, 2009.[165] Rachna J. Ram, Nathan C. VerBerkmoes, Michael P. Thelen, et al. Community proteomics of a naturalmicrobial biofilm. Science, 208:1915–1920, 2005.[166] K. M. Keiblinger, I. C. Wilhartitz, T. Schneider, et al. 
Soil metaproteomics - comparative evaluation ofprotein extraction protocols. Soil Biol Biochem, 54(150-10):14024, 2012.[167] T. Schneider, K. M. Keiblinger, E. Schmid, et al. Who is who in litter decomposition? metaproteomicsreveals major microbial players and their biogeochemical functions. The ISME journal, 6(9):1749–62,2012.[168] N. Delmotte, C. Knief, S. Chaffron, et al. Community proteogenomics reveals insights into thephysiology of phyllosphere bacteria. Proc Natl Acad Sci U S A, 106(38):16428–33, 2009.[169] N. C. Verberkmoes, A. L. Russell, M. Shah, et al. Shotgun metaproteomics of the human distal gutmicrobiota. The ISME journal, 3(2):179–89, 2009.[170] M. E. Guazzaroni, F. A.’ Herbst, I. Lores, et al. Metaproteogenomic insights beyond bacterial responseto naphthalene exposure and bio-stimulation. ISME J, 7(1):122–136, 2013.[171] R. Kuhn, D. Benndorf, E. Rapp, et al. Metaproteome analysis of sewage sludge from membranebioreactors. Proteomics, 11(13):2738–2744, 2011.[172] P. Wilmes, M. Wexler, and P. L. Bond. Metaproteomics provides functional insight into activatedsludge wastewater treatment. PLosObe, 3(3):e1778, 2008.[173] R. M. Morris, B. L. Nunn, C. Frazar, et al. Comparative metaproteomics reveals ocean-scale shifts inmicrobial nutrient utilization and energy transduction. ISME J, 4(5):673–85, 2010.[174] S. M. Sowell, P.E. Abraham, M. Shah, et al. Environmental proteomics of microbial plankton in ahighly productive coastal upwelling system. The ISME Journal, 5(5):856–865, 2011.[175] T. J. Williams, E. Long, F. Evans, et al. A metaproteomic assessment of winter and summer bacterio-plankton from antarctic peninsula coastal surface waters. The ISME Journal, 2012.[176] S. Kim, N. Mischerikow, N. Bandeira, et al. The generating function of cid, etd and cid/etd pairs oftandem mass spectra: Applications to database search. Molecular & Cellular Proteomics, 9:2840–2852,2010.144[177] D. H. Huson, A. F. Auch, J. Qi, and S. C. Schuster. 
Megan analysis of metagenomic data. GenomeResearch, 17(3):377–386, 2007.[178] I. Letunic and P. Bork. Interactive tree of life (itol): an online tool for phylogenetic tree display andannotation. Bioinformatics, 23(1):127–8, 2007.[179] I. Letunic and P. Bork. Interactive tree of life v2: online annotation and display of phylogenetic treesmade easy. Nucleic Acids Res, 39(Web Server issue):W475–8, 2011.[180] B. Zybailov, A. L. Mosley, M. E. Sardiu, et al. Statistical analysis of membrane proteome expressionchanges in saccharomyces cerevisiae. Journal of proteome research, 5:2339–2347, 2006.[181] J. A. Dodsworth, P. C. Blainey, S. K. Murugapiran, et al. Single-cell and metagenomic analyses indicatea fermentative and saccharolytic lifestyle for members of the op9 lineage. Nature Communications,4:1854, 2013.[182] K. M. Konwar, N. W. Hanson, A. P. Page, and S. J. Hallam. Metapathways: a modular pipelinefor constructing pathway/genome databases from environmental sequence information. BMCBioinformatics, 14(202), 2013.[183] D. Hyatt, G-L Chen, M. L. LoCascio, P.F.and Land, F. W. Larimer, and L. J. Hauser. Prodigal: prokary-otic gene recognition and translation initiation site identification. BMC Bioinformatics, 11(11):119,2010.[184] S. M. Kiełbasa, R. Wan, K Sato, P. Horton, and M. C. Frith. Adaptive seeds tame genomic sequencecomparison. Genome Research, 21:487–493, 2011.[185] Dongjae Kim, Aria S Hahn, Kishori M Hanson, Niels Wand Konwar, and Steven J Hallam. Fast: Fastannotation with synchronized threads. IEEE Conference on Computational Intelligence in Bioinformaticsand Computational Biology, in press, 2016.[186] D. A. Rasko, G. S. Myers, and Ravel J. Visualization of comparative genomic analyses by blast scoreratio. BMC Bioinformatics, 6(2), 2005.[187] S Okuda, T Yamada, M. Hamajima, et al. Kegg atlas mapping for global analysis of metabolicpathways. Nucleic Acids Research, 36:W423, 2008.[188] R. L. Tatusov, D. A. Natale, I. V. Garkavtsev, et al. 
The cog database: new developments in phylogeneticclassification of proteins from complete genomes. Nucleic Acids Research, 29(1):22, 2001.[189] R. Caspi, T. Altman, K. Dreher, et al. The metacyc database of metabolic pathways and enzymes andthe biocyc collection of pathway/genome databases. Nucleic Acids Research, 40(D1):D742–D753, 2012.[190] H. Li and R. Durbin. Fast and accurate short read alignment with burrows-wheeler transform.Bioinformatics, 25(14):1754–60, 2009.[191] K. M. Konwar, N. W. Hanson, M. P. Bhatia, et al. Metapathways v2.5: quantitative functional,taxonomic and usability improvements. Bioinformatics, 31(20):3345–7, 2015.[192] N. Gruber and J. L. Sarmiento. Global patterns of marine nitrogen fixation and denitrification. GlobalBiogeochemical Cycles, 11(2):235–266, 1997.[193] L. A. Codispoti, J. A. Brandes, J. P. Christensen, et al. The oceanic fixed nitrogen and nitrous oxidebudgets: Moving targets as we enter the anthropocene? Scientia Marina, 65(2):85–105, 2001.[194] C. Deutsch, J. L. Sarmiento, D. M. Sigman, N. Gruber, and J. P. Dunne. Spatial coupling of nitrogeninputs and losses in the ocean. Nature, 445(7124):163–7, 2007.145[195] T. E. Mattes, B. L. Nunn, K. T. Marshall, et al. Sulfur oxidizers dominate carbon fixation at abiogeochemical hot spot in the dark ocean. ISME J, 7(12):2349–60, 2013.[196] R. E. Anderson, M. T. Beltran, S. J. Hallam, and J. A. Baross. Microbial community structure acrossfluid gradients in the juan de fuca ridge hydrothermal system. FEMS Microbiol Ecol, 83(2):324–39,2013.[197] J. M. Beman, J. Leilei Shih, and B. N. Popp. Nitrite oxidation in the upper water column and oxygenminimum zone of the eastern tropical north pacific ocean. ISME J, 7(11):2192–205, 2013.[198] M. Hugler and S. M. Sievert. Beyond the calvin cycle: autotrophic carbon fixation in the ocean. AnnRev Mar Sci, 3:261–89, 2011.[199] H. Schunck, G. Lavik, D. K. Desai, et al. 
Giant hydrogen sulfide plume in the oxygen minimum zoneoff peru supports chemolithoautotrophy. PLoS One, 8(8):e68661, 2013.[200] H. Ko¨rner, H. J. Sofia, and W. G. Zumft. Phylogeny of the bacterial superfamily of crp-fnr transcriptionregulators: exploiting the metabolic spectrum by controlling alternative gene programs. FEMSMicrobiol Rev, 27(5):559–592, 2003.[201] M. G. Klotz, M. C. Schmid, M. Strous, et al. Evolution of an octahaem cytochrome c protein familythat is key to aerobic and anaerobic ammonia oxidation by bacteria. Environ Microbiol, 10(11):3150–63,2008.[202] L. Farı´as, C. Ferna´ndez, J. Fau Fau´ndez, M. Cornejo, and M. E. Alcaman. Chemolithoautotrophicproduction mediating the cycling of the greenhouse gases n2o and ch4 in an upwelling ecosystem.Biogeosciences, 6(3053-3069), 2009.[203] A. E. Santoro, C. Buchwald, M. R. McIlvin, and K. L. Casciotti. Isotopic signature of n2o produced bymarine ammonia-oxidizing archaea. Science, 333, 2011.[204] D. C. Reed, C. K. Algar, J. A. Huber, and G. J. Dick. Gene-centric approach to integrating environ-mental genomics and biogeochemical models. Proc Natl Acad Sci U S A, 111(5):1879–84, 2014.[205] K. Mavromatis, N. Ivanova, K. Barry, et al. Use of simulated data sets to evaluate the fidelity ofmetagenomic processing methods. Nat Methods, 4(6):495–500, 2007.[206] A. K. Hawley, S. Kheirandish, A. Mueller, et al. Molecular tools for investigating microbial communitystructure and function in oxygen-deficient marine waters. Methods Enzymol, 531:305–29, 2013.[207] S. Kim, N. Gupta, and P.A. Pevzner. Spectral probabilities and generating functions of tandem massspectra: A strike against decoy databases. J Proteome Res, 7(8):33543363, 2008.[208] J. Van de Vossenberg, D. Woebken, W. J. Maalcke, et al. The metagenome of the marine anammoxbacterium candidatus scalindua profunda illustrates the versatility of this globally important nitrogencycle bacterium. Environmental Microbiology, 15(5):12751289, 2013.[209] S.F. Altschul, W. 
Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. J.Mol. Biol., 216:403–410, 1990.[210] N. Georgescu-Roegen. The Entropy Law and the Economic Process. Harvard University Press, Cambridge,MA, 1971.[211] R. U. Ayres. Eco-thermodynamics: economics and the second law. Ecological Economics, 26(189-209),1997.146[212] L. A. Hug, B. C. Thomas, I. Sharon, et al. Critical biogeochemical functions in the subsurface areassociated with bacteria from new phyla and little studied lineages. Environ Microbiol, 18(1):159–73,2016.[213] H. J. Tripp, J. B. Kitner, M. S. Schwalbach, et al. Sar11 marine bacteria require exogenous reducedsulphur for growth. Nature, 452(7188):741–4, 2008.[214] F. O. Aylwarda, J. A M. Eppley, J. M. Smith, et al. Microbial community transcriptional networks areconserved in three domains at ocean basin scales. Proc Natl Acad Sci U S A, 112(17):5443–5448, 2015.[215] S. Louca, L. Wegener Parfrey, and M. Doebeli. Decoupling function and taxonomy in the globalocean microbiome. Science, 353(1272-1277), 2016.[216] E. A. Gies, K. M. Konwar, J. T. Beatty, and S. J. Hallam. Illuminating microbial dark matter inmeromictic sakinaw lake. AEM, 80(21):6807–6018, 2014.[217] M. K. Nobu, T. Narihiro, C. Rinke, et al. Microbial dark matter ecogenomics reveals complexsynergistic networks in a methanogenic bioreactor. ISME J, 2015.[218] N. Segata, D. Bornigen, X. C. Morgan, and C. Huttenhower. Phylophlan is a new method forimproved phylogenetic and taxonomic placement of microbes. Nat Commun, 4:2304, 2013.[219] S.J. Hallam, M. Torres Beltra´n, and A.K. Hawley. Monitoring microbial responses to ocean deoxy-genation in a model oxygen minimum zone. Sci. Data, 4:170158, 2017.[220] O. Be´ja`, L. Aravind, E. V. Koonin, et al. Bacterial rhodopsin: Evidence for a new type of phototrophyin the sea. Science, 289:1902–1906, 2000.[221] S. M. Steinberg and J. L. Badal. 
Oxalic, glyoxalic and pyruvic acids in eastern pacific ocean waters. Journal of Marine Research, 42:697–708, 1984.
[222] V. Anantharam, M. J. Allison, and P. C. Maloney. Oxalate:formate exchange. Journal of Biological Chemistry, 264(13):7244–7250, 1989.
[223] C. Greening, A. Biswas, C. R. Carere, et al. Genomic and metagenomic surveys of hydrogenase distribution indicate h2 is a widely utilised energy source for microbial growth and survival. ISME J., 10:761–777, 2016.
[224] S. Roux, A. K. Hawley, M. Torres Beltrán, et al. Ecology and evolution of viruses infecting uncultivated sup05 bacteria as revealed by single-cell and meta-genomics. Elife, 3:e03125, 2014.
[225] V. M. Markowitz, I. M. Chen, K. Palaniappan, et al. Img: the integrated microbial genomes database and comparative analysis system. Nucleic Acids Res, 40(Database issue):D115–22, 2012.
[226] N. J. Varghese, S. Mukherjee, N. Ivanova, et al. Microbial species delineation using whole genome sequences. Nucleic Acids Res, 43:6761–71, 2015.
[227] H. Teeling, A. Meyerdierks, M. Bauer, R. Amann, and F. O. Glockner. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol, 6(9):938–47, 2004.
[228] H. Teeling, J. Waldmann, T. Lombardot, M. Bauer, and F. O. Glockner. Tetra: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in dna sequences. BMC Bioinformatics, 5:163, 2004.
[229] K. R. Clarke and R. N. Gorley. Primer v6: User manual/tutorial. 2006.
[230] D. H. Parks, M. Imelfort, C. T. Skennerton, P. Hugenholtz, and G. W. Tyson. Checkm: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res, 25(7):1043–55, 2015.
[231] V. M. Markowitz, K. Mavromatis, N. N. Ivanova, et al. Img er: a system for microbial genome annotation expert review and curation. Bioinformatics, 25:2271–2278, 2009.
[232] R. C. Edgar. Search and clustering orders of magnitude faster than blast.
Bioinformatics, 26:2460–2461, 2010.
[233] S. Pesant, F. Not, M. Picheral, et al. Open science resources for the discovery and analysis of tara oceans data. Sci Data, 2:150023, 2015.
[234] IPCC. Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, 2013.
[235] A. R. Ravishankara, J. S. Daniel, and R. W. Portmann. Nitrous oxide (n2o): The dominant ozone-depleting substance emitted in the 21st century. Science, 326(5949):123–125, 2009.
[236] S. W. A. Naqvi, H. W. Bange, L. Farías, et al. Marine hypoxia/anoxia as a source of ch4 and n2o. Biogeosciences, 7(7):2159–2190, 2010.
[237] S. W. Naqvi, D. A. Jayakumar, P. V. Narvekar, et al. Increased marine production of n2o due to intensifying anoxia on the indian continental shelf. Nature, 408(6810):346–9, 2000.
[238] S. Hallin, L. Philippot, F. E. Loffler, R. A. Sanford, and C. M. Jones. Genomics and ecology of novel n2o-reducing microorganisms. Trends Microbiol, 26(1):43–55, 2018.
[239] W. G. Zumft. Cell biology and molecular basis of denitrification. Microbiol. Mol. Biol. Rev., 61:533–616, 1997.
[240] T. Suenaga, S. Riya, M. Hosomi, and A. Terada. Biokinetic characterization and activities of n2o-reducing bacteria in response to various oxygen levels. Front Microbiol, 9:697, 2018.
[241] M. M. M. Kuypers, H. K. Marchant, and B. Kartal. The microbial nitrogen-cycling network. Nat Rev Microbiol, 16(5):263–276, 2018.
[242] I. Koike and A. Hattori. Energy yield of denitrification: An estimate from growth yield in continuous cultures of pseudomonas denitrificans under nitrate-, nitrite- and nitrous oxide-limited conditions. Journal of General Microbiology, 88:11–19, 1975.
[243] T. Yoshinari. N2o reduction by vibrio succinogenes. Appl Environ Microbiol., 39(1):81–84, 1980.
[244] M. Conthe, L. Wittorf, J. G. Kuenen, et al.
Life on n2o: deciphering the ecophysiology of n2o respiring bacterial communities in a continuous culture. ISME J, 12:1142–1153, 2018.
[245] J. Juhanson, S. Hallin, M. Söderström, M. Stenberg, and C. M. Jones. Spatial and phyloecological analyses of nosz genes underscore niche differentiation amongst terrestrial n2o reducing communities. Soil Biology and Biochemistry, 115:82–91, 2017.
[246] C. M. Jones, D. R. Graf, D. Bru, L. Philippot, and S. Hallin. The unaccounted yet abundant nitrous oxide-reducing microbial community: a potential nitrous oxide sink. ISME J, 7(2):417–26, 2013.
[247] D. R. H. Graf, C. M. Jones, and S. Hallin. Intergenomic comparisons highlight modularity of the denitrification pathway and underpin the importance of community structure for n2o emissions. PLOS ONE, 9(12):1–20, 2014.
[248] B. J. Baker, C. S. Lazar, A. P. Teske, and G. J. Dick. Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria. Microbiome, 3, 2015.
[249] A. K. Hawley, M. K. Nobu, J. J. Wright, et al. Diverse marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic gradients. Nature Communications, 8:1507, 2017.
[250] M. Stark, S. A. Berger, A. Stamatakis, and C. von Mering. Mltreemap - accurate maximum likelihood placement of environmental dna sequences into taxonomic and functional reference phylogenies. BMC Genomics, 11(1):461, 2010.
[251] C. Morgan Lang, A. K. Hawley, J. Anistad, and S. J. Hallam. Treesapp: Tree-based sensitive and accurate protein profiler. BMC Genomics, In Progress, 2018.
[252] A. Oulas, P. N. Polymenakou, R. Seshadri, et al. Metagenomic investigation of the geologically unique hellenic volcanic arc reveals a distinctive ecosystem with unexpected physiology. Environ Microbiol, 18(4):1122–36, 2016.
[253] A. L. Alldredge and Y. Cohen. Can microscale chemical patches persist in the sea? microelectrode study of marine snow, fecal pellets.
Science, 235(4789):689–691, 1987.
[254] A. Bankevich, S. Nurk, D. Antipov, et al. Spades: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19(5):455–477, 2012.
[255] M. Holtappels, G. Lavik, M. M. Jensen, and M. M. M. Kuypers. 15n-labeling experiments to dissect the contributions of heterotrophic denitrification and anammox to nitrogen removal in the omz waters of the ocean. Methods in Enzymology, 486:223–251, 2011.
[256] E. Garcia-Robledo, C. C. Padilla, M. Aldunate, et al. Cryptic oxygen cycling in anoxic marine zones. Proc Natl Acad Sci U S A, 114(31):8319–8324, 2017.
[257] C. C. Padilla, L. A. Bristow, N. Sarode, et al. Nc10 bacteria in marine oxygen minimum zones. ISME J, 10(8):2067–71, 2016.
[258] N. P. Revsbech, B. Thamdrup, T. Dalsgaard, and D. E. Canfield. Chapter fourteen - construction of stox oxygen sensors and their application for determination of o2 concentrations in oxygen minimum zones. In Martin G. Klotz, editor, Research on Nitrification and Related Processes, Part A, volume 486 of Methods in Enzymology, pages 325–341. Academic Press, 2011.
[259] W. E. Durno. Precise correlation and metagenomic binning uncovers fine microbial community structure. Master’s thesis, University of British Columbia, 2017. Retrieved from https://circle.ubc.ca/.
[260] V. J. Orphan, C. H. House, K. U. Hinrichs, K. D. McKeegan, and E. F. DeLong. Methane-consuming archaea revealed by directly coupled isotopic and phylogenetic analysis. Science, 293(5529):484–7, 2001.
[261] S. E. McGlynn, G. L. Chadwick, C. P. Kempes, and V. J. Orphan. Single cell activity reveals direct electron transfer in methanotrophic consortia. Nature, 526(7574):531–5, 2015.
[262] J. Chen, A. Hanke, H. E. Tegetmeyer, et al. Impacts of chemical gradients on microbial community structure. ISME J, 11(4):920–931, 2017.
[263] T. C. Walther and M. Mann. Mass spectrometry-based proteomics in cell biology.
J Cell Biol, 190(4):491–500, 2010.
[264] S. Sunagawa, L. P. Coelho, S. Chaffron, et al. Structure and function of the global ocean microbiome. Science, 348(6237), 2015.

Appendix A
Chapter 2: Supplementary material

Table A.1: Metagenome inventory. Inventory of metagenomic datasets and accession numbers.
SampleID  Cruise ID  Year  Month  Station  Depth (m)  MetaG IMG/M Genome ID  MetaG BioSample Accession
SI034 S3 10  34  2009  Jun  SI03  10  3300000224  SAMN05224402
SI034 S3 100  34  2009  Jun  SI03  100  3300000254  SAMN05224404
SI034 S3 120  34  2009  Jun  SI03  120  3300000225  SAMN05224407
SI034 S3 135  34  2009  Jun  SI03  135  3300000226  SAMN05224408
SI034 S3 150  34  2009  Jun  SI03  150  3300000237  SAMN05224411
SI034 S3 200  34  2009  Jun  SI03  200  3300000172  SAMN05224484
SI036 S3 100  36  2009  Aug  SI03  100  3300000238  SAMN05224405
SI036 S3 120  36  2009  Aug  SI03  120  3300000239  SAMN05224472
SI036 S3 135  36  2009  Aug  SI03  135  3300000170  SAMN05224406
SI036 S3 150  36  2009  Aug  SI03  150  3300000204  SAMN05224409
SI036 S3 200  36  2009  Aug  SI03  200  3300000155  SAMN05224412
SI037 S2 100  37  2009  Sep  SI02  100  3300004110  SAMN05224479
SI037 S2 150  37  2009  Sep  SI02  150  3300004109  SAMN05224480
SI037 S2 200  37  2009  Sep  SI02  200  3300004111  SAMN05224481
SI037 S3 10  37  2009  Sep  SI03  10  3300003599  SAMN05224521
SI037 S3 100  37  2009  Sep  SI03  100  3300003478  SAMN05224482
SI037 S3 110  37  2009  Sep  SI03  110  3300003615  SAMN05224526
SI037 S3 120  37  2009  Sep  SI03  120  3300003600  SAMN05224527
SI037 S3 125  37  2009  Sep  SI03  125  3300003620  SAMN05224532
SI037 S3 130  37  2009  Sep  SI03  130  3300003498  SAMN05224483
SI037 S3 150  37  2009  Sep  SI03  150  3300003494  SAMN05224486
SI037 S3 200  37  2009  Sep  SI03  200  3300003496  SAMN05224487
SI037 S4 100  37  2009  Sep  SI04  100  3300003500  SAMN05224429
SI037 S4 130  37  2009  Sep  SI04  130  3300003501  SAMN05224434
SI037 S4 150  37  2009  Sep  SI04  150  3300003495  SAMN05224435
SI039 S3 10  39  2009  Nov  SI03  10  4096421  SAMN05224416
SI039 S3 100  39  2009  Nov  SI03  100  4096422  SAMN05224417
SI039 S3 120  39  2009  Nov  SI03  120  4096423  SAMN05224422
SI039 S3 135  39
2009  Nov  SI03  135  4096424  SAMN05224423
SI039 S3 150  39  2009  Nov  SI03  150  4096425  SAMN05224428
SI039 S3 200  39  2009  Nov  SI03  200  4096426  SAMN05224477
SI042 S3 10  42  2010  Feb  SI03  10  *  SAMN05224451
SI042 S3 100  42  2010  Feb  SI03  100  *  SAMN05224447
SI042 S3 120  42  2010  Feb  SI03  120  *  SAMN05224436
SI042 S3 135  42  2010  Feb  SI03  135  *  SAMN05224437
SI042 S3 150  42  2010  Feb  SI03  150  *  SAMN05224442
SI042 S3 200  42  2010  Feb  SI03  200  *  SAMN05224443
SI047 S3 100  47  2010  July  SI03  100  3300000148  SAMN05224454
SI047 S3 120  47  2010  July  SI03  120  3300000212  SAMN05224455
SI047 S3 135  47  2010  July  SI03  135  3300000193  SAMN05224458
SI047 S3 150  47  2010  July  SI03  150  3300000154  SAMN05224459
SI047 S3 200  47  2010  July  SI03  200  3300000171  SAMN05224463
SI048 S3 10  48  2010  Aug  SI03  10  3300000207  SAMN05224462
SI048 S3 100  48  2010  Aug  SI03  100  3300000324  SAMN05224393
SI048 S3 120  48  2010  Aug  SI03  120  3300000150  SAMN05224394
SI048 S3 135  48  2010  Aug  SI03  135  3300000160  SAMN05224397
SI048 S3 150  48  2010  Aug  SI03  150  3300000200  SAMN05224398
SI048 S3 200  48  2010  Aug  SI03  200  3300000166  SAMN05224401
SI053 S3 10  53  2011  Jan  SI03  10  3300000143  SAMN05224489
SI053 S3 100  53  2011  Jan  SI03  100  3300000187  SAMN05224490
SI053 S3 120  53  2011  Jan  SI03  120  3300000215  SAMN05224491
SI053 S3 135  53  2011  Jan  SI03  135  3300000211  SAMN05224492
SI053 S3 150  53  2011  Jan  SI03  150  3300000216  SAMN05224466
SI053 S3 200  53  2011  Jan  SI03  200  3300000151  SAMN05224467
SI054 S3 100  54  2011  Feb  SI03  100  3300000158  SAMN05224410
SI054 S3 120  54  2011  Feb  SI03  120  3300000146  SAMN05224433
SI054 S3 135  54  2011  Feb  SI03  135  3300000201  SAMN05224473
SI054 S3 150  54  2011  Feb  SI03  150  3300000147  SAMN05224478
SI054 S3 200  54  2011  Feb  SI03  200  3300000214  SAMN05224438
SI060 S3 100  60  2011  Aug  SI03  100  3300000192  SAMN05224439
SI060 S3 150  60  2011  Aug  SI03  150  3300000188  SAMN05224444
SI060 S3 200  60  2011  Aug  SI03  200  3300000174  SAMN05224474
SI072 S3 10  72  2012  Aug-1  SI03  10  3300003592  SAMN05224440
SI072 S3 100  72  2012  Aug-1  SI03  100  3300003588  SAMN05224441
SI072 S3 120  72  2012
Aug-1  SI03  120  3300003589  SAMN05224512
SI072 S3 135  72  2012  Aug-1  SI03  135  3300003585  SAMN05224513
SI072 S3 150  72  2012  Aug-1  SI03  150  3300003591  SAMN05224518
SI072 S3 165  72  2012  Aug-1  SI03  165  3300003619  SAMN05224523
SI072 S3 200  72  2012  Aug-1  SI03  200  3300003590  SAMN05224519
SI073 S3 10  73  2012  Aug-28  SI03  10  3300003582  SAMN05224534
SI073 S3 100  73  2012  Aug-28  SI03  100  3300003583  SAMN05224524
SI073 S3 120  73  2012  Aug-28  SI03  120  3300003584  SAMN05224525
SI073 S3 135  73  2012  Aug-28  SI03  135  3300003596  SAMN05224530
SI073 S3 150  73  2012  Aug-28  SI03  150  3300003587  SAMN05224531
SI073 S3 165  73  2012  Aug-28  SI03  165  3300003618  SAMN05224533
SI073 S3 200  73  2012  Aug-28  SI03  200  3300003581  SAMN05224508
SI074 S3 10  74  2012  Sep-10  SI03  10  3300003594  SAMN05224529
SI074 S3 100  74  2012  Sep-10  SI03  100  3300003593  SAMN05224509
SI074 S3 120  74  2012  Sep-10  SI03  120  3300003580  SAMN05224514
SI074 S3 135  74  2012  Sep-10  SI03  135  3300003586  SAMN05224515
SI074 S3 150  74  2012  Sep-10  SI03  150  3300003602  SAMN05224528
SI074 S3 165  74  2012  Sep-10  SI03  165  3300003601  SAMN05224535
SI074 S3 200  74  2012  Sep-10  SI03  200  3300003595  SAMN05224520
SI075 S3 10  75  2012  Sep-20  SI03  10  3300004279  SAMN05224536
SI075 S3 100  75  2012  Sep-20  SI03  100  3300004280  SAMN05224522
SI075 S3 120  75  2012  Sep-20  SI03  120  3300004274  SAMN05224493
SI075 S3 150  75  2012  Sep-20  SI03  135  3300004278  SAMN05224495
SI075 S3 165  75  2012  Sep-20  SI03  165  3300004276  SAMN05224496
SI075 S3 200  75  2012  Sep-20  SI03  200  3300004277  SAMN05224497
SI075 S3 135  75  2012  Sep-20  SI03  135  3300004273  SAMN05224494
* SI042 samples are not currently in the IMG/M database, but are available in the NCBI Sequence Read Archive with the indicated BioSample.

Table A.2: Metatranscriptome inventory. Inventory of metatranscriptomic datasets and accession numbers.
SampleID  Cruise ID  Year  Month  Station  Depth (m)  MetaT IMG/M JGI project ID  MetaT
BioSample Accession
SI042 S3 10  42  2010  Feb  SI03  10  1001537  SAMN05238748
SI042 S3 100  42  2010  Feb  SI03  100  1001540  SAMN05238739
SI042 S3 120  42  2010  Feb  SI03  120  1001543  SAMN05238743
SI042 S3 135  42  2010  Feb  SI03  135  1001546  SAMN05238741
SI042 S3 150  42  2010  Feb  SI03  150  1001549  SAMN05238745
SI042 S3 200  42  2010  Feb  SI03  200  1001552  SAMN05238751
SI047 S3 10  47  2010  July  SI03  10  3300004642  SAMN05224517
SI047 S3 100  47  2010  July  SI03  100  3300005234  SAMN05224498
SI047 S3 120  47  2010  July  SI03  120  3300004958  SAMN05224499
SI047 S3 135  47  2010  July  SI03  135  3300004640  SAMN05224500
SI047 S3 150  47  2010  July  SI03  150  3300004637  SAMN05224516
SI047 S3 200  47  2010  July  SI03  200  3300004974  SAMN05224501
SI048 S3 10  48  2010  Aug  SI03  10  3300004960  SAMN05224502
SI048 S3 100  48  2010  Aug  SI03  100  3300004962  SAMN05224503
SI048 S3 120  48  2010  Aug  SI03  120  3300004639  SAMN05224504
SI048 S3 135  48  2010  Aug  SI03  135  3300004638  SAMN05224505
SI048 S3 150  48  2010  Aug  SI03  150  3300004636  SAMN05224511
SI048 S3 200  48  2010  Aug  SI03  200  3300004641  SAMN05223291
SI054 S3 10  54  2011  Feb  SI03  10  3300004957  SAMN05223292
SI054 S3 100  54  2011  Feb  SI03  100  3300004975  SAMN05223293
SI054 S3 120  54  2011  Feb  SI03  120  3300004954  SAMN05224510
SI054 S3 135  54  2011  Feb  SI03  135  3300005233  SAMN05236416
SI054 S3 150  54  2011  Feb  SI03  150  3300004968  SAMN05224506
SI054 S3 200  54  2011  Feb  SI03  200  3300004627  SAMN05224507
SI072 S3 10  72  2012  Aug 1  SI03  10  1024556  SAMN05238753
SI072 S3 100  72  2012  Aug 1  SI03  100  1024559 1024562  SAMN05238755 SAMN05236417
SI072 S3 135  72  2012  Aug 1  SI03  135  1024571 1024574  SAMN05238757 SAMN05238759
SI072 S3 150  72  2012  Aug 1  SI03  150  1024577 1024580  SAMN05238761 SAMN05238729
SI072 S3 165  72  2012  Aug 1  SI03  165  1024583 1024586  SAMN05238731 SAMN05238732
SI072 S3 200  72  2012  Aug 1  SI03  200  1024589 1024592  SAMN05238733 SAMN05238734
SI073 S3 10  73  2012  Aug 28  SI03  10  1024595  SAMN05238721
SI073 S3 165  73  2012  Aug 28  SI03  165  1024622 1024625  SAMN05238722 SAMN05238723
SI073 S3 200  73  2012  Aug 28  SI03  200  1024628
SAMN05238724
SI074 S3 10  74  2012  Sep 10  SI03  10  1024634  SAMN05238725
SI074 S3 100  74  2012  Sep 10  SI03  100  1024637 1024640  SAMN05238726 SAMN05238727
SI074 S3 120  74  2012  Sep 10  SI03  120  1024643  SAMN05238728
SI074 S3 135  74  2012  Sep 10  SI03  135  1024649 1024652  SAMN05238730 SAMN05238763
SI074 S3 150  74  2012  Sep 10  SI03  150  1024655 1024658  SAMN05238765 SAMN05238736
SI074 S3 165  74  2012  Sep 10  SI03  165  1024661 1024664  SAMN05238738 SAMN05238740
SI074 S3 200  74  2012  Sep 10  SI03  200  1024667 1024670  SAMN05238742 SAMN05238744
SI075 S3 10  75  2012  Sep 20  SI03  10  1024673  SAMN05238746
SI075 S3 100  75  2012  Sep 20  SI03  100  1024676 1024679  SAMN05238749 SAMN05238747
SI075 S3 120  75  2012  Sep 20  SI03  120  1024682 1024685  SAMN05238750 SAMN05238752
SI075 S3 150  75  2012  Sep 20  SI03  135  1024694 1024697  SAMN05238758 SAMN05238760
SI075 S3 200  75  2012  Sep 20  SI03  200  1024706 1024709  SAMN05236415 SAMN05238735
SI075 S3 135  75  2012  Sep 20  SI03  135  1024688 1024691  SAMN05238754 SAMN05238756
* SI042 samples are not currently in the IMG/M database but are available in the NCBI Sequence Read Archive with the indicated BioSample.

Table A.3: Metaproteome inventory. Inventory of metaproteomic datasets and accession numbers.
SampleID  Cruise ID  Year  Month  Station  Depth (m)  MetaP Pride File Prefix
SI020 S3 100  20  2008  Apr  SI03  100  SH SBI 02, SH SBI 03, SH SBI 19
SI020 S3 200  20  2008  Apr  SI03  200  SH SBI 04, SH SBI 05, SH SBI 21
SI020 S3 10  20  2008  Apr  SI03  10  SH SBI 18
SI020 S3 120  20  2008  Apr  SI03  120  SH SBI 20
SI037 S2 100  37  2009  Sep  SI02  100  SH SBI 06
SI037 S2 130  37  2009  Sep  SI02  130  SH SBI 09
SI037 S2 150  37  2009  Sep  SI02  150  SH SBI 12
SI037 S2 200  37  2009  Sep  SI02  200  SH SBI 15
SI037 S3 100  37  2009  Sep  SI03  100  SH SBI 07
SI037 S3 130  37  2009  Sep  SI03  130  SH SBI 10
SI037 S3 150  37  2009  Sep  SI03  150  SH SBI 13
SI037 S3 200  37  2009  Sep  SI03  200  SH SBI 16
SI037 S4 100  37  2009  Sep  SI04  100  SH SBI 08
SI037 S4 130  37  2009  Sep  SI04  130  SH SBI 11
SI037 S4 150  37  2009  Sep  SI04  150  SH SBI 14
SI037 S4 190  37  2009  Sep  SI04  190  SH SBI 17
SI038 S3 10
38  2009  Oct  SI03  10  SH SBI TC 01
SI038 S3 97  38  2009  Oct  SI03  97  SH SBI TC 02
SI038 S3 120  38  2009  Oct  SI03  120  SH SBI TC 03
SI038 S3 150  38  2009  Oct  SI03  150  SH SBI TC 04
SI038 S3 165  38  2009  Oct  SI03  165  SH SBI TC 05
SI038 S3 200  38  2009  Oct  SI03  200  SH SBI TC 06
SI042 S3 10  42  2010  Feb  SI03  10  SH SBI TC 07
SI042 S3 120  42  2010  Feb  SI03  120  SH SBI TC 09
SI042 S3 150  42  2010  Feb  SI03  150  SH SBI TC 10
SI042 S3 200  42  2010  Feb  SI03  200  SH SBI TC 12
SI044 S3 10  44  2010  Apr  SI03  10  SH SBI TC 13
SI044 S3 60  44  2010  Apr  SI03  60  SH SBI TC 14
SI044 S3 97  44  2010  Apr  SI03  67  SH SBI TC 15
SI044 S3 120  44  2010  Apr  SI03  120  SH SBI TC 16
SI044 S3 135  44  2010  Apr  SI03  135  SH SBI TC 17
SI044 S3 150  44  2010  Apr  SI03  150  SH SBI TC 18
SI044 S3 200  44  2010  Apr  SI03  200  SH SBI TC 19
SI046 S3 10  46  2010  Jun  SI03  10  SH SBI TC 20
SI046 S3 60  46  2010  Jun  SI03  60  SH SBI TC 21
SI046 S3 100  46  2010  Jun  SI03  100  SH SBI TC 22
SI046 S3 120  46  2010  Jun  SI03  120  SH SBI TC 23
SI046 S3 135  46  2010  Jun  SI03  135  SH SBI TC 24
SI046 S3 150  46  2010  Jun  SI03  150  SH SBI TC 25
SI046 S3 200  46  2010  Jun  SI03  200  SH SBI TC 26
SI047 S3 10  47  2010  July  SI03  10  SH SBI TC2 SI047 10m
SI047 S3 100  47  2010  July  SI03  100  SH SBI TC2 SI047 100m
SI047 S3 120  47  2010  July  SI03  120  SH SBI TC2 SI047 120m
SI047 S3 135  47  2010  July  SI03  135  SH SBI TC2 SI047 135m
SI047 S3 150  47  2010  July  SI03  150  SH SBI TC2 SI047 150m
SI047 S3 200  47  2010  July  SI03  200  SH SBI TC2 SI047 200m
SI048 S3 10  48  2010  Aug  SI03  10  SH SBI TC2 SI048 10m
SI048 S3 100  48  2010  Aug  SI03  100  SH SBI TC2 SI048 100m
SI048 S3 120  48  2010  Aug  SI03  120  SH SBI TC2 SI048 120m
SI048 S3 135  48  2010  Aug  SI03  135  SH SBI TC2 SI048 135m
SI048 S3 150  48  2010  Aug  SI03  150  SH SBI TC2 SI048 150m
SI048 S3 200  48  2010  Aug  SI03  200  SH SBI TC2 SI048 200m
SI053 S3 10  53  2011  Jan  SI03  10  SH SBI TC2 SI053 10m
SI053 S3 100  53  2011  Jan  SI03  100  SH SBI TC2 SI053 100m
SI053 S3 120  53  2011
Jan  SI03  120  SH SBI TC2 SI053 120m
SI053 S3 135  53  2011  Jan  SI03  135  SH SBI TC2 SI053 135m
SI053 S3 150  53  2011  Jan  SI03  150  SH SBI TC2 SI053 150m
SI053 S3 200  53  2011  Jan  SI03  200  SH SBI TC2 SI053 200m
SI054 S3 10  54  2011  Feb  SI03  10  SH SBI TC2 SI054 10m
SI054 S3 100  54  2011  Feb  SI03  100  SH SBI TC2 SI054 100m
SI054 S3 120  54  2011  Feb  SI03  120  SH SBI TC2 SI054 120m
SI054 S3 135  54  2011  Feb  SI03  135  SH SBI TC2 SI054 135m
SI054 S3 150  54  2011  Feb  SI03  150  SH SBI TC2 SI054 150m
SI054 S3 200  54  2011  Feb  SI03  200  SH SBI TC2 SI054 200m

A.1 RNA extraction and isolation protocol

Here I detail the protocol developed from Shi et al. 2009 to maximise extraction efficiency and RNA quality. As with all RNA extractions, clean your work area with RNase Away or another RNase cleaner to neutralise any RNase that may degrade RNA in your samples. Use only RNase-free tips, tubes and buffers.

RNA Extraction

1. Using a peristaltic pump (Cole-Parmer), seawater is filtered through a 2.7 µm GF/D prefilter to reduce particle and eukaryotic cell loading. Flow-through biomass is concentrated in-line onto a 0.2 µm Sterivex filter (Millipore). Filter volumes vary with cell density, typically between 1 and 5 L; larger volumes take longer to filter, and microbial gene expression may change during that time. Following biomass concentration, a syringe is used to purge remaining seawater from the filter cartridge prior to addition of RNAlater.
2. 1.8 mL of RNAlater (Ambion) is added to the Sterivex filter, which is then sealed at both ends with parafilm, placed in a sterile bag, frozen on dry ice, and stored at -80 °C until extraction.
3. Prior to RNA extraction, the Sterivex filter is thawed on ice.
4. Remove the RNAlater using a sterile 10 mL syringe to slowly push it out of the Sterivex into two nuclease-free 1.5 mL tubes. Store on ice until extraction is complete and presence of RNA is validated.
5. Wash the Sterivex with Ringer's Solution by adding 1.8 mL of Ringer's solution (prepared with RNase-free water) to the Sterivex. 
Invert several times to mix and incubate with rolling at room temperature for 20 min.
6. Remove the Ringer's Solution from the Sterivex using a sterile 10 mL syringe to slowly push it out of the Sterivex into two nuclease-free 1.5 mL tubes. Store on ice until extraction is complete and presence of RNA is validated.
7. Lyse cells in the Sterivex by adding 1.8 mL of Lysis/Binding buffer from the mirVana kit, then add 100 µL lysozyme (62.5 mg lysozyme in 500 µL nuclease-free TE). Invert several times to mix and incubate at 37 °C for 30 min with rolling.
8. Remove the lysate from the Sterivex with a sterile 10 mL syringe and collect it in a 15 mL falcon tube. Wash the Sterivex with 1 mL Lysis/Binding buffer and add the wash to the lysate. Total lysate volume will be ∼3.5 mL.
9. Organic extraction of RNA: add 1/10 lysate volume (∼350 µL) of miRNA Homogenate Additive from the mirVana kit. Mix well by inverting several times and incubate on ice for 10 min.
10. Add 1 lysate volume of Acid-Phenol:Chloroform (take the Acid-Phenol:Chloroform from the bottom of the bottle, as the top is aqueous buffer) and invert several times to mix. Centrifuge for 10 min at 6,000 rpm in a swinging bucket rotor at ∼5 °C, remove the aqueous layer to a new falcon tube and record its volume.
11. Add 1.25 volumes of room temperature 100% ethanol to the aqueous phase (∼350 mL).
12. Pass the sample through the filter cartridge. Place the filter cartridge into a collection tube and pipet the lysate/ethanol mixture onto the filter cartridge 700 µL at a time, spinning at 10,000 x g for ∼15 s after each addition, until all of the mixture has been loaded onto the column.
13. Wash the filter by adding 700 µL miRNA Wash Solution 1 from the mirVana kit to the filter cartridge. Centrifuge for 5-10 s and discard the flow-through.
14. Wash the filter by adding 500 µL Wash Solution 2/3 from the mirVana kit. Centrifuge for 5-10 s and discard the flow-through. Repeat with an additional 500 µL Wash Solution 2/3 and spin. Centrifuge for an additional 1 min and discard the flow-through.
15. 
Elute RNA by transferring the filter cartridge into a fresh collection tube, applying 60 µL of elution solution (or nuclease-free water) pre-heated to 95 °C, and spinning for 20-30 s at maximum speed. Store total RNA at -80 °C or -20 °C, or continue on to the cleaning procedures. Before freezing, transfer two 1.0 µL aliquots into two 5 µL PCR tubes for quality control analysis on the Bioanalyzer.

DNA removal using TURBO DNA-free kit

1. Add 0.1 volume 10X TURBO DNase Buffer to the extracted RNA solution and mix gently.
2. Incubate in a heating block at 37 °C for 30 min.
3. Add 6 µL resuspended DNase Inactivation Reagent and mix well.
4. Centrifuge at 10,000 x g for 1.5 min.
5. Transfer the supernatant to a clean tube. The supernatant contains total RNA and no DNA.

Clean total RNA using RNeasy MinElute Cleanup Kit

1. Adjust sample volume to 100 µL with RNase-free water.
2. Add 350 µL Buffer RLT and mix well.
3. Add 250 µL 100% ethanol and mix well by pipetting. Proceed immediately to the next step.
4. Transfer the sample (700 µL) to an RNeasy MinElute spin column in a 2 mL collection tube, centrifuge for 15 s at > 8000 x g and discard the flow-through.
5. Place the column in a fresh 2 mL collection tube and add 500 µL Buffer RPE to the spin column. Centrifuge for 15 s at > 8000 x g and discard the flow-through.
6. Add 500 µL of 80% ethanol to the column. Centrifuge for 2 min at > 8000 x g and discard the flow-through and collection tube.
7. Place the column in a new 2 mL collection tube. Centrifuge at full speed with the lid of the column open for 5 min (it is easiest to cut the lid off at this point). Discard the flow-through and collection tube.
8. Place the column in a new 1.5 mL collection tube and add 14 µL of RNase-free water directly to the centre of the column membrane. Centrifuge for 1 min at full speed to collect clean total RNA.
9. 
Before freezing, transfer two 1.0 µL aliquots into two 5 µL PCR tubes for quality control analysis on the Bioanalyzer.

A.2 Protein extraction and isolation protocol

Here I detail the protocol developed to efficiently extract total protein from Sterivex filters, with peptide detection optimized for community gene expression profiling in O2-deficient marine waters.

Sample Processing and Protein Extraction

1. Using a peristaltic pump (Cole-Parmer), seawater is filtered through a 2.7 µm GF/D prefilter to reduce particle and eukaryotic cell loading. Flow-through biomass is concentrated in-line onto a 0.2 µm Sterivex filter (Millipore). Filter volumes required will vary with corresponding cell densities and typically range between 1 L in surface ocean waters and up to 200 L in dark ocean waters. Following biomass concentration, a syringe is used to purge remaining seawater from the filter cartridge prior to lysis buffer addition.
2. 1.8 mL of lysis buffer (0.75 M sucrose, 40 mM EDTA, 50 mM Tris, pH 8.3) is added to the Sterivex filter, which is then sealed at both ends with parafilm, frozen on dry ice, and stored at -80 °C until extraction.
3. Prior to protein extraction, the Sterivex filter is thawed on ice, followed by the addition of 200 µL of 10X Bugbuster (Novagen). The Sterivex filter is then incubated at room temperature with rocking or rolling for 20-30 min to lyse cells.
4. The lysate is extruded from the filter into a 15 mL tube using a 10 mL syringe and put on ice prior to centrifugation at 3500 x g for 10 min at 4 °C to pellet cellular debris. Rinse the filter with 1 mL of lysis buffer, extrude, and combine with the lysate.
5. For buffer exchange, transfer the aqueous layer to an Amicon® Ultra filter with a 10K nominal molecular weight limit cutoff (Millipore), increase the volume to 4 mL with 100 µm urea NH4HCO3, and centrifuge at 3500 x g for 10 min at 4 °C or until there is less than 1 mL remaining in the Amicon filter. Keep samples on ice during buffer exchange steps. 
Buffer exchange two more times with 1-3 mL of 100 mM NH4HCO3.
6. In the final spin, bring the volume down to 200 µL–500 µL. Record the final extraction volume and transfer to a 1.5 mL tube.
7. Protein concentration is determined with the 2-(4-carboxyquinolin-2-yl) quinoline-4-carboxylic acid (bicinchoninic acid or BCA) assay.
8. Add powdered urea to a final concentration of 8 M (780 mg/mL). NOTE: each mg of urea added will add 0.8 µL of volume.
9. A 50 mM working stock of the reducing agent dithiothreitol (DTT) is added to a final concentration of 5 mM and the sample is incubated at 60 °C for 30 min.
10. Following DTT incubation, the sample is diluted 10-fold with 100 mM NH4HCO3 and 1 M CaCl2 is added to a final concentration of 1 mM.
11. The sample can now be flash-frozen in liquid nitrogen and stored at -80 °C until trypsin digestion.

Protein digestion and sample clean-up

To remove residual salts from seawater samples as well as detergents used in protein extraction, both a C18 column and a strong cation exchange (SCX) column are used following trypsin digest.

1. Trypsin digest is carried out using 1 unit of mass spectrometry grade trypsin to 50 units protein at 37 °C for 6 hours.
2. A 1 mL/50 mg bed volume C18 Solid Phase Extraction (SPE) column (Sigma-Aldrich, Supelco Supelclean) is conditioned with 3 mL of methanol and rinsed with 2 mL 0.1% trifluoroacetic acid (TFA) using a vacuum manifold.
3. After conditioning, the sample is added to the column, washed with 4 mL of 95:5 0.1% TFA:acetonitrile (ACN) and allowed to dry. Peptides are eluted with 1 mL of 80:20 0.1% TFA:ACN using vacuum and concentrated to 50 µL–100 µL in a speed-vac.
4. A 1 mL/50 mg bed volume SCX SPE column is used to clean remaining detergents from the sample. Condition the column by following steps 5-10 on a vacuum manifold. (Note: An SCX SPE 1 mL/50 mg tube is sufficient for up to 400 µg of protein; use a 1 mL/100 mg tube for larger protein amounts.)
5. Condition column with 2 mL of methanol.
6. 
Rinse column with 2 mL of 10 mM ammonium formate (NH4HCO2), 25% ACN, pH 3.0.
7. Rinse column with 2 mL of 500 mM NH4HCO2, 25% ACN, pH 6.8.
8. Rinse column with 2 mL of 10 mM NH4HCO2, 25% ACN, pH 3.0.
9. Rinse column with 2 mL of Nanopure water.
10. Rinse column with 4 mL of 10 mM NH4HCO2, 25% ACN, pH 3.0.
11. Acidify the sample by adding 10% TFA in Nanopure water to a final sample concentration of 1% and centrifuge for 5 min at 15,000 x g at room temperature to pellet any precipitates. Slowly pass the supernatant through the column.
12. Wash the column with 4 mL of 10 mM NH4HCO2, 25% ACN, pH 3.0, and elute to dryness. Blot the ends of the manifold tubing below the columns dry.
13. Place fresh 2.0 mL microcentrifuge tubes below the columns and, with the vacuum turned off, add 1.0 mL MeOH:H2O:NH4OH (80:15:5) to each column.
14. Turn on the vacuum, slowly elute the sample from the columns, and when the columns are dry add an additional 500 µL of MeOH:H2O:NH4OH (80:15:5) for a total elution volume of 1.5 mL. Concentrate the sample in a speed-vac to a final volume of 50 µL–100 µL, adding small volumes of H2O to dissolve particulate matter on the side of the tube (if needed). Perform BCA protein assay.
15. The sample can now be flash-frozen in liquid nitrogen and stored at -80 °C until needed for MS analysis.

A.3 Protein sequencing protocol

Here I detail the protocol developed to efficiently match peptide sequences to a protein sequence database composed of metagenomic sequences (translated into protein sequences) from Saanich Inlet.

Tandem mass spectrometry and peptide identification

1. Aliquots containing 5 µg of protein are analyzed by online capillary liquid chromatography-tandem mass spectrometry (Thermo LTQ ion trap mass spectrometer or Thermo LTQ-Orbitrap mass spectrometer) using data-dependent fragmentation on the top 10 ions per duty cycle and a 100-min LC gradient from 0.1% formic acid in water to 0.1% formic acid in acetonitrile. 
(Note: The reverse-phase capillary HPLC column used was made in-house at the Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory by slurry packing 3 µm Jupiter C18 stationary phase into a 60 cm length of 360 µm o.d. × 75 µm i.d. fused silica capillary tubing using a 1 cm sol-gel frit for retention of the packing material.)
2. Peptides are identified from MS/MS spectra using SEQUEST™, allowing for a potential oxidation of the methionine residues. Each search is performed using an environmental database of predicted protein sequences generated from the source location. For ion trap data, a mass error window of 3 m/z units is used for the precursor mass. A mass error window of 1 m/z unit is used for Orbitrap data, given the higher resolution of the instrument. In both cases, a 0 m/z tolerance is used for the fragmentation mass. Peptide identifications are permitted if they have a mass spectra generating functions value of less than 10^-11, which corresponds to a false discovery rate below 2% [207]. Identifications are allowed for all possible peptide termini, that is, not limited to tryptic-only termini.
3. The number of peptide observations (scans matching to a peptide) from each protein is used as a rough measure of relative abundance. Multiple charge states of a single peptide are considered as individual observations, as are the same peptides detected in different mass spectral analyses. However, it is important to note that while abundant proteins tend to produce more spectra, not all peptides ionize equally well. 
The most accurate quantitation requires some form of metabolic, isotopic, or isobaric tagging, or in the case of targeted proteomics, selected reaction monitoring or multiple reaction monitoring using stable isotope-labeled synthetic peptides [263].

A.4 Taxonomic binning and visualization of expressed proteins

Here I detail the protocol developed to calculate normalised spectral abundance factors (NSAF) for metaproteomic samples, which estimate the quantity of peptides detected for a given protein in a metaproteome, and the process of taxonomic binning and visualization of expressed proteins.

1. Amino acid sequences of detected proteins are compared to known protein sequences in genomic databases such as NCBI RefSeq via BLAST. A bit score ratio of 0.4 is used as a cutoff for confidence and the top hit is assigned to that protein sequence.
2. To increase taxonomic resolution and include genomes or metagenomes that are not yet in public databases (such as RefSeq), the target database can be amended with user-defined sequence information. For example, protein sequences for SUP05 uncultured bacterium and Candidatus Kuenenia stuttgartiensis were included in the BLAST against the NCBI RefSeq database by amending the RefSeq database with the additional genomic sequence information from the desired organisms.
3. Peptide scan counts are summed for each protein (with PPP>0.95). For peptides mapping to more than one protein, scan counts are divided between the total number of identified proteins.
4. The spectral abundance factor (SAF) (Equation A.1) is calculated using the sum of all scan counts for a given protein divided by the number of amino acids making up the protein sequence. The NSAF (Equation A.2) is the SAF for a given protein divided by the sum of all SAFs for a given sample.

\[ \mathrm{SAF} = \frac{\text{Sum of scan counts for a given protein}}{\text{Length of a given protein}} \tag{A.1} \]

\[ \mathrm{NSAF} = \frac{\mathrm{SAF}}{\text{Sum of all SAF in a given sample}} \tag{A.2} \]

5. MEGAN is run using the BLAST output for all identified protein sequences in a given sample by using the Import from BLAST option in the File menu. In the Import tab of the import dialogue box, select the BLAST output file; in the Content tab, deselect the SEED and KEGG options; in the LCA Params tab, change Min Support to 1 and deselect Use Min-Complexity Filter. (Note: These parameters are user defined and can be altered based on user preferences or specific data requirements.)
6. Users can include taxa missing from the NCBI taxonomy or alter the structure of the NCBI taxonomy, for example, SUP05 uncultured bacterium, by downloading the NCBI taxonomy structure from http://ab.inf.uni-tuebingen.de/data/software/megan4/download/welcome.html under Updates of NCBI taxonomy. Unzip the file in the MEGAN/class/resources/files directory. Open the names.dmp file in a text editor, append an unused taxon ID number (left most field) for the new species and enter the scientific name in the far right field, maintaining the syntax present in the rest of the file. Repeat this process at the genus or family level (parent nodes) as needed. For example, in the case of SUP05 uncultured bacterium, the names.dmp file was amended with the following lines:
805819 |SUP05 cluster| |scientific name|
805820 |uncultured SUP05 cluster bacterium| |scientific name|
To place SUP05 in context with existing parent nodes for higher order taxonomic structure, determine the taxon ID number present in the names.dmp file; for example, the Gammaproteobacteria taxon ID number is 1236. Open the nodes.dmp file, locate the position of your new taxon ID number in the far left field, and enter the new taxon ID number in the far left field and the taxon ID number for the parent node in the second field position. For the remainder of the fields, copy from an existing line (the NCBI download site explains all these fields in taxdump_readme.txt at ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/).
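The hand edits to names.dmp and nodes.dmp described in this step could also be scripted. The Python sketch below is only an illustration: it assumes the tab-pipe-tab field layout of the NCBI taxdump files, the helper names (`split_record`, `names_record`, `nodes_record`) are hypothetical, and the template line is a made-up stand-in for an existing nodes.dmp entry from which the trailing fields are copied.

```python
SEP = "\t|\t"  # field separator, assumed from the NCBI taxdump layout
END = "\t|"    # record terminator, assumed from the same layout

def split_record(line):
    """Split one .dmp record into its fields."""
    line = line.rstrip("\n")
    if line.endswith(END):
        line = line[: -len(END)]
    return line.split(SEP)

def names_record(tax_id, name, name_class="scientific name"):
    """Build a names.dmp record: tax_id, name, empty unique name, name class."""
    return SEP.join([str(tax_id), name, "", name_class]) + END

def nodes_record(tax_id, parent_id, rank, template_line):
    """Build a nodes.dmp record; fields after the rank are copied from template_line."""
    fields = split_record(template_line)
    fields[0:3] = [str(tax_id), str(parent_id), rank]
    return SEP.join(fields) + END

# Hypothetical template standing in for a real line copied from nodes.dmp.
template = SEP.join(
    ["1236", "1224", "class", "", "0", "1", "11", "1", "0", "1", "0", "0", ""]
) + END

new_names = [
    names_record(805819, "SUP05 cluster"),
    names_record(805820, "uncultured SUP05 cluster bacterium"),
]
new_nodes = [
    nodes_record(805819, 135619, "order", template),
    nodes_record(805820, 805819, "species", template),
]
```

The resulting strings would be appended to the respective files. Checking that the chosen taxon IDs are unused, as the step requires, is left to the user.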
For example, in the case of SUP05 uncultured bacterium, the nodes.dmp file was amended with the following lines:
805819 | 135619 | order | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |
805820 | 805819 | species | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |
Save both of these files and relaunch MEGAN to use the newly assigned taxonomy and import your BLAST output file as described in step 5.
7. In the MEGAN tree view, open all the desired nodes by selecting a leaf on the far right and selecting Uncollapse Subtree from the Tree menu to open the internal nodes for that leaf; repeat for all leaves. From the Select menu, select All Nodes, and then from the Export submenu of the File menu select Reads to export a list of sequences belonging to each taxon level.
8. On your desktop, open a terminal window. Using the cd command, enter the directory containing the MEGAN output files. Combine these files into a single file with sequence name and taxon ID using the following gawk scripts:
user$ gawk '{ f = FILENAME ".new"; sub("^>", "", $0); print $0 "\t" FILENAME > f }' *.fasta
user$ cat *.new > combine_taxanames.txt
user$ gawk '!a[$0]++' combine_taxanames.txt > combine_taxonnames_clean.txt
9. To sum the NSAF values for all proteins assigned to a given taxon node, copy the combine_taxonnames_clean.txt file into Excel and sum the NSAF values for all sequences with a unique taxon ID using the SUMIF function.
10. Tree generation using iTOL requires the NCBI taxon ID assigned by MEGAN. A user account in iTOL is required to enter and store created trees. Enter a list of all taxon IDs (obtained from the names.dmp file for identified taxa) into iTOL at other trees (no number below 1 is permitted) to generate a Newick formatted output tree. Copy the tree and upload it to a new project in iTOL using advanced options with internal node IDs selected. View the tree in iTOL to evaluate information content and visual balance.
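The NSAF arithmetic of steps 3 and 4 and the per-taxon totals of step 9 can also be computed outside a spreadsheet. The following Python sketch is one possible way to do it under stated assumptions: the function names, the dictionary inputs and the example numbers are all hypothetical, and the cutoff helper simply drops taxa below a user-chosen minimum NSAF.

```python
from collections import defaultdict

def protein_scan_counts(peptide_scans, peptide_proteins):
    """Sum scan counts per protein, splitting shared peptides evenly (step 3)."""
    counts = defaultdict(float)
    for peptide, scans in peptide_scans.items():
        proteins = peptide_proteins[peptide]
        for protein in proteins:
            counts[protein] += scans / len(proteins)
    return dict(counts)

def nsaf(scan_counts, lengths):
    """Equations A.1 and A.2: SAF = counts / length, NSAF = SAF / sum of SAFs."""
    saf = {p: scan_counts[p] / lengths[p] for p in scan_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}

def sum_by_taxon(nsaf_values, protein_taxon):
    """Sum NSAF over all proteins assigned to the same taxon ID (step 9)."""
    totals = defaultdict(float)
    for protein, value in nsaf_values.items():
        totals[protein_taxon[protein]] += value
    return dict(totals)

def apply_cutoff(taxon_totals, min_nsaf):
    """Keep only taxa whose summed NSAF reaches the chosen minimum."""
    return {t: v for t, v in taxon_totals.items() if v >= min_nsaf}

# Hypothetical example: pepB maps to two proteins, so its scans are split.
peptide_scans = {"pepA": 20, "pepB": 20, "pepC": 10}
peptide_proteins = {"pepA": ["P1"], "pepB": ["P1", "P2"], "pepC": ["P3"]}
lengths = {"P1": 300, "P2": 100, "P3": 200}     # amino acids per protein
protein_taxon = {"P1": 1236, "P2": 805820, "P3": 805820}

counts = protein_scan_counts(peptide_scans, peptide_proteins)
values = nsaf(counts, lengths)
totals = sum_by_taxon(values, protein_taxon)
kept = apply_cutoff(totals, 0.5)
```

By construction the NSAF values of one sample sum to 1, so per-taxon totals can be read as fractions of the sample.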
If there are more nodes than are possible to view in a single figure, consider adjusting the MEGAN parameters, or use a cutoff of minimum NSAF value, excluding all underrepresented nodes, to reduce the total number of nodes and repeat the import process. Once satisfied with your tree, export in Newick format including internal nodes.
11. Compare the taxon names in the exported tree to the taxon names of the nodes from MEGAN. It is important that names match exactly. Change spaces and '|' into '_'.
12. For each sample, make a .csv file in the format taxa, value, value, value. Use the same value in the first two value columns with an R in front of the first value and a zero for the third value (e.g., Gammaproteobacteria, R28, 28, 0). See iTOL Uploading and working with your own trees for more information on file headers. To provide a scale for your tree, add values to nodes which are '0' and manipulate the graphics file later to move scale bubbles into tree positions in the figure.
13. The .csv file is uploaded with a new tree in iTOL using the Multivalue bar or pie chart data type. Additional settings such as the minimum and maximum radii are user defined. Occasionally, the tree or the data may not be displayed in the iTOL user interface but can still be exported. Use the export function to generate an .svg file.
14. The output of multiple .svg files can be composited together using graphic design software (e.g., Adobe Illustrator) to visualize NSAF values, sequence read count, and taxonomic structure in a unified perspective (Figure 2.6).

Appendix B

Chapter 3: Supplementary material

[Figure B.1 image. Panels: Apr08 Station 3 (100 m, 120 m, 200 m) and Sep09 Stations 4, 3 and 2 (100 m, 130 m, 150 m, 200 m). Columns: AmoBC, Amo associated, Fe-S cluster proteins, Anammox proteins, NarGHI, NXR, NapAB, NirK, NirS, NorCB, NosZ, HAO-like, Nir/Sir, Ammonia transporter, GlnB, grouped as NH4+ Ox, AMX, Denitrification and Other. Rows: Nitrosopumilaceae, Planctomycetes, Nitrospira defluvii, SUP05, Symbionts, Other. Symbol size scaled by NSAF.]

Figure B.1: Detected nitrogen cycling proteins. Detected protein NSAF values in the nitrogen cycle for April 2008 and September 2009 showing highly similar protein expression profiles within water column compartments across multiple stations.
Ammonia monooxygenase (AmoBC), protein product of Nmar_1501 adjacent to genes for AmoA and AmoB (Amo associated), 4Fe-4S binding domain proteins from Nitrosopumilaceae apparently co-expressed with Amo proteins (Fe-S cluster proteins), Anammox proteins hydroxylamine oxidoreductase and hydrazine oxidoreductase, nitrate reductase (NarGHI), periplasmic nitrate reductase (NapAB), copper containing nitrite reductase (NirK), nitrite reductase (NirS), nitric oxide reductase (NorCB), nitrous oxide reductase (NosZ), potential hydroxylamine oxidoreductase (HAO-like), nitrite/sulfite reductase proteins (Nir/Sir), ammonium transporters, nitrogen PII regulatory proteins (GlnB).

[Figure B.2 image. Panel A: Apr08 Station 3 (100 m, 120 m, 200 m) and Sep09 Stations 4, 3 and 2 (100 m, 130 m, 150 m, 200 m); columns Sqr, Fcc, Sox, Dsr, Apr, Sat; rows Thaumarchaeota, Nitrospira defluvii, Planctomycetes, ARCTIC96BD-19, SUP05, Symbionts, Other; scale <0.01, 0.1, 1, 2, 5. Panel B: detected SUP05 hydrogenase proteins (HupLS, NSAF) for the April 2008 and September 2009 samples, grouped into upper oxycline, lower oxycline, S/N transition zone and sulfidic zone; scale 0.075, 0.10, 0.50, 0.75.]

Figure B.2: Detected sulfur and hydrogen cycling proteins. (A) Detected protein NSAF values in the sulfur cycle for April 2008 and September 2009 showing highly similar protein expression profiles within water column compartments across multiple stations. Sulfide:quinone reductase (Sqr), Fcc flavocytochrome C (Fcc), Sox sulfide oxidation protein complex (Sox), dissimilatory sulfate reductase pathway (Dsr), adenylylsulfate reductase (Apr), ATP sulfurylase (Sat).
(B) Detected protein NSAF values for hydrogen cycling gene Hydrogenase (HupLS).

[Figure B.3 image. Panels: Apr08 S3 (100 m, 120 m, 200 m) and Sep09 S4 (100 m, 130 m, 150 m, 200 m). Columns: proteins of the 3HP-4HB, CBB, rTCA and rACoA pathways, labelled with the abbreviations listed in Table B.4. Rows: Thaumarchaeota, Nitrospira defluvii, Planctomycetes, ARCTIC, SUP05, Symbionts, Other. Water column compartments: upper oxycline, lower oxycline, S/N transition zone, sulfidic zone. Symbol size scaled by metaproteome NSAF.]

Figure B.3: Detected proteins in carbon fixation pathways. Detected protein NSAF values in inorganic carbon fixation pathways for April 2008 and September 2009.
See Table B.4 for a list of protein names.

[Figure B.3 continued (image). Panels: Sep09 S3 and Sep09 S2 at 100 m, 130 m, 150 m and 200 m. Columns: proteins of the 3HP-4HB, CBB, rTCA and rACoA pathways. Rows: Thaumarchaeota, Nitrospira defluvii, Planctomycetes, ARCTIC, SUP05, Symbionts, Other.]

Table B.1: Number of detected peptides and proteins. Detected peptides and proteins for samples from April 2008 and September 2009, showing before and after peptide prophet probability (PPP) score cutoff of ≥ 0.95.

Date | Sample | Filter | Total peptides detected | Unique peptides detected | Proteins detected
Apr-08 | 100m 10 L | total | 829 | 549 | 811
Apr-08 | 100m 10 L | PPP ≥ 0.95 | 719 | 448 | 699
Apr-08 | 100m 1.0 L | total | 1480 | 771 | 815
Apr-08 | 100m 1.0 L | PPP ≥ 0.95 | 738 | 281 | 270
Apr-08 | 120m 1.0 L | total | 1514 | 784 | 763
Apr-08 | 120m 1.0 L | PPP ≥ 0.95 | 727 | 292 | 234
Apr-08 | 200m 10 L | total | 17089 | 7651 | 6032
Apr-08 | 200m 10 L | PPP ≥ 0.95 | 14850 | 5796 | 4344
Apr-08 | 200m 1.0 L | total | 3344 | 1571 | 1436
Apr-08 | 200m 1.0 L | PPP ≥ 0.95 | 1883 | 692 | 487
Sep-09 | 2S 100 9.5 L | total | 4145 | 2123 | 2046
Sep-09 | 2S 100 9.5 L | PPP ≥ 0.95 | 2220 | 887 | 708
Sep-09 | 3S 100 9.4 L | total | 2542 | 1307 | 1467
Sep-09 | 3S 100 9.4 L | PPP ≥ 0.95 | 1447 | 584 | 654
Sep-09 | S4 100 9.4 L | total | 1709 | 881 | 1031
Sep-09 | S4 100 9.4 L | PPP ≥ 0.95 | 978 | 395 | 439
Sep-09 | 2S 130 10 L | total | 6292 | 2952 | 2796
Sep-09 | 2S 130 10 L | PPP ≥ 0.95 | 3321 | 1148 | 847
Sep-09 | 3S 130 10 L | total | 2800 | 1520 | 1562
Sep-09 | 3S 130 10 L | PPP ≥ 0.95 | 1542 | 666 | 557
Sep-09 | 4S 130 7.4 L | total | 3550 | 1798 | 1812
Sep-09 | 4S 130 7.4 L | PPP ≥ 0.95 | 2136 | 844 | 758
Sep-09 | 2S 150 8.6 L | total | 5490 | 2659 | 2501
Sep-09 | 2S 150 8.6 L | PPP ≥ 0.95 | 2860 | 1008 | 713
Sep-09 | 3S 150 9.0 L | total | 6638 | 3120 | 2967
Sep-09 | 3S 150 9.0 L | PPP ≥ 0.95 | 3686 | 1253 | 924
Sep-09 | 4S 150 9.0 L | total | 5649 | 2777 | 2589
Sep-09 | 4S 150 9.0 L | PPP ≥ 0.95 | 3088 | 1117 | 772
Sep-09 | 2S 200 9.9 L | total | 3565 | 1970 | 1958
Sep-09 | 2S 200 9.9 L | PPP ≥ 0.95 | 1734 | 719 | 574
Sep-09 | 3S 200 9.0 L | total | 4010 | 2111 | 2027
Sep-09 | 3S 200 9.0 L | PPP ≥ 0.95 | 2162 | 866 | 678
Sep-09 | 4S 200 8.0 L | total | 4872 | 2557 | 2443
Sep-09 | 4S 200 8.0 L | PPP ≥ 0.95 | 2461 | 982 | 715

Table B.2: Taxonomic breakdown for April 2008 metagenome. Taxonomic breakdown for abundant groups for metagenomic reads from April 9, 2008 samples. Percentage of metagenome indicates the percentage of metagenomic reads (above 30 amino acids) which had a top BLAST hit to the indicated taxa. Number of unique genes detected is the number of unique reference sequences for the indicated taxa that were recovered in top BLAST hits. Percentage genome covered is derived by the number of unique references recovered for a taxon divided by the total number of protein coding genes in the genome of that taxon.

Taxon | % of metagenome: Apr08 100 | Apr08 120 | Apr08 200 | Total number of genes detected: Apr08 100 | Apr08 120 | Apr08 200 | Number of unique genes detected: Apr08 100 | Apr08 120 | Apr08 200 | % genome covered: Apr08 100 | Apr08 120 | Apr08 200
Nitrosopumilaceae | 14.834 | 14.903 | 2.357 | 1875 | 1720 | 361 | - | - | - | - | - | -
Ca. Nitrosoarchaeum koreensis MY1 | 1.108 | 0.988 | 0.183 | 140 | 114 | 28 | 117 | 96 | 26 | 6.02 | 4.94 | 1.34
Ca. Nitrosoarchaeum limnia BG20 | 0.649 | 0.771 | 0.085 | 82 | 89 | 13 | 69 | 75 | 13 | 2.99 | 3.26 | 0.56
Ca. Nitrosoarchaeum limnia SFB1 | 0.815 | 1.178 | 0.157 | 103 | 136 | 24 | 87 | 106 | 20 | 4.27 | 5.2 | 0.98
Ca. Nitrosopumilus salaria BD31 | 6.036 | 5.138 | 0.986 | 763 | 593 | 151 | 485 | 402 | 136 | 22.52 | 18.66 | 6.31
Nitrosopumilus maritimus SCM1 | 5.965 | 6.629 | 0.92 | 754 | 765 | 141 | 506 | 482 | 127 | 28.17 | 26.84 | 7.07
Cenarchaeum symbiosum A | 0.261 | 0.199 | 0.026 | 33 | 23 | 4 | 22 | 19 | 4 | 1.09 | 0.94 | 0.2
Candidatus Nitrospira defluvii | 0.253 | 0.286 | 0.072 | 32 | 33 | 11 | 28 | 26 | 9 | 0.66 | 0.61 | 0.21
Planctomycetaceae | 2.445 | 3.89 | 7.338 | 309 | 449 | 1124 | - | - | - | - | - | -
Ca. Kuenenia stuttgartiensis | 0.055 | 0.104 | 0.209 | 7 | 12 | 32 | 6 | 12 | 24 | 0.13 | 0.26 | 0.51
Ca. Scalindua profunda* | 0.672 | 2.192 | 6.763 | 85 | 253 | 1036 | 67 | 192 | 742 | - | - | -
Blastopirellula marina DSM 3645 | 0.245 | 0.173 | 0.046 | 31 | 20 | 7 | 27 | 17 | 7 | 0.45 | 0.28 | 0.12
Planctomyces brasiliensis DSM 5305 | 0.309 | 0.208 | 0.013 | 39 | 24 | 2 | 31 | 22 | 2 | 0.65 | 0.46 | 0.04
Planctomyces limnophilus DSM 3776 | 0.079 | 0.078 | 0.026 | 10 | 9 | 4 | 8 | 9 | 4 | 0.19 | 0.21 | 0.09
Planctomyces maris DSM 8797 | 0.348 | 0.364 | 0.085 | 44 | 42 | 13 | 38 | 40 | 9 | 0.59 | 0.62 | 0.14
Singulisphaera acidiphila DSM 18658 | 0.127 | 0.225 | 0.026 | 16 | 26 | 4 | 14 | 24 | 4 | 0.18 | 0.31 | 0.05
planctomycete KSU-1 | 0.111 | 0.147 | 0.091 | 14 | 17 | 14 | 14 | 14 | 13 | 0.39 | 0.39 | 0.36
Gemmata obscuriglobus UQM 2246 | 0.087 | 0.052 | 0.033 | 11 | 6 | 5 | 8 | 6 | 4 | 0.1 | 0.08 | 0.05
Isosphaera pallida ATCC 43644 | 0.055 | 0.026 | 0 | 7 | 3 | 0 | 7 | 3 | 0 | 0.19 | 0.08 | 0
Pirellula staleyi DSM 6068 | 0.19 | 0.173 | 0.033 | 24 | 20 | 5 | 23 | 18 | 5 | 0.49 | 0.38 | 0.11
Rhodopirellula baltica SH 1 | 0.166 | 0.147 | 0.013 | 21 | 17 | 2 | 19 | 16 | 2 | 0.26 | 0.22 | 0.03
SAR11 Cluster | 12.033 | 6.533 | 1.025 | 1521 | 754 | 157 | - | - | - | - | - | -
alpha proteobacterium HIMB114 | 0.095 | 0.069 | 0.007 | 12 | 8 | 1 | 12 | 7 | 1 | 0.84 | 0.49 | 0.07
Ca. Pelagibacter sp. HTCC7211 | 4.881 | 3.726 | 0.607 | 617 | 430 | 93 | 435 | 315 | 82 | 30.06 | 21.77 | 5.67
Ca. Pelagibacter sp. IMCC9063 | 0.285 | 0.139 | 0.052 | 36 | 16 | 8 | 34 | 13 | 7 | 2.35 | 0.9 | 0.48
Ca. Pelagibacter ubique HTCC1002 | 2.492 | 1.022 | 0.118 | 315 | 118 | 18 | 247 | 94 | 17 | 17.73 | 6.75 | 1.22
Ca. Pelagibacter ubique HTCC1062 | 4.28 | 1.577 | 0.242 | 541 | 182 | 37 | 397 | 156 | 34 | 29.32 | 11.52 | 2.51
ARCTIC96BD-19 | 2.745 | 2.227 | 1.234 | 347 | 257 | 189 | 229 | 180 | 75 | 25.79 | 20.27 | 8.45
SUP05 | 3.125 | 6.776 | 27.216 | 395 | 782 | 4169 | 259 | 499 | 1097 | 20.08 | 38.68 | 85.04
Symbionts | 1.891 | 2.27 | 8.8 | 239 | 262 | 1348 | - | - | - | - | - | -
Ca. Vesicomyosocius okutanii HA | 0.459 | 0.442 | 1.528 | 58 | 51 | 234 | 45 | 45 | 136 | 4.8 | 4.8 | 14.51
Ca. Ruthia magnifica str. Cm (Calyptogena magnifica) | 0.728 | 0.927 | 3.551 | 92 | 107 | 544 | 73 | 85 | 241 | 7.48 | 8.71 | 24.69
Endoriftia persephone 'Hot96_1+Hot96_2' | 0.016 | 0.026 | 0.091 | 2 | 3 | 14 | 2 | 2 | 13 | 0.03 | 0.03 | 0.2
endosymbiont of Riftia pachyptila (vent Ph05) | 0.055 | 0.087 | 0.405 | 7 | 10 | 62 | 6 | 10 | 53 | 0.19 | 0.31 | 1.67
endosymbiont of Tevnia jerichonana (vent Tica) | 0.119 | 0.165 | 0.947 | 15 | 19 | 145 | 14 | 18 | 102 | 0.43 | 0.56 | 3.16
endosymbiont of Bathymodiolus sp.* | 0.514 | 0.624 | 2.278 | 65 | 72 | 349 | 41 | 57 | 131 | - | - | -
* complete genome not available

Table B.3: Taxonomic breakdown for September 2009 metaproteome. Taxonomic breakdown of abundant groups for metaproteome samples from September 1, 2009. Protein NSAF shows the total value for the indicated group or taxa. Total proteins detected shows the total number of detected proteins originating from the indicated taxa. Unique proteins detected shows the number of unique reference sequences for the indicated taxa which were recovered in the top BLAST hit. Percent genome coverage in proteome is derived by the number of unique references recovered for a taxon divided by the total number of protein coding genes in the genome of that taxon.

NSAF
Taxon | 3S 100 | S4 100 | 2S 100 | 3S 130 | 4S 130 | 2S 130 | 2S 150 | 3S 150 | 4S 150 | 2S 200 | 3S 200 | 4S 200 | Apr08 100 | Apr08 120 | Apr08 200
Thaumarchaeota | 28.591 | 30.035 | 10.738 | 9.7 | 10.983 | 3.018 | 3.282 | 4.15 | 1.8 | 2.484 | 1.44 | 1.224 | 22.797 | 17.668 | 3.113
Ca. Nitrosoarchaeum koreensis MY1 | 2.669 | 3.903 | 0.81 | 1.008 | 0.753 | 0.104 | 0.089 | 0.129 | 0.04 | 0.023 | 0.019 | 0.017 | 1.876 | 0.341 | 0.278
Ca. Nitrosoarchaeum limnia BG20 | 2.948 | 2.442 | 0.635 | 0.891 | 0.971 | 0.037 | 0 | 0.146 | 0 | 0 | 0 | 0 | 2.296 | 2.497 | 0.131
Ca. Nitrosoarchaeum limnia SFB1 | 6.822 | 9.89 | 3.214 | 3.275 | 3.704 | 1.347 | 1.399 | 1.355 | 0.805 | 1.292 | 0.7 | 0.686 | 5.145 | 6.147 | 0.384
Ca. Nitrosopumilus salaria BD31 | 7.134 | 6.288 | 1.002 | 1.133 | 1.64 | 0.166 | 0.12 | 0.245 | 0.087 | 0.099 | 0.026 | 0.072 | 4.797 | 0.697 | 0.959
Nitrosopumilus maritimus SCM1 | 8.943 | 7.496 | 5.069 | 3.392 | 3.891 | 1.364 | 1.674 | 2.276 | 0.867 | 1.07 | 0.696 | 0.449 | 8.566 | 7.986 | 1.353
Cenarchaeum symbiosum A | 0.075 | 0.016 | 0.008 | 0 | 0.024 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.116 | 0 | 0.008
Ca. Nitrospira defluvii | 6.285 | 7.995 | 3.085 | 4.096 | 4.136 | 0.624 | 0.689 | 0.954 | 0.36 | 0.35 | 0.372 | 0.096 | 7.105 | 7.57 | 0.362
Planctomycetia | 10.68 | 12.36 | 6.1 | 9.186 | 6.752 | 2.06 | 2.036 | 2.158 | 1.423 | 2.54 | 2.311 | 2.045 | 8.874 | 11.573 | 5.872
Ca. Kuenenia stuttgartiensis | 0.081 | 0.116 | 0.216 | 0.256 | 0.269 | 0.118 | 0.306 | 0.188 | 0.277 | 0.222 | 0.348 | 0.237 | 0.414 | 0 | 0.286
Ca. Scalindua profunda | 0.142 | 0.171 | 1.291 | 0.379 | 0.459 | 1.143 | 1.249 | 0.964 | 0.843 | 2.172 | 1.795 | 1.781 | 0.371 | 0.675 | 5.256
Blastopirellula marina DSM 3645 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.328 | 0 | 0
Planctomyces brasiliensis DSM 5305 | 0.014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.001
Planctomyces limnophilus DSM 3776 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Planctomyces maris DSM 8797 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.276 | 0.385 | 0
Singulisphaera acidiphila DSM 18658 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.022
planctomycete KSU-1 | 10.45 | 12.08 | 4.594 | 8.551 | 6.024 | 0.799 | 0.48 | 1.006 | 0.303 | 0.145 | 0.168 | 0.027 | 7.484 | 10.513 | 0.306
SAR11 cluster | 10.536 | 10.322 | 4.281 | 5.936 | 3.532 | 0.565 | 0.607 | 0.704 | 0.552 | 0.151 | 0.29 | 0.415 | 10.356 | 8.742 | 0.703
alpha proteobacterium HIMB114 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.034 | 0 | 0.015
Ca. Pelagibacter sp. HTCC7211 | 0.853 | 0.764 | 0.263 | 0.251 | 0.131 | 0.071 | 0.073 | 0.088 | 0.11 | 0.044 | 0.08 | 0.117 | 0.823 | 0.104 | 0.183
Ca. Pelagibacter sp. IMCC9063 | 0.374 | 0.189 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.376 | 0 | 0
Ca. Pelagibacter ubique HTCC1002 | 3.664 | 4.942 | 1.233 | 2.077 | 1.688 | 0.07 | 0.102 | 0.121 | 0 | 0 | 0 | 0 | 5.492 | 3.798 | 0.164
Ca. Pelagibacter ubique HTCC1062 | 5.645 | 4.426 | 2.785 | 3.608 | 1.714 | 0.425 | 0.432 | 0.494 | 0.442 | 0.107 | 0.209 | 0.298 | 3.631 | 4.839 | 0.34
ARCTIC96BD-19 | 6.009 | 6.505 | 3.686 | 3.145 | 2.273 | 1.475 | 1.829 | 1.916 | 2.067 | 2.357 | 3.205 | 2.747 | 8.717 | 6.156 | 1.396
SUP05 | 7.779 | 7.948 | 24.45 | 28.64 | 32.78 | 32.69 | 34.2 | 35.52 | 41.53 | 40.49 | 41.53 | 40.07 | 6.824 | 15.975 | 46.307
sulfur-oxidizing symbionts | 1.054 | 0.726 | 7.247 | 4.39 | 7.005 | 12.86 | 11.27 | 10.91 | 12.26 | 9.33 | 11.83 | 13.73 | 1.688 | 2.586 | 8.754
Ca. Vesicomyosocius okutanii HA | 0.007 | 0.006 | 0.32 | 0.158 | 0.551 | 0.752 | 0.618 | 0.821 | 1.204 | 0.875 | 1.273 | 1.548 | 0.098 | 0 | 1.448
Ca. Ruthia magnifica str. Cm (Calyptogena magnifica) | 0.583 | 0.322 | 0.631 | 0.664 | 1.551 | 2.061 | 1.941 | 2.083 | 2.425 | 1.728 | 3.744 | 2.949 | 1.217 | 0.52 | 3.379
Endoriftia persephone 'Hot96_1+Hot96_2' | 0 | 0 | 0.016 | 0 | 0 | 0 | 0.151 | 0.06 | 0.012 | 0 | 0 | 0 | 0 | 0 | 0.039
endosymbiont of Riftia pachyptila (vent Ph05) | 0.323 | 0.306 | 4.22 | 2.579 | 3.192 | 5.766 | 5.377 | 4.974 | 5.144 | 4.839 | 4.288 | 5.046 | 0.048 | 0.784 | 1.778
endosymbiont of Tevnia jerichonana (vent Tica) | 0 | 0 | 1.432 | 0.613 | 0.77 | 2.947 | 2.016 | 2.063 | 2.067 | 0.886 | 0.859 | 1.895 | 0.062 | 0 | 0.678
endosymbiont of Bathymodiolus sp. | 0.141 | 0.094 | 0.627 | 0.373 | 0.942 | 1.335 | 1.168 | 0.917 | 1.411 | 1.002 | 1.669 | 2.299 | 0.262 | 1.282 | 1.432

Total proteins detected
Taxon | 3S 100 | S4 100 | 2S 100 | 3S 130 | 4S 130 | 2S 130 | 2S 150 | 3S 150 | 4S 150 | 2S 200 | 3S 200 | 4S 200 | Apr08 100 | Apr08 120 | Apr08 200
Thaumarchaeota | 241 | 145 | 75 | 82 | 117 | 37 | 28 | 51 | 24 | 16 | 16 | 13 | 245 | 44 | 308
Ca. Nitrosoarchaeum koreensis MY1 | 26 | 22 | 15 | 14 | 16 | 7 | 6 | 8 | 5 | 2 | 2 | 2 | 21 | 6 | 30
Ca. Nitrosoarchaeum limnia BG20 | 13 | 10 | 7 | 9 | 9 | 4 | 0 | 5 | 0 | 0 | 0 | 0 | 14 | 8 | 9
Ca. Nitrosoarchaeum limnia SFB1 | 14 | 11 | 4 | 5 | 10 | 4 | 3 | 4 | 3 | 3 | 3 | 3 | 16 | 4 | 9
Ca. Nitrosopumilus salaria BD31 | 98 | 52 | 25 | 25 | 37 | 8 | 8 | 13 | 6 | 3 | 3 | 3 | 94 | 6 | 130
Nitrosopumilus maritimus SCM1 | 88 | 49 | 23 | 29 | 44 | 14 | 11 | 21 | 10 | 8 | 8 | 5 | 98 | 20 | 128
Cenarchaeum symbiosum A | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 2
Ca. Nitrospira defluvii | 9 | 6 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 3 | 4 | 3 | 7 | 6 | 21
Planctomycetia | 19 | 17 | 33 | 25 | 37 | 36 | 31 | 40 | 29 | 36 | 38 | 33 | 39 | 15 | 284
Ca. Kuenenia stuttgartiensis | 4 | 2 | 12 | 5 | 13 | 11 | 8 | 11 | 10 | 10 | 14 | 10 | 13 | 0 | 29
Ca. Scalindua profunda | 5 | 6 | 12 | 11 | 15 | 17 | 16 | 19 | 14 | 23 | 21 | 21 | 15 | 6 | 239
Blastopirellula marina DSM 3645 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
Planctomyces brasiliensis DSM 5305 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Planctomyces limnophilus DSM 3776 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Planctomyces maris DSM 8797 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0
Singulisphaera acidiphila DSM 18658 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2
planctomycete KSU-1 | 9 | 9 | 9 | 9 | 9 | 8 | 7 | 10 | 5 | 3 | 3 | 2 | 9 | 8 | 12
SAR11 cluster | 35 | 30 | 29 | 24 | 29 | 10 | 11 | 21 | 9 | 7 | 8 | 7 | 50 | 18 | 61
alpha proteobacterium HIMB114 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2
Ca. Pelagibacter sp. HTCC7211 | 9 | 4 | 6 | 4 | 5 | 4 | 3 | 5 | 5 | 3 | 4 | 3 | 12 | 1 | 25
Ca. Pelagibacter sp. IMCC9063 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
Ca. Pelagibacter ubique HTCC1002 | 10 | 9 | 9 | 8 | 8 | 2 | 2 | 4 | 0 | 0 | 0 | 0 | 16 | 8 | 10
Ca. Pelagibacter ubique HTCC1062 | 15 | 16 | 14 | 12 | 16 | 4 | 6 | 12 | 4 | 4 | 4 | 4 | 20 | 9 | 24
ARCTIC96BD-19 | 20 | 15 | 25 | 19 | 21 | 22 | 20 | 23 | 22 | 16 | 18 | 28 | 27 | 8 | 99
SUP05 | 94 | 66 | 204 | 151 | 216 | 301 | 251 | 324 | 297 | 205 | 261 | 280 | 104 | 47 | 1529
sulfur-oxidizing symbionts | 25 | 12 | 57 | 36 | 71 | 100 | 79 | 102 | 106 | 68 | 86 | 90 | 29 | 8 | 463
Ca. Vesicomyosocius okutanii HA | 4 | 1 | 11 | 6 | 12 | 16 | 13 | 16 | 19 | 15 | 15 | 15 | 3 | 0 | 82
Ca. Ruthia magnifica str. Cm (Calyptogena magnifica) | 11 | 8 | 18 | 14 | 24 | 38 | 26 | 41 | 39 | 22 | 33 | 32 | 19 | 3 | 188
Endoriftia persephone 'Hot96_1+Hot96_2' | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 3
endosymbiont of Riftia pachyptila (vent Ph05) | 2 | 1 | 8 | 6 | 8 | 13 | 8 | 13 | 10 | 8 | 8 | 11 | 2 | 3 | 22
endosymbiont of Tevnia jerichonana (vent Tica) | 2 | 0 | 10 | 1 | 7 | 13 | 13 | 16 | 12 | 10 | 11 | 13 | 1 | 0 | 40
endosymbiont of Bathymodiolus sp. | 6 | 2 | 9 | 9 | 20 | 20 | 18 | 15 | 25 | 13 | 19 | 19 | 4 | 2 | 128

Unique proteins detected
Taxon | 3S 100 | S4 100 | 2S 100 | 3S 130 | 4S 130 | 2S 130 | 2S 150 | 3S 150 | 4S 150 | 2S 200 | 3S 200 | 4S 200 | Apr08 100 | Apr08 120 | Apr08 200
Thaumarchaeota | 120 | 75 | 42 | 44 | 64 | 23 | 18 | 29 | 16 | 10 | 11 | 9 | 128 | 27 | 161
Ca. Nitrosoarchaeum koreensis MY1 | 15 | 11 | 9 | 7 | 9 | 4 | 3 | 5 | 2 | 1 | 1 | 1 | 11 | 4 | 18
Ca. Nitrosoarchaeum limnia BG20 | 9 | 6 | 4 | 5 | 5 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 9 | 4 | 8
Ca. Nitrosoarchaeum limnia SFB1 | 10 | 7 | 4 | 5 | 7 | 4 | 3 | 4 | 3 | 3 | 3 | 3 | 11 | 4 | 8
Ca. Nitrosopumilus salaria BD31 | 41 | 23 | 12 | 11 | 18 | 6 | 5 | 7 | 5 | 2 | 2 | 2 | 47 | 5 | 57
Nitrosopumilus maritimus SCM1 | 43 | 27 | 12 | 16 | 24 | 8 | 7 | 11 | 6 | 4 | 5 | 3 | 49 | 10 | 68
Cenarchaeum symbiosum A | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2
Ca. Nitrospira defluvii | 5 | 3 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 1 | 4 | 3 | 9
Planctomycetia | 9 | 6 | 13 | 11 | 15 | 22 | 17 | 20 | 18 | 19 | 18 | 17 | 16 | 8 | 175
Ca. Kuenenia stuttgartiensis | 2 | 1 | 4 | 2 | 4 | 6 | 4 | 4 | 5 | 4 | 5 | 3 | 5 | 0 | 15
Ca. Scalindua profunda | 4 | 3 | 7 | 7 | 9 | 13 | 10 | 13 | 11 | 14 | 12 | 13 | 7 | 5 | 152
Blastopirellula marina DSM 3645 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
Planctomyces brasiliensis DSM 5305 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Planctomyces limnophilus DSM 3776 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Planctomyces maris DSM 8797 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0
Singulisphaera acidiphila DSM 18658 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2
planctomycete KSU-1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 2 | 1 | 1 | 1 | 2 | 2 | 4
SAR11 cluster | 24 | 19 | 15 | 12 | 16 | 9 | 9 | 13 | 8 | 5 | 6 | 5 | 36 | 7 | 43
alpha proteobacterium HIMB114 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2
Ca. Pelagibacter sp. HTCC7211 | 8 | 4 | 6 | 4 | 5 | 3 | 3 | 4 | 5 | 3 | 4 | 3 | 11 | 1 | 19
Ca. Pelagibacter sp. IMCC9063 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
Ca. Pelagibacter ubique HTCC1002 | 5 | 4 | 4 | 3 | 3 | 2 | 2 | 3 | 0 | 0 | 0 | 0 | 10 | 3 | 7
Ca. Pelagibacter ubique HTCC1062 | 10 | 10 | 5 | 5 | 8 | 4 | 4 | 6 | 3 | 2 | 2 | 2 | 13 | 3 | 15
ARCTIC96BD-19 | 9 | 7 | 10 | 7 | 7 | 9 | 7 | 9 | 9 | 6 | 6 | 12 | 12 | 2 | 44
SUP05 | 32 | 24 | 71 | 58 | 73 | 103 | 87 | 108 | 97 | 67 | 83 | 90 | 46 | 23 | 517
sulfur-oxidizing symbionts | 14 | 8 | 40 | 23 | 43 | 60 | 51 | 61 | 60 | 41 | 50 | 53 | 17 | 7 | 264
Ca. Vesicomyosocius okutanii HA | 3 | 1 | 7 | 3 | 7 | 9 | 7 | 9 | 10 | 7 | 7 | 7 | 3 | 0 | 56
Ca. Ruthia magnifica str. Cm (Calyptogena magnifica) | 5 | 4 | 12 | 8 | 14 | 20 | 14 | 20 | 19 | 13 | 18 | 17 | 8 | 3 | 101
Endoriftia persephone 'Hot96_1+Hot96_2' | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2
endosymbiont of Riftia pachyptila (vent Ph05) | 1 | 1 | 5 | 4 | 4 | 8 | 5 | 8 | 5 | 4 | 4 | 6 | 2 | 2 | 13
endosymbiont of Tevnia jerichonana (vent Tica) | 1 | 0 | 8 | 1 | 5 | 11 | 11 | 14 | 10 | 8 | 9 | 11 | 1 | 0 | 31
endosymbiont of Bathymodiolus sp. | 4 | 2 | 7 | 7 | 13 | 12 | 13 | 9 | 15 | 9 | 12 | 12 | 3 | 2 | 61

% genome coverage in proteome
Taxon | 3S 100 | S4 100 | 2S 100 | 3S 130 | 4S 130 | 2S 130 | 2S 150 | 3S 150 | 4S 150 | 2S 200 | 3S 200 | 4S 200 | Apr08 100 | Apr08 120 | Apr08 200
Thaumarchaeota | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
Ca. Nitrosoarchaeum koreensis MY1 | 0.771 | 0.566 | 0.463 | 0.36 | 0.463 | 0.206 | 0.154 | 0.257 | 0.103 | 0.051 | 0.051 | 0.051 | 0.566 | 0.206 | 0.925
Ca. Nitrosoarchaeum limnia BG20 | 0.391 | 0.26 | 0.174 | 0.217 | 0.217 | 0.043 | 0 | 0.087 | 0 | 0 | 0 | 0 | 0.391 | 0.174 | 0.347
Ca. Nitrosoarchaeum limnia SFB1 | 0.491 | 0.343 | 0.196 | 0.245 | 0.343 | 0.196 | 0.147 | 0.196 | 0.147 | 0.147 | 0.147 | 0.147 | 0.54 | 0.196 | 0.393
Ca. Nitrosopumilus salaria BD31 | 1.903 | 1.068 | 0.557 | 0.511 | 0.836 | 0.279 | 0.232 | 0.325 | 0.232 | 0.093 | 0.093 | 0.093 | 2.182 | 0.232 | 2.646
Nitrosopumilus maritimus SCM1 | 2.394 | 1.503 | 0.668 | 0.891 | 1.336 | 0.445 | 0.39 | 0.612 | 0.334 | 0.223 | 0.278 | 0.167 | 2.728 | 0.557 | 3.786
Cenarchaeum symbiosum A | 0.099 | 0.05 | 0.05 | 0 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05 | 0 | 0.099
Ca. Nitrospira defluvii | 0.117 | 0.07 | 0.047 | 0.047 | 0.047 | 0.047 | 0.047 | 0.023 | 0.047 | 0.023 | 0.047 | 0.023 | 0.094 | 0.07 | 0.211
Planctomycetia | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
Ca. Kuenenia stuttgartiensis | 0.043 | 0.021 | 0.086 | 0.043 | 0.086 | 0.129 | 0.086 | 0.086 | 0.107 | 0.086 | 0.107 | 0.064 | 0.107 | 0 | 0.322
Ca. Scalindua profunda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
Blastopirellula marina DSM 3645 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.017 | 0 | 0
Planctomyces brasiliensis DSM 5305 | 0.021 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.021
Planctomyces limnophilus DSM 3776 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.023
Planctomyces maris DSM 8797 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.015 | 0.015 | 0
Singulisphaera acidiphila DSM 18658 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.026
planctomycete KSU-1 | 0.056 | 0.056 | 0.056 | 0.056 | 0.056 | 0.083 | 0.083 | 0.083 | 0.056 | 0.028 | 0.028 | 0.028 | 0.056 | 0.056 | 0.111
SAR11 cluster | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
alpha proteobacterium HIMB114 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.07 | 0 | 0.14
Ca. Pelagibacter sp. HTCC7211 | 0.553 | 0.276 | 0.415 | 0.276 | 0.346 | 0.207 | 0.207 | 0.276 | 0.346 | 0.207 | 0.276 | 0.207 | 0.76 | 0.069 | 1.313
Ca. Pelagibacter sp. IMCC9063 | 0.069 | 0.069 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.069 | 0 | 0
Ca. Pelagibacter ubique HTCC1002 | 0.359 | 0.287 | 0.287 | 0.215 | 0.215 | 0.144 | 0.144 | 0.215 | 0 | 0 | 0 | 0 | 0.718 | 0.215 | 0.503
Ca. Pelagibacter ubique HTCC1062 | 0.739 | 0.739 | 0.369 | 0.369 | 0.591 | 0.295 | 0.295 | 0.443 | 0.222 | 0.148 | 0.148 | 0.148 | 0.96 | 0.222 | 1.108
ARCTIC96BD-19 | 1.01 | 0.79 | 1.13 | 0.79 | 0.79 | 1.01 | 0.79 | 1.01 | 1.01 | 0.68 | 0.68 | 1.35 | 1.35 | 0.23 | 4.95
SUP05 | 2.48 | 1.86 | 5.5 | 4.5 | 5.66 | 7.98 | 6.74 | 8.37 | 7.52 | 5.19 | 6.43 | 6.98 | 3.57 | 1.78 | 40.08
sulfur-oxidizing symbionts | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
Ca. Vesicomyosocius okutanii HA | 0.32 | 0.107 | 0.747 | 0.32 | 0.747 | 0.961 | 0.747 | 0.961 | 1.067 | 0.747 | 0.747 | 0.747 | 0.32 | 0 | 5.977
Ca. Ruthia magnifica str. Cm (Calyptogena magnifica) | 0.512 | 0.41 | 1.23 | 0.82 | 1.434 | 2.049 | 1.434 | 2.049 | 1.947 | 1.332 | 1.844 | 1.742 | 0.82 | 0.307 | 10.348
Endoriftia persephone 'Hot96_1+Hot96_2' | 0 | 0 | 0.016 | 0 | 0 | 0 | 0.016 | 0.016 | 0.016 | 0 | 0 | 0 | 0 | 0 | 0.031
endosymbiont of Riftia pachyptila (vent Ph05) | 0.031 | 0.031 | 0.157 | 0.126 | 0.126 | 0.251 | 0.157 | 0.251 | 0.157 | 0.126 | 0.126 | 0.189 | 0.063 | 0.063 | 0.409
endosymbiont of Tevnia jerichonana (vent Tica) | 0.031 | 0 | 0.248 | 0.031 | 0.155 | 0.341 | 0.341 | 0.433 | 0.31 | 0.248 | 0.279 | 0.341 | 0.031 | 0 | 0.96
endosymbiont of Bathymodiolus sp. | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
* completed genome not available

Table B.4: Protein naming key. Table of full names of detected proteins for nitrogen and sulfur-based energy metabolism and inorganic carbon fixation pathways depicted in Figures 3.4 and B.1, B.2, B.3.
Bold indicates that an enzyme is essential for the carbon fixation pathway to occur.

Pathway | Abbreviation | Function
Nitrogen energy metabolism
Nitrification | Amo | Ammonia monooxygenase
Anammox | Anx | hydroxylamine oxidoreductase, hydrazine oxidoreductases
Nitrification/Denitrification | Nar | Nitrate reductase
Denitrification | Nap | periplasmic nitrate reductase
Denitrification | NirS | nitrite reductase
Denitrification | NirK | copper containing nitrite reductase
Denitrification | Nor | nitric oxide reductase
Denitrification | NosZ | nitrous oxide reductase
Possible Dissimilatory nitrate reduction | HAO | Hydroxylamine oxidoreductase-like protein
Sulfur energy metabolism
sulfide oxidation | Sqr | sulfide:quinone reductase
sulfide oxidation | Fcc | Fcc flavocytochrome C
sulfide oxidation | Sox | Sox sulfide oxidation protein complex
sulfide oxidation | Dsr | dissimilatory sulfate reductase pathway
sulfide oxidation | Apr | adenylylsulfate reductase
sulfide oxidation | Sat | ATP sulfurylase
Inorganic Carbon Fixation
3-hydroxypropionate/4-hydroxybutyrate (3HP-4HB)
3HP-4HB | 3hb_dh | 3-hydroxybutyryl-CoA dehydrogenase
3HP-4HB | 3hp_dh | 3-hydroxypropionyl-CoA dehydratase
3HP-4HB | ACoA_at | acetyl-CoA acetyltransferase
3HP-4HB | ACoA_crb | acetyl-CoA carboxylase (1)
3HP-4HB | cro_hy | crotonyl-CoA hydratase
3HP-4HB | mml_epi | methylmalonyl-CoA epimerase
3HP-4HB | pro_cbx | propionyl CoA carboxylase (1)
3HP-4HB | vya_iso | vinylacetyl-CoA isomerase
Calvin-Benson-Bassham (CBB)
CBB | 6pp_kin | 6-phosphofructokinase
CBB | f6p_atr | transketolase
CBB | fbp_ase | fructose-1,6-bisphosphatase
CBB | fbp_adl | fructose-bisphosphate aldolase
CBB | gp_dh | glyceraldehyde-3-phosphate dehydrogenase
CBB | p5p_epi | ribulose-5-phosphate 3-epimerase
CBB | pg_kin | 3-phosphoglycerate kinase
CBB | ppr_kin | phosphoribulokinase
CBB | r5p_iso | ribose 5-phosphate isomerase
CBB | RuBisCO | Ribulose 1,5-bisphosphate carboxylase
CBB | tpp_iso | triosephosphate isomerase
Reductive tricarboxylic acid cycle
rTCA | oxy_syn | 2-oxoglutarate synthase
rTCA | cit_ly | aconitase B
rTCA | fum_hd | fumarate hydratase
rTCA | mal_dh | malate dehydrogenase
rTCA | pep_cbx | Phosphoenolpyruvate carboxykinase
rTCA | pep_syn | phosphoenolpyruvate synthase
rTCA | pyr_cbx | pyruvate carboxylase
rTCA | pyr_syn | pyruvate synthase
rTCA | sc_syn | succinyl-CoA synthetase
rTCA | sc_dh | Succinate dehydrogenase
Reductive Acetyl-CoA pathway
rACoA | ACoA_syn | acetyl-CoA synthetase (2)
rACoA | CO_dh | carbon-monoxide dehydrogenase (2)
rACoA | fmt_syn | formyltetrahydrofolate synthetase
rACoA | fm_dh | formate dehydrogenase
rACoA | mtf_dh | methylene-tetrahydrofolate dehydrogenase
(1) Within Nitrosopumilus maritimus these steps are proposed to be catalyzed by the same enzyme (Walker et al. 2010).
(2) These enzymes are often found as a single protein with dual functions.

Appendix C

Chapter 4: Supplementary material

Table C.1: Metagenome inventory for global fragment recruitment analysis. Inventory of environmental metagenomes with hits to Marinimicrobia SAGs in biogeography fragment recruitment analysis.

File Name | Accession | Data Repository | Location/sampleID | Specific Ecosystem | Metagenome Group | Latitude | Longitude | Depth (m) | Size (bp) | Sequencing Platform | Reference / Contact
CENF | SAMEA2619399 | ENA | DCM 004 0.22-1.6 North Atlantic Subtropical Gyre | DCM | North Atlantic Ocean | 36.5533 | -6.5669 | 40m | 4.27E+08 | Illumina | Sunagawa et al. 2015
CENG | SAMEA2591057 | ENA | SRF 007 0.22-1.6 Mediterranean Sea, Black Sea | SRF | Mediterranean Sea | 37.051 | 1.9378 | 5m | 2.25E+08 | Illumina | Sunagawa et al. 2015
CENI | SAMEA2591107 | ENA | DCM 023 0.22 Mediterranean Sea, Black Sea | DCM | Mediterranean Sea | 42.1735 | 17.7252 | 55m | 9.62E+07 | Illumina | Sunagawa et al. 2015
CENJ | SAMEA2619531 | ENA | SRF 009 0.22-1.6 Mediterranean Sea, Black Sea | SRF | Mediterranean Sea | 39.1633 | 5.916 | 5m | 4.36E+08 | Illumina | Sunagawa et al. 2015
CENN | SAMEA2591084 | ENA | SRF 023 0.22-1.6 Mediterranean Sea, Black Sea | SRF | Mediterranean Sea | 42.2038 | 17.715 | 5m | 1.97E+08 | Illumina | Sunagawa et al. 2015
CENO | SAMEA2619857 | ENA | SRF 033 0.22-1.6 Red Sea, Persian Gulf | SRF | Red Sea | 21.9467 | 38.2517 | 5m | 2.28E+08 | Illumina | Sunagawa et al. 2015
CENP | SAMEA2591122 | ENA | DCM 030 0.22-1.6 Mediterranean Sea, Black Sea | DCM | Mediterranean Sea | 33.9235 | 32.8118 | 70m | 5.23E+08 | Illumina | Sunagawa et al. 2015
CENQ | SAMEA2619802 | ENA | SRF 031 0.22-1.6 Red Sea, Persian Gulf | SRF | Red Sea | 27.16 | 34.835 | 5m | 2.90E+08 | Illumina | Sunagawa et al. 2015
CENT | SAMEA2591074 | ENA | DCM 007 0.22-1.6 Mediterranean Sea, Black Sea | DCM | Mediterranean Sea | 37.0541 | 1.9478 | 42m | 2.12E+08 | Illumina | Sunagawa et al. 2015
CENU | SAMEA2619766 | ENA | SRF 025 0.22-1.6 Mediterranean Sea, Black Sea | SRF | Mediterranean Sea | 39.3888 | 19.3905 | 5m | 4.67E+08 | Illumina | Sunagawa et al. 2015
CENW | SAMEA2619818 | ENA | SRF 032 0.22-1.6 Red Sea, Persian Gulf | SRF | Red Sea | 23.36 | 37.2183 | 5m | 2.90E+08 | Illumina | Sunagawa et al. 2015
CENX | SAMEA2619678 | ENA | DCM 018 0.22-1.6 Mediterranean Sea, Black Sea | DCM | Mediterranean Sea | 35.7528 | 14.2765 | 60m | 4.65E+08 | Illumina | Sunagawa et al. 2015
CENY | SAMEA2619840 | ENA | DCM 032 0.22-1.6 Red Sea, Persian Gulf | DCM | Red Sea | 23.4183 | 37.245 | 80m | 4.01E+08 | Illumina | Sunagawa et al. 2015
CENZ | SAMEA2619782 | ENA | DCM 025 0.22-1.6 Mediterranean Sea, Black Sea | DCM | Mediterranean Sea | 39.3991 | 19.3997 | 50m | 4.28E+08 | Illumina | Sunagawa et al. 2015
CEOC | SAMEA2619952 | ENA | DCM 036 0.22-1.6 Northwest Arabian Sea Upwelling | DCM | Indian Ocean | 20.8222 | 63.5133 | 17m | 3.30E+08 | Illumina | Sunagawa et al. 2015
CEOF | SAMEA2619667 | ENA | SRF 018 0.22-1.6 Mediterranean Sea, Black Sea | SRF | Mediterranean Sea | 35.759 | 14.2574 | 5m | 4.70E+08 | Illumina | Sunagawa et al. 2015
CEOJ | SAMEA2619879 | ENA | SRF 034 0.22-1.6 Red Sea, Persian Gulf | SRF | Red Sea | 18.3967 | 39.875 | 5m | 2.25E+08 | Illumina | Sunagawa et al. 2015
CEOK | SAMEA2591108 | ENA | SRF 030 0.22-1.6 Mediterranean Sea, Black Sea | SRF | Mediterranean Sea | 33.9179 | 32.898 | 5m | 4.77E+08 | Illumina | Sunagawa et al. 2015
CEOM | SAMEA2619376 | ENA | SRF 004 0.22-1.6 North Atlantic Subtropical Gyral | SRF | North Atlantic Ocean | 36.5533 | -6.5669 | 5m | 3.91E+08 | Illumina | Sunagawa et al. 2015
CEOO | SAMEA2619927 | ENA | SRF 036 0.22-1.6 Northwest Arabian Sea Upwelling | SRF | Indian Ocean | 20.8183 | 63.5047 | 5m | 2.16E+08 | Illumina | Sunagawa et al. 2015
CEOP | SAMEA2619548 | ENA | DCM 009 0.22-1.6 Mediterranean Sea, Black Sea | DCM | Mediterranean Sea | 39.0609 | 5.9422 | 55m | 4.91E+08 | Illumina | Sunagawa et al. 2015
CEOQ | SAMEA2619907 | ENA | DCM 034 0.22-1.6 Red Sea, Persian Gulf | DCM | Red Sea | 18.4417 | 39.8567 | 60m | 6.06E+08 | Illumina | Sunagawa et al. 2015
CEOR | SAMEA2620836 | ENA | DCM 064 0.1-0.22 Eastern Africa Coastal | DCM | Indian Ocean | -29.5333 | 37.9117 | 65m | 1.16E+08 | Illumina | Sunagawa et al. 2015
CEOS | SAMEA2620666 | ENA | MES 056 0.22-3 Eastern Africa Coastal | MES | Indian Ocean | -15.3379 | 43.2948 | 1000m | 2.53E+08 | Illumina | Sunagawa et al. 2015
CEOV | SAMEA2620756 | ENA | SRF 062 0.22-3 Eastern Africa Coastal | SRF | Indian Ocean | -22.3368 | 40.3412 | 5m | 2.33E+08 | Illumina | Sunagawa et al. 2015
CEOW | SAMEA2620734 | ENA | DCM 058 0.22-3 Eastern Africa Coastal | DCM | Indian Ocean | -17.2855 | 42.2866 | 66m | 2.64E+08 | Illumina | Sunagawa et al. 2015
CEOX | SAMEA2620890 | ENA | DCM 065 0.22-3 Eastern Africa Coastal | DCM | Indian Ocean | -35.2421 | 26.3048 | 30m | 2.05E+08 | Illumina | Sunagawa et al. 2015
CEOY | SAMEA2620651 | ENA | SRF 056 0.22-3 Eastern Africa Coastal | SRF | Indian Ocean | -15.3424 | 43.2965 | 5m | 1.99E+08 | Illumina | Sunagawa et al. 2015
CEOZ | SAMEA2620786 | ENA | SRF 064 0.22-3 Eastern Africa Coastal | SRF | Indian Ocean | -29.5019 | 37.9889 | 5m | 4.11E+08 | Illumina | Sunagawa et al. 2015
CEPA | SAMEA2620542 | ENA | SRF 052 0.22-1.6 Indian South Subtropical Gyre | SRF | Indian Ocean | -16.957 | 53.9801 | 5m | 3.11E+08 | Illumina | Sunagawa et al. 2015
CEPB | SAMEA2620672 | ENA | SRF 057 0.22-3 Eastern Africa Coastal | SRF | Indian Ocean | -17.0248 | 42.7401 | 5m | 2.20E+08 | Illumina | Sunagawa et al. 2015
CEPC | SAMEA2620828 | ENA | DCM 064 0.22-3 Eastern Africa Coastal | DCM | Indian Ocean | -29.5333 | 37.9117 | 65m | 2.58E+08 | Illumina | Sunagawa et al. 2015
CEPD | SAMEA2620815 | ENA | MES 064 0.22-3 Eastern Africa Coastal | MES | Indian Ocean | -29.5046 | 37.9599 | 1000m | 1.36E+08 | Illumina | Sunagawa et al. 2015
CEPJ | SAMEA2620404 | ENA | SRF 048 0.22-1.6 Indian South Subtropical Gyre | SRF | Indian Ocean | -9.3921 | 66.4228 | 5m | 2.84E+08 | Illumina | Sunagawa et al.
2015CEPK SAMEA2620259 ENA DCM 042 0.22-1.6 Indian Monsoon Gyres DCM Indian Ocean 5.9998 73.9067 80m 3.65E+08 Illumina Sunagawa et al. 2015CEPL SAMEA2620339 ENA SRF 045 0.22-1.6 Indian Monsoon Gyres SRF Indian Ocean 0.0033 71.6428 5m 2.37E+08 Illumina Sunagawa et al. 2015CEPS SAMEA2620035 ENA MES 038 0.22-1.6 Indian Monsoon Gyres MES Indian Ocean 19.0351 64.5638 340m 2.19E+08 Illumina Sunagawa et al. 2015CEPT SAMEA2620081 ENA DCM 039 0.22-1.6 Indian Monsoon Gyres DCM Indian Ocean 18.5839 66.4727 25m 3.04E+08 Illumina Sunagawa et al. 2015CEPU SAMEA2620194 ENA SRF 041 0.22-1.6 Indian Monsoon Gyres SRF Indian Ocean 14.6059 69.9776 5m 2.74E+08 Illumina Sunagawa et al. 2015CEPV SAMEA2620097 ENA MES 039 0.22-1.6 Indian Monsoon Gyres MES Indian Ocean 18.7341 66.3896 270m 2.61E+08 Illumina Sunagawa et al. 2015CEPW SAMEA2620021 ENA DCM 038 0.22-1.6 Indian Monsoon Gyres DCM Indian Ocean 19.0284 64.5126 25m 3.38E+08 Illumina Sunagawa et al. 2015CEPX SAMEA2619974 ENA MES 037 0.1-0.22 Northwest Arabian Sea Upwelling MES Indian Ocean 20.8457 63.5851 600m 5.00E+08 Illumina Sunagawa et al. 2015CEPZ SAMEA2620106 ENA MES 039 0.1-0.22 Indian Monsoon Gyres MES Indian Ocean 18.7341 66.3896 270m 2.52E+08 Illumina Sunagawa et al. 2015CEQA SAMEA2620227 ENA DCM 041 0.22 Indian Monsoon Gyres DCM Indian Ocean 14.5536 70.0128 60m 1.38E+08 Illumina Sunagawa et al. 2015CEQD SAMEA2620230 ENA SRF 042 0.22-1.6 Indian Monsoon Gyres SRF Indian Ocean 6.0001 73.8955 5m 2.42E+08 Illumina Sunagawa et al. 2015CEQE SAMEA2619970 ENA MES 037 0.22-1.6 Northwest Arabian Sea Upwelling MES Indian Ocean 20.8457 63.5851 600m 3.84E+08 Illumina Sunagawa et al. 2015CEQF SAMEA2621003 ENA MES 068 0.45-0.8 South Atlantic Gyral MES South Atlantic Ocean -31.0198 4.6685 700m 2.59E+08 Illumina Sunagawa et al. 2015CEQG SAMEA2621242 ENA MES 076 0.45-0.8 South Atlantic Gyral MES South Atlantic Ocean -20.9315 -35.1794 800m 3.21E+08 Illumina Sunagawa et al. 
2015CEQH SAMEA2621232 ENA MES 076 0.22-3 South Atlantic Gyral MES South Atlantic Ocean -20.9315 -35.1794 800m 2.91E+08 Illumina Sunagawa et al. 2015CEQI SAMEA2621204 ENA SRF 076 0.45-0.8 South Atlantic Gyral SRF South Atlantic Ocean -20.9354 -35.1803 5m 2.00E+08 Illumina Sunagawa et al. 2015179C.1 Metagenome inventory for global fragment recruitment continued from previous pageFile Name Accession Data Repository Location/sampleID Specific Ecosystem Metagenome Group Latitude Longitude Depth (m) Size (bp) Sequencing Platform Reference / ContactCEQJ SAMEA2621198 ENA SRF 076 0.22-3 South Atlantic Gyral SRF South Atlantic Ocean -20.9354 -35.1803 5m 1.10E+08 Illumina Sunagawa et al. 2015CEQL SAMEA2621203 ENA SRF 076 0.22-0.45 South Atlantic Gyral SRF South Atlantic Ocean -20.9354 -35.1803 5m 3.28E+08 Illumina Sunagawa et al. 2015CEQM SAMEA2621278 ENA DCM 078 0.45-0.8 South Atlantic Gyral DCM South Atlantic Ocean -30.1484 -43.2705 120m 2.69E+08 Illumina Sunagawa et al. 2015CEQN SAMEA2621277 ENA DCM 078 0.22-0.45 South Atlantic Gyral DCM South Atlantic Ocean -30.1484 -43.2705 120m 3.75E+08 Illumina Sunagawa et al. 2015CEQO SAMEA2621272 ENA DCM 078 0.22-3 South Atlantic Gyral DCM South Atlantic Ocean -30.1484 -43.2705 120m 2.82E+08 Illumina Sunagawa et al. 2015CEQP SAMEA2621221 ENA DCM 076 0.22-0.45 South Atlantic Gyral DCM South Atlantic Ocean -21.0292 -35.3498 150m 3.07E+08 Illumina Sunagawa et al. 2015CEQQ SAMEA2621222 ENA DCM 076 0.45-0.8 South Atlantic Gyral DCM South Atlantic Ocean -21.0292 -35.3498 150m 2.50E+08 Illumina Sunagawa et al. 2015CEQR SAMEA2621085 ENA SRF 070 0.22 South Atlantic Gyral SRF South Atlantic Ocean -20.4091 -3.1759 5m 1.13E+08 Illumina Sunagawa et al. 2015CEQS SAMEA2621176 ENA MES 072 0.22-3 South Atlantic Gyral MES South Atlantic Ocean -8.7986 -17.9034 800m 1.65E+08 Illumina Sunagawa et al. 2015CEQT SAMEA2621132 ENA SRF 072 0.22-3 South Atlantic Gyral SRF South Atlantic Ocean -8.7789 -17.9099 5m 2.77E+08 Illumina Sunagawa et al. 
2015CEQU SAMEA2621216 ENA DCM 076 0.22-3 South Atlantic Gyral DCM South Atlantic Ocean -21.0292 -35.3498 150m 2.77E+08 Illumina Sunagawa et al. 2015CEQW SAMEA2621066 ENA SRF 070 0.22-3 South Atlantic Gyral SRF South Atlantic Ocean -20.4091 -3.1759 5m 1.31E+08 Illumina Sunagawa et al. 2015CEQX SAMEA2621076 ENA SRF 070 0.45-0.8 South Atlantic Gyral SRF South Atlantic Ocean -20.4091 -3.1759 5m 3.69E+08 Illumina Sunagawa et al. 2015CEQY SAMEA2621075 ENA SRF 070 0.22-0.45 South Atlantic Gyral SRF South Atlantic Ocean -20.4091 -3.1759 5m 4.03E+08 Illumina Sunagawa et al. 2015CERB SAMEA2621155 ENA DCM 072 0.22-3 South Atlantic Gyral DCM South Atlantic Ocean -8.7296 -17.9604 100m 3.43E+08 Illumina Sunagawa et al. 2015CERC SAMEA2621099 ENA MES 070 0.22-0.45 South Atlantic Gyral MES South Atlantic Ocean -20.4075 -3.1641 800m 1.86E+08 Illumina Sunagawa et al. 2015CERD SAMEA2621101 ENA MES 070 0.45-0.8 South Atlantic Gyral MES South Atlantic Ocean -20.4075 -3.1641 800m 2.81E+08 Illumina Sunagawa et al. 2015CERE SAMEA2621092 ENA MES 070 0.22-3 South Atlantic Gyral MES South Atlantic Ocean -20.4075 -3.1641 800m 2.26E+08 Illumina Sunagawa et al. 2015CERG SAMEA2621037 ENA DCM 068 0.22-3 South Atlantic Gyral DCM South Atlantic Ocean -31.027 4.6802 50m 1.85E+08 Illumina Sunagawa et al. 2015CERI SAMEA2620979 ENA SRF 067 0.22-0.45 Benguela Current Coastal SRF South Atlantic Ocean -32.2401 17.7103 5m 3.27E+08 Illumina Sunagawa et al. 2015CERJ SAMEA2620967 ENA DCM 066 0.22 Benguela Current Coastal DCM South Atlantic Ocean -34.8901 18.0459 30m 8.73E+07 Illumina Sunagawa et al. 2015CERK SAMEA2620950 ENA DCM 066 0.22-3 Benguela Current Coastal DCM South Atlantic Ocean -34.8901 18.0459 30m 1.29E+08 Illumina Sunagawa et al. 2015CERL SAMEA2620995 ENA MES 068 0.22-3 South Atlantic Gyral MES South Atlantic Ocean -31.0198 4.6685 700m 2.38E+08 Illumina Sunagawa et al. 
2015CERM SAMEA2620970 ENA SRF 067 0.22-3 Benguela Current Coastal SRF South Atlantic Ocean -32.2401 17.7103 5m 2.55E+08 Illumina Sunagawa et al. 2015CERN SAMEA2620947 ENA SRF 066 0.22 Benguela Current Coastal SRF South Atlantic Ocean -34.9449 17.9189 5m 1.03E+08 Illumina Sunagawa et al. 2015CERO SAMEA2621033 ENA SRF 068 0.22 South Atlantic Gyral SRF South Atlantic Ocean -31.0266 4.665 5m 1.00E+08 Illumina Sunagawa et al. 2015CERP SAMEA2620991 ENA SRF 067 0.22 Benguela Current Coastal SRF South Atlantic Ocean -32.2401 17.7103 5m 1.10E+08 Illumina Sunagawa et al. 2015CERQ SAMEA2620882 ENA MES 065 0.22-3 Eastern Africa Coastal MES Indian Ocean -35.1889 26.2905 850m 2.46E+08 Illumina Sunagawa et al. 2015CERR SAMEA2621021 ENA SRF 068 0.45-0.8 South Atlantic Gyral SRF South Atlantic Ocean -31.0266 4.665 5m 2.91E+08 Illumina Sunagawa et al. 2015CERS SAMEA2621020 ENA SRF 068 0.22-0.45 South Atlantic Gyral SRF South Atlantic Ocean -31.0266 4.665 5m 3.16E+08 Illumina Sunagawa et al. 2015CERU SAMEA2620925 ENA DCM 065 0.22 Eastern Africa Coastal DCM Indian Ocean -35.2421 26.3048 30m 1.24E+08 Illumina Sunagawa et al. 2015CERV SAMEA2621045 ENA DCM 068 0.45-0.8 South Atlantic Gyral DCM South Atlantic Ocean -31.027 4.6802 50m 2.67E+08 Illumina Sunagawa et al. 2015CERW SAMEA2620929 ENA SRF 066 0.22-3 Benguela Current Coastal SRF South Atlantic Ocean -34.9449 17.9189 5m 2.59E+08 Illumina Sunagawa et al. 2015CERX SAMEA2621044 ENA DCM 068 0.22-0.45 South Atlantic Gyral DCM South Atlantic Ocean -31.027 4.6802 50m 2.96E+08 Illumina Sunagawa et al. 2015CERZ SAMEA2621448 ENA DCM 082 0.22 Southwest Atlantic Shelves DCM South Atlantic Ocean -47.2007 -57.9446 40m 3.41E+07 Illumina Sunagawa et al. 2015CESA SAMEA2621254 ENA SRF 078 0.22-3 South Atlantic Gyral SRF South Atlantic Ocean -30.1367 -43.2899 5m 2.11E+08 Illumina Sunagawa et al. 2015CESB SAMEA2621259 ENA SRF 078 0.22-0.45 South Atlantic Gyral SRF South Atlantic Ocean -30.1367 -43.2899 5m 2.70E+08 Illumina Sunagawa et al. 
2015CESC SAMEA2621295 ENA MES 078 0.45-0.8 South Atlantic Gyral MES South Atlantic Ocean -30.1471 -43.2915 800m 2.07E+08 Illumina Sunagawa et al. 2015CESD SAMEA2622021 ENA DCM 098 0.22-3 South Pacific Subtropical Gyre DCM South Pacific Ocean -25.826 -111.7294 188m 2.60E+08 Illumina Sunagawa et al. 2015CESF SAMEA2621779 ENA SRF 093 0.22-3 Chile-Peru Current Coastal SRF South Pacific Ocean -34.0614 -73.1066 5m 3.45E+08 Illumina Sunagawa et al. 2015CESG SAMEA2621287 ENA MES 078 0.22-3 South Atlantic Gyral MES South Atlantic Ocean -30.1471 -43.2915 800m 2.38E+08 Illumina Sunagawa et al. 2015CESI SAMEA2621423 ENA DCM 082 0.22-3 Southwest Atlantic Shelves DCM South Atlantic Ocean -47.2007 -57.9446 40m 3.54E+08 Illumina Sunagawa et al. 2015CESJ SAMEA2621487 ENA SRF 084 0.22-3 Antarctic SRF Southern Ocean -60.2287 -60.6476 5m 2.96E+08 Illumina Sunagawa et al. 2015CESL SAMEA2622197 ENA MES 102 0.22-3 Pacific Equatorial Divergence MES South Pacific Ocean -5.261 -85.1678 480m 2.66E+08 Illumina Sunagawa et al. 2015CESM SAMEA2622219 ENA DCM 102 0.22-3 Pacific Equatorial Divergence DCM South Pacific Ocean -5.2669 -85.2732 40m 5.38E+08 Illumina Sunagawa et al. 2015CESN SAMEA2621990 ENA SRF 098 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -25.8051 -111.7202 5m 1.89E+08 Illumina Sunagawa et al. 2015CESO SAMEA2621812 ENA DCM 093 0.22-3 Chile-Peru Current Coastal DCM South Pacific Ocean -33.9116 -73.0537 35m 3.71E+08 Illumina Sunagawa et al. 2015CESP SAMEA2622074 ENA SRF 099 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -21.146 -104.787 5m #VALUE! Illumina Sunagawa et al. 2015CESQ SAMEA2622119 ENA DCM 100 0.22-3 South Pacific Subtropical Gyre DCM South Pacific Ocean -12.9723 -96.0122 50m 4.51E+08 Illumina Sunagawa et al. 2015CESR SAMEA2622149 ENA MES 100 0.22-3 South Pacific Subtropical Gyre MES South Pacific Ocean -12.9794 -96.0232 177m 4.19E+08 Illumina Sunagawa et al. 
2015180C.1 Metagenome inventory for global fragment recruitment continued from previous pageFile Name Accession Data Repository Location/sampleID Specific Ecosystem Metagenome Group Latitude Longitude Depth (m) Size (bp) Sequencing Platform Reference / ContactCESS SAMEA2591098 ENA DCM 023 0.22-1.6 Mediterranean Sea, Black Sea DCM Mediterranean Sea 42.1735 17.7252 55m 2.15E+08 Illumina Sunagawa et al. 2015CEST SAMEA2621509 ENA SRF 085 0.22-3 Antarctic SRF Southern Ocean -62.0385 -49.529 5m 1.93E+08 Illumina Sunagawa et al. 2015CESV SAMEA2622097 ENA SRF 100 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -13.0023 -95.9759 5m 4.17E+08 Illumina Sunagawa et al. 2015CESW SAMEA2621859 ENA SRF 096 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -29.7238 -101.1604 5m 2.81E+08 Illumina Sunagawa et al. 2015CESY SAMEA2622048 ENA MES 098 0.22-3 South Pacific Subtropical Gyre MES South Pacific Ocean -25.8076 -111.6906 488m 2.60E+08 Illumina Sunagawa et al. 2015CESZ SAMEA2621260 ENA SRF 078 0.45-0.8 South Atlantic Gyral SRF South Atlantic Ocean -30.1367 -43.2899 5m 2.06E+08 Illumina Sunagawa et al. 2015CETA SAMEA2622362 ENA MES 109 0.22-3 Chile-Peru Current Coastal MES North Pacific Ocean 2.0649 -84.5546 380m 2.65E+08 Illumina Sunagawa et al. 2015CETB SAMEA2622545 ENA DCM 112 0.22-3 South Pacific Subtropical Gyre DCM South Pacific Ocean -23.2189 -129.4997 155m 3.66E+08 Illumina Sunagawa et al. 2015CETC SAMEA2622694 ENA DCM 122 0.1-0.22 South Pacific Subtropical Gyre DCM South Pacific Ocean -9.0063 -139.1394 115m 2.02E+08 Illumina Sunagawa et al. 2015CETD SAMEA2622402 ENA DCM 110 0.22-3 South Pacific Subtropical Gyre DCM South Pacific Ocean -1.9002 -84.6265 50m 3.71E+08 Illumina Sunagawa et al. 2015CETE SAMEA2622336 ENA DCM 109 0.22-3 Chile-Peru Current Coastal DCM North Pacific Ocean 2.0299 -84.5546 30m 2.80E+08 Illumina Sunagawa et al. 
2015CETG SAMEA2622696 ENA DCM 122 0.45-0.8 South Pacific Subtropical Gyre DCM South Pacific Ocean -9.0063 -139.1394 115m 4.76E+08 Illumina Sunagawa et al. 2015CETH SAMEA2622695 ENA DCM 122 0.22-0.45 South Pacific Subtropical Gyre DCM South Pacific Ocean -9.0063 -139.1394 115m 5.53E+08 Illumina Sunagawa et al. 2015CETI SAMEA2622478 ENA DCM 111 0.22-3 South Pacific Subtropical Gyre DCM South Pacific Ocean -16.9587 -100.6751 90m 5.02E+08 Illumina Sunagawa et al. 2015CETJ SAMEA2622677 ENA MES 122 0.22-0.45 South Pacific Subtropical Gyre MES South Pacific Ocean -8.9729 -139.2393 600m 2.00E+08 Illumina Sunagawa et al. 2015CETK SAMEA2622452 ENA SRF 111 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -16.9601 -100.6335 5m 3.91E+08 Illumina Sunagawa et al. 2015CETL SAMEA2622518 ENA SRF 112 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -23.2811 -129.3947 5m 3.13E+08 Illumina Sunagawa et al. 2015CETM SAMEA2622690 ENA DCM 122 0.22-3 South Pacific Subtropical Gyre DCM South Pacific Ocean -9.0063 -139.1394 115m 5.46E+08 Illumina Sunagawa et al. 2015CETO SAMEA2622678 ENA MES 122 0.45-0.8 South Pacific Subtropical Gyre MES South Pacific Ocean -8.9729 -139.2393 600m 1.98E+08 Illumina Sunagawa et al. 2015CETP SAMEA2622657 ENA SRF 122 0.22-0.45 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9971 -139.1963 5m 1.98E+08 Illumina Sunagawa et al. 2015CETQ SAMEA2622376 ENA SRF 110 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -2.0133 -84.589 5m 2.76E+08 Illumina Sunagawa et al. 2015CETR SAMEA2622568 ENA MES 112 0.22-3 South Pacific Subtropical Gyre MES South Pacific Ocean -23.2232 -129.5986 696m 2.87E+08 Illumina Sunagawa et al. 2015CETS SAMEA2622673 ENA MES 122 0.22-3 South Pacific Subtropical Gyre MES South Pacific Ocean -8.9729 -139.2393 600m 3.08E+08 Illumina Sunagawa et al. 2015CETT SAMEA2622429 ENA MES 110 0.22-3 South Pacific Subtropical Gyre MES South Pacific Ocean -1.8902 -84.6141 380m 3.07E+08 Illumina Sunagawa et al. 
2015CETU SAMEA2623488 ENA DCM 142 0.22-3 Caribbean DCM North Atlantic Ocean 25.6168 -88.4532 125m 3.58E+08 Illumina Sunagawa et al. 2015CETV SAMEA2623155 ENA MES 133 0.22-3 North Pacific Subtropical and Polar Fronts MES North Pacific Ocean 35.2698 -127.7268 650m 2.74E+08 Illumina Sunagawa et al. 2015CETW SAMEA2620570 ENA DCM 052 0.22-1.6 Indian South Subtropical Gyre DCM Indian Ocean -16.9534 53.9601 75m 3.48E+08 Illumina Sunagawa et al. 2015CETX SAMEA2623390 ENA MES 138 0.22-3 North Pacific Equatorial Countercurrent MES North Pacific Ocean 6.3559 -103.0598 450m 3.57E+08 Illumina Sunagawa et al. 2015CETY SAMEA2623135 ENA DCM 133 0.22-3 North Pacific Subtropical and Polar Fronts DCM North Pacific Ocean 35.4002 -127.7499 45m 4.87E+08 Illumina Sunagawa et al. 2015CETZ SAMEA2623098 ENA MES 132 0.22-3 North Pacific Subtropical and Polar Fronts MES North Pacific Ocean 31.528 -159.0224 550m 2.60E+08 Illumina Sunagawa et al. 2015CEUA SAMEA2623079 ENA DCM 132 0.22-3 North Pacific Subtropical and Polar Fronts DCM North Pacific Ocean 31.5168 -159.046 115m 4.79E+08 Illumina Sunagawa et al. 2015CEUB SAMEA2623370 ENA DCM 138 0.22-3 North Pacific Equatorial Countercurrent DCM North Pacific Ocean 6.3378 -102.9538 60m 3.51E+08 Illumina Sunagawa et al. 2015CEUC SAMEA2623295 ENA DCM 137 0.22-3 North Pacific Equatorial Countercurrent DCM North Pacific Ocean 14.2075 -116.6468 40m 4.09E+08 Illumina Sunagawa et al. 2015CEUD SAMEA2623314 ENA MES 137 0.22-3 North Pacific Equatorial Countercurrent MES North Pacific Ocean 14.2025 -116.6433 375m 3.80E+08 Illumina Sunagawa et al. 2015CEUE SAMEA2623116 ENA SRF 133 0.22-3 North Pacific Subtropical and Polar Fronts SRF North Pacific Ocean 35.3671 -127.7422 5m 6.25E+08 Illumina Sunagawa et al. 2015CEUF SAMEA2623350 ENA SRF 138 0.22-3 North Pacific Equatorial Countercurrent SRF North Pacific Ocean 6.3332 -102.9432 5m 2.75E+08 Illumina Sunagawa et al. 
2015CEUG SAMEA2622901 ENA SRF 128 0.22-3 Pacific Equatorial Divergence SRF South Pacific Ocean 0.0003 -153.6759 5m 2.67E+08 Illumina Sunagawa et al. 2015CEUH SAMEA2623275 ENA SRF 137 0.22-3 North Pacific Equatorial Countercurrent SRF North Pacific Ocean 14.2035 -116.6261 5m 3.13E+08 Illumina Sunagawa et al. 2015CEUJ SAMEA2622842 ENA MIX 125 0.22-0.45 South Pacific Subtropical Gyre MIX South Pacific Ocean -8.8999 -142.5461 140m 4.16E+08 Illumina Sunagawa et al. 2015CEUK SAMEA2622843 ENA MIX 125 0.45-0.8 South Pacific Subtropical Gyre MIX South Pacific Ocean -8.8999 -142.5461 140m 4.15E+08 Illumina Sunagawa et al. 2015CEUL SAMEA2622800 ENA MIX 124 0.22-0.45 South Pacific Subtropical Gyre MIX South Pacific Ocean -9.0714 -140.5973 120m 4.15E+08 Illumina Sunagawa et al. 2015CEUM SAMEA2622716 ENA SRF 123 0.45-0.8 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9068 -140.283 5m 3.99E+08 Illumina Sunagawa et al. 2015CEUN SAMEA2622801 ENA MIX 124 0.45-0.8 South Pacific Subtropical Gyre MIX South Pacific Ocean -9.0714 -140.5973 120m 5.25E+08 Illumina Sunagawa et al. 2015CEUO SAMEA2622799 ENA MIX 124 0.1-0.22 South Pacific Subtropical Gyre MIX South Pacific Ocean -9.0714 -140.5973 120m 2.52E+08 Illumina Sunagawa et al. 2015CEUP SAMEA2622738 ENA MIX 123 0.45-0.8 South Pacific Subtropical Gyre MIX South Pacific Ocean -8.9109 -140.2845 150m 1.27E+08 Illumina Sunagawa et al. 2015CEUQ SAMEA2622796 ENA MIX 124 0.22-3 South Pacific Subtropical Gyre MIX South Pacific Ocean -9.0714 -140.5973 120m 5.25E+08 Illumina Sunagawa et al. 2015CEUR SAMEA2622715 ENA SRF 123 0.22-0.45 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9068 -140.283 5m 3.30E+08 Illumina Sunagawa et al. 2015CEUS SAMEA2622837 ENA MIX 125 0.22-3 South Pacific Subtropical Gyre MIX South Pacific Ocean -8.8999 -142.5461 140m 6.69E+08 Illumina Sunagawa et al. 
2015CEUT SAMEA2622821 ENA SRF 125 0.1-0.22 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9111 -142.5571 5m 1.94E+08 Illumina Sunagawa et al. 2015CEUU SAMEA2622658 ENA SRF 122 0.45-0.8 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9971 -139.1963 5m 2.67E+08 Illumina Sunagawa et al. 2015181C.1 Metagenome inventory for global fragment recruitment continued from previous pageFile Name Accession Data Repository Location/sampleID Specific Ecosystem Metagenome Group Latitude Longitude Depth (m) Size (bp) Sequencing Platform Reference / ContactCEUV SAMEA2622710 ENA SRF 123 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9068 -140.283 5m 3.58E+08 Illumina Sunagawa et al. 2015CEUW SAMEA2622652 ENA SRF 122 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9971 -139.1963 5m 2.47E+08 Illumina Sunagawa et al. 2015CEUY SAMEA2622737 ENA MIX 123 0.22-0.45 South Pacific Subtropical Gyre MIX South Pacific Ocean -8.9109 -140.2845 150m 5.50E+08 Illumina Sunagawa et al. 2015CEVB SAMEA2622763 ENA SRF 124 0.1-0.22 South Pacific Subtropical Gyre SRF South Pacific Ocean -9.1504 -140.5216 5m 1.98E+08 Illumina Sunagawa et al. 2015CEVC SAMEA2622817 ENA SRF 125 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -8.9111 -142.5571 5m 3.72E+08 Illumina Sunagawa et al. 2015CEVD SAMEA2622733 ENA MIX 123 0.22-3 South Pacific Subtropical Gyre MIX South Pacific Ocean -8.9109 -140.2845 150m 6.12E+08 Illumina Sunagawa et al. 2015CEVE SAMEA2622764 ENA SRF 124 0.22-0.45 South Pacific Subtropical Gyre SRF South Pacific Ocean -9.1504 -140.5216 5m 3.02E+08 Illumina Sunagawa et al. 2015CEVF SAMEA2622765 ENA SRF 124 0.45-0.8 South Pacific Subtropical Gyre SRF South Pacific Ocean -9.1504 -140.5216 5m 3.71E+08 Illumina Sunagawa et al. 2015CEVG SAMEA2623426 ENA SRF 140 0.22-3 Central American Coastal SRF North Pacific Ocean 7.4122 -79.3017 5m 3.43E+08 Illumina Sunagawa et al. 
2015CEVH SAMEA2623446 ENA SRF 141 0.22-3 Guianas Coastal SRF North Atlantic Ocean 9.8481 -80.0454 5m 3.53E+08 Illumina Sunagawa et al. 2015CEVI SAMEA2623513 ENA MES 142 0.22-3 Caribbean MES North Atlantic Ocean 25.6236 -88.45 640m 2.30E+08 Illumina Sunagawa et al. 2015CEVJ SAMEA2623693 ENA MES 146 0.22-3 North Atlantic Subtropical Gyral MES North Atlantic Ocean 34.6663 -71.2907 640m 1.80E+08 Illumina Sunagawa et al. 2015CEVK SAMEA2623794 ENA MES 149 0.22-3 North Atlantic Subtropical Gyral MES North Atlantic Ocean 34.0771 -49.8233 740m 2.04E+08 Illumina Sunagawa et al. 2015CEVL SAMEA2623673 ENA SRF 146 0.22-3 North Atlantic Subtropical Gyral SRF North Atlantic Ocean 34.6712 -71.3093 5m 3.91E+08 Illumina Sunagawa et al. 2015CEVM SAMEA2623649 ENA MES 145 0.22-3 Gulf Stream MES North Atlantic Ocean 39.2392 -70.0343 590m 2.80E+08 Illumina Sunagawa et al. 2015CEVN SAMEA2623463 ENA SRF 142 0.22-3 Caribbean SRF North Atlantic Ocean 25.5264 -88.394 5m 3.89E+08 Illumina Sunagawa et al. 2015CEVO SAMEA2623907 ENA MES 152 0.22-3 North Atlantic Subtropical Gyral MES North Atlantic Ocean 43.7182 -16.8714 800m 2.49E+08 Illumina Sunagawa et al. 2015CEVP SAMEA2623808 ENA SRF 150 0.22-3 North Atlantic Subtropical Gyral SRF North Atlantic Ocean 35.9346 -37.3032 5m 3.44E+08 Illumina Sunagawa et al. 2015CEVQ SAMEA2623756 ENA MES 148b 0.22-3 North Atlantic Subtropical Gyral MES North Atlantic Ocean 34.1504 -56.9684 250m 3.75E+08 Illumina Sunagawa et al. 2015CEVR SAMEA2623734 ENA SRF 148 0.22-3 North Atlantic Subtropical Gyral SRF North Atlantic Ocean 31.6948 -64.2489 5m 3.51E+08 Illumina Sunagawa et al. 2015CEVS SAMEA2623850 ENA SRF 151 0.22-3 North Atlantic Subtropical Gyral SRF North Atlantic Ocean 36.1715 -29.023 5m 4.00E+08 Illumina Sunagawa et al. 2015CEVT SAMEA2623826 ENA DCM 150 0.22-3 North Atlantic Subtropical Gyral DCM North Atlantic Ocean 35.8427 -37.1526 40m 3.72E+08 Illumina Sunagawa et al. 
2015CEVU SAMEA2623868 ENA DCM 151 0.22-3 North Atlantic Subtropical Gyral DCM North Atlantic Ocean 36.1811 -28.9373 80m 3.63E+08 Illumina Sunagawa et al. 2015CEVV SAMEA2623919 ENA MIX 152 0.22-3 North Atlantic Subtropical Gyral MIX North Atlantic Ocean 43.7056 -16.8794 25m 3.81E+08 Illumina Sunagawa et al. 2015CEVW SAMEA2623774 ENA SRF 149 0.22-3 North Atlantic Subtropical Gyral SRF North Atlantic Ocean 34.1132 -49.9181 5m 3.73E+08 Illumina Sunagawa et al. 2015CEVX SAMEA2623886 ENA SRF 152 0.22-3 North Atlantic Subtropical Gyral SRF North Atlantic Ocean 43.6792 -16.8344 5m 3.43E+08 Illumina Sunagawa et al. 2015CEVY SAMEA2620855 ENA SRF 065 0.22-3 Eastern Africa Coastal SRF Indian Ocean -35.1728 26.2868 5m 1.85E+08 Illumina Sunagawa et al. 2015CEVZ SAMEA2620217 ENA DCM 041 0.22-1.6 Indian Monsoon Gyres DCM Indian Ocean 14.5536 70.0128 60m 4.64E+08 Illumina Sunagawa et al. 2015CEWA SAMEA2620000 ENA SRF 038 0.22-1.6 Indian Monsoon Gyres SRF Indian Ocean 19.0393 64.4913 5m 2.00E+08 Illumina Sunagawa et al. 2015CEWB SAMEA2621013 ENA SRF 068 0.22-3 South Atlantic Gyral SRF South Atlantic Ocean -31.0266 4.665 5m 1.97E+08 Illumina Sunagawa et al. 2015CEWE SAMEA2622759 ENA SRF 124 0.22-3 South Pacific Subtropical Gyre SRF South Pacific Ocean -9.1504 -140.5216 5m 6.14E+08 Illumina Sunagawa et al. 2015CEWG SAMEA2622173 ENA SRF 102 0.22-3 Pacific Equatorial Divergence SRF South Pacific Ocean -5.2529 -85.1545 5m 4.27E+08 Illumina Sunagawa et al. 2015CEWH SAMEA2622316 ENA SRF 109 0.22-3 Chile-Peru Current Coastal SRF North Pacific Ocean 1.9928 -84.5766 5m 2.82E+08 Illumina Sunagawa et al. 2015CEWI SAMEA2622499 ENA MES 111 0.22-3 South Pacific Subtropical Gyre MES South Pacific Ocean -16.9486 -100.6715 350m 2.63E+08 Illumina Sunagawa et al. 2015CEWJ SAMEA2622923 ENA DCM 128 0.22-3 Pacific Equatorial Divergence DCM South Pacific Ocean 0.0222 -153.6858 40m 3.11E+08 Illumina Sunagawa et al. 
2015CEWK SAMEA2623059 ENA SRF 132 0.22-3 North Pacific Subtropical and Polar SRF North Pacific Ocean 31.5213 -158.9958 5m 2.78E+08 Illumina Sunagawa et al. 2015CEWO SAMEA2623627 ENA SRF 145 0.22-3 Gulf Stream SRF North Atlantic Ocean 39.2305 -70.0377 5m 4.23E+08 Illumina Sunagawa et al. 2015CEWP SAMEA2620980 ENA SRF 067 0.45-0.8 Benguela Current Coastal SRF South Atlantic Ocean -32.2401 17.7103 5m 4.14E+08 Illumina Sunagawa et al. 2015CEWR SAMEA2621551 ENA MES 085 0.22-3 Antarctic MES Southern Ocean -61.9689 -49.5017 790m 3.59E+08 Illumina Sunagawa et al. 2015engcyc 2081372001 2081372001 IMG/M taxon oid Deepwater Horizon Oil Spill Oil-contaminated Gulf of Mexico 28.672222 -88.4375 unknown 2.38E+07 Illumina Mason et al. 2012 Janet Janssonengcyc 2088090017 2088090017 IMG/M taxon oid Deepwater Horizon Oil Spill Oil-contaminated Gulf of Mexico 28.672222 -88.4375 unknown 2.53E+07 Illumina Mason et al. 2012 Janet Janssonengcyc 2149837025 2149837025 IMG/M taxon oid Gulf of Mexico Black smokers Gulf of Mexico 28.716667 -88.466667 1250m 5.25E+07 Illumina Adam R. Riversengcyc 2149837027 2149837027 IMG/M taxon oid Gulf of Mexico Black smokers Gulf of Mexico 28.716667 -88.466667 1300m 4.36E+07 Illumina Adam R. Riversengcyc 2149837028 2149837028 IMG/M taxon oid Gulf of Mexico Black smokers Gulf of Mexico 28.716667 -88.466667 1210m 5.26E+07 Illumina Adam R. Riversengcyc 2236347000 2236347000 IMG/M taxon oid Guaymas Basin Hydrothermal vent plume Guaymas Basin 27.823 -111.4 1996m 2.11E+07 Illumina HiSeq Gregory Dickengcyc 2263328000 2263328000 IMG/M taxon oid Sakinaw Lake, BC, Canada meromitic lake Sakinaw Lake 49.682207 -124.005217 120m 4.37E+08 454 Rinke et al. 
2013 / Tanja Woykeengcyc 3300001139 3300001139 IMG/M taxon oid Iowa, USA Grasslands Iowa 43.303333 -89.3325 NA 1.90E+09 454-GS-FLX, Illumina GAii Janet Janssonengcyc 3300001683 3300001683 IMG/M taxon oid Guyams Basin Hydrothermal vent plume Guaymas Basin 27.015833 -111.425 1993 m 5.72E+08 Illumina HiSeq Gregory Dickengcyc 3300002908 3300002908 IMG/M taxon oid Angelo Coastal Reserve Grasslands Angelo Coastal Reserve 39.718176 -123.652732 NA 9.67E+08 Illumina HiSeq Jill Banfieldengcyc 3300004069 3300004069 IMG/M taxon oid Juan de Fuca Ridge flank Hydrothermal vents Juan de Fuca Ridge flank 47.76 -127.76 2667m 3.90E+08 Illumina HiSeq Ramunas Stepanauskas182C.1 Metagenome inventory for global fragment recruitment continued from previous pageFile Name Accession Data Repository Location/sampleID Specific Ecosystem Metagenome Group Latitude Longitude Depth (m) Size (bp) Sequencing Platform Reference / ContactG0K4BGD03 4461588.3 MG-RAST mgm4461588.3 Coastal waters off Lima PERU 40-80m -12.37444 -77 60m 1.39E+08 pyrosequencing Schunck et al. 2013GJCUZF103 4450891.3 MG-RAST mgm4450891.3 Coastal waters off Lima PERU 40-80m -12.37444 -77 40m 1.19E+08 pyrosequencing Schunck et al. 2013GJCUZF104 4450892.3 MG-RAST mgm4450892.3 Coastal waters off Lima PERU 20m -12.37444 -77 20m 1.01E+08 pyrosequencing Schunck et al. 2013GXED30K01 4460676.3 MG-RAST mgm4460676.3 Coastal waters off Lima PERU 40-80m -12.37444 -77 80m 1.54E+08 pyrosequencing Schunck et al. 2013GXED30K02 4460677.3 MG-RAST mgm4460677.3 Coastal waters off Lima PERU 5m -12.37444 -77 5m 1.22E+08 pyrosequencing Schunck et al. 2013GZ2L4FS03 4460736.3 MG-RAST mgm4460736.3 Coastal waters off Lima PERU 40-80m -12.37444 -77 50m 1.24E+08 pyrosequencing Schunck et al. 
2013SI MetaG 1039680 SAMN05224482 NCBI bioproject SI037 SI3 100m Saanich Inlet SI 100m 48.588 -123.504 100m 1.70E+09 Illumnia Steven HallamSI MetaG 1039686 SAMN05224486 NCBI bioproject SI037 SI3 150m Saanich Inlet SI 150m 48.588 -123.504 150m 1.25E+09 Illumnia Steven HallamSI MetaG 1039689 SAMN05224487 NCBI bioproject SI037 SI3 200m Saanich Inlet SI 200m 48.588 -123.504 200m 1.48E+09 Illumnia Steven HallamSI MetaG 1039704 SAMN05224440 NCBI bioproject SI072 SI3 10m Saanich Inlet SI 10m 48.588 -123.504 10m 1.50E+09 Illumnia Steven HallamSI MetaG 1039707 SAMN05224441 NCBI bioproject SI072 SI3 100m Saanich Inlet SI 100m 48.588 -123.504 100m 1.98E+09 Illumnia Steven HallamSI MetaG 1039713 SAMN05224513 NCBI bioproject SI072 SI3 135m Saanich Inlet SI 135m 48.588 -123.504 135m 1.70E+09 Illumnia Steven HallamSI MetaG 1039716 SAMN05224518 NCBI bioproject SI072 SI3 150m Saanich Inlet SI 150m 48.588 -123.504 150m 1.39E+09 Illumnia Steven HallamSI MetaG 1039719 SAMN05224519 NCBI bioproject SI072 SI3 200m Saanich Inlet SI 200m 48.588 -123.504 200m 7.57E+08 Illumnia Steven HallamSI MetaG 1039722 SAMN05224534 NCBI bioproject SI073 SI3 10m Saanich Inlet SI 10m 48.588 -123.504 10m 1.33E+09 Illumnia Steven HallamSI MetaG 1039725 SAMN05224524 NCBI bioproject SI073 SI3 100m Saanich Inlet SI 100m 48.588 -123.504 100m 1.62E+09 Illumnia Steven HallamSI MetaG 1039728 SAMN05224525 NCBI bioproject SI073 SI3 120m Saanich Inlet SI 120m 48.588 -123.504 120m 1.66E+09 Illumnia Steven HallamSI MetaG 1039731 SAMN05224530 NCBI bioproject SI073 SI3 135m Saanich Inlet SI 135m 48.588 -123.504 135m 1.79E+09 Illumnia Steven HallamSI MetaG 1039734 SAMN05224531 NCBI bioproject SI073 SI3 150m Saanich Inlet SI 150m 48.588 -123.504 150m 1.69E+09 Illumnia Steven HallamSI MetaG 1039737 SAMN05224508 NCBI bioproject SI073 SI3 200m Saanich Inlet SI 200m 48.588 -123.504 200m 7.87E+08 Illumnia Steven HallamSI MetaG 1039740 SAMN05224529 NCBI bioproject SI074 SI3 10m Saanich Inlet SI 10m 48.588 -123.504 10m 8.44E+08 
Illumina | Steven Hallam
SI MetaG 1039743 | SAMN05224509 | NCBI bioproject | SI074 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 7.16E+08 | Illumina | Steven Hallam
SI MetaG 1039746 | SAMN05224514 | NCBI bioproject | SI074 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 9.69E+08 | Illumina | Steven Hallam
SI MetaG 1039749 | SAMN05224515 | NCBI bioproject | SI074 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 1.74E+09 | Illumina | Steven Hallam
SI MetaG 1039752 | SAMN05224528 | NCBI bioproject | SI074 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 1.91E+09 | Illumina | Steven Hallam
SI MetaG 1039755 | SAMN05224520 | NCBI bioproject | SI074 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 1.18E+09 | Illumina | Steven Hallam
SI MetaG 1040232 | SAMN05224521 | NCBI bioproject | SI037 SI3 10m | Saanich Inlet | SI 10m | 48.588 | -123.504 | 10m | 1.03E+09 | Illumina | Steven Hallam
SI MetaG 1040238 | SAMN05224527 | NCBI bioproject | SI037 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 1.07E+09 | Illumina | Steven Hallam
SI MetaG 1057018 | SAMN05224536 | NCBI bioproject | SI075 SI3 10m | Saanich Inlet | SI 10m | 48.588 | -123.504 | 10m | 1.54E+09 | Illumina | Steven Hallam
SI MetaG 1057019 | SAMN05224522 | NCBI bioproject | SI075 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.53E+09 | Illumina | Steven Hallam
SI MetaG 1057020 | SAMN05224493 | NCBI bioproject | SI075 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 8.17E+08 | Illumina | Steven Hallam
SI MetaG 1057021 | SAMN05224494 | NCBI bioproject | SI075 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 7.86E+08 | Illumina | Steven Hallam
SI MetaG 1057022 | SAMN05224495 | NCBI bioproject | SI075 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 1.34E+09 | Illumina | Steven Hallam
SI4093112 | SAMN05224413 | NCBI bioproject | LP J09 P20 500m | Northeastern subarctic Pacific | LP 500m | 49.567 | -138.664 | 500m | 6.80E+07 | Illumina | Steven Hallam
SI4093113 | SAMN05224425 | NCBI bioproject | LP A09 P04 10m | Northeastern subarctic Pacific | LP 10m | 48.651 | -126.667 | 10m | 1.62E+08 | Illumina | Steven Hallam
SI4093125 | SAMN05224430 | NCBI bioproject | LP A09 P04 500m | Northeastern
subarctic Pacific | LP 500m | 48.651 | -126.667 | 500m | 1.00E+08 | Illumina | Steven Hallam
SI4093127 | SAMN05224418 | NCBI bioproject | LP A09 P04 1000m | Northeastern subarctic Pacific | LP 1000m | 48.651 | -126.667 | 1000m | 5.74E+07 | Illumina | Steven Hallam
SI4093128 | SAMN05224419 | NCBI bioproject | LP A09 P04 1300m | Northeastern subarctic Pacific | LP 1300m | 48.651 | -126.667 | 1300m | 1.31E+08 | Illumina | Steven Hallam
SI4093129 | SAMN05224424 | NCBI bioproject | LP A09 P20 1000m | Northeastern subarctic Pacific | LP 1000m | 49.567 | -138.664 | 1000m | 1.01E+08 | Illumina | Steven Hallam
SI4093130 | SAMN05224446 | NCBI bioproject | LP A09 P20 500m | Northeastern subarctic Pacific | LP 500m | 49.567 | -138.664 | 500m | 1.17E+08 | Illumina | Steven Hallam
SI4093131 | SAMN05224427 | NCBI bioproject | LP J08 P16 500m | Northeastern subarctic Pacific | LP 500m | 49.283 | -134.666 | 500m | 9.79E+07 | Illumina | Steven Hallam
SI4093132 | SAMN05224450 | NCBI bioproject | LP J09 P20 1000m | Northeastern subarctic Pacific | LP 1000m | 49.567 | -138.664 | 1000m | 9.26E+07 | Illumina | Steven Hallam
SI4093144 | SAMN05224451 | NCBI bioproject | SI042 SI3 10m | Saanich Inlet | SI 10m | 48.588 | -123.504 | 10m | 1.46E+08 | Illumina | Steven Hallam
SI4093145 | SAMN05224447 | NCBI bioproject | SI042 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.95E+08 | Illumina | Steven Hallam
SI4093146 | SAMN05224436 | NCBI bioproject | SI042 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 1.94E+08 | Illumina | Steven Hallam
SI4093147 | SAMN05224437 | NCBI bioproject | SI042 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 1.68E+08 | Illumina | Steven Hallam
SI4093148 | SAMN05224442 | NCBI bioproject | SI042 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 1.39E+08 | Illumina | Steven Hallam
SI4093149 | SAMN05224443 | NCBI bioproject | SI042 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 1.57E+08 | Illumina | Steven Hallam
SI4096364 | SAMN05224469 | NCBI bioproject | LP A08 P12 1000m | Northeastern subarctic Pacific | LP 1000m | 48.97 | -130.666 | 1000m | 9.31E+07 | Illumina | Steven Hallam
SI4096365 | SAMN05213796 | NCBI bioproject | LP A08 P12 2000m | Northeastern subarctic Pacific | LP 2000m | 48.97 | -130.666 | 2000m |
1.48E+08 | Illumina | Steven Hallam
SI4096367 | SAMN05213798 | NCBI bioproject | LP A08 P20 500m | Northeastern subarctic Pacific | LP 500m | 49.567 | -138.664 | 500m | 4.77E+07 | Illumina | Steven Hallam
SI4096368 | SAMN05224403 | NCBI bioproject | LP J09 P12 10m | Northeastern subarctic Pacific | LP 10m | 48.97 | -130.666 | 10m | 5.09E+07 | Illumina | Steven Hallam
SI4096369 | SAMN05213797 | NCBI bioproject | LP A08 P20 2000m | Northeastern subarctic Pacific | LP 2000m | 49.567 | -138.664 | 2000m | 6.44E+07 | Illumina | Steven Hallam
SI4096370 | SAMN05224414 | NCBI bioproject | LP A09 P16 10m | Northeastern subarctic Pacific | LP 10m | 49.283 | -134.666 | 10m | 1.01E+08 | Illumina | Steven Hallam
SI4096371 | SAMN05224420 | NCBI bioproject | LP A09 P16 500m | Northeastern subarctic Pacific | LP 500m | 49.283 | -134.666 | 500m | 1.48E+08 | Illumina | Steven Hallam
SI4096373 | SAMN05224415 | NCBI bioproject | LP A09 P16 2000m | Northeastern subarctic Pacific | LP 2000m | 49.283 | -134.666 | 2000m | 4.85E+07 | Illumina | Steven Hallam
SI4096375 | SAMN05224449 | NCBI bioproject | LP A09 P20 2000m | Northeastern subarctic Pacific | LP 2000m | 49.567 | -138.664 | 2000m | 1.32E+08 | Illumina | Steven Hallam
SI4096377 | SAMN05224426 | NCBI bioproject | LP F10 P16 500m | Northeastern subarctic Pacific | LP 500m | 49.283 | -134.666 | 500m | 5.55E+07 | Illumina | Steven Hallam
SI4096378 | SAMN05224432 | NCBI bioproject | LP F10 P16 1000m | Northeastern subarctic Pacific | LP 1000m | 49.283 | -134.666 | 1000m | 1.69E+08 | Illumina | Steven Hallam
SI4096379 | SAMN05224421 | NCBI bioproject | LP F10 P16 2000m | Northeastern subarctic Pacific | LP 2000m | 49.283 | -134.666 | 2000m | 3.17E+07 | Illumina | Steven Hallam
SI4096381 | SAMN05224400 | NCBI bioproject | LP J08 P04 500m | Northeastern subarctic Pacific | LP 500m | 48.651 | -126.667 | 500m | 1.11E+08 | Illumina | Steven Hallam
SI4096382 | SAMN05224396 | NCBI bioproject | LP J08 P04 1000m | Northeastern subarctic Pacific | LP 1000m |
48.651 | -126.667 | 1000m | 1.80E+07 | Illumina | Steven Hallam
SI4096383 | SAMN05224468 | NCBI bioproject | LP J08 P04 1300m | Northeastern subarctic Pacific | LP 1300m | 48.651 | -126.667 | 1300m | 1.15E+08 | Illumina | Steven Hallam
SI4096385 | SAMN05224448 | NCBI bioproject | LP J09 P16 500m | Northeastern subarctic Pacific | LP 500m | 49.283 | -134.666 | 500m | 1.26E+08 | Illumina | Steven Hallam
SI4096386 | SAMN05224475 | NCBI bioproject | LP J09 P16 1000m | Northeastern subarctic Pacific | LP 1000m | 49.283 | -134.666 | 1000m | 1.42E+08 | Illumina | Steven Hallam
SI4096387 | SAMN05224445 | NCBI bioproject | LP J09 P16 2000m | Northeastern subarctic Pacific | LP 2000m | 49.283 | -134.666 | 2000m | 1.31E+08 | Illumina | Steven Hallam
SI4096389 | SAMN05224452 | NCBI bioproject | LP J09 P20 2000m | Northeastern subarctic Pacific | LP 2000m | 49.567 | -138.664 | 2000m | 4.20E+07 | Illumina | Steven Hallam
SI4096390 | SAMN05224470 | NCBI bioproject | LP A08 P26 10m | Northeastern subarctic Pacific | LP 10m | 50 | -145 | 10m | 8.19E+07 | Illumina | Steven Hallam
SI4096391 | SAMN05224471 | NCBI bioproject | LP A08 P26 500m | Northeastern subarctic Pacific | LP 500m | 50 | -145 | 500m | 1.72E+07 | Illumina | Steven Hallam
SI4096392 | SAMN05224488 | NCBI bioproject | LP A08 P26 1000m | Northeastern subarctic Pacific | LP 1000m | 50 | -145 | 1000m | 1.13E+08 | Illumina | Steven Hallam
SI4096394 | SAMN05224453 | NCBI bioproject | LP A09 P26 500m | Northeastern subarctic Pacific | LP 500m | 50 | -145 | 500m | 1.17E+08 | Illumina | Steven Hallam
SI4096395 | SAMN05224461 | NCBI bioproject | LP F09 P12 500m | Northeastern subarctic Pacific | LP 500m | 48.97 | -130.666 | 500m | 9.20E+07 | Illumina | Steven Hallam
SI4096396 | SAMN05224460 | NCBI bioproject | LP F09 P12 1000m | Northeastern subarctic Pacific | LP 1000m | 48.97 | -130.666 | 1000m | 9.63E+06 | Illumina | Steven Hallam
SI4096398 | SAMN05224457 | NCBI bioproject | LP F09 P26 500m | Northeastern subarctic Pacific | LP 500m | 50 | -145 | 500m | 8.23E+07 | Illumina | Steven Hallam
SI4096399 | SAMN05224456 | NCBI bioproject | LP F09 P26 1000m | Northeastern subarctic Pacific | LP 1000m | 50 | -145 | 1000m | 1.38E+08 | Illumina | Steven Hallam
SI4096400 | SAMN05224431 | NCBI bioproject | LP J08 P26 500m | Northeastern subarctic
Pacific | LP 500m | 50 | -145 | 500m | 1.18E+08 | Illumina | Steven Hallam
SI4096401 | SAMN05224485 | NCBI bioproject | LP J08 P16 1000m | Northeastern subarctic Pacific | LP 1000m | 49.283 | -134.666 | 1000m | 4.28E+07 | Illumina | Steven Hallam
SI4096402 | SAMN05224395 | NCBI bioproject | LP J08 P12 500m | Northeastern subarctic Pacific | LP 500m | 48.97 | -130.666 | 500m | 1.33E+08 | Illumina | Steven Hallam
SI4096403 | SAMN05224464 | NCBI bioproject | LP J08 P12 1000m | Northeastern subarctic Pacific | LP 1000m | 48.97 | -130.666 | 1000m | 1.77E+07 | Illumina | Steven Hallam
SI4096404 | SAMN05224465 | NCBI bioproject | LP J08 P12 2000m | Northeastern subarctic Pacific | LP 2000m | 48.97 | -130.666 | 2000m | 4.07E+07 | Illumina | Steven Hallam
SI4096405 | SAMN05224476 | NCBI bioproject | LP J09 P12 500m | Northeastern subarctic Pacific | LP 500m | 48.97 | -130.666 | 500m | 1.74E+08 | Illumina | Steven Hallam
SI4096409 | SAMN05224402 | NCBI bioproject | SI034 SI3 10m | Saanich Inlet | SI 10m | 48.588 | -123.504 | 10m | 1.48E+08 | Illumina | Steven Hallam
SI4096410 | SAMN05224404 | NCBI bioproject | SI034 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.93E+08 | Illumina | Steven Hallam
SI4096411 | SAMN05224407 | NCBI bioproject | SI034 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 2.13E+08 | Illumina | Steven Hallam
SI4096412 | SAMN05224408 | NCBI bioproject | SI034 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 2.38E+08 | Illumina | Steven Hallam
SI4096413 | SAMN05224411 | NCBI bioproject | SI034 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 7.78E+07 | Illumina | Steven Hallam
SI4096414 | SAMN05224484 | NCBI bioproject | SI034 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 2.01E+08 | Illumina | Steven Hallam
SI4096416 | SAMN05224405 | NCBI bioproject | SI036 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.03E+08 | Illumina | Steven Hallam
SI4096417 | SAMN05224472 | NCBI bioproject | SI036 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 2.15E+08 | Illumina | Steven Hallam
SI4096418 | SAMN05224406 | NCBI bioproject | SI036 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 1.36E+08 | Illumina | Steven Hallam
SI4096419 | SAMN05224409 | NCBI bioproject | SI036 SI3 150m |
Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 4.33E+07 | Illumina | Steven Hallam
SI4096420 | SAMN05224412 | NCBI bioproject | SI036 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 6.19E+07 | Illumina | Steven Hallam
SI4096421 | SAMN05224416 | NCBI bioproject | SI039 SI3 10m | Saanich Inlet | SI 10m | 48.588 | -123.504 | 10m | 6.03E+07 | Illumina | Steven Hallam
SI4096422 | SAMN05224417 | NCBI bioproject | SI039 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.95E+08 | Illumina | Steven Hallam
SI4096423 | SAMN05224422 | NCBI bioproject | SI039 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 2.34E+08 | Illumina | Steven Hallam
SI4096424 | SAMN05224423 | NCBI bioproject | SI039 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 1.62E+08 | Illumina | Steven Hallam
SI4096425 | SAMN05224428 | NCBI bioproject | SI039 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 7.10E+07 | Illumina | Steven Hallam
SI4096426 | SAMN05224477 | NCBI bioproject | SI039 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 2.05E+08 | Illumina | Steven Hallam
SI4096428 | SAMN05224454 | NCBI bioproject | SI047 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.27E+08 | Illumina | Steven Hallam
SI4096429 | SAMN05224455 | NCBI bioproject | SI047 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 1.77E+08 | Illumina | Steven Hallam
SI4096430 | SAMN05224458 | NCBI bioproject | SI047 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 1.81E+08 | Illumina | Steven Hallam
SI4096431 | SAMN05224459 | NCBI bioproject | SI047 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 1.83E+08 | Illumina | Steven Hallam
SI4096432 | SAMN05224463 | NCBI bioproject | SI047 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 1.00E+08 | Illumina | Steven Hallam
SI4096433 | SAMN05224462 | NCBI bioproject | SI048 SI3 10m | Saanich Inlet | SI 10m | 48.588 | -123.504 | 10m | 5.85E+07 | Illumina | Steven Hallam
SI4096434 |
SAMN05224393 | NCBI bioproject | SI048 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.75E+08 | Illumina | Steven Hallam
SI4096435 | SAMN05224394 | NCBI bioproject | SI048 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 1.12E+08 | Illumina | Steven Hallam
SI4096436 | SAMN05224397 | NCBI bioproject | SI048 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 1.36E+08 | Illumina | Steven Hallam
SI4096437 | SAMN05224398 | NCBI bioproject | SI048 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 6.47E+07 | Illumina | Steven Hallam
SI4096438 | SAMN05224401 | NCBI bioproject | SI048 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 4.66E+07 | Illumina | Steven Hallam
SI4096439 | SAMN05224489 | NCBI bioproject | SI053 SI3 10m | Saanich Inlet | SI 10m | 48.588 | -123.504 | 10m | 1.38E+07 | Illumina | Steven Hallam
SI4096440 | SAMN05224490 | NCBI bioproject | SI053 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.35E+08 | Illumina | Steven Hallam
SI4096441 | SAMN05224491 | NCBI bioproject | SI053 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 1.67E+08 | Illumina | Steven Hallam
SI4096442 | SAMN05224492 | NCBI bioproject | SI053 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 1.07E+08 | Illumina | Steven Hallam
SI4096443 | SAMN05224466 | NCBI bioproject | SI053 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 2.07E+08 | Illumina | Steven Hallam
SI4096444 | SAMN05224467 | NCBI bioproject | SI053 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 1.65E+08 | Illumina | Steven Hallam
SI4096446 | SAMN05224410 | NCBI bioproject | SI054 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.48E+08 | Illumina | Steven Hallam
SI4096447 | SAMN05224433 | NCBI bioproject | SI054 SI3 120m | Saanich Inlet | SI 120m | 48.588 | -123.504 | 120m | 8.58E+07 | Illumina | Steven Hallam
SI4096448 | SAMN05224473 | NCBI bioproject | SI054 SI3 135m | Saanich Inlet | SI 135m | 48.588 | -123.504 | 135m | 7.67E+07 | Illumina | Steven Hallam
SI4096449 | SAMN05224478 | NCBI bioproject | SI054 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 4.10E+07 | Illumina | Steven Hallam
SI4096450 | SAMN05224438 | NCBI bioproject | SI054 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 |
200m | 9.80E+07 | Illumina | Steven Hallam
SI4096451 | SAMN05224439 | NCBI bioproject | SI060 SI3 100m | Saanich Inlet | SI 100m | 48.588 | -123.504 | 100m | 1.73E+08 | Illumina | Steven Hallam
SI4096452 | SAMN05224444 | NCBI bioproject | SI060 SI3 150m | Saanich Inlet | SI 150m | 48.588 | -123.504 | 150m | 9.75E+07 | Illumina | Steven Hallam
SI4096453 | SAMN05224474 | NCBI bioproject | SI060 SI3 200m | Saanich Inlet | SI 200m | 48.588 | -123.504 | 200m | 1.90E+08 | Illumina | Steven Hallam
SRR064444 | SRS113624 | NCBI SRA | MOOMZ1 50m DNA | Coastal waters off Iquique | ETSP 50-65m | -20.07 | -70.23 | 50m | 1.08E+08 | 454 GS FLX | Stewart et al. 2012
SRR064446 | SRS113625 | NCBI SRA | MOOMZ1 85m DNA | Coastal waters off Iquique | ETSP 70-85m | -20.07 | -70.23 | 85m | 1.64E+08 | 454 GS FLX | Stewart et al. 2012
SRR064448 | SRS113626 | NCBI SRA | MOOMZ1 110m DNA | Coastal waters off Iquique | ETSP 110-200m | -20.07 | -70.23 | 110m | 1.11E+08 | 454 GS FLX | Stewart et al. 2012
SRR064450 | SRS113627 | NCBI SRA | MOOMZ1 200m DNA | Coastal waters off Iquique | ETSP 110-200m | -20.07 | -70.23 | 200m | 1.42E+08 | 454 GS FLX | Stewart et al. 2012
SRR070081 | SRS118145 | NCBI SRA | MOOMZ2 70m DNA | Coastal waters off Iquique | ETSP 70-85m | -20.07 | -70.23 | 70m | 6.30E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR070082 | SRS118146 | NCBI SRA | MOOMZ2 200m DNA | Coastal waters off Iquique | ETSP 110-200m | -20.07 | -70.23 | 200m | 5.84E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR070083 | SRS118147 | NCBI SRA | MOOMZ3 80m DNA | Coastal waters off Iquique | ETSP 70-85m | -20.07 | -70.23 | 80m | 7.45E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR070084 | SRS118148 | NCBI SRA | MOOMZ3 150m DNA | Coastal waters off Iquique | ETSP 110-200m | -20.07 | -70.23 | 150m | 7.17E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR304656 | SRS213611 | NCBI SRA | Moomz1 65m DNA | Coastal waters off Iquique | ETSP 50-65m | -20.07 | -70.23 | 65m | 1.16E+08 | 454 GS FLX | Stewart et al. 2012
SRR304668 | SRS213613 | NCBI SRA | Moomz1 500m DNA | Coastal waters off Iquique | ETSP 500-800m | -20.07 | -70.23 | 500m | 1.46E+08 | 454 GS FLX | Stewart et al. 2012
SRR304671 | SRS213616 | NCBI SRA | Moomz2 35m DNA | Coastal waters off Iquique | ETSP 35m | -20.07 | -70.23 | 35m | 5.59E+08 | 454 GS FLX Titanium |
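Stewart et al. 2012
SRR304672 | SRS213617 | NCBI SRA | Moomz2 50m DNA | Coastal waters off Iquique | ETSP 50-65m | -20.07 | -70.23 | 50m | 6.11E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR304673 | SRS213618 | NCBI SRA | Moomz2 110m DNA | Coastal waters off Iquique | ETSP 110-200m | -20.07 | -70.23 | 110m | 5.03E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR304674 | SRS213619 | NCBI SRA | Moomz3 50m DNA | Coastal waters off Iquique | ETSP 50-65m | -20.07 | -70.23 | 50m | 8.79E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR304680 | SRS213623 | NCBI SRA | Moomz3 110m DNA | Coastal waters off Iquique | ETSP 110-200m | -20.07 | -70.23 | 110m | 8.07E+08 | 454 GS FLX Titanium | Stewart et al. 2012
SRR304683 | SRS213614 | NCBI SRA | Moomz1 800m DNA | Coastal waters off Iquique | ETSP 500-800m | -20.07 | -70.23 | 800m | 5.50E+07 | 454 GS FLX | Stewart et al. 2012
SRR304684 | SRS213624 | NCBI SRA | Moomz1 15m DNA | Coastal waters off Iquique | ETSP 15m | -20.07 | -70.23 | 15m | 1.56E+08 | 454 GS FLX | Stewart et al. 2012

Table C.2: Summary of recruited fragments to metagenome groups in global fragment recruitment analysis
Metagenome Study | Abbreviation | Metagenome group | # metagenomes in group | ZA3312c-B | ZA3312c-A | Arctic96B-7-A | Arctic96B-7-B | HF770D10 | ZA3648c | SHAN400 | SHBH1141 | HMTAb91-B | Group total
(The nine columns from ZA3312c-B to HMTAb91-B are Marinimicrobia lineages.)
Saanich Inlet | SI | SI 10m | 10 | 0 | 3287 | 3328 | 3917 | 0 | 0 | 978 | 9948 | 0 | 21458
Saanich Inlet | SI | SI 100m | 14 | 2 | 4622 | 18597 | 21844 | 4 | 0 | 5931 | 7997 | 0 | 58997
Saanich Inlet | SI | SI 120m | 12 | 0 | 2811 | 12709 | 10918 | 3 | 0 | 4277 | 14561 | 0 | 45279
Saanich Inlet | SI | SI 135m | 12 | 0 | 1954 | 10153 | 11818 | 1 | 0 | 4903 | 21423 | 0 | 50252
Saanich Inlet | SI | SI 150m | 14 | 0 | 5432 | 18304 | 7663 | 6 | 0 | 9097 | 32848 | 0 | 73350
Saanich Inlet | SI | SI 200m | 13 | 1 | 1554 | 14080 | 996 | 2 | 0 | 6361 | 29807 | 0 | 52801
North Eastern Sub Arctic Pacific | NESAP | NESAP 10m | 4 | 1 | 1272 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1274
North Eastern Sub Arctic Pacific | NESAP | NESAP 1000m | 12 | 50 | 1186 | 3624 | 459 | 4335 | 0 | 1868 | 55 | 0 | 11577
North Eastern Sub Arctic Pacific | NESAP | NESAP 500m | 16 | 16 | 869 | 2701 | 130 | 3374 | 0 | 1181 | 59 | 0 | 8330
North Eastern Sub Arctic Pacific | NESAP | NESAP 1300m | 2 | 12 | 7 | 102 | 10 | 977 | 0 | 2 | 0 | 0 | 1110
North Eastern Sub Arctic Pacific | NESAP | NESAP 2000m | 8 | 35 | 768 | 674 | 1111 | 3215 | 0 | 387 | 38 | 0 | 6228
Eastern Tropical South Pacific | ETSP | ETSP 15m | 1 | 27 | 367 | 2 | 26 | 1 | 0 | 0 | 0 | 0 | 423
Eastern Tropical South Pacific | ETSP | ETSP 35m | 1 | 7 | 89 | 1 | 174 | 0 | 0 | 0 | 0 | 0 | 271
Eastern Tropical South Pacific | ETSP | ETSP 50-65m | 4 | 13 | 326 | 350 | 1519 | 4 | 0 | 0 | 0 | 0 | 2212
Eastern Tropical South Pacific | ETSP | ETSP 70-85m | 3 | 9 | 81 | 1067 | 745 | 86 | 8 | 6 | 0 | 0 | 2002
Eastern Tropical South Pacific | ETSP | ETSP 110-200m | 6 | 17 |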
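24 | 2165 | 244 | 266 | 9 | 8 | 4 | 0 | 2737
Eastern Tropical South Pacific | ETSP | ETSP 500-800m | 2 | 2 | 1 | 77 | 18 | 192 | 38 | 1 | 0 | 0 | 329
Peru | PERU | PERU 5m | 1 | 2 | 11 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 18
Peru | PERU | PERU 20m | 1 | 2 | 10 | 15 | 7 | 0 | 0 | 0 | 5 | 0 | 39
Peru | PERU | PERU 40-80m | 4 | 0 | 2 | 351 | 46 | 6 | 1 | 1 | 79 | 0 | 486
TARA Oceans | m | SRF Indian Ocean | 12 | 2042 | 1 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 2048
TARA Oceans | j | SRF Mediterranean Sea | 6 | 179 | 44 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 224
TARA Oceans | h | SRF North Atlantic Ocean | 10 | 626 | 206 | 5 | 313 | 0 | 0 | 0 | 0 | 0 | 1150
TARA Oceans | e | SRF North Pacific Ocean | 6 | 468 | 54 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 526
TARA Oceans | k | SRF Red Sea | 4 | 220 | 0 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 224
TARA Oceans | i | SRF South Atlantic Ocean | 21 | 1685 | 431 | 2 | 122 | 0 | 0 | 0 | 0 | 0 | 2240
TARA Oceans | n | SRF South Pacific Ocean | 22 | 114 | 27 | 0 | 19 | 0 | 0 | 0 | 0 | 0 | 160
TARA Oceans | l | SRF Southern Ocean | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2
TARA Oceans | m | DCM Indian Ocean | 12 | 606 | 2 | 0 | 17 | 0 | 1 | 0 | 0 | 0 | 626
TARA Oceans | j | DCM Mediterranean Sea | 7 | 123 | 126 | 0 | 22 | 0 | 1 | 0 | 0 | 0 | 272
TARA Oceans | h | DCM North Atlantic Ocean | 4 | 236 | 40 | 0 | 26 | 1 | 0 | 0 | 0 | 0 | 303
TARA Oceans | e | DCM North Pacific Ocean | 5 | 26 | 146 | 13 | 31 | 3 | 0 | 0 | 0 | 0 | 219
TARA Oceans | k | DCM Red Sea | 2 | 132 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 135
TARA Oceans | i | DCM South Atlantic Ocean | 14 | 689 | 259 | 1 | 136 | 1 | 2 | 0 | 0 | 0 | 1088
TARA Oceans | n | DCM South Pacific Ocean | 12 | 57 | 11 | 1 | 20 | 0 | 2 | 0 | 0 | 0 | 91
TARA Oceans | h | MIX North Atlantic Ocean | 1 | 7 | 35 | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 67
TARA Oceans | e | MIX South Pacific Ocean | 10 | 8 | 4 | 2 | 10 | 1 | 0 | 0 | 0 | 0 | 25
TARA Oceans | m | MES Indian Ocean | 8 | 9 | 5 | 46 | 7 | 298 | 4 | 0 | 0 | 0 | 369
TARA Oceans | h | MES North Atlantic Ocean | 6 | 3 | 7 | 19 | 9 | 257 | 3 | 0 | 0 | 0 | 298
TARA Oceans | e | MES North Pacific Ocean | 5 | 0 | 3 | 49 | 11 | 341 | 4 | 0 | 0 | 0 | 408
TARA Oceans | i | MES South Atlantic Ocean | 10 | 243 | 11 | 79 | 191 | 1418 | 8 | 0 | 0 | 0 | 1950
TARA Oceans | n | MES South Pacific Ocean | 9 | 13 | 11 | 42 | 13 | 435 | 1 | 0 | 0 | 0 | 515
TARA Oceans | l | MES Southern Ocean | 1 | 0 | 1 | 9 | 0 | 56 | 0 | 0 | 0 | 0 | 66
Hydro-thermal | d | Guaymas Basin | 2 | 11 | 5 | 59 | 16 | 189 | 8 | 2 | 5 | 0 | 295
Hydro-thermal | g | Gulf of Mexico | 5 | 2 | 0 | 16 | 4 | 286 | 1 | 0 | 0 | 0 | 309
Hydro-thermal | b | Juan de Fuca Ridge flank | 1 | 0 | 9 | 1 | 5 | 7 | 0 | 0 | 0 | 0 | 22
Strat. Lake | a | Sakinaw Lake | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
TR | f | Iowa | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
TR | c | Angelo Coastal Reserve | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
Total (per lineage): 7695 | 26111 | 88647 | 62665 | 15766 | 91 | 35003 | 116829 | 1
*Abbreviations: SRF - surface; DCM - deep chlorophyll maximum; MIX - mixed layer; MES - mesopelagic; Strat.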
Lake - stratified lake; TR - terrestrial.

Table C.3: Genomic features of Marinimicrobia population genome bins.
Clade | Single Cell Genome Identity | Population Genome Size (Mbp) | Estimated Completeness (%) | Number of Contigs | N50 | GC Content (%) | Single Copy Marker Genes | Strain Heterogeneity | Marker Lineage
ZA3312c-A | AAA160-I06, AAA160-C11, AAA076-M08, AAA160-B08 | 11 | 95.8 | 531 | 35213 | 32.8% | 56 | 94.33 | root
ZA3312c-B | AAA0298-D23 | 1 | 93.4 | 41 | 236078 | 31.6% | 147 | 28 | Bacteria
HF770D10 | AAA003-E22 | 1.4 | 41.2 | 118 | 15724 | 36.6% | 104 | 100 | Bacteria
Arctic96B7 A | AB-746 N13AB-902, AB-747 F21AB-903 | 50.9 | 100.0 | 3423 | 18609 | 39.4% | 56 | 70.67 | root
Arctic96B7 B | AB-746 P06AB-902, JGI 0000113-D11 | 6.0 | 96.6 | 583 | 13227 | 32.6% | 104 | 63.36 | Bacteria
SHAN400 | AB-755 M21D07 | 32.2 | 87.5 | 2196 | 19252 | 37.4% | 56 | 99.64 | root
SHBH1141 | AB-750 L13AB-904, AB-755 E16C12, AB-751 D09AB-904 | 65.6 | 91.7 | 3127 | 35279 | 43.5% | 56 | 96.11 | root

Table C.4: Summary of central metabolism in Marinimicrobia lineages by SAGs and population genomes. The Marinimicrobia lineage is listed in the top row; "SAGs" indicates multiple SAGs in a given lineage as per figure 1A. The abbreviation "pop. Genome" indicates the population genome including recruited metagenomic contigs; if no pop. Genome is listed, no metagenomic contigs were recruited.
Rows: Metabolism (section heading), then Pathway or Gene followed by one value per column.
Columns, in order: ZA3312c-A (SAGs; pop. Genome), ZA3312c-B (SAG; pop. Genome), HF770D10 (SAG; pop. Genome), ZA3648c (SAG), Arctic96-B-7-A (SAGs; pop. Genome), Arctic96-B-7-B (SAGs; pop. Genome), SHAN400 (SAG; pop. Genome), SHBH1141 (SAGs; pop. Genome), HMTAb91-A (SAGs), HMTAb91-B (SAG).

Sugar Metabolism
Entner Doudoroff: - - - - - - - - - - - - - - - - -
Non-Oxidative Pentose Phosphate Pathway: Y Y - - - - - - - - - - - Y Y - -
Pentose Phosphate Pathway: - - - - - - - - - - - - - - - - -
Embden-Meyerhof-Parnas: - - - - - - - - Y - - - Y Y Y Y -
Pyruvate Kinase: - Y - - - - - - - - Y Y Y - - - -

Gluconeogenesis
Phosphoenolpyruvate Carboxylase/Carboxykinase: Y Y Y Y - - Y Y - Y Y Y Y Y Y Y
Fructose-1,6-bisphosphatase: Y Y Y - - - Y Y Y Y Y Y Y Y Y Y Y

TCA
Pyr: Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
RHox: Y Y Y - - - - - Y - Y - Y Y Y - -
LHox: Y Y Y - - - - - Y Y Y - Y Y Y - -
Sdh: Y Y Y - - - Y - Y Y - Y Y Y Y Y -

Carbon Fixation
Reductive-TCA (citrate-lyase): - - - - - - - - - - - - - Y Y - -
Wood-Ljungdahl: - - - - - - - - - - - - - - - - -
Hydroxypropionate Bicycle: - - - - - - - - - - - - - - - - -
Hydroxypropionate/4-Hydroxybutyrate: - - - - - - - - - - - - - - - - -
Calvin Benson Bassham: - - - - - - - - - - - - - - - - -

Motility
Flagellum: - - - - - - - - - - - - - - - - -
Pilin: - - - - - - - - - - - - - - - - -

Cell Wall
Lipopolysaccharide A: Y Y - - - - - - - - - - - - - - -
Peptidoglycan: - - - - - - - - - - - - - - - -

[Figure C.1. Panel A: gene-coding bases (%) versus COG-based gene redundancy for Marinimicrobia (MGA) clades and reference organisms (E. coli K-12, A. mobile, T. acidaminovorans, amino-acid syntrophs, Ca. Pelagibacter ubique (SAR11), Cyanobacteria, A. colombiense). Panel B: COG functions, COG-based gene redundancy and coding bases (%) versus genome completeness (%) for HMTAb91, ZA3312c, SHBH1141 and Arctic96B-7.]
Figure C.1: Genomic streamlining in Marinimicrobia clades. (A) Comparison of genome reduction between Marinimicrobia clades and selected reference organisms based on estimated gene redundancy using clusters of orthologous groups (COG) annotation and frequency of gene-coding bases.
(B) Benchmarking of genome reduction showing COG functions recovered, COG-based gene redundancy and percentage of coding bases as a function of estimated genome completeness.

[Figure C.2. Tree legend (phyla): Firmicutes, Tenericutes, Fusobacteria, Actinobacteria, Cyanobacteria, Synergistetes, Thermi, Thermotogae, Proteobacteria, Euryarchaeota, Bacteroidetes, Crenarchaeota, Chlorobi, Chlamydiae, Spirochaetes, Acidobacteria, Planctomycetes, Verrucomicrobia, MGA (Marinimicrobia), Other. Labelled genomes: T. thermophilus, T. oshimai, M. roseus, I. album, C. abyssi.]
Figure C.2: Phylogenomic tree placing Marinimicrobia in bacterial phylum. Phylogenetic relationship of Marinimicrobia SAGs and related genomes within the microbial tree of life as determined by sequence alignment of 400 conserved protein sequences. The tree was generated using the PhyloPhlAn pipeline, placing Marinimicrobia SAG sequences within a phylogeny of 3,737 curated microbial genomes (colored by phylum). 25 MGA SAGs are shown as red hexagons, and 5 genomes previously identified as being highly similar to Marinimicrobia based on small subunit rRNA gene sequence are shown as black hexagons.

[Figure C.3. Panel A: percentage of Marinimicrobia in metagenome (%) by environment (Coastal OMZ, Open Ocean OMZ, TARA, Marine - Misc., Terrestrial: freshwater, thermal springs, deep subsurface, saline and alkaline, soil) and redox status (oxic, dysoxic, suboxic, anoxic, sulfidic, undefined). Panel B: relative abundance by lineage (Arctic96B-7-A, Arctic96B-7-B, HF770D10, HMTAb91-B, SHAN400, SHBH1141, ZA3312c-A, ZA3312c-B, ZA3648c).]
Figure C.3: Global prevalence of Marinimicrobia in surveyed metagenomes. (A) Box and whisker plot showing the distribution of percentage of Marinimicrobia in surveyed metagenomes by region and redox status including Coastal OMZs (Saanich Inlet, Eastern Tropical South Pacific and the Peruvian upwelling system), Open Ocean OMZ (Northeastern subarctic Pacific), TARA oceans survey, miscellaneous marine and terrestrial environments (see table A3.2) with the number of samples with Marinimicrobia
present indicated as n =. Where environmental data were available, the metagenomic sample was categorized as oxic (>90 µM O2; yellow), dysoxic (20-90 µM O2; teal), suboxic (1-20 µM O2; blue), anoxic (<1 µM O2), or sulfidic (purple). (B) Global abundance of Marinimicrobia clades.

[Figure C.4. Depth profiles (10-200 m) for July 2010, August 2010 and February 2011; axes: depth (m), O2, N2O (nM), and NO3-, NO2-, NH4+, H2S (µM).]
Figure C.4: Saanich Inlet water column chemistry for metatranscriptome samples. Plots of Saanich Inlet water column chemistry for time points used for metatranscriptomic expression analysis.

[Figure C.5. Panels: ZA3312c-A, ZA3312c-B, HF770D10, Arctic96B-7-A, Arctic96B-7-B, SHAN400, SHBH1141. X-axis: metagenome sample (Line P stations P4, P12, P16, P20, P26; Saanich Inlet cruises SI34, SI36, SI42, SI47, SI48, SI53, SI54, SI60). Y-axis: contig length. Colour: chemical condition (oxic, dysoxic, suboxic, anoxic, sulfidic).]
Figure C.5: Origin, length and abundance of contigs in population genomes. Distribution of metagenomic contigs from Northeastern subarctic Pacific (NESAP) Ocean and Saanich Inlet metagenomic samples making up the indicated SAG population genomes. Each dot represents a contig; sample origin is shown on the X-axis as indicated for NESAP and Saanich Inlet, and the length of recruited contigs is shown on the Y-axis. Colours represent the chemical condition of the water column at the time of sampling for metagenomes: oxic (>90 µM O2; yellow), dysoxic (90 µM > O2 > 20 µM; green), suboxic (20 µM > O2 > 2 µM; teal), anoxic (<2 µM O2; blue) and sulfidic (purple).

[Figure C.6. Panel A: mean transcript abundance (RPKM) of energy metabolism gene subunits in ZA3312c-A, Arctic96B-7-A, SHAN400 and SHBH1141 under oxic, dysoxic, suboxic, anoxic and sulfidic conditions, arranged by redox pair and E°' (mV). Panel B: mean transcript abundance (RPKM) of selected genes in SUP05 uncultured Gammaproteobacteria, Planctomycetes and Nitrospina.]
Figure C.6: Expression of energy metabolism enzyme subunits from
Marinimicrobia and co-metabolic partners. (A) Expression of energy metabolism gene subunits averaged over water column redox regimes (oxic, dysoxic, suboxic, anoxic, sulfidic), mapped to redox pairs on the electron tower. (B) Expression of selected energy metabolism genes for proposed co-metabolic partners in Saanich Inlet.

[Figure C.7. Time-series panels for metagenomes and metatranscriptomes (cruises 2009-2012); y-axis: depth (m); dot size: RPKM for SHBH1141 nosZ and ZA3312c nosZ; renewal events and samples with no Marinimicrobia nosZ detected are marked.]
Figure C.7: Marinimicrobia nosZ genes and expression in Saanich Inlet time series. Marinimicrobia nosZ abundance in Saanich Inlet time series metagenomes and metatranscriptomes. Dot size represents summed RPKM for each nosZ type in a given metagenome or metatranscriptome.

[Figure C.8. Panels for ZA3312c-A, Arctic96B-7-A, Arctic96B-7-B, SHAN400 and SHBH1141 in July 2010, August 2010 and February 2011; y-axis: depth (m); colour scale: RPKM; chemical status: oxic, dysoxic, suboxic, anoxic, sulfidic. Genes shown: cytochrome bc1, c; cytochrome c, mono- and diheme variants; cytochrome c oxidase, subunit I; cytochrome oxidase, caa(3)-type oxidase, subunit IV; cytochrome c oxidase, cbb3-type, subunit I/II; quinol:cytochrome c oxidoreductase pentaheme cytochrome; heme/copper-type cytochrome/quinol oxidase, subunit 1; NADH dehydrogenase subunit B; NADH:ubiquinone oxidoreductase, Na(+)-translocating, subunit B; RnfA; ferredoxin; pyruvate:ferredoxin oxidoreductase, alpha; citrate lyase beta subunit; citrate lyase, alpha subunit.]
Figure C.8: Differential expression of enzymes involved in electron transfer in population genomes. Expression of selected enzymes involved in aerobic and anaerobic electron shuttling and energy production in Saanich Inlet station S3 at three time points from 10 to 200 m, as reads per kilobase per million mapped reads (RPKM) for metatranscriptomic reads mapped to the selected genes for the indicated population genomes.
Water column redox status for each time point is encoded on the left axis.

[Figure C.9 image. Panel A: presence matrix of clades (HMTAb91-A, HMTAb91-B, ZA3312c-A, ZA3312c-B, ZA3648c, HF770D10, SHBH1141, SHAN400, Arctic96B-7-B, Arctic96B-7-A) against functions (Proteorhodopsin, CytC oxidase, Cytbc1, Hydrogenase, PolyS reductase, NO3- reductase, NO2- reductase, N2O reductase, Citrate lyase, Rnf, Hdr-Ifo). Panel B: operon diagrams for clades SHBH1141, SHAN400 and Arctic96B-7-A, titled: Hydrogen metabolism (SHBH1141_Hyb cassette); Nitrous oxide metabolism (SHBH1141_Nos cassette); PolyS metabolism (SHBH1141_Psr cassette); PolyS & Sulfur metabolism (SHBH1141_PsrYed cassette, Arctic96B-7-A_PsrYed cassette); Nitrate/Nitrite reduction & PolyS metabolism (Arctic96B-7-A_NarPsr cassette); Oxalate/formate + H2 metabolism & putative Reverse Electron Transport (Arctic96B-7-A_Ox cassette); Nitrous oxide metabolism (ZA3312c_Nos cassette).]

Figure C.9: Energy metabolism summary and operons across Marinimicrobia clades. (A) Summary of energy metabolism, carbon fixation and co-metabolic interdependency (Rnf and Hdr-Ifo) for Marinimicrobia lineages.
(B) Operons in Marinimicrobia SAGs showing different gene arrangements in different lineages.

Appendix D

Chapter 5: Supplementary material

Table D.1: Summary of CheckM statistics for SAGs with taxonomies containing nosZ

Taxonomy               Completeness  Contamination  Strain Heterogeneity
Bacteroidales          59.64         3.59           13.61
Marinimicrobia         69.79         1.36           2.78
SAR324                 63.92         2.73           18.29
Arcobacteraceae        47.70         3.33           25.12
SUP05 1c               38.49         1.64           18.75
SUP05 1a               40.72         0.35           18.40
Ectothiorhodospirales  61.41         6.65           10.08

Table D.2: Metagenome and metatranscriptome RPKM for clades and nodes by chemistry

Chemistry  Clade  Metagenome RPKM  Metatranscriptome RPKM
Oxic       2      1.789            0.000
Oxic       3      0.864            0.000
Oxic       4      1.247            0.000
Oxic       5      62.503           0.000
Oxic       6      44.001           0.000
Oxic       7      3.554            0.000
Oxic       8      4.880            0.000
Oxic       9      101.691          0.000
Oxic       10     2.962            0.000
Oxic       11     1.115            0.000
Oxic       12     4.457            0.000
Oxic       13     6.629            0.000
Oxic       2-10   2.279            0.000
Oxic       4-6    2.035            0.000
Dysoxic    2      6.760            4.200
Dysoxic    3      1.000            0.000
Dysoxic    4      0.781            0.000
Dysoxic    5      22.801           15.557
Dysoxic    6      13.464           7.322
Dysoxic    7      1.671            3.022
Dysoxic    8      7.899            1.430
Dysoxic    9      22.055           5.256
Dysoxic    10     3.642            2.067
Dysoxic    11     1.146            0.000
Dysoxic    12     2.776            0.000
Dysoxic    13     16.068           14.826
Dysoxic    2-10   1.736            3.689
Dysoxic    2-13   1.000            0.000
Dysoxic    4-6    3.105            0.000
Dysoxic    11-13  1.000            0.000
Suboxic    2      4.918            5.093
Suboxic    3      1.564            4.814
Suboxic    4      0.000            2.312
Suboxic    5      20.910           22.387
Suboxic    6      53.310           98.863
Suboxic    7      1.720            3.040
Suboxic    8      11.306           11.707
Suboxic    9      21.076           20.517
Suboxic    10     3.211            5.519
Suboxic    11     1.248            7.200
Suboxic    12     2.097            8.000
Suboxic    13     21.053           86.593
Suboxic    2-10   1.127            6.305
Suboxic    4-6    6.732            0.000
Suboxic    9-10   1.000            1.521
Suboxic    11-13  0.434            0.661
Anoxic     2      3.757            3.304
Anoxic     4      1.000            0.000
Anoxic     5      10.916           29.115
Anoxic     6      63.105           30.994
Anoxic     7      3.030            0.000
Anoxic     8      12.176           7.684
Anoxic     9      35.197           8.986
Anoxic     10     1.691            2.834
Anoxic     11     3.010            0.000
Anoxic     12     3.000            2.058
Anoxic     13     14.373           16.798
Anoxic     2-10   1.000            2.036
Anoxic     4-6    2.000            0.000
Anoxic     9-10   1.000            2.379
Anoxic     11-13  1.000            0.000
Sulfidic   2      4.234            7.283
Sulfidic   3      1.066            0.970
Sulfidic   4      1.000            1.657
Sulfidic   5      55.338           193.119
Sulfidic   6      81.584           247.016
Sulfidic   7      7.583            2.996
Sulfidic   8      9.301            6.642
Sulfidic   9      38.678           22.478
Sulfidic   10     5.695            2.905
Sulfidic   11     1.208            2.147
Sulfidic   12     2.000            0.911
Sulfidic   13     16.960           72.401
Sulfidic   2-10   6.394            2.854
Sulfidic   4-6    1.000            0.000
Sulfidic   7-10   0.000            0.654
Sulfidic   9-10   1.732            3.311
Sulfidic   11-13  0.680            1.092
Unknown    1      9.200            0.000
Unknown    2      3.484            0.000
Unknown    3      4.291            0.000
Unknown    4      8.824            0.000
Unknown    5      10.813           0.000
Unknown    6      7.905            0.000
Unknown    7      4.534            0.000
Unknown    8      5.062            0.000
Unknown    9      8.464            0.000
Unknown    10     9.056            0.000
Unknown    11     7.692            0.000
Unknown    12     2.531            0.000
Unknown    13     5.221            0.000
Unknown    2-10   2.364            0.000
Unknown    2-13   2.000            0.000
Unknown    4-6    4.543            0.000
Unknown    4-10   1.600            0.000
Unknown    5-6    1.500            0.000
Unknown    7-10   1.000            0.000
Unknown    9-10   2.444            0.000
Unknown    11-13  1.500            0.000
Unknown    12-13  1.750            0.000

Table D.3: Total clade abundance and expression

Clade Number  Metagenome RPKM  Metatranscriptome RPKM
1             9.200            0.000
2             24.941           19.880
3             8.785            5.784
4             12.852           3.969
5             183.281          260.178
6             263.368          384.195
7             22.092           9.057
8             50.624           27.463
9             227.162          57.237
10            26.257           13.325
11            15.420           9.347
12            16.860           10.969
13            80.303           190.618
Internal Node Range:
2-10          14.900           14.884
2-13          3.000            0.000
4-6           19.415           0.000
4-10          1.600            0.000
5-6           1.500            0.000
7-10          1.000            0.654
9-10          6.176            7.211
11-13         4.614            1.752
12-13         1.750            0.000

[Figure D.1 image: phylogenetic tree of SUP05 reference clones (with GenBank accessions) and Saanich Inlet SAG sequences, with bootstrap values at nodes and a scale bar. Clade labels to the right: SUP05_1a, SUP05_1b, SUP05_1c, SUP05_2. Bubble legend: 100 m, 150 m, 185 m.]

Figure D.1: SUP05 phylogenetic tree. Maximum likelihood tree for SUP05 small subunit ribosomal RNA sequences from Saanich Inlet single cell amplified genomes (SAGs) and publicly available datasets. SUP05 clades are indicated to the right of the tree.
The number of collected SAGs is indicated by bubble size for 100 m (green), 120 m (blue) and 185 m (purple).

[Figure D.2 image. Panels: Metagenome, Metatranscriptome. Rows: 10, 100, 120, 135, 150, 165 and 200 m. x-axis: cruise (SI034 to SI075; Jun09 to Sep12-20). y-axis: Total nosZ RPKM. Legends: Clade (1 to 13); Chemistry (Oxic, Dysoxic, Suboxic, Anoxic, Sulfidic).]

Figure D.2: Proportions of nosZ clades in Saanich Inlet metagenome and metatranscriptome. Proportion of nosZ clades with total RPKM indicated by black bar. Sample chemistry is indicated by a coloured dot below each stacked bar.

[Figure D.3 image. Panels: Clade 5, Clade 6, Clade 9, Clade 10, Clade 13. Rows: 100, 120, 135, 150, 165 and 200 m. x-axis: cruise (SI042, SI047, SI048, SI054, SI072, SI073, SI074, SI075). y-axis: Total Clade RPKM. Subclade legend: Clade 5 (Bacteroidales VC21, Marinimicrobia ZA3312c, Other Clade 5); Clade 6 (Marinimicrobia SHBH1141, Other Clade 6); Clade 9 (Ectothiorhodospirales, SAR324, Other Clade 9); Clade 10 (Arcobacteraceae, Other Clade 10); Clade 13 (SUP05_1a, Other Clade 13). Sample chemistry: Oxic, Dysoxic, Suboxic, Anoxic, Sulfidic, No sample available.]

Figure D.3: Proportions of nosZ subclades in metatranscriptome. Proportion of RPKM for indicated subclades of nosZ with total nosZ clade expression indicated as black bar.
Sample chemistry is indicated by a coloured dot below the stacked bar for each sample.

[Figure D.4 image. Rows: Surface, Deep Chlorophyll Max, 250 m, AAIW, NADW, Bottom. x-axis: KNORR Station Number (Stn2, Stn7, Stn15, Stn23). y-axis: nosZ RPKM in Metagenome. Legend: Clade (1 to 13); sequence mapped to internal node on tree.]

Figure D.4: Abundance of nosZ clades for Knorr Cruise. Abundance of nosZ in the metagenome from the Knorr cruise in the mid-western Atlantic Ocean at Station 2 (-38°N, -45°W), Station 7 (-22.5°N, -33°W), Station 15 (-2.7°N, -28.5°W) and Station 23 (9.75°N, -55.3°W) at depths including Surface, Deep Chlorophyll Maximum (DCM), 250 m, Antarctic Intermediate Water (AAIW, ~800 m), North Atlantic Deep Water (NADW, ~2500 m) and Antarctic Bottom Water (AABW, >4000 m).

[Figure D.5 image. Rows: Surface, Deep Chlorophyll Max, Mesopelagic. x-axis: TARA Station Number & Oceanographic Provinces (stations 7 to 152). y-axis: nosZ RPKM in Metagenome. Legends: Clade (1 to 13); sequence mapped to internal node on tree; Chemistry (Suboxic, Anoxic); asterisk: includes viral fraction.]

Figure D.5: Abundance of nosZ clades along TARA Oceans cruise track. Abundance of nosZ in the metagenome from the Tara Global Oceans cruise [264] for Surface, Deep Chlorophyll Maximum and Mesopelagic depths.
Oceanographic Provinces abbreviations: Northwest Arabian Sea Upwelling Province (ARAB); Indian South Subtropical Gyre Province (ISSG); Chile-Peru Current Coastal Province (CHIL); South Pacific Subtropical Gyre Province (SPSG); Pacific Equatorial Divergence Province (PEOD); North Pacific Subtropical and Polar Front Provinces (NPST); North Pacific Equatorial Countercurrent Province (PNEC); Central American Coastal Province (CAMR); Guianas Coastal Province (GUIA); Gulf Stream Province (GFST); North Atlantic Subtropical Gyral Province (NAST). For OMZ samples, the chemical status of the water during sampling is indicated by a coloured dot at the base of the stacked bar.
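The clade totals in Table D.3 appear to equal the per-chemistry values in Table D.2 summed within each clade. This relationship is inferred from the tabulated numbers and is not stated in the text. The minimal Python sketch below illustrates the aggregation using the clade 2 and clade 3 rows copied from Table D.2. The function name `total_by_clade`, the constant `TABLE_D2_SUBSET` and the row layout are hypothetical choices made for illustration only.

```python
from collections import defaultdict

# Rows copied from Table D.2 for nosZ clades 2 and 3:
# (chemistry, clade, metagenome RPKM, metatranscriptome RPKM).
# Clade 3 has no Anoxic row in the table.
TABLE_D2_SUBSET = [
    ("Oxic", "2", 1.789, 0.000),
    ("Dysoxic", "2", 6.760, 4.200),
    ("Suboxic", "2", 4.918, 5.093),
    ("Anoxic", "2", 3.757, 3.304),
    ("Sulfidic", "2", 4.234, 7.283),
    ("Unknown", "2", 3.484, 0.000),
    ("Oxic", "3", 0.864, 0.000),
    ("Dysoxic", "3", 1.000, 0.000),
    ("Suboxic", "3", 1.564, 4.814),
    ("Sulfidic", "3", 1.066, 0.970),
    ("Unknown", "3", 4.291, 0.000),
]


def total_by_clade(rows):
    """Sum metagenome and metatranscriptome RPKM over chemistries per clade."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for _chemistry, clade, metagenome, metatranscriptome in rows:
        sums[clade][0] += metagenome
        sums[clade][1] += metatranscriptome
    # Round to three decimals, the precision used in the tables.
    return {clade: (round(mg, 3), round(mt, 3)) for clade, (mg, mt) in sums.items()}


totals = total_by_clade(TABLE_D2_SUBSET)
for clade, (mg, mt) in sorted(totals.items()):
    print(f"clade {clade}: metagenome {mg:.3f}, metatranscriptome {mt:.3f}")
```

For clade 3 the sums are 8.785 (metagenome) and 5.784 (metatranscriptome), the values listed in Table D.3. For clade 2 the metatranscriptome sum of 19.880 also matches, while the metagenome sum of 24.942 differs from the tabulated 24.941 by 0.001, presumably because the per-chemistry values were rounded before tabulation.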