@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Science, Faculty of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Louca, Stilianos"@en ; dcterms:issued "2017-03-31T00:00:00"@en, "2016"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """Microbial metabolic activity drives biogeochemical cycling in virtually every ecosystem. Yet, microbial ecology and its role in ecosystem biochemistry remain poorly understood, partly because the enormous diversity found in microbial communities hinders their modeling. Despite this diversity, the bulk of global biogeochemical fluxes is driven by a few metabolic pathways encoded by a small set of genes, which through time have spread across microbial clades that can replace each other within metabolic niches. Hence, the question arises whether the dynamics of these pathways can be modeled regardless of the hosting organisms, for example based on environmental conditions. Such a pathway-centric paradigm would greatly simplify the modeling of microbial processes at ecosystem scales. Here I investigate the applicability of a pathway-centric paradigm for microbial ecology. By examining microbial communities in replicate "miniature" aquatic environments, I show that similar ecosystems can exhibit similar metabolic functional community structure, despite highly variable taxonomic composition within individual functional groups. Further, using data from a recent ocean survey I show that environmental conditions strongly explain the distribution of microbial metabolic functional groups across the world's oceans, but only poorly explain the taxonomic composition within individual functional groups. 
Using statistical tools and mathematical models I conclude that biotic interactions, such as competition and predation, likely underlie much of the taxonomic variation within functional groups observed in the aforementioned studies. The above findings strongly support a pathway-centric paradigm, in which the distribution and activity of microbial metabolic pathways is strongly determined by energetic and stoichiometric constraints, whereas additional mechanisms shape the taxonomic composition within metabolic guilds. These findings motivated me to explore concrete pathway-centric mathematical models for specific ecosystems. Notably, I constructed a biogeochemical model for Saanich Inlet, a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones. The model describes the dynamics of individual microbial metabolic pathways involved in carbon, nitrogen and sulfur cycling, and largely explains geochemical depth profiles as well as DNA, mRNA and protein sequence data. This work yields insight into ocean biogeochemistry and demonstrates the potential of pathway-centric models for microbial ecology."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/59313?expand=metadata"@en ; skos:note "THE ECOLOGY OF MICROBIAL METABOLIC PATHWAYS
by
Stilianos Louca
B.Sc., Mathematics, Friedrich Schiller University of Jena, Germany, 2010
Diplom, Physics, Friedrich Schiller University of Jena, Germany, 2012
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Applied Mathematics)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
September 2016
© Stilianos Louca, 2016

Abstract
Microbial metabolic activity drives biogeochemical cycling in virtually every ecosystem. Yet, microbial ecology and its role in ecosystem biochemistry remain poorly understood, partly because the enormous diversity found in microbial communities hinders their modeling. Despite this diversity, the bulk of global biogeochemical fluxes is driven by a few metabolic pathways encoded by a small set of genes, which through time have spread across microbial clades that can replace each other within metabolic niches. Hence, the question arises whether the dynamics of these pathways can be modeled regardless of the hosting organisms, for example based on environmental conditions. Such a pathway-centric paradigm would greatly simplify the modeling of microbial processes at ecosystem scales.
Here I investigate the applicability of a pathway-centric paradigm for microbial ecology. By examining microbial communities in replicate “miniature” aquatic environments, I show that similar ecosystems can exhibit similar metabolic functional community structure, despite highly variable taxonomic composition within individual functional groups. Further, using data from a recent ocean survey I show that environmental conditions strongly explain the distribution of microbial metabolic functional groups across the world’s oceans, but only poorly explain the taxonomic composition within individual functional groups. Using statistical tools and mathematical models I conclude that biotic interactions, such as competition and predation, likely underlie much of the taxonomic variation within functional groups observed in the aforementioned studies. The above findings strongly support a pathway-centric paradigm, in which the distribution and activity of microbial metabolic pathways is strongly determined by energetic and stoichiometric constraints, whereas additional mechanisms shape the taxonomic composition within metabolic guilds.
These findings motivated me to explore concrete pathway-centric mathematical models for specific ecosystems. Notably, I constructed a biogeochemical model for Saanich Inlet, a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones. 
The model describes the dynamics of individual microbial metabolic pathways involved in carbon, nitrogen and sulfur cycling, and largely explains geochemical depth profiles as well as DNA, mRNA and protein sequence data. This work yields insight into ocean biogeochemistry and demonstrates the potential of pathway-centric models for microbial ecology.

Preface
A number of sections in this work are partly or wholly published, in press or in review. Copyright licenses to all works were obtained and are listed where appropriate.
• Chapter 2: Stilianos Louca conceived the project. Stilianos Louca performed the sequence analysis and statistical analysis with input from Laura W. Parfrey and Michael Doebeli. Stilianos Louca wrote a first draft of the manuscript. All coauthors contributed to the final preparation of the manuscript. Michael Doebeli and Laura Parfrey supervised the project.
A version of this chapter is published in Science as a copyrighted article. Copyright (2016), American Association for the Advancement of Science. Reprinted with permission from AAAS, from:
Louca, S., Parfrey, L.W., Doebeli, M. (2016). Decoupling function and taxonomy in the global ocean microbiome. Science. 353:1272–1277. DOI:10.1126/science.aaf4507.
• Chapter 3: Stilianos Louca, Vinicius F. Farjalla, Saulo M. S. Jacques, Aliny P. F. Pires and Juliana S. Leal performed the field work. Vinicius F. Farjalla and Saulo M. S. Jacques performed the chemical measurements in the laboratory. Stilianos Louca performed the molecular work in the laboratory, the DNA sequence analysis and the statistical analyses. Stilianos Louca, Michael Doebeli, Vinicius F. Farjalla, Diane S. Srivastava and Laura W. Parfrey interpreted the statistical findings. Stilianos Louca wrote a first draft of the manuscript, and all authors contributed to the final preparation of the manuscript. Michael Doebeli and Vinicius F. Farjalla supervised the project.
A version of this chapter is under peer review for publication:
Louca, S., Jacques, S.M.S., Pires, A.P.F., Leal, J.S., Srivastava, D.S., Parfrey, L.W., Farjalla, V.F., Doebeli, M. (in review). Functional stability despite high taxonomic variability across microbial communities.
• Chapter 4: Stilianos Louca conceived and developed MCM. Stilianos Louca designed the E. coli example and performed the simulations. S.D. and Michael Doebeli analyzed the simulation results. Stilianos Louca wrote the manuscript, with editorial support from Michael Doebeli. Michael Doebeli supervised the project.
Software developed in this chapter is available under the GNU Lesser General Public License (http://www.gnu.org/licenses/lgpl.html) at: http://www.zoology.ubc.ca/MCM
A version of this chapter is published in eLife under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0):
Louca, S., Doebeli, M. 2015. Calibration and analysis of genome-based models for microbial ecology. eLife. 4:e08208. DOI:10.7554/eLife.08208
• Chapter 5: Stilianos Louca conceived the project, ran the simulations and performed the statistical analysis. Stilianos Louca wrote the manuscript, with editorial support from Michael Doebeli. Michael Doebeli supervised the project.
A version of this chapter is published in Environmental Microbiology as a copyrighted article. Copyright (2015) Society for Applied Microbiology and John Wiley & Sons Ltd. Reprinted, with permission, from:
Louca, S., Doebeli, M. 2015. Transient dynamics of competitive exclusion in microbial communities. Environmental Microbiology. 18:1863–1874. DOI:10.1111/1462-2920.13058
• Chapter 6: Stilianos Louca conceived the project, ran the simulations and performed the statistical analysis. Stilianos Louca wrote the manuscript, with editorial support from Michael Doebeli. Michael Doebeli supervised the project.
A version of this chapter is under peer review for publication:
Louca, S., Doebeli, M. (in review). Taxonomic variability and functional stability in microbial communities infected by phages.
• Chapter 7: Sean A. Crowe and Steven J. Hallam had the idea for the research. Stilianos Louca constructed the mathematical models with input from Sergei Katsev, and Stilianos Louca performed the simulations. Stilianos Louca, Sergei Katsev, Alyse Hawley, Sean A. Crowe and Steven J. Hallam analyzed the model predictions. Alyse Hawley, Maya P. Bhatia, Monica Torres-Beltran and Steven J. Hallam performed the sequencing. Alyse Hawley, Maya P. Bhatia, Stilianos Louca and Steven J. Hallam analyzed the sequence data. Monica Torres-Beltran, Alyse Hawley, Celine Michiels, Dave Capelle, Sergei Katsev, Gaute Lavik, Sean A. Crowe and Steven J. Hallam collected the chemical and physical data. Stilianos Louca wrote the manuscript with Sean A. Crowe and Steven J. Hallam. All authors contributed to the final preparation of the manuscript. Sergei Katsev, Sean A. Crowe, Michael Doebeli and Steven J. Hallam supervised the project.
A version of this chapter is published in PNAS as a copyrighted article. Copyright (2016) National Academy of Sciences. Reprinted, with permission, from:
Louca, S., Hawley, A.K., Katsev, S., Torres-Beltran, M., Bhatia, P.B., Kheirandish, S., Michiels, C., Capelle, D., Lavik, G., Doebeli, M., Crowe, S.A., and Hallam, S.J. (2016). Integrating biogeochemistry with multi-omic sequence information in a model oxygen minimum zone. Proceedings of the National Academy of Sciences. DOI:10.1073/pnas.1602897113
• Chapter 8: Stilianos Louca conceived the project, ran the simulations and performed the statistical analysis. Stilianos Louca wrote the manuscript, with editorial support from Michael Doebeli. Michael Doebeli supervised the project.
A version of this chapter is published in Ecological Modelling as a copyrighted article. Copyright (2016) Elsevier. 
Reprinted, with permission, from:
Louca, S., Doebeli, M. 2016. Reaction-centric modeling of microbial ecosystems. Ecological Modelling. 335:74–86. DOI:10.1016/j.ecolmodel.2016.05.011
Throughout this dissertation the word “we” refers to Stilianos Louca unless otherwise stated.
None of the work encompassing this dissertation required consultation with the UBC Research Ethics Board.

Table of Contents
Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
1 Opening chapter
1.1 Introduction
1.2 Problem statement
1.3 Objectives and overview of this dissertation
2 The decoupling of function and taxonomy in the global ocean microbiome
2.1 Synopsis
2.2 Introduction
2.3 Environmental conditions mainly affect microbial function
2.4 Causes of variation within functional groups
2.5 Beyond taxonomic profiling
2.6 Conclusions
3 Functional stability despite high taxonomic variability across microbial communities in bromeliad tanks
3.1 Synopsis
3.2 Introduction
3.3 Results and discussion
3.3.1 Functional stability contrasts with taxonomic variability
3.3.2 Causes of variation within functional groups
3.4 Conclusions
4 Calibration and analysis of cell-metabolic models for microbial ecology
4.1 Synopsis
4.2 Introduction
4.3 Model overview
4.4 Results and discussion
4.4.1 Successional dynamics of a microbial community
4.4.2 Experimental calibration
4.4.3 Predicting microbial community dynamics
4.5 Conclusions
5 Transient dynamics of competitive exclusion in microbial communities
5.1 Synopsis
5.2 Introduction
5.3 Results and discussion
5.3.1 Competition for a common resource
5.3.2 Bioreactors as model systems
5.3.3 Bioreactor community dynamics
5.3.4 Variable does not mean unstable
5.3.5 Model limitations
5.3.6 Alternative destabilizing factors
5.3.7 Towards a pathway-centric microbial ecology
5.4 Conclusions
6 Taxonomic variability and functional stability in microbial communities infected by phages
6.1 Synopsis
6.2 Introduction
6.3 Results and discussion
6.3.1 Bioreactor dynamics in the absence of phages
6.3.2 Effects of phages on community dynamics
6.3.3 Functional redundancy promotes functional stability
6.3.4 Statistical averaging or dynamic stabilization?
6.3.5 Phages promote prokaryotic diversity
6.4 Conclusions
7 Gene-centric modeling of the Saanich Inlet oxygen minimum zone
7.1 Synopsis
7.2 Introduction
7.3 Construction and calibration of a gene-centric model
7.4 Results and Discussion
7.4.1 DNA profiles and process rates
7.4.2 mRNA and protein profiles
7.4.3 Consequences for geobiology
7.5 Conclusions
8 Reaction-centric modeling of microbial community metabolism
8.1 Synopsis
8.2 Introduction
8.3 Derivation of the reaction-centric framework
8.3.1 One reaction per cell
8.3.2 Multiple reactions per cell
8.4 Demonstration of the reaction-centric framework
8.4.1 Overview
8.4.2 Example 1: Urea hydrolysis and nitrification in a batch-fed incubator
8.4.3 Example 2: Nitrification in a flow-through bioreactor
8.4.4 Estimating concentrations of other organic compounds
8.4.5 Limitations and extensions of reaction-centric models
8.5 Conclusions
9 Closing chapter
9.1 Conclusions
9.2 So, what is life?
Summary of software contributions
Bibliography
Appendices
A Chapter 2: Supplemental material
A.1 Methods
A.1.1 Sequencing data
A.1.2 Functional annotation of prokaryotic taxa
A.1.3 Statistical Analysis
A.2 Resolving ambiguities in gene-centric metagenomics
A.3 Comparison with Sunagawa et al. (2015)
B Chapter 3: Supplemental material
B.1 Methods
B.1.1 Biological sample collection
B.1.2 Chemical analysis of tank water
B.1.3 Measurement of other physicochemical variables
B.1.4 16S sequencing
B.1.5 Functional annotation of prokaryotic taxa (FAPROTAX)
B.1.6 Metagenomic sequencing
B.1.7 Comparing functional and taxonomic variability
B.1.8 Metric multidimensional scaling and coloring
B.1.9 Phylogenetic community structure
B.1.10 Comparing OTU proportions to environmental variables
B.1.11 Comparing community dissimilarities to geographical distances
B.1.12 Comparing OTU co-occurrences to a null model
B.1.13 Sequence data availability
C Chapter 4: Supplemental material
C.1 Methods
C.1.1 MCM overview
C.1.2 Calibration of E. coli cell models
C.1.3 Simulation of the microbial community model
C.1.4 Robustness of the SS-FS coexistence
C.1.5 Seasonal restriction of the SS-FS co-cultures
D Chapter 5: Supplemental material
D.1 Methods
D.1.1 Computational framework
D.1.2 Construction of the cell models
D.1.3 Calibration of the template cell models
D.1.4 Nitrifying membrane bioreactor model
D.1.5 Statistics of community convergence
D.2 Elaboration on the competition model
D.3 Details on the bioreactor model
D.3.1 Construction of cell models
D.3.2 Metabolites
D.3.3 Reaction network for AOB
D.3.4 Reaction network for NOB
D.3.5 Uptake kinetics
D.3.6 Community-scale dynamics
E Chapter 6: Supplemental material
E.1 Methods
E.1.1 Model overview
E.1.2 Reaction rates and metabolite dynamics
E.1.3 Gibbs free energy and cell production
E.1.4 Cell and phage population dynamics
E.1.5 Parameterization and simulations
E.1.6 Statistical analysis
E.1.7 Deterministic vs stochastic competitive exclusion
F Chapter 7: Supplemental material
F.1 Methods
F.1.1 Model overview
F.1.2 Mathematical model structure
F.1.3 Considered pathways
F.1.4 Model calibration and data
F.1.5 mRNA and protein models
F.1.6 Inverse linear transport modeling
F.2 Data acquisition
F.2.1 Sampling site and time
F.2.2 Chemical and physical depth profiles
F.2.3 Metagenomics, metatranscriptomics and metaproteomics
F.2.4 Quantifying metagenomic and metatranscriptomic data using RPKM
F.2.5 Process rate measurements
F.2.6 qPCR quantification of SUP05 cell counts
F.3 Mathematical model
F.3.1 Overview
F.3.2 Considered pathways
F.3.3 Pathway stoichiometry
F.3.4 Reaction kinetics
F.3.5 Gibbs free energy and gene growth
F.3.6 Boundary conditions
F.3.7 Model parameterization
F.3.8 Calibrating reaction-kinetic parameters to data
F.3.9 Calibrating multi-omic data units
F.3.10 Predicting metatranscriptomic and metaproteomic profiles
F.3.11 Calculating metabolic fluxes between pathways
F.3.12 Local sensitivity analysis
F.4 Caveats and special notes
F.4.1 The role of sulfate reduction
F.4.2 The role of DNRA
F.4.3 The role of aerobic sulfide oxidation
F.4.4 Planctomycetes and nxr
F.5 Inverse linear transport modeling (ILTM)
G Chapter 8: Supplemental material
G.1 Methods
G.1.1 Details on example 1 (batch-fed incubator)
G.1.2 Details on example 2 (flow-through bioreactor)
G.1.3 Computational methods: Using MCM for reaction-centric models
G.2 Mathematical proofs
G.2.1 On specific maintenance rates
G.2.2 On the concentration of organic components
G.2.3 Properties of the amplification matrix

List of Tables
8.1 Overview of symbols and units for cell-centric and reaction-centric models
A.1 Overview of oceanographic variables
A.2 Overview of considered Tara oceans samples
A.3 KOG-function associations for the ocean microbiome
A.4 OTUs per functional group in the global ocean microbiome
B.1 OTU overlap between bromeliad microbiomes
B.2 Coefficients of variation for gene abundances across bromeliad microbiomes
B.3 OTU co-occurrence patterns across bromeliad microbiomes.
B.4 Geographical distances vs taxonomic dissimilarities across bromeliad microbiomes
B.5 Physicochemical environmental variables in bromeliads
B.6 OTUs per functional group in bromeliad microbiomes
B.7 KOG-function associations for bromeliad microbiomes
C.1 Fitted parameters for the individual E. coli cell models
E.1 Pathway stoichiometry in a methanogenic bioreactor
E.2 Model parameters for a methanogenic bioreactor
F.1 KEGG-function associations for the global ocean microbiome
F.1 Boundary conditions for the gene-centric model
F.2 Reaction-kinetic parameters used in the gene-centric model
F.3 Rescaling factors for metagenomic profiles
F.4 Proportionality factors for mRNA and protein models
G.1 Model parameters for a ureolytic bioreactor
G.2 Model parameters for a nitrifying bioreactor

List of Figures
2.1 Linking functional and taxonomic composition to environmental conditions.
2.2 Environmental filtering of microbial communities in the global ocean.
2.3 Functional redundancy in the global ocean microbiome.
2.4 Microbial community differences vs geographical distance.
2.5 Regression of the taxonomic composition of the global ocean microbiome
3.1 Bromeliad species used.
3.2 Functional stability vs taxonomic variability of bromeliad microbiomes
3.3 OTU-function associations
3.4 Phylogenetic dispersion in bromeliad microbiomes
3.5 Relating OTU proportions to environmental variables across bromeliads
4.1 Conceptual framework used by MCM
4.2 Overview of MCM functionalities
4.3 Relative cell densities during evolution experiments with E. coli
4.4 Calibration of E. coli cell models using monoculture experiments
4.5 Dynamics of the E. coli microbial community model
4.6 Metabolic differentiation of E. coli types
4.7 Predicted relative cell densities in co-culture when restricted to only one season
5.1 Calibration and validation of AOB and NOB models
5.2 Simulations of the nitrifying bioreactor under constant conditions
5.3 Simulations of the perturbed nitrifying bioreactor
6.1 Modeling methanogenic communities
6.2 Phage predation drives OTU turnover
6.3 Effects of functional redundancy on methane production
6.4 Effects of functional redundancy on functional community composition
7.1 Metabolic network and selected chemical time series in Saanich Inlet
7.2 Measured vs predicted chemical depth profiles in Saanich Inlet
7.3 Molecular and reaction rate profiles in Saanich Inlet, compared to model predictions
8.1 Modeling urea hydrolysis and nitrification
8.2 Model predictions and data for a ureolytic bioreactor
8.3 Reconstructing a bioreactor’s state using chemical time series
8.4 Information needed to estimate the state of a reaction-centric model
8.5 Model predictions and data for a nitrifying bioreactor
9.1 Steady state in the Saanich Inlet OMZ
9.2 Community dynamics in a methanogenic bioreactor, compared to turbulent thermal convection
A.1 Correlation analysis at various taxonomic levels
A.2 Environmental filtering at higher taxonomic levels
A.3 Functional vs taxonomic community profiles of the ocean microbiome
A.4 Functional dissimilarities vs geographical distances in the ocean microbiome
A.5 Phenotype-based vs metagenomic functional profiles
A.6 Correlations between functional groups in the global ocean microbiome
A.7 Taxonomic compositions within functional groups in the ocean microbiome
A.8 Community dissimilarities vs geographical distances in the surface and DCM
A.9 Sampling locations of the Tara oceans survey
A.10 Correlations between environmental variables across the global ocean
A.11 Functional group overlaps in the global ocean microbiome
A.12 Correlations between functional groups in the global ocean microbiome
A.13 Taxonomic dissimilarities vs geographical distances in the global ocean microbiome
A.14 Functional vs taxonomic determinism in the ocean surface microbiome
B.1 Genus-level composition within functional groups across bromeliad microbiomes
B.2 Family-level composition within functional groups across bromeliad microbiomes
B.3 Order-level composition within functional groups across bromeliad microbiomes
B.4 Class-level composition within functional groups across bromeliad microbiomes
B.5 Geographic distances vs dissimilarities within functional groups across bromeliad microbiomes
B.6 Geographic location vs composition within functional groups in bromeliad microbiomes
B.7 PARAFAC model components of bromeliad DOC
B.8 Modeling EEMs of bromeliad DOC with PARAFAC
B.9 16S rDNA rarefaction curves for bromeliad microbiomes
B.10 Detailed functional profiles of bromeliad microbiomes
B.11 Functional redundancy in bromeliad microbiomes at the genus level
B.12 Functional redundancy in bromeliad microbiomes at the family level
B.13 Functional redundancy in bromeliad microbiomes at the class level
C.1 Predicted relative cell densities of the A and FS E. coli types in co-culture
C.2 Robustness of the predicted stable coexistence of the SS and FS E. coli types in co-culture
C.3 Measured relative cell densities of the SS and FS types in batch co-culture when restricted to only one “season”
E.1 Predicted metabolite uptake rates in a methanogenic bioreactor
E.2 Predicted metabolite export rates in a methanogenic bioreactor
E.3 Predicted phage-host trajectories in a methanogenic bioreactor
F.1 Temperature, salinity and POM profiles in Saanich Inlet
F.2 Overview of the gene-centric modeling approach for the Saanich Inlet OMZ
F.3 Local sensitivity analysis of the gene-centric model
F.1 Multi-omic depth profiles for various sulfur and nitrogen cycling genes in the Saanich Inlet OMZ
G.1 Comparison of models for ammonium production in a ureolytic bioreactor
G.2 Comparison of models with and without ure-amo cross-amplification in a ureolytic bioreactor

List of Abbreviations

amo: Aerobic ammonium oxidation
AOB: ammonia oxidizing bacteria
CH₄: methane
CO₂: carbon dioxide
COG: clusters of orthologous genes
DNRA: dissimilatory nitrate reduction to ammonium
FAPROTAX: functional annotation of prokaryotic taxa
FBA: flux balance analysis
H₂: hydrogen
H₂S: hydrogen sulfide
KEGG: Kyoto Encyclopedia of Genes and Genomes
KOG: KEGG orthologous groups
KTW: killing the winner
MCM: Microbial Community Modeler
MDS: multidimensional scaling
N₂O: nitrous oxide
NH₃: ammonia
NH₄⁺: ammonium
NMDS: non-metric multidimensional scaling
NO₂⁻: nitrite
NO₃⁻: nitrate
NOB: nitrite oxidizing bacteria
nxr: Aerobic nitrite oxidation
O₂: oxygen
OMZ: oxygen minimum zone
PDNO: partial denitrification to nitrous oxide
PO₄⁻: phosphate
ROM: remineralization of organic matter
OTU: operational taxonomic unit
SNTZ: sulfate-nitrate transition zone
SO₄²⁻: sulfate
SSU rRNA: small subunit of the 16S rRNA gene
TEA: terminal electron acceptor
ure: Urea hydrolysis

Acknowledgements

I would like to express my deep appreciation to my supervisor, Prof. Michael Doebeli, who gave me the freedom and tremendous support to chase pretty much any white rabbit that crossed my way. I would like to thank all my collaborators, from whom I have learned a lot and with whom I was able to accomplish much of the work presented here: Alyse Hawley, Prof. Sergei Katsev, Captain Monica Torres-Beltran, Maya Bhatia, Celine Michiels, David Capelle, Gaute Lavik, Prof. Sean Crowe, Aria Hahn, Prof. Laura Parfrey, Sam Kheirandish, Saulo Jacques, Aliny Pires, Juliana Leal and Prof. Diane Srivastava. Special thanks to Prof. Steven Hallam for giving me an entrance to the world of microbial ecology and for plenty of stimulating discussions. Special thanks to Prof. Vinicius Farjalla for tremendous support on my field work and for being an amazing companion during my stay in Brazil. My gratitude to Mellisa Chen for teaching me a multitude of laboratory techniques. Thanks to Andreas Mueller for teaching me DNA extraction.
Thanks to the Parfrey lab for hosting and supporting me along the way. Thanks to the Hallam lab for being such great collaborators. Thanks to Prof. Rebecca Tyson, Prof. Mary O’Connor and Prof. Daniel Coombs for their frequent advice and pleasant discussions. Thanks to Jan Finke, Andy Loudon, Prof. Angélica González, Sarah Perez and Matthew Osmond for numerous scientific discussions and for providing feedback on my manuscripts. Thanks to Prof. Sally Otto for being such a thought-provoking committee member and for providing feedback on my manuscripts. I am grateful to the Department of Mathematics (UBC), to the Pacific Institute for the Mathematical Sciences (PIMS) and to NSERC for funding. Thanks to my parents, brother and sister for supporting me at all times.

Chapter 1
Opening chapter

For such a model there is no need to ask the question “Is the model true?”. If “truth” is to be the “whole truth” the answer must be “No”. The only question of interest is “Is the model illuminating and useful?”
George Box, 1979

1.1 Introduction

Microorganisms are the most ancient, the most diverse and the most abundant form of life on Earth (518). The biomass of prokaryotes alone is comparable to that of all plants combined (518), and their distribution extends far beyond that of multicellular organisms (111, 404). The metabolic activity of microorganisms drives the bulk of biogeochemical fluxes in virtually every natural ecosystem (116), including marine sediments (388), soil (231, 494), the open ocean (79, 455) and freshwater lakes (126). Cyanobacteria, for example, perform up to 35% of global photosynthesis (356), turning solar energy into a redox disequilibrium between carbon and oxygen that powers much of current heterotrophic life. Microorganisms strongly shape the marine nitrogen and sulfur cycles, thereby modulating global ocean productivity and climate (136).
For example, denitrification and anaerobic ammonia oxidation (anammox), two microbial pathways that utilize nitrogen compounds as alternative terminal electron acceptors for respiration, can lead to a significant net loss of bioavailable nitrogen to N2 (501). On the other hand, nitrification, an exclusively prokaryotic process by which reduced nitrogen compounds are aerobically oxidized to nitrate for energy, plays a central role in soil productivity (375) and industrial processes such as wastewater treatment (520). Understanding the spatiotemporal dynamics of microbial metabolic processes is therefore central to understanding overall ecosystem biochemistry and to optimizing the performance of microbially driven industrial processes (316).

Until recently, difficulties in culturing, and therefore characterizing, the majority of microorganisms have been a bottleneck for microbial ecology (21, 203, 346). With the advent of high-throughput molecular techniques, notably marker-gene sequencing (353, 529), shotgun DNA sequencing (metagenomics; 534), shotgun RNA sequencing (metatranscriptomics; 157, 511) and mass-spectrometry based protein sequencing (metaproteomics; 298, 359), we are now entering a new era of biological inference (169). These culture-independent techniques generate massive amounts of data and provide unprecedented insight into the composition, metabolic potential and activity of microbial communities (98). However, the generation of these data is still rarely theory-driven and in most cases their mechanistic interpretation remains elusive (209, 241, 380, 401, 459).
For example, while taxonomic community profiling can reveal intriguing variation, such as along the ocean water column (460) or across seasons (540), the reasons for the observed taxonomic variation remain largely unknown because the ecological role of most taxa is unknown and can vary strongly even between closely related clades (3, 308).

The mechanistic modeling of microbial communities is further complicated by the sheer diversity of microorganisms in any particular environment (377, 401). For example, a single gram of soil can harbor several thousand different, and potentially interacting, microbial species (399). Conventional reductionist approaches would require a careful physiological characterization of each member species. However, even the simplest life forms cannot be studied in isolation because the ability to catalyze different steps of metabolic pathways or synthesize required biomolecules is partitioned across different organisms, and hence each species inevitably only constitutes a small link in the overall reaction network sustaining life in an environment (183, 200, 238, 311). An incorporation of each species into a comprehensive mathematical model similar to some mechanistic “macro-ecological” models (230, 412) is thus impractical for most natural microbial communities, although such approaches have been suggested in the past (258). In most cases, a different approach is needed for describing microbial metabolic processes at ecosystem scales.

Despite an enormous microbial diversity, the bulk of global biogeochemical fluxes is driven by a core set of metabolic pathways, which have evolved and proliferated in response to various redox disequilibria available for sustaining life (116). Through time and notably via horizontal gene transfer, these pathways have spread across microbial clades that can co-occupy or replace each other within metabolic niches (116, 270, 483).
The growth of each clade is inevitably coupled to the activity of its energy-yielding pathways, and this activity is subject to environmental energetic and stoichiometric constraints such as the availability of specific electron donors and acceptors (56). For example, the abundance and expression of genes linked to nitrogen and sulfur metabolism in marine oxygen minimum zones typically reflect the varying redox conditions along the water column (181, 448). On the other hand, microbial (notably prokaryotic) genomes rapidly lose pathways that are not required in their natural environment, presumably due to strong selective pressure for genome streamlining (41, 149, 250, 334). It is therefore tempting to theorize that these pathways (or more precisely, the genes and operons encoding them) constitute largely independent units of replication and selection (41, 93), and that environmental physicochemical conditions prescribe the structure of the community metabolic network regardless of which organisms happen to occupy each metabolic niche (126). The taxonomic composition within each niche may of course be subject to additional selection processes, such as tolerance to particular environmental stressors or susceptibility to specialist phage populations (397, 423); however, the resulting variation in taxonomic composition may have little effect on overall metabolic functioning. Such a “pathway-centric” paradigm, if applicable, would greatly simplify the modeling of microbially mediated processes at ecosystem scales and would provide holistic insight into global biogeochemistry.
Further, modeling microbial communities at the level of pathways or genes would enable a direct integration of metagenomic, metatranscriptomic and metaproteomic sequence data, which to date remain largely unutilized in quantitative ecosystem models (258).

1.2 Problem statement

The pathway-centric paradigm assumes that the metabolic function of a community somehow becomes decoupled from its specific taxonomic makeup, so that overall community metabolism is strongly controlled by physicochemical environmental factors alone, while additional processes shaping taxonomic composition have little effect on metabolic function. It is a priori unclear what conditions would promote such a decoupling and how appropriate a pathway-centric paradigm is for natural microbial communities. For communities with low taxonomic richness, or in which certain pathways are only performed by a small set of organisms, metabolic activity may depend strongly on the particular genomes present. In particular, multiple pathways co-occurring in the same genomes will not behave as independent replicating units, and this will likely lead to deviations from the pathway-centric paradigm. On the other hand, at the community level the potential presence of the same pathways in alternative configurations could enable pathway independence. Hence, functional redundancy, i.e., the presence of multiple alternative clades capable of filling a specific metabolic niche, may promote the decoupling between the taxonomic makeup of a community and its metabolic function, although this remains to be tested.

A key prediction of the pathway-centric paradigm is that physicochemically similar environments would promote similar metabolic functional community structure, while allowing for strong taxonomic variation within individual functional groups.
This prediction is indeed supported by observations in engineered ecosystems, such as bioreactors exhibiting strong taxonomic fluctuations while maintaining constant biochemical performance (122, 495), although it remains largely untested for natural microbial communities (25). More generally, one would expect that variable environmental conditions correlate more strongly with community function than with taxonomic composition, especially taxonomic composition within functional groups. An analysis of metagenomes from the Global Ocean Survey (406) indeed revealed strong correlations between pairwise environmental differences on the one hand, and pairwise metagenomic (but not taxonomic) community differences on the other hand (381). Similarly, bacterial community composition on the macroalga Ulva australis was best explained in terms of metagenomic content rather than species content (52). These findings are consistent with a pathway-centric paradigm; however, they do not explicitly address the taxonomic composition within individual functional groups, perhaps because assigning metagenomic sequences to specific taxa is a notoriously hard problem (471; which, as we show in Chapter 2, can be circumvented).

Even if not absolutely accurate, a pathway-centric paradigm would constitute an elegant and potentially insightful null model for microbial ecology because, as discussed above, it makes concrete predictions about microbial community metabolism and its interaction with the environment. Hence, the question is not whether a pathway-centric paradigm would be true; indeed, the short answer is “No”. Rather, how much of the “truth” would we really be missing in a pathway-centric paradigm, in which pathways are no longer associated with a specific host but constitute self-serving replicators that directly interact with their environment? The real question of interest is, would such a paradigm be “illuminating and useful?” (43) and if so, under which conditions?
For example, it may seem intuitive that a high functional redundancy would promote the decoupling between environmentally-driven pathway dynamics on the one hand, and the particular taxonomic composition of a community on the other hand, but this remains to be rigorously examined.

Further, any pathway-centric description of microbial communities would only capture part of the full story because it would make no statement about the assembly within individual metabolic guilds, which may be driven by a multitude of additional mechanisms. Such mechanisms may include biotic interactions such as competition or predation (258, 276, 455), random population drift (349), random colonization order (“lottery effects”; 52), spatially limited random dispersal (306) and microbial chemical warfare (237). For example, adaptation of bacteriophages to specific hosts can strongly influence bacterial species composition (423) and promote variation of microbial communities through so-called “killing the winner” dynamics (397, 462). Further, trade-offs between environmental stress tolerance and competition may lead to additional environmental filtering within functional groups (159, 362). The question then arises, at what point do these additional processes significantly influence community metabolism?

Apart from questions regarding the appropriateness and limitations of a pathway-centric paradigm, questions also emerge regarding the precise mathematical formulation of an ecological theory for microbial metabolic pathways. Specifically, how exactly would an environment “determine” the distribution and activity of particular pathways, and how should the current activity of pathways feed back to the “population dynamics” of these pathways? Further, how could multi-omic (metagenomic, metatranscriptomic and metaproteomic) sequence data be quantitatively incorporated into such models?
Inspiration may be taken from existing ecosystem models in which broad microbial processes are represented by homogeneous functional groups, such as photoautotrophs or detritivores. In these models, the activity of functional groups is determined by resource-dependent responses, such as Michaelis-Menten kinetics (211), and their growth is determined by simple biomass-per-substrate yield factors (193, 360). These “functional group models” are much coarser than potential descriptions of metabolic networks at the pathway level, and this coarseness may explain why multi-omic data are yet to be incorporated in such models. Further, such models rarely account for energetic constraints of microbial metabolism because yield factors are considered to be fixed model parameters. In reality, however, the energy that can be gained from a metabolic reaction depends on the specific physicochemical state of the local environment, and thus local reaction energetics strongly shape the structure of the microbial metabolic network (34, 56, 256). A recent biogeochemical model by Reed et al. (386), which predicts gene growth rates based on the energy yield from their associated metabolic pathways, constitutes a first attempt to construct a thermodynamics-based pathway-centric model. While compelling, the model by Reed et al. (386) stops short of a quantitative integration between geochemical and multi-omic sequence data. For example, while the model allows for a qualitative comparison between modeled gene production rates and selected transcript profiles, it does not provide any explicit mechanistic links (386).
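
As a point of reference for the functional group models described above, the following is a minimal sketch of one such group growing on a single substrate in a closed batch system, with Michaelis-Menten uptake and a fixed biomass-per-substrate yield. It is an illustration only and is not taken from any of the cited models; the parameter values and the names V_MAX, K_HALF, YIELD and simulate are hypothetical.

```python
# Hypothetical parameter values, chosen only for illustration.
V_MAX = 2.0   # maximum specific uptake rate (substrate per biomass per day)
K_HALF = 0.5  # half-saturation constant (substrate units)
YIELD = 0.3   # fixed biomass-per-substrate yield factor


def uptake_rate(substrate):
    '''Michaelis-Menten specific uptake rate at a given substrate level.'''
    return V_MAX * substrate / (K_HALF + substrate)


def simulate(biomass, substrate, days=10.0, dt=0.001):
    '''Forward-Euler integration of one functional group in a closed batch.'''
    for _ in range(int(round(days / dt))):
        consumed = uptake_rate(substrate) * biomass * dt
        consumed = min(consumed, substrate)  # substrate cannot go negative
        substrate -= consumed
        biomass += YIELD * consumed  # growth is tied to uptake by the yield
    return biomass, substrate


final_biomass, final_substrate = simulate(biomass=0.1, substrate=5.0)
print(final_biomass, final_substrate)
```

Because YIELD is a constant here, the sketch has no way of representing an energy gain that changes with local chemistry, which is the limitation of fixed yield factors noted above.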
In fact, the lack of a quantitative validation against process rate measurements or other proxies for activity (e.g., proteins) raises the question whether such pathway-centric models are adequate descriptions for microbial community metabolism.

1.3 Objectives and overview of this dissertation

In this dissertation, I examine the appropriateness of a pathway-centric paradigm for microbial ecology using microbial community profiling and mathematical modeling. Further, I examine concrete biogeochemical pathway-centric models for specific ecosystems, which I evaluate using chemical and molecular sequence data.

Specifically, in Chapter 2, I use metagenomic data from the Tara Oceans survey (460) to investigate the functional structure of bacterial and archaeal communities across the global ocean, as well as the taxonomic composition within each of 28 metabolically defined functional groups. For that purpose, I constructed a custom database for functional annotations of prokaryotic taxa (“FAPROTAX”) based on extensive literature search. Using this database, I find that environmental physicochemical conditions strongly predict the distribution of microbial metabolic functional groups, but only poorly predict the taxonomic composition within individual functional groups, in line with a pathway-centric paradigm.

In Chapter 3, I use metagenomic sequencing and 16S rRNA marker gene sequencing of bacterial and archaeal communities within the foliage of bromeliad plants, a model system for community ecology (117, 438), to show that similar aquatic environments can indeed sustain similar microbial metabolic networks, despite a highly variable taxonomic composition within individual metabolic functional groups. I then use statistical tools from community ecology to elucidate the potential mechanisms shaping the taxonomic composition within functional groups.
I find that deterministic biotic interactions and additional environmental filtering, but not random drift or dispersal limitation, likely underlie the observed taxonomic variation between bromeliad microbiomes.

In Chapter 4, I present a computational framework (“Microbial Community Modeler”, or MCM for short) that I developed for modeling microbial communities. This framework enables the construction of both pathway-centric models and cell-centric models (using genome-based metabolic cell models). I validate this framework by modeling previous laboratory evolution experiments with Escherichia coli (188, 262), during which an ancestral strain diversified into two coexisting ecotypes. The models are able to reproduce the successional dynamics in the evolution experiments, and yield detailed insights into the metabolic processes that drove bacterial diversification. Thus, apart from demonstrating the potential of MCM, this work provides a unifying quantitative perspective on a multitude of co-culturing experiments performed in our lab over the last 10 years.

In Chapters 5 and 6 I use MCM to examine specific mechanisms by which biotic interactions and functional redundancy could potentially affect community composition and metabolism, based on models for nitrifying as well as methanogenic bioreactors. I find support for the interpretation that biotic interactions, such as competition (Chapter 5) and predation by phages (Chapter 6), may underlie a large part of the taxonomic variation within functional groups reported in Chapters 2 and 3.
Further, these models explicitly demonstrate how a higher functional redundancy can lead to a decoupling between functional stability and taxonomic variability under constant environmental conditions.

Taken together, the work presented in Chapters 2, 3, 5 and 6 provides strong support for a pathway-centric paradigm, according to which the distribution and activity of metabolic pathways is strongly determined by physicochemical conditions, whereas additional mechanisms shape the taxonomic composition within metabolic guilds. These results motivated me to develop and test specific pathway-centric biogeochemical models for a series of natural as well as engineered ecosystems.

Specifically, in Chapter 7 I describe a biogeochemical model that I constructed for the oxygen-depleted water column in Saanich Inlet (540), a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZs). The model describes the activity and distribution of individual microbial metabolic pathways involved in carbon, nitrogen and sulfur cycling in Saanich Inlet and integrates geochemical depth profiles, process rate measurements as well as DNA, mRNA and protein sequence data.

In Chapter 8, I present an alternative pathway-centric modeling framework, in which the metabolic activity of a microbial community is described purely in terms of reaction rates and the “capacity” to perform particular reactions. The benefit of such a “reaction-centric” approach, when compared to models explicitly keeping track of cell (or gene) densities, is the reduced number of physiological parameters required for constructing a model and of biotic measurements required to estimate the current state of a system. I validate this approach using data from previous bioreactor experiments (94, 109).
Further, I examine how the co-occurrence of multiple pathways in the same organisms, rather than in separate organisms, can affect overall community metabolism and thus the accuracy of the pathway-centric paradigm.

Methodological details for each chapter are provided as supplemental material in Appendices A–G.

Chapter 2
The decoupling of function and taxonomy in the global ocean microbiome¹

2.1 Synopsis

Here we use statistical analyses of taxonomic and functional community profiles to determine the factors that shape marine bacterial and archaeal communities across the global ocean. Through an extensive literature search we classified >30,000 microbial organisms into metabolic functional groups, which allowed us to disentangle functional from taxonomic community variation. We find that environmental conditions strongly influence the distribution of functional groups by shaping metabolic niches, but barely influence the taxonomic composition within individual functional groups. Hence, the bulk of environmentally driven variation in community composition is attributable to functional properties, while the remaining variation is enabled by a high global functional redundancy across taxa.

2.2 Introduction

Microbial communities power global biogeochemical cycling and form the most important interface between abiotic and biotic processes on Earth (116). Bacteria and Archaea, in particular, drive marine nitrogen and sulfur cycling, thereby modulating global ocean productivity and climate (136).
Elucidating the processes shaping microbial communities over space and time presents a missing link towards understanding the integrated biotic-abiotic system we call our planet, and is key for predicting how biogeochemical cycles will change with changing environmental conditions.

¹ A version of this chapter has been published in Science (see the Preface for author contributions): Louca, S., Parfrey, L.W., Doebeli, M. (2016). Decoupling function and taxonomy in the global ocean microbiome. Science 353:1272–1277. DOI:10.1126/science.aaf4507.

The majority of microbial biogeographical studies focus on taxonomic community composition (306). Taxonomic community profiling can reveal intriguing variation between environments, and functional differences between organisms are generally thought to cause these patterns. Distantly related microbes, however, can often perform similar metabolic functions and, reciprocally, closely related taxa may occupy separate metabolic niches (308). This leads to a disconnect between taxonomic community structure and function (122, 466). As a consequence, the ecological reasons for the observed taxonomic variation usually remain unknown. Other studies directly estimate functional potential based on community gene content using environmental shotgun sequencing (metagenomics), and have indeed revealed strong correlations between the distribution of particular metabolic pathways and environmental conditions (99, 181, 381). This suggests that environmental conditions shape the functional potential of microbial communities in terms of community gene content by constraining metabolic niches. However, it is not yet known how this relates to community assembly rules, and which aspect of the variation in taxonomic composition is relevant to ecosystem functions.
In addition to these niche effects (also known as environmental filtering), microbial populations are subject to complex community-level processes such as predation or mutualistic interactions (258, 455), as well as to potential limits to their dispersal across spatial scales (306). Given this complexity, it is important to establish basic principles determining microbial community composition.

Here we present an analysis of over 100 bacterial and archaeal communities across the global ocean, combining taxonomic and functional community profiling to elucidate the role of environmental filtering, global functional redundancy and dispersal limitation in shaping natural microbial communities. Taxonomic profiles were generated based on shotgun DNA sequences of the 16S ribosomal gene, a standard marker gene in microbial ecology. Functional profiles were generated by associating individual organisms with metabolic functions of particular ecological relevance, such as photoautotrophy and sulfate respiration, using an annotation database that we created through extensive literature search. This information, which we validate by comparing it to metagenomic gene profiles, allowed us to explore multiple facets of microbial community structure (taxonomic composition, metabolic functional potential and taxonomic composition within individual functional groups) in relation to environmental conditions and geographical location.

2.3 Environmental conditions mainly affect microbial function

To assess the effects of environmental conditions on various aspects of community structure, we performed regression and correlation analysis of the relative abundances of metabolic functional groups, as well as the proportions of various operational taxonomic units (OTU) within each functional group, against 13 key abiotic oceanographic variables that included dissolved oxygen, salinity, temperature and depth (Table A.1).
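
To make the notion of predictive power in such a regression concrete, here is a minimal sketch of a cross-validated coefficient of determination, in which each sample is predicted by a model fitted without it. It assumes a single predictor and an ordinary least-squares line, which is a simplification for illustration and not necessarily the regression procedure used in this chapter; the function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    '''Ordinary least-squares slope and intercept for one predictor.'''
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, n_folds=5, seed=0):
    '''R2_cv = 1 - SS_res / SS_tot, using only held-out predictions.'''
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    ss_res = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Score the fitted line only on samples it has not seen.
        for i in held_out:
            ss_res += (ys[i] - (intercept + slope * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Applied to synthetic data, a response that tracks the predictor should score close to 1, whereas a response unrelated to it should score near zero or slightly below, since held-out predictions can be worse than simply predicting the mean.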
Both regression and correlation analyses generally revealed that environmental conditions had very strong effects on the functional profiles of microbial communities, but only minor effects on the taxonomic composition within each functional group. In particular, the cross-validated coefficient of determination (R²_cv, a measure of the predictive power of a model) for the relative abundances of most functional groups greatly exceeds the average R²_cv achieved for the OTU proportions within the same groups (Fig. 2.1A). Similarly, correlations between relative functional group abundances and environmental variables are generally greater in magnitude, compared to the correlations between OTU proportions within each group and environmental variables (Fig. 2.1B). These differences persist even when OTUs are combined at higher taxonomic levels (e.g., genus, family or order; Figs. A.1 and A.2). Hence, the poor correlation between the taxonomic composition within functional groups and environmental conditions is not due to a sub-optimal choice of taxonomic resolution, but rather reflects a lack of environmental effects on the non-functional variation in community composition.

Regression modeling of taxonomic profiles of entire communities (at any taxonomic resolution) against environmental variables achieved an average R²_cv (Fig. 2.5) that is lower than the R²_cv for the relative abundances of most functional groups, but higher than the mean R²_cv achieved for the taxonomic compositions within functional groups. This further supports the interpretation that deterministic environmental effects on function only partly shape overall taxonomic community structure, due to taxonomic variation within functional groups that is much less affected by environmental conditions. Accordingly, clustering samples by taxonomic as well as functional community composition (Bray-Curtis dissimilarity metric, Figs.
2.2BC and A.3) shows that a distinction between water column zones based on function is comparable in strength to a distinction based on taxonomy. In fact, the fraction of functional groups exhibiting statistically significant segregation between water column zones (e.g., mesopelagic vs surface) is usually higher than the fraction of significantly segregated OTUs or higher taxa (Figs. 2.2E–G). Hence, the bulk of deterministic variation in community composition across different zones is well captured by the variation of purely functional properties.

These results strongly suggest that environmental conditions influence microbial community structure in the global ocean primarily by shaping metabolic niches. In fact, we find that the productivity of a particular metabolic niche, represented here by the relative abundance of a functional group, generally only weakly influences the taxonomic composition within that niche (Fig. 2.3B). The importance of niche effects in structuring marine microbial communities is further underlined by our finding that OTUs sharing a higher number of functions tend to co-occur more frequently (Fig. 2.2H). In contrast, if competition and species assortment were dominant forces we would expect lower co-occurrences among OTUs with more shared functions. Similarly, metabolic niche effects were shown to dominate human gut microbiome assembly (271), suggesting that this may be a general pattern in natural microbial communities.

The decoupling between environmental conditions and niche productivity on the one hand, and the taxonomic composition within individual niches on the other hand, is consistent with previous smaller-scale observations.
For example, in a wastewater treatment plant the ratio of aerobic ammonia oxidizing bacteria and heterotrophic bacteria remained constant over time, while the taxonomic composition within each functional group varied markedly and appeared to be only weakly explained by environmental factors (349). Moreover, metatranscriptomics in two distinct ocean regions revealed strongly conserved diurnal cyclic succession patterns of community gene expression, despite largely non-overlapping taxonomic affiliations of transcripts between the two regions (25). Our work suggests that these previous observations are in fact just the tip of an iceberg. Energetic and stoichiometric constraints generally drive ocean microbial metabolic activity, but not the identity of the microbes involved in that activity.

2.4 Causes of variation within functional groups

If environmental conditions mainly interact with functional community structure, the question arises as to what drives the taxonomic variation within individual functional groups. High functional redundancy on a global ocean scale (Fig. 2.3) presumably enables the high diversity within functional groups and a decoupling between taxonomic composition and constraints on function. It is possible that the “unexplained” taxonomic variation within functional groups may be partly due to unconsidered physicochemical variables combined with unconsidered phenotypic differences, driving location-dependent growth differences between competing species. However, it is unlikely that latent environmental variables alone could explain the widespread apparent randomness seen within such a large number of functional groups (Fig. A.7). Alternatively, spatially limited dispersal is often suggested as a cause of random distribution patterns not attributable to environmental conditions.
Disper-sal limitation has been shown to be important for larger organisms such as plants (487), butits importance for microorganism is generally thought to be low compared to environmentalfiltering, and to depend strongly on the environments considered (e.g., lakes vs open ocean;(29, 306, 466)). To test whether the variation in taxonomic composition in the ocean com-munities considered here can be explained by dispersal limitation, we compared geographicalsample distances to dissimilarities in functional and taxonomic community composition, aswell as to differences in composition within individual functional groups. Mantel correlationtests between geographical distances and various dissimilarity metrics revealed no significantpositive correlations, neither at the level of the whole community (Figs. 2.4AB) nor withinfunctional groups (Figs. A.4). The apparent absence of a distance-decay of similarity is alsoreflected in ordination plots (Figs. 2.4CD), in which sample clustering appears to be inde-pendent of ocean region, with the notable exception that polar samples are clearly distinct.These observations suggest that range-limited random dispersal does not play a significantrole in shaping taxonomic community differences across the oceans, even when restrictedto within metabolic niches. In principle, dispersal limitation may cause a distance-decayof similarity at much smaller geographical scales than the ones considered here, and thusremain undetected by our analysis (307). This scenario, however, is unlikely for marine mi-croorganisms that can be dispersed at global scales by large ocean currents (125), and thatindeed appear to be recruited from a global marine microbial seed bank (147).Instead, the unexplained taxonomic variation may be driven predominantly by community-level processes such as metabolic interdependencies, chemical warfare, or predation by virusesand eukaryotes. 
Previous work has highlighted the central role of such processes in microbial community assembly (258, 276, 455). For example, adaptation of bacteriophages to specific hosts can cause continuous replacement of competing species (423) and promote spatial structuring of microbial communities (462). Hence, functional community structure and taxonomic composition within functional groups appear to constitute roughly complementary axes of variation, with the former being affected more strongly by environmental conditions and the latter shaped by community-level processes. In this perspective, the observed overall microbial community structure would result from the superposition of both environmental constraints and community-level processes (276, 455).

2.5 Beyond taxonomic profiling

Taxonomic microbial community profiles can reveal intriguing differences between environments, for example across mammalian guts (335) or along the ocean water column (460). However, microbial taxonomy and metabolic potential can exhibit significant inconsistencies (3); the extent of these inconsistencies is strongly trait-dependent and often considerable (308). Inconsistencies between taxonomy and metabolic potential are driven by diverse evolutionary processes including adaptive loss of function (165, 334) and metabolic convergence of distinct clades accelerated by frequent horizontal gene transfer (116). As a result, many metabolic phenotypes can be shared by distant microbial clades (Figs. 2.3A,C) and, reciprocally, members of the same clade can fill separate metabolic niches (165). This misalignment between taxonomy and metabolic potential complicates the mechanistic interpretation of taxonomic biogeographical patterns and the development of predictive ecological theories (381).
Thus, while modern marker-gene sequencing techniques can yield detailed taxonomic community profiles, they suffer from the crucial limitation that these high-dimensional profiles are obtained along axes of sub-optimal ecological relevance. Sophisticated statistical techniques such as principal coordinate analysis or multidimensional scaling are often used to reduce the dimensionality of these data, in the hope of detecting axes of variation bearing an ecological meaning (382). Alternatively, detected species may be combined at higher taxonomic levels that more closely resemble the depth at which traits vary across lineages. However, the optimal taxonomic level is highly trait-dependent (308).

Here we have shown that a straightforward but thorough binning of organisms into functional groups can reveal strong patterns relevant to biogeochemistry and ecosystem function. In particular, the comparison of functional profiles to environmental variables (Fig. 2.1) yields insight into the processes driving variation in community composition along geochemical gradients and, reciprocally, gives information about the effects of that variation on ecosystem processes. For example, our correlation analysis revealed a particularly strong influence of water depth on function (Fig. 2.1B), which is also reflected in the clear separation of the mesopelagic zone from the upper sunlit zones (surface and deep chlorophyll maximum; Figs. 2.2A,B). This functional zonation with depth is consistent with metagenomic profiles of the same samples (Fig. A.5). The partitioning of function along the water column is presumably driven by depth-dependent factors known to influence microbial life history strategies and productivity, such as light intensity and temperature (236). Furthermore, in deeper zones oxygen becomes a limiting resource for respiration that can be partly replaced by alternative electron acceptors such as nitrate and sulfate (531).
Such redox gradients underlie the spatial zonation of metabolic pathways frequently observed in gene-centric metagenomic, metatranscriptomic and metaproteomic profiles across ecosystems (99, 181). Accordingly, our functional profiles reveal that deeper samples exhibit an over-representation of several groups capable of fermentation as well as nitrate and sulfate respiration (Fig. 2.2A). These alternative metabolic modes lead to an increased community richness (in terms of detected functional groups and OTUs), especially in the mesopelagic zone (Fig. 2.2D, after rarefying at equal sequence count). While an increase of taxonomic richness with depth has been noted previously (368, 460), our analysis now provides evidence that this richness gradient is strongly related to the number of available metabolic niches. Hence, the systematic integration of taxonomic and functional information demonstrated here can help answer long-standing questions regarding the relation between microbial taxonomic and functional diversity and variability in a given environment (283). Similarly, functional group co-occurrence patterns (e.g., Fig. A.6) may trigger novel hypotheses on the interaction of metabolic pathways at ecosystem scales.

The translation of taxonomic information into functional profiles based on phenotypic characterizations, as demonstrated here, provides a powerful complement to gene-centric metagenomics, the current de facto standard for functional community profiling (486). Metagenomic profiles suffer from the conceptual limitation that community gene content generally does not directly and unambiguously translate to functional potential (376). Phenotype-based profiles on the other hand have the potential to resolve ambiguities inherent to metagenomics, because experimental evidence is used to identify the actual metabolic capabilities of organisms (see supplementary text for examples).
We emphasize that the full potential of phenotype-based profiling remains underutilized due to our current inability to associate several detected taxa with any functional group. For example, a large fraction of the ubiquitous but poorly studied Thaumarchaeota phylum potentially involved in ammonia oxidation (441) was excluded from our functional annotations. Similarly, microeukaryotes such as fungi likely contribute to some of the considered metabolic functions, including cellulose and chitin degradation (91). The restriction to Bacteria and Archaea may thus explain our inability to relate the distribution of these functional groups to environmental conditions (Fig. 2.1). Future functional profiling will greatly benefit from an inclusion of eukaryotic microorganisms and from an effort to phenotypically characterize underexplored (e.g., hard-to-culture) clades of potentially high relevance to ecosystem functioning.

2.6 Conclusions

Despite an enormous microbial diversity, the bulk of global biogeochemical fluxes is driven by a core set of metabolic pathways that evolved in response to past geochemical conditions (116). Through time, these pathways have spread across microbial clades that can co-occupy and compete within metabolic niches. The decoupling of taxonomic composition within metabolic niches from environmental conditions, as demonstrated here, suggests that Baas Becking’s famous hypothesis “everything is everywhere and the environment selects” (28) should be refined towards the more conservative formulation that “every function is everywhere, and the environment selects” (381). This realization has implications for the interpretation of differences in community structure across environments or across time, because differences in taxonomic composition that do not affect functional composition have little relevance to ecosystem processes (122).
Functional descriptions of microbial communities should therefore constitute the baseline for interpreting biogeographical patterns, particularly across transects where geochemical gradients shape microbial niche distribution (162, 386). The remaining variation within functional groups can then be analyzed separately to elucidate additional community assembly processes. As the number of phenotypically characterized organisms increases, the potential of functional annotations of microbial taxa can only further improve. An incorporation of global microbial functional profiles, and of their response to potentially changing environmental conditions, into future biogeochemical models will greatly benefit reconstructive and predictive modeling of Earth’s elemental cycles.

[Figure 2.1 image. Recoverable labels: panel A, R²_cv for relative functional group abundances and for OTU proportions within functional groups; panel B, correlations to environmental variables (apparent oxygen utilization, daily insolation, depth, distance to thermocline, duration of day, nitrate, oxygen, pH, phosphate, salinity, silicate, temperature, total inorganic carbon).]

Figure 2.1: Linking functional and taxonomic composition to environmental conditions. (A) Cross-validated coefficients of determination (R²_cv) for relative functional group abundances (left bars), as well as for OTU proportions within each functional group (right bars), achieved by regression models with environmental predictor variables. (B) Spearman rank correlations between environmental variables and relative functional group abundances (left box) or OTU proportions within each group (right box).
Circle surface area and color saturation are proportional to the absolute correlation.

[Figure 2.2 image. Recoverable labels: panels A–H; water column zones SRF, DCM, MIX, MES; MDS stress values 0.067 and 0.25; OTU richness; functional richness; fraction segregated (0.0 to 1.0) for phyla, classes, orders, families, genera, OTUs and functional groups, shown for DCM vs MES, DCM vs SRF and MES vs SRF.]

Figure 2.2: Environmental filtering of microbial communities in the global ocean. (A) Functional community profiles, with samples ordered according to water column zone (SRF: surface water layer; DCM: deep chlorophyll maximum; MIX: subsurface epipelagic mixed layer; MES: mesopelagic zone). A darker color corresponds to a higher relative abundance of a functional group. (B,C) Metric multidimensional scaling of microbial communities (one point per sample), based on Bray-Curtis dissimilarities in terms of (B) functional groups and (C) OTUs. Points in greater proximity correspond to more similar communities. (D) Community richness in terms of functional groups and OTUs (one point per sample), after rarefaction. (E–G) Fraction of OTUs, taxa or functional groups significantly segregated between water column zones (E: DCM vs MES, F: DCM vs SRF, G: MES vs SRF). (H) Box-plots of pairwise Spearman rank correlations between relative OTU abundances, depending on the number of shared functions. Vertical bars show 66% percentiles.

[Figure 2.3 image. Recoverable labels: panels A–C; number of taxa represented per function (logarithmic axis, 1 to 100000); taxonomic levels OTU, genus, family, order, class, phylum; OTU proportions; total aerobic ammonia oxidizers; taxa per function.]

Figure 2.3: Functional redundancy in the global ocean microbiome. (A) Number of bacterial and archaeal taxa represented within each functional group (one point per group), at various taxonomic levels.
At the species and genus level, aerobic chemoheterotrophs represent by far the richest group. (B) OTU proportions within the group of aerobic ammonia oxidizers (one color per OTU). Samples are sorted according to the relative abundance of the entire functional group. For OTU proportions within other functional groups, see Fig. A.7. (C) Association of functional groups (columns) with members of microbial classes (rows). A darker color corresponds to a higher relative contribution of a class (in terms of the number of OTUs) to a functional group. Rows and columns are sorted according to the number of non-zero entries within them.

[Figure 2.4 image. Recoverable labels: panels A–D; x-axis: geographical distance (10³ km, 0 to 20); MDS stress values 0.062 and 0.20; plot imprints ρ = 0.09, P = 0.17 and ρ = 0.03, P = 0.38.]

Figure 2.4: Community differences vs geographical distance. (A,B) Bray-Curtis dissimilarities between microbial communities, compared with geographical distances (one point per sample pair). Community dissimilarities are calculated in terms of (A) relative functional group abundances and (B) relative OTU abundances. Plot imprints indicate Spearman rank correlations and their statistical significance. Samples are restricted to the mesopelagic zone; for other water column zones see Fig. A.8. For other taxonomic resolutions (e.g., at genus or family level) see Fig. A.13. (C,D) Metric multidimensional scaling of microbial communities (one point per sample), based on Bray-Curtis dissimilarities in terms of (C) functional groups and (D) OTUs. Points in greater proximity correspond to more similar communities. Points are shaped and colored by ocean region.

[Figure 2.5 image. Recoverable labels: taxonomic resolutions class, order, family, genus, species; y-axis: R²_cv (0.0 to 0.5); title: Mean regression R²_cv of full OTU table vs metadata.]

Figure 2.5: Regression of taxonomic community composition.
Mean cross-validated coefficients of determination (R²_cv) for relative taxon abundances at the community level (for various taxonomic resolutions), achieved by regression models with environmental predictor variables.

Functional stability and taxonomic variability in bromeliad microbiomes

Chapter 3

Functional stability despite high taxonomic variability across microbial communities in bromeliad tanks¹

3.1 Synopsis

According to the pathway-centric paradigm suggested in the previous chapter, the metabolic functional structure of microbial communities is strongly shaped by environmental conditions constraining the metabolic pathways that sustain growth. Apart from metabolic niche effects, however, several additional processes such as biotic interactions or dispersal limitation can influence overall community composition. Thus, similar habitats could exhibit very different microbial communities despite a similar functional structure. To test this prediction, we determined the bacterial and archaeal community composition in 22 replicate “miniature” aquatic ecosystems, contained within the foliage of wild bromeliads. We used 16S rDNA marker gene sequencing for inferring the taxonomic composition within 9 metabolically defined functional groups, as well as shotgun environmental DNA sequencing for estimating the overall abundances of these groups. We find that all bromeliads exhibit remarkably similar functional community structure, but a highly variable taxonomic composition within individual functional groups. Using a variety of statistical analyses from community ecology, we find evidence that the taxonomic turnover within functional groups is driven by a combination of environmental filtering and biotic interactions.
We find no effect of dispersal limitation or random population drift on community composition, and conclude that complex deterministic processes, rather than neutral assembly, shape community variation within functional groups.

¹ A version of this chapter is currently under review for publication (see the Preface for author contributions): Louca, S., Jacques, S.M.S., Pires, A.P.F., Leal, J.S., Srivastava, D.S., Parfrey, L.W., Farjalla, V.F., Doebeli, M. (in review). Functional stability despite high taxonomic variability across microbial communities.

3.2 Introduction

Microbial metabolism drives the bulk of biogeochemical fluxes in virtually every natural ecosystem and has shaped Earth’s surface chemistry through geological time (116). Natural microbial communities can display complex variation in composition across space or time, such as through the ocean water column (460) or across seasons (540), and this variation can have profound effects on ecosystem functions (500, 540). The mechanisms driving this variation remain poorly understood, because the entanglement of multiple mechanisms severely complicates the identification of direct causal relationships (258). Potential mechanisms of microbial community assembly suggested previously include adaptation to local environmental conditions (“environmental filtering”; 371), biotic interactions such as predation or syntrophy (258, 276, 455), random population drift (349), random colonization order (“lottery effects”; 52) and spatially limited random dispersal (306). Recent work suggests that the bulk of environmentally driven variation in the global ocean microbiome is closely related to its metabolic function, while the taxonomic variation within individual functional groups is only poorly explained by environmental conditions (289, 381).
This points towards a promising and elegant paradigm for microbial ecology, in which community function is strongly shaped by energetic and stoichiometric constraints such as the availability of light or electron acceptors for respiration (25, 381), while the composition within functional groups is modulated by additional deterministic or stochastic mechanisms. According to this paradigm, one would predict that physicochemically similar environments will promote similar functional community structure, while allowing for strong taxonomic variation within individual functional groups. This prediction is supported by observations in engineered ecosystems such as bioreactors exhibiting strong taxonomic fluctuations while maintaining constant biochemical performance (122, 495), but it remains largely untested for natural microbial communities.

Here we test this prediction in natural prokaryote (i.e., bacterial and archaeal) communities across 22 replicate natural aquatic environments, harbored within the foliage (“tanks”) of bromeliads in the Jurubatiba coastal sand dune National Park, Brazil (Fig. 3.1). Bromeliad tanks accumulate rain water and organic material (such as dead leaves) from their surrounding environment, and intense decomposition of this material sustains a high richness of microorganisms and macroinvertebrates (152, 395). Apart from constituting regional biodiversity hotspots, bromeliads are often used as “miniature” model systems for microbial and invertebrate ecology (117, 438). Microbial communities in bromeliads tend to be highly distinct from the surrounding environments (e.g., soil), exhibiting a strong shift towards fermenting and methanogenic organisms (151, 152, 304).

To ensure a high similarity between systems, we only surveyed mature plants of a single bromeliad species (Aechmea nudicaulis) from the same region (296).
We used amplicon DNA sequencing of the 16S ribosomal gene, a standard marker gene in microbial ecology (536), to estimate the taxonomic richness and variability within 9 metabolically defined functional groups of potential ecological importance in bromeliads, such as fermentation, dissimilatory reduction of nitrogen compounds (“nitrogen respiration”) and methanogenesis (151, 152, 345). Detected taxa were assigned to these metabolic functional groups, whenever possible, based on available literature. In parallel, we used environmental shotgun DNA sequencing (metagenomics) to estimate the overall abundance of each functional group in terms of one or multiple proxy genes. We find that all communities exhibited a remarkably similar functional composition, which contrasts with a highly variable taxonomic composition within individual functional groups. Further, we examined phylogenetic community structure and species co-occurrence patterns, and compared community composition to abiotic environmental conditions and geographical location, to elucidate potential mechanisms driving this variation within functional groups.

3.3 Results and discussion

3.3.1 Functional stability contrasts with taxonomic variability

We found that the metabolic functional potential of tank prokaryotic communities, as measured by gene abundance profiles, is consistent across bromeliads (Fig. 3.2A,B). This consistency in metabolic functional potential is presumably promoted by strong stoichiometric balancing between coupled metabolic pathways, the majority of which serve to break down large organic compounds to simpler organic molecules and gradually move electrons from reduced organic carbon to terminal electron acceptors such as protons (H⁺), carbon dioxide (CO₂), sulfate (SO₄²⁻), nitrate (NO₃⁻) and oxygen (O₂) (56).
These metabolic pathways are distributed across multiple organisms and link the breakdown of dead organic matter captured in the bromeliads to the eventual release of CO₂ (22), methane (CH₄) (304) and presumably molecular nitrogen (N₂). Each step along these pathways thus appears to sustain highly constrained microbial productivities, resulting in specific proportions of metabolic functional groups that are conserved across bromeliads.

On the other hand, we find that the taxonomic composition within individual functional groups is highly variable across bromeliads, both in terms of the occurrence of operational taxonomic units (OTUs, at 99% 16S rDNA similarity; see the Methods for justification) as well as their relative abundances (Figs. 3.2C–K). For example, within any given functional group, any two bromeliads share only ∼20–60% of their OTUs (Table B.1), and this overlap is significantly lower than would be expected solely due to insufficient sampling effort (P < 0.001). In fact, within any given functional group, OTUs detected in all of the samples (“core microbiome”) only make up ∼0–1% of total OTUs across all samples (“regional pool”). Further, coefficients of variation for OTU proportions within functional groups are typically an order of magnitude higher than coefficients of variation of relative gene abundances (∼2–3 vs ∼0.2–0.6, respectively; Table B.2). This taxonomic variability within functional groups persists to a considerable extent even when OTUs are combined at higher taxonomic levels (e.g., genus, family, order or class level; Figs. B.1, B.2, B.3 and B.4) and is in strong contrast to the much more stable relative gene abundances.
Hence, in each bromeliad the same metabolic niches appear to be occupied by very different species assemblages, even if the occupancy of each niche (in terms of its relative abundance) remains almost unchanged. This variability within metabolic niches explains the previously observed strong variation in overall microbial community composition across bromeliads (117) and underlines the fact that high taxonomic variability between replicate ecosystems need not imply differences in metabolic function.

3.3.2 Causes of variation within functional groups

The strong taxonomic variability within functional groups is presumably enabled by a high functional redundancy in the regional microbial species pool (Fig. 3.3), allowing for potential colonization of each bromeliad by multiple metabolically similar OTUs. Although we do not yet know the precise mechanisms determining the subset of OTUs that eventually establish in each bromeliad and within each metabolic niche, we can discount certain explanations. For example, random population drift combined with random dispersal within the sampled area would result in negligible associations between OTUs and be perceived as a random subsampling of the regional OTU pool. To test this scenario for each functional group, we compared OTU co-occurrences, as defined by their “C-score” (a measure for mutual OTU segregation, averaged over all OTU pairs; 159), to a null model corresponding to random OTU sampling from the functional group’s regional pool. The null model preserved the total number of OTUs per sample and per functional group as well as the total number of samples containing each OTU, in order to avoid spurious co-occurrence patterns caused by differences in OTU richness or OTU frequency.
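As an illustration of the comparison just described, the following Python sketch computes a mean C-score from an OTU presence/absence matrix and compares it to a null model in which the number of OTUs per sample and the number of samples per OTU are both held fixed. It is a minimal sketch under stated assumptions: the function names, the checkerboard-swap randomization and the default numbers of swaps and null matrices are hypothetical choices made here, since the text does not specify how the null matrices were generated.

```python
import numpy as np

def c_score(presence):
    # presence: 0/1 matrix, rows = OTUs, columns = samples
    P = np.asarray(presence, dtype=int)
    occupancy = P.sum(axis=1)            # number of samples containing each OTU
    shared = P @ P.T                     # number of samples shared by each OTU pair
    # checkerboard units per pair: (r_i - S_ij) * (r_j - S_ij)
    units = (occupancy[:, None] - shared) * (occupancy[None, :] - shared)
    upper = np.triu_indices(P.shape[0], k=1)
    return units[upper].mean()           # average over all OTU pairs

def swap_randomize(presence, n_swaps, rng):
    # Assumed null model: repeated 2x2 checkerboard swaps, which leave
    # every row sum and every column sum unchanged.
    P = np.array(presence, dtype=int)
    n_rows, n_cols = P.shape
    for _ in range(n_swaps):
        r = rng.choice(n_rows, size=2, replace=False)
        c = rng.choice(n_cols, size=2, replace=False)
        sub = P[np.ix_(r, c)]
        is_checkerboard = (sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0]
                           and sub[0, 0] != sub[0, 1])
        if is_checkerboard:
            P[np.ix_(r, c)] = 1 - sub    # flip the 2x2 block
    return P

def c_score_test(presence, n_null=200, n_swaps=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = c_score(presence)
    null = np.array([c_score(swap_randomize(presence, n_swaps, rng))
                     for _ in range(n_null)])
    # one-sided P value: fraction of null matrices at least as segregated
    p_value = (np.sum(null >= observed) + 1) / (n_null + 1)
    return observed, p_value
```

A C-score above the null expectation indicates that OTUs are more segregated across samples than expected from random subsampling with the same richness and frequency constraints.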
Within 6 out of 9 functional groups (aerobic chemoheterotrophs, cellulose degraders, fermenters, nitrogen respirers, photoautotrophs and sulfate respirers), we find that OTUs are significantly co-segregated with respect to each other, that is, C-scores are higher than expected by chance (P < 0.05, Table B.3). The remaining functional groups also display OTU segregation, although differences from the null model are not statistically significant. This general segregation of OTUs beyond the null expectation rules out random subsampling and drift as important causes of OTU turnover within functional groups. We note that, when combined with spatially limited dispersal, neutral population drift could in principle produce non-random OTU co-occurrence patterns (490), because bromeliads in greater proximity would tend to exhibit more similar (i.e., correlated) community composition. However, spatially limited dispersal is likely not important at this scale: we did not find any significant correlations between geographical distance and community dissimilarity for any of the functional groups and for any of the considered dissimilarity metrics (Mantel tests with Spearman rank correlations; Fig. B.5; detailed results in Table B.4). In fact, bromeliads at opposite ends of our study site often contained more similar communities than immediately adjacent bromeliads (Fig. B.6). These results are consistent with previous work that found negligible effects of spatial distance on bacterial, zooplankton and macroinvertebrate communities in bromeliads at similar spatial scales (117).
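A Mantel test with Spearman rank correlations, as referred to above, can be sketched as follows. This is a minimal illustration and not the analysis code used in this work: the function names, the one-sided permutation P value (testing for a positive distance-decay relationship) and the number of permutations are assumptions.

```python
import numpy as np

def average_ranks(x):
    # Ranks starting at 1, with tied values receiving their mean rank
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    starts = ends - counts
    mean_rank = (starts + ends + 1) / 2.0
    return mean_rank[inverse]

def mantel_spearman(dist_a, dist_b, n_perm=999, seed=0):
    # dist_a, dist_b: symmetric distance matrices over the same samples,
    # e.g. geographical distances and community dissimilarities
    A = np.asarray(dist_a, dtype=float)
    B = np.asarray(dist_b, dtype=float)
    n = A.shape[0]
    upper = np.triu_indices(n, k=1)      # each sample pair counted once
    ranks_a = average_ranks(A[upper])

    def rank_correlation(matrix_b):
        # Spearman correlation = Pearson correlation of the ranks
        return np.corrcoef(ranks_a, average_ranks(matrix_b[upper]))[0, 1]

    observed = rank_correlation(B)
    rng = np.random.default_rng(seed)
    n_as_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)        # relabel samples in one matrix only
        if rank_correlation(B[np.ix_(perm, perm)]) >= observed:
            n_as_extreme += 1
    p_value = (n_as_extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Permuting sample labels, rather than individual matrix entries, respects the fact that the pairwise distances involving a given sample are not independent of each other.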
Hence, the OTU co-occurrence patterns observed here likely reflect a deterministic mutual exclusion between OTUs that is potentially caused by environmental filtering or biotic interactions (159), rather than spatially correlated or uncorrelated random assembly.

To further verify the importance of deterministic assembly processes, we examined the phylogenetic structure within functional groups in each sample. Specifically, within any given functional group, we assessed whether OTUs found in the same sample tend to be phylogenetically underdispersed (“clustered”) or overdispersed in terms of their mean phylogenetic distance, when compared to the expectation based on random OTU sampling from the regional pool of that functional group. Conventionally, underdispersion is interpreted as a sign of environmental filtering acting similarly on closely related clades (196), while overdispersion is interpreted as a sign of increased competition between close relatives, although alternative mechanisms, such as specialist predation by phages, may also create non-random patterns (362). Of the 9 functional groups, we find that 4 show a significant tendency towards underdispersion and 2 functional groups demonstrate overdispersion (P < 0.05, Fig. 3.4). The detection of a statistically significant phylogenetic structure in 6 out of 9 functional groups is unlikely to be the result of false positive detections (P < 0.000001). This supports our previous interpretation that community assembly is generally not random within functional groups, but is subject to deterministic processes that are sensitive to phylogenetic relationships. Note that an absence of phylogenetic structuring, on the other hand, does not rule out deterministic processes (362).
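The dispersion test described in this paragraph can be sketched as below. The sketch assumes that a matrix of pairwise phylogenetic distances between all OTUs of the regional pool is already available; the function names, the standardized effect size and the two one-sided P values are hypothetical choices made for illustration and are not taken from the analysis itself.

```python
import numpy as np

def mean_pairwise_distance(dist, members):
    # Mean phylogenetic distance over all pairs of OTUs in one sample
    idx = np.asarray(members)
    sub = dist[np.ix_(idx, idx)]
    upper = np.triu_indices(len(idx), k=1)
    return sub[upper].mean()

def mpd_dispersion(dist, members, n_null=999, seed=0):
    # dist: pairwise phylogenetic distances within the regional pool
    # members: indices of the pool OTUs detected in the focal sample
    dist = np.asarray(dist, dtype=float)
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_distance(dist, members)
    pool_size = dist.shape[0]
    # Null expectation: draw the same number of OTUs at random from the pool
    null = np.array([
        mean_pairwise_distance(
            dist, rng.choice(pool_size, size=len(members), replace=False))
        for _ in range(n_null)
    ])
    ses = (observed - null.mean()) / null.std(ddof=1)
    p_under = (np.sum(null <= observed) + 1) / (n_null + 1)  # clustering
    p_over = (np.sum(null >= observed) + 1) / (n_null + 1)   # overdispersion
    return ses, p_under, p_over
```

Under these conventions a negative effect size corresponds to underdispersion (co-occurring OTUs are more closely related than random draws) and a positive one to overdispersion.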
Moreover, the fact that some groups exhibit phylogenetic underdispersion, while others exhibit phylogenetic overdispersion, suggests that different ecological processes influence phylogenetic structure in different functional groups. In particular, the strong overdispersion of methanogens as well as methylotrophs suggests that competition may correlate strongly with relatedness in these groups and that other factors, such as environmental filtering, may be unimportant or only weakly correlate with phylogeny. On the other hand, frequent horizontal transfer of genes for the degradation of particular organic compounds reduces the correlation between phylogenetic relatedness and metabolic similarity in aerobic chemoheterotrophs and fermenters (308). This may explain why in these two functional groups mechanisms causing underdispersion, rather than overdispersion, seem to dominate.

The above findings suggest that, although taxonomic composition within functional groups is highly variable, it is not random in terms of OTU co-occurrences or phylogenetic relationships. This determinism might be caused by environmental filtering, by biotic interactions, or by a combination of these, such as trade-offs between environmental stress tolerance and competition (159, 362). For example, recent work demonstrates that microbial communities can exhibit complex but deterministic responses to extremely weak environmental fluctuations (130), and that environmental turnover, when carefully characterized, can explain community turnover (371, 450). To determine whether environmental filtering partly drives the variation in OTU composition within functional groups, we examined the predictive ability of an extensive set of physicochemical variables (overview in Table B.5).
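Predictive ability of this kind is summarized in this work by the cross-validated coefficient of determination (R²_cv). A minimal sketch of how such a value could be computed is given below, assuming k-fold cross-validation and ordinary least squares regression; the regression models actually used are not specified in this passage, and the function name, fold count and seed are hypothetical.

```python
import numpy as np

def cross_validated_r2(X, y, n_folds=5, seed=0):
    # X: samples x environmental variables (2-D); y: one response per sample,
    # e.g. the proportion of one OTU within its functional group
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    predictions = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Fit intercept and slopes on the training samples only
        design = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        # Predict the held-out samples, which the fit has never seen
        test_design = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        predictions[test_idx] = test_design @ coef
    residual = np.sum((y - predictions) ** 2)
    total = np.sum((y - y.mean()) ** 2)
    return 1.0 - residual / total
```

Because every prediction is made for a sample excluded from the fit, the resulting value can be negative when the model predicts worse than the overall mean, which is one reason it is a stricter measure than the ordinary in-sample R².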
We considered standard limnological variables such as pH, salinity and multivariate characterization of dissolved organic carbon, as well as other potentially important variables such as detrital volume and vegetative cover (“shading”). Many of these variables are known to influence macroinvertebrate communities in bromeliads (102, 279, 296). We find that a subset of environmental variables (including pH, salinity, detrital volume and shading) exhibit high and statistically significant correlations to relative OTU abundances within several functional groups, suggesting that these variables may be particularly influential (Fig. 3.5A). Regression models generally exhibit low to moderate predictive power when compared against novel data, as indicated by cross-validated coefficients of determination (R²cv), although we note that predictive power varies greatly among different OTUs (Fig. 3.5B). We therefore conclude that the environmental variables considered here can explain some of the variation within functional groups, but that additional factors are also important. In fact, the predictive power of environmental variables is similarly low within functional groups showing either phylogenetic underdispersion or overdispersion (Fig. 3.4), indicating that the non-random phylogenetic community structure may be shaped mostly by biotic interactions rather than environmental filtering.

Taken together, our results suggest that, in addition to environmental filtering, biotic interactions also play a significant role in shaping these communities while maintaining functional similarity across bromeliads. The potential importance of biotic interactions, such as competitive exclusion, predation by phages or protists as well as metabolic interdependencies, in shaping microbial communities has been pointed out previously (258, 276, 455).
For example, adaptation of bacteriophages to specific hosts can strongly influence bacterial species composition (423) and promote spatial as well as temporal variation of microbial communities (397, 462). Consequently, seemingly random taxonomic variation across locations may result from biotic interactions driving complex but deterministic population dynamics or, alternatively, from biotic interactions generating complex community responses to subtle environmental variation (130, 450, 451). This conclusion is consistent with previous findings that the distribution of cyanobacterial taxa across coexisting bromeliads was driven by physicochemical factors as well as by protozoans and invertebrates (63). Here we have not considered possible effects of invertebrates, although we note that we detected almost no insects in the sampled bromeliads. Further, given the available data it is impossible to determine whether microbial communities within single bromeliads exhibited high temporal variability, or if communities were near steady state. Previous work shows that even strongly controlled engineered ecosystems can exhibit high temporal fluctuations in microbial taxonomic composition (122, 349) and that these fluctuations can be deterministic (130, 495). Hence, the highly variable community profiles observed here could be mere “snapshots” along similar successional trajectories far from steady state (285).

3.4 Conclusions

Here we have shown that replicate natural ecosystems exhibit highly variable taxonomic composition of bacterial and archaeal communities, despite very similar metabolic functional structure. This points to a fundamental and important difference between functional and taxonomic community structure, which arises because mechanisms leading to a convergence of functional structure (e.g., nutrient limitation, stoichiometric balancing between coupled metabolic pathways) do not necessarily lead to a convergence of taxonomic composition.
Reciprocally, strong taxonomic turnover may only weakly affect ecosystem functioning (254, 517; but see 454). We suggest that functional community profiles, either based on gene-centric metagenomics (255, 381) or on a functional classification of detected taxa (289), should be the baseline of microbial biogeographical studies, particularly in cases where geochemical gradients shape microbial niche distribution (381, 387). The residual variation within individual functional groups can then be extracted and analyzed separately, as demonstrated here, in order to elucidate additional community assembly processes that act in superposition to metabolic niche effects. Our analysis suggests that in bromeliad tank ecosystems the variation within individual functional groups is the result of multiple deterministic processes, including environmental filtering and biotic interactions, while random processes such as dispersal limitation or neutral population drift (427) appear to be less relevant. This is in line with recent work on the global ocean microbiome (289) and suggests a general paradigm for microbial ecology. A careful distinction between functional composition and taxonomic composition within functional groups thus enables deeper insight into microbial community assembly processes and will be an important step towards a truly mechanistic microbial ecology.

[Figure 3.1 image labels: Aechmea nudicaulis; scale bar 20 cm.]

Figure 3.1: Bromeliad species used in this study. Large picture: Aechmea nudicaulis, the bromeliad species considered in this study.
The foliage forms a deep central cavity (“tank”, small picture) that accumulates rainwater and dead organic material, such as leaves from nearby trees. The decomposition of this material sustains a highly productive and diverse food web inside the tank.

[Figure 3.2 panels: metagenomic profiles (proxy genes) and metagenomic profiles (rare proxy genes), showing relative gene abundances per bromeliad sample; OTU proportions per bromeliad sample for fermenters, aerobic chemoheterotrophs, nitrogen respirers, sulfate respirers, photoautotrophs, methanogens, ureolytic, methylotrophs and cellulolytic.]

Figure 3.2: Functional stability vs taxonomic variability. (A) Relative abundances of proxy genes in prokaryotic metagenomic sequences (genes grouped by function, one color per gene group, one column per sample). For details on associating genes with functions see the Methods. (B) Subplot of (A) focusing on the rarer genes for better illustration. (C–K) Prokaryotic OTU proportions within individual functional groups (one color per OTU, one column per sample, one plot per functional group), as determined from 16S rDNA sequences. Due to ambiguities in gene function, for some functional groups (D, H) we considered multiple proxy genes. For each functional group, proxy genes are indicated via color codes (corresponding to colors in A and B) next to the functional group’s name. For more detailed metagenomic profiles see Supplemental Fig. B.10. For the taxonomic composition within functional groups at higher taxonomic levels (genus, family or order) see Figs. B.1, B.2 and B.3.

[Figure 3.3 row labels, with the number of associated OTUs in brackets: aerobic chemoheterotrophy (225), fermentation (103), photoautotrophy (61), sulfate respiration (40), ureolysis (21), methylotrophy (24), nitrogen respiration (17), cellulolysis (14), methanogenesis (13).]

Figure 3.3: Functional redundancy in the regional OTU pool.
Associations of functional groups (rows) with OTUs (columns), indicated by blue table cells. Functional groups are sorted according to their number of OTUs (indicated in brackets). Some OTUs were associated with more than one functional group. For analogous plots at the genus, family and class level, see Figs. B.11, B.12 and B.13, respectively.

[Figure 3.4 horizontal axis: standardized effect size of mean phylogenetic distance, from underdispersion (clustering) to overdispersion. Rows: aerobic chemoheterotrophs (P<0.001), cellulolytic (P=0.86), fermenters (P<0.001), methanogens (P=0.02), methylotrophs (P=0.011), nitrogen respirers (P=0.62), photoautotrophs (P<0.001), sulfate respirers (P<0.001), ureolytic (P=0.089).]

Figure 3.4: Phylogenetic dispersion. Standardized effect sizes (SES) of mean phylogenetic distances between OTUs within individual functional groups (one point per sample, one row per functional group), as an indicator of phylogenetic overdispersion (SES > 0) or underdispersion (SES < 0). The vertical line at zero corresponds to the expectation under the null model, and is shown for reference. Functional groups displaying statistically significant overdispersion or underdispersion (i.e., a strong tendency towards positive or negative SES across samples, respectively) are highlighted in bold (P-values are given in brackets).

[Figure 3.5 panels: (A) average absolute correlations; (B) coefficients of determination R²cv.]

Figure 3.5: Relating OTU proportions to environmental variables. (A) Average magnitude (i.e., average absolute value) of correlations between OTU proportions within functional groups and measured environmental variables (one column per environmental variable, one row per functional group). A larger and darker circle corresponds to a larger average absolute correlation, and indicates a stronger relation between an environmental variable and a functional group’s taxonomic composition. Statistically significant correlations (P < 0.05) are written inside the circles.
(B) Distribution of cross-validated coefficients of determination (R²cv, a measure for a model’s predictive power) for regression models of OTU proportions within each functional group using environmental variables as predictors (one box per functional group). Horizontal bars comprise 95% of the R²cv across OTUs. The vertical grey line at zero is shown for reference.

Cell-metabolic models for microbial ecology

Chapter 4
Calibration and analysis of cell-metabolic models for microbial ecology¹

4.1 Synopsis

Microbial ecosystem modeling is complicated by the large number of unknown parameters and the lack of appropriate calibration tools. Here we present a novel computational framework for modeling microbial ecosystems, which combines genome-level cell models into a microbial community in which the metabolic activity of each cell can affect the shared metabolite pool and thus potentially the metabolism of other cells. The framework, which we called MCM (“Microbial Community Modeler”), automates statistical analysis and model calibration to experimental data. To demonstrate the potential of MCM, we examined the dynamics of a community of Escherichia coli strains that emerged in laboratory evolution experiments, during which an ancestral strain diversified into two coexisting ecotypes. We constructed a microbial community model comprising the ancestral and the evolved strains, which we calibrated using separate monoculture experiments. Simulations reproduced the successional dynamics in the evolution experiments, and pathway activation patterns observed in microarray transcript profiles. Our approach yielded detailed insights into the metabolic processes that drove bacterial diversification, involving acetate cross-feeding and competition for organic carbon and oxygen.

4.2 Introduction

Metabolic interactions are an emergent property of microbial communities (70, 333).
Even the simplest life forms can only be understood in terms of biological consortia characterized by shared metabolic pathways and distributed biosynthetic capacities (200, 238, 311). For example, glucose catabolism to carbon dioxide or methane is a multi-step process often involving several organisms that indirectly exchange intermediate products through their environment (442). Microbial communities are thus complex systems comprising several interacting components that cannot be fully understood in isolation. In fact, metabolic interdependencies between organisms are at least partially responsible for our current inability to culture the great majority of prokaryotes (410). Understanding the emergent dynamics of microbial communities is crucial to harnessing these multicomponent assemblages and using synthetic ecology for medical, environmental and industrial purposes (44).

¹ A version of this chapter has been published (see the Preface for author contributions): Louca, S., Doebeli, M. 2015. Calibration and analysis of genome-based models for microbial ecology. eLife. 4:e08208. DOI:10.7554/eLife.08208

Genome sequencing has enabled the reconstruction of full-scale cell-metabolic networks (184), which have provided a firm basis for understanding individual cell metabolism (107, 238, 496). Recent work indicates that multiple cell models can be combined to understand microbial community metabolism and population dynamics (70, 177, 238, 453, 543). These approaches assume knowledge of all model parameters such as stoichiometric coefficients, maintenance energy requirements or extracellular transport kinetics, a requirement that is rarely met in practice (118, 177). Experiments and monitoring of environmental samples could provide valuable data to calibrate microbial community models, e.g., via statistical parameter estimation, but appropriate tools are lacking.
So far, the standard approach has been to obtain each parameter through laborious specific measurements or from the available literature, or to manually adjust parameters to match observations (70, 177, 292). Furthermore, statistical model evaluation and sensitivity analysis are typically performed using ad hoc code, thus increasing the effort required for the construction of any new model. Consequently, the experimental validation of genome-based microbial community models and their application to biological questions are rare (177, 318).

We have developed MCM (Microbial Community Modeler), a mathematical framework and computational tool that unifies model construction with statistical evaluation, sensitivity analysis and parameter calibration. MCM is designed for modeling multi-species microbial communities, in which the metabolism and growth of individual cell species is predicted using genome-based metabolic models. Cells in the community interact in a dynamical environment in which metabolite concentrations and other environmental variables influence, and are influenced by, microbial metabolism. Unknown model parameters can be automatically calibrated (fitted) using experimental data such as cell densities, nutrient concentrations or rate measurements. To demonstrate the potential of MCM, we modeled a bacterial community that has emerged from in-vitro evolution experiments, during which an ancestral strain repeatedly diversified into two distinct ecotypes. Experiments with microbes have an established tradition as model systems for understanding ecological and evolutionary processes (112, 226). We show that the predictions derived from MCM are in very good agreement with the outcomes of several monoculture and co-culture experiments.
While the experimental results described below have been found over the course of several years (132, 188, 262, 436), it is only now that a mechanistic model has managed to unify them in a clear, unambiguous and synergistic manner. The analysis presented here thus provides a unifying quantitative interpretation for a large body of experimental work performed in our lab over the course of roughly a decade.

4.3 Model overview

In MCM, a microbial community model is a set of differential equations for the population densities of the cell species comprising the community and of the ambient concentrations of utilized nutrients (metabolites), coupled to optimization problems for the cell-specific rates of reactions involving these metabolites. Each cell is characterized by its metabolic potential, that is, the genetically determined subset of reactions it can catalyze, as well as any available metabolite transport mechanisms. The reaction rates and metabolite exchange rates (i.e., the metabolism) of each cell are assumed to depend on its metabolic potential as well as on the current environmental conditions, such as metabolite concentrations. Through their metabolism, in turn, cells act as sinks and sources of metabolites in the environment. Additional metabolite fluxes, such as oxygen diffusion from the atmosphere into the growth medium of a modeled bacterial culture, can be included in the model.

At any point in time, individual cell metabolism is determined using flux balance analysis (FBA) (354), a widely used framework in cell-metabolic modeling (70, 107, 129, 238, 496). In FBA, cell metabolism is assumed to be regulated in such a way that the rate of biosynthesis is maximized (119, 496). The chemical state of cells is assumed to be steady, leading to stoichiometric constraints that need to be satisfied for any particular combination of intracellular reaction rates. Reaction rates, on the other hand, are limited due to finite enzyme capacities.
Metabolite uptake/export rates can also be limited due to finite diffusion rates or limited transmembrane transporter efficiency. For example, uptake rates can be Monod-like functions of substrate concentrations (177, 292). Taken together, cell-metabolic potential, stoichiometric consistency, reaction rate limits and transport rate limits define the constraints of a linear optimization problem for each cell species at each point in time. The optimized biosynthesis rate is translated into a cell production rate by dividing by the cell’s mass, thus defining the species’ population growth (Fig 4.1).

The central assumption of individual cells maximizing biosynthesis, subject to environmental and physiological constraints, is rooted in the idea that evolution has shaped regulatory mechanisms of unicellular organisms in such a way that they strive for maximum growth whenever possible. Biosynthesis has been experimentally verified as an objective for Saccharomyces cerevisiae and Escherichia coli (51, 146, 176). The assumption of maximized biosynthesis is less valid for genetically engineered organisms or those exposed to environments that are radically different from the environments that shaped their evolution (417). Despite its limitations, FBA has greatly contributed to the understanding of several genome-scale metabolic networks and metabolic interactions between cells (70, 129, 177, 238, 354, 453). One advantage of FBA models over full biochemical cell models is their independence of intracellular kinetics and gene regulation, which limits the number of required parameters to stoichiometric coefficients and uptake kinetics.

The combination of FBA with a varying environmental metabolite pool, as implemented by MCM, is known as dynamic flux balance analysis (DFBA) (70, 177, 292).
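To make the coupling between growth optimization and a changing metabolite pool concrete, the following Python sketch implements a deliberately minimal DFBA-like loop for a single cell type. It is not MCM and does not use a genome-scale network: the linear program has a single variable (the growth rate), so its optimum can be written in closed form as the tighter of two Monod-limited uptake bounds. All parameter names and values are hypothetical and chosen only for illustration.

```python
def monod(v_max, k_half, conc):
    # Monod-like upper bound on a cell-specific uptake rate.
    return v_max * conc / (k_half + conc)


def max_growth(glucose, oxygen, p):
    # One-variable linear program: maximize the growth rate mu subject to
    #   glc_per_biomass * mu <= glucose uptake bound
    #   o2_per_biomass  * mu <= oxygen uptake bound
    # The optimum is set by whichever bound is tighter.
    glc_bound = monod(p['v_glc'], p['k_glc'], glucose)
    o2_bound = monod(p['v_o2'], p['k_o2'], oxygen)
    return min(glc_bound / p['glc_per_biomass'], o2_bound / p['o2_per_biomass'])


def simulate(p, biomass, glucose, oxygen, hours=10.0, dt=0.01):
    # Explicit Euler integration of biomass and metabolite concentrations.
    for _ in range(round(hours / dt)):
        mu = max_growth(glucose, oxygen, p)
        growth = mu * biomass * dt
        # Cells deplete glucose; clipping at zero guards against Euler overshoot.
        glucose = max(glucose - p['glc_per_biomass'] * growth, 0.0)
        # Oxygen is consumed by cells and replenished by diffusion towards saturation.
        supply = p['o2_transfer'] * (p['o2_sat'] - oxygen) * dt
        oxygen = max(oxygen + supply - p['o2_per_biomass'] * growth, 0.0)
        biomass += growth
    return biomass, glucose, oxygen


# Hypothetical parameters (rates per hour, concentrations in mM, biomass in g/L).
params = {'v_glc': 10.0, 'k_glc': 0.05, 'v_o2': 15.0, 'k_o2': 0.001,
          'glc_per_biomass': 11.0, 'o2_per_biomass': 20.0,
          'o2_transfer': 50.0, 'o2_sat': 0.2}
final_biomass, final_glucose, final_oxygen = simulate(params, 0.01, 1.0, 0.2)
print(final_biomass, final_glucose, final_oxygen)
```

In a genome-based model the closed-form minimum would be replaced by a linear program over all reaction rates, solved by a numerical solver at every time step, and several cell types would draw on the same metabolite pool.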
In contrast to conventional FBA, DFBA models are dynamical because cell densities and environmental metabolite concentrations both change with time, and the rate of change of each cell density and metabolite concentration depends on the current cell densities and metabolite concentrations (177, 292). Because metabolites can be depleted or produced by several cell species, the environmental metabolite pool mediates the metabolic interactions between cells (410). For example, oxygen uptake rates might depend on environmental oxygen concentrations, which in turn are reduced by cellular respiration. Similarly, cells might excrete acetate as a byproduct of glucose catabolism, which then becomes available to other cells. The metabolic optimization of individual cells striving for maximal growth, while modifying their environment, leads to non-trivial community dynamics that can include competition, cooperation and exploitation. The cell-centric nature of DFBA differs fundamentally from other flux balance analyses of microbial communities that assume an optimization of a community-wide objective such as total biomass synthesis (239, 453, 545). Such an assumption is, however, questionable from an evolutionary perspective and likely not appropriate for natural communities comprising several species. Community-level optimality will typically conflict with optimality for individual competing lineages, and configurations that optimize overall biosynthesis at the expense of individual “cooperators” would be vulnerable to exploitation (327).

Recent work suggests that DFBA is a promising approach to microbial ecological modeling (70, 177, 318). For example, Harcombe et al. (177) designed a computational tool (COMETS) based on DFBA, which was able to accurately predict equilibrium compositions of mixed bacterial cultures grown on petri dishes.
However, COMETS offers limited model versatility in terms of uptake and reaction kinetics and has only a few environmental feedback mechanisms (namely, varying extracellular metabolite concentrations). Furthermore, it assumes complete knowledge of all required model parameters and provides no generic statistical model analysis. Hence, while COMETS sets an important precedent, considerable work is still needed to make DFBA a practical approach in microbial ecosystem modeling. MCM extends Harcombe et al.’s framework to more versatile microbial ecological models that include arbitrary reaction kinetics (e.g., subject to product-inhibition) as well as dynamical environmental variables (e.g., pH) that influence, and are influenced by, microbial metabolism. In addition, MCM supports cell models in which internal molecules act as dynamical constraints that further restrict the FBA solution space, for example to account for post-transcriptional regulation or delays in enzyme synthesis (38). These so-called regulatory FBA models have been shown to improve the fidelity of conventional FBA models for E. coli and S. cerevisiae (81–83, 187); however, their application to microbial communities remains untested. MCM can statistically evaluate models against data, analyze their sensitivity to varying parameters (61), and estimate the uncertainty of model predictions in the face of stochasticity (171). Perhaps most importantly, MCM can automatically calibrate unknown model parameters to data, for example obtained from monoculture experiments (as demonstrated below), from bioreactor experiments involving multiple species (285) or from environmental samples of unculturable communities (Fig 4.2; see section C.1 for details).
MCM can thus be used to understand the dynamics of realistic microbial ecosystems, ranging from the soil or groundwater to mixed laboratory cultures and bioreactors.

4.4 Results and discussion

4.4.1 Successional dynamics of a microbial community

In a series of laboratory evolution experiments with E. coli (strain B REL606; 538) in glucose-acetate supplemented medium, two metabolically distinct strains consistently evolved from the ancestral (A) strain (188, 262, 437). When grown in monoculture with the same medium composition, all three strains exhibit diauxic growth curves with a fast glucose-driven growth phase followed by slower growth on acetate. However, the three strains differ in how efficiently they catabolize glucose and acetate: Strain SS (slow switcher) is a better glucose utilizer when compared to strain A, and the depletion of glucose only leads to a slow switch to acetate consumption. On the other hand, the FS (fast switcher) strain has evolved to be a better acetate utilizer, initiating acetate consumption at higher remnant glucose concentrations than strains A and SS. This acetate specialization is based on a tradeoff in the citric acid cycle and comes at the cost of being a less competitive glucose consumer.

Replicated serial dilution experiments starting with strain A monocultures have shown a consistent phenotypic diversification, involving an initial invasion of the SS phenotype and a subsequent invasion of the FS phenotype, leading to the eventual extinction or near-extinction of the ancestor and the stable coexistence of the SS and FS phenotypes (Fig 4.3) (188, 262, 437, 488). Genome sequencing revealed that this metabolic diversification can be attributed to point mutations in genes linked to glucose and acetate uptake kinetics and metabolism (188).
The successional dynamics of the three phenotypes are thus likely driven by adaptations to a changing metabolic niche space, defined by fluctuating glucose, acetate and, potentially, oxygen availabilities (188, 262, 488). An understanding of the underlying ecological processes would shed light on the ecology and evolution of natural microbial communities with shared catabolic pathways.

To mechanistically explain the observed community dynamics, we used MCM to construct a model comprising the ancestral and the two evolved E. coli types. By keeping track of pathway activation, cell densities, metabolic fluxes and nutrient concentrations, we gained detailed insight into the processes driving the successional dynamics of metabolic diversification.

4.4.2 Experimental calibration

Based on a published cell-metabolic template for the ancestral E. coli strain comprising over 2000 reactions (538), we first constructed three separate cell models for the phenotypes A, SS and FS, respectively. In these preliminary models, cells grew on a substrate pool that resembled previous batch-fed monoculture experiments with glucose-acetate supplemented minimal medium (262). Cell-specific oxygen, acetate and glucose uptake rate limits were Monod-like functions of substrate concentrations (114, 325). We calibrated several physiological parameters for each cell type to measured chemical concentration and cell density profiles, using least squares fitting (Fig 4.4). MCM automatically calibrates free parameters to data through an optimization algorithm that involves step-wise exploration of parameter space and repeated simulations (see Appendix C.1.2).

We then constructed the microbial community (MC) model by combining the three calibrated cell models into a community growing in a common substrate pool. The environmental context resembles Herron and Doebeli’s evolution experiments (188).
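The general idea of least squares calibration by step-wise exploration of parameter space and repeated simulation can be sketched in a few lines of Python. The example below is not the algorithm implemented in MCM (which is described in Appendix C.1.2); it is a simple pattern search, and a closed-form logistic growth curve stands in for a full model simulation. The function names, starting values and synthetic data are hypothetical.

```python
import math


def simulate(rate, capacity, n0, times):
    # Closed-form logistic growth, standing in for a full model simulation.
    return [capacity / (1 + (capacity / n0 - 1) * math.exp(-rate * t))
            for t in times]


def sum_of_squares(params, times, observed, n0):
    # Least squares objective: squared residuals between model and data.
    predicted = simulate(params[0], params[1], n0, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def pattern_search(params, times, observed, n0, step=0.5, tol=1e-6):
    # Step-wise exploration: rescale each parameter up and down by the
    # current relative step, keep any improvement, and halve the step
    # whenever no neighboring parameter set improves the fit.
    params = list(params)
    best = sum_of_squares(params, times, observed, n0)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] = params[i] * (1 + delta)
                score = sum_of_squares(trial, times, observed, n0)
                if score < best:
                    params, best, improved = trial, score, True
        if not improved:
            step /= 2
    return params, best


times = [0, 1, 2, 3, 4, 5, 6, 8]
observed = simulate(0.9, 2.0, 0.05, times)  # synthetic data with known parameters
fitted, residual = pattern_search([0.3, 1.0], times, observed, 0.05)
print(fitted, residual)
```

Because the synthetic data were generated without noise, the fitted growth rate and carrying capacity should approach the values used to generate them (0.9 and 2.0); with real measurements the residual would remain positive.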
The community model includes realistic oxygen depletion-repletion dynamics (167), glucose and acetate depletion by microbial consumption, as well as daily dilutions into fresh glucose-acetate supplemented medium at a ratio of 1:100. The microbial community initially consists mostly of type A (10^10 cells/L), while both SS as well as FS cells are assumed to be rare (1 cell/L). Because the model is deterministic, the invasion or extinction of each type only depends on its growth rate in a possibly changing environment, but not on random mutation events, nor on demographic stochastic fluctuations.

4.4.3 Predicting microbial community dynamics

Simulations of the MC model reproduced the successional dynamics observed in Herron and Doebeli’s experiments: An initial replacement of the ancestor by the SS type is followed by an invasion of the FS type, leading to the eventual coexistence of the SS and FS types and extinction of the ancestral strain (Fig 4.5A). Interestingly, FS can also invade in the absence of SS, although invasion then occurs much more slowly and FS reaches lower densities than in the presence of SS (Supplemental Fig. C.1). This is consistent with an early presence of the FS lineage at low densities in the evolution experiments (Fig 4.3), indicating that some of the first FS mutations already confer a slight advantage over the ancestor when FS is rare (188).

Time series of acetate concentrations (Fig 4.5B) link the observed successional dynamics of the three types to a gradually changing metabolic niche space: The replacement of type A by the more efficient glucose specialist SS leads to an accumulation of acetate and facilitates the invasion of the FS type. The specialization of the SS and FS types on glucose and acetate, respectively (Fig 4.6A), enables their long-term coexistence on glucose-acetate enriched medium through frequency dependent competition (132, 188, 262).
In fact, cell-specific acetate exchange rates reveal that the SS type temporarily excretes acetate during short intervals, which is concurrently and subsequently consumed by the FS type (Fig 4.5G). This periodic acetate cross-feeding is an evolutionarily emergent property of the microbial community (484). The temporary production of acetate by the SS type is consistent with previous SS-FS co-culture experiments, which revealed slightly increased acetate concentrations towards the end of the SS exponential growth phase (436). An evolved increase of acetate excretion by E. coli in glucose minimal medium has also been reported by Harcombe et al. (176).

It should be noted that cell metabolism depends on substrate concentrations and is subject to strong temporal variation. In particular, acetate excretion by SS cells correlates strongly with oxygen limitation (Figs. 4.5G,K). The excretion of acetate by E. coli as a byproduct of oxygen-limited glucose catabolism has been observed experimentally and explained using flux balance analysis (292). In the absence of oxygen limitation, complete aerobic glucose catabolism to carbon dioxide is preferred over incomplete glucose catabolism with acetate excretion. On the other hand, oxygen limitation leads to an energetic tradeoff between complete glucose catabolism and efficient oxygen utilization, resulting in the excretion of acetate.

Furthermore, the depletion of oxygen during cell growth makes oxygen a temporary limiting resource for all cells (Fig 4.5K). Shortly after dilution into fresh medium, the exponential growth of the SS type on glucose leads to a rapid drop of oxygen to nanomolar concentrations. Despite oxygen diffusion into the medium, oxygen remains at sub-saturation levels for several more hours because the slow-growing acetate-consuming FS cells still consume oxygen after the growth of SS cells has halted. Differences in SS and FS growth rates (Figs.
4.5C,E) thus mitigate competition for oxygen through temporal niche separation. Hence, oxygen likely plays an important role in the metabolic diversification, as previously hypothesized by Le Gac et al. (262). This shows that the splitting of metabolic pathways across specialists can be caused by the composite effects of competition for electron donors and electron acceptors.

Consistent with differential substrate usage, average cell-specific reaction rates (Fig 4.6B) reveal differences in pathway activation: The transformation of acetate into acetyl-CoA by acetyl-CoA synthetase (acs) is predicted to be decreased in type SS and increased in type FS, when compared to the ancestral type. Furthermore, the conversion of phosphoenolpyruvate to oxaloacetate (ppc), the conversion of phosphoenolpyruvate to pyruvate (pyk) and the decarboxylation of pyruvate to acetyl-CoA (pdh), linking the glycolysis pathway to the citric acid cycle, are all predicted to be upregulated in the SS type when compared to the FS type. Similar differences in pathway activation are also predicted during early exponential growth in monoculture (Fig 4.6C,D), because FS grows partly on acetate and SS excretes acetate (Fig 4.4F,J). Previous microarray profiles of mRNA concentrations during exponential growth in monocultures (262) found an upregulation of acetate consumption genes in FS and acetate excretion genes in SS compared to A, qualitatively confirming our predictions (Fig 4.6C,D). Interestingly, our simulations suggest a significant downregulation of glucose catabolism (pyk, pdh and ppc) in FS compared to A, which contradicts the transcript profiles (Fig 4.6D). This discrepancy may be explained by the fact that mRNA was harvested from well-aerated flasks, while the monoculture experiments (Fig 4.4) and evolution experiments (Fig 4.3) were performed in test tubes where oxygen can become limiting (10).
Oxygen becomes particularly scarce in the FS tubes (Fig 4.4K) and temporarily limits glucose catabolism, which would explain the strong downregulation not reflected in the transcript profiles (262). Furthermore, while broad pathway activation patterns could be qualitatively reproduced in our system, this might be harder in other cases due to post-transcriptional regulation or post-translational modifications (38).

The periodic (seasonal) changes in glucose and acetate concentrations in batch culture have previously been shown to promote coexistence of the SS and FS types, in analogy to the maintenance of phytoplankton diversity via fluctuations of resource availability (432, 436). Experiments with SS-FS batch co-cultures revealed that the SS type quickly dominates over the FS type, when restricted to the first glucose-rich season through frequent dilution into fresh growth medium. Reciprocally, when SS and FS are grown in solution resembling the second glucose-depleted acetate-rich season, the FS type quickly dominates over the SS type (436). Accordingly, in a full batch cycle the relative SS cell density has been shown to culminate within 4-8 hours and to gradually decrease afterwards (132, Fig 6B), consistent with our simulations (Fig 4.5D). Simulations of the SS and FS batch co-culture restricted to the first or second season, analogous to Spencer et al.’s experiments, reproduce these observations and verify the role of periodic variation of glucose and acetate concentrations in maintaining the coexistence of both types (Fig 4.7, see Appendix C.1 for details).

4.5 Conclusions

The models presented here make detailed predictions about the microbial dynamics in the considered experiments. First, after calibration the cell models largely explain the data from the monoculture experiments (Fig 4.4). Second, the predictions for pathway activation in the three strains (Fig 4.6) are qualitatively consistent with most transcription profiles.
Third, simulations of the microbial community consisting of all three strains (Fig 4.5) reproduce the successional dynamics of diversification observed in the evolution experiments (Fig 4.3). Fourth, simulations of the SS-FS co-cultures restricted to either the glucose-rich or glucose-depleted season reproduce the dominance of the SS or FS type, respectively (Fig 4.7), consistent with previous co-culture experiments. It is important to note that only data from monoculture experiments were used to calibrate the cell models for the three strains (A, SS and FS). In particular, no information from co-culture experiments was used in the setup of the microbial community model, and thus there was no a priori knowledge about what the emergent community dynamics would be. Hence, our work conceptually produced non-trivial predictions that could be compared to experimental observations, although all experiments had already been performed. Our work sheds light on the fundamental problem of metabolic diversification and the emergence of shared catabolic pathways. In particular, our microbial community model allowed quantitative predictions for the metabolic fluxes for each strain in co-culture, revealing temporary cross-feeding as an emergent property of the evolved community (484). Cross-feeding, conventionally seen as a beneficial interaction (333), thus emerged as a form of niche segregation driven by competition for organic carbon and oxygen. Because both evolved types prefer glucose whenever available at high concentrations, but exchange acetate under oxygen limitation, the community constantly switches between competitive and beneficial interactions. 
Natural microbial populations might thus also oscillate between negative and positive interactions, for example depending on oxygen levels. The models considered here were completely deterministic, in the sense that the growth and metabolic activity of each strain were completely determined by the conditions in the test tubes. In particular, both evolved strains were included in the simulations right from the start at low densities, while the invasion or extinction of individual strains was contingent upon their growth rates in an environment that changed in response to the activity of each strain. Our findings thus support previous suggestions that microbial evolution can be driven by deterministic ecological processes (188, 358, 530). In this case, the observed diversification is due to competition for limiting resources whose use is constrained by basic metabolic tradeoffs. Other instances of ecological diversification in microbial evolution experiments, e.g., as reported by Plucain et al. (367), might be explained using a similar approach. We emphasize that at longer time scales and in more diverse communities evolution may not be as predictable as here, because horizontal gene transfer and rare but complex mutations could introduce substantial stochasticity (39, 348). The transitions in community structure and activity following such rare events may still be understood in a deterministic framework such as ours. We have demonstrated how MCM can be used to experimentally calibrate and combine genome-based cell models to predict the emergent dynamics of microbial communities. Our framework thus provides a starting point for designing microbial communities with particular metabolic properties, such as optimized catabolic performance. While MCM is designed for genome-based metabolic models, it can also accommodate conventional functional group models. 
In these models, different ecological functions such as photosynthesis, heterotrophy or nitrification are performed by distinct populations whose metabolic activity is determined, for example, by Michaelis-Menten kinetics and whose growth is described by simple substrate-biomass yield factors (Chapter 6; 193, 386). Hence, natural microbial communities could be modeled even if annotated genomes are not available for each member species. While functional group models generally require fewer parameters, their calibration remains a challenge (360). In MCM, model calibration becomes analogous to coefficient estimation in conventional multivariate regression and can be used to estimate poorly known parameters such as stoichiometric coefficients, growth kinetics or extracellular transport coefficients. To our knowledge, no existing comparable framework offers the flexibility combined with the statistical functionality of MCM. In view of the increasing availability of genome-scale metabolic models (118), our work provides a missing link to a predictive and synthetic microbial ecology.

Figure 4.1: (A) Conceptual framework used by MCM. Cells (colored shapes) optimize their metabolism for maximal growth and influence their environment via metabolite exchange (small colored arrows). Additional external fluxes can also affect the environment (large grey arrows). The environment, in turn, influences each cell’s metabolism. (B) Computational framework used by MCM. Each iteration consists of four steps: flux balance analysis (FBA) is used to translate cell-metabolic potentials and environmental conditions (1) into a linear optimization problem for the growth rate of each cell species (2). The set of possible reaction rates corresponds to a polytope in high-dimensional space. 
Solving the optimization problems (3) yields predictions on microbial metabolite exchange rates (4). Metabolic fluxes and cell growth rates are used to predict metabolite and cell concentrations in the next iteration (1).

[Figure 4.2 image: schematic of the MCM workflow. Inputs are measured time series (experimental or survey data), a microbial community model (metabolites, reactions, cell species, environmental variables) and a control script of MCM commands. Outputs are simulations, pathway activation patterns, metabolic flux analysis, model-data comparison, sensitivity analysis, stochastic model analysis, parameter fitting and the calibrated model. The embedded example plots show concentrations (mol/L), cell densities (1/L) and export rates (mol/(L*day)) against time (day), together with per-observable normalized log-likelihoods, average normalized squared residuals and R2 values.]

Figure 4.2: Overview of MCM’s working principle and functionalities: A microbial community model is specified using human-readable configuration files in terms of metabolites, reactions, the metabolic potential of cell species and any additional environmental variables. Models with multiple ecosystem compartments are also possible. A script with MCM commands controls the analysis of the model and, if needed, its calibration using experimental data. The calibrated model can also be used to create new, more complex models (as exemplified in this chapter).

Figure 4.3: Relative cell densities of the A, SS and FS types during three replicated evolution experiments by Herron and Doebeli (188, Fig. 2), starting with the same ancestral E. 
coli strain. Within each experiment, the illustrated SS or FS lineage comprises several strains with varyingly pronounced SS or FS phenotypes, respectively. Cell generations were translated to days by assuming an average of 6.7 generations per day (188).

Figure 4.4: Calibration of E. coli cell models using monoculture experiments. Continuous curves: Time course of cell densities, glucose concentration, acetate concentration and oxygen concentration (columns 1–4, respectively) predicted by MCM for monocultures of strain A, SS and FS (rows 1–3, respectively) grown on glucose-acetate medium. Points are data used for model calibration and were obtained from analogous monoculture growth experiments (262). Oxygen data were not available for strain A.

Figure 4.5: Dynamics of the E. coli microbial community model. (A) Relative cell densities of the A, SS and FS types over time. (B) Acetate concentration over time. (C), (D) and (E): SS and FS cell densities, relative cell densities and growth rates over time, respectively, during stable coexistence. (F), (G) and (H): Cell-specific glucose, acetate and oxygen uptake rates over time, respectively. Negative values correspond to export. (I), (J) and (K): Glucose, acetate and oxygen concentrations over time, respectively. Diurnal fluctuations in all figures are due to daily dilutions into fresh medium. 
Ticks on the time axes in (C–K) mark points of dilution.

[Figure 4.6 image, panels A–D. The two pathway diagrams link the environmental glucose pool, glycolysis, pyruvate, acetyl-CoA, acetyl-P, acetate, the environmental acetate pool and the TCA cycle. Annotated values in extraction order, first diagram: pta 9.3/0 (1.6), acs 0 (0.8), ack 9.3/0 (1.4), ppc 1.1 (1.7), pdh 1.6 (1.5), pyk 4.6 (1.7); second diagram: pta 0/0 (0.9), acs 1.9 (1.6), ack 0/0 (1), ppc 0.8 (1), pdh 0.5 (1.8), pyk 0 (0.7); three further values (1.3, 0.6, 2.1) carry no recoverable label.]

Figure 4.6: Metabolic differentiation of the A, SS and FS types. (A) Predicted cell-specific net metabolite uptake rates in co-culture. (B) Predicted cell-specific reaction rates in co-culture, for acs (acetyl-CoA synthesis), ack (acetate synthesis), pta (acetyl phosphate synthesis), ppc (oxaloacetate synthesis from phosphoenolpyruvate), pdh (decarboxylation of pyruvate to acetyl-CoA) and pyk (pyruvate synthesis from phosphoenolpyruvate). Rates in (A) and (B) are averaged over all time points within the first 100 days of evolution, with reversed reactions or net metabolite export represented by negative rates. (C) and (D): Simplified model subset of E. coli acetate and glucose metabolism, showing pathway activations in type SS (C) and FS (D) relative to type A during exponential growth in monoculture. Non-bracketed numeric values are ratios of predicted fluxes in the evolved types over fluxes in type A. Bracketed values are ratios of mRNA harvested from monoculture experiments by Le Gac et al. (262), for comparison. 
A ratio of 0/0 indicates zero flux in both the evolved and ancestral type, a ratio of 1 corresponds to an unchanged flux or mRNA, a ratio of 0 corresponds to complete deactivation in the evolved type. Darker arrows indicate increased predicted fluxes in the evolved type. Flux predictions correspond to the time points of mRNA measurements, i.e., 3.5 hours after dilution for SS and 4 hours after dilution for A and SS (262).

Figure 4.7: Predicted relative cell densities of the SS and FS types in batch co-culture when restricted to either the first glucose-rich (A) or second glucose-depleted (B) season. In (A), restriction to the first season was achieved by shorter dilution periods which prevented the complete depletion of glucose. In (B), restriction to the second season was achieved by using the glucose-depleted acetate-rich solution, produced by the full-batch co-culture, as growth medium (see the Methods for details).

Chapter 5

Transient dynamics of competitive exclusion in microbial communities¹

5.1 Synopsis

Molecular profiling in bioreactors has revealed that microbial community composition can be highly variable while maintaining constant functional performance, in accordance with a pathway-centric paradigm for microbial ecology. Similarly, following perturbation, bioreactor performance typically recovers rapidly while community composition only slowly returns to its original state. At this point we still lack a detailed understanding of the actual mechanisms causing the discrepancy between functional and compositional stability of microbial communities. Using a mathematical model for microbial competition, as well as simulations of a model for a nitrifying bioreactor, we explain these observations on grounds of slow non-equilibrium dynamics eventually leading to competitive exclusion. 
In the presence of several competing strains, metabolic niches are rapidly occupied by opportunistic populations, while subsequent species turnover and the eventual dominance of top competitors proceed at a much slower rate. Hence, functional redundancy causes a separation of the time scales characterizing the functional and compositional stabilization of microbial communities. This effect becomes stronger with increasing richness, because greater similarities between top competitors lead to longer transient population dynamics.

¹ A version of this chapter has been published (see the Preface for author contributions): Louca, S., Doebeli, M. 2015. Transient dynamics of competitive exclusion in microbial communities. Environmental Microbiology. 18:1863–1874. DOI:10.1111/1462-2920.13058

5.2 Introduction

Microbial metabolism drives the biochemistry of virtually all ecosystems and plays a central role in industrial processes such as biofuel production and wastewater treatment (16, 116). Thus, understanding the mechanisms that shape the dynamics and metabolic performance of microbial communities is of great practical importance. Experiments with bioreactors have shown that bioreactor performance can be constant despite highly variable microbial communities (506). For example, following functional stabilization, methanogenic or nitrifying bioreactors can exhibit species turnover for several more years (122, 524). In some cases non-equilibrium community trajectories have been reproduced across replicated experiments, suggesting that the underlying processes are deterministic (24, 495). Even when communities converge to a steady composition, recovery of community composition following perturbation can take several months. This is in contrast to metabolic throughput, which recovers within a few days (144, 461). 
Fluctuations and the rate of stabilization of microbial communities are thus multifaceted properties that depend greatly on whether the focus is on metabolic function or taxonomic composition. An improved understanding of these properties in microbial communities is crucial for optimizing microbially driven industrial processes and interpreting the response of ecosystems to anthropogenic perturbations. It has been hypothesized that functional redundancy and non-equilibrium population dynamics within each metabolic compartment could promote fast stabilization of performance with slow convergence of community composition (46). Here we show that temporary population dynamics leading to an eventual steady community composition via competitive exclusion can indeed last much longer than the time required for the stabilization of overall metabolic performance. We first formalize our reasoning using a microbial community model, in which multiple strains compete for a common resource. The model illustrates in a simple way how taxonomic community composition can vary almost independently of the community’s metabolic performance. We then construct a more realistic model of a nitrifying bioreactor and use simulations to demonstrate the validity of our arguments and their consistency with previous experimental observations.

5.3 Results and discussion

5.3.1 Competition for a common resource

Microbial community richness can be disproportionately high compared to the metabolic complexity of bioreactors (123, 523). In fact, the coexistence of organisms with similar metabolic function in these systems has been contrasted to the “competitive exclusion principle” (122, 178). In the case of a single limiting resource, Tilman’s competition theory predicts that at equilibrium the only persisting competitor will be the one that can survive at the lowest resource concentration (479). 
However, ecosystems can be subject to long transient dynamics, i.e., temporary population dynamics far from equilibrium, and convergence to equilibrium might occur at much longer time scales than assumed (180). For example, slow species turnover has been suggested to be responsible for the perplexingly high diversity seen in many microbial systems (67) and, as we show here, can explain the discrepancy between functional and taxonomic stability in bioreactors. To formalize our argument, we use a model for multiple populations competing for the same limiting resource. We focus on the transient dynamics eventually leading to steady state, where resource input is balanced by microbial consumption. The cell density of strain $i$, denoted $N_i$, as well as the resource concentration, denoted $R$, are described by the following differential equations:

$$\frac{dN_i}{dt} = N_i\left[\beta_i\Phi_i(R) - \lambda_i\right], \tag{5.3.1}$$

$$\frac{dR}{dt} = f_o - \sum_i N_i\Phi_i(R). \tag{5.3.2}$$

Here, $f_o$ is the constant resource supply rate, $\lambda_i$ is the decay rate of strain $i$ in the absence of growth, $\Phi_i(R)$ is the rate at which cells take up the resource as a function of $R$, and $\beta_i$ is a biomass yield factor. We assume that $\Phi_i(R)$ increases with $R$. For example, $\Phi_i(R)$ could be a Monod function that increases linearly with $R$ at low concentrations but saturates at high concentrations, an assumption often made in geobiological and bioengineering models (211). The last term in Eq. (5.3.2) is a sum over all strains accounting for resource depletion by microbial metabolism. The growth rate of strain $i$ is positive whenever $R$ is greater than the threshold concentration, $R^o_i$, defined by $\Phi_i(R^o_i) = \lambda_i/\beta_i$. In general, the equilibrium of Equations (5.3.1) and (5.3.2) is characterized by the extinction of all but one strain, namely the strain with the lowest survival threshold $R^o_i$. To elucidate the transient dynamics preceding this competitive exclusion, we consider the total cell density $N = \sum_i N_i$ and the relative cell densities $\eta_i = N_i/N$. 
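As a numerical illustration, Eqs. (5.3.1) and (5.3.2) can be integrated directly. The following Python sketch is not the code used in this thesis: it assumes Monod uptake for every strain, a plain forward-Euler scheme, and three hypothetical strains whose parameter values (vmax, K, beta, lam, f0) are invented for illustration only. For Monod uptake with maximum rate vmax and half-saturation constant K, the threshold condition stated above can be solved in closed form, giving a survival threshold of K*lam/(beta*vmax - lam), which the sketch uses to identify the expected winner. It then compares two ad hoc summary times: the last time at which total uptake deviates from the supply rate by more than 5%, and the first time at which the eventual winner makes up 99% of all cells.

```python
# Sketch of Eqs. (5.3.1)-(5.3.2) with Monod uptake; all parameter values are hypothetical.

def monod(R, vmax, K):
    # Per-cell uptake rate Phi(R): linear in R at low R, saturating at vmax
    return vmax * R / (K + R)

def survival_threshold(vmax, K, beta, lam):
    # Solves beta * monod(R, vmax, K) = lam for R; requires beta * vmax > lam
    return K * lam / (beta * vmax - lam)

def simulate(strains, f0, t_end, dt=0.02, n_start=0.01, r_start=0.0, record_every=50):
    # Forward-Euler integration; returns samples (t, densities, R, total uptake)
    N = [n_start] * len(strains)
    R = r_start
    samples = []
    for step in range(int(round(t_end / dt)) + 1):
        phi = [monod(R, s['vmax'], s['K']) for s in strains]
        uptake = sum(n * p for n, p in zip(N, phi))
        if step % record_every == 0:
            samples.append((step * dt, list(N), R, uptake))
        # Eq. (5.3.1): dN_i/dt = N_i * (beta_i * Phi_i(R) - lambda_i)
        N = [n + dt * n * (s['beta'] * p - s['lam']) for n, p, s in zip(N, phi, strains)]
        # Eq. (5.3.2): dR/dt = f0 - sum_i N_i * Phi_i(R)
        R = R + dt * (f0 - uptake)
    return samples

def stabilization_times(samples, f0, func_tol=0.05, comp_level=0.99):
    # t_func: last sampled time at which total uptake differs from f0 by more than func_tol
    # t_comp: first sampled time at which the eventual winner exceeds comp_level of all cells
    final = samples[-1][1]
    winner = max(range(len(final)), key=lambda i: final[i])
    t_func, t_comp = 0.0, None
    for t, N, R, uptake in samples:
        if abs(uptake - f0) > func_tol * f0:
            t_func = t
        if t_comp is None and N[winner] / sum(N) >= comp_level:
            t_comp = t
    return winner, t_func, t_comp

# Two nearly identical top competitors (K = 1.0 and 1.05) and one weaker strain (K = 2.0)
STRAINS = [dict(vmax=1.0, K=K, beta=1.0, lam=0.1) for K in (1.0, 1.05, 2.0)]
F0 = 1.0
thresholds = [survival_threshold(**s) for s in STRAINS]
samples = simulate(STRAINS, F0, t_end=2000.0)
winner, t_func, t_comp = stabilization_times(samples, F0)
print('survival thresholds:', thresholds)
print('winner:', winner, ' t_func:', t_func, ' t_comp:', t_comp)
```

With these invented parameters the strain with the lowest threshold ends up dominant, and the compositional time comes out much longer than the functional one. Making the two smallest K values more similar should lengthen the compositional transient further while leaving the functional time largely unchanged.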
Using the community-average growth kinetics (denoted $\Phi$, $\beta\Phi$ and $\lambda$), one can derive the dynamics

$$\frac{d\eta_i}{dt} = \varepsilon_i \cdot \eta_i\,(\beta\Phi - \lambda) \tag{5.3.3}$$

for the relative cell densities,

$$\frac{dN}{dt} = N(\beta\Phi - \lambda) \tag{5.3.4}$$

for the total cell density $N$, and

$$\frac{dR}{dt} = f_o - N\Phi \tag{5.3.5}$$

for the resource concentration $R$ (details in Appendix D). Here, the $\varepsilon_i$ account for deviations of the growth kinetics of strain $i$ from the community average. For example, if $N$ is growing and $\varepsilon_i$ is positive, then strain $i$ grows faster than average and thus increases in relative abundance. As the resource is depleted, weaker competitors decay and the average growth kinetics are determined by a few remaining competitors of similar efficiency, for which the deviations $\varepsilon_i$ from the average become very small ($\varepsilon_i \ll 1$). Hence, while the dynamics of $N$ and $R$ are determined by the community-average growth kinetics (Eqs. (5.3.4) and (5.3.5)), the relative cell densities are slowed down by the factors $\varepsilon_i$. This means that while metabolic niches are quickly filled, establishing a high rate of resource uptake, some of the competing populations can coexist during prolonged transition periods until eventual competitive exclusion. In agreement with these predictions, Gentile et al. (144) report a quick functional stabilization and long transient periods in community composition following mechanical shock, and Vanwonterghem et al. (495) report a gradually decreasing richness in anaerobic digesters over the course of several months following inoculation. The probability of similar strains being present in a random inoculum, or a microbial community in general, increases with the number of strains. In particular, the expected dissimilarity between top competitors decreases with increasing community richness. 
The underlying assumption is that growth kinetic parameters are bounded within some natural finite range. Hence, one should expect longer transient dynamics of competitive exclusion and slower convergence to a steady community composition at higher inoculum richness. It has been previously hypothesized that as richness increases, the variability of ecosystem functions decreases whereas the variability of individual populations increases (267, 283, 480). The proposed mechanisms typically involve stochastic fluctuations of independent populations, so that the total community biomass and functional performance become more stable when more populations contribute to them. This statistical inevitability (105), which has been criticized on grounds of interspecific interactions (481), differs fundamentally from the deterministic mechanisms explored here. Namely, competition between strains leads to a slow decay of weaker competitors, which is compensated by the growth of other populations that stabilize overall functional performance.

5.3.2 Bioreactors as model systems

The above competition model explains how populations occupying a common metabolic niche can, in principle, undergo long transient periods of coexistence. The actual duration and nature of these transients depend on the similarity between competing strains, as well as their typical intrinsic growth kinetics. To test the relevance of our predictions to realistic microbial communities, we examined a separate numerical model for a nitrifying bioreactor (524). Apart from their practical relevance to industrial processes such as sewage treatment and biofuel production (16), bioreactors are also ideal model systems for understanding microbial ecology and processes shaping microbial community structure (123, 160, 495). The bioreactor considered here is a flow-through chemostat continuously fed with ammonium (NH₄⁺), which is aerobically oxidized to nitrate (NO₃⁻) in a two-step process. 
Oxidation occurs in a microbial community that consists of chemoautotrophic ammonium-oxidizing bacteria (AOB), which transform ammonium to nitrite (NO₂⁻), and chemoautotrophic nitrite-oxidizing bacteria (NOB), which transform nitrite to nitrate. Nitrate is exported from the bioreactor as part of a continuous outflow through a filter membrane that retains cells within the bioreactor. The substrate feed rate and the hydraulic dilution rate are kept constant and in line with previous bioreactor experiments (524), allowing the establishment of a steady metabolic throughput following an initial startup period. The bioreactor’s microbial community is modeled using differential equations for the cell population densities and the ambient ammonium, nitrite and nitrate concentrations. These metabolites are subject to microbial production and depletion, as well as physical addition and removal from the bioreactor. The metabolic activity of individual cells is determined using flux balance analysis (FBA), a widely used framework in cell-metabolic modeling (354). In FBA, the chemical state of cells is assumed to be steady, leading to stoichiometric constraints that need to be satisfied for any particular combination of intracellular reaction rates. These rates are assumed to be regulated by the cell in such a way that some objective function, commonly associated with biomass production, is maximized subject to additional constraints on substrate uptake rates (119). In our case, the optimized biosynthesis rate is translated to a growth rate by dividing by the cell mass. Ammonium and nitrite uptake rates are limited by substrate concentrations in a Monod-like fashion, thus constraining the achievable growth rates depending on the bioreactor’s chemical state (177, 292). The assumption of cells maximizing biosynthesis, subject to environmental and physiological constraints, is rooted in the idea that evolution has shaped regulatory mechanisms to induce maximum growth whenever possible (51, 176). 
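The chain from substrate concentration to growth rate described above can be sketched with a deliberately tiny example. The Python snippet below is a caricature, not the AOB or NOB cell model of this chapter: it assumes a hypothetical network with one internal metabolite pool and three fluxes (uptake, a fixed maintenance drain and biosynthesis), so that the linear program of FBA can be solved by inspection instead of with an LP solver. The maintenance term, the function names and all numerical values are assumptions made for illustration.

```python
# Toy single-substrate flux balance problem, solved by inspection.
# Network, maintenance drain and all numbers are hypothetical.

def monod_bound(conc, vmax, half_sat):
    # Upper bound on the uptake flux (mol per cell per day), Monod-like in the
    # ambient substrate concentration (mol/L)
    return vmax * conc / (half_sat + conc)

def toy_fba_growth_rate(conc, vmax, half_sat, maintenance, yield_g_per_mol, cell_mass_g):
    # Fluxes: v_up (uptake), v_maint (maintenance drain), v_bio (flux into biomass).
    # Steady state of the internal pool: v_up = v_maint + v_bio.
    # Constraints: 0 <= v_up <= Monod bound, v_maint >= maintenance, v_bio >= 0.
    # Objective: maximize v_bio. Because v_bio = v_up - v_maint, the optimum sits
    # at v_up = bound and v_maint = maintenance.
    bound = monod_bound(conc, vmax, half_sat)
    if bound < maintenance:
        return None  # infeasible: uptake cannot cover the maintenance drain
    v_bio = bound - maintenance
    biosynthesis = yield_g_per_mol * v_bio   # g biomass per cell per day
    return biosynthesis / cell_mass_g        # specific growth rate (1/day)

# Hypothetical cell: growth rate at a high and at a very low substrate concentration
params = dict(vmax=1e-13, half_sat=1e-4, maintenance=1e-14,
              yield_g_per_mol=10.0, cell_mass_g=1e-12)
mu_high = toy_fba_growth_rate(conc=1e-4, **params)
mu_low = toy_fba_growth_rate(conc=1e-6, **params)
print('growth rate at 1e-4 mol/L:', mu_high)
print('growth rate at 1e-6 mol/L:', mu_low)
```

In a genome-scale model the same three ingredients appear with a full stoichiometric matrix in place of the single balance equation, and the optimum has to be found numerically by a linear programming solver.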
This assumption is less valid for genetically engineered organisms or those exposed to environments that are radically different from the environments that shaped their evolution, and other objectives such as ATP production or metabolic efficiency have been proposed (146, 417). Biosynthesis has been experimentally verified as an objective for, among others, Saccharomyces cerevisiae, Escherichia coli and Nitrosomonas europaea (107, 119, 364). Despite its limitations, FBA with maximization of growth has greatly contributed to the understanding of several single-cell metabolic networks as well as metabolic interactions between cells (70, 129, 238, 354). One advantage of FBA models over full biochemical cell models is their independence of intracellular kinetics and gene regulation, which limits the number of required parameters to stoichiometric coefficients and uptake kinetics. Recent work has shown that FBA-based models with maximization of growth can accurately predict microbial community dynamics (70, 177, 284, 318).

Our bioreactor model comprises multiple AOB and NOB strains, which are constructed by randomly choosing several cell parameters around those of two template AOB and NOB models. The AOB and NOB templates were calibrated and validated beforehand using data from previous bioreactor experiments (Fig. 5.1; see Section D.1 for details). Because metabolites can be depleted or produced by several cells, the environmental metabolite pool mediates the metabolic interactions between cells (410). For example, AOB deplete ammonium from their environment, rendering it a limiting resource that mediates competition between AOB strains. The excretion of nitrite as a by-product, in turn, enables the growth of nitrite-limited NOB. 
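The coupling between the two nitrification steps through the shared metabolite pool can be illustrated with a deliberately simplified sketch. It is not the FBA-based model described above: per-cell metabolism is replaced by Monod-limited uptake with a fixed biomass yield, each functional group is represented by a single strain, and all names and parameter values (AOB, NOB, monod_uptake, simulate_nitrifying_reactor, survival_threshold, the dilution and feed values) are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: fixed-yield Monod kinetics stand in for the
# FBA-based cell models. All names and parameter values are assumptions.

# Hypothetical strain parameters: vmax (mmol per biomass unit per day),
# k_half (mM), growth_yield (biomass units per mmol), death (per day)
AOB = {'vmax': 2.0, 'k_half': 0.1, 'growth_yield': 0.10, 'death': 0.05}
NOB = {'vmax': 2.0, 'k_half': 0.1, 'growth_yield': 0.05, 'death': 0.05}

def monod_uptake(vmax, k_half, conc):
    # Substrate-limited uptake rate per unit biomass (Monod-like)
    return vmax * conc / (k_half + conc)

def survival_threshold(strain):
    # Substrate concentration at which growth exactly balances death
    mu_max = strain['growth_yield'] * strain['vmax']
    return strain['k_half'] * strain['death'] / (mu_max - strain['death'])

def simulate_nitrifying_reactor(days=1000.0, dt=0.01, dilution=0.5, nh4_in=1.0):
    # Explicit Euler integration; concentrations in mM, time in days
    nh4 = no2 = no3 = 0.0
    n_aob = n_nob = 0.01
    for _ in range(int(round(days / dt))):
        flux_aob = monod_uptake(AOB['vmax'], AOB['k_half'], nh4) * n_aob
        flux_nob = monod_uptake(NOB['vmax'], NOB['k_half'], no2) * n_nob
        # Metabolites: physical inflow and outflow plus microbial turnover
        d_nh4 = dilution * (nh4_in - nh4) - flux_aob
        d_no2 = -dilution * no2 + flux_aob - flux_nob
        d_no3 = -dilution * no3 + flux_nob
        # Cells are retained by the membrane, so they are lost only by death
        d_aob = AOB['growth_yield'] * flux_aob - AOB['death'] * n_aob
        d_nob = NOB['growth_yield'] * flux_nob - NOB['death'] * n_nob
        nh4 += dt * d_nh4
        no2 += dt * d_no2
        no3 += dt * d_no3
        n_aob += dt * d_aob
        n_nob += dt * d_nob
    return {'nh4': nh4, 'no2': no2, 'no3': no3, 'aob': n_aob, 'nob': n_nob}
```

In this sketch, AOB growth draws ammonium down towards the AOB survival threshold, and the nitrite released in the process is in turn drawn down by NOB. Nitrogen assimilation into biomass is ignored, so the three dissolved nitrogen pools sum to the feed concentration once the reactor has been flushed.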
The metabolic optimization of individual cells striving for maximal growth, while modifying their environment, thus leads to non-trivial community dynamics that can include cooperation, competition and extinction.

5.3.3 Bioreactor community dynamics

Following inoculation of the bioreactor, two phases can generally be distinguished (Fig. 5.2). Initially, the ammonium concentration in the bioreactor increases until AOB populations have grown to sufficient densities to balance ammonium supply by ammonium consumption. The accumulation of nitrite as an AOB waste product, in turn, triggers the growth of NOB populations until nitrite production is eventually balanced by nitrite consumption. This initial startup phase is dominated by fast-growing opportunists that benefit from an excess of substrates and little competition. The duration of this phase is mainly determined by the hydraulic renewal rate, ammonium supply rate and bacterial growth rates, and the duration predicted by our model (roughly 3 weeks) is in line with typical nitrifying bioreactor experiments (109, 302).

As ammonium and nitrite consumption increase, their concentrations decrease to near or below the survival thresholds for an increasing number of strains (Fig. 5.2C). This second, saturation phase is characterized by low and relatively stable substrate concentrations, stagnation of growth, a gradual extinction of less competitive strains and a long coexistence of similar top competitors (Fig. 5.2A). The microbial community slowly converges to a stable composition of decreased diversity in which each metabolic niche is occupied by a single strain, with transient periods occasionally lasting up to several thousand days. A gradual decrease in diversity is expected under the competitive exclusion principle of equilibrium ecology (178) and is consistent with similar observations in previous bioreactor experiments (495). 
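Why richer inocula tend to contain closely matched top competitors can be seen from a purely statistical sketch. It assumes, hypothetically, that the survival thresholds of strains are drawn independently and uniformly from a fixed finite range; the function names, the range and the number of replicates are illustrative and are not taken from the bioreactor model.

```python
import random
import statistics

# Illustrative assumption: survival thresholds are independent draws from a
# fixed, finite range (here uniform on [0.5, 1.5], arbitrary units).

def top_two_gap(n_strains, rng, low=0.5, high=1.5):
    # Difference between the two lowest thresholds, i.e., between the
    # best and the second-best competitor in one random inoculum
    thresholds = sorted(rng.uniform(low, high) for _ in range(n_strains))
    return thresholds[1] - thresholds[0]

def median_gap(n_strains, replicates=2000, seed=1):
    # Median gap over many random inocula of the same richness
    rng = random.Random(seed)
    return statistics.median(top_two_gap(n_strains, rng)
                             for _ in range(replicates))
```

Comparing median_gap(20) with median_gap(100) shows the gap between the two best competitors shrinking as richness increases. Because the rate at which the runner-up is excluded scales with this gap, smaller gaps imply longer transients.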
On the other hand, the total cell densities of metabolically similar strains (e.g., AOB) stabilize much faster and only vary weakly during the saturation phase (Fig. 5.2B). Hence, each of the two available metabolic niches is rapidly filled by several competing and temporarily coexisting strains, which are only slowly replaced by the top competitor.

Our results show how transient dynamics of competitive exclusion can lead to a separation of time scales characterizing functional and compositional stabilization of communities. This separation of time scales is also expected to be reflected in the community’s response to perturbations. Perturbations such as mechanical biofilm removal (144) or nutrient shocks (461) can alter the relative abundances of individual clades or lead to a temporary collapse of the community. Such a collapse would initiate a race for the (re-)occupation of metabolic niches and a subsequent saturation phase, analogous to the dynamics following inoculation. For example, Gentile et al. (144) observed that, after the shearing of biofilms inside a fluidized bed reactor, community composition recovered much more slowly, i.e., it had lower resilience (419), than the bioreactor’s performance.

Simulations of the bioreactor model including a strong pulse perturbation, applied simultaneously to the entire community, reproduced these observations (Fig. 5.3). The modeled perturbation corresponds to an increased mortality for one day, with a strength chosen randomly for each strain and resulting in a temporary collapse of the community by several orders of magnitude. Consistent with experimental observations, the bioreactor’s performance quickly recovers within a few days to weeks (Figs. 5.3C,D), while the community’s recovery to its original composition typically takes several months to years (Figs. 5.3A,E,F). Metabolic niches are reoccupied rapidly and concurrently with the bioreactor’s functional stabilization (Fig. 
5.3B); however, metabolic niches can be temporarily shared by several coexisting strains. Non-equilibrium processes, particularly following perturbation, are frequently thought to maintain high diversity, for example in rain forests (75) or phytoplankton (432). Furthermore, a meta-analysis by Shade et al. (419) found more studies reporting recovery of microbial community function than composition, following pulse perturbation.

As predicted above by our competition model, the discrepancy between functional and taxonomic stability should be stronger for communities with high richness because the likelihood of two similar top competitors increases, thus delaying competitive exclusion. Simulations of bioreactors inoculated with different numbers of random strains verify this prediction. For example, the time until compositional convergence following inoculation, i.e., reaching a 90% Bray-Curtis similarity to the steady state (266), ranges from roughly 600 days for 20 strains to 1300 days for 100 strains (median values, Figs. 5.2E,F). Moreover, richer communities are expected to be more prone to temporary changes in composition during perturbation because of a greater reservoir of opportunistic strains that could temporarily invade (524). This is reflected in our simulations, where a greater number of strains correlates with a stronger change in community composition following the pulse perturbation (Figs. 5.3E,F). The insensitivity to disturbance is known in ecology as resistance and is, together with resilience, a common measure of community stability (419). 
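The convergence criterion used above can be sketched as follows. The Bray-Curtis dissimilarity between two abundance vectors is the sum of absolute differences divided by the sum of all abundances, and similarity is one minus that value. The helper names below, and the choice to report the first sampled time at which the similarity threshold is reached, are assumptions made for illustration; the exact procedure used for Figs. 5.2E,F may differ.

```python
def bray_curtis_dissimilarity(x, y):
    # x, y: abundance vectors listing the same strains in the same order
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator > 0 else 0.0

def convergence_time(times, compositions, steady_state, similarity=0.9):
    # First sampled time at which the Bray-Curtis similarity to the steady
    # state reaches the threshold; None if it is never reached
    for t, composition in zip(times, compositions):
        if 1.0 - bray_curtis_dissimilarity(composition, steady_state) >= similarity:
            return t
    return None
```

For example, with a hypothetical two-strain trajectory sampled at days 0, 100, 200 and 300 as [0.5, 0.5], [0.8, 0.2], [0.95, 0.05] and [0.99, 0.01], and a steady state of [1.0, 0.0], the similarities at the first three samples are 0.5, 0.8 and 0.95, so the 90% threshold is first reached at day 200.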
Our work suggests that microbial communities with higher functional redundancy have lower resilience and lower resistance to pulse perturbation in terms of taxonomic composition.

5.3.4 Variable does not mean unstable

Previous bioreactor experiments have revealed variable community composition despite stable bioreactor performance over hundreds of days following inoculation (122, 495, 524, 548), while others have reported convergence to steady compositions within a few months (144, 314, 533). Fluctuating community compositions are often interpreted as unstable, non-convergent or even chaotic. However, the observed dynamics may be mere transients of slowly converging communities. Typical richness in bioreactors can range from hundreds to thousands of operational taxonomic units (OTUs, a species analog based on rDNA similarity) (234, 440, 495). As shown here, at these richness scales transient dynamics of competitive exclusion can last several years. Much longer operation times might thus be needed to actually observe an eventual community convergence in typical bioreactors. However, at these time scales other destabilizing processes, such as the invasion of new strains introduced by contamination, could prevent community convergence.

5.3.5 Model limitations

The simple models considered in this chapter focus on generic ingredients of microbial ecosystems, namely substrate-limited growth and competition, stoichiometric constraints on coexisting pathways, as well as physical substrate repletion and waste removal (e.g., in continuous-flow bioreactors). In particular, we have assumed that microbial growth increases with increasing substrate concentrations, thus ignoring the possibility of substrate inhibition. For example, substrate inhibition can occur during nitrification by excess ammonia and nitrous acid (15), resulting in reduced bioreactor performance (425). 
Similarly, growth may also be subject to product inhibition, e.g., when the partial pressure of accumulating waste products renders a pathway unfavorable (225). Accurately modeling specific industrial setups or natural systems may thus require a consideration of more complicated kinetics, e.g., including substrate and product inhibition. Further, metabolic niche structure in natural systems may be more complex than considered here since functional groups may partly or completely overlap, for example if a single organism performs both nitrification steps (89). Our main point is that long transient dynamics can emerge even in the simple cases considered here, acknowledging that more complex communities are likely subject to further destabilizing mechanisms (see below).

5.3.6 Alternative destabilizing factors

Transient dynamics of competitive exclusion provide a simple explanation for the discrepancy between functional and taxonomic stability of microbial communities, and our simulations underline the relevance of these processes at least to typical bioreactor setups. However, other mechanisms likely contribute to a long-term variability of community composition. For example, time lags associated with the degradation of organic matter, such as cellulose hydrolysis in anaerobic digesters (495), can result in slow changes of the metabolic landscape and optimal electron flow, in turn driving adaptive changes in community composition (122). More complicated non-sequential pathways, ubiquitous in organic carbon catabolism, could also lead to positive feedback loops that further destabilize community dynamics. Furthermore, in contrast to well-controlled bioreactors, many natural ecosystems are subject to intense environmental variation that can drive adaptation and succession in microbial communities. 
For example, annual deep-water renewal in a seasonally anoxic fjord has been shown to cause significant changes in microbial community structure (540).

We emphasize that mechanisms that destabilize community composition need not destabilize community function. For example, in open systems such as wastewater treatment plants (506), occasional invasion by novel competitors could drive species turnover without significantly affecting ecosystem functioning; however, this scenario is unlikely in bioreactors with a sterile feed (495). Similarly, repeated adaptation of bacteriophages to dominant hosts (“killing the winner” dynamics) has been shown to sustain bacterial diversity and drive continuous species turnover (423, 475). Collapsing populations could be replaced by less susceptible but functionally similar populations that ensure the overall stability of biochemical fluxes.

Reciprocally, negative feedback mechanisms stabilizing biochemical fluxes may only weakly affect community composition. For example, substrate build-up can promote the growth of functional groups benefiting from the underutilized resource, in turn counteracting the processes causing substrate build-up. This stabilizing mechanism, perhaps comparable to Le Chatelier’s principle of an “opposing force” (411), a priori acts on functional groups rather than taxonomy.

5.3.7 Towards a pathway-centric microbial ecology

Marker gene-based taxonomic community profiling has become a standard approach in microbial ecology (148, 524). However, metabolic functions may be performed by several competing clades and, conversely, members of the same clade can fill separate metabolic niches (3, 308). Such irregular metabolic trait distributions across clades are caused by diverse evolutionary processes, including adaptive loss of function and metabolic convergence accelerated by frequent horizontal gene transfer (116). 
As a result, taxonomic profiles, at any taxonomic level, often obscure the relationship between community structure and function. For example, environmental conditions and ecological function can show a stronger correlation to particular metabolic pathways, or even individual genes, than to the distribution of particular taxa (Chapter 2; 52, 349, 381). In fact, even physicochemically similar systems with similar functional community structure can exhibit markedly different taxonomic composition (Chapter 3; 122). Consistent with this, as we have shown here, compositional stability can be independent of functional stability while specific functional groups (for example AOB) remain synchronized with the community’s metabolic activity. Hence, prokaryotic taxa or OTUs should be questioned as ecologically meaningful units for describing community structure, at least when the focus is on ecosystem functioning (106, 145, 510). Microbial ecology and biogeography might be best understood using pathway-centric theories in which individual genes, operons or pathways are considered as basic reproductive and functional units, particularly under conditions where metabolic function defines the microbial niche space (109, 386). Accordingly, metagenomic, metatranscriptomic and metaproteomic profiling would be more suitable than taxonomic profiling for monitoring or predicting fluctuations in ecosystem functioning (106, 181, 510). Alternatively, functional community structure may be estimated from marker gene sequences by binning known OTUs into functional groups, as demonstrated in Chapters 2 and 3.

5.4 Conclusions

Convergence of microbial community composition is a gradual process that can last much longer than typical bioreactor experiments and environmental surveys. Transient dynamics of competitive exclusion explain why microbial communities can remain variable long after inoculation or perturbation, while exhibiting high functional stability. 
The correct interpretation of observed community dynamics in bioreactors and natural ecosystems thus requires a proper consideration of the involved time scales. Previous work has highlighted the general mismatch between the duration of typical experiments and the time scales assumed by conventional steady-state ecological theories (180), and our work demonstrates some of the implications of this mismatch. Fluctuations in natural and less controlled microbial communities likely result from several destabilizing processes; however, the effects of these processes could be augmented by transient dynamics of competitive exclusion.

Furthermore, less resilient and more flexible communities need not imply a compromised functional stability, and previous experiments have indeed indicated a positive correlation between flexible community structure and stable performance (123). Several competing strains can rapidly and concurrently fill a metabolic niche when opportunities arise, while slowly replacing each other and maintaining constant performance during saturation. The time required for convergence or recovery of community composition correlates positively with functional redundancy, because more competitors are likely to have similar efficiencies under substrate limitation.

The extreme case in which each functional group consists of equal competitors is comparable to the so-called emergent group theory in ecology, according to which assembly within each group is subject to neutral dynamics (185, 197). In that limit transient periods of competitive exclusion can be extremely long, while community composition appears dissociated from environmental conditions and driven by purely stochastic factors. While exact neutrality is an extreme idealization, some natural communities may indeed include functional groups consisting of almost-equal competitors. 
For example, previous work on a wastewater treatment plant found that fluctuations within the group of ammonia-oxidizing bacteria, as well as within the heterotrophic community, were predominantly explained by neutral processes rather than environmental factors (349). Similarly, a global study of desert microbial communities by Caruso et al. (64) found that climatic effects were detectable at the whole community level, but became undetectable when restricted to variations within the photosynthetic or heterotrophic groups. Frossard et al. (134) found that spatiotemporal variations of microbial community structure in stream catchments were best described by a neutral assembly model, whereas potential activities of several carbon-, nitrogen- and phosphorus-acquiring enzymes showed clear seasonal patterns. This disconnect between potential enzyme activities and the composition of their host communities indicates high functional redundancy and a decoupling between community function and taxonomy. Similarly, Yin et al. (537) showed significant functional redundancy in soil microbial communities by measuring population responses to enrichment with individual carbon sources. However, it is unclear whether all detected OTUs were active prior to the enrichment, and at any point in time a significant fraction of functionally similar clades may have only been present at low densities (418). Furthermore, subtle partitioning along additional non-functional axes such as moisture or pH may create micro-niches that enable the long-term coexistence of functionally similar populations, particularly in spatially or temporally heterogeneous environments. For example, the coexistence of hundreds of subpopulations of the marine cyanobacterium Prochlorococcus is likely enabled by subtle niche differentiation such as adaptations to different nutrient availabilities (224). 
The role of neutrality in natural microbial communities and its proper reconciliation with niche theory remain controversial, and patterns that appear to result from neutral drift may in fact have underlying deterministic causes (see Chapter 6). Nevertheless, our work shows that approximate neutrality within ecological niches can explain several patterns of microbial community assembly in engineered environments and should also be considered when interpreting the dynamics of natural microbial communities.

Figure 5.1: Calibration and validation of the template AOB and NOB cell models to data from an experiment with a nitrifying batch-fed bioreactor (A) (94). Ammonium (NH₄⁺) was added at the beginning of the experiment, and was sequentially oxidized to nitrite (NO₂⁻) and nitrate (NO₃⁻) by a growing nitrifier community. (B) and (C): Ammonium (B) and nitrate (C) concentration time series data (dots), compared to the calibrated model (continuous lines). (D): AOB and NOB cell densities over time, as predicted by the model. (E): Nitrite concentration over time, as predicted by the model. Note that while the template cell models were calibrated using batch-bioreactor experiments, for our subsequent analysis we consider continuously-fed flow-through bioreactors because these can support metabolically active microbial communities at steady state.

Figure 5.2: Simulations of the nitrifying bioreactor under constant conditions. (A): Simulated cell densities over time in the ammonium-fed nitrifying membrane bioreactor, inoculated with 100 random strains (AOB in variations of red, NOB in variations of blue). (B) Corresponding total cell densities per functional group (AOB or NOB). (C) Corresponding ammonium (NH₄⁺), nitrite (NO₂⁻) and nitrate (NO₃⁻) concentrations. (D) Corresponding community-wide ammonium and nitrite uptake rates. 
(E,F): Distance of the community from the long-term steady state (in terms of the Bray-Curtis dissimilarity index; 266), following inoculation with 20 (E) or 100 (F) random strains. Shown as a probability distribution over 100 random simulations (colors correspond to centiles). Notice the faster rate of convergence to steady state (i.e., resilience) in (E) compared to (F). The two intervals on the top of figures (A,C,E) indicate rough startup and saturation phases, respectively.

Figure 5.3: Simulations of the perturbed nitrifying bioreactor. (A): Simulated cell densities over time in the ammonium-fed nitrifying membrane bioreactor, inoculated with 100 random strains (AOB in variations of red, NOB in variations of blue). A strong perturbation during day 5000 (grey arrow) causes a temporary collapse of the microbial community. (B) Corresponding total cell densities per functional group (AOB or NOB). (C) Corresponding ammonium (NH₄⁺), nitrite (NO₂⁻) and nitrate (NO₃⁻) concentrations, at times near the perturbation. (D) Corresponding community-wide ammonium and nitrite uptake rates, at times near the perturbation. (E,F): Bray-Curtis dissimilarity of the community to the state shortly prior to the perturbation, in bioreactors inoculated with 20 (E) or 100 (F) random strains. Shown as a probability distribution over 100 random simulations (colors correspond to percentiles). 
Notice the greater resistance to perturbation and greater resilience in (E), compared to (F).

Chapter 6

Taxonomic variability and functional stability in microbial communities infected by phages¹

6.1 Synopsis

As shown in Chapters 2 and 3, microbial communities can display intense variation in taxonomic composition across space or time, and yet this taxonomic variation can coincide with stable metabolic functional community structure and constant biochemical performance. This decoupling between taxonomic and functional community structure is presumably enabled by a high functional redundancy in the global microbiome; however, the mechanisms driving the sustained taxonomic variation within functional groups remain largely unknown. Predation by specialist lytic phages leading to “killing the winner” dynamics has been suggested as a potential cause of host turnover; however, the plausibility and required conditions for this scenario have not been rigorously examined. Further, it is unknown how predation by phages affects community metabolic processes and whether these effects are actually mitigated by functional redundancy in the host populations. Here we address these issues using a model for a methanogenic microbial community that includes several interacting metabolic functional groups. Each functional group comprises multiple competing strains with distinct physiological parameters, and each strain is subject to predation by a specialist lytic phage. We find that phages induce intense taxonomic turnover within each functional group, resembling the variability observed in past experiments. The functional composition and metabolic throughput of the community are also disturbed by phage predation, but they become more stable as the functional redundancy in the community increases. 
Our work reveals explicit mechanisms by which functional redundancy stabilizes ecosystem performance and supports the interpretation that biotic interactions, rather than random population drift as often suggested, drive taxonomic turnover within functional guilds.

¹ A version of this chapter is under peer review for publication (see the Preface for author contributions): Louca, S., Doebeli, M. (in review). Taxonomic variability and functional stability in microbial communities infected by phages.

6.2 Introduction

Microbial metabolism powers global biogeochemical fluxes (116) and is a key component in many engineered ecosystems, such as biofuel production units or wastewater treatment plants (16). Understanding the mechanisms shaping microbial community composition and function is thus paramount for predicting ecosystem responses to anthropogenic change and for optimizing the performance of microbially driven industrial processes (316).

Microbial communities can exhibit strong variation in taxonomic composition, both across time and space, and yet this taxonomic variation can coincide with remarkably stable functional community structure (109, 288, 349, 397). For example, the proportions of several metabolic functional groups, such as nitrifiers, photoautotrophs and sulfate reducers, were found to be very similar across replicate natural aquatic ecosystems despite strong taxonomic turnover within individual functional groups (288). Similarly, methanogenic and nitrifying bioreactors operated under constant conditions were found to exhibit intense turnover of bacterial operational taxonomic units (OTUs) despite constant overall biochemical performance (122, 123, 506, 524, 548). 
These observations point towards an elegant paradigm in microbial ecology, in which energetic and stoichiometric constraints determine functional community structure and performance, but a high degree of functional redundancy in the global microbiome enables taxonomic variability within individual functional groups. The precise mechanisms causing this taxonomic variability remain largely unknown. Neutral population drift between equivalent competitors is sometimes suggested as a possible cause (349, 427); however, non-random phylogenetic structure and co-occurrence patterns within functional groups point towards deterministic community assembly mechanisms, notably biotic interactions (288). In some cases complex non-equilibrium community trajectories have been reproduced across replicate isolated systems under constant conditions, further suggesting that the taxonomic variation within functional groups is not random (24, 495).

Host-specific predation by lytic phages has been suggested as a potential mechanism promoting host succession through “killing the winner” (KTW) dynamics, in which abundant host populations eventually collapse due to increased specialist predation, giving way to opportunistic competitors (322, 397, 398, 423, 499). For example, previous experiments revealed strong OTU turnover in a bioreactor, in which the temporary emergence of specific OTUs was followed by the temporary increase in their specific phages, consistent with KTW dynamics (424). Predation by lytic phages is increasingly recognized as an important contributor to microbial mortality in natural and engineered ecosystems (135, 322, 476), and viral lysis has been shown to significantly reduce the flux of microbially assimilated organic carbon to higher trophic levels (e.g., to protist grazers; 135, 324). 
On the other hand, the effects of phages on dissimilatory carbon transformations (e.g., via respiration and fermentation) are much less understood (423), despite the fact that in many (e.g., methanogenic) environments most organic carbon is metabolized to byproducts for energy gain rather than assimilated into cell mass (312). Further, it is unclear whether, and under what conditions, phage-driven KTW dynamics can actually explain the extreme discrepancy between taxonomic variability and functional stability observed in microbial communities (109, 288, 349, 397). While this role of KTW dynamics is often hypothesized, in practice time lags involved in the recovery or replacement of collapsed populations, as well as potentially destabilizing interdependencies between metabolic pathways, could prevent the functional stability of communities.

To elucidate the effects of phage predation and microbial functional redundancy on community structure and metabolic functioning, we constructed a mechanistic model for a methanogenic microbial community subject to predation by lytic phages, hosted within an anaerobic flow-through bioreactor. Bioreactors constitute powerful model ecosystems for microbial ecology, because physicochemical conditions can be closely monitored and controlled, enabling replicate time series experiments. It is thus not surprising that bioreactor experiments have greatly contributed to our mechanistic understanding of microbial community assembly and of phage-host dynamics in particular (122, 312, 349, 423, 491). 
We chose methanogenic bioreactors as a template for our model because methanogenic metabolic networks are well understood and of great industrial relevance (77, 179) and because this allows for comparisons with previous experiments (122, 123).

Our model considers the population dynamics of multiple microbial functional groups involved in the anaerobic catabolism of glucose (the input substrate) to methane (CH₄), as well as the concentrations of any intermediate metabolites (77; overview in Fig. 6.1). Specifically, in the first step input glucose is fermented to short-chain fatty acids, lactate and alcohols by several bacterial functional groups. These fermentation products are then further catabolized to hydrogen (H₂) and acetate by “syntrophs”, i.e., bacteria that rely on the rapid consumption of H₂ and acetate by hydrogenotrophic (“H₂/CO₂”) and acetoclastic methanogenic archaea. Each functional group initially comprises one or more distinct cell lineages, henceforth referred to as operational taxonomic units (OTUs), which catalyze the same reaction but differ in several of their physiological parameters, such as their substrate half-saturation constants. We use “OTU” as an abstraction representing a taxonomic group (such as a strain or species) that is sufficiently narrow so that reaction kinetics are similar across members, and sufficiently broad so that different OTUs are infected by different specialist phages (69, 174, 521). The number of OTUs initially present in each functional group (termed “functional redundancy”) is a key parameter in our analysis and accounts for the presence of multiple functionally similar lineages in many bioreactors and natural environments (128, 224, 288, 289, 349, 526, 537). Note that here functional redundancy only refers to the number of redundant OTUs at the beginning of our simulations, while we make no assumptions on the long-term persistence of OTUs. 
Each OTU is associated with a distinct phage population that infects cells and causes increased mortality through cell lysis. Cell infection rates are proportional to phage concentrations and phage concentrations are, in turn, driven by cell lysis rates. Physiological parameters were chosen randomly for each OTU and each phage within realistic ranges (Table E.2), to account for the variation typically seen between strains or species (207, 328, 352). As we describe below, our model successfully reproduces previous experimental observations and yields novel insight into the effects of phage predation and functional redundancy on microbial community composition and function.

6.3 Results and discussion

6.3.1 Bioreactor dynamics in the absence of phages

Following “startup” of the bioreactor, and in the absence of phages, the successive growth of fermenters, syntrophs and methanogens quickly leads to the stabilization of metabolic activity within a few weeks (Fig. 6.2A). At this stage, microbial metabolism balances glucose supply and residual substrate loss from the bioreactor, although the exact steady state metabolite concentrations and community composition depend on the random parameters chosen. Competitive exclusion between reactions that are limited by the same substrates eventually leads to the persistence of only a subset of possible pathways driving glucose catabolism to CH₄ and CO₂. Each reaction is eventually performed by at most one remaining OTU, characterized by its ability to persist at the lowest substrate concentration (Figs. 6.2B,C). 
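The rule that the surviving OTU is the one able to persist at the lowest substrate concentration can be checked in a toy setting. The sketch below is not the methanogenic community model: it assumes a single substrate, Monod-limited uptake, a fixed yield, a common loss rate, and OTUs that differ only in their half-saturation constants. All names and values (persistence_threshold, compete, the supply and loss parameters) are hypothetical.

```python
# Illustrative sketch: several OTUs compete for one substrate and differ only
# in their half-saturation constants. All parameter values are assumptions.

def persistence_threshold(k_half, mu_max, loss):
    # Lowest substrate concentration at which growth still balances loss
    return k_half * loss / (mu_max - loss)

def compete(k_halves, days=1000.0, dt=0.01, supply=1.0, dilution=0.5,
            vmax=2.0, growth_yield=0.1, loss=0.05):
    # Explicit Euler integration of substrate and OTU densities
    substrate = supply
    densities = [0.01] * len(k_halves)
    for _ in range(int(round(days / dt))):
        uptakes = [vmax * substrate / (k + substrate) * n
                   for k, n in zip(k_halves, densities)]
        substrate += dt * (dilution * (supply - substrate) - sum(uptakes))
        densities = [n + dt * (growth_yield * u - loss * n)
                     for n, u in zip(densities, uptakes)]
    return substrate, densities
```

Running compete([0.4, 0.05, 0.2]) leaves the OTU with the smallest half-saturation constant, and hence the lowest persistence threshold, as the only OTU at appreciable density, with the residual substrate concentration settling at that threshold.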
The bulk of biomass is attributable to fermenters and, to a lesser extent, syntrophs. Methanogens only account for a small fraction (< 1%) of the community because most of the energy available from glucose catabolism is extracted in the preceding steps (77).

6.3.2 Effects of phages on community dynamics

When phages are included in the model, specialist predation by phages leads to intense and irregular fluctuations of individual host populations in accordance with KTW dynamics (Fig. 6.2D). The duration of each infection cycle (i.e., the period from the initial detection to the eventual collapse of a phage population) varies greatly between phage-host pairs and between simulations, but is typically on the order of 20–150 days, consistent with similar durations observed in activated sludge (423). For many phage-host pairs these fluctuations closely resemble classical predator-prey cycles, although most cycles are irregular in their phase and amplitude (Fig. E.3), owing to their strong indirect interactions. Such complex, often chaotic, dynamics are common in systems composed of interacting oscillating components with distinct random frequencies (161). When averaged over time, predation by phages has detrimental effects on individual cell populations as well as on overall reaction rates. In particular, in the absence of any functional redundancy, average methane production drops to less than 1% of the production that would typically be achieved in the absence of phages. Our model suggests that this reduction in performance can occur in at least two ways: First, increased cell mortality through cell lysis results in fewer cells that could consume a particular substrate before it is lost from the bioreactor.
Second, because phage predation is biased towards dominant OTUs, it skews selection towards potentially less competitive (in terms of metabolic efficiency) OTUs within each functional group, leading to residual substrate concentrations that are higher than the equilibrium substrate concentrations of the top competitor (479). Due to the temporal delays involved in the recovery of populations or the opportunistic invasion by competitors, phages not only reduce the average metabolic throughput, but also induce fluctuations around that average (Fig. 6.3C). This may explain previously observed fluctuations of bioreactor performance that could not be fully explained using purely energetic and reaction-kinetic models (109, 286, 429).

Neutral demographic drift between equivalent competitors has previously been suggested as an explanation for seemingly random OTU turnover within functional groups in a wastewater treatment plant (349). In such systems, however, cell densities can be extremely high (e.g., ∼10⁹ cells · L⁻¹ in lakes and up to 10¹³ cells · L⁻¹ in bioreactors; 249, 518) and hence deterministic dynamics such as competitive exclusion between OTUs with even slight physiological differences are expected to dominate over pure demographic drift. For example, even at a population size of only 10⁵ cells and a constant difference in growth rates of only 1% between two competing OTUs, stochastic trajectories accounting for demographic drift would closely resemble the deterministic trajectory of exponential decline of the weaker competitor (e.g., R² ∼ 0.98 for a birth-death model with constant combined population size, see Methods for details).
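The comparison between demographic drift and deterministic exclusion can be explored with a small sketch. It is not the birth-death model referred to in the Methods: it assumes discrete generations, a constant combined population size, and a Gaussian approximation to the sampling noise in each generation. All names and parameter values are illustrative.

```python
import math
import random

def deterministic_fraction(p0, s, t):
    # Fraction of the weaker competitor after t generations under selection alone.
    # Its odds p / (1 - p) shrink by a factor exp(-s) per generation.
    w = p0 * math.exp(-s * t)
    return w / (w + (1.0 - p0))

def stochastic_fractions(p0, s, n_cells, generations, rng):
    # Same selection step, followed by sampling noise for a population of
    # n_cells cells (Gaussian approximation, clipped to the interval [0, 1]).
    p = p0
    path = [p]
    for _ in range(generations):
        w = p * math.exp(-s)
        expected = w / (w + (1.0 - p))
        sd = math.sqrt(expected * (1.0 - expected) / n_cells)
        p = min(1.0, max(0.0, rng.gauss(expected, sd)))
        path.append(p)
    return path

def r_squared(observed, predicted):
    # Coefficient of determination of the deterministic prediction.
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - q) ** 2 for o, q in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

rng = random.Random(1)
generations = 1000
observed = stochastic_fractions(0.5, 0.01, 1e5, generations, rng)
predicted = [deterministic_fraction(0.5, 0.01, t) for t in range(generations + 1)]
fit = r_squared(observed, predicted)
```

The value of fit depends on the random seed and on the approximations listed above, so it should not be read as a reproduction of the R² reported in the text.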
In contrast, our model shows that simple deterministic mechanisms can easily explain the sustained turnover between non-equivalent competitors observed in realistic settings, without the need for unrealistic neutral models.

6.3.3 Functional redundancy promotes functional stability

When considering multiple degrees of functional redundancy, we find a clear trend towards higher as well as more stable overall methane production rates at elevated functional redundancies (Figs. 6.3A,B). This suggests that the opportunistic growth of functionally similar OTUs may mitigate the detrimental effects of phage predation on community function by filling underutilized metabolic niches, thereby increasing and stabilizing overall community function. The probability that an appropriate alternative OTU is able to quickly replace a collapsing competitor increases with the functional redundancy in the initial inoculum (i.e., the available “seed bank”). Functional niche complementation is generally thought to promote a positive correlation between community richness and functional stability against external environmental perturbations (46, 365, 480, 522). Our work suggests that functional complementation in microbial communities also mitigates the detrimental effects that intrinsically emerging (rather than externally driven) fluctuations can have on ecosystem functioning. This may explain the occasionally observed positive relationship between microbial species diversity and biochemical performance, particularly at low diversities, even in the absence of external perturbations (163, 516, 526). The mechanism proposed here is fundamentally different from known mechanisms leading to a positive diversity-stability relation in classical food webs, as these mechanisms either involve a differential response of competitors to environmental perturbations (282) or are based on a skewed distribution towards weaker consumer-resource interactions (310).
We note that phage-driven KTW dynamics have been previously hypothesized to stabilize bioreactor performance by preventing competitive exclusion and hence maintaining a high functional redundancy that would, in turn, increase resilience to perturbations (423). Our analysis suggests that this interpretation may be slightly misleading, because phage predation itself can severely reduce and destabilize community function, while functional redundancy (e.g., ensured by a rich inoculum or exposure to a large pool of potential colonizers) acts to mitigate these destabilizing effects.

At increased functional redundancy, our model predicts that the proportions of various functional groups (in terms of relative cell abundances) become more stable, although the extent of this stabilization depends on the functional groups considered (Fig. 6.4I–L). The stabilization of functional community structure is especially pronounced at a “coarse” level, i.e., when considering the proportions between fermenters, syntrophs and methanogens (Fig. 6.4I). Specifically, the coefficient of variation of these coarse groups drops from ∼0.9 (median value) in the absence of functional redundancy down to ∼0.2 at 100-fold functional redundancy. Since these groups represent indispensable and stoichiometrically coupled catabolic steps in the bioreactor, it is not surprising that their relative productivities are subject to strong stabilizing forces. Our model thus provides an explicit explanation for the discrepancy between stable functional community profiles and the highly variable taxonomic profiles often observed under constant environmental conditions (109, 288, 349, 397), and highlights the central role of functional redundancy coupled with biotic interactions in promoting this discrepancy.
In particular, our results support the interpretation that the strong OTU turnover observed within functional groups in previous studies and in Chapter 3 was at least partly caused by biotic interactions including (but not necessarily limited to) predation by phages, rather than by neutral drift. Prolonged transients of competitive exclusion may also lead to slow OTU turnover within functional groups, especially following environmental perturbation (Chapter 5), although the detection of correlated OTU and phage succession in bioreactors provides additional evidence for KTW dynamics in these systems (397, 424).

On a finer resolution, i.e., when considering the proportions of single-reaction functional groups (e.g., acetoclastic vs H2/CO2 methanogens), the extent of stabilization depends on the specific set of functions considered (Figs. 6.4J–L). For example, individual fermenting functional groups (A–F in Fig. 6.1) appear interchangeable even at high functional redundancies (Fig. 6.4J), presumably because these groups represent strongly overlapping metabolic niches (they all ferment glucose). Fluctuations between these reactions, in turn, are predicted to drive comparably strong fluctuations in the proportions of syntrophic functional groups (G–J in Fig. 6.1) specializing on different fermentation products (Fig. 6.4K). Irregular transitions between alternative (“parallel”) catabolic electron flows, concurrent with stable overall catabolic performance, have indeed been observed in previous experiments (122). In contrast, the proportions of the two methanogenic groups (K,L in Fig. 6.1) quickly stabilize at increasing functional redundancy. These findings reveal that the stability of functional community structure depends on the precise definition of functional groups, because non-identical functional groups may be interchangeable in case of overlap (365).
A distinction between parallel and sequential functional groups is particularly crucial for “branched” metabolic networks such as organic carbon catabolism but may be less relevant for more sequential functions such as nitrification (oxidizing ammonium to nitrite and then nitrate) or denitrification (reducing nitrate to nitrite, nitric oxide, nitrous oxide and eventually nitrogen gas; 144).

Although not modeled, phage-host co-evolution could further destabilize population dynamics within functional groups, for example due to the repeated emergence of new resistant strains. In addition, rapid evolution of host resistance, for example via clustered regularly interspaced short palindromic repeats (CRISPRs; 13), could buffer the impacts of viruses on overall ecosystem functioning (269). While evolution is likely an important component of phage-host dynamics in natural systems (156, 322, 424), our work suggests that deterministic ecological dynamics are sufficient to explain the succession of OTUs often observed during constant community function. In fact, Shapiro et al. (424) observed rapid succession between distantly related OTUs and their associated phages in a bioreactor, suggesting that replacement by non-related competitors, rather than adaptive evolution of resistance, was indeed the main mode of host succession.

We note that the destabilizing role of phage predation and the stabilizing role of functional redundancy, as predicted by our model, rely on the assumption that lytic phages are specialized on single host populations (a prerequisite for KTW dynamics). High host specificities of phage infectivity (e.g., at the strain or species level) are generally considered to be the rule (69, 174, 521); however, it is becoming apparent that host specificity may be more variable than previously assumed (191, 423).
In reality multiple microbial clades may be susceptible to the same phages and hence the effective functional redundancy in the system may be much lower than the actual number of bacterial and archaeal strains. In cases where host specificity is the exception rather than the rule, the dynamics are expected to be markedly different than predicted here. For example, experimental and theoretical work shows that generalist predation by protist grazers can severely and permanently reduce community function even at high prey diversity (213, 366, 474, 476). Further, temperate phage strategies, not considered here, likely have less severe effects on host populations and metabolic throughput than predicted by our model. For example, recent findings suggest that phages may be switching from lytic to lysogenic during high host abundances, a behavior termed “piggy-backing the winner” (243), and this switching may dampen oscillations in host abundances. Hence, in environments characterized by temperate, rather than lytic, phage-host interactions (as may be the case for the human gut, 391), KTW dynamics will be less pronounced and hence microbial communities may be more stable in terms of their taxonomic composition. Further, recent work suggests that metabolic host reprogramming by viruses can direct energy and nutrients towards viral replication, potentially altering biogeochemical cycling (199, 405). It is likely, however, that the time scales of such host-phage co-evolutionary dynamics are much longer than, and thus of limited relevance to, the ecological successional dynamics considered here.

6.3.4 Statistical averaging or dynamic stabilization?

As discussed above, a high functional redundancy in the microbial community can reverse the destabilizing effects that KTW dynamics have on net metabolic activity and on the abundances of functional groups (Figs. 6.4A–L).
Such stabilizing effects at increased species richness have been hypothesized in the past, based on the expectation that the effects of multiple fluctuating populations on broad community properties (such as total biomass or overall catabolic activity) ought to “average” each other out (267, 283, 480). According to this interpretation, which assumes that populations fluctuate independently, the stabilization of broad community properties at elevated functional redundancy is a simple statistical necessity (46, 105). In reality, however, populations are inevitably coupled due to competition for resources and metabolic interdependencies, and it is a priori unclear whether these interactions lead to stronger or weaker averaging effects (481). For example, fluctuations in metabolite concentrations caused by changes in specific microbial populations will affect the growth of all functional groups consuming or producing the particular metabolites, and these effects will typically act in a similar direction on all members within a functional group. Such positive correlations in the response of competing populations to chemical perturbations would act against stabilization by averaging. On the other hand, the collapse of a particular population due to cell lysis eventually frees a metabolic niche that can be occupied by competing populations, and such a “dynamic stabilization” would likely lead to a higher functional stability when compared to mere statistical averaging. A distinction between the two stabilizing mechanisms, statistical averaging vs dynamic stabilization, is key to understanding the effects of (metabolic) community interactions on overall functional stability.

To assess whether the functional stabilization observed in our simulations at high functional redundancy is a mere averaging effect or dynamic, we compared the coefficients of variation of functional group abundances to a null model in which the time course of each cell population was randomly shifted in time.
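A minimal sketch of such a comparison follows. It assumes that the random shift is circular (values shifted past the end of the time course wrap around to its start) and that stability is measured on functional group proportions; both are assumptions of this sketch, as are all function and variable names and the toy data. The returned fraction of shifted trials with a higher mean coefficient of variation than observed corresponds to the probability that the text goes on to call the degree of dynamic stabilization.

```python
import random
import statistics

def mean_cv_of_proportions(populations, groups):
    # populations: OTU name -> list of abundances over time (all positive).
    # groups: functional group name -> list of member OTU names.
    n_times = len(next(iter(populations.values())))
    totals = {
        g: [sum(populations[o][t] for o in members) for t in range(n_times)]
        for g, members in groups.items()
    }
    grand = [sum(totals[g][t] for g in groups) for t in range(n_times)]
    cvs = []
    for g in groups:
        proportions = [totals[g][t] / grand[t] for t in range(n_times)]
        cvs.append(statistics.pstdev(proportions) / statistics.fmean(proportions))
    return statistics.fmean(cvs)

def circular_shift(series, offset):
    # Shift a time course by offset steps, wrapping around at the end.
    return series[offset:] + series[:offset]

def fraction_of_trials_less_stable(populations, groups, trials, rng):
    # Fraction of null-model trials whose mean CV exceeds the observed one.
    observed = mean_cv_of_proportions(populations, groups)
    higher = 0
    for _ in range(trials):
        shifted = {
            o: circular_shift(s, rng.randrange(len(s)))
            for o, s in populations.items()
        }
        if mean_cv_of_proportions(shifted, groups) > observed:
            higher += 1
    return higher / trials

# Toy data: within each group, one OTU rises exactly when the other falls.
toy_populations = {
    'a1': [3, 4, 5, 4, 3, 2, 1, 2],
    'a2': [3, 2, 1, 2, 3, 4, 5, 4],
    'b1': [1, 2, 4, 8, 4, 2, 1, 1],
    'b2': [8, 7, 5, 1, 5, 7, 8, 8],
}
toy_groups = {'group_a': ['a1', 'a2'], 'group_b': ['b1', 'b2']}
toy_result = fraction_of_trials_less_stable(
    toy_populations, toy_groups, trials=200, rng=random.Random(0)
)
```

In the toy data the group totals are constant because members compensate for each other exactly, so almost any independent shift makes the proportions more variable and the returned fraction is close to 1.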
This null model resembles the hypothetical scenario in which populations fluctuate independently of one another, while preserving the broad features of these fluctuations. We defined the “degree of dynamic stabilization” (DDS) in a particular simulation as the probability that the null model would lead to a higher coefficient of variation (averaged across functional groups) than observed, estimated through repeated random trials. We find that coarse functional group proportions exhibit a high DDS that approaches 1.0 at high degrees of functional redundancy (Fig. 6.4M), consistent with the interpretation that opportunistic competitors quickly replace collapsing populations. On the other hand, we find contrasting effects for the proportions of single-reaction functional groups. Specifically, the proportions of various fermenting groups exhibit the lowest DDS (∼0.1–0.6, Fig. 6.4N), while the proportions of the two methanogenic groups exhibit the highest DDS (up to ∼0.9–1.0, Fig. 6.4P). This is consistent with the interpretation that groups consuming similar substrates (e.g., glucose in the case of fermenters) are highly interchangeable and hence their proportions are only weakly stabilized. On the other hand, the rapid stabilization of H2 and acetate concentrations at high functional redundancies appears to promote a stable ratio between H2/CO2 and acetoclastic methanogen populations.

We find that the total cell concentration is only occasionally dynamically stabilized and is often less stable than predicted purely based on statistical averaging (DDS ∼0.1–0.7 at 100-fold functional redundancy). This means that fluctuations of single populations can lead to synchronized and similarly signed responses across multiple functional groups (e.g., a collapse of the dominant glucose fermenters also induces a temporary collapse of methanogens).
In consequence, the absolute abundances of individual functional groups are less stable than their relative proportions, even at elevated functional redundancies. This means that care must be taken when assessing the stability of functional community profiles, for example based on metagenomic sequences (97), because such profiles generally only reflect the relative but not the absolute abundances of functional groups.

6.3.5 Phages promote prokaryotic diversity

Classical competition theory predicts that at steady state only a single competitor can persist within a metabolic functional group that is limited by a single substrate, and this competitor is determined by its ability to survive at the lowest possible substrate concentration (178, 479). Consistent with this competitive exclusion principle, when we excluded phages from our model each functional group was eventually occupied by at most one OTU (e.g., Fig. 6.2C). In reality, however, microbial richness can be extremely high even in simple engineered ecosystems, such as nitrifying bioreactors (109), methanogenic digesters (123, 150) or activated sludge (234), where it can range from hundreds to thousands of OTUs. Such high richness is in apparent contradiction to the competitive exclusion principle. Mechanisms proposed in the past to explain this discrepancy include slow transient dynamics of competitive exclusion far from steady state (285), negative frequency dependence through non-linear substrate dependencies (281, 449), externally driven fluctuations of resource availability or physical conditions (432, 449), as well as phage-driven KTW dynamics (48, 320, 424, 474, 475, 512). Our work provides support for KTW dynamics as a likely explanation for sustained high OTU richness within metabolic functional groups, especially when high richness is observed in the absence of obvious environmental fluctuations (122, 349, 424).
For example, in our simulations with 20-fold functional redundancy, in the long term each functional group is typically occupied by 5–10 OTUs, each of which occasionally and temporarily increases in relative abundance (e.g., Figs. 6.2E,F). Similarly, as mentioned earlier, the repeated disruption of reactions prevents the long-term mutual exclusion between alternative reactions limited by the same substrates, resulting in a “parallelization” of catabolic fluxes. Notably, during most simulations multiple fermenting groups would contribute (occasionally or at all times) to the catabolism of glucose. Phage-induced catabolic parallelization can thus maintain functional diversity in addition to diversity within functional groups and may explain the co-occurrence of alternative intermediate products (e.g., multiple fatty acids) sometimes observed in methanogenic digesters (179, 533). We note that in certain cases (e.g., at high functional redundancy) slow transients of competitive exclusion can also lead to prolonged coexistence of OTUs with similar metabolic efficiencies (e.g., Fig. 6.2A,B and Chapter 5), and in reality multiple mechanisms likely contribute to the maintenance of richness within a community.

6.4 Conclusions

Microbial communities can display strong taxonomic variation across space (288, 307) or time (109, 349, 397) even under similar environmental conditions, while exhibiting relatively constant functional community structure. In fact, recent work showed that physical and chemical conditions strongly predict the distribution of metabolic functional groups across the world’s ocean, but only poorly explain the taxonomic composition within individual functional groups (289). Both Louca et al. (288) and Louca et al. (289) found no effect of spatial dispersal limitation on community differences, and Louca et al.
(288) showed that OTUs within individual functional groups displayed non-random phylogenetic relationships and non-random co-occurrence patterns that were at least partly caused by biotic interactions. These findings suggest an elegant paradigm for microbial ecology, in which the functional structure and the taxonomic composition within functional groups constitute two separate facets of community composition, with the former being driven by stoichiometric and energetic environmental constraints, while the latter is heavily shaped by complex biotic interactions. Our mechanistic model provides further support for such an interpretation by explicitly demonstrating how, and under which conditions, predation by specialist phages can drive OTU turnover while maintaining constant functional community structure and metabolic performance. Experimental evidence for phage-driven KTW dynamics remains limited (322, 357, 397, 398, 424, 499), mostly due to the technical difficulties involved in virome profiling and in linking particular phages to their specific hosts in natural ecosystems. Nevertheless, many natural environments, including the open ocean (462), soil (18) and the human gut (391), exhibit high phage densities and it is likely that phages also contribute to the decoupling between taxonomic and functional community structure in these environments (134, 288, 289, 307, 379, 381).

The decoupling between functional and taxonomic community structure, especially at high functional redundancy, has important implications for the interpretation of microbial biogeographical patterns, because variation in taxonomy need not imply differences in community function. Reciprocally, environmental constraints determining community function may only poorly explain the distribution of individual taxa.
If phage-induced KTW dynamics indeed strongly shape the taxonomic variation within functional groups, as suggested by this and previous work, then the aspiration of accurate microbial species distribution models (258, 455) may turn out to be a Sisyphean struggle, especially when important functional traits are non-monophyletic (308). Disentangling the functional and taxonomic variation in microbial communities is thus an important prerequisite for a truly predictive microbial ecology.

[Figure 6.1 (network diagram). Metabolites shown: glucose, acetate, H2, ethanol, lactate, butyrate, propionate, methane. Functional groups are labeled A–L and arranged in three stages: glucose fermentation; syntrophy (catabolism of alcohols, lactate and SCFA); acetoclastic and H2/CO2 methanogenesis.]

Figure 6.1: Modeling methanogenic communities. Simplified metabolic network of anaerobic methanogenic communities (77), as modeled in this study. Each circle represents one of 12 functional groups specialized on a particular metabolic reaction and each reaction can be performed by multiple competing OTUs (“functional redundancy”). The network is roughly structured into 3 sequential “coarse” functional groups based on the type of substrate used (fermentation, syntrophy and methanogenesis).
Detailed reaction stoichiometry in reference to the inscribed letters is provided in Table E.1.

[Figure 6.2 (plots). Panels A–C: no phages; panels D–F: with phages. (A, D) all OTUs, cell abundances (cells/L) vs time (days); (B, E) acetoclastic methanogens, OTU proportions vs time (days); (C, F) H2/CO2 methanogens, OTU proportions vs time (days). Annotated OTUs: ethanol-producing fermenter, ethanol-consuming syntroph.]

Figure 6.2: Phage predation drives OTU turnover. (A) OTU abundances (one color per OTU) during a single simulation without phage predation, at 20-fold functional redundancy. Competitive exclusion leads to the extinction of all but a single fermenter and all but two very similar syntrophs (although eventually only one syntroph remains, not shown). Methanogens exhibit much lower abundances than fermenters and syntrophs and are thus not visible. (B,C): Proportions of acetoclastic methanogens (B) and proportions of H2/CO2 methanogens (C) (one color per OTU) during the same simulation as (A). (D–F): Analogous to (A–C), but for a simulation including phage predation.
Phage-host interactions drive variation in overall cell abundances (D) as well as in the OTU composition within functional groups (E,F).

[Figure 6.3 (plots). (A) mean effluent methane concentration (μM) and (B) coefficient of variation of effluent methane, each vs functional redundancy (1x, 5x, 20x, 50x, 100x); (C) effluent methane concentration (μM) vs time (days) at 20x functional redundancy; (D) the same at 100x functional redundancy.]

Figure 6.3: Functional redundancy increases and stabilizes methane production in the presence of phages. (A,B): Temporal (A) averages and (B) coefficients of variation of effluent methane concentration, for various degrees of functional redundancy. Box plots represent the distribution across 100 random simulations. Vertical bars comprise 95% percentiles.
(C,D): Methane concentrations during a single simulation at (C) 20-fold and (D) 100-fold functional redundancy.

[Figure 6.4 (plots). Columns: coarse functional groups, fermenting groups, syntrophic groups, methanogenic groups. Rows 1 and 2 (panels A–D and E–H): proportions vs time (days) at 20x and 100x functional redundancy. Row 3 (panels I–L): coefficient of variation vs functional redundancy (1x, 5x, 20x, 50x, 100x). Row 4 (panels M–P): DDS vs functional redundancy.]

Figure 6.4: Effects of functional redundancy on functional community composition in the presence of phages. Row 1: Proportions of (A) coarse functional groups (total methanogens vs total fermenters vs total syntrophs), (B) fermenting groups, (C) syntrophic groups and (D) methanogenic groups over time (one color per functional group), during a random simulation at 20-fold redundancy.
Note that each functional group consists of multiple OTU populations (individual OTU concentrations not shown). Row 2: Similar to row 1, but for a simulation at 100-fold functional redundancy. Row 3: Coefficients of variation (CV) for the proportions of (I) coarse functional groups, (J) fermenting groups, (K) syntrophic groups and (L) methanogenic groups, at various levels of functional redundancy. Row 4: Degree of dynamic stabilization (DDS) of functional group proportions (corresponding to I–L), at various levels of functional redundancy. Box plots (I–P) represent the distribution of CVs (I–L) and DDSs (M–P) across 100 random simulations; vertical bars indicate 95% percentiles.

Chapter 7

Gene-centric modeling of the Saanich Inlet oxygen minimum zone¹

7.1 Synopsis

The work presented above provides strong support for a pathway-centric paradigm of microbial ecology, according to which the distribution and activity of microbial metabolic pathways are strongly shaped by environmental physicochemical conditions, regardless of the detailed community assembly mechanisms. The question then arises, what would be an appropriate mathematical model to describe the interaction between metabolic pathways and their environment, and how could such a model incorporate modern molecular sequence data? To address these questions, here we construct a quantitative biogeochemical model that describes metabolic coupling along the redox gradient in Saanich Inlet, a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZ). The model reproduces measured biogeochemical process rates as well as DNA, mRNA, and protein concentration profiles across the redox cline. Simulations make rigorous but unexpected predictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen, and sulfur cycling.
For example, nitrite “leakage” during incomplete sulfide-driven denitrification by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixation and intense nitrogen loss via anaerobic ammonia oxidation (anammox). Moreover, this coupling creates a metabolic niche for nitrous oxide reduction that completes denitrification by currently unidentified community members. These results quantitatively improve previous conceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and, reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns. The integration of geochemical profiles, process rate measurements, as well as metagenomic, metatranscriptomic and metaproteomic sequence data into a biogeochemical model, as demonstrated here, enables holistic understanding of the distributed microbial metabolic network driving nutrient and energy flow at ecosystem scales.

¹ A version of this chapter is published in PNAS (see the Preface for author contributions): Louca, S., Hawley, A.K., Katsev, S., Torres-Beltran, M., Bhatia, P.B., Kheirandish, S., Michiels, C., Capelle, D., Lavik, G., Doebeli, M., Crowe, S.A., and Hallam, S.J. (2016). Integrating biogeochemistry with multi-omic sequence information in a model oxygen minimum zone. Proceedings of the National Academy of Sciences. DOI:10.1073/pnas.1602897113

7.2 Introduction

Microbial communities catalyze Earth’s biogeochemical cycles through metabolic pathways that couple fluxes of matter and energy to biological growth (116). These pathways are encoded in evolving genes that over time spread across microbial lineages and today help shape the conditions for life on Earth.
High-throughput sequencing and mass spectrometry platforms are generating multi-omic sequence information (DNA, mRNA and protein) that is transforming our perception of this microcosmos, yet the vast majority of environmental sequencing studies lack a mechanistic link to geochemical processes. At the same time, mathematical models are increasingly used to describe local and global scale biogeochemical processes or to predict future changes in global element cycling and climate (138, 392). While these models typically incorporate the catalytic properties of cells, they fail to integrate the information flow between DNA, mRNA, proteins and process rates as described by the central dogma of molecular biology (85). Hence, a mechanistic framework integrating multi-omic data with geochemical information has remained elusive.

Recent work based on metagenomics and quantitative PCR suggests that biogeochemical processes may be described by models focusing on the population dynamics of individual genes (386, 387). In such gene-centric models, genes are used as proxies for particular metabolic pathways, with gene production rates being determined solely by the Gibbs free energy released by the catalyzed reactions. While compelling, these models stop short of incorporating the entire central dogma of molecular biology, and therefore do not achieve a truly quantitative integration of multi-molecular information and geochemical processes. For example, while such models allowed for a qualitative comparison between modeled gene production rates and selected transcript profiles, they do not provide any explicit mechanistic links (386).
In fact, the lack of a quantitative validation against process rate measurements or other proxies for activity (e.g., proteins) raises the question whether gene-centric models are adequate descriptions for microbial activity.

Here we construct a gene-centric model for Saanich Inlet, a seasonally anoxic fjord on the coast of Vancouver Island, Canada (12), and a tractable analogue for the biogeochemistry of oxygen minimum zones (OMZs). Oxygen minimum zones are widespread and expanding regions of the ocean in which microbial community metabolism drives coupled carbon, nitrogen and sulfur cycling (489, 531). These processes exert a disproportionate influence on global N budgets with resulting feedbacks on marine primary productivity and climate (252, 509). Extensive time series monitoring in Saanich Inlet provides an opportunity to interrogate biogeochemical processes along defined redox gradients extensible to coastal and open ocean OMZs (503, 540). Our model explicitly describes DNA, mRNA, and protein dynamics as well as biogeochemical reaction rates at ecosystem scales. We use geochemical depth profiles, rate measurements for N cycling processes, metagenomic, metatranscriptomic and metaproteomic sequence data, as well as qPCR-based cell count estimates for SUP05 Gammaproteobacteria, the dominant denitrifiers in Saanich Inlet (181), to calibrate and validate our model. All data were obtained from a single location in Saanich Inlet across several depths in early or mid-2010.

7.3 Construction and calibration of a gene-centric model

A recurring cycle of deep water renewal and stratification in Saanich Inlet results in an annual formation of an oxygen-depleted (O2) zone in deep basin waters (Fig. 7.1B). As this oxygen minimum zone slowly expands upwards during spring, anaerobic carbon remineralization in the underlying sediments leads to an accumulation of ammonium (NH4+) and hydrogen sulfide (H2S) at depth (340).
The oxidation of sulfide using nitrate produced through nitrification in the upper water column fuels chemoautotrophic activity and promotes the formation of a narrow sulfide-nitrate transition zone (SNTZ) at intermediate depths (218, 504, 540).

Our model describes the dynamics of several genes involved in carbon, nitrogen and sulfur cycling across the Saanich Inlet redox cline. Each gene is a proxy for a particular redox pathway that couples the oxidation of an external electron donor to the reduction of an external electron acceptor. The model builds on a large reservoir of previous work that provides conceptual insight into the microbial metabolic network in Saanich Inlet (11, 181, 218, 503, 504, 540) and includes aerobic remineralization of organic matter (ROM), nitrification, anaerobic ammonium oxidation (anammox) and denitrification coupled to sulfide oxidation (Fig. 7.1A). Certain pathways found in other OMZs, such as aerobic sulfide oxidation (415), dissimilatory nitrate reduction to ammonium (DNRA; 253) and sulfate reduction (386), were excluded from our model based on information from previous studies (11, 181, 218, 503, 504, 540) as well as preliminary tests with model variants (as explained below). Reaction rates (per gene) depend on the concentrations of all used metabolites according to 1st order or 2nd order (Michaelis-Menten) kinetics (211). In turn, the production or depletion of metabolites at any depth is determined by the reaction rates at that depth. The production of genes is driven by the release of energy from their catalyzed reactions, and is proportional to the Gibbs free energy multiplied by the reaction rate (396).

The model was evaluated at steady state between 100 and 200 m depth.
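The rate laws described above can be illustrated with a minimal Python sketch. The function names, the multiplicative combination of two Michaelis-Menten terms, the proportionality constant yield_per_energy and all numerical values are assumptions made for illustration only; they are not the calibrated rate laws or parameters of the model.

```python
def michaelis_menten(conc, half_sat):
    # Saturating dependence of a rate on one metabolite concentration.
    return conc / (half_sat + conc)


def per_gene_rate(v_max, donor, acceptor, k_donor, k_acceptor):
    # Per-gene reaction rate, limited by both electron donor and acceptor.
    # Multiplying two saturating terms is an assumption of this sketch.
    return (v_max
            * michaelis_menten(donor, k_donor)
            * michaelis_menten(acceptor, k_acceptor))


def gene_production(gene_conc, rate_per_gene, energy_release, yield_per_energy):
    # Gene production is proportional to the energy released per reaction
    # (magnitude of the Gibbs free energy) times the total reaction rate.
    total_rate = gene_conc * rate_per_gene
    return yield_per_energy * energy_release * total_rate


# Hypothetical values, chosen only to exercise the functions.
rate = per_gene_rate(v_max=2.0, donor=10.0, acceptor=10.0,
                     k_donor=10.0, k_acceptor=10.0)
growth = gene_production(gene_conc=1e6, rate_per_gene=rate,
                         energy_release=100.0, yield_per_energy=1e-3)
print(rate, growth)
```

In this form, a pathway only proliferates where both its electron donor and its electron acceptor are available, which is the property that confines chemoautotrophic growth to redox transition zones.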
Accordingly, free parameters were calibrated using geochemical profiles obtained in early 2010 during a period of intense water column stratification, which resulted in an extensive anoxic zone that approached a steady state (Fig. 7.2; Appendix F.3.7). After calibration, most residuals to the data are associated with an upward offset of the predicted SNTZ (Figs. 7.2B,C,F) and an underestimation of nitrous oxide (N2O) concentrations in deep basin waters (Fig. 7.2E). These residuals can be explained by subtle deviations from a steady state. Such deviations were revealed in subsequent time series measurements during which the SNTZ continued to migrate upwards in the water column (Fig. 7.2).

7.4 Results and Discussion

7.4.1 DNA profiles and process rates

The calibrated model makes predictions about gene abundance and process rates, which can be validated using metagenomic sequence data and N process rate measurements from the same location and period as the geochemical calibration data (Fig. 7.3). Consistent with metagenomic depth profiles, the model predicts a redox-driven partitioning of pathways across the water column. Genes associated with ROM (ABC transporters mapped to dominant aerobic heterotrophs), aerobic ammonium oxidation to nitrite (ammonia monooxygenases, amo) and aerobic nitrite oxidation to nitrate (nitrite oxidoreductases, nxr) decline precipitously in deep basin waters, whereas genes associated with partial denitrification to nitrous oxide (PDNO, represented by nitric oxide reductases, norBC), nitrous oxide reduction (nitrous oxide reductases, nosZ) and anammox (hydrazine oxidoreductases, hzo) are most abundant in the SNTZ (Fig. 7.1A).

The similarity of the PDNO, nosZ and hzo gene profiles is indicative of their strong interaction. In particular, the co-occurring peaks of PDNO and nosZ abundances in the absence of N2O accumulation (Fig.
7.2E) reflect a quantitative coupling between the two denitrification steps and imply that both steps support extensive productivity at the SNTZ. This is intriguing because genomic reconstruction from both uncultivated and cultivated SUP05, the dominant denitrifier in Saanich Inlet, has not identified the nosZ gene (181, 420, 504). This suggests that incomplete nitrate reduction by SUP05 and reduction of nitrous oxide by unidentified community members constitute separate and complementary metabolic niches in Saanich Inlet under suboxic and anoxic conditions.

The superposition of electron donor-acceptor pairs in redox transition zones supports chemical energy transfer in stratified water columns (45, 547), and previous studies revealed relatively high cell abundances and chemoautotrophic activity in such zones (164, 515, 540). At the SNTZ in Saanich Inlet, the simultaneous availability of NO3− and H2S fuels chemoautotrophic nitrate respiration coupled to sulfide oxidation, in turn supplying anammox with NO2− via “leaky” denitrification (up to 88% of NO2− supplied by PDNO; Appendix F.3.11). Most of the NH4+ utilized by anammox (0.3 mmol·m⁻²·d⁻¹), on the other hand, is predicted to originate from the underlying sediments and reach the SNTZ via diffusion. Accordingly, both anammox and denitrification rates are predicted to peak around the SNTZ and lead to a significant release of N2. This prediction is consistent with process rate measurements from discrete depth intervals during subsequent cruises in 2010 (Fig. 7.3B), as well as elevated SUP05 cell counts at the SNTZ (Fig. 7.3A, estimated via qPCR). In fact, the good agreement between predicted PDNO gene counts and observed SUP05 cell counts suggests that energy fluxes associated with denitrification can be accurately translated to denitrifier biosynthesis rates.
Predicted peak sulfide-driven denitrification rates are somewhat higher than peak anammox rates, although depth-integrated nitrogen loss rates are comparable for both pathways (∼0.3 mmol-N2·m⁻²·d⁻¹). These predictions are partly consistent with rate estimates derived directly from the geochemical profiles using inverse linear transport modeling (ILTM, Fig. 7.3B; see Methods for details). Hence, near steady state conditions, coupled sulfide-driven denitrification and anammox can concurrently drive significant nitrogen loss in the water column.

The fraction of NO2− leaked during denitrification, compared to the total NO3− consumed (L_PDNO = 0.352; Appendix F.3.3), could be calibrated as a free model parameter based on the observed geochemical profiles. Such a high NO2− leakage may result from an optimization of energy yield under electron donor limitation. Further experimental work is needed to understand the mechanisms controlling this leakage by SUP05. Heterotrophic denitrification and nitrification are conventionally thought of as the primary sources of both nitrite and ammonium for anammox in OMZs (253, 501), and so far evidence for a direct coupling between sulfide-driven denitrification and anammox has been scarce (515). Our results indicate that sulfide-driven denitrification is an important precursor for anammox, particularly under conditions of organic carbon limitation (172).

Steady state gene production rates for chemoautotrophic pathways are predicted to peak around the SNTZ, reaching ∼3.2 × 10⁶ genes·L⁻¹·d⁻¹. This gene production rate corresponds to a dark carbon assimilation (DCA) rate of ∼60 nmol-C·L⁻¹·d⁻¹, assuming a carbon:dry weight ratio of 0.45 (115) and a dry cell mass of m = 5 × 10⁻¹³ g (396). Previously measured peak DCA rates in the Saanich Inlet OMZ reached 2 µmol-C·L⁻¹·d⁻¹ (218), which is significantly higher than the values predicted here.
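The conversion from gene production to dark carbon assimilation quoted above can be retraced with a few lines of Python. This is only a worked check of the arithmetic: it assumes one gene copy per cell (so that gene production equals cell production) and a molar mass of carbon of 12.011 g/mol, neither of which is stated explicitly in the text, and the function name is hypothetical.

```python
CARBON_MOLAR_MASS = 12.011  # g per mol of carbon (assumed here)


def gene_production_to_dca(genes_per_l_per_d, dry_cell_mass_g, carbon_fraction):
    # Assumes one gene copy per cell, so gene production equals cell production.
    dry_mass = genes_per_l_per_d * dry_cell_mass_g   # g dry weight per L per d
    carbon_g = dry_mass * carbon_fraction            # g carbon per L per d
    return carbon_g / CARBON_MOLAR_MASS * 1e9        # nmol carbon per L per d


dca = gene_production_to_dca(3.2e6, 5e-13, 0.45)
measured_peak = 2000.0  # nmol-C per L per d, i.e. 2 micromol-C per L per d (218)
print(round(dca), round(measured_peak / dca))
```

Under these assumptions the predicted rate comes out at roughly 60 nmol-C per litre per day, about 30 times lower than the previously measured peak.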
The potential activity of autotrophic pathways not considered here, such as methane oxidation (531), may explain some of these differences. Moreover, the model assumes steady state conditions, while redox conditions were far from steady state during previous DCA measurements (218). Transient dynamics in Saanich Inlet can exhibit significantly higher nitrogen fluxes and chemoautotrophic activity (295), which might further explain discrepancies between the model and measured DCA rates. Accounting for chemoautotrophic productivities based on oxidant and reductant supply in redox transition zones is generally difficult due to limited knowledge of active pathways, the possibility of cryptic nutrient cycling and potential lateral substrate intrusions, and discrepancies similar to those in our study are frequently reported for other OMZs (217, 274, 341, 546).

Previously detected amino acid motifs similar to those found in proteins catalyzing DNRA suggested that SUP05 may also be providing NH4+ to anammox through DNRA (181). DNRA, not included in the model, is known to fuel anammox in anoxic sediments and water columns (253, 374). So far incubation experiments have not revealed any DNRA activity in Saanich Inlet, and measured ammonium profiles do not indicate any significant ammonium source at or below the SNTZ (Fig. 7.2B). Nevertheless, DNRA could be active in Saanich Inlet and remain undetected due to rapid ammonium consumption by anammox (374). An extension of the model that included DNRA as an additional pathway, and which we calibrated to the same geochemical data (January–March 2010), predicted negligible DNRA rates compared to denitrification and anammox and consistently converged to the same predictions as the simpler model.
This suggests that DNRA may be absent from the Saanich Inlet water column (at least near steady state conditions in late spring) and that the hydroxylamine-oxidoreductase homolog encoded by SUP05 plays an alternative role in energy metabolism (504).

DNA concentration profiles of anammox and denitrification genes appear more diffuse and are skewed towards deep basin waters, compared to their corresponding rate profiles and the SNTZ (Fig. 7.3). The model explains this apparent discrepancy based on turbulent diffusion and sinking, which both transport genes away from their replication origin. Hence, community composition at any depth is the combined result of local as well as non-local population dynamics. Metabolic flexibility encoded in the genomes of microorganisms mediating these processes may also contribute to broader distributions of individual genes than their predicted activity range (172). This disconnect between local metabolic potential and activity needs to be considered when interpreting metagenomic profiles in a functional context, especially in environments with strong redox gradients such as OMZs (531) or hydrothermal vents (387).

The concentration maxima of anammox and denitrification genes are predicted at shallower depths than measured (Fig. 7.3A). This observation is consistent with the upward offset of the predicted SNTZ and highlights an important limitation of steady state models applied to dynamic ecosystems. Indeed, process rate maxima predicted via ILTM at multiple time points continue to move upwards beyond the time interval used for model calibration (Fig. 7.3B). In reality, an electron-donor/electron-acceptor interface as narrow as predicted by the model would only develop after sufficient time for transport processes and microbial activity to reach a true steady state.
Such narrow interfaces do appear in permanently stratified meromictic lakes (435) or the Baltic Sea, where stagnation periods can persist for many years (172).

7.4.2 mRNA and protein profiles

Metagenomics can yield insight into the metabolic potential of microbial communities, but it is ill suited as a means to infer actual pathway activity because the latter depends, among other things, on environmental conditions. Metatranscriptomics and metaproteomics present powerful means to assess community metabolic activity, and each method comes with its own set of advantages (181, 330). For example, while transcripts present immediate proxies for gene up-regulation (e.g., in response to changing redox conditions), proteins represent the immediate catalytic potential of a community, and in vitro characterization of enzyme kinetics can facilitate the projection of protein abundances to in situ process kinetics (502). Transcript abundances need not always correlate strongly with protein abundances, for example in cases of transcriptional control or protein instability (265, 502), and hence metatranscriptomics and metaproteomics can in principle provide different perspectives on community activity. This warrants a systematic evaluation of the consistency between these alternative layers of information in real ecosystems. In fact, a unifying mechanistic framework describing the processes that control environmental mRNA and protein distributions is crucial for the correct interpretation of multi-omics data (330).

While DNA replication and process rates are predicted by our gene-centric model merely based on environmental redox conditions, it is uncertain to what extent intermediate stages of gene expression (transcription and translation) can be explained based on such a paradigm. For example, environmental mRNA concentrations measured via qPCR have previously been directly compared to predicted reaction rates (386), but such a heuristic comparison ignores other mechanisms controlling environmental biomolecule distributions, such as physical transport processes. Here, in an attempt to mechanistically describe mRNA and protein dynamics at ecosystem scales, we hypothesized that both mRNA and protein production rates at a particular depth are proportional to the total reaction rate at that depth (calculated using the calibrated model). This premise is motivated by observations of elevated transcription and translation rates during high metabolic activity or growth (9, 154, 229). Furthermore, we assumed that mRNA and protein molecules are subject to the same hydrodynamic dispersal processes as DNA, while decaying exponentially with time after synthesis. The decay time of each molecule, as well as the proportionality factor between the reaction rate and synthesis, were statistically estimated using metatranscriptomic and metaproteomic data (Appendix F.3.10).

The general agreement between this model and the molecular data (Fig. 7.3A) suggests that the production-degradation dynamics of several of these molecules are, at the ecosystem level, dominated by the mechanisms described above. The best fit (in terms of the coefficient of determination, Table F.4) is achieved for nosZ and nxr mRNA, as well as amo, norBC, nxr and ROM proteins.
The greater number of protein over mRNA profiles that can be explained by the model suggests that the proteins considered here are indeed simply produced on demand and slowly degrade over time, while mRNA dynamics are subject to more complex regulatory mechanisms (153, 330). In particular, the decay times of some transcripts and proteins were estimated to be as high as several weeks (Table F.4). For proteins these estimates fall within known ranges (153), but for transcripts they are much higher than decay times determined experimentally in cells (465). One reason for this discrepancy appears to be the underestimation of the SNTZ depth range by the model, which in turn leads to longer estimates for the mRNA decay times needed to explain the detection of these molecules outside of the SNTZ. Alternatively, transcripts and proteins might persist in the cells in inactive states for a significant period of time even after dispersal into areas with low substrate concentrations. For example, stable but silent transcripts have been found in bacteria following several days of starvation (245, 299). Further, gene expression may not change immediately upon a change of external stimulus (181). For example, for some prokaryotic transcription cascades the basic time unit may be the cell doubling time (which can reach several weeks in anoxic environments; 518), due to regulation by long-lived transcription factors (402). Hence, the decay times estimated here may reflect a hysteresis in gene down-regulation following nutrient depletion, perhaps in anticipation of potential future opportunities for growth (326).
Overall, these observations suggest that future models for environmental mRNA dynamics and future metatranscriptomics would benefit from a consideration of additional mRNA control mechanisms, for example derived from cell-centric transcription models (403, 542).

7.4.3 Consequences for geobiology

Gene-centric models have the potential to integrate biogeochemical processes with microbial population dynamics (386, 387). According to the central dogma of molecular biology (85), gene transcription and translation are intermediate steps that regulate metabolic processes in individual cells, but the appropriate projection of the central dogma to ecosystem scales remained unresolved because transcription and translation were not explicitly considered in previous models (386, 387). We have developed a biogeochemical model that explicitly incorporates multi-omic sequence information and predicts pathway expression and growth in relation to geochemical conditions. In particular, when mRNA and protein dynamics are omitted, the gene-centric model only includes 4 calibrated parameters and yet is able to largely reproduce geochemical profiles (Fig. 7.2), relative metagenomic profiles (Fig. 7.3A) and SUP05 cell abundances, which means that the good fit is unlikely to be the mere result of overfitting. In fact, as we refined and calibrated our model to the geochemical profiles, we observed that the metagenomic profiles were well reproduced as soon as the model’s geochemical predictions roughly aligned with the data, even if the calibrated parameters were still being adjusted.
This further strengthens the interpretation that fluxes of matter and energy are robust predictors of microbial productivity and functional community structure.

Our model successfully explains a large fraction of environmental mRNA and protein distributions based on DNA concentration profiles and biogeochemical reaction rates in the OMZ. These results are consistent with the idea that DNA is a robust descriptor of an ecosystem’s biological component (104, 201) which, in conjunction with the geochemical background, determines pathway expression and process rates (257). This implies that the co-occurrence of a metabolic niche with cells able to exploit it is sufficient to predict microbial activity. Under this paradigm, DNA may be regarded as directly coupled to the extracellular geochemical environment, while the production of mRNA and proteins is an inevitable consequence of the biologically catalyzed flow of matter and energy. This interpretation is supported by the overall consistency between the metatranscriptomic and metaproteomic profiles for N and S cycling genes (Fig. 7.3A) and suggests that mRNA and proteins may each be adequate proxies for pathway activity in Saanich Inlet. Further work is needed to test this paradigm in other ecosystems, especially under non-steady state conditions. Nevertheless, many marine and freshwater water columns are permanently or almost permanently anoxic (416, 435, 489) and hence, our approach and conclusions are expected to be particularly applicable to these systems.

In addition to providing a systematic calibration and validation of the model we identified processes that need to be considered when interpreting multi-omic data. Conventional proxies for activity, such as mRNA and proteins, are themselves subject to complex population dynamics that include production, active or passive degradation as well as physical transport processes.
Consequently, the close association between process rates and biomolecule production suggested above does not imply that biomolecule distributions per se are equivalent to local microbial activity rates. In Saanich Inlet, for example, the wide distribution of DNA, mRNA and proteins across the OMZ, in contrast to a relatively narrow metabolic “hot zone” at the SNTZ, is predicated on a balance between spatially confined production and dispersal across the water column. This so-called mass effect (268) indicates that geochemical or biochemical information is needed to assign actual activity to genes or pathways identified in multi-omic data, especially for components mediating energy metabolism. This conclusion is generalizable and should be applied to other ecosystems exhibiting dispersal across strong environmental gradients, such as freshwater-marine transitions (87) or hydrothermal vents (387). Moreover, in dynamic ecosystems with changing geochemical conditions past population growth rates can influence future community structure and biomolecular patterns and hence, cross-sectional community profiles may not reflect current dynamics (430). In such systems, an incorporation of multiple layers of geochemical and biological information into a mechanistic framework, as demonstrated here, will be crucial for disentangling the multitude of processes underlying experimental observations.

The gene-centric model constructed here, although evaluated at steady state, is in fact a spatiotemporal model that could, in principle, predict gene population dynamics and process rates over time. A spatiotemporal analysis of the model would require multi-omic time series coverage and knowledge of non-stationary physical processes, such as seasonal patterns in surface productivity and hydrodynamics during deep water renewal events, which were unavailable at the time of this study.
Multi-omic time series would be especially useful for improving the mRNA and protein models introduced here, due to the high number of currently unknown parameters. For example, integrating metatranscriptomic, metaproteomic and geochemical time series during rapid environmental changes into our model would allow for a more direct determination of in situ transcriptional and translational responses and biomolecule decay times. Advances in multi-omic sequencing will undoubtedly lead to increased spatiotemporal coverage in the future (241), thus allowing for the much needed integration of such data into biogeochemical models.

The multi-omic profiles that we used to validate our model are only given in terms of relative (rather than absolute) biomolecule concentrations. Hence, the observed abundance of each biomolecule may be influenced by the abundances of other biomolecules, which could explain some of the discrepancies between the model and the multi-omic data. Unfortunately, this limitation is currently pervasive across environmental shotgun sequencing studies, largely due to technical challenges in estimating in situ DNA, mRNA and protein concentrations (376). These challenges will likely be overcome in the future (428), and this will undoubtedly improve model testing and refinement. Given this current caveat, the general agreement of the model with the shape of the multi-omic profiles (Fig. 7.3A) is all the more remarkable, and suggests that the spatial structuring of the metabolic network is well captured by the model. In fact, our qPCR-based estimates for absolute SUP05 cell concentrations are consistent with absolute PDNO gene concentrations predicted by the model, as well as with the shape of the PDNO abundance profiles in the metagenomes (Fig. 7.3A).
This double agreement suggests that, at least for PDNO, both our model and our metagenomic data sets reflect the actual gene distributions.

7.5 Conclusions

Most major metabolic pathways driving global biogeochemical cycles are encoded by a core set of genes, many of which are distributed across distant microbial clades (116). These genes are expressed and proliferate in response to specific redox conditions and, in turn, shape Earth’s surface chemistry. Here we have shown that the population dynamics of genes representative of specific metabolic pathways, their expression and their catalytic activity at ecosystem scales can all be integrated into a mechanistic framework for understanding coupled carbon, nitrogen and sulfur cycling in OMZs. This framework largely explained DNA, mRNA and protein concentration profiles and resolved several previous uncertainties in metabolic network structure in Saanich Inlet, including a direct coupling of sulfide-driven denitrification and anammox through “leaky” nitrite production by SUP05, as well as the presence of a metabolic niche for nitrous oxide reduction contributing to nitrogen loss. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and, reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns.
Such integrated modeling approaches offer the potential to understand microbial community metabolic networks and to predict the responses of elemental cycles in a changing world.

[Figure 7.1 image. Panel A: metabolic network linking DOM, O2, NH4+, NO2−, NO3−, N2O, N2, HS− and SO4²− through the pathways ROM, amo, nxr, hzo, PDNO and nosZ, labeled with the taxa Planctomycetes, Nitrosopumilus sp., Nitrospira/Nitrospina, uncultured SUP05, heterotroph and an unknown clade. Panel B: time series of oxygen (µM), NO3− (µM) and H2S (µM) against depth (0–200 m), February 2008 to December 2011; calibration dates Jan 13, Feb 10, Mar 10 and Apr 07.]

Figure 7.1: Metabolic network and selected chemical time series. (A) Coupled carbon, nitrogen and sulfur redox pathways considered in the model: Remineralization of organic matter (ROM), aerobic ammonia oxidation (amo), aerobic nitrite oxidation (nxr), anaerobic ammonia oxidation (hzo), as well as partial denitrification to nitrous oxide (PDNO) and reduction of nitrous oxide (nosZ) coupled to hydrogen sulfide oxidation (see Methods for details). Major taxonomic groups encoding specific pathways are indicated. (B) Water column oxygen (O2), nitrate (NO3−) and hydrogen sulfide (H2S) concentrations measured at Saanich Inlet station SI03 from January 2008 to December 2011. The shaded interval and the dates at the bottom indicate the chemical measurements that were used for model calibration. The vertical white line marks the time of molecular sampling. The model considers depths between 100 m and 200 m.

[Figure 7.2 image. Panels A–F.]

Figure 7.2: Measured and predicted geochemical profiles. (A) oxygen, (B) ammonium, (C) nitrate, (D) nitrite, (E) nitrous oxide and (F) hydrogen sulfide concentrations as predicted by the calibrated model at steady state (thick blue curves). Dots: Data used for the calibration, measured during cruise 41 on January 13, 2010 (SI041_01/13/10, rectangles), cruise 42 (SI042_02/10/10, rhomboids) and cruise 43 (SI043_03/10/10, triangles). Oxygen profiles were not available for cruises 41 and 43, hence data from cruise 44 (SI044_04/07/10, stars) were used instead. Thin black curves: Data measured during cruise 47 (SI047_07/07/10), shortly before deep water renewal. Details on data acquisition in Appendix F.2.

[Figure 7.3 image. Panel A: rows DNA (10⁶/L), mRNA (RPKM) and proteins (10⁻³ × NSAF); columns ROM, amo, nxr, PDNO, hzo and nosZ. Panel B: denitrification and anammox rates (model, measurements, ILTM). Legend: model DNA, model mRNA, model protein, multi-omic data, qPCR SUP05.]

Figure 7.3: Molecular and rate profiles. (A) Predicted DNA, mRNA and protein concentrations (rows 1–3) for ROM, amo, nxr, norBC, hzo and nosZ genes (thick curves), compared to corresponding metagenomic, metatranscriptomic and metaproteomic data (circles, February 10, 2010). The dashed curve under PDNO genes (row 1, column 4) shows concurrent qPCR-based cell count estimates for SUP05, the dominant denitrifier in Saanich Inlet. (B) Denitrification and anammox rates predicted by the model (thick blue curves), compared to rate measurements (circles) during cruises 47 (SI047_07/07/10) and 48 (SI048_08/11/10), as well as rates estimated from geochemical concentration profiles using inverse linear transport modeling (ILTM; Appendix F.5). The ILTM estimates “calibr.” in the 3rd and 6th plot are based on the same geochemical data as used for model calibration (Fig. 7.2).

Chapter 8

Reaction-centric modeling of microbial community metabolism¹

8.1 Synopsis

The growth of microbial populations catalyzing biochemical reactions leads to positive feedback loops and self-amplifying process dynamics at ecosystem scales. Hence, the state of a biocatalyzed process is not completely determined by its physicochemical state but also depends on current cell or enzyme concentrations that are often unknown. Here we propose an approach to modeling reaction networks of natural and engineered microbial ecosystems that is able to capture the self-amplifying nature of biochemical reactions without explicit reference to the underlying microbial populations.
This is achieved by an appropriate combination of parameters and variables that allows keeping track of a system’s “capacity” to perform particular reactions, rather than the cell populations actually catalyzing them. Our reaction-centric approach minimizes the need for cell-physiological parameters such as yield factors and provides a suitable framework for describing a system’s dynamics purely in terms of chemical concentrations and fluxes. We demonstrate our approach using data from an incubation experiment involving urea hydrolysis and nitrification, as well as time series from a long-term nitrifying bioreactor experiment. We show that reaction-centric models can capture the dynamical character of microbially catalyzed reaction kinetics and enable the reconstruction of bioprocess states using solely chemical data, hence reducing the need for laborious biotic measurements in environmental and industrial process monitoring.

¹ A version of this chapter has been published (see the Preface for author contributions): Louca, S., Doebeli, M. 2016. Reaction-centric modeling of microbial ecosystems. Ecological Modelling. 335:74–86. DOI:10.1016/j.ecolmodel.2016.05.011

8.2 Introduction

Microbial metabolism powers biochemical fluxes in natural and engineered ecosystems (116, 312). Reciprocally, biochemical fluxes sustain biosynthesis and thus drive microbial population dynamics (210). Changes in the microbial populations, in turn, influence the reaction kinetics at ecosystem scales because system-wide reaction rates depend not only on substrate concentrations but also on the density of catalyzing cells or of extracellular enzymes (426). Thus, the dynamics of microbial communities emerge from the continuous interplay between metabolic activity, changes in the extracellular metabolite pool and microbial population growth (433).
In particular, and in contrast to purely abiotic chemical processes (297), the state and future trajectory of a biocatalyzed process cannot be determined solely based on the system’s chemical state (210, 426). For example, empirical mineralization curves that describe the degradation rate of organic matter as a function of substrate density can vary strongly in shape, and this variation historically resulted partly from the interaction of substrate concentrations and cell population densities in experiments (426).

In deterministic or stochastic differential equation models (233, 390, 433), the dynamical character of microbially catalyzed reaction kinetics is typically incorporated by including additional variables representing cell densities, whose growth is proportional to the rates of the processes that they catalyze and determined by cell-per-substrate (or sometimes biomass-per-substrate) yield factors (210). In turn, system-wide reaction kinetics are modulated by current cell densities and extracellular metabolite concentrations. Such cell-centric models are widely used and can capture the typical self-amplifying character of biocatalyzed processes (68). Likewise, deterministic as well as stochastic individual-based models, which keep track of multiple individual organisms and their metabolic activity, can also capture the feedback loops within microbial metabolic networks because the metabolic or trophic activity of each organism eventually leads to the production of new copies of that organism (124, 258). All of these cell-centric models, however, depend on physiological parameters such as yield factors, cell masses or maximum cell-specific reaction rates, and require knowledge of cell or enzyme concentrations (in addition to physicochemical variables) for describing a system’s current state.
As we explain below, some of these parameters also introduce redundancies from a reaction kinetic point of view that can lead to strong uncertainties in parameter estimation (242, 426).

Flux-balance models, a popular alternative to dynamical models (354), reduce the number of required parameters by ignoring cell population dynamics and by assuming that metabolite concentrations are constant through time (i.e., metabolite fluxes are “balanced”). In these models, reaction rates (and sometimes metabolite turnover rates; 72) are the only independent variables, and their values are calculated by optimizing some objective function (e.g., ATP production) in the presence of constraints (e.g., on maximum reaction rates). Flux balance models have been very successful in elucidating metabolic network properties such as the feasibility of certain reactions or the prediction of metabolic interactions between species (238, 453, 545) but, being steady-state models, they fail to capture the dynamical nature of microbial communities. Hence, current model frameworks either ignore the temporal and self-amplifying character of biocatalyzed processes or require an extensive set of physiological parameters that are often poorly estimated.

To address the above limitations, here we develop a new framework for dynamical bioprocess modeling with a focus on system-wide reaction kinetics. Our objective was to reduce the reliance on physiological parameters and to reduce the need for biotic measurements for state reconstruction and model calibration, while still accounting for the self-amplifying character of metabolic reactions at the ecosystem level. Such a “reaction-centric” model would ideally make predictions purely in terms of metabolite concentrations and reaction rates at the ecosystem level, without the need to consider the underlying cell populations.
As we show below, this can be achieved by keeping track of a system’s “capacity” to perform particular reactions (or pathways), rather than the cell populations actually catalyzing them. Microbial ecosystem metabolism can then be described similarly to abiotic reaction networks, with the addition of so-called self- and cross-amplification factors between reactions. These amplification factors are specific to a particular microbial community and translate the system’s metabolic fluxes into changes of the system’s reaction capacities. Hence, a system’s state and dynamics can be inferred using solely physicochemical measurements, bypassing laborious biotic measurements, for example in environmental and industrial process monitoring. Furthermore, reaction-centric models minimize the reliance on cell-physiological parameters, allowing for model calibration even when biotic data are scarce. Reaction-centric models thus provide an elegant alternative to many conventional cell-centric models, particularly when the ultimate focus is on a system’s reaction kinetics.

We begin with a derivation of the reaction-centric framework and show how it relates to conventional, cell-centric models. We focus on differential equation models; however, we note that our reasoning can also be applied to other cell-centric frameworks. We demonstrate the potential of reaction-centric models using data from a previous short-term incubation experiment with a ureolytic and nitrifying microbial community (94), as well as long-term time series from a flow-through nitrifying bioreactor (109). Bioreactors provide ideal model ecosystems for testing new theories for microbial ecology, due to their higher controllability and measurability when compared to natural ecosystems. Ureolysis and nitrification were chosen as examples because of their conceptual simplicity as well as their great relevance to ecosystem productivity, industry and agriculture (375, 520).
Our entire analysis was performed with MCM (Chapter 4; 284), which we extended to accommodate reaction-centric models.

8.3 Derivation of the reaction-centric framework

8.3.1 One reaction per cell

Conventional cell-centric microbial ecosystem models consider the extracellular concentrations of metabolites as well as the cell densities of microbial populations catalyzing various reactions. In the simplest and most common case each reaction is catalyzed by a distinct microbial population, the growth of which is proportional to the rate of the reaction (210, 258, 426). More precisely, the population density of cells catalyzing reaction $r$ ($N_r$, cells per volume) and the concentration ($C_m$) of each metabolite $m$ are described by differential equations similar to the following:

$$\frac{dN_r}{dt} = N_r Y_r V_r h_r(\mathbf{C}) - \lambda_r N_r, \tag{8.3.1}$$

$$\frac{dC_m}{dt} = F_m(t,\mathbf{C}) + \sum_r S_{mr} N_r V_r h_r(\mathbf{C}). \tag{8.3.2}$$

In Eq. (8.3.1), $Y_r$ is a cell yield factor (cells produced per substrate used), $V_r$ is the maximum cell-specific reaction rate (flux per cell per time) and $\mathbf{C}$ is the vector representing all metabolite concentrations (overview of symbols in Table 8.1). We note that in models where $N_r$ is alternatively measured in biomass (rather than cells) per volume, $Y_r$ is typically a biomass yield factor and $V_r$ is a maximum biomass-specific reaction rate. The dependence of cell-specific reaction kinetics on $\mathbf{C}$ is encoded by the unitless function $h_r(\mathbf{C})$, which is normalized to unity at those $\mathbf{C}$ that maximize the cell-specific reaction rate. The last term in Eq. (8.3.1) corresponds to the decay of biomass at an exponential rate $\lambda_r$ (with units time$^{-1}$), for example due to cell death. Alternatively, $\lambda_r$ can account for reduced biosynthesis due to maintenance energy requirements, in which case it is sometimes called the “specific maintenance rate” (210). In Eq. (8.3.2), $F_m$ accounts for abiotic metabolite fluxes, such as substrate supply in a bioreactor, and $S_{mr}$ is the stoichiometric coefficient of metabolite $m$ in reaction $r$. The sum in Eq.
(8.3.2) iterates through all reactions and accounts for microbial metabolic fluxes.

In the above cell-centric model the system’s state depends on the current metabolite concentrations ($C_m$) as well as the current cell densities ($N_r$), the prediction of which, in turn, requires knowledge of physiological parameters such as $Y_r$ and $V_r$. As we show below, this focus on cell populations can be avoided if one is solely interested in the system’s reaction kinetics. Observe that the product $M_r = N_r V_r$, henceforth referred to as the system’s current “reaction capacity”, is the maximum system-wide rate of reaction $r$ (flux per volume per time) that could possibly be attained at favorable metabolite concentrations (i.e., when $h_r(\mathbf{C}) = 1$). On the other hand, the product $H_r = N_r V_r h_r = M_r h_r$ is the actual system-wide rate of reaction $r$. Note that $H_r$ depends both on the reaction capacity $M_r$ as well as the normalized kinetics $h_r(\mathbf{C})$, which encodes the dependence of the reaction rate on the system’s chemical state. Rewriting Eqs. (8.3.1) and (8.3.2) in terms of the reaction capacities $M_r$ yields the reaction-centric model

$$\frac{dM_r}{dt} = A_r M_r h_r(\mathbf{C}) - M_r \lambda_r, \tag{8.3.3}$$

$$\frac{dC_m}{dt} = F_m(t,\mathbf{C}) + \sum_r S_{mr} H_r(\mathbf{C}), \tag{8.3.4}$$

$$H_r = M_r h_r, \tag{8.3.5}$$

where we introduced the so-called self-amplification factor $A_r = V_r Y_r$ in Eq. (8.3.3). This model describes biochemical reactions at the scale of the ecosystem, without explicit reference to biotic quantities such as cell densities or physiological parameters such as yield factors. The structure of Eq. (8.3.3) emphasizes the self-amplifying nature of biochemical reactions at the ecosystem level, with the self-amplification factors $A_r$ mediating the conversion of reaction rates to a growth of reaction capacities. In the context of cell-centric models, $A_r$ is the maximum specific growth rate of cells performing reaction $r$ (in units time$^{-1}$). In the reaction-centric model, $A_r$ becomes the maximum exponential growth rate of the reaction capacity $M_r$.
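The correspondence between Eqs. (8.3.1)–(8.3.2) and Eqs. (8.3.3)–(8.3.5) can be checked numerically. The following is a minimal sketch, not the MCM implementation used in this chapter: it assumes a single Monod-limited reaction in a closed batch system ($F_m = 0$), forward-Euler time stepping, and entirely hypothetical parameter values. Two cell-centric parameterizations that share the same product $V_r Y_r$ and the same initial capacity $N_r V_r$ are compared with the reaction-centric model.

```python
# Sketch: one Monod-limited reaction consuming a substrate in a closed batch system.
# All parameter values are hypothetical and chosen only for illustration.

def monod(c, k):
    '''Normalized kinetics h(C) = C / (K + C).'''
    return c / (k + c)

def cell_centric(n0, v, y, lam, k, c0, dt, steps):
    '''Forward-Euler integration of Eqs. (8.3.1)-(8.3.2) with F_m = 0.'''
    n, c, out = n0, c0, []
    for _ in range(steps):
        rate = n * v * monod(c, k)      # system-wide rate H = N V h
        n, c = n + dt * (y * rate - lam * n), c - dt * rate
        out.append(c)
    return out

def reaction_centric(m0, a, lam, k, c0, dt, steps):
    '''Forward-Euler integration of Eqs. (8.3.3)-(8.3.5) with F_m = 0.'''
    m, c, out = m0, c0, []
    for _ in range(steps):
        rate = m * monod(c, k)          # system-wide rate H = M h
        m, c = m + dt * (a * rate - lam * m), c - dt * rate
        out.append(c)
    return out

common = dict(lam=0.05, k=1.0, c0=10.0, dt=0.01, steps=2000)
# Two different (V, Y, N0) choices sharing A = V*Y = 1.0 and M0 = N0*V = 2.0
traj_a = cell_centric(n0=1.0, v=2.0, y=0.5, **common)
traj_b = cell_centric(n0=0.5, v=4.0, y=0.25, **common)
traj_rc = reaction_centric(m0=2.0, a=1.0, **common)

# Largest difference between the three substrate trajectories
max_gap = max(max(abs(p - q), abs(p - r)) for p, q, r in zip(traj_a, traj_b, traj_rc))
print(max_gap)
```

Under these assumptions the three substrate trajectories coincide up to floating-point rounding, which illustrates that chemical time series alone cannot distinguish between the two cell-centric parameter sets.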
Note that $A_r$ only depends on the product $V_r Y_r$, but not on the individual $V_r$ or $Y_r$. Hence, the system’s biochemical dynamics can be modeled without knowledge of the $V_r$ and $Y_r$ because the system’s trajectory is completely determined by the self-amplification factors and the reaction capacities at some point in time. This collapse of unknown parameters into fewer ones, without losing any predictive power, means that fewer parameters are needed for practical purposes than often assumed. In fact, the redundancy inherent to the simultaneous inclusion of $V_r$ and $Y_r$ in conventional models was previously pointed out by Simkins and Alexander (426). This redundancy can lead to strong negative correlations between estimated $Y_r$ and $V_r$, particularly when parameter estimation is based solely on non-biotic chemical time series, because such time series cannot differentiate between alternative combinations of $V_r$ and $Y_r$ yielding the same product $V_r Y_r$ (242).

8.3.2 Multiple reactions per cell

So far we assumed that each cell performs exactly one reaction, which means that each modeled reaction only induces the growth of its own capacity. While this assumption is widespread in ecosystem modeling (258, 386), in reality several alternative pathways may be performed by the same cells. For example, members of the ammonium oxidizing Nitrosospira genus are also able to hydrolyze urea (300), and urea hydrolysis in incubation experiments with Nitrosospira was shown to promote ammonium oxidation by the same population (94). In the simplest case, the combined effects of several metabolic pathways on cell population growth can be assumed to be additive, so that each reaction $r$ has a contribution $Y_r H_r$ to the total growth of the cell population:

$$\frac{dN_r}{dt} = Y_r H_r + \sum_{q \neq r} Y_q H_q - N_r \lambda_r. \tag{8.3.6}$$

Here, the sum in Eq. (8.3.6) iterates over all additional reactions attributable to cells performing reaction $r$.
If two reactions $r$ and $q$ are performed by the same population then $N_r = N_q$ and, reciprocally, if two populations share a common reaction, that reaction will need to be represented twice using two separate indices $r$. The assumption of additive effects on growth is common in conventional microbial population models. For example, Courtin and Spoelstra (80) model a population of acetic acid bacteria utilizing multiple organic substrates by assuming that each pathway has an additive effect on the total population growth. More sophisticated models of microbial metabolism based on flux balance analysis and optimization of a linear utility function also assume additive effects of various metabolic fluxes, although the functions $h_r(\mathbf{C})$ may not be explicit in $\mathbf{C}$, but instead specified in terms of an optimization algorithm (354).

The cell-centric model in Eq. (8.3.6) corresponds to a reaction-centric model in which multiple reactions amplify each other’s capacities whenever they are performed by the same cells:

$$\frac{dM_r}{dt} = A_r H_r + \sum_{q \neq r} A_{rq} H_q - \lambda_r M_r. \tag{8.3.7}$$

Here, the so-called “cross-amplification” factors $A_{rq} = V_r Y_q$ correspond to the positive effects of the flux through some reaction $q$ on the capacity of some other reaction $r$ and hence, the sum in Eq. (8.3.7) iterates through all additional reactions $q$ performed by the same cell population as reaction $r$. The amplification matrix, whose diagonal entries are the self-amplification factors $A_r$ and whose off-diagonal entries are the cross-amplification factors $A_{rq}$, defines a linear transformation of the vector containing all reaction rates to a vector containing changes in reaction capacities. Note that regardless of any amplifications of the reaction capacities, actual rates may still be limited by low substrate concentrations or the presence of inhibitors, as determined by the normalized kinetics $h_r(\mathbf{C})$.
Also note that since $A_{rq} = V_r Y_q$ and $N_r = N_q$ for any two reactions $q$ and $r$ performed by the same cells, the following consistency conditions apply:

$$A_{qr} = \frac{A_r A_q}{A_{rq}}, \qquad M_r = M_q \frac{A_{rq}}{A_q}. \tag{8.3.8}$$

Regardless of any cell-centric interpretation, the system’s reaction dynamics only depend on the amplification factors $A_{rq}$, but not on any $Y_q$ or $V_r$.

The above discussion illustrates how conventional cell-centric models can be used to derive reaction-centric models and foster confidence in their realism. For example, amplification factors can be seen as a combination of, and a replacement for, cell-centric parameters. However, as we demonstrate below, in practice a reaction-centric model can be taken as an alternative self-contained description of a system’s reaction kinetics. Under such a paradigm, the amplification matrix becomes a set of standalone system-specific parameters and the reaction capacities become independent state variables whose dynamics are shaped by the amplification matrix. We note that while here we focus on linear growth dynamics, non-linear generalizations are also possible with additional amplification coefficients mediating the higher order effects of biochemical fluxes on reaction capacities.

Apart from the elegance of a reaction-centric description, an added benefit is that all parameters and state variables can be inferred from purely physicochemical time series. For example, at high substrate concentrations and in the absence of inhibitors, reaction capacities ($M_r$) are approximately equal to actual reaction rates ($H_r$) and can thus be estimated directly from the derivative (slope) of chemical concentration time series. Similarly, if the normalized reaction kinetics $h_r$ (or equivalently, the half-saturation constants in case of Monod kinetics) are known, then reaction rates estimated from concentration time series can be divided by $h_r$ to yield the reaction capacities.
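As a small worked illustration of the consistency conditions in Eq. (8.3.8) and of the rate-to-capacity conversion just described, the sketch below builds an amplification matrix from hypothetical cell-centric parameters of two reactions sharing one population, and converts an observed rate into a capacity assuming Monod kinetics. The helper names `amplification_matrix` and `capacity_from_rate`, as well as all numbers, are illustrative assumptions and not part of MCM.

```python
import math

def amplification_matrix(v, y):
    '''A[r][q] = V_r * Y_q for reactions r, q carried out by the same cells.'''
    return [[v_r * y_q for y_q in y] for v_r in v]

def capacity_from_rate(rate, conc, k):
    '''Reaction capacity M = H / h, with Monod kinetics h = C / (K + C).'''
    return rate * (k + conc) / conc

# Hypothetical cell-centric parameters for two reactions (index 0 = r, 1 = q)
v = [3.0, 1.5]        # maximum cell-specific rates V_r, V_q
y = [0.2, 0.4]        # yield factors Y_r, Y_q
n = 5.0               # shared cell density, N_r = N_q
A = amplification_matrix(v, y)
M = [n * v_i for v_i in v]

# Consistency conditions of Eq. (8.3.8)
ok_cross = math.isclose(A[1][0], A[0][0] * A[1][1] / A[0][1])
ok_capacity = math.isclose(M[0], M[1] * A[0][1] / A[1][1])

# A rate observed at a substrate concentration equal to K corresponds to half the capacity
m_est = capacity_from_rate(rate=2.0, conc=1.0, k=1.0)
print(ok_cross, ok_capacity, m_est)
```

Both conditions hold by construction here because the matrix was derived from a cell-centric parameter set; when amplification factors are instead treated as standalone parameters, the same two checks can serve as constraints during calibration.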
In general, however, reaction capacities may constitute unknown system variables which must be estimated indirectly, for example by repeated observation of metabolite concentrations (as demonstrated below).

8.4 Demonstration of the reaction-centric framework

8.4.1 Overview

To exemplify our approach, we constructed reaction-centric models for two separate engineered microbial ecosystems used in previously published experiments. Specifically, in the first example we consider urea hydrolysis and nitrification in a batch-fed incubation experiment previously described by de Boer and Laanbroek (94). The structure of our model, described in detail below, was chosen to closely resemble the physicochemical conditions in the experiment as well as the metabolic network involved in dissimilatory nitrogen transformations, as inferred from the experiment. We test the adequacy of our reaction-centric model by assessing its “goodness of fit” after calibrating unknown parameters to the experimental data. Further, we demonstrate the importance of cross-amplification factors for accounting for pathway co-occurrences in cells by comparing two variants of the model, namely one variant with and one variant without the cross-amplification factors.

In the second example we consider a reaction-centric model for a flow-through ammonium-fed nitrifying bioreactor, operated under varying conditions over the course of several months (109). Similarly to the first example, our model is constructed to closely resemble the physicochemical conditions of the bioreactor. In this example, we demonstrate how purely chemical time series can be used to calibrate a reaction-centric model and to infer the full biochemical state of the bioreactor (i.e., $M_r$ and $C_m$) in “real-time”.
In addition, to further assess the fidelity of the model, we use independent biomass concentration measurements from the original experiment, which we compare to the hypothetical biomass concentrations that would correspond to the reaction capacities in the reaction-centric model.

All time series analysis, simulations and parameter calibrations in this study were performed using MCM (Chapter 4; 284). The construction and analysis of the models in MCM is explained in Appendix G.

8.4.2 Example 1: Urea hydrolysis and nitrification in a batch-fed incubator

Overview of experimental results

The microbial community in the incubator was dominated by Nitrosospira sp., which are ammonium oxidizing bacteria (AOB) also capable of hydrolyzing urea to ammonium, and Nitrobacter sp., which are nitrite oxidizing bacteria (NOB; Fig. 8.1A). The incubator was batch-fed with urea, the complete hydrolysis of which by the AOB led to a temporary accumulation of ammonium (NH₄⁺) within roughly one week. Concurrently to its production, NH₄⁺ was also oxidized by the AOB into nitrite (NO₂⁻), which was in turn oxidized by the NOB into nitrate (NO₃⁻). Nitrification continued after complete urea hydrolysis until NH₄⁺ concentration dropped to about 0.5 mM. The high energy requirements for maintaining a more neutral internal pH than the external environment (pH = 5) could presumably not be met at lower NH₄⁺ concentrations, eventually leading to a halt of nitrification (94).

Inferred model structure

The model focuses on dissimilatory nitrogen fluxes encompassing urea hydrolysis (ure), ammonium oxidation (amo) and nitrite oxidation (nxr). All nitrogen metabolism is assumed to be entirely dissimilatory.
Specifically, we assume that each mole of urea is converted by ure to 2 mol NH₄⁺, of which a small fraction $\rho_{\mathrm{ure}}$ is immediately oxidized (“recycled”) to NO₂⁻ within the same cell, while the remaining NH₄⁺ leaks to the extracellular medium:

$$\mathrm{ure}: \ \mathrm{urea} \rightarrow 2(1-\rho_{\mathrm{ure}}) \times \mathrm{NH_4^+} + 2\rho_{\mathrm{ure}} \times \mathrm{NO_2^-}. \tag{8.4.1}$$

We assume that extracellular NH₄⁺ taken up by AOB is completely oxidized to NO₂⁻, and that all NO₂⁻ taken up by NOB is completely oxidized to NO₃⁻:

$$\mathrm{amo}: \ \mathrm{NH_4^+} \rightarrow \mathrm{NO_2^-}, \qquad \mathrm{nxr}: \ \mathrm{NO_2^-} \rightarrow \mathrm{NO_3^-}. \tag{8.4.2}$$

The recycling term $\rho_{\mathrm{ure}}$ was included in order to explain the early appearance of NO₃⁻ in the incubator (Fig. 8.2C). Despite the increased model complexity (one additional free parameter), preliminary statistical model selection tests (based on AIC and BIC; 246) showed a clear preference for the inclusion of $\rho_{\mathrm{ure}}$ (Supplemental Fig. G.1). The amo rates were assumed to be limited by ammonia (NH₃) concentrations, rather than NH₄⁺ concentrations, in accordance with findings by Suzuki et al. (463). Due to a lack of further information, potential oxygen limitation in the incubator was ignored.

The co-occurrence of ure and amo genes in the same AOB cells leads to a direct coupling of the population dynamics of these genes and enzymes, and therefore of the incubator’s amo and ure reaction capacities (Fig. 8.1B). In the model, this coupling corresponds to positive cross-amplification factors that measure the mutual effects of ure flux on amo capacity and vice versa. Hence, based on the model structure introduced in Section 8.3.2, the differential equations for the reaction capacities $M_{\mathrm{ure}}$, $M_{\mathrm{amo}}$ and $M_{\mathrm{nxr}}$ take the form

$$\frac{dM_{\mathrm{ure}}}{dt} = A_{\mathrm{ure}} \cdot H_{\mathrm{ure}} + A_{\mathrm{ure,amo}} \cdot H_{\mathrm{amo}} - M_{\mathrm{ure}} \cdot \lambda_{\mathrm{AOB}}, \tag{8.4.3}$$

$$\frac{dM_{\mathrm{amo}}}{dt} = A_{\mathrm{amo}} \cdot H_{\mathrm{amo}} + A_{\mathrm{amo,ure}} \cdot H_{\mathrm{ure}} - M_{\mathrm{amo}} \cdot \lambda_{\mathrm{AOB}}, \tag{8.4.4}$$

$$\frac{dM_{\mathrm{nxr}}}{dt} = A_{\mathrm{nxr}} \cdot H_{\mathrm{nxr}} - M_{\mathrm{nxr}} \cdot \lambda_{\mathrm{NOB}}. \tag{8.4.5}$$

Preliminary tests indicated that nxr decay could be omitted from the model because within the time span of the experiment NOB cell densities were mostly limited by NO₂⁻ supply, hence on grounds of parsimony we set $\lambda_{\mathrm{NOB}} = 0$. On the other hand, our tests indicated that the term $\lambda_{\mathrm{AOB}}$ was mostly attributable to AOB maintenance rates (210) that caused a reduced growth of ure and amo compared to a simple proportionality with respect to ure and amo rates. These maintenance requirements result in a substrate threshold below which dissimilatory metabolism can no longer sustain growth. That threshold is reached when

$$A_{\mathrm{amo}} h_{\mathrm{amo}} + A_{\mathrm{ure}} h_{\mathrm{ure}} \leq \lambda_{\mathrm{AOB}}, \tag{8.4.6}$$

at which point we assumed a complete halt of ure and amo activity (Appendix G.2.1). Note that care needs to be taken to ensure consistency between the cross-amplification terms $A_{\mathrm{amo,ure}}$ and $A_{\mathrm{ure,amo}}$, as well as between the initial reaction capacities $M^o_{\mathrm{ure}}$ and $M^o_{\mathrm{amo}}$. As explained previously (Eq. 8.3.8), we need to have

$$A_{\mathrm{ure,amo}} = \frac{A_{\mathrm{amo}} A_{\mathrm{ure}}}{A_{\mathrm{amo,ure}}}, \qquad M^o_{\mathrm{amo}} = M^o_{\mathrm{ure}} \frac{A_{\mathrm{amo,ure}}}{A_{\mathrm{ure}}}. \tag{8.4.7}$$

The normalized reaction kinetics $h_{\mathrm{ure}}$, $h_{\mathrm{amo}}$ and $h_{\mathrm{nxr}}$ are Monod functions of substrate concentrations (211), i.e., linear at lower and saturating at higher concentrations:

$$h_{\mathrm{ure}} = \frac{C_{\mathrm{urea}}}{K_{\mathrm{ure}} + C_{\mathrm{urea}}}, \qquad h_{\mathrm{amo}} = \frac{C_{\mathrm{NH_3}}}{K_{\mathrm{amo}} + C_{\mathrm{NH_3}}}, \qquad h_{\mathrm{nxr}} = \frac{C_{\mathrm{NO_2^-}}}{K_{\mathrm{nxr}} + C_{\mathrm{NO_2^-}}}. \tag{8.4.8}$$

Here, $K_{\mathrm{ure}}$, $K_{\mathrm{amo}}$ and $K_{\mathrm{nxr}}$ are half-saturation constants. Note that no cell yield factors or cell-specific rates appear in the model; instead, growth dynamics are completely captured by the self-amplification factors $A_{\mathrm{ure}}$, $A_{\mathrm{amo}}$ and $A_{\mathrm{nxr}}$ and the cross-amplification factor $A_{\mathrm{amo,ure}}$.

Model goodness of fit

We fixed 4 out of 11 model parameters to values from the literature. For example, the self-amplification factors $A_{\mathrm{amo}}$ and $A_{\mathrm{nxr}}$ were set to 1.2 d⁻¹ and 1.03 d⁻¹, according to typical maximum growth rates of Nitrosospira (30) and Nitrobacter (227), respectively.
The initial ure rate, $M^o_{\mathrm{ure}}$, was determined directly from the slope of the urea time series at time $t = 0$, eliminating the need for initial cell counts as in conventional cell-centric models. The remaining free parameters were calibrated to the experimental data using a maximum-likelihood approach (see Methods for details and Table G.1 for parameter values).

Upon calibration, the model largely explains the experimental data and is able to capture the self- and cross-amplifying character of the incubator’s dynamics (Fig. 8.2). In particular, the ure-amo cross-amplification causes an increase of the system’s amo capacity during urea hydrolysis, even when amo rates are still slow. This results in a fundamentally different behavior of the system than could have been explained by a cell- or reaction-centric model not accounting for the co-occurrence of ure and amo in the same cells. To verify this interpretation, we tested a variation of the model in which the cross-amplification factors $A_{\mathrm{ure,amo}}$ and $A_{\mathrm{amo,ure}}$ were set to zero. In this model variant, the initial capacities $M^o_{\mathrm{ure}}$ and $M^o_{\mathrm{amo}}$ became independent parameters. Similarly, $\lambda_{\mathrm{AOB}}$ was split into two independent maintenance rates, $\lambda_{\mathrm{amo}}$ and $\lambda_{\mathrm{nxr}}$. The resulting larger set of free parameters was fitted to the same data as above. This model variant was unable to explain the NH₄⁺ and NO₃⁻ time series, despite the higher number of calibrated parameters (Supplemental Fig. G.2). We concluded that the early increase of amo reaction capacity cannot be explained solely on grounds of amo self-amplification but was indeed partly fueled by ure activity.
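The qualitative effect of cross-amplification described above can be reproduced with a toy simulation. The sketch below is a deliberately simplified stand-in for the calibrated model and was not used to produce any result in this chapter: it assumes $\rho_{\mathrm{ure}} = 0$, uses NH₄⁺ rather than NH₃ as the limiting substrate of amo, omits nxr and the maintenance threshold of Eq. (8.4.6), and relies on forward-Euler stepping. Apart from $A_{\mathrm{amo}} = 1.2$ d⁻¹, every parameter value is hypothetical.

```python
def monod(c, k):
    '''Normalized kinetics h(C) = C / (K + C).'''
    return c / (k + c)

def simulate(cross, t_end=2.0, dt=0.001):
    '''Forward-Euler sketch of Eqs. (8.4.3)-(8.4.4) for ure and amo only.'''
    # Hypothetical parameters (1/d, mM, mM/d); only A_amo = 1.2 is taken from the text.
    a_ure, a_amo, lam = 1.0, 1.2, 0.1
    a_amo_ure = 0.5 if cross else 0.0
    # Consistency condition (8.4.7): A_ure,amo = A_amo * A_ure / A_amo,ure
    a_ure_amo = a_amo * a_ure / a_amo_ure if cross else 0.0
    k_ure, k_amo = 0.5, 0.5
    m_ure = 0.2
    m_amo = m_ure * 0.5 / a_ure      # initial capacities per Eq. (8.4.7), same in both variants
    urea, nh4 = 5.0, 0.0
    for _ in range(int(round(t_end / dt))):
        h_ure = m_ure * monod(urea, k_ure)      # ure rate H_ure
        h_amo = m_amo * monod(nh4, k_amo)       # amo rate H_amo
        d_m_ure = a_ure * h_ure + a_ure_amo * h_amo - lam * m_ure
        d_m_amo = a_amo * h_amo + a_amo_ure * h_ure - lam * m_amo
        urea += dt * (-h_ure)
        nh4 += dt * (2.0 * h_ure - h_amo)       # 2 mol NH4+ per mol urea, minus oxidation
        m_ure += dt * d_m_ure
        m_amo += dt * d_m_amo
    return {'M_ure': m_ure, 'M_amo': m_amo, 'urea': urea, 'NH4': nh4}

with_cross = simulate(cross=True)
without_cross = simulate(cross=False)
print(with_cross['M_amo'], without_cross['M_amo'])
```

In this sketch the amo capacity after two days is larger when cross-amplification is included, because ure flux contributes to amo capacity before NH₄⁺ has accumulated. The ratio of amo to ure capacity also stays fixed at $A_{\mathrm{amo,ure}}/A_{\mathrm{ure}}$, as expected when both reactions belong to one population.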
This highlights the importance of taking into account pathway co-occurrences and interactions in cells and suggests that cross-amplification factors in reaction-centric models may be an adequate means to that end.

8.4.3 Example 2: Nitrification in a flow-through bioreactor

The problem of state reconstruction

In principle, reaction-centric models predict future system trajectories $(\mathbf{M}(t), \mathbf{C}(t))$ given initial conditions $(\mathbf{M}(0), \mathbf{C}(0))$. In practice, uncertainty in initial conditions or model parameters, as well as neglected secondary processes, lead to uncertainties in the predicted system state that can increase with time. Selected measurements can provide crucial information to ensure model proximity to reality; however, typically only a subset of state variables may be measurable. Inferring a system’s full state from a smaller set of observations is a common problem, for example in oceanography or engineering, and generally multiple sequential measurements are used to gradually improve state reconstruction and model predictions (32, 53). In this example we demonstrate how long-term, purely abiotic chemical time series can be combined with a reaction-centric model in order to infer the full state of a bioreactor in real time.

Model structure

The model describes a flow-through ammonium-fed nitrifying bioreactor, resembling the experimental setup by Dumont et al. (109, Bioreactor B). In our model we assume that each mol NH₄⁺ is oxidized to one mol NO₂⁻ by amo and subsequently to one mol NO₃⁻ by nxr (520). The model thus keeps track of the bioreactor’s amo and nxr reaction capacities as well as extracellular NH₄⁺, NO₂⁻ and NO₃⁻ concentrations over time:

$$\frac{dM_{\mathrm{amo}}}{dt} = A_{\mathrm{amo}} H_{\mathrm{amo}} - M_{\mathrm{amo}} \cdot \mu, \tag{8.4.9}$$

$$\frac{dM_{\mathrm{nxr}}}{dt} = A_{\mathrm{nxr}} H_{\mathrm{nxr}} - M_{\mathrm{nxr}} \cdot \mu, \tag{8.4.10}$$

$$H_{\mathrm{amo}} = M_{\mathrm{amo}} h_{\mathrm{amo}}, \qquad H_{\mathrm{nxr}} = M_{\mathrm{nxr}} h_{\mathrm{nxr}}, \tag{8.4.11}$$

$$\frac{dC_m}{dt} = S_{m,\mathrm{amo}} H_{\mathrm{amo}} + S_{m,\mathrm{nxr}} H_{\mathrm{nxr}} + (C^{\mathrm{in}}_m - C_m) \cdot \mu. \tag{8.4.12}$$

Here, $C_m$ is the concentration of the $m$-th metabolite (NH₄⁺, NO₂⁻ or NO₃⁻), $S_{m,\mathrm{amo}}$ and $S_{m,\mathrm{nxr}}$ are the known (520) stoichiometric coefficients for metabolite $m$ in amo and nxr, respectively, $\mu$ is the bioreactor’s controlled dilution rate (causing the bulk of biomass decay in the bioreactor), and $C^{\mathrm{in}}_m$ is the metabolite concentration in the input medium (zero for all metabolites except NH₄⁺). During the original experiment, the dilution rate as well as the input NH₄⁺ concentration were varied on several occasions (Figs. 8.3C,D), resulting in non-equilibrium bioreactor dynamics. Hence, in our model both $\mu$ and $C^{\mathrm{in}}_{\mathrm{NH_4^+}}$ depend explicitly on time in the same way as in the original experiment (Figs. 8.3C,D). The normalized reaction kinetics, $h_{\mathrm{amo}}$ and $h_{\mathrm{nxr}}$, are Monod functions of NH₃ and NO₂⁻ concentrations, respectively, as in the previous example.

Model calibration and “real-time” state reconstruction

The concentrations of NH₄⁺, NO₂⁻ and NO₃⁻ in the bioreactor were monitored throughout, providing a subset of the bioreactor’s state variables. The remaining state variables (i.e., the reaction capacities) were inferred through gradual assimilation of these time series into the model, as follows. At each point in time the rates of change of NH₄⁺ and NO₃⁻ concentrations (inferred from the NH₄⁺ and NO₃⁻ time series) were used to infer the reaction rates ($H_{\mathrm{amo}}$ and $H_{\mathrm{nxr}}$), while accounting for the known reaction stoichiometries and the part explained by the known dilution and substrate supply rates (Figs. 8.3C,D). Next, we inserted the inferred reaction rates into Eqs. (8.4.9) and (8.4.10) to predict the growth rates of amo and nxr capacities that would correspond to these reaction rates:

$$\frac{d\hat{M}_r}{dt} = A_r H_r - \hat{M}_r \cdot \mu. \tag{8.4.13}$$

Integrating Eq. (8.4.13) over time yields estimates, $\hat{M}_{\mathrm{amo}}(t)$ and $\hat{M}_{\mathrm{nxr}}(t)$, for the reaction capacities (Figs. 8.3H,I).
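The integration of Eq. (8.4.13) can be sketched with synthetic data. The example below is an illustration under strong simplifying assumptions rather than the procedure implemented in MCM: it considers a single reaction consuming one substrate in a chemostat with constant dilution rate and input concentration, uses forward-Euler steps and noise-free finite differences, and all parameter values are hypothetical. A “true” system generates the substrate time series; the estimator then sees only that series and starts from a deliberately wrong capacity.

```python
def monod(c, k):
    '''Normalized kinetics h(C) = C / (K + C).'''
    return c / (k + c)

# Hypothetical chemostat with a single reaction consuming one substrate.
A, K, MU, C_IN = 1.0, 0.5, 0.1, 10.0    # 1/d, mM, 1/d, mM
DT, STEPS = 0.01, 10000                  # 100 days of forward-Euler steps

# 'True' system, used here only to generate a synthetic substrate time series.
m_true, c = 0.5, 10.0
conc = [c]
for _ in range(STEPS):
    rate = m_true * monod(c, K)
    m_true, c = m_true + DT * (A * rate - MU * m_true), c + DT * (-rate + (C_IN - c) * MU)
    conc.append(c)

# Estimator: sees only conc, DT, MU, C_IN and A; starts from a wrong capacity guess.
m_hat = 5.0
initial_error = m_hat - 0.5
for i in range(STEPS):
    dcdt = (conc[i + 1] - conc[i]) / DT
    rate_hat = (C_IN - conc[i]) * MU - dcdt      # reaction rate implied by the mass balance
    m_hat += DT * (A * rate_hat - MU * m_hat)    # Eq. (8.4.13)

final_error = m_hat - m_true
print(initial_error, final_error)
```

Because the inferred rate enters the true and the estimated capacity equations identically, the estimation error in this sketch shrinks by a factor of $(1 - \mu\,\Delta t)$ per step regardless of the kinetics. With real, noisy data the finite-difference step would need smoothing, which is not shown here.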
Due to the decay rate $\mu$, any initial discrepancies between the estimated and true capacities quickly decay exponentially regardless of initial conditions, provided that model parameters are correctly chosen (see below):

$$\frac{d}{dt}\left(\hat{M}_r - M_r\right) = -\mu \cdot \left(\hat{M}_r - M_r\right). \tag{8.4.14}$$

This method of gradual state reconstruction (Fig. 8.3B) is analogous to the use of so-called “observers” in control theory, which gradually approach the system’s unknown state with time by combining sequential observations with concurrent model predictions (434). In general, finding appropriate observers for the available data and ensuring their convergence can be challenging, and our example shows that the special structure of reaction-centric models mitigates this problem. Note that the temporal resolution of the chemical data, as opposed to single snapshots, is key to estimating the reaction rates needed for full-dimensional state reconstruction (Fig. 8.4). We note that our reaction-centric approach presents an alternative to the approach taken in the original experiment, where biomolecular time series data were assimilated into a cell-centric model (108, 109).

To validate the estimated bioreactor state, we used Eqs. (8.4.11) and (8.4.12) to predict the time courses of the metabolite concentrations corresponding to the estimated $\hat{M}_r$, and these predictions were then compared to the measured NH₄⁺, NO₂⁻ and NO₃⁻ concentrations. The amo and nxr half-saturation constants ($K_{\mathrm{amo}}$ and $K_{\mathrm{nxr}}$), as well as the self-amplification factors ($A_{\mathrm{amo}}$ and $A_{\mathrm{nxr}}$), were a priori unknown and were calibrated via least-squares fitting of the predicted metabolite concentrations to the data (see Appendix G.1.2 for details and Table G.2 for fitted values). Hence, the chemical time series were used both for model calibration as well as state reconstruction.
Only data from days 1–250 were used for the calibration; the remaining data (days 250–525) were used to assess the adequacy of the model for explaining the experimental observations.

Within the calibration period the model is able to reproduce most major patterns of NO2−, NO3− and, to a lesser extent, NH4+ concentrations (Fig. 8.5A–C). This indicates that the bioreactor’s state is well estimated by the model during that time. The agreement between the model and the NH4+ and NO2− data decreases outside of the calibration period, although NO3− predictions remain accurate. In particular, the model overestimates the temporary accumulation of NH4+ on days 337–380, during which a higher dilution rate was applied to the bioreactor (Fig. 8.3D). An increase of residual substrate concentration at higher dilution rates, as predicted by our model, is consistent with standard bioreactor theory (312). An explanation for the absence of NH4+ accumulation in the data could be the appearance of an alternative opportunistic ammonium oxidizer that achieves faster growth rates at high substrate concentrations, thus maintaining the residual NH4+ below the model’s predictions. Indeed, this scenario is supported by molecular analyses in the original experiment, which showed that a previously rare phylotype had emerged temporarily during that period (109).

Comparison with biomass concentration profiles

The reaction-centric model in the above example does not, a priori, require or predict biomass concentrations or cell densities. After all, its purpose is to shift the focus towards system-wide reaction kinetics, and away from the microbial populations that catalyze them. Nevertheless, biotic data (if available) can be used as an additional means to test the accuracy of a reaction-centric model.
In the following we shall compare our model’s predicted reaction capacities (which are proportional to biomass concentrations) to independent dry biomass concentrations measured during the original experiment (109).

We assumed that the bulk of biomass can be attributed to ammonium oxidizers, an assumption typically met in practice (109, 520). It then follows that $Y_{amo} M_{amo}/A_{amo}$ should be comparable to the biomass concentration, with $Y_{amo}$ being an unknown biomass yield factor. Note that $Y_{amo}$ simply rescales the predicted time profile of $M_{amo}/A_{amo}$. Hence, $Y_{amo}$ can be estimated in retrospect by choosing $Y_{amo}$ such that $Y_{amo} M_{amo}/A_{amo}$ best resembles the measured biomass profile. Ordinary linear least-squares fitting yields an estimate of $Y_{amo}$ ≈ 3.2 g dW/mol N (Fig. 8.5D). This estimate is greater than typical yield factors for AOB (e.g., 2.1 g dW/mol N for Nitrosomonas europaea; 520), although higher yield factors have also been reported (321). Other microbial groups such as NOB or non-nitrifiers likely also contribute to total biomass, resulting in an overestimate of $Y_{amo}$. For example, heterotrophic bacteria were detected in the original experiment using molecular methods (109).

While the model is consistent with chemical measurements during most of the experiment as discussed previously (Fig. 8.5), it clearly overestimates biomass concentrations during days 380–420 (Fig. 8.5D). At that time, the input substrate concentration was high and the dilution rate was low (Fig. 8.3C,D), in principle allowing for high equilibrium cell densities. Previous models for this system based on molecular data show a similar discrepancy (108). Both Dumont’s and our model assume a constant yield factor, ignoring the fact that the microbial community is subject to continuous taxonomic turnover (109).
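The retrospective estimation of a yield factor described above amounts to fitting a single scale factor between two time profiles. A minimal sketch follows, assuming a fit without intercept (consistent with the yield factor acting as a pure rescaling); the function name and all numbers are made up for illustration and are not the experimental data.

```python
def fit_scale_factor(predictor, observed):
    '''Least-squares Y minimizing sum((Y*x - b)^2), i.e. Y = sum(x*b) / sum(x*x).'''
    numerator = sum(x * b for x, b in zip(predictor, observed))
    denominator = sum(x * x for x in predictor)
    return numerator / denominator

# Synthetic example: x plays the role of M_amo/A_amo (mol N/L) and b the role of
# measured dry biomass (g dW/L). These values are invented for illustration only.
capacity_over_amplification = [1.0, 2.0, 3.0, 4.0]
biomass = [2.1, 3.9, 6.2, 7.8]

Y = fit_scale_factor(capacity_over_amplification, biomass)
print(Y)  # estimated yield factor in g dW per mol N for the synthetic data
```

Because the fit has a single free parameter, it can only rescale the predicted profile; any mismatch in the shape of the two profiles remains visible in the residuals.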
Previous bioreactor experiments have repeatedly revealed rapid taxonomic turnover and fluctuations in biomass densities, despite stable metabolic performance (123, 524). This discrepancy between reaction rates and community composition is often attributed to functional redundancy within microbial communities (46, 285), and highlights an important limitation of reaction-centric models: they may explain ecosystem reaction rates, but they can fail to detect microbial community changes when functional performance remains stable. Multiple reaction capacities representing equivalent reactions may be included in a model to account for functional redundancy; however, this will typically compromise parameter identifiability.

8.4.4 Estimating concentrations of other organic compounds

In the last example above we assessed the adequacy of our reaction-centric model using independent biomass concentration measurements by introducing an additional biomass yield factor, which related dissimilatory nitrogen fluxes to biosynthesis rates. Similarly, reaction-centric models may also predict the concentration of other organic compounds or elements, either for model validation using additional data or for addressing particular ecological questions. For example, organic nitrogen or carbon concentration profiles can yield insight into nitrogen fixation rates and productivity at ecosystem scales (62, 447). The concentrations of various compounds in living cells (e.g., organic N) can be derived from the reaction capacities using so-called assimilation factors, which represent the amount of compound assimilated or synthesized per reaction flux (520).
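As a rough numerical illustration of how assimilation factors could be applied, the sketch below evaluates the matrix product of Eq. (8.4.15), which is stated in the text that follows, for two reactions. It assumes a 2×2 amplification matrix and uses the closed-form 2×2 inverse; the assimilation factors 1/55 and 1/400 are those quoted in this section for organic N, whereas the amplification factors, reaction capacities and the function name are hypothetical.

```python
def organic_concentration(T, A, M):
    '''Return T^T A^{-1} M for a 2x2 amplification matrix A (cf. Eq. 8.4.15).'''
    (a, b), (c, d) = A
    det = a * d - b * c
    # x = A^{-1} M, computed with the closed-form inverse of a 2x2 matrix
    x0 = (d * M[0] - b * M[1]) / det
    x1 = (-c * M[0] + a * M[1]) / det
    return T[0] * x0 + T[1] * x1

T = [1 / 55, 1 / 400]          # assimilation factors for amo and nxr (organic N)
A = [[0.5, 0.0], [0.0, 0.25]]  # hypothetical amplification matrix (1/d), no cross terms
M = [1.1e-3, 4.0e-4]           # hypothetical reaction capacities (mol/(L d))

organic_N = organic_concentration(T, A, M)
print(organic_N)  # organic N concentration in mol/L for these illustrative inputs
```

With a diagonal amplification matrix the result reduces to the sum of $T_r M_r / A_r$ over reactions; off-diagonal entries would couple the contributions of reactions hosted by the same cells.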
More precisely, the concentration of a particular organic compound is given by the matrix product

$$T^{\mathrm{T}} A^{-1} M, \qquad (8.4.15)$$

where $M$ is the column vector containing all reaction capacities, $T$ is the column vector containing the assimilation factors for the compound for the various reactions, $T^{\mathrm{T}}$ is the transpose of $T$ and $A^{-1}$ is the inverse of the amplification matrix (see Appendix G.2.2 for a derivation). For example, the stoichiometries of N-metabolism and anabolism in the ammonium oxidizer N. europaea and the nitrite oxidizer Nitrobacter winogradskyi are usually summarized by

$$55\,\mathrm{NH_4^+} + 76\,\mathrm{O_2} + 109\,\mathrm{HCO_3^-} \rightarrow \mathrm{C_5H_7NO_2} + 54\,\mathrm{NO_2^-} + 57\,\mathrm{H_2O} + 104\,\mathrm{H_2CO_3} \qquad (8.4.16)$$

and

$$400\,\mathrm{NO_2^-} + \mathrm{NH_4^+} + 4\,\mathrm{H_2CO_3} + \mathrm{HCO_3^-} + 195\,\mathrm{O_2} \rightarrow \mathrm{C_5H_7NO_2} + 3\,\mathrm{H_2O} + 400\,\mathrm{NO_3^-}, \qquad (8.4.17)$$

respectively (520). Here, C5H7NO2 represents biomass. Hence, for organic N the assimilation factors are $T_{amo}$ = 1 : 55 ≈ 0.018 (1 mol N assimilated per 55 mol NH4+ consumed) for dissimilatory ammonium oxidation

$$\mathrm{amo}: \quad \mathrm{NH_4^+} + \tfrac{3}{2}\,\mathrm{O_2} \rightarrow \mathrm{NO_2^-} + \mathrm{H_2O} + 2\,\mathrm{H^+}, \qquad (8.4.18)$$

and $T_{nxr}$ = 1 : 400 ≈ 0.0025 for dissimilatory nitrite oxidation

$$\mathrm{nxr}: \quad \mathrm{NO_2^-} + \tfrac{1}{2}\,\mathrm{O_2} \rightarrow \mathrm{NO_3^-}. \qquad (8.4.19)$$

In other cases (e.g., when stoichiometries are unknown) assimilation factors may be estimated through linear least-squares fitting, as demonstrated above for total biomass.

8.4.5 Limitations and extensions of reaction-centric models

The reaction-centric models presented in this study were formulated in terms of ordinary differential equations that describe the temporal evolution of the chemical and reaction-kinetic state of a well-mixed (i.e., spatially homogeneous) system. Spatial extensions, for example comprising multiple interacting compartments or formulated as partial (i.e., spatiotemporal) differential equations, are equally possible.
Such extensions may be used to describe the biogeochemistry in the ocean water column (386) or in multi-stage industrial processes (373).

For simplicity, we only considered Monod-type reaction kinetics, which capture the non-linear and saturating dependence of reaction rates on single substrate concentrations, but which ignore potential substrate inhibition effects or multi-substrate dependencies. For example, excess ammonia and nitrous acid concentrations in nitrifying bioreactors can cause inhibition of the very pathways that consume them (15), and this substrate inhibition can result in reduced bioreactor performance (425). Similarly, the accumulation of metabolic products can inhibit pathway activity, e.g., by rendering pathways energetically unfavorable (256), thereby slowing down reaction rates or even causing a decline of reaction capacities due to cell death (77, 225). In reaction-centric models, substrate or product inhibition as well as multi-substrate dependencies can be incorporated through appropriate normalized reaction kinetics, $h_r(C)$, for example in the form of multi-substrate Michaelis-Menten functions with inhibition terms (e.g., 478).

We note that reaction-centric models are not appropriate for capturing complex heterogeneities in the physiology or metabolic activity within populations that may be caused, for example, by stochastic regulatory switching (2). Simple heterogeneities, e.g., involving a small set of alternative metabolic phenotypes, may be accounted for by including multiple reactions whose capacities are coupled through cross-amplification factors.
However, when variation between individual cells involves multiple traits or spans a continuum of values, individual-based models (124, 258) may be more appropriate for incorporating that variation.

Moreover, reaction-centric descriptions eliminate cell-centric information (e.g., cell densities of particular species or strains) that is potentially needed to model additional community-level processes such as predation by protists (166) or bacteriophages (423). For example, bacteriophages adapted to specific bacterial taxa can exert strong control on their host populations and can drive rapid turnover of competing bacterial taxa through “killing the winner” dynamics (423, 462). Such taxonomic turnover within microbial “metabolic guilds” cannot be captured by reaction-centric models, although in several previous bioreactor experiments the overall biochemical throughput remained constant despite rapid taxonomic turnover (122, 179, 506, 524) and hence, reaction-centric models may be adequate for such systems. Other biotic interactions, such as chemical warfare (393) or quorum sensing (137), may also necessitate the use of cell-centric (e.g., individual-based) models.

8.5 Conclusions

Marker gene-based profiling of taxonomic community composition has become a standard tool in microbial ecology and bioengineering (109, 524). However, taxonomic profiles can lead to ambiguous conclusions about metabolic processes due to functional redundancy across microbial clades, fine-scale ecological differentiations and poor functional characterization of species (109, 224). In fact, microbial communities can have highly variable taxonomic composition while maintaining stable overall reaction rates, as has been repeatedly demonstrated in bioreactors (123, 506, 524). Furthermore, the measurement of biotic variables such as enzyme concentrations and taxonomic profiles often presents practical challenges (497).
These observations motivate the pursuit of reaction-centric descriptions of microbial ecosystems that can fully utilize abiotic physicochemical data and minimize the need for laborious biotic measurements. This is particularly important in bioprocess and environmental engineering, where the need for real-time and unambiguous state reconstruction imposes strong requirements on the data (261).

Here we have shown how a reaction-centric model enables the inference of a bioreactor’s state, from a reaction kinetic point of view, based solely on chemical data. Reaction-centric models can capture the self- and cross-amplifying nature of biocatalyzed processes that so strongly sets them apart from most non-living systems. This is achieved through an amplification matrix that translates system-wide reaction rates to changes in system-wide reaction capacities. Because the amplification matrix can contain off-diagonal entries it can account for pathway co-occurrences in cells, as we have demonstrated for the case of urea hydrolysis and ammonium oxidation. Reaction-centric models share a conceptual similarity to gene-centric models (Chapter 7), in that both only consider the dynamics of pathways, regardless of the various organisms hosting a particular pathway. We note that here reaction-centric models were derived from cell-centric models, and hence their realism and accuracy are at most as good as those of cell-centric models. In principle, however, reaction-centric models may be taken as a starting point for further generalizations, e.g., using non-linear self- or cross-amplification terms; this remains an avenue for future research.
The elegance of reaction-centric models makes them a potentially powerful alternative to cell-centric (80, 433) or gene-centric models for describing microbial metabolic fluxes at ecosystem scales, especially in the absence of molecular data or when the focus is entirely on reaction kinetics.

[Figure 8.1 panel labels: urea, NH4+, NO2−, NO3−, AOB, NOB; reactions ure, amo, nxr]

Figure 8.1: Modeling urea hydrolysis and nitrification. (A) Microbial ecosystem model for urea hydrolysis and subsequent nitrification by ammonium (NH4+) oxidizing bacteria (AOB) and nitrite (NO2−) oxidizing bacteria (NOB), in a batch-fed incubator. (B) Corresponding reaction-centric model comprising urea hydrolysis (ure), ammonium oxidation (amo) and nitrite oxidation (nxr) with explicit self- and cross-amplifications (continuous arrows): The flux through each reaction (dashed arrows) powers biosynthesis by the cells performing the reaction, leading to the growth of the rate capacity of that reaction and of other reactions catalyzed by the same cells.

Table 8.1: Overview of symbols and units. The indices r and q enumerate reactions or cell species, while m enumerates metabolites. Parameters or variables specific to cell-centric models are indicated by “†”, those specific to reaction-centric models are indicated by “*”. Parameter values used in the examples are given in Tables G.1 and G.2.

symbol | description | units | used as
t | time | days (d) | -
N_r | cell density † | cells/L | independent variable
N | all cell densities (vector) † | cells/L | independent variable
C_m | metabolite concentration | mol/L | independent variable
C | all metabolite concentrations (vector) | mol/L | independent variable
C^o_m | initial metabolite concentration | mol/L | parameter
Y_r | cell yield factor † | cells/mol | parameter
V_r | maximum cell-specific reaction rate † | mol/(cell · d) | parameter
h_r | normalized cell-specific reaction rate | - | function of C
λ_r | exponential biomass decay rate | 1/d | parameter
K_r | substrate half-saturation constant | mol/L | parameter
F_m | abiotic net metabolite influx | mol/(L · d) | function of t and C
S_mr | stoichiometric coefficient | - | parameter
H_r | reaction rate | mol/(L · d) | dependent variable
A_r | self-amplification factor * | 1/d | parameter
A_rq | cross-amplification factor * | 1/d | parameter
M_r | reaction capacity * | mol/(L · d) | independent variable
M | all reaction capacities (vector) * | mol/(L · d) | independent variable
M^o_r | initial reaction capacity * | mol/(L · d) | parameter
ρ_ure | ammonia recycling fraction | - | parameter
T_r | substrate assimilation factor | - | parameter
Symbols specific to example 2 (flow-through bioreactor):
C^in_m | metabolite concentration in inflow | mol/L | function of t
M̂_r | reconstructed reaction capacity * | mol/(L · d) | estimated variable
Ĉ_m | reconstructed metabolite concentration | mol/L | estimated variable
µ | hydraulic dilution rate | 1/d | function of t

Figure 8.2: Model predictions and data for Example 1. Model predictions and data from a batch-fed incubation experiment involving urea hydrolysis and nitrification: (A) Urea, (B) ammonium and (C) nitrate concentrations over time, following incubation of a mixed Nitrosospira AHB1 and Nitrobacter NHB1 community in a urea-enriched medium.
Second row: (D) urea hydrolysis (ure), (E) ammonium oxidation (amo) and (F) nitrite oxidation (nxr) rates over time. The rapid halt of amo (and subsequently nxr) around day 20 occurs when ammonia concentration falls below the threshold imposed by the maintenance energy requirements of the cells (Eq. 8.4.6). See Methods for details. Data from de Boer and Laanbroek (91).

[Figure 8.3 panels A–H. Panel B labels: metabolite concentration time series; infer reaction rates $H_r$; estimate system state, $d\hat{M}_r/dt = A_r H_r - \mu \hat{M}_r$; predict metabolite concentrations, $d\hat{C}_m/dt = \sum_r S_{mr} \hat{M}_r h_r + F_m$; compare predictions to data]

Figure 8.3: Reconstructing a bioreactor’s state using chemical time series. (A) Reaction-centric illustration of a flow-through nitrifying bioreactor, corresponding to experiments by Dumont et al. (109). Continuous loop-arrows represent self-amplifications of ammonium oxidation (amo) and nitrite oxidation (nxr). (B) Methodological overview for model-based inference of the bioreactor’s state using chemical time series, as performed in this chapter. Reaction rates are inferred from the derivative of metabolite concentration time series. These reaction rates, in turn, are used in the reaction-centric model to predict the growth of the corresponding reaction capacities and hence the trajectory of the system’s state. A comparison of predictions with the original chemical concentration data can be used to calibrate and validate the model.
Right panel: (C) Input NH4+ concentration, (D) dilution rate, (E) inferred amo rate, (F) inferred nxr rate, (G) estimated amo capacity and (H) estimated nxr capacity over time.

[Figure 8.4 panel labels: abiotic & biotic snapshot; abiotic time profile; AOB, NOB, NH4+, NO2−, NO3−; time]

Figure 8.4: Information needed to estimate the state of a reaction-centric model. (A) Illustration of a nitrifying microbial community: AOB oxidize ammonium (NH4+) to nitrite (NO2−), which is subsequently oxidized by NOB to nitrate (NO3−). (B) In a cell-centric framework, both abiotic (e.g., physicochemical) and biotic (e.g., cell density) measurements are required for a complete description of the system’s state at any particular moment in time (“snapshot”). (C) In a reaction-centric framework, the system’s state can be reproduced based on purely abiotic measurements; however, measurements across multiple time points are needed (“time profile”).

Figure 8.5: Model predictions and data for Example 2. (A) Ammonium, (B) nitrite, (C) nitrate and (D) dry biomass concentration in the flow-through nitrifying bioreactor, as predicted by the data-driven model (thick blue curve) and compared to experimental data (dots). The thin grey curves show smoothed, i.e., noise-reduced, approximations of the data (see Methods for details). The shaded regions in (A–C) mark the data that were used for model calibration. Data in the white region were ignored during calibration and serve as an independent validation of the model. The arrow in (C) indicates the delayed onset of nxr after the temperature of the bioreactor was reduced from 30°C to 25°C on day 181. The unknown biomass yield factor, required for comparing the reaction-centric model to biomass measurements in (D), was calibrated using least-squares fitting (see the main text for details). Data by Dumont et al.
(109, Bioreactor B).

Chapter 9

Closing chapter

9.1 Conclusions

The work presented here provides strong support for a pathway-centric paradigm of microbial ecology, in which stoichiometric and energetic constraints (“boundary conditions”), such as the availability of light and the presence of certain electron acceptors for respiration, dictate the structure of the metabolic network in a community, while exerting much less control on the taxonomic composition within individual metabolic functional groups (Chapters 2 and 3). According to this paradigm, the metabolic functional structure of a microbial community and the taxonomic composition within individual functional groups constitute roughly independent and complementary facets of community structure, which are shaped by distinct assembly mechanisms. Although undoubtedly an idealization, this paradigm provides an elegant interpretation for several patterns of community assembly examined here. For example, the decoupling between taxonomy and function, manifested as a strong taxonomic turnover within functional groups, explains the previously reported apparent randomness in overall microbial community composition across bromeliads (Chapter 3; 117) or in bioreactors during stable performance (Chapters 5 and 6; 122).

A pathway-centric paradigm greatly simplifies conceptual and mathematical modeling of ecosystem biochemistry, because taxonomic richness and turnover become system properties that have little relevance to metabolic network structure. Constructing an appropriate model for ecosystem biochemistry remains itself a hard problem; however, the two approaches demonstrated in Chapters 7 and 8 yield encouraging results. Notably, the pathway-centric biogeochemical model presented in Chapter 7 was able to largely explain the geochemical, metagenomic, metatranscriptomic and metaproteomic depth profiles in the Saanich Inlet OMZ.
This supports the idea that genes provide a robust description of an ecosystem’s biological component (104, 201), which, in conjunction with the geochemical background, determines pathway activity (257). That is not to say that metabolic pathways simply respond to an externally imposed and independent geochemical background, since microbial metabolism itself can strongly affect environmental conditions and hence metabolic niche space. For example, most major transitions in Earth’s surface redox state were the combined result of coupled geological and microbial metabolic processes (54, 116). In the Saanich Inlet OMZ, the formation of a narrow sulfide-nitrate transition zone at intermediate depths during periods of stratification results from the gradual balancing between oxidant supply from the upper layers, reductant supply from the sediments, physical transport processes and microbially catalyzed electron flow (Fig. 9.1).

Taken together, the work presented in this dissertation strongly suggests that functional profiling of microbial communities (either via multi-omics or via a functional annotation of detected taxa; Chapter 2) should constitute the baseline for microbial biogeography, especially across transects where geochemical gradients shape microbial niche space. The remaining taxonomic variation within functional groups can then be analyzed separately (e.g., Chapter 3) to elucidate additional community assembly processes. For example, the statistical analyses and mathematical models presented in Chapters 3, 5 and 6 suggest that biotic interactions, such as competition and predation by phages, play a major role in driving the taxonomic variation within functional groups observed even under constant environmental conditions. In contrast, neither across the global ocean (Chapter 2) nor between replicate bromeliad tanks (Chapter 3) did spatial distances have any significant effect on community differences.
This suggests that in these cases dispersal limitation may be negligible compared to other microbial community assembly mechanisms. Similarly, non-random phylogenetic associations and OTU co-occurrence patterns across bromeliad samples (Chapter 3) suggest that random demographic drift and random colonization events are unlikely to be the cause of the strong taxonomic variation observed within functional groups.

The complete decoupling between community metabolism and community assembly processes other than metabolic niche effects, as implied by a pathway-centric paradigm, is of course only an idealization, whose accuracy will depend on the particular system at hand. As shown in Chapter 6, biotic interactions such as predation by lytic phages can significantly disturb the metabolic throughput of a bioreactor even under constant operating conditions, because fluctuations in individual populations (Fig. 9.2A) inevitably lead to fluctuations in community function. In such cases, a high functional redundancy can stabilize overall community function over time despite strong taxonomic turnover within individual functional groups. This situation is analogous to convective heat transport through a fluid trapped between a hot and a cold plate (also known as Rayleigh-Bénard convection; 532), where spontaneous plumes and vortices in the fluid can disturb overall heat transport rates; an increased plate surface area stabilizes overall heat transport despite small-scale turbulence (Fig. 9.2).

[Figure 9.1 panels A–C; vertical axis: depth (m)]

Figure 9.1: Biogeochemical steady state in the Saanich Inlet OMZ. (A) Near-steady state depth profiles of major oxidants (O2 and NO3−) and major reductants (NH4+ and H2S) in the Saanich Inlet OMZ, after a prolonged period of stratification (February 10, 2010). (B) Corresponding gene concentration profiles for partial denitrification to nitrous oxide (PDNO), nitrous oxide reduction (nosZ) and anammox (hzo) inferred from metagenomic sequences.
Gene profiles are linearly rescaled to resemble the actual gene counts predicted by the model in Chapter 7. The steady state shown in (A) and (B) results from a balance between oxidant supply from the upper layers, reductant supply from the sediments, physical transport processes and microbially catalyzed electron flow. The biosynthetic rate sustained by each pathway is roughly proportional to its Gibbs free energy (∆G) multiplied by its reaction rate (386, 396). (C) Illustration of a simple electrical circuit in analogy to (A) and (B): A potential gradient between the top and bottom of the circuit induces a steady electron flow through the resistors. The energy used up by each resistor (released as heat) is proportional to the difference in electrical potential between its two ends multiplied by the electrical current through it.

9.2 So, what is life?

This work provides strong support for the idea that the activity and distribution of prokaryotic metabolic pathways are dictated by environmental physicochemical conditions in a way that can become decoupled from the particular combination of organisms in a community. Given their frequent horizontal transfer across prokaryotic clades (116, 133, 270, 385, 483), these pathways (or more precisely, the genes encoding them) resemble self-serving living entities that proliferate when environmental conditions are favourable (93). Strong mutual interdependencies, for example between rRNA genes and genes involved in metabolism, in turn promote the association of genes into genomes encapsulated inside a single container (the prokaryotic cell).
[Figure 9.2 panels: (A) cell abundances (10⁹ cells/L) versus time (days); (B) heat transport between a hot and a cold plate, under gravity]

Figure 9.2: Complex community dynamics in a methanogenic bioreactor, compared to turbulent thermal convection. (A) Simulated cell concentrations for 240 different OTUs (one color per OTU) over time, in a methanogenic glucose-fed bioreactor operated under constant conditions (Chapter 6). Phage-driven “killing-the-winner” dynamics lead to repeated complex fluctuations of cell populations and community metabolic performance. Increasing the functional redundancy stabilizes the overall metabolic throughput of the bioreactor (see Fig. 6.3 in Chapter 6), although fluctuations in individual cell populations remain at the same amplitudes. (B) Turbulent convective heat transport across a thermal gradient, through a fluid trapped between a hot and a cold plate and subject to gravity, also known as Rayleigh-Bénard convection (532). The flux of energy through this system involves the spontaneous and repeated formation of complex vortices, analogous to the complex community dynamics sustained by the supply of glucose in the bioreactor. Increasing the plate surface area stabilizes the overall heat transport between the plates, although plumes and vortices remain at the same spatiotemporal scales. Simulation performed using software by The Concord Consortium (473).

The conceptual similarity of this interpretation to obligate symbiotic “organelle” assemblages, structurally stabilized in the form of eukaryotic cells (195, 280), is striking.
In fact, primordial cells were likely modular assemblages of loosely interdependent components subject to frequent horizontal exchange (525).

A precise definition of life has long been an elusive intellectual endeavor (413, 482), although the ability to “self-reproduce” and the ability to evolve are arguably the two most commonly stated prerequisites (485). In reality, the majority of organisms can only reproduce as part of a broader biological consortium that is characterized by distributed biosynthetic capacities and a metabolic network that is partitioned across multiple organisms (169, 200, 238, 311). Chemoheterotrophs, for example, by definition depend on other organisms for reduced carbon supply. Auxotrophy, i.e., the inability to produce specific required biomolecules such as cofactors, is widespread even amongst autotrophs (183, 408, 409). Syntrophy (333), in which the metabolic activities of two organisms are mutually dependent, is yet another manifestation of the fact that “life needs life”. For example, hydrogen-consuming methanogens enable bacterial hydrogenogenic fermentation by lowering partial hydrogen pressure to levels at which fermentation becomes exergonic (27, 315, 453). In the Saanich Inlet water column, SUP05 chemolithoautotrophic Gammaproteobacteria oxidize hydrogen sulfide produced by sulfate reducers and reduce nitrate produced by nitrifiers, in turn supplying nitrous oxide to nitrous oxide reducers and nitrite to nitrite-respiring anammox bacteria (Fig. 7.1, Chapter 7). Viruses, the most abundant “lifeform” on Earth (462), rely on the translational machinery of their hosts for replication. In fact, bacteriophages occasionally reprogram the metabolic machinery of their hosts to increase viral genome replication (199). Prions, i.e., proteinaceous infectious particles that propagate by transmitting their misfolded state to other proteins, can even undergo Darwinian evolution (272).
Plants rely on the photosynthetic capacity of their cyanobacterial endosymbionts and, reciprocally, these endosymbionts benefit from a steady source of nutrients and protection (313). The fluid conceptual transition from simple evolving molecules (proteins, genes) that depend on each other for replication to obligately symbiotic cell assemblages (170) underlies much of the difficulty of agreeing on a single definition for “life” (485). In fact, a precise separation of the living from the abiotic based on a threshold of “self-sufficiency” for replication appears as meaningful as the definition of a bacterial “species” based on a threshold of genomic similarity (422, 482). In any case, it seems that a coerced definition of “life” may well be irrelevant to an understanding of the integrated biotic-abiotic Earth System.

Summary of software contributions

A large part of the work presented here involved the development of novel computational tools. Notably, examining the taxonomic composition within individual metabolic functional groups in Chapters 2 and 3 required the manual construction of a database for the functional annotation of prokaryotic taxa (“FAPROTAX”), based on extensive literature search. Currently, FAPROTAX includes over 7400 annotations (e.g., at the species or genus level) covering more than 80 functional groups (e.g., anoxygenic photosynthesis, anammox or sulfate respiration). FAPROTAX thus constitutes a cheap alternative to metagenomics, the current de facto standard for functional community profiling (97). In fact, because FAPROTAX is based on experimental evidence for metabolic phenotypes, it can resolve ambiguities in the interpretation of community gene content inherent to metagenomics (376). Further, since FAPROTAX provides information on taxon-function associations within a community, it enables the separate analysis of community assembly processes within individual functional groups (Chapter 3).
I thus anticipate that FAPROTAX (and its potential successors) will push microbial ecology into exciting new directions. FAPROTAX will be freely available upon publication of Chapter 2 at: http://www.zoology.ubc.ca/louca/FAPROTAX

The work presented in Chapters 5, 6 and 8 required the construction of high-dimensional dynamical models for microbial communities and their calibration to experimental data. A framework tailored to this task was unavailable when work on this dissertation began and hence MCM, a software package for modeling microbial communities, was developed (Chapter 4). Compared to recently published tools of similar scope (177), MCM remains unique in its ability to automatically calibrate arbitrary model parameters using a wide range of experimental data, such as cell counts and chemical time series. As demonstrated in Chapter 4 using data from previous E. coli evolution experiments, MCM can yield insight into the metabolic activity of individual populations in a community context and provide mechanistic explanations for previously observed eco-evolutionary dynamics. MCM, including a thorough user manual and multiple examples, is freely available at: http://www.zoology.ubc.ca/MCM

Bibliography

[1] Abedon, S. T. 1989. Selection for bacteriophage latent period length by bacterial density: A theoretical examination. Microbial Ecology 18:79–88.
[2] Ackermann, M. 2015. A functional perspective on phenotypic heterogeneity in microorganisms. Nature Reviews Microbiology 13:497–508.
[3] Aguilar, D., F. X. Aviles, E. Querol, and M. J. E. Sternberg. 2004. Analysis of phenetic trees based on metabolic capabilites across the three domains of life. Journal of Molecular Biology 340:491–512.
[4] Ahmed, S., S. King, and J. Clayton Jr. 1984. Organic matter diagenesis in the anoxic sediments of Saanich Inlet, British Columbia, Canada: a case for highly evolved community interactions. Marine Chemistry 14:233–252.
[5] Ahring, B. K., and P. Westermann. 1987.
Kinetics of butyrate, acetate, and hydrogen metabolism in a thermophilic, anaerobic, butyrate-degrading triculture. Applied and Environmental Microbiology 53:434–439.
[6] Aksnes, D. L., and F. J. Cao. 2011. Inherent and apparent traits in microbial nutrient uptake. Marine Ecology Progress Series 440:41–51.
[7] Alleman, J. E., and K. Preston, 1991. Behavior and physiology of nitrifying bacteria. Pages 1–13 in Proceedings of the second annual conference on commercial aquaculture, CES, volume 240.
[8] Almeida, J. S., M. A. M. Reis, and M. J. T. Carrondo. 1995. Competition between nitrate and nitrite reduction in denitrification by Pseudomonas fluorescens. Biotechnology and Bioengineering 46:476–484.
[9] Amy, P. S., C. Pauling, and R. Y. Morita. 1983. Recovery from nutrient starvation by a marine Vibrio sp. Applied and Environmental Microbiology 45:1685–1690.
[10] Andersen, K. B., and K. von Meyenburg. 1980. Are growth rates of Escherichia coli in batch cultures limited by respiration? Journal of Bacteriology 144:114–123.
[11] Anderson, J., 1984. The oxic/anoxic interface in Saanich Inlet. Pages 17–23 in S. Juniper and R. Brinkhurst, editors. Proceedings of a multidisciplinary symposium on Saanich Inlet, 2nd February, 1983. Number 83 in Canadian technical report of hydrography and ocean sciences, Department of Fisheries and Oceans Canada, Sidney, BC.
[12] Anderson, J. J., and A. H. Devol. 1973. Deep water renewal in Saanich Inlet, an intermittently anoxic basin. Estuarine and Coastal Marine Science 1:1–10.
[13] Andersson, A. F., and J. F. Banfield. 2008. Virus population dynamics and acquired virus resistance in natural microbial communities. Science 320:1047–1050.
[14] Andrade-Eiroa, Á., M. Canle, and V. Cerdá. 2013. Environmental applications of excitation-emission spectrofluorimetry: An in-depth review II. Applied Spectroscopy Reviews 48:77–141.
[15] Anthonisen, A. C., R. C. Loehr, T. B. S. Prakasam, and E. G. Srinath. 1976.
Inhibition of nitrification by ammonia and nitrous acid. Journal of Water Pollution Control Federation 48:835–852.
[16] Antolli, P., and Z. Liu. 2012. Bioreactors: Design, Properties, and Applications. Biochemistry research trends series, Nova Science Publishers, Incorporated.
[17] Antoniewicz, M. R. 2013. Dynamic metabolic flux analysis – tools for probing transient states of metabolic networks. Current Opinion in Biotechnology 24:973–978.
[18] Ashelford, K. E., M. J. Day, and J. C. Fry. 2003. Elevated Abundance of Bacteriophage Infecting Bacteria in Soil. Applied and Environmental Microbiology 69:285–289.
[19] Aßhauer, K. P., B. Wemheuer, R. Daniel, and P. Meinicke. 2015. Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics 31:2882–2884.
[20] Atkins, P., and J. de Paula. 2012. Elements of Physical Chemistry. OUP Oxford.
[21] Atlas, R. M., and R. Bartha. 1981. Microbial ecology: fundamentals and applications. Addison-Wesley Publishing Company, Don Mills, Ontario.
[22] Atwood, T. B., E. Hammill, H. S. Greig, P. Kratina, J. B. Shurin, D. S. Srivastava, and J. S. Richardson. 2013. Predator-induced reduction of freshwater carbon dioxide emissions. Nature Geoscience 6:191–194.
[23] Awata, T., M. Oshiki, T. Kindaichi, N. Ozaki, A. Ohashi, and S. Okabe. 2013. Physiological characterization of an anaerobic ammonium-oxidizing bacterium belonging to the “Candidatus Scalindua” group. Applied and Environmental Microbiology 79:4145–4148.
[24] Ayarza, J., L. Guerrero, and L. Erijman. 2010. Nonrandom assembly of bacterial populations in activated sludge flocs. Microbial Ecology 59:436–444.
[25] Aylward, F. O., J. M. Eppley, J. M. Smith, F. P. Chavez, C. A. Scholin, and E. F. DeLong. 2015. Microbial community transcriptional networks are conserved in three domains at ocean basin scales. Proceedings of the National Academy of Sciences 112:5443–5448.
[26] Azhar, M. A., D. E. Canfield, K. Fennel, B. Thamdrup, and C. J. Bjerrum.
2014. A model-based insight into the coupling of nitrogen and sulfur cycles in a coastal upwelling system. Journal of Geophysical Research: Biogeosciences 119:264–285.
[27] Bäckhed, F., R. E. Ley, J. L. Sonnenburg, D. A. Peterson, and J. I. Gordon. 2005. Host-Bacterial mutualism in the human intestine. Science 307:1915–1920.
[28] Becking, B. 1934. Geobiologie of Inleiding Tot de Milieukunde (Geobiology or Introduction to the Science of the Environment). W.P. Van Stockum and Zoon, The Hague.
[29] Bell, T. 2010. Experimental tests of the bacterial distance-decay relationship. ISME Journal 4:1357–1365.
[30] Belser, L., and E. Schmidt. 1980. Growth and oxidation kinetics of three genera of ammonia oxidizing nitrifiers. FEMS Microbiology Letters 7:213–216.
[31] Belser, L. W. 1979. Population ecology of nitrifying bacteria. Annual reviews in microbiology 33:309–333.
[32] Bertino, L., G. Evensen, and H. Wackernagel. 2003. Sequential data assimilation techniques in oceanography. International Statistical Review 71:223–241.
[33] Beszteri, B., B. Temperton, S. Frickenhaus, and S. J. Giovannoni. 2010. Average genome size: a potential source of bias in comparative metagenomics. ISME Journal 4:1075–1077.
[34] Bethke, C. M., R. A. Sanford, M. F. Kirk, Q. Jin, and T. M. Flynn. 2011. The thermodynamic ladder in geomicrobiology. American Journal of Science 311:183–210.
[35] Betlach, M. R., and J. M. Tiedje. 1981. Kinetic explanation for accumulation of nitrite, nitric oxide, and nitrous oxide during bacterial denitrification. Applied and Environmental Microbiology 42:1074–1084.
[36] Björck, A. 1996. Numerical Methods for Least Squares Problems. Society for Industrial and Applied Mathematics.
[37] Blackburne, R., V. M. Vadivelu, Z. Yuan, and J. Keller. 2007. Kinetic characterisation of an enriched Nitrospira culture with comparison to Nitrobacter. Water Research 41:3033–3042.
[38] Blazier, A. S., and J. A. Papin. 2012.
Integration of expression data in genome-scale metabolic network reconstructions. Frontiers in Physiology 3.
[39] Blount, Z. D., C. Z. Borland, and R. E. Lenski. 2008. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proceedings of the National Academy of Sciences 105:7899–7906.
[40] Blumberg, K., C. Michiels, C. Jones, S. A. Crowe, and S. Hallam, 2014. Saanich Inlet oxygen minimum zone: SUP05-driven Michaelis-Menten sulfide oxidation kinetics. in 2014 GSA Annual Meeting. Number 191-9 in Harnessing “omics” to advance the geosciences: New paradigms and platforms for observing Earth systems, Geological Society of America, Vancouver, BC, Canada.
[41] Boon, E., C. J. Meehan, C. Whidden, D. H. J. Wong, M. G. I. Langille, and R. G. Beiko. 2014. Interactions in the microbiome: communities of organisms and communities of genes. FEMS Microbiology Reviews 38:90–118.
[42] Borg, I., and P. J. F. Groenen. 2005. Modern Multidimensional Scaling: Theory and Applications. 2 edition. Springer Series in Statistics, Springer.
[43] Box, G. E. 1979. Robustness in the strategy of scientific model building. Robustness in Statistics 1:201–236.
[44] Brenner, K., L. You, and F. H. Arnold. 2008. Engineering microbial consortia: a new frontier in synthetic biology. Trends in Biotechnology 26:483–489.
[45] Brettar, I., and G. Rheinheimer. 1991. Denitrification in the Central Baltic: evidence for H2S-oxidation as motor of denitrification at the oxic-anoxic interface. Marine Ecology Progress Series 77:157–169.
[46] Briones, A., and L. Raskin. 2003. Diversity and dynamics of microbial communities in engineered environments and their implications for process stability. Current Opinion in Biotechnology 14:270–276.
[47] Bristow, L. A., T. Dalsgaard, L. Tiano, D. B. Mills, O. Ulloa, D. E. Canfield, N. P. Revsbech, and B. Thamdrup. 2013. High sensitivity of ammonia and nitrite oxidation rates to nanomolar oxygen concentrations.
Mineralogical Magazine 77:636–804.
[48] Brockhurst, M. A., A. Fenton, B. Roulston, and P. B. Rainey. 2006. The impact of phages on interspecific competition in experimental populations of bacteria. BMC Ecology 6:1–7.
[49] Brown, J., J. Gillooly, A. Allen, V. Savage, and G. West. 2004. Toward a metabolic theory of ecology. Ecology 85:1771–1789.
[50] Brüchert, V., B. B. Jørgensen, K. Neumann, D. Riechmann, M. Schlösser, and H. Schulz. 2003. Regulation of bacterial sulfate reduction and hydrogen sulfide fluxes in the central Namibian coastal upwelling zone. Geochimica et Cosmochimica Acta 67:4505–4518.
[51] Burgard, A. P., and C. D. Maranas. 2003. Optimization-based framework for inferring and testing hypothesized metabolic objective functions. Biotechnology and Bioengineering 82:670–677.
[52] Burke, C., P. Steinberg, D. Rusch, S. Kjelleberg, and T. Thomas. 2011. Bacterial community assembly based on functional genes rather than species. Proceedings of the National Academy of Sciences 108:14288–14293.
[53] Camacho, E. F., and C. Bordons Alba. 2004. Model Predictive Control. Springer.
[54] Canfield, D. E., A. N. Glazer, and P. G. Falkowski. 2010. The evolution and future of Earth’s nitrogen cycle. Science 330:192–196.
[55] Canfield, D. E., F. J. Stewart, B. Thamdrup, L. De Brabandere, T. Dalsgaard, E. F. Delong, N. P. Revsbech, and O. Ulloa. 2010. A cryptic sulfur cycle in oxygen-minimum-zone waters off the Chilean coast. Science 330:1375–1378.
[56] Canfield, D. E., and B. Thamdrup. 2009. Towards a consistent classification scheme for geochemical environments, or, why we wish the term ‘suboxic’ would go away. Geobiology 7:385–392.
[57] Capelle, D. W., J. W. Dacey, and P. D. Tortell. 2015. An automated, high through-put method for accurate and precise measurements of dissolved nitrous-oxide and methane concentrations in natural waters. Limnology and Oceanography: Methods 13:345–355.
[58] Caporaso, J. G., K. Bittinger, F. D. Bushman, T. Z. DeSantis, G. L. Andersen, and R. Knight.
2010. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266–267.
[59] Caporaso, J. G., J. Kuczynski, J. Stombaugh, K. Bittinger, F. D. Bushman, E. K. Costello, N. Fierer, A. G. Pena, J. K. Goodrich, J. I. Gordon, et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7:335–336.
[60] Caporaso, J. G., C. L. Lauber, W. A. Walters, D. Berg-Lyons, J. Huntley, N. Fierer, S. M. Owens, J. Betley, L. Fraser, M. Bauer, et al. 2012. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. The ISME Journal 6:1621–1624.
[61] Cariboni, J., D. Gatelli, R. Liska, and A. Saltelli. 2007. The role of sensitivity analysis in ecological modelling. Ecological Modelling 203:167–182.
[62] Carlson, C. A., H. W. Ducklow, and A. F. Michaels. 1994. Annual flux of dissolved organic carbon from the euphotic zone in the northwestern Sargasso Sea. Nature 371:405–408.
[63] Carrias, J.-F., R. Céréghino, O. Brouard, L. Pélozuelo, A. Dejean, A. Couté, B. Corbara, and C. Leroy. 2014. Two coexisting tank bromeliads host distinct algal communities on a tropical inselberg. Plant Biology 16:997–1004.
[64] Caruso, T., Y. Chan, D. C. Lacap, M. C. Y. Lau, C. P. McKay, and S. B. Pointing. 2011. Stochastic and deterministic processes interact in the assembly of desert microbial communities on a global scale. ISME Journal 5:1406–1413.
[65] Caspi, R., T. Altman, R. Billington, K. Dreher, H. Foerster, C. A. Fulcher, T. A. Holland, I. M. Keseler, A. Kothari, A. Kubo, M. Krummenacker, M. Latendresse, L. A. Mueller, Q. Ong, S. Paley, P. Subhraveti, D. S. Weaver, D. Weerasinghe, P. Zhang, and P. D. Karp. 2014. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research 42:D459–D471.
[66] Chave, J., G. Chust, and C. Thébaud, 2007. The importance of phylogenetic structure in biodiversity studies. Pages 150–167 in D. Storch, P. L.
Marquet, and J. H. Brown, editors. Scaling Biodiversity. Cambridge University Press.
[67] Chesson, P. 2000. Mechanisms of maintenance of species diversity. Annual Review of Ecology and Systematics 31:343–366.
[68] Cheyns, K., J. Mertens, J. Diels, E. Smolders, and D. Springael. 2010. Monod kinetics rather than a first-order degradation model explains atrazine fate in soil mini-columns: Implications for pesticide fate modelling. Environmental Pollution 158:1405–1411.
[69] Chibani-Chennoufi, S., A. Bruttin, M.-L. Dillmann, and H. Brüssow. 2004. Phage-Host Interaction: an Ecological Perspective. Journal of Bacteriology 186:3677–3686.
[70] Chiu, H.-C., R. Levy, and E. Borenstein. 2014. Emergent biosynthetic capacity in simple microbial communities. PLoS Computational Biology 10:e1003695.
[71] Christopher Frey, H., and S. R. Patil. 2002. Identification and review of sensitivity analysis methods. Risk analysis 22:553–578.
[72] Chung, B. K. S., and D.-Y. Lee. 2009. Flux-sum analysis: a metabolite-centric approach for understanding the metabolic network. BMC Systems Biology 3:1–10.
[73] Clegg, S. L., and M. Whitfield. 1995. A chemical model of seawater including dissolved ammonia and the stoichiometric dissociation constant of ammonia in estuarine water and seawater from -2 to 40°C. Geochimica et Cosmochimica Acta 59:2403–2421.
[74] Cohen, Y. 1978. Consumption of dissolved nitrous oxide in an anoxic basin, Saanich Inlet, British Columbia. Nature 272:235–237.
[75] Connell, J. H. 1978. Diversity in tropical rain forests and coral reefs. Science 199:1302–1310.
[76] Connor, E. F., and D. Simberloff. 1979. The assembly of species communities: chance or competition? Ecology pages 1132–1140.
[77] Conrad, R. 1999. Contribution of hydrogen to methane production and control of hydrogen concentrations in methanogenic soils and sediments. FEMS Microbiology Ecology 28:193–202.
[78] Costello, D., P. Greenfield, and P. Lee. 1991. Dynamic modelling of a single-stage high-rate anaerobic reactor–II.
Model verification. Water Research 25:859–871.
[79] Cotner, J. B., and B. A. Biddanda. 2002. Small players, large role: Microbial influence on biogeochemical processes in pelagic aquatic ecosystems. Ecosystems 5:105–121.
[80] Courtin, M. G., and S. F. Spoelstra. 1990. A simulation model of the microbiological and chemical changes accompanying the initial stage of aerobic deterioration of silage. Grass and Forage Science 45:153–165.
[81] Covert, M. W., E. M. Knight, J. L. Reed, M. J. Herrgard, and B. O. Palsson. 2004. Integrating high-throughput and computational data elucidates bacterial networks. Nature 429:92–96.
[82] Covert, M. W., and B. Ø. Palsson. 2002. Transcriptional regulation in constraints-based metabolic models of Escherichia coli. Journal of Biological Chemistry 277:28058–28064.
[83] Covert, M. W., C. H. Schilling, and B. Palsson. 2001. Regulation of gene expression in flux balance models of metabolism. Journal of Theoretical Biology 213:73–88.
[84] Covert, M. W., N. Xiao, T. J. Chen, and J. R. Karr. 2008. Integrating metabolic, transcriptional regulatory and signal transduction models in Escherichia coli. Bioinformatics 24:2044–2050.
[85] Crick, F. 1970. Central dogma of molecular biology. Nature 227:561–563.
[86] Cross, S. F., and P. C. Chandler, 1996. Saanich Inlet study - Surface circulation patterns. Technical report, Province of British Columbia, Ministry of Environment, Lands and Parks.
[87] Crump, B. C., C. S. Hopkinson, M. L. Sogin, and J. E. Hobbie. 2004. Microbial biogeography along an estuarine salinity gradient: Combined influences of bacterial growth and residence time. Applied and Environmental Microbiology 70:1494–1505.
[88] Dahlgren, J. P. 2010. Alternative regression methods are not considered in Murtaugh (2009) or by ecologists in general. Ecology Letters 13:E7–E9.
[89] Daims, H., E. V. Lebedeva, P. Pjevac, P. Han, C. Herbold, M. Albertsen, N. Jehmlich, M. Palatinszky, J. Vierheilig, A. Bulaev, R. H. Kirkegaard, M. von Bergen, T. Rattei, B.
Bendinger, P. H. Nielsen, and M. Wagner. 2015. Complete nitrification by Nitrospira bacteria. Nature 528:504–509.
[90] Dalsgaard, T., B. Thamdrup, and D. E. Canfield. 2005. Anaerobic ammonium oxidation (anammox) in the marine environment. Research in Microbiology 156:457–464.
[91] Damare, S., P. Singh, and S. Raghukumar, 2012. Biotechnology of Marine Fungi. Pages 277–297 in C. Raghukumar, editor. Biology of Marine Fungi, volume 53 of Progress in Molecular and Subcellular Biology. Springer Berlin Heidelberg.
[92] Davidson, R., and J. MacKinnon. 2004. Econometric Theory and Methods. Oxford University Press.
[93] Dawkins, R. 1976. The selfish gene. Oxford University Press.
[94] de Boer, W., and H. Laanbroek. 1989. Ureolytic nitrification at low pH by Nitrosospira spec. Archives of Microbiology 152:178–181.
[95] de Boyer Montégut, C., G. Madec, A. S. Fischer, A. Lazar, and D. Iudicone. 2004. Mixed layer depth over the global ocean: An examination of profile data and a profile-based climatology. Journal of Geophysical Research: Oceans 109.
[96] Dean, J. 1998. Lange’s Handbook of Chemistry. 15 edition. McGraw-Hill Professional.
[97] DeLong, E. 2013. Microbial Metagenomics, Metatranscriptomics, and Metaproteomics. Methods in Enzymology, Elsevier Science.
[98] DeLong, E. F. 2009. The microbial ocean from genomes to biomes. Nature 459:200–206.
[99] DeLong, E. F., C. M. Preston, T. Mincer, V. Rich, S. J. Hallam, N.-U. Frigaard, A. Martinez, M. B. Sullivan, R. Edwards, B. R. Brito, et al. 2006. Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311:496–503.
[100] Devol, A. H., and S. I. Ahmed. 1981. Are high rates of sulphate reduction associated with anaerobic oxidation of methane? Nature 291:407–408.
[101] Devol, A. H., J. J. Anderson, K. Kuivila, and J. W. Murray. 1984. A model for coupled sulfate reduction and methane oxidation in the sediments of Saanich Inlet. Geochimica et Cosmochimica Acta 48:993–1004.
[102] Dézerald, O., S. Talaga, C.
Leroy, J.-F. Carrias, B. Corbara, A. Dejean, and R. Céréghino. 2014. Environmental determinants of macroinvertebrate diversity in small water bodies: insights from tank-bromeliads. Hydrobiologia 723:77–86.
[103] Dick, J. M. 2008. Calculation of the relative metastabilities of proteins using the CHNOSZ software package. Geochemical Transactions 9.
[104] Dick, J. M., and E. L. Shock. 2013. A metastable equilibrium model for the relative abundances of microbial phyla in a hot spring. PloS one 8:72395.
[105] Doak, D. F., D. Bigger, E. Harding, M. Marvier, R. O’Malley, and D. Thomson. 1998. The statistical inevitability of stability-diversity relationships in community ecology. The American Naturalist 151:264–276.
[106] Doolittle, W. F., and O. Zhaxybayeva. 2010. Metagenomics and the units of biological organization. BioScience 60:102–112.
[107] Duarte, N. C., M. J. Herrgård, and B. Ø. Palsson. 2004. Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Research 14:1298–1309.
[108] Dumont, M., 2008. Apports de la modelisation des interactions pour une comprehension fonctionnelle d’un ecosysteme. Ph.D. thesis, Universite Montpellier II.
[109] Dumont, M., J. Harmand, A. Rapaport, and J.-J. Godon. 2009. Towards functional molecular fingerprints. Environmental Microbiology 11:1717–1727.
[110] Edgar, R. C. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461.
[111] Edwards, K. J., K. Becker, and F. Colwell. 2012. The deep, dark energy biosphere: Intraterrestrial life on Earth. Annual Review of Earth and Planetary Sciences 40:551–568.
[112] Elena, S. F., and R. E. Lenski. 2003. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nature Reviews Genetics 4:457–469.
[113] Eliason, S. R. 1993. Maximum Likelihood Estimation: Logic and Practice. SAGE Publications, Newbury Park, CA.
[114] Emerson, S., and J. Hedges. 2008.
Chemical Oceanography and the Marine Carbon Cycle. Cambridge University Press, Cambridge, UK.
[115] Fagerbakke, K., M. Heldal, and S. Norland. 1996. Content of carbon, nitrogen, oxygen, sulfur and phosphorus in native aquatic and cultured bacteria. Aquatic Microbial Ecology 10:15–27.
[116] Falkowski, P. G., T. Fenchel, and E. F. Delong. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320:1034–1039.
[117] Farjalla, V. F., D. S. Srivastava, N. A. C. Marino, F. D. Azevedo, V. Dib, P. M. Lopes, A. S. Rosado, R. L. Bozelli, and F. A. Esteves. 2012. Ecological determinism increases with organism size. Ecology 93:1752–1759.
[118] Feist, A. M., M. J. Herrgård, I. Thiele, J. L. Reed, and B. Ø. Palsson. 2008. Reconstruction of biochemical networks in microorganisms. Nature Reviews Microbiology 7:129–143.
[119] Feist, A. M., and B. O. Palsson. 2010. The biomass objective function. Current Opinion in Microbiology 13:344–349.
[120] Fennel, K., and E. Boss. 2003. Subsurface maxima of phytoplankton and chlorophyll: Steady-state solutions from a simple model. Limnology and Oceanography 48:1521–1534.
[121] Fennel, K., J. Wilkin, J. Levin, J. Moisan, J. O’Reilly, and D. Haidvogel. 2006. Nitrogen cycling in the Middle Atlantic Bight: Results from a three-dimensional model and implications for the North Atlantic nitrogen budget. Global Biogeochemical Cycles 20.
[122] Fernández, A., S. Huang, S. Seston, J. Xing, R. Hickey, C. Criddle, and J. Tiedje. 1999. How stable is stable? Function versus community composition. Applied and Environmental Microbiology 65:3697–3704.
[123] Fernandez, A. S., S. A. Hashsham, S. L. Dollhopf, L. Raskin, O. Glagoleva, F. B. Dazzo, R. F. Hickey, C. S. Criddle, and J. M. Tiedje. 2000. Flexible community structure correlates with stable community function in methanogenic bioreactor communities perturbed by glucose. Applied and Environmental Microbiology 66:4058–4067.
[124] Ferrer, J., C. Prats, and D. López. 2008.
Individual-based modelling: An essential tool for microbiology. Journal of Biological Physics 34:19–37.
[125] Finlay, B. J. 2002. Global dispersal of free-living microbial eukaryote species. Science 296:1061–1063.
[126] Finlay, B. J., S. C. Maberly, and J. I. Cooper. 1997. Microbial Diversity and Ecosystem Function. Oikos 80:209–213.
[127] Fofonoff, N. P., and R. Millard-Junior, 1983. Algorithms for computation of fundamental properties of seawater. Technical report, UNESCO technical papers in marine science.
[128] Franklin, R. B., and A. L. Mills. 2006. Structural and functional responses of a sewage microbial community to dilution-induced reductions in diversity. Microbial Ecology 52:280–288.
[129] Freilich, S., R. Zarecki, O. Eilam, E. S. Segal, C. S. Henry, M. Kupiec, U. Gophna, R. Sharan, and E. Ruppin. 2011. Competitive and cooperative metabolic interactions in bacterial communities. Nature Communications 2:589.
[130] Frentz, Z., S. Kuehn, and S. Leibler. 2015. Strongly deterministic population dynamics in closed microbial communities. Physical Review X 5:041014.
[131] Frey, C., S. Hietanen, K. Jürgens, M. Labrenz, and M. Voss. 2014. N and O isotope fractionation in nitrate during chemolithoautotrophic denitrification by Sulfurimonas gotlandica. Environmental Science & Technology 48:13229–13237.
[132] Friesen, M. L., G. Saxer, M. Travisano, and M. Doebeli. 2004. Experimental evidence for sympatric ecological diversification due to frequency-dependent competition in Escherichia coli. Evolution 58:245–260.
[133] Frigaard, N. U., A. Martinez, T. J. Mincer, and E. F. DeLong. 2006. Proteorhodopsin lateral gene transfer between marine planktonic Bacteria and Archaea. Nature 439:847–850.
[134] Frossard, A., L. Gerull, M. Mutz, and M. O. Gessner. 2012. Disconnect of microbial structure and function: enzyme activities and bacterial communities in nascent stream corridors. ISME Journal 6:680–691.
[135] Fuhrman, J. A. 1999.
Marine viruses and their biogeochemical and ecological effects. Nature 399:541–548.
[136] Fuhrman, J. A. 2009. Microbial community structure and its functional implications. Nature 459:193–199.
[137] Fuqua, C., M. R. Parsek, and E. P. Greenberg. 2001. Regulation of gene expression by cell-to-cell communication: Acyl-homoserine lactone quorum sensing. Annual Review of Genetics 35:439–468.
[138] Galbraith, E., A. Gnanadesikan, J. Dunne, and M. Hiscock. 2010. Regional impacts of iron-light colimitation in a global biogeochemical model. Biogeosciences 7:1043–1064.
[139] Ganesh, S., D. J. Parris, E. F. DeLong, and F. J. Stewart. 2014. Metagenomic analysis of size-fractionated picoplankton in a marine oxygen minimum zone. The ISME Journal 8:187–211.
[140] Garcia, H. E., R. A. Locarnini, T. P. Boyer, J. I. Antonov, O. Baranova, M. Zweng, J. Reagan, and D. Johnson, 2014. Dissolved Oxygen, Apparent Oxygen Utilization, and Oxygen Saturation. in S. Levitus and A. Mishonov, editors. World Ocean Atlas 2013, volume 3. NOAA Atlas NESDIS 75.
[141] Garcia, H., R. A. Locarnini, T. P. Boyer, J. I. Antonov, O. Baranova, M. Zweng, J. Reagan, and D. Johnson, 2014. Dissolved Inorganic Nutrients (phosphate, nitrate, silicate). in E. S. Levitus and A. Mishonov, editors. World Ocean Atlas 2013, volume 4. NOAA Atlas NESDIS 76.
[142] Gargett, A. 1984. Vertical eddy diffusivity in the ocean interior. Journal of Marine Research 42:359–393.
[143] Geets, J., N. Boon, and W. Verstraete. 2006. Strategies of aerobic ammonia-oxidizing bacteria for coping with nutrient and oxygen fluctuations. FEMS Microbiology Ecology 58:1–13.
[144] Gentile, M., T. Yan, S. Tiquia, M. Fields, J. Nyman, J. Zhou, and C. Criddle. 2006. Stability in a denitrifying fluidized bed reactor. Microbial Ecology 52:311–321.
[145] Gevers, D., F. M. Cohan, J. G. Lawrence, B. G. Spratt, T. Coenye, E. J. Feil, E. Stackebrandt, Y. V. de Peer, P. Vandamme, F. L. Thompson, and J. Swings. 2005. Re-evaluating prokaryotic species.
Nature Reviews Microbiology 3:733–739.
[146] Gianchandani, E. P., M. A. Oberhardt, A. P. Burgard, C. D. Maranas, and J. A. Papin. 2008. Predicting biological system objectives de novo from internal state measurements. BMC Bioinformatics 9:43.
[147] Gibbons, S. M., J. G. Caporaso, M. Pirrung, D. Field, R. Knight, and J. A. Gilbert. 2013. Evidence for a persistent microbial seed bank throughout the global ocean. Proceedings of the National Academy of Sciences 110:4651–4655.
[148] Gilbert, J., F. Meyer, D. A. Antonopoulos, P. Balaji, C. T. Brown, C. T. Brown, N. Desai, J. A. Eisen, D. Evers, D. Field, et al. 2010. Meeting report: the terabase metagenomics workshop and the vision of an Earth microbiome project. Standards in Genomic Sciences 3:243–248.
[149] Giovannoni, S. J., H. J. Tripp, S. Givan, M. Podar, K. L. Vergin, D. Baptista, L. Bibbs, J. Eads, T. H. Richardson, M. Noordewier, M. S. Rappé, J. M. Short, J. C. Carrington, and E. J. Mathur. 2005. Genome streamlining in a cosmopolitan oceanic bacterium. Science 309:1242–1245.
[150] Godon, J.-J., E. Zumstein, P. Dabert, F. Habouzit, and R. Moletta. 1997. Molecular microbial diversity of an anaerobic digestor as determined by small-subunit rDNA sequence analysis. Applied and Environmental Microbiology 63:2802–2813.
[151] Goffredi, S. K., G. Jang, W. T. Woodside, and W. Ussler. 2011. Bromeliad catchments as habitats for methanogenesis in tropical rainforest canopies. Frontiers in Microbiology 2.
[152] Goffredi, S. K., A. H. Kantor, and W. T. Woodside. 2011. Aquatic microbial habitats within a neotropical rainforest: bromeliads and pH-associated trends in bacterial diversity and composition. Microbial Ecology 61:529–542.
[153] Goldberg, A. L., and A. St. John. 1976. Intracellular protein degradation in mammalian and bacterial cells: Part 2. Annual Review of Biochemistry 45:747–804.
[154] Golding, I., J. Paulsson, S. M. Zawilski, and E. C. Cox. 2005. Real-time kinetics of gene activity in individual bacteria.
Cell 123:1025–1036.
[155] Golterman, H., R. Clymo, and M. Ohnstad. 1978. Methods for physical and chemical analysis of fresh waters. International Review in Hydrobiology 65:169.
[156] Gómez, P., and A. Buckling. 2011. Bacteria-phage antagonistic coevolution in soil. Science 332:106–109.
[157] Gosalbes, M. J., A. Durbán, M. Pignatelli, J. J. Abellan, N. Jiménez-Hernández, A. E. Pérez-Cobas, A. Latorre, and A. Moya. 2011. Metatranscriptomic approach to analyze the functional human gut microbiota. PLoS ONE 6:e17447.
[158] Gosset, G. 2005. Improvement of Escherichia coli production strains by modification of the phosphoenolpyruvate: sugar phosphotransferase system. Microbial Cell Factories 4:14.
[159] Gotelli, N. J. 2000. Null model analysis of species co-occurrence patterns. Ecology 81:2606–2621.
[160] Graham, D. W., C. W. Knapp, E. S. Van Vleck, K. Bloor, T. B. Lane, and C. E. Graham. 2007. Experimental demonstration of chaotic instability in biological nitrification. The ISME Journal 1:385–393.
[161] Grebogi, C., E. Ott, and J. A. Yorke. 1985. Attractors on an N-torus: Quasiperiodicity versus chaos. Physica D: Nonlinear Phenomena 15:354–373.
[162] Green, J. L., B. J. Bohannan, and R. J. Whitaker. 2008. Microbial biogeography: from taxonomy to traits. Science 320:1039–1043.
[163] Griffiths, B. S., K. Ritz, and R. E. Wheatley, 1997. Relationship between Functional Diversity and Genetic Diversity in Complex Microbial Communities. Pages 1–9 in H. Insam and A. Rangger, editors. Microbial Communities: Functional Versus Structural Approaches. Springer Berlin Heidelberg, Berlin, Heidelberg.
[164] Grote, J., G. Jost, M. Labrenz, G. J. Herndl, and K. Jürgens. 2008. Epsilonproteobacteria represent the major portion of chemoautotrophic bacteria in sulfidic waters of pelagic redoxclines of the Baltic and Black Seas. Applied and Environmental Microbiology 74:7546–7551.
[165] Grote, J., J. C. Thrash, M. J. Huggett, Z. C. Landry, P. Carini, S. J. Giovannoni, and M. S. Rappé. 2012.
Streamlining and core genome conservation among highly divergent members of the SAR11 clade. mBio 3.
[166] Güde, H. 1979. Grazing by protozoa as selection factor for activated sludge bacteria. Microbial Ecology 5:225–237.
[167] Gupta, A., and G. Rao. 2003. A study of oxygen transfer in shake flasks using a non-invasive oxygen sensor. Biotechnology and Bioengineering 84:351–358.
[168] Hahn, A., N. Hanson, D. Kim, K. Konwar, and S. Hallam, 2015. Assembly independent functional annotation of short-read data using SOFA: Short-ORF functional annotation. Pages 1–6 in Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2015 IEEE Conference on.
[169] Hahn, A. S., K. Konwar, S. Louca, N. W. Hanson, and S. J. Hallam. 2016. The information science of microbial ecology. Current Opinion in Microbiology 31:209–216.
[170] Hallam, S. J., and J. P. McCutcheon. 2015. Microbes don’t play solitaire: how cooperation trumps isolation in the microbial world. Environmental Microbiology Reports 7:26–28.
[171] Hammersley, J. M., and D. C. Handscomb. 1964. Monte Carlo methods. Methuen.
[172] Hannig, M., G. Lavik, M. Kuypers, D. Woebken, W. Martens-Habbena, and K. Jürgens. 2007. Shift from denitrification to anammox after inflow events in the central Baltic Sea. Limnology and Oceanography 52:1336–1345.
[173] Hanson, R. S., and T. E. Hanson. 1996. Methanotrophic bacteria. Microbiological Reviews 60:439–471.
[174] Hantula, J., A. Kurki, P. Vuoriranta, and D. Bamford. 1991. Ecology of bacteriophages infecting activated sludge bacteria. Applied and Environmental Microbiology 57:2147–2151.
[175] Hao, X., Q. Wang, X. Zhang, Y. Cao, and C. van Mark Loosdrecht. 2009. Experimental evaluation of decrease in bacterial activity due to cell death and activity decay in activated sludge. Water Research 43:3604–3612.
[176] Harcombe, W. R., N. F. Delaney, N. Leiby, N. Klitgord, and C. J. Marx. 2013.
The ability of flux balance analysis to predict evolution of central metabolism scales with the initial distance to the optimum. PLOS Computational Biology 9:e1003091. [177] Harcombe, W. R., W. J. Riehl, I. Dukovski, B. R. Granger, A. Betts, A. H. Lang, G. Bonilla, A. Kar, N. Leiby, P. Mehta, C. J. Marx, and D. Segrè. 2014. Metabolic resource allocation in individual microbes determines ecosystem interactions and spatial dynamics. Cell Reports 7:1104–1115. [178] Hardin, G. 1960. The competitive exclusion principle. Science 131:1292–1297. [179] Hashsham, S. A., A. S. Fernandez, S. L. Dollhopf, F. B. Dazzo, R. F. Hickey, J. M. Tiedje, and C. S. Criddle. 2000. Parallel processing of substrate correlates with greater functional stability in methanogenic bioreactor communities perturbed by glucose. Applied and Environmental Microbiology 66:4050–4057. [180] Hastings, A. 2004. Transients: the key to long-term ecological understanding? Trends in Ecology & Evolution 19:39–45. [181] Hawley, A. K., H. M. Brewer, A. D. Norbeck, L. Paša-Tolić, and S. J. Hallam. 2014. Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes. Proceedings of the National Academy of Sciences 111:11395–11400. [182] Hawley, A. K., S. Kheirandish, A. Mueller, H. T. Leung, A. D. Norbeck, H. M. Brewer, L. Pasa-Tolic, and S. J. Hallam, 2013. Molecular Tools for Investigating Microbial Community Structure and Function in Oxygen-Deficient Marine Waters. Chapter 16, pages 305–329 in E. F. DeLong, editor. Microbial Metagenomics, Metatranscriptomics, and Metaproteomics, volume 531 of Methods in Enzymology. Academic Press. [183] Helliwell, K. E., G. L. Wheeler, and A. G. Smith. 2013. Widespread decay of vitamin-related pathways: coincidence or consequence? Trends in Genetics 29:469–478. [184] Henry, C. S., M. DeJongh, A. A. Best, P. M. Frybarger, B. Linsay, and R. L. Stevens. 2010.
High-throughput generation, optimization and analysis of genome-scale metabolic models. Nature Biotechnology 28:977–982. [185] Hérault, B. 2007. Reconciling niche and neutrality through the Emergent Group approach. Perspectives in Plant Ecology, Evolution and Systematics 9:71–78. [186] Herlinveaux, R. H. 1962. Oceanography of Saanich Inlet in Vancouver Island, British Columbia. Journal of the Fisheries Research Board of Canada 19:1–37. [187] Herrgård, M. J., B.-S. Lee, V. Portnoy, and B. Ø. Palsson. 2006. Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae. Genome Research 16:627–635. [188] Herron, M. D., and M. Doebeli. 2013. Parallel evolutionary dynamics of adaptive diversification in Escherichia coli. PLoS Biology 11:e1001490. [189] Hester, E. R., K. L. Barott, J. Nulton, M. J. Vermeij, and F. L. Rohwer. 2015. Stable and sporadic symbiotic communities of coral and algal holobionts. ISME Journal. [190] Holmes, A. J., A. Costello, M. E. Lidstrom, and J. C. Murrell. 1995. Evidence that particulate methane monooxygenase and ammonia monooxygenase may be evolutionarily related. FEMS Microbiology Letters 132:203–208. [191] Holmfeldt, K., M. Middelboe, O. Nybroe, and L. Riemann. 2007. Large variabilities in host strain susceptibility and phage host range govern interactions between lytic marine phages and their Flavobacterium hosts. Applied and Environmental Microbiology 73:6730–6739. [192] Holtappels, M., G. Lavik, M. M. Jensen, and M. M. Kuypers, 2011. 15N-labeling experiments to dissect the contributions of heterotrophic denitrification and anammox to nitrogen removal in the OMZ waters of the ocean. Chapter 10, pages 223–251 in M. G. Klotz, editor. Research on Nitrification and Related Processes, Part A, volume 486 of Methods in Enzymology. Academic Press. [193] Hood, R. R., E. A. Laws, R. A. Armstrong, N. R. Bates, C. W. Brown, C. A. Carlson, F. Chai, S. C. Doney, P. G. Falkowski, R. A. Feely, M. A.
M. Friedrichs, M. R. Landry, J. Keith Moore, D. M. Nelson, T. L. Richardson, B. Salihoglu, M. Schartau, D. A. Toole, and J. D. Wiggert. 2006. Pelagic functional group modeling: Progress, challenges and prospects. Deep Sea Research II: Topical Studies in Oceanography 53:459–512. [194] Hoppe, H., H. Ducklow, and B. Karrasch. 1993. Evidence for dependency of bacterial-growth on enzymatic-hydrolysis of particulate organic-matter in the mesopelagic ocean. Marine Ecology Progress Series 93:277–283. [195] Horiike, T., K. Hamada, S. Kanaya, and T. Shinozawa. 2001. Origin of eukaryotic cell nuclei by symbiosis of Archaea in Bacteria is revealed by homology-hit analysis. Nature Cell Biology 3:210–214. [196] Horner-Devine, M. C., and B. J. M. Bohannan. 2006. Phylogenetic clustering and overdispersion in bacterial communities. Ecology 87:S100–S108. [197] Hubbell, S. P. 2005. Neutral theory in community ecology and the hypothesis of functional equivalence. Functional Ecology 19:166–172. [198] Hunik, J., H. Meijer, and J. Tramper. 1993. Kinetics of Nitrobacter agilis at extreme substrate, product and salt concentrations. Applied Microbiology and Biotechnology 40:442–448. [199] Hurwitz, B. L., S. J. Hallam, and M. B. Sullivan. 2013. Metabolic reprogramming by viruses in the sunlit and dark ocean. Genome Biology 14:R123. [200] Husnik, F., N. Nikoh, R. Koga, L. Ross, R. P. Duncan, M. Fujie, M. Tanaka, N. Satoh, D. Bachtrog, A. C. C. Wilson, C. D. von Dohlen, T. Fukatsu, and J. P. McCutcheon. 2013. Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis. Cell 153:1567–1578. [201] Inskeep, W. P., D. B. Rusch, Z. J. Jay, M. J. Herrgard, M. A. Kozubal, T. H. Richardson, R. E. Macur, N. Hamamura, R. Jennings, B. W. Fouke, et al. 2010. Metagenomes from high-temperature chemotrophic systems reveal geochemical controls on microbial community structure and function. PLoS ONE 5:e9773. [202] Jang, A., S. Okabe, Y. Watanabe, I. S. Kim, and P. L.
Bishop. 2005. Measurement of growth rate of ammonia oxidizing bacteria in partially submerged rotating biological contactor by fluorescent in situ hybridization (FISH). Journal of Environmental Engineering and Science 4:413–420. [203] Jannasch, H. W., and R. I. Mateles, 1974. Experimental bacterial ecology studied in continuous culture. Chapter 5, pages 165–212 in A. Rose and D. Tempest, editors. Advances in Microbial Physiology, volume 11. Elsevier Science. [204] Jensen, M. A., S. M. Faruque, J. J. Mekalanos, and B. R. Levin. 2006. Modeling the role of bacteriophage in the control of cholera outbreaks. Proceedings of the National Academy of Sciences of the United States of America 103:4652–4657. [205] Jensen, M. M., P. Lam, N. P. Revsbech, B. Nagel, B. Gaye, M. S. Jetten, and M. M. Kuypers. 2011. Intensive nitrogen loss over the Omani Shelf due to anammox coupled with dissimilatory nitrite reduction to ammonium. The ISME Journal 5:1660–1670. [206] Jensen, M. M., J. Petersen, T. Dalsgaard, and B. Thamdrup. 2009. Pathways, rates, and regulation of N2 production in the chemocline of an anoxic basin, Mariager Fjord, Denmark. Marine Chemistry 113:102–113. [207] Jetten, M. S., A. J. Stams, and A. J. Zehnder. 1990. Acetate threshold values and acetate activating enzymes in methanogenic bacteria. FEMS Microbiology Letters 73:339–344. [208] Jiang, Q. Q., and L. R. Bakken. 1999. Comparison of Nitrosospira strains isolated from terrestrial environments. FEMS Microbiology Ecology 30:171–186. [209] Jiao, D., Y. Ye, and H. Tang. 2013. Probabilistic inference of biochemical reactions in microbial communities from metagenomic sequences. PLoS Comput Biol 9:e1002981. [210] Jin, Q., and C. M. Bethke. 2007. The thermodynamics and kinetics of microbial metabolism. American Journal of Science 307:643–677. [211] Jin, Q., E. E. Roden, and J. R. Giska. 2013. Geomicrobial kinetics: Extrapolating laboratory studies to natural environments. Geomicrobiology Journal 30:173–185. [212] Jing, C., Z. Ping, and Q. Mahmood.
2010. Influence of various nitrogenous electron acceptors on the anaerobic sulfide oxidation. Bioresource Technology 101:2931–2937. [213] Johnke, J., Y. Cohen, M. de Leeuw, A. Kushmaro, E. Jurkevitch, and A. Chatzinotas. 2014. Multiple micro-predators controlling bacterial communities in the environment. Current Opinion in Biotechnology 27:185–190. [214] Johnson, S. G., 2014. The NLopt nonlinear-optimization package. Software. [215] Jones, J. B., R. M. Holmes, S. G. Fisher, N. B. Grimm, and D. M. Greene. 1995. Methanogenesis in Arizona, USA dryland streams. Biogeochemistry 31:155–173. [216] Jørgensen, L., C. A. Stedmon, T. Kragh, S. Markager, M. Middelboe, and M. Søndergaard. 2011. Global trends in the fluorescence characteristics and distribution of marine dissolved organic matter. Marine Chemistry 126:139–148. [217] Jost, G., M. V. Zubkov, E. Yakushev, M. Labrenz, and K. Jürgens. 2008. High abundance and dark CO2 fixation of chemolithoautotrophic prokaryotes in anoxic waters of the Baltic Sea. Limnology and Oceanography 53:14–22. [218] Juniper, S., and R. Brinkhurst. 1986. Water-column dark CO2 fixation and bacterial-mat growth in intermittently anoxic Saanich Inlet, British-Columbia. Marine Ecology Progress Series 33:41–50. [219] Kalvelage, T., M. M. Jensen, S. Contreras, N. P. Revsbech, P. Lam, M. Günter, J. LaRoche, G. Lavik, and M. M. M. Kuypers. 2011. Oxygen sensitivity of anammox and coupled N-cycle processes in oxygen minimum zones. PLoS ONE 6:e29299. [220] Kalyuzhnyi, S. 1997. Batch anaerobic digestion of glucose and its mathematical modeling. II. Description, verification and application of model. Bioresource Technology 59:249–258. [221] Kanehisa, M., and S. Goto. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research 28:27–30. [222] Kantz, H., and T. Schreiber. 2004. Nonlinear Time Series Analysis. 2 edition. Cambridge University Press. [223] Karlsson, F. H., I. Nookaew, D. Petranovic, and J. Nielsen. 2011.
Prospects for systems biology and modeling of the gut microbiome. Trends in Biotechnology 29:251–258. [224] Kashtan, N., S. E. Roggensack, S. Rodrigue, J. W. Thompson, S. J. Biller, A. Coe, H. Ding, P. Marttinen, R. R. Malmstrom, R. Stocker, et al. 2014. Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science 344:416–420. [225] Kaspar, H., and K. Wuhrmann. 1977. Product inhibition in sludge digestion. Microbial Ecology 4:241–248. [226] Kassen, R., and P. B. Rainey. 2004. The ecology and genetics of microbial diversity. Annual Review of Microbiology 58:207–231. [227] Keen, G., and J. Prosser. 1987. Steady state and transient growth of autotrophic nitrifying bacteria. Archives of Microbiology 147:73–79. [228] Kelley, D., 2014. oce: Analysis of oceanographic data. [229] Kemp, P. F., S. Lee, and J. LaRoche. 1993. Estimating the growth rate of slowly growing marine bacteria from RNA content. Applied and Environmental Microbiology 59:2594–2601. [230] Kendall, B. E., C. J. Briggs, W. W. Murdoch, P. Turchin, S. P. Ellner, E. McCauley, R. M. Nisbet, and S. N. Wood. 1999. Why do populations cycle? A synthesis of statistical and mechanistic modeling approaches. Ecology 80:1789–1805. [231] Kent, A. D., and E. W. Triplett. 2002. Microbial communities and their interactions in soil and rhizosphere ecosystems. Annual Review of Microbiology 56:211–236. [232] Keswani, J., and W. B. Whitman. 2001. Relationship of 16S rRNA sequence similarity to DNA hybridization in prokaryotes. International Journal of Systematic and Evolutionary Microbiology 51:667–678. [233] Khatri, B. S., A. Free, and R. J. Allen. 2012. Oscillating microbial dynamics driven by small populations, limited nutrient supply and high death rates. Journal of Theoretical Biology 314:120–129. [234] Kim, T.-S., J.-Y. Jeong, G. Wells, and H.-D. Park. 2013. General and rare bacterial taxa demonstrating different temporal dynamic patterns in an activated sludge bioreactor.
Applied Microbiology and Biotechnology 97:1755–1765. [235] Kirchman, D. 2010. Microbial Ecology of the Oceans. Wiley Series in Ecological and Applied Microbiology, Wiley. [236] Kirchman, D. L., X. A. G. Moran, and H. Ducklow. 2009. Microbial growth in the polar oceans – role of temperature and potential impact of climate change. Nature Reviews Microbiology 7:451–459. [237] Kirkup, B. C., and M. A. Riley. 2004. Antibiotic-mediated antagonism leads to a bacterial game of rock-paper-scissors in vivo. Nature 428:412–414. [238] Klitgord, N., and D. Segrè. 2010. Environments that induce synthetic microbial ecosystems. PLOS Computational Biology 6:e1001002. [239] Klitgord, N., and D. Segrè. 2011. Ecosystems biology of microbial metabolism. Current Opinion in Biotechnology 22:541–546. [240] Klotz, M. G. 2011. Research on Nitrification and Related Processes. Elsevier. [241] Knight, R., J. Jansson, D. Field, N. Fierer, N. Desai, J. A. Fuhrman, P. Hugenholtz, D. van der Lelie, F. Meyer, R. Stevens, et al. 2012. Unlocking the potential of metagenomics through replicated experimental design. Nature Biotechnology 30:513–520. [242] Knightes, C. D., and C. A. Peters. 2000. Statistical analysis of nonlinear parameter estimation for Monod biodegradation kinetics using bivariate data. Biotechnology and Bioengineering 69:160–170. [243] Knowles, B., C. Silveira, B. Bailey, K. Barott, V. Cantu, A. Cobián-Güemes, F. Coutinho, E. Dinsdale, B. Felts, K. Furby, et al. 2016. Lytic to temperate switching of viral communities. Nature 531:466–470. [244] Koeppel, A. F., and M. Wu. 2014. Species matter: the role of competition in the assembly of congeneric bacteria. The ISME Journal 8:531–540. [245] Kolter, R., D. A. Siegele, and A. Tormo. 1993. The stationary phase of the bacterial life cycle. Annual Reviews in Microbiology 47:855–874. [246] Konishi, S., and G. Kitagawa. 2008. Information Criteria and Statistical Modeling. Springer Series in Statistics, Springer, New York. [247] Konwar, K. M., N. W.
Hanson, M. P. Bhatia, D. Kim, S.-J. Wu, A. S. Hahn, C. Morgan-Lang, H. K. Cheung, and S. J. Hallam. 2015. MetaPathways v2.5: quantitative functional, taxonomic and usability improvements. Bioinformatics 31:3345–3347. [248] Konwar, K. M., N. W. Hanson, A. P. Pagé, and S. J. Hallam. 2013. MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information. BMC Bioinformatics 14:202. [249] Krakat, N., A. Westphal, S. Schmidt, and P. Scherer. 2010. Anaerobic digestion of renewable biomass: Thermophilic temperature governs methanogen population dynamics. Applied and Environmental Microbiology 76:1842–1850. [250] Kuo, C.-H., N. A. Moran, and H. Ochman. 2009. The consequences of genetic drift for bacterial genome complexity. Genome Research 19:1450–1454. [251] Kuypers, M. M. M., G. Lavik, D. Woebken, M. Schmid, B. M. Fuchs, R. Amann, B. B. Jørgensen, and M. S. M. Jetten. 2005. Massive nitrogen loss from the Benguela upwelling system through anaerobic ammonium oxidation. Proceedings of the National Academy of Sciences of the United States of America 102:6478–6483. [252] Lam, P., and M. M. Kuypers. 2011. Microbial nitrogen cycling processes in oxygen minimum zones. Annual Review of Marine Science 3:317–345. [253] Lam, P., G. Lavik, M. M. Jensen, J. van de Vossenberg, M. Schmid, D. Woebken, D. Gutiérrez, R. Amann, M. S. Jetten, and M. M. Kuypers. 2009. Revising the nitrogen cycle in the Peruvian oxygen minimum zone. Proceedings of the National Academy of Sciences 106:4752–4757. [254] Langenheder, S., E. S. Lindström, and L. J. Tranvik. 2005. Weak coupling between community composition and functioning of aquatic bacteria. Limnology and Oceanography 50:957–967. [255] Langille, M. G., J. Zaneveld, J. G. Caporaso, D. McDonald, D. Knights, J. A. Reyes, J. C. Clemente, D. E. Burkepile, R. L. V. Thurber, R. Knight, et al. 2013.
Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology 31:814–821. [256] LaRowe, D. E., A. W. Dale, J. P. Amend, and P. Van Cappellen. 2012. Thermodynamic limitations on microbially catalyzed reaction rates. Geochimica et Cosmochimica Acta 90:96–109. [257] Larsen, P., F. Collart, D. Field, F. Meyer, K. Keegan, C. Henry, J. McGrath, J. Quinn, and J. Gilbert. 2011. Predicted Relative Metabolomic Turnover (PRMT): determining metabolic turnover from a coastal marine metagenomic dataset. Microbial Informatics and Experimentation 1:4. [258] Larsen, P., Y. Hamada, and J. Gilbert. 2012. Modeling microbial communities: Current, developing, and future technologies for predicting microbial community interaction. Journal of Biotechnology 160:17–24. [259] Lawrence, A. W., and P. L. McCarty. 1969. Kinetics of methane fermentation in anaerobic treatment. Water Pollution Control Federation 41:R1–R17. [260] Lawrence, J., and S. Maier. 1977. Correction for the inherent error in optical density readings. Applied and Environmental Microbiology 33:482–484. [261] Lazar, I., I. Petrisor, and T. Yen. 2007. Microbial enhanced oil recovery (MEOR). Petroleum Science and Technology 25:1353–1366. [262] Le Gac, M., M. D. Brazas, M. Bertrand, J. G. Tyerman, C. C. Spencer, R. E. W. Hancock, and M. Doebeli. 2008. Metabolic changes associated with adaptive diversification in Escherichia coli. Genetics 178:1049–1060. [263] Lebowitz, J., and H. Spohn. 1982. Microscopic basis for Fick’s law for self-diffusion. Journal of Statistical Physics 28:539–556. [264] Lee, K., Y.-J. Choo, S. J. Giovannoni, and J.-C. Cho. 2007. Maritimibacter alkaliphilus gen. nov., sp. nov., a genome-sequenced marine bacterium of the Roseobacter clade in the order Rhodobacterales. International Journal of Systematic and Evolutionary Microbiology 57:1653–1658. [265] Lee, P. S., L. B. Shaw, L. H. Choe, A. Mehra, V. Hatzimanikatis, and K. H. Lee.
2003. Insights into the relation between mRNA and protein expression patterns: II. Experimental observations in Escherichia coli. Biotechnology and Bioengineering 84:834–841. [266] Legendre, P., and L. Legendre. 1998. Numerical Ecology. 2 edition. Developments in Environmental Modelling, Elsevier Science B.V., Amsterdam. [267] Lehman, C. L., and D. Tilman. 2000. Biodiversity, stability, and productivity in competitive communities. The American Naturalist 156:534–552. [268] Leibold, M. A., M. Holyoak, N. Mouquet, P. Amarasekare, J. M. Chase, M. F. Hoopes, R. D. Holt, J. B. Shurin, R. Law, D. Tilman, M. Loreau, and A. Gonzalez. 2004. The metacommunity concept: a framework for multi-scale community ecology. Ecology Letters 7:601–613. [269] Lennon, J. T., and J. B. H. Martiny. 2008. Rapid evolution buffers ecosystem impacts of viruses in a microbial food web. Ecology Letters 11:1178–1188. [270] Lerat, E., V. Daubin, H. Ochman, and N. A. Moran. 2005. Evolutionary origins of genomic repertoires in bacteria. PLoS Biol 3:e130. [271] Levy, R., and E. Borenstein. 2013. Metabolic modeling of species interaction in the human microbiome elucidates community-level assembly rules. Proceedings of the National Academy of Sciences 110:12804–12809. [272] Li, J., S. Browning, S. P. Mahal, A. M. Oelschlegel, and C. Weissmann. 2010. Darwinian evolution of prions in cell culture. Science 327:869–872. [273] Li, W., L. Fu, B. Niu, S. Wu, and J. Wooley. 2012. Ultrafast clustering algorithms for metagenomic sequence analysis. Briefings in Bioinformatics 13:656–668. [274] Li, X. N., G. T. Taylor, Y. Astor, R. Varela, and M. I. Scranton. 2012. The conundrum between chemoautotrophic production and reductant and oxidant supply: A case study from the Cariaco Basin. Deep Sea Research Part I: Oceanographic Research Papers 61:1–10. [275] Lilley, M. D., J. A. Baross, and L. I. Gordon. 1982. Dissolved hydrogen and methane in Saanich Inlet, British Columbia. Deep Sea Research Part A.
Oceanographic Research Papers 29:1471–1484. [276] Lima-Mendez, G., K. Faust, N. Henry, J. Decelle, S. Colin, F. Carcillo, S. Chaffron, J. C. Ignacio-Espinosa, S. Roux, F. Vincent, L. Bittner, Y. Darzi, J. Wang, S. Audic, L. Berline, G. Bontempi, A. M. Cabello, L. Coppola, F. M. Cornejo-Castillo, F. d’Ovidio, L. De Meester, I. Ferrera, M.-J. Garet-Delmas, L. Guidi, E. Lara, S. Pesant, M. Royo-Llonch, G. Salazar, P. Sánchez, M. Sebastian, C. Souffreau, C. Dimier, M. Picheral, S. Searson, S. Kandels-Lewis, T. O. coordinators, G. Gorsky, F. Not, H. Ogata, S. Speich, L. Stemmann, J. Weissenbach, P. Wincker, S. G. Acinas, S. Sunagawa, P. Bork, M. B. Sullivan, E. Karsenti, C. Bowler, C. de Vargas, and J. Raes. 2015. Determinants of community structure in the global plankton interactome. Science 348. [277] Limpiyakorn, T., M. Fürhacker, R. Haberl, T. Chodanon, P. Srithep, and P. Sonthiphand. 2013. amoA-encoding archaea in wastewater treatment plants: a review. Applied Microbiology and Biotechnology 97:1425–1439. [278] Locarnini, R. A., A. V. Mishonov, J. I. Antonov, T. P. Boyer, H. E. Garcia, O. K. Baranova, M. M. Zweng, C. R. Paver, J. R. Reagan, D. R. Johnson, M. Hamilton, and D. Seidov, 2013. Temperature. in S. Levitus and A. Mishonov, editors. World Ocean Atlas 2013, volume 1. NOAA Atlas NESDIS 73. [279] Lopez, L. C. S., and R. I. Rios. 2001. Phytotelmata faunal communities in sun-exposed versus shaded terrestrial bromeliads from Southeastern Brazil. Selbyana 22:219–224. [280] López-García, P., and D. Moreira. 1999. Metabolic symbiosis at the origin of eukaryotes. Trends in Biochemical Sciences 24:88–93. [281] Loreau, M. 2004. Does functional redundancy exist? Oikos 104:606–611. [282] Loreau, M., and C. Mazancourt. 2013. Biodiversity and ecosystem stability: a synthesis of underlying mechanisms. Ecology Letters 16:106–115. [283] Loreau, M., S. Naeem, P. Inchausti, J. Bengtsson, J. Grime, A. Hector, D. Hooper, M. Huston, D. Raffaelli, B. Schmid, et al. 2001.
Biodiversity and ecosystem functioning: current knowledge and future challenges. Science 294:804–808. [284] Louca, S., and M. Doebeli. 2015. Calibration and analysis of genome-based models for microbial ecology. eLife 4:e08208. [285] Louca, S., and M. Doebeli. 2015. Transient dynamics of competitive exclusion in microbial communities. Environmental Microbiology 18:1863–1874. [286] Louca, S., and M. Doebeli. 2016. Reaction-centric modeling of microbial ecosystems. Ecological Modelling 335:74–86. [287] Louca, S., A. K. Hawley, S. Katsev, M. Torres-Beltran, M. P. Bhatia, S. Kheirandish, C. C. Michiels, D. Capelle, G. Lavik, M. Doebeli, S. A. Crowe, and S. J. Hallam. 2016. Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone. Proceedings of the National Academy of Sciences. [288] Louca, S., S. M. S. Jacques, A. P. F. Pires, J. S. Leal, D. S. Srivastava, L. W. Parfrey, V. F. Farjalla, and M. Doebeli. in review. Functional stability despite high taxonomic variability across microbial communities. [289] Louca, S., L. W. Parfrey, and M. Doebeli. 2016. Decoupling function and taxonomy in the global ocean microbiome. Science 353:1272–1277. [290] Luo, Y., L. A. Miller, B. De Baere, M. Soon, and R. Francois. 2014. POC fluxes measured by sediment traps and 234Th:238U disequilibrium in Saanich Inlet, British Columbia. Marine Chemistry 162:19–29. [291] Lyons, L. 1986. Statistics for Nuclear and Particle Physicists. Cambridge University Press, Cambridge, UK. [292] Mahadevan, R., J. S. Edwards, and F. J. Doyle III. 2002. Dynamic flux balance analysis of diauxic growth in Escherichia coli. Biophysical Journal 83:1331–1340. [293] Maier, U., M. Losen, and J. Büchs. 2004. Advances in understanding and modeling the gas–liquid mass transfer in shake flasks. Biochemical Engineering Journal 17:155–167. [294] Maixner, F., D. R. Noguera, B. Anneser, K. Stoecker, G. Wegl, M. Wagner, and H. Daims. 2006.
Nitrite concentration influences the population structure of Nitrospira-like bacteria. Environmental Microbiology 8:1487–1495. [295] Manning, C. C., R. C. Hamme, and A. Bourbonnais. 2010. Impact of deep-water renewal events on fixed nitrogen loss from seasonally-anoxic Saanich Inlet. Marine Chemistry 122:1–10. [296] Marino, N. A. C., D. S. Srivastava, and V. F. Farjalla. 2013. Aquatic macroinvertebrate community composition in tank-bromeliads is determined by bromeliad species and its constrained characteristics. Insect Conservation and Diversity 6:372–380. [297] Marjanovic, O., B. Lennox, D. Sandoz, K. Smith, and M. Crofts. 2006. Real-time monitoring of an industrial batch process. Computers & Chemical Engineering 30:1476–1481. [298] Maron, P.-A., L. Ranjard, C. Mougel, and P. Lemanceau. 2007. Metaproteomics: A new approach for studying functional microbial ecology. Microbial Ecology 53:486–493. [299] Marouga, R., and S. Kjelleberg. 1996. Synthesis of immediate upshift (Iup) proteins during recovery of marine Vibrio sp. strain S14 subjected to long-term carbon starvation. Journal of Bacteriology 178:817–822. [300] Marsh, K., G. Sims, and R. Mulvaney. 2005. Availability of urea to autotrophic ammonia-oxidizing bacteria as related to the fate of 14C- and 15N-labeled urea added to soil. Biology and Fertility of Soils 42:137–145. [301] Marshall, K. T., and R. M. Morris. 2013. Isolation of an aerobic sulfur oxidizer from the SUP05/Arctic96BD-19 clade. ISME Journal 7:452–455. [302] Martens-Habbena, W., P. M. Berube, H. Urakawa, J. R. de la Torre, and D. A. Stahl. 2009. Ammonia oxidation kinetics determine niche separation of nitrifying Archaea and Bacteria. Nature 461:976–979. [303] Martinez-Camara, M., B. Béjar Haro, A. Stohl, and M. Vetterli. 2014. A robust method for inverse transport modeling of atmospheric emissions using blind outlier detection. Geoscientific Model Development 7:2303–2311. [304] Martinson, G. O., F. A. Werner, C. Scherber, R. Conrad, M. D. Corre, H.
Flessa, K. Wolf, M. Klose, S. R. Gradstein, and E. Veldkamp. 2010. Methane emissions from tank bromeliads in neotropical forests. Nature Geoscience 3:766–769. [305] Martiny, A. C., A. P. Tai, D. Veneziano, F. Primeau, and S. W. Chisholm. 2009. Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environmental Microbiology 11:823–832. [306] Martiny, J. B. H., B. J. Bohannan, J. H. Brown, R. K. Colwell, J. A. Fuhrman, J. L. Green, M. C. Horner-Devine, M. Kane, J. A. Krumins, C. R. Kuske, et al. 2006. Microbial biogeography: putting microorganisms on the map. Nature Reviews Microbiology 4:102–112. [307] Martiny, J. B. H., J. A. Eisen, K. Penn, S. D. Allison, and M. C. Horner-Devine. 2011. Drivers of bacterial β-diversity depend on spatial scale. Proceedings of the National Academy of Sciences 108:7850–7854. [308] Martiny, J. B. H., S. E. Jones, J. T. Lennon, and A. C. Martiny. 2015. Microbiomes in light of traits: A phylogenetic perspective. Science 350. [309] MATLAB. 2010. version 7.10.0 (R2010a). The MathWorks Inc., Natick, Massachusetts. [310] McCann, K. 2000. The diversity-stability debate. Nature 405:228–233. [311] McCutcheon, J. P., and N. A. Moran. 2012. Extreme genome reduction in symbiotic bacteria. Nature Reviews Microbiology 10:13–26. [312] McDuffie, N. G. 1991. Bioreactor Design Fundamentals. Butterworth-Heinemann, USA. [313] McFadden, G. I. 1999. Endosymbiosis and evolution of the plant cell. Current Opinion in Plant Biology 2:513–519. [314] McGuinness, L. M., M. Salganik, L. Vega, K. D. Pickering, and L. J. Kerkhof. 2006. Replicability of bacterial communities in denitrifying bioreactors as measured by PCR/T-RFLP analysis. Environmental Science & Technology 40:509–515. [315] McInerney, M. J., C. G. Struchtemeyer, J. Sieber, H. Mouttaki, A. J. M. Stams, B. Schink, L. Rohlin, and R. P. Gunsalus. 2008. Physiology, ecology, phylogeny, and genomics of microorganisms capable of syntrophic metabolism.
Annals of the New York Academy of Sciences 1125:58–72. [316] McMahon, K. D., H. G. Martin, and P. Hugenholtz. 2007. Integrating ecology into biotechnology. Current Opinion in Biotechnology 18:287–292. [317] McMurdie, P. J., and S. Holmes. 2014. Waste not, want not: Why rarefying microbiome data is inadmissible. PLoS Computational Biology 10:e1003531. [318] Meadows, A. L., R. Karnik, H. Lam, S. Forestell, and B. Snedecor. 2010. Application of dynamic flux balance analysis to an industrial Escherichia coli fermentation. Metabolic Engineering 12:150–160. [319] Meile, L., U. Jenal, D. Studer, M. Jordan, and T. Leisinger. 1989. Characterization of ψM1, a virulent phage of Methanobacterium thermoautotrophicum Marburg. Archives of Microbiology 152:105–110. [320] Meunier, A., and S. Jacquet. 2015. Do phages impact microbial dynamics, prokaryotic community structure and nutrient dynamics in Lake Bourget? Biology Open 4:1528–1537. [321] Meurant, G. 1989. Advances in Microbial Physiology. Academic Press. [322] Middelboe, M., A. Hagström, N. Blackburn, B. Sinn, U. Fischer, N. Borch, J. Pinhassi, K. Simu, and M. Lorenz. 2001. Effects of bacteriophages on the population dynamics of four strains of pelagic marine bacteria. Microbial Ecology 42:395–406. [323] Middelburg, J. J., T. Vlug, F. Jaco, and W. Van Der Nat. 1993. Organic matter mineralization in marine systems. Global and Planetary Change 8:47–58. [324] Miki, T., T. Nakazawa, T. Yokokawa, and T. Nagata. 2008. Functional consequences of viral impacts on bacterial communities: a food-web model analysis. Freshwater Biology 53:1142–1153. [325] Millero, F. J. 2013. Chemical Oceanography. 4 edition. Marine Science Series, Taylor & Francis, Bosa Roca. [326] Mitchell, A., G. H. Romano, B. Groisman, A. Yona, E. Dekel, M. Kupiec, O. Dahan, and Y. Pilpel. 2009. Adaptive prediction of environmental changes by microorganisms. Nature 460:220–224. [327] Mitri, S., and K. R. Foster. 2013.
The genotypic view of social interactions in microbial communities. Annual Review of Genetics 47:247–273. [328] Mladenovska, Z., and B. K. Ahring. 2000. Growth kinetics of thermophilic Methanosarcina spp. isolated from full-scale biogas plants treating animal manures. FEMS Microbiology Ecology 31:225–229. [329] Montgomery, H., N. Thom, and A. Cockburn. 1964. Determination of dissolved oxygen by the Winkler method and the solubility of oxygen in pure water and sea water. Journal of Applied Chemistry 14:280–296. [330] Moran, M. A., B. Satinsky, S. M. Gifford, H. Luo, A. Rivers, L.-K. Chan, J. Meng, B. P. Durham, C. Shen, V. A. Varaljay, C. B. Smith, P. L. Yager, and B. M. Hopkinson. 2013. Sizing up metatranscriptomics. ISME Journal 7:237–243. [331] Morgan, J. L., A. E. Darling, and J. A. Eisen. 2010. Metagenomic sequencing of an in vitro-simulated microbial community. PLoS ONE 5:e10209. [332] Morisita, M. 1959. Measuring of interspecific association and similarity between communities. Mem. Fac. Sci. Kyushu Univ. Series E 3:65–80. [333] Morris, B. E., R. Henneberger, H. Huber, and C. Moissl-Eichinger. 2013. Microbial syntrophy: interaction for the common good. FEMS Microbiology Reviews 37:384–406. [334] Morris, J. J., R. E. Lenski, and E. R. Zinser. 2012. The Black Queen Hypothesis: evolution of dependencies through adaptive gene loss. MBio 3:e00036–12. [335] Muegge, B. D., J. Kuczynski, D. Knights, J. C. Clemente, A. González, L. Fontana, B. Henrissat, R. Knight, and J. I. Gordon. 2011. Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans. Science 332:970–974. [336] Muller, A. L., K. U. Kjeldsen, T. Rattei, M. Pester, and A. Loy. 2015. Phylogenetic and environmental diversity of DsrAB-type dissimilatory (bi)sulfite reductases. ISME Journal 9:1152–1165. [337] Murphy, K. R., A. Hambly, S. Singh, R. K. Henderson, A. Baker, R. Stuetz, and S. J. Khan. 2011.
Organic matter fluorescence in municipal water recycling schemes: Toward a unified PARAFAC model. Environmental Science & Technology 45:2909–2916. [338] Murphy, K. R., C. A. Stedmon, D. Graeber, and R. Bro. 2013. Fluorescence spectroscopy and multi-way techniques. PARAFAC. Analytical Methods 5:6557–6566. [339] Murphy, K. R., C. A. Stedmon, P. Wenig, and R. Bro. 2014. OpenFluor - an online spectral library of auto-fluorescence by organic compounds in the environment. Analytical Methods 6:658–661. [340] Murray, J. W., V. Grundmanis, and W. M. Smethie Jr. 1978. Interstitial water chemistry in the sediments of Saanich Inlet. Geochimica et Cosmochimica Acta 42:1011–1026. [341] Murray, J. W., B. B. Jorgensen, H. Fossing, C. O. Wirsen, and H. W. Jannasch. 1991. Sulfide oxidation in the anoxic Black Sea chemocline. Deep Sea Research Part A. Oceanographic Research Papers 38:S1083–S1103. [342] NASA Earth Observations, 2015. Solar Insolation (1 month average). Online database (Accessed November 21, 2015). URL http://neo.sci.gsfc.nasa.gov/view.php?datasetId=CERES_INSOL_M. [343] Nazareno, A. G., and W. F. Laurance. 2015. Brazil’s drought: Beware deforestation. Science 347:1427–1427. [344] Neidhardt, F. C., and H. E. Umbarger, 1996. Chemical Composition of Escherichia coli. Chapter 3 in F. C. Neidhardt, editor. Escherichia coli and Salmonella: Cellular and Molecular Biology, volume 1. American Society of Microbiology (ASM) Press, 2 edition. [345] Ngai, J. T., and D. S. Srivastava. 2006. Predators accelerate nutrient cycling in a bromeliad ecosystem. Science 314:963–963. [346] Nichols, D. 2007. Cultivation gives context to the microbial ecologist. FEMS Microbiology Ecology 60:351–357. [347] Nissenbaum, A., B. J. Presley, and I. R. Kaplan. 1972. Early diagenesis in a reducing fjord, Saanich Inlet, British Columbia – I. Chemical and isotopic changes in major components of interstitial water. Geochimica et Cosmochimica Acta 36:1007–1027. [348] Ochman, H., J. G. Lawrence, and E. A. Groisman.
2000. Lateral gene transfer andthe nature of bacterial innovation. Nature 405:299–304.[349] Ofiţeru, I. D., M. Lunn, T. P. Curtis, G. F. Wells, C. S. Criddle, C. A. Francis, andW. T. Sloan. 2010. Combined niche and neutral effects in a microbial wastewatertreatment community. Proceedings of the National Academy of Sciences 107:15345–15350.152Bibliography[350] Ogawa, H., Y. Amagai, I. Koike, K. Kaiser, and R. Benner. 2001. Production ofrefractory dissolved organic matter by bacteria. Science 292:917–920.[351] Oh, J., and J. Silverstein. 1999. Acetate limitation and nitrite accumulation duringdenitrification. Journal of Environmental Engineering 125:234–242.[352] Ohtsubo, S., K. Demizu, S. Kohno, I. Miura, T. Ogawa, and H. Fukuda. 1992. Compar-ison of acetate utilization among strains of an aceticlastic methanogen, Methanothrixsoehngenii. Applied and Environmental Microbiology 58:703–705.[353] Olsen, G. J., D. J. Lane, S. J. Giovannoni, N. R. Pace, and D. A. Stahl. 1986. Microbialecology and evolution: a ribosomal RNA approach. Annual Reviews in Microbiology40:337–365.[354] Orth, J. D., I. Thiele, and B. O. Palsson. 2010. What is flux balance analysis? NatureBiotechnology 28:245–248.[355] Osburn, C. L., C. R. Wigdahl, S. C. Fritz, and J. E. Saros. 2011. Dissolved or-ganic matter composition and photoreactivity in prairie lakes of the U.S. Great Plains.Limnology and Oceanography 56:2371–2390.[356] Overmann, J., and F. Garcia-Pichel, 2006. The Phototrophic Way of Life. Pages 32–85in M. Dworkin, S. Falkow, E. Rosenberg, K.-H. Schleifer, and E. Stackebrandt, editors.The Prokaryotes. Springer New York.[357] Ovreas, L., D. Bourne, R.-A. Sandaa, E. O. Casamayor, S. Benlloch, V. Goddard,G. Smerdon, M. Heldal, and T. F. Thingstad. 2003. Response of bacterial and viralcommunities to nutrient manipulations in seawater mesocosms. Aquatic MicrobialEcology 31:109–121.[358] Oxman, E., U. Alon, and E. Dekel. 2008. Defined order of evolutionary adaptations:experimental evidence. 
Evolution 62:1547–1554.[359] Pan, C., and J. Banfield, 2014. Quantitative Metaproteomics: Functional Insights intoMicrobial Communities. Pages 231–240 in I. T. Paulsen and A. J. Holmes, editors.Environmental Microbiology, volume 1096 of Methods in Molecular Biology. HumanaPress.[360] Panikov, N. S., and M. V. Sizova. 1996. A kinetic method for estimating the biomassof microbial functional groups in soil. Journal of Microbiological Methods 24:219–230.153Bibliography[361] Paoletti, A. C., T. J. Parmely, C. Tomomori-Sato, S. Sato, D. Zhu, R. C. Conaway,J. W. Conaway, L. Florens, and M. P. Washburn. 2006. Quantitative proteomic anal-ysis of distinct mammalian Mediator complexes using normalized spectral abundancefactors. Proceedings of the National Academy of Sciences 103:18928–18933.[362] Pausas, J. G., and M. Verdú. 2010. The jungle of methods for evaluating phenotypicand phylogenetic structure of communities. BioScience 60:614–625.[363] Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blon-del, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning inPython. Journal of Machine Learning Research 12:2825–2830.[364] Perez-Garcia, O., S. G. Villas-Boas, S. Swift, K. Chandran, and N. Singhal. 2014.Clarifying the regulation of NO/N2O production in Nitrosomonas europaea duringanoxic–oxic transition via flux balance analysis of a metabolic network model. WaterResearch 60:267–277.[365] Peterson, G., R. C. Allen, and S. C. Holling. 1998. Ecological resilience, biodiversity,and scale. Ecosystems 1:6–18.[366] Pinto, A. J., and N. G. Love. 2012. Bioreactor function under perturbation scenariosis affected by interactions between bacteria and protozoa. Environmental Science &Technology 46:7558–7566.[367] Plucain, J., T. Hindré, M. Le Gac, O. Tenaillon, S. Cruveiller, C. Médigue, N. Leiby,W. R. Harcombe, C. J. Marx, R. E. Lenski, and D. Schneider. 
2014. Epistasis andallele specificity in the emergence of a stable polymorphism in Escherichia coli. Science343:1366–1369.[368] Pommier, T., P. R. Neal, J. M. Gasol, C. Montserrat, S. G. Acinas, and C. Pedrós-Alió.2010. Spatial patterns of bacterial richness and evenness in the NW MediterraneanSea explored by pyrosequencing. Aquatic Microbial Ecology 61:221–233.[369] Poretsky, R. S., N. Bano, A. Buchan, G. LeCleir, J. Kleikemper, M. Pickering, W. M.Pate, M. A. Moran, and J. T. Hollibaugh. 2005. Analysis of microbial gene transcriptsin environmental samples. Applied and environmental microbiology 71:4121–4126.[370] Poughon, L., C.-G. Dussap, and J.-B. Gros. 2001. Energy model and metabolic fluxanalysis for autotrophic nitrifiers. Biotechnology and Bioengineering 72:416–433.154Bibliography[371] Powell, J. R., S. Karunaratne, C. D. Campbell, H. Yao, L. Robinson, and B. K.Singh. 2015. Deterministic processes vary during community assembly for ecologicallydissimilar taxa. Nature Communications 6.[372] Price, M. N., P. S. Dehal, and A. P. Arkin. 2009. FastTree: Computing large minimumevolution trees with profiles instead of a distance matrix. Molecular Biology andEvolution 26:1641–1650.[373] Prokop, A., L. E. Erickson, J. Fernandez, and A. E. Humphrey. 1969. Design andphysical characteristics of a multistage, continuous tower fermentor. Biotechnologyand Bioengineering 11:945–966.[374] Prokopenko, M. G., M. B. Hirst, L. De Brabandere, D. Lawrence, W. Berelson,J. Granger, B. X. Chang, S. Dawson, E. J. Crane III, L. Chong, et al. 2013. Nitrogenlosses in anoxic marine sediments driven by Thioploca-anammox bacterial consortia.Nature 500:194–198.[375] Prosser, J., 2005. Nitrogen in soils: Nitrification. Pages 31–39 in D. Hillel, editor.Encyclopedia of Soils in the Environment. Elsevier, Oxford, UK.[376] Prosser, J. I. 2015. Dispersing misconceptions and identifying opportunities for theuse of ‘omics’ in soil microbial ecology. 
Nature Reviews Microbiology 13:439–446.[377] Prosser, J. I., B. J. M. Bohannan, T. P. Curtis, R. J. Ellis, M. K. Firestone, R. P.Freckleton, J. L. Green, L. E. Green, K. Killham, J. J. Lennon, A. M. Osborn, M. Solan,C. J. van der Gast, and J. P. W. Young. 2007. The role of ecological theory in microbialecology. Nature Reviews Microbiology 5:384–392.[378] Pruesse, E., C. Quast, K. Knittel, B. M. Fuchs, W. Ludwig, J. Peplies, and F. O.Glöckner. 2007. SILVA: a comprehensive online resource for quality checked andaligned ribosomal RNA sequence data compatible with ARB: a comprehensive onlineresource for quality checked and aligned ribosomal RNA sequence data compatiblewith ARB. Nucleic Acids Research 35:7188–7196.[379] Qin, J., R. Li, J. Raes, M. Arumugam, K. S. Burgdorf, C. Manichanh, T. Nielsen,N. Pons, F. Levenez, T. Yamada, D. R. Mende, J. Li, J. Xu, S. Li, D. Li, J. Cao,B. Wang, H. Liang, H. Zheng, Y. Xie, J. Tap, P. Lepage, M. Bertalan, J.-M. Batto,T. Hansen, D. Le Paslier, A. Linneberg, H. B. Nielsen, E. Pelletier, P. Renault,T. Sicheritz-Ponten, K. Turner, H. Zhu, C. Yu, S. Li, M. Jian, Y. Zhou, Y. Li, X. Zhang,S. Li, N. Qin, H. Yang, J. Wang, S. Brunak, J. Dore, F. Guarner, K. Kristiansen,155BibliographyO. Pedersen, J. Parkhill, J. Weissenbach, P. Bork, S. D. Ehrlich, and J. Wang. 2010.A human gut microbial gene catalogue established by metagenomic sequencing. Nature464:59–65.[380] Raes, J., and P. Bork. 2008. Molecular eco-systems biology: towards an understandingof community function. Nature Reviews Microbiology 6:693–699.[381] Raes, J., I. Letunic, T. Yamada, L. J. Jensen, and P. Bork. 2011. Toward moleculartrait-based ecology through integration of biogeochemical, geographical and metage-nomic data. Molecular Systems Biology 7:473.[382] Ramette, A. 2007. Multivariate analyses in microbial ecology. FEMS MicrobiologyEcology 62:142–160.[383] Ramos, J.-L. 2003. Lessons from the genome of a lithoautotroph: Making biomassfrom almost nothing. 
Journal of Bacteriology 185:2690–2691.[384] Raven, J., K. Caldeira, H. Elderfield, O. Hoegh-Guldberg, P. Liss, U. Riebesell, J. Shep-herd, C. Turley, and A. Watson. 2005. Ocean acidification due to increasing atmo-spheric carbon dioxide. The Royal Society.[385] Raymond, J., J. L. Siefert, C. R. Staples, and R. E. Blankenship. 2004. The naturalhistory of nitrogen fixation. Molecular Biology and Evolution 21:541–554.[386] Reed, D. C., C. K. Algar, J. A. Huber, and G. J. Dick. 2014. Gene-centric approachto integrating environmental genomics and biogeochemical models. Proceedings of theNational Academy of Sciences 111:1879–1884.[387] Reed, D. C., J. A. Breier, H. Jiang, K. Anantharaman, C. A. Klausmeier, B. M.Toner, C. Hancock, K. Speer, A. M. Thurnherr, and G. J. Dick. 2015. Predicting theresponse of the deep-ocean microbiome to geochemical perturbations by hydrothermalvents. ISME Journal 9:1857–1869.[388] Reed, H. E., and J. B. Martiny. 2013. Microbial composition affects the functioningof estuarine sediments. The ISME Journal 7:868–879.[389] Remacle, J., and J. De Leval, 1978. Approaches to nitrification in a river. Pages352–356 in D. Schlessinger, editor. Microbiology. American Society for Microbiology,Washington.156Bibliography[390] Resat, H., L. Petzold, and M. F. Pettigrew, 2009. Kinetic Modeling of BiologicalSystems. Pages 311–335 in R. Ireton, K. Montgomery, R. Bumgarner, R. Samudrala,and J. McDermott, editors. Computational Systems Biology, volume 541. HumanaPress, Totowa, NJ.[391] Reyes, A., N. P. Semenkovich, K. Whiteson, F. Rohwer, and J. I. Gordon. 2012.Going viral: next-generation sequencing applied to phage populations in the humangut. Nature Reviews Microbiology 10:607–617.[392] Ridgwell, A., J. Hargreaves, N. R. Edwards, J. Annan, T. M. Lenton, R. Marsh,A. Yool, and A. Watson. 2007. Marine geochemical data assimilation in an efficientEarth System Model of global biogeochemical cycling. Biogeosciences 4:87–104.[393] Riley, M. A., , and J. E. Wertz. 
2002. Bacteriocins: Evolution, Ecology, and Applica-tion. Annual Review of Microbiology 56:117–137.[394] Roach, G. 1982. Green’s Functions. Cambridge University Press.[395] Rocha, C., L. Cogliatti-Carvalho, D. Almeida, and A. Freitas. 2000. Bromeliads:biodiversity amplifiers. Journal of Bromeliad Society 50:81–83.[396] Roden, E. E., and Q. Jin. 2011. Thermodynamics of microbial growth coupled tometabolism of glucose, ethanol, short-chain organic acids, and hydrogen. Applied andEnvironmental Microbiology 77:1907–1909.[397] Rodriguez-Brito, B., L. Li, L. Wegley, M. Furlan, F. Angly, M. Breitbart, J. Buchanan,C. Desnues, E. Dinsdale, R. Edwards, B. Felts, M. Haynes, H. Liu, D. Lipson, J. Ma-haffy, A. B. Martin-Cuadrado, A. Mira, J. Nulton, L. Pasic, S. Rayhawk, J. Rodriguez-Mueller, F. Rodriguez-Valera, P. Salamon, S. Srinagesh, T. F. Thingstad, T. Tran,R. V. Thurber, D. Willner, M. Youle, and F. Rohwer. 2010. Viral and microbialcommunity dynamics in four aquatic environments. ISME Journal 4:739–751.[398] Rodriguez-Valera, F., A.-B. Martin-Cuadrado, B. Rodriguez-Brito, L. Pasic, T. F.Thingstad, F. Rohwer, and A. Mira. 2009. Explaining microbial population genomicsthrough phage predation. Nat Rev Micro 7:828–836.[399] Roesch, L. F. W., R. R. Fulthorpe, A. Riva, G. Casella, A. K. M. Hadwin, A. D.Kent, S. H. Daroub, F. A. O. Camargo, W. G. Farmerie, and E. W. Triplett. 2007.Pyrosequencing enumerates and contrasts soil microbial diversity. ISME Journal 1:283–290.157Bibliography[400] Roeselers, G., I. L. Newton, T. Woyke, T. A. Auchtung, G. F. Dilly, R. J. Dutton,M. C. Fisher, K. M. Fontanez, E. Lau, F. J. Stewart, et al. 2010. Complete genomesequence of Candidatus Ruthia magnifica. Standards in genomic sciences 3:163.[401] Röling, W. F., M. Ferrer, and P. N. Golyshin. 2010. Systems approaches to microbialcommunities and their functioning. Current Opinion in Biotechnology 21:532–538.[402] Rosenfeld, N., and U. Alon. 2003. 
Response delays and the structure of transcriptionnetworks. Journal of Molecular Biology 329:645–654.[403] Rosenfeld, N., M. B. Elowitz, and U. Alon. 2002. Negative autoregulation speeds theresponse times of transcription networks. Journal of Molecular Biology 323:785–793.[404] Rothschild, L. J., and R. L. Mancinelli. 2001. Life in extreme environments. Nature409:1092–1101.[405] Roux, S., A. K. Hawley, M. Torres Beltran, M. Scofield, P. Schwientek,R. Stepanauskas, T. Woyke, S. J. Hallam, and M. B. Sullivan. 2014. Ecology andevolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell-and meta-genomics. eLife 3:e03125.[406] Rusch, D. B., A. L. Halpern, G. Sutton, K. B. Heidelberg, S. Williamson, S. Yooseph,D. Wu, J. A. Eisen, J. M. Hoffman, K. Remington, K. Beeson, B. Tran, H. Smith,H. Baden-Tillson, C. Stewart, J. Thorpe, J. Freeman, C. Andrews-Pfannkoch, J. E.Venter, K. Li, S. Kravitz, J. F. Heidelberg, T. Utterback, Y.-H. Rogers, L. I. Falcón,V. Souza, G. Bonilla-Rosso, L. E. Eguiarte, D. M. Karl, S. Sathyendranath, T. Platt,E. Bermingham, V. Gallardo, G. Tamayo-Castillo, M. R. Ferrari, R. L. Strausberg,K. Nealson, R. Friedman, M. Frazier, and J. C. Venter. 2007. The Sorcerer II GlobalOcean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific.PLoS Biol 5:e77.[407] Santoro, A. E., C. Buchwald, M. R. McIlvin, and K. L. Casciotti. 2011. Isotopicsignature of N2O produced by marine ammonia-oxidizing archaea. Science 333:1282–1285.[408] Sañudo-Wilhelmy, S. A., L. S. Cutter, R. Durazo, E. A. Smail, L. Gómez-Consarnau,E. A. Webb, M. G. Prokopenko, W. M. Berelson, and D. M. Karl. 2012. MultipleB-vitamin depletion in large areas of the coastal ocean. Proceedings of the NationalAcademy of Sciences 109:14041–14045.158Bibliography[409] Sañudo-Wilhelmy, S. A., L. Gómez-Consarnau, C. Suffridge, and E. A. Webb. 2014.The role of B vitamins in marine biogeochemistry. Annual Review of Marine Science6:339–367.[410] Schink, B., and A. 
Stams, 2006. Syntrophism among Prokaryotes. Pages 309–335 inM. Dworkin, S. Falkow, E. Rosenberg, K.-H. Schleifer, and E. Stackebrandt, editors.The Prokaryotes. Springer, New York.[411] Schneider, S. 2004. Scientists Debate Gaia: The Next Century. MIT Press, Cambridge,Massachusetts.[412] Schoener, T. W. 1986. Mechanistic approaches to community ecology: a new reduc-tionism. American Zoologist 26:81–106.[413] Schrödinger, E. 1944. What is life? Cambridge University Press.[414] Schulthess, R. V., and W. Gujer. 1996. Release of nitrous oxide (N2O) from denitri-fying activated sludge: Verification and application of a mathematical model. WaterResearch 30:521–530.[415] Schunck, H., G. Lavik, D. K. Desai, T. Großkopf, T. Kalvelage, C. R. Löscher, A. Paul-mier, S. Contreras, H. Siegel, M. Holtappels, P. Rosenstiel, M. B. Schilhabel, M. Graco,R. A. Schmitz, M. M. M. Kuypers, and J. LaRoche. 2013. Giant hydrogen sulfideplume in the oxygen minimum zone off Peru supports chemolithoautotrophy. PLoSONE 8:e68661.[416] Scranton, M. I., G. T. Taylor, R. Thunell, C. R. Benitez-Nelson, F. Muller-Karger,K. Fanning, L. Lorenzoni, E. Montes, R. Varela, and Y. Astor. 2014. Interannual andsubdecadal variability in the nutrient geochemistry of the Cariaco Basin. Oceanography27:148–159.[417] Segrè, D., D. Vitkup, and G. M. Church. 2002. Analysis of optimality in naturaland perturbed metabolic networks. Proceedings of the National Academy of Sciences99:15112–15117.[418] Shade, A., S. E. Jones, J. G. Caporaso, J. Handelsman, R. Knight, N. Fierer, andJ. A. Gilbert. 2014. Conditionally rare taxa disproportionately contribute to temporalchanges in microbial diversity. mBio 5:e01371–14.[419] Shade, A., H. Peter, S. D. Allison, D. L. Baho, M. Berga, H. Bürgmann, D. H. Huber,S. Langenheder, J. T. Lennon, J. B. Martiny, et al. 2012. Fundamentals of microbialcommunity resistance and resilience. Frontiers in Microbiology 3:417.159Bibliography[420] Shah, V., and R. M. Morris. 2015. 
Genome sequence of “Candidatus Thioglobus au-totrophica” strain EF1, a chemoautotroph from the SUP05 clade of marine gammapro-teobacteria. Genome Announcements 3:e01156–15.[421] Shao, J. 1993. Linear model selection by cross-validation. Journal of the Americanstatistical Association 88:486–494.[422] Shapiro, B. J., and M. F. Polz. 2014. Ordering microbial diversity into ecologicallyand genetically cohesive units. Trends in Microbiology 22:235–247.[423] Shapiro, O. H., and A. Kushmaro. 2011. Bacteriophage ecology in environmentalbiotechnology processes. Current Opinion in Biotechnology 22:449–455.[424] Shapiro, O. H., A. Kushmaro, and A. Brenner. 2010. Bacteriophage predation reg-ulates microbial abundance and diversity in a full-scale bioreactor treating industrialwastewater. ISME Journal 4:327–336.[425] Sheintuch, M., B. Tartakovsky, N. Narkis, and M. Rebhun. 1995. Substrate inhibitionand multiple states in a continuous nitrification process. Water Research 29:953–963.[426] Simkins, S., and M. Alexander. 1984. Models for mineralization kinetics with the vari-ables of substrate concentration and population density. Applied and EnvironmentalMicrobiology 47:1299–1306.[427] Sloan, W. T., M. Lunn, S. Woodcock, I. M. Head, S. Nee, and T. P. Curtis. 2006.Quantifying the roles of immigration and chance in shaping prokaryote communitystructure. Environmental Microbiology 8:732–740.[428] Smets, W., J. Leff, M. Bradford, R. McCulley, S. Lebeer, and N. Fierer. 2016. Amethod for simultaneous measurement of soil bacterial abundances and communitycomposition via 16S rRNA gene sequencing. Soil Biology & Biochemistry 96:145–151.[429] Smith, D. P., and P. L. McCarty. 1990. Factors governing methane fluctuationsfollowing shock loading of digesters. Research Journal of the Water Pollution ControlFederation 62:58–64.[430] Smith, M. B., A. M. Rocha, C. S. Smillie, S. W. Olesen, C. Paradis, L. Wu, J. H.Campbell, J. L. Fortney, T. L. Mehlhorn, K. A. Lowe, et al. 2015. 
Natural bacterialcommunities serve as quantitative geochemical biosensors. mBio 6:e00326–15.160Bibliography[431] Smith, S. L., Y. Yamanaka, M. Pahlow, and A. Oschlies. 2009. Optimal uptake kinetics:physiological acclimation explains the pattern of nitrate uptake by phytoplankton inthe ocean. Marine Ecology Progress Series 384:1–12.[432] Sommer, U. 1984. The paradox of the plankton: Fluctuations of phosphorus avail-ability maintain diversity of phytoplankton in flow-through cultures. Limnology andOceanography 29:633–636.[433] Song, H.-S., W. R. Cannon, A. S. Beliaev, and A. Konopka. 2014. Mathematicalmodeling of microbial community dynamics: a methodological review. Processes 2:711–752.[434] Sontag, E. 2013. Mathematical Control Theory: Deterministic Finite DimensionalSystems. Texts in Applied Mathematics, Springer New York.[435] Sorokin, D. Y., M. Foti, H. C. Pinkart, and G. Muyzer. 2007. Sulfur-oxidizing bacteriain Soap Lake (Washington State), a meromictic, haloalkaline lake with an unprece-dented high sulfide content. Applied and Environmental Microbiology 73:451–455.[436] Spencer, C. C., G. Saxer, M. Travisano, and M. Doebeli. 2007. Seasonal resourceoscillations maintain diversity in bacterial microcosms. Evolutionary Ecology Research9:775.[437] Spencer, C. C., J. Tyerman, M. Bertrand, and M. Doebeli. 2008. Adaptation increasesthe likelihood of diversification in an experimental bacterial lineage. Proceedings ofthe National Academy of Sciences 105:1585–1589.[438] Srivastava, D. S., J. Kolasa, J. Bengtsson, A. Gonzalez, S. P. Lawler, T. E. Miller,P. Munguia, T. Romanuk, D. C. Schneider, and M. Trzcinski. 2004. Are naturalmicrocosms useful model systems for ecology? Trends in Ecology & Evolution 19:379–384.[439] Stackebrandt, E., and J. Ebers. 2006. Taxonomic parameters revisited: tarnished goldstandards. Microbiology Today 33:152.[440] Stackebrandt, E., W. Frederiksen, G. M. Garrity, P. A. D. Grimont, P. Kämpfer,M. C. J. Maiden, X. Nesme, R. Rosselló-Mora, J. 
Swings, H. G. Trüper, L. Vauterin,A. C. Ward, and W. B. Whitman. 2002. Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. International Journal of Systematicand Evolutionary Microbiology 52:1043–1047.161Bibliography[441] Stahl, D. A., and J. de la Torre. 2012. Physiology and diversity of ammonia-oxidizingarchaea. Annual Review of Microbiology 66:83–101.[442] Stams, A. J. 1994. Metabolic interactions between anaerobic bacteria in methanogenicenvironments. Antonie van Leeuwenhoek 66:271–294.[443] Starkenburg, S. R., P. S. G. Chain, L. A. Sayavedra-Soto, L. Hauser, M. L. Land,F. W. Larimer, S. A. Malfatti, M. G. Klotz, P. J. Bottomley, D. J. Arp, and W. J.Hickey. 2006. Genome sequence of the chemolithoautotrophic nitrite-oxidizing bac-terium Nitrobacter winogradskyi Nb-255. Applied and Environmental Microbiology72:2050–2063.[444] Stedmon, C. A., and R. Bro. 2008. Characterizing dissolved organic matter fluorescencewith parallel factor analysis: a tutorial. Limnology and Oceanography: Methods 6:572–579.[445] Stedmon, C. A., S. Markager, and R. Bro. 2003. Tracing dissolved organic matterin aquatic environments using a new approach to fluorescence spectroscopy. MarineChemistry 82:239–254.[446] Steinkamp, K., 2011. Inverse modeling of the sources and sinks of atmospheric CO2:joint constraints from the ocean and atmosphere. Ph.D. thesis, ETH Zurich.[447] Sterner, R., and J. Elser. 2002. Ecological Stoichiometry: The Biology of Elementsfrom Molecules to the Biosphere. Princeton University Press.[448] Stewart, F. J., O. Ulloa, and E. F. DeLong. 2012. Microbial metatranscriptomics in apermanent marine oxygen minimum zone. Environmental Microbiology 14:23–40.[449] Stewart, F. M., and B. R. Levin. 1973. Partitioning of resources and the outcome ofinterspecific competition: A model and some general considerations. The AmericanNaturalist 107:171–198.[450] Stocker, R. 2012. Marine microbes see a sea of gradients. 
Science 338:628–633.[451] Stocker, R. 2015. The 100 µm length scale in the microbial ocean. Aquat Microb Ecol76:189–194.[452] Stolper, D. A., N. P. Revsbech, and D. E. Canfield. 2010. Aerobic growth at nanomolaroxygen concentrations. Proceedings of the National Academy of Sciences 107:18755–18760.162Bibliography[453] Stolyar, S., S. Van Dien, K. L. Hillesland, N. Pinel, T. J. Lie, J. A. Leigh, and D. A.Stahl. 2007. Metabolic modeling of a mutualistic microbial community. MolecularSystems Biology 3:92.[454] Strickland, M. S., C. Lauber, N. Fierer, and M. A. Bradford. 2009. Testing thefunctional significance of microbial community composition. Ecology 90:441–451.[455] Strom, S. L. 2008. Microbial ecology of ocean biogeochemistry: A community per-spective. Science 320:1043–1045.[456] Strona, G., D. Nappo, F. Boccacci, S. Fattorini, and J. San-Miguel-Ayanz. 2014. Afast and unbiased procedure to randomize ecological binary matrices with fixed rowand column totals. Nature Communications 5.[457] Strous, M., J. A. Fuerst, E. H. M. Kramer, S. Logemann, G. Muyzer, K. T. van dePas-Schoonen, R. Webb, J. G. Kuenen, and M. S. M. Jetten. 1999. Missing lithotrophidentified as new planctomycete. Nature 400:446–449.[458] Strous, M., E. Pelletier, S. Mangenot, T. Rattei, A. Lehner, M. W. Taylor, M. Horn,H. Daims, D. Bartol-Mavel, P. Wincker, V. Barbe, N. Fonknechten, D. Vallenet, B. Se-gurens, C. Schenowitz-Truong, C. Médigue, A. Collingro, B. Snel, B. E. Dutilh, H. J. M.Op den Camp, C. van der Drift, I. Cirpus, K. T. van de Pas-Schoonen, H. R. Harhangi,L. van Niftrik, M. Schmid, J. Keltjens, J. van de Vossenberg, B. Kartal, H. Meier, D. Fr-ishman, M. A. Huynen, H.-W. Mewes, J. Weissenbach, M. S. M. Jetten, M. Wagner,and D. Le Paslier. 2006. Deciphering the evolution and metabolism of an anammoxbacterium from a community genome. Nature 440:790–794.[459] Sul, W. J., T. A. Oliver, H. W. Ducklow, L. A. Amaral-Zettler, and M. L. Sogin. 2013.Marine bacteria exhibit a bipolar distribution. 
Proceedings of the National Academyof Sciences 110:2342–2347.[460] Sunagawa, S., L. P. Coelho, S. Chaffron, J. R. Kultima, K. Labadie, G. Salazar, B. Dja-hanschiri, G. Zeller, D. R. Mende, A. Alberti, F. M. Cornejo-Castillo, P. I. Costea,C. Cruaud, F. d’Ovidio, S. Engelen, I. Ferrera, J. M. Gasol, L. Guidi, F. Hildebrand,F. Kokoszka, C. Lepoivre, G. Lima-Mendez, J. Poulain, B. T. Poulos, M. Royo-Llonch,H. Sarmento, S. Vieira-Silva, C. Dimier, M. Picheral, S. Searson, S. Kandels-Lewis,T. O. coordinators, C. Bowler, C. de Vargas, G. Gorsky, N. Grimsley, P. Hingamp,D. Iudicone, O. Jaillon, F. Not, H. Ogata, S. Pesant, S. Speich, L. Stemmann, M. B.Sullivan, J. Weissenbach, P. Wincker, E. Karsenti, J. Raes, S. G. Acinas, and P. Bork.2015. Structure and function of the global ocean microbiome. Science 348.163Bibliography[461] Sundh, I., H. Carlsson, Å. Nordberg, M. Hansson, and B. Mathisen. 2003. Effectsof glucose overloading on microbial community structure and biogas production in alaboratory-scale anaerobic digester. Bioresource Technology 89:237–243.[462] Suttle, C. A. 2007. Marine viruses – major players in the global ecosystem. NatureReviews Microbiology 5:801–812.[463] Suzuki, I., U. Dular, and S. C. Kwok. 1974. Ammonia or ammonium ion as substratefor oxidation by Nitrosomonas europaea cells and extracts. Journal of Bacteriology120:556–558.[464] Takahashi, T., S. Sutherland, D. Chipman, J. Goddard, C. Ho, T. Newberger,C. Sweeney, and D. Munro. 2014. Climatological distributions of pH, pCO2, to-tal CO2, alkalinity, and CaCO3 saturation in the global surface ocean, and temporalchanges at selected locations. Marine Chemistry 164:95–125.[465] Takayama, K., and S. Kjelleberg. 2000. The role of RNA stability during bacterialstress responses and starvation. Environmental Microbiology 2:355–365.[466] Talbot, J. M., T. D. Bruns, J. W. Taylor, D. P. Smith, S. Branco, S. I. Glassman,S. Erlandson, R. Vilgalys, H.-L. Liao, M. E. Smith, and K. G. Peay. 2014. 
Endemismand functional convergence across the North American soil mycobiome. Proceedingsof the National Academy of Sciences 111:6341–6346.[467] Tappe, W., A. Laverman, M. Bohland, M. Braster, S. Rittershaus, J. Groeneweg, andH. W. van Verseveld. 1999. Maintenance energy demand and starvation recoverydynamics of Nitrosomonas europaea and Nitrobacter winogradskyi cultivated in a re-tentostat with complete biomass retention. Applied and Environmental Microbiology65:2471–2477.[468] Tarantola, A. 2005. Inverse Problem Theory and Methods for Model ParameterEstimation. Society for Industrial and Applied Mathematics, Philadelphia.[469] Tatusov, R. L., M. Y. Galperin, D. A. Natale, and E. V. Koonin. 2000. The COGdatabase: a tool for genome-scale analysis of protein functions and evolution. NucleicAcids Research 28:33–36.[470] Tatusova, T., S. Ciufo, B. Fedorov, K. O’Neill, and I. Tolstoy. 2014. RefSeq micro-bial genomes database: new representation and annotation strategy. Nucleic AcidsResearch 42:D553–9.164Bibliography[471] Teeling, H., and F. O. Glöckner. 2012. Current opportunities and challenges in mi-crobial metagenome analysis—a bioinformatic perspective. Briefings in Bioinformatics13:728–742.[472] Thamdrup, B., T. Dalsgaard, and N. P. Revsbech. 2012. Widespread functional anoxiain the oxygen minimum zone of the Eastern South Pacific. Deep Sea Research Part I:Oceanographic Research Papers 65:36–45.[473] The Concord Consortium, 2016 (Accessed on May 24, 2016). Bénard Convection Cells.URL http://energy.concord.org/energy2d/bernard-cells.html.[474] Thingstad, T., and R. Lignell. 1997. Theoretical models for the control of bacterialgrowth rate, abundance, diversity and carbon demand. Aquatic Microbial Ecology13:19–27.[475] Thingstad, T. F. 2000. Elements of a theory for the mechanisms controlling abun-dance, diversity, and biogeochemical role of lytic bacterial viruses in aquatic systems.Limnology and Oceanography 45:1320–1328.[476] Thomas, R., L. Berdjeb, T. 
Sime-Ngando, and S. Jacquet. 2011. Viral abundance,production, decay rates and life strategies (lysogeny versus lysis) in Lake Bourget(France). Environmental Microbiology 13:616–630.[477] Thrash, J. C., J.-C. Cho, S. Ferriera, J. Johnson, K. L. Vergin, and S. J. Giovannoni.2010. Genome sequences of Pelagibaca bermudensis HTCC2601T and Maritimibacteralkaliphilus HTCC2654T, the type strains of two marine Roseobacter genera. Journalof Bacteriology 192:5552–5553.[478] Thullner, M., P. Regnier, and P. Van Cappellen. 2007. Modeling microbially inducedcarbon degradation in redox-stratified subsurface environments: concepts and openquestions. Geomicrobiology Journal 24:139–155.[479] Tilman, D. 1982. Resource Competition and Community Structure. Princeton Uni-versity Press, Princeton, NJ.[480] Tilman, D. 1996. Biodiversity: population versus ecosystem stability. Ecology 77:350–363.[481] Tilman, D., C. L. Lehman, and C. E. Bristow. 1998. Diversity-stability relationships:Statistical inevitability or ecological consequence? The American Naturalist 151:277–282.165Bibliography[482] Tirard, S., M. Morange, and A. Lazcano. 2010. The definition of life: A brief historyof an elusive scientific endeavor. Astrobiology 10:1003–1009.[483] Treangen, T. J., and E. P. C. Rocha. 2011. Horizontal transfer, not duplication, drivesthe expansion of protein families in prokaryotes. PLoS Genetics 7:e1001284.[484] Treves, D. S., S. Manning, and J. Adams. 1998. Repeated evolution of an acetate-crossfeeding polymorphism in long-term populations of Escherichia coli. MolecularBiology and Evolution 15:789–797.[485] Trifonov, E. N. 2011. Vocabulary of definitions of life suggests a definition. Journal ofBiomolecular Structure and Dynamics 29:259–266.[486] Tringe, S. G., C. von Mering, A. Kobayashi, A. A. Salamov, K. Chen, H. W. Chang,M. Podar, J. M. Short, E. J. Mathur, J. C. Detter, P. Bork, P. Hugenholtz, andE. M. Rubin. 2005. Comparative metagenomics of microbial communities. 
Science308:554–557.[487] Tuomisto, H., K. Ruokolainen, and M. Yli-Halla. 2003. Dispersal, environment, andfloristic variation of western Amazonian forests. Science 299:241–244.[488] Tyerman, J. G., M. Bertrand, C. C. Spencer, and M. Doebeli. 2008. Experimentaldemonstration of ecological character displacement. BMC Evolutionary Biology 8:34.[489] Ulloa, O., D. E. Canfield, E. F. DeLong, R. M. Letelier, and F. J. Stewart. 2012.Microbial oceanography of anoxic oxygen minimum zones. Proceedings of the NationalAcademy of Sciences 109:15996–16003.[490] Ulrich, W. 2004. Species co-occurrences and neutral models: reassessing JM Diamond’sassembly rules. Oikos 107:603–609.[491] Vallino, J., C. Algar, N. González, and J. Huber, 2014. Use of receding horizon optimalcontrol to solve MaxEP-based biogeochemistry problems. Pages 337–359 in R. C.Dewar, C. H. Lineweaver, R. K. Niven, and K. Regenauer-Lieb, editors. Beyond theSecond Law. Springer Berlin Heidelberg.[492] Vallino, J. J. 2010. Ecosystem biogeochemistry considered as a distributed metabolicnetwork ordered by maximum entropy production. Philosophical Transactions of theRoyal Society B: Biological Sciences 365:1417–1427.166Bibliography[493] van de Vossenberg, J., D. Woebken, W. J. Maalcke, H. J. C. T. Wessels, B. E. Du-tilh, B. Kartal, E. M. Janssen-Megens, G. Roeselers, J. Yan, D. Speth, J. Gloerich,W. Geerts, E. van der Biezen, W. Pluk, K.-J. Francoijs, L. Russ, P. Lam, S. A. Mal-fatti, S. G. Tringe, S. C. M. Haaijer, H. J. M. Op den Camp, H. G. Stunnenberg,R. Amann, M. M. M. Kuypers, and M. S. M. Jetten. 2013. The metagenome of themarine anammox bacterium ‘Candidatus Scalindua profunda’ illustrates the versatil-ity of this globally important nitrogen cycle bacterium. Environmental Microbiology15:1275–1289.[494] Van Der Heijden, M. G. A., R. D. Bardgett, and N. M. Van Straalen. 2008. The unseenmajority: soil microbes as drivers of plant diversity and productivity in terrestrialecosystems. 
Ecology Letters 11:296–310.[495] Vanwonterghem, I., P. D. Jensen, P. G. Dennis, P. Hugenholtz, K. Rabaey, and G. W.Tyson. 2014. Deterministic processes guide long-term synchronised population dy-namics in replicate anaerobic digesters. ISME Journal 8:2015–2028.[496] Varma, A., and B. O. Palsson. 1994. Stoichiometric flux balance models quantita-tively predict growth and metabolic by-product secretion in wild-type Escherichia coliW3110. Applied and Environmental Microbiology 60:3724–3731.[497] Vojinović, V., J. Cabral, and L. Fonseca. 2006. Real-time bioprocess monitoring: PartI: In situ sensors. Sensors and Actuators B: Chemical 114:1083–1091.[498] Von Schulthess, R., D. Wild, and W. Gujer. 1994. Nitric and nitrous oxides from den-itrifying activated sludge at low oxygen concentration. Water Science and Technology30:123–132.[499] Vos, M., P. J. Birkett, E. Birch, R. I. Griffiths, and A. Buckling. 2009. Local adaptationof bacteriophages to their bacterial hosts in soil. Science 325:833.[500] Voss, M., H. W. Bange, J. W. Dippner, J. J. Middelburg, J. P. Montoya, and B. Ward.2013. The marine nitrogen cycle: recent discoveries, uncertainties and the potential rel-evance of climate change. Philosophical Transactions of the Royal Society B: BiologicalSciences 368.[501] Voss, M., and J. P. Montoya. 2009. Nitrogen cycle: Oceans apart. Nature 461:49–50.[502] Wallenstein, M. D., and M. N. Weintraub. 2008. Emerging tools for measuring andmodeling the in situ activity of soil extracellular enzymes. Soil Biology and Biochem-istry 40:2098–2106.167Bibliography[503] Walsh, D. A., and S. J. Hallam, 2011. Bacterial community structure and dynamics ina seasonally anoxic fjord: Saanich Inlet, British Columbia. Chapter 25, pages 253–267in F. J. de Bruijn, editor. Handbook of Molecular Microbial Ecology II: Metagenomicsin Different Habitats. John Wiley & Sons, Hoboken, NJ, USA.[504] Walsh, D. A., E. Zaikova, C. G. Howes, Y. C. Song, J. J. Wright, S. G. Tringe, P. D.Tortell, and S. J. 
Hallam. 2009. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science 326:578–582.
[505] Wand, U., V. A. Samarkin, H.-M. Nitzsche, and H.-W. Hubberten. 2006. Biogeochemistry of methane in the permanently ice-covered Lake Untersee, central Dronning Maud Land, East Antarctica. Limnology and Oceanography 51:1180–1194.
[506] Wang, X., X. Wen, H. Yan, K. Ding, F. Zhao, and M. Hu. 2011. Bacterial community dynamics in a functionally stable pilot-scale wastewater treatment plant. Bioresource Technology 102:2352–2357.
[507] Ward, B. 1987. Kinetic studies on ammonia and methane oxidation by Nitrosococcus oceanus. Archives of Microbiology 147:126–133.
[508] Ward, B. 2005. Temporal variability in nitrification rates and related biogeochemical factors in Monterey Bay, California, USA. Marine Ecology Progress Series 292:97–109.
[509] Ward, B., A. Devol, J. Rich, B. Chang, S. Bulow, H. Naik, A. Pratihary, and A. Jayakumar. 2009. Denitrification as the dominant nitrogen loss process in the Arabian Sea. Nature 461:78–81.
[510] Ward, D. M., F. M. Cohan, D. Bhaya, J. F. Heidelberg, M. Kuhl, and A. Grossman. 2007. Genomics, environmental genomics and the issue of microbial species. Heredity 100:207–219.
[511] Warnecke, F., and M. Hess. 2009. A perspective: Metatranscriptomics as a tool for the discovery of novel biocatalysts. Journal of Biotechnology 142:91–95.
[512] Weinbauer, M. G., and F. Rassoulzadegan. 2004. Are viruses driving microbial diversification and diversity? Environmental Microbiology 6:1–11.
[513] Welch, B. L. 1947. The generalization of “Student’s” problem when several different population variances are involved. Biometrika 34:28–35.
[514] Welch, R. A., V. Burland, G. Plunkett, P. Redford, P. Roesch, D. Rasko, E. L. Buckles, S.-R. Liou, A. Boutin, J. Hackett, D. Stroud, G. F. Mayhew, D. J. Rose, S. Zhou, D. C. Schwartz, N. T. Perna, H. L. T. Mobley, M. S. Donnenberg, and F. R. Blattner.
2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences 99:17020–17024.
[515] Wenk, C. B., J. Blees, J. Zopfi, M. Veronesi, A. Bourbonnais, C. J. Schubert, H. Niemann, and M. F. Lehmann. 2013. Anaerobic ammonium oxidation (anammox) bacteria and sulfide-dependent denitrifiers coexist in the water column of a meromictic south-alpine lake. Limnology and Oceanography 58:1–12.
[516] Werner, J. J., D. Knights, M. L. Garcia, N. B. Scalfone, S. Smith, K. Yarasheski, T. A. Cummings, A. R. Beers, R. Knight, and L. T. Angenent. 2011. Bacterial community structures are unique and resilient in full-scale bioenergy systems. Proceedings of the National Academy of Sciences 108:4158–4163.
[517] Wertz, S., V. Degrange, J. I. Prosser, F. Poly, C. Commeaux, T. Freitag, N. Guillaumaud, and X. L. Roux. 2006. Maintenance of soil functioning following erosion of microbial diversity. Environmental Microbiology 8:2162–2169.
[518] Whitman, W. B., D. C. Coleman, and W. J. Wiebe. 1998. Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences 95:6578–6583.
[519] Wicht, H. 1996. A model for predicting nitrous oxide production during denitrification in activated sludge. Water Science and Technology 34:99–106.
[520] Wiesmann, U., 1994. Biological nitrogen removal from wastewater. Pages 113–154 in Advances in Biochemical Engineering and Biotechnology, volume 51. Springer Berlin Heidelberg.
[521] Winter, C., A. Smit, G. J. Herndl, and M. G. Weinbauer. 2004. Impact of virioplankton on archaeal and bacterial community richness as assessed in seawater batch cultures. Applied and Environmental Microbiology 70:804–813.
[522] Wittebolle, L., M. Marzorati, L. Clement, A. Balloi, D. Daffonchio, K. Heylen, P. De Vos, W. Verstraete, and N. Boon. 2009. Initial community evenness favours functionality under selective stress. Nature 458:623–626.
[523] Wittebolle, L., W. Verstraete, and N. Boon. 2009.
The inoculum effect on the ammonia-oxidizing bacterial communities in parallel sequential batch reactors. Water Research 43:4149–4158.
[524] Wittebolle, L., H. Vervaeren, W. Verstraete, and N. Boon. 2008. Quantifying community dynamics of nitrifiers in functionally stable reactors. Applied and Environmental Microbiology 74:286–293.
[525] Woese, C. R. 2002. On the evolution of cells. Proceedings of the National Academy of Sciences 99:8742–8747.
[526] Wohl, D. L., S. Arora, and J. R. Gladstone. 2004. Functional redundancy supports biodiversity and ecosystem function in a closed and constant environment. Ecology 85:1534–1540.
[527] Wolda, H. 1981. Similarity indices, sample size and diversity. Oecologia 50:296–302.
[528] Wong, C., N. Waser, Y. Nojiri, W. Johnson, F. Whitney, J. Page, and J. Zeng. 2002. Seasonal and interannual variability in the distribution of surface nutrients and dissolved inorganic carbon in the northern North Pacific: influence of El Niño. Journal of Oceanography 58:227–243.
[529] Woo, P. C. Y., S. K. P. Lau, J. L. L. Teng, H. Tse, and K.-Y. Yuen. 2008. Then and now: use of 16S rDNA gene sequencing for bacterial identification and discovery of novel bacteria in clinical microbiology laboratories. Clinical Microbiology and Infection 14:908–934.
[530] Wood, T., J. Burke, and L. Rieseberg. 2005. Parallel genotypic adaptation: when evolution repeats itself. Genetica 123:157–170.
[531] Wright, J. J., K. M. Konwar, and S. J. Hallam. 2012. Microbial ecology of expanding oxygen minimum zones. Nature Reviews Microbiology 10:381–394.
[532] Xia, K.-Q. 2013. Current trends and future directions in turbulent thermal convection. Theoretical and Applied Mechanics Letters 3:052001.
[533] Xing, J., C. Criddle, and R. Hickey. 1997. Long-term adaptive shifts in anaerobic community structure in response to a sustained cyclic substrate perturbation. Microbial Ecology 33:50–58.
[534] Xu, J. 2006.
Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances. Molecular Ecology 15:1713–1731.
[535] Yamashita, Y., J. N. Boyer, and R. Jaffe. 2013. Evaluating the distribution of terrestrial dissolved organic matter in a complex coastal ecosystem using fluorescence spectroscopy. Continental Shelf Research 66:136–144.
[536] Yarza, P., P. Yilmaz, E. Pruesse, F. O. Glockner, W. Ludwig, K.-H. Schleifer, W. B. Whitman, J. Euzeby, R. Amann, and R. Rossello-Mora. 2014. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12:635–645.
[537] Yin, B., D. Crowley, G. Sparovek, W. J. De Melo, and J. Borneman. 2000. Bacterial functional redundancy along a soil reclamation gradient. Applied and Environmental Microbiology 66:4361–4365.
[538] Yoon, S. H., M.-J. Han, H. Jeong, C. H. Lee, X.-X. Xia, D.-H. Lee, J. H. Shim, S. Y. Lee, T. K. Oh, and J. F. Kim. 2012. Comparative multi-omics systems analysis of Escherichia coli strains B and K-12. Genome Biology 13:R37.
[539] Zagatto, E., A. Jacintho, J. Mortatti, and B. F. H. 1980. An improved flow injection determination of nitrite in waters by using intermittent flows. Analytica Chimica Acta 120:399–403.
[540] Zaikova, E., D. A. Walsh, C. P. Stilwell, W. W. Mohn, P. D. Tortell, and S. J. Hallam. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172–191.
[541] Zeebe, R., and D. Wolf-Gladrow. 2001. CO2 in Seawater: Equilibrium, Kinetics, Isotopes. Elsevier oceanography series, Elsevier.
[542] Zeisel, A., W. J. Köstler, N. Molotski, J. M. Tsai, R. Krauthgamer, J. Jacob-Hirsch, G. Rechavi, Y. Soen, S. Jung, Y. Yarden, et al. 2011. Coupled pre-mRNA and mRNA dynamics unveil operational strategies underlying transcriptional responses to stimuli. Molecular Systems Biology 7:529.
[543] Zengler, K., and B. O. Palsson. 2012.
A road map for the development of community systems (CoSy) biology. Nature Reviews Microbiology 10:366–372.
[544] Zhang, J., K. Kobert, T. Flouri, and A. Stamatakis. 2014. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30:614–620.
[545] Zomorrodi, A. R., and C. D. Maranas. 2012. OptCom: a multi-level optimization framework for the metabolic modeling and analysis of microbial communities. PLoS Computational Biology 8:e1002363.
[546] Zopfi, J., T. G. Ferdelman, B. B. Jorgensen, A. Teske, and B. Thamdrup. 2001. Influence of water column dynamics on sulfide oxidation and other major biogeochemical processes in the chemocline of Mariager Fjord (Denmark). Marine Chemistry 74:29–51.
[547] Zubkov, M., A. Sazhin, and M. Flint. 1992. The microplankton organisms at the oxic-anoxic interface in the pelagial of the Black Sea. FEMS Microbiology Letters 101:245–250.
[548] Zumstein, E., R. Moletta, and J.-J. Godon. 2000. Examination of two years of community dynamics in an anaerobic bioreactor using fluorescence polymerase chain reaction (PCR) single-strand conformation polymorphism analysis. Environmental Microbiology 2:69–78.
[549] Zweng, M., J. Reagan, J. Antonov, R. Locarnini, A. Mishonov, T. Boyer, H. Garcia, O. Baranova, D. Johnson, D. Seidov, and M. Biddle, 2013. Salinity. in S. Levitus and A. Mishonov, editors. World Ocean Atlas 2013, volume 2. NOAA Atlas NESDIS 74.

Appendix A
Chapter 2: Supplemental material

A.1 Methods

A.1.1 Sequencing data
Bacterial and archaeal taxa were identified based on 16S ribosomal DNA sequences, extracted from a previously published unassembled metagenomic data set that comprises 139 marine samples from 68 locations (460; see Table A.2 for an overview and Fig. A.9 for sampling locations).
Quality-filtered 16S rDNA sequences, extracted from the metagenomes by the original authors (460), were clustered with uclust (110) by closed-reference mapping to the SILVA 119 reference database (378) at 99% similarity, yielding 49685 clusters (so-called operational taxonomic units, or OTUs) representing 10976383 sequences. OTU abundances were converted to relative proportions in each sample by total sum scaling. Metagenomic KEGG orthologous group (221) profiles for the same samples were taken directly from the original publication (460) and were normalized using total sum scaling. Whenever possible, KEGG orthologs associated with similar enzymatic functions (e.g., dissimilatory nitrite reduction to ammonium, nirBD and nrfAH) were combined into functional groups comparable to the ones considered in this chapter (Fig. A.5). A list of KEGG orthologs associated with each function is provided in Table F.1.

A.1.2 Functional annotation of prokaryotic taxa
To estimate the functional potential of the bacterial and archaeal communities, we classified each OTU into one or more functional groups based on current literature, such as announcements of cultured representatives or manuals of systematic microbiology, whenever possible. More precisely, a taxon (e.g., species or genus) was associated with a particular function (e.g., denitrification) if all cultured species within the taxon have been shown to exhibit that function. While the risk of false generalizations was minimized via extensive manual investigation of available literature, we point out that as more organisms are cultured in the future some of these generalizations may turn out to be false. Furthermore, strain-specific variations within species (514) were ignored in favor of type strains, which may have led to further inaccuracies in our functional annotations.
Nevertheless, a comparison of the resulting functional community profiles with metagenomic profiles shows that this approach is able to reveal most major differences in functional potential between samples (Fig. A.5). A total of 30633 OTUs (66.1%) were assigned to at least one functional group, yielding a total of 46313 functional annotations (see Table A.4 for an overview). OTUs without any functional annotation were excluded from the analysis. Our complete database for the functional annotation of prokaryotic taxa (FAPROTAX) is available online at www.zoology.ubc.ca/louca/FAPROTAX.
Some taxa were associated with multiple functions: for example, Sulfurospirillum arsenophilum was associated with, among others, nitrate respiration and fermentation. Hence, the detection of S. arsenophilum was interpreted as the detection of a putative nitrate respirer and a putative fermenter. Furthermore, functional groups could be nested: for example, all nitrate denitrifiers were also associated with nitrate respiration as well as nitrate reduction. Relative functional group abundances in each sample were calculated as the cumulative abundances of OTUs assigned to each functional group, after normalizing by the cumulative abundances of OTUs associated with at least one function (i.e., total-sum-scaling restricted to functionally annotated OTUs).
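The normalization just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout and the OTU identifiers are hypothetical and are not taken from FAPROTAX or from the original analysis scripts.

```python
from collections import defaultdict

def functional_group_abundances(otu_counts, otu_functions):
    '''Relative functional group abundances for one sample.

    otu_counts:    dict mapping OTU id -> read count in the sample
    otu_functions: dict mapping OTU id -> set of functional groups
                   (OTUs absent from this dict count as unannotated)
    Both structures are hypothetical stand-ins for the real tables.
    '''
    # Total-sum scaling restricted to functionally annotated OTUs.
    annotated_total = sum(
        count for otu, count in otu_counts.items() if otu_functions.get(otu)
    )
    abundances = defaultdict(float)
    if annotated_total == 0:
        return dict(abundances)
    for otu, count in otu_counts.items():
        # An OTU contributes its full share to every group it belongs to,
        # so overlapping and nested groups all receive the contribution.
        for group in otu_functions.get(otu, ()):
            abundances[group] += count / annotated_total
    return dict(abundances)

# Hypothetical sample in which annotated OTUs sum to 10^4 reads.
counts = {'otu_a': 10, 'otu_b': 9990, 'otu_unannotated': 500}
functions = {
    'otu_a': {'nitrate_respiration', 'fermentation'},
    'otu_b': {'aerobic_chemoheterotrophy'},
}
profile = functional_group_abundances(counts, functions)
```

Because an OTU can belong to several groups, the resulting group abundances need not sum to one within a sample.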
For example, if in a particular sample all functionally annotated OTUs together accounted for 10^4 reads, and a single functionally annotated OTU accounted for 10 reads, then that OTU contributed a value of 10/10^4 = 0.001 to each functional group it was associated with.

A.1.3 Statistical Analysis

Relating community composition to environmental variables
To assess the overall effect of environmental conditions on community structure, we performed multivariate regression of relative functional group abundances, as well as the proportions of various taxa within each functional group, against 13 key abiotic oceanographic variables that included dissolved oxygen, salinity, temperature and depth (see Table A.1 for an overview; see below for details). Regression was done using non-linear kernel ridge regression (KRR) with Gaussian radial basis function kernels, as implemented by the scikit-learn machine learning package (363). In short, KRR models combine non-linear regression with regularization of coefficients. Regularization limits the complexity of a model and reduces the risk of overfitting by penalizing excessive coefficients, and generally presents a more robust alternative to step-wise model selection methods (88). The final complexity of a fitted model depends on a penalization parameter that mediates the trade-off between model simplicity and a better fit to the data. Non-linearities, on the other hand, are accounted for using the “kernel trick”, which embeds predictor variables into higher dimensions using Gauss-shaped functions prior to linear regression (363). The predictive power of each model (i.e., for each functional group or taxon) was measured using 10-fold Monte-Carlo cross-validation with 500 random iterations, which estimates the achievable coefficient of determination (R²cv) when only a random subset (90%) of the samples is used for fitting and the remaining samples (10%) are used for independent testing (421).
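To make the procedure concrete, the following self-contained Python sketch fits a Gaussian-kernel ridge regression and scores it by Monte-Carlo cross-validation with 90%/10% splits. It is not the scikit-learn implementation used in the analysis; it is a plain-Python stand-in under several assumptions: the per-split R² values are averaged, only 20 iterations are run instead of 500, the data are synthetic, and all function names and parameter grids are hypothetical.

```python
import math
import random

def rbf(a, b, bandwidth):
    # Gaussian radial basis function kernel between two predictor vectors.
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting for the system A x = b.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit_predict(X_train, y_train, X_test, penalty, bandwidth):
    # Solve (K + penalty * I) w = y, then predict via the kernel expansion.
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], bandwidth) + (penalty if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    w = solve(K, y_train)
    return [sum(w[i] * rbf(X_train[i], x, bandwidth) for i in range(n))
            for x in X_test]

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def monte_carlo_r2cv(X, y, penalty, bandwidth, n_iter=20, test_frac=0.1, seed=0):
    # Repeated random splits: fit on 90% of samples, test on the other 10%.
    rng = random.Random(seed)
    n = len(y)
    n_test = max(2, round(test_frac * n))
    scores = []
    for _ in range(n_iter):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        pred = krr_fit_predict([X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in test], penalty, bandwidth)
        scores.append(r2([y[i] for i in test], pred))
    return sum(scores) / len(scores)

# Synthetic, noise-free toy data (not from the thesis).
X = [[0.15 * i] for i in range(40)]
y = [math.sin(x[0]) for x in X]

# Small grid search over penalty and bandwidth, maximizing the CV score.
grid = [(p, b) for p in (1e-3, 1e-1) for b in (0.5, 1.0)]
best_penalty, best_bandwidth = max(
    grid, key=lambda pb: monte_carlo_r2cv(X, y, pb[0], pb[1]))
best_score = monte_carlo_r2cv(X, y, best_penalty, best_bandwidth)
```

In practice a library implementation would be preferable; the explicit linear solve is shown only to expose the role of the penalty term on the kernel matrix diagonal.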
The R²cv is typically used to assess the risk of data overfitting and inaccurate extrapolation, and provides a more conservative estimate of a model’s predictive power than the classical coefficient of determination (R²). The degree of penalization as well as the Gaussian kernel bandwidth were optimized separately for each model using grid-search by maximizing the achievable R²cv. Hence, the final models minimize complexity and the risk of overfitting while optimizing expected predictive power.
We mention that we had initially considered multivariate linear regression instead of KRR models. However, not surprisingly, the linear models resulted in very unreliable extrapolations and revealed strong non-linearities in the data not accounted for by the linear model structure. Nevertheless, the overall results and conclusions reported in this chapter remained unchanged.
To assess the influence of individual environmental variables on community structure, we performed Pearson and Spearman rank correlation analysis between the relative abundances of functional groups as well as the taxon proportions within individual functional groups on the one hand, and environmental variables on the other hand. Both Pearson and Spearman rank correlations yielded similar conclusions, hence we focus our reports on the latter.
To verify the robustness of the regression and correlation analyses, we considered the taxonomic composition within individual functional groups at various taxonomic resolutions (species, genus, family, order and class level); all taxonomic resolutions yielded similar results (e.g., Figs. A.1 and A.2). Functional groups or taxa represented in fewer than 10 samples were excluded from regression and correlation analysis.
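As an illustration of the correlation step, the sketch below computes a Spearman rank correlation as the Pearson correlation of average ranks and applies the minimum-occurrence filter. The helper names, the interpretation of the filter as a count of non-zero samples, and the toy data are assumptions made for this example only.

```python
import math

def average_ranks(values):
    # Rank values from 1 to n, giving tied values their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman rank correlation: Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))

def occurs_in_enough_samples(abundances, min_samples=10):
    # Assumed reading of the filter: count samples with non-zero abundance.
    return sum(1 for a in abundances if a > 0) >= min_samples

# Hypothetical toy data: abundance increasing monotonically with depth.
depth = [5, 40, 120, 250, 400, 600, 800, 900, 950, 1000]
abundance = [0.01, 0.02, 0.05, 0.07, 0.10, 0.16, 0.20, 0.26, 0.30, 0.41]
rho = spearman(depth, abundance) if occurs_in_enough_samples(abundance) else None
```

For a strictly monotonic relationship such as this one, the rank correlation equals 1 regardless of how non-linear the relationship is, which is the property that makes it useful alongside the non-linear regression models.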
Because each functional group could contain hundreds or thousands of different taxa, the achieved R²cv and absolute correlations for the taxon proportions in the functional group were averaged across all taxa.

Environmental variables
Environmental variables considered for regression and correlation analysis were standard abiotic oceanographic variables that are generally known to influence marine microbial distributions (235). These variables included in-situ temperature, salinity, dissolved oxygen concentration, apparent oxygen utilization, nitrate (NO₃⁻), phosphate (PO₄³⁻), dissolved silicate, sample depth, distance to the thermocline, surface total CO2, surface pH, daily insolation and duration of day. Fifteen of the 139 samples, for which some of these metadata were unavailable (e.g., due to limited depth coverage), were excluded from the regression and correlation analysis but still included in all other investigations. Any environmental metadata provided as part of the original data set (460) were used without modification; missing data were obtained from public global gridded databases (140, 141, 278, 342, 464, 549) and linearly interpolated between grid points if needed. No extrapolation outside of the available grids was done. An overview of environmental variables is given in Table A.1. Spearman rank correlations between environmental variables are shown in Fig. A.10.

Correlations between functional groups
To detect putative interactions between different functional groups, we calculated Spearman rank correlations between the relative abundances of functional groups across all 139 samples (Fig. A.6). To ensure that any positive correlations between two given functional groups were not merely due to their overlap (in terms of shared OTUs; Fig.
A.11), we only considered the fraction of each functional group due to OTUs not shared with the other functional group. For example, when comparing the abundances of cellulolytic and xylanolytic cells, we in fact compared only the abundances of non-xylanolytic cellulolytic cells and non-cellulolytic xylanolytic cells. For correlations between the entire functional groups, i.e., not accounting for overlaps, see Fig. A.12.

Functional and taxonomic richness
To compare the taxonomic and functional richness of the microbial communities we first rarefied all samples at equal sequencing depth, at the maximum depth possible (24644 sequences, picked without replacement), to eliminate richness differences purely based on varying sampling effort. Upon rarefaction, the total number of detected OTUs or functional groups was taken as a measure of richness. Rarefaction was repeated 1000 times, and the OTU richness as well as functional richness was averaged over all rarefactions.

Assessing segregation between water column zones
To assess the extent to which OTUs, higher taxa or functional groups exhibit significantly different abundances between environments, we compared their mean abundances between water column zones. Specifically, for each OTU (or taxon or functional group) and for any two water column zones, we used the Welch test statistic (513) to compare the mean relative abundances within each zone to a null model corresponding to random sample permutations. The size of each sample group (i.e., corresponding to either zone) is maintained by the permutations. In contrast to other similar tests (e.g., Student’s t-test), this null model does not assume any particular probability distribution of the data. The statistical significance of a difference in mean abundances was defined as the probability that the Welch statistic would be more extreme (in either direction) than observed, and was estimated using 1000 random permutations.
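A minimal Python sketch of such a permutation test is given below. It assumes a two-sided comparison based on the absolute Welch statistic and counts permutations whose statistic is at least as extreme as the observed one; the function names and the toy abundances are hypothetical.

```python
import math
import random

def welch_statistic(a, b):
    # Welch t statistic: difference in means scaled by unpooled variances.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0.0:
        # Degenerate case: no variation within either group.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se

def permutation_p_value(a, b, n_perm=1000, seed=0):
    # Null model: random reassignment of samples to the two zones,
    # keeping the size of each zone fixed.
    rng = random.Random(seed)
    observed = abs(welch_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(welch_statistic(perm_a, perm_b)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical relative abundances of one functional group in two zones.
surface = [0.010, 0.012, 0.011, 0.013, 0.009, 0.014, 0.008, 0.015]
mesopelagic = [0.050, 0.052, 0.049, 0.055, 0.047, 0.058, 0.046, 0.060]
p_value = permutation_p_value(surface, mesopelagic)
```

Applying the same function to every OTU, taxon or functional group and counting the cases with a p-value below the chosen threshold gives the fraction of significantly segregated units.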
Here we report the fraction of OTUs (or taxa or functional groups) that were identified to exhibit significantly different abundances between zones (Figs. 2.2E–G). Note that under the null model (i.e., no segregation) the false detection rate would be 5%. Hence, the high fraction of functional groups identified as significantly segregated between zones (e.g., up to 80% of functional groups when comparing the mesopelagic and surface zones, Fig. 2.2G) is indicative of strong differences in metabolic niche structure between zones.

Geographical variation in community structure
To assess the extent to which dispersal limitation may promote differences in community composition we performed Pearson and Spearman rank Mantel correlation tests between the geographical distances of sample pairs and their dissimilarities in terms of taxonomic as well as functional community composition. Geographical distances were geodesic distances calculated based on sample latitude and longitude. The dissimilarity metrics considered between community profiles (taxonomic or functional) were Bray-Curtis dissimilarity, Canberra distance and Hellinger distance. These dissimilarity metrics are widely used in biogeographical surveys (266), and we considered all three of them to verify the robustness of our results. When calculating dissimilarities in taxonomic composition, multiple taxonomic levels were considered (species, genus, family, order, class); all yielded similar results.
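The Mantel correlation between geographical distance and community dissimilarity can be sketched as follows. Several choices here are assumptions rather than details confirmed by the text: distances use the haversine formula on a sphere of radius 6371 km as an approximation of the geodesic, only the Pearson variant is shown, significance is two-sided, and the sites and dissimilarities are invented for the example.

```python
import math
import random

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    # Great-circle distance on a sphere (spherical approximation).
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def upper_triangle(D, order):
    # Off-diagonal entries of D after relabelling samples by 'order'.
    n = len(order)
    return [D[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel_test(D1, D2, n_perm=1000, seed=0):
    rng = random.Random(seed)
    identity = list(range(len(D1)))
    x = upper_triangle(D1, identity)
    observed = pearson(x, upper_triangle(D2, identity))
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        # Rows and columns of D2 are permuted jointly; D1 stays fixed.
        if abs(pearson(x, upper_triangle(D2, perm))) >= abs(observed):
            hits += 1
    return observed, hits / n_perm

# Hypothetical sites along the equator (latitude, longitude in degrees).
sites = [(0, 0), (0, 10), (0, 25), (0, 45), (0, 70), (0, 100), (0, 135)]
D_geo = [[haversine_km(a[0], a[1], b[0], b[1]) for b in sites] for a in sites]
# Invented dissimilarities that grow in proportion to distance.
D_comm = [[0.0001 * d for d in row] for row in D_geo]
r, p = mantel_test(D_geo, D_comm)
```

Permuting sample labels rather than individual matrix entries preserves the dependence among distances that share a sample, which is why the rows and columns are shuffled together.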
To control for strong variations across depth, presumably driven by environmental conditions rather than dispersal limitation (see discussion in the main text), we restricted our analysis to samples within individual zones (i.e., mesopelagic, surface layer, mixed layer and deep chlorophyll maximum).
The statistical significance of correlations between geographical distance and community dissimilarity was estimated using repeated random permutations of the rows and columns in the distance matrices (1000 trials, rows and columns were permuted similarly). All dissimilarity metrics, and both Pearson and Spearman rank correlations, yielded no statistically significant correlations between geographical distance and community dissimilarity. The comparisons between geographical distance and functional as well as taxonomic (OTU) dissimilarities are shown in Figs. 2.4AB for the mesopelagic zone, and Fig. A.8 for the surface layer and deep chlorophyll maximum. For other taxonomic resolutions (e.g., at genus and family level) see Fig. A.13. For comparisons between geographical distances and taxonomic differences within individual functional groups, see Fig. A.4.

OTU co-correlations within functional groups
To test whether OTUs with a higher number of shared functions correlated more strongly, we calculated the mean Spearman rank correlation between OTUs with 0 shared functions (90587017 OTU pairs), as well as between OTUs with exactly 1 shared function (34415335 OTU pairs), between OTUs with exactly 2 shared functions (410333 OTU pairs) and so on (Fig. 2.2H). Only OTUs occurring in at least 10 samples were considered (15840 OTUs in total). Mean correlations obtained over fewer than 10 OTU pairs were omitted.

Metric multidimensional scaling
Pairwise dissimilarities between functional or taxonomic community profiles were calculated using three metrics: Bray-Curtis, Canberra and Hellinger (266).
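For reference, the three metrics can be written compactly in Python. The conventions below are assumptions, since the text does not spell them out: Canberra terms with two zero entries are skipped, and the Hellinger distance is computed on proportions without the additional 1/sqrt(2) factor used in some definitions. The profiles are invented.

```python
import math

def bray_curtis(x, y):
    # Sum of absolute differences divided by the total abundance.
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

def canberra(x, y):
    # Terms where both entries are zero are skipped (assumed convention).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if a != 0 or b != 0)

def hellinger(x, y):
    # Profiles are rescaled to proportions before taking square roots.
    sx, sy = sum(x), sum(y)
    return math.sqrt(sum((math.sqrt(a / sx) - math.sqrt(b / sy)) ** 2
                         for a, b in zip(x, y)))

def dissimilarity_matrix(profiles, metric):
    # Symmetric matrix of pairwise dissimilarities between samples.
    n = len(profiles)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = metric(profiles[i], profiles[j])
    return D

# Hypothetical community profiles (relative abundances of three groups).
profiles = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.2, 0.3, 0.5],
]
D_bc = dissimilarity_matrix(profiles, bray_curtis)
```

Any of the three functions can be passed as the metric argument, which makes it straightforward to repeat an analysis under all three dissimilarities.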
Each resulting dissimilarity matrix was used to visualize sample differences via metric multidimensional scaling with stress majorization (42). In this procedure, sample points are embedded into two dimensions such that the pairwise Euclidean sample distances in the embedding “best match” the original sample dissimilarities. Hence, points appearing closer to each other in the embedding correspond to samples with more similar microbial communities. Fitting was done by minimizing the Kruskal stress function using the scikit-learn package (363). All dissimilarity metrics yielded similar results.

A.2 Resolving ambiguities in gene-centric metagenomics
In this work we have shown that a classification of community members into functional groups can reveal ecologically meaningful differences between microbial communities across environments. The translation of taxonomic information into community functional potential thus provides a powerful alternative to environmental shotgun sequencing (486). Gene-centric metagenomic profiles, in particular, suffer from the conceptual limitation that community gene content generally does not directly translate to community functional potential. This is because the same or highly similar genes can be involved in several pathways and because the functionality of individual genes typically depends on their genomic context, which remains unknown in shotgun metagenomics (376).
These limitations are inherent to metagenomic profiling, and apply equally to recent algorithms that estimate community gene content by projecting detected marker genes to closely related sequenced genomes (255).
In contrast, functional profiles based on experimental phenotypic characterizations, as constructed here, can resolve ambiguities in the interpretation of community gene content (376). For example, variants of the dissimilatory sulfite reductase (dsrAB) genes can be involved in respiratory sulfur reduction, energy-yielding chemolithotrophic sulfur oxidation or electron-yielding sulfur oxidation for anoxygenic photosynthesis, depending on the host microorganism (336). In fact, metagenomic dsrAB abundances in our samples do not significantly differ between water column zones. On the other hand, the abundance of identified sulfide-oxidizing organisms is greater than the abundance of sulfate respirers and peaks in the mesopelagic (Fig. A.5), where sulfide may be used to support chemolithoautotrophic growth, notably in oxygen minimum zones (415, 489).
Similarly, gene sequences coding for ammonia monooxygenase (amo) and the homologous particulate methane monooxygenase (pmo) are generally not distinguished by many current gene annotation databases (e.g., KEGG orthologs K10944, K10945, K10946 (221)). In fact, both genes code for enzymes that can oxidize ammonia and methane, even though ammonia oxidation (by nitrifiers) and methane oxidation (by methanotrophs) constitute trophic strategies for separate microbial groups (173, 190). Consequently, the peak of amo/pmo-related metagenomic sequences observed in the mesopelagic zone (Fig. A.5B) cannot a priori be unambiguously attributed to nitrifiers or methanotrophs. In contrast, our phenotype-based functional profiles strongly suggest that the over-representation of these genes in the mesopelagic is due to methanotrophs (Fig. A.5A).

A.3 Comparison with Sunagawa et al. (2015)
Our analysis revealed a clear positive correlation of community richness with depth, both in terms of detected functional groups as well as taxa, with the mesopelagic zone exhibiting particularly high richness (Fig. 2.2D). These patterns are consistent with an increased genetic as well as taxonomic richness at depth, as reported by Sunagawa et al. (460). Furthermore, our analysis revealed strong correlations of functional community profiles to depth and depth-correlated environmental variables such as temperature, nitrate and phosphate, but much weaker correlations to salinity (Fig. 2.1B). Previous metagenomic analysis of a subset of the same samples by Sunagawa et al. (460) also showed a strong correlation of depth and temperature, and an insignificant correlation of salinity, with functional profiles. In apparent contrast to our results, that study found only weak effects of nitrate and phosphate on metagenomic composition. However, Sunagawa et al. only considered surface layer samples for calculating correlations with nutrient concentrations, whereas our analysis also includes samples from the mesopelagic zone where light becomes less relevant and nitrate becomes an important terminal electron acceptor for anaerobic growth. A restriction of our correlation analysis to surface samples revealed much weaker effects of nitrate and phosphate on functional profiles (Fig. A.14), consistent with Sunagawa et al.’s metagenomic analysis.

Table A.1: Oceanographic variables.
Overview of oceanographic variables and sources.

variable | depth-specific | units | source
oxygen (dissolved) | yes | mL/L | a
apparent oxygen utilization | yes | mL/L | b
salinity | yes | PSU | c
temperature (in-situ) | yes | °C | d
nitrate (NO₃⁻) | yes | µM | e
phosphate (PO₄³⁻) | yes | µM | f
silicate (dissolved) | yes | µM | g
depth | yes | m | h
total inorganic carbon (total CO2) | no (surface) | µmol/kg | i
pH | no (surface) | - | j
insolation | no (surface) | kWh·m⁻²·d⁻¹ | k
duration of day | no | hours | -
distance to thermocline | yes | m | l

a Sunagawa et al. (460) if available, otherwise monthly average by WOA 2013 V2 (140).
b monthly average by WOA 2013 V2 (140).
c Sunagawa et al. (460) if available, otherwise monthly average by WOA 2013 V2 (549).
d Sunagawa et al. (460) if available, otherwise monthly average by WOA 2013 V2 (278).
e Sunagawa et al. (460) if available, otherwise monthly average by WOA 2013 V2 (141).
f Sunagawa et al. (460) if available, otherwise monthly average by WOA 2013 V2 (141).
g Sunagawa et al. (460) if available, otherwise monthly average by WOA 2013 V2 (141).
h Sunagawa et al. (460).
i monthly average by Takahashi et al. (464).
j monthly average by Takahashi et al. (464).
k monthly average by NASA Earth Observations (342).
l thermocline defined as depth where temperature drops 0.5 °C below reference depth 10 m (95); using monthly average temperature profile from WOA 2013 V2 (278).

[Figure A.1, panels A and B; columns in each panel: apparent oxygen utilization, daily insolation, depth, distance to thermocline, duration of day, nitrate, oxygen, pH, phosphate, salinity, silicate, temperature, total inorganic carbon]
Figure A.1: Correlation analysis at various taxonomic levels. Mean absolute Spearman rank correlations between taxon proportions within each functional group and environmental variables. Taxa are collapsed at the (A) genus or (B) family level.
Circle surface area and color darkness are proportional to the absolute correlation, averaged across all taxa within a functional group.

[Figure A.2 panels: class, order, family, genus; horizontal axes from 0.0 to 1.0]
Figure A.2: Environmental filtering at higher taxonomic levels. Cross-validated coefficients of determination (R²cv) for taxon proportions within each functional group, achieved by regression models with environmental predictor variables. Taxa are collapsed at various levels (genus, family, order and class).

Figure A.3: Functional vs taxonomic profiles. (A) Functional and (B) taxonomic (class-level) community profiles, both based on 16S rDNA sequences. Samples are ordered according to water column zone (SRF: surface water layer; DCM: deep chlorophyll maximum; MIX: subsurface epipelagic mixed layer; MES: mesopelagic zone). A darker color corresponds to a greater relative abundance.

Figure A.4: Dissimilarities in functional group composition vs geographical distances. Bray-Curtis dissimilarities between microbial communities in the mesopelagic zone, compared to geographical distances (one point per sample pair). Dissimilarities are calculated in terms of OTU proportions within various functional groups (one plot per functional group).

Figure A.5: Phenotype-based vs metagenomic functional profiles. Functional community profiles calculated based on (A) functional annotations of bacterial and archaeal taxa and (B) metagenomic KEGG orthologous groups (221). A darker color corresponds to a higher relative abundance. Samples (columns) are clustered by ocean layer (SRF: surface water layer; DCM: deep chlorophyll maximum; MIX: subsurface epipelagic mixed layer; MES: mesopelagic zone), and functional groups (rows) are hierarchically clustered using UPGMA.
Several, but not all, functional groups are comparable between (A) and (B).

[Figure A.6: correlation heatmap; color scale from −1.0 to 1.0; axis labels in clustered order (the second axis lists them in reverse): photoautotrophy, animal parasites or symbionts, chitinolytic, nitrate denitrification, nitrite respiration, dark hydrogen oxidation, hydrocarbon degradation, xylanolytic, plant pathogen, aerobic chemoheterotrophy, nitrate respiration, nitrate reduction, cellulolytic, fermentation, manganese oxidation, ligninolytic, methanotrophy, ureolytic, aerobic nitrite oxidation, sulfate respiration, dark oxidation of reduced sulfur compounds, methylotrophy, dark sulfur oxidation, methanol oxidation, dark sulfite oxidation, dark sulfide oxidation, aerobic ammonia oxidation, anoxygenic photoautotrophy]

Figure A.6: Correlations between functional groups (corrected). Spearman rank correlations between relative functional group abundances, after correcting for group overlaps (in terms of shared OTUs). Blue and red colors correspond to positive and negative correlations, respectively. White corresponds to zero or statistically non-significant correlations. Rows and columns are hierarchically clustered (UPGMA). For correlations not correcting for overlaps see Fig. A.12.

Figure A.7: Taxonomic compositions within functional groups.
OTU proportions within various functional groups (one plot per functional group; one color per OTU within each plot). For each functional group, samples are sorted according to the relative abundance of the entire functional group, as indicated by the horizontal scale.

Figure A.8: Community dissimilarities vs geographical distances. Bray-Curtis dissimilarities between microbial communities compared to geographical distances (one point per sample pair). Samples are restricted to the surface layer (top row) and the deep chlorophyll maximum (bottom row). Community dissimilarities are calculated in terms of relative functional group abundances (left column) and relative OTU abundances (right column).

[Figure A.9: map; axes: latitude (°) from −80 to 80, longitude (°) from −150 to 150; legend: 16S miTAG samples for regression]

Figure A.9: Sampling locations. Most locations include samples at multiple depths. Data from Sunagawa et al. (460).

[Figure A.10: correlation heatmap; color scale from −1.0 to 1.0; axis labels in clustered order (the second axis lists them in reverse): distance to thermocline, temperature, total inorganic carbon, pH, oxygen, salinity, phosphate, nitrate, silicate, apparent oxygen utilization, depth, daily insolation, duration of day]

Figure A.10: Correlations between environmental variables. Blue and red colors correspond to positive and negative correlations, respectively. White corresponds to zero or statistically non-significant correlations.
Rows and columns are hierarchically clustered (UPGMA).

[Figure A.11: overlap heatmap; color scale from 0.0 to 1.0; axis labels (the second axis lists them in reverse): methanol oxidation, methylotrophy, nitrate respiration, dark hydrogen oxidation, nitrate denitrification, nitrite respiration, dark sulfite oxidation, dark sulfur oxidation, dark sulfide oxidation, dark oxidation of reduced sulfur compounds, aerobic nitrite oxidation, sulfate respiration, anoxygenic photoautotrophy, photoautotrophy, fermentation, nitrate reduction, aerobic chemoheterotrophy, animal parasites or symbionts, ureolytic, methanotrophy, aerobic ammonia oxidation, chitinolytic, cellulolytic, xylanolytic, hydrocarbon degradation, manganese oxidation, ligninolytic, plant pathogen]

Figure A.11: Functional group overlaps. Overlaps between functional groups in terms of shared OTUs (Jaccard similarity index). A darker color corresponds to a greater overlap.
An overlap of 1.0 corresponds to identical groups.

[Figure A.12: correlation heatmap; color scale from −1.0 to 1.0; axis labels in clustered order (the second axis lists them in reverse): photoautotrophy, animal parasites or symbionts, chitinolytic, hydrocarbon degradation, plant pathogen, manganese oxidation, ligninolytic, xylanolytic, nitrate denitrification, nitrite respiration, dark hydrogen oxidation, cellulolytic, fermentation, ureolytic, nitrate respiration, nitrate reduction, methanotrophy, aerobic nitrite oxidation, sulfate respiration, dark sulfite oxidation, dark sulfur oxidation, methanol oxidation, aerobic chemoheterotrophy, methylotrophy, dark oxidation of reduced sulfur compounds, aerobic ammonia oxidation, dark sulfide oxidation, anoxygenic photoautotrophy]

Figure A.12: Function correlations (uncorrected). Spearman rank correlations between relative functional group abundances, not accounting for functional group overlaps. Blue and red colors correspond to positive and negative correlations, respectively. White corresponds to zero or statistically non-significant correlations. Rows and columns are hierarchically clustered (UPGMA). For correlations accounting for functional group overlaps see Fig. A.6.

Figure A.13: Community dissimilarities vs geographical distances at higher taxonomic levels. Bray-Curtis dissimilarities between microbial communities compared to geographical distances (one point per sample pair), for samples in the mesopelagic zone.
Community dissimilarities are calculated in terms of relative (A) genus, (B) family, (C) order and (D) class abundances.

[Figure A.14: panels A and B; axis labels: apparent oxygen utilization, daily insolation, depth, distance to thermocline, duration of day, nitrate, oxygen, pH, phosphate, salinity, silicate, temperature, total inorganic carbon]

Figure A.14: Functional vs taxonomic in the surface layer. Spearman rank correlations between environmental variables and (A) relative functional group abundances or (B) OTU proportions within individual functional groups, restricted to 63 surface layer samples. Circle surface area and color darkness are proportional to absolute correlations.

Table A.2: Overview of considered Tara Oceans samples. Samples used in this study, obtained from Sunagawa et al. (460). The last column indicates whether a sample was used in the regression and correlation analyses, depending on the availability of metadata.

#    PANGAEA accession  INSDC accession(s)    regr. & corr.
1    TARA_A100000164    ERS488330             yes
2    TARA_A100001011    ERS478040             yes
3    TARA_A100001015    ERS478017             yes
4    TARA_A100001035    ERS488569             yes
5    TARA_A100001037    ERS488599             yes
6    TARA_A100001234    ERS488621             yes
7    TARA_A100001388    ERS488545             yes
8    TARA_A200000113    ERS477931             yes
9    TARA_A200000159    ERS477953             yes
10   TARA_B000000437    ERS490029             yes
11   TARA_B000000441    ERS490085             yes
12   TARA_B000000460    ERS490065             no
13   TARA_B000000475    ERS490124             yes
14   TARA_B000000477    ERS490163             yes
15   TARA_B000000532    ERS489877             yes
16   TARA_B000000557    ERS489846             yes
17   TARA_B000000565    ERS489733             yes
18   TARA_B000000609    ERS489712             yes
19   TARA_B100000003    ERS488649             yes
20   TARA_B100000029    ERS488685             yes
21   TARA_B100000035    ERS488747             yes
22   TARA_B100000073    ERS488830             yes
23   TARA_B100000085    ERS488916             yes
24   TARA_B100000123    ERS489087             yes
25   TARA_B100000131    ERS489134             yes
26   TARA_B100000161    ERS489236             yes
27   TARA_B100000212    ERS489529             yes
28   TARA_B100000214    ERS489585             yes
29   TARA_B100000242    ERS489315             yes
30   TARA_B100000282    ERS489043             yes
31   TARA_B100000287    ERS489074             yes
32   TARA_B100000315    ERS488769             yes
33   TARA_B100000378    ERS489727             no
34   TARA_B100000401    ERS489917             yes
35   TARA_B100000405    ERS490002             yes
36   TARA_B100000408    ERS489987             no
37   TARA_B100000424    ERS490433             yes
38   TARA_B100000427    ERS490476             yes
39   TARA_B100000446    ERS490373             no
40   TARA_B100000459    ERS490327             yes
41   TARA_B100000470    ERS490230             no
42   TARA_B100000475    ERS490265             yes
43   TARA_B100000482    ERS490296             yes
44   TARA_B100000497    ERS490183             yes
45   TARA_B100000508    ERS490507             no
46   TARA_B100000513    ERS490542             yes
47   TARA_B100000519    ERS490597             yes
48   TARA_B100000524    ERS490659             yes
49   TARA_B100000530    ERS490691             yes
50   TARA_B100000575    ERS492321             yes
51   TARA_B100000579    ERS492357             yes
52   TARA_B100000586    ERS492381             yes
53   TARA_B100000609    ERS493044             yes
54   TARA_B100000614    ERS493098             yes
55   TARA_B100000674    ERS492821, ERS492814  yes
56   TARA_B100000676    ERS492863             yes
57   TARA_B100000678    ERS492680             yes
58   TARA_B100000683    ERS492733             yes
59   TARA_B100000686    ERS492778             yes
60   TARA_B100000700    ERS492699             yes
61   TARA_B100000745    ERS490714             no
62   TARA_B100000749    ERS490633             no
63   TARA_B100000767    ERS490928             yes
64   TARA_B100000768    ERS490885             yes
65   TARA_B100000780    ERS491001             yes
66   TARA_B100000787    ERS491044             yes
67   TARA_B100000795    ERS491095             yes
68   TARA_B100000809    ERS491110             no
69   TARA_B100000886    ERS491804             yes
70   TARA_B100000900    ERS491938             yes
71   TARA_B100000902    ERS492012             yes
72   TARA_B100000925    ERS492145             yes
73   TARA_B100000927    ERS492177             yes
74   TARA_B100000929    ERS492205             yes
75   TARA_B100000941    ERS492408             yes
76   TARA_B100000945    ERS492445             yes
77   TARA_B100000949    ERS492471             no
78   TARA_B100000953    ERS491980             yes
79   TARA_B100000959    ERS491913             yes
80   TARA_B100000963    ERS491836             yes
81   TARA_B100000965    ERS491874             yes
82   TARA_B100000989    ERS491525             yes
83   TARA_B100001013    ERS491767             yes
84   TARA_B100001027    ERS491699             yes
85   TARA_B100001029    ERS491740             yes
86   TARA_B100001057    ERS491492             yes
87   TARA_B100001059    ERS491463             yes
88   TARA_B100001063    ERS491421             yes
89   TARA_B100001079    ERS492294             yes
90   TARA_B100001093    ERS493390             yes
91   TARA_B100001094    ERS493431             yes
92   TARA_B100001105    ERS493460             no
93   TARA_B100001109    ERS492228             yes
94   TARA_B100001113    ERS492264             yes
95   TARA_B100001115    ERS492642             yes
96   TARA_B100001121    ERS492888             yes
97   TARA_B100001123    ERS492926             yes
98   TARA_B100001142    ERS494170             yes
99   TARA_B100001146    ERS494208             yes
100  TARA_B100001167    ERS494274             no
101  TARA_B100001173    ERS494579             yes
102  TARA_B100001175    ERS494628             yes
103  TARA_B100001179    ERS494616             no
104  TARA_B100001245    ERS493372             yes
105  TARA_B100001248    ERS493300             yes
106  TARA_B100001250    ERS493340             yes
107  TARA_B100001287    ERS493636             yes
108  TARA_B100001540    ERS494236             yes
109  TARA_B100001559    ERS494559             yes
110  TARA_B100001564    ERS494518             yes
111  TARA_B100001741    ERS494332             yes
112  TARA_B100001750    ERS494374             yes
113  TARA_B100001758    ERS494394             yes
114  TARA_B100001765    ERS494431             no
115  TARA_B100001769    ERS494445             yes
116  TARA_B100001778    ERS494488             yes
117  TARA_B100001939    ERS493914             yes
118  TARA_B100001964    ERS493670             yes
119  TARA_B100001971    ERS493705             yes
120  TARA_B100001989    ERS493752             yes
121  TARA_B100001996    ERS493788             yes
122  TARA_B100002003    ERS493822             yes
123  TARA_B100002019    ERS493877             yes
124  TARA_B100002049    ERS494006             no
125  TARA_B100002051    ERS493938             yes
126  TARA_B100002052    ERS493981             yes
127  TARA_E500000075    ERS477979             yes
128  TARA_E500000081    ERS477998             yes
129  TARA_E500000178    ERS488486             yes
130  TARA_E500000331    ERS488509             yes
131  TARA_S200000501    ERS488346             yes
132  TARA_X000000368    ERS487936             yes
133  TARA_X000000950    ERS488119             yes
134  TARA_X000001036    ERS488147             yes
135  TARA_Y100000022    ERS488714             yes
136  TARA_Y100000031    ERS488936             yes
137  TARA_Y100000287    ERS488799             yes
138  TARA_Y100000294    ERS488849             yes
139  TARA_Y200000002    ERS487899             yes

Table A.3: KOG-function associations. KEGG orthologous groups (KOG) associated with various functions in the metagenomic sequences, for comparison with our phenotype-based functional profiles.
Whenever KOGs are associated with a single pathway step, the corresponding genes are indicated in brackets.

function                                  KOGs
oxygenic photosynthesis                   K02703–K02714, K02716–K02720, K08901–K08904, K03541, K03542
nitrate respiration                       K02567, K02568, K00370–K00373
nitrate reduction                         K02567, K02568, K00370–K00374, K00367, K00360, K10534
hydrogen oxidoreduction                   K00532–K00534, K06441, K18016, K18017, K18023, K00436, K05586–K05588, K18005–K18007
xylanolytic                               K15924, K13465, K01198, K15920, K01181
anoxygenic photosynthesis                 K08926–K08930, K08939–K08954, K13991, K13992, K13994
nitrite reduction to ammonium             K00362, K00363, K03385, K15876
sulfate-sulfide oxidoreduction (dsrAB)    K00394, K00395
denitrification (nosZ)                    K00376
methanol oxidation                        K00093, K14028, K14029, K16254–K16260, K17066
denitrification (norBC)                   K04561, K02305
sulfate transport                         K02045–K02048
nitrite respiration (nirKS)               K00368, K15864
methane or ammonia oxidation              K10944–K10946
cellulolytic                              K19356, K19357, K01179, K01225, K19668
chitinolytic                              K01183, K13381, K01452, K17523

Table A.4: OTUs per functional group. Number of OTUs assigned to each functional group, compared to the total number of taxonomically annotated OTUs. Some OTUs were assigned to multiple functional groups (see Fig.
A.11 for functional group overlaps).

functional group                             OTUs    fraction (%)
aerobic ammonia oxidation                    193     0.39
aerobic chemoheterotrophy                    23565   47.4
aerobic nitrite oxidation                    226     0.45
animal parasites or symbionts                3037    6.11
anoxygenic photoautotrophy                   700     1.40
cellulolytic                                 175     0.35
chitinolytic                                 134     0.27
dark hydrogen oxidation                      355     0.71
dark oxidation of reduced sulfur compounds   954     1.92
dark sulfide oxidation                       327     0.66
dark sulfite oxidation                       235     0.47
dark sulfur oxidation                        282     0.57
fermentation                                 3146    6.33
hydrocarbon degradation                      1776    3.57
ligninolytic                                 236     0.47
manganese oxidation                          638     1.28
methanol oxidation                           897     1.81
methanotrophy                                104     0.21
methylotrophy                                1023    2.06
nitrate denitrification                      535     1.08
nitrate reduction                            3399    6.84
nitrate respiration                          1229    2.47
nitrite respiration                          640     1.29
photoautotrophy                              992     1.99
plant pathogen                               511     1.03
sulfate respiration                          300     0.61
ureolytic                                    1054    2.12
xylanolytic                                  58      0.12

Appendix B
Chapter 3: Supplemental material

B.1 Methods

B.1.1 Biological sample collection

Detritus from the bottom of bromeliad tanks was collected and physicochemical measurements were taken from all bromeliads in the period of January 8–10, 2015, within an area spanning roughly 0.2 km^2 in the Parque Nacional da Restinga de Jurubatiba, on the east coast of Brazil (see Fig. B.6 for coordinates). At that time, weather conditions were sunny, dry and hot, and were preceded by several weeks of extreme drought (343). This drought may explain why we detected almost no insects in the bromeliad tanks that we sampled. Supernatant liquid was removed from the bromeliad’s central tank using a sterile serological pipette. The detritus at the bottom was then retrieved using a sterile syringe and a metal spatula, after cutting the bromeliad open for easier access. The entire retrieved detrital content was mixed before sampling. Samples were flash-frozen in liquid nitrogen within 10 minutes of collection and then kept frozen in the laboratory at −80 °C until further processing.
For shipment, samples were concentrated via centrifugation (40 000 g for 15 min, balanced using Milli-Q filtered water) and removal of the supernatant fluid, and then freeze-dried for 24 hours. The dried samples were shipped to our lab at the University of British Columbia, Canada, for further processing.

B.1.2 Chemical analysis of tank water

The water above the benthic detritus was collected using a serological pipette, stored in 25 mL centrifuge tubes on regular ice in the field and at −4 °C in the lab until further analysis (within two days). Water samples for CH4 measurement were taken separately (1.5 mL per measurement) and directly from the bromeliad, fixed using formalin (4 %) in 3 mL glass vials, stored on regular ice in the field and at −4 °C in the lab until further analysis. Total dissolved phosphorus concentrations were determined as the inorganic phosphorus obtained after acid digestion and autoclaving of the water samples, followed by the ascorbic acid-molybdate reaction (155). Total dissolved nitrogen concentrations were determined as the concentration of nitrate obtained after acid digestion and autoclaving, in a Flow Injection Analysis System (FIA-Asia Ismatec™) (539). Methane concentrations were determined using a Shimadzu™ GC-2010AF chromatograph equipped with a Rt-QPLOT column (3 m × 0.32 m) and a flame ionization detector (FID-2010). Temperatures of the injection, column and detection were 120 °C, 85 °C and 220 °C, respectively. Nitrogen (N2) was used as the carrier gas.

Conductivity, pH, temperature and Total Suspended Solids (TSS) were measured in the field using an ExStik II EC500™ (ExTech Instruments). Salinity was calculated from conductivity and temperature using the empirical formula reported by Fofonoff and Millard-Junior (127). Water turbidity was measured in the field using a Hanna Turbidimeter HI98703.
Absorption spectra were measured in the lab using a Varian 50 Bio UV-Visible Spectrophotometer™, following the manufacturer’s procedures. Dissolved organic carbon (DOC) concentrations were determined by Pt-catalyzed high-temperature combustion with a Shimadzu TOC-VCPN Total Carbon Analyzer™, after filtering through 0.7 µm Whatman™ GF/F glass fiber filters.

For one bromeliad the retrieved supernatant water was insufficient for performing all of the chemical assays in the field. That water sample was thus diluted at a ratio 1:5 using deionized water prior to measuring conductivity, pH, TSS and turbidity. The resulting conductivity, salinity, TSS and turbidity were then corrected using the dilution factor. The pH was corrected using a standard curve constructed by serial dilution of water from another bromeliad. For several bromeliads the retrieved supernatant water was insufficient for measuring absorption spectra and DOC concentrations, as well as for excitation-emission spectrophotometry (EES; explained below). These water samples were thus diluted in the lab using deionized water as needed. All measurements were subsequently corrected for the effects of dilution.

EES of the water samples was performed using a Varian Cary Eclipse™ fluorescence spectrophotometer. In EES, each sample is exposed to light of several wavelengths while simultaneously measuring the resulting fluorescence spectrum (14). The obtained “excitation-emission matrices” (EEM) were analyzed for organic carbon profiles using parallel factor analysis (PARAFAC) with the MATLAB® package drEEM (338). EEMs were pre-processed as follows: The EEM of pure Milli-Q™ water was subtracted from the sample EEMs. Rayleigh (elastic) and Raman (inelastic) scatter signals were removed by replacing them with NaN. EEM entries for emission wavelengths lower than the excitation wavelengths were set to zero.
EEM entries at the excitation wavelengths 320 nm and 365 nm were ignored because of abnormal intensity troughs at all emission wavelengths, likely resulting from imperfections of the fluorometer lamp. EEMs were corrected for inner filter effects using the sample absorption spectra and the drEEM function fdomcorrect as described by Murphy et al. (338).

PARAFAC model fitting was attempted for various model sizes (3–9). To avoid local PARAFAC minima, fitting for each model size was repeated 50 times with random initialization using the drEEM function randinitanal. Model residuals were inspected as described by Stedmon et al. (445) and Stedmon and Bro (444) to ensure that the model size was sufficient. Split-half validation (‘S4C4T2’; 338) failed for all considered model sizes, but was ignored because of low sample size when compared to the high richness of observed EEM profiles. Instead, to constrain the model’s size and avoid overfitting, model components were inspected for physical plausibility as described by Murphy et al. (338, e.g., Fig. 7) and subsequently compared to published entries in the OpenFluor fluorophore database based on Tucker’s congruence coefficient (339). We kept the model (size 4; Fig. B.7) with the highest number of plausible components represented in OpenFluor at a congruence of at least 0.98. The best matches in the OpenFluor database were “CS-Galathea, C1” for component 1 (216), “Recycle_WRAMS, C5” for component 2 (337), “PrairieLakes, C2” for component 3 (355) and “FloridaKeys, C3” for component 4 (535). The model explained 98.2% of the variance, at a core consistency of 82.9% (Fig. B.8). For each sample and for each individual PARAFAC component we determined the maximum fluorescence intensity in the component’s EEM, and multiplied it by the component’s score in the particular sample. This yielded 4 PARAFAC component intensities per sample, each in arbitrary units that are comparable across samples but not across PARAFAC components.
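
The component-intensity calculation just described (peak of a component's EEM multiplied by that component's score in a sample) can be sketched as follows. This is a minimal illustration in plain Python and not the drEEM workflow: the function name, the nested-list data layout and the toy numbers are assumptions made for the example, and the component EEMs are assumed to contain no missing values.

```python
# Hypothetical helper (not part of drEEM): the intensity of PARAFAC
# component k in sample i is taken as
#   (maximum entry of component k's EEM) x (score of component k in sample i).

def component_intensities(component_eems, scores):
    '''component_eems: list of 2-D lists, one EEM per PARAFAC component.
    scores: list of per-sample lists, scores[i][k] = score of component k
    in sample i. Returns intensities[i][k] in arbitrary units.'''
    # Peak fluorescence of each component's EEM.
    peaks = [max(max(row) for row in eem) for eem in component_eems]
    # Scale each sample's scores by the corresponding component peak.
    return [[peak * s for peak, s in zip(peaks, sample_scores)]
            for sample_scores in scores]


# Toy example with 2 components and 2 samples (made-up numbers).
eems = [[[0.1, 0.5], [0.2, 0.4]],   # component 1, peak 0.5
        [[2.0, 1.0], [0.0, 3.0]]]   # component 2, peak 3.0
scores = [[2.0, 1.0],               # sample 1
          [4.0, 0.5]]               # sample 2
print(component_intensities(eems, scores))
```

Because each component is scaled by its own peak, the resulting values can be compared across samples for one component, but not between components.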
These component intensities were subsequently used in our analysis as 4 additional environmental variables (“PARAFAC 1–4”).

B.1.3 Measurement of other physicochemical variables

Light intensity (photosynthetic photon flux density) on bromeliads was measured using an LI-250A Lightmeter™ (LI-COR Biosciences), placed on the ground next to the bromeliad at noon of a sunny day (January 10, 2015), after trimming the bromeliad’s foliage to avoid shading of the device by the bromeliad itself. The detrital volume was measured using the centrifuge tube scale after allowing the detritus to settle for 5 minutes, taking the reading at the interface between the settled detritus and the transparent supernatant fluid. The total tank volume was set to the total volume of all retrieved material (detritus and water). The total tank depth was either measured using a metal wire with an engraved cm-scale, or using the serological pipette’s volume scale upon calibration. Tree cover (“shading”) above bromeliads was measured by taking a photo from the top of a bromeliad “face-up” on a sunny day, and processing the photo using ImageJ™ for contrasting objects against a blue sky background. An overview of all physicochemical environmental variables is provided in Table B.5.

B.1.4 16S sequencing

DNA was extracted from the re-hydrated samples using the MoBio PowerSoil® DNA extraction kit, by applying the manufacturer’s suggested protocol. Amplification of the 16S rRNA gene was done using barcoded primers covering the V4 region (E. coli 515F and 806R) that included Illumina adapters, and using the Earth Microbiome Project 16S amplification protocol version 4_13 (60). Amplicon DNA from all samples was pooled into a single library, at such proportions that each sample contributed a similar amount of DNA. Primer dimers and remaining PCR enzymes were removed from the amplicon library using the MoBio® UltraClean PCR Clean-Up Kit.
Library quantitation was performed by Genoseq Core (University of California, Los Angeles) using a high-sensitivity Agilent Bioanalyzer™ and Kapa Biosystems’ Illumina Genome Analyzer™ (KAPA SYBR FAST Roche LightCycler 480) kit, followed by qPCR. Sequencing was performed by Genoseq Core using an Illumina MiSeq™ next generation sequencer, following the manufacturer’s standard protocol.

Sequencing yielded 15,090,774 paired-end sequences (2 × 300 base pairs each). Sequence analysis was performed using the QIIME toolbox (version 1.9.1, 59). Paired-end reads were merged after trimming forward reads at length 240 and reverse reads at length 160. Merged sequences were quality filtered using QIIME’s default settings, yielding 9,481,315 sequences of median length 253. Remaining sequences were de-noised and clustered de novo using cd-hit-otu (273) at a 99% 16S rDNA similarity threshold, generating 2884 operational taxonomic units (OTUs) representing 1,908,183 sequences across all samples. Sample B17 yielded by far the fewest sequences (5811 sequences corresponding to 677 OTUs).

We note that historically a less stringent threshold of 97% 16S rDNA similarity was recommended for delineating prokaryotic OTUs in biogeographical studies (145). However, recent work shows that greater taxonomic resolution is needed to detect signals of endemism (e.g., up to 99.5% for the cyanobacterium Prochlorococcus; 305) and signals of competitive exclusion (99–100%; 244), and that taxa defined on the basis of 97% similarity may be underspeciated (232, 439).

We did not rarefy the OTU table, so as to obtain as accurate an estimate of OTU proportions as possible for analyses based on quantitative abundances (317).
Moreover, rarefaction prior to our presence-absence-based analyses (see details below) would have led to higher estimates of OTU turnover between samples (Table B.1) as well as higher checkerboard C-scores between OTUs (Table B.3) because OTUs would be “competing” for sequences. This would further strengthen the patterns upon which our conclusions are based. Diagnostic OTU rarefaction curves are shown in Fig. B.9.

Taxonomic assignment of representative sequences was done using uclust (110) and the SILVA reference database (release 119, 378), using the first 50 hits at a similarity threshold of at least 90% as follows: For any queried sequence, if at least one hit had a similarity s ≥ 99%, then all hits with similarity s were used to form a consensus taxonomy. Otherwise, if at least one hit had a similarity s ≥ 90%, then all hits with similarity at least (s − 1%) were used to form a consensus taxonomy. If a query did not match any reference sequence at or above 90% similarity, it was considered unassigned. A total of 1965 OTUs (representing 1,874,361 sequences across all samples) were taxonomically annotated.

Representative sequences were aligned against the SILVA database using PyNAST (58, 378), and phylogenetic relationships were calculated using the FastTree algorithm (372), at standard QIIME settings. Phylogenetic distances are in nucleotide substitutions per site.

B.1.5 Functional annotation of prokaryotic taxa (FAPROTAX)

To determine the taxonomic composition within each of the 9 considered functional groups (aerobic chemoheterotrophy, cellulolysis, fermentation, methanogenesis, methylotrophy, nitrogen respiration, sulfate respiration, photoautotrophy, ureolysis), we associated each taxonomically annotated OTU with one or more functions based on extensive literature search, whenever possible. Details of this approach, which we outline here, are provided by Louca et al. (289).
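
As an aside on the taxonomic assignment in Section B.1.4, the rule for choosing which database hits enter the consensus taxonomy can be sketched as below. This is one possible reading of that rule, not the code actually used: it assumes that s denotes the similarity of the best hit, and the function name, parameter names and example values are hypothetical.

```python
# Sketch of the hit-selection rule from Section B.1.4, under the assumption
# that 's' is the similarity (in %) of the best hit for the query.

def select_hits(similarities, exact=99.0, floor=90.0, window=1.0):
    '''Return the indices of the hits that would feed the consensus taxonomy.
    An empty list means the query is considered unassigned.'''
    best = max(similarities, default=None)
    if best is None or best < floor:
        return []                       # no hit at or above 90%: unassigned
    if best >= exact:
        # Best hit at or above 99%: keep only hits tied with the best hit.
        return [i for i, s in enumerate(similarities) if s == best]
    # Otherwise keep hits within 1 percentage point of the best hit,
    # never dropping below the 90% floor.
    cutoff = max(best - window, floor)
    return [i for i, s in enumerate(similarities) if s >= cutoff]


print(select_hits([99.5, 99.5, 98.9, 91.0]))   # ties with a >= 99% best hit
print(select_hits([95.0, 94.2, 93.9, 90.0]))   # hits within 1% of 95.0
print(select_hits([89.9, 80.0]))               # nothing at or above 90%
```

Forming the consensus taxonomy from the selected hits is a separate step and is not shown here.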
In short, a taxon (e.g., strain, species or genus) and all OTUs within that taxon were associated with a particular metabolic function if all cultured representatives within the taxon are known to exhibit that function. We note that as the number of cultured strains continues to increase, some of these generalizations may turn out to be false. Furthermore, a substantial fraction of OTUs could not be assigned to any function, thus OTU proportions inside a functional group only apply to the subset of functionally characterized OTUs (although this limitation does not affect the conclusions of this study). In total 465 out of 1965 OTUs were assigned to at least one functional group, yielding in total 518 functional annotations (see Table B.6 for an overview). OTUs without any functional annotation were omitted from the analysis.

We note that FAPROTAX functional groups are not completely one-to-one comparable with metagenomic gene groups, due to ambiguities in the functions potentially performed by some genes (376). To strengthen our confidence in the stability of the 9 considered functional groups, we provide detailed gene-centric functional profiles for multiple related functions (Figs. 3.2A,B and B.10).

B.1.6 Metagenomic sequencing

To assess the functional stability of microbial communities across samples, we performed shotgun environmental DNA sequencing (metagenomics), which allows the detection of known genes in an environment regardless of their host organisms. Extracted DNA was sequenced in 100-bp paired-end fragments on an Illumina HiSeq 2000™. Library preparation and sequencing were done by the Biodiversity Research Centre NextGen Sequencing Facility and followed standard Illumina protocols. All samples were uniquely barcoded and run together on a single lane. The resulting sequence data were preprocessed using Illumina’s CASAVA-1.8.2.
Specifically, output files were converted to fastq format, and sequences were separated by barcode (allowing one mismatched base pair), using the configureBclToFastq.pl script. This yielded a total of 151,308,568 quality-filtered paired-end reads. Reads were trimmed at the beginning and end to increase average read quality, yielding an average forward and reverse read length of 97 and 98 bp, respectively. Sufficiently overlapping paired-end reads were merged using PEAR 0.9.8 with default options (544), yielding 17,007,327 merged reads. Non-merged read pairs were deduplicated using the SOFA pipeline (168, version 1.2) and the KEGG protein reference database (221, release 2011.06.18), in order to reduce potential double-counts during subsequent gene annotation. MetaPathways 2.5 (247) was used for ORF prediction in all merged and non-merged reads (min peptide length 30, algorithm prodigal), yielding 215,140,278 ORFs. Predicted ORFs were taxonomically annotated in MetaPathways using LAST and the NCBI RefSeq protein database (470, release 2015.12.12), and multiple taxonomic annotations were consolidated using a lowest common ancestor algorithm (247). Non-prokaryotic ORFs were excluded from subsequent analysis. LAST annotation of prokaryotic ORFs against the KEGG protein reference database was performed using MetaPathways (KEGG release 2011.06.18, min BSR 0.4, max E-value 10^−6, min score 20, min peptide length 30, top hit), yielding 55,058,696 annotations. Metagenomic KEGG orthologous group (KOG) counts (221) were normalized using the total number of KEGG-annotated sequences per sample (total sum scaling).

To estimate the variability of the 9 functional groups considered in this study, we examined the abundances of selected proxy genes that roughly corresponded to one or more functional groups. These genes were chosen based on the KEGG reference pathway database (221) and were identified using the KOG annotations of prokaryotic ORFs.
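
The total sum scaling mentioned above amounts to dividing each KOG count by the number of KEGG-annotated sequences in the same sample. A minimal sketch follows; the function name, the dictionary layout and the counts are made up for illustration and do not come from the study's pipeline.

```python
# Total sum scaling of per-sample KOG counts (illustrative only).

def total_sum_scaling(kog_counts, n_annotated):
    '''kog_counts: dict mapping KOG identifier -> raw count in one sample.
    n_annotated: total number of KEGG-annotated sequences in that sample.
    Returns a dict of relative abundances.'''
    if n_annotated <= 0:
        raise ValueError('n_annotated must be positive')
    return {kog: count / n_annotated for kog, count in kog_counts.items()}


# Made-up counts for two KOGs in one sample with 100 annotated sequences.
sample = {'K00368': 5, 'K00376': 15}
print(total_sum_scaling(sample, 100))
```

Dividing by the per-sample total makes abundances comparable between samples that were sequenced or annotated to different depths.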
Whenever applicable, multiple KOGs associated with similar metabolic functions (e.g., dissimilatory nitrite reduction to ammonium, nirBD and nrfAH) were combined into a single gene group. An overview of KOGs associated with each function is provided in Table B.7. The resulting metagenomic profiles are given in Figs. 3.2A,B and B.10.

B.1.7 Comparing functional and taxonomic variability

To robustly compare the degree of functional variability and taxonomic variability within functional groups we used multiple statistical measures that are either entirely based on presences and absences (“binary”) or that take into account relative abundances. Specifically, for every functional group, we measured the binary OTU “overlap” between any two samples in terms of the Jaccard overlap index, defined as the number of OTUs detected in both samples, divided by the number of OTUs detected in either of the two samples (527). Hence, a Jaccard overlap of 1 corresponds to complete overlap (regardless of OTU proportions), while a Jaccard overlap of 0 corresponds to no overlap at all. Mean Jaccard overlaps (MJO), i.e., Jaccard overlaps averaged over all bromeliad pairs, were within the range ∼0.2–0.6 for all functional groups (Table B.1). These low MJOs indicate substantial differences in community structure across bromeliads; in principle, however, they could result purely from detection stochasticity, especially of rare OTUs (i.e., due to insufficient sequencing depth) (189). To determine the statistical significance (“P-value”) of these low MJOs, we compared them to random MJOs generated under a null model of random sampling from the regional OTU pool. Specifically, for any given functional group, OTUs were randomly drawn from a multinomial distribution corresponding to the OTU proportions in the regional OTU pool, while the number of draws per bromeliad was equal to the number of sequences assigned to the functional group in that bromeliad.
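
The Jaccard overlap, the mean Jaccard overlap and the random-sampling null model can be sketched in a few lines of standard-library Python. The sketch is illustrative rather than the analysis code: it assumes that the regional OTU pool is the sum of counts over all samples, it estimates the P-value as the fraction of random MJOs that fall below the observed one, and all names and toy counts are hypothetical.

```python
import random
from itertools import combinations


def jaccard_overlap(a, b):
    '''OTUs found in both samples divided by OTUs found in either sample.'''
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0   # convention chosen here for two empty samples
    return len(a & b) / len(union)


def mean_jaccard_overlap(samples):
    '''Average Jaccard overlap over all pairs of samples (sets of OTUs).'''
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError('need at least two samples')
    return sum(jaccard_overlap(a, b) for a, b in pairs) / len(pairs)


def mjo_p_value(counts_per_sample, iterations=1000, seed=0):
    '''counts_per_sample: list of dicts (OTU -> sequence count), one dict per
    sample, restricted to a single functional group.'''
    rng = random.Random(seed)
    # Assumed regional pool: counts summed over all samples.
    pool = {}
    for counts in counts_per_sample:
        for otu, n in counts.items():
            pool[otu] = pool.get(otu, 0) + n
    otus = list(pool)
    weights = [pool[otu] for otu in otus]
    depths = [sum(counts.values()) for counts in counts_per_sample]
    observed = mean_jaccard_overlap(
        [{otu for otu, n in counts.items() if n > 0}
         for counts in counts_per_sample])
    lower = 0
    for _ in range(iterations):
        # Multinomial sampling: one weighted draw per sequence in each sample.
        simulated = [set(rng.choices(otus, weights=weights, k=depth))
                     for depth in depths]
        if mean_jaccard_overlap(simulated) < observed:
            lower += 1
    return lower / iterations


# Toy data: two samples that share no OTUs at all.
toy = [{'otu_a': 10}, {'otu_b': 10}]
print(mjo_p_value(toy, iterations=100))
```

With only 1000 iterations, an estimate of zero is best reported as P < 0.001 rather than as an exact zero.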
The P-value of an observed MJO was defined as the probability that a random MJO would be lower than the observed MJO, and was estimated based on 1000 iterations. All functional groups had a significantly low MJO (P < 0.001), showing that low overlaps are not just the result of detection stochasticity. We note that the Jaccard overlap of gene groups was 1 for all sample pairs, since all considered gene groups were detected in all samples.

To account for OTU or gene group abundances, we also considered the Morisita overlap index (332). This index is particularly suited for cases where abundance estimates are obtained at varying sequencing depths (527), and its interpretation is analogous to the Jaccard index (results in Table B.1). We mention that the Morisita overlap of gene groups (0.98 when averaged across sample pairs) is slightly below 1, because gene group abundances do vary between samples.

To verify the robustness of our conclusions based on overlap indices, we also examined the coefficients of variation (CV, i.e., the standard deviation divided by the mean) of relative gene group abundances on the one hand, and the CVs of OTU proportions within individual functional groups, on the other hand. Because each particular functional group contained multiple OTUs, we averaged the CV over all OTUs within the functional group. We note that the considered gene groups (Fig. 3.2A) only cover a small fraction of the total detected gene pool (∼5% of annotated metagenomic sequences). Hence, to minimize the dependence of the CV of any particular gene group on the choice and coverage of other gene groups, we considered gene group abundances relative to the total number of annotated metagenomic sequences in each sample. An overview of CVs is provided in Table B.2.
Observe that OTU CVs are generally an order of magnitude higher than gene group CVs, consistent with our conclusions based on overlap indices.

B.1.8 Metric multidimensional scaling and coloring

Pairwise dissimilarities between taxonomic community profiles reported here were calculated using the Bray-Curtis metric, which is widespread in biogeographical studies (266). Other dissimilarity metrics (Canberra and Hellinger) generally yielded similar conclusions, so they are not further discussed here. Each dissimilarity matrix was used to visualize differences in community composition via metric multidimensional scaling (MDS; 42). In MDS, sample points are embedded into a reduced number of dimensions (e.g., 3) such that the pairwise Euclidean sample distances in the embedding “best match” the original dissimilarities. Hence, points that are closer to each other in the embedding correspond to samples with more similar microbial communities. The embedding was performed by minimizing the Kruskal stress, using the scikit-learn package (363). 3-dimensional MDS coordinates were mapped to color space by associating each coordinate with one color channel (red, green and blue; Fig. B.6); hence, sample pairs with similar microbial communities are colored similarly, allowing easier identification. In all reported cases, the Kruskal stress of the MDS embeddings was below 0.25.

B.1.9 Phylogenetic community structure

To assess whether community assembly within individual functional groups was driven by purely stochastic processes (such as lottery effects; 52) or was subject to deterministic selection, we examined the phylogenetic distances between functionally similar OTUs co-occurring in each sample.
The phylogenetic distance (PD) between two OTUs was calculated as the sum of branch lengths needed to traverse the phylogenetic tree from one OTU to the other. The mean phylogenetic distance (MPD) in a community of $M$ OTUs was calculated as

$$\mathrm{MPD} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{i-1} d_{ij} N_i N_j}{\sum_{i=1}^{M}\sum_{j=1}^{i-1} N_i N_j}, \qquad \text{(B.1.1)}$$

where $N_i$ is the abundance of OTU $i$ in the community and $d_{ij}$ is the phylogenetic distance between OTUs $i$ and $j$. Note that this definition of MPD is almost equivalent to the “phylogenetic diversity” introduced by Chave et al. (66), with the difference that Chave et al. defined $d_{ij}$ as the divergence time between two OTUs (which is half of their phylogenetic distance in most cases).

For each sample, the MPD was compared to the expected MPD ($\overline{\mathrm{MPD}}$) under the null hypothesis of random phylogenetic relationships between OTUs. Specifically, the distribution of MPDs under the null hypothesis was estimated by randomly and repeatedly permuting OTUs in the phylogenetic tree 1000 times, while keeping their proportions in each sample fixed. OTUs were permuted independently for each sample, and permutations were restricted to OTUs within the same functional group. The standardized effect size (SES) of a sample, which quantifies the deviation of the observed MPD from the expectation of the null hypothesis, was calculated as

$$\mathrm{SES} = \frac{\mathrm{MPD} - \overline{\mathrm{MPD}}}{\sigma_{\mathrm{MPD}}}, \qquad \text{(B.1.2)}$$

where $\sigma_{\mathrm{MPD}}$ is the standard deviation of random MPDs generated by the null hypothesis. Hence, a strongly positive or strongly negative SES corresponds to strong phylogenetic overdispersion or underdispersion, respectively. The SESs for all samples and within each functional group are shown in Fig. 3.4.

For several functional groups, SESs are either predominantly positive or predominantly negative across samples, indicating that the MPDs within these groups may not be random (i.e., are inconsistent with the null hypothesis).
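The following Python sketch illustrates Eqs. (B.1.1) and (B.1.2) for a single sample and a single functional group. It is a minimal example under stated assumptions, not the implementation used in the study: pairwise phylogenetic distances are assumed to be precomputed and stored in a dictionary keyed by OTU pairs, the abundance dictionary is assumed to list every OTU of the functional group (with zeros for OTUs absent from the sample), and the population standard deviation is used for $\sigma_{\mathrm{MPD}}$, a choice the text does not specify.

```python
import random
import statistics


def mpd(abundances, dist):
    # Abundance-weighted mean phylogenetic distance over OTU pairs (Eq. B.1.1).
    otus = list(abundances)
    num = den = 0.0
    for i in range(len(otus)):
        for j in range(i):
            weight = abundances[otus[i]] * abundances[otus[j]]
            num += dist[frozenset((otus[i], otus[j]))] * weight
            den += weight
    return num / den  # requires at least two OTUs with non-zero abundance


def ses(abundances, dist, iterations=1000, seed=0):
    # Null model: randomly reassign OTUs to tree tips, abundances kept fixed.
    rng = random.Random(seed)
    otus = list(abundances)
    observed = mpd(abundances, dist)
    null = []
    for _ in range(iterations):
        shuffled = otus[:]
        rng.shuffle(shuffled)
        permuted = {new: abundances[old] for old, new in zip(otus, shuffled)}
        null.append(mpd(permuted, dist))
    # ASSUMPTION: population standard deviation of the null MPDs (Eq. B.1.2).
    return (observed - statistics.mean(null)) / statistics.pstdev(null)
```

In this sketch a negative return value of `ses` corresponds to phylogenetic underdispersion and a positive value to overdispersion, following the sign convention of Eq. (B.1.2).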
The statistical significance (P-value) of these imbalances (i.e., the difference between the number of positive and negative SES across samples) was defined as the probability that random SESs generated by the null hypothesis would display comparable or stronger imbalances in magnitude, and was estimated using 1000 random iterations. Hence, a low P-value means that the observed imbalance of SESs within a functional group is unlikely to have occurred by chance.

B.1.10 Comparing OTU proportions to environmental variables

To test whether the variation of taxonomic composition within functional groups can be attributed to variation in environmental conditions, we constructed multivariate non-linear regression models for each OTU in each functional group, using 21 environmental variables as predictors (overview of environmental variables in Table B.5). Specifically, we used non-linear kernel ridge regression (KRR) with Gaussian radial kernels, implemented by the scikit-learn software (363). An important advantage of KRR models over conventional (e.g., multivariate linear) regression models is the use of regularization, which reduces the risk of overfitting by penalizing excessive model coefficients, thus avoiding excessive model complexity. The final model complexity depends on a parameter that influences the extent to which coefficients are penalized. KRR models generally present a more robust alternative to step-wise model selection methods (88). Non-linearities in the data are addressed using the “kernel-trick”, which replaces predictors with higher-dimensional variables using Gauss-shaped functions prior to linear regression (363).
The predictive power of each KRR model was measured in terms of the 10-fold cross-validated coefficient of determination ($R^2_{\mathrm{cv}}$), which represents the achievable coefficient of determination when only a random subset (90%) of the samples are used for fitting and the remaining samples (10%) are used for independent testing (421). Hence, $R^2_{\mathrm{cv}}$ provides a more conservative estimate of the predictive power than the classical coefficient of determination ($R^2$). The $R^2_{\mathrm{cv}}$ was determined via 10-fold Monte-Carlo cross-validation with 500 random iterations. The penalization parameter as well as the Gaussian kernel radius were optimized for each KRR model using grid search and maximization of the achievable $R^2_{\mathrm{cv}}$.

To further assess the potential influence of individual environmental variables on taxonomic community composition, we calculated Spearman rank correlations between OTU proportions within functional groups on the one hand, and environmental variables on the other hand. For each functional group and each environmental variable, we calculated the average magnitude of correlations across all OTUs (“average absolute correlation”; Fig. 3.5). Hence, a large average absolute correlation means that OTUs in a particular functional group tend to be strongly (negatively or positively) correlated with a particular environmental variable. The statistical significance of large average absolute correlations was estimated using 1000 random permutations of columns (i.e., samples) in the OTU table.

B.1.11 Comparing community dissimilarities to geographical distances

To examine whether dispersal limitation across bromeliads had an effect on the composition within individual functional groups, we calculated Spearman rank Mantel correlations between pairwise geographical distances and pairwise dissimilarities (in terms of OTU proportions within functional groups).
To ensure the robustness of our conclusions, we considered three different dissimilarity metrics (Bray-Curtis, Canberra and Hellinger), all of which are widely used in ecology (266). We estimated the statistical significance of correlations using 1000 random permutations of the rows and columns in the geographical distance matrix (rows and columns permuted similarly). None of the considered dissimilarity metrics yielded any statistically significant correlations to geographical distance. An overview of results for the Bray-Curtis dissimilarity metric is given for illustration in Table B.4. A visual comparison of geographical distances and Bray-Curtis dissimilarities is shown in Fig. B.5.

B.1.12 Comparing OTU co-occurrences to a null model

To examine whether OTU co-occurrences across samples follow non-random patterns (e.g., resulting from competitive exclusion), we considered a statistical quantity known as the checkerboard score (“C-score”) of the OTU presence-absence matrix (159). The C-score is defined as

$$C = \frac{2}{M(M-1)} \sum_{i=1}^{M}\sum_{j=1}^{i-1} (N_i - N_{ij})(N_j - N_{ji}), \qquad \text{(B.1.3)}$$

where $M$ is the total number of considered OTUs, $N_i$ is the number of samples containing OTU $i$ and $N_{ij}$ is the number of samples containing both OTUs $i$ and $j$. Hence, for fixed $N_i$, the C-score becomes larger if OTUs co-occur less frequently (i.e., the $N_{ij}$ are smaller). To assess whether an observed C-score was likely due to chance (i.e., if OTUs occur independently of each other), we compared it to the C-score distribution of several random presence-absence matrices generated under the “fixed-fixed” null model (159). This null model preserves the number of samples containing each OTU as well as the number of OTUs present in each sample and in each functional group, and is thus suitable for detecting non-random co-occurrence patterns across samples that may differ in terms of OTU richness, while maintaining a low false positive error rate (76).
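For concreteness, Eq. (B.1.3) can be evaluated directly from a presence-absence table, as in the short Python sketch below. The representation (a dictionary mapping each OTU to the set of samples in which it was detected) and the function name are assumptions made for this illustration; the randomization under the fixed-fixed null model is not reproduced here.

```python
from itertools import combinations


def c_score(presence):
    # presence maps each OTU to the set of samples in which it was detected.
    otus = list(presence)
    m = len(otus)
    total = 0
    for a, b in combinations(otus, 2):
        shared = len(presence[a] & presence[b])  # N_ij
        total += (len(presence[a]) - shared) * (len(presence[b]) - shared)
    return 2 * total / (m * (m - 1))  # average over all OTU pairs (Eq. B.1.3)
```

Two OTUs that never share a sample contribute the largest possible term for their occurrence counts, whereas two OTUs with identical occurrence patterns contribute zero.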
Specifically, if C-scores generated by the null hypothesis are typically lower than the observed C-score, this would mean that OTUs tend to exclude each other more often than expected by chance (i.e., are segregated). We calculated the C-score and its deviation from the null model separately for each functional group. Randomized presence-absence matrices corresponding to the null model were generated using the “curveball” algorithm (456). We used 1000 random matrices to assess the statistical significance of C-scores. An overview of results is given in Table B.3.

B.1.13 Sequence data availability

Molecular sequence data reported in this chapter have been deposited in the NCBI BioProject database (http://www.ncbi.nlm.nih.gov/bioproject) and will be made public upon publication of this work (BioProject no. PRJNA321235; SRA accession nos. SRP074855 and SRP074855).

Figure B.1: Taxonomic composition within functional groups (genus level). Proportions of prokaryote genera within individual functional groups (one color per genus, one bar stack per sample, one plot per functional group). Samples are sorted alphabetically as in Fig. 3.2.

Figure B.2: Taxonomic composition within functional groups (family level). Proportions of prokaryote families within individual functional groups (one color per family, one bar stack per sample, one plot per functional group). Samples are sorted alphabetically as in Fig. 3.2.

Figure B.3: Taxonomic composition within functional groups (order level). Proportions of prokaryote orders within individual functional groups (one color per order, one bar stack per sample, one plot per functional group). Samples are sorted alphabetically as in Fig. 3.2.

Figure B.4: Taxonomic composition within functional groups (class level).
Proportions of prokaryote classes within individual functional groups (one color per class, one bar stack per sample, one plot per functional group). Samples are sorted alphabetically as in Fig. 3.2.

Figure B.5: Geographical distances vs dissimilarities within functional groups. Bray-Curtis dissimilarities between samples (in terms of OTU proportions within individual functional groups), compared to geographical sample distances (one point per sample pair, one plot per functional group).

Figure B.6: Geographic location vs composition within functional groups. Each plot: Geographical sample locations in terms of longitude and latitude (one point per sample). Points are colored according to Bray-Curtis dissimilarities between samples, in terms of OTU proportions within individual functional groups (one plot per functional group). Similar colors correspond to similar compositions within functional groups (see Methods for details).

Figure B.7: PARAFAC model components. Left column: Excitation-emission matrices of the 4 PARAFAC model components, estimated for the excitation-emission spectra of the bromeliad detrital samples. Right column: Excitation and emission spectra corresponding to the PARAFAC components.

Figure B.8: Modeling EEMs of bromeliad DOC with PARAFAC. Left column: Measured excitation-emission matrices (EEM) for a subset of bromeliad samples that illustrates the detected fluorophore diversity (B1, B4, B12 and B31). Middle column: Corresponding EEMs modeled by the 4-component PARAFAC model. Right column: Corresponding residual EEMs. White horizontal bands cover EEM pixels that we omitted from the analysis due to spurious excitation troughs erroneously detected by the fluorometer.
Diagonal white bands cover Rayleigh and Raman scatter, which we also omitted from the analysis.

Figure B.9: 16S rDNA rarefaction curves (OTU richness). Each plot: Expected number of observed distinct OTUs at various sequencing depths for a particular sample, determined through repeated random rarefactions.

Figure B.10: Detailed functional community profiles. (A) Detailed gene-centric prokaryotic functional profiles, in terms of functional group proportions inferred from metagenomic sequences (one color per function, one column per sample). (B) Same as (A), but focusing on the 10 least abundant functional groups.

Figure B.11: Functional redundancy at the genus level. Association of functional groups (columns) with members of various prokaryote genera (rows). A darker color corresponds to a higher relative contribution of a genus (in terms of the number of associated OTUs) to a functional group. Rows and columns are sorted according to the number of non-zero entries within them. For analogous plots at the OTU, family and class level, see Figs. 3.3, B.12 and B.13, respectively.

Figure B.12: Functional redundancy at the family level. Association of functional groups (columns) with members of various prokaryote families (rows) across all samples. A darker color corresponds to a higher relative contribution of a family (in terms of the number of associated OTUs) to a functional group. Rows and columns are sorted according to the number of non-zero entries within them. For analogous plots at the OTU, genus or class level, see Figs. 3.3, B.11 and B.13, respectively.

Figure B.13: Functional redundancy at the class level. Association of functional groups (columns) with members of various microbial classes (rows).
A darker color corresponds to a higher relative contribution of a class (in terms of the number of associated OTUs) to a functional group. Rows and columns are sorted according to the number of non-zero entries within them. For analogous plots at the OTU, genus and family level, see Figs. 3.3, B.11 and B.12, respectively.

Table B.1: OTU overlap between samples. Overview of pairwise OTU overlaps between any two samples in terms of the Jaccard index (number of OTUs shared by both samples divided by the number of OTUs present in at least one of the two samples), averaged over all sample pairs, and overall overlap across all samples (number of OTUs present in all samples divided by the number of OTUs present in at least one sample), within individual functional groups. The last column lists the mean pairwise Morisita overlap index, which takes into account OTU proportions.

group                      Jaccard overlap  overall overlap  Morisita overlap
aerobic chemoheterotrophs  0.42             0.018            0.20
cellulolytic               0.23             0.0              0.28
fermenters                 0.39             0.0              0.28
methanogens                0.62             0.077            0.31
methylotrophs              0.57             0.042            0.37
nitrogen respirers         0.48             0.0              0.35
photoautotrophs            0.26             0.0              0.24
sulfate respirers          0.32             0.0              0.27
ureolytic                  0.36             0.0              0.38

Table B.2: Coefficients of variation.
Coefficient of variation (CV = standard deviation divided by mean) for the relative abundance of each gene group, and average coefficients of variation of OTU proportions within each functional group (averaged over all OTUs in a functional group).

gene group                       CV of relative gene abundances
heterotrophy (PTS)               0.20
oxygen respiration (cox)         0.11
carbon fixation                  0.098
monosaccharide ABC transporters  0.23
cellulolysis                     0.31
fermentation                     0.071
methanogenesis                   0.56
methylotrophy                    0.65
nitrogen respiration             0.15
ureolysis                        0.29
photoautotrophy                  0.54
dissimilatory sulfur metabolism  0.59

functional group           average CV of OTU proportions
aerobic chemoheterotrophs  2.9
cellulolytic               2.7
fermenters                 3.1
methanogens                2.3
methylotrophs              2.5
nitrogen respirers         2.4
photoautotrophs            3.3
sulfate respirers          3.1
ureolytic                  2.9

Table B.3: OTU co-occurrence patterns. Overview of checkerboard analysis of OTU co-occurrences within each functional group, including standardized effect sizes (SES) of the C-scores, statistical significances (P) and interpretation of co-occurrence patterns.

group                      SES   P       interpretation
aerobic chemoheterotrophs  8.07  <0.001  segregated
cellulolytic               2.6   0.008   segregated
fermenters                 7.4   <0.001  segregated
methanogens                1.5   0.07    random, slightly segregated
methylotrophs              0.23  0.35    random
nitrogen respirers         1.7   0.05    segregated
photoautotrophs            3.6   0.002   segregated
sulfate respirers          6.6   <0.001  segregated
ureolytic                  0.39  0.31    random

Table B.4: Geographical distances vs taxonomic dissimilarities.
Overview of Mantel Spearman rank correlation tests between pairwise geographical distances and Bray-Curtis dissimilarities (in terms of OTU proportions within functional groups).

group                      correlation  statistical significance
aerobic chemoheterotrophs  -0.043       0.30
cellulolytic               -0.033       0.36
fermenters                 0.017        0.39
methanogens                -0.016       0.46
methylotrophs              0.067        0.18
nitrogen respirers         0.12         0.08
photoautotrophs            -0.016       0.43
sulfate respirers          0.027        0.31
ureolytic                  -0.025       0.39

Table B.5: Environmental variables. Overview of physicochemical environmental variables, including mean, standard deviation and measurement units.

variable                mean   std.   units
absorption at 240 nm    2.12   1.39   -
detrital volume         5.34   4.42   mL
CH4 concentration       5.34   4.42   µM
shading                 42.3   16.8   % cover of face-up view
tank depth              10.6   2.31   cm
DOC                     80.9   49.8   mg·L⁻¹
plant height            56.0   8.17   cm
light intensity         1064   822    µmol·m⁻²·s⁻¹ (photosynthetic photon flux density)
number of leaves        8.31   2.22   -
total nitrogen          83.4   55.9   µM
total phosphorus        3.15   2.76   µM
molar N:P ratio         29.6   6.8    -
PARAFAC 1               212    147    -
PARAFAC 2               64.9   63.9   -
PARAFAC 3               76.9   218    -
PARAFAC 4               31.3   32.4   -
pH                      5.21   0.71   -
salinity                0.047  0.031  PSU
total suspended solids  73.0   55.1   mg·L⁻¹
total volume            63.6   24.4   mL
turbidity               145    508    NTU

Table B.6: OTUs per functional group. Number of OTUs assigned to each functional group, compared to the total number of taxonomically annotated OTUs. Some OTUs were assigned to multiple functional groups.

functional group           OTUs  fraction (%)
aerobic chemoheterotrophs  225   11.4
cellulolytic               14    0.7
fermenters                 103   5.2
methanogens                13    0.7
methylotrophs              24    1.2
nitrogen respirers         17    0.9
photoautotrophs            61    3.1
sulfate respirers          40    2.0
ureolytic                  21    1.1

Table B.7: KOG-function associations.
KEGG orthologous groups (KOG) associated with various functions in the metagenomic sequences. Each entry lists the function, followed by its associated KOGs.

fermentation: K01568, K13951, K00114, K00002, K04022, K00128, K00129, K00016, K00102, K00656, K00825, K00004, K00929, K00248, K00239, K00240, K00241, K00242

carbon fixation: K01595, K01601, K01602, K03737

cellulolysis: K19356, K19357, K01179, K01225, K19668

denitrification (norBC): K04561, K02305

denitrification (nosZ): K00376

dissimilatory sulfur metabolism: K11180, K11181, K00394, K00395, K17219, K17220, K17221, K16952, K17222, K17223, K17224, K17225, K17226, K17227

heterotrophy (PTS): K08483, K02784, K02777, K02778, K02779, K02802, K02803, K02804, K02790, K02791, K02763, K02764, K02765, K02808, K02809, K02810, K02755, K02756, K02757, K02752, K02753, K02817, K02818, K02819, K11191, K11192, K02749, K02750, K02786, K02787, K02788, K02759, K02760, K02761, K02798, K02799, K02800, K11198, K11199, K11200, K02793, K02794, K02795, K02796, K19506, K19507, K19508, K19509, K11194, K11195, K11196, K02771, K02812, K02813, K02814, K02815, K02744, K02745, K02746, K02747, K10984, K10985, K10986, K17464, K17465, K17466, K17467, K02781, K02782, K02783, K02773, K02774, K02775, K02821, K02822, K03475, K11183, K02768, K02769, K02770, K08484, K08485, K02806, K17329, K17330, K17331, K17244, K17245, K17246, K17234, K17235, K17236, K17326, K17327, K17328, K10546, K10547, K10548, K07323, K02067, K02066, K07122, K02065

hydrogen oxidoreduction: K00532, K00533, K00534, K06441, K18016, K18017, K18023, K00436, K05586, K05587, K05588, K18005, K18006, K18007

methanogenesis: K00399, K00401, K00402, K03421, K03422, K03388, K03389, K03390

methanol oxidation: K14028, K14029, K17066, K14028, K16254, K16255, K14029, K16256, K16257, K16258, K16259, K16260, K00093, K17066

methylotrophy: K14028, K14029, K17066, K14028, K16254, K16255, K14029, K16256, K16257, K16258, K16259, K16260, K00093, K17066, K10713, K14028, K14029, K17066, K16157, K16158, K16159, K16160, K16161, K16162

monosaccharide ABC transporters: K10196, K10197, K10198, K10199, K17315, K17316, K17317, K10439, K10440, K10441, K06726, K10537, K10538, K10539, K10540, K10541, K10542, K10543, K10544, K10545, K10549, K10550, K10551, K10552, K10553, K10554, K10555, K10556, K10557, K10558, K10559, K10560, K10561, K10562, K17202, K17203, K17204, K17205, K17206, K17207, K17208, K17209, K17210, K17237, K17238, K17239, K17240, K17321, K17322, K17323, K17325, K17324, K05813, K05814, K05815, K05816, K10112

nitrate respiration: K02567, K02568, K00370, K00371, K00374, K00373

nitrite reduction to ammonium: K00362, K00363, K03385, K15876

nitrite respiration: K00368, K15864, K00362, K00363, K03385, K15876

nitrogen respiration: K04561, K02305, K00376, K00368, K15864, K02567, K02568, K00370, K00371, K00374, K00373, K00362, K00363, K03385, K15876

oligosaccharide, polyol and lipid ABC transporters: K10108, K10109, K10110, K10111, K15770, K15771, K15772, K10117, K10118, K10119, K10112, K10188, K10189, K10190, K10191, K10227, K10228, K10229, K10111, K10232, K10233, K10234, K10235, K10192, K10193, K10194, K10195, K17241, K17242, K17243, K17318, K17319, K17320, K10236, K10237, K10238, K17311, K17312, K17313, K17314, K10200, K10201, K10202, K10240, K10241, K10242, K10112

oxygen respiration (cox): K02277, K02276, K02274, K15408, K02275, K02262, K02256, K02261, K02263, K02264, K02265, K02266, K02267, K02268, K02269, K02270, K02271, K02272, K02273, K02258, K02259, K02260, K00404, K00405, K15862, K00407, K00406, K02259, K02259, K00406, K00407, K00405, K00404, K00425, K00426

photoautotrophy: K02703, K02704, K02705, K02706, K02707, K02708, K02709, K02710, K02711, K02712, K02713, K02714, K02716, K02717, K02718, K02719, K02720, K02721, K02722, K02723, K02724, K03541, K03542, K08901, K08902, K08903, K08904, K08944, K08940, K08941, K08942, K08943, K08946, K08945, K08947, K08948, K08949, K08950, K08951, K08952, K08953, K08954, K08926, K08927, K13992, K08928, K08929, K13994, K13991, K08939, K08930, K02689, K02690, K02691, K02692, K02693, K02694, K08905, K02695, K02702, K14332, K02701, K02700, K02699, K02698, K02697, K02696, K02638, K02639, K02641, K08906

sulfate sulfide oxidoreduction (aprAB): K00394, K00395

sulfate sulfide oxidoreduction (dsrAB): K11180, K11181

ureolysis: K01427, K01428, K01429, K01430, K14048, K14541, K01457, K01941

Appendix C

Chapter 4: Supplemental material

C.1 Methods

C.1.1 MCM overview

MCM is a mathematical and computational framework for the construction, simulation, statistical analysis and calibration of microbial community models (Fig 4.2). Below we give a brief overview of MCM’s mathematical structure and functionality. A thorough user manual, including mathematical details and several step-by-step examples, was published by Louca and Doebeli (284) and is also available online at: http://www.zoology.ubc.ca/MCM.

Mathematically, microbial community (MC) models in MCM correspond to a combination of differential equations and optimization problems. In the simplest case, a model considers the concentrations of $S$ unicellular species, the concentrations of $M$ chemical substances (metabolites) in a single extracellular metabolite pool, and the cell-specific rates of $R$ biologically catalyzed reactions. The environment (i.e., the medium containing the cells and extracellular metabolites) is generally assumed to be well mixed, although compartmentalized ecosystem models are also possible (see the user manual).

Each species is characterized by its metabolic potential, that is, the subset of reactions that it can catalyze as well as any metabolites that it is able to take up or export. The rate at which each cell performs a specific reaction at a given moment depends on the species as well as on the current extracellular metabolite concentrations.
At all times, intracellular metabolite fluxes are assumed to be balanced, so that the intracellular reaction rates completely determine the rates at which metabolites are exported or taken up by cells. For each particular reaction performed by a particular species, a model may specify limits regarding the forward and/or backward reaction rate. Similarly, for each metabolite utilized or produced by a particular species, a model may specify limits regarding its uptake and/or export rate. This constraint-based metabolic modeling is also known as Flux Balance Analysis (FBA; 354).

Each reaction can contribute to a cell’s biosynthesis rate (e.g., if the reaction produces a specific amino acid), and cells are assumed to optimize their biosynthesis rate by appropriate choice of their reaction rates, within the constraints imposed by FBA. Mathematically, if $\mathbf{F}_s$ is the vector listing the cell-specific net metabolite uptake rates of species $s$, $N_s$ is the cell concentration of species $s$, $\mathbf{C}$ is the vector listing the extracellular concentrations of all metabolites and $\mathbf{f}$ is the vector listing any additional net metabolite fluxes into the environment (e.g., external nutrient supply), then $\mathbf{C}$ changes according to the differential equation

$$\frac{d\mathbf{C}}{dt} = -\sum_{s=1}^{S} N_s \mathbf{F}_s + \mathbf{f}. \qquad \text{(C.1.1)}$$

The sum in Eq. (C.1.1) iterates over all species and represents the net metabolite uptake by the entire microbial community. Since intracellular metabolite fluxes are balanced, $\mathbf{F}_s$ is given by

$$\mathbf{F}_s = -\mathbf{S} \cdot \mathbf{H}_s, \qquad \text{(C.1.2)}$$

where $\mathbf{H}_s$ is the vector listing the cell-specific rates of all reactions for species $s$ and $\mathbf{S}$ is the stoichiometric matrix of all reactions. Specifically, each entry $S_{mr}$ is the stoichiometric coefficient for metabolite $m$ in reaction $r$. For example, the reaction

$$\mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2} + 30\,\mathrm{ADP} \longrightarrow 6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} + 30\,\mathrm{ATP} \qquad \text{(C.1.3)}$$

has stoichiometric coefficients −1, −6, −30, +6, +6 and +30 for the compounds C6H12O6, O2, ADP, CO2, H2O and ATP, respectively.
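As a small worked illustration of the sign convention in Eq. (C.1.2), the Python sketch below applies the stoichiometric coefficients of reaction (C.1.3) to a single reaction rate. The rate value and all variable names are hypothetical and serve only to show how net uptake follows from the stoichiometry; this is not MCM code.

```python
# Rows: metabolites; columns: reactions (here only reaction C.1.3).
METABOLITES = ['C6H12O6', 'O2', 'ADP', 'CO2', 'H2O', 'ATP']
S = [[-1], [-6], [-30], [6], [6], [30]]


def net_uptake(stoich, rates):
    # F_s = -S . H_s: positive entries are consumed, negative ones produced.
    return [-sum(s_mr * h_r for s_mr, h_r in zip(row, rates)) for row in stoich]


H = [2.0]  # hypothetical cell-specific reaction rate (arbitrary units)
F = dict(zip(METABOLITES, net_uptake(S, H)))
```

With this convention, a reaction rate of 2 corresponds to a net consumption of 2 units of glucose and 12 units of O2, and a net production of 12 units of CO2 (a negative uptake).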
Note that some entries in $\mathbf{H}_s$ may be identically zero, for example if the species lacks the capacity to perform certain reactions.

At any moment, $\mathbf{H}_s$ is the solution to a linear optimization problem that maximizes the biosynthesis rate (which is a linear function of $\mathbf{H}_s$), given the constraints imposed on $\mathbf{H}_s$ as well as $\mathbf{F}_s$. These constraints can depend on the metabolite concentrations $\mathbf{C}$, and hence $\mathbf{H}_s$ and $\mathbf{F}_s$ depend on $\mathbf{C}$. For example, the maximum possible O2 uptake rate may be a Michaelis-Menten-type function of environmental O2 concentrations (211), and hence the optimal rates of aerobic pathways may depend on environmental O2 concentrations. Cell-specific biosynthesis rates are translated to per-capita cell birth rates by dividing by the cell mass, $\mu_s$. On the other hand, cell loss is described by an exponential decay rate that accounts, e.g., for cell death or dilution. Hence, the cell concentration $N_s$ changes according to the differential equation

$$\frac{dN_s}{dt} = -\frac{N_s}{\tau_s} + \frac{N_s}{\mu_s} \cdot B_s(\mathbf{H}_s(\mathbf{C})), \qquad \text{(C.1.4)}$$

where $\tau_s$ is the expected lifetime and $B_s$ is the cell-specific biosynthesis rate as a function of $\mathbf{H}_s$. We mention that in more general models, $\mathbf{H}_s$, $\mathbf{f}$ and $\tau_s$ may also depend on arbitrary environmental variables, $\mathbf{E}$, which themselves can be additional dynamical model variables or be explicitly specified as part of the model (details in the user manual). For example, reaction rates may be limited by temperature (49) and temperature may be an explicitly controlled environmental variable.

In MCM, models are specified in plain-text configuration files that define all metabolites, reactions, cell species as well as environmental variables. MCM translates these models into differential equations and linear optimization problems and solves them numerically. MCM is controlled through special scripts, which may contain commands for running simulations, fitting parameters or simply modifying technical parameters.
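To make the coupling between Eqs. (C.1.1) and (C.1.4) concrete, the following Python sketch integrates a deliberately simplified toy model with a forward Euler scheme. It is not MCM and does not use MCM's configuration format: it assumes a single species and a single metabolite, a Michaelis-Menten uptake limit, and a biosynthesis rate proportional to uptake, so that the linear optimization step reduces to choosing the uptake limit. All parameter names are hypothetical.

```python
def simulate(c0, n0, v_max, k_half, yield_per_mol, mu, tau, supply, dt, steps):
    # c: extracellular metabolite concentration; n: cell concentration.
    c, n = c0, n0
    for _ in range(steps):
        # The optimum of the (here trivial) linear program is the uptake limit.
        uptake = v_max * c / (k_half + c)
        biosynthesis = yield_per_mol * uptake   # B_s, linear in the rate
        dc = -n * uptake + supply               # Eq. (C.1.1)
        dn = -n / tau + n * biosynthesis / mu   # Eq. (C.1.4)
        c = max(c + dt * dc, 0.0)               # keep concentration non-negative
        n = n + dt * dn
    return c, n


# Example: batch culture without external supply and without cell loss.
c_end, n_end = simulate(c0=10.0, n0=1.0, v_max=1.0, k_half=1.0,
                        yield_per_mol=1.0, mu=1.0, tau=float('inf'),
                        supply=0.0, dt=0.01, steps=100)
```

In a realistic model with several reactions and constraints, the line computing `uptake` would be replaced by the solution of a linear program at every time step.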
MCM includes tools for the conversion of conventional genome-scale FBA models, such as those generated by the Model SEED pipeline (184) based on sequenced genomes, into a draft MC model.

MCM can accommodate microbial communities comprising genome-based cell models with arbitrary environmental variables, metabolite exchange kinetics and regulatory mechanisms. For example, environmental variables may be stochastic processes (e.g., representing climate), or specified using measured data (e.g., redox potential in bioreactor experiments), or depend on metabolite concentrations (e.g., pH determined by acetate concentration) or even be dynamical (e.g., temperature increasing at a rate proportional to biomass production rates). This versatility allows for the incorporation of complex environmental feedbacks, such as host immune responses in gut microbiota (223). Metabolite uptake and export rate limits can be arbitrary functions of metabolite concentrations or environmental variables. Similar interdependencies are possible for reaction rate limits, thus allowing the inclusion of inhibitory or regulatory mechanisms (84). Metabolite concentrations can be explicitly specified, e.g., using measured time series, or depend dynamically on microbial export and other external fluxes. Effects of phage predation (204), reaction energetics (386) or stochastic environments can also be incorporated.

MCM keeps track of a multitude of output variables such as cell densities, reaction rates, metabolite concentrations and metabolite exchange rates. Because each reaction can be formally associated with a particular enzyme, in turn encoded by a particular gene, MCM also makes predictions about gene densities as a product of cell densities and gene copy numbers per cell. Metabolic activity statistics (e.g., Fig 4.6A,B) facilitate the identification of metabolic interactions such as cross-feeding (333).
The predicted time courses of output variables can be statistically evaluated against time series ranging from chemical concentrations and rate measurements to cell densities and metagenomics.

MC models can include arbitrary abstract (symbolic) numeric parameters with a predefined value range or probability distribution. Symbolic parameters can represent, for example, stoichiometric coefficients, gene copy numbers, cell life expectancies, half-saturation constants or environmental variables. The inclusion of symbolic parameters enables a high-level analysis of microbial communities: for example, MCM can automatically calibrate (fit) unknown symbolic parameters to time series using maximum-likelihood parameter estimation (113). The likelihood of the data, given a particular parameter choice, is calculated by assuming a mixed deterministic-stochastic model in which the deterministic part is given by the model predictions, and the stochastic part is given by normally distributed errors. The likelihood is maximized using an iterative optimization algorithm involving step-wise parameter adjustments and repeated simulations. Other fitting algorithms are also available, such as maximization of the average coefficient of determination ($R^2$), which is equivalent to weighted least-squares fitting. Because MCM can calibrate unknown measurement units, raw uncalibrated data (e.g., optical cell densities with no calibration to colony forming units, Fig 4.4A) can also be used.

In this chapter single-cell models were calibrated to monoculture experiments; however, models can also be calibrated using data from experimental or natural communities that include uncultured species. In general, fitted parameters need not be directly connected to the data used for calibration, as long as a change in the parameters influences the predictions that are being compared to the data.
While this is a general principle of parameter estimation (468), in practice the uncertainty of calibrated parameters (e.g., in terms of confidence intervals) increases when their influence on the “goodness of fit” is weaker. Moreover, alternative parameter combinations can sometimes yield a comparable match to the data, especially if multiple parameters influence the same variables (inverse problem degeneracy). Local fitting optima can be detected through repeated randomly seeded calibrations (see next section), and overfitting can be partially avoided by keeping the number of free parameters at a bare minimum. Nevertheless, in certain cases good knowledge of the system or previous literature may be required to identify the most plausible calibrations. Finally, we emphasize that MCM is, after all, merely a framework enabling the construction, calibration and analysis of microbial community models. MCM models are thus limited by the same caveats and assumptions as other constraint-based metabolic models (17, 38) and any predictions made by MCM should be subject to similar scrutiny.

C.1.2 Calibration of E. coli cell models

E. coli strains were obtained from an evolution experiment performed in a batch culture environment with daily dilutions into glucose-acetate supplemented Davis minimal medium (437, 488). For each phenotype, three clones were isolated from population 20 after 150 days and used for three independent monoculture growth experiments. Optical densities, as well as glucose, acetate and oxygen concentration data from these experiments were used to calibrate the individual cell-metabolic models for the A, SS and FS phenotypes. Oxygen measurements were not available for type A. Experimental details and results are described by Le Gac et al. (262).

In the models, the limiting nutrients are assumed to be oxygen, glucose and acetate; all other nutrients can be taken up at an arbitrary rate.
Oxygen, glucose and acetate uptake rate limits were described by Monod-like kinetics. The maximum cell-specific oxygen uptake rate was set to 1.008 × 10^−13 mol/(d · cell), according to Varma and Palsson (496). The oxygen half-saturation constant was set to 1.21 × 10^−7 M according to Stolper et al. (452). Oxygen was assumed to be initially at atmospheric saturation levels (0.217 mM at 37 °C) and replenished at a rate proportional to its deviation from saturation (167).

The fitted parameters for each cell type were the maximum cell-specific uptake rates and half-saturation constants for glucose and acetate, as well as initial cell densities and non-growth associated ATP maintenance energy requirements. The initial glucose and acetate concentrations were set to the average value measured at the earliest sampling point (1 hr after incubation) for each type. The oxygen mass transfer coefficient (M/day per M deviation) was initially fitted individually for each type together with all other parameters, and then fixed to the average of all three initial fits. All other parameters were then again fitted individually for each type. Parameter fitting was done by maximizing the average coefficient of determination ($R^2$) using the MCM command fitMCM. A total of 237 data points were used to fit 19 parameters (Supplemental Table C.1). To reduce the possibility of only reaching a local maximum, fitting was repeated 100 times for each strain starting at random initial parameter values and the best fit among all 100 runs was used. While some fitting runs reached alternative local maxima, the best overall fit was reached in most cases.

Cell densities were directly compared to optical density (OD) measurements.
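
One way to picture the comparison of modeled cell densities with raw optical densities is as the estimation of a single conversion factor, in cells/(L · OD). The sketch below is an illustration only and not MCM's actual procedure: it assumes a strictly proportional relationship between OD and cell density and uses the closed-form least-squares estimate. The function name fit_scale and all numbers are hypothetical.

```python
def fit_scale(od, cells):
    # Least-squares factor s that minimizes sum((cells_i - s * od_i)**2),
    # assuming cell density is strictly proportional to optical density.
    num = sum(x * y for x, y in zip(od, cells))
    den = sum(x * x for x in od)
    if den == 0:
        raise ValueError('optical densities must not all be zero')
    return num / den

# Hypothetical OD readings and hypothetical modeled cell densities (cells/L)
# at the same sampling times. These values are made up for illustration.
od_readings = [0.05, 0.10, 0.20, 0.40]
model_cells = [5.1e10, 9.8e10, 2.05e11, 3.96e11]

scale = fit_scale(od_readings, model_cells)   # units: cells/(L * OD)
print(f'estimated conversion factor: {scale:.3g} cells/(L*OD)')
```

In a full calibration such a factor would be estimated jointly with the physiological parameters rather than in isolation, since both affect the match between predicted and measured curves.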
The appropriate calibrations were estimated by MCM and ranged within 8.2 × 10^11 to 1.3 × 10^12 cells/(L · OD). These estimates are consistent with previous experimental calibrations (260) that yielded 0.26 g dW/(L · OD), i.e., 1.4 × 10^12 cells/(L · OD) (assuming a cell dry weight of 1.8 × 10^−13 g in the stationary phase; 115).

C.1.3 Simulation of the microbial community model

The microbial community model was simulated using the MCM command runMCM. Initial glucose and acetate concentrations were set to the average of all values measured at the earliest sampling point of the monoculture incubations. Cell death was not explicitly included, because of lack of appropriate data for calibration and because daily dilutions by far exceeded cell death as a factor of cell population reduction. The MCM files required to run this model were published by Louca and Doebeli (284) and are also available online at: http://www.zoology.ubc.ca/MCM.

C.1.4 Robustness of the SS-FS coexistence

To verify the robustness of the stable SS-FS coexistence in co-culture, we randomly varied each fitted model parameter uniformly within an interval spanning 10% above and 10% below its calibrated value. Both types coexisted in 50 out of 50 random simulations (Supplemental Fig. C.2).

C.1.5 Seasonal restriction of the SS-FS co-cultures

Simulations of the SS-FS co-cultures restricted to the first glucose-rich or second glucose-depleted season, as opposed to the full batch cycle, were performed in analogy to the experiments by Spencer et al. (436). More precisely, to model the first season experiment we changed the dilution rate to 1/32 every 5 hours, so that at the end of each batch cycle glucose was not yet completely depleted. Similarly, for the second season experiment we changed the dilution rate to 1/32 every 19 hours, and adjusted the growth medium to resemble the glucose-depleted acetate-rich solution reported by Spencer et al. (no glucose, 3.59 mM acetate).
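
The dilution schedules above imply a minimum average growth rate that a type needs in order to persist: a population diluted 32-fold each cycle must grow at least 32-fold between dilutions. The sketch below works this out under the simplifying assumption of steady exponential growth throughout the cycle (no lag phase, no cell death); the function name is hypothetical.

```python
import math

def min_persistence_rate(dilution_factor, period_hours):
    # Minimum mean exponential growth rate (1/h) needed to offset one
    # dilution by `dilution_factor` every `period_hours` hours.
    return math.log(dilution_factor) / period_hours

for period in (5, 19):   # first- and second-season schedules, in hours
    rate = min_persistence_rate(32, period)
    doubling_time = math.log(2) / rate   # hours per doubling
    print(f'{period} h cycle: rate >= {rate:.3f} 1/h, '
          f'doubling time <= {doubling_time:.2f} h')
```

Because 32 = 2^5, persistence requires five doublings per cycle, i.e., an average doubling time of at most 1 h under the 5-hour schedule and at most 3.8 h under the 19-hour schedule.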
Initial cell densities were set to 1 × 10^10 cells/L for both types. All other model parameters were kept unchanged. The original experiments by Spencer et al. (436) were performed at higher dilution rates (4 and 15 hours for the first and second season experiment, respectively); however, in our simulations neither the FS nor SS type could persist at these high dilution rates. We note that the strains used in our work (262) had evolved in separate evolution experiments using a different growth medium than those by Spencer et al. (436).

Table C.1: Fitted parameters for the E. coli models described in the main text, together with reference values from the literature for comparison. Maximum cell-specific uptake rates (Vmax) are in fmol/cell/d. Half-saturation constants for acetate (Khalf,acetate) are in mM, half-saturation constants for glucose (Khalf,glucose) are in µM. Initial cell densities are in 10^9 cells/L. Non-growth associated maintenance requirements are given in fmol ATP/cell/d. The O2 mass transfer coefficient is in 1/d (reference value only roughly comparable, as the transfer coefficient depends strongly on shaking frequency and flask volume (293)). Dry-weight-specific values from the literature were converted to cell-specific values by assuming a dry weight of 180 fg/cell (115). All reference values were measured for strains other than B REL606.

parameter            values                               comparison   reference
Vmax,acetate         67.8 (A), 16.5 (SS), 220 (FS)        8.6          (318)
Khalf,acetate        10.6 (A), 5.55 (SS), 12.9 (FS)       6.0          (318)
Vmax,glucose         43.2 (A), 56.9 (SS), 29.0 (FS)       45           (496)
Khalf,glucose        21.3 (A), 11.4 (SS), 44.6 (FS)       3–15         (158)
maintenance req.     18.6 (A), 11.0 (SS), 15.0 (FS)       32           (496)
O2 mass transfer     60.9                                 180          (292)
init. cell density   8.48 (A), 11.3 (SS), 7.57 (FS)

Figure C.1: Predicted relative cell densities of the A and FS types in co-culture, in the absence of SS. Initial cell densities were 10^10 cells/L for type A and 1 cell/L for type FS.
All other model parameters are identical to the microbial community model (comprising the A, SS and FS types) described in the main text.

Figure C.2: Robustness of the predicted stable coexistence of the SS and FS types in co-culture. Shown are the probability distributions of the relative SS (A) and FS (B) cell densities over time, when calibrated model parameters are randomly chosen within an interval spanning 10% above and 10% below their fitted values. Initial cell densities were 10^10 cells/L for both types; all other parameters were as described in the main text. Probability distributions were estimated using 50 Monte Carlo simulations. In all cases both the SS and FS type persisted. The analysis was performed using the MCM command UAMCM.

Figure C.3: Measured relative cell densities of the SS and FS types in batch co-culture, when restricted to either the first glucose-rich (left column) or second glucose-depleted (right column) season for three independently evolved communities (rows 1–3), as reported by Spencer et al. (436, Figs. 2A,B therein). Restriction to the first season was achieved by shorter dilution periods which prevented the complete depletion of glucose. In (B), restriction to the second season was achieved by using the glucose-depleted acetate-rich solution, produced by the full-batch co-culture, as growth medium. Initial population sizes differed between experiments. Strains used by Spencer et al. (436) evolved in a slightly different growth medium than in this chapter. Cell generations were translated to days by assuming an average of 6.7 generations per day (188).

Appendix D
Chapter 5: Supplemental material

D.1 Methods

D.1.1 Computational framework

Model calibration, simulations and statistical analysis were performed using MCM (Chapter 4; 284). MCM combines FBA-based cell models with a dynamical environment that influences, and is influenced by, microbial metabolism.
The combination of FBA with a varying environmental metabolite pool is known as dynamic flux balance analysis (DFBA) (70, 177, 292), and has been shown to be a promising approach to microbial ecological modeling (70, 177, 284, 318). MCM can accommodate microbial community models with arbitrary environmental variables and metabolite exchange kinetics. For example, environmental variables may be stochastic processes (e.g., representing climate fluctuations) or specified using measured data (e.g., pH in bioreactor experiments). Metabolite uptake and export rate limits can be arbitrary functions of metabolite concentrations or environmental variables. Similar interdependencies are possible for reaction rate limits, thus allowing the inclusion of inhibitory or regulatory mechanisms (84). Metabolite concentrations can be explicitly specified (e.g., using measured time series) or depend dynamically on microbial export and other external fluxes.

MCM keeps track of a multitude of output variables such as cell concentrations, reaction rates, metabolite concentrations and metabolite exchange rates. Model predictions can then be compared to time series from experiments or environmental surveys, such as rate measurements, chemical profiles or optical cell densities. Reciprocally, time series data can be used to automatically calibrate unknown model parameters, e.g., using least squares fitting or maximum-likelihood estimation (e.g., Fig 5.1, see details below). Because MCM can calibrate unknown measurement units, raw uncalibrated data (e.g., optical cell densities with no calibration to colony forming units) can also be used. MCM was recently validated using laboratory experiments with bacterial communities (284). MCM is Open Source and available at http://www.zoology.ubc.ca/MCM.

D.1.2 Construction of the cell models

The metabolism of each cell was modeled using flux balance analysis with optimization of biomass synthesis (354).
The cell-internal reaction networks of the ammonium oxidizing bacteria (AOB) and nitrite oxidizing bacteria (NOB) are based on the core metabolic models published by Poughon et al. (370) and Perez-Garcia et al. (364). More precisely, the biomass synthesis functions of both cell types are taken from Perez-Garcia et al. (364), the energy metabolism of AOB is adopted from Perez-Garcia et al. (364) and the energy metabolism of NOB is adopted from Poughon et al. (370). Assimilatory nitrite reduction to ammonium, required for biomass synthesis, was added to NOB (443). The constructed AOB and NOB models comprise 16 and 11 reactions, respectively (see Appendix D.3 for details). The nitrogen substrate half-saturation constants were set to 26 µM NH3 for the AOB template according to Suzuki et al. (463), and to 229 µM NO2− for the NOB template according to Remacle and De Leval (389). Cell masses were set to 3 × 10^−13 g dW/cell for the AOB and 4 × 10^−13 g dW/cell for the NOB, according to Keen and Prosser (227).

D.1.3 Calibration of the template cell models

The maximum cell-specific substrate uptake rates for the AOB and NOB templates ($V_{\max,\mathrm{NH_4^+}}$ and $V_{\max,\mathrm{NO_3^-}}$, respectively) were calibrated using time series from a previous experiment with an ammonium-batch-fed nitrifying bioreactor by de Boer and Laanbroek (94), inoculated with strains of the AOB Nitrosospira and NOB Nitrobacter genera. For the calibration, our bioreactor model was adjusted to de Boer and Laanbroek’s experiment (Fig. 5.1A): the initial ammonium concentration was set to 0.916 mM, nitrite and nitrate were initially absent, the pH was set to the reported profile and oxygen was assumed to be non-limiting. We used the reported concentration profiles of the gradually depleted ammonium (Fig. 5.1B) and produced nitrate (Fig. 5.1C) to calibrate $V_{\max,\mathrm{NH_4^+}}$ and $V_{\max,\mathrm{NO_3^-}}$ via maximum-likelihood estimation (113).
This approach estimates unknown parameters by maximizing the likelihood of observing the available data, given a particular candidate choice of parameter values. Maximum likelihood estimation is widely used in statistical inference such as multilinear regression, computational phylogenetics and modeling in physics (291). In our case, the likelihood of the data was calculated on the basis of a mixed deterministic-stochastic structure, in which the deterministic part is given by the microbial community model and errors are assumed to be normally distributed. The likelihood was maximized using the SBPLX optimization algorithm (214), which uses repeated simulations and gradual exploration of parameter space and is integrated into MCM (284). To reduce the possibility of only reaching a local maximum, fitting was repeated 100 times using random initial parameter values and the best fit among all 100 runs was used. While some fitting runs reached alternative local maxima, the best overall fit was reached in most cases. This procedure yielded the fitted values $V_{\max,\mathrm{NH_3}}$ = 6.48 × 10^−13 mol NH3/cell/d and $V_{\max,\mathrm{NO_2^-}}$ = 7.31 × 10^−13 mol NO2−/cell/d, which are consistent with the literature (375).

D.1.4 Nitrifying membrane bioreactor model

Using the calibrated AOB and NOB template cell models, we constructed a model of an ammonium-fed nitrifying membrane bioreactor similar to the one described by Wittebolle et al. (524). The hydraulic turnover rate for all metabolites was 0.672 d^−1, ammonium input was 7.14 mM · d^−1 and pH was fixed to 7.4. The input medium was assumed to be sterile and to contain micronutrients in sufficient amounts for autotrophic growth via nitrification (109, 524). The bioreactor medium was assumed to be well mixed. Microbial communities started with an equal number of AOB and NOB strains, each with an initial density of 10^7 cells/L. Cell death was modeled as exponential decay.
Further details are provided in Appendix D.3.

Each strain was a random variation of the calibrated template cell models, with physiological parameters chosen as follows: Substrate uptake kinetic parameters, i.e., the maximum cell-specific nitrogen substrate uptake rates ($V_{\max}$) and substrate affinities ($\alpha$; 6), were randomly and uniformly chosen within an interval ranging an order of magnitude above and an order of magnitude below the template values. To account for the typically assumed tradeoff between $V_{\max}$ and $\alpha$, these parameters were multiplied by a factor $\kappa$ and $(1-\kappa)$, respectively, where $\kappa$ was chosen randomly within [0, 1] for each strain (431). Cell life times were randomly chosen within 50–100 d for each strain according to typical nitrifier decay rates (7).

Perturbations were modeled as a temporary increase in mortality rates, such that after one day each cell population declines by some random factor, chosen log-uniformly and independently for each strain within the interval [1, 10^12].

D.1.5 Statistics of community convergence

The distance between two community compositions was expressed using the Bray-Curtis dissimilarity, which is well established in the ecological literature (266). The maximum dissimilarity between any two communities is 100%, while identical communities have a dissimilarity of 0%. The convergence of the bioreactor community was examined by calculating its dissimilarity to the steady composition established after a long time. This dissimilarity curve is typically decreasing in time because communities eventually converge to a steady composition in which each metabolic niche is occupied by a single strain. A steeper curve implies a faster convergence. Following inoculation, the dissimilarity curve depends on the particular strains present in the community, which are chosen randomly for each simulation. The resulting probability distribution of the dissimilarity curves (Figs.
5.2E,F, and 5.3E,F) was estimated using 100 repeated random simulations of the model.

D.2 Elaboration on the competition model

Below we elaborate on the competition model in the main text. We consider the total cell density $N = \sum_i N_i$ and the relative cell densities $\eta_i = N_i/N$. Using the community-average quantities

$$\overline{\beta\Phi} = \sum_i \eta_i \beta_i \Phi_i, \quad \overline{\lambda} = \sum_i \eta_i \lambda_i, \quad \overline{\Phi} = \sum_i \eta_i \Phi_i \tag{D.2.1}$$

it is straightforward to derive the dynamics

$$\frac{dR}{dt} = f_o - N\,\overline{\Phi}(R) \tag{D.2.2}$$

for the resource concentration. Similarly, starting with

$$\frac{dN}{dt} = \sum_i \frac{dN_i}{dt} \tag{D.2.3}$$

and using the original model equations for the $N_i$ one quickly arrives at

$$\frac{dN}{dt} = N\left(\overline{\beta\Phi}(R) - \overline{\lambda}\right). \tag{D.2.4}$$

Furthermore, by using the product rule

$$\frac{d\eta_i}{dt} = \frac{1}{N}\frac{dN_i}{dt} - \frac{N_i}{N^2}\frac{dN}{dt} \tag{D.2.5}$$

and inserting Eq. (D.2.4) one finds that

$$\frac{d\eta_i}{dt} = \eta_i\left[(\beta_i\Phi_i - \lambda_i) - (\overline{\beta\Phi} - \overline{\lambda})\right]. \tag{D.2.6}$$

Note that the dynamics of $N$ and $R$ are determined by the community-average growth kinetics. In contrast, relative cell densities change at rates that depend on the deviation of individual growth kinetics from the community average. Furthermore, one can always write

$$\beta_i\Phi_i - \lambda_i = \left(\overline{\beta\Phi} - \overline{\lambda}\right)(1 + \varepsilon_i), \tag{D.2.7}$$

where $\varepsilon_i$ accounts for the relative deviation of individual growth kinetics from the community average. Then Eq. (D.2.6) becomes

$$\frac{d\eta_i}{dt} = \varepsilon_i \cdot \eta_i\left(\overline{\beta\Phi} - \overline{\lambda}\right), \tag{D.2.8}$$

as given in the main text.

D.3 Details on the bioreactor model

D.3.1 Construction of cell models

The reaction networks for the two cell types, AOB and NOB, only include core energy metabolism and biomass synthesis (production of ATP and NADH via nitrification and biosynthesis via consumption of ATP and NADH). The biomass synthesis functions of both cell types are taken from (364), assuming that biomass stoichiometry is similar for Nitrosomonas (AOB) and Nitrobacter (NOB) cells. In particular, energy-to-biomass conversion coefficients (ATP & NADH to g dry weight) were experimentally calibrated by (364). The energy metabolism of AOB is taken from (364). Note that the original published AOB model had 17 reactions related to energy metabolism (Table S1 in (364)).
One of these reactions, “NO1”, was merely a trivial conversion reaction and was thus merged with the rest. The remaining reactions in the AOB model by (364) were pure transport reactions and are implicitly included in the MCM model (but not referred to as “reactions”). The energy metabolism of NOB is taken from (370). Assimilatory nitrite reduction to ammonium, required for biomass synthesis but not included in the original reaction network by (370), was added to NOB according to (443). Uptake kinetic parameters for NH3.NH4 and HNO2.NO2, as well as cell life times and cell masses were chosen as described in the main text. Below we provide an overview of all considered metabolites and reactions. Please consult the references by the reactions for details. The complete model and simulation script are available at: http://www.zoology.ubc.ca/MCM

D.3.2 Metabolites

• ADP (adenosine diphosphate)
• ATP (adenosine triphosphate)
• CO2 (carbon dioxide)
• Cytc550ox (class I cytochrome c550, oxidized)
• Cytc550red (class I cytochrome c550, reduced)
• Cytc552ox (class I cytochrome c552, oxidized)
• Cytc552red (class I cytochrome c552, reduced)
• Cytc554ox (class I cytochrome c554, oxidized)
• Cytc554red (class I cytochrome c554, reduced)
• Hc (hydrogen, cytosol)
• Hp (hydrogen, periplasm)
• H2O (water)
• NH3.NH4 (ammonia + ammonium)
• NH3 (purely extracellular, ammonia, depending on NH3.NH4 and pH, see notes below)
• NH4 (purely extracellular, ammonium, depending on NH3.NH4 and pH, see notes below)
• HNO2.NO2 (nitrite + nitrous acid)
• NO2 (purely extracellular, nitrite, depending on HNO2.NO2 and pH)
• HNO3.NO3 (nitrate + nitric acid)
• NO3 (purely extracellular, nitrate, depending on HNO3.NO3 and pH)
• N2O (nitrous oxide)
• NAD (oxidized nicotinamide adenine dinucleotide)
• NADH (reduced nicotinamide adenine dinucleotide)
• NH2OH (hydroxylamine)
• NO (nitric oxide)
• NOH (nitroxyl)
• O2 (oxygen)
• Pi (inorganic phosphate)
• protein
• UQ
(ubiquinone)
• UQH2 (ubiquinol)
• maint (formal maintenance requirements, in ATP-equivalents)
• Q8H2 (ubiquinol-8)
• Q8 (ubiquinone-8)
• Cyt554e (ferrocytochrome c554)
• Cyt554 (ferricytochrome c554)
• Cyt552me (membrane ferrocytochrome)
• Cyt552m (ferricytochrome c552)
• Cyt552e (periplasmic ferrocytochrome c552)
• Cyt552 (ferricytochrome c552)

Note that in the model ammonium (NH4) is assumed to be at dissociation equilibrium with ammonia (NH3), determined by the pH and the acid dissociation constant 5.69 × 10^−10 M at standard temperature (73). pH was either adjusted to measurements for the batch reactor (94), or to constant 7.4 for the membrane continuous-flow reactor (524). The total concentration of ammonium and ammonia is represented in the model by NH3.NH4, and depends on the rate of ammonium input, the hydraulic dilution rate and microbial consumption. A similar formalism was applied to nitrite and nitrate.

D.3.3 Reaction network for AOB

R_biomass (364) (biomass coefficient 113 g dW/mol, modified to include CO2 consumption):
15 ATP + 12 NADH + 0.25 protein + 32 maint + 5 CO2 + 7 Hc → 15 ADP + 10 NAD + 15 Pi + 4 O2   (D.3.1)

R_maint (364) (maintenance ATP consumption):
ATP + H2O → ADP + Pi + Hc + maint   (D.3.2)

R_protein (364) (protein synthesis via ATP consumption):
8.9 ATP + 4 NH3.NH4 → 8.9 ADP + 8.9 Pi + 8.9 Hc + protein   (D.3.3)

R_amo (364):
NH3.NH4 + O2 + Q8H2 → NH2OH + H2O + Q8   (D.3.4)

R_HAO_NOH (364):
NH2OH + Cyt554 → NOH + Cyt554e + 2 Hp   (D.3.5)

R_HAO_NO (364):
NOH + 0.5 Cyt554 → NO + 0.5 Cyt554e + Hp   (D.3.6)

R_HAO_hno2 (364):
NO + 0.5 Cyt554 + H2O → HNO2.NO2 + 0.5 Cyt554e + Hp   (D.3.7)

R_Cyt_554 (364):
Cyt554e + Cyt552m → Cyt552me + Cyt554   (D.3.8)

R_Q_8H_2_synt (364):
Q8 + Cyt552me + 2 Hp → Q8H2 + Cyt552m   (D.3.9)

R_NADH_synt (364):
NAD + Q8H2 + 2 Hp → NADH + Q8 + 3 Hc   (D.3.10)

R_Cytbc1 (364):
Q8H2 + 2 Cyt552 → 2 Hp + Q8 + 2 Cyt552e   (D.3.11)

R_Cytaa3 (364):
0.5 O2 + 4 Hc + 2 Cyt552e → H2O + 2 Hp + 2 Cyt552   (D.3.12)

R_CytP460
(364):
0.5 NH2OH + 0.5 NO + 2.5 Cyt552 + H2O → HNO2.NO2 + 2.5 Cyt552e + 2.5 Hp   (D.3.13)

R_nir (364):
HNO2.NO2 + Cyt552e + Hp → NO + Cyt552 + H2O   (D.3.14)

R_nor (364):
NO + Cyt552e + Hp → 0.5 N2O + Cyt552 + 0.5 H2O   (D.3.15)

R_ATP_synt (364):
ADP + Pi + 3 Hp → ATP + H2O + 3 Hc   (D.3.16)

D.3.4 Reaction network for NOB

R_biomass (364) (biomass coefficient 113 g dW/mol):
15 ATP + 12 NADH + 0.25 protein + 32 maint + 5 CO2 + 7 Hc → 15 ADP + 10 NAD + 15 Pi + 4 O2   (D.3.17)

R_maint (364):
ATP + H2O → ADP + Pi + Hc + maint   (D.3.18)

R_protein (364):
8.9 ATP + 4 NH3.NH4 → 8.9 ADP + 8.9 Pi + 8.9 Hc + protein   (D.3.19)

Jnrj8 (370):
2 Cytc550red + HNO2.NO2 + H2O → 2 Cytc550ox + 2 Hc + HNO3.NO3   (D.3.20)

Jnrj9 (370):
Hp + Cytc550ox + HNO2.NO2 → Cytc550red + H2O + NO   (D.3.21)

Jnrj10 (370):
4 Hp + UQ + 2 H2O + 2 NO → UQH2 + 4 Hp + 2 HNO2.NO2   (D.3.22)

Jnrj11 (370):
Cytc550red + H2O + NO → Cytc550ox + Hc + HNO2.NO2   (D.3.23)

Jtermox_dissip (370) (dissipation (loss) of proton motive force (PMF) by proton diffusion):
2 Cytc550ox + 4 Hc + 0.5 O2 + 1.12 Hp → 2 Hp + 2 Cytc550red + H2O + 1.12 Hc   (D.3.24)

JNAD (370):
4 Hp + NAD + UQH2 → UQ + NADH + 5 Hc   (D.3.25)

JATP (370):
3 Hp + ADP + Pi → ATP + 3 Hc + H2O   (D.3.26)

R_nirBD (443) (nitrite reduction to ammonium, accounting for nitrogen assimilation, nitrite detoxification, and NAD regeneration):
HNO2.NO2 + 3 NADH + 4 Hc → NH3.NH4 + 3 NAD + 2 H2O   (D.3.27)

D.3.5 Uptake kinetics

• Ammonium/ammonia (NH3.NH4) uptake rate limits are specified as a Monod function of NH3:

$$\frac{[\mathrm{NH_3}]}{\frac{1}{\alpha} + \frac{[\mathrm{NH_3}]}{V}}, \tag{D.3.28}$$

where $V$ = 6.48 × 10^−13 mol/cell/day (maximum cell-specific uptake rate) and $\alpha$ = 2.49 × 10^−8 L/cell/day (affinity (6)) for the calibrated AOB model. Note that the model does not differentiate between ammonium and ammonia uptake, since in the bioreactor the two compounds are at dissociation equilibrium, determined by pH and the sum of ammonium and ammonia concentrations.
Nevertheless, uptake kinetics were specified in terms of ammonia concentrations according to findings by Suzuki et al. (463) that suggest that ammonia is likely the limiting substrate.

Random AOB strains had modified kinetics:

$$\frac{[\mathrm{NH_3}]}{\frac{1}{\alpha_r(1-\kappa_r)} + \frac{[\mathrm{NH_3}]}{\kappa_r V_r}}, \tag{D.3.29}$$

where $\kappa_r$ is chosen randomly from 0 to 1, $V_r$ is chosen randomly from $V/10$ to $10V$, and $\alpha_r$ is chosen randomly from $\alpha/10$ to $10\alpha$. We randomly varied $V$ and the affinity $\alpha$ (as opposed to the half-saturation concentration) according to Aksnes and Cao (6), who showed that affinity, rather than half-saturation concentration, is an inherent biological trait. The random parameter $\kappa_r$ accounts for trade-offs between affinities and maximum cell-specific uptake rates, as suggested by (431).

• Nitrite/nitrous acid (HNO2.NO2) uptake rate limits are specified as a Monod function of HNO2.NO2:

$$\frac{[\mathrm{HNO_2.NO_2}]}{\frac{1}{\alpha} + \frac{[\mathrm{HNO_2.NO_2}]}{V}}, \tag{D.3.30}$$

where $V$ = 7.31 × 10^−13 mol/cell/day and $\alpha$ = 3.19 × 10^−9 L/cell/day for the calibrated NOB model. Note that the model does not differentiate between nitrite and nitrous acid uptake, since in the bioreactor the two compounds are at dissociation equilibrium determined by pH.

Random NOB strains had modified kinetics:

$$\frac{[\mathrm{HNO_2.NO_2}]}{\frac{1}{\alpha_r(1-\kappa_r)} + \frac{[\mathrm{HNO_2.NO_2}]}{\kappa_r V_r}}, \tag{D.3.31}$$

where $\kappa_r$ is chosen randomly from 0 to 1, $V_r$ is chosen randomly from $V/10$ to $10V$, and $\alpha_r$ is chosen randomly from $\alpha/10$ to $10\alpha$.

D.3.6 Community-scale dynamics

The membrane bioreactor model keeps track of the cell concentrations of each AOB and NOB strain, as well as extracellular ammonia/ammonium (NH3.NH4), nitrite/nitrous acid (HNO2.NO2) and nitrate/nitric acid (HNO3.NO3) concentrations over time in terms of differential equations. The bioreactor is assumed to be well mixed and well oxygenated. Let $N_i(t)$ be the cell concentration for strain $i$ (AOB or NOB), $C_j(t)$ the concentration of metabolite $j$ and $E_{ij}(t)$ the net export rate of metabolite $j$ by each cell of strain $i$, at time $t$.
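
The affinity-based rate limits of Section D.3.5 can be checked numerically. The sketch below (function names are hypothetical and not part of MCM) evaluates the limit [S]/(1/α + [S]/V), which is algebraically the same as a Monod curve with half-saturation concentration K = V/α, and applies the trade-off factor κ used for the random strains.

```python
def uptake_limit(conc, v_max, affinity):
    # Rate limit conc / (1/affinity + conc/v_max), in mol/cell/day.
    # conc in mol/L, v_max in mol/cell/day, affinity in L/cell/day.
    return conc / (1.0 / affinity + conc / v_max)

def random_strain_limit(conc, v_r, alpha_r, kappa):
    # Same limit with V scaled by kappa and the affinity scaled by
    # (1 - kappa); requires 0 < kappa < 1 to avoid division by zero.
    return uptake_limit(conc, kappa * v_r, (1.0 - kappa) * alpha_r)

# Template values quoted in Section D.3.5.
V_NH3, ALPHA_NH3 = 6.48e-13, 2.49e-8
V_NO2, ALPHA_NO2 = 7.31e-13, 3.19e-9

k_half_nh3 = V_NH3 / ALPHA_NH3   # implied half-saturation concentration, mol/L
k_half_no2 = V_NO2 / ALPHA_NO2
print(f'K(NH3) = {k_half_nh3 * 1e6:.1f} uM, K(NO2) = {k_half_no2 * 1e6:.0f} uM')
```

With these values the implied half-saturation concentrations are about 26 µM and 229 µM, matching the constants quoted in Section D.1.2. At low substrate concentrations the limit approaches α[S], which is why the affinity, rather than K, controls competition under substrate scarcity.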
Note that $E_{ij}(t)$ is calculated by solving a linear FBA problem for each strain and at each time step. The constraints of the FBA problems, and thus their solutions, depend on the maximum substrate uptake rates, which are given by Monod kinetics as described above. All population sizes $N_i$ are described by ordinary differential equations (ODEs) of the following form:

$$\frac{dN_i(t)}{dt} = \frac{1}{m_i} B_i(t) N_i(t) - \lambda_i N_i(t). \tag{D.3.32}$$

Here, $\lambda_i$ is the exponential death rate of strain $i$, i.e., the inverse of its expected life time, $B_i(t)$ is the per-cell biosynthesis rate determined by FBA and $m_i$ is the cell mass. Similarly, all metabolite concentrations $C_j$ are described by ODEs of the following form:

$$\frac{dC_j}{dt} = r \cdot \left[\tilde{C}_j - C_j(t)\right] + \sum_i E_{ij}(t) N_i(t). \tag{D.3.33}$$

Here, $r$ is the hydraulic turnover rate (feed rate over bioreactor volume) and $\tilde{C}_j$ is the concentration of the metabolite in the feed. The last term is a sum over all cell populations, accounting for microbial production or consumption of the metabolite.

Appendix E
Chapter 6: Supplemental material

E.1 Methods

E.1.1 Model overview

Our model describes the population dynamics of multiple bacterial and archaeal operational taxonomic units (OTUs), their reaction kinetics, the population dynamics of multiple phage populations, as well as extracellular metabolite concentrations in a flow-through bioreactor. Here, an “OTU” represents a cell lineage that is specialized on a specific metabolic function (e.g., acetoclastic methanogenesis) and predated by its own specialist phage population. Hence, an OTU represents a taxonomic group that is sufficiently narrow so that reaction kinetics are similar across members, and sufficiently broad so that different OTUs have different specialist phages.
Hence, an OTU in our model is roughly analogous to a single prokaryotic species or strain (69, 174, 521).

The bioreactor model largely resembles the setup used in previous experiments (122, 533). Glucose is supplied continuously to the bioreactor as part of a sterile inflow, which is balanced by an equivalent outflow that removes residual substrates, metabolic by-products as well as cells and free phage particles at a constant hydraulic renewal rate. The bioreactor’s interior is assumed to be well mixed and anaerobic. pH and temperature are held constant.

E.1.2 Reaction rates and metabolite dynamics

The model considers a total of 12 reactions, driving the stepwise catabolism of glucose all the way to the eventual production of methane (see Table E.1 for a list of reactions and Fig. 6.1 for a schematic overview). Each OTU is associated with a single metabolic reaction, such as fermentation of glucose to ethanol or acetoclastic methanogenesis, but each reaction may be performed by multiple competing OTUs. The cell-specific rate of a reaction performed by some OTU $s$, $H_s$, is assumed to be limited by a single limiting substrate (such as glucose in the case of fermenters) according to classical Monod kinetics (211):

$$H_s = \frac{V_s C}{C + K_s}. \tag{E.1.1}$$

Here, $K_s$ is a half-saturation concentration of the limiting substrate (specific to OTU $s$), $C$ is the substrate concentration and $V_s$ is the maximum cell-specific reaction rate (specific to OTU $s$). Each reaction couples the uptake of a number of substrates to the export of a number of products into the extracellular medium, thereby affecting metabolite concentrations in the bioreactor.
Specifically, the concentration of the $m$-th metabolite in the bioreactor, $C_m$, changes according to the differential equation

$$\frac{dC_m}{dt} = \lambda \left(C_m^o - C_m\right) + \sum_r S_{mr} \sum_{s \in J_r} N_s H_s, \tag{E.1.2}$$

where $J_r$ is the set of OTUs performing reaction $r$, $N_s$ is the cell concentration of OTU $s$, $\lambda$ is the hydraulic renewal rate, $C_m^o$ is the metabolite’s concentration in the inflow (zero for all metabolites except glucose) and $S_{mr}$ is the stoichiometric coefficient of metabolite $m$ in reaction $r$. For example, for glucose fermentation to ethanol,

$$\mathrm{C_6H_{12}O_6} \rightarrow 2\,\mathrm{CH_3CH_2OH} + 2\,\mathrm{CO_2}, \tag{E.1.3}$$

the stoichiometric coefficients of glucose, ethanol and CO2 are −1, +2 and +2, respectively. For each OTU, the total cell production rate is assumed to be proportional to the total rate of its catalyzed reaction, multiplied by the reaction’s Gibbs free energy (as described below).

E.1.3 Gibbs free energy and cell production

The Gibbs free energy of a reaction ($\Delta G$) is conventionally interpreted as the amount of energy that can be readily transformed to “work” (e.g., for ATP production) by a microbial population (492). For example, the zonation of various microbial groups along redox transition zones in sediments is strongly determined by the Gibbs free energy of their metabolic pathways (56). In anaerobic methanogenic digesters, the growth of so-called syntrophic bacteria is (in addition to substrate availability) highly dependent upon the removal of downstream products (e.g., acetate and H2) by methanogens, because a low partial pressure of products ensures that the Gibbs free energy of reactions is sufficient to sustain growth (77). Hence, fluctuations of methanogen populations can affect upstream metabolic activity and population dynamics by modulating the energetic yield of reactions.

A meta-analysis of several anaerobic metabolic pathways by Roden and Jin (396) showed that microbial biosynthesis rates are approximately proportional to the Gibbs free energy released by the utilized pathway.
The regression formula provided by Roden and Jin (396) has been included in subsequent microbial ecological models for predicting cell growth rates based on their metabolic activity (287, 386, 387). Following previous work, we thus assume that the amount of biosynthesis supported by reaction $r$ is given by

$$Z_r = 2.08 - \frac{1}{\gamma_r^e}\, 0.0211 \times \Delta G_r \quad \text{(g dry biomass per mole e-donor consumed)}. \tag{E.1.4}$$

Here, $\gamma_r^e$ is the absolute stoichiometric coefficient of the electron donor in the reaction,

$$\Delta G_r = \Delta G_r^o + R_g T \ln Q_r \tag{E.1.5}$$

is the Gibbs free energy of the reaction (in kJ per mol), $\Delta G_r^o$ is the standard Gibbs free energy of the reaction (see Table E.1), $T$ is the temperature in Kelvin, $R_g = 8.314$ J · K⁻¹ · mol⁻¹ is the molar gas constant and

$$Q_r = \prod_m C_m^{S_{mr}} \tag{E.1.6}$$

is the so-called reaction quotient (96). The per-capita biosynthesis rate, $Z_r H_r$, is translated to a per-capita cell production rate by dividing by the dry cell mass.

We note that determining microbial growth rates based on Gibbs free energy fluxes is a non-trivial task even for engineered systems (312), and other factors, such as cell-specific maintenance requirements, can lead to deviations from regression models. Nevertheless, our model successfully captures key aspects of microbial metabolic networks, namely stoichiometric balancing between metabolic pathways and the dependence of biosynthesis on energetic yield (396).

E.1.4 Cell and phage population dynamics

In the model, each OTU $s$ is associated with a single specialist lytic phage population, which comprises free phage particles as well as phages that have infected a host cell. The model keeps track of the infected portion, $N_s^i$, and the uninfected portion, $N_s^u$, of each cell population.
The rate at which healthy cells become infected is proportional to the total number of free phage particles ($P_s$), multiplied by some proportionality constant ($\beta_s$) that accounts for the rate at which phages “scan” the medium via passive diffusion (volume clearance rate) as well as the probability that an encounter with a cell would lead to infection (322). All infected cells are assumed to eventually undergo lysis, releasing new phage particles into the bioreactor (423). Lysogeny is not considered in the model. The loss of uninfected cells is assumed to be driven by hydraulic dilution, and is hence modeled as an exponential decay rate $\lambda$. In addition to hydraulic dilution, infected cells suffer from an elevated mortality rate, $\mu_s$, which is equivalent to the inverse of the time lag between infection and cell lysis. Hence, uninfected and infected cell concentrations change according to the differential equations

$$\frac{dN_s^u}{dt} = \frac{Z_s}{m_s} N_s^u H_s - \lambda N_s^u - \beta_s P_s N_s^u, \tag{E.1.7}$$

$$\frac{dN_s^i}{dt} = \beta_s P_s N_s^u - \lambda N_s^i - \mu_s N_s^i. \tag{E.1.8}$$

Here, $Z_s$ is the biomass yield of the reaction catalyzed by species $s$ (introduced above) and $m_s$ is the dry cell mass. Hence, the first term on the right-hand side of Eq. (E.1.7) accounts for a variable cell growth rate, depending on the current metabolic rate and the energy currently available from a reaction. Observe that the term $\beta_s P_s N_s$ (rate of new infections of OTU $s$) increases as the concentration of cells increases, which is a key prerequisite for KTW dynamics. Phage-induced cell lysis leads to the release of new free phage particles. Phage particles that fail to infect any cells are assumed to eventually get flushed out of the bioreactor.
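The right-hand sides of Eqs. (E.1.7) and (E.1.8) can be sketched in a few lines of Python, together with a free-phage balance assembled from the release, adsorption and washout assumptions stated above. This is only an illustration under simplifying assumptions: the per-capita growth rate `growth` (standing in for $Z_s H_s / m_s$) is held constant, `burst` is a hypothetical name for the number of phage particles released per lysed cell, the starting concentrations are arbitrary, and a plain forward-Euler step replaces whatever integrator the original simulations used.

```python
def phage_host_rhs(n_u, n_i, p, growth, lam, beta, mu, burst):
    # Rates of change of uninfected cells, infected cells and free phages.
    infection = beta * p * n_u                     # new infections per volume per day
    dn_u = growth * n_u - lam * n_u - infection    # Eq. (E.1.7), growth = Z_s * H_s / m_s
    dn_i = infection - lam * n_i - mu * n_i        # Eq. (E.1.8)
    dp = burst * mu * n_i - infection - lam * p    # release by lysis, adsorption, washout
    return dn_u, dn_i, dp


def euler_step(state, dt, **params):
    # One forward-Euler step; state = (n_u, n_i, p).
    derivs = phage_host_rhs(*state, **params)
    return tuple(x + dt * dx for x, dx in zip(state, derivs))


# Illustrative run with the default beta, mu, burst size and lambda of Table E.2;
# the growth rate and initial state are assumptions made for this sketch.
state = (1.0e8, 0.0, 1.0e6)
params = dict(growth=0.5, lam=0.1, beta=3.0e-10, mu=2.0, burst=10.0)
for _ in range(100):
    state = euler_step(state, 0.01, **params)
print(state)
```

A fixed-step Euler scheme is used only to keep the sketch dependency-free; stiff parameter combinations would call for an adaptive solver.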
Hence, the concentration of free phage particles associated with species $s$ satisfies the differential equation

$$\frac{dP_s}{dt} = \nu_s \mu_s N_s^i - \beta_s P_s N_s^u - \lambda P_s, \tag{E.1.9}$$

where $\nu_s$ is the average number of phage particles released per lysed cell.

E.1.5 Parameterization and simulations

Model parameters were either fixed at values obtained from the literature, or chosen randomly and uniformly within an interval around values obtained from the literature (overview in Table E.2). In particular, for each OTU the reaction-kinetic parameters ($V$ and $K$; Eq. E.1.1) as well as parameters describing phage-host interactions ($\beta_s$, $\nu_s$ and $\mu_s$; Eqs. E.1.7–E.1.9) were chosen randomly and independently of other OTUs. The glucose concentration in the inflow is set to 8 mg · L⁻¹. This glucose concentration is comparable to dissolved organic carbon concentrations in natural methanogenic environments (215, 505), although it is lower than in typical bioreactor feeds (122, 533). Apart from influencing the overall extent of fluctuations in the bioreactor, the choice of glucose input (within ranges spanning natural and engineered systems) did not influence our overall conclusions.

The differential equations (E.1.2), (E.1.7), (E.1.8) and (E.1.9) describe a high-dimensional deterministic dynamical system of $3S + M$ time-dependent variables, where $S$ is the number of OTUs and $M$ is the number of considered metabolites. Numerical simulations of this system were performed using MCM (Chapter 4; 284).

E.1.6 Statistical analysis

To quantify the metabolic performance of the community (in terms of methane production), for each simulation we calculated the average effluent methane concentration over time. To quantify the variation in metabolic performance we calculated the coefficient of variation (CV, i.e., the standard deviation divided by the average) of effluent methane concentration over time.
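The CV defined above can be computed with the Python standard library as in the sketch below. The text does not say whether the population or the sample standard deviation was used, so the population form (`statistics.pstdev`) is an assumption here, and the function name and example series are hypothetical.

```python
import statistics


def coefficient_of_variation(series):
    # CV = standard deviation divided by the mean of a time series.
    # Assumption: population standard deviation (divides by n, not n - 1).
    mean = statistics.fmean(series)
    if mean == 0:
        raise ValueError('CV is undefined for a zero mean')
    return statistics.pstdev(series) / mean


# Hypothetical effluent methane concentrations at eight time points.
methane = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(coefficient_of_variation(methane))
```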
Note that for each simulation the average methane concentration and its CV were different and random because several model parameters were chosen randomly. For each degree of functional redundancy, we used 50 random simulations to estimate the distribution of average methane concentrations and their CVs (box-plots in Figs. 6.3A,B).

To quantify the variation in functional community structure across time during any particular simulation, we calculated the CVs of functional group proportions and averaged these over all considered functional groups. For example, to quantify the variation of coarse functional groups (fermenters, syntrophs, methanogens) we considered the average of (1) the CV of the fraction of fermenters, (2) the CV of the fraction of syntrophs and (3) the CV of the fraction of methanogens in the community. Similarly, to quantify the variation of methanogenic groups we considered the average of (1) the CV of the fraction of acetoclastic methanogens and (2) the CV of the fraction of H2/CO2 methanogens. Note that CVs were different and random for each simulation, because several model parameters were chosen randomly. For each degree of functional redundancy, we used 50 random simulations to estimate the distribution of CVs (box-plots in Figs. 6.4I–L).

As mentioned in the main text, to distinguish between statistical averaging and dynamic stabilization we compared the CVs of functional group proportions to a null model in which OTU populations fluctuated independently. Specifically, for each simulation the null model cyclically shifted the time series of each OTU by a random time step, resulting in hypothetical community trajectories in which each OTU population fluctuates at a random phase lag when compared to other OTUs. The shifted time series of all OTUs within each functional group were then summed to calculate the hypothetical corresponding abundance of the functional group.
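One realization of the cyclic-shift null model described above could look like the following sketch. It assumes that all OTU time series are sampled on the same regular time grid and are stored as equal-length Python lists; the function names and the toy data are hypothetical.

```python
import random


def cyclic_shift(series, k):
    # Rotate a time series by k steps, wrapping the start around to the end.
    k %= len(series)
    return series[k:] + series[:k]


def null_group_abundance(otu_series, rng):
    # otu_series: list of equal-length abundance time series, one per OTU
    # of a single functional group. Each series is shifted by an independent
    # random lag, then the shifted series are summed time point by time point.
    n = len(otu_series[0])
    shifted = [cyclic_shift(s, rng.randrange(n)) for s in otu_series]
    return [sum(values) for values in zip(*shifted)]


# Toy example with two OTUs and four time points.
rng = random.Random(1)
group = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, 0.0, 0.0]]
print(null_group_abundance(group, rng))
```

Because a cyclic shift only reorders each series, the time-averaged abundance of every OTU is preserved and only the phase relationships between OTUs are randomized.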
For each simulation this was done 1000 times, yielding a distribution of random CVs generated by the null model. The “degree of dynamic stabilization” (DDS) of a single simulation was then defined as the fraction of random CVs that were above the actual CV of the simulation. For each degree of functional redundancy, we used 50 random simulations to estimate the distribution of DDSs (box-plots in Figs. 6.4M–P).

All statistics were calculated using the time series spanning days 500–1000, in order to avoid any transients right after inoculation.

E.1.7 Deterministic vs stochastic competitive exclusion

Demographic drift between similar competitors has been suggested previously (349) as a cause of seemingly random and sustained OTU succession observed within functional groups under constant environmental conditions. At high cell densities, however, deterministic dynamics such as competitive exclusion or predator-prey cycles are expected to dominate over stochastic demographic drift, although the importance of drift depends on the extent of competitive differences and the strength of biotic interactions, as well as the spatial scale at which populations are mixed (e.g., the entire bioreactor in case of vigorous mixing). For example, significant differences in growth rates, enzyme efficiencies, maintenance rates and stress responses are common between prokaryotic sister species or even between strains of the same species (207, 328, 352). To assess the plausibility of random demographic drift as a possible explanation for OTU succession in the face of competition, we examined a simple stochastic birth-death model (described below) for the population sizes of two competing OTUs with slightly different competitive abilities. We performed multiple simulations of the model and compared the resulting stochastic trajectories to the corresponding deterministic trajectory leading to competitive exclusion.
We measured the deviation of each stochastic trajectory from the corresponding deterministic trajectory in terms of the coefficient of determination ($R^2$). As will become clear below, being conservative in terms of the assumed competitive differences between OTUs strengthens the confidence in our results.

The model considers the population sizes ($N_1$ and $N_2$) of two competing OTUs with equal death rates but slightly different birth rates, while assuming a constant combined population size ($N = N_1 + N_2$). At each time step, a random cell is lost (“death”) from one of the two populations, while another random cell is added (“birth”) to one of the two populations. The probability that the lost cell belongs to population 1 is thus $N_1/N$. The probability that the added cell belongs to population 1 is $N_1/(N_1 + (1+s)N_2)$, where we assumed that the ratio of per-capita birth rates (OTU 1 : OTU 2) is $1 : (1+s)$ and $s$ is the relative difference between the two per-capita birth rates (“relative advantage”). Hence, at each time step $N_1$ can either decrease by 1, increase by 1 or remain unchanged, while the transition probabilities depend on the current $N_1$. On the other hand, the deterministic trajectory of $N_1$ corresponding to the above birth-death process is given by the difference equation

$$N_1(t+1) = N_1(t) + \frac{N_1}{N_1 + (1+s) N_2} - \frac{N_1}{N}, \tag{E.1.10}$$

where $t$ is the number of elapsed time steps. We note that previously published neutral models for microbial communities also include random immigration of cells from a “regional pool” as an additional source of fluctuations in the considered community (349, 427).
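The birth-death process and its deterministic counterpart, Eq. (E.1.10), can be sketched as follows. The function names, the small population size, the advantage $s$, the stopping threshold and the step limit are illustrative choices for this sketch only, and the step limit is a guard added here rather than part of the described model.

```python
import random


def birth_probability(n1, n_total, s):
    # Probability that the newly added cell belongs to population 1,
    # given per-capita birth rates in the ratio 1 : (1 + s).
    n2 = n_total - n1
    return n1 / (n1 + (1.0 + s) * n2)


def stochastic_step(n1, n_total, s, rng):
    # One time step: one random death and one random birth, total size fixed.
    death_from_1 = rng.random() < n1 / n_total
    birth_to_1 = rng.random() < birth_probability(n1, n_total, s)
    return n1 - int(death_from_1) + int(birth_to_1)


def deterministic_step(n1, n_total, s):
    # Expected change per step, Eq. (E.1.10).
    return n1 + birth_probability(n1, n_total, s) - n1 / n_total


# Illustrative run; stops once population 1 drops below the threshold.
rng = random.Random(42)
n_total, s, threshold = 200, 0.05, 2
n1_stoch, n1_det = n_total // 2, n_total / 2
for step in range(200000):  # upper bound on steps, added as a safety guard
    if n1_stoch < threshold:
        break
    n1_stoch = stochastic_step(n1_stoch, n_total, s, rng)
    n1_det = deterministic_step(n1_det, n_total, s)
print(step, n1_stoch, n1_det)
```

Both transition probabilities are evaluated from the population sizes at the start of the step, which matches the way $N_1$ enters both fractions of Eq. (E.1.10).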
Immigration is omitted from our model, because in the systems in which OTU turnover within functional groups has been observed over time (109, 349, 397) the bulk of living cells was likely produced within the system at hand, rather than added via immigration.

Each simulation of the above stochastic birth-death model was initiated at equal population sizes ($N_1 = N_2 = N/2$) and was run until population 1 dropped below a given “extinction threshold” ($E$), at which point competitive exclusion was considered to be complete. The coefficient of determination was then calculated as

$$R^2 = 1 - \frac{\sum_{t=1}^{T} \left[\tilde{N}_1(t) - N_1(t)\right]^2}{\sum_{t=1}^{T} \left[N_1(t) - \bar{N}_1\right]^2}, \tag{E.1.11}$$

where $T$ is the number of time steps until competitive exclusion, $\tilde{N}_1(t)$ is the deterministic trajectory and $\bar{N}_1$ is the average of the stochastic trajectory $N_1(t)$. For the example cited in the main text, we used a combined population size of $N = 10^5$, a relative advantage of $s = 1\%$ and a threshold of $E = 0.01 \times N$. The $R^2$ reported in the text was averaged over 1000 random simulations. The fact that even at such a low population size and such a weak competitive advantage the stochastic trajectory closely resembles the deterministic trajectory of competitive exclusion strengthens our argument that pure demographic drift is an unlikely explanation for the OTU turnover observed in previous experiments (109, 349, 397).

[Figure E.1 (plots not reproduced): panels A glucose, B ethanol, C lactate, D propionate, E butyrate; no methane uptake. Axes: uptake rate (µM/day) versus time (days 500–1000).]

Figure E.1: Predicted metabolite uptake rates. Community-wide metabolite uptake rates over time during a simulation at 50-fold functional redundancy.

[Figure E.2 (plots not reproduced): panels A ethanol, B lactate, C propionate, D butyrate, E methane; no glucose export. Axes: rate (µM/day) versus time (days 500–1000).]

Figure E.2: Predicted metabolite export rates. Community-wide metabolite export rates over time during a simulation at 50-fold functional redundancy.

[Figure E.3 (plots not reproduced): one panel per phage-host pair. Axes: phage concentration versus host cell concentration.]

Figure E.3: Predicted phage-host trajectories. Cell concentrations (horizontal axes) and associated phage concentrations (vertical axes) across time (one plot per phage-host pair), during a simulation at 5-fold functional redundancy (i.e., comprising 60 cell populations). A brighter point on a trajectory indicates an earlier time in the simulation. Trajectories that appear completely bright correspond to hosts that went extinct early in the simulation.

Table E.1: Pathway stoichiometry. Reaction stoichiometry and standard Gibbs free energies ($\Delta G^o$, kJ · mol⁻¹ substrate) used in the model, taken from Conrad (77). Reaction IDs are as in Fig.
6.1.

| ID | Group | Reaction | $\Delta G^o$ |
|----|-------|----------|--------------|
| A | glucose fermentation | C6H12O6 → 2 CH3CH2OH + 2 CO2 | −235.0 |
| B | glucose fermentation | C6H12O6 → 2 CH3CHOHCOOH | −198.1 |
| C | glucose fermentation | C6H12O6 → 3 CH3COOH | −311.2 |
| D | glucose fermentation | C6H12O6 + 2 H2O → 2 CH3COOH + 2 CO2 + 4 H2 | −216.1 |
| E | glucose fermentation | C6H12O6 → 4/3 CH3CH2COOH + 2/3 CH3COOH + 2/3 CO2 + 2/3 H2O | −311.4 |
| F | glucose fermentation | C6H12O6 → 2/3 CH3CH2CH2COOH + 2/3 CH3COOH + 2 CO2 + 8/3 H2 | −248.0 |
| G | syntrophy (catabolism of short-chain fatty acids, lactate and alcohols) | CH3CH2OH → CH3COOH + 2 H2 | +9.6 |
| H | syntrophy | CH3CHOHCOOH + H2O → CH3COOH + CO2 + 2 H2 | −48.7 |
| I | syntrophy | CH3CH2COOH + 2 H2O → CH3COOH + CO2 + 3 H2 | +31.8 |
| J | syntrophy | CH3CH2CH2COOH + 2 H2O → 2 CH3COOH + 2 H2 | +48.3 |
| K | methanogenesis | CH3COOH → CO2 + CH4 | −35.6 |
| L | methanogenesis | 4 H2 + CO2 → 2 H2O + CH4 | −32.7 |

Table E.2: Model parameters. Parameter types used in the model, including substrate half-saturation concentrations and maximum cell-specific substrate uptake rates. Parameters marked with an asterisk (*) are randomly and uniformly chosen within an interval spanning 10–1000 % of their default value, independently for each OTU and for each simulation. References indicated by “†”: Mass-specific rates converted to cell-specific rates based on a dry cell mass of 2.8 × 10⁻¹³ g (344).

| Symbol | Description | Scope | Default value | Reference |
|--------|-------------|-------|---------------|-----------|
| m | dry cell mass | fermenters | 280 fg | (344) |
| m | dry cell mass | syntrophs | 280 fg | (344) |
| m | dry cell mass | H2/CO2 methanogens | 440 fg | (5) |
| m | dry cell mass | acetoclastic methanogens | 2.5 pg | (5) |
| V | max. cell-sp. glucose upt. rate | all glucose fermenters | * 67.2 fmol · cell⁻¹ · d⁻¹ | (318)† |
| K | glucose half-saturation conc. | all glucose fermenters | * 0.53 mM | (318)† |
| V | max. cell-sp. H2 upt. rate | H2/CO2 methanogens | * 1.43 pmol · cell⁻¹ · d⁻¹ | (5) |
| K | H2 half-saturation conc. | H2/CO2 methanogens | * 7.65 µM | (5) |
| V | max. cell-sp. acetate upt. rate | acetoclastic methanogens | * 0.55 pmol · cell⁻¹ · d⁻¹ | (5) |
| K | acetate half-saturation conc. | acetoclastic methanogens | * 442 µM | (5) |
| V | max. cell-sp. lactate upt. rate | lactate syntrophs | * 143 fmol · cell⁻¹ · d⁻¹ | (78)† |
| K | lactate half-saturation conc. | lactate syntrophs | * 380 µM | (78) |
| V | max. cell-sp. ethanol upt. rate | ethanol syntrophs | * 536 fmol · cell⁻¹ · d⁻¹ | (80)† |
| K | ethanol half-saturation conc. | ethanol syntrophs | * 0.3 µM | (220) |
| V | max. cell-sp. butyrate upt. rate | butyrate syntrophs | * 72.8 fmol · cell⁻¹ · d⁻¹ | (78)† |
| K | butyrate half-saturation conc. | butyrate syntrophs | * 76 µM | (5) |
| V | max. cell-sp. propionate upt. rate | propionate syntrophs | * 44.8 fmol · cell⁻¹ · d⁻¹ | (78)† |
| K | propionate half-saturation conc. | propionate syntrophs | * 432 µM | (259) |
| β | phage infectivity | free phage particles | * 3 × 10⁻¹⁰ L · d⁻¹ | (1, 322) |
| µ | lysis rate | all infected cells | * 2 d⁻¹ | (319, 322) |
| ν | phage particles released per lysis | all infected cells | * 10 | (319, 322) |
| λ | hydraulic turnover rate | bioreactor | 0.1 d⁻¹ | (122, 533) |
| $C_{gl}^o$ | glucose input concentration | bioreactor | 16 mg · L⁻¹ | (215, 505) |
| T | temperature | bioreactor | 35 °C | (122, 533) |
| pH | | bioreactor | 7.0 | (122, 533) |

Appendix F
Chapter 7: Supplemental material

F.1 Methods

F.1.1 Model overview

The core model is a set of differential equations for the concentrations of 8 metabolites and 6 proxy genes (DNA) across depth (100–200 m) and time. Each gene is a proxy for a particular energy-yielding pathway, which couples the oxidation of an external electron donor to the reduction of an external electron acceptor. Each gene is considered as a replicating unit that is independent of other genes. This corresponds to the simplifying assumption that each cell occupies a single metabolic niche associated with one of the modeled pathways (181, 383). Gene-specific reaction rates depend on the concentrations of all involved metabolites according to 1st or 2nd order (Michaelis-Menten) kinetics (211, 386) (Appendix F.3.4). In turn, the production or depletion of metabolites at any depth is determined by the reaction rates at that depth, taking into account reaction stoichiometry (Appendix F.3.3) and diffusive transport across the water column. The production of genes at any depth is driven by the release of energy from their catalyzed reactions, and is proportional to the Gibbs free energy multiplied by the reaction rate (396) (Appendix F.3.5).
In addition, gene populations are subject to exponential decay rates, diffusive transport and sinking.

F.1.2 Mathematical model structure

The DNA concentration for gene $r$ ($\Gamma_r$, copies per volume) exhibits the dynamics

$$\frac{\partial \Gamma_r}{\partial t} = -q_r \Gamma_r + \frac{1}{c} Z_r H_r \Gamma_r - v \frac{\partial \Gamma_r}{\partial z} + \frac{\partial}{\partial z}\left(K \frac{\partial \Gamma_r}{\partial z}\right), \tag{F.1.1}$$

while the concentration of the $m$-th metabolite ($C_m$, mole per volume) follows

$$\frac{\partial C_m}{\partial t} = \sum_r S_{mr} H_r \Gamma_r + \frac{\partial}{\partial z}\left(K \frac{\partial C_m}{\partial z}\right). \tag{F.1.2}$$

Both the gene concentrations $\Gamma_r$ and metabolite concentrations $C_m$ depend on time $t$ and depth $z$. The first term on the right-hand side of Equation (F.1.1) corresponds to cell death, with $q_r$ being the exponential death rate in the absence of any metabolic activity for pathway $r$. The 2nd term corresponds to gene production, with $H_r$ being the per-gene reaction rate as a function of metabolite concentrations (Appendix F.3.4). The biomass production coefficient $Z_r$ is a linear function of the Gibbs free energy of reaction $r$ (Appendix F.3.5). $c$ is the average dry cell mass, which is used to convert biomass production into cell production. The 3rd term corresponds to cell sinking at speed $v$. The last term in Equations (F.1.1) and (F.1.2) corresponds to diffusive transport, with $K$ being the vertical eddy-diffusion coefficient. In Equation (F.1.2), $S_{mr}$ is the stoichiometric coefficient of metabolite $m$ in reaction $r$ (Appendix F.3.3). The sum on the right-hand side of Equation (F.1.2) iterates through all reactions and thus accounts for microbial metabolic fluxes. Equations (F.1.1) and (F.1.2) specify the rates of change for the DNA and chemical concentration profiles. Steady state profiles were obtained after long simulations when all profiles had eventually stabilized.

F.1.3 Considered pathways

Redox pathways occurring in a single cell require at least two enzymes, one involved in the oxidation of the initial electron donor and one involved in the reduction of the final electron acceptor.
In the model such pathways are represented by single proxy genes, chosen such that ambiguities in their functional role are minimized. For example, nitrous oxide reduction using nitrous oxide reductase (nosZ) coupled to sulfide oxidation is identified with nosZ, because many sulfur oxidizing enzymes are reversible. Other pathways considered in the model are partial denitrification of nitrate to nitrous oxide coupled to sulfide oxidation (PDNO), aerobic ammonium oxidation using ammonia monooxygenase (amo), aerobic nitrite oxidation to nitrate using nitrite oxidoreductases (nxr), anammox, i.e., the anaerobic ammonium oxidation involving hydrazine oxidoreductase (hzo), as well as aerobic remineralization of (dissolved) organic matter (ROM). PDNO comprises 3 denitrification steps which are thought to be predominantly performed by the same microorganisms in the SUP05 clade (181, 504): nitrate reduction to nitrite involving dissimilatory nitrate reductases (narGHIJ or napAB), nitrite reduction to nitric oxide using nitrite reductases (nirKS) and nitric oxide reduction to nitrous oxide using nitric oxide reductases (norBC). The first denitrification step was assumed to be leaky, so that a small fraction of nitrite is released into the extracellular environment (253). We used norBC as a proxy for PDNO when interpreting molecular data (but see Figs. F.1 d,e,f in the Appendix for narGHIJ, napAB and nirKS multimolecular data, and Fig. 7.3A for coverage of the dissimilatory sulfide oxidation pathway). ROM is associated with the release of ammonium and sulfate (SO₄²⁻) at ratios corresponding to marine bacterial biomass stoichiometry (115). The choice of redox pathways in the model follows the hypotheses put forward by Hawley et al.
(181) based on molecular depth profiles, as well as reports of nitrous oxide reduction coupled to hydrogen sulfide oxidation in Saanich Inlet (74).

Hydrogen sulfide is assumed to originate via diffusion from the sediments, where intense sulfate reduction occurs (340) (Appendix F.4.1). Sulfate reduction was omitted from our model because both our molecular as well as chemical data suggest that sulfate reduction in the water column is negligible compared to the oxidation of sulfur compounds (see Appendix F.4.1 for a detailed discussion). In fact, when we included sulfate reduction in preliminary tests of our model the agreement between the model and the H2S profiles decreased dramatically, providing further evidence that H2S is largely supplied from the bottom, rather than produced in the water column.

Aerobic H2S oxidation was omitted from the model based on extensive previous work that points towards NO₃⁻ and other nitrogen compounds as dominant electron acceptors for H2S oxidation during periods of strong stratification (11, 181, 218, 503, 540). For example, as shown in Fig. 7.2B, the upper boundary of H2S concentrations closely follows the lower boundary of NO₃⁻ (rather than O2) over time, especially during the period considered here (early 2010). We mention that during renewal events in fall, O2 can become an important electron acceptor for H2S oxidation (540); however, this does not affect this study, which focuses on a period of intense stratification near steady state conditions.
A more detailed discussion on the role of aerobic sulfide oxidation is provided in Appendix F.4.3.

Pathways for hydrogen (H2) and methane (CH4) metabolism are not included on grounds of parsimony, because these are not directly linked to the other considered pathways (540) and because low hydrogen and methane fluxes into the OMZ suggest that hydrogen and methane pathways are of secondary importance (275, 540).

F.1.4 Model calibration and data

Unknown parameters of the basic gene-centric model (Eqs. (F.1.1) and (F.1.2), ignoring mRNA and protein dynamics) were calibrated by comparing steady state predictions to measured depth profiles of oxygen, ammonium, nitrate, nitrite, hydrogen sulfide and nitrous oxide. Chemical calibration data were acquired on January 13, February 10 and March 10, 2010 (or February 10 and April 7 for oxygen) from a single location in Saanich Inlet (123° 30.30′ W, 48° 35.50′ N; Appendix F.2.2). The calibrated parameters were the maximum cell-specific reaction rate $V_{PDNO}$, the 1st order rate constants $A_{ROM}$ and $A_{nosZ}$, as well as the PDNO leakage fraction $L_{PDNO}$ (Supplemental Table F.2). Calibration was performed by maximizing the likelihood of a statistical model, in which the deterministic part (i.e., expectation) is given by the predictions of the gene-centric model and the stochastic part (i.e., error) is normally distributed (Appendix F.3.8). This calibration method is known as maximum-likelihood estimation and is widespread in statistical regression and physics (113). Maximization of the likelihood was performed using the MATLAB® function fmincon, which uses repeated simulations and gradual exploration of parameter space (309). The sensitivity of the model to parameter variation was assessed via local sensitivity analysis (71), as described in Appendix F.3.12.
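The idea behind this calibration can be illustrated with a toy example: a deterministic prediction plus normally distributed errors defines a likelihood, and the best-fitting parameter minimizes the negative log-likelihood. The sketch below is not the thesis workflow. It swaps in a hypothetical one-parameter `toy_model` for the gene-centric model, a fixed and known error standard deviation, and a plain grid search in place of MATLAB's fmincon.

```python
import math


def gaussian_neg_log_likelihood(predicted, observed, sigma):
    # Negative log-likelihood of the observations, assuming independent
    # normal errors with standard deviation sigma around the predictions.
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 0.5 * n * math.log(2.0 * math.pi * sigma ** 2) + sse / (2.0 * sigma ** 2)


def toy_model(rate_constant, depths):
    # Hypothetical stand-in for a model prediction: exponential decline with depth.
    return [math.exp(-rate_constant * z) for z in depths]


def fit_by_grid_search(depths, observed, sigma, candidates):
    # Return the candidate parameter value with the highest likelihood.
    return min(
        candidates,
        key=lambda a: gaussian_neg_log_likelihood(toy_model(a, depths), observed, sigma),
    )


# Synthetic, noise-free data generated with rate_constant = 0.3.
depths = [0.0, 1.0, 2.0, 3.0]
observed = toy_model(0.3, depths)
print(fit_by_grid_search(depths, observed, 0.1, [0.1, 0.2, 0.3, 0.4]))
```

With a fixed error variance, maximizing this likelihood is equivalent to minimizing the sum of squared residuals; a gradient-based optimizer would replace the grid search for models with several parameters.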
An overview of our workflow is shown in Supplemental Fig. F.2.

Samples for molecular sequencing were collected on February 10, 2010 from the same location as the geochemical data (Appendixes F.2.3 and F.2.4). Metagenomic profiles (a priori in relative units) were rescaled to match the model scales using maximum-likelihood estimated factors (Appendix F.3.9). SUP05 cell counts for February 10, 2010 were estimated via quantitative PCR (qPCR) using SUP05-specific primers targeting the 519–1048 region of the SUP05 16S rRNA gene, following the protocol by Hawley et al. (182). 16S gene counts were corrected for the number of 16S rRNA gene copies per cell, estimated using the Tax4Fun pipeline (19) (Appendix F.2.6). Denitrification and anammox rates were measured on cruises 47 (SI047_07/07/10) and 48 (SI048_08/11/10) via ex situ incubation experiments, and were subsequently corrected for differences between in-situ and incubated substrate concentrations (Appendix F.2.5).

F.1.5 mRNA and protein models

As mentioned previously, upon calibration of the gene-centric model to the geochemical profiles, we extended the model to describe mRNA (and similarly, protein) dynamics in the water column. Specifically, the production rate of an mRNA (transcripts produced per time and per volume of seawater) at a particular depth was assumed to be proportional to the total reaction rate (mol per time and per volume of seawater) at that depth. A linear relation, while only an approximation, can be justified by the fact that increased enzyme dilution rates at elevated cell division rates must be balanced (at the population level) by correspondingly increased translation (and hence transcription) rates (403). We also assumed that mRNA molecules disperse via diffusion and sinking similarly to genes (as they are hosted by the same cells) and decay exponentially with time.
Thus, environmental mRNA concentrations satisfy the partial differential equation

$$\frac{\partial T_r}{\partial t} = -\frac{T_r}{\tau_r} + \frac{R_r}{\alpha_r} - v \frac{\partial T_r}{\partial z} + \frac{\partial}{\partial z}\left(K \frac{\partial T_r}{\partial z}\right), \tag{F.1.3}$$

where $T_r$ is the mRNA concentration corresponding to the $r$-th reaction, $\tau_r$ is the decay time of the mRNA molecule, $R_r = H_r \Gamma_r$ is the total reaction rate and $\alpha_r$ is an unknown proportionality constant. We considered $T_r$ in the same units as the multi-omic data (i.e., RPKM for metatranscriptomes and NSAF for metaproteomes). Consequently, $\alpha_r$ is the ratio between the $r$-th reaction rate and the corresponding RPKM (or NSAF) “production rate” (mol per time per volume of seawater per RPKM), and thus not only depends on the particular reaction, but also on our sampling protocol and sequencing pipeline. The above model was evaluated at steady state, when mRNA production, dispersal and decay are balanced at each depth ($\partial T_r / \partial t = 0$). The parameters of the mRNA and protein models (proportionality factors and decay times) were calibrated by fitting to the metatranscriptomic and metaproteomic data, respectively (Appendix F.3.10). Calibration to metatranscriptomic data failed for amo mRNA. Metagenomic and metaproteomic data were not available for nxr and nosZ, respectively (Appendix F.2.3). For all other mRNAs and proteins, the iterative calibration converged rapidly to an optimum and this optimum was robust against various starting values for the parameters.

F.1.6 Inverse linear transport modeling

In addition to the model predictions and rate measurements, denitrification and anammox rates were also estimated directly from chemical concentration profiles via inverse linear transport modeling (ILTM, Appendix F.5). ILTM provides an estimate for the metabolic fluxes in the OMZ based on the observed chemical concentration profiles. The exact shape of estimated rate profiles depends sensitively on measurement errors and the noise-reduction method applied to the concentration profiles.
Hence, ILTM only serves as a rough verificationof the order of magnitude of rates predicted by the model or measured experimentally. ILTM273Chapter 7: Supplemental materialfitting was applied separately to concentration profiles from cruises 47 and 48, as well as tothe chemical profiles used for model calibration (cruises 41–44, Fig. 7.2) after averagingacross replicates at each depth.F.2 Data acquisitionF.2.1 Sampling site and timeSaanich Inlet (SI) a seasonally anoxic fjord on the coast of Vancouver Island, British ColumbiaCanada has been the site of intensive study for many decades (86, 186). The presence ofa shallow glacial entrance sill at 75 m depth limits mixing and ventilation of basin watersbelow approximately 100 m, resulting in stratification and oxygen depletion during springand summer (Fig. 7.1A in the main text). Shifts in coastal currents in late summer andfall lead to an influx of denser, oxygenated and nutrient-rich water into the Inlet shoalinganoxic basin waters upward in a process known as deep water renewal (86, 540). Consistentpartitioning of the microbial community along the redox cline and similarity to other OMZmicrobial communities make Saanich Inlet a model ecosystem for studying the intersectionbetween environmental sequence information and biogeochemical activity along defined re-dox gradients (503, 531, 540).The fjord has a maximal depth of 232 m at the sampling site SI03 (123◦ 30.300′W, 48◦ 35.500′N).Sampling is conducted monthly during daylight hours using a combination of 5 and 8 LNiskin bottles and 12 L Go-Flo bottles attached to a nonconducting wire. A Sea-Bird CTD(conductivity, temperature and depth) sensor attached to the bottom of the wire providesdepth profiles for temperature, salinity, PAR/Irradiance, conductivity, density, and dissolvedoxygen (Sea-Bird ElectronicsTM). 
Water sampling for multiple chemical and microbial pa-rameters proceeds directly from the bottles in the following order: First, samples are takenfor dissolved O2 measurements via Winkler titration, followed by sampling of dissolved gases.Next, samples are taken for RNA, then protein followed by ammonium, hydrogen sulfide andnitrite. Finally, salinity is measured for a subset of depths for CTD calibration, and samplesare taken for DNA.All molecular sequencing was performed using samples collected on February 10, 2010(cruise SI040_02/08/10). Chemical data were acquired in the same year on January 13(SI041_01/13/10), February 10 (SI042_02/10/10), March 10 (SI043_03/10/10), April 7274Chapter 7: Supplemental material(SI044_04/07/10), July 7 (SI047_07/07/10) and August 11 (SI048_08/11/10).F.2.2 Chemical and physical depth profilesTemperature, salinity and depth were measured using the CTD sensor described above. TheWinkler titration method was used to measure dissolved oxygen (O2) concentrations (329)and calibrate CTD measurements. Samples were collected into Winkler glass Erlenmeyerflasks using latex tubing, overflowing three times to ensure no air contamination, manganese(III) sulphate and potassium iodide were added in succession, inverted to mix and storedat room temperature. Samples were titrated using an automatic titrator. CTD data wereprocessed and manually curated using the Sea-Bird SeasoftTM software.Samples for dissolved nutrient (nitrate, nitrite, sulphate and silicate) analyses were collectedinto 60 mL syringes and filtered through a 0.22 µm Millipore AcrodiscTM into 15 mL falcontubes. Prior to analysis all samples were stored on ice. Nitrate (NO−3 ) samples were storedat −20◦C in the laboratory, and later analyses carried out using a Bran Luebbe autoanalyserusing standard colorimetric methods. For nitrite (NO−2 ) analysis, 2 mL of sample water weresupplemented with 100 µL sulfanilamide and 100 µL nicotinamide adenine dinucleotide in4 mL plastic cuvettes. 
Prepared standards were supplemented with reagents at the sametime. Cuvettes were inverted for mixing, and temporarily stored on ice for not more than4 hrs. Concentration was measured using a Cary60 R© spectrometer, based on absorbance at452 nm.Samples for ammonium (NH+4 ) and hydrogen sulphide (H2S) were collected directly fromNiskin and GoFlo bottles into 15 mL amber scintillation vials and 15 mL falcon tubesaliquoted with 200 µL 20% zinc acetate respectively. Samples were stored on ice prior toanalysis. For NH+4 analysis, amber vials for standard curve and samples were pre-aliquotedwith 7.5 mL O-phthaldialdehyde (OPA) reagent respectively. 5 mL of sample water intriplicate and standard solutions were transferred into OPA pre-aliquoted amber vials. Vialswere inverted and stored up to 4 hours. From each standard solution and sample water vial,300 µL were transferred into a 96 well round bottom plate. Fluorescence at 380ex/420emmwas read using a Varioskan plate reader. For H2S analysis, 300 µL samples were transferredin triplicate to a 96 well plate, and finally Hach Reagent 1 and 2 (6 µL per well) were added.Absorbance at 670 nm was read after 5 min incubation using a VarioskanTM plate reader.Water for dissolved nitrous oxide (N2O) analysis was collected using Go-flo or Niskin bottles,and was transferred via a Teflon tube into 30 mL or 60 mL borosilicate glass serum vials.275Chapter 7: Supplemental materialVials were overflown three times their volume in order to remove any bubbles from the vialor tubing. Vials were subsequently spiked with 50 µL saturated mercuric chloride using apipette. Vials were then crimp-sealed with a butyl-rubber stopper and aluminum cap, andstored in the dark at 4◦C until they were analyzed. 
Dissolved nitrous-oxide concentrationswere measured using a purge-and-trap auto-sampler coupled with a gas-chromatographymass-spectrometer (57).F.2.3 Metagenomics, metatranscriptomics and metaproteomicsMetagenome and metaproteome datasets were generated using the same methods as de-scribed in Hawley et al. (182). Metaproteome sequence coverage was quantified using nor-malized spectral abundance factors (NSAF) (361). Metatranscriptome samples were filteredin the field onto 0.2 µm sterivex filter with inline pre-filter of 2.7 µm pre-filter, adding 1.8 mLof RNAlater R© (Qiagen) and freezing on dry ice before transferring to −80◦C. RNA later wasremoved by washing Sterivex filter with Ringer’s solution before proceeding with cell lysis inthe filter cartridge. Total RNA was extracted using the mirVanaTM miRNA extraction kit(Ambion), DNA was removed using the TURBO DNA-freeTM kit (Ambion) and total RNAwas purified using RNeasyTM MiniElute Cleanup Kit (Qiagen). RNA concentration andquality was determined using a Bioanalyzer. Production of cDNA libraries and sequencingwas carried out at the Joint Genome Institute using the TruSeq R© Stranded Total RNA Sam-ple preparation Guide, including depletion of ribosomal RNA using Ribo-Zero. Assembledmetagenomic and metatranscriptomic sequences (contigs) were run through Metapathways(248) for annotation using a combination of RefSeq (470), KEGG (221), COG (469) andMetaCyc (65) databases. Contig coverage was quantified using RPKM values (AppendixF.2.4). KEGG-annotated contigs were assigned to the selected process proxy genes of themodel (Supplemental Table F.1); gene coverage at each depth was then quantified by sum-ming all assigned contig RPKM values.Nitrate reductase (narGHIJ ) assigned to planctomycetes showed a decline with depth, sug-gesting that it may be acting in reverse as a nitrite oxidase (458). 
In fact, narGHIJ countsaffiliated with planctomycetes (narGHIJ-P) dominated all other nxr-associated counts inthe metagenomes, metatranscriptomes and metaproteomes. We thus associated nxr withnarGHIJ-P. However, because planctomycetes perform anammox in deeper depths (181),we observed a secondary peak in the narGHIJ-P DNA closer to the SNTZ that did not dis-sipate completely in bottom waters. Given this ambiguity in the interpretation of detectednarGHIJ-P genes, we omitted the narGHIJ-P metagenomes and only used the narGHIJ-P276Chapter 7: Supplemental materialmetatranscriptomes and metaproteomes. For more details see Appendix F.4.4.All nosZ -related protein sequences mapped to a nosZ homolog found in the strictly aerobicRoseobacter Maritimibacter alkaliphilus HTCC2654 (264, 477) and showed strong inconsis-tencies with nosZ metagenomic and metatranscriptomic profiles. nosZ genes have beenfound to be enriched on particles, likely because they constitute a more anaerobic niche(139). Our metaproteomes were pre-filtered to remove eukaryotes and particles and areexpected to be impoverished in nosZ proteins, facilitating a potential masking by relatedbut functionally different proteins. We thus omitted the nosZ metaproteomic data from ouranalysis.Table F.1: KEGG orthologous groups (KOG) identified with each gene. 
The abundance of each gene was the sum of RPKM values (Appendix F.2.4) assigned to all included KOGs.

| gene or pathway | KOGs | restrictions |
|---|---|---|
| ROM | K12536, K05648 | ABC transporters in Pelagibacter and Roseobacter |
| amo | K10945, K10946 | |
| nxr | K00370, K00371, K00374, K00373 | narGHIJ in Planctomycetaceae |
| hzo | K10535 | hao in Planctomycetaceae |
| PDNO (norBC) | K04561, K02305 | |
| nosZ | K00376 | |
| sat | K00958 | |
| aprAB | K00394, K00395 | |
| dsrAB | K11180, K11181 | |
| nirKS | K00368, K15864 | |
| napAB | K02567, K02568 | |
| narGHIJ | K00370, K00371, K00374, K00373 | |

F.2.4 Quantifying metagenomic and metatranscriptomic data using RPKM

Relative open reading frame (ORF) abundance in the metagenomic and metatranscriptomic datasets was determined for quantitative assessment of pathway coverage. This was achieved by adapting the reads per kilobase per million mapped (RPKM) coverage measure as described by Konwar et al. (247). Briefly, unassembled Illumina reads were mapped to assembled contigs using the short-read aligner BWA-MEM. The resulting SAM file was then passed to the MetaPathways v2.5 software (247), which generates an RPKM value per ORF that is extended to an RPKM per pathway via summation. To determine the abundance of pathways expressed in the metatranscriptome relative to those present in the metagenome, the unassembled metatranscriptome reads were mapped back to the assembled metagenome contigs. The RPKM value is the number of reads mapped to a particular section of sequence, normalized for ORF length and sequencing depth.

F.2.5 Process rate measurements

Rate measurements for anammox and denitrification were carried out as follows: Sample water from each depth was collected anaerobically with sterile nitrile tubing directly into 200 mL glass serum bottles (six per depth), capped with a butyl-rubber stopper and aluminum cap, and stored at 10◦C for approximately 1 hr while collection was completed. The protocol described by Holtappels et al. 
(192) and briefly outlined here, was then followed.One sample from each depth was bubbled with He for 30 min to decrease concentration ofN2. The following substrates were then added: 15NH+4 alone, 15NH+4 and 14NO−2 combined,15NO−2 alone, 15NO−2 and 14NH+4 combined or 15NO−3 alone. A blank for each depth wasalso bubbled with He. Sample water was then transferred from the serum bottle into a12 mL exetainer, capped and stored upside down. Samples in exetainers were then killedwith 50 µL saturated HgCl at time intervals of 0 min, 6 hr, 12 hr, 24 hr, 48 hr and 72hrs. Partial pressures of 29N2 and 30N2 evolved during the incubations were measured bygas chromatography coupled to isotope ratio mass spectroscopy. Rates of anammox anddenitrification were calculated as described by Holtappels et al. (192).Rate measurements using N isotope methods require a compromise between ensuring de-tection of labeled tracer elements and avoiding excessive perturbation of ambient substrateconcentrations (240, §2.1). Due to the extremely low in-situ substrate levels in some of oursamples (Fig. 7.2 in the main text), tracer substrate concentrations in the ex-situ incubator(25 µM NH+4 , 2 µM NO−2 and 5 µM NO−3 ) significantly exceeded in-situ concentrations. Onthe other hand, denitrification and anammox-related genes were found throughout the OMZwater column (Fig. 7.3A in the main text). Hence, rates measured in the incubator are onlypotential rates that likely overestimate actual in-situ rates, especially in substrate-depletedregions far from the SNTZ. For example, Dalsgaard et al. (90) reports a 2–4 fold increase ofanammox rates following the addition of 10 µM NH+4 in anoxic water column experiments.Similarly, Wenk et al. (515) found high potential denitrification rates in nitrate-depleted re-gions of a meromictic lake. 
We thus corrected our rate measurements for differences between in-situ and incubator substrate concentrations, as described below.

The simplest approach would be to multiply measured rates with the ratios of in-situ over ex-situ substrate concentrations, as has been done in previous ex-situ incubation experiments (508). However, such a linear rescaling implicitly assumes that substrate half-saturation constants are much higher than both the in-situ and the ex-situ concentrations, an assumption that may not be justifiable in regularly substrate-depleted natural environments. For example, members of the Scalindua candidate clade, which is well represented in Saanich Inlet (181), exhibit nitrite half-saturation constants as low as 0.45 µM (23). To avoid an implicit assumption of 1st order kinetics, and for consistency with the assumptions of our model, we corrected our rates using Michaelis-Menten kinetic curves (Appendix F.3.4) with the same half-saturation constants as used in our model (Appendix F.3.7). Specifically, if $R^*_{hzo}(z)$ is the measured ex-situ (i.e., potential) anammox rate at some particular depth, then the corrected in-situ rate was assumed to be

$$R_{hzo} = R^*_{hzo}(z)\cdot\frac{\dfrac{[\mathrm{NH_4^+}]}{K_{\mathrm{NH_4^+}}+[\mathrm{NH_4^+}]}\cdot\dfrac{[\mathrm{NO_2^-}]}{K_{\mathrm{NO_2^-}}+[\mathrm{NO_2^-}]}}{\dfrac{[\mathrm{NH_4^+}]^*}{K_{\mathrm{NH_4^+}}+[\mathrm{NH_4^+}]^*}\cdot\dfrac{[\mathrm{NO_2^-}]^*}{K_{\mathrm{NO_2^-}}+[\mathrm{NO_2^-}]^*}}. \qquad \text{(F.2.1)}$$

Here, $K_{\mathrm{NH_4^+}}$ and $K_{\mathrm{NO_2^-}}$ are anammox half-saturation constants for NH4+ and NO2−, respectively (Appendix F.3.7), $[\mathrm{NH_4^+}]$ and $[\mathrm{NO_2^-}]$ are the corresponding measured in-situ concentrations and $[\mathrm{NH_4^+}]^*$ and $[\mathrm{NO_2^-}]^*$ are the concentrations in the incubator at the beginning of the experiment, i.e., $[\mathrm{NH_4^+}]^* = [\mathrm{NH_4^+}] + 25$ µM and $[\mathrm{NO_2^-}]^* = [\mathrm{NO_2^-}] + 2$ µM. Measured denitrification rates were corrected in a similar way to account for differences in NO3− concentrations.

F.2.6 qPCR quantification of SUP05 cell counts

All metagenomic, metatranscriptomic and metaproteomic profiles presented here only provide relative, rather than absolute, biomolecule abundances. 
This remains the de factostandard for multi-omic data sets, owing largely to methodological challenges involved inabsolute DNA, mRNA and protein quantification (but see Smets et al. (428) for recent ad-vancements). As we explain below (section F.3.9), multi-omic depth profiles were linearlyrescaled to facilitate comparison with our model predictions — expressed in absolute genecounts, however this comes at the cost of additional rescaling parameters.In order to perform an independent validation of modeled gene concentrations, we compared279Chapter 7: Supplemental materialthe predicted PDNO gene concentrations to independent cell-count estimates for SUP05 (thedominant nitrate reducer in Saanich Inlet; 181), obtained through quantitative polymerasechain reaction (qPCR). qPCR quantification of SUP05 cell counts was performed for watersamples collected at 8 distinct depths from the same location and time as for multi-omicsequencing (Fig. 7.3A in the main text). Water samples (volume ∼ 1L) were filtered inthe field onto 0.2 µm sterivex filters. Samples were not pre-filtered in order to obtain anaccurate estimate of total in-situ SUP05 cell counts. We used a custom SUP05-specificprimer set (Ba519F–1048R) to amplify the 519–1048 region of the SUP05 16S rRNA gene,and followed the protocol described by Hawley et al. (182) to estimate the starting templateconcentration. qPCR was performed in triplicate for each sample. We multiplied the averagetemplate concentration for each sample by the volume of extracted fluid (∼ 200− 400 µL),divided by the volume of filtered seawater, to obtain an estimate for the concentrationof SUP05 16S gene copies in seawater. 
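
The conversion from qPCR template concentration to cell concentration can be sketched in a few lines. This is only an illustration of the arithmetic described here, not the authors' script: the function name, the unit choice (template copies per µL of extract) and the triplicate readings are assumptions, while the 16S copy number of 3.767 is the value quoted in the text that follows.

```python
# Minimal sketch of the qPCR-to-cell-count arithmetic (hypothetical helper).
# Assumption: template concentration is expressed in 16S copies per uL of extract.

def sup05_cells_per_litre(template_copies_per_uL, extract_volume_uL,
                          filtered_volume_L, copies_per_cell=3.767):
    # Total 16S copies recovered from the filter.
    gene_copies = template_copies_per_uL * extract_volume_uL
    # 16S copies per litre of filtered seawater.
    gene_copies_per_litre = gene_copies / filtered_volume_L
    # Correct for multiple 16S copies per cell.
    return gene_copies_per_litre / copies_per_cell

# Hypothetical triplicate qPCR readings for one depth (copies per uL of extract).
triplicate = [1.0e4, 1.2e4, 1.1e4]
mean_template = sum(triplicate) / len(triplicate)

# About 300 uL of extract from about 1 L of filtered seawater, as in the text.
cells = sup05_cells_per_litre(mean_template, 300.0, 1.0)
print(f'{cells:.3e} cells per litre')
```

Keeping the copy number as an explicit argument makes it easy to see how sensitive the cell-count estimate is to that single correction factor.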
To correct for multiple 16S gene copies in singlecells, we divided this concentration by the 16S gene copy number (3.767), estimated formembers of the SUP05 clade based on closely related fully sequenced reference genomes.Specifically, we used the 16S gene copy number assigned by the Tax4Fun pipeline (19) to theclade “Oceanospirillales;SUP05 cluster;uncultured gamma proteobacterium” in the SILVA123 database (378). Note that Tax4Fun (19) uses a probabilistic model to assign multiplereference genomes with varying weights to each clade in the SILVA database. Hence, theeffective 16S gene copy number assigned by Tax4Fun to each clade is the weighted harmonicmean of the 16S gene copy numbers in each reference genome assigned to that clade.F.3 Mathematical modelF.3.1 OverviewThe gene-centric model describes the spatiotemporal dynamics of 8 metabolite concentrationsand 6 gene (DNA) concentrations along the Saanich Inlet water column between depths 100–200 m. Each gene is a proxy for a particular redox pathway that couples the oxidation of anexternal electron donor to the reduction of an external electron acceptor (Appendix F.3.2).The model assumes that each cell occupies a single metabolic niche, associated with oneof the modeled pathways and thus one of the considered proxy genes. Reaction rates (pergene) depend on the concentrations of all used metabolites according to 1st order or 2ndorder (Michaelis-Menten) kinetics (211, 386) (Appendix F.3.4). In turn, the production ordepletion of metabolites at any depth is determined by the reaction rates at that depth,taking into account reaction stoichiometry (Appendix F.3.3). The production of genes (or280Chapter 7: Supplemental materialmore precisely, their host cells) at any depth is driven by the release of energy from theircatalyzed reactions, and is proportional to the Gibbs free energy multiplied by the reactionrate (Appendix F.3.5) (396). 
In addition, genes are subject to exponential decay as well as eddy-diffusion and sinking. Metabolites are also subject to eddy-diffusion.

Mathematically, the model is defined as a set of partial differential equations (PDE) for the gene and metabolite concentrations across time and depth. More precisely, the DNA concentration of the r-th gene ($\Gamma_r$, copies per volume) at any given depth z changes according to

$$\frac{\partial \Gamma_r}{\partial t} = -q_r\Gamma_r + \frac{1}{c}Z_rH_r\Gamma_r - v\,\frac{\partial \Gamma_r}{\partial z} + \frac{\partial}{\partial z}\left(K(z)\,\frac{\partial \Gamma_r}{\partial z}\right), \qquad \text{(F.3.1)}$$

and the concentration of the m-th metabolite ($C_m$, mole per volume) changes according to

$$\frac{\partial C_m}{\partial t} = \sum_r S_{mr}H_r\Gamma_r + \frac{\partial}{\partial z}\left(K(z)\,\frac{\partial C_m}{\partial z}\right). \qquad \text{(F.3.2)}$$

Both the DNA concentrations $\Gamma_r$ and metabolite concentrations $C_m$ depend on time t and depth z. The first term in equation (F.3.1) corresponds to cell death, with $q_r$ being the exponential death rate for cells hosting gene r in the absence of any metabolites. The 2nd term in (F.3.1) corresponds to gene production proportional to the per-gene reaction rate $H_r$ (which in turn depends on metabolite concentrations, see Appendix F.3.4). The biomass production coefficient $Z_r$ is a linear function of the Gibbs free energy of the reaction catalyzed by gene r and depends on the reaction quotient (Appendix F.3.5). In particular, $Z_r$ increases when product concentrations are low and decreases when substrate concentrations are low. The constant $c$ is the average dry cell mass, which is used to convert biomass production into cell production. The 3rd term in equation (F.3.1) corresponds to cell sinking at a constant speed $v$. The last terms in equations (F.3.1) and (F.3.2) correspond to diffusive transport (263), with $K$ being the vertical eddy-diffusion coefficient. The 1st term in equation (F.3.2) corresponds to production or depletion of metabolites due to microbial metabolism. 
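
To make the structure of equation (F.3.1) concrete, the following sketch advances a single gene profile by explicit Euler steps on a uniform depth grid. It is an illustrative discretization chosen here (flux form, upwind sinking, boundary values held fixed), not the numerical scheme used for the thesis simulations. The function name, the initial profile and the constant diffusivity are hypothetical, whereas the sinking speed of 0.1 m/d and the death rate of 0.0033 d^-1 are values quoted in Appendix F.3.7.

```python
import numpy as np

def step_gene_profile(gamma, K, net_growth, v, dz, dt):
    # One explicit Euler step of equation (F.3.1) on a uniform grid.
    # net_growth = Z_r * H_r / c - q_r (per day), evaluated at each depth.
    # Stability requires roughly dt <= dz**2 / (2 * K.max()) and dt <= dz / v.
    K_face = 0.5 * (K[1:] + K[:-1])             # diffusivity at cell interfaces
    diff_flux = -K_face * np.diff(gamma) / dz   # Fickian flux, positive downward
    sink_flux = v * gamma[:-1]                  # upwind sinking flux (v >= 0)
    flux = diff_flux + sink_flux
    new = gamma.copy()
    # Interior cells: local growth/decay minus divergence of the transport flux.
    new[1:-1] += dt * (net_growth[1:-1] * gamma[1:-1]
                       - (flux[1:] - flux[:-1]) / dz)
    return new                                  # end values stay fixed (Dirichlet)

z = np.linspace(100.0, 200.0, 101)              # model domain, 1 m resolution
gamma = np.exp(-((z - 150.0) / 10.0) ** 2)      # hypothetical initial profile
K = np.full_like(z, 1.0)                        # hypothetical diffusivity (m^2/d)
net_growth = np.full_like(z, -0.0033)           # decay only, no growth term
for _ in range(100):
    gamma = step_gene_profile(gamma, K, net_growth, v=0.1, dz=1.0, dt=0.1)
print(z[np.argmax(gamma)], gamma.max())
```

Writing the transport terms as a flux difference keeps the scheme conservative in the interior, so any change in the depth-integrated gene abundance comes only from the growth and decay term and from the boundaries.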
Reaction rates are transformed into metabolite fluxes via the stoichiometric matrix S, with entry $S_{mr}$ corresponding to the stoichiometric coefficient of metabolite m in reaction r (Appendix F.3.3).

The differential equations (F.3.1) and (F.3.2) give the rate of change of each metabolite and gene profile, if the profiles are known at a given moment in time. Once all boundary conditions (Appendix F.3.6), model parameters (Appendix F.3.7) and initial profiles are specified, the model predicts the profiles at any future time point. Steady state profiles were obtained by running simulations of the model until convergence to equilibrium. Because the predicted profiles depend on model parameters, parameters can be calibrated such that the predicted steady state profiles best reproduce the measured data: we used chemical depth profiles to fit poorly known model parameters, thus obtaining a model calibrated to Saanich Inlet’s OMZ (Appendix F.3.8). This calibrated model was then used to make predictions about steady state DNA profiles, which were compared to measured metagenomic profiles (sections F.2.3 and F.3.9). This comparison, described in the main text, serves as a test of the model’s ability to explain metagenomic profiles in Saanich Inlet’s OMZ. Reaction rates at each depth are automatically calculated using the kinetics described in Appendix F.3.4.

F.3.2 Considered pathways

The model considers key dissimilatory redox pathways involved in nitrogen and sulfur cycling. When comparing model predictions to molecular data, each pathway was represented by a single gene. For example, nitrous oxide reduction (nosZ gene) coupled to hydrogen sulfide oxidation (dsr, apr and sat genes) is formally represented by nosZ. 
Other pathwaysconsidered by the model were aerobic ammonium oxidation (amo), aerobic nitrite oxidation(nxr), partial denitrification of nitrate to nitrous oxide (PDNO) coupled to sulfide oxida-tion, anammox (hzo) and remineralization of organic matter via aerobic respiration (ROM).PDNO comprises 3 denitrification steps: Reduction of nitrate to nitrite (narGHIJ or na-pAB genes), reduction of nitrite to nitric oxide (nirKS genes) and reduction of nitric oxideto nitrous oxide (norBC genes), all three of which are suspected to be predominantly per-formed by SUP05 γ-proteobacteria (181, 504). The first denitrification step was assumed tobe leaky, so that a small fraction of nitrite was released into the extracellular environment(253). PDNO was represented by norBC genes when comparing the model to moleculardata (Fig. 7.3A in the main text, but see Figs. F.1D,E,F for narGHIJ, napAB and nirKSmultimolecular data). Aerobic ammonium oxidation included a weak production of nitrousoxide (nitrifier denitrification (407)), although the inclusion of this process did not notice-ably affect model predictions because most of the nitrous oxide was produced by PDNO.Aerobic respiration of organic matter included the release of ammonium and sulfate at ra-tios adjusted to measured C:N:S ratios for marine bacterial biomass (115). The choice ofpathways follows the hypotheses made by Hawley et al. (181) based on metagenomic andmetaproteomic depth profiles, as well as reports of nitrous oxide reduction in Saanich Inlet’sOMZ (74). Hydrogen sulfide is assumed to originate from the sediments via diffusion, wherehigh rates of sulfate reduction have been observed (4, 101) (Appendix F.4.1 for a discussionof this assumption). Figure 7.1A in the main text gives an overview of the described reaction282Chapter 7: Supplemental materialnetwork. 
The detailed reaction stoichiometry is given in section F.3.3.

F.3.3 Pathway stoichiometry

We list the stoichiometry of the dissimilatory redox pathways considered by the model:

• Remineralization of organic matter through aerobic respiration:

$$\tfrac{1}{6}\,\mathrm{POM} + \mathrm{O_2} \xrightarrow{\mathrm{ROM}} \mathrm{CO_2} + \mathrm{H_2O} + \nu\,\mathrm{NH_4^+} + \sigma\,\mathrm{SO_4^{2-}}, \qquad \text{(F.3.3)}$$

where POM corresponds to

$$(\mathrm{C_6H_{12}O_6})(\mathrm{NH_4^+})_{6\nu}(\mathrm{SO_4^{2-}})_{6\sigma} \qquad \text{(F.3.4)}$$

and

$$1 : \nu : \sigma = 1 : 0.184 : 0.0113 \qquad \text{(F.3.5)}$$

correspond to typical molar C : N : S ratios in marine bacterial biomass (115).

• Aerobic ammonium oxidation:

$$\mathrm{NH_4^+} + \tfrac{1}{2}(3 - L_{amo}) \times \mathrm{O_2} \xrightarrow{\mathrm{amo}} (1 + L_{amo}/2) \times \mathrm{H_2O} + (1 - L_{amo}) \times \mathrm{NO_2^-} + (L_{amo}/2) \times \mathrm{N_2O} + (2 - L_{amo}) \times \mathrm{H^+}. \qquad \text{(F.3.6)}$$

Here $L_{amo}$ is a parameter representing the fraction of N released as N2O via nitrifier denitrification, compared to the total NH4+ consumed (407). For example, if $L_{amo} = 0$, then ammonium is completely oxidized and released as nitrite.

• Aerobic nitrite oxidation:

$$2\,\mathrm{NO_2^-} + \mathrm{O_2} \xrightarrow{\mathrm{nxr}} 2\,\mathrm{NO_3^-}. \qquad \text{(F.3.7)}$$

• Anaerobic ammonium oxidation (anammox):

$$\mathrm{NH_4^+} + \mathrm{NO_2^-} \xrightarrow{\mathrm{hzo}} \mathrm{N_2} + 2\,\mathrm{H_2O}. \qquad \text{(F.3.8)}$$

• Partial denitrification to nitrous oxide (PDNO) coupled to hydrogen sulfide oxidation:

$$(1 - L_{PDNO}/2) \times \mathrm{H_2S} + 2\,\mathrm{NO_3^-} \xrightarrow{\mathrm{PDNO}} (1 - L_{PDNO}) \times \mathrm{N_2O} + 2L_{PDNO} \times \mathrm{NO_2^-} + (1 - L_{PDNO}/2) \times \mathrm{SO_4^{2-}} + (1 - L_{PDNO}) \times \mathrm{H_2O} + L_{PDNO} \times \mathrm{H^+}. \qquad \text{(F.3.9)}$$

Here, $L_{PDNO}$ is a parameter representing the fraction of NO2− leaked to the extracellular medium during PDNO, compared to the total NO3− consumed (253).

• Nitrous oxide reduction coupled to hydrogen sulfide oxidation:

$$\mathrm{H_2S} + 4\,\mathrm{N_2O} \xrightarrow{\mathrm{nosZ}} 4\,\mathrm{N_2} + \mathrm{SO_4^{2-}} + 2\,\mathrm{H^+}. \qquad \text{(F.3.10)}$$

• Nitrate reduction to ammonium (DNRA, identified with the nirBD gene):

$$\mathrm{H_2S} + \mathrm{NO_3^-} + \mathrm{H_2O} \xrightarrow{\mathrm{DNRA}} \mathrm{NH_4^+} + \mathrm{SO_4^{2-}}. \qquad \text{(F.3.11)}$$

DNRA was eventually omitted from the model for reasons described in Appendix F.4.2.

F.3.4 Reaction kinetics

Respiration of organic matter involves the hydrolysis of particulate organic matter (POM) to dissolved organic matter (DOM), which is subsequently broken down to simpler organic molecules by fermenters that provide non-fermenting organotrophs with a reactive DOM pool. 
However, reactive DOM rarely accumulates and most of the DOM pool is expected to be refractory (350). Furthermore, POM degradation has been found to be strongly correlated to bacterial growth in subeuphotic zones, likely due to limiting POM hydrolysis rates (194). We thus modeled organic matter respiration rates as a first-order function of particulate organic carbon (POC) concentrations (323). More precisely, the gene-specific ROM reaction rate, $H_{ROM}$, is a function of metabolite concentrations C given by

$$H_{ROM}(C) = A_{ROM}F_T \times C_{POM}\,\frac{C_{O_2}}{C_{O_2} + K_{ROM,O_2}}, \qquad \text{(F.3.12)}$$

where $K_{ROM,O_2}$ is the oxygen half-saturation constant, $A_{ROM}$ is a first-order rate constant (“affinity”) and $F_T$ is the unitless thermodynamic potential factor given by Reed et al. (386) (equation S1).

Half-saturation constants reported for nitrous oxide reduction are typically on the order of 0.37−2.5 µM N2O (414, 519) and 40 µM H2S (212), which are well above the typical N2O and H2S concentrations in the Saanich Inlet OMZ (Fig. 7.2 in the main text). Sulfide-driven nitrous oxide reduction in Saanich Inlet is therefore likely limited by both electron donor and electron acceptor availability. We thus modeled nitrous oxide reduction using first order substrate kinetics with oxygen inhibition:

$$H_{nosZ}(C) = A_{nosZ}F_T \times C_{N_2O}\,C_{H_2S}\,\frac{K_{nosZ,O_2}}{C_{O_2} + K_{nosZ,O_2}}, \qquad \text{(F.3.13)}$$

where $K_{nosZ,O_2}$ is the oxygen half-inhibition constant and $A_{nosZ}$ is a first-order rate constant. All other gene-specific reaction rates ($H_r$) are modeled using Michaelis-Menten kinetics with possible inhibition (211, 386):

$$H_r(C) = V_rF_T \times \prod_{m\ \text{reactant of reaction}\ r}\frac{C_m}{K_{rm} + C_m} \times \prod_{n\ \text{inhibitor of reaction}\ r}\frac{K^\star_{rn}}{K^\star_{rn} + C_n}. \qquad \text{(F.3.14)}$$

Here, $V_r$ is the maximum gene-specific reaction rate and $K_{rm}$ and $K^\star_{rn}$ are half-saturation and half-inhibition constants, respectively, given in Appendix F.3.7. 
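
As a worked illustration of the Michaelis-Menten form in equation (F.3.14), the sketch below multiplies one saturation factor per reactant and one inhibition factor per inhibitor. The function name and argument layout are hypothetical, and the thermodynamic factor $F_T$ is simply passed in (defaulting to 1) rather than computed. The anammox constants are those listed in Table F.2; the concentrations are chosen artificially so that every factor equals 1/2.

```python
def gene_specific_rate(v_max, substrates, inhibitors=(), f_t=1.0):
    # Equation (F.3.14): V_r * F_T * prod(C/(K + C)) * prod(K*/(K* + C)).
    # substrates: iterable of (concentration, half-saturation constant)
    # inhibitors: iterable of (concentration, half-inhibition constant)
    rate = v_max * f_t
    for conc, k_half in substrates:
        rate *= conc / (k_half + conc)       # saturating dependence on reactants
    for conc, k_inh in inhibitors:
        rate *= k_inh / (k_inh + conc)       # rate drops as the inhibitor builds up
    return rate

# Anammox (hzo) constants from Table F.2; concentrations (uM) are hypothetical
# and set equal to the respective constants, so each factor is exactly 1/2.
rate = gene_specific_rate(
    v_max=20.0,                              # fmol/(cell d)
    substrates=[(3.0, 3.0), (0.45, 0.45)],   # NH4+ and NO2-
    inhibitors=[(0.2, 0.2)],                 # O2
)
print(rate)  # 2.5 fmol/(cell d): 20 * 1/2 * 1/2 * 1/2
```

At concentrations far below the half-saturation constants each saturation factor reduces to C/K, which recovers the first-order forms used for ROM and nosZ.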
The only explicitly modeled inhibition was oxygen inhibition for anammox (hzo), PDNO and nitrous oxide reduction (nosZ).

F.3.5 Gibbs free energy and gene growth

Following Roden and Jin (396) and Reed et al. (386), we set

$$Z_r = 2.08\,\gamma^e_r - 0.0211\,\Delta G_r, \qquad \text{(F.3.15)}$$

(in g biomass per mole reaction flux) where $\gamma^e_r$ is the negative stoichiometric coefficient of the electron donor in the reaction,

$$\Delta G_r = \Delta G^o_r + R_gT\ln Q_r \qquad \text{(F.3.16)}$$

is the Gibbs free energy of the reaction (in kJ per mol), $\Delta G^o_r$ is the standard Gibbs free energy of the reaction and

$$Q_r = \prod_m C_m^{S_{mr}} \qquad \text{(F.3.17)}$$

is the reaction quotient (96). Each $\Delta G^o_r$ depends on the local temperature and pressure and was calculated using the CHNOSZ R package (103).

F.3.6 Boundary conditions

Uniquely solving the partial differential equations (F.3.1) and (F.3.2) requires appropriate boundary conditions (BC) for all genes and metabolites at the top and bottom boundaries (100 m and 200 m, respectively). For all metabolites except N2, N2O, SO4^2− and O2, BCs were fixed values set to the average measurements from cruises 41 (SI041_01/13/10), 42 (SI042_02/10/10) and 43 (SI043_03/10/10). For N2 and N2O, lower BCs were set to Neumann (zero flux). For O2, we used Dirichlet BCs (fixed value) with values equal to the average measurements from cruises 42 and 44 (SI044_04/07/10), because O2 data were unavailable for cruises 41 and 43. For SO4^2− we used Dirichlet BCs set to 28 mM on both sides (323). Metabolite boundary conditions are summarized in Table F.1. 
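
The two kinds of metabolite boundary condition used here can be imposed on a discretized depth profile as sketched below. This is a generic first-order treatment written for illustration (the helper name and the interior values are hypothetical), not the thesis code; the N2O top value of 24.49 × 10^-3 µM and the zero-flux bottom condition are taken from Table F.1.

```python
import numpy as np

def apply_boundary_conditions(profile, top, bottom):
    # Return a copy of the profile with boundary values imposed.
    # A number means Dirichlet (fixed value); None means zero-flux Neumann,
    # approximated to first order by copying the neighbouring interior value.
    out = np.array(profile, dtype=float)
    out[0] = out[1] if top is None else top
    out[-1] = out[-2] if bottom is None else bottom
    return out

# N2O: fixed value at 100 m, zero flux at 200 m (Table F.1).
interior_guess = np.linspace(0.02, 0.0, 6)   # hypothetical profile (uM)
n2o = apply_boundary_conditions(interior_guess, top=24.49e-3, bottom=None)
print(n2o)
```

Copying the neighbouring value makes the discrete gradient across the last interface vanish, which is what a zero diffusive flux requires.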
These boundary conditions result in a net oxygen and nitrate influx from the top as well as an ammonium and sulfide influx from the sediments (4, 100, 340).

Gene boundary conditions were either set to fixed zero (hzo and norBC top BCs, ROM, amo and nxr bottom BCs) or to fixed relative gradients (ROM, amo, nxr, nirBD and nosZ top BCs, hzo, nirBD, norBC and nosZ bottom BCs), with the relative gradient inferred from the metagenomic profiles.

Table F.1: Top (100 m) and bottom (200 m) boundary conditions for metabolites in the gene-centric partial differential equation model. Numerical values denote Dirichlet boundary conditions. ‘N’ denotes zero-flux Neumann conditions.

| Metabolite | Top (µM) | Bottom (µM) |
|---|---|---|
| NH4+ | 0 | 8.67 |
| O2 | 77.23 | 0 |
| NO3− | 27.59 | 0 |
| NO2− | 0.045 | 0 |
| N2 | 4.8 × 10^-4 | N |
| SO4^2− | 28 × 10^3 | 28 × 10^3 |
| H2S | 0 | 14.07 |
| N2O | 24.49 × 10^-3 | N |

F.3.7 Model parameterization

Half-saturation and half-inhibition constants for all involved pathways are listed in Table F.2. Maximum cell-specific reaction rates were set to $V_{amo} = 1.23 \times 10^{-13}$ mol/(cell · d) (389), $V_{nxr} = 3.26 \times 10^{-13}$ mol/(cell · d) (389) and $V_{hzo} = 2 \times 10^{-14}$ mol/(cell · d) (251, 457). The nitrifier denitrification fraction $L_{amo}$ was set to $10^{-4}$, according to nitrifier denitrification fractions of marine ammonium oxidizing archaea measured by Santoro et al. (407, Fig 2) over varying NO2− concentrations, and the fact that in Saanich Inlet NO2− concentrations are typically below 2 µM (Fig. 7.2 in the main text). Because of a lack of reliable information, the rate constants $A_{ROM}$, $V_{PDNO}$ and $A_{nosZ}$, as well as the PDNO leakage fraction $L_{PDNO}$, were calibrated to chemical profiles as described in Appendix F.3.8 and in the main text. Calibration yielded $A_{ROM} = 5.11 \times 10^{-9}$ L/(cell · d), $V_{PDNO} = 2.18 \times 10^{-14}$ mol/(cell · d), $A_{nosZ} = 0.098$ L/(cell · d) and $L_{PDNO} = 0.352$. An overview of fixed and calibrated reaction-kinetic parameters is provided in Table F.2. 
The sensitivity of the model to parametervariation is illustrated in Appendix F.3.12.The dry cell mass was assumed to be c = 5 × 10−13 g, for consistency with the mass usedby Roden and Jin (396) to obtain the regression formula (F.3.15). Cell death rates wereset to qROM = 0.063 d−1 in accordance with turnover times estimated by Whitman et al.(518) for marine prokaryotic heterotrophs above 200 m; to qamo = 0.024 d−1 in accordancewith average values reported for ammonium oxidizing bacteria (143); to qnxr = 0.054 d−1corresponding to values estimated for nitrite oxidizers (175) and to 0.0033 d−1 for all othergenes, in accordance with turnover times estimated by Whitman et al. (518) for marineprokaryotes below 200 m.The concentration of H+ was fixed to 8.5 nM, corresponding to pH= 8.07 (384). The totaldissolved inorganic carbon (DIC) was fixed to 2141 µM, corresponding to a surface DIC of2180 µmol/kg (528) and a surface water density of 1018 kg/m3. Accordingly, the dissolvedCO2 concentration was fixed at 28 µM according to aquatic carbonate equilibrium at thegiven pH and DIC (541). The POC profile was calculated from data reported for February2011 by Luo et al. (290) and POM was set to (1/6)×POC (Fig. F.1C in the Appendix).Fixing the POM profile circumvents poorly understood physical processes contributing to287Chapter 7: Supplemental materialorganic matter fluxes in Saanich Inlet (290). CO2, H+ and POM concentrations, while fixed,were still included in the reaction quotients (Appendix F.3.5) as well as the reaction-kinetics(Appendix F.3.4).The diapycnal eddy diffusion coefficient K was set to N−2 · 3.7× 10−10 W · kg−1, where N isthe buoyancy frequency (120, 142). The latter was calculated using temperature and salinityprofiles from January 13, 2010, using the oce R package (228) (Fig. 
F.1 in the supplement) after loess-smoothing temperature at degree 2 and salinity at degree 1, with a span of 75%. We chose this time point because the two subsequent temperature and salinity measurements (February 10th and March 10th) were unreliable due to technical problems with our CTD.

The cell sinking speed $v$ was set to 0.1 m/day, in accordance with previous marine microbial ecological models (26, 121).

Table F.2: Reaction-kinetic parameters used in the gene-centric model, either calibrated or taken from the literature: half-saturation substrate concentrations (K), half-inhibition concentrations (K*), cell-specific maximum rates for 2nd order kinetics (V), 1st order kinetic constants (A, “affinities”), nitrifier denitrification fraction (L_amo) and PDNO leakage fraction (L_PDNO). The exact role of each parameter is explained in Appendix F.3.4. Additional (non-kinetic) fixed model parameters are provided in Appendix F.3.7. Clades with members that have been found active in the Saanich Inlet OMZ (181) are marked with a “†”.

Reaction  Parameter  Value   Units            Organism/region                     Source
ROM       K_O₂       0.121   µM               Escherichia coli                    (452)
          A          5.11    nL/(cell · d)                                        calibr.
amo       K_NH₄⁺     0.133   µM               Ca. Nitrosopumilus maritimus†       (302)
          K_O₂       3.91    µM               Ca. Nitrosopumilus maritimus†       (277)
          V          123     fmol/(cell · d)  Nitrosomonas spp.†                  (389)
          L_amo      10⁻⁴    –                marine ammonia oxidizing archaea†   (407)
nxr       K_NO₂⁻     11.7    µM               Nitrospira spp.†                    (37)
          K_O₂       0.78    µM               Chilean OMZ                         (47)
          V          326     fmol/(cell · d)  Nitrobacter sp.                     (389)
hzo       K_NH₄⁺     3       µM               Ca. Scalindua sp.†                  (23)
          K_NO₂⁻     0.45    µM               Ca. Scalindua sp.†                  (23)
          K*_O₂      0.2     µM               Peruvian OMZ                        (219, 386)
          V          20      fmol/(cell · d)  Planctomycetales†                   (251, 457)
PDNO      K_NO₃⁻     2.9     µM               marine anoxic basin                 (206)
          K_H₂S      2       µM               Saanich Inlet OMZ                   (40)
          K*_O₂      0.1     µM               Eastern South Pacific OMZ           (26, 472)
          V          21.8    fmol/(cell · d)                                      calibr.¹
          L_PDNO     35.2    %                                                    calibr.²
nosZ      K*_O₂      0.971   µM               low-oxygen activated sludge         (498)
          A          0.098   L/(cell · d)                                         calibr.

¹ Frey et al. 
(131) report cell-specific thiosulphate-driven denitrification rates for Sulfurimonas gotlandica in the range 24.2–74.3 fmol/(cell · d).
² Reported fractions of nitrite leakage during incomplete denitrification ($L_{\mathrm{PDNO}}$) range from 0% to 87% (8, 35, 351).

Figure F.1: (A) Temperature and salinity profiles at Saanich Inlet main station, January 13, 2010. (B) Corresponding smoothed eddy diffusivity profile, as used in the simulations. (C) Fixed POM profile used in the simulations.

F.3.8 Calibrating reaction-kinetic parameters to data

As described in the previous section, most model parameters were obtained from the literature; however, a subset of reaction-kinetic parameters ($A_{\mathrm{ROM}}$, $V_{\mathrm{PDNO}}$, $L_{\mathrm{PDNO}}$ and $A_{\mathrm{nosZ}}$; overview in Table F.2) had to be calibrated due to the lack of available information. Here we describe the statistical methods used to calibrate unknown reaction-kinetic model parameters to available chemical depth profile data. The steady state solution of the model defines a mapping from a given choice of parameter values (collectively written as a vector $\mathbf{p}$) to predicted depth profiles for metabolite concentrations, $C_1, C_2, \ldots$ We assumed that measured concentrations ($\tilde{C}_1, \tilde{C}_2, \ldots$) are normally distributed:

$$\tilde{C}_i = C_i + \sigma_i \cdot \varepsilon_i.$$ (F.3.18)

Here, $\varepsilon_i$ is a standard-normally distributed error and $\sigma_i$ is the (unknown) standard deviation of measurement errors (henceforth referred to as error scale). We allowed for a different $\sigma_i$ for each metabolite to account for variations in the magnitude of measurement errors.

In the context of our spatial model, the concentrations $C_i$ are predicted as functions of depth, $z$, i.e., $C_i = C_i(z; \mathbf{p})$. Calibration data is given as tuples $(z_{ij}, \tilde{C}_{ij})$, where each $\tilde{C}_{ij}$ is a measurement of the i-th concentration at some depth $z_{ij}$ and j enumerates all measurements of the i-th concentration. 
The overall log-likelihood function for such a data set is given by

$$l(\sigma, \mathbf{p}) = -\sum_{i,j} \ln\left(\sigma_i \sqrt{2\pi}\right) - \sum_{i,j} \frac{1}{2\sigma_i^2}\left[\tilde{C}_{ij} - C_i(z_{ij}; \mathbf{p})\right]^2.$$ (F.3.19)

The model was calibrated by maximizing the log-likelihood $l(\sigma, \mathbf{p})$ by choice of the error scales $\sigma_i$ and the parameter values $\mathbf{p}$. Maximization of the log-likelihood was performed using the MATLAB® function fmincon, which uses repeated simulations and gradual exploration of parameter space (309). The following chemical concentration data were used for calibration: NH₄⁺, NO₃⁻, NO₂⁻, N₂O and H₂S from cruises 41–43, and O₂ from cruises 42 and 44.

F.3.9 Calibrating multi-omic data units

Metagenomic, metatranscriptomic and metaproteomic data are given only in relative units. For example, the correspondence between metagenomic RPKM values and actual DNA concentrations in the water column is, a priori, unknown. In fact, RPKM values for different genes may correspond to different gene concentrations due to detection biases (33, 331, 369). Furthermore, model predictions regarding RNA and protein abundances are in arbitrary units because the transcriptional, translational and enzymatic efficiencies of proteins are unknown and differ between proteins.

In order to compare model predictions to multi-omic sequence data, we assumed that each measured DNA, mRNA and protein abundance profile is related to the corresponding model prediction by a constant linear conversion factor. Conversion factors were estimated via maximum-likelihood estimation, separately for each molecule to account for detection biases. More precisely, for each data set we assumed a normal error distribution as already described in Appendix F.3.8. Hence, measured environmental biomolecule concentrations, for example amo DNA concentrations, are distributed as

$$\tilde{\Gamma}_i = \Gamma_i/\beta_i + \sigma_i \cdot \varepsilon_i,$$ (F.3.20)

where $\varepsilon_i$ are uncorrelated standard-normally distributed errors, scaled by an unknown factor $\sigma_i$, and $\beta_i$ is the unknown proportionality factor between amo metagenomic RPKM values $\tilde{\Gamma}_i$ and actual DNA concentrations. 
The log-likelihood of a measured depth profile comprising $N_i$ data points, $(z_{i1}, \tilde{\Gamma}_{i1}), \ldots, (z_{iN_i}, \tilde{\Gamma}_{iN_i})$, is thus

$$l_i(\sigma_i, \mathbf{p}) = -\sum_{j=1}^{N_i} \ln\left(\sigma_i \sqrt{2\pi}\right) - \sum_{j=1}^{N_i} \frac{1}{2\sigma_i^2}\left[\tilde{\Gamma}_{ij} - \Gamma_i(z_{ij}; \mathbf{p})/\beta_i\right]^2.$$ (F.3.21)

For any fixed model parameter choice $\mathbf{p}$ (and therefore fixed predictions $\Gamma_i$), the log-likelihood $l_i(\sigma_i, \mathbf{p})$ is maximized by choosing

$$\beta_i = \sqrt[N_i]{\prod_{j=1}^{N_i} \frac{\Gamma_i(z_{ij}; \mathbf{p})}{\tilde{\Gamma}_{ij}}}$$ (F.3.22)

(i.e., the geometric mean of the ratios of model predictions to measurements) and

$$\sigma_i^2 = \frac{1}{N_i} \sum_{j=1}^{N_i} \left|\tilde{\Gamma}_{ij} - \Gamma_i(z_{ij}; \mathbf{p})/\beta_i\right|^2.$$ (F.3.23)

Choosing $\beta_i$ as in equation (F.3.22) yields maximum-likelihood estimates for the appropriate conversion factors between metagenomic units (RPKM) and actual DNA concentrations (genes/L) (see Table F.3 in the supplement). Inserting the estimated $\beta_i$ and $\sigma_i$ back into equation (F.3.21) yields the log-likelihood of the particular metagenomic data set for a particular choice of model parameters $\mathbf{p}$. A similar approach was used to compare metatranscriptomic and metaproteomic data sets to model predictions (Appendix F.3.10).

The estimated proportionality factors $\beta_i$ are listed in Table F.3 of the supplement, and range from $3.9 \times 10^{4}$ genes · L⁻¹ · RPKM⁻¹ for norBC up to $3.3 \times 10^{7}$ genes · L⁻¹ · RPKM⁻¹ for ROM. These differences may be due to variable DNA extraction efficiencies across cells, uneven community sampling due to filter-size partitioning (139) or differences in gene copy numbers per cell. Additionally, the assumption of a common cell mass for all modeled genes may have resulted in an inaccurate conversion of predicted biomass production to gene production. However, the good overall agreement between predicted functional gene concentrations and flow cytometry cell counts (Fig. 
7.3 in the main text) suggests that this may only be a minor problem.

Table F.3: Proportionality factors (β) between environmental gene abundances and metagenomic RPKM values (in genes · L⁻¹ · RPKM⁻¹), as defined in Appendix F.3.9. Estimated by comparing the predictions of the calibrated model with metagenomic data from February 10, 2010. Unambiguous metagenomic data was not available for nxr (see Appendix F.4.4).

Gene    β
ROM     4.1 × 10⁷
amo     1.0 × 10⁶
nxr     NA
hzo     3.2 × 10⁵
norBC   3.4 × 10⁴
nosZ    4.5 × 10⁴

F.3.10 Predicting metatranscriptomic and metaproteomic profiles

A priori, the gene-centric model makes no predictions regarding mRNA or protein dynamics; in fact, transcription and translation are circumvented by assuming that the release of energy manifests directly as DNA replication. To explore the possibility of explaining mRNA and protein distributions in Saanich Inlet’s OMZ, we extended the model by a set of hypothetical mechanisms driving the production, decay and dispersal of these molecules. More precisely, we assumed that the mRNA and protein production rate at a particular depth is proportional to the total reaction rate at that depth ($H_r \Gamma_r$), and that mRNA and proteins disperse similarly to genes (Appendix F.3.10). The assumption that mRNA and protein production rates are proportional to reaction rates is motivated by observations of a positive relation between transcription and translation rates and metabolic activity or growth (9, 154, 229). A linear relation, in particular, may be justified by the fact that increased enzyme dilution rates at elevated cell growth must be balanced (at the population level) by correspondingly increased translation (and hence transcription) rates (403).

This simple description introduces two unknown parameters per mRNA or protein: the proportionality factor that converts reaction rates to molecule production rates, and the decay time of molecules following production. 
We calibrated both parameters using metatranscriptomic and metaproteomic depth profiles and then checked how well the latter could be reproduced. Our methodology is described for mRNA in detail below. Protein dynamics were modeled and compared to metaproteomic data in a similar way.

As mentioned, our first assumption was that the mRNA production rate (transcripts produced per time and per volume of seawater) at a particular depth is proportional to the total reaction rate (mol per time and per volume of seawater) at that depth. We also assumed that mRNA molecules disperse via diffusion and sinking similarly to genes, as they are hosted by the same cells. Thus, environmental mRNA concentrations satisfy the partial differential equation

$$\partial_t T_r = -T_r/\tau_r + R_r/\alpha_r - v\,\partial_z T_r + \partial_z\left(K\,\partial_z T_r\right),$$ (F.3.24)

where $T_r(t, z)$ is the mRNA concentration corresponding to the r-th reaction, $\tau_r$ is the decay time of the mRNA molecule, $R_r(t, z) = H_r(t, z)\,\Gamma_r(t, z)$ is the total reaction rate at depth $z$ and $\alpha_r$ is an unknown proportionality constant. We considered $T_r$ in the same units as the multi-omic data (i.e., RPKM for metatranscriptomes and NSAF for metaproteomes). Consequently, $\alpha_r$ is the ratio between the r-th reaction rate and the corresponding RPKM (or NSAF) “production rate” (mol per time per volume of seawater per RPKM), and thus not only depends on the particular reaction, but also on our sampling protocol and sequencing pipeline.

For each gene r, the transcript profile $T_r$ will satisfy the same boundary conditions as the DNA profile $\Gamma_r$, provided that the latter are either zero value (Dirichlet), zero flux (Neumann) or fixed relative gradient boundary conditions (Appendix F.3.6). We calculated the steady state solution of equation (F.3.24), $T_r^*$, by solving the time-invariant equation

$$0 = -T_r^*/\tau_r + R_r/\alpha_r - v\,\partial_z T_r^* + \partial_z\left(K\,\partial_z T_r^*\right)$$ (F.3.25)

using the MATLAB function bvp4c (309). 
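The steady-state problem (F.3.25) can also be approximated with a basic finite-difference scheme. The following Python sketch is an illustration only and is not the bvp4c-based procedure used in this work: it assumes a uniform depth grid, zero-value (Dirichlet) conditions at both boundaries, central differences, and a tridiagonal (Thomas) solve. The function name `solve_steady_profile` and its argument names are hypothetical.

```python
def solve_steady_profile(z, K, R, tau, alpha, v):
    # Approximates 0 = -T/tau + R/alpha - v*dT/dz + d/dz(K*dT/dz)
    # on a uniform grid z (assumption), with T = 0 at both ends (assumption).
    # K and R are lists of diffusivities and reaction rates at the grid nodes.
    n = len(z)
    h = z[1] - z[0]
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, n - 1):
        k_minus = 0.5 * (K[i - 1] + K[i])  # diffusivity at node i - 1/2
        k_plus = 0.5 * (K[i] + K[i + 1])   # diffusivity at node i + 1/2
        lower.append(k_minus / h**2 + v / (2.0 * h))
        diag.append(-(k_minus + k_plus) / h**2 - 1.0 / tau)
        upper.append(k_plus / h**2 - v / (2.0 * h))
        rhs.append(-R[i] / alpha)
    # Thomas algorithm: forward elimination ...
    m = len(diag)
    for i in range(1, m):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # ... followed by back substitution.
    T = [0.0] * m
    T[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        T[i] = (rhs[i] - upper[i] * T[i + 1]) / diag[i]
    # Re-attach the zero boundary values.
    return [0.0] + T + [0.0]
```

Because the discretized system is linear in the source term, the returned profile scales with 1/alpha for fixed tau, which is what allows the proportionality constant to be calibrated separately from the decay time.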
The steady-state mRNA profile was computed after the gene-centric model had already reached steady state, at which point the reaction rates $R_r$ are time-independent functions of depth.

Note that the steady state profile $T_r^*(z)$ is proportional to $1/\alpha_r$, all else being equal. Hence, by comparing $T_r^*$ to metatranscriptomic data (for some given $\tau_r$), the constant $\alpha_r$ can be calibrated via maximum-likelihood estimation as described in Appendix F.3.9. Furthermore, maximizing the log-likelihood in equation (F.3.21) (separately for each gene) by choice of $\alpha_r$, $\tau_r$ and the corresponding error scale yields an estimate of the decay time $\tau_r$. This was done through repeated solutions of equation (F.3.25) with varying $\tau_r$ and using the interior-point optimization algorithm implemented by the MATLAB function fmincon (309). We confined the fitted $\tau_r$ to between $10^{-4}$ and $10^{5}$ days.

After calibration of the decay time $\tau_r$ and proportionality factor $\alpha_r$, we calculated the coefficients of determination,

$$R_r^2 = 1 - \frac{\sum_j \left[\tilde{T}_{rj} - T_r(z_{rj})\right]^2}{\sum_j \left[\tilde{T}_{rj} - \overline{T}_r\right]^2},$$ (F.3.26)

to evaluate how well the mRNA model explained the metatranscriptomic data. Here, $\tilde{T}_{r1}, \tilde{T}_{r2}, \ldots$ are measured mRNA abundances at depths $z_{r1}, z_{r2}, \ldots$ and $\overline{T}_r$ is their average. For any given gene r, $R_r^2$ is a measure for the goodness of fit of the above model to the multi-omic data. Table F.4 in the supplement lists the results for all genes for which $R_r^2 \geq 0.5$.

The statistical significance (P-value) of the obtained $R^2$ was defined as the probability of obtaining the same or greater $R^2$ by applying the same procedure to a random data set, with independent normally distributed values with mean and standard deviation set to the original sample mean and standard deviation. 
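The coefficient of determination (F.3.26) and the significance test defined above can be sketched in a few lines of Python. This is a hedged illustration rather than the code used in this work: `fit_and_score` is a hypothetical callable standing in for the full procedure (calibrating the decay time and proportionality factor to a data set and returning the resulting R²), and the use of the sample standard deviation with an n − 1 denominator is an assumption not specified in the text.

```python
import random
import statistics


def r_squared(measured, predicted):
    # Coefficient of determination: 1 - SS_residual / SS_total.
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def monte_carlo_p_value(measured, fit_and_score, n_sets=1000, seed=0):
    # Fraction of random data sets whose refitted R^2 reaches or exceeds
    # the R^2 obtained for the actual measurements.
    rng = random.Random(seed)
    mu = statistics.mean(measured)
    sd = statistics.stdev(measured)  # n - 1 denominator (assumption)
    r2_observed = fit_and_score(measured)
    hits = 0
    for _ in range(n_sets):
        fake = [rng.gauss(mu, sd) for _ in measured]
        if fit_and_score(fake) >= r2_observed:
            hits += 1
    return hits / n_sets
```

With 1000 random data sets, the smallest nonzero P-value this estimator can return is 0.001, so the resolution of the estimate is set by `n_sets`.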
We estimated the P-values for cases where $R_r^2 \geq 0.9$ using Monte Carlo simulations of 1000 random data sets: all of them were estimated below 0.005.

Table F.4: Proportionality factors (α) between mRNA or protein abundances and reaction rates (in mol/(L · d · RPKM) or mol/(L · d · NSAF), respectively), exponential mRNA or protein decay times (τ) and coefficients of determination (R²), estimated as described in Appendix F.3.10. Only cases with R² ≥ 0.5 are shown.

Molecule        α              τ (days)   R²
nxr mRNA        9.7 × 10⁻¹⁰    52         0.93
nosZ mRNA       2.5 × 10⁻⁸     222        0.95
ROM protein     2.7 × 10⁻³     67         0.59
amo protein     3.0 × 10⁻⁵     42         0.92
nxr protein     2.1 × 10⁻⁴     510        0.65
norBC protein   4.6 × 10⁻⁴     145        0.91

[Figure F.2: flowchart, image not reproduced. Panel labels: Metabolic network; Gene-centric model (gene population growth model; metabolic fluxes determined by reaction rates and diffusion; cell-specific reaction rates = Michaelis-Menten kinetics; biomass yield per reaction flux = proportional to Gibbs free energy); Measured chemical depth profiles (O₂, N₂O, NO₂⁻, NO₃⁻, NH₄⁺, H₂S); calibration of unknown reaction-kinetic parameters for gene-centric model; Calibrated gene-centric model (predicts reaction rates and DNA distributions); Metatranscriptomic and metaproteomic data; estimation of unknown mRNA and protein parameters; Postulated mRNA and protein dynamics; Evaluation of postulated mRNA and protein dynamics. The flowchart embeds thumbnails of the main-text figures on measured and predicted geochemical profiles and on molecular and rate profiles.]

Figure F.2: Overview of the modeling approach for the Saanich Inlet OMZ. Previous geochemical and multi-omic investigations provide conceptual information on the metabolic network in the Saanich Inlet OMZ (11, 181, 218, 503, 504, 540). This information was used to construct a gene-centric biogeochemical mathematical model, which describes the population dynamics of individual genes and metabolic process rates. Unknown reaction-kinetic parameters of the model were calibrated using geochemical depth profiles. The predictions of the calibrated gene-centric model were then validated using independent metagenomic sequence data, qPCR-based cell count estimates for SUP05 as well as process rate measurements. A subsequent extension of the model describes the production, dispersal and decay of mRNA and protein molecules based on the reaction rates predicted by the calibrated gene-centric model. Unknown parameters for the mRNA and protein dynamics are estimated using metatranscriptomic and metaproteomic data. 
The “goodness of fit” to these multi-omic data is used to further evaluate the gene-centric model, to assess the adequacy of the postulated mRNA and protein dynamics and to gain insight into potentially important but omitted mechanisms of mRNA and protein regulation at ecosystem scales.

F.3.11 Calculating metabolic fluxes between pathways

Dissimilatory metabolic reactions can be interpreted as sources and sinks of metabolites distributed along the water column, producing and consuming metabolites at rates given by the first term in equation (F.3.2). Due to diffusive transport (2nd term in equation (F.3.2)), metabolite fluxes from sources to sinks need not be localized and can span across different depths. Furthermore, some metabolites are partly transported across the OMZ boundaries, towards or from the top layers or the sediments. In the following we describe our approach for calculating steady-state metabolite fluxes across individual reactions.

Let us focus on a particular metabolite and consider a single hypothetical particle created at time 0 at depth $x$. Let $G(t, x, y)$ be the Green’s function of the dispersal-destruction model, so that $G(t, x, \cdot)$ is the distribution density of a particle (created at depth $x$) at depth $y$ and after time $t$. Note that $G(t, x, \cdot)$ may integrate to less than unity if the particle has a positive probability of being consumed anywhere in the water column. The probability rate at which that particle is consumed by any sink j at time $t$ is then

$$\int dy\; G(t, x, y)\,\frac{\lambda_j(y)}{C(y)},$$ (F.3.27)

where $\lambda_j(y)$ gives the rate at which sink j consumes particles at depth $y$ and $C(y)$ is the steady state metabolite concentration at that depth. Since each sink corresponds to a pathway consuming the metabolite, $\lambda_j(y)$ is given by the community-wide reaction rate at $y$ multiplied by the appropriate stoichiometric coefficient. The probability that the particle will eventually be destroyed by sink j is given by

$$\int_0^\infty dt \int dy\; G(t, x, y)\,\frac{\lambda_j(y)}{C(y)}.$$ 
(F.3.28)

The total rate at which particles created by source i are destroyed by sink j across the entire OMZ, denoted $F_{ij}$, is

$$F_{ij} = \int dx\; b_i(x) \int_0^\infty dt \int dy\; G(t, x, y)\,\frac{\lambda_j(y)}{C(y)},$$ (F.3.29)

where $b_i(x)$ is the rate at which the metabolite is produced by source i at depth $x$. Switching integrals in (F.3.29) gives

$$F_{ij} = \int dy\; \frac{\lambda_j(y)}{C(y)} \int_0^\infty dt\; \vartheta_i(t, y),$$ (F.3.30)

where

$$\vartheta_i(t, y) = \int dx\; b_i(x)\, G(t, x, y)$$ (F.3.31)

is the solution to the dispersal-destruction model with initial distribution $b_i(x)$:

$$\partial_t \vartheta_i(t, y) = -\frac{\vartheta_i(t, y)}{C(y)} \sum_j \lambda_j(y) + \partial_y\left[K(y)\,\partial_y \vartheta_i(t, y)\right], \qquad \vartheta_i(0, y) = b_i(y).$$ (F.3.32)

Particles crossing the domain boundary are considered to be lost. Hence, Dirichlet (Neumann) boundary conditions in the original model correspond to zero-value (zero-flux) boundary conditions for $\vartheta_i$. The total boundary loss rate of particles created by source i is the remainder

$$F_{i,o} = \int dx\; b_i(x) - \sum_j F_{ij}.$$ (F.3.33)

Similarly, the rate at which particles flow in at the boundary and are destroyed by sink j is given by

$$F_{o,j} = \int dx\; \lambda_j(x) - \sum_i F_{ij}.$$ (F.3.34)

We solved equation (F.3.32) using the MATLAB® function pdepe and evaluated all integrals in equations (F.3.30), (F.3.33) and (F.3.34) using the trapezoid integration scheme (309).

F.3.12 Local sensitivity analysis

We evaluated the sensitivity of the model predictions to small changes in model parameters using normalized local sensitivity coefficients (NLSC) (71). NLSCs compare the relative changes in model output variables ($V_j$, integrated over all depths) to the relative changes of model parameters ($p_i$) by means of partial derivatives, evaluated at the default (e.g., fitted) parameter values:

$$\mathrm{NLSC}_{ij} = \left|\frac{p_i}{V_j}\,\frac{\partial V_j}{\partial p_i}\right|.$$ (F.3.35)

Hence, $\mathrm{NLSC}_{ij}$ is a measure for the relative effects that parameter i has on the output variable j. The partial derivative in equation (F.3.35) was approximated numerically by changing $p_i$ by 1% from its default value. 
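As a minimal illustration of equation (F.3.35), the following Python sketch approximates a single coefficient with a forward difference after a 1% parameter increase. The one-sided difference, the function name `nlsc` and the `model` callable are assumptions made for the example; `model` stands in for a full simulation that returns one depth-integrated output variable.

```python
def nlsc(model, params, i, rel_step=0.01):
    # Normalized local sensitivity of model(params) to parameter i,
    # using a forward difference with a relative step (assumption).
    base = model(params)
    perturbed = list(params)
    perturbed[i] = params[i] * (1.0 + rel_step)
    derivative = (model(perturbed) - base) / (perturbed[i] - params[i])
    return abs(params[i] / base * derivative)
```

For an output that depends on a parameter as a power law, the coefficient is close to the exponent; for example, an output proportional to the square of a parameter gives a value near 2.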
The sensitivity results are summarized in Figure F.3 in the supplement.

The sensitivity of the model varied strongly among parameters. For example, the kinetic constants for ROM (aerobic remineralization of organic matter) had a relatively strong effect on chemical as well as gene concentration profiles by modulating the availability of oxygen and ammonium near and above the SNTZ. On the other hand, the kinetic constants for PDNO and nosZ (which constitute the denitrification pathway) had relatively little effect on the predicted chemical profiles, as long as both were increased or decreased in unison. Similar observations were made for amo and nxr, which constitute the nitrification pathway. Moreover, the total predicted gene concentrations (Fig. 7.3 in the main text) were robust against parameter changes and only varied within an order of magnitude as long as the calibrated geochemical profiles matched the data moderately well. This suggests that geochemical fluxes are good predictors for microbial growth, but less suited for estimating reaction-kinetic parameters, especially when these are correlated (242).

Figure F.3: Local sensitivity heatmap of the calibrated model by means of normalized local sensitivity coefficients. A brighter color corresponds to a higher sensitivity. “Khalf” stands for half-saturation constants, “Kinh” for half-inhibition constants, “V” for maximum cell-specific reaction rates and “q” for cell death rates. The heatmap is hierarchically clustered using UPGMA linkage and Euclidean metric. Methodological details are given in Appendix F.3.12.

F.4 Caveats and special notes

F.4.1 The role of sulfate reduction

The choice of pathways included in the model was based on metaproteomics data by Hawley et al. (181). 
None of the proteins associated with sulfur metabolism were mapped to known sulfate reducers, suggesting that these proteins may act in sulfur oxidation and that sulfate reduction only played a minor role in Saanich Inlet’s OMZ at the time of sampling. In particular, an NCBI BLASTP search mapped all detected dsrA and aprAB proteins to SUP05 (301). All other taxonomically resolved sulfite reductase proteins were mapped to Candidatus Ruthia magnifica, a sulfur-oxidizing endosymbiont (400). The mRNA depth profiles of sat, aprAB and dsrAB (Supplemental Figs. F.1A,B,C), which comprise the dissimilatory sulfide oxidation pathway (or sulfate reduction pathway when reversed), show a clear peak at the SNTZ, consistent with the metatranscriptomic profiles of the denitrification genes norBC and nosZ (Fig. 7.3 in the main text). These multimolecular data suggest that the sat, aprAB and dsrAB enzymes act predominantly in sulfur oxidation. The high sat, aprAB and dsrAB DNA concentrations at the bottom might be due to sediment resuspension, cell sinking from the more productive SNTZ or cell diffusion from the sulfate reducing sediments (290, 347).

Due to the much higher organic matter concentrations in the sediments, heterotrophic sulfate reduction and anaerobic remineralization are correspondingly higher in the sediments than in the water column (4, 101). Hence, most of the H₂S and NH₄⁺ in the sulfidic part of the OMZ is expected to originate from the adjacent sediments via diffusion. An influx of H₂S and NH₄⁺ predominantly from the sediments is compatible with the measured steep H₂S and NH₄⁺ gradients (Figs. 7.2B,F in the main text), as well as the gradual upward progression of the H₂S and NH₄⁺ fronts following annual renewal (Figs. 7.1B and 7.2B,F in the main text). 
Sediments have previously been indicated as the main sulfide sources in other OMZs, such as the Eastern Boundary upwelling system (415) or the central Namibian coastal upwelling zone (50).

Due to the lack of rate measurements, heterotrophic sulfate reduction and cryptic sulfur cycling cannot be completely ruled out. However, calibrating the above model to the chemical data (Fig. 7.2 in the main text), while including sulfate reduction as an additional pathway, dramatically decreases the goodness of fit. This is because an additional H₂S source in the OMZ shifts the SNTZ further up, thereby increasing the main discrepancy between the model and the data. Hence, on grounds of parsimony, we eventually omitted sulfate reduction from the model and assumed that H₂S originates from the sediments via diffusion.

Figure F.1: Metagenomic, metatranscriptomic and metaproteomic depth profiles of (A) sulfate adenylyltransferase (sat), (B) adenylylsulfate reductase (aprAB) and (C) sulfite reductase (dsrAB) genes, which together comprise the sulfide oxidation pathway (or sulfate reduction pathway, if reversed), as well as (D) periplasmic nitrate reductase napAB, (E) nitrate reductase narGHIJ and (F) NO-forming nitrite reductase nirKS. Data taken on February 10, 2010. All of the dsrAB, aprAB and most of the napAB protein sequences were mapped to the γ-proteobacterial SUP05 clade (504). All detected narGHIJ protein sequences were either mapped to SUP05 or to the anammox planctomycete bacteria Candidatus Scalindua profunda and KSU-1 (493) (only SUP05 proteins are shown). Similarly, only non-planctomycete-annotated narGHIJ and nirKS DNA abundances are shown.

We note that similar theoretical work by Reed et al. (386) did suggest the existence of a cryptic sulfur cycle in the Arabian Sea OMZ. 
However, the latter is located more than 1 km above the sediments and hydrogen sulfide influx from the sediments into the OMZ is not possible due to elevated oxygen levels below the OMZ (489).

F.4.2 The role of DNRA

It has been previously hypothesized that dissimilatory nitrate reduction to ammonium (DNRA) might be active in Saanich Inlet’s OMZ, possibly providing ammonium to anammox bacteria (181, 253, 374). So far, DNRA has not been detected in any of our incubation experiments, although we cannot rule out cryptic DNRA due to rapid ammonium consumption by anammox (374). Measured ammonium profiles in Spring 2010 did not indicate a significant ammonium source at or below the SNTZ (Fig. 7.2B in the main text). Similarly, Schunck et al. (415) report negligible DNRA for a sulfidic OMZ off the coast of Peru.

Nevertheless, we tested an extension of our model with DNRA as an additional pathway. Calibrating the model to the same data (January–March 2010) consistently predicted negligible DNRA rates, and the goodness of fit (in terms of the log-likelihood) did not significantly improve with the inclusion of DNRA. On grounds of parsimony we thus eventually omitted DNRA from the model. We mention that calibrating the model to chemical data from September 2009 (181) indicated significant DNRA as well as anammox rates (both on the order of 1 mmol N/(m² · d)), suggesting that DNRA-fed anammox activity fluctuates strongly throughout the year. High spatiotemporal variability of N-loss activities is known for other OMZs and may be associated with fluctuations in surface primary production, as well as fluctuations in electron acceptor availability driven by annual deep water renewal (12, 205, 252).

F.4.3 The role of aerobic sulfide oxidation

Extensive previous work points towards NO₃⁻ and other nitrogen compounds as dominant electron acceptors for H₂S oxidation in Saanich Inlet during periods of strong stratification (11, 181, 218, 503, 540). 
For example, as shown in Fig. 7.1B in the main text, the upper boundary of H₂S concentrations closely follows the lower boundary of NO₃⁻, rather than O₂, over time, especially during the period considered in this study (early 2010). The strong similarity between sulfur cycling gene profiles and denitrification gene profiles (February 10, 2010; Fig. F.1) provides further evidence for the tight coupling between denitrification and sulfide oxidation at that time. Similarly, nitrogen compounds have been shown to be the dominant electron acceptors for sulfide oxidation in the Peruvian OMZ (415), and Canfield et al. (55) established a strong link between sulfide oxidation and nitrate reduction in the Chilean OMZ. Note that during renewal events in Fall, O₂ can indeed become an important electron acceptor for H₂S oxidation in Saanich Inlet (540). This does not, however, affect this study, which focuses on periods of intense stratification near steady state conditions.

We note that we had initially considered aerobic sulfide oxidation as an additional reaction in our model. Preliminary calibrations to geochemical data showed that the model’s explanatory power was significantly compromised by this reaction, because diffusive O₂ fluxes into the sulfidic zone could not account for the O₂ needed for sulfide oxidation (in addition to O₂ needed for nitrification). In fact, in our simulations ammonium ended up competing with H₂S for O₂, which in turn negatively affected the accuracy of the predicted NO₃⁻ profile. While lateral intrusions of oxygenated water could in principle account for the additional O₂ needed for sulfide oxidation, spatiotemporal O₂ profiles do not provide any indication of such intrusions during this period of intense stagnation (Fig. 7.1B in the main text). 
We thus omitted aerobic sulfide oxidation from our final model.

F.4.4 Planctomycetes and nxr

Our molecular data suggest that the anammox bacteria planctomycetes (90) are also aerobically oxidizing nitrite to nitrate in the oxycline (181) using the nitrate oxidoreductase narGHIJ (458). Metatranscriptomic and metaproteomic profiles of planctomycete-associated narGHIJ sequences peak at about 120 m depth and decrease rapidly below that (Fig. 7.3 in the main text), while planctomycete-associated HAO (anammox-associated hydroxylamine-oxidoreductase (458)) sequences are most abundant at 150 m depth and at appreciable levels all the way down to 200 m. As a consequence, narGHIJ is expected to also proliferate in regions where it is not actually being transcribed. Indeed, metagenomic data show a bimodal profile of planctomycete-associated narGHIJ sequences, with local maxima at 120 m and 150 m depths, corresponding to the putative maxima of nitrite oxidation and anammox activity. Due to this bimodality we included neither narGHIJ nor nxr metagenomic profiles in our analysis.

F.5 Inverse linear transport modeling (ILTM)

Chemical concentration profiles were used to estimate denitrification and anammox rates across the water column, independently of the gene-centric model and the rate measurements described in Appendix F.2.5. In short, a steady state diffusion model was used to estimate the net metabolite production (or consumption) rates that “best” explained the observed depth profiles. This so-called inverse linear transport modeling (ILTM) approach is widespread in oceanography and atmospheric sciences, where known global distributions of compounds such as trace gases are used to estimate unknown sources and sinks (303, 446).

In the following, we explain our procedure for estimating the net production profile, $\rho(z)$, for a particular metabolite with a given concentration profile, $\hat{C}(z)$. All calculations were performed in MathWorks MATLAB®. 
Each profile Ĉ(z) was obtained through Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation of the actual measured concentrations. ILTM was applied separately to concentration profiles from cruises 47 and 48, as well as to the chemical profiles used for model calibration (cruises 41–44, Appendix F.3.8) after averaging across replicates at each depth.

Our starting point is the diffusive transport model

    0 = ρ + ∂/∂z [K(z) · ∂C/∂z],    (F.5.1)

which describes the steady-state distribution C(z) across depth z, given a particular net production profile ρ(z) and eddy diffusion coefficient K(z). The eddy diffusion coefficient was calculated as described in Appendix F.3.7. Our goal is to determine the appropriate ρ(z) that “best” explains the observed steady state profile Ĉ(z), through the following steps:

1. Calculate the discretized Green’s function (394) of the above partial differential equation (PDE) with zero Dirichlet boundary conditions: Let G_nm be an approximation for G(z_n, z_m), where G solves the time-independent PDE

    0 = ∂/∂x [K(x) · ∂G(x, y)/∂x] + δ(x − y)    (F.5.2)

on the domain Ω := [top, bottom], with boundary conditions

    G(x, y) = 0 for x ∈ ∂Ω.    (F.5.3)

In practice, G_nm can be set to dz_m · G(z_n, z_m), where G is the solution to the PDE system

    0 = ∂/∂x [K(x) · ∂G(x, z_m)/∂x] + H(x − z_m + dz_m/2) · H(z_m + dz_m/2 − x)/dz_m,
    G(x, z_m) = 0 for x ∈ ∂Ω.    (F.5.4)

Here, H is the Heaviside step function and dz_m is the grid’s step at z_m, assumed to be chosen small enough (dz = 2 m in our case).

2. Note that for any candidate net production profile ρ(x), the sum

    Σ_m G_nm · ρ(z_m)    (F.5.5)

becomes an approximation for C^o(z_n), where C^o is a solution to the following steady-state transport problem with zero Dirichlet boundary conditions:

    0 = ∂/∂x [K(x) · ∂C^o/∂x] + ρ(x),    C^o(x) = 0 for x ∈ ∂Ω.    (F.5.6)

3. For the given measured concentrations Ĉ(x) at the domain boundary x ∈ {top, bottom}, calculate the particular solution C^p to the transport problem with given boundary values but no sources:

    0 = ∂/∂x [K(x) · ∂C^p/∂x],    C^p(x) = Ĉ(x) for x ∈ ∂Ω.    (F.5.7)

After solving for C^p, evaluate C^p on the grid, i.e., set C^p_n = C^p(z_n).

4. Note that for any candidate net production profile ρ(x), the sum C := C^o + C^p is a solution to the full PDE problem

    0 = ∂/∂x [K(x) · ∂C(x)/∂x] + ρ(x),    C(x) = Ĉ(x) for x ∈ ∂Ω.    (F.5.8)

Similarly, the sum

    C^p_n + Σ_m G_nm · ρ(z_m)    (F.5.9)

is an approximation for C(z_n).

5. Note that C^p corresponds to the hypothetical steady-state profile that would result purely from transport across the domain boundary, in the absence of any sources or sinks in its interior. Similarly, the difference B = Ĉ − C^p is the part that cannot be explained by transport across boundaries, but must rather be attributed to production and consumption inside Ω. Hence, using the particular discretized solution C^p_n, the discretized profile Ĉ_n = Ĉ(z_n) and the discretized steady-state transport kernel G_nm, one could in principle estimate ρ_m = ρ(z_m) by minimizing the sum of squared residuals (SSR)

    SSR = Σ_n | Σ_m G_nm · ρ_m − B_n |²,    (F.5.10)

where B_n = Ĉ_n − C^p_n. The above problem is a classical linear least-squares problem if one considers G_nm as a matrix (G) and ρ_m, Ĉ_n, C^p_n as vectors (ρ ∈ R^M, Ĉ ∈ R^N and C^p ∈ R^N):

    SSR = ‖G · ρ − B‖².    (F.5.11)

The minimum SSR is then obtained for

    ρ = G̃ · (Ĉ − C^p),    (F.5.12)

where G̃ is the Moore-Penrose pseudoinverse of G. Put simply, the ρ estimated in this way is the net production profile that “best” explains the observed steady-state concentration profile Ĉ, after subtracting the part C^p explained by transport across the domain boundaries.

6. The least-squares estimator in Eq. (F.5.12) becomes unstable if the reference profile Ĉ stretches linearly (or almost linearly) across large depth intervals, leading to spurious oscillations in the estimated profile ρ. 
To address this problem, we “penalized” strong oscillations in the estimated net production profile by instead minimizing the modified SSR

    SSR* = ‖G · ρ − B‖² + M^−2 ‖ξρ‖²,    (F.5.13)

where ξ is an appropriately chosen regularization parameter (36) that quantifies the penalty imposed on large |ρ|. The above regularization method is known as Tikhonov regularization. A larger Tikhonov factor ξ will typically result in a smoother ρ but also a poorer overall fit, since goodness of fit is sacrificed in favor of small ρ. We manually chose ξ as large as possible but still small enough such that the residual ‖G · ρ − B‖ remained much smaller than ‖B‖.

7. Assuming that H₂S is mostly consumed by denitrification (PDNO and nosZ) according to the stoichiometry given in Appendix F.3.3, one mol of consumed H₂S corresponds to 8 · (1 − L_PDNO)/(5 − 3 L_PDNO) mol N released as N₂. Similarly, one mol of NH₄⁺ consumed by anammox corresponds to 2 mol N released as N₂; however, nitrification likely also contributes to NH₄⁺ consumption in the more oxygenated layers. Hence, whenever the net NO₃⁻ production was positive, the net NO₃⁻ production rate was subtracted from the net NH₄⁺ consumption rate, yielding an estimate for NH₄⁺ consumption purely by anammox.

Appendix G
Chapter 8: Supplemental material

G.1 Methods

G.1.1 Details on example 1 (batch-fed incubator)

Parameterization

In the model, temperature was held constant at 20 °C and pH was held constant at 5, in accordance with the original incubation experiment (94). Ammonia and ammonium were assumed to be at dissociation equilibrium, determined by the pH and the standard ammonium dissociation constant 5.69 × 10^−10 M (73). The dissociation constant was corrected for the lower temperature in the experiment using the Van ’t Hoff equation (20).

The initial ure capacity M^o_ure was estimated from the derivative of the urea time series, assuming that the initial ure kinetics were saturated by high substrate concentration. 
Time series derivatives were estimated via 4th order Savitzky-Golay smoothing with a sliding window span of 10 days (222), followed by centered finite differences. The initial urea, NH₄⁺ and NO₃⁻ concentrations were set to 1.12 mM, 124 µM and 49.8 µM, respectively, according to the first sampling point in the measured time series. The initial NO₂⁻ concentration was assumed to be zero. The parameters K_ure, K_nxr, A_amo and A_nxr were taken from existing literature on Nitrosospira and Nitrobacter (Table G.1).

The remaining free parameters K_amo, M^o_nxr, A_ure, A_amo,ure, λ_AOB and ρ_ure were simultaneously calibrated to the urea, NH₄⁺ and NO₃⁻ time series via maximum-likelihood estimation (113). This approach estimates unknown parameters by maximizing the likelihood of observing the available data given a particular candidate choice of parameter values. Maximum likelihood estimation is widely used in statistical inference, for example in multilinear regression, and in physics (291). In our case, the likelihood of the data was calculated on the basis of a mixed deterministic-stochastic structure, in which the deterministic part is given by the reaction-centric model and errors are assumed to be normally distributed on a logarithmic scale.

The likelihood was maximized using the SBPLX optimization algorithm (214), which uses repeated simulations and gradual exploration of parameter space. To reduce the possibility of only reaching a local maximum, fitting was repeated 100 times using random initial parameter values and the best fit among all 100 runs was used. Parameter confidence intervals were calculated using the inverse observed Fisher information, which is an estimator of the parameter covariance matrix (92). 
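
As a rough illustration of the error structure described above (errors normally distributed on a logarithmic scale), the following Python sketch evaluates such a log-likelihood for a set of observations and model predictions. It is not the code used for the calibration: the function name, the example numbers and the choice to replace the error variance by its maximum-likelihood estimate are all assumptions made for illustration only.

```python
import math


def lognormal_log_likelihood(observed, predicted):
    '''Log-likelihood of positive data, assuming that
    log(observed) = log(predicted) + Normal(0, sigma^2) noise.

    Assumption: sigma^2 is not known and is replaced by its
    maximum-likelihood estimate (the mean squared log-residual).
    '''
    residuals = [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    if sigma2 == 0.0:
        raise ValueError('perfect fit: the likelihood is unbounded')
    # Gaussian log-density of the log-residuals ...
    ll = -0.5 * n * math.log(2.0 * math.pi * sigma2)
    ll -= sum(r * r for r in residuals) / (2.0 * sigma2)
    # ... plus the Jacobian of the log transform, which does not depend
    # on the model parameters and therefore does not affect the fit.
    ll -= sum(math.log(o) for o in observed)
    return ll


# Made-up concentrations (arbitrary units) and two candidate predictions.
data = [1.0, 2.0, 4.0]
close_fit = [1.1, 1.9, 4.2]
poor_fit = [2.0, 1.0, 8.0]
print(lognormal_log_likelihood(data, close_fit))
print(lognormal_log_likelihood(data, poor_fit))
```

In a calibration, a function of this kind would be evaluated repeatedly by the optimizer, with the predictions recomputed by simulating the model for each candidate parameter set.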
Fitted parameter values, their confidence intervals and a comparison to available literature are given in Table G.1.

Assessing the importance of ure-amo cross-amplification

To test the suitability of a model variant without ure-amo cross-amplification as outlined in the main text, we treated ure and amo as independent reactions performed by separate cell populations. Hence, we assumed A_amo,ure = A_ure,amo = 0 and ρ_ure = 0, and replaced the maintenance rate λ_AOB with two independent rates λ_amo and λ_ure. Furthermore, the initial capacities M^o_ure and M^o_amo were treated as independent parameters. The new set of free parameters thus comprised K_amo, M^o_amo, M^o_nxr, A_ure, A_amo, λ_ure and λ_amo, while the remaining parameters were fixed as described above. Fitting was performed as with the original model and yielded multiple local optima, none of which matched the data as well as the original model (Supplemental Fig. G.2).

Table G.1: Fixed and fitted model parameters for the batch bioreactor incubated with Nitrosospira sp. and Nitrobacter sp. Parameters marked with an asterisk (*) were unknown and were thus fitted to the time series; approximately comparable literature values are provided where available. SE refers to the standard error of the fitted value, in the same units. The initial metabolite concentrations C^o_urea, C^o_NH₄⁺ and C^o_NO₃⁻ were taken from the chemical time series on day 1. The initial reaction capacity M^o_ure was estimated from the slope of the time series at time zero. The remaining parameter values were taken from the indicated literature.

param. | value | SE | comparison | group | literature
K_ure | 670 µM urea | – | – | Nitrosospira L115 | (208)
K_amo * | 4.59 µM NH₃ | ±0.27 | 6–11 | Nitrosospira spp. | (208)
K_nxr | 27.2 µM NO₂⁻ | – | – | Nitrobacter spp. | (37)
A_ure * | 1.11 d⁻¹ | ±0.004 | – | – | –
A_amo | 1.2 d⁻¹ | – | – | Nitrosospira AV2 | (30)
A_nxr | 1.03 d⁻¹ | – | – | Nitrobacter sp. | (227)
A_amo,ure * | 12.8 d⁻¹ | ±0.54 | – | – | –
λ_AOB * | 0.0055 d⁻¹ | ±0.0005 | 0.027 d⁻¹ | N. europaea | (467)
ρ_ure * | 0.26 | ±0.018 | – | – | –
C^o_urea | 1.12 mM | – | – | – | –
C^o_NH₄⁺ | 124 µM | – | – | – | –
C^o_NO₂⁻ | 0 | – | – | – | –
C^o_NO₃⁻ | 49.8 µM | – | – | – | –
M^o_ure | 773 nM/d | – | – | – | –
M^o_nxr * | 35.8 mM/d | ±1.4 | – | – | –

G.1.2 Details on example 2 (flow-through bioreactor)

Assimilation of time series

Experimental time series of NH₄⁺ and NO₃⁻ concentrations were noise-filtered using 4th order Savitzky-Golay smoothing with a sliding window time span of 30 days (222). Derivatives of concentration profiles were estimated by applying a centered finite differences scheme to the noise-filtered profiles. amo and nxr rates were estimated from the derivatives of the NH₄⁺ and NO₃⁻ concentration profiles, respectively, after accounting for substrate input and dilution. Estimated amo and nxr rates were then used in the growth model for the reaction capacities, Eqs. (8.4.9, 8.4.10), as described in the main text.

Parameterization

In the experiment, pH was maintained around 7 by the automatic addition of an alkaline solution, and the bioreactor was maximally ventilated to ensure sufficient oxygenation (109). In our model we thus assumed pH = 7 and ignored oxygen limitation in the reaction kinetics. Temperature was assumed to be 30 °C until day 181 and 25 °C afterwards, in accordance with the original experiment. Bioreactor dilution rates and input substrate concentrations were obtained from the authors of the original experiment through personal correspondence. NH₃ concentration was calculated from NH₄⁺ concentration by assuming that the two are at acid-dissociation equilibrium, similarly to the first example.

The initial amo capacity, M^o_amo, was estimated from the NH₄⁺ time series but had negligible effects on the simulations. The initial nxr capacity was set to zero based on the absence of NO₃⁻ accumulation. 
The amo and nxr half-saturation constants and the self-amplification factors A_amo and A_nxr were calibrated to the NH₄⁺, NO₂⁻ and NO₃⁻ time series by maximizing the mean coefficient of determination (R²) across all three data sets, which is analogous to weighted least-squares fitting in the univariate case. Only data from days 1–250 were used for the calibration. The mean R² was maximized using the SBPLX algorithm (214). To reduce the possibility of only reaching a local maximum, fitting was repeated 100 times using random initial parameter values and the best fit among all 100 runs was used. Fitted parameter values and a comparison to available literature are given in Table G.2.

Table G.2: Fixed and fitted model parameters for the flow-through bioreactor (109). Parameters marked with an asterisk (*) were calibrated using data from days 1–250 and are compared to literature values. The initial metabolite concentrations C^o_NH₄⁺, C^o_NO₂⁻ and C^o_NO₃⁻ were taken from the chemical time series on day 1. The initial reaction capacities M^o_amo and M^o_nxr were estimated from the slopes of the chemical time series on day 1. The parameters C^in_NH₄⁺, µ, pH and temperature were controlled throughout the experiment.

param. | value | comparison | group | literature
K_amo * | 3.21 µM NH₃ | 1.2–23 | AOB | (463, 507)
K_nxr * | 1.32 mM NO₂⁻ | 0.01–1.68 | NOB | (198, 294)
A_amo * | 0.145 d⁻¹ | 0.32–2.1 | AOB | (202, 375)
A_nxr * | 0.176 d⁻¹ | 0.17–1.4 | NOB | (31, 375)
C^o_NH₄⁺ | 26.7 mM | – | – | –
C^o_NO₂⁻ | 0 | – | – | –
C^o_NO₃⁻ | 0 | – | – | –
M^o_amo | 17.1 mM/d | – | – | –
M^o_nxr | 0 | – | – | –
C^in_NH₄⁺ | 35.7–143 mM | – | – | –
µ | 0–0.46 d⁻¹ | – | – | –
pH | 7 | – | – | –
temperature | 30 °C – 25 °C | – | – | –

G.1.3 Computational methods: Using MCM for reaction-centric models

The biochemical models described in the main text were constructed using MCM (Chapter 4; (284)). The framework allows the specification of microbial and abiotic reaction networks within an environmental context resembling, for example, a bioreactor. 
All considered metabolites (e.g., NO₂⁻ and NH₄⁺), any reactions between them (e.g., amo and nxr) and any environmental variables (such as temperature and dilution rate) are specified in special configuration files using high-level code. For example, the specification of metabolites may look as follows:

    NO2
        environmental_dynamics: initial 0 flux -NO2*dilution_rate
            and environmental_production

    NH4
        environmental_dynamics: initial 0 flux (input_NH4 - NH4)*dilution_rate
            and environmental_production

Notice that we specify the initial concentration (i.e., at the beginning of the experiment) of both metabolites as zero. Furthermore, both metabolites are subject to biochemical fluxes (indicated by the keyword environmental_production), as well as depletion at a rate proportional to the bioreactor’s dilution rate, represented by a separate model variable dilution_rate. In addition, NH4 is subject to repletion at a rate proportional to the input substrate concentration, represented by the model variable input_NH4. Both environmental variables dilution_rate and input_NH4, in turn, are explicitly specified using time series that are linearly interpolated between time points:

    dilution_rate
        dynamics: value interpolation of \"data/dilution_rate.txt\"
        units: 1/day

    input_NH4
        dynamics: value interpolation of \"data/input_NH4_concentration.txt\"
        units: mol/L

Since we assume NH₃ and NH₄⁺ to be at dissociation equilibrium, we define NH3 as a non-dynamical variable explicitly depending on NH4:

    NH3
        environmental_dynamics: base_of_acid NH4 5.62e-10

Observe that we provide the ammonium standard dissociation constant, which is used to calculate NH3 at any given time depending on NH4, pH and temperature. 
The latter two are, in turn, specified as additional environmental variables, e.g., in the continuous-flow bioreactor model as follows:

    pH
        dynamics: value 7

    temperature
        dynamics: value piecewise2(t,181,30,25)
        units: C

The definition of each reaction requires a chemical equation and a specification of the reaction’s rate, as demonstrated below:

    amo
        equation: NH4 + 1.5*O2 -> NO2 + H2O + 2H
        environmental_rate: amo_capacity * NH3/(NH3 + $Khalf_amo$)

    nxr
        equation: NO2 + 0.5*O2 -> NO3
        environmental_rate: nxr_capacity * NO2/(NO2 + $Khalf_nxr$)

Observe that while in the model amo consumes NH4, its rate is limited by NH3 in accordance with suggestions by Suzuki et al. (463) that ammonia, and not ammonium, is the limiting substrate. The rates of both reactions are proportional to the bioreactor’s reaction capacities, amo_capacity and nxr_capacity, modeled as separate dynamic variables (see below). The half-saturation constants, Khalf_amo and Khalf_nxr, are enclosed in dollar signs, indicating that these are so-called symbolic model parameters. Symbolic parameters allow a high-level analysis of the model, and can be automatically calibrated to available data. Symbolic model parameters are themselves specified in appropriate configuration files, using a syntax similar to the following:

    Khalf_amo
        default: 8.5e-6
        minimum: 1e-10
        maximum: 1e-1
        units: mol/L
        fixed: no

    Khalf_nxr
        default: 2.29e-4
        minimum: 1e-10
        maximum: 1e-1
        units: mol/L
        fixed: no

Observe that we specified both symbolic parameters as non-fixed, which tells MCM to calibrate them whenever possible. Model calibration is an iterative process, which begins with the specified default values, and gradually explores the parameter space within the constraints specified by minimum and maximum. 
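
As a rough illustration of the kinetics encoded in the configuration above, the following Python sketch evaluates the NH3/NH4 dissociation equilibrium and the two saturating rate expressions. It is not MCM code: the function names are hypothetical, treating NH4 as the total of ammonia plus ammonium is an assumption (the base_of_acid keyword may interpret it differently), the temperature correction of the dissociation constant is omitted, and the concentrations and capacities passed in are illustrative values only.

```python
def nh3_from_nh4(total_nh4, pH, Ka=5.62e-10):
    '''NH3 concentration (mol/L) at acid-dissociation equilibrium.

    Assumptions: total_nh4 is the sum NH3 + NH4+ (mol/L) and Ka is the
    ammonium dissociation constant, used without temperature correction.
    '''
    h = 10.0 ** (-pH)  # proton concentration (mol/L)
    return total_nh4 * Ka / (Ka + h)


def saturating_rate(capacity, substrate, k_half):
    '''Rate = capacity * S / (S + K), the form used in environmental_rate.'''
    return capacity * substrate / (substrate + k_half)


# Illustrative inputs: pH 7 as in the bioreactor model, default
# half-saturation constants from the listing above, made-up capacities.
nh3 = nh3_from_nh4(total_nh4=26.7e-3, pH=7)
amo_rate = saturating_rate(capacity=0.017, substrate=nh3, k_half=8.5e-6)
nxr_rate = saturating_rate(capacity=0.0, substrate=1e-3, k_half=2.29e-4)
print(nh3, amo_rate, nxr_rate)
```

At neutral pH only a small fraction of the total is present as NH3, which is why a half-saturation constant expressed in terms of NH3 can be orders of magnitude smaller than the ammonium concentrations in the reactor.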
The amo and nxr capacities are defined as dynamic environmental variables:

    amo_capacity
        dynamics: initial 0.017 rate $A_amo$*data_rate_amo - amo_capacity*dilution_rate
        units: mol/L/d

    nxr_capacity
        dynamics: initial 0 rate $A_nxr$ * data_rate_nxr - nxr_capacity * dilution_rate
        units: mol/L/d

Observe that the growth of amo_capacity and nxr_capacity is driven by the reaction rates estimated from the time series, data_rate_amo and data_rate_nxr, respectively. The latter are, in turn, calculated directly from the available time series as well as the known dilution rate and input substrate concentration:

    data_NH4
        dynamics: value smoothening_SG4 30 of \"data/NH4.txt\"
        units: mol/L

    data_NO3
        dynamics: value smoothening_SG4 30 of \"data/NO3.txt\"
        units: mol/L

    data_NH4_rate_of_change
        dynamics: value derivative_CFD of smoothening_SG4 30 of \"data/NH4.txt\"
        units: mol/L/d

    data_NO3_rate_of_change
        dynamics: value derivative_CFD of smoothening_SG4 30 of \"data/NO3.txt\"
        units: mol/L/d

    data_rate_amo
        dynamics: value (input_NH4-data_NH4) * dilution_rate - data_NH4_rate_of_change
        constraints: positive
        units: mol/L/d

    data_rate_nxr
        dynamics: value data_NO3 * dilution_rate + data_NO3_rate_of_change
        constraints: positive
        units: mol/L/d

The above code is parsed by MCM, which sets up and numerically solves the corresponding differential equations for the bioreactor’s reaction capacities, the actual reaction rates as well as the metabolite concentrations. When provided with time series data corresponding to any of the model’s predictions (e.g., NH4 concentration), MCM calibrates unknown model parameters (e.g., Khalf_amo) using maximum-likelihood estimation (113). The likelihood is optimized using an iterative optimization algorithm involving step-wise parameter adjustments and repeated simulations. 
Other fitting objectives are also available, such as maximization of the average coefficient of determination (R²), which is analogous to weighted least-squares fitting.

MCM itself is controlled through custom scripts, i.e., text files containing a sequence of special commands, such as for running simulations or fitting parameters. For example, the following four commands specify the output directory, the model configuration files, the total simulation time (in days), and subsequently invoke a simulation of the model:

    setod simulation_output/nitrifier
    set model models/nitrifier
    set maxSimulationTime 525
    runMCM

The full incubator and bioreactor models, as well as all necessary MCM scripts, are available at: http://www.zoology.ubc.ca/MCM

G.2 Mathematical proofs

G.2.1 On specific maintenance rates

The effects of maintenance requirements of individual cells on their population are typically represented by exponential decay that acts against flux-driven biosynthesis (210). However, sometimes it may be desirable to account for a stagnation of metabolism and growth if energy harvest per cell falls below a certain threshold. For example, in constraint-based metabolic cell models (354) this occurs automatically when the solution space becomes empty due to a fixed ATP flux representing cell maintenance requirements (538). Here we show how maintenance requirements with similar thresholds can be incorporated into the reaction-based model framework introduced in the main text.

We assume that maintenance requirements impose a constant cost on a cell’s growth and metabolism that can be represented by a specific maintenance rate λ_s, where s denotes a particular cell species (210). Hence, the net population growth is given by the expression

    dN_s/dt = Σ_{r∈s} Y_r H_r − λ_s N_s,    (G.2.1)

as long as this expression is positive, where N_s is the cell density and “r ∈ s” indicates that the sum only covers the reactions performed by the particular species. 
Note that the decay rate λ_s only represents an offset in biosynthetic yield due to maintenance requirements and does not account for cell lysis or washout from a bioreactor. When the right-hand side of Eq. (G.2.1) becomes negative, i.e., when maintenance requirements exceed yield, dissimilatory metabolism is assumed to halt.

Recall that H_r = N_s V_r h_r, where h_r are normalized reaction kinetics. Hence, the right-hand side of Eq. (G.2.1) is positive exactly when

    Σ_{r∈s} N_s V_r Y_r h_r > λ_s N_s.    (G.2.2)

Also recall that V_r Y_r = A_r is the self-amplification factor for reaction r, so that condition (G.2.2) translates to

    Σ_{r∈s} A_r h_r > λ_s.    (G.2.3)

For example, if ure and amo are performed by the same AOB cells, condition (G.2.3) becomes

    A_ure h_ure + A_amo h_amo > λ_AOB,    (G.2.4)

as used in Eq. (8.4.3) in the main text. In the special case where species s only performs one reaction r, condition (G.2.3) simplifies to A_r h_r > λ_r.

G.2.2 On the concentration of organic components

Here we derive the general formula for the concentration (X) of a particular biomass component (e.g., organic N or cell wall proteins; in short, “compound”) within the context of the reaction-centric model described in the main text (Eq. (8.3.6)). We assume that the amount of compound per cell only depends on the cell species, but is otherwise constant. Hence,

    X = Σ_r N_r φ_r / m_r,    (G.2.5)

where N_r is the concentration of cells performing reaction r, φ_r is the amount of focal compound per cell and m_r is the total number of reactions performed by cells performing reaction r. Note that N_r, φ_r and m_r are the same for any reactions performed by the same cells. We therefore divide each term in Eq. (G.2.5) by m_r to correct for multiple counting of the same cell species. Next, recall that M_r = N_r V_r and that A_ρr = V_ρ Y_r whenever the reactions r and ρ are performed by the same cell species. As is shown in detail in Section G.2.3 below, this means that there exists at least one vector w such that Aw = M, or formally “w = A^−1 M”. 
Note that w is not always uniquely defined. Specifically, if multiple reactions are performed by the same cells then the matrix A is not invertible and there may exist multiple solutions w to the equation Aw = M. However, it can be shown that any such vector w satisfies

    Σ_r N_r φ_r / m_r = Σ_r T_r w_r = T^T w,    (G.2.6)

where we defined T_r = φ_r Y_r (see Eq. (G.2.13) in Section G.2.3). Note that Eq. (G.2.6) holds regardless of the exact choice of w. Hence, we can write in matrix notation

    X = T^T A^−1 M,    (G.2.7)

as mentioned in the main text. The coefficient T_r can be interpreted as the assimilation factor for reaction r, i.e., as the amount of compound assimilated or synthesized per reaction flux (e.g., mol compound per mol reaction flux). For example, the stoichiometry of dissimilatory and assimilatory N-metabolism of Nitrosomonas europaea is conventionally summarized by the following equation:

    55 NH₄⁺ + 76 O₂ + 109 HCO₃⁻ → C₅H₇NO₂ + 54 NO₂⁻ + 57 H₂O + 104 H₂CO₃,    (G.2.8)

where C₅H₇NO₂ represents biomass (520). Hence T_amo = 1 : 55 ≈ 0.018 for organic N, where amo represents the dissimilatory reaction

    NH₄⁺ + 1.5 O₂ → NO₂⁻ + H₂O + 2 H⁺.    (G.2.9)

Note that if two reactions r and ρ are performed by the same cells, T_r and T_ρ must satisfy the consistency condition

    T_r / T_ρ = A_rr / A_rρ,    (G.2.10)

stemming from the fact that φ_r = φ_ρ and A_rρ = V_r Y_ρ.

G.2.3 Properties of the amplification matrix

This section is a reference summary of technical details on the amplification matrix A of the reaction-centric model. It is only provided for the mathematical completeness of calculations presented elsewhere in the Appendix.

We denote by V and Y the column vectors containing the maximum cell-specific reaction rates V_r and the cell yield factors Y_r. We denote by A^o the matrix whose entries A^o_rρ are 1 if reactions r and ρ are performed by the same cell, and 0 otherwise. We also denote by V and Y the diagonal matrices whose diagonals are given by the vectors V and Y, respectively; in matrix products below, V and Y refer to these diagonal matrices. Hence, A can be written as A = V A^o Y. 
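
To make the construction A = V A^o Y and the non-uniqueness of w more concrete, the following Python sketch builds the amplification matrix for three reactions, two of which (ure and amo) share a host cell, and checks Eq. (G.2.6) for two different solutions of Aw = M. All numerical values (rates, yields, cell densities, compound content) are made up for illustration, and the variable names are hypothetical.

```python
hosts = ['AOB', 'AOB', 'NOB']   # cells performing ure, amo and nxr
V = [2.0, 4.0, 4.0]             # maximum cell-specific rates (made up)
Y = [0.5, 0.25, 0.125]          # cell yield factors (made up)
n = len(hosts)

# Ao[r][p] = 1 if reactions r and p are performed by the same cell.
Ao = [[1.0 if hosts[r] == hosts[p] else 0.0 for p in range(n)]
      for r in range(n)]
# A = V Ao Y with V and Y as diagonal matrices, i.e. A[r][p] = V[r]*Y[p].
A = [[V[r] * Ao[r][p] * Y[p] for p in range(n)] for r in range(n)]


def det3(m):
    '''Determinant of a 3x3 matrix by cofactor expansion.'''
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def matvec(m, v):
    return [sum(m[r][p] * v[p] for p in range(len(v))) for r in range(len(m))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


N = [10.0, 10.0, 6.0]           # cell densities, equal for shared hosts
phi = [3.0, 3.0, 5.0]           # compound per cell, equal for shared hosts
M = [V[r] * N[r] for r in range(n)]        # reaction capacities M_r
T = [phi[r] * Y[r] for r in range(n)]      # assimilation factors T_r
m_count = [sum(row) for row in Ao]         # reactions per host cell, m_r

# Left-hand side of Eq. (G.2.6), computed directly from cell densities.
X_direct = sum(N[r] * phi[r] / m_count[r] for r in range(n))

# Two different solutions of A w = M (A is singular, so w is not unique).
w_a = [20.0, 0.0, 48.0]
w_b = [0.0, 40.0, 48.0]
print(det3(A), matvec(A, w_a) == M, matvec(A, w_b) == M)
print(X_direct, dot(T, w_a), dot(T, w_b))
```

The numbers were chosen to be exactly representable in binary floating point, so the equality checks are exact here; with arbitrary values a tolerance would be needed.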
The following assertions hold:

1. A^o and A are invertible if and only if no two reactions are performed by the same cell (i.e., A^o is diagonal).
Proof: If two reactions are hosted by the same cell, then at least two rows of A^o are identical. Hence, det(A^o) = 0, so A^o and A are not invertible. Conversely, if no two reactions share a cell, A^o is the identity matrix and A = VY is diagonal, hence invertible provided that all V_r and Y_r are nonzero.

2. Let φ and ν be two vectors such that A^o ν = 0, and such that φ_r = φ_ρ whenever the reactions r and ρ are performed by the same cell (i.e., φ_r = φ_ρ whenever A^o_rρ = 1). Then φ^T ν = 0.
Proof: From standard matrix theory we know that N(A^o)^⊥ = R(A^oT), where N and R denote the null space and image space (a.k.a. column space), respectively, and ⊥ denotes the orthogonal complement space. Note that A^o = A^oT, so that φ ∈ N(A^o)^⊥ whenever φ ∈ R(A^o). Finally, note that R(A^o) consists of precisely those vectors φ satisfying φ_r = φ_ρ whenever A^o_rρ ≠ 0.

3. Let φ, y ∈ R(A^o). Let z be any vector satisfying A^o z = y. Then

    φ^T z = Σ_r φ_r y_r / m_r,    (G.2.11)

where m_r = Σ_ρ A^o_rρ is the number of reactions performed by cells performing reaction r.
Proof: Choose x in the following way: x_r := y_r / m_r. Then A^o x = y. Hence, z = x + ν for some ν ∈ N(A^o). By the previous assertion one has φ^T ν = 0, hence

    φ^T z = φ^T x + φ^T ν = φ^T x = Σ_r φ_r x_r = Σ_r φ_r y_r / m_r,    (G.2.12)

as claimed.

4. Let φ, N ∈ R(A^o). Let M_r = V_r N_r and T_r = φ_r Y_r. Then M ∈ R(A). Moreover, for any vector w satisfying Aw = M one has

    Σ_r φ_r N_r / m_r = T^T w,    (G.2.13)

regardless of the exact choice of w.
Proof: Recall that A = V A^o Y. Also note that since N is in R(A^o), M is in R(V A^o). Since Y is invertible, one has R(V A^o) = R(V A^o Y) = R(A), hence M is in R(A) as claimed. Define z_r = Y_r w_r. Then

    A^o z = V^−1 A Y^−1 z = V^−1 A w = V^−1 M = N.    (G.2.14)

Eqs. (G.2.11) and (G.2.14) imply that

    Σ_r φ_r N_r / m_r = φ^T z = Σ_r φ_r Y_r w_r = Σ_r T_r w_r = T^T w,    (G.2.15)

as claimed.

[Figure G.1, panels A–F]
Figure G.1: Comparison of models for the incubation experiment involving urea hydrolysis and nitrification, with (top row) and without (bottom row) explicitly accounting for partial oxidation of NH₄⁺ produced by urea hydrolysis in the same cells (NH₄⁺ “recycling”). Continuous curves show model predictions for urea (left column), NH₄⁺ (center column) and NO₃⁻ concentration (right column), compared to experimental data (circles; 94). While both models predict urea concentrations with similar accuracy, the 2nd model fails to explain the early nitrification of NH₄⁺ to NO₃⁻ (sub-figures E,F). On the other hand, including partial NH₄⁺ recycling improves the model’s agreement with the NO₃⁻ time series (sub-figures B,C; log-likelihood = 41.3 with recycling vs 21.2 without). Despite the additional complexity of the first model (6 fitted parameters instead of 5), statistical model selection criteria show a clear preference for the inclusion of partial recycling (AIC = −70.7 and BIC = −62.3 with recycling vs AIC = −32.4 and BIC = −25.4 without; 246).

[Figure G.2, panels A–F]
Figure G.2: Comparison of models for the incubation experiment involving urea hydrolysis and nitrification, with (top row) and without (bottom row) ure-amo cross-amplification. Curves show model predictions for urea (left column), NH₄⁺ (center column) and NO₃⁻ concentration (right column), compared to experimental data (circles; 94). Continuous and dashed curves in (D–F) show two typical alternative model fitting outcomes (local and global fitting optima, respectively), obtained in the absence of ure-amo cross-amplification. Fitting attempts starting at random parameter values repeatedly converged to one of the two optima, which either completely fail to predict nitrification (continuous curve) or yield unrealistically high estimates (i.e. > 10 d⁻¹) for the amo self-amplification factor (dashed curve). All model calibrations without ure-amo cross-amplification achieved a substantially lower match to the data (best log-likelihood = 24.6, dashed curve in bottom row) than the model with ure-amo cross-amplification (log-likelihood = 41.3, top row)."@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "2016-11"@en ; edm:isShownAt "10.14288/1.0314930"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Mathematics"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "Attribution-NoDerivatives 4.0 International"@* ; ns0:rightsURI "http://creativecommons.org/licenses/by-nd/4.0/"@* ; ns0:scholarLevel "Graduate"@en ; dcterms:title "The ecology of microbial metabolic pathways"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/59313"@en .