UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

The ecology of microbial metabolic pathways Louca, Stilianos 2016

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2016_november_louca_stilianos.pdf [ 55.29MB ]
JSON: 24-1.0314930.json
JSON-LD: 24-1.0314930-ld.json
RDF/XML (Pretty): 24-1.0314930-rdf.xml
RDF/JSON: 24-1.0314930-rdf.json
Turtle: 24-1.0314930-turtle.txt
N-Triples: 24-1.0314930-rdf-ntriples.txt
Original Record: 24-1.0314930-source.json
Full Text

Full Text

THE ECOLOGY OF MICROBIAL METABOLIC PATHWAYSbyStilianos LoucaB.Sc., Mathematics, Friedrich Schiller University of Jena, Germany, 2010Diplom, Physics, Friedrich Schiller University of Jena, Germany, 2012A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Applied Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)September 2016c© Stilianos Louca, 2016AbstractAbstractMicrobial metabolic activity drives biogeochemical cycling in virtually every ecosystem. Yet,microbial ecology and its role in ecosystem biochemistry remain poorly understood, partlybecause the enormous diversity found in microbial communities hinders their modeling. De-spite this diversity, the bulk of global biogeochemical fluxes is driven by a few metabolicpathways encoded by a small set of genes, which through time have spread across micro-bial clades that can replace each other within metabolic niches. Hence, the question ariseswhether the dynamics of these pathways can be modeled regardless of the hosting organ-isms, for example based on environmental conditions. Such a pathway-centric paradigmwould greatly simplify the modeling of microbial processes at ecosystem scales.Here I investigate the applicability of a pathway-centric paradigm for microbial ecology.By examining microbial communities in replicate “miniature” aquatic environments, I showthat similar ecosystems can exhibit similar metabolic functional community structure, de-spite highly variable taxonomic composition within individual functional groups. Further,using data from a recent ocean survey I show that environmental conditions strongly explainthe distribution of microbial metabolic functional groups across the world’s oceans, but onlypoorly explain the taxonomic composition within individual functional groups. Using statis-tical tools and mathematical models I conclude that biotic interactions, such as competitionand predation, likely underlie much of the taxonomic variation within functional groupsobserved in the aforementioned studies. The above findings strongly support a pathway-centric paradigm, in which the distribution and activity of microbial metabolic pathways isstrongly determined by energetic and stoichiometric constraints, whereas additional mecha-nisms shape the taxonomic composition within metabolic guilds.These findings motivated me to explore concrete pathway-centric mathematical models forspecific ecosystems. Notably, I constructed a biogeochemical model for Saanich Inlet, aseasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones. Themodel describes the dynamics of individual microbial metabolic pathways involved in carbon,nitrogen and sulfur cycling, and largely explains geochemical depth profiles as well as DNA,mRNA and protein sequence data. This work yields insight into ocean biogeochemistry anddemonstrates the potential of pathway-centric models for microbial ecology.iiPrefacePrefaceA number of sections in this work are partly or wholly published, in press or in review.Copyright licenses to all works were obtained and are listed where appropriate.• Chapter 2: Stilianos Louca conceived the project. Stilianos Louca performed the se-quence analysis and statistical analysis with input from Laura W. Parfrey and MichaelDoebeli. Stilianos Louca wrote a first draft of the manuscript. All coauthors con-tributed to the final preparation of the manuscript. Michael Doebeli and Laura Parfreysupervised the project.A version of this chapter is published in Science as a copyrighted article. Copyright(2016), American Association for the Advancement of Science. Reprinted with permis-sion from AAAS, from:Louca, S., Parfrey, L.W., Doebeli, M. (2016). Decoupling function and taxonomy inthe global ocean microbiome. Science. 353:1272–1277. DOI:10.1126/science.aaf4507.• Chapter 3: Stilianos Louca, Vinicius F. Farjalla, Saulo M. S. Jacques, Aliny P. F. Pires,Juliana S. Leal performed the field work. Vinicius F. Farjalla and Saulo M. S. Jacquesperformed the chemical measurements in the laboratory. Stilianos Louca performedthe molecular work in the laboratory, the DNA sequence analysis and the statisticalanalyses. Stilianos Louca, Michael Doebeli, Vinicius F. Farjalla, Diane S. Srivastava.and Laura W. Parfrey interpreted the statistical findings. Stilianos Louca wrote a firstdraft of the manuscript, and all authors contributed to the final preparation of themanuscript. Michael Doebeli and Vinicius F. Farjalla supervised the project.A version of this chapter is under peer review for publication:Louca, S., Jacques, S.M.S., Pires, A.P.F., Leal, J.S., Srivastava, D.S., Parfrey, L.W.,Farjalla, V.F., Doebeli, M. (in review). Functional stability despite high taxonomicvariability across microbial communities.• Chapter 4: Stilianos Louca conceived and developed MCM. Stilianos Louca designedthe E. coli example and performed the simulations. S.D. and Michael Doebeli analyzedthe simulation results. Stilianos Louca wrote the manuscript, with editorial supportfrom Michael Doebeli. Michael Doebeli supervised the project.Software developed in this chapter is available under the GNU Lesser General Public Li-cense (http://www.gnu.org/licenses/lgpl.html) at: http://www.zoology.ubc.ca/MCMA version of this chapter is published in eLife under the terms of the Creative CommonsAttribution License (http://creativecommons.org/licenses/by/4.0):Louca, S., Doebeli, M. 2015. Calibration and analysis of genome-based models formicrobial ecology. eLife. 4:e08208. DOI:10.7554/eLife.08208iiiPreface• Chapter 5: Stilianos Louca conceived the project, ran the simulations and performedthe statistical analysis. Stilianos Louca wrote the manuscript, with editorial supportfrom Michael Doebeli. Michael Doebeli supervised the project.A version of this chapter is published in Environmental Microbiology as a copyrightedarticle. Copyright (2015) Society for Applied Microbiology and John Wiley & SonsLtd. Reprinted, with permission, from:Louca, S., Doebeli, M. 2015. Transient dynamics of competitive exclusion in micro-bial communities. Environmental Microbiology. 18:1863–1874. DOI:10.1111/1462-2920.13058• Chapter 6: Stilianos Louca conceived the project, ran the simulations and performedthe statistical analysis. Stilianos Louca wrote the manuscript, with editorial supportfrom Michael Doebeli. Michael Doebeli supervised the project.A version of this chapter is under peer review for publication:Louca, S., Doebeli, M. (in review). Taxonomic variability and functional stability inmicrobial communities infected by phages.• Chapter 7: Sean A. Crowe and Steven J. Hallam had the idea for the research. StilianosLouca constructed the mathematical models with input from Sergei Katsev, and Stil-ianos Louca performed the simulations. Stilianos Louca, Sergei Katsev, Alyse Hawley,Sean A. Crowe and Steven J. Hallam analyzed the model predictions. Alyse Hawley,Maya P. Bhatia, Monica Torres-Beltran and Steven J. Hallam performed the sequenc-ing. Alyse Hawley, Maya P. Bhatia, Stilianos Louca and Steven J. Hallam analyzed thesequence data. Monica Torres-Beltran, Alyse Hawley, Celine Michiels, Dave Capelle,Sergei Katsev, Gaute Lavik, Sean A. Crowe and Steven J. Hallam collected the chem-ical and physical data. Stilianos Louca wrote the manuscript with Sean A. Crowe andSteven J. Hallam. All authors contributed to the final preparation of the manuscript.Sergei Katsev, Sean A. Crowe, Michael Doebeli and Steven J. Hallam supervised theproject.A version of this chapter is published in PNAS as a copyrighted article. Copyright(2016) National Academy of Sciences. Reprinted, with permission, from:Louca, S., Hawley, A.K., Katsev, S., Torres-Beltran, M., Bhatia, P.B., Kheirandish,S., Michiels, C., Capelle, D., Lavik, G., Doebeli, M., Crowe, S.A., and Hallam, S.J.(2016). Integrating biogeochemistry with multi-omic sequence information in a modeloxygen minimum zone. Proceedings of the National Academy of Sciences.DOI:10.1073/pnas.1602897113• Chapter 8: Stilianos Louca conceived the project, ran the simulations and performedthe statistical analysis. Stilianos Louca wrote the manuscript, with editorial supportfrom Michael Doebeli. Michael Doebeli supervised the project.A version of this chapter is published in Ecological Modelling as a copyrighted article.Copyright (2016) Elsevier. Reprinted, with permission, from:Louca, S., Doebeli, M. 2016. Reaction-centric modeling of microbial ecosystems. Eco-logical Modelling. 335:74–86. DOI:10.1016/j.ecolmodel.2016.05.011ivPrefaceThroughout this dissertation the word “we” refers to Stilianos Louca unless otherwise stated.None of the work encompassing this dissertation required consultation with the UBC Re-search Ethics Board.vTable of ContentsTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiList of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi1 Opening chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.3 Objectives and overview of this dissertation . . . . . . . . . . . . . . . . . . 62 The decoupling of function and taxonomy in the global ocean microbiome 82.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.3 Environmental conditions mainly affect microbial function . . . . . . . . . . 102.4 Causes of variation within functional groups . . . . . . . . . . . . . . . . . . 112.5 Beyond taxonomic profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Functional stability despite high taxonomic variability across microbialcommunities in bromeliad tanks . . . . . . . . . . . . . . . . . . . . . . . . . 213.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.3.1 Functional stability contrasts with taxonomic variability . . . . . . . 233.3.2 Causes of variation within functional groups . . . . . . . . . . . . . . 243.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 Calibration and analysis of cell-metabolic models for microbial ecology 32viTable of Contents4.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.3 Model overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.4 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.4.1 Successional dynamics of a microbial community . . . . . . . . . . . . 364.4.2 Experimental calibration . . . . . . . . . . . . . . . . . . . . . . . . . 374.4.3 Predicting microbial community dynamics . . . . . . . . . . . . . . . 384.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405 Transient dynamics of competitive exclusion in microbial communities . 485.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485.3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495.3.1 Competition for a common resource . . . . . . . . . . . . . . . . . . . 495.3.2 Bioreactors as model systems . . . . . . . . . . . . . . . . . . . . . . 525.3.3 Bioreactor community dynamics . . . . . . . . . . . . . . . . . . . . . 535.3.4 Variable does not mean unstable . . . . . . . . . . . . . . . . . . . . 555.3.5 Model limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565.3.6 Alternative destabilizing factors . . . . . . . . . . . . . . . . . . . . . 565.3.7 Towards a pathway-centric microbial ecology . . . . . . . . . . . . . . 575.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586 Taxonomic variability and functional stability in microbial communitiesinfected by phages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646.3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666.3.1 Bioreactor dynamics in the absence of phages . . . . . . . . . . . . . 666.3.2 Effects of phages on community dynamics . . . . . . . . . . . . . . . 676.3.3 Functional redundancy promotes functional stability . . . . . . . . . 686.3.4 Statistical averaging or dynamic stabilization? . . . . . . . . . . . . . 716.3.5 Phages promote prokaryotic diversity . . . . . . . . . . . . . . . . . . 726.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737 Gene-centric modeling of the Saanich Inlet oxygen minimum zone . . . 787.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797.3 Construction and calibration of a gene-centric model . . . . . . . . . . . . . 807.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817.4.1 DNA profiles and process rates . . . . . . . . . . . . . . . . . . . . . 817.4.2 mRNA and protein profiles . . . . . . . . . . . . . . . . . . . . . . . . 847.4.3 Consequences for geobiology . . . . . . . . . . . . . . . . . . . . . . . 867.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888 Reaction-centric modeling of microbial community metabolism . . . . . 92viiTable of Contents8.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 928.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 938.3 Derivation of the reaction-centric framework . . . . . . . . . . . . . . . . . . 958.3.1 One reaction per cell . . . . . . . . . . . . . . . . . . . . . . . . . . . 958.3.2 Multiple reactions per cell . . . . . . . . . . . . . . . . . . . . . . . . 978.4 Demonstration of the reaction-centric framework . . . . . . . . . . . . . . . . 998.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 998.4.2 Example 1: Urea hydrolysis and nitrification in a batch-fed incubator 998.4.3 Example 2: Nitrification in a flow-through bioreactor . . . . . . . . . 1028.4.4 Estimating concentrations of other organic compounds . . . . . . . . 1068.4.5 Limitations and extensions of reaction-centric models . . . . . . . . . 1078.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089 Closing chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169.2 So, what is life? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118Summary of software contributions . . . . . . . . . . . . . . . . . . . . . . . . . 121Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122AppendicesA Chapter 2: Supplemental material . . . . . . . . . . . . . . . . . . . . . . . . 173A.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173A.1.1 Sequencing data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173A.1.2 Functional annotation of prokaryotic taxa . . . . . . . . . . . . . . . 173A.1.3 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174A.2 Resolving ambiguities in gene-centric metagenomics . . . . . . . . . . . . . . 178A.3 Comparison with Sunagawa et al. (2015) . . . . . . . . . . . . . . . . . . . . 179B Chapter 3: Supplemental material . . . . . . . . . . . . . . . . . . . . . . . . 202B.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202B.1.1 Biological sample collection . . . . . . . . . . . . . . . . . . . . . . . 202B.1.2 Chemical analysis of tank water . . . . . . . . . . . . . . . . . . . . . 202B.1.3 Measurement of other physicochemical variables . . . . . . . . . . . . 204B.1.4 16S sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205B.1.5 Functional annotation of prokaryotic taxa (FAPROTAX) . . . . . . . 206B.1.6 Metagenomic sequencing . . . . . . . . . . . . . . . . . . . . . . . . . 207B.1.7 Comparing functional and taxonomic variability . . . . . . . . . . . . 208B.1.8 Metric multidimensional scaling and coloring . . . . . . . . . . . . . . 209B.1.9 Phylogenetic community structure . . . . . . . . . . . . . . . . . . . . 210B.1.10 Comparing OTU proportions to environmental variables . . . . . . . 211B.1.11 Comparing community dissimilarities to geographical distances . . . . 212B.1.12 Comparing OTU co-occurrences to a null model . . . . . . . . . . . . 212B.1.13 Sequence data availability . . . . . . . . . . . . . . . . . . . . . . . . 213viiiTable of ContentsC Chapter 4: Supplemental material . . . . . . . . . . . . . . . . . . . . . . . . 235C.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235C.1.1 MCM overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235C.1.2 Calibration of E. coli cell models . . . . . . . . . . . . . . . . . . . . 239C.1.3 Simulation of the microbial community model . . . . . . . . . . . . . 240C.1.4 Robustness of the SS-FS coexistence . . . . . . . . . . . . . . . . . . 240C.1.5 Seasonal restriction of the SS-FS co-cultures . . . . . . . . . . . . . . 240D Chapter 5: Supplemental material . . . . . . . . . . . . . . . . . . . . . . . . 244D.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244D.1.1 Computational framework . . . . . . . . . . . . . . . . . . . . . . . . 244D.1.2 Construction of the cell models . . . . . . . . . . . . . . . . . . . . . 245D.1.3 Calibration of the template cell models . . . . . . . . . . . . . . . . . 245D.1.4 Nitrifying membrane bioreactor model . . . . . . . . . . . . . . . . . 246D.1.5 Statistics of community convergence . . . . . . . . . . . . . . . . . . 247D.2 Elaboration on the competition model . . . . . . . . . . . . . . . . . . . . . 247D.3 Details on the bioreactor model . . . . . . . . . . . . . . . . . . . . . . . . . 248D.3.1 Construction of cell models . . . . . . . . . . . . . . . . . . . . . . . 248D.3.2 Metabolites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249D.3.3 Reaction network for AOB . . . . . . . . . . . . . . . . . . . . . . . . 251D.3.4 Reaction network for NOB . . . . . . . . . . . . . . . . . . . . . . . . 253D.3.5 Uptake kinetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254D.3.6 Community-scale dynamics . . . . . . . . . . . . . . . . . . . . . . . 255E Chapter 6: Supplemental material . . . . . . . . . . . . . . . . . . . . . . . . 257E.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257E.1.1 Model overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257E.1.2 Reaction rates and metabolite dynamics . . . . . . . . . . . . . . . . 257E.1.3 Gibbs free energy and cell production . . . . . . . . . . . . . . . . . . 258E.1.4 Cell and phage population dynamics . . . . . . . . . . . . . . . . . . 259E.1.5 Parameterization and simulations . . . . . . . . . . . . . . . . . . . . 260E.1.6 Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261E.1.7 Deterministic vs stochastic competitive exclusion . . . . . . . . . . . 262F Chapter 7: Supplemental material . . . . . . . . . . . . . . . . . . . . . . . . 269F.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269F.1.1 Model overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269F.1.2 Mathematical model structure . . . . . . . . . . . . . . . . . . . . . . 269F.1.3 Considered pathways . . . . . . . . . . . . . . . . . . . . . . . . . . . 270F.1.4 Model calibration and data . . . . . . . . . . . . . . . . . . . . . . . 272F.1.5 mRNA and protein models . . . . . . . . . . . . . . . . . . . . . . . . 272F.1.6 Inverse linear transport modeling . . . . . . . . . . . . . . . . . . . . 273F.2 Data acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274F.2.1 Sampling site and time . . . . . . . . . . . . . . . . . . . . . . . . . . 274F.2.2 Chemical and physical depth profiles . . . . . . . . . . . . . . . . . . 275ixTable of ContentsF.2.3 Metagenomics, metatranscriptomics and metaproteomics . . . . . . . 276F.2.4 Quantifying metagenomic and metatranscriptomic data using RPKM 277F.2.5 Process rate measurements . . . . . . . . . . . . . . . . . . . . . . . . 278F.2.6 qPCR quantification of SUP05 cell counts . . . . . . . . . . . . . . . 279F.3 Mathematical model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280F.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280F.3.2 Considered pathways . . . . . . . . . . . . . . . . . . . . . . . . . . . 282F.3.3 Pathway stoichiometry . . . . . . . . . . . . . . . . . . . . . . . . . . 283F.3.4 Reaction kinetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284F.3.5 Gibbs free energy and gene growth . . . . . . . . . . . . . . . . . . . 285F.3.6 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 286F.3.7 Model parameterization . . . . . . . . . . . . . . . . . . . . . . . . . 286F.3.8 Calibrating reaction-kinetic parameters to data . . . . . . . . . . . . 290F.3.9 Calibrating multi-omic data units . . . . . . . . . . . . . . . . . . . . 291F.3.10 Predicting metatranscriptomic and metaproteomic profiles . . . . . . 292F.3.11 Calculating metabolic fluxes between pathways . . . . . . . . . . . . 297F.3.12 Local sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . . . . 298F.4 Caveats and special notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300F.4.1 The role of sulfate reduction . . . . . . . . . . . . . . . . . . . . . . . 300F.4.2 The role of DNRA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302F.4.3 The role of aerobic sulfide oxidation . . . . . . . . . . . . . . . . . . . 302F.4.4 Planctomycetes and nxr . . . . . . . . . . . . . . . . . . . . . . . . . 303F.5 Inverse linear transport modeling (ILTM) . . . . . . . . . . . . . . . . . . . . 303G Chapter 8: Supplemental material . . . . . . . . . . . . . . . . . . . . . . . . 307G.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307G.1.1 Details on example 1 (batch-fed incubator) . . . . . . . . . . . . . . . 307G.1.2 Details on example 2 (flow-through bioreactor) . . . . . . . . . . . . . 310G.1.3 Computational methods: Using MCM for reaction-centric models . . 312G.2 Mathematical proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316G.2.1 On specific maintenance rates . . . . . . . . . . . . . . . . . . . . . . 316G.2.2 On the concentration of organic components . . . . . . . . . . . . . . 317G.2.3 Properties of the amplification matrix . . . . . . . . . . . . . . . . . . 319xList of TablesList of Tables8.1 Overview of symbols and units for cell-centric and reaction-centric models . 111A.1 Overview of oceanographic variables . . . . . . . . . . . . . . . . . . . . . . 181A.2 Overview of considered Tara oceans samples . . . . . . . . . . . . . . . . . . 196A.3 KOG-function associations for the ocean microbiome . . . . . . . . . . . . . 200A.4 OTUs per functional group in the global ocean microbiome . . . . . . . . . 201B.1 OTU overlap between bromeliad microbiomes . . . . . . . . . . . . . . . . . 227B.2 Coefficients of variation for gene abundances across bromeliad microbiomes 228B.3 OTU co-occurrence patterns across bromeliad microbiomes. . . . . . . . . . 229B.4 Geographical distances vs taxonomic dissimilarities across bromeliad micro-biomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229B.5 Physicochemical environmental variables in bromeliads . . . . . . . . . . . . 230B.6 OTUs per functional group in bromeliad microbiomes . . . . . . . . . . . . 231B.7 KOG-function associations for bromeliad microbiomes . . . . . . . . . . . . 232C.1 Fitted parameters for the individual E. coli cell models . . . . . . . . . . . 241E.1 Pathway stoichiometry in a methanogenic bioreactor . . . . . . . . . . . . . 267E.2 Model parameters for a methanogenic bioreactor . . . . . . . . . . . . . . . 268F.1 KEGG-function associations for the global ocean microbiome . . . . . . . . 277F.1 Boundary conditions for the gene-centric model . . . . . . . . . . . . . . . . 287F.2 Reaction-kinetic parameters used in the gene-centric model . . . . . . . . . 289F.3 Rescaling factors for metagenomic profiles . . . . . . . . . . . . . . . . . . . 293F.4 Proportionality factors for mRNA and protein models . . . . . . . . . . . . 295G.1 Model parameters for a ureolytic bioreactor . . . . . . . . . . . . . . . . . . 309G.2 Model parameters for a nitrifying bioreactor . . . . . . . . . . . . . . . . . . 311xiList of FiguresList of Figures2.1 Linking functional and taxonomic composition to environmental conditions. 162.2 Environmental filtering of microbial communities in the global ocean. . . . . 172.3 Functional redundancy in the global ocean microbiome. . . . . . . . . . . . 182.4 Microbial community differences vs geographical distance. . . . . . . . . . . 192.5 Regression of the taxonomic composition of the global ocean microbiome . . 203.1 Bromeliad species used. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.2 Functional stability vs taxonomic variability of bromeliad microbiomes . . . 293.3 OTU-function associations . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.4 Phylogenetic dispersion in bromeliad microbiomes . . . . . . . . . . . . . . 303.5 Relating OTU proportions to environmental variables across bromeliads . . 314.1 Conceptual framework used by MCM . . . . . . . . . . . . . . . . . . . . . 424.2 Overview of MCM functionalities . . . . . . . . . . . . . . . . . . . . . . . . 434.3 Relative cell densities during evolution experiments with E. coli . . . . . . . 444.4 Calibration of E. coli cell models using monoculture experiments . . . . . . 444.5 Dynamics of the E. coli microbial community model . . . . . . . . . . . . . 454.6 Metabolic differentiation of E. coli types . . . . . . . . . . . . . . . . . . . . 464.7 Predicted relative cell densities in co-culture when restricted to only one season 475.1 Calibration and validation of AOB and NOB models . . . . . . . . . . . . . 605.2 Simulations of the nitrifying bioreactor under constant conditions . . . . . . 615.3 Simulations of the perturbed nitrifying bioreactor . . . . . . . . . . . . . . . 626.1 Modeling methanogenic communities . . . . . . . . . . . . . . . . . . . . . . 756.2 Phage predation drives OTU turnover . . . . . . . . . . . . . . . . . . . . . 766.3 Effects of functional redundancy on methane production . . . . . . . . . . . 766.4 Effects of functional redundancy on functional community composition . . . 777.1 Metabolic network and selected chemical time series in Saanich Inlet . . . . 897.2 Measured vs predicted chemical depth profiles in Saanich Inlet . . . . . . . 907.3 Molecular and reaction rate profiles in Saanich Inlet, compared to modelpredictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 918.1 Modeling urea hydrolysis and nitrification . . . . . . . . . . . . . . . . . . . 1108.2 Model predictions and data for a ureolytic bioreactor . . . . . . . . . . . . . 1128.3 Reconstructing a bioreactor’s state using chemical time series . . . . . . . . 1138.4 Information needed to estimate the state of a reaction-centric model . . . . 1148.5 Model predictions and data for a nitrifying bioreactor . . . . . . . . . . . . 1159.1 Steady state in the Saanich Inlet OMZ . . . . . . . . . . . . . . . . . . . . . 118xiiList of Figures9.2 Community dynamics in a methanogenic bioreactor, compared to turbulentthermal convection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119A.1 Correlation analysis at various taxonomic levels . . . . . . . . . . . . . . . . 182A.2 Environmental filtering at higher taxonomic levels . . . . . . . . . . . . . . 183A.3 Functional vs taxonomic community profiles of the ocean microbiome . . . . 184A.4 Functional dissimilarities vs geographical distances in the ocean microbiome 185A.5 Phenotype-based vs metagenomic functional profiles . . . . . . . . . . . . . 186A.6 Correlations between functional groups in the global ocean microbiome . . . 187A.7 Taxonomic compositions within functional groups in the ocean microbiome 188A.8 Community dissimilarities vs geographical distances in the surface and DCM 189A.9 Sampling locations of the Tara oceans survey . . . . . . . . . . . . . . . . . 190A.10 Correlations between environmental variables across the global ocean . . . . 191A.11 Functional group overlaps in the global ocean microbiome . . . . . . . . . . 192A.12 Correlations between functional groups in the global ocean microbiome . . . 193A.13 Taxonomic dissimilarities vs geographical distances in the global ocean mi-crobiome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194A.14 Functional vs taxonomic determinism in the ocean surface microbiome . . . 195B.1 Genus-level composition within functional groups across bromeliad micro-biomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214B.2 Family-level composition within functional groups across bromeliad micro-biomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215B.3 Order-level composition within functional groups across bromeliad microbiomes216B.4 Class-level composition within functional groups across bromeliad microbiomes217B.5 Geographic distances vs dissimilarities within functional groups across bromeliadmicrobiomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218B.6 Geographic location vs composition within functional groups in bromeliadmicrobiomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219B.7 PARAFAC model components of bromeliad DOC . . . . . . . . . . . . . . . 220B.8 Modeling EEMs of bromeliad DOC with PARAFAC . . . . . . . . . . . . . 221B.9 16S rDNA rarefaction curves for bromeliad microbiomes . . . . . . . . . . . 222B.10 Detailed functional profiles of bromeliad microbiomes . . . . . . . . . . . . . 223B.11 Functional redundancy in bromeliad microbiomes at the genus level . . . . . 224B.12 Functional redundancy in bromeliad microbiomes at the family level . . . . 225B.13 Functional redundancy in bromeliad microbiomes at the class level . . . . . 226C.1 Predicted relative cell densities of the A and FS E. coli types in co-culture . 242C.2 Robustness of the predicted stable coexistence of the SS and FS E. coli typesin co-culture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242C.3 Measured relative cell densities of the SS and FS types in batch co-culturewhen restricted to only one “season” . . . . . . . . . . . . . . . . . . . . . . 243E.1 Predicted metabolite uptake rates in a methanogenic bioreactor . . . . . . . 264E.2 Predicted metabolite export rates in a methanogenic bioreactor . . . . . . . 265E.3 Predicted phage-host trajectories in a methanogenic bioreactor . . . . . . . 266xiiiList of FiguresF.1 Temperature, salinity and POM profiles in Saanich Inlet . . . . . . . . . . . 290F.2 Overview of the gene-centric modeling approach for the Saanich Inlet OMZ 296F.3 Local sensitivity analysis of the gene-centric model . . . . . . . . . . . . . . 299F.1 multi-omic depth profiles for various sulfur and nitrogen cycling genes in theSaanich Inlet OMZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301G.1 Comparison of models for ammonium production in a ureolytic bioreactor . 321G.2 Comparison of models with and without ure-amo cross-amplification in aureolytic bioreactor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322xivList of AbbreviationsList of Abbreviationsamo Aerobic ammonium oxidationAOB ammonia oxidizing bacteriaCH4 methaneCO2 carbon dioxideCOG clusters of orthologous genesDNRA dissimilatory nitrate reduction to ammoniumFAPROTAX functional annotation of prokaryotic taxaFBA flux balance analysisH2 hydrogenH2S hydrogen sulfideKEGG Kyoto Encyclopedia of Genes and GenomesKOG KEGG orthologous groupsKTW killing the winnerMCM Microbial Community ModelerMDS multidimensional scalingN2O nitrous oxideNH3 ammoniaNH+4 ammoniumNMDS non-metric multidimensional scalingNO−2 nitriteNO3− nitrateNOB nitrite oxidizing bacterianxr Aerobic nitrite oxidationO2 oxygenOMZ oxygen minimum zonePDNO partial denitrification to nitrous oxidePO−4 phosphateROM remineralization of organic matterOTU operational taxonomic unitSNTZ sulfate-nitrate transition zoneSO2−4 sulfateSSU rRNA small subunit of the 16S rRNA geneTEA terminal electron acceptorure Urea hydrolysisxvAcknowledgementsAcknowledgementsI would like to express my deep appreciation to my supervisor, Prof. Michael Doebeli, whogave me the freedom and tremendous support to chase pretty much any white rabbit thatcrossed my way. I would like to thank all my collaborators, from whom I have learned a lotand with whom I was able to accomplish much of the work presented here: Alyse Hawley,Prof. Sergei Katsev, Captain Monica Torres-Beltran, Maya Bhatia, Celine Michiels, DavidCapelle, Gaute Lavik, Prof. Sean Crowe, Aria Hahn, Prof. Laura Parfrey, Sam Kheirandish,Saulo Jacques, Aliny Pires, Juliana Leal and Prof. Diane Srivastava. Special thanks toProf. Steven Hallam for giving me an entrance to the world of microbial ecology and forplenty of stimulating discussions. Special thanks to Prof. Vinicius Farjalla for tremendoussupport on my field work and for being an amazing companion during my stay in Brazil. Mygratitude to Mellisa Chen for teaching me a multitude of laboratory techniques. Thanks toAndreas Mueller for teaching me DNA extraction. Thanks to the Parfrey lab for hosting andsupporting me along the way. Thanks to the Hallam lab for being such great collaborators.Thanks to Prof. Rebecca Tyson, Prof. Mary O’Connor and Prof. Daniel Coombs fortheir numerous advises and pleasant discussions. Thanks to Jan Finke, Andy Loudon, Prof.Angélica González, Sarah Perez and Matthew Osmond for numerous scientific discussionsand for providing feedback on my manuscripts. Thanks to Prof. Sally Otto for being sucha thought-provoking committee member and for providing feedback on my manuscripts.I am grateful to the Department of Mathematics (UBC), to the Pacific Institute for theMathematical Sciences (PIMS) and to NSERC for funding. Thanks to my parents, brotherand sister for supporting me at all times.xviOpening chapterChapter 1Opening chapterFor such a model there is no need to ask the question “Is the model true?”.If “truth” is to be the “whole truth” the answer must be “No”. The onlyquestion of interest is “Is the model illuminating and useful?”George Box, 19791.1 IntroductionMicroorganisms are the most ancient, the most diverse and the most abundant form of life onEarth (518). The biomass of prokaryotes alone is comparable to that of all plants combined(518), and their distribution extends far beyond that of multicellular organisms (111, 404).The metabolic activity of microorganisms drives the bulk of biogeochemical fluxes in virtu-ally every natural ecosystem (116), including marine sediments (388), soil (231, 494), theopen ocean (79, 455) and freshwater lakes (126). Cyanobacteria, for example, perform upto 35% of global photosynthesis (356), turning solar energy into a redox disequilibrium be-tween carbon and oxygen that powers much of current heterotrophic life. Microorganismsstrongly shape the marine nitrogen and sulfur cycles, thereby modulating global ocean pro-ductivity and climate (136). For example, denitrification and anaerobic ammonia oxidation(anammox), two microbial pathways that utilize nitrogen compounds as alternative terminalelectron acceptors for respiration, can lead to a significant net loss of bioavailable nitrogento N2 (501). On the other hand, nitrification, an exclusively prokaryotic process by whichreduced nitrogen compounds are aerobically oxidized to nitrate for energy, plays a centralrole in soil productivity (375) and industrial processes such as wastewater treatment (520).Understanding the spatiotemporal dynamics of microbial metabolic processes is thereforecentral to understanding overall ecosystem biochemistry and towards optimizing the perfor-mance of microbially driven industrial processes (316).Until recently, difficulties in culturing and therefore characterizing the majority of microor-1Opening chapterganisms has been a bottleneck to microbial ecology (21, 203, 346). With the advent of high-throughput molecular techniques, notably marker-gene sequencing (353, 529), shotgun DNAsequencing (metagenomics; 534), shotgun RNA sequencing (metatranscriptomics; 157, 511)and mass-spectrometry based protein sequencing (metaproteomics; 298, 359), we are nowentering an new era of biological inference (169). These culture-independent techniquesgenerate massive amounts of data and provide unprecedented insight into the composition,metabolic potential and activity of microbial communities (98). However, the generationof these data is still rarely theory-driven and in most cases their mechanistic interpretationremains elusive (209, 241, 380, 401, 459). For example, while taxonomic community profilingcan reveal intriguing variation, for example along the ocean water column (460) or acrossseasons (540), the reasons for the observed taxonomic variation remain largely unknown be-cause the ecological role of most taxa is unknown and can vary strongly even between closelyrelated clades (3, 308).The mechanistic modeling of microbial communities is further complicated by the sheer diver-sity of microorganisms in any particular environment (377, 401). For example, a single gramof soil can harbor several thousands of different — and potentially interacting — microbialspecies (399). Conventional reductionist approaches would require a careful physiologicalcharacterization of each member species. However, even the simplest life forms cannot bestudied in isolation because the ability to catalyze different steps of metabolic pathways orsynthesize required biomolecules is partitioned across different organisms, and hence eachspecies inevitably only constitutes a small link in the overall reaction network sustaining lifein an environment (183, 200, 238, 311). An incorporation of each species into a comprehen-sive mathematical model similar to some mechanistic “macro-ecological” models (230, 412)is thus impractical for most natural microbial communities, although such approaches havebeen suggested in the past (258). In most cases, a different approach is needed for describingmicrobial metabolic processes at ecosystem scales.Despite an enormous microbial diversity, the bulk of global biogeochemical fluxes is drivenby a core set of metabolic pathways, which have evolved and proliferated in response tovarious redox disequilibria available for sustaining life (116). Through time and notablyvia horizontal gene transfer, these pathways have spread across microbial clades that canco-occupy or replace each other within metabolic niches (116, 270, 483). The growth ofeach clade is inevitably coupled to the activity of its energy-yielding pathways, and thisactivity is subject to environmental energetic and stoichiometric constraints such as theavailability of specific electron donors and acceptors (56). For example, the abundance andexpression of genes linked to nitrogen and sulfur metabolism in marine oxygen minimum2Opening chapterzones typically reflect the varying redox conditions along the water column (181, 448). On theother hand, microbial — notably prokaryotic — genomes rapidly lose pathways that are notrequired in their natural environment, presumably due to strong selective pressure for genomestreamlining (41, 149, 250, 334). It is therefore tempting to theorize that these pathways —or more precisely, the genes and operons encoding them — constitute largely independentunits of replication and selection (41, 93), and that environmental physicochemical conditionsprescribe the structure of the community metabolic network regardless of which organismshappen to occupy each metabolic niche (126). The taxonomic composition within each nichemay of course be subject to additional selection processes, such as tolerance to particularenvironmental stressors or susceptibility to specialist phage populations (397, 423), howeverthe resulting variation in taxonomic composition may have little effect on overall metabolicfunctioning. Such a “pathway-centric” paradigm, if applicable, would greatly simplify themodeling of microbially mediated processes at ecosystem scales and would provide holisticinsight into global biogeochemistry. Further, modeling microbial communities at the levelof pathways or genes would enable a direct integration of metagenomic, metatranscriptomicand metaproteomic sequence data, which to date remain largely unutilized in quantitativeecosystem models (258).1.2 Problem statementThe pathway-centric paradigm assumes that the metabolic function of a community some-how becomes decoupled from its specific taxonomic makeup, so that overall communitymetabolism is strongly controlled by physicochemical environmental factors alone, while ad-ditional processes shaping taxonomic composition have little effect on metabolic function.It is a priori unclear what conditions would promote such a decoupling and how appropriatea pathway-centric paradigm is for natural microbial communities. For communities withlow taxonomic richness, or in which certain pathways are only performed by a small set oforganisms, metabolic activity may depend strongly on the particular genomes present. Inparticular, multiple pathways co-occurring in the same genomes will not behave as inde-pendent replicating units, and this will likely lead to deviations from the pathway-centricparadigm. On the other hand, at the community level the potential presence of the samepathways in alternative configurations could enable pathway independence. Hence, func-tional redundancy, i.e., the presence of multiple alternative clades capable of filling a specificmetabolic niche, may promote the decoupling between the taxonomic makeup of a commu-nity and its metabolic function, however this remains to be tested.3Opening chapterA key prediction of the pathway-centric paradigm is that physicochemically similar environ-ments would promote similar metabolic functional community structure, while allowing forstrong taxonomic variation within individual functional groups. This prediction is indeedsupported by observations in engineered ecosystems, such as bioreactors exhibiting strongtaxonomic fluctuations while maintaining constant biochemical performance (122, 495), al-though it remains largely untested for natural microbial communities (25). More generally,one would expect that variable environmental conditions correlate more strongly with com-munity function than with taxonomic composition, especially taxonomic composition withinfunctional groups. An analysis of metagenomes from the Global Ocean Survey (406) indeedrevealed strong correlations between pairwise environmental differences on the one hand,and pairwise metagenomic (but not taxonomic) community differences on the other hand(381). Similarly, bacterial community composition on the macroalgae Ulva australis wasbest explained in terms of metagenomic content rather than species content (52). Thesefindings are consistent with a pathway-centric paradigm, however they do not explicitlyaddress the taxonomic composition within individual functional groups, perhaps becauseassigning metagenomic sequences to specific taxa is a notoriously hard problem (471; which,as we show in Chapter 2, can be circumvented).Even if not absolutely accurate, a pathway-centric paradigm would constitute an elegantand potentially insightful null model for microbial ecology because, as discussed above, itmakes concrete predictions about microbial community metabolism and its interaction withthe environment. Hence, the question is not whether a pathway-centric paradigm wouldbe true; indeed, the short answer is “No”. Rather, how much of the “truth” would wereally be missing in a pathway-centric paradigm, in which pathways are no longer associatedwith a specific host but constitute self-serving replicators that directly interact with theirenvironment? The real question of interest is, would such a paradigm be “illuminatingand useful?” (43) and if so, under which conditions? For example, it may seem intuitivethat a high functional redundancy would promote the decoupling between environmentally-driven pathway dynamics on the one hand, and the particular taxonomic composition of acommunity on the other hand, but this remains to be rigorously examined.Further, any pathway-centric description of microbial communities would only capture partof the full story because it would make no statement about the assembly within individ-ual metabolic guilds, which may be driven by a multitude of additional mechanisms. Suchmechanisms may include biotic interactions such as competition or predation (258, 276, 455),random population drift (349), random colonization order (“lottery effects” 52), spatiallylimited random dispersal (306) and microbial chemical warfare (237). For example, adapta-4Opening chaptertion of bacteriophages to specific hosts can strongly influence bacterial species composition(423) and promote variation of microbial communities through so-called “killing the winnerdynamics” (397, 462). Further, trade-offs between environmental stress tolerance and com-petition may lead to additional environmental filtering within functional groups (159, 362).The question then arises, at what point do these additional processes significantly influencecommunity metabolism?Apart from questions regarding the appropriateness and limitations of a pathway-centricparadigm, questions also emerge regarding the precise mathematical formulation of an eco-logical theory for microbial metabolic pathways. Specifically, how exactly would an envi-ronment “determine” the distribution and activity of particular pathways, and how shouldthe current activity of pathways feed back to the “population dynamics” of these pathways?Further, how could multi-omic (metagenomic, metatranscriptomic and metaproteomic) se-quence data be quantitatively incorporated into such models? Inspiration may be taken fromexisting ecosystem models in which broad microbial processes are represented by homogenousfunctional groups, such as photoautotrophs or detritivores. In these models, the activity offunctional groups is determined by resource-dependent responses, such as Michaelis-Mentenkinetics (211), and their growth is determined by simple biomass-per-substrate yield factors(193, 360). These “functional group models” are much coarser than potential descriptionsof metabolic networks at the pathway level, and this coarseness may explain why multi-omic data are yet to be incorporated in such models. Further, such models rarely accountfor energetic constraints of microbial metabolism because yield factors are considered to befixed model parameters. In reality, however, the energy that can be gained from a metabolicreaction depends on the specific physicochemical state of the local environment, and thuslocal reaction energetics strongly shape the structure of the microbial metabolic network(34, 56, 256). A recent biogeochemical model by Reed et al. (386), which predicts genegrowth rates based on the energy yield from their associated metabolic pathways, consti-tutes a first attempt to construct a thermodynamics-based pathway-centric model. Whilecompelling, the model by Reed et al. (386) stops short of a quantitative integration betweengeochemical and multi-omic sequence data. For example, while the model allowed for a qual-itative comparison between modeled gene production rates and selected transcript profiles, itdoes not provide any explicit mechanistic links (386). In fact, the lack of a quantitative val-idation against process rate measurements or other proxies for activity (e.g., proteins) begsthe question whether such pathway-centric models are adequate descriptions for microbialcommunity metabolism.5Opening chapter1.3 Objectives and overview of this dissertationIn this dissertation, I examine the appropriateness of a pathway-centric paradigm for mi-crobial ecology using microbial community profiling and mathematical modeling. Further,I examine concrete biogeochemical pathway-centric models for specific ecosystems, which Ievaluate using chemical and molecular sequence data.Specifically, in Chapter 2, I use metagenomic data from the Tara Oceans survey (460) toinvestigate the functional structure of bacterial and archaeal communities across the globalocean, as well as the taxonomic composition within each of 28 metabolically defined func-tional groups. For that purpose, I constructed a custom database for functional annota-tions of prokaryotic taxa (“FAPROTAX”) based on extensive literature search. Using thisdatabase, I find that environmental physicochemical conditions strongly predict the dis-tribution of microbial metabolic functional groups, but only poorly predict the taxonomiccomposition within individual functional groups, in line with a pathway-centric paradigm.In Chapter 3, I use metagenomic sequencing and 16S rRNA marker gene sequencing ofbacterial and archaeal communities within the foliage of bromeliad plants, a model system forcommunity ecology (117, 438), to show that similar aquatic environments can indeed sustainsimilar microbial metabolic networks, despite a highly variable taxonomic composition withinindividual metabolic functional groups. I then use statistical tools from community ecologyto elucidate the potential mechanisms shaping the taxonomic composition within functionalgroups. I find that deterministic biotic interactions and additional environmental filtering,but not random drift or dispersal limitation, likely underlie the observed taxonomic variationbetween bromeliad microbiomes.In Chapter 4, I present a computational framework (“Microbial Community Modeler”, shortMCM) that I developed for modeling microbial communities. This framework enables theconstruction of both pathway-centric models as well as cell-centric models (using genome-based metabolic cell models). I validate this framework by modeling previous laboratoryevolution experiments with Escherichia coli (188, 262), during which an ancestral straindiversified into two coexisting ecotypes. The models are able to reproduce the successionaldynamics in the evolution experiments, and yield detailed insights into the metabolic pro-cesses that drove bacterial diversification. Thus, apart from demonstrating the potential ofMCM, this work provides a unifying quantitative perspective on a multitude of co-culturingexperiments performed in our lab over the last 10 years.6Opening chapterIn Chapters 5 and 6 I use MCM to examine specific mechanisms by which biotic interactionsand functional redundancy could potentially affect community composition and metabolism,based on models for nitrifying as well as methanogenic bioreactors. I find support for theinterpretation that biotic interactions, such as competition (Chapter 5) and predation byphages (Chapter 6), may underlie a large part of the taxonomic variation within functionalgroups, reported in Chapters 2 and 3. Further, these models explicitly demonstrate howa higher functional redundancy can lead to a decoupling between functional stability andtaxonomic variability under constant environmental conditions.Taken together, the work presented in Chapters 2, 3, 5 and 6 provides strong support fora pathway-centric paradigm, according to which the distribution and activity of metabolicpathways is strongly determined by physicochemical conditions, whereas additional mecha-nisms shape the taxonomic composition within metabolic guilds. These results motivatedme to develop and test specific pathway-centric biogeochemical models for a series of naturalas well as engineered ecosystems.Specifically, in Chapter 7 I describe a biogeochemical model that I constructed for the oxygen-depleted water column in Saanich Inlet (540), a seasonally anoxic fjord with biogeochemistryanalogous to oxygen minimum zones (OMZs). The model describes the activity and distri-bution of individual microbial metabolic pathways involved in carbon, nitrogen and sulfurcycling in Saanich Inlet and integrates geochemical depth profiles, process rate measurementsas well as DNA, mRNA and protein sequence data.In Chapter 8, I present an alternative pathway-centric modeling framework, in which themetabolic activity of a microbial community is described purely in terms of reaction ratesand the “capacity” to perform particular reactions. The benefits of such a “reaction-centric”approach, when compared to models explicitly keeping track of cell (or gene) densities, isthe reduced number of physiological parameters required for constructing a model and ofbiotic measurements required to estimate the current state of a system. I validate thisapproach using data from previous bioreactor experiments (94, 109). Further, I examinehow the co-occurrence of multiple pathways in the same organisms, rather than in separateorganisms, can affect overall community metabolism and thus the accuracy of the pathway-centric paradigm.Methodological details for each chapter are provided as supplemental material in AppendicesA–G.7Decoupling between function and taxonomy in the ocean microbiomeChapter 2The decoupling of function and taxon-omy in the global ocean microbiome12.1 SynopsisHere we use statistical analyses of taxonomic and functional community profiles to deter-mine the factors that shape marine bacterial and archaeal communities across the globalocean. Through an extensive literature search we classified >30,000 microbial organismsinto metabolic functional groups, which allowed us to disentangle functional from taxonomiccommunity variation. We find that environmental conditions strongly influence the distri-bution of functional groups by shaping metabolic niches, but barely influence the taxonomiccomposition within individual functional groups. Hence, the bulk of environmentally drivenvariation in community composition is attributable to functional properties, while the re-maining variation is enabled by a high global functional redundancy across taxa.2.2 IntroductionMicrobial communities power global biogeochemical cycling and form the most importantinterface between abiotic and biotic processes on Earth (116). Bacteria and Archaea, inparticular, drive marine nitrogen and sulfur cycling, thereby modulating global ocean pro-ductivity and climate (136). Elucidating the processes shaping microbial communities overspace and time presents a missing link towards understanding the integrated biotic-abioticsystem we call our planet, and is key for predicting how biogeochemical cycles will changewith changing environmental conditions.The majority of microbial biogeographical studies focus on taxonomic community compo-1A version of this chapter has been published in Science (see the Preface for author contributions): Louca,S., Parfrey, L.W., Doebeli, M. (in press). Decoupling function and taxonomy in the global ocean microbiome.Science. 353:1272–1277. DOI:10.1126/science.aaf4507.8Decoupling between function and taxonomy in the ocean microbiomesition (306). Taxonomic community profiling can reveal intriguing variation between en-vironments, and functional differences between organisms are generally thought to causethese patterns. Distantly related microbes, however, can often perform similar metabolicfunctions and, reciprocally, closely related taxa may occupy separate metabolic niches (308).This leads to a disconnect between taxonomic community structure and function (122, 466).As a consequence, the ecological reasons for the observed taxonomic variation usually re-main unknown. Other studies directly estimate functional potential based on communitygene content using environmental shotgun sequencing–or metagenomics–and have indeed re-vealed strong correlations between the distribution of particular metabolic pathways andenvironmental conditions (99, 181, 381). This suggests that environmental conditions shapethe functional potential of microbial communities in terms of community gene content byconstraining metabolic niches. However, it is yet unknown how this relates to communityassembly rules, and which aspect of the variation in taxonomic composition is relevant toecosystem functions. In addition to these niche effects (also known as environmental fil-tering), microbial populations are subject to complex community-level processes such aspredation or mutualistic interactions (258, 455), as well as to potential limits to their dis-persal across spatial scales (306). Given this complexity, it is important to establish basicprinciples determining microbial community composition.Here we present an analysis of over 100 bacterial and archaeal communities across the globalocean, combining taxonomic and functional community profiling to elucidate the role ofenvironmental filtering, global functional redundancy and dispersal limitation in shapingnatural microbial communities. Taxonomic profiles were generated based on shotgun DNAsequences of the 16S ribosomal gene, a standard marker gene in microbial ecology. Func-tional profiles were generated by associating individual organisms with metabolic functionsof particular ecological relevance, such as photoautotrophy and sulfate respiration, using anannotation database that we created through extensive literature search. This information,which we validate by comparing it to metagenomic gene profiles, allowed us to explore multi-ple facets of microbial community structure — taxonomic composition, metabolic functionalpotential and taxonomic composition within individual functional groups — in relation toenvironmental conditions and geographical location.9Decoupling between function and taxonomy in the ocean microbiome2.3 Environmental conditions mainly affect microbialfunctionTo assess the effects of environmental conditions on various aspects of community structure,we performed regression and correlation analysis of the relative abundances of metabolicfunctional groups, as well as the proportions of various operational taxonomic units (OTU)within each functional group, against 13 key abiotic oceanographic variables that includeddissolved oxygen, salinity, temperature and depth (Table A.1). Both regression and corre-lation analyses generally revealed that environmental conditions had very strong effects onthe functional profiles of microbial communities, but only minor effects on the taxonomiccomposition within each functional group. In particular, the cross-validated coefficient of de-termination (R2cv, a measure of the predictive power of a model) for the relative abundancesof most functional groups greatly exceeds the average R2cv achieved for the OTU proportionswithin the same groups (Fig. 2.1A). Similarly, correlations between relative functional groupabundances and environmental variables are generally greater in magnitude, compared tothe correlations between OTU proportions within each group and environmental variables(Fig. 2.1B). These differences persist even when OTUs are combined at higher taxonomiclevels (e.g., genus, family or order; Figs. A.1 and A.2). Hence, the poor correlation betweenthe taxonomic composition within functional groups and environmental conditions is not dueto a sub-optimal choice of taxonomic resolution, but rather reflects a lack of environmentaleffects on the non-functional variation in community composition.Regression modeling of taxonomic profiles of entire communities (at any taxonomic resolu-tion) against environmental variables achieved an average R2cv (Fig. 2.5) that is lower thanthe R2cv for the relative abundances of most functional groups, but higher than the mean R2cvachieved for the taxonomic compositions within functional groups. This further supports theinterpretation that deterministic environmental effects on function only partly shape over-all taxonomic community structure, due to taxonomic variation within functional groupsthat is much less affected by environmental conditions. Accordingly, clustering samples bytaxonomic as well as functional community composition (Bray-Curtis dissimilarity metric,Figs. 2.2BC and A.3) shows that a distinction between water column zones based on func-tion is comparable in strength to a distinction based on taxonomy. In fact, the fraction offunctional groups exhibiting statistically significant segregation between water column zones(e.g., mesopelagic vs surface) is usually higher than the fraction of significantly segregatedOTUs or higher taxa (Figs. 2.2E–G). Hence, the bulk of deterministic variation in commu-nity composition across different zones is well captured by the variation of purely functional10Decoupling between function and taxonomy in the ocean microbiomeproperties.These results strongly suggest that environmental conditions influence microbial communitystructure in the global ocean primarily by shaping metabolic niches. In fact, we find that theproductivity of a particular metabolic niche – represented here by the relative abundanceof a functional group – generally only weakly influences the taxonomic composition withinthat niche (Fig. 2.3B). The importance of niche effects in structuring marine microbialcommunities is further underlined by our finding that OTUs sharing a higher number offunctions tend to co-occur more frequently (Fig. 2.2H). In contrast, if competition andspecies assortment were dominant forces we would expect lower co-occurrences among OTUswith more shared functions. In a similar way metabolic niche effects were shown to dominatehuman gut microbiome assembly (271), suggesting that this may be a general pattern innatural microbial communities.The decoupling between environmental conditions and niche productivity on the one hand,and the taxonomic composition within individual niches on the other hand, is consistent withprevious smaller-scale observations. For example, in a wastewater treatment plant the ratio ofaerobic ammonia oxidizing bacteria and heterotrophic bacteria remained constant over time,while the taxonomic composition within each functional group varied markedly and appearedto be only weakly explained by environmental factors (349). Moreover, metatranscriptomicsin two distinct ocean regions revealed strongly conserved diurnal cyclic succession patternsof community gene expression, despite largely non-overlapping taxonomic affiliations of tran-scripts between the two regions (25). Our work suggests that these previous observations arein fact just the tip of an iceberg. Energetic and stoichiometric constraints generally driveocean microbial metabolic activity, but not the identity of the microbes involved in thatactivity.2.4 Causes of variation within functional groupsIf environmental conditions mainly interact with functional community structure, the ques-tion arises as to what drives the taxonomic variation within individual functional groups.High functional redundancy on a global ocean scale (Fig. 2.3) presumably enables the highdiversity within functional groups and a decoupling between taxonomic composition andconstraints on function. It is possible that the “unexplained” taxonomic variation withinfunctional groups may be partly due to unconsidered physicochemical variables combinedwith unconsidered phenotypic differences, driving location-dependent growth differences be-11Decoupling between function and taxonomy in the ocean microbiometween competing species. However, it is unlikely that latent environmental variables alonecould explain the widespread apparent randomness seen within such a large number of func-tional groups (Fig. A.7). Alternatively, spatially limited dispersal is often suggested as acause of random distribution patterns not attributable to environmental conditions. Disper-sal limitation has been shown to be important for larger organisms such as plants (487), butits importance for microorganism is generally thought to be low compared to environmentalfiltering, and to depend strongly on the environments considered (e.g., lakes vs open ocean;(29, 306, 466)). To test whether the variation in taxonomic composition in the ocean com-munities considered here can be explained by dispersal limitation, we compared geographicalsample distances to dissimilarities in functional and taxonomic community composition, aswell as to differences in composition within individual functional groups. Mantel correlationtests between geographical distances and various dissimilarity metrics revealed no significantpositive correlations, neither at the level of the whole community (Figs. 2.4AB) nor withinfunctional groups (Figs. A.4). The apparent absence of a distance-decay of similarity is alsoreflected in ordination plots (Figs. 2.4CD), in which sample clustering appears to be inde-pendent of ocean region, with the notable exception that polar samples are clearly distinct.These observations suggest that range-limited random dispersal does not play a significantrole in shaping taxonomic community differences across the oceans, even when restrictedto within metabolic niches. In principle, dispersal limitation may cause a distance-decayof similarity at much smaller geographical scales than the ones considered here, and thusremain undetected by our analysis (307). This scenario, however, is unlikely for marine mi-croorganisms that can be dispersed at global scales by large ocean currents (125), and thatindeed appear to be recruited from a global marine microbial seed bank (147).Instead, the unexplained taxonomic variation may be driven predominantly by community-level processes such as metabolic interdependencies, chemical warfare, or predation by virusesand eukaryotes. Previous work has highlighted the central role of such processes in micro-bial community assembly (258, 276, 455). For example, adaptation of bacteriophages tospecific hosts can cause continuous replacement of competing species (423) and promotespatial structuring of microbial communities (462). Hence, functional community structureand taxonomic composition within functional groups appear to constitute roughly comple-mentary axes of variation, with the former being affected more strongly by environmentalconditions and the latter shaped by community-level processes. In this perspective, the ob-served overall microbial community structure would result from the superposition of bothenvironmental constraints and community-level processes (276, 455).12Decoupling between function and taxonomy in the ocean microbiome2.5 Beyond taxonomic profilingTaxonomic microbial community profiles can reveal intriguing differences between environ-ments, for example across mammalian guts (335) or along the ocean water column (460).However, microbial taxonomy and metabolic potential can exhibit significant inconsistencies(3); the extent of these inconsistencies is strongly trait-dependent and often considerable(308). Inconsistencies between taxonomy and metabolic potential are driven by diverse evo-lutionary processes including adaptive loss of function (165, 334) and metabolic convergenceof distinct clades accelerated by frequent horizontal gene transfer (116). As a result, manymetabolic phenotypes can be shared by distant microbial clades (Figs. 2.3AC) and, recipro-cally, members of the same clade can fill separate metabolic niches (165). This misalignmentbetween taxonomy and metabolic potential complicates the mechanistic interpretation oftaxonomic biogeographical patterns and the development of predictive ecological theories(381). Thus, while modern marker-gene sequencing techniques can yield detailed taxonomiccommunity profiles, they suffer from the crucial limitation that these high-dimensional pro-files are obtained along axes of sub-optimal ecological relevance. Sophisticated statisticaltechniques such as principal coordinate analysis or multidimensional scaling are often usedto reduce the dimensionality of these data, in the hope of detecting axes of variation bearingan ecological meaning (382). Alternatively, detected species may be combined at highertaxonomic levels that more closely resemble the depth at which traits vary across lineages.However, the optimal taxonomic level is highly trait-dependent (308).Here we have shown that a straightforward but thorough binning of organisms into functionalgroups can reveal strong patterns relevant to biogeochemistry and ecosystem function. Inparticular, the comparison of functional profiles to environmental variables (Fig. 2.1) yieldsinsight into the processes driving variation in community composition along geochemical gra-dients and, reciprocally, gives information about the effects of that variation on ecosystemprocesses. For example, our correlation analysis revealed a particularly strong influence ofwater depth on function (Fig. 2.1B), which is also reflected in the clear separation of themesopelagic zone from the upper sunlit zones (surface and deep chlorophyll maximum; Figs.2.2AB). This functional zonation with depth is consistent with metagenomic profiles of thesame samples (Fig. A.5). The partitioning of function along the water column is presumablydriven by depth-dependent factors known to influence microbial life history strategies andproductivity, such as light intensity and temperature (236). Furthermore, in deeper zonesoxygen becomes a limiting resource for respiration that can be partly replaced by alterna-tive electron acceptors such as nitrate and sulfate (531). Such redox gradients underlie the13Decoupling between function and taxonomy in the ocean microbiomespatial zonation of metabolic pathways frequently observed in gene-centric metagenomic,metatranscriptomic and metaproteomic profiles across ecosystems (99, 181). Accordingly,our functional profiles reveal that deeper samples exhibit an over-representation of severalgroups capable of fermentation as well as nitrate and sulfate respiration (Fig. 2.2A). Thesealternative metabolic modes lead to an increased community richness (in terms of detectedfunctional groups and OTUs), especially in the mesopelagic zone (Fig. 2.2D, after rarefyingat equal sequence count). While an increase of taxonomic richness with depth has beennoted previously (368, 460), our analysis now provides evidence that this richness gradientis strongly related to the number of available metabolic niches. Hence, the systematic in-tegration of taxonomic and functional information demonstrated here can help answer longstanding questions regarding the relation between microbial taxonomic and functional diver-sity and variability in a given environment (283). Similarly, functional group co-occurrencepatterns (e.g., Fig. A.6) may trigger novel hypotheses on the interaction of metabolic path-ways at ecosystem scales.The translation of taxonomic information into functional profiles based on phenotypic char-acterizations, as demonstrated here, provides a powerful complement to gene-centric metage-nomics, the current de facto standard for functional community profiling (486). Metagenomicprofiles suffer from the conceptual limitation that community gene content generally doesnot directly and unambiguously translate to functional potential (376). Phenotype-basedprofiles on the other hand have the potential to resolve ambiguities inherent to metage-nomics, because experimental evidence is used to identify the actual metabolic capabilitiesof organisms (see supplementary text for examples). We emphasize that the full potentialof phenotype-based profiling remains underutilized due to our current inability to associateseveral detected taxa with any functional group. For example, a large fraction of the ubiqui-tous but poorly studied Thaumarchaeota phylum potentially involved in ammonia oxidation(441) was excluded from our functional annotations. Similarly, microeukaryotes such as fungilikely contribute to some of the considered metabolic functions, including cellulose and chitindegradation (91). The restriction to Bacteria and Archaea may thus explain our inabilityto relate the distribution of these functional groups to environmental conditions (Fig. 2.1).Future functional profiling will greatly benefit from an inclusion of eukaryotic microorgan-isms and from an effort to phenotypically characterize underexplored (e.g., hard-to-culture)clades of potentially high relevance to ecosystem functioning.14Decoupling between function and taxonomy in the ocean microbiome2.6 ConclusionsDespite an enormous microbial diversity, the bulk of global biogeochemical fluxes is drivenby a core set of metabolic pathways that evolved in response to past geochemical condi-tions (116). Through time, these pathways have spread across microbial clades that canco-occupy and compete within metabolic niches. The decoupling of taxonomic compositionwithin metabolic niches from environmental conditions, as demonstrated here, suggests thatBaas Becking’s famous hypothesis “everything is everywhere and the environment selects”(28) should be refined towards the more conservative formulation that “every function iseverywhere, and the environment selects” (381). This realization has implications for theinterpretation of differences in community structure across environments or across time, be-cause differences in taxonomic composition that do not affect functional composition havelittle relevance to ecosystem processes (122). Functional descriptions of microbial com-munities should therefore constitute the baseline for interpreting biogeographical patterns,particularly across transects where geochemical gradients shape microbial niche distribution(162, 386). The remaining variation within functional groups can then be analyzed separatelyto elucidate additional community assembly processes. As the number of phenotypicallycharacterized organisms increases, the potential of functional annotations of microbial taxacan only further improve. An incorporation of global microbial functional profiles — andtheir response to potentially changing environmental conditions — into future biogeochem-ical models will greatly benefit reconstructive and predictive modeling of Earth’s elementalcycles.15Decoupling between function and taxonomy in the ocean microbiomeBAapparent oxygen utilizationdaily insolationdepthdistance to thermoclineduration of daynitrateoxygen pHphosphatesalinitysilicatetemperaturetotal inorganic carbonapparent oxygen utilizationdaily insolationdepthdistance to thermoclineduration of daynitrateoxygen pHphosphatesalinitysilicatetemperaturetotal inorganic carbonR2cv correlations to environmental variablesrelative functional group abundances OTU proportions within functional groupsfunc. groups OTU prop.1 0 1Figure 2.1: Linking functional and taxonomic composition to environmental conditions.(A) Cross-validated coefficients of determination (R2cv) for relative functional group abundances(left bars), as well as for OTU proportions within each functional group (right bars), achieved byregression models with environmental predictor variables. (B) Spearman rank correlations betweenenvironmental variables and relative functional group abundances (left box) or OTU proportionswithin each group (right box). Circle surface area and color saturation are proportional to theabsolute correlation.16Decoupling between function and taxonomy in the ocean microbiomeSRF MESDCM MIXBCStress: 0.067Stress: 0.25DAOTU richnessfunctional richnessEphylaclassesordersfamiliesgeneraOTUsfunc. groupsfraction segregated0. vs MES DCM vs SRFphylaclassesordersfamiliesgeneraOTUsfunc. groupsphylaclassesordersfamiliesgeneraOTUsfunc. groupsF G HMES vs SRFFigure 2.2: Environmental filtering of microbial communities in the global ocean. (A)Functional community profiles, with samples ordered according to water column zone (SRF: surfacewater layer; DCM: deep chlorophyll maximum; MIX: subsurface epipelagic mixed layer; MES:mesopelagic zone). A darker color corresponds to a higher relative abundance of a functionalgroup. (B,C) Metric multidimensional scaling of microbial communities (one point per sample),based on Bray-Curtis dissimilarities in terms of (B) functional groups and (C) OTUs. Points ingreater proximity correspond to more similar communities. (D) Community richness in terms offunctional groups and OTUs (one point per sample), after rarefaction. (E–G): Fraction of OTUs,taxa or functional groups significantly segregated between water column zones (E: DCM vs MES,F: DCM vs SRF, G: MES vs SRF). (H) Box-plots of pairwise Spearman rank correlations betweenrelative OTU abundances, depending on the number of shared functions. Vertical bars show 66%percentiles.17Decoupling between function and taxonomy in the ocean microbiome 1 10 100 1000 10000 100000L2 L3 L4 L5 L6raw-OTUsnumber of taxa represented per functionCOTUgenusfamilyorderclassphylumAOTU proportionstotal aerobic ammonia oxidizersBtaxa per functionFigure 2.3: Functional redundancy in the global ocean microbiome. (A) Number ofbacterial and archaeal taxa represented within each functional group (one point per group), atvarious taxonomic levels. At the species and genus level, aerobic chemoheterotrophs present by farthe richest group. (B): OTU proportions within the group of aerobic ammonia oxidizers (one colorper OTU). Samples are sorted according to the relative abundance of the entire functional group.For OTU proportions within other functional groups, see Fig. A.7. (C) Association of functionalgroups (columns) with members of microbial classes (rows). A darker color corresponds to a higherrelative contribution of a class (in terms of the number of OTUs) to a functional group. Rows andcolumns are sorted accorded to the number of non-zero entries within them.18Decoupling between function and taxonomy in the ocean microbiomeACBD5 10 15 200geographical distance (10³ km)Stress: 0.062 Stress: 0.20⇢ = 0.09,P = 0.17⇢ = 0.03,P = 0.385 10 15 200geographical distance (10³ km)Figure 2.4: Community differences vs geographical distance. (A,B): Bray-Curtis dissimilar-ities between microbial communities, compared with geographical distances (one point per samplepair). Community dissimilarities are calculated in terms of (A) relative functional group abun-dances and (B) relative OTU abundances. Plot imprints indicate Spearman rank correlations andtheir statistical significance. Samples are restricted to the mesopelagic zone; for other water columnzones see Fig. A.8. For other taxonomic resolutions (e.g., at genus or family level) see Fig. A.13.(C,D): Metric multidimensional scaling of microbial communities (one point per sample), based onBray-Curtis dissimilarities in terms of (C) functional groups (C) OTUs. Points in greater proximitycorrespond to more similar communities. Points are shaped and colored by ocean region.19Decoupling between function and taxonomy in the ocean microbiomeclassorderfamilygenusspecies0. 2cvMean regression R2cv of full OTU table vs metadataFigure 2.5: Regression of taxonomic community composition. Mean cross-validated coef-ficients of determination (R2cv) for relative taxon abundances at the community level (for varioustaxonomic resolutions), achieved by regression models with environmental predictor variables.20Functional stability and taxonomic variability in bromeliad microbiomesChapter 3Functional stability despite high taxo-nomic variability across microbial com-munities in bromeliad tanks13.1 SynopsisAccording to the pathway-centric paradigm suggested in the previous chapter, the metabolicfunctional structure of microbial communities is strongly shaped by environmental condi-tions constraining the metabolic pathways that sustain growth. Apart from metabolic nicheeffects, however, several additional processes such as biotic interactions or dispersal limi-tation can influence overall community composition. Thus, similar habitats could exhibitvery different microbial communities despite a similar functional structure. To test thisprediction, we determined the bacterial and archaeal community composition in 22 repli-cate “miniature” aquatic ecosystems, contained within the foliage of wild bromeliads. Weused 16S rDNA marker gene sequencing for inferring the taxonomic composition within 9metabolically defined functional groups, as well as shotgun environmental DNA sequencingfor estimating the overall abundances of these groups. We find that all bromeliads exhibitremarkably similar functional community structure, but a highly variable taxonomic com-position within individual functional groups. Using a variety of statistical analyses fromcommunity ecology, we find evidence that the taxonomic turnover within functional groupsis driven by a combination of environmental filtering and biotic interactions. We find noeffect of dispersal limitation or random population drift on community composition, andconclude that complex deterministic processes — rather than neutral assembly — shapecommunity variation within functional groups.1A version of this chapter is currently under review for publication (see the Preface for author contribu-tions): Louca, S., Jacques, S.M.S., Pires, A.P.F., Leal, J.S., Srivastava, D.S., Parfrey, L.W., Farjalla, V.F.,Doebeli, M. (in review). Functional stability despite high taxonomic variability across microbial communities.21Functional stability and taxonomic variability in bromeliad microbiomes3.2 IntroductionMicrobial metabolism drives the bulk of biogeochemical fluxes in virtually every naturalecosystem and has shaped Earth’s surface chemistry through geological time (116). Nat-ural microbial communities can display complex variation in composition across space ortime, such as through the ocean water column (460) or across seasons (540), and this varia-tion can have profound effects on ecosystem functions (500, 540). The mechanisms drivingthis variation remain poorly understood, because the entanglement of multiple mechanismsseverely complicates the identification of direct causal relationships (258). Potential mech-anisms of microbial community assembly suggested previously include adaptation to localenvironmental conditions (“environmental filtering” 371), biotic interactions such as preda-tion or syntrophy (258, 276, 455), random population drift (349), random colonization order(“lottery effects” 52) and spatially limited random dispersal (306). Recent work suggeststhat the bulk of environmentally driven variation in the global ocean microbiome is closelyrelated to its metabolic function, while the taxonomic variation within individual functionalgroups is only poorly explained by environmental conditions (289, 381). This points to-wards a promising and elegant paradigm for microbial ecology, in which community functionis strongly shaped by energetic and stoichiometric constraints such as the availability oflight or electron acceptors for respiration (25, 381), while the composition within functionalgroups is modulated by additional deterministic or stochastic mechanisms. According to thisparadigm, one would predict that physicochemically similar environments will promote sim-ilar functional community structure, while allowing for strong taxonomic variation withinindividual functional groups. This prediction is supported by observations in engineeredecosystems such as bioreactors exhibiting strong taxonomic fluctuations while maintainingconstant biochemical performance (122, 495), but it remains largely untested for naturalmicrobial communities.Here we test this prediction in natural prokaryote (i.e., bacterial and archaeal) communitiesacross 22 replicate natural aquatic environments, harbored within the foliage (“tanks”) ofbromeliads in the Jurubatiba coastal sand dune National Park, Brazil (Fig. 3.1). Bromeliadtanks accumulate rain water and organic material (such as dead leaves) from their sur-rounding environment, and intense decomposition of this material sustains a high richnessof microorganisms and macroinvertebrates (152, 395). Apart from constituting regional bio-diversity hotspots, bromeliads are often used as “miniature” model systems for microbialand invertebrate ecology (117, 438). Microbial communities in bromeliads tend to be highlydistinct from the surrounding environments (e.g., soil), exhibiting a strong shift towards22Functional stability and taxonomic variability in bromeliad microbiomesfermenting and methanogenic organisms (151, 152, 304).To ensure a high similarity between systems, we only surveyed mature plants of a singlebromeliad species (Aechmea nudicaulis) from the same region (296). We used ampliconDNA sequencing of the 16S ribosomal gene, a standard marker gene in microbial ecology(536), to estimate the taxonomic richness and variability within 9 metabolically definedfunctional groups of potential ecological importance in bromeliads, such as fermentation,dissimilatory reduction of nitrogen compounds (“nitrogen respiration”) and methanogenesis(151, 152, 345). Detected taxa were assigned to these metabolic functional groups, wheneverpossible, based on available literature. In parallel, we used environmental shotgun DNAsequencing (metagenomics) to estimate the overall abundance of each functional group interms of one or multiple proxy genes. We find that all communities exhibited a remarkablysimilar functional composition, which contrasts with a highly variable taxonomic compo-sition within individual functional groups. Further, we examined phylogenetic communitystructure and species co-occurrence patterns, and compared community composition to abi-otic environmental conditions and geographical location, to elucidate potential mechanismsdriving this variation within functional groups.3.3 Results and discussion3.3.1 Functional stability contrasts with taxonomic variabilityWe found that the metabolic functional potential of tank prokaryotic communities, as mea-sured by gene abundance profiles, is consistent across bromeliads (Fig. 3.2A,B). This con-sistency in metabolic functional potential is presumably promoted by strong stoichiometricbalancing between coupled metabolic pathways, the majority of which serve to break downlarge organic compounds to simpler organic molecules and gradually move electrons fromreduced organic carbon to terminal electron acceptors such as protons (H+), carbon dioxide(CO2), sulfate (SO2−4 ), nitrate (NO−3 ) and oxygen (O2) (56). These metabolic pathwaysare distributed across multiple organisms and link the breakdown of dead organic mattercaptured in the bromeliads to the eventual release of CO2 (22), methane (CH4)(304) and pre-sumably molecular nitrogen (N2). Each step along these pathways thus appears to sustainhighly constrained microbial productivities, resulting in specific proportions of metabolicfunctional groups that are conserved across bromeliads.On the other hand, we find that the taxonomic composition within individual functional23Functional stability and taxonomic variability in bromeliad microbiomesgroups is highly variable across bromeliads, both in terms of the occurrence of operationaltaxonomic units (OTU, at 99% 16S rDNA similarity, see the Methods for justification) aswell as their relative abundances (Figs. 3.2C–K). For example, within any given functionalgroup, any two bromeliads share only ∼20–60% of their OTUs (Table B.1), and this overlapis significantly lower than would be expected solely due to insufficient sampling effort (P <0.001). In fact, within any given functional group, OTUs detected in all of the samples(“core microbiome”) only make up ∼0–1% of total OTUs across all samples (“regional pool”).Further, coefficients of variation for OTU proportions within functional groups are typicallyan order of magnitude higher than coefficients of variation of relative gene abundances (∼2–3vs ∼0.2–0.6, respectively; Table B.2). This taxonomic variability within functional groupspersists to a considerable extent even when OTUs are combined at higher taxonomic levels(e.g., genus, family, order or class level; Figs. B.1, B.2, B.3 and B.4) and is in strongcontrast to the much more stable relative gene abundances. Hence, in each bromeliad thesame metabolic niches appear to be occupied by very different species assemblages, even if theoccupancy of each niche — in terms of its relative abundance — remains almost unchanged.This variability within metabolic niches explains the previously observed strong variationin overall microbial community composition across bromeliads (117) and underlines the factthat high taxonomic variability between replicate ecosystems need not imply differences inmetabolic function.3.3.2 Causes of variation within functional groupsThe strong taxonomic variability within functional groups is presumably enabled by a highfunctional redundancy in the regional microbial species pool (Fig. 3.3), allowing for potentialcolonization of each bromeliad by multiple metabolically similar OTUs. Although we do notyet know the precise mechanisms determining the subset of OTUs that eventually establishin each bromeliad and within each metabolic niche, we can discount certain explanations.For example, random population drift combined with random dispersal within the sampledarea would result in negligible associations between OTUs and be perceived as a randomsubsampling of the regional OTU pool. To test this scenario for each functional group, wecompared OTU co-occurrences, as defined by their “C-score” (a measure for mutual OTUsegregation, averaged over all OTU pairs; 159), to a null model corresponding to randomOTU sampling from the functional group’s regional pool. The null model preserved thetotal number of OTUs per sample and per functional group as well as the total number ofsamples containing each OTU, in order to avoid spurious co-occurrence patterns caused bydifferences in OTU richness or OTU frequency. Within 6 out of 9 functional groups (aero-24Functional stability and taxonomic variability in bromeliad microbiomesbic chemoheterotrophs, cellulose degraders, fermenters, nitrogen respirers, photoautotrophsand sulfate respirers), we find that OTUs are significantly co-segregated with respect toeach other, that is, C-scores are higher than expected by chance (P < 0.05, Table B.3).The remaining functional groups also display OTU segregation, although differences fromthe null model are not statistically significant. This general segregation of OTUs beyondthe null expectation rules out random subsampling and drift as important causes of OTUturnover within functional groups. We note that, when combined with spatially limited dis-persal, neutral population drift could in principle produce non-random OTU co-occurrencepatterns (490), because bromeliads in greater proximity would tend to exhibit more similar(i.e., correlated) community composition. However, spatially limited dispersal is likely notimportant at this scale: We did not find any significant correlations between geographicaldistance and community dissimilarity for any of the functional groups and for any of theconsidered dissimilarity metrics (Mantel tests with Spearman rank correlations; Fig. B.5;detailed results in Table B.4). In fact, bromeliads at opposite ends of our study site of-ten contained more similar communities than immediately adjacent bromeliads (Fig. B.6).These results are consistent with previous work that found negligible effects of spatial dis-tance on bacterial, zooplankton and macroinvertebrate communities in bromeliads at similarspatial scales (117). Hence, the OTU co-occurrence patterns observed here likely reflect adeterministic mutual exclusion between OTUs that is potentially caused by environmentalfiltering or biotic interactions (159), rather than spatially correlated or uncorrelated randomassembly.To further verify the importance of deterministic assembly processes, we examined the phy-logenetic structure within functional groups in each sample. Specifically, within any givenfunctional group, we assessed whether OTUs found in the same sample tend to be phyloge-netically underdispersed (“clustered”) or overdispersed in terms of their mean phylogeneticdistance, when compared to the expectation based on random OTU sampling from the re-gional pool of that functional group. Conventionally, underdispersion is interpreted as asign of environmental filtering acting similarly on closely related clades (196), while overdis-persion is interpreted as a sign of increased competition between close relatives, althoughalternative mechanisms, such as specialist predation by phages, may also create non-randompatterns (362). Of the 9 functional groups, we find that 4 show a significant tendency to-wards underdispersion and 2 functional groups demonstrate overdispersion (P < 0.05, Fig.3.4). The detection of a statistically significant phylogenetic structure in 6 out of 9 functionalgroups is unlikely the result of a false positive detection rate (P < 0.000001). This supportsour previous interpretation that community assembly is generally not random within func-25Functional stability and taxonomic variability in bromeliad microbiomestional groups, but is subject to deterministic processes that are sensitive to phylogeneticrelationships. Note that an absence of phylogenetic structuring, on the other hand, doesnot rule out deterministic processes (362). Moreover, the fact that some groups exhibit phy-logenetic underdispersion, while others exhibit phylogenetic overdispersion, suggests thatdifferent ecological processes influence phylogenetic structure in different functional groups.In particular, the strong overdispersion of methanogens as well as methylotrophs suggeststhat competition may correlate strongly with relatedness in these groups and that otherfactors, such as environmental filtering, may be unimportant or only weakly correlate withphylogeny. On the other hand, frequent horizontal transfer of genes for the degradationof particular organic compounds reduces the correlation between phylogenetic relatednessand metabolic similarity in aerobic chemoheterotrophs and fermenters (308). This may ex-plain why in these two functional groups mechanism causing underdispersion — rather thanoverdispersion — seem to dominate.The above findings suggest that, although taxonomic composition within functional groupsis highly variable, it is not random in terms of OTU co-occurrences or phylogenetic relation-ships. This determinism might be caused by environmental filtering, by biotic interactions,or by a combination of these, such as trade-offs between environmental stress tolerance andcompetition (159, 362). For example, recent work demonstrates that microbial communitiescan exhibit complex but deterministic responses to extremely weak environmental fluctua-tions (130), and that environmental turnover — when carefully characterized — can explaincommunity turnover (371, 450). To determine whether environmental filtering partly drivesthe variation in OTU composition within functional groups, we examined the predictive abil-ity of an extensive set of physicochemical variables (overview in Table B.5). We consideredstandard limnological variables such as pH, salinity and multivariate characterization of dis-solved organic carbon, as well as other potentially important variables such as detrital volumeand vegetative cover (“shading”). Many of these variables are known to influence macroinver-tebrate communities in bromeliads (102, 279, 296). We find that a subset of environmentalvariables — including pH, salinity, detrital volume and shading — exhibit high and statis-tically significant correlations to relative OTU abundances within several functional groups,suggesting that these variables may be particularly influential (Fig. 3.5A). Regression mod-els generally exhibit low to moderate predictive power when compared against novel data,as indicated by cross-validated coefficients of determination (R2cv), although we note thatpredictive power varies greatly among different OTUs (Fig. 3.5B). We therefore concludethat the environmental variables considered here can explain some of the variation withinfunctional groups, but that additional factors are also important. In fact, the predictive26Functional stability and taxonomic variability in bromeliad microbiomespower of environmental variables is similarly low within functional groups showing eitherphylogenetic underdispersion or overdispersion (Fig. 3.4), indicating that the non-randomphylogenetic community structure may be shaped mostly by biotic interactions rather thanenvironmental filtering.Taken together, our results suggest that, in addition to environmental filtering, biotic inter-actions also play a significant role in shaping these communities while maintaining functionalsimilarity across bromeliads. The potential importance of biotic interactions, such as com-petitive exclusion, predation by phages or protists as well as metabolic interdependencies,in shaping microbial communities has been pointed out previously (258, 276, 455). For ex-ample, adaptation of bacteriophages to specific hosts can strongly influence bacterial speciescomposition (423) and promote spatial as well as temporal variation of microbial communi-ties (397, 462). Consequently, seemingly random taxonomic variation across locations mayresult from biotic interactions driving complex but deterministic population dynamics or,alternatively, from biotic interactions generating complex community responses to subtleenvironmental variation (130, 450, 451). This conclusion is consistent with previous findingsthat the distribution of cyanobacterial taxa across coexisting bromeliads was driven both byphysicochemical factors as well as protozoans and invertebrates (63). Here we have not con-sidered possible effects of invertebrates, although we note that we detected almost no insectsin the sampled bromeliads. Further, given the available data it is impossible to determinewhether microbial communities within single bromeliads exhibited high temporal variability,or if communities were near steady state. Previous work shows that even strongly controlledengineered ecosystems can exhibit high temporal fluctuations in microbial taxonomic com-position (122, 349) and that these fluctuations can be deterministic (130, 495). Hence, thehighly variable community profiles observed here could be mere “snapshots” along similarsuccessional trajectories far from steady state (285).3.4 ConclusionsHere we have shown that replicate natural ecosystems exhibit highly variable taxonomiccomposition of bacterial and archaeal communities, despite very similar metabolic func-tional structure. This points to a fundamental and important difference between func-tional and taxonomic community structure, which arises because mechanisms leading to aconvergence of functional structure (e.g., nutrient limitation, stoichiometric balancing be-tween coupled metabolic pathways) do not necessarily lead to a convergence of taxonomiccomposition. Reciprocally, strong taxonomic turnover may only weakly affect ecosystem27Functional stability and taxonomic variability in bromeliad microbiomesfunctioning (254, 517; but see 454). We suggest that functional community profiles, eitherbased on gene-centric metagenomics (255, 381) or on a functional classification of detectedtaxa (289), should be the baseline of microbial biogeographical studies, particularly in caseswhere geochemical gradients shape microbial niche distribution (381, 387). The residualvariation within individual functional groups can then be extracted and analyzed separately,as demonstrated here, in order to elucidate additional community assembly processes thatact in superposition to metabolic niche effects. Our analysis suggests that in bromeliadtank ecosystems the variation within individual functional groups is the result of multipledeterministic processes, including environmental filtering and biotic interactions, while ran-dom processes such as dispersal limitation or neutral population drift (427) appear to beless relevant. This is in line with recent work on the global ocean microbiome (289) andsuggests a general paradigm for microbial ecology. A careful distinction between functionalcomposition and taxonomic composition within functional groups thus enables deeper in-sight into microbial community assembly processes and will be an important step towards atruly mechanistic microbial ecology.Aechmea nudicaulis20 cmFigure 3.1: Bromeliad species used in this study Large picture: Aechmea nudicaulis, thebromeliad species considered in this study. The foliage forms a deep central cavity (“tank”, smallpicture) that accumulates rainwater and dead organic material, such as leaves from nearby trees.The decomposition of this material sustains a highly productive and diverse food web inside thetank.28Functional stability and taxonomic variability in bromeliad microbiomesmetagenomic profiles (proxy genes)OTU proportionsrelative gene abundancesOTU proportionsOTU proportionsA BDHC EIfermenters aerobic chemoheterotrophs nitrogen respirerssulfate respirersphotoautotrophsmethanogens ureolyticmethylotrophscellulolyticF GJ Kmetagenomic profiles (rare proxy genes) gene groups in Fig. A & Bbromeliad sample bromeliad sample bromeliad sampleFigure 3.2: Functional stability vs taxonomic variability. (A) Relative abundances of proxygenes in prokaryotic metagenomic sequences (genes grouped by function, one color per gene group,one column per sample). For details on associating genes with functions see the Methods. (B) Sub-plot of (A) focusing on the rarer genes for better illustration. (C–K) Prokaryotic OTU proportionswithin individual functional groups (one color per OTU, one column per sample, one plot perfunctional group), as determined from 16S rDNA sequences. Due to ambiguities in gene function,for some functional groups (D, H) we considered multiple proxy genes. For each functional group,proxy genes are indicated via color codes (corresponding to colors in A and B) next to the functionalgroup’s name. For more detailed metagenomic profiles see Supplemental Fig. B.10. For thetaxonomic composition within functional groups at higher taxonomic levels (genus, family or order)see Figs. B.1, B.2 and B.3.29Functional stability and taxonomic variability in bromeliad microbiomesaerobic chemoheterotrophy (225)fermentation (103)photoautotrophy (61)sulfate respiration (40)ureolysis (21)methylotrophy (24)nitrogen respiration (17)cellulolysis (14)methanogenesis (13)Figure 3.3: Functional redundancy in the regional OTU pool. Associations of functionalgroups (rows) with OTUs (columns), indicated by blue table cells. Functional groups are sortedaccording to their number of OTUs (indicated in brackets). Some OTUs were associated with morethan one functional group. For analogous plots at the genus, family and class level, see Figs. B.11,B.12 and B.13, respectively.standardized effect size of mean phylogenetic distanceaerobic chemoheterotrophs (P<0.001)cellulolytic (P=0.86)fermenters (P<0.001)methanogens (P=0.02)methylotrophs (P=0.011)nitrogen respirers (P=0.62)photoautotrophs (P<0.001)sulfate respirers (P<0.001)ureolytic (P=0.089)underdispersion (clustering) overdispersionFigure 3.4: Phylogenetic dispersion. Standardized effect sizes (SES) of mean phylogeneticdistances between OTUs within individual functional groups (one point per sample, one row perfunctional group), as an indicator of phylogenetic overdispersion (SES > 0) or underdispersion(SES < 0). The vertical line at zero corresponds to the expectation under the null model, and isshown for reference. Functional groups displaying statistically significant overdispersion or under-dispersion (i.e., a strong tendency towards positive or negative SES across samples, respectively)are highlighted in bold (P-values are given in brackets).30Functional stability and taxonomic variability in bromeliad microbiomesA Baverage absolute correlations coefficients of determinationR2cvFigure 3.5: Relating OTU proportions to environmental variables. (A) Average magnitude(i.e., average absolute value) of correlations between OTU proportions within functional groups andmeasured environmental variables (one column per environmental variable, one row per functionalgroup). A larger and darker circle corresponds to a larger average absolute correlation, and indi-cates a stronger relation between an environmental variable and a functional group’s taxonomiccomposition. Statistically significant correlations (P < 0.05) are written inside the circles. (B) Dis-tribution of cross-validated coefficients of determination (R2cv, a measure for a model’s predictivepower) for regression models of OTU proportions within each functional group using environmentalvariables as predictors (one box per functional group). Horizontal bars comprise 95% of the R2cvacross OTUs. The vertical grey line at zero is shown for reference.31Cell-metabolic models for microbial ecologyChapter 4Calibration and analysis of cell-metabolicmodels for microbial ecology14.1 SynopsisMicrobial ecosystem modeling is complicated by the large number of unknown parame-ters and the lack of appropriate calibration tools. Here we present a novel computationalframework for modeling microbial ecosystems, which combines genome-level cell models intoa microbial community in which the metabolic activity of each cell can affect the sharedmetabolite pool and thus potentially the metabolism of other cells. The framework, whichwe called MCM (“Microbial Community Modeler”), automates statistical analysis and modelcalibration to experimental data. To demonstrate the potential of MCM, we examined thedynamics of a community of Escherichia coli strains that emerged in laboratory evolutionexperiments, during which an ancestral strain diversified into two coexisting ecotypes. Weconstructed a microbial community model comprising the ancestral and the evolved strains,which we calibrated using separate monoculture experiments. Simulations reproduced thesuccessional dynamics in the evolution experiments, and pathway activation patterns ob-served in microarray transcript profiles. Our approach yielded detailed insights into themetabolic processes that drove bacterial diversification, involving acetate cross-feeding andcompetition for organic carbon and oxygen.4.2 IntroductionMetabolic interactions are an emergent property of microbial communities (70, 333). Eventhe simplest life forms can only be understood in terms of biological consortia character-ized by shared metabolic pathways and distributed biosynthetic capacities (200, 238, 311).1A version of this chapter has been published (see the Preface for author contributions): Louca, S.,Doebeli, M. 2015. Calibration and analysis of genome-based models for microbial ecology. eLife. 4:e08208.DOI:10.7554/eLife.0820832Cell-metabolic models for microbial ecologyFor example, glucose catabolism to carbon dioxide or methane is a multi-step process ofteninvolving several organisms that indirectly exchange intermediate products through theirenvironment (442). Microbial communities are thus complex systems comprising severalinteracting components that cannot be fully understood in isolation. In fact, metabolic in-terdependencies between organisms are at least partially responsible for our current inabilityto culture the great majority of prokaryotes (410). Understanding the emergent dynamics ofmicrobial communities is crucial to harnessing these multicomponent assemblages and usingsynthetic ecology for medical, environmental and industrial purposes (44).Genome sequencing has enabled the reconstruction of full-scale cell-metabolic networks (184),which have provided a firm basis for understanding individual cell metabolism (107, 238, 496).Recent work indicates that multiple cell models can be combined to understand microbialcommunity metabolism and population dynamics (70, 177, 238, 453, 543). These approachesassume knowledge of all model parameters such as stoichiometric coefficients, maintenanceenergy requirements or extracellular transport kinetics, a requirement that is rarely met inpractice (118, 177). Experiments and monitoring of environmental samples could providevaluable data to calibrate microbial community models, e.g., via statistical parameter esti-mation, but appropriate tools are lacking. So far, the standard approach has been to obtaineach parameter through laborious specific measurements or from the available literature,or to manually adjust parameters to match observations (70, 177, 292). Furthermore, sta-tistical model evaluation and sensitivity analysis is typically performed using ad-hoc code,thus increasing the effort required for the construction of any new model. Consequently, theexperimental validation of genome-based microbial community models and their applicationto biological questions are rare (177, 318).We have developed MCM (Microbial Community Modeler), a mathematical framework andcomputational tool that unifies model construction with statistical evaluation, sensitivityanalysis and parameter calibration. MCM is designed for modeling multi-species microbialcommunities, in which the metabolism and growth of individual cell species is predicted usinggenome-based metabolic models. Cells in the community interact in a dynamical environ-ment in which metabolite concentrations and other environmental variables influence, andare influenced by, microbial metabolism. Unknown model parameters can be automaticallycalibrated (fitted) using experimental data such as cell densities, nutrient concentrations orrate measurements. To demonstrate the potential of MCM, we modeled a bacterial commu-nity that has emerged from in-vitro evolution experiments, during which an ancestral strainrepeatedly diversified into two distinct ecotypes. Experiments with microbes have an estab-lished tradition as model systems for understanding ecological and evolutionary processes33Cell-metabolic models for microbial ecology(112, 226). We show that the predictions derived from MCM are in very good agreement withthe outcomes of several monoculture and co-culture experiments. While the experimentalresults described below have been found over the course of several years (132, 188, 262, 436),it is only now that a mechanistic model has managed to unify them in a clear, unambiguousand synergistic manner. The analysis presented here thus provides a unifying quantitativeinterpretation for a large body of experimental work performed in our lab over the course ofroughly a decade.4.3 Model overviewIn MCM, a microbial community model is a set of differential equations for the populationdensities of the cell species comprising the community and of the ambient concentrationsof utilized nutrients (metabolites), coupled to optimization problems for the cell-specificrates of reactions involving these metabolites. Each cell is characterized by its metabolicpotential, that is, the genetically determined subset of reactions it can catalyze, as well asany available metabolite transport mechanisms. The reaction rates and metabolite exchangerates (i.e., the metabolism) of each cell are assumed to depend on its metabolic potential aswell as on the current environmental conditions, such as metabolite concentrations. Throughtheir metabolism, in turn, cells act as sinks and sources of metabolites in the environment.Additional metabolite fluxes, such as oxygen diffusion from the atmosphere into the growthmedium of a modeled bacterial culture, can be included in the model.At any point in time, individual cell metabolism is determined using flux balance analysis(FBA) (354), a widely used framework in cell-metabolic modeling (70, 107, 129, 238, 496).In FBA, cell metabolism is assumed to be regulated in such a way that the rate of biosyn-thesis is maximized (119, 496). The chemical state of cells is assumed to be steady, leadingto stoichiometric constraints that need to be satisfied for any particular combination ofintracellular reaction rates. Reaction rates, on the other hand, are limited due to finite en-zyme capacities. Metabolite uptake/export rates can also be limited due to finite diffusionrates or limited transmembrane transporter efficiency. For example, uptake rates can beMonod-like functions of substrate concentrations (177, 292). Taken together, cell-metabolicpotential, stoichiometric consistency, reaction rate limits and transport rate limits define theconstraints of a linear optimization problem for each cell species at each point in time. Theoptimized biosynthesis rate is translated into a cell production rate by dividing by the cell’smass, thus defining the species’ population growth (Fig 4.1).34Cell-metabolic models for microbial ecologyThe central assumption of individual cells maximizing biosynthesis, subject to environmentaland physiological constraints, is rooted in the idea that evolution has shaped regulatory mech-anisms of unicellular organisms in such a way that they strive for maximum growth wheneverpossible. Biosynthesis has been experimentally verified as an objective for Saccharomycescerevisiae and Escherichia coli (51, 146, 176). The assumption of maximized biosynthe-sis is less valid for genetically engineered organisms or those exposed to environments thatare radically different from the environments that shaped their evolution (417). Despiteits limitations, FBA has greatly contributed to the understanding of several genome-scalemetabolic networks and metabolic interactions between cells (70, 129, 177, 238, 354, 453).One advantage of FBA models over full biochemical cell models is their independence ofintracellular kinetics and gene regulation, which limits the number of required parametersto stoichiometric coefficients and uptake kinetics.The combination of FBA with a varying environmental metabolite pool, as implemented byMCM, is known as dynamic flux balance analysis (DFBA) (70, 177, 292). In contrast toconventional FBA, DFBA models are dynamical because cell densities and environmentalmetabolite concentrations both change with time, and the rate of change of each cell densityand metabolite concentration depends on the current cell densities and metabolite concen-trations (177, 292). Because metabolites can be depleted or produced by several cell species,the environmental metabolite pool mediates the metabolic interactions between cells (410).For example, oxygen uptake rates might depend on environmental oxygen concentrations,which in turn are reduced by cellular respiration. Similarly, cells might excrete acetate as abyproduct of glucose catabolism, which then becomes available to other cells. The metabolicoptimization of individual cells striving for maximal growth, while modifying their environ-ment, leads to non-trivial community dynamics that can include competition, cooperationand exploitation. The cell-centric nature of DFBA differs fundamentally from other flux bal-ance analyses of microbial communities that assume an optimization of a community-wideobjective such as total biomass synthesis (239, 453, 545). Such an assumption is, how-ever, questionable from an evolutionary perspective and likely not appropriate for naturalcommunities comprising several species. Community-level optimality will typically conflictwith optimality for individual competing lineages, and configurations that optimize overallbiosynthesis at the expense of individual “cooperators” would be vulnerable to exploitation(327).Recent work suggests that DFBA is a promising approach to microbial ecological mod-eling (70, 177, 318). For example, Harcombe et al. (177) designed a computational tool(COMETS) based on DFBA, which was able to accurately predict equilibrium compositions35Cell-metabolic models for microbial ecologyof mixed bacterial cultures grown on petri dishes. However, COMETS offers limited modelversatility in terms of uptake and reaction kinetics and only has few environmental feed-back mechanisms (namely, varying extracellular metabolite concentrations). Furthermore,it assumes complete knowledge of all required model parameters and provides no genericstatistical model analysis. Hence, while COMETS sets an important precedent, considerablework is still needed to make DFBA a practical approach in microbial ecosystem modeling.MCM extends Harcombe et al.’s framework to more versatile microbial ecological modelsthat include arbitrary reaction kinetics (e.g., subject to product-inhibition) as well as dy-namical environmental variables (e.g., pH) that influence, and are influenced by, microbialmetabolism. In addition, MCM supports cell models in which internal molecules act as dy-namical constraints that further restrict the FBA solution space, for example to account forpost-transcriptional regulation or delays in enzyme synthesis (38). These so called regula-tory FBA models have been shown to improve the fidelity of conventional FBA models forE. coli and S. cerevisiae (81–83, 187), however their application to microbial communitiesremains untested. MCM can statistically evaluate models against data, analyze their sensi-tivity to varying parameters (61), and estimate the uncertainty of model predictions in theface of stochasticity (171). Perhaps most importantly, MCM can automatically calibrateunknown model parameters to data, for example obtained from monoculture experiments(as demonstrated below), from bioreactor experiments involving multiple species (285) orfrom environmental samples of unculturable communities (Fig 4.2; see section C.1 for de-tails). MCM can thus be used to understand the dynamics of realistic microbial ecosystems,ranging from the soil or groundwater to mixed laboratory cultures and bioreactors.4.4 Results and discussion4.4.1 Successional dynamics of a microbial communityIn a series of laboratory evolution experiments with E. coli (strain B REL606; 538) in glucose-acetate supplemented medium, two metabolically distinct strains consistently evolved fromthe ancestral (A) strain (188, 262, 437). When grown in monoculture with the same mediumcomposition, all three strains exhibit diauxic growth curves with a fast glucose-driven growthphase followed by slower growth on acetate. However, the three strains differ in their ef-ficiencies to catabolize glucose and acetate: Strain SS (slow switcher) is a better glucoseutilizer when compared to strain A, and the depletion of glucose only leads to a slow switchto acetate consumption. On the other hand, the FS (fast switcher) strain has evolved to bea better acetate utilizer, initiating acetate consumption at higher remnant glucose concen-36Cell-metabolic models for microbial ecologytrations than strains A and SS. This acetate specialization is based on a tradeoff in the citricacid cycle and comes at the cost of being a less competitive glucose consumer.Replicated serial dilution experiments starting with strain A monocultures have shown aconsistent phenotypic diversification, involving an initial invasion of the SS phenotype anda subsequent invasion of the FS phenotype, leading to the eventual extinction or near-extinction of the ancestor and the stable coexistence of the SS and FS phenotypes (Fig4.3) (188, 262, 437, 488). Genome sequencing revealed that this metabolic diversificationcan be attributed to point-mutations in genes linked to glucose and acetate uptake kineticsand metabolism (188). The successional dynamics of the three phenotypes are thus likelydriven by adaptations to a changing metabolic niche space, defined by fluctuating glucose,acetate and, potentially, oxygen availabilities (188, 262, 488). An understanding of theunderlying ecological processes would shed light on the ecology and evolution of naturalmicrobial communities with shared catabolic pathways.To mechanistically explain the observed community dynamics, we used MCM to constructa model comprising the ancestral and the two evolved E. coli types. By keeping track ofpathway activation, cell densities, metabolic fluxes and nutrient concentrations, we gaineddetailed insight into the processes driving the successional dynamics of metabolic diversifi-cation.4.4.2 Experimental calibrationBased on a published cell-metabolic template for the ancestral E. coli strain comprising over2000 reactions (538), we first constructed three separate cell models for the phenotypes A,SS and FS, respectively. In these preliminary models, cells grew on a substrate pool thatresembled previous batch-fed monoculture experiments with glucose-acetate supplementedminimal medium (262). Cell-specific oxygen, acetate and glucose uptake rate limits wereMonod-like functions of substrate concentrations (114, 325). We calibrated several physi-ological parameters for each cell type to measured chemical concentration and cell densityprofiles, using least squares fitting (Fig 4.4). MCM automatically calibrates free parametersto data through an optimization algorithm that involves step-wise exploration of parameterspace and repeated simulations (see Appendix C.1.2).We then constructed the microbial community (MC) model by combining the three calibratedcell models into a community growing in a common substrate pool. The environmentalcontext resembles Herron and Doebeli’s evolution experiments (188). In particular, the model37Cell-metabolic models for microbial ecologyincludes realistic oxygen depletion-repletion dynamics (167), glucose and acetate depletionby microbial consumption, as well as daily dilutions into fresh glucose-acetate supplementedmedium at a factor 1:100. The microbial community initially consists mostly of type A (1010cells/L), while both SS as well as FS cells are assumed to be rare (1 cell/L). Because themodel is deterministic, the invasion or extinction of each type only depends on its growth ratein a possibly changing environment, but not on random mutation events, nor on demographicstochastic fluctuations.4.4.3 Predicting microbial community dynamicsSimulations of the MC model reproduced the successional dynamics observed in Herron andDoebeli’s experiments: An initial replacement of the ancestor by the SS type is followedby an invasion of the FS type, leading to the eventual coexistence of the SS and FS typesand extinction of the ancestral strain (Fig 4.5A). Interestingly, FS can also invade in theabsence of SS, however invasion occurs much slower and FS reaches lower densities than inthe presence of SS (Supplemental Fig. C.1). This is consistent with an early presence of theFS lineage at low densities in the evolution experiments (Fig 4.3), indicating that some ofthe first FS mutations already confer a slight advantage over the ancestor when FS is rare(188).Time series of acetate concentrations (Fig 4.5B) link the observed successional dynamics ofthe three types to a gradually changing metabolic niche space: The replacement of type Aby the more efficient glucose specialist SS leads to an accumulation of acetate and facili-tates the invasion of the FS type. The specialization of the SS and FS types on glucoseand acetate, respectively (Fig 4.6A), enables their long-term coexistence on glucose-acetateenriched medium through frequency dependent competition (132, 188, 262). In fact, cell-specific acetate exchange rates reveal that the SS type temporarily excretes acetate duringshort intervals, which is concurrently and subsequently consumed by the FS type (Fig 4.5G).This periodic acetate cross-feeding is an evolutionarily emergent property of the microbialcommunity (484). The temporary production of acetate by the SS type is consistent withprevious SS-FS co-culture experiments, which revealed slightly increased acetate concentra-tions towards the end of the SS exponential growth phase (436). An evolved increase ofacetate excretion by E. coli in glucose minimal medium has also been reported by Harcombeet al. (176).It should be noted that cell metabolism depends on substrate concentrations and is subjectto strong temporal variation. In particular, acetate excretion by SS cells correlates strongly38Cell-metabolic models for microbial ecologywith oxygen limitation (Figs. 4.5G,K). The excretion of acetate by E. coli as a byproductof oxygen-limited glucose catabolism has been observed experimentally and explained usingflux balance analysis (292). In the absence of oxygen limitation, complete aerobic glucosecatabolism to carbon dioxide is preferred over incomplete glucose catabolism with acetateexcretion. On the other hand, oxygen limitation leads to an energetic tradeoff betweencomplete glucose catabolism and efficient oxygen utilization, resulting in the excretion ofacetate.Furthermore, the depletion of oxygen during cell growth makes oxygen a temporary limitingresource for all cells (Fig 4.5K). Shortly after dilution into fresh medium, the exponentialgrowth of the SS type on glucose leads to a rapid drop of oxygen to nanomolar concentrations.Despite oxygen diffusion into the medium, oxygen remains at sub-saturation levels for severalmore hours because the slow-growing acetate-consuming FS cells still consume oxygen afterthe growth of SS cells has halted. Differences in SS and FS growth rates (Figs. 4.5C,E)thus mitigate competition for oxygen through temporal niche separation. Hence, oxygenlikely plays an important role in the metabolic diversification, as previously hypothesized byLe Gac et al. (262). This shows that the splitting of metabolic pathways across specialists canbe caused by the composite effects of competition for electron donors and electron acceptors.Consistent with differential substrate usage, average cell-specific reaction rates (Fig 4.6B)reveal differences in pathway activation: The transformation of acetate into acetyl-CoA byacetyl-CoA synthetase (acs) is predicted to be decreased in type SS and increased in type FS,when compared to the ancestral type. Furthermore, the conversion of phosphoenolpyruvateto oxaloacetate (ppc), the conversion of phosphoenolpyruvate to pyruvate (pyk) and thedecarboxylation of pyruvate to acetyl-CoA (pdh), linking the glycolysis pathway to the citricacid cycle, are all predicted to be upregulated in the SS type when compared to the FS type.Similar differences in pathway activation are also predicted during early exponential growthin monoculture (Fig 4.6C,D), because FS grows partly on acetate and SS excretes acetate (Fig4.4F,J). Previous microarray profiles of mRNA concentrations during exponential growth inmonocultures (262) found an upregulation of acetate consumption genes in FS and acetateexcretion genes in SS compared to A, qualitatively confirming our predictions (Fig 4.6C,D).Interestingly, our simulations suggest a significant downregulation of glucose catabolism(pyk, pdh and ppc) in FS compared to A, which contradicts the transcript profiles (Fig4.6D). This discrepancy may be explained by the fact that mRNA was harvested from well-aerated flasks, while the monoculture experiments (Fig 4.4) and evolution experiments (Fig4.3) were performed in test tubes where oxygen can become limiting (10). Oxygen becomesparticularly scarce in the FS tubes (Fig 4.4K) and temporarily limits glucose catabolism,39Cell-metabolic models for microbial ecologywhich would explain the strong downregulation not reflected in the transcript profiles (262).Furthermore, while broad pathway activation patterns could be qualitatively reproduced inour system, this might be harder in other cases due to post-transcriptional regulation orpost-translational modifications (38).The periodic (seasonal) changes in glucose and acetate concentrations in batch culture havepreviously been shown to promote coexistence of the SS and FS types, in analogy to themaintenance of phytoplankton diversity via fluctuations of resource availability (432, 436).Experiments with SS-FS batch co-cultures revealed that the SS type quickly dominates overthe FS type, when restricted to the first glucose-rich season through frequent dilution intofresh growth medium. Reciprocally, when SS and FS are grown in solution resembling thesecond glucose-depleted acetate-rich season, the FS type quickly dominates over the SS type(436). Accordingly, in a full batch cycle the relative SS cell density has been shown toculminate within 4-8 hours and to gradually decrease afterwards (132, Fig 6B), consistentwith our simulations (Fig 4.5D). Simulations of the SS and FS batch co-culture restrictedto the first or second season, analogous to Spencer et al.’s experiments, reproduce theseobservations and verify the role of periodic variation of glucose and acetate concentrationsin maintaining the coexistence of both types (Fig 4.7, see Appendix C.1 for details).4.5 ConclusionsThe models presented here make detailed predictions about the microbial dynamics in theconsidered experiments. First, after calibration the cell models largely explain the data fromthe monoculture experiments (Fig 4.4). Second, the predictions for pathway activation in thethree strains (Fig 4.6) are qualitatively consistent with most transcription profiles. Third,simulations of the microbial community consisting of all three strains (Fig 4.5) reproducethe successional dynamics of diversification observed in the evolution experiments (Fig 4.3).Fourth, simulations of the SS-FS co-cultures restricted to either the glucose-rich or glucose-depleted season reproduce the dominance of the SS or FS type (Fig 4.7), respectively, inconsistence with previous co-culture experiments. It is important to note that only datafrom monoculture experiments were used to calibrate the cell models for the three strains(A, SS and FS). In particular, no information from co-culture experiments was used in thesetup of the microbial community model, and thus there was no a priori knowledge aboutwhat the emergent community dynamics would be. Hence, our work conceptually producednon-trivial predictions that could be compared to experimental observations, although allexperiments had already been performed.40Cell-metabolic models for microbial ecologyOur work sheds light on the fundamental problem of metabolic diversification and the emer-gence of shared catabolic pathways. In particular, our microbial community model allowedquantitative predictions for the metabolic fluxes for each strain in co-culture, revealing tem-porary cross-feeding as an emergent property of the evolved community (484). Cross-feeding,conventionally seen as a beneficial interaction (333), thus emerged as a form of niche segre-gation driven by competition for organic carbon and oxygen. Because both evolved typesprefer glucose whenever available at high concentrations, but exchange acetate under oxygenlimitation, the community constantly switches between competitive and beneficial interac-tions. Natural microbial populations might thus also oscillate between negative and positiveinteractions, for example depending on oxygen levels.The models considered here were completely deterministic, in the sense that the growth andmetabolic activity of each strain were completely determined by the conditions in the testtubes. In particular, both evolved strains were included in the simulations right from the startat low densities, while the invasion or extinction of individual strains was contingent upontheir growth rates in an environment that changed in response to the activity of each strain.Our findings thus support previous suggestions that microbial evolution can be driven bydeterministic ecological processes (188, 358, 530). In this case, the observed diversificationis due to competition for limiting resources whose use is constrained by basic metabolictradeoffs. Other instances of ecological diversification in microbial evolution experiments,e.g., as reported by Plucain et al. (367), might be explained using a similar approach. Weemphasize that at longer time scales and in more diverse communities evolution may notbe as predictable as here, because horizontal gene transfer and rare but complex mutationscould introduce substantial stochasticity (39, 348). The transitions in community structureand activity following such rare events may still be understood in a deterministic frameworksuch as ours.We have demonstrated how MCM can be used to experimentally calibrate and combinegenome-based cell models to predict the emergent dynamics of microbial communities. Ourframework thus provides a starting point for designing microbial communities with particu-lar metabolic properties, such as optimized catabolic performance. While MCM is designedfor genome-based metabolic models, it can also accommodate conventional functional groupmodels. In these models, different ecological functions such as photosynthesis, heterotrophyor nitrification are performed by distinct populations whose metabolic activity is deter-mined, for example, by Michaelis-Menten kinetics and whose growth is described by simplesubstrate-biomass yield factors (Chapter 6; 193, 386). Hence, natural microbial communi-ties could be modeled even if annotated genomes are not available for each member species.41Cell-metabolic models for microbial ecologyWhile functional group models generally require fewer parameters, their calibration remainsa challenge (360). In MCM, model calibration becomes analogous to coefficient estimation inconventional multivariate regression and can be used to estimate poorly known parameterssuch as stoichiometric coefficients, growth kinetics or extracellular transport coefficients.To our knowledge, no existing comparable framework offers the flexibility combined withthe statistical functionality of MCM. In view of the increasing availability of genome-scalemetabolic models (118), our work provides a missing link to a predictive and synthetic mi-crobial ecology.A BFBAsolve linearoptimization problemspredict metabolitefluxesupdate metabolite and cell concentrations12 34Figure 4.1: (A) Conceptual framework used by MCM. Cells (colored shapes) optimize theirmetabolism for maximal growth and influence their environment via metabolite exchange (smallcolored arrows). Additional external fluxes can also affect the environment (large grey arrows).The environment, in turn, influences each cell’s metabolism. (B) Computational framework usedby MCM. Each iteration consists of four steps: flux balance analysis (FBA) is used to translatecell-metabolic potentials and environmental conditions (1) into a linear optimization problem forthe growth rate of each cell species (2). The set of possible reaction rates corresponds to a polytopein high-dimensional space. Solving the optimization problems (3) yields predictions on microbialmetabolite exchange rates (4). Metabolic fluxes and cell growth rates are used to predict metaboliteand cell concentrations in the next iteration (1).42Cell-metabolic models for microbial ecologymeasured time series(experimental or survey data)microbial community model(metabolites, reactions,  cell species, environmental variables)MCM(a) NO3 concentrationloglik=8.89307 (log-normal error structure, 10 data points)estimated σ=0.209739 (relative SD=0.216783)SSR = 7.66345e-09 (mol/L)2fixed data units = mol/Llikelihood made unitless by dividing by data meanOverall log-likelihood: 19.2918 (evaluated over 21 data points)Normalized log-likelihood (=log-likelihood/data points): 0.918659Sum of normalized (unitless) squared residuals (SNSR) = 0.280244Average normalized squared residual = SNSR/data points = 0.0133450.00050.00060.00070.00080.00090.0010 5 10 15 20concentration (mol/L)time (day)(b) NH4 concentrationloglik=10.3988 (log-normal error structure, 11 data points)estimated σ=0.0970036 (relative SD=0.0976908)SSR = 4.62685e-08 (mol/L)2fixed data units = mol/Llikelihood made unitless by dividing by data meandatamodel00. conc.NO3 conc.normalized log-likelihoodobservable(c) normalized log-likelihood per observable(log-likelihood/data points)00.0050.010.0150.02NH4 conc.NO3 conc.average NSRobservable(d) average normalized squared residual per observable(sum of normalized squared residuals/data points)0123456780 5 10 15 20pHpH measured during experiment)time (days)(a) pHpH:  linear interpolation (data/pH)random: logOU (mean 0.8, SD 2.7, tau 2) and exp(t/10)(b) randomcalibratedmodelcontrol script(MCM commands)set integrationTimeStep 0.005set maxTimeSeriesSize 10000set maxSimulationTime 100setodnew  simulations/Ecoli_evolution_$now$set MCmodelDir Ecoli_community_modelsaveContextsaveMCmodelrunMCMquit02e+114e+116e+118e+111e+120 0.2 0.4 0.6 0.8 1density (dead+alive) (1/L)time (day)(a) Ecoli_FS density (dead+alive)loglik=19.1092 (log-normal error structure, 24 data points)estimated σ=0.183 (relative SD=0.187)SSR = 0.14 (data units)2, R2 = 0.907estimated data units = 9.86338e+11 * 1/Llikelihood made unitless by dividing by data meanOverall log-likelihood: -33.3277 (evaluated over 90 data points)Normalized log-likelihood (=log-likelihood/data points): -0.370308Sum of normalized (unitless) squared residuals (SNSR) = 21.0434Average normalized squared residual = SNSR/data points = 0.23381695% centilemodeldata00.00050.0010.00150.0020.00250.0030.00350.0040 0.2 0.4 0.6 0.8 1concentration (mol/L)time (day)(b) M_ac_e concentrationloglik=-16.2639 (normal error structure, 24 data points)estimated σ=0.000589SSR = 8.33e-06 (mol/L)2, R2 = 0.627fixed data units = mol/Llikelihood made unitless by dividing by data mean95% centilemodeldata00.00010.00020.00030.00040.00050.00060.00070.00080 0.2 0.4 0.6 0.8 1concentration (mol/L)time (day)(c) M_o2_e concentrationloglik=-12.5224 (normal error structure, 18 data points)estimated σ=0.000118SSR = 2.53e-07 (mol/L)2, R2 = 0.615fixed data units = mol/Llikelihood made unitless by dividing by data mean95% centilemodeldata00.00050.0010.00150.0020.00250 0.2 0.4 0.6 0.8 1concentration (mol/L)time (day)(d) M_glc_D_e concentrationloglik=-23.6507 (normal error structure, 24 data points)estimated σ=0.000334SSR = 2.68e-06 (mol/L)2, R2 = 0.682fixed data units = mol/Llikelihood made unitless by dividing by data mean95% centilemodeldata-1-0.500.5Ecoli_FS dens. d.a.M_ac_e conc.M_o2_e conc.M_glc_D_e conc.normalized log-likelihoodobservable(e) normalized log-likelihood per observable(log-likelihood/data points) dens. d.a.M_ac_e conc.M_o2_e conc.M_glc_D_e conc.average NSRobservable(f) average normalized squared residual per observable(sum of normalized squared residuals/data points) dens. d.a.M_glc_D_e conc.M_ac_e conc.M_o2_e conc.R2observable(g) R2 per observable(coefficients of determination)01e+072e+073e+074e+075e+070 5 10 15 20density (1/L)time (day)(a) Nitrosomonas05e+061e+071.5e+072e+070 5 10 15 20density (1/L)time (day)(b) Nitrobacter01e+072e+073e+074e+075e+076e+077e+070 5 10 15 201/Ltime (day)(c) total cell densitysimulationspathwayactivation patternsmetabolic flux analysismodel-data comparisonSensitivity of focal observables with respect to parameter ’Vmax_glucose’Parameter varied between 5e-14 and 9e-14, default = 7.056e-14sensitivityanalysisstochastic model analysis00.00010.00020.00030.00040.000505101520c onc ent r at i on  ( mo l /L )time (day)(a) NO3 concentrationloglik=15.3049 (log-normal error structure, 10 data points)estimated σ=0.11 (relative SD=0.111)SSR = 2.58e-09 (mol/L)2, R2 = 0.992fixed data units = mol/Llikelihood made unitless by dividing by data meanOverall log-likelihood: 26.4726 (evaluated over 21 data points)Normalized log-likelihood (=log-likelihood/data points): 1.2606Sum of normalized (unitless) squared residuals (SNSR) = 0.144726Average normalized squared residual = SNSR/data points = 0.0068917195% centilemodeldata0.00040.00050.00060.00070.00080.00090.0010.00110.00120.001305101520c onc ent r at i on  ( mo l /L )time (day)(b) NH4 concentrationloglik=11.1677 (log-normal error structure, 11 data points)estimated σ=0.0905 (relative SD=0.091)SSR = 3.93e-08 (mol/L)2, R2 = 0.876fixed data units = mol/Llikelihood made unitless by dividing by data mean95% centilemodeldata00. conc.NH4 conc.nor mal i ze d l og -l i ke li ho odobservable(c) normalized log-likelihood per observable(log-likelihood/data points)00.0010.0020.0030.0040.0050.0060.0070.0080.009NO3 conc.NH4 conc.av er ag e NS Robservable(d) average normalized squared residual per observable(sum of normalized squared residuals/data points) conc.NH4 conc.R2observable(e) R2 per observable (coefficients of determination)para eterfittingmetabolitesreactionsspeciesenvironmental variablesmicrobial community modelMCM control scriptmetabolite concentrationsreaction ratescell densities ...measured time series data 0 5e-05 0.0001 0.00015 0.0002 0.00025 0.0003 0.00035 0.0004 0.00045 0  5  10  15  20  25concentration (mol/L)time (day)(a) NO3 concentration (data vs model)loglik=8.87154 (log-normal error structure, 9 data points)!=0.224249 (relative SD=0.232881)data units/(mol/L)=1likelihood made unitless by dividing by geometric mean of dataOverall log-likelihood: 17.8332 (evaluated over 19 data points)Normalized log-likelihood (=log-likelihood/data points): 0.938589datamodel 0.0005 0.00055 0.0006 0.00065 0.0007 0.00075 0.0008 0.00085 0.0009 0.00095 0  5  10  15  20  25concentration (mol/L)time (day)(b) NH4 concentration (data vs model)loglik=8.96166 (log-normal error structure, 10 data points)!=0.099155 (relative SD=0.099889)data units/(mol/L)=1likelihood made unitless by dividing by geometric mean of datadatamodel 0 0.2 0.4 0.6 0.8 1NO3 conc.NH4 conc.normalized log-likelihoodvariable(c) normalized log-likelihood per variable(log-likelihood/data points)-6e-05-4e-05-2e-05 0 2e-05 4e-05 0  5  10  15  20  25export rate (mol/(L*day))time (day)(a) HNO2_NO2-1e-45-5e-46 0 5e-46 1e-45 0  5  10  15  20  25export rate (mol/(L*day))time (day)(b) NO2 0 1e-05 2e-05 3e-05 4e-05 5e-05 6e-05 7e-05 8e-05 0  5  10  15  20  25export rate (mol/(L*day))time (day)(c) HNO3_NO3-1e-45-5e-46 0 5e-46 1e-45 0  5  10  15  20  25export rate (mol/(L*day))time (day)(d) NO3-1e-45-5e-46 0 5e-46 1e-45  5  10  15  20  25export rate (mol/(L*day))time (day)(e) NO-1e-45-5e-46 0 5e-46 1e-45 0  5  10  15  20  25export rate (mol/(L*day))time (day)(f) N2O-7e-05-6e-05-5e-05-4e-05-3e-05-2e-05-1e-05 0 0  5  10  15  20  25export rate (mol/(L*day))time (day)(g) NH3_NH4-1e-45-5e-46 0 5e-46 1e-45 0  5  10  15  20  25export rate (mol/(L*day))time (day)(h) NH3-1e-45-5e-46 0 5e-46 1e-45 0  5  10  15  20  25export rate (mol/(L*day))time (day)(i) NH4-0.0001-8e-05-6e-05-4e-05-2e-05 0 0  5  10  15  20  25export rate (mol/(L*day))time (day)(j) O2-1e-45-5e-46 0 5e-46 1e-45 0  5  10  15  20  25export rate (mol/(L*day))time (day)(k) NH2OHmodel predictions st istical evaluation against datasensitivity analysis parameter estimation (fitting)modelrefinement 0 0.0001 0.0002 0.0003 0.0004 0.0005 0  5  1  15  20concentration (mol/L)time (day)(a) NO3 concentration (data vs model)l glik=9.79005 (log-normal error structure, 10 data points)estimated σ=0.19 745 (relative SD=0.197111)SSR = 4.79772e-09 (mol/L)2fixed d ta units = mol/Llikelihood mad  unitless by dividing by data meanOverall log-likelihood: 20.877 (evaluated over 21 data points)Normalized log-likelihood (=log-likelihood/data points): 0.994143Sum of normalized (unitless) squared residuals (SNSR) = 0.206545Average normalized squared residual = SNSR/data points = 0.00983548datamodel 0.0004 0.0005 0.0006 0.0007 0.0008 0.0009 0.001 0  5  10  15  20concentration (mol/L)time (day)(b) NH4 concentration (data vs model)loglik=11.087 (log-normal error structure, 11 data points)estimated σ=0.0911208 (relative SD=0.0916902)SSR = 4.36279e-08 (data units)2stimated data units = 0.926906 * mol/Llikelihood made unitless by dividing by data meandatamodel 0 0.2 0.4 0.6 0.8 1 1.2NH4 conc.NO3 conc.normalized log-likelihoodobservable(c) normalized log-likelihood per observable(log-likelihood/data points) 0 0.002 0.004 0.006 0.008 0.01 0.012NH4 conc.NO3 conc.average NSRobservable(d) average normalized squared residual per observable(sum of normalized squared residuals/data points) 20.75 20.8 20.85 20.9 20.95 21 21.05 21.1 21.15 21.2 21.25 20  40  60  80  100  120data pointssimulation(a) Number of evaluated data points during fitting-25-20-15-10-5 0 5 10 15 20 20  40  60  80  100  120LLsimulation(b) Log-likelihood during fitting-1.2-1-0.8-0.6-0.4-0.2 0 0.2 0.4 0.6 0.8 1 20  40  60  80  100  120NLLsimulation(c) Normalized log-likelihood during fitting(a) NO3 concentrationloglik=8.89307 (log-normal error structure, 10 data points)estimated σ=0.209739 (relative SD=0.216783)SSR = 7.66345e-09 (mol/L)2fixed data units = mol/Llikelihood made unitless by dividing by data meanOverall log-likelihood: 19.2918 (evaluated over 21 data points)Normalized log-likelihood (=log-likelihood/data points): 0.918659Sum of normalized (unitless) squared residuals (SNSR) = 0.280244Average normalized squared residual = SNSR/data points = 0.0133450.00050.00060.00070.00080.00090.0010 5 10 15 20concentration (mol/L)time (day)(b) NH4 concentrationloglik=10.3988 (log-normal error structure, 11 data points)estimated σ=0.0970036 (relative SD=0.0976908)SSR = 4.62685e-08 (mol/L)2fixed data units = mol/Llikelihood made unitless by dividing by data meandatamodel00. conc.NO3 conc.normalized log-likelihoodobservable(c) normalized log-likelihood per observable(log-likelihood/data points)00.0050.010.0150.02NH4 conc.NO3 conc.average NSRobservable(d) average normalized squared residual per observable(sum of normalized squared residuals/data points)0123456780 5 10 15 20pHpH measured during experiment)time (days)(a) pHpH:  linear interpolation (data/pH)random: logOU (mean 0.8, SD 2.7, tau 2) and exp(t/10)(b) randomMCMFigure 4.2: Overview of MCM’s working principle and functionalities: A icrobial communitymodel is specified using human-readable configuration files in terms of metabolites, reactions, themetabolic potential of cell species and any additional environmental variables. Models with multipleecosystem compartments are also possible. A script with MCM commands controls the analysis ofthe model and, if needed, its calibration using experimental data. The calibrated model can alsobe used to create new, more complex models (as exemplified in this chapter).43Cell-metabolic models for microbial ecologyA B CFigure 4.3: Relative cell densities of the A, SS and FS types during three replicated evolutionexperiments by Herron and Doebeli (188, Fig. 2), starting with the same ancestral E. coli strain.Within each experiment, the illustrated SS or FS lineage comprises several strains with varyinglypronounced SS or FS phenotypes, respectively. Cell generations were translated to days by assumingan average of 6.7 generations per day (188).A B CD E F GH I J kFigure 4.4: Calibration of E. coli cell models using monoculture experiments. Continuouscurves: Time course of cell densities, glucose concentration, acetate concentration and oxygenconcentration (columns 1–4, respectively) predicted by MCM for monocultures of strain A, SS andFS (rows 1–3, respectively) grown on glucose-acetate medium. Points are data used for modelcalibration and were obtained from analogous monoculture growth experiments (262). Oxygendata were not available for strain A.44Cell-metabolic models for microbial ecologyABC D EF G HI J KFigure 4.5: Dynamics of the E. coli microbial community model. (A) Relative cell densitiesof the A, SS and FS types over time. (B) Acetate concentration over time. (C), (D) and (E): SSand FS cell densities, relative cell densities and growth rates over time, respectively, during stablecoexistence. (F), (G) and (H): Cell-specific glucose, acetate and oxygen uptake rates over time,respectively. Negative values correspond to export. (I), (J) and (K): Glucose, acetate and oxygenconcentrations over time, respectively. Diurnal fluctuations in all figures are due to daily dilutionsinto fresh medium. Tics on the time axes in (c–k) mark points of dilution.45Cell-metabolic models for microbial ecologyA BC Dacetyl-Pacetateacetyl-CoApta 9.3/0 (1.6)acs 0 (0.8)ack9.3/0 (1.4 ) environmentalacetate poolpyruvateglycolysisppc1.1 (1.7)environmentalglucose poolpdh1.6 (1.5)pyk 4.6 (1.7)acetyl-Pacetateacetyl-CoApta0/0 (0.9)acs1.9 (1.6)ack0/0 (1)pyruvateglycolysisTCA cycleppc0.8 (1)environmentalglucose poolpdh0.5 (1.8)pyk0 (0.7)environmentalacetate pool1.3 0.62.1TCA cycleacetyl-Pacetateacetyl-CoApta 9.3/0 (1.6)acs 0 (0.8)ack9.3/0 (1.4 ) environmentalacetate poolpyruvateglycolysisppc1.1 (1.7)environmentalglucose poolpdh1.6 (1.5)pyk 4.6 (1.7)acetyl-Pacetateacetyl-CoApta0/0 (0.9)acs1.9 (1.6)ack0/0 (1)pyruvateglycolysisTCA cycleppc0.8 (1)environmentalglucose poolpdh0.5 (1.8)pyk0 (0.7)environmentalacetate pool1.3 0.62.1TCA cycleFigure 4.6: Metabolic differentiation of the A, SS and FS types. (A) Predicted cell-specific netmetabolite uptake rates in co-culture. (B) Predicted cell-specific reaction rates in co-culture, for acs(acetyl-CoA synthesis), ack (acetate synthesis), pta (acetyl phosphate synthesis), ppc (oxaloacetatesynthesis from phosphoenolpyruvate), pdh (decarboxylation of pyruvate to acetyl-CoA) and pyk(pyruvate synthesis from phosphoenolpyruvate). Rates in (A) and (B) are averaged over all timepoints within the first 100 days of evolution, with reversed reactions or net metabolite exportrepresented by negative rates. (C) and (D): Simplified model subset of E. coli acetate and glucosemetabolism, showing pathway activations in type SS (C) and FS (D) relative to type A duringexponential growth in monoculture. Non-bracketed numeric values are ratios of predicted fluxesin the evolved types over fluxes in type A. Bracketed values are ratios of mRNA harvested frommonoculture experiments by Le Gac et al. (262), for comparison. A ratio of 0/0 indicates zero fluxin both the evolved and ancestral type, a ratio of 1 corresponds to an unchanged flux or mRNA,a ratio of 0 corresponds to complete deactivation in the evolved type. Darker arrows indicateincreased predicted fluxes in the evolved type. Flux predictions correspond to the time points ofmRNA measurements, i.e., 3.5 hours after dilution for SS and 4 hours after dilution for A and SS(262).46Cell-metabolic models for microbial ecologyA BFigure 4.7: Predicted relative cell densities of the SS and FS types in batch co-culture when re-stricted to either the first glucose-rich (A) or second glucose-depleted (B) season. In (A), restrictionto the first season was achieved by shorter dilution periods which prevented the complete depletionof glucose. In (B), restriction to the second season was achieved by using the glucose-depletedacetate-rich solution, produced by the full-batch co-culture, as growth medium (see the Methodsfor details).47Transients of competitive exclusionChapter 5Transient dynamics of competitive ex-clusion in microbial communities15.1 SynopsisMolecular profiling in bioreactors has revealed that microbial community composition canbe highly variable while maintaining constant functional performance, in accordance with apathway-centric paradigm for microbial ecology. Similarly, following perturbation bioreactorperformance typically recovers rapidly while community composition only slowly returns toits original state. At this point we still lack a detailed understanding of the actual mecha-nisms causing the discrepancy between functional and compositional stability of microbialcommunities. Using a mathematical model for microbial competition, as well as simulationsof a model for a nitrifying bioreactor, we explain these observations on grounds of slow non-equilibrium dynamics eventually leading to competitive exclusion. In the presence of severalcompeting strains, metabolic niches are rapidly occupied by opportunistic populations, whilesubsequent species turnover and the eventual dominance of top competitors proceeds at amuch slower rate. Hence, functional redundancy causes a separation of the time scales char-acterizing the functional and compositional stabilization of microbial communities. Thiseffect becomes stronger with increasing richness, because greater similarities between topcompetitors lead to longer transient population dynamics.5.2 IntroductionMicrobial metabolism drives the biochemistry of virtually all ecosystems and plays a centralrole in industrial processes such as biofuel production and wastewater treatment (16, 116).Thus, understanding the mechanisms that shape the dynamics and metabolic performance1A version of this chapter has been published (see the Preface for author contributions): Louca, S.,Doebeli, M. 2015. Transient dynamics of competitive exclusion in microbial communities. EnvironmentalMicrobiology. 18:1863–1874. DOI:10.1111/1462-2920.1305848Transients of competitive exclusionof microbial communities is of great practical importance. Experiments with bioreactorshave shown that bioreactor performance can be constant despite highly variable microbialcommunities (506). For example, following functional stabilization, methanogenic or nitri-fying bioreactors can exhibit species turnover for several more years (122, 524). In somecases non-equilibrium community trajectories have been reproduced across replicated ex-periments, suggesting that the underlying processes are deterministic (24, 495). Even whencommunities converge to a steady composition, recovery of community composition followingperturbation can take several months. This is in contrast to metabolic throughput, whichrecovers within a few days (144, 461). Fluctuations and the rate of stabilization of micro-bial communities are thus multifaceted properties that depend greatly on whether the focusis on metabolic function or taxonomic composition. An improved understanding of theseproperties in microbial communities is crucial for optimizing microbially driven industrialprocesses and interpreting the response of ecosystems to anthropogenic perturbations.It has been hypothesized that functional redundancy and non-equilibrium population dynam-ics within each metabolic compartment could promote fast stabilization of performance withslow convergence of community composition (46). Here we show that temporary populationdynamics leading to an eventual steady community composition via competitive exclusioncan indeed last much longer than the time required for the stabilization of overall metabolicperformance. We first formalize our reasoning using a microbial community model, in whichmultiple strains compete for a common resource. The model illustrates in a simple wayhow taxonomic community composition can vary almost independently of the community’smetabolic performance. We then construct a more realistic model of a nitrifying bioreactorand use simulations to demonstrate the validity of our arguments and their consistency withprevious experimental observations.5.3 Results and discussion5.3.1 Competition for a common resourceMicrobial community richness can be disproportionally high compared to the metaboliccomplexity of bioreactors (123, 523). In fact, the coexistence of organisms with similarmetabolic function in these systems has been contrasted to the “competitive exclusion prin-ciple” (122, 178). In the case of a single limiting resource, Tilman’s competition theorypredicts that at equilibrium the only persisting competitor will be the one that can surviveat the lowest resource concentration (479). However ecosystems can be subject to long tran-49Transients of competitive exclusionsient dynamics, i.e., temporary population dynamics far from equilibrium, and convergenceto equilibrium might occur at much longer time scales than assumed (180). For example,slow species turnover has been suggested to be responsible for the perplexingly high diver-sity seen in many microbial systems (67) and, as we show here, can explain the discrepancybetween functional and taxonomic stability in bioreactors.To formalize our argument, we use a model for multiple populations competing for the samelimiting resource. We focus on the transient dynamics eventually leading to steady state,where resource input is balanced by microbial consumption. The cell density of strain i,denoted Ni, as well as the resource concentration, denoted R, are described by the followingdifferential equations:dNidt= Ni[βiΦi(R)− λi], (5.3.1)dRdt= fo −∑iNiΦi(R). (5.3.2)Here, fo is the constant resource supply rate, λi is the decay rate of strain i in the absence ofgrowth, Φi(R) is the rate at which cells take up the resource as a function of R, and βi is abiomass yield factor. We assume that Φi(R) increases with R. For example, Φi(R) could bea Monod function that increases linearly with R at low concentrations but saturates at highconcentrations, an assumption often made in geobiological and bioengineering models (211).The last term in Eq. (5.3.2) is a sum over all strains accounting for resource depletion bymicrobial metabolism.The growth rate of strain i is positive whenever R is greater than the threshold concentra-tion, Roi , defined by Φi(Roi ) = λi/βi. In general, the equilibrium of Equations (5.3.1) and(5.3.2) is characterized by the extinction of all but one strain, namely the strain with thelowest survival threshold Roi . To elucidate the transient dynamics preceding this competi-tive exclusion, we consider the total cell density N = ∑iNi and the relative cell densitiesηi = Ni/N . Using the community-average growth kinetics (denoted Φ, βΦ and λ), one canderive the dynamicsdηidt= εi · ηi(βΦ− λ) (5.3.3)for the relative cell densities,dNdt= N(βΦ− λ)(5.3.4)50Transients of competitive exclusionfor the total cell density N , anddRdt= fo −NΦ (5.3.5)for the resource concentration R (details in Appendix D). Here, the εi account for deviationsof strain i growth kinetics from the community average. For example, if N is growing and εiis positive, then strain i grows faster than average and thus increases in relative abundance.As the resource is depleted, weaker competitors decay and the average growth kinetics aredetermined by a few remaining competitors of similar efficiency, for which the deviations εifrom the average become very small (εi  1). Hence, while the dynamics of N and R aredetermined by the community-average growth kinetics (Eqs. (5.3.4) and (5.3.5)), the relativecell densities are slowed down by the factors εi. This means that while metabolic niches arequickly filled, establishing a high rate of resource uptake, some of the competing popula-tions can coexist during prolonged transition periods until eventual competitive exclusion.In agreement with these predictions, Gentile et al. (144) reports a quick functional stabiliza-tion and long transient periods in community composition following mechanical shock, andVanwonterghem et al. (495) reports a gradually decreasing richness in anaerobic digestersover the course of several months following inoculation.The probability of similar strains being present in a random inoculum, or a microbial commu-nity in general, increases with the number of strains. In particular, the expected dissimilaritybetween top-competitors decreases with increasing community richness. The underlying as-sumption is that growth kinetic parameters are bound within some natural finite range.Hence, one should expect longer transient dynamics of competitive exclusion and slowerconvergence to a steady community composition at higher inoculum richness.It has been previously hypothesized that as richness increases, the variability of ecosystemfunctions decreases whereas the variability of individual populations increases (267, 283,480). The proposed mechanisms typically involve stochastic fluctuations of independentpopulations, so that the total community biomass and functional performance become morestable when more populations contribute to them. This statistical inevitability (105), whichhas been criticized on grounds of interspecific interactions (481), differs fundamentally fromthe deterministic mechanisms explored here. Namely, competition between strains leads to aslow decay of weaker competitors, which is compensated by the growth of other populationsthat stabilize overall functional performance.51Transients of competitive exclusion5.3.2 Bioreactors as model systemsThe above competition model explains how populations occupying a common metabolicniche can, in principle, undergo long transient periods of coexistence. The actual durationand nature of these transients depend on the similarity between competing strains, as wellas their typical intrinsic growth kinetics. To test the relevance of our predictions to realisticmicrobial communities, we examined a separate numerical model for a nitrifying bioreactor(524). Apart from their practical relevance to industrial processes such as sewage treatmentand biofuel production (16), bioreactors are also ideal model systems for understandingmicrobial ecology and processes shaping microbial community structure (123, 160, 495). Thebioreactor considered here is a flow-through chemostat continuously fed with ammonium(NH+4 ), which is aerobically oxidized to nitrate (NO−3 ) in a two-step process. Oxidationoccurs in a microbial community that consists of chemoautotrophic ammonium-oxidizingbacteria (AOB), which transform ammonium to nitrite (NO−2 ), and chemoautotrophic nitrite-oxidizing bacteria (NOB), which transform nitrite to nitrate. Nitrate is exported from thebioreactor as part of a continuous outflow through a filter membrane that retains cells withinthe bioreactor. The substrate feed rate and the hydraulic dilution rate are kept constant andin line with previous bioreactor experiments (524), allowing the establishment of a steadymetabolic throughput following an initial startup period.The bioreactor’s microbial community is modeled using differential equations for the cellpopulation densities and the ambient ammonium, nitrite and nitrate concentrations. Thesemetabolites are subject to microbial production and depletion, as well as physical additionand removal from the bioreactor. The metabolic activity of individual cells is determined us-ing flux balance analysis (FBA), a widely used framework in cell-metabolic modeling (354).In FBA, the chemical state of cells is assumed to be steady, leading to stoichiometric con-straints that need to be satisfied for any particular combination of intracellular reactionrates. These rates are assumed to be regulated by the cell in such a way that some objectivefunction, commonly associated with biomass production, is maximized subject to additionalconstraints on substrate uptake rates (119). In our case, the optimized biosynthesis rate istranslated to a growth rate by dividing by the cell mass. Ammonium and nitrite uptakerates are limited by substrate concentrations in a Monod-like fashion, thus constraining theachievable growth rates depending on the bioreactor’s chemical state (177, 292).The assumption of cells maximizing biosynthesis, subject to environmental and physiologicalconstraints, is rooted in the idea that evolution has shaped regulatory mechanisms to inducemaximum growth whenever possible (51, 176). This assumption is less valid for genetically52Transients of competitive exclusionengineered organisms or those exposed to environments that are radically different from theenvironments that shaped their evolution, and other objectives such as ATP production ormetabolic efficiency have been proposed (146, 417). Biosynthesis has been experimentallyverified as an objective for, among others, Saccharomyces cerevisiae, Escherichia coli andNitrosomonas europaea (107, 119, 364). Despite its limitations, FBA with maximization ofgrowth has greatly contributed to the understanding of several single-cell metabolic networksas well as metabolic interactions between cells (70, 129, 238, 354). One advantage of FBAmodels over full biochemical cell models is their independence of intracellular kinetics andgene regulation, which limits the number of required parameters to stoichiometric coefficientsand uptake kinetics. Recent work has shown that FBA-based models with maximization ofgrowth can accurately predict microbial community dynamics (70, 177, 284, 318).Our bioreactor model comprises multiple AOB and NOB strains, which are constructed byrandomly choosing several cell parameters around those of two template AOB and NOBmodels. The AOB and NOB templates were calibrated and validated beforehand usingdata from previous bioreactor experiments (Fig. 5.1; see Section D.1 for details). Becausemetabolites can be depleted or produced by several cells, the environmental metabolite poolmediates the metabolic interactions between cells (410). For example AOB deplete ammo-nium from their environment, rendering it a limiting resource that mediates competitionbetween AOB strains. The excretion of nitrite as a by-product, in turn, enables the growthof nitrite-limited NOB. The metabolic optimization of individual cells striving for maximalgrowth, while modifying their environment, thus leads to non-trivial community dynamicsthat can include cooperation, competition and extinction.5.3.3 Bioreactor community dynamicsFollowing inoculation of the bioreactor, two phases can generally be distinguished (Fig.5.2). Initially, the concentration of inflowing ammonium increases until AOB populationshave grown to sufficient densities to balance ammonium supply by ammonium consumption.The accumulation of nitrite as an AOB waste product, in turn, triggers the growth of NOBpopulations until nitrite production is eventually balanced by nitrite consumption. Thisinitial startup phase is dominated by fast-growing opportunists that benefit from an excessof substrates and little competition. The duration of this phase is mainly determined bythe hydraulic renewal rate, ammonium supply rate and bacterial growth rates, and theduration predicted by our model (roughly 3 weeks) is in line with typical nitrifying bioreactorexperiments (109, 302).53Transients of competitive exclusionAs ammonium and nitrite consumption increase, their concentrations decrease to near orbelow the survival thresholds for an increasing number of strains (Fig. 5.2C). This secondsaturation phase is characterized by low and relatively stable substrate concentrations, stag-nation of growth, a gradual extinction of less competitive strains and a long coexistence ofsimilar top competitors (Fig. 5.2A). The microbial community slowly converges to a stablecomposition of decreased diversity in which each metabolic niche is occupied by a singlestrain, with transient periods occasionally lasting up to several thousands of days. A grad-ual decrease in diversity is expected under the competitive exclusion principle of equilibriumecology (178) and is consistent with similar observations in previous bioreactor experiments(495). On the other hand, the total cell densities of metabolically similar strains (e.g.,AOB) stabilize much faster and only vary weakly during the saturation phase (Fig. 5.2B).Hence, each of the two available metabolic niches is rapidly filled by several competing andtemporarily coexisting strains, which are only slowly replaced by the top competitor.Our results show how transient dynamics of competitive exclusion can lead to a separation oftime scales characterizing functional and compositional stabilization of communities. Thisseparation of time scales is also expected to be reflected in the community’s response toperturbations. Perturbations such as mechanical biofilm removal (144) or nutrient shocks(461) can alter the relative abundances of individual clades or lead to a temporary collapseof the community. Such a collapse would initiate a race for the (re-)occupation of metabolicniches and a subsequent saturation phase, analogous to the dynamics following inoculation.For example, Gentile et al. (144) observed that, after the shearing of biofilms inside a fluidizedbed reactor, community composition recovered much slower, i.e., it had lower resilience (419),than the bioreactor’s performance.Simulations of the bioreactor model including a strong pulse perturbation, applied simulta-neously to the entire community, reproduced these observations (Fig. 5.3). The modeledperturbation corresponds to an increased mortality for one day, with a strength chosen ran-domly for each strain and resulting in a temporary collapse of the community by severalorders of magnitude. Consistent with experimental observations, the bioreactor’s perfor-mance quickly recovers within a few days to weeks (Figs. 5.3C,D), while the community’srecovery to its original composition typically takes several months to years (Figs. 5.3A,E,F).Metabolic niches are reoccupied rapidly and concurrently with the bioreactor’s functionalstabilization (Fig. 5.3B), however metabolic niches can be temporarily shared by severalcoexisting strains. Non-equilibrium processes, particularly following perturbation, are fre-quently thought to maintain high diversity, for example in rain forests (75) or phytoplankton(432). Furthermore, a meta-analysis by Shade et al. (419) found more studies reporting re-54Transients of competitive exclusioncovery of microbial community function than composition, following pulse perturbation.As predicted above by our competition model, the discrepancy between functional and taxo-nomic stability should be stronger for communities with high richness because the likelihoodof two similar top competitors increases, thus delaying competitive exclusion. Simulations ofbioreactors inoculated with different numbers of random strains verify this prediction. Forexample, the time until compositional convergence following inoculation, i.e., reaching a 90%Bray-Curtis similarity to the steady state (266), ranges from roughly 600 days for 20 strainsto 1300 days for 100 strains (median values, Figs. 5.2E,F). Moreover, richer communitiesare expected to be more prone to temporary changes in composition during perturbationbecause of a greater reservoir of opportunistic strains that could temporarily invade (524).This is reflected in our simulations, where a greater number of strains correlates with astronger change in community composition following the pulse perturbation (Figs. 5.3E,F).The insensitivity to disturbance is known in ecology as resistance and is, together with re-silience, a common measure of community stability (419). Our work suggests that microbialcommunities with higher functional redundancy have lower resilience and lower resistance topulse perturbation in terms of taxonomic composition.5.3.4 Variable does not mean unstablePrevious bioreactor experiments have revealed variable community composition despite sta-ble bioreactor performance over hundreds of days following inoculation (122, 495, 524, 548),while others have reported convergence to steady compositions within a few months (144,314, 533). Fluctuating community compositions are often interpreted as unstable, non-convergent or even chaotic. However, the observed dynamics may be mere transients ofslowly converging communities. Typical richness in bioreactors can range from hundreds tothousands of operational taxonomic units (OTUs, a species analog based on rDNA similarity)(234, 440, 495). As shown here, at these richness scales transient dynamics of competitiveexclusion can last several years. Much longer operation times might thus be needed to ac-tually observe an eventual community convergence in typical bioreactors. However, at thesetime scales other destabilizing processes, such as the invasion of new strains introduced bycontamination, could prevent community convergence.55Transients of competitive exclusion5.3.5 Model limitationsThe simple models considered in this chapter focus on generic ingredients of microbialecosystems, namely substrate-limited growth and competition, stoichiometric constraintson coexisting pathways, as well as physical substrate repletion and waste removal (e.g., incontinuous-flow bioreactors). In particular, we have assumed that microbial growth increaseswith increasing substrate concentrations, thus ignoring the possibility of substrate inhibition.For example, substrate inhibition can occur during nitrification by excess ammonia and ni-trous acid (15), resulting in reduced bioreactor performance (425). Similarly, growth mayalso be subject to product inhibition, e.g., when the partial pressure of accumulating wasteproducts renders a pathway unfavorable (225). Accurately modeling specific industrial se-tups or natural systems may thus require a consideration of more complicated kinetics, e.g.,including substrate and product inhibition. Further, metabolic niche structure in naturalsystems may be more complex than considered here since functional groups may partly orcompletely overlap, for example if a single organism performs both nitrification steps (89).Our main point is that long transient dynamics can emerge even in the simple cases con-sidered here, acknowledging that more complex communities are likely subject to furtherdestabilizing mechanisms (see below).5.3.6 Alternative destabilizing factorsTransient dynamics of competitive exclusion provide a simple explanation for the discrepancybetween functional and taxonomic stability of microbial communities, and our simulationsunderline the relevance of these processes at least to typical bioreactor setups. However,other mechanisms likely contribute to a long-term variability of community composition.For example, time lags associated with the degradation of organic matter, such as cellulosehydrolysis in anaerobic digesters (495), can result in slow changes of the metabolic land-scape and optimal electron flow, in turn driving adaptive changes in community composition(122). More complicated non-sequential pathways, ubiquitous in organic carbon catabolism,could also lead to positive feedback loops that further destabilize community dynamics. Fur-thermore, in contrast to well-controlled bioreactors, many natural ecosystems are subject tointense environmental variation that can drive adaptation and succession in microbial com-munities. For example, annual deep-water renewal in a seasonally anoxic fjord has beenshown to cause significant changes in microbial community structure (540).We emphasize that mechanisms that destabilize community composition need not necessarilydestabilize community function. For example, in open systems such as wastewater treatment56Transients of competitive exclusionplants (506) occasional invasion by novel competitors could drive species turnover withoutsignificantly affecting ecosystem functioning, however this scenario is unlikely in bioreac-tors with a sterile feed (495). Similarly, repeated adaptation of bacteriophages to dominanthosts (“killing the winner” dynamics) has been shown to sustain bacterial diversity and drivecontinuous species turnover (423, 475). Collapsing populations could be replaced by less sus-ceptible but functionally similar populations that ensure the overall stability of biochemicalfluxes.Reciprocally, negative feedback mechanisms stabilizing biochemical fluxes may only weaklyaffect community composition. For example, substrate build-up can promote the growthof functional groups benefiting from the underutilized resource, in turn counteracting theprocesses causing substrate build-up. This stabilizing mechanism, perhaps comparable to LaChâtelier’s principle of an “opposing force” (411), a priori acts on functional groups ratherthan taxonomy.5.3.7 Towards a pathway-centric microbial ecologyMarker gene-based taxonomic community profiling has become a standard approach in mi-crobial ecology (148, 524). However, metabolic functions may be performed by several com-peting clades and, conversely, members of the same clade can fill separate metabolic niches(3, 308). Such irregular metabolic trait distributions across clades are caused by diverseevolutionary processes, including adaptive loss of function and metabolic convergence accel-erated by frequent horizontal gene transfer (116). As a result, taxonomic profiles, at any tax-onomic level, often obscure the relationship between community structure and function. Forexample, environmental conditions and ecological function can show a stronger correlationto particular metabolic pathways or even individual genes, than to the distribution of par-ticular taxa (Chapter 2; 52, 349, 381). In fact, even physicochemically similar systems withsimilar functional community structure can exhibit markedly different taxonomic composi-tion (Chapter 3; 122). Consistent with this, as we have shown here, compositional stabilitycan be independent from functional stability while specific functional groups (for exampleAOB) remain synchronized with the community’s metabolic activity. Hence, prokaryotictaxa or OTUs should be questioned as ecologically meaningful units for describing commu-nity structure, at least when the focus is on ecosystem functioning (106, 145, 510). Microbialecology and biogeography might be best understood using pathway-centric theories in whichindividual genes, operons or pathways are considered as basic reproductive and functionalunits, particularly under conditions where metabolic function defines the microbial niche57Transients of competitive exclusionspace (109, 386). Accordingly, metagenomic, metatranscriptomic and metaproteomic profil-ing would be more suitable than taxonomic profiling for monitoring or predicting fluctuationsin ecosystem functioning (106, 181, 510). Alternatively, functional community structure maybe estimated from marker gene sequences by binning known OTUs into functional groups,as demonstrated in Chapters 2 and 3.5.4 ConclusionsConvergence of microbial community composition is a gradual process that can last muchlonger than typical bioreactor experiments and environmental surveys. Transient dynamicsof competitive exclusion explain why microbial communities can remain variable long afterinoculation or perturbation, while exhibiting high functional stability. The correct interpre-tation of observed community dynamics in bioreactors and natural ecosystems thus requiresa proper consideration of the involved time scales. Previous work has highlighted the gen-eral mismatch between the duration of typical experiments and the time scales assumed byconventional steady-state ecological theories (180), and our work demonstrates some of theimplications of this mismatch. Fluctuations in natural and less controlled microbial commu-nities likely result from several destabilizing processes, however the effects of these processescould be augmented by transient dynamics of competitive exclusion.Furthermore, less resilient and more flexible communities need not imply a compromisedfunctional stability, and previous experiments have indeed indicated a positive correlationbetween flexible community structure and stable performance (123). Several competingstrains can rapidly and concurrently fill a metabolic niche when opportunities arise, whileslowly replacing each other and maintaining constant performance during saturation. Thetime required for convergence or recovery of community composition correlates positivelywith functional redundancy, because more competitors are likely to have similar efficienciesunder substrate limitation.The extreme case in which each functional group consists of equal competitors is comparableto the so called emergent group theory in ecology, according to which assembly within eachgroup is subject to neutral dynamics (185, 197). In that limit transient periods of com-petitive exclusion can be extremely long, while community composition appears dissociatedfrom environmental conditions and driven by purely stochastic factors. While exact neu-trality is an extreme idealization, some natural communities may indeed include functionalgroups consisting of almost-equal competitors. For example, previous work on a wastewater58Transients of competitive exclusiontreatment plant found that fluctuations within the group of ammonia oxidizing bacteria,as well as within the heterotrophic community, were predominantly explained by neutralprocesses rather than environmental factors (349). Similarly, a global study of desert mi-crobial communities by Caruso et al. (64) found that climatic effects were detectable atthe whole community level, but became undetectable when restricted to variations withinthe photosynthetic or heterotrophic groups. Frossard et al. (134) found that spatiotempo-ral variations of microbial community structure in stream catchments were best describedby a neutral assembly model, whereas potential activities of several carbon-, nitrogen- andphosphorus-acquiring enzymes showed clear seasonal patterns. This disconnect betweenpotential enzyme activities and the composition of their host communities indicates highfunctional redundancy and a decoupling between community function and taxonomy. Simi-larly, Yin et al. (537) showed significant functional redundancy in soil microbial communitiesby measuring population responses to enrichment with individual carbon sources. However,it is unclear whether all detected OTUs were active prior to the enrichment, and at any pointin time a significant fraction of functionally similar clades may have only been present at lowdensities (418). Furthermore, subtle partitioning along additional non-functional axes suchas moisture or pH may create micro-niches that enable the long-term coexistence of function-ally similar populations, particularly in spatially or temporally heterogenous environments.For example, the coexistence of hundreds of subpopulations of the marine cyanobacteriumProchlorococcus is likely enabled by subtle niche differentiation such as adaptations to differ-ent nutrient availabilities (224). The role of neutrality in natural microbial communities andits proper reconciliation with niche theory remains controversial, and patterns that appearto result from neutral drift may in fact have underlying deterministic causes (see Chapter6). Nevertheless, our work shows that approximate neutrality within ecological niches canexplain several patterns of microbial community assembly in engineered environments andshould also be considered when interpreting the dynamics of natural microbial communities.59Transients of competitive exclusionA B DNH+4NO−2NO−3AOBNOBC EFigure 5.1: Calibration and validation of the template AOB and NOB cell models to data from anexperiment with a nitrifying batch-fed bioreactor (A) (94). Ammonium (NH+4 ) was added at thebeginning of the experiment, and was sequentially oxidized to nitrite (NO−2 ) and nitrate (NO−3 ) bya growing nitrifier community. (B) and (C): Ammonium (B) and nitrate (C) concentration timeseries data (dots), compared to the calibrated model (continuous lines). (D): AOB and NOB celldensities over time, as predicted by the model. (E): Nitrite concentration over time, as predictedby the model. Note that while the template cell models were calibrated using batch-bioreactorexperiments, for our subsequent analysis we consider continuously-fed flow-through bioreactorsbecause these can support metabolically active microbial communities at steady state.60Transients of competitive exclusionA C Est. saturation startup saturation st. saturationB D FFigure 5.2: Simulations of the nitrifying bioreactor under constant conditions. (A):Simulated cell densities over time in the ammonium-fed nitrifying membrane bioreactor, inoculatedwith 100 random strains (AOB in variations of red, NOB in variations of blue). (B) Correspondingtotal cell densities per functional group (AOB or NOB). (C) Corresponding ammonium (NH+4 ),nitrite (NO−2 ) and nitrate (NO−3 ) concentrations. (D) Corresponding community-wide ammoniumand nitrite uptake rates. (E,F): Distance of the community from the long-term steady state (interms of the Bray-Curtis dissimilarity index; 266), following inoculation with 20 (E) or 100 (F)random strains. Shown as a probability distribution over 100 random simulations (colors correspondto centiles). Notice the faster rate of convergence to steady state (i.e., resilience) in (E) compared to(F). The two intervals on the top of figures (A,C,E) indicate rough startup and saturation phases,respectively.61Transients of competitive exclusionA C EB D FFigure 5.3: Simulations of the perturbed nitrifying bioreactor. (A): Simulated cell densitiesover time in the ammonium-fed nitrifying membrane bioreactor, inoculated with 100 random strains(AOB in variations of red, NOB in variations of blue). A strong perturbation during day 5000(grey arrow) causes a temporary collapse of the microbial community. (B) Corresponding totalcell densities per functional group (AOB or NOB). (C) Corresponding ammonium (NH+4 ), nitrite(NO−2 ) and nitrate (NO−3 ) concentrations, at times near the perturbation. (D) Correspondingcommunity-wide ammonium and nitrite uptake rates, at times near the perturbation. (E,F): Bray-Curtis dissimilarity of the community to the state shortly prior to the perturbation, in bioreactorsinoculated with 20 (E) or 100 (F) random strains. Shown as a probability distribution over 100random simulations (colors correspond to percentiles). Notice the greater resistance to perturbationand greater resilience in (E), compared to (F).62Microbial communities infected by phagesChapter 6Taxonomic variability and functional sta-bility in microbial communities infectedby phages16.1 SynopsisAs shown in Chapters 2 and 3, microbial communities can display intense variation in taxo-nomic composition across space or time, and yet this taxonomic variation can coincide withstable metabolic functional community structure and constant biochemical performance.This decoupling between taxonomic and functional community structure is presumably en-abled by a high functional redundancy in the global microbiome, however the mechanismsdriving the sustained taxonomic variation within functional groups remain largely unknown.Predation by specialist lytic phages leading to “killing the winner” dynamics has been sug-gested as a potential cause of host turnover, however the plausibility and required conditionsfor this scenario have not been rigorously examined. Further, it is unknown how preda-tion by phages affects community metabolic processes and whether these effects are actuallymitigated by functional redundancy in the host populations. Here we address these issuesusing a model for a methanogenic microbial community that includes several interactingmetabolic functional groups. Each functional group comprises multiple competing strainswith distinct physiological parameters, and each strain is subject to predation by a specialistlytic phage. We find that phages induce intense taxonomic turnover within each functionalgroup, resembling the variability observed in past experiments. The functional compositionand metabolic throughput of the community are also disturbed by phage predation, but theybecome more stable as the functional redundancy in the community increases. Our work re-veals explicit mechanisms by which functional redundancy stabilizes ecosystem performance1A version of this chapter is under peer review for publication (see the Preface for author contributions):Louca, S., Doebeli, M. (in review). Taxonomic variability and functional stability in microbial communitiesinfected by phages.63Microbial communities infected by phagesand supports the interpretation that biotic interactions — rather than random populationdrift, as often suggested — drive taxonomic turnover within functional guilds.6.2 IntroductionMicrobial metabolism powers global biogeochemical fluxes (116) and is a key componentin many engineered ecosystems, such as biofuel production units or wastewater treatmentplants (16). Understanding the mechanisms shaping microbial community composition andfunction is thus paramount towards predicting ecosystem responses to anthropogenic changeand towards optimizing the performance of microbially driven industrial processes (316).Microbial communities can exhibit strong variation in taxonomic composition, both acrosstime and space, and yet this taxonomic variation can coincide with remarkably stable func-tional community structure (109, 288, 349, 397). For example, the proportions of severalmetabolic functional groups, such as nitrifiers, photoautotrophs and sulfate reducers, werefound to be very similar across replicate natural aquatic ecosystems despite strong taxonomicturnover within individual functional groups (288). Similarly, methanogenic and nitrifyingbioreactors operated under constant conditions were found to exhibit intense turnover ofbacterial operational taxonomic units (OTUs) despite constant overall biochemical perfor-mance (122, 123, 506, 524, 548). These observations point towards an elegant paradigmin microbial ecology, in which energetic and stoichiometric constraints determine functionalcommunity structure and performance, but a high degree of functional redundancy in theglobal microbiome enables taxonomic variability within individual functional groups. Theprecise mechanisms causing this taxonomic variability remain largely unknown. Neutralpopulation drift between equivalent competitors is sometimes suggested as a possible cause(349, 427), however non-random phylogenetic structure and co-occurrence patterns withinfunctional groups point towards deterministic community assembly mechanisms, notablybiotic interactions (288). In some cases complex non-equilibrium community trajectorieshave been reproduced across replicate isolated systems under constant conditions, furthersuggesting that the taxonomic variation within functional groups is not random (24, 495).Host-specific predation by lytic phages has been suggested as a potential mechanism promot-ing host succession through “killing the winner” (KTW) dynamics, in which abundant hostpopulations eventually collapse due to increased specialist predation, giving way to oppor-tunistic competitors (322, 397, 398, 423, 499). For example, previous experiments revealedstrong OTU turnover in a bioreactor, in which the temporary emergence of specific OTUs wasfollowed by the temporary increase in their specific phages, consistent with KTW dynamics64Microbial communities infected by phages(424). Predation by lytic phages is increasingly recognized as an important contributor tomicrobial mortality in natural and engineered ecosystems (135, 322, 476), and viral lysis hasbeen shown to significantly reduce the flux of microbially assimilated organic carbon to highertrophic levels (e.g., to protist grazers; 135, 324). On the other hand, the effects of phageson dissimilatory carbon transformations (e.g., via respiration and fermentation) are muchless understood (423), despite the fact that in many (e.g., methanogenic) environments mostorganic carbon is metabolized to byproducts for energy gain rather than assimilated into cellmass (312). Further, it is unclear whether — and under what conditions — phage-drivenKTW dynamics can actually explain the extreme discrepancy between taxonomic variabilityand functional stability observed in microbial communities (109, 288, 349, 397). While thisrole of KTW dynamics is often hypothesized, in practice time lags involved in the recovery orreplacement of collapsed populations, as well as potentially destabilizing interdependenciesbetween metabolic pathways, could prevent the functional stability of communities.To elucidate the effects of phage predation and microbial functional redundancy on com-munity structure and metabolic functioning, we constructed a mechanistic model for amethanogenic microbial community subject to predation by lytic phages, hosted within ananaerobic flow-through bioreactor. Bioreactors constitute powerful model ecosystems formicrobial ecology, because physicochemical conditions can be closely monitored and con-trolled, enabling replicate time series experiments. It is thus not surprising that bioreactorexperiments have greatly contributed to our mechanistic understanding of microbial com-munity assembly and of phage-host dynamics in particular (122, 312, 349, 423, 491). Wechose methanogenic bioreactors as a template for our model because methanogenic metabolicnetworks are well understood and of great industrial relevance (77, 179) and because thisallows for comparisons with previous experiments (122, 123).Our model considers the population dynamics of multiple microbial functional groups in-volved in the anaerobic catabolism of glucose (the input substrate) to methane (CH4), aswell as the concentrations of any intermediate metabolites (77; overview in Fig. 6.1). Specif-ically, in the first step input glucose is fermented to short-chain fatty acids, lactate andalcohols by several bacterial functional groups. These fermentation products are then fur-ther catabolized to hydrogen (H2) and acetate by “syntrophs”, i.e., bacteria that rely onthe rapid consumption of H2 and acetate by hydrogenotrophic (“H2/CO2”) and acetoclasticmethanogenic archaea. Each functional group initially comprises one or more distinct celllineages — henceforth referred to as operational taxonomic units (OTU), which catalyze thesame reaction but differ in several of their physiological parameters, such as their substratehalf-saturation constants. We use “OTU” as an abstraction representing a taxonomic group65Microbial communities infected by phages(such as a strain or species) that is sufficiently narrow so that reaction kinetics are simi-lar across members, and sufficiently broad so that different OTUs are infected by differentspecialist phages (69, 174, 521). The number of OTUs initially present in each functionalgroup (termed “functional redundancy”) is a key parameter in our analysis and accountsfor the presence of multiple functionally similar lineages in many bioreactors and naturalenvironments (128, 224, 288, 289, 349, 526, 537). Note that here functional redundancyonly refers to the number of redundant OTUs at the beginning of our simulations, while wemake no assumptions on the long-term persistence of OTUs. Each OTU is associated witha distinct phage population that infects cells and causes increased mortality through celllysis. Cell infection rates are proportional to phage concentrations and phage concentrationsare, in turn, driven by cell lysis rates. Physiological parameters were chosen randomly foreach OTU and each phage within realistic ranges (Table E.2), to account for the variationtypically seen between strains or species (207, 328, 352). As we describe below, our modelsuccessfully reproduces previous experimental observations and yields novel insight into theeffects of phage predation and functional redundancy on microbial community compositionand function.6.3 Results and discussion6.3.1 Bioreactor dynamics in the absence of phagesFollowing “startup” of the bioreactor, and in the absence of phages, the successive growthof fermenters, syntrophs and methanogens quickly leads to the stabilization of metabolicactivity within a few weeks (Fig. 6.2A). At this stage, microbial metabolism balances glucosesupply and residual substrate loss from the bioreactor, although the exact steady statemetabolite concentrations and community composition depend on the random parameterschosen. Competitive exclusion between reactions that are limited by the same substrateseventually leads to the persistence of only a subset of possible pathways driving glucosecatabolism to CH4 and CO2. Each reaction is eventually performed by at most one remainingOTU, characterized by its ability to persist at the lowest substrate concentration (Figs.6.2B,C). The bulk of biomass is attributable to fermenters and, to a lesser extent, syntrophs.Methanogens only account for a small fraction (< 1%) of the community because most ofthe energy available from glucose catabolism is extracted in the preceding steps (77).66Microbial communities infected by phages6.3.2 Effects of phages on community dynamicsWhen phages are included in the model, specialist predation by phages leads to intense andirregular fluctuations of individual host populations in accordance to KTW dynamics (Fig.6.2D). The duration of each infection cycle (i.e., the period from the initial detection to theeventual collapse of a phage population) varies greatly between phage-host pairs and betweensimulations, but is typically on the order of 20–150 days, consistent with similar durationsobserved in activated sludge (423). For many phage-host pairs these fluctuations closelyresemble classical predator-prey cycles, although most cycles are irregular in their phase andamplitude (Fig. E.3), owing to their strong indirect interactions. Such complex — oftenchaotic — dynamics are common in systems composed of interacting oscillating componentswith distinct random frequencies (161). When averaged over time, predation by phages hasdetrimental effects on individual cell populations as well as on overall reaction rates. Inparticular, in the absence of any functional redundancy, average methane production dropsdown to less than 1% of the production that would typically be achieved in the absence ofphages. Our model suggests that this reduction in performance can occur in at least two ways:First, increased cell mortality through cell lysis results in fewer cells that could consume aparticular substrate before it is lost from the bioreactor. Second, because phage predationis biased towards dominant OTUs, it skews selection towards potentially less competitive(in terms of metabolic efficiency) OTUs within each functional group, leading to residualsubstrate concentrations than are higher than the equilibrium substrate concentrations ofthe top competitor (479). Due to the temporal delays involved in the recovery of populationsor the opportunistic invasion by competitors, phages not only reduce the average metabolicthroughput, but also induce fluctuations around that average (Fig. 6.3C). This may explainpreviously observed fluctuations of bioreactor performance that could not be fully explainedusing purely energetic and reaction-kinetic models (109, 286, 429).Neutral demographic drift between equivalent competitors has previously been suggested asan explanation for seemingly random OTU turnover within functional groups in a wastew-ater treatment plant (349). In such systems, however, cell densities can be extremely high(e.g., ∼ 109 cells · L−1 in lakes and up to 1013 cells · L−1 in bioreactors; 249, 518) and hencedeterministic dynamics such as competitive exclusion between OTUs with even slight phys-iological differences are expected to dominate over pure demographic drift. For example,even at a population size of only 105 cells and a constant difference in growth rates of only1% between two competing OTUs, stochastic trajectories accounting for demographic driftwould closely resemble the deterministic trajectory of exponential decline of the weaker com-petitor (e.g., R2 ∼ 0.98 for a birth-death model with constant combined population size, see67Microbial communities infected by phagesMethods for details). In contrast, our model shows that simple deterministic mechanismscan easily explain the sustained turnover between non-equivalent competitors observed inrealistic settings, without the need for unrealistic neutral models.6.3.3 Functional redundancy promotes functional stabilityWhen considering multiple degrees of functional redundancy, we find a clear trend towardshigher as well as more stable overall methane production rates at elevated functional redun-dancies (Figs. 6.3A,B). This suggests that the opportunistic growth of functionally similarOTUs may mitigate the detrimental effects of phage predation on community function byfilling underutilized metabolic niches, thereby increasing and stabilizing overall communityfunction. The probability that an appropriate alternative OTU is able to quickly replacea collapsing competitor increases with the functional redundancy in the initial inoculum(i.e., the available “seed bank”). Functional niche complementation is generally thought topromote a positive correlation between community richness and functional stability againstexternal environmental perturbations (46, 365, 480, 522). Our work suggests that func-tional complementation in microbial communities also mitigates the detrimental effects thatintrinsically emerging (rather than externally driven) fluctuations can have on ecosystemfunctioning. This may explain the occasionally observed positive relationship between mi-crobial species diversity and biochemical performance, particularly at low diversities, evenin the absence of external perturbations (163, 516, 526). The mechanism proposed hereis fundamentally different from known mechanisms leading to a positive diversity-stabilityrelation in classical food webs, as these mechanisms either involve a differential responseof competitors to environmental perturbations (282) or are based on a skewed distributiontowards weaker consumer-resource interactions (310). We mention that phage-driven KTWdynamics have been previously hypothesized to stabilize bioreactor performance by prevent-ing competitive exclusion and hence maintaining a high functional redundancy that would,in turn, increase resilience to perturbations (423). Our analysis suggests that this interpre-tation may be slightly misleading, because phage predation itself can severely reduce anddestabilize community function, while functional redundancy (e.g., ensured by a rich inocu-lum or exposure to a large pool of potential colonizers) acts to mitigate these destabilizingeffects.At increased functional redundancy, our model predicts that the proportions of various func-tional groups (in terms of relative cell abundances) become more stable, although the extentof this stabilization depends on the functional groups considered (Fig. 6.4I–L). The sta-68Microbial communities infected by phagesbilization of functional community structure is especially pronounced at a “coarse” level,i.e., when considering the proportions between fermenters, syntrophs and methanogens (Fig.6.4I). Specifically, the coefficient of variation of these coarse groups drops from ∼0.9 (medianvalue) in the absence of functional redundancy down to ∼0.2 at 100-fold functional redun-dancy. Since these groups represent indispensable and stoichiometrically coupled catabolicsteps in the bioreactor, it is not surprising that their relative productivities are subjectto strong stabilizing forces. Our model thus provides an explicit explanation for the dis-crepancy between stable functional community profiles and the highly variable taxonomicprofiles often observed under constant environmental conditions (109, 288, 349, 397), andhighlights the central role of functional redundancy coupled with biotic interactions in pro-moting this discrepancy. In particular, our results support the interpretation that the strongOTU turnover observed within functional groups in previous studies and in Chapter 3 was atleast partly caused by biotic interactions including (but not necessarily limited to) predationby phages, rather than by neutral drift. Prolonged transients of competitive exclusion mayalso lead to slow OTU turnover within functional groups, especially following environmentalperturbation (Chapter 5), although the detection of correlated OTU and phage successionin bioreactors provides additional evidence for KTW dynamics in these systems (397, 424).On a finer resolution, i.e., when considering the proportions of single-reaction functionalgroups (e.g., acetoclastic vs H2/CO2 methanogens), the extent of stabilization depends onthe specific set of functions considered (Figs. 6.4J–L). For example, individual ferment-ing functional groups (A–F in Fig. 6.1) appear interchangeable even at high functionalredundancies (Fig. 6.4J), presumably because these groups represent strongly overlappingmetabolic niches (they all ferment glucose). Fluctuations between these reactions, in turn,are predicted to drive comparably strong fluctuations in the proportions of syntrophic func-tional groups (G–J in Fig. 6.1) specializing on different fermentation products (Fig. 6.4K).Irregular transitions between alternative (“parallel”) catabolic electron flows — congruentlywith stable overall catabolic performance — have indeed been observed in previous exper-iments (122). In contrast, the proportions of the two methanogenic groups (K,L in Fig.6.1) quickly stabilize at increasing functional redundancy. These findings reveal that thestability of functional community structure depends on the precise definition of functionalgroups, because non-identical functional groups may be interchangeable in case of overlap(365). A distinction between parallel and sequential functional groups is particularly crucialfor “branched” metabolic networks such as organic carbon catabolism but may be less rele-vant for more sequential functions such as nitrification (oxidizing ammonium to nitrite andthen nitrate) or denitrification (reducing nitrate to nitrite, nitric oxide, nitrous oxide and69Microbial communities infected by phageseventually nitrogen gas; 144).Although not modeled, phage-host co-evolution could further destabilize population dynam-ics within functional groups, for example due to the repeated emergence of new resistantstrains. In addition, rapid evolution of host resistance, for example via clustered regularlyinterspaced short palindromic repeats (CRISPRs; 13), could buffer the impacts of viruseson overall ecosystem functioning (269). While evolution is likely an important componentof phage-host dynamics in natural systems (156, 322, 424), our work suggests that deter-ministic ecological dynamics are sufficient to explain the succession of OTUs often observedduring constant community function. In fact, Shapiro et al. (424) observed rapid successionbetween distantly related OTUs and their associated phages in a bioreactor, suggesting thatreplacement by non-related competitors — rather than adaptive evolution of resistance —was indeed the main mode of host succession.We note that the destabilizing role of phage predation and the stabilizing role of functionalredundancy, as predicted by our model, rely on the assumption that lytic phages are special-ized on single host populations (a prerequisite for KTW dynamics). High host specificities ofphage infectivity (e.g., at the strain or species level) are generally considered to be the rule(69, 174, 521), however it is becoming apparent that host specificity may be more variablethan previously assumed (191, 423). In reality multiple microbial clades may be susceptibleto the same phages and hence the effective functional redundancy in the system may be muchlower than the actual number of bacterial and archaeal strains. In cases where host specificityis the exception rather than the rule, the dynamics are expected to be markedly differentthan predicted here. For example, experimental and theoretical work shows that generalistpredation by protist grazers can severely and permanently reduce community function evenat high prey diversity (213, 366, 474, 476). Further, temperate phage strategies, not consid-ered here, likely have less severe effects on host populations and metabolic throughput thanpredicted by our model. For example, recent findings suggest that phages may be switchingfrom lytic to lysogenic during high host abundances, a behavior termed “piggy-backing thewinner” (243), and this switching may dampen oscillations in host abundances. Hence, inenvironments characterized by temperate — rather than lytic — phage-host interactions (asmay be the case for the human gut, 391), KTW dynamics will be less pronounced and hencemicrobial communities may be more stable in terms of their taxonomic composition. Further,recent work suggests that metabolic host reprogramming by viruses can direct energy andnutrients towards viral replication, potentially altering biogeochemical cycling (199, 405).It is likely, however, that the time scales of such host-phage co-evolutionary dynamics aremuch longer than — and thus of limited relevance to — the ecological successional dynamics70Microbial communities infected by phagesconsidered here.6.3.4 Statistical averaging or dynamic stabilization?As discussed above, a high functional redundancy in the microbial community can reversethe destabilizing effects that KTW dynamics have on net metabolic activity and on theabundances of functional groups (Figs. 6.4A–L). Such stabilizing effects at increased speciesrichness have been hypothesized in the past, based on the expectation that the effects ofmultiple fluctuating populations on broad community properties (such as total biomass oroverall catabolic activity) ought to “average” each other out (267, 283, 480). According tothis interpretation, which assumes that populations fluctuate independently, the stabiliza-tion of broad community properties at elevated functional redundancy is a simple statisticalnecessity (46, 105). In reality, however, populations are inevitably coupled due to competi-tion for resources and metabolic interdependencies, and it is a priori unclear whether theseinteractions lead to stronger or weaker averaging effects (481). For example, fluctuationsin metabolite concentrations caused by changes in specific microbial populations will affectthe growth of all functional groups consuming or producing the particular metabolites, andthese effects will typically act in a similar direction on all members within a functional group.Such positive correlations in the response of competing populations to chemical perturba-tions would act against stabilization by averaging. On the other hand, the collapse of aparticular population due to cell lysis eventually frees a metabolic niche that can be occu-pied by competing populations, and such a “dynamic stabilization” would likely lead to ahigher functional stability when compared to mere statistical averaging. A distinction be-tween the two stabilizing mechanisms — statistical averaging vs dynamic stabilization — iskey to understanding the effects of (metabolic) community interactions on overall functionalstability.To assess whether the functional stabilization observed in our simulations at high functionalredundancy is a mere averaging effect or dynamic, we compared the coefficients of variationof functional group abundances to a null model in which the time course of each cell pop-ulation was randomly shifted in time. This null model resembles the hypothetical scenarioin which populations fluctuate independently of one another, while preserving the broadfeatures of these fluctuations. We defined the “degree of dynamic stabilization” (DDS) in aparticular simulation as the probability that the null model would lead to a higher coefficientof variation (averaged across functional groups) than observed, estimated through repeatedrandom trials. We find that coarse functional group proportions exhibit a high DDS that71Microbial communities infected by phagesapproaches 1.0 at high degrees of functional redundancy (Fig. 6.4M), consistent with theinterpretation that opportunistic competitors quickly replace collapsing populations. Onthe other hand, we find contrasting effects for the proportions of single-reaction functionalgroups. Specifically, the proportions of various fermenting groups exhibit the lowest DDS(∼ 0.1− 0.6, Fig. 6.4N), while the proportions of the two methanogenic groups exhibit thehighest DDS (up to ∼ 0.9 − 1.0, Fig. 6.4P). This is consistent with the interpretation thatgroups consuming similar substrates (e.g., glucose in the case of fermenters) are highly inter-changeable and hence their proportions are only weakly stabilized. On the other hand, therapid stabilization of H2 and acetate concentrations at high functional redundancies appearsto promote a stable ratio between H2/CO2 and acetoclastic methanogen populations.We find that the total cell concentration is only occasionally dynamically stabilized and isoften less stable than predicted purely based on statistical averaging (DDS∼0.1–0.7 at 100-fold functional redundancy). This means that fluctuations of single populations can lead tosynchronized and similarly signed responses across multiple functional groups (e.g., a collapseof the dominant glucose fermenters also induces a temporary collapse of methanogens). Inconsequence, the absolute abundances of individual functional groups are less stable thantheir relative proportions, even at elevated functional redundancies. This means that caremust be taken when assessing the stability of functional community profiles, for examplebased on metagenomic sequences (97), because such profiles generally only reflect the relativebut not the absolute abundances of functional groups.6.3.5 Phages promote prokaryotic diversityClassical competition theory predicts that at steady state only a single competitor can per-sist within a metabolic functional group that is limited by a single substrate, and this com-petitor is determined by its ability to survive at the lowest possible substrate concentra-tion (178, 479). Consistent with this competitive exclusion principle, when we excludedphages from our model each functional group was eventually occupied by at most one OTU(e.g., Fig. 6.2C). In reality, however, microbial richness can be extremely high even insimple engineered ecosystems, such as nitrifying bioreactors (109), methanogenic digesters(123, 150) or activated sludge (234), where it can range from hundreds to thousands ofOTUs. Such high richness is in apparent contradiction to the competitive exclusion prin-ciple. Mechanisms proposed in the past to explain this discrepancy include slow transientdynamics of competitive exclusion far from steady state (285), negative frequency depen-dence through non-linear substrate dependencies (281, 449), externally driven fluctuations72Microbial communities infected by phagesof resource availability or physical conditions (432, 449), as well as phage-driven KTW dy-namics (48, 320, 424, 474, 475, 512). Our work provides support for KTW dynamics as alikely explanation for sustained high OTU richness within metabolic functional groups, es-pecially when high richness is observed in the absence of obvious environmental fluctuations(122, 349, 424). For example, in our simulations with 20-fold functional redundancy, in thelong term each functional group is typically occupied by 5–10 OTUs, each of which occasion-ally and temporarily increases in relative abundance (e.g., Figs. 6.2E,F). Similarly, as men-tioned earlier, the repeated disruption of reactions prevents the long-term mutual exclusionbetween alternative reactions limited by the same substrates, resulting in a “parallelization”of catabolic fluxes. Notably, during most simulations multiple fermenting groups would con-tribute (occasionally or at all times) to the catabolism of glucose. Phage-induced catabolicparallelization can thus maintain functional diversity in addition to diversity within func-tional groups and may explain the cooccurrence of alternative intermediate products (e.g.,multiple fatty acids) sometimes observed in methanogenic digesters (179, 533). We notethat in certain cases (e.g., at high functional redundancy) slow transients of competitiveexclusion can also lead to prolonged coexistence of OTUs with similar metabolic efficiencies(e.g., Fig. 6.2A,B and Chapter 5), and in reality multiple mechanisms likely contribute tothe maintenance of richness within a community.6.4 ConclusionsMicrobial communities can display strong taxonomic variation across space (288, 307) ortime (109, 349, 397) even under similar environmental conditions, while exhibiting relativelyconstant functional community structure. In fact, recent work showed that physical andchemical conditions strongly predict the distribution of metabolic functional groups acrossthe world’s ocean, but only poorly explain the taxonomic composition within individualfunctional groups (289). Both Louca et al. (288) and Louca et al. (289) found no effect ofspatial dispersal limitation on community differences, and Louca et al. (288) showed thatOTUs within individual functional groups displayed non-random phylogenetic relationshipsand non-random co-occurrence patterns that were at least partly caused by biotic inter-actions. These findings suggest an elegant paradigm for microbial ecology, in which thefunctional structure and the taxonomic composition within functional groups constitute twoseparate facets of community composition, with the former being driven by stoichiometricand energetic environmental constraints, while the latter is heavily shaped by complex bi-otic interactions. Our mechanistic model provides further support for such an interpretationby explicitly demonstrating how — and under which conditions — predation by specialist73Microbial communities infected by phagesphages can drive OTU turnover while maintaining constant functional community structureand metabolic performance. Experimental evidence for phage-driven KTW dynamics re-mains limited (322, 357, 397, 398, 424, 499), mostly due to the technical difficulties involvedin virome profiling and in linking particular phages to their specific hosts in natural ecosys-tems. Nevertheless, many natural environments, including the open ocean (462), soil (18)and the human gut (391), exhibit high phage densities and it is likely that phages also con-tribute to the decoupling between taxonomic and functional community structure in theseenvironments (134, 288, 289, 307, 379, 381).The decoupling between functional and taxonomic community structure, especially at highfunctional redundancy, has important implications for the interpretation of microbial bio-geographical patterns, because variation in taxonomy need not imply differences in commu-nity function. Reciprocally, environmental constraints determining community function mayonly poorly explain the distribution of individual taxa. If phage-induced KTW dynamicsindeed strongly shape the taxonomic variation within functional groups, as suggested bythis and previous work, then the aspiration of accurate microbial species distribution models(258, 455) may turn out to be a Sisyphean struggle, especially when important functionaltraits are non-monophyletic (308). Disentangling the functional and taxonomic variationin microbial communities is thus an important prerequisite for a truly predictive microbialecology.74Microbial communities infected by phagesglucoseacetate H2ethanol lactate butyratepropionatemethaneAKGglucose fermentationsyntrophy (catabolism of alcohols, lactate and SCFA)acetoclastic and H2/CO2 methanogenesisB C D E FH I JLFigure 6.1: Modeling methanogenic communities. Simplified metabolic network of anaerobicmethanogenic communities (77), as modeled in this study. Each circle represents one of 12 func-tional groups specialized on a particular metabolic reaction and each reaction can be performedby multiple competing OTUs (“functional redundancy”). The network is roughly structured into3 sequential “coarse” functional groups based on the type of substrate used (fermentation, syntro-phy and methanogenesis). Detailed reaction stoichiometry in reference to the inscribed letters isprovided in Table E.1.75Microbial communities infected by phages01e+092e+093e+09500 600 700 800 900 1000cell abundances (cells/L)time (days)Species proportions (HI_20x, run_21, all species)02e+114e+116e+118e+110 200 400 600 800 1000cell abundances (cells/L)time (days)Species abundances (NI_20x, run_09, all species) 200 400 600 800 1000proportionstime (days)methanogens_ac OTU proportions (NI_20x, run_09) 200 400 600 800 1000proportionstime (days)methanogens_H2 OTU proportions (NI_20x, run_09) 200 400 600 800 1000proportionstime (days)methanogens_H2 OTU proportions (HI_20x, run_21)C H2/CO2 methanogens B acetoclastic methanogensall OTUsA0246810¹¹ cells/L12310⁹ cells/LFEDno phageswith phagesOTU proportionsOTU proportions00. 200 400 600 800 1000proportionstime (days)methanogens_ac OTU proportions (HI_20x, run_21)1time (days)time (days)2 4 6 8time (days)ethanol-consuming syntrophethanol-producing fermenterethanol-consuming syntrophFigure 6.2: Phage predation drives OTU turnover. (A) OTU abundances (one color perOTU) during a single simulation without phage predation, at 20-fold functional redundancy. Com-petitive exclusion leads to the extinction of all but a single fermenter and all but two very similarsyntrophs (although eventually only one syntroph remains, not shown). Methanogens exhibit muchlower abundances than fermenter and syntrophs and are thus not visible. (B,C): Proportions ofacetoclastic methanogens (B) and proportions of H2/CO2 methanogens (C) (one color per OTU)during the same simulation as (A). (D–F): Analogous to (A–C), but for a simulation includingphage predation. Phage-host interactions drive variation in overall cell abundances (D) as well asin the OTU composition within functional groups (E,F).concentration (μM)functional redundancy1x 5x 20x 50x 100xA B0306090mean effluent methane C effluent methane D effluent methane 0 40 80 120 500  600  700  800  900  1000µMtime (days)methane concentrations (100x, run_23) 0 40 80 120 500  600  700  800  900  1000µMtime (days)methane concentrations (100x, run_23)µMimethane concentrations (20x, run_21)methane concentration (μM)01234CV of effluent methane20x functional redundancy 100x functional redundancyfunctional redundancy1x 5x 20x 50x 100x   500  600  700  800  900  1000µMti e (days)t  c c tr ti s (2 x, run_21) 0 40 80 120 500  600  700  800  900  1000µMtime (days)methane concentrations (20x, run_21)coefficient of variationFigure 6.3: Functional redundancy increases and stabilizes methane production in thepresence of phages. (A,B): Temporal (A) averages and (B) coefficients of variation of effluentmethane concentration, for various degrees of functional redundancy. Box plots represent the dis-tribution across 100 random simulations. Vertical bars comprise 95% percentiles. (C,D): Methaneconcentrations during a single simulation at (C) 20-fold and (D) 100-fold functional redundancy.76Microbial communities infected by phages00. 600 700 800 900 1000proportionstime (days)Methanogenic gene proportions (HI_100x, run_22) 600 700 800 900 1000proportionstime (days)Syntrophic gene proportions (HI_100x, run_22) 600 700 800 900 1000proportionstime (days)Fermenting gene proportions (HI_100x, run_22) 600 700 800 900 1000proportionstime (days)Coarse gene group proportions (HI_100x, run_22) 600 700 800 9 0 1000proportionstime (days)Fermenting gene proportions (HI_20x, run_21) 600 700 800 900 1000proportionstime (days)Syntrophic gene proportions (HI_20x, run_21) 600 700 800 9 1000proportionstime (days)Coarse gene group proportions (HI_20x, run_21) 600 700 800 9 1000proportionstime (days)Methanogenic gene proportions (HI_20x, run_21)functional redundancyDDSfermenting groupscoarse functional groups syntrophic groups methanogenic groups1x 5x 20x 50x 100x0. B C Dproportions0. E F G H20x  functional redundancy100x  functional redundancyM N O P0. J K Lcoefficient of variation500 600 700 800 900 1000time (days)500 600 700 800 900 1000time (days)0 0 0 0 0 10time (days)00 9 10ti e (days)functional redundancy1x 5x 20x 50x 100xfunctional redundancy1x 5x 20x 50x 100xfunctional redundancy1x 5x 20x 50x 100xFigure 6.4: Effects of functional redundancy on functional community composition inthe presence of phages. Row 1: Proportions of (A) coarse functional groups (total methanogensvs total fermenters vs total syntrophs), (B) fermenting groups, (C) syntrophic groups and (D)methanogenic groups over time (one color per functional group), during a random simulation at 20-fold redundancy. Note that each functional group consists of multiple OTU populations (individualOTU concentrations not shown). Row 2: Similar to row 1, but for a simulation at 100-fold functionalredundancy. Row 3: Coefficients of variation (CV) for the proportions of (I) coarse functionalgroups, (J) fermenting groups, (K) syntrophic groups and (L) methanogenic groups, at variouslevels of functional redundancy. Row 4: Degree of dynamic stabilization (DDS) of functional groupproportions (corresponding to I–L), at various levels of functional redundancy. Box plots (I–P)represent the distribution of CVs (I–L) and DDSs (M–P) across 100 random simulations; verticalbars indicate 95% percentiles.77Gene-centric modeling of the Saanich Inlet OMZChapter 7Gene-centric modeling of the SaanichInlet oxygen minimum zone17.1 SynopsisThe work presented above provides strong support for a pathway-centric paradigm of mi-crobial ecology, according to which the distribution and activity of microbial metabolicpathways is strongly shaped by environmental physicochemical conditions, regardless of thedetailed community assembly mechanisms. The question then arises, what would be anappropriate mathematical model to describe the interaction between metabolic pathwaysand their environment, and how could such a model incorporate modern molecular sequencedata? To address these questions, here we construct a quantitative biogeochemical modelthat describes metabolic coupling along the redox gradient in Saanich Inlet—a seasonallyanoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZ). The modelreproduces measured biogeochemical process rates as well as DNA, mRNA, and proteinconcentration profiles across the redox cline. Simulations make rigorous but unexpected pre-dictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen,and sulfur cycling. For example, nitrite “leakage” during incomplete sulfide-driven denitri-fication by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixationand intense nitrogen loss via anaerobic ammonia oxidation (anammox). Moreover, this cou-pling creates a metabolic niche for nitrous oxide reduction that completes denitrification bycurrently unidentified community members. These results quantitatively improve previousconceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specificpredictions, model results indicate that geochemical fluxes are robust indicators of microbialcommunity structure and, reciprocally, that gene abundances and geochemical conditions1A version of this chapter is published in PNAS (see the Preface for author contributions): Louca,S., Hawley, A.K., Katsev, S., Torres-Beltran, M., Bhatia, P.B., Kheirandish, S., Michiels, C., Capelle, D.,Lavik, G., Doebeli, M., Crowe, S.A., and Hallam, S.J. (2016). Integrating biogeochemistry with multi-omicsequence information in a model oxygen minimum zone. Proceedings of the National Academy of Sciences.DOI:10.1073/pnas.160289711378Gene-centric modeling of the Saanich Inlet OMZlargely determine gene expression patterns. The integration of geochemical profiles, pro-cess rate measurements, as well as metagenomic, metatranscriptomic and metaproteomicsequence data into a biogeochemical model, as demonstrated here, enables holistic under-standing of the distributed microbial metabolic network driving nutrient and energy flow atecosystem scales.7.2 IntroductionMicrobial communities catalyze Earth’s biogeochemical cycles through metabolic pathwaysthat couple fluxes of matter and energy to biological growth (116). These pathways areencoded in evolving genes that over time spread across microbial lineages and today helpshape the conditions for life on Earth. High-throughput sequencing and mass spectrometryplatforms are generating multi-omic sequence information (DNA, mRNA and protein) thatis transforming our perception of this microcosmos, yet the vast majority of environmentalsequencing studies lack a mechanistic link to geochemical processes. At the same time,mathematical models are increasingly used to describe local and global scale biogeochemicalprocesses or to predict future changes in global element cycling and climate (138, 392). Whilethese models typically incorporate the catalytic properties of cells, they fail to integrate theinformation flow between DNA, mRNA, proteins and process rates as described by the centraldogma of molecular biology (85). Hence, a mechanistic framework integrating multi-omicdata with geochemical information has remained elusive.Recent work based on metagenomics and quantitative PCR suggests that biogeochemicalprocesses may be described by models focusing on the population dynamics of individualgenes (386, 387). In such gene-centric models, genes are used as proxies for particularmetabolic pathways, with gene production rates being determined solely by the Gibbs freeenergy released by the catalyzed reactions. While compelling, these models stop short ofincorporating the entire central dogma of molecular biology, and therefore do not achievea truly quantitative integration of multi-molecular information and geochemical processes.For example, while such models allowed for a qualitative comparison between modeled geneproduction rates and selected transcript profiles, they do not provide any explicit mechanisticlinks (386). In fact, the lack of a quantitative validation against process rate measurementsor other proxies for activity (e.g., proteins) begs the question whether gene-centric modelsare adequate descriptions for microbial activity.Here we construct a gene-centric model for Saanich Inlet, a seasonally anoxic fjord on the79Gene-centric modeling of the Saanich Inlet OMZcoast of Vancouver Island, Canada (12), and a tractable analogue for the biogeochemistryof oxygen minimum zones (OMZs). Oxygen minimum zones are widespread and expand-ing regions of the ocean in which microbial community metabolism drives coupled carbon,nitrogen and sulfur cycling (489, 531). These processes exert a disproportionate influenceon global N budgets with resulting feedbacks on marine primary productivity and climate(252, 509). Extensive time series monitoring in Saanich Inlet provides an opportunity tointerrogate biogeochemical processes along defined redox gradients extensible to coastal andopen ocean OMZs (503, 540). Our model explicitly describes DNA, mRNA, and proteindynamics as well as biogeochemical reaction rates at ecosystem scales. We use geochemicaldepth profiles, rate measurements for N cycling processes, metagenomic, metatranscriptomicand metaproteomic sequence data, as well as qPCR-based cell count estimates for SUP05Gammaproteobacteria — the dominant denitrifiers in Saanich Inlet (181), to calibrate andvalidate our model. All data were obtained from a single location in Saanich Inlet acrossseveral depths in early or mid-2010.7.3 Construction and calibration of a gene-centric modelA recurring cycle of deep water renewal and stratification in Saanich Inlet results in an annualformation of an oxygen-depleted (O2) zone in deep basin waters (Fig. 7.1B). As this oxygenminimum zone slowly expands upwards during spring, anaerobic carbon remineralization inthe underlying sediments leads to an accumulation of ammonium (NH+4 ) and hydrogen sulfide(H2S) at depth (340). The oxidation of sulfide using nitrate produced through nitrificationin the upper water column fuels chemoautotrophic activity and promotes the formation of anarrow sulfide-nitrate transition zone (SNTZ) at intermediate depths (218, 504, 540).Our model describes the dynamics of several genes involved in carbon, nitrogen and sulfur cy-cling across the Saanich Inlet redox cline. Each gene is a proxy for a particular redox pathwaythat couples the oxidation of an external electron donor to the reduction of an external elec-tron acceptor. The model builds on a large reservoir of previous work that provides concep-tual insight into the microbial metabolic network in Saanich Inlet (11, 181, 218, 503, 504, 540)and includes aerobic remineralization of organic matter (ROM), nitrification, anaerobic am-monium oxidation (anammox) and denitrification coupled to sulfide oxidation (Fig. 7.1A).Certain pathways found in other OMZs, such as aerobic sulfide oxidation (415), dissimila-tory nitrate reduction to ammonium (DNRA; 253) and sulfate reduction (386), were excludedfrom our model based on information from previous studies (11, 181, 218, 503, 504, 540) aswell as preliminary tests with model variants (as explained below). Reaction rates (per80Gene-centric modeling of the Saanich Inlet OMZgene) depend on the concentrations of all used metabolites according to 1st order or 2ndorder (Michaelis-Menten) kinetics (211). In turn, the production or depletion of metabolitesat any depth is determined by the reaction rates at that depth. The production of genesis driven by the release of energy from their catalyzed reactions, and is proportional to theGibbs free energy multiplied by the reaction rate (396).The model was evaluated at steady state between 100 and 200 m depth. Accordingly,free parameters were calibrated using geochemical profiles obtained in early 2010 during aperiod of intense water column stratification, which resulted in an extensive anoxic zone thatapproached a steady state (Fig. 7.2; Appendix F.3.7). After calibration, most residuals tothe data are associated with an upward offset of the predicted SNTZ (Figs. 7.2B,C,F) andan underestimation of nitrous oxide (N2O) concentrations in deep basin waters (Fig. 7.2E).These residuals can be explained by subtle deviations from a steady state. Such deviationswere revealed in subsequent time series measurements during which the SNTZ continued tomigrate upwards in the water column (Fig. 7.2).7.4 Results and Discussion7.4.1 DNA profiles and process ratesThe calibrated model makes predictions about gene abundance and process rates, whichcan be validated using metagenomic sequence data and N process rate measurements fromthe same location and period as the geochemical calibration data (Fig. 7.3). Consistentwith metagenomic depth profiles, the model predicts a redox-driven partitioning of path-ways across the water column. Genes associated with ROM (ABC transporters mapped todominant aerobic heterotrophs), aerobic ammonium oxidation to nitrite (ammonia monooxy-genases, amo) and aerobic nitrite oxidation to nitrate (nitrite oxidoreductases, nxr) declineprecipitously in deep basin waters, whereas genes associated with partial denitrification tonitrous oxide (PDNO, represented by nitric oxide reductases, norBC ), nitrous oxide reduc-tion (nitrous oxide reductases, nosZ ) and anammox (hydrazine oxidoreductases, hzo) aremost abundant in the SNTZ (Fig. 7.1A).The similarity of the PDNO, nosZ and hzo gene profiles is indicative of their strong inter-action. In particular, the co-occurring peaks of PDNO and nosZ abundances in the absenceof N2O accumulation (Fig. 7.2E) reflect a quantitative coupling between the two denitrifi-cation steps and imply that both steps support extensive productivity at the SNTZ. This81Gene-centric modeling of the Saanich Inlet OMZis intriguing because genomic reconstruction from both uncultivated and cultivated SUP05,the dominant denitrifier in Saanich Inlet, has not identified the nosZ gene (181, 420, 504).This suggests that incomplete nitrate reduction by SUP05 and reduction of nitrous oxide byunidentified community members constitute separate and complementary metabolic nichesin Saanich Inlet under suboxic and anoxic conditions.The superposition of electron donor-acceptor pairs in redox transition zones supports chem-ical energy transfer in stratified water columns (45, 547), and previous studies revealedrelatively high cell abundances and chemoautotrophic activity in such zones (164, 515, 540).At the SNTZ in Saanich Inlet, the simultaneous availability of NO−3 and H2S fuels chemoau-totrophic nitrate respiration coupled to sulfide oxidation, in turn supplying anammox withNO−2 via “leaky” denitrification (up to 88% of NO−2 supplied by PDNO; Appendix F.3.11).Most of the NH+4 utilized by anammox (0.3 mmol ·m−2 ·d−1), on the other hand, is predictedto originate from the underlying sediments and reach the SNTZ via diffusion. Accordingly,both anammox and denitrification rates are predicted to peak around the SNTZ and lead toa significant release of N2. This prediction is consistent with process rate measurements fromdiscrete depth intervals during subsequent cruises in 2010 (Fig. 7.3B), as well as elevatedSUP05 cell counts at the SNTZ (Fig. 7.3A, estimated via qPCR). In fact, the good agree-ment between predicted PDNO gene counts and observed SUP05 cell counts suggests energyfluxes associated with denitrification can be accurately translated to denitrifier biosynthesisrates. Predicted peak sulfide-driven denitrification rates are somewhat higher than peakanammox rates, although depth-integrated nitrogen loss rates are comparable for both path-ways (∼ 0.3 mmol-N2 ·m−2 ·d−1). These predictions are partly consistent with rate estimatesderived directly from the geochemical profiles using inverse linear transport modeling (ILTM,Fig. 7.3B; see Methods for details). Hence, near steady state conditions, coupled sulfide-driven denitrification and anammox can concurrently drive significant nitrogen loss in thewater column.The fraction of NO−2 leaked during denitrification, compared to the total NO−3 consumed(LPDNO = 0.352; Appendix F.3.3), could be calibrated as a free model parameter based onthe observed geochemical profiles. Such a high NO−2 leakage may result from an optimiza-tion of energy yield under electron donor limitation. Further experimental work is neededto understand the mechanisms controlling this leakage by SUP05. Heterotrophic denitrifi-cation and nitrification are conventionally thought of as the primary sources of both nitriteand ammonium for anammox in OMZs (253, 501), and so far evidence for a direct couplingbetween sulfide-driven denitrification and anammox has been scarce (515). Our results indi-cate that sulfide-driven denitrification is an important precursor for anammox, particularly82Gene-centric modeling of the Saanich Inlet OMZunder conditions of organic carbon limitation (172).Steady state gene production rates for chemoautotrophic pathways are predicted to peakaround the SNTZ, reaching ∼ 3.2 × 106 genes · L−1 · d−1. This gene production rate cor-responds to a dark carbon assimilation (DCA) rate of ∼ 60 nmol-C · L−1 · d−1, assuming acarbon:dry weight ratio of 0.45 (115) and a dry cell mass of m = 5 × 10−13 g (396). Pre-viously measured peak DCA rates in the Saanich Inlet OMZ reached 2 µmol-C · L−1 · d−1(218), which is significantly higher than the values predicted here. The potential activityof autotrophic pathways not considered here, such as methane oxidation (531), may explainsome of these differences. Moreover, the model assumes steady state conditions, while redoxconditions were far from steady state during previous DCA measurements (218). Transientdynamics in Saanich Inlet can exhibit significantly higher nitrogen fluxes and chemoau-totrophic activity (295), which might further explain discrepancies between the model andmeasured DCA rates. Accounting for chemoautotrophic productivities based on oxidant andreductant supply in redox transition zones is generally difficult due to limited knowledge onactive pathways, the possibility of cryptic nutrient cycling and potential lateral substrateintrusions, and discrepancies similar to our study are frequently reported for other OMZs(217, 274, 341, 546).Previously detected amino acid motifs similar to those found in proteins catalyzing DNRA,suggested that SUP05 may also be providing NH+4 to anammox through DNRA (181).DNRA, not included in the model, is known to fuel anammox in anoxic sediments andwater columns (253, 374). So far incubation experiments have not revealed any DNRAactivity in Saanich Inlet, and measured ammonium profiles do not indicate any significantammonium source at or below the SNTZ (Fig. 7.2B). Nevertheless, DNRA could be activein Saanich Inlet and remain undetected due to rapid ammonium consumption by anam-mox (374). An extension of the model that included DNRA as an additional pathway, andwhich we calibrated to the same geochemical data (January–March 2010), predicted negli-gible DNRA rates compared to denitrification and anammox and consistently converged tothe same predictions as the simpler model. This suggests that DNRA may be absent fromthe Saanich Inlet water column — at least near steady state conditions in late spring — andthat the hydroxylamine-oxidoreductase homolog encoded by SUP05 plays an alternative rolein energy metabolism (504).DNA concentration profiles of anammox and denitrification genes appear more diffuse andare skewed towards deep basin waters, compared to their corresponding rate profiles and theSNTZ (Fig. 7.3). The model explains this apparent discrepancy based on turbulent diffusion83Gene-centric modeling of the Saanich Inlet OMZand sinking, which both transport genes away from their replication origin. Hence, commu-nity composition at any depth is the combined result of local as well as non-local populationdynamics. Metabolic flexibility encoded in the genomes of microorganisms mediating theseprocesses may also contribute to broader distributions of individual genes than their pre-dicted activity range (172). This disconnect between local metabolic potential and activityneeds to be considered when interpreting metagenomic profiles in a functional context, es-pecially in environments with strong redox gradients such as OMZs (531) or hydrothermalvents (387).The concentration maxima of anammox and denitrification genes are predicted at shallowerdepths than measured (Fig. 7.3A). This observation is consistent with the upward offset ofthe predicted SNTZ and highlights an important limitation of steady state models appliedto dynamic ecosystems. Indeed, process rate maxima predicted via ILTM at multiple timepoints continue to move upwards beyond the time interval used for model calibration (Fig.7.3B). In reality, an electron donor/electron-acceptor interface as narrow as predicted by themodel would only develop after sufficient time for transport processes and microbial activityto reach a true steady state. Such narrow interfaces do appear in permanently stratifiedmeromictic lakes (435) or the Baltic Sea, where stagnation periods can persist for manyyears (172).7.4.2 mRNA and protein profilesMetagenomics can yield insight into the metabolic potential of microbial communities,however it is ill suited as a means to infer actual pathway activity because the latterdepends, among others, on environmental conditions. Metatranscriptomics and metapro-teomics present powerful means to assess community metabolic activity, and each methodcomes with its own set of advantages (181, 330). For example, while transcripts presentimmediate proxies for gene up-regulation (e.g., in response to changing redox conditions),proteins represent the immediate catalytic potential of a community, and in vitro charac-terization of enzyme kinetics can facilitate the projection of protein abundances to in situprocess kinetics (502). Transcript abundances need not always correlate strongly with proteinabundances, for example in cases of transcriptional control or protein instability (265, 502),and hence metatranscriptomics and metaproteomics can in principle provide different per-spectives on community activity. This warrants a systematic evaluation of the consistencybetween these alternative layers of information in real ecosystems. In fact, a unifying mech-anistic framework describing the processes that control environmental mRNA and protein84Gene-centric modeling of the Saanich Inlet OMZdistributions is crucial for the correct interpretation of multi-omics data (330).While DNA replication and process rates are predicted by our gene-centric model merelybased on environmental redox conditions, it is uncertain to what extent intermediate stagesof gene expression (transcription and translation) can be explained based on such a paradigm.For example, environmental mRNA concentrations measured via qPCR have previously beendirectly compared to predicted reaction rates (386), but such a heuristic comparison ig-nores other mechanisms controlling environmental biomolecule distributions, such as physi-cal transport processes. Here, in an attempt to mechanistically describe mRNA and proteindynamics at ecosystem scales, we hypothesized that both mRNA and protein productionrates at a particular depth are proportional to the total reaction rate at that depth (cal-culated using the calibrated model). This premise is motivated by observations of elevatedtranscription and translation rates during high metabolic activity or growth (9, 154, 229).Furthermore, we assumed that mRNA and protein molecules are subject to the same hydro-dynamic dispersal processes as DNA, while decaying exponentially with time, post synthesis.The decay time of each molecule, as well as the proportionality factor between the reactionrate and synthesis, were statistically estimated using metatranscriptomic and metaproteomicdata (Appendix F.3.10).The general agreement between this model and the molecular data (Fig. 7.3A) suggeststhat the production-degradation dynamics of several of these molecules are, at the ecosys-tem level, dominated by the mechanisms described above. The best fit (in terms of thecoefficient of determination, Table F.4) is achieved for nosZ and nxr mRNA, as well as amo,norBC, nxr and ROM proteins. The greater number of protein over mRNA profiles thatcan be explained by the model suggests that the proteins considered here are indeed simplyproduced on demand and slowly degrade over time, while mRNA dynamics are subject tomore complex regulatory mechanisms (153, 330). In particular, the decay times of sometranscripts and proteins were estimated to be as high as several weeks (Table F.4). For pro-teins these estimates fall within known ranges (153), however for transcripts these estimatesare much higher than decay times determined experimentally in cells (465). One reason forthis discrepancy appears to be the underestimation of the SNTZ depth range by the model,which in turn leads to longer estimates for mRNA decay times needed to explain the detec-tion of these molecules outside of the SNTZ. Alternatively, transcripts and proteins mightpersist in the cells in inactive states for a significant period of time even after dispersal intoareas with low substrate concentrations. For example, stable but silent transcripts have beenfound in bacteria following several days of starvation (245, 299). Further, gene expressionmay not change immediately upon a change of external stimulus (181). For example, for85Gene-centric modeling of the Saanich Inlet OMZsome prokaryotic transcription cascades the basic time unit may be the cell doubling time(which can reach several weeks in anoxic environments; 518), due to regulation by long-livedtranscription factors (402). Hence, the decay times estimated here may reflect a hysteresisin gene down-regulation following nutrient depletion, perhaps in anticipation of potentialfuture opportunities for growth (326). Overall, these observations suggest that future mod-els for environmental mRNA dynamics and future metatranscriptomics would benefit from aconsideration of additional mRNA control mechanisms, for example derived from cell-centrictranscription models (403, 542).7.4.3 Consequences for geobiologyGene-centric models have the potential to integrate biogeochemical processes with microbialpopulation dynamics (386, 387). According to the central dogma of molecular biology (85),gene transcription and translation are intermediate steps that regulate metabolic processesin individual cells, but the appropriate projection of the central dogma to ecosystem scalesremained unresolved because transcription and translation were not explicitly consideredin previous models (386, 387). We have developed a biogeochemical model that explicitlyincorporates multi-omic sequence information and predicts pathway expression and growthin relation to geochemical conditions. In particular, when mRNA and protein dynamics areomitted, the gene-centric model only includes 4 calibrated parameters and yet is able tolargely reproduce geochemical profiles (Fig. 7.2), relative metagenomic profiles (Fig. 7.3A)and SUP05 cell abundances, which means that the good fit is unlikely the mere result ofoverfitting. In fact, as we refined and calibrated our model to the geochemical profiles,we observed that the metagenomic profiles were well reproduced as soon as the model’sgeochemical predictions roughly aligned with the data, even if the calibrated parameters werestill being adjusted. This further strengthens the interpretation that fluxes of matter andenergy are robust predictors of microbial productivity and functional community structure.Our model successfully explains a large fraction of environmental mRNA and protein distri-butions based on DNA concentration profiles and biogeochemical reaction rates in the OMZ.These results are consistent with the idea that DNA is a robust descriptor of an ecosystem’sbiological component (104, 201) which, in conjunction with the geochemical background, de-termines pathway expression and process rates (257). This implies that the co-occurrence ofa metabolic niche with cells able to exploit it is sufficient to predict microbial activity. Underthis paradigm, DNA may be regarded as directly coupled to the extracellular geochemicalenvironment, while the production of mRNA and proteins is an inevitable consequence of86Gene-centric modeling of the Saanich Inlet OMZthe biologically catalyzed flow of matter and energy. This interpretation is supported by theoverall consistency between the metatranscriptomic and metaproteomic profiles for N andS cycling genes (Fig. 7.3A) and suggests that mRNA and proteins may each be adequateproxies for pathway activity in Saanich Inlet. Further work is needed to test this paradigm inother ecosystems, especially under non-steady state conditions. Nevertheless, many marineand freshwater water columns are permanently or almost permanently anoxic (416, 435, 489)and hence, our approach and conclusions are expected to be particularly applicable to thesesystems.In addition to providing a systematic calibration and validation of the model we identifiedprocesses that need to be considered when interpreting multi-omic data. Conventional prox-ies for activity, such as mRNA and proteins, are themselves subject to complex populationdynamics that include production, active or passive degradation as well as physical transportprocesses. Consequently, the close association between process rates and biomolecule pro-duction suggested above does not imply that biomolecule distributions per se are equivalentto local microbial activity rates. In Saanich Inlet, for example, the wide distribution of DNA,mRNA and proteins across the OMZ, in contrast to a relatively narrow metabolic “hot zone”at the SNTZ, is predicated on a balance between spatially confined production and dispersalacross the water column. This so-called mass effect (268) indicates that geochemical or bio-chemical information is needed to assign actual activity to genes or pathways identified inmulti-omic data, especially for components mediating energy metabolism. This conclusionis generalizable and should be applied to other ecosystems exhibiting dispersal across strongenvironmental gradients, such as freshwater-marine transitions (87) or hydrothermal vents(387). Moreover, in dynamic ecosystems with changing geochemical conditions past popula-tion growth rates can influence future community structure and biomolecular patterns andhence, cross-sectional community profiles may not reflect current dynamics (430). In suchsystems, an incorporation of multiple layers of geochemical and biological information intoa mechanistic framework — as demonstrated here — will be crucial for disentangling themultitude of processes underlying experimental observations.The gene-centric model constructed here, although evaluated at steady state, is in fact aspatiotemporal model that could, in principle, predict gene population dynamics and pro-cess rates over time. A spatiotemporal analysis of the model would require multi-omic timeseries coverage and knowledge of non-stationary physical processes, such as seasonal patternsin surface productivity and hydrodynamics during deep water renewal events, which wereunavailable at the time of this study. Multi-omic time series would be especially useful forimproving the mRNA and protein models introduced here, due to the high number of cur-87Gene-centric modeling of the Saanich Inlet OMZrently unknown parameters. For example, integrating metatranscriptomic, metaproteomicand geochemical time series during rapid environmental changes into our model would al-low for a more direct determination of in situ transcriptional and translational responsesand biomolecule decay times. Advances in multi-omic sequencing will undoubtedly lead toincreased spatiotemporal coverage in the future (241), thus allowing for the much neededintegration of such data into biogeochemical models.The multi-omic profiles that we used to validate our model are only given in terms of relative— rather than absolute — biomolecule concentrations. Hence, the observed abundance ofeach biomolecule may be influenced by the abundances of other biomolecules, which couldexplain some of the discrepancies between the model and the multi-omic data. Unfortunately,this limitation is currently pervasive across environmental shotgun sequencing studies, largelydue to technical challenges in estimating in situ DNA, mRNA and protein concentrations(376). These challenges will likely be overcome in the future (428), and this will undoubtedlyimprove model testing and refinement. Given this current caveat, the general agreement ofthe model with the shape of the multi-omic profiles (Fig. 7.3A) is all too remarkable, andsuggests that the spatial structuring of the metabolic network is well captured by the model.In fact, our qPCR-based estimates for absolute SUP05 cell concentrations are consistentwith absolute PDNO gene concentrations predicted by the model, as well as with the shapeof the PDNO abundance profiles in the metagenomes (Fig. 7.3A). This double agreementsuggests that — at least for PDNO — both our model as well as our metagenomic data setsreflect the actual gene distributions.7.5 ConclusionsMost major metabolic pathways driving global biogeochemical cycles are encoded by a coreset of genes, many of which are distributed across distant microbial clades (116). Thesegenes are expressed and proliferate in response to specific redox conditions and, in turn,shape Earth’s surface chemistry. Here we have shown that the population dynamics of genesrepresentative of specific metabolic pathways, their expression and their catalytic activityat ecosystem scales can all be integrated into a mechanistic framework for understandingcoupled carbon, nitrogen and sulfur cycling in OMZs. This framework largely explainedDNA, mRNA and protein concentration profiles and resolved several previous uncertaintiesin metabolic network structure in Saanich Inlet, including a direct coupling of sulfide-drivendenitrification and anammox through “leaky” nitrite production by SUP05, as well as thepresence of a metabolic niche for nitrous oxide reduction contributing to nitrogen loss. Be-88Gene-centric modeling of the Saanich Inlet OMZyond OMZ-specific predictions, model results indicate that geochemical fluxes are robustindicators of microbial community structure and, reciprocally, that gene abundances andgeochemical conditions largely determine gene expression patterns. Such integrated model-ing approaches offer the potential to understand microbial community metabolic networksand to predict the responses of elemental cycles in a changing world.Feb08 Jan09 Jan11 Dec112008 2009 2010 2011015>18105H 2S (µM)2501003010<3>353020100Oxygen (µM)NO3- (µM)Depth (m)050100150200Depth (m)050100150200Depth (m)050100150200Depth (m)050100150200Planctomycetes Nitrosopumulus sp.unknown cladeNitrospira/Nitrospinauncultured SUP05 heterotrophFeb08 Jan09 Jan11 Dec112 08 2 09 2010 2011Depth (m)050100150200Depth (m)050100150200Depth (m)050100150200Depth (m)050100150200BANH+4 NO3NO2SO24N2OO2HSN2SO24DOM Feb06  Depth (m)0501001502002501003010<3Oxygen (µM)Feb07 Jan08 Jan09 Jan10 Jan11 Jan12 Jan13 Jan14  Depth (m)050100150200>353020100NO3- (µM) Depth (m)0501001502001.01.520.50NO2- (µM) 1510>2050NH4+ (µM)Depth (m)050100150200Depth (m)050100150200 Depth (m)050100150200 015>18105HS (µM)25010>50010CH4(nM) Depth (m)050100150200205>5010(nM)NO 22006 2007 2008 2009 2010 2011 2012 2013 Feb06  Depth (m)0501001502002501003010<3Oxygen (µM)Feb07 Jan08 Jan09 Jan10 Jan11 Jan12 Jan13 Jan14  Depth (m)050100150200>353020100NO3- (µM) Depth (m)0501001502001.01.520.50NO2- (µM) 1510>2050NH4+ (µM)Depth (m)050100150200Depth (m)050100150200 Depth (m)050100150200 015>18105HS (µM)25010>50010CH4(nM) Depth (m)050100150200205>5010(nM)NO 22006 2007 2008 2009 2010 2011 2012 2013015>18105H 2S (µM)2501003010<3>353020100Oxygen (µM)NO3- (µM)Jan 13 Feb 10 Mar 10 Apr 07ROMamo nxrhzonosZPDNOFigure 7.1: Metabolic network and selected chemical time series. (A) Coupled carbon,nitrogen and sulfur redox pathways considered in the model: Remineralization of organic matter(ROM), aerobic ammonia oxidation (amo), aerobic nitrite oxidation (nxr), anaerobic ammoniaoxidation (hzo), as well as partial denitrification to nitrous oxide (PDNO) and reduction of nitrousoxide (nosZ ) coupled to hydrogen sulfide oxidation (see Methods for details). Major taxonomicgroups encoding specific pathways are indicated. (B) Water column oxygen (O2), nitrate (NO−3 )and hydrogen sulfide (H2S) concentrations measured at Sa ich Inlet station SI03 from Ja uary2008 to December 2011. The shaded interval and the dates at the bottom indicate the chemicalmeasurements that were used for model calibration. The vertical white line marks the time ofmolecular sampling. The model considers depths between 100 m and 200 m.89Gene-centric modeling of the Saanich Inlet OMZA B CD E FFigure 7.2: Measured and predicted geochemical profiles. (A) oxygen, (B) ammonium, (C)nitrate, (D) nitrite, (E) nitrous oxide and (F) hydrogen sulfide concentrations as predicted by thecalibrated model at steady state (thick blue curves). Dots: Data used for the calibration, measuredduring cruise 41 on January 13, 2010 (SI041_01/13/10, rectangles), cruise 42 (SI042_02/10/10,rhomboids) and cruise 43 (SI043_03/10/10, triangles). Oxygen profiles were not available forcruises 41 and 43, hence data from cruise 44 (SI044_04/07/10, stars) were used instead. Thinblack curves: Data measured during cruise 47 (SI047_07/07/10), shortly before deep water renewal.Details on data acquisition in Appendix F.2.90Gene-centric modeling of the Saanich Inlet OMZii“Article” — 2016/6/14 — 15:56 — page 11 — #11 iiiiiiFig. 3. Molecular and rate profiles. (a) Predicted DNA, mRNA and protein concentrations (rows 1–3) for ROM, amo, nxr,norBC, hzo and nosZ genes (thick curves), compared to corresponding metagenomic, metatranscriptomic and metaproteomic data (circles,February 10, 2010). The dashed curve under PDNO genes (row 1, column 4) shows concurrent qPCR-based cell count estimates for SUP05,the dominant denitrifier in Saanich Inlet. (b) Denitrification and anammox rates predicted by the model (thick blue curves), comparedto rate measurements (circles) during cruises 47 (SI047 07/07/10) and 48 (SI048 08/11/10), as well as rates estimated from geochemicalconcentration profiles using inverse linear transport model fitting (ILTM; Supplement S5). The ILTM estimates “calibr.” in the 3rd and6h plot are based on the same geochemical data as used for model calibration (Fig. 2).Footline Author PNAS Issue Date Volume Issue Number 11DNA (10⁶/L)mRNA (RPKM)proteins (10ˉ³ × NSAF)ROM amo nxr nosZPDNOABhzomodel measurements ILTM model measurements ILTMdenitrification anammoxmodel DNAmodel mRNAmodel proteinmulti-omic dataqPCR SUP05Figure 7.3: Molecular and rate profiles. (A) Predicted DNA, mRNA and protein concen-trations (rows 1–3) for ROM, amo, nxr, norBC, hzo and nosZ genes (thick curves), compared toco responding metagenomic, metatranscriptomic d metaproteomic data (circles, February 10,2010). The dashed curve under PDNO genes (row 1, column 4) shows concurrent qPCR-basedcell count estimates for SUP05, the dominant denitrifier in Saanich Inlet. (B) Denitrification andanammox rates predicted by the model (thick blue curves), compared to rate measurements (cir-cles) during cruises 47 (SI047_07/07/10) and 48 (SI048_08/11/10), as well as rates estimated fromgeochemical concentration profiles using inverse linear transport modeling (ILTM; Appendix F.5).The ILTM estimates “calibr.” in the 3rd and 6h plot are based on the same geochemical data asused for model calibration (Fig. 7.2).91Reaction-centric modeling of community metabolismChapter 8Reaction-centric modeling of microbialcommunity metabolism18.1 SynopsisThe growth of microbial populations catalyzing biochemical reactions leads to positive feed-back loops and self-amplifying process dynamics at ecosystem scales. Hence, the state ofa biocatalyzed process is not completely determined by its physicochemical state but alsodepends on current cell or enzyme concentrations that are often unknown. Here we proposean approach to modeling reaction networks of natural and engineered microbial ecosystemsthat is able to capture the self-amplifying nature of biochemical reactions without explicitreference to the underlying microbial populations. This is achieved by an appropriate com-bination of parameters and variables, that allows keeping track of a system’s “capacity”to perform particular reactions, rather than the cell populations actually catalyzing them.Our reaction-centric approach minimizes the need for cell-physiological parameters such asyield factors and provides a suitable framework for describing a system’s dynamics purely interms of chemical concentrations and fluxes. We demonstrate our approach using data froman incubation experiment involving urea hydrolysis and nitrification, as well as time seriesfrom a long-term nitrifying bioreactor experiment. We show that reaction-centric modelscan capture the dynamical character of microbially catalyzed reaction kinetics and enablethe reconstruction of bioprocess states using solely chemical data, hence reducing the needfor laborious biotic measurements in environmental and industrial process monitoring.1A version of this chapter has been published (see the Preface for author contributions): Louca, S.,Doebeli, M. 2016. Reaction-centric modeling of microbial ecosystems. Ecological Modelling. 335:74–86.DOI:10.1016/j.ecolmodel.2016.05.01192Reaction-centric modeling of community metabolism8.2 IntroductionMicrobial metabolism powers biochemical fluxes in natural and engineered ecosystems (116,312). Reciprocally, biochemical fluxes sustain biosynthesis and thus drive microbial popula-tion dynamics (210). Changes in the microbial populations, in turn, influence the reactionkinetics at ecosystem scales because system-wide reaction rates depend not only on substrateconcentrations but also on the density of catalyzing cells or of extracellular enzymes (426).Thus, the dynamics of microbial communities emerge from the continuous interplay betweenmetabolic activity, changes in the extracellular metabolite pool and microbial populationgrowth (433). In particular, and in contrast to purely abiotic chemical processes (297),the state and future trajectory of a biocatalyzed process cannot be determined solely basedon the system’s chemical state (210, 426). For example, empirical mineralization curvesthat describe the degradation rate of organic matter as a function of substrate density canvary strongly in shape, and this variation historically resulted partly from the interaction ofsubstrate concentrations and cell population densities in experiments (426).In deterministic or stochastic differential equation models (233, 390, 433), the dynamicalcharacter of microbially catalyzed reaction kinetics is typically incorporated by includingadditional variables representing cell densities, whose growth is proportional to the rates ofthe processes that they catalyze and determined by cell-per-substrate (or sometimes biomass-per-substrate) yield factors (210). In turn, system-wide reaction kinetics are modulated bycurrent cell densities and extracellular metabolite concentrations. Such cell-centric modelsare widely used and can capture the typical self-amplifying character of biocatalyzed pro-cesses (68). Likewise, deterministic as well as stochastic individual-based models, which keeptrack of multiple individual organisms and their metabolic activity, can also capture the feed-back loops within microbial metabolic networks because the metabolic or trophic activity ofeach organism eventually leads to the production of new copies of that organism (124, 258).All of these cell-centric models, however, depend on physiological parameters such as yieldfactors, cell masses or maximum cell-specific reaction rates, and require knowledge of cell orenzyme concentrations (in addition to physicochemical variables) for describing a system’scurrent state. As we explain below, some of these parameters also introduce redundanciesfrom a reaction kinetic point of view that can lead to strong uncertainties in parameterestimation (242, 426).Flux-balance models, a popular alternative to dynamical models (354), reduce the number ofrequired parameters by ignoring cell population dynamics and by assuming that metaboliteconcentrations are constant through time (i.e., metabolite fluxes are “balanced”). In these93Reaction-centric modeling of community metabolismmodels, reaction rates (and sometimes metabolite turnover rates; 72) are the only indepen-dent variables, and their values are calculated by optimizing some objective function (e.g.,ATP production) in the presence of constraints (e.g., on maximum reaction rates). Flux bal-ance models have been very successful in elucidating metabolic network properties such asthe feasibility of certain reactions or the prediction of metabolic interactions between species(238, 453, 545) but — being steady-state models — they fail to capture the dynamical natureof microbial communities. Hence, current model frameworks either ignore the temporal andself-amplifying character of biocatalyzed processes or require an extensive set of — oftenpoorly estimated — physiological parameters.To address the above limitations, here we develop a new framework for dynamical bioprocessmodeling with a focus on system-wide reaction kinetics. Our objective was to reduce thereliance on physiological parameters and to reduce the need for biotic measurements for statereconstruction and model calibration, while still accounting for the self-amplifying characterof metabolic reactions at the ecosystem level. Such a “reaction-centric” model would ide-ally make predictions purely in terms of metabolite concentrations and reaction rates at theecosystem level, without the need to consider the underlying cell populations. As we showbelow, this can be achieved by keeping track of a system’s “capacity” to perform particularreactions (or pathways), rather than the cell populations actually catalyzing them. Microbialecosystem metabolism can then be described similarly to abiotic reaction networks, with theaddition of so-called self- and cross-amplification factors between reactions. These amplifi-cation factors are specific to a particular microbial community and translate the system’smetabolic fluxes into changes of the system’s reaction capacities. Hence, a system’s stateand dynamics can be inferred using solely physicochemical measurements, bypassing labo-rious biotic measurements for example in environmental and industrial process monitoring.Furthermore, reaction-centric models minimize the reliance on cell-physiological parameters,allowing for model calibration even when biotic data are scarce. Reaction-centric modelsthus provide an elegant alternative to many conventional cell-centric models, particularlywhen the ultimate focus is on a system’s reaction kinetics.We begin with a derivation of the reaction-centric framework and show how it relates toconventional, cell-centric models. We focus on differential equation models, however we notethat our reasoning can also be applied to other cell-centric frameworks. We demonstratethe potential of reaction-centric models using data from a previous short-term incubationexperiment with a ureolytic and nitrifying microbial community (94), as well as long-termtime series from a flow-through nitrifying bioreactor (109). Bioreactors provide ideal modelecosystems for testing new theories for microbial ecology, due to their higher controllability94Reaction-centric modeling of community metabolismand measurability when compared to natural ecosystems. Ureolysis and nitrification werechosen as examples because of their conceptual simplicity as well as their great relevanceto ecosystem productivity, industry and agriculture (375, 520). Our entire analysis wasperformed with MCM (Chapter 4; 284), which we extended to accommodate reaction-centricmodels.8.3 Derivation of the reaction-centric framework8.3.1 One reaction per cellConventional cell-centric microbial ecosystem models consider the extracellular concentra-tions of metabolites as well as the cell densities of microbial populations catalyzing var-ious reactions. In the simplest and most common case each reaction is catalyzed by adistinct microbial population, the growth of which is proportional to the rate of the reaction(210, 258, 426). More precisely, the population density of cells catalyzing reaction r (Nr, cellsper volume) and the concentration (Cm) of each metabolite m are described by differentialequations similar to the following:dNrdt= NrYrVrhr(C)− λrNr, (8.3.1)dCmdt= Fm(t,C) +∑rSmrNrVrhr(C). (8.3.2)In Eq. (8.3.1), Yr is a cell yield factor (cells produced per substrate used), Vr is the maxi-mum cell-specific reaction rate (flux per cell per time) and C is the vector representing allmetabolite concentrations (overview of symbols in Table 8.1). We note that in models whereNr is alternatively measured in biomass (rather than cells) per volume, Yr is typically abiomass yield factor and Vr is a maximum biomass-specific reaction rate. The dependenceof cell-specific reaction kinetics on C is encoded by the unitless function hr(C), which isnormalized to unity at those C that maximize the cell-specific reaction rate. The last termin Eq. (8.3.1) corresponds to the decay of biomass at an exponential rate λr (with unitstime−1), for example due to cell death. Alternatively, λr can account for reduced biosynthesisdue to maintenance energy requirements, in which case it is sometimes called the “specificmaintenance rate” (210). In Eq. (8.3.2), Fm accounts for abiotic metabolite fluxes, such assubstrate supply in a bioreactor, and Smr is the stoichiometric coefficient of metabolite m inreaction r. The sum in Eq. (8.3.2) iterates through all reactions and accounts for microbialmetabolic fluxes.95Reaction-centric modeling of community metabolismIn the above cell-centric model the system’s state depends on the current metabolite con-centrations (Cm) as well as the current cell densities (Nr), the prediction of which, in turn,requires knowledge of physiological parameters such as Yr and Vr. As we show below, thisfocus on cell populations can be avoided if one is solely interested in the system’s reactionkinetics. Observe that the product Mr = NrVr, henceforth referred to as the system’s cur-rent “reaction capacity”, is the maximum system-wide rate of reaction r (flux per volumeper time) that could possibly be attained at favorable metabolite concentrations (i.e., whenhr(C) = 1). On the other hand, the product Hr = NrVrhr = Mrhr is the actual system-widerate of reaction r. Note that Hr depends both on the reaction capacity Mr as well as thenormalized kinetics hr(C), which encodes the dependence of the reaction rate on the sys-tem’s chemical state. Rewriting Eqs. (8.3.1) and (8.3.2) in terms of the reaction capacitiesMr yields the reaction-centric modeldMrdt= ArMrhr(C)−Mrλr, (8.3.3)dCmdt= Fm(t,C) +∑rSmrHr(C), (8.3.4)Hr = Mrhr, (8.3.5)where we introduced the so called self-amplification factor Ar = VrYr in Eq. (8.3.3). Thismodel describes biochemical reactions at the scale of the ecosystem, without explicit referenceto biotic quantities such as cell densities or physiological parameters such as yield factors.The structure of Eq. (8.3.3) emphasizes the self-amplifying nature of biochemical reactionsat the ecosystem level, with the self-amplification factors Ar mediating the conversion ofreaction rates to a growth of reaction capacities. In the context of cell-centric models, Aris the maximum specific growth rate of cells performing reaction r (in units time−1). In thereaction-centric model, Ar becomes the maximum exponential growth rate of the reactioncapacity Mr. Note that Ar only depends on the product VrYr, but not on the individual Vror Yr. Hence, the system’s biochemical dynamics can be modeled without knowledge of theVr and Yr because the system’s trajectory is completely determined by the self-amplificationfactors and the reaction capacities at some point in time. This collapse of unknown param-eters into fewer ones, without loosing any predictive power, means that fewer parametersare needed for practical purposes than often assumed. In fact, the redundancy inherent tothe simultaneous inclusion of Vr and Yr in conventional models was previously pointed outby Simkins and Alexander (426). This redundancy can lead to strong negative correlationsbetween estimated Yr and Vr, particularly when parameter estimation is based solely on non-96Reaction-centric modeling of community metabolismbiotic chemical time series, because such time series cannot differentiate between alternativecombinations of Vr and Yr yielding the same product VrYr (242).8.3.2 Multiple reactions per cellSo far we assumed that each cell performs exactly one reaction, which means that eachmodeled reaction only induces the growth of its own capacity. While this assumption iswidespread in ecosystem modeling (258, 386), in reality several alternative pathways may beperformed by the same cells. For example, members of the ammonium oxidizing Nitrosospiragenus are also able to hydrolyze urea (300), and urea hydrolysis in incubation experimentswith Nitrosospira was shown to promote ammonium oxidation by the same population (94).In the simplest case, the combined effects of several metabolic pathways on cell populationgrowth can be assumed to be additive, so that each reaction r has a contribution YrHr tothe total growth of the cell population:dNrdt= YrHr +∑q 6=rYqHq −Nrλr. (8.3.6)Here, the sum in Eq. (8.3.6) iterates over all additional reactions attributable to cells per-forming reaction r. If two reactions r and q are performed by the same population thenNr = Nq and, reciprocally, if two populations share a common reaction, that reaction willneed to be represented twice using two separate indices r. The assumption of additive effectson growth is common in conventional microbial population models. For example, Courtinand Spoelstra (80) model a population of acetic acid bacteria utilizing multiple organic sub-strates by assuming that each pathway has an additive effect on the total population growth.More sophisticated models of microbial metabolism based on flux balance analysis and opti-mization of a linear utility function also assume additive effects of various metabolic fluxes,although the functions hr(C) may not be explicit in C, but instead specified in terms of anoptimization algorithm (354).The cell-centric model in Eq. (8.3.6) corresponds to a reaction-centric model in which mul-tiple reactions amplify each other’s capacities whenever they are performed by the samecells:dMrdt= ArHr +∑q 6=rArqHq − λrMr. (8.3.7)Here, the so-called “cross-amplification” factors Arq = VrYq correspond to the positive effects97Reaction-centric modeling of community metabolismof the flux through some reaction q on the capacity of some other reaction r and hence, thesum in Eq. (8.3.7) iterates through all additional reactions q performed by the same cellpopulation as reaction r. The amplification matrix, whose diagonal entries are the self-amplification factors Ar and whose off-diagonal entries are the cross-amplification factorsArq, defines a linear transformation of the vector containing all reaction rates to a vectorcontaining changes in reaction capacities. Note that regardless of any amplifications of thereaction capacities, actual rates may still be limited by low substrate concentrations or thepresence of inhibitors, as determined by the normalized kinetics hr(C). Also note that sinceArq = VrYq and Nr = Nq for any two reactions q and r performed by the same cells, thefollowing consistency conditions apply:Aqr =ArAqArq, Mr = MqArqAq. (8.3.8)Regardless of any cell-centric interpretation, the system’s reaction dynamics only depend onthe amplification