UBC Theses and Dissertations


Evolution in heterogeneous environments Matthey-Doret, Rémi 2020


Evolution in heterogeneous environments

by

Rémi Matthey-Doret

B.Sc, University of Neuchâtel, 2012
M.Sc, University of Lausanne, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

The Faculty of Graduate and Postdoctoral Studies (Zoology)

The University of British Columbia (Vancouver)

April 2020

© Rémi Matthey-Doret, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Evolution in heterogeneous environments, submitted by Rémi Matthey-Doret in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Zoology.

Examining Committee:

Michael C. Whitlock, Professor, Department of Zoology, UBC
Supervisor

Loren H. Rieseberg, Professor, Department of Botany, UBC
Supervisory Committee Member

Jeannette Whitton, Associate Professor, Department of Botany, UBC
University Examiner

Darren E. Irwin, Professor, Department of Zoology, UBC
University Examiner

Tadeusz J. Kawecki, Associate Professor, Department of Ecology and Evolution, Unil
External Examiner

Additional Supervisory Committee Members:

Amy L. Angert, Associate Professor, Department of Botany, UBC
Supervisory Committee Member

Sarah P. Otto, Professor, Department of Zoology, UBC
Supervisory Committee Member

Abstract

Environmental heterogeneity is a fundamental feature of evolutionary biology. In this thesis, I investigate a few aspects relating to environmental heterogeneity. In chapter 2, I explore how background selection can affect detection of local adaptation. Background selection is a process whereby recurrent deleterious mutations cause a decrease in the effective population size and genetic diversity at linked loci. Several authors have suggested that variation in the intensity of background selection could cause variation in FST across the genome, which could confound signals of local adaptation in genome scans. We performed realistic simulations of DNA sequences to show that variation in the intensity of background selection does not cause much variation in FST and does not affect the false positive rate in FST outlier studies in populations connected by gene flow. In chapter 3, I investigate how developmental instability might emerge as a side-effect of two distinct mechanisms for adaptive plasticity: sensing an environmental signal and sensing a performance signal (a.k.a. developmental selection). Using a numerical model of a network of gene interactions, we show that, because a performance signal allows a regulatory feedback loop that buffers against developmental noise, plasticity comes at a cost of developmental instability when the plastic response is mediated via an environmental signal, but not when it is mediated via a performance signal. We also show that a performance signal mechanism can evolve in a constant environment to increase developmental robustness, leading to genotypes pre-adapted for plasticity to novel environments. In chapter 4, I present SimBit, a general-purpose, high-performance forward-in-time population genetics simulator. Because different simulation scenarios require different simulation methods in order to achieve high performance, SimBit is able to use different representations of the individuals’ genotype, allowing it to sustain high performance in a wide diversity of scenarios. SimBit’s performance is benchmarked against SLiM, Nemo and SFS_CODE, and I report that SimBit is most often the highest performing program.

Lay summary

Environmental heterogeneity is at the core of the theory of evolution. In this thesis, I investigate a few aspects relating to environmental heterogeneity. FST outlier scans are a common technique to detect evidence of local adaptation. The technique consists of scanning through the genome of a species, searching for genomic regions that are highly diverged among populations. It has been suggested that genomic regions subject to recurrent deleterious mutations might be associated with higher than average divergence among populations and might therefore be misidentified as being under local adaptation. In chapter 2, I test this hypothesis and show that FST outlier scans are relatively insensitive to recurrent deleterious mutations. In chapter 3, I investigate the constraints on the evolution of plasticity and show that the constraint depends upon the mechanism by which the plastic response is implemented. Finally, in chapter 4, I present SimBit, a high performance forward-in-time population genetics simulator.

Preface

Chapter 2 has been previously published as

Matthey-Doret R. and Whitlock MC, 2019, Background selection and FST: Consequences for detecting local adaptation, Molecular Ecology, 28:3902-3914

Michael C. Whitlock formulated the original idea and provided guidance during the project and during writing. I took part in designing the project, tested simulation platforms to select one that could perform the simulations, wrote the simulation platform and the FDist2 algorithm, ran the simulations, analyzed the data and wrote the chapter.

Chapter 3 has not yet been submitted for publication. It will be submitted as two separate papers. In both papers, I will be the primary author and the work will be coauthored by Jeremy A. Draghi and Michael C. Whitlock. Jeremy A. Draghi provided help with the coding, and both Jeremy A. Draghi and Michael C. Whitlock provided guidance, writing and editing. I modified the simulation platform ENTWINE and the R wrapper to redevelop genotypes, ran the simulations, analyzed the data and wrote the chapter.

I wrote Chapter 4 as sole author. Michael C. Whitlock provided early ideas on how to design the software, as well as guidance in writing and in how to present benchmark results. Pirmin Nietlisbach is the main beta tester and provided advice on how to improve SimBit’s manual and SimBit’s user interface.

Table of contents

ABSTRACT
LAY SUMMARY
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENTS
DEDICATION
1 INTRODUCTION
2 BACKGROUND SELECTION AND FST: CONSEQUENCES FOR DETECTING LOCAL ADAPTATION
  2.1 INTRODUCTION
  2.2 METHODS
  2.3 RESULTS
  2.4 DISCUSSION
  2.5 ACKNOWLEDGMENTS
  2.6 FUNDING
3 PHENOTYPIC PLASTICITY AND ROBUSTNESS
  3.1 INTRODUCTION
  3.2 METHODS
  3.3 RESULTS
  3.4 DISCUSSION
  3.5 ACKNOWLEDGMENT
  3.6 FUNDING
4 SIMBIT: A FAST, FLEXIBLE, FORWARD-IN-TIME POPULATION GENETIC SIMULATOR
  4.1 INTRODUCTION
  4.2 USER INTERFACE
  4.3 DEMOGRAPHY AND SPECIES ECOLOGIES
  4.4 MATING SYSTEM
  4.5 TYPES OF LOCI AND SELECTION
  4.6 INITIALIZATION
  4.7 OUTPUTS
  4.8 PROGRAM COMPARISON – PERFORMANCE
  4.9 CONCLUSION
  4.10 ACKNOWLEDGMENT
  4.11 FUNDING
5 CONCLUSION
  5.1 CHAPTER 2
  5.2 CHAPTER 3
  5.3 CHAPTER 4
BIBLIOGRAPHY
APPENDIX A SUPPLEMENTAL INFORMATION TO CHAPTER 2
  APPENDIX A.1 SUPPLEMENTAL FIGURES AND TABLES
  APPENDIX A.2 COALESCENT VS STATE BASED FST
APPENDIX B SUPPLEMENTAL INFORMATION TO CHAPTER 3
  APPENDIX B.1 SUPPLEMENTAL FIGURES AND TABLES
  APPENDIX B.2 MUTATIONAL NOISE ESTIMATOR
APPENDIX C SUPPLEMENTAL INFORMATION TO CHAPTER 4
  APPENDIX C.1 SUPPLEMENTAL FIGURES AND TABLES
  APPENDIX C.2 SIMBIT COMMANDS FOR SIMULATIONS BENCHMARKED
  APPENDIX C.3 SIMBIT’S MANUAL

List of tables

Table 2.1: Summary of treatments
Table A.1.2: Correlation tests between B and HS
Table A.1.3: Correlation tests between B and FST
Table A.1.4: Correlation tests between B and FST (average of ratios)
Table A.1.5: Correlation tests between B and dXY
Table A.1.6: Correlation tests between B and dXY-SNP
Table B.1.1: Regressions between mutational and developmental noise

List of figures

Figure 2.1: Distribution of FST
Figure 2.2: Comparisons of mean FST, dXY and HS with and without BGS
Figure 2.3: Correlation between B and FST, HS and dXY for the last generation
Figure 2.4: Comparison of false-positive rate with and without BGS
Figure 3.1: Reaction norms for simulations with spatial heterogeneity
Figure 3.2: Average developmental noise for simulations with spatial heterogeneity
Figure 3.3: Fraction of genotypes that evolved plasticity in all treatments
Figure 3.4: Regressions between mutational noise and developmental noise
Figure 4.1: Comparison of computational time among the four different ways to simulate the same evolutionary scenario using SimBit
Figure 4.2: Comparison of computational time among the four different simulation programs Nemo, SFS_CODE, SLiM and SimBit
Figure 4.3: Comparison of CPU time among the four programs to reproduce simulations inspired from three recent papers
Figure A.1.1: Distribution of selection coefficient
Figure A.1.2: Correlations between B and FST, dXY, FST (average of ratios), dXY-SNP, and MAF - FST (average of ratios)
Figure A.1.3: Relationship between HT and FST on a site per site basis
Figure A.1.4: Regressions of HT and HS on B
Figure A.1.5: Relationship between observed FST and the expected FST obtained from Zeng and Corcoran (2015)
Figure A.2.1: CPU time comparison
Figure A.2.2: Mean FST with different programs
Figure A.2.3: A comparison between SimBit and Zeng and Corcoran (2015) results
Figure B.1.1: Average developmental noise for all treatments
Figure B.1.2: Changes in developmental noise over time
Figure C.1.1: Comparison of computational time among the four different ways to simulate the same evolutionary scenario using SimBit
Figure C.1.2: Comparison of memory usage among the four different ways to simulate the same evolutionary scenario using SimBit
Figure C.1.3: Comparison of computational time among programs
Figure C.1.4: Comparison of memory usage among programs

Acknowledgments

My first thanks go to my advisor, Mike Whitlock. Mike has been an outstanding mentor throughout my PhD. He offered me endless support on any question that I would bring to him. Working with him, I have learned much more than evolutionary biology. I learned about the workings of scientific research today, I learned to deal with the semantics of a scientific discussion, I learned about ethical considerations in research, I learned much about biology in general, and many other things. Working with him, I have gained the skills and confidence to contribute to the advancement of our understanding of evolutionary processes and to our community. Thank you, Mike!

I also want to thank my advisory committee: Amy Angert, Loren Rieseberg and Sally Otto. I feel extremely grateful to have had feedback from such brilliant people. The accuracy of their insights has always surprised me. I did not realize one could understand a project so well and contribute so much to it just with occasional meetings! Thank you for all your help. I also want to thank them for their words of encouragement and congratulations, which went a long way toward giving me the confidence that I needed to do the work presented here.

Thank you to my wife, Laurélène. Thanking her for my thesis sounds unfair, as her contribution to my life goes of course way beyond the work presented here. Nevertheless, thank you Laurélène for making my life beautiful throughout my PhD, and thank you for your help formatting this thesis.

I want to thank everyone else who has made my life beautiful in the past 6 years. Thank you for all the feedback on my work. Thank you to everyone who participated in the Huts skit, thank you for every smile, every party, every dance, every chat, every hug, every hike. I love you guys; you have been great and I hope to keep in touch for both professional and personal matters.

Thank you to ComputeCanada, which provided the computational resources for the work presented here. Thank you to the Doc.Mobility fellowship from the Swiss National Science Foundation and to Mike’s NSERC Discovery Grant, which funded my work.

Dedication

To Laurélène and Elliot

1 Introduction

It is not easy to wrap one’s mind around the immense diversity of environmental conditions that living organisms on Earth experience. From the hot sands of the Sahara Desert to the equatorial rainforest of Brazil, by way of a karstic mountain cliff in the Alps, damp earth on a river shore, deep ocean waters or the interior of a mammal intestine, our Earth is not short of environmental heterogeneity for living organisms to call home.

How, then, do organisms adapt to such varied environments? Many mutations may experience a constant selection coefficient over the range of environmental conditions experienced by the species; such mutations are sometimes said to be globally beneficial or globally deleterious (Atkins & Travis, 2010; Booker et al., 2019). However, with the exception of some lethal mutations, no mutation can have a constant selection coefficient in all environments. In this sense, all adaptation is an adaptation to environmental conditions (Brandon, 1995). The question of environment is therefore tightly tied to the concepts of fitness and natural selection.

Environmental heterogeneity is such a fundamental concept of evolution and ecology that it is hard to intuit what a homogeneous world would look like. Many studies have reported a positive relationship between environmental heterogeneity and species diversity (Heino et al., 2015; Hortal et al., 2009; Murren et al., 2015b; Stein et al., 2014; Tamme et al., 2010), suggesting that an environmentally homogeneous world would be much less biodiverse. This relationship between environmental heterogeneity and biodiversity can be caused by a number of mechanisms. First, environmental heterogeneity creates a greater diversity of ecological niches and reduces competition among species, allowing better coexistence (Currie, 1991; Tews et al., 2004). Second, in heterogeneous environments, organisms are more likely to find a suitable habitat and survive in times of extreme environmental conditions (Fjeldså et al., 2012; Kallimanis et al., 2008). Third, environmental heterogeneity can boost speciation rates, with reproductive isolation being more likely to evolve when different subpopulations can adapt to their respective local conditions (Antonelli & Sanmartín, 2011; Hughes & Eastwood, 2006; Rainey & Travisano, 1998).

Environmental heterogeneity also affects the assembly of communities (Keller et al., 2009; Vasudevan et al., 2006; Yang et al., 2015) and makes species living in these communities more resilient to climate change than species living in more homogeneous environments (Boyd et al., 2016). This resilience arises both because heterogeneity selects for plastic responses (Molina-Montenegro & Naya, 2012; Seebacher et al., 2015) and because it helps retain more genetic variance, thereby facilitating adaptation to novel conditions (Huang et al., 2016). Many studies acknowledge that environmental heterogeneity heavily influences the geographic distribution of species (Brachet et al., 1999; Hanski & Ovaskainen, 2003; Levins, 1968; Patterson, 1976). Environmental heterogeneity can also affect the evolution of species interactions (Kaltz & Shykoff, 1998) and the evolution of many phenotypic traits such as social behaviours (Billiard & Lenormand, 2005), dispersal rate (Blanquart & Gandon, 2014) and sex (Becks & Agrawal, 2010). A homogeneous world could reasonably be expected to be fundamentally unsuitable for the evolution of diverse life forms.

Many models assume environmental homogeneity by default, and these models are useful and can give good approximations for certain populations.
However, because organisms modify their environment through their metabolism and physical presence (niche construction; Boogert et al., 2006; Matthews et al., 2014; Naiman et al., 1988), perfect homogeneity can only be a reality in the theoretician’s mind. As Vandana Shiva famously said: “Uniformity is not nature’s way; diversity is nature’s way” (New Internationalist, 1995).

Environmental heterogeneity can be spatial or temporal (Levins, 1968), and the type of heterogeneity at play affects how populations can adapt to it (DeAngelis & Waterhouse, 1987; Hewitt et al., 2007). With spatial heterogeneity, subpopulations can evolve to be adapted to their local conditions (Lenormand, 2002). Hereford (2009) found evidence of local adaptation in 71% of the populations studied, demonstrating that local adaptation is a common phenomenon. With both spatial and temporal heterogeneity, a population can also adapt through a plastic response (Bradshaw, 1965; Schlichting, 1986). Finally, with temporal heterogeneity, a population can evolve a strategy that lowers its mean fitness in any given generation in order to maximize its geometric mean fitness across a number of generations. This strategy is termed bet-hedging (Mulvey et al., 2016; Seger & Brockmann, 1987; Simons & Johnston, 1997; Veening et al., 2008).

There exist different definitions of local adaptation. One might consider that local adaptation occurs when 1) local individuals express a higher average fitness than non-local (foreign) individuals (local vs foreign definition), or when 2) individuals express a higher average fitness in their local environment than in a different environment (home vs away definition), or when 3) there is a significant genotype-by-environment interaction for mean fitness (see Blanquart et al., 2013 and Kawecki & Ebert, 2004 for semantic discussions).
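The first two definitions above can give different answers for the same data, which a small numerical sketch makes easier to see. The example below is only an illustration under stated assumptions: the function names, the fitness matrix `W` and its values are hypothetical, `W[i][j]` is taken to be the mean fitness of individuals from deme `i` measured in the habitat of deme `j`, and statistical significance is ignored entirely.

```python
def local_vs_foreign(W):
    """For each habitat j: are residents (deme j) fitter there than every foreign deme?"""
    n = len(W)
    return [all(W[j][j] > W[i][j] for i in range(n) if i != j) for j in range(n)]


def home_vs_away(W):
    """For each deme i: is it fitter in its own habitat than in every other habitat?"""
    n = len(W)
    return [all(W[i][i] > W[i][j] for j in range(n) if j != i) for i in range(n)]


# Hypothetical mean fitnesses: rows are demes of origin, columns are habitats.
# Habitat 1 is assumed to be of lower quality for everyone.
W = [
    [1.0, 0.6],  # deme 0 measured in habitat 0 and in habitat 1
    [0.9, 0.8],  # deme 1 measured in habitat 0 and in habitat 1
]

print("local vs foreign:", local_vs_foreign(W))
print("home vs away:   ", home_vs_away(W))
```

With these made-up values, residents outperform immigrants in both habitats, so the local vs foreign criterion is met everywhere, while deme 1 still does better away than at home because its home habitat is poor. This is one reason the choice of definition matters when habitats differ in overall quality.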
A number of studies have reported evidence of local adaptation using phenotypic traits (Hedrick et al., 1976; Linhart & Grant, 1996), using genetic data (e.g. Hemmer-Hansen et al., 2013), as well as using both phenotypic and genetic data (e.g. Colosimo et al., 2004; Hoekstra et al., 2006; Steiner et al., 2007). Identifying the loci underlying local adaptation is an essential step for many fundamental questions in ecology, evolutionary biology and conservation biology (examples reviewed in Hoban et al., 2016), but it can be a difficult endeavour. As reviewed in Hoban et al. (2016), there exist five general approaches to determine the genetic basis of local adaptation: genetic differentiation outlier scans (e.g. Eckert et al., 2010), genotype-environment association studies (e.g. de Villemereuil & Gaggiotti, 2015), QTL mapping in a reciprocal transplant experiment (reviewed in Savolainen et al., 2013), GWAS following a common garden experiment (e.g. GCTA by Yang et al., 2011; XTX by Günther & Coop, 2013) and detection of population-specific selective sweeps (e.g. Sabeti et al., 2007). I will here focus on genetic differentiation outlier scans, a technique particularly useful when the environmental axes causing the local adaptation are unknown (Lotterhos & Whitlock, 2015).

Genetic differentiation outlier scans use the idea that, at a locus under local selection, alleles are expected to be at higher frequency where they are beneficial and at lower frequency where they are deleterious, hence causing increased population divergence at the selected locus and in the surrounding region. The majority of genetic differentiation outlier scans use FST as the statistic of population divergence, and the method is then usually called an FST outlier scan, although some studies have considered other statistics of divergence such as dXY (e.g. Irwin et al., 2016; Vijay et al., 2017).
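To illustrate the logic of a differentiation scan, the sketch below computes FST locus by locus and ranks loci from most to least diverged. It is a toy example, not the procedure of any published outlier method: it assumes biallelic loci, equally sized demes, known population allele frequencies (no sampling correction) and the heterozygosity-based definition FST = (HT - HS) / HT. The function names and the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Return F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: frequency of one allele in each deme (equal deme sizes assumed).
    """
    n_demes = len(freqs)
    p_bar = sum(freqs) / n_demes
    h_t = 2 * p_bar * (1 - p_bar)                        # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / n_demes  # mean within-deme heterozygosity
    if h_t == 0:
        return 0.0  # monomorphic locus: no divergence to measure
    return (h_t - h_s) / h_t


def rank_by_fst(loci):
    """Sort locus names from most to least differentiated."""
    return sorted(loci, key=lambda name: fst_biallelic(loci[name]), reverse=True)


# Hypothetical allele frequencies in two demes (illustrative values only).
loci = {
    "neutral_a": [0.50, 0.55],
    "neutral_b": [0.30, 0.35],
    "selected": [0.90, 0.10],  # allele common where favoured, rare elsewhere
}

for name in rank_by_fst(loci):
    print(name, round(fst_biallelic(loci[name]), 4))
```

A real scan additionally needs a null distribution against which to judge how extreme a locus is, and that is where the outlier methods differ.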
The first formal theoretical framework for outlier tests stems from the discovery that the statistic $\frac{(n-1)F_{ST}}{\bar{F}_{ST}}$ is approximately $\chi^2$ distributed, where $n$ is the number of demes and $\bar{F}_{ST}$ is the mean $F_{ST}$ among all loci (Lewontin & Krakauer, 1973). Many methods have been developed since (Beaumont & Nichols, 1996; Bonhomme et al., 2010; Foll & Gaggiotti, 2008; Günther & Coop, 2013). Lotterhos & Whitlock (2014) group these outlier tests into two broad categories: 1) tests that simulate a null distribution from a demographic history that is estimated from the data and 2) tests that assume sampled patches have diverged in a measurable way from a common ancestral population. In the first category are methods developed by Beaumont & Nichols (1996) and Excoffier et al. (2009) that simulate a null distribution from an island model and from a hierarchically structured population, respectively. In the second category are methods that attempt to control for population structure by computing, for example, a covariance matrix (Günther & Coop, 2013) or coancestry (Bonhomme et al., 2010) among populations. Many of these methods have been shown to lack robustness to complex demographic scenarios, such as demographic expansion (Lotterhos & Whitlock, 2014a).

Outlier scans make use of signals left by the effect of the selected locus on linked variation. There are two reasons for this. First, genetic drift can create loci with high among-population differentiation, and hence a site-by-site analysis would not be sensitive enough to pick up any signal of selection. Second, the site that is directly under selection may not have been sampled. Outlier methods therefore search for genetic regions of arbitrary size that show unusually high population divergence. As such, any genetic process that has consequences on linked sites could potentially be confounded with local adaptation.
For example, local fixation of a globally beneficial allele can also lead to FST outliers (Booker et al., 2019). Disentangling the effect of local adaptation from global adaptation requires cautious data analysis (Booker et al., 2019). Many authors (e.g. Charlesworth et al., 1997; Cruickshank & Hahn, 2014; Cutter & Payseur, 2013; Hoban et al., 2016; Irwin et al., 2016) have discussed whether background selection (defined as the reduction in genetic diversity at loci linked to loci that receive recurrent deleterious mutations) may create outliers in FST that can be confounded with the outliers that local adaptation can create. An intuitive explanation for why background selection would affect FST is that background selection reduces the within-population genetic diversity $H_S$ (Charlesworth et al., 1997; Nordborg et al., 1996b), and if all else remained equal (an assumption that shall be discussed in chapter 2), this would lead to an increase in FST ($F_{ST} = \frac{H_T - H_S}{H_T}$, where $H_T$ is the total genetic diversity).

In the second chapter, Michael Whitlock and I investigate the effect of background selection on FST. We observe that many of the expectations found in the literature for why background selection should have a strong effect on FST are based upon Charlesworth et al. (1997). Charlesworth et al. (1997), however, never meant to perform realistic simulations. As made explicit in their methods, they purposely performed simulations for which background selection would have an unrealistically high effect on FST. In the second chapter, we strive to perform realistic simulations of background selection (simulating gene locations and recombination maps from stickleback and human genomes) to investigate whether background selection can create peaks of FST that could potentially be confounded with evidence of local adaptation. We test whether background selection affects the false positive rate of Fdist2, a classical FST outlier technique.
We discuss our results in the context of whether background selection is a confounding factor when seeking loci under local adaptation.

As discussed above, an organism can also adapt to environmental heterogeneity with a plastic response. Adaptive plasticity (hereafter simply referred to as “plasticity”) is the ability to express a range of phenotypes in multiple environments so as to remain relatively fit despite environmental heterogeneity, in a way that would not be possible by producing a single phenotype in all environments (Levins, 1968; Scheiner, 1993; Schlichting, 1986; West-Eberhard, n.d.). Early work acknowledged that plasticity can be an influential factor in evolutionary biology (Wright, 1931). Bradshaw’s (1965) review has been central in setting the conceptual foundations of plasticity. Bradshaw emphasized that plasticity is in itself a phenotypic trait that is under genetic control, opening the door for studying plasticity with the tools of quantitative genetics. Bradshaw also asked a number of important questions that are still not fully answered today: Is there a correlation between plastic responses at different traits? What is the genetic architecture of a plastic response? How much genetic variance is there in populations for plastic responses? How can we apply selection for or against plasticity? Schlichting (1986) argues that empirical advances had been slow up to and through the 1980s 1) because of experimental difficulties in measuring a plastic response and 2) because, despite Bradshaw’s (1965) review, many authors saw plasticity as antithetical to the growing field of quantitative genetics. The first analytical models of plasticity were published in the late 1980s and early 1990s (Gavrilets & Scheiner, 1993a, 1993b; Gillespie & Turelli, 1989; Gomulkiewicz & Kirkpatrick, 1992; Leòn, 1993; Via & Lande, 1987).
These early studies modelled phenotypic plasticity with a norm of reaction and did not deal with the genetic basis of the plastic response (black box approach; Pigliucci, 1996). In the past two decades, we have started to understand the importance of the mechanism by which plastic responses are implemented. For example, we discovered that the developmental mechanism of a plastic response can impact genetic correlations among traits (Brookes & Rochette, 2007) and hence impact the ability of the reaction norm to respond to selection (Lande & Arnold, 1983). It has been hypothesized that different developmental mechanisms can differentially impact the fitness cost of plasticity (Snell-Rood, 2012) and the ability of an organism to respond adaptively to a novel environment (Snell-Rood et al., 2018). Finally, understanding whether the mechanism underlying a plastic response is active or passive gives us insight as to whether the response is adaptive or not (Forsman, 2015; Pigliucci, 1996). As an example, we knew that the physiological mechanism of shade avoidance in plants is active and costly to the organism (reviewed in Schmitt & Wulff, 1993) before having ecological evidence that the response is indeed adaptive (Schmitt et al., 1995). Adaptive plasticity allows an individual to sustain a high fitness in a diversity of environments (Levins, 1968; Scheiner, 1993; Schlichting, 1986; West-Eberhard, n.d.). Plasticity is, however, not as ubiquitous as many would expect, which has led many authors to question what constrains its evolution. If constraints did not exist, organisms could be “perfectly plastic”, expressing the optimal phenotype in all of the experienced environments. DeWitt et al. (1998) divide the constraints to plasticity evolution into costs and limits. They define a cost of plasticity as the fitness loss of a plastic individual compared to a non-plastic individual when producing the same phenotype. 
A limit of plasticity is defined as an inability to produce the optimal phenotype in a given environment. In general, organisms fail to produce a perfect plastic response either because of a developmental inability to produce the optimal phenotype (limits) or because producing such an optimal phenotype would come at a fitness cost (costs).

As reviewed by DeWitt et al. (1998), limits include how reliable the information (e.g. environmental cue) is, the developmental lag-time to produce the response, the developmental and physical constraints inherent in producing the response and the fact that building a phenotype halfway through development might not be as effective as building this phenotype early on (this is called the epiphenotypic problem; Snell-Rood et al., 2015). Costs of plasticity include costs of production and maintenance of the sensory and regulatory developmental machinery, genetic costs caused by negative linkage disequilibrium, pleiotropic and epistatic effects, costs of information acquisition and costs of developmental instability. The hypothesis that plasticity generates developmental instability, thereby reducing developmental robustness and hence fitness, has been postulated many times in the theoretical literature (Gavrilets and Hastings 1994; Wagner et al. 1997; Lynch and Walsh 1998; Badyaev 2005). Empirical evidence for such costs has been inconclusive though. Scheiner et al. (1991) explored the correlation between plasticity and the cost of developmental instability (estimated from fluctuating asymmetry) in Drosophila melanogaster. They found a significant correlation only for some of the traits studied and in only one of the two environments considered. They also reviewed evidence of such a correlation in previous work on D. melanogaster and showed that out of 8 studies, 3 reported a significant correlation for at least one trait. More recently, Tonsor et al. 
(2013) reported correlations between plasticity and developmental instability for 13 traits of Arabidopsis thaliana.

In chapter 3, we argue that the general intuition that plastic genotypes are more developmentally noisy might fail to take into account the diversity of developmental pathways an organism can use to implement a plastic response. We investigate how different plastic developmental pathways affect developmental robustness and how evolution for developmental robustness can affect plasticity evolution. We also investigate how developmental robustness can be used as a bet-hedging strategy. For this, we use a model simulating the evolution of a network of genes that are regulated by each other’s transcription factors. This network of gene interactions is modelled through a modified Gillespie (2007) algorithm. We strive to model the molecular genetics of development with as much realism as is computationally tractable. A plastic developmental pathway may have consequences on an organism’s physiology other than affecting the individual’s developmental robustness. As a plastic response affects the development of an individual, it might also affect its mutational robustness. In general, robustness of a genotype for a phenotypic trait is defined as the ability of this genotype to consistently produce the same trait value independently of perturbations (Montville et al., 2005; see also Pigliucci, 2008 for a review of definitions). In particular, mutational robustness is defined as the ability of a genotype to keep producing the same phenotype in the face of new mutations (A. Wagner, 2005a).

Mutational robustness is an important factor explaining the pace of heritable adaptation in populations (A. Wagner, 2005a). Perhaps counter-intuitively, studies show that over long periods of time, mutational robustness increases the ability to adapt to new conditions (Draghi et al., 2010; Wagner, 2008). 
If a plastic response affects mutational robustness, then adapting to a heterogeneous environment through a plastic response may have important and possibly counter-intuitive consequences on the future evolution of the species. While our understanding of the evolution of mutational robustness has increased over the past 20 years (mainly theoretical advancements in developmental biology and molecular genetics; reviewed in Hansen, 2006; Masel & Siegal, 2009; Masel & Trotter, 2010; Visser et al., 2003), we are still at a level that rarely allows us to treat mutational noise other than as a constant in most models, which very much reduces our ability to make accurate predictions. One of the most popular hypotheses for the evolution of mutational robustness is that it evolves as a correlated side-effect of the evolution of developmental robustness (Ancel & Fontana, 2000; Rifkin et al., 2005; Stearns et al., 1995; Szollosi & Derenyi, 2009a). This hypothesis is called the congruence hypothesis. In chapter 3, we also test the congruence hypothesis.

In chapters 2 and 3 introduced above, I use intensive individual-based numerical simulations. For chapter 2, I perform realistic simulations of large stretches of DNA. Failing to find a program that could simulate such high genetic diversity in reasonable time, I had to write my own software, SimBit. Achieving such performance requires finding new creative ways to simulate a seemingly simple process.

Evolutionary genetics differs from most branches of biology in its strong theoretical structure. Numerical simulations have always been an important tool of investigation in evolutionary genetics. To my knowledge, the very first work in computational biology was conducted by one of the fathers of population genetics, Ronald A. Fisher (1950; on an EDSAC computer). The second work in computational biology was probably in developmental genetics by Alan M. Turing himself (1952; on a Manchester Computer). 
Most of the very early work in computational biology was concentrated in the fields of population biology (Bartlett, 1957; Hassell & Varley, 1969) and molecular developmental biology (Dayhoff & Ledley, 1962; Gierer & Meinhardt, 1972). Ecology and evolution often deal with highly complex systems for which we need to perform individual-based simulations (DeAngelis & Grimm, 2014). The number of papers in the literature that make use of individual-based models is predicted to increase in the future with expanding uses of software (DeAngelis & Mooij, 2005). There are two general motivations for using individual-based simulations (DeAngelis & Grimm, 2014). One motivation is to make new scientific breakthroughs. It is called the paradigmatic approach. The other motivation is to simulate specific populations of interest, often with a management goal in mind. It is called the pragmatic approach. Grimm (1999) argues that studies using individual-based modelling for a paradigmatic approach should be made more common, a trend that current research seems to follow (DeAngelis & Grimm, 2014).

SimBit, the software used in the second chapter, has since grown to become a reliable and flexible software package, whose main goals are to 1) offer very high performance for a wide range of simulation scenarios, 2) offer a fast learning curve for new users and 3) support a vast diversity of selection scenarios and demographic parameters, as well as several species with their ecological interactions. In chapter 4, I present the software SimBit and compare its performance to Nemo, SFS_CODE and SLiM, commonly used forward-in-time population genetics software packages.

Environmental heterogeneity is core to many evolutionary processes. In this thesis, I investigate a few aspects of it. I investigate how background selection can affect the methods used to detect local adaptation. I also investigate plastic responses and how they affect both developmental robustness and mutational robustness. 
Finally, I present a general-purpose forward-in-time simulation platform for population genetic studies that allows rapid simulation of genome-wide evolution across environments.

2 Background selection and FST: consequences for detecting local adaptation

2.1 Introduction

Maynard-Smith & Haig (1974) recognized the influence of selection on linked neutral sites, proposing that strong positive selection could reduce genetic diversity at nearby sites. This process is now referred to as a ‘selective sweep’. Much later, Charlesworth, Morgan and Charlesworth (1993) proposed that deleterious mutations could also affect genetic diversity at nearby sites, because some haplotypes would be removed from the population as selection acts against linked deleterious alleles. They named this process background selection (BGS). Both selective sweeps and background selection affect genetic diversity; they both reduce the effective population size and distort the site frequency spectrum (SFS) of linked loci. Empirical evidence of a positive correlation between genetic diversity and recombination rate has been reported in several species (Cutter and Payseur, 2013), including Drosophila melanogaster (Begun & Aquadro, 1992; Elyashiv et al., 2016), humans (Spencer et al., 2006), collared flycatchers, hooded crows and Darwin’s finches (Dutoit et al., 2017; see also Vijay et al., 2017). BGS is also expected to affect FST (Charlesworth et al., 1997; Cutter & Payseur, 2013; Hoban et al., 2016). The negative relationship between effective population size Ne and FST is captured in Wright’s classical infinite island result, $F_{ST} = \frac{1}{1 + 4N_e(m + \mu)}$ (Wright, 1943), where m is the migration rate and µ is the mutation rate. One might therefore expect that loci under stronger BGS would show higher FST.

Many authors have also argued that, because BGS reduces the within-population diversity, it should lead to high FST (Cutter & Payseur, 2013a; Cruickshank & Hahn, 2014; Hoban et al., 2016). Expressed in terms of heterozygosities, $F_{ST} = \frac{H_T - H_S}{H_T} = 1 - \frac{H_S}{H_T}$, where HT is the expected heterozygosity in the entire population and HS is the average expected heterozygosity within subpopulations (HS and HT are also sometimes called πS and πT; e.g. Charlesworth, 1998). All else being equal, a decrease of HS would indeed lead to an increase of FST. However, all else is not equal; HT is also affected by BGS (Charlesworth et al., 1997). Therefore, in order to understand the effects of BGS on FST, we must understand the relative impact of BGS on both HS and HT. (This paragraph has expressed FST by one definition that mirrors Nei’s GST, but Weir and Cockerham’s (1984) θ estimates a similar quantity, $F_{ST} = \frac{H_B}{H_S + H_B}$, where $H_B = \frac{d}{d-1}(H_T - H_S)$ and d is the number of populations sampled. For the majority of this paper we consider this more broadly used measure from Weir and Cockerham.) By either measure of FST, if HS and HT are changed in the same proportion by BGS, there would be no effect on FST.

Charlesworth et al. (1997) as well as Zeng & Corcoran (2015) have performed more sophisticated analyses measuring the effect of BGS on FST through numerical simulations. Charlesworth et al. (1997) reported that BGS reduces the within-population heterozygosity HS slightly more than it reduces the total heterozygosity HT, causing a net increase in FST. The effect on FST reported is quite substantial, but, importantly, their simulations were not meant to be realistic. The authors highlighted their goal in the methods: “The simulations were intended to show the qualitative effects of the various forces studied […], so we did not choose biologically plausible values […]. Rather, we used values that would produce clear-cut effects”.  
For example, talking about their choice for the deleterious mutation rate of 8 × 10⁻⁴ per site: “This unrealistically high value was used in order for background selection to produce large effects [...]”

Much of the literature on the effect of BGS on FST is based on the results in Charlesworth et al. (1997), even though they only intended to provide a proof of concept (see also Zeng & Charlesworth, 2011 and Zeng & Corcoran, 2015). They did not attempt to estimate how strong an effect BGS has on FST in real genomes.

The intensity of BGS varies throughout the genome as a consequence of variation in recombination rate, selection pressures and mutation rates. Therefore, if BGS significantly affects FST, we should expect baseline FST to vary throughout the genome. It is important to distinguish two separate questions when discussing the effect of BGS on FST: 1) How does BGS affect the average genome-wide FST? and 2) How does locus-to-locus variation in the intensity of BGS affect locus-to-locus variation in FST? The second question is of particular interest to those trying to identify loci under positive selection (local selection or selective sweep). Locus-to-locus variation in FST due to BGS could potentially be confounded with the FST peaks created by positive selection. In this paper, we focus on this second question.

The identification of loci involved in local adaptation is often performed via FST outlier tests (Lotterhos & Whitlock, 2014b). Other tests exist to identify highly divergent loci, such as cross-population extended haplotype homozygosity (XP-EHH; Sabeti et al., 2007), comparative haplotype identity (Lange & Pool, 2016) and cross-population composite likelihood ratio (XP-CLR; Chen et al., 2010). FST outlier tests, such as FDist2 (Beaumont & Nichols, 1996), BayeScan (Foll & Gaggiotti, 2008) or FLK (Bonhomme et al., 2010), look for genomic regions showing particularly high FST values to find candidates for local adaptation. 
If BGS can affect FST unevenly across the genome, then regions with a high intensity of BGS could potentially have high FST values that could be confounded with the pattern caused by local selection (Charlesworth et al., 1997; Cruickshank & Hahn, 2014). BGS could therefore inflate the false positive rate when trying to detect loci under local selection.

The potential confounding effect of BGS on signals of local adaptation has led to an intense effort to find solutions to this problem (Aeschbacher et al., 2017; Bank et al., 2014; Huber et al., 2016). Many authors have understood from Cruickshank and Hahn (2014) that dXY should be used instead of FST in outlier tests (e.g. McGee et al., 2015; Yeaman, 2015; Whitlock & Lotterhos, 2015; Brousseau et al., 2016; Picq et al., 2016; Payseur & Rieseberg, 2016; Hoban et al., 2016; Vijay et al., 2017; see also Nachman & Payseur, 2012). FST is a measure of population divergence relative to the total genetic diversity, while dXY is an absolute measure of population divergence defined as the probability of non-identity by descent of two alleles drawn in the two different populations averaged over all loci (Nei, 1987, originally called it DXY but here we follow Cruickshank and Hahn’s (2014) terminology and call it dXY).

Whether BGS can affect genome-wide FST under some conditions is not in doubt (Charlesworth et al., 1997), but whether the locus-to-locus variation in the intensity of BGS present in natural populations substantially affects variation in FST throughout the genome is very much unknown. Empirically speaking, it has been very difficult to measure how much of the genome-wide variation in genetic diversity is caused by BGS, as opposed to selective sweeps or variation in mutation rates (Cutter & Payseur, 2013; see also attempts in humans by Cai et al., 2009, McVicker et al., 2009 and Elyashiv et al., 2016). 
We are therefore in need of realistic simulations that can give us more insight into how BGS affects genetic diversity among populations and how it affects the statistics of population divergence.

In this article, we investigate the effect of BGS in structured populations with realistic numerical simulations. Our two main goals are 1) to quantify the impact of locus-to-locus variation in the intensity of BGS on FST (Weir & Cockerham, 1984) and dXY (Nei, 1987a) and 2) to determine whether BGS inflates the false positive rate of FST outlier tests.

2.2 Methods

Our goal is to perform biologically plausible simulations of the local genomic effects of background selection in biological settings with ongoing gene flow among populations. BGS is expected to vary with strength of selection (itself affected by gene density), mutation rate and recombination rate across the genome. We used data from real genomes to simulate realistic covariation in recombination rates and gene densities. We chose to base our simulated genomes on two eukaryote recombination maps derived from sticklebacks and humans, because these two species have attracted a lot of attention in studies of local adaptation. The recombination rate variation in humans is extremely fine scale, but it presents the potential issue that it is estimated from linkage disequilibrium data. As selection causes linkage disequilibrium to increase, recombination rates in regions under strong selection may be underestimated, which might bias the simulated variance in the intensity of BGS. Although the recombination map for stickleback is much less fine-scaled, the estimates are less likely to be biased as they are computed from pedigrees.

Our simulations are forward in time and were performed using the simulation platform SimBit version 3.69 (see chapter 4). We simulated non-overlapping generations with hermaphroditic, diploid individuals and random mating within patches. 
Selection occurred before dispersal. The code and user manual are available at https://github.com/RemiMattheyDoret/SimBit. We used a new simulation platform because the existing simulation platforms were too slow for our needs (see Appendix A.2). To double check our results, we also ran some simulations with SLiM (Haller & Messer, 2017a) and Nemo (Guillaume & Rougemont, 2006) (see Appendix A.2), confirming that we obtain consistent distributions of genetic diversity and of FST among simulations. For simulations with SLiM and Nemo, independent simulations were parallelized with GNU parallel (Tange, 2011).

2.2.1 Genetics

For each simulation, we randomly sampled a sequence of about 10 cM from either the stickleback (Gasterosteus aculeatus) genome or the human genome (see treatments below) and used this genomic location to determine the recombination map and exon locations for a simulation replicate. For the stickleback genome, we used the gene map and recombination map from Roesti et al. (2013). Ensembl-retrieved gene annotations were obtained from Marius Roesti (University of Bern, Switzerland). For the human genome, we used the recombination map from The International HapMap Consortium (2007), the gene positions from NCBI and the positions of regulatory sequences from Ensembl (Zerbino et al., 2018). We excluded sex chromosomes to avoid complications with haploid parts of the genomes. As estimates of mutation rate variation throughout the genome are very limited, we assumed that the haploid mutation rate varies from site to site following an exponential distribution with a mean of 2.5 × 10⁻⁸ per generation (mean estimate from Nachman & Crowell, 2000).

More specifically, we first randomly sampled a sequence of 10⁵ nucleotides, which we will refer to as the focal region. All of the statistics (defined under the section Statistics below) are calculated only on this focal region of each simulation. 
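As an illustration of the per-site mutation rates described above, the following minimal Python sketch draws one rate per nucleotide of a focal region from an exponential distribution with the stated mean. It is not SimBit code; the function name `draw_site_mutation_rates` and the random seed are assumptions made for this example only.

```python
import random

MEAN_MU = 2.5e-8      # mean haploid mutation rate per site per generation (value from the text)
FOCAL_SITES = 10**5   # number of nucleotides in the focal region (value from the text)

def draw_site_mutation_rates(n_sites, mean_mu, rng):
    """Draw one mutation rate per site from an exponential distribution.

    random.expovariate expects the rate parameter, which is 1 / mean.
    """
    return [rng.expovariate(1.0 / mean_mu) for _ in range(n_sites)]

rng = random.Random(42)  # arbitrary seed, for reproducibility of the sketch only
rates = draw_site_mutation_rates(FOCAL_SITES, MEAN_MU, rng)
mean_rate = sum(rates) / len(rates)
print(f"mean per-site mutation rate: {mean_rate:.3e}")
```

With 10⁵ sites the sample mean should fall close to 2.5 × 10⁻⁸, while individual sites can differ from it by an order of magnitude or more, which is the site-to-site heterogeneity the exponential assumption is meant to capture.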
Nucleotides that occur in locations determined to be exons in the sampled genomic map are subject to selection (see Selection), while all other nucleotides are assumed to be neutral. The focal region itself contained on average ~0.44 genes for the human genome and ~3.15 genes for the stickleback genome.

We simulated a 5 cM region on each side of the focal region (resulting in a window of 10 cM plus the map distance covered by the specific focal region of 10⁵ sites) in order to capture the local effects of background selection. In these 10 cM flanking regions, we only tracked exons (or in the case of the human genetic map, exons and regulatory sequences). In the nearest 1 cM on each side of the focal region, as well as within the focal region, we individually simulated each nucleotide as a bi-allelic locus. On the remaining outer 4 cM, to improve the speed and RAM usage of the process, we tracked the number of mutations in blocks of up to 100 nucleotides. For these blocks, we tracked only the number of mutations but not their location within the block. Ignoring recombination within a block likely had little effect on the results because the average recombination distance between the first and last site of a block is of the order of 10⁻⁶ cM. The expected number of segregating sites within a block is $4N\mu \sum_{i=1}^{2N-1} \frac{1}{i}$, which for a mutation rate per block of µ = 10⁻⁶ and a population size of N = 10,000 is ~0.42. The probabilities of having more than one mutation and more than two mutations (based on a Poisson approximation) are therefore only approximately 6.7% and 0.9%, respectively. Overall, the level of approximation used is very reasonable.

2.2.2 Selection

As we are interested in the effect of BGS, we modelled the effects of purifying selection against novel deleterious mutations. 
Each nucleotide in the exons (and regulatory sequences for the human genetic map) is subject to purifying selection with a selection coefficient against mutant alleles determined by a gamma distribution described below. For focal regions that include exons, statistics are computed over a sequence that is at least partially under direct purifying selection.

To create variance in selection pressures throughout the genome, each exon (and regulatory sequence for the human genetic map) has its own gamma distribution of heterozygous selection coefficients s. The mean and variance of these gamma distributions are drawn from a bivariate uniform distribution with a correlation coefficient of 0.5 (so that when the mean is high, so is the variance), bounded between 10⁻⁸ and 0.2 for both the mean and the variance. These bounds were inspired by the methodology used in Gilbert et al. (2017). The gamma distributions are bounded to one. Figure A.1.1 shows the overall distribution of selection coefficients s, with 2% of mutations being lethal and an average deleterious selection coefficient for the non-lethal mutations of 0.074. In the treatment Low selection pressure (see treatments below), the upper bounds for the mean and variance of the gamma distributions were set to 0.1 instead of 0.2. To improve the performance of our simulations, we assumed multiplicative fitness interactions among alleles, where the fitness of heterozygotes at locus i is $1 - s_i$ and the fitness of the double mutant is $(1 - s_i)^2$. Any mutation changes the state of the locus to the other possible allele.

As a consequence of our parameter choices, our genome-wide deleterious mutation rate was about 1.6 in sticklebacks and about 3 in humans. 9.8% of the stickleback genome and 2.7% of the human genome was under purifying selection. For comparison, the genome-wide deleterious mutation rate is estimated at 2.2 in humans (Keightley, 2012) and 0.44 in rodents (Keightley & Gaffney, 2003). 
To our knowledge, there is currently no such estimate for sticklebacks. Note however that the above methods of estimation cannot reliably detect mutations that are quasi-neutral (s << 1/(2N)). By our distribution of selection coefficients, 49% of all deleterious mutations have a heterozygote selection coefficient lower than 1/(2Ne) when Ne = 1,000 (42% when Ne = 10,000). The fraction of selection coefficients that are of intermediate effect (between 1/(2Ne) and 10/(2Ne)) is 10% when Ne = 1,000 (7% when Ne = 10,000).

It is worth noting that, in rodents, about half of the deleterious mutations occur in non-coding sequences (Keightley & Gaffney, 2003). On top of having selection on exons, our simulations using the human genetic map also have all regulatory sequences under purifying selection. With our simulations based on the stickleback genome, however, only exons were under purifying selection. It is therefore possible that we over-estimated the deleterious mutation rate in gene-rich regions and under-estimated the deleterious mutation rate in other regions, especially in stickleback. This would artificially increase the locus-to-locus variation in the intensity of BGS in our simulations, which is conservative with respect to our conclusions.

2.2.3 Demography

In all simulations (except for those matching the Charlesworth et al. (1997) results), we started with a burn-in phase with a single population of N diploid individuals, lasting 5 × 2N generations. The population was then split into two populations of N individuals each with a migration rate between them equal to m. After the burn-in phase, each simulation was run for 5 × 2N more generations for a total of 10 × 2N generations.

2.2.4 Treatments

We explored the presence and absence of deleterious mutations over two patch sizes, three migration rates, two genomes, and three selection scenarios. We considered a default design and explored variations from this design. 
The Default design had a population size per patch of N = 1000, a migration rate of m = 0.005 and used the stickleback genome for its recombination map and gene positions. As deviations from this default design, we explored modification of every variable, one variable at a time. The Large N treatment has N = 10000. The Human Genetic Map treatment uses the human genome for gene positions, regulatory sequences and recombination map. The treatments No Migration and High Migration have migration rates of m = 0 and m = 0.05, respectively. The Constant µ treatment assumes that all sites have a mutation rate of 2.5 × 10⁻⁸ per generation. The Low selection pressure treatment simulates lower selection coefficients (see section Selection above). To test the robustness of our results and because it may be relevant for inversions, we also performed simulations where the recombination rate for the entire 10 cM region was set to zero.

As a check against previous work, we qualitatively replicated the results of Charlesworth et al. (1997) by performing simulations with assumptions similar to theirs. We named this treatment CNC97. In our CNC97 simulations, N = 2000, m = 0.001, and 1000 loci were equally spaced 0.1 cM apart from each other, with constant selection pressure (heterozygote fitness of 0.98 and mutant homozygote fitness of 0.9) and a constant mutation rate µ = 0.0004. Note that Charlesworth et al. (1997) used partially recessive alleles, which we mimic for this case, but the simulations for the other cases considered in this paper use multiplicative interactions between alleles. We performed further checks against previous works that are presented in Appendix A.2. A full list of all treatments can be found in Table 2.1.

Table 2.1: Summary of treatments. For all treatments but CNC97, the average mutation rate was set to 2.5 × 10⁻⁸ per site, per generation and the mean heterozygous selection coefficient as described in the text. 
Treatment               N      m      Genome       Other                   BGS
Default                 1000   0.005  Stickleback  ---                     Yes / No
High Migration          1000   0.05   Stickleback  ---                     Yes / No
Large N                 10000  0.005  Stickleback  ---                     Yes / No
Human genetic map       1000   0.005  Human        ---                     Yes / No
Low selection pressure  1000   0.005  Stickleback  Low selection pressure  Yes / No
Constant µ              1000   0.005  Stickleback  Constant mutation rate  Yes / No
No Migration            1000   0      Stickleback  ---                     Yes / No
No Recombination        1000   0.005  Stickleback  Absence of crossover    Yes / No
CNC97                   2000   0.001  NA           See methods section     Yes / No

In all treatments (except Large N), we performed 4000 simulations; 2000 simulations with BGS and 2000 simulations without selection (where all mutations were neutral). For Large N, simulations took more memory and more CPU time. For the Large N case, we performed 1000 simulations with background selection and 1000 simulations without selection.

We set the generation 0 at the time of the split. The state of each population was recorded at the end of the burn-in period (generation -1) and at generations 0.001 × 2N, 0.05 × 2N, 0.158 × 2N, 1.581 × 2N and 5 × 2N after the split. For N = 1000, the sampled generations are therefore -1, 2, 100, 316, 3162 and 10000.

2.2.5 Predicted intensity of Background Selection

In order to investigate the locus-to-locus correlation between the predicted intensity of BGS and various statistics, we computed B, a statistic that approximates the expected ratio of the coalescent time with background selection over the coalescent time without background selection ($B = T_{BGS} / T_{neutral}$). B quantifies how strong BGS is expected to be for a given simulation (Nordborg et al., 1996a). Furthermore, B has been used to predict the effect of BGS on FST (Zeng and Corcoran 2015). A B value of 0.8 means that BGS has caused a drop of neutral genetic diversity of 20% compared to a theoretical absence of BGS. Lower B values indicate stronger BGS. Both Hudson & Kaplan (1995) and Nordborg et al. 
(1996) have derived the following theoretical expectation for B:

$$B = \exp\left(-\sum_i \frac{u_i s_i}{(s_i + r_i)^2}\right)$$

where $r_i$ is the recombination rate between the focal site and the ith site under selection, $s_i$ is the heterozygous selection coefficient at that site, and $u_i$ is the (haploid) mutation rate at the ith site. By this formula, B is bounded between 0 and 1, where 1 means no BGS at all and low values of B mean strong BGS. We computed B for all sites in the focal region and report the average B for the region. While B is calculated just for the focal region, it is based on the effects of all selected sites in the simulated region of just over 10 cM. For the stickleback genome, B values ranged from 0.03 to 0.99 with a mean of 0.84 (Figure A.1.2). For the human genome, B values ranged from 0.20 to 1.0 with a mean of 0.91. In the No Recombination treatment, B values ranged from $10^{-10}$ to 0.71 with a mean of 0.07.

Excluding the treatments No Recombination and CNC97, we observed that simulations with BGS have a genetic diversity (whether HT or HS; HT data not shown) 6% to 25% lower than simulations without BGS. Messer & Petrov (2013) simulated a panmictic population, looking at a sequence of similar length inspired by a gene-rich region of the human genome, and reported a similar decrease in genetic diversity. Under the No Recombination treatment, this average reduction of genetic diversity due to BGS is 53%. In empirical studies, linked selection is estimated to reduce genetic diversity by up to 6% according to Cai et al. (2009) or 19-26% according to McVicker et al. (2009) in humans. In Drosophila melanogaster, where gene density is higher, the reduction in genetic diversity due to linked selection is estimated at 36% when using Kim & Stephan (2000)’s methodology and at 71% using a composite likelihood approach (Elyashiv et al., 2016). In mice, BGS alone cannot fully explain the reduction in genetic diversity at low recombination sites, and selective sweeps due to positive selection are responsible for the majority of the reduction in diversity due to linked selection (Booker & Keightley, 2018a). It is worth noting that, because we were interested in the locus-to-locus variation of various statistics in response to varying intensity of BGS, we did not simulate a whole genome worth of BGS and hence the overall reduction in genetic diversity that we observe should not be understood as a genome-wide effect of BGS.

2.2.6 FST outlier tests

To assess the effect of BGS on outlier tests of local adaptation, we used a variant of FDist2 (M. A. Beaumont & Nichols, 1996). We chose FDist2 because it is a simple and fast method for which the assumptions of the test match well to most of the demographic scenarios simulated here (although it is a poor match to the No Migration scenario, which has not reached equilibrium in our simulation conditions). Because the program FDist2 is not available through the command line, we rewrote the FDist2 algorithm in R and C++. Source code can be found at https://github.com/RemiMattheyDoret/Fdist2. Our FDist2 procedure is as follows. First, we estimated the migration rate from the average FST of the specific set of simulations considered, using the relationship between FST and migration rate given by Charlesworth (1998), and then ran 50000 simulations, each lasting 50 times the half-life to reach equilibrium FST given the estimated migration rate (Whitlock, 1992). For each SNP, we then selected the subset of FDist2 simulations for which allelic diversity was less than 0.02 away from the allelic diversity of the SNP of interest. The P-value is computed as the fraction of FDist2 simulations within this subset having a higher FST than the one we observed.
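The conditioning step of this procedure can be illustrated with a minimal Python sketch. It is not the R and C++ implementation linked above: the function name `empirical_p_value`, the representation of the null simulations as (allelic diversity, FST) pairs, and the toy values are all assumptions made for the example.

```python
def empirical_p_value(obs_het, obs_fst, null_sims, window=0.02):
    """Fraction of similar-diversity null simulations with a higher FST.

    null_sims is a list of (allelic_diversity, fst) pairs from neutral
    simulations (hypothetical layout). Returns None when no null simulation
    falls within `window` of the observed allelic diversity.
    """
    # Keep only null simulations whose diversity is less than `window` away.
    subset = [fst for het, fst in null_sims if abs(het - obs_het) < window]
    if not subset:
        return None
    # P-value: share of that subset with an FST above the observed one.
    return sum(fst > obs_fst for fst in subset) / len(subset)


# Toy null distribution (made-up numbers, far fewer than 50000 simulations).
null = [(0.30, 0.01), (0.31, 0.04), (0.29, 0.08), (0.31, 0.12), (0.45, 0.30)]
p = empirical_p_value(obs_het=0.30, obs_fst=0.05, null_sims=null)
print(p)
```

In this toy case four null simulations fall inside the diversity window and two of them exceed the observed FST. With a realistic number of null simulations the window would rarely be empty, but the sketch returns `None` in that case rather than guessing a value.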
The false positive rate is then defined as the fraction of neutral SNPs for which the P-value is lower than a given α value, using α = 0.05. We confirmed that the results were similar for other α values. For the outlier tests, to avoid issues of pseudo-replication, we considered only a single SNP per simulation, randomly sampled from the focal region among SNPs with a minor allele frequency greater than 0.05. Then, we randomly assembled SNPs from a given treatment into groups of 500 SNPs to create the data file for FDist2. We have 4000 simulations (2000 with BGS and 2000 without BGS) per treatment (Large N is an exception with only 2000 simulations total), which allowed 8 independent false positive rate estimates per treatment (4 estimates with BGS and 4 without BGS). In each treatment, we tested whether the false positive rate differed with and without BGS using both a Welch’s t-test and a Wilcoxon test.

2.2.7 Statistics

FST and dXY are both measures of population divergence. In the literature there are several definitions of FST, and we also found potential misunderstanding about how dXY is computed.

There are two main estimators of FST in the literature: GST (Nei, 1973a) and θ (B S Weir & Cockerham, 1984). In this article, we focus on θ as an estimator of FST (B S Weir & Cockerham, 1984). There are also two methods of averaging FST over several loci. The first method is to simply take an arithmetic mean over all loci. The second method consists of calculating the sum of the numerator of θ over all loci and dividing it by the sum of the denominator of θ over all loci. Weir and Cockerham (1984) showed that this second averaging approach has lower bias than the simple arithmetic mean. We will refer to the first method as the “average of ratios” and to the second method as the “ratio of the averages” (Reynolds et al., 1983; Weir & Cockerham, 1984). In this article, we use FST as calculated by the “ratio of the averages”, as advised by Weir and Cockerham (1984).
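The difference between the two averaging methods can be made concrete with a short sketch. The per-locus numerators and denominators of θ are taken as given here (their computation following Weir and Cockerham is not shown), and the function names and toy values are hypothetical.

```python
def average_of_ratios(numerators, denominators):
    """Arithmetic mean of the per-locus ratios (the more biased estimator)."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    return sum(ratios) / len(ratios)


def ratio_of_averages(numerators, denominators):
    """Sum of numerators over sum of denominators across loci."""
    return sum(numerators) / sum(denominators)


# Made-up per-locus components; the third locus has very low diversity.
num = [0.02, 0.03, 0.0001]
den = [0.40, 0.50, 0.02]

print(average_of_ratios(num, den))
print(ratio_of_averages(num, den))
```

In the arithmetic mean, the low-diversity third locus weighs as much as the other two, whereas in the ratio of sums it contributes almost nothing to either the numerator or the denominator.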
To illustrate the effects of BGS on the biased estimator of FST, we also computed FST as a simple arithmetic mean (“average of ratios”), and we will designate this statistic as FST (average of ratios).

dXY is a measure of genetic divergence between two populations X and Y. Nei (1987) defined dXY as

$$d_{XY} = \frac{\sum_{l=1}^{L}\left(1 - \sum_{k=1}^{A_l} x_{l,k}\, y_{l,k}\right)}{L}$$

where L is the total number of sites, $A_l$ is the number of alleles at the lth site, and $x_{l,k}$ and $y_{l,k}$ are the frequencies of the kth allele at the lth locus in populations X and Y, respectively.

Some population genetics software packages (e.g., EggLib; De Mita, 2012) average dXY over polymorphic sites only, instead of averaging over all sites, as in Nei's (1987) original definition of dXY. This measure averaged over polymorphic sites only will be called dXY-SNP; otherwise, we use the original definition of dXY by Nei (1987). We report the average FST, dXY, and within-population genetic diversity HS. Our main results lie in the comparison between simulations with BGS and simulations without BGS within each treatment. Because theoretical expectations exist for the strength of BGS on genetic diversity within populations, we also investigated the relationships between this theoretical expectation, B, and FST, FST (average of ratios), dXY, dXY-SNP and HS in five independent tests for each treatment and at each generation. For this, we used Pearson correlation tests, Spearman correlation tests, ordinary least squares regressions, robust regressions (using M-estimators; Huber, 1964), and permutation tests. The results were systematically consistent. Permutation tests of Pearson’s correlation coefficients were performed with 50,000 iterations. Because all tests were congruent, we only report the Pearson correlation coefficient and the P-values from permutation tests.

The data presented below are available on Dryad (doi: 10.5061/dryad.44vr58d).
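The distinction between dXY, dXY-SNP and HS can be sketched in a few lines of Python. This is an illustration of the definitions in this section rather than the code used for the analyses; the function names and the toy allele frequencies are assumptions, and sites that are not listed are assumed to be fixed for the same allele in both populations, so that they contribute zero to every sum.

```python
def site_divergence(fx, fy):
    """1 - sum_k x_k * y_k for a single site."""
    return 1.0 - sum(x * y for x, y in zip(fx, fy))


def d_xy(freqs_x, freqs_y, n_sites_total):
    """dXY averaged over all L sites, following Nei's (1987) definition."""
    total = sum(site_divergence(fx, fy) for fx, fy in zip(freqs_x, freqs_y))
    return total / n_sites_total


def d_xy_snp(freqs_x, freqs_y):
    """dXY averaged over the listed (polymorphic) sites only."""
    return d_xy(freqs_x, freqs_y, len(freqs_x))


def h_s(freqs, n_sites_total):
    """Within-population diversity: mean over all sites of 1 - sum_k x_k^2."""
    total = sum(1.0 - sum(x * x for x in f) for f in freqs)
    return total / n_sites_total


# Two polymorphic sites in a hypothetical region of L = 1000 sites.
freqs_x = [[0.5, 0.5], [1.0, 0.0]]
freqs_y = [[0.5, 0.5], [0.0, 1.0]]

print(d_xy(freqs_x, freqs_y, 1000))
print(d_xy_snp(freqs_x, freqs_y))
print(h_s(freqs_x, 1000))
```

Because the two functions differ only in the denominator, anything that changes the number of polymorphic sites changes dXY-SNP even when the per-site divergence sums are unchanged.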
Within-population genetic diversity was computed as

$$H_S = \frac{\sum_{l=1}^{L}\left(1 - \sum_{k=1}^{A_l} x_{l,k}^{2}\right)}{L}$$

with L, $A_l$ and $x_{l,k}$ as defined for dXY.

2.3 Results

The distributions of FST values from simulations with BGS are extremely similar to the distribution of FST values of simulations where all mutations were neutral. This remains true even in the most extreme treatment with no recombination. This general result is exemplified in Figure 2.1 by comparing the Default treatment to the No Recombination treatment. As we have a large number of simulations, the means of the distributions of FST are significantly different between simulations with BGS and simulations without BGS for both the Default (Wilcoxon tests: W=47875000; P=0.00002) and the No Recombination (Wilcoxon tests: W=47804000; P=0.002) treatments but the increase in mean FST due to BGS is only 4.3% for the Default treatment and 2.6% in the No Recombination treatment.

Figure 2.1: Violin plots showing the distribution of FST values for the Default treatment (labelled ‘With Recombination’) and for the No Recombination treatment. Simulations with BGS are shown in red, and simulations without BGS are in blue. The means and standard errors are displayed with dots and error bars (although the error bars are barely visible because the standard errors are so small). [Axis: FST, from 0.00 to 0.20; groups: With Recombination, No Recombination.]

Figure 2.2 shows the means and standard errors for FST, dXY, and HS for all treatments. For most treatments FST is very little changed by BGS, even when HS and dXY are strongly affected.

Figure 2.2: Comparisons of mean FST (left column), dXY (central column) and HS (right column) between simulations with (black) and without (grey) BGS. Error bars are 95% CI derived from permutations. Significance of difference between mean FST with and without BGS based on permutation tests is indicated with stars (***p < .001; **p < .01; *p < .05).
[Figure 2.2 panels: one per treatment and statistic (treatments: Default, High Migration, Large N, Human genetic map, Low selection pressure, Constant µ, No Migration, No Recombination, CNC97; statistics: FST, dXY, HS); x-axis: Generation after split (2N units); legend: Without Background Selection, With Background Selection.]

Figure 2.3 shows the correlation between the predicted B and the statistics FST, dXY and HS for the Default treatment at the last generation. FST is not correlated with B (P = 0.24, r = -0.02). The strongest correlations with B are observed for the statistics dXY (P = $4 \times 10^{-5}$, r = 0.06) and HS (P = $4 \times 10^{-5}$, r = 0.06). In fact, the two statistics dXY and HS are very highly correlated (P < $2.2 \times 10^{-16}$, r = 0.99). This high correlation explains the resemblance between the central and right graphs of Figure 2.3. All correlations between the statistics HS, FST, FST (average of ratios), dXY, and dXY-SNP and B are summarized in Tables A.1.1, A.1.2, A.1.3, A.1.4 and A.1.5, respectively.

Figure 2.3: Correlation between B and FST, HS and dXY for the last generation (5 × 2N generations after the split) of the Default treatment. Each grey dot is a single simulation where there is BGS. The large black dot is the mean of the simulations with no BGS. The p-values are computed from a permutation test, and r is the Pearson's correlation coefficient. p-values and r are computed on both simulations with and without BGS.
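A permutation test of a Pearson correlation coefficient, of the kind used for these P-values, can be sketched as follows. The function names, the fixed seed, the toy data and the choice of a two-sided test are assumptions made for illustration; the methods section specifies only that 50,000 iterations were used.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, iterations=50_000, seed=1):
    """Share of shuffles of ys giving |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # break any association between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / iterations


# Made-up B and HS values, with fewer iterations to keep the example fast.
b_values = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 0.95, 1.0]
h_values = [0.0003, 0.0006, 0.0008, 0.0010, 0.0012, 0.0013, 0.0014, 0.0015]
r = pearson_r(b_values, h_values)
p = permutation_p_value(b_values, h_values, iterations=2000)
print(r, p)
```

With a finite number of shuffles the smallest non-zero P-value that can be reported is one over the number of iterations, which is one reason to use many iterations when effects are weak but sample sizes are large.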
Results are congruent when computing the correlation coefficients and p‐value on the subset of simulations that have BGS. When looking at correlations between B and the statistics of population divergence, the No Migration treatment is an exception to the other treatments. For the No Migration treatment, FST is not significantly correlated with B at early generations but becomes slightly correlated as divergence rises to an FST of 0.6 and higher. dXY shows an opposite pattern. dXY is very significantly correlated with B at early generations and seemingly independent of B at the last generation. Note that for FST all correlation coefficients are always very small. The largest r2 observed for FST is r2=0.01(found for FST No Migration).  As expected, in the CNC97 simulations, there is a strong difference between simulations with BGS and simulations without BGS for all three statistics (FST, dXY, and HS) at all generations (Welch’s t-tests; all P < 2.2×10-16; Figure 2.2).  FST averaged over loci as advised by Weir and Cockerham (1984) was generally less sensitive to BGS than FST calculated as an average of ratios (compare tables A.1.2 and A.1.3). This effect is again partially visible in the correlations with B. Figure A.1.2 illustrates the sensitivity of FST (average of ratios) in the worst case, the No Recombination treatment. This sensitivity is driven largely by rare alleles and goes away when minor alleles below a frequency of 0.05 are excluded. The observed false positive rate (FPR) for the FST outlier test is relatively close to the α values except for No Migration (with and without BGS) and CNC97 (with BGS). Excluding the intentionally unrealistic treatment CNC97, we do not see more significant differences between the FPR with and without BGS than we would expect by chance (Figure 2.4). There are other statistics of interest that one can consider to investigate whether BGS causes FST outliers. 
Among all treatments (excluding CNC97), the fraction of SNPs that are associated with an FST that is greater than 10 times the average FST in its particular treatment is 0.075% with BGS and 0.085% without BGS. These numbers go up to 1.7% and 1.8% for
SNPs that have an FST five times greater than the average FST, for treatments with BGS and without BGS, respectively. We have also computed, in each treatment, the ratio of the largest FST to the average FST. Among treatments without BGS, the largest FST was on average 12.2 times the average FST, while among treatments with BGS, the largest FST was on average 12.1 times the average FST. With an α threshold of 0.001, we observe that 0.080% and 0.083% of the SNPs turn out to be false positives among treatments without BGS and among treatments with BGS, respectively. For the conditions considered in this paper, BGS does not significantly increase the rate of FST outliers.

Figure 2.4: Comparison of false-positive rate (FPR) returned by fdist2 between simulations with BGS (black) and without BGS (grey) for all treatments by generation. The significance level is 0.05 and is represented by the horizontal dashed line. Significance based on a Welch's t test is indicated with stars (***p < .001; **p < .01; *p < .05). With Wilcoxon tests, none of the treatments displayed here comes out as significant. [Panels: one per treatment (Default, High Migration, Large N, Human genetic map, Low selection pressure, Constant µ, No Migration, No Recombination, CNC97); x-axis: Generation after split (2N units); y-axis: False Positive Rate; legend: With Background Selection, Without Background Selection.]

2.4 Discussion

In agreement with previous works (e.g.
Charlesworth, 2012; Elyashiv et al., 2016; Messer & Petrov, 2013; Nordborg et al., 1996a; Vijay et al., 2017a; Zeng & Charlesworth, 2011), we show that background selection reduces genetic diversity, both within and among populations. The magnitude of this effect is very similar to previous realistic simulations (Messer & Petrov, 2013). The effect of BGS on FST is however rather small and does not seem to impact the overall distribution of FST in the sexual outcrossing conditions that we study here (Figure 2.1). The relative robustness of FST to variation in the intensity of BGS is contrary to what has been found in less realistic simulations (Charlesworth et al., 1997; Zeng & Corcoran, 2015). FST was also generally not significantly correlated with B. The only exception is for the No Migration treatment, where, after many generations, as the average FST becomes very high (FST > 0.5), we observe a slight, yet significant, negative correlation between the expected effects of BGS, B, and FST (intense BGS lead to high values of FST). This highlights that FST is not completely insensitive to BGS, but FST is largely robust to BGS in cases of reasonable levels of gene flow among populations. The observed correlation coefficients are always very small with not a single r2 value greater than 1%. It is important to highlight that B has not been defined in order to estimate the effect of BGS onto FST but only for the effect of BGS on HS in a panmictic population, although Zeng and Corcoran (2015) predict that B can be used to determine the effects of BGS on FST. In this study, we investigated the locus-to-locus variation in the intensity of BGS and how it affects FST. Future research is needed to attempt a theoretical estimate of the genome-wide effect of BGS on FST (Charlesworth et al., 1997; Zeng & Corcoran, 2015). Our work has been restricted to the stickleback and human recombination maps. 
While these two genomes are reasonable representatives of many eukaryotic genomes, they are not good representatives of more compact genomes such as bacterial genomes or yeasts. Our simulations used randomly mating diploid populations. Non-random mating, selfing, and asexual reproduction could also affect our general conclusion, and potentially strongly increase the effects of BGS on FST (Charlesworth et al. 1997). Also, we did not explore the effect of haploid selection as we only considered autosomes (see Charlesworth, 2012). We have explored two population sizes, but we could not explore population sizes of the order of a million individuals (like Drosophila melanogaster) and still realistically simulate such long stretches of DNA. It is possible that a much greater population size or a more complex demography could result in BGS having a greater effect on FST than what we observed here (Torres et al. 2017). In the No Recombination treatment, we have explored cases of complete suppression of recombination over stretches of DNA of, on average, 0.8% of the stickleback genome and our results were still consistent. However, we have not explored the effect of suppression of recombination over greater regions, such as a whole chromosome that does not recombine. We have also not explored such suppression of recombination in perfectly isolated populations as we mainly focused on interconnected populations. It is possible that in such cases we might observe a greater impact of BGS on FST, such as that observed by Zeng & Corcoran (2015) and reproduced in appendix A.2.

Some have argued that, because BGS reduces the within-population diversity, it should lead to high FST (Cutter & Payseur, 2013; Cruickshank & Hahn, 2014; Hoban et al., 2016). All else being equal, this statement is correct. However, in our simulations, BGS reduces HT almost as much as HS (Figure A.1.4).
It is therefore insufficient to consider only one component, and we must consider the ratio of these two quantities captured by the definition of FST, $F_{ST} = (H_T - H_S)/H_T = 1 - H_S/H_T$. This ratio, as we have shown, appears to be relatively robust to BGS. While genome-wide BGS might eventually be strong enough to affect FST values, it appears that locus-to-locus variation in the intensity of BGS is not strong enough to have much impact on FST in outcrossing sexual organisms, at least as long as populations are not too highly diverged.

We also investigated the consequences of BGS on the widely-used but imperfect estimator, FST (average of ratios), for which FST measures for each locus are averaged to create a genomic average. It is well known that FST (average of ratios) is a biased way to average FST over several loci (B S Weir & Cockerham, 1984); however, its usage is relatively common today. In our simulations, FST (average of ratios) is more affected by BGS than FST. Interestingly, FST (average of ratios) is most often higher with weaker BGS. The directionality of this correlation may seem unintuitive at first. To understand this discrepancy, remember that BGS affects the site frequency spectrum; we observed that BGS leads to an excess of loci with low HT (results not shown but see Charlesworth et al., 1995; see also contrary expectation in Stephan, 2010). Loci associated with very low HT also have low FST (Figure A.1.3), a well-known result described by Beaumont and Nichols (1996). As BGS creates an excess of loci with low HT and loci with low HT tend to have low FST, BGS can actually reduce FST (average of ratios). After filtering out SNPs with a minor allele frequency lower than 5%, most of the correlation between FST (average of ratios) and B is eliminated (Figure A.1.2).

The absolute measure of divergence dXY is more sensitive to BGS than FST (Figure 2.2; Tables A.1.2 and A.1.4). Regions of stronger BGS are associated with low dXY.
This is in agreement with correlation tests between B and dXY. The effect, although significant, is of relatively small size. BGS should affect dXY by its effect on the expected heterozygosity, and this effect should be greater early in divergence and with migration, compared to divergence occurring because of new mutations in isolated populations. This is consistent with the results of our simulations. This result is in agreement with Vijay et al. (2017) who reported a strong correlation between HS and dXY when FST is low (FST ≈ 0.02), but this correlation breaks down when studying more distantly related populations (FST ≈ 0.3).

As BGS also leads to a reduction of the number of polymorphic sites, BGS has an even stronger effect on dXY-SNP than on dXY (Figure A.1.2). (The measure that we call dXY-SNP is dXY improperly calculated based only on polymorphic sites, as is done in some software packages.)

Zeng and Corcoran (2015) approximated the effect of background selection on FST accounting for the effect that BGS has on the effective population size ($F_{ST} = \frac{1}{4 N_e m + 1}$). With our simulations, we report only little to no association between this expectation and the actual FST (see Figure A.1.5). BGS is a complex process and we think that reducing its effect on FST to the theoretical expectation of the statistic B is misleading. Below, we discuss some of the reasons.

Because most deleterious mutations are recessive (García-Dorado et al., 2000; Peters et al., 2003; Shaw & Chang, 2006), the offspring of migrants, who enjoy an increased heterozygosity compared to local individuals, will be at a selective advantage. The presence of deleterious mutations therefore leads to an increase in the effective migration rate (Ingvarsson & Whitlock, 2000), which leads to a decrease in FST. In our simulations however, mutational allelic effects are close to additive and hence, we should not expect to see much effect of BGS on the effective migration rate.
Additionally, patches that have a lower average fitness will receive migrants that are comparably fitter and hence will enjoy a higher effective migration rate than other patches. This should also lead to decreased FST.

There is one additional factor that may weaken the connection between the observed FST and the FST predicted from Ne = N B, as used in the Zeng and Corcoran approximation. This approximation implicitly assumes that the effects of new deleterious alleles on the effective population size are immediate. However, in a structured population connected by migration (and when migration is strong relative to the strength of selection), alleles will typically migrate before selection acts to eliminate them. As a result, BGS affects the global genetic variation and the local genetic variation in a similar way. In other words, if alleles migrate before the effects of selection on linked loci are fully realized, then BGS will affect coalescent times of alleles between populations in a parallel way to how it affects coalescent times of alleles in the same population, resulting in a weakened effect on FST compared to that predicted by the Zeng and Corcoran approximation. We anticipate that much stronger selection coupled with weaker migration would show a larger effect on FST, as is in fact the case in the parameters chosen by Zeng and Corcoran in their simulations (see Appendix A.2).

Another reason why one might not observe a correlation between intensity of BGS and FST throughout the genome is that mutation rate is auto-correlated throughout the genome; neutral sequences closely linked to sequences that frequently receive deleterious mutations are also likely to experience frequent neutral mutations. As a high mutation rate leads to low FST values ($F_{ST} \cong \frac{1}{1 + 4N(m + \mu)}$, Wright 1943), autocorrelation in mutation rate may also act to reduce the effect of BGS on FST. This effect is however likely to be negligible as long as m >> µ.
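The approximations discussed above can be put into numbers with a small sketch. It evaluates the expression for B given in section 2.2.5, substitutes Ne = N B into the Zeng and Corcoran expression for FST, and evaluates Wright's expression with a mutation term. The function names and the per-site values of u, s and r are hypothetical; N = 1000 and m = 0.005 are the Default treatment values and B = 0.84 is the mean reported for the stickleback genome.

```python
import math


def predicted_b(u, s, r):
    """B = exp(-sum_i u_i * s_i / (s_i + r_i)^2) over selected sites i."""
    return math.exp(-sum(ui * si / (si + ri) ** 2
                         for ui, si, ri in zip(u, s, r)))


def fst_from_b(n, m, b=1.0):
    """FST = 1 / (4 * Ne * m + 1) with Ne = N * B."""
    return 1.0 / (4.0 * n * b * m + 1.0)


def fst_with_mutation(n, m, mu):
    """FST ~ 1 / (1 + 4 * N * (m + mu))."""
    return 1.0 / (1.0 + 4.0 * n * (m + mu))


# Hypothetical selected sites: 1000 sites with identical parameters.
b_example = predicted_b(u=[1e-6] * 1000, s=[0.01] * 1000, r=[0.01] * 1000)

fst_neutral = fst_from_b(n=1000, m=0.005)        # B = 1, no BGS
fst_bgs = fst_from_b(n=1000, m=0.005, b=0.84)    # mean stickleback B
fst_mu = fst_with_mutation(n=1000, m=0.005, mu=1e-8)  # hypothetical mu

print(b_example, fst_neutral, fst_bgs, fst_mu)
```

Under these expressions a lower B always raises the predicted FST, which is the expectation that the simulations reported here do not support. The mutation term barely changes the prediction when m is several orders of magnitude larger than µ.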
Recently, evidence of a correlation between recombination rate and FST has been interpreted as likely being caused by deleterious mutations rather than positive selection, whether the divergence between populations is very high (e.g. Cruickshank & Hahn, 2014), moderately high (Vijay et al., 2017) or moderately low (Torres et al., 2018). Here we showed that BGS is unlikely to explain all of these correlations between FST and recombination rate, especially in cases with relatively low divergence among populations. As positive selection has been shown to also have an important effect on genetic diversity (Eyre-Walker & Keightley, 2009; Hernandez et al., 2011; Macpherson et al., 2007; Sattath et al., 2011; Wildman et al., 2003), it would be important to investigate whether positive selection (selective sweeps and local adaptation) could be an important driver of the correlations between FST and recombination rate. More research would be needed to investigate whether this is true.

McVicker et al. (2009) attempted an estimation of B values in the human genome (see also Elyashiv et al., 2016). They did so using equations from Nordborg et al. (1996). As there is little knowledge about the strength of selection throughout the genome, to our understanding, this estimation of B values should be highly influenced by the effects of beneficial mutations as well as deleterious mutations. Torres et al. (2018) reused this dataset and found a slight association between B and FST among human lineages. It is plausible that this correlation between B and FST could be driven by positive selection rather than by deleterious mutations; this possibility should be further explored.

Our FDist2 analysis shows that the false positive rate does not differ between simulations with BGS and simulations without BGS (Figure 2.4). The only exceptions concern the unrealistic CNC97 treatment. The average FST at the last generation of the No Migration treatment is greater than 0.8.
With such high FST, the FST outlier method, Fdist2, does not seem to perform well and both the simulations without BGS and with BGS lead to very high false positive rates (0.472 without BGS and 0.467 with BGS; Figure 2.4). Beaumont and Nichols (1996) showed that the Fdist2 procedure is problematic for non-equilibrium populations, especially when FST is greater than 0.5 (see section non-equilibrium populations from their results), especially when the number of subpopulations sampled is low. It therefore sounds plausible that Fdist2 would poorly apply to our No Migration treatment. While other issues may intervene in FST outlier methods (Lotterhos & Whitlock, 2014b), our results show that BGS should not represent any significant issue for outcrossing sexual species with moderately low mean FST. We have shown that BGS affects HS and dXY but has only a very minor effect on FST among sexual outcrossing populations connected by gene flow. Many authors (e.g. Cutter & Payseur, 2013; Hoban et al., 2016) have raised concerns that BGS can strongly reduce our ability to detect the genomic signature of local adaptation. Our analysis shows that BGS is not a strong confounding factor to FST outlier tests. 2.5 Acknowledgments Many thanks to Loren H. Rieseberg, Sarah P. Otto and Amy L. Angert for their help in discussing the design of the project and for feedback. Thanks to Sarah P. Otto, Darren E. Irwin, and Bret A. Payseur as well as two anonymous reviewers for helpful comments on the manuscript. We also thank Yaniv Brandvain and Tom Booker for their feedback and Marius Roesti for help with the Ensembl-retrieved gene annotations. We also want to thank Nick Barton for his help in interpreting why our results differ from Zeng and Corcoran (2015) theoretical expectation. We also acknowledge ComputeCanada that provided the hardware resources for running our simulations. 
2.6 Funding

The work was funded by NSERC Discovery Grant RGPIN-2016-03779 to MCW and by the Swiss National Science Foundation via the fellowship Doc.Mobility P1SKP3_168393 to RMD.

3 Phenotypic plasticity and robustness

3.1 Introduction

By allowing organisms to sustain a high fitness in a range of environments, adaptive phenotypic plasticity would appear to provide a solution to any eco-evolutionary problem. However, plasticity is not ubiquitous, which leads many authors to ask what limits its evolution (Agrawal, 2001; DeWitt et al., 1998a; Moran, 1992; Murren et al., 2015a; Scheiner et al., 1991; Scheiner & Holt, 2012; Swanson & Snell-Rood, 2014; Tienderen, 1991; Tonsor et al., 2013; Van Kleunen & Fischer, 2005). There are a number of possible costs and limits to the evolution of plasticity (DeWitt et al., 1998; Snell-Rood et al., 2010), but current empirical research on the costs of plasticity has been mainly inconclusive; costs have rarely been found and, when found, they are rather mild (Snell-Rood, 2012; Snell-Rood et al., 2010; Van Buskirk & Steiner, 2009; Van Kleunen & Fischer, 2005). One of the most commonly discussed costs is the idea that plasticity generates developmental instability, reducing fitness in comparison to more precise, constitutive expression of a phenotype (DeWitt, 1998; Scheiner et al., 1991; Tonsor et al., 2013).

Plasticity is the ability of a genotype to produce different phenotypes in different environments. Plasticity is a heritable trait (Côté et al., 2007; Landry et al., 2006; Li et al., 2006; Suzuki & Nijhout, 2006) that can have many ecological consequences; plasticity can increase population growth rate (A. R. Hughes et al., 2008; Wennersten & Forsman, 2012), broaden ecological niches (Smith & Skulason, 1996), reduce intraspecific competition (Forsman et al., 2008) and increase invasiveness (Sultan, 2000; Bock et al., 2018). Plasticity also affects population differentiation and the rate of speciation (Gray & McKinnon, 2007; Pfennig et al., 2010). Finally, plasticity can affect adaptive evolution (Price et al., 2003), including local adaptation (Arendt, 2015), response to climate change (Charmantier et al., 2009; Nicotra et al., 2010; Hoffmann & Sgrò, 2011) and adaptation via genetic accommodation (Bock et al., 2018; Schlichting et al., 2006). More consequences of plasticity can be found in the reviews by Forsman et al. (2008) and Wennersten & Forsman (2012). Plasticity is specific to particular phenotypic traits and environments (Wagner, 2005). In this paper we will be focusing on a single phenotypic trait and a specified set of environmental conditions, allowing us to talk about “plastic and non-plastic genotypes” without ambiguity.

Developmental noise (the source of developmental instability), as defined in this paper, refers to the phenotypic variation among clones of a specific genotype developed in the same environment. Developmental robustness is defined as the inverse of developmental noise. Developmental noise is heritable (McAdams & Arkin, 1997; Klingenberg & Nijhout, 1999) and is involved in various processes such as cellular specialization in multicellular organisms (reviewed in Losick & Desplan, 2008) and the penetrance of alleles (the proportion of individuals carrying the allele that also express the trait; reviewed in Chalancon et al., 2012). Importantly, developmental noise creates deviations from the targeted phenotype, which, under stabilizing selection, reduces fitness (Gavrilets & Hastings, 1994). However, developmental instability can also be beneficial as a form of bet-hedging (Mulvey et al., 2016; Seger & Brockmann, 1987; Veening et al., 2008). Evolution of developmental noise can affect the overall rate of adaptation (Wang & Zhang, 2011), as well as networks of gene interactions (Chalancon et al., 2012).
Understanding the evolution of plasticity and developmental noise can therefore influence our thinking about many aspects of evolution and ecology. In our simulation model, developmental noise arises from stochasticity in gene expression, translation, mRNA degradation and protein degradation, which produce substantial variability among unicellular organisms (Chalancon et al., 2012). Noise can also propagate through regulatory interactions among genes (Pedraza & Van Oudenaarden, 2005), and this effect increases with the extent of the regulatory cascade (Hooshangi et al., 2005). Multicellular organisms may have different and additional sources of random phenotypic variability, motivating us to treat phenotypic noise at an abstract level in this paper.

3.1.1 Costs and limits on the evolution of plasticity

The idea that plasticity comes at a cost of developmental instability is one of the most popular constraints discussed in the literature (e.g. Debat & David, 2001; DeWitt, 1998; Scheiner et al., 1991; Snell-Rood et al., 2010; Tonsor et al., 2013; Wilson & Yoshimura, 2008; Yoshimura & Shields, 2018). Because a plastic response might require a complex regulatory mechanism, and because a more complex regulatory cascade is expected to be more stochastic, it follows that plastic traits are likely to be more developmentally noisy than their non-plastic counterparts (see also DeWitt, 1998).

The predicted relationship between plasticity and developmental robustness rests on an intuitive argument that does not distinguish among the diverse developmental pathways by which a plastic response can be implemented (as criticized by Snell-Rood et al., 2010). The simplest developmental mechanism for a plastic response is one where individuals sense an environmental signal and respond to it.
For example, Daphnia can respond to kairomones released by predators such as fish, backswimmers or midge larvae, and they also respond to chemicals released by macerated conspecifics (Laforsch et al., 2006). In the presence of such signals, several species of Daphnia develop a long helmet on their head that protects them against these predators. Other examples of plastic responses involving an environmental signal include the Pennsylvanian meadow vole (coat thickness in the offspring is dependent on the duration of daylight experienced by the mother; Lee & Zucker, 1988) and East African acridoid grasshoppers (melanin deposition in response to fires for camouflage; Rowell, 1972).

An alternative mechanism is what Snell-Rood (2012) calls “developmental selection” (also known as “somatic selection” or “epigenetic selection”; Sachs, 1988; West-Eberhard, 2003). Developmental selection has two components: first, the genotype must produce a range of phenotypes, and second, the genotype must assess the performance of each phenotype and bias subsequent development toward the highest performing phenotypes (Snell-Rood, 2012). The essential distinction between simple plasticity and developmental selection is that the latter involves a “performance signal” that integrates information from both development and the environment (see also Bhalla & Iyengar, 1999; Becskei & Serrano, 2000). We will therefore contrast plasticity in response to an “environmental signal” with plasticity in response to a “performance signal”. While a performance signal as a mechanism for plastic responses has been largely ignored in the evolutionary literature, such mechanisms are very common in nature (Snell-Rood, 2012). In humans, bones respond plastically to impact-loading by increasing their size, mineral content and density (reviewed in Zernicke et al., 2006).
In tennis players, for example, bone mineral content is 13% higher on the dominant arm than on the non-dominant arm (see Ducher et al., 2004 and Sanchis-Moysi et al., 2011 for similar observations in tennis players on gluteal muscles). Osteocytes play a key role in sensing the resistance of the bone to the mechanical stress it receives and communicate this information to osteoclasts and osteoblasts that will, respectively, degrade and synthesize bone tissue (Zernicke et al., 2006). Other tissues show similar plastic responses to a performance signal, such as cell walls in plants (Tiré et al., 1994) and muscles in animals (Hoppeler et al., 2011). Other examples include any trial-and-error learning behaviors in, say, foraging, or for learning associations between cues and rewards (Dukas, 2008; Papaj & Prokopy, 1989), vertebrates’ adaptive immune system (self vs non-self distinction; Litman et al., 1993; Nemazee, 2006), and neuroplasticity in the brain (Luo & O’Leary, 2005; Song & Abbott, 2001). More examples and evidence of the commonality of performance signal mechanisms are reviewed in Snell-Rood (2012).

3.1.2 Hypotheses

Plastic developmental pathways tend to be more complex than non-plastic ones, and more complex developmental pathways tend to be noisier (Hooshangi et al., 2005). Also, plasticity requires sensitivity to the environment, which allows noise in the signal to affect the phenotype. At the same time, feedback loops have been shown to buffer against developmental noise, hence increasing developmental robustness (Becskei & Serrano, 2000; Bhalla & Iyengar, 1999). We hypothesize that plasticity in response to an environmental signal would come at a cost of being developmentally less robust, but this cost would be attenuated or even reversed when the organism evolves plasticity via a performance signal.
Because performance signals might increase developmental robustness (Becskei & Serrano, 2000; Bhalla & Iyengar, 1999), one can expect that a performance signal mechanism could evolve even in a constant environment, in order to increase developmental robustness. Several authors have suggested that organisms using a performance signal are more likely to respond adaptively to a novel environment (Frank, 2011; Hull et al., 2001; Snell-Rood, 2012), and this hypothesis has been discussed thoroughly in Snell-Rood et al. (2018). Hence, we also hypothesize that performance signal use can evolve in a constant environment to increase developmental robustness and create pre-adaptation through a plastic response to novel environments: organisms that have never encountered any environmental heterogeneity might nonetheless be adaptively plastic.

Finally, we also investigate the evolution of mutational robustness (the ability of a genotype to keep producing the same phenotype in the face of new mutations; Wagner, 2005). Several authors have suggested that mutational robustness might evolve as a correlated side effect of the evolution of developmental robustness, a hypothesis known as the congruent hypothesis. If the congruent hypothesis were true, then one would expect a correlation between developmental and mutational robustness. In this chapter, we will also test this prediction.

We investigate these three hypotheses through numerical simulations with a model called ENTWINE (Draghi & Whitlock, 2015), which we modify here. Our goal is to simulate the evolution of networks of genes that are regulated by other genes’ transcription factors. Parameters of the model are inspired by the empirical literature, as we strive to model gene network interactions with as much realism as is computationally tractable.
3.2 Methods

3.2.1 ENTWINE

We simulated the development of single-cell individuals with a network of gene interactions, and we simulated the evolution of this network over time. For computational convenience, the evolutionary algorithm uses a panmictic population of haploid individuals following a Wright-Fisher model (i.e., constant population size, set here to 10,000, non-overlapping generations and selection on fertility). Individuals were hermaphroditic and randomly mating. We measured a single phenotypic trait on which selection is applied.

The phenotype of an individual is determined by simulating its development using a model based on a network of gene interactions. ENTWINE uses a modified Gillespie algorithm (Gillespie, 2007a) to model stochastic reactions of mRNAs and proteins. This framework sets a time-step τ that aims at balancing the computational time against the error produced by the discrete-time approximation.

We modeled three types of genes: regulatory genes, phenotype genes and signal-binding genes. The regulatory genes code for transcription factors that can only regulate the transcription rates of themselves or of other genes. The phenotype genes, like regulatory genes, can regulate other genes but can also directly affect the growth of the measured phenotypic trait. The signal-binding genes are regulatory genes whose protein product can interact with an environmental signal through cooperative binding. The protein complex formed by this interaction between the protein and the environmental signal may then regulate the expression of other genes in the same way as a transcription factor. None of these types of gene can mutate to any other type, but genes may be deleted or may duplicate and diverge. Because a gene needs an enhancer in order to be expressed, nothing can be expressed in the system unless there are some transcription factors present by default.
Consequently, we provide a “basic transcription factor” that is introduced into the system at a constant rate and can eventually start the expression of one or more genes in the system. The effect of a phenotype gene's protein product on the phenotypic trait is represented by its phenotypic effect. This phenotypic effect is always positive. The phenotype starts at zero at the beginning of development and is increased at each time step by τ times the sum, over all phenotype genes, of the product of their protein concentration and their phenotypic effect.

In ENTWINE, a gene is made of two parts: the cis-regulatory region and the coding region. The cis-regulatory region is made of a number of distinct binding sites, set to 20 for the simulations reported here. Each binding site is associated with a cis-regulatory effect and a binding affinity (an integer implicitly representing the number of amino acid mismatches between the protein's binding domain and the binding site). The coding region for transcription factors contains two types of information: the protein’s inherent regulatory effect (a positive or negative value) and which of the possible binding motifs (of the 20 allowed possibilities) is expressed in the protein. There are six different types of mutations: duplications, deletions and four types of mutations that can influence the effect of a given protein on gene expression. The latter can affect 1) the protein's inherent regulatory effect, 2) the binding affinity between the protein and the binding site of the target gene, 3) the binding site targeted by the protein, or 4) the cis-regulatory effect of the binding site. More information on the mutation rates and distribution of effect sizes is given in the supplementary material of Draghi and Whitlock (2015).
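The phenotype accumulation rule described above can be illustrated with a short sketch. This is not the ENTWINE source code: the function name `accumulate_phenotype`, the list-based data layout and the numerical values are assumptions made purely for illustration.

```python
def accumulate_phenotype(concentrations, effects, tau):
    """Return the phenotype at the end of development.

    concentrations: one list per time step, holding the protein
        concentration of each phenotype gene at that step.
    effects: phenotypic effect of each phenotype gene (always positive).
    tau: length of one time step.
    """
    if any(effect <= 0 for effect in effects):
        raise ValueError("phenotypic effects must be positive")
    phenotype = 0.0  # development starts from a phenotype of zero
    for step in concentrations:
        # Sum over phenotype genes of concentration x phenotypic effect.
        growth = sum(c * e for c, e in zip(step, effects))
        phenotype += tau * growth
    return phenotype


# Illustrative values only: two phenotype genes followed over two time steps.
example = accumulate_phenotype([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.0], tau=0.1)
```

Because every phenotypic effect is positive, the phenotype in this sketch can only grow during development, so a genotype can approach a target value only by regulating how much protein its phenotype genes produce.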
By implementing a model of regulation grounded in biophysical mechanisms, we allow genes to respond to external and internal signals and affect other genes’ expression by cis-regulatory interactions. To focus on cis-regulation, we fix the rate of translation and the rates of decay of both mRNA and proteins, for all genes (these constants are inspired from the literature on yeast molecular genetics; Draghi & Whitlock, 2015). Therefore, mutations only affect regulation by altering the rate of transcription of a gene. More information about the model of development and the choice of parameter values can be found in Draghi & Whitlock (2015).

3.2.2 Signals

We brought two major modifications to ENTWINE compared to the previous version (Draghi & Whitlock, 2015) concerning the signals. First, we added the performance signal alongside the existing framework for environmental signal. Second, the transcription factors of both types of signals now exhibit cooperative binding kinetics with their respective signals. This modification is inspired by work suggesting that cooperative binding might be involved in plastic responses (Mayo et al., 2006).

The signals are produced outside the system and are inputted into the system at a given rate. For the environment signal, the rate of input is $P_e$, where $P_e$ is the optimal phenotype in the environment $e$. For the performance signal, the rate of input is $|P_t - P_e|$, where $P_t$ is the current phenotype at time $t$ during development. In both cases, in the system, the signal decays at the same rate as do proteins and is therefore subject to the same stochasticity in its decay. Once in the system, the signal can affect gene expression through cooperative binding with the protein product of a signal-binding gene, creating a heterodimer which acts as a transcription factor. Let $S$ be the signal.
Cooperative binding is modeled with a Hill equation (Hill, 1910), $S_e = \frac{S^n}{K^n + S^n}$, where $n$ is the Hill coefficient, $K$ is the dissociation constant and $S_e$ is the effective signal after cooperative binding has been taken into account. The Hill coefficient $n$ is specific to each gene and takes a random value uniformly distributed between 1 and 5 when genotypes are constructed at the start of the simulation (see section 3.2.4, Starting conditions). The code is available at https://github.com/RemiMattheyDoret/ENTWINE.

3.2.3 Input parameters and treatments

We explored three types of environmental patterns: a constant environment, spatial heterogeneity and temporal heterogeneity. The constant environment treatment serves to study the evolution of plasticity as a correlated side effect of the evolution of developmental robustness. The spatial heterogeneity treatment serves to study whether plasticity comes at a cost of developmental instability and how different signals can affect this cost. The main interest of the temporal heterogeneity treatment is to create selection for bet-hedging (selection for increased developmental noise), which increases the diversity of selection scenarios and so allows a better test of the congruent hypothesis for the evolution of mutational robustness.

In each type of environment, we considered three treatments: Environmental Signal, in which there is an environmental signal that the organism can potentially evolve to sense; Performance Signal, in which there is a performance signal that the organism can potentially evolve to sense; and finally, No Signal, in which no cue is present (and therefore there is no way to evolve plasticity). This 3 × 3 design yields a total of nine treatments, and we performed 200 replicate simulations for each treatment.

We followed the evolution of each population for at least 100,000 generations. Under spatial heterogeneity, half of the simulations of each of the three treatments were randomly chosen and were extended to 200,000 generations.
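As a rough illustration of the signal model of section 3.2.2, the sketch below computes the signal input rates and the effective signal after cooperative binding. It is not taken from the ENTWINE code: the function names are hypothetical, the Hill equation is assumed to take the form S^n / (K^n + S^n), and the stochastic decay of the signal is left out.

```python
def signal_input_rate(kind, optimum, current_phenotype=0.0):
    """Rate at which a signal enters the system.

    kind: "environmental" or "performance" (labels chosen for this sketch).
    optimum: optimal phenotype P_e in the current environment.
    current_phenotype: phenotype P_t reached so far during development.
    """
    if kind == "environmental":
        return optimum
    if kind == "performance":
        # The performance signal measures the remaining distance to the optimum.
        return abs(current_phenotype - optimum)
    raise ValueError("unknown signal type: " + repr(kind))


def effective_signal(signal, hill_coefficient, dissociation_constant):
    """Hill equation, assumed here to be S_e = S^n / (K^n + S^n)."""
    if signal < 0 or dissociation_constant <= 0:
        raise ValueError("signal must be >= 0 and K must be > 0")
    s_n = signal ** hill_coefficient
    return s_n / (dissociation_constant ** hill_coefficient + s_n)
```

Under this assumed form, the effective signal equals one half when the signal concentration equals K, and a larger Hill coefficient makes the response around K more switch-like.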
Gaussian stabilizing selection is applied on a single phenotypic trait. If, at the end of its development, an individual's phenotype is $P$ in an environment $e$ where the optimal phenotype is $P_e$, then the fitness of this individual is $\exp\left[-\frac{(P - P_e)^2}{2\,\omega^2}\right]$, where $\omega$ denotes the width of the fitness function. We consider two environments, a low and a high environment, in which the optimal phenotypes are respectively 1000 and 3000 (the fitness functions are displayed on the right side of figure 3.1). For the constant environment treatment, individuals only experience the low environment. With temporal heterogeneity, the environment alternates at each generation. With spatial heterogeneity, each individual is randomly put in either environment in each generation (resulting in a migration rate of 0.5 among environments). This extreme scenario was chosen in order to create a strong incentive to evolve a plastic response. Because our goal is to establish a proof of concept and not to estimate how different input parameters impact the observed statistics, and also because simulations are computationally very expensive (a single simulation can take up to a month to run), we did not further vary parameters such as the strength of selection or the optimal phenotypes in the two environments.

Note that because the number of protein products produced by each gene is, at each time step, drawn from a Poisson distribution, and because the variance of a Poisson distribution equals its mean, a genotype that produces more protein products ought to be noisier. Consequently, genotypes that produce a large phenotype would tend to be noisier too, which causes non-plastic genotypes to preferentially target the low environment rather than the high environment.

3.2.4 Starting conditions

Each simulation started with a population of cloned haploid individuals with a genome consisting of three regulatory genes and three phenotype genes.
In the Environmental Signal treatment, organisms start with an extra three environmental-signal genes and in the Performance Signal treatment, organisms start with an extra three performance-feedback genes. Because gene duplication and gene deletion are possible, the number of genes in a representative genotype after 100,000 generations of evolution generally differed from the starting number of genes. Every simulation started with a population of clones founded by a unique ancestor. At the beginning of every simulation, before simulating the first generation, potential ancestral genotypes were randomly produced until a genotype with a mean fitness between 0.15 and 0.25 was found. This fitness assay is performed in the low environment. This genotype is then cloned to create one starting population.

3.2.5 Measuring plasticity and robustness

We sampled the most common genotype in the population at the last generation to measure the statistics of interest. For the treatments Environmental Signal and Performance Signal, under spatial heterogeneity, we also sampled other generations during the run to investigate the evolutionary dynamics. Plasticity and developmental robustness were then measured for each sampled genotype. To measure plasticity, we redeveloped each genotype 50,000 times in the low environment and 50,000 times in the high environment. Because the vast majority of genotypes were either very plastic or not plastic at all (figure 3.1), instead of considering plasticity as a quantitative trait, we categorized genotypes as plastic or not plastic: a genotype was considered plastic if the difference between its average phenotypes in the two environments was greater than ΔP = 400 (i.e., greater than 20% of the difference in the optima). We explored different threshold values for ΔP (100, 200 and 800), and the specific threshold did not affect the general conclusions (data not shown).
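The classification rule above can be sketched in a few lines. This is an illustrative reading rather than the analysis code used for the thesis: the function name `is_plastic` is hypothetical, and comparing the absolute difference between the two mean phenotypes against ΔP is an assumption about how the threshold is applied.

```python
from statistics import mean

DELTA_P = 400.0  # threshold on the difference between mean phenotypes


def is_plastic(phenotypes_low, phenotypes_high, threshold=DELTA_P):
    """Classify a genotype as plastic or not plastic.

    phenotypes_low / phenotypes_high: phenotypes realized when the same
        genotype is redeveloped repeatedly in the low and in the high
        environment (50,000 redevelopments each in the text).
    """
    difference = abs(mean(phenotypes_high) - mean(phenotypes_low))
    return difference > threshold


# Illustrative values only: a steep and a flat reaction norm.
steep = is_plastic([990.0, 1010.0], [2950.0, 3050.0])
flat = is_plastic([990.0, 1010.0], [1100.0, 1200.0])
```

Passing another value for `threshold` (for example 100, 200 or 800) reproduces the kind of sensitivity check mentioned in the text.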
To determine the developmental robustness of a genotype, we used the 50,000 redevelopments of this genotype in the low environment described above. We define the developmental noise as the standard deviation in the realized phenotypes. Appendix B.2 explains the methodology used to estimate mutational robustness.

3.3 Results

Figure 3.1 shows the reaction norms for Performance Signal and Environmental Signal under spatial heterogeneity at the end of the simulations. In these treatments and with very few exceptions, genotypes are either very adaptively plastic or not plastic at all. With a single exception, the non-plastic genotypes target the lower environment (the environment in which the initial fitness was required to be between 0.15 and 0.25).

[Figure 3.1. Panels: Environmental Signal and Performance Signal under spatial heterogeneity. x-axis: developmental environment (E0, E1); y-axis: average phenotype, with the optima Zopt,0 and Zopt,1 marked; right panel: fitness (0 to 1).]

Figure 3.1: Reaction norms for the Environmental Signal and Performance Signal treatments under spatial heterogeneity. Each line represents a single genotype (a single simulation) and links the average phenotypes in both environments. Reaction norms that are steep enough to be considered plastic (see Methods) are represented in black; non-plastic genotypes are represented in grey. The horizontal dashed lines represent the optimal phenotypes in each environment. On the right panel are the fitness functions in both environments.

3.3.1 Plasticity and developmental robustness

We hypothesized that plasticity would come at a cost of increased instability when implemented via an environmental signal but that this cost would not exist (or even be reversed) when implemented via a performance signal. Figure 3.2 shows that in the Environmental Signal treatment (Welch t-test: 79 < C < 168, P < 5×10⁻⁷), but not in the Performance Signal treatment (Welch t-test: -53 < C < 108, P ≈ 0.47), non-plastic genotypes are less developmentally noisy than plastic genotypes as defined by figure 3.1. Importantly, plastic genotypes using the performance signal are less developmentally noisy than plastic genotypes using the environmental signal (Welch t-test: 52 < C < 136, P < 3×10⁻⁵). We observe a significant interaction between the signal and whether or not a genotype is plastic after controlling for the main effects of signal and whether a genotype is plastic (OLS regression: P < 0.05). Results for other treatments are in figure B.1.1.

Figure 3.2: Average developmental noise for the Environmental Signal and Performance Signal treatments under spatial heterogeneity, separating the plastic (in black) from the non-plastic genotypes (in grey). See figure B.1.1 for the other treatments. Error bars are standard errors. See figure B.1.2 for the evolution of developmental noise over time for these two treatments.

Developmentally robust genotypes may face constraints preventing the evolution of plasticity (Draghi, 2019). In figure B.1.2, we see that the genotypes that eventually evolve plasticity have lower robustness than those that do not evolve plasticity. This difference among plastic and non-plastic lineages persists during the entire simulation in the Environmental Signal treatment. Therefore, the cost of plasticity (lower fitness caused by extra developmental noise) persists with the Environmental Signal throughout the duration of the simulations. However, with the Performance Signal, the initial difference in robustness is eliminated by the end of the simulations.

3.3.2 Performance signal makes it easier to evolve plasticity?
We hypothesized that a genotype may evolve to use a performance signal even in a constant environment and, as a result, be adaptively plastic in environments that have never been encountered in the evolutionary history of the lineage. This prediction results from the idea that a performance signal can increase developmental robustness and hence can be beneficial even in a constant environment and make the organism plastic as a side effect. However, some genotypes evolving in a constant environment and with an environmental signal evolve some level of plasticity, something we did not expect. Therefore, we want to investigate whether plasticity evolves more often in the Performance Signal treatment than in the Environmental Signal treatment.

Comparing the fractions of simulations that evolve a plastic response in both treatments is made a little more complicated because some of the starting genotypes are already plastic. An organism needs to use some cue in order to initiate development. For the majority of genotypes, this cue is the “basic transcription factor” (see Methods), but some genotypes have used either the performance signal or the environmental signal to initiate development. In fact, in the constant environment, 30% and 37.5% of genotypes started out plastic for the environmental signal and performance signal treatments, respectively. Most starting plastic genotypes lost their plasticity very early on, with only 10% and 9.3% of the genotypes that started plastic remaining plastic during the whole simulation for the environmental signal and performance signal treatments, respectively. With the environmental signal, 2.8% of the genotypes starting non-plastic end up plastic and 18.3% of the genotypes starting plastic end up plastic.
With the performance signal, 41.6% of the genotypes starting non-plastic end up plastic and 32% of the genotypes starting plastic end up plastic. Because including simulations where genotypes started plastic may be misleading, we report below the fraction of genotypes that evolved plasticity including only simulations where starting genotypes were not plastic. However, including them would not change the conclusion. Excluding the simulations where genotypes started plastic, in the constant environment, we observe that 0%, ~3% and ~42% of the genotypes evolved plasticity in the treatments No Signal, Environmental Signal and Performance Signal, respectively (figure 3.3). All pairwise differences are significant (Fisher tests: No Signal vs Environmental Signal: P < 0.05; No Signal vs Performance Signal: P = 2.2×10⁻¹⁶; Environmental Signal vs Performance Signal: P = 6.55×10⁻¹⁶).

Figure 3.3: Fraction of genotypes that evolved plasticity in all treatments. The figure excludes simulations where genotypes started plastic. As expected, in the No Signal treatment, no genotypes ever evolved plasticity. With signals, a higher fraction of genotypes evolved plasticity in the Performance Signal treatment than in the Environmental Signal treatment, even when evolved in a constant environment.

With the exception of the plastic genotypes of the environmental signal treatment in a constant environment, for which we have only very few data points, we systematically observe a positive relationship between mutational noise and developmental noise (Figure 3.4). This positive relationship is small but significant in 10 out of 14 cases, with r² values ranging from ~0.005 to ~0.2 (Table B.1.1).
Interestingly, with both spatial and temporal heterogeneity, mutational noise is affected by whether or not an organism is plastic but not by the signal by which the plastic response is implemented (OLS; spatial heterogeneity: Plasticity, P < 5×10⁻⁵; Signal, P = 0.83; temporal heterogeneity: Plasticity, P < 0.005; Signal, P = 0.62).

Figure 3.4: OLS regressions between mutational noise (log-scale) and developmental noise for all treatments, separating the plastic (in black) from the non-plastic genotypes (in grey). Test statistics can be found in Table B.1.1.

3.4 Discussion

3.4.1 Cost of plasticity

We observe that plasticity comes at a cost of being developmentally noisy, but only when the plastic response is mediated via an environmental signal (Figure 3.2). A performance signal allowed genotypes to evolve plasticity without the cost of being developmentally noisy. However, in spatially heterogeneous environments, we did not find evidence that plastic genotypes using a performance signal are more developmentally robust than non-plastic genotypes. Looking at figure B.1.2, it seems plausible that if we were to prolong the simulations for another 200,000 generations we would see that plastic genotypes using the performance signal would be more robust than the non-plastic genotypes.
Empirical studies investigating the relationship between plasticity and developmental instability have found mixed results (DeWitt, 1998; DeWitt et al., 1998a; Perkins & Jinks, 1973; Scheiner et al., 1991; Tonsor et al., 2013; Van Buskirk & Steiner, 2009; Van Kleunen & Fischer, 2005). We suggest two reasons for this lack of consensus in empirical settings. First, as we have shown, whether plasticity comes at a cost of developmental instability depends upon the developmental mechanism used to implement plasticity. Second, a genotype can only be called “plastic” in regard to a specific phenotypic trait of interest (see also Wagner, 2005). If being plastic for a specific trait correlates with not being plastic for another trait, then attributing any decrease in developmental robustness to this trait will be misleading. Consider body temperature, for example. In order to have an environmentally robust (non-plastic) body temperature, traits like shivering, hair position, vasoconstriction, and other mechanisms of body temperature regulation must respond plastically to temperature variation (Tansey & Johnson, 2015). In this example, searching for a cost of plasticity for body temperature can be misleading when what may be costly are the plastic mechanisms (e.g., shivering, hair position, vasoconstriction) that regulate temperature. Such situations are likely to be common. While this second caveat is presented in the framework of the association between developmental robustness and plasticity, any cost of plasticity is only meaningful when the trait is carefully specified and when keeping in mind that plasticity for this trait may be positively or negatively correlated with plasticity for another trait.
Given that sensing a performance signal has such an advantage over sensing an environmental signal, the question arises of why an organism would ever evolve to use an environmental signal rather than a performance signal. As an example, consider again the Daphnia plastic response to the presence of predators discussed in the introduction. Daphnia cannot determine whether their helmet is long enough to protect them against predators, because a failure to produce a long enough helmet would be fatal. In such a case, Daphnia has no other choice but to use an environmental signal.

Interestingly, in our simulations with temporal heterogeneity, non-plastic individuals are extremely developmentally noisy, an observation congruent with expectations that organisms can use developmental noise as a diversifying bet-hedging strategy (Mulvey et al., 2016; Seger & Brockmann, 1987; Veening et al., 2008). As a consequence, plastic genotypes have a higher developmental robustness on average than non-plastic individuals (figure B.1.1). This result highlights that a genetic correlation between plasticity and developmental robustness is affected not only by the developmental mechanism by which plasticity is implemented but also by the type of environmental heterogeneity (spatial vs temporal heterogeneity) encountered.

3.4.2 Pre-adaptation to heterogeneous environments

Several authors hypothesized that organisms using a performance signal are more likely to respond adaptively to novel environments (Frank, 2011; Hull et al., 2001; Snell-Rood, 2012; Snell-Rood et al., 2018). For this reason, and because a performance signal can increase developmental robustness (figure 3.2), we hypothesized that a performance signal mechanism can evolve even in a constant environment and grant the organism an ability to respond adaptively to novel environments.
In agreement with this hypothesis, we observed that in the constant environment, plasticity evolved in ~42% of simulations in the Performance Signal treatment but in only ~3% of simulations in the Environmental Signal treatment (figure 3.3).

Our hypothesis that a performance signal would help an organism be pre-adapted for a plastic response in a novel environment was motivated by the idea that a performance signal would evolve in a constant environment because it would provide a gain in developmental robustness. It is however puzzling that plastic genotypes using the performance signal are in fact only as robust as non-plastic genotypes in the same treatment (figure B.1.1). We observe that with both a constant environment and spatial heterogeneity, genotypes using an environmental signal are more developmentally noisy than their non-plastic counterparts. This explains the low frequency of plastic genotypes in the constant environment in the Environmental Signal treatment. We observe that with both a constant environment and spatial heterogeneity, genotypes using a performance signal are as developmentally noisy as their non-plastic counterparts. Using an older version of ENTWINE, Draghi & Whitlock (2015) showed that mutations affecting the mean phenotype also affect developmental robustness as a correlated effect. We think that in the constant environment in the Performance Signal treatment, while individual mutations making organisms plastic did not increase developmental robustness on average, they caused little to no reduction in developmental robustness and may have been beneficial as a means to bring the individual’s mean phenotype closer to the optimal phenotype.

Because we frame our work as a proof of concept and also because simulations are computationally very expensive, we did not vary the parameters employed.
Our work therefore does not provide reliable estimates of what ought to be observed in nature but only informs us about the potential mechanisms at play. For example, our work does not provide an estimate of how much fitness can be gained by using a performance signal vs an environmental signal. Some qualitative results may be affected by varying parameters. For example, if the Gaussian selection kernel were set narrower, more genotypes would likely evolve plasticity. Also, if the concentrations of the signals were set without noise (higher reliability of the information; Getty, 1996; DeWitt et al., 1998), it is possible that plastic genotypes would be more robust and, in particular, that plastic genotypes using the performance signal would be more robust than non-plastic genotypes.

We expected that some genotypes in the performance signal treatment would evolve plasticity in the constant environment (figure 3.3). We did not expect, however, that a few genotypes in the environmental signal treatment would also be plastic (figure 3.3), with a positive (adaptive) reaction norm. We think that the details of development in our model occasionally lead to this outcome because of the starting conditions of development. A genotype develops its phenotype starting from zero and, in order to begin its expression, a phenotype gene must be enhanced by some transcription factor. When present and acting as an enhancer, the environmental signal can sometimes be used to initiate development. As a result, some genotypes in the early generations use the environmental signal as an initiator of development, and development therefore proceeds faster when that signal is stronger. 
This attribute is shared in all details by the environmental signal and performance signal genetic architectures; therefore, the large contrast in the rate of plasticity between the environmental signal and performance signal cases is a strong indicator of the greater propensity for the performance signal to evolve plasticity in a constant environment.

Historically, the literature on the evolution of plasticity has focused on the selection pressures acting on the plastic behavior directly and on its associated costs. Indeed, all models for the evolution of adaptive plasticity state that adaptive plasticity can evolve only if populations are exposed to heterogeneous environments and if the different environments select for different phenotypes (see for example Bradshaw, 1965; Gomulkiewicz & Kirkpatrick, 1992; Levins, 1968; Lively, 1986; Moran, 1992; Via & Lande, 1985; reviewed in Ghalambor et al., 2007). In our simulations, however, plasticity mediated by a performance signal regularly evolved in a constant environment, in contrast to the assumptions of other models in the literature. Adaptive plasticity has long been suspected of playing a critical role when a species encounters a novel environment (Amarillo-Suárez & Fox, 2006; Baker, 1974; Baldwin, 1986; Pigliucci & Murren, 2003; Price et al., 2003; Robinson & Dukas, 1999; West-Eberhard, 2003; reviewed in Ghalambor et al., 2007 and Snell-Rood et al., 2018) and has been known to be influential in range shifts and invasiveness (Agrawal, 2001; Sultan, 2000; Wennersten & Forsman, 2012; Bock et al., 2018). Our study shows that, when a performance signal is available, a species can evolve to be adaptively plastic for a novel environment that has never been encountered in the species’ evolutionary history. It remains to be estimated how common such processes are in nature. 
3.4.3 Mutational Robustness

In the literature, there are three main non-exclusive hypotheses for the evolution of mutational robustness: the adaptive hypothesis, the congruent hypothesis and the intrinsic hypothesis (de Visser et al., 2003). The adaptive hypothesis states that mutational robustness evolves directly in response to selection created by the loss of robustness generated by new mutations (Azevedo et al., 2006; Stearns & Kawecki, 1994; Szollosi & Derenyi, 2009a; van Nimwegen et al., 1999; Wagner et al., 1997). The congruent hypothesis states that mutational robustness evolves as a correlated side effect of the evolution of developmental robustness, with evidence found in RNA folding (Ancel & Fontana, 2000), heat-shock proteins (Rutherford, 2000), P-element insertion (Stearns et al., 1995) and patterns of gene expression in Drosophila melanogaster (Rifkin et al., 2005). In contrast to these studies, Dworkin (2005) did not find any such correlation in Drosophila bristles. Finally, the intrinsic hypothesis is an umbrella hypothesis for alternative explanations that create robustness selection via other aspects of organismal function.

In line with predictions of the congruent hypothesis, we systematically observe a positive correlation between developmental robustness and mutational robustness in all treatments, whether we consider plastic or non-plastic genotypes (figure 3.4). It has been argued that a positive correlation between developmental and mutational robustness is not in itself sufficient evidence for the congruent hypothesis, as developmental and mutational robustness are both expected to increase with stronger stabilizing selection (Wagner et al., 1997). This objection does not account for the correlation observed within a treatment, in which each replicate population is subject to the same strength of selection. 
Overall, the level of support for the congruent hypothesis is relatively high (Ancel & Fontana, 2000; Dworkin, 2005; Meiklejohn & Hartl, 2002; Rifkin et al., 2005; Stearns et al., 1995; Szollosi & Derenyi, 2009b), and our findings add further support. The correlations that we observe between mutational and developmental robustness are among populations that have undergone a very similar evolutionary history. In an empirical system, one might compare genotypes of different species that may have experienced a much greater diversity of selection pressures, developmental mechanisms and demographics. This diversity of evolutionary histories among sampled genotypes may affect the observed correlation. The correlation coefficients that we observe are relatively small, ranging from ~0.005 to ~0.2 (Table B.1.1). It is hard to predict whether the correlation would get stronger or weaker in a natural system, where there might be a much greater diversity in evolutionary history among the sampled genotypes.

Interestingly, while the performance signal in our simulations allows plastic organisms to retain high developmental robustness, it does not allow them to retain high mutational robustness. One might have expected that a performance feedback mechanism could buffer against mutational noise too; however, mutations directly affecting the feedback mechanism could have extreme mutational effects, which might offset the advantage of having such a feedback mechanism. If this is true, then we should observe a higher variance among mutational effects for organisms using the performance feedback than for organisms using the environmental signal. Although it is not significant, we observe such a trend in our data (results not shown).

Our results support an interesting role for the evolution of plasticity via performance signals (a.k.a. developmental selection; reviewed in Snell-Rood, 2012). 
However, despite the importance of this mechanism, the vast majority of studies of plasticity consider only environmental signals. There is a need to increase the research effort on the developmental mechanisms of plasticity in the empirical literature, and there is a need for greater consideration of the diversity of developmental pathways of plastic responses in the theoretical literature.

3.5 Acknowledgment

Many thanks to Loren H. Rieseberg, Sarah P. Otto and Amy L. Angert for their help in discussing the design of the project and for feedback. Thanks to Sarah P. Otto for reviewing the manuscript. Thanks to ComputeCanada for offering the necessary computing resources.

3.6 Funding

The work was funded by NSERC Discovery Grant RGPIN-2016-03779 to MCW and by the Swiss National Science Foundation via the fellowship Doc.Mobility P1SKP3_168393 to RMD.

4 SimBit: A Fast, Flexible, Forward-in-time Population Genetic Simulator

4.1 Introduction

As our understanding of ecological and evolutionary processes improves, we study more and more complex processes for which mathematical modelling becomes very tedious if not impossible. For such processes, only numerical simulations allow us to perform realistic modelling. Examples include the simulation of background selection and the evolution of a network of gene interactions presented in chapters 2 and 3 of this thesis. For such complex processes, we rely more and more on individual-based stochastic simulations.

Writing an algorithm to make efficient individual-based simulations is no easy task, and most authors therefore rely on existing, flexible simulation programs. It is often difficult, however, to choose a simulation program. There is no objective way to compare and express how user-friendly a program is. Also, different program packages have drastically different performance for different simulation scenarios. Learning how to use a new program can be a lengthy and difficult task; therefore, many users simply use the program they already know or pick a program that is able to perform the simulations they need without questioning its performance. However, as shown below, even under simple scenarios, a given program can be hundreds or thousands of times slower than another, which drastically affects the feasibility, or level of replication, of a study.

Here, I present SimBit, a general-purpose forward-in-time population genetics simulator written in C++. SimBit has been designed to have high performance for a wide variety of simulation scenarios. SimBit does so by using diverse representations of the genetic architecture for different simulation scenarios. As a user of Nemo (Guillaume & Rougemont, 2006), SFS_CODE (Hernandez, 2008; Hernandez & Uricchio, 2015) and SLiM (Haller et al., 2019; Haller & Messer, 2017b, 2019), I drew on my experience to make SimBit a program that new users can learn quickly. With a simple set of commands that are very flexible, users can quickly simulate a great diversity of scenarios. SimBit can simulate a wide variety of selection scenarios (any selection coefficient at each of three genotypes at a locus, any epistatic interaction with any number of loci, any spatial and temporal changes of selection scenarios, etc.), demographic scenarios (any number of patches with specific migration scenario, hard vs. soft selection, changes in patch size depending on fecundity, exponential vs logistic growth, gametic or zygotic dispersal, etc.) and mating systems (any cloning rate and selfing rate, hermaphrodites or males and females), and it offers a great diversity of tools to manipulate simulations and gather output. SimBit also allows simulation of multiple species with their ecological interactions.  
I am presenting this software because 1) I believe it can become an important simulation platform in the fields of population genetics, conservation genetics and landscape genetics and 2) by presenting and comparing simulation methods and ideas, I hope to bring us closer to the ability to perform realistic simulations of entire genomes. This chapter aims to present the general workings of SimBit and to compare its performance to that of other similar programs. Please see the manual in appendix C.3 for more details.

4.2 User interface

SimBit reads options either directly from the command line or via an input file. An important goal of SimBit is to have a user interface that takes readable input in a very simple format, to give users a good understanding of what they are simulating, and that offers very explicit error messages when the input is nonsensical. SimBit recognizes options because they are preceded by a double dash (`--`). For example, `--patchCapacity unif 1e4` indicates that the carrying capacity is uniform (keyword `unif`) for all patches and is set to 10,000. SimBit also provides a number of macros that are mainly inspired by R functions. For example, to specify the sequences `0 0.1 0.2 0.3 0.4` and `wolf wolf wolf`, one can write `seq 0 0.4 0.1` and `rep wolf 3`, just as `seq(0, 0.4, 0.1)` and `rep("wolf", 3)` create the same sequences in R. In order to be fast and easy to learn, SimBit provides a lot of functionality with a relatively small number of options. It achieves this by making most options generation specific, habitat specific and/or species specific. To specify an option starting at generation 1000, a user would use `@G1000` (or `@G1e3`). To specify an option applying to a species called `wolf`, a user would use `@Swolf`. To specify habitat `3`, a user would use `@H3` (each patch can be assigned to a specific habitat via another option). 
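To make the macro semantics concrete, here is a minimal Python sketch of how `seq` and `rep` tokens could be expanded into the sequences quoted above. It is not SimBit's parser: the function name `expand_macro` and the token handling are assumptions made for illustration, and only the two macros named in the text are covered.

```python
from decimal import Decimal


def expand_macro(tokens):
    """Expand a `seq` or `rep` macro (hypothetical helper, not SimBit code).

    `seq start end step` mimics R's seq(start, end, step);
    `rep value n` mimics R's rep(value, n).
    """
    name, args = tokens[0], tokens[1:]
    if name == "seq":
        start, end, step = (Decimal(a) for a in args)
        if step <= 0:
            raise ValueError("this sketch only handles a positive step")
        values = []
        current = start
        # Decimal arithmetic avoids float drift such as 0.30000000000000004.
        while current <= end:
            values.append(str(current))
            current += step
        return values
    if name == "rep":
        value, count = args[0], int(args[1])
        return [value] * count
    raise ValueError(f"unknown macro: {name}")


print(" ".join(expand_macro("seq 0 0.4 0.1".split())))  # 0 0.1 0.2 0.3 0.4
print(" ".join(expand_macro("rep wolf 3".split())))     # wolf wolf wolf
```

Working on string tokens rather than floats is a design choice of this sketch, so that the expanded values read exactly as a user would have typed them.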
While many elements of the user interface were inspired by SFS_CODE (Hernandez, 2008; Hernandez & Uricchio, 2015), the `@G` syntax was inspired by the ‘@g’ of Nemo (Guillaume & Rougemont, 2006). As an example, the option `--@G0 patchCapacity unif 100 @G5e3 patchCapacity unif 1000` asks for the carrying capacity of all patches to be uniformly set to 100 from generation 0 to generation 4999 and then set to 1000 until the end of the simulation. For the many users who will not want to simulate multiple species or spatial and/or temporal variation, SimBit assumes by default that there is only one species, that there are no temporal changes and that there is only one habitat, so as to hide all the @G, @H and @S complications. Most options come with a diversity of modes of data entry. For example, for the migration scenario, a user can indicate the whole dispersal matrix or can simply specify that they want an island model, a linear stepping stone model or a Gaussian dispersal kernel.

4.3 Demography and species ecologies

In the current version, SimBit assumes non-overlapping generations (although different species can have different generation times), diploidy, and discrete patches (although patches can be made arbitrarily small, essentially mimicking continuous space). Outside of these three assumptions, SimBit can simulate very diverse types of scenarios. SimBit can simulate any number of patches with any migration matrix and carrying capacity, and it can let the patch size deviate from the carrying capacity based on realized fecundity under an exponential or logistic growth model (the growth model can be set for each patch independently; see more on that below). Each patch can be initialized at the desired size and all of the above parameters can vary over time. Dispersal can happen at the gametic or at the zygotic phase and may be a function of the patch mean fitness (hard versus soft selection). 
SimBit can also simulate multiple species and their ecological interactions, as explained below. SimBit can simulate realistic changes in population size in response to patch mean fitnesses. Let us denote the expected number of offspring of species s produced in patch p at time t as $\bar{P}_{t,s,p}$. Let us also denote the patch growth rate $r_{t,s,p} = f \sum_i w_i$ as the product of $f$, the theoretical maximum fecundity of an individual having a fitness of 1.0 (set by the user), and $\sum_i w_i$, the sum of fitnesses in this patch. If the user allows the patch size to vary from the carrying capacity of this species and if, at time t, in patch p, for species s, the carrying capacity is set to $K_{t,s,p}$, then the expected number of offspring produced is

$$\bar{P}_{t,s,p} = r_{t,s,p} N_{t,s,p}$$

for the exponential model and

$$\bar{P}_{t,s,p} = N_{t,s,p} + r_{t,s,p} N_{t,s,p} \left(1 - \frac{N_{t,s,p}}{K_{t,s,p}}\right)$$

for the logistic model, where $N_{t,s,p}$ is the size of patch p of species s at time t. The actual number of offspring produced, $P_{t,s,p}$, can then be set either deterministically ($P_{t,s,p} = \bar{P}_{t,s,p}$) or stochastically ($P_{t,s,p} \sim \mathrm{Poisson}(\bar{P}_{t,s,p})$). With more than one patch, the offspring produced are then spread out through migration. With a single patch (or in the absence of immigration and emigration for patch p), $N_{t+1,s,p}$ is simply set to $P_{t,s,p}$.

Into the above framework, we can add the fact that different species can affect each other through their ecological relationships. This can be achieved through a “competition matrix” that implements a Lotka-Volterra model of competition and/or through an “interaction matrix” that implements a consumer-resource model (or predator-prey model) with a linear rate of resource consumption (introduction to these models in Otto & Day, 2007; discrete-time example of a predator-prey model in Çelik & Duman, 2009). Let $\alpha_{i,s}$ be an element of the “competition matrix” describing the competitive effect of species i on focal species s. 
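Before the interspecific terms are added, the single-species update rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not SimBit's code: the function names are hypothetical, the growth rate r is taken as a given input, negative expectations are truncated at zero (an assumption of this sketch), and the Poisson sampler is a simple textbook method chosen only to keep the example free of dependencies.

```python
import math
import random


def expected_offspring(n, r, k=None, model="logistic"):
    """Expected number of offspring for a patch of size n with growth rate r."""
    if model == "exponential":
        expected = r * n
    elif model == "logistic":
        expected = n + r * n * (1.0 - n / k)
    else:
        raise ValueError(f"unknown growth model: {model}")
    # Assumption of this sketch: a negative expectation means no offspring.
    return max(0.0, expected)


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method.

    The mean is split into chunks of at most 500 because a sum of independent
    Poisson draws is itself Poisson, and exp(-lam) underflows for large lam.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        count, prod = 0, rng.random()
        while prod > threshold:
            count += 1
            prod *= rng.random()
        total += count
    return total


def realized_offspring(n, r, k=None, model="logistic", stochastic=False, rng=None):
    """Deterministic (rounded expectation) or Poisson-distributed offspring number."""
    expected = expected_offspring(n, r, k, model)
    if stochastic:
        return poisson(expected, rng or random.Random())
    return round(expected)


print(expected_offspring(100, 0.5, k=200, model="logistic"))  # 125.0
print(expected_offspring(100, 2.0, model="exponential"))      # 200.0
```

With a single isolated patch, the value returned by `realized_offspring` would simply become the patch size of the next generation.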
The expected number of offspring produced is then given by

$$\bar{P}_{t,s,p} = N_{t,s,p} + r_{t,s,p} N_{t,s,p} \left(1 - \frac{\sum_i \alpha_{i,s} N_{t,i,p}}{K_{t,s,p}}\right).$$

Note that competitive effects can only be set on species and on patches having logistic growth. Let $\beta_{i,s}$ be an element of the “interaction matrix” describing the effect of species i on species s. The interaction effect is added to the expected number of offspring produced:

$$\bar{P}'_{t,s,p} = \bar{P}_{t,s,p} + \sum_i \beta_{i,s}.$$

In this last equation, I assumed that all effects $\beta_{i,s}$ are independent of the patch sizes of both the causal and recipient species, but in practice a user can specify for each $\beta_{i,s}$ whether the effect should be multiplied by the causal species patch size ($N_{t,i,p}$), by the recipient species patch size ($N_{t,s,p}$) or by both. SimBit enforces that all the diagonal values $\alpha_{s,s} = 1.0$ and that all the diagonal values $\beta_{s,s} = 0.0$. SimBit can also allow the patch size to overshoot the carrying capacity $K_{t,s,p}$ up to an arbitrarily large value, allowing for oscillating or chaotic changes in patch sizes.

4.4 Mating system

SimBit can simulate hermaphrodites or males and females with an arbitrary sex ratio. At every reproduction event, an organism is cloned with probability C. If an organism does not clone itself, then it either selfs with probability S or looks for another organism to reproduce with. By default, the cloning rate is set at 0.0 and the selfing rate is set at 1/2N (Wright-Fisher model), but these can be set by the user.

4.5 Types of loci and selection

Different programs use different representations of the genetic variation. For example, Nemo represents an individual’s haplotype with an array in which the nth element indicates the allelic value for the nth locus. In SLiM, each individual’s haplotype is represented with a container of mutations (where each mutation is an object that stores its position and other associated features as attributes). 
In SFS_CODE, a haplotype is represented with a linked list of mutations. These different representations of the genetic variation have important consequences for the performance of software. Nemo’s technique is expected to perform well at high genetic diversity per locus, while SLiM and SFS_CODE are expected to perform better at low genetic diversity per locus. Nemo also has QTLs, and SLiM can mimic QTLs through Eidos (the programming language used to parameterize SLiM simulations). These different representations also have consequences for the flexibility of a program.

SimBit implements five different representations of the genetic variation called T1, T2, T3, T4 and T5. I refer to these representations as types of loci. T1, T4 and T5 represent binary loci, T2 represents blocks that count mutations, and T3 represents QTLs. SimBit has multiple representations for a single concept in order to sustain flexibility and high performance over a wide range of genetic diversity and of simulation scenarios. More information on these five types of representations is given below. Loci of different types are integrated on the same recombination map. The recombination rate can be specified between any pair of adjacent loci (whether the two loci are of the same type or not) with any number of chromosomes. Mutation rates can also be set independently for each locus.

For a number of types of loci (see below), SimBit can make use of an assumption about the selection scenario that can provide a substantial improvement in run time. I call this assumption the “multiplicative fitness” assumption. The multiplicative fitness assumption assumes 1) multiplicative fitness interactions among loci and 2) that the fitnesses of the three possible genotypes at a given locus are 1, $1-s$ and $(1-s)^2$. Unless the exact dominance relationship is of central importance, it is generally recommended to make use of this assumption. 
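The practical consequence of this assumption is that an individual's fitness factorizes into one term per haplotype. The short Python check below illustrates this; the function names are hypothetical, haplotypes are written as lists of 0 (wildtype) and 1 (mutated), and the selection coefficients are arbitrary illustrative values.

```python
import math


def fitness_by_genotype(hap_a, hap_b, s):
    """Multiply per-locus genotype fitnesses 1, 1-s and (1-s)^2 across loci."""
    w = 1.0
    for a, b, s_locus in zip(hap_a, hap_b, s):
        w *= (1.0 - s_locus) ** (a + b)  # a + b = number of mutant copies
    return w


def haplotype_fitness(hap, s):
    """Contribution of one haplotype: product of (1-s) over its mutated loci."""
    w = 1.0
    for allele, s_locus in zip(hap, s):
        if allele:
            w *= 1.0 - s_locus
    return w


s = [0.01, 0.02, 0.05, 0.001]  # illustrative selection coefficients
hap_a = [1, 0, 1, 0]
hap_b = [1, 1, 0, 0]

w_genotype = fitness_by_genotype(hap_a, hap_b, s)
w_product = haplotype_fitness(hap_a, s) * haplotype_fitness(hap_b, s)
print(math.isclose(w_genotype, w_product))  # True
```

Because each haplotype's contribution does not depend on its partner, it can be computed once and reused for as long as that haplotype, or any segment of it, is transmitted unchanged.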
When a user makes this assumption, SimBit partitions a haplotype into blocks and computes the fitness value for each block. If, during reproduction, no recombination event happens within a given block, then SimBit does not need to recompute the fitness for this specific block, as the fitness of the block can simply be multiplied by the fitness of the same block on the other haplotype. By default, SimBit attempts to estimate the optimal size of these blocks, but a user can also explicitly specify the position of each block. This technique yields a substantial performance improvement in terms of CPU time, especially when the recombination rate within blocks is relatively low (see ‘Performance’ section below).

The genetic architecture can be set independently for each species, and all the selection scenarios presented below can be set differentially for each species, habitat and time. By default, all of the patches belong to the same habitat, but a user can assign each patch to a specific habitat, and all the selection pressures described below (including epistasis) can be specified for each habitat independently. Also, selection can be applied on viability and/or on fertility.

4.5.1 T1 loci

T1 loci track binary variables (e.g., mutated vs wildtype). For each haplotype, SimBit stores in memory an array of bits whose length equals the number of T1 loci simulated. The nth bit indicates whether the nth T1 locus of this haplotype is mutated or not. As such, T1 loci are somewhat similar to Nemo’s genetic representation, except that SimBit uses only a single bit per locus (while Nemo uses a byte per locus). T1 loci have high performance for simulations with very high per-locus genetic diversity. Selection scenarios on T1 loci are extremely flexible. A user can set the fitness values of each of the three possible genotypes at each locus, allowing for any kind of dominance scenario including overdominance and underdominance. 
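As an illustration of the one-bit-per-locus layout described above for T1 loci, here is a minimal Python sketch. The class `BitHaplotype` is hypothetical and is not SimBit's data structure; in particular, treating a mutation as a flip of the bit is an assumption of this sketch.

```python
class BitHaplotype:
    """Dense haplotype storing one bit per locus (illustrative only)."""

    def __init__(self, n_loci):
        self.n_loci = n_loci
        # Eight loci are packed into each byte, rounding up.
        self.bits = bytearray((n_loci + 7) // 8)

    def is_mutated(self, locus):
        """Return True if the bit of this locus is set."""
        return bool(self.bits[locus >> 3] & (1 << (locus & 7)))

    def toggle(self, locus):
        """Flip the state of a locus (assumed effect of a mutation here)."""
        self.bits[locus >> 3] ^= 1 << (locus & 7)

    def mutated_loci(self):
        """List the indices of all mutated loci by scanning every locus."""
        return [i for i in range(self.n_loci) if self.is_mutated(i)]


hap = BitHaplotype(10)
hap.toggle(3)
hap.toggle(9)
print(hap.mutated_loci())  # [3, 9]
print(len(hap.bits))       # 2 bytes for 10 loci
```

The memory footprint is fixed by the number of loci and is independent of how many of them are mutated, which is why a dense layout of this kind pays off mainly when a large fraction of loci are polymorphic.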
Any epistatic interactions between any number of loci can also be specified. A user can also use the assumption of “multiplicative fitness” on T1 loci.

4.5.2 T2 loci

T2 loci are meant to represent aggregate blocks of loci, and SimBit counts the number of mutations happening in each block. This type should be used only when 1) the genetic diversity per T2 locus is very high, 2) performance is a major concern, 3) the user is satisfied with the limited selection scenario it can model, and 4) a simple count of the number of mutations happening per T2 locus for each haplotype is a sufficient output. Selection on T2 loci is forced to have a multiplicative effect among haplotypes (therefore T2 loci always use the assumption of “multiplicative fitness”).

4.5.3 T3 loci

T3 loci are quantitative trait loci (QTL) and code for an n-dimensional phenotype. The user can set the phenotypic effect of each T3 locus on each of the n axes of the phenotype, and these phenotypic effects can also depend on the environment in order to simulate a plastic response. A user can also add random developmental noise (drawn from a Gaussian distribution) in the production of a phenotype in order to reduce heritability. For T3 loci, the user can define a fitness landscape, where an individual’s fitness is given by its phenotype.

4.5.4 T4 loci

For T4 loci, SimBit computes the coalescent tree of the population over time and adds the mutations onto the tree when the user asks for output. T4 loci are extremely fast when the recombination rate is low. T4 loci are inspired by Kelleher et al. (2018; implemented in SLiM; Haller et al., 2019). T4 loci are necessarily neutral. Simulations with T4 loci perform best for large numbers of tightly linked loci but are not necessarily faster than T1 or T5 loci for higher recombination rates. 
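The phenotype-to-fitness mapping described for T3 loci in section 4.5.3 can be sketched in Python as below. Several choices here are assumptions made for illustration and are not taken from SimBit: allelic values are integers that act additively, the fitness landscape is Gaussian around a single optimum, and all function and parameter names are hypothetical.

```python
import math
import random


def phenotype(hap_a, hap_b, effects, noise_sd=0.0, rng=None):
    """n-dimensional phenotype from additive allelic values plus optional noise.

    effects[l][d] is the phenotypic effect of locus l on phenotype axis d.
    """
    n_dims = len(effects[0])
    z = [0.0] * n_dims
    for a, b, locus_effects in zip(hap_a, hap_b, effects):
        for d in range(n_dims):
            z[d] += (a + b) * locus_effects[d]
    if noise_sd > 0.0:
        rng = rng or random.Random()
        # Gaussian developmental noise added independently to each axis.
        z = [value + rng.gauss(0.0, noise_sd) for value in z]
    return z


def gaussian_fitness(z, optimum, width):
    """Fitness declines with squared distance from the optimum (assumed shape)."""
    dist2 = sum((value - opt) ** 2 for value, opt in zip(z, optimum))
    return math.exp(-dist2 / (2.0 * width ** 2))


effects = [[0.5, 0.0], [0.0, 2.0]]  # two loci, two phenotype axes
z = phenotype([1, 0], [1, 1], effects)
print(z)                                          # [1.0, 2.0]
print(gaussian_fitness(z, [1.0, 2.0], width=1.0))  # 1.0
```

Raising `noise_sd` makes individuals with identical genotypes differ in phenotype, which is the mechanism by which developmental noise lowers heritability.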
4.5.5 T5 loci

T5 loci are very similar to T1 loci (two simulations with the same random seed differing only by the fact that one uses T1 loci and the other uses T5 loci will produce the same output). For each haplotype, SimBit keeps a dynamic sorted array with the position of each T5 locus that is mutated. As such, T5 loci are somewhat similar to how SLiM keeps track of its genetic architecture. With high genetic diversity, SimBit therefore tracks a lot of mutated loci, while with low genetic diversity, SimBit tracks few mutated loci. For this reason, T5 loci tend to perform better than T1 loci for moderate to low genetic diversity. T5 and T1 loci are meant to be the most used types of loci in SimBit. Behind the scenes, SimBit tracks T5 loci that are under selection and T5 loci that are neutral separately for improved performance. SimBit can also compress the information of T5 loci (the neutral ones and/or the selected ones) in memory. Compression reduces RAM usage by up to a factor of 2 and can increase or decrease CPU time depending on the simulation scenario. By default, SimBit applies this compression to the neutral T5 loci only and only when it is certain that it will improve performance. For advanced users, it is also possible to ask SimBit to invert the meaning of some loci depending on their frequencies. For example, if locus 23 is fixed or quasi-fixed, haplotypes would track this 23rd locus only if they carry the non-mutated allele. With T5 loci, one can specify the fitness values of the heterozygote and double-mutant genotypes only, allowing for all types of dominance including overdominance and underdominance. Just as on T1 loci (and T2 loci), a user can take advantage of the assumption of “multiplicative fitness”.

4.6 Initialization

Several options exist in SimBit to initialize and reset the genome of existing individuals. The patch size as well as the genetic diversity for each locus (except for T4 loci) can be set at initialization. 
A user can then perform any mutation desired at predefined times with the option `--resetGenetics`. To simplify the user interface, SimBit also allows the user to define “individual types” (via the option `--individualTypes`). Those individual types can then be used either to initialize a population or to insert (or replace) new individuals into any patch at arbitrary moments (also via the option `--resetGenetics`). One can, for example, create individual types belonging to large hypothetical patches and simulate immigration from these hypothetical patches by simply introducing these individual types into the focal patch. This speeds up simulations as SimBit does not explicitly simulate these large source patches. It is also possible to start a simulation from the individuals of a previous simulation that have been saved in binary files. Binary files are particularly useful to 1) avoid simulating a burn-in multiple times, 2) resume a simulation from an intermediate timepoint, and 3) save the entire population in a compact format to extract specific summary statistics later on.

4.7 Outputs

Outputs are often very limiting factors for population genetic simulators (Hoban et al., 2012). SimBit can produce 30 different types of outputs (which can be sampled at any number of generations throughout the simulation). These outputs include, but are not limited to, entire genotypes of each individual in the metapopulation, allele frequencies, FST, .vcf files, fitness (specifying fitness for each type of locus), patch sizes, extinction times of the different species, the whole genealogy between two specified generations, and binary files of the entire population (that can be reused for future simulations or simply to extract summary statistics later on). Many of these outputs can be restricted to a specified subset of loci. SimBit can also simulate sequencing errors before producing the outputs to make results easier to compare to empirical data. 
4.8 Program comparison – Performance

It is often hard for a user to know which program to use for a given study. Indeed, few articles compare programs’ features (but see Hoban, 2014, who compares software flexibility), and authors who publish a new program do not always compare its performance to that of other similar programs (but see performance comparisons between SLiM, SFS_CODE and fwdpp in Haller & Messer, 2017).

In this chapter, I compared the performance of SimBit to that of three forward-in-time programs: SFS_CODE (Hernandez, 2008; Hernandez & Uricchio, 2015), SLiM (Haller et al., 2019; Haller & Messer, 2017b, 2019) and Nemo (Guillaume & Rougemont, 2006). I chose these three programs because they are all forward-in-time simulation platforms, they can all simulate selection, they are all popular and they are generally considered high-performance software. Examples of command line input to SimBit for the Wright-Fisher simulations of figures 4.1, 4.2, C.1.1, C.1.2, C.1.3 and C.1.4, as well as for the more complex simulations of figure 4.3, are found in appendix C.2. SimBit contains a number of options that are meant to refine its performance (see section “Performance options” in the manual). In practice though, most users will probably only need to choose the type of loci to simulate, and SimBit will do a decent job of figuring out how best to simulate it. In order to best represent the performance that a new user ought to expect from SimBit, however, all simulation performances (CPU time and memory usage) presented below were obtained with the default parameters of SimBit.

In order to compare program performance, I ran very basic simulations with a single Wright-Fisher population, a uniform mutation rate and a uniform recombination rate. All loci experienced a selection coefficient of s=0.00001 and h=0.5. 
Low selection coefficients were chosen to 1) prevent Nemo from throwing an error stating that it might suffer from round-off errors caused by low mean fitness and 2) reduce the effects of assuming multiplicative fitness among haplotypes on the simulated scenario (fitness differences between simulations that take advantage of the assumption of multiplicative fitness and the ones that do not are of the order of $10^{-11}$). Note that while SimBit can take advantage of this assumption of multiplicative fitness on demand, SFS_CODE is forced to make this assumption and Nemo and SLiM cannot take advantage of this assumption. I varied the mutation rate (taking values $10^{-7}$, $10^{-5}$ and $10^{-3}$ per locus), the recombination rate (taking values 0, $10^{-9}$, $10^{-7}$ and $10^{-5}$ between adjacent loci), the carrying capacity (taking values $10^2$, $10^3$, $10^4$, $10^5$ and $10^6$ diploid individuals), and the number of loci (taking values 6, $6 \times 10^2$, $6 \times 10^4$ and $6 \times 10^6$) in a full factorial design. All simulations ran for 10,000 generations. I ran these simulations with Nemo (version 2.3.46), SLiM (version 3.1), SFS_CODE (version 20150910) and SimBit (version 4.9.11). Because using Nemo’s full potential is not trivial, the Nemo input files used for these benchmarks were created directly by Frederic Guillaume. In order to compare the behaviour of different types of loci and selection scenarios in SimBit, I ran all simulations four times in SimBit, with T1 and T5 types of loci and with and without making use of the assumption of multiplicative fitness among haplotypes. CPU time and peak Resident Set Size (RSS; memory) usage are reported. Simulations that exceeded 10 days (240 hours) of simulation time or 20GB of memory were killed and are reported below with a dot at 240 hours ($8.64 \times 10^5$ seconds in the units used on the figures) and at 20GB ($2 \times 10^7$ kb in the units used on the figures). All these simulations were run on an Intel Xeon X5650 processor and codes were compiled with gcc-4.8.2rev203690. 
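To give a sense of the size of this design, the full factorial grid can be enumerated with a few lines of Python. The variable names are illustrative; the parameter values are the ones listed above.

```python
from itertools import product

mutation_rates = [1e-7, 1e-5, 1e-3]            # per locus
recombination_rates = [0.0, 1e-9, 1e-7, 1e-5]  # between adjacent loci
carrying_capacities = [10**2, 10**3, 10**4, 10**5, 10**6]
numbers_of_loci = [6, 6 * 10**2, 6 * 10**4, 6 * 10**6]

# One scenario per combination of the four factors.
scenarios = list(product(mutation_rates, recombination_rates,
                         carrying_capacities, numbers_of_loci))
print(len(scenarios))  # 240

# SimBit is run four times per scenario: T1 or T5 loci, with or without
# the multiplicative fitness assumption.
simbit_configurations = list(product(["T1", "T5"], [True, False]))
print(len(scenarios) * len(simbit_configurations))  # 960

# The 240-hour time limit expressed in the seconds used on the figures.
time_limit_seconds = 240 * 3600
print(time_limit_seconds)  # 864000
```

Each of the other three programs is run once per scenario, so every panel of the benchmark figures compares programs on exactly the same parameter combination.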
I ensured that the number of SNPs was not significantly different among the four programs for three of the simulation scenarios benchmarked. For brevity, and because changing the recombination rate has very little effect on the results (only SFS_CODE appears to slow down with higher recombination rates), I show only the recombination rate 10^-7 and only the carrying capacities 10^3, 10^4 and 10^5 in the main figures. The other benchmarks are found in the supplementary material. Figure 4.1 compares the CPU time among SimBit simulations (T1 vs. T5, and with vs. without taking advantage of the assumption of multiplicative fitness among haplotypes) for a subset of scenarios. Figures C.1.1 and C.1.2 compare, respectively, the CPU time and the memory usage among SimBit simulations for all scenarios. Figure 4.2 compares CPU time among Nemo, SLiM, SFS_CODE and SimBit for a subset of scenarios. Figures C.1.3 and C.1.4 compare, respectively, the CPU time and the memory usage among Nemo, SLiM, SFS_CODE and SimBit.

Figure 4.1: Comparison of computational time among the four different ways to simulate the same evolutionary scenario using SimBit. Comparisons of memory usage (max Resident Set Size) are found in figure C.1.2. Simulations that exceeded 10 days (240 hours) of simulation time or 20GB of memory were killed and are reported with an empty dot at 240 hours (8.64 × 10^5 seconds). The bold M signifies the usage of the assumption of multiplicative fitness.

As expected, T1 loci perform best at high per-locus genetic diversity, while T5 loci perform best at moderate to low per-locus genetic diversity (figure 4.1). This is because with T5 loci, SimBit tracks only the mutated loci, while with T1 loci, SimBit tracks every locus whether mutated or not (see above section “Representations of the genetic architecture”). Simulations taking advantage of the assumption of multiplicative fitness generally performed better. This advantage decreases as recombination gets higher.
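The trade-off between the two representations can be illustrated with a minimal Python sketch. It is not SimBit's actual implementation, and the class and method names are hypothetical. It only contrasts a dense layout that stores one bit per locus, in the spirit of T1 loci, with a sparse layout that stores the positions of mutated loci only, in the spirit of T5 loci.

```python
from bisect import bisect_left


class DenseHaplotype:
    """T1-like sketch: one bit per locus, mutated or not."""

    def __init__(self, n_loci):
        self.n_loci = n_loci
        self.bits = bytearray((n_loci + 7) // 8)

    def toggle(self, locus):
        # Flip the allele at `locus` (bi-allelic: 0 <-> 1).
        self.bits[locus >> 3] ^= 1 << (locus & 7)

    def is_mutated(self, locus):
        return bool((self.bits[locus >> 3] >> (locus & 7)) & 1)

    def stored_items(self):
        # Storage grows with the number of loci, not with diversity.
        return len(self.bits)


class SparseHaplotype:
    """T5-like sketch: sorted positions of mutated loci only."""

    def __init__(self):
        self.positions = []

    def toggle(self, locus):
        i = bisect_left(self.positions, locus)
        if i < len(self.positions) and self.positions[i] == locus:
            del self.positions[i]  # a back-mutation removes the entry
        else:
            self.positions.insert(i, locus)

    def is_mutated(self, locus):
        i = bisect_left(self.positions, locus)
        return i < len(self.positions) and self.positions[i] == locus

    def stored_items(self):
        # Storage grows with the number of mutated loci.
        return len(self.positions)


n_loci = 6 * 10**4
dense, sparse = DenseHaplotype(n_loci), SparseHaplotype()
for locus in (12, 40_000, 7, 12, 59_999):  # locus 12 mutates, then reverts
    dense.toggle(locus)
    sparse.toggle(locus)
```

With few mutated loci the sparse layout stores far less (3 positions against 7,500 bytes in this toy case), whereas at high per-locus diversity the list of positions can grow larger and become slower to update than the fixed-size bit array.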
For the range of recombination rates explored (up to 10^-5 between adjacent loci), simulations taking advantage of the assumption of multiplicative fitness always vastly outperformed the simulations that did not make this assumption. Recombination rate matters for performance because, as explained in section “Types of loci and selection”, when using the multiplicative fitness assumption SimBit needs to recompute fitness for a fitness block only if a recombination event happens within this block.

[Figure 4.1. Panel columns: µ = 10^-7, 10^-5, 10^-3 (r = 10^-7 throughout); panel rows: N = 10^3, 10^4, 10^5. x-axis: Number of Loci; y-axis: CPU time [seconds]. Legend: SimBit M T1, SimBit T1, SimBit M T5, SimBit T5.]

Figure 4.2: Comparison of computational time among the four different simulation programs Nemo, SFS_CODE, SLiM and SimBit. For SimBit, two lines are displayed showing the best performing between T1 and T5 loci from figure 4.1, once taking advantage of the assumption of multiplicative fitness, once without taking advantage of this assumption. For comparison, SLiM and Nemo are unable to take advantage of this assumption while SFS_CODE is forced to make this assumption. See figure 4.1 for more details.

Comparisons between different programs highlight that there is no one program that always performs best (figure 4.2; figure C.1.3). SFS_CODE’s CPU time and peak RSS increase exponentially with increasing mutation rate and population size (see also simulations performed by Ryan Hernandez on the SFS_CODE website; sfscode.sourceforge.net/SFS_CODE/Performance.htlm). Hence, SFS_CODE performs well for simulations that have very low genetic diversity, but it quickly becomes very slow as genetic diversity increases.
Nemo is most competitive when there is high genetic diversity per locus (high mutation rate and large population size). This was expected because Nemo tracks every single locus for each haplotype whether or not it is mutated. In fact, with high genetic diversity, Nemo often runs in less time than SimBit when SimBit does not take advantage of the multiplicative fitness assumption (the grey dots in figures 4.2 and C.1.3). Nemo never outperformed SimBit in terms of memory usage though (figure C.1.4). SLiM, just like SFS_CODE, performs best at very low genetic diversity. SLiM’s computational time, however, does not increase as steeply as that of SFS_CODE, which makes SLiM fast for a wider range of simulation scenarios. SLiM tends to perform better than SimBit when there is little genetic diversity, while SimBit tends to perform better when there is moderate to high genetic diversity. In general, performance comparisons in terms of memory usage (figures C.1.2, C.1.4) mirror well the performance comparisons in terms of CPU time (figures C.1.1, C.1.3).

[Figure 4.2. Panel columns: µ = 10^-7, 10^-5, 10^-3 (r = 10^-7 throughout); panel rows: N = 10^3, 10^4, 10^5. x-axis: Number of Loci; y-axis: CPU time [seconds]. Legend: Nemo, SFS_CODE, SLiM, SimBit M, SimBit.]

A difference in performance is not just a question of whether a user will have to wait a little longer to get their output; sometimes it is the difference between a research project that is feasible and one that is not. The log scale on figures 4.1 and 4.2 (and supplementary figures) might give the reader a false impression of the importance of an observed difference. As an example, let’s consider the simulation scenario where r=10^-7, N=10^5, µ=10^-7 and 6×10^4 loci.
SimBit (with the multiplicative fitness assumption) runs in ~4.5 hours, while SLiM runs in ~19 hours, Nemo runs in ~73 hours (more than 3 days), and SFS_CODE does not manage to finish within the 10-day limit. Taking the comparison between SLiM and SimBit as a further example, in the simulation scenario where SLiM is comparatively the fastest, SLiM is 8.7 times faster than SimBit; SimBit took 131 seconds while SLiM took only 15 seconds. For the simulation scenario where SimBit is comparatively the fastest, SimBit is at least 446 times faster than SLiM; SimBit took ~32 minutes while SLiM was killed after exceeding the 240-hour walltime limit. These performance differences can be a major determinant of what can be achieved in a research project.

The very simple simulation scenarios benchmarked above might not be representative of what people really want to simulate. I therefore performed further benchmarking by comparing the performance of Nemo, SLiM, SFS_CODE and SimBit for simulations inspired by recent papers. I sampled three papers, one that performed simulations with SFS_CODE (O’Neill et al., 2019), one that performed simulations with Nemo (Gilbert et al., 2017) and one that performed simulations with SLiM (Booker & Keightley, 2018b). To simplify the writing of the commands and make sure that the comparison is fair, I simplified the Booker and Keightley (2018) simulations by assuming a constant mutation rate and recombination rate, and I used a gamma distribution of fitness effects with a mean of 0.05 and an alpha parameter of 0.111. For the Gilbert et al. (2017) paper, the simulations have also been slightly modified from the original. The original paper specified a “breeding kernel” that can only run on a modified version of Nemo that is not directly published on Nemo’s official website. Hence, for the Gilbert et al. (2017) simulation, I removed the breeding_kernel and modified the size of the dispersal kernel appropriately.
For simplicity (because the original input file was 390Mb large), I also used a linear stepping stone model of 8000 patches starting with the 1000 left-most patches at carrying capacity and the others empty. I made sure the expansion speed was similar between the two programs. These simulations were run on an Intel i7-8559u processor, and the code was compiled with clang-800.0.42.1. For fairness, I compared Nemo and SLiM, which cannot take advantage of the assumption of multiplicative fitness, with SimBit not making this assumption, while I compared SFS_CODE, which is forced to make this assumption, with SimBit making this assumption. Finally, I added a benchmark of a simple Wright-Fisher simulation scenario (N=1000, µ=10^-5, 10^6 loci, r=0) without selection. Neutral loci can be tracked through a coalescent tree in both SLiM (through Tree Recording and subsequent analysis in Python) and SimBit (through T4 loci). Such techniques work best at low recombination. I purposely chose an extreme recombination rate of 0 to highlight the potential advantage of such a technique.

SimBit systematically outperforms the software used in the original papers (figure 4.3). Note that Nemo is so slow compared to SimBit because, when using Nemo’s fastest method to conduct simulations (backward migration, as used in the Wright-Fisher simulations discussed above), Nemo falls into an infinite loop when trying to colonize empty patches. Gilbert et al. (2017) hence needed to use forward migration, which is a slower method in Nemo. Otherwise, Nemo would likely perform similarly to SimBit. Finally, for the “Neutral simulation example”, the coalescent tree recording techniques of both SLiM and SimBit vastly outperform more traditional techniques (figure C.1.1). Here, I only considered an extreme recombination rate of zero.
With higher recombination rates, tree recording techniques would become slower, while the runs that did not use a tree recording technique would be little affected. Among the more traditional techniques, SimBit is the fastest.

Figure 4.3: Comparison of CPU time among the four programs to reproduce simulations inspired from three recent papers as well as for a neutral simulation scenario with extreme parameters chosen to highlight the possible advantage of T4 loci (Tree recording). The bold M signifies the use of the assumption of multiplicative fitness.

[Figure 4.3. Four bar-chart panels, y-axis: CPU time [seconds]. Gilbert et al. (2017): Nemo vs. SimBit; O’Neill et al. (2019): SFS_CODE vs. SimBit M; Booker & Keightley (2018): SLiM vs. SimBit; Neutral simulation example: Nemo, SLiM, SLiM tree, SimBit, SimBit T4.]

4.9 Conclusion

There is no perfect way to compare program performance, and one must always be careful when drawing conclusions from such a benchmark. First, I only considered a subset of the parameter space. For example, my benchmark does not include any single-locus simulations, simulations with high selfing rates or with males and females instead of hermaphrodites, or any simulations with a very high recombination rate. More importantly, different programs mean different things by a locus. SFS_CODE simulates triplets of loci as a codon. This means that many mutations that happen in SFS_CODE are synonymous mutations that do not affect fitness. Consequently, the performance comparisons shown here are unfairly favourable to SFS_CODE compared to Nemo, SLiM and SimBit, but it would not be any fairer either to run all SFS_CODE simulations with three times as many loci.
Note that SLiM does not really simulate bi-allelic loci, as mutations can “stack” on a single locus in a pseudo-infinite-allele type of model (see the SLiM manual for more information; http://benhaller.com/slim/SLiM_Manual.pdf), a process for which it is hard to predict how a different program would compare.

As explained above, SimBit contains a number of performance tweaks a user can take advantage of to improve performance beyond the default run mode (compression of T5 data in memory, allowing inversion of the meaning of T5 loci depending on their frequency, turning on/off the swapping of pointers for haplotypes that do not recombine or mutate during reproduction, manually setting the positions of blocks for the multiplicative fitness assumption). However, the above simulations were all performed with SimBit’s default values for these performance tweaks, which is somewhat unfair to SimBit.

SimBit has already been used in a number of projects. It has been used for simulations with very high genome-wide genetic diversity to investigate the effect of background selection over large stretches of DNA (Matthey‐Doret & Whitlock, 2019; second chapter of this thesis). SimBit has also been used for two projects on genetic rescue, one requiring habitat-specific epistatic interactions (Nietlisbach et al., forthcoming) and one requiring complex metapopulation initialization and introduction of predefined individuals during the simulation (Whitlock lab consortium, forthcoming). SimBit is under a permissive free program license and is available at https://github.com/RemiMattheyDoret/SimBit.

Different programs use different simulation methods. A very important factor affecting the performance of a program is how it tracks genetic variation. This is why SimBit allows several types of representation of the genetic architecture, so that a high-performance solution is always available to the user.
I think that further open discussion among scientists about ways to improve software might bring us closer to better software that can, one day maybe, allow us to simulate entire populations with their entire genome in a realistic manner.

4.10 Acknowledgment

Thank you to Michael C. Whitlock for his help through discussions in designing SimBit and for helpful comments on the manuscript. Thank you to Pirmin Nietlisbach for being the main beta tester and for his advice on how to improve the user interface and the manual. Thank you also to Ben Haller and Frédéric Guillaume for their feedback on how to make a fair comparison among programs. Special thanks to Frédéric Guillaume for his help in creating the input files for Nemo and for his feedback about how to display the benchmark results in a way that is fair. Finally, thank you to ComputeCanada for the computational resources used for benchmarking.

4.11 Funding

The work was partially funded by a Swiss National Science Foundation (SNF) Doc.Mobility fellowship P1SKP3_168393 and partially funded by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant RGPIN-2016-03779 to Michael C. Whitlock.

5 Conclusion

Environmental heterogeneity is a vast subject and I have, throughout my PhD thesis, only touched on a few elements of it. Each chapter concerns a very different aspect of environmental heterogeneity and each chapter therefore leads to very different implications.

5.1 Chapter 2

As discussed in chapter 2, there is a widespread concern that background selection would cause strong variation in FST throughout the genome (see discussions in Charlesworth et al., 1997; Cruickshank & Hahn, 2014; Cutter & Payseur, 2013; Hoban et al., 2016; Irwin et al., 2016). We have shown that locus-to-locus variation in the intensity of background selection does not cause much variation in FST. We concluded that background selection is unlikely to be a confounding variable to local adaptation in FST outlier scans.
Cruickshank and Hahn (2014) have suggested that dXY would be less sensitive to background selection than FST. In fact, we have shown that dXY is much more sensitive to locus-to-locus variation in the intensity of background selection than FST. We also brought attention to how other related statistics of divergence are affected by background selection. Some population genetics statistics software average dXY over all polymorphic sites instead of averaging over all sites (polymorphic or not) of the region considered, as initially defined by Nei (1987). We named this statistic dXY-SNP and showed that it is even more sensitive to background selection than dXY. The reason is that dXY-SNP depends upon the number of polymorphic sites, which is itself highly affected by background selection (Kaiser & Charlesworth, 2009).

We considered some of the definitional variants of FST. We considered both Nei (1973)’s definition of FST ($F_{ST} = (\pi_T - \pi_S)/\pi_T$; see Charlesworth, 1998) and Weir & Cockerham (1984)’s definition ($F_{ST} = (\pi_B - \pi_S)/\pi_B$; see Charlesworth, 1998). We also considered two ways to combine FST values among loci: ratio of averages (as first described in Weir and Cockerham, 1984) and average of ratios (as it was done originally). The ratio of averages method is generally considered the “correct” method. As explained in chapter 2, what we called FST by default is Weir and Cockerham’s (1984) estimate with ratio of averages. We showed that using Nei (1973)’s or Weir and Cockerham’s (1984) definition does not much change the estimated effect of background selection on FST. However, averaging FST as an average of ratios makes FST a little more sensitive to background selection. This effect, however, goes in the opposite direction to what many would expect; with the average of ratios method, loci under strong background selection have lower FST values.
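The difference between the two ways of combining FST over loci can be made concrete with a small Python sketch. It uses the form FST = (π_T − π_S)/π_T and two made-up loci; the diversity values are invented for illustration and are not results from chapter 2, and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical per-locus diversities (pi_S within, pi_T total); invented values.
loci = [
    (Fraction(40, 100), Fraction(50, 100)),    # high heterozygosity, FST = 0.2
    (Fraction(98, 10_000), Fraction(1, 100)),  # low heterozygosity, FST = 0.02
]


def fst_ratio_of_averages(loci):
    """Sum numerators and denominators over loci, then take one ratio."""
    numerator = sum(pi_t - pi_s for pi_s, pi_t in loci)
    denominator = sum(pi_t for _, pi_t in loci)
    return numerator / denominator


def fst_average_of_ratios(loci):
    """Compute FST at every locus, then average the per-locus values."""
    per_locus = [(pi_t - pi_s) / pi_t for pi_s, pi_t in loci]
    return sum(per_locus) / len(per_locus)


ratio_of_averages = fst_ratio_of_averages(loci)
average_of_ratios = fst_average_of_ratios(loci)
print(float(ratio_of_averages), float(average_of_ratios))
```

In this toy case the low-heterozygosity locus counts as much as the other locus in the average of ratios but contributes almost nothing to the ratio of averages, so the average of ratios is pulled towards that locus's lower FST.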
This reduction in FST arises because background selection creates an excess of loci with very low heterozygosity (rare alleles, negative Tajima’s D; Charlesworth et al., 1995), loci with low heterozygosity tend to have a lower FST (Beaumont & Nichols, 1996), and the “average of ratios” method puts more weight on loci of low heterozygosity than the “ratio of averages” method.

FST outlier scans are a very common class of methods used for detecting selection (Hoban et al., 2016b). Yet, despite a number of simulation studies that have compared existing methods (De Mita et al., 2013; Lotterhos & Whitlock, 2014a; Narum & Hess, 2011; Pérez-Figueroa et al., 2010; Vilas et al., 2012), we know little about the best practices or about the advantages and disadvantages of the different methods. For example, in a sliding window analysis, to my knowledge, there exists no guideline about the size of the window to consider, or about whether the window size should be defined in terms of the physical map or of the recombination map.

Several recent studies reported a positive correlation between recombination rate and FST (Cruickshank & Hahn, 2014b; Torres et al., 2018; Vijay et al., 2017c). In all these cases, the correlations were mainly assumed to be caused by background selection. In chapter 2, we report that all these correlations are unlikely to be caused by background selection. It is possible that positive selection could cause this pattern through genetic hitchhiking (Maynard-Smith & Haigh, 1974), a hypothesis that would deserve testing. This hypothesis seems to fit well within the general body of literature on the prevalence of selective sweeps; between D. melanogaster and D. simulans, it is estimated that 40% of all fixed substitutions are adaptive (Eyre-Walker & Keightley, 2009) and among the D. simulans lineages, an estimated 13% of all fixed substitutions are adaptive (Sattath et al., 2011).
In Plasmodium falciparum, strong selective sweeps appear to be very common (about one every ten sexual generations; Wootton et al., 2002). Between humans and chimpanzees, 31% of the 97 genes studied show evidence of divergent selection (Wildman et al., 2003; see also Hernandez et al., 2011). In humans, it is believed that selective sweeps play a strong role in reducing diversity along the genome (Munch et al., 2016). Back-of-the-envelope calculations estimate that on average approximately 80 incomplete sweeps are happening, at any point in time, in Drosophila melanogaster (Booker and Whitlock, 2019). Therefore, it appears plausible that positive selection could be the main cause of correlations between recombination rate and FST. There are yet other alternatives that could explain this correlation, for example a positive correlation between the recombination rate and the mutation rate, as the mutation rate affects FST when the migration rate is very low (µ≈m; Whitlock & McCauley, 1999; Wright, 1931).

An interesting follow-up study would consist of investigating genome-wide correlations between statistics of genetic diversity within and between populations and proxies for both purifying and positive selection. The proxies one might want to consider could include 1) conserved region scores, 2) B statistics (statistics expressing the strength of background selection, originally estimated by McVicker et al., 2009 for the human genome) and 3) the fraction of substitutions thought to be driven by selection (estimated from McDonald-Kreitman tests). In 2017, Torres et al. ran a similar study but only considered the B statistics. They interpreted the observed correlations with FST as caused by background selection, even though B statistics are computed from gene density and recombination data and are hence as much a proxy for positive selection as for purifying selection.
They did, however, try to get rid of the problem by only considering sequences that have a high conservation score. This work would help put Torres et al. (2017)’s results in perspective and provide a better picture of the relative roles of positive and purifying selection in explaining locus-to-locus variation in genetic diversity.

As already discussed, in chapter 2 I report that some statistical software programs actually mistake dXY for what we termed dXY-SNP. Furthermore, during the planning of this project, I attempted to run simulations with different software to compare their performance and, while doing so, I stumbled upon several bugs. In chapter 4, I report two bugs, one in SFS_CODE and one in Nemo, but I could have reported at least two more bugs among these two programs. I did not find any bug in SLiM. Of course, no one is safe from bugs and I would be surprised if SimBit did not contain a few of them. Errors in computational methods are common (Wilson et al., 2017) and contribute to the common problems of reproducibility experienced in ecology and evolution (Fidler et al., 2017; Gilbert et al., 2012). The issues I experienced further raised my awareness of the importance of not blindly trusting software but of thoroughly testing that it is bug free and that it performs well for the desired task.

Much of modern statistics has been invented for evolutionary biology and ecology. Our methods of data analysis are complex and, with such complex methods, we can lose the intuition that connects the data to the statistical conclusion. It is therefore very important that we systematically perform advanced and fair testing of the methodologies that we use to analyse our data. In evolutionary biology, we are attempting increasingly complex inferences; inferring what loci are under local selection (methods reviewed in Hoban et al., 2016), past (e.g.
Gattepaille et al., 2016) and current (reviewed in Gilbert and Whitlock, 2015) effective population size, phylogenetic networks (e.g. Huson and Bryant, 2006), migration scenarios (e.g. Petkova et al., 2016), and the age of fixed substitutions (Ormond et al., 2016). I think the complexity of these inferences should call for some humility and some careful consideration of the reach and accuracy of our methods. Systematic testing and comparison of software could be made easier with a repository of population genetics simulations. While not a repository of simulations, stdpopsim (Adrion et al., 2019) is helpful in this domain by easing the reproduction of published works.

Among methodologies that attempt to detect loci under local selection, some methods look at FST peaks (e.g. Fdist2, FLK, BayeScan), some look at genome-environment correlations (e.g. Bayenv2), some look at Extended Haplotype Homozygosity (EHH; e.g. PLINK), some look at linkage disequilibrium (e.g. Wang et al., 2006) or at site frequency spectrum patterns (e.g. SweepFinder). No method combines them all, as developing such a theory would be very difficult. Also, these methods might have a hard time distinguishing global adaptation from local adaptation (Booker et al., 2019). A daring but potentially very rewarding project would consist of producing a very large number of simulations with a great diversity of genetic architectures, mating systems and demographies and then training a machine learning algorithm (e.g. a neural network) through supervised learning to detect where selection is operating in these simulations. Such a project would require substantial computational resources and a powerful simulation platform. To my knowledge, the existing uses of machine learning to detect positive selection were all carried out through unsupervised learning with Hidden Markov Models (Boitard et al., 2012; Boitard et al., 2009; Kern and Haussler, 2010).
5.2 Chapter 3

In chapter 3, we investigate the expectation that a plastic response would come at a cost of developmental instability. As reviewed in the introduction, empirical evidence on the question has been inconclusive. We show that whether or not plasticity comes at a cost of developmental instability depends upon the mechanism by which the plastic response is implemented; plasticity comes at a cost of developmental instability when the plastic response is mediated by an environmental signal but not when it is mediated by a performance signal (aka developmental selection). Further, we argue that the fundamental concept of cost of plasticity might be misleading, as a fitness cost is an individual’s property, while plasticity is the property of a given phenotypic trait. In general, my personal bias is to think that 1) the concepts of costs and limits are not necessarily a useful categorization of constraints to plasticity evolution (or at least that it is not a well-defined categorization) and 2) if I have to use the concepts of costs and limits, I would follow Murren et al. (2015) in thinking that limits probably outweigh costs.

We also show that a population can evolve to listen to a performance signal in a constant environment in order to buffer against developmental noise. We show that organisms of such populations are pre-adapted to a plastic response as they have already “learned” to respond adaptively to a performance signal. The idea that a performance signal can help an organism respond adaptively to a novel environment has been theorized before (Snell-Rood, 2012; Snell-Rood et al., 2018). In chapter 3, we provide the first evidence of it.

We also investigated the evolution of mutational robustness. Several hypotheses exist for the evolution of mutational robustness.
The adaptive hypothesis is the idea that mutational robustness evolves in direct response to selection (Azevedo et al., 2006; Szollosi and Derenyi, 2009; Van Nimwegen et al., 1999; Wagner et al., 1997). The congruent hypothesis is the idea that mutational robustness evolves as a correlated side effect of the evolution of developmental robustness (Ancel & Fontana, 2000; Rifkin et al., 2005; Stearns et al., 1995; Szollosi & Derenyi, 2009). As reviewed in chapter 3, several empirical studies reported a positive correlation between mutational and developmental robustness. Such a correlation is often interpreted as evidence for the congruent hypothesis. However, because the selection pressures for developmental robustness are similar to the ones for mutational robustness, it is possible that such a correlation could be driven by the varying selection pressure acting as a confounding variable (Wagner et al., 1997). In chapter 3, while controlling for selection pressures, we report a positive correlation between developmental robustness and mutational robustness in all but one of the simulation scenarios explored. By doing so, we bring further support for the congruent hypothesis.

Historically, the evolution of plasticity was mainly studied with statistical quantitative genetics models (reviewed in Schlichting, 1986). I personally think that a more mechanistic model of development can be the future of evo-devo studies (see also Geritz and Kisdi, 2012, for a discussion of the advantages of mechanistic models over phenomenological models in ecology and evolution). I hope that the model used in chapter 3 can highlight the potential of such mechanistic models of development to resolve questions in evolutionary biology.

While it was never addressed directly, by talking about the evolution of plasticity and mutational robustness, chapter 3 runs close to the question of evolvability.
Indeed, as reviewed by Pigliucci (2008), the important concepts discussed in the evolvability literature are developmental evolutionary biology (incl. developmental and mutational robustness, facilitated variation among others), plasticity (incl. adaptation to new environments, genetic accommodation among others), niche construction and inclusive inheritance (incl. epigenetics and cultural inheritance among others). The field of evolvability has seen a lot of conceptual issues articles and a lot of reviews but, sadly, relatively few original articles in comparison. Among the first 30 results of a search for “evolvability robustness” on Web of Science (on the 23.01.2020), I found 15 reviews and conceptual issues articles, 1 proceeding article on a theory original article, 8 theory original articles and 6 empirical original articles. Among these 30 articles, original articles therefore represent no more than half of the articles published and empirical original work represents only 20% of the work published. I tend to think that a field of research that produces more reviews than original work is likely to be messy and not well developed.

Pigliucci (2008) wrote “The major point that authors do agree on is that there is some connection between modularity and the genetic architecture and evolvability. But the agreement doesn’t go much further [..]”. Agreeing on the existence of a connection among three vaguely defined concepts does not sound like much of an achievement yet. I personally think that, in the field of evolvability and robustness, we need more original works and fewer reviews or discussions on conceptual issues, semantics and terminology. Studying the role of development in evolution might feel like walking blind in a gigantic world where nothing is defined and everything needs to be discovered, but I think that it has the potential to vastly improve our theory of evolution.
5.3 Chapter 4

We are in an era in which improvements in our simulation software and in computational capabilities have made realistic genome-wide simulations almost possible, but not quite yet. In chapter 4, I contribute to this effort by discussing the existing techniques and by bringing in some new simulation techniques that outperform existing simulation methods for many simulation scenarios. Different programs use different methods to simulate the same scenario. The choice of method has a drastic impact on performance. With SimBit, I created a program that can combine several methods in order to always perform well. On top of that, I introduce a new technique taking advantage of an assumption that fitness effects are multiplicative among haplotypes. I show that this technique can bring about very important performance improvements over any other method tested.

Every piece of software has its advantages and disadvantages. Over the years, I have certainly forged many opinions about SFS_CODE, SLiM and Nemo. I certainly would love to have Nemo’s ability to create outputs together with SLiM’s scriptability and SFS_CODE’s ease of producing the desired simulation. This concept was discussed among Frédéric Guillaume, Ben Haller and me in Portland in 2017, but sadly it is no easy task to merge programs and to enjoy the best of each in a common framework. Nemo was first created with the goal of being a platform in which everyone could come and implement their specific needs (Guillaume, 2006). While the idea is interesting, it never really took off, probably partly because implementing a new module necessarily requires a decent understanding of the rest of the code.

In chapter 3, I am using a model of development through a network of gene interactions. I think this model is very promising and can open the door to a better understanding of the evolution of such networks, of the evolution of dominance and of the evolution of genome complexity.
The current model, ENTWINE, is however not user friendly, does not make ideal use of cache memory (and could therefore be made faster), and is really not easy to work with. An interesting follow-up would be to rewrite this model in SimBit, making it faster and easier to maintain and modify, and giving it flexible and practical commands to control such simulations and to analyse output data. Such a tool could prove very helpful in the future.

Bibliography

Aeschbacher, S., Selby, J. P., Willis, J. H., & Coop, G. (2017). Population-genomic inference of the strength and timing of selection against gene flow. Proceedings of the National Academy of Sciences, 114(27), 7061–7066. https://doi.org/10.1073/pnas.1616755114 Agrawal, A. A. (2001). Phenotypic Plasticity in the Interactions and Evolution of Species. Science, 294(5541), 321–326. https://doi.org/10.1126/science.1060701 Amarillo-Suárez, A. R., & Fox, C. W. (2006). Population differences in host use by a seed-beetle: Local adaptation, phenotypic plasticity and maternal effects. Oecologia, 150(2), 247–258. https://doi.org/10.1007/s00442-006-0516-y Ancel, L. W., & Fontana, W. (2000). Plasticity, Evolvability, and Modularity in RNA. Journal of Experimental Biology, 42. Antonelli, A., & Sanmartín, I. (2011). Why are there so many plant species in the Neotropics? TAXON, 60(2), 403–414. https://doi.org/10.1002/tax.602010 Arendt, J. D. (2015). Effects of dispersal plasticity on population divergence and speciation. Heredity, 115(4), 306–311. https://doi.org/10.1038/hdy.2015.21 Atkins, K. E., & Travis, J. M. J. (2010). Local adaptation and the evolution of species’ ranges under climate change. Journal of Theoretical Biology, 266(3), 449–457. https://doi.org/10.1016/j.jtbi.2010.07.014 Azevedo, R. B. R., Lohaus, R., Srinivasan, S., Dang, K. K., & Burch, C. L. (2006). Sexual reproduction selects for robustness and negative epistasis in artificial gene networks. Nature, 440(7080), 87–90. 
https://doi.org/10.1038/nature04488 Baker, H. G. (1974). The Evolution of Weeds. Annual Review of Ecology and Systematics, 5(1), 1–24. https://doi.org/10.1146/annurev.es.05.110174.000245 Baldwin, J. M. (1986). A New Factor in Evolution. The American Naturalist, 441–451. Bank, C., Ewing, G. B., Ferrer-Admettla, A., Foll, M., & Jensen, J. D. (2014). Thinking too positive? Revisiting current methods of population genetic selection inference. Trends in Genetics, 30(12), 540–546. https://doi.org/10.1016/j.tig.2014.09.010 Bartlett, M. S. (1957). Measles Periodicity and Community Size. Journal of the Royal Statistical Society. Series A (General), 120(1), 48. https://doi.org/10.2307/2342553 	 65	Beaumont, M. A., & Nichols, R. A. (1996). Evaluating loci for use in the genetics analysis of population structure. Proceedings of the Royal Society B: Biological Sciences, 263, 1619–1626. Beaumont, M., & Nichols, R. (1996). Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society of London. Series B: Biological Sciences, 263(1377), 1619–1626. https://doi.org/10.1098/rspb.1996.0237 Becks, L., & Agrawal, A. F. (2010). Higher rates of sex evolve in spatially heterogeneous environments. Nature, 468(7320), 89–92. https://doi.org/10.1038/nature09449 Becskei, A, & Serrano, L. (2000). Engineering stability in gene networks by autoregulation. Nature, 405(June), 590–593. https://doi.org/10.1038/35014651 Becskei, Attila, & Serrano, L. (2000). Engineering stability in gene networks by autoregulation. Nature, 405(6786), 590–593. https://doi.org/10.1038/35014651 Begun, D. J., & Aquadro, C. F. (1992). Levels of naturally occuring DNA polymorphism correlate with recombination rate in D. melanogaster. Nature, 356. Bhalla, U. S., & Iyengar, R. (1999). Emergent Properties of Networks of Biological Signalling Pathways. Science, 283, 381–387. 	 66	Billiard, S., & Lenormand, T. (2005). EVOLUTION OF MIGRATION UNDER KIN SELECTION AND LOCAL ADAPTATION. 
Evolution, 59(1), 13–23. https://doi.org/10.1111/j.0014-3820.2005.tb00890.x Blanquart, F., & Gandon, S. (2014). ON THE EVOLUTION OF MIGRATION IN HETEROGENEOUS ENVIRONMENTS: ON THE EVOLUTION OF MIGRATION. Evolution, 68(6), 1617–1628. https://doi.org/10.1111/evo.12389 Blanquart, F., Kaltz, O., Nuismer, S. L., & Gandon, S. (2013). A practical guide to measuring local adaptation. Ecology Letters, 16(9), 1195–1205. https://doi.org/10.1111/ele.12150 Bock, D. G., Kantar, M. B., Caseys, C., Matthey-Doret, R., & Rieseberg, L. H. (2018). Evolution of invasiveness by genetic accommodation. Nature Ecology & Evolution, 604, 1–29. https://doi.org/10.1038/s41559-018-0553-z Bonhomme, M., Chevalet, C., Servin, B., Boitard, S., Abdallah, J., Blott, S., & SanCristobal, M. (2010). Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended. Genetics, 186(1), 241–262. https://doi.org/10.1534/genetics.110.117275 Bonhomme, Maxime, Chevalet, C., Servin, B., Boitard, S., Abdallah, J., Blott, S., & SanCristobal, M. (2010). Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended. Genetics, 186, 241–262. 	 67	Boogert, N. J., Paterson, D. M., & Laland, K. N. (2006). The Implications of Niche Construction and Ecosystem Engineering for Conservation Biology. BioScience, 56(7), 570. https://doi.org/10.1641/0006-3568(2006)56[570:TIONCA]2.0.CO;2 Booker, T. R., & Keightley, P. D. (2018a). Understanding the factors that shape patterns of nucleotide diversity in the house mouse genome. BioRxiv, 35(12), 2971–2988. https://doi.org/10.1101/275610 Booker, T. R., & Keightley, P. D. (2018b). Understanding the Factors That Shape Patterns of Nucleotide Diversity in the House Mouse Genome. Molecular Ecology, 18. https://doi.org/10.1093 Booker, T. R., Yeaman, S., & Whitlock, M. C. (2019). Global adaptation confounds the search for local adaptation [Preprint]. Evolutionary Biology. https://doi.org/10.1101/742247 Boyd, P. W., Cornwall, C. E., Davison, A., Doney, S. 
C., Fourquez, M., Hurd, C. L., Lima, I. D., & McMinn, A. (2016). Biological responses to environmental heterogeneity under future ocean conditions. Global Change Biology, 22(8), 2633–2650. https://doi.org/10.1111/gcb.13287 Brachet, S., Olivieri, I., Godelle, B., Klein, E., Frascaria-Lacoste, N., & Gouyon, P.-H. (1999). Dispersal and Metapopulation Viability in a Heterogeneous Landscape. 	 68	Journal of Theoretical Biology, 198(4), 479–495. https://doi.org/10.1006/jtbi.1999.0926 Bradshaw, A. D. (1965). Evolutionary Significance of Phenotypic Plasticity in Plants. In Advances in Genetics (Vol. 13, pp. 115–155). Elsevier. https://doi.org/10.1016/S0065-2660(08)60048-6 Brandon, R. N. (1995). Adaptation and Environment: Princeton University Press. https://doi.org/10.1515/9781400860661 Brookes, J. I., & Rochette, R. (2007). Mechanism of a plastic phenotypic response: Predator-induced shell thickening in the intertidal gastropod Littorina obtusata. Journal of Evolutionary Biology, 20(3), 1015–1027. https://doi.org/10.1111/j.1420-9101.2007.01299.x Brousseau, L., Postolache, D., Lascoux, M., Drouzas, A. D., Källman, T., Leonarduzzi, C., Liepelt, S., Piotti, A., Popescu, F., Roschanski, A. M., Zhelev, P., Fady, B., & Vendramin, G. G. (2016). Local adaptation in European firs assessed through extensive sampling across altitudinal gradients in southern Europe. PLoS ONE, 11(7), e0158216. https://doi.org/10.1371/journal.pone.0158216 Cai, J. J., Macpherson, J. M., Sella, G., & Petrov, D. A. (2009). Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genetics, 5(1), e1000336. https://doi.org/10.1371/journal.pgen.1000336 	 69	Çelik, C., & Duman, O. (2009). Allee effect in a discrete-time predator–prey system. Chaos, Solitons & Fractals, 40(4), 1956–1962. https://doi.org/10.1016/j.chaos.2007.09.077 Chalancon, G., Ravarani, C. N. J., Balaji, S., Martinez-Arias, A., Aravind, L., Jothi, R., & Babu, M. M. (2012). 
Interplay between gene expression noise and regulatory network architecture. Trends in Genetics, 28(5), 221–232. https://doi.org/10.1016/j.tig.2012.01.006 Charlesworth, B. (1998). Measures of divergence between populations and the effect of forces that reduce variability. Molecular Biology and Evolution, 15(5), 538–543. https://doi.org/10.1093/oxfordjournals.molbev.a025953 Charlesworth, B., Morgan, M. T., & Charlesworth, D. (1993). The effect of deleterious mutations on neutral molecular variation. Genetics, 134(4), 1289–1303. Charlesworth, B, Nordborg, M., & Charlesworth, D. (1997). The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetical Research, 70(2), 155–174. https://doi.org/10.1017/S0016672397002954 Charlesworth, Brian. (1998). Measures of divergence between populations and the effect of forces that reduce variability. Molecular Biology and Evolution, 15(5), 538–543. https://doi.org/10.1093/oxfordjournals.molbev.a025953 	 70	Charlesworth, Brian. (2012). The role of background selection in shaping patterns of molecular evolution and variation: Evidence from variability on the Drosophila X chromosome. Genetics, 191(1), 233–246. https://doi.org/10.1534/genetics.111.138073 Charlesworth, Brian, Nordborg, M., & Charlesworth, D. (1997). The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetical Research, 70(2), 155–174. https://doi.org/10.1017/S0016672397002954 Charlesworth, D., Charlesworth, B., & Morgan, M. T. (1995a). The pattern of neutral molecular variation under the background selection model. Genetics, 141(4), 1619–1632. Charlesworth, D., Charlesworth, B., & Morgan, M. T. (1995b). The Pattern of Neutral Molecular Variation Under the Background Selection Model. Genetics, 14. Charmantier, A., Mccleery, R. H., Cole, L. R., Perrins, C., Kruuk, L. E. 
B., & Sheldon, B. C. (2009). Adaptive Phenotypic Plasticity in Response to Climate Change in a. 800(2008), 800–804. https://doi.org/10.1126/science.1157174 Chen, H., Patterson, N., & Reich, D. E. (2010). Population differentiation as a test for selective sweeps. Genome Research, 20(3), 393–402. https://doi.org/10.1101/gr.100545.109 	 71	Colosimo, P. F., Peichel, C. L., Nereng, K., Blackman, B. K., Shapiro, M. D., Schluter, D., & Kingsley, D. M. (2004). The Genetic Architecture of Parallel Armor Plate Reduction in Threespine Sticklebacks. PLoS Biology, 2(5), e109. https://doi.org/10.1371/journal.pbio.0020109 Côté, G., Perry, G., Blier, P., & Bernatchez, L. (2007). The influence of gene-environment interactions on GHR and IGF-1 expression and their association with growth in brook charr, Salvelinus fontinalis (Mitchill). BMC Genetics, 8, 1–13. https://doi.org/10.1186/1471-2156-8-87 Cruickshank, T. E., & Hahn, M. W. (2014a). Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Molecular Ecology, 23(13), 3133–3157. https://doi.org/10.1111/mec.12796 Cruickshank, T. E., & Hahn, M. W. (2014b). Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Molecular Ecology, 23(13), 3133–3157. https://doi.org/10.1111/mec.12796 Currie, D. J. (1991). Energy and Large-Scale Patterns of Animal- and Plant-Species Richness. The American Naturalist, 137(1), 27–49. https://doi.org/10.1086/285144 	 72	Cutter, A. D., & Payseur, B. A. (2013a). Genomic signatures of selection at linked sites: Unifying the disparity among species. Nature Publishing Group, 14(4), 262–274. https://doi.org/10.1038/nrg3425 Cutter, A. D., & Payseur, B. A. (2013b). Genomic signatures of selection at linked sites: Unifying the disparity among species. Nature Reviews Genetics, 14(4), 262–274. https://doi.org/10.1038/nrg3425 Dayhoff, M. O., & Ledley, R. S. (1962). 
Comprotein: A computer program to aid primary protein structure determination. Proceedings of the December 4-6, 1962, Fall Joint Computer Conference on - AFIPS ’62 (Fall), 262–274. https://doi.org/10.1145/1461518.1461546 De Mita, M. S. (2012). EggLib: Processing, analysis and simulation tools for population genetics and genomics. BMC Genetics, 13(1), 27. https://doi.org/10.1186/1471-2156-13-27 De Mita, S., Thuillet, A.-C., Gay, L., Ahmadi, N., Manel, S., Ronfort, J., & Vigouroux, Y. (2013). Detecting selection along environmental gradients: Analysis of eight methods and their effectiveness for outbreeding and selfing populations. Molecular Ecology, 22(5), 1383–1399. https://doi.org/10.1111/mec.12182 	 73	de Villemereuil, P., & Gaggiotti, O. E. (2015). A new F ST -based method to uncover local adaptation using environmental variables. Methods in Ecology and Evolution, 6(11), 1248–1258. https://doi.org/10.1111/2041-210X.12418 de Visser, J. A. G. M. de, Hermisson, J., Wagner, G. P., Meyers, L. A., Bagheri-Chaichian, H., Blanchard, J. L., Chao, L., Cheverud, J. M., Elena, S. F., Fontana, W., Gibson, G., Hansen, T. F., Krakauer, D., Lewontin, R. C., Ofria, C., Rice, S. H., Dassow, G. von, Wagner, A., & Whitlock, M. C. (2003). Perspective: Evolution and Detection of Genetic Robustness. Evolution, 57(9), 1959–1972. DeAngelis, D. L., & Waterhouse, J. C. (1987). Equilibrium and Nonequilibrium Concepts in Ecological Models. Ecological Monographs, 57(1), 1–21. https://doi.org/10.2307/1942636 DeAngelis, Donald L., & Mooij, W. M. (2005). Individual-Based Modeling of Ecological and Evolutionary Processes. Annual Review of Ecology, Evolution, and Systematics, 36(1), 147–168. https://doi.org/10.1146/annurev.ecolsys.36.102003.152644 DeAngelis, Donald L., & Volker Grimm. (2014). Individual-based models in ecology after four decades. F1000Prime Reports, 6. https://doi.org/10.12703/P6-39 Debat, V., & David, P. (2001). 
Mapping phenotypes: Canalization , plasticity and developmental stability. 16(10), 555–561. 	 74	DeWitt, T. J. (1998). Costs and limits of phenotypic plasticity: Tests with morphology and life history in a freshwater snail. Journal of Evolutionary Biology, 11(4), 465–480. DeWitt, T. J., Sih, A., & Wilson, D. S. (1998a). Costs and limits of phenotypic plasticity. Trends in Ecology and Evolution, 13(2), 77–81. https://doi.org/10.1016/S0169-5347(97)01274-3 DeWitt, T. J., Sih, A., & Wilson, D. S. (1998b). Costs and limits of phenotypic plasticity. Trends in Ecology & Evolution, 13(2), 77–81. https://doi.org/10.1016/S0169-5347(97)01274-3 Draghi, J. A., Parsons, T. L., Wagner, G. P., & Plotkin, J. B. (2010). Mutational robustness can facilitate adaptation. Nature, 463(7279), 353–355. https://doi.org/10.1038/nature08694 Draghi, J., & Whitlock, M. C. (2015). Overdominance interacts with linkage to determine the rate of adaptation to a new optimum. Journal of Evolutionary Biology, 28(1), 95–104. https://doi.org/10.1111/jeb.12547 Draghi, Jeremy. (2019). Phenotypic variability can promote the evolution of adaptive plasticity by reducing the stringency of natural selection. Journal of Evolutionary Biology, 32(11), 1274–1289. https://doi.org/10.1111/jeb.13527 	 75	Draghi, Jeremy, & Whitlock, M. (2015). Robustness to noise in gene expression evolves despite epistatic constraints in a model of gene networks. Evolution, 69(9), 2345–2358. https://doi.org/10.1111/evo.12732 Ducher, G., Prouteau, S., Courteix, D., & Benhamou, C.-L. (2004). Cortical and Trabecular Bone at the Forearm Show Different Adaptation Patterns in Response to Tennis Playing. Journal of Clinical Densitometry, 7(4), 399–405. https://doi.org/10.1385/JCD:7:4:399 Dukas, R. (2008). Evolutionary Biology of Insect Learning. Annual Review of Entomology, 53(1), 145–160. https://doi.org/10.1146/annurev.ento.53.103106.093343 Dutoit, L., Vijay, N., Mugal, C. F., Bossu, C. M., Burri, R., Wolf, J., & Ellegren, H. (2017). 
Covariation in levels of nucleotide diversity in homologous regions of the avian genome long after completion of lineage sorting. Proceedings of the Royal Society B: Biological Sciences, 284(1849), 20162756. https://doi.org/10.1098/rspb.2016.2756 Dworkin, I. (2005). A study of canalization and developmental stability in the sternopleural bristle systems of Drosophila melanogaster. Evolution, 59(7), 1500–1509. 	 76	Eckert, A. J., van Heerwaarden, J., Wegrzyn, J. L., Nelson, C. D., Ross-Ibarra, J., González-Martínez, S. C., & Neale, David. B. (2010). Patterns of Population Structure and Environmental Associations to Aridity Across the Range of Loblolly Pine ( Pinus taeda L., Pinaceae). Genetics, 185(3), 969–982. https://doi.org/10.1534/genetics.110.115543 Elyashiv, E., Sattath, S., Hu, T. T., Strutsovsky, A., McVicker, G., Andolfatto, P., Coop, G., & Sella, G. (2016). A Genomic map of the effects of linked selection in Drosophila. PLoS Genetics, 12(8), e1006130. https://doi.org/10.1371/journal.pgen.1006130 Excoffier, L., Hofer, T., & Foll, M. (2009). Detecting loci under selection in a hierarchically structured population. Heredity, 103(4), 285–298. https://doi.org/10.1038/hdy.2009.74 Eyre-Walker, A., & Keightley, P. D. (2009). Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Molecular Biology and Evolution, 26(9), 2097–2108. https://doi.org/10.1093/molbev/msp119 Fisher, R. A. (1950). Gene frequencies in a cline determined by selection and diffusion. Biometrics, 6, 353–361. 	 77	Fjeldså, J., Bowie, R. C. K., & Rahbek, C. (2012). The Role of Mountain Ranges in the Diversification of Birds. Annual Review of Ecology, Evolution, and Systematics, 43(1), 249–265. https://doi.org/10.1146/annurev-ecolsys-102710-145113 Foll, M., & Gaggiotti, O. (2008). A genome-scan method to identify selectedl loci appropriate for both dominant and codominant markers: A Bayesian perspective. 
Genetics, 180(2), 977–993. https://doi.org/10.1534/genetics.108.092221 Foll, Matthieu, & Gaggiotti, O. (2008). A Genome-Scan Method to Identify Selected Loci Appropriate for Both Dominant and Codominant Markers: A Bayesian Perspective. Genetics, 180(2), 977–993. https://doi.org/10.1534/genetics.108.092221 Forsman, A. (2015). Rethinking phenotypic plasticity and its consequences for individuals, populations and species. Heredity, 115(4), 276–284. https://doi.org/10.1038/hdy.2014.92 Forsman, Anders, Ahnesiö, J., Caesar, S., & Karlsson, M. (2008). A Model of Ecological and Evolutionary Consequences of Color Polymorphism. Ecology, 89(1), 34–40. http://www.jstor.org/stable/27651505 Frank, S. A. (2011). Natural selection. II. Developmental variability and evolutionary rate*: Developmental variability. Journal of Evolutionary Biology, 24(11), 2310–2320. https://doi.org/10.1111/j.1420-9101.2011.02373.x García-Dorado, A., Caballero, A., Bateman, A. J., Bregliano, J.-C., Laurencon, A., Degroote, F., Caballero, A., Keightley, P. D., Caballero, A., Keightley, P. D., Turelli, M., Cockerham, C. C., Mukai, T., Fry, J. D., Keightley, P. D., Heinsohn, S. L., Nuzhdin, S. V., García-Dorado, A., García-Dorado, A., … Crow, J. F. (2000). On the average coefficient of dominance of deleterious spontaneous mutations. Genetics, 155(4), 1991–2001. https://doi.org/10.1080/09553005914550241 Gavrilets, S., & Hastings, A. (1994). A Quantitative-Genetic Model for Selection on Developmental Noise. Evolution, 48(5), 1478. https://doi.org/10.2307/2410242 Gavrilets, S., & Scheiner, S. M. (1993a). The genetics of phenotypic plasticity. V. Evolution of reaction norm shape. Journal of Evolutionary Biology, 6(1), 31–48. 
https://doi.org/10.1046/j.1420-9101.1993.6010031.x Gavrilets, S., & Scheiner, S. M. (1993b). The genetics of phenotypic plasticity. VI. Theoretical predictions for directional selection. Journal of Evolutionary Biology, 6(1), 49–68. https://doi.org/10.1046/j.1420-9101.1993.6010049.x 	 79	Getty, T. (1996). The Maintenance of Phenotypic Plasticity as a Signal Detection Problem. The American Naturalist, 148(2), 378–385. https://doi.org/10.1086/285930 Ghalambor, C. K., McKay, J. K., Carroll, S. P., & Reznick, D. N. (2007). Adaptive versus non-adaptive phenotypic plasticity and the potential for contemporary adaptation in new environments. Functional Ecology, 21(3), 394–407. https://doi.org/10.1111/j.1365-2435.2007.01283.x Gierer, A., & Meinhardt, H. (1972). A theory of biological pattern formation. Kybernetik, 12(1), 30–39. https://doi.org/10.1007/BF00289234 Gilbert, K. J., Sharp, N. P., Angert, A. L., Conte, G. L., Draghi, J. A., Guillaume, F., Hargreaves, A. L., Matthey-Doret, R., & Whitlock, M. C. (2017a). Local adaptation interacts with expansion load during range expansion: Maladaptation reduces expansion load. The American Naturalist, 189(4), 368–380. https://doi.org/10.1086/690673 Gilbert, K. J., Sharp, N. P., Angert, A. L., Conte, G. L., Draghi, J. A., Guillaume, F., Hargreaves, A. L., Matthey-Doret, R., & Whitlock, M. C. (2017b). Local Adaptation Interacts with Expansion Load during Range Expansion: Maladaptation Reduces Expansion Load. The American Naturalist, 189(4), 368–380. https://doi.org/10.1086/690673 	 80	Gillespie, D. T. (2007a). Stochastic Simulation of Chemical Kinetics. Annual Review of Physical Chemistry, 58(1), 35–55. https://doi.org/10.1146/annurev.physchem.58.032806.104637 Gillespie, D. T. (2007b). Stochastic Simulation of Chemical Kinetics. Annual Review of Physical Chemistry, 58(1), 35–55. https://doi.org/10.1146/annurev.physchem.58.032806.104637 Gillespie, J. H., & Turelli, M. (1989). 
Genotype-Envrionment Interactions and the Maintenance of Polygenic Variation. Genetics, 121, 129–158. Gomulkiewicz, R., & Kirkpatrick, M. (1992). QUANTITATIVE GENETICS AND THE EVOLUTION OF REACTION NORMS. Evolution, 46(2), 390–411. https://doi.org/10.1111/j.1558-5646.1992.tb02047.x Gray, S. M., & McKinnon, J. S. (2007). Linking color polymorphism maintenance and speciation. Trends in Ecology and Evolution, 22(2), 71–79. https://doi.org/10.1016/j.tree.2006.10.005 Grimm, V. (1999). Ten years of individual-based modelling in ecology: What have we learned and what could we learn in the future? Ecological Modelling, 115(2–3), 129–148. https://doi.org/10.1016/S0304-3800(98)00188-4 	 81	Guillaume, F., & Rougemont, J. (2006). Nemo: An evolutionary and population genetics programming framework. Bioinformatics, 22(20), 2556–2557. https://doi.org/10.1093/bioinformatics/btl415 Guillaume, Frédéric, & Rougemont, J. (2006). Nemo: An evolutionary and population genetics programming framework. Bioinformatics, 22(20), 2556–2557. https://doi.org/10.1093/bioinformatics/btl415 Günther, T., & Coop, G. (2013). Robust Identification of Local Adaptation from Allele Frequencies. Genetics, 195(1), 205–220. https://doi.org/10.1534/genetics.113.152462 Haller, B. C., Galloway, J., Kelleher, J., Messer, P. W., & Ralph, P. L. (2019). Tree‐sequence recording in SLiM opens new horizons for forward‐time simulation of whole genomes. Molecular Ecology Resources, 19(2), 552–566. https://doi.org/10.1111/1755-0998.12968 Haller, B. C., & Messer, P. W. (2017a). SLiM 2: Flexible, interactive forward genetic simulations. Molecular Biology and Evolution, 34(1), 230–240. https://doi.org/10.1093/molbev/msw211 Haller, B. C., & Messer, P. W. (2017b). SLiM 2: Flexible, Interactive Forward Genetic Simulations. Molecular Biology and Evolution, 34(1), 230–240. https://doi.org/10.1093/molbev/msw211 	 82	Haller, B. C., & Messer, P. W. (2019). SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. 
Molecular Biology and Evolution, 36(3), 632–637. https://doi.org/10.1093/molbev/msy228 Hansen, T. F. (2006). The Evolution of Genetic Architecture. Annual Review of Ecology, Evolution, and Systematics, 37(1), 123–157. https://doi.org/10.1146/annurev.ecolsys.37.091305.110224 Hanski, I., & Ovaskainen, O. (2003). Metapopulation theory for fragmented landscapes. Theoretical Population Biology, 64(1), 119–127. https://doi.org/10.1016/S0040-5809(03)00022-4 Hassell, M. P., & Varley, G. C. (1969). New Inductive Population Model for Insect Parasites and its Bearing on Biological Control. Nature, 223(5211), 1133–1137. https://doi.org/10.1038/2231133a0 Hedrick, P. W., Ginevan, M. E., & Ewing, E. P. (1976). Genetic Polymorphism in Heterogeneous Environments. Annual Review of Ecology and Systematics, 7(1), 1–32. https://doi.org/10.1146/annurev.es.07.110176.000245 Heino, J., Melo, A. S., & Bini, L. M. (2015). Reconceptualising the beta diversity-environmental heterogeneity relationship in running water systems. Freshwater Biology, 60(2), 223–235. https://doi.org/10.1111/fwb.12502 	 83	Hemmer-Hansen, J., Nielsen, E. E., Therkildsen, N. O., Taylor, M. I., Ogden, R., Geffen, A. J., Bekkevold, D., Helyar, S., Pampoulie, C., Johansen, T., FishPopTrace Consortium, & Carvalho, G. R. (2013). A genomic island linked to ecotype divergence in Atlantic cod. Molecular Ecology, 22(10), 2653–2667. https://doi.org/10.1111/mec.12284 Hereford, J. (2009). A Quantitative Survey of Local Adaptation and Fitness Trade‐Offs. The American Naturalist, 173(5), 579–588. https://doi.org/10.1086/597611 Hernandez, R. D. (2008). A flexible forward simulator for populations subject to selection and demography. Bioinformatics, 24(23), 2786–2787. https://doi.org/10.1093/bioinformatics/btn522 Hernandez, Ryan D, Kelley, J. L., Elyashiv, E., Melton, S. C., & Auton, A. (2011). Classic Selective Sweeps were rare in Recent Human Evolution. Science, 331(6019), 920–924. Hernandez, Ryan D., & Uricchio, L. H. (2015). 
SFS_CODE: More Efficient and Flexible Forward Simulations [Preprint]. Bioinformatics. https://doi.org/10.1101/025064 Hewitt, J. E., Thrush, S. F., Dayton, P. K., & Bonsdorff, E. (2007). The Effect of Spatial and Temporal Heterogeneity on the Design and Analysis of Empirical Studies of Scale‐Dependent Systems. 11. 	 84	Hill, V. A. (1910). The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves. The Journal of Physiology, 40, 4–7. Hoban, S. (2014). An overview of the utility of population simulation software in molecular ecology. Molecular Ecology, 23(10), 2383–2401. https://doi.org/10.1111/mec.12741 Hoban, S., Bertorelle, G., & Gaggiotti, O. E. (2012). Computer simulations: Tools for population and evolutionary genetics. Nature Reviews Genetics, 13(2), 110–122. https://doi.org/10.1038/nrg3130 Hoban, S., Kelley, J. L., Lotterhos, K. E., Antolin, M. F., Bradburd, G., Lowry, D. B., Poss, M. L., Reed, L. K., Storfer, A., & Whitlock, M. C. (2016a). Finding the genomic basis of local adaptation: Pitfalls, practical solutions, and future directions. The American Naturalist, 188(4), 379–397. https://doi.org/10.1086/688018 Hoban, S., Kelley, J. L., Lotterhos, K. E., Antolin, M. F., Bradburd, G., Lowry, D. B., Poss, M. L., Reed, L. K., Storfer, A., & Whitlock, M. C. (2016b). Finding the Genomic Basis of Local Adaptation: Pitfalls, Practical Solutions, and Future Directions. The American Naturalist, 188(4), 379–397. https://doi.org/10.1086/688018 	 85	Hoekstra, H. E., Hirschmann, R. J., Bundey, R. A., Insel, P. A., & Crossland, J. P. (2006). A Single Amino Acid Mutation Contributes to Adaptive Beach Mouse Color Pattern. Science, 313(5783), 101–104. https://doi.org/10.1126/science.1126121 Hoffmann, A. a, & Sgrò, C. M. (2011). Climate change and evolutionary adaptation. Nature, 470(7335), 479–485. https://doi.org/10.1038/nature09670 Hooshangi, S., Thiberge, S., & Weiss, R. (2005). 
Ultrasensitivity and noise propagation in a synthetic transcriptional cascade. Proceedings of the National Academy of Sciences, 102(10), 3581–3586. https://doi.org/10.1073/pnas.0408507102 Hoppeler, H., Baum, O., Lurman, G., & Mueller, M. (2011). Molecular Mechanisms of Muscle Plasticity with Exercise. Comprehensive Physiology, 1(July), 1383–1412. https://doi.org/10.1002/cphy.c100042 Hortal, J., Triantis, K. A., Meiri, S., Thébault, E., & Sfenthourakis, S. (2009). Island Species Richness Increases with Habitat Diversity. The American Naturalist, 174(6), E205–E217. https://doi.org/10.1086/645085 Huang, Y., Tran, I., & Agrawal, A. F. (2016). Does Genetic Variation Maintained by Environmental Heterogeneity Facilitate Adaptation to Novel Selection? The American Naturalist, 188(1), 27–37. https://doi.org/10.1086/686889 	 86	Huber, C. D., Degiorgio, M., Hellmann, I., & Nielsen, R. (2016). Detecting recent selective sweeps while controlling for mutation rate and background selection. Molecular Ecology, 25, 142–156. https://doi.org/10.1111/mec.13351 Huber, P. J. (1964). Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73–101. https://doi.org/10.1214/aoms/1177703732 Hudson, R. R., & Kaplan, N. L. (1995). Deleterious background selection with recombination. Genetics, 141(4), 1605–1617. Hughes, A. R., Inouye, B. D., Johnson, M. T. J., Underwood, N., & Vellend, M. (2008). Ecological consequences of genetic diversity. Ecology Letters, 11(6), 609–623. https://doi.org/10.1111/j.1461-0248.2008.01179.x Hughes, C., & Eastwood, R. (2006). Island radiation on a continental scale: Exceptional rates of plant diversification after uplift of the Andes. Proceedings of the National Academy of Sciences, 103(27), 10334–10339. https://doi.org/10.1073/pnas.0601928103 Hull, D. L., Langman, R. E., & Glenn, S. S. (2001). A general account of selection: Biology, immunology, and behavior. Behavioral and Brain Sciences, 24(3), 511–528. 
https://doi.org/10.1017/S0140525X01004162 	 87	Ingvarsson, P. K., & Whitlock, M. C. (2000). Heterosis increases the effective migration rate. Proceedings of the Royal Society B: Biological Sciences, 267(1450), 1321–1326. https://doi.org/10.1098/rspb.2000.1145 Irwin, D. E., Alcaide, M., Delmore, K. E., Irwin, J. H., & Owens, G. L. (2016). Recurrent selection explains parallel evolution of genomic regions of high relative but low absolute differentiation in a ring species. Molecular Ecology, 25(18), 4488–4507. https://doi.org/10.1111/mec.13792 Kaiser, V. B., & Charlesworth, B. (2009). The effects of deleterious mutations on evolution in non-recombining genomes. Trends in Genetics, 25(1), 9–12. https://doi.org/10.1016/j.tig.2008.10.009 Kallimanis, A. S., Mazaris, A. D., Tzanopoulos, J., Halley, J. M., Pantis, J. D., & Sgardelis, S. P. (2008). How does habitat diversity affect the species–area relationship? Global Ecology and Biogeography, 17(4), 532–538. https://doi.org/10.1111/j.1466-8238.2008.00393.x Kawecki, T. J., & Ebert, D. (2004). Conceptual issues in local adaptation. Ecology Letters, 7(12), 1225–1241. https://doi.org/10.1111/j.1461-0248.2004.00684.x Keightley, P. D. (2012). Rates and fitness consequences of new mutations in humans. Genetics, 190(2), 295–304. https://doi.org/10.1534/genetics.111.134668 	 88	Keightley, P. D., & Gaffney, D. J. (2003). Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents. Proceedings of the National Academy of Sciences, 100(23), 13402–13406. https://doi.org/10.1073/pnas.2233252100 Kelleher, J., Thornton, K. R., Ashander, J., & Ralph, P. L. (2018). Efficient pedigree recording for fast population genetics simulation. PLOS Computational Biology, 14(11), e1006581. https://doi.org/10.1371/journal.pcbi.1006581 Keller, A., Rödel, M.-O., Linsenmair, K. E., & Grafe, T. U. (2009). The importance of environmental heterogeneity for species diversity and assemblage structure in Bornean stream frogs. 
Journal of Animal Ecology, 78(2), 305–314. https://doi.org/10.1111/j.1365-2656.2008.01457.x Kim, Y., & Stephan, W. (2000). Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics, 155(3), 1415–1427. Klingenberg, C. P., & Nijhout, H. F. (1999). Genetics of Fluctuating Asymmetry: A Developmental Model of Developmental Instability. Evolution, 53(2), 358–375. Laforsch, C., Beccara, L., & Tollrian, R. (2006). Inducible defenses: The relevance of chemical alarm cues in Daphnia. Limnology and Oceanography, 51(3), 1466–1472. https://doi.org/10.4319/lo.2006.51.3.1466 	 89	Lande, R., & Arnold, S. J. (1983). The Measurement of Selection on Correlated Characters. Evolution, 37(6), 1210. https://doi.org/10.2307/2408842 Landry, C. R., Oh, J., Hartl, D. L., & Cavalieri, D. (2006). Genome-wide scan reveals that genetic variation for transcriptional plasticity in yeast is biased towards multi-copy and dispensable genes. Gene, 366(2), 343–351. https://doi.org/10.1016/j.gene.2005.10.042 Lange, J. D., & Pool, J. E. (2016). A haplotype method detects diverse scenarios of local adaptation from genomic sequence variation. Molecular Ecology, 25(13), 3081–3100. https://doi.org/10.1111/mec.13671 Lee, T. M., & Zucker, I. (1988). Vole infant development is influenced perinatally by maternal photoperiodic history. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology, 255(5), R831–R838. https://doi.org/10.1152/ajpregu.1988.255.5.R831 Lenormand, T. (2002). Gene flow and the limits to natural selection. Trends in Ecology & Evolution, 17(4), 183–189. https://doi.org/10.1016/S0169-5347(02)02497-7 Leòn, J. A. (1993). Plasticity in Fluctuating Environments. In Adaptation in Stochastic Envrionments (pp. 105–121). https://link.springer.com/content/pdf/10.1007%2F978-3-642-51483-8.pdf 	 90	Levins, R. (1968). Evolution in Changing Environments. Princeton University Press. Lewontin, R. C., & Krakauer, J. (1973). 
Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics, 74, 175–195.
Li, Y., Álvarez, O. A., Gutteling, E. W., Tijsterman, M., Fu, J., Riksen, J. A. G., Hazendonk, E., Prins, P., Plasterk, R. H. A., Jansen, R. C., Breitling, R., & Kammenga, J. E. (2006). Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genetics, 2(12), 2155–2161. https://doi.org/10.1371/journal.pgen.0020222
Linhart, Y. B., & Grant, M. C. (1996). Evolutionary Significance of Local Genetic Differentiation in Plants. Annual Review of Ecology and Systematics, 2, 237–277.
Litman, G. W., Rast, J. P., Shamblott, M. J., Haire, R. N., Hulst, M., Roess, W., Litman, R. T., Hinds-Frey, K. R., Zilch, A., & Amemiya, C. T. (1993). Phylogenetic diversification of immunoglobulin genes and the antibody repertoire. Molecular Biology and Evolution, February. https://doi.org/10.1093/oxfordjournals.molbev.a040000
Lively, C. M. (1986). Canalization Versus Developmental Conversion in a Spatially Variable Environment. The American Naturalist, 128(4), 561–572. https://doi.org/10.1086/284588
Losick, R., & Desplan, C. (2008). Stochasticity and Cell Fate. Science, 320(April), 65–69.
Lotterhos, K. E., & Whitlock, M. C. (2014a). Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Molecular Ecology, 23(9), 2178–2192. https://doi.org/10.1111/mec.12725
Lotterhos, K. E., & Whitlock, M. C. (2014b). Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Molecular Ecology, 23(9), 2178–2192. https://doi.org/10.1111/mec.12725
Lotterhos, K. E., & Whitlock, M. C. (2015). The relative power of genome scans to detect local adaptation depends on sampling design and statistical method. Molecular Ecology, 24(5), 1031–1046. https://doi.org/10.1111/mec.13100
Luo, L., & O’Leary, D. D. M. (2005).
Axon Retraction and Degeneration in Development and Disease. Annual Review of Neuroscience, 28(1), 127–156. https://doi.org/10.1146/annurev.neuro.28.061604.135632
Macpherson, J. M., Sella, G., Davis, J. C., & Petrov, D. A. (2007). Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics, 177(4), 2083–2099. https://doi.org/10.1534/genetics.107.080226
Masel, J., & Siegal, M. L. (2009). Robustness: Mechanisms and consequences. Trends in Genetics, 25(9), 395–403. https://doi.org/10.1016/j.tig.2009.07.005
Masel, J., & Trotter, M. V. (2010). Robustness and Evolvability. Trends in Genetics, 26(9), 406–414. https://doi.org/10.1016/j.tig.2010.06.002
Matthews, B., De Meester, L., Jones, C. G., Ibelings, B. W., Bouma, T. J., Nuutinen, V., van de Koppel, J., & Odling-Smee, J. (2014). Under niche construction: An operational bridge between ecology, evolution, and ecosystem science. Ecological Monographs, 84(2), 245–263. https://doi.org/10.1890/13-0953.1
Matthey‐Doret, R., & Whitlock, M. C. (2019). Background selection and FST: Consequences for detecting local adaptation. Molecular Ecology, 28(17), 3902–3914. https://doi.org/10.1111/mec.15197
Maynard-Smith, J., & Haigh, J. (1974). The hitch-hiking effect of a favourable gene. Genetical Research, 23(1), 23–35. https://doi.org/10.1017/S0016672300014634
Maynard-Smith, J., & Haigh, J. (1974). The hitch-hiking effect of a favourable gene. Genetical Research, 13.
Mayo, A. E., Setty, Y., Shavit, S., Zaslaver, A., & Alon, U. (2006). Plasticity of the cis-regulatory input function of a gene. PLoS Biology, 4(4), 555–561. https://doi.org/10.1371/journal.pbio.0040045
McAdams, H. H., & Arkin, A. (1997). Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences of the United States of America, 94(3), 814–819. https://doi.org/10.1073/pnas.94.3.814
McGee, M. D., Neches, R. Y., & Seehausen, O. (2015).
Evaluating genomic divergence and parallelism in replicate ecomorphs from young and old cichlid adaptive radiations. Molecular Ecology, 25(1), 260–268. https://doi.org/10.1111/mec.13463
McVicker, G., Gordon, D., Davis, C., & Green, P. (2009). Widespread genomic signatures of natural selection in hominid evolution. PLoS Genetics, 5(5), e1000471. https://doi.org/10.1371/journal.pgen.1000471
Meiklejohn, C. D., & Hartl, D. L. (2002). A single mode of canalization. Trends in Ecology & Evolution, 17(10), 468–473.
Messer, P. W., & Petrov, D. A. (2013). Frequent adaptation and the McDonald-Kreitman test. Proceedings of the National Academy of Sciences of the United States of America, 110(21), 8615–8620. https://doi.org/10.1073/pnas.1220835110
Molina-Montenegro, M. A., & Naya, D. E. (2012). Latitudinal Patterns in Phenotypic Plasticity and Fitness-Related Traits: Assessing the Climatic Variability Hypothesis (CVH) with an Invasive Plant Species. PLoS ONE, 7(10), e47620. https://doi.org/10.1371/journal.pone.0047620
Montville, R., Froissart, R., Remold, S. K., Tenaillon, O., & Turner, P. E. (2005). Evolution of Mutational Robustness in an RNA Virus. PLoS Biology, 3(11), e381. https://doi.org/10.1371/journal.pbio.0030381
Moran, N. A. (1992). The Evolutionary Maintenance of Alternative Phenotypes. The American Naturalist, 139(5), 971–989.
Mulvey, M., Aho, J. M., & Rhodes, O. E., Jr. (2016). Parasitism and White-Tailed Deer: Timing and Components of Female Reproduction. Oikos, 70(2), 177–182.
Munch, K., Nam, K., Schierup, M. H., & Mailund, T. (2016). Selective sweeps across twenty millions years of primate evolution. Molecular Biology and Evolution, 33(12), 3065–3074. https://doi.org/10.1093/molbev/msw199
Murren, C. J., Auld, J. R., Callahan, H., Ghalambor, C. K., Handelsman, C. A., Heskel, M. A., Kingsolver, J.
G., Maclean, H. J., Masel, J., Maughan, H., Pfennig, D. W., Relyea, R. A., Seiter, S., Snell-Rood, E., Steiner, U. K., & Schlichting, C. D. (2015a). Constraints on the evolution of phenotypic plasticity: Limits and costs of phenotype and plasticity. Heredity, 115(November 2014), 1–9. https://doi.org/10.1038/hdy.2015.8
Murren, C. J., Auld, J. R., Callahan, H., Ghalambor, C. K., Handelsman, C. A., Heskel, M. A., Kingsolver, J. G., Maclean, H. J., Masel, J., Maughan, H., Pfennig, D. W., Relyea, R. A., Seiter, S., Snell-Rood, E., Steiner, U. K., & Schlichting, C. D. (2015b). Constraints on the evolution of phenotypic plasticity: Limits and costs of phenotype and plasticity. Heredity, 115(4), 293–301. https://doi.org/10.1038/hdy.2015.8
Nachman, M. W., & Crowell, S. L. (2000). Estimate of the mutation rate per nucleotide in humans. Genetics, 156(1), 297–304.
Nachman, M. W., & Payseur, B. A. (2012). Recombination rate variation and speciation: Theoretical predictions and empirical results from rabbits and mice. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1587), 409–421. https://doi.org/10.1098/rstb.2011.0249
Naiman, R. J., Johnston, C. A., & Kelley, J. C. (1988). Alteration of North American Streams by Beaver. BioScience, 38(11), 753–762. https://doi.org/10.2307/1310784
Narum, S. R., & Hess, J. E. (2011). Comparison of FST outlier tests for SNP loci under selection: ANALYTICAL APPROACHES. Molecular Ecology Resources, 11, 184–194. https://doi.org/10.1111/j.1755-0998.2011.02987.x
Nei, M. (1973a). Analysis of Gene Diversity in Subdivided Populations. Proceedings of the National Academy of Sciences, 70(12), 3321–3323. https://doi.org/10.1073/pnas.70.12.3321
Nei, M. (1973b). Analysis of Gene Diversity in Subdivided Populations. 70, 3.
Nei, M. (1987a). Molecular Evolutionary Genetics. Columbia University Press.
Nei, M. (1987b). Molecular Evolutionary Genetics. Columbia University Press.
Nemazee, D. (2006).
Receptor editing in lymphocyte development and central tolerance. Nature Reviews Immunology, 6(10), 728–740. https://doi.org/10.1038/nri1939
New Internationalist. (1995, May 20). New Internationalist.
Nicotra, A. B., Atkin, O. K., Bonser, S. P., Davidson, A. M., Finnegan, E. J., Mathesius, U., Poot, P., Purugganan, M. D., Richards, C. L., Valladares, F., & van Kleunen, M. (2010). Plant phenotypic plasticity in a changing climate. Trends in Plant Science, 15(12), 684–692. https://doi.org/10.1016/j.tplants.2010.09.008
Nordborg, M., Charlesworth, B., & Charlesworth, D. (1996a). The effect of recombination on background selection. Genetical Research, 67(02), 159–174. https://doi.org/10.1017/S0016672300033619
Nordborg, M., Charlesworth, B., & Charlesworth, D. (1996b). The effect of recombination on background selection. Genetical Research, 67(2), 159–174. https://doi.org/10.1017/S0016672300033619
O’Neill, M. B., Shockey, A., Zarley, A., Aylward, W., Eldholm, V., Kitchen, A., & Pepperell, C. S. (2019). Lineage specific histories of Mycobacterium tuberculosis dispersal in Africa and Eurasia. Molecular Ecology, mec.15120. https://doi.org/10.1111/mec.15120
Otto, S. P., & Day, T. (2007). A biologist’s guide to Mathematical Modeling in Ecology and Evolution. Princeton University Press.
Papaj, D. R., & Prokopy, R. J. (1989). Ecological and evolutionary aspects of learning in phytophagous insects. Annual Review of Entomology, 34, 315–350. https://doi.org/10.4324/9780203125915
Patterson, J. H. (1976). The Role of Environmental Heterogeneity in the Regulation of Duck Populations. The Journal of Wildlife Management, 40(1), 22. https://doi.org/10.2307/3800152
Payseur, B. A., & Rieseberg, L. H. (2016). A genomic perspective on hybridization and speciation. Molecular Ecology, 25(11), 2337–2360. https://doi.org/10.1111/mec.13557
Pedraza, J. M., & Van Oudenaarden, A. (2005). Noise Propagation in Gene Networks. Science, 307(March), 1965–1970.
Pérez-Figueroa, A., García-Pereira, M. J., Saura, M., Rolán-Alvarez, E., & Caballero, A. (2010). Comparing three different methods to detect selective loci using dominant markers: Comparing methods to detect selective loci. Journal of Evolutionary Biology, 23(10), 2267–2276. https://doi.org/10.1111/j.1420-9101.2010.02093.x
Perkins, J. M., & Jinks, J. L. (1973). The assessment and specificity of environmental and genotype-environmental components of variability. Heredity, 30(2), 111–126. https://doi.org/10.1038/hdy.1973.16
Peters, A. D., Halligan, D. L., Whitlock, M. C., & Keightley, P. D. (2003). Dominance and Overdominance of Mildly Deleterious Induced Mutations for Fitness Traits in Caenorhabditis elegans. Genetics, 165(2), 589–599.
Pfennig, D. W., Wund, M. A., Snell-Rood, E. C., Cruickshank, T., Schlichting, C. D., & Moczek, A. P. (2010). Phenotypic plasticity’s impacts on diversification and speciation. Trends in Ecology and Evolution, 25(8), 459–467. https://doi.org/10.1016/j.tree.2010.05.006
Picq, S., McMillan, W. O., & Puebla, O. (2016). Population genomics of local adaptation versus speciation in coral reef fishes (Hypoplectrus spp, Serranidae). Ecology and Evolution, 6(7), 2109–2124. https://doi.org/10.1002/ece3.2028
Pigliucci, M. (1996). How organisms respond to environmental changes: From phenotypes to molecules (and vice versa). Trends in Ecology & Evolution, 11.
Pigliucci, M. (2008). Is evolvability evolvable? Nature Reviews Genetics, 9(1), 75–82. https://doi.org/10.1038/nrg2278
Pigliucci, M., & Murren, C. J. (2003). PERSPECTIVE: GENETIC ASSIMILATION AND A POSSIBLE EVOLUTIONARY PARADOX: CAN MACROEVOLUTION SOMETIMES BE SO FAST AS TO PASS US BY? Evolution, 57(7), 1455–1464. https://doi.org/10.1111/j.0014-3820.2003.tb00354.x
Price, T. D., Qvarnstrom, A., & Irwin, D. E. (2003). The role of phenotypic plasticity in driving genetic evolution. Proceedings of the Royal Society B: Biological Sciences, 270(1523), 1433–1440.
https://doi.org/10.1098/rspb.2003.2372
Rainey, P. B., & Travisano, M. (1998). Adaptive radiation in a heterogeneous environment. Nature, 394(6688), 69–72. https://doi.org/10.1038/27900
Reynolds, J., Weir, B. S., & Cockerham, C. C. (1983). Estimation of the coancestry coefficient: Basis for a short-term genetic distance. Genetics, 105(3), 767–779.
Rifkin, S. A., Houle, D., Kim, J., & White, K. P. (2005). A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature, 438(7065), 220–223. https://doi.org/10.1038/nature04114
Robinson, B. W., & Dukas, R. (1999). The Influence of Phenotypic Modifications on Evolution: The Baldwin Effect and Modern Perspectives. Oikos, 85(3), 582. https://doi.org/10.2307/3546709
Roesti, M., Moser, D., & Berner, D. (2013). Recombination in the threespine stickleback genome—Patterns and consequences. Molecular Ecology, 22(11), 3014–3027. https://doi.org/10.1111/mec.12322
Rowell, C. H. F. (1972). The Variable Coloration of the Acridoid Grasshoppers. In Advances in Insect Physiology (Vol. 8, pp. 145–198). Elsevier. https://doi.org/10.1016/S0065-2806(08)60197-6
Rutherford, S. L. (2000). From genotype to phenotype: Buffering mechanisms and the storage of genetic information. BioEssays, 22, 1095–1105.
Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., Xie, X., Byrne, E. H., McCarroll, S. A., Gaudet, R., Schaffner, S. F., Lander, E. S., Frazer, K. A., Ballinger, D. G., Cox, D. R., Hinds, D. A., Stuve, L. L., Gibbs, R. A., Belmont, J. W., … Stewart, J. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164), 913–918. https://doi.org/10.1038/nature06250
Sachs, T. (1988). Epigenetic selection: An alternative mechanism of pattern formation. Journal of Theoretical Biology, 134(4), 547–559. https://doi.org/10.1016/S0022-5193(88)80056-0
Sanchis-Moysi, J., Idoate, F., Izquierdo, M., Calbet, J. A. L., & Dorado, C.
(2011). Iliopsoas and Gluteal Muscles Are Asymmetric in Tennis Players but Not in Soccer Players. PLoS ONE, 6(7), e22858. https://doi.org/10.1371/journal.pone.0022858
Sattath, S., Elyashiv, E., Kolodny, O., Rinott, Y., & Sella, G. (2011). Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genetics, 7(2), e1001302. https://doi.org/10.1371/journal.pgen.1001302
Savolainen, O., Lascoux, M., & Merilä, J. (2013). Ecological genomics of local adaptation. Nature Reviews Genetics, 14(11), 807–820. https://doi.org/10.1038/nrg3522
Scheiner, S. M. (1993). GENETICS AND EVOLUTION OF PHENOTYPIC PLASTICITY. Annual Review of Ecology and Systematics, 24, 35–68.
Scheiner, S. M., Caplan, L., & Lyman, F. (1991). The genetics of phenotypic plasticity. III. Genetic correlations and fluctuating asymmetries. Journal of Evolutionary Biology, 4, 51–68.
Scheiner, S. M., & Holt, R. D. (2012). The genetics of phenotypic plasticity. X. Variation versus uncertainty. Ecology and Evolution, 2(4), 751–767. https://doi.org/10.1002/ece3.217
Schlichting, C. D. (1986). The evolution of phenotypic plasticity in plants. Annual Review of Ecology and Systematics, 17, 667–693.
Schlichting, C. D., Pigliucci, M., & Murren, C. J. (2006). Phenotypic plasticity and evolution by genetic assimilation. The Journal of Experimental Biology, 209, 2362–2367. https://doi.org/10.1242/jeb.02070
Schmitt, J., McCormac, A. C., & Smith, H. (1995). A Test of the Adaptive Plasticity Hypothesis Using Transgenic and Mutant Plants Disabled in Phytochrome-Mediated Elongation Responses to Neighbors. The American Naturalist, 146(6), 937–953. https://doi.org/10.1086/285832
Schmitt, J., & Wulff, R. D. (1993). Light Spectral Quality, Phytochrome and Plant Competition. Trends in Ecology & Evolution, 8(2).
Seebacher, F., White, C. R., & Franklin, C. E. (2015). Physiological plasticity increases resilience of ectothermic animals to climate change.
Nature Climate Change, 5(1), 61–66. https://doi.org/10.1038/nclimate2457
Seger, J., & Brockmann, H. J. (1987). What is bet-hedging? Oxford University Press.
Shaw, R. G., & Chang, S. M. (2006). Gene action of new mutations in Arabidopsis thaliana. Genetics, 172(3), 1855–1865. https://doi.org/10.1534/genetics.105.050971
Simons, A. M., & Johnston, M. O. (1997). Developmental Instability as a Bet-Hedging Strategy. Oikos, 80(2), 401. https://doi.org/10.2307/3546608
Smith, T. B., & Skulason, S. (1996). Evolutionary Significance of Resource Polymorphisms in Fishes, Amphibians, and Birds. Annual Review of Ecology and Systematics, 27(1), 111–133. https://doi.org/10.1146/annurev.ecolsys.27.1.111
Snell-Rood, E. C. (2012). Selective Processes in Development: Implications for the Costs and Benefits of Phenotypic Plasticity. Integrative and Comparative Biology, 52(1), 31–42. https://doi.org/10.1093/icb/ics067
Snell-Rood, E. C., Swanson, E. M., & Young, R. L. (2015). Life history as a constraint on plasticity: Developmental timing is correlated with phenotypic variation in birds. Heredity, 115(4), 379–388. https://doi.org/10.1038/hdy.2015.47
Snell-Rood, Emilie C. (2012). Selective processes in development: Implications for the costs and benefits of phenotypic plasticity. Integrative and Comparative Biology, 52(1), 31–42. https://doi.org/10.1093/icb/ics067
Snell-Rood, Emilie C., Kobiela, M. E., Sikkink, K. L., & Shephard, A. M. (2018). Mechanisms of Plastic Rescue in Novel Environments. Annual Review of Ecology, Evolution, and Systematics, 49(1), 331–354. https://doi.org/10.1146/annurev-ecolsys-110617-062622
Snell-Rood, Emilie C., Van Dyken, J. D., Cruickshank, T., Wade, M. J., & Moczek, A. P. (2010). Toward a population genetic framework of developmental evolution: The costs, limits, and consequences of phenotypic plasticity. BioEssays, 32(1), 71–81. https://doi.org/10.1002/bies.200900132
Song, S., & Abbott, L. F. (2001).
Cortical development and remapping through spike timing-dependent plasticity. Neuron, 32(2), 339–350. https://doi.org/10.1016/S0896-6273(01)00451-2
Spencer, C. C. A., Deloukas, P., Hunt, S., Mullikin, J., Myers, S., Silverman, B., Donnelly, P., Bentley, D., & McVean, G. (2006). The influence of recombination on human genetic diversity. PLoS Genetics, 2(9), e148. https://doi.org/10.1371/journal.pgen.0020148
Stearns, S. C., Kaiser, M., & Kawecki, T. J. (1995). The differential genetic and environmental canalization of fitness components in Drosophila melanogaster. Journal of Evolutionary Biology, 8(5), 539–557. https://doi.org/10.1046/j.1420-9101.1995.8050539.x
Stearns, S. C., & Kawecki, T. J. (1994). Fitness Sensitivity and the Canalization of Life-History Traits. Evolution, 48(5), 1438. https://doi.org/10.2307/2410238
Stein, A., Gerstner, K., & Kreft, H. (2014). Environmental heterogeneity as a universal driver of species richness across taxa, biomes and spatial scales. Ecology Letters, 17(7), 866–880. https://doi.org/10.1111/ele.12277
Steiner, C. C., Weber, J. N., & Hoekstra, H. E. (2007). Adaptive Variation in Beach Mice Produced by Two Interacting Pigmentation Genes. PLoS Biology, 5(9), e219. https://doi.org/10.1371/journal.pbio.0050219
Stephan, W. (2010). Genetic hitchhiking versus background selection: The controversy and its implications. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1544), 1245–1253. https://doi.org/10.1098/rstb.2009.0278
Sultan, S. E. (2000). Phenotypic plasticity for plant development, function and life history. Trends in Plant Science, 5(12), 537–542. https://doi.org/10.1016/S1360-1385(00)01797-0
Suzuki, Y., & Nijhout, H. F. (2006). Evolution of a Polyphenism by Genetic Accommodation. Science, 311(5761), 650–652.
Swanson, E. M., & Snell-Rood, E. C. (2014). A Molecular Signaling Approach to Linking Intraspecific Variation and Macro-evolutionary Patterns. Integrative and Comparative Biology, 54(5), 805–821.
https://doi.org/10.1093/icb/icu057
Szollosi, G. J., & Derenyi, I. (2009a). Congruent Evolution of Genetic and Environmental Robustness in Micro-RNA. Molecular Biology and Evolution, 26(4), 867–874. https://doi.org/10.1093/molbev/msp008
Szollosi, G. J., & Derenyi, I. (2009b). Congruent Evolution of Genetic and Environmental Robustness in Micro-RNA. Molecular Biology and Evolution, 26(4), 867–874. https://doi.org/10.1093/molbev/msp008
Tamme, R., Hiiesalu, I., Laanisto, L., Szava-Kovats, R., & Pärtel, M. (2010). Environmental heterogeneity, species diversity and co-existence at different spatial scales. Journal of Vegetation Science. https://doi.org/10.1111/j.1654-1103.2010.01185.x
Tange, O. (2011). GNU Parallel: The Command-Line Power Tool. ;login:, 36(1), 42–47. https://doi.org/10.5281/zenodo.16303
Tansey, E. A., & Johnson, C. D. (2015). Recent advances in thermoregulation. Advances in Physiology Education, 39(3), 139–148. https://doi.org/10.1152/advan.00126.2014
Tews, J., Brose, U., Grimm, V., Tielbörger, K., Wichmann, M. C., Schwager, M., & Jeltsch, F. (2004). Animal species diversity driven by habitat heterogeneity/diversity: The importance of keystone structures: Animal species diversity driven by habitat heterogeneity. Journal of Biogeography, 31(1), 79–92. https://doi.org/10.1046/j.0305-0270.2003.00994.x
The International HapMap Consortium. (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449(7164), 851–861. https://doi.org/10.1038/nature06258
The International HapMap Consortium, Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., Xie, X., Byrne, E. H., McCarroll, S. A., Gaudet, R., Schaffner, S. F., & Lander, E. S. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164), 913–918. https://doi.org/10.1038/nature06250
Tienderen, P. H. Van. (1991). Evolution of Generalists and Specialists in Spatially Heterogeneous Environments.
Evolution, 45(6), 1317. https://doi.org/10.2307/2409882
Tiré, C., De Rycke, R., De Loose, M., Inzé, D., Van Montagu, M., & Engler, G. (1994). Extensin gene expression is induced by mechanical stimuli leading to local cell wall strengthening in Nicotiana plumbaginifolia. Planta, 195(2), 175–181. https://doi.org/10.1007/BF00199676
Tonsor, S. J., Elnaccash, T. W., & Scheiner, S. M. (2013). Developmental instability is genetically correlated with phenotypic plasticity, constraining heritability, and fitness. Evolution, 67(10), 2923–2935. https://doi.org/10.1111/evo.12175
Torres, R., Szpiech, Z. A., & Hernandez, R. D. (2018). Human demographic history has amplified the effects of background selection across the genome. PLoS Genetics, 14(6), e1007387. https://doi.org/10.1371/journal.pgen.1007387
Turing, A. M. (1952). The Chemical Basis of Morphogenesis. Philosophical Transactions of the Royal Society B: Biological Sciences, 237(641), 37–72.
Van Buskirk, J., & Steiner, U. K. (2009). The fitness costs of developmental canalization and plasticity. Journal of Evolutionary Biology, 22(4), 852–860. https://doi.org/10.1111/j.1420-9101.2009.01685.x
Van Kleunen, M., & Fischer, M. (2005). Constraints on the evolution of adaptive phenotypic plasticity in plants: Research review. New Phytologist, 166(1), 49–60. https://doi.org/10.1111/j.1469-8137.2004.01296.x
van Nimwegen, E., Crutchfield, J. P., & Huynen, M. (1999). Neutral evolution of mutational robustness. Proceedings of the National Academy of Sciences, 96(17), 9716–9720. https://doi.org/10.1073/pnas.96.17.9716
Vasudevan, K., Kumar, A., & Chellam, R. (2006). Species turnover: The case of stream amphibians of rainforests in the Western Ghats, southern India. Biodiversity and Conservation, 15(11), 3515–3525. https://doi.org/10.1007/s10531-004-3101-x
Veening, J.-W., Smits, W. K., & Kuipers, O. P. (2008). Bistability, Epigenetics, and Bet-Hedging in Bacteria. Annual Review of Microbiology, 62(1), 193–210.
https://doi.org/10.1146/annurev.micro.62.081307.163002
Via, S., & Lande, R. (1985). Genotype-Environment Interaction and the Evolution of Phenotypic Plasticity. Evolution, 39(3), 505. https://doi.org/10.2307/2408649
Via, S., & Lande, R. (1987). Evolution of genetic variability in a spatially heterogeneous environment: Effects of genotype–environment interaction. Genetical Research, 49(2), 147–156. https://doi.org/10.1017/S001667230002694X
Vijay, N., Weissensteiner, M., Burri, R., Kawakami, T., Ellegren, H., & Wolf, J. B. W. (2017a). Genomewide patterns of variation in genetic diversity are shared among populations, species and higher-order taxa. Molecular Ecology, 26(16), 4284–4295. https://doi.org/10.1111/mec.14195
Vijay, N., Weissensteiner, M., Burri, R., Kawakami, T., Ellegren, H., & Wolf, J. B. W. (2017b). Genome-wide signatures of genetic variation within and between populations – a comparative perspective [Preprint]. Evolutionary Biology. https://doi.org/10.1101/104604
Vijay, N., Weissensteiner, M., Burri, R., Kawakami, T., Ellegren, H., & Wolf, J. B. W. (2017c). Genomewide patterns of variation in genetic diversity are shared among populations, species and higher-order taxa. Molecular Ecology, 26(16), 4284–4295. https://doi.org/10.1111/mec.14195
Vilas, A., Pérez-Figueroa, A., & Caballero, A. (2012). A simulation study on the performance of differentiation-based methods to detect selected loci using linked neutral markers: Detection of selected loci by linked neutral markers. Journal of Evolutionary Biology, 25(7), 1364–1376. https://doi.org/10.1111/j.1420-9101.2012.02526.x
Wagner, A. (2005a). Robustness and Evolvability in Living Systems. Princeton University Press.
Wagner, A. (2005b). Robustness and Evolvability in Living Systems. Princeton University Press.
Wagner, A. (2008). Robustness and evolvability: A paradox resolved. Proceedings of the Royal Society B: Biological Sciences, 275(1630), 91–100.
https://doi.org/10.1098/rspb.2007.1137
Wagner, G. P., Booth, G., & Bagheri-Chaichian, H. (1997). A Population Genetic Theory of Canalization. Evolution, 51(2), 329. https://doi.org/10.2307/2411105
Wang, Z., & Zhang, J. (2011). Impact of gene expression noise on organismal fitness and the efficacy of natural selection. Proceedings of the National Academy of Sciences, 108(16), E67–E76. https://doi.org/10.1073/pnas.1100059108
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-Statistics for the Analysis of Population Structure. Evolution, 38(6), 1358–1370.
Weir, Bruce S., & Cockerham, C. C. (1984). Estimating F-Statistics for the Analysis of Population Structure. Evolution, 38(6), 1358–1370.
Wennersten, L., & Forsman, A. (2012). Population-level consequences of polymorphism, plasticity and randomized phenotype switching: A review of predictions. Biological Reviews, 87(3), 756–767. https://doi.org/10.1111/j.1469-185X.2012.00231.x
West-Eberhard, M. J. (n.d.). PHENOTYPIC PLASTICITY AND THE ORIGINS OF DIVERSITY. 32.
West-Eberhard, M. J. (2003). Developmental Plasticity and Evolution. Oxford University Press.
Whitlock, M. C. (1992). Temporal Fluctuations in Demographic Parameters and the Genetic Variance among Populations. Evolution, 46(3), 608–615. https://doi.org/10.2307/2409631
Whitlock, M. C., & Lotterhos, K. E. (2015). Reliable Detection of Loci Responsible for Local Adaptation: Inference of a Null Model through Trimming the Distribution of FST. The American Naturalist, 186(S1), S24–S36. https://doi.org/10.1086/682949
Whitlock, M. C., & McCauley, D. E. (1999). Indirect measures of gene flow and migration: FST ≠ 1/(4Nm + 1). 9.
Wildman, D. E., Uddin, M., Liu, G., Grossman, L. I., & Goodman, M. (2003). Implications of natural selection in shaping 99.4% nonsynonymous DNA identity between humans and chimpanzees: Enlarging genus Homo. Proceedings of the National Academy of Sciences of the United States of America, 100(12), 7181–7188.
https://doi.org/10.1073/pnas.1232172100
Wilson, D. S., & Yoshimura, J. (2008). On the Coexistence of Specialists and Generalists. The American Naturalist, 144(4), 692–707.
Wootton, J. C., Feng, X., Ferdig, M. T., Cooper, R. A., Mu, J., Baruch, D. I., Magill, A. J., & Su, X. (2002). Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature, 418(6895), 320–323. https://doi.org/10.1038/nature00813
Wright, S. (1931). Evolution in Mendelian Populations. Genetics, 16.
Wright, S. (1943). Isolation by distance. Genetics, 28(2), 114–138.
Yang, J., Lee, S. H., Goddard, M. E., & Visscher, P. M. (2011). GCTA: A Tool for Genome-wide Complex Trait Analysis. The American Journal of Human Genetics, 88(1), 76–82. https://doi.org/10.1016/j.ajhg.2010.11.011
Yang, Z., Liu, X., Zhou, M., Ai, D., Wang, G., Wang, Y., Chu, C., & Lundholm, J. T. (2015). The effect of environmental heterogeneity on species richness depends on community position along the environmental gradient. Scientific Reports, 5(1), 15723. https://doi.org/10.1038/srep15723
Yeaman, S. (2015). Local Adaptation by Alleles of Small Effect. The American Naturalist, 186(S1), S74–S89. https://doi.org/10.1086/682405
Yoshimura, J., & Shields, W. M. (2018). Probabilistic Optimization of Body Size: A Discrepancy between Genetic and Phenotypic Optima. Evolution, 49(2), 375–378. http://www.jstor.org/stable/2410348
Zeng, K., & Charlesworth, B. (2011). The Joint Effects of Background Selection and Genetic Recombination on Local Gene Genealogies. Genetics, 189(1), 251–266. https://doi.org/10.1534/genetics.111.130575
Zeng, K., & Corcoran, P. (2015).
The effects of background and interference selection on patterns of genetic variation in subdivided populations. Genetics, 201(4), 1539–1554. https://doi.org/10.1534/genetics.115.178558
Zerbino, D. R., Achuthan, P., Akanni, W., Amode, M. R., Barrell, D., Bhai, J., Billis, K., Cummins, C., Gall, A., Girón, C. G., Gil, L., Gordon, L., Haggerty, L., Haskell, E., Hourlier, T., Izuogu, O. G., Janacek, S. H., Juettemann, T., To, J. K., … Flicek, P. (2018). Ensembl 2018. Nucleic Acids Research, 46(D1), D754–D761. https://doi.org/10.1093/nar/gkx1098
Zernicke, R., MacKay, C., & Lorincz, C. (2006). Mechanisms of bone remodeling during weight-bearing exercise. Applied Physiology, Nutrition, and Metabolism, 31(6), 655–660. https://doi.org/10.1139/h06-051

Appendix A Supplemental information to chapter 2

A.1 Supplemental figures and tables

Figure A.1.1: Overall distribution of selection coefficients in heterozygotes. There are 2% lethal mutations, and the average selection coefficient of the non-lethal mutations is approximately 0.07. [Plot axes: Heterozygote selection coefficient (x), Density (y).]

Figure A.1.2: Correlations between B and FST, dxy, FST (average of ratios), dxy-SNP, and FST (average of ratios) after removing all loci with a minor allele frequency (MAF) lower than 0.05 (called MAF - FST (average of ratios)), for the No Recombination treatment only, at the last generation (5 x 2N generations after the split). Each grey dot is a single simulation with BGS. The large black dot is the mean of all simulations without BGS. The P-values are computed from a permutation test and r is Pearson’s correlation coefficient. P-values and r are computed across all simulations, both with and without BGS. Results are congruent when the correlation coefficients and P-values are computed on the subset of simulations that have BGS.
[Figure A.1.2 plots: five scatter panels with B on the x-axis (0 to 1). Panel statistics: FST, r = -0.007, P = 0.66; dXY, r = 0.214, P = 0; dXY-SNP, r = 0.35, P = 0; FST (average of ratios), r = 0.129, P = 0; MAF - FST (average of ratios), r = -0.03, P = 0.07.]

Figure A.1.3: Relationship between total heterozygosity HT and FST on a site per site basis. Each dot represents a single site. The black line is a Local Polynomial Regression (LOESS). Data is a random subset from the Default treatment at the last generation.
[Figure A.1.3 plot: site-per-site FST (y-axis, 0 to 0.20) against HT (x-axis, 0 to 0.4).]

Figure A.1.4: Regressions of total (HT; upper line) and within (HS; lower line) population expected heterozygosity on the coefficient of BGS (B) for the last generation of the Default treatment. The two regression lines are not exactly parallel, with HS tending to HT as B goes to low values (more intense BGS). [Plot: expected heterozygosity (y-axis, 0.00012 to 0.00018) against B (x-axis, 0.2 to 1.0).]

Figure A.1.5: Relationship between observed FST and the expected FST obtained from the approximation [equation not recoverable] from Zeng and Corcoran (2015) for the simulations with BGS of the Default and the No Recombination treatments. The exact formula used here is [equation not recoverable]. The black line represents an OLS regression from which the P value and r are computed.
[Figure A.1.5 plots: observed FST (y-axis) against expected FST by Zeng & Corcoran (2015) (x-axis). Default: r = 0.0063, P = 0.78. No Recombination: r = 0.0105, P = 0.64.]

Table A.1.1: Pearson’s correlation tests for the association between the coefficient of background selection (B) and HS. P-values and r are computed on the simulations with BGS only.

Treatment               Generation    r        P             Bonferroni
                        after split                          correction
Default                 1             0.063    4.0 x 10^-5   **
                        5             0.061    0             ***
                        158           0.060    0             ***
                        1581          0.065    0             ***
                        10000         0.063    4.0 x 10^-5   **
High Migration          1             0.061    4.0 x 10^-5   **
                        5             0.060    0             ***
                        158           0.060    4.0 x 10^-5   **
                        1581          0.057    4.0 x 10^-5   **
                        10000         0.074    0             ***
Large N                 10            0.076    4.0 x 10^-4   .
                        50            0.075    0.001         .
                        1580          0.073    1.3 x 10^-3   .
                        15810         0.070    1.8 x 10^-3   .
                        100000        0.085    2.0 x 10^-4   *
Human genetic map       1             -0.005   0.729         .
                        5             -0.007   0.617         .
                        158           -0.013   0.424         .
                        1581          -0.029   0.080         .
                        10000         -0.007   0.659         .
Low selection pressure  1             0.142    0             ***
                        5             0.151    0             ***
                        158           0.156    0             ***
                        1581          0.206    0             ***
                        10000         0.217    0             ***
Constant µ              1             0.399    0             ***
                        5             0.406    0             ***
                        158           0.411    0             ***
                        1581          0.521    0             ***
                        10000         0.535    0             ***
No Migration            1             0.066    0             ***
                        5             0.065    4.0 x 10^-5   **
                        158           0.068    0             ***
                        1581          0.069    0             ***
                        10000         0.067    0             ***
No Recombination        1             0.160    0             ***
                        5             0.163    0             ***
                        158           0.165    0             ***
                        1581          0.193    0             ***
                        10000         0.215    0             ***

Table A.1.2: Pearson’s correlation tests for the association between the coefficient of background selection (B) and FST. P-values and r are computed on the simulations with BGS only.

Treatment               Generation    r        P             Bonferroni
                        after split                          correction
Default                 1             -0.011   0.458
                        5             0.007    0.653
                        158           -0.032   0.053
                        1581          -0.030   0.062
                        10000         -0.019   0.243
High Migration          1             -0.016   0.313
                        5             0.026    0.094
                        158           0.033    0.029
                        1581          -0.004   0.779
                        10000         -0.010   0.522
Large N                 10            0.021    0.348
                        50            -0.022   0.332
                        1580          -0.038   0.086
                        15810         -0.097   0             ***
                        100000        -0.040   0.072
Human genetic map       1             -0.017   0.286
                        5             0.004    0.836
                        158           -0.013   0.398
                        1581          0.013    0.434
                        10000         0.025    0.116
Low selection pressure  1             -0.013   0.577
                        5             -0.017   0.432
                        158           0.023    0.304
                        1581          -0.026   0.258
                        10000         -0.038   0.092
Constant µ              1             -0.025   0.256
                        5             -0.030   0.185
                        158           -0.052   0.021
                        1581          -0.039   0.081
                        10000         -0.058   0.010
No Migration            1             0.008    0.646
                        5             0.009    0.585
                        158           -0.061   2.8 x 10^-4   .
                        1581          -0.087   0             ***
                        10000         -0.102   0             ***
No Recombination        1             -0.008   0.612
                        5             -0.037   0.021
                        158           -0.006   0.706
                        1581          -0.022   0.168
                        10000         -0.007   0.658

Table A.1.3: Pearson’s correlation tests for the association between the coefficient of background selection (B) and FST (average of ratios). P-values and r are computed on the simulations with BGS only.

Treatment               Generation    r        P             Bonferroni
                        after split                          correction
Default                 1             0.016    0.307
                        5             0.060    4.0 x 10^-5   **
                        158           0.030    0.053
                        1581          0.022    0.162
                        10000         0.031    0.041
High Migration          1             -0.007   0.649
                        5             0.042    5.9 x 10^-3
                        158           0.048    1.7 x 10^-3
                        1581          0.011    0.480
                        10000         0.025    0.112
Large N                 10            0.093    4.0 x 10^-5   **
                        50            0.067    2.8 x 10^-3
                        1580          0.033    0.143
                        15810         -0.011   0.618
                        100000        0.050    0.024
Human genetic map       1             -0.011   0.491
                        5             0.039    9.2 x 10^-3
                        158           0.024    0.128
                        1581          0.023    0.139
                        10000         0.038    0.012
Low selection pressure  1             0.012    0.579
                        5             0.021    0.357
                        158           0.071    1.4 x 10^-3
                        1581          0.012    0.583
                        10000         0.004    0.856
Constant µ              1             0.030    0.180
                        5             0.084    2.8 x 10^-4   .
                        158           0.064    4.0 x 10^-3
                        1581          0.080    4.8 x 10^-4   .
                        10000         0.055    0.015
No Migration            1             0.018    0.251
                        5             0.058    8.0 x 10^-5   *
                        158           0.026    0.078
                        1581          0.034    0.030
                        10000         -0.003   0.851
No Recombination        1             0.019    0.245
                        5             0.067    0             ***
                        158           0.103    0             ***
                        1581          0.102    0             ***
                        10000         0.129    0             ***

Table A.1.4: Pearson’s correlation tests for the association between the coefficient of background selection (B) and dXY. P-values and r are computed on the simulations with BGS only.

Treatment               Generation    r        P             Bonferroni
                        after split                          correction
Default                 1             0.063    0             ***
                        5             0.061    0             ***
                        158           0.060    4.0 x 10^-5   **
                        1581          0.065    0             ***
                        10000         0.062    4.0 x 10^-5   **
High Migration          1             0.061    8.0 x 10^-5   *
                        5             0.060    0             ***
                        158           0.060    0             ***
                        1581          0.057    8.0 x 10^-5   *
                        10000         0.074    0             ***
Large N                 10            0.076    5.2 x 10^-4
                        50            0.075    6.4 x 10^-4
                        1580          0.073    1.2 x 10^-3
                        15810         0.070    1.6 x 10^-3
                        100000        0.085    8.0 x 10^-5   *
Human genetic map       1             -0.005   0.725
                        5             -0.007   0.647
                        158           -0.012   0.425
                        1581          -0.029   0.081
                        10000         -0.006   0.668
Low selection pressure  1             0.142    0             ***
                        5             0.150    0             ***
                        158           0.157    0             ***
                        1581          0.204    0             ***
                        10000         0.215    0             ***
Constant µ              1             0.399    0             ***
                        5             0.404    0             ***
                        158           0.409    0             ***
                        1581          0.520    0             ***
                        10000         0.533    0             ***
No Migration            1             0.066    0             ***
                        5             0.066    0             ***
                        158           0.063    4.0 x 10^-5   **
                        1581          0.043    3.5 x 10^-3
                        10000         0.024    0.117
No Recombination        1             0.160    0             ***
                        5             0.162    0             ***
                        158           0.166    0             ***
                        1581          0.192    0             ***

Table A.1.5: Pearson’s correlation tests for the association between the coefficient of background selection (B) and dXY-SNP. P-values and r are computed on the simulations with BGS only.

Treatment               Generation    r        P             Bonferroni
                        after split                          correction
Default                 1             0.127    0             ***
                        5             0.116    0             ***
                        158           0.117    0             ***
                        1581          0.127    0             ***
                        10000         0.127    0             ***
High Migration          1             0.089    0             ***
                        5             0.098    0             ***
                        158           0.098    0             ***
                        1581          0.092    0             ***
                        10000         0.150    0             ***
Large N                 10            0.250    0             ***
                        50            0.247    0             ***
                        1580          0.231    0             ***
                        15810         0.242    0             ***
                        100000        0.258    0             ***
Human genetic map       1             0.070    0             ***
                        5             0.071    0             ***
                        158           0.055    4.4 x 10^-4   .
                        1581          0.022    0.167
                        10000         0.063    0             ***
Low selection pressure  1             0.092    8.0 x 10^-5   *
                        5             0.118    0             ***
                        158           0.101    0             ***
                        1581          0.118    0             ***
                        10000         0.121    0             ***
Constant µ              1             0.256    0             ***
                        5             0.256    0             ***
                        158           0.235    0             ***
                        1581          0.299    0             ***
                        10000         0.305    0             ***
No Migration            1             0.117    0             ***
                        5             0.121    0             ***
                        158           0.108    0             ***
                        1581          0.042    6.5 x 10^-3
                        10000         -0.003   0.841
No Recombination        1             0.249    0             ***
                        5             0.244    0             ***
                        158           0.265    0             ***
                        1581          0.318    0             ***
                        10000         0.353    0             ***

Appendix A.2 Coalescent vs state based FST

In order to verify that we get the same results with SimBit as with other simulation software, we conducted simulations based on the conditions in Zeng and Corcoran (2015) with SimBit, SLiM (Haller & Messer, 2017a), and Nemo (Guillaume & Rougemont, 2006). For these simulations, the genome consists of 13,000 neutral loci (in the neutral case) or 13,000 neutral loci mixed with 13,000 selected loci (for the purifying selection case). In the “purifying selection” treatment, all mutations have a deleterious selection coefficient of s = 0.01 in the heterozygote state. All sites are completely linked and the mutation rate per site is 2×10^-6. The simulation starts with a single patch of 1000 diploid individuals that, after 10,000 generations, splits into two patches of 5000 individuals each with a very low migration rate of 5×10^-5. Simulations last for 80,000 generations after the split. FST is computed from allelic states for SimBit and Nemo but from the coalescent tree for SLiM.

Figure A.2.1 shows the CPU time for the simulations whose results are displayed on the right graph. It serves as a justification for writing new software. Figure A.2.2 shows the mean and 95% confidence interval for FST. The colors represent simulations from different software as well as different methods to compute FST. The dashed line shows the neutral expectation for Weir and Cockerham’s estimate of FST, computed from [equation not recoverable] (Charlesworth, 1998), where d is the number of demes.
The new program SimBit gives results consistent with the earlier programs Nemo and SLiM. We also conducted simulations to match the conditions simulated in Figure 1 of Zeng and Corcoran (2015); the results from SimBit were statistically indistinguishable from the results reported in that figure (see Figure A.2.3).

Figure A.2.1: CPU time (in hours) for simulations with 13,000 loci under purifying selection. [Plot: CPU time in hours (x-axis, 1 to 17) for SimBit, SLiM and Nemo.]

Figure A.2.2: Mean FST (with 95% confidence intervals) for simulations with and without purifying selection, for simulations done in SimBit, SLiM, and Nemo. The point estimates and confidence intervals are based on 20 simulations each. All programs get similar results in these simulations. Simulation conditions are described in the appendix main text. [Plot: FST (y-axis, 0.2 to 0.8) for the Neutral and BGS treatments; series: Nemo, SLiM coalescence, SimBit neutral loci only, SimBit all loci.]

Figure A.2.3: A comparison of SimBit to results from simulations in Zeng and Corcoran (2015) in their Figure 1. The point estimates and confidence intervals are based on 100 simulations from SimBit, and the black horizontal lines show the results from Zeng and Corcoran (2015). The conditions simulated in this figure were similar to those reported in the main text of this appendix, except there were only 2500 loci in these simulations. SimBit assumes diploid organisms whereas the results in Zeng and Corcoran were based on 1000 and 5000 haploid individuals (before and after the population split, respectively); these SimBit simulations therefore used 500 and 2500 diploid individuals. [Plot: FST (y-axis, 0.45 to 0.60) for neutral loci and for neutral loci linked to selected ones.]

Appendix B Supplemental information to chapter 3

Appendix B.1 Supplemental figures and tables

Figure B.1.1: Average developmental noise for all treatments, separating the plastic (in black) from the non-plastic genotypes (in grey). [Plot: developmental noise (y-axis, 300 to 800, from robust to unstable) for each combination of environment (Constant Environment, Spatial Heterogeneity, Temporal Heterogeneity) and signal (None, Env., Perf.); series: Not Plastic, Plastic.]

Figure B.1.2: Changes in developmental noise over time showing the evolution of developmental robustness. Error bars are standard errors. Simulations are classified as plastic or not plastic based on the genotype considered at the last generation as in figure 3.1 (generation 200k). Data in generation 200k are shown in Figure 3.2. [Plot: developmental noise (y-axis, 300 to 900, from robust to unstable) against generation (x-axis, 0 to 200k) under Spatial Heterogeneity; panels: Environmental Signal, Performance Signal; series: Not Plastic, Plastic.]

Table B.1.1: Statistics for OLS regressions shown in Figure 3.4. Abbreviations: CE = Constant Environment, SH = Spatial Heterogeneity, TH = Temporal Heterogeneity, NS = No Signal, ES = Environmental Signal, PS = Performance Signal.

Environment  Signal  Plasticity  Estimate  SE       F-value  P-value    r2
CE           None    No Plastic  0.4311    0.2228   3.742    0.0545     0.01855
CE           None    Plastic     -         -        -        -          -
CE           Perf.   No Plastic  0.8046    0.3076   6.843    0.0101     0.05437
CE           Perf.   Plastic     0.6678    0.3219   4.304    0.04165    0.05715
CE           Env.    No Plastic  0.2880    0.2778   1.075    0.3012     0.005839
CE           Env.    Plastic     -0.5557   1.1365   0.2391   0.6330     0.01806
SH           None    No Plastic  0.3190    0.1925   2.746    0.09908    0.01389
SH           None    Plastic     -         -        -        -          -
SH           Perf.   No Plastic  0.5651    0.1624   12.11    0.001109   0.2084
SH           Perf.   Plastic     0.4481    0.1122   15.96    0.0001013  0.09614
SH           Env.    No Plastic  0.4135    0.1047   15.6     0.000165   0.1599
SH           Env.    Plastic     0.3588    0.1009   12.64    0.0005517  0.09979
TH           None    No Plastic  0.41555   0.09894  17.64    4.12e-05   0.08577
TH           None    Plastic     -         -        -        -          -
TH           Perf.   No Plastic  0.4827    0.2468   3.824    0.0646     0.1605
TH           Perf.   Plastic     0.5154    0.1029   25.07    1.52e-06   0.1424
TH           Env.    No Plastic  0.3653    0.2112   2.99     0.09234    0.07669
TH           Env.    Plastic     0.31395   0.09328  11.33    0.0009628  0.0681

Appendix B.2 Mutational noise estimator

Let $\mu_{mut_i}$ be the parametric mean phenotype of the $i$th mutant and let $\mu_{wt}$ be the parametric mean phenotype of a given wild type (parent; not mutated) genotype. We define the mutational noise of this wild type genotype as the average squared difference between the mutants’ mean phenotypes and the mean phenotype of the wild type genotype $\mu_{wt}$. Hence, the parametric mutational noise is $E[(\mu_{mut_i} - \mu_{wt})^2]$.

Let us first compute an unbiased estimator for $(\mu_{mut_i} - \mu_{wt})^2$. The expected squared difference between the sample means is

$$E[(\bar{x}_{wt} - \bar{x}_{mut_i})^2] = E[\bar{x}_{wt}^2 - 2\bar{x}_{wt}\bar{x}_{mut_i} + \bar{x}_{mut_i}^2] = E[\bar{x}_{wt}^2] - 2E[\bar{x}_{wt}]E[\bar{x}_{mut_i}] - 2\,Cov[\bar{x}_{wt}, \bar{x}_{mut_i}] + E[\bar{x}_{mut_i}^2],$$

where $\bar{x}_{wt}$ is the sample mean for the wild type genotype and $\bar{x}_{mut_i}$ is the sample mean for the $i$th mutant. The error covariance between $\bar{x}_{wt}$ and $\bar{x}_{mut_i}$ is zero because they are developed completely independently; $\bar{x}_{wt}$ is averaged over $n_{wt}$ replicates and each $\bar{x}_{mut_i}$ is averaged over $n_{mut}$ replicates. Because for any variable $X$ averaged over $n$ replicates, $E[\bar{X}^2] = \mu_X^2 + \sigma_X^2 / n = E[\bar{X}]^2 + Var[\bar{X}]$ holds true, it results that

$$E[(\bar{x}_{wt} - \bar{x}_{mut_i})^2] = E[\bar{x}_{wt}]^2 - 2E[\bar{x}_{wt}]E[\bar{x}_{mut_i}] + E[\bar{x}_{mut_i}]^2 + \sigma_{wt}^2 / n_{wt} + \sigma_{mut_i}^2 / n_{mut} = (E[\bar{x}_{wt}] - E[\bar{x}_{mut_i}])^2 + \sigma_{wt}^2 / n_{wt} + \sigma_{mut_i}^2 / n_{mut}.$$

Since $E[\bar{x}_{wt}] = \mu_{wt}$ and $E[\bar{x}_{mut_i}] = \mu_{mut_i}$, our estimate of $(\mu_{wt} - \mu_{mut_i})^2$ is therefore

$$(\bar{x}_{wt} - \bar{x}_{mut_i})^2 - \hat{\sigma}_{wt}^2 / n_{wt} - \hat{\sigma}_{mut_i}^2 / n_{mut},$$

where $\hat{\sigma}_{wt}^2$ is the sample variance of the wild type and $\hat{\sigma}_{mut_i}^2$ is the sample variance for the $i$th mutant. Averaging over all $M$ mutants leads to

$$\frac{\sum_i (\bar{x}_{wt} - \bar{x}_{mut_i})^2}{M} - \frac{\sum_i \hat{\sigma}_{wt}^2}{M n_{wt}} - \frac{\sum_i \hat{\sigma}_{mut_i}^2}{M n_{mut}} = \frac{\sum_i (\bar{x}_{wt} - \bar{x}_{mut_i})^2}{M} - \frac{\hat{\sigma}_{wt}^2}{n_{wt}} - \frac{\overline{\hat{\sigma}_{mut}^2}}{n_{mut}},$$

where $\overline{\hat{\sigma}_{mut}^2}$ is the average sample variance over all mutants. As explained above, $M = 5000$, $n_{mut} = 1000$ and $n_{wt} = 50{,}000$.
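The estimator derived in Appendix B.2 can be illustrated with a minimal Python sketch. It is not the code used for the thesis: the function name `mutational_noise` and the toy data are hypothetical, and the sketch only assumes that replicate phenotypes are available as plain lists, one list for the wild type and one list per mutant.

```python
from statistics import mean, variance


def mutational_noise(wild_type, mutants):
    """Bias-corrected mutational noise (hypothetical helper, illustration only).

    wild_type: list of replicate phenotypes of the wild type genotype.
    mutants:   list of lists, one list of replicate phenotypes per mutant.
    """
    wt_mean = mean(wild_type)
    # Naive term: average squared difference between sample means.
    naive = mean([(wt_mean - mean(m)) ** 2 for m in mutants])
    # Sampling-error corrections based on unbiased sample variances.
    wt_correction = variance(wild_type) / len(wild_type)
    mut_correction = mean([variance(m) / len(m) for m in mutants])
    return naive - wt_correction - mut_correction


# Toy data: one wild type with 2 replicates, two mutants with 2 replicates each.
estimate = mutational_noise([0.0, 2.0], [[3.0, 5.0], [1.0, 1.0]])
print(estimate)
```

With the toy data, the naive term is 4.5 and the two corrections are 1 and 0.5. Without the corrections, sampling error in the means would inflate the estimate.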
Appendix C Supplemental information to chapter 4

Appendix C.1 Supplemental figures and tables

Figure C.1.1: Comparison of computational time among the four different ways to simulate the same evolutionary scenario using SimBit. See figure 4.1 for more details. [Plot: CPU time in seconds (y-axis, log scale) against number of loci (x-axis, log scale); panel columns by mutation rate µ (10^-7, 10^-5, 10^-3) and recombination rate r (0, 10^-9, 10^-7, 10^-5); panel rows by population size N (10^2 to 10^6); series: SimBit M T1, SimBit T1, SimBit M T5, SimBit T5.]

Figure C.1.2: Comparison of memory usage (max Resident Set Size) among the four different ways to simulate the same evolutionary scenario using SimBit. See figure 4.1 for more details. [Plot: max RSS in kb (y-axis, log scale) against number of loci (x-axis, log scale); same panels and series as Figure C.1.1.]

Figure C.1.3: Comparison of computational time among the four different simulation programs Nemo, SFS_CODE, SLiM and SimBit. For SimBit, two lines are displayed. Both lines show the best performing between T1 and T5 loci from figure C.1.1, once taking advantage of the assumption of multiplicative fitness, once without taking advantage of this assumption. See figure 4.1 for more details. [Plot: CPU time in seconds (y-axis, log scale) against number of loci (x-axis, log scale); same panels as Figure C.1.1; series: Nemo, SFS_CODE, SLiM, SimBit M, SimBit.]

Figure C.1.4: Comparison of memory usage (max Resident Set Size) among the four different simulation programs Nemo, SFS_CODE, SLiM and SimBit. For SimBit, two lines are displayed. Both lines show the best performing between T1 and T5 loci from figure C.1.1, once taking advantage of the assumption of multiplicative fitness, once without taking advantage of this assumption. See figure 4.1 for more details.
[Figure C.1.4 plot: max RSS in kb (y-axis, log scale) against number of loci (x-axis, log scale); panel columns by mutation rate µ (10^-7, 10^-5, 10^-3) and recombination rate r (0, 10^-9, 10^-7, 10^-5); panel rows by population size N (10^2 to 10^6); series: Nemo, SFS_CODE, SLiM, SimBit M, SimBit.]

Appendix C.2 SimBit commands for simulations benchmarked

Here is the input used for the Wright-Fisher simulations presented in figures 1, 2, S1, S2, S3 and S4, with 100,000 individuals, 60,000 T5 loci, a uniform recombination rate of 10^-7 and a uniform mutation rate of 10^-7, assuming that fitnesses among haplotypes are multiplicative.

    # Number of patches
    --PatchNumber 1
    # Carrying capacity
    --N unif 1e5
    # Genomic architecture
    --L T5 6e4
    # Mutation rate on T5 loci
    --T5_mu unif 1e-7
    # Selection on T5 loci
    --T5_fit multfitUnif 0.99999
    # Recombination rate
    --r rate unif 1e-7
    # Number of generations
    --nbGens 1e5
    # Do not print progress
    --printProgress false

--PatchNumber 1 sets the number of patches to 1. --N unif 1e5 indicates a uniform carrying capacity of 100,000 individuals for all patches. --L T5 6e4 (could also have been written --Loci T5 6e4) asks for 60,000 T5 loci. --T5_mu unif 1e-7 sets a uniform mutation rate of 10^-7 per locus. --T5_fit multfitUnif 0.99999 indicates a uniform selection scenario over all loci, with the fitnesses of the three genotypes being set at 1, 0.99999 and 0.99999^2. --r rate unif 1e-7 indicates a uniform recombination rate of 10^-7 between adjacent loci. Note that the keyword ‘rate’ indicates the unit in which the recombination rate is set. Alternatively, one could use ‘cM’ for centiMorgans or ‘M’ for Morgans. --nbGens 1e5 (could also be written --nbGenerations 1e5) sets the number of generations to 100,000. Finally, --printProgress false simply turns off the printing of the progress of the simulation to the standard output.

For the simulations inspired by Gilbert et al. (2017), the input for SimBit is

    # Patch Numbers
    --PatchNumbers 8000
    # Carrying capacity
    --N unif 12
    # Genomic architecture
    --L T1 1e3
    # Mutation rate on T1 loci
    --T1_mu unif 0.0001
    # Selection on T1 loci
    --T1_fit unif 1 0.997 0.99
    # Recombination rate
    --r rate unif 0.005
    # Number of generations
    --nbGens 18000
    # Migration scenario
    --m LSS 3 0.3 0.4 0.3 1
    # Fecundity
    --fec 7
    # Initial patch size
    --InitialpatchSize A rep 12 1e3 rep 0 7e3
    # Population growth model
    --popGrowthModel unif exponential
    # Is population growth stochastic
    --stochasticGrowth t

Some of the options have already been mentioned above. --m LSS 3 0.3 0.4 0.3 1 sets a Linear Stepping Stone model with 3 event probabilities, where the probability of not migrating is the value at index 1 (zero based counting).
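As an illustration of the --m LSS option just described, the following Python sketch builds a patch-by-patch dispersal matrix. This is not SimBit code. The function name `lss_migration_matrix`, the convention that each row is an origin patch, and the proportional redistribution of probability at the edges are all assumptions made for the example; the manual only states that probabilities pointing toward a non-existent patch are redistributed to the other probabilities.

```python
def lss_migration_matrix(n_patches, probs, stay_index):
    """Dispersal matrix for a linear stepping stone (illustration only).

    probs[k] is the probability of moving (k - stay_index) patches, so
    probs[stay_index] is the probability of not migrating.
    ASSUMPTION: probability mass that would leave the array of patches is
    redistributed proportionally among the remaining destinations.
    """
    matrix = []
    for origin in range(n_patches):
        row = [0.0] * n_patches
        for k, p in enumerate(probs):
            dest = origin + (k - stay_index)
            if 0 <= dest < n_patches:
                row[dest] = p
        total = sum(row)
        matrix.append([p / total for p in row])
    return matrix


# Equivalent of `--m LSS 3 0.3 0.4 0.3 1`, shown here for 4 patches only.
matrix = lss_migration_matrix(4, [0.3, 0.4, 0.3], 1)
for row in matrix:
    print([round(p, 3) for p in row])
```

Interior rows keep the probabilities 0.3, 0.4 and 0.3 unchanged; only the two edge rows are renormalised under the stated assumption.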
With these values, the probability of migrating toward the left or toward the right is 0.3 each and the probability of not migrating is 0.4. By default, boundary conditions are dealt with by redistributing the probabilities of migrating toward a non-existent patch to the other probabilities (boundaries are reflective, not absorbing). --fec 7 sets the fecundity of an individual with a fitness of one to 7. The keyword rep used for option --InitialpatchSize works like the function of the same name in R. Hence, rep 12 1e3 creates an array of the value 12 repeated 1000 times, and --InitialpatchSize A rep 12 1e3 rep 0 7e3 sets the initial patch size of the first 1000 patches to 12 and the initial patch size of the remaining 7000 patches to 0. --popGrowthModel unif exponential indicates that for all patches (hence the keyword ‘unif’ here) the growth model is an exponential model (bounded at the carrying capacity set by the option --N). --stochasticGrowth t indicates that the growth must be stochastic (‘t’ is short for ‘true’), that is, the number of offspring in the next generation is sampled from a Poisson distribution whose rate is the expected number of offspring, as explained in chapter 3.

The keyword fromToBy works like the ‘seq’ function in R. It creates an array starting from the first value and going to the second value in steps of the third value. For example, writing fromToBy 0.1 0.9 0.2 is equivalent to writing 0.1 0.3 0.5 0.7 0.9. --patchSize_file GilbertEtAl fromToBy 0 end 100 asks for an output file indicating the patch size (number of individuals) of each patch. The file name is set to ‘GilbertEtAl’ (to which an automatic extension will be added) and output will be produced every 100 generations from generation 0 to the end of the simulation.

For the simulation inspired from Booker and Keightley (2018), the input used is

    # Number of patches
    --PatchNumber 1
    # Carrying capacity
    --N A 1e3
    # Genomic architecture
    --L T5 15e4
    # Mutation rate on T5 loci
    --T5_mu unif 2.5e-6
    # Selection on T5 loci
    --T5_fit cstHgamma 0.5 homo 0.111 2.22
    # Recombination rate
    --r rate unif 2.5e-6
    # Number of generations
    --nbGens 20000

--T5_fit cstHgamma 0.5 homo 0.111 2.22 indicates a Gamma distribution of homozygous (i.e. double-mutant; hence the keyword ‘homo’) selection coefficients with parameters α = 0.111 and β = 2.22 and with a dominance coefficient of 0.5.

In the input for the simulation inspired from O’Neil et al. (2019), --m island 0.000125 indicates an island model (among the three patches defined with option --PatchNumber) with a probability of migrating of 0.000125. --T5_fit multfitA rep 0.9995 25000 rep 1.0 75000 sets the selection coefficients of the first 25,000 loci to 1 - 0.9995 = 0.0005 and the selection coefficients of the remaining 75,000 loci to 1 - 1 = 0, and indicates the use of the assumption of multiplicative fitness.

SLiM and Nemo’s input files tend to be much larger. As an example, for the simulation inspired from Gilbert et al. (2017), my input command for SimBit is 290 characters long (excluding the comments) while the input file of Nemo is 55,698 characters long. I therefore report only SimBit input examples here.
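The rep and fromToBy keywords and the multiplicative fitness assumption described above can be mimicked in a few lines of Python. This is only a sketch of the semantics stated in the text, not SimBit's implementation: the helper names `rep`, `from_to_by`, `haplotype_fitness` and `individual_fitness` are hypothetical, the special value ‘end’ accepted by fromToBy is not handled, and each haplotype is assumed to be represented by the list of indices of its mutated loci.

```python
def rep(value, times):
    """Mimic the `rep` keyword: `value` repeated `times` times."""
    return [float(value)] * int(times)


def from_to_by(start, stop, step):
    """Mimic the `fromToBy` keyword: start, start + step, ..., up to stop."""
    n_steps = int((stop - start) / step + 1e-9)  # small tolerance for rounding
    return [start + i * step for i in range(n_steps + 1)]


# Per-locus fitness values as in
# `--T5_fit multfitA rep 0.9995 25000 rep 1.0 75000`.
locus_fitness = rep(0.9995, 25000) + rep(1.0, 75000)
selection_coefficients = [1.0 - w for w in locus_fitness]


def haplotype_fitness(mutated_loci, locus_fitness):
    """Product of the fitness values of the loci mutated on one haplotype."""
    fitness = 1.0
    for locus in mutated_loci:
        fitness *= locus_fitness[locus]
    return fitness


def individual_fitness(hap1, hap2, locus_fitness):
    """Multiplicative fitness of a diploid: product of its two haplotypes."""
    return haplotype_fitness(hap1, locus_fitness) * haplotype_fitness(hap2, locus_fitness)


print(from_to_by(0.1, 0.9, 0.2))
print(individual_fitness([0], [0], locus_fitness))
```

Under this assumption, a genotype homozygous for a mutation at one of the first 25,000 loci has fitness 0.9995^2. This matches the 1, w and w^2 pattern described for multfitUnif.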
# Number of patches --PatchNumber 3 # Migration scenario --m island 0.000125 # Carrying capacity --N unif 1e3 # Genomic architecture --L t5 1e5 # Selection on T5 loci --T5_fit multfitA rep 0.9995 25000 rep 1.0 75000 # Recombination rate --r rate unif 0 # Mutation rate on T5 loci --T5_mu unif 5e-07 # Number of generations --nbGenerations 2e4 	 140	Appendix C.3 SimBit’s manual    ____  _           ____  _ _      / ___|(_)_ __ ___ | __ )(_) |_    \___ \| | '_ ` _ \|  _ \| | __|    ___) | | | | | | | |_) | | |_    |____/|_|_| |_| |_|____/|_|\__| version 4.9.21    ____________________________________________________________________________  User Manual ____________________________________________________________________________   Author Remi Matthey-Doret                       Last revised May 21, 2020  141 1 Table of Contents 2 CORRESPONDENCE .................................................................................................................... 142 3 HOW TO CITE ............................................................................................................................. 142 4 A LITTLE A PRIORI INFORMATION ............................................................................................. 142 4.1 SIMBIT IN A FEW WORDS ........................................................................................................ 142 4.2 CONTRIBUTERS ..................................................................................................................... 142 4.3 HOW TO OBTAIN SIMBIT ........................................................................................................ 143 4.4 HOW TO COMPILE SIMBIT ...................................................................................................... 143 4.5 HOW TO READ THIS MANUAL .................................................................................................. 
144 4.6 HOW TO GIVE ARGUMENTS TO SIMBIT ...................................................................................... 144 4.7 GENETIC ARCHITECTURE - THE BASIC TYPES OF LOCUS .................................................................. 145 4.7.1 T1 loci .......................................................................................................................... 145 4.7.2 T2 loci .......................................................................................................................... 145 4.7.3 T3 loci .......................................................................................................................... 146 4.7.4 T4 loci .......................................................................................................................... 146 4.7.5 T5 loci .......................................................................................................................... 146 4.8 UNIF AND A .......................................................................................................................... 147 4.9 SEQ, SEQINT, REP AND FROMTOBY ........................................................................................... 148 4.10 LISTING ALL OPTIONS ............................................................................................................. 148 4.11 SIMBIT VERSION ................................................................................................................... 149 5 SPECIES, GENERATION AND HABITAT-SPECIFIC OPTIONS ........................................................ 149 5.1 SPECIES-SPECIFIC OPTIONS ...................................................................................................... 154 5.2 GENERATION-SPECIFIC OPTIONS .............................................................................................. 
  5.3 HABITAT-SPECIFIC OPTIONS
6 LAUNCHING BASIC SIMULATIONS
7 SELECTION AND PHENOTYPE
  7.1 GENERAL CONCEPTS
  7.2 T1
  7.3 T2
  7.4 T3
  7.5 T4
  7.6 T5
8 DEMOGRAPHY AND SPECIES ECOLOGY
9 MATING SYSTEM
10 DEFINING INDIVIDUAL TYPES
11 INITIAL POPULATION
12 RESET GENETICS
13 OUTPUT
  13.1 LOGFILE
  13.2 EXPORT POPULATION TO A BINARY FILE
  13.3 USER FRIENDLY OUTPUTS
    13.3.1 Outputs for T1 loci
    13.3.2 Outputs for T2 loci
    13.3.3 Outputs for T3 loci
    13.3.4 Outputs for T4 loci
    13.3.5 Other Outputs
14 TECHNICAL OPTIONS
15 PERFORMANCE OPTIONS

2 Correspondence
I would be happy to hear about your questions, bug reports or requests for new features. If you receive an error message starting with “Internal error”, then please send me an email with the error message, SimBit’s version, and the input data. When applicable, please always provide a reproducible example of your problem (input data, SimBit version) and report the entire error message.
Remi Matthey-Doret
remi.b.md@gmail.com
matthey@zoology.ubc.ca
Beaty Biodiversity Research Center, room 205
Dept. of Zoology, University of British Columbia
6270 University Blvd, Vancouver, BC, V6T 1Z4
Phone: +1 (604) 369-5929

3 How to cite
The software is not published yet, so just cite the GitHub page https://github.com/RemiMattheyDoret/SimBit.

4 A little a priori information

4.1 SimBit in a few words
SimBit is a flexible and fast forward-in-time simulation platform for finite-site mutation models with arbitrary genetic architecture, selection scenario (including local selection, epistasis and all possible types of dominance), and demography. SimBit can also simulate several species with their ecological interactions. SimBit has been created with two main ideas in mind: having a simple user interface with very good error reporting, and being extremely fast for a wide variety of scenarios. One way SimBit achieves such high performance is by allowing a diversity of internal representations of the genetics of individuals.

4.2 Contributors
Thank you to Michael C. Whitlock for his feedback on SimBit’s user interface and for proofreading this manual. Thank you also to the main beta tester, Pirmin Nietlisbach. Pirmin has also provided much advice on how to improve the manual.

4.3 How to obtain SimBit
SimBit is available on GitHub at https://github.com/RemiMattheyDoret/SimBit. If you are experiencing trouble downloading from GitHub, you might want to have a look at the following link: https://stackoverflow.com/questions/6466945/fastest-way-to-download-a-github-project.

4.4 How to compile SimBit
Using the command line (terminal), cd to the downloaded directory and run make. For example:

    cd SimBit
    make

That is it! The Makefile is ridiculously simple. You should now have an executable called SimBit in /bin. By default, the Makefile uses the C++14 standard (or any more recent standard), but SimBit can also compile with C++11 if you do:

    make c++11

SimBit also uses a few boost files that you might need to download if you have not done so yet. You can download them from https://www.boost.org/users/download/. If you want to be able to call SimBit without having to specify the whole path, you have to set this up manually. For example, on macOS, you could do:

    touch ~/.bash_profile
    open ~/.bash_profile

and then write:

    export PATH="/PathToSimBit/SimBit/bin/:$PATH"

replacing /PathToSimBit/SimBit/bin/ with the correct path on your machine.

4.5 How to read this manual
The manual is meant to be read from start to finish. Just take two hours to read it and unleash SimBit’s full potential! The last two sections (“technical options” and “performance options”) can initially be skipped. The subsections of “Species, Generation and Habitat-specific options” can also be skipped (this will be made clear when you get there).

4.6 How to give arguments to SimBit
SimBit works through the command line (in the terminal). To use SimBit, just call the executable and follow it with options starting with the prefix -- (double dash). Each option is followed by an entry:

    ./SimBit --option1 arguments for option 1 --option2 arguments for option 2

For example:

    --nbGenerations 1000

indicates that you want a simulation lasting for 1000 generations. SimBit will list all available options if you just call the executable without giving it any arguments.

For some options, there exists a short and a long name (or even several long names). For example, to set the carrying capacity per patch, the short name is --N and the long name is --PatchCapacity. If you do not remember the name of an option (for example, if you are not sure whether it is --PatchCapacity, --patchCapacity or --CarryingCapacity), you can just type whatever comes to your mind and SimBit will look for the option names that look most alike (those with the lowest Levenshtein distance) and will suggest another name!

You do not need to bother about the ordering of the options. If an option is missing, SimBit will use the default when a default is available. If no default is available for a missing option, if an option is present more than once, if two entries do not coincide, or if an entry is nonsense, then SimBit will throw an error message.

It is also possible to put all arguments (same format) in a file and specify the file and the line of the file that contains the arguments. If the argument file is called ArgFile.txt and the arguments of interest are found in the 12th line (the first line is numbered 1, not 0), then one can do

    ./SimBit file ArgFile.txt 12

Instead of file, one can equivalently write F, f or FILE. If you want SimBit to read all the lines in the file, then write all (or just a) instead of the line number. For example:

    ./SimBit file ArgFile.txt a

The advantage of this technique is that SimBit will then ignore anything that follows the # sign on a given line, allowing the user to leave comments in the input file. See the section “Launching basic simulations” for examples.

4.7 Genetic architecture - The basic types of locus
The representation of the genetic architecture is a key factor affecting flexibility and performance. Indeed, different types of simulation require different representations of the genetic architecture in order to maximize performance. SimBit offers 5 types of locus that are referred to as T1, T2, T3, T4 and T5, which are defined below. Loci of different types are integrated on the same recombination map (see the options --L aka --Loci and --r aka --recombinationRate below). T1 and T5 loci are meant to perform the same types of simulations but differ in performance. T1 loci are meant to be used when there is high per-locus genetic diversity, while T5 loci are meant to be used when there is moderate to low genetic diversity per locus. It is hard to provide a good threshold value as it will depend on other elements of your simulations (the selection scenario, the recombination rate, etc.) and might even depend on the processor you are using.

4.7.1 T1 loci
T1 loci track binary variables (e.g. mutated vs wildtype). For each haplotype, SimBit keeps in memory an array of bits whose length is the number of T1 loci simulated. The nth bit indicates whether the nth T1 locus of this haplotype is mutated or not. T1 loci have high performance for simulations with very high per-locus genetic diversity.

4.7.2 T2 loci
T2 loci are meant to represent aggregate blocks of loci and count the number of mutations happening in each block. This type should be used only when 1) the genetic diversity per T2 locus is very high, 2) performance is a major concern, 3) you are satisfied with the limited selection scenario it can model and 4) a simple count of the number of mutations happening per T2 locus for each haplotype is a sufficient output for your needs.

4.7.3 T3 loci
T3 loci are quantitative trait loci (QTL) and code for an n-dimensional phenotype. The user can set the phenotypic effect of each T3 locus on each of the n axes of the phenotype, and this effect can also be made dependent on the environment in order to simulate a plastic response. A user can also add random developmental noise (drawn from a Gaussian distribution) in the production of a phenotype in order to reduce heritability. For T3 loci, the user can define a fitness landscape, and an individual’s fitness is given by its phenotype. In the current version, a T3 locus is coded as a single byte. It can therefore only take 2^8 = 256 different values (from -128 to 127). This is therefore only an approximation of a truly quantitative locus.
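To make the storage descriptions of sections 4.7.1 and 4.7.3 concrete, here is a minimal Python sketch. It is only an illustration of "one bit per T1 locus" and "one signed byte per T3 locus"; it is not SimBit's actual (C++) data structure, and all function and variable names are hypothetical.

```python
from array import array

# T1-style storage (section 4.7.1): one bit per locus.
# Here the bits of a haplotype are packed into a single Python integer.
def t1_set_mutated(haplotype_bits, locus):
    """Return the haplotype with the given T1 locus marked as mutated."""
    return haplotype_bits | (1 << locus)

def t1_is_mutated(haplotype_bits, locus):
    """Tell whether the given T1 locus is mutated on this haplotype."""
    return bool((haplotype_bits >> locus) & 1)

# T3-style storage (section 4.7.3): one signed byte per locus,
# hence 2^8 = 256 possible values, from -128 to 127.
T3_MIN, T3_MAX = -(2 ** 7), 2 ** 7 - 1

def t3_haplotype(values):
    """Store T3 allelic values as signed bytes.

    array('b') raises OverflowError for values outside [-128, 127];
    how SimBit itself treats such values is not covered by this sketch.
    """
    return array("b", values)

haplotype = t1_set_mutated(0, 3)
print(t1_is_mutated(haplotype, 3), t1_is_mutated(haplotype, 2))
print(T3_MIN, T3_MAX, list(t3_haplotype([-128, 0, 127])))
```

The point of the comparison is the memory cost per locus: a T1 locus costs one bit per haplotype and a T3 locus costs one byte, whatever the genetic diversity.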
4.7.4 T4 loci
For T4 loci, SimBit computes the coalescent tree of the population over time and adds the mutations onto the tree when the user asks for output. T4 loci are extremely fast when the recombination rate is low. T4 loci are inspired by Kelleher et al. (2018; already implemented in SLiM; Haller et al., 2018). T4 loci are necessarily neutral. The advantage of T4 loci over T1 or T5 loci comes from the computational time. Use T4 loci if 1) you want many neutral loci (of the order of, say, 10^5 at least) and 2) the recombination rate is relatively low (of the order of, say, 10^-7 on average between any two loci). T4 loci are extremely fast when dealing with recombination rates of the order of 10^-9 and lower, but will be much slower than T1 and T5 loci for cases of high recombination and cases with few loci. Note that variance in recombination rate throughout the genome typically helps make T4 faster. General advice: do not assume that one type is faster, but try it out for your parameters.

4.7.5 T5 loci
T5 loci are very similar to T1 loci. Two simulations with the same random seed, differing only by the fact that one uses T1 loci and the other uses T5 loci, will produce the same output. The big difference is how SimBit tracks their values. For each haplotype, SimBit has an array with the position of each T5 locus that is mutated. T5 loci tend to perform better than T1 loci for moderate to low genetic diversity. Behind the scenes, SimBit tracks separately the T5 loci that are under selection (which it calls T5sel) and the T5 loci that are neutral (which it calls T5ntrl) for improved performance. SimBit can also compress T5 loci (T5ntrl and/or T5sel) information in memory. Behind the scenes again, the compressed T5 loci are actually called T6 (T6sel and T6ntrl for selected and neutral loci, respectively), but I don't think you really needed to know that! Compression reduces the RAM usage by (up to) a factor of 2.
It can also increase or reduce CPU time depending on the simulation scenario. By default, SimBit makes this compression (on the neutral T5 loci only) only when it is certain it will improve performance (which is when the number of T5 neutral loci is between 10 and 2^16). For advanced users, it is also possible to ask SimBit to invert the meaning of some loci depending on their frequencies. For example, if locus 23 is fixed or quasi-fixed, then SimBit can invert the meaning of having the number 23 in a haplotype description. As a result, a haplotype would track this 23rd locus only if it carries the non-mutated allele. These advanced performance tweaks are explained in the section "performance options".

4.8 unif and A
In order to indicate input data in a convenient way, SimBit uses a number of different Modes of input. Each specific option has its list of Modes, but two Modes that appear over and over again are unif and A. A stands for "All entries", saying that you want to input as many values as needed. unif stands for "uniform", saying that you want all elements to be set to the same value. As an example, to set four patches

    --PN 4

or

    --PatchNumber 4

to a carrying capacity of 1000, you could do

    --N unif 1000

or equivalently

    --N A 1000 1000 1000 1000

Note that SimBit also understands scientific notation, with aeb standing for a×10^b. For example, 1e-4 is 0.0001 and 5.27e3 is 5270. As such, the above entry could also be written

    --N unif 1e3

4.9 seq, seqInt, rep and fromToBy
There are special keywords seq, seqInt, rep and fromToBy. When SimBit reads the input, it first splits the input by the option names (which start with a double dash, such as --T1_fit), and it then directly evaluates the keywords seq, seqInt, and rep. The keywords seq and seqInt are analogous to the function "seq" in R. They both expect three values: the "from" value, the "to" value and the "by" value.
seqInt is to be used for integer values while seq is for float values. For example, the input seqInt 5 17 2 can be read as "from 5 to 17 by 2" and is equivalent to 5 7 9 11 13 15 17. The keyword rep is analogous to the function "rep" in R. rep expects two values: the "whatToRepeat" value and the "howManyTimes" value. For example, the input rep 4 5 is equivalent to 4 4 4 4 4. It is also possible to feed a vector as the first argument. While the seq keyword expects numbers only, the rep keyword expects only the second argument to be an integer. The first argument can be any string. The keywords can be mixed at will. For example, 3 4 rep hello 3 0 seq 1 2 0.3 is equivalent to 3 4 hello hello hello 0 1.0 1.3 1.6 1.9. In older versions of SimBit a keyword R existed and was equivalent to the current rep. R is now deprecated.

The keyword fromToBy (or fromtoby or FromToBy) can only be used for output-related options (see section “Outputs”). fromToBy works exactly like seqInt except that it accepts the keyword end to indicate the last generation of the simulation. For example:

    --nbGenerations 60000 --fitnessStats_file fitFile fromToBy 0 end 100

asks for the output file showing the fitness summary statistics. The file name will be "fitFile" and the outputs will be printed every hundred generations, from generation 0 up to the last generation of the simulation (generation 60000).

4.10 Listing all options
If you run SimBit without giving it any arguments, then it will list all the available options.

4.11 SimBit Version
Every time you run the executable, SimBit prints its version (bottom right of the logo) in standard output.

5 Species, Generation and Habitat-specific options
Most options are species-specific, generation-specific and/or habitat-specific. SimBit uses the markers @S for species, @G for generation and @H for habitat (“at” symbol followed by either S, G or H). As such, to refer to, say, the 120th generation one would write @G120.
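As an aside on the keywords of section 4.9, the expansion of seq, seqInt and rep can be sketched in Python as follows. This is only an illustration of the rules stated in that section, not SimBit's implementation: the function name expand_keywords is hypothetical, only a single token is accepted as the first argument of rep, only positive "by" values are handled, and fromToBy is not covered.

```python
from decimal import Decimal

def expand_keywords(tokens):
    """Expand the keywords seq, seqInt and rep in a list of input tokens."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "rep":
            # rep whatToRepeat howManyTimes
            what, times = tokens[i + 1], int(tokens[i + 2])
            out.extend([what] * times)
            i += 3
        elif tok in ("seq", "seqInt"):
            # seq / seqInt from to by; Decimal avoids float rounding drift
            convert = int if tok == "seqInt" else Decimal
            start, stop, by = (convert(t) for t in tokens[i + 1:i + 4])
            if by <= 0:
                raise ValueError("this sketch only handles a positive 'by' value")
            value = start
            while value <= stop:
                out.append(str(value))
                value += by
            i += 4
        else:
            # Any other token is passed through unchanged
            out.append(tok)
            i += 1
    return out

print(" ".join(expand_keywords("seqInt 5 17 2".split())))
print(" ".join(expand_keywords("3 4 rep hello 3 0 seq 1 2 0.3".split())))
```

The two calls reproduce the two examples of section 4.9, up to the way the first value of the seq sequence is printed.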
If you want to simulate a single species, a single type of habitat (no environmental heterogeneity) and no change over time, then you do not have to bother with these @S, @G and @H markers. All these species-, habitat-, and generation-specific markers must come in order. For example,

    --N @G0 50 @G130 1000

is correct but

    --N @G130 1000 @G0 50

leads to an error message. Similarly, habitats, which are named by indices, must come in order. For example,

    --T1_fit @H0 unif 1 1 0.99 @H1 unif 1 1 0.9

is correct but

    --T1_fit @H1 unif 1 1 0.9 @H0 unif 1 1 0.99

leads to an error message. Finally, species are named, but you have to follow the ordering you used when naming these species. For example

    --species Quercus Fagus --N @SQuercus A 1000 @SFagus A 60000

is correct but

    --species Quercus Fagus --N @SFagus A 60000 @SQuercus A 1000

leads to an error message. Note that with the species markers, you can also use the species index instead of the species name. For example, you can do

    --species Quercus Fagus --N @S0 A 1000 @S1 A 60000

In general, it is quite intuitive which options are species-, generation-, and habitat-specific. All options regarding the fitness are both species-specific and habitat-specific. All options regarding the number of patches, the carrying capacity and the migration rates are both species-specific and generation-specific. All options regarding the genetic architecture, mating systems and fecundity are species-specific only. Note that while some options are meant to indicate species interaction (--eco aka --speciesEcologicalRelationships), it does not mean that the option is species-specific, in the sense that it does not take any @S marker.

Here is the entire list of options that are species-specific and generation-specific (but not habitat-specific)

    --N (aka --patchCapacity)
    --H (aka --Habitats)
    --m (aka --DispMat)

Here is the entire list of options that are species-specific and habitat-specific (but not generation-specific), followed by the entire list of options that are species-specific (but not habitat-specific or generation-specific)

    --T1_epistasis (aka --T1_EpistaticFitnessEffects)
    --T1_fit (aka --T1_FitnessEffects)
    --T2_fit (aka --T2_FitnessEffects)
    --T3_pheno (aka --T3_PhenotypicEffects)
    --T3_fit (aka --T3_FitnessLandscape)
    --T3_DN (aka --T3_DevelopmentalNoise)
    --nbSubGens (aka --nbSubGenerations)
    --T5_approximationForNtrl (aka --T56_approximationForNtrl)
    --T5_fit (aka --T5_FitnessEffects)
    --T5_compressData (aka --T5_compress)
    --L (aka --Loci)
    --fec (aka --fecundityForFitnessOfOne)
    --DispWeightByFitness
    --gameteDispersal
    --InitialpatchSize
    --cloningRate
    --selfingRate
    --matingSystem
    --additiveEffectAmongLoci
    --selectionOn
    --T1_mu (aka --T1_MutationRate)
    --T2_mu (aka --T2_MutationRate)
    --T4_mu (aka --T4_MutationRate)
    --T4_maxAverageNbNodesPerHaplotype
    --T5_mu (aka --T5_MutationRate)
    --T5_toggleMutsEveryNGeneration
    --T5_freqThreshold (aka --T5_frequencyThresholdForFlippingMeaning)
    --T3_mu (aka --T3_MutationRate)
    --r (aka --RecombinationRate)
    --recRateOnMismatch
    --FitnessMapInfo
    --indTypes (aka --individualTypes)
    --resetGenetics
    --indIni (aka --individualInitialization)
    --T1_ini (aka --T1_Initial_AlleleFreqs)
    --T5_ini (aka --T5_Initial_AlleleFreqs)
    --popGrowthModel
    --stochasticGrowth
    --swapInLifeCycle
    --readPopFromBinary
    --geneticSampling_withWalker
    --individualSampling_withWalker

Here is the entire list of options that are generation-specific (but not species-specific or habitat-specific)

    --PN (aka --PatchNumber)

Note that the number of patches is the same for all species so that we can identify them as being in the same patch. Simulating a species absent from a given patch is achieved by setting its carrying capacity for this patch to 0.

Finally, here is the entire list of options that are neither species-specific, generation-specific nor habitat-specific

    --seed (aka --random_seed)
    --printProgress
    --nbGens (aka --nbGenerations)
    --startAtGeneration
    --S (aka --species)
    --LogfileType
    --sequencingErrorRate
    --GP (aka --GeneralPath)
    --T1_vcf_file (aka --T1_VCF_file)
    --T1_LargeOutput_file
    --T1_AlleleFreq_file
    --Log (aka --Logfile and --Logfile_file)
    --T1_MeanLD_file
    --T1_LongestRun_file
    --T1_HybridIndex_file
    --T1_ExpectiMinRec_file
    --T2_LargeOutput_file
    --SaveBinary_file
    --T3_LargeOutput_file
    --T3_MeanVar_file
    --fitness_file
    --fitnessSubsetLoci_file
    --fitnessStats_file
    --T1_FST_file
    --T1_FST_info
    --extraGeneticInfo_file
    --patchSize_file
    --extinction_file
    --genealogy_file
    --coalesce (aka --shouldGenealogyBeCoalesced)
    --T4_LargeOutput_file
    --T4_vcf_file (aka --T4_VCF_file)
    --T4_SFS_file
    --T1_SFS_file
    --T4_printTree
    --T4_coalescenceFst_file
    --T5_vcf_file (aka --T5_VCF_file)
    --T5_SFS_file
    --T5_AlleleFreq_file
    --T5_LargeOutput_file
    --outputSFSbinSize
    --eco (aka --speciesEcologicalRelationships)
    --Overwrite
    --DryRun
    --centralT1LocusForExtraGeneticInfo
    --killOnDemand

If you feel the need to read more on that matter, please read the following three subsections. Otherwise, you should probably skip these subsections and go straight to section "Launching basic simulations".

5.1 Species-specific options
Most options are species-specific. All species must share the same geographic location. Therefore, the number of patches is not specific per species. However, the carrying capacity can vary among species. If you want a species to be absent from a patch, just set its carrying capacity to zero.
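The ordering rule for the @G and @H markers described at the start of section 5 can be sketched in Python as below. This is only an illustration under two assumptions: the function names are hypothetical, and the check requires strictly increasing indices, which matches the examples of the manual but may differ in detail from what SimBit actually enforces.

```python
import re

def marker_indices(entry, letter="G"):
    """Extract the indices of the @G (or @H) markers of an option entry, in input order."""
    pattern = "@" + re.escape(letter) + r"(\d+)"
    return [int(index) for index in re.findall(pattern, entry)]

def markers_in_order(entry, letter="G"):
    """Return True if the markers appear in strictly increasing order (assumed rule)."""
    indices = marker_indices(entry, letter)
    return all(a < b for a, b in zip(indices, indices[1:]))

print(markers_in_order("@G0 50 @G130 1000"))          # ordering accepted in section 5
print(markers_in_order("@G130 1000 @G0 50"))          # ordering rejected in section 5
print(markers_in_order("@H0 unif 1 1 0.99 @H1 unif 1 1 0.9", letter="H"))
```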
All options regarding the genetic architecture, selection scenarios and demography are species-specific. You need to name your different species with the option  This option is aka --S. Each species needs a unique name. By default, SimBit assumes a single species (called “sp”). The only species name that is not accepted is “seed” for reasons explained in the “Outputs” section.  5.2 Generation-specific options The following two examples are equivalent   and the two following examples are also equivalent.   Consider the following example  --species name1 name2 name3 … --nbGenerations 2000 --PatchNumber 1 --nbGenerations 2000 --PatchNumber @G0 1 --nbGenerations 2000 --PatchNumber @G0 1 @G1000 5 @G1200 1 --nbGenerations 2000 --PatchNumber 1 @G1000 5 @G1200 1 --nbGenerations 2000 --PatchNumber @G0 1 @G1000 3 @G1200 1 --N @G0 unif 100 @G500 unif 1500 Species  155 This command would indicate that temporal changes will occur at generations 0, 500, 1000 and 1200. From generation 0 to 500, there is 1 patch of 100 individual. From generation 500 to 1000, there is still one patch but with 1500 individuals. From generation 1000 to 1200 there are 3 patches of 1500 individuals each and from generation 1200 to generation 2000, there is one patch of 1500 individuals.  The following two examples are also equivalent     It simulates one patch up to generation 1000 and three patch then. The patch sizes are 10 up to generation 500, then it is 100 up to end of the simulation. I think our intuition want us to rewrite @G1000 A 100 100 100, but it is in fact not required as @G500 unif 100 means to apply unif 100 any time after generation 500.  Many options are both species-specific and habitat-specific or species-specific and generation-specific. For such cases, always indicate the species first, and then the habitat or generation. For example,   specifies two species that live over a single patch. 
The carrying capacity of wolf is 100 for the first 500 generations and then it increases to 200. The carrying capacity for rabbits is 2000 all the way through the simulation.  5.3 Habitat-specific options All options that relate to the selection scenario (such as --T1_fit for example) and phenotypes (--T3_pheno) are habitat-specific. A habitat has to be understood in its ecological definition. A habitat could also have been called an environment. Several patches may belong to the same habitat and the habitat a given patch belong to can change over time. Patches are associated to habitat and habitat associated to specific selection scenario allowing local and temporal variation in selection. Let me explain how. --nbGenerations 2000 --PatchNumber @G0 1 @G1000 3 --N @G0 A 10 @G500 A 100 @G1000 A 100 100 100 --nbGenerations 2000 --PatchNumber @G0 1 @G1000 3 --N @G0 A 10 @G500 unif 100 --species wolf rabbit --PatchNumber 1 --N @Swolf @G0 unif 100 @G500 unif 200 @Srabbit unif 2e3  156 To associate patches to habitat, use the following option  This option is a generation-specific and is also written as --H. One can therefore change the association between patch and habitat over time and therefore to change selection pressure over both space and time. Two modes are available, A and unif. When using mode A, the number of entries must be of the same length as the number of patches. Consider for example   This indicates that from generation 0 to 500 there is only one patch which belongs to habitat 0. From generation 500 to 1000, there are 3 patches, the first two patches as well as the last one belong to habitat 0 and the third patch belong to habitat 1. Note that habitat 0 must always exist and it is impossible to specify an habitat index without specifying the previous one. For example it is impossible to specify habitat 5 without specifying habitats 0, 1, 2, 3 and 4. If the option --Habitats is absent, SimBit assumes that all patches belong to habitat 0. 
This system of associating each patch to a habitat in a time-specific manner is a very convenient solution to indicate variation in selection pressures through time and space. For all options concerning selection, one must indicate for which patch a given selection scenario applies using the @Hx notation, where x is the habitat in question. More information about fitness related options in the section “Selection”. 6 Launching basic simulations SimBit requires a quantity of basic information to make a simulation. This information is the number of patches (--PN aka --PatchNumber), the number of individuals per patch (--N aka --PatchCapacity), the number of loci, their types and physical ordering on the chromosome (--L aka --Loci), the number of generations (--nbGens aka --nbGenerations), the mutation rate for the type of locus indicated to option --L (aka --Loci). Mutation rates for each locus of each type --T1_mu (aka --T1_MutationRate), --T2_mu (aka --T2_MutationRate), --T3_mu (aka --T3_MutationRate), --T4_mu (aka --T4_MutationRate), --T5_mu aka --T5_MutationRate), and the dispersal rate --m (aka --DispMat; only required if there is more than one patch). Finally, the recombination rate is set with --r (aka --RecombinationRate). --Habitats mode int --nbGens 1000 --PatchNumber @G0 1 @G500 4 --Habitats @G0 unif 0 @G500 A 0 0 1 0 Habitats  157 Note, by the way, that I call each different panmictic patch in a structured population, a patch and not a subpopulation or a deme. For a start we will consider the following example:  As explained already in section “How to give arguments to SimBit”, if you prefer to write your command on a file than directly in the terminal, you can do  The number 1 at the end indicates that SimBit must read the first line from the file “example1.txt”. This allows a user to gather a number of different commands in the same file. 
As explained above, one could also split that command over several lines (which has the added advantage that you can comment in the file) and then read the command with SimBit --nbGeneration 5000 --PatchNumber 4 --N A 100 100 100 100 --m Island 0.01 --L T1 3 --T1_mu A 0.00001 1e-8 1e-8 --r rate A 1e-6 1e-6 <In example1.txt> --nbGeneration 5000 --PatchNumber 4 --N A 100 100 100 100 --m Island 0.01 --L T1 3 --T1_mu A 0.00001 1e-8 1e-8 --r rate A 1e-6 1e-6 <In terminal> SimBit file example1.txt 1  158  This simulation should run in a few seconds. Some elements of the above command may make little sense to you so far, so let's go through it. The command asks for a simulation lasting 5000 generations with 4 patches of 100 individuals each. Migration follows a classical island model in which the probability of migrating is 0.01. Each haplotype is made of 3 loci of type T1. The mutation rate per locus is 0.00001 (10-5), 10-8 and 10-8, respectively. The recombination rate between any adjacent locus and the next is 10-6. The first option is quite straightforward, the argument of --nbGeneration is a single integer number.  The option is aka --nbGens The second option here indicates number of patches <In example2.txt> # Set the number of generations --nbGeneration 5000      # 5000 generations  # Set the number of patches --PatchNumber 4          # four patches  # Set the carrying capacity. --N A 100 100 100 100    # N=100 per patch  # Set the migration scenario --m Island 0.01          # island model  # Set the genome map --L T1 3                 # Three T1 loci  # Set the mutation rate --T1_mu A 0.00001 1e-8 1e-8    # Set the recombination rate --r rate A 1e-6 1e-6   <In terminal> SimBit file example2.txt all --nbGenerations int Number of generations  159  The option generation-specific and is aka --PN. It is a generation-specific option. The third indicates the carrying capacity for each patch  This option is species- and generation-specific aka --PatchCapacity. 
The two modes are A and unif. Above, I set all four patches at the same carrying capacity   Instead, I could have used unif  or I could have used rep   The following option sets the migration scenario  This option is species- and generation-specific and is aka --DispMat. There modes are A, LSS, OnePatch, Island (or island), LinearNormal. A is the most flexible mode of entry and expect the user to input the entire matrix of dispersal probabilities. More information about these different modes in table 1. By default --m is set to OnePatch. --PatchNumber int --N mode int(s) --N A 100 100 100 100 --N unif 100 --N A rep 100 4 --m mode value Number of patches Carrying capacity Dispersal  160 Table 1: Input format for migration scenario Mode Meaning Example A Input a PatchNumber x PatchNumber square matrix. With three patches, the fourth element is the element of the second row, first column of the matrix and hence indicate the probability of migrating from the second patch to the first. A 0.9 0.1 0 0 0.1 0.8 0.1 0 0 0.1 0.8 0.1 0 0 0.1 0.9 It simulates a 4 patches stepping stone model LSS It stands for 1D Linear Stepping Stones. Here the first element is the number of probabilities to expect next. Then is a vector of probabilities and finally is which element of this vector (zero based counting) corresponds to the probability of not migrating. The resulting dispersal matrix is corrected at the edge (reflective boundary effect). The input format is LSS NbProbabilities <probabilities> center LSS 4 0.05 0.05 0.75 0.15 2 indicates that the probability of not migrating is 0.75, the probability of migrating one patch on the left is 0.05, the probability of migrating two patches on the left is 0.05, the probability of migrating one patch on the right is 0.15 OnePatch It is the only possible entry when there is only a single patch. It is the default. --PN 1 --m OnePatch Island This creates a classical island model. One value is expected which is the probability of migrating. 
Example: island 0.01. It creates an island model where the probability of migrating from any patch to any other patch is 0.01.

Mode: LinearNormal
Meaning: This creates a 1D dispersal kernel that is approximated by a normal (Gaussian) function. The two entries expected are the standard deviation of this Gaussian distribution (in number of patches) and the number of standard deviations above and below which the probability of migrating will be approximated to zero.
Example: LinearNormal 2 4. It indicates a 1D Gaussian dispersal kernel with a standard deviation of 2 patches and that, beyond 4 standard deviations, the probability of migrating will be considered sufficiently low to be approximated to 0.

The next option sets the number of loci of each type as well as their ordering on the chromosomes

    Loci
    --L LocusType nbLoci LocusType nbLoci LocusType nbLoci …

This option is species-specific and is aka --Loci. For example, if you want 23 T3 loci, followed by 1000 T1 loci, followed by 12 T3 loci, you could input

    --L T3 23 T1 1e3 T3 12

Note that instead of T3 12, you could have T3 6 T3 4 T3 2, and it would be equivalent. Lower case t is also accepted (e.g. t1). You can even omit the Ts (e.g. --L 3 23 1 1e3 3 12) but it is not very human readable. The recombination rate between any of these loci, including the placing of loci on independent chromosomes, is indicated via the option --r (aka --RecombinationRate) presented below. Because there are 23 + 1e3 + 12 = 1035 loci, the option --r will expect 1034 entries. You will likely only need one type of loci and your input might simply look like

    --L T5 1e4

This input asks for 10,000 T5 loci.

The next option sets the mutation rate on T1 loci

    Mutation rate on T1 loci
    --T1_mu mode float(s)

This option is species-specific and is aka --T1_MutationRate. The two modes are A and unif. The options --T2_mu, --T3_mu, --T4_mu and --T5_mu are not present in the above code, but I mention them here for their similarity with --T1_mu.

    Mutation rate on T2 loci
    --T2_mu mode float(s)

    Mutation rate on T3 loci
    --T3_mu mode float(s)

    Mutation rate on T4 loci
    --T4_mu mode float(s)

    Mutation rate on T5 loci
    --T5_mu mode float(s)

All are species-specific and all can also be named --Tx_MutationRate (e.g. --T5_MutationRate).

The last option in the above example sets the recombination rate between any two adjacent loci (whatever their type)

    Recombination
    --r unit mode float(s)

This option is species-specific and is also written as --RecombinationRate. There are 3 possible units: rate, cM and M. rate means that values represent rates of recombination. cM indicates that values represent centiMorgans. M indicates that values represent Morgans (1 Morgan = 100 centiMorgans). Of course, there is not much difference between M and rate if distances are not too high (say below 0.1). Again there are two modes, A and unif. With A, the number of entries should equal the total number of loci minus 1. As an example

    --r rate unif 1e-7

sets the recombination rate between any two adjacent loci to 0.0000001. To indicate perfectly independent loci (a chromosome break), just give a rate of 0.5. If you are using cM or M, you can just say -1, and it will be understood as a chromosome break. Consider the following complex example

    --L T5 100 T3 3 T5 100 --r cM A rep 1e-5 99 -1 0.1 0.1 -1 rep 1e-5 99

It creates three independent chromosomes. Two of them are (almost) 0.001 cM in length and contain 100 T5 loci each (each T5 locus is at a distance of 0.00001 cM from the previous locus on this chromosome) and one chromosome is 0.2 cM long with three T3 loci (each T3 locus is at a distance of 0.1 cM from the previous locus on this chromosome). Note that it would not be quantitatively different to change the order of chromosomes as in the following example.

    --L T5 100 T5 100 T3 3 --r cM A rep 1e-5 99 -1 rep 1e-5 99 -1 0.1 0.1

7 Selection and phenotype

7.1 General concepts

The selection scenario can be set independently for each type of locus with options --T1_fit, --T1_epistasis, --T2_fit, --T3_fit and --T5_fit (T4 loci are always neutral; there is therefore no --T4_fit). Fitness effects among loci of different types are always multiplicative.

For some types of loci (see below), SimBit can make use of an assumption about the selection scenario that can provide substantial improvement in run time. I call this assumption the "multiplicative fitness" assumption (abbreviated "multfit"). The multiplicative fitness assumption assumes that fitness effects are multiplicative among loci too and that the fitnesses of the three possible genotypes are 1, 1 - s and (1 - s)^2. With this assumption, dominance coefficients are very close to 0.5 (additivity), especially for small selection coefficients. For example, if the double mutant homozygote fitness is 1 - t = 1 - 0.001, then h ≈ 0.5001. If 1 - t = 1 - 0.1, then h ≈ 0.51.

When taking advantage of the assumption of multiplicative fitness, SimBit partitions a haplotype into blocks and computes the fitness value for each block. If, during reproduction, no recombination event happens within a given block, then SimBit does not need to recompute the fitness for this specific block, as the fitness of the block can simply be multiplied by the fitness of the same block on the other haplotype. This technique yields substantial performance improvement in terms of CPU time (especially when the recombination rate within blocks is relatively low). SimBit does a decent job at choosing the size of blocks, but a user can have complete control over the block sizes with the option --FitnessMapInfo (see section "Performance options").
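As a quick numerical check of the dominance coefficients quoted for the multiplicative fitness assumption, here is a minimal Python sketch. It assumes the usual parameterization in which the heterozygote has fitness 1 - h t and the double mutant homozygote has fitness 1 - t. The function name implied_dominance is hypothetical and is not part of SimBit.

```python
import math


def implied_dominance(t):
    """Dominance coefficient h implied by multiplicative fitness.

    Assumes genotype fitnesses 1, 1 - s and (1 - s)**2, and defines
    t and h through (1 - s)**2 = 1 - t and s = h * t.
    """
    if not 0 < t < 1:
        raise ValueError("t must be strictly between 0 and 1")
    s = 1.0 - math.sqrt(1.0 - t)  # heterozygote selection coefficient
    return s / t


for t in (0.001, 0.1):
    print(f"t = {t}: h = {implied_dominance(t):.4f}")
```

Under these assumptions the two values of t used in the text give h close to 0.5001 and 0.51, and h approaches 0.5 as t gets smaller.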
Unless the exact dominance relationship is of central importance, it is generally recommended to make use of this assumption (especially when the recombination rate is low and when there is a fair number of loci).

All selection scenarios described below (including epistasis) are habitat-specific, hence allowing any kind of spatial and temporal variation in selection pressures (as a reminder, the matching between patches and habitats with option --H (aka --Habitats) is generation-specific). An example of spatial and temporal variation of selection scenario is provided at the end of the "Selection and phenotype" section.

By default, selection happens on fertility, but it can also be simulated on viability or on both fertility and viability.

    Selection on fertility and/or viability
    --selectionOn info

This option is species-specific. The "info" is either fertility, viability or both. Selection on fertility is faster than selection on viability. It is the default and recommended mode. As an example,

    --selectionOn viability

requires selection to be on viability and not on fertility.

7.2 T1

On T1 loci, a user can either set the fitness values of each of the three genotypes or take advantage of the multiplicative fitness assumption and only provide a single fitness value per locus.

    Selection on T1 loci
    --T1_fit mode float

This option is species- and habitat-specific and is aka --T1_FitnessEffects. The possible modes are A, domA, cstH, unif, multfitA (aka MultiplicityA), multfitUnif (aka MultiplicityUnif) and multfitGamma (aka MultiplicityGamma). Table 2 summarizes the different modes of entry. All modes starting with multfit (or "Multiplicity" for the alternative names) mean that you are willing to assume "multiplicative fitness", that is, genotype 00 (unmutated homozygote) has a fitness of 1, genotype 01 (heterozygote) has a fitness x given in input and genotype 11 (mutated homozygote) has a fitness x^2. Note that in the current version, it is impossible to make the multfit assumption for a given habitat but not for another one. It will raise an error message. There is no good reason for this limitation. If you need to get rid of it, you can ask me for help.

Table 2: Input format for selection on T1 loci

Mode: A
Meaning: Indicates the fitness of all three genotypes at all loci.
Example: --L T1 2 --T1_fit A 1 1 1 1 0.98 0.9

Mode: cstH
Meaning: First indicate a dominance coefficient "h", then indicate the keyword "hetero" or "homo". Then indicate for each locus the fitness of either the heterozygote or the double mutant homozygote (depending on the keyword). The fitness of the other genotype will be computed from h. The fitness of the double wild type is always 1.0.
Example: --L T1 3 --T1_fit cstH 0.5 hetero 0.95 0.95 0.9

Mode: domA
Meaning: First indicate either the keyword cst (stands for "constant") or fun, followed by a float number indicating the average dominance coefficient H. If cst is used, then all loci have the same dominance coefficient H. If fun is used, then the dominance coefficient at locus i is given by $h_i = e^{-k s_i}/2$, where $k = -\log(2H)/S$ and S is the average selection coefficient.
Example: --L t1 3 --T1_fit domA fun 0.2 0.95 0.98 0.9

Mode: unif
Meaning: Indicates the fitness components for all three genotypes and assumes that all loci are under the same selection scenario.
Example: --L t1 1e5 --T1_fit unif 1 0.98 0.6

Mode: multfitA
Meaning: Indicates the fitness component of the genotype 01 at each locus separately and makes the assumption of multiplicative fitness.
Example: --L T1 3 --T1_fit MultiplicityA 0.9 1.0 0.9

Mode: multfitUnif
Meaning: Indicates the fitness component of the genotype 01 for all loci and makes the assumption of multiplicative fitness.
Example: --L T1 1e6 --T1_fit multfitUnif 0.99

Mode: multfitGamma
Meaning: Just like multfitA or multfitUnif except that the fitness values are 1 minus the selection coefficients, which are drawn from a gamma distribution with parameters alpha and beta given by the user.
Example: --L T1 1e6 --T1_fit multfitGamma 0.111 2.22

For epistatic interactions use

    Epistasis
    --T1_epistasis loci <lociSet> fit <fitnesses> loci <lociSet> fit <fitnesses> …

This option is species- and habitat-specific and is aka --T1_EpistaticFitnessEffects. Note that --T1_epistasis is independent of --T1_fit. As such, a locus could be under both non-epistatic selection and epistatic selection. The effects would be multiplicative. It is up to the user to decide whether this is wanted or not. In such a case, SimBit will throw a warning message just to make sure you know what you are doing. Note also that one locus can belong to more than one set of loci for which an epistatic interaction is defined. Note also that epistasis is only available on T1 loci and not on T5 loci (mainly for performance reasons).

The user can specify any number of sets of loci that are in epistatic interaction and each set can contain any number of loci. If you specify n loci after the keyword loci, then SimBit will expect 3^n fitness values after the keyword fit. For a three-locus interaction, SimBit expects 3^3 = 27 fitness values. The first fitness value entered is the fitness for an individual that is homozygous wild type at all loci of the set. As an example

    --T1_epistasis loci 0 5 fit 1 0.9 0.8 0.9 0.9 0.9 0.8 0.9 1 loci 2 3 fit 1 0.9 0.8 0.8 0.9 1 1 0.9 0.8

In this example, the loci 0 and 5 have an additive by additive epistatic interaction while the loci 2 and 3 have an additive by dominance epistatic interaction.

7.3 T2

    Selection on T2 loci
    --T2_fit mode float(s)

This option is species- and habitat-specific and is aka --T2_FitnessEffects. Here there are only three modes: A, unif and gamma. In all cases, it assumes "multfit" for dominance effects (one may argue that the modes might better be renamed multfitA, multfitUnif and multfitGamma).

7.4 T3

For T3 loci, one must indicate how the genotypes match to phenotypes with

    T3 phenotype
    --T3_pheno @S0 int @H0 mode float @H1 mode float ... @S1

This option is species- and habitat-specific and is also known as --T3_PhenotypicEffects. The input format is a bit unusual here, as this option expects an integer value and then habitat-specific arguments, each with a mode and float numbers. This is the reason I included the @S and @H in the presentation of the option. The integer value is the number of dimensions of the phenotypic space. There are two modes, A and unif. For mode unif, the expected number of entries is the number of dimensions of the phenotype and all loci will have the same impact on the phenotype. For mode A, the expected number of entries is the number of dimensions times the number of T3 loci. For example

    --Loci T3 2 --T3_pheno 3 A 0 0 0.5 1.1 0.1 0.2

indicates a case where there are 2 T3 loci, the first locus affects only the last dimension of the phenotypic space and the second locus affects all dimensions. If the first locus has value -5 and the second locus has value 10, then the contribution to the phenotype for this specific haplotype (which will be added to the contribution of the other haplotype) is 11, 1, -0.5. The phenotypic value along the ith axis, $Z_i$, is therefore

$Z_i = \sum_{j=1}^{L} loc_{j,i} \, (A_{1,j} + A_{2,j})$

where L is the number of loci, $loc_{j,i}$ is the effect of the jth locus on the ith phenotypic axis and $A_{1,j}$ and $A_{2,j}$ are the allelic values at the jth locus of the alleles on the first and second haplotype, respectively.

Because --T3_pheno is a habitat-specific variable, one can model phenotypic plasticity (but not the evolution of the reaction norm in the current version). Here is a simple example involving a plastic response

    --Loci T3 2 --T3_pheno 1 @H0 A -0.5 -0.5 @H1 A 0.5 0.5

To match the phenotype to a fitness (fitness landscape), use the following option

    Selection on T3 phenotypes
    --T3_fit @S0 selectionMode @H0 entryMode mean gradient/omega @H1 ... @S1

This option is species- and habitat-specific and is also known as --T3_FitnessLandscape. Again, the input is a little unusual, hence the inclusion of @S and @H in the presentation of the option. The "selectionMode" can be simple or gauss. If "selectionMode" is simple, then the fitness component of a given phenotypic axis is calculated as a linear regression with 'mean' and 'gradient' (or slope) given after the "entryMode". Let D be the number of dimensions, $z_i$ be the phenotypic value along the ith axis, $z_{opt,i}$ be the optimal phenotype and $g_i$ be the gradient (or slope) along this same axis, then the fitness is defined by

$W = \prod_{i=1}^{D} \left( 1 - g_i \, |z_i - z_{opt,i}| \right)$

where |x| means absolute value of x. If for any dimension i the quantity $1 - g_i |z_i - z_{opt,i}|$ is zero or negative, then the fitness is set to 0. If "selectionMode" is gauss, then it expects an optimum (mean; $z_{opt,i}$) and a selection strength omega ($\omega_i$). Fitness is then given as

$W = \prod_{i=1}^{D} \exp\left( -\frac{(z_i - z_{opt,i})^2}{2\omega_i^2} \right)$

The two possible "entryMode" are A and unif. For unif, two values are expected (unif mean gradient or unif mean omega). For A, 2 x D values are expected (e.g. A mean1 gradient1 mean2 gradient2 or A mean1 omega1 mean2 omega2).

7.5 T4

T4 loci are necessarily neutral.

7.6 T5

Selection on T5 loci is set with the option --T5_fit. This option is species- and habitat-specific and is aka --T5_FitnessEffects. The modes for selection on T5 loci are very similar to those on T1 loci. The possible modes are A, domA, cstH, unif, multfitA (aka MultiplicityA), multfitUnif (aka MultiplicityUnif) and multfitGamma (aka MultiplicityGamma).
The only difference with T1 loci is that if you do not want to take advantage of the multfit assumption, then you specify only two fitness effects per locus, the fitness effect of the heterozygote individual and of the double mutant homozygote (in this order). The fitness of the homozygote wild type is always assumed to be 1.0 for T5 loci. The syntax is

    Selection on T5 loci
    --T5_fit mode float(s)

Just for practice, here is an example of a somewhat complex simulation

    SimBit --PN @G0 1 @G5e3 10 --H @G0 unif 0 @G5e3 A rep 0 8 1 1 --m @G0 OnePatch @G5e3 island 0.01 --N @G0 A 500 @G5e3 unif 50 --L T5 200 T1 10 T5 200 --T1_fit @H0 multfitUnif 1 @H1 multfitUnif 0.98 --T5_fit unif 0.99 0 --T5_mu unif 1e-6 --T1_mu unif 1e-6 --r cM unif 1e-6 --nbGens 15e3

Here, we ask for one patch for the first 5,000 generations and 10 patches in an island model afterward, up until the end of the simulation at generation 15,000. The total carrying capacity remains at 500 during the entire simulation, as the carrying capacity of the single patch of the first 5,000 generations is set at 500 and the carrying capacity of each of the 10 patches from generation 5,000 is set at 50. The genetic map is made of 10 T1 loci surrounded on each side by 200 T5 loci. The mutation rate is 10^-6 over all T1 and T5 loci and the recombination rate between adjacent loci is uniform at 10^-6. Throughout the entire simulation, T5 loci are always lethal in the double mutant homozygote state and have a selection coefficient of 0.01 in the heterozygote state (and a fitness of 1 in the non-mutated homozygote state). The selection on T1 loci varies with the habitat though. In habitat 0, there is no selection at all on T1 loci. In habitat 1, all T1 loci are under selection with fitnesses 1, 0.98 and 0.98^2 for the three genotypes. During the first 5,000 generations, the single patch is in habitat 0 (no selection on T1 loci). For the remaining 10,000 generations, the first 8 patches are in habitat 0 (no selection on T1 loci) and the remaining 2 patches are in habitat 1 (selection on T1 loci). The simulation runs in 7 seconds on my machine.

8 Demography and species ecology

Here I do not mean to talk much about the basic options --N and --m (see section "Launching basic simulations" for their uses). Instead, I want to talk about how change in patch sizes is modelled and how fecundity and selection, species interaction, and migration affect patch sizes. Note that I talk about carrying capacity to refer to the absolute maximal number of individuals in a patch and I talk about patch size to refer to the number of individuals in a patch (which can differ from the carrying capacity on the user's demand).

SimBit assumes non-overlapping generations (although different species can have different generation times) and assumes discrete patches (although patches can be made arbitrarily small, essentially mimicking continuous space). Outside of these two assumptions, SimBit can simulate very diverse types of demographies. SimBit can simulate any number of patches with any migration matrix (see option --m in section "Launching basic simulations"), any carrying capacity (see option --N in section "Launching basic simulations") and variation of the patch size from the carrying capacity based on realized fecundity with an exponential or logistic growth model (the growth model can be set for each patch independently; see more on that below). Each patch can be initialized at the desired size, and all of the above parameters can vary over time.

Dispersal can happen at the gametic or at the zygotic phase and may be a function of the patch mean fitness (hard vs soft selection). This is modified with the following option

    Hard or soft selection
    --DispWeightByFitness bool

This option is species-specific. If the migration rate is weighted by fitness, it simulates a case of hard selection; otherwise it is a case of soft selection. Note that when the fecundity differs from -1 (see below), then dispersal is necessarily weighted by fitness. Two entries are possible: f (or 0, false, FALSE or False) and t (or 1, true, TRUE or True). false is the default and means soft selection; true means hard selection. Choosing false (the default) might make the simulation a little bit faster, especially for simulations with lots of patches.

    Gametic or Zygotic dispersal
    --gameteDispersal bool

This option is species-specific. If the option is set to true, then gametes disperse and the offspring will live wherever the gametes meet. If set to false, then the offspring migrate. Concretely, if gametes disperse, the two parents can be from different patches. If offspring disperse, then the two parents must be from the same patch. Because offspring dispersal is computationally slightly faster (although often negligibly so) and is a standard in simulations, the default is false.

    Fecundity
    --fec float

This option is species-specific and is aka --fecundityForFitnessOfOne. By default, this number is set to -1. In such a case, the patch size is always at carrying capacity (it is like infinite fecundity). You could set the fecundity to an arbitrarily large value to obtain the same effect as -1 but that would (slightly) slow down the simulation. The fecundity is understood as a per individual measure. Take note that when using males and females, only the fecundity of females will affect the number of offspring produced (as long as there is at least one male in the patch). If we assume all fitnesses are at 1, then with hermaphrodites a fecundity of at least 1 will be necessary to replenish the entire patch of individuals (there can be stochastic variation that will cause on average a decrease in the patch size; see --stochasticGrowth). If you use a males-and-females mating system with a sex ratio of, say, 0.75 (3 males for 1 female; see --matingSystem), then you will need a fecundity of at least 4 to have subsistence.
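The subsistence thresholds quoted above can be reproduced with a back-of-the-envelope calculation. The Python sketch below assumes that all fitnesses equal 1, that growth is deterministic, and that only females (a fraction 1 - sexRatio of the patch) produce offspring. The function name min_fecundity_for_subsistence is hypothetical and is not part of SimBit.

```python
def min_fecundity_for_subsistence(sex_ratio=None):
    """Smallest fecundity that replaces every individual of a patch.

    sex_ratio is the proportion of males (as in --matingSystem fm);
    None stands for hermaphrodites, where every individual reproduces.
    Assumes all fitnesses are 1 and no stochastic growth.
    """
    if sex_ratio is None:
        return 1.0
    if not 0 <= sex_ratio < 1:
        raise ValueError("sex_ratio must be in [0, 1)")
    # N * (1 - sex_ratio) females must produce N offspring in total
    return 1.0 / (1.0 - sex_ratio)


print(min_fecundity_for_subsistence())      # hermaphrodites
print(min_fecundity_for_subsistence(0.75))  # 3 males for 1 female
print(min_fecundity_for_subsistence(0.5))   # even sex ratio
```

With a sex ratio of 0.75 only a quarter of the individuals are females, so each of them must produce 4 offspring on average, which matches the threshold given in the text.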
SimBit can simulate realistic changes in population size in response to patch mean fitnesses. Let's denote the expected number of offspring of a species s produced in patch p at time t as $\bar{P}_{t,s,p}$. Let's also denote the patch growth rate $r_{t,s,p} = f \sum_i W_i$ (written r below) as the product of f, the theoretical maximum fecundity of an individual having a fitness of 1.0 (set by the user), and $\sum_i W_i$, the sum of fitnesses in this patch. If the user allows the patch size to vary from the carrying capacity of this species and if, at time t, in patch p, for species s, the carrying capacity is set to $K_{t,s,p}$, then the expected number of offspring produced is

$\bar{P}_{t,s,p} = r N_{t,s,p}$

for the exponential model and

$\bar{P}_{t,s,p} = N_{t,s,p} + r N_{t,s,p} \left( 1 - \frac{N_{t,s,p}}{K_{t,s,p}} \right)$

for the logistic model, where $N_{t,s,p}$ is the size of the patch p of species s at time t. The actual number of offspring produced, $P_{t,s,p}$, can then either be set deterministically ($P_{t,s,p} = \bar{P}_{t,s,p}$) or stochastically ($P_{t,s,p} = Poisson(\bar{P}_{t,s,p})$). With more than one patch, the offspring produced are then spread out through migration. With a single patch (or in the absence of immigration and emigration for the patch p), $N_{t+1,s,p}$ is simply set to $P_{t,s,p}$.

Into the above framework, we can add the fact that different species can affect each other through their ecological relationships. This can be achieved through a "competition matrix" that implements a Lotka-Volterra model of competition and/or through an "interaction matrix" that implements a consumer-resource model (or predator-prey model) with a linear rate of resource consumption (introduction to these models in Otto & Day, 2007; discrete-time example of a predator-prey model in Çelik & Duman, 2009). Let $\alpha_{i,s}$ be an element of the "competition matrix" describing the competitive effect of species i on focal species s. The expected number of offspring produced is then given by

$\bar{P}_{t,s,p} = N_{t,s,p} + r N_{t,s,p} \left( 1 - \frac{\sum_i \alpha_{i,s} N_{t,i,p}}{K_{t,s,p}} \right)$
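The growth equations above can be sketched in a few lines of Python. This is only an illustration of the formulas, not SimBit's implementation: the function name expected_offspring and its arguments are hypothetical, the growth rate r is taken as given, and only the deterministic expectation is computed (no Poisson draw, no migration).

```python
def expected_offspring(n, r, model="logistic", k=None,
                       competitor_sizes=(), alphas=()):
    """Expected number of offspring produced in one patch.

    n: current patch size of the focal species
    r: patch growth rate
    k: carrying capacity (required for the logistic model)
    competitor_sizes, alphas: patch sizes of the other species and
        their competitive effects on the focal species (logistic only)
    """
    if model == "exponential":
        return r * n
    if model != "logistic":
        raise ValueError("model must be 'exponential' or 'logistic'")
    if k is None or k <= 0:
        raise ValueError("the logistic model needs a positive k")
    # The focal species competes with itself with alpha = 1.0, so its
    # own patch size enters the sum unweighted.
    crowding = n + sum(a * m for a, m in zip(alphas, competitor_sizes))
    return n + r * n * (1.0 - crowding / k)


print(expected_offspring(50, 1.2, model="exponential"))
print(expected_offspring(50, 0.5, k=100))
print(expected_offspring(50, 0.5, k=100,
                         competitor_sizes=(40,), alphas=(0.5,)))
```

In the last call, a second species of patch size 40 with a competitive effect of 0.5 on the focal species counts as 20 additional focal individuals, so the focal patch grows more slowly than in the call without a competitor.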
Note that competitive effects can only be set on species and on patches having logistic growth. Let $\beta_{i,s}$ be an element of the "interaction matrix" describing the effect of species i on species s. The interaction effect is added to the expected number of offspring produced

$\bar{P}'_{t,s,p} = \bar{P}_{t,s,p} + \sum_i \beta_{i,s}$

In this last equation, I assumed that all effects $\beta_{i,s}$ are independent of the patch sizes of both the causal and recipient species, but in practice a user can specify for each $\beta_{i,s}$ whether the effect should be multiplied by the causal species patch size ($N_{t,i,p}$), by the recipient species patch size ($N_{t,s,p}$) or by both. SimBit enforces that all the diagonal values $\alpha_{s,s} = 1.0$ and that all the diagonal values $\beta_{s,s} = 0.0$ by forcing the user to input the keyword self (see below).

The growth model can be set independently for each patch with the option

    Growth model
    --popGrowthModel mode values

This option is species-specific. The two possible modes are unif and A. The value(s) accepted are either the keyword "logistic" (aka -2), the keyword "exponential" (aka -1) or any positive integer value. A positive integer value means that you want a logistic growth but you want to set the carrying capacity for this growth calculation to some other value than the upper limit set by option --N. This is a neat way to allow more realistic demographics such as overshooting of the carrying capacity. Note that the patch size can never be greater than what is set with option --N. By default, the growth model is logistic. The growth model only matters if the fecundity (set in option --fec) differs from -1.

    Stochastic growth
    --stochasticGrowth bool

If set to true (or an equivalent such as 1, t, T, ...), then the number of individuals in the patch at the next generation is drawn from a Poisson distribution as explained above. By default, it is set to false.

    Species ecology
    --eco interaction <matrix> competition <matrix>

This option is aka --speciesEcologicalRelationships. The terms interaction and competition are keywords that precede the interaction matrix and the competition matrix, respectively. The ordering of the elements of the interaction and competition matrices follows the ordering of the species as entered in option --species (aka --S). For the competition matrix, SimBit expects S^2 values $\alpha_{i,j}$, where S is the number of species. For the effect of a species on itself (that is, for all the diagonal elements), SimBit expects the keyword self. Each element of the interaction matrix is made of two entries. The first entry is the "type" (either A, B, C, D or 0) and the second entry is the $\beta_{i,j}$ value. Just like for the competition matrix, for the effect of a species on itself, SimBit expects the keyword self (no letter, no $\beta_{i,j}$, just write self). The "type" indicates whether the interaction effect must be multiplied by the causal species patch size $N_{t,i,p}$ (type B), by the recipient species patch size $N_{t,s,p}$ (type C), by both (type D) or by none (type A). The type 0 (the number zero, not the letter o) means no interaction (which is the default).

If you want to use the default matrix, just enter default as matrix description. The default input is therefore interaction default competition default, which simulates different species that are completely independent.

9 Mating system

The option --cloningRate is species-specific. It sets the proportion of reproduction happening through cloning. Note that while cloning, mutations are still happening, so that an offspring might not be exactly a clone of its parent.

The option --selfingRate sets the proportion of reproduction happening through selfing. When using a cloning rate different from zero, the selfing rate is the selfing rate for offspring that are not produced via cloning. A selfing rate of -1.0 (default value) means that selfing occurs just like it does in a Wright-Fisher population, that is at frequency 1/N. Note that cloningRate has precedence over selfingRate.
This means that if you use both options, then the selfing rate is conditional on no cloning having happened. For example, if you set the cloningRate to 1, then no selfing (or any other sort of sexual reproduction) will ever occur. If you set the cloningRate to 0.5 and the selfing rate to 0.1, then 50% of the offspring will be made through cloning, 5% through selfing (because 10% of 50% is 5%) and the other 45% through normal reproduction. The syntax of these two options is

    Cloning
    --cloningRate float

    Selfing
    --selfingRate float

The mating system is set with

    Mating System
    --matingSystem system (sexRatio)

There are two possible mating systems: h (or H) for hermaphrodites, and fm (or mf, FM, MF) for males and females. Hermaphrodite is the default setting. If you specify males and females, you will then need to specify the sex ratio (the proportion of males) in the population. For example

    --matingSystem fm 0.5

sets all species to a males and females mating system with an even sex ratio.

It is also possible to vary generation time between species. This is achieved with

    Sub-generations
    --nbSubGenerations int

This is a species-specific option and is aka --nbSubGens. A "SubGeneration" is a generation within a generation. This allows you to simulate a species that has, say, 4 generations every time other species have one generation. The number of "SubGenerations" per generation must be an integer. For example

    --nbSubGenerations @S0 1 @S1 2

would cause species 1 to have two generations for every generation of species 0. It would make little sense to input something like

    --nbSubGenerations @S0 3 @S1 6

or

    --nbSubGenerations 3

Instead, just triple the number of generations given to option --nbGenerations. By default, of course, the number of sub-generations per generation is set to 1.

10 Defining individual types

As a user, you can define what I call "individual types". An individual type is an abstract individual for which you have fully specified the genome. You can then use those individual types to initialize the population (with option --individualInitialization; see section "Initial Population" below) or to reset the genetics of the population during run time (with option --resetGenetics; see section "Reset genetics" below).

    Individual types
    --individualTypes ind <individualName> haplo0 <haplotype description> haplo1 <haplotype description> ind <individualName> ...

This option is species-specific and is aka --indTypes. Each individual description starts with the keyword ind. It is then followed by the name of this individual type, by the keyword haplo0 with its haplotype description and by the keyword haplo1 with its haplotype description. It is also possible to do ind <individualName> bothHaplo <haplotype description> if both haplotypes are to be identical (perfect homozygosity). As an example, the haplotype description T1 0 0 0 0 1 indicates that the first 4 T1 loci carry the wildtype allele while the last locus carries the mutated allele. Let's say we want to define an individual type named "wildType" and another named "mutant" that are fixed for the 0 and the 1 allele, respectively. We would do

    --L T1 50 --individualTypes ind wildType bothHaplo T1 rep 0 50 ind mutant bothHaplo T1 rep 1 50

Note that you cannot do bothHaplo T1 unif 1 but need to use rep instead if you do not want to write the number 1 fifty times. A haplotype description must be given for each type of locus that you asked for with option --L (--Loci). For example, if you asked for the types T1, T3 and T5, then you can simply do T1 <T1 loci description> T3 <T3 loci description> T5 <T5 loci description>. Note that the ordering does not matter (T5 <T5 loci description> T1 <T1 loci description> T3 <T3 loci description> is also valid). Table 3 explains how to provide a description for each type of locus.

Table 3: Input format for individual types

Locus type: T1
Entry: Indicate the value of each locus
Example: T1 0 1 0

Locus type: T2
Entry: Indicate the number of mutations in each T2 block
Example: T2 0 0 12 2

Locus type: T3
Entry: Indicate the value of each QTL
Example: T3 0 0 5

Locus type: T4
Entry: Sorry, --indIni is currently not able to set T4 loci

Locus type: T5
Entry: Indicate the position of each mutation
Example: T5 0 23

11 Initial Population

By default, the patch size is initiated at carrying capacity (set by --N; aka --PatchCapacity). To set the initial patch size for each patch, use the following option

    Initial patch size
    --InitialpatchSize mode ints

The two possible modes are A and unif. Of course, no initial patch size can be larger than the carrying capacity at generation 0.

One can use individual types (defined with option --individualTypes; see section "Defining individual types" above) to initialize a population with the following option

    Initialization with individual types
    --individualInitialization patch0 <indTypeName> <nbIndividuals> <indTypeName> <nbIndividuals> .. patch1 <indTypeName> <nbIndividuals> ...

This option is species-specific and is aka --indIni. Here, use the keywords patch0, patch1, ... patchx to describe how to initialize each patch. After each patch name, give an individual type and the number of individuals of this type to put in this patch. SimBit will ensure that the number of individuals that you put in a patch is equal to the patch size at the beginning of the simulation. (Reminder: if the initial patch size is not specified, then it is set to the carrying capacity.)

Alternatively, one can use the following options to initialize T1 loci and T5 loci independently. While they can be useful, these options are less flexible than using individual types. By default, all T2 loci are set to carrying 0 mutations and all T1 and T5 loci are set to 0. It is possible however to indicate the per patch allele frequency with --T1_ini (aka --T1_Initial_AlleleFreqs) and --T5_ini (aka --T5_Initial_AlleleFreqs)

    T1 initialization
    --T1_ini mode (value)

    T5 initialization
    --T5_ini mode (value)

These options are species-specific. The modes are AllOnes, AllZeros, A and Shift. The parentheses around "value" above indicate that the input of values depends on the mode. See table 4 for more information. For example,

    --L T1 2 --PN 3 --T1_ini A 0 0.2 0.5 0.2 1 0.2

will set the allele frequency at the zeroth locus to 0, 0.5 and 1 in the three different patches, and the frequency at the locus of index one will be set to 0.2 in all patches. For more flexibility in initialization, please use the option --indIni (--individualInitialization) instead.

Table 4: Input format for --T1_ini and --T5_ini

Mode: AllZeros
Meaning: Set all loci to 0.
Example: AllZeros

Mode: AllOnes
Meaning: Set all loci to 1 (this mode is not available for T5 as it is a bad idea for performance reasons).
Example: AllOnes

Mode: A
Meaning: Specify the per patch and per locus allele frequency. The initialization will be done in a pseudo random manner in order to keep LD low.
Example: A 0 0 0.2 0.2 0 0

Mode: Shift
Meaning: Specify a given patch before which allele frequencies (at all loci) are set to 0 and after which allele frequencies (at all loci) are set to 1.
Example: Shift 12

Another option is to directly import a population saved in a binary file from a previous simulation. For this, use the option --ReadPopFromBinary. In order to read a binary file correctly, SimBit must know what it is reading. For this reason, it is essential to specify correctly the number of loci of each type, the number of patches and the number of individuals per patch with the usual --PatchNumber, --N and --L options.
Note that in the current version, T4 loci cannot be dumped into a binary file and therefore can be read from it either. 12 Reset genetics It is often useful to make arbitrary modification to the genetics of a population during runtime. It is common for example to introduce mutations at specific times in order to track its evolution. One might also want to introduce individuals into the populations that come from a fictional, non-simulated infinite population. Such features can be done with the following option. --ReadPopFromBinary file Import population  178  This option is species-specific. There are two types of events (“eventType”) called eventA and eventB. eventA refers to the input of specific mutations. eventB refers to the input of individual types (see section “Individual types”). For eventA, the “eventDescription” is structured as   Directly after the keyword eventA comes the generation and then the trait type to be affected in this event (T1, T2, T3 or T5; T4 not accepted). If the trait type is T2 or T3, then SimBit will just reset the designated loci/individuals/patch to zero (set the number of mutation in the T2 locus to zero). If the trait type is T1 or T5, then an extra specification (“typeOfMutations”) must be given to describe the type of mutations. There are three possible types of mutations setTo0, setTo1 and toggle.  It is followed by loci list information (“lociListInformation”). The user can either say allLoci and all loci will be affected or list the loci with the keyword lociList. For example to affect loci indices 0, 4 and 7, indicate lociList 0 4 7. Afterward comes the haplotypes information. It must start with the keyword haplo and be followed by either 0, 1 or both to indicate on which set of chromosomes (individuals are diploid as a reminder) the mutation will happen (e.g. haplo both) Finally comes the patch and individual information. It must be formatted as patch 0 <individuals information> patch 4 <individuals information>. 
To affect all individuals within a patch, use the keyword allInds and to affect only specific individuals within a patch use the keyword indsList (e.g. indsList 1 2 3 4 5). For example to affect all individuals of patch 0 and only individuals 10, 20 and 25 of patch 3, you would write patch 0 allInds patch 3 10 20 25. Let's do a full example. Let's assume we would like to simulation a selective sweep. We want only one mutation on the 11th T1 locus (index 10 by zero based counting) of the first (index 0) individual in the third (index 2) patch to happen at generation 100. We could do --resetGenetics <eventType> <eventDescription> <eventType> <eventDescription> eventA generation TraitType (typeOfMutationsIfNeeded) <lociListInformation> <haplotypesInformation> <patchAndIndividualInformation>. Reset genetics  179  Note that the genetic reset will happen at the end of the specified generation but just before writing the outputs for this generation. So, in the above example, the offspring of the generation 500 (that is the parents of the generation 501) are going to be reciprocally fixed at loci 10, 15 and 20. If the fecundity (--fec) is not set to -1, it is possible that the patch size differs from the carrying capacity. If an individual does not exist, SimBit can of course not mutate an inexistent individual. If you want to simulate a single mutation, it is therefore more strategic to use individuals with a low index. For eventB, the input is formatted as  If the patch size differs from the carrying capacity, SimBit will start by adding the individual types in the patch until it reaches carrying capacity, then only will SimBit start to replace currently existing individuals with the individual types. For example, imagine you want to model genetic rescue of a small population. For this, imagine you want to model the immigration of individuals coming from two populations of infinite sizes. 
Let’s say we want to introduce 5 migrantTypeOne and 5 migrantTypeTwo at generations 10, 20 and 30. You define individual types migrantTypeOne and migrantTypeTwo with option --indTypes and then you introduce them with   13 Output  This option is aka --GP. indicates the path where all the output will be printed. Other paths relative to outputs are all relative to the "GeneralPath". SimBit does not add a --resetGenetics eventA 500 T1 setTo1 lociList 11 haplo both patch 2 indsList 0 eventB generation <indTypeName> <nbInds> <indTypeName> <nbInds>… --resetGenetics     eventB 10 migrantTypeOne 5 migrantTypeTwo 5     eventB 20 migrantTypeOne 5 migrantTypeTwo 5     eventB 30 migrantTypeOne 5 migrantTypeTwo 5 --GeneralPath path General path  180 terminal "/" to the path. So, if your path does not end with a "/", then the characters after the last "/" are taken as a prefix of all output files. There is no default for this option, so you have to input at least an empty string if you wish any outputs, otherwise an error will be thrown. In all of below outputs you have to input a filename that comes after the "GeneralPath". There are two important keywords; nfn and NFN, which stand for "No File Name". If you indicate nfn (or NFN) as filename for a specific output, then the file will have a specific name (but only a specific extension and generation-specific information or other specific information). This is handy because it allows the user to give a standard name for files directly in the "GeneralPath". For example  will create the same output as  There are 3 general classes of outputs; the logfile, a binary file of the population, and various user-friendly (tab separated values and VCF) outputs. Each input requires first a file name and then, if applicable, timing of when the output must be produced. Please don’t use spaces in the names of files or directory as this might eventually be misinterpreted. 
Outputs that are species-specific are automatically produced for each species, and the file name is then preceded by the species name. For all the outputs provided, you can ask for a version of these outputs containing sequencing errors.  Using this option with a rate different from 0.0 will cause SimBit to produce all outputs as asked (original data without sequencing error) plus an extra set of outputs with simulated sequencing errors. The files with sequencing error will have the string "_sequencingError" added to the file name just before the extension. --GeneralPath /path/to/directory/firstSimulation --T1_vcf_file nfn 200 300 400 500 --T1_AlleleFreq_file nfn 500 --GeneralPath /path/to/directory/ --T1_vcf_file firstSimulation 200 300 400 500 --T1_AlleleFreq_file firstSimulation 500 --sequencingErrorRate rate Sequencing errors  181 13.1 Logfile The 'logfile' can be used to 1) remember what input has been given to SimBit and 2) to make sure SimBit interpreted the input as we wished although this second usage might not be for beginners.  SimBit will automatically add the extension '.log' to your filename. By default, the filename is 'logfile.log', that is the default entry is  It is possible to specify the type of logfile we want  Three possible entries are possible; 0,1 and 2. The default value is 0. See table 5 for more information. Table 5: Input format for logfile type Entry Meaning 0 No Logfile is being printed 1 Logfile contains only the arguments that SimBit has received through the command line 2 Logfile contains the arguments that SimBit has received through the command line and all the parameters that have been set for the simulation. Logifile might be very big! For all file outputs for which generations at which we need the outputs must be indicated, one can either write out every generation or use the keyword fromtoby (or fromToBy or FromToBy) to indicate a sequence. 
For examples  is equivalent to --Logfile filename --Logfile logfile --LogfileType int --nbGens 100 --T1_vcf_file filename 10 20 30 40 50 60 70 80 90 100 Logfile Logfile type  182  and is also equivalent to  and to  fromtoby is therefore very similar to seqInt. The two differences are 1) fromtoby only works for outputs (options ending in _file) and 2) fromtoby can accept the keyword end to specify the last generation of the simulation. In all cases, the first value is the start of the sequence, (from), the second value is the end of the sequence (to) and the third value is the increment (by). One can also mix up these methods. For example  can be rewritten  13.2 Export population to a binary file It may be of interest to export the population in a binary file. The main reason why you would want to do that would be to reuse the saved population as starting population for another simulation (with the option --readPopFromBinary). It can also be useful to recalculate statistics about this population later via SimBit by simulating 0 generation or (for advanced users) by directly treating the data yourself from a format that takes very little storage. To export the population in a binary file use the following option --nbGens 100 --T1\_vcf\_file filename seqInt 10 100 10 --nbGens 100 --T1_vcf_file filename fromtoby 10 100 10 --nbGens 100 --T1\_vcf\_file filename fromtoby 10 end 10 --T1_vcf_file filename 1 10 20 30 40 50 60 70 80 90 100 107 200 300 400 500 550 600 650 700 750 800 850 900 950 1000 --T1_vcf_file filename 1 fromtoby 10 100 10 107 fromtoby 200 500 100 fromtoby 550 1000 50  183  such as for example  save the files "HomoSapiens_MyBin_G50" and "PanTroglodytes_MyBin_G50", "HomoSapiens_MyBin_G100" and "PanTroglodytes_MyBin_G100", "HomoSapiens_MyBin_G500" and "PanTroglodytes_MyBin_G500" at the generations 50, 100 and 500, respectively. At each generation it will also save a binary file with the seed information. 
Its name is just like the above except species name is replaced by "seed". This also mean by the way that the "seed" cannot be used as a species name (SimBit will send an error message if you try). 13.3 User friendly outputs Outputs that are specific to certain type of loci have it clearly indicated in their name (e.g. --T1_FST_file or --T3_MeanVar_file). Just like before the output start with the name of the file (or the keywords nfn or NFN), followed by the generations at which the output is requested. Some options can also take a subset keyword that will allow the user to indicate the set of loci over which the output should be computed. For example:  It will output data at generations 50 and 100 for loci 0 2 4 6 8 10 12 and 14. For these outputs that can take a subset keyword, you can give the option several times to ask for different subset. For example  When using the same option several times, it is the user's responsibility to give different names to these files otherwise the data will be confounded in the same file in ways that can be confusing. The option --T1_fitness_file does not accept --SaveBinary_file file generations --S HomoSapiens PanTroglodytes --SavePopBinary MyBin 50 100 500 --T1_FST_file evenLoci 50 100 subset 0 2 4 6 8 10 12 14 --T1_FST_file evenLoci 50 100 subset 0 2 4 6 8 10 12 14 --T1_FST_file oddLoci 50 100 subset 1 3 5 7 9 11 13 15 Binary file  184 the subset keyword because a more advanced subsetting solution exists already via the option fitnessSubsetLoci_file. 13.3.1 Outputs for T1 loci  This outputs the complete genotype of every individual in a TSV (Tab separated Values). This can be a very large file (hence the silly name). The extension ".T1LO" is added to the filename. For example,  will print the large outputs for generations 100, 200, 300, 400 and 500.  This outputs a VCF (Variant Call Format) file. This can be very handy for usage with the command line vcftools. 
With the help of the software PGDspider, one can reach almost any commonly used file format from this VCF file. Some software using VCF files do not allow that locus or chromosome to have an index of 0. Hence, unlike everywhere else in SimBit, the first locus has index 1 and the first chromosome has index 1. This leads to a shift of 1 when comparing outputs of, say .T1LO files with .vcf files. The extension .T1vcf is added to the filename. This option accepts the subset keyword.  This outputs the allele frequency at each locus for each patch. The extension .AlleleFreq is added to the filename. This option accepts the subset keyword.  This outputs FST measures (Weir and Cockerham, as well as Nei estimates for both averaging over all loci and as the ratio of the averages of numerator and denominator over all loci). This output file has not really been tested yet. Please consider it with --T1_LargeOutput_file file generations --T1_LargeOutput_file mySimulation 100 200 300 400 500 --T1_vcf_file file generations --T1_AlleleFreq_file file generations --T1_FST_file file generations Outputs: T1 Large Outputs: T1 VCF Outputs: T1 Allele freqs Outputs: FST  185 precaution. The extension .T1FST is added to the filename. More info can be given via --T1_FST_info. This option accepts the subset keyword.  Gives info about what patch comparisons must be performed for FST calculations. It is easier to explain with examples. If you input --T1_FST_info allInteractions 2, then SimBit will output all pairwise FST measures. If you input --T1_FST_info allInteractions 3, then SimBit will output FST measures for all possible triplets of patches.  It outputs the within and among chromosomes average linkage disequilibrium per patch. The extension .MeanLD is added to the filename. This option accepts the subset keyword.  It outputs the longest run (or longest consecutive series) of 0 or longest run of 1 for each haplotype (of each individual in each patch). 
The extension .LR is added to the filename. This option accepts the subset keyword.  It outputs the hybrid index of each individual. Here, I call the hybrid index of an individual, the fraction of T1 loci of this individual that carry the "1" allele. This is a helpful statistic when used alongside --T1_ini. It allows the user to specify specific patches where all individuals are fixed for the "0" allele and other patches fix for the "1" allele and see how they interbreed through time. With selection against heterozygotes, you can have some barrier to gene flow. This option accepts the subset keyword.  --T1_FST_info nbPatchesToConsider --T1_MeanLD_file filename generations --T1_LongestRun_file filename generations --T1_HybridIndex_file filename generations --T1_ExpectiMinRec_file filename generations Outputs: FST info Outputs: T1 Mean Linkage Disequilibrium Outputs: T1 Longest Run Outputs: T1 Hybrid Index Outputs: T1 Average minimal number recombination events  186 This outputs the average (averaged among all haplotypes within each patch) number of times a run of zero is stopped by a run of 1 or vice versa. For example the haplotype '1111011111100000000' has 3 such events. The extension .EMR is added to the filename. This option accepts the subset keyword. 13.3.2 Outputs for T2 loci  This outputs all genotypes of all individuals. This can be a very large file. The extension .T2LO is added to the filename. 13.3.3 Outputs for T3 loci  This outputs all genotypes of all individuals (but not the phenotypes). This can be a very large file. The extension .T3LO is added to the filename.  This outputs the mean and variance in phenotype (along each dimension of the phenotypic space) per patch. 13.3.4 Outputs for T4 loci For T4 loci, there are the following two types of output. They work exactly like the T1 outputs with the same name except that they can't take the subset keyword.   
The option --T4_printTree outputs the entire Ancestral Recombination Coalescence Tree for the T4 loci. Note that the tree is being outputted each time the current states are being computed, which is each time you asked for some T4 outputs --T2_LargeOutput_file filename generations --T3_LargeOutput_file filename generations --T3_MeanVar_file filename generations --T4_LargeOutput_file filename generations --T4_vcf_file filename generations Outputs: T2 Large Outputs: T3 Large Outputs: T3 MeanVar Outputs: T4 Large Outputs: T4 VCF  187 and each time the average number of nodes per haplotype overpass the limit set by the option --T4_maxAverageNbNodesPerHaplotype. Interpreting several trees might be tricky. Hence, if you are not asking for any specific T4 output before the last generation, you might want to set --T4_maxAverageNbNodesPerHaplotype to a very large number (like --T4_maxAverageNbNodesPerHaplotype 1e9 for example) to make sure the current states will never be computed before the very end of the simulation. That might affect performance though if the recombination rate is relatively large.  13.3.5 Other Outputs  it outputs the fitness of every individual in the population. The extension .fit is added to the file name.  This option is very similar to --fitness_file except that it allows outputting fitness for specific subset of the genome. The option is hence species-specific and allows definition of an infinite number of subset of the genome (called LociSet) that a user may want. The argument comes like other output arguments with the filename followed by the time at which output must be produced. What follows the time indication is a little bit unusual. First, you have the species-specific markers. Then, you can create an indefinite number of sets of loci from which fitness will be computed. Each set starts with the keyword LociSet.  After this keyword, you specify what types of loci you are willing to consider and their associated indices. 
The four possible types are T1 T2 T3 and T1epistasis. You can specify several types per set if you want. For example  --T4_printTree filename --fitness_file filename generations --fitnessSubsetLoci_file filename generations @S0 LociSet T1 ints T2 ints T3 ints T1epistasis ints LociSet ints ... @S1 LociSet T2 ints ... --fitnessSubsetLoci_file myFile LociSet T1epistasis 0 3 4 T1 0 5 10 T3 0 1 2 3 LociSet T1epistasis 0 1 2 3 4 Outputs: T4 Tree Outputs: Fitnesses Outputs: Fitnesses subset of loci  188 will lead SimBit to consider two sets of loci. One for which it will compute normal selection on the T1 0 5 and 10 as well as epistatic selection on T1 loci 0 3 and 4 and selection on T3 loci 0, 1, 2 and 3. The second set only computes the epistatic selection on loci 0, 1, 2, 3 and 4. Note that for both LociSet, the 4 components of fitness (T1Fitness, T2Fitness, T3Fitness and T1epistasisFitness) are printed. Hence, it would serves no purpose to add a third LociSet (LociSet 1 0 5 10) as this information is already contained in the first LociSet. The example lack any species-specific marker and therefore assumes either a single species or that the same LociSet are required for all species.  This outputs the fitness mean and variance per patch. The extension .fitStats is added to the filename.  This outputs the number of individuals in each patch. The extension .patchSize is added to the filename.  This outputs the extinction time (if it applies) for every species.  It outputs the entire genealogy between the two time points indicated (expects only two time points). Because, this represents a lot of data, SimBit does not keep the entire genealogy in the RAM but prints it out in a temporary file that it later merges together into a single file. Because a lot of files are being printed during the simulation, we recommend that you indicate a directory to SimBit, Something like  At the end of the simulation, there is a single file left. 
Each generation gives a line that looks like "G_1005 P0I0_P0I34_P3I102 P0I1_P0I121_P0I97 etc...". "G_1005" --fitnessStats_file filename generations --patchSize_file filename generations --exctinction_file filename generations --genealogy_file filename generations --genealogy_file familyTree 1000 5000 Outputs: Fitness stats Outputs: Patch sizes Outputs: Extinction Outputs: Genealogy  189 indicates the generation (generation 1005), and is followed by tokens with three values separated by "_" such as "P0I0_P0I34_P3I102". The first one is ID for the offspring and the last two are IDs for the two parents. "P0I0_P0I34_P3I102" means that the individual index "0" of patch index "0" is the descendent of a parent from patch index "0" individual index "34" and from a parent from patch index "3" (a migrant) individual index "102". I would not quite call it a user-friendly output, but this format can actually become quite handy especially that it allows matching individuals as identified here with their other attributes (such as their fitness) as given by other output files.  The option --coalesce allows SimBit to directly compute the coalescent tree from the genealogy files. In other words, it removes all individuals that did not leave offspring at the last generation sampled for the genealogy.  It takes a single value which is either 0 (which means don't coalesce) or a positive number. If a positive number is used, then SimBit will remove from the genealogy all ancestors that did not leave any offspring at the last generation sampled. Note that when cloning / selfing rates are high, coalescence happens fast but in presence of sexual reproduction, very few ancestors leave absolutely nothing in the current generation and the option becomes almost pointless. The positive value chosen matters only for performance reasons. 
A value of 250 for example, means that SimBit will look back at previous generations (to remove all ancestors that did not leave any offspring) every 250 generations. Because looking back at ancestors is a little bit slow (because the information is kept on the hard drive, not on the RAM), SimBit will run faster if you input a large number. However, keeping a very large genealogy on the hard drive may become problematic and may saturate the hard drive. As such, it may be of interest to remove all ancestors regularly enough so as to free up the storage. A priori, we would suggest that hard drive storage is rarely a limitation, and we would invite our user to use a large enough number. Without having done much testing, I would a priori recommend using a value of about 4N (where N is the total population size) generations. 14 Technical options  This is aka --seed. It specifies the random seed for the simulation. The seed will be printed on the logfile if this info has been demanded. Knowing the random seed allows one to replicate the same simulation exactly which is often very handy. Instead --coalesce int --random_seed int Outputs: Genealogy Random seed  190 of specifying an integer, you can also use the keyword binfile or f and give the path to a binary file containing the seed. Such binary files can be produced from other simulations (see section "Outputs"). By default, the random seed is set to the computer current time. A user can terminate a simulation upon some condition depending on the stat of the population simulated.  There is for the moment, only one function available, it is called isT1LocusFixedAfterGeneration. The arguments expected are the T1 locus to consider and the generation after which to call the function. The function kills the simulation if the chosen T1 locus is found fixed any time after the generation indicated. Please feel free to let me know if you need another killOnDemand function. 
An advanced user can also try to modify the function “void KillOnDemand::readUserInput(InputReader& input)” in "KillOnDemande.cpp" and add a new function with the desired name in this same file). It is sometimes helpful to start a simulation at a generation other than 0. This is typically useful when restarting a simulation (from a binary file) who crashed because of overpassing the wall time limit on a cluster). In such case, you can use --startAtGeneration  By default, --startAtGeneration is set to 0.  Entry values are explained in the table 6. By default Overwrite is set to 2. Table 6: Input format for --Overwrite Entry Meaning 0 Do not overwrite 1 Overwrite even if the logfile already exists but not if the last output files exist 2 Overwrite in any case --killOnDemand <functionToCall> <args> --startAtGeneration int --Overwrite int Kill on demand Start at generation Overwrite  191  To ask for a DryRun indicate --DryRun 1. In such case, SimBit will do a normal initialization of the simulation but will abort just before simulating the first generation.  If true, SimBit prints on standard output the progress of the simulation (prints the generation, more generations are printed at the beginning of the simulation than later on). By default, it is set to true. 15 Performance options Performance options do not change what is being simulated but only how it is simulated. To the exception of --swapInLifeCycle, --geneticSampling_withWalker, and --individualSampling_withWalker, all options will produce exactly the same outputs given the same random seed.  This option is species-specific. It is probably the most important performance option. Tweaking this option can eventually make your simulation faster if you use the assumption of "multfit". There are two modes prob and descr. The default entry cannot be copied but is something along the lines of prob 0.008. 
SimBit sums up the recombination rate between loci until it reaches the probability indicated before creating a new fitness map block. With descr, you can specify exactly which block each locus belongs to. It works as descr <nbLociFirstBlock> <nbLociSecondBlock> <nbLociThirdBlock> ... The total number of loci must be the total number of loci all types summed. For example,  In this example, we are asking for three chromosomes of 1000, 5000 and 1000 loci, respectively. I used --FitnessMapInfo descr in order to specify that fitness map block boundaries match with the chromosomal boundaries. --DryRun int --printProgress bool --FitnessMapInfo mode float --L T1 7000 --r cM A rep 0 999 -1 rep 0 4999 -1 rep 0 999 --FitnessMapInfo descr 1000 5000 1000 Dry run Print progress Fitness map info  192 T4 loci are not directly simulated. Instead a coalescent tree is being computed and mutations are added on the coalescent tree. In presence of recombination, every locus can potentially have its own evolutionary history. Instead of representing every lineage, SimBit records an ancestral recombination graph inspired by Keheller et al. (2018). At any time, a given haplotype can be described by a number of nodes in the ARG. As the average number of nodes per haplotypes increase, the simulations get slower. It is in general not too much of an issue, as drift is often sufficient to avoid this average number of nodes per haplotypes reaching a high value. That being said, it can become an issue. For this, SimBit can redefine the ancestral nodes and clear the tree on demand when the average number of nodes per haplotype is too high. The threshold for such action to be taken is given through the option.  This option species-specific. By default this option is set to 100. As a reminder, for T5 loci, each haplotype tracks the indices of the loci that have been mutated. When T5 locus, say, index 23 reaches fixation, then every haplotype tracks this locus 23, which is not optimal. 
Instead SimBit can flip the meaning of having 23. If SimBit flips the meaning for index 23, it means that every haplotype that carries the index 23 are not mutated at this locus while those who do not carry index 23 are mutated. SimBit can do this flipping not only for fixed loci but for any desired frequency greater than 0.5. To set this frequency above which meaning of haplotypes are flipped use the following option.  This option species-specific. By default, this frequency is 1.0. To set how often SimBit will check for the loci that have reached a high frequency, use  This option is species-specific. A value of -1 means "never try to flip meaning". By default, SimBit never tries to flip meaning.  --T4_maxAverageNbNodesPerHaplotype mode float --T5_freqThreshold float --T5_toggleMutsEveryNGeneration int --T5_compress bool bool T4 loci performance tweak T5 loci threshold for flipping meaning T5 loci rate meaning flip  Compress T5  193 This option is species-specific and is aka --T5_compressData. The first bool tells whether the T5ntrl loci must be compressed, and the second bool tells whether the T5sel loci must be compressed. By default, T5sel loci are never compressed, and T5ntrl loci are compressed if the number of T5ntrl loci is lower than 216 - 1 = 65535.  This option is species-specific. SimBit can either copy each parental haplotype to create each offspring haplotypes. But if a given parental haplotype creates its last offspring haplotype without recombination, then it would be faster to just copy a pointer to this haplotype. The down side of this is that by copying pointers, we reduce memory contiguity and, also, we have to figure out when is the last reproduction event of each haplotype. By default, --swapInLifeCycle is set to true only if the total number of loci is greater than 100 and if the total recombination map is shorter than 10cM.  In some simulations, sampling individuals for reproduction can be time consuming. 
When there is no selection, then sampling is relatively fast as it only requires getting a uniformly distributed random integer value (which is in fact harder that it may sound if you want to have truly unbiased uniformly distributed values). When there is selection, however, the probability of sampling an individual depends upon its fitness. Under such circumstance, SimBit produces a random float number uniformly distributed between 0 and the sum of fitnesses in the patch. Then, do a binary search on an array containing the cumulative sum of fitnesses of the individuals in the patch. An alternative method is the alias method described by Walker (1974). I also implemented the alias method but as with some quick benchmarks, it appears to perform a little worse than the other, by default, SimBit does not use the alias method. If you want to use the alias method, please set the following option to true.  A very similar sampling scheme exists for sampling position of mutations and recombination breakpoints. To use the alias method for these sampling, please set the following option to true.  --swapInLifeCycle bool --individualSampling_withWalker bool --geneticSampling_withWalker bool Avoid copying last reproduction of haplotype Alias method for sampling individuals Alias method for mutation and recombination positions  
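The two fitness-proportional sampling schemes described above can be illustrated with a minimal Python sketch. This is not SimBit's own C++ code: the function names (build_cumulative, sample_by_bisection, build_alias_table, sample_by_alias) are hypothetical and chosen only for this illustration, and the alias table is built with a common textbook construction, which may differ in detail from the one SimBit uses.

```python
import bisect
import itertools
import random


def build_cumulative(fitnesses):
    """Return the running sum of fitnesses, one entry per individual."""
    return list(itertools.accumulate(fitnesses))


def sample_by_bisection(cumulative, rng):
    """Draw one individual index with probability proportional to fitness.

    A uniform number is drawn between 0 and the sum of fitnesses, then
    located in the cumulative array by binary search.
    """
    u = rng.random() * cumulative[-1]
    index = bisect.bisect_right(cumulative, u)
    # Guard against u rounding up to the total because of floating point.
    return min(index, len(cumulative) - 1)


def build_alias_table(fitnesses):
    """Build the probability and alias arrays used by the alias method."""
    n = len(fitnesses)
    total = sum(fitnesses)
    scaled = [f * n / total for f in fitnesses]  # mean of scaled values is 1
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        prob[s] = scaled[s]          # keep s with this probability
        alias[s] = g                 # otherwise redirect the draw to g
        scaled[g] -= 1.0 - scaled[s]  # g gave away part of its mass
        (small if scaled[g] < 1.0 else large).append(g)
    return prob, alias


def sample_by_alias(prob, alias, rng):
    """Draw one index in constant time from a prebuilt alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

With the cumulative array, each draw costs a binary search (logarithmic in the patch size), whereas the alias table gives constant-time draws after a linear-time setup. Either structure has to be rebuilt whenever the fitnesses change, which is one plausible reason why the alias method need not be faster in practice.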
