Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Understanding local adaptation and effective population size in the face of complex demographic history Gilbert, Kimberly Julie 2016

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2016_november_gilbert_kimberly.pdf [ 16.18MB ]
JSON: 24-1.0319242.json
JSON-LD: 24-1.0319242-ld.json
RDF/XML (Pretty): 24-1.0319242-rdf.xml
RDF/JSON: 24-1.0319242-rdf.json
Turtle: 24-1.0319242-turtle.txt
N-Triples: 24-1.0319242-rdf-ntriples.txt
Original Record: 24-1.0319242-source.json
Full Text

Full Text

Understanding local adaptation andeffective population size in the face ofcomplex demographic historybyK imberly Julie G ilbertB.Sc., University of Virginia, 2010A thesis submitted in partial fulfillment ofthe requirements for the degree ofDoctor of PhilosophyinThe Faculty of Graduate and Postdoctoral Studies(Zoology)The University of British Columbia(Vancouver)October 2016© Kimberly Julie Gilbert, 2016AbstractEvolution is driven by four major processes that create, maintain, or eliminate geneticdiversity within and among populations: mutation, gene flow, genetic drift, and natu-ral selection. My thesis examines the role of demographic history and its interactionswith each of these processes in impacting the evolution of populations. Demographichistory can cause various states of non-equilibria in populations creating the potentialto mis-inform important evolutionary inferences. Such inferences may be key formaking conservation decisions in applied biology. Chapter 2 investigates methodsfor estimating effective population sizes under the assumption-violating scenario ofmigration among populations. Effective population size is proportional to the amountof genetic drift a population experiences, yet gene flow can affect measures of driftand thus estimates of population size. Using simulated data to understand theimpact of migration on estimation accuracy, I find that two existing estimation meth-ods function best. I next present two studies on species range expansions and theroles of migration, mutation, selection, and drift on expansion dynamics. Rangeexpansion is a common demographic history in many species and can lead to non-equilibrium genetic scenarios. The first of these studies shows the interaction ofdeleterious mutation accumulation and local adaptation to environmental gradientsduring range expansions (Chapter 3). The interplay of expansion load, mutationload, and migration load lead to different levels of local adaptation in expandingpopulations. Chapter 4 examines the ability of species to expand over patches ofenvironmental optima under different genetic architecture regimes. Expansion isenhanced by certain genetic architectures, and each of these interacts with the sizeof patches on the landscape as well as how strongly selection varies across patches.My final study assesses the reproducibility of analyses using the common stochasticalgorithm structure (Chapter 5). This research finds 30% failure of reproducibilityfor results from structure using published datasets and elucidates the reasons foriifailure of reproducibility. In sum, my thesis contributes to our understanding of howgene flow, population size, heterogeneous selection, and mutation interact to impactthe genetics of populations and thus the fate of evolving biodiversity.iiiPrefaceChapter 2 has been previously published as:Gilbert K.J. and Whitlock M.C. 2015. Evaluating methods for estimating localeffective population size with and without migration. Evolution, 68:2154-2166.Michael C. Whitlock helped formulate the project idea, provided guidance, writing,and editing.Chapter 3 has been submitted for publication. I was primary author on thiswork and conducted all of the simulations, analyses, and the majority of writing.This work was coauthored with Nathaniel P. Sharp, Amy L. Angert, Gina L. Conte,Jeremy A. Draghi, Frédéric Guillaume, Anna L. Hargreaves, Remi Matthey-Doret, andMichael C. Whitlock.Chapter 4 has been submitted for publication and was coauthored with MichaelC. Whitlock. I was the primary author with guidance from Michael C. Whitlock.Chapter 5 has been previously published as:Gilbert K.J., Andrew R.L., Bock D.G., Franklin M.T., Kane N.C., Moore J-S.,Moyers B.T., Renaut S., Rennison D.J., Veen T., and Vines T.H. 2012.Recommendations for utilizing and reporting population genetic analyses:The reproducibility of genetic clustering using the program structure .Molecular Ecology, 21 :4925 -4930 .Timothy H. Vines formulated the initial project idea. All authors performed datareanalyses. I was the primary author.ivTable of ContentsAbstract. . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . vL ist of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . viiL ist of F igures . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . xiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Evaluating methods for estimating local effective populationsize with and without migration . . . . . . . . . . . . . . . . 102.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 Local maladaptation reduces expansion load during speciesrange expansion . . . . . . . . . . . . . . . . . . . . . . . . . 343.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 Genetic architecture and landscape patchiness change adaptiveability during range expansion . . . . . . . . . . . . . . . . . 574.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765 Recommendations for utilizing and reporting population geneticanalyses : The reproducibility of genetic clustering using theprogram structure . . . . . . . . . . . . . . . . . . . . . . . 815.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 815.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82v5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855.4 Discussion and Recommendations . . . . . . . . . . . . . . . . . . . . . . 875.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 916 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 93B ibliography . . . . . . . . . . . . . . . . . . . . . . . . . . 101Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . 115a Supplementary Information to Chapter 2 . . . . . . . . . . . . 115a.1 Supplementary Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115a.2 Supplementary Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123b Supplementary Information to Chapter 3 . . . . . . . . . . . . 147b.1 Supplementary results and analyses . . . . . . . . . . . . . . . . . . . . . 147c Supplementary Information to Chapter 4 . . . . . . . . . . . . 154d Supplementary Information to Chapter 5 . . . . . . . . . . . . 160d.1 Papers and data used in our study . . . . . . . . . . . . . . . . . . . . . . 160viList of TablesTable 2 .1 Programs tested in this study with input parameters and otherspecifications used. . . . . . . . . . . . . . . . . . . . . 16Table 2 .2 RMSE averaged over groups of demographic scenarios for the bestperforming estimation methods. . . . . . . . . . . . . . . . 23Table 2 .3 RMSE averaged over groups of demographic scenarios for compositeestimates from the best performing methods. . . . . . . . . . . 26Table 2 .4 Summarized results across methods. . . . . . . . . . . . . . 29Table 4 .1 Terminology and parameter definitions for simulations . . . . . . 63Table A.1 Isolated, no migration scenario RMSEs. . . . . . . . . . . . . 115Table A.2 Island model migration scenario RMSEs at Ne = 50 . . . . . . . 116Table A.3 Island model migration scenario RMSEs at Ne = 500 . . . . . . . 117Table A.4 Stepping stone model IBD scenario RMSEs at Ne = 50 . . . . . . 118Table A.5 Stepping stone model IBD scenario RMSEs at Ne = 500 . . . . . . 119Table A.6 Isolated, no migration scenario coverage probability and infiniteconfidence intervals . . . . . . . . . . . . . . . . . . . . 120Table A.7 Island model scenarios’ coverage probability and infinite confidenceintervals . . . . . . . . . . . . . . . . . . . . . . . . . 121Table A.8 Stepping stone model (IBD) scenarios’ coverage probability andinfinite confidence intervals . . . . . . . . . . . . . . . . . 122Table B.1 Fitness and expansion speed results per parameter combination . . 147Table C.1 Per locus effect size differences . . . . . . . . . . . . . . . . 154Table D.1 Characteristics of structure runs in original studies . . . . . . 171Table D.2 Characteristics of structure runs reanalyzed . . . . . . . . . 172viiList of FiguresF igure 2 .1 - Diagram showing the dispersal patterns for the simulated populationdemographies. . . . . . . . . . . . . . . . . . . . . . . 15F igure 2 .2 - RMSE for all methods across all scenarios. . . . . . . . . . . 22F igure 3 .1 - Landscape graphic.. . . . . . . . . . . . . . . . . . . . 39F igure 3 .2 - Average speed of expansion. . . . . . . . . . . . . . . . . 46F igure 3 .3 - Average core and edge fitness. . . . . . . . . . . . . . . . 47F igure 3 .4 - Average expansion load. . . . . . . . . . . . . . . . . . 48F igure 3 .5 - Average expansion load per locus. . . . . . . . . . . . . . 49F igure 3 .6 - Deleterious allele frequencies across the landscape. . . . . . . . 50F igure 4 .1 - Schematic of heterogeneous landscapes. . . . . . . . . . . . 64F igure 4 .2 - Growth rate of populations expanding over a linear gradient.. . . 68F igure 4 .3 - Growth rate of populations expanding over a patchy gradient. . . 71F igure 4 .4 - Waiting time before expansion into second patch. . . . . . . . 72F igure 4 .5 - Per locus genotypic differences.. . . . . . . . . . . . . . . 74F igure 4 .6 - Population growth rates across patchy landscapes. . . . . . . . 75F igure 5 .1 - Results of structure reanalyses. . . . . . . . . . . . . . 86F igure 5 .2 - Comparison of K values for original studies versus reproducedstudies. . . . . . . . . . . . . . . . . . . . . . . . . . 88F igure A.1 - Range of incorrect source populations provided in the islandmodel scenarios. . . . . . . . . . . . . . . . . . . . . . 123F igure A.2 - Proportion of estimates that reported infinite values of Ne. . . . 124F igure A.3 - All replicate estimated Ne values for neestimator’s ldnemethod across all scenarios.. . . . . . . . . . . . . . . . . . 125F igure A.4 - All replicate estimated Ne values for mlne likelihood methodacross all scenarios. . . . . . . . . . . . . . . . . . . . . 126F igure A.5 - All replicate estimated Ne values for mlne moment methodacross all scenarios. . . . . . . . . . . . . . . . . . . . . 127F igure A.6 - All replicate estimated Ne values for estim across all scenarios. . 128F igure A.7 - All replicate estimated Ne values for colony2’s heterozygoteexcess method across all scenarios. . . . . . . . . . . . . . . 129F igure A.8 - All replicate estimated Ne values for colony2’s sibship likelihoodmethod across all scenarios. . . . . . . . . . . . . . . . . . 130F igure A.9 - All replicate estimated Ne values for onesamp across all scenarios.131viiiF igure A.10- All replicate estimated Ne values for cone across all scenarios. . 132F igure A.11- All replicate estimated Ne values for tmvp across all scenarios. . 133F igure A.12- All replicate estimated Ne values for neestimator’s heterozygoteexcess method across all scenarios. . . . . . . . . . . . . . . 134F igure A.13- All replicate estimated Ne values for neestimator’s coancestrymethod across all scenarios. . . . . . . . . . . . . . . . . . 135F igure A.14 - All replicate estimated Ne values for neestimator’s Pollakmethod across all scenarios. . . . . . . . . . . . . . . . . . 136F igure A.15- All replicate estimated Ne values for neestimator’s Nei andTajima method across all scenarios. . . . . . . . . . . . . . . 137F igure A.16 - All replicate estimated Ne values for neestimator’s Jordeand Ryman method across all scenarios. . . . . . . . . . . . . 138F igure A.17 - All replicate estimated Ne values for the average of mlne’slikelihood and moment methods across all scenarios. . . . . . . . 139F igure A.18- All replicate estimated Ne values for the average of neestimator’sldne at both temporal sampling points across all scenarios. . . . . 140F igure A.19 - All replicate estimated Ne values for the average of mlne’slikelihood and moment methods with neestimator’s ldne estimatesat both temporal samples across all scenarios. . . . . . . . . . . 141F igure A.20 - All replicate estimated Ne values for the average of mlne’slikelihood and moment methods with tmvp across all scenarios. . . 142F igure A.21 - All replicate estimated Ne values for the average of mlne’slikelihood and moment methods with neestimator’s ldne estimatesat both temporal samples and tmvp across all scenarios. . . . . . 143F igure A.22 - RMSE compared across sample sizes. . . . . . . . . . . . 144F igure A.23 - RMSE of Ne across migration scenarios.. . . . . . . . . . . 145F igure A.24- Comparing migration rates for ongoing versus historic migrationscenarios. . . . . . . . . . . . . . . . . . . . . . . . . 146F igure B.1 - Range edge fitness predicts expansion speed. . . . . . . . . . 148F igure B.2 - Leptokurtic dispersal kernel fitness results. . . . . . . . . . . 149F igure B.3 - Recovery from expansion load.. . . . . . . . . . . . . . . 150F igure B.4 - Fitness across the landscape through time for each component offitness for UD = 0.0. . . . . . . . . . . . . . . . . . . . . 151F igure B.5 - Fitness across the landscape through time for each component offitness for UD = 0.1. . . . . . . . . . . . . . . . . . . . . 151F igure B.6 - Fitness across the landscape through time for each component offitness for UD = 1.0. . . . . . . . . . . . . . . . . . . . . 151F igure B.7 - Load due to loci fixed at the range edge. . . . . . . . . . . . 152F igure B.8 - Frequency of deleterious alleles across the landscape throughoutthe course of a simulation. . . . . . . . . . . . . . . . . . 153F igure C.1- Population statistics across the landscape. . . . . . . . . . . 155F igure C.2- Genetic variance across parameter sets.. . . . . . . . . . . . 156ixF igure C.3- Time spent before expansion into second patch. . . . . . . . . 157F igure C.4- Approximately constant genetic variance across parameter sets. . . 158F igure C.5- Mean population growth rates across patchy landscapes for largedispersal kernel.. . . . . . . . . . . . . . . . . . . . . . 159xAcknowledgementsMy greatest thanks goes to my advisor, Mike Whitlock. Mike has been a fantasticmentor and provided me with endless support both scientifically and academically.His thoughtful insight has been critical in the completion of the work presented herein.Mike’s ability to not only teach the intricacies of all aspects of population genetics indepth, but to also enlighten me on the most tangential facts of life has made workingunder his supervision a great pleasure and a truly educational experience. I alsothank the members of my advisory committee: Amy Angert, Loren Rieseberg, andSally Aitken. Their helpful feedback, discussions, and support greatly improved myresearch.Working amongst other members of the Otto-Whitlock labs and the UBC biodiver-sity research group not only made for a fantastic work environment, but also providedmotivational support and vast resources of knowledge and skills. I am honored tothank these people for the knowledge and wisdom they have imparted and thankfulto have gained many lifelong friends and colleagues. In particular, I would like tothank Rich FitzJohn for teaching me many of the intricacies of the R language, aswell as the value of good graphic design. Jeremy Draghi, who taught me C and C++,was always a great source of calm sensibility when I felt lost. Florence Débarre wasan outstanding mentor who imparted deep understanding of mathematical biologyin the most lighthearted manner. I owe thanks to many people, both near and far,friends and family, for support at the best and worst of times – Katie Lotterhos,Sam Yeaman, Holly Kindsvater, Matt Pennell, Michael Scott, Nathaniel Sharp, MattOsmond, Jasmine Ono, Liz Kleynhans, Thor Veen, Carl Rothfels, Tim Vines, DianaRennison, Aleeza Gerstein, Kay Hodgins, Julie Lee-Yaw, Seth Rudman, AmandaMoreira, Chett Hopkins, Cathy Rushworth, and Peter Fields.Despite the absence of a chapter on local adaptation in lodgepole pine, a largeportion of time during my PhD was spent on forthcoming work which would notxibe possible without the help of numerous people. Sally Aitken was an expert on allthings trees and provided connections and support, as well as a great field vehicle.Nick Ukrainetz and Vicky Berger with the BC Ministry of Forests provided me withall details and data on the Illingworth provenance trials. I had two amazing fieldassistants, Evan Cronmiller and Warren Neuvonen, who were both hard-working andfun. They always managed to keep a smile on my face despite the long, hot, bug-, andpollen-filled days spent in the forests of British Columbia. Anne Berland providedexcellent experience and navigational skills for visiting the Illingworth provenancetrials. Stilianos Louca helped me collect additional samples of lodgepole pine fromnorthern BC and the Yukon. Katie Lotterhos, Chett Hopkins, Roxy, Florence Débarre,Holly Kindsvater, and Jeremy Draghi all also contributed their own time to help meat various times in the field. Kristin Nurkowski was my trusted expert in the wet lab,seeing me through troubleshooting DNA extractions and devising solutions to all ofthe difficulties that arise in the lab. I have a newfound appreciation for bunnies anda clean centrifuge thanks to Kristin. I also thank all of the members of the AdapTreeteam and Sally Aitken’s lab.Compute Canada and Westgrid were essential resources without which this re-search would not have been possible. I thank Frédéric Guillaume for all of his timeand help explaining and tinkering in nemo.This work was financially supported by a BRITE Fellowship, UBC Affiliated Fel-lowships, a Zoology Graduate Fellowship, and an NSERC Discovery Grant.xiiTo my parents, who fostered my love of nature and the outdoorsthroughout my entire life.xiiichapter 1Introduction“I do not propose to argue the case for evolution, which I regard asbeing quite as well proven as most other historical facts, but to discussits possible causes, which are certainly debatable.” Haldane (1932), p.3Evolutionary biology, as a discipline, aims to understand the genesis and mainte-nance of the immense biodiversity in existence today. An essential part of this task re-quires understanding the genetic processes and patterns that occur within and amongpopulations. Population genetics coalesced as a field in the early 1900s, followingmore than a decade of controversy and debate over the process of evolution (Provine,2001). Since its advent, population genetics has made great insight into the processesthat contribute to the evolution of populations and species. Our understanding ofgenetic drift, gene flow, mutation, and selection have created a solid foundation forthe growing field of population genetics. Yet as the field has advanced, realization thatthe manifold and minute complexities of population genetic processes can immenselyimpact the evolution of populations and species leaves many open questions in thefield of evolutionary biology.These questions can be addressed through theoretical studies, using analytic mod-els or running simulations, or through empirical studies, gathering observational andexperimental biological data for analysis. Both approaches to understanding evolu-tion complement each other, and require insight from each other. Population genetictheory uses simplified models that lead to evolutionary predictions and expectationsthat are then tested in the real world. Biology is always more complicated, andwhen results in the real world differ from our theoretical predictions, the processrepeats, more complex models are developed and tested in the real world, and our1chapter 1understanding of biology advances. In my thesis, I take a simulation approach to fur-ther our understanding of population genetics in several interesting, non-equilibriumscenarios.The demographic history of a population plays a major role in its evolution.Demography describes the characteristics of a population’s size and compositionthrough statistics such as birth or death rates (Pavlík, 2000). This information is usefulfor short-term predictions of population dynamics, but in combination with popu-lation genetic data a more complete understanding of a population’s past, present,and future can be attained, proving especially vital in conservation biology (Lande,1988; Nunney and Campbell, 1993; Avise, 1995). Effective population size servesas an example of this situation. Census size of many organisms can be obtained,as this simply requires counting all individuals present in a population (thoughfor many reasons, census size can often be the more difficult quantity to obtain).However census size alone does not generally inform long term processes occurringin a population nor how the population may evolve. Genetic data is necessary for suchinsight. For example, a large number of individuals might be present at any giventime due to stochastic events such as an increase in resources, rather than reflectingconsistently large numbers of individuals. Likewise a small number of individualsmight be found if other stochastic abiotic factors such as a recent catastrophic stormled to the death of many individuals. In these instances, defining a population by itseffective size is more useful for informing the evolutionary processes occurring withinthe population and can reveal information that the census size does not.The effective population size is a key parameter in population genetics and evolu-tion. Defined by Wright (1931) as the size of an ideal population that experiencesthe same level of genetic drift as the given population, effective population sizeinforms how populations are likely to respond to evolutionary forces. Though itcan be defined in several different ways depending on which aspect of drift, temporalscale, or other population property is of interest, Wright’s definition serves best forthe contemporary effective size of a local population. The effective population sizehas been used widely as a deciding parameter in conservation situations such as2chapter 1suggesting minimally viable population sizes or establishing breeding populations inboth plants and animals (Lande and Barrowclough, 1987; Soulé, 1987; Ellstrand andElam, 1993).Estimating the effective population size for these purposes in natural populationscan, however, be a difficult task. We use genetic data to understand the degree ofgenetic drift a population experiences, and thus its effective size. This is generallyaccomplished by examining changes in allele frequencies over time. Allele frequencieswithin populations over time are often, if not always, subject to change due to forcesother than random genetic drift alone. Populations are subject to natural selection,mutation, and gene flow among populations – processes that are all capable of alteringallele frequencies in multitudes of ways. Methods developed for the purpose ofestimating effective size attempt to work past some of these difficulties. Using datasetsof only neutral loci removes the biases in estimating effective population size due toselection altering allele frequencies, at least to the best of our ability in identifyingwhich loci are truly neutral. Using data over very short time scales allows us to safelyassume a lack of mutational input for the dataset at hand. However, the final processin this list, gene flow, is not as easily ignored.Gene flow varies widely in amount across species, but is likely a prevalent force inmany populations and species across the globe (Slatkin, 1985, 1987). Estimating ratesof gene flow alone is already a daunting task possessing its own estimation methodsand their accompanied difficulties (Whitlock and McCauley, 1999; Beerli and Felsen-stein, 2001; Hey and Nielsen, 2004; Kuhner, 2006). Gene flow can be a major disruptiveforce in evolution by introducing foreign and sometimes maladaptive alleles (Slatkin,1987). Whether dispersal (i.e., migration) into a population results in successful geneflow is linked to effective size in terms of whether selection is sufficiently strong toremove any maladapted alleles introduced, or whether populations are so small thatselection is weak and gene flow instead becomes a positive force acting to sustainan otherwise extinction-bound population by introducing beneficial variation. Thelatter process is often referred to as demographic rescue and can lead to evolutionaryrescue whereby gene flow introduces genetic variation and leading to population3chapter 1recovery (Bell, 2013; Carlson et al., 2014). Even within his controversial work onshifting balance theory, Sewall Wright recognized the importance of the interplayof effective population size and gene flow with the evolutionary process:“The rate of evolutionary change depends primarily on the balance be-tween the effective size of population in the local strain and the amount ofinterchange of individuals with the species as a whole . . . ” Wright (1930)This introduction of foreign alleles has the potential to alter allele frequencies on topof any levels of genetic drift a population is subject to, and disentangling the effects ofthese disparate processes can be difficult. The caveat introduced with the majority ofestimators for effective population size is that the population must be isolated, but thisassumption is likely violated often in nature. Knowing the demographic history ofa population can thus help to understand how the estimation process may be biased.Without knowledge of the correct demographic history of a population, the abilityof many genetic analysis methods to correctly infer a parameter of interest can behampered. For example, if a population under study recently underwent a vicarianceevent and was previously part of a much larger population, the genetic patterns seenmay still reflect that of the larger population. Similarly, estimating immigration ratesin such a population may lead to the incorrect inference of ongoing immigration whenin fact this signal is only due to the recent similarity among the previously panmicticpopulation. Making conservation decisions based on an incorrectly overestimatedeffective population size in this instance could be detrimental.In Chapter 2 I investigate the impact of migration on estimation of effective pop-ulation sizes through genetic data analysis. Accurate estimation of the effectivepopulation size is key in conservation biology and leads to many political decisionsin terms of when and where to conserve populations at risk and how many resourcesto dedicate to which causes. Misinterpreting or miscalculating effective size may thuslead to the local, or even global, extinction of species in the most extreme cases. Usingsimulated data and testing pre-existing methods developed for estimating effectivepopulation size allows each of these methods to be equivalently tested across commondemographic scenarios. Use of simulated data benefits this investigation by having4chapter 1knowledge of the true effective population size, which otherwise in nature is nevertruly possible.Effective population size is relevant throughout many aspects of evolutionarybiology for reasons other than conservation. Small populations are more subjectto random genetic drift, while in larger populations selection is known to be moreeffective (Wright, 1931; Kimura, 1964; Gravel, 2016). Selection more successfully actsto remove deleterious genetic material and increase the frequency of beneficial allelesin populations when the effective size is larger. Effective population size thus im-pacts the likelihood of new mutations establishing or being lost within a population(Kimura, 1962; Whitlock, 2000). Larger effective populations sizes may even lead toa greater capacity for local adaptation (Leimu and Fischer, 2008). The relationship ofeffective population size and natural selection is thus tightly linked, and understand-ing not only the current effective population size, but also demographic histories ofpopulations thus allows us to infer the processes contributing to the genetic diversitywithin and among populations and species.Demographic history contributes to every population’s current genetic makeup.For example, population bottlenecks, demographic growth, admixture, or migrationevents all leave different genetic signatures in populations. A particular demographichistory, that of species range expansion, leads to interesting genetic processes andpatterns in populations. When populations grow in size, they either remain geo-graphically stationary or spread across geographic space. The latter case is a rangeexpansion. During a species range expansion, founding populations at the expandingfront exist at low effective population sizes. These populations are the source offurther colonizations as the range continues to expand. Thus, the expanding frontundergoes serial founder events and bottlenecks that maintain consistently low effec-tive population sizes. Genetic drift dominates at a range edge, and not only is geneticdiversity reduced, but a process termed allele surfing can occur. Drift increases thefrequency of some allelic variants, and these alleles proliferate across the expandingedge as the source of continued colonizations (Klopfstein et al., 2006). This process canoften be incorrectly inferred as a signal of positive selection (Edmonds et al., 2004), but5chapter 1can also lead to the increased frequency of deleterious alleles that confer a mutationalburden on populations and prevent their expansion or adaptation (Excoffier et al.,2009; Peischl et al., 2013; Peischl and Excoffier, 2015). This accumulation of deleteriousalleles is referred to as the expansion load (Peischl et al., 2013), and has been argued asthe cause for genetic diseases persisting in populations of humans since expansion outof Africa (Lohmueller et al., 2008; Lohmueller, 2014a; Do et al., 2015; Henn et al., 2016,2015). Theoretical studies have simulated this process and shown that these fitnessreductions can be substantial in populations at the range edge (Peischl et al., 2013;Peischl and Excoffier, 2015; Peischl et al., 2015), but the importance of this processremains somewhat debated.In Chapter 3 I examine this process of expansion load accumulating during arange expansion, but introduce a new aspect of study. By combining the presenceof deleterious mutations with expansion over an environmental gradient, we areable to test the interaction of expansion load with the process of local adaptationto the environment. This simulation study also explores new parameter space andincorporates a large-scale, approximately continuous landscape, providing insightinto the significance and prevalence of expansion load in realistic biological scenarios.Not only does this inform predictions of how real world populations may cope withchanging climate, but indicates the further importance of knowing how and whydemographic history plays such an important role in species evolution.In Chapter 4, I continue with models of species range expansions, but investigatea new aspect of this process. To better understand the process of local adaptationduring range expansions, I incorporate patchy environmental gradients over whichexpansion occurs and examine the role of genetic architecture in determining thisadaptive ability. The role genetic architecture plays in determining local adaptationcontinues to be a major topic of research (Holloway et al., 1990; Carroll et al., 2001;Peichel et al., 2001; Holt et al., 2003; Bratteler et al., 2006; Yeaman and Whitlock, 2011;Schiffers et al., 2014; Yeaman, 2015), and understanding how different types of geneticarchitecture may interact with local adaptation during range expansions may lead tovaluable insight. As genomic technologies continue to advance, it is especially impor-6chapter 1tant to create strong theoretical predictions for how a variety of genetic architecturesmay behave under different demographic situations and how this may contributeto populations’ abilities to adapt to changing environments. In today’s world ofclimate change and other anthropogenic forces fragmenting or altering species’ nativeenvironments, the ability of populations to adapt is key to maintaining biodiversity.Local adaptation is important in the process of evolution for many other reasons.Adapting to the local environment is what allows many species to exist across a rangeof conditions. In some cases, local adaptation can also be considered an early stagein the process of speciation, contributing to differentiation among populations thateventually become reproductively isolated (Kirkpatrick and Barton, 2006; Via, 2012).Local adaptation is a process that is highly studied in the context of the balancebetween gene flow and selection (Kawecki and Ebert, 2004), and understanding localadaptation and the maintenance of genetic diversity across populations has been alongstanding question in evolutionary biology:“The roles of local fitness peaks and gene flow in adaptive evolutionremain major open questions in evolutionary biology, nearly a centuryafter Wright first raised the issue.” Barton (2016)Numerous studies have investigated the process of range expansion and adapta-tion over environmental gradients (Garcia-Ramos and Kirkpatrick, 1997; Kirkpatrickand Barton, 1997; Barton, 2001; Bridle et al., 2010; Polechová and Barton, 2015). Thistheory has led to a strong understanding of the role of population sizes, genetic drift,and gene flow at range edges. An important aspect of this process is asymmetric geneflow at a range edge. Because central populations tend to be more dense under thecentral-marginal hypothesis (Brown, 1984; Eckert et al., 2008), proportionally moregene flow reaches populations at a range edge. This gene flow can have one oftwo effects: swamping locally adaptive alleles with foreign, maladaptive alleles andpreventing local adaptation, or introducing necessary genetic variation in otherwisedepauperate populations thereby enhancing the ability to adapt. Investigations ofthese processes have used linear gradients in examining this process, while manyenvironmental gradients in the real word may change more heterogeneously over7chapter 1space. Whether expansion over such heterogeneous landscapes better matches theoryon adaptation in two patch models (Gomulkiewicz and Holt, 1995; Holt and Go-mulkiewicz, 1997; Gomulkiewicz et al., 1999; Ronce and Kirkpatrick, 2001), expansionover a linear gradient, or neither motivates my further investigations within thisthesis.Furthermore, local adaptation during range expansions is a particularly relevantinvestigation, as many species in the northern hemisphere have expanded and adaptednorthward after the last glaciation to recolonize previously uninhabitable terrain. Thegenetic signatures often left behind by this demographic process create difficulties inpopulation genetic inference. A major goal of evolutionary biology today is to identifythe loci that contribute to local adaptation of populations (Coop et al., 2010; Le Correand Kremer, 2012; Savolainen et al., 2013; Whitlock and Lotterhos, 2015). Demographichistory is, however, one of the leading causes of the misidentification of adaptive loci(Whitlock and Lotterhos, 2015). Range expansions in particular can lead to the spatialautocorrelation of neutral alleles with clines in environmental characteristics, leavingbehind a false signal of natural selection on these loci.Apart from local adaptation within populations or effective sizes of populationsexists an even broader question of what a population is and how do we definepopulations explicitly. Understanding effective population sizes and populationsundergoing range expansions necessitates that we know what a population itselfis. This question is not as simple as it may sound, and determining what a truepopulation is can be incredibly difficult in many systems, or within a given systemfor choosing an appropriate hierarchical level of population structure at which todefine populations, or even undefinable in cases of isolation by distance (Waplesand Gaggiotti, 2006). Numerous methods exist and continue to be developed forthe sole purpose of identifying populations or visualizing population structure inboth continuous and discrete populations (Pritchard et al., 2000; Falush et al., 2003;Rosenberg, 2004; Falush et al., 2007; Petkova et al., 2015; Bradburd et al., 2016). Inmy final project, Chapter 5, I present a reproducibility study that tests the abilityfor a common and stochastic population clustering algorithm to repeatedly produce8chapter 1the same biological result. This reproducibility study replicates published analyses,and uses the same datasets to assess how biologically sound inferences of popula-tion identification and delineation are. Understanding the accuracy of our ability toappropriately detect population structure and identify populations is a vital aspectof population genetics, as these delineations are often used as the foundation forfurther analyses and insights into the evolution of populations. We used one ofthe most widely employed clustering methods, structure (Pritchard et al., 2000),and examined the potential biological and computational reasons for any lack ofreproducibility. Not only was this a test of the ability to correctly infer a biologicalparameter, but made a substantial contribution to the field in terms of promoting dataarchiving and sufficient descriptions of methods by authors in order to make studiesmore repeatable and more reproducible.My thesis projects combined contribute to understanding the evolution of popula-tions and species. From testing methods to simulating increasingly realistic models,these studies aim to build upon our knowledge of how different population geneticprocesses can alter the path of evolution as well as to how we can best infer whichprocesses are occurring and having the greatest impact on observed patterns of di-versity. Demographic history clearly has the ability to play a large role in evolution,and understanding its interaction with population genetic processes is critical for fullcomprehension of the evolutionary process and all of its ramifications for the futureof biodiversity.9chapter 2Evaluating methods for estimating local effective populationsize with and without migration2 .1 introductionEffective population size (Ne) is defined by Wright (1931) as the size of an idealpopulation that has the same properties of genetic drift as the population at hand. Neis a vital parameter used throughout population genetics and evolutionary biologyand applied widely in conservation genetics (Schwartz et al., 2007; Dudgeon et al.,2012). However, determining the true Ne in a natural system can be a difficult task(Serbezov et al., 2012; Theunert et al., 2012). With the advent of affordable moleculartools in nonmodel systems, it has become increasingly popular to estimate Ne usinggenotypic data (Wang, 2005; Palstra and Fraser, 2012), and multiple approaches to doso have been developed. These methods function through a range of methodologies,which each entail one or several assumptions.Ne can be estimated by the fact that as a population decreases in effective size,it is increasingly subject to allele frequency changes due to random genetic driftand increased inbreeding (Caballero, 1994). Quantifying these allele and genotypefrequency changes can therefore be used to determine its Ne. Several populationgenetic properties can be used for this purpose: change in allele frequencies overtime may reflect drift, and amounts of linkage disequilibrium (LD), homozygosityand heterozygosity, gene identity disequilibria, or coancestry may all deviate fromexpectation due to drift and inbreeding. Several methods use temporal data takenfrom two or more time points to estimate Ne (Nei and Tajima, 1981; Pollak, 1983;Beaumont, 2003; Wang and Whitlock, 2003; Anderson, 2005; Jorde and Ryman, 2007).While allele frequency change over time gives the most direct estimate of drift, obtain-10chapter 2ing samples from multiple generations may be difficult in some systems. With geneticdata from one point in time, Ne can be estimated by inferring drift or inbreeding frommeasures such as LD or other non-allele frequency measures described above (Hill,1981; Pudovkin et al., 1996; Vitalis and Couvet, 2001b; Nomura, 2008; Waples and Do,2008; Wang, 2009), or even from a combination of some properties (Tallmon et al.,2008). Details on the approach each program takes to estimate Ne are fully describedin the Methods.Genetic drift is not the only process that may cause changes in these populationgenetic quantities. Migration is a major evolutionary force that can also affect changesin allele frequencies over time, amounts of LD, and other of the aforementionedproperties. If changes due to migration are not accounted for by a method whenestimating Ne, those changes may be attributed to drift and result in a biased andinaccurate estimate of effective size (Wang and Whitlock, 2003). This bias frommigration may be in either direction. If many foreign alleles are introduced into apopulation through migration, this will create the signature of a larger amount ofdrift than is occurring and therefore produce an underestimate of Ne. Migration inthis case of higher population structure will also increase the amount of LD in thefocal population. However, if populations are genetically similar (i.e., low FST), thenmigration provides a larger number of parents than exist within the focal population,leading a small population to appear to be larger than it truly is and overestimatingNe. It is worth noting in this case, especially with conservation situations in mind,that the true Ne is not represented by this larger size, since if migration were to be cutoff, the focal population would again be subject to drift proportional to its isolatedpopulation size. Additionally, gene identity methods depend on gene flow for theircalculations, and higher Nem values create the most accurate and precise Ne estimates(Vitalis and Couvet, 2001b,c). Therefore the ability to detect whether drift alone or acombination of drift and migration is causing allele frequency changes in a populationis the key to accurately estimating effective population size.Method developers make every effort to clearly state the underlying assump-tions for their approach and test its performance to ensure appropriate application.11chapter 2While most methods have been assessed concurrently with their publication, the testdatasets used vary in design and are unique to each publication. Typically, theseassessments use simulated, ideal circumstances, or violate only a small subset ofassumptions for the method at hand (e.g., Vitalis and Couvet 2001a; Beaumont 2003;Wang and Whitlock 2003; Anderson 2005; Wang 2009; Do et al. 2014). Additionally,nonideal situations are generally not vetted across all possible estimation methods.In natural systems, which are frequently subject to migration, it can thus be difficultto know which method is best suited for the system in question to find the desiredmeaningful estimate of Ne. Several studies have therefore assessed the performanceof a subset of estimators in specific empirical situations (Barker, 2011; Serbezov et al.,2012; Holleley et al., 2013; Macbeth et al., 2013; Baalsrud et al., 2014). However, the trueeffective population size cannot be known in a natural system. Thus, evaluating theaccuracy of a method is not possible through these comparisons.Simulated datasets allow the true Ne to be known, therefore allowing estimationaccuracy to be assessed. By simulating more complicated demographic scenarios, non-ideal situations similar to those found in the real world can be addressed. Three suchstudies have attempted an evaluation of Ne estimates in the presence of migration;however, they have used only a small subset of existing estimation methods in varyingdemographic scenarios (Waples and England, 2011; Neel et al., 2013; Ryman et al.,2013). These studies give valuable insight into the performance of those methodswhen the assumption of a single isolated population is not true. They show thatthe performance of those methods can depend on assumptions about migration rates(Waples and England, 2011), population substructure (Ryman et al., 2013), and size ofarea sampled versus the typical distance within which mating occurs in populationsunder isolation by distance (Neel et al., 2013). Despite the vital findings of thesestudies, the field still lacks a comprehensive comparison of existing estimation meth-ods across identical, simulated datasets, which is the only way to make an informeddecision on which method to apply for the most accurate answer.Our study compares the seven most cited programs (which employ 14 estimationmethods) for estimating effective population size. We simultaneously apply them over12chapter 2identical datasets designed specifically to cover a wide range of simulated real worldmigration scenarios in order to assess their performance. These programs, colony2(Wang, 2009), cone (Anderson, 2005), estim (Vitalis and Couvet, 2001a), mlne(Wang and Whitlock, 2003), neestimator v2 (Do et al., 2014), onesamp (Tallmonet al., 2008), and tmvp (Beaumont, 2003), are specifically designed with the purpose ofestimating contemporary Ne. Our aim is to test each of the methods in an equivalentway to uncover the most accurate approach (or approaches) in estimating small scale,local Ne, as would be of interest in conservation studies. We only assessed programsthat use genetic data and did not include additional programs that calculate Ne as along-term effective size on the scale of species or as a product of other parameters,such as θ, since contemporary, local Ne was our value of interest (e.g., Migrate (Beerliand Felsenstein, 2001), IM (Hey and Nielsen, 2004), and Lamarc (Kuhner, 2006) amongothers). Our goal is to inform users as to which method(s) is best able to estimate Nein natural systems.2 .2 methods2 .2 .1 Population DemographiesPopulations were simulated with the forward-time, individual-based simulation pro-gram nemo (Guillaume and Rougemont, 2006). We modeled demographies withall populations at a constant size through time for isolation scenarios (no migration)and migration scenarios having either an island model metapopulation plus a nearbysink population, or a stepping stone case with linearly arrayed patches only sendingmigrants to the neighboring patch to represent an isolation by distance (IBD) scenario.Figure 2.1 shows the design for the island model cases that have a metapopulationof 11 patches connected equally by migration (either m = 0.01 or 0.1 throughout)and a sink population receiving unidirectional migration from a single patch in themetapopulation (either m = 0.01, 0.1, or 0.25). An isolated patch was also included inthis simulation. The sampling and estimation of Ne for these demographic cases (Fig.2.1 for the island model) is meant to represent situations one may encounter in a given13chapter 2real world system. In real applications, one may wish to estimate the Ne of a popu-lation without realizing that it may be affected by migration. The local patch may bepart of a large network of populations, or alternatively, the sampled population mayitself be substructured. In some cases the presence of migration may be difficult todetect. We therefore assessed nine sampling situations across methods: isolation, sink(low, medium, and high migration from source), within metapopulation (a patch inthe island model or a patch in the interior of the stepping stone model), an edge patchfrom the stepping stone model (one end of the linear array of patches), and across thewhole metapopulation (sample across all patches randomly within the island modeland the stepping stone model). Additionally, one estimation method, mlne, allowsa migration source to be identified, for which we provided a range of source patches:the correct source population, a more distantly related source population, an entirelyunrelated and isolated source population, the entire metapopulation combined as themigration source, or a case with no contemporary migration yet a historic migrationsource labeled incorrectly as a migration source (Fig. 2.1). This allowed us to furtherassess whether correctly or incorrectly identifying a migration source affected theperformance of mlne.In all cases we are interested in the local Ne of a patch except when samplingthe whole metapopulation, in which case the Ne estimation would be expected toreflect the metapopulation Ne. Population sizes were set per patch and constantacross patches with Ne equal to census size by following the Wright-Fisher model(see below), and were the same across all patches within a simulation. For isolated(ideal) populations, we tested Ne of 50, 500, and 5000 with 100 replicate simulationsat Ne = 50 and 500 and 25 replicates at Ne = 5000. In the migration scenarios, onlya local Ne of 50 and 500 were tested, again over 100 replicate simulations each, asestimators performed poorly at Ne = 5000 in preliminary runs (due to computationalrequirements, some methods additionally were not assessed or assessed with fewerreplicates at Ne = 500; see Results and Discussion). Several methods require that aprior be given for the estimate of Ne for which we provided a value of true Ne × 20as the upper bound (see Table 2.1). Some methods we test estimate the inbreeding14chapter 2Isolated	  Low	  m	  	  sink	  High	  m	  	  sink	  Medium	  	  m	  sink	  With	  source	  (MLNe)	  Sink	  popula=on	  Within	  metapopula=on	  Whole	  metapopula=on	  Sampling	  Schemes	  Simulated	  Demography	  F igure 2 .1 : Diagram showing the dispersal patterns for the island model migrationsimulations and the patches sampled for the different estimation cases. Low indicatesm = 0.01, medium m = 0.1, and high m = 0.25.15chapter 2Table 2 .1 : Programs tested in this study with input parameters and otherspecifications used.Program Citation Specifications (Ne = 50 / 500 / 5,000)colony2 v2.0 Wang 2009 run length specified = 2cone v1.01 Anderson 200510,000 iterationsprior Ne 2-1000 / 2-10,000 / 2-100,000step 5 / 10 / 25estim v1.2 Vitalis and Couvet 2001b,c prior max Ne 10,000† / 10,000 / 100,000mlne v1.0 Wang and Whitlock 2003 prior max Ne 1,000 / 10,000 / 38,250‡neestimatorv2.0Do et al. 2014(ldne – Waples and Do 2008) reporting results for MAF cutoff 0.05onesamp Tallmon et al. 200850,000iterationsprior Ne 2-1,000 / 2-10,000 (see text)tmvp Beaumont 200320,000-75,000iterationsmaxit 1,000-7,500;thinning interval 5-10;size distribution of updates 0.5; (see text)† The smallest maximum Ne allowed by estim is 10,000.‡ mlne did not allow larger maximum Ne due to memory limitations.effective size (NeI) while others estimate the variance effective size (NeV). However,our only source of nonrandom mating is due to the imposed migration, making thesevalues equivalent (Hill, 1979), and thus the estimates are comparable. Sample sizewas kept constant across all simulations at 250 individuals, which were obtainedby sampling individuals in the desired generation who did not contribute to thenext generation before regulation to carrying capacity. We additionally investigateda subset of analyses with a reduced sample size of 50 individuals to see how thisdecreased performance of the methods that we found to be most accurate.Individuals were simulated to represent genotypes with neutral microsatellitemarkers, as microsatellite data has been the main data type originally intended forprograms used. The life cycles modeled in nemo were breeding combined withmigration (defined by backward migration rates) followed by population regulation.In each generation, offspring are created from randomly chosen hermaphroditic par-ents until the population is of the desired size, N. Generations were non-overlapping.We simulate a Wright-Fisher model where each offspring has an equal and randomchance of coming from any given parent. By definition, this sets our Ne equal to16chapter 2N, though we clarify that this value differs slightly from the number of breedersthat could be captured by viewing the population at a specific point in time (Waplesand Faulkner, 2009). Mutation rate was set at 10−5 with a single step microsatel-lite mutation model, and 40 freely recombining polymorphic loci were used perindividual, initialized at maximum variance randomly up to the maximum possiblenumber of alleles of 256. Simulations were run until FST was approximately 95% ofits equilibrium value (or mutation-drift balance reached 95% equilibrium for isolationdemographies), calculated from Whitlock (1992).2 .2 .2 Temporal Ne Estimation MethodsWe compared seven different programs that implement 14 different Ne estimationmethods (see Table 2.1). Seven of these methods employ temporal estimation usingdata from two or more points in time. For our datasets, we used two time pointsseparated by one generation. Though it is known that increasing the number of gen-erations between samples (up to at most eight generations between samples) improvesestimation (Wang and Whitlock, 2003), in a real sampling situation it is more difficultto obtain samples separated further in time. Therefore one generation between ismost relevant to the current study, but is expected to produce conservative resultsin terms of performance of temporal estimators. We discuss the methods, and thedefaults used, in turn.tmvp uses a general method of genealogical inference with temporal genetic datato estimate recent changes in Ne (Beaumont, 2003). It incorporates both importancesampling methods and Markov chain Monte Carlo methods to sample independentgenealogical histories from which a posterior distribution of Ne is obtained with aBayesian framework. Parameter settings varied across runs (Table 2.1) to ensureproper convergence of the analysis, where generally longer run times were requiredfor higher migration cases. Posterior computations were performed using R scriptsto calculate the Gelman-Rubin statistic to test for convergence, and the modes for theBayes factor and the highest posterior density limits (Barker, 2011). mlne (Wang andWhitlock, 2003) also uses temporal data, performs both a moment and maximum17chapter 2likelihood approach to estimate Ne, and simultaneously estimates migration ratealong with Ne. Migration rates are estimated by providing an immigration sourcepopulation at the previous temporal time point. cone (Anderson, 2005) is an impor-tance sampling method that calculates the likelihood of Ne with temporal data usingMonte Carlo sampling under the coalescent model of Berthier et al. (2002), based onthe coalescent of gene copies drawn from the more recent time sample.neestimator v2.01 (Do et al., 2014) performs six different estimation methods,three of which use temporal data. These three approaches are based on calculationsof variance of change in allele frequencies as defined by Nei and Tajima (1981) andPollak (1983), which have slight variations in calculations and weightings of loci, andmore recently by Jorde and Ryman (2007) who account for bias in small sample sizesand low frequency alleles by weighting alleles differently.2 .2 .3 Single Sample Ne Estimation MethodsThe seven remaining methodologies use samples from only one point in time. estim(Vitalis and Couvet, 2001a) is a moment-based estimator that uses probabilities ofone- and two-locus identity states. It also simultaneously estimates Ne and migra-tion. In small populations, drift can result in associations between occurring genecopies (Hill and Robertson, 1968), so estim defines a “within-subpopulation identitydisequilibrium“ to account for this correlation of gene identities (Vitalis and Couvet,2001b). Runs were performed with random mating and mutation rate set to ourknown parameter of 10−5. It is worth noting that the true mutation rate is unlikelyto be known in a natural population, and we have not tested the sensitivity of themethod to varying this parameter. onesamp is designed specifically for use withmicrosatellites and uses approximate Bayesian computation to estimate Ne (Tallmonet al., 2008; Beaumont et al., 2009). It combines eight summary statistics specific to theinference of Ne, including several allele characteristics, heterozygosity, homozygosity,Wright’s FIS, and a linkage characteristic, which are estimated for simulated datasets.Due to computational limitations from long run times, onesamp was tested on onlya subset of Ne = 500 isolation replicates and not tested for Ne = 5000. colony218chapter 2(Wang, 2009) implements a maximum likelihood method and estimates Ne with eithera heterozygote-excess method or a sibship relatedness estimate. We did not providerelatedness information with our data to this program, as it is not often available, andthus do not include colony’s sibship results in our methods comparisons.The remaining three single sample estimators from neestimator use LD in-formation, as previously implemented in the program ldne (Waples and Do, 2008),another form of the heterozygote-excess method as described in Pudovkin et al. (1996),and a method by Nomura (2008) using a parent-based coancestry estimate amongindividuals in one cohort. ldne estimates Ne based on the amount of LD within ademe (Hill, 1981) and corrects for bias in calculating over a wide range of sample sizes(Waples and Do, 2008). 95% confidence intervals were obtained from the jackknifeoption, as parametric CIs are believed to be too narrow when locus pairs are notentirely independent. neestimator allows minimum allele frequency (MAF) cutoffvalues to be chosen for the analysis for which we chose a cutoff value of 0.05 asrecommended by the authors to ensure that results were not being driven by thepresence of rare alleles in the data.2 .2 .4 Composite EstimatesBecause each method uses different information from the data, we also tested if acombination of the most informative methods improved the accuracy and precisionof estimates. We combined estimates of Ne across a subset of the methods found toperform best to additionally create a more fair comparison between single samplemethods that would otherwise use only half as much data compared to temporalmethods. We averaged the estimates of 12Ne to produce the composite estimates. Thecomposite estimates tested averaged the results from combinations of (1) mlne’slikelihood method, (2) mlne’s moment method, (3) ldne’s estimate calculated ateach of the two time points used for the temporal methods, and (4) tmvp.19chapter 22 .2 .5 Analysis of EstimatesEstimates from each method were converted to values of 1/(2Ne), as this is the rateat which genetic drift occurs and therefore the value of interest against which wemeasure error (see Wang and Whitlock 2003). We assessed methods by calculatingroot mean square error (RMSE) of this value. RMSE accounts for both bias andprecision of the methods, estimating the typical error in an estimate:RMSE =√1nn∑i=1( 1Nˆe− 1TrueNe)2(2.1)where Nˆe equals the estimated Ne of the population and n the number of replicateestimates. Because we simulated a Wright-Fisher model, true Ne is known in ourcases, an advantage of simulated data, allowing an accurate and precise measure ofthe error in any given method. Whole metapopulation Ne was calculated accordingto Wright (1943) where Ne metapopulation =Ne patch × number patches1−FST , and FST is calculatedaccording to Wright (1931): FST = 14Nem+1 .The amount of drift that even an ideal population experiences in a given gener-ation may differ from the expectation of the Wright-Fisher model, because of ran-dom deviations from the expected distribution of reproductive success (Waples andFaulkner, 2009). This generates an additional source of error in the estimation of theNe of the system using data from a single generation. However, the goal of moststudies is to estimate the Ne (or 1Ne ) of a population over all instances of the driftprocess; this expected Ne is what helps best predict the amount of drift in a populationin future generations (for conservation purposes) or for comparison to other specieswith different mating systems or life histories (for basic science reasons). Thereforewe use the expected Ne (i.e., N from a Fisher-Wright population) as our standard ofcomparison when calculating 1/(2Ne ).RMSE was averaged across all scenarios and then within groupings of scenarios(isolation, island model, stepping stone) to describe methods that performed bestoverall. We calculated RMSE after removing estimates where the program indicated afailed run, as described in the reference for each method, or estimates that indicated20chapter 2invalid answers. (estim and the temporal neestimator methods produced zeroestimates in a subset of cases; personal communication with R. Vitalis and C. Doconfirmed that these are cases where the user is best to apply a different estimationmethod.) Additionally, as described by Do et al. (2014), negative estimates fromneestimator were taken to indicate an infinitely large Ne. 95% confidence intervalsfor RMSE were obtained by bootstrapping across Ne estimates within a demographicscenario with 10,000 replicate bootstrap samples each. Methods were further assessedper specific demographic scenario individually. We also examined the coverage prob-ability of containing the true value of Ne for confidence intervals produced from eachestimate.2 .3 resultsResults for all programs and scenarios are reported throughout the figures and tablesin the text and the supplemental material. Figure 2.2 shows the RMSE of all methodsfor the isolation (no migration) cases, the island model migration scenarios, and thestepping stone scenarios respectively. Numeric values of RMSE for all methods andscenarios are provided in Tables A.1–A.5 while Table 2.2 provides averaged RMSEs forthe best-performing methods and Table 2.3 for their composite estimates. Coverageprobabilities for confidence intervals are shown in Tables A.6–A.8, and the proportionof times that each estimator produced an infinite estimate of Ne is shown in FigureA.2. Point estimates of Ne with 95% confidence intervals for all replicates across allmethods and scenarios are visualized to show accuracy and precision in Figures A.3–A.21.The best (i.e., lowest) RMSE on average across all possible demographic scenarioswas mlne’s likelihood method (RMSE = 0.00240; see Table 2.2). However, becauseRMSE varied greatly across demographic scenarios, we describe results per class ofscenario in the following sections. We group results into demographic classes of nomigration, which contains scenarios of isolated populations, and migration scenarioswhich contain groupings of single patches as either sinks or within the island modelmetapopulation, as well as the whole metapopulation taken as one population, and21chapter 20.00010.0010.010.1a)           No migrationll lIsolatedNe =     50 500 5000RMSE( 1/2Ne )Ne = 500.000010.00010.0010.010.11Sink Within Whole Sink Within Wholeb)                        Island ModelNe = 5000.000010.00010.0010.010.11m = 0.01 m = 0.1Ne = 500.000010.00010.0010.010.11Within Whole Within Whole Within Wholec)                      Stepping StoneNe = 5000.000010.00010.0010.010.11m = 0.01 m = 0.1 m = 0.25lllEstimColony, HetExcessONeSampTMVPCoNeLDNeMLNe likelihoodMLNe momentMLNe lik. w/ mig. sourceMLNe mom. w/ mig. sourceHetExcessCoancestryPollakNei/TajimaJorde/RymanF igure 2 .2 : RMSE for all methods across (A) isolation scenarios, (B) island-modelmigration scenarios, and (C) IBD stepping-stone scenarios. The only sink case shownfor island model cases is m = 0.1 immigration rate from the metapopulation, as allsink cases showed similar results. The edge patch sampling case is not shown for thestepping stone model, as results were very similar to the within case. Note that thescale on the y-axis in panel (A) differs from those in panels (B) and (C). Bootstrapped95% confidence intervals are shown only in (A), as the intervals are smaller or equalto the point size for all but ten estimates of RMSE in (B) and (C).22chapter 2Table 2 .2 : RMSE averaged over groups of demographic scenarios for the bestperforming estimation methods.OverallAverageRMSEIsolatedN50IsolatedN500One PatchMigrationN50One PatchMigrationN500Metapop’nMigrationN50Metapop’nMigrationN500One PatchIBD N50One PatchIBD N500Metapop’nIBD N50Metapop’nIBD N500ldne 0.00828 0.00163 0.00094 0.00146 0.00409 0.00074 0.00009 0.00263 0.01977 0.04595 0.00841mlnelikelihood 0.00240 0.00341 0.00141 0.00347 0.00053 0.00021 0.00004 0.00351 0.00606 0.00011 0.00048mlne lik.with source 0.00242 – – 0.00119 0.00054 – – 0.00115 0.00785 – –mlnemoment 0.00247 0.00349 0.00202 0.00229 0.00020 0.00041 0.00006 0.00214 0.00952 0.00031 0.00036mlne mom.with source 0.00299 – – 0.00203 0.00054 – – 0.00181 0.00870 – –tmvp 0.00281 0.00349 0.00176 0.00236 0.00034† 0.00020 0.00059 0.00205 0.01129 0.00037 –†Gray boxes indicate the method with the lowest RMSE for that demography grouping.Dashes indicate cases where Ne was not estimated. See text for descriptions of demographicgroupings.† Parameter settings did not converge for TMVP runs, see text.the stepping stone models again with either single patches within the linear steppingstone array, or the whole metapopulation as one population. ldne had the lowestaverage RMSE in cases with no migration (Table 2.2). mlne’s various estimationmethods did best in the island model and stepping stone migration scenarios, withone demographic case where tmvp performs comparably (whole metapopulationcase with local Ne = 50).2 .3 .1 No Migration ScenariosThe three isolation scenarios (Ne = 50, 500, 5000), in which populations experienceno migration, showed ldne to consistently have the lowest RMSE. At Ne = 500,onesamp closely compares to ldne, but is one of the worst methods at Ne = 50.estim has a low RMSE at both Ne = 500 and 5000, but at Ne = 50 estim largelyoverestimates Ne with many very large or infinite estimates. colony’s heterozygoteexcess method tends to estimate infinite Ne in all cases, reflected by its large RMSE at50 and 500 (Fig. 2.2A). mlne’s likelihood estimate performs next best after ldne at500 and 5000 while the remaining majority of methods performed equally well at size50. It is worth noting that the heterozygote excess method (Pudovkin et al., 1996) andthe coancestry method (Nomura, 2008) as implemented in neestimator performpoorly at all population sizes.23chapter 2For the most accurate methods in the isolation scenarios, ldne and mlne likeli-hood, it is useful to examine the coverage probability of confidence intervals aroundeach estimate. Confidence intervals from ldne captured the true Ne 80% of the timefor Ne = 50 and 68% for Ne = 500, though six percent of those intervals spanned toinfinity at their upper end, and 24% for Ne = 5000 with 24% of those upper confidenceintervals spanning to infinity. mlne likelihood confidence intervals contained thetrue value of Ne 82%, 89%, and 16% of the time, respectively, however at Ne = 500, theupper bound on the confidence interval spanned to the maximum prior Ne provided39% of the time. Results for confidence intervals of all methods for no migrationdemographic cases are shown in Table A.6.2 .3 .2 Migration ScenariosRMSEs for the island model and stepping stone migration cases showed higher varia-tion than did the isolation cases (Fig. 2.2). Note that the y-axes on Figure 2.2B and Chave a greater range than in Figure 2.2A and remain on a log scale. neestimator’scoancestry and heterozygote excess methods again fall out as some of the poorerperforming methods. In the scenarios where the whole metapopulation has beensampled randomly (ignoring existing population substructure), many methods showdrastic change in their performance, either estimating Ne much more poorly (e.g.,neestimator’s methods according to Pollak (1983) and Nei and Tajima (1981)),varying from poorer to better depending on migration rate (estim), or improving(mlne moment and at a smaller Ne, tmvp). Several of these changes in performanceare a result of infinite estimates being produced.Among the migration scenarios, we averaged RMSE across single patch cases (asink population in the island model scenarios or an edge population in the step-ping stone scenarios grouped respectively with the population sampled within themetapopulation) and for the metapopulation as a whole at local patch size of 50 or500 across the various migration rates (Table 2.2). No method was consistently bestacross these scenarios. mlne’s likelihood method with a known migration sourceperformed best on single patch cases at Ne = 50 for both the island model and24chapter 2the stepping stone model scenarios, and mlne’s likelihood method with no sourceidentified showed the best RMSE for the whole metapopulation at Ne = 500 in theisland model scenarios, but includes many estimates at the upper bound of the prior,effectively leaving mlne’s moment method as the more accurate method in this case.In the stepping stone scenarios, again mlne’s likelihood method is best in singlepatch cases at both Ne = 50 and 500 while the moment method is best for themetapopulation cases, once accounting for Ne estimates at the upper bound of theprior.Despite not having the lowest RMSE for migration cases on average, ldne stillproduces accurate Ne estimates and has a lower RMSE in individual patch samplingscenarios, particularly at lower migration rates. Its higher average RMSE is drivenby cases where the whole metapopulation is sampled or there is a combination ofhigh migration rate and larger population size (m = 0.1, Ne = 500) in which casesldne produces infinite or inaccurate estimates of Ne. In these demographic scenarios,mlne or tmvp are needed to give appropriate estimates.Additionally, when examining the coverage probability of confidence intervalscontaining the true Ne from these methods, ldne shows greater coverage than mlnefor 18 of the 20 island model sampling scenarios and 16 of the 18 stepping stonescenarios. However, ldne also produces confidence intervals spanning to infinityin eight of those sampling scenarios for the island model and seven for the steppingstone model. Coverage probabilities for all methods are shown in Tables A.6-A.8 forall methods and demographic scenarios.2 .3 .3 Composite Methods and Sample Size ComparisonsThe composite estimates of Ne showed only slight improvements in RMSE in a subsetof methods and demographic scenarios (Table 2.3). Averaged over all scenarios, thecomposite estimate from mlne’s likelihood and moment estimates showed the lowestRMSE. Within grouped demographic scenarios, only in the single patch cases forNe = 50, the IBD stepping stone metapopulation at Ne = 500, and both isolationcases did a composite estimate improve over individual methods’ performances. As25chapter 2Table 2 .3 : RMSE averaged over groups of demographic scenarios for compositeestimates from the best performing methods.OverallAverageRMSEIsolatedN50IsolatedN500One PatchMigrationN50One PatchMigrationN500Metapop’nMigrationN50Metapop’nMigrationN500One PatchIBD N50One PatchIBD N500Metapop’nIBD N50Metapop’nIBD N500ldne / ldne 0.00774 0.00120 0.00090 0.00128 0.00276 0.00074 0.00009 0.00235 0.01939 0.04407 0.00858ldne /ldne/ tmvp 0.00568 0.00141 0.00109 0.00135 0.00104 0.00056 0.00014 0.00183 0.01698 0.02826 –†mlne lik./ moment 0.00197 0.00326 0.00170 0.00159 0.00032 0.00031 0.00005 0.00150 0.00778 0.00019 0.00005mlne lik./ momenttmvp0.00210 0.00330 0.00172 0.00149 0.00030 0.00027 0.00018 0.00130 0.00888 0.00013 –†mlne lik./ momentldne / ldne0.00455 0.00171 0.00120 0.00083 0.00140 0.00052 0.00006 0.00121 0.01342 0.02200 0.00434mlne lik./ momentldne / ldne/ tmvp0.00396 0.00203 0.00130 0.00099 0.00067 0.00045 0.00008 0.00112 0.01316 0.01692 –†Gray boxes indicate the method with the lowest RMSE for that demography grouping. Darkergray boxes indicate cases where the composite estimate showed a lower RMSE than anindividual method’s estimate. Dashes indicate cases where Ne was not estimated. See text fordescriptions of demographic groupings.† Parameter settings did not converge for TMVP runs, see text.expected, the composite estimate of ldne from two time points always improvedupon ldne at one time point for all demographic scenarios, though did not changeits rank performance relative to the best-performing methods averaged overall or forthe migration cases (ldne was already ranked best in isolation scenarios). However,no other composites showed consistent improvement across demographies. The com-posites are generally poorer than individual methods for the whole metapopulationsampling cases or at larger population sizes (Ne > 500).Additionally, comparing RMSEs for our results to those from estimates usinga sample size of only 50 individuals showed the expected result of a decrease inprecision and accuracy in both isolation and migration scenarios (see Fig. A.22).2 .3 .4 Estimating Ne and Migration RatesWe found no consistent effect of different source populations on the performance ofmlne when estimating Ne (see Fig. A.23). RMSE for mlne’s likelihood method atNe = 50 was improved in cases where either a correct or incorrect source populationwas identified. At a local size of 500, these results changed, where improvement onlyoccurred in certain cases for the island model demography, or not at all for the step-26chapter 2ping stone cases. mlne’s moment method showed different but equally inconsistentresults. For Ne = 50, different demographies produced Ne estimates that were moreor less accurate with a migration source. While at population size 500, not providingany migration source was consistently better than providing one for the island modelcases, the opposite was true in the stepping stone model where providing a corrector closely related source both consistently improved upon Ne estimates (where thecorrect source provided was consistently better than the related source).Finally, our last assessment pertained to the accuracy of migration rates that mlneestimates alongside its estimates of Ne. While the method showed high performancein its estimates of Ne, the migration rates it estimated in our cases were less accurate.When the correct migrant source was provided, migration rate estimates were mostaccurate at lower population sizes (Ne = 50) and lower migration rates (m = 0.01 –0.1). This accuracy decreased with increasing population size and increasing migra-tion rates (Fig. A.24). When migration was cut off from the source to the focal patchprior to the generations over which sampling occurred, but still (incorrectly) identifiedas a source for estimating Ne, migration rate estimates were even less accurate. Thehistoric migration rate (prior to migration being cut off) was more closely estimatedthan the true migration rate over the sampling period (m = 0). Ne estimates however,were either not affected or only slightly reduced in accuracy in these scenarios.2 .4 discussionEffective population size is clearly useful in population genetics and conservationbiology, yet as we have shown, its estimation is fraught with difficulties. By defi-nition, estimating effective population size derives from the ability to quantify theamount of random genetic drift a population experiences. Migration complicates thisquantification of drift by changing allele frequencies in less predictable ways. Manymethods have been developed to estimate Ne in natural systems, and our resultsshow that these methods vary highly in performance across demographic scenarios ofdiffering migration rates and population sizes. Some methods show consistently poorperformance and are best avoided, while we can recommend several methods that27chapter 2show the most accuracy and precision in our demographic scenarios (summarizedin Table 2.4). Because Ne is used in many population genetic equations as well asto inform conservation decisions (e.g., Shaffer 1981; Rieman 2001), it is vital that weestimate it as accurately as possible.Several methods were consistently poor estimators across all of our demographicscenarios, that is there was always a better method to choose from. The poorest per-forming methods are reassuringly mainly those with the fewest number of citationsout of the methods we evaluated. neestimator’s coancestry and heterozygote ex-cess methods were two such methods that stand out as consistently poor performers.However, the third most highly cited program, onesamp, has been shown to performsuboptimally in our demographic scenarios. We found no cases where it showed thelowest RMSE of all possible methods in any of our scenarios. Given that this methoduses a Bayesian approach, adjusting the prior expectation may improve results, asis claimed by Holleley et al. (2013). However, it seems unlikely that the majority ofsystems in which Ne is estimated are studied well enough to create an informativeprior; therefore we believe the wide prior we provided is a valid assessment of thismethod. Additionally, this program is subject to very long-run times, a limitationin our study that prevented assessing the method at all of our larger Ne cases sinceindividual run times were greater than 30 days at these sizes.Heterozygote excess approaches understandably performed poorly at higher Nevalues, as the theory upon which this estimation is based will only detect an excessof heterozygotes when a population is effectively small. Therefore, at larger sizes,a lack of heterozygote excess may not allow a large Ne to be distinguished from aneven larger Ne, explaining the lack of accuracy in those cases. Yet even when Ne = 50,these methods (heterozygote excess methods of colony2 and neestimator) wereworse than others. Though colony2’s likelihood approach, which is implementedalongside its heterozygote excess method, largely produced infinite estimates of Newith no sibship data provided, this method is expected to improve when such dataare available.28chapter 2Table 2 .4 : Summarized results across methods.Program Recommendation Reasonscolony2 v2.0 Heterozygote excess method:not recommendedOverestimates or estimates infinite valuesSibship method: not tested —cone v1.01 Not recommended for migra-tion scenariosWorks well in ideal scenarios, often fails in migra-tion scenarios, particularly with large Neestim v1.2 Not recommended Overestimates in ideal and migration scenarios andunderestimates in metapopulation and large Nestepping stone scenariosmlne v1.0 Recommended Moment method superior in migration scenariosand moment and likelihood methods both performwell over all scenarios, though estimates of migra-tion rate often inaccurateneestimatorv2.0ldne: recommended Superior performance in ideal scenarios, performswell in migration scenarios with low m and smallerNeOther estimators: not recom-mendedInconsistent or inaccurate estimatesonesamp Not recommended Overestimates small Ne, long run times at large Neprohibited testingtmvp Recommended with reserva-tionWorks well at small Ne in ideal and migrationscenarios as well as for whole metapopulation, butlong run times and difficult to establish workingparameters sets for analysesThe methods which we found to perform best were ldne, mlne, and tmvp.These methods were shown to be more reliable in terms of most consistently estimat-ing Ne across demographic scenarios, most accurate, and least biased. Fortunately,this fits with usage patterns; ldne is the most highly cited method, and mlne thesecond most cited, according to Web of Science as of January 12, 2015. ldne performsbest in situations with little to no migration (m = 0.01). This has been previously de-scribed by Waples and England (2011) where they found ldne to perform adequatelyup to migration rates of approximately 5–10%. With migration rates of 1%, we find itsestimates of Ne are still accurate and have appropriate confidence intervals at lowerpopulation sizes. Interestingly, ldne’s performance deteriorates greatly at highermigration rates when the population size is larger (Ne = 500), in which case it wouldnot be advisable to apply ldne.Neel et al. (2013) evaluated ldne in the context of continuously distributed pop-ulations, examining the effect of sampling too many or too few individuals relative29chapter 2to the area over which individuals mate randomly. They found that as long as sam-pling was restricted to the scale of local breeding (i.e., sampling window = breedingwindow), ldne performed well. However, with an increasing sampling window, Newas underestimated. We see this underestimation occur in our stepping stone modelas well, when we sample across the whole metapopulation. But this underestimationis also seen even when our sampling window matches the scope of local breeding atlarger population sizes (Ne = 500). Similarly, in the island model demography at ametapopulation migration rate of m = 0.1 when Ne = 500, Ne is also underestimated,as well as only to a slight extent in the stepping stone model at m = 0.25 and Ne = 50.Cases with population substructure are of particular significance. In our casesof sampling across the metapopulation in the island model, ldne produces infiniteestimates of Ne at both migration rates tested, while in the stepping stone model iteither underestimates Ne (when m = 0.01) or again produces infinite estimates. Evenat some higher migration cases, ldne is still not the worst method, and it oftenunderestimates the true Ne, which in terms of conservation efforts only results in aconservative answer. Only when the method fails to produce a finite Ne estimatein the substructure cases may this become dangerous, as one cannot differentiatewhether the population is truly large enough not to experience detectable drift or ifthe method is unable to estimate an accurate Ne.mlne, which simultaneously implements a moment and a likelihood estimateof Ne, appears to be the best program in cases of higher migration and populationsubstructure. One might expect higher migration rates to reflect a subpopulation’s Nebeing estimated closer to that of the whole metapopulation as m increases, howeverthis trend is not generally seen in either the island model or stepping stone modelscenarios. Most methods underestimate the local Ne, except for mlne’s likelihoodmethod that does show an average increase in local Ne with increasing migration inthe stepping stone model. Ryman et al. (2013) evaluated a temporal method in casesof migration and population substructure and found bias towards underestimatingNe. mlne’s likelihood method shows this trend only in the stepping stone scenariosat Ne = 500 when using a single focal patch (i.e., not the entire metapopulation). The30chapter 2moment method of mlne shows a similar bias toward underestimating Ne in thosesame stepping stone single population cases at both Ne = 50 and 500 as well as veryslight underestimation of the true Ne in the island model sink populations with highermigration rates. Barring those cases, mlne shows very accurate estimates of Ne. Asdescribed by Ryman et al. (2013), estimating the whole metapopulation Ne may beimproved by increasing sampling. Without increasing our sample size, we still findmlne’s moment method to estimate the whole metapopulation Ne quite accurately,or when less accurate, is still the most accurate estimate out of all the methods.tmvp produced very accurate Ne estimates in the migration cases. Despite itsadequate estimating ability, tmvp proved to be one of the most difficult analyses toconduct. Because the method uses importance sampling and MCMC, the parametersettings for each run must be set to allow for the results to converge on one estimate ofNe, which proved incredibly difficult at some of our larger Ne and m cases. For thosethat we did succeed in finding a suitable set of run parameters, run times exceededseveral weeks, therefore we did not run tmvp for all replicates of simulated datasets.For higher migration, higher Ne, and population substructure cases, our resultslend credit to applying mlne to obtain the most accurate Ne estimates. However, itis important to point out that the coverage probability of mlne’s confidence intervalsfor the likelihood method (the moment method produces no confidence intervals)were inaccurate. ldne’s coverage probability was better than mlne’s on average.Thus, even though mlne provides accurate Ne estimates, users are cautioned againsttrusting the confidence intervals to reliably include the true value of Ne. This has beendescribed previously both for mlne and tmvp (Tallmon et al., 2004), and though wedid find poor coverage for tmvp, it was less than the overconfidence of mlne’sconfidence intervals.An additional promised feature of mlne is to estimate the migration rate from anidentified source population which it also makes use of to improve its estimate of Newhen migration is occurring. When we provided a source population, however, therewas only noticeable improvement in estimation of Ne for those cases having a smallerNe. Interestingly, incorrectly identifying the source of immigration did not reduce the31chapter 2performance of mlne, as might have been expected. Therefore, lack of an identifiedsource of migration should not necessarily deter users from applying mlne.If it is suspected that migration rates may have changed in the recent past, wefound that mlne may produce less accurate Ne estimates. Moreover, relying onmlne to estimate accurate migration rates is not advised. Estimates of migration ratemay improve with more generations between sampling points, but this topic requiresfurther study.Composite estimates did not substantially improve the ability to estimate Ne.Because each of these approaches uses different information from the same data, wehad predicted that combining information could improve accuracy in Ne estimates.Though slight improvements in RMSE were seen, these results cannot be extrapolatedwidely beyond our migration scenarios. Poorly performing methods erred more dueto bias, rather than imprecision. Averages of biased results can only increase precisionaround an incorrect value, and any biases in opposite directions across methods thatmight effectively cancel out to give accurate results are likely to be idiosyncratic. It isclear that any unbiased single sample estimator would improve by averaging it overtwo temporal samples, as our results for ldne show. However, one of the majorhighlights of ldne is that it does not require temporal data, which is not feasiblyattainable in many study systems (Waples and Do, 2008).Choice of estimation method can greatly benefit from knowledge of a system’s un-derlying demography. There is no one best method across all scenarios, so designinga study suitable to a method that is predicted to work best under certain demographicconditions can increase accuracy in estimating Ne. Future study may also be able totest whether a more complex weighting of the different estimators may improve uponcomposite estimates.The ease of implementing an estimation method is also worth considering. ldneand mlne were two of the simplest programs to implement from a user’s pointof view. neestimator is very user-friendly, produces its estimates in a matter ofseconds to minutes, and is also one of the more actively maintained programs. mlneby default performs its moment and likelihood methods together and runs in a matter32chapter 2of minutes to hours. Initial setup of mlne may be slightly more daunting to users lessfamiliar with working with various software as it requires some additional steps tocompile on a Linux or Unix-based machine. Likewise, tmvp comes with difficultiesin its implementation in that it can be difficult to find a set of run parameters thatfunction properly to allow convergence of the analysis, run times are on the order ofdays, and there is an additional postprocessing step to acquire the Ne estimate fromits output.In conclusion, complex demographic histories make estimating Ne difficult, but interms of accuracy and performance, mlne and ldne have been shown to be the bestexisting methods to estimate contemporary Ne of a population, with one method orthe other being better depending on knowledge of migration rates and approximateidea of the magnitude of size of the population at hand.33chapter 3Local maladaptation reduces expansion load during speciesrange expansion3 .1 introductionSpecies range expansion is a complex process that is widely studied in evolutionarybiology, ecology, and conservation biology (Hastings et al., 2005; Phillips et al., 2006;Excoffier et al., 2009; Hallatschek and Nelson, 2010; Chen et al., 2011; Colautti andBarrett, 2013). An array of ecological and evolutionary factors are known to affectthe success or failure of range expansion, such as dispersal limitation (Hastings et al.,2005; Marsico and Hellmann, 2009; Hargreaves et al., 2014), interspecific competition(Case and Taper, 2000; Price and Kirkpatrick, 2009; Svenning et al., 2014; Louthanet al., 2015), or an inability to adapt to new conditions (Angert et al., 2008; Holt andBarfield, 2011; Polechová and Barton, 2015). A recently growing field of research hasfocused on an additional effect: the genetic load accumulated during the expansionprocess (Excoffier et al., 2009; Hallatschek and Nelson, 2010; Peischl et al., 2013; Peischland Excoffier, 2015; Peischl et al., 2015). In particular, the increasing frequency ofdeleterious variants in human populations that have expanded out of Africa has beenwidely studied (Lohmueller et al., 2008; Do et al., 2015; Henn et al., 2016). A furtherprocess that can increase genetic load during range expansion is maladaptation to thelocal environment due to gene flow from foreign environments, known as migrationload (Kirkpatrick and Barton, 1997; Barton, 2001; Polechová and Barton, 2015). Inthis study, we explore the impacts and interactions of heterogeneous selection dueto an environmental gradient and the accumulation of unconditionally deleteriousmutations during range expansion.34chapter 3During range expansions, an intriguing suite of population genetic processes canchange the course of evolution compared to populations that are growing in sizewithout movement. Populations at the expanding front of a species range undergoserial founder events where each new colonization into further territory creates apopulation bottleneck, leading to reduced genetic diversity in what becomes the newedge population and the primary source of future colonists. These processes createa persistently reduced effective population size at the range edge, decreasing theefficacy of selection and increasing the strength of random genetic drift. This leads tothe process of allele surfing (Klopfstein et al., 2006), whereby a deleterious mutationthat arises at the range edge is more likely to drift to high frequency than it otherwisewould if it arose in the denser core of the species range. Because selection is inefficient,strongly deleterious alleles may persist and reach higher frequencies than expectedin a population at equilibrium (Peischl and Excoffier, 2015). Theoretical work showsthat allele surfing is the main cause of increased deleterious allele frequency withinrecently expanded populations (Excoffier et al., 2009). The reduction in fitness due tothe accumulation of these deleterious alleles is termed expansion load (Peischl et al.,2013; Peischl and Excoffier, 2015).Empirical evidence also supports the predictions of both higher frequency andincreased effect sizes of deleterious mutations in recently expanded populations. Ex-pansion load has been posited as an explanation for the accumulation of deleteri-ous alleles in human populations that have undergone recent geographic expansions(Karlsson et al., 2014), e.g. genetic diseases in populations of Quebec (Labuda et al.,1997; Scriver, 2001; Yotova et al., 2005) and Scandinavia (Norio, 2003). A large pointof controversy, however, is the likelihood of these deleterious mutations persistingat high frequencies in modern populations. As human expansion out of Africaproceeded, deleterious mutations of larger effect rose in frequency, shifting the entiredistribution of allelic effect sizes upwards through time (Henn et al., 2016). Expansionload may thus have serious repercussions in many other species that undergo expan-sions, such as those recolonizing formerly glaciated areas (Hewitt, 1999), colonizing anew continent (Sakai et al., 2001), or tracking recent climate change (Chen et al., 2011).35chapter 3Understanding the detriment caused by this process may aid in efforts to combatinvasive species or in assisted migration. As global climate change proceeds, morespecies may be required to move their ranges in order to survive, and if fitness lossessimply due to the expansion process are common, populations may be left at higherrisk for extinction from stochastic, catastrophic events.A second evolutionary process that can affect range expansion is local adaptationto heterogeneous environmental conditions. Selective environments can vary overspace: e.g., environmental gradients in temperature or photoperiod from equatorialto polar latitudes influence important traits (Conover, 1992; Montague et al., 2008). Afoundational study by Kirkpatrick and Barton (1997) showed that steep environmentalgradients can impede range expansion due to the migration of individuals from therange core to the range edge, preventing local adaptation to edge conditions. Centralpopulations existing on an environmental gradient receive symmetric migration frompopulations both higher and lower on the gradient, leaving the population meanunchanged. In contrast, at a range edge, migration is asymmetric from core to edgeas small edge populations receive proportionally more immigration from the denserspecies core, causing edge populations to be swamped by locally maladaptive allelesand diverge from their local optimum. Therefore, when migration rates are highenough, the edge population can fail to adapt. When the environmental gradientis steep enough, the edge populations can experience sufficient fitness reductionsto result in local extinction and prevent range expansion. Further studies (Barton,2001; Polechová and Barton, 2015) have increased biological realism to models ofrange expansion (including evolution of genetic variance and effects of genetic drift),confirming theoretically that evolutionary processes alone can lead to the formationof a stable range edge.It is clear that expansion load can result in reduced fitness of recently expandedpopulations, and that heterogeneous selection along an environmental gradient canresult in local maladaptation of populations at the edge of a species range. These twoevolutionary processes – maladaptation from underlying environmental gradientsand expansion load from surfing of deleterious mutations at range edges – may occur36chapter 3simultaneously in many range expansions and could interact in interesting ways thathave not previously been studied. We consider two contrasting a priori hypotheses forhow they may interact. One possibility is that local maladaptation and expansion loadmay interact positively (i.e., each making the effects of the other more pronounced).This may be expected because both processes could lead to smaller population sizesat the expanding front, and both are stronger with greater genetic drift: expansionload increases because greater drift would cause more deleterious alleles to increasein frequency, and maladaptation worsens because greater drift would eliminate moreof the genetic variance needed to respond to local selection. Alternatively, expansionload and local maladaptation could negatively interact, with each partially amelio-rating the effects of the other. This is plausible because both forms of load couldreduce the pace at which ranges expand by reducing the fitness of individuals atthe range margin. Slower expansion enables fitter alleles to reach the edge throughmigration and to rescue populations suffering from accumulated expansion load.Similarly, slower range expansion could allow more time for populations to adaptto local conditions.We investigate these alternative hypotheses to establish the role of expansionload and migration load in isolation as well as in combination. Our goal is toemploy the most biologically reasonable parameters possible within the realm of ourmodel. Using individual-based simulations on a two-dimensional, spatially explicit,and approximately continuous landscape, we compare the reductions in fitness (load)that populations experience during range expansions over a series of environmentalgradients and deleterious mutation rates. These results have implications for the pre-dicted prevalence of expansion load in species across various types of environmentsand inform our understanding of the complex demographic and genetic processesthat occur during range expansions.3 .2 methodsWe model a species range expansion forward in time using the simulation programnemo (Guillaume and Rougemont, 2006). The population undergoes an initial burn-37chapter 3in period to mutation-selection equilibrium, after which it is allowed to expand acrossan empty landscape. Individuals are monoecious and diploid, possessing both aquantitative trait experiencing stabilizing selection and a set of loci subject to uncon-ditionally deleterious mutations. We compare three environmental gradients overwhich expansion occurs, and three genome-wide deleterious mutation rates. Allcombinations of parameter values are listed in Table B.1. We ran twenty independentreplicates for each of our simulated scenarios.3 .2 .1 The ModelThe environmental gradient underlying the landscape changes in only one dimension,along the axis of expansion. We compare three values for the steepness of thisgradient, b, which defines the change in phenotypic optimum over space: a flatlandscape with no gradient (b = 0.0), a shallow gradient (b = 0.0375), and a steepgradient (b = 0.375). On the shallow gradient, an average dispersal distance of anindividual would result in a fitness loss of 0.5% if it were perfectly adapted in itsnatal environment, whereas on the steep gradient, an average dispersal event wouldreduce an individual’s fitness by 5% given perfect adaptation in its natal environment.Each of these gradients results in a different equilibrium level of migration load inpopulations.We model space explicitly on a 2000× 40 unit landscape using a modified versionof nemo available at Each1× 40 unit slice of the landscape is referred to as a cross-section (Figure 3.1). Thismodification allows a discrete approximation of continuous space. The landscape isdivided into 80,000 1× 1 square units, which we call cells. The dispersal kernel andbreeding window occur across cells, allowing individuals to interact over a largerspatial scale. Lifecycle events occur in the order of breeding, dispersal, selection, andthen population regulation. Subpopulation regulation enforces a carrying capacityof 6 individuals per cell, but because breeding is not limited to within one cell,the realized neighborhood size as defined by Wright (1946) is approximately 300individuals.38chapter 3Populations were initiated at carrying capacity in the left-most 40 cross-sections ofthe landscape, which we term the landscape core (Figure 3.1). After a burn-in periodof 15,000 generations, the remaining 1960 cross-sections of the landscape becomeavailable, allowing for expansion to occur. The environmental optimum is constantacross any given cross-section.CoreDirection of expansionCross−sectionF igure 3 .1 : Landscape graphic representing an example of the landscape simulatedwith a grid of units where the burn-in period occurs in the landscape core at the left,expansion proceeds to the right, and any vertical column of units is a cross-section.Actual landscapes were 2000× 40 units with a 40× 40 core.Breeding — Our modified version of nemo creates a breeding window withinwhich individuals may search for a mate within their own cell on the landscapeor in nearby cells. Mating probability follows an approximate bivariate Gaussianfunction which we derive by integrating the probability over each cell within thebreeding window. We define the size of this breeding window with the parameterσbreed, where f (x, y) ∝ exp [−( ∆x22σ2breed +∆y22σ2breed)] gives the distance traveled to search fora mate in each dimension of the landscape. σbreed was held constant at 0.5 landscapeunits. The maximum searchable distance was restricted to 4σbreed with probabilitiesrenormalized to correct for this truncation. This results in a two-dimensional breedingwindow where an individual’s focal cell is the most likely source of a mate and the12 surrounding cells could be searched for a mate with decreasing probability. Thisbreeding kernel describes the relative probability of each individual within the possi-ble breeding window being chosen as the male parent for a given offspring. When noother individuals are present within the breeding window, an individual self-fertilizes.This will most often happen at the expanding range edge where population densities39chapter 3are lowest and mimics the ability of many plant species to self under conditions ofpollen limitation (Hargreaves and Eckert, 2014). Each female’s fecundity was drawnfrom a Poisson distribution with mean 7 to determine the number of mating events.D ispersal — We modeled dispersal for most simulations according to an approx-imate bivariate Gaussian kernel, similar to the breeding kernel:f (x, y) ∝ exp [−( ∆x22σ2disperse +∆y22σ2disperse)]. The dispersal kernel gives the forward (in time)migration probabilities for offspring to disperse to a given patch. The maximumdispersal distance was capped at 8σdisperse units. The Gaussian dispersal kernel σdispersewas set to equal two landscape units. We also simulated a leptokurtic kernel to testthe effects of rare, long-distance dispersal events. The leptokurtic dispersal kernel wasa mixture distribution created from a weighted sum of two different Gaussian kernels(Ibrahim et al., 1996) with the same average dispersal distance as the Gaussian kerneland a total kurtosis of 10. This value of kurtosis is not unrealistic for long-distance dis-persal in many species of plants and animals (Skalski and Gilliam, 2000; Lowe, 2009;Guttal et al., 2011). The first distribution used σdisperse equal to 1.5 with a weightingof 0.89, while the second used σdisperse of 5.5 and a weighting of 0.11. The leptokurtickernel was truncated to have the same maximum dispersal distance, which in bothcases meant an individual could at most disperse 16 units in any one direction fromits natal cell. Borders of the 40× 2000 landscape were absorbing, so any individualmigrating beyond the edge was removed. R code for calculating and discretizing thedispersal and breeding kernels is available in a wrapper package written for nemo,aNEMOne, available at: — We modeled a genetic architecture where 100 quantitative trait lociand 1000 loci subject to unconditionally deleterious mutations were randomly andindependently placed on the genome for each simulation. Each genome consistedof ten chromosomes of 100 cM each. We selected realistic mutational parameters forthese loci as follows.The quantitative trait, z, was controlled by 100 additive loci. Each allele at theseloci can take any real value. The evolution of this trait under stabilizing selection40chapter 3depends on the mutational variance, VM, and the inverse strength of stabilizingselection on the trait, ω2. These properties have been empirically estimated for anumber of traits, along with the corresponding environmental variances, VE. For ourmodel we set VM and ω2 relative to the same arbitrary VE value (VE = 1). Thereis evidence that ω2 ≈ 5VP on average (where VP is the phenotypic trait variance;Kingsolver et al. (2001); Johnson and Barton (2005)). The relationship between VP andVE can be expressed in terms of heritability, h2 = 1− VEVP , and gives a reasonable valueof ω2 ≈ 5VE1−h2 . Given VE = 1 and a typical heritability of h2 ≈ 13 (Mousseau et al., 1987;Houle, 1992), we set ω2 = 7.5.Most reported values of VMVE are in the range of 10−4 to 10−3 (Houle et al., 1996).VM depends on the genome-wide rate of mutations affecting the trait, Uz, and theexpected squared effect of a mutation on the trait, E[α2], where VM = UzE[α2]. Ifmutational effects are Gaussian with E[α] = 0, then E[α2] = V[α], the varianceof mutational effects on the trait. These components are difficult to estimate, butaccording to one theoretical approach we expect VP − VE = 4Uzω2 (Turelli, 1984;Charlesworth et al., 2010). Given the parameters above, this implies Uz ≈ 0.02,similar to some direct estimates (Lynch et al., 1998). In our simulations we used100 quantitative trait loci, with a mutation rate per locus of 10−4, giving a diploidmutation rate, Uz, of 0.02. We further assumed V[α] = 0.02 (i.e., mutational effectson z are drawn from a normal distribution with mean 0 and variance 0.02), giving avalue for VM of 4× 10−4.In this model an individual’s trait value is given by the sum of allelic effects acrossall quantitative trait loci, with no dominance. Alleles are continuous, such that amutation’s effect is added to the existing allelic value. The component of fitness(survivorship) for the quantitative trait value z is wz = exp[− (z−zopt)22ω2 ], where zopt isthe optimal trait value for the given location on the landscape. At the start of theburn-in, all individuals were initiated with z equal to the mean zopt of the burn-inarea (core).We also modeled 1000 bi-allelic loci subject to unconditionally deleterious muta-tions. We considered genome-wide diploid deleterious mutation rates, UD, of 0.1 and41chapter 31.0 in order to encompass probable rates for a variety of taxa, including Drosophilamelanogaster (Haag-Liautard et al., 2007), Caenorhabditis elegans (Denver et al., 2004),Arabidopsis thaliana (Shaw et al., 2000), Amsinckia sp. (Schoen, 2005), and possiblynon-human endothermic vertebrates (Baer et al., 2007). However, we note that inhumans UD likely exceeds 1.0 (Keightley, 2012), and UD is likely less than 0.1 in manyother organisms (Baer et al., 2007; Halligan and Keightley, 2009). We also included atreatment where deleterious mutations were absent (UD = 0.0). Deleterious mutationrates per haploid locus were thus 0, 5 × 10−5, and 5 × 10−4, and we allowed forback-mutation at rates of 0, 5× 10−7, and 5 × 10−6, respectively. Once mutated, alocus could not further mutate and the only possible change would be due to a back-mutation. These 1000 loci do not accurately portray the number of possible loci inbiological systems, but are instead an approximation required by simulation. Bymatching our genome-wide mutation rates to realistic values, each of these loci ismore representative of a region of the genome within which a deleterious mutationmay arise. Thus, for the distribution of effects we use, described below, small effectmutations may be approaching saturation, but because these small effect loci do notcontribute substantially to fitness reductions, this lack of realism in terms of thenumber of loci is not detrimental to the accuracy of our model.Though the true distribution of mutational fitness effects is empirically difficultto measure and may be complex, there is evidence that most mutations have smalleffects on fitness, and mutations with large effects on fitness are rare (Eyre-Walkerand Keightley, 2007). We modeled the homozygous fitness effects (s) of deleteriousmutations using a leptokurtic gamma distribution with mean 0.01 and shape 0.3, suchthat most mutations have s < 1% (Keightley, 1994). The mutational effect of each locuswas drawn from this distribution at the start of each independent simulation run andremained constant throughout the run.Estimates of the average dominance (h) of deleterious mutations are scarce, rangewidely, and are subject to various biases (Halligan and Keightley, 2009; Agrawal andWhitlock, 2011). It is likely that most new mutations are partially recessive, and thereis evidence for a negative relationship between h and s (Agrawal and Whitlock, 2011).42chapter 3We follow previous authors (Lynch et al., 1995; Deng and Lynch, 1996) in assumingan exponential relationship between h and s, as well as a mean h of approximately0.37. Specifically, we assume h = exp[−51.1s]/2. In addition, we included a classof deleterious mutations with s = 1 and h = 0, i.e., recessive lethals, and assumedthat 3% of deleterious mutations fall into this class to match the genome-wide ratetypically observed in D. melanogaster (Fry et al., 1999).M imicking Mutation Load — To disentangle the effects of mutation load andexpansion load we compared an additional parameter set to the core set describedabove. When populations are at mutation-selection balance, the reduction in fitnessdue to deleterious mutations is termed the mutation load. During range expansion,deleterious mutations can increase due to genetic drift at the expanding front. Thisexcess load beyond mutation load is termed the expansion load. To mimic the pres-ence of mutation load only, without expansion load, we altered individual fecundityto reduce realized fitness to a level that matches the mutation load measured inthe range core, which has not undergone expansion, for each of our scenarios ofUD = 0.1 and UD = 1.0. This was only done for the cases of Gaussian dispersal.We calculated the equilibrium fitness due only to deleterious mutations in the corelandscape populations and reduced mean fecundity to recreate this same realizedfitness. Mean fecundity in the case matched to UD = 0.1 was approximately 6.475and approximately 3.721 for the UD = 1.0 case.3 .2 .2 AnalysesWe assess the impact of expansion across an environmental gradient and in the pres-ence of deleterious mutations both independently and in combination. We quantifytwo measures from the simulation results: the speed of range expansion and meanfitness at the range edge versus in the core. Fitness is measured after population reg-ulation, therefore includes only individuals surviving each generation. We partitionfitness into each of its contributing components: wz for the quantitative trait and wDfor the deleterious alleles.43chapter 3To track the expanding front we use a single landscape cross section as the rangeedge. This edge was defined as the first cross section away from the core at whichpopulation size is at 50% of the core’s equilibrium population size. The speed ofexpansion measures the number of cross sections over which this edge cross sectiontravels per generation from the end of the burn-in until reaching the second to lastlandscape cross section.We examine fitness at the range edge using a broader definition of the edge toreduce sampling error. We defined this range edge to contain all individuals presentwithin the cross section used to measure expansion speed plus all individuals presentin cross sections remaining before empty landscape. Expansion load measured theexcess load that accumulated at the range edge beyond mutation load, calculated asexpansion load = 1− edge wDcore wD(3.1)Because expansion proceeded at different rates in different scenarios, to create anequivalent comparison of fitness on the landscape we measured populations at thelatest recorded point in the simulations before any individuals had dispersed into thelast 20 cross-sections of the landscape. Since individuals can at most disperse 16 units,this prevented bias from edge effects that might arise upon filling the landscape. Weaveraged population sizes and fitness within cross-sections of the landscape, as wesaw no major variation along this axis.We also examine the distribution of effect sizes for deleterious mutations accu-mulated at the range edge. We examined the distribution of alleles contributing toexpansion load across cases of UD and b. Deleterious mutations were binned by effectsize into 40 quantiles based on the underlying gamma distribution of homozygouseffects. Within a given simulation, we calculated the average load due to loci presentwithin each of these bins as follows. For the edge or the core, we calculated fitness ina given bin of n loci as wD = ∏ni=1(1− hisiφhet,i− siφhom,i), where φhet,i is the frequencyof heterozygotes for a given locus i, and φhom,i is the frequency of homozygotes.Expansion load was then calculated for each bin with equation 3.1. Because fitness44chapter 3is multiplicative across loci, we then calculated average expansion load per locus ineach bin as 1− n√1− expansion load.3 .3 results3 .3 .1 Expansion SpeedNo parameter combinations that we investigated ever led to the formation of a stablerange edge. However, the rate at which populations expanded across the landscapewas impacted by both heterogeneous selection from the environmental gradient anddeleterious mutations. Steeper environmental gradients slowed expansion (Figure3.2). Compared to the case of no gradient, the steep gradient slowed expansion inthe Gaussian dispersal cases by 77− 80%. Leptokurtic dispersal kernels resulted infaster expansion over the landscape, particularly in the absence of an environmentalgradient where expansion speed was 32% to 52% greater for leptokurtic dispersal.Increasing the deleterious genomic mutation rate slightly but significantly decreasedthe speed of expansion (ANOVA F2,358 = 3749.5, p < 0.001). Compared to UD = 0 ,UD = 1, decreased expansion speed by 31.6% with no gradient and by 21.3% on thesteep gradient. Cases without expansion load (fecundity adjusted) increased speedrelative to the cases with expansion load in the absence of a gradient for both values ofUD, with all other cases exhibiting slight increases or decreases in speed. The greatestspeed increase from removing expansion load was by 18.5% for UD = 1 and b = 0.3 .3 .2 Fitness and LoadIn general, heterogeneous selection or deleterious mutations both reduced mean fit-ness in the core and edge, as expected. Furthermore, fitness at the expanding edgewas always reduced relative to that in the core of the species range (Figure 3.3).Overall fitness at the range edge correlated with expansion speed, but the presenceof an environmental gradient more strongly changed the speed of expansion (Sup-plemental Figure B.1). The quantitative fitness component, w¯z, in the core was nearlyidentical across UD cases and ranged from 0.87 to 0.93 (Table B.1). For the quantitative45chapter 30123456UD = 0.0 UD = 0.1 UD = 1.00.0 0.03750.3750.0 0.03750.3750.0 0.03750.375Slope of environmental gradient (b )Speed of ExpansionlllllllllllllllllGaussian dispersal kernelLeptokurtic dispersal kernelNo expansion loadF igure 3 .2 : Average speed of expansion across scenarios. Circles indicate Gaussiandispersal kernels while triangles indicate leptokurtic dispersal kernels. Dashed linesindicates fecundity-adjusted simulations with Gaussian dispersal, which replicate theloss in realized fitness due to mutation load, but lack expansion load because nodeleterious alleles are present. 95% confidence intervals are smaller than the plotpoints in all cases.trait, fitness reduction at the edge only occurred in the presence of an environmentalgradient and ranged from a 25− 77% reduction in w¯z at the edge relative to the core.The most severe fitness loss at a range edge (77%) was for UD = 0 on the steepgradient (b = 0.375).The fitness component for loci with deleterious alleles, w¯D, in the range corereflects the equilibrium mutation load reached in the respective UD cases. For UD =0.1, core w¯D was 0.92, while for UD = 1, core w¯D was 0.53. Edge w¯D ranged from0.80− 0.89 for UD = 0.1 and from 0.32− 0.44 for UD = 1. Table B.1 reports all fitnessvalues for the respective scenarios simulated.Expansion load for our parameter sets caused at most a 39% reduction in fitness,while at its weakest resulted in a 3.5% decrease in fitness (Figure 3.4). Across thetwo UD cases, UD = 1 has a 2.95-fold increase in expansion load over the case of46chapter FitnessUD = 0.0 UD = 0.1 UD = 1.0a)  Quantitative Trait0.0 0.03750.3750.0 0.03750.3750.0 0.03750.375Slope of environmental gradient (b )CoreEdge0. UD = 0.0 UD = 0.1 UD = 1.0b)  Deleterious Mutations0.0 0.03750.3750.0 0.03750.3750.0 0.03750.375Slope of environmental gradient (b )CoreEdgeF igure 3 .3 : Average core and edge fitness components for quantitative trait andglobally deleterious alleles. Dashed lines indicate fecundity-adjusted simulationswhich replicate the loss in realized fitness due to mutation load, but lack expansionload because no deleterious alleles are present. Error bars indicate 95% confidenceintervals.UD = 0.1 in the absence of an environmental gradient, while on the steepest gradientthis change in UD resulted in a 5.1-fold increase in expansion load.Leptokurtic dispersal increased expansion speed, but did not significantly changefitness from the Gaussian results (Supplemental Figure B.2). At a given point on thelandscape, fitness recovered after the expanding front passes (Supplemental FigureB.3). Supplemental movies (Figures B.4 - B.6) show the effects on fitness reduction atthe range edge due to surfing and local maladaptation, as well as recovery throughtime due to evolutionary rescue.3 .3 .3 Interaction of Expansion Load and Heterogeneous SelectionWe next investigate how each individual component of fitness is affected by reducedfitness in the other component. We examine the impact of expansion load on thelevel of local maladaptation in populations and alternatively if local maladaptationaffects the degree of expansion load. First, we find that despite reduced overallfitness, increased load due to unconditionally deleterious alleles improves the level of47chapter 0.03750.375Slope of environmental gradient (b )Expansion LoadUD = 1UD = 0.1UD = 0F igure 3 .4 : Average expansion load across all combinations of environmentalgradients and genome-wide deleterious mutation rates. Error bars indicate 95%confidence intervals.adaptation via the quantitative trait to the local environment at the range edge (Figure3.3a). This becomes clear when we consider the fecundity-adjustment simulations thatlack expansion load. Within a given value of environmental gradient, w¯z increases atthe expanding front when mutation load is the only effect present (fecundity-adjustedruns) and increases even further in the presence of both mutation load and expansionload (UD = 0.1 and UD = 1). The largest increase in quantitative trait fitness occurson the weak gradient (b = 0.0375) between UD = 0 and UD = 1, where the presenceof mutation load leads to a 30% improvement in wz on average, and the presence ofmutation and expansion load together increase wz by 70%.Second, we find that the severity of expansion load is substantially reduced bythe presence of an environmental gradient (Figure 3.4). Expansion load decreaseswith increasing gradient steepness, though overall fitness still decreases on steepergradients. In the case of UD = 1, there is a 54% reduction in expansion load from48chapter 3the no gradient to the steepest gradient. In the case of UD = 0.1, expansion load isreduced by the steepest gradient by 73% compared to the case with no environmentalgradient.0204060801000.00 0.01 0.02 0.03 0.04 0.05 0.06 [0.0633 − 1]Density of loci0.00000.00050.00100.0015Average expansion load per locusHomozygous effect size (s)a) UD = 0.1b  = 0.0b  = 0.0375b  = 0.3750204060801000.00 0.01 0.02 0.03 0.04 0.05 0.06 [0.0633 − 1]Density of loci0.00000.00050.00100.0015Average expansion load per locusHomozygous effect size (s)b) UD = 1.0b  = 0.0b  = 0.0375b  = 0.375F igure 3 .5 : Average expansion load per locus at the range edge. Loci are binnedinto quantiles drawn from our modeled gamma distribution, shown by the histogramof loci density. Average load per locus is then calculated for each of these bins andplotted for each value of the environmental gradient3 .3 .4 Loci Contributing to Expansion LoadWe investigated the distribution of effect sizes for deleterious mutations accumulatingon the expanding population front. For UD = 1, the majority of expansion load is due49chapter 3to alleles of intermediate effect (Figure 3.5b). On the steep gradient (b = 0.375), thereis less load overall due to each locus, but a more even distribution across loci of bothintermediate and large effect. On the shallow gradient and with no gradient, there arefewer large-effect alleles contributing to load relative to the steep gradient, but alsomany more intermediate-sized alleles contributing to more overall load. For UD = 0.1(Figure 3.5a), there is less overall load, so lower averages within most ranges of alleliceffect size. However, we see an inflation of larger effect alleles above that seen for theUD = 1 case in the absence of an environmental gradient.There are very few loci present in the larger effect size classes that contributedisproportionately to expansion load. As can be seen in Figure 3.6, some of theselarger effect loci have fixed at the range edge (see also Supplemental Figure B.7). Theeffect of surfing that contributes to the fixation of these strongly-deleterious allelescan be seen in animation over time in Supplemental Figure B.8, where recovery fromfixation follows behind the expanding wave front.F igure 3 .6 : Snapshot in time of deleterious allele frequencies across the landscape,by homozygous effect size, s. This example is at 250 generations after burn-in for thecase of a leptokurtic dispersal kernel, with UD = 0.1 and b = 0. This example wasnot randomly chosen, but selected to show that large effect loci can locally fix underthese conditions. The locus indicated by the thickest orange line has an effect size of0.0898.50chapter 33 .4 discussionRange expansions are unique demographic events that lead to an interesting suiteof population genetic processes. These processes have been widely studied, yet thecombination of both a heterogeneous environmental gradient and expansion loadfrom deleterious mutations has not previously been investigated. Previous theoreticalstudies focusing independently on either the expansion load or adaptation along anenvironmental gradients predicted reduced fitness at the range edge. We find thatthese factors interact, so that the load at the expanding range edge is not as greatas would be expected by a simple combination of the two. Both expansion loadand local maladaptation to an environmental gradient reduce fitness at the rangeedge, but the mechanism of fitness reduction is important in the process of rangeexpansion. Whether fitness is reduced due to local maladaptation or due to expansionload impacts the rate of range expansion, and therefore the rate of evolutionary rescueand thus the load that accumulates during expansion. This interaction is importantfor predicting the dynamics of modern range expansions due to natural phenomenaor human-induced climate change.Our results corroborate previous studies, showing that expansion load accumu-lates at expanding range edges. Even though our simulation design differs fromprevious models investigating expansion load, we still find substantial load accumu-lating throughout the course of range expansion. The presence of hard versus softselection can change the amount of expansion load accumulated (Peischl et al., 2013;Peischl and Excoffier, 2015), as can one- versus two-dimensional landscape models(Peischl et al., 2013). Several other differences such as a range of mutational effectsfor deleterious alleles and continuous dispersal rather than a stepping-stone modelmight also contribute to the amount of expansion load.Our main result is that expansion load and the load due to local maladaptationinteract. Local maladaptation at the range edge is not as severe in the presence ofexpansion load, and of an even greater effect, expansion load is not as severe in thepresence of a strong environmental gradient causing local maladaptation. We believethat the dominant reasons differ for these two patterns of interaction.51chapter 3First, let us consider the improvement in the local adaptation at the range marginin the presence of expansion load. For a population to expand, the individuals atthe expanding front must have an absolute fitness greater than one, on average. Thismeans that a range margin will occur where absolute fitness drops below one. Asa consequence, when there is greater load in one component of fitness, such as thatcaused by deleterious mutations across the genome, the population will not persistunless other fitness components are large enough to allow sufficient absolute fitness.Therefore, when expansion load is greatest, as occurs at the range edge, the fitnessdue to local adaptation must be higher in order for the population to have a largeenough fitness to even exist. Note that this improvement in local adaptation is morea case of expansion load eliminating maladapted populations at the expanding frontrather than a reduction in load. As a result, greater adaptation in the quantitative traitis found in the presence of expansion load.On the other hand, we also observe that expansion load can be greatly reduced inthe presence of local maladaptation. In this case, we believe that there is an additionalfactor explaining this interaction. In the absence of an environmental gradient, wherethe greatest expansion load accumulated (39%), range expansion is mainly limitedby dispersal ability. However, with local maladaptation caused by an environmentalgradient, expansion is further slowed by the need for colonizing populations to adaptto the novel local environment. In fact, range expansion is slowed considerablymore by a changing selective environment than by a high frequency of uniformlydeleterious alleles (See figure B.1). As a result, the local maladaptation caused by aheterogeneous environment causes the rate of range expansion to slow substantially(Figure 3.2), which allows more time for evolutionary rescue of marginal populations.As range expansion slows, selection has more opportunity to reduce the frequency ofdeleterious alleles at the edge. Moreover, with slower expansion, high fitness allelesfrom the center of the range have more time to disperse toward the edge restoringgenetic diversity at sites locally fixed for deleterious alleles by previous drift, andback mutation also has more time to generate beneficial diversity.52chapter 3The mechanism altering expansion speed is a vital part of the process for theseinteractions to occur. When expansion is not slowed by the need to locally adapt andinstead only by dispersal limitations, there is no expectation for a faster expansion tocontribute to increased load. This explains the lack of further fitness reductions fromincreased expansion speed in our simulations with long distance dispersal. Instead,local adaptation at the edge was slightly reduced and expansion load less severe thanin the Gaussian dispersal case, suggesting that this difference in dispersal models ledto only slightly different amounts of expansion load accumulating given the differentamount of connectivity between the core and edge.Expansion load is attributed to the reduced efficacy of selection during expansionallowing detrimental mutations to accumulate, a result our study supports. The effectsizes of mutations underlying expansion load have important empirical implications,as many studies today aim to understand expansion load in humans after expansionout of Africa (Lohmueller, 2014b,a; Henn et al., 2016, 2015; Gravel, 2016). Whetherexpansion load exists but is difficult to detect has been debated, and a full under-standing of the selection coefficients across human (or other) genomes is lacking(Hancock et al., 2011; Lohmueller, 2014b; Simons et al., 2014; Henn et al., 2016, 2015).We find an interesting difference in the make-up of expansion load between our twosimulated cases of genome-wide deleterious mutation rates. While deleterious allelesof moderate and large effect are relatively rare, we find that they are responsiblefor the majority of expansion load. These alleles, which would otherwise tend tobe purged or kept at low frequency by purifying selection, are able to increase infrequency on the expanding wave front. Interestingly, we find that in cases exhibitingfaster range expansion, a larger proportion of expansion load comes from alleles ofmoderate and large effect, as purifying selection has less time to oppose the rise infrequency of strongly deleterious alleles.3 .4 .1 Implications, Caveats, and Future DirectionsOur finding that local maladaptation interacts with expansion load has broad evolu-tionary and ecological implications, including for studies of natural range expansions53chapter 3under climate change, invasive species, and conservation efforts (Hunter and Hutchin-son, 1994). For example, highly invasive species are known for their rapid rate ofspread (e.g., cane toads in Australia Phillips et al. 2006), providing interesting opportu-nities to examine if and how these species accumulate expansion load. Conservationefforts that aim to reintroduce genetic diversity through assisted migration wouldclearly benefit edge populations in terms of reducing expansion load, but potentialdisruption to local adaptation is also a concern (Aitken and Whitlock, 2013). Finally,both expansion load and migration load are likely to affect climate-change inducedrange shifts. Rapid climate change would necessitate fast range expansion. Therefore,expansion load may be increased and local adaptation decreased, leaving strugglingpopulations subject to stochastic extinction events. If the speed of climate change isnot too fast, however, populations adapting as they move over space may reduce anypotential impacts of expansion load.There are several biological features of organisms and their environments thatwe did not consider in our simulations due to the vast computational resourcesalready required, but these merit future investigation. Our simulations allowed forindividuals to self-fertilize when mates were limited, but this is not possible in manyspecies. The inability to self-fertilize could slow range expansions, leading to areduction in expansion load. Other effects can similarly reduce fitness in smalledge populations. Allee effects (Taylor and Hastings, 2005) or aggregating dispersalbehavior that discourages colonization of empty habitat (Altwegg et al., 2013) wouldslow expansion, as might dispersal barriers or increased encounters with antagonisticspecies (e.g. competitors, pathogens) (Case et al., 2005; Kubisch et al., 2013). It wouldbe interesting to consider species with overlapping generations, where previouslyestablished individuals may block immigration into patches at carrying capacity (i.e.a priority effect, Atkins and Travis 2010). Priority effects could slow replacementof initial colonizers at the range edge, impeding genetic rescue and increasing thepersistence of expansion load away from the edge. Some factors that could speed ex-pansion beyond rates seen in our model and lead to increased expansion load includegreater long distance dispersal or increased fecundity. We found that simulations54chapter 3with fecundity halved (3.5 vs. 7, data not shown) exhibited much less expansion loadand higher fitness in edge populations. Thus, increasing fecundity allows populationsto persist at lower mean fitness and thus accumulate more load. Finally, other factorscould result in complex, less predictable effects. Local adaptation likely involvesmultiple quantitative traits adapting over similar or different environmental gradients,with potentially complex interactions that merit future investigation.A potentially key evolutionary component of range expansions not included inour model is the evolution of dispersal ability. Increased dispersal is always expectedto evolve at expanding range margins (Hargreaves and Eckert, 2014). Interestingly,increased dispersal is expected theoretically and found empirically even during expan-sion across environmental gradients or with expansion load alone (Henry et al., 2015).However, increased dispersal also steepens the perceived slope of a given environ-mental gradient, which can eventually slow or even temporarily halt range expansionuntil edge populations evolve to overcome initial maladaptation (e.g. Phillips 2012).Therefore it is unclear how dispersal evolution would affect the results presented inthe current study.3 .4 .2 ConclusionsOur results support those of previous studies finding that expansion load via thesurfing of deleterious alleles reduces fitness in expanding populations (Peischl et al.,2013; Peischl and Excoffier, 2015; Peischl et al., 2015). We show this under biologi-cally realistic conditions, bolstering evidence that allele surfing may, indeed, causeexpansion load in nature. Our results are also in agreement with those of previousstudies showing that on an environmental gradient, migration load reduces fitnessin expanding populations (Kirkpatrick and Barton, 1997; Bridle et al., 2010; Polechováand Barton, 2015), though we did not see any cases of stable range limits as a resultof local maladaptation or expansion load. We highlight the mechanism of interactionbetween local maladaptation and expansion load. Local maladaptation feeds backto reduce the speed of expansion and thus allows for increased evolutionary rescuethroughout range expansion. Finally, we demonstrate that faster range expansion55chapter 3leads to a larger contribution of moderate and large effect deleterious alleles to expan-sion load. These contributions significantly advance theory on the genetics of rangeexpansion towards meaningful predictions and interpretations for studies of naturalpopulations.56chapter 4Genetic architecture and landscape patchiness change adaptiveability during range expansion4 .1 introductionThe ability for populations to adapt to novel environments has been studied in evolu-tionary biology for decades. Understanding adaptation to novel environments is keyto understanding range expansion, speciation, local adaptation, invasive species, andconservation biology (Sexton et al., 2009; Stapley et al., 2010). Populations can locallyadapt within their species range to survive across a variety of disparate environments(Kawecki and Ebert, 2004), but why populations cannot always continue to locallyadapt past a range edge and expand is unclear (Bridle and Vines, 2007). The capacityfor adaptation to new environments has broad implications such as the ability ofinvasive species to spread (Prentis et al., 2008) and the potential for success of newgenetic diversity from attempts of assisted migration (Aitken and Whitlock, 2013).Adaptation during range expansions has been examined in great detail. Foun-dational theoretical work on adaptation and range expansions over environmentalgradients has shown that, on a linear environmental gradient, increasing the steepnessof change in environmental optimum leads to maladaptation at the margin and even-tual extinction of the species (Kirkpatrick and Barton, 1997). Further investigationshave shown the effects of density dependent selection (Garcia-Ramos and Kirkpatrick,1997) as well as stochastic effects at low density (Bridle et al., 2010), temporal changesin environments (Pease et al., 1989), differing dispersal parameters (Aguilée et al.,2012), evolution of genetic variance (Barton, 2001; Polechová et al., 2009), and theimpacts of genetic drift (Polechová and Barton, 2015) on adaptation or the lack thereof57chapter 4at range margins. Sharp range limits have even been shown to occur due to lack ofadaptation at a range edge under certain conditions (Polechová and Barton, 2015).These studies have all investigated adaptation in models of linear environmentalgradients in continuous space. On a linear gradient, higher gene flow has beenshown to introduce more maladaptive alleles from greater distances and thus be moredetrimental to the local population (Garcia-Ramos and Kirkpatrick, 1997). Furthertheoretical studies have investigated adaptation across two discrete patches of differ-ent environmental optima. Ronce and Kirkpatrick (2001) showed that if selection isstrong enough within each habitat or migration reduced enough, this can prevent theestablishment of foreign alleles in a new population and allow it to sufficiently adaptto local conditions. Others have shown that evolutionary rescue from migration intomarginal populations of small effective population sizes, which otherwise behaveas sinks, can lead to self-sustaining populations that adapt and persist (Holt andGomulkiewicz, 1997; Gomulkiewicz et al., 1999; Ching et al., 2012). The level of geneflow among populations thus greatly affects the outcome for adaptation and rangeexpansion.Real world environments are often more complex than linear gradients of envi-ronmental change. Environments are heterogeneous and can change in multifar-ious ways over space, from gradual to sudden, and with minor or major magni-tudes of change over space. Species are known to exist across gradients of manytypes, from elevational (Stevens, 1992; Wilson et al., 2005) to latitudinal (Gaston, 2009).Changes in temperature or other environmental variables can occur heterogeneouslyin space (Pickett and Rogers, 1997), for example bioclimatic regions, patches of forestor meadow, elevational gradients with distinct regions of environments such as aboveor below the timberline, or changes in soil conditions along slopes. Lastly, speciesare also known to exist across scenarios similar to two-patch models, such as plantsliving on and off patches of serpentine soils (Brady et al., 2005). These differencesin landscape structure play a key role in the adaptation of populations and thereforemay potentially interact with evolutionary processes at the range margin.58chapter 4Modeling range expansions across realistic landscapes is novel and necessary tofully understand evolution during range limits. Previous studies have investigatedrange expansions across continuous linear gradients (Kirkpatrick and Barton, 1997;Barton, 2001; Polechová et al., 2009; Polechová and Barton, 2015) or adaptation acrosstwo discrete patches (Gomulkiewicz and Holt, 1995; Holt and Gomulkiewicz, 1997;Gomulkiewicz et al., 1999; Ronce and Kirkpatrick, 2001; Holt et al., 2003). We modelan approximately continuous landscape with discrete boundaries between environ-ments to investigate the intermediate scenarios between linear gradients and two-patch models of previous studies. Several characteristics of these models merit thisinvestigation. In two-patch models, populations within each patch are panmictic, andpatterns of adaptation are driven solely by migration rates between patches. Isolation-by-distance characterizes models of continuous space, where the closer individualsare geographically, the more similar they are genetically. In our model, isolationby distance occurs across the landscape, but the environmental gradient across thislandscape changes by discrete patches of environmental optima rather than linearlythrough space.Several important genetic processes during range expansions cause us to hypoth-esize that patchy landscapes may change the outcomes of adaptation and expansionbeyond expectations from linear and two-patch models. Increasing habitat hetero-geneity, and thus heterogeneity in selective pressures, interacts with gene flow toeither enhance or inhibit local adaptation in peripheral populations (Slatkin, 1987;Kirkpatrick and Barton, 1997; Ronce and Kirkpatrick, 2001). Due to reduced densityat range margins, edge populations experience gene flow at much higher proportionsthan populations existing within the denser core of the species range (Brown, 1984).This asymmetric migration can behave in two disparate ways: introducing new ge-netic variation needed for further adaptation in an otherwise genetically depauperatepopulation, or swamping locally adaptive alleles with foreign maladaptive alleles andthus preventing further adaptation. Which of these occurs depends on the details ofmigration rates, mutation rates, population sizes, and the degree to which the environ-ment differs across the species range. With discrete changes in the environment, we59chapter 4expect gene flow to be maladaptive at the boundaries between patches, but beneficialwithin patches by spreading adaptive alleles. Therefore gene flow will have differingeffects on facilitating or inhibiting local adaptation across the landscape dependingon the degree of patchiness within the overall gradient. It is unclear whether theresulting dynamics of adaptation and expansion across such a landscape will reflectthose of continuous linear gradients, two-patch models, or neither.The genetic basis of local adaptation is also important (Stapley et al., 2010) andcan interact with the patchiness of landscapes (Schiffers et al., 2014). Expandingpopulations must continually adapt to newly encountered environments, and thesuccess of adaptation can be driven by the relationship between genetic architectureand scale of environmental change (Schiffers et al., 2014). Many studies, both empir-ical and theoretical, have examined the role of quantitative traits and their geneticarchitecture in adaptation to novel environments (Holloway et al., 1990; Carroll et al.,2001; Peichel et al., 2001; Holt et al., 2003; Bratteler et al., 2006; Yeaman and Whitlock,2011; Schiffers et al., 2014; Yeaman, 2015). Holt et al. (2003) have shown that adaptationin sink populations is determined by the overall influx of genotypes to the newenvironment. It was previously thought that many alleles of small effect may beswamped by gene flow before being able to contribute to sufficient local adaptation(Tigano and Friesen, 2016), but local adaptation via many genes of small effect ispossible (Le Corre and Kremer, 2012; Yeaman, 2015). As shown by Yeaman (2015),this occurs through transient processes subject to random genetic drift when ratesand sizes of mutations and genetic redundancy are sufficient (though detecting thepresence of such alleles remains a difficult task). It is clear, however, that fewerloci of larger effect can contribute most easily to adaptation, which may eventuallylead to the building up of linked clusters of locally adapted alleles, often referred toas genomic islands of divergence (Feder and Nosil, 2010). Empirical studies havealso demonstrated adaptation via many loci of small effect (Buckler et al., 2009).Even when small effect alleles initially contribute to adaptation, larger effect allelesare expected to accumulate over time and contribute to adaptive differences amongpopulations (Yeaman and Whitlock, 2011). Genetic architecture was shown to play60chapter 4a larger role than geographic isolation in contributing to genomic divergence andspeciation in sunflowers (Renaut et al., 2013). The details of genetic architectureare thus vital in studies of adaptation. Kirkpatrick and Barton (1997) and Barton(2001) established predictions for the success of expansion over linear environmentalgradients based on sufficient levels of genetic variation, VG. Investigating how varioustypes of genetic architecture can contribute to the genetic variation necessary for rangeexpansions further informs our understanding of how species may be able to copewith global climate change.In this study, we address two main questions. First, do predictions of rangeexpansion from a linear gradient successfully extrapolate to a patchy gradient thatchanges in discrete jumps? And second, does the genetic architecture of the systemchange the environmental conditions for successful range expansion? We simulaterange expansion across a spatially explicit and discrete two-dimensional landscapewith individual-based, forward time simulations in the program nemo (Guillaumeand Rougemont, 2006). We vary two main parameters of interest: the heterogeneity ofchange in environmental optimum in space and the genetic architecture underlyingthe quantitative trait that allows adaptation to the local environment. We model arange of genetic architectures for a quantitative trait to examine potential differencesin ability to adapt to environments that change heterogeneously in space. Examiningthe rate of range expansion and the time required for populations to colonize andadapt in new environments provides insight into the role of both genetic architectureand environmental heterogeneity in determining the success or failure of adaptationacross a species range.4 .2 methodsWe implement a modified version of nemo (Guillaume and Rougemont, 2006), de-scribed in Chapter 3, which allows continuous space to be approximated by a discrete,two-dimensional species range. This expands greatly upon existing 2-patch modelswith non-explicit space within each landscape patch and approaches a model of con-tinuous space. Individuals are monoecious (hermaphroditic), obligately outcrossing61chapter 4diploids which possess a quantitative trait under stabilizing selection for the localenvironmental optimum. We initiate the population in a limited spatial region fora burn-in to migration-mutation-selection balance, after which populations expandinto empty landscape of various types of heterogeneous environmental optima. Gen-erations are non-overlapping, and life cycle events occur in the order of breeding,dispersal, viability selection, and population regulation. We monitor the populations’abilities to spread across the landscape, and when successful, compare these scenariosin terms of expansion speed and fitness as they encounter environments of differingenvironmental optima to which they must adapt in order to further expand.4 .2 .1 LandscapesTo approximate continuous space and maintain a spatially explicit landscape withdiscrete populations, we implemented a large spatial grid of habitat patches in nemo.This bridges the gap between existing two-patch landscape models (where existencewithin either of these two patches is not spatially defined) and models of continuousspace. The landscape is a rectangular grid of 40× 2000 cells with a defined phenotypicoptimum and carrying capacity for every cell in the landscape. We term the leftmost40 × 40 cells of the landscape the core, which is where the burn-in period occurs.After burn-in, expansion proceeds along the long (x) axis of the landscape into theremaining 40× 1960 cells. The core consistently had a constant optimum phenotypicvalue of 0 to ensure that the population was well-adapted before expanding ontoenvironmental gradients. In all cases, the environmental optimum only changed inone dimension over the axis of expansion (x-axis) and was constant within cross-sections of the landscape (y-axis). We thus average all values across the 40-cell y-axisinto a single value for a cross section at each location x on the landscape. Terms fordescribing these landscape and other parameters are defined in Table 4.1.We examined several types of environmental heterogeneities across landscapes.These range from a linear gradient to two patches, as well as combinations in betweenhaving multiple patches of constant environmental optima interspersed by sharpchanges in the environment. We model landscapes of heterogeneous environmental62chapter 4Table 4 .1 : Terminology and parameter definitions for simulationsTerm ExplanationCell One square unit on the landscape grid. The entire landscape is 40× 2000 cells.Cross section 40 cells across the y-axis of the landscape.Core The leftmost 40 × 40 cells, equivalently the leftmost 40 cross sections on thelandscape. The population is limited to this region during the burn-in period.Environmental optimum is constant within the core.Patch A region of cross sections for which the phenotypic optimum on the landscape isconstant.∆zopt The magnitude of change in the optimum phenotypic trait, z, between cross sectionson the landscape.b The average slope of the gradient in optimum phenotypic trait, z, across thelandscape.α2 The variance in effect size of mutations occurring in the quantitative trait.K Carrying capacity of a cell. Population regulation occurs at the level of cells. K = 5in all cases.σbreed One standard deviation of the Gaussian kernel describing the area of the breedingwindow. Individuals can find mates within cells contained by the breeding window.σbreed = 0.5 cell widths in all cases.σdisperse One standard deviation of the Gaussian kernel describing the area of potentialdispersal. Offspring can disperse to cells within the dispersal kernel. σdisperse = 2cell widths or 4 cell widths as described in the text.gradients possessing patches of constant phenotypic optima interspersed by changesin phenotypic optima over the landscape.To define each type of heterogeneous landscape, we describe three parameters:the width of patches over the landscape, the magnitude of change in optima betweeneach of these patches, and the total gradient value across the landscape, which can becalculated from the previous two values since the landscape is of a constant size. Thetotal gradient value, b, on the landscape is measured as the total change in phenotypicoptima over the landscape divided by the length of the landscape (measured bynumber of cross sections, 1960). We refer to each local change in environmentaloptima on the landscape as ∆zopt. This value measures the magnitude of change inphenotypic optima between two patches. Patches are defined as a contiguous regionof cross sections for which the phenotypic optima is constant on the landscape. Thenumber of times that ∆zopt changes over the entire landscape is equal to the totallength of the landscape minus the core region (1960) divided by patch width. Figure4.1 shows a schematic for a simplified landscape. We approximate a linear gradient63chapter 4with 1960 patches on the landscape, each with a width of one cross section, which ismuch narrower than the dispersal kernel.0 500 1000 1500 2000024681012Landscape X positionEnvironmental Optimum Δz opt Patch size Same b Different Δzopt  Different b Same Δzopt  Core xF igure 4 .1 : Schematic of simplified heterogeneous landscapes, where ∆zopt is themagnitude of change in phenotypic optimum at one location on the landscape andpatch size is the number of cross sections contained within a given area of the sameenvironmental optimum. Landscapes depicted with solid lines have the same ∆zopt,but different total gradients, b, while the dashed and gray lines show the sameoverall gradient consisting of different numbers of patches with different ∆zopt’s. Thelandscape core in the first 40 cross sections of the landscape is uniform.The steepness of a linear gradient at which range expansion is no longer possibleis referred to as bcrit (Kirkpatrick and Barton, 1997; Barton, 2001; Bridle et al., 2010). Wefirst model linear gradients to determine the critical value, bcrit, for which expansioncould no longer occur. We then fragment the same total gradient into increasinglylarger patches with higher values of ∆zopt to test at what point bcrit is changed bypatchy environments. For this analysis, we held the total gradient value constant, butchanged the width of patches and the ∆zopt between each. To understand why theobserved bcrit from a linear gradient does not hold, we then investigate the role of∆zopt alone on a two-patch landscape, and then elucidate the interaction of ∆zopt withpatch width in allowing adaptation and expansion.64chapter 44 .2 .2 Breeding & DispersalThe large landscape grid allows a discrete approximation to continuous space byimplementing breeding and dispersal kernels over a fine-scale landscape grid. Eachlandscape cell has a local carrying capacity set at K = 5. Breeding and dispersalboth occur in windows over regions of nearby cells, maintaining populations at largereffective population sizes. Neighborhood size is approximately 250 individuals, ascalculated from Wright (1946).The breeding window we introduced into nemo defines a given radius of cellsaround a focal cell from which an individual can find a potential mate and is describedin detail in Chapter 3. Breeding is thus not limited to an individual cell. Lone colonistson the range front can still potentially find a nearby mate, even if they are the soleinhabitant of their cell. This most closely resembles obligately outcrossing plants, whomay receive pollen from nearby mates, but could also represent animals that searchfor mates nearby then return to their home territory or scenarios where females areresident with roaming males. The probability that an individual mates with anotherindividual within her breeding window is described by an approximate bivariateGaussian function of the distance between them. The size of this breeding window isdefined by one standard deviation of a Gaussian distribution, σbreed, where f (x, y) ∝exp [−( ∆x22σ2breed +∆y22σ2breed)] gives the relative probability of mating with an individualat a given distance x and y. We discretized these probabilities by integrating theprobability over each cell and assigning each potential mate within the window achance to father offspring proportional to its distance. σbreed was set at one halfof a cell width, and the maximum search radius for a mate was limited to 4σbreed.This produced a breeding window containing the 13 cells: the focal cell and its 12surrounding cells. Individual fecundity was drawn from a Poisson distribution withmean 4.Dispersal occurred similarly to breeding, where a dispersal kernel was definedfor forward rates of migration and discretized over cells on the landscape. Thisdistribution was defined by σdisperse = 2 cell widths for the small dispersal kernel andσdisperse = 4 cell widths for the large dispersal kernel. An R wrapper package was cre-65chapter 4ated to calculate breeding and dispersal kernels, and to discretize these probabilitiesacross the landscape, and is available online at code for the modified version of nemo with a breeding window is availableonline at .2 .3 Viability Selection & Genetic ArchitectureSelection occurred only on survivorship, and not fecundity. Selection is stabilizing forthe local environment’s phenotypic optimum, with fitness of an individual definedby wz = exp[− (z−zopt)22ω2 ], where z is the quantitative trait value, zopt is the optimumphenotypic trait for a given cell on the landscape, and ω2 defines the width of thefitness function. We scale phenotypic units in terms of environmental variance. Withenvironmental variance, VE, set to 1, and a typical heritability of ∼ 13 , ω2 would beequal to 7.5, which is the value we used throughout all simulations. This is derivedfully in Chapter 3.Individual phenotypes are defined by the quantitative trait, z, that experiencesstabilizing selection to adapt to the local environment. The quantitative trait, z, iscontrolled by 100 freely recombining loci. We held the total mutational variance, Vm,constant at 10−2, but varied the underlying genetic architectures contributing to themutational variance. The number of loci, L, was also held constant at 100. Therefore,to satisfy Vm = 2Lµα2 (Lande, 1975), we varied the per locus mutation rate, µ, andthe variance in mutational effect sizes, α2. Mutations followed a continuous Gaussiandistribution with variance α2. Four regimes of genetic architecture were investigatedthat range from high mutation rate and low variance in mutational effect size to lowmutation rate and high variance in mutational effect size. The “small effect" regimeset µ = 10−2 and α2 = 0.005, the “medium-small effect" regime set µ = 10−3 andα2 = 0.05, the “medium-large effect" regime set µ = 10−4 and α2 = 0.5, and the“large effect" regime set µ = 10−5 and α2 = 5.0. Though the large effect regimehas the potential for mutations of much larger effect sizes, small effect mutations arenot excluded. For a subset of simulations, we also compared the effect of holding66chapter 4VG approximately constant. This was done by holding µ constant at 10−5 and onlyvarying α2 at 5, 0.5, and 0.05.4 .2 .4 AnalysesWe quantify range expansion with two main measures: the growth rate of the entiremetapopulation on the landscape and the time at which populations remain station-ary before expanding into a new patch. Growth rate provides the most accurate mea-sure of populations moving across the landscape within the confines of nemo and isfurther useful because it positively correlates closely to population fitness. Away fromthe expanding front and any patch boundaries, populations exist at carrying capacity,so local density differences do not bias this measure of expansion rate.4 .3 results4 .3 .1 Expansion on a linear gradientWe first determine the greatest steepness that allows expansion on a linear gradient,bcrit. This bcrit value varied depending on the genetic architecture modeled. Thesmall effect and large effect regimes showed the highest bcrit values, while bothmedium effect regimes had lower bcrit values. The medium-large regime had thelowest bcrit value of all genetic architectures modeled (Figure 4.2; i.e., lack of expansionat b = 4 [yellow] in both medium regimes and lack of expansion at b = 3.5 [lightblue] for the medium-large regime). In all cases, as the steepness of the gradientapproached bcrit, the rate of population growth decreased, indicating longer times toexpand across these gradients. In the large effect regime for the steepest gradienton which expansion occurs (b = 4), only 4 out of 20 replicates expanded over thelandscape to produce the calculated growth rate. These four replicates also did notimmediately expand after the end of the burn-in, as did all other cases; they beganto grow beyond the core 7400− 16100 generations later (mean = 11088). A similarwait-time phenomenon was observed for the medium-large regime. In these cases,expansion began for all replicates within 4850 generations (range = 575− 4850, mean67chapter 4= 2487.5 generations). The total population size maintained across the landscape aswell as average fitness were also lower as gradient steepness increased towards bcrit(Supplemental Figure C.1).llllllllllllllllllll050100150200250300Mean population growth rate10−2 10−3 10−4 10−50.005 0.05 0.5 5.0µα2l σdisperse = 2σdisperse = 4b  = 1b  = 2b  = 3b  = 3.5b  = 4b  = 4.5b  = 5NogrowthF igure 4 .2 : Growth rate of populations (number of individuals per generation)expanding over a linear gradient, from slope of b = 1 to b = 5 and for the small(σdisperse = 2) and large (σdisperse = 4) dispersal distances. The four different regimesof genetic architecture are indicated along the x-axis where α2 is the variance inmutational effect size, and points are jittered. Below the gray dashed line are casesthat the population did not expand out of the core. Error bars indicate 95% confidenceintervals.Barton (2001) derives predictions for the value of bcrit, which depend on the geneticvariance: bcrit =VGσ√VS, where σ is the standard deviation of distance travelled bycombining both gamete and offspring dispersal, i.e., σ =√σ2disperse +12σ2breed (weight-ing σbreed by half derives from the fact that gametes from only one parent moveduring breeding). We find that the genetic variance across the landscape increases68chapter 4in scenarios of steeper environmental gradients (Supplemental Figure C.2). Becausethe genetic variance evolves differently on each gradient, we cannot make a prioripredictions for bcrit. We can calculate a bcrit value using the genetic variance fromthe landscape core, but doing so creates a large underestimate of our realized bcritvalues (realized bcrit = [4 − 4.5], [3.5 − 4], [3 − 3.5], and [4 − 4.5]; predicted bcrit ≈0.161, 0.125, 0.018, and 0.002 for each genetic architecture respectively: small, medium-small, medium-large, and large). There are multiple reasons for this disparity in bcrit,which we consider in the Discussion.For the same gradient values and genetic architectures, we examined the impactof dispersal distance on the outcome of range expansion and the value of bcrit. Thepredicted bcrit for these scenarios is approximately half of those for the smaller dis-persal distance (realized bcrit = [2− 3], [2− 3], [1− 2], and [1− 2]; predicted bcrit ≈0.082, 0.063, 0.009, and 0.0008, respectively). Our realized bcrit values in these largerdispersal cases were again much larger than the predicted value using VG as estimatedin the core, but were also much smaller than each respective bcrit for the smallerdispersal cases (Figure 4.2).4 .3 .2 Expansion over heterogeneous, patchy landscape gradientsWe now examine how the outcome of range expansion changes as landscape het-erogeneity changes. We model two values of the overall gradient (b = 0.0075 andb = 0.015), both of which create a very weak gradient for which expansion is easilysuccessful in the linear model. Holding this total gradient constant, we vary patchsize across the landscape to model the cases from a linear gradient (patch size = 1,1960 patches) up to a two-patch model (patch size = 980, 2 patches), and a range ofintermediate patch sizes between these. To maintain the same overall gradient acrosssimulations, ∆zopt increases as patch size increases (within a given simulation, ∆zoptis constant), i.e., b = ∆zopt(number o f patches−1)1960 .We find that the total gradient steepness did not determine success of expan-sion, and instead, the success of expansion depended on the value of ∆zopt betweenpatches. On the overall weaker gradient, expansion failed when ∆zopt = 15 between69chapter 4two patches (except one out of 80 replicates; in the medium-small regime after a10, 000 generation delay). Likewise, on the overall stronger gradient, expansion alsofailed at ∆zopt = 15, which was in this case a five patch model. When ∆zopt = 7.5(weaker gradient = 3 patches, stronger gradient = 5 patches) expansion succeededequivalently regardless of overall gradient. Population growth rate was similar for alllower ∆zopt values except in the medium-large regime (Figure 4.3). In other words,these results indicate that shifting from a linear gradient towards a two-patch modelchanges the expectation of successful expansion only once the local ∆zopt becomes toolarge, not when the landscape consists of a specific number of patches. Furthermore,under these scenarios, the medium-large regime again showed difficulty in expansionrelative to the other regimes. For this regime, we see slower growth rates on thesteeper gradient at all patch sizes and on the weaker gradient when ∆zopt = 7.5,indicating more difficulty in adapting across the landscape. The population growthrates for smaller patch sizes in the large effect regime are slower than at ∆zopt = 7.5,suggesting the potential for a mismatch between the scale of environmental changeand the scale of mutational change.4 .3 .3 Expansion in a two patch modelWe examine the ability of each genetic architecture to expand in a two patch modelfor a range of ∆zopt values between the patches. The steepest ∆zopt (15) matches the∆zopt in the previous section’s weak gradient two-patch model. We find a patternsimilar in the two patch models to that on the linear gradients (i.e., Figure 4.2) forthe ability of each genetic architecture to successfully expand (Figure 4.4). The smalleffect regime can cross a steep ∆zopt (12.5), but requires time beyond the end of theburn-in to adapt and expand into the second patch. The medium-small regime hasmore difficulty at this value of ∆zopt, with only 3 of 20 replicates succeeding, and themedium-large regime does not expand at this or the next smallest ∆zopt value (10).However, the large effect regime shows a qualitative change in its expansion abilityby being able to cross a ∆zopt that the medium-large regime could not. Despite thelarge effect regime exhibiting the lowest genetic variance of all cases simulated, its70chapter 4lllllllllllllllllllllllMean population growth rate02547550052555057560010−2 10−3 10−4 10−50.005 0.05 0.5 5.0µα2lllll∆zopt = 30∆zopt = 15∆zopt = 7.5∆zopt < 4b  = 0.0075b  = 0.015*F igure 4 .3 : ∆zopt, rather than b or number of patches, determines success ofexpansion. Population growth rates (number of individuals per generation) areshown for populations expanding over patchy gradients with overall gradient valuesof b = 0.0075 (circles) and b = 0.015 (triangles). The four different regimes of geneticarchitecture are indicated along the x-axis. The medium-large genetic architectureregime shows slower expansion. Simulations with ∆zopt < 4 (more patches) allproduced similar growth rates and are averaged as the gray points. Cases where∆zopt is equal to that of the two patch model in Figure 4.4 are colored equivalently.The asterisk indicates the case where a single replicate that expanded contributes tothis mean growth rate. This upper 95% confidence interval spans to a growth rate of91 individuals per generation. Error bars indicate 95% confidence intervals.range succeeds in expanding where the medium-large effect regime could not. Theseexpansions require additional time after the end of the burn-in when populationsexist in the first patch and attempt to adapt to the second patch (Figure 4.4). Thesame trend is seen with a larger dispersal distance, where increased dispersal makesexpansion more difficult at the same values of ∆zopt (Supplemental Figure C.3).Mutations contributing to adaptation between patches — We ex-amine potential mechanisms for this qualitative change occurring across genetic ar-71chapter 40 5000 10000 15000 20000Time until expansionConstant VGConstant VmNo observed expansion 10−210−310−410−510−510−5µ0.0050.α2 l l∆zopt5 7.5 10 12.5 15F igure 4 .4 : Waiting time (in generations) before expansion succeeds into thesecond patch varies across cases of genetic architecture. Points are jittered forvisualization. All points beyond the vertical dashed line did not expand during the20, 000 generations of the simulation. All cases shown are for σdisperse = 2, whileresults for σdisperse = 4 are shown in Supplemental Figure C.3.chitectures. First to understand the effect sizes of mutations contributing to adap-tation in the second patch, we compare genotypic values of individuals in eachpatch near the boundary of the patches. Though we do not have the ability to trackspecific mutations, this analysis allows us to assess if few, large-effect mutations mayhave occurred at specific loci and contributed to adaptation in the second patch. Tomeasure this, we take individuals present on either side of the boundary betweenpatches. We exclude individuals in the closest two cross sections, as these are moststrongly subject to gene flow across the patches. Using the nearest 40 cross sections(outside of the two excluded on either side of the boundary), we calculate the meangenotypic value for each locus across all diploid individuals, resulting in 100 locusvalues. In order to capture the effects directly contributing to adaptation in the newpatch, this measurement was made within 25 generations of expansion, as determined72chapter 4by population size exceeding 4000 individuals in the second patch. The pairwisedifference between the two patches for these 100 locus values are shown for allreplicates that successfully expanded into the second patch in Figure 4.5, and numericvalues are listed in Supplemental Table C.1.In all cases of ∆zopt, we can see that the large effect regime (α2 = 5.0) containsthe largest per-locus differences between the patches, and the medium-large regimethe second-most. As ∆zopt increases, the large effect regime produces more andgreater magnitude differences between the patches (Figure 4.5). Small effect regimessuccessfully expand into the second patch without differentiating via large effectalleles.Constant VG versus constant VM — To examine if the amount of geneticvariance, rather than mutational effect sizes, drives the ability for populations toexpand and adapt, we analyze a subset of simulations with parameters (as describedin the Methods) that created an approximately constant VG instead of a constant VM(values of VG in Supplemental Figure C.4). With a decreasing α2, expansion becamemore difficult. Expansion no longer occurred at values of ∆zopt where expansionpreviously succeeded and became more difficult at lower values of ∆zopt (Figure 4.4).In combination with Figure 4.5, this shows that genetic variance was not the onlydetermining factor for expansion, but that the availability of mutations of larger effectdetermined the ability to adapt and expand into the second patch.4 .3 .4 Combination of patch size and ∆zoptWe have thus far shown that overall b matters for range expansion, but also thatbreaking this overall gradient into discrete patches can change the outcome of expan-sion. ∆zopt directly impacts the outcome of expansion, but when the overall b failsto predict expansion, is ∆zopt a sufficient determinant of range expansion? We nowinvestigate the ability of populations to expand across multiple patches of a constant∆zopt, using ∆zopt values that resulted in successful expansion in the two patch model(∆zopt = 5 and ∆zopt = 10). We simulated patch sizes between 1 and 392 across the73chapter 4Count05101520∆zopt = 5Mutation Parametersµ = 10−2;  α2 = 0.005µ = 10−3;  α2 = 0.05µ = 10−4;  α2 = 0.5µ = 10−5;  α2 = 5.0−0.4 0.4 1.2 2 2.8 3.6 4.4 5.2 6Count05101520∆zopt = 7.5−0.4 0.4 1.2 2 2.8 3.6 4.4 5.2 6Count05101520∆zopt = 10Per locus effect size difference between patches−0.4 0.4 1.2 2 2.8 3.6 4.4 5.2 6F igure 4 .5 : Large effect genetic architecture regimes adapt via few loci of largeeffect while small effect regimes adapt via many loci of small effect. Per locusgenotypic differences between patches in the two patch model are shown, where eachcount represents one locus from across all replicate simulations that expanded intothe second patch. Bars extending beyond the limit of the y-axis reach large values andare truncated to focus on large effect differences. For 20 replicates and 100 loci, thecounts for each color sum to 2000, and there are many small effect differences fromloci in the truncated bars.74chapter 4landscape, therefore increasing the overall b as patch size decreases and more patchesare present on the landscape (e.g. Figure 4.1, solid lines).1 2 5 10 20 50 200 5000100200300400500600∆zopt = 5Mean population growth ratePatch Size5 2.5 1 0.5 0.25 0.09 0.03 0.01bllllllllllllll l1 2 5 10 20 50 200 5000100200300400500600∆zopt = 10Patch Size10 5.0 2.0 1.0 0.50 0.17 0.07 0.02bl l l l ll llll llll llµ = 10−2;  α2 = 0.005  µ = 10−3;  α2 = 0.05µ = 10−4;  α2 = 0.5µ = 10−5;  α2 = 5.0F igure 4 .6 : Mean population growth rates (number of individuals per generation)are lower for steeper and patchier landscapes. Patchy landscapes have a constant∆zopt of either 5 or 10. The overall gradient (b) increases with additional steps andcan be calculated as b = ∆zopt(number o f patches−1)1960 .We find growth rates are faster for larger patches (Figure 4.6). As patch sizedecreases, population growth rate decreases, and this decrease is faster for the highervalue of ∆zopt. Greater dispersal distances increase population growth rate withlarger patches (Supplemental figure C.5). However, at smaller patch sizes, greaterdispersal leads to slower growth rates. At the lower ∆zopt value, genetic architecturesswitch rank order for fastest population growth rate as patch size decreases. Smallermutational effects exhibit slower growth on smaller patches and large mutationaleffects exhibit slower growth with larger patches, suggesting again the potential formismatches between the scale of landscape patchiness and mutational changes at agiven value of ∆zopt. Despite the ability of ∆zopt to predict expansion in a two patch75chapter 4model, multiple smaller patches of a given ∆zopt can change the outcome of expansion,making a prediction based solely on ∆zopt inaccurate.4 .4 discussionOur study has investigated two major aspects of range expansion biology: how dolandscape patchiness and genetic architecture change the outcome of adaptation andexpansion. We find that both of these factors can impact the ability for populationsto expand. This leads to two main results of our study. First, landscape heterogeneityshows complex results when moving away from simple linear gradients or pure twopatch models. In these intermediate, multiple patch landscape scenarios, knowingeither the overall gradient value or the degree of change in environment betweenpatches is not sufficient to successfully predict the outcome of a range expansion.Second, genetic architecture shows an interesting pattern where adaptation and ex-pansion can succeed from either many mutations of small effect or few of large effect,but that between these two regimes where mutations are of intermediate effect andfrequency, adaptation becomes more difficult or even impossible.4 .4 .1 The role of genetic architectureThe genetic architecture of the trait impacted range expansion in all landscapes in-vestigated, from linear gradients to two patch models to heterogeneous gradientsconsisting of many patches. At the environmental limit of expansion ability (i.e.,where b or ∆zopt become too steep), a change in genetic architecture regime awayfrom either large effect mutations or many small effect mutations limited expansion.Moreover, the genetic architecture regime in general impacted the rate of popula-tion growth across the landscape and over environmental barriers. Though geneticvariance has been used previously for determining the critical steepness on linearenvironmental gradients (Kirkpatrick and Barton, 1997; Barton, 2001; Bridle et al., 2010;Polechová and Barton, 2015), we find that genetic variance is not always sufficientfor describing this outcome. With equivalent genetic variance, the availability of76chapter 4larger effect mutations can lead to successful range expansion that is not possiblevia small effect alleles under the same environmental conditions. Our simulationsrevealed that it is possible to successfully adapt to novel environments in the absenceof high genetic variation, as long as there are sufficiently large effect mutations.Therefore, two qualitatively different scenarios exist under which adaptation to newlyencountered environments can proceed. First, sufficient numbers of small effectalleles can lead to successful expansion, and second, few alleles of sufficiently largeeffect can also lead to successful expansion. The intermediate between these scenarios,where populations lack either mutations of large enough effect or in large enoughnumbers, expanded slowly or not at all.Furthermore, predicting expansion based on values of genetic variance was notuseful, as even for cases of equal genetic variance but varying genetic architecture,the outcome of expansion differed. Genetic variance evolved differently on eachgradient and alone is not a sufficient predictor of expansion. Having said that, whengenetic variance was too large, populations did not persist, presumably because mostindividuals were too different from their local phenotypic optimum, reducing fitnessand therefore population density to non-sustainable levels. Similar to the result ofSchiffers et al. (2014), we also have limited evidence from simulations using land-scapes of multiple patches that expansion is optimized when the scale of landscapepatchiness matches the scale of mutational effects available to the system.4 .4 .2 The role of landscape heterogeneityThe structure of the landscape has a large impact on the outcome of expansion.Expansion over a linear gradient requires continued adaptation to environmentalconditions changing gradually over space. Therefore the success of expansion ona linear gradient is determined by the steepness of the gradient, b. As expected,steeper gradients prevented range expansion. However, the critical values of b atwhich expansion failed was not predicted by previous theory (Barton, 2001). Thereare several major reasons for which these predictions are not expected to matchour measured bcrit. Our two-dimensional landscape does not perfectly match the77chapter 4continuous space simulations from which these predictions derive. Perhaps moreimportantly, our simulations contain a core population that can never go extinct.Previous studies initiated populations on the environmental gradient, whereas oursbegin in a core patch with a constant environmental optimum, and only outsideof this core does the environment change. Therefore, even on an extremely steepgradient, global extinction cannot occur because the core population will survive. Inour model, individuals dispersing from the core can repeatedly attempt to adapt tothe new environment throughout the course of the simulation without being subjectto extinction.Our novel results on the impact of landscape structure on range expansion comefrom introducing heterogeneity within an overall environmental gradient. We haveclearly shown that increasing the patchiness across an environmental gradient canlimit range expansion. Understanding why and predicting where this patchinesscreates a limit for range expansion is complicated. The magnitude of change inenvironmental optimum between two patches strongly affects expansion ability, andthe success of expansion into a second patch relies on sufficient mutational inputin order to adapt to the novel environment. When patch size becomes increasinglylarge, expansion becomes easier as individuals are able to adapt sufficiently to theirlocal patch before confronting the next environmental change. In these cases, thesuccess of expansion is determined by ∆zopt. Because space is explicit in out model,larger patches allow more individuals to exist farther from patch boundaries and thusescape the maladaptive effects of immigrants from nearby patches. Increasing patchsize does not, however, necessarily increase the effective population size and thusincrease the efficacy of selection against maladaptive alleles. Unlike pure two-patchmodels where every individual within a population exists in one undefined location,the neighborhood size over the landscape in our model limits panmixia within apatch. Therefore, only at patch boundaries do individuals suffer from the influx oflocally maladaptive alleles from migrants. This is in agreement with our finding thatincreasing dispersal distance makes expansion over a given ∆zopt more difficult as78chapter 4colonists in the new habitat are more strongly subject to gene swamping despite thefurther influx of genetic variation, in contrast to the findings of (Holt et al., 2003).4 .4 .3 Implications and limitationsThere are numerous points of interest from these results that merit future investiga-tion. Investigating the effects of migration barriers on heterogeneous landscapes aswell as landscapes that do not always increase monotonically will provide insightinto several potential processes. Temporary barriers to gene flow would allow lo-cal populations otherwise subject to maladaptive migrants to become more locallyadapted and expand further across the landscape. Even after removal of the barrier,populations may have expanded a sufficient distance to be able to survive even ifpopulations at the location of the former barrier may again go locally extinct. Suchan expansion event may have bearing on studies of speciation. Furthermore, exam-ining expansion over non-monotonic patchy landscape gradients may have differentresults for expansion. On a monotonic landscape, as in our simulations, each newlyconfronted change in the environment continues to push the phenotypic optimumincreasingly away from what it was in the landscape core. On a non-monotoniclandscape, however, some newly confronted patches would have an optimum closerto one that the expanding population has already encountered previously. In thiscase, edge populations that exist as sinks in patches that are impossible to fullyadapt to may have the potential to bridge a dispersal gap to a subsequent patchwhere adaptation might be able to succeed and expansion thus continue. Mostimportantly, incorporating the evolution of dispersal ability may produce differentresults as populations may adapt their dispersal ability to better match the level ofpatchiness in their environment and limit the detrimental effects of migration load.We might expect landscapes with smaller patches to evolve more limited dispersal,therefore enhancing local adaptation across the landscape.Several empirical systems support our results and may provide useful avenuesfor further tests of our theoretical predictions. There are many examples of adjacent,distinct habitat patches over which gene flow occurs and populations must adapt to79chapter 4either location. Color polymorphism in many species shows strong selection frompredators against intermediate phenotypes or phenotypes mismatched to the localenvironment. In mice adapting crypsis to beach environments or across lava flows(Nachman et al., 2003; Hoekstra et al., 2004, 2006) and reptiles adapting to differentcolored substrates such as sand or lava flows (Rosenblum et al., 2004) adaptation hasbeen shown to rely largely on single gene polymorphisms, such as MC1R (Rosenblumet al., 2004), or few genes of large effect (Nachman et al., 2003), supporting our findingof adaptation via large effect mutations. In Batesian mimicry systems such as theharmless scarlet kingsnake (Lampropeltis elapsoides) and the venomous eastern coralsnake (Micrurus fulvius), cryptic kingsnakes exist in regions without coral snakeswhile in regions of co-occurence the kingsnake mimics the colorful coral snake. In-termediate phenotypes between these two morphs are selected against by predators(Kikuchi and Pfennig, 2009), and the character displacement that occurs in the co-occurring kingsnake requires further study to understand the role of genetics in thisadaptation (Pfennig and Pfennig, 2012). Furthering our understanding of the geneticarchitecture underlying important phenotypic traits in these and related species willprovide useful tests of our predictions for adaptation via either many small effect locior few large effect loci rather than through intermediate mutational regimes.Our model has incorporated many biologically realistic parameters, but also pos-sesses a lack of realism at some levels. The small-effect mutation regime has a veryhigh mutation rate that may be unlikely to exist in the real world. Nevertheless, ourpattern of a qualitative shift in expansion ability by genetic architecture holds whenexamining across the medium and large regimes. As the availability of genomic datacontinues to grow and our understanding of the true genetic architecture of manyspecies improves, we can gain valuable insight into the processes impacting speciessurvival in a changing world and understand which species or populations may be atgreatest risk of either extinction or detrimental invasion.80chapter 5Recommendations for utilizing and reporting populationgenetic analyses: The reproducibility of genetic clusteringusing the program structure5 .1 introductionThe reproducibility of scientific research is fundamental to maintaining scientific rigorand advancing science (Price, 2011). Full experimental replication provides the mostthorough means of verifying published empirical results, but this approach can beimpractical due to the difficulty in obtaining identical samples and large financialand time commitments (Peng, 2011). Diminishing costs and advancing technologyhave resulted in a plethora of large genetic data sets, while at the same time, there hasbeen an increase in the complexity of software applications. A previous investigationinto the reproducibility of microarray studies found that few were fully repeatable, asmany suffered from ambiguity in the methods, discrepancy in the results, and lackof available data or software (Ioannidis et al., 2009). Maintaining the rigor of today’sscientific research may therefore prove a more difficult task than expected, as both theempirical results and the often complex analyses need to be reproducible. Efforts toencourage and implement data archiving and sharing are expanding, and these createthe opportunity to test the validity and reproducibility of scientific results (Whitlocket al., 2010).Reproducing results within the field of molecular ecology is especially difficult be-cause biological samples are unique to their particular place and time, and subsequentsamples may reflect different ecological or evolutionary forces (Wolkovich et al., 2012).Researchers therefore tend to test the same overarching hypothesis with samples fromdifferent taxa and locations, in the hope of arriving at a more general and repeatable81chapter 5pattern. However, drawing broad conclusions from the results of many studies isineffective when the results of the individual studies cannot be reproduced from theirunderlying data. It is thus essential to test the reproducibility of statistical analysesat the level of individual papers as well. To examine how well we could recreatethe results from typical molecular ecology studies, we investigate, as an example,the reproducibility of studies that used genotype data to identify genetically similarclusters of individuals with structure (Pritchard et al., 2000). Many studies useclustering results based on structure to perform further analyses, making it animportant foundation upon which inferences are built. We ask whether (i) archiveddata sets are sufficiently complete and well annotated that they can be reused, (ii)published articles specify all the methodological details necessary to reproduce theanalysis, and (iii) where possible, the same conclusions can be reached by reanalysingthe archived data. Although reproducibility has many different aspects, we use ithere to mean the agreement between results obtained through analysing identicaldata sets using the same analytical method but under different conditions (differentobservers, computers and starting points in computer algorithms). We reanalysed 23articles from 2011 that used structure to infer genetic clustering and also checkedthe level of data completeness and methodology reporting in an additional 37 articles.5 .2 methods5 .2 .1 Obtaining DatasetsWe gathered structure data sets associated with 23 papers published in 2011: 21from Molecular Ecology, and two from the journal PLoS One. Data were obtainedfrom the online data repository Dryad (Dryad Digital Repository, 2012) in November2011, NCBI GenBank, or from the supplementary material accompanying the paper.With one exception, we excluded papers where data were archived on GenBankdue to the difficulty of compiling individual accessions into the correct format forstructure.82chapter 5For a broader assessment of data set completeness and methods reporting, we alsoincluded 37 data sets obtained by contacting the authors of original research paperspublished in PLoS One, PLoS Genetics and BMC Evolutionary Biology and collectedas part of a separate study (Vines et al., 2013).5 .2 .2 The Program structureThe freely available Bayesian clustering program structure (Pritchard et al., 2000)is the most commonly used application to infer population structure, with over 5000citations in Web of Science as of June 2012. structure uses multilocus genotypedata to describe and visualize population structure based on allele frequencies of thedata.structure is capable of analysing a variety of genotype data, including bothcodominant markers (microsatellite and single nucleotide polymorphism, SNP; Pritchardet al. 2000) and dominant markers (amplified fragment length polymorphism, AFLP;Falush et al. 2003). The model uses Markov chain Monte Carlo (MCMC) simulations toestimate the group membership of each individual, assuming Hardy-Weinberg andlinkage equilibrium within groups, random mating within populations and free re-combination between loci (Pritchard et al., 2000). Due to the random walk characteris-tic of the MCMC methods, structure outputs are not expected to produce identicalresults, yet the approach should be robust enough to yield identical conclusions whenreproduced. The program is initiated with a required text file containing individualgenotype data and labels as well as optional information on population assignment,sampling sites or locus names. The user specifies several essential parameters regard-ing the ancestry model, the allele frequencies model, the length of the burnin (initialruns of the simulation during which data are not retained to ensure results are notdependent on initial conditions), length of run time (number of MCMC repetitionsduring which data are retained), the number of independent replicates of each set ofparameters and the range of number of clusters (K values) to be tested. These can bespecified directly in the graphical user interface or in a separate text file when runin the command line. In addition, it is possible to specify extra parameters, mainly83chapter 5regarding the Markov chain process as well as a sampling location prior. The detailsof the model are described by Pritchard et al. (2000, 2007) and Falush et al. (2003, 2007).structure outputs are typically analysed to infer the optimal K by one or a com-bination of methods. In the method described in Pritchard et al. (2000), the optimalK is chosen by plotting the log probability of the data (referred to as ln Pr(X|K) instructure’s manual, Pritchard et al. 2007) against a range of K values and selectingthe K with the highest ln Pr(X|K) or the one after which the trend plateaus, whilealso taking into account the consistency of the groupings across multiple runs withthe same K. An alternative method, described by Evanno et al. (2005), formalizes anad hoc approach based on plotting the secondorder rate of change in ln Pr(X|K) forsuccessive Ks (referred to as ∆K) against a range of K values, and selecting the trueK based on where the maximal value of this distribution occurs. As emphasizedin the structure manual (Pritchard et al., 2007), selecting the optimal K can bequite a subjective procedure and is best inferred when the biology and history of theorganism are taken into account. Replicate structure runs can be combined usingthe software programs clumpp (Jakobsson and Rosenberg, 2007) and structureharvester (Earl et al., 2012). Bar plots depicting the ancestry proportions (ormembership coefficients, Q) of individuals in each cluster can then be created, forexample with the software distruct (Rosenberg, 2004), to visualize the populationclusters.5 .2 .3 Analysing Data SetsWe followed procedures for analysis as described in the methods section of eachpublication and used default settings for parameters that were not specified. Severalpublications performed multiple structure analyses, which we counted indepen-dently for a total of 34 analyses. We made use of the Bioportal computing resource(; Kumar et al. 2009) or local desktop computers. Out-put was compiled with structure harvester and processed following the au-thors’ description, including CLUMPP analysis where appropriate. We first assessedwhether we could reproduce the K values from the original study based on the meth-84chapter 5ods used by the authors. Then, whenever possible, membership coefficient bar plotswere visually compared by multiple authors of the present study to assess whetherour results were a true reproduction of the original results. When we concluded thesame value of K as the authors, we deemed the analysis as reproduced unless themembership coefficient bar plots showed strikingly different results.For the broader survey of data set completeness, we evaluated whether the samplesize and number of loci described in the paper matched the obtained data set fromboth the data sets obtained from online material (23 studies) and email correspon-dence (37 studies). To check the overall standards for reporting parameter settingswithin either the methods section or in a supplemental file, we also recorded thenumber of ‘essential‘ parameter settings (range of K values tested, length of burn-in,length of MCMC repetitions, number of independent replicates, the admixture modeland allele frequencies model) given in the paper or supplemental material for all ofthe above analyses.5 .3 resultsOf the 23 papers, we attempted to reanalyse using data from supplementary materialsor online repositories, two papers did not have archived data present at the time,making their reanalysis impossible. Three papers (13%) provided data where thenumber of individuals and/or loci specified in the publication did not match thosepresent in the data set, or the authors performed their structure analysis on anunspecified subset of the archived data. Of these 23 papers, three selected K usingthe Pritchard method (Pritchard et al., 2000), seven used the Evanno method (Evannoet al., 2005), eight used a combination, one used a nonparametric Wilcoxon test, twodid not specify their method, one used no standard method and rather utilized K= 2 to identify hybrid individuals and one discussed a comparison of two K valuesobtained in a previous study. We therefore also did not assess the reproducibility ofthese final two papers, leaving 19 papers (containing 30 analyses) that we attemptedto reproduce. See Tables D.1 and D.2 for full characteristics of all analyses.85chapter 5						F igure 5 .1 : Results of structure reanalyses. Initial branching arrows show thenumbers of analyses resulting in different outcomes at the point of selecting a K value.The subsequent arrows show the numbers of analyses successfully reaching the pointof matching membership coefficients. Size of arrowheads is proportional to numberof analyses present. *When K was not inferred, we attempted to match membershipcoefficients across all K values (still only counted as 1 analysis). **When K was notreproduced, we compared membership coefficients at the authors’ chosen K. ***Forincomplete data, analyses could not be run.We were able to reproduce the authors’ inference of K for 70% (21 of 30) of theanalyses (Figure 5.1). All of the successfully reproduced data sets consisted of mi-crosatellite genotypes. In general, microsatellite data sets were analysed using longerburn-in and MCMC run lengths as well as more independent replicates; however,there was no significant difference in an overall proxy for run length ([length ofburn-in + length of MCMC repetitions] 9 number of independent replicates) betweenanalyses that were reproduced and those that were not (t = 0.0617, d.f. = 13.564, P-value = 0.95). Comparing these parameters individually, we found a trend of longerburn-in (t = 1.8706, d.f. = 26.991, P-value = 0.072) but not of more MCMC repetitions(t = 1.6537, d.f. = 21.677, P-value = 0.11) or an increase in the number of independentreplicates (t = 1.1442, d.f. = 7.511, P-value = 0.29) for reproduced studies. Comparisonof the K values chosen by the original authors versus our reanalysed K results showeda significant correlation of 0.5934 (t = 3.9703, d.f. = 29, Pvalue = 0.0004; Figure 5.2).We also assessed the completeness and description of all 60 data sets that weobtained and found 35% to be either incorrectly or insufficiently described by the86chapter 5authors. We found that 17 data sets did not match the description given in the paper,most typically because the data contained a different number of loci or individualsthan suggested by the paper. Lastly, four papers did not give any clear description ofthe number of individuals, loci or both used in their structure analysis, makingit impossible to judge how well the archived data matched the data analysed by theauthors.Authors’ descriptions of the essential parameters used to run structure variedmarkedly, ranging from 0 described parameters to a maximum of 6 (median = 6). Wefound a significant difference in number of essential parameters between two of thejournals (t = 3.31, d. f. = 40.27, P-value = 0.015), with Molecular Ecology having amean of 5.7 parameters specified and PLoS One 4.6 (PLoS Genetics, 4.7 and BMC Evol.Biol., 4.8). Overall, length of burn-in ranged from 1000 to 50 000 000 (median = 50000), while MCMC repetitions ranged from 10 000 to 500 000 000 (median = 450 000).Independent replicates ranged from 3 to 100 (median = 10).5 .4 discussion and recommendationsReproducibility is a foundation of scientific research. The widespread application ofstructure makes it an ideal case study to test the ability to reproduce molecularecology results that rely on large data sets and complex algorithms. As struc -ture results often serve as the underpinnings for further analyses and conclusionswithin a study, it is important to assess whether the implementation of the program,subsequent analysis and associated conclusions are properly reported and can bereproduced.We find that reproduction of structure results can be difficult to achieve, de-spite the straightforward input requirements of the program (a genotype file and twoparameter files). Our results show that 30% of analyses are not reproducible. A largefactor in the failure to reproduce these analyses was the availability of data in a formthat could be readily understood by researchers not familiar with the study system.Had we included studies with no data available as the starting point for our reanalysis,our assessment of failure to reproduce would have been even higher, particularly for87chapter 50 1 2 3 4 5 60123456Original chosen K valueChosen K value of reanalysisF igure 5 .2 : Comparison of K values for original studies versus reproduced studies.Dotted line indicated the 1:1 line, points are jittered for better visualization.88chapter 5journals without a strongly enforced data archiving policy (see Vines et al. 2013 forfurther discussion of data accessibility).We recognize that assessing reproducibility is inherently difficult. Our main eval-uation criterion (same K value) is only a small part of full reproducibility of thesestudies, but the most objective one. Furthermore, it is difficult to disentangle thenonreproducibility caused by the stochastic nature of the program from that causedby both discrepancies in data sets available versus those used by authors and theirreported methods. The trend of longer burn-in lengths in reproduced studies suggeststhat at least a portion of the poor reproducibility of some studies is due to theinherent stochasticity of the Monte Carlo approach itself. In at least one case, wecan attribute our failure to reproduce the study to insufficiently described, complexanalyses performed; however, there seemed to be no other outstanding characteristicsof nonreproducible studies. It is important to note that although structure isthe most commonly used program, in some instances, other methods may be moreappropriate for a given data set. For example, performing a PCA allows examinationof variability within clusters, other Bayesian methods such as the program instruct(Gao et al., 2007) allows inbred genotypes to be used, tess (Chen et al., 2007) utilizesspatial information, and baps (Corander et al., 2003, 2004; Corander and Marttinen,2006a,b) aids in detection of admixed individuals. Using the right program is notonly essential to drawing correct conclusions, but may also improve reproducibilityof results. Further discussion of additional approaches can be found in Latch et al.(2006) and François and Durand (2010).In addition, we may have judged a study to be nonreproducible despite differencesin the final results that may or may not have biological significance. The correlationbetween original and reproduced K values implicates this, yet there is still clearlyroom for improvement. With such widespread use within its field, it is important thatusers of structure properly implement the software, regardless of whether or notthey possess a full understanding of the algorithm underlying the analysis. To ensurethat published results can be reproduced, we make the following recommendations89chapter 5for future users of the program. Although our study is specific to structure, manyof these recommendations are applicable to other types of analysis.1 For archiving purposes, authors should be encouraged to provide the finalversion of both the genotype and parameter files. We propose that authorsarchive genotype data from all individuals. If only a subset was used in theanalysis, these individuals should be clearly identified in the same file so thatthis information is retained. The parameter files include all the settings used inthe analysis, hence archiving the entire file avoids any confusion regarding useof default settings when not explicitly stated by the authors. When using thegraphical user interface version of the software, the parameters can be exportedfrom the program in text format for archiving purposes.2 Authors should ensure that burn-in and run lengths are sufficient. We foundremarkable variation in parameters affecting the computational demands of theanalysis (burn-in time, MCMC repetitions, and replicate runs). Though wefound no significant difference between an overall proxy for run length andreproducibility and only a slight trend individually for burn-in time, given theadvances in computing power, we feel that the proposed minimum require-ments, dating back to the software’s advent more than a decade ago, shouldbe increased. It is difficult to set a standard, as variability across data sets inthe number of loci, their levels of polymorphism, and the amount of populationstructure present all also contribute to the program’s ability to successfully de-tect the appropriate K (Rosenberg et al., 2001; Latch et al., 2006; Gao and Starmer,2007). We would advise a minimum of at least 100 000 burn-in iterations andMCMC repetitions for each run, and much longer burnin will be required forsome data sets. Comparing a range of run durations may help to determinethe appropriate run length, and it is always advisable to choose a longer burn-in and run length. To confirm that burn-in is adequate, it is also importantto check for convergence in values of summary statistics (particularly α, F, D,and the likelihood) that are estimated by the program, as recommended in the90chapter 5structure manual (Pritchard et al., 2007). Additional independent replicateruns are of great importance as they limit the influence of stochasticity andincrease the precision of the parameter estimates. That is especially true whenusing the Evanno method, which requires an estimate of variance. In at leastone reanalysis we performed, only five replicate runs were used, which mayexplain the failure to reproduce results (the chosen K) in this particular study.We recommend 20 replicates as used by Evanno et al. (2005).3 Proper reporting of the methods used to analyse structure results is vitalfor inferring K. Whether the method outlined by Pritchard et al. (2000) or byEvanno et al. (2005) or both are used to select K should be clearly stated, as wellas any biological factors that have influenced the choice of K. Special attentionshould be given to the comparison of K = 1 versus greater values, as the Evannomethod is not capable of performing this comparison. We advise that resultsare reported in the form of the graph of the natural logarithm of the likelihoodof the data given K (if the Pritchard method was used) and the ∆K graph (ifthe Evanno method was used) as well as the bar plot(s) showing individualassignments for the given K or comparison across plausible K values. Ideally,for full reproducibility of a study, membership coefficients for each individualshould also be provided. These results should be examined within each replicateto determine how much stochasticity is present before runs are averaged, as wellas after averaging all replicate runs.5 .5 conclusionA substantial proportion of structure results were not reproducible, despite therelative simplicity of the procedure, requiring only a genotype file and associatedparameter settings. Our recommendations on how to archive data sets analysed withstructure should reduce the component of nonreproducibility due to uncertaintyof parameter choice or lack of clarity in the data analysed, but some discrepancieswill no doubt still persist. We hope that scientists will increasingly acknowledge91chapter 5the concept of scientific reproducibility in the future and be aware of practices theycan enact both for better data archiving and better implementation of other similarprograms in their analyses.92chapter 6ConclusionThe field of population genetics has grown by leaps and bounds since its advent. Therapid advancement of genetic and genomic techniques for acquiring data from everyorganism imaginable is a testament to this growth, as well as a cause. The insightsallowed through these advances have greatly informed all aspects of evolutionarybiology today. Even though I have not presented new data from natural organisms,my thesis work, and all theoretical work broadly, relies heavily on biological datacollected by evolutionary biologists through the years. We gain a great deal of knowl-edge by creating models of the natural world where alteration of specific factors canbe controlled and investigated. Models that are based in reality with biologically accu-rate parameters not only improve our understanding of population genetic processes,but also have more directly applicable insights for real world populations.I have presented four projects within my thesis, each building upon the abundanceof existing knowledge in the field of evolutionary biology. In Chapter 2, I tested theaccuracy of methods designed for estimating effective population sizes. Out of a rangeof existing methods, I found two of these to perform best. Furthermore, I found that ofthese two methods the best-performing method varied across different demographicscenarios of migration among populations. This lends credit to the proposition madethroughout my thesis that knowing the demographic history of populations is bothuseful and necessary to make proper evolutionary inferences from data.In Chapter 3, I examined range expansions with combinations of conditions gen-erating expansion load, local maladaptation to an environmental gradient, or both. Ifound an interesting interaction between these two factors. Rather than the combi-nation of both expansion load and local maladaptation to an environmental gradientworsening fitness at range edges, they instead act in conjunction to alleviate fitness93chapter 6reductions at expanding fronts. When fitness reductions caused by expansion loadaccumulated at expanding range fronts combines with the fitness costs incurred aspopulations attempt to adapt to the local environment over steep gradients, theseeffects compound and lead to the local extinction of populations at the range edge.This leaves behind a surviving range edge necessarily consisting of more locallyadapted individuals. Because any lower degree of local adaptation would combinewith expansion load to generate populations with unsustainable fitness levels, thosepopulations simply no longer exist, and local adaptation in the surviving range edgeis higher while range expansion is slower. This slower range expansion thereforemeans that the range edge occurs sooner in space than it would otherwise in theabsence of either expansion load or local maladaptation to an environmental gradient.In Chapter 4, I simulated range expansions with different genetic architecturesacross environmental gradients that varied over space in terms of patchiness andsteepness. Varying the genetic architecture of the trait underlying adaptation to the en-vironment showed different abilities of populations to adapt and expand their ranges.Two qualitatively different regimes resulted whereby expansion could succeed fromeither many small effect alleles or few large effect alleles. Intermediate between thesetwo regimes, expansion was more limited, indicating that this may be parameter spacefor which adaptation is most difficult. Though the upper bound of the mutation ratesincluded may be unrealistic, this result provides impetus for future investigation ofthe interaction between mutation numbers, effect sizes, and the scale of change in theenvironment.In Chapter 5, I presented a reproducibility study on published datasets usingthe widespread analysis method structure. We found that the same biologicalresult of K values could be reproduced in 70% of studies. However, if the successof reproducibility was measured from the starting point of searching and acquiringthe datasets for which reanalyses were performed, rather than out of the datasetssuccessfully acquired, this percentage would decrease greatly. This implies that thelargest issue in this reproducibility study is not the analysis itself, but in obtaining thenecessary data. These results are nonetheless promising, as they indicate the success-94chapter 6ful repeatability of a stochastic genetic analysis method. It would be interesting toconduct a similar reproducibility study today, since the standards for data archivinghave greatly improved in the four years since publication of this study. Furthermore, itwould be even more valuable if it were feasible to do such a large-scale reproducibilitystudy in the true sense of the word, whereby new data would be collected in thesame systems and analyzed to test if the same biological conclusion for populationclustering is obtained from independent sampling efforts.Each of my thesis projects has its own limitations as well as its own implicationsthat leave open interesting avenues for future research. The simulation studies inChapters 2, 3 and 4 each investigate a specific range of parameter sets, as computa-tional resources are limited. We chose the combinations of parameters and scenariosdeemed to be most relevant to the questions of interest as well as most biologicallyrelevant for applied uses. Analytical models may serve better to describe the rangeof parameter space where certain interactions are expected to be most interesting orof greatest effect, but approaching these questions through simulations has providedinsight not otherwise accessible through analytical models. In particular, Chapters 3and 4 have used spatially explicit simulations over large scale landscapes to inves-tigate range expansions. Such an approach is computationally intensive and hasnot been used in previous studies of range expansion on this scale. The biologicalrealism introduced by this approach allowed me to directly investigate the specificbiology of the questions in mind, especially in Chapter 4 where the structure of thelandscape is a key aspect under investigation. This simulation approach has beenmade publicly available for others to use in future studies and will hopefully allowcontinued biological insight into population genetic processes occurring in large scalelandscapes.The simulated data generated from Chapter 2 was designed to test biological sce-narios most relevant to scenarios occurring during estimation of effective populationsize in natural populations. Testing all methods available at the time on this range ofdata was the first fair comparison method to method. As new methods for estimatingeffective population size have been developed, this data will continue to be valuable95chapter 6in further comparisons. Since publication of Chapter 2, a new method (Hui and Burt,2015) has been created that I hope to test in the future across all scenarios alreadyinvestigated in my study. A new study has also performed a comparison across asubset of these methods to test their performance in a different range of biologicalscenarios as well as with differing effects of genetic markers on these inferences(Wang, 2016). The increasing trend of methods testing and comparison may bebest seen for methods estimating effective population size. Several previous studieshave presented assessments of method performance (Barker, 2011; Hoehn et al., 2012;Holleley et al., 2013; Neel et al., 2013; Ryman et al., 2013), and authors who developthese methods generally present an evaluation of their approach upon publication.This method testing is vital, but often lacks consistency in comparison across methodsand datasets, resulting in a comparison of apples to oranges. Methods testing can bedifficult and time-consuming, however, and no author is generally able to predictall situations for which their method may be applied. The burden thus rests onresearchers applying these methods to ensure that they are meeting the appropriateassumptions of the method, or to first test how applying a method may be biased intheir biological scenario.A great proposal which has yet to come to fruition is the idea to maintain anonline database of simulated data for methods testing. I participated in a NESCentworking group initiated by MC Whitlock for this SimBank project, where we devel-oped ideas for designing and implementing this database. Such a bank of simulateddata would provide standards for comparisons that different researchers could useacross methods or to compare a novel method on an equivalent dataset that hasalready been tested. Simulated data allows accurate assessment of analysis resultsbecause the true parameters or biological answers within the simulated data areknown. This would also increase the ease of methods testing, as authors do not haveto continually generate their own test datasets. Furthermore, researchers applyinga method to a dataset with a new scenario that may violate the assumptions builtinto the method would be able to find equivalent simulated data and test the methodthemselves in order to understand how useful and accurate the method may be in96chapter 6their scenario. The finalization of this sort of database would be a great boon toadvancing population genetic analyses, particularly as more genetic analysis methodsare developed. As more data, simulated and natural, continues to be archived atpublication and resources for long-term storage and access to data grow, the state ofdata availability is optimistic for evolutionary biology, making the incorporation ofsuch datasets into a resource for methods testing feasible.The insight gained from Chapter 2 has already prompted my participation intwo studies of effective population size in real datasets of scenarios violating theassumptions of these estimation methods. I am working with collaborators on adataset consisting of a metapopulation of the weedy species Silene latifolia for whichthere is long-term census data and multi-year genetic data allowing the comparisonof effective population size measures through time and at different hierarchical levelsof population structure in relation to the colonization-extinction dynamics of thismetapopulation. Furthermore, collaborators interested in the effective populationsize of an endangered species of stickleback have conducted a mark-recapture studyin two lakes within British Columbia where distinct groups of benthic and limneticfish live. Comparing census sizes to effective population sizes in this system willprovide insight into the dynamics between the fish morphs and their respective levelsof diversity for conservation purposes.With the relative ease of acquiring genetic and genomic data, population geneticsis more able than ever to compare theoretical predictions to thorough biologicalanalyses. Results from Chapter 3 make predictions for the range of effect sizes ofdeleterious mutations we may expect to see accumulating due to range expansion.Empirical studies have shown various results for the effects of deleterious allelespersisting in human populations (Lohmueller et al., 2008; Henn et al., 2016; Do et al.,2015). Detecting small effect loci in real datasets is extremely difficult, compoundedby the statistical problem of comparing across huge sets of loci. It will therefore beworthwhile to compare the values found in empirical studies to those predicted fromour simulations. This has the potential to inform studies of expansion load in terms ofwhat portion of small effect loci are unable to be detected but likely still contributing97chapter 6to expansion load, adding to the commentary on how important expansion loadmay actually be in the real world. As there is still only little empirical evidenceon the distribution of effect sizes across species’ genomes, further simulations withadditional loci may lead to valuable insight. I saw that the effects of multiple loci onfitness and adaptation did not combine multiplicatively, but instead interacted. Thisboth lends credit to as well as points out the faults in learning biology from simplifiedmodels. Our model clearly improved upon any using simply one locus. If one wereto predict the effects on population survival during range expansion by extrapolatingfrom a one-locus model, the result would be highly unrealistic, as adding more locicreates the potential for ever-increasing interactions. Such interactions are likely to berife throughout biology, and understanding these interactions through controlled sim-ulation experiments becomes valuable for creating a strong foundation of predictionsto be tested in real world data.Relevant to both of my studies on range expansions are a wide range of furtherbiological scenarios that merit investigation. A range of mating systems may resultin different values of both expansion speed and fitness during range expansions.Chapter 3 used hermaphrodites capable of selfing, while Chapter 4 used obligatelyoutcrossing hermaphroditic individuals. The lack of selfing may slow range ex-pansion (which is not directly comparable across these two studies) also leading tofurther reductions in expansion load. If the simulated mating system was designedto represent plant species where pollen travels farther than seeds, this may also haveimplications for expansion load and local adaptation during expansion, as the speedof expansion would be limited by seeds, but migration load could be generated bymuch wider-ranging pollen movement (Lopez et al., 2008). More importantly, alongthese lines is the potential for the evolution of dispersal, which I did not incorporateinto our models. As discussed in each chapter, this has the potential to interact withrange expansions in unpredictable ways.An additional characteristic I plan to examine in relation to range expansions is adifferent process called a range shift, where not only is there an expanding front edgeof a population spreading geographically, but there is also a receding trailing edge.98chapter 6This scenario can be imagined in many cases of species tracking moving climaticchanges, where the trailing edge is going extinct as it is no longer able to adapt to thelocal conditions. In these scenarios, there is no dense population core that persists.We may expect there to be less recovery from expansion load that accumulates atthe front, as beneficial diversity is lost from the receding edge. In the case wheresuch a shift occurs over heterogeneous landscapes, however, it may in some casesbe a beneficial scenario. It is possible that migration load from differently adaptedpopulations at the receding edge of the species range may be less severe becausethose populations are less different than the expanding front and also not subject tofurther migration load from a dense core. The conditions for speed of expansion, sizeof species range, and scale of environmental change over space for such a scenariorequire further simulations. One further valuable examination relevant to Chapter 4 isto compare across a wider range of genetic architecture regimes. Varying the numberof loci contributing to the quantitative trait while still maintaining the same values ofgenetic variance or mutational variance will improve our understanding of the rolethat different types of genetic architecture play in local adaptation.Since publication of Chapter 5, we have published two follow-up studies investi-gating the availability of archived (Vines et al., 2013) and non-archived (Vines et al.,2014) data. In combination, these studies have highlighted the benefit of archivingdata for future research uses. A worthwhile future direction would be to investigatethe ability to reproduce results from other stochastic analysis methods. As studieshave shifted more to large-scale genomic datasets, many population genetic analysismethods have been developed to account for the increase in available data, evenwithin the topic of clustering algorithms and analyses of population structure (Rajet al., 2014; Petkova et al., 2015; Bradburd et al., 2016). One would expect more data toproduce more consistent results, but it is worth further investigation to examine thevalidity of this assumption.In conclusion, my thesis has made valuable contributions to the field of evolu-tionary biology. These simulation and methods testing studies have highlighted theimportance of understanding the demographic history of populations and the impacts99chapter 6this can have on local adaptation. When combined, theoretical and biological insightsinto evolution provide the most valuable knowledge for advancing our understandingof these processes, and I hope that the results of my work will lead to worthwhileinvestigations of population genetic processes in species alive today.100BibliographyAgrawal A.F. and Whitlock M.C. 2011. Inferences about the distribution of dominancedrawn from yeast gene knockout data. Genetics 187:553–566.Aguilée R., Shaw F.H., Rousset F., Shaw R.G., and Ronce O. 2012. How doespollen versus seed dispersal affect niche evolution? Evolution doi: 10.1111/j.1558-5646.2012.01816.x.Aitken S.N. and Whitlock M.C. 2013. Assisted gene flow to facilitate local adaptationto climate change. Annual Review of Ecology, Evolution, and Systematics 44:367.Altwegg R., Collingham Y.C., Erni B., and Huntley B. 2013. Density-dependentdispersal and the speed of range expansions. Diversity and Distributions 19:60–68.Anderson E.C. 2005. An efficient Monte Carlo method for estimating Ne fromtemporally spaced samples using a coalescent-based likelihood. Genetics 170:955–967.Angert A.L., Bradshaw Jr H., and Schemske D.W. 2008. Using experimental evolutionto investigate geographic range limits in monkeyflowers. Evolution 62:2660–2675.Atkins K.E. and Travis J.M.J. 2010. Local adaptation and the evolution of species’ranges under climate change. Journal of Theoretical Biology 266:449–457.Avise J.C. 1995. Mitochondrial dna polymorphism and a connection between geneticsand demography of relevance to conservation. Conservation Biology 9:686–690.Baalsrud H.T., Saether B.E., Hagen I.J., Myhre A.M., Ringsby T.H., Pärn H., and JensenH. 2014. Effects of population characteristics and structure on estimates of effectivepopulation size in a house sparrow metapopulation. Molecular Ecology 23:2653–2668.Baer C.F., Miyamoto M.M., and Denver D.R. 2007. Mutation rate variation inmulticellular eukaryotes: causes and consequences. Nature Reviews Genetics 8:619–631.Barker J.S.F. 2011. Effective population size of natural populations of Drosophilabuzzatii, with a comparative evaluation of nine methods of estimation. MolecularEcology 20:4452–4471.Barton N.H. 2001. Adaptation at the edge of a species’ range. In Integrating Ecologyand Evolution in a Spatial Context, pages 365–392, Blackwell, Oxford.Barton N.H. 2016. Sewall wright on evolution in mendelian populations and the"shifting balance". Genetics 202:3.Beaumont M.A. 2003. Estimation of population growth or decline in geneticallymonitored populations. Genetics 164:1139–1160.101bibliographyBeaumont M.A., Cornuet J.M., M M.J., and Robert C.P. 2009. Adaptive approximateBayesian computation. Biometrika 96:983–990.Beerli P. and Felsenstein J. 2001. Maximum likelihood estimation of a migration matrixand effective population sizes in n subpopulations by using a coalescent approach.Proceedings of the National Academy of Sciences 98:4563–4568.Bell G. 2013. Evolutionary rescue and the limits of adaptation. Phil. Trans. R. Soc. B368:20120080.Berthier P., Beaumont M.A., Cornuet J.M., and Luikart G. 2002. Likelihood-basedestimation of the effective population size using temporal changes in allelefrequencies: A genealogical approach. Genetics 160:741–751.Bradburd G.S., Ralph P.L., and Coop G.M. 2016. A spatial framework forunderstanding population structure and admixture. PLoS Genet 12:1–38.Brady K.U., Kruckeberg A.R., and Bradshaw Jr H. 2005. Evolutionary ecology ofplant adaptation to serpentine soils. Annual Review of Ecology, Evolution, andSystematics pages 243–266.Bratteler M., Lexer C., and Widmer A. 2006. Genetic architecture of traits associatedwith serpentine adaptation of silene vulgaris. Journal of Evolutionary Biology19:1149–1156.Bridle J.R., Polechová J., Kawata M., and Butlin R.K. 2010. Why is adaptationprevented at ecological margins? New insights from individual-based simulations.Ecology Letters 13:485–494.Bridle J.R. and Vines T.H. 2007. Limits to evolution at range margins: when and whydoes adaptation fail? Trends in Ecology & Evolution 22:140–147.Brown J.H. 1984. On the relationship between abundance and distribution of species.The American Naturalist 124:255–279.Buckler E.S., Holland J.B., Bradbury P.J., Acharya C.B., Brown P.J., Browne C., ErsozE., Flint-Garcia S., Garcia A., Glaubitz J.C., et al. 2009. The genetic architecture ofmaize flowering time. Science 325:714–718.Caballero A. 1994. Developments in the prediction of effective population size.Heredity 73:657–679.Carlson S.M., Cunningham C.J., and Westley P.A. 2014. Evolutionary rescue in achanging world. Trends in Ecology & Evolution 29:521–530.Carroll S.P., Dingle H., Famula T.R., and Fox C.W. 2001. Genetic architectureof adaptive differentiation in evolving host races of the soapberry bug, jaderahaematoloma. Genetica 112:257–272.Case T.J., Holt R.D., McPeek M.A., and Keitt T.H. 2005. The community context ofspecies’ borders: ecological and evolutionary perspectives. Oikos 108:28–46.Case T.J. and Taper M.L. 2000. Interspecific competition, environmental gradients,gene flow, and the coevolution of species’ borders. The American Naturalist155:583–605.102bibliographyCharlesworth B., Charlesworth D., and Charlesworth D. 2010. Elements ofevolutionary genetics. Roberts and Company Publishers Greenwood Village.Chen C., Durand E., Forbes F., and François O. 2007. Bayesian clustering algorithmsascertaining spatial population structure: a new computer program and acomparison study. Molecular Ecology Notes 7:747–756.Chen I.C., Hill J.K., Ohlemüller R., Roy D.B., and Thomas C.D. 2011. Rapid rangeshifts of species associated with high levels of climate warming. Science 333:1024–1026.Ching J., Musheyev S.A., Chowdhury D., Kim J.A., Choi Y., and Dennehy J.J. 2012.Migration enhances adaptation in bacteriophage populations evolving in ecologicalsinks. Evolution 67:10–17.Colautti R.I. and Barrett S.C. 2013. Rapid adaptation to climate facilitates rangeexpansion of an invasive plant. Science 342:364–366.Conover D. 1992. Seasonality and the scheduling of life history at different latitudes.Journal of Fish Biology 41:161–178.Coop G., Witonsky D., Di Rienzo A., and Pritchard J.K. 2010. Using environmentalcorrelations to identify loci underlying local adaptation. Genetics 185:1411–1423.Corander J. and Marttinen P. 2006a. Bayesian identification of admixture events usingmultilocus molecular markers. Molecular ecology 15:2833–2843.Corander J. and Marttinen P. 2006b. Bayesian identification of admixture events usingmultilocus molecular markers. Molecular ecology 15:2833–2843.Corander J., Waldmann P., Marttinen P., and Sillanpää M.J. 2004. Baps 2: enhancedpossibilities for the analysis of genetic population structure. Bioinformatics 20:2363–2369.Corander J., Waldmann P., and Sillanpää M.J. 2003. Bayesian analysis of geneticdifferentiation between populations. Genetics 163:367–374.Deng H.W. and Lynch M. 1996. Estimation of deleterious-mutation parameters innatural populations. Genetics 144:349–360.Denver D.R., Morris K., Lynch M., and Thomas W.K. 2004. High mutation rate andpredominance of insertions in the caenorhabditis elegans nuclear genome. Nature430:679–682.Do C., Waples R.S., Peel D., Macbeth G.M., Tillett B.J., and Ovenden J.R. 2014.NeEstimatorv2: re-implementation of software for the estimation of contemporaryeffective population size (Ne) from genetic data. Molecular Ecology Resources14:209–214.Do R., Balick D., Li H., Adzhubei I., Sunyaev S., and Reich D. 2015. No evidence thatselection has been less effective at removing deleterious mutations in europeansthan in africans. Nature genetics 47:126–131.Dryad Digital Repository. 2012. Dryad. the dryad digital repository. C.L., Blower D.C., Broderick D., Giles J.L., Holmes B.J., Kashiwagi T., KrückN.C., Morgan J.A.T., Tillett B.J., and Ovenden J.R. 2012. A review of the applicationof molecular genetics for fisheries management and conservation of sharks and rays.Journal of Fish Biology 80:1789–1843.Earl D.A. et al. 2012. Structure harvester: a website and program for visualizingstructure output and implementing the evanno method. Conservation geneticsresources 4:359–361.Eckert C.G., Samis K.E., and Lougheed S.C. 2008. Genetic variation across species’geographical ranges: the central–marginal hypothesis and beyond. MolecularEcology 17:1170–1188.Edmonds C.A., Lillie A.S., and Cavalli-Sforza L.L. 2004. Mutations arising in the wavefront of an expanding population. Proceedings of the National Academy of Sciencesof the United States of America 101:975–979.Ellstrand N.C. and Elam D.R. 1993. Population genetic consequences of smallpopulation size: implications for plant conservation. Annual review of Ecology andSystematics pages 217–242.Evanno G., Regnaut S., and Goudet J. 2005. Detecting the number of clusters ofindividuals using the software structure: a simulation study. Molecular ecology14:2611–2620.Excoffier L., Foll M., and Petit R.J. 2009. Genetic consequences of range expansions.Annual Review of Ecology, Evolution, and Systematics 40:481–501.Eyre-Walker A. and Keightley P.D. 2007. The distribution of fitness effects of newmutations. Nature Reviews Genetics 8:610–618.Falush D., Stephens M., and Pritchard J.K. 2003. Inference of population structureusing multilocus genotype data: linked loci and correlated allele frequencies.Genetics 164:1567–1587.Falush D., Stephens M., and Pritchard J.K. 2007. Inference of population structureusing multilocus genotype data: dominant markers and null alleles. Molecularecology notes 7:574–578.Feder J.L. and Nosil P. 2010. The efficacy of divergence hitchhiking in generatinggenomic islands during ecological speciation. Evolution 64:1729–1747.François O. and Durand E. 2010. Spatially explicit bayesian clustering models inpopulation genetics. Molecular Ecology Resources 10:773–784.Fry J.D., Keightley P.D., Heinsohn S.L., and Nuzhdin S.V. 1999. New estimates ofthe rates and effects of mildly deleterious mutation in drosophila melanogaster.Proceedings of the National Academy of Sciences 96:574–579.Gao H., Williamson S., and Bustamante C.D. 2007. A markov chain monte carloapproach for joint inference of population structure and inbreeding rates frommultilocus genotype data. Genetics 176:1635–1651.104bibliographyGao X. and Starmer J. 2007. Human population structure detection via multilocusgenotype clustering. BMC genetics 8:34.Garcia-Ramos G. and Kirkpatrick M. 1997. Genetic models of adaptation and geneflow in peripheral populations. Evolution 51:21–28.Gaston K.J. 2009. Geographic range limits: achieving synthesis. Proceedings of theRoyal Society of London B: Biological Sciences pages rspb–2008.Gomulkiewicz R. and Holt R.D. 1995. When does evolution by natural selectionprevent extinction? Evolution 49:201–207.Gomulkiewicz R., Holt R.D., and Barfield M. 1999. The effects of density dependenceand immigration on local adaptation and niche evolution in a black-hole sinkenvironment. Theoretical Population Biology 55:283–296.Gravel S. 2016. When is selection effective? Genetics 203:451–462.Guillaume F. and Rougemont J. 2006. Nemo: an evolutionary and population geneticsprogramming framework. Bioinformatics 22:2556–2557.Guttal V., Bartumeus F., Hartvigsen G., and Nevai A.L. 2011. Retention time variabilityas a mechanism for animal mediated long-distance dispersal. PloS one 6:e28447.Haag-Liautard C., Dorris M., Maside X., Macaskill S., Halligan D.L., CharlesworthB., and Keightley P.D. 2007. Direct estimation of per nucleotide and genomicdeleterious mutation rates in drosophila. Nature 445:82–85.Haldane J.B.S. 1932. The causes of evolution. Longmans, Green and Co., London; NewYork; Toronto.Hallatschek O. and Nelson D.R. 2010. Life at the front of an expanding population.Evolution 64:193–206.Halligan D.L. and Keightley P.D. 2009. Spontaneous mutation accumulation studiesin evolutionary genetics. Annual Review of Ecology, Evolution, and Systematics40:151–172.Hancock A.M., Witonsky D.B., Alkorta-Aranburu G., Beall C.M., Gebremedhin A.,Sukernik R., Utermann G., Pritchard J.K., Coop G., and Di Rienzo A. 2011.Adaptations to Climate-Mediated Selective Pressures in Humans. PLoS Genetics7:e1001375.Hargreaves A.L. and Eckert C.G. 2014. Evolution of dispersal and mating systemsalong geographic gradients: implications for shifting ranges. Functional Ecology28:5–21.Hargreaves A.L., Samis K.E., and Eckert C.G. 2014. Are species’ range limits simplyniche limits writ large? a review of transplant experiments beyond the range. TheAmerican Naturalist 183:157–173.Hastings A., Cuddington K., Davies K.F., Dugaw C.J., Elmendorf S., Freestone A.,Harrison S., Holland M., Lambrinos J., Malvadkar U., et al. 2005. The spatial spreadof invasions: new developments in theory and evidence. Ecology Letters 8:91–101.105bibliographyHenn B.M., Botigué L.R., Bustamante C.D., Clark A.G., and Gravel S. 2015. Estimatingthe mutation load in human genomes. Nature Reviews Genetics 16:333–343.Henn B.M., Botigue L.R., Peischl S., Dupanloup I., Lipatov M., Maples B.K., MartinA.R., Musharoff S., Cann H., Snyder M.P., et al. 2016. Distance from sub-saharanafrica predicts mutational load in diverse human genomes. Proceedings of theNational Academy of Sciences 113:E440–E449.Henry R.C., Coulon A., and Travis J.M. 2015. Dispersal asymmetries and deleteriousmutations influence metapopulation persistence and range dynamics. EvolutionaryEcology 29:833–850.Hewitt G.M. 1999. Post-glacial re-colonization of european biota. Biological Journal ofthe Linnean Society 68:87–112.Hey J. and Nielsen R. 2004. Multilocus methods for estimating population sizes,migration rates and divergence time, with applications to the divergence ofDrosophila pseudoobscura and D. persimilis. Genetics 167:747–760.Hill W.G. 1979. A note on effective population size with overlapping generations.Genetics 92:317–322.Hill W.G. 1981. Estimation of effective population size from data on linkagedisequilibrium. Genetical Research 38:209–216.Hill W.G. and Robertson A. 1968. The effects of inbreeding at loci with heterozygoteadvantage. Genetics 60:615–628.Hoehn M., Gruber B., Sarre S.D., Lange R., and Henle K. 2012. Can genetic estimatorsprovide robust estimates of the effective number of breeders in small populations?PLoS One 7:e48464.Hoekstra H.E., Drumm K.E., and Nachman M.W. 2004. Ecological genetics of adaptivecolor polymorphism in pocket mice: geographic variation in selected and neutralgenes. Evolution 58:1329–1341.Hoekstra H.E., Hirschmann R.J., Bundey R.A., Insel P.A., and Crossland J.P. 2006.A single amino acid mutation contributes to adaptive beach mouse color pattern.Science 313:101–104.Holleley C.E., Nichols R.A., Whitehead M.R., Adamack A.T., Gunn M.R., andSherwin W.B. 2013. Testing single-sample estimators of effective population sizein genetically structured populations. Conservation Genetics 15:23–35.Holloway G.J., Povey S.R., Sibly R.M., et al. 1990. The effect of new environment onadapted genetic architecture. Heredity 64:30.Holt R., Gomulkiewicz R., and Barfield M. 2003. The phenomenology of nicheevolution via quantitative traits in a ’black-hole’ sink. Proceedings of the RoyalSociety of London B: Biological Sciences 270:215–224.Holt R.D. and Barfield M. 2011. Theoretical perspectives on the statics and dynamicsof species’ borders in patchy environments. The American Naturalist 178:S6–S25.106bibliographyHolt R.D. and Gomulkiewicz R. 1997. How does immigration influence localadaptation? A reexamination of a familiar paradigm. The American Naturalist149:563–572.Houle D. 1992. Comparing evolvability and variability of quantitative traits. Genetics130:195–204.Houle D., Morikawa B., and Lynch M. 1996. Comparing mutational variabilities.Genetics 143:1467–1483.Hui T.Y.J. and Burt A. 2015. Estimating effective population size from temporallyspaced samples with a novel, efficient maximum-likelihood algorithm. Genetics200:285–293.Hunter M.L. and Hutchinson A. 1994. The virtues and shortcomings of parochialism:conserving species that are locally rare, but globally common. Conservation Biology8:1163–1165.Ibrahim K.M., Nichols R.A., and Hewitt G.M. 1996. Spatial patterns of geneticvariation generated by different forms of dispersal during range expansion.Heredity 77:282–291.Ioannidis J.P., Allison D.B., Ball C.A., Coulibaly I., Cui X., Culhane A.C., Falchi M.,Furlanello C., Game L., Jurman G., et al. 2009. Repeatability of published microarraygene expression analyses. Nature genetics 41:149–155.Jakobsson M. and Rosenberg N.A. 2007. Clumpp: a cluster matching and permutationprogram for dealing with label switching and multimodality in analysis ofpopulation structure. Bioinformatics 23:1801–1806.Johnson T. and Barton N. 2005. Theoretical models of selection and mutation onquantitative traits. Philosophical Transactions of the Royal Society of London B:Biological Sciences 360:1411–1425.Jorde P.E. and Ryman N. 2007. Unbiased estimator for genetic drift and effectivepopulation size. Genetics 177:927–935.Karlsson E.K., Kwiatkowski D.P., and Sabeti P.C. 2014. Natural selection and infectiousdisease in human populations. Nature Reviews Genetics 15:379–393.Kawecki T.J. and Ebert D. 2004. Conceptual issues in local adaptation. Ecology letters7:1225–1241.Keightley P.D. 1994. The distribution of mutation effects on viability in drosophilamelanogaster. Genetics 138:1315–1322.Keightley P.D. 2012. Rates and fitness consequences of new mutations in humans.Genetics 190:295–304.Kikuchi D.W. and Pfennig D.W. 2009. High-model abundance may permit the gradualevolution of batesian mimicry: an experimental test. Proceedings of the RoyalSociety of London B: Biological Sciences .Kimura M. 1962. On the probability of fixation of mutant genes in a population.Genetics 47:713.107bibliographyKimura M. 1964. Diffusion models in population genetics. Journal of AppliedProbability 1:177–232.Kingsolver J.G., Hoekstra H.E., Hoekstra J.M., Berrigan D., Vignieri S.N., Hill C.,Hoang A., Gibert P., and Beerli P. 2001. The strength of phenotypic selection innatural populations. The American Naturalist 157:245–261.Kirkpatrick M. and Barton N. 1997. Evolution of a species’ range. The AmericanNaturalist 150:1–23.Kirkpatrick M. and Barton N. 2006. Chromosome inversions, local adaptation andspeciation. Genetics 173:419–434.Klopfstein S., Currat M., and Excoffier L. 2006. The fate of mutations surfing on thewave of a range expansion. Molecular Biology and Evolution 23:482–490.Kubisch A., Degen T., Hovestadt T., and Poethke H.J. 2013. Predicting rangeshifts under global change: the balance between local adaptation and dispersal.Ecography 36:873–882.Kuhner M.K. 2006. LAMARC 2.0: maximum likelihood and Bayesian estimation ofpopulation parameters. Bioinformatics 22:768–770.Kumar S., Skjæveland Å., Orr R.J., Enger P., Ruden T., Mevik B.H., Burki F.,Botnen A., and Shalchian-Tabrizi K. 2009. Air: A batch-oriented web programpackage for construction of supermatrices ready for phylogenomic analyses. BMCbioinformatics 10:357.Labuda D., Zietkiewicz E., and Labuda M. 1997. The genetic clock and the age ofthe founder effect in growing populations: a lesson from french canadians andashkenazim. American journal of human genetics 61:768.Lande R. 1975. The maintenance of genetic variability by mutation in a polygeniccharacter with linked loci. Genetical research 26:221–235.Lande R. 1988. Genetics and demography in biological conservation. Science 241:1455–1460.Lande R. and Barrowclough G.F. 1987. Effective population size, genetic variation, andtheir use in population management. Viable populations for conservation pages 87–123.Latch E.K., Dharmarajan G., Glaubitz J.C., and Rhodes Jr O.E. 2006. Relativeperformance of bayesian clustering software for inferring population substructureand individual assignment at low levels of population differentiation. ConservationGenetics 7:295–302.Le Corre V. and Kremer A. 2012. The genetic differentiation at quantitative trait lociunder local adaptation. Molecular ecology 21:1548–1566.Leimu R. and Fischer M. 2008. A meta-analysis of local adaptation in plants. PLoSOne 3:e4010.Lohmueller K.E. 2014a. The distribution of deleterious genetic variation in humanpopulations. Current opinion in genetics & development 29:139–146.108bibliographyLohmueller K.E. 2014b. The impact of population demography and selection on thegenetic architecture of complex traits. PLoS Genet 10:e1004379.Lohmueller K.E., Indap A.R., Schmidt S., Boyko A.R., Hernandez R.D., HubiszM.J., Sninsky J.J., White T.J., Sunyaev S.R., Nielsen R., et al. 2008. Proportionallymore deleterious genetic variation in european than in african populations. Nature451:994–997.Lopez S., Rousset F., Shaw F., Shaw R., and Ronce O. 2008. Migration load inplants: role of pollen and seed dispersal in heterogeneous landscapes. Journal ofevolutionary biology 21:294–309.Louthan A.M., Doak D.F., and Angert A.L. 2015. Where and when do speciesinteractions set range limits? Trends in ecology & evolution 30:780–792.Lowe W.H. 2009. What drives long-distance dispersal? a test of theoretical predictions.Ecology 90:1456–1462.Lynch M., Conery J., and Burger R. 1995. Mutation accumulation and the extinctionof small populations. American Naturalist pages 489–518.Lynch M., Walsh B., et al. 1998. Genetics and analysis of quantitative traits, vol. 1.Sinauer Sunderland, MA.Macbeth G.M., Broderick D., Buckworth R.C., and Ovenden J.R. 2013. Linkagedisequilibrium estimation of effective population size with immigrants fromdivergent populations: A case study on Spanish mackerel (Scomberomoruscommerson). Genes Genomes Genetics 3:709–717.Marsico T.D. and Hellmann J.J. 2009. Dispersal limitation inferred from anexperimental translocation of lomatium (apiaceae) species outside their geographicranges. Oikos 118:1783–1792.Montague J., Barrett S., and Eckert C. 2008. Re-establishment of clinal variationin flowering time among introduced populations of purple loosestrife (lythrumsalicaria, lythraceae). Journal of evolutionary biology 21:234–245.Mousseau T.A., Roff D.A., et al. 1987. Natural selection and the heritability of fitnesscomponents. Heredity 59:181–197.Nachman M.W., Hoekstra H.E., and D’Agostino S.L. 2003. The genetic basis ofadaptive melanism in pocket mice. Proceedings of the National Academy ofSciences 100:5268–5273.Neel M.C., McKelvey K., Ryman N., Lloyd M.W., Bull R.S., Allendorf F.W., SchwartzM.K., and Waples R.S. 2013. Estimation of effective population size in continuouslydistributed populations: there goes the neighborhood. Heredity 111:1–11.Nei M. and Tajima F. 1981. Genetic drift and estimation of effective population size.Genetics 98:625–640.Nomura T. 2008. Estimation of effective number of breeders from molecularcoancestry of single cohort sample. Evolutionary Applications 1:462–474.Norio R. 2003. Finnish disease heritage i. Human genetics 112:441–456.109bibliographyNunney L. and Campbell K.A. 1993. Assessing minimum viable population size:demography meets population genetics. Trends in Ecology & Evolution 8:234–239.Palstra F.P. and Fraser D.J. 2012. Effective/census population size ratio estimation: acompendium and appraisal. Ecology and Evolution 2:2357–2365.Pavlík Z. 2000. What is demography. Position of demography among other disciplines.Charles University, Faculty of Science, Prague pages 9–18.Pease C.M., Lande R., and Bull J.J. 1989. A model of population growth, dispersal,and evolution in a changing environment. Ecology 70:1657–1664.Peichel C.L., Nereng K.S., Ohgi K.A., Cole B.L., Colosimo P.F., Buerkle C.A., SchluterD., and Kingsley D.M. 2001. The genetic architecture of divergence betweenthreespine stickleback species. Nature 414:901–905.Peischl S., Dupanloup I., Kirkpatrick M., and Excoffier L. 2013. On the accumulationof deleterious mutations during range expansions. Molecular ecology 22:5972–5982.Peischl S. and Excoffier L. 2015. Expansion load: recessive mutations and the role ofstanding genetic variation. Molecular ecology 24:2084–2094.Peischl S., Kirkpatrick M., and Excoffier L. 2015. Expansion load and the evolutionarydynamics of a species range. The American Naturalist 185:E81–E93.Peng R.D. 2011. Reproducible research in computational science. Science (New York,Ny) 334:1226.Petkova D., Novembre J., and Stephens M. 2015. Visualizing spatial populationstructure with estimated effective migration surfaces. Nature Genetics .Pfennig D.W. and Pfennig K.S. 2012. Development and evolution of characterdisplacement. Annals of the New York Academy of Sciences 1256:89–107.Phillips B.L. 2012. Range shift promotes the formation of stable range edges. Journalof Biogeography 39:153–161.Phillips B.L., Brown G.P., Webb J.K., and Shine R. 2006. Invasion and the evolution ofspeed in toads. Nature 439:803–803.Pickett S.T. and Rogers K.H. 1997. Patch dynamics: the transformation of landscapestructure and function. In Wildlife and landscape ecology, pages 101–127, Springer.Polechová J., Barton N., and Marion G. 2009. Species’ range: Adaptation in space andtime. The American Naturalist 174:E186–E204.Polechová J. and Barton N.H. 2015. Limits to adaptation along environmentalgradients. Proceedings of the National Academy of Sciences 112:6401–6406.Pollak E. 1983. A new method for estimating the effective population size from allelefrequency changes. Genetics 104:531–548.Prentis P.J., Wilson J.R., Dormontt E.E., Richardson D.M., and Lowe A.J. 2008.Adaptive evolution in invasive species. Trends in plant science 13:288–294.Price M. 2011. To replicate or not to replicate. Science Careers, doi 10.110bibliographyPrice T.D. and Kirkpatrick M. 2009. Evolutionarily stable range limits set byinterspecific competition. Proceedings of the Royal Society of London B: BiologicalSciences pages rspb–2008.Pritchard J.K., Stephens M., and Donnelly P. 2000. Inference of population structureusing multilocus genotype data. Genetics 155:945–959.Pritchard J.K., Wen W., and Falush D. 2007. Documentation for structure software:Version 2.2. Available from W.B. 2001. The origins of theoretical population genetics: with a newafterword. University of Chicago Press.Pudovkin A.I., Zaykin D.V., and Hedgecock D. 1996. On the potential for estimatingthe effective number of breeders from heterozygote-excess in progeny. Genetics144:383–387.Raj A., Stephens M., and Pritchard J.K. 2014. faststructure: variational inference ofpopulation structure in large snp data sets. Genetics 197:573–589.Renaut S., Grassa C., Yeaman S., Moyers B., Lai Z., Kane N., Bowers J., Burke J., andRieseberg L. 2013. Genomic islands of divergence are not affected by geography ofspeciation in sunflowers. Nature Communications 4:1827.Rieman B.E. 2001. Effective population size and genetic conservation criteria for Bulltrout. North American Journal of Fish Management 21:756–764.Ronce O. and Kirkpatrick M. 2001. When sources become sinks: Migrationalmeltdown in heterogeneous habitats. Evolution 55:1520–1531.Rosenberg N.A. 2004. Distruct: a program for the graphical display of populationstructure. Molecular Ecology Notes 4:137–138.Rosenberg N.A., Burke T., Elo K., Feldman M.W., Freidlin P.J., Groenen M.A., HillelJ., Mäki-Tanila A., Tixier-Boichard M., Vignal A., et al. 2001. Empirical evaluationof genetic clustering methods using multilocus genotypes from 20 chicken breeds.Genetics 159:699–713.Rosenblum E.B., Hoekstra H.E., and Nachman M.W. 2004. Adaptive reptile colorvariation and the evolution of the mcir gene. Evolution 58:1794–1808.Ryman N., Allendorf F.W., Jorde P.E., Laikre L., and Hössjer O. 2013. Samples fromsubdivided populations yield biased estimates of effective size that overestimatethe rate of loss of genetic variation. Molecular Ecology Resources 14:1–48.Sakai A.K., Allendorf F.W., Holt J.S., Lodge D.M., Molofsky J., With K.A., BaughmanS., Cabin R.J., Cohen J.E., Ellstrand N.C., et al. 2001. The population biology ofinvasive specie. Annual Review of Ecology and Systematics pages 305–332.Savolainen O., Lascoux M., and Merilä J. 2013. Ecological genomics of local adaptation.Nature Reviews Genetics 14:807–820.Schiffers K., Schurr F.M., Travis J.M., Duputié A., Eckhart V.M., Lavergne S., McInernyG., Moore K.A., Pearman P.B., Thuiller W., et al. 2014. Landscape structure andgenetic architecture jointly impact rates of niche evolution. Ecography 37:1218–1229.111bibliographySchoen D.J. 2005. Deleterious mutation in related species of the plant genus amsinckiawith contrasting mating systems. Evolution 59:2370–2377.Schwartz M.K., Luikart G., and Waples R.S. 2007. Genetic monitoring as a promisingtool for conservation and management. Trends in Ecology & Evolution 22:25–33.Scriver C.R. 2001. Human genetics: Lessons from quebec populations 1. Annualreview of genomics and human genetics 2:69–101.Serbezov D., Jorde P.E., Bernatchez L., Olsen E.M., and Vollestad L.A. 2012.Short-term genetic changes: Evaluating effective population size estimates ina comprehensively described brown trout (Salmo trutta) population. Genetics191:579–592.Sexton J.P., McIntyre P.J., Angert A.L., and Rice K.J. 2009. Evolution and ecology ofspecies range limits. Ecology, Evolution, and Systematics 40:415–436.Shaffer M.L. 1981. Minimum population sizes for species conservation. BioScience31:131–134.Shaw R.G., Byers D.L., and Darmo E. 2000. Spontaneous mutational effects onreproductive traits of arabidopsis thaliana. Genetics 155:369–378.Simons Y.B., Turchin M.C., Pritchard J.K., and Sella G. 2014. The deleterious mutationload is insensitive to recent population history. Nature Genetics 46:220–224.Skalski G.T. and Gilliam J.F. 2000. Modeling diffusive spread in a heterogeneouspopulation: a movement study with stream fish. Ecology 81:1685–1700.Slatkin M. 1985. Gene flow in natural populations. Annual review of ecology andsystematics pages 393–430.Slatkin M. 1987. Gene flow and the geographic structure of natural populations.Science 236:787–792.Soulé M.E. 1987. Viable populations for conservation. Cambridge university press.Stapley J., Reger J., Feulner P.G., Smadja C., Galindo J., Ekblom R., Bennison C., BallA.D., Beckerman A.P., and Slate J. 2010. Adaptation genomics: the next generation.Trends in ecology & evolution 25:705–712.Stevens G.C. 1992. The elevational gradient in altitudinal range: an extension ofrapoport’s latitudinal rule to altitude. American naturalist pages 893–911.Svenning J.C., Gravel D., Holt R.D., Schurr F.M., Thuiller W., Münkemüller T.,Schiffers K.H., Dullinger S., Edwards T.C., Hickler T., et al. 2014. The influence ofinterspecific interactions on species range expansion rates. Ecography 37:1198–1209.Tallmon D.A., Koyuk A., Luikart G., and Beaumont M.A. 2008. ONeSAMP: a programto estimate effective population size using approximate Bayesian computation.Molecular Ecology Resources 8:299–301.Tallmon D.A., Luikart G., and Beaumont M.A. 2004. Comparative Evaluation of a NewEffective Population Size Estimator Based on Approximate Bayesian Computation.Genetics 167:977–988.112bibliographyTaylor C.M. and Hastings A. 2005. Allee effects in biological invasions. Ecology Letters8:895–908.Theunert C., Tang K., Lachmann M., Hu S., and Stoneking M. 2012. Inferring thehistory of population size change from genome-wide SNP data. Molecular Biologyand Evolution 29:3653–3667.Tigano A. and Friesen V.L. 2016. Genomics of local adaptation with gene flow.Molecular ecology .Turelli M. 1984. Heritable genetic variation via mutation-selection balance: Lerch’szeta meets the abdominal bristle. Theoretical population biology 25:138–193.Via S. 2012. Divergence hitchhiking and the spread of genomic isolation duringecological speciation-with-gene-flow. Philosophical Transactions of the RoyalSociety of London B: Biological Sciences 367:451–460.Vines T.H., Albert A.Y., Andrew R.L., Débarre F., Bock D.G., Franklin M.T., GilbertK.J., Moore J.S., Renaut S., and Rennison D.J. 2014. The availability of research datadeclines rapidly with article age. Current Biology 24:94–97.Vines T.H., Andrew R.L., Bock D.G., Franklin M.T., Gilbert K.J., Kane N.C., Moore J.S.,Moyers B.T., Renaut S., Rennison D.J., et al. 2013. Mandated data archiving greatlyimproves access to research data. The FASEB journal 27:1304–1308.Vitalis R. and Couvet D. 2001a. ESTIM 1.0: a computer program to infer populationparameters from one- and two-locus gene identity probabilities. Molecular EcologyNotes 1:354–356.Vitalis R. and Couvet D. 2001b. Estimation of effective population size and migrationrate from one- and two-locus identity measures. Genetics 157:911–925.Vitalis R. and Couvet D. 2001c. Two-locus identity probabilities and identitydisequilibrium in a partially selfing subdivided population. Genetical Research77:67–81.Wang J. 2005. Estimation of effective population sizes from data on genetic markers.Philosophical Transactions of the Royal Society B: Biological Sciences 360:1395–1409.Wang J. 2009. A new method for estimating effective population sizes from a singlesample of multilocus genotypes. Molecular Ecology 18:2148–2164.Wang J. 2016. A comparison of single-sample estimators of effective population sizesfrom genetic marker data. Molecular Ecology pages n/a–n/a.Wang J. and Whitlock M.C. 2003. Estimating effective population size and migrationrates from genetic samples over space and time. Genetics 163:429–446.Waples R.S. and Do C. 2008. LDNE: a program for estimating effective populationsize from data on linkage disequilibrium. Molecular Ecology Resources 8:753–756.Waples R.S. and England P.R. 2011. Estimating contemporary effective population sizeon the basis of linkage disequilibrium in the face of migration. Genetics 189:633–644.113bibliographyWaples R.S. and Faulkner J.S. 2009. Modelling evolutionary processes in smallpopulations: not as ideal as you think. Molecular Ecology 18:1834–1847.Waples R.S. and Gaggiotti O. 2006. What is a population? an empirical evaluation ofsome genetic methods for identifying the number of gene pools and their degree ofconnectivity. Molecular ecology 15:1419–1439.Whitlock M.C. 1992. Temporal fluctuations in demographic parameters and thegenetic variance among populations. Evolution 46:608–615.Whitlock M.C. 2000. Fixation of new alleles and the extinction of small populations:drift load, beneficial alleles, and sexual selection. Evolution 54:1855–1861.Whitlock M.C. and Lotterhos K.E. 2015. Reliable detection of loci responsible for localadaptation: Inference of a null model through trimming the distribution of fst. TheAmerican naturalist 186:S24–36.Whitlock M.C. and McCauley D.E. 1999. Indirect measures of gene flow andmigration: Fst 6= 1/(4nm+ 1). Heredity 82:117–125.Whitlock M.C., McPeek M.A., Rausher M.D., Rieseberg L., and Moore A.J. 2010. Dataarchiving. The American Naturalist 175:145–146.Wilson R.J., Gutiérrez D., Gutiérrez J., Martínez D., Agudo R., and Monserrat V.J.2005. Changes to the elevational limits and extent of species ranges associated withclimate change. Ecology Letters 8:1138–1146.Wolkovich E.M., Regetz J., and O’Connor M.I. 2012. Advances in global changeresearch require open science by individual researchers. Global Change Biology18:2102–2110.Wright S. 1930. The genetical theory of natural selection a review. Journal of Heredity21:349–356.Wright S. 1931. Evolution in mendelian populations. Genetics 16:97–159.Wright S. 1943. Isolation by distance. Genetics 28:114–138.Wright S. 1946. Isolation by distance under diverse systems of mating. Genetics 31:39–59.Yeaman S. 2015. Local adaptation by alleles of small effect*. The American Naturalist186:S74–S89.Yeaman S. and Whitlock M.C. 2011. The genetic architecture of adaptation undermigration–selection balance. Evolution 65:1897–1911.Yotova V., Labuda D., Zietkiewicz E., Gehl D., Lovell A., Lefebvre J.F., Bourgeois S.,Lemieux-Blanchard É., Labuda M., Vézina H., et al. 2005. Anatomy of a foundereffect: myotonic dystrophy in northeastern quebec. Human genetics 117:177–187.114appendix aSupplementary Information to Chapter 2a .1 supplementary tablesTable A.1 : Isolated, no migration scenario RMSEsMethod Ne = 50 Ne = 500 Ne = 5000Estim 0.00921 0.00082 0.00026Colony2 Heterozygote Excess 0.00843 0.00625 0.00010Sibship likelihood 0.01000 0.00100 0.00010ONeSamp 0.00486 0.00103 -CoNe 0.00355 0.00191 0.00299MLNe Likelihood 0.00341 0.00141 0.00226Moment 0.00349 0.00202 0.00339TMVP 0.00349 0.00176 0.00306NeEstimator LDNe 0.00163 0.00094 0.00159Heterozygote excess 0.00859 0.00732 0.00679Coancestry 0.03151 0.04483 0.07211Pollak 0.00355 0.00252 0.00389Nei/Tajima 0.00352 0.00252 0.00385Jorde/Ryman 0.00360 0.00268 0.00418Root mean square error (RMSE) for scenarios with no migration at Ne = 50, 500, and 5000 respectively.Dashes indicate cases where Ne was not estimated (see text).115appendixaTable A.2 : Island model migration scenario RMSEs at Ne = 50Metapopulation m = 0.01 Metapopulation m = 0.1Method 0.01 Sink 0.1 Sink 0.25 Sink Within Whole 0.01 Sink 0.1 Sink 0.25 Sink Within WholeEstim 0.00612 0.0075 0.0073 0.00524 0.1066 0.00441 0.00279 0.00299 0.00179 0.00111Colony2 HeterozygoteExcess0.00795 0.00846 0.00835 0.00801 0.00061 0.00887 0.00868 0.00918 0.00764 0.00087Sibship like-lihood0.01 0.01 0.01 0.01 0.00061 0.01 0.01 0.01 0.01 0.00016ONeSamp 0.00795 0.00768 0.00783 0.00804 0.00062 0.00991 0.01093 0.01506 0.033 8.69568CoNe 0.00758 0.00786 0.00855 0.00867 0.00061 0.00284 0.00184 0.00289 0.00143 0.00087MLNe Likelihood 0.00333 0.00302 0.00261 0.00373 0.00011 0.00314 0.00393 0.00358 0.00438 0.00031Moment 0.00202 0.0023 0.0034 0.00188 0.00058 0.00163 0.00215 0.00319 0.00176 0.00024Likelihoodw/ mig.source0.00154 0.00121 0.00117 0.00171 0.00124 0.00096 0.0008 0.0009Moment w/mig. source0.00247 0.00229 0.00225 0.0024 0.00198 0.0016 0.00156 0.0017TMVP 0.00265 0.00226 0.00349 0.00228 0.00009 0.00247 0.00185 0.00252 0.00134 0.00031NeEstimator LDNe 0.00106 0.0009 0.00083 0.00093 0.00061 0.00108 0.00199 0.00147 0.0034 0.00087Heterozygoteexcess0.00798 0.00827 0.00839 0.00806 0.00992 0.00813 0.00836 0.00834 0.00902 0.01002Coancestry 0.07973 0.02717 0.03434 0.10064 0.00061 0.02825 0.01832 0.01265 0.02004 0.00135Pollak 0.00208 0.00242 0.00357 0.00232 0.66126 0.00212 0.00227 0.00344 0.00275 0.34384Nei/Tajima 0.00207 0.00238 0.00336 0.00223 0.18039 0.00206 0.00218 0.00326 0.00269 0.04148Jorde/Ryman 0.00252 0.00279 0.00359 0.00248 0.63027 0.00222 0.00246 0.00347 0.00308 0.15949Root mean square error (RMSE) for island model migration scenarios at Ne = 50. Dashes indicate cases where it was not applicable to estimate Ne.116appendixaTable A.3 : Island model migration scenario RMSEs at Ne = 500Metapopulation m = 0.01 Metapopulation m = 0.1Method 0.01 Sink 0.1 Sink 0.25 Sink Within Whole 0.01 Sink 0.1 Sink 0.25 Sink Within WholeEstim 0.00029 0.00034 0.00036 0.00025 0.00082 0.00013 0.0001 0.00015 0.00009 0.00001Colony2 HeterozygoteExcess0.00117 0.00106 0.00088 0.00095 0.00009 0.00189 - - - 0.00062Sibship like-lihood0.001 0.001 0.001 0.00099 0.00003 0.00024 - - - 0.00009CoNe 0.00099 0.001 0.001 0.001 0.00009 0.001 0.001 0.001 0.001 0.00009MLNe Likelihood 0.00049 0.00047 0.00041 0.00057 0.00003 0.0005 0.00059 0.00056 0.00064 0.00004Moment 0.00021 0.00019 0.00022 0.00023 0.00007 0.00024 0.00017 0.00012 0.00021 0.00005Like. w/mig. source0.00017 0.00057 0.00077 0.00021 0.00055 0.00065 0.00077 0.00061Moment w/mig. source0.0003 0.00059 0.00084 0.00032 0.00029 0.00059 0.0009 0.0005TMVP 0.00034 0.00033 0.00026 0.00039 0.00042 0.0004 0.00037 0.00031 - 0.00077NeEstimator LDNe 0.00013 0.0001 0.0001 0.00015 0.00009 0.00092 0.00753 0.00562 0.01815 0.00009Heterozygoteexcess0.00251 0.00318 0.00293 0.0028 0.00382 0.00522 0.02224 0.02591 0.02522 0.00078Coancestry 0.00429 0.00278 0.00239 0.0047 0.00009 0.00231 0.00168 0.00142 0.00174 0.00009Pollak 0.00038 0.00063 0.00061 0.00053 0.33423 0.00106 0.0062 0.00317 0.01064 0.30365Nei/Tajima 0.00038 0.00063 0.0006 0.00053 0.01765 0.00111 0.00616 0.00315 0.01042 0.00642Jorde/Ryman 0.00047 0.00072 0.00072 0.00066 0.09247 0.00109 0.00617 0.00317 0.01043 0.03022Root mean square error (RMSE) for island model migration scenarios at Ne = 500. Empty cells indicate cases where it was not applicable to estimateNe while dashes indicate cases where Ne was not estimated due to computational limits (see text).117appendixaTable A.4 : Stepping stone model IBD scenario RMSEs at Ne = 50m = 0.01 m = 0.1 m = 0.25Method Edge Within Whole Edge Within Whole Edge Within WholeEstim 0.0072 0.00663 99999* 0.00333 0.00277 0.0028 0.00132 0.00086 0.0005Colony2 HeterozygoteExcess0.00822 0.00795 0.00044 0.00832 0.00847 0.00063 0.00833 0.00844 0.00065Sibship like-lihood0.01 0.01 0.00044 0.01 0.01 0.00009 0.00992 0.00977 0.0001ONeSamp 0.00598 0.00747 0.00557 0.07639 0.01063 0.02011 0.19858 1.16473 0.00494CoNe 0.00696 0.00746 0.00044 0.00189 0.00158 0.00635 0.00286 0.00207 0.00065MLNe Likelihood 0.00292 0.00338 0.00006 0.00324 0.00394 0.00013 0.00348 0.0041 0.00015Moment 0.00233 0.00233 0.00044 0.00177 0.00168 0.00027 0.00255 0.00219 0.0002Likelihoodw/ mig.source0.00161 0.0017 0.001 0.00091 0.0009 0.0008Moment w/mig. source0.00259 0.00261 0.00152 0.00145 0.00153 0.00119TMVP 0.00244 0.00259 0.00006 0.00194 0.00159 0.00069 0.0022 0.00152 0.00035NeEstimator LDNe 0.00098 0.00103 0.11018 0.00126 0.00195 0.02703 0.00427 0.0063 0.00065Heterozygoteexcess0.00818 0.00824 0.00044 0.00818 0.00822 0.00063 0.00814 0.00835 0.00732Coancestry 0.06372 0.0815 0.05497 0.01575 0.01777 0.0094 0.01156 0.01059 0.00317Pollak 0.00239 0.00259 0.00044 0.00216 0.00178 0.00177 0.00291 0.00259 0.00065Nei/Tajima 0.0023 0.00248 0. 00044 0.00207 0.00171 0.00179 0.00289 0.00266 0.00065Jorde/Ryman 0.00255 0.0027 0. 00044 0.00233 0.00175 0.002 0.0034 0.00336 0.00065Root mean square error (RMSE) for stepping stone model IBD scenarios at Ne = 50. Empty cells indicate cases where it was not applicable toestimate Ne while dashes indicate cases where Ne was not estimated due to computational limits (see text). *Estim estimated Ne as approachingzero, and in this case, the authors recommend trying another estimation method.118appendixaTable A.5 : Stepping stone model IBD scenario RMSEs at Ne = 500m = 0.01 m = 0.1 m = 0.25Method Edge Within Whole Edge Within Whole Edge Within WholeEstim 0.00397 0.00428 0.00281 0.00941 0.00991 0.00009 0.01236 0.01286 0.00011Colony2 HeterozygoteExcess0.00263 0.00282 0.00063 0.00809 - - - - -Sibship like-lihood0.001 0.001 0.00008 0.0031 - - - - -CoNe 0.01086 0.01069 0.00063 0.01164 0.01147 0.00066 0.01209 0.01163 0.00067MLNe Likelihood 0.00733 0.00696 0.00034 0.006 0.00579 0.00031 0.00528 0.00499 0.0003Moment 0.00914 0.00893 0.00031 0.00921 0.00915 0.00021 0.0104 0.01026 0.00021Likelihoodw/ mig.source0.00789 0.00776 0.00774 0.00772 0.00794 0.00803Moment w/mig. source0.00919 0.00898 0.0086 0.00873 0.00807 0.00863TMVP 0.01046 0.01075 - 0.01185 0.01175 - - 0.01165 -NeEstimator LDNe 0.00745 0.00755 0.01549 0.01249 0.01485 0.00066 0.03084 0.04546 0.00067Heterozygoteexcess0.00381 0.00384 0.00063 0.00611 0.00794 0.00066 0.01394 0.01324 0.00067Coancestry 0.01185 0.01257 0.00962 0.01075 0.0105 0.00063 0.01006 0.01014 0.00047Pollak 0.00985 0.00964 0.00165 0.01446 0.01485 * 0.02041 0.01826 *Nei/Tajima 0.00985 0.00968 0.00173 0.01557 0.01566 * 0.02046 0.01813 *Jorde/Ryman 0.01093 0.01078 0.00173 0.01765 0.01713 * 0.02105 0.0188 *Root mean square error (RMSE) for stepping stone model IBD scenarios at Ne = 500. Empty cells indicate cases where it was not applicable toestimate Ne while dashes indicate cases where Ne was not estimated due to computational limits (see main text). Stars indicate cases where Neestimates of zero were produced and hence not counted as successful estimates (see main text).119appendixaTable A.6 : Isolated, no migration scenario coverage probability and infinite confidence intervalsIsolated Ne = 50 Isolated Ne = 500 Isolated Ne = 5000Method Within Inf Within Inf Within InfEstim 0 0.67 0.99 0.95 1 0.96Colony2 HeterozygoteExcess1 0.94 1 0.95 1 1Sibship like-lihood1 1 1 1 1 1CoNe 0.96 0 0 0 - -MLNe Likelihood 0.87 0.03 0.63 0.63 0.48 0.48Moment 0.82 0.01 0.89 0.39 0.16 0.16Likelihoodw/ mig.source0.87 0 0.84 0 0 0Moment w/mig. source0.8 0 0.68 0.06 0.24 0.24TMVP 0.81 0.95 0.98 0.98 1 1NeEstimator LDNe 0.51 0.15 0.39 0.34 0.24 0.24Heterozygoteexcess0.91 0 0.62 0.34 0.12 0.12Coancestry 0.91 0 0.66 0.34 0.12 0.12Pollak 0.91 0.07 0.91 0.72 0.52 0.52The coverage probability for the true value of Ne being contained within the 95% confidence intervals (“Within") and the percent of time the upper95% confidence interval spans to infinity (“Inf") across all methods for scenarios with no migration at Ne = 50, 500, and 5000 respectively. Dashesindicate cases where Ne was not estimated due to computational limits (see text).120appendixaTable A.7 : Island model scenarios’ coverage probability and infinite confidence intervalsEstim Colony2 ONeSamp CoNe MLNe TMVPHet. Excess Sibship Likelihood Likelihood Like. w/ mig.W/in Inf W/in Inf W/in Inf W/in Inf W/in Inf W/in Inf W/in Inf W/in InfNe = 50Metapop-ulationm = 0.01Sink 0.01 0 0 0.94 0.85 1 1 0 0 0.84 0.54 0.33 0 0.74 0 0.63 0Sink 0.1 0 0 0.89 0.93 1 1 0.07 0 0.89 0.6 0.14 0 0.8 0 0.57 0Sink 0.25 0 0 0.89 0.91 1 1 0.08 0 0.78 0.69 0.19 0 0.71 0 0.33 0In Meta 0.05 0 0.86 0.84 1 1 0.02 0 0.94 0.74 0.15 0 0.49 0 0.64 0Whole 0 0 1 1 1 1 0.01 0 1 1 0.81 1 0 0Ne = 50Metapop-ulationm = 0.1Sink 0.01 0 0 0.21 0.26 0.29 0.29 0.02 0 0.56 0.03 0.21 0 0.5 0 0.4 0Sink 0.1 0 0 0.07 0.07 0.09 0.09 0 0 0.35 0 0 0 0.3 0 0.05 0Sink 0.25 0 0 0.04 0.08 0.09 0.09 0 0 0.08 0 0 0 0.34 0 0 0In Meta 0.02 0 0.04 0.04 0.09 0.09 0 0 0.42 0 0 0 0.27 0 0.02 0Whole 0 0 0.09 0.09 0.03 0 0 0 1 1 0.05 0.86 0.05 0Ne = 500Metapop-ulationm = 0.01Sink 0.01 0 0 0.97 0.92 1 1 - - 1 0.99 0.12 0.04 0.93 0 0.33 0Sink 0.1 0 0 0.96 0.86 0.99 0.99 - - 1 1 0.03 0 0.01 0 0.3 0Sink 0.25 0 0 0.98 0.95 0.99 0.99 - - 1 0.99 0.07 0 0 0 0.59 0In Meta 0 0 0.97 0.86 0.93 0.93 - - 1 1 0.02 0.05 0.79 0 0.13 0Whole 0 0 0.99 0.99 0.84 0 - - 1 1 1 1 0 0Ne = 500Metapop-ulationm = 0.1Sink 0.01 0.05 0 0 0 0 0 - - 1 1 0 0 0 0 0.01 0Sink 0.1 0.09 0 - - - - - - 1 1 0 0 0 0 0 0Sink 0.25 0.01 0 - - - - - - 1 1 0 0 0 0 0.02 0In Meta 0.04 0 - - - - - - 1 1 0 0 0 0 - -Whole 0.26 0 0.75 0.75 0.75 0.74 - - 1 1 0.91 1 0 0NeEstimatorLDNe Het. Excess Coancestry Pollak Nei & Tajima Jorde & RymanWithin Inf Within Inf Within Inf Within Inf Within Inf Within InfNe = 50Metapop-ulationm = 0.01Sink 0.01 0.44 0 0.63 0.84 0.04 0 0.86 0 0.84 0 0.85 0Sink 0.1 0.45 0 0.55 0.94 0.3 0.01 0.79 0 0.82 0 0.83 0Sink 0.25 0.4 0 0.5 0.9 0.18 0.01 0.54 0 0.6 0 0.8 0In Meta 0.52 0 0.65 0.88 0.02 0 0.81 0 0.81 0 0.78 0Whole 1 1 1 1 1 1 0 0 0.06 0.06 0 0Ne = 50Metapop-ulationm = 0.1Sink 0.01 0.16 0 0.52 0.9 0 0 0.71 0 0.7 0 0.72 0Sink 0.1 0.05 0 0.49 0.86 0 0 0.72 0 0.71 0 0.79 0Sink 0.25 0.15 0 0.47 0.86 0 0 0.59 0 0.64 0 0.76 0In Meta 0 0 0.32 0.92 0 0 0.71 0 0.69 0 0.73 0Whole 1 1 0.97 0.97 0.99 0.99 0.03 0.03 0.85 0.84 0.69 0.69Ne = 500Metapop-ulationm = 0.01Sink 0.01 0.63 0 0.96 0.94 0.25 0.05 0.97 0.25 0.97 0.29 0.99 0.56Sink 0.1 0.83 0 0.93 0.92 0.43 0.16 0.91 0.26 0.92 0.27 0.96 0.5Sink 0.25 0.76 0 0.99 0.96 0.45 0.15 0.89 0.12 0.9 0.15 0.95 0.43In Meta 0.53 0 0.95 0.95 0.19 0.07 0.92 0.31 0.95 0.33 0.96 0.53Whole 1 1 1 1 1 1 0 0 0.88 0.88 0.68 0.68Ne = 500Metapop-ulationm = 0.1Sink 0.01 0.78 0.16 0.98 0.98 0.19 0.03 0.98 0.88 0.98 0.9 0.99 0.98Sink 0.1 0.95 0.95 0.86 0.86 0.16 0 0.96 0.96 0.96 0.96 0.85 0.85Sink 0.25 0.99 1 0.76 0.76 0.22 0 0.97 0.97 0.97 0.97 0.88 0.88In Meta 0.98 0.98 0.76 0.76 0.09 0 0.99 0.99 0.99 0.99 0.9 0.9Whole 1 1 1 1 1 1 0.01 0.01 0.99 0.99 0.99 0.99The coverage probability for the true value of Ne being contained within the 95% confidence intervals (“Within") and the percent of time the upper95% confidence interval spans to infinity (“Inf") across all methods for island model scenarios. Empty cells indicate cases where it was notapplicable to estimate Ne while dashes indicate cases where Ne was not estimated due to computational limits (see text).121appendixaTable A.8 : Stepping stone model (IBD) scenarios’ coverage probability and infinite confidence intervalsEstim Colony2 ONeSamp CoNe MLNe TMVPHet. Excess Sibship Likelihood Likelihood Like. w/ mig.W/in Inf W/in Inf W/in Inf W/in Inf W/in Inf W/in Inf W/in Inf W/in InfNe =50 m =0.01Edge 0 0 1 0.87 1 1 0.23 0 0.87 0.45 0.45 0 0.83 0 0.7 0In Meta 0.01 0 0.8 0.73 0.85 0.85 0.04 0 0.85 0.52 0.31 0 0.71 0 0.69 0Whole 0 0 1 1 1 1 0 0 1 1 1 1 0 0Ne =50 m = 0.1Edge 0 0 0.56 0.72 1 1 0.15 0 0.37 0 0 0 0.38 0 0.2 0In Meta 0 0 0.44 0.62 0.85 0.85 0 0 0.49 0 0 0 0.39 0 0.15 0Whole 0 0 0.03 0.03 0.03 0 0.06 0 1 1 0.04 1 0 0Ne =50 m =0.25Edge 0.03 0 0.48 0.63 0.58 0.61 0 0 0.02 0 0 0 0.09 0 0 0In Meta 0.14 0 0.39 0.55 0.03 0.03 0 0 0.07 0 0 0 0.12 0 0 0Whole 0 0 0.02 0.02 0.02 0 0 0 1 1 0 1 0 0Ne =500 m =0.01Edge 0 0 0.86 0.79 1 1 - - 0 0 0 0 0 0 0 0In Meta 0 0 0.77 0.71 1 1 - - 0 0 0 0 0 0 0 0Whole 0 0 0.04 0.04 0.03 0 - - 1 1 0.01 0.04 - -Ne =500 m =0.1Edge 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0In Meta 0 0 - - - - - - 0 0 0 0 0 0 0 0Whole 0.07 0 - - - - - - 1 1 0 0 - -Ne =500 m =0.25Edge 0 0 - - - - - - 0 0 0 0 0 0 - -In Meta 0 0 - - - - - - 0 0 0 0 0 0 0 0Whole 0.02 0 - - - - - - 1 1 0 0 - -NeEstimatorLDNe Het. Excess Coancestry Pollak Nei & Tajima Jorde & RymanWithin Inf Within Inf Within Inf Within Inf Within Inf Within InfNe =50 m =0.01Edge 0.55 0 0.64 0.88 0.1 0.01 0.8 0 0.86 0 0.9 0In Meta 0.61 0 0.59 0.89 0.09 0.02 0.85 0 0.81 0 0.82 0Whole 0 0 1 1 0 0 1 1 1 1 1 1Ne =50 m = 0.1Edge 0.09 0 0.47 0.88 0 0 0.78 0 0.78 0 0.81 0In Meta 0.05 0 0.47 0.87 0 0 0.8 0 0.81 0 0.87 0Whole 0.05 0.05 1 1 0 0 0.95 0.93 0.96 0.93 0.99 0.99Ne =50 m =0.25Edge 0 0 0.61 0.87 0 0 0.71 0 0.73 0 0.79 0In Meta 0 0 0.58 0.86 0 0 0.85 0 0.84 0 0.88 0Whole 1 1 0.98 0.98 0 0 1 1 1 1 1 1Ne =500 m =0.01Edge 0 0 0.89 0.89 0.01 0 0 0 0 0 0 0In Meta 0 0 0.89 0.88 0 0 0 0 0 0 0 0Whole 0.03 0.01 1 1 0 0 0.94 0.91 0.94 0.9 1 0.99Ne =500 m =0.1Edge 0 0 0.97 0.98 0 0 0 0 0 0 0.01 0.01In Meta 0 0 0.93 0.92 0 0 0 0 0 0 0 0Whole 1 1 1 1 0.57 0.03 1 1 1 1 1 1Ne =500 m =0.25Edge 0.05 0.01 0.92 0.92 0 0 0.12 0.06 0.12 0.06 0.38 0.31In Meta 0.08 0.07 0.97 0.97 0 0 0.14 0.1 0.14 0.1 0.61 0.51Whole 1 1 1 1 0.76 0.06 1 1 1 1 1 1The coverage probability for the true value of Ne being contained within the 95% confidence intervals (“Within") and the percent of time the upper95% confidence interval spans to infinity (“Inf") across all methods for stepping stone model (IBD) scenarios. Empty cells indicate cases where it wasnot applicable to estimate Ne while dashes indicate cases where Ne was not estimated due to computational limits (see text).122appendix aa .2 supplementary figuresRelated	Source	Unrelated	Source	Metapopula1on	as	Source	F igure A.1 : Range of incorrect source populations provided in the island modelscenarios for estimates from mlne. Grayed out arrows indicate the true immigrationsource to the focal patch while dotted lines indicate the incorrectly assumed sourceof immigration. For the stepping stone IBD model, the related patch used was oneseparated from the focal patch by three or thirteen intervening patches, with no othercases tested.123appendix a00. migrationlllNe =    50 500 5000Proportion Infinite EstimatesNe = 5000. Within Whole Sink Within WholeIsland ModelNe = 50000. = 0.01 m = 0.1Ne = 5000. Whole Within Whole Within WholeStepping StoneNe = 50000. = 0.01 m = 0.1 m = 0.25lllEstimColony, HetExcessONeSampTMVPCoNeLDNeMLNe likelihoodMLNe momentMLNe lik. w/ mig. sourceMLNe mom. w/ mig. sourceHetExcessCoancestryPollakNei/TajimaJorde/RymanF igure A.2 : Proportion of estimates that reported infinite values of Ne. Theproportions are out of 100 replicates except in cases where methods were not run forthe full 100 replicates or the result was considered invalid as stated by the programoutput or described by the program author(s). Note from the main text that negativevalues produced from neestimator have been counted as infinites, as per therecommendation of the authors (Do et al., 2014), and that for mlne’s likelihoodestimate, an estimate at the upper bound of the prior Ne is also considered as infinitehere.124appendixaF igure A.3 : All replicate estimated Ne values for neestimator’s ldne method across all scenarios. True Ne is shownby the horizontal green lines, point estimates in red with their average indicate by the orange square, and 95% confidenceintervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The threeisolated, no migration cases are shown on the left, followed by the island model migration cases in the middle and thestepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row.125appendixaF igure A.4 : All replicate estimated Ne values for mlne likelihood method across all scenarios. True Ne is shown bythe horizontal green lines, point estimates in red with their average indicate by the orange square, and 95% confidenceintervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The threeisolated, no migration cases are shown on the left, followed by the island model migration cases in the middle and thestepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row.126appendixaF igure A.5 : All replicate estimated Ne values for mlne moment method across all scenarios. True Ne is shown by thehorizontal green lines and point estimates in red with their average indicate by the orange square. Note that the momentestimate does not produce confidence intervals. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in thecenter row. The three isolated, no migration cases are shown on the left, followed by the island model migration cases inthe middle and the stepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row.127appendixaF igure A.6 : All replicate estimated Ne values for estim across all scenarios. True Ne is shown by the horizontal greenlines, point estimates in red with their average indicate by the orange square, and 95% confidence intervals by gray verticallines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The three isolated, no migrationcases are shown on the left, followed by the island model migration cases in the middle and the stepping stone IBD caseson the right, with Ne = 50 along the top row and 500 along the bottom row.128appendixaF igure A.7 : All replicate estimated Ne values for colony2’s heterozygote excess method across all scenarios. True Neis shown by the horizontal green lines, point estimates in red with their average indicate by the orange square, and 95%confidence intervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row.The three isolated, no migration cases are shown on the left, followed by the island model migration cases in the middleand the stepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row. Missing dataindicates replicates that were not run due to computational limits (see text).129appendixaF igure A.8 : All replicate estimated Ne values for colony2’s sibship likelihood method across all scenarios. True Neis shown by the horizontal green lines, point estimates in red with their average indicate by the orange square, and 95%confidence intervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row.The three isolated, no migration cases are shown on the left, followed by the island model migration cases in the middleand the stepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row. Missing dataindicates replicates that were not run due to computational limits (see text).130appendixaF igure A.9 : All replicate estimated Ne values for onesamp across all scenarios. True Ne is shown by the horizontalgreen lines, point estimates in red with their average indicate by the orange square, and 95% confidence intervals by grayvertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The three isolated, nomigration cases are shown on the left, followed by the island model migration cases in the middle and the stepping stoneIBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row. Missing data (i.e. all 500 and 5,000patch sizes except a few isolated 500) indicates replicates that were not run due to computational limits (see text).131appendixaF igure A.10 : All replicate estimated Ne values for cone across all scenarios. True Ne is shown by the horizontal greenlines, point estimates in red with their average indicate by the orange square, and 95% confidence intervals by gray verticallines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The three isolated, no migrationcases are shown on the left, followed by the island model migration cases in the middle and the stepping stone IBD caseson the right, with Ne = 50 along the top row and 500 along the bottom row. Estimates below zero are those for which theprogram produced values of -999.99, indicating an Ne outside of the specified prior (see text).132appendixaF igure A.11 : All replicate estimated Ne values for tmvp across all scenarios. True Ne is shown by the horizontal greenlines, point estimates in red with their average indicate by the orange square, and 95% confidence intervals by gray verticallines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The three isolated, no migrationcases are shown on the left, followed by the island model migration cases in the middle and the stepping stone IBD caseson the right, with Ne = 50 along the top row and 500 along the bottom row. Missing data indicates replicates that were notrun due to lack of convergence for reasonable parameter values (see text).133appendixaF igure A.12 : All replicate estimated Ne values for neestimator’s heterozygote excess method across all scenarios.True Ne is shown by the horizontal green lines, point estimates in red with their average indicate by the orange square,and 95% confidence intervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in thecenter row. The three isolated, no migration cases are shown on the left, followed by the island model migration cases inthe middle and the stepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row.134appendixaF igure A.13 : All replicate estimated Ne values for neestimator’s coancestry method across all scenarios. True Neis shown by the horizontal green lines, point estimates in red with their average indicate by the orange square, and 95%confidence intervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row.The three isolated, no migration cases are shown on the left, followed by the island model migration cases in the middleand the stepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row.135appendixaF igure A.14 : All replicate estimated Ne values for neestimator’s Pollak method across all scenarios. True Ne is shownby the horizontal green lines, point estimates in red with their average indicate by the orange square, and 95% confidenceintervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The threeisolated, no migration cases are shown on the left, followed by the island model migration cases in the middle and thestepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row. Negative estimatesare shown, but in analyses were changed to infinite Ne as indicated by the method’s authors (see text).136appendixaF igure A.15 : All replicate estimated Ne values for neestimator’s Nei and Tajima method across all scenarios. TrueNe is shown by the horizontal green lines, point estimates in red with their average indicate by the orange square, and 95%confidence intervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row.The three isolated, no migration cases are shown on the left, followed by the island model migration cases in the middleand the stepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row. Negativeestimates are shown, but in analyses were changed to infinite Ne as indicated by the method’s authors (see text).137appendixaF igure A.16 : All replicate estimated Ne values for neestimator’s Jorde and Ryman method across all scenarios. TrueNe is shown by the horizontal green lines, point estimates in red with their average indicate by the orange square, and 95%confidence intervals by gray vertical lines. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row.The three isolated, no migration cases are shown on the left, followed by the island model migration cases in the middleand the stepping stone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row. Negativeestimates are shown, but in analyses were changed to infinite Ne as indicated by the method’s authors (see text).138appendixaF igure A.17 : All replicate estimated Ne values for the average of mlne’s likelihood and moment methods across allscenarios. True Ne is shown by the horizontal green lines, point estimates in red with their average indicate by the orangesquare. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The three isolated, no migrationcases are shown on the left, followed by the island model migration cases in the middle and the stepping stone IBD caseson the right, with Ne = 50 along the top row and 500 along the bottom row. No confidence intervals are estimated.139appendixaF igure A.18 : All replicate estimated Ne values for the average of neestimator’s ldne at both temporal samplingpoints across all scenarios. True Ne is shown by the horizontal green lines, point estimates in red with their average indicateby the orange square. y-axis is 12Ne and scenarios are shown across the x-axis, labeled in the center row. The three isolated,no migration cases are shown on the left, followed by the island model migration cases in the middle and the steppingstone IBD cases on the right, with Ne = 50 along the top row and 500 along the bottom row. No confidence intervals areestimated.140appendixaF igure A.19 : All replicate estimated Ne values for the average of mlne’s likelihood and moment methods withneestimator’s ldne estimates at both temporal samples across all scenarios. True Ne is shown by the horizontalgreen lines, point estimates in red with their average indicate by the orange square. y-axis is 12Ne and scenarios are shownacross the x-axis, labeled in the center row. The three isolated, no migration cases are shown on the left, followed by theisland model migration cases in the middle and the stepping stone IBD cases on the right, with Ne = 50 along the top rowand 500 along the bottom row. No confidence intervals are estimated.141appendixaF igure A.20 : All replicate estimated Ne values for the average of mlne’s likelihood and moment methods with tmvpacross all scenarios. True Ne is shown by the horizontal green lines, point estimates in red with their average indicate bythe orange square. Scenarios are shown across the x-axis, labeled in the center row. The three isolated, no migration casesare shown on the left, followed by the island model migration cases in the middle and the stepping stone IBD cases onthe right, with Ne = 50 along the top row and 500 along the bottom row. No confidence intervals are estimated. Missingvalues are due to lack of tmvp estimates.142appendixaF igure A.21 : All replicate estimated Ne values for the average of mlne’s likelihood and moment methods withneestimator’s ldne estimates at both temporal samples and tmvp across all scenarios. True Ne is shown by thehorizontal green lines, point estimates in red with their average indicate by the orange square. Scenarios are shown acrossthe x-axis, labeled in the center row. The three isolated, no migration cases are shown on the left, followed by the islandmodel migration cases in the middle and the stepping stone IBD cases on the right, with Ne = 50 along the top row and500 along the bottom row. No confidence intervals are estimated. Missing values are due to lack of tmvp estimates.143appendix a0.0000.0050.0100.015Demographic ScenarioRMSE( 1/2Ne )lllLDNeMLNe LikelihoodMLNe MomentSample 250Sample 50Isolated Isolated Migration 50 Migration 50050 500F igure A.22 : RMSE compared across our standard sample size of 250 (filledpoints) versus a smaller sample size of 50 individuals (open points). ldne (blue) andmlne likelihood (red) and moment (dark red) methods are compared across isolatedpopulations at Ne = 50 and 500, and three migration scenarios each at Ne = 50 and500. The three migration scenarios are shown in order from left to right for eachmethod: the island model demography within metapopulation sampling scenario atm = 0.01, sink receiving immigrants at m = 0.1 from metapopulation at migrationrate m = 0.1, and within metapopulation m = 0.1, respectively. One result forldne at Ne = 500 showed a reduced RMSE for a population sampled within themetapopulation due to fewer infinite Ne estimates and increased precision around amore biased Ne estimate that nevertheless improved RMSE. Each point represents theaverage performance for a method over 100 replicate simulations.144appendix aF igure A.23 : RMSE of Ne across migration scenarios for mlne’s likelihood (solidlines) and moment (dashed lines) methods when identifying a source population thatis the correct source, a related source, an unrelated source, or (see figure S1) themetapopulation as the source. Ne = 50 is shown across the top row and Ne = 500across the bottom row. Island model scenarios are in the left panels and steppingstone (IBD) scenarios are in the right panels. Note that a migration source cannot beprovided when the whole metapopulation is sampled as one, so these cases are onlysingle populations experiencing migration either as a sink from the metapopulationor a population within the island model or stepping stone metapopulations.145appendix aF igure A.24 : Comparing migration rates for ongoing versus historic migrationscenarios estimated from mlne’s likelihood and moment methods. Green horizontallines show the true migration rate, where the left half of each panel is the case wheremigration rate is constant and the right half of each panel shows the same populationsexcept that migration stopped over the span of the temporal sampling. Therefore thetrue migration rate in the right half of each panel should reflect zero (green line)and the historic migration rate is shown by the horizontal gray line. Cases shownare sink populations and populations within the island model metapopulation wheremigration rates between the focal and source population are labeled in the center rowand correspond to the panels both above and below each label. Population size of 50is shown in the top row while 500 is shown in the bottom row.146appendix bSupplementary Information to Chapter 3b .1 supplementary results and analysesTablesTable B.1 : Fitness and expansion speed results per scenario parameter combina-tions.Environ- Genomic Dispersal Mean Total w¯ Quantitative DeleteriousmentalGradientDeleteriousMutationKernel ExpansionSpeedTrait w¯z Loci w¯D(b) Rate (UD) Core/Edge Core/Edge Core/Edge0.0 0.0 Gaussian 3.741 +/- 0.005 0.932/0.932 0.932/0.932 1/10.0375 0.0 Gaussian 2.314 +/- 0.011 0.929/0.399 0.929/0.399 1/10.375 0.0 Gaussian 0.756 +/- 0.003 0.870/0.199 0.870/0.199 1/10.0 0.1 Gaussian 3.559 +/- 0.008 0.863/0.748 0.933/0.933 0.925/0.8020.0375 0.1 Gaussian 2.283 +/- 0.011 0.859/0.390 0.929/0.461 0.924/0.8450.375 0.1 Gaussian 0.744 +/- 0.002 0.804/0.193 0.870/0.217 0.924/0.8910.0 1.0 Gaussian 2.285 +/- 0.014 0.496/0.306 0.933/0.945 0.532/0.3230.0375 1.0 Gaussian 1.991 +/- 0.011 0.494/0.268 0.929/0.694 0.532/0.3870.375 1.0 Gaussian 0.595 +/- 0.002 0.463/0.160 0.870/0.366 0.532/0.4370.0 0.0 Leptokurtic 5.689 +/- 0.010 0.933/0.929 0.933/0.929 1/10.0375 0.0 Leptokurtic 2.695 +/- 0.014 0.929/0.369 0.929/0.369 1/10.375 0.0 Leptokurtic 0.736 +/- 0.002 0.871/0.183 0.871/0.183 1/10.0 0.1 Leptokurtic 5.355 +/- 0.030 0.862/0.770 0.933/0.936 0.924/0.8230.0375 0.1 Leptokurtic 2.637 +/- 0.019 0.859/0.360 0.929/0.420 0.924/0.8580.375 0.1 Leptokurtic 0.721 +/- 0.002 0.803/0.181 0.870/0.206 0.923/0.8810.0 1.0 Leptokurtic 3.377 +/- 0.045 0.498/0.283 0.933/0.938 0.534/0.3020.0375 1.0 Leptokurtic 2.285 +/- 0.014 0.496/0.249 0.929/0.644 0.534/0.3880.375 1.0 Leptokurtic 0.577 +/- 0.004 0.463/0.162 0.870/0.367 0.532/0.442Expansion speeds are shown +/- 1 standard error and fitness measures are shown at the core and theedge.147appendix bFitness components impact expansion speed0.0 0.2 0.4 0.6 0.8 1.00123456  Total fitness at range edgeExpansion speed llll ll llll llll lll ll llllll ll llllllllUD = 0UD = 0.1UD = 1.0b  = 0b  = 0.0375b  = 0.375increasing bincreasing UDF igure B.1 : Range edge fitness predicts expansion speed, but more strongly incases with strong environmental gradients. The speed of expansion increases morerapidly as fitness increases with decreasing steepness of the environmental gradient(dark blue points and solid line, linear regression slope = 3.69, p < 0.0001). Speedof expansion increases, but less drastically, with higher fitness from decreasing UDand reducing expansion load (open circles and dashed line, linear regression slope= 1.95, p < 0.0001).148appendix bLeptokurtic dispersal kernel results0. FitnessUD = 0.0 UD = 0.1 UD = 1.0a)  Quantitative Trait0.0 0.03750.3750.0 0.03750.3750.0 0.03750.375Slope of environmental gradient (b )CoreEdge0. UD = 0.0 UD = 0.1 UD = 1.0b)  Deleterious Mutations0.0 0.03750.3750.0 0.03750.3750.0 0.03750.375Slope of environmental gradient (b )CoreEdgeF igure B.2 : Average core and edge fitness components for quantitative traitand deleterious alleles in the case of the leptokurtic dispersal kernel (long-distancedispersal). Error bars indicate 95% confidence intervals.149appendix bRecovery from expansion load0. since burn−in1000 2000 3000 4000 5000wDUD = 0.1, b  = 0UD = 0.1, b  = 0.0375UD = 0.1, b  = 0.375UD = 1.0, b  = 0UD = 1.0, b  = 0.0375UD = 1.0, b  = 0.375F igure B.3 : Recovery from expansion load measured as wD over time from initialcolonization to completion of simulations. Thin lines show all twenty replicatesimulations. Dashed lines are simulations with a kurtotic dispersal kernel (i.e.,increased long distance dispersal) while solid lines are those with a Gaussian dispersalkernel. The mean for each set of twenty replicates is indicated by the thick line.Fitness was measured through time at the cross section in the center of the landscape(x = 1000). Each line begins at the time point when individuals were first recordedin the central cross section, thus the starting point depends on expansion speed. Theend point in all cases is after the landscape filled, which took longest on the steepestenvironmental gradient (green).150appendix bFitness components through time across the landscapeF igure B.4 : Please refer to movie 1.Fitness across the landscape through time for each component of fitness, wD(deleterious mutations) and wz (quantitative trait) in the case of UD = 0.0. Timeis measured in generations relative to the end of the burn-in period of the simulation;i.e., t = 0 is the start of expansion. Each individual replicate (n = 20) is shownwith the average across replicates indicated by the darker, thick point size. Becausedifferent replicates have slightly different expansion speeds, the average at the rangeedge is noisy through space.F igure B.5 : Please refer to movie 2.Fitness across the landscape through time for each component of fitness, wD(deleterious mutations) and wz (quantitative trait) in the case of UD = 0.1. Timeis measured in generations relative to the end of the burn-in period of the simulation;i.e., t = 0 is the start of expansion. Each individual replicate (n = 20) is shownwith the average across replicates indicated by the darker, thick point size. Becausedifferent replicates have slightly different expansion speeds, the average at the rangeedge is noisy through space.F igure B.6 : Please refer to movie 3.Fitness across the landscape through time for each component of fitness, wD(deleterious mutations) and wz (quantitative trait) in the case of UD = 1.0. Timeis measured in generations relative to the end of the burn-in period of the simulation;i.e., t = 0 is the start of expansion. Each individual replicate (n = 20) is shownwith the average across replicates indicated by the darker, thick point size. Becausedifferent replicates have slightly different expansion speeds, the average at the rangeedge is noisy through space.151appendix bExpansion load due to fixed loci0. UD = 0.10.0 0.03750.375Slope of environmental gradient (b )Expansion loadTotalDue to fixed lociCoreEdge0. UD = 1.00.0 0.03750.375Slope of environmental gradient (b )TotalDue to fixed lociCoreEdgeF igure B.7 : The proportion of expansion load due to loci fixed only at the rangeedge. Load for these loci is measured at both the range edge and in a segment ofthe range core with approximately the same total population size. Because we onlymodeled 1,000 loci subject to unconditionally deleterious loci, fixation for small effectloci is inflated. Hence, in this analysis, we control for loci that are fixed in boththe range core and at the range edge. We remove those loci from this analysis andcalculate the load due to loci fixed only at the edge. Fixation is measured for thesame population of individuals defined to be at the edge as in our fitness calculations.The amount of load due to fixed loci was much higher at the range edge becausethe alleles are at higher frequency in the edge populations. The proportion of loaddue to these fixed loci increased with decreasing steepness of environmental gradient,and also increased with decreasing genome-wide deleterious mutation rate, UD. Bothof these results make sense in light of our study’s findings where faster expansioncontributes to increased expansion load. This shows that faster expansion also allowsfor increased fixation of deleterious alleles at the range edge.152appendix bAllele frequencies through time across the landscapeF igure B.8 : Please refer to movie 4.Frequency of deleterious alleles across the landscape throughout the course of asimulation. This simulation was selected as one where a large effect allele fixes on thelandscape (leptokurtic dispersal kernel, UD = 0.1, b = 0). Loci are colored by bins ofhomozygote effect size as indicated in the legend. Darker blues are larger effect loci.A single locus is colored in orange to represent one of the large effect loci (s = 0.0898)that fixed during the course of expansion. Time is measured in generations relative tothe end of the burn-in period of the simulation; i.e., t = 0 is the start of expansion. Astime proceeds, it can be seen that more loci are fixed at the range edge as a result ofallele surfing. Over time during expansion, recovery from fixation at the expandingfront occurs and larger effect loci are reduced in frequency.153appendix cSupplementary Information to Chapter 4Table C.1 : Per locus effect size differences across scenarios of genetic architectureregimes. The range from minimum to maximum is shown for each scenario, as wellas the mean.Genetic architecture ∆zopt Per locus difference: range (mean)µ = 10−2 5 −0.0674− 0.4603 (0.0190)α2 = 0.005 7.5 −0.0993− 0.4884 (0.0321)10 −0.3633− 0.5311 (0.0479)12.5 −0.4187− 0.5464 (0.0583)µ = 10−3 5 −0.0655− 0.2864 (0.0103)α2 = 0.05 7.5 −0.1233− 0.7258 (0.0277)10 −0.4088− 0.8786 (0.0447)12.5 −0.4308− 0.6513 (0.0596)µ = 10−4 5 −0.0144− 1.1386 (0.01799)α2 = 0.5 7.5 −0.2206− 2.6775 (0.0251)10 No expansion12.5 No expansionµ = 10−5 5 −0.0055− 2.6881 (0.0228)α2 = 5.0 7.5 −0.1967− 4.8746 (0.0190)10 −0.6236− 6.4207 (0.0522)12.5 No expansion154appendix cF igure C.1 : Population statistics for all genetic architecture regimes across thelandscape at the end of the simulation period for all replicates. Colors match thosein Figure 4.2. Mean fitness is shown in the first panel, followed by mean populationsize, and then genetic variance measured within each cross section. Simulations werehalted at generation 20,000 in all cases, hence slowly expanding replicates show asnot having colonized the full landscape. All populations survive in the landscapecore, and steeper gradients go extinct outside of this core. Within the core, eachgenetic architecture regime has its own equilibrium fitness and genetic variance, butonce expanding over the gradient, these values are determined by the steepness of b,not by the genetic architecture regime. The only exception to this is for a very weakgradient, b = 0.0075 (black), where the different genetic architecture regimes reflectdifferent population values across the landscape.155appendix cF igure C.2 : Genetic variance across parameter sets approaching respectiveequilibria during simulation burn-in.156appendix c0 5000 10000 15000 20000Time to Expansion10−210−310−410−5µ0.0050.050.55.0α2 ● ●∆zopt5 7.5 10 12.5 15F igure C.3 : Time spent (in generations) before expansion into second patch foreach genetic architecture regime with a larger dispersal kernel of σdisperse = 4. Pointsare jittered for visualization. All points beyond the vertical dashed line are censoredand did not expand during the course of the simulation.157appendix cF igure C.4 : Approximately constant genetic variance across parameter setsapproaching respective equilibria during simulation burn-in.158appendix c1 2 5 10 20 50 200 500020040060080010001200∆zopt = 5Mean population growth ratePatch Size5 2.5 1 0.5 0.25 0.09 0.03 0.01bl ll lllllllll l1 2 5 10 20 50 200 500020040060080010001200∆zopt = 10Patch Size10 5.0 2.0 1.0 0.50 0.17 0.07 0.02bl l l l l lllll llllllµ = 10−2;  α2 = 0.005  µ = 10−3;  α2 = 0.05µ = 10−4;  α2 = 0.5µ = 10−5;  α2 = 5.0F igure C.5 : Mean population growth rates (number of individuals per generation)across patchy landscapes for a large dispersal kernel of σdisperse = 4. ∆zopt is constantat either 5 or 10. The size of each patch directly correlates to the number of patcheson the landscape. The overall gradient increases with additional steps and can becalculated as b = ∆zopt(number o f patches− 1).159appendix dSupplementary Information to Chapter 5d .1 papers and data used in our study1. Baker A, Pearson T, Price EP, Dale J, Keim P, Hornstra H, Greenhill A, Padilla G,Warner J (2011) Molecular phylogeny of Burkholderia pseudomallei from a remoteregion of Papua New Guinea. PLoS One, 6(3), e18343.doi:10.1371/journal.pone.00183432. Beatty GE, Provan J (2011) Comparative phylogeography of two related plantspecies with overlapping ranges in Europe, and the potential effects of climatechange on their intraspecific genetic diversity. BMC Evolutionary Biology, 11:29,doi:10.1186/1471-2148-11-293. Bilgmann K, Moeller LM, Harcourt RG, Kemper CM, Beheregaray LB (2011)The use of carcasses for the analysis of cetacean population genetic structure: Acomparative study in two dolphin species. PLoS One, 6(5), e20103.doi:10.1371/journal.pone.00201034. Bol SM, Moerland PD, Limou S, van Remmerden Y, Coulonges C, van ManenD, Herbeck JT, Fellay J, Sieberer M, Sietzema JG, van’t Slot R, Martinson J,Zagury J-F, Schuitemaker H, van’t Wout AB (2011) Genome-wide associationstudy identifies single nucleotide polymorphism in DYRK1A associated withreplication of HIV-1 in monocyte-derived macrophages. PLoS One, 6(2), e17190.doi:10.1371/journal.pone.00171905. Bonilla C, Hooker S, Mason T, Bock CH, Kittles RA (2011) Prostate cancersusceptibility loci identified on chromosome 12 in African Americans. PLoSOne, 6(2), e16044. doi:10.1371/journal.pone.0016044160appendix d6. Caniato FF, Guimaraes CT, Hamblin M, Billot C, Rami J-F, Hufnagel B, KochianLV, Liu J, Garcia AAF, Hash CT, Ramu P, Mitchell S, Kresovich S, Oliveira AC,de Avellar G, Borem A, Glaszmann J-C, Schaffert RE, Magalhaes JV (2011) Therelationship between population structure and aluminum tolerance in cultivatedsorghum. PLoS One, 6(6), e20830. doi:10.1371/journal.pone.00208307. Colbeck GJ, Turgeon J, Sirois P, Dodson JJ (2011) Historical introgression and therole of selective vs. neutral processes in structuring nuclear genetic variation(AFLP) in a circumpolar marine fish, the capelin (Mallotus villosus). MolecularEcology, 20, 1976-1987.8. Colbeck GJ, Turgeon J, Sirois P, Dodson JJ (2011) Data from: Historical introgres-sion and the role of selective vs. neutral processes in structuring nuclear geneticvariation (AFLP) in a circumpolar marine fish, the capelin (Mallotus villosus).Dryad Digital Repository. doi:10.5061/dryad.86169. Combosch DJ, Vollmer SV (2011) Population genetics of an ecosystem-definingreef coral Pocillopora damicornis in the tropical eastern Pacific. PLoS One, 6(8),e21200. doi:10.1371/journal.pone.002120010. Cox K, Broeck AV, Calster HV, Mergeay J (2011) Temperature-related naturalselection in a wind-pollinated tree across regional and continental scales. Molec-ular Ecology, 20, 27224-2738.11. Cox K, Vanden Broeck A, Van Calster H, Mergeay J (2011) Data from: Temperature-related natural selection in a wind-pollinated tree across regional and continen-tal scales. Dryad Digital Repository. doi:10.5061/dryad.906712. Cullingham CI, Cooke JEK, Dang S, Davis CS, Cooke BJ, Coltman DW (2011)Mountain pine beetle host-range expansion threatens the boreal forest. MolecularEcology, 20, 2157-2171.13. Cullingham CI, Cooke JEK, Dang S, Davis CS, Cooke BJ, Coltman DW (2011)Data from: Mountain pine beetle host-range expansion threatens the borealforest. Dryad Digital Repository. doi:10.5061/dryad.8677161appendix d14. Didelot X, Bowden R, Street T, Golubchik T, Spencer C, McVean G, SangalV, Anjum MF, Achtman M, Falush D, Donnelly P (2011) Recombination andpopulation structure in Salmonella enterica. PLoS Genetics, 7(7), e1002191.doi:10.1371/journal.pgen.100219115. Erdner DL, Richlen M, McCauley LAR, Anderson DM (2011) Diversity and dy-namics of a widespread bloom of the toxic dinoflagellate Alexandrium fundyense.PLoS One, 6(7), e22965. doi:10.1371/journal.pone.002296516. Gomaa NH, Montesinos-Navarro A, Alonso-Blanco C, Picó FX (2011) Temporalvariation in genetic diversity and effective population size of Mediterranean andsubalpine Arabidopsis thaliana populations. Molecular Ecology, 20, 3540-3554.17. Gonthier P, Garbelotto M (2011) Amplified fragment length polymorphism andsequence analyses reveal massive gene introgression from the European fun-gal pathogen Heterobasidion annosum into its introduced congener H. irregulare.Molecular Ecology, 20, 2756-2770.18. Groot AT, Classen A, Inglis O, Blanco CA, López Jr J, Vargas AT, Schal C,Heckel DG, Schöfl G (2011) Genetic differentiation across North America inthe generalist moth Heliothis virescens and the specialist H. subflexa. MolecularEcology, 20, 2676-2692.19. Groot A, Classen A, Staudacher H, Schal C, Heckel D (2010) Data from: Phe-notypic plasticity in sexual communication signal of a noctuid moth. DryadDigital Repository. doi:10.5061/dryad.195620. Guillemaud T, Blin A, Simon S, Morel K, Franck P (2011) Weak spatial and tem-poral population genetic structure in the rosy apple aphid, Dysaphis plantaginea,in French apple orchards. PLoS One, 6(6),e21263. doi:10.1371/journal.pone.002126321. Guivier E, Galan M, Chaval Y, XuO˝reb A, Salvador AR, Poulle M-L, VoutilainenL, Henttonen H, Charbonnel N, Cosson JF (2011) Landscape genetics highlights162appendix dthe role of bank vole metapopulation dynamics in the epidemiology of Puumalahantavirus. Molecular Ecology, 20, 3569-3583.22. Guivier E, Galan M, Chaval Y, Xuéreb A, Ribas Salvador A, Poulle M, Vouti-lainen L, Henttonen H, Charbonnel N, Cosson JF (2011) Data from: Landscapegenetics highlights the role of bank vole metapopulation dynamics in the epi-demiology of Puumala hantavirus. Dryad Digital Repository.doi:10.5061/dryad.c6qj023. Gunn BF, Baudouin L, Olsen KM (2011) Independent origins of cultivated co-conut (Cocos nucifera L.) in the old world tropics. PLoS One, 6(6), e21143.doi:10.1371/journal.pone.002114324. Hao C, Wang L, Ge H, Dong Y, Zhang X (2011) Genetic diversity and linkagedisequilibrium in Chinese bread wheat (Triticum aestivum L.) revealed by SSRmarkers. PLoS One, 6(2), e17279. doi:10.1371/journal.pone.001727925. Hermansen JS, Sæther SA, Elgvin TO, Borge T, Hjelle E, Sætre G-P (2011) Hy-brid speciation in sparrows I: phenotypic intermediacy, genetic admixture andbarriers to gene flow. Molecular Ecology, 20, 3812-3822.26. Hermansen JS, Sæther SA, Borge T, Elgvin TO, Hjelle E, Sætre G (2011) Datafrom: Hybrid speciation in sparrows I: phenotypic intermediacy, genetic admix-ture and barriers to gene flow. Dryad Digital Repository. doi:10.5061/dryad.k7vh927. Hoffman JI, Grant SM, Forcada J, Phillips CD (2011) Bayesian inference of ahistorical bottleneck in a heavily exploited marine mammal. Molecular Ecology,20, 3989-4008.28. Hoffman JI, Grant SM, Forcada J, Phillips CD (2011) Data from: Bayesian infer-ence of a historical bottleneck in a heavily exploited marine mammal. DryadDigital Repository. doi:10.5061/dryad.0kj5n163appendix d29. Jacobs MMJ, Smulders MJM, van den Berg RG, Vosman B (2011) What’s ina name; Genetic structure in Solanum section Petota studied using population-genetic tools. BMC Evolutionary Biology, 11:42, doi:10.1186/1471-2148-11-4230. Kanno Y, Vokoun JC, Letcher BJ (2011) Fine-scale population structure andriverscape genetics of brook trout (Salvelinus fontinalis) distributed continuouslyalong headwater channel networks. Molecular Ecology, 20, 3711-3729.31. Kanno Y, Vokoun JC, Letcher BH (2011) Data from: Fine-scale population struc-ture and riverscape genetics of brook trout (Salvelinus fontinalis) distributedcontinuously along headwater channel networks. Dryad Digital Repository.doi:10.5061/dryad.5f8s232. Koblmüller S, Salzburger W, Obermüller B, Eigner E, Sturmbauer C, Sefc KM(2011) Separated by sand, fused by dropping water: habitat barriers and fluc-tuating water levels steer the evolution of rock-dwelling cichlid populations inLake Tanganyika. Molecular Ecology, 20, 2272-2290.33. Koblmüller S, Salzburger W, Obermüller B, Eigner E, Sturmbauer C, Sefc KM(2011) Data from: Separated by sand, fused by dropping water: habitat barriersand fluctuating water levels steer the evolution of rock-dwelling cichlid popula-tions in Lake Tanganyika. Dryad Digital Repository. doi:10.5061/dryad.883234. Komiya T, Fujita S, Watanabe K (2011) A novel resource polymorphism in fish,driven by differential bottom environments: An example from an ancient lakein Japan. PLoS One, 6(2), e17430. doi:10.1371/journal.pone.001743035. Kunte K, Shea C, Aardema ML, Scriber JM, Juenger TE, Gilbert LE, KronforstMR (2011) Sex chromosome mosaicism and hybrid speciation among tiger swal-lowtail butterflies. PLoS Genetics, 7(9), e1002274. doi:10.1371/journal.pgen.100227436. Langen K, Schwarzer J, Kullmann H, Bakker TCM, Thuenken T (2011) Mi-crosatellite support for active inbreeding in a cichlid fish. PLoS One, 6(9), e24689.doi:10.1371/journal.pone.0024689164appendix d37. Latch EK, Boarman WI, Walde A, Fleischer RC (2011) Fine-scale analysis revealscryptic landscape genetic structure in desert tortoises. PLoS One, 6(11), e27794.doi:10.1371/journal.pone.002779438. Lawton RJ, Messmer V, Pratchett MS, Bay LK (2011) High gene flow across largegeographic scales reduces extinction risk for a highly specialized coral feedingbutterflyfish. Molecular Ecology, 20, 3584-3598.39. Lawton RJ, Messmer V, Pratchett MS, Bay LK (2011) Data from: High gene flowacross large geographic scales reduces extinction risk for a highly specialisedcoral feeding butterflyfish. Dryad Digital Repository. doi:10.5061/dryad.4m80r40. Leache AD (2011) Multi-locus estimates of population structure and migrationin a fence lizard hybrid zone. PLoS One, 6(9), e25827.doi:10.1371/journal.pone.002582741. Lee C-R, Mitchell-Olds T (2011) Quantifying effects of environmental and ge-ographical factors on patterns of genetic differentiation. Molecular Ecology, 20,4631-4642.42. Lee C, Mitchell-Olds T (2011) Data from: Quantifying effects of environmentaland geographical factors on patterns of genetic differentiation. Dryad DigitalRepository. doi:10.5061/dryad.6rs5143. Lee S, Jia Y, Jia M, Gealy DR, Olsen KM, Caicedo AL (2011) Molecular evolutionof the rice blast resistance gene Pi-ta in invasive weedy rice in the USA. PLoSOne, 6(10), e26260. doi:10.1371/journal.pone.002626044. Leite TKM, Fonseca RMC, de Franca NM, Parra EJ, Pereira RW (2011) Genomicancestry, self-reported "color” and quantitative measures of skin pigmentationin Brazilian admixed siblings. PLoS One, 6(11), e27162.doi:10.1371/journal.pone.002716245. Litvintseva AP, Carbone I, Rossouw J, Thakur R, Govender NP, Mitchell TG(2011) Evidence that the human pathogenic fungus Cryptococcus neoformans var.165appendix dgrubii may have evolved in Africa. PLoS One, 6(5), e19688.doi:10.1371/journal.pone.001968846. Lombaert E, Guillemaud T, Thomas CE, Lawson Handley LJ, Li J, Wang S,Pang H, Goryacheva I, Zakharov IA, Jousselin E, Poland RL, Migeon A, vanLenteren J, de Clercq P, Berkvens N, Jones W, Estoup A (2011) Inferring theorigin of populations introduced from a genetically structured native rangeby approximate Bayesian computation: case study of the invasive ladybirdHarmonia axyridis. Molecular Ecology, 20, 4654-4670.47. Lombaert E, Guillemaud T, Thomas CE, Lawson Handley LJ, Li J, Wang S, PangH, Goryacheva I, Zakharov IA, Jousselin E, Poland RL, Migeon A, van LenterenJ, De Clercq P, Berkvens N, Jones W, Estoup A (2011) Data from: Inferring theorigin of populations introduced from a genetically structured native rangeby approximate Bayesian computation: case study of the invasive ladybirdHarmonia axyridis. Dryad Digital Repository. doi:10.5061/dryad.7m0b37bn48. Ma Y, Yang M, Fan Y, Wu J, Ma Y, Xu J (2011) Population structure of the malariavector Anopheles sinensis (Diptera: Culicidae) in China: Two gene pools inferredby microsatellites. PLoS One, 6(7), e22219. doi:10.1371/journal.pone.002221949. Magalhães IS, Gleiser G, Labouche A-M, Bernasconi G (2011) Comparative pop-ulation genetic structure in a plant-pollinator/seed predator system. MolecularEcology, 20, 4618-4630.50. Magalhães S, Blanchet E, Egas M, Olivieri I (2011) Data from: Environmentaleffects on the detection of adaptation. Dryad Digital Repository.doi:10.5061/dryad.1kv0451. Martin C, Herrero M, Hormaza JI (2011) Molecular characterization of apricotgermplasm from an old stone collection. PLoS One, 6(8), e23979.doi:10.1371/journal.pone.002397952. Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, BurnsE, Ostrer H, Price AL, Reich D (2011) The History of African gene flow into166appendix dsouthern Europeans, Levantines, and Jews. PLoS Genetics, 7(4), e1001373.doi:10.1371/journal.pgen.100137353. Muscarella RA, Murray KL, Ortt D, Russell AL, Fleming TH (2011) Exploringdemographic, physical, and historical explanations for the genetic structure oftwo lineages of greater Antillean bats. PLoS One, 6(3), e17704.doi:10.1371/journal.pone.001770454. Nance HA, Klimley P, Galvan-Magana F, Martinez-Ortiz J, Marko PB (2011)Demographic processes underlying subtle patterns of population structure inthe scalloped hammerhead shark, Sphyrna lewini. PLoS One, 6(7),e21459. doi:10.1371/journal.pone.002145955. Ng J, Glor RE (2011) Genetic differentiation among populations of a Hispaniolantrunk anole that exhibit geographical variation in dewlap colour. MolecularEcology, 20, 4302-4317.56. Ozer F, Gellerman H, Ashley MV (2011) Genetic impacts of Anacapa deer micereintroductions following rat eradication. Molecular Ecology, 20, 3525-3539.57. Ozer F, Gellerman H, Ashley MV (2011) Data from: Genetic impacts of Anacapadeer mice reintroductions following rat eradication. Dryad Digital Repository.doi:10.5061/dryad.778c958. Pino-Yanes M, Corrales A, Basaldua S, Hernandez A, Guerra L, Villar J, FloresC (2011) North African influences and potential bias in case-control associationstudies in the Spanish population. PLoS One, 6(3), e18389.doi:10.1371/journal.pone.001838959. Qu Y, Luo X, Zhang R, Song G, Zou F, Lei F (2011) Lineage diversification andhistorical demography of a montane bird Garrulax elliotii - implications for thePleistocene evolutionary history of the eastern Himalayas. BMC EvolutionaryBiology, 11:174, doi:10.1186/1471-2148-11-174167appendix d60. Ricca M, Szövényi P, Temsch EM, Johnson MG, Shaw AJ (2011) Interploid hy-bridization and mating patters in the Sphagnum subsecundum complex. MolecularEcology, 20, 3203-3218.61. Ricca M, Szövényi P, Johnson MG, Shaw AJ, Temsch EM (2011) Data from:Interploidal hybridization and mating patterns in the Sphagnum subsecundumcomplex. Dryad Digital Repository. doi:10.5061/dryad.6pn8462. RonnNˇs C, Cassel-Lundhagen A, Battisti A, Wallén J, Larsson S (2011) Limitedemigration from an outbreak of a forest pest insect. Molecular Ecology, 20, 4606-4617.63. RonnNˇs C, Cassel-Lundhagen A, Battisti A, Wallén J, Larsson S (2011) Datafrom: Limited emigration from an outbreak of a forest pest insect. Dryad DigitalRepository. doi:10.5061/dryad.gg61064. Rossetto M, Thurlby KAG, Offord CA, Allen CB, Weston PH (2011) The impactof distance and a shifting temperature gradient on genetic connectivity acrossa heterogeneous landscape. BMC Evolutionary Biology, 11:126, doi:10.1186/1471-2148-11-12665. Sanchez-Guillen RA, Wellenreuther M, Cordero-Rivera A, Hansson B (2011)Introgression and rapid species turnover in sympatric damselflies. BMC Evo-lutionary Biology, 11:210, doi:10.1186/1471-2148-11-21066. Sandrock C, Razmjou J, Vorburger C (2011) Climate effects on life cycle vari-ation and population genetic architecture of the black bean aphid, Aphis fabae.Molecular Ecology, 20, 4165-4181.67. Sandrock C, Razmjou J, Vorburger C (2011) Data from: Climate effects on lifecycle variation and population genetic architecture of the black bean aphid,Aphis fabae. Dryad Digital Repository. doi:10.5061/dryad.2d1b868. Shen K-N, Jamandre BW, Hsu C-C, Tzeng W-N, Durand J-D (2011) Plio-Pleistocenesea level and temperature fluctuations in the northwestern Pacific promoted168appendix dspeciation in the globally-distributed flathead mullet Mugil cephalus. BMC Evo-lutionary Biology, 11:83, doi:10.1186/1471-2148-11-8369. Takahashi MK, Horner LM, Kubota T, Keller NA, Abrahamson WG (2011) Exten-sive clonal spread and extreme longevity in saw palmetto, a foundation clonalplant. Molecular Ecology, 20, 3730-3742.70. Takahashi MK, Horner LM, Kubota T, Keller NA, Abrahamson WG (2011) Datafrom: Extensive clonal spread and longevity of saw palmetto (Serenoa repens)provide insight into management plans. Dryad Digital Repository.doi:10.5061/dryad.6th2471. Trognitz B, Scheldeman X, Hansel-Hohl K, Kuant A, Grebe H, Hermann M(2011) Genetic population structure of cacao plantings within a young produc-tion area in Nicaragua. PLoS One, 6(1), e16056. doi:10.1371/journal.pone.001605672. Verspoor RL, Haddrill PR (2011) Genetic diversity, population structure and Wol-bachia infection status in a worldwide sample of Drosophila melanogaster and D.simulans populations. PLoS One, 6(10), e26318. doi:10.1371/journal.pone.002631873. Via M, Gignoux CR, Roth LA, Fejerman L, Galanter J, Choudhry S, Toro-LabradorG, Viera-Vera J, Oleksyk TK, Beckman K, Ziv E, Risch N, Burchard EG, Martinez-Cruzado JC (2011) History shaped the geographic distribution of genomic ad-mixture on the island of Puerto Rico. PLoS One, 6(1), e16513.doi:10.1371/journal.pone.001651374. Vigouroux Y, Mariac C, De Mita S, Pham J-L Gerard B, Kapran I, Sagnard F, DeuM, Chantereau J, Ali A, Ndjeunga J, Luong V, Thuillet A-C, Saidou A-A, Bezan-con G (2011) Selection for earlier flowering crop associated with climatic varia-tions in the Sahel. PLoS One, 6(5), e19563. doi:10.1371/journal.pone.001956375. Wallace LE, Culley TM, Weller SG, Sakai AK, Kuenzi A, Roy T, Wagner WL,Nepokroeff M (2011) Asymmetrical gene flow in a hybrid Zone of HawaiianSchiedea (Caryophyllaceae) species with contrasting mating systems. PLoS One,6(9), e24845. doi:10.1371/journal.pone.0024845169appendix d76. Wellenreuther M, Sanchez-Guillen RA, Cordero-Rivera A, Svensson EI, Hans-son B (2011) Environmental and climatic determinants of molecular diversityand genetic population structure in a Coenagrionid damselfly. PLoS One, 6(6),e20440. doi:10.1371/journal.pone.002044077. Zaffarano PL, Queloz V, Duo A, Gruenig CR (2011) Sex in the PAC: A hidden af-fair in dark septate endophytes? BMC Evolutionary Biology, 11:282, doi:10.1186/1471-2148-11-28278. Zardi GI, Nicastro KR, Canovas F, Costa JF, Serrao EA, Pearson GA (2011)Adaptive traits are maintained on steep selective gradients despite gene flowand hybridization in the intertidal zone. PLoS One, 6(6), e19402.doi:10.1371/journal.pone.0019402170appendixdTable D.1 : Characteristics of structure runs in original studiesStudy MarkertypeNumberof lociNumberindi-vidualsNumber pre-determinedclustersBurn-inMCMCrep.sRep.s KtestedCorrelatedallelefreq.s?AncestryModelLoc. prior? K selectionmethod∆K or Like-lihood plotshown?Barplotshown?Chosen K1 AFLP 252 179 - 100000 400000 8 1-8 Yes Admix. No P & E Yes Yes 2243 179 - 100000 400000 8 1-8 Yes Admix. No P & E Yes Yes 32 SSR 309 9 11 200000 1000000 5 1-12 Yes Admix. Yes E Yes Yes 2309 9 11 200000 1000000 5 1-12 Yes Admix. Yes E Yes Yes 3309 9 11 200000 1000000 5 1-12 Yes Admix. Yes E Yes Yes 53 SSR 9 258 9 50000 500000 50 1-10 Yes Admix. N.S. N.S. No Yes 68 111 9 50000 500000 50 1-10 Yes Admix. N.S. N.S. No Yes 14 SSR 12 233 5 1000000 10000000 10 1-5 Yes Admix. With and w/out E Yes No 211 233 5 1000000 10000000 10 1-5 Yes Admix. With and w/out E Yes No 25 SSR 8 740 - 100000 500000 10 1-10 Yes Admix. N.S. P & E Yes Yes 28 740 - 100000 500000 10 1-10 Yes Admix. N.S. P & E Yes Yes 48 740 - 100000 500000 10 1-10 Yes Admix. N.S. P & E Yes Yes 16 SSR 15, 18 538 1 100000 1000000 10 1-8 Yes Admix. N.S. P & E Yes Yes 218 181 3 100000 1000000 10 1-8 Yes Admix. No P & E Yes Yes 518 102 2 100000 1000000 10 1-8 Yes Admix. No P Yes N/A 17 SSR 317 15 47 100000 100000 N.S. N.S. N.S. Admix. Yes P & E No Yes 3317 15 47 100000 100000 N.S. N.S. N.S. Admix. No P & E No Yes 38 SSR 14 751 6 500000 1000000 10 1-9 Yes Admix. Yes P & E Yes Yes 214 751 6 500000 1000000 10 1-9 Yes Admix. No P & E Yes Yes 29 AFLP 564 267 9 10,000† 10,000† 5 1-9 No† Admix.† No P No Yes 410 SSR 8 218 11 10000 100000 10 1-6 Yes Admix. No P & E No Yes 211 SNP 160 142 9 20000 50000 10 2-15 Yes Linkage No Nonparam.Wilcoxon testNo Yes 612 SSR 16 310 - 300000 600000 100 1-10 Yes Admix. N.S. E Yes Yes 3, 4 *13 SSR 21 246 1 100000 1000000 5 1-9 Yes Admix. N.S. P Yes No 114 SSR 10 1361 - 10000 100000 20 1-6 Yes Admix. No P & E Yes Yes 315 SSR 12 380 16 10000 100000 20 1-16 Yes Admix. Yes E No Yes 216 SSR 18 586 9 100000 1000000 20 1-9 Yes Admix. Yes P Yes Yes 217 Sequence 153 550 35 500000 100000 7 1-35 No Admix. Yes E No No 218 SSR 512 250 - 50000 100000 5 1-11 Default Admix. No E Yes No 219 SSR 8 587 18 5000 20000 20 1-18 Default N.S. N.S. E No No 220 AFLP 465 409 - 1000 10000 N.S. N.S. No No Admix. N.S. N.S. No No 221 AFLP 134 284 13 10000 100000 20 1-5‡ Default Admix. With and w/out P & E Yes Yes 5 With, 3 W/out22 SSR 11 678 2 50000 500000 10 2** No Admix. No N/A** N/A Yes K not inferred **23 SSR 17 239 - 1000000 1000000 3 2, 3 N.S. N.S. N.S. N/A*** N/A No*** K not inferred**** Runs were divided according to G’ statistic in CLUMPP to produce two different conclusions, but method was not clear enough to reproduce theanalysis** structure was run only at K = 2 to identify hybrid individuals.*** Compared the results of K = 2 and 3 based on findings of a previous study, only Q values were provided.† Reanalysis was performed with burn-in of 100,000 and MCMC repetitions of 1,000,000; not all parameter settings were stated yet were obtainedfrom email correspondence.‡ Range of K tested obtained from likelihood plot; not stated in the text.N.S. Not stated in paperN.P. Figure not provided171appendix dTable D.2 : Characteristics of structure runs reanalyzed. Re-analyses followedthe same settings as the original paper, using defaults when nothing was specified.Study Chosen K K matches? ∆K or likelihoodplot matches?Bar plotmatches?Reproduced?1 2 Yes Yes Yes Yes4 No No No No2 2 Yes Yes Yes Yes3 Yes Yes Yes Yes4 No No No No3 2 No - No No1 Yes - Yes Yes4 2 Yes Yes - Yes2 Yes Yes - Yes5 2 Yes Yes Yes Yes4 Yes Yes Yes Yes1 Yes Yes Yes Yes6 2 Yes Yes Yes Yes4 No - - No1 Yes Yes - Yes7 3 Yes - Yes Yes2 No No No No8 2 Yes Yes Yes Yes2 Yes Yes Yes Yes9 2 No - No No10 2 Yes - Yes Yes11 2 No - No No12 3, 5 * No No No No*13 1 Yes Yes - Yes14 2 No No No No15 2 Yes - Yes Yes16 2 Yes Yes Yes Yes17 2 Yes - - Yes18 2 Yes - - Yes19 2 Yes - - Yes20 Data was notavailableN/A21 Data was notavailableN/A22 - - - - N/A**23 2 N/A - - N/A**** Runs were divided according to G’ statistic in CLUMPP to produce two different conclusions, butmethod was not clear enough to reproduce the analysis** structure was run only at K = 2 to identify hybrid individuals.*** Compared the results of K = 2 and 3 based on findings of a previous study, only Q values wereprovided.172


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items