Open Collections

UBC Theses and Dissertations

Genetic population structure and adaptation to climate across the range of eastern white pine (Pinus… Nadeau, Simon 2014

GENETIC POPULATION STRUCTURE AND ADAPTATION TO CLIMATE ACROSS THE RANGE OF EASTERN WHITE PINE (Pinus strobus L.) AND WESTERN WHITE PINE (Pinus monticola Douglas ex D. Don)

by

Simon Nadeau

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Forestry)

The University of British Columbia (Vancouver)

August 2014

© Simon Nadeau, 2014

Abstract

Under rapid global warming, it is critical to better understand the capacity of forest trees to adapt to a changing climate. Western white pine (Pinus monticola Douglas ex D. Don) and eastern white pine (P. strobus L.) are species at higher risk from climate change, as both have fragmented ranges and have suffered declines due to harvesting, fire suppression, and the white pine blister rust. We identified and compared patterns of genetic diversity and adaptation to climate in both species using a set of 267 orthologous genes. These genes included candidates for growth, bud phenology, and resistance to biotic and abiotic stresses. Genotyping resulted in 158 and 153 successful SNPs for P. monticola and P. strobus, respectively. Each set of SNPs was genotyped on range-wide samples of 362 P. monticola individuals (61 populations) and 840 P. strobus individuals (133 populations). Analyses were conducted separately in each species. STRUCTURE analyses identified two genetic clusters in each species, corresponding to north-south genetic discontinuities, as well as weak hierarchical sub-structure within each of those groups. We found evidence of local adaptation in both species. FST outlier analyses revealed that ~7% and ~10% of SNPs were under selection in P. monticola and P. strobus, respectively. Environmental association methods identified that ~38% of P. monticola SNPs and ~47% of P. strobus SNPs were correlated with climate.
Strong candidate genes for future adaptation studies were identified: 7 genes in each species were detected by at least 2 methods and 22 candidate genes were common to both species. These genes were involved in growth, bud phenology, and response to abiotic and biotic stress. The implications of these findings for the conservation of white pine populations under climate change are discussed.

Preface

This thesis is the combination of two research chapters. None of the content of the thesis is taken directly from previously published articles. I performed the literature review, field collections, data analysis, interpretation of the results, and writing of the research chapters. My supervisors, Dr. Kermit Ritland and Dr. Nathalie Isabel, conceived the research project and provided guidance and editorial assistance throughout the project. Committee members, Dr. Sally Aitken and Dr. Loren Rieseberg, provided guidance and editorial comments. Four other people collaborated in this work. Dr. Julie Godbout helped with single nucleotide polymorphism (SNP) detection and with interpretation of the STRUCTURE results (Chapter 2). Julie also provided advice on the statistical analyses for Chapter 3. Manuel Lamothe provided assistance with SNP array development, SNP detection, genotypic data interpretation, and gene annotations. Marie-Claude Gros-Louis provided assistance with field collections, DNA extractions and seed germination. Dr. Patrick Meirmans provided the R code for the redundancy discriminant analysis (RDA) described in Chapter 3, and helped with the interpretation of the results.

The following chapters will be submitted to scientific journals:

Chapter 2: S. Nadeau, J. Godbout, M. Lamothe, M.-C. Gros-Louis, N. Isabel, and K. Ritland. North-south genetic discontinuities reveal new insights on postglacial recolonization in Pinus monticola and P. strobus.

Chapter 3: S. Nadeau, P. Meirmans, K. Ritland, and N. Isabel.
Local adaptation to climate in Pinus monticola and P. strobus: same or different genes?

Table of Contents

Abstract
Preface
List of tables
List of figures
List of abbreviations
Acknowledgements
1. General introduction
  1.1 Adaptation to climate change in forest trees
  1.2 Approaches to study local adaptation to climate
    1.2.1 FST outlier analysis
    1.2.2 Environmental association analysis
  1.3 Biology and adaptation in Pinus monticola and P. strobus
    1.3.1 Phylogenetic relationships and divergence time
    1.3.2 Natural distributions and climates
    1.3.3 Genetic variation at adaptive traits in natural populations
    1.3.4 Genetic variation at nuclear markers in natural populations
  1.4 Research objectives
2. North-south genetic discontinuities reveal new insights on postglacial recolonization in Pinus monticola and P. strobus
  2.1 Introduction
  2.2 Material and methods
    2.2.1 Sampling strategy, tree collection and DNA extraction
    2.2.2 DNA sequencing
    2.2.3 SNP detection and genotyping
    2.2.4 Genetic diversity
    2.2.5 Population structure
  2.3 Results
    2.3.1 SNP genotyping
    2.3.2 Genetic diversity
    2.3.3 Population structure
  2.4 Discussion
    2.4.1 Genetic diversity in Pinus monticola and P. strobus
    2.4.2 Ascertainment bias
    2.4.3 Population structure and postglacial history in Pinus monticola and P. strobus
3. Local adaptation to climate in Pinus monticola and P. strobus: same or different genes?
  3.1 Introduction
  3.2 Material and methods
    3.2.1 Sampling and SNP genotyping
    3.2.2 Hierarchical population structure
    3.2.3 Climatic data
    3.2.4 Isolation by distance (IBD) versus isolation by adaptation (IBA)
    3.2.5 FST outlier tests
    3.2.6 Environmental association analysis
    3.2.7 Identification of highly supported candidate genes and comparisons among species
  3.3 Results
    3.3.1 Isolation by distance (IBD) versus isolation by adaptation (IBA)
    3.3.2 FST outlier tests
    3.3.3 Environmental association tests
    3.3.4 Summary of FST outlier and environmental association analyses
    3.3.5 Overlap between methods of analysis
    3.3.6 Overlap between species
  3.4 Discussion
    3.4.1 Isolation by adaptation in Pinus monticola and P. strobus
    3.4.2 FST outlier and environmental associations
    3.4.3 Methods to detect signature of selection with small population sample sizes
    3.4.4 Comparisons between methods to detect signature of selection
    3.4.5 Overlap of candidate loci between methods
    3.4.6 Overlap of candidate genes between Pinus monticola and P. strobus
4. Conclusion and perspectives
References
Appendix 1: Supplementary tables
Appendix 2: Supplementary figures
Appendix 3: Python code to look for private alleles in populations or genetic groups
Appendix 4: R code for redundancy discriminant analysis
Appendix 5: FDIST Ritland method Python and R code

List of tables

Table 2.1. Genotyping results from 204 Pinus monticola SNPs (380 trees) and 187 P. strobus SNPs (843 trees) using Sequenom iPlex Gold technology.

Table 2.2. Classification of successful SNPs in orthologous and non-orthologous genes between Pinus monticola and P. strobus.

Table 2.3. Observed heterozygosity (HO), expected heterozygosity (HE) and Wright's fixation indices (Weir & Cockerham 1984) for Pinus monticola and P. strobus.

Table 2.4. Analysis of molecular variance (AMOVA) for Pinus monticola. Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group").

Table 2.5. Analysis of molecular variance (AMOVA) for Pinus strobus. Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group").

Table 2.6. Mantel test results for Pinus monticola and P. strobus.

Table 3.1. Correlation coefficients (r) from Mantel and partial Mantel tests in Pinus monticola to test for the association between genetic distance (pairwise Slatkin linear FST, Y) and geographic distance (D), and between genetic distance and climatic variables, when controlled for D. Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group"). Analyses were performed using all 158 P. monticola SNPs, as well as subsets of SNPs associated with climate as detected by Bayenv 2 and LFMM. The number of SNPs in each subset is given in parentheses.

Table 3.2. Correlation coefficients (r) from Mantel and partial Mantel tests in Pinus strobus to test for the association between genetic distance (pairwise Slatkin linear FST, Y) and geographic distance (D), and between genetic distance and climatic variables, when controlled for D. Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group"). Analyses were performed using all 153 P. strobus SNPs, as well as subsets of SNPs associated with climate as detected by Bayenv 2 and LFMM. The number of SNPs in each subset is given in parentheses.

Table 3.3. Redundancy discriminant analysis (RDA) to determine the fraction of among-population genetic variation due to climate, space, and ancestry in Pinus monticola.

Table 3.4. Redundancy discriminant analysis (RDA) to determine the fraction of among-population genetic variation due to climate, space, and ancestry in Pinus strobus.

Table 3.5. Number of FST outlier SNPs detected by FDIST Ritland and BayeScan in Pinus monticola and P. strobus using a false discovery rate (FDR) of 5%.

Table 3.6. Number of SNPs associated with climate, as detected by LFMM and Bayenv 2 in Pinus monticola and P. strobus using a false discovery rate (FDR) of 5% (LFMM) and a Bayes factor (BF) > 3 (Bayenv 2).

Table 3.7. "Strong candidate" SNPs in Pinus monticola, detected by a minimum of 2 different methods within a dataset ("range-wide", "northern group" or "southern group"). Gray and white areas mark the alternation between genes. The top 3 significant climatic variables for each method, in each dataset, are reported.

Table 3.8. "Strong candidate" SNPs in Pinus strobus, detected by a minimum of 2 different methods within a dataset ("range-wide", "northern group" or "southern group"). Gray and white areas mark the alternation between genes. The top 3 significant climatic variables for each method, in each dataset, are reported.

Table 3.9. SNPs/genes detected as FST outliers or associated with climatic variables in common to Pinus monticola and P. strobus. White and grey areas mark the alternation between genes. The top 3 significant climatic variables for each method, in each dataset, are reported.

Table S1. Number of samples from each source. Samples were collected from provenance trials, natural stands and seedbank collections.

Table S2. Number of individuals sampled per population in Pinus monticola.

Table S3. Number of individuals sampled per population in Pinus strobus.

Table S4. Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Table S5. Successful SNPs (Sequenom iPlex Gold technology).

Table S6. Observed (HO) and expected heterozygosity (HE), and test for Hardy-Weinberg equilibrium (HWE) at each SNP in Pinus monticola. Two-sided significance tests for HWE and q-values were calculated using the false discovery rate (FDR) method of Storey (2002).

Table S7. Observed (HO) and expected heterozygosity (HE), and test for Hardy-Weinberg equilibrium (HWE) at each SNP in Pinus strobus. Two-sided significance tests for HWE and q-values were calculated using the false discovery rate (FDR) method of Storey (2002).

Table S8. Gene annotations from the RefSeq database (“tBlastx”).

Table S9. Number of FST outlier SNPs detected by BayeScan when varying the prior odds (PO) setting to 10, 100 and 1000 and using a false discovery rate (FDR) of 5%. Analysis was performed using all populations ("range-wide") and only populations within genetic groups ("northern group" and "southern group").

Table S10. Redundancy Discriminant Analysis (RDA) of among-population genetic variation (dependent variable) on climatic variables (independent variables), constrained by space and ancestry in Pinus monticola.

Table S11. Redundancy Discriminant Analysis (RDA) of among-population genetic variation (dependent variable) on climatic variables (independent variables), constrained by space and ancestry in Pinus strobus.

Table S12. Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.

Table S13. Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.

List of figures

Figure 2.1. Kriging interpolation of HO across the ranges of a) Pinus monticola and b) P. strobus using the “Spatial Analyst” toolset in ArcGIS. Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

Figure 2.2. STRUCTURE results for Pinus monticola using a) all samples (“range-wide”) and b) within the “northern” and “southern” groups detected in a) (analyses performed separately). Shaded grey area represents the range of P. monticola (redrawn from Critchfield & Little 1966). Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

Figure 2.3. STRUCTURE results for Pinus strobus using a) all samples (“range-wide”) and b) within the “northern” and “southern” groups detected in a) (analyses performed separately). Shaded grey area represents the range of P. strobus (redrawn from Critchfield & Little 1966). Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

Figure 2.4. Hypothesized locations of glacial refugia (circles) in a) Pinus monticola and b) P. strobus. Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

Figure 3.1. Proportion of among-population differentiation explained by climate (MAT, MWMT, MCMT, MAP, MSP, AHM, SHM, bFFP, eFFP, PAS, Eref, and CMD), space (Pinus monticola: x, y, xy, y²; Pinus strobus: x, y) and ancestry (Q-values from STRUCTURE, Chapter 2) in a) P. monticola and b) P. strobus using redundancy discriminant analysis (RDA). Significance codes: ** = p < 0.01; * = p < 0.05. Analyses using all populations (“range-wide”) are shown. The sizes of the circles are not to scale. For results within “northern” and “southern” groups see Table 3.3 and Table 3.4.

Figure 3.2. Number of SNPs associated with each climatic variable by LFMM (false discovery rate = 5%) and Bayenv 2 (Bayes factor > 3) in Pinus monticola: a) “range-wide” populations b) “northern group” populations and c) “southern group” populations. Lat: latitude; Long: longitude; Elev: elevation; MAT: mean annual temperature; MWMT: mean warmest month temperature; MCMT: mean coldest month temperature; MAP: mean annual precipitation; MSP: mean summer month precipitation; AHM: annual heat:moisture index; PAS: precipitation as snow; SHM: summer heat:moisture index; bFFP: beginning of frost free period; eFFP: end of frost free period; Eref: Hargreaves reference evaporation; CMD: Hargreaves climatic moisture deficit.

Figure 3.3. Number of SNPs associated with each climatic variable by LFMM (false discovery rate = 5%) and Bayenv 2 (Bayes factor > 3) in Pinus strobus: a) “range-wide” populations b) “northern group” populations and c) “southern group” populations.

Figure S1. Selected population samples for genotyping in Pinus monticola and P. strobus.

Figure S2. Principal component analysis used to select provenances for genotyping in a) Pinus monticola and b) P. strobus. LTmin: lowest daily minimum temperature (°C); MTmin: mean of daily minimum temperature (°C); MTmean: mean of daily mean temperature (°C); MTmax: mean of daily maximum temperature (°C); HTmax: highest daily maximum temperature (°C); Precip: total precipitation (mm); SF: total snowfall (mm water); DP: mean dew point temperature (°C); RH: mean relative humidity (%); WS: mean wind speed at 10 m (km/h); DD0: degree-day summation over 0°C (°C/day); DD4: degree-day summation over 4°C (°C/day); FD: number of frost days (days); FFD: number of frost free days (days); GS: growing season (days); PET: total potential evapotranspiration (mm); Ar: aridity (water deficit, mm); VPD: vapor pressure deficit (kPa); Rad: total radiation (MJ/m²); DwP: number of days with precipitation (days); CDwoP: consecutive days without precipitation (days).

Figure S3. Histograms of minor allele frequency in a) Pinus monticola and b) P. strobus.

Figure S4. Linear regression of HO against latitude and longitude in a) Pinus monticola and b) P. strobus. Population sample size was included as a covariate in the model.

Figure S5. STRUCTURE analyses using all population samples (“range-wide”) for Pinus monticola (a,c) and P. strobus (b,d): a), b) are plots of log probability of the data (L[K]) vs. K; and c), d) are plots of delta K vs. K (Evanno et al. 2005). Plots were constructed using STRUCTURE HARVESTER (Earl & VonHoldt 2012). Arrows indicate the inferred number of clusters.

Figure S6. STRUCTURE analyses using population samples from the “northern groups” for Pinus monticola (a,c) and P. strobus (b,d): a), b) are plots of log probability of the data (L[K]) vs. K; and c), d) are plots of delta K vs. K (Evanno et al. 2005). Plots were constructed using STRUCTURE HARVESTER (Earl & VonHoldt 2012). Arrows indicate the inferred number of clusters.

Figure S7. STRUCTURE analyses using population samples from the “southern groups” for Pinus monticola (a,c) and P. strobus (b,d): a), b) are plots of log probability of the data (L[K]) vs. K; and c), d) are plots of delta K vs. K (Evanno et al. 2005). Plots were constructed using STRUCTURE HARVESTER (Earl & VonHoldt 2012). Arrows indicate the inferred number of clusters.

Figure S8. Individual-based FST vs. Weir & Cockerham (1984) FST estimator (W&C) in a) Pinus monticola and b) P. strobus. Dashed lines refer to equal individual-based FST and W&C FST. Red diamonds indicate FST outliers detected by the “FDIST Ritland” method.

Figure S9. FDIST Ritland test in Pinus monticola using a) “range-wide” populations b) “northern group” populations and c) “southern group” populations. Red diamonds indicate outlier loci at a false discovery rate (FDR) of 5%.

Figure S10. FDIST Ritland test in Pinus strobus using a) “range-wide” populations b) “northern group” populations and c) “southern group” populations. Red diamonds indicate outlier loci at a false discovery rate (FDR) of 5%.

Figure S11. Hierarchical model (blue lines) vs. symmetrical island model (black lines) for the FDIST Ritland method in Pinus monticola. Red diamonds indicate outlier loci at a false discovery rate (FDR) of 5%.

Figure S12. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus monticola “range-wide” populations. Biplots of a) 1st and 2nd axis; and b) 2nd and 3rd axis are shown. Climate variables are represented by arrows. SNPs detected by Bayenv 2 = blue triangles; detected by LFMM = blue crosses; detected by both Bayenv 2 and LFMM = red diamonds; undetected SNPs = black circles.

Figure S13. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus monticola “northern group” populations. Biplots of a) 1st and 2nd axis; and b) 2nd and 3rd axis are shown.

Figure S14. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus strobus “range-wide” populations. Biplots of a) 1st and 2nd axis; and b) 2nd and 3rd axis are shown.

Figure S15. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus strobus “northern group” populations. Biplots of a) 1st and 2nd axis; and b) 2nd and 3rd axis are shown.

List of abbreviations

AC: assisted colonization
AGF: assisted gene flow
AHM: annual heat:moisture index
AM: assisted migration
bp: base pairs
BP: before present
BF: Bayes factor
bFFP: beginning of frost free period
CMD: Hargreaves climatic moisture deficit
DNA: deoxyribonucleic acid
Elev: elevation
Eref: Hargreaves reference evaporation
EST: expressed sequence tag
FDR: false discovery rate
GCAT: white spruce gene catalog
HE: expected heterozygosity
HO: observed heterozygosity
HWE: Hardy-Weinberg equilibrium
IBA: isolation by adaptation
IBD: isolation by distance
Lat: latitude
LFMM: latent factor mixed model
Long: longitude
LGM: last glacial maximum
MAF: minor allele frequency
MAP: mean annual precipitation
MAT: mean annual temperature
MCMC: Markov chain Monte Carlo
MCMT: mean coldest month temperature
MFLNRO: Ministry of Forests, Lands and Natural Resource Operations (British Columbia)
MSP: mean summer precipitation
MWMT: mean warmest month temperature
MYA: million years ago
PAS: precipitation as snow
PCA: principal component analysis
PCR: polymerase chain reaction
PFP: pyrophosphate-dependent fructose-6-phosphate 1-phosphotransferase
PO: prior odds
SHM: summer heat:moisture index
SNP(s): single nucleotide polymorphism(s)
W&C: Weir & Cockerham
WHISP: White Pines Resequencing Project
USA: United States of America
USDA: United States Department of Agriculture
3’ UTR: 3’ untranslated region

Acknowledgements

This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) to Simon Nadeau and to Kermit Ritland; by the Fonds québécois de la recherche sur la nature et les technologies (FQRNT) to Simon Nadeau; and by Adaptation to Climate Change funds to Nathalie Isabel. Thanks to: Dr. Nathalie Isabel and Dr. Kermit Ritland for their supervision and guidance throughout the project; Dr. Sally Aitken, Dr. Loren Rieseberg, and Dr. Carol Ritland for their valuable advice and editorial comments; Dr. Dave Neale, Dr.
Andrew Eckert, Jill Wegrzyn, and Dr. Jean Beaulieu for providing white pine DNA sequences; Dr. Julie Godbout for help with SNP marker development and advice on Chapters 2 and 3; Manuel Lamothe and Patricia Lavigne for help with SNP marker development and gene/SNP annotation; Dr. Tongly Wang for ClimateWNA and ClimateNA data; Joel Fillon for teaching and providing bioinformatic solutions; Marie-Claude Gros-Louis, Hubert Labissonière, Daniel Plourde, and Eric Dussault for DNA extractions, field collections, and seed germination; Dave Kolotelo and Spencer Reitenbach for providing material from the Tree Seed Centre (BC Ministry of Forests, Lands and Natural Resource Operations); Jeffrey Debell for field collections in a Washington State Department of Natural Resources provenance trial (Whidbey Island); Ronald S. Zalesny Jr., John Brissette, and Richard Dionne for field collections in a United States Department of Agriculture Forest Service provenance trial (Orono, Maine); Agnes Yuen, Danny Leung, Jasmine Ono, Stéphanie Beauseigle, and Elizabeth Kleyhans for help with field collections. Special thanks to my partner, Aurore Thiercelin, for her personal support.

1. General introduction

1.1 Adaptation to climate change in forest trees

Climate is a major selective agent acting on phenotypes of forest trees that has led to local adaptation (Howe et al. 2003). A wealth of provenance trial experiments have shown that considerable phenotypic variation for adaptive traits exists among populations, and that this variation is geographically structured along climatic gradients (Howe et al. 2003; Savolainen et al. 2007; Alberto et al. 2013). Populations are also often the fittest in their local environment, confirming that local adaptation is common in tree species (Leimu & Fischer 2008; Hereford 2009; Aitken & Whitlock 2013).
Under rapid global warming, populations may become maladapted to their new environments, and are likely to suffer from reduced growth and survival (Rehfeldt et al. 1999). Hence, understanding the adaptive capacity and resilience of forest trees is of great importance if we want to effectively forecast and mitigate the effects of climate change (Lindner et al. 2010). Knowledge of factors influencing the spatial distribution of genetic diversity, such as past historical events, population dynamics, and the degree of local adaptation, plays a key role in predicting responses of tree species to future environmental changes (Petit et al. 2008). For example, predictions of future available habitat and conservation status using species distribution models are highly impacted by the ability of a species to migrate and adapt to new local conditions (Hamann & Aitken 2013). To avoid extirpation, tree populations will have to track climate change either by migrating toward new suitable habitat or through adaptation to new local conditions (Aitken et al. 2008). Adaptation can be achieved either via genetic changes or via phenotypic plasticity (Crispo 2008; Chevin et al. 2010; Reed et al. 2011). Extensive range shifts and the rapid development of steep genetic clines during the last interglacial period suggest a strong capacity for forest trees to adapt to a changing climate (Davis & Shaw 2001; Hamrick 2004).

With recent global warming, biological responses, such as range shifts toward northern latitudes and phenological changes, have been reported for many species (Parmesan 2006). However, for tree species to track predicted changes of suitable habitat, migration rates need to be ca. ten times faster than estimated migration rates of North American tree species following the last Pleistocene glaciations (Aitken et al. 2008). 
Furthermore, even if a whole species range could be shifted northward to its suitable climate, adaptation to new environmental conditions, including new biotic agents (e.g., pests & diseases, competing species) and abiotic agents (e.g., photoperiod, soil characteristics), would be necessary (Mimura & Aitken 2007). Although tree species have great potential for local adaptation (reviewed in Petit & Hampe 2006), species with small, fragmented populations, low fecundity, and long generation times are likely to suffer from “adaptational lag” (Aitken et al. 2008). Thus, informed assisted migration and the maintenance of suitable habitat through reserve planning are needed to conserve forest tree species at risk (Hamann & Aitken 2013; Aitken & Whitlock 2013). To date, very few studies have investigated adaptive genetic variation in non-model species such as white pines (Pinus subgenus Strobus). These species should be primary candidates for assisted migration because they have fragmented ranges and have suffered declines due to harvesting, fire suppression, and introduced diseases. Despite high timber and ecological value, eastern white pine (Pinus strobus L.) and western white pine (P. monticola Douglas ex D. Don) are not used much for reforestation because of the devastating impact of the white pine blister rust (Cronartium ribicola J.C. Fisch.), an exotic pest introduced in the last century (Geils et al. 2010). Knowledge of adaptive genetic variation for traits related to climate and pest resistance is required to inform programs for assisted migration and conservation, and to enhance reforestation and maintain populations of white pines. 
1.2 Approaches to study local adaptation to climate

Traditionally, local adaptation in forest trees has been studied using common garden experiments, called provenance trials, in which trees from various origins across a species range are evaluated for different traits (e.g., growth performance, bud phenology, cold hardiness) over a number of sites (Savolainen et al. 2007). Clinal genetic variation along environmental gradients in such traits is generally considered as evidence of local adaptation (Alberto et al. 2013). More strictly speaking, local adaptation occurs when resident genotypes have higher fitness in their local habitat than foreign genotypes, reflecting genotype-by-environment interactions (Kawecki & Ebert 2004). With the emerging fields of landscape and population genomics, it is now possible to evaluate the adaptive potential of species directly at the genomic level (e.g., for suites of genes involved in adaptation) (Manel et al. 2010a). The various methods to detect genomic regions that show signatures of selection (reviewed in Nielsen 2005; Vasemägi & Primmer 2005) can be classified into two main approaches: “top-down” and “bottom-up” (Barrett & Hoekstra 2011; Sork et al. 2013). The top-down approach involves identifying genes underlying phenotypic traits, where the traits are known to have functional importance or vary between environments. Examples of such approaches are assessing the co-segregation of genotypes and phenotypes using controlled crosses between parents (quantitative trait locus mapping), or using naturally admixed populations to correlate genotypes with phenotypes (association mapping). In contrast, bottom-up approaches identify genes showing signatures of selection without knowing their phenotypic effects. 
Potential genetic signatures of selection include excessive population differentiation (see “1.2.1 FST outlier analysis”), genetic correlations with climatic variables (see “1.2.2 Environmental association analysis”), decreased polymorphism within populations, or extended linkage disequilibrium relative to neutral expectations (reviewed in Nielsen 2005; Vasemägi & Primmer 2005). Recent studies in conifers have shown how these methods can be applied to identify candidate genes associated with bioclimatic factors and adaptive traits (e.g., pest resistance, phenology, cold hardiness and drought tolerance) (Namroud et al. 2008, 2012; Eckert et al. 2010b; a, 2013b; Holliday et al. 2010, 2012; Prunier et al. 2011, 2012, 2013; Chen et al. 2012; Mosca et al. 2012, 2013; Cullingham et al. 2014; McKown et al. 2014). These studies demonstrate the use of single nucleotide polymorphism (SNP) markers, derived from expressed genes, as a powerful tool to identify, monitor, and manage natural genetic resources in response to climate change. Two bottom-up approaches, FST outlier and environmental association tests, are discussed further in the next sections.

1.2.1 FST outlier analysis

Loci under selection, or closely linked to loci under selection, generally show atypical patterns of genetic differentiation among populations when compared to selectively neutral loci (i.e., shaped by genetic drift, gene flow, and mutation events, but not selection) (Luikart et al. 2003). Lewontin & Krakauer (1973) first proposed a test based on Wright’s fixation index (FST, Wright 1951), and a variety of statistical tests have followed (Bowcock et al. 1991; Beaumont & Nichols 1996; Vitalis et al. 2001; Porter 2003; Beaumont & Balding 2004; Foll & Gaggiotti 2008; Excoffier et al. 2009). 
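As a rough illustration of the quantity these tests build on, the sketch below computes Wright's FST for a single biallelic locus from population allele frequencies, as (HT - HS) / HT with unweighted population means. It is a simplified textbook form, not the Weir & Cockerham estimator or any of the outlier tests cited above; the function name, the SNP names, and the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic locus.

    freqs: allele frequency of one allele in each population.
    Assumes equal population weights (a simplification).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies for three SNPs in four populations.
loci = {
    "snp_a": [0.48, 0.52, 0.50, 0.49],
    "snp_b": [0.10, 0.15, 0.85, 0.90],
    "snp_c": [0.30, 0.35, 0.40, 0.33],
}

# Rank loci from most to least differentiated among populations.
ranked = sorted(loci, key=lambda name: fst_biallelic(loci[name]), reverse=True)
```

Ranking alone does not identify outliers: the cited methods compare each locus against a neutral expectation derived from a demographic model, which this sketch does not attempt.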
In FST outlier tests, sometimes termed “genome scans”, loci showing excessive differentiation among populations are deemed to be under divergent selection (i.e., geographically restricted directional selection where different variants are advantageous in different populations). In contrast, loci showing atypically low differentiation among populations are deemed to be under balancing selection (i.e., selection that tends to homogenise allele frequencies among populations; Beaumont & Balding 2004). Divergent selection indicates the presence of locally adapted populations (Kawecki & Ebert 2004).

Population genetic structure, such as hierarchical structure or patterns of isolation by distance (IBD), can strongly affect the results of FST outlier tests and produce large numbers of false positives (Excoffier et al. 2009; Meirmans 2012; Lotterhos & Whitlock 2014). In this regard, FST outlier methods can be classified into two groups (Chen et al. 2012): 1) tests based on an island model that assume a common and unique migration pool (Beaumont & Nichols 1996; Beaumont & Balding 2004; Foll & Gaggiotti 2008); and 2) tests that assume hierarchical population structure (i.e., different migration rates among and within groups) (Excoffier et al. 2009, 2010). Therefore, FST outlier analyses require a priori information on population structure and past demographic history in order to group samples into populations or hierarchical groups, and to select the appropriate test. Populations can be defined on the basis of genetic structure or, when genetic structure is absent, geographic distance, or by clustering samples that share similar ecological conditions (e.g., Prunier et al. 2011, 2012). These decisions can strongly affect the identification of FST outliers (Eckert et al. 2010b).

1.2.2 Environmental association analysis

Correlations between allelic variation and environmental gradients have been identified in a variety of tree species (Eckert et al. 2010b; a; Chen et al. 
2012; Keller 2012; Mosca et al. 2012), and such associations are often interpreted as evidence of directional selection (Endler 1977, 1986; Vasemägi & Primmer 2005). A number of different methods have been developed to test this hypothesis (Joost et al. 2007; Poncet et al. 2010; Coop et al. 2010; Günther & Coop 2012; Frichot et al. 2013). As opposed to FST outlier tests, which identify candidate loci independent of the environment, the environmental association approach has the advantage of giving additional information on the selective climatic agents acting on populations. Another advantage is that environmental association analysis can be applied at the individual level rather than at the population level, thereby eliminating the need to define populations a priori. As with FST outliers, spurious correlations can arise if environmental gradients are correlated with population structure (nearby populations are likely to share ancestry and occur in similar environments) (Eckert et al. 2010b). Various methods have been developed to overcome this problem (Coop et al. 2010; Günther & Coop 2012; Frichot et al. 2013). One method, called “Bayenv”, first estimates the empirical pattern of covariance in allele frequencies between populations, and then uses this as a null model to test environmental associations at each individual SNP (Coop et al. 2010; Günther & Coop 2012). Another method uses Latent Factor Mixed Models (LFMM), a variant of principal component analysis. This method has been developed to test for genotype-environment correlations while simultaneously correcting for population structure (Frichot et al. 2013). To increase power in environmental association analysis, it is recommended to genotype many populations in a broad range of conditions (Poncet et al. 2010; De Mita et al. 2013). 
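The confounding described above can be illustrated with a toy calculation. The sketch below contrasts a raw Pearson correlation between allele frequency and a climate variable with a partial correlation that controls for a single population-structure covariate. This is a deliberately simple stand-in and not the Bayenv or LFMM model; all variable names and values are hypothetical and chosen so that the shared structure term drives the whole association.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Hypothetical data for four populations (arbitrary centred units).
structure = [-1, -1, 1, 1]      # e.g. an ancestry or latitude axis
allele_freq = [-2, 0, 0, 2]     # structure plus locus-specific noise
climate = [-2, 0, 2, 0]         # structure plus independent climate noise

raw = pearson(allele_freq, climate)                       # inflated by structure
adjusted = partial_corr(allele_freq, climate, structure)  # structure removed
```

In this constructed example the raw correlation is positive although allele frequency and climate are related only through the shared structure axis, which is the situation the structure-aware methods cited above are designed to handle.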
Sampling designs could also be stratified across environmental variables of interest using existing landscape features and environments (i.e., sampling a relatively equal number of individuals in each environment) in a quasi-experimental design, in order to test specific hypotheses (Manel et al. 2010a).

1.3 Biology and adaptation in Pinus monticola and P. strobus

1.3.1 Phylogenetic relationships and divergence time

The genus Pinus is the largest and most widespread genus of conifers, comprising over 100 widely recognised species (Price et al. 1998; Farjon 2001). The two subgenera, Pinus (Diploxylon or “hard pines”) and Strobus (Haploxylon or “soft pines”), are diagnosable by 2 versus 1 fibrovascular bundles per leaf, respectively (Gernandt et al. 2005). “White pines”, sometimes called “five needle pines”, are classified into subgenus Strobus, section Quinquefoliae (Gernandt et al. 2005). Of the 23 members of section Quinquefoliae, eight are native to North America, five grow in the western United States and southwestern Canada (Pinus strobiformis Engelm., P. flexilis E. James, P. albicaulis Engelm., P. lambertiana Douglas, and P. monticola), and only one, P. strobus, grows in eastern North America. Also noteworthy is P. chiapensis (Martinez) Andresen, native to Mexico, which was previously considered a variety of P. strobus but is now recognized as a separate species (Syring et al. 2007a). Phylogenetic studies using morphological and molecular characters have consistently grouped P. strobus and P. monticola into subsection Strobus (previously called Strobi) (Critchfield 1986; Liston et al. 1999; Gernandt et al. 2005, 2008; Eckert & Hall 2006). However, relationships among species within subsection Strobus remain poorly resolved, likely due to past introgression of chloroplast and mitochondrial DNA between Eurasian and North American species through Beringia (Tsutsui et al. 
2009), or to incomplete lineage sorting at nuclear, chloroplast, and mitochondrial DNA sequences (Syring et al. 2007a; b; Tsutsui et al. 2009). Estimated divergence times and molecular rates vary widely depending on the fossil calibration used (Willyard et al. 2007; Gernandt et al. 2008). Paleontological and molecular data suggest that the divergence event separating the two subgenera, Pinus and Strobus, occurred 85 to 45 million years ago (MYA) (reviewed in Willyard et al. 2007). Molecular phylogenies calibrated using those estimates yielded a divergence time between the two sections of subgenus Strobus (Quinquefoliae and Parrya) ranging between 25-47 MYA for nuclear loci and 19-35 MYA for chloroplast loci (Willyard et al. 2007). Another phylogeny, based on 2 chloroplast genes (matK and rbcL, total length of 2838 bp) from all 11 extant genera of the Pinaceae family and including both extant and fossil taxa for calibration, found an age for the Quinquefoliae-Parrya node between 35-37 MYA (Gernandt et al. 2008), which is within the estimates of Willyard et al. (2007). According to Gernandt et al. (2008), divergence between P. monticola and P. strobus is recent, estimated at less than ~12 MYA (Miocene). Crosses between P. monticola and P. strobus produce viable seeds (Bingham et al. 1972). P. monticola x strobus F1 has also been successfully backcrossed with P. strobus (Lu et al. 2005). P. monticola x strobus seedlings show better growth than P. monticola (Bingham et al. 1972) and slightly greater resistance to rust than P. strobus (Lu et al. 2005). Hence, the two species are not reproductively isolated, which is consistent with their close phylogenetic relationship.

1.3.2 Natural distributions and climates

P. monticola and P. strobus are temperate species widely distributed across western and eastern North America, respectively. 
Before their decline in the early 20th century, both species were among the most economically and ecologically valuable timber species (Echt & Nelson 1997; Nkongolo et al. 2002; Kim et al. 2003, 2010). P. monticola is widely distributed across western North America and occupies a wide climatic niche (Rehfeldt et al. 2008). Its range spans 15 degrees latitude (~1,800 km, estimated from ArcGIS, Figure S1) and 2,500 m elevation. The coastal populations, west of the Cascades, and the interior populations of the northern Rocky Mountains are separated from each other by the drier lands of the Columbia Basin. The mesic, montane forests of the Rocky Mountains of northern Idaho, British Columbia and adjacent areas are home to the most productive sites for P. monticola, where it occurs as a major seral species (Neuenschwander 1999). In this region, P. monticola has an elevational distribution of 1,500 m, occurs in areas that have frost-free periods of 60-160 days, and receives 750-1,500 mm of precipitation (Wellner 1962). On the coast, P. monticola occurs as a minor seral component of the Tsuga heterophylla (Raf.) Sarg. zone from sea level to 1,500 m at the northern end of its range and as a minor seral component of subalpine forests throughout most of the Cascades (occurs between 1,300 m and 2,000 m) and Sierra Nevada (occurs above 2,500 m). The coast is characterized by a maritime climate with warm dry summers and with most of the precipitation (75-85%) occurring during the mild, wet winters (Grigg & Whitlock 1998). Its distribution is limited by moisture stress at lower altitudes and by lack of growing season warmth at higher altitudes (Rehfeldt et al. 2008).

P. strobus occupies the largest natural range of all species in the subgenus Strobus, spanning over ~2,000 km in the north-south and ~3,000 km in east-west directions (estimated from ArcGIS, Figure S1). 
Its distribution extends from the eastern Maritime provinces (Newfoundland, Nova Scotia, New Brunswick), Québec, and New England; west to western Ontario, southeastern Manitoba, Minnesota, and Wisconsin; and south, primarily along the Appalachian Mountains, to western North Carolina, northern Georgia, and Tennessee (Critchfield & Little 1966; Wendel & Smith 1990). The climate over the range is cool and humid, with generally a moisture surplus in all seasons (Wendel & Smith 1990). Across most of its range, P. strobus grows at elevations between sea level and 460 m. It reaches its higher altitudes, between 370 m and 1,070 m, in the southern Appalachian Mountains (Wendel & Smith 1990). Temperature and precipitation variables vary widely across its north-south and east-west distribution. P. strobus distribution and abundance are primarily a function of seasonal temperatures and moisture balance (Joyce & Rehfeldt 2013).

1.3.3 Genetic variation at adaptive traits in natural populations

A large number of provenance trial experiments in both species have shown contrasting patterns of genetic variation in quantitative traits. In P. monticola, very low differentiation among populations is detected, except at the California-Oregon border. In contrast, P. strobus populations are more strongly genetically differentiated (Rehfeldt et al. 1984; Genys 1987 and references therein). In P. monticola, population differentiation accounted for 3% to 62% of the phenotypic variance in growth and development traits, periodicity of shoot elongation, and cold hardiness (Rehfeldt et al. 1984). 
The vast majority of this variation was explained by three different groups of populations: 1) a large northern group, including populations from the Rocky Mountains (northeastern Washington, northern Idaho, northwestern Montana, interior British Columbia), northern Cascades and northern coastal areas (Oregon, Washington, coastal British Columbia), having high growth potential and low cold hardiness; 2) a southern group, including high elevation populations from the Sierra Nevada (California) and southern Oregon, exhibiting low growth potential and high cold hardiness; and 3) a transitional zone, including the populations from the central and southern Cascades (Oregon), where a steep cline was associated with latitude and distance east-west of the Cascade crest (Rehfeldt et al. 1984; Campbell & Sugano 1989; Chuine et al. 2006). Thus distinct northern and southern ecotypes are differentiated according to growth and climatic stress resistance. Patterns of variation within the northern and southern groups are weak to non-existent and the lowest observed for any species, suggesting that P. monticola adapted to harsh environments mostly via phenotypic plasticity within these groups (Rehfeldt et al. 1984; Chuine et al. 2006). In the northern group, only weak differentiation in cold hardiness and height deviation (3-year height vs. 2-year height) was observed between coastal and interior populations (Rehfeldt et al. 1984). As a result, two large seed transfer zones are delimited for British Columbia, corresponding to the interior and coastal portions of the range (Thomas & Lester 1992; Meagher et al. 1999), and five zones were suggested for Washington and Oregon, with no elevation restriction (Campbell & Sugano 1989). California sources are not suitable for planting in the northern group due to poor survival and height growth. In contrast, P. 
strobus populations are moderately genetically differentiated with respect to quantitative traits as shown by range-wide and regional provenance tests (Fowler & Heimburger 1969; Garrett 1973; Genys 1987; Li et al. 1997; Joyce & Sinclair 2002; Lu et al. 2003a; b). Variance among populations explained as much as 20% to 22% of the genetic variation in height growth among Québec provenances (3 to 4 year height; Li et al. 1997) and 10% to 22% among Ontario provenances (1 to 5 year height; Joyce & Sinclair 2002). Provenance variation mainly followed a latitudinal cline, with southern provenances exhibiting faster growth, earlier bud burst, delayed shoot elongation cessation and bud set, and lower cold hardiness (Genys 1987; Li et al. 1997; Joyce & Sinclair 2002; Lu et al. 2003a; b). Interestingly, southern provenances from ~34°N to ~36°N (Georgia, Tennessee, North Carolina) were genetically distinct (Joyce & Rehfeldt 2013) and were superior when planted as far north as 41°N, but performed poorly north of 45°N (Fowler & Heimburger 1969; Garrett 1973). Northward and southward seed transfers of 2-2.58° and 1.5–2°, respectively, were deemed reasonable to promote growth, while keeping a low risk of frost damage (Joyce & Sinclair 2002; Lu et al. 2003a; b). In both species, provenance variation was correlated with the climate of provenance locations, suggesting that populations are locally adapted. In P. monticola, genetic differentiation between the northern and southern groups parallels a steep decrease in precipitation southward to California and west across the Cascade crest (Richardson et al. 2009). In P. strobus, geographic and climatic variables were able to explain ~46% to 65% of height growth variation among Ontario and Québec provenances (Li et al. 1997; Joyce & Sinclair 2002), 26% to 50% of the variation in shoot elongation and bud phenology (Li et al. 1997; Lu et al. 2003b), and 8% to 30% in cold hardiness (Lu et al. 2003a). 
Accumulated degree days greater than 5°C was the best predictor of range-wide variation in growth potential (Joyce & Rehfeldt 2013).

1.3.4 Genetic variation at nuclear markers in natural populations

Genetic diversity at nuclear markers has been characterized with allozyme, AFLP, RAPD, ISSR and microsatellite markers (Steinhoff et al. 1983; Beaulieu & Simon 1994; Rajora et al. 1998; Isabel et al. 1999; Mehes et al. 2007, 2009; Kim et al. 2010; Mandák et al. 2013). Those molecular studies found moderate to high levels of genetic diversity within P. monticola and P. strobus populations, but low among-population differentiation. Across the range of P. monticola, 15% to 20% of the total genetic variation, as measured by FST, was explained by differences among populations (Steinhoff et al. 1983; Kim et al. 2010). Most of this inter-population differentiation is attributable to the presence of a strong genetic cline located at the Oregon/California border, closely matching genetic variation at quantitative traits (Richardson et al. 2009). The northern group was found to have lower genetic variation and lower among-population differentiation than the southern group (Steinhoff et al. 1983; Kim et al. 2010). The populations of the southern Cascades and Klamath-Siskiyou Mountains (between 42-44°N lat) are transitional between the two clades (Steinhoff et al. 1983). In the northern group, differentiation between disjunct coastal populations and interior populations was low to moderate and did not constitute separate sub-groups (Nei’s unbiased genetic distance D = 0.0116, Steinhoff et al. 1983; FST = 0.099, Kim et al. 2010). Genetic diversity varied significantly across the range, and was lower in small isolated populations or near the leading edge (high latitude) of the species range (Steinhoff et al. 1983; Kim et al. 2010). In P. strobus, geographic patterns of population differentiation were not apparent. 
Moderate to high levels of genetic diversity were observed within populations, but no differences of genetic diversity across regions were found (Beaulieu & Simon 1994; Rajora et al. 1998; Isabel et al. 1999; Mehes et al. 2009). Only 2% to 8% of the total genetic variation in the Canadian portion of the range was due to among-population differentiation (FST), and most of the genetic diversity was among individuals within populations (Beaulieu & Simon 1994; Rajora et al. 1998; Isabel et al. 1999; Mehes et al. 2009). Another study including populations from the United States north of 40° latitude (Great Lakes region, central Appalachian Mountains and New England) also found low among-population differentiation (FST = 0.025; Mandák et al. 2013). Small isolated populations and populations occurring at the range margins had similar or higher levels of genetic diversity than central populations (Beaulieu & Simon 1994; Rajora et al. 1998).

1.4 Research objectives

This study aimed to reveal range-wide patterns of neutral and adaptive genetic variation at the genomic level in Pinus monticola and P. strobus. To meet these objectives, three specific research questions were posed:
1. Are there detectable patterns of population structure in P. monticola and P. strobus? (Chapter 2);
2. Is there evidence of local adaptation to climate in P. monticola and P. strobus? (Chapter 3);
3. Are the same or different genes involved in adaptation to climate between P. monticola and P. strobus? (Chapter 3).
Hypotheses for each research question are formulated in the introduction of the corresponding chapters. In Chapter 2, patterns of genetic diversity and population structure across the ranges of P. monticola and P. strobus are described, and potential glacial refugia are identified. In Chapter 3, the importance of local adaptation in shaping population structure of both species is investigated. 
Genomic regions showing signatures of adaptation and association with climatic variables using FST outlier and environmental association analysis are identified and compared between species. Chapter 4 discusses the implications of these findings in relation to conservation of adaptive genetic diversity in those two species at risk under climate change. This study is the first range-wide characterization of genetic variability at single nucleotide polymorphism (SNP) markers from expressed genes in P. strobus and the most extensive genetic sampling to date across the range of P. monticola. Furthermore, this is the first time that SNPs are used to test for signatures of selection in these two species. The joint study of these two species also adds a unique comparative component to this thesis, which can provide novel insights into genes of importance for adaptation.

2. North-south genetic discontinuities reveal new insights on postglacial recolonization in Pinus monticola and P. strobus

2.1 Introduction

Identifying factors shaping the spatial distribution of genetic diversity across species ranges is key for predicting responses to future environmental change and for developing informed conservation strategies (Hampe & Petit 2005; Petit et al. 2008). The knowledge of population structure and historical population demography is also an important first step to control for confounding effects when searching for signatures of selection and local adaptation (Lotterhos & Whitlock 2014). Fossil pollen records and genetic data showed that species have survived historical climate changes by migration and adaptation (Davis & Shaw 2001; Hamrick 2004). During the last glacial maximum (LGM; ~20,000 years ago), ice sheets covered much of North America (Clark et al. 2009), forcing populations to recede into small glacial refugia. 
Temperate and boreal tree species recolonized most of their current ranges during the current interglacial, and the impact of such historical range shifts is imprinted in their contemporary spatial genetic distribution (Hoban et al. 2010). One observable effect of rapid recolonization of newly available habitats is the decrease of genetic diversity along colonization routes (Hewitt 1996, 2000). Also noticeable is the increase in genetic diversity near contact zones between lineages originating from separate glacial refugia (Petit et al. 2003). Such contact zones have been widely reported for many tree species (Richardson et al. 2002; Jaramillo-Correa et al. 2004; Walter & Epperson 2005; Godbout et al. 2005, 2008; Naydenov et al. 2007), and comparative phylogeography has highlighted congruent patterns between species in both eastern and western North America (Remington 1968; Soltis et al. 1997, 2006; Swenson & Howard 2005; Jaramillo-Correa et al. 2009; Shafer et al. 2010). Pinus monticola Douglas ex D. Don and P. strobus L. are white pines widely distributed across western and eastern North America, respectively. A review of phylogeographic studies in northwestern North America has shown that species with high dispersal ability and large contemporary ranges, such as P. monticola and P. strobus, are likely to have survived glaciations in multiple refugia (Shafer et al. 2010). In addition, mountain ranges, such as the Cascade Range, the Rocky Mountains, and the Appalachian Mountains, may have acted as vicariance factors and prevented gene flow between lineages originating from different refugia (Brunsfeld et al. 2001; Jaramillo-Correa et al. 2009; Shafer et al. 2010). Thus, significant patterns of population structure in both species may result from past historical events and the existence of a number of glacial refugia. Genetic diversity in P. monticola and P. 
strobus has been characterized using several types of molecular markers: allozymes, AFLP, RAPD, ISSR, and microsatellites (Steinhoff et al. 1983; Beaulieu & Simon 1994; Rajora et al. 1998; Isabel et al. 1999; Mehes et al. 2007, 2009; Geils et al. 2010; Mandák et al. 2013). In both species, low among-population differentiation was observed, with the exception of a steep genetic discontinuity near the Oregon/California border in P. monticola. In P. monticola, very little to no genetic differentiation is observed between coastal and interior populations for both nuclear markers and adaptive traits (Steinhoff et al. 1983; Rehfeldt et al. 1984; Mehes et al. 2009; Kim et al. 2010), which could indicate that all northern populations were recolonized from the same refugium. Further south, a steep genetic discontinuity located at the Oregon/California border was attributed to recolonization from both a northern and a southern refugium after the LGM (Rehfeldt et al. 1984; but see Richardson et al. 2009).

In P. strobus, little is known about the locations and number of Pleistocene refugia (Jacobson & Dieffenbacher-Krall 1995). The lack of population structure found in previous molecular studies in the central and northern parts of its range (Beaulieu & Simon 1994; Rajora et al. 1998; Mehes et al. 2009; Mandák et al. 2013) suggests recolonization from a single refugium. However, other cryptic refugia may be possible, as the ranges of P. monticola and P. strobus encompass many proposed northern refugia, such as the Haida Gwaii and Alexander archipelagos on the northern coast of British Columbia, the “Clearwater refugium” in Idaho, or a refugium east of the Rocky Mountains in Montana (Jaramillo-Correa et al. 2009; Shafer et al. 2010; Mehes-Smith et al. 2011). Genetic discontinuities found in other tree species in eastern North America also suggest putative northern refugia, e.g., south of the Great Lakes or on the northern Atlantic coast (Jaramillo-Correa et al. 2009). 
However, the presence of these potential refugia remains unclear due to the scarcity of Late Pleistocene fossil data in northern regions (Jackson et al. 2000).

Most phylogeographic studies in North America have been conducted at a regional scale, especially for widely distributed taxa such as forest trees (Jaramillo-Correa et al. 2009; Shafer et al. 2010). To date, only two studies have described range-wide patterns of nuclear genetic diversity in P. monticola, with a limited number of sampled populations (Rehfeldt et al. 1984; Kim et al. 2010). In P. strobus, most molecular studies analyzed a small number of Canadian populations (Beaulieu & Simon 1994; Rajora et al. 1998; Mehes et al. 2009). One study sampled a higher number of populations in the United States, but did not include provenances from the southern Appalachian Mountains or from the Canadian portion of the range (Mandák et al. 2013). A review of phylogeographic studies across northwestern North America has shown that when more samples were included, more detailed population structure was revealed and new glacial refugia were discovered (Shafer et al. 2010). In order to describe patterns of genetic variation in P. monticola and P. strobus more accurately, studies that include a large number of sampled populations across the range of both species are needed. Here we investigate and compare the range-wide patterns of genetic structure in P. monticola and P. strobus. To this end, we developed new sets of 158 and 153 single nucleotide polymorphism (SNP) markers derived from expressed genes for P. monticola and P. strobus, respectively, and genotyped each set of SNPs on trees from 61 P. monticola and 133 P. strobus populations distributed across their respective ranges. SNPs are useful markers for inferring population history because they have slow mutation rates and limited homoplasy (Brumfield et al. 2003; Brito & Edwards 2009). 
With this dataset, we investigate whether: 1) there is significant population structure in P. monticola and P. strobus, and 2) the observed patterns of genetic diversity are indicative of postglacial recolonization from a single glacial refugium or from multiple refugia. We expect to find evidence of recolonization from two glacial refugia in P. monticola (Rehfeldt et al. 1984), while the lack of population structure in P. strobus suggests just one refugium (Beaulieu & Simon 1994; Rajora et al. 1998; Mehes et al. 2009; Mandák et al. 2013). We also examine evidence for additional northern cryptic refugia (Jaramillo-Correa et al. 2009; Shafer et al. 2010).

2.2 Material and methods

2.2.1 Sampling strategy, tree collection and DNA extraction

Range-wide samples representing natural populations of both species were collected from provenance trials and seedbank collections of the Canadian Forest Service, the BC Ministry of Forests, Lands and Natural Resource Operations (MFLNRO), the United States Department of Agriculture (USDA) Forest Service and the Washington State Department of Natural Resources (Table S1). In order to cover the species range, seedbank collections were used for provenances that were not represented in existing provenance trials. For this, P. monticola and P. strobus seeds were obtained from the Tree Seed Centre (MFLNRO, Surrey, British Columbia, http://www.for.gov.bc.ca/HTI/treeseedcentre/index.htm) and from the Canadian Forest Service national seed banks (Laurentian Forestry Centre, Québec City, Québec; Atlantic Forestry Centre, Fredericton, New Brunswick), respectively. For DNA analysis, seedlings were grown in a greenhouse according to standard stratification and germination protocols for both species. In addition, mature P. strobus trees were sampled from natural stands in Maine, New Hampshire, Vermont, and New Brunswick, as these areas were under-represented by provenance trials and seed lot collections.
Sampled trees from natural stands were at least 60 m apart to avoid sampling close relatives. We selected provenances that maximised the number of sampling locations across the ranges and that encompassed the largest possible range of environmental conditions. For every available provenance (provenance trials, seedbank collection and natural stands), 21 climatic variables averaged over the years 1981-2010 were estimated using BioSIM 9 (Régnière & St-Amant 2007). Principal component analysis was performed on all 21 standardized climatic variables. The first principal component was found to be mainly related to temperature and the second to precipitation (Figure S2), and these accounted for a total of 81.7% and 80.5% of the variation in climate across P. monticola and P. strobus populations, respectively. Histograms of the first two principal components were used to select provenances. All climate categories were well represented in our selected provenances and they included provenances showing extreme values of PC1 and PC2. In P. monticola, the sampling consisted of 362 trees from 61 populations (Figure S1, Table S2). For P. strobus, a total of 843 trees from 133 populations were sampled (Figure S1, Table S3). Populations were represented by 1 to 6 individuals. In geographic regions where only sparsely distributed samples were available, more than 6 trees per population were sampled to equally represent these regions (up to 12 for P. monticola and up to 21 for P. strobus). DNA was extracted from buds or current-year needles using the Nucleospin 96 Plant II kit (Macherey-Nagel, Bethlehem, Pennsylvania) with the following modifications to the manufacturer’s protocol: 1) cell lysis using buffers PL2, heated at 65 °C for 2 hours, and PL3, and 2) elution with an in-house Tris-HCl 0.01 mM pH 8.0 buffer. DNA concentration and quality for each sample were assessed using a Nanodrop 8000 (Thermo Fisher Scientific Inc., Waltham, MA).
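The provenance-selection step described in this section relies on a principal component analysis of standardized climate variables. The sketch below is only an illustration of that idea, not the procedure actually run for the thesis: it standardizes each variable, builds the correlation matrix, and extracts the leading components by power iteration with deflation. All function names are hypothetical, and the fraction of variance is computed as eigenvalue divided by the number of variables, which holds for a correlation matrix.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Centre each column on 0 and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    # A constant column (sd == 0) would raise ZeroDivisionError here.
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z: Matrix) -> Matrix:
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(mat: Matrix, iters: int = 1000,
                      seed: int = 0) -> Tuple[float, List[float]]:
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    p = len(mat)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[a] * mat[a][b] * v[b] for a in range(p) for b in range(p))
    return lam, v


def principal_components(rows: Matrix,
                         k: int = 2) -> List[Tuple[float, List[float]]]:
    """Return (fraction of variance, loadings) for the first k components."""
    mat = correlation_matrix(standardize(rows))
    p = len(mat)
    out = []
    for _ in range(k):
        lam, v = leading_eigenpair(mat)
        out.append((lam / p, v))
        # Deflation: subtract the component just found before the next pass.
        mat = [[mat[a][b] - lam * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    return out
```

With one row per provenance and one column per climate variable, `principal_components(rows, k=2)` would return the share of climatic variation carried by PC1 and PC2 together with their loadings; the loadings are what indicate whether a component is mainly a temperature or a precipitation axis.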
2.2.2 DNA sequencing

SNP markers were developed from DNA sequences of expressed genes representing both randomly distributed genes and candidate genes for adaptation. Gene sequences from the White Pine Resequencing Project (WHISP, http://dendrome.ucdavis.edu/wpgp) were used as a set of randomly distributed genes across the genome. Out of 800 primer pairs developed for loblolly pine (Pinus taeda L.), 192 expressed sequence tag (EST) contigs successfully amplified and produced high quality sequence reads in both P. monticola and P. strobus (Eckert et al. 2013a). No selection criteria, other than PCR and sequencing success, were applied. These genes cover 11 of the 12 linkage groups in P. taeda and, given high levels of synteny observed among conifers (Pavy et al. 2012a), a similar genome-wide distribution is expected for both studied species. For each species, 15-20 DNA sequences obtained from haploid megagametophytes (Eckert et al. 2013a) were used for subsequent SNP detection. Candidate genes for adaptation were also resequenced in both P. strobus and P. monticola. We first identified candidate genes for growth-related traits and phenology (e.g., timing of bud burst and bud set) by searching in the white spruce (Picea glauca [Moench] Voss) gene catalog (GCAT, Rigault et al. 2011). Priority genes were previously identified from expression profiling experiments (Holliday et al. 2008; El Kayal et al. 2011), genome scans (Namroud et al. 2008; Prunier et al. 2011), and QTL mapping studies (Pelgas et al. 2011). In order to identify putative orthologs in white pines, we used available sequences from sugar pine (Pinus lambertiana Douglas; Jermstad et al. 2010), a close relative of P. strobus and P. monticola (Gernandt et al. 2005; Syring et al. 2007b). The 750 P. lambertiana sequences were blasted against the Picea glauca gene catalog (27,720 cDNA clusters) using a translated nucleotide query (“tblastx”) and the highest hits were retained for PCR amplification.
In addition, we searched the literature in other conifers: Pinus taeda L. (Eckert et al. 2010b); P. pinaster Aiton (Eveno et al. 2008; Grivet et al. 2011); P. halepensis Mill. (Grivet et al. 2011); Pseudotsuga menziesii Mirb. (Eckert et al. 2009); and Picea sitchensis (Bong.) Carr. (Holliday et al. 2010), and identified 9 candidate genes related to adaptation (abiotic stress) that were detected in a minimum of 2 studies. A total of 95 high priority genes (86 from Pinus lambertiana primers and 9 from the literature) were retained, and their corresponding primer pairs were tested for PCR amplification (Table S4). For PCR, about 15 ng of P. strobus megagametophyte DNA was added to a 30 μL reaction containing 1X PCR buffer, 1.66 mM MgCl2, 0.133 mM of each dNTP, 0.133 μM of each primer, and 1 U of Platinum Taq polymerase (Invitrogen, Burlington, Ontario). DNA regions were amplified using a PTC200 Thermal Cycler (MJ Research, Waltham, MA) according to the following protocol: an initial 3 min at 94 °C, 35 cycles of 1 min at 94 °C, 45 seconds at the annealing temperature and 3 min at 70 °C, followed by a final 10 min at 72 °C. Annealing temperatures for each primer pair are given in Supplementary Table S3. Successful PCR products from 67 primer pairs were sequenced at the McGill University and Génome Québec Innovation Centre on ABI 3730XL DNA Analyzer systems (Applied Biosystems, Carlsbad, California) using their internal protocols. The resulting haploid DNA sequences were visually inspected for quality using CLC Genomics Workbench version 5.5 (CLC bio, Cambridge, Massachusetts), and the presence of double peaks in haploid sequences was checked to ensure that no paralog was amplified. For further SNP detection, we selected 47 primer pairs that yielded high quality reads and resequenced 15-16 individuals (diploid) from each species along with a megagametophyte from either P. monticola or P.
strobus using the same DNA sequencing protocol (McGill University and Génome Québec Innovation Centre).

2.2.3 SNP detection and genotyping

SNP markers for P. monticola and P. strobus were developed from 239 orthologous genes (192 randomly distributed genes and 47 candidate genes for growth and adaptation). For every gene, DNA sequences were aligned using CLC Genomics Workbench version 5.5 (CLC bio, Cambridge, Massachusetts). High quality SNPs were selected by visual inspection of chromatograms. SNPs or DNA sequences showing poor quality reads or double peaks in haploid sequences, indicating paralogy, were not retained. In addition, we included 36 P. strobus SNPs and 3 P. monticola SNPs previously developed from 31 candidate genes for wood formation in Picea glauca and from 3 gene sequences available on GenBank (Beaulieu, unpublished data; S. Nadeau, unpublished data; Syring et al. 2005, 2007a, b). For genotyping, 204 P. monticola SNPs from 151 genes and 187 P. strobus SNPs from 144 genes were tested. All 362 P. monticola and 843 P. strobus samples were genotyped using a Sequenom iPlex Gold SNP array (San Diego, California) at the McGill University and Génome Québec Innovation Centre. A total of 9 Sequenom iPlex Gold SNP arrays were built (35-40 SNPs/array). After initial testing of a first set of Sequenom arrays, additional SNPs, when available, were selected to replace failed SNPs and increase the chance that at least one SNP per gene was successful. Individuals or SNPs with call rates below 80% were discarded from subsequent analyses. Loci for which the rare allele was present in only one copy in the entire sample were considered monomorphic and were discarded. Cluster plots were visually inspected, and SNPs displaying unclear separation among genotype classes were also discarded.
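The automated part of the quality filters described above (80% call rate for SNPs and individuals, and removal of loci whose rare allele occurs in a single copy) can be sketched as follows. This is a hypothetical illustration, not the pipeline used for the thesis: the genotype coding (0, 1 or 2 copies of one allele, `None` for a failed call), the function names, and the order of the filters (SNPs first, then individuals, then singletons) are all assumptions. The visual inspection of cluster plots has no counterpart here.

```python
from typing import List, Optional, Sequence, Tuple

# Number of copies of the alternate allele (0, 1, 2); None means no call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in calls) / len(calls)


def filter_genotypes(matrix: List[List[Genotype]],
                     min_call_rate: float = 0.80) -> Tuple[List[int], List[int]]:
    """matrix[i][j] is the genotype of individual i at SNP j.

    Returns the indices of retained individuals and retained SNPs.
    """
    n_ind, n_snp = len(matrix), len(matrix[0])

    # 1) Drop SNPs with a call rate below the threshold.
    snps = [j for j in range(n_snp)
            if call_rate([matrix[i][j] for i in range(n_ind)]) >= min_call_rate]

    # 2) Drop individuals with a call rate below the threshold
    #    (computed over the SNPs that survived step 1).
    inds = [i for i in range(n_ind)
            if call_rate([matrix[i][j] for j in snps]) >= min_call_rate]

    # 3) Treat loci whose rare allele is seen at most once as monomorphic.
    polymorphic = []
    for j in snps:
        called = [matrix[i][j] for i in inds if matrix[i][j] is not None]
        alt_copies = sum(called)
        minor_copies = min(alt_copies, 2 * len(called) - alt_copies)
        if minor_copies > 1:
            polymorphic.append(j)
    return inds, polymorphic
```

Returning index lists rather than a trimmed matrix keeps the link to sample and SNP identifiers, which is convenient when the retained sets have to be reported, as in Table 2.1.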
Successfully genotyped SNPs can be classified into three categories: 1) “orthologous SNPs” between species, occurring within the same gene and at the same nucleotide position; 2) “SNPs of orthologous genes”, occurring within the same gene but having different SNP position; and 3) “Single-species SNPs”, with no detected or successful SNPs in the corresponding ortholog of the other species.

Linkage disequilibrium between successfully genotyped SNPs was measured by the correlation coefficient r using the package Genetics in R v. 2.15 (R Development Core Team 2013).

2.2.4 Genetic diversity

Analyses of genetic diversity were conducted separately within each species. Using Arlequin v. 3.5 (Excoffier et al. 2010), a suite of genetic diversity statistics was calculated for each SNP: minor allele frequency (MAF), observed heterozygosity (HO), and expected heterozygosity (HE). Wright’s F-statistics (FIS, FST, FIT) were calculated for each SNP using the method of Weir & Cockerham (1984) as implemented in Genepop v. 4.2 (Raymond & Rousset 1995; Rousset 2008). Range-wide diversity estimates and F-statistics were averaged across SNPs, and were computed for two different sets of SNPs: 1) all SNPs and 2) the 34 orthologous SNPs. Non-parametric bootstrapping across SNPs (n = 1000 replicates) was used to estimate 95% confidence intervals (CIs) for MAF, HO, HE, FIS, FST, and FIT for each dataset. Exact tests of Hardy-Weinberg equilibrium (HWE) were also performed on each locus with Arlequin v. 3.5 (Excoffier et al. 2010), using 100,000 steps in the Markov chain, with the first 10,000 steps discarded as burn-in. To account for multiple testing, a false discovery rate (FDR) was applied using the R package “qvalue” (Storey 2002). The smoother method (Storey & Tibshirani 2003) was used with the default parameters. To test for geographic patterns of genetic diversity, population HO was regressed on latitude and longitude using linear models.
Both latitude and longitude were included simultaneously in the models as covariates. We also controlled for unequal sample size by including the number of genotyped individuals as a covariate. Model simplification was performed by backward selection using a cut-off α of 5% in R (R Development Core Team 2013). To further visualise those geographic trends, population HO was interpolated across the range of both species using a “kriging” procedure in the “Spatial Analyst” toolset in ArcGIS and displayed in ArcMAP (ArcGIS v. 10.0, ESRI).

2.2.5 Population structure

We used the Bayesian program STRUCTURE v. 2.3.4 (Pritchard et al. 2000) to investigate population structure across the range of P. strobus and P. monticola. STRUCTURE iteratively assigns individuals into a number of populations termed “K” based upon maximising the likelihood of Hardy-Weinberg and linkage equilibrium within populations.

We use the term “genetic group” to refer to the K groups of individuals sharing ancestry as inferred by STRUCTURE. Both species were analysed separately. We assumed that individuals followed the “admixture model” under STRUCTURE and that allele frequencies are correlated among groups (Falush et al. 2003). For the correlated allele frequencies model, the parameter λ was estimated at K = 1, as recommended in the user manual for SNP data including rare alleles (P. monticola: λ = 0.67; P. strobus: λ = 0.79). We varied K from 1 to 10 with 15 replicates for each K. For each run, a burn-in of 100,000 Markov Chain Monte Carlo (MCMC) steps was followed by 200,000 MCMC steps to ensure convergence of estimated parameters. The most likely number of genetic groups (K) for each species was assessed by: 1) looking for a plateau in the plot of the log probability of the data versus K; and 2) using the delta K method described in Evanno et al. (2005). Log probability of the data and delta K were plotted against K using STRUCTURE HARVESTER (Earl & VonHoldt 2012).
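As an illustration of the second criterion, the delta K statistic of Evanno et al. (2005) can be computed from the replicate log-probability values along these lines. This is a minimal sketch and not the STRUCTURE HARVESTER code: the function name and the input layout (a dictionary from K to the list of ln P(D) values of its replicates) are assumptions, and the second-order difference is taken on the replicate means.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(ln_prob: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for each K that has both K - 1 and K + 1 available.

    Delta K = |L(K + 1) - 2 L(K) + L(K - 1)| / sd[L(K)], where L(K) is the
    mean ln P(D) over replicates and sd is taken across replicates at K.
    The statistic is undefined for the smallest and largest K tested.
    """
    ks = sorted(ln_prob)
    result = {}
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        if k - prev != 1 or nxt - k != 1:
            continue  # needs consecutive values of K
        second_diff = abs(mean(ln_prob[nxt]) - 2 * mean(ln_prob[k])
                          + mean(ln_prob[prev]))
        sd = stdev(ln_prob[k])
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result
```

The K with the largest delta K is the candidate for the uppermost level of structure. Because the statistic cannot be evaluated at K = 1, it should be read together with the plot of ln P(D) against K, as done here with the first criterion.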
Analyses were run with and without the “LOCPRIOR” option. In the “LOCPRIOR” model, individuals from the same sampling location are more likely to share ancestry (Hubisz et al. 2009). This model can improve accuracy when genetic structure is weak, as is expected for P. strobus and P. monticola, but is not biased towards detecting structure when none is present (Hubisz et al. 2009).

In each species, populations were classified into the K = 2 main genetic groups inferred by STRUCTURE, according to their largest proportion of ancestry (i.e., a population was classified into the first genetic group if its proportion of ancestry from the first group was greater than 50%). Because STRUCTURE often only detects the highest level of structuring (Evanno et al. 2005), we ran STRUCTURE within each group to detect hidden sub-structure using the same parameters as described previously. Populations within each group were assigned to a sub-group according to their largest proportion of ancestry as described above. For each species, Analysis of Molecular Variance (AMOVA) was conducted (K = 2 groups) using Arlequin v. 3.5 (Excoffier et al. 2010), to partition genetic variance into 3 components: 1) within populations; 2) among populations within groups; and 3) among groups. AMOVA was also performed using subsets of populations within each group inferred by STRUCTURE. Mantel tests were performed to test for isolation by distance (IBD). We estimated the correlation between a matrix of pairwise Slatkin linear FST (FST/(1-FST)) and the matrix of the natural logarithm of geographic distances, as suggested by Rousset (1997). Geographic distances between populations were obtained from the Geographic Distance Matrix Generator online tool (http://biodiversityinformatics.amnh.org/open_source/gdmg). The significance of Mantel’s statistic was tested with 1,000 random permutations using Arlequin v. 3.5 (Excoffier et al. 2010).
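The isolation-by-distance test described above can be sketched in a few lines. This is an illustrative version under stated assumptions, not Arlequin's implementation: function names are hypothetical, the p-value is one-sided and counts the observed value among the permutations, and populations are permuted jointly in the rows and columns of the genetic matrix.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(mat: Matrix, order: Sequence[int]) -> List[float]:
    """Off-diagonal upper triangle after reordering populations."""
    n = len(order)
    return [mat[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_ibd(fst: Matrix, dist: Matrix, n_perm: int = 1000,
               seed: int = 1) -> Tuple[float, float]:
    """Mantel test of FST / (1 - FST) against ln(geographic distance).

    fst and dist are symmetric population-by-population matrices; all
    off-diagonal distances must be positive. Returns (r, one-sided p).
    """
    n = len(fst)
    gen = [[fst[i][j] / (1.0 - fst[i][j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    geo = [[math.log(dist[i][j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]

    identity = list(range(n))
    y = upper_triangle(geo, identity)
    r_obs = pearson(upper_triangle(gen, identity), y)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # relabel populations in the genetic matrix only
        if pearson(upper_triangle(gen, perm), y) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting whole populations rather than individual matrix cells is what distinguishes a Mantel test from an ordinary correlation test: the pairwise values in a distance matrix are not independent, so only relabelling of populations gives a valid null distribution.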
We also searched for alleles occurring in one genetic group or population (i.e., “private alleles”) using our own Python script (Appendix 3).

2.3 Results

2.3.1 SNP genotyping

To investigate patterns of genetic diversity and population structure, we genotyped 204 P. monticola SNPs and 187 P. strobus SNPs on range-wide samples of both species. Genotyping success was similar for both species (Table 2.1). Forty-six P. monticola and 34 P. strobus SNPs were discarded from further analysis because they were either monomorphic, failed PCR amplification, had call rates lower than 80%, or displayed unclear separation of clusters among genotypes. We obtained 158 and 153 successful SNPs from 127 P. monticola gene regions and 120 P. strobus gene regions, respectively (Table 2.1, Table S5). Among successful SNPs, the call rate was high (P. monticola: 97.2 ± 0.3%; P. strobus: 98.7 ± 0.2%) and those genotype calls were accurate as indicated by the very low error rate inferred from replicated individuals (mistyped genotypes: P. monticola: 0.07% ± 0.07%; P. strobus: 0.15% ± 0.10%). After discarding individuals with call rates lower than 80% (14 P. monticola; 12 P. strobus), our final dataset was composed of 348 P. monticola trees from 61 populations (number of trees per population: 5.7 ± 0.2) and 831 P. strobus trees from 133 populations (number of trees per population: 6.2 ± 0.4), typed for the 158 and 153 successful SNPs, respectively.

Table 2.1. Genotyping results from 204 Pinus monticola SNPs (380 trees) and 187 P. strobus SNPs (843 trees) using Sequenom iPlex Gold technology.

               P. monticola   P. strobus
Failed SNPs    35 (17%)       23 (12%)
Monomorphic    11 (5%)        11 (6%)
Polymorphic    158 (77%)      153 (82%)
Total          204            187

Similar allele frequency distribution across SNPs was found for both species and all frequency classes (0.001 to 0.495) were well represented (Figure S3). A majority of the SNPs, 68% and 74% in P. monticola and P.
strobus, respectively, were common and had a minor allele frequency between 0.05 and 0.45. Genotypes departed from Hardy-Weinberg equilibrium (HWE) at 19 P. monticola and 34 P. strobus SNPs, using a false discovery rate (FDR) of 5% (Table S6, Table S7). For each species, there were one to three successful SNPs per gene (number of SNPs/gene: P. monticola: 1.24 ± 0.04; P. strobus: 1.28 ± 0.05). Nine and 11 pairs of SNPs in P. monticola and P. strobus, respectively, were in strong linkage disequilibrium (r > 0.80). These pairs of SNPs were located within the same genes, with the exception of one pair in each species. Because these tightly linked SNPs represented a negligible fraction of all possible pairwise SNP comparisons (0.07% in P. monticola and 0.09% in P. strobus), they should have negligible effects on the results of population genetic analyses and, therefore, all SNPs were kept for further analysis. A total of 79 orthologous genes contained SNPs in both species. These included 34 orthologous SNPs between species, and 72 and 68 SNPs unique to P. monticola and P. strobus, respectively (SNPs of orthologous genes). Fifty-two SNPs from 48 genes and 51 SNPs from 41 genes were single-species SNPs in P. monticola and P. strobus, respectively (Table 2.2).

Table 2.2. Classification of successful SNPs in orthologous and non-orthologous genes between Pinus monticola and P. strobus.

Species                           P. monticola   P. strobus
I: Orthologous SNPs               34             34
II: SNPs of orthologous genes     72             68
III: Single-species SNPs a        52             51
Total                             158            153

a: No successful SNP or no variation was detected in the corresponding ortholog of the other species.

2.3.2 Genetic diversity

Range-wide genetic diversity, as measured by minor allele frequency (MAF), observed (HO), and expected (HE) levels of heterozygosity (Table 2.3) was high in both species. Within-species diversity estimates using all 158 and 153 P. monticola and P.
strobus SNPs, respectively, were comparable to those obtained using only the 34 orthologous SNPs, leading to similar Wright’s F statistics estimates. Overall, 8.1% and 4.6% of the observed genetic variation was due to among-population differentiation (FST) in P. monticola and P. strobus, respectively (all SNPs). Some significant differences between species could be noted when using all SNPs: MAF, HO, and HE were marginally lower (p < 0.1), FIS significantly lower (p < 0.01), and FST significantly higher (p < 0.001) in P. monticola than in P. strobus. Differences between species were not significant when using only the 34 orthologous SNPs: only HE was marginally lower in P. monticola than in P. strobus (p < 0.1). Subsequent analyses within each species were conducted using all 158 and 153 P. monticola and P. strobus SNPs, respectively.

Table 2.3. Observed heterozygosity (HO), expected heterozygosity (HE) and Wright's fixation indices (Weir & Cockerham 1984) for Pinus monticola and P. strobus.

       P. monticola a                      P. strobus a                        P. monticola vs. P. strobus b
       All SNPs c       Orthologous SNPs d All SNPs c       Orthologous SNPs d All SNPs c   Orthologous SNPs d
MAF    0.196 (0.012)    0.161 (0.0234)     0.225 (0.012)    0.216 (0.024)      ▪            ns
HO     0.259 (0.013)    0.224 (0.028)      0.291 (0.013)    0.288 (0.027)      ▪            ns
HE     0.269 (0.013)    0.233 (0.029)      0.304 (0.013)    0.300 (0.025)      ▪            ▪
FIS    -0.033 (0.009)   -0.044 (0.023)     0.000 (0.007)    -0.008 (0.016)     **           ns
FST    0.081 (0.006)    0.069 (0.013)      0.046 (0.003)    0.052 (0.004)      ***          ns
FIT    0.053 (0.008)    0.035 (0.015)      0.046 (0.007)    0.045 (0.016)      ns           ns

a: Standard error in parentheses. Italic numbers have 95% confidence intervals excluding zero (non-parametric bootstrapping across SNPs with n = 1000 replicates).
b: t-test; ▪ = p < 0.1; * = p < 0.05; ** = p < 0.01; *** = p < 0.001.
c: using all 158 and 153 SNPs for P. monticola and P. strobus, respectively.
d: using the 34 orthologous SNPs among species.

In P.
monticola, latitude and longitude together explained 39% of the variation in HO among populations. There was a significant decrease in HO towards northern populations (slope: -0.0076; p < 0.001) and a significant increase in HO towards eastern populations (slope: 0.0024, p = 0.039) (Figure S4). In P. strobus, no significant relationship was found between HO and latitude (p = 0.207) or longitude (p = 0.841, Figure S4). The number of genotyped individuals per population did not have a significant effect on HO and was removed from all linear models.

Kriging interpolations of HO across both species ranges confirmed those trends. In P. monticola, HO decreased from southern to northern populations, with a steep cline in the southern Cascades between 42°N and 45°N (Figure 2.1a). Populations from the Klamath-Siskiyou Mountains and Southern Cascades (northern California and Southern Oregon) harbored the highest genetic diversity as indicated by higher values of HO. Populations at the northwestern range margins (coast of Washington and Vancouver Island) showed the lowest HO. In the southern interior portion of the range (Idaho and Montana), populations generally displayed moderately high HO. Interior British Columbia populations were characterised by moderately low HO.

Figure 2.1. Kriging interpolation of HO across a) Pinus monticola and b) P. strobus range using the “Spatial Analyst” toolset in ArcGIS. Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

In P. strobus, high genetic diversity was found across the range with little variation in HO in most of the northern portion (Figure 2.1b). HO was found to be high in a broad latitudinal band extending from northwest Ontario, Minnesota, and Wisconsin eastward to southern Ontario and Québec, the New England states, New York, Pennsylvania, and the Maritime provinces.
HO was slightly lower in the southern Appalachian Mountains (Tennessee, Kentucky, North Carolina, Virginia, West Virginia, and southern Pennsylvania). Note that interpolated HO in northwestern Ontario, Minnesota and Wisconsin may not be as precise due to the high number of populations with small sample sizes in this region (1-3 genotypes per population).

2.3.3 Population structure

We used STRUCTURE to infer the genetic population structure of P. monticola and P. strobus. Results are consistent with and without the “LOCPRIOR” option, and, therefore, we only present results from the latter. In P. monticola, the delta K plot suggested a clear separation of populations among K = 2 groups (Figure S5 a,c). A steep genetic cline around the Oregon/California border separated a northern group and a southern group (Figure 2.2a). The southern group comprised the provenances from the Klamath-Siskiyou Mountains and one isolated population in the interior at the Oregon/California border (Lakeview, Oregon, 42.10°N; 120.28°W). The large northern group comprised provenances from the Cascades, coastal Washington, coastal British Columbia and from the interior portion of the range: northeastern Washington, northern Idaho, northwestern Montana, and interior British Columbia. When the analysis was run within the northern group, STRUCTURE detected K = 2 sub-groups (Figure S6 a,c). Provenances from interior British Columbia formed a separate group (blue ancestry), whereas the rest of the northern group was composed of mixed ancestry (green and blue ancestry, Figure 2.2b). Ancestry from the “interior British Columbia” group is also more prevalent in northern coastal populations of British Columbia and northern Washington. In the southern group, K = 4 sub-groups were detected by STRUCTURE (Figure 2.2b, Figure S7 a,c). Three sub-groups corresponded to each of the three populations with at least six sampled individuals (Figure 2.2b, Figure S7 a,c).
A fourth sub-group included the three easternmost and southernmost populations (Lakeview, OR; Willow Mountain, CA; Duck Lake, CA). AMOVA analysis performed using the K = 2 main northern and southern groups revealed that genetic variation found among groups (FCT = 0.098) was about twice as large as genetic variation among populations within groups (FSC = 0.046, Table 2.4). Genetic differentiation among sub-groups (FCT) was much larger within the southern group (FCT = 0.089) than within the northern group (FCT = 0.012).

Figure 2.2. STRUCTURE results for Pinus monticola using a) all samples (“range-wide”) and b) within the “northern” and “southern” groups detected in a) (analyses performed separately). Shaded grey area represents the range of P. monticola (redrawn from Critchfield & Little 1966). Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

Table 2.4. Analysis of molecular variance (AMOVA) for Pinus monticola. Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group").

Source of variation                      Degrees of freedom   Sum of squares   Variance components   Percent of variation   Fixation indices
Range-wide a
  Among groups                           1                    233              2.14                  9.83                   FCT = 0.098
  Among populations within groups        59                   1709             0.90                  4.15                   FSC = 0.046
  Within populations                     635                  11866            18.69                 86.02
  Total                                  695                  13809            21.72
Northern group
  Among sub-groups                       1                    96               0.24                  1.22                   FCT = 0.012
  Among populations within sub-groups    52                   1383             0.69                  3.54                   FSC = 0.036
  Within populations                     590                  10876            18.43                 95.24
  Total                                  643                  12356            19.36
Southern group
  Among sub-groups                       3                    160              2.19                  8.93                   FCT = 0.089
  Among populations within sub-groups    3                    70               0.31                  1.25                   FSC = 0.014
  Within populations                     45                   990              22.00                 89.82
  Total                                  51                   1220             24.49

a: all sampled populations included (northern and southern group).

In P.
strobus, STRUCTURE detected K = 2 main genetic groups, separating the southern Appalachian Mountains (southern West Virginia, North Carolina, Tennessee and Georgia) from the northern part of the range (Figure 2.3a, Figure S5 b,d). As with P. monticola, both southern and northern groups can be further subdivided (Figure S6 b,d, Figure S7 b,d). Within the northern group, K = 3 weakly differentiated sub-groups were detected (FCT = 0.009, Table 2.5, Figure S6 b,d). All populations were highly admixed but a longitudinal trend could be noted: ancestry from the first sub-group was greater in the central part of the range (central Appalachian Mountains, eastern Ontario, southern Québec); ancestry from the second sub-group was greater in the eastern Maritime provinces but also well spread into southern Québec and eastern Ontario; and ancestry from the third sub-group was greater in the western part of the range in the Great Lakes region (Figure 2.3b). Analysis within the southern group revealed K = 4 sub-groups (FCT = 0.063, Table 2.5, Figure S7 b,d), with populations on the eastern slopes of the Appalachians being predominantly classified into two sub-groups and the two populations sampled on the western slopes being classified into the other two sub-groups (Figure 2.3b). AMOVA analysis revealed that the discontinuity between the major southern and the northern group (FCT = 0.022) was weaker than in P. monticola, and that differentiation between these two groups was lower than differentiation among populations within groups (FSC = 0.040).

Figure 2.3. STRUCTURE results for Pinus strobus using a) all samples (“range-wide”) and b) within the “northern” and “southern” groups detected in a) (analyses performed separately). Shaded grey area represents the range of P. strobus (redrawn from Critchfield & Little 1966). Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

Table 2.5. Analysis of molecular variance (AMOVA) for Pinus strobus.
Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group").

Source of variation                      Degrees of freedom   Sum of squares   Variance components   Percent of variation   Fixation indices
Range-wide
  Among groups                           1                    228              0.5                   2.18                   FCT = 0.022
  Among populations within groups        131                  4284             0.90                  3.92                   FSC = 0.040
  Within populations                     1529                 32993            21.56                 93.90
  Total                                  1661                 37505            22.98
Northern group
  Among sub-groups                       2                    241              0.19                  0.86                   FCT = 0.009
  Among populations within sub-groups    122                  3639             0.70                  3.11                   FSC = 0.031
  Within populations                     1329                 28879            21.73                 96.03
  Total                                  1453                 32759            22.63
Southern group
  Among sub-groups                       3                    288              1.41                  6.32                   FCT = 0.063
  Among populations within sub-groups    4                    116              0.39                  1.74                   FSC = 0.019
  Within populations                     200                  4115             20.57                 91.94
  Total                                  207                  4519             22.38

We found significant isolation by distance (IBD) in both species (Table 2.6). For P. monticola, moderate IBD was found among populations at both the range-wide level and within the northern group. In P. strobus, significant but weak IBD was detected at both the range-wide level and within the northern group. No IBD was detected among the few populations sampled in the southern groups of both species.

Table 2.6. Mantel test results for Pinus monticola and P. strobus.

                    Regression coefficient (bY)   Correlation coefficient (rY)   Determination of Y by X   p-value
P. monticola
  Range-wide        0.041                         0.405                          0.164                     0.000
  Northern group    0.013                         0.237                          0.056                     0.000
  Southern group    -0.003                        -0.035                         0.001                     0.514
P. strobus
  Range-wide        0.013                         0.156                          0.024                     0.000
  Northern group    0.011                         0.125                          0.016                     0.000
  Southern group    0.003                         0.064                          0.004                     0.450

For P. monticola, 8 private alleles were found in the northern group (i.e., the alleles were not detected in the southern group), and one was found in the southern group. In P.
strobus, 7 alleles were private to the northern group, and none were found only in the southern group. Private alleles were segregating at low frequency (from 0.002 to 0.133 in P. monticola and from 0.004 to 0.084 in P. strobus), and may have been undetected in the southern groups because of the small number of populations sampled.

2.4 Discussion

In this chapter, we developed a set of SNP markers for Pinus monticola and P. strobus, and evaluated patterns of genetic diversity and population structure across the ranges of both species. This dataset allowed us to look for evidence of the effects of postglacial recolonization on genetic diversity and to gain knowledge of population demographic history for future adaptation studies (Chapter 3).

2.4.1 Genetic diversity in Pinus monticola and P. strobus

Consistent with expectations for long-lived perennial, outcrossing, wind-pollinated tree species (Hamrick & Godt 1996), we found that populations of both species were highly genetically diverse and that most of the genetic variation resided within populations. In agreement with our results, moderate to high levels of genetic diversity were found in other studies of P. monticola and P. strobus using allozymes, AFLP, RAPD, ISSR and microsatellite markers (Steinhoff et al. 1983; Beaulieu & Simon 1994; Rajora et al. 1998; Isabel et al. 1999; Mehes et al. 2007, 2009; Kim et al. 2010; Mandák et al. 2013).

Genetic diversity estimates in our study species were comparable to those found in other widely distributed North American conifers using SNP markers (Pinus taeda: HE = 0.25, Eckert et al. 2010b; Picea glauca: HE = 0.27, Namroud et al. 2008; and P. mariana: HE = 0.25, Prunier et al. 2012). Conifer species with narrower or fragmented ranges had either similar genetic diversity (Pinus radiata: HE = 0.29, Dillon et al. 2013; Abies alba: HE = 0.26, Mosca et al. 2012) or lower diversity than our two study species (Pinus cembra: HE = 0.24; P.
mugo: HE = 0.21; Mosca et al. 2012).

Higher among-population differentiation was found in P. monticola than in P. strobus (P. monticola: FST = 0.081; P. strobus: FST = 0.046). This could be mainly attributed to a steep genetic cline in the southern part of the P. monticola range. Among-population differentiation within genetic groups was similar between the two species (P. monticola: FSC = 0.046; P. strobus: FSC = 0.040). These estimates are also similar to those found in other pine species using SNP markers (P. taeda: FST = 0.043, Eckert et al. 2010a; P. radiata: FST = 0.043, Dillon et al. 2013; P. mugo: FST = 0.025, Mosca et al. 2012; P. cembra: FST = 0.021, Mosca et al. 2012). Pinus pollen can travel tens of kilometers, and long-distance dispersal of up to 600-1,000 km has been reported (Williams 2010; Robledo-Arnuncio 2011; Kremer et al. 2012). Therefore, high levels of gene flow have likely allowed the maintenance of high genetic variation in our study species.

Spatial patterns of genetic diversity varied considerably across the range of P. monticola. Regression and kriging models showed a steep decrease in observed heterozygosity (HO) towards northern latitudes. Rapid postglacial range expansion may explain the loss of diversity in northern populations (Hewitt 1996, 2004). Computer simulations showed that when new populations are established through long-distance dispersal of a few individuals, they are usually genetically differentiated and exhibit lower genetic diversity than the refugial populations from which they originated (Bialozyt et al. 2006).

In P. strobus, HO was high and varied less across the range than in P. monticola. Many peripheral populations of P. strobus had levels of genetic diversity similar to those of core populations (see also Beaulieu & Simon 1994; Rajora et al. 1998). Such uniform levels of diversity found in the large northern group of P.
strobus could result from a more gradual expansion of a migration front and significant gene flow from core populations (Ibrahim et al. 1996). Hence, despite drastic population decline and fragmentation due to overharvesting and the introduction of the white pine blister rust, no reduction in diversity across the range of P. strobus was observed (see also Rajora et al. 1998; Mehes et al. 2009). It should be noted that only a few generations have passed since the decline in population sizes of the 19th and 20th centuries, and, therefore, substantial effective population sizes may have been retained because of extensive gene flow among populations and across generations in this long-lived conifer species (England et al. 2003; Johansson et al. 2006).

2.4.2 Ascertainment bias

Another particularity of our approach is that we developed SNPs in parallel in P. monticola and P. strobus. Often, SNP primers developed in one species are tested and transferred directly to another species. This creates a bias, as trans-species shared polymorphisms are more likely to be neutral or nearly neutral (purifying selection) (Bouillé & Bousquet 2005). We partly overcame this problem because SNP discovery panels included gene sequences from about equal numbers (15-20) of individuals from both species. However, bias in our data can still arise because the selected genes are conserved between P. taeda and 11 species of Pinus subgenus Strobus. These genes have reduced nucleotide diversity and a significantly lower ratio of non-synonymous to synonymous diversity (Eckert et al. 2013a). Similarly, the candidate genes for wood and growth were conserved between Picea glauca and our study species and may also suffer from reduced diversity at non-synonymous sites. Most of the current genomic resources and candidate genes for non-model species are developed in a similar manner. Future work involving next-generation sequencing technologies would potentially eliminate this bias (Parchman et al. 2012; Eckert et al. 2013a).
2.4.3 Population structure and postglacial history in Pinus monticola and P. strobus

Both of our white pine species exhibited similarly structured genetic diversity, as populations of both species were mainly divided into a northern and a southern genetic group. As suggested for many other North American and European species, the survival of P. monticola and P. strobus populations in glacial refugia during the Pleistocene glaciations can be invoked to explain the observed north–south population structure (Soltis et al. 1997; Brunsfeld et al. 2001; Jaramillo-Correa et al. 2009). One possible explanation is that Pleistocene glaciations confined populations to “northern” and “southern” refugia, where they experienced bottlenecks and genetic differentiation (Figure 2.4). Holocene warming would have allowed populations from the two refugia to expand and come into secondary contact, creating the observed genetic discontinuities (Steinhoff et al. 1983; Soltis et al. 1997).

Figure 2.4. Hypothesized locations of glacial refugia (circles) in a) Pinus monticola and b) P. strobus. Map produced using ArcMAP (ArcGIS v. 10.0, ESRI).

In P. monticola, our results support this “north-south colonization hypothesis” (Soltis et al. 1997). We found distinct northern and southern genetic groups with a narrow contact zone near the Oregon/California border, consistent with previous findings using allozyme and AFLP markers (Steinhoff et al. 1983; Kim et al. 2010). To the north of this region, a narrow transition zone between the two genetic groups across southern Oregon (between 42° and 45° latitude) was observed, and high between-group differentiation was detected (FCT = 0.098). A similar contact zone located near central or southern Oregon between northern and southern genotypes was identified in several plant and tree species distributed in the Pacific Northwest (Soltis et al. 1997; Jaramillo-Correa et al. 2009; Shafer et al. 2010; Keir et al. 2011).
In addition, a comparative study including several tree, bird, and mammal species, confirmed the Klamath-Siskiyou Mountains as a hot spot of contact zones and phylogeographic breaks within species (Swenson & Howard 2005). Such discontinuities likely result from recolonization from separate northern and southern refugia (Soltis et al. 1997). However, there is little agreement as to the possible locations of these glacial refugia (Brunsfeld et al. 2001).

The observed patterns of genetic diversity suggest that P. monticola populations occurred in a “northern” refugium located in southern or central Oregon (Figure 2.5). We observed a strong decrease in genetic diversity across Oregon and towards northern populations, likely due to recolonization from a limited genetic base and subsequent bottlenecks of the leading edge populations (Steinhoff et al. 1983; Hewitt 1996, 2004). The pollen records also suggest that P. monticola may have survived the Pleistocene glaciations in the coast range of Oregon and that northward colonization from this region occurred during the Holocene (MacDonald et al. 1998).

Our results also suggest the presence of a southern refugium. Consistent with previous studies, populations of the Klamath-Siskiyou Mountains of southern Oregon and northern California were part of the southern group (Steinhoff et al. 1983; Kim et al. 2010). Previous studies found that populations from the Sierra Nevada were also part of the southern group, but formed a distinct sub-group that was differentiated from the Klamath-Siskiyou Mountains populations (Steinhoff et al. 1983; Kim et al. 2010). Hence, these results suggest the presence of a southern refugium, possibly located in the Klamath-Siskiyou Mountains (Figure 2.4, Whittaker 1961; Smith & Sawyer 1988), where we found high genetic diversity, or in the Sierra Nevada. However, it is difficult to infer the putative location of southern refugia since we did not sample populations from the Sierra Nevada.
A similar pattern of differentiation was found in P. strobus, which may also be indicative of north-south recolonization from dual refugia. A genetic discontinuity divided populations into a large northern group comprising most of the range and a smaller southern group, restricted to southern West Virginia, North Carolina, Tennessee, and Georgia. Previous molecular studies in P. strobus only sampled populations from the northern group and, consequently, failed to detect geographical patterns of population structure (Beaulieu & Simon 1994; Rajora et al. 1998; Isabel et al. 1999; Mehes et al. 2009; Mandák et al. 2013). The discontinuity zone we detected does not seem to correspond to a separation between eastern and western slopes of the Appalachian Mountains or to any other strong geophysical barrier. Hence, the north-south population structure may be best explained by survival of P. strobus populations in dual glacial refugia, as with many other northeastern American species (Jaramillo-Correa et al. 2009).

Macrofossil and pollen evidence of P. strobus at the last glacial maximum (LGM, ~20,000 BP) is scarce but supports this hypothesis. A northern refugium may have existed on the mid-Atlantic coast (Shenandoah Valley, Virginia), where one of the earliest P. strobus macrofossils (13,000 years before present, BP) was found (Davis 1983). This region also coincides with the origin of the initial northward and westward expansion inferred from fossil pollen records (Jacobson & Dieffenbacher-Krall 1995; Jackson et al. 1997). Furthermore, we found high genetic diversity all along the mid-Atlantic coast, reinforcing evidence of glacial refugia in this region. Refuge populations of P. strobus may also have existed in proposed southern refugia on the Gulf of Mexico or the southern Atlantic coast (Figure 2.4, Davis 1983; Jackson et al. 1997; Jaramillo-Correa et al. 2009). LGM assemblages in northwestern Georgia showed the presence of sporadic P.
strobus macrofossils and trace amounts of haploxylon-type pollen, indicating that P. strobus likely occurred in a southern refugium in this region as a minor component of P. banksiana Lamb. and P. resinosa Sol. ex Aiton forests (Jackson et al. 2000). Considering the rarity of late Pleistocene P. strobus macrofossils (Jackson et al. 2000), this is considerable evidence that P. strobus occurred as far south as 34° latitude at the LGM. Hence, fossil records corroborate observed patterns of genetic differentiation, and suggest the presence of at least two glacial refugia in P. strobus.

We also found that the large northern genetic groups of both species exhibited subtle genetic structure, although weaker than the major north-south discontinuity described earlier. In P. monticola, the interior British Columbia populations formed a genetic sub-group slightly differentiated from the main northern group. Similarly, Kim et al. (2010) found that populations from the northern interior of British Columbia and northwestern Montana formed a distinct subclade and were genetically distant from other northern provenances. Such sub-structure may be explained by additional glacial refugia in the Rocky Mountains (Shafer et al. 2010). A northern refugium for P. monticola may have existed in Glacier National Park, Montana, where macrofossils were found as early as 10,500 BP (MacDonald et al. 1998). Evidence of a glacial refugium east of the Rockies in Montana was also found in P. contorta Douglas (Godbout et al. 2008). Kim et al. (2010) also suggested additional refugia in P. monticola: an interior Clearwater refugium (northern Idaho, USA), and a coastal refugium on Haida Gwaii (British Columbia) or the Brooks Peninsula (British Columbia), but whether P. monticola populations survived glaciations in these putative refugia is unclear. In P. strobus, a longitudinal trend in ancestry was noted within the northern group.
In other broad-range boreal species, a division between eastern and western lineages located near eastern Ontario and the Great Lakes suggested that western lineages were colonized from a refugium south of the Great Lakes (Jaramillo-Correa et al. 2009). However, no fossil evidence suggests the presence of P. strobus at the LGM in the Great Lakes region or in other cryptic northern refugia (Jackson et al. 2000). Hence, the presence of additional northern refugia is unclear due to the scarcity of fossil records and is not supported by the low differentiation found among the northern sub-groups (FCT = 0.009). High pollen-mediated gene flow may have weakened signals of population structure detected using bi-parentally inherited markers such as SNPs (seed- and pollen-mediated gene flow). Maternally inherited markers, for which gene flow is limited by seed dispersal and effective population size is one quarter that of diploid nuclear markers, are more prone to reveal population structure and the “genetic imprint” of past historic events (e.g., Richardson et al. 2002; Jaramillo-Correa et al. 2009). Studies using maternally inherited genetic markers are needed to shed more light on putative cryptic northern refugia and postglacial colonization routes in our study species.

Natural selection may have also played a role in the creation and maintenance of the observed hierarchical genetic structure. In P. monticola, genetic variation in quantitative traits displayed patterns of differentiation strikingly similar to those of selectively neutral AFLP genetic markers across the genetic discontinuity at the Oregon/California border and across the Cascade crest. Richardson et al. (2009) postulated that the southern Cascades region was not a recent contact zone and that differentiation resulted from divergent adaptation to contrasting climates. In P.
strobus, the southern group populations also display genetic differentiation in growth performance as compared to northern group populations (Fowler & Heimburger 1969; Garrett 1973; Joyce & Sinclair 2002). Such differentiation between northern and southern groups may result from adaptation to different climates occurring during interglacial and glacial periods. Hewitt (1996) noted that the last 700,000 years have been dominated by major ice ages interrupted by relatively short warm interglacials (10,000 years) and, therefore, populations are more likely to be adapted to the climatic conditions prevailing in refugia during the longer glacial periods. In Chapter 3, we look for evidence of adaptation to climate and attempt to disentangle the relative influence of local adaptation from postglacial recolonization history in shaping population structure.

3. Local adaptation to climate in Pinus monticola and P. strobus: same or different genes?

3.1 Introduction

Climate-driven selection has been shown to be a major factor affecting the landscape distribution of genetic diversity in tree species. Tree species generally exhibit moderate to high among-population genetic variation for adaptive traits along climatic gradients, despite the homogenizing effects of gene flow and rapid range expansion following the last glaciations (Howe et al. 2003; Savolainen et al. 2007; Alberto et al. 2013). Even though such evidence of local adaptation in forest trees from common garden experiments is well established, to date, population genetic studies have generally invoked neutral processes, such as genetic drift, gene flow, and past demographic events, to explain the observed population genetic structure (Orsini et al. 2013).

One pattern of population structure arising from neutral processes is isolation by distance (IBD), where gene flow between populations is reduced with increasing distance separating them.
Another more recent view is that natural selection may also affect genome-wide population divergence via “isolation by adaptation” (IBA; Nosil et al. 2009). In IBA, gene flow among ecologically divergent habitats is reduced because of lower establishment success of non-locally adapted immigrants from different environments (i.e., increased genetic differentiation with increasing climatic distance). Evidence that IBA is an important factor shaping population structure in tree species is increasing (Mosca et al. 2013). IBA may produce patterns of landscape genetic structure similar to IBD when environmental variation is correlated with geography (Orsini et al. 2013). Postglacial recolonization can also generate allele frequency gradients similar to IBD or IBA as a result of repeated founder effects and “allele surfing” on the expansion wave (de Lafontaine et al. 2013). Teasing apart the relative importance of past historic events, IBD, and IBA is therefore an important first step in looking for signatures of local adaptation.

Genes that are important for local adaptation are often found by testing for atypically high or low among-population genetic differentiation against selectively neutral loci (“FST outlier tests” or “genome scans”, Beaumont & Nichols 1996; Beaumont & Balding 2004; Foll & Gaggiotti 2008; Excoffier et al. 2009) or for correlations with environmental factors of interest (“environmental association methods”, Joost et al. 2007; Coop et al. 2010; Frichot et al. 2012; Günther & Coop 2012). However, signatures of selection can be confounded with complex demographic scenarios (see above), leading to high rates of false positives in FST outlier and environmental association tests (Excoffier et al. 2009; Meirmans 2012; de Villemereuil et al. 2014; Lotterhos & Whitlock 2014). Accurately detecting signatures of selection for candidate loci with low false positive rates can be better achieved by combining various methods (de Villemereuil et al.
2014) or by using replicates of populations across similar climatic gradients (Storz 2005). Replication among closely related species or evolutionary lineages, using the same set of common genes, can also be a powerful way to detect and confirm conserved loci of importance for adaptation (Arendt & Reznick 2008; Grivet et al. 2011).

Western white pine (Pinus monticola) and eastern white pine (P. strobus) are closely related species that diverged less than 12 million years ago (Gernandt et al. 2008). Both species are widely distributed across a variety of climates in western and eastern North America, respectively. Past range shifts have been identified as a major driver of population structure in both species (see Chapter 2). In each species, expansion from two glacial refugia has been hypothesized, creating a genetic discontinuity between a large northern group and a southern group. Weak population sub-structure was also detected within each of the genetic groups, and other cryptic northern refugia were suggested (see Chapter 2). However, the extent to which local adaptation has contributed to shaping the observed population structure at nuclear loci, as opposed to neutral processes such as past range shifts, has yet to be elucidated.

P. monticola and P. strobus show contrasting patterns of adaptation to climate. P. monticola is described as a habitat generalist, with little to no among-population genetic differentiation for adaptive traits within the northern and southern genetic groups, suggesting that it has adapted to harsh environments mostly via phenotypic plasticity (Rehfeldt et al. 1984; Chuine et al. 2006). In contrast, P. strobus populations showed moderate among-population differentiation for adaptive traits, and this variation could be partly explained by geographic and climatic variables, which is general evidence of local adaptation to climate (Fowler & Heimburger 1969; Garrett 1973; Genys 1987; Li et al. 1997; Joyce & Sinclair 2002; Lu et al.
2003a; b; Joyce & Rehfeldt 2013). Given their apparently differing strategies for surviving a wide range of environments, it is interesting to compare patterns of local adaptation between the two species.

Despite these contrasting patterns, previous studies have suggested a role for climate in shaping population structure of both species. In P. monticola, two major northern and southern ecotypes showed marked differences in height growth and cold hardiness, closely matching the northern (coastal populations north of ~43 degrees of latitude and interior populations) and southern genetic groups (southern Oregon and California) described for nuclear markers (Rehfeldt et al. 1984; Richardson et al. 2009). Although recolonization history is likely to have played a role in the divergence of these two groups (Chapter 2), Richardson et al. (2009) also suggested that local adaptation to contrasting climates could have resulted in reproductive isolation between the southern and northern ecotypes.

Similarly, local adaptation could have played a role in the observed among-population genetic differentiation in P. strobus. Populations from the southern Appalachian Mountains (~34°N to ~36°N) showed better growth performance than northern provenances when planted as far north as 45°N, but this advantage was drastically reduced further north (Fowler & Heimburger 1969; Garrett 1973; Joyce & Sinclair 2002). Hence, part of the genetic differentiation between the southern and northern groups of both species may be attributed to IBA patterns, resulting from local adaptation to different climates.

Apart from using appropriate methods to detect local adaptation, the sampling design can also be of importance to the performance of genome scans and environmental association analyses (Manel et al. 2010a). De Mita et al.
(2013) showed that, in outcrossing species, sampling more populations with fewer individuals per population dramatically improved the performance of FST outlier and environmental association methods. To further increase the power of environmental association analysis, it may be even more effective to sample only a few individuals per population (1-5) from as many climatically variable populations as possible across the range of a species (Poncet et al. 2010). However, most previous studies of wild populations have focused on sampling a large number of individuals per population (>20) from a small number of populations (Willing et al. 2012). Existing individual-based FST estimators (Ritland 1996 and further developed here) and Bayesian methods such as BayeScan (Foll & Gaggiotti 2008) that take into account uncertainty due to small sample sizes show great promise for providing unbiased FST estimates at single loci when only a few individuals per population are sampled. Furthermore, individual-based environmental association methods can also be a powerful new approach to detect loci involved in adaptation to climate (Frichot et al. 2013).

Here, we look for signatures of local adaptation in 61 P. monticola and 133 P. strobus populations distributed across their natural ranges, and compare patterns of adaptive genetic diversity between both species using 311 single nucleotide polymorphism (SNP) markers developed from a set of 168 orthologous genes. The objectives of this study were to: 1) determine whether adaptation to climate is shaping population structure in P. monticola and P. strobus; 2) identify genes involved in adaptation to climate in each species; and 3) compare patterns of adaptive variation at orthologous genes in common between the two species. We used a combination of FST outlier and environmental association methods suitable for small population sample sizes (BayeScan, Foll & Gaggiotti 2008; Bayenv 2, Günther & Coop 2012; LFMM, Frichot et al.
2012) and developed an individual-based FST estimator to look for outliers using the “FDIST” framework (Beaumont & Nichols 1996). Given the large ecological amplitude of each species' range with respect to both temperature and precipitation, their relatively recent divergence, and the high degree of synteny between conserved orthologous genes observed in conifers (Pavy et al. 2012b), both species may have adapted to similar evolutionary pressures via similar genetic mechanisms (i.e., via the same genes or gene families).

3.2 Material and methods

3.2.1 Sampling and SNP genotyping

To investigate patterns of adaptation, we used a previously developed dataset (described in Chapter 2), in which 158 and 153 SNPs were genotyped on 348 Pinus monticola and 831 P. strobus individuals, respectively (Figure S1). Briefly, the SNPs used in this study were mainly developed from two gene sets. A first set of 129 P. monticola and 103 P. strobus SNPs was developed from 118 gene sequences from the White Pine Resequencing Project (WHISP, http://dendrome.ucdavis.edu/wpgp). These sequences were selected based solely on PCR and sequencing success in 11 white pine species, without prior knowledge of putative gene function. However, those genes still contain some strong candidates for adaptation (Table S8). The second set of 29 P. monticola SNPs and 47 P. strobus SNPs comes from the resequencing of 24 candidate genes for growth, 24 candidate genes for wood formation, and one candidate for cold hardiness (Holliday et al. 2008; Namroud et al. 2008; Pelgas et al. 2011; El Kayal et al. 2011; Prunier et al. 2011). Finally, 3 P. strobus SNPs from 2 gene sequences available on GenBank were added to the dataset (Syring et al. 2007a; b).

SNP development was conducted in parallel using orthologous gene sequences available in both species and is described in more detail in Chapter 2.
SNPs can be classified into three categories: I) “orthologous SNPs” between species, occurring within the same gene and at the same nucleotide position; II) “SNPs of orthologous genes”, occurring within the same gene but at different SNP positions; and III) “single-species SNPs”, with no detected or successful SNPs in the corresponding ortholog of the other species. A total of 79 orthologous genes contained SNPs in both species. These included 34 orthologous SNPs between species, and 72 and 68 SNPs unique to P. monticola and P. strobus, respectively (SNPs of orthologous genes). Fifty-two SNPs from 48 genes and 51 SNPs from 41 genes were single-species SNPs in P. monticola and P. strobus, respectively (Table 2.2).

Annotation of genes was completed from a “tblastx” search of the database refseq (http://www.ncbi.nlm.nih.gov/refseq/) using the full contigs (coding and non-coding regions, Table S8). Gene ontology (GO) terms were also extracted from this database. We used only those matches with E-values < 1e-10, to retain only high-similarity matches and limit matches involving small fragments. For the WHISP set of genes, we deduced SNP annotations from the coding regions given for 167 fully annotated genes in Eckert et al. (2013a), which were based on “tblastx” searches against Arabidopsis, Oryza, Populus and Picea, in combination with expressed sequence tags (ESTs) from Pinus taeda. For the candidate genes for growth, wood formation, and cold hardiness and the two gene sequences from GenBank, the Picea glauca EST database was used to deduce coding regions and SNP annotations (white spruce gene catalog GCAT, Rigault et al. 2011).

To increase the power of environmental association analysis, we sampled a large number of populations (Pinus monticola: 61 populations; P. strobus: 133 populations), and those were selected to cover the widest possible range of climatic conditions across the natural distribution of both species (Figure S1).
Each population is represented by a few individuals, generally 1 to 6 per population, but more individuals per population were sampled in some areas where only sparsely distributed samples were available (up to 12 for P. monticola and up to 21 for P. strobus). See Chapter 2 for more details.

3.2.2 Hierarchical population structure

Analyses to identify the proportion of genetic variation due to climate adaptation (i.e., partial Mantel tests and redundancy analysis, RDA) and to detect loci under selection (i.e., FST outlier and environmental association analysis) attempted to control for neutral population structure in different ways. In our study species, two distinct genetic groups in each species, corresponding to northern and southern groups, were previously detected by STRUCTURE. This structure was weak to moderate, with FCT between groups of 0.10 and 0.02 for P. monticola and P. strobus, respectively (Chapter 2). Sub-structure was also detected within the southern groups (P. monticola: 4 sub-groups, FCT = 0.089; P. strobus: 4 sub-groups, FCT = 0.063) and weak sub-structure was detected within the northern groups (P. monticola: 2 sub-groups, FCT = 0.012; P. strobus: 3 sub-groups, FCT = 0.009; Chapter 2). The presence of hierarchical population structure can lead to higher type I error rates (Excoffier et al. 2009). To take this into account, we performed all analyses using three different datasets for each species: 1) “range-wide” populations (i.e., northern and southern groups combined; P. monticola: n = 348 individuals, 61 populations; P. strobus: n = 831 individuals, 133 populations); 2) only samples from the northern genetic groups (P. monticola: n = 322 individuals, 54 populations; P. strobus: n = 727 individuals, 125 populations); and 3) only samples from the southern genetic groups (P. monticola: n = 26 individuals, 7 populations; P. strobus: n = 104 individuals, 8 populations).
The southern groups of each species comprised a smaller number of individuals and populations, and it should be noted that the Sierra Nevada populations at the southern end of the P. monticola range were not sampled. All analyses for P. monticola and P. strobus were performed separately.

3.2.3 Climatic data

Climatic data for each population were obtained using Climate WNA (Wang et al. 2012b) for P. monticola and Climate NA (T. Wang, personal communication) for P. strobus. We selected 12 annual climatic variables that were not strongly correlated (r < 0.90) in at least one of the two species: mean annual temperature (MAT, °C), mean warmest month temperature (MWMT, °C), mean coldest month temperature (MCMT, °C), mean annual precipitation (MAP, mm), mean summer precipitation (MSP, mm), annual heat:moisture index (AHM), summer heat:moisture index (SHM), beginning of frost free period (bFFP, Julian date), end of frost free period (eFFP, Julian date), precipitation as snow (PAS, mm), Hargreaves reference evaporation (Eref, mm) and Hargreaves climatic moisture deficit (CMD, mm). To account for other unmeasured climatic variables that would follow a geographic pattern, three geographic variables were also tested: latitude (decimal degrees), longitude (decimal degrees) and elevation (m), for a total of 15 variables. Reduction of climatic variables to a number of principal components was avoided to allow closer comparisons between both species.

3.2.4 Isolation by distance (IBD) versus isolation by adaptation (IBA)

Partial Mantel tests and redundancy analysis (RDA) were both applied to separate the fraction of among-population differentiation that can be attributed to environment-driven selection, as opposed to isolation by distance (IBD) or past demographic events such as expansion from glacial refugia.
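To make the logic of the partial Mantel test concrete, the following is a minimal sketch in Python (standard library only). It is not the code used in this study, which relied on R packages; the function names, the one-tailed permutation scheme and the default of 999 permutations are assumptions chosen for illustration. The sketch correlates a genetic distance matrix with an environmental distance matrix while holding geographic distance constant, and assesses significance by jointly permuting the rows and columns of the genetic matrix.

```python
import math
import random


def slatkin_linearized(fst):
    """Slatkin's linearised FST, i.e. FST / (1 - FST)."""
    return fst / (1.0 - fst)


def lower_triangle(matrix):
    """Flatten the strictly lower triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(1, n) for j in range(i)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def permute(matrix, order):
    """Reorder rows and columns of a square matrix with the same permutation."""
    return [[matrix[a][b] for b in order] for a in order]


def partial_mantel(gen, env, geo, n_perm=999, seed=1):
    """One-tailed partial Mantel test of gen ~ env, controlling for geo.

    Returns (observed partial r, permutation p-value). The observed value
    is counted as one of the permutations, an illustrative convention.
    """
    rng = random.Random(seed)
    e, g = lower_triangle(env), lower_triangle(geo)
    observed = partial_r(lower_triangle(gen), e, g)
    order = list(range(len(gen)))
    hits = 1
    for _ in range(n_perm):
        rng.shuffle(order)
        if partial_r(lower_triangle(permute(gen, order)), e, g) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

In use, `gen` would hold pairwise linearised FST values (each obtained with `slatkin_linearized`), `env` the pairwise distances for one climatic variable, and `geo` the pairwise geographic distances, all indexed by population in the same order.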
First, Mantel tests were performed to test for IBD among populations using range-wide populations of each species, as well as populations within each group (see “3.2.2 Hierarchical population structure”). We estimated the correlation between a matrix of pairwise Slatkin's linearized FST (FST / (1 - FST)) and the matrix of geographic distances, as suggested by Rousset (1997). Slatkin's linearized FST was computed using Arlequin v. 3.5 (Excoffier et al. 2010). Geographic distances between populations were obtained from the Geographic Distance Matrix Generator online tool (http://biodiversityinformatics.amnh.org/open_source/gdmg). Second, we tested for isolation by adaptation (IBA) while controlling for IBD. Environmental distances for each of the 15 climatic and geographic variables were computed as the Euclidean distance between populations using the “dist” function in R (R development Team, 2013). The correlation between genetic distances (Slatkin's linearized FST) and environmental distances for each variable was tested with geographic distances included as a covariate. The significance of Mantel’s statistics was tested using n = 1000 random permutations using the “mantel” function in the ECODIST package (Goslee & Urban 2007) in R.

Next, RDA was performed to further quantify the relative proportion of among-population genetic variation that is due to IBA, IBD and ancestry from postglacial recolonization history. RDA is a multivariate extension of multiple linear regression that relates a matrix of dependent variables to a matrix of independent (explanatory) variables. This type of multivariate analysis is more appropriate than Mantel tests when multiple climatic variables are analysed to identify ecological drivers of population genetic structure (Orsini et al. 2013) and has recently been applied in a population genetic context (Legendre & Fortin 2010; Orsini et al. 2012; Vangestel et al. 2012).

In our case, the analysis involved one dependent and three independent matrices.
The dependent matrix was the matrix of allele frequencies for each population. The three independent matrices were: 1) a matrix of 12 climatic variables for each population; 2) a matrix of spatial variables; and 3) a matrix of ancestry coefficients (Chapter 2). The three geographic variables (latitude, longitude and elevation) were omitted from the climatic matrix because they were included in the spatial matrix.

The spatial matrix was obtained by calculating spatial variables using a cubic trend surface analysis (Borcard et al. 1992). For this, the x and y coordinates were used as spatial variables, as well as the combinations of their second order polynomials, which yielded five spatial variables: x, y, xy, x², y². This ensured that linear gradients in the data, as well as more complex patterns, were extracted. The spatial variables to be included in the RDA were determined by forward selection with the “step” function from the “vegan” package in R (R development Team, 2013). Following Lee & Mitchell-Olds (2011), a stringent alpha value of 0.01 was used for the forward selection. This resulted in four spatial variables being retained for P. monticola (x, y, xy, y²) and two for P. strobus (x, y).

The third matrix accounted for known population structure, likely due to expansion from two glacial refugia in each species, by including the Q values output by STRUCTURE (Chapter 2). In P. monticola, we used K = 3 groups to account for the two main northern and southern genetic groups and the two sub-groups found in the northern group. In P. strobus, we used K = 4 groups, separating populations from the two main northern and southern genetic groups and the three sub-groups found in the northern group. All three independent matrices were scaled to a mean of zero and a variance of one prior to analysis.
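As an illustration of the two preparation steps just described, the sketch below builds the five trend-surface terms from a pair of coordinates and standardizes a column to mean zero and unit variance. It is a minimal sketch, not the code used in this study: the function names are hypothetical, the coordinates are used as given (no centring is assumed), and the sample variance (n - 1 denominator) is an assumption.

```python
import math


def trend_surface_terms(x, y):
    """Return the five spatial terms x, y, xy, x^2, y^2 for one population."""
    return {"x": x, "y": y, "xy": x * y, "x2": x * x, "y2": y * y}


def standardize(values):
    """Scale a column to mean 0 and sample variance 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical coordinates for three populations (not values from this study).
coords = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]
terms = [trend_surface_terms(x, y) for x, y in coords]
xy_scaled = standardize([t["xy"] for t in terms])
print(terms[1])
print([round(v, 3) for v in xy_scaled])
```

Forward selection would then decide which of the five columns enter the spatial matrix; that step is not shown.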
The among-population variation in each species was partitioned into pure climatic, spatial, and ancestry components, as well as all possible combinations of those three components, using the “varpart” function from the R package “vegan” (R development Team, 2013). Significance was tested with the “anova.cca” function of vegan with a maximum of n = 999 permutations using a series of six redundancy analyses:

1) Allele frequencies constrained by the bioclimatic matrix.
2) Allele frequencies constrained by the spatial matrix.
3) Allele frequencies constrained by the ancestry matrix.
4) Allele frequencies constrained by the bioclimatic matrix, after removing the effect of the spatial and ancestry matrices.
5) Allele frequencies constrained by the spatial matrix, after removing the effect of the bioclimatic and ancestry matrices.
6) Allele frequencies constrained by the ancestry matrix, after removing the effect of the bioclimatic and spatial matrices.

Analyses were performed using range-wide populations and populations within the northern groups. However, analysis within the southern groups could not be conducted due to a small number of populations and collinearity among independent variables. The code used to implement the RDA analysis is given in Appendix 4.

3.2.5 FST outlier tests

Two different methods were used to detect FST outliers in each species separately. First, we used a summary-statistics method similar to the “FDIST” method of Beaumont & Nichols (1996) and extended it to allow for small sample sizes using an “individual based” F estimator (Ritland 1996). We will refer to this method as the “FDIST Ritland” method. Using this method, the pairwise inbreeding coefficient Fij between individuals i and j at a single SNP locus can be calculated as:

\[ F_{ij} = \frac{1}{4}\left( \frac{d_{i1,j1} + d_{i1,j2}}{p_{i1}} + \frac{d_{i2,j1} + d_{i2,j2}}{p_{i2}} - 4 \right) \]

where i1 and i2 represent alleles 1 and 2 in individual i, respectively, and j1 and j2 represent alleles 1 and 2 in individual j, respectively.
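The pairwise estimator above, and its averaging over within-population pairs described in the text that follows, can be sketched as below. This is a minimal sketch under stated assumptions, not the Appendix 5 code: genotypes are assumed to be pairs of allele labels, `freq` is assumed to hold range-wide allele frequencies, and the function names and toy genotypes are hypothetical.

```python
def pairwise_f(ind_i, ind_j, freq):
    """Pairwise F_ij at one SNP.

    ind_i, ind_j: tuples of two allele labels (diploid genotypes).
    freq: dict mapping allele label -> range-wide allele frequency.
    """
    total = 0.0
    for allele in ind_i:
        # d_{ia,j1} + d_{ia,j2}: number of alleles in j identical to this allele of i.
        shared = sum(1 for other in ind_j if allele == other)
        total += shared / freq[allele]
    return 0.25 * (total - 4.0)


def individual_based_fst(populations, freq):
    """Average F_ij over all within-population pairs of individuals."""
    values = []
    for genotypes in populations:
        for i in range(len(genotypes)):
            for j in range(i + 1, len(genotypes)):
                values.append(pairwise_f(genotypes[i], genotypes[j], freq))
    return sum(values) / len(values)


# Hypothetical toy data: two populations fixed for alternative alleles.
toy_freq = {"A": 0.5, "G": 0.5}
toy_pops = [[("A", "A"), ("A", "A")], [("G", "G"), ("G", "G")]]
print(individual_based_fst(toy_pops, toy_freq))
```

Populations with a single sampled individual contribute no pairs in this sketch, which is why such populations cannot be used by the estimator.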
Values of pi1 and pi2 are the range-wide allele frequencies for alleles 1 and 2 of individual i, respectively. The values of d are equal to 1 if the two alleles compared are the same and to 0 if the two alleles are different. Pairwise Fij values are calculated for all within-population pairwise comparisons of individuals and averaged across the whole dataset to yield the individual based FST estimator:

\[ F_{ST} = \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} F_{ij}}{\text{n pairwise comparisons}} \]

The individual based FST estimator is unaffected by small sample sizes because it uses range-wide allele frequencies. Because we use pairwise comparisons of individuals within populations, the minimum number of individuals per population is two. Therefore, 2 P. monticola and 23 P. strobus populations from which only one individual was sampled were discarded for this analysis.

We compared the observed distribution of FST calculated using the individual based FST estimator to the FST distribution of a set of 50,000 bi-allelic SNPs generated under a symmetrical island model (Wright 1951) by means of coalescent simulations using Simcoal 2.1.2 (Eveno et al. 2008; Prunier et al. 2011, 2012). The same number of populations and the same sample sizes as in our dataset were simulated. Migration rates between populations were adjusted to match the mean FST estimated from the observed data using the formula Nem = (1 - FST) / (4 * FST). Individual based FST and expected heterozygosity (HE) of simulated SNPs were estimated and SNPs were classified into discrete HE bins, each spanning a HE range of 0.25. Following Beaumont & Nichols (1996), a Johnson distribution was fitted to the simulated FST distribution within each bin and the 95% and 99% confidence intervals were calculated. Each observed locus was tested against the fitted Johnson distribution to calculate a corresponding p-value (two-sided test). A false discovery rate of 5% was applied based on the Benjamini & Hochberg (1995) criteria with the “qvalue” package (Storey 2002) in R v.
2.15 (R Development Core Team 2013). The code used to implement the FDIST method is given in Appendix 5.

We chose to use BayeScan as our second test of selection because it has been shown to be one of the most reliable methods (Narum & Hess 2011; De Mita et al. 2013) and can be used with small population sample sizes. Being Bayesian, BayeScan incorporates the uncertainty in allele frequencies due to small sample sizes, with the risk of reduced power, but no risk of bias. Based on the work of Beaumont & Balding (2004), and further developed by Foll & Gaggiotti (2008), the method decomposes FST into a locus-specific component (α), shared by all populations, and a population-specific component (β), shared by all loci. Departure from neutrality at a given locus is assumed when the α component is significantly different from zero. Positive values of α indicate diversifying selection, whereas negative values suggest balancing selection.

We performed all simulations using the default parameters. Increasing the prior odds (PO) for the neutral model has been shown to reduce the number of false positives without greatly affecting the ability to detect true positives (Lotterhos & Whitlock 2014). We tested PO from 10 to 1000 and reported the results with a PO of 100, as we found it was a good compromise between reducing the number of false positives and retaining statistical power (Table S9). A PO of 100 means that the neutral model is 100 times more likely a priori than the model with selection (1 out of 101 loci is expected to be under selection). The internal q-value function provided in BayeScan was used to assess significance. Outliers at an FDR < 5% (q < 0.05) were reported. All samples of each species, separately, were used in the BayeScan analysis.
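The arithmetic behind the prior odds statement can be made explicit with a short sketch. The function names are hypothetical, and the second and third functions simply apply Bayes' rule to a Bayes factor; they are shown for illustration and are not a description of BayeScan's internal q-value calculation.

```python
def prior_probability_of_selection(prior_odds):
    """Prior probability of the selection model when the neutral model is
    `prior_odds` times more likely a priori."""
    return 1.0 / (1.0 + prior_odds)


def posterior_odds_of_selection(bayes_factor, prior_odds):
    """Posterior odds (selection : neutral), given a Bayes factor in favour of
    selection and prior odds in favour of neutrality."""
    return bayes_factor / prior_odds


def posterior_probability_of_selection(bayes_factor, prior_odds):
    """Convert the posterior odds into a posterior probability."""
    odds = posterior_odds_of_selection(bayes_factor, prior_odds)
    return odds / (1.0 + odds)


# With PO = 100, 1 locus in 101 is expected a priori to be under selection.
print(prior_probability_of_selection(100))
# A Bayes factor of 100 is then needed just to reach even posterior odds.
print(posterior_probability_of_selection(100, 100))
```

This shows why a larger PO makes the test more conservative: the same Bayes factor translates into a smaller posterior probability of selection.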
3.2.6 Environmental association analysis

Adaptation to climate in both species was investigated using two different environmental association methods that explicitly take into account the correlation of allele frequencies among populations: Bayenv 2 (Coop et al. 2010; Günther & Coop 2012) and LFMM (Frichot et al. 2013). Analyses were conducted using all samples of each species, separately.

Bayenv 2 was used in a standard way as a “population based” method, and allele frequencies were averaged within each population prior to calculations. Bayenv uses a set of control loci to estimate a covariance matrix of allele frequencies across populations. This covariance matrix serves as a null model to test whether allele frequencies at a SNP locus of interest are significantly correlated with each environmental variable. We first ran 100,000 Markov Chain Monte Carlo (MCMC) runs using the whole SNP dataset to estimate the covariance matrix for each species. We then tested the association between each SNP and each of the 15 climatic and geographic variables by running Bayenv 2 in “test mode” with 100,000 MCMC runs. Bayes factors (BF) were averaged across 10 replicates using 10 independent estimates of the covariance matrix. A cut-off of 3 for the BF was used to assess significance, which corresponds to “substantial evidence” for selection according to Jeffreys' scale of evidence (Jeffreys 1961).

Because a set of control SNPs, ideally composed of non-candidate SNPs, was not known a priori, our analysis with Bayenv 2 may face the problem of circularity: all the SNPs used to estimate the covariance matrix were also used to test for environmental associations at each locus. To overcome this problem, we used Latent Factors Mixed Models (LFMM), a hierarchical Bayesian method based on a variant of principal component analysis, as a second method to detect environmental associations (Frichot et al. 2013).
An advantage of this method is that it allows testing for genotype-environment correlations while simultaneously controlling for population structure, and no a priori selection of a set of control SNPs is needed. This method has been shown to efficiently estimate random effects due to population history and IBD, and to decrease the number of false positives (Frichot et al. 2013).

To allow for the full use of the data, the LFMM method was run as an “individual based” method (i.e., individuals were not pooled into populations prior to analysis). This has the potential to increase the power of the analysis (de Villemereuil et al. 2014). In the LFMM method, residual population structure is introduced via k unobserved (latent) factors. To account for population structure in P. monticola, we ran the analysis using k = 2 latent factors for the range-wide dataset. A preliminary principal component analysis (results not shown) confirmed that those two latent factors should capture most of the existing population structure (two genetic groups and two northern sub-groups as detected by STRUCTURE, Chapter 2). We also ran the analysis using only populations from the northern group (k = 1 to control for two sub-groups, Chapter 2) and using only populations from the southern group (k = 3 to control for four sub-groups, Chapter 2). In P. strobus, we used k = 3 for the range-wide dataset (to control for the two southern and northern groups and three northern sub-groups, Chapter 2), k = 2 for the northern group dataset (three sub-groups, Chapter 2) and k = 3 for the southern group dataset (four sub-groups, Chapter 2). Increasing k should make the test more conservative (Frichot et al. 2013), and larger k values yielded results similar to those obtained with smaller k values (results not shown). For each analysis, 6,000 sweeps were run, with the first 1,000 sweeps discarded as burn-in.
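The p-values produced by runs such as these are subject to the same Benjamini & Hochberg (1995) false discovery rate control that was applied to the FDIST Ritland p-values. The sketch below shows the plain Benjamini-Hochberg step-up adjustment; it is an illustration only and is not the “qvalue” package used in this chapter, whose q-values can differ from these adjusted p-values. The function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four SNPs (not values from this study).
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_adj = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(toy_adj) if q < 0.05]
print(toy_adj, significant)
```

In this toy example only the first SNP remains significant at a 5% false discovery rate, even though three raw p-values are below 0.05.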
A Benjamini & Hochberg (1995) FDR correction of 5% was applied to the p-values output by the LFMM program using the “qvalue” package (Storey 2002) in R (R development team, 2013).

3.2.7 Identification of highly supported candidate genes and comparisons among species

We narrowed the results down to a smaller set of strong candidate SNPs by comparing and combining the results obtained using the four different methods (FDIST Ritland, BayeScan, Bayenv, and LFMM). We identified strong candidates by: 1) looking for SNPs that were detected by a minimum of two methods in a particular dataset (range-wide, northern group, or southern group; replication across methods of data analysis); and 2) looking for orthologous SNPs/genes that were detected by at least one method in each species (replication across species). The latter comparison among species was made irrespective of the dataset (range-wide, northern group, or southern group). To see if orthologous SNPs/genes were subjected to similar selection pressure in both P. monticola and P. strobus, we classified SNPs/genes that were detected in both species into two categories: 1) showing “similar” patterns of adaptation to climate, when a gene contained SNPs associated with at least one of the same climatic variables in the two species; and 2) showing “different” patterns of adaptation, if no climate variable was in common between species. Genes containing no SNPs associated with climatic variables in one of the two species could not be evaluated in this manner.

3.3 Results

3.3.1 Isolation by distance (IBD) versus isolation by adaptation (IBA)

We found significant patterns of isolation by distance (IBD) and isolation by adaptation (IBA) in both species (Table 3.1, Table 3.2). Mantel tests detected highly significant IBD among range-wide and northern group populations in both species. No IBD was detected among southern group populations of either species.
Partial Mantel tests also detected significant IBA for a number of climate variables in both species and in all geographic datasets (Table 3.1, Table 3.2). Among Pinus monticola range-wide populations, distances in latitude (Lat), elevation (Elev) and summer heat:moisture (SHM) were significantly correlated with genetic distance, after correcting for geographic distance (p < 0.05). In P. strobus, latitude, temperature (MAT, MCMT, bFFP), aridity (Eref, AHM) and precipitation (MAP, MSP, PAS) significantly explained genetic distance among range-wide populations. Different independent variables were involved in IBA in the northern and southern groups (Table 3.1, Table 3.2).

Table 3.1. Correlation coefficients (r) from Mantel and partial Mantel tests in Pinus monticola to test for the association between genetic distance (pairwise Slatkin linear FST, Y) and geographic distance (D), and between genetic distance and climatic variables, when controlled for D. Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group"). Analyses were performed using all 158 P. monticola SNPs, as well as subsets of SNPs associated with climate as detected by Bayenv 2 and LFMM. The number of SNPs in each subset is given in parentheses.
               Range-wide (n = 61 b)            Northern group (n = 54 b)        Southern group (n = 7 b)
Test a         All SNPs   Bayenv 2   LFMM       All SNPs   Bayenv 2   LFMM       All SNPs   Bayenv 2   LFMM
No. of SNPs    (158)      (20)       (27)       (158)      (15) c     (24) c     (150)      (6)        (9)
Y ~ D          0.497***   0.501***   0.450***   0.211***   0.203***   0.142**    -0.007     -0.110     -0.503
Y ~ Lat | D    0.297***   0.288***   0.306***   0.062      0.072      0.096      -0.501     -0.125     -0.593
Y ~ Long | D   -0.339     -0.318     -0.344     -0.078     -0.090     -0.099     0.427•     0.001      0.617*
Y ~ Elev | D   0.337***   0.262**    0.282**    0.119•     0.178*     0.125*     0.417      0.483      0.015
Y ~ MAT | D    0.040      -0.058     0.010      0.203**    0.210**    0.176**    -0.021     0.388      -0.419
Y ~ MWMT | D   -0.017     0.026      -0.083     -0.035     0.028      -0.091     -0.264     0.128      -0.336
Y ~ MCMT | D   -0.159     -0.283     -0.096     0.185**    0.149*     0.175**    0.401      0.546      -0.160
Y ~ MAP | D    -0.118     -0.150     -0.058     -0.094     -0.010     -0.009     0.646*     0.738*     0.076
Y ~ MSP | D    0.038      0.121      0.044      -0.038     0.152*     -0.003     0.162      0.414•     0.075
Y ~ AHM | D    -0.101     -0.075     -0.039     -0.144     -0.041     -0.079     0.406*     0.456**    0.267*
Y ~ SHM | D    0.243*     0.445**    0.074      -0.041     0.162*     -0.069     -0.294     0.044      -0.438
Y ~ bFFP | D   0.083      -0.030     0.069      0.210**    0.212**    0.186*     -0.023     0.185      0.199
Y ~ eFFP | D   -0.054     -0.153     -0.046     0.170**    0.187**    0.163**    0.140      0.355*     -0.040
Y ~ PAS | D    -0.055     -0.054     -0.039     0.027      0.186*     0.020      -0.015     0.103      -0.067
Y ~ Eref | D   0.118•     0.257**    -0.020     0.000      0.029      -0.015     -0.216     0.166      -0.454
Y ~ CMD | D    0.117•     0.254**    0.056      -0.073     0.056      -0.061     -0.207     0.048      -0.329

a: Lat: latitude; Long: longitude; Elev: elevation; MAT: mean annual temperature; MWMT: mean warmest month temperature; MCMT: mean coldest month temperature; MAP: mean annual precipitation; MSP: mean summer month precipitation; AHM: annual heat:moisture index; PAS: precipitation as snow; SHM: summer heat:moisture index; bFFP: beginning of frost free period; eFFP: end of frost free period; Eref: Hargreaves reference evaporation; CMD: Hargreaves climatic moisture deficit.
d: Significance codes: *** = p < 0.001; ** = p < 0.01; * = p < 0.05; • = p < 0.1.
b: n = the number of populations sampled. All sampled populations included (northern and southern group).

Table 3.2. Correlation coefficients (r) from Mantel and partial Mantel tests in Pinus strobus to test for the association between genetic distance (pairwise Slatkin linear FST, Y) and geographic distance (D), and between genetic distance and climatic variables, when controlled for D. Analyses were performed using "range-wide" samples and within each genetic group detected by STRUCTURE ("northern group", "southern group"). Analyses were performed using all 153 P. strobus SNPs, as well as subsets of SNPs associated with climate as detected by Bayenv 2 and LFMM. The number of SNPs in each subset is given in parentheses.

               Range-wide (n = 133)             Northern group (n = 125)         Southern group (n = 8)
Test           All SNPs   Bayenv 2   LFMM       All SNPs   Bayenv 2   LFMM       All SNPs   Bayenv 2   LFMM
No. of SNPs    (153)      (19)       (38)       (153)      (13)       (28)       (146)      (13)       (4)
Y ~ D          0.170***   0.296***   0.183***   0.138***   0.240***   0.122***   0.034      0.190      0.182
Y ~ Lat | D    0.146**    0.037      0.222***   0.046      0.082*     -0.009     0.682**    0.662**    0.465*
Y ~ Long | D   -0.152     -0.054     -0.235     -0.042     -0.072     0.011      -0.732     -0.672     -0.515
Y ~ Elev | D   0.002      -0.053     0.046      -0.101     -0.092     -0.094     -0.577     -0.565     -0.350
Y ~ MAT | D    0.126*     0.031      0.183**    0.055      0.064•     0.007      0.264      0.209      0.469*
Y ~ MWMT | D   0.073      0.061•     0.084•     0.055      0.086*     -0.018     -0.224     -0.276     0.082
Y ~ MCMT | D   0.131**    0.039      0.192***   0.070•     0.050•     0.044      0.558*     0.491•     0.611*
Y ~ MAP | D    0.126*     0.093*     0.100•     0.097*     0.109**    0.029      0.303      0.509*     -0.229
Y ~ MSP | D    0.145*     0.083*     0.129*     0.079•     0.105**    -0.002     0.404      0.626*     -0.093
Y ~ AHM | D    0.088*     0.121**    0.067•     0.118*     0.102**    0.025      0.027      0.166      -0.258
Y ~ SHM | D    0.071•     0.110**    0.042      0.082•     0.110**    -0.002     0.078      0.243      -0.231
Y ~ bFFP | D   0.095*     0.068*     0.138**    0.038      0.078*     0.026      0.038      -0.043     0.406•
Y ~ eFFP | D   0.057      0.015      0.122**    -0.002     0.004      -0.013     0.202      0.121      0.476*
Y ~ PAS | D    0.113**    0.097*     0.123**    0.091*     0.072*     0.002      0.591**    0.575**    0.627*
Y ~ Eref | D   0.157*     0.068•     0.217***   0.082      0.121*     -0.014     0.565*     0.539•     0.546*
Y ~ CMD | D    0.072•     0.083*     0.060      0.060      0.067•     -0.038     0.340      0.534•     -0.088

We also performed partial Mantel tests of IBA using only SNPs significantly associated with climate by either Bayenv 2 or LFMM tests. In most cases, IBA with climatic variables detected using all SNPs was also detected when using the Bayenv 2 and LFMM subsets of SNPs. In addition, more climate variables were often significantly involved in IBA using Bayenv 2 and LFMM SNPs than when using all SNPs. In some cases, patterns of IBA differed between Bayenv 2 and LFMM candidate SNPs (Table 3.1, Table 3.2). For example, in range-wide populations of P. strobus, Bayenv 2 SNPs showed the strongest IBA with respect to aridity (AHM, SHM, CMD) and precipitation-related variables (PAS, MAP, MSP), whereas LFMM candidate SNPs were more closely associated with temperature-related variables (Eref, MCMT, MAT, bFFP, eFFP, also PAS). Another striking difference can be seen in the northern P. strobus group, where Bayenv 2 SNPs showed significant IBA with most climate variables, but LFMM SNPs did not show IBA. Over all geographic and SNP datasets, more climatic variables were generally involved in IBA in P. strobus than in P. monticola.

We further partitioned among-population genetic differentiation into three components (climate, space, and ancestry) using redundancy discriminant analysis (RDA). In P. monticola, 3 RDA axes significantly explained among-population variation (p < 0.05) in range-wide and northern group datasets (Table S10). Climate, when corrected for ancestry and space, did not explain a significant proportion of the among-population variation (range-wide: 0.7%, p = 0.33; northern group: 0.9%, p = 0.25, Table 3.3).
In this species, a large proportion of among-population variation was confounded among the three components, and the model was not able to separate their relative contributions (Figure 3.1, Table 3.3).

Figure 3.1. Proportion of among-population differentiation explained by climate (MAT, MWMT, MCMT, MAP, MSP, AHM, SHM, bFFP, eFFP, PAS, Eref, and CMD), space (Pinus monticola: x, y, xy, y²; Pinus strobus: x, y) and ancestry (Q-values from STRUCTURE, Chapter 2) in a) P. monticola and b) P. strobus using redundancy discriminant analysis (RDA). Significance codes: ** = p < 0.01; * = p < 0.05. Analyses using all populations (“range-wide”) are shown. The sizes of the circles are not to scale. For results within “northern” and “southern” groups see Table 3.3 and Table 3.4.

[Figure 3.1, panel a) P. monticola: Ancestry 3.7% **; Climate 0.7%; Space 2.5% **; overlapping fractions 1.3%, 4.4%, 11%, 0%; Unexplained 76.5%. Panel b) P. strobus: Space 0.9% **; Climate 1.9% *; Ancestry 4.2% **; overlapping fractions 2.9%, 0.5%, 0.2%, 0.3%; Unexplained 89.2%.]

Table 3.3. Redundancy discriminant analysis (RDA) to determine the fraction of among-population genetic variation due to climate, space, and ancestry in Pinus monticola. Adj. R2: adjusted R2 (reported as a proportion).

                              Range-wide (n = 55) a       Northern group (n = 50) a   Southern group (n = 5) a, c
                              Adj. R2     p (>F) b        Adj. R2     p (>F) b        Adj. R2     p (>F) b
climate                       0.126       0.001 ***       0.083       0.001 ***       NA          NA
space                         0.191       0.001 ***       0.110       0.001 ***       NA          NA
ancestry                      0.187       0.001 ***       0.060       0.001 ***       NA          NA
climate | (space + ancestry)  0.007       0.330           0.009       0.250           NA          NA
space | (climate + ancestry)  0.025       0.005 **        0.033       0.010 **        NA          NA
ancestry | (climate + space)  0.037       0.005 **        0.019       0.010 **        NA          NA
climate+space | ancestry      0.013                       0.035                       NA          NA
space+ancestry | climate      0.044                       0.002                       NA          NA
climate+ancestry | space      0.000                       0.000                       NA          NA
climate + ancestry + space    0.110                       0.040                       NA          NA
unexplained                   0.765                       0.862                       NA          NA

a: n = the number of populations sampled. Populations having missing data for SNP markers were discarded for this analysis.
b: Significance codes: *** = p < 0.001; ** = p < 0.01; * = p < 0.05; • = p < 0.1.
c: NA = not calculated due to an insufficient number of populations and collinearity among explanatory variables.

In P. strobus, 3 and 4 RDA axes were significant in range-wide and northern group datasets, respectively (p < 0.05; Table S11). In both datasets, all three components were significant when controlled for the other two components (Table 3.4). The largest proportion of the variation was explained by ancestry (range-wide: 4.2%, p = 0.005; northern group: 3.4%, p = 0.005), followed by climate (range-wide: 1.9%, p = 0.013; northern group: 1.8%, p = 0.025) and by space (range-wide: 0.9%, p = 0.005; northern group: 0.7%, p = 0.015). A proportion of the variation was also confounded between all three components (Figure 3.1, Table 3.4). In all RDA models, a large proportion of the variation could not be explained by any of these three components (76.8% to 91.1%).

Table 3.4. Redundancy discriminant analysis (RDA) to determine the fraction of among-population genetic variation due to climate, space, and ancestry in Pinus strobus.
                              Range-wide (n = 103)        Northern group (n = 97)     Southern group (n = 8) a
                              Adj. R2     p (>F)          Adj. R2     p (>F)          Adj. R2     p (>F)
climate                       0.053       0.001 ***       0.040       0.001 ***       NA          NA
space                         0.045       0.001 ***       0.035       0.001 ***       NA          NA
ancestry                      0.079       0.001 ***       0.063       0.001 ***       NA          NA
climate | (space + ancestry)  0.019       0.013 *         0.018       0.025 *         NA          NA
space | (climate + ancestry)  0.009       0.005 **        0.007       0.015 *         NA          NA
ancestry | (climate + space)  0.042       0.005 **        0.034       0.005 **        NA          NA
climate+space | ancestry      0.002                       0.001                       NA          NA
space+ancestry | climate      0.005                       0.008                       NA          NA
climate+ancestry | space      0.003                       0.002                       NA          NA
climate + ancestry + space    0.029                       0.019                       NA          NA
unexplained                   0.892                       0.911                       NA          NA

3.3.2 FST outlier tests

To detect FST outliers, we first used an individual based FST estimator to account for small population sample sizes, and incorporated it into the FDIST method (FDIST Ritland). The individual based FST was highly correlated with the widely used Weir & Cockerham (W&C) estimator (P. monticola: r = 0.93; P. strobus: r = 0.84; Figure S8). Differences between the individual based FST and the W&C estimator (individual based FST minus W&C) at individual SNPs varied between -0.204 and 0.016 in P. monticola (mean difference: -0.015) and between -0.057 and 0.079 in P. strobus (mean difference: -0.005). Individual based FST estimates were significantly lower than W&C estimates as the W&C estimates increased (P. monticola: p < 2e-16; P. strobus: p < 1e-04).

In P. monticola, the FDIST Ritland test performed using range-wide populations indicated that 5 SNPs displayed patterns of population differentiation that diverged from the expectations of a neutral island model using a 5% false discovery rate (FDR) cut-off (Figure S9, Table 3.5). In the northern and southern groups, a total of 2 and 3 SNPs were detected, respectively. In P.
strobus, 3, 4 and 1 outlier SNPs were detected in range-wide, northern group, and southern group populations, respectively (FDR < 5%) (Figure S10, Table 3.5). An additional 5 and 4 SNPs with very low minor allele frequency (< 1%) were found to be under balancing selection in P. monticola and P. strobus, respectively (Table 3.5). Such loci may violate the assumptions of the FDIST method (Excoffier et al. 2009) and, therefore, were not retained as outliers. None of these SNPs were detected by BayeScan. A hierarchical model, in which migration rates were adjusted to match among-group and within-group average FST, did not produce markedly different 95% and 99% confidence intervals, and the same outlier SNPs were detected in both P. monticola (Figure S11) and P. strobus (not shown).

Table 3.5. Number of FST outlier SNPs detected by FDIST Ritland and BayeScan in Pinus monticola and P. strobus using a false discovery rate (FDR) of 5%.

                            P. monticola                                    P. strobus
                            Range-wide      Northern group  Southern group  Range-wide      Northern group  Southern group
FDIST Ritland a
   Divergent                5               2               2               2               4               1
   Balancing                0 (5)           0 (3)           1               1 (1)           0 (4)           0
   Total                    5 (5)           2 (3)           3               3 (1)           4 (4)           1
BayeScan b
   Divergent                1               0               0               2               1               0
   Balancing                2               0               0               9               5               0
   Total                    3               0               0               11              6               0
In common to both methods   1               0               0               3               1               0
Total a                     7 (5)           2 (3)           3               11 (1)          9 (4)           1

a: The number of SNPs detected with minor allele frequency < 1% is shown in parentheses. These SNPs were not considered outliers.
b: Using prior odds of 100.

In P. monticola, BayeScan detected 3 outlier SNPs using range-wide populations, but none were found within either genetic group (FDR < 5%, Table 3.5). In P. strobus, a total of 11, 6 and 0 outlier SNPs were detected in range-wide, northern group, and southern group populations, respectively (FDR < 5%, Table 3.5).
Using both methods and considering all geographic datasets (range-wide, northern group, and southern group), a total of 11 (7%) P. monticola and 15 (9.8%) P. strobus SNPs were detected as FST outliers. In P. monticola, the one SNP detected to be under divergent selection by BayeScan (range-wide) was also identified by FDIST Ritland (Table 3.5). No SNPs under balancing selection were common to both methods. In P. strobus, all SNPs detected under divergent selection (2 in the range-wide dataset, 1 in the northern group dataset) and two SNPs found to be under balancing selection (range-wide) by BayeScan were also detected by FDIST Ritland (Table 3.5).

3.3.3 Environmental association tests

Bayenv 2 and LFMM both detected a high number of SNP-environment correlations in both species (Table 3.6, Figure 3.2, Figure 3.3). In P. monticola, Bayenv 2 detected 20, 15, and 6 SNPs significantly associated with one or more climatic variables when considering range-wide, northern group, and southern group populations, respectively (Bayes factor [BF] > 3). A similar number of SNPs were detected by Bayenv 2 in P. strobus: 19 range-wide, 13 in the northern group, and 13 in the southern group (BF > 3). We noted that independent runs of Bayenv 2 were highly variable (results not shown).

Table 3.6. Number of SNPs associated with climate, as detected by LFMM and Bayenv 2 in Pinus monticola and P. strobus using a false discovery rate (FDR) of 5% (LFMM) and a Bayes factor (BF) > 3 (Bayenv 2).

                            P. monticola                                    P. strobus
                            Range-wide      Northern group  Southern group  Range-wide      Northern group  Southern group
LFMM                        27              24              9               38              28              4
Bayenv 2                    20              15              6               19              13              13
In common to both methods   4               1               0               5               4               0
Total                       43              38              15              52              37              17

Figure 3.2. Number of SNPs associated with each climatic variable by LFMM (false discovery rate = 5%) and Bayenv 2 (Bayes factor > 3) in Pinus monticola: a) “range-wide” populations, b) “northern group” populations and c) “southern group” populations.
Lat: latitude; Long: longitude; Elev: elevation; MAT: mean annual temperature; MWMT: mean warmest month temperature; MCMT: mean coldest month temperature; MAP: mean annual precipitation; MSP: mean summer month precipitation; AHM: annual heat:moisture index; PAS: precipitation as snow; SHM: summer heat:moisture index; bFFP: beginning of frost free period; eFFP: end of frost free period; Eref: Hargreaves reference evaporation; CMD: Hargreaves climatic moisture deficit.

[Figure 3.2, panels a) to c): bar charts; y-axis: Number of SNPs (0 to 30); series: Bayenv 2, Bayenv 2 & LFMM, LFMM.]

Figure 3.3. Number of SNPs associated with each climatic variable by LFMM (false discovery rate = 5%) and Bayenv 2 (Bayes factor > 3) in Pinus strobus: a) “range-wide” populations, b) “northern group” populations and c) “southern group” populations.

[Figure 3.3, panels a) to c): bar charts; y-axis: Number of SNPs (0 to 30); series: Bayenv 2, Bayenv 2 & LFMM, LFMM.]

An even higher number of SNPs associated with climate was detected by LFMM (FDR < 5%, Table 3.6, Figure 3.3). In P. monticola, 27, 24, and 9 SNPs were detected when considering range-wide, northern group and southern group populations, respectively. In P. strobus, 38 SNPs were associated range-wide, 28 in the northern group, and 4 in the southern group. A total of 60 (38%) outlier SNPs in P. monticola and 72 (47%) outlier SNPs in P. strobus were significantly associated with one or more climatic or geographic variables by either Bayenv 2 or LFMM (all geographic datasets combined, Table 3.6). In general, geographic variables (latitude, longitude, elevation) were highly represented in SNP-climate associations.

In P.
monticola, a high number of SNPs was associated with reference evaporation (Eref, range-wide and northern group) and climatic moisture deficit (CMD, northern group), mainly by the LFMM method, while other climatic variables had lower numbers of SNP associations. In P. strobus, the number of SNPs associated with each climatic variable was generally higher than in P. monticola, and SNP-climate associations were more evenly distributed across climatic variables. The top 2 climate variables in P. strobus were reference evaporation (Eref) and precipitation as snow (PAS) in range-wide populations, and mean annual precipitation (MAP) and annual heat:moisture index (AHM) in the northern group populations. Fewer SNP-climate associations were found in the southern groups of both species.

SNPs associated with climatic variables strongly differed between the two methods (Table 3.6). In P. monticola, only 5 SNPs were detected by both Bayenv 2 and LFMM (all datasets combined). In P. strobus, 9 SNPs were common to both methods (all datasets combined). In this species, the strongest SNP-climate associations tended to be detected by both methods, as 4 of the top 5 SNPs detected by Bayenv 2 (BF > 30, all geographic datasets combined) were also detected by LFMM. The number of SNPs associated with each climate variable also differed between the two methods (Figure 3.2). Most strikingly, in P. monticola, LFMM detected a high number of SNPs associated with reference evaporation (Eref) and climatic moisture deficit (CMD), while Bayenv 2 found few or no SNPs associated with these two climate variables.

3.3.4 Summary of FST outlier and environmental association analyses

When all geographic datasets and all 4 methods were combined, a total of 64 (41%) P. monticola and 80 (52%) P. strobus SNPs were detected as FST outliers or were associated with climate (Table S12, Table S13). Note that 7 outlier SNPs were shared between P. monticola and P.
strobus (see “3.3.6 Overlap between species”). Among the 137 outlier SNPs, 13, 33, and 70 were located in 3’ untranslated regions (3’ UTR), introns, and exons, respectively, and 21 could not be annotated. Among the 70 coding SNPs, 39 and 31 were synonymous and non-synonymous, respectively. Annotation of these genes and their putative functions can be found in Table S8.

Different outliers were detected depending on the geographic dataset analysed. Across the 4 methods, more outlier SNPs were detected when considering all range-wide populations, followed by the northern group datasets and the southern group datasets. In P. monticola and P. strobus respectively, 26% and 31% of outlier SNPs were found only in the range-wide datasets, 15% and 13% of outlier SNPs were specific to the northern group datasets and 8% and 11% of outlier SNPs were specific to the southern group datasets. A large number of outlier SNPs were found in both range-wide and northern groups (percentage of outlier SNPs: P. monticola: 34%; P. strobus: 35%), as expected given that most sampled populations belonged to the northern groups. Fewer than 5% of outlier SNPs in both species were shared between either range-wide and southern groups or northern groups and southern groups. A small number of outliers were also detected in all three datasets within each species (percentage of outlier SNPs: P. monticola: 9.2%; P. strobus: 4%).

3.3.5 Overlap between methods of analysis

In order to highlight “strong candidate” SNPs, we combined results from the 4 methods of analysis (FDIST Ritland, BayeScan, Bayenv 2 and LFMM), and we identified SNPs that were detected by at least 2 different methods for a particular geographic dataset. We found 8 (5.1%) and 8 (5.2%) strong candidate SNPs in P. monticola and P. strobus respectively, located in 7 genes in each species (Table 3.7, Table 3.8). Different SNPs were detected as strong candidates in different datasets. 
The majority of strong candidate SNPs were detected in the range-wide or the northern group datasets. Three SNPs were detected by at least two methods only in the southern groups. In P. monticola and P. strobus respectively, 5 and 3 SNPs showed a signal of divergent selection and 1 and 2 SNPs were under balancing selection. All strong candidate SNPs, except one, were associated with one or more climatic or geographic variables. Among the 16 strong candidate SNPs, 2, 4 and 7 SNPs were located in 3’ UTR, introns and exons, respectively, and 3 could not be annotated. Among the 7 coding SNPs, 3 and 4 were synonymous and non-synonymous, respectively.

Table 3.7. "Strong candidate" SNPs in Pinus monticola, detected by a minimum of 2 different methods within a dataset ("range-wide", "northern group" or "southern group"). Gray and white areas refer to the alternation between genes. The top 3 significant climatic variables for each method, in each dataset, are reported.

SNP | SNP class (a) | Gene | SNP pos. | SNP code (b) | Dataset (c) | FDIST Ritland (d) | BayeScan (d) | LFMM (d) | Bayenv (d) | Putative function (RefSeq)
P-039 | II | 0_9408_01 | 194 | NS | SG | div ** | NS | NS | Lat, MCMT, MAP, MAT * | Transcription factor myc2-like
R-022 | II | 0_9408_01 | 371 | S | SG | div ** | NS | NS | MCMT, MAP, MAT * | Transcription factor myc2-like
Q-004 | II | 2_3852_01 | 231 | 3’ UTR | NG, SG | bal *** | NS | CMD, Eref, AHM, SHM, Elev *** | bFFP, Elev * | NA
P-023 | II | 0_3192_01 | 289 | S | RW | NS | NS | Lat * | Lat * | Protein CbxX, chromosomal-like
S-034 | I | 2_10212_01 | 191 | Intron | RW | div * | NS | NS | Lat, Long * | Glutathione s-transferase family
Q-022 | II | 2_7532_01 | 370 | NA | RW | NS | NS | Eref, CMD, Lat *** | Lat * | NA
S-025 | II | CL1430-Contig1_06 | 286 | Intron | RW | div * | NS | Lat * | Lat, MCMT ** | Pyrophosphate-fructose 6-phosphate 1-phosphotransferase
S-007 | II | CL3539-Contig1_01 | 180 | Intron | RW | div * | div ** | Lat * | Lat, Elev ** | TOM1-like protein 2

a: I: orthologous SNP; II: SNP of orthologous genes; III: single-species SNP. See “3.2.1 Sampling and SNP genotyping”.
b: S: synonymous; NS: non-synonymous SNP; NA: not annotated.
c: SNP detected by at least two different methods in these datasets. RW: range-wide, NG: northern group, SG: southern group.
d: Significance codes for FDIST Ritland, BayeScan, and LFMM: * = q < 0.05; ** = q < 0.01; *** = q < 0.001. For Bayenv: * = Bayes factor (BF) > 3, ** = BF > 10, *** = BF > 32, **** = BF > 100. NS = non-significant. The highest significance level among the 3 datasets is reported (see Supplementary Table 8). div: divergent selection, bal: balancing selection.

Table 3.8. "Strong candidate" SNPs in Pinus strobus, detected by a minimum of 2 different methods within a dataset ("range-wide", "northern group" or "southern group"). Gray and white areas refer to the alternation between genes. The top 3 significant climatic variables for each method, in each dataset, are reported.

SNP | SNP class | Gene | SNP pos. | SNP code | Dataset | FDIST Ritland | BayeScan | LFMM | Bayenv | Putative function (RefSeq)
O-032 | III | 2_3726_02 | 347 | 3’ UTR | SG | NS | NS | NS | CMD, MSP, SHM ** | dnaJ homolog subfamily B member 1-like
N-004 | III | 0_11649_03 | 71 | S | RW | bal ** | bal ** | NS | NS | Tubulin beta chain-like
M-017 | I | 0_8844_01 | 244 | Intron | RW | NS | NS | Eref, Lat, MAT *** | bFFP, PAS, MAT, Lat, Long * | Galacturonosyltransferase 13-like
N-029 | II | 0_6047_02 | 221 | NA | RW, NG | div *** | div *** | MCMT, Lat, eFFP, MAT, bFFP *** | Lat, MAT, MCMT, bFFP **** | NA
N-035 | II | 0_771_01 | 367 | NA | RW, NG | bal | bal *** | Elev, MSP, SHM *** | NS | NA
M-015 | I | 0_8683_01 | 224 | NS | RW, NG | div *** | NS | MWMT, PAS, Long, Eref, Lat *** | PAS, Long, MWMT, AHM **** | Serine threonine-protein kinase at1g18390-like
M-016 | I | 0_8683_01 | 371 | NS | RW, NG | NS | NS | Long, MAP, AHM *** | Long, MAP, AHM *** | Serine threonine-protein kinase at1g18390-like
G-014 | III | GQ0081.BR.1_D09 | 0 | NS | RW, NG | div ** | div *** | Long, PAS * | Eref, Lat, MAT, PAS *** | Uncharacterized protein

3.3.6 Overlap between species

Finally, we looked for genes including SNPs detected by any of the 4 methods (FDIST Ritland, BayeScan, Bayenv and LFMM) in both species (all geographic datasets combined). 
Out of the 79 orthologous genes, a total of 22 (28%) contained SNPs detected in both species (Table 3.9). One gene (CL1430Contig1_06) was detected by FST outlier methods in both species. This gene was under divergent selection in P. monticola, but under balancing selection in P. strobus. Out of the 17 genes that were associated with climate in both species, 6 genes were associated with at least one common climate variable between the two species (“similar”) and 11 were associated with different climatic variables (“different”, Table 3.9). The 46 outlier SNPs within the 22 common genes were annotated as follows: 4, 11 and 20 SNPs were located in 3’ UTR, introns, and exons, respectively, and 11 could not be annotated. Among the 20 coding SNPs, 12 and 8 were synonymous and non-synonymous, respectively.

Out of the 34 orthologous SNPs, 8 (24%) were detected in both species: 3 SNPs were associated with similar climate variables between species, 4 with different climate variables and 1 was associated with climate in one species and under divergent selection in the other species (Table 3.9). Three were located in introns, 2 were synonymous, 2 were non-synonymous, and 1 could not be annotated.

Table 3.9. SNPs/genes detected as FST outlier or associated with climatic variables in common to Pinus monticola and P. strobus. White and grey areas refer to the alternation between genes. The top 3 significant climatic variables for each method, in each dataset, are reported.

SNP | SNP type | Gene | SNP pos. | SNP code | P. monticola FST outliers (a) | P. monticola environmental associations (a) | P. strobus FST outliers (a) | P. strobus environmental associations (a) | P. monticola vs P. strobus (b) | Putative function (RefSeq)
N-033 | II | 0_7001_01 | 59 | NS | NA | NA | NS | Lat, MCMT, PAS (3***) | Similar | NADPH-dependent diflavin oxidoreductase ATR3-like isoform 2
N-034 | II | 0_7001_01 | 94 | NS | NA | NA | NS | MCMT, Lat, Eref (3***) | Similar | NADPH-dependent diflavin oxidoreductase ATR3-like isoform 2
P-034 | II | 0_7001_01 | 270 | S | NS | MCMT, eFFP, Lat (4**) | NA | NA | Similar | NADPH-dependent diflavin oxidoreductase ATR3-like isoform 2
O-002 | II | 0_8844_01 | 152 | S | NA | NA | div (1*) | eFFP, bFFP, MAT (3***) | Similar | Galacturonosyltransferase 13-like
M-017 | I | 0_8844_01 | 244 | Intron | NS | Eref, CMD (3***) | NS | Eref, Lat, MAT, bFFP, PAS, Long (3***; 4*) | Similar | Galacturonosyltransferase 13-like
O-013 | II | 2_6052_01 | 267 | NS | NA | NA | NS | Eref, MSP, Lat (3*) | Similar | Manganese-dependent ADP-ribose/CDP-alcohol diphosphatase-like
Q-010 | II | 2_6052_01 | 314 | S | NS | Eref (3*) | NA | NA | Similar | Manganese-dependent ADP-ribose/CDP-alcohol diphosphatase-like
M-022 | II | 2_7852_01 | 55 | NA | NA | NA | NS | Elev, PAS, MCMT, MAT (3***) | Similar | NA
Q-023 | II | 2_7852_01 | 428 | NA | NS | Elev, Lat, Eref, Long (3***) | NA | NA | Similar | NA
Q-024, O-021 | I | 2_7852_01 | 436 | NA | NS | NS | NS | Lat, Eref, MAT (3***) | Similar | NA
O-022 | II | 2_7852_01 | 442 | NA | NA | NA | NS | Eref, Lat, PAS (3*; 4*) | Similar | NA
M-029 | I | CL1806-Contig1_01 | 253 | S | NS | MSP, SHM (4*) | NS | SHM, eFFP (4*) | Similar | Arogenate dehydratase prephenate dehydratase chloroplastic-like
Q-038 | II | CL1852-Contig1_01 | 175 | S | NS | Eref (3*) | NA | NA | Similar | Polyadenylate-binding protein 3
O-035 | II | CL1852-Contig1_01 | 259 | Intron | NA | NA | NS | Eref, PAS, Lat (3*) | Similar | Polyadenylate-binding protein 3
P-003 | II | 0_11270_01 | 88 | NS | NS | PAS (4*) | NA | NA | Different | Inactive LRR receptor-like protein kinase at3g28040-like
N-002 | II | 0_11270_01 | 242 | 3’ UTR | NA | NA | NS | Eref (4*) | Different | Inactive LRR receptor-like protein kinase at3g28040-like
M-028 | I | 0_18267_01 | 150 | S | NS | AHM, MAP, MSP (4*) | NS | NS | Different | Predicted protein
M-007 | I | 0_18267_01 | 174 | S | NS | MSP, AHM, MAP (4*) | NS | NS | Different | Predicted protein
M-008 | I | 0_18267_01 | 306 | S | NS | NS | NS | Long (4*) | Different | Predicted protein
N-023 | II | 0_2576_02 | 80 | NA | NA | NA | NS | MSP, SHM, Elev (3***) | Different | NA
P-022 | II | 0_2576_02 | 100 | NA | NS | Eref (4*) | NA | NA | Different | NA
R-029 | II | 2_3867_02 | 102 | Intron | NS | Eref (4*) | NA | NA | Different | Profilin-2 4-like
O-009 | II | 2_3867_02 | 181 | Intron | NA | NA | NS | bFFP, eFFP, MAT (4*) | Different | Profilin-2 4-like
Q-005 | II | 2_3867_02 | 489 | 3’ UTR | NS | Eref, CMD, Lat (3***) | NA | NA | Different | Profilin-2 4-like
M-021 | II | 2_7189_01 | 137 | NS | NA | NA | NS | Lat, Eref (3*) | Different | Subtilisin-like protease-like
Q-018 | II | 2_7189_01 | 298 | S | NS | MSP, PAS, MAP, CMD, Lat (3***) | NA | NA | Different | Subtilisin-like protease-like
O-026 | II | 2_9542_01 | 121 | S | NA | NA | NS | MAT, MCMT (4*) | Different | Aminotransferase ACS10-like
Q-029 | II | 2_9542_01 | 202 | S | NS | Lat (4*) | NA | NA | Different | Aminotransferase ACS10-like
M-026 | I | CL1694-Contig1_02 | 180 | Intron | NS | CMD, Eref, MSP (3**) | NS | MAP, MCMT (3*) | Different | 116 kDa U5 small nuclear ribonucleoprotein component-like
M-027 | I | CL1694-Contig1_02 | 260 | Intron | NS | CMD, Eref, MSP (3***) | NS | MAT, MAP, MCMT (3**) | Different | 116 kDa U5 small nuclear ribonucleoprotein component-like
O-014 | II | CL1698-Contig1_01 | 255 | NA | NA | NA | NS | Long (4*) | Different | NA
Q-036 | II | CL1698-Contig1_01 | 256 | NA | NS | Lat, MSP, Elev (3***) | NA | NA | Different | NA
M-031 | I | CL2332-Contig1_01 | 328 | NA | NS | Eref, CMD, MSP (3***) | NS | MAT, MCMT, Lat, bFFP (3**; 4*) | Different | Calcium-dependent protein kinase 21
M-034 | I | CL3097-Contig1_01 | 169 | NS | NS | Eref, CMD (3***) | NS | bFFP (4*) | Different | NA
O-038 | II | CL3097-Contig1_01 | 275 | Intron | NA | NA | NS | NS | Different | NA
S-039, G-020 | I | GQ0026.BR_B03 | 0 | S | NS | Elev (4*) | NS | MAP, MSP, AHM (3**) | Different | Protein aspartic protease in guard cell 1-like
P-019 | II | 0_17938_01 | 146 | NA | NS | Eref (3*) | NA | NA |  | ATP-binding cassette transporter, subfamily B
N-021 | II | 0_17938_01 | 270 | NA | NA | NA | bal (2***) | NS |  | ATP-binding cassette transporter, subfamily B
T-015 | II | 2_3720_01 | 156 | NS | NA | NA | NS | Eref, Lat, MAP, MCMT (3**; 4*) |  | NA
T-016 | I | 2_3720_01 | 198 | NS | div (1***) | NS | NS | Lat, MAP (3*) |  | NA
O-012 | II | 2_5724_02 | 133 | 3’ UTR | NA | NA | bal (2*) | NS |  | NA
Q-008 | II | 2_5724_02 | 156 | 3’ UTR | NS | Eref, MWMT, CMD (3***) | NA | NA |  | NA
T-022 | II | CL1430-Contig1_06 | 141 | Intron | NA | NA | bal (2**) | NS |  | Pyrophosphate--fructose 6-phosphate 1-phosphotransferase subunit alpha-like
S-025 | II | CL1430-Contig1_06 | 286 | Intron | div (1*) | Lat, MCMT (3*; 4**) | NA | NA |  | Pyrophosphate--fructose 6-phosphate 1-phosphotransferase subunit alpha-like
T-029 | I | CL866-Contig1_01 | 109 | Intron | bal (2*) | NS | NS | NS |  | Dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex-like
T-017 | II | CL866-Contig1_01 | 395 | Intron | NA | NA | NS | MSP (4*) |  | Dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex-like

a: Significance levels for each method are given in parentheses: 1: FDIST Ritland; 2: BayeScan; 3: LFMM; 4: Bayenv (see Supplementary Table 8). NA = not tested because the SNP was not detected or unsuccessful in this species.
b: Similar = gene associated with at least one common climatic variable between species. Different = not associated with any common climatic variable.

3.4 Discussion

In this chapter, we looked for evidence of adaptation to climate separately in two widely distributed North American white pine species, and attempted to disentangle selection from neutral processes in shaping patterns of among-population genetic differentiation. We found evidence of local adaptation to climate in both species, and built a list of strong candidate genes for each species by combining results from various FST outlier and environmental association analyses. 
We also discovered a small number of candidate genes under selection in both species and discuss the implications of such findings.

3.4.1 Isolation by adaptation in Pinus monticola and P. strobus

Identifying the main drivers of population genetic structure can be a daunting task because selectively neutral processes, such as isolation by dispersal or recolonization history, can interact with natural selection in natural landscapes (Orsini et al. 2013). In P. monticola and P. strobus, the survival of populations in multiple glacial refugia and subsequent range shifts were found to be major determinants of the present day population structure (Chapter 2). Here we investigated the extent to which isolation by adaptation (IBA) contributed to shaping among-population differentiation, as opposed to the effects of isolation by distance (IBD) and postglacial colonization. Mantel and partial Mantel tests suggested that IBD and IBA are both significant drivers of population structure in white pine. In addition, redundancy discriminant analysis (RDA) showed that postglacial recolonization (ancestry) and IBD (space) were the main drivers of population structure in P. monticola, whereas climate was not a significant driver after controlling for space and ancestry. In P. strobus, all three factors (ancestry, space, and climate) determined population structure. It should be noted that, in both species, the vast majority of among-population genetic variation could not be explained by these variables, and other unaccounted factors are likely determining population structure. In the RDA, climate explained among-population differentiation in P. strobus, but not in P. monticola. Evidence of IBA is expected in P. strobus as significant among-population genetic variation can be found in adaptive traits, suggesting that populations have locally adapted to the wide environmental variation across its range (Fowler & Heimburger 1969; Garrett 1973; Genys 1987; Li et al. 
1997; Joyce & Sinclair 2002; Lu et al. 2003a, b). In contrast, previous studies comparing P. monticola populations belonging to the large northern group showed little or no differentiation in adaptive traits and it is thought that P. monticola adapted to different climates mostly via phenotypic plasticity (Rehfeldt et al. 1984; Chuine et al. 2006). Our results support this hypothesis, as a very small and non-significant proportion of the variation is associated with climate in P. monticola after the effects of IBD and postglacial colonization have been removed.

On the other hand, southern and northern P. monticola groups are very divergent, both morphologically and genetically. Richardson et al. (2009) suggested that adaptation to climate was involved in the differentiation between these groups. We cannot confirm this hypothesis because climate in this region is highly correlated with genetic differentiation. Indeed, RDA showed that the largest proportion of the variation in P. monticola is confounded between ancestry, space and climate and it is therefore very difficult to separate the effects of climate from those other factors.

Overall, IBA was detected as a significant driver of population structure in both P. monticola and P. strobus, although evidence was absent in P. monticola after removing the effect of IBD and postglacial colonization. Both species showed a relatively large proportion of explained variation that was confounded between ancestry, space, and climate. This suggests that FST outlier and environmental association methods may have trouble separating the neutral effects of space and ancestry from selection.

3.4.2 FST outlier and environmental associations

Evidence of local adaptation was detected in both species by FST outlier and environmental association analyses. FST outlier analyses revealed that 7% and 9.8% of SNPs were outliers in P. monticola and P. strobus, respectively. 
Similar or slightly lower estimates were found in other tree species using SNP markers (Namroud et al. 2008; Eckert et al. 2010b; Prunier et al. 2011; Chen et al. 2012; Keller 2012). Moreover, a surprisingly large number of SNPs were significantly associated with one or more climate variables in each species (P. monticola: 38%; P. strobus: 47%). This number is unexpected considering the generally lower percentage of SNP-environment correlations detected in other species using similar methods (Pinus taeda: 22% of SNPs, Eckert et al. 2010a; Populus balsamifera: 13.4%, Keller 2012). Two hypotheses can be invoked to explain this large number of significant SNP-environment correlations: 1) a large number of SNPs are directly involved or are linked to loci involved in adaptation to climate in our study species, and environmental association methods were powerful at detecting them; or 2) we detected a high number of false positives due to complex underlying population structure not fully accounted for by Bayenv 2 or LFMM.

First, environmental association methods may have been powerful at detecting loci with small effects in combination with the sampling design we used. In this study, we prioritized sampling a large number of populations from widely different climates, over sampling a large number of individuals from fewer populations, which should increase the power of environmental association analysis (Poncet et al. 2010; De Mita et al. 2013). Furthermore, environmental association methods have been shown to be more powerful than FST outlier methods to detect small changes in allele frequency, as expected under cases of polygenic selection (De Mita et al. 2013; de Villemereuil et al. 2014). Adaptive phenotypic traits controlled by many small-effect loci are likely to be common in nature (reviewed in Rockman 2012). Given the large number of SNPs associated with climate detected and evidence of IBA, especially in P. 
strobus, our results support the hypothesis of a high number of small-effect loci involved in local adaptation.

Second, false positives may also partly explain the high number of SNP-environment correlations detected. This may be especially the case for P. monticola since climate was not a significant factor driving population structure in the RDA. Simulation studies showed that all methods we used had very high false positive rates in non-equilibrium expansion from refugia scenarios or highly structured populations (de Villemereuil et al. 2014; Lotterhos & Whitlock 2014). Indeed, both species recolonized most of their current ranges from two separate glacial refugia, creating hierarchically structured populations (Chapter 2). Under such scenarios, patterns of population structure are likely to covary with climate, which can lead to spurious genotypic-environment correlations (Eckert et al. 2010b). Indeed, RDA biplots show that some SNPs detected by Bayenv 2 and LFMM do not seem to be strongly associated with climate when the effects of ancestry and space have been controlled (i.e., many SNPs are found close to the centre of the biplots, Figures S12-S15). Although RDA did not allow us to statistically test each SNP separately for significant association with climate, this approach may prove useful as an additional way to visually assess whether some SNPs are likely to be false positives. Hierarchical structure needs to be identified to avoid false positives (Excoffier et al. 2009). To this end, we also performed analyses in each genetic group separately. All 4 methods that we used detected slightly fewer SNPs within genetic groups as opposed to range-wide groups, except for FDIST Ritland in P. strobus. This suggests that hierarchical structure may have generated a number of false positives. 
Indeed, a number of SNPs that were detected when analysing range-wide populations were not detected within either northern or southern groups (~26% and 31% of outlier SNPs). Therefore, the inclusion or exclusion of a small number of southern populations had a substantial effect on those loci. In this case, it is difficult to determine whether such loci are truly adaptive loci due to divergent selection between the two genetic groups, or are false positives due to hierarchical population structure. On the other hand, a number of outliers between the two groups may be expected, if part of the genetic differentiation is due to local adaptation. Across all 4 methods, we also found that different SNPs were detected within different genetic groups, and a number of SNPs were specific to the northern (13% to 15%) or southern groups (8% to 11%). Only 8% and 5% of SNPs detected were common to both northern and southern group datasets in P. monticola and P. strobus, respectively. Similarly, Prunier et al. (2012) found specific genes involved in adaptation to climate in each of the two main black spruce (Picea mariana [Mill.] Britton, Sterns & Poggenburg) genetic lineages. Those results suggest that different selection pressures may target different genes in different genetic groups. For example, partial Mantel tests found IBA with different sets of climatic variables in each genetic group. This is expected given the different climatic conditions experienced by the northern and southern group populations in our study species. It should be noted, however, that the power of our analysis was reduced in southern groups due to small sample size, and we did not sample the full southern range of P. monticola. Nevertheless, this emphasizes the importance of performing analyses using different subsets of populations in different genetic lineages (Prunier et al. 2012), highly dissimilar climates (Orsini et al. 2012), or different spatial scales (Manel et al. 2010b). 
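
The dataset-specific breakdown discussed in this section (outliers found only range-wide, only in the northern group, only in the southern group, or shared between datasets) amounts to partitioning the union of three SNP sets. The Python sketch below is only an illustration of that bookkeeping: the function name `partition_outliers` and the SNP identifiers are hypothetical, and the example data are made up rather than taken from the thesis results.

```python
def partition_outliers(detected):
    """Group outlier SNPs by the exact combination of datasets that flagged them.

    `detected` maps a dataset label (e.g. "RW", "NG", "SG") to the set of SNP
    identifiers detected in that dataset by any method.
    Returns (classes, percentages):
      classes     maps a frozenset of dataset labels to the SNPs detected in
                  exactly those datasets and no others;
      percentages gives each class as a percentage of all outlier SNPs.
    """
    all_snps = set().union(*detected.values())
    classes = {}
    for snp in all_snps:
        # Labels of every dataset in which this SNP was flagged.
        key = frozenset(label for label, snps in detected.items() if snp in snps)
        classes.setdefault(key, set()).add(snp)
    percentages = {
        key: 100.0 * len(snps) / len(all_snps) for key, snps in classes.items()
    }
    return classes, percentages


# Hypothetical example data (not the thesis results).
detected = {
    "RW": {"snp1", "snp2", "snp3", "snp4"},
    "NG": {"snp3", "snp4", "snp5"},
    "SG": {"snp4", "snp6"},
}
classes, percentages = partition_outliers(detected)
for key in sorted(classes, key=lambda k: (len(k), sorted(k))):
    print(sorted(key), sorted(classes[key]), round(percentages[key], 1))
```

Because each SNP falls into exactly one class, the percentages sum to 100, which makes it easy to check figures such as "specific to the northern group" against "shared between range-wide and northern groups".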
3.4.3 Methods to detect signature of selection with small population sample sizes

In this study, we prioritized sampling a large number of populations from widely different climates, over sampling a large number of individuals from fewer populations (Poncet et al. 2010; De Mita et al. 2013). We also used methods that allowed the analysis of only one to six sampled individuals per population. Being Bayesian, BayeScan, Bayenv and LFMM fully incorporated the uncertainty in allele frequencies due to small sample sizes. We also developed an individual-based FST estimator that fully took into account uncertainty due to small sample sizes, and incorporated it in the FDIST framework (Beaumont & Nichols 1996). We showed that the individual-based FST estimator was similar to other widely used population-based FST estimators (Weir & Cockerham 1984). Willing et al. (2012) found that the Weir & Cockerham (W&C) estimator is largely unbiased when population sizes are reduced to as few as 6 individuals and >100 SNPs are used, but tended to slightly overestimate FST if genetic differentiation was moderate or large (FST ≥ 0.1). We found similar results, as our individual-based FST estimates were lower than the W&C FST estimator at high values of FST, therefore potentially reducing the number of false positives in the right tail of the FST distribution. Furthermore, large differences between our individual-based FST estimate and the W&C FST were observed at individual SNPs. We believe that the individual-based FST estimator yielded more accurate FST estimates, when used with small sample sizes, than conventional population-based FST estimators. BayeScan and Bayenv were implemented according to their default settings as population-based methods: allele frequencies were averaged for each population prior to the calculation of test statistics. Although LFMM can also be used as a population-based method, we chose to use it as an individual-based method. 
This has the advantage of avoiding potential bias introduced by differences in sample sizes across populations, and it is expected to increase the power of the method by increasing sample size (De Mita et al. 2013; de Villemereuil et al. 2014). This potential increase in power comes as a trade-off with an increase in the number of false positives (de Villemereuil et al. 2014). However, it is worth noting that LFMM was shown to be less sensitive to the data specification (i.e., individual vs. population based) in polygenic scenarios (de Villemereuil et al. 2014). Both increases in power and in the number of false positives could explain the larger number of outlier loci detected by LFMM as compared to Bayenv.

3.4.4 Comparisons between methods to detect signature of selection

SNPs detected by all 4 methods were largely non-overlapping. This was especially the case for Bayenv 2 and LFMM, which both detected a large number of SNP-environment associations, but strongly differed in the number and identity of detected SNPs. Indeed, the overlap of detected SNPs between methods was strikingly low (0-11% depending on the dataset). A small overlap of SNP loci between methods was also observed in a simulation study that tested both methods under realistic demographic scenarios (de Villemereuil et al. 2014). Both methods are fundamentally similar as they both estimate genotypic-environment correlations under mixed models, in which environmental variables are fixed effects, and population structure is introduced via unobserved variables or hidden factors (Frichot et al. 2013; de Villemereuil et al. 2014), but two main differences may explain the small overlap of detected SNPs. 
The first important difference is that LFMM simultaneously estimates genotypic-environment correlations while correcting for population structure, whereas Bayenv 2 uses a two-step procedure: it first estimates the neutral covariance matrix of allele frequency among populations, and then tests for significant genotypic-environment correlations using this covariance matrix as a null model. Because we were not able to identify an a priori set of control SNPs, the single-step LFMM procedure of testing and correcting for population structure may have been advantageous. The second difference between Bayenv 2 and LFMM is how population structure is taken into account. de Villemereuil et al. (2014) argue that the principal component analysis-related nature of LFMM would allow it to take into account more complex scenarios via hidden factors, such as unknown demographic history, IBD patterns or environmental gradients not accounted for in the study. Simulation studies testing both methods have shown variable performance in different demographic scenarios (Lotterhos, K., personal communication; De Mita et al. 2013; de Villemereuil et al. 2014). Bayenv has been shown to perform more poorly than LFMM for hierarchical population structure, especially when the environmental selection pattern is correlated with demographic history, but had fewer false positives than LFMM in cases of IBD (De Mita et al. 2013; de Villemereuil et al. 2014).

In P. strobus, hierarchical structure was weak, and Bayenv 2 results seemed highly reproducible between runs for climate variables involved in a high number of SNP-environment associations (correlation between BFs across runs r > 0.98 for Latitude, MAT, MCMT, AHM, bFFP, eFFP, PAS, and Eref). Much more variation was observed across runs in P. monticola populations (r varied from 0.60 to 0.85). Hence, Bayenv 2 may have performed poorly in highly hierarchically structured P. 
monticola populations, where most of the among-population genetic variation is confounded between geography, ancestry, and climate. Bayenv 2 results and stability may have been affected by the inclusion of adaptive SNPs in the estimation of the covariance matrix. We noticed that the estimated covariance matrix (first step) and the resulting Bayes factors (second step) were highly variable across independent runs of the program. Similar “run-to-run variability” was also pointed out by Blair et al. (2014). Bayenv has been shown to be somewhat sensitive to the inclusion of selected loci, especially when >10% of loci are truly selected loci (Lotterhos & Whitlock 2014), which is about the observed percentage of loci detected by the method in our study species (P. monticola: 13%; P. strobus: 12%). Run-to-run variability may also be due to the relatively small number of SNPs we used to construct the covariance matrix (Günther, T., personal communication). In such cases of instability of Bayenv 2, we recommend carrying out multiple independent runs and averaging the BFs (Blair et al. 2014). Overall, the two different methods seemed to have performed differently depending on the species and underlying demographic history. Bayenv 2 and the relatively recent LFMM methods remain to be tested together in more empirical and theoretical studies, and validated using independent sets of populations and markers to better understand their strengths and weaknesses.

3.4.5 Overlap of candidate loci between methods

We used a combination of four methods to identify a set of candidate genes for selection in natural populations of P. monticola and P. strobus. Studies showed that combining results from a number of methods yielded large reductions in false discovery rates, since methods tended to agree more regarding true positives than false positives (de Villemereuil et al. 2014). 
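
The "strong candidate" criterion used in this chapter (a SNP detected by at least two different methods within the same geographic dataset) can be expressed as a simple counting rule. The following Python sketch illustrates that rule under stated assumptions: the function `strong_candidates`, the tuple layout of the input, and the SNP names are all hypothetical and chosen only for illustration; it is not the procedure actually used to produce Tables 3.7 and 3.8.

```python
from collections import defaultdict


def strong_candidates(detections, min_methods=2):
    """Return SNPs flagged by at least `min_methods` distinct methods in one dataset.

    `detections` is an iterable of (snp, dataset, method) tuples, one per
    significant result. Agreement must occur within the same dataset: a SNP
    flagged by one method range-wide and by another in the northern group
    does not qualify.
    Returns a dict mapping each qualifying SNP to the sorted list of datasets
    in which the criterion was met.
    """
    methods_seen = defaultdict(set)
    for snp, dataset, method in detections:
        # A set ignores repeated reports of the same method.
        methods_seen[(snp, dataset)].add(method)

    qualifying = defaultdict(list)
    for (snp, dataset), methods in methods_seen.items():
        if len(methods) >= min_methods:
            qualifying[snp].append(dataset)
    return {snp: sorted(datasets) for snp, datasets in qualifying.items()}


# Hypothetical detections (not the thesis results).
detections = [
    ("snpA", "RW", "FDIST"), ("snpA", "RW", "LFMM"), ("snpA", "NG", "LFMM"),
    ("snpB", "SG", "Bayenv"), ("snpB", "SG", "LFMM"), ("snpB", "SG", "LFMM"),
    ("snpC", "RW", "BayeScan"), ("snpC", "NG", "Bayenv"),
]
strong = strong_candidates(detections)
print(strong)
```

In this toy input, `snpC` is flagged twice but never by two methods in the same dataset, so it is excluded; requiring agreement within a dataset is what keeps the criterion conservative.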
We identified a set of strong candidate loci, detected by at least two out of the 4 methods of analysis used in this study, including 7 genes in each of P. monticola and P. strobus (Table 3.7, Table 3.8). Functional annotation of candidate genes supports their importance for adaptation. Among the 7 strong candidate genes in P. monticola, a protein transporter targeting Myb genes (CL3539Contig1_01), which was included in this study because it was a priority gene for growth in Picea glauca (Pelgas et al. 2011), carried a SNP that was detected by all four methods. Various other genes of the large Myb family have been reported to be related to temperature in Pinus cembra (Mosca et al. 2012) and Picea mariana (Prunier et al. 2011), and were reported to be involved in growth, bud set, responses to cold stress, and wood formation in Picea (reviewed in Prunier et al. 2011, 2012). It is therefore not surprising that a gene targeting the large Myb family is a candidate for local adaptation in Pinus monticola.

Another priority candidate gene for growth in Picea glauca included in this study, a gene encoding a phosphofructokinase family protein (CL1430Contig1_06), was identified as being under divergent selection and was associated with latitude and mean coldest month temperature. Pyrophosphate-dependent fructose-6-phosphate 1-phosphotransferase (PFP), involved in glycolysis and gluconeogenesis, was shown to be important for plant growth and to affect the expression of several genes involved in diverse physiological processes (Lim et al. 2009). PFP was also associated with response to heat and cold stresses (Guo et al. 2012). Note that this gene was also identified in Pinus strobus in the present study, but under balancing selection.

Strong candidate genes for bud phenology were also detected. 
A gene of the PLATZ transcription factor family carried a SNP significantly associated with temperature and aridity-related variables, both within the northern and southern groups, and was found to be under balancing selection (2_3852_01). Its closest homolog in Picea glauca was found to be associated with timing of bud set (Prunier et al. 2013).

Two genes were potentially involved in response to biotic and abiotic stress. A transcription factor of the myc2-like family was detected in southern group populations of Pinus monticola. Myc2 genes positively regulate oxidative stress tolerance, flavonoid biosynthesis, and insect herbivory resistance via the synthesis of defensive compounds in Arabidopsis thaliana (Dombrecht et al. 2007). Finally, a glutathione S-transferase (GST) family protein gene was detected under divergent selection, and was associated with latitude and longitude. GSTs are mainly involved in detoxification of xenobiotics, but other functions include responses to a wide range of biotic and abiotic stressors such as pathogens, heavy metal toxins, oxidative stress and UV radiation (Mueller et al. 2000; Kampranis et al. 2000; Loyall et al. 2000; Agrawal et al. 2002).

A number of strong candidates in P. strobus are also confirmed by other studies and functional annotation. Two genes from the WHISP dataset were also found to be differentially expressed in developing buds in white spruce (0_8683_01 and 0_8844_01; El Kayal et al. 2011). The first (0_8683_01), a serine/threonine-protein kinase, carried two non-synonymous SNPs that were strongly associated with climate. Protein kinases are involved in signal transduction within cells and regulate important cell activities such as cell division (PFAM). One SNP within this gene was associated with temperature-related variables, precipitation as snow, and latitude, while the other was associated with longitude, mean annual precipitation, and annual heat-moisture index.
It is not known which one of the two SNPs is causal or if they are linked to a causal SNP involved in adaptation.

The second (0_8844_01), a galacturonosyltransferase, carried a SNP that was associated with temperature-related variables, precipitation as snow, latitude, and longitude. The same SNP was also associated with climate in P. monticola in the present study, and a gene of the same family was the strongest FST outlier detected in Picea sitchensis (Holliday et al. 2012). Members of this family are involved in the synthesis of carbohydrates such as pectins. Pectins are important structural components of the primary cell walls of plants, and are required for normal plant growth and development (Willats et al. 2001). Saccharide remodeling has also been observed in conjunction with starch breakdown, which is thought to be involved in response to drought, among other possible roles (Zwiazek et al. 2001).

Strong candidates for response to abiotic stress were also detected in P. strobus. A member of the DnaJ heat shock protein family was detected under divergent selection, and was associated with aridity-related variables (2_3726_02). Members of this family were shown to be associated with precipitation in Pinus cembra and P. mugo (Mosca et al. 2012), with precipitation, temperature, and height growth in Picea glauca (Prunier et al. 2012, 2013), and were also detected as FST outliers in Larix decidua (Mosca et al. 2013). Heat shock proteins act as important molecular chaperones responsive to various abiotic stresses, such as those related to temperature and moisture in plants (Sun et al. 2002; Zhao et al. 2010). A Tubulin family gene, involved in freezing tolerance in Arabidopsis thaliana (TAIR, GO), was also detected in the present study. Other genes of unknown function are of interest as they were among the strongest outliers and were detected by all four methods (0_6047_02; GQ0081.BR.1_D09).
Overall, most of the genes we highlighted as strong candidates for adaptation to climate were involved in growth, bud phenology, and response to abiotic and biotic stress in model plants or in other conifer species. As pointed out by Prunier et al. (2011), this suggests that similar genes or gene families may be involved in adaptation, even in phylogenetically distant species.

3.4.6 Overlap of candidate genes between Pinus monticola and P. strobus

SNP markers were developed simultaneously in orthologous genes of P. monticola and P. strobus, allowing comparison of adaptive patterns between those two related species. We found that a number of SNPs and genes (24% of orthologous SNPs, 28% of orthologous genes) showed signatures of selection in both species, and that 6 genes were associated with at least one common climate variable. Although some false positives are expected, these findings suggest that local adaptation to climate may have partly evolved via the same genes in both species. This contrasts with results from Cullingham et al. (2014), who did not find any common outlier from 399 orthologous SNPs between sister species P. contorta var. latifolia and P. banksiana. Some overlap of candidate genes between our species may not be surprising given their relatively recent divergence (Gernandt et al. 2008). Both species may also have faced similar evolutionary pressures, for example, adaptation to cold or photoperiod along latitudinal or elevational gradients. Indeed, all 6 genes that are correlated with similar climatic variables in both species are associated with temperature-related variables or with latitude. Cases of parallel evolution, i.e., the evolution of the same phenotype via similar genetic mechanisms, are accumulating, even in distantly related organisms (Arendt & Reznick 2008). Mosca et al. (2012) looked at adaptation in common genes across four alpine conifer species (Abies alba Mill., Larix decidua Mill., Pinus cembra L., and P.
mugo Turra), and detected seven genes associated with climate shared between two or more species. Of particular interest, one locus of the flavodoxin family protein (0_7001_01) was detected in L. decidua and P. mugo (Mosca et al. 2012, 2013), as well as in both P. monticola and P. strobus in the present study. The same locus was also associated with aridity and precipitation in P. taeda (Eckert et al. 2010a). More evidence of parallel evolution comes from a study of two species in the Fagaceae family (Quercus robur L. and Castanea sativa Mill.; divergence time ~60 MYA), which found that QTLs for the timing of bud burst co-located in homologous chromosomes of the two species. Our results reinforce evidence that some loci may evolve in response to climatic selection pressures in a similar way across different species (Mosca et al. 2012). Such conserved loci across species may be of high importance for adaptation to climate. However, we found that the vast majority of genes detected under selection were not common to both species. Only 2 out of 14 strong candidate genes were common to both species. Furthermore, the number of outlier SNPs and genes common to both species did not differ from random expectations (orthologous SNPs: p = 0.44, 10,000 random draws of 64 P. monticola and 80 P. strobus candidate SNPs; orthologous genes: p = 0.23, 10,000 random draws of 55 P. monticola and 68 P. strobus candidate genes, using R). Hence, although a small number of common outliers were detected, our findings suggest that both species largely adapted to climate via different suites of genes. Closely related organisms often evolve the same phenotype via different genetic mechanisms, showing that many different genetic solutions may exist to solve similar ecological problems (reviewed in Arendt & Reznick 2008). Given the highly polygenic nature of adaptive traits in conifers (Rockman 2012; McKown et al.
2014), the evolution of a similar phenotype via different genes is likely to be common. For example, Grivet et al. (2011) found that selection affected distinct genes in phylogenetically close Mediterranean pine species (divergence time less than 10 MYA), despite their co-occurrence and overlapping environments. Similar comparisons between Picea mariana and P. glauca, which diverged more than 10 MYA, found more adaptive genetic similarities at the gene family level than at the gene level (Prunier et al. 2011). Studies of conifer gene families such as Knox and HD-Zip III transcription factors showed that different types of selection affected different genes of the same family (Namroud et al. 2010). Hence, the redundancy of functions among recently duplicated genes in conifer evolution offers the possibility for different gene family members to adapt differently in different species (Van de Peer et al. 2001; Guillet-Claude et al. 2004). The small number of genes surveyed here and the short sequence reads leading to inaccurate annotation of gene families did not allow us to assess whether common gene families between P. monticola and P. strobus are found more often than expected by chance. With a larger coverage of the genome, we would expect a significant number of gene families to be important for adaptation in both species, if gene functions have been reshuffled among gene family members since common ancestry. The large number of outlier genes that are not shared between P. monticola and P. strobus could also be due to genetic adaptation to different climates (see 1.3.2 Natural distributions and climates). Field experiments in Arabidopsis thaliana showed that adaptation to different environments relied upon genes with different molecular functions (Fournier-Level et al. 2011).
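The random-draw comparison used above to ask whether shared outliers exceed chance expectations can be illustrated with a short Python sketch. It is not the R code used in this study: the function name, the one-sided definition of the p-value (proportion of draws with an overlap at least as large as observed), and the pool size in the usage note are assumptions made for illustration.

```python
import random


def overlap_p_value(pool_size, n_a, n_b, observed, n_draws=10_000, seed=0):
    """Estimate how often random candidate sets overlap as much as observed.

    pool_size: number of orthologous loci both species could share.
    n_a, n_b:  number of candidate loci in species A and species B.
    observed:  observed number of candidates common to both species.
    Returns the fraction of random draws whose overlap is >= observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = range(pool_size)
    hits = 0
    for _ in range(n_draws):
        draw_a = set(rng.sample(pool, n_a))  # random candidates, species A
        draw_b = rng.sample(pool, n_b)       # random candidates, species B
        overlap = sum(1 for locus in draw_b if locus in draw_a)
        if overlap >= observed:
            hits += 1
    return hits / n_draws
```

A call such as `overlap_p_value(150, 64, 80, observed)` would mirror the draws of 64 and 80 candidate SNPs mentioned above, where the pool size of 150 is a placeholder rather than a figure from this study. The same function could in principle be applied to gene families if a reliable annotation were available.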
This is also evidenced by the different spectrums of climatic variables correlated with genetic variation in partial Mantel or environmental association tests between our study species. Furthermore, as previously noted, P. monticola seemed to have adapted to different climates across most of its range (i.e., large northern group) largely via phenotypic plasticity (Rehfeldt et al. 1984; Chuine et al. 2006), which may involve a very different set of genes. Hence, selection likely targeted different genes in the two species as a result of adaptation to different climates.

The question of whether selection is acting on standing variation (soft sweeps) as opposed to new mutations (hard sweeps) arises from the number of common outlier SNPs in P. monticola and P. strobus. Trans-species shared polymorphisms were common among distantly related Picea species, reflecting common ancestry (Bouillé & Bousquet 2005). Most shared polymorphisms were neutral or nearly neutral, and were maintained by large effective population sizes in outcrossing, wind-pollinated tree species (Bouillé & Bousquet 2005). Using the same conserved set of genes as the present study (WHISP sequences), Eckert et al. (2013a) found little evidence for long term lineage-wide adaptive evolution since common ancestry of 11 Pinus subgenus Strobus species, supporting the “nearly neutral theory of evolution” (Ohta 1973; Gillespie 1999). Hence, orthologous SNPs have likely been shared between species since common ancestry, and evidence of selection acting on these SNPs supports an interpretation of adaptive evolution from standing genetic variation rather than de novo mutations.

4. Conclusion and perspectives

We found evidence of local adaptation shaping population structure in both Pinus monticola and P. strobus, and we separated these effects from those of postglacial recolonization and isolation by distance (IBD). We highlighted a number of strong candidate genes for adaptation to climate in each species.
We also detected a number of candidate genes in common to both species, demonstrating how comparison among species can provide novel insights into genes of importance for adaptation. Compared to next-generation sequencing studies, we surveyed a relatively small number of genes, and surely did not identify all shared adaptive polymorphisms between both species. Hence, more genome-wide trans-species comparisons are needed to determine if parallel evolution is common.

Populations of both species and the distribution of suitable habitats are expected to be dramatically affected by climate change (Hamann & Wang 2006; Gray & Hamann 2012; Wang et al. 2012a; Joyce & Rehfeldt 2013). Forest management practices and the introduction of the white pine blister rust drastically reduced populations of both species to less than 10% of pre-settlement stands (Quinby 2000; Jain et al. 2004). This has resulted in fragmented, small, and isolated groups of trees over most of their former geographic range, creating a situation where inbreeding and genetic drift could begin to affect genetic diversity and compromise their adaptive capacity to a changing climate (Buchert 1994; Rajora et al. 1998).

Given the speed and magnitude of climate change, white pine populations are likely to be extirpated from the southern part of their ranges, becoming increasingly maladapted to new local environments, and unable to efficiently colonize emergent suitable northern habitats (Joyce & Rehfeldt 2013). White pine blister rust (Cronartium ribicola J.C. Fisch.) epidemics may further cause drastic population reductions. Thus, P. monticola and P. strobus appear to be poorly equipped to cope with the effects of climate change, and human interventions will likely be required to ensure their persistence. Based on the results of this study, we provide the following recommendations for conservation and future research for P. strobus and P.
monticola:

1) Ex-situ conservation programs (seed banks) should target diversity hotspots such as locations of glacial refugia. In both species, moderately high levels of genetic diversity and high levels of gene flow were found, efficiently redistributing genetic diversity across the ranges. Identified regions of putative glacial refugia are highly genetically diverse and should be prioritized in conservation programs (Hampe & Petit 2005). In particular, ex-situ collections should focus first on populations belonging to the southern genetic groups. These populations were found to have high within- and among-population genetic diversity (Steinhoff et al. 1983; Kim et al. 2010), and were phenotypically differentiated from the northern groups, suggesting the presence of distinct local adaptations (present study; Richardson et al. 2009; Joyce & Rehfeldt 2013). Furthermore, southern populations are adapted to climates that may be prevalent further north in the future. Because rear edge populations are likely to be extirpated under future climate change scenarios (Hampe & Petit 2005; Joyce & Rehfeldt 2013), there is an added urgency to preserve such diversity in ex-situ collections.

The large northern groups of both species should also be well represented in ex-situ collections. For the same reasons as for southern refugia, northern refugia should also be of high conservation priority. A number of seed bank collections are already in place for both P. monticola and P. strobus. Given high levels of gene flow within the large northern groups, the existing extensive collections should be sufficient to preserve most of the genetic diversity, rare localized alleles, and genetic local adaptations associated with each geographic area or genetic sub-group (Gapare et al. 2005). Phylogeographic studies using mitochondrial, chloroplastic, or a large number of SNP markers are needed to detect additional cryptic northern refugia important for conservation.
2) In situ interventions should be used to maintain viable population sizes. Despite high levels of gene flow and high genetic diversity, small and fragmented populations of both species may be prone to genetic drift and loss of genetic variation in future generations (Rajora et al. 1998). In situ interventions can increase population size, decrease genetic drift, and increase the strength of natural selection to allow adaptation to future climates. A number of interventions can be implemented to restore and help establishment of populations of white pines, such as protecting a portion of their current and future habitat, disease control (e.g., Ribes control), prescribed burning to help regeneration, and reforestation with white pine blister rust resistant trees. For P. monticola, it was estimated that most protected areas within the current reserve system will be able to maintain sufficient suitable habitat by 2080 (effective population size > 1000), even under the worst case scenario of no adaptation and no migration (Hamann & Aitken 2013). However, pest and disease distributions may also change with global warming, possibly affecting the survival of the smaller and genetically depauperate northern edge populations. Without sufficient population sizes, riskier strategies such as assisted migration may not be successful (Hewitt et al. 2011).

3) Assisted gene flow should be used to reduce maladaptation. Assisted migration (AM), first proposed by Peters & Darling (1985), can help species at risk adapt and persist under rapid climate change. AM within species ranges, termed “assisted gene flow” (AGF; Aitken & Whitlock 2013), could help mitigate maladaptation in the short and long term in populations of P. monticola and P. strobus. In the short term, AGF has the advantage of increasing the frequency of genotypes from source populations that are adapted to the new climates of the recipient population, increasing survival and productivity of individuals.
Over the long term, AGF increases the rate of adaptation by increasing the frequency of alleles pre-adapted to new climatic conditions (Aitken & Whitlock 2013).

P. strobus is a prime candidate for AGF. This research corroborates previous evidence that local adaptation is a significant driver of population genetic structure, and that genetic variation is correlated with climatic variables, which are important prerequisites for successful AGF (Aitken & Whitlock 2013). Considerable among-provenance variation for adaptive traits, mostly along latitudinal clines, has been described (Joyce & Rehfeldt 2013), suggesting large benefits of AGF programs. Future predictions for 2060 suggested that southern populations such as the southern Virginia provenances may be suitable for transfers as far north as Ontario (Joyce & Rehfeldt 2013).

AGF may be of less value in P. monticola because evidence of local adaptation was weaker than in P. strobus and northern populations were not highly genetically differentiated for adaptive traits (Rehfeldt et al. 1984; Chuine et al. 2006). This suggests that populations mostly adapted to climate via phenotypic plasticity, which could increase the chances of maladaptation because populations may first respond to climate change by plastic responses instead of genetic responses (Aitken & Whitlock 2013). However, given that genetic variation at some loci was associated with climate, AGF may still help to provide new genetic material to fuel long term local adaptation, especially at the northern edge of the species range, where populations have lower genetic diversity. The movement of trees from regions having a higher frequency of rust resistance, such as southern Oregon (Kinloch et al. 2004), may also help survival.
It should be noted that AGF has already been practiced in British Columbia for several years because seeds are often moved upward and northward from seed orchards located at lower elevations within seed planning units (O’Neill et al. 2008). P. monticola seed transfer zones are also very large, allowing large latitudinal and elevational transfers (Campbell & Sugano 1989; Thomas & Lester 1992; Meagher et al. 1999).

The benefits of AGF should be carefully weighed against the potential risks (Aitken & Whitlock 2013). Risks of outbreeding depression and lineage swamping should be low across the range of both species given that populations are highly admixed between genetic groups, and that most transfers would involve south to north transfers into the large, homogeneous, northern groups. Although transfers between the genetically differentiated southern and northern groups could potentially induce outbreeding depression, especially in P. monticola, the benefits of bringing in new genetic background on which selection can act may far outweigh the initial reduction in fitness (Aitken & Whitlock 2013). One other concern with large latitudinal transfers is the risk of disrupting local adaptation to other environmental factors not taken into consideration, such as photoperiod; shorter latitudinal distances or longitudinal transfers across climate gradients are therefore recommended (Aitken & Whitlock 2013). If some populations are locally adapted to particular pests or diseases (e.g., some P. monticola populations are adapted to a specific Cronartium ribicola virulent race), transfers should ensure local adaptation to these factors is not disrupted.

4) Assisted colonization should be used to help colonize new suitable habitat. Very large areas of northern Canada will become suitable for P. monticola and P. strobus in the near future (Hamann & Wang 2006; Gray & Hamann 2012; Wang et al. 2012a; Joyce & Rehfeldt 2013).
However, tree species will likely not be able to colonize these increasingly disjunct habitats as fast as they are created (Aitken et al. 2008). We suggest the use of assisted colonization (AC), the translocation of individuals outside of their current ranges (Hunter 2007; Hoegh-Guldberg et al. 2008), to help white pines track their climatic niches. All contemporary P. strobus ecotypes, classified based on height growth potential, are projected to be strongly displaced northward by 2060 (Joyce & Rehfeldt 2013). The provenances best suited to colonize northern habitats, outside the current range, are from Ontario, Québec, and the Canadian Maritime provinces (Joyce & Rehfeldt 2013). Large areas of new suitable habitat for P. monticola are also expected to become available (Hamann & Wang 2006; Gray & Hamann 2012), and AC in combination with in-situ interventions could ensure establishment of new populations in these areas. Because of the highly mountainous landscape of this region and the low amount of among-population differentiation in P. monticola, shorter upward transfers may be needed and the species may be able to track its climatic niche without the need for intensive AC interventions. As with AGF, the benefits of AC strategies should be carefully evaluated against their risks. Risks include invasion by translocated species and undesired changes to the composition and structure of recipient communities (McLachlan et al. 2007). Finally, rust resistant individuals should be translocated to ensure successful establishment in recipient communities. AC is currently being tested in a case study of whitebark pine (Pinus albicaulis Engelm.), a highly vulnerable white pine species (McLane & Aitken 2012).

5) Marker assisted selection could increase the efficiency of assisted migration. Despite the lack of among-population phenotypic variation detected in previous studies of P. monticola (Rehfeldt et al. 1984; Chuine et al.
2006), we detected evidence of local adaptation, and identified a number of strong candidate genes in this species. This suggests that, even though strategies relying on the presence of local adaptation such as AGF and AC may not be as cost-effective in P. monticola, the detection of more genes implicated in adaptation to climate and marker-assisted selection may significantly increase the benefits of these strategies. Using marker-assisted selection from a larger set of adaptive markers, tree breeders could select for alleles that are adapted to particular climates, and specifically introduce these alleles in recipient populations to fuel local adaptation.

6) Functional studies and genome scans are needed to detect and confirm involvement of candidate genes in adaptation. In this study we highlighted a number of strong candidates for adaptation. These loci may be directly involved in local adaptation or linked to the causal loci. A causal link between genotype, phenotype and fitness is very difficult to establish, and confirmatory evidence will come from a variety of approaches, such as expression studies, phenotypic associations and functional studies (Vasemägi & Primmer 2005; Barrett & Hoekstra 2011). Comparative analysis across multiple lineages or species can also confirm the importance of genes for local adaptation, as demonstrated here. Candidate outlier genes found in common between P. monticola and P. strobus, as well as other conifer species, are thus priority genes for future adaptation studies. The search for adaptive genetic variation can also be complicated when a large proportion of among-population variation is confounded between climatic, spatial and postglacial colonization related factors, as we found in our study species. A number of false positives are expected and validation with an independent set of populations or genotypes is needed.
Furthermore, because the relative performances of environmental and FST outlier methods have been shown to differ largely depending on the demographic scenario (this study; Lotterhos, K., personal communication; De Mita et al. 2013; de Villemereuil et al. 2014), we advocate the use of a number of different methods to identify candidate genes in natural populations. In this study we used a candidate gene approach in which conserved gene sequences were selected from existing conifer genomic resources. This approach was efficient at targeting and detecting important loci for adaptation and was cost effective considering the large genome size and the low levels of linkage disequilibrium in conifers (Grivet et al. 2011; Pavy et al. 2012a). Genome-wide scans now have the potential to allow for the detection of loci under selection in an unbiased way, and to identify the complete genetic architecture of complex traits. New time- and cost-effective technologies, such as reduced representation libraries, can generate hundreds of thousands of SNPs without the need for previous genome sequencing. These methods could be applied to non-model species such as white pines (Parchman et al. 2012).

In conclusion, the survey of genetic variation at common genes in two closely related white pine species has revealed range-wide patterns of genetic variation, previously undetected population genetic structure, as well as a list of strong candidate genes for adaptation. This will be highly valuable for future studies of adaptation in both species, and to inform tree breeding, assisted migration, and conservation programs under projected climate change.

References

Agrawal GK, Jwa N, Rakwal R (2002) A pathogen-induced novel rice (Oryza sativa L.) gene encodes a putative protein homologous to type II glutathione S-transferases. Plant Science, 163, 1153–1160.
Aitken SN, Whitlock MC (2013) Assisted Gene Flow to Facilitate Local Adaptation to Climate Change.
Annual Review of Ecology, Evolution, and Systematics, 44, 367–388.
Aitken SN, Yeaman S, Holliday JA, Wang T, Curtis-McLane S (2008) Adaptation, migration or extirpation: climate change outcomes for tree populations. Evolutionary Applications, 1, 95–111.
Alberto FJ, Aitken SN, Alía R et al. (2013) Potential for evolutionary responses to climate change - evidence from tree populations. Global Change Biology, 19, 1645–1661.
Arendt J, Reznick D (2008) Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends in Ecology & Evolution, 23, 26–32.
Barrett RDH, Hoekstra HE (2011) Molecular spandrels: tests of adaptation at the genetic level. Nature Reviews Genetics, 12, 767–780.
Beaulieu J, Simon J-P (1994) Genetic structure and variability in Pinus strobus in Quebec. Canadian Journal of Forest Research, 24, 1726–1733.
Beaumont MA, Balding DJ (2004) Identifying adaptive genetic divergence among populations from genome scans. Molecular Ecology, 13, 969–980.
Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society B: Biological Sciences, 263, 1619–1626.
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, B, 57, 289–300.
Bialozyt R, Ziegenhagen B, Petit RJ (2006) Contrasting effects of long distance seed dispersal on genetic diversity during range expansion. Journal of Evolutionary Biology, 19, 12–20.
Bingham RT, Hoff RJ, Steinhoff RJ (1972) Genetics of western white pine. USDA Forest Service, Research Paper WO-12.
Blair LM, Granka JM, Feldman MW (2014) On the stability of the Bayenv method in assessing human SNP-environment associations. Human Genomics, 8, 1–13.
Borcard D, Legendre P, Drapeau P (1992) Partialling out the spatial component of ecological variation. Ecology, 73, 1045–1055.
Bouillé M, Bousquet J (2005) Trans-species shared polymorphisms at orthologous nuclear gene loci among distant species in the conifer Picea (Pinaceae): implications for the long-term maintenance of genetic diversity in trees. American Journal of Botany, 92, 63–73.
Bowcock AM, Kidd JR, Mountain JL et al. (1991) Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proceedings of the National Academy of Sciences of the United States of America, 88, 839–43.
Brito PH, Edwards SV (2009) Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica, 135, 439–55.
Brumfield RT, Beerli P, Nickerson DA, Edwards SV (2003) The utility of single nucleotide polymorphisms in inferences of population history. Trends in Ecology & Evolution, 18, 249–256.
Brunsfeld SJ, Sullivan J, Soltis DE, Soltis PS (2001) Comparative phylogeography of northwestern North America: a synthesis. In: Integrating ecological and evolutionary processes in a spatial context (eds Silvertown J, Antonovics J), pp. 319–339. Blackwell Science, Oxford.
Buchert GP (1994) Genetics of white pine and implications for management and conservation. The Forestry Chronicle, 70, 427–434.
Campbell RK, Sugano AI (1989) Seed zones and breeding zones for white pine in the Cascade range of Washington and Oregon. USDA Forest Service Research Paper PNW-RP-407.
Chen J, Källman T, Ma X et al. (2012) Disentangling the roles of history and local selection in shaping clinal variation of allele frequencies and gene expression in Norway spruce (Picea abies). Genetics, 191, 865–881.
Chevin L-M, Lande R, Mace GM (2010) Adaptation, plasticity, and extinction in a changing environment: towards a predictive theory. PLoS Biology, 8, 1–8.
Chuine I, Rehfeldt GE, Aitken SN (2006) Height growth determinants and adaptation to temperature in pines: a case study of Pinus contorta and Pinus monticola. Canadian Journal of Forest Research, 36, 1059–1066.
Clark PU, Dyke AS, Shakun JD et al.
(2009) The last glacial maximum. Science, 325, 710–714.
Coop G, Witonsky D, Di Rienzo A, Pritchard JK (2010) Using environmental correlations to identify loci underlying local adaptation. Genetics, 185, 1411–1423.
Crispo E (2008) Modifying effects of phenotypic plasticity on interactions among natural selection, adaptation and gene flow. Journal of Evolutionary Biology, 21, 1460–1469.
Critchfield W (1986) Hybridization and classification of the white pines (Pinus section Strobus). Taxon, 35, 647–656.
Critchfield W, Little EL (1966) Geographic distribution of the pines of the world. USDA Forest Service Miscellaneous Publication 991.
Cullingham CI, Cooke JEK, Coltman DW (2014) Cross-species outlier detection reveals different evolutionary pressures between sister species. The New Phytologist, in press.
Davis MB (1983) Quaternary history of deciduous forests of eastern North America and Europe. Annals of the Missouri Botanical Garden, 70, 550–563.
Davis MB, Shaw RG (2001) Range shifts and adaptive responses to Quaternary climate change. Science, 292, 673–679.
De Mita S, Thuillet A-C, Gay L et al. (2013) Detecting selection along environmental gradients: analysis of eight methods and their effectiveness for outbreeding and selfing populations. Molecular Ecology, 22, 1383–1399.
De Villemereuil P, Frichot E, Bazin E, François O, Gaggiotti OE (2014) Genome scan methods against more complex models: when and how much should we trust them? Molecular Ecology, 23, 2006–2019.
Dillon SK, Nolan MF, Matter P et al. (2013) Signatures of adaptation and genetic structure among the mainland populations of Pinus radiata (D. Don) inferred from SNP loci. Tree Genetics & Genomes, 9, 1447–1463.
Dombrecht B, Xue GP, Sprague SJ et al. (2007) MYC2 differentially modulates diverse jasmonate-dependent functions in Arabidopsis. The Plant Cell, 19, 2225–2245.
Earl DA, VonHoldt BM (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 4, 359–361. Echt CS, Nelson CD (1997) Linkage mapping and genome length in eastern white pine (Pinus strobus L.). Theoretical and Applied Genetics, 94, 1031–1037. Eckert AJ, Bower AD, González-Martínez SC et al. (2010a) Back to nature: ecological genomics of loblolly pine (Pinus taeda, Pinaceae). Molecular Ecology, 19, 3789–805. Eckert AJ, Bower AD, Jermstad KD et al. (2013a) Multilocus analyses reveal little evidence for lineage-wide adaptive evolution within major clades of soft pines (Pinus subgenus Strobus). Molecular Ecology, 22, 5635–5650. Eckert AJ, Hall BD (2006) Phylogeny, historical biogeography, and patterns of diversification for Pinus (Pinaceae): phylogenetic tests of fossil-based hypotheses. Molecular Phylogenetics and Evolution, 40, 166–182. Eckert AJ, van Heerwaarden J, Wegrzyn JL et al. (2010b) Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics, 185, 969–982. 121  Eckert AJ, Wegrzyn JL, Liechty JD et al. (2013b) The Evolutionary Genetics of the Genes Underlying Phenotypic Associations for Loblolly Pine (Pinus taeda, Pinaceae). Genetics, 195, 1353–1372. Eckert AJ, Wegrzyn JL, Pande B et al. (2009) Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas Fir (Pseudotsuga menziesii var. menziesii). Genetics, 183, 289–298. Endler JA (1977) Geographic variation, speciation, and clines. Princeton University Press, Princeton. Endler JA (1986) Natural selection in the wild. Princeton University Press, Princeton. England PR, Osler GHR, Woodworth LM et al. (2003) Effects of intense versus diffuse population bottlenecks on microsatellite genetic diversity and evolutionary potential. 
Conservation Genetics, 4, 595–604. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611–2620. Eveno E, Collada C, Guevara MA et al. (2008) Contrasting patterns of selection at Pinus pinaster Ait. Drought stress candidate genes as revealed by genetic differentiation analyses. Molecular Biology and Evolution, 25, 417–437. Excoffier L, Hofer T, Foll M (2009) Detecting loci under selection in a hierarchically structured population. Heredity, 103, 285–298. Excoffier L, Laval G, Schneider S (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564–567. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164, 1567–1587. Farjon A (2001) World checklist and bibliography of conifers. Royal Botanic Gardens, Kew. Foll M, Gaggiotti O (2008) A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics, 180, 977–993. Fournier-Level A, Korte A, Cooper MD et al. (2011) A map of local adaptation in Arabidopsis thaliana. Science, 334, 86–89. Fowler DP, Heimburger C (1969) Geographic variation in eastern white pine, 7-year results in Ontario. Silvae Genetica, 18, 123–129. Frichot E, Schoville SD, Bouchard G, François O (2013) Testing for associations between loci and environmental gradients using latent factor mixed models. Molecular Biology and Evolution, 30, 1687–1699. 122  Gapare W, Aitken S, Ritland C (2005) Genetic diversity of core and peripheral Sitka spruce ( (Bong.) Carr) populations: implications for conservation of widespread species. Biological Conservation, 123, 113–123. Garrett PW, Schreiner EJ, Kettlewood H (1973) Geographic varition of eastern white pine in the northeast. 
USDA Forest Service, Northeastern Forest Experiment Station Report Paper, NE-274. Geils BW, Hummer KE, Hunt RS (2010) White pines, Ribes, and blister rust: a review and synthesis. Forest Pathology, 40, 147–185. Genys JB (1987) Provenance variation among different provenances of Pinus strobus from Canada and the United States. Canadian Journal of Forest Research, 17, 228–235. Gernandt D, Lopez G, Garcia S, Liston A (2005) Phylogeny and classification of Pinus. Taxon, 54, 29–42. Gernandt DS, Magallón S, Geada López G et al. (2008) Use of simultaneous analyses to guide fossil-based calibrations of Pinaceae phylogeny. International Journal of Plant Sciences, 169, 1086–1099. Gillespie JH (1999) The role of population size in molecular evolution. Theoretical Population Biology, 55, 145–156. Godbout J, Fazekas A, Newton CH, Yeh FC, Bousquet J (2008) Glacial vicariance in the Pacific Northwest: evidence from a lodgepole pine mitochondrial DNA minisatellite for multiple genetically distinct and widely separated refugia. Molecular Ecology, 17, 2463–2475. Godbout J, Jaramillo-Correa JP, Beaulieu J, Bousquet J (2005) A mitochondrial DNA minisatellite reveals the postglacial history of jack pine (Pinus banksiana), a broad-range North American conifer. Molecular Ecology, 14, 3497–3512. Goslee SC, Urban DL (2007) The ecodist package for dissimilarity-based analysis of ecological data. Journal of Statistical Software, 22, 1–19. Gray LK, Hamann A (2012) Tracking suitable habitat for tree populations under climate change in western North America. Climatic Change, 117, 289–303. Grigg LD, Whitlock C (1998) Late-glacial vegetation and climate change in western Oregon. Quaternary Research, 49, 287–298. Grivet D, Sebastiani F, Alía R et al. (2011) Molecular footprints of local adaptation in two Mediterranean conifers. Molecular Biology and Evolution, 28, 101–116. 
Guillet-Claude C, Isabel N, Pelgas B, Bousquet J (2004) The evolutionary implications of knox-I gene duplications in conifers: correlated evidence from phylogeny, gene mapping, and analysis of functional divergence. Molecular Biology and Evolution, 21, 2232–2245. 123  Günther T, Coop G (2012) Robust identification of local adaptation from allele frequencies. Genetics, early online. Guo X, Ronhovde K, Yuan L et al. (2012) Pyrophosphate-dependent fructose-6-phosphate 1-phosphotransferase induction and attenuation of Hsp gene expression during endosperm modification in quality protein maize. Plant Physiology, 158, 917–929. Hamann A, Aitken SN (2013) Conservation planning under climate change: accounting for adaptive potential and migration capacity in species distribution models. Diversity and Distributions, 19, 268–280. Hamann A, Wang T (2006) Potential effects of climate change on ecosystem and tree species distribution in British Columbia. Ecology, 87, 2773–2786. Hampe A, Petit RJ (2005) Conserving biodiversity under climate change: the rear edge matters. Ecology Letters, 8, 461–467. Hamrick J. (2004) Response of forest trees to global environmental changes. Forest Ecology and Management, 197, 323–335. Hamrick JL, Godt MJW (1996) Effects of life history traits on genetic diversity in plant species. Philosophical Transactions: Biological Sciences, 351, 1291–1298. Hereford J (2009) A quantitative survey of local adaptation and fitness trade-offs. The American Naturalist, 173, 579–88. Hewitt GM (1996) Some genetic consequences of ice ages , and their role in divergence and speciation. Biological Journal of the Linnean Society, 58, 247–276. Hewitt GM (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907–913. Hewitt GM (2004) Genetic consequences of climatic oscillations in the Quaternary. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 359, 183–95. Hewitt N, Klenk N, Smith AL et al. 
(2011) Taking stock of the assisted migration debate. Biological Conservation, 144, 2560–2572. Hoban SM, Borkowski DS, Brosi SL et al. (2010) Range-wide distribution of genetic diversity in the North American tree Juglans cinerea: a product of range shifts, not ecological marginality or recent population decline. Molecular Ecology, 19, 4876–4891. Hoegh-guldberg O, Hughes L, Mclntyre S et al. (2008) Assisted colonization and rapid climate change. Science, 321, 345–346. Holliday JA, Ralph SG, White R, Bohlmann J, Aitken SN (2008) Global monitoring of autumn gene expression within and among phenotypically divergent populations of Sitka spruce (Picea sitchensis). The New Phytologist, 178, 103–122. 124  Holliday JA, Ritland K, Aitken SN (2010) Widespread, ecologically relevant genetic markers developed from association mapping of climate-related traits in Sitka spruce (Picea sitchensis). The New Phytologist, 188, 501–514. Holliday JA, Suren H, Aitken SN (2012) Divergent selection and heterogeneous migration rates across the range of Sitka spruce (Picea sitchensis). Proceedings of the Royal Society B: Biological Sciences, 279, 1675–1683. Howe G, Aitken SN, Neale D et al. (2003) From genotype to phenotype: unraveling the complexities of cold adaptation in forest trees. Canadian Journal of Botany, 81, 1247–1266. Hubisz MJ, Falush D, Stephens M, Pritchard JK (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources, 9, 1322–1332. Hunter ML (2007) Climate change and moving species: furthering the debate on assisted colonization. Conservation Biology, 21, 1356–1358. Ibrahim KM, Nichols RA, Hewitt GM (1996) Spatial patterns of genetic variation generated by different forms of dispersal during range expansion. Heredity, 77, 282–291. Isabel N, Beaulieu J, Thériault P, Bousquet J (1999) Direct evidence for biased gene diversity estimates from dominant random amplified polymorphic DNA (RAPD) fingerprints. 
Molecular Ecology, 8, 477–483. Jackson ST, Overpeck JT, Webb- T, Keattch SE, Anderson KH (1997) Mapped plant-macrofossil and pollen records of late quaternary vegetation change in eastern North America. Quaternary Science Reviews, 16, 1–70. Jackson ST, Webb RS, Anderson KH et al. (2000) Vegetation and environment in Eastern North America during the last glacial maximum. Quaternary Science Reviews, 19, 489–508. Jacobson GL, Dieffenbacher-Krall A (1995) White pine and climate change - insights from the past. Journal of Forestry, 93, 39–42. Jain TB, Graham RT, Morgan P (2004) Western white pine growth relative to forest openings. Canadian Journal of Botany, 34, 2187–2198. Jaramillo-Correa JP, Beaulieu J, Bousquet J (2004) Variation in mitochondrial DNA reveals multiple distant glacial refugia in black spruce (Picea mariana), a transcontinental North American conifer. Molecular Ecology, 13, 2735–47. Jaramillo-Correa JP, Beaulieu J, Khasa DP, Bousquet J (2009) Inferring the past from the present phylogeographic structure of North American forest trees: seeing the forest for the genes. Canadian Journal of Forest Research, 39, 286–307. Jeffrey H (1961) The Theory of Probability. Oxford University Press, Oxford. 125  Jermstad KD, Eckert AJ, Wegrzyn JL et al. (2010) Comparative mapping in Pinus: sugar pine (Pinus lambertiana Dougl.) and loblolly pine (Pinus taeda L.). Tree Genetics & Genomes, 7, 457–468. Johansson M, Primmer CR, Merilä J (2006) History vs. current demography: explaining the genetic population structure of the common frog (Rana temporaria). Molecular Ecology, 15, 975–983. Joost S, Bonin A, Bruford MW et al. (2007) A spatial analysis method (SAM) to detect candidate loci for selection: towards a landscape genomics approach to adaptation. Molecular Ecology, 16, 3955–69. Joyce DG, Rehfeldt GE (2013) Climatic niche, ecological genetics, and impact of climate change on eastern white pine (Pinus strobus L.): guidelines for land managers. 
Forest Ecology and Management, 295, 173–192. Joyce DG, Sinclair RW (2002) Genetic variation in height growth among populations of eastern white pine (Pinus strobus L.) in Ontario. Silvae Genetica, 51, 136–142. Kampranis SC, Damianova R, Atallah M et al. (2000) A novel plant glutathione S-transferase/peroxidase suppresses Bax lethality in yeast. The Journal of biological chemistry, 275, 29207–29216. Kawecki TJ, Ebert D (2004) Conceptual issues in local adaptation. Ecology Letters, 7, 1225–1241. El Kayal W, Allen CCG, Ju CJ-T et al. (2011) Molecular events of apical bud formation in white spruce, Picea glauca. Plant, Cell & Environment, 34, 480–500. Keir KR, Bemmels JB, Aitken SN (2011) Low genetic diversity, moderate local adaptation, and phylogeographic insights in Cornus nuttallii (Cornaceae). American Journal of Botany, 98, 1327–1336. Keller SR (2012) Local adaptation in the flowering time gene network of balsam poplar,. Molecular Biology and Evolution, 10, 3143–3152. Kim M-S, Brunsfeld SJ, McDonald GI, Klopfenstein NB (2003) Effect of white pine blister rust (Cronartium ribicola) and rust-resistance breeding on genetic variation in western white pine (Pinus monticola). Theoretical and Applied Genetics, 106, 1004–1010. Kim M-S, Richardson BA, McDonald GI, Klopfenstein NB (2010) Genetic diversity and structure of western white pine (Pinus monticola) in North America: a baseline study for conservation, restoration, and addressing impacts of climate change. Tree Genetics & Genomes, 7, 11–21. Kinloch BB, Sniezko RA, Dupper GE, Plantation MC (2004) Virulence gene distribution and dynamics of the white pine blister rust pathogen in western North America. Phytopathology, 94, 751–758. Kremer A, Le Corre V (2012) Decoupling of differentiation between traits and their underlying genes in response to divergent selection. Heredity, 108, 375–85. 126  Kremer A, Ronce O, Robledo-Arnuncio JJ et al. 
(2012) Long-distance gene flow and adaptation of forest trees to rapid climate change. Ecology Letters, 15, 378–392. De Lafontaine G, Ducousso A, Lefèvre S, Magnanou E, Petit RJ (2013) Stronger spatial genetic structure in recolonized areas than in refugia in the European beech. Molecular Ecology, 22, 4397–4412. Lee C-R, Mitchell-Olds T (2011) Quantifying effects of environmental and geographical factors on patterns of genetic differentiation. Molecular Ecology, 20, 4631–4642. Legendre P, Fortin M-J (2010) Comparison of the Mantel test and alternative approaches for detecting complex multivariate relationships in the spatial analysis of genetic data. Molecular Ecology Resources, 10, 831–844. Leimu R, Fischer M (2008) A meta-analysis of local adaptation in plants. PLoS One, 3, e4010. Lewontin RC, Krakauer J (1973) Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics, 74, 175–195. Li P, Beaulieu J, Daoust G, Plourde A (1997) Patterns of adaptive genetic variation in eastern white pine (Pinus strobus) from Quebec. Canadian Journal of Forest Research, 27, 199–206. Lim H, Cho M-H, Jeon J-S et al. (2009) Altered expression of pyrophosphate: fructose-6-phosphate 1-phosphotransferase affects the growth of transgenic Arabidopsis plants. Molecules and Cells, 27, 641–649. Lindner M, Maroschek M, Netherer S et al. (2010) Climate change impacts, adaptive capacity, and vulnerability of European forest ecosystems. Forest Ecology and Management, 259, 698–709. Liston A, Robinson WA, Pin D, Alvarez-buylla ER, Aiton P (1999) Phylogenetics of Pinus (Pinaceae) Based on Nuclear Ribosomal DNA Internal Transcribed Spacer Region Sequences taxa of particularly problematic placement including. Molecular Phylogenetics and Evolution, 11, 95–109. Lotterhos KE, Whitlock MC (2014) Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Molecular Ecology, 23, 2178–2192. 
Loyall L, Uchida K, Braun S, Furuya M, Frohnmeyer H (2000) Glutathione and a UV light-induced glutathione S-transferase are involved in signaling to chalcone synthase in cell cultures. The Plant Cell, 12, 1939–1950. Lu P, Joyce DG, Sinclair RW (2003a) Geographic variation in cold hardiness among eastern white pine (Pinus strobus L.) provenances in Ontario. Forest Ecology and Management, 178, 329–340. Lu P, Joyce DG, Sinclair RW (2003b) Effect of selection on shoot elongation rhythm of eastern white pine (Pinus strobus L.) and its implications to seed transfer in Ontario. Forest Ecology and Management, 182, 161–173. 127  Lu P, Sinclair RW, Boult TJ, Blake SG (2005) Seedling survival of Pinus strobus and its interspecific hybrids after artificial inoculation of Cronartium ribicola. Forest Ecology and Management, 214, 344–357. Luikart G, England PR, Tallmon D, Jordan S, Taberlet P (2003) The power and promise of population genomics: from genotyping to genome typing. Nature Reviews. Genetics, 4, 981–94. MacDonald GM, Cwynar LC, Whitlock C (1998) The late Quaternary dynamics of pines in northern North America. In: Ecology and biogeography of Pinus (ed Richardson DM), pp. 122–136. Cambridge University Press, New York. Mandák B, Hadincová V, Mahelka V, Wildová R (2013) European invasion of North American Pinus strobus at large and fine scales: high genetic diversity and fine-scale genetic clustering over time in the adventive range. PLoS One, 8, e68514. Manel S, Joost S, Epperson BK et al. (2010a) Perspectives on the use of landscape genetics to detect genetic adaptive variation in the field. Molecular Ecology, 19, 3760–3772. Manel S, Poncet BN, Legendre P, Gugerli F, Holderegger R (2010b) Common factors drive adaptive genetic variation at different spatial scales in Arabis alpina. Molecular Ecology, 19, 3824–3835. McKown AD, Klápště J, Guy RD et al. 
(2014) Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. The New Phytologist, 203, 535–553. McLachlan JS, Hellmann JJ, Schwartz MW (2007) A framework for debate of assisted migration in an era of climate change. Conservation Biology, 21, 297–302. McLane SC, Aitken SN (2012) Whitebark pine (Pinus albicaulis) assisted migration potential: testing establishment north of the species range. Ecological Applications, 22, 142–153. Meagher MD, Hunt RS (1999) The transferability of western white pine to and within British Columbia based on early survival , environmental damage , and juvenile height. Western Journal of Applied Forestry, 14, 41–47. Mehes MS, Nkongolo KK, Michael P (2007) Genetic analysis of Pinus strobus and Pinus monticola populations from Canada using ISSR and RAPD markers: development of genome-specific SCAR markers. Plant Systematics and Evolution, 267, 47–63. Mehes M, Nkongolo KK, Michael P (2009) Assessing genetic diversity and structure of fragmented populations of eastern white pine (Pinus strobus) and western white pine (P. monticola) for conservation management. Journal of Plant Ecology, 2, 143–151. Mehes-Smith M, Nkongolo KK, Kim NS (2011) A comparative cytogenetic analysis of five pine species from North America, Pinus banksiana, P. contorta, P. monticola, P. resinosa, and P. strobus. Plant Systematics and Evolution, 292, 153–164. Meirmans PG (2012) The trouble with isolation by distance. Molecular Ecology, 21, 2839–2846. 128  Mimura M, Aitken SN (2007) Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis. Heredity, 99, 224–232. Mosca E, Eckert AJ, Di Pierro EA et al. (2012) The geographical and environmental determinants of genetic diversity for four alpine conifers of the European Alps. Molecular Ecology, 21, 5530–5545. 
Mosca E, González-Martínez SC, Neale DB (2013) Environmental versus geographical determinants of genetic structure in two subalpine conifers. The New Phytologist, 201, 180–192. Mueller LA, Goodman CD, Silady RA, Walbot V (2000) AN9, a petunia glutathione S-transferase required for anthocyanin sequestration, is a flavonoid-binding protein. Plant Physiology, 123, 1561–1570. Namroud M-C, Beaulieu J, Juge N, Laroche J, Bousquet J (2008) Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Molecular Ecology, 17, 3599–3613. Namroud M-C, Bousquet J, Doerksen T, Beaulieu J (2012) Scanning SNPs from a large set of expressed genes to assess the impact of artificial selection on the undomesticated genetic diversity of white spruce. Evolutionary Applications, 5, 641–656. Namroud M-C, Guillet-Claude C, Mackay J, Isabel N, Bousquet J (2010) Molecular evolution of regulatory genes in spruces from different species and continents: heterogeneous patterns of linkage disequilibrium and selection but correlated recent demographic changes. Journal of Molecular Evolution, 70, 371–386. Narum SR, Hess JE (2011) Comparison of FST outlier tests for SNP loci under selection. Molecular Ecology Resources, 11 (Suppl 1), 184–94. Naydenov K, Senneville S, Beaulieu J, Tremblay F, Bousquet J (2007) Glacial vicariance in Eurasia: mitochondrial DNA evidence from Scots pine for a complex heritage involving genetically distinct refugia at mid-northern latitudes and in Asia Minor. BMC Evolutionary Biology, 7, 233. Neuenschwander (1999) White pine in the American west, a vanishing species - Can We Save It ? U.S. Department of Agriculture Forest Service General Technical Report RMRS - 35. Nielsen R (2005) Molecular signatures of natural selection. Annual Review of Genetics, 39, 197–218. Nkongolo KK, Michael P, Gratton WS (2002) Identification and characterization of RAPD markers inferring genetic relationships among Pine species. 
Genome, 45, 51–58. Nosil P, Funk DJ, Ortiz-Barrientos D (2009) Divergent selection and heterogeneous genomic divergence. Molecular Ecology, 18, 375–402. Ohta T (1973) Slightly deleterious mutant substitutions in evolution. Nature, 246, 96–98. 129  O'Neill GA, Ukrainetz N, Carlson M et al. (2008) Assisted migration to address climate change in British Columbia: recommendations for interim seed transfer standards . B.C. Ministry of Forest and Range, Research Branch, Technical Report 048. Orsini L, Spanier KI, DE Meester L (2012) Genomic signature of natural and anthropogenic stress in wild populations of the waterflea Daphnia magna: validation in space, time and experimental evolution. Molecular Ecology, 21, 2160–2175. Orsini L, Vanoverbeke J, Swillen I, Mergeay J, De Meester L (2013) Drivers of population genetic differentiation in the wild: isolation by dispersal limitation, isolation by adaptation and isolation by colonization. Molecular Ecology, 22, 5983–5999. Parchman TL, Gompert Z, Mudge J et al. (2012) Genome-wide association genetics of an adaptive trait in lodgepole pine. Molecular Ecology, 21, 2991–3005. Parmesan C (2006) Ecological and evolutionary responses to recent climate change. Annual Review of Ecology, Evolution, and Systematics, 37, 637–669. Pavy N, Namroud M, Gagnon F, Isabel N, Bousquet J (2012a) The heterogeneous levels of linkage disequilibrium in white spruce genes and comparative analysis with other conifers. Heredity, 108, 273–284. Pavy N, Pelgas B, Laroche J et al. (2012b) A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers. BMC Biology, 10, 84. Pelgas B, Bousquet J, Meirmans PG, Ritland K, Isabel N (2011) QTL mapping in white spruce: gene maps and genomic regions underlying adaptive traits across pedigrees, years and environments. BMC Genomics, 12, 145. Peters RL, Darling JDS (1985) The effect greenhouse and nature reserves. BioScience, 35, 707–717. 
Petit RJ, Aguinagalde I, de Beaulieu J-L et al. (2003) Glacial refugia: hotspots but not melting pots of genetic diversity. Science, 300, 1563–1565. Petit RJ, Hampe A (2006) Some evolutionary consequences of being a tree. Annual Review of Ecology, Evolution, and Systematics, 37, 187–214. Petit RJ, Hu FS, Dick CW (2008) Forests of the past : a window to future changes. Science, 320, 1450–1452. Poncet BN, Herrmann D, Gugerli F et al. (2010) Tracking genes of ecological relevance using a genome scan in two independent regional population samples of Arabis alpina. Molecular Ecology, 19, 2896–907. Porter AH (2003) A test for deviation from island-model population structure. Molecular Ecology, 12, 903–915. 130  Price RA, Liston A, Strauss SH (1998) Phylogeny and systematic of Pinus. In: Ecology and biogeography of Pinus (ed Richardson DM), pp. 49–68. Cambridge University Press, Cambridge. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945–959. Prunier J, Gérardi S, Laroche J, Beaulieu J, Bousquet J (2012) Parallel and lineage-specific molecular adaptation to climate in boreal black spruce. Molecular Ecology, 21, 4270–4286. Prunier J, Laroche J, Beaulieu J, Bousquet J (2011) Scanning the genome for gene SNPs related to climate adaptation and estimating selection at the molecular level in boreal black spruce. Molecular Ecology, 20, 1702–1716. Prunier J, Pelgas B, Gagnon F et al. (2013) The genomic architecture and association genetics of adaptive characters using a candidate SNP approach in boreal black spruce. BMC Genomics, 14, 368. Quinby PA (2000) An overview of the conservation of old-growth red and eastern white pine forest in Ontario. Report No. 23, Ancient Forest Research Report. Rajora O, DeVerno L, Mosseler A, Innes D (1998) Genetic diversity and population structure of disjunct Newfoundland and central Ontario populations of eastern white pine (Pinus strobus). 
Canadian Journal of Botany, 76, 500–508. Raymond M, Rousset F (1995) GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. Journal of Heredity, 86, 248–249. Reed TE, Schindler DE, Waples RS (2011) Interacting effects of phenotypic plasticity and evolution on population persistence in a changing climate. Conservation Biology, 25, 56–63. Régnière J, St-Amant R (2007) Stochastic simulation of daily air temperature and precipitation from monthly normals in North America north of Mexico. International Journal of Biometeorology, 51, 415–430. Rehfeldt GE, Ferguson DE, Crookston NL, Rehfeldt E, Crookston L (2008) Quantifying the abundance of co-occurring conifers along inland northwest (USA) climate gradients. Ecology, 89, 2127–2139. Rehfeldt GE, Hoff RJ, Steinhoff RJ (1984) Geographic Patterns of Genetic Variation in Pinus monticola. Botanical Gazette, 145, 229–239. Rehfeldt GE, Ying CCh, Spittlehouse DL, Hamilton DAJ (1999) Genetic responses to climate in Pinus contorta: niche breadth, climate change, and reforestation. Ecological Monographs, 69, 375–407. 131  Remington CL (1968) Suture-zones of hybrid interaction between recently joined biotas. In: Evolutionary Biology (eds Dobzhansky T, Hecht MK, Steere WC), pp. 321–428. Plenum, New York. Richardson BA, Brunsfeld SJ, Klopfenstein NB (2002) DNA from bird-dispersed seed and wind-disseminated pollen provides insights into postglacial colonization and population genetic structure of whitebark pine (Pinus albicaulis). Molecular Ecology, 11, 215–227. Richardson BA, Rehfeldt GE, Kim M-S (2009) Congruent climate-related genecological responses from molecular markers and quantitative traits for western white pine (Pinus monticola). International Journal of Plant Sciences, 170, 1120–1131. Rigault P, Boyle B, Lepage P et al. (2011) A white spruce gene catalog for conifer genome analyses. Plant Physiology, 157, 14–28. 
Ritland K (1996) Estimators for pairwise relatedness and individual inbreeding coefficients. Genetical Research, 67, 175–185. Robledo-Arnuncio JJ (2011) Wind pollination over mesoscale distances: an investigation with Scots pine. The New Phytologist, 190, 222–233. Rockman MV (2012) The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution, 66, 1–17. Rousset F (1997) Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics, 145, 1219–1228. Rousset F (2008) genepop’007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources, 8, 103–106. Savolainen O, Pyhäjärvi T, Knürr T (2007) Gene flow and local adaptation in trees. Annual Review of Ecology, Evolution, and Systematics, 38, 595–619. Shafer ABA, Cullingham CI, Côté SD, Coltman DW (2010) Of glaciers and refugia: a decade of study sheds new light on the phylogeography of northwestern North America. Molecular Ecology, 19, 4589–4621. Smith JE, Sawyer JOJ (1988) Endemic vascular plants of northwestern California and southwestern Oregon. Madrofio, 35, 54–69. Soltis DE, Gitzendanner MA, Strenge DD, Soltis PS (1997) Chloroplast DNA intraspecific phylogeography of plants from the Pacific Northwest of North America. Plant Systematics and Evolution, 206, 353–373. Soltis DE, Morris AB, McLachlan JS, Manos PS, Soltis PS (2006) Comparative phylogeography of unglaciated eastern North America. Molecular Ecology, 15, 4261–93. 132  Sork VL, Aitken SN, Dyer RJ et al. (2013) Putting the landscape into the genomics of trees: approaches for understanding local adaptation and population responses to changing climate. Tree Genetics & Genomes, 9, 901–911. Steinhoff RJ, Joyce DG, Fins L (1983) Isozyme variation in Pinus monticola. Canadian Journal of Forest Research, 13, 1122–1132. Storey JD (2002) A direct approach to false discovery rates. 
Journal of the Royal Statistical Society B: Statistical Methodology, 64, 479–498. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, 100, 9440–9445. Storz J (2005) Using genome scans of DNA polymorphism to infer adaptive population divergence. Molecular Ecology, 14, 671–688. Sun W, Van Montagu M, Verbruggen N (2002) Small heat shock proteins and stress tolerance in plants. Biochimica et Biophysica Acta, 1577, 1–9. Swenson NG, Howard DJ (2005) Clustering of contact zones, hybrid zones, and phylogeographic breaks in North America. The American Naturalist, 166, 581–591. Syring J, del Castillo RF, Cronn R, Liston A (2007a) Multiple nuclear loci reveal the distinctiveness of the threatened , neotropical Pinus chiapensis. Systematic Botany, 32, 703–717. Syring J, Farrell K, Businský R, Cronn R, Liston A (2007b) Widespread genealogical nonmonophyly in species of Pinus subgenus Strobus. Systematic Biology, 56, 163–181. Syring J, Willyard A, Cronn R, Liston A (2005) Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. American Journal of Botany, 92, 2086–2100. Thomas B, Lester DT (1992) An examination of regional, provenance and family variation in cold hardiness of Pinus monticola. Canadian Journal of Forest Research, 22, 1917–1921. Tsutsui K, Suwa A, Sawada K et al. (2009) Incongruence among mitochondrial, chloroplast and nuclear gene trees in Pinus subgenus Strobus (Pinaceae). Journal of Plant Research, 122, 509–521. Van de Peer Y, Taylor JS, Braasch I, Meyer A (2001) The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. Journal of Molecular Evolution, 53, 436–446. Vangestel C, Mergeay J, Dawson D a et al. (2012) Genetic diversity and population structure in contemporary house sparrow populations along an urbanization gradient. Heredity, 109, 163–172. 
133  Vasemägi A, Primmer CR (2005) Challenges for identifying functionally important genetic variation: the promise of combining complementary research strategies. Molecular Ecology, 14, 3623–3642. Vitalis R, Dawson K, Boursot P (2001) Interpretation of variation across marker loci as evidence of selection. Genetics, 158, 1811–1823. Walter R, Epperson BK (2005) Geographic pattern of genetic diversity in Pinus resinosa: contact zone between descendants of glacial refugia. American Journal of Botany, 92, 92–100. Wang T, Campbell EM, O’Neill GA, Aitken SN (2012a) Projecting future distributions of ecosystem climate niches: uncertainties and management applications. Forest Ecology and Management, 279, 128–140. Wang T, Hamann A, Spittlehouse DL, Murdock TQ (2012b) ClimateWNA - high-resolution spatial climate data for western North America. Journal of Applied Meteorology and Climatology, 51, 16–29. Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370. Wellner CA (1962) Silvics of western white pine. USDA Forest Service Misellaneous Publication 26. Intermountain Forest and Range Experiment Station, Ogden, Utah. Wendel GW, Smith HC (1990) Pinus strobus L., eastern white pine. In: Silvics of North America (eds Burns RM, Honkala BH), pp. 476–488. USDA. Whittaker RH (1961) Vegetation history of the Pacific Coast states and the “central” significance of the Klamath region. Madrofio, 16, 5–17. Willats WG, McCartney L, Mackie W, Knox JP (2001) Pectin: cell biology and prospects for functional analysis. Plant Molecular Biology, 47, 9–27. Williams CG (2010) Long-distance pine pollen still germinates after meso-scale dispersal. American Journal of Botany, 97, 846–855. Willing E-M, Dreyer C, van Oosterhout C (2012) Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One, 7, e42649. Willyard A, Ann W, Syring J et al. 
(2007) Fossil calibration of molecular divergence infers a moderate mutation rate and recent radiations for Pinus. Molecular Biology and Evolution, 24, 90–101.
Wright S (1951) The genetical structure of populations. Annals of Eugenics, 15, 323–354.
Zhao ZC, Zhang WR, Yan JP et al. (2010) Over-expression of Arabidopsis DnaJ (Hsp40) contributes to NaCl-stress tolerance. African Journal of Biotechnology, 9, 972–978.
Zwiazek JJ, Renault S, Croser C, Hansen J, Beck E (2001) Biochemical and biophysical changes in relation to cold hardiness. In: Conifer cold hardiness (eds Bigras FJ, Colombo SJ), pp. 165–186. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Appendix 1: Supplementary tables

Table S1. Number of samples from each source. Samples were collected from provenance trials, natural stands and seedbank collections.

Source  Number of trees sampled  Number of populations sampled
Pinus monticola
  Provenance trial
    Lone Lake Trial, Whidbey Island, Washington, USA a  242  43
  Seedbank collection
    Tree Seed Centre, Surrey, British Columbia, Canada b  126  19
  Total  368  62
Pinus strobus
  Provenance trials  345  43
    Orono, Maine, USA c  208  14
    Grand-Mère, Québec, Canada d  41  9
    Saint-Joseph de Lévis, Québec, Canada d  26  7
    St-Gabriel-de-Valcartier, Québec, Canada d  70  28
  Natural stands  50  6
    New Brunswick, Canada  20  1
    Maine, USA  6  1
    New Hampshire, USA  18  3
    Vermont, USA  6  1
  Seedbank collections  448  84
    Laurentian Forestry Centre, Québec, Canada d  359  68
    Atlantic Forestry Centre, New Brunswick, Canada d  89  16
  Total  843  133
a Washington State Department of Natural Resources.
b British Columbia Ministry of Forests, Lands and Natural Resource Operations (MFLNRO).
c United States Department of Agriculture (USDA) Forest Service.
d Canadian Forest Service.

Table S2. Number of individuals sampled per population in Pinus monticola.
Population name  State, country  Number of individuals  Latitude (decimal degrees)  Longitude (decimal degrees)  Elevation (m)
Beavermouth  BC, CAN  6  51.55  -117.48  1143
Beavervale Creek  BC, CAN  6  49.23  -117.45  1470
Birk Creek  BC, CAN  6  51.32  -119.90  670
Buckworth Creek  BC, CAN  6  49.10  -116.83  1200
Currie Ridge  BC, CAN  6  50.25  -118.42  1300
Duck Lake  BC, CAN  6  49.75  -119.22  775
Englishman River  BC, CAN  12  49.17  -124.37  800
Foam Creek  BC, CAN  6  51.97  -119.30  970
Halfmoon Bay Road  BC, CAN  6  49.58  -123.92  690
Hanner Lake  BC, CAN  6  51.08  -118.02  670
Harbour Lakes  BC, CAN  6  51.52  -119.20  909
Kwoiek C.  BC, CAN  5  50.12  -121.88  1400
Morrissey Creek  BC, CAN  6  49.38  -114.98  1500
Nakusp  BC, CAN  6  50.25  -117.83  762
Nanaimo 1  BC, CAN  6  49.20  -124.18  457
Nanaimo 2  BC, CAN  5  49.08  -124.08  472
North of Carnes Creek  BC, CAN  6  51.30  -118.28  775
Pete Lake (Don's Folly)  BC, CAN  12  48.72  -123.97  650
Shelter Bay  BC, CAN  6  50.63  -117.93  575
Shuswap Lake  BC, CAN  5  50.92  -119.33  762
Silence Lake Road 9 KM 34-39  BC, CAN  6  51.75  -119.70  960
St Leon  BC, CAN  6  50.43  -117.83  1021
UBC Research Forest  BC, CAN  6  49.27  -122.58  140
Victoria  BC, CAN  6  48.42  -123.62  381
Chicago Creek  CA, USA  6  41.97  -123.67  1295
Dead Cow  CA, USA  6  42.00  -122.92  1920
Duck Lake  CA, USA  1  41.30  -122.92  1829
Elliot Creek  CA, USA  1  42.00  -123.00  1737
Willow Mtn  CA, USA  3  41.83  -122.23  1981
Beaver Creek  ID, USA  6  48.73  -116.87  914
Bertha Hill  ID, USA  6  46.77  -115.80  1524
Independence Creek  ID, USA  6  46.77  -115.10  1219
Montford Creek  ID, USA  6  47.73  -116.52  914
White Rock  ID, USA  3  47.00  -116.10  1524
Bug Creek  MT, USA  6  47.98  -113.95  1158
Bull Lake  MT, USA  5  48.38  -115.82  671
Flower Creek  MT, USA  4  48.38  -115.57  671

Table S2 (Continued). Number of individuals sampled per population in Pinus monticola.
Population name  State, country  Number of individuals  Latitude (decimal degrees)  Longitude (decimal degrees)  Elevation (m)
Lake McDonald  MT, USA  6  48.55  -113.95  1006
Rainey Creek  MT, USA  6  47.40  -113.63  1189
Fry Meadow  OR, USA  6  45.77  -117.82  1524
Glide  OR, USA  5  43.60  -122.75  1158
Hood River  OR, USA  6  45.47  -121.63  1143
Lakeview  OR, USA  3  42.10  -120.28  2286
LaPine  OR, USA  5  43.90  -121.90  1829
Mount Hood  OR, USA  6  45.27  -121.77  975
Oakridge  OR, USA  6  43.60  -122.12  1372
Pinehurst  OR, USA  6  42.20  -122.30  1646
Prospect  OR, USA  6  42.87  -122.37  1524
Sweet Home  OR, USA  6  44.42  -122.17  1250
Tillamook  OR, USA  6  45.57  -123.30  671
Belfair  WA, USA  6  47.48  -122.90  91
Blue Mountains  WA, USA  6  46.17  -117.58  1311
Everett  WA, USA  6  48.08  -122.13  30
Forks  WA, USA  5  47.93  -124.37  152
Ione  WA, USA  6  48.63  -117.37  628
Mount Rainier  WA, USA  4  47.05  -121.60  1372
Quilcene  WA, USA  6  47.92  -123.05  975
Randle  WA, USA  6  46.30  -121.68  1067
Shelton  WA, USA  6  47.37  -122.10  610
Usk  WA, USA  6  48.30  -117.27  664
White Pass  WA, USA  6  46.57  -121.37  1311

Table S3. Number of individuals sampled per population in Pinus strobus.
Population name  State, country  Number of individuals  Latitude (decimal degrees)  Longitude (decimal degrees)  Elevation (m)
Mason  IL, USA  1  40.33  -90.00  142
Buena Vista State Forest 1  MN, USA  1  46.92  -94.17  426
Buena Vista State Forest 2  MN, USA  1  47.53  -96.52  276
Cass Lake  MN, USA  3  47.83  -95.25  373
Chippewa National Forest 1  MN, USA  1  47.05  -93.92  409
Chippewa National Forest 2  MN, USA  1  47.08  -94.58  417
George Washington State Forest 1  MN, USA  1  47.33  -93.50  397
George Washington State Forest 2  MN, USA  2  47.50  -93.42  415
Marcell  MN, USA  9  47.58  -94.67  414
Oconto River Soclone  MN, USA  1  47.67  -91.50  533
Superior National Forest 1  MN, USA  2  47.67  -92.67  446
Superior National Forest 2  MN, USA  1  47.60  -91.35  594
White Earth State Forest  MN, USA  5  47.32  -95.95  367
Cambridge  NB, CAN  5  45.83  -66.08  54
Adirondack Mountains  NY, USA  6  43.93  -74.83  548
Chemung  NY, USA  1  42.25  -76.75  418
Falls Mills  WV, USA  2  38.78  -80.55  297
Neola  WV, USA  1  37.97  -80.13  620
Pachaug State Forest  CT, USA  10  41.48  -71.83  79
Yale Union Forest Eastford  CT, USA  6  41.92  -75.08  561
Fannin County  GA, USA  9  34.85  -84.32  600
Union County  GA, USA  10  34.83  -83.92  635
Allamakee County  IA, USA  17  43.25  -91.50  371
Worcester County  MA, USA  15  42.50  -72.25  216
Garrett County  MD, USA  11  39.65  -78.75  279
Bowdoinham  ME, USA  6  44.01  -69.95  63
Penobscot County  ME, USA  15  44.85  -68.63  63
Perry Maine  ME, USA  1  44.97  -67.08  23
Searsmont  ME, USA  6  44.35  -69.18  99
Clam River  MI, USA  1  44.17  -85.00  336
Hartwick Pine Virgin Area  MI, USA  10  44.75  -84.67  388
Hiawatha National Forest  MI, USA  1  46.40  -86.65  184
Howard City  MI, USA  2  43.25  -85.50  301
Ishpening  MI, USA  2  46.33  -87.68  423
Moran  MI, USA  3  45.92  -84.88  173
Newaygo  MI, USA  1  43.42  -85.80  214
Ottawa National Forest 1  MI, USA  1  46.87  -89.32  181

Table S3 (Continued). Number of individuals sampled per population in Pinus strobus.
Population name  State, country  Number of individuals  Latitude (decimal degrees)  Longitude (decimal degrees)  Elevation (m)
Ottawa National Forest 2  MI, USA  1  46.58  -89.57  391
Ottawa National Forest 3  MI, USA  1  46.47  -88.88  383
Ottawa National Forest 4  MI, USA  2  46.08  -88.63  492
Lake County  MN, USA  15  48.00  -91.75  425
Allardville  NB, CAN  3  47.57  -65.43  112
Caribou Depot  NB, CAN  6  47.58  -66.25  410
Estey Brook  NB, CAN  6  47.05  -65.92  63
Kouchibouguak National Park  NB, CAN  20  46.80  -64.97  18
Lake George  NB, CAN  6  45.75  -67.08  154
Buncombe  NC, USA  10  35.50  -82.50  738
Graham County  NC, USA  16  36.07  -79.40  196
Jackson County  NC, USA  10  35.33  -83.25  713
Transylvania County  NC, USA  15  35.23  -82.63  691
Carroll County  NH, USA  10  43.75  -71.42  196
Kingston  NH, USA  6  42.97  -71.09  51
Warner  NH, USA  6  43.32  -71.87  254
White Mountain National Forest  NH, USA  6  44.10  -71.84  364
Flat Bay Brook  NL, CAN  10  48.30  -58.60  90
New Bay Pond  NL, CAN  9  49.23  -55.53  105
Trans-Canada Highway Howley Junction  NL, CAN  7  49.33  -57.20  90
Beaver Lake  NS, CAN  5  44.22  -65.35  161
Caledonia  NS, CAN  5  44.35  -65.08  103
Indian Brook  NS, CAN  1  46.70  -60.53  466
Ingonish  NS, CAN  4  46.76  -60.37  62
Shelburne  NS, CAN  6  44.00  -65.33  93
Franklin County  NY, USA  9  44.42  -74.25  500
Ulster County  NY, USA  20  41.75  -74.25  325
Ashland County  OH, USA  15  40.75  -82.25  383
Ohio Findley State Park  OH, USA  10  41.17  -82.25  253
Algoma-Ranwick Mine  ON, CAN  3  47.25  -84.58  337
Antler Lake, Chapleau Township  ON, CAN  6  47.60  -83.70  438
Atikokan  ON, CAN  9  48.63  -92.23  386
Aylmer  ON, CAN  6  42.77  -80.98  229
Borden Township  ON, CAN  5  47.88  -83.17  415
Chapleau  ON, CAN  6  47.13  -82.50  451
Coldwater  ON, CAN  6  44.70  -79.67  282

Table S3 (Continued). Number of individuals sampled per population in Pinus strobus.
Population name  State, country  Number of individuals  Latitude (decimal degrees)  Longitude (decimal degrees)  Elevation (m)
Cranberry  ON, CAN  12  46.00  -80.50  218
Espanola  ON, CAN  6  46.25  -81.77  198
Fonthill (Niagara Region)  ON, CAN  6  43.03  -79.28  192
Kirkland Lake  ON, CAN  6  48.00  -80.00  294
Macdiarmid  ON, CAN  10  49.27  -88.15  336
Minden  ON, CAN  6  45.00  -78.67  310
Napanee  ON, CAN  6  44.25  -76.95  95
Nipigon  ON, CAN  6  48.98  -88.35  283
North Bay  ON, CAN  6  46.32  -79.47  209
Simcoe  ON, CAN  6  42.83  -80.30  211
Wingham  ON, CAN  6  43.88  -81.32  304
Clearfield  PA, USA  6  41.02  -78.43  388
Coburn  PA, USA  6  40.88  -77.47  339
Cooksburg  PA, USA  10  41.33  -79.25  465
Huntington & Warren County  PA, USA  10  40.48  -78.02  194
Madera  PA, USA  6  40.92  -78.50  405
Monroe County  PA, USA  21  41.08  -75.42  585
Potters Mills  PA, USA  6  40.80  -77.63  381
Swamp House Hollow  PA, USA  6  40.83  -77.50  561
Vira  PA, USA  6  40.67  -77.55  411
Brudenell  PE, CAN  6  46.20  -62.57  8
Melville  PE, CAN  6  46.05  -62.87  37
Anticosti Island  QC, CAN  10  49.57  -62.68  146
Baie Cascouia  QC, CAN  6  48.45  -71.47  179
Baie-de-Gaspe-Sud  QC, CAN  5  48.87  -64.68  311
Bier Lake  QC, CAN  6  47.40  -75.50  386
Bishopton  QC, CAN  6  45.58  -71.55  208
Cote de la Miche, Saint-Joachim  QC, CAN  6  47.07  -70.87  133
Cowansville  QC, CAN  6  45.20  -72.67  146
Deux rivieres  QC, CAN  6  46.27  -78.30  159
Fossambault Township  QC, CAN  6  46.88  -71.67  233
Kondiaronk Lake  QC, CAN  6  47.92  -76.83  464
Lac Aux Araignees  QC, CAN  6  45.45  -70.78  499
Lake Temiscamingue  QC, CAN  6  47.23  -79.40  175
L'Isle-aux-Allumettes  QC, CAN  6  45.90  -77.02  122
Montbray Township  QC, CAN  6  48.38  -79.48  423

Table S3 (Continued). Number of individuals sampled per population in Pinus strobus.
Population name  State, country  Number of individuals  Latitude (decimal degrees)  Longitude (decimal degrees)  Elevation (m)
Parke Township  QC, CAN  6  47.62  -69.37  426
Perodeau Township  QC, CAN  6  46.83  -75.17  306
Petite Riviere Croche  QC, CAN  6  47.92  -72.58  233
Pomponne Camp (Ebeddy)  QC, CAN  6  47.17  -77.33  408
Rivard Township  QC, CAN  6  46.25  -75.33  431
Riviere-Bonaventure  QC, CAN  6  48.28  -65.53  263
Ste-Marguerite River  QC, CAN  6  48.28  -69.95  242
St-Gerard D'Yamaska  QC, CAN  6  46.00  -72.83  29
St-Joseph-De-Beauce  QC, CAN  6  46.27  -70.80  232
St-Jovite  QC, CAN  6  46.12  -74.58  244
York River (Gaspe)  QC, CAN  5  48.83  -64.70  27
Greene County  TN, USA  17  36.00  -82.80  712
Killington  VT, USA  6  43.71  -72.82  506
Little River  VT, USA  6  44.48  -73.10  108
Chequamegan National Forest 1  WI, USA  2  45.93  -90.45  458
Chequamegan National Forest 2  WI, USA  1  46.02  -91.47  362
Forest County  WI, USA  11  45.50  -88.50  424
Gliden  WI, USA  5  46.03  -90.70  466
Hiles  WI, USA  2  45.73  -89.00  517
Lincoln County  WI, USA  1  45.18  -88.73  396
Nicolet National Forest 1  WI, USA  2  45.30  -88.52  388
Nicolet National Forest 2  WI, USA  3  45.13  -89.10  465
Three Lakes  WI, USA  1  45.87  -89.08  509
Greenbrier County  WV, USA  17  38.00  -80.50  691

Table S4. Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Primer ID  Gene ID  Primer source  Primer forward  Primer reverse  Annealing T. (°C) a  Selected for SNP detection b
55  0_10049_02  Jermstad et al. (2010)  CTAGCCATGTGAAATCC  TCTCATACCCATCTCC  53  Yes
56  0_12021_01  Jermstad et al. (2010)  GCACAATAGATGGAGAGCAAAC  CGCCTACATCATCCTAACATTACAGAAC  55  Yes
14  0_13552_02  Jermstad et al. (2010)  AAACCCTGCGATGGTTG  ACTTGAATCCTTCTGGTG  55  Yes
85  0_18132_01  Jermstad et al. (2010)  GGGCATTACTTCTTTCAC  CAATACATCGAGGAGG  55  Yes
15  0_382_01  Jermstad et al. (2010)  TTTAGGTCCTCCTGCTG  TATGAGAATCGAGAAAGACTGGATG  58  Yes
11  0_8850_01  Jermstad et al.
(2010)  TGATTCAGGAATAGCGACATGAAC  TTGGCAAGGCAATTTTGAAGCATGGG  60  Yes
12  0_8850_02  Jermstad et al. (2010)  GAGCCATATCAGTCAG  CAAAAATGCCGAATCCC  55  Yes
17  2_10059_02  Jermstad et al. (2010)  GGAAAACAAAGGATAACAGGAG  CCAGTTCGACGTGTAAAG  55  Yes
52  2_1818_01  Jermstad et al. (2010)  CCCGAATCCAAACAGAAC  GACCTCCCAGACCATTATTCC  55  Yes
8  2_1850_01  Jermstad et al. (2010)  TACGGTGGAGTAGAGGGATG  TGGTATGACACTGCTGGAAATAATTGG  55  Yes
45  2_2015_02  Jermstad et al. (2010)  CCTCAATCAAGCATCC  CATCGCCTCTTCAAAC  53  Yes
78  2_3720_01  Jermstad et al. (2010)  AATCTCGGCTCCCTTTC  AAAAGCTCAAGGCGTG  55  Yes
46  2_4281_02  Jermstad et al. (2010)  ACAAAGTGTGCAGCATC  CTCAACTACCCATCATTCTC  58  Yes
84  2_4724_01  Jermstad et al. (2010)  TGTTCTCTGGATCGGAAAGGG  AAACTGGTAGATTTTGGGCTGGCAAGGG  55  Yes

Table S4 (Continued). Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Primer ID  Gene ID  Primer source  Primer forward  Primer reverse  Annealing T. (°C) a  Selected for SNP detection b
1  2_4892_01  Jermstad et al. (2010)  CAACAATCATTCCAGGAG  AACAACAACAACAGCAGCAAC  58  Yes
3  2_9280_01  Jermstad et al. (2010)  TGTTGGGCACTAAGAGG  AAGTTGGAGATCAAGTTGAGG  55  Yes
60  2_9480_01  Jermstad et al. (2010)  GCTGCTCATCTTATTTGTC  GTAACATTCACCCAGG  53  Yes
4  2_9930_01  Jermstad et al. (2010)  CCCTAACATCATCAACAACATCATC  ACAACATCTCCACAAGCTCAAC  55  Yes
61  CL1019Contig1_01  Jermstad et al. (2010)  CAGGCTTGAAGAATCTTAGCAAAAC  GGGGATTCAAATATGCTGG  55  Yes
86  CL1029Contig1_01  Jermstad et al. (2010)  CTCCTAACAATTCCACATC  GTACTGCGAGCACAATTTCC  55  Yes
9  CL1430Contig1_06  Jermstad et al. (2010)  CATCTCTTCACAGTCATC  TTCTTTGGCTATCAGGCTC  55  Yes
62  CL1530Contig1_04  Jermstad et al. (2010)  GGGGAGATAAGAAAAAGAAGAAGAG  CGGAAGAACAAGTTTAACAGCAG  55  Yes
47  CL1933Contig1_06  Jermstad et al. (2010)  GTGGAAGAGAGAAACTTG  GAAGAGGGAATGAGATG  53  Yes
63  CL2117Contig1_03  Jermstad et al. (2010)  ATGCAGCAACTCCAAC  GGATTTTGAGGAGAGTAAAG  53  Yes
48  CL304Contig1_01  Jermstad et al.
(2010)  TGACGGTGAACAGGAAG  CGGGGAAATACGAGATGAAG  58  Yes
25  CL3116Contig1_03  Jermstad et al. (2010)  CAACTTTCCGAATTTCTTCC  CAGGTTACTTTTCACAGG  55  Yes
40  CL3162Contig1_02  Jermstad et al. (2010)  GTGTGATTTCCATTGCC  GCTTGAAGAATTGAGAAACC  56  Yes
65  CL3363Contig1_04  Jermstad et al. (2010)  CAACACCCAACTTTCTTC  GCATTTGCAGAACGAG  53  Yes

Table S4 (Continued). Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Primer ID  Gene ID  Primer source  Primer forward  Primer reverse  Annealing T. (°C) a  Selected for SNP detection b
26  CL3539Contig1_01  Jermstad et al. (2010)  GATGAGATGGTGAAGATG  GTGGAGGAGTGAATATTG  55  Yes
75  CL3602Contig1_03  Jermstad et al. (2010)  AGCAAGTCCAACAAGC  CTTCTTTTTCCACCTTTCC  56  Yes
27  CL4023Contig1_01  Jermstad et al. (2010)  GAAGATGTTAGATTGATAGGTGTGG  TAAGGAAGCTGTGCTCTGG  60  Yes
49  CL4138Contig1_01  Jermstad et al. (2010)  ATAACAACCACATCCAAACC  TGCAAGCAGCCCAAAAAGAAAAAG  58  Yes
41  CL4284Contig1_01  Jermstad et al. (2010)  CAGAAATGGTTGGCAG  AATGTCACAGTGGTGG  53  Yes
66  CL4470Contig1_01  Jermstad et al. (2010)  CCTCATCTACCCATATTAC  GATCCAGACAGACATGCAG  55  Yes
28  CL4552Contig1_01  Jermstad et al. (2010)  GTCTATGTTCTCTTCTGG  CATGTTAGATACTCAAGAGG  56  Yes
68  CL594Contig1_06  Jermstad et al. (2010)  GGAATTGGATATTGAGGG  TGTGTGGTATTGGGTC  53  Yes
69  CL708Contig1_02  Jermstad et al. (2010)  TACCAGCAGAATAAGCAAG  ATTGAGATTATGATCCACCCAC  55  Yes
29  CL730Contig1_04  Jermstad et al. (2010)  CTGGTGCTTGGTCGTAAAAAAATTC  GCTATCAATGCTAATAACAACGGTGGTC  58  Yes
82  CL866Contig1_01  Jermstad et al. (2010)  GCTGTAACTGTAGGTTTTGATG  TGCAATGGATGGAGGAC  55  Yes
31  UMN_1142_01  Jermstad et al. (2010)  TTGGGGGGATTGAGTAG  GAAAAACTGTAGGTGAATGCACAAG  53  Yes
32  UMN_1590_01  Jermstad et al. (2010)  CGATGCCTTTTTTAAGTCAG  CGAGAATAGGATTTTCAGGAAG  60.8  Yes
44  UMN_2399_01  Jermstad et al. (2010)  CGTCTGGAATGTGAAGAAGTATTG  TTACTAGGGTTTCTAGGGTTTG  55  Yes

Table S4 (Continued).
Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Primer ID  Gene ID  Primer source  Primer forward  Primer reverse  Annealing T. (°C) a  Selected for SNP detection b
51  UMN_4361_01  Jermstad et al. (2010)  CCTTCTATTTGAATCCCTTG  CATAGTAACAGCCTACAG  55  Yes
13  UMN_5867_01  Jermstad et al. (2010)  GGATGTAGTTGAGTGG  TCTGGACCCCTTCATTG  53  Yes
74  UMN_6899_04  Jermstad et al. (2010)  CGGCACGAGGATGAATTTCAAGAGAAAG  TTTGCGTAGCGGGGATAGAG  55  Yes
83  UMN_927_01  Jermstad et al. (2010)  GCAATGAGGGATTGAATTAC  TTGGAAGAATACAAGGCAGG  55  Yes
88  0_10240_01  Eckert et al. (2010b)  TGACGGTCTACATAGG  CTCCCATTTTCTTCCC  53  Yes
57  0_12517_01  Jermstad et al. (2010)  CACATGCTCTTGATGAGG  TTGGTGCTATGGGCTTTGG  55  No
58  0_16015_01  Jermstad et al. (2010)  GCATTCTAGCTGTGTTC  CTATTGTTTTGTTGCCCC  NA  No
33  0_17419_03  Jermstad et al. (2010)  GCTATTCAACGACGAGG  GGGCATAGCAAGTCAAG  NA  No
87  0_18830_01  Jermstad et al. (2010)  ACACTGCTTTCACCCTTCC  TACAGTTCACCATTCCGCACCC  62.6  No
79  0_2643_01  Jermstad et al. (2010)  GGACAAATCCCTTTGAAC  AGCTTCAATGGCTGCTG  53  No
59  0_3046_01  Jermstad et al. (2010)  TATCTTGGGCAGTGGTC  CGCCTCTTCTTATTCATC  54  No
38  0_439_02  Jermstad et al. (2010)  GGATTCTTATGAGAAACTGGTTTTTGTGTGGCTGCGG  60  No
16  0_6566_02  Jermstad et al. (2010)  GCTCAAAAGAGGGGACTTTATTTAC  CATGCTAACAGATTACTCAC  60.8  No
34  0_9082_01  Jermstad et al. (2010)  CAGTTGCATATCGAGAAG  TGCTCTGTTTCAGTCC  NA  No

Table S4 (Continued). Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Primer ID  Gene ID  Primer source  Primer forward  Primer reverse  Annealing T. (°C) a  Selected for SNP detection b
37  2_1405_01  Jermstad et al. (2010)  CATTTTGCAGAGGCAAG  TGCGTAAGGCAGAACAG  58  No
35  2_3141_01  Jermstad et al. (2010)  GTTTTCATATTTGGCTGCTC  GTGCTCTCAGTTAGAAG  NA  No
18  2_5996_01  Jermstad et al. (2010)  AAGCCAGGAGTGAACATAAG  ATTTATTTCATAAGAACAGCCCCCGAG  55  No
2  2_6413_01  Jermstad et al.
(2010)  TCTTCATCCATAGCTCCATGTC  GAACAGCACTGGAACTGGCAATAC  62.6  No
19  2_6618_01  Jermstad et al. (2010)  ATGACTGCCGCAAAGTAACCACCC  CAAGTTCCTATTGATGCTCTTTTCC  NA  No
36  2_7961_01  Jermstad et al. (2010)  CCCAAGCTAAGGAAAGGCCT  CCTGGCCGTTTTTCATCATG  NA  No
20  CL1045Contig1_03  Jermstad et al. (2010)  GTTGGCATCACAGTATG  GTGGAAAATGGGTTGG  NA  No
77  CL1061Contig1_03  Jermstad et al. (2010)  TTCGGAAGATGCGACAG  TGGATGGAGGTGGAAAG  NA  No
39  CL1104Contig1_03  Jermstad et al. (2010)  GGTTGTGTTCTCCTTC  AGTGCCAGCAATCAGTG  56  No
21  CL1205Contig1_04  Jermstad et al. (2010)  AACTCGGTGTTTTCGG  ATCTTGGTCTCAGGGTC  NA  No
5  CL1437Contig1_03  Jermstad et al. (2010)  GGAGGAAAGAAGAAAGG  TGTGGTGGGCTTGTAAC  NA  No
22  CL1888Contig1_02  Jermstad et al. (2010)  GTCCATATCATTGTCAACC  ATGTGTCCACCATTGC  NA  No
23  CL1888Contig1_03  Jermstad et al. (2010)  GGAAGAGGTATGATGG  GTCAGCTATTGATGTTGAG  NA  No
10  CL1920Contig1_01  Jermstad et al. (2010)  TGGCAACTTTGTTGGGG  TATGGGGTGAGAAGAGG  53  No

Table S4 (Continued). Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Primer ID  Gene ID  Primer source  Primer forward  Primer reverse  Annealing T. (°C) a  Selected for SNP detection b
76  CL2108Contig1_02  Jermstad et al. (2010)  GCCATAAAGGATGACCAG  TTGGGGGACAAGGATTC  NA  No
64  CL2166Contig1_01  Jermstad et al. (2010)  AGTCTCTCTCCTTGTG  CTGTCTTCCTCTATATGTTC  NA  No
6  CL2172Contig1_09  Jermstad et al. (2010)  GATTTGCTGGAGATGATG  TGAGGAGAATAGGGTG  NA  No
7  CL382Contig1_06  Jermstad et al. (2010)  GATATGGAGTATGGGG  GGATGGGAAGATTTCAC  53  No
42  CL4354Contig1_01  Jermstad et al. (2010)  AGAGATTCCCCATCCAC  TTACAGACCTTTTTGACTACTTCCC  60.8  No
54  CL4737Contig1_03  Jermstad et al. (2010)  GATTGGGGGACTTATG  AAAGATCGAGAGCAGG  NA  No
67  CL533Contig2_04  Jermstad et al. (2010)  CAAAATGGTGGGAAGAG  AAAGGAATGGGTGCGAG  NA  No
43  CL813Contig1_03  Jermstad et al. (2010)  CAGACAGAGTCATTAATCTCTCAG  TGAAATTATTGTGGATGGGG  58  No
80  CL814Contig1_06  Jermstad et al.
(2010)  GACTAAGATATGGCTGAGG  GAAGGATATTATAGGGCTTTTGAGG  60.8  No
70  CL91Contig1_01  Jermstad et al. (2010)  CCAGAAATAAGTAATCCTCAAGCC  AAATCAGACTTTAGAGAGCC  58  No
30  UMN_1037_01  Jermstad et al. (2010)  CATCTTCCATTTCCCC  TTGCTGCAACTTCCAC  NA  No
71  UMN_2415_03  Jermstad et al. (2010)  TGTATTCGTTTTCCAGG  ACCCACGAAAAATCAAACAG  NA  No
50  UMN_3055_01  Jermstad et al. (2010)  CATCTGGTTTCTCTGG  CTGTGCTTCAAATCTGTC  55  No
72  UMN_4156_02  Jermstad et al. (2010)  CAGTTTCATCTCCACTACCTTTTGTC  GTAATACGTGTGTGTCTGGCGTCGTTC  NA  No

Table S4 (Continued). Primer sequences and annealing temperatures used for the resequencing of 96 candidate genes for growth, phenology and abiotic stress resistance.

Primer ID  Gene ID  Primer source  Primer forward  Primer reverse  Annealing T. (°C) a  Selected for SNP detection b
81  UMN_5272_01  Jermstad et al. (2010)  AGACTGTTGAGAGAGCCTATGCACCATCTTGACAAAATTGCC  NA  No
73  UMN_6852_02  Jermstad et al. (2010)  TTCCTCCCCTTCATTC  CAACTGCTTCAAATACGG  53  No
53  UMN_CL309Contig1_03  Jermstad et al. (2010)  GCGATTTCGGGTATGTTCTATG  AGTGGACTTTCTGGCTG  NA  No
95  0_1320_01  Eckert et al. (2010a)  TGTCGGTCGGATTCTAC  CTGCCTGAGAAACTTG  NA  No
96  0_5488_01  Eckert et al. (2010a)  AGTGTTTTGGTCGAGGAATACAGTGGG  CGGCGTTGTGTCGTGTGCTGATATAG  NA  No
94  4cl-Pta  Grivet et al. (2011)  TCTGGCTCCTGCGGAACAGT  AGGAACGACTGCTGCGTCAG  NA  No
89  CL3949Contig1  Eckert et al. (2010b)  GCGTAATGGTAAAGGG  GGTCAAGTTGTAGAGG  NA  No
92  CN637306.1  Eckert et al. (2009)  CTAAACAATGGGAAGGG  ATCTCGTTGTCCGTTC  53  No
90  dhn1  Eveno et al. (2008)  GAAGAAAGGGTCGAAGGACAA  GTGCTTTCCATCACCAGG  NA  No
91  dhn2  Eveno et al. (2008)  CTGCAGAGACTGTGCCTGAGC  CCAGGGAGCTTTTCCTTGATCT  55  No
93  LEA-EMB11  Eckert et al. (2009)  CTCCGGGTATACAGCTCGC  GACTTTCTTGAAAGAAGCTTCTGC  NA  No
a: NA = no PCR amplification.
b: 47 high quality sequences were selected for SNP detection after visual inspection (see “2.2.2. DNA sequencing”).

Table S5. Successful SNPs (Sequenom iPlex Gold technology).

SNP  Species  Class a  Type  Pos  Amplicon
P-001  P. monticola  III  C/T  206  0_10054_01
P-002  P.
monticola  II  G/T  272  0_11270_01
P-003  P. monticola  II  G/T  88  0_11270_01
P-004  P. monticola  II  G/A  351  0_12156_02
P-005  P. monticola  II  T/A  147  0_12216_02
P-006  P. monticola  II  T/C  215  0_12745_01
P-007  P. monticola  II  G/A  306  0_12978_02
P-008  P. monticola  III  G/T  161  0_1347_01
P-009  P. monticola  II  A/G  310  0_13957_02
P-010  P. monticola  II  C/A  296  0_14221_01
P-011  P. monticola  II  T/C  420  0_14221_01
P-012  P. monticola  II  T/C  103  0_14837_01
P-014  P. monticola  II  T/C  287  0_15187_01
P-015  P. monticola  II  A/G  272  0_15991_01
P-018  P. monticola  II  T/A  242  0_16889_02
P-019  P. monticola  II  G/T  146  0_17938_01
P-022  P. monticola  II  A/G  100  0_2576_02
P-023  P. monticola  II  A/G  289  0_3192_01
P-024  P. monticola  II  T/A  341  0_350_01
P-027  P. monticola  II  T/C  133  0_4032_02
P-028  P. monticola  II  G/C  320  0_4032_02
P-029  P. monticola  II  G/T  97  0_4756_01
P-030  P. monticola  II  C/G  155  0_6047_02
P-031  P. monticola  II  T/C  411  0_6448_02
P-032  P. monticola  III  G/A  317  0_6683_01
P-033  P. monticola  II  G/A  249  0_6878_01
P-034  P. monticola  II  C/T  270  0_7001_01
P-035  P. monticola  II  C/T  134  0_771_01
P-036  P. monticola  II  G/T  281  0_771_01
P-037  P. monticola  II  C/T  359  0_846_01
P-038  P. monticola  II  G/T  94  0_8683_01
P-039  P. monticola  II  T/C  371  0_9408_01
P-040  P. monticola  II  G/T  178  2_2501_01
Q-001  P. monticola  II  C/A  385  2_2952_01
Q-002  P. monticola  II  A/G  48  2_3465_01
Q-004  P. monticola  II  T/C  231  2_3852_01
Q-005  P. monticola  II  G/A  489  2_3867_02
Q-006  P. monticola  II  A/G  272  2_5483_02
Q-008  P. monticola  II  G/A  156  2_5724_02
Q-010  P. monticola  II  A/G  314  2_6052_01

Table S5 (Continued). Successful SNPs (Sequenom iPlex Gold technology).

SNP  Species  Class a  Type  Pos  Amplicon
Q-011  P. monticola  III  A/G  226  2_6313_01
Q-012  P. monticola  III  C/T  322  2_2960_02
Q-013  P. monticola  II  G/C  129  2_6491_01
Q-014  P. monticola  II  T/C  320  2_6491_01
Q-016  P. monticola  II  G/T  376  2_684_01
Q-018  P. monticola  II  G/A  298  2_7189_01
Q-020  P. monticola  II  T/A  65  2_7213_02
Q-021  P.
monticola  II  C/T  294  2_7532_01
Q-022  P. monticola  II  T/C  370  2_7532_01
Q-023  P. monticola  II  C/A  428  2_7852_01
Q-026  P. monticola  II  T/C  105  2_8627_01
Q-028  P. monticola  II  G/T  39  2_9466_01
Q-029  P. monticola  II  C/T  202  2_9542_01
Q-030  P. monticola  III  A/G  284  0_15867_01
Q-031  P. monticola  II  A/T  131  2_9665_01
Q-032  P. monticola  II  T/C  391  2_9665_01
Q-033  P. monticola  II  G/T  160  CL1367Contig1_03
Q-034  P. monticola  II  G/A  112  CL1692Contig1_05
Q-036  P. monticola  II  T/C  256  CL1698Contig1_01
Q-037  P. monticola  II  C/A  291  CL1767Contig1_02
Q-038  P. monticola  II  A/G  175  CL1852Contig1_01
R-001  P. monticola  III  A/G  248  0_10267_01
R-002  P. monticola  III  A/G  64  0_10706_01
R-003  P. monticola  III  G/A  78  0_11324_01
R-004  P. monticola  III  C/A  269  0_11508_01
R-005  P. monticola  III  G/T  317  0_11980_01
R-007  P. monticola  III  A/G  437  0_12329_02
R-008  P. monticola  III  G/A  331  0_12730_01
R-009  P. monticola  III  G/T  40  0_820_02
R-010  P. monticola  III  C/T  302  0_13237_01
R-011  P. monticola  III  C/G  304  0_13913_02
R-013  P. monticola  III  A/G  200  0_15361_01
R-014  P. monticola  III  A/T  266  0_17017_01
R-015  P. monticola  III  C/T  186  0_17247_02
R-016  P. monticola  III  A/T  224  0_3128_02
R-017  P. monticola  III  A/G  333  0_4105_01
R-018  P. monticola  III  G/A  180  0_7009_01
R-019  P. monticola  III  C/T  177  0_7844_01
R-020  P. monticola  III  C/A  225  0_8737_01
R-022  P. monticola  II  T/C  194  0_9408_01

Table S5 (Continued). Successful SNPs (Sequenom iPlex Gold technology).

SNP  Species  Class a  Type  Pos  Amplicon
R-023  P. monticola  III  T/A  118  0_9749_01
R-024  P. monticola  III  C/A  272  0_9922_01
R-025  P. monticola  III  A/G  201  1_1609_01
R-026  P. monticola  III  G/A  213  1_1609_01
R-027  P. monticola  III  C/A  135  CL180Contig1_03
R-028  P. monticola  II  G/A  162  2_3852_01
R-029  P. monticola  II  G/T  102  2_3867_02
R-031  P. monticola  III  A/G  428  CL1966Contig1_05
R-034  P. monticola  III  T/C  149  CL1634Contig1_03
R-035  P. monticola  III  G/A  159  CL1879Contig1_02
R-036  P. monticola  III  G/T  63  CL2637Contig1_04
R-037  P.
monticola  III  G/A  88  CL3036Contig1_01
R-038  P. monticola  II  G/T  103  CL3097Contig1_01
R-039  P. monticola  III  T/C  128  CL3321Contig1_03
R-040  P. monticola  II  T/C  112  CL3770Contig1_01
S-001  P. monticola  II  C/T  266  UMN_5867_01
S-003  P. monticola  II  C/A  164  0_13552_02
S-004  P. monticola  II  C/T  314  0_13552_02
S-005  P. monticola  II  G/C  65  2_10059_02
S-006-2  P. monticola  III  A/G  125  2_6457_01
S-007  P. monticola  II  C/T  180  CL3539Contig1_01
S-008  P. monticola  II  C/A  290  CL3539Contig1_01
S-009  P. monticola  III  C/T  85  UMN_1142_01
S-010  P. monticola  II  C/G  416  UMN_1590_01
S-011  P. monticola  III  C/T  96  UMN_2399_01
S-012  P. monticola  III  A/G  348  UMN_2399_01
S-014  P. monticola  III  T/C  169  CL4138Contig1_01
S-015  P. monticola  III  G/A  358  UMN_4361_01
S-016  P. monticola  III  C/T  271  2_9480_01
S-017  P. monticola  III  G/A  248  CL4470Contig1_01
S-018  P. monticola  III  T/A  503  CL4470Contig1_01
S-019-2  P. monticola  III  C/G  296  2_9603_01
S-020  P. monticola  II  G/T  345  2_4724_01
S-021  P. monticola  II  G/C  537  2_4724_01
S-022  P. monticola  III  T/C  345  0_18132_01
S-023  P. monticola  III  G/A  277  0_10240_01
S-024  P. monticola  III  G/A  172  0_10240_01
S-025  P. monticola  II  G/T  286  CL1430Contig1_06
S-028  P. monticola  II  C/T  237  2_3591_03
S-029  P. monticola  II  A/G  347  2_8852_01

Table S5 (Continued). Successful SNPs (Sequenom iPlex Gold technology).

SNP  Species  Class a  Type  Pos  Amplicon
S-030  P. monticola  III  C/T  178  0_13058_01
S-034  P. monticola  III  C/T  191  2_10212_01
S-035  P. monticola  III  G/T  215  2_5064_01
S-036  P. monticola  III  G/T  129  2_6355_02
Q-024, O-021  P. monticola, P. strobus  I  A/G  436  2_7852_01
M-001  P. monticola, P. strobus  I  A/T  269  0_13978_01
M-002  P. monticola, P. strobus  I  T/A  288  0_13978_01
M-003  P. monticola, P. strobus  I  T/C  142  0_14837_01
M-006  P. monticola, P. strobus  I  T/A  47  0_18261_01
M-007  P. monticola, P. strobus  I  C/T  174  0_18267_01
M-008  P. monticola, P. strobus  I  C/T  306  0_18267_01
M-010  P. monticola, P. strobus  I  A/G  133  0_3073_01
M-011  P. monticola, P.
strobus  I  G/T  241  0_3073_01
M-012  P. monticola, P. strobus  I  T/C  120  0_3192_01
M-013  P. monticola, P. strobus  I  C/A  370  0_3192_01
M-014  P. monticola, P. strobus  I  G/A  119  0_4756_01
M-015  P. monticola, P. strobus  I  C/A  224  0_8683_01
M-016  P. monticola, P. strobus  I  G/A  371  0_8683_01
M-017  P. monticola, P. strobus  I  A/G  244  0_8844_01
M-018  P. monticola, P. strobus  I  T/C  99  0_9462_01
M-025  P. monticola, P. strobus  I  T/C  93  CL1536_Contig1_03
M-026  P. monticola, P. strobus  I  G/T  180  CL1694Contig1_02
M-027  P. monticola, P. strobus  I  T/A  260  CL1694Contig1_02
M-028  P. monticola, P. strobus  I  C/A  150  0_18267_01
M-029  P. monticola, P. strobus  I  T/C  253  CL1806Contig1_01
M-030  P. monticola, P. strobus  I  T/A  30  CL1905Contig1_03
M-031  P. monticola, P. strobus  I  C/T  328  CL2332Contig1_01
M-032  P. monticola, P. strobus  I  C/T  281  CL3007Contig1_02
M-034  P. monticola, P. strobus  I  G/A  169  CL3097Contig1_01
S-038, G-008  P. monticola, P. strobus  I  G/A  490  GQ0015.BR_K18
S-039, G-020  P. monticola, P. strobus  I  C/T  280  GQ0026.BR_B03
S-040, G-023  P. monticola, P. strobus  I  G/A  575  GQ0045.B3_E18
T-016  P. monticola, P. strobus  I  A/G  198  2_3720_01
T-019  P. monticola, P. strobus  I  A/T  361  2_4724_01
T-026  P. monticola, P. strobus  I  C/T  179  CL4023Contig1_01
T-028  P. monticola, P. strobus  I  C/T  91  2_9280_01
T-029  P. monticola, P. strobus  I  G/A  109  CL866Contig1_01
T-031  P. monticola, P. strobus  I  G/C  197  0_2433_01
G-001  P. strobus  III  G/C  1354  GQ0015.B3.r_B10
G-002  P. strobus  III  C/T  234  GQ0254.B7_N02

Table S5 (Continued). Successful SNPs (Sequenom iPlex Gold technology).

SNP  Species  Class a  Type  Pos  Amplicon
G-003  P. strobus  III  G/T  802  GQ0047.B3_H11
G-004  P. strobus  III  T/C  245  GQ033.TB_H23
G-005  P. strobus  III  C/T  577  GQ0011.B3.r_G02
G-009  P. strobus  III  G/T  126  GQ00410.B3_B06
G-010  P. strobus  III  G/A  313  GQ0045.B3_N12
G-011  P. strobus  III  G/A  151  GQ0025.BR_C19
G-012  P. strobus  III  C/T  175  GQ0045.B3_N12
G-014  P. strobus  III  A/G  184  GQ0081.BR.1_D09
G-015  P.
strobus  III  G/A  161  GQ0048.TB_H08
G-017  P. strobus  III  G/T  250  GQ00410.B3_G18
G-019  P. strobus  III  C/G  582  GQ00612.B3_J14
G-021  P. strobus  III  T/C  609  GQ033.TB_H23
G-022  P. strobus  III  G/A  258  GQ0032.TB_I23
G-025  P. strobus  III  T/C  268  GQ0042.BR_E10
G-026  P. strobus  III  C/T  213  GQ0206.B3_C13
G-027  P. strobus  III  A/G  220  GQ0073.TB_E24
G-028  P. strobus  III  C/A  115  GQ0162.B3.r_L01
G-029  P. strobus  III  C/A  521  GQ044.B3.r_N02
G-030  P. strobus  III  A/G  100  GQ0026.B3.r_024
G-031  P. strobus  III  C/A  432  GQ011.BR_F15
G-033  P. strobus  III  A/G  357  GQ0132.B3_K05
G-034  P. strobus  III  C/A  200  AGP6
G-035  P. strobus  III  C/A  518  IFG8612
G-036  P. strobus  III  C/A  320  IFG8612
M-004  P. strobus  III  T/C  259  0_1688_02
M-009  P. strobus  II  G/T  275  0_2433_01
M-019  P. strobus  III  G/C  297  2_2799_03
M-021  P. strobus  II  A/G  137  2_7189_01
M-022  P. strobus  II  A/G  55  2_7852_01
N-002  P. strobus  II  C/G  242  0_11270_01
N-003  P. strobus  III  C/T  221  0_11649_03
N-004  P. strobus  III  A/G  71  0_11649_03
N-005  P. strobus  II  A/G  166  0_12156_02
N-006  P. strobus  II  G/T  98  0_12216_02
N-007  P. strobus  II  A/G  320  0_12745_01
N-008  P. strobus  II  G/C  214  0_12978_02
N-010  P. strobus  II  G/A  166  0_13957_02
N-011  P. strobus  II  A/G  285  0_13957_02
N-012  P. strobus  II  A/T  161  0_14221_01
N-013  P. strobus  II  C/T  468  0_14221_01

Table S5 (Continued). Successful SNPs (Sequenom iPlex Gold technology).

SNP  Species  Class a  Type  Pos  Amplicon
N-015  P. strobus  II  C/T  164  0_15187_01
N-016  P. strobus  III  C/T  312  0_15762_01
N-017  P. strobus  II  T/C  276  0_15991_01
N-018  P. strobus  III  C/A  139  0_16619_01
N-019  P. strobus  III  C/A  422  0_16619_01
N-020  P. strobus  II  C/T  235  0_16889_02
N-021  P. strobus  II  T/C  270  0_17938_01
N-022  P. strobus  II  G/T  275  0_18261_01
N-023  P. strobus  II  G/A  80  0_2576_02
N-024  P. strobus  II  C/T  363  0_3073_01
N-028  P. strobus  II  G/T  146  0_4032_02
N-029  P. strobus  II  T/C  221  0_6047_02
N-030  P. strobus  II  G/T  126  0_6448_02
N-032  P. strobus  II  G/A  352  0_6878_01
N-033  P. strobus  II  C/A  59  0_7001_01
N-034  P.
strobus  II  C/A  94  0_7001_01
N-035  P. strobus  II  C/G  367  0_771_01
N-036  P. strobus  II  G/A  198  0_846_01
N-037  P. strobus  II  A/G  108  0_9408_01
N-038  P. strobus  II  T/C  193  2_2501_01
N-039  P. strobus  III  G/A  158  2_3726_02
N-040  P. strobus  III  G/T  243  2_4107_01
O-001  P. strobus  III  G/A  326  0_15762_01
O-002  P. strobus  II  C/T  152  0_8844_01
O-003  P. strobus  II  G/A  289  2_2952_01
O-004  P. strobus  II  G/A  317  2_2952_01
O-005  P. strobus  II  G/A  118  2_3465_01
O-008  P. strobus  II  C/A  155  2_3852_01
O-009  P. strobus  II  C/T  181  2_3867_02
O-010  P. strobus  II  C/T  166  2_5483_02
O-011  P. strobus  III  C/A  264  2_5668_01
O-012  P. strobus  II  G/A  133  2_5724_02
O-013  P. strobus  II  G/T  267  2_6052_01
O-014  P. strobus  II  A/G  255  CL1698Contig1_01
O-015  P. strobus  II  C/T  121  2_6491_01
O-016  P. strobus  III  T/C  288  2_6731_01
O-017  P. strobus  II  T/C  379  2_684_01
O-019  P. strobus  II  C/A  100  2_7213_02
O-020  P. strobus  II  G/A  32  2_7532_01
O-022  P. strobus  II  T/C  442  2_7852_01

Table S5 (Continued). Successful SNPs (Sequenom iPlex Gold technology).

SNP  Species  Class a  Type  Pos  Amplicon
O-024  P. strobus  II  C/A  173  2_8627_01
O-025  P. strobus  II  G/T  120  2_9466_01
O-026  P. strobus  II  G/A  121  2_9542_01
O-027  P. strobus  II  C/T  33  2_9665_01
O-028  P. strobus  II  T/C  128  CL1367Contig1_03
O-029  P. strobus  III  T/A  115  CL1588Contig1_04
O-030  P. strobus  III  A/T  158  CL1646Contig1_01
O-032  P. strobus  III  A/G  347  2_3726_02
O-033  P. strobus  II  C/T  44  CL1692Contig1_05
O-034  P. strobus  II  G/A  136  CL1767Contig1_02
O-035  P. strobus  II  C/A  259  CL1852Contig1_01
O-036  P. strobus  II  C/A  107  CL1905Contig1_03
O-037  P. strobus  III  C/T  160  CL206Contig1_03
O-038  P. strobus  II  C/T  275  CL3097Contig1_01
O-039  P. strobus  II  C/T  218  CL3770Contig1_01
O-040  P. strobus  III  G/C  289  CL3795Contig1_01
T-001  P. strobus  II  G/A  300  UMN_5867_01
T-002  P. strobus  II  C/T  263  0_13552_02
T-003  P. strobus  II  C/A  196  2_10059_02
T-005  P. strobus  III  T/C  186  CL3116Contig1_03
T-006  P. strobus  II  G/T  356  CL3539Contig1_01
T-008  P.
strobus II T/C 338 UMN_1590_01 T-009 P. strobus III G/A 188 2_4281_02 T-013 P. strobus III A/T 148 CL3602Contig1_03 T-014 P. strobus III C/T 256 CL3602Contig1_03 T-015 P. strobus II G/A 156 2_3720_01 T-017 P. strobus II C/A 395 CL866Contig1_01 T-018 P. strobus III C/T 200 UMN_927_01 T-021 P. strobus III C/T 61 CL1029Contig1_01 T-022 P. strobus II G/T 141 CL1430Contig1_06 T-023 P. strobus II G/C 327 CL1430Contig1_06 T-027 P. strobus II T/A 180 CL4023Contig1_01 T-032 P. strobus III T/C 139 0_1688_02 T-033 P. strobus III C/T 525 0_1688_02 T-034 P. strobus II C/A 235 0_350_01 T-035 P. strobus II C/G 188 2_3591_03 T-036 P. strobus II T/C 57 2_8852_01 a: I: orthologous SNP; II: SNP of orthologous genes; III: single-species SNP. See “3.2.1 Sampling and SNP genotyping”.     156  Table S6. Observed (HO) and expected heterozygotisty (HE), and test for Hardy-Weinberg equilibrium (HWE) at each SNP in Pinus monticola. Two sided significance tests for HWE and q-values were calculated using the false discovery rate (FDR) method of Storey (2002). 
SNP  Number of genotypes  MAF  HO  HE  HWE test (q-value) a
M-001  345  0.47  0.43  0.50  0.131
M-002  344  0.43  0.46  0.49  0.451
M-003  346  0.01  0.01  0.01  1.000
M-006  337  0.20  0.29  0.32  0.331
M-007  346  0.16  0.27  0.28  1.000
M-008  346  0.02  0.04  0.04  1.000
M-010  343  0.29  0.41  0.41  1.000
M-011  346  0.18  0.28  0.29  0.814
M-012  337  0.11  0.20  0.20  0.962
M-013  346  0.04  0.07  0.08  0.687
M-014  346  0.10  0.17  0.18  0.368
M-015  346  0.14  0.24  0.24  0.995
M-016  345  0.14  0.24  0.25  0.880
M-017  340  0.33  0.36  0.44  0.017 *
M-018  346  0.40  0.54  0.48  0.118
M-025  346  0.26  0.37  0.38  0.814
M-026  315  0.30  0.39  0.42  0.594
M-027  295  0.30  0.41  0.42  0.962
M-028  347  0.05  0.08  0.09  0.179
M-029  344  0.01  0.03  0.03  1.000
M-030  306  0.01  0.02  0.02  1.000
M-031  301  0.12  0.22  0.21  0.962
M-032  346  0.06  0.10  0.11  0.551
M-034  326  0.14  0.21  0.24  0.236
P-001  348  0.03  0.05  0.06  0.194
P-002  348  0.21  0.31  0.33  0.589
P-003  347  0.01  0.03  0.03  1.000
P-004  340  0.37  0.50  0.47  0.530
P-005  326  0.38  0.48  0.47  1.000
P-006  344  0.31  0.44  0.42  0.796
P-007  337  0.09  0.16  0.16  1.000
P-008  301  0.22  0.32  0.34  0.521
P-009  337  0.01  0.01  0.01  1.000
P-010  321  0.04  0.07  0.08  0.325
P-011  347  0.33  0.42  0.44  0.752
P-012  340  0.07  0.12  0.13  0.481
P-014  345  0.33  0.42  0.44  0.752
P-015  343  0.03  0.06  0.06  0.194
P-018  347  0.00  0.01  0.01  1.000
P-019  343  0.40  0.45  0.48  0.507
P-022  345  0.01  0.02  0.02  1.000
P-023  347  0.21  0.30  0.33  0.451
P-024  342  0.48  0.50  0.50  1.000
P-027  346  0.38  0.46  0.47  0.814
P-028  346  0.38  0.46  0.47  0.876
P-029  347  0.02  0.03  0.04  0.065
P-030  340  0.35  0.44  0.46  0.687
P-031_corr  347  0.18  0.31  0.30  0.752
P-032  347  0.06  0.12  0.11  0.846
P-033  342  0.01  0.03  0.03  1.000
P-034  348  0.06  0.10  0.11  0.551
P-035  341  0.11  0.20  0.20  0.962
P-036  337  0.11  0.19  0.20  0.814
P-037  347  0.07  0.13  0.14  0.325
P-038  339  0.08  0.13  0.15  0.118
P-039  347  0.05  0.07  0.10  0.002 **
P-040  341  0.04  0.06  0.07  0.594
Q-001  335  0.26  0.37  0.39  0.578
Q-002  347  0.01  0.01  0.01  1.000
Q-004  286  0.34  0.44  0.45  0.900
Q-005  307  0.41  0.42  0.49  0.102
Q-006  347  0.05  0.09  0.09  0.814
Q-008  330  0.49  0.48  0.50  0.814
Q-010  338  0.14  0.26  0.24  0.328
Q-011  344  0.17  0.23  0.28  0.031 *
Q-012  337  0.29  0.40  0.41  0.900
Q-013  335  0.48  0.53  0.50  0.546
Q-014  337  0.36  0.43  0.46  0.551
Q-016  338  0.18  0.31  0.30  0.907
Q-018  319  0.47  0.59  0.50  0.027 *
Q-020  324  0.27  0.35  0.39  0.305
Q-021  315  0.22  0.32  0.35  0.540
Q-022  316  0.45  0.43  0.50  0.111
Q-023  317  0.23  0.42  0.35  0.010 **
Q-024  344  0.13  0.21  0.23  0.310
Q-026  347  0.06  0.11  0.11  0.846
Q-028  337  0.28  0.37  0.41  0.446
Q-029  340  0.03  0.05  0.05  0.513
Q-030  336  0.03  0.06  0.06  0.562
Q-031  340  0.44  0.52  0.49  0.547
Q-032  283  0.46  0.55  0.50  0.310
Q-033  341  0.04  0.08  0.07  1.000
Q-034  337  0.27  0.37  0.39  0.546
Q-036  332  0.48  0.48  0.50  0.814
Q-037  345  0.04  0.07  0.08  0.325
Q-038  344  0.07  0.13  0.13  1.000
R-001  342  0.08  0.13  0.15  0.209
R-002  346  0.48  0.46  0.50  0.356
R-003  341  0.32  0.40  0.44  0.323
R-004  346  0.42  0.43  0.49  0.166
R-005  347  0.47  0.49  0.50  0.995
R-007  343  0.11  0.18  0.20  0.410
R-008  345  0.07  0.12  0.14  0.146
R-009  338  0.25  0.36  0.38  0.752
R-010  339  0.42  0.49  0.49  1.000
R-011  328  0.32  0.41  0.44  0.573
R-013  348  0.46  0.49  0.50  0.995
R-014  309  0.17  0.27  0.28  0.813
R-015  340  0.07  0.12  0.12  0.876
R-016  334  0.17  0.28  0.29  0.900
R-017  313  0.20  0.28  0.32  0.166
R-018  347  0.01  0.02  0.03  0.236
R-019  339  0.29  0.35  0.41  0.050 *
R-020  319  0.49  0.44  0.50  0.134
R-022  347  0.05  0.07  0.09  0.050 *
R-023  346  0.20  0.28  0.32  0.182
R-024  346  0.25  0.33  0.38  0.126
R-025  348  0.29  0.41  0.42  0.900
R-026  348  0.01  0.02  0.02  1.000
R-027  315  0.39  0.43  0.47  0.310
R-028  333  0.34  0.47  0.45  0.752
R-029  345  0.39  0.46  0.48  0.876
R-031  307  0.42  0.45  0.49  0.414
R-034  319  0.28  0.38  0.40  0.758
R-035  346  0.07  0.14  0.14  0.901
R-036  326  0.50  0.46  0.50  0.451
R-037  342  0.06  0.10  0.11  0.296
R-038  343  0.10  0.15  0.18  0.152
R-039  326  0.08  0.13  0.15  0.102
R-040  347  0.22  0.32  0.34  0.546
S-001  346  0.04  0.05  0.07  0.010 **
S-003  340  0.32  0.40  0.43  0.358
S-004  348  0.31  0.40  0.43  0.498
S-005  287  0.03  0.03  0.05  0.001 ***
S-006-2  348  0.02  0.03  0.04  0.325
S-007  341  0.08  0.11  0.14  0.018 *
S-008  344  0.16  0.24  0.27  0.246
S-009  345  0.07  0.09  0.13  0.001 ***
S-010  347  0.22  0.33  0.35  0.551
S-011  337  0.18  0.34  0.29  0.029
S-012  336  0.49  0.73  0.50  0.000 ***
S-014  348  0.34  0.42  0.45  0.518
S-015  345  0.12  0.20  0.21  0.575
S-016  344  0.10  0.16  0.18  0.166
S-017  344  0.24  0.33  0.37  0.182
S-018  344  0.17  0.27  0.28  0.751
S-019-2  344  0.05  0.09  0.09  0.814
S-020  340  0.03  0.06  0.06  1.000
S-021  341  0.07  0.14  0.14  1.000
S-022  346  0.21  0.27  0.33  0.015 *
S-023  344  0.37  0.52  0.47  0.197
S-024  346  0.04  0.07  0.08  0.123
S-025  346  0.13  0.16  0.23  0.000 ***
S-028  348  0.00  0.01  0.01  1.000
S-029  346  0.37  0.45  0.47  0.761
S-030  346  0.13  0.22  0.23  0.594
S-034  346  0.09  0.12  0.16  0.002 **
S-035  346  0.11  0.16  0.20  0.031 *
S-036  346  0.14  0.26  0.25  0.546
S-038  345  0.00  0.00  0.00  1.000
S-039  345  0.19  0.38  0.31  0.000 ***
S-040  345  0.13  0.21  0.23  0.310
T-016  346  0.01  0.01  0.01  1.000
T-019  340  0.01  0.01  0.01  1.000
T-026  345  0.40  0.46  0.48  0.573
T-028  346  0.02  0.05  0.05  1.000
T-029  346  0.28  0.42  0.41  0.780
T-031  343  0.02  0.03  0.05  0.009 **
a: *** q < 0.001; ** q < 0.01; * q < 0.05

Table S7. Observed (HO) and expected (HE) heterozygosity, and test for Hardy-Weinberg equilibrium (HWE) at each SNP in Pinus strobus. Two-sided significance tests for HWE and q-values were calculated using the false discovery rate (FDR) method of Storey (2002).
SNP  Number of genotypes  MAF  HO  HE  HWE test (q-value)
G-001  824  0.38  0.42  0.47  0.027 *
G-002  822  0.28  0.39  0.40  0.429
G-003  813  0.34  0.41  0.45  0.043 *
G-004  826  0.29  0.36  0.41  0.004 **
G-005  822  0.46  0.49  0.50  0.844
G-008  816  0.28  0.41  0.41  0.844
G-009  756  0.45  0.49  0.49  0.863
G-010  825  0.36  0.45  0.46  0.583
G-011  811  0.46  0.47  0.50  0.287
G-012  657  0.40  0.42  0.48  0.013 *
G-014  824  0.08  0.13  0.15  0.001 **
G-015  822  0.39  0.32  0.48  0.000 ***
G-017  818  0.29  0.40  0.41  0.442
G-019  829  0.24  0.35  0.37  0.254
G-020  829  0.23  0.37  0.36  0.444
G-021  826  0.45  0.44  0.50  0.011 *
G-022  829  0.19  0.30  0.31  0.459
G-023  829  0.31  0.41  0.43  0.345
G-025  830  0.24  0.33  0.37  0.029 *
G-026  824  0.15  0.25  0.25  0.945
G-027  827  0.38  0.46  0.47  0.482
G-028  820  0.33  0.41  0.44  0.133
G-029  825  0.14  0.22  0.24  0.029 *
G-030  830  0.32  0.40  0.43  0.100
G-031  828  0.48  0.49  0.50  0.753
G-033  805  0.30  0.45  0.42  0.174
G-034  830  0.31  0.43  0.43  0.852
G-035  822  0.32  0.40  0.44  0.075
G-036  818  0.43  0.46  0.49  0.202
M-001  826  0.32  0.55  0.44  0.000 ***
M-002  827  0.32  0.55  0.43  0.000 ***
M-003  827  0.08  0.14  0.15  0.213
M-004  699  0.32  0.42  0.44  0.457
M-006  812  0.10  0.15  0.19  0.000 ***
M-007  828  0.16  0.26  0.28  0.213
M-008  827  0.04  0.07  0.08  0.500
M-009  803  0.17  0.26  0.28  0.057
M-010  828  0.04  0.07  0.07  0.945
M-011  825  0.04  0.08  0.08  0.945
M-012  823  0.20  0.29  0.32  0.085
M-013  826  0.20  0.29  0.32  0.085
M-014  829  0.45  0.48  0.49  0.543
M-015  827  0.13  0.20  0.23  0.012 *
M-016  829  0.49  0.45  0.50  0.016 *
M-017  822  0.24  0.33  0.37  0.027 *
M-018  827  0.36  0.42  0.46  0.054
M-019  807  0.16  0.26  0.27  0.564
M-021  828  0.31  0.42  0.43  0.704
M-022  786  0.46  0.43  0.50  0.002 **
M-025  829  0.38  0.47  0.47  0.898
M-026  823  0.46  0.46  0.50  0.133
M-027  813  0.46  0.46  0.50  0.175
M-028  830  0.17  0.26  0.28  0.254
M-029  829  0.09  0.16  0.17  0.177
M-030  826  0.12  0.20  0.22  0.057
M-031  812  0.05  0.08  0.09  0.093
M-032  826  0.08  0.12  0.14  0.013 *
M-034  819  0.08  0.14  0.15  0.213
N-002  831  0.00  0.01  0.01  0.945
N-003  830  0.13  0.21  0.22  0.482
N-004  831  0.28  0.38  0.40  0.239
N-005  828  0.03  0.05  0.06  0.029 *
N-006  722  0.01  0.01  0.01  0.945
N-007  831  0.02  0.04  0.05  0.201
N-008  829  0.20  0.31  0.32  0.405
N-010  826  0.31  0.38  0.43  0.008 **
N-011  816  0.43  0.45  0.49  0.133
N-012  829  0.32  0.41  0.43  0.211
N-013  829  0.22  0.34  0.34  0.945
N-015  830  0.01  0.02  0.02  0.945
N-016  831  0.38  0.46  0.47  0.743
N-017  816  0.04  0.08  0.08  0.776
N-018  805  0.34  0.42  0.45  0.133
N-019  829  0.24  0.36  0.37  0.833
N-020  831  0.42  0.45  0.49  0.071
N-021  829  0.35  0.66  0.45  0.000 ***
N-022  831  0.45  0.48  0.49  0.444
N-023  813  0.26  0.37  0.39  0.313
N-024  829  0.37  0.44  0.47  0.182
N-028  830  0.13  0.22  0.23  0.449
N-029  829  0.44  0.40  0.49  0.000 ***
N-030  830  0.18  0.28  0.29  0.444
N-032  830  0.02  0.04  0.04  0.945
N-033  824  0.19  0.31  0.31  0.844
N-034  825  0.39  0.46  0.48  0.435
N-035  749  0.32  0.61  0.44  0.000 ***
N-036  828  0.23  0.35  0.35  0.863
N-037  829  0.01  0.02  0.02  0.945
N-038  831  0.04  0.08  0.08  0.945
N-039  831  0.39  0.43  0.48  0.026 *
N-040  830  0.05  0.08  0.09  0.192
O-001  831  0.01  0.03  0.03  0.945
O-002  829  0.45  0.46  0.49  0.093
O-003  831  0.01  0.03  0.03  0.945
O-004  828  0.45  0.47  0.50  0.290
O-005  831  0.01  0.03  0.03  0.945
O-008  814  0.08  0.15  0.15  0.301
O-009  828  0.10  0.17  0.17  0.459
O-010  819  0.08  0.15  0.15  0.945
O-011  831  0.22  0.33  0.35  0.254
O-012  829  0.07  0.12  0.13  0.211
O-013  828  0.43  0.47  0.49  0.500
O-014  831  0.22  0.33  0.35  0.222
O-015  829  0.32  0.42  0.44  0.429
O-016  823  0.04  0.07  0.07  0.029 *
O-017  831  0.27  0.39  0.39  0.872
O-019  829  0.30  0.40  0.42  0.419
O-020  830  0.05  0.08  0.09  0.213
O-021  830  0.47  0.44  0.50  0.014 *
O-022  831  0.17  0.25  0.28  0.013 *
O-024  831  0.30  0.40  0.42  0.201
O-025  830  0.21  0.33  0.34  0.545
O-026  830  0.11  0.18  0.19  0.213
O-027  830  0.01  0.01  0.02  0.014 *
O-028  831  0.14  0.23  0.24  0.590
O-029  831  0.13  0.21  0.23  0.109
O-030  831  0.26  0.38  0.39  0.500
O-032  830  0.08  0.12  0.15  0.000 ***
O-033  825  0.00  0.01  0.01  0.945
O-034  824  0.23  0.32  0.35  0.039 *
O-035  830  0.49  0.51  0.50  0.866
O-036  831  0.12  0.20  0.22  0.080
O-037  718  0.34  0.43  0.45  0.419
O-038  825  0.49  0.45  0.50  0.023 *
O-039  830  0.12  0.21  0.22  0.553
O-040  831  0.04  0.08  0.08  0.759
T-001  827  0.02  0.03  0.03  0.359
T-002  828  0.16  0.26  0.27  0.459
T-003  828  0.03  0.04  0.05  0.013 *
T-005  830  0.42  0.48  0.49  0.704
T-006  830  0.00  0.01  0.01  0.945
T-008  818  0.41  0.45  0.48  0.162
T-009  828  0.01  0.01  0.01  0.945
T-013  822  0.10  0.18  0.18  0.945
T-014  822  0.10  0.18  0.17  0.945
T-015  802  0.39  0.46  0.47  0.545
T-016  828  0.32  0.40  0.43  0.074
T-017  829  0.18  0.28  0.30  0.192
T-018  828  0.08  0.15  0.15  0.894
T-019  696  0.21  0.34  0.33  0.427
T-021  803  0.42  0.47  0.49  0.459
T-022  820  0.31  0.42  0.43  0.592
T-023  823  0.10  0.15  0.17  0.014 *
T-026  829  0.17  0.28  0.28  0.590
T-027  829  0.07  0.12  0.13  0.054
T-028  830  0.01  0.02  0.02  0.945
T-029  828  0.13  0.22  0.23  0.254
T-031  775  0.16  0.21  0.27  0.000 ***
T-032  826  0.44  0.47  0.49  0.240
T-033  827  0.02  0.04  0.04  0.463
T-034  824  0.08  0.15  0.16  0.182
T-035  826  0.03  0.06  0.06  0.725
T-036  829  0.29  0.35  0.41  0.000 ***

Table S8. Gene annotations of RefSeq database (“tBlastx”).
Gene  Gene set a  Top hit accession b  Top hit description  Top hit species  Top hit e-value
0_13552_02  GCAT  XP_004235070  Uncharacterized protein  Solanum lycopersicum  8.72917E-15
0_18132_01  GCAT  XP_002274119  Glutathione S-transferase U17  Vitis vinifera  3.84026E-15
0_10240_01  GCAT  NA
2_10059_02  GCAT  NA
2_3720_01  GCAT  NA
2_4281_02  GCAT  NA
2_4724_01  GCAT  XP_001765647  Serine threonine-protein kinase HT1-like  Physcomitrella patens  6.01499E-30
2_9280_01  GCAT  NA
2_9480_01  GCAT  XP_002283619  Malate dehydrogenase, chloroplastic-like  Vitis vinifera  3.57363E-69
CL1029Contig1_01  GCAT  XP_001753292  Galactinol--sucrose galactosyltransferase 2-like  Physcomitrella patens  3.00677E-11
CL1430Contig1_06  GCAT  XP_002276269  Pyrophosphate--fructose 6-phosphate 1-phosphotransferase subunit alpha-like  Vitis vinifera  6.28853E-19
CL3116Contig1_03  GCAT  XP_004293011  GTP-binding nuclear protein ran-3-like  Fragaria vesca  1.21621E-16
CL3539Contig1_01  GCAT  XP_002275091  TOM1-like protein 2  Vitis vinifera  2.5378E-18
CL3602Contig1_03  GCAT  NP_001132676  Protochlorophyllide reductase1  Zea mays  5.26998E-28
CL4023Contig1_01  GCAT  XP_004248313  Tryptophan synthase beta chain 2, chloroplastic-like  Solanum lycopersicum  9.65983E-27
CL4138Contig1_01  GCAT  XP_002275195  Dihydroflavonol-4-reductase  Vitis vinifera  2.55921E-21
CL4470Contig1_01  GCAT  XP_002978069  Isoflavone reductase homolog  Selaginella moellendorffii  1.42827E-11
CL866Contig1_01  GCAT  XP_002325998  Dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex-like  Populus trichocarpa  2.17461E-12
UMN_1142_01  GCAT  XP_002266838  BEL1-like homeodomain protein 1  Vitis vinifera  1.74921E-19
UMN_1590_01  GCAT  XP_003575211  Phosphopantothenate--cysteine ligase 2-like  Brachypodium distachyon  1.4203E-23
UMN_2399_01  GCAT  XP_002969804  U-box domain-containing protein 13-like  Selaginella moellendorffii  2.02731E-28
UMN_4361_01  GCAT  XP_002336555  GRAS family transcription factor  Populus trichocarpa  4.12168E-17
UMN_5867_01  GCAT  NP_001174433  Plant glycogenin-like starch initiation protein 1  Oryza sativa  2.07782E-24
UMN_927_01  GCAT  XP_004296258  COBRA-like protein 7-like  Fragaria vesca  3.59123E-24
GQ0011.B3.r_G02  GCAT  XP_002968409  Hypothetical protein  Selaginella moellendorffii  4.61347E-18
GQ0015.B3.r_B10  GCAT  NP_001043344  Vacuolar-processing enzyme-like  Oryza sativa  4.93718E-24
GQ0015.BR_K18  GCAT  XP_004503678  Polygalacturonase-like  Cicer arietinum  8.80846E-63
GQ0025.BR_C19  GCAT  NP_172432  Uncharacterized protein  Arabidopsis thaliana  8.38415E-60
GQ0026.B3.r_O24  GCAT  XP_004171086  Chaperone protein ClpC, chloroplastic-like  Cucumis sativus  5.00216E-38
GQ0026.BR_B03  GCAT  XP_002457394  Protein aspartic protease in guard cell 1-like  Sorghum bicolor  1.18952E-70
GQ0032.TB_I23  GCAT  NA
GQ00410.B3_B06  GCAT  XP_003554414  Glucomannan 4-beta-mannosyltransferase 9-like  Glycine max  4.38522E-43
GQ00410.B3_G18  GCAT  XP_002304874  Transmembrane protein 45B-like  Populus trichocarpa  8.1512E-33
GQ0042.BR_E10  GCAT  XP_004299659  Phospho-2-dehydro-3-deoxyheptonate aldolase 2, chloroplastic-like  Fragaria vesca  1.85381E-22
GQ0045.B3_E18  GCAT  XP_002279067  Clathrin adaptor complexes medium subunit family protein  Vitis vinifera  6.4877E-122
GQ0045.B3_N12  GCAT  XP_002334077  Auxin response factor 6-like  Populus trichocarpa  2.12469E-20
GQ0047.B3_H11  GCAT  NP_172432  Uncharacterized protein  Arabidopsis thaliana  8.38415E-60
GQ0048.TB_H08  GCAT  NA
GQ00612.B3_J14  GCAT  XP_002973743  Hypothetical protein  Selaginella moellendorffii  9.6837E-69
GQ0073.TB_E24  GCAT  NP_001132476  Arogenate dehydratase prephenate dehydratase chloroplastic-like  Zea mays  8.9453E-108
GQ0081.BR.1_D09  GCAT  NP_001044057  Uncharacterized protein  Oryza sativa  4.64819E-61
GQ0011.BR_F15  GCAT  XP_002272782  4-coumarate--CoA ligase 1 isoform 1  Vitis vinifera  2.81731E-88
GQ0132.B3_K05  GCAT  XP_001752315  Catalase 3  Physcomitrella patens  7.57664E-66
GQ0162.B3.r_L01  GCAT  XP_002302943  Odorant1-like  Populus trichocarpa  2.51359E-44
GQ0206.B3_C13  GCAT  NA
GQ0254.B7_N02  GCAT  XP_004237379  Isopentenyl-diphosphate Delta-isomerase II-like  Solanum lycopersicum  1.02816E-64
GQ0033.TB_H23  GCAT  XP_004486415  Endoglucanase 25-like  Cicer arietinum  6.7445E-55
GQ0044.B3.r_N02  GCAT  XP_002963809  Isoflavone reductase homolog  Selaginella moellendorffii  2.18715E-19
AGP6  GenBank  NA
IFG8612  GenBank  XP_002992268.1  hypothetical protein SELMODRAFT_448713  Selaginella moellendorffii  3.00E-13
0_10054_01  WHISP  NA
0_10267_01  WHISP  XP_002512589  r2r3-myb transcription factor  Ricinus communis  6.2273E-19
0_10706_01  WHISP  XP_001761981  Predicted protein  Physcomitrella patens  1.29423E-37
0_11270_01  WHISP  XP_001769840  Inactive LRR receptor-like protein kinase at3g28040-like  Physcomitrella patens  7.74583E-24
0_11324_01  WHISP  XP_002518413  Heat shock protein 70 (HSP70)-interacting protein, putative  Ricinus communis  3.70465E-11
0_11508_01  WHISP  XP_003623760  Cytochrome P450 76C4  Medicago truncatula  3.11096E-58
0_11649_03  WHISP  XP_003564425  Tubulin beta chain-like  Brachypodium distachyon  2.87203E-68
0_11980_01  WHISP  XP_002263876  Histone-lysine n-methyltransferase suvr4-like  Vitis vinifera  1.03497E-26
0_12156_02  WHISP  XP_002299349  ADP-ribosylation factor GTPase-activating protein AGD3-like  Populus trichocarpa  7.963E-44
0_12216_02  WHISP  XP_002264268  U4 tri-snRNP-associated protein  Vitis vinifera  2.20495E-16
0_12329_02  WHISP  XP_003556172  polyadenylate-binding protein 5-like isoform 1  Glycine max  3.2609E-13
0_12730_01  WHISP  XP_004134941  Pathogenesis-related genes transcriptional activator PTI5-like  Cucumis sativus  2.40526E-34
0_12745_01  WHISP  XP_001772776  F-box kelch-repeat protein at5g26960-like  Physcomitrella patens  1.85187E-66
0_12978_02  WHISP  XP_003541323  Brefeldin A-inhibited guanine nucleotide-exchange protein 1-like  Glycine max  2.45991E-13
0_13058_01  WHISP  XP_002277239  Polygalacturonase-like  Vitis vinifera  8.11309E-42
0_13237_01  WHISP  XP_003541002  Cyclic nucleotide-gated ion channel 4-like  Glycine max  1.3544E-53
0_1347_01  WHISP  NP_001064481  Uncharacterized protein  Oryza sativa  5.83622E-60
0_13913_02  WHISP  XP_001765960  Exocyst subunit exo70 family protein g1  Physcomitrella patens  9.83018E-22
0_13957_02  WHISP  NA
0_13978_01  WHISP  XP_002312240  AP-5 complex subunit beta-1-like  Populus trichocarpa  3.41945E-35
0_14221_01  WHISP  XP_004237941  Serine--tRNA ligase-like  Solanum lycopersicum  3.24459E-13
0_14837_01  WHISP  NA
0_15187_01  WHISP  NP_001170305  Mediator of RNA polymerase II transcription subunit 23-like  Zea mays  2.12545E-25
0_15361_01  WHISP  XP_002275339  Casp-like protein vit_01s0010g01870-like  Vitis vinifera  3.38026E-18
0_15762_01  WHISP  NA
0_15867_01  WHISP  XP_004135876  Receptor-like protein kinase At1g80870-like  Cucumis sativus  1.96306E-35
0_15991_01  WHISP  XP_002331573  E3 ubiquitin-protein ligase COP1-like  Populus trichocarpa  1.05825E-15
0_16619_01  WHISP  XP_003538481  DEAD-box ATP-dependent RNA helicase 18-like isoform 1  Glycine max  5.47232E-33
0_1688_02  WHISP  XP_002329462  LRR receptor-like serine threonine-protein kinase at5g10290-like  Populus trichocarpa  4.52669E-28
0_16889_02  WHISP  NA
0_17017_01  WHISP  XP_001769898  CGS1 mRNA stability  Physcomitrella patens  2.39438E-14
0_17247_02  WHISP  NA
0_17938_01  WHISP  XP_001751435  ATP-binding cassette transporter, subfamily B  Physcomitrella patens  1.47236E-33
0_18261_01  WHISP  XP_002991368  Ribonuclease p  Selaginella moellendorffii  2.45181E-40
0_18267_01  WHISP  XP_001785144  Predicted protein  Physcomitrella patens  2.45596E-11
0_2433_01  WHISP  NA
0_2576_02  WHISP  NA
0_3073_01  WHISP  NA
0_3128_02  WHISP  NA
0_3192_01  WHISP  XP_004287411  Protein CbxX, chromosomal-like  Fragaria vesca  1.36954E-79
0_350_01  WHISP  NA
0_4032_02  WHISP  XP_001760681  U-box domain-containing protein 16-like  Physcomitrella patens  4.86587E-34
0_4105_01  WHISP  NA
0_4756_01  WHISP  NA
0_6047_02  WHISP  NA
0_6448_02  WHISP  XP_003558946  Endoribonuclease Dicer homolog 1-like  Brachypodium distachyon  3.53329E-45
0_6683_01  WHISP  XP_003616654  Zinc finger CCCH domain-containing protein  Medicago truncatula  2.41529E-11
0_6878_01  WHISP  XP_003540983  F-box protein ORE9-like  Glycine max  7.90072E-59
0_7001_01  WHISP  XP_004232370  NADPH-dependent diflavin oxidoreductase ATR3-like isoform 2  Solanum lycopersicum  6.09245E-57
0_7009_01  WHISP  XP_002978427  F-box kelch-repeat protein at1g55270-like  Selaginella moellendorffii  7.00509E-52
0_771_01  WHISP  NA
0_7844_01  WHISP  XP_003620546  Immunoglobulin superfamily DCC subclass member  Medicago truncatula  3.43615E-36
0_820_02  WHISP  NP_001236070  1-deoxy-D-xylulose 5-phosphate synthase 1  Glycine max  5.8043E-24
0_846_01  WHISP  NA
0_8683_01  WHISP  NP_001056244  Serine threonine-protein kinase at1g18390-like  Oryza sativa  2.2618E-61
0_8737_01  WHISP  NA
0_8844_01  WHISP  XP_002284273  Galacturonosyltransferase 13-like  Vitis vinifera  4.25941E-71
0_9408_01  WHISP  XP_002329512  Transcription factor myc2-like  Populus trichocarpa  5.88057E-48
0_9462_01  WHISP  XP_003631537  F-box LRR-repeat protein 10-like  Vitis vinifera  7.89203E-41
0_9749_01  WHISP  XP_001780578  LRR receptor-like protein kinase at5g49770-like  Physcomitrella patens  2.73578E-16
0_9922_01  WHISP  NA
1_1609_01  WHISP  XP_004136341  Uncharacterized protein  Cucumis sativus  1.23457E-12
2_10212_01  WHISP  XP_002867914  Glutathione s-transferase family protein  Arabidopsis lyrata  8.07938E-53
2_2501_01  WHISP  NA
2_2799_03  WHISP  NA
2_2952_01  WHISP  NA
2_2960_02  WHISP  NA
2_3465_01  WHISP  NA
2_3591_03  WHISP  NA
2_3726_02  WHISP  XP_004299626  dnaJ homolog subfamily B member 1-like  Fragaria vesca  1.22578E-27
2_3852_01  WHISP  NA
2_3867_02  WHISP  XP_002324161  Profilin-2 4-like  Populus trichocarpa  7.34503E-16
2_4107_01  WHISP  XP_004229619  Thylakoid lumenal 19 kDa protein, chloroplastic-like  Solanum lycopersicum  1.57189E-61
2_5064_01  WHISP  NA
2_5483_02  WHISP  NA
2_5668_01  WHISP  XP_002530043  Alpha-L-fucosidase 2 precursor  Ricinus communis  5.93024E-42
2_5724_02  WHISP  NA
2_6052_01  WHISP  XP_003634217  Manganese-dependent ADP-ribose/CDP-alcohol diphosphatase-like  Vitis vinifera  2.06546E-72
2_6313_01  WHISP  XP_002982597  Uncharacterized protein  Selaginella moellendorffii  1.75375E-21
2_6355_02  WHISP  NA
2_6457_01  WHISP  NA
2_6491_01  WHISP  NA
2_6731_01  WHISP  XP_002510145  F-box protein GID2, putative  Ricinus communis  4.73982E-17
2_684_01  WHISP  XP_002977841  LRR receptor protein kinase exs-like  Selaginella moellendorffii  4.10017E-53
2_7189_01  WHISP  XP_003632775  Subtilisin-like protease-like  Vitis vinifera  1.1322E-63
2_7213_02  WHISP  NA
2_7532_01  WHISP  NA
2_7852_01  WHISP  NA
2_8627_01  WHISP  XP_002302935  Centromere protein v-like  Populus trichocarpa  4.47563E-49
2_8852_01  WHISP  NP_001051803  Galactokinase-like  Oryza sativa  1.42099E-17
2_9466_01  WHISP  NP_001141972  Zinc metalloprotease SLR1821-like  Zea mays  9.4877E-20
2_9542_01  WHISP  XP_004245687  Aminotransferase ACS10-like  Solanum lycopersicum  2.62103E-58
2_9603_01  WHISP  NA
2_9665_01  WHISP  XP_002509420  interferon-induced guanylate-binding protein  Ricinus communis  5.85685E-41
CL1367Contig1_03  WHISP  XP_003549901  Monocopper oxidase-like protein SKU5-like  Glycine max  8.00946E-32
CL1536Contig1_03  WHISP  NP_001234025  GDP-mannose pyrophosphorylase  Solanum lycopersicum  3.79592E-37
CL1588Contig1_04  WHISP  NP_001055523  Alpha beta-hydrolases superfamily protein  Oryza sativa  1.58819E-33
CL1634Contig1_03  WHISP  XP_004141678  Transmembrane 9 superfamily member 4-like  Cucumis sativus  1.99085E-28
CL1646Contig1_01  WHISP  XP_002279779  Coatomer subunit alpha-1  Vitis vinifera  1.75848E-28
CL1692Contig1_05  WHISP  XP_001783895  E3 ubiquitin-protein ligase BRE1-like 2-like  Physcomitrella patens  1.97078E-11
CL1694Contig1_02  WHISP  XP_004139267  116 kDa U5 small nuclear ribonucleoprotein component-like  Cucumis sativus  1.09696E-27
CL1698Contig1_01  WHISP  NA
CL1767Contig1_02  WHISP  NP_001043181  Mitochondrial substrate carrier family protein W-like  Oryza sativa  1.3776E-27
CL1806Contig1_01  WHISP  XP_002446397  Arogenate dehydratase prephenate dehydratase chloroplastic-like  Sorghum bicolor  1.8671E-30
CL180Contig1_03  WHISP  XP_002532733  Glycosyltransferase QUASIMODO1  Ricinus communis  4.78431E-23
CL1852Contig1_01  WHISP  NA
CL1879Contig1_02  WHISP  NA
CL1905Contig1_03  WHISP  XP_002302405  Pollen-specific protein sf3-like  Populus trichocarpa  2.00083E-11
CL1966Contig1_05  WHISP  NA
CL206Contig1_03  WHISP  NA
CL2332Contig1_01  WHISP  XP_002268814  Calcium-dependent protein kinase 21  Vitis vinifera  5.49667E-18
CL2637Contig1_04  WHISP  XP_002512676  Beta-1,4-xylosyltransferase IRX10L-like  Ricinus communis  4.28971E-46
CL3007Contig1_02  WHISP  NA
CL3036Contig1_01  WHISP  NA
CL3097Contig1_01  WHISP  NA
CL3321Contig1_03  WHISP  XP_001754987  Nucleolar protein 56-like  Physcomitrella patens  1.96797E-28
CL3770Contig1_01  WHISP  NA
CL3795Contig1_01  WHISP  NP_001140502  C-1-tetrahydrofolate cytoplasmic-like  Zea mays  3.24317E-17
a: GCAT: resequenced candidate genes from the white spruce gene catalog (Picea glauca); WHISP: White Pine Resequencing Project.
b: NA = No significant hit.

Table S9. Number of FST outlier SNPs detected by BayeScan when varying the prior odds (PO) setting to 10, 100 and 1000 and using a false discovery rate (FDR) of 5%. Analysis was performed using all populations ("range-wide") and only populations within genetic groups ("northern group" and "southern group").

                Pinus monticola                                 Pinus strobus
                Range-wide  Northern group  Southern group      Range-wide  Northern group  Southern group
BayeScan
  Divergent
    PO 10       2           0               2                   2           1               0
    PO 100 a    1           0               0                   2           1               0
    PO 1000     1           0               0                   2           1               0
  Balancing
    PO 10       6           3               0                   52          37              0
    PO 100 a    2           0               0                   9           5               0
    PO 1000     0           0               0                   3           3               0
  Total
    PO 10       8           3               2                   54          38              0
    PO 100 a    3           0               0                   11          6               0
    PO 1000     1           0               0                   5           4               0
a: prior odds of 100 retained for compilation of results and comparisons between methods (see “3.2.5 FST outlier tests”).

Table S10. Redundancy Discriminant Analysis (RDA) of among-population genetic variation (dependent variable) on climatic variables (independent variables), constrained by space and ancestry in Pinus monticola.

          Range-wide (n = 61) a     Northern group (n = 54) a     Southern group (n = 7) a,c
axes      Var      Pr (>F) b        Var      Pr (>F) b            Var  Pr (>F) b
RDA1      0.0896   0.003 **         0.09037  0.003 **             NA   NA
RDA2      0.08656  0.004 **         0.08058  0.007 **             NA   NA
RDA3      0.07225  0.022 *          0.06971  0.044 *              NA   NA
RDA4      0.0608   0.146            0.06225  0.125                NA   NA
RDA5      0.05675  0.215            0.0575   0.194                NA   NA
RDA6      0.04857  0.494            0.0501   0.469                NA   NA
RDA7      0.04306  0.681            0.0461   0.648                NA   NA
RDA8      0.04155  0.765            0.04174  0.757                NA   NA
RDA9      0.03377  0.959            0.03802  0.897                NA   NA
RDA10     0.02792  0.99             0.03055  0.968                NA   NA
RDA11     0.02566  0.996            0.02875  0.987                NA   NA
RDA12     0.02361  0.999            0.024    0.998                NA   NA
Residual  1.76619                   1.59103                       NA
a: n = the number of populations sampled.
b: 1000 permutations. Significance codes: *** = p < 0.001; ** = p < 0.01; * = p < 0.05; • = p < 0.1.
c: not calculated due to an insufficient number of populations and collinearity among explanatory variables.

Table S11. Redundancy Discriminant Analysis (RDA) of among-population genetic variation (dependent variable) on climatic variables (independent variables), constrained by space and ancestry in Pinus strobus.

          Range-wide (n = 61)       Northern group (n = 54)       Southern group (n = 7 a)
axes      Var      Pr (>F)          Var      Pr (>F)              Var  Pr (>F)
RDA1      0.10537  0.001 ***        0.11321  0.001 ***            NA   NA
RDA2      0.05609  0.002 **         0.05639  0.002 **             NA   NA
RDA3      0.05126  0.002 **         0.05268  0.004 **             NA   NA
RDA4      0.0381   0.102            0.04508  0.037 *              NA   NA
RDA5      0.03261  0.35             0.03481  0.395                NA   NA
RDA6      0.03188  0.361            0.03143  0.589                NA   NA
RDA7      0.02587  0.797            0.0298   0.696                NA   NA
RDA8      0.02341  0.913            0.02677  0.865                NA   NA
RDA9      0.02273  0.922            0.02456  0.906                NA   NA
RDA10     0.01873  1                0.02079  0.995                NA   NA
RDA11     0.01611  1                0.0161   1                    NA   NA
RDA12     0.0133   1                0.01171  1                    NA   NA
Residual  2.62434                   2.69083                       NA
a: not calculated due to an insufficient number of populations and collinearity among explanatory variables.

Table S12.
Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG S-005 S 0.019 0.021 0.025             [Eref] * [Lat] *         S-007 Intron 0.282 (div *) 0.137 -0.021   div **       [Lat] *       [Lat, Elev] ** [Lat] *   S-015 3' UTR 0.074 0.039 0.052           [Lat] *         [Lat] *   S-016 NS 0.109 0.102 NA       NA       NA   [Lat] ** [Lat] * NA S-017 Intron 0.058 0.058 0.030               [CMD, Lat] **         S-025 Intron 0.228 (div *) 0.096 0.015           [Lat] *       [Lat, MCMT] ** [Lat] *   S-001 S 0.305 (div *) 0.011 0.700                       [MCMT, MAT, MAP, Elev, Long] *    178  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG S-011 NA 0.055 0.042 0.003                   [Lat] *     S-012 NA 0.029 0.029 0.022   bal *                     S-024 S 0.011 0.012 -0.059                   [MCMT, MAT, eFFP] ** [MCMT, eFFP, MAT, Lat] **   Q-029 S 0.166 0.033 0.343                   [Lat] *     Q-033 NS 0.031 0.032 0.075                   [MAT] *     Q-034 S 0.118 0.103 0.007           [Elev] *             P-003 NS 0.012 0.014 -0.031                   [PAS] *        179  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG P-004 NS 0.046 0.035 0.205           [Lat] **   [Long, MCMT, MAT, Eref] ***         P-007 S 0.039 0.025 0.250                   [Elev] *     P-008 S 0.080 0.077 -0.050           [Elev, Lat] ** [Elev, Lat] **           P-010 Intron 0.046 0.043 NA       NA       NA       NA P-019 NA 0.067 0.047 -0.010           [Eref] *             P-022 NA 0.014 0.016 0.003                     [Eref] *   P-023 S 0.105 0.043 0.134           [Lat] *       [Lat] *        180  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG P-034 S 0.046 0.045 0.003                   [MCMT, eFFP, Lat, Long] ** [MCMT, Lat, eFFP, MAT] *   P-039 NS 0.223 0.021 0.826 (div **)                     [Lat] * [MCMT, MAP, MAT, Elev, AHM, Eref, MWMT] * Q-004 3' UTR 0.091 0.103 -0.153 (bal ***)           [CMD, Eref, AHM] * [Eref, CMD, SHM] *** [Elev] **     [bFFP, Elev] *   Q-005 3' UTR 0.059 0.046 0.230           [Eref] ** [Eref, CMD, Lat] ***           Q-008 3' UTR 0.038 0.027 0.043           [Eref, MWMT] ** [Eref, MWMT, CMD, Long] ***           Q-010 S 0.039 0.033 0.064           [Eref] *                181  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG Q-013 NS 0.068 0.047 0.167           [Eref] *** [Eref, MAT] ***           Q-014 NS 0.090 0.076 0.313           [MWMT] *             Q-016 S 0.125 0.124 (div *) -0.035                         Q-018 S 0.027 0.016 0.051           [MSP, PAS] * [MAP, MSP, PAS, SHM, CMD] *** [CMD, Lat] **         Q-021 NA 0.050 0.046 0.010           [Eref, CMD] **[Eref, CMD, MWMT, Long] *** [bFFP] **         Q-022 NA 0.026 0.002 -0.023           [Eref, CMD, Lat] *** [Eref, CMD, Lat, MWMT, SHM] ***     [Lat] *        182  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG Q-023 NA 0.050 0.037 -0.053           [Elev, Lat, Eref] *** [Elev, Eref, Lat, Long] ** [Long] ***         Q-026 S 0.044 0.045 0.030                   [MCMT] * [MAP, AHM, MCMT] *   Q-028 S 0.068 0.057 0.170             [Eref] *           Q-030 S 0.091 0.028 0.241           [Eref] ** [Eref] **           Q-031 NS 0.060 0.063 -0.002           [Eref] ** [Eref, CMD] ***           Q-032 S 0.054 0.055 0.028           [Eref, Lat] * [Eref, CMD, Lat] ***           Q-036 NA 0.093 0.062 0.089           [Lat, MSP, Elev] *** [Lat, MSP] * [Lat] **            183  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG Q-038 S 0.100 0.103 -0.028             [Eref] *           R-008 NS 0.085 0.094 -0.091                   [Long, MCMT, eFFP] ** [Long, MCMT, eFFP] **   R-009 NA 0.124 0.056 0.047           [Lat] **             R-017 3' UTR 0.140   0.054           [Elev, bFFP, PAS, MWMT] ** [Elev, SHM] **           R-022 S 0.233 0.018 0.826 (div **)                       [MCMT, MAP, MAT, Elev, AHM, Eref, MWMT] * R-029 Intron 0.062 0.054 0.182                     [Eref] *      184  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland b  BayeScan b  LFMM b  Bayenv b SNP SNP code a RW NG SG  RW NG SG  RW NG SG  RW NG SG R-037 S 0.035 0.030 0.075                [Elev] *     S-006-2 NS 0.040 0.025 0.032                [Lat] * [Lat] *   S-034 Intron 0.203 (div *) 0.077 0.317                [Lat, Long] *     S-039, G-020 S 0.057 0.048 -0.017                [Elev] *     S-040, G-023 S 0.068 0.057 NA      NA      NA    [MCMT] * NA T-016 NS 0.385 (div ***) 0.378 (div ***) NA      NA      NA      NA T-029 Intron -0.011 -0.010 -0.039  bal *                      185  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG M-017 Intron 0.083 0.066 -0.033           [Eref, CMD] ** [Eref, CMD] ***           M-030 NS -0.013 -0.014 0.001             [Eref, CMD, Lat] ***           M-015 NS 0.000 0.003 -0.069   bal •                     M-025 S 0.041 0.041 0.026                         M-026 Intron 0.030 0.034 -0.028             [CMD, Eref, MSP] **          M-027 Intron 0.019 0.022 -0.026             [CMD, Eref, MSP, Lat, SHM] ***           M-029 S 0.031 0.034 0.003                   [MSP, SHM] *     186  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG M-031 NA 0.032 0.006 -0.003             [Eref, CMD, MSP] ***           M-034 NS 0.021 0.024 -0.041           [Eref] * [Eref, CMD] ***           M-006 NS 0.072 0.071 0.014           [CMD, SHM, AHM, MSP, Elev, Eref] ** [CMD, Eref] * [Long, Elev] ***         M-007 S 0.069 0.046 0.221                       [MSP, AHM, MAP, eFFP] * M-011 S 0.061 0.059 0.048                       [Elev] * M-013 S 0.045 0.039 0.043                   [Long, CMD] * [Long, MAP, eFFP] **      187  Table S12 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus monticola using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland b   BayeScan b   LFMM b   Bayenv b SNP SNP code a RW NG SG   RW NG SG   RW NG SG   RW NG SG M-028 S 0.155 0.089 0.288                       [AHM, MAP, MSP, eFFP, bFFP] * a: S: synonymous; NS: non-synonymous SNP; NA: not annotated. b: Significance codes for FDIST Ritland, Bayescan, and LFMM: * = q < 0.05; ** = q < 0.01; *** = q < 0.001. 
For Bayenv:  * = Bayes factor (BF) > 3, ** = BF > 10, *** = BF > 32, **** = BF > 100. Blank space = non-significant. NA = not tested because the SNP was monomorphic in this dataset. div: divergent selection, bal: balancing selection.   188  Table S13. Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG G-001 Intron 0.071 0.067 0.103           [AHM, Long, MWMT, PAS] *** [AHM, PAS, Long, MWMT, bFFP, Eref, SHM, CMD] ***           G-004 Intron 0.059 0.006 0.064           [bFFP, eFFP, Eref, Lat, MAT, MCMT, MWMT] **   [PAS] *         G-009 Intron 0.057 0.036 0.061           [bFFP, CMD, Eref, Lat, MAT, MWMT, PAS] **             G-010 S 0.039 0.048 0.004           [Long] * [Long] *           G-015 Intron 0.051 0.049 0.066           [AHM, SHM] *** [AHM, PAS, SHM, Long] **           189  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG G-021 Intron 0.076 0.048 0.094           [PAS] * [MAP, AHM, Long, PAS] **           G-031 S 0.022 0.029 -0.014           [MAP] *             G-033 S 0.031 0.020 0.076           [PAS] *             G-002 S 0.059 0.061 0.048             [Long] *           G-003 Intron 0.056 0.040 0.128           [SHM] * [bFFP, MWMT, AHM] *           G-011 NS 0.071 0.066 0.082           [Elev, Long, PAS] *** [Long, MAP, PAS, AHM, Elev] ***              190  Table S13 (continued). 
Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland  BayeScan  LFMM  Bayenv SNP SNP code RW NG SG  RW NG SG  RW NG SG  RW NG SG G-014 NS 0.168 (div **) 0.169 (div *) NA  div *** div *** NA  [Long, PAS] * [Long, PAS] * NA  [Eref, Lat, MAT, PAS, MSP, Elev, MWMT, MCMT, bFFP] *** [Eref, Lat, PAS, Elev] * NA G-019 3' UTR 0.073 0.050 -0.006         [bFFP, Eref, Lat, MWMT, PAS] *            G-026 NS 0.047 0.038 0.042                  [MAP, Long] *   G-027 S 0.046 0.051 0.027         [AHM, Long, PAS] ** [AHM, PAS, MAP, Long] **          G-028 Intron 0.071 0.038 0.210         [AHM] ** [AHM] *        [MSP, MAP] *    191  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland  BayeScan  LFMM  Bayenv SNP SNP code RW NG SG  RW NG SG  RW NG SG  RW NG SG G-035 Intron 0.064 0.028 0.165            [MAP, CMD, MSP,AHM] * G-036 Intron 0.021 0.018 0.031  bal *                   T-008 Intron 0.055 0.032 0.115           [Elev] *        [CMD, AHM] * T-009 Intron 0.097 0.104 0.054                [SHM, AHM] *[AHM] *   T-015 NS 0.075 0.031 0.062         [Elev, Eref, Lat, MAP, MSP] **          [Lat, Eref, MCMT] * T-017 Intron 0.069 0.038 0.222                    [MSP] * T-022 Intron 0.020 0.020 0.018  bal **                   192  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG N-040 S 0.056 0.016 0.028                   [MSP, Lat, Eref, MAT, MAP, MCMT, bFFP] **     O-001 NS 0.113 0.121 NA       NA       NA   [MAP, MCMT, Long, eFFP, MAT, bFFP, Lat] ** [MAP, Long, MCMT, eFFP] *** NA O-002 S 0.100 0.100 (div *) 0.044           [bFFP, eFFP, Eref, Lat, MAP, MAT, MCMT, MWMT] ***             O-019 NA 0.053 0.042 0.084           [bFFP, Lat, PAS] *             O-026 S 0.072 0.023 0.090                   [MAT, MCMT] *     193  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG M-004 Intron 0.066 0.034 0.150                       [CMD, MSP, MAP, AHM, SHM] * M-021 NS 0.041 0.033 0.042           [Eref, Lat] *             M-022 NA 0.099 0.071 0.085           [Elev, PAS] *** [MCMT, MAT, Elev] *           N-002 3' UTR -0.005 -0.002 -0.010                   [Eref] *     N-004 S -0.002 (bal **) -0.005 -0.008   bal **                     N-006 Intron 0.007 -0.005 0.010           [Elev, Lat, MSP] *** [MSP, Elev, Lat, SHM, MAP, MCMT, MAT] ***               194  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG N-010 3' UTR 0.065 0.029 0.184                       [PAS, MCMT] * N-015 S 0.022 0.021 NA       NA       NA   [MWMT] *   NA N-017 NA 0.059 0.015 0.140             [MSP, MAP] ***           N-018 Intron 0.046 0.038 0.049             [MSP] **     [Long] *     N-019 NS 0.044 0.024 0.094                       [Eref] * N-020 NS 0.064 0.052 0.019           [Eref, Lat, MAT, MCMT, PAS] *             N-021 NA 0.019 0.019 0.017   bal *** bal ***                      195  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG N-023 NA 0.043 0.031 0.045             [MSP, SHM, Elev] ***           N-028 NS 0.012 0.006 0.019   bal *** bal ***                   N-029 NA 0.224 (div ***) 0.103 (div *) 0.072   div ***       [AHM, bFFP, CMD, eFFP, Elev, Eref, Lat, MAP, MAT, MCMT, MSP, MWMT, PAS] *** [Lat, MAT, bFFP, Eref, MCMT, MWMT, PAS] ***     [LAT, MAT. MCMT, Eref, bFFP, eFFP, MWMT, PAS, MAP] **** [Lat, MAT, bFFP, MCMT, Eref, eFFP] **   N-033 NS 0.033 0.025 0.034               [Lat, MCMT, PAS, Eref] ***            196  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG N-034 NS 0.046 0.024 0.124               [MCMT, Lat, Eref, PAS, MAT, eFFP, bFFP] ***         N-035 NA 0.000 0.001 -0.007   bal *** bal ***     [Elev] ** [MSP, Elev, SHM, MAP] ***           O-004 3' UTR 0.038 0.010 0.155   bal * bal **                 [eFFP] * O-009 Intron 0.038 0.017 0.093                       [bFFP, eFFP, MAT, MCMT] * O-012 3' UTR 0.008 0.004 0.011   bal *                     O-013 NS 0.041 0.023 0.047           [Elev, Eref, Lat, MSP, PAS] *                197  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland   BayeScan   LFMM  Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG O-014 NA 0.037 0.020 0.087                   [Long] *     O-016 NS 0.020 -0.007 0.031                   [MAP] * [MAP] *   O-022 NA 0.076 0.057 0.191           [Eref, Lat] *           [Eref, Lat, PAS, MCMT] * O-032 3' UTR 0.128 0.038 0.385 (div *)                       [CMD, MSP, SHM, MAP] ** O-034 Intron 0.039 0.041 0.031                     [MAP] *   O-035 Intron 0.051 0.025 0.078           [Eref, Lat, PAS] *             O-037 3' UTR 0.019 0.011 0.044           [PAS] * [PAS] *              198  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland  BayeScan  LFMM  Bayenv SNP SNP code RW NG SG  RW NG SG  RW NG SG  RW NG SG O-038 Intron 0.057 0.063 -0.004         [PAS] *            T-033 NS 0.099 0.125 -0.018                [Long, MAP] * [MAP, Long, MCMT] *   T-034 NA 0.035 0.043 -0.002                [Long] ** [Long, MAP] **   T-035 3' UTR 0.012 0.016 -0.004  bal *                   T-036 S 0.095 0.028 0.121         [Elev, Lat] *            S-039, G-020 S 0.050 0.048 0.027         [MAP, MSP] ** [MAP, AHM] *         S-038, G-008 Intron 0.024 0.029 -0.006           [Lat, Eref, Elev] *             199  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland   BayeScan   LFMM   Bayenv SNP SNP code RW NG SG   RW NG SG   RW NG SG   RW NG SG T-016 NS 0.076 0.034 0.088           [Lat, MAP] *             T-019 Intron 0.042 0.033 -0.027           [bFFP, CMD, eFFP, Elev, Eref, Lat, MAP, MAT, MCMT, MSP, MWMT, PAS] *** [MSP, PAS, Eref, bFFP, MAP, MCMT, Lat, eFFP, MWMT, MAT, CMD, AHM, SHM] *** [Long, AHM, Lat, MAP] ***        M-017 Intron 0.118 0.093 0.091           [bFFP, CMD, eFFP, Eref, Lat, MAP, MAT, MCMT, MWMT, PAS] ***       [bFFP, PAS, MAT, eFFP, CMD, Lat] *   [Lat, Long] *    200  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      
FDIST Ritland  BayeScan  LFMM  Bayenv SNP SNP code RW NG SG  RW NG SG  RW NG SG  RW NG SG M-016 NS 0.070 0.070 0.067         [AHM, Long, MAP, MCMT] *** [Long, MAP, AHM, PAS, MCMT] ***    [Long, MAP, AHM] *** [Long, MAP, AHM, PAS] ***   M-025 S 0.063 0.035 0.092         [Lat] *            M-026 Intron 0.058 0.058 0.058           [MAP, MCMT] *          M-027 Intron 0.058 0.058 0.058           [MAT, MAP, MCMT, Eref] **          M-029 S 0.080 0.099 0.014                  [SHM, eFFP] *   M-031 NA 0.080 0.083 0.023           [MAT, MCMT] **    [Lat, bFFP, MAT, Eref, eFFP,MCMT] *        201  Table S13 (continued). Results and annotation of candidate SNPs detected by FDIST Ritland, BayeScan, LFMM and Bayenv 2 in Pinus strobus using “range-wide” (RW), “northern group” (NG) and “southern group” (SG) populations.      FDIST Ritland  BayeScan  LFMM  Bayenv SNP SNP code RW NG SG  RW NG SG  RW NG SG  RW NG SG M-034 NS 0.048 0.032 0.070                [bFFP] *     M-002 NS 0.040 0.026 0.082    bal *                 M-008 S 0.041 0.049 0.016                [Long] * [Long] *   Q-024, O-021 NA 0.091 0.066 0.081         [bFFP, Elev, Eref, Lat, MAT, MCMT, MSP, MWMT, PAS] ***            T-031 Intron 0.071 0.077 0.031         [Eref, Lat] * [Eref] *           202  Appendix 2: Supplementary figures  Figure S1. Selected population samples for genotyping in Pinus monticola and P. strobus.203  a) b) Figure S2. Principal component analysis used to select provenances for genotyping in a) Pinus monticola and b) P. strobus. 
LTmin: lowest daily minimum temperature (°C); MTmin: mean of daily minimum temperature (°C); MTmean: mean of daily mean temperature (°C); MTmax: mean of daily maximum temperature (°C); HTmax: highest daily maximum temperature (°C); Precip: total precipitation (mm); SF: total snowfall (mm water); DP: mean dew point temperature (°C); RH: mean relative humidity (%); WS: mean wind speed at 10 m (km/h); DD0: degree-day summation over 0°C (°C/day); DD4: degree-day summation over 4°C (°C/day); FD: number of frost days (days); FFD: number of frost free days (days); GS: Growing season (days); PET: total potential evapotranspiration (mm); Ar: Aridity (water deficit, mm); VPD: Vapor pressure deficit (kPa); Rad: total radiation (MJ/m²); DwP: number of days with precipitation (days); CDwoP: consecutive days without precipitation (days).

Figure S3. Histograms of minor allele frequency in a) Pinus monticola and b) P. strobus.

Figure S4. Linear regression of HO against latitude and longitude in a) Pinus monticola and b) P. strobus. Population sample size was included as a covariate in the model.

Figure S5. STRUCTURE analyses using all population samples (“range-wide”) for Pinus monticola (a,c) and P. strobus (b,d): a), b) are plots of log probability of the data (L[K]) vs. K; and c), d) are plots of delta K vs. K (Evanno et al. 2005). Plots were constructed using STRUCTURE HARVESTER (Earl & VonHoldt 2012). Arrows indicate the inferred number of clusters. Panels a) and b) show mean Ln P(K) for K = 1 to 10; panels c) and d) cover K = 2 to 9.

Figure S6. STRUCTURE analyses using population samples from the “northern groups” for Pinus monticola (a,c) and P. strobus (b,d): a), b) are plots of log probability of the data (L[K]) vs. K; and c), d) are plots of delta K vs. K (Evanno et al. 2005). Plots were constructed using STRUCTURE HARVESTER (Earl & VonHoldt 2012). Arrows indicate the inferred number of clusters. Panels a) and b) show mean Ln P(K) for K = 1 to 10; panels c) and d) cover K = 2 to 9.

Figure S7. STRUCTURE analyses using population samples from the “southern groups” for Pinus monticola (a,c) and P. strobus (b,d): a), b) are plots of log probability of the data (L[K]) vs. K; and c), d) are plots of delta K vs. K (Evanno et al. 2005). Plots were constructed using STRUCTURE HARVESTER (Earl & VonHoldt 2012). Arrows indicate the inferred number of clusters. Panels a) and b) show mean Ln P(K) for K = 1 to 10; panels c) and d) cover K = 2 to 9.

Figure S8. Individual-based FST vs. Weir & Cockerham (1984) FST estimator (W&C) in a) Pinus monticola and b) P. strobus. Dashed lines refer to equal individual-based FST and W&C FST. Red diamonds indicate FST outliers detected by the “FDIST Ritland” method. Regression equations reported in the panels: Y = 0.003 + 0.777x, R2 = 0.87, p < 2e-16; Y = 0.013 + 0.827x, R2 = 0.71, p < 2e-16; Y = 0.003 - 0.223x, R2 = 0.35, p < 2e-16; Y = 0.013 - 0.173x, R2 = 0.09, p < 1e-04.

Figure S9. FDIST Ritland test in Pinus monticola using a) “range-wide” populations b) “northern group” populations and c) “southern group” populations. Red diamonds indicate outlier loci at a false discovery rate (FDR) of 5%. Axes: HO and individual-based FST.

Figure S10. FDIST Ritland test in Pinus strobus using a) “range-wide” populations b) “northern group” populations and c) “southern group” populations. Red diamonds indicate outlier loci at a false discovery rate (FDR) of 5%. Axes: HO and individual-based FST.

Figure S11. Hierarchical model (blue lines) vs. symmetrical island model (black lines) for the FDIST Ritland method in Pinus monticola. Red diamonds indicate outlier loci at a false discovery rate (FDR) of 5%. Axes: HO and individual-based FST.

Figure S12. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus monticola “range-wide” populations. Biplots of a) 1st and 2nd axes; and b) 2nd and 3rd axes are shown. Climate variables are represented by arrows. SNPs detected by Bayenv 2 = blue triangles; detected by LFMM = blue crosses; detected by both Bayenv 2 and LFMM = red diamonds; undetected SNPs = black circles.

Figure S13. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus monticola “northern group” populations. Biplots of a) 1st and 2nd axes; and b) 2nd and 3rd axes are shown.

Figure S14. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus strobus “range-wide” populations. Biplots of a) 1st and 2nd axes; and b) 2nd and 3rd axes are shown.

Figure S15. Test of the effect of climate on among-population differentiation, when controlled for space and ancestry, using redundancy discriminant analysis in Pinus strobus “northern group” populations. Biplots of a) 1st and 2nd axes; and b) 2nd and 3rd axes are shown.
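The delta K statistic plotted in Figures S5 to S7 (Evanno et al. 2005) can be illustrated with a short sketch. This is not the code used in the thesis (the plots were made with STRUCTURE HARVESTER); it is a minimal illustration that assumes the common means-based form, delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd L(K), where the mean and standard deviation are taken over replicate runs. The function name `delta_k`, the input layout, and the example values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Return {K: delta K} from replicate log-probabilities.

    lnp_runs is a hypothetical layout: a dict mapping each K to a list of
    Ln P(D) values, one per replicate STRUCTURE run (at least two per K).
    """
    result = {}
    for k in sorted(lnp_runs):
        # delta K needs both neighbours, so the smallest and largest K are skipped
        if (k - 1) not in lnp_runs or (k + 1) not in lnp_runs:
            continue
        sd = stdev(lnp_runs[k])
        if sd == 0:
            # undefined when all replicate runs give the same value
            continue
        second_diff = abs(
            mean(lnp_runs[k + 1]) - 2 * mean(lnp_runs[k]) + mean(lnp_runs[k - 1])
        )
        result[k] = second_diff / sd
    return result


# Made-up values for illustration only (not results from this thesis)
example_runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}
example_delta = delta_k(example_runs)
best_k = max(example_delta, key=example_delta.get)
print(best_k, example_delta)
```

The K with the largest delta K is usually taken as the most likely number of clusters, which is why the statistic cannot be evaluated for the smallest or largest K tested.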
Appendix 3: Python code to look for private alleles in populations or genetic groups

I developed this code in Python v.2.7.2 to detect alleles occurring in only one population or genetic group.

#Start of code
fz = open('out_mol_div_indices_each_marker_each_pop.txt', 'r')
myfile2 = fz.readlines()
fz.close()
myfile2_noheader = myfile2[1:]

#NOTE: myfile and myfile_noheader are not defined in this excerpt; they
#presumably hold the input file read in a part of the program not shown here

#create a list of markers
columns = myfile[0].split('\t') #do not forget to remove the \n
marker_list = columns[5:]

#create a list of populations
population_list = []
for x in myfile_noheader:
    if x.split('\t')[3] not in population_list:
        population_list.append(x.split('\t')[3])

#create a list of all possible genotypes
allele_list = ['A','T','C','G']
genotype_list = [{} for i in range(0,pow(2,len(allele_list))+1)]
k=0
for i in range(0,len(allele_list)):
    for j in range(0, len(allele_list)):
        genotype_list[k] = allele_list[i]+allele_list[j]
        k=k+1
genotype_list[16] = '0'

#create the output file
name = 'out_range_wide_SNP_diversity_and_private_alleles.txt'
fy = open(name, 'w')

#header of the output file
genotype_titles = 'AA\tAT\tAC\tAG\tTT\tTC\tTG\tCC\tCG\tGG\tmissing data\tsuccessful genotypes\tnumber of trees'
fy.write('marker\t' + genotype_titles + '\tallele 1\tallele 2\tp\tq\tHo\tHe(2pq)\tn_harmonic\tHo averaged over populations\tHs (2pq averaged over populations)\tHs adjusted\tHt\tHt adjusted\tFis\tFis adjusted\tFit\tFit adjusted\tFst\tFst_adjusted\tMinor Allele\tMAF\tnumber of populations the MA is present\tlist of populations\tnumber of populations the major A is present\tlist of populations' + '\n')

### Note: other diversity indices were calculated by this program that were not used in this thesis ...
###

########
#look for private alleles in populations
#use: out_mol_div_indices_each_marker_each_pop.txt = myfile2
########

#NOTE: marker, MA (minor allele) and the diversity indices written below are
#not defined in this excerpt; they are presumably set in the omitted code
for i in range (0,len(marker_list)):   #######select each marker
    nb_pop_presence_MA = 0
    pop_list_presence_MA = []
    nb_pop_presence_MajA = 0
    pop_list_presence_MajA = []
    for line in myfile2_noheader: #each line corresponds to a different population. Contains diversity indices (Allele 1, Allele 2, p, q, Ho, He) for this locus
        if line.split('\t')[1] == marker:
            #count and list the populations the minor allele appears in:
            if line.split('\t')[4] == MA: #check allele 1
                nb_pop_presence_MA = nb_pop_presence_MA + 1
                pop_list_presence_MA.append(line.split('\t')[0])
            elif line.split('\t')[4] != 'NA':
                nb_pop_presence_MajA = nb_pop_presence_MajA + 1
                pop_list_presence_MajA.append(line.split('\t')[0])

            if line.split('\t')[5] == MA: # check allele 2
                nb_pop_presence_MA = nb_pop_presence_MA + 1
                pop_list_presence_MA.append(line.split('\t')[0])
            elif line.split('\t')[5] != 'NA':
                nb_pop_presence_MajA = nb_pop_presence_MajA + 1
                pop_list_presence_MajA.append(line.split('\t')[0])

    #####
    #Write the output file
    fy.write(marker_list[i].replace('\n','') + '\t' + genotype_count + str(num_succ_genotypes) + '\t' + str(num_trees) + '\t' + str(alleles[0])
             +  '\t' + str(alleles[1])+ '\t' + str(pq[0]) + '\t' + str(pq[1]) + '\t' + str(Ho) + '\t' + str(He) + '\t' + str(n_harm) + '\t' + str(Ho_aver_over_pops) + '\t'
             + str(Hs_aver_over_pops)+ '\t' + str(Hs_adjust) + '\t' +  str(Ht)  + '\t' + str(Ht_adjust)  + '\t' + str(Fis)  + '\t'
             + str(Fis_adjust)  + '\t' + str(Fit)  + '\t' + str(Fit_adjust)  + '\t' + str(Fst)  + '\t' + str(Fst_adjust) +'\t' + str(MA)+ '\t'
             + str(MAF) + '\t' + str(nb_pop_presence_MA) + '\t' + str(pop_list_presence_MA) + '\t'+ str(nb_pop_presence_MajA) + '\t' + str(pop_list_presence_MajA) + '\n')

Appendix 4: R code for redundancy discriminant analysis

This original R code was provided by Dr. Patrick Meirmans and adapted by Simon Nadeau for the present thesis.

#use the vegan package for vegetation analysis
library(vegan)
library(packfor)
library(SoDA)
# Available on R-Forge,
# URL= https://r-forge.r-project.org/R/?group_id=195
# In MacOS X, the gfortran package is required by the forward.sel function
# of packfor, so users must install the gfortran compiler. Choose 'MacOS X'
# in the 'cran.r-project.org' window, then 'tools'.

#read the genetic data, the first row should contain headers (a combi of locus and allele names)
#the first column should contain the names of the populations
freqs = read.table("RDA_allelefreqs.txt", header=TRUE, row.names=1)
freqs = freqs[,seq(1,dim(freqs)[2],2)]
freqs = na.omit(freqs) #NAs create errors
pops.to.keep<-row.names(freqs)
freqs = freqs[order(rownames(freqs)),]

#fill in your overall fst
global_fst = 0.081

#read the coordinates
space.latlong = read.table("spatial.txt", header=TRUE, row.names=1)
rows.to.keep<-which(rownames(space.latlong) %in% pops.to.keep)
space.latlong = space.latlong[rows.to.keep,]
space.latlong = space.latlong[order(rownames(space.latlong)),]
space.xy = geoXY(space.latlong$lat, space.latlong$lon, unit = 1000)

#convert to x and y, center but do not scale (yet)
#make 2nd order polynomials
x = scale(space.xy[,1], scale=FALSE)
y = scale(space.xy[,2], scale=FALSE)
space = poly(cbind(x,y), degree=2, raw=FALSE)
colnames(space) <- c("X","X2","Y","Y2","XY")

#read the ecological data
eco = read.table("climate.txt", header=TRUE, row.names=1)
rows.to.keep<-which(rownames(eco) %in% pops.to.keep)
eco = eco[rows.to.keep,]
eco = eco[order(rownames(eco)),]

#I preselected some environmental variables that are
uncorrelated (r < 0.8) names(eco) include = c("MAT", "MWMT", "MCMT", "MAP", "MSP", "AHM", "SHM", "bFFP", "eFFP", "PAS", "Eref", "CMD") eco = eco[,include] 220   #Ancestry from Structure results ancestry = read.table("ancestry.txt", header=TRUE, row.names=1) rows.to.keep<-which(rownames(ancestry) %in% pops.to.keep) ancestry = ancestry[rows.to.keep,] ancestry = ancestry[order(rownames(ancestry)),]  #Select ancestry variable to include include = c("K3_gr2", "K3_gr3") ancestry = ancestry[,include]  #scale the variables from 0 to 1, with unit variance #we do not want to do this below as part of the rda, #since we should not scale the allele frequencies because #that puts more emphasis on rare alleles and #removes the link with Fst space = scale(space) eco = scale(eco) ancestry=scale(ancestry)  #perform forward selection of spatial variables ord.spa = rda(freqs ~ ., data.frame(space), scale= FALSE) (R2.all.spa = RsquareAdj(ord.spa)$adj.r.squared) forward <- forward.sel(freqs, space, adjR2thresh= R2.all.spa, alpha=0.01) (selected.spa = forward[,1])  #subsample space to the selected variables #you may be interested to pause here to see which ones they are space = space[,selected.spa]  #I don't do forward selection of ecological variables, but you can do that here #ord.eco = rda(freqs ~ ., data.frame(eco), scale= FALSE) #(R2.all.eco = RsquareAdj(ord.eco)$adj.r.squared) #forward <- forward.sel(freqs, eco, adjR2thresh= R2.all.eco ) #(selected.eco = forward[,1])  #subsample eco to the selected variables #you may be interested to pause here to see which ones they are #eco = as.matrix(eco[,match(selected.eco,colnames(eco))])  #do variance partitioning part = varpart(freqs, eco, space, ancestry) fractions = part$part$indfract[,3]  #test significance of partitioning # Test of fractions [a+d+f+g] test_eco = anova.cca(rda(freqs, eco), step=1000) # Test of fractions [b+d+e+g] test_space = anova.cca(rda(freqs, space), step=1000) # Test of fractions [c+e+f+g] test_ancestry = 
anova.cca(rda(freqs, ancestry), step=1000) # Test of fractions [a+b+c+d+e+f+g] = ALL three <- cbind(eco, space, ancestry) test_all = anova.cca(rda(freqs, three), step=1000) # Test of fraction [a] 221  space_ancestry <- cbind(space, ancestry) testA = anova.cca(rda(freqs~eco + Condition(space_ancestry), step=1000)) # Test of fraction [b] eco_ancestry <- cbind(eco, ancestry) testB = anova.cca(rda(freqs~space + Condition(eco_ancestry), step=1000)) #test of fraction [c] eco_space <- cbind(eco, space) testC = anova.cca(rda(freqs~ancestry + Condition(eco_space), step=1000))  #write the raw results to a file capture.output(part,file="RDA out.txt") capture.output(test_eco,file="RDA out.txt", append=TRUE) capture.output(test_space,file="RDA out.txt", append=TRUE) capture.output(test_ancestry,file="RDA out.txt", append=TRUE) capture.output(test_all,file="RDA out.txt", append=TRUE) capture.output(testA,file="RDA out.txt", append=TRUE) capture.output(testB,file="RDA out.txt", append=TRUE) capture.output(testC,file="RDA out.txt", append=TRUE)  #we plot a pie diagram to a pdf pdf(file = "pie.pdf", onefile=TRUE, paper="a4r", width=8, height=8)  par(bg="transparent")  #create labels, names etc for pie chart names = c("eco", "eco+space", "space", "unexplained")  pielabels = round(fractions/sum(fractions) * 100, 1) pielabels = paste(pielabels,"%", sep="") pielabels[1] = sprintf("%s (p=%s)", pielabels[1], testA[[5]][1]) pielabels[3] = sprintf("%s (p=%s)", pielabels[3], testC[[5]][1]) piecolors = c("darkolivegreen4", "goldenrod1", "red3", "grey") main = sprintf("Partitioning of among-population variation\n(Fst = %s)", global_fst)  #draw the pie chart and the legend pie(part$part$indfract[,3], clockwise =TRUE, labels = pielabels, font=2, cex=1.2, col=piecolors, main=main) legend("bottomright", legend=names, cex=1.0, fill= piecolors)  #close the pdf device dev.off()  ###Do variance partitionning without taking ancestry into account part = varpart(freqs, eco, space) fractions = 
part$part$indfract[,3]

#test significance of partitioning
# Test of fractions [a+b]
testAB = anova.cca(rda(freqs, eco), step=1000)
# Test of fractions [b+c]
testBC = anova.cca(rda(freqs, space), step=1000)
# Test of fractions [a+b+c]
both <- cbind(eco, space)
testABC = anova.cca(rda(freqs, both), step=1000)
# Test of fraction [a]
testA = anova.cca(rda(freqs, eco, space), step=1000)
# Test of fraction [c]
testC = anova.cca(rda(freqs, space, eco), step=1000)

#write the raw results to a file
capture.output(part,file="RDA out no ancestry.txt")
capture.output(testAB,file="RDA out no ancestry.txt", append=TRUE)
capture.output(testBC,file="RDA out no ancestry.txt", append=TRUE)
capture.output(testABC,file="RDA out no ancestry.txt", append=TRUE)
capture.output(testA,file="RDA out no ancestry.txt", append=TRUE)
capture.output(testC,file="RDA out no ancestry.txt", append=TRUE)

#we keep the model including ancestry

###output the biplot to a pdf

#plot outliers in a different colour
symb = rep(1, dim(freqs)[2])
cols = rep("black", dim(freqs)[2])

subsets <- rep(0, ncol(freqs)) # subsets of different outlier SNPs

#Bayenv
outliers_bayenv = read.table("outliers_bayenv.txt", header=TRUE)
all_outliers_bayenv = outliers_bayenv[,"SNP"]

subsets[colnames(freqs) %in% all_outliers_bayenv] <-"bayenv" # subsets for plotting the label names

col.outlier.bayenv<-which(colnames(freqs) %in% all_outliers_bayenv)
cols[col.outlier.bayenv] = "blue" #plotting outliers in a different colour
symb[col.outlier.bayenv] = 2 #plotting outliers with a different symbol

#LFMM
outliers_LFMM = read.table("outliers_LFMM.txt", header=TRUE)
all_outliers_LFMM = outliers_LFMM[,"SNP"]

subsets[colnames(freqs) %in% all_outliers_LFMM] <-"LFMM"

col.outlier.LFMM<-which(colnames(freqs) %in% all_outliers_LFMM)
cols[col.outlier.LFMM] = "blue"
symb[col.outlier.LFMM] = 3

#Two different methods.
outliers_2methods = read.table("outliers_2methods.txt", header=TRUE)
outliers_2methods = outliers_2methods[,"SNP"]

subsets[colnames(freqs) %in% outliers_2methods] <- "2methods"

col.outlier.2methods<-which(colnames(freqs) %in% outliers_2methods)
cols[col.outlier.2methods] = "red"
symb[col.outlier.2methods] = 5

####biplot results RDA eco | (space +ancestry)

#do final, most interesting RDA, get % variances
rda_test = rda(freqs~eco + Condition(space_ancestry),scale=FALSE)
part = varpart(freqs, eco, space, ancestry)
fractions = part$part$indfract[,3]

overall.test = anova.cca(rda_test, step=1000) # P<0.01
axis.test = anova.cca(rda_test, by="axis", step=1000) #first 3 axes significant

tot_inert = rda_test$tot.chi
orig_eigs = rda_test$CCA$eig
orig_inert = sum(orig_eigs)
all_perc = orig_eigs/orig_inert

#RDA1-2
rda_res = rda(freqs, eco, space_ancestry)

SNPs.scr <- scores(rda_res, scaling = 1)$species
row.names(SNPs.scr) <- colnames(freqs)

main =sprintf("Allele frequencies ~ Eco | (Space + ancestry) (%.1f %%, p = %s)", 100*fractions[1], overall.test[[5]][1]);
xlab=sprintf("RDA1 (%.0f%%, p = %s)", 100*all_perc[1], axis.test[[5]][1])
ylab=sprintf("RDA2 (%.0f%%, p = %s)", 100*all_perc[2], axis.test[[5]][2])

#plot outliers with different symbols and colours
pdf(file = "biplot1-2.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

par(bg="transparent")

plot(rda_res, scaling=1, main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="none")
text(rda_res, scaling=1, display=c("cn"), col = "grey40", lwd=2, head.arrow = 0.12)
points(rda_res, scaling=1, display = ("sp"), pch = symb, col=cols)
legend("topleft", bty = "n", c("Bayenv outliers","LFMM outliers", "Outliers by 2 methods"),pch=c(2,3,5),col=c("blue","blue", "red"), cex = 1)

dev.off()

#plot outliers with label names
pdf(file = "biplot1-2_Outlierlabels.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

plot(rda_res, scaling=1, main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="none")
text(rda_res, scaling=1, display=c("cn"), col = "grey40", lwd=2, head.arrow = 0.12)
points(SNPs.scr[subsets==0,], col="black", pch=1)
#points(SNPs.scr[subsets=="bayenv",], col="blue", pch = 2)
#points(SNPs.scr[subsets=="LFMM",], col="blue", pch = 3)
#points(SNPs.scr[subsets=="2methods",], col="red", pch = 5)
text(SNPs.scr[subsets=="LFMM", ],labels=row.names(SNPs.scr[subsets=="LFMM", ]), col="blue", cex=0.7)
text(SNPs.scr[subsets=="bayenv", ],labels=row.names(SNPs.scr[subsets=="bayenv", ]), col="darkolivegreen4", cex=0.7)
text(SNPs.scr[subsets=="2methods", ],labels=row.names(SNPs.scr[subsets=="2methods", ]), col="red", cex=0.7)
legend("topleft", bty = "n", c("Bayenv outliers","LFMM outliers", "Outliers by 2 methods"),pch=c(2,3,5),col=c("darkolivegreen4","blue", "red"), cex = 1)

dev.off()

#plot all SNP label names
pdf(file = "biplot1-2_Alllabels.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

plot(rda_res, scaling=1, main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="text")

dev.off()

#also the third axis is significant
#RDA 2-3

SNPs.scr <- scores(rda_res, choices = c(2,3), scaling = 1)$species
row.names(SNPs.scr) <- colnames(freqs)

main =sprintf("Allele frequencies ~ Eco | (Space + ancestry) (%.1f %%, p = %s)", 100*fractions[1], overall.test[[5]][1]);
xlab=sprintf("RDA2 (%.0f%%, p = %s)", 100*all_perc[2], axis.test[[5]][2])
ylab=sprintf("RDA3 (%.0f%%, p = %s)", 100*all_perc[3], axis.test[[5]][3])

#plot outliers with different symbols and colours
pdf(file = "biplot2-3.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

par(bg="transparent")

plot(rda_res, scaling=1, choices= c(2,3), main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="none")
text(rda_res, scaling=1, choices= c(2,3), display=c("cn"), col = "grey40", lwd=2, head.arrow = 0.12)
points(rda_res, scaling=1, choices= c(2,3), display = ("sp"), pch = symb, col=cols)
legend("topleft", bty = "n", c("Bayenv outliers","LFMM outliers", "Outliers by 2 methods"),pch=c(2,3,5),col=c("blue","blue", "red"), cex = 1)

dev.off()

#plot outliers with label names
pdf(file = "biplot2-3_Outlierlabels.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

plot(rda_res, scaling=1, choices= c(2,3), main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="none")
text(rda_res, scaling=1, choices= c(2,3), display=c("cn"), col = "grey40", lwd=2, head.arrow = 0.12)
points(SNPs.scr[subsets==0,], col="black", pch=1)
#points(SNPs.scr[subsets=="bayenv",], col="blue", pch = 2)
#points(SNPs.scr[subsets=="LFMM",], col="blue", pch = 3)
#points(SNPs.scr[subsets=="2methods",], col="red", pch = 5)
text(SNPs.scr[subsets=="LFMM", ],labels=row.names(SNPs.scr[subsets=="LFMM", ]), col="blue", cex=0.7)
text(SNPs.scr[subsets=="bayenv", ],labels=row.names(SNPs.scr[subsets=="bayenv", ]), col="darkolivegreen4", cex=0.7)
text(SNPs.scr[subsets=="2methods", ],labels=row.names(SNPs.scr[subsets=="2methods", ]), col="red", cex=0.7)
legend("topleft", bty = "n", c("Bayenv outliers","LFMM outliers", "Outliers by 2 methods"),pch=c(2,3,5),col=c("darkolivegreen4","blue", "red"), cex = 1)

dev.off()

#plot all SNP label names
pdf(file = "biplot2-3_Alllabels.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

plot(rda_res, scaling=1, choices= c(2,3), main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="text")

dev.off()

#RDA 3-4
SNPs.scr <- scores(rda_res, choices = c(3,4), scaling = 1)$species
row.names(SNPs.scr) <- colnames(freqs)

main =sprintf("Allele frequencies ~ Eco | (Space + ancestry) (%.1f %%, p = %s)", 100*fractions[1], overall.test[[5]][1]);
xlab=sprintf("RDA3 (%.0f%%, p = %s)", 100*all_perc[3], axis.test[[5]][3])
ylab=sprintf("RDA4 (%.0f%%, p = %s)", 100*all_perc[4], axis.test[[5]][4])

#plot outliers with different symbols and colours
pdf(file = "biplot3-4.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

par(bg="transparent")

plot(rda_res, scaling=1, choices= c(3,4), main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"),
     type="none")
text(rda_res, scaling=1, choices= c(3,4), display=c("cn"), col = "grey40", lwd=2, head.arrow = 0.12)
points(rda_res, scaling=1, choices= c(3,4), display = ("sp"), pch = symb, col=cols)
legend("topleft", bty = "n", c("Bayenv outliers","LFMM outliers", "Outliers by 2 methods"),pch=c(2,3,5),col=c("blue","blue", "red"), cex = 1)

dev.off()

#plot outliers with label names
pdf(file = "biplot3-4_Outlierlabels.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

plot(rda_res, scaling=1, choices= c(3,4), main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="none")
text(rda_res, scaling=1, choices= c(3,4), display=c("cn"), col = "grey40", lwd=2, head.arrow = 0.12)
points(SNPs.scr[subsets==0,], col="black", pch=1)
#points(SNPs.scr[subsets=="bayenv",], col="blue", pch = 2)
#points(SNPs.scr[subsets=="LFMM",], col="blue", pch = 3)
#points(SNPs.scr[subsets=="2methods",], col="red", pch = 5)
text(SNPs.scr[subsets=="LFMM", ],labels=row.names(SNPs.scr[subsets=="LFMM", ]), col="blue", cex=0.7)
text(SNPs.scr[subsets=="bayenv", ],labels=row.names(SNPs.scr[subsets=="bayenv", ]), col="darkolivegreen4", cex=0.7)
text(SNPs.scr[subsets=="2methods", ],labels=row.names(SNPs.scr[subsets=="2methods", ]), col="red", cex=0.7)
legend("topleft", bty = "n", c("Bayenv outliers","LFMM outliers", "Outliers by 2 methods"),pch=c(2,3,5),col=c("darkolivegreen4","blue", "red"), cex = 1)

dev.off()

#plot all SNP label names
pdf(file = "biplot3-4_Alllabels.pdf", onefile=TRUE, paper="a4r", width=8, height=8)

plot(rda_res, scaling=1, choices= c(3,4), main = main, xlab=xlab, ylab=ylab, display=c("sp","cn"), type="text")

dev.off()

Appendix 5: FDIST Ritland method Python and R code

I developed the following code in Python v.2.7.2 to calculate the individual-based FST estimator of Ritland (1996) (see details in “3.2.5 FST outlier tests”). Comments are indicated by “#”.
#Start of code
from __future__ import division
import math

fx = open('Genotypes_EWP_FINAL_April_11_2013.txt', 'r')
myfile = fx.readlines()
fx.close()
myfile_noheader = myfile[1:]

#create list of markers
columns = myfile[0].replace('\n','').split('\t')
marker_list = columns[2:]

#Create output files
name = 'Fst_loci_out.txt'
fz = open(name, 'w')

fz.write('Loci\tp\tq\t2pq\tFw\tFst\n')

#calculations
print("Your awesome program is calculating...")

for j in range (2, len(marker_list)+2): #j corresponds to the column number = to a different marker
    marker = marker_list[j-2]
    print(marker)

    #Create a list of possible alleles
    allele_list = []

    for i in myfile_noheader:
        genotype = i.split('\t')[j].replace('\n','')
        if genotype != '0':
            allele1 =genotype[0]
            allele2 =genotype[1]
            if allele1 not in allele_list:
                allele_list.append(allele1)
            if allele2 not in allele_list:
                allele_list.append(allele2)

    alleleA = allele_list[0]
    alleleB = allele_list[1]

    ###count the number of alleles and calculate allele frequencies
    alleleA_count = 0
    alleleB_count = 0
    pA = ''
    pB = ''

    for i in myfile_noheader: #Read all the genotypes for a given marker and count the number of each allele
        genotype = i.split('\t')[j].replace('\n','')
        if genotype != '0': #if genotype is not null
            allele1 =genotype[0]
            allele2 =genotype[1]

            if allele1 == alleleA:
                alleleA_count = alleleA_count + 1 #count the number of allele A
            elif allele1 == alleleB:
                alleleB_count = alleleB_count + 1 # count the number of allele B
            else:
                print('error')

            if allele2 == alleleA:
                alleleA_count = alleleA_count + 1 #count the number of allele A
            elif allele2 == alleleB:
                alleleB_count = alleleB_count + 1 # count the number of allele B
            else:
                print('error')

    pA = alleleA_count/(alleleA_count+alleleB_count)
    pB = alleleB_count/(alleleA_count+alleleB_count)
    He = 2*pA*pB

    ###Calculate individual Fst for each locus

    pairwise_F = 0 # reset the pairwise F. Arithmetic average of ind_F_list (over the 4 possible allele pairs)
    nb_pairwise_F = 0
    Fw = 0
    nb_Fw = 0

    for i in range(0, len(myfile_noheader)): #read each line corresponding to a single individual
        populationi = myfile_noheader[i].split('\t')[1]

        for k in range(i, len(myfile_noheader)): #read each line corresponding to the second individual
            populationk = myfile_noheader[k].split('\t')[1]
            if k != i and populationi == populationk: #avoid within individuals and between populations comparisons

                allelesi_list = [] #reset the list alleles for individual i, [allele i, allele j]
                allelesk_list = [] #reset the list alleles for individual k, [allele k, allele l]
                pi_list = [] #list of allele frequencies for genotype i [pi, pj]
                dik_list = [] #[dik, dil, djk, djl]

                genotypei = myfile_noheader[i].split('\t')[j].replace('\n','')
                genotypek = myfile_noheader[k].split('\t')[j].replace('\n','')

                if genotypei != '0' and genotypek !='0': #no missing data for either individuali or individualk
                    allelesi_list.append(genotypei[0]) #individuali allele 1
                    allelesi_list.append(genotypei[1]) #individuali allele 2
                    allelesk_list.append(genotypek[0]) #individualk allele 1
                    allelesk_list.append(genotypek[1]) #individualk allele 2

                    for allelei in allelesi_list:
                        if allelei == alleleA: #find range wide allele frequency, ex: AA, allele frequency p = pA. If BB, allele frequency p = pB.
                            pi_list.append(pA)
                        elif allelei == alleleB:
                            pi_list.append(pB)
                        else:
                            print('error 1')

                    for allelei in allelesi_list:
                        for allelek in allelesk_list:
                            if allelei ==  allelek: #if the two alleles are identical, delta = 1
                                dik_list.append(1)
                            elif allelei !=  allelek: #if the two alleles differ, delta = 0
                                dik_list.append(0)
                            else:
                                print('error 2')

                    pairwise_F = pairwise_F + 0.25*(((dik_list[0]+dik_list[1])/pi_list[0])+((dik_list[2]+dik_list[3])/pi_list[1])-4) #calculate individual F
                    nb_pairwise_F = nb_pairwise_F + 1

    Fst_loci = pairwise_F/nb_pairwise_F

    fz.write(marker + '\t' + str(pA) + '\t' + str(pB) + '\t' + str(He) + '\t' + 'NA' + '\t' + str(Fst_loci) +'\n')

fz.close()

The second step was to test each locus against the FST distribution of a set of selectively neutral loci (simulated using Simcoal 2.1.2), according to the FDIST method (Beaumont & Nichols 1996) (see details in “3.2.5 FST outlier tests”). This was done using the following R code. Dr. Katie Lotterhos provided the code to fit a Johnson distribution to the simulated FST distribution.

library(SuppDists)
library(qvalue)

simul_Ritland <- read.table("simul_Fst.Ritland.txt", header = TRUE)
obs_Ritland <-read.table("obs_Fst.Ritland.txt", header = TRUE)

###Classify observed and simulated loci into He bins and calculate mean He, Ho, Fst observed, Fst simulated and confidence intervals for the desired alpha values.
#The output of this function is used by the "plot_confint" function to plot the confidence intervals
#Inputs:
#increment: numeric, giving the He bin width. Ex: 0.05, each class covers a range of 0.05
#alpha: the desired value for confidence intervals. Quantiles of alpha/2 and 1-alpha/2 are calculated
#simul: table of simulated Fst for each locus, containing a column named SNP, Fst and a column named X2pq
#obs: table of observed Fst for each locus, containing a column named SNP, Fst and a column named X2pq

Calc_He_bins_stats <- function(increment, alpha, simul,obs){

  num = nchar(increment)-2

  categories <- c(0,seq(increment, round(max(simul$X2pq),num), increment))
  categories[length(categories)] = max(simul$X2pq) #include he that are slightly higher than 0.5
  num_categories = length(categories)

  bin_stats <- matrix(rep(NA,(num_categories-1)*5), nrow=num_categories-1,ncol=5, dimnames=list(NULL, c("meanHe", "obs mean Fst","simul mean Fst", "quant_low", "quant_high")))

  for (i in 2:num_categories) {
    simul_subset  <- subset(simul, X2pq >= categories[i-1] & X2pq < categories[i])
    obs_subset  <- subset(obs, X2pq >= categories[i-1] & X2pq < categories[i])

    meanFst_obs = mean(obs_subset$Fst)
    meanFst_simul = mean(simul_subset$Fst)
    meanHe =(categories[i]+categories[i-1])/2

    Fst_lower <- qJohnson(alpha/2, parms=JohnsonFit(simul_subset$Fst))
    Fst_higher <- qJohnson(1-alpha/2, parms=JohnsonFit(simul_subset$Fst))

    bin_stats[i-1,1] <- meanHe
    bin_stats[i-1,2] <- meanFst_obs
    bin_stats[i-1,3] <- meanFst_simul
    bin_stats[i-1,4] <- Fst_lower
    bin_stats[i-1,5] <- Fst_higher
  }

  bin_stats <- data.frame(bin_stats)

  return (bin_stats)
}

#Function to plot confidence intervals with alpha level specified in "Calc_He_bins_stats".
#Inputs:
#bin_stats: output of "Calc_He_bins_stats"
#lty: desired line type for confidence intervals
#col: desired colour for confidence intervals
plot_confint <- function(bin_stats, lty, col){

  #plot confint at specified alpha level (output from "Calc_He_bins_stats")
  lines(bin_stats$meanHe, bin_stats$quant_low, lty = lty, col = col)
  lines(c(0, min(bin_stats$meanHe)),c(bin_stats$quant_low[1], bin_stats$quant_low[1]), lty = lty, col = col)
  lines(c(max(bin_stats$meanHe), 0.5),c(bin_stats$quant_low[nrow(bin_stats)], bin_stats$quant_low[nrow(bin_stats)]), lty = lty, col = col)

  lines(bin_stats$meanHe, bin_stats$quant_high,lty = lty, col = col)
  lines(c(0, min(bin_stats$meanHe)),c(bin_stats$quant_high[1], bin_stats$quant_high[1]),lty = lty, col = col)
  lines(c(max(bin_stats$meanHe), 0.5),c(bin_stats$quant_high[nrow(bin_stats)], bin_stats$quant_high[nrow(bin_stats)]),lty = lty, col = col)

}

###Function to calculate P-values for each observed locus from simulated data.
#inputs:
#increment: numeric, giving the He bin width. Ex: 0.05, each class covers a range of 0.05
#alpha: significance level. Does not correct for multiple testing

outlier_test <- function(increment,alpha, simul, obs){

  FST_sim <- simul$Fst #list of simulated Fst

  num = nchar(increment)-2

  categories <- c(0,seq(increment, round(max(simul$X2pq),num), increment))
  categories[length(categories)] = 0.51 #include he that are slightly higher than 0.5
  num_categories = length(categories)

  quantiles <- matrix(rep(NA,(num_categories-1)*4), nrow=num_categories-1,ncol=4, dimnames=list(NULL, c("He_low","He_high","5%","95%")))

  for (i in 2:num_categories) {
    simul_subset  <- subset(simul, X2pq >= categories[i-1] & X2pq < categories[i]) #Subset for He bin

    Fst_lower <- qJohnson(alpha/2, parms=JohnsonFit(simul_subset$Fst))
    Fst_higher <- qJohnson(1-alpha/2, parms=JohnsonFit(simul_subset$Fst))

    quantiles[i-1,1] <- categories[i-1]
    quantiles[i-1,2] <- categories[i]
    quantiles[i-1,3] <- Fst_lower
    quantiles[i-1,4] <- Fst_higher
  }

  quantiles <- data.frame(quantiles)

  #check if observed F for each marker is outside of 95% and 99% quantiles, i.e. are outliers
  Fst_outlier <- matrix(rep(NA,length(obs$Loci)*3), nrow = length(obs$Loci), ncol = 3, dimnames=list(NULL, c(paste("Outlier",toString(alpha),sep="_"),paste("Pos/Bal",toString(alpha),sep="_"),"P_value")))

  for (k in 1:nrow(obs)){
    marker = obs$Loci[k] #each line (k) corresponds to a different SNP
    FST_obs <- obs$Fst[k]
    outlier = "NA"
    type_sel = "NA"
    p_val = "NA"
    for (z in 1:nrow(quantiles)){
      if ((obs$X2pq[k] >= quantiles$He_low[z]) & (obs$X2pq[k] < quantiles$He_high[z])){

        #Determine if it is an outlier based on 95% and 99% quantiles of the simulated Johnson distribution
        if (FST_obs < quantiles$X5.[z]){
          outlier = "yes"
          type_sel = "bal"
        } else {
          if (FST_obs > quantiles$X95.[z]){
            outlier = "yes"
            type_sel = "pos"
          } else {
            outlier = "no"
            type_sel = "no_sel"
          }
        }

        #Determine p value based on a Johnson distribution
        simul_subset  <- subset(simul, X2pq >= quantiles$He_low[z] & X2pq <= quantiles$He_high[z]) #Subset for He bin

        FST_sim_He_bin <- simul_subset$Fst #list of Fst in the He bin

        ###
        #Code provided by Katie Lotterhos, University of British Columbia, unpublished

        #Fit the simulated FSTs to a Johnson distribution and get the density
        minFST <- min(c(min(FST_sim), FST_obs))
        FST <- seq(minFST,1.01,by=0.0001) #just to make sure "1" is included in sequence
        FST_dens <- dJohnson(FST, parms=JohnsonFit(FST_sim_He_bin))
        FST_dens[which(FST_dens=="NaN")]=0 #0 probabilities are undefined in the function

        #Normalize the density and get the p-value for the observed FST
        FST_dens2 <- FST_dens/sum(FST_dens) #Sum of the normalized density is 1
        FST_index <- max(which(FST<=FST_obs))
        p_val <- sum(FST_dens2[1:FST_index])
        #End of code provided by Katie Lotterhos, University of British Columbia, unpublished
      }
      Fst_outlier[k,1] = outlier
      Fst_outlier[k,2] = type_sel
      Fst_outlier[k,3] = p_val
    }
  }
  Fst_outlier
  return(Fst_outlier)
}

######Calculations
He_bins_stats_0.01_Ritland <- Calc_He_bins_stats(0.025,0.01, simul_Ritland, obs_Ritland) #alpha of 1%
He_bins_stats_0.05_Ritland <- Calc_He_bins_stats(0.025,0.05, simul_Ritland, obs_Ritland) #alpha of 5%

outlier_Ritland_0.01 <- data.frame(outlier_test(0.025,0.01, simul_Ritland, obs_Ritland)) #alpha of 1%
outlier_Ritland_0.05 <- data.frame(outlier_test(0.025,0.05, simul_Ritland, obs_Ritland)) #alpha of 5%

P_bal_Ritland <- as.numeric(as.character(outlier_Ritland_0.05$P_value))
fdr_bal_Ritland <- qvalue(P_bal_Ritland, lambda=0, fdr.level = 0.05)
P_pos_Ritland <- 1 - as.numeric(as.character(outlier_Ritland_0.05$P_value))
fdr_pos_Ritland <- qvalue(P_pos_Ritland, lambda=0, fdr.level = 0.05)

results_Ritland <- data.frame(obs_Ritland, Outlier_0.01 = outlier_Ritland_0.01$Outlier_0.01 , Pos.Bal_0.01 = outlier_Ritland_0.01$Pos.Bal_0.01, Outlier_0.05 = outlier_Ritland_0.05$Outlier_0.05, Pos.Bal_0.05 = outlier_Ritland_0.05$Pos.Bal_0.05, P_value = outlier_Ritland_0.05$P_value, q_value_bal = fdr_bal_Ritland$qvalues, q_value_pos = fdr_pos_Ritland$qvalues)
write.csv(results_Ritland,"Fst_outlier_results_Ritland.csv")

#Visualise Ritland outliers on the plot
par(mfrow = c(1,1))
x <- ''
y <- ''

plot(x,y, main = "Fst Ritland", xlab = "He", ylab = "FST", xlim = c(min(obs_Ritland$X2pq),max(obs_Ritland$X2pq)), ylim = c(min(obs_Ritland$Fst),max(obs_Ritland$Fst)))

#plot_confint calculated using Fst Ritland
plot_confint(He_bins_stats_0.05_Ritland, 5, 'black')
plot_confint(He_bins_stats_0.01_Ritland, 3, 'black')

#plot observed data
no_outliers_Ritland = subset(results_Ritland, Outlier_0.05 == "no") #don't plot outliers now
points(no_outliers_Ritland$X2pq, no_outliers_Ritland$Fst, col='black', pch = 1)

fdr_threshold = 0.1
#plotting outliers at 5% that do not pass the FDR threshold
outliers_Ritland = subset(results_Ritland, (Pos.Bal_0.05 == "bal" & q_value_bal > fdr_threshold)  | (Pos.Bal_0.05 == "pos" & q_value_pos > fdr_threshold))
points(outliers_Ritland$X2pq, outliers_Ritland$Fst, col='red', pch = 4)
text(outliers_Ritland$X2pq, outliers_Ritland$Fst, col = 'red', labels=outliers_Ritland$Loci, pos=2, offset=0.5,cex=0.7)

#plotting outliers at 10% fdr
outliers_fdr_Ritland = subset(results_Ritland, q_value_bal <= fdr_threshold | q_value_pos <= fdr_threshold)
points(outliers_fdr_Ritland$X2pq, outliers_fdr_Ritland$Fst, col='red', pch = 5)
text(outliers_fdr_Ritland$X2pq, outliers_fdr_Ritland$Fst, col = 'red', labels=outliers_fdr_Ritland$Loci, pos=1, offset=0.5,cex=0.7)

legend(0.33,0.38, bty = "n", c("95% confidence intervals","99% confidence intervals"), lty = c(5,3), col = c('black','black') )
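
For readers who want to check the per-locus quantity that the Python script in this appendix writes to its Fst column, the calculation can be condensed into a few lines. The following is only a minimal Python 3 sketch, not the thesis code: the function name pairwise_fst, its arguments and the toy genotypes are assumptions made for illustration. It follows the script's conventions of two-character genotype strings, "0" for missing data, range-wide allele frequencies, and comparisons restricted to pairs of individuals from the same population.

```python
from itertools import combinations


def pairwise_fst(genotypes, populations):
    """Average within-population pairwise F for one biallelic locus.

    genotypes   -- list of 2-character strings such as 'AG'; '0' marks missing data
    populations -- list of population labels, one per individual
    (Both argument names are illustrative assumptions, not from the thesis.)
    """
    # Range-wide allele frequencies, computed over all non-missing genotypes
    counts = {}
    for g in genotypes:
        if g != '0':
            for allele in g:
                counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    freq = {allele: n / total for allele, n in counts.items()}

    total_f = 0.0
    n_pairs = 0
    for i, k in combinations(range(len(genotypes)), 2):
        gi, gk = genotypes[i], genotypes[k]
        # Keep only pairs from the same population with no missing data
        if populations[i] != populations[k] or gi == '0' or gk == '0':
            continue
        # Sum of delta / p over the four allele comparisons, where delta is 1
        # when the two alleles are identical and p is the range-wide frequency
        # of the allele carried by individual i
        shared = sum((a == b) / freq[a] for a in gi for b in gk)
        total_f += 0.25 * (shared - 4)
        n_pairs += 1

    return total_f / n_pairs if n_pairs else float('nan')


# Two populations fixed for alternative alleles: complete differentiation
print(pairwise_fst(['AA', 'AA', 'GG', 'GG'], ['p1', 'p1', 'p2', 'p2']))  # 1.0
# A single population of heterozygotes: no differentiation
print(pairwise_fst(['AG', 'AG'], ['p1', 'p1']))  # 0.0
```

The two toy calls are sanity checks on the formula: populations fixed for different alleles give a value of 1, and identical heterozygotes drawn from one population give 0. The function returns NaN when no valid within-population pair exists, whereas the original script would stop with a division by zero in that case.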
