Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Molecular genetic diversity among natural populations of Populus Ismail, Mohamed 2010

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata

Download

Media
24-ubc_2010_fall_ismail_mohamed.pdf [ 2.12MB ]
Metadata
JSON: 24-1.0071044.json
JSON-LD: 24-1.0071044-ld.json
RDF/XML (Pretty): 24-1.0071044-rdf.xml
RDF/JSON: 24-1.0071044-rdf.json
Turtle: 24-1.0071044-turtle.txt
N-Triples: 24-1.0071044-rdf-ntriples.txt
Original Record: 24-1.0071044-source.json
Full Text
24-1.0071044-fulltext.txt
Citation
24-1.0071044.ris

Full Text

 MOLECULAR GENETIC DIVERSITY AMONG NATURAL POPULATIONS OF POPULUS  by Mohamed Ismail     A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF  DOCTOR OF PHILOSOPHY  in  THE FACULTY OF GRADUATE STUDIES (Forestry)     THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) July 2010  © Mohamed Ismail, 2010   ii ABSTRACT Genetic diversity is a key factor in species survival, evolution, and adaptation.  It also reveals species genetic structure and provides insights into how different demographic forces shape species genetic variability.  Although, black cottonwood (Populus trichocarpa Torr. & Gray) is the first tree to have its genome completely sequenced; however, information regarding its natural genetic diversity and population structure is lacking.  I have investigated the extent of genetic diversity within and among 38 natural populations of P. trichocarpa sampled across British Columbia using 10 nuclear (nuSSR) and 12 chloroplast microsatellite (cpSSR) markers. CpSSR represents two haplotypes, clustering as northern and southern groups; however, a Bayesian population structure analysis suggested the presence of three highly admixed groups supported by low population differentiation (low FST and RST).  Monmonier’s spatial analysis suggested the presence of one genetic discontinuity dividing the studied area into northern and southern regions.  These findings indicated that P. trichocarpa might have originated from two, northern and southern, glacial refugia that have experienced moderate contact through extensive gene flow.  Nucleotide diversity for 10 candidate-gene loci involved in adaptive, defence, and housekeeping functions was abundant and varied across loci, with the majority showing neutral variations.  Linkage disequilibrium (LD), decayes rapidly to r2  ≈ 0.18 within 700 base pairs (bp). Comparing the nucleotide diversity between P. trichocarpa and P. balsamifera L. to the Eurasian P. tremula L. indicated that the two North American species had lower diversity (θw range 0.002 to 0.004) than the Eurasian poplar (θw = 0.005).  The estimated time of divergence between the two North American and the Eurasian species indicated that the latter was five- to six-fold older compared to the two former species.  The substitution rate was lower in North American species (0.4 x 10-8 per year) compared to the Eurasian poplar (2 x 10-8).  Different association genetics models produced strikingly different results after the inclusion or exclusion of population structure, highlighting the importance of proper model construction.  iii TABLE OF CONTENTS ABSTRACT................................................................................................................................... ii TABLE OF CONTENTS ............................................................................................................ iii LIST OF TABLES ....................................................................................................................... vi LIST OF FIGURES .................................................................................................................... vii ACKNOWLEDGMENTS ......................................................................................................... viii CO-AUTHORSHIP STATEMENT ........................................................................................... ix Chapter 1. Literature review and research objectives ...............................................................1 1.1 Introduction........................................................................................................................... 1 1.2 Black cottonwood geographic range and classification........................................................ 1 1.3 Importance of Populus trichocarpa to forest tree research .................................................. 2 1.4 Molecular markers in population genetic studies ................................................................. 3 1.5 Nucleotide polymorphisms and linkage disequilibrium in Populus ..................................... 3 1.6 Comparative genomics.......................................................................................................... 4 1.7 Association genetics.............................................................................................................. 5 1.8 Research objectives............................................................................................................... 5 1.9 References............................................................................................................................. 7 Chapter 2. Regional and population genetic structure of Populus trichocarpa in British Columbia.......................................................................................................................................11 2.1 Introduction......................................................................................................................... 11 2.2 Materials and methods ........................................................................................................ 13 2.2.1 Plant material and DNA extraction.................................................................................13 2.2.2 Microsatellite genotyping ...............................................................................................13 2.2.3 Data analysis ...................................................................................................................14 2.2.3.1 Analysis of introgression between P. trichocarpa and P. balsamifera .............................14 2.2.3.2 Microsatellite diversity and deviations from Hardy-Weinberg equilibrium......................15 2.2.3.3 Population demographic inference using chloroplast microsatellites (cpSSRs)................15 2.2.3.4 Regional population structure using nuclear SSRs............................................................15 2.2.3.5 Hierarchical population differentiation..............................................................................17 2.3 Results......................................................................................................................................17 2.3.1 Introgression between P. trichocarpa and P. balsamifera .............................................17 2.3.2 Microsatellite diversity and deviations from Hardy-Weinberg equilibrium...................17 2.3.3 Population demographic inference using chloroplast microsatellite (cpSSRs) ..............18 2.3.4 Regional population structure .........................................................................................18 2.3.5 Hierarchical population differentiation...........................................................................19 2.4 Discussion ........................................................................................................................... 19 2.4.1 Introgression between P. trichocarpa and P. balsamifera .............................................19 2.4.2 Microsatellite diversity and deviations from Hardy-Weinberg equilibrium...................20 2.4.3 Population demographic inference using chloroplast microsattelite (cpSSRs) ..............21 2.4.4 Population structure and landscape genetics...................................................................22 2.4.5 Hierarchical population differentiation...........................................................................24 2.5 Conclusions......................................................................................................................... 25  iv 2.6 References........................................................................................................................... 43 Chapter 3. Nucleotide diversity and linkage disequilibrium in natural populations of Populus trichocarpa Torr. & Gray .............................................................................................50 3.1 Introduction......................................................................................................................... 50 3.2 Materials and methods ........................................................................................................ 51 3.2.1 Plant material ..................................................................................................................51 3.2.2 Candidate-gene loci selection .........................................................................................52 3.2.3 DNA extraction, PCR primer design, fragments amplification, and sequencing ...........52 3.2.4 Sequence analysis and SNP discovery............................................................................53 3.2.5 Nucleotide diversity and neutrality tests.........................................................................53 3.2.6 Haplotypes inference and analysis of linkage disequilibrium (LD) ...............................54 3.3 Results................................................................................................................................. 55 3.3.1 Nucleotide diversity ........................................................................................................55 3.3.2 Deviation from neutrality................................................................................................56 3.3.3 Haplotype structure and linkage disequilibrium (LD) decay..........................................57 3.4 Discussion ........................................................................................................................... 57 3.4.1 Nucleotide diversity ........................................................................................................57 3.4.2 Deviation from neutrality................................................................................................59 3.4.3 Haplotypes and linkage disequilibrium (LD) .................................................................61 3.5 Conclusions......................................................................................................................... 63 3.6 References........................................................................................................................... 73 Chapter 4. Comparative nucleotide diversity across North American and European Populus species .............................................................................................................................78 4.1 Introduction......................................................................................................................... 78 4.2 Materials and methods ........................................................................................................ 80 4.2.1 Plant material and DNA extraction.................................................................................80 4.2.2 Candidate gene selection and fragment amplification and sequencing ..........................80 4.2.3 Sequence analysis and SNP discovery............................................................................81 4.2.4 Nucleotide diversity and deviation from neutrality ........................................................82 4.2.5 Molecular clock and divergence time estimates .............................................................82 4.2.6 Haplotype inference and analysis of linkage disequilibrium (LD).................................83 4.3 Results................................................................................................................................. 84 4.3.1 Comparative nucleotide diversity in Populus .................................................................84 4.3.2 Neutrality tests ................................................................................................................85 4.3.3 Divergence time estimates ..............................................................................................85 4.3.4 Haplotype structure and linkage disequilibrium.............................................................86 4.4 Discussion ........................................................................................................................... 86 4.4.1 Comparative nucleotide diversity in Populus .................................................................86 4.4.2 Deviation from neutrality................................................................................................88 4.4.3 Sequence divergence in Populus ....................................................................................89 4.4.4 Linkage disequilibrium in Populus .................................................................................90 4.5 Conclusions......................................................................................................................... 90 4.6 Reference .......................................................................................................................... 101 Chapter 5. Association genetics in Populus trichocarpa: Methods comparison and caveats associated with sampling natural populations ........................................................................106  v 5.1 Introduction....................................................................................................................... 106 5.2 Materials and methods ...................................................................................................... 108 5.2.1 Plant material, phenotypic traits, and genotyping.........................................................108 5.2.2 Statistical analyses ........................................................................................................108 5.3 Results and discussion ...................................................................................................... 109 5.4 References......................................................................................................................... 122 Chapter 6. Thesis conclusions...................................................................................................126 6.1 Introduction....................................................................................................................... 126 6.2 Population structure and landscape genetics of P. trichocarpa ........................................ 126 6.3 Sequence diversity and linkage disequilibrium in P. trichocarpa .................................... 127 6.4 Comparative nucleotide diversity in Populus ................................................................... 127 6.5 Association genetics model comparisons in P. trichocarpa ............................................. 128 6.6 Limitations of the current study........................................................................................ 129 6.7 Future research.................................................................................................................. 130 6.8 References......................................................................................................................... 131  vi LIST OF TABLES Table 2.1 List of P. trichocarpa and P. balsamifera populations. ................................... 26 Table 2.2 Diversity estimates of P. trichocarpa chloroplast SSRs. ................................. 28 Table 2.3 Primer information and diversity estimates of P. trichocarpa nuclear SSRs... 29 Table 2.4 List of the 38 P. trichocarpa populations. ........................................................ 30 Table 2.5 Nuclear SSRs diversity estimates by group in P. trichocarpa. ........................ 32 Table 2.6 Hierarchical population structure of P. trichocarpa......................................... 33 Table 3.1 List of P. trichocarpa populations used to estimate nucleotide diversity. ....... 64 Table 3.2 Description of ten nuclear gene loci in P. trichocarpa..................................... 66 Table 3.3 Primer information for ten nuclear loci in P. trichocarpa. ............................... 67 Table 3.4 Summary of SNPs in P. trichocarpa. ............................................................... 68 Table 3.5 Summary of sequence variation and deviation from neutrality in P. trichocarpa. .................................................................................................................................. 69 Table 3.6 Sequence divergence and McDonald-Kreitman (MK) test in P. trichocarpa. . 70 Table 4.1 Location of the Populus populations ................................................................ 92 Table 4.2 Primer information for nine gene loci in Populus. ........................................... 93 Table 4.3 Summary of SNPs in Populus. ......................................................................... 94 Table 4.4 Summary of diversity estimates in Populus. .................................................... 95 Table 4.5 MK test, neutrality index (NI), and Fay and Wu’s H in P. balsamifera. ......... 96 Table 4.6 Inter-species haplotype diversity estimates in Populus. ................................... 97 Table 4.7 MK test, NI, and Fay and Wu’s H in seven gene loci in P. trichocarpa.......... 98 Table 5.1 Associations summary in P. trichocarpa and P. tremula. .............................. 115 Table 5.2 Mean squared differences and Spearman’s rank correlations. ....................... 116  vii LIST OF FIGURES Figure 1.1 Natural range of P. trichocarpa in North America. .......................................... 6 Figure 2.1 British Columbia’ P. trichocarpa river drainages geographic location. ......... 34 Figure 2.2 STRUCTURE output of P. trichocarpa and P. balsamifera. ......................... 35 Figure 2.3 Haplotype structure of cpSSRs in P. trichocarpa. .......................................... 36 Figure 2.4 Nuclear SSRs-based STRUCTURE analyses for P. trichocarpa. .................. 38 Figure 2.5 Primary genetic discontinuity identified using Monmonier’s algorithm. ....... 39 Figure 2.6 Maps of GENELAND assignments of P. trichocarpa.................................... 40 Figure 3.1 Geographic location of P. trichocarpa populations. ....................................... 71 Figure 3.2 Scatter plot of linkage disequilibrium (LD) in P. trichocarpa........................ 72 Figure 4.1 Synonymous (Ks) and nonsynonymous (Ka) divergence in Populus. ............. 99 Figure 4.2 Decay of LD in Populus. ............................................................................... 100 Figure 5.1 STRUCTURE output of 170 P. trichocarpa individuals. ............................. 117 Figure 5.2 The relationship between SNPs associated with severity of Valsa sordida.. 118 Figure 5.3 Haplotype block of SNPs associated with Valsa sordida in TI-3. ................ 119 Figure 5.4 Correlation between SNPs frequency and percent variance. ........................ 120 Figure 5.5 Plot of the observed vs. expected P-values. .................................................. 121  viii ACKNOWLEDGMENTS First, I would like to acknowledge my supervisor, Yousry A. El-Kassaby.  This thesis would not have been possible without his guidance, patience, and support.  I would like to express my gratitude to my committee members Rob Guy, Kermit Ritland, Chang-Yi Xie, and C. Peter Constabel whose advice improved this thesis in many ways.  Also, I would like to thank William Schroeder, AAFC-AESB for providing balsam poplar material and Steve DiFazio, for providing SSR primer sequences.  Also, I am grateful to Gancho Slavov and P. K. Ingvarsson, for their advice on data analysis.  I would like to thank the Ministry of Higher Education, Government of Egypt for financial support.  This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC-IRC) to Yousry A. El-Kassaby.  ix CO-AUTHORSHIP STATEMENT This thesis was written as a series of related manuscripts on the advice of my research supervisor Yousry A. El-Kassaby.  Yousry A. El-Kassaby provided guidance throughout the execution of each of the thesis chapters.  For all chapters, I took the lead in developing the ideas, collecting and analyzing the data, producing figures, tables and draft manuscripts; however, this research would not have been possible without the contribution of other collaborators.  Chapter 2: M. Ismail designed the experiment, analysed the data, and wrote the manuscript. Chang-Yi Xie at British Columbia Ministry of Forests and Range provided access to the plant material and edited the manuscript.  A version of this chapter will be submitted for publication.  Chapter 3: M. Ismail designed the experiment, analysed the data, and wrote the manuscript; P. K. Ingavrsson at Umeå Plant Science Centre, Department of Ecology and Environmental Science, University of Umeå provided primer sequence for PHYB2, advised on the data analyses and edited the manuscript; C. Peter Constabel at Centre for Forest Biology and Department of Biology, University of Victoria provided EST sequences; and Yousry A. El-Kassaby edited the manuscript.  A version of this chapter has been submitted for publication.  Chapter 4: M. Ismail and R. Y. Soolanayakanahally contributed equally to the manuscript as a part of their PhD dissertations; P. K. Ingavrsson advised on the data analysis and edited the manuscript; R. D. Guy advised on the estimation of single nucleotide polymorphism (SNPs) age among the three Populus species, analyzing the two cottonwood species as a single species, and edited the manuscript; Stefan Jansson at Umeå Plant Science Centre, Department of Ecology and Environmental Science, University of Umeå provided Populus tremula DNA samples; Salim N. Silim at Agri-Environment Services Branch, Agriculture Agri-Food Canada provided Populus balsamifera plant material, and Yousry A. El-Kassaby edited the manuscript.  A version of this chapter will be submitted for publication.  Chapter 5: Yousry A. El-Kassaby developed the concept for association mapping modeles comparison; M. Ismail analysed the data, produced figures, tables and draft manuscript; and P. K. Ingvarsson provided Populus tremula data.  A version of this chapter will be submitted for publication.  1 Chapter 1. Literature review and research objectives 1.1 Introduction In the plant kingdom, trees are characterized by their long life span, production of wood biomass, and as a source of wood-based products, including timber, pulp, and paper (Taylor 2002).  Populus L. has long been recognized as a model species in tree research due to its short life span and ease of propagation and hybridization as compared to other tree genera.  Until recently, no appropriate model organism was available to facilitate genome research in trees. Most research on model plant species is associated with Arabidopsis, which is well characterized and has been used to understand gene expression of many plants including Populus (Quesada et al. 2008).  However, the lack of certain characters (e.g., wood formation, dormancy, long life span) hinder Arabidopsis use in tree genomic research.  As widely distributed and adapted to diverse environments, Populus harbours considerable genetic variability (Eckenwalder 1996). Among Populus species, black cottonwood (Populus trichocarpa Torr. & Gray) is the only woody perennial plant with a completed genome sequence (Tuskan et al. 2006).  The integration of its genome sequence, detailed genetic maps (Kelleher et al. 2008), and physiology (Ridge et al. 1986) will substantially facilitate our understating of many features of Populus and will provide insights on the use of molecular genetic tools to unravel Populus population and genomic structure, inference of genome evolution, and natural variation. 1.2 Black cottonwood geographic range and classification Black cottonwood (Populus trichocarpa) covers large areas of North America (Burns and Huncala 1990 and Figure 1.1).  Its coastal range extends from Kodiak Island along Cook Inlet (57° 44` N) to latitude 62° 30` N, then southeast in Alaska and British Columbia to the forested areas of Washington and Oregon, to the mountains in southern California and northern Baja California at latitude 31° N.  Its inland presence covers the western side of the Rocky Mountains in British Columbia, western Montana, and northern Idaho with some small-scattered populations in eastern Montana, western North Dakota, western Wyoming, Utah, and Nevada (Burns and Honkala 1990). Black cottonwood is a species within the genus Populus L. that includes approximately 29 species classified into six sections (Eckenwalder 1996) distributed mainly in the forests of temperate and boreal regions of northern hemisphere (Rahman and Rajora 2003).  This  2 classification is mainly associated with major hybridization barriers (Zsuffa 1975; Eckenwalder 1996).  However, the number of species recognized in the literature spans from 22 to 85 as a result of misinterpretation of hybrids as species (Eckenwalder 1996).  Despite the extensive studies addressing Populus biology, its taxonomy at the sectional and species levels is still controversial (Eckenwalder 1996).  This controversy is more pronounced at the species level within sections.  For instance, black cottonwood (Populus trichocarpa Torr. & Gray) and balsam poplar (Populus balsamifera L.) from section Tacamahaca in North America are considered by many authors as two different species (Eckenwalder 1996).  According to Brayshaw (1965) and the United States Department of Agriculture (USDA), however, they are a single species [i.e., Populus balsamifera L. ssp. trichocarpa (Torr. & A. Gray ex Hook.) Brayshaw and Populus balsamifera L. spp. balsamifera].  In contrast, members of section Populus (e.g., Populus tremula L. and other aspens) are well differentiated from those of section Tacamahaca, and will not easily hybridize (Eckenwalder 1996). For the ease of writing, analysis, and interpretation throughout my thesis, I will be referring to Populus trichocarpa Torr. & Gray and Populus balsamifera L. as two separate species, unless otherwise indicated. 1.3 Importance of Populus trichocarpa to forest tree research Poplars are being extensively used at different levels in tree research programs as a result of their characteristics (see Stettler et al. 1996; Taylor 2002, for review).  For example, P. trichocarpa is widely used in commercial production of P. trichocarpa x P. deltoids L. hybrids (i.e.; TD hybrids) which is essential compared to other hybrids in forest regeneration due to its high survival rate (Heilman 1999).  Although hybrid poplars are favoured in some instances (Heilman 1999), the use of pure species is well established in addressing questions related to molecular and ecophysiological characteristics for adaptative importance (Ingvarsson et al. 2006; Soolanayakanahally et al. 2009) and wood formation attributes (Mijnsbrugge 2000).  More recently, collaborative efforts between governments, universities and the private sector are focusing on the use of poplars as a source of biofuel (Yemshanov and McKenny 2008; Davis 2008).    3 1.4 Molecular markers in population genetic studies Molecular markers, including protein (isozymes) and DNA, have been used in many natural and domesticated population studies of many tree species to assess, among other things, the genetic variation within and among populations (Hamrick et al. 1992).  A few studies were conducted on P. trichocarpa natural variation including Weber and Stettler’s (1981) isozyme study that was restricted to 10 populations from the Pacific Northwest, Sigurdsson et al. (1995) using RAPD markers on nine trees, and the study of Rahman and Rajora (2003) using microsatellite markers to determine genetic differentiation and relationship among clones and cultivars of six poplar species.  Among the wide range of molecular markers available for population genetics inferences, microsatellites (SSRs) the most revealing genetic markers despite their limitations (Zhang et al. 2003).  High variability and cross taxa amplification make SSRs widely applicable in gene flow assessment (Smulders et al. 2008; Slavov et al. 2009) and population structure inferences (Dvorak et al. 2009; Dubreuil et al. 2010). 1.5 Nucleotide polymorphisms and linkage disequilibrium in Populus Nucleotide diversity varies significantly among forest trees and the amount of variability is attributable to many factors including evolutionary forces and species life history traits (Ingvarsson 2010; Keller et al. 2010).  Populus is an obligate outcrossing genus, is widely distributed across geographical ranges, and is expected to harbour a substantial level of nucleotide diversity (Ingvarsson 2010).  However, to date only limited studies have investigated the level of sequence variability, and these studies were limited to a handful of poplar species that include Eurasian aspen, P. tremula (Ingvarsson 2010), and the North American P. balsamifera (Breen et al. 2009; Keller et al. 2010).  Despite the completion of its genome sequence, P. trichocarpa did not gain comparable attention, with only one study (Gilchrist et al. 2006) investigating sequence diversity limited to 41 genotypes sampled across the species range. This calls for more investigations to be pursued on this important tree species.  Generally speaking, the North American sister species P. trichocarpa and P. balsamifera appear to harbour lower silent site diversity (πs) compared to other pairs of Populus species.  In P. trichocarpa and P. balsamifera the silent site diversity varied between 0.0029 and 0.0035 (Tuskan et al. 2006; Gilchrist et al. 2006; Breen et al. 2009).  On the other hand, P. tremula and P. nigra silent site diversity was higher (0.010-0.0120; Ingvarsson 2005, 2008b; Chu et al. 2008).  Additionally, the  4 total nucleotide diversity (πT) varied among poplar species to the same extent (Gilchrist et al. 2006; Breen et al. 2009; Olson et al. 2010). One of the important properties of a species genome is linkage disequilibrium (LD) or the non-random association between alleles at different loci within a population (Nordborg and Tavare 2002).  LD is a function of many factors that increase/decrease its strength and is mainly associated with species reproductive biology, and evolutionary forces (Gaut and Long 2003). For instance, in obligate outbreeding species such as P. trichocarpa, P. tremula, and P. nigra LD extends over short distance (~300-400 bp, Gilchrist et al. 2006; Ingvarsson 2008; Chu et al. 2009), whereas in species with a predominantly selfing mating system such as Oryza sativa and Hordeum vulgare, LD tends to extend over hundreds of kilobases (Garris et al. 2003; Caldwell et al. 2006).  Additionally, population structure, admixture, and complex demographic history, are considered to increase the level of LD (Gaut and Long 2003; Ingvarsson 2008).  On the other hand, natural selection acting individually or combined with other factors has great influence on the level of LD across species.  Natural selection is expected to increase the level of LD similar to the effect of admixture (Nordborg and Tavare 2002). 1.6 Comparative genomics With increasing information from genome sequence data for many organisms, it has become feasible to investigate the similarity and differences at the genome level among different taxonomic units.  In this context, comparative genomics provides insights on how a species has evolved.  Comparative genomics utilizes computer programs that can align different genomes from multiple species and search for sequence similarities.  Comparative genomics has been applied in several model organisms including Arabidopsis (Wright 2003) and rice (Jianxin and Bennetzen 2004).  Genome comparison is an effective tool in addressing many biological and evolutionary questions at different taxonomical levels.  In closely related species, comparative genomics highlights the unique features of a species (Hardison 2003) and facilitates genetic mapping (Brunner et al. 2005), whereas among distantly related species comparative genomics might unravel genome evolution and function (Hardison 2003).  Comparative genomics in Populus as combined with its completed genome sequence (Tuskan et al. 2006) and functional genomic studies (Ralph 2008) will lead to valuable resources for other tree species.   5 1.7 Association genetics The phenotypic variability at many complex traits is influenced by the environment, multiple quantitative trait loci (QTL), and their interaction (Zhu et al. 2008).  Many studies have identified loci at a wide interval across the genome (10-20 cM; Zhu et al. 2008); however, among these studies only a few QTLs have been tagged at the gene level (Price et al. 2006).  The alternative approach that is now widely applied in many species is association mapping or LD mapping that can dissect genotype-phenotype association at the nucleotide level using information from population recombination events (Nordborg and Tavare 2002).  Although association genetics provides great advantages including high resolution mapping and (Zhu et al. 2008; Hall et al. 2010), it has potential problems that should be considered.  Among factors that affect the accuracy of association studies is hidden population and family structures which might reveal false associations; however, many methods have been developed to correct for population and family structures (see Zhu et al. 2008; for review). 1.8 Research objectives In the following chapters, I investigate the level of molecular genetic diversity in Populus by (i) assessing the level of regional and population structure in natural populations of P. trichocarpa using microsatellites (SSRs), (ii) determine the level of nucleotide polymorphisms and linkage disequilibrium in natural populations of P. trichocarpa using gene loci involved in defence, stress response, photoperiodism, and housekeeping roles, (iii) compare the level of nucleotide polymorphisms and linkage disequilibrium among three Populus species; namely, P. trichocarpa, P. balsamifera, and P. tremula at nine gene loci involved in defence, stress response, photoperiodism, freezing tolerance, and housekeeping functions, and (iv) compare different methods of association genetics in natural populations of P. trichocarpa.  6    Figure 1.1 Natural range of P. trichocarpa in North America, after Little (1976)    7 1.9 References Aagard, J. E., K. V. Krutosovkii, and S. H. Strauss. 1998. RAPDs and allozymes exhibit similar levels of diversity and differentiation among populations and races of Douglas-fir. Heredity 81:69-78. Bekessy, S. A., R .A. Ennos, M. A. Burgman, et al. 2003. Neutral DNA markers fail to detect genetic divergence in an ecologically important trait. Biol Cons 110:267-275. Breen, A. L., E. Glenn, A. Yeager, et al. 2009. Nucleotide diversity among natural populations of a North American poplar (Populus balsamifera, Salicaceae). New Phytol 182:763-773. Brunner, S., K. Fengler, M. Morgante, et al. 2005. Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17:343-360. Burns, R. M., and H. H. Honkala. 1990. Silvics of North America: hardwoods. Agriculture Handbook 654. Vol.2. U.S. Department of Agriculture, Forest Service,Washington, D.C. Caldwell, K. S., J. Russell, P. Langridge, et al. 2006. Extreme population-dependent linkage disequilibrium detected in an inbreeding plant species, Hordeum vulgare. Genetics 172:557-567. Chu, Y., X. Su, Q. Huang, et al. 2009. Patterns of DNA sequence variation at candidate gene loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms. Genetica 137:141-150. Davis, J. M. 2008. Genetic improvement of poplar (Populus spp.) as a bioenergy crop. Pp. 377- 396 in W. Vermerris (ed) Genetic Improvement of Bioenergy Crops. Springer, New York. Dubreuil, M., M. Riba, S. C. Gonzalez-Martinez, et al. 2010. Genetic effects of chronic habitat fragmentation revisited: Strong genetic structure in a temperate tree, Taxus baccata (Taxaceae), with great dispersal capability. Am J Bot 97:303-310. Dvorak, W. S., K. M. Potter, V. D. Hipkins, et al. 2009. Genetic diversity and gene exchange in Pinus oocarpa, a Mesoamerican pine with resistance to the pitch canker fungus (Fusarium circinatum). Int J Plant Sci 170:609-626. Eckenwalder, J. E. 1996. Systematics and evolution of Populus. Pp. 7-32 in R. F. Stettler, H. D. Jr. Bradshaw, P. E. Heilman, et al. (eds) Biology of Populus and its implications for management and conservation. NRC Research press, Ottawa, Ontario, Canada.  8 Garris, A. J., S. R. McCouch, and S. Kresovich. 2003. Population structure and its effect on haplotype diversity and linkage disequilibrium surrounding the xa5 locus of rice (Oryza sativa L.). Genetics 165:759-769. Gaut, B. S., and A. D. Long. 2003. The lowdown on linkage disequilibrium. Plant Cell 15:1502- 1506. Gilchrist, E. J., G. W. Haughn, C. C. Ying, et al. 2006. Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol 15:1367-1378. Hall, D., C. Tegstrom, and P. K. Ingvarsson. 2010. Using association mapping to dissect the genetic basis of complex traits in plants. Brief Funct Genomic 2:157-165. Hamrick, J. L., M. J. W. Godt, and L. S. Susan. 1992. Factors influencing levels of genetic diversity in woody plant species. New Forests 6:95-124. Hardison, R. C. 2003. Comparative genomics. PLoS biology, 1(2):e58. doi: 10.1371/journal.pbio.0000058. Heilman, P.E. 1999. Planted forests: poplars. New Forest 17:89-93. Ingvarsson, P. K. 2008. Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180:329-340. Ingvarsson, P. K. 2010. Nucleotide polymorphism, linkage disequilibrium and complex trait dissection in Populus. Pp 91-111 in S. Jansson, R. Bhaleroa, and A. Groover  (eds) Genetics and Genomics of Populus, DOI: 10.1007/978-1-4419-1541-2_5, Springer, New York. Ingvarsson, P. K., M. V. Garcia, D. Hall, et al. 2006. Clinal variation in phyB2, a candidate gene for day-length-induced growth cessation and bud set, across a latitudinal gradient in European aspen (Populus tremula). Genetics 172:845-1853. Jianxin, Ma., and J. L. Bennetzen. 2004. Rapid recent growth and divergence of rice nuclear genomes. PNAS 101:12404-12410. Kelleher, C. T., R. Chiu, H. Shin, et al. 2007. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation. Plant J 50:1063-1078. Keller, S. R., M. S. Olson, S. Silim, et al. 2010. Genomic diversity, population structure, and migration following rapid range expansion in the Balsam Poplar, Populus balsamifera. Mol Ecol 19:1212-1226.  9 Little, E. L. Jr. 1976. Atlas of United States Trees, volume 3, Minor Western Hardwoods: U.S. Department of Agriculture miscellaneous publication 1314, 13 p., 290 maps. Mijnsbrugge, K. V., H. Meyermans, M. van Montagu, et al. 2000. Wood formation in poplar: identification, characterization, and seasonal variation of xylem proteins. Planta 210:589- 598. Nordborg, M., and S. Tavare. 2002. Linkage disequilibrium: what history has to tell us. Trends Genet 18:83-90. Olson, M. S., A. L. Robertson, N. Takebayashi, et al. 2010. Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytol 186:526-536. Quesada, T., Z. Li, C. Dervinis, et al. 2008. Comparative analysis of the transcriptomes of Populus trichocarpa and Arabidopsis thaliana suggests extensive evolution of gene expression regulation in angiosperms. New Phytol 180:408-420. Rajora, O. P., and M. H. Rahman. 2003. Microsatellite DNA and RAPD fingerprinting, identification and genetic relationships of hybrid poplar (Populus x canadensis) cultivars. Theor Appl Genet 106:470-477. Ralph, S. G., H. J. Chun, D. Cooper, et al. 2008. Analysis of 4,664 high-quality sequence finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding. BMC Genomics 9:57. Ralph, S., C. Oddy, D. Cooper, et al. 2006. Genomics of hybrid poplar (Populus trichocarpa x deltoides) interacting with forest tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray for the study of insect-induced defenses in poplar. Mol Ecol 15: 1275-1297. Ridge, C. R., T. M. Hinckley, R. F. Stettler, et al. 1986. Leaf growth characteristics of fast growing poplar hybrids Populus trichocarpa x P. deltoides. Tree Physiol 1:209-216. Sigurdsson, V., K. A. Jonsson, and A. Sigurgeirsson. 1995. DNA fingerprinting of Populus trichocarpa clones using RAPD markers. New Forests 10: 197-206. Slavov, G. T., S. Leonardi, J. Burczyk, et al. 2009. Extensive pollen flow in two ecologically contrasting populations of Populus trichocarpa. Mol Ecol 18:357-373. Smulders, M. J. M., J. E. Cottrell, F. Lefever, et al. 2008. Structure of the genetic diversity in black poplar (Populus nigra L.) populations across European river systems: Consequences for conservation and restoration. Forest Ecol Manage 255:1388-1399. Soolanayakanahally, R. Y., R. D. Guy, S. N. Silim, et al. 2009. Enhanced assimilation rate and water use efficiency with latitude through increased photosynthetic capacity and internal  10 conductance in balsam poplar (Populus balsamifera L.). Plant Cell Environ 32:1821- 1832. Stettler, R. F., L. Zsuffa, and R. Wu. 1996. The role of hybridization in the genetic manipulation of populus. Pp 87-112 in F. R. Stettler, H. D. Jr. Bradshaw, P. E. Heilman, et al. (eds) Biology of Populus and its Implications for Management and Conservation. NRC Research press, Ottawa, Ontario, Canada. Taylor, G. 2002. Populus: Arabidopsis for Forestry. Do we need a model tree? Ann Bot 90:681- 689. Walter, R., and B. K. Epperson. 2001. Geographic pattern of genetic variation in Pinus resinosa area of greatest diversity is not the origin of postglacial populations. Mol Ecol 10:103- 111. Weber, J. C., and R. F. Stettler. 1981. Isoenzymes variation among ten populations of Populus trichocarpa Torr. et Gray in the Pacific Northwest. Silvae Genet 30:82-87. Wright, S., B. Lauga, D. Charlesworth. 2003. Rates and patterns of molecular evolution in inbred and outbred Arabidopsis. Mol Biol Evol 19:1407-1420. Wullschleger, S. D., S. Jansson, and G. Taylor. 2002. Genomics and forest biology: Populus emerges as the perennial favorite. Plant Cell 14:2651-2655. Yemshanov, D., and D. McKenney. 2008. Fast-growing poplar plantations as a bioenergy supply source for Canada. Biomass and Bioenergy 32:185-197. Zhang, D. X., and G. M. Hewitt. 2003. Nuclear DNA analyses in genetic studies of populations: practice, problems and prospects. Mol Ecol 12:563-584. Zhu, C., M. Gore, E. S. Buckler, and J. Yu. 2008. Status and prospects of association mapping in plants. Plant Genome 1:5-20. Zsuffa, L. 1975. A summary review of interspecific breeding in the genus Populus. 107-123. in: Proceedings of the 14th  Meeting of the Canadian Tree Improvement Association, part 2. Canadian Forest Service, Ottawa, ON. Canada.   11 Chapter 2. Regional and population genetic structure of Populus trichocarpa in British Columbia1 2.1 Introduction Trees are characterized by substantial genetic diversity, presumably as a result of their long life-spans, wide ranges, and large population sizes, typically outcrossing mating systems, and extensive propagule dispersal (Hamrick et al. 1992).  Populus trichocarpa Torr. & Gray has become the model tree species for genetic and genomic research (Taylor 2002).  The species has a small genome size (485 Mb, Tuskan et al. 2006) which is 1/40th the size of the pine (Pinus) genome (Ahuja and Neale 2005) and can be propagated and hybridized easily (Stettler et al. 1996).  Furthermore, information from extensive quantitative and ecological genetic studies (Stettler et al. 1996), a large EST database (Christopher et al. 2004), and whole-genome sequence (Tuskan et al. 2006) are available.  More importantly, as a fast growing, high-yielding plantation species, poplar is being viewed as a viable biofuel source.  In Canada, the forestry- based biofuel industry is experiencing exponential expansion (Government of Canada 2008), and therefore better understanding of the genetic control of poplar biomass traits is of great interest (Pan et al. 2006).  Compared to its “sister” species Populus balsamifera L., which was recently studied in its natural range (Keller et al. 2010), information about the population genetic structure of P. trichocarpa is currently limited to the results from studies focused on (1) characterizing the overall levels of diversity, differentiation, and linkage disequilibrium (Weber and Stettler 1981; Gilchrist et al. 2006), (2) identifying species-specific diagnostic markers or distinguishing different clones (Sigurdsson et al. 1995; Rahman and Rajora 2002), or (3) assessing the extent of pollen-mediated gene flow (Slavov et al. 2009). Hidden population structure is a major confounding factor in association mapping studies because it introduces linkage disequilibrium that masks causative marker-trait associations (Balding 2006; Mackay and Powell 2007; McCarthy et al. 2008).  Additionally, the overlapping ranges of P. trichocarpa and P. balsamifera in certain areas in British Columbia (Braatne 1999), allows natural hybrids between both species to occur.  Accordingly, it is essential to assess the degree of introgression from P. balsamifera.  Thus, the accurate characterization of population ______________________________ 1 A version of this chapter will be submitted for publication. Ismail M., and C-Yi. Xie. Regional and population genetic structure of Populus trichocarpa in British Columbia.  12 structure is essential in understanding species dispersal and to shed light on biogeographical processes that shaped the current genetic structure of the species.  Moreover, population structure analysis is considered a key factor for the success of association mapping in P. trichocarpa and other Populus species. The concept that landscape characteristics could potentially influence population genetic structure dates back to at least the work of Mayr (1942) and Fisher and Ford (1947) on allopatric speciation via geographical isolation.  One important aspect of understanding the population genetic structure of a species is, therefore, to identify genetic discontinuities and relate them to current or past landscape characteristics.  Recently very active field of landscape genetics (which shares many concepts and methods with ecological genetics and phylogeography) is an amalgamation of molecular population genetics and landscape ecology that provides the tools necessary for spatially explicit analyses of population genetic structure and its causes (Manel et al. 2003; Jorgensen et al. 2005; Spear et al. 2005; Coulon et al. 2006; Storfer et al. 2007).  The majority of species in the Pacific Northwest, and in particular forest tree species (reviewed in Jaramillo-Correa et al. 2009), is thought to have survived in more than a single glacial refugium (Soltis et al. 1997; Godbout et al. 2008).  More interestingly, P. trichocarpa populations in British Columbia showed differentiation between north and south in both ecophysiological and phenotypic traits (Gornall and Guy 2007; Xie et al. 2009), suggesting the presence of more than one uniform group.  Thus the investigation of molecular information that shows regional differentiation across the species’ range might provide insights into the post-glacial migration (Tribsch and Schönswetter 2003), unraveling species past demographic history. In the present study, I hypothesize that British Columbia’s P. trichocarpa populations survived in more than one refugium that may have led to the formation of distinct groups following similar tracts of expansion to other biota in the Pacific Northwest (Soltis et al. 1997; Godbout et al. 2008; Jaramillo-Correa et al. 2009). The key objectives of this study are to characterize the population genetic structure of P. trichocarpa in British Columbia using microsatellite loci and relate the detected patterns to available information about (1) landscape characteristics of British Columbia, (2) the life history of the species, and (3) its current and past demography.    13 2.2 Materials and methods 2.2.1 Plant material and DNA extraction I used material collected by the Research Branch of the BC Ministry of Forests and Range (Xie et al. 2009) that was subsequently planted in a replicated field trial in Surrey, BC (49° 03′ N, 122° 42′ W).  The study included a total of 369 trees from 47 populations (mean = 7.9, range = 2-25 trees per population) sampled from 18 river drainages (Table 2.1, Figure 2.1). Trees were sampled using a helicopter leaving large distances between collected trees (i.e., attempts were made to sample scattered trees within a stand to avoid the selection of members of the same clone, although no minimal distance was imposed, Xie et al. 2009).  This collection sampled from the natural range of P. trichocarpa in BC (Figure 2.1).  During the winters of 2005 and 2006, cuttings with dormant vegetative buds were collected from the Surrey plantation, shipped to the laboratory, and immediately forced to flush in growth chambers with temperatures set at 25°C and photoperiods of 16 hours per day.  Newly flushed foliage was collected and stored at -80°C.  DNA extraction followed the method described by Doyle and Doyle (1987). Additionally, 15 trees representing three P. balsamifera populations were included in the study as an outgroup (Table 2.1). 2.2.2 Microsatellite genotyping In order to verify P. trichocarpa purity, all sampled P. trichocarpa trees were genotyped along with the P. balsamifera reference trees using one diagnostic nuclear microsatellite (SSR) locus (WIN3) which previously used to differentiate between poplar species (Bradshaw et al. 1994; Khasa et al. 2005; Ziegenhagen et al. 2008).  Additionally, to test if the sampled trees originated from different refugia, a north-south transect represented by 64 randomly selected P. trichocarpa trees from 32 populations (populations 1-32; Table 2.4) were genotyped for 12 chloroplast SSRs (cpSSRs) loci (Weising and Gardener 1999; Lian et al. 2005; Table 2.2).  PCR conditions for WIN3 and the 12 cpSSRs loci followed the methods listed in Weising and Gardener (1999) and Lian et al. (2005).  Similarly, the P. trichocarpa and P. balsamifera trees were genotyped for nine nuclear SSRs (Table 2.3).  PCR amplifications were performed in a 10 µL reaction volume containing 30 ng genomic DNA, 0.2 mM of dNTP, 1 µL reaction buffer containing: 51 mM KCl, 10 mM Tris-HCl and 2.5 mM MgCl2, 1-1.5 pmol of forward and reverse primers, 0.3-0.5 pmol fluorescently labeled (IRD700 or 800) M13 sequence primer [Li- Cor Biosciences, Lincoln, NE, USA], and one unit of AmpliTaq DNA polymerase [Applied  14 Biosystems, Foster City, CA, USA (ABI)].  For the nine nuclear SSRs, amplifications were performed using GeneAMP 9700 thermocyclers (ABI) with the following temperature profile: i) initial denaturation at 94°C for 3 min, ii) 29 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 1 min, and iii) 4 min final extension at 72°C.  Fragments were separated by polyacrylamide gel electrophoresis using a 4200 DNA analyzer (Li-Cor).  For nuclear and cpSSRs primers, a M13 universal sequence tail was added to the 5′-end of either forward or reverse primers (Li-Cor) and allelic sizes were estimated using the SAGA GT software (Li-Cor). 2.2.3 Data analysis 2.2.3.1 Analysis of introgression between P. trichocarpa and P. balsamifera As their natural ranges overlap in British Columbia, P. trichocarpa and P. balsamifera are known to produce natural hybrids in the overlapping zones (Braatne 1999); hence, it is necessary to eliminate any putative hybrids in the P. trichocarpa samples.  Three methods were used to assess the introgression between P. trichocarpa and P. balsamifera: 1) genotyping all samples from both species using the WIN3 locus (Khasa et al. 2005), 2) using the Bayesian clustering method of Anderson and Thompson (2002) as implemented in program NewHybrids 1.1, and 3) using model-based clustering algorithm implemented in STRUCTURE 2.3 (Hubisz et al. 2009).  Genotyping using WIN3 locus might eliminate P. trichocarpa genotypes that share WIN3 alleles with P. balsamifera.  NewHybrids computes a posterior probability distribution of individuals in a mixture from which the samples are drawn and classifies them into different genotypic categories using a Markov Chain Monte Carlo (MCMC) simulation approach (for details see Anderson and Thompson 2002).  The analysis was performed using the 10 nuclear SSRs (i.e., WIN3 locus and the nine SSRs) on the 369 P. trichocarpa and the 15 P. balsamifera reference trees.  Five independent runs were performed; each consisted of 106 MCMC generations following a burn-in period of 106, uniform prior for theta was used for all runs. Finally, STRUCTURE analysis was performed on the same genotypic data from both species (i.e., the same 10 SSRs) to determine any further grouping of P. trichocarpa genotypes with P. balsamifera using an admixture model with correlated allele frequencies.  The number of clusters (K) varied between one and six, each run had a burn-in period of 106 iterations followed by 106 data collection iterations and was replicated at least 10 times.  The number of clusters for the combined species was determined using the ad-hoc statistic ∆K (Evanno et al. 2005), and the assignment probabilities were visualized using DISTRUCT 1.1 (Rosenberg 2004).  Trees with >  15 10% assignment probability to P. balsamifera were considered as non-trichocarpa trees. NewHybrids and STRUCTURE programs were ran concurrently on the same data set. 2.2.3.2 Microsatellite diversity and deviations from Hardy-Weinberg equilibrium For the cpSSRs, the number of alleles per locus and Nei's gene diversity (He; Nei 1978) were obtained using SPAGeDi 1.2 (Hardy and Vekemans 2002).  For nuclear SSRs, the number of alleles per locus, allelic richness corrected for unequal sample size using the rarefaction approach (El Mousadik et al. 1996; Petit et al. 1998) as implemented in MSA 4.05 (Dieringer and Schlötterer 2003), observed heterozygosities (Ho), and Nei's unbiased estimate of expected heterozygosity (He; Nei 1987) were calculated using MSA 4.05.  Departures from Hardy- Weinberg equilibrium were assessed within populations based on exact tests as implemented in GENEPOP 4.0 (Raymond and Rousset 1995), and estimates of fixation indices FIS (Wright 1965) were obtained using MSA 4.05. 2.2.3.3 Population demographic inference using chloroplast microsatellites (cpSSRs) To test the hypothesis of different refugia, observed RST was compared with an estimated distribution of statistics (pRST; Hardy et al. 2003), obtained by permuting allele sizes for all allelic states in a locus among and within populations.  The significance of the differences between observed RST and pRST were estimated using 10,000 permutations using SPAGeDi 1.2. In the situation where RST > pRST (i.e., phylogeographic structure exists), alleles are expected to be more related between nearby populations than between distant populations.  Additionally, a median joining network (MJ) was constructed among cpSSRs haplotypes.  The MJ network was built using the software Network 4.5.10 (Bandelt et al. 1999; www.flux-engineering.com). 2.2.3.4 Regional population structure using nuclear SSRs To determine the number of genetically homogeneous groups (i.e., clusters); STRUCTURE 2.3 was used with an admixture model and correlated allele frequencies.  I ran the analysis under two different scenarios, with and without using explicit sampling location information as a prior.  The location prior was used as recommend by Hubisz et al. (2009) in situations where weak population structure exists which might not be captured by standard STRUCTURE models.  Both scenarios were performed at the level of river drainages.  The number of clusters (K) varied between one and six.  The program ran with a burn-in period of 106 iterations followed by 106 data collection iterations, and was replicated at least 10 times.  The number of clusters was determined using the ad-hoc statistic ∆K.  For the number of determined  16 clusters, the probabilities of membership in each cluster (Q matrices) were aligned using CLUMPP 1.1.1 (Jakobsson and Rosenberg 2007) with full search option using 1000 randomized inputs.  DISTRUCT 1.1 (Rosenberg 2004) was used to plot the average membership across STRUCTURE runs at the number of clusters (K) that was best supported by ∆K approach.  Only results from the model using location priors will be presented. Three methods were applied to test the effect of landscape characteristics on the genetic structure of P. trichocarpa populations.  First, Monmonier’s maximum difference algorithm (Monmonier 1973) as implemented in Barrier 2.2 (Manni et al. 2004) was used to identify genetic discontinuities based on genetic distances measured by RST (i.e., the degree of differentiation among populations under the stepwise mutation model, Slatkin 1995).  Values of RST for all pairs of populations were calculated using SPAGeDi 1.2 (Hardy and Vekemans 2002). In order to assess the robustness of genetic discontinuities, Nei’s genetic distance (Nei 1983) was used to generate 100 bootstrapped matrices, which were used to infer genetic discontinuities. Second, the spatially explicit clustering algorithm was used based on a Bayesian model implemented in GENELAND 3.1.5 (Guillot et al. 2008).  Its ability is well recognized compared to other spatially implicit Bayesian clustering methods, particularly when genetic differentiation is weak (Fontaine et al. 2007; Renoult et al. 2009), a typical situation for trees collected over a large geographic area.  In this analysis, 10 independent runs were performed using a fixed number of K (i.e; three groups) which is equal to that determined by STRUCTURE.  Each run consisted of 105 MCMC iterations with a thinning interval of 100, using correlated allele frequencies and spatial information.  In order to obtain the posterior probabilities of population membership of each individual and each of spatial area, these 10 runs were then post processed with a burn-in of 104 MCMC and 100 pixels along the X-axis and 200 pixels along the Y-axis. Then the consistency of the results was visually checked by comparing the outputs across the 10 runs.  Third, the degree of isolation-by-distance was assessed by testing the association between geographic and genetic distances for all pairs of populations (Rousset 1997).  This analysis was performed on the entire P. trichocarpa data set and within each group as inferred by model- based clustering (see above).  Following the regression method of Rousset (1997), geographic distances were log-transformed, and genetic distances were expressed as RST/(1- RST).  The statistical significance of the associations was tested based on a Mantel test (Mantel 1967) with 10,000 permutations, using program IBD (Jensen et al. 2005).   17 2.2.3.5 Hierarchical population differentiation To quantify the extent of hierarchical population structure, locus-by-locus analyses of molecular variance (AMOVA) was performed based on the infinite alleles model (F-statistics; Wright 1965; Weir and Cockerham 1984) and the stepwise mutation model (R-statistics; Slatkin 1995) using ARLEQUIN 3.11 (Excoffier et al. 2005).  The subdivision of populations into groups was based on the results from model-based clustering. 2.3 Results 2.3.1 Introgression between P. trichocarpa and P. balsamifera In the P. balsamifera reference genotypes, five different alleles were observed for locus WIN3 (184, 186, 266, 268, and 270 bp) with ~50% of the genotypes having the 268 bp allele; and the remainig 50% had least one of the WIN3’ remaining four alleles.  All P. trichocarpa individuals were homozygous for the WIN3 allele 184 bp with only two genotypes being heterozygous (184/266 bp) indicating their possible hybrid origin.  Additionally, the NewHybrids program identified six P. trichocarpa genotypes that grouped with P. balsamifera. The STRUCTURE analysis suggested the existence of two groups as inferred from ∆K statistics (Figure 2.2).  This grouping includes a cluster of P. balsamifera and one cluster representing P. trichocarpa with 70 individuals with probability of membership 10% (i.e; Q ≥ 0.10).  All P. trichocarpa genotypes showing similarity to P. balsamifera using WIN3, NewHybrids, and STRUCTURE were excluded from any further analyses (n = 70), despite the low but highly significant differentiation between the excluded trees and the remaining P. trichocarpa trees (FST = 0.018, P < 0.0001).  Additionally, one population that represented by one tree was also removed from the analyses.  Hence, all subsequent analyses of nuclear SSRs were carried out on 298 genotypes representing 38 populations from 16 river drainages (Table 2.4). 2.3.2 Microsatellite diversity and deviations from Hardy-Weinberg equilibrium The observed number of alleles per locus for cpSSR loci was generally low (average: 2; range: 1-3) and most of the studied loci showed very low gene diversity He (average: 0.16; range: 0.0-0.409) (Table 2.2).  The nuclear SSRs, on the other hand, produced a higher average number of alleles per locus (range: 14-31) (Table 2.3).  There was a consistent pattern of highly significant heterozygote deficiency (P < 0.0001) relative to Hardy-Weinberg expectations within each population.  The mean observed heterozygosity was 0.591 and ranged across loci from  18 0.415 to 0.816, whereas the mean expected heterozygosity was 0.769 and ranged between 0.679 and 0.835. 2.3.3 Population demographic inference using chloroplast microsatellites (cpSSRs) Significant demographic expansion signal was detected among populations (RST = 0.816 > pRST = 0.0; P < 0.0001) while no significant difference was found within populations (RST = 0.816 > pRST = 0.746, P > 0.05).  In addition, the constructed MJ network using cpSSR haplotypes showed two main groups (northern and southern) (Figure 2.3a).  A clinal trend in allele sizes at cpSSR loci (CSU02, CSU03, CSU04, CCMP4, and CCMP6) was detected (see Figure 2.3b for CSU03 locus).  Moreover, CSU03, CSU04, CCMP4, CCMP5, and CCMP6, all showed distinct allele frequency distribution that varied between northern and southern haplotype groups (Figure 2.3c). 2.3.4 Regional population structure The analyses of log likelihood values across STRUCTURE runs using ∆K supported the presence of three groups (Figure 2.4a).  The first group included Stikine, Nass, Kitimat, Homathko, and EVIN, the second group consisted of Bulkley, Dean, Jervis, and EVIC and the third group harboured Skeena, FRMcG, FRCRQR, Knight, LHG, Squamish, and Fraser (Figure 2.4b).  However, the observed grouping did not show a distinct differentiation at most of the studied drainages except for Skeena and Homathko (Figure 2.4b).  In addition, the proportion of drainage membership in each group was mosaic and did not correspond to the respective geographic origin (Figure 2.1). The analysis using Monmonier’s algorithm identified one major genetic barrier that occurred south of the Omineca and Skeena Mountains (Figure 2.5), separating the studied populations into two main regions.  The first region included populations from Stikine, Nass, and populations 2, 3, and 7 from Skeena (Table 2.4) and the second region included the remaining populations.  On the other hand, the Bayesian spatial analysis using GENELAND indicated that the STRUCTURE-identified groups are spatially distinct.  The pre-assumed K = 3, determined by STRUCTURE, were spatially assigned into three clusters; cluster 1, 2, and 3, corresponding to northern, interior, and southern regions, respectively (Figure 2.6).  The northern region was separated from the interior and southern regions approximately at the same area where Monmonier’s algorithm inferred the genetic barrier (Figure 2.5).  Remarkably, there was no association between genetic and geographic distances for pairs of populations from the entire  19 area (r = 0.02, P > 0.05) as determined in the Mantel analysis as well as in each group identified by the STRUCTURE (r = -0.31, 0.01, and 0.06, P > 0.05 for groups 1, 2, and 3, respectively). The three inferred groups by STRUCTURE showed similar levels of polymorphisms; the number of alleles per locus and allelic richness as well as observed and expected heterozygosities were consistent among groups (Table 2.5). 2.3.5 Hierarchical population differentiation As expected, AMOVA results based on nuclear SSRs, applying both the infinite alleles model (IAM) and the stepwise mutation model (SMM), indicated that approximately 93% of genetic variation resided within populations (Table 2.6).  Differentiation among populations was low but highly significant (FST = 0.075, RST = 0.07; P < 0.0001); however, with regional differentiation accounting for 0.29 % and -0.07 % of total genetic variation under IAM and SSM, respectively.  Except for among-groups variances under IAM and SMM, all variance components were highly significant (P < 0.0001). 2.4 Discussion 2.4.1 Introgression between P. trichocarpa and P. balsamifera The WIN3 locus produced one fragment in non-hybrid P. trichocarpa (184 bp) while 80% of P. balsamifera samples had fragments ranging between 266-270 bp.  In spite of the previously reported allele size differences between species, 20% of P. balsamifera had alleles 184-186 bp (Bradshaw et al. 1995; Khasa et al. 2005).  After genotyping individuals in both species, two of the P. trichocarpa were identified as putative hybrids.  The observed putative hybrids were not surprising since they were sampled from the contact zone between the two species ranges (McGregor River: 54º N122º W and Quesnel River: 52º N and 122º W).  It should be stated that the concurrent presence of the WIN3 locus’ diverse fragments of the two species in an individual is indicative of its hybrid status.  The NewHybrids program identified six additional individuals as putative hybrids by assigning them to the P. balsamifera group, these individuals are different from those identified using the WIN3 locus.  Additionally, the STRUCTURE identified more individuals (n = 70, including the 8 listed above) as putative hybrids, supporting the presence of a higher admixture proportion, particularly at the northern (n = 28) and interior (n = 18) ranges of P. trichocarpa in BC (Figure 2.2).  The remaining putative hybrids identified by the STRUCTURE analysis (n = 24) originated from the southern populations.  These individuals could not be classified as either hybrids or backcrosses, thus their  20 classification may be the result of the genetic similarity between these two species which are reported to share 25% of their ancestral polymorphisms (Olson et al. 2010).  The lack of congruence between the NewHybrids and STRUCTURE in identifying hybrid genotypes may reflect the high genetic similarity between the two species.  Moreover, the estimated population differentiation between the identified hybrids and the remaining P. trichocarpa trees was low but significant (FST = 0.018, P < 0.0001).  All “hybrid/admixed” individuals identified by any of the three approaches (WIN3 locus, NewHybrids, and STRUCTURE) were excluded from the study. 2.4.2 Microsatellite diversity and deviations from Hardy-Weinberg equilibrium The levels of microsatellite polymorphisms and diversity were similar to those reported for P. trichocarpa from Oregon (Slavov et al. 2009) and other Populus species (e.g., Wyman et al. 2003; Imbert and Lefevre 2003; Suvanto and Latva-Karjanmaa 2005; Smulders et al. 2008). The pronounced heterozygote deficiency detected was substantially higher than the median reported for other studies in Populus, but falls within the range of the reported values of FIS (Table 4 in Slavov and Zhelev 2010).  It is possible that the observed heterozygote deficiency is in part due to the occurrence of null alleles and/or allele ‘drop-out,’ both of which are common at microsatellite loci (Pemberton et al. 1995; Slate et al. 2000; Ewen et al. 2000; Dakin and Avise 2004; Hoffman and Amos 2005).  The prevalence of severe heterozygote deficiency (i.e., for all loci and in most populations); however, is in good agreement with the results from a study of the population genetic structure of P. trichocarpa in Washington and Oregon (just south of the area we sampled) based on allozyme markers (Weber and Stettler 1981).  Weber and Stettler (1981) detected significant heterozygote deficiency for (1) at least one of the 11 polymorphic allozyme loci they used in nine of the ten populations sampled and (2) multiple loci in six populations. Similarly, heterozygote deficiency was detected in a microsatellite-based study of pollen flow in two P. trichocarpa populations in Oregon, and its magnitude was 2.7 times greater in the population in which excessive mating among related trees appeared more likely (Slavov et al. 2009).  Thus, the existence of substructure (i.e., the Wahlund effect, Hedrick 2005a) in the P. trichocarpa populations included in this study, and possibly throughout the Pacific Northwest of North America is likely combined with other factors (cf. next paragraph). If substructure is indeed present in P. trichocarpa populations, what biological characteristics of the species may be causing it?  Several life history traits of riparian P. trichocarpa could conceivably lead to fine-scale population substructure as a result of excessive mating among trees that are (1) spatially close to one another, and (2) more closely related than  21 the average for the population.  First, under natural conditions, seeds in all riparian P. trichocarpa remain viable for only 1-2 weeks and for only 2-3 days after imbibition (Braatne et al. 1996).  Second, Populus seedlings are shade- and drought-intolerant, and the availability of a recently disturbed site and of sufficient moisture in the first month of establishment are the most critical factors for their survival (Steinberg 2001; Braatne et al. 1996, 2007).  Because of this short window of opportunity, successful seed establishment critically depends on perfect synchronization of (1) disturbance creating appropriate microsites, (2) spring river runoff continuously providing sufficient moisture, and (3) seed dispersal, the timing of which varies greatly within P. trichocarpa populations (S. P. DiFazio and G. T. Slavov, personal communication).  Thus, establishment episodes occur as rarely as once every 5-10 years (Braatne et al. 1996; Steinberg 2001), and relatively few maternal parents with opportune phenologies may produce the resulting cohorts.  Once these cohorts reach reproductive maturity, the frequency of mating among close relatives (e.g., sib mating) might be substantial because approximately 50% of the mating events in P. trichocarpa populations are likely to occur between trees located less than 500 m apart (Slavov et al. 2009). 2.4.3 Population demographic inference using chloroplast microsattelites (cpSSRs) The level of polymorphisms observed at the cpSSRs is lower than that reported for most woody species (e.g., Gomeze et al. 2003; Aguinagalde et al. 2005).  However, the study revealed the presence of demographic structure in P. trichocarpa natural populations as evidenced by higher RST than pRST, suggesting that the species’ contemporary range could be the product of more than one refugium.  It is expected that the populations originating from the same refugium will have RST = pRST as a result of their shared alleles. This pattern was also supported by the MJ network that provided more than one haplotype group (Figure 2.3a), rejecting a single population expansion hypothesis.  If only one panmictic population was present, we would have observed a star-like haplotype network.  In addition, the allele size of five loci showed gradual change along latitude (e.g., locus CSU03, Figure 2.3b).  Additionally, the allele frequency distribution for the five cpDNA loci, despite the low number of alleles, showed a distinct distribution among northern and southern haplotype groups (Figure 2.3c).  Surprisingly, a similar clinal trend was also reported for photosynthetic rate in P. trichocarpa populations of BC (Gornal and Guy 2007).  In their study, Gornal and Guy (2007) speculated that the observed higher photosynthetic ability in northern populations may be controlled by alleles that are not present in their southern counterparts as a result of local  22 adaptation.  Thus, the hypothesis regarding the origin of P. trichocarpa populations in British Columbia, the presence of two (northern and southern) refugia, mirroring several reports on a large number of biota and fauna species in the Pacific Northwest (Soltis et al. 1997; Godbout et al. 2008; Jaramillo-Correa et al. 2009).  These results indicate the presence of at least two main genetically differentiated groups, which probably originated from two different glacial refugia. 2.4.4 Population structure and landscape genetics Population structure analysis based on nuclear SSRs provided support for the presence of three genetically homogeneous groups; however, the clustering also showed a high level of admixture within the three identified groups (Figure 2.4b).  Only Homathko and Skeena drainages showed low levels of admixture and they belonged to different groups; 1 and 3, respectively (Figure 2.4b). The amalgamation of genetic and spatial data possibly explains how landscape features as well as spatial information have important roles in shaping the genetic structure of populations. Using Monmonier’s algorithm, one major barrier was identified that runs south of the Omineca and Skeena mountains separating the studied populations into two main regions (i.e., northern and southern).  Additionally, Bayesian analysis showed that the studied populations are well separated spatially and formed three distinct groups corresponding to northern, interior, and southern regions (three clusters, Figure 2.6).  Although K = 3 was used as a prior, the predefined clusters showed clear spatial separation and separated the northern group from its southern counterpart, in agreement with the phylogeography hypothesis.  The presence of mountain ranges may have limited gene flow between north and south populations; however, some long distance seed and pollen dispersal probably leads to mixed populations.  Therefore, high gene flow significantly influences the population structure patterns and the hierarchical genetic differentiation (see below). Because of the spatial separation observed between the subpopulations, no isolation-by- migration was observed, indicating that genetic drift and migration might have caused this pattern.  In a parallel study based on the same collection of plant materials, Xie et al. (2009) detected a clear genetic discontinuity between northern and southern populations based on quantitative traits (i.e., height, severity of infection by Valsa sordida and Melampsora occidentalis, and abnormality of vegetative bud break) measured in a common garden in Surrey, BC.  The location of the discontinuity detected by Xie et al. (2009) approximately corresponds with the one detected based on nuclear microsatellite data.  Xie et al. (2009) indicated that trees  23 originating north of the Kitimat River drainage are on average 54% shorter than those from the southern region, and also have dramatically higher rates of abnormal flushing and infection by the two fungal diseases in the Surrey common garden.  A north-south grouping was also observed by Weber and Stettler (1981) based on allozyme data from 10 populations in Washington and Oregon. A similar pattern, with one or more latitudinal discontinuities that define more or less distinct southern and northern genetic groups, has been detected in a number of taxonomically diverse perennial herbaceous plants, shrubs, and trees across the Pacific Northwest of North America (e.g., Wheeler and Guries 1982; Steinhoff et al. 1983; Furnier and Adams 1986; Li and Adams 1989; Soltis et al. 1997; Ritland et al. 2001; Brunsfeld et al. 2007; Godbout et al. 2008). Presuming that this recurrent phylogeographic pattern results from genetic drift and patterns of migration during and after Pleistocene glaciations, Soltis et al. (1997) proposed two hypothetical scenarios for its underlying mechanism.  Under the first of these scenarios, the “North-South Recolonization Hypothesis,” plants that currently inhabit the Pacific Northwest and show north- south regional differentiation survived the glaciation in refugia located both south of the glacial maximum (e.g., in the Klamath-Siskiyou Mountains) and in one or more unglaciated areas at much higher latitudes (e.g., the Vancouver Island and the Queen Charlotte Islands in BC, and unglaciated areas of central Alaska).  Populations in these southern and northern refugia became differentiated as a result of genetic drift as indicated by nuclear SSRs (possibly undergoing severe bottlenecks).  Following deglaciation, populations migrated south of the northern refugia and north of the southern refugia, and restored their continuous distribution across the Pacific Northwest but remained differentiated.  Under the second scenario, the “Leading Edge Hypothesis,” refugia existed only south of the glacial maximum.  Early recolonization involved long-distance dispersal, which formed a leading edge of small groups that (1) experienced severe genetic drift and became differentiated from their source populations, and (2) dominated further expansion to the north because of their proximity to habitats that were becoming available as the ice was receding. Consistent with both of these scenarios, P. trichocarpa occurs only sporadically (i.e., rather than in large continuous populations) in an area just south of the discontinuity, we identified (Peterson et al. 1996; Xie et al. 2009).  A ‘no-cottonwood belt’ identified by Xie et al. (2009) may have prevented or delayed the homogenization between the two regions.  The current analyses, however, consistently suggests that the genetic discontinuity may be caused by the physical barrier to gene flow imposed by the Omineca and Skeena Mountains, which does not  24 necessarily contradict the results from the analyses of quantitative traits (Figure 2 in Xie et al. 2009). Although a pattern consistent with the two scenarios proposed by Soltis et al. (1997) has been detected in a number of tree species using nuclear markers (e.g., Wheeler and Guries 1982; Steinhoff et al. 1983; Ritland et al. 2001; this study), north-south differentiation appears to be particularly strong for maternally-inherited organellar markers, as shown using cpSSRs (reviewed by Soltis et al. 1997; Godbout et al. 2008) and in the present study.  Presumably, this is because maternally-inherited genomes (1) have lower effective population sizes than nuclear loci (Petit et al. 2005), (2) can experience gene flow only through seed dispersal, and (3) technically represent a single haploid locus, which makes them potentially more prone to selective sweeps (Muir and Filatov 2007).  All of these factors make the differentiation or even fixation of alternate organellar haplotypes in northern and southern populations more likely than for nuclear loci under both of the hypothetical scenarios proposed by Soltis et al. (1997). This prediction appears to hold true in several plant species from the Pacific Northwest, particularly for organellar markers (reviewed by Soltis et al. 1997) as in the cpSSR data (Figure 2a).  Thus, P. trichocarpa natural populations appear to have followed the same migration route as other plant species after deglaciation in the Pacific Northwest.  The presence of populations probably was restricted to colonize similar areas available for plant growth after the glaciers retreated.  However, previous reports identified other mountain ranges as putative barriers to gene flow that contributed to regional and population differentiation (e.g., Soltis et al. 1997; Jaramillo-Correa et al. 2009). 2.4.5 Hierarchical population differentiation Under both the IAM and SMM, the extent of differentiation among the studied populations was generally low but slightly higher compared to values reported in other studies of nuclear microsatellites in Populus (Slavov and Zhelev 2010).  Surprisingly, the estimated FST and RST values were also higher than the GST value reported for populations of P. trichocarpa in Washington and Oregon based on allozyme markers (Weber and Stettler 1981).  Values of FST obtained based on nuclear microsatellite data under the IAM tend to be lower than those based on less variable markers presumably (1) as a result of the constraint on FST imposed by the higher heterozygosities of microsatellite markers, and (2) because the IAM does not account for allele size differences, which is believed to be substantial at microsatellite loci (Hedrick et al. 1999; Hedrick 2005b; Rosenberg et al. 2002, 2003; Excoffier et al. 2003).  25 Although most of the observed variation was attributable to within-population differences under both IAM and SMM, the differentiation among populations within groups remained highly significant and stronger than in other species of Populus or in P. trichocarpa populations from Washington and Oregon.  Thus, the existence of measurable hierarchical structure in P. trichocarpa populations in British Columbia appears to be robust, despite the fact that most of the variability was observed within populations. 2.5 Conclusions Nine highly variable nuclear and 12 maternal chloroplast microsatellite markers provided sufficient resolution to study the demographic history and population genetic structure of P. trichocarpa populations in British Columbia, supporting the hypothesis of several glacial refugia.  As expected, within-population variation was high compared to among-population variation.  A high admixture proportion was prominent at the studied populations, which is likely to occur as a result of excessive gene flow among closely related trees in genetic neighborhoods and long distance propagule dispersal.  I identified a clear genetic discontinuity, which subdivided the studied area into two major spatially and genetically distinct groups.  The possible cause for this regional structure is the demographic expansion and recolonization patterns after Pleistocene glaciations.  Differentiation among populations was relatively weak but measurable and probably caused by severe bottlenecks.  26   Table 2.1 List of P. trichocarpa drainages, populations (number, code, and sample size), and sample locations included in the present study.  Note the P. balsamifera reference population is also included (No. 48). No. na Drainage Location Latitude Longitude 1 4 Taku Nakina River 58º56′ 133º11′ 2 10  Taku River 58º42′ 133º24′ 3 2 Stikine  Forrest Kerr Creek 56º44′ 130º38′ 4 5  Iskut River  56º49′ 130º45′ 5 2  Stikine River 57º02′ 131º47′ 6 7 Bell-Irving Bell-Irving River 56º48′ 129º41′ 7 2  Bell-Irving River (Treaty) 55º39′ 128º48′ 8 7 Nass Nass River  56º34′ 129º49′ 9 2 Skeena  Cedarvale 55º01′ 128º20′ 10 3  Exchamsiks River 54º25′ 129º26′ 11 2  Hazelton 55º13′ 127º40′ 12 3  Kitwanga 55º05′ 128º11′ 13 25  Skeena River 54º25′ 128º55′ 14 17 Bulkley Bulkkey River 54º52′ 127º09′ 15 2 Kitimat Hirsch Creek 54º04′ 128º27′ 16 14  Kitimat River 54º09′ 128º36′ 17 18 Dean Dean River 52º49′ 126º46′ 18 4 FRMcG McGregor (McGregor River) 54º11′ 122º00′ 19 6  Fraser River (Shelley) 54º02′ 122º36′ 20 5  Fraser River (Willow River) 53º55′ 122º17′ 21 6 FRCRQR Cunningham River (Kimball Creek) 52º56′ 121º10′ 22 4  Fraser River (Baker Creek) 52º57′ 122º52′ 23 3  Fraser River (Cottonwood) 53º02′ 122º09′ 24 3  Fraser River 53º04′ 122º31′ 25 4  Quesnel River 52º58′ 122º19′ 26 21 Knight west Klinaklini River 51º26′ 125º37′ 27 17 Homathko Homathko River 51º06′ 124º53′ 28 22 Jervis Skwawka River  50º14′ 123º59′ 29 12 LHG Harrison (Lillooet River) 49º58′ 122º26′ 30 20  Lillooet (Lillooet River) 50º30′ 123º05′ 31 5 Squamish Glacier 50º06′ 123º00′ 32 12  Squamish River  49º58′ 123º18′ 33 4 EVIC Black Creek 49º50′ 125º11′ a n is the number of trees sampled.  27    Table 2.1 Continued.  No. na Drainage Location Latitude Longitude 34 4 EVIC Campbell River  49º57′ 125º15′ 35 4  Courtenay 49º40′ 125º04′ 36 5  Fanny Bay  49º31′ 124º51′ 37 12 EVIN Salmon River  50º15′ 125º50′ 38 4  White river  50º08′ 126º03′ 39 16 Fraser Chilliwack River  49º06′ 121º39′ 40 9  Hope North 49º25′ 121º30′ 41 9  Harrison River  49º15′ 121º54′ 42 5  McMillan Island  49º11′ 122º35′ 43 3  Matsqui Island  49º07′ 122º20′ 44 9  Nahatlatch River  49º59′ 121º42′ 45 5  Vender Canal  49º09′ 122º06′ 46 5  Wellington Bar 49º40′ 121º25′ 47 6  Yale West 49º34′ 121º26′ 48b 5  Cypress Hills 49º04′ 109º28′  5  Stettler 52º26′ 112º44′  5  Grand Prairie 54º45′ 118º53′ a n is the number of trees sampled. bP. balsamifera reference population.  28    Table 2.2 Number of alleles, allele size range, and Nei’s gene diversity (He) of 12 cpSSRs used in 32 P. trichocarpa populations representing a north- south transect from British Columbia. Locus No. of alleles Size range (bp) He CSU01 1 178 0.000 CSU02 3 148-152 0.253 CSU03 3 302-308 0.428 CSU04 2 164-170 0.351 CSU05 1 152 0.000 CSU07 2 198-201 0.062 CCMP2 3 226-234 0.133 CCMP3 2 120-123 0.033 CCMP4 2 129-130 0.409 CCMP5 2 168-170 0.051 CCMP6 3 119-125 0.203 CCMP8 1 53 0.000 Mean 2.08  0.160  29   Table 2.3 Primer sequences, linkage group location, number of alleles for the primers shown, allele size range, and heterozygosities of 9 nuclear SSRs in 298 P. trichocarpa individual trees from British Columbia. Locus Primer sequence (5'- 3') LGa kb Size range (bp) Hoc Hed PMGC-2385 F e: TTCTTCACCTGGGCAATATG Rf: CTTGGCTGTAAATGACGAGTC I 31 114-182 0.816 0.835 PMGC-2235 F: CAAAATAGTAAGTGTGATGG R: CACACATTCTCTCATTCAAAGC IV 24 128-176 0.641 0.818 PMGC-2522 F: TCTGTTAATTTCTCAGCTGTTG R: CTTTACTAAACTTTTTACTGC IX 29 148-230 0.649 0.778 PMGC-2571 F: TCGCAGATTCATGTAACCC R: ACTGTATGTTGACCATGCCC X 13 106-138 0.522 0.739 PMGC-2885 F: TGATCAAATTGGATTTGAATG R: AAAGATGAACATGGCTAGCTC XII 19 296-338 0.626 0.757 PMGC-2610 F: ACACGCAAGAACATACATAAG R: GATTAACATGTTTCGCTACGC XIII 15 102-132 0.455 0.718 PMGC-2515 F: AAAGGGATTGTTAATAAACCC R: CAAAATCATAAAAGACAGGGC XIV 17 208-244 0.660 0.804 PMGC-2585 F: ACTGCTGTGTATTGCCCTAG R: TAGTTGAAGTTGGAGCACAAC XV 14 138-164 0.415 0.679 ORPM-15 F: CGTGAGTTTTGAGGCCATTT R: CATGGAAAGGATCACCCACT XVII 27 242-294 0.533 0.795 Mean   21  0.591 0.769 a LG; linkage group to which the marker was mapped by Tuskan et al. (2004). b k; number of alleles per locus. c  Ho; observed heterozygosity. d  He; unbiased estimate of expected heterozygosity (Nei 1987). e F; sequence of the forward primer. f R; sequence of the reverse primer.  30   Table 2.4 List of the 38 P. trichocarpa populations as they appear in Figure 2.4.                      a n; number of trees sampled.  No. Population code Drainage na 1 ISK Stikine 5 2 NAS Nass 5 3 CDR 2 4 EXC 3 5 HAZ 2 6 KTW 3 7 SKN Skeena 20 8 BUL Bulkley 6 9 HIR 2 10 KTM Kitimat 13 11 DEN Dean 18 12 WLO FRMcG 5 13 KIM 3 14 QBK 4 15 QCT 2 16 QFR FRCRQR 3 17 KLN Knight 19 18 HOM Homathko 17 19 SKW Jervis 22 20 HAR 12 21 LIL LHG 20 22 GLC 5 23 SQM Squamish 9 24 BLC 4 25 CMB 4 26 CNY 4 27 FNY EVIC 5 28 SLM 12 29 WHT EVIN 4 30 CHW 16 31 HOP 9 32 HRS 7 33 MCM Fraser 5  31    Table 2.4 Continued.  No. Population code Drainage na 34 MTS 3 35 NHT 9 36 VND 5 37 WEL 5 38 YAL Fraser 6 a n; number of trees sampled.   32    Table 2.5 Nuclear SSRs polymorphism and diversity in 38 P. trichocarpa natural populations from British Columbia and as classified in three groups identified using model-based clustering analyses.  Standard errors are shown in parentheses unless otherwise indicated. Data set Npopa nb Ac r(10)d Hoe Hef FIS (95% CI)g All populations 38 8 5.96 (0.07) 6.59 (0.07) 0.591 (0.007) 0.769 (0.003) 0.297 (0.228-0.367) Group 1 7 8 6.25 (0.17) 6.27 (0.15) 0.568 (0.022) 0.798 (0.006) 0.292 (0.203-0.381) Group 2 7 9 6.27 (0.19) 6.28 (0.16) 0.590 (0.020) 0.749 (0.011) 0.310 (0.213-0.424) Group 3 24 7 5.78 (0.08) 6.59 (0.08) 0.598 (0.009) 0.768 (0.004) 0.288 (0.213-0.368) a  Npop; number of populations. b  n; average number of trees genotyped per population. c  A; average number of alleles per locus detected in each population. d  r(10); allelic richness, defined as the number of alleles that would have been detected if 10 alleles (i.e., 5  trees: rarefaction cutoff) had been sampled in each population. e  Ho; observed heterozygosity. f  He; unbiased estimate of expected heterozygosity (Nei 1987). g  FIS; fixation index measuring the correlation of alleles within individuals relative to that within populations. (Wright 1965). 95% confidence intervals based on bootstrapping over loci are shown in parentheses.   33    Table 2.6 Hierarchical population structure of 38 P. trichocarpa populations from British Columbia. Variance component (%) Level of variation IAMa SSMb Among populations 7.50c 7.00c Among groups 0.29d -0.07e Among populations within groups 7.26c 6.98c Among individuals within populations 92.45c 93.10c a AMOVA under the infinite alleles model. b AMOVA under the stepwise mutation model. c P < 0.0001, d P < 0.05, e P > 0.05.   34    Figure 2.1 British Columbia’s P. trichocarpa river drainages location and their average proportional membership based on the STRUCTURE-defined  groups.  River drainages with asterisk were excluded from the analysis.    35    Figure 2.2 STRUCTURE analyses using 10 nuclear SSRs for 369 P. trichocarpa and 15 P. balsamifera genotypes: (a) magnitude of ∆K as a function of K calculated using combined P. trichocarpa and P. balsamifera genotypic data, and (b) DISTRUCT output.  Individual genotypes are in north-south gradient.  (a)   (b)   36    Figure 2.3 Haplotype structure and allele frequency distribution of cpSSR loci in P. trichocarpa populations representing a British Columbia north-south transect: (a) Median-joining network showing northern and southern haplotype groups, haplotypes are indicated by circles (circle size represents the observed haplotype frequency, the most common haplotype (Hi) is indicated at each circle, and median vector is labeled Mv1), (b) allele size distribution along latitudinal gradient for CSU03 locus, and (c) allele frequency distribution for five cpSSR loci within northern and southern haplotype groups.                        37 (a)   (b)  (c)  38    Figure 2.4 Nuclear SSRs-based STRUCTURE analyses for 298 P. trichocarpa trees from 16 river drainages in British Columbia: (a) magnitude of ∆K as a function of K, (b) distribution of STRUCTURE-defined groups among 16 river drainages for K = 3, membership coefficients for each drainage were averaged across population within drainage to generate drainages Q-matrix as output from DISTRUCT.  Red, yellow, and turquoise colours represent group 1, group 2, and group 3, respectively.  (a)  (b)    39    Figure 2.5 Primary genetic discontinuity identified using Monmonier’s algorithm: genetic discontinuity identified using a single RST matrix (yellow line), and genetic discontinuity identified using 100 bootstrapped Nei’s genetic distance matrices (blue lines), bootstrap scores (red labels) are indicated.  Red lines represent Voronoï tessellation.  Population labels (dark brown lables) are shown in Table 2.4.    40    Figure 2.6 Maps of GENELAND assignments of 298 P. trichocarpa individuals (dots) to three clusters (a, b, and c).  The three plots represent the assignment of pixels to each cluster. The highest membership values are in light yellow.  The plots are based on the highest probability run at K = 3.                  41        (a) (b)  42       (c)   43 2.6 References Aguinagalde, I., A. Hampe, A. Mohanty, et al. 2005. Effects of life-history traits and species distribution on genetic structure at maternally inherited markers in European trees and shrubs. J Biogeogr 32:329-339. Ahuja, M. R., and D. B. Neal. 2005. Evolution of genome size in conifers. Silvae Genet 54:126- 137. Anderson, E. C., and E. A. Thompson. 2002. A model-based method for identifying species hybrids using multilocus genetic data. Genetics 160:1217-1229. Balding, D. J. 2006. A tutorial on statistical methods for population association studies. Nature Rev Genet 7:781-791. Bandelt, H-J., P. Forster, and A. Roehl. 1999. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16:37-48. Braatne, J. H. 1999. Biological aspects of hybrid poplar cultivation on floodplains in Western North America: A review, USEPA Region 10, Seattle, WA. Braatne, J. H., R. Jamieson, K. M. Gill, et al. 2007. Instream flows and the decline of riparian cottonwoods along the Yakima River, Washington, USA. River Research and Applications 23:247-267. Braatne, J. H., S. B. Rood, and P. E. Heilman. 1996. Life history, ecology, and reproduction of riparian cottonwoods in North America. Pp 57-85 in F. R. Stettler, H. D. Jr. Bradshaw, P. E. Heilman, et al. (eds) Biology of Populus and its implication for management and conservation. NRC Research Press, Ottawa, Ontario, Canada. Bradshaw, H. D. Jr., M. Villar, B. D. Watson, et al. 1994. Molecular genetics of growth and development in Populus. III. A genetic linkage map of a hybrid poplar composed of RFLP, STS, and RAPD markers. Theor Appl Genet 89:167-178. Brunsfeld, S. J., T. R. Miller, and B. C. Carstens. 2007. Insights into the biogeography of the Pacific Northwest of North America: Evidence from the phylogeography of Salix melanopsis. Syst Botany 32:129-139. Christopher, M. E., M. Miranda, I. T. Major, et al. 2004. Gene expression profiling of systemically wound-induced defenses in hybrid poplar. Planta 219:936-947. Coulon, A., G. Guillot, J-F. Cosson, et al. 2006. Genetics structure is influenced by landscape features: empirical evidence from a roe deer population. Mol Ecol 15:1669-679.   44 Dakin, E. E., and J. C. Avise. 2004. Microsatellite null alleles in parentage analysis. Heredity 93:504-509. Dieringer, D., and C. Schlötterer. 2003. MICROSATELLITE ANALYZER (MSA): a platform- independent analysis tool for large microsatellite data sets. Mol Ecol Notes 3:167-169. Doyle, J. J., and J. L. Doyle. 1987. A rapid DNA isolation procedure from small quantities of fresh leaf tissues. Phytochem Bull 19:11-15. El Mousadik, A., and R. J. Petit. 1996. Chloroplast DNA phylogeography of the argan tree of Morocco. Mol Ecol 5:547-555. Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14: 2611-2620. Ewen, K. R., M. Bahlo, S. A. Treloar, et al. 2000. Identification and analysis of error types in high-throughput genotyping. Am J Hum Genet 67:727-736. Excoffier, L., and G. Hamilton. 2003. Comment on "Genetic structure of human populations". Science 300:1877. Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evol Bioinform Online 1:47-50. Fisher, R. A., and E. B. Ford. 1947. The spread of a gene in natural conditions in a colony of moth Panaxia dominula L. Heredity 1:143-174. Furnier, G. R., and W. T. Adams. 1986. Geographic patterns of allozyme variation in Jeffrey pine. Am J Bot 73:1009-1015. Gilchrist, E. J., G. W. Haughn, C. C. Ying, et al. 2006. Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol 15:1367-1378. Godbout, J., A. Fazekas, C. H. Newton, et al. 2008. Glacial vicariance in the Pacific Northwest: evidence from a lodgepole pine mitochondrial DNA minisatellite for multiple genetically distinct and widely separated refugia. Mol Ecol 17:2463-2475. Gomez, A., Gonzalez-Martinez S. C., Collada C., et al. 2003. Complex  population genetic structure in the endemic Canary Island pine revealed using chloroplast microsatellite markers. Theor Appl Genet 107:1123-1131. Gornall, J. L., and R. D. Guy. 2007. Geographic variation in ecophysiological traits of black cottonwood (Populus trichocarpa). Can J Bot 85:1202-1213. Government of Canada (2008) Clean energy. http://www.cleanenergy.gc.ca/   45 Guillot, G., F. Santos, and A. Estoup. 2008. Analysing georeferenced population genetics data with GENELAND: a new algorithm to deal with null alleles and a friendly graphical user interface. Bioinformatics 24:1406-1407. Hamrick, J. L., M. J. W. Godt, and L. S. Susan. 1992. Factors influencing levels of genetic diversity in woody plant species. New Forests 6:95-124. Hardy, O. J, N. Charbonnel, H. Freville, et al. 2003. Microsatellite allele size: a simple test to assess their significance in genetic structure at the individual or population levels. Mol Ecol Notes 2:618-620. Hardy, O. J., and X. Vekemans. 2002. SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2:618-620. Hedrick, P. W. 2005a. Genetics of populations, 3rd edn. Jones and Bartlett, Sudbury, Massachusetts. Hedrick, P. W. 2005b. A standardized genetic differentiation measure. Evol 59:1633-1638. Hedrick, P. W., O. Savolainen, and K. Karkkainen. 1999. Factors influencing the extent of inbreeding depression: an example from Scots pine. Heredity 82:441-450. Hoffman, J. I., and W. Amos. 2005. Microsatellite genotyping errors: detection approaches, common sources and consequences for paternal exclusion. Mol Ecol 14:599-612. Hubisz, M. J., D. Falush, M. Stephens, et al. 2009. Inferring weak population structure with the assistance of sample group information. Mol Ecol Res 9:1322-1332. Imbert, E., and F. Lefevre. 2003. Dispersal and gene flow of Populus nigra (Salicaceae) along a dynamic river system. Ecology 91:447-456. Jakobsson, M., and N. A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23:1801-1806. Jaramillo-Correa, J. P., J. Beaulieu, K. Damase, et al. 2009. Inferring the past from the present phylogeographic structure of North American forest trees: seeing the forest for the genes. Can J For Res 39:286-307. Jensen, J. L., A. J. Bohonak, and S. T. Kelley. 2005. Isolation by distance, web service. BMC Genet 6:13. Jorgensen, H. B. H., M. M. Hansen, D. Bekkevold, et al. 2005. Marine landscapes and population structure of herring (Clupea harengus L.) in the Baltic Sea. Mol Ecol 14:3219-3234.   46 Keller, S. R., M. S. Olson, S. Silim, et al. 2010. Genomic diversity, population structure, and migration following rapid range expansion in the Balsam Poplar, Populus balsamifera. Mol Ecol 19:1212-1226. Khasa, D., P. Pollefeys, A. Navarro-Quezada, et al. 2005. Species-specific microsatellite markers to monitor gene flow between exotic poplars and their natural relatives in eastern North America. Mol Ecol Notes 5:920-923. Lian, C. L., R. Oishi, N. Miyashita, et al. 2003. Genetic structure and reproduction dynamics of Salix reinii during primary succession on Mount Fuji, as revealed by nuclear and chloroplast microsatellite analysis. Mol Ecol 12:609-618. Mackay, I., and W. Powell. 2007. Methods for linkage disequilibrium mapping in crops. Trends Plant Sci 12:57-63. Manel, S., M. K. Schwartz, G. Luikart, et al. 2003. Landscape genetics: Combining landscape ecology and population genetics. Trends Ecol Evol 18:189-197. Manni, F., E. Guerard, and E. Heyer. 2004. Geographic patterns (genetic, morphologic, linguistic) variation: how barriers can be detected by using Monmonier’s algorithm. Hum Biol 76:173-190. Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Res 27:209-220. Mayr, E. 1942. Systematics and the Origin of Species. Columbia University Press, New York. McCarthy, M. I., G. R. Abecasis, L. R. Cardon, et al. 2008. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev Genet 9:356-369. Monmonier, M. 1973. Maximum-difference barriers: An alternative numerical regionalization method. Geo Anal 3:245-261. Muir, G., and D. Filatov. 2007. A selective sweep in the chloroplast DNA of dioecious Silene (Section Elisanthe). Genetics 177:1239-1247. Nei, M., F. Tajima, and Y. Tateno. 1983. Accuracy of estimated phylogenetic trees from molecular data. J Mol Evol 19:153-170. Nei, M. 1987. Molecular Evolutionary Genetics, Columbia University Press, New York. Nei, M. 1978. Estimation of average heterozygosity and genetic distance for small number of individuals. Genetics 89:583-590. Olson, M. S., A. L. Robertson, N. Takebayashi, et al. 2010. Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytol 186:526-536.   47 Pan, X., J. F. Kadla, K. Ehara, et al. 2006. Organosolv ethanol lignin from hybrid poplar as a radical scavenger: Relationship between lignin structure, extraction conditions, and antioxidant activity. J Agr Food Chem 54:5806-5813. Pemberton, J. M., J. Slate, D. R. Bancroft, et al. 1995. Nonamplifying alleles at microsatellite loci: A caution for parentage and population studies. Mol Ecol 4:249-252. Peterson, E. B., N. M. Peterson, and A. R. Mcleod. 1996. Black cottonwood and balsam poplar managers’ handbook for British Columbia. FRDA Report 250, Forestry Canada and British Columbia Ministry of Forests, Victoria, BC 116 pp. Petit, R. J., J. Duminil, S. Fineschi, et al. 2005. Comparative organization of chloroplast, mitochondrial and nuclear diversity in plant populations. Mol Ecol 14:689-701. Petit, R. J., A. El Mousadik, and O. Pons. 1998. Identifying populations for conservation on the basis of genetic markers. Conserv Biol 12:844-855. Rahman, M. H., and O. P. Rajora. 2002. Microsatellite DNA fingerprinting, differentiation, and genetic relationships of clones, cultivars, and varieties of six poplar species from three sections of the genus Populus. Genome 45:1083-1094. Raymond, M., and F. Rousset. 1995. GENEPOP (Version 1.2): Population genetics software for exact tests and ecumenicism. J Hered 86:248-249. Ritland, C., T. Pape, and K. Ritland. 2001. Genetic structure of yellow cedar (Chamaecyparis nootkatensis). Can J Bot 79:822-828. Rosenberg, N. A. 2004. DISTRUCT: a program for the graphical display of population structure. Mol Ecol Notes 4:137-138. Rosenberg, N. A., J. K. Pritchard, J. L. Weber, et al. 2002. Genetic structure of human populations. Science 298:2381-2385. Rosenberg, N. A., J. K. Pritchard, J. L. Weber, et al. 2003. Response to comment on "Genetic structure of human populations". Science 300:1877. Rousset, F. 1997. Genetic differentiation and estimation of gene flow from statistics under isolation by distance. Genetics 145:1219-1228. Sigurdsson, V., K. A. Jonsson, and A. Sigurgeirsson. 1995. DNA fingerprinting of Populus trichocarpa clones using RAPD markers. New Forests 10:197-206. Slate, J., T. Marshall, and J. Pemberton. 2000. A retrospective assessment of the accuracy of the paternity inference program CERVUS. Mol Ecol 9:801-808.   48 Slatkin, M. 1995. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457-462. Slavov, G. T., S. Leonardi, J. Burczyk, et al. 2009. Extensive pollen flow in two ecologically contrasting populations of Populus trichocarpa. Mol Ecol 18:357-373. Slavov, G. T., and P. Zhelev. 2010. Salient biological features, systematics, and genetic variation of Populus. Pp 15-38 in S. Jansson, R. Bhalerao, A. T. Groover (eds) Genetics and genomics of Populus, Springer, NY. Smulders, M. J. M, J. E. Cottrell, F. Lefever, et al. 2008. Structure of the genetic diversity in black poplar (Populus nigra L.) populations across European river systems: Consequences for conservation and restoration. For Ecol Manag 255:1388-1399. Soltis, D. E., M. A. Gitzendanner, D. D. Strenge, et al. 1997. Chloroplast DNA intraspecific phylogeography of plants from the Pacific Northwest of North America. Pl Syst Evol 206:353-373. Spear, S. F., C. R. Peterson, M. D. Matocq, et al. 2005. Landscape genetics of the blotched tiger salamander (Ambystoma tigrinum melanostictum). Mol Ecol 14:2553-2564. Steinberg, P. D. 2001. Populus balsamifera subsp. trichocarpa. In: Fire Effects Information System, [Online] U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station, Fire Sciences Laboratory. Avilable :http://www.fs.fed.us/database/feis/. Steinhoff, R. J., D. G. Joyce, and L. Fins. 1983. Isozyme variation in Pinus monticola. Can J For Res 13:1122-1132. Stettler, R. F., H. D. Jr. Bradshaw, P. E. Heilman, et al. 1996. Biology of Populus and its implication for management and conservation. NRC Research Press, Ottawa, Ontario, Canada. 539p. Storfer, A., M. A. Murphy, J. S. Evans, et al. 2007. Putting the landscape in landscape genetics. Heredity 98:128-142. Suvanto, L. I., and T. B. Latva-Karjanmaa. 2005. Clone identification and clonal structure of the European aspen (Populus tremula). Mol Ecol 14:2851-2860. Taylor, G. 2002. Populus: Arabidopsis for forestry: Do we need a model tree? Ann Bot 19:681:689. Tribsch, A., and P. Schönswetter. 2003. In search for Pleistocene refugia for mountain plants: patterns of endemism and comparative phylogeography confirm palaeo-environmental evidence in the Eastern European Alps. Taxon 52:477-497.   49 Tuskan, G. A., S. DiFazio, S. Jansson, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596-1604. Tuskan, G. A., L. E. Gunter, K. Y. Zamin, et al. 2004. Characterization of microsatellites revealed by genomic sequencing of Populus trichocarpa. Can J For Res 34:85-93. Weber, J. C., and R. F. Stettler. 1981 Isoenzyme variation among ten populations of Populus trichocarpa Torr. et Gray in the Pacific Northwest. Silvae Genet 30:82-87. Weir, B. S., and C. C. Cockerham. 1984. Estimating F-statistics for analysis of population structure. Evolution 38:1358-1370. Weising, K., and R. A. Gardner. 1999. A set of conserved PCR primers for the analysis of simple sequence repeat polymorphisms in chloroplast genomes of dicotyledonous angiosperms. Genome 42:9-19. Wheeler, N. C., and R. P. Guries. 1982. Biogeography of lodgepole pine. Can J Bot 60:1805- 1814. Wilson, G. A., and B. Rannala. 2003. Bayesian inference of recent migration rates using multilocus genotypes. Genetics 163:1177-1191. Wright, S. 1965. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:295-420. Wyman, J., A. Bruneau, and M. F. Tremblay. 2003. Microsatellite analysis of genetic diversity in four populations of Populus tremuloides in Quebec. Can J Bot 81:360-367. Xie, C-Y., C. C. Ying, A. D. Yanchuk, et al. 2009. Ecotypic mode of regional differentiation due to restricted gene migration: a case in black cottonwood (Populus trichocarpa) along the Pacific Northwest coast. Can J For Res 39:519-526. Ziegenhagen, B., S. Gneuss, G. Rathmacher, et al. 2008. A fast and simple genetic survey reveals the spread of poplar hybrids at a natural Elbe river site. Conserv Genet 9:373-379.    50 Chapter 3. Nucleotide diversity and linkage disequilibrium in natural populations of Populus trichocarpa Torr. & Gray1 3.1 Introduction As a model tree species with a complete genome sequence (Tuskan et al. 2006), Populus trichocarpa Torr. & Gray is receiving increased attention and an extensive research community is developing, focusing on its physiology, molecular biology, and potential cultivation as a potential biofuel production crop (Jansson and Douglas 2007).  Populus is among few tree genera that reach maturity in a relatively short period, facilitating functional genomics studies and genetic mapping of economically important traits related to adaptation, wood quality, and most recently, biofuel attributes (Rae et al. 2008). Within the Pacific Northwest, P. trichocarpa natural populations range from Alaska to northern California, covering a wide latitudinal range.  Natural populations generally are subjected to many evolutionary forces that contribute to their genetic variability, acting on both the entire genome and individual gene levels.  P. trichocarpa is a long-lived dioecious species characterized by an obligate outcrossing mating system and long distance seed and pollen dispersal, all expected to result in high genetic diversity and low population differentiation (Hamrick et al. 1996).  These factors, combined with population demographic history, are anticipated to operate on the entire genome.  However, natural selection is a significant force that might act differently on individual genes.  Studies of natural selection may provide insights on the effect of selection dynamics in shaping the nucleotide variability across different parts of the genome. Studies have been conducted on many Populus species and all have pointed to life- history attributes such as gene flow, introgression, and demographic history, individually and/or collectively greatly influencing its level of nucleotide diversity and population differentiation (Ingvarsson 2005a).  While nucleotide diversity has been studied extensively in other Populus species (e.g., P. tremula L.: Ingvarsson 2005a, b; 2008; Ingvarsson et al. 2006; P. balsamifera L.: Breen et al. 2009; Olson et al. 2010), P. trichocarpa is less well-studied compared to other ______________________________ 1 A version of this chapter has been submitted for publication.  Ismail M., P. K. Ingvarsson, C. P. Constabel, and Y. A. El-Kassaby. Nucleotide diversity and linkage disequilibrium in natural populations of Populus trichocarpa Torr. & Gray.   51 model organisms (Gilchrist et al. 2006).  In forest trees, the study of nucleotide diversity is in its infancy and suffers from variation in species coverage, sample size, and the number of loci studied.  Although the P. trichocarpa genome is still the only sequenced genome available for a tree species, genetic diversity studies in this species are scarce compared to other Populus species.  So far, previous reports of sequence diversity in P. trichocarpa are restricted to a limited number of genotypes and/or loci.  However, a recent study of P. balsamifera with samples across its native range showed low level of nucleotide diversity (π = 0.0027) combined with slow linkage disequilibrium decay which was not detectable within a span of 750 bp (Olson et al. 2010). Linkage disequilibrium (LD) is considered one of the major factors affecting the precision of association mapping studies.  The level of LD is affected by many factors; for instance, population structure, small effective population size, and inbreeding all increase the level of LD, whereas larger effective population size and outcrossing may decrease the magnitude of LD.  In species with a high selfing rate LD can extend over large physical distances (e.g., 100 kb in rice: Caldwell et al. 2006), whereas LD extends over much shorter distances in obligate outcross species (e.g., 200 bp in P. tremula: Ingvarsson 2005a).  Additionally, the extent of LD varies among species, populations, families, and loci (Ingvarsson 2005a; Caldwell et al. 2006). In the present study, ten well studied candidate-gene loci with putative roles in adaptation to biotic and abiotic stress were used to assess the level of nucleotide diversity and linkage disequilibrium in P. trichocarpa.  I hypothesized that LD in P. trichocarpa is expected to decay over short distances as a result of its obligate outbreeding nature.  In the light of this hypothesis I address the following questions: 1) what is the extent of nucleotide diversity in natural populations of P. trichocarpa as compared to its sister North American P. balsamifera species?, 2) what is the rate of LD decay in the species and does it compare to estimates from previous reports?, and 3) what is the effect of selection on the studied loci? 3.2 Materials and methods 3.2.1 Plant material Sampled trees are part of a range-wide collection of black cottonwood conducted by the British Columbia Ministry of Forests and Range that was established in replicated clonal trials over four locations throughout BC (Surrey: 49° 03′ N, 122° 42′ W; Terrace: 54° 30′ N, 128º 42′   52 W; Prince George 53º 53′ N, 122º W; and the University of British Columbia 49° 15′ N, 123° 14′ W).  A total of 172 trees (mean = 4, range = 1-12 trees per population) were randomly sampled to represent British Columbia’ north-south transect. The sampled trees represent 47 populations covering a geographic range extending from 49° 06′ to 58° 56′ N and from 121° 10′ to 133° 24′ W (Table 3.1 and Figure 3.1). 3.2.2 Candidate-gene loci selection The primarily focus was on defence, stress response, and photoperiodism genes due to their involvement in tree adaptation to biotic and abiotic stress.  The studied loci are well documented and characterized in different poplar species (Major and Constable 2008; Auge et al. 2009) and are classified as 1) Defence genes ((I) Inducible defence genes: three wound-inducible Kunitz trypsin inhibitor (KTI) genes: TI-3, TI-4, and WIN3, (II) Cysteine protease inhibitor gene CI-1), 2) Stress response genes (Phenylalanine ammonia-lyase (PAL), Endochitinase (Chi), and Dehydrin (Dhn)), and 3) Photoperiodism genes (GIGANTEA (GI) and Phytochrome B2 (PHYB2)).  Additionally, a housekeeping gene Glyceraldehyde-3-phosphate dehydrogenase (Gapdh) was surveyed to allow comparison with the above mentioned tissue-specific genes (Table 3.2). 3.2.3 DNA extraction, PCR primer design, fragments amplification, and sequencing Total genomic DNA was extracted from newly flushed buds following the method described by Doyle and Doyle (1987).  Locus-specific primers for CI-1, TI-3, TI-4, WIN3, and GI were designed using the online program Primer 3.0 (Rozen and Skaletsky 2000) from the published sequences of gene bank accession numbers AJ842952, AY378088, AY378089, X15516, and BU833698, respectively.  Primers to amplify Dhn, Chi, and PAL were obtained from Joseph and Lexer (2008).  Part of the Gapdh gene was amplified using primers GPDX7F and GPDX9R from Strand et al. (1997).  Finally, unpublished primers were used to amplify part of PHYB2 (Par K. Ingvarsson, Umeå Plant Science Centre, Department of Ecology and Environmental Science, University of Umeå, unpublished).  Out of the 172 sampled trees, only 58-144 trees (i.e., 116-288 diploid sequences) had high quilty sequences.  Primers sequence, annealing temperatures, and products size are listed in Table 3.3.  Polymerase chain reaction (PCR) amplification was performed in 25 µL reaction volume consisting of 70 ng genomic DNA, 0.25 mM of dNTP, 4 µL reaction buffer containg: 50 mM KCl, 15 mM Tris-HCl and 2.5 mM MgCl2, 40 pmol each of forward and reverse primer, and one unit of AmpliTaq Gold DNA   53 polymerase [Applied Bisoystems (ABI), Foster City, CA, USA].  Amplifications were performed using a GeneAMP 9700 thermocycler (ABI) with the following PCR conditions: i) initial denaturation at 94°C for 3 min, ii) 30 cycles of 94°C for 30 s, [45, 58.5 or 61°C according to each primer pair, Table 3.3] for 30 s, 72°C for 1 min, and iii) 4 min final extension at 72°C. PCR products were checked on 1.8% agarose (Invitrogen, Canada, Burlington, Ont.) to insure single fragment amplification.  PCR products were sequenced in both directions using the same primers used for loci amplifications and Big Dye Terminator v3.1 (ABI) on 3730x1 DNA analyzer (ABI). 3.2.4 Sequence analysis and SNP discovery Sequence alignment, editing, and SNP discovery were carried out using CodonCode Aligner 2.0.6 (CodonCode Corporation, Dedham, MA, USA).  Raw sequences were base-called using Phred quality score > 30 (Ewing and Green 1998), implemented in CodonCode Aligner. All chromatograms were visually checked for base calling errors.  Sequence ends were trimmed until the average Phred quality score was > 25 in a window of 10 bases.  Heterozygous SNPs were scored using the ‘call second peaks’ function in CodonCode Aligner with the minimum lower peak height set at 60% of the first peak and manually confirmed.  Consensus sequences were assembled using Phrap as implemented in CodonCode Aligner.  After construction of consensus sequences, all polymorphic sites were verified against the original chromatograms. The sequences were then blasted against the P. trichocarpa genome sequence for verification using the JGI database (DoE Joint Genome Institute and Poplar Genome Consortium; http://genome.jgi-psf.org/Poptr1/Poptr1.home.html).  The same database was also used to identify exonic and intronic regions.  The sequences have been submitted to the European Molecular Biology Laboratory (EMBL)/GenBank nucleotide sequence database under accession numbers (FN396878 - FN396887). 3.2.5 Nucleotide diversity and neutrality tests Nucleotide diversity per site was estimated as π (Lynch and Crease 1990) and θw (Watterson 1975) and was obtained for total (πT), synonymous (πs), nonsynonymous (πa), and noncoding sites (πnc).  Deviation from neutral expectation was investigated at each locus using Tajima’s D (Tajima 1989) (with no outgroup) and Fay and Wu's H, HFay&Wu (Fay and Wu 2000) (with an outgroup), the latter was used to polarize the mutations in P. trichocarpa.  Combining both D and HFay&Wu may allow distinguishing between the effects of natural selection and   54 population expansion.  The significance of D and HFay&Wu were evaluated using 1000 coalescent simulations.  To make all tests conservative, all simulations were ran using a fixed number of segregating sites, equal to the observed number of segregating sites at each locus, and without recombination (Ingvarsson 2005a).  In addition, the McDonald-Kreitman test (MK; McDonald and Kreitman 1991) was used to compare levels of polymorphism within species to the level of divergence between species across the ten gene loci.  MK was used to test the hypothesis that synonymous and replacement mutations are neutral by comparing the ratio of synonymous to nonsynonymous polymorphism within P. trichocarpa to nonsynonymous divergence between P. trichocarpa and P. tremula.  Statistical departure from neutrality was then tested with a G-test on a 2 x 2 contingency table of silent and replacement fixed differences between species and silent and replacement polymorphic changes within P. trichocarpa.  Additionally, the neutrality index (NI, Rand and Kann 1996) was obtained for each locus to indicate the directionality of the MK test.  NI value > 1 suggests the presence of negative selection, while a NI < 1 is suggestive of positive selection.  All analyses were performed in DnaSP 4.9 (Rozas et al. 2003).  In order to investigate the deviation from neutrality across all loci, a multilocus Hudson-Kreitman-Aguadé test (HKA, Hudson et al. 1987) was applied as implemented in the program HKA (http://lifesci.rutgers.edu/~heylab).  In this test, the ratio of polymorphisms within species to the divergence between species is assessed.  It is expected that in the absence of selective forces, the ratio of polymorphisms within species to the divergence between species (divergence) should be constant across loci.  Therefore, this test will allow distinguishing between selection forces and population demography, where the former is more likely to affect individual loci compared to the latter which most likely will affect the unlinked or closely linked loci (Zhu et al. 2007).  In all the tests mentioned above, a single sequence from P. tremula was used as outgroup, except for Chi and GI where an outgroup sequence was obtained from NCBI gene bank accession numbers EU825062 and BU833698, respectively.  The choice of P. tremula as an outgroup was based on the availability of orthologous sequences for the same studied loci, unless otherwise indicated. 3.2.6 Haplotypes inference and analysis of linkage disequilibrium (LD) Haplotype phase for heterozygous alleles was inferred using PHASE 2.1.1 (Stephens et al. 2001; Stephens and Scheet 2005).  The default model (-MR0) was used, the general model for recombination rate variation (Li and Stephens 2003).  A separate PHASE run was performed for each gene fragment; each run had a burn-in of 100 followed by 10,000 iterations.  Haplotypes were determined at a confidence probability of ≥ 95%.  All indels were excluded from   55 subsequent analyses.  Following haplotype phasing, the number of haplotypes (H) and haplotype diversity (Hd, Nei 1987) for each locus were estimated using DnaSP. The square of the correlation coefficient between pairs of polymorphic sites (r2; Weir 1990) at the candidate loci were obtained using DnaSP and statistical significance of r2 values was also determined using chi-squared test and Bonferroni corrections.  The LD decay with physical distance in base pairs (bp) pooled across loci was assessed between polymorphic sites using nonlinear regression analysis of r2 values (Remington et al. 2001; Ingvarsson 2005a). Linkage disequilibrium for individual loci was not reported.  The expected value of r2 under drift-recombination equilibrium is E(r2) = 1/(1+ρ), where ρ = 4Nc is the scaled recombination rate for the gene region, N is the effective population size, and c is the recombination rate between sites.  When a low mutation rate is assumed, the expected decay of LD becomes         ++ +++ +      ++ + = )11)(2( )1212)(3(1)11)(2( 10)( 2 2 ρρ ρρρ ρρ ρ n rE      (1) (Hill and Weir 1988), where n is number of sampled sequences.  Equation (1) was fitted using pooled r2 with PROCNLIN procedure of SAS software (version 9.1, SAS Institute, Inc., Cary, NC, www.sas.com).  The nonlinear regression yields least-square estimates of ρ per base pair. Although this estimate may not be precise due to lack of independence between linked sites (Remington et al. 2001), it is nonetheless useful to illustrate the rate of LD decay with physical distance. 3.3 Results 3.3.1 Nucleotide diversity The studied loci yielded a total of 4,781 bp; including 3,511 bp representing coding regions (Table 3.4).  The length of analyzed regions varied between 212 and 747 bp, with only four loci (CI-1, Dhn, GI, and Gapdh) containing noncoding regions.  Across the 10 loci, a total of 185 SNPs were identified, corresponding to one SNP per 26 base pairs.  The number of SNPs per locus varied among loci and ranged between 4 and 30, with higher SNP/bp ratio detected at defence and stress response genes (~1SNP/21 bp) as compared to photoperodic and housekeeping genes (~1SNP/37 bp) (Table 3.4).  Total nucleotide diversity (πT) yielded an average of 3.69 x 10-3 (range: 0.44 x 10-3-12.93 x 10-3), while θw averaged 10.02 x 10-3 (range: 2.86 x 10-3-18.36 x 10-3), with higher nucleotide diversity attributable to defence and stress response loci compared to the remaining loci (Table 3.5).  The average nucleotide diversity at   56 synonymous sites, πs, (3.12 x 10-3) was higher than that at nonsynonymous sites, πa, (2.45 x 10-3) and this pattern was observed for TI-3, TI-4, CI-1, and Gapdh while this trend was reversed for WIN3, PAL, Chi, Dhn, and PHYB2.  In addition, GI showed no diversity at both synonymous and nonsynonymous sites (πs = πa = 0.0).  Noncoding nucleotide diversity, πnc, obtained only for CI- 1, Dhn, GI, and Gapdh averaged 4.06 x 10-3, with Dhn showing the lowest πnc among the studied loci (Table 3.5). 3.3.2 Deviation from neutrality With the exception of TI-3, all loci showed deviation from neutrality as indicated by significant negative Tajima’s D; however, when P. tremula was used as an outgroup, only WIN3 showed deviation from neutral expectation as indicated by significant HFay&Wu.  Moreover, only TI-4, WIN3, Chi, and Dhn had negative values for both D and H statistics (Table 3.5). In the present study MK test (McDonald and Kreitman 1991) was used to check the null hypothesis that the studied loci follow neutral expectation by comparing the pattern of protein polymorphisms and divergence at each locus.  Under the neutral model of molecular evolution (Kimura 1983), only neutral mutations contribute both to variation within and divergence among species.  Purifying selection will slightly eliminate deleterious mutations from the populations, and hence these mutations are less likely to contribute to the divergence between species and polymorphisms within species (Duret 2008).  Consequently, the ratio of replacement to synonymous polymorphisms will be equal to the ratio of replacements to synonymous fixed difference (McDonald and Kreitman 1991).  Among the studied loci, TI-3, CI-1, Dhn, and Gapdh showed higher synonymous compared to replacement divergence yielding Ks > Ka (Table 3.6).  On the other hand, TI-4, WIN3, PAL, and PHYB2 showed an opposite pattern where the replacement divergence was higher compared to silent sites divergence.  Moreover, only Chi showed similar pattern of synonymous versus nonsynonymous divergence.  Additionally, synonymous or replacement differences within or between species at GI was not observed (data not shown).  The MK test showed no significant signs of adaptive evolution at any of the studied loci.  Surprisingly, no divergence was observed between species at WIN3 although it produced the highest number of fixed replacements.  The neutrality index (NI) varied among loci with a range of 0.5-2.6 (Table 3.6).  For the studied loci, NI was not defined for TI-4 and PAL, only Chi and Dhn had NI > 1whereas the remaining loci had NI < 1; nevertheless, none of the neutrality index values were significant.  Furthermore, no deviation from neutrality was observed at the   57 multilocus level as indicated by the non-significant HKA test (X2 = 6.98, d.f. = 9, P = 0.64), suggesting that the null hypothesis of neutral evolution at these loci can not be rejected. 3.3.3 Haplotype structure and linkage disequilibrium (LD) decay Strong haplotype structure was observed at the studied loci, in particular at defence and stress response loci, with an average of 31 haplotypes per locus (Table 3.5).  The number of haplotypes (H) varied remarkably among loci with the highest number (139) being observed for TI-3 as compared to that for PAL (5).  Surprisingly, the housekeeping gene Gapdh had 37 haplotypes, comparable to the stress response locus Dhn with 31 haplotypes.  In addition, haplotype diversity (Hd) followed the same trend and averaged 47% with the highest Hd observed for TI-3; 98% as compared to the other studied loci.  The housekeeping locus Gapdh showed moderate diversity; 76% compared to other loci.  Considering pooled data across the ten loci, the non-linear regression model for LD (r2) with distance (bp) showed a rapid decay of LD, with r2 declins to ~0.18 within < 700 bp (Figure 3.2). 3.4 Discussion 3.4.1 Nucleotide diversity The level of nucleotide diversity per site observed in this study (θw = 0.01) was approximately three- to five-fold higher than that reported for P. balsamifera (Breen et al. 2009; Olson et al. 2010); however, the estimate of θw in P. trichocarpa was slightly lower than that of P. tremula (Ingvarsson 2005a; θw = 0.016).  The studied loci are involved in various metabolic pathways associated with defence, stress response, photoperiodism, and glycolysis, and exhibited a remarkable range of nucleotide diversity per site, ranging from 0.0028 to 0.0183 (Table 3.5). The substantial variability at the defence and stress response gene loci in the present study, excluding PAL, might be indicative of their faster evolution rate (higher substitution rate).  On the other hand, the high nucleotide diversity observed at defence and stress response genes is probably due to the past genome duplication events in Populus.  These duplication events are expected to affect the genome-wide level of nucleotide diversity (Tuskan et al. 2006); however, the outcome of duplication events tends to be more pronounced at the stress-regulated genes (Zou et al. 2009). The PHYB2 and Gapdh exhibited relatively low SNP frequency at the coding region. This pattern is not surprising since these genes are highly conserved and might be undergoing a   58 slower evolution rate, a pattern previously associted with light harvesting (Jansson 1999) and housekeeping genes (Zhang and Li 2004).  More interestingly, the observed low variability (hence slower evolution) at PHYB2 in P. trichocarpa was also reported for the same gene in P. tremula where the authors also pointed out that approximately 80 kb within and around this gene is evolving according to neutral evolution (Ingvarsson et al. 2008). The present study demonstrated that P. trichocarpa natural populations, on average, harboured moderate levels of nucleotide diversity (πT = 0.0037), similar to that reported for an Alaskan population of the same species (N = 15) (Breen et al. 2009; πT = 0.0032), but higher than that reported for a sample from the species range (N = 41) (Gilchrist et al. 2006; πT = 0.0018).  In comparison with other poplar species, the level of nucleotide diversity observed in the present study was slightly higher than that reported for P. balsamifera (Olson et al. 2010; πT = 0.0027) but similar to that obtained for P. tremula (Ingvarsson 2008; πT = 0.0042).  These differences are not surprising as nucleotide diversity is highly affected by population history and the loci under investigation (Ingvarsson 2005b).  Additionally, population demographic history has a significant impact on shaping species genetic diversity providing distinct genetic background among species within genus. In the Pacific Northwest, P. trichocarpa has probably undergone severe bottlenecks followed by population expansion and recolonization after Pleistocene glaciations as determined using nuclear and chloroplast microsatellite markers (Chapter 2).  A similar pattern has also been observed in P. balsamifera (Breen et al. 2009; Olson et al. 2010), a species with the same life history as P. trichocarpa. Total nucleotide diversity calculated from pair-wise differences (πT) ranged from 0.0004 to 0.013 for PAL and TI-3, respectively.  Conversely, nucleotide diversity calculated from the number of segregating sites (θw) ranged between 0.0028 and 0.018 for PAL and TI-4, respectively, showing that there is significant variation among the studied loci, results similar to that reported in other related species, such as P. tremula (Ingvarsson 2005b; πT = 0.0059-0.0147). The observed deference between πT and θw in the present study was also similar to those reported in other angiosperms including Helianthus annuus (Liu and Burke 2006; πT = 0.002-0.022 and θw = 0.003-0.027) and Persea americana (Chen et al. 2008; πT = 0.006-0.0123 and θw = 0.004- 0.109).  The differences in total nucleotide diversity calculated from pair-wise differences (πT) and that from the number of segregating sites (θw) may be indicative of a possible role for   59 demographic history, as suggested by the presence of two main maternal cpDNA haplotypes groups in P. trichocarpa (Chapter 2). Synonymous site diversity (πS) varied to the same extent as total nucleotide diversity, with an average of 0.0031 and ranging between 0.0 and 0.015.  The average πS observed in the present study was lower than that reported in P. tremula (Ingvarsson 2005a; πS = 0.022; Ingvarsson 2008; πS = 0.012) and P. nigra (Chu et al. 2009; πS = 0.0106).  On the other hand, the average nonsynonymous nucleotide diversity (πa) was lower than that of synonymous nucleotide diversity (πS), varied across loci with an average of 0.002, and ranged between 0.0 and 0.011 for GI and TI-3, respectively.  In the present study P. trichocarpa had lower πa (0.0025) as compared to P. tremula (Ingvarsson 2005a; πa = 0.0059) and P. nigra (Chu et al. 2009; πa = 0.0045). The level of polymorphism varies extensively among genomic regions, sampling schemes, and species; accordingly, the polymorphism comparisons among species will be more informative when many orthologous loci are included (Krutovsky and Neal 2005).  Additionally, nucleotide diversity is usually affected by substitution rate, selection, population size, and history, thus the use of locus-specific rates is preferable to species-specific averages for explaining the diversity estimates (Gilchrist et al. 2006). The present study as well as others cited here have all shown that nucleotide diversity estimates very substantially among different genes and call for a cautious approach when comparisons of nucleotide diversities are made among and/or within species using one or few loci (Ingvarsson 2005b). 3.4.2 Deviation from neutrality With the exception of TI-3, most of the studied loci showed large negative D values, suggesting an excess of low-frequency mutations (Table 3.5).  However, an excess of high- frequency-derived variants was apparent at TI-3, TI-4, WIN3, Chi, and Dhn, as indicated by negative values of Fay and Wu’s H.  The observed low- and high- frequency derived mutations at the studied loci were in agreement with those previously reported in P. tremula (Ingvarsson et al. 2008) and P. nigra (Chu et al. 2009) and are indicative of two possible scenarios: 1) the studied loci are mainly under selection and/or 2) the populations have undergone a recent expansion as suggested by negative H at five of the studied loci (Table 3.5).  Although discriminating between these two scenarios is very difficult, it seems that recent population expansion has played the major role in shaping the observed pattern of nucleotide diversity.  P. trichocarpa populations of BC are thought to comprise multiple genetic backgrounds with   60 populations originating from two different refugia following species expansion after the last glaciation (Chapter 2).  However, a role of purifying selection cannot be excluded as a possible explanation for the observed pattern, which was also apparent from lower average πa as compared to πS.  A similar pattern of negative Tajimas’D across four Kunitz trypsin inhibitor genes was also observed in P. tremula (Ingvarsson 2005b) where the author suggested that the strength of population subdivision might partially mask the effect of selection. To further investigate if the observed departure from neutrality is due to selection and/or population history, Fay and Wu’s H was obtained considering outgroup data.  Although negative values of Tajima's D alone might indicate low and/or high frequency polymorphism, a negative D and H for TI-4, WIN3, and Dhn indicates the presence of excess frequency derived polymorphisms which could be the result of population structure or positive selection (Przeworski 2002).  It should also be noted that only WIN3 showed this pattern, as indicated by significant H (p < 0.001), rejecting the null hypothesis of neutral evolution at this locus.  The lack of significant H combined with significant D at the studied loci (Table 3.5) favour the demographic expansion, and this most likely explains the observed negative D at the majority of the loci.  A similar pattern of negative D at different loci was also observed in Arabidopsis thaliana, another species that has recently undergone population expansion (Beck et al. 2008). The observed negative D and positive H for CI-1, PAL, PHYB2, GI, and Gapdh indicate deficiency of moderate- and high-frequency variants, as well as an excess of low-frequency polymorphisms, providing information on the demographic history of P. trichocarpa, which is characterized by population expansion and the re-colonization of habitat.  This observation was also supported in a parallel study on the same populations using chloroplast and nuclear microsatellites (Chapter 2). With the exception of TI-4, divergence at synonymous sites (Ks) of defence and stress response genes was slightly lower than that at replacement sites (Ka).  In addition, the photoperiod gene PHYB2 showed higher divergence at replacement compared to synonymous sites.  Gapdh showed no divergence at synonymous sites and only negligible divergence at replacement sites, suggesting that Gapdh, as a housekeeping gene, tends to evolve slowly compared to tissue-specific genes (Zhang and Li 2004).  Purifying selection is a likely cause for the divergence pattern observed for TI-3, CI-1, Dhn, and Gapdh, as they had Ka < Ks.  These results are in agreement with previous reports in P. tremula where TI-3, CI-1, and Gapdh showed lower nonsynonymous divergence as compared to synonymous divergence (Ingvarsson 2005a).   61 On the other hand, WIN3, PAL, and PHYB2 showed signs of long term adaptive evolution as pointed out by Ka > Ks, indicating that nonsynonymous mutations exist in high frequency and might be associated with fitness advantages compared to synonymous mutations.  This situation is noticeable when defence genes co-evolve with parasites (e.g., positive selection, Hurst 2002). Similar scenarios might also arise when the genes evolve under no functional constraints.  Given the observed low levels of divergence, it is difficult to distinguish between these two hypotheses. Although P. trichocarpa showed an excess of nonsynonymous polymorphisms that were fixed between species as compared to those segregating within species, this excess was not significant (Table 3.6), consistent with the null hypothesis of neutral evolution.  Furthermore, the neutral evolution hypothesis also could not be rejected at the multilocus level where the multilocus HKA test provided more support for non-adaptive evolution at the studied loci. The neutral pattern observed at the studied loci is not surprising, particularly when these loci previously have been found to have diverse responses to selection pressure at the inter and intra-specific level in other Populus species (P. tremula: Ingvarsson 2005, 2008, P. balsamifera: Neiman et al. 2009).  The shared pattern of neutrality observed across the studied loci, despite locus-specific characteristics, is indicative of a shared force that affects all loci to a similar extent.  The post-glacial colonization of P. trichocarpa is in part possibly responsible for the neutral pattern at the studied loci.  Post-glacial migration probably allowed a few founder populations to survive, which has led to small effective population sizes.  Selective forces such as codon bias previously reported in P. tremula (Ingvarssone 2008) are expected to be more pronounced in species with smaller effective population size (e.g., P. trichocarpa).  More interestingly, in a recent study on P. balsamifera across its native North American range using 590 gene fragments, Olson et al. (2010) suggested that the species past demographic history has led to lower effective population size compared to P. tremula and hence weaker effect of purifying selection in the former. 3.4.3 Haplotypes and linkage disequilibrium (LD) The distribution of the number of haplotypes among loci indicates the presence of strong haplotype structure, mainly attributable to the defence loci (Table 3.5).  The majority of haplotypes were obtained from inducible defence and stress response loci (~76% of the total number) as compared to the photoperiod and housekeeping loci (which accounted for approximately 11 and 12%, respectively.  The observed pattern of haplotype structure at defence loci was also prominent in the protease inhibitor genes in other Populus species, suggesting that   62 defence genes are evolving rapidly (Ingvarsson 2005b).  More interestingly, the results of the elevated haplotypes structure are in line with previous reports in P. balsamifera on the same set of induced defence loci that surveyed in the present study (i.e., TI-3, TI-4, WIN3, and CI-1; Neiman et al. (2009)).  This is also supported by the high diversity at the biochemical and functional level observed for the Kunitz trypsin inhibitor gene family in Populus (Major and Constable 2008).  Consequently, defence genes are likely to harbour high polymorphism and most likely have undergone recombination resulting in the observed pattern of high recombination previously found in disease resistance genes (Ellis et al. 2000; Rose et al. 2004). On the other hand, the unusual pattern of elevated number of haplotypes in TI-3 as compared to other defence-related loci might indicate recent allele duplication that has led to many paralogous copies (Neiman et al. 2009). Linkage disequilibrium, r2, pooled across loci decayed rapidly with distance.  In general, average linkage disequilibrium declined to < 0.18 within less than 700 bp.  This observation is consistent with earlier estimates for P. trichocarpa and P. nigra (Gilchrist et al. 2006; Chu et al. 2009).  Surprisingly, the observed LD decay in P. trichocarpa was in remarkable contrast with that reported in P. balsamifera where the LD showed no decay within 750 bp (Olson et al. 2010). Although both species are considered as two subspecies (Brayshaw 1965), the stark differences in the LD decay between P. trichocarpa in the current study and that previously reported for P. balsamifera in Olsen et al. (2010) might in part due to the differences between gene loci assayed. The observed rate of LD decay in the present study is slightly slower than that reported in P. tremula (Ingvarsson 2005b) which probably reflects a higher recombination rate combined with larger effective population size in P. tremula (Ingvarsson 2008) compared to P. trichocarpa. Although the decay of LD with physical distance varies significantly across species (Nordborg 2000), an obligately outbreeding species like P. trichocarpa is expected to show rapid LD decay similar to that reported in other outbreeding species (e.g., maize: Guillet-Claude et al. 2004; sunflower: Liu and Buker 2006).  In gymnosperm, however, LD decays over scales of kilobases (Pinus taeda: Brown et al. 2004).  On the other hand, selfing species show very low rates of LD decay, with significant LD extending several hundred kilobases (Oryza sativa: Garris et al. 2003; Hordium vulgare: Caldwell et al. 2006).  Thus, the decay of LD differs considerably among species according to factors such as effective population size, mating system, admixture, and demographic history.  Among the previously mentioned factors that influence the rate of LD   63 decay, its obligately outcrossing nature, high recombination rate, and demographic history provide the most likely explanations for the observed fast decline of LD in the sampled loci. 3.5 Conclusions The present study contributes to the growing body of knowledge on naturally occurring levels of nucleotide diversity and linkage disequilibrium in Populus.  Different populations across the species range in British Columbia were sampled in order to assess P. trichocarpa natural variability.  Although the genome-wide nucleotide diversity, in P. trichocarpa is still unknown, P. trichocarpa has moderate nucleotide diversity similar to P. balsamifera, and comparable to other related Populus species, considering ten gene loci involved in different biological functions.  In addition, the majority of the studied loci show a deviation from neutrality, invoking different selection modes.  The neutral pattern observed at the studied loci is most likely due to range expansion; however, more loci and explicit demographic modeling is required to further support this claim.  As an obligate outbreeding species, P. trichocarpa has a level of LD that decays within a few hundred base pairs, a pattern consistent with previous observations in other Populus species; however, this rate is in contrast to that of P. balsamifera. The observed rate of LD decay and high SNP frequency will greatly influence the expected resolution of association studies.  Generally, this study illustrates that sequence data from natural populations can be used to infer different evolutionary forces that might shape species diversity, and calls for further consideration of an increased number of candidate genes with complementary investigations at a genome-wide scale.   64 Table 3.1 List of P. trichocarpa populations (number, code, and number of trees) and sample locations. No. Population code No. of trees Latitude Longitude 1 NKN 1 58º56′ 133º11′ 2 TAK 4 58º42′ 133º24′ 3 FKR 1 56º44′ 130º38′ 4 ISK 3 56º49′ 130º45′ 5 STK 2 57º02′ 131º47′ 6 IRV 2 56º48′ 129º41′ 7 TRT 1 55º39′ 128º48′ 8 NAS 3 56º34′ 129º49′ 9 CDR 1 55º01′ 128º20′ 10 EXC 2 54º25′ 129º26′ 11 HAZ 1 55º13′ 127º40′ 12 KTW 1 55º05′ 128º11′ 13 SKN 8 54º25′ 128º55′ 14 BUL 5 54º52′ 127º09′ 15 HIR 1 54º04′ 128º27′ 16 KTM 6 54º09′ 128º36′ 17 DEN 9 52º49′ 126º46′ 18 MCG 2 54º11′ 122º00′ 19 SHE 2 54º02′ 122º36′ 20 WLO 1 53º55′ 122º17′ 21 KIM 3 52º56′ 121º10′ 22 QBK 2 52º57′ 122º52′ 23 QCT 2 53º02′ 122º09′ 24 QFR 2 53º04′ 122º31′ 25 QLK 2 52º58′ 122º19′ 26 KLN 10 51º26′ 125º37′ 27 HOM 8 51º06′ 124º53′ 28 SKW 12 50º14′ 123º59′ 29 HAR 6 49º58′ 122º26′ 30 LIL 10 50º30′ 123º05′ 31 GLC 3 50º06′ 123º00′    65    Table 3.1 Continued. No. Population code No. of trees Latitude Longitude 32 SQM 6 49º58′ 123º18′ 33 BLC 2 49º50′ 125º11′ 34 CMB 2 49º57′ 125º15′ 35 CNY 2 49º40′ 125º04′ 36 FNY 5 49º31′ 124º51′ 37 SLM 6 50º15′ 125º50′ 38 WHT 2 50º08′ 126º03′ 39 CHW 7 49º06′ 121º39′ 40 HOP 2 49º25′ 121º30′ 41 HRS 4 49º15′ 121º54′ 42 MCM 4 49º11′ 122º35′ 43 MTS 1 49º07′ 122º20′ 44 NHT 4 49º59′ 121º42′ 45 VND 3 49º09′ 122º06′ 46 WEL 3 49º40′ 121º25′ 47 YAL 3 49º34′ 121º26′    66    Table 3.2 Description and location by linkage group (LG) of the ten P. trichocarpa nuclear loci used in the study. Gene Source Gene bank reference LG Functional role TI-3 P. trichocarpa x P. deltoides AY378088 XIX Wound-inducible Kunitz trypsin inhibitor (KTI). TI-4 P. trichocarpa x P. deltoides AY378089 IV Wound-inducible Kunitz trypsin inhibitor (KTI). Win3 P. trichocarpa X15516 X Wound-inducible Kunitz trypsin inhibitor (KTI). CI-1 P. tremula AJ842952 III Cysteine protease inhibitor. PALa P. trichocarpa EU603319 XVI Defence response, response to oxidative stress, and response to wounding. Endochitinase (Chi) P. trichocarpa BI139082 XIV Defence-related protein, stress response, and cellular sensing and response. Dehydrin (Dhn) P. tremula BU863852 IV Stress response. PHYB2 P. trichocarpa AF309807 X Photoreceptor, sensor molecule and signal transduction. GIGANTEA (GI) P. tremula x P. tremuloides BU833698 V Response to blue light, regulation of circadian rhythm, response to cold, flower development, positive regulation of long-day photoperiodism, flowering and temperature compensation of the circadian clock. Gapdhb P. tremula AJ843622 X Glycolysis and glycogenesis. aPhenylalanine ammonia-lyase bGlyceraldehyde-3-phosphate dehydrogenase   67    Table 3.3 Primer sequence, annealing temperature, and product size for ten P. trichocarpa nuclear loci. Locus Primer sequences (5'-3') Taa Product size (bp) TI-3 F: ATCGATGTCTTCGGTGA R: AGAAGCTCTATCGGATGGTA 45.0 474 TI-4 F: CTTGACATTCAGGGCGAA R: AACCACGAAAGGTGA 45.0 509 WIN3 F: CGATTTCTACGGTCGTGA R: CGCATCCGGTTTAAACCTA 58.5 465 CI-1 F: TCCCCACAAATATATAAATACG R: TTACAAAGCTTGTAAATTACAC 58.5 802 PAL F: TGGATTGCCATCAAATCTCA R: CTCTTGCGCTCTCAACCTCT 61.0 605 Chi F: GGAGATGATTCGGAGAAGACA R: AAGCAGCAAGCTCCTTCATC 61.0 286 Dhn F: ACTGCCATGATGAGCGAAGATG R: GGTGTGTACCTCAGCGGTCT 58.5 540 PHYB2 F: ATGGCATCACAATCACAAAGG R: GACAACCTCACCATGCTCGTCC 65.5 797 GI F: TAACATGGGAAGCCCATAGC R: CTGAATGGGAAAAAGGGTCA 61.0 619 Gapdh F: GATAGATTTGGAATTGTTGAGG R: AAGCAATTCCAGCCTTGG 58.5 809 aAnnealing temperature (ºC).   68    Table 3.4 Length of regions analyzed (bp) and SNPs discovered in ten P. trichocarpa nuclear loci. Gene N Total length (bp)a  Coding region (bp) Total SNP  Coding SNP TI-3 128 464 464 23 23 TI-4 112 459 459 20 20 WIN3 114 434 434 26 26 CI-1 144 212 172 10 6 PAL 71 548 548 4 4 Chi 87 235 235 20 20 Dhn 113 495 193 30 12 PHYB2 63 639 639 12 12 GI 58 548 26 15 1 Gapdh 83 747 341 25 4 Total  4781 3511 185 128 N, number of trees. aIncluding indels.    69    Table 3.5 Summary statistics of sequence variation (x10-3) and deviation from neutrality in ten P. trichocarpa nuclear loci. Gene H Hd  πT  θw  πs πa πnc Tajima’s D HFay & Wu TI-3 139 0.984 12.93 9.14 14.65 10.70 - 1.065 -1.601 TI-4 19 0.324 5.51 18.36 6.86 3.89 - -1.870*** -1.075 WIN3 19 0.223 1.47 10.46 0.54 1.49 - -2.363*** -3.414*** CI-1 12 0.343 2.76 9.66 1.87 1.04 5.47 -1.602 *** 0.141 PAL 5 0.096 0.44 2.86 0.00 0.37 - -1.584* 0.109 Chi 16 0.247 3.05 17.18 1.54 2.30 - -2.239*** -1.274 Dhn 31 0.573 2.71 11.72 1.19 1.92 2.99 -2.153*** -0.891 PHYB2 19 0.647 1.93 4.19 0.91 2.21 - -1.384* 0.33 GI 16 0.484 3.62 10.59 0.00 0.00 3.62 -1.791** 0.867 Gapdh 37 0.765 2.47 6.04 3.61 0.61 4.14 -1.668** 0.845 Average 31.3 0.469 3.69 10.02 3.12 2.45 4.06 H, number of haplotypes; Hd, haplotype diversity; πT, total nucleotide diversity; θw, nucleotide diversity per site; πs, synonymous nucleotide diversity; πa, non synonymous nucleotide diversity; πnc non coding nucleotide diversity. *p < 0.05, **p < 0.01, ***p < 0.001.    70    Table 3.6 Sequence divergence and McDonald-Kreitman (MK) tests for deviations from neutrality at coding regions in nine P. trichocarpa locia. Synonymous Replacement Gene Ka Ks Fixed Poly Fixed Poly MK (P-value) NI TI-3 0.061 0.131 7 8 17 10 1.042 (0.307) 0.515 TI-4 0.041 0.003 0 3 3 12 - b  - WIN3 0.132 0.057 4 4 36 20 0.593 (0.441) 0.556 CI-1 0.033 0.040 1 2 3 5 0.017 (0.898) 0.833 PAL 0.021 0.000 0 1 4 2 - - Chi 0.023 0.023 1 2 2 6 0.075 (0.785) 1.5 Dhn 0.022 0.039 1 3 1 8 0.385 (0.535) 2.667 PHYB2 0.014 0.010 1 2 5 7 0.071 (0.790) 0.7 Gapdh 0.008 0.027 2 4 2 3 0.052 (0.819) 0.750 Ka, nonsynonymous divergence; Ks, synonymous divergence; NI, neutrality index. aGI locus results not shown. bNot defined.    71    Figure 3.1 Geographic location of P. trichocarpa populations from British Columbia used in the study.  Numbers represent population codes in Table 3.1.    72    Figure 3.2 Scatter plot of squared allele frequency (r2) versus distance in base pairs (bp) pooled across ten nuclear loci in P. trichocarpa.  The fitted curve is based on nonlinear regression of r2 against distance using equation (1).     73 3.6 References Auge, G. A., S. Perelman, C. D. Crocco, et al. 2009. Gene expression analysis of light-modulated germination in tomato seeds. New Phytol 183:301-314. Beck, J. B., H. Schmuths, and B. A. Schaal. 2008. Native range genetic variation in Arabidopsis thaliana is strongly geographically structured and reflects Pleistocene glacial dynamics. Mol Ecol 17:902-915 Breen, A. L., E. Glenn, A. Yeager, et al. 2009. Nucleotide diversity among natural populations of a North American poplar (Populus balsamifera, Salicaceae). New Phytol 3:763-773. Brown, G. R., G. P. Gill, R. J. Kuntz, et al. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. PNAS 101:15255-15260. Caldwell, K. S., J. Russell, P. Langridge, et al. 2006. Extreme population-dependent linkage disequilibrium detected in an inbreeding plant species, Hordeum vulgare. Genetics 172:557-567. Chen, H., P. L. Morrell, M. del la Cruz, et al. 2008. Nucleotide diversity and linkage disequilibrium in wild avocado (Persea americana Mill.) J Hered 99: 382-389. Chu, Y., X. Su, Q. Huang, et al. 2009. Patterns of DNA sequence variation at candidate gene loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms. Genetica 137:141-150. Doyle, J. J., and J. L. Doyle. 1987. A rapid DNA isolation procedure from small quantities of fresh leaf tissues. Photochem Bull 19:11-15. Duret, L. 2008. Neutral theory: The null hypothesis of molecular evolution. Nature Education vol. 1 no.1. Ellis, J., P. Dodds, and T. Pryor. 2000. Structure, function and evolution of plant disease resistance genes. Curr Opin Plant Biol 3:278-284. Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186-194. Fay, J., and C-I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405- 1413. Garris, A. J., S. R. McCouch, and S. Kresovich. 2003. Population structure and its effect on haplotype diversity and linkage disequilibrium surrounding the xa5 locus of rice (Oryza sativa L.). Genetics 165: 759-769.   74 Gilchrist, E. J., G. W. Haughn, C. C. Ying, et al. 2006. Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol 15:13671378. Guillet-Claude, C., C. Birolleau-Touchard, P. Manicacci, et al. 2004. Nucleotide diversity of the ZmPox3 maize peroxidase gene: Relationships between a MITE insertion in exon 2 and variation in forage maize digestibility. BMC Genetics 5:19. Hamrick, J. L., M. J. W. Godt, and L. Sherman-Broyles. 1992. Factors influencing levels of genetic diversity in woody plant species. New Forests 6:95-124. Hill, W. G., and B. S. Weir. 1988. Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol 33:54-78. Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159. Hurst, L. D. 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet 18:486-487. Ingvarsson, P. K. 2005a. Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus termula L., Salicaceae). Genetics 169:945-953. Ingvarsson, P. K. 2005b. Molecular population genetics of herbivore-induced protease inhibitor genes in European aspen (Populus tremula L., Salicaceae). Mol Biol Evol 22:1802-1812. Ingvarsson, P. K. 2008. Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180:329-340. Ingvarsson, P. K., M. V. Garcia, D. Hall, et al. 2006. Clinal variation in phyB2, a candidate gene for day-length-induced growth cessation and bud set, across a latitudinal gradient in European aspen (Populus tremula). Genetics 172:845-1853. Ingvarsson, P. K., M. V. Garcia, V. Luquez, et al. 2008. Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 locus in European aspen (Populus tremula, Salicaceae). Genetics 178:2217-2226. Jansson, S. 1999. A guide to the Lhc genes and their relatives in Arabidopsis. Trends Plant Sci 4:236-240. Jansson, S., and C. J. Douglas. 2007. Populus: A model system for plant biology Annu Rev Plant Biol 58:435-58.   75 Joseph, J. A., and C. Lexer. 2008. A set of novel DNA polymorphisms within candidate genes potentially involved in ecological divergence between Populus alba and P. tremula, two hybridising European forest trees. Mol Ecol Notes 8:188-192. Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge, Cambridge University Press. Krutovsky, K. V., and D. B. Neale. 2005. Nucleotide diversity and linkage disequilibrium in cold-hardiness- and wood quality-related candidate genes in Douglas fir. Genetics 171:2029-2041. Li, N., and M. Stephens. 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165:2213-2233. Liu, A. Z., and J. M. Burke. 2006. Patterns of nucleotide diversity in wild and cultivated sunflower. Genetics 173:321-330. Lynch, M., and T. J. Crease. 1990. The analysis of population survey data on DNA sequence variation. Mol Biol Evol 7:377-394. Major, I. T., and C. P. Constabel. 2008. Functional analysis of the Kunitz trypsin inhibitor family in poplar reveals biochemical diversity and multiplicity in defense against herbivores. Plant Physiol 146:888-903. McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654. Nei, M. 1987. Molecular Evolutionary Genetics. Columbia Univ. Press, New York. Neiman, M., M. Olson M, and P. Tiffin. 2009. Selective histories of protease inhibitors: elevated polymorphism, purifying selection, and positive selection driving divergence of recent duplicates. New Phytol 183:740-750. Nordborg, M. 2000. Linkage disequilibrium, gene trees and selfing: an ancestral recombination graph with partial self-fertilization. Genetics 154:923-929. Olson, M. S., A. L. Robertson, N. Takebayashi, et al. 2010. Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytol 186:526-536. Przeworski, M. 2002. The signature of positive selection at randomly chosen loci. Genetics 160:1179-1189. Rae, A. M., M. P. C. Pinel, C. Bastien, et al. 2008. QTL for yield in bioenergy populus: Identifying G x E interactions from growth at three contrasting sites. Tree Genet Genom 4:97-112.   76 Rand, D. M, and L. M. Kann. 1996. Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans Mol Biol Evol 13:735-748. Remington, D. L., J. M. Thornsberry, Y. Matsuoka, et al. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. PNAS 98:11479- 11484. Rose, L. E., P. Bittner-Eddy, C. H. Langley, et al. 2004. The maintenance of extreme amino acid diversity at the disease resistance gene, RPP13, in Arabidopsis thaliana. Genetics 166:1517-1527. Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, et al. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other analyses. Bioinformatics 19:2496-2497. Rozen, S, and H. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132:365-386. Stephens, M., and P. Scheet. 2005. Accounting for decay of linkage disequilibrium in haplotype inference and missing data imputation. Am J Hum Genet 76:449-462. Stephens, M., N. Smith, and P. Donnelly. 2001. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978-989. Strand, A. E., J. Leebens-Mack, and G. B. Milligan. 1997. Nuclear-DNA-based markers for plant evolutionary biology. Mol Ecol 6:113-118. Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595. Tuskan, G. A., S. DiFazio, S. Jansson, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596-1604. Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7:256-276. Weir, B. S. 1990. Genetic data analysis: methods for discrete population genetic data. Sinauer Associates, Sunderland. Zhang, L., and W. H. Li. 2004. Mammalian housekeeping genes evolve more slowly than tissue- specific genes. Mol Biol Evol 21:236-239. Zhu, Q., X. Zheng, J. Luo, et al. 2007. Multilocus analysis of nucleotide variation of Oryza sativa and its wild relatives: severe bottleneck during domestication of rice. Mol Biol Evol 24:875-888.   77 Zou, C., M. D. Lehti-Shiu, M. Thomashow, et al. 2009. Evolution of stress-regulated gene expression in duplicate genes of Arabidopsis thaliana. PLoS Genet 5:e1000581.    78 Chapter 4. Comparative nucleotide diversity across North American and European Populus species1 4.1 Introduction Comparative genomics has been applied to several model organisms (e.g., Brassica: Kowalski et al. 1994; Arabidopsis: Wright 2003; rice: Jianxin and Bennetzen 2004).  Genome comparison is an effective approach in addressing many biological and evolutionary questions at different hierarchical levels, such as among and within genera.  Accordingly, it is of interest to utilize this approach to understand the different evolutionary forces acting on orthologous genes. In closely related species, comparative genomics sheds light on the similarities and differences in genomic regions, whereas the comparison of distantly related species might reveal sequences underlining different selection mechanisms (Hardison 2004).  For instance, in species that diverged less than 20 million years ago (e.g., Populus, Eckenwalder et al. 1996) comparative genomics may explain the early evolutionary forces that shape related species and highlight the molecular changes associated with ancestral and/or newly derived changes within a single clade (Jackson et al. 2006). From an evolutionary perspective, comparisons among closely related species could be limited because a lack of sequence variation among recently diverged species, which limits our ability to compare conserved and functional regions (Boffelli et al. 2003).  Therfore, the inclusion of distantly related species in genomic comparison studies is expected to provide a broader understanding of sequence divergence. Populus is a widely distributed tree comprising ~29 species falling into six sections (Eckenwalder et al. 1996).  In North America, the section Tacamahaca includes two closely related species that are considered as sister species (Populus trichocarpa Torr. & Gray and Populus balsamifera L., Eckenwalder et al. 1996).  Both taxa are still undergoing divergence as supported by phylogenetic studies using morphological traits (Eckenwalder 1996) and nuclear markers (Hamzeh and Dayanadan 2004).  Some authers consider them a single species (Brayshaw 1965).  The limited fossil record for poplars makes it difficult to estimate the time of ______________________________ 1 A version of this chapter will be submitted for publication.  Ismail M., R. Y. Soolanayakanahally, P. K. Ingvarsson, R. D. Guy, S. Jansson, S. N. Silim, and Y. A. El-Kassaby. Comparative nucleotide diversity across North American and European Populus species.    79 divergence between different poplar species; however, fossils were used to determine the evolutionary pattern at the section level (Eckenwalder 1996).  In addition, most of the early fossil records that were thought to belong to poplars have been shown to belong to other taxa (Eckenwalder 1996).  Furthermore, the feasibility of using fossil record of highly similar species such as P. trichocarpa and P. balsamifera for evolutionary comparisons has been questioned (Eckenwalder 1977).  The relationships among and within different Populus sections have been inferred using molecular data (Hamzeh and Dayanandan 2004; Cervera et al. 2005; Ingvarsson 2010); however, these studies did not focus on the time of diversification.  The application of “molecular clock” calculations to estimate the time of divergence is widely used in different taxa, either with or without calibration (Hedges and Kumar 2003).  For example, Ingvarsson (2008) used the silent site divergence rate to estimate the time of bottleneck events in Populus tremula L.  Previous reports showed that P. trichocarpa and P. balsamifera usually form one clade isolated from Populus tremuloides Michx., the closest species to the Eurasian P. tremula (Eckenwalder 1996; Hamzeh and Dayanadan 2004; Hamzeh et al. 2006).  This indicates that the North American sister species probably diverged long ago from aspens, resulting in development of different genetic backgrounds and life history traits (Eckenwalder 1996).  In this study, we compare the level of nucleotide diversity among three Populus species; P. tremula, P. trichocarpa, and P. balsamifera which are distributed along similar latitudes (56-66 ºN, 32-62 ºN, and 42-68 °N, respectively) indicating their adaptability to a broad range of environments. The availability of plant materials from the three species, particularly from P. tremula, makes it possible to include them in the current study.  The three species exist in the Northern hemisphere and probably exposed to similar geoclimatic conditions, nontheless they exist in different continents.  However, independent studies (e.g.; Gilchrist et al. 2004; Ingvarsson et al. 2005; Olson et al. 2010) showed stark differences among the three species that might reflect different evolutionary history.  Most of the genetic diversity investigations conducted on these three species have focused on assessment of within-species genetic diversity (Gilchrist et al. 2004; Ingvarrson et al. 2005; Breen et al. 2009; Olson et al. 2010; Chapter 3) and the lack of orthologous genes has hindered among species comparisons.  Consequently, strong conclusions are not warranted in comparing the results from studies that used different targeted gene loci and/or a different number of genotypes. It is worthwhile to investigate sequence diversity and evolutionary patterns across species in a consistent set of loci that might be associated with their local adaptation.  Here I hypothesize   80 that the different evolutionary histories of these species have shaped their nucleotide diversity in spite of their similar latitudinal climatic conditions.  Across species, comparison was conducted using the same set of loci that are known to play a role in defence, housekeeping, stress response, photoperiodism, and freezing tolerance to address the following questions: 1. What is the extent of nucleotide diversity within the studied loci for each speceis? 2. What is the approximate time to their most recent common ancestor (mrca)? 3. What is the rate of decay in linkage disequilibrium (LD) across the three species? 4.2 Materials and methods 4.2.1 Plant material and DNA extraction P. trichocarpa samples were collected from a range-wide replicated clonal trial established by the British Columbia Ministry of Forests and Range (Xie et al. 2009).  The sample consisted of five populations each represented by six clones (N = 30) originating from different latitudes west of the Rocky Mountains (Table 4.1).  Similarly, P. balsamifera samples represented by five populations each consisted of six clones (N = 30) originated from different latitudes east of the Rocky Mountains from the Agriculture Canada balsam poplar (AgCanBaP) collection (Table 4.1).  The P. tremula samples consisted of three populations with 10 clones per population (N = 30) from the Swedish Aspen (SwAsp) collection (Ingvarsson 2006) (Table 4.1). During winters of 2005 and 2006, dormant P. balsamifera and P. trichocarpa whips were collected and forced to flush in a growth chamber at 25°C and photoperiod of 16 hours per day. Newly flushed buds were collected and stored at -80°C.  DNA extraction followed the method described by Doyle and Doyle (1987).  P. tremula total genomic DNA was extracted from frozen leaf tissues using the DNeasy plant mini prep kit (Qiagen Inc. Valencia, CA). 4.2.2 Candidate gene selection and fragment amplification and sequencing The main focus was to study multiple loci associated with diverse biological functions that have putative roles in tree response to biotic and abiotic stress and growth.  The studied loci are well documented and characterized in different poplar species (Benedict et al. 2006; Major and Constable 2008; Auge et al. 2009) and were classified as 1) Defence genes ((I) Inducible defence genes: three wound-inducible Kunitz trypsin inhibitor (KTI) genes: TI-3, TI-4, and WIN3, (II) Cysteine protease inhibitor gene CI-1), 2) Stress response genes (Phenylalanine ammonia-lyase (PAL), Endochitinase (Chi), and Dehydrin (Dhn)), 3) Photoperiodism genes   81 (GIGANTEA (GI) and Phytochrome B2 (PHYB2)), and 4) freezing tolerance gene PtCBF2.  In addition to the previously mentioned tissue-specific genes, a housekeeping gene Glyceraldehyde- 3-phosphate dehydrogenase (Gapdh) was also surveyed (Table 4.2). Specific primers for TI-3, TI-4, WIN3, GI, and PtCBF2 were designed using the online program Primer 3.0 (Rozen and Skaletsky 2000) from published sequences of gene bank accession numbers AY378088, AY378089, X15516, BU833698, and DQ354395, respectively (Table 4.2).  Primers to amplify Dhn and PAL were obtained from Joseph and Lexer (2008).  Part of the Gapdh gene was amplified in the three species using primers GPDX7F and GPDX9R from Strand et al. (1997).  Finally, an unpublished primer pair was used to amplify part of PHYB2 (Par K. Ingvarsson, Umeå Plant Science Centre, Department of Ecology and Environmental Science, University of Umeå, unpublished).  Polymerase chain reaction (PCR) amplification was performed in 25 µL reaction volume consisting of 70 ng genomic DNA, 0.25 mM of dNTP, 4 µL reaction buffer containing: 50 mM KCl, 15 mM Tris-HCl and 2.5 mM MgCl2), 40 pmol each of forward and reverse primer, and one unit of AmpliTaq Gold DNA polymerase [Applied Bisoystems, Foster City, CA, USA (ABI)].  Amplifications were performed using a GeneAMP 9700 thermocycler (ABI) with the following PCR conditions: i) initial denaturation at 94°C for 3 min, ii) 30 cycles of 94°C for 30 s, [45, 58.5 or 61°C according to each primer pair, Table 4.2] for 30 s, and 72°C for 1 min, and iii) 4 min final extension at 72°C.  No amplicons were obtained for GI and PTCBF2 in P. tremula.  PCR products were then cleaned using Qiagen MinElute Kit (Qiagen, Valencia, CA, USA) before using in direct sequencing.  PCR products were sequenced in both directions using the same primers used for loci amplifications and Big Dye Terminator v3.1 (Applied Biosystems, Foster City, CA) on 3730x1 DNA analyzer Applied Biosystems (ABI). 4.2.3 Sequence analysis and SNP discovery Sequence alignment, editing, and SNP discovery were carried out using CodonCode Aligner 2.0.6 (CodonCode Corporation, Dedham, MA, USA).  Raw sequences were base-called using Phred quality score > 30 (Ewing and Green 1998), implemented in CodonCode Aligner. All chromatograms were visually checked for base calling errors.  Sequence ends were trimmed until the average Phred quality score was > 25 in a window of 10 bases.  Heterozygous SNPs were scored using the ‘call second peaks’ function in CodonCode Aligner with the minimum lower peak height set at 60% of the first peak and manually confirmed.  Consensus sequences were assembled using Phrap as implemented in CodonCode Aligner.  After construction of   82 consensus sequences, all polymorphic sites were verified against the original chromatograms. The sequences were then blasted against the P. trichocarpa genome sequence for verification using the JGI database (DoE Joint Genome Institute and Poplar Genome Consortium; http://genome.jgi-psf.org/Poptr1/Poptr1.home.html).  The same database was also used to identify exonic and intronic regions.  The sequences have been submitted to the European Molecular Biology Laboratory (EMBL)/GenBank nucleotide sequence database under accession numbers (FN669610 - FN669634). 4.2.4 Nucleotide diversity and deviation from neutrality Total nucleotide diversity was estimated as π (Lynch and Crease 1990) and θw (Watterson 1975).  Presence of non-neutral evolution for the studied loci was estimated according to Tajima’s D (Tajima 1989) and Fay and Wu's H, HFay&Wu (Fay and Wu 2000) using DnaSP 4.9 (Rozas et al. 2003).  Pairwise divergence was obtained among all three species (P. balsamifera vs. P. tremula, P. trichocarpa vs. P. tremula, and P. balsamifera vs. P. trichocarpa) at silent and replacement sites and was used in the McDonald-Kreitman (MK) test (McDonald and Kreitman 1991) and estimates of the neutrality index (NI; Rand and Kann 1996) as implemented in DnaSP. In order to investigate the deviation from neutrality across all loci, a multilocus Hudson- Kreitman-Aguadé test (HKA, Hudson et al. 1987) as implemented in the program HKA (http://lifesci.rutgers.edu/~heylab) was applied.  In this test, three different analyses were conducted: 1- P. tremula used as an outgroup with P. balsamifera sequence data, 2- P. trichocarpa used as an outgroup with P. balsamifera sequence data, and 3- P. tremula used as an outgroup with P. trichocarpa sequence data.  The GI and PTCBF2 loci were excluded from these analyses since no data were available for P. tremula (i.e.; the two loci failed to amplify in P. tremula). 4.2.5 Molecular clock and divergence time estimates In this analysis the concatenated sequences obtained from the consensus sequences across loci were used.  The concatenated sequence approach is expected to out-perform the use of individual loci in inferring diversification time estimates (e.g., Hedges and Kumar 2003; Battistuzzi et al. 2010).  Additionally, sequence information from different loci that evolved independently is expected to provide a species tree that is similar to the gene tree, particularly in closely related species (Pamilo and Nei 1988).  The concatenated sequence was obtained in each species using DnaSP.  Only the common shorter fragments amplified across species were   83 considered.  The concatenated sequences were then used to determine the most probable nucleotide substitution model using Findmodel software (http://www.hiv.lanl.gov/content/sequence/findmodel/findmodel.html), which implements different packages to obtain estimates for 28 or 12 different nucleotide substitution models (see the program website for details).  The selection of the best model that fits the data was based on Akaike information criterion (AIC) (Akaike 1974).  The model with lowest AIC is considered to be the best fit to the data.  Following model selection, the divergence times and substitution rates were estimated using the Bayesian approach implemented in BEAST 1.5.3 (Drummond and Rambaut 2007).  The xml input file of the concatenated sequences was obtained using BEAUTi 1.5.3 (Rambaut and Drummond 2007a).  Two different models were used to investigate the time- scale of diversification among the three Populus species; the relaxed molecular clock that assumes different evolutionary rates among lineages, and the strict molecular clock that assumes equal evolutionary rates among tree branches (Sanderson et al. 1997).  In both analyses, the Hasegawa, Kishino, and Yano (HKY) model of nucleotide substitution (Hasegawa et al. 1985) was used as suggested by Findmodel (see results).  To calibrate the molecular clock, a calibration point of 5 million years (Ma) ± standard deviation of 1.2 Ma was imposed at the node connecting P. trichocarpa and P. tremula.  The selection of this calibration point was based on a previously estimated time of divergence between P. tremula and P. trichocarpa (3.8-6.2 Ma; Ingvarsson 2005).  This calibration point was also used to estimate mean substitution rate per site per unit time (Ma).  The posterior distribution of parameter estimates was obtained by two independent runs of Markov Chain Monte Carlo simulation (MCMC), each consisting of 6 x 106 steps following a burn-in of 6 x 104 steps sampling every 300 generations.  For each analysis the convergence of MCMC was confirmed using Tracer 1.5 (Rambaut and Drummond 2007b). 4.2.6 Haplotype inference and analysis of linkage disequilibrium (LD) In each of the studied species, the haplotype phase for heterozygous alleles was inferred using PHASE 2.1.1 (Stephens et al. 2001; Stephens and Scheet 2005).  The default model (-MR0) was used, the general model for recombination rate variation (Li and Stephens 2003).  A separate PHASE run was performed for each gene fragment.  Each run had a burn-in of 100 followed by 10,000 iterations.  Haplotypes were determined at a confidence probability of ≥ 95%.  All indels were excluded from the analyses.  Following haplotype phasing, the number of haplotypes (H) and haplotype diversity (Hd; Nei 1987) for each locus were estimated using DnaSP.   84 The square of the correlation coefficient between pairs of segregating sites (r2; Weir 1990) at the studied loci in each species was obtained using DnaSP and statistical significance of the r2 values was determined using chi-squared test and Bonferroni corrections.  The LD decay, with physical distance in base pairs (bp) pooled across loci, was assessed between polymorphic sites using nonlinear regression analysis of r2 values (Remington et al. 2001; Ingvarsson 2005a). The expected value of r2 under drift-recombination equilibrium is E(r2) = 1/(1+ρ), where ρ = 4Nc is the scaled recombination rate for the gene region, N is the effective population size, and c is the recombination rate between sites.  When a low mutation rate is assumed, the expected decay of LD becomes         ++ +++ +      ++ + = )11)(2( )1212)(3(1)11)(2( 10)( 2 2 ρρ ρρρ ρρ ρ n rE      (1) (Hill and Weir 1988), where n is number of sampled sequences.  Equation (1) was fitted using pooled r2 with PROCNLIN procedure of SAS software (version 9.1, SAS Institute, Inc., Cary, NC, www.sas.com).  The nonlinear regression yields least-square estimates of ρ per base pair. Although this estimate may not be precise due to non-independence between linked sites (Remington et al. 2001), it is nonetheless still useful to illustrate the rate of LD decay with physical distance. 4.3 Results 4.3.1 Comparative nucleotide diversity in Populus A total of 4.4, 4.0, and 3.7 kilobases (kb) were sequenced across studied loci in P. trichocarpa, P. balsamifera, and P. tremula, respectively (Table 4.3).  The observed total number of SNPs ranged between 27 and 70 across species with the highest number observed for P. balsamifera.  The average nucleotide diversity (πT) was highest for P. tremula (7.52 x 10-3) followed by P. balsamifera (5.34 x 10-3) and P. trichocarpa (3 x 10-3) (Table 4.4).  A similar trend was observed for the nucleotide diversity per site (θw) with P. tremula showing the highest diversity (4.97 x 10-3), followed by P. balsamifera (4.48 x 10-3) and P. trichocarpa (2.51 x 10-3) (Table 4.4).  The Dhn locus had the highest nucleotide diversity (πT and θw) in P. tremula while GI nucleotide diversity was highest for P. balsamifera.  Nucleotide diversity for PHYB2 was consistently low across all three species (Table 4.4).   85 4.3.2 Neutrality tests Across the studied loci, TI-3 and WIN3 had slightly higher divergence at silent sites compared to the other loci in P. balsamifera when P. tremula was used as an outgroup; however, only TI-4 had Ka/Ks > 1 (Figure 4.1a).  Additionally, the MK test indicated that WIN3 has the highest level of replacement mutations (neutrality index; NI = 0.750, P < 0.034, Table 4.5).  On the other hand, Gapdh had only one fixed mutation at a replacement site.  Significant departure from neutrality was observed only for WIN3 as indicated by Fay and Wu’s H (-2.742, P < 0.05). As expected, the two North American species showed no signs of divergence at either synonymous or replacement sites (Figure 4.1b).  A similar divergence pattern was observed for WIN3 in P. trichocarpa when P. tremula was used as an outgroup (Figure 4.1c).  Among the studied loci, only TI-4 in P. trichocarpa showed deviation from neutrality as indicated by Tajima’s D (data not shown).  Additionally, no deviation from neutrality was observed at the multilocus level as indicated by the non-significant HKA test when P. tremula was used as an outgroup (X2 = 3.13, d.f. = 6, P = 0.80) or when P. trichocarpa was used as an outgroup (X2 = 3.40, d.f. = 6, P = 0.76). 4.3.3 Divergence time estimates The concatenated sequence for the cross-amplified loci in all three species was 2.779 kb obtained from TI-3, TI-4, WIN3, Dhn, Gapdh, PAL, and PHYB2.  Results from Findmodel using the AIC indicated that HKY (Hasegawa et al. 1985) is the most appropriate evolutionary substitution model.  Bayesian analysis of the estimated time to the most recent common ancestor (mrca) between P. tremula and both of P. trichocarpa and P. balsamifera was 4.5 Ma with a 95% confidence interval of 2.7-6.3 Ma which was consistent in relaxed and strict clock models. The estimated time to mrca between P. trichocarpa and P. balsamifera was slightly different among clock; 0.76 and 0.83 Ma for the relaxed and strict models with 95% confidence intervals of 0.15-2.52 and 0.39-1.35 Ma, respectively.  The estimated nucleotide substitution per Ma combined for the three species was 0.005 regardless of the model used; however, substitution rate varied among species.  P. tremula showed a 0.02 per Ma average substitution estimate whereas in the North American species substitution per Ma was 0.004; nonetheless, the estimated rates were consistent in both models.    86 4.3.4 Haplotype structure and linkage disequilibrium The number of haplotypes (H) ranged from 1-36 across all loci.  The total number of haplotypes varied among species and among loci within species.  In total, P. tremula had 83 haplotypes compared to 86 and 39 for P. balsamifera and P. trichocarpa, respectively (Table 4.6).  Haplotype diversity (Hd) varied substantially among species and ranged between 44-97, 0.0-96, and 0.0-88% for P. tremula, P. balsamifera, and P. trichocarpa, respectively (Table 4.6). Among the nine loci, the wound-inducible Kunitz trypsin inhibitor (TI-3) had haplotype diversity ≥ 85% across the three species, while no variation was observed in P. balsamifera for PHYB2 as compared to the other two species (Table 4.6).  Linkage disequilibrium pooled across loci varied among the three species and showed a similar pattern in the two North American species, which was different from that of P. tremula.  Across loci, the non-linear regression model for LD (r2) with distance (bp) showed rapid decay of LD in both P. trichocarpa and P. balsamifera with r2 declining to ~0.18 within < 400 bp, a situation different from that of P. tremula (Figure 4.2). 4.4 Discussion 4.4.1 Comparative nucleotide diversity in Populus Nucleotide diversity per site θw was higher in P. tremula (0.005) compared to the two North American species (0.0045, 0.0025 for P. balsamifera and P. trichocarpa, respectively). This finding is comparable to a previous report in P. tremula using a set of 77 genes (θw = 0.0048; Ingvarsson 2008).  Generally, compared to P. trichocarpa, P. balsamifera had two-fold higher nucleotide diversity (θw = 0.0045 vs. 0.0025).  The P. balsamifera finding is approximately two-fold greater than previously reported for the same species (θw = 0.0028; Breen et al. 2009; Olson et al. 2010).  The nucleotide diversity observed in P. trichocarpa is similar to that previously reported for this species in one Alaskan population (Breen et al. 2009; θw = 0.003, averaged across three gene loci) while higher than that observed for a subsample from the species range using nine loci (Gilchrist et al. 2006).  Among the studied species, P. tremula harboured the highest nucleotide diversity and this might indicate possible admixture from related species in native range (Lexer et al. 2005) or experiencing less bottlenecks. Moreover, the demographic histories of population expansion and contraction might have played another role in this pattern, leading to highly diverged P. tremula populations (Ingvarsson 2008), while the North American P. trichocarpa may still be undergoing range contraction and expansion following the ice sheet retreat (Chapter 2), indicating the presence of fewer founder   87 populations.  This discrepancy was also prominent in Arabidopsis where the European Arabidopsis lyrata ssp. petraea had higher diversity compared to the North American Arabidopsis lyrata ssp. lyrata (Wright et al. 2003). The higher nucleotide diversity observed for P. balsamifera compared to its sister species probably reflects substantial expansion of the former compared to the latter.  P. balsamifera is thought to have undergone massive range expansion that initiated from a central admixed population towards northern and eastern United States (Keller et al. 2010).  Although P. balsamifera populations used in the present study were sampled from the western part of the species range, this indicates that these populations harbour substantial variability, which is in line with previous findings (Keller et al. 2010).  On the other hand, P. trichocarpa expansion in the Pacific Northwest was limited within a narrow range (i.e., 15° in longitude and 30° in latitude) as opposed to P. balsamifera which extended over a much wider range (i.e., 110° in longitude and 26° in latitude), reflecting the latter species’ adaptability to diverse environmental conditions.  In addition, P. trichocarpa is thought to have spread from two different glacial refugia (i.e., northern and southern) followed by bottlenecks (e.g., heterozygote deficiency, Chapter 2) and range expansion after deglaciation, nevertheless harbouring less variability than P. balsamifera. More interestingly, when analyzing the combined data of the two sister species (i.e., as a single species), the average nucleotide diversity (πT) was similar to that of P. balsamifera but remained lower than that of P. tremula (data not shown).  However, the nucleotide diversity per site (θw) was higher for the combined North American species as compared to individual species’ estimate, yet it was slightly higher than that of P. tremula (data not shown). The observed lower nucleotide diversity in North American poplars compared to Eurasian aspen reflects the presence of one or more evolutionary forces that contributed to the discrepancy among species sharing similar geoclimatic conditions yet which exist on a different continent and belong to different sections of Populus (Eckenwalder 1996).  The multiple colonization routes from southern and western Europe (Hewitt 2000; Petit et al. 2003) combined with admixture events in P. tremula with its relative P. alba (Lexer et al. 2007) might have played a significant role in the observed high nucleotide diversity in P. tremula compared to the North American cottonwoods.  In addition, P. tremula is characterized by higher effective population size (Ingvarsson 2008) as compared to that of P. balsamifera (Olson et al. 2010). During the Pleistocene glaciations, plant and animal populations in North America and Europe were affected to different degrees.  The existence of the unglaciated areas in Asia probably   88 prevented severe bottlenecks from occurring and hence maintained higher effective population size in P. tremula (Olson et al. 2010), whereas the two North American species probably underwent severe bottlenecks during glaciation that led to fewer founder events. In the present study, very low fixed polymorphisms were observed in P. trichocarpa and P. balsamifera (three fixed mutations, data not shown) indicating high sequence similarities, in fact they share 25% of their ancestral polymorphisms (Olson et al. 2010).  The two North American species commonly form a monophyletic group (Eckenwalder 1996; Hamzeh and Dayanadan 2004; Ingvarsson 2010) reflecting their genetic similarity.  On the other hand, the North American species showed a comparable level of fixed mutations with P. tremula; 95 in P. balsamifera and 83 in P. trichocarpa (Table 4.5 and Table 4.7, respectively); indicating low shared polymorphism among the North American and Eurasian poplars, congruent with previous reports where only 3.9% of shared polymorphisms was observed between P. tremula and P. trichocarpa (Ingvarsson 2010). 4.4.2 Deviation from neutrality An excess of amino acid divergence (NI < 1) was observed at the three defence loci in P. balsamifera when P. tremula was used as an outgroup; however, this excess was only significant for WIN3 (P < 0.05) (Table 4.5).  In addition, the same locus showed a non-neutral evolution pattern as indicated by HFay&Wu  Interestingly, the same locus showed a pattern of purifying selection in P. balsamifera when P. trichocarpa was used as an outgroup (data not shown). Furthermore, the divergence at synonymous and nonsynonymous sites was similar among the North American species and most of the loci had Ks = Ka (Figure 4.1b) which might indicate that these loci are highly conserved or simply that they are evolving under neutral constraints.  On the other hand, only TI-4 in P. balsamifera showed higher nonsynonymous versus synonymous divergence when P. tremula was used as an outgroup (Figure 4.1a), probably reflecting purifying selection at this locus in P. balsamifera. The deviation from neutrality observed at WIN3 in P. balsamifera when P. tremula or P. trichocarpa was used as an outgroup and not observed in P. trichocarpa in the opposite scenario (due to the lack of polymorphisms at both synonymous and non synonymous sites, Table 4.7) might reflect two possible scenarios: 1) response to different herbivores (Hollick and Gordon 1993), hence maintaining diverse alleles or 2) presence of purifying selection (i.e., ka/ks < 1). With the exception of WIN3, all studied loci showed no evidence of adaptive evolution, which was also supported by the multilocus HKA test.  The lack of purifying selection that was   89 consistent across most loci and across species might be explained under different scenarios: 1) the lower effective population size attributable to North American poplar compared to European aspen, where species with lower effective population size are expected to show lower constraints of purifying selection (Eyre-Walkert et al. 2002), or 2) the studied loci are evolving under no functional constraints despite the differences in demographic histories for the studied species. 4.4.3 Sequence divergence in Populus The estimated time to mrca between the North American and Eurasian poplars obtained in the current study was ~4.5 Ma which falls within the range initially suggested as the time of divergence between P. tremula and P. trichocarpa (3.8-6.2 Ma; Ingvarsson 2005a); however, it is near to the lower bound of the estimated time where Populus emerged as a genus (5-10 Ma; Tuskan et al. 2006).  As expected, the two North American species shared a common ancestor from a more recent time; the estimate ranged between 0.76 and 0.83 Ma depending on the molecular clock model assumed.  Additionally, P. tremula showed a higher substitution per million years (0.02), yielding an estimate of 2 x 10-8 substitutions per site per year, five-fold higher than that obtained for the P. trichocarpa and P. balsamifera (0.4 x 10-8). The lower substitution rate observed for the North American species compared to P. tremula is probably associated with succession glacial events (e.g., Pleistocene) in North America.  During this time span, approximately half of North America was under the Laurentide ice sheet, which probably allowed fewer populations with lower genetic variability to survive (Comes and Kadereit 1998).  This is reflected in P. balsamifera where populations across the native range formed only three genetic groups (Keller et al. 2010) as compared to P. trichocarpa which probably expanded from two refugia (Chapter 2).  Therfore, it is likely that the North American poplar went through different demographic conditions compared to the Eurasian counterpart and the two sister species have low level of genetic divergence that is apparent from their sequence similarities.  Although the estimated substitution rate varied between the North American and Eurasian Populus, the overall rate of nucleotide substitution per year across species was 0.5 x 10-8.  While the synonymous substitution rate in Populus is thought to be 1/6th that of Arabidopsis (Tuskan et al. 2006), in the present study, the overall estimate of substitution rate (including synonymous and nonsynonymous sites) was only 1/3rd that of Arabidopsis (Koch et al. 2000), probably reflecting different substitution rates at nonsynonmous compared to synonymous sites.   90 4.4.4 Linkage disequilibrium in Populus The expected LD decay with distance observed in the present study is in sharp contrast to that previously reported in P. tremula (Ingvarsson 2005, 2008) and P. balsamifera (Olson et al. 2010); however, it is in line with previous reports in P. trichocarpa (Gilchrist et al 2004, Chapter 3).  The rapid LD decay with distance observed in P. trichocarpa and P. balsamifera (r2 < 0.3 in < 400 bp) is similar to that reported for other outcrossing species (maize: Guillet-Claude et al. 2004; sunflower: Liu and Buker 2006) and poplar species (P. nigra, Chu et al. 2008).  The disparity of LD decay observed in P. balsamifera (this study) compared to that reported by Olson et al. (2010) is probably attributable to the loci surveyed here, which were mainly involved in defence and stress response and most likely have undergone recombination resulting in more rapid decay of LD.  These observations are indicative of higher recombination events, previously reported in disease resistance genes (Ellis et al. 2000; Rose et al. 2004).  In addition, LD within a species varies significantly among genomic regions and among populations, hence the current estimates of fast LD decay in P. balsamifera indeed provide evidence of high recombination events at the studied loci and do not necessarily contradict previous findings.  On the other hand, the lack of LD decay observed in P. tremula was unexpected given the fact that this species has the higher effective population size and admixture compared to the North American species.  In P. tremula, LD decay varied among populations, which in one population extended up to more than 1 Kb (Ingvarsson 2005) as compared to the overall LD that decayed within 500 bp (Ingvarsson 2005, 2008).  This pattern was not observed in the present study for the overall LD across loci; however, considering LD variability among populations and the total sites analyzed in P. tremula (1kb shorter compared to North American poplars), it is possible that the observed LD in the present study is underestimated and longer fragments are required for reasonable assessments of LD in P. tremula. 4.5 Conclusions In the present study, populations from two North American poplars (P. trichocarpa and P. balsamifera) were sampled from both sides of the Rocky Mountains and a western Eurasian species P. tremula was sampled from Sweeden, all sampled across a similar latitudinal gradient. Although P. balsamifera and P. trichocarpa are considered as two sister species, the former showed greater nucleotide diversity than the latter.  A similar pattern was also observed between P. balsamifera and P. tremula.  In addition, among the studied species, P. tremula harboured the   91 highest nucleotide variability, probably attributable to admixture events and weak bottleneck effects.  As expected, the North American species showed similar divergence at synonymous and replacement sitesIn general, the divergence between the two sections appears to be quite recent (5 Ma); however, the time to mrca between the two North American cottonwoods is more recent. LD decays within short distances and shows a similar magnitude in the North American species as compared to P. tremula, which did not show any decay with distance at the studied loci.  This pattern was probably associated with sampling schemes, and calls for longer gene fragments for proper LD assessment in P. tremula.  With the exception of the WIN3 locus in P. balsamifera, the studied loci did not deviate significantly from neutral expectations.  These findings contribute to historical inferences regarding the genetic diversity between the two North American species and illustrate how different population history and life history traits can shape species genetic diversity despite sharing similar ecogeoclimatic conditions.  Additionally, the present study highlights the importance of using multiple loci to date the shared ancestry of closely related species.   92    Table 4.1 Location of the studied populations in three Populus species. Species Population Latitude (°N) Longitude (°W/°E) Chilliwack River 49º 06´ 121º 39´ West Klinakline River 51º 26´ 125º 37´ Dean River 52º 49´ 126º 46´ Nass River 56º 34′ 129º 49′ P. trichocarpa Taku River 58º 42´ 133º 24´ Cypress Hills 49º 04´ 109º 28´ Stettler 52º 26´ 112º 44´ Grand Prairie 54º 45´ 118º 53´ Whitehorse 60º 42´ 135º 21´ P. balsamifera Denali National Park 63º 39´ 148º 51´ Svalöv 56º 42´ 13º 15´ Brunsberg 59º 37´ 12º 57´ P. tremula Arjeplog 66º 12´ 18º 25´    93    Table 4.2 Primer sequences and annealing temperatures for nine candidate gene loci used in the study. Gene Gene bank reference LG Primer sequences (5'-3') Ta Product size (bp) TI-3 AY378088 XIX F: ATCGATGTCTTCGGTGA R: AGAAGCTCTATCGGATGGTA 45.0 474 TI-4 AY378089 IV F: CTTGACATTCAGGGCGAA R: AACCACGAAAGGTGA 45.0 509 WIN3 X15516 X F: CGATTTCTACGGTCGTGA R: CGCATCCGGTTTAAACCTA 58.5 465 Dhn BU863852 IV F: ACTGCCATGATGAGCGAAGATG R: GGTGTGTACCTCAGCGGTCT 58.5 540 Gapdh AJ843622 X F: GATAGATTTGGAATTGTTGAGG R: AAGCAATTCCAGCCTTGG 58.5 809 PAL EU603319 XVI F: TGGATTGCCATCAAATCTCA R: CTCTTGCGCTCTCAACCTCT 61.0 605 PHYB2 AF309807 X F: CGATTTCTACGGTCGTGA R: CGCATCCGGTTTAAACCTA 65.5 797 GI BU833698 V F: TAACATGGGAAGCCCATAGC R: CTGAATGGGAAAAAGGGTCA 61.0 619 PTCBF2 DQ354395 I F: CATTATCCGTGCCCAAAAGT R: CACAACGACCAATCAGCATC 61.0 755 TI-3, TI-4, WIN3, Kunitz trypsin inhibitors; Dhn, dehydrine; Gapdh,Glyceraldehyde-3-phosphate dehydrogenase; PAL, Phenylalanine ammonia-lyase; PHYB2, phytochrome B2; GI, GIGANTIA; PTCBF2, C-repeat binding factor; Ta, annealing temperature (ºC).   94    Table 4.3 Summary of single nucleotide polymorphisms for nine gene loci in three Populus species.   TI-3 TI-4 WIN3 Dhn Gapdh PAL PHYB2 GI PTCBF2 Average Total P. trichocarpa 21 19 19 21 22 24 23 20 4 19 P. balsamifera 23 23 25 18 24 22 20 21 23 22 Number of genotypes P. tremula 26 30 12 28 27 27 27 -a - 25   P. trichocarpa 443 328 439 446 723 406 553 512 634 498 4484 P. balsamifera 451 434 431 471 256 522 544 348 635 455 4092 Gene fragment size (bp) P. tremula 453 300 466 685 654 526 687 - - 539 3771   P. trichocarpa 8 12 0 4 1 0 1 1 0 3 27 P. balsamifera 17 8 9 5 2 4 0 23 2 8 70 SNP numbers P. tremula 16 2 16 20 11 3 1 - - 10 69 TI-3, TI-4, WIN3, Kunitz trypsin inhibitors; Dhn, dehydrin; Gapdh,Glyceraldehyde-3-phosphate dehydrogenase; PAL, Phenylalanine ammonia-lyase; PHYB2, phytochrome B2; GI, GIGANTIA; PTCBF2, C-repeat binding factor, a Data not available.    95    Table 4.4 Summary of total nucleotide diversity (πT), and nucleotide diversity per site (θw) (x10-3) for nine gene loci in three Populus species.   TI-3 TI-4 WIN3 Dhn Gapdh PAL PHYB2 GI PTCBF2 Average P. trichocarpa 8.70 14.32 0.00 2.40 0.29 0.00 1.13 0.19 0.00 3.00 P. balsamifera 8.81 4.70 4.42 2.11 2.65 1.47 0.00 23.22 0.64 5.34 πT P. tremula 8.31 3.38 13.09 18.60 6.15 2.44 0.64 -a - 7.52   P. trichocarpa 4.43 14.65 0.00 2.23 0.32 0.00 0.54 0.46 0.00 2.51 P. balsamifera 9.02 4.21 4.67 3.73 1.79 0.17 0.00 15.96 0.73 4.48 θw P. tremula 8.22 1.43 9.76 11.22 3.74 0.12 0.32 - - 4.97 aData not available.    96    Table 4.5 McDonald-Kreitman test (MK), neutrality index (NI), and Fay and Wu’s H in seven gene loci in P. balsamifera (P. tremula used as an outgroup). Synonymous Replacement Gene Fixed Poly Fixed Poly MK (P-value) NI  HFay & Wu (P-value) TI-3 7 7 13 11 0.059 (0.830) 0.846  -3.532 (0.066) TI-4 2 2 10 4 0.535 (0.920) 0.400  0.843 (0.985) WIN3 12 3 32 6 0.123 (0.034)* 0.750  -2.743 (0.048)* Dhn 0 0 1 2 -a - -0.318 (0.206) Gapdh 0 0 1 0 - - 0.098 (0.311) PAL 3 1 9 3 0.000 (0.465) 1.00  0.254 (0.465) PHYB2 3 0 2 0 - - - *P < 0.05, aNot defined.   97    Table 4.6 Inter-species number of haplotypes (H) and haplotype diversity (Hd) for nine examined gene loci.   TI-3 TI-4 WIN3 Dhn Gapdh PAL PHYB2 GI PTCBF2 Average Total H P. trichocarpa 14 9 1 7 2 1 2 2 1 4.33 39  P. balsamifera 31 9 10 5 3 3 1 22 2 9.56 86  P. tremula 13 4 7 36 16 5 2 -a - 11.86 83  Hd P. trichocarpa 0.879 0.421 0.000 0.722 0.206 0.000 0.476 0.097 0.000 0.31  P. balsamifera 0.965 0.863 0.728 0.535 0.577 0.545 0.000 0.899 0.198 0.59  P. tremula 0.851 0.666 0.826 0.975 0.902 0.728 0.440 - - 0.77 aData not avilable.    98    Table 4.7 McDonald Kreitman test (MK), neutrality index (NI), and Fay and Wu’s H in seven gene loci  in P. trichocarpa (P. tremula used as an outgroup). Synonymous Replacement Gene Fixed Poly Fixed Poly MK (P-value) NI HFay & Wu (P-value) TI-3 6 2 14 5 0.164 (0.685) 1.071 -0.576 (0.250) TI-4 3 5 1 6 0.186 (0.666) 3.600  0.603 (0.460) WIN3 9 0 38 0 -a - Dhn 0 0 1 0 - - Gapdh 1 0 0 1 - - PAL 1 0 6 0 - - PHYB2 0 0 3 1 - -0.336 (0.100) aNot defined.    99 Figure 4.1 Synonymous (Ks) and nonsynonymous divergence (Ka) for seven gene loci in three Populus species.     100    Figure 4.2 Decay of linkage disequilibrium in three Populus species based on nonlinear regression of squared allele frequency (r2) versus distance (bp) using equation (1).      101 4.6 Reference Akaike, H. 1974. A new look at the statistical model identification. IEEE Trans Autom Control 19:716-723. Auge, G. A., S. Perelman, C. D. Crocco, et al. 2009. Gene expression analysis of light-modulated germination in tomato seeds. New Phytol 183:301-314. Battistuzzi, F. U., A. Filipski, S. B. Hedges, et al. 2010. Performance of relaxed clock methods in estimating evolutionary divergence times and their credibility intervals. Mol Biol Evol doi:10.1093/molbev/msq014. Benedict, C., J. S. Skinner, R. Meng, et al. 2006. The CBF1- dependent low temperature signalling pathway, regulon, and increase in freeze tolerance are conserved in Populus spp. Plant Cell Environ 29:1259-1272. Boffelli, D., J. McAuliffe, D. Ovcharenko, et al. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299:1391-1394. Brayshaw, T. C. 1965. The status of the black cottonwood (Populus trichocarpa Torrey and Gray). Canad. Field-Naturalist 79:91-95. Breen, A. L., E. Glenn, A. Yeager, et al. 2009. Nucleotide diversity among natural populations of a North American poplar (Populus balsamifera, Salicaceae). New Phytol 182:763-773. Cervera, M. T., V. Storme V, A. Soto, et al. 2005. Intraspecific and interspecific genetic and phylogenetic relationships in the genus Populus based on AFLP markers. Theor Appl Genet 111:1440-1456. Comes, H. P., and W. Kadereit. 1998. The effect of Quaternary climatic changes on plant distribution and evolution. Trends in Plant Sci 3:432-438. Chu,Y., X. Su, Q. Huang, et al. 2009. Patterns of DNA sequence variation at candidate gene loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms. Genetica 137:141-150. Doyle, J. J., and J. L. Doyle. 1987. A rapid DNA isolation procedure from small quantities of fresh leaf tissues. Photochemical Bulletin 19:11-15. Drummond, A. J., and A. Rambaut. 2007. BEAST. Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214-221. Eckenwalder, J. E. 1977. North American cottonwoods (Populus, Salicaceae) of sections Abaso and Aigeiros. J Arnold Arboretum 58:193-208.   102 Eckenwalder, J. E. 1996. Systematics and evolution of Populus. Pp 7-32 in R. F. Stettler, H. D. Jr. Bradshaw, P. E. Heilman, et al. (eds) Biology of Populus and its implications for management and conservation. NRC Research Press, Ottawa, Ontario, Canada. Ellis, J., P. Dodds, and T. Pryor. 2000. Structure, function and evolution of plant disease resistance genes. Curr Opin Plant Biol 3:278-284. Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186-194. Eyre-Walker, A., P. D. Keightley, N. G. C. Smith, et al. 2002. Quantifying the slightly deleterious mutation model of molecular evolution. Mol Biol Evol 19:2142-2149. Fay, J., and C-I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405- 1413. Gilchrist, E. J., G. W. Haughn, C. C. Ying, et al. 2006. Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol 15:1367-1378. Guillet-Claude, C., C. Birolleau-Touchard, P. Manicacci, et al. 2004. Nucleotide diversity of the ZmPox3 maize peroxidase gene: Relationships between a MITE insertion in exon 2 and variation in forage maize digestibility. BMC Genet 5:19. Hamzeh, M., and S. Dayanandan. 2004. Phylogeny of Populus (Salicaceae) based on nucleotide sequences of chloroplast TRNT-TRNF region and nuclear rDNA. Am J Bot 91:1398- 1408. Hamzeh, M., P. Perinet, and S. Dayanandan. 2006. Genetic relationships among species of Populus (Salicaceae) based on nuclear genomic data. J Torrey Bot Soc 133:519-527. Hardison, R. C. 2003. Comparative genomics. PLoS biology, 1(2):e58. doi: 10.1371/journal.pbio.0000058. Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160-174. Hedges, S. B., and S. Kumar. 2003. Genomic clocks and evolutionary timescales. Trends Genet 19:200-206. Hewitt, G. 2000. The genetic legacy of the Quaternary ice ages. Nature 405:907-913. Hill, W. G., and B. S. Weir. 1988. Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol 33:54-78.   103 Hollick, J. B., and M. P. Gordon. 1993. A poplar tree proteinase inhibitor-like gene promoter is responsive to wounding in transgenic tobacco. Plant Mol Biol 22:561-572. Ingvarsson, P. K. 2005. Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus termula L., Salicaceae). Genetics 169:945-953. Ingvarsson, P. K. 2008. Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180:329-340. Ingvarsson, P. K. 2010. Nucleotide polymorphism, linkage disequilibrium and complex trait dissection in Populus. Pp 91-111 in S. Jansson, R. Bhaleroa, A. Groover (eds) Genetics and Genomics of Populus. DOI: 10.1007/978-1-4419-1541-2_5, Springer New York. Ingvarsson, P. K, M. V. Garcia, D. Hall, et al. 2006. Clinal variation in phyB2, a candidate gene for day-length-induced growth cessation and bud set, across a latitudinal gradient in European aspen (Populus tremula). Genetics 172:845-1853. Jackson, S., S. Rounsley, and M. Purugganan. 2006. Comparative Sequencing of Plant Genomes: Choices to Make. The Plant Cell 18:1100-1104. Jianxin, Ma., and J. L. Bennetzen. 2004. Rapid recent growth and divergence of rice nuclear genomes. PNAS 101:12404-12410. Joseph, J. A., and C. Lexer. 2008. A set of novel DNA polymorphisms within candidate genes potentially involved in ecological divergence between Populus alba and P. tremula, two hybridising European forest trees. Mol Ecol Notes 8:188-192. Keller, S. R., M. S. Olson, S. Silim, et al. 2010. Genomic diversity, population structure, and migration following rapid range expansion in the Balsam Poplar, Populus balsamifera. Mol Ecol 19:1212-1226. Koch, M. A., B. Haubold, and T. Mitchell-Olds. 2000. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol Biol Evol 17:14831498. Kowalski, S. P., T. H. Lan, K. A. Feldmann, et al. 1994. Comparative mapping of Arabidopsis thaliana and Brassica oleracea chromosomes reveals islands of conserved organization. Genetics 138:499-510. Lexer, C., C. A. Buerkle, J. A. Joseph, et al. 2007. Admixture in European Populus hybrid zones makes feasible the mapping of loci that contribute to reproductive isolation and trait differences. Heredity 98:74-84.   104 Lexer, C., M. F. Fay, J. A. Joseph, et al. 2005. Barrier to gene flow between two ecologically divergent Populus species, P. alba (White Poplar) and P. tremula (European aspen): the role of ecology and life history in gene introgression. Mol Ecol 14:1045-1057. Li, N, and M. Stephens. 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165:2213-2233. Liu, A. Z, and J. M Burke. 2006. Patterns of nucleotide diversity in wild and cultivated sunflower. Genetics 173:321-330. Lynch, M., and T. J. Crease. 1990. The analysis of population survey data on DNA sequence variation. Mol Biol Evol 7:377-394. Major, I. T., and C. P. Constabel. 2008. Functional analysis of the Kunitz trypsin inhibitor family in poplar reveals biochemical diversity and multiplicity in defense against herbivores. Plant Physiol 146:888-903. McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654 Nei, M. 1987. Molecular Evolutionary Genetics. Columbia Univ. Press, New York. Olson, M. S., A. L. Robertson, N. Takebayashi, et al. 2010. Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytol 186:526-536. Pamilo, P., and M. Nei. 1988. Relationships between gene trees and species trees. Mol Biol Evol 5:568-583. Petit, J. R., I. Aguinagalde, J-L. de Beaulieu, et al. 2003. Glacial refugia: hotspots but not melting pots of genetic diversity. Science 300:1563-1565. Rambaut, A., and A. J. Drummond. 2007a. BEAUti v1.4.2. Bayesian Evolutionary Analysis Utility. Available from http://beast.bio.ed.ac.uk/BEAUti Rambaut, A., and Drummond AJ (2007b) Tracer v1.3. Available from http://tree.bio.ed.ac.uk/software/tracer/ Rand, D. M, and L. M. Kann. 1996. Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans Mol Biol Evol 13:735-748. Remington, D. L., J. M. Thornsberry, Y. Matsuoka, et al. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. PNAS 98:11479- 11484.   105 Rose, L. E., P. Bittner-Eddy, C. H. Langley, et al. 2004. The maintenance of extreme amino acid diversity at the disease resistance gene, RPP13, in Arabidopsis thaliana. Genetics 166:1517-1527. Rozen, S., and H. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods in Mol Biol 132:365-386. Sanderson, M. 1997. Nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218-1231. Stephens, M., and P. Scheet. 2005. Accounting for decay of linkage disequilibrium in haplotype inference and missing data imputation. Am J Hum Genet 76:449-462. Stephens, M., N. Smith N, and P. Donnelly. 2001. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978-989. Strand, A. E., J. Leebens-Mack, and G. B. Milligan. 1997. Nuclear-DNA-based markers for plant evolutionary biology. Mol Ecol 6:113-118. Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595. Tuskan, G. A., S. DiFazio, S. Jansson, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596-1604. Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7:256-276. Wright, S., B. Lauga, and D. Charlesworth. 2003. Rates and patterns of molecular evolution in inbred and outbred Arabidopsis. Mol Biol Evol 19:1407-1420. Xie, C-Y., C. C. Ying, A. D. Yanchuk, et al. 2009. Ecotypic mode of regional differentiation caused by restricted gene migration: a case in black cottonwood (Populus trichocarpa) along the Pacific Northwest coast. Can J For Res 39:519-526.    106 Chapter 5. Association genetics in Populus trichocarpa: Methods comparison and caveats associated with sampling natural populations1 5.1 Introduction Understanding the genotype-phenotype relationship and revealing the role of genes/alleles in determining qualitative and quantitative traits are of a major biological and economic importance.  This relationship is considered the geneticist’s ultimate goal and is expected to aid in enhancing selection and population improvement programs.  Combined advances in genetic marker development and genotype-phenotype analytical methods have made it possible to gain insights into the nature of quantitative and adaptive trait inheritance, as well as the identification of the genes controlling them (i.e., number, magnitude, and their possible interaction) (Risch 2000).  The elusive promise of effective molecular markers utility in population improvement is becoming a reality for model and non-model species such as forest trees.  Neale and Savolainen (2004) presented a compelling argument supporting the feasibility of complex trait dissection in trees as well as demonstrating that their perceived drawbacks, such as long generation span and the required resources, time and space to conduct proper experiments, are in fact assets.  The predominantly outcrossing mating system, large unstructured natural populations, and the undomesticated nature of most tree species are expected to yield high levels of nucleotide diversity and low linkage disequilibrium (Garcia-Gil et al. 2003; Brown et al. 2004; Krutovsky and Neale 2005; Ingvarsson et al. 2008), thus making them suitable candidates for association genetics or linkage disequilibrium mapping (Neale and Savolainen 2004). The presence of confounding factors such as population (Flint-Garcia et al. 2003) and/or family (Voight and Pritchard 2005) structures were implicated as the main cause(s) for many reported spurious association(s) (i.e., false positives) (Lander and Schork 1994).  To overcome population and/or kinship structure drawbacks, several association genetics methods were developed and successfully implemented (Devlin and Roeder 1999; Pritchard et al. 2000; Marchini et al. 2001; Pritchard and Donnelly 2001; Thornsberry et al. 2001).  The development ______________________________ 1 A version of this chapter will be submitted for publication. El-Kassaby Y. A., M. Ismail, and P. K. Ingvarsson. Association genetics in Populus trichocarpa: Methods comparison and caveats associated with sampling natural populations.    107 of the “Unified Mixed-Model (UMM)” (Yu et al. 2006) is a breakthrough in association genetics research because it offers a simple method whereby both population and family structures, individually or collectively, are accounted for during data analyses.  The population subdivision is determined using a model-based Bayesian clustering algorithm (STRUCTURE; Pritchard et al. 2000), and genetic relatedness estimated using any of the pair-wise kinship estimation procedures (e.g., Ritland 1996; Lynch and Ritland 1999; Zhao et al. 2007), these are included as fixed and random effects in the UMM, respectively (Yu et al. 2006).  Statistically speaking, the UMM consists of multiple vectors accounting for phenotypic, genotype (fixed), population (fixed), kinship (random), and residual effects and the population and kinship vectors are, in turn, included in the model as required (Yu et al. 2006).  Furthermore, to overcome the somewhat computationally demanding STRUCTURE algorithm, specifically when a large number of populations and markers are used, Price et al. (2006) introduced the principle component analysis (PCA) approach to summarize population structure as an alternative. To date, very few association genetics studies have been reported for forest trees using material from either structured or unstructured populations and various analytical methods were applied, including classical GLM, with and without family structure (El-Kassaby 1982; Thumma et al. 2005), UMM (Gonzalez-Martinez et al. 2007; Ingvarsson et al. 2008; Eckert et al. 2009), and quantitative transmission test for linkage disequilibrium (QTDT) (Gonzalez-Martinez et al. 2008a, b).  In the absence of population and/or family structures, the inclusion of the Q (population structure) and/or K (family structure) matrices in the UMM is expected to unnecessarily remove parts from the error term’s variation and degrees of freedom to the population and/or family effects, resulting in changes to the mean square error term, which ultimately affects the association test’s statistical power.  Gonzalez-Martinez et al. (2007) and Ingvarsson et al. (2008) included both the Q and K matrices in their Mixed Liner Model even when population and/or family structures were lacking. The objectives of this study were to: 1) determine presence/absence of genotype- phenotype associations using 170 black cottonwood (Populus trichocarpa Torr. & Gray) individuals growing in a replicated clonal experiment with defined polymorphism at 185 SNPs from 10 candidate (adaptive, defence, and housekeeping) gene loci and three complex traits (height, bud burst, and resistance to Valsa sordida fungus), 2) test the efficacy of including the Q and/or K matrices in the association genetics analyses when samples are originated from natural populations, and 3) evaluate and compare different association mapping approaches.   108 5.2 Materials and methods 5.2.1 Plant material, phenotypic traits, and genotyping A sample of 170 black cottonwood (Populus trichocarpa) individuals growing in a 9- year-old replicated clonal trial established by British Columbia Ministry of Forests and Range in Surrey, BC (49° 03′ N, 122° 42′ W) were used in this study (see Chapter 3 for details).  The experiment harboured a total of 854 clones collected from 188 populations covering 36 river drainages (Xie et al. 2009).  Scion material was collected from the original ortets (donor trees) by a helicopter, leaving large distances among collected trees (i.e., attempts were made to sample scattered trees within a stand to avoid the selection of members of the same clone, although no minimal distance was imposed, Xie et al. 2009).  The present study’s sample included trees representing the species natural range in British Columbia covering 15° and 12° of latitude (range: 44 06′ to 58° 56′ N) and longitude (range: 121° 39′ to 133° 11′ W), respectively.  Height measurements (cm) and date to bud burst and severity of the Valsa sordida Nitschke fungus infection (categorical data  with 6 and 5 classes, respectively) were assessed during the spring of the second and in the fall of the third growing season, respectively (Xie et al. 2009).  The studied 170 individuals were genotyped using nine nuclear microsatellite markers (Chapter 2) and 185 SNPs from 10 candidate (adaptive, defence, and housekeeping) gene loci (Chapter 3) to assess population structure and estimate pair-wise kinship coefficients, linkage disequilibrium, and association genetics analyses. 5.2.2 Statistical analyses Genotypic data from the 9 SSRs and 185 SNPs were used to determine both population structure (i.e., using a blind test without prior population information) and pair-wise kinship coefficients among the studied 170 individuals using the model-based Bayesian clustering algorithm (STRUCTURE; Pritchard et al. 2000) and the kinship procedure of Ritland (1996) using the software SPAGeDi (Hardy and Vekemans 2002), respectively. The 185 SNPs were reduced to 87 after the removal of all singletons (95) and those with extensive missing data (3).  Single-marker (SNP) associations were determined using the Unified Mixed-Model (Yu et al. 2006): y = Sα + Qv + Zu + e   (Eq. 1) where y, α, v, u, and e are vectors of phenotypic observations, SNP effect (fixed), population effect (fixed), kinship effect (random), and residual effect, respectively, and S, Q, Z are incidence   109 matrices of 1s and 0s relating y to α, v, and u, respectively.  Several models were tested including the GLM without population and family structures and UMM with different combinations of population and family structures (Q+K, Q-K, -Q+K, and -Q-K), additionally as recommended by Price et al. (2006), the computationally demanding Q matrix was substituted by the principal component analysis’ P matrix resulting in two additional models (P+K and P- K).  All analyses were conducted using TASSEL version 2.0.1 (http://www.maizegenetics.net) (release April 2009) and positive associations were determined at the nominal P < 0.05% level and were further corrected for multiple testing using the false discovery rate method for multiple comparisons (FDR) (Benjamini and Hochberg 1995) of P < 0.05%.  The FDR thresholds were calculated using the QVALUE package (Storey and Tibshirani 2003) implemented in R (http://www.r-project.org/). Model comparisons were based on the degree of deviation from uniform distribution and estimates of the mean square differences (MSD) between the observed and expected P-values of all SNPs following the method described in Stich et al. (2008).  A high MSD value indicates a strong deviation from a uniform distribution of P-values, suggesting that the type I error of the tested model could be substantially higher than the nominal α-level (Stich et al. 2008). 5.3 Results and discussion The STRUCTURE analysis failed to identify any clear population grouping in the studied 170 individuals, indicating that the sampled trees probably are drawn from a single large panmictic population (Figure 5.1).  These observations are supported by the low FST values of 0.060 and 0.015 obtained for SSRs and SNPs, respectively, and are in line with the 0.078 reported for populations from Washington and Oregon using allozyme markers (Weber and Stettler 1981), (and 0.07; Chapter 2) highlighting the fact that 94-98% of the total variation resides within populations, an indication of substantial gene flow.  This pattern is typical of most tree species that are characterized by an extended range, and is a reflection of the species’ successional status, breeding system, and dispersal mode (Hamrick and Godt 1996).  The resulting pattern from the STRUCTURE program as well as the large within-population variation indicate that the use of the Q matrix in the association genetics analyses was not warranted, and furthermore that its inclusion is expected to affect the association test’s statistical power (see below).  It should be pointed out that the STRUCTURE program has low power to detect weak population structure, especially if there is isolation by distance, and high level of sib mating   110 (http://pritch.bsd.uchicago.edu/software /structure22/readme.pdf software documentation), as is the case in the present study (see below) (Pritchard et al. 2000). The pair-wise relatedness among the sampled 170 trees resulted in the identification of kinship relationships, indicating the presence of within-population structure.  If family structure is truly present, then the pair-wise relatedness is expected to correlate with geographic distance. In fact, negative and highly significant Mantel correlations, between kinship and geographic distance matrices, was observed for the SSR (-0.017, p < 0.05) and SNP (-0.062, p < 0.001) markers, probably reflecting the presence of isolation-by-distance pattern leading to family structure and providing justification to the inclusion of the K matrix in the association genetics analyses. The observed low and fast decaying linkage disequilibrium LD (Chapter 3, Figure 3.2) supports the use of a candidate gene approach for association genetics for this and other species where the cost of genome-wide sequencing is currently prohibitive.  While genome-wide scan approaches are ideal for initial gene discovery, ultimately study of individual candidate genes is necessary and essential to identify the biological meaning of the causal associations. Great disparity was observed among the model results (Table 5.1).  The -Q+K model was used as reference for comparison for the following reasons: 1) lack of population structures, 2) presence of family structure, and 3) the model’s MSD P-value was the second lowest among the tested seven models (Table 5.2), and 4) the model’s plot between observed and expected P- values was very close to a uniform distribution (Figure 5.3a).  These findings are consistent with those reported by Stich et al. (2008) who found the ANOVA approach to be inappropriate for the complicated pedigree they used in their study.  This situation is similar to the current case in which relatedness exists, but for which the observed lack of structured pedigree precluded its inclusion in the implemented ANOVA model. A total of 27 positive associations were detected across five out of the seven tested models.  Only 21 were significant using a FDR of < 0.05 across different models (Table 5.1). However, no associations were found for the Q-K and -Q-K models.  These associations were related to height and resistance to Valsa sordida and were restricted to 21 SNPs present in four out of the studied 10 candidate gene loci; namely, trypsin inhibitor (TI-3 and TI-4) and dehydrin (Dhn), with functional roles related to defence and stress response (Major and Constabel 2001; Caruso et al. 2002), and GIGANTEA (GI) involved in photoperiod response (Park et al. 1999).   111 Multiple associations were observed with several SNPs within the candidate genes.  More notable were TI-4 and Dhn, which were associated with height in GLM and -Q+K models. Similarly, TI-3, Dhn, and GI had multiple associations with Valsa sordida resistance (Table 5.1, Figure 5.2).  Additionally, the same 10 associations that were associated with height for the GLM were olso observed for the -Q+K model; however, they were rejected by the false discovery rate (FDR) test, an indication of their lack of authenticity (Table 5.1). The observed multiple associations within a specific candidate gene confirm their authenticity and they merely mirror the same association (e.g., TI-3 for the -Q+K model).  This was confirmed by SNPs membership within the TI-3 haplotype block (Figure 5.3).  The variable percent variance attributable to these multiple associations within a candidate gene (e.g., Valsa sordida resistance within the TI-3 and GI for the -Q+K model) is a reflection of the SNP allelic frequencies.  As SNP frequency increases, its average effect is expected to decrease and this was demonstrated in Figure 5.4 (Falconer and Mackay 1996).  Similarly, the two additional associations observed between SNP41 and SNP42 of the GI, showing different percent variance reflect differences in SNPs frequencies (Table 5.1). Irrespective of which population structure matrix was used (Q or P), the outcomes obtained from the tested models generally produced somewhat unexpected results and are summarized as follows: 1- It is expected that the complete Q+K (Yu et al. 2006) and P+K (Price et al. 2006) models will be complementary; however, they produced mixed results, with agreement for SNP50-height and disagreement for SNP49-height associations, which were undetected by the former but detected by the latter (Table 5.1), 2- The complementary Q-K and P-K models produced different results, where the former failed to detect any association while the latter detected SNP49- and SNP50-height associations which were identified by the GLM (Table 5.1), 3- Since the Q and K matrices were not included in the -Q-K, then this model is expected to be complementary to GLM; however, the former failed to detect any association while the latter detected multiple associations with height (Table 5.1), and 4- The correct -Q+K model, detected association with height; however, they were rejected by the FDR test as well as multiple associations with resistance to Valsa sordida   112 spanning three candidate gene loci and representing a wide range of the total phenotypic variation (R2 = 5-45%) (Table 5.1). Additionally, mean squared difference (MSD) and the plot of observed vs. expected P-values (Stich et al. 2008) were used to compare the tested models and produced the following conclusions: 1- Only GLM and -Q+K produced close to uniform distribution while the remaining models deviated substantially from uniformity (Figure 5.5a), 2- The GLM produced the lowest MSD while the remaining models produced values orders of magnitude higher than that observed for the GLM model (Table 5.2), 3- The inclusion of the K matrix in any model was associated with substantial reduction in MSD values, and 4- With the exception of the GLM model, the Spearman’s rank correlations between the - Q+K and remaining models were either moderate or low (Table 5.2). The results from these comparisons highlight the importance of model structure and the appropriate inclusion of the selected model’s sources of variation (i.e., Q and/or K matrices).  As indicated above, the inclusion of the K matrix in the analysis is warranted due to the presence of relatedness (family structure) which was supported by the negative and significant correlation between the pair-wise kinship and pair-wise geographic distance estimates.  The detection of multiple associations with the resistance to Valsa sordida infection is tantalizing and their authenticity is supported by their presence in one haplotype block (Figure 5.3).  The high R2 values observed for these associations possibly explained by the gene-for-gene hypothesis of pathology (Flor 1955) which implies vertical resistance, meaning the presence of a small number of major genes controlling the resistance and hence the observed high R2 values (Table 5.1). Finally, it should be pointed out that these associations should be considered as redundant associations since the majority of the SNPs implicated are present in one haplotype block (Figure 5.3). Ingvarsson et al. (2008) reported two significant associations between timing of bud set and two SNPs (T608N and L1078P) within the Phytochrome B2 (PHYB2) locus in European aspen (P. tremula) after including the Q and K matrices in their analyses.  They also indicated the lack of population structure and presence of relatedness in their tested material; accordingly,   113 these two associations were reanalyzed to evaluate the impact of removing the Q matrix on the reported associations as well as comparing the various models tested in the present study.  The analyses of Ingvarsson et al. (2008) data produced the following results: 1- No changes in the R2 value were observed between bud set and either of the PHYB2.1823 and 6030 SNPs across the two models (Q+K and -Q+K) (Table 5.1), 2- Slight differences in R2 values were produced across the different models (Table 5.1), 3- -Q+K model produced low MSD and did not differ significantly from that of the Q+K, 4- Including the K matrix in the analysis produced lower MSD across models, and 5- Plots of MSD between observed and expected P-values of all models were very close to uniform distribution (Figure 5.5b), indicating the minor role of population structure on the observed associations. It is interesting to note that while the magnitude of these two associations did not change after the removal of the Q matrix, their persistence, albeit with slight differences across the different models, is an indication of their authenticity (Table 5.1). It is noteworthy that there was a drastic difference in excluding/including the population structure as a source of variation in the association model between P. trichocarpa and P. tremula.  This could be the result of the two species’ historical colonization patterns after glaciation and their contemporary population structure (Hall et al. 2007; Ingvarsson et al. 2008). The suitability of marker-based relative kinship estimates, specifically in cases with complicated pedigree, for studying quantitative traits in natural populations was demonstrated by Ritland (1996) and was successfully implemented as the K matrix by Yu et al. (2006) in their Unified Mixed Model.  In complicated familial relationships, the Q+K approach is considered to be more appropriate than the GLM; however, the latter is suitable where structured pedigree such as those produced from mating designs are present. Neale and Savolainen (2004) presented a convincing case for the suitability of natural population sampling for association genetics studies by capitalizing on historic genetic recombination.  I support this assertion; however, I also caution against the inappropriate inclusion of unwarranted population and/or family structures in the analyses.  While the population biology of P. trichocarpa and P. tremula and the methods of sampling implemented in the studied material precluded population structure, in cases where structured pedigree exists such as those commonly generated through selective breeding mating designs (i.e., half- and full- sib families), the GLM approach could be rewarding, by providing the opportunity for the   114 explicit inclusion of family structure and its interactions in the model, thus enabling the implementation of mixed-models that are capable of executing the association genetics analysis in one step, specifically when the plant material is tested over multiple sites and replications, where they could be considered simultaneously.  Not only will this approach provide insight into the associations nature but it also will highlight the role of genotype-environment interaction of the detected associations. In conclusion, it is recommended that many criteria sould be addressed while conducting association mapping studies that include: the inclusion of proper sources of variation in the selected genetic association model, stringent significant threshold level such as 1 or 5% concurrently with a stringent false discovery rate test, and the determination of mean squared differences as well as comparing the observed vs. expected P-values, as the risk of reporting false positives is more detrimental than reporting false negatives.    115 Table 5.1 Associations between SNPs and height and resistance to Valsa sordida in P. trichocarpa and date of bud set in P. tremula across the seven compared models showing the SNP number, its candidate gene, and its attributable % of phenotypic variance (R2).  Listed associations were significant at P ≤ 0.05 and were verified by the false discovery rate method. UMM Species Attribute SNP Candidate gene GLM Q+K Q-K -Q+K -Q-K P+K P-K 24 0.096   r 25 0.096   r 27 0.096   r 28 0.096   r 31 0.096   r 32 0.096   r 33 0.096   r 34 Trypsin inhibitor (TI-4) 0.096   r  49 0.097   r  0.115 0.115 H e i g h t  50 Dehydrin (Dhn) 0.123 0.125  r  0.141 0.141  4    0.073 5    0.194 11    0.446 12    0.125 15    0.061 17    0.086 18    0.065 20    0.216 21 Trypsin inhibitor (TI-3)    0.199  41    0.060 42 GIGANTEA (GI)    0.047  P .  t r i c h o c a r p a  V a l s a  s o r d i d a  r e s i s t a n c e  50 Dehydrin (Dhn)    0.100 1823 r 0.084 r 0.084 r 0.084 0.074 P. tremula Bud set 6030 PHYB2 0.092 0.078 0.092 0.078 0.092 0.079 0.093 r: rejected significant associations by the false discovery rate test.   116    Table 5.2 Mean squared differences (MSD) between observed and expected P-values for the compared seven association genetics models and the Spearman’s rank correlation ρ between the -Q+K and the other models’ P-values values for P. trichocarpa and P. tremula.  P. trichocarpa P. tremula Model MSD Spearman’s ρ MSD Spearman’s ρ GLM 0.00000001 0.77058 0.00920 0.90575 Q+K 0.00068941 0.19256 0.00547 0.99953 Q-K 0.00118141 0.08569 0.00873 0.91142 -Q+K 0.00004162 - 0.00563 - -Q-K 0.00123583 0.10307 0.00932 0.90635 P+K 0.00047575 0.26554 0.00562 0.99940 P-K 0.00123362 0.09361 0.00949 0.90614   117    Figure 5.1 STRUCTURE output showing lack of structure among the studied 170 individuals using 9 SSRs and 182 SNPs under K = 3.  Each vertical line represents an individual and the proportion of each color represents that individual’s posterior membership to a specific genetic background.     118    Figure 5.2 The relationship between the genotypic classes of SNPs associated with severity of Valsa sordid infestation (%).  Only significant associations in the trypsin inhibitor (TI- 3), GIGANTEA (GI), and dehydrin (Dhn) gene loci are given (see Table 5.1 for SNP numbers).     119    Figure 5.3 Haplotype block structure and correlation among SNPs showing positive association with resistance to Valsa sordida.  Haplotype block indentifies TI-3 candidate gene locus and SNP code numbers 1, 2, 3, …, 9 (top) represent the actual P. trichocarpa SNP numbers (Table 5.1) in ascending order (i.e., SNPs 4, 5 and 6 correspond to numbers 4, 5 and 11 in the haplotype blocks).  Block structure was based on the LD coefficient r2 using haploview (Barrett et al. 2005).     120    Figure 5.4 Correlation between SNPs frequency and percent variance associated with resistance to Valsa sordida.      121    Figure 5.5 Plot of the observed vs. expected P-values for the compared seven association genetics models using 170 and 116 P. trichocarpa (a) and P. tremula (b) trees, respectively.       122 5.4 References Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J R Stat Soc B 57:289-300. Brown, G. R., G. P. Gill, R. J. Kuntz, et al. 2004. Nucleotide diversity and linkage disequilibrium in Loblolly pine. PNAS 101:15255-15260. Caruso, A., D. Morabito, F. Delmotte, et al. 2002. Dehydrin induction during drought and osmotic stress in Populus. Plant Physiol Biochem 40:1033-1042. Devlin, B., and K. Roeder. 1999. Genomic control for association studies. Biometrics 55: 997- 1004. Eckert, A. J., A. D. Bower, J. L. Wegrzyn, et al. 2009. Association genetics of coastal Douglas- fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics 182:1289-1302. El-Kassaby, Y. A. 1982. Association between allozyme genotypes and quantitative traits in Douglas-fir [Pseudotsuga menziesii (Mirb.) Franco]. Genetics 101:103-115. Falconer, D. S., and T. F. C. Mackay. 1996. Introduction to Quantitative Genetics, Ed 4. Longmans Green, Harlow, Essex, UK. Flint-Garcia, S. A., J. M. Thornsberry, and E. S. Buckler. 2003. Structure of linkage disequilibrium in plants. Annu Rev Plant Biol 54:357-374. Flor, H. H. 1955. Host-parasite interaction in flax rust - its genetics and other implications. Phytopathology 45:680-685. Garcia-Gil, M. R., M. Mikkonen, and O. Savolainen. 2003. Nucleotide diversity at two phytochrome loci along a latitudinal cline in Pinus sylvestris. Mol Ecol 12:1195-1206. Gilchrist, E. J., G. W. Haughn, C. C. Ying, et al. 2006. Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol 15:1367-1378. Gonzalez-Martinez, S. C., D. Huber, E. Ersoz et al. 2008. Association genetics in Pinus taeda L. II. carbon isotope discrimination. Heredity 101:19-26. Gonzalez-Martinez, S. C., N. C. Wheeler, E. Ersoz, et al. 2007. Association genetics in Pinus taeda L. I. wood property traits. Genetics 175:399-409.   123 Gonzalez-Martinez, S. C., E. Ersoz, G. R. Brown, et al. 2006. DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics 172:1915-1926. Hall, D., V. Luquez, V. M. Garcia, et al. 2007. Adaptive population differentiation in phenology across a latitudinal gradient in European aspen (Populus tremula, L.): A comparison of neutral markers, candidate genes and phenotypic traits. Evolution 61:2849-2860. Hamrick, J. L., and M. J. W. Godt. 1996. Effects of life history traits on genetic diversity in plant species. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 351:1291-1298. Hardy, O. J., and X. Vekemans. 2002. SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2:618-620. Hudson, R. R., M. Slatkin, and W.P. Maddison. 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589. Ingvarsson, P. K. 2005. Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169:945-953. Ingvarsson, P. K. 2008. Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180:329-340. Ingvarsson, P. K., M. V. Garcia, V. Luquez, et al. 2008. Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 locus in European aspen (Populus tremula, Salicaceae). Genetics 178:2217-2226. Krutovsky, K. V., and D. B. Neale. 2005. Nucleotide diversity and linkage disequilibrium in cold-hardiness- and wood quality-related candidate genes in Douglas fir. Genetics 171:2029-2041. Lander, E. S., and N. J. Schork. 1994. Genetic dissection of complex traits. Science 265:2037- 2048. Lynch, M., and K. Ritland. 1999. Estimation of pairwise relatedness with molecular markers. Genetics 152:1753-1766. Major, I. T., and C. P. Constabel. 2008. Functional analysis of the Kunitz trypsin inhibitor family in poplar reveals biochemical diversity and multiplicity in defense against herbivores. Plant Physiol 146:888-903.   124 Marchini, J., L. R. Cardon, M. S. Phillips, and P. Donnelly. 2004. The effects of human population structure on large genetic association studies. Nat Genet 36:512-517. Neale, D. B., and O. Savolainen. 2004. Association genetics of complex traits in conifers. Trends Plant Sci 9:325-330. Park, D. H., D. E. Somers, Y. S. Kim, et al. 1999. Control of circadian rhythms and photoperiodic flowering by the Arabidopsis GIGANTEA gene. Science 285:1579-1582. Price, A. L., N. J. Patterson, R. M. Plenge, et al. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904-909. Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-959. Risch, N. J. 2000. Searching for genetic determinants in the new millennium. Nature 405:847- 856. Ritland, K. 1996. Estimators for pairwise relatedness and individual inbreeding coefficients. Genet Res 67:175-185. Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, et al. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other analyses. Bioinformatics 19:2496-2497. Smith, M. W., and S. J. O'Brien. 2005. Mapping by admixture linkage disequilibrium: Advances, limitations and guidelines. Nat Rev Genet 6:623-632. Stich, B., J. Mohring, H. P. Piepho, et al. 2008. Comparison of mixed-model approaches for association mapping. Genetics 178:1745-1754. Storey, J. D., and R. Tibshirani. 2003. Statistical significance for genomewide studies. PNAS 100:9440-9445. Thornsberry, J. M., M. M. Goodman, J. Doebley, et al. 2001. Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet 28:286-289. Thumma, B. R., M. R. Nolan, R. Evans et al. 2005. Polymorphisms in Cinnamoyl CoA Reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics 171:1257-1265. Voight, B. F., and J. K. Pritchard. 2005. Confounding from cryptic relatedness in case-control association studies. Plos Genetics 1:302-311. Xie, C-Y., C. C. Ying, A. D. Yanchuk, et al. 2009. Ecotypic mode of regional differentiation due to restricted gene migration: a case in black cottonwood (Populus trichocarpa) along the Pacific Northwest coast. Can J For Res 39:519-526.   125 Yu, J. M., G. Pressoir, W. H. Briggs, et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38:203-208. Weber, J. C., and R. F. Stettler. 1981. Isoenzymes variation among ten populations of Populus trichocarpa Torr. et Gray in the Pacific Northwest. Silvae Genet 30:82-87. Zhao, K. Y., M. J. Aranzana, S. Kim, et al. 2007. An Arabidopsis example of association mapping in structured samples. Plos Genetics: 3, e4. doi:10.1371/journal.pgen.0030004.    126 Chapter 6. Thesis conclusions 6.1 Introduction The genus Populus harbours approximately 29 species distributed across six sections (Eckenwalder 1996) and has great importance at many levels including ecosystem services, wood production, and scientific research.  Among tree genera, Populus has many attributes that make it the subject of a wide range of scientific and commercial applications.  Poplars are distinguished by their obligate outcrossing mating system, short life span, long-distance seed and pollen dispersal, and ease of propagation and hybridization.  Despite their general characteristics and economic importance, research programs have mainly focused on natural or man-made hybrids, presumably due to their fast growing habits and their considerable genetic variability compared to their parents (Rajora and Rahman 2003).  Populus species have received modest research attention, until the production of a complete sequence of Populus trichocarpa Torr. & Gray (Tuskan et al. 2006), a native North American tree. The primary goal of this thesis was to determine the amount of genetic variability of Populus, with emphasize on P. trichocarpa natural populations in British Columbia.  To achieve this, I first characterized the level of genetic differentiation within and among natural populations of P. trichocarpa using microsatellite markers (SSRs).  Then I used candidate gene loci to determine the level of sequence diversity within P. trichocarpa and among P. trichocarpa, P. balsamifera L., and P. tremula L. to contrast their sequence diversity and to estimate their time of divergence.  Finally, I used a sample of P. trichocarpa natural populations to compare different methods of association genetics. 6.2 Population structure and landscape genetics of P. trichocarpa With the use of 12 maternal (chloroplast SSRs) and nine highly variable nuclear SSRs, I found that the studied P. trichocarpa populations formed two main groups based on chloroplast SSR, supporting a possible expansion from two glacial refugia, previously observed for many taxa including several tree species in the Pacific Northwest (Soltis et al. 1997; Jaramillo-Correa 2009).  Additionally, I observed a high level of admixture among populations and found that most of the genetic variability was attributable to within as compared to among populations, which is in line with previous reports in other Populus species (see Slavov and Zhelev 2010 for   127 review).  The observed weak among-populations differentiation is probably attributable to long distance seed and pollen dispersal (Slavov et al. 2009) combined with excessive mating among related individuals.  Using a spatial genetic approach, I identified a clear genetic discontinuity that subdivided the studied area into two major spatially distinct north and south groups; the north-south distribution corresponded to a north-south cline observed for ecophysiological and life history traits as well as resistance to fungal infection (Gornall and Guy 2007; Xie et al 2009). 6.3 Sequence diversity and linkage disequilibrium in P. trichocarpa I observed moderate nucleotide diversity (πT) and fast decay of linkage disequilibrium (LD) at the loci studied.  The observed average nucleotide diversity (πT) was either lower or similar to that observed in P. tremula (Ingvarsson et al. 2008) and P. balsamifera (Olson et al. 2010), respectively.  Most of the observed variability was attributable to defence and stress response genes compared to non-defence genes, which probably reflect the former fast evolutionary rate and pronounced effect of genome duplication (Zou et al. 2009).  The less variable (PHYB2) and (Gapdh) gene loci probably are associated with a slower evolutionary rate (Jansson 1999; Zhang and Li 2004).  I observed fast decay of LD across the studied loci, which declined within 700 bp, similar to most outcrossing species including poplars (Gilchrist et al. 2006; Chu et al. 2009).  The fast LD decay observed in P. trichocarpa is attributable to its outbreeding mating system, high recombination rate, and demographic history (Chapter 2). Since selection is an important factor in shaping species genetic variability, I investigated the pattern of non-neutral evolution at the studied loci and did not observe any deviation from neutral expectation at the multilocus level, as indicated from a non-significant Hudson-Kreitman- Aguadé test (HKA, Hudson et al. 1987); however, with the exception of TI-3, the remaining loci showed negative Tajima’s D.  This pattern is best explained by range expansion; nonetheless, more loci and explicit demographic modeling are required to further support this hypothesis. 6.4 Comparative nucleotide diversity in Populus I observed a wide range of nucleotide diversity across the three Populus species (P. trichocarpa, P. balsamifera, and P. tremula).  When compared with the Eurasian P. tremula, both P. trichocarpa and P. balsamifera showed lower nucleotide diversity, which probably reflects admixture in P. tremula from other related species in Europe (Lexer et al. 2005).  The discrepancy in nucleotide diversity in P. trichocarpa compared to P. tremula is may be due to   128 the former still undergoing bottleneck and expansion following ice sheet retreat (Chapter 2). Despite the fact that P. trichocarpa and P. balsamifera are considered as sister species (Eckenwalder 1984), I observed contrasting patterns of genetic variability, which was two-fold higher in the former compared to the latter.  This observation most probably associated with massive range expansion of P. balsamifera from a central admixed population, thus it is expected to harbour substantial genetic variability (Keller et al. 2010).  On the other hand, the lower genetic variability I observed in P. trichocarpa probably is due to its origin from two different glacial refugia (Chapter 2) followed by bottleneck events.  When both sister species are considered as a single species, the average nucleotided diversity was higher than that of individual species estimates, nonetheless, it was similar to that of P. balsamifera.  The level of LD decay was similar in both sister species and declined within 400 bp; however, no decay was apparent in P. tremula within this range.  I obtained information from orthologous sequences that allowed me to estimate the time of divergence of the three species, which extended my understanding of shared polymorphisms among the three species.  I found that the North American species had high sequence similarity and possibly shared a common ancestor approximately 0.76-0.83 million years ago. The incongruent nucleotide diversity estimates obtained for P. trichocarpa in Chapter 3 (i.e; πT = 0.00369 and θw = 0.01) and in Chapter 4 (i.e.; πT = 0.003 and θw = 0.0025) was expected.  On average, 97 trees/locus were sequenced in Chapter 3 compared to 19 in Chapter 4. It is expected that the level of nucleotide diversity be affected by sample size differences. Additionally, LD declined to r2 = 0.18 within <700 bp in P. trichocarpa in Chapter 3 compared to <400 bp in Chapter 4.  The differences in LD decay level, although happened at a short physical distance; is mainly due to the lower number of sites obtained across loci in Chapter 4 compared to that obtained in Chapter 3. 6.5 Association genetics model comparisons in P. trichocarpa I observed positive associations in 21 single nucleotide polymorphisms (SNPs) using a false discovery rate (FDR) of < 0.05 in five out of seven tested models.  The two models lacking the kinship matrix (K), Q-K and -Q-K, produced no associations.  Multiple SNPs within a candidate gene showed significant associations probably reflecting redundant associations (e.g., TI-3 for the -Q+K model) that were apparent from their existence in one haplotype block. Additionally, I observed a variable percentage of variance explained by these multiple   129 associations, probably reflecting SNP allelic frequencies.  I observed a contrasting pattern among the tested models, where complementary models produced opposite results.  For instance, GLM and -Q-K models are expected to be complementary since both lack population and family structures; nevertheless, they produced different results.  When I compared the models using mean square differences values (MSD) between observed and expected P-values, only GLM produced the lowest MSD value.  In general, the inclusion of the K matrix in any model considerably reduced the MSD.  The observed contradictory results across models call for the appropriate inclusion of the source of variation in the used model.  Finally, applying a strict significant threshold level such as 1 or 5% concurrently with strict (FDR) is better than the relaxed 10% for the determination of authentic association. 6.6 Limitations of the current study Studies of population structure provide comprehensive information on the amount and distribution of genetic variation within and among populations and require highly variable reliable markers that vary within species.  In my thesis, I conducted the first assessment of P. trichocarpa genetic diversity on a large scale (Chapter 2).  While I believe the number of SSRs loci used provided adequate information for detecting the species genetic diversity and population structure, more markers are probably required to capture most of the species genetic variability.  Nonetheless, in the case of highly variable SSRs (e.g., current study), the inclusion of more individuals along with sampling from non-sampled geographic gaps should also provide an appropriate estimate of species genetic diversity.  Additionally, sampling from the entire species range and specifically its extremes distribution (e.g.; southern Alaska and Pan-handle) would be beneficial in the verification of the proposed existence of two glacial refugia. Although I examined the application of different association genetic models using small sample size from natural populations (Chapter 5), it was adequate to highlight the the effect of the proper inclusion of source of variation in association studies.  It would be of great value to validate the the detected associations using larger sample size and possibly applying a simulation approach.  During the current study, the sampled trees were restricted to those available at Surrey location and produced enough tissue for DNA extraction.  For the difficulties in obtaining tissues from northern British Columbia trees at Surrey site, this part is less represented in the current study and need to be considered in future work.    130 6.7 Future research This study adds to the body of knowledge aimed at understanding the genetic variability in a model tree species where we have limited knowledge of the magnitude and extent of its genetic diversity.  Moreover, it highlights its population structure, one of the significant factors affecting association mapping studies.  As to prospective goals, it will be of interest to expand the sampling scheme to include populations of P. trichocarpa from the extremes of its range to enhance our understanding of its demographic history and post glacial migration.  Also, considering its fast LD decay, it will be worthwhile to conduct association mapping at a fine scale using a candidate gene approach focused on genes involved in fibre and wood formation as a potential alternative source of biofuel production.  Finally and most importantly, if related species are to be included in one association genetic study using genes of adaptation, this definitely will allow us to pinpoint the variants that are more conservative across species and potentially have adaptive roles that make them unchangeable across taxa, and not species specific.  This approch is, in part, similar to the admixture mapping where two parental populations from the same species are used to generate advanced generations.  This approach is known to be less sensitive to genetic heterogeneity (Darvasi and Shifman 2005) and requires less effort in genotyping, since fewer numbers of markers are required compared to LD mapping (Buurkle and Lexer 2008).  When expanding this concept to include closely related species rather than populations, it is possiple to detect “common associations” with less effort.  This probably could be achieved using cross-species genetic markers that reduce genotyping efforts, while nonetheless providing diverse phenotypic traits across species of interest.   131 6.8 References Buerkle, C. A., and C. Lexer. 2008. Admixture as the basis for genetic mapping. Trends in Ecol Evol 23:686-694. Chu, Y., X. Su, Q. Huang, et al. 2009. Patterns of DNA sequence variation at candidate gene loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms. Genetica 137:141-150. Darvasi, A., and S. Shifman. 2005. The beauty of admixture. Nat Genet 37:118-119. Eckenwalder, J. E. 1984. Natural intersectional hybridization between North American species of Populus (Salicacae) in sections Aigerios and Tacamahaca. II. Taxonomy. Can J Bot. 62:325-335. Eckenwalder, J. E. 1996. Systematics and evolution of Populus. Pp. 7-32 in R. F. Stettler, H. D. Jr. Bradshaw, P. E. Heilman, et al. (eds) Biology of Populus and its implications for management and conservation NRC Research Press, Ottawa, Ontario, Canada. Gilchrist, E. J., G. W. Haughn, C. C. Ying, et al. 2006. Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol 15:13671378. Gornall, J. L., and R. D. Guy. 2007. Geographic variation in ecophysiological traits of black cottonwood (Populus trichocarpa). Can J Bot 85:1202-1213. Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159. Ingvarsson, P. K. 2008. Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180:329-340. Jansson, S. 1999. A guide to the Lhc genes and their relatives in Arabidopsis. Trends Plant Sci 4:236-240. Jaramillo-Correa, J. P., J. Beaulieu, K. Damase, et al. 2009. Inferring the past from the present phylogeographic structure of North American forest trees: seeing the forest for the genes. Can J For Res 39:286-307. Keller, S. R., M. S. Olson, S. Silim, et al. 2010 Genomic diversity, population structure, and migration following rapid range expansion in the Balsam Poplar, Populus balsamifera. Mol Ecol 19:1212-122.   132 Lexer, C., M. F. Fay, J. A. Joseph, et al. 2005. Barrier to gene flow between two ecologically divergent Populus species, P. alba (white poplar) and P. tremula (European aspen): the role of ecology and life history in gene introgression. Mol Ecol 14:1045-1057. Olson, M. S., A. L. Robertson, N. Takebayashi, et al. 2010. Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytol 186:526-536 Rajora, O. P., and M. H. Rahman. 2003. Microsatellite DNA and RAPD fingerprinting, identification and genetic relationships of hybrid poplar (Populus × canadensis) cultivars. Theor Appl Genet 106:470-477. Slavov, G. T, and P. Zhelev. 2010. Salient biological features, systematics, and genetic variation of Populus. Pp 15-38 in S. Jansson, R. Bhalerao, A. T. Groover (eds) Genetics and genomics of Populus Springer, NY. Slavov, G. T., S. Leonardi, J. Burczyk, et al. 2009. Extensive pollen flow in two ecologically contrasting populations of Populus trichocarpa. Mol Ecol 18:357-373. Soltis, D. E., M. A. Gitzendanner, D. D. Strenge, et al. 1997. Chloroplast DNA intraspecific phylogeography of plants from the Pacific Northwest of North America. Pl Syst Evol 206:353-373. Xie, C-Y., C. C. Ying, A. D. Yanchuk, et al. 2009. Ecotypic mode of regional differentiation due to restricted gene migration: a case in black cottonwood (Populus trichocarpa) along the Pacific Northwest coast. Can J For Res 39:519-526. Zhang, L., and W. H. Li. 2004. Mammalian housekeeping genes evolve more slowly than tissue- specific genes. Mol Biol Evol 21:236-239. Zou, C., M. D. Lehti-Shiu, M. Thomashow, et al. 2009. Evolution of stress-regulated gene expression in duplicate genes of Arabidopsis thaliana. PLoS Genet 5:e1000581 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            data-media="{[{embed.selectedMedia}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0071044/manifest

Comment

Related Items