Open Collections

UBC Theses and Dissertations

Genetic evaluation of natural and domesticated lodgepole pine populations using molecular markers Liewlaksaneeyanawin, Cherdsak 2006

Full Text

GENETIC EVALUATION OF NATURAL AND DOMESTICATED LODGEPOLE PINE POPULATIONS USING MOLECULAR MARKERS

by

CHERDSAK LIEWLAKSANEEYANAWIN

B.Sc., Kasetsart University, 1994
M.Sc., The University of British Columbia, 2000

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Forestry)

THE UNIVERSITY OF BRITISH COLUMBIA

July 2006

© Cherdsak Liewlaksaneeyanawin, 2006

ABSTRACT

Lodgepole pine (Pinus contorta ssp. latifolia) has recently been subject to breeding and artificial propagation, and has also undergone migration and range expansion since the Pleistocene glaciation. This thesis studied the genetic effects of these processes, in terms of patterns of genetic diversity, sibship structure, and mating system. For these inferences, cross-species transfer of Pinus taeda microsatellite (SSR) markers provided a battery of 23 polymorphic microsatellite primer pairs for lodgepole pine, and amplified fragment length polymorphism (AFLP) markers were also developed. Genetic variability in natural and domesticated populations from interior British Columbia (Prince George breeding zone) was investigated with both SSRs and AFLPs. Changes from the natural population, to the breeding population, to the seed orchard, and finally to seed and seedling populations, were estimated. AFLPs and SSRs did not always reveal the same trends, except in portraying the genetic relationship among natural and domesticated populations. Overall, some reduction of genetic diversity was observed along the domestication process (2-10%), but only for some stages and depending on marker type. Two peripheral lodgepole pine populations, representing outcomes of the historical migration-expansion of this species, were sampled and genotyped for progeny arrays.
Estimation of mating system parameters revealed a high outcrossing rate and low correlated paternity in both populations, as well as the difficulties of using dominant AFLP markers for these inferences. Sibship analyses of SSRs supported low correlated paternity. Biparental inbreeding was significant in both populations and more pronounced in the northernmost population (compared to the eastern population), reflecting lower stand density and founder effects. A new characterization of genomic diversity, the "correlation of diversity between linked loci", was estimated in both populations, as well as in the central (Prince George) population. Significant correlations were observed in both peripheral populations, but not in the central population, suggesting range-expansion effects. As expected, higher correlations were observed for more closely linked loci. The correlation of diversity extended out to ca. 10 map units in both peripheral populations, much further than linkage disequilibrium can extend. While such correlations may be due to genetic drift, it is possible that the signature of selection may be obscured by demographic factors associated with mating system.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
CO-AUTHORSHIP STATEMENT

CHAPTER 1 INTRODUCTION AND LITERATURE REVIEW
1.1 INTRODUCTION
1.2 RESEARCH THEMES
1.3 LITERATURE REVIEW
1.3.1 Biology of lodgepole pine
1.3.1.1 Taxonomy and nomenclature
1.3.1.2 Geographic distribution
1.3.2 Microsatellite (SSR)
1.3.2.1 Microsatellites sourced from EST databases
1.3.2.2 Microsatellites sourced from related species
1.3.3 Amplified Fragment Length Polymorphism (AFLP)
1.3.3.1 AFLP development in trees
1.3.4 Analysis of SSR and AFLP data
1.3.4.1 Models of microsatellite mutations
1.3.4.2 Estimating population differentiation and genetic distance with SSR markers
1.3.4.3 Estimating gene diversity from AFLP markers
1.3.5 Applications of SSR and AFLP in forest genetics
1.3.5.1 Applications of SSR in forest genetics
1.3.5.2 Applications of AFLPs in forest genetics
1.3.6 Genetic effects of domestication in forest trees
1.3.6.1 History of lodgepole pine breeding programs
1.3.6.2 Genetic gain and diversity
1.3.7 Mating system, sibship structure, and informative genetic markers
1.3.8 Genetic variability within the genome
1.4 REFERENCES

CHAPTER 2 SINGLE-COPY, SPECIES-TRANSFERABLE MICROSATELLITE MARKERS DEVELOPED FROM LOBLOLLY PINE ESTS
2.1 INTRODUCTION
2.2 MATERIALS AND METHODS
2.2.1 Source of microsatellites
2.2.1.1 P. taeda EST databases
2.2.1.2 P. taeda microsatellites (PtTX series)
2.2.2 Cross-species transferability of microsatellites
2.2.3 DNA extraction and polymerase chain reaction optimization
2.2.4 Scoring and estimating genetic polymorphism
2.2.5 Sequence verification and comparison
2.2.6 Mendelian inheritance analysis of SSR markers
2.3 RESULTS
2.3.1 Frequency and distribution of microsatellites
2.3.2 SSR-marker development and cross species transferability
2.3.3 Sequence comparison at microsatellite loci
2.3.4 Mendelian inheritance analyses
2.4 DISCUSSION
2.4.1 Prevalence of microsatellites in ESTs
2.4.2 Transferability of P. taeda microsatellites
2.4.3 Analysis of sequence variation at the microsatellite loci
2.4.4 Functional roles of EST microsatellites
2.4.5 Implications for forest genetic research
2.5 REFERENCES

CHAPTER 3 GENETIC VARIABILITY OF LODGEPOLE PINE: COMPARISON OF MICROSATELLITE AND AFLP MARKERS
3.1 INTRODUCTION
3.2 MATERIALS AND METHODS
3.2.1 Plant materials
3.2.2 DNA extraction and procedures of SSR and AFLP
3.2.3 Detection of SSRs and AFLPs
3.2.4 Data analyses
3.2.4.1 Genetic diversity analysis
3.2.4.2 Genetic differentiation analysis
3.3 RESULTS
3.3.1 Genetic variability within populations
3.3.2 Genetic variability among populations
3.4 DISCUSSION
3.4.1 Level of diversity within populations
3.4.2 Genetic differentiation
3.4.3 Comparison of results from SSR and AFLP markers
3.5 REFERENCES

CHAPTER 4 IMPACT OF DOMESTICATION ON GENETIC VARIABILITY OF LODGEPOLE PINE: COMPARISON OF MICROSATELLITE AND AFLP MARKERS
4.1 INTRODUCTION
4.2 MATERIALS AND METHODS
4.2.1 Plant materials
4.2.1.1 Natural populations
4.2.1.2 Domesticated populations
4.2.2 DNA extraction and procedures of SSR and AFLP
4.2.3 Data analyses
4.2.3.1 Genetic diversity analysis
4.2.3.2 Genetic differentiation analyses
4.2.3.3 Genetic similarity analysis among genotypes in breeding population
4.3 RESULTS
4.3.1 Genetic diversity for breeding, orchard populations, seedlot, and seedling versus natural populations
4.3.2 Loss of alleles along domestication process
4.3.3 Genetic similarity among individuals in breeding and production populations
4.3.4 Population differentiation and dendrogram
4.4 DISCUSSION
4.4.1 Genetic diversity in natural vs. domesticated populations
4.4.2 Genetic relationships among individuals within breeding and production populations
4.4.3 Comparative analysis of genetic diversity and population differentiation
4.5 REFERENCES

CHAPTER 5 HIGH RESOLUTION ANALYSIS OF BIPARENTAL INBREEDING AND SIBSHIP STRUCTURE IN PERIPHERAL POPULATIONS OF LODGEPOLE PINE
5.1 INTRODUCTION
5.2 MATERIALS AND METHODS
5.2.1 Plant material
5.2.2 DNA extraction and SSR assay
5.2.3 Mating system analysis
5.2.4 Sibship reconstruction
5.3 RESULTS
5.3.1 Mendelian inheritance and gene diversity
5.3.2 Population estimates of the mating system
5.3.3 Individual family estimates of mating system parameters
5.3.4 Sibship structure of progeny arrays
5.4 DISCUSSION
5.4.1 Outcrossing rate
5.4.2 Biparental inbreeding and spatial genetic structure
5.4.3 Correlated paternity and sibship structure within seed progeny
5.5 REFERENCES

CHAPTER 6 THE UTILITY OF AMPLIFIED FRAGMENT LENGTH POLYMORPHISMS (AFLPS) FOR MATING SYSTEM ESTIMATION
6.1 INTRODUCTION
6.2 MATERIAL AND METHODS
6.2.1 Plant materials and DNA isolation
6.2.2 SSR and AFLP analyses
6.2.3 Mating system analysis
6.3 RESULTS
6.3.1 Polymorphism of AFLP markers
6.3.2 Multilocus (tm) and single (ts) locus outcrossing rates
6.3.3 Biparental inbreeding
6.3.4 Correlated paternity
6.3.5 Correlation of outcrossing rate within families
6.4 DISCUSSION
6.4.1 Mating system in peripheral populations
6.4.2 Comparative study of mating system using SSRs and AFLPs
6.4.3 Future uses of microsatellites and AFLPs for mating system analysis in forest trees
6.5 REFERENCES

CHAPTER 7 THE CORRELATION OF GENE DIVERSITY AT LINKED LOCI IN PERIPHERAL VS. CENTRAL LODGEPOLE PINE POPULATIONS
7.1 INTRODUCTION
7.2 MATERIAL AND METHODS
7.2.1 Plant materials
7.2.2 DNA isolation and AFLP analysis
7.2.3 Data analyses
7.3 RESULTS
7.3.1 AFLP polymorphisms
7.3.2 Correlation of gene diversity at linked loci
7.4 DISCUSSION
7.4.1 Correlation of gene diversity at linked loci among lodgepole pine populations
7.4.2 Decline in patterns of gene diversity between linked loci
7.4.3 Reduction in gene diversity at linked loci
7.5 REFERENCES

CHAPTER 8 CONCLUSIONS
8.1 MARKER DEVELOPMENT IN A CONIFER
8.2 EFFECTS OF DOMESTICATION
8.3 MATING SYSTEM AND SIBSHIP STRUCTURE
8.4 CORRELATION OF GENE DIVERSITY AT LINKED LOCI
8.5 MARKER CHOICE
8.6 REFERENCES

APPENDIX

LIST OF TABLES

Table 2.1 Primer sets of 14 P. taeda L. EST-SSR markers
Table 2.2 Transferable microsatellite loci from P. taeda L. to P. contorta ssp. latifolia
Table 2.3 Allele size (in bp) and number of alleles (A) of seven EST-SSR loci in four different pine species
Table 2.4 Polymorphisms, indels (insertions/deletions), and base substitution (BS) in flanking regions (relative to P. taeda sequences) in P. contorta ssp. latifolia at eight EST-SSR loci and 16 polymorphic PtTX microsatellite loci that were successful in cross-species amplification
Table 2.5 Transferability successes and polymorphisms of SSR markers from P. taeda to P. contorta ssp. latifolia based on the sources of library.
A one-way ANOVA indicated no significant difference (P = 0.17)
Table 2.6 Repeat structures of SSRs and indels (insertions/deletions) in the flanking regions (relative to P. taeda sequences) between four pine species at seven EST-SSR loci
Table 2.7 The percentage of base substitution in the flanking regions between species
Table 2.8 Log-likelihood G test on segregation ratios of 19 microsatellite loci in P. contorta ssp. latifolia seeds
Table 3.1 Location of the 10 P. contorta natural populations from Prince George, BC
Table 3.2 Sequences of primers and adaptors used for amplified fragment length polymorphisms analysis
Table 3.3 Estimates of genetic diversity for each microsatellite locus over the 10 P. contorta natural populations
Table 3.4 Genetic variation at SSR and AFLP loci for the 10 P. contorta populations
Table 3.5 Spearman rank correlation coefficients (r) showing the association of different diversity measures among populations
Table 3.6 Estimates of genetic diversity and population structure for each microsatellite locus over the 10 P. contorta natural populations
Table 3.7 Pairwise comparison matrix of FST estimates for SSRs (below diagonal) and for AFLPs (above diagonal) from 10 natural populations of P. contorta ssp. latifolia
Table 3.8 Analysis of molecular variance results based on the number of different alleles (FST) from the 10 natural populations of P. contorta ssp. latifolia with SSRs and AFLPs
Table 3.9 Genetic variability in P. contorta ssp. latifolia from isozyme and DNA marker analyses
Table 3.10 Comparative analysis of genetic diversity at population (P), region (R), and taxa (T) level as well as population differentiation in natural plant populations using SSR and AFLP markers
Table 4.1 Genetic diversity parameters for natural versus domesticated populations
Table 4.2 Distribution of microsatellite alleles over allele frequency classes along the domestication process (rare: P < 0.01, low: 0.01 < P < 0.25, intermediate: 0.25 < P < 0.75, high: P > 0.75)
Table 4.3 Allele losses along the domestication process (compared to natural populations)
Table 4.4 Pairwise comparison matrix of FST estimates for SSRs (below diagonal) and for AFLPs (above diagonal) among the five tested populations of P. contorta ssp. latifolia
Table 5.1 Allelic diversity at 11 microsatellite loci in two natural populations of P. contorta ssp. latifolia
Table 5.2 Population estimates of multilocus (tm) and single-locus (ts) outcrossing rate, parental inbreeding coefficient (F), multilocus (rp(m)) and single-locus (rp(s)) correlation of paternity, correlation of selfing between loci (rsi), and correlation of t between individuals within families (rt) for P. contorta ssp. latifolia from Carbondale and Whitehorse populations, as estimated using 11 microsatellite loci
Table 5.3 Estimates of multilocus (tm) and single-locus (ts) outcrossing rate, and multilocus (rp(m)) and single-locus (rp(s)) correlation of paternity at individual family level for P. contorta ssp. latifolia
Table 5.4 Numbers of full-sib groups and correlation of paternity for each family in P. contorta ssp. latifolia
Table 5.5 Levels of correlated paternity in natural populations of tree species (adapted from Hardy et al. 2004)
Table 6.1 Sequences of primers and adaptors used for amplified fragment length polymorphisms analysis
Table 6.2 Population estimates of multilocus (tm) and single-locus (ts) outcrossing rate, multilocus (rp(m)) and single-locus (rp(s)) correlation of paternity, correlation of selfing between loci (rsi), and correlation of t estimates (rt) for P. contorta ssp. latifolia from Whitehorse and Carbondale populations estimated using SSR and AFLP primer combinations
Table 7.1 Number of polymorphic AFLP loci observed from AFLP primer combinations

LIST OF FIGURES

Figure 1.1 Phylogeny of 18 species in subgenus Pinus based on an analysis of chloroplast DNA restriction site mutations. Percentages from 2,000 bootstrap replications are shown within circles at nodes (Krupkin et al. 1996)
Figure 1.2 Natural distribution of Pinus contorta (Little 1971)
Figure 1.3 Relationship between genetic gain and diversity on the pyramid of improvement in tree breeding program (after Johnson et al. 2001)
Figure 1.4 A diagram showing tree improvement delivery system with its associated activities (from El-Kassaby 2000a)
Figure 2.1 Distribution of the di- and trinucleotide SSR-ESTs. The frequencies of SSRs observed within each repeat class (%) are shown in parentheses
Figure 2.2 Nucleotide sequence comparison of two P. taeda microsatellite loci (LOP8 and PtTX3030). The dots indicate conserved nucleotides (relative to P. taeda)
Figure 2.3 An example of inheritance at microsatellite locus PtTX3011 from an open-pollinated half-sib family of P. contorta ssp. latifolia. Arrows indicate the segregation of maternal alleles, confirming the mode of Mendelian inheritance
Figure 2.4 (a) Transferability of P. taeda EST-SSR at locus LOP1 on different pine species; (b) an example of cross transferability of microsatellite markers from P. taeda to P. contorta at one EST-SSR marker (PtTX2146) and two PtTX markers (PtTX3034 and PtTX3107).
Note that all allele sizes include Licor primer tails
Figure 3.1 Allelic distribution for the 12 microsatellite loci over the studied 10 P. contorta ssp. latifolia populations
Figure 3.2 Dendrograms generated by neighbour-joining showing the genetic relationships among 10 natural populations, based on (a) SSR data and Nei's genetic distance; (b) SSR data and (δμ)² distance; (c) SSR data and Dc distance; (d) AFLP data and Nei's genetic distance. The numbers on the branches are the percentage support of 1,000 bootstrap replications
Figure 4.1 Location of the breeding population parent trees and seed planning zones. Dashed areas represent seed zone overlaps
Figure 4.2 Number of lost alleles in the domesticated populations according to their frequency (copy number) in the overall sample of 300 individuals from the 10 natural populations in P. contorta ssp. latifolia. A copy number of 5 or less corresponds to a rare frequency class (P < 0.01)
Figure 4.3 UPGMA dendrogram of the 92 trees from Prince George breeding population in P. contorta ssp. latifolia based on AFLP data using Jaccard's genetic similarity matrix. Bold and italic indicate seed orchard parents
Figure 4.4 Dendrograms generated by UPGMA clustering showing the genetic relationships among populations in the domestication process, based on (a) SSR data and Nei's genetic distance and (b) AFLP data and Nei's genetic distance. The numbers on the branches are the percentage support of 1,000 bootstrap replications
Figure 5.1 The distribution of full-sib group size for each family for Carbondale (a) and Whitehorse (b) P. contorta ssp. latifolia populations
Figure 5.2 Relationship between correlated paternity and variances of full-sib group size
Figure 6.1 AFLP profiles of 4 open-pollinated families with 10 individuals each using AFLP primer combination PstI+CAG/MseI+CGG in P. contorta ssp. latifolia
Figure 7.1 The distribution of pairwise recombination rates between loci in Whitehorse (a) and Carbondale (b) populations
Figure 7.2 The correlation of gene diversity between pairs of markers, plotted against estimates of pairwise recombination rates in Whitehorse (a), Carbondale (b), and the 10 Prince George (c) populations (NS: not significant; *: significant at P < 0.05; **: significant at P < 0.01; ***: significant at P < 0.001)
Figure 7.3 The average gene diversity between pairs of markers, plotted against estimates of pairwise recombination rates in Whitehorse (a), Carbondale (b) and the 10 Prince George (c) populations
Figure 7.4 Decline in correlation of gene diversity with increasing recombination rates evaluated by polynomial regression analysis
Figure 7.5 Scatter plots showing no reductions in heterozygosities between pairs of closely linked loci (r < 0.05)

ACKNOWLEDGEMENTS

I am very grateful to my supervisory committee. Dr. Kermit Ritland, my supervisor, provided me with a research assistantship and research advice. I thank Dr. Yousry El-Kassaby for his wealth of knowledge on forest tree domestication and for kindly serving on my research committee, and Dr. Carol Ritland, who significantly contributed to my knowledge of molecular markers, for her consistent encouragement, patience, and moral support during my study. I would like to thank Andy Benowicz for helping with my sample collection. I also would like to thank the past and present members of the Ritland lab and Genetic Data Centre for providing an excellent intellectual environment and friendship: Dilara Ally, Lisa O'Connell, Marissa LeBlanc, Jaclyn Beland, Jennifer Wilkin, Allyson Miscampbell, Mohammed Iddrisu, Charles Chen, Yanik Berube, and Hugh Wellman. My thanks to Dr. Washington Gapare, Dr. Tanya Wahbe, Dr. Nazip Suratman, and Dr. Xin-sheng Hu for their help, discussions, suggestions, and friendship. I am indebted to all Thai students at UBC for their friendship.
I am sincerely grateful to my former boss, Dr. Kowit Chaisurisri, who has taught, supported, and encouraged me to study at UBC. Without his help and understanding, my life and career would not have gone this far. I acknowledge the financial support from the Faculty of Graduate Studies (University of British Columbia) for the partial tuition fee awards and from the Faculty of Forestry for the Donald S. McPhee fellowships. Finally, I would like to acknowledge my parents, my sisters and brother, and my cousins for listening, understanding, and encouraging me during my study and beyond. I dedicate this thesis to all of them.

CO-AUTHORSHIP STATEMENT

Chapter 2 is a revised version of the following paper: Liewlaksaneeyanawin C., Ritland C., El-Kassaby Y.A., and Ritland K. 2004. Single-copy, species-transferable microsatellite markers developed from loblolly pine ESTs. Theoretical and Applied Genetics 109: 361-369. For this study, Carol Ritland supervised all the steps, from optimizing the protocol for primer testing to sequencing of SSR products, and edited the paper. I conducted the majority of the lab work and wrote the paper. Yousry A. El-Kassaby edited the paper. Kermit Ritland supervised the study and the analyses, and edited the manuscript.

Kermit Ritland
Yousry A. El-Kassaby
Carol Ritland

CHAPTER 1 INTRODUCTION AND LITERATURE REVIEW

1.1 Introduction

Increased knowledge of forest tree genetics helps in conserving and managing our forests. Many aspects of forest genetics, such as levels and distribution of genetic diversity, mating system dynamics, and genetic structure at the species, population, individual, and even chromosome levels, have been and still are under extensive investigation for many tree species. Information on the extent and distribution of genetic diversity is a prerequisite for the effective management of forest genetic resources.
It is well known that many conifer species are characterized by high levels of genetic variation and cover extensive natural and plantation ranges. Life-history characteristics such as longevity, predominant outcrossing, wide seed and pollen dispersal, high fecundity, and late succession are strongly associated with the high levels of genetic diversity observed in forest tree populations (Hamrick et al. 1992). This high level of genetic variation is needed for the maintenance of high adaptability under challenging and unpredictable environmental conditions and for developing rational and effective conservation and tree improvement strategies.

Molecular markers have allowed us to understand how various evolutionary forces (e.g., drift, selection, recombination, mutation, and gene flow) influence the patterns of genetic diversity in natural populations, and therefore help us develop suitable conservation and improvement strategies. For example, by comparing genetic variability between natural and breeding and/or production populations, the effects of domestication can be assessed. In the past two decades, the advent of numerous types of DNA markers (e.g., RFLPs, RAPDs, AFLPs, SSRs, and SNPs) in forest trees has allowed us to: 1) understand genome organization and evolution, 2) develop genetic studies on various economic traits and help breeding programs through the development of marker-assisted selection and breeding, 3) develop effective conservation strategies, and 4) obtain unbiased estimates of genetic parameters for natural and artificial populations. All of this knowledge has the potential to guide efforts for forest conservation and to increase the efficiency of methods for tree breeding. The recent advances in molecular biology, specifically the introduction of many types of molecular markers, have also made it difficult to select the appropriate marker for a specific biological question (Schlotterer 2004).

Lodgepole pine (Pinus contorta ssp. latifolia) is the most economically important pine species in British Columbia. Isozyme studies of its genetic diversity revealed that the species possesses low population differentiation but high within-population diversity, the latter accounting for more than 90% of the total, as reported by Yeh & Layton (1979), Dancik & Yeh (1983), and Wheeler & Guries (1982). In addition, isozyme studies demonstrated that northern, peripheral populations are more differentiated and harbour less allelic diversity than central populations. Similarly, a decrease of genetic variability in marginal populations relative to central populations was reported by Fazekas & Yeh (2001) using RAPDs. MacDonald & Cwynar (1985) proposed that the differences in variation between central and marginal populations may arise from the process of postglacial migration across the landscape. Repeated long-distance founding events during postglacial spread may have caused the observed reduction of allelic diversity in marginal populations. Using multilocus analysis, Fazekas & Yeh (2001) also suggested that the founder effect and the multilocus Wahlund effect are the prominent forces contributing to the genetic structure of marginal and central populations, respectively.

At present, there are many reports on population history, genetic diversity, and mating system in natural populations of lodgepole pine. Also, population variation in growth, pest resistance, phenological cycle, and frost hardiness has been extensively studied. However, there is no information on genetic variation in domesticated populations. Little is known about the sibling relationships within seed arrays of an individual lodgepole pine tree's progeny, or about the correlation of gene diversity at linked loci. As well, studies of natural variation have not used modern molecular markers such as SSRs and AFLPs, nor have they taken any innovative genomics approaches.
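
As a rough illustration of the diversity partitioning referred to above (low differentiation among populations, with more than 90% of the diversity residing within populations), the sketch below computes Nei's gene diversity, H = 1 - sum(p_i^2), and a G_ST-type estimate of F_ST = (H_T - H_S) / H_T for a single locus. The allele frequencies, the function names, and the equal weighting of populations are assumptions made purely for illustration; they are not data or methods taken from the studies cited.

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity for one locus: H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """G_ST-type F_ST = (H_T - H_S) / H_T for one locus.

    pop_freqs is a list of per-population allele-frequency lists;
    populations are weighted equally (an assumption of this sketch).
    """
    n_pops = len(pop_freqs)
    # H_S: mean within-population gene diversity
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # H_T: gene diversity of the pooled (mean) allele frequencies
    n_alleles = len(pop_freqs[0])
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t


# Hypothetical two-allele locus in two populations (made-up frequencies)
populations = [[0.6, 0.4], [0.4, 0.6]]
value = fst(populations)
print(f"F_ST = {value:.3f}; within-population share = {1 - value:.1%}")
```

With these made-up frequencies, H_S = 0.48 and H_T = 0.50, so F_ST = 0.04 and about 96% of the gene diversity lies within populations, which is the kind of pattern the isozyme studies describe.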

1.2 Research themes

With regard to the above considerations, there are three major research themes in this thesis. These themes thread through most chapters and serve to unify the topics of this thesis.

Theme 1: What are the properties and attributes of modern molecular genetic markers? I consider the costs vs. benefits of SSRs vs. AFLPs. Included in this are the development of EST-SSR markers (SSRs found in EST databases) and the testing of cross-species transferability of EST-SSR markers in other pine species. Also included is the relative statistical information provided by each marker class about levels of diversity and mating system parameters, and any statistical bias introduced by these markers.

Theme 2: How are lodgepole pines influenced by human domestication? This research theme examined the impacts of tree domestication on genetic diversity to ensure that seeds or seedlings produced from seed orchards, which are currently the primary means of delivering improved planting stock for Crown land reforestation, contain sufficient genetic diversity.

Theme 3: How are lodgepole pines influenced by historical population processes? This research theme explored the influences of historical population processes (i.e., repeated founder effects or bottlenecks during population expansions) on mating system and patterns of gene diversity along chromosomes, by implementing a new procedure called "diversity mapping". This may provide more insight into the evolutionary history of lodgepole pine populations.

These themes are implemented in the following topics, whose aims were to:
1) develop microsatellite markers from loblolly pine EST databases and test their ability to cross-amplify from loblolly pine to lodgepole pine, and also develop AFLP markers in lodgepole pine,
2) establish baseline genetic variation of the species' central distribution in B.C.
using populations sampled from the Prince George Seed-Planning Zone, with microsatellite and AFLP markers,
3) evaluate the effects of the domestication process (natural populations → breeding population → production population (seed orchard) → seed → seedling) on the species' genetic variability using microsatellite and AFLP markers,
4) investigate the mating system and sibling relationships within seed arrays from peripheral populations,
5) test whether both types of marker (SSR and AFLP) give the same estimates of population diversity, differentiation, and mating system, and
6) examine the correlation of gene diversity between linked loci based on multiple half-sib families using AFLP markers.

1.3 Literature Review

1.3.1 Biology of lodgepole pine

1.3.1.1 Taxonomy and nomenclature

Lodgepole pine belongs to the family Pinaceae and the genus Pinus, a large genus containing over 110 species (Richardson & Rundel 1998). An analysis of chloroplast DNA restriction site mutations from 18 species in subgenus Pinus revealed that P. contorta (subsection Contortae) is a sister group to all of the other North American pines except for P. resinosa, which was allied with four Eurasian species (Fig. 1.1) (Krupkin et al. 1996). Lodgepole pine has been divided geographically into four subspecies: 1) P. contorta ssp. contorta (shore pine), 2) P. contorta ssp. murrayana (tamarack pine), 3) P. contorta ssp. latifolia (Rocky Mountain lodgepole pine), and 4) P. contorta ssp. bolanderi (Bolander pine) (Wheeler & Critchfield 1985).

1.3.1.2 Geographic distribution

Lodgepole pine is widespread throughout western North America, covering some 6 million ha in the USA and 20 million ha in Canada. Its latitudinal range extends from 31°N in Baja California to 64°N in the Yukon, and its longitudinal range from the Pacific Coast to the Black Hills of South Dakota (Fig. 1.2; Little 1971).
In the interior, it can be found at elevations from 490 to 3660 m. MacDonald & Cwynar (1985) studied fossil pollen in western interior Canada from the late Pleistocene to the Holocene and hypothesized that ssp. latifolia migrated northwards along the Rocky Mountains from refugia located south of the continental glacial limits, reaching central eastern British Columbia by 5000 BP. The species reached the Yukon border region by 8000 BP and extended to its contemporary northern range limit in the Yukon during the past thousand years.

1.3.1.3 Botanical features

P. contorta ssp. latifolia is an aggressive pioneer species owing to its production of cones from an early age, small seed size, high dispersal ability, and relatively rapid juvenile growth. Seed-cones usually mature during August-October, one year following pollination. The cones are most often serotinous and remain firmly cemented until exposure to fire. Lodgepole pine produces only 10-12 seeds per cone, but it commonly produces abundant cones, and good seed crops occur at one- to three-year intervals. The seeds are small and retain their viability for many years inside the serotinous cones. Seeds are about 4-5 mm long, with a wing of about 10-15 mm. Seedlings have few cotyledons and juvenile growth is rapid. The subspecies latifolia is the most commercially important of the four subspecies, particularly for its ability to provide multiple products such as veneer, lumber, poles, railroad ties, pulp and paper.

1.3.2 Microsatellite (SSR)

Microsatellites are short, tandemly repeated simple units (one to ten base pairs in length, but normally less than 5 base pairs) composed of di-, tri- or tetranucleotide repeats such as (AC)n, (AAT)n or (GATA)n.
They may be further classified by the specific composition of their core sequence as perfect (pure; CACACACACACACACA), imperfect (interrupted; CACATTCACACATTCA), or compound repeats (CACACACAGAGAGAGA) (Jarne & Lagoda 1996). They are highly polymorphic and codominant, providing the greatest information for population genetic studies. The main disadvantage of microsatellites is that identifying primer regions from a genomic library for a new species can be labour and time intensive. A better source of SSRs, containing more conserved primer regions, is the expressed sequence tag (EST) databases of the target species or of closely related species.

1.3.2.1 Microsatellites sourced from EST databases

Expressed sequence tags (ESTs) are small pieces of DNA sequence (usually 200 to 500 nucleotides long) that are generated from complementary DNA (cDNA) libraries by sequencing either one or both ends of an expressed gene. Sequencing only the beginning portion of the cDNA produces a 5' EST. These are portions of transcripts that usually code for proteins. These regions tend to be conserved across species and do not change much within a gene family. Sequencing the end portion of the cDNA molecule produces a 3' EST. Because these ESTs are generated from the 3' end of a transcript, they are likely to fall within non-coding, or untranslated, regions (UTRs), and therefore tend to exhibit less cross-species conservation than 5' ESTs. The ease of identifying genes from ESTs varies among organisms and depends on genome size as well as on the presence or absence of introns, the intervening DNA sequences interrupting the protein coding sequence of a gene. Pine EST sequences were obtained from three cDNA libraries: one prepared from seedlings, one from developing phloem (Kinlaw et al. 1996), and the third from developing xylem (Allona et al. 1998).
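
To make the repeat classes above concrete, and to suggest how SSRs might be screened for in EST sequences, the following is a minimal sketch of a regular-expression search for perfect di-, tri-, and tetranucleotide repeats. The function names and the minimum copy-number thresholds are assumptions chosen for illustration, not the criteria used in this thesis, and the sketch does not attempt to recognise imperfect (interrupted) repeats.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies
DEFAULT_MIN_REPEATS = {2: 6, 3: 4, 4: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. CACA)."""
    n = len(motif)
    return not any(motif == motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_perfect_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    min_repeats = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in min_repeats.items():
        # ([ACGT]{k}) captures a motif; \1{n,} demands n further tandem copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


def classify(seq, min_repeats=None):
    """Label a sequence 'compound' if two different perfect repeats abut."""
    hits = find_perfect_ssrs(seq, min_repeats)
    if not hits:
        return "none"
    for (s1, m1, c1), (s2, m2, _) in zip(hits, hits[1:]):
        if s1 + len(m1) * c1 == s2 and m1 != m2:
            return "compound"
    return "perfect"


# The three example sequences quoted in the text
print(find_perfect_ssrs("CACACACACACACACA"))   # perfect (CA)8
print(find_perfect_ssrs("CACATTCACACATTCA"))   # interrupted: no perfect run found
print(classify("CACACACAGAGAGAGA", {2: 4}))    # (CA)4(GA)4 with a lowered threshold
```

The compound example only registers when the dinucleotide threshold is lowered to four copies, which shows how strongly the reported SSR frequency depends on the search criteria chosen.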
The ESTs from the random-primed seedling library and the oligo-dT-primed phloem library were found to have BLAST matches to sequences in public databases at frequencies of 43 and 19 percent, respectively. Although a large fraction of ESTs were not annotated, nearly 10 percent of the total ESTs from the seedling library matched Rubisco (ribulose bisphosphate carboxylase/oxygenase) small subunit and light-harvesting complex mRNAs. These genes are related to photosynthesis in light-germinating seedlings. Nearly 10 percent of the total ESTs from the xylem library showed similarity to known cell wall biosynthesis enzymes (Allona et al. 1998). As of 06 Jun 2005, the loblolly pine genome project had sequenced and submitted more than 296,433 ESTs from different cDNA libraries to GenBank (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html), representing more than 10,000 unique genes (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene).

1.3.2.2 Microsatellites sourced from related species

Cross-amplification of microsatellites is a desirable approach for developing microsatellites in a related species. The transfer of SSRs between related species has been reported in several pine species. Echt et al. (1999) reported that dinucleotide microsatellites of P. strobus L. amplified SSR loci in soft pines of the subgenus Strobus, but not in hard pines of the subgenus Pinus. In contrast, they found that dinucleotide microsatellites from P. radiata D. Don amplified SSR loci in various hard pines, but not in soft pines. However, Kutil & Williams (2001) found that seven of 15 trinucleotide microsatellites from a hard pine (P. taeda L.) showed trans-specific amplification in both hard (P. palustris Mill., P. echinata Mill., P. radiata, P. patula Schiede et Deppe, P. halepensis Mill., P. kesiya Royle) and soft (P. strobus) pines. Shepherd et al. (2002) also reported that microsatellites designed in P. strobus, P. radiata, and P.
taeda were useful in hard pine species including P. elliottii var. elliottii and P. caribaea var. hondurensis. In addition, Gonzalez-Martinez et al. (2004) reported successful microsatellite transfer from P. taeda and P. sylvestris L. to seven Eurasian hard pine species (P. uncinata Ram., P. sylvestris L., P. nigra Arn., P. pinaster Ait., P. halepensis Mill., P. pinea L., and P. canariensis Sm.). The caveat with transferred microsatellites is that heterologous primers are more likely to produce products of unexpected size, or products of the expected size that are not SSRs, thus requiring verification of amplification products by hybridization or sequencing (Westman & Kresovich 1998). However, comparing the sequences of orthologous loci in different species can provide information on the birth (Messier et al. 1996) and death (Taylor et al. 1999) of microsatellites.

1.3.3 Amplified Fragment Length Polymorphism (AFLP)

The amplified fragment length polymorphism technique, which yields dominant markers, combines restriction fragment length polymorphism (RFLP) and polymerase chain reaction (PCR) analysis, and was developed by Vos et al. (1995). The AFLP procedure consists of several steps. Firstly, genomic DNA is digested with two restriction enzymes, usually an infrequent (rare) cutter and a frequent cutter, such as EcoRI and MseI. Secondly, the fragments are ligated to adaptors, which are short segments of double-stranded DNA with sticky ends complementary to the restriction sites. Thirdly, the fragments are amplified using specific primers, which are based on the combination of adaptor sequence, restriction site, and selective nucleotides. Subsequently, the amplified fragments are electrophoresed on polyacrylamide gels and scored as presence or absence of bands.
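The final scoring step can be sketched in Python as follows. This is a minimal illustration, assuming that each tree's AFLP profile has already been reduced to a list of fragment sizes in base pairs and that fragments of identical size represent the same locus; the function names, tree labels, and sizes are hypothetical.

```python
def score_bands(profiles):
    """Convert fragment-size profiles into a presence/absence (1/0) matrix.

    profiles: dict mapping individual -> iterable of fragment sizes (bp).
    Returns (loci, matrix), where loci is the sorted list of observed sizes
    and matrix maps each individual to a list of 0/1 scores, one per locus.
    """
    loci = sorted({size for sizes in profiles.values() for size in sizes})
    matrix = {}
    for individual, sizes in profiles.items():
        present = set(sizes)
        matrix[individual] = [1 if locus in present else 0 for locus in loci]
    return loci, matrix


def percent_polymorphic(matrix):
    """Percentage of loci at which both band presence and absence are observed."""
    rows = list(matrix.values())
    n_loci = len(rows[0])
    polymorphic = sum(1 for j in range(n_loci) if len({row[j] for row in rows}) > 1)
    return 100.0 * polymorphic / n_loci


if __name__ == "__main__":
    # Hypothetical fragment sizes for three trees.
    profiles = {"tree1": [102, 150, 231], "tree2": [102, 231, 310], "tree3": [102, 150]}
    loci, matrix = score_bands(profiles)
    print(loci)
    print(matrix)
    print(percent_polymorphic(matrix))
```

In practice, band sizes from a gel or capillary run would need to be binned with a size tolerance before scoring; exact matching is used here only to keep the sketch short.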
1.3.3.1 AFLP development in trees

AFLP protocols for genome analysis in trees of different genome sizes, such as peach (Prunus spp.), poplar (Populus spp.), eucalypt (Eucalyptus spp.), oaks (Quercus spp.), and pine (Pinus taeda L.), have been successful in providing informative and reproducible AFLP fingerprints (Cervera et al. 2000). Primer selection for the pre-amplification and/or selective PCR amplification steps is discussed by Cervera et al. (2000). For P. taeda L., a pre-amplification with EcoRI + 2 - MseI + 2 followed by a selective amplification with EcoRI + 3 - MseI + 4 or with EcoRI + 3 - MseI + 5 provided 100-130 amplified fragments per primer combination. Conifers are well known to have large amounts of repetitive DNA, with genome sizes ranging from 1.5 to 2.5 x 10^10 bp. The high complexity of conifer genomes may result in highly complex banding patterns in AFLP analyses. Paglia & Morgante (1998) found that the EcoRI/MseI enzyme combination produced more complex AFLP patterns than did the PstI/MseI enzyme combination in Norway spruce (Picea abies K.). A pre-amplification with PstI + 1 - MseI + 1 followed by PstI + 3 - MseI + 3 produced clear banding patterns, with the number of amplified fragments ranging from 20 to 150. Costa et al. (2000) suggested that the AFLP technique was twice as fast and produced more polymorphic loci per marker than the RAPD technique.

1.3.4 Analysis of SSR and AFLP data

Determining how much genetic diversity is present within and among species and populations is important in evolutionary and conservation biology. With advances in both marker methods and statistical inference, reliable estimators of genetic parameters have been developed for both dominant and codominant markers.
1.3.4.1 Models of microsatellite mutations

Understanding the mutation model underlying microsatellite evolution is a prerequisite for population genetic analyses because estimates of population parameters depend on the mutation model assumed. Four mutation models have been proposed for estimating population parameters: 1) the infinite allele model (IAM; Kimura & Crow 1964), in which a mutation involves any number of tandem repeats and always creates an allele not previously encountered in the population; 2) the K-allele model (KAM; Crow & Kimura 1970), in which the number of possible alleles is K and any allele has the same probability [u/(K-1)] of mutating to any of the other (K-1) allelic states; 3) the stepwise mutation model (SMM; Kimura & Ohta 1978), in which each mutation creates a novel allele by the loss or gain of a single repeat; and 4) the two-phase model (TPM; Di Rienzo et al. 1994), in which mutations increase or decrease allele size by X repeats. Three of these mutation models have been considered for microsatellite loci: the IAM, SMM, and TPM.

1.3.4.2 Estimating population differentiation and genetic distance with SSR markers

Estimates of genetic differentiation [e.g., FST (Wright 1965), GST (Nei 1973), and θ (Weir & Cockerham 1984)] and genetic distance [e.g., Nei's unbiased distance (Nei 1978)] are based on the infinite allele (IAM) or K-allele (KAM) models. Microsatellites, however, are thought to evolve by a stepwise mutation process (Schlotterer & Tautz 1992). The IAM is therefore not well suited to microsatellite analyses, where most mutations involve adding or deleting a small number of repeat units. Statistics based on models that take into account some features of microsatellite evolution have been proposed: RST (Slatkin 1995), a measure of genetic differentiation analogous to FST, and (δμ)² (Goldstein et al. 1995a; 1995b), a genetic distance measure.
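To make the idea of a size-based statistic concrete, the sketch below computes a (δμ)²-type distance as the squared difference in mean allele size (in repeat units) between two populations, averaged over loci. This is a simplified illustration under stated assumptions: the function name, the data layout (a dictionary of allele-size lists per locus), and the example values are hypothetical, and no sampling correction is applied.

```python
from statistics import mean


def delta_mu_squared(pop_a, pop_b):
    """Average over shared loci of the squared difference in mean allele size.

    pop_a, pop_b: dict mapping locus name -> list of allele sizes in repeat
    units (two entries per diploid individual).
    """
    shared_loci = sorted(set(pop_a) & set(pop_b))
    if not shared_loci:
        raise ValueError("the two populations share no loci")
    per_locus = [
        (mean(pop_a[locus]) - mean(pop_b[locus])) ** 2 for locus in shared_loci
    ]
    return mean(per_locus)


if __name__ == "__main__":
    # Hypothetical allele sizes (repeat counts) at two loci.
    pop_a = {"L1": [10, 12, 11, 11], "L2": [20, 20, 22, 22]}
    pop_b = {"L1": [13, 13, 14, 12], "L2": [21, 21, 21, 21]}
    print(delta_mu_squared(pop_a, pop_b))
```

Unlike allele-identity statistics such as FST, this quantity uses the size difference between alleles, which is the information a stepwise mutation process is expected to preserve.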
RST and (δμ)² are appropriate for microsatellite loci under the stepwise mutation model with no constraint on allele size (i.e., mean mutation change is zero and there is no limit on the number of possible allelic states/allele size). Slatkin (1995) demonstrated that FST provides biased estimates of the level of genetic differentiation when applied to microsatellite data; however, FST can be used when the time scales of interest are short, on the order of tens or hundreds of generations. Goldstein et al. (1995a) also suggested that Nei's distance is not linear for microsatellites over extended periods of time.

1.3.4.3 Estimating gene diversity from AFLP markers

Data analysis for AFLP and other dominant markers presents the difficulty of estimating allele frequencies for diploid species. With dominant markers, individuals heterozygous for a band at a specific locus cannot be distinguished from individuals homozygous for the band at that locus. This increases the variance of estimates and also introduces potential biases in estimating genetic diversity and population differentiation. To estimate null allele frequency at dominant loci, three methods have been proposed: the square root transformation of the null homozygote frequency, the Lynch & Milligan (LM) procedure, and Bayesian methods. Assuming Hardy-Weinberg proportions, the frequency of the null allele at a given locus can be estimated by taking the square root of the frequency of null homozygotes. A less biased alternative to the square root transformation was introduced by Lynch & Milligan (1994). However, the LM procedure recommends excluding loci with three or fewer null homozygotes when estimating null allele frequency. This leads to strongly biased estimates of average heterozygosity and genetic distances, especially for species with low polymorphism (Zhivotovsky 1999). As an alternative, Zhivotovsky (1999) introduced a Bayesian approach to null allele frequency estimation for dominant markers.
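The three kinds of estimator can be compared with a short Python sketch. It assumes Hardy-Weinberg proportions and takes as input the number of null homozygotes (band-absent individuals) m among n sampled individuals at one locus. The function names are hypothetical. The Lynch & Milligan correction is written here as the square-root estimate divided by (1 - Var(x)/(8x²)), which is our reading of their estimator and should be checked against the original paper. The Bayesian function is a generic posterior mean under a Beta prior on the null-homozygote frequency (uniform by default), written in the spirit of Zhivotovsky (1999) rather than as a reproduction of that method.

```python
import math


def sqrt_estimate(m, n):
    """Square-root estimate of null allele frequency from m null homozygotes in n."""
    return math.sqrt(m / n)


def lynch_milligan_estimate(m, n):
    """Bias-corrected square-root estimate (form assumed; see lead-in)."""
    if m == 0:
        raise ValueError("correction is undefined when no null homozygotes are observed")
    x = m / n
    var_x = x * (1.0 - x) / n  # binomial sampling variance of x
    return math.sqrt(x) / (1.0 - var_x / (8.0 * x * x))


def bayesian_estimate(m, n, a=1.0, b=1.0):
    """Posterior mean of sqrt(R), where R is the null-homozygote frequency.

    With a Beta(a, b) prior on R, the posterior is Beta(m + a, n - m + b),
    and E[sqrt(R)] = B(alpha + 1/2, beta) / B(alpha, beta).
    """
    alpha, beta = m + a, n - m + b
    log_ratio = (
        math.lgamma(alpha + 0.5) + math.lgamma(alpha + beta)
        - math.lgamma(alpha) - math.lgamma(alpha + beta + 0.5)
    )
    return math.exp(log_ratio)


if __name__ == "__main__":
    for m in (25, 2, 0):  # hypothetical counts out of n = 100 trees
        row = [sqrt_estimate(m, 100), bayesian_estimate(m, 100)]
        if m > 0:
            row.append(lynch_milligan_estimate(m, 100))
        print(m, [round(value, 4) for value in row])
```

The case m = 0 shows why loci with few or no null homozygotes are problematic for the first two estimators: the square-root estimate collapses to zero and the correction is undefined, whereas the Bayesian posterior mean remains small but positive.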
Based on computer simulations, gene diversity estimates from the Bayesian approach were nearly unbiased. Krauss (2000a), however, demonstrated that accurate and equivalent estimates of gene diversity from dominant markers can be obtained by all three procedures when there are relatively few loci with no null homozygotes and/or low null allele frequency.

1.3.5 Applications of SSR and AFLP in forest genetics

1.3.5.1 Applications of SSR in forest genetics

Microsatellite markers have been used in many areas of forest conservation genetics and tree improvement programs, including monitoring the effects of forest fragmentation (Aldrich et al. 1998; Chase et al. 1996) and forest management practices (Glaubitz et al. 2003; Rajora et al. 2000), seed orchard management (Lexer et al. 1999; Stoehr & Newton 2002a), and mating designs of advanced breeding programs (Lambeth et al. 2001). For example, Chase et al. (1996) used five microsatellite markers to study the impacts of forest fragmentation on gene flow in Pithecellobium elegans Ducke. They concluded that isolated trees in pastures might be stepping stones for gene flow among patches, which contributed to the genetic diversity of continuous undisturbed forest. Aldrich et al. (1998) studied a population-level pedigree in a fragmented tropical forest tree species, Symphonia globulifera L. Samples of adults, saplings, and seedlings were genotyped with three microsatellite markers. Pedigree reconstruction showed that adults in pastures produced most of the seedlings in remnant forests. The patterns of pollen and seed dispersal among fragmented populations observed in their study supported the hypothesis that trees in pastures serve as "bridges" between isolated fragments. In tree breeding programs, microsatellites have been used in seed orchard management for detecting seed and pollen contamination. Lexer et al.
(1999) and Stoehr & Newton (2002a) used microsatellites to detect pollen contamination in oak (Quercus robur L.) and lodgepole pine seed orchards, respectively. Management of advanced-generation breeding is another application of microsatellites in tree improvement programs. Lambeth et al. (2001) suggested that microsatellite markers could be useful when the polymix mating design is used. They were able to identify the pollen parent of seedlings only when each male parent had a unique multilocus genotype.

1.3.5.2 Applications of AFLPs in forest genetics

At present, AFLPs are considered particularly suitable for assessing genetic diversity and constructing genetic maps because of their reliability and precision; however, they have some limitations due to dominance. In forest trees, AFLPs have proven powerful for the construction of genetic maps for many conifer species (Arcade et al. 2000; Cato et al. 1999; Costa et al. 2000; Lerceteau & Szmidt 1999; Remington & O'Malley 2000; Remington et al. 1999; Sewell et al. 1999; Travis et al. 1998). Genetic maps are important for understanding genome organization and evolution and for identifying molecular markers closely associated with important traits, which can be used in marker-aided selection and breeding programs. At the individual tree level, AFLPs are useful as a tool for fingerprinting and clonal identification (Hornero et al. 2001). Currently, AFLPs are commonly used in population and ecological studies and as a supporting tool for the conservation of forest genetic resources in many species, such as Avicennia germinans L. (Ceron-Souza et al. 2005), Caesalpinia echinata Lam. (Cardoso et al. 2005), and Eugenia uniflora L. (Margis et al. 2002).
1.3.6 Genetic effects of domestication in forest trees

Domestication of forest tree species, if not applied carefully, has the potential to reduce the genetic diversity in domesticated populations, especially if the process of sampling, breeding, and selection is repeated over multiple generations. Conifer breeding programs start with the selection of trees, based on their phenotypic appearance, in either natural populations or undomesticated populations. These selections are placed in seed orchards. A variety of breeding designs are then used to create genetic tests to rank the selections, estimate genetic parameters, and make new selections for the second generation. Wind- or controlled-pollinated seed collected from these orchards can then be used for operational reforestation programs.

1.3.6.1 History of lodgepole pine breeding programs

Selective breeding of interior lodgepole pine for increased wood volume started in 1976 (Carlson 2001). Gene archives and breeding orchards were established between 1976 and 1986 from several hundred trees from wild stands in eight distinct geographic areas, referred to as seed planning zones (SPZs) (McAuley 1998): Bulkley Valley (BV), Central Plateau (CP), East Kootenay (EK), Nelson (NE), Nass Skeena (NS), Prince George (PG), Peace River (PR), and Thompson Okanagan (TO). Thirteen progeny test series were established between 1984 and 1988 and were planted across six SPZs, with three sites per unit and approximately 300 selected trees progeny-tested per unit. Since 1974, 17 seed orchards have been established at various times. Seven seed orchards have been established with selections based on 10-year progeny test rankings within the six SPZs and have started seed production, with seedlots having expected genetic worth ranging from 8 to 16%. The remaining 10 seed orchards have not been progeny tested and are producing seedlots with expected genetic worth values ranging from 2 to 6%.
1.3.6.2 Genetic gain and diversity

The major short-term goal of most tree improvement programs is to achieve genetic gains for the attributes under selection, while still maintaining sufficient genetic diversity for long-term gain. Over the longer term, the attributes under selection may change, unpredictable environmental contingencies can occur, and genetic variation may erode. These short- and long-term goals are, in most cases, not complementary. To achieve high genetic gains, high selection intensity must be applied, leading to a rapid reduction in genetic diversity in the breeding population as a result of decreasing population size. The relationship between genetic gain and diversity in improvement programs is similar to stacked layers forming a pyramid, where the lowest layer represents maximum diversity and minimal genetic gain and, conversely, the top layer represents minimum diversity and maximum genetic gain (Fig. 1.3). The lowest layer is equivalent to the gene resource population, representing all genetic variation available to the breeding population (i.e., native stands, plantations, provenance trials, seed orchards, and progeny tests). The next layer is the breeding population, which harbours sufficient genetic variation to maintain genetic gains over multiple generations. The top layer represents the production population, which consists of a subset of elite genotypes managed for the sole purpose of producing genetically improved seed for operational artificial reforestation programs. Ideally, the breeding population should have a very large number of selections; however, breeding population size must also satisfy financial constraints and practical feasibility. Maintaining low-frequency alleles for many generations requires thousands of parents in the breeding population (Yanchuk 2001). Clearly, phenotypic selection reduces the population size; thus, some loss of genetic diversity is expected.
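The link between population size and the retention of low-frequency alleles can be illustrated with a simple sampling calculation. The sketch below assumes that N diploid parents are a random sample of 2N gene copies from a large source population, so that an allele at frequency p is captured at least once with probability 1 - (1 - p)^(2N). It covers a single round of sampling only, not loss by drift over many generations, and the function names and numerical values are hypothetical.

```python
import math


def retention_probability(p, n_parents):
    """Probability that an allele at frequency p occurs at least once among
    n_parents diploid parents sampled at random (2 * n_parents gene copies)."""
    return 1.0 - (1.0 - p) ** (2 * n_parents)


def parents_needed(p, target_probability):
    """Smallest number of diploid parents that captures an allele at frequency p
    with at least the target probability, under the same sampling assumption."""
    copies = math.log(1.0 - target_probability) / math.log(1.0 - p)
    return math.ceil(copies / 2.0)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(retention_probability(p, 100), 3), parents_needed(p, 0.95))
```

Under these assumptions the required number of parents grows roughly in inverse proportion to the allele frequency, which illustrates why retaining rare alleles calls for much larger populations than retaining common ones.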
However, additional biological factors in the seed and seedling production stages also affect levels of genetic variability at these stages. Genetic variability during the seed production phase depends on many biological factors, such as reproductive phenology asynchrony, differential male and female reproductive output (known as parental imbalance), and inbreeding, all of which act individually or in concert, resulting in a further reduction in genetic diversity (Fig. 1.4). Moreover, during the seedling production phase, factors such as seed biology (seed dormancy, germination rate, and germination speed) and management practices (thinning and culling) also affect the genetic diversity of seedling crops (Fig. 1.4). In summary, the domestication of forest trees is composed of several consecutive steps: 1) phenotypic selection from natural or artificial stands to form the breeding population, 2) breeding programs that involve the application of various mating designs for testing and selection, 3) seed production in seed orchards, 4) seedling production in nurseries, and 5) reforestation and the establishment of new plantations. At present, most studies of forest tree domestication have been devoted mainly to the genetic consequences of domestication rather than to evaluation of the process itself (El-Kassaby 2000). By comparing the genetic diversity of natural and domesticated populations, the effects of domestication can be discerned. The effects of phenotypic selection on genetic diversity have been studied in some conifer species using biochemical and molecular markers. Rajora (1999) studied the impact of phenotypic selection in white spruce using RAPDs and demonstrated a loss of genetic diversity during the phenotypic selection phase.
However, other studies have revealed that production populations (seed orchards) harbour a broad sample of genetic variation that is complementary to that in natural populations in conifer species: Picea abies (L.) H. Karst. (Bergmann & Ruetz 1991), Picea sitchensis (Bong.) Carriere (Chaisurisri & El-Kassaby 1994), Pseudotsuga menziesii (Mirb.) Franco (El-Kassaby & Ritland 1996), and Picea glauca x engelmannii (Stoehr & El-Kassaby 1997). An extensive study of the effect of phenotypic selection on genetic diversity was reported for Pseudotsuga menziesii, comparing levels of genetic diversity through different stages of selection (breeding population and first- and second-generation seed orchards) (El-Kassaby & Ritland 1996). Several genetic parameters (e.g., number of alleles per locus, percentage of polymorphic loci, and expected heterozygosity) were estimated, and the results showed that levels of genetic diversity in the domesticated populations were similar to or, in some cases, higher than those of their natural populations. Additionally, El-Kassaby & Ritland (1996) demonstrated that there was no reduction of genetic diversity between first- and second-generation seed orchards. In another study, Stoehr & El-Kassaby (1997) used isozyme markers to assess the impact of domestication on genetic diversity in interior spruce (Picea glauca x engelmannii) at multiple levels, analogous to those of the pyramid diagram (Fig. 1.4). They demonstrated that the levels of genetic diversity were comparable between natural populations and the seed orchard production populations; however, when they extended the work to the seed and seedling production phases, they found that the level of genetic diversity of the seedling population was lower than that of the natural, seed orchard, and seedlot populations, showing a gradual reduction throughout the domestication process.
In summary, most of these studies have shown that the level of genetic diversity in production populations (seed orchards) was similar to or, in some cases, even higher than that of natural populations, indicating a high efficiency of phenotypic selection. This increase in genetic diversity was attributed to the breadth of the breeding population sampling. In most cases, the breeding population consisted of several hundred parents selected from a very wide geographic zone, thus providing opportunities for sampling hundreds of local gene pools. When the breeding population sampling scheme is compared with several intensively sampled natural populations, the genetic diversity captured from all these local gene pools is expected to be higher. However, it is likely that most genetic erosion occurs during the later phases of the domestication process, namely, seed and seedling production. Therefore, it is paramount to monitor genetic variability at the various stages of the domestication cycle and to pinpoint the possible causes of genetic erosion.

1.3.7 Mating system, sibship structure, and informative genetic markers

The mating system, defined as the genetic relationship between mates, governs the pattern of gene transmission between generations (Brown 1988). Classically, plants have been characterized as exhibiting a variety of mating systems, ranging from complete outcrossing (as in most conifers), to a mixture of self-fertilization and outcrossing (as in some herbaceous angiosperms and, notably, western redcedar), to complete selfing (as in many grasses). This classical definition involves the proportion of self-fertilization vs. random outcrossing. More recently, additional facets have been identified. These include the rate of biparental inbreeding (mating between relatives) and the extent of "correlated paternity" (the tendency of a female to mate with one male, as opposed to a pollen pool of many males, the common classical assumption).
The recent application of high-resolution molecular markers (highly heterozygous SSRs, multilocus AFLPs) to the investigation of plant mating systems has provided new insights into the fine structure of mating systems, including the difference between multilocus and single-locus correlation of paternity (Ritland 2002), individual inbreeding coefficients (Vogl et al. 2002), individual outcrossing rates (O'Connell et al. 2004), and pollen dispersal patterns (Dick et al. 2003; Nagamitsu et al. 2001). Highly informative markers can also address questions involving genetic relatedness between individuals in a population, especially in relation to spatial distribution (Dow & Ashley 1999; Gonzalez-Martinez et al. 2003). For example, Gonzalez-Martinez et al. (2003) used a pairwise-likelihood approach and microsatellites to determine possible half-sib and full-sib relationships in both mature trees and natural regeneration of a cohort stand of maritime pine. They reported a low level of genetic relatedness in both adult trees and saplings within the studied stand. The lack of genetic structure (i.e., relatedness) prompted them to recommend the use of this stand as a source of seed for reforestation. Knowledge of the genetic relationships among individuals is critical for managing populations of endangered species because it allows one to estimate heritabilities of quantitative traits in natural settings (Ritland 2000; Thomas & Hill 2000), characterize breeding strategies and fitness (Herbinger et al. 1995), and estimate effective population size (Herbinger et al. 1997). Initially, inferences about relatedness in natural populations were based upon pairwise comparisons (Ritland 2000). However, pairwise approaches are less informative than approaches that incorporate higher-order relationships (Thomas & Hill 2002).
For example, if three individuals were sampled from a single generation, they could be unrelated, all full sibs, or two could be sibs with the third unrelated to them. Accounting for the joint relationships among several individuals, in the form of group-likelihood approaches, has recently received interest (Smith et al. 2001; Thomas & Hill 2002; Wang 2004). These approaches are, however, restricted: they assume cohorts of full sibs vs. unrelated individuals (Smith et al. 2001), or of full sibs, half sibs, and unrelated individuals (Thomas & Hill 2002; Wang 2004). Population structure (isolation by distance) is not incorporated. There has been very little empirical work on sibship reconstruction using a group-likelihood approach in evolutionary and ecological genetics (but see Chapman et al. 2003; Wang 2004). Chapman et al. (2003) used six microsatellite loci and a group-likelihood method to estimate the number of colonies of two bumblebee species utilizing a given foraging site. Wang (2004) demonstrated the use of group-likelihood methods for estimating the number of colonies of ant species and for determining the level of multiple paternity in the Kemp's ridley sea turtle.

1.3.8 Genetic variability within the genome

Diversity patterns can be described not only in terms of the partitioning of variation within vs. between populations, but also in terms of the pattern of variation along chromosomes within individuals. With highly informative molecular markers, we can now examine these patterns. The single most important component of this pattern (beyond mean diversity, which can be measured by less informative, single-locus markers) is the correlation of diversity along a chromosome and the location of "hotspots" of diversity within the genome. In a species such as lodgepole pine, with a recent history of migration, bottlenecks, and colonization of new habitats with consequent selection pressures, highly informative molecular markers may provide new insights into the forces shaping genomic diversity.
Patterns of diversity at the genome level are likely to be shaped by mutation, selection, recombination, and migration. The evidence suggests that local genome diversity is strongly affected by recombination and selection. In Drosophila, genetic diversity along chromosomes varies as a function of recombination rate. Loci near centromeres tend to have lower recombination rates and lower levels of genetic diversity. In plants, the correlation between recombination rate and genetic diversity has been reported in two studies. Kraft et al. (1998) investigated the correlation between the level of genetic variation and the recombination rate per physical unit in sea beet (Beta vulgaris subsp. maritima) using RFLP markers that covered two of the nine chromosomes. They used the linkage map of sugar beet to estimate the level of genetic variation in three populations of sea beet and found that, for both linkage groups, recombination was suppressed in the middle of the chromosome and increased toward the ends. In one linkage group the recombination rate decreased again at the very ends of the chromosome, but an overall positive correlation between recombination rate and genetic variation was detected. Using RFLP markers, Dvorak et al. (1998) reported similar results across seven chromosomes in six goatgrass species (Aegilops L.). They reported that the level of variation at an RFLP locus was a function of the position of the locus on the chromosome and the recombination rate in the neighbourhood of the locus. Loci in the proximal chromosome regions showed greatly reduced recombination rates relative to the distal regions and were significantly less variable than loci in the distal chromosome regions in all six species. The reduction of diversity in regions of low recombination may be explained by a selective sweep, in which a strongly selected advantageous allele rapidly goes to fixation, thus reducing diversity at the selected locus.
A neutral allele can spread as a result of being dragged along with the advantageous allele, decreasing diversity in the adjacent region. This process is known as genetic hitchhiking (Maynard Smith & Haigh 1974). Under hitchhiking, rare polymorphisms should occur at a relatively high frequency compared with expectations under pure neutrality. Selective sweeps are most easily observed where there is independent evidence that a locus has been subjected to directional selection (e.g., pesticide resistance). Mathematical modelling suggests that a strong selective sweep may result in differentiation of populations at the hitchhiked locus if gene flow between populations is low enough (Slatkin & Wiehe 1998). An alternative hypothesis that might explain the reduction in diversity is background selection (Charlesworth et al. 1993). This hypothesis predicts a reduction in diversity due to the continual removal of deleterious mutations, which also eliminates linked variation from the population. Background selection should act similarly in different populations or in closely related species with similar structure, as its action depends on local genomic features such as gene density and recombination rate. In contrast, a selective sweep is often a population-specific event. In addition to selective sweeps and background selection, a population bottleneck may also decrease diversity. Bottlenecks will have a similar effect on all loci within a given population, whereas a selective sweep will have locus-specific effects (Nurminsky 2001).

Figure 1.1 Phylogeny of 18 species in subgenus Pinus based on an analysis of chloroplast DNA restriction site mutations. Percentages from 2,000 bootstrap replications are shown within circles at nodes (Krupkin et al. 1996). [Tree diagram not reproduced.]

Figure 1.2 Natural distribution of Pinus contorta (Little 1971).

Figure 1.3 Relationship between genetic gain and diversity on the pyramid of improvement in a tree breeding program (after Johnson et al. 2001). [Pyramid labels: high genetic gain; broader genetic diversity.]

Figure 1.4 A diagram showing the tree improvement delivery system with its associated activities (from El-Kassaby 2000a). [Diagram labels: natural population; phenotypic selection; breeding population; infusion; selection; breeding strategy; testing; breeding; seed orchards; spacing and thinning; reproductive phenology; parental imbalance; inbreeding; contamination; seed production; nurseries; germination; dormancy; thinning and culling; seedling production; plantations; advanced generation.]

1.4 References

Aldrich PR, Hamrick JL, Chavarriaga P, Kochert G (1998) Microsatellite analysis of demographic genetic structure in fragmented populations of the tropical tree Symphonia globulifera. Molecular Ecology 7, 933-944.
Allona I, Quinn M, Shoop E, Swope K, St Cyr S, Carlis J, Riedl J, Retzel E, Campbell MM, Sederoff R, Whetten RW (1998) Analysis of xylem formation in pine by cDNA sequencing. Proceedings of the National Academy of Sciences of the United States of America 95, 9693-9698.
Arcade A, Anselin F, Faivre Rampant P, Lesage MC, Paques LE, Prat D (2000) Application of AFLP, RAPD and ISSR markers to genetic mapping of European and Japanese larch. Theoretical and Applied Genetics 100, 299-307.
Bergmann F, Ruetz W (1991) Isozyme genetic variation and heterozygosity in random tree samples and selected orchard clones from the same Picea abies populations. Forest Ecology and Management 46, 39-47.
Brown AHD (1988) Genetic characterization of plant mating system. In: Plant Population Genetics, Breeding, and Genetic Resources (eds. Brown AHD, Clegg MT, Kahler AL, Weir BS), pp. 145-162. Sinauer Associates, Inc., Sunderland, Massachusetts.
Cardoso SRS, Provan J, Lira CCD, Pereira LDR, Ferreira PCG, Cardoso MA (2005) High levels of genetic structuring as a result of population fragmentation in the tropical tree species Caesalpinia echinata Lam. Biodiversity and Conservation 14, 1047-1057.
Carlson M (2001) "Select" lodgepole pine seed availability to 2010. In: TICtalk, pp. 25-28. Forest Genetics Council of British Columbia.
Cato SA, Corbett GE, Richardson TE (1999) Evaluation of AFLP for genetic mapping in Pinus radiata D. Don. Molecular Breeding 5, 275-281.
Ceron-Souza I, Toro-Perea N, Cardenas-Henao H (2005) Population genetic structure of neotropical mangrove species on the Colombian Pacific coast: Avicennia germinans (Avicenniaceae). Biotropica 37, 258-265.
Cervera MT, Remington D, Frigerio JM, Storme V, Ivens B, Boerjan W, Plomion C (2000) Improved AFLP analysis of tree species. Canadian Journal of Forest Research 30, 1608-1616.
Chaisurisri K, El-Kassaby YA (1994) Genetic diversity in a seed production population vs. natural populations of Sitka spruce. Biodiversity and Conservation 3, 512-523.
Chapman RE, Wang J, Bourke AFG (2003) Genetic analysis of spatial foraging patterns and resource sharing in bumble bee pollinators. Molecular Ecology 12, 2801-2808.
Charlesworth B, Morgan MT, Charlesworth D (1993) The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289-1303.
Chase M, Kesseli R, Bawa K (1996) Microsatellite markers for population and conservation genetics of tropical trees. American Journal of Botany 83, 51-57.
Costa P, Pot D, Dubos C, Frigerio JM, Pionneau C, Bodenes C, Bertocchi E, Cervera MT, Remington DL, Plomion C (2000) A genetic map of maritime pine based on AFLP, RAPD and protein markers. Theoretical and Applied Genetics 100, 39-48.
Crow JF, Kimura M (1970) An introduction to population genetics theory. Harper and Row, New York, Evanston and London.
Dancik BP, Yeh FC (1983) Allozyme variability and evolution of lodgepole pine (Pinus contorta var. latifolia) and jack pine (P. banksiana) in Alberta. Canadian Journal of Genetics and Cytology 25, 57-64.
Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin M, Freimer NB (1994) Mutational processes of simple sequence repeat loci in human populations. Proceedings of the National Academy of Sciences of the United States of America 91, 3166-3170.
Dick CW, Etchelecu G, Austerlitz F (2003) Pollen dispersal of tropical trees (Dinizia excelsa: Fabaceae) by native insects and African honeybees in pristine and fragmented Amazonian rainforest. Molecular Ecology 12, 753-764.
Dow B, Ashley M (1999) Changing genetic structure of a savanna bur oak population. Forest Genetics 6, 29-39.
Dvorak J, Luo MC, Yang ZL (1998) Restriction fragment length polymorphism and divergence in the genomic regions of high and low recombination in self-fertilizing and cross-fertilizing Aegilops species. Genetics 148, 423-434.
Echt CS, Vendramin GG, Nelson CD, Marquardt P (1999) Microsatellite DNA as shared genetic markers among conifer species. Canadian Journal of Forest Research 29, 365-371.
El-Kassaby YA, Ritland K (1996) Impact of selection and breeding on the genetic diversity in Douglas-fir. Biodiversity and Conservation 5, 795-813.
El-Kassaby YA (2000) Effect of forest tree domestication on gene pools. In: Forest conservation genetics (eds. Young A, Boshier D, Boyle T), pp. 197-213. CSIRO Publishing, Collingwood, Australia.
Fazekas AJ, Yeh FC (2001) Random amplified polymorphic DNA diversity of marginal and central populations in Pinus contorta subsp. latifolia. Genome 44, 13-22.
Glaubitz JC, Murrell JC, Moran GF (2003) Effects of native forest regeneration practices on genetic diversity in Eucalyptus consideniana. Theoretical and Applied Genetics 107, 422-431.
Goldstein DB, Linares AR, Cavalli-Sforza LL, Feldman MW (1995a) An evaluation of genetic distances for use with microsatellite loci. Genetics 139, 463-471.
Goldstein DB, Linares AR, Cavalli-Sforza LL, Feldman MW (1995b) Genetic absolute dating based on microsatellites and the origin of modern humans. Proceedings of the National Academy of Sciences of the United States of America 92, 6723-6727.
Gonzalez-Martinez SC, Gerber S, Cervera MT, Martinez-Zapater JM, Alia R, Gil L (2003) Selfing and sibship structure in a two-cohort stand of maritime pine (Pinus pinaster Ait.) using nuclear SSR markers. Annals of Forest Science 60, 115-121.
Gonzalez-Martinez SC, Robledo-Arnuncio JJ, Collada C, Diaz A, Williams CG, Alia R, Cervera MT (2004) Cross-amplification and sequence variation of microsatellite loci in Eurasian hard pines. Theoretical and Applied Genetics 109, 103-111.
Hamrick JL, Godt MJW, Sherman-Broyles SL (1992) Factors affecting levels of genetic diversity in woody plant species. New Forests 6, 95-124.
Herbinger CM, Doyle RW, Pitman ER, Paquet D, Mesa KA, Morris DB, Wright JM, Cook D (1995) DNA fingerprint based analysis of paternal and maternal effects on offspring growth and survival in communally reared rainbow trout. Aquaculture 137, 245-256.
Herbinger CM, Doyle RW, Taggart CT, Lochmann SE, Brooker JM, Cook D (1997) Family relationship and effective population size in a natural cohort of Atlantic cod (Gadus morhua) larvae. Canadian Journal of Fisheries and Aquatic Sciences 54(suppl. 1), 11-18.
Hornero J, Martinez I, Celestino C, Gallego FJ, Torres V, Toribio M (2001) Early checking of genetic stability of cork oak somatic embryos by AFLP analysis. International Journal of Plant Sciences 162, 827-833.
Jarne P, Lagoda PJL (1996) Microsatellites, from molecules to populations and back. Trends in Ecology & Evolution 11, 424-429.
Johnson R, St. Clair B, Lipow S (2001) Genetic conservation in applied tree breeding programs.
In: Proceedings ITTO conference on in situ and ex situ conservation of commercial tropical trees, pp. 215-230.
Kimura M, Crow J (1964) The number of alleles that can be maintained in a finite population. Genetics 49, 725-738.
Kimura M, Ohta T (1978) Stepwise mutation model and distribution of allelic frequencies in a finite population. Proceedings of the National Academy of Sciences of the United States of America 75, 2868-2872.
Kinlaw CS, Ho T, Gerttula SM, Gladstone E, Harry DE, Quintana L, Baysdorfer C (1996) Gene discovery in loblolly pine through cDNA sequencing. In: Somatic Cell Genetics and Molecular Genetics of Trees (eds. Ahuja MR, Boerjan W, Neale DB), pp. 175-182. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Kraft T, Sall T, Magnusson-Rading I, Nilsson NO, Hallden C (1998) Positive correlation between recombination rates and levels of genetic variation in natural populations of sea beet (Beta vulgaris subsp. maritima). Genetics 150, 1239-1244.
Krauss SL (2000) Accurate gene diversity estimates from amplified fragment length polymorphism (AFLP) markers. Molecular Ecology 9, 1241-1245.
Krupkin AB, Liston A, Strauss SH (1996) Phylogenetic analysis of the hard pines (Pinus subgenus Pinus, Pinaceae) from chloroplast DNA restriction site analysis. American Journal of Botany 83, 489-498.
Kutil BL, Williams CG (2001) Triplet-repeat microsatellite shared among hard and soft pines. Journal of Heredity 92, 327-332.
Lambeth C, Lee BC, O'Malley D, Wheeler N (2001) Polymix breeding with parental analysis of progeny: An alternative to full-sib breeding and testing. Theoretical and Applied Genetics 103, 930-943.
Lerceteau E, Szmidt AE (1999) Properties of AFLP markers in inheritance and genetic diversity studies of Pinus sylvestris L. Heredity 82, 252-260.
Lexer C, Heinze B, Steinkellner H, Kampfer S, Ziegenhagen B, Glossl J (1999) Microsatellite analysis of maternal half-sib families of Quercus robur, pedunculate oak: detection of seed contaminations and inference of the seed parents from the offspring. Theoretical and Applied Genetics 99, 185-191.
Little EL, Jr. (1971) Atlas of United States trees, volume I, conifers and important hardwoods. U.S. Department of Agriculture, Washington.
Lynch M, Milligan BG (1994) Analysis of population genetic structure with RAPD markers. Molecular Ecology 3, 91-99.
MacDonald GM, Cwynar LC (1985) A fossil pollen based reconstruction of the late Quaternary history of lodgepole pine (Pinus contorta ssp. latifolia) in the western interior of Canada. Canadian Journal of Forest Research 15, 1039-1044.
MacDonald GM, Cwynar LC (1991) Postglacial population growth rates of Pinus contorta ssp. latifolia in western Canada. Journal of Ecology 79, 417-429.
Margis R, Felix D, Caldas JF, Salgueiro F, De Araujo DSD, Breyne P, Van Montagu M, De Oliveira D, Margis-Pinheiro M (2002) Genetic differentiation among three neighboring Brazil-cherry (Eugenia uniflora L.) populations within the Brazilian Atlantic rain forest. Biodiversity and Conservation 11, 149-163.
Maynard Smith J, Haigh J (1974) The hitch-hiking effect of a favorable gene. Genetical Research 23, 23-25.
McAuley L (1998) Interior SPZ Review Report, p. 39. Tree Improvement Program, Ministry of Forests, BC.
Messier W, Li SH, Stewart CB (1996) The birth of microsatellites. Nature 381, 483-483.
Nagamitsu T, Ichikawa Se, Ozawa M, Shimamura R, Kachi N, Tsumura Y, Muhammad N (2001) Microsatellite analysis of the breeding system and seed dispersal in Shorea leprosula (Dipterocarpaceae). International Journal of Plant Sciences 162, 155-159.
Nei M (1973) Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America 70, 3321-3323.
Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 83, 583-590.
Nurminsky DI (2001) Genes in sweeping competition. Cellular & Molecular Life Sciences 58, 125-134.
O'Connell LM, Russell J, Ritland K (2004) Fine-scale estimation of outcrossing in western redcedar with microsatellite assay of bulked DNA. Heredity 93, 443-449.
Paglia G, Morgante M (1998) PCR-based multiplex DNA fingerprinting techniques for the analysis of conifer genomes. Molecular Breeding 4, 173-177.
Rajora O (1999) Genetic biodiversity impacts of silvicultural practices and phenotypic selection in white spruce. Theoretical and Applied Genetics 99, 954-961.
Rajora OP, Rahman MH, Buchert GP, Dancik BP (2000) Microsatellite DNA analysis of genetic effects of harvesting in old-growth eastern white pine (Pinus strobus) in Ontario, Canada. Molecular Ecology 9, 339-348.
Remington DL, O'Malley DM (2000) Whole-genome characterization of embryonic stage inbreeding depression in a selfed loblolly pine family. Genetics 155, 337-348.
Remington DL, Whetten RW, Liu BH, O'Malley DM (1999) Construction of an AFLP genetic map with nearly complete genome coverage in Pinus taeda. Theoretical and Applied Genetics 98, 1279-1292.
Richardson DM, Rundel PW (1998) Ecology and biogeography of Pinus: an introduction. In: Ecology and Biogeography of Pinus (ed. Richardson DM), pp. 3-46. Cambridge University Press, Cambridge.
Ritland K (2000) Marker-inferred relatedness as a tool for detecting heritability in nature. Molecular Ecology 9, 1195-1204.
Ritland K (2002) Extensions of models for the estimation of mating systems using n independent loci. Heredity 88, 221-228.
Schlotterer C (2004) The evolution of molecular markers - just a matter of fashion? Nature Reviews Genetics 5, 63-69.
Schlotterer C, Tautz D (1992) Slippage synthesis of simple sequence DNA. Nucleic Acids Research 20, 211-215.
Sewell MM, Sherman BK, Neale DB (1999) A consensus map for loblolly pine (Pinus taeda L.). I. Construction and integration of individual linkage maps from two outbred three-generation pedigrees. Genetics 151, 321-330.
Shepherd M, Cross M, Maguire TL, Dieters MJ, Williams CG, Henry RJ (2002) Transpecific microsatellites for hard pines. Theoretical and Applied Genetics 104, 819-827.
Slatkin M (1995) A measure of population subdivision based on microsatellite allele frequencies. Genetics 139, 457-462.
Slatkin M, Wiehe T (1998) Genetic hitch-hiking in a subdivided population. Genetical Research 71, 155-160.
Smith BR, Herbinger CM, Merry HR (2001) Accurate partition of individuals into full-sib families from genetic data without parental information. Genetics 158, 1329-1338.
Stoehr MU, El-Kassaby YA (1997) Levels of genetic diversity at different stages of the domestication cycle of interior spruce in British Columbia. Theoretical and Applied Genetics 94, 83-90.
Stoehr MU, Newton CH (2002) Evaluation of mating dynamics in a lodgepole pine seed orchard using chloroplast DNA markers. Canadian Journal of Forest Research 32, 469-476.
Taylor JS, Durkin JMH, Breden F (1999) The death of a microsatellite: A phylogenetic perspective on microsatellite interruptions. Molecular Biology and Evolution 16, 567-572.
Thomas SC, Hill WG (2000) Estimating quantitative genetic parameters using sibships reconstructed from marker data. Genetics 155, 1961-1972.
Thomas SC, Hill WG (2002) Sibship reconstruction in hierarchical population structures using Markov chain Monte Carlo techniques. Genetical Research 79, 227-234.
Travis SE, Ritland K, Whitham TG, Keim P (1998) A genetic linkage map of pinyon pine (Pinus edulis) based on amplified fragment length polymorphisms. Theoretical and Applied Genetics 97, 871-880.
Vogl C, Karhu A, Moran G, Savolainen O (2002) High resolution analysis of mating systems: Inbreeding in natural populations of Pinus radiata.
Journal of Evolutionary Biology 15, 433-439.
Vos P, Hogers R, Bleeker M, Reijans M, Van De Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, Zabeau M (1995) AFLP: A new technique for DNA fingerprinting. Nucleic Acids Research 23, 4407-4414.
Wang JL (2004) Sibship reconstruction from genetic data with typing errors. Genetics 166, 1963-1979.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38, 1358-1370.
Westman AL, Kresovich S (1998) The potential for cross-taxa simple-sequence repeat (SSR) amplification between Arabidopsis thaliana L. and crop brassicas. Theoretical and Applied Genetics 96, 272-281.
Wheeler NC, Critchfield WB (1985) The distribution and botanical characteristics of lodgepole pine: Biogeographical and management implications. In: Lodgepole pine: The species and its management (eds. Baumgartner DM, Krebill RG, Arnott JT, Weetman GF), pp. 1-13. Washington State University, USA.
Wheeler NC, Guries RP (1982) Population structure, genic diversity, and morphological variation in Pinus contorta Dougl. Canadian Journal of Forest Research 12, 595-606.
Wright S (1965) The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19, 395-420.
Yanchuk AD (2001) A quantitative framework for breeding and conservation of forest tree genetic resources in British Columbia. Canadian Journal of Forest Research 31, 566-576.
Yeh FC, Layton C (1979) The organization of genetic variability in central and marginal populations of lodgepole pine Pinus contorta ssp. latifolia. Canadian Journal of Genetics and Cytology 21, 487-503.
Zhivotovsky LA (1999) Estimating population structure in diploids with multilocus dominant DNA markers. Molecular Ecology 8, 907-913.
CHAPTER 2
SINGLE-COPY, SPECIES-TRANSFERABLE MICROSATELLITE MARKERS DEVELOPED FROM LOBLOLLY PINE ESTs¹

¹ A version of this chapter has been published: Liewlaksaneeyanawin C, Ritland CE, El-Kassaby YA, Ritland K (2004) Single-copy, species-transferable microsatellite markers developed from loblolly pine ESTs. Theoretical and Applied Genetics 109, 361-369.

2.1 Introduction
In the past several years, the advent of molecular markers such as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), and microsatellites (simple sequence repeats, SSRs) has greatly helped population genetic studies in areas such as gene diversity, mating systems, and gene mapping. Microsatellites are often regarded as the "marker of choice" because they are codominant and highly variable, but they require significant investment to develop, as primer pairs specific to each microsatellite locus must be designed. Microsatellite markers can be developed in several ways: via genomic libraries, enriched genomic libraries, BAC/YAC libraries, and cDNA libraries (Scott 2001). However, reliable microsatellites are difficult to develop for conifer species because of their large genome size and the highly repetitive nature of their DNA (Kinlaw & Neale 1997). Often, complex banding patterns (multiple loci) are obtained because of duplications, and null alleles are more frequent because of variation at primer binding sites. Several approaches have been applied to eliminate highly repetitive DNA in conifer libraries. Elsik & Williams (2001) developed microsatellites in loblolly pine using a low-copy enrichment method. They suggested that low-copy microsatellites provided more polymorphic and informative markers than total-genomic microsatellites. Zhou et al. (2002) also reported an alternative method of microsatellite development in loblolly pine.
Using a methylation-sensitive restriction enzyme (McrBC), they developed microsatellites from a library of undermethylated (UM) DNA. Although this method eliminated some of the highly repetitive DNA and provided microsatellites with single-locus inheritance, the level of polymorphism for UM microsatellites was lower than that of low-copy microsatellites. Scotti et al. (2000) developed six microsatellite loci by screening a cDNA library in Picea abies for repeats. They reported that four of the six expressed sequence tag microsatellites (EST-SSRs) that they found provided clear banding patterns and a high level of polymorphism. This suggests that EST sequences, obtained from EST databanks, might be a good source of microsatellites. As they reside in, or near, coding DNA, they should be more conserved than genomic sequences, allowing cross-species transferability and a lower frequency of null alleles, and they should be more likely to appear as a single copy in the genome, alleviating the multiple-band problem. Indeed, microsatellites from EST sequences have recently been identified in some plant species, including rice (Cho et al. 2000), grapes (Scott et al. 2000), sugarcane (Cordeiro et al. 2001), and rye (Hackauf & Wehling 2002).

Lodgepole pine (Pinus contorta Dougl. ex Loud. ssp. latifolia Engelm.) is the most important commercial pine species in British Columbia and is widely distributed throughout the Rocky Mountain and Pacific coast regions. Besides being an important commercial species, lodgepole pine is important for watershed management and wildlife habitat, and provides recreational and scenic value in many national parks and wilderness areas (Koch 1996). Understanding genetic history and variation in natural and domesticated populations is important for developing rational conservation and tree improvement strategies. The use of molecular markers as genetic tools can help provide the information needed to achieve these purposes.
To date, only five microsatellite loci have been developed in P. contorta (Hicks et al. 1998). In this study, we report on the development of loblolly pine (P. taeda) EST-SSR markers and their cross-species transferability to P. contorta ssp. latifolia and other pines. We also investigated the sequence variation and possible evolution at EST-SSR and traditional microsatellite loci for the focal and nonfocal species. The sequence analyses tested whether (1) microsatellites derived from EST libraries are more conserved than those derived from other libraries, and (2) the polymorphism detected in Pinus spp. is the result of expansion or contraction of the microsatellite motif itself or is due to an accumulation of sequence differences coupled with insertion/deletion events in the flanking regions outside the SSR domains. Our results also provide a battery of 23 polymorphic, robust microsatellite primer pairs for lodgepole pine.

2.2 Materials and Methods
2.2.1 Source of microsatellites
2.2.1.1 P. taeda EST databases
A total of 55,000 loblolly pine EST sequences were obtained from Dr. David Neale (U. of California Davis; http://dendrome.ucdavis.edu/Gen_res.htm). Only dinucleotides of eight or more repeats and trinucleotides of six or more repeats were selected for this study. Sequences with sufficient repeat lengths and flanking regions on both sides were selected for primer design. A total of 14 primer pairs (LOP1-LOP14) were designed (nine dinucleotide repeats and five trinucleotide repeats) using Oligo 6.0 (LifeScience Software Resource, Minn.) (Table 2.1).

2.2.1.2 P. taeda microsatellites (PtTX series)
Ninety-nine polymorphic markers were selected from three different sources: seven genomic SSR loci, 56 low-copy SSR loci, and 36 undermethylated SSR loci (hereafter referred to as traditional SSRs). One P. taeda (PtTX) marker (PtTX2146), derived from a total genomic library, amplified the same locus as EST-SSR locus RPTest9 (Elsik et al.
2000). From this point on, we treat this locus as an EST-SSR. The lists of the P. taeda (PtTX) microsatellite series can be found in the Conifer Microsatellite Handbook (Auckland et al. 2002).

2.2.2 Cross-species transferability of microsatellites
Eight P. taeda EST-SSRs (Table 2.2) and 99 P. taeda (PtTX) SSRs were tested for cross-species transferability on four to eight individuals of lodgepole pine (P. contorta). The P. taeda EST-SSRs were also tested for cross-species transferability on two other pine species, ponderosa pine (P. ponderosa Dougl.), represented by two individuals, and Scots pine (P. sylvestris L.), represented by three individuals (Table 2.3). First, the presence or absence of microsatellite PCR products was scored on 2% agarose gels. When microsatellite fragments were present, they were tested for polymorphism on 6% (Long Ranger™) polyacrylamide gels using a LiCor 4200 automated sequencer (LiCor Inc., Lincoln, NE). Microsatellite products were detected with an M13-tailed primer (Oetting et al. 1995) or an end-labelled primer. To determine the success of cross-species transferability of P. taeda SSR markers, the microsatellite markers were classified into three classes according to the transfer criteria described by Shepherd et al. (2002): (1) polymorphic, (2) monomorphic, and (3) polymorphic but with low product yield or non-specific amplification.

2.2.3 DNA extraction and polymerase chain reaction optimization
Genomic DNA was isolated from individual vegetative buds or germinants (1 week old) using a modification of the CTAB (cetyltrimethyl ammonium bromide) method (Doyle & Doyle 1987). PCR reactions were carried out in 10 µl (final volume) using an MJ Research PTC-100 thermal cycler (Watertown, MA) based on the protocol proposed by Elsik et al. (2000), with modifications.
Each reaction was composed of 50 ng of total genomic DNA, 1 pmol of each primer, 0.5 mM each of dATP, dCTP, dGTP, and dTTP, 1X buffer (10 mM Tris-HCl, 1.5 mM MgCl2, 50 mM KCl, pH 8.3) (Roche, Laval, Que.), and 0.25 U of Taq DNA polymerase (Roche, Laval, Que.). When the tailed primer was used for amplification, 0.3 pmol of M13 Infrared Label (LiCor Inc., Lincoln, NE) was added to the PCR reactions. Samples were amplified as follows: 5 min at 94 °C, followed by 30 cycles of 1 min at 94 °C, 1 min at the annealing temperature (see Auckland et al. 2002), and 1 min at 72 °C, followed by a final extension of 3 min at 72 °C. PCR conditions were optimized for loci with complex banding patterns or low yield by changing the annealing temperature and/or primer concentration. Two microliters of stop dye buffer (LiCor Inc., Lincoln, NE) were added to each PCR reaction tube, and PCR reactions were kept at -20 °C in the dark until electrophoresis.

2.2.4 Scoring and estimating genetic polymorphism
Polymorphisms of transferable P. taeda EST-SSR and P. taeda (PtTX) SSR markers were tested on P. contorta ssp. latifolia. All genotypes were scored using SAGA™ software (LiCor Inc.). The allelic diversity of eight P. taeda EST-SSR loci and 16 polymorphic P. taeda PtTX SSR loci (Table 2.4) was based on an evaluation of 24 individuals sampled across the natural range of P. contorta ssp. latifolia.

2.2.5 Sequence verification and comparison
To verify the cross-species amplification of microsatellite fragments, the amplification products of 16 polymorphic PtTX SSR markers were sequenced from P. contorta ssp. latifolia. Seven P. taeda EST-SSR markers (LOP nos. 1, 3, 5, 8-9, 11, PtTX2146) were also tested for sequence variation in P. contorta ssp. latifolia, P. ponderosa, and P. sylvestris. For each of these markers, two homozygous PCR products were amplified and purified using QIAquick PCR purification kits (QIAGEN Inc., Mississauga, Ont.)
and sequenced using SequiTherm EXCEL™ II Long-Read DNA Sequencing Kits-LC (Epicentre Technologies) on a LiCor 4200 automated sequencer. DNA sequences were aligned using ESEE3S software (Cabot & Beckenbach 1989) and then edited manually. To determine similarity in flanking regions and repeat motifs, sequences of each locus were compared to the original P. taeda sequences from GenBank (http://www.ncbi.nlm.nih.gov/BLAST/). The frequency of base substitution in the flanking regions was calculated as the total number of base substitutions divided by the total number of base pairs in the flanking regions, expressed as a percentage. Insertions/deletions were also counted by comparing the sequence data to P. taeda. A total of 36 sequences were generated; their accession numbers are given in Tables 2.4 and 2.6.

2.2.6 Mendelian inheritance analysis of SSR markers
Only the 19 microsatellites (Table 2.8) that were subsequently selected for further study, together with 40 half-sib families, each with 20 seeds, collected from two natural P. contorta ssp. latifolia populations, were used to determine the mode of inheritance of these microsatellite markers. The maternal genotypes were inferred by assigning a specific color code to each allele within the progeny array of a heterozygous tree. Based on the rules of codominant Mendelian inheritance, all offspring within a progeny array should display at least one of the maternal alleles. Heterozygous offspring with the same genotype as the maternal tree were excluded from the analyses. The total numbers of heterozygous genotypes of maternal trees included in the segregation analysis are listed in Appendix IV. The observed maternal allele ratios at each locus were tested against the expected Mendelian segregation ratio (1:1) using a log-likelihood G test.
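The segregation test described in Section 2.2.6 can be illustrated with a minimal Python sketch. It assumes the standard log-likelihood ratio statistic G = 2 Σ O ln(O/E) with equal expected counts for the two maternal alleles, and the usual partition of the summed per-family G into pooled and heterogeneity components. The function names and the example counts are hypothetical and are not taken from this study.

```python
import math


def g_test_1to1(a, b):
    """G statistic (1 df) for observed allele counts a:b against a 1:1 ratio."""
    expected = (a + b) / 2
    g = 0.0
    for observed in (a, b):
        if observed > 0:  # a zero count contributes nothing to the sum
            g += observed * math.log(observed / expected)
    return 2 * g


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2))


def pooled_and_heterogeneity_g(families):
    """Partition G across families (list of (a, b) counts per maternal tree).

    Returns the pooled G (1 df) and the heterogeneity G (k - 1 df),
    where heterogeneity G = sum of per-family G minus pooled G.
    """
    total_g = sum(g_test_1to1(a, b) for a, b in families)
    pooled_g = g_test_1to1(sum(a for a, _ in families),
                           sum(b for _, b in families))
    return {
        "pooled_G": pooled_g,
        "pooled_df": 1,
        "pooled_P": chi2_sf_1df(pooled_g),
        "het_G": total_g - pooled_g,
        "het_df": len(families) - 1,
    }


# Hypothetical counts of the two maternal alleles in three progeny arrays
example = [(12, 8), (9, 11), (11, 9)]
print(pooled_and_heterogeneity_g(example))
```

Only the 1-df P value is computed here, to keep the sketch dependency-free; the heterogeneity G would be compared with a chi-square distribution with k - 1 degrees of freedom using a statistical table or library.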
2.3 Results
2.3.1 Frequency and distribution of microsatellites
A total of 98 EST sequences with repeat motifs of n ≥ 8 for dinucleotides and n ≥ 6 for trinucleotides were identified among 55,000 loblolly pine EST sequences, of which 62 sequences contained dinucleotide motifs (AT, AG, and GT) and 36 contained trinucleotide motifs (mostly AAG and GGC) (Fig. 2.1).

2.3.2 SSR-marker development and cross-species transferability
Of the 14 P. taeda EST-SSRs developed, three primer pairs (LOP4, LOP6, and LOP14) produced multiple bands and four primer pairs (LOP2, LOP7, LOP10, and LOP13) failed to amplify SSR products in P. taeda. The remaining seven primer pairs (LOP1, LOP3, LOP5, LOP8, LOP9, LOP11, and LOP12) successfully amplified SSR products in P. taeda. These seven LOP loci and the PtTX2146 locus amplified products of the expected size in all species tested (Tables 2.2 and 2.3). Of the eight primer pairs that produced clear reproducible bands, four pairs (LOP1, LOP5, LOP12, and PtTX2146) were polymorphic in all species and three pairs (LOP8, LOP9, and LOP11) were polymorphic in P. taeda and P. contorta ssp. latifolia, but were monomorphic in P. ponderosa and P. sylvestris (Table 2.3). The remaining pair (LOP3) was polymorphic in P. ponderosa and P. sylvestris, but not in P. taeda and P. contorta ssp. latifolia (Table 2.3). Allele sizes and numbers of alleles detected by the EST-SSRs within the four pine species are shown in Table 2.3. The polymorphisms of those eight EST-SSRs, tested on lodgepole pine, are shown in Table 2.4. Allele number ranged from 1 to 17 with a mean of 7.88 per locus, while observed heterozygosities ranged from 0.043 to 1.000 (Table 2.4). Of the 99 PtTX microsatellite loci tested, 39 loci amplified products (Table 2.2) and the remaining 60 loci gave no amplification products in P. contorta ssp. latifolia.
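As an aside on the screening step, the repeat-length criterion used to identify candidate EST-SSRs (at least eight dinucleotide or six trinucleotide repeats; Sections 2.2.1.1 and 2.3.1) can be sketched as a simple regular-expression scan. This is an illustrative Python sketch only: the function name, the example sequence, and the decision to skip homopolymer units are assumptions, and it is not the procedure actually used to search the EST database.

```python
import re

# Minimum number of tandem repeats per unit length, following the
# thresholds stated in the text (dinucleotide >= 8, trinucleotide >= 6).
MIN_REPEATS = {2: 8, 3: 6}


def find_ssrs(sequence):
    """Return (start, unit, repeat_count) for each perfect SSR found.

    Units made of a single base (e.g. 'AA', 'AAA') are skipped, since
    they are homopolymer runs rather than di- or trinucleotide repeats.
    """
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit followed by at least (min_rep - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Hypothetical sequence: (AT)9 and (AAG)6 qualify; the poly(A) run and
# (GT)7 do not.
est = "GGC" + "AT" * 9 + "CCGTTA" + "AAG" * 6 + "TT" + "A" * 20 + "GT" * 7
for start, unit, count in find_ssrs(est):
    print(start, unit, count)
```

A scan of this kind finds only perfect repeats; interrupted or compound repeats such as those described in Section 2.3.3 would need a more permissive search.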
Twenty-five of the 39 loci produced single, clear amplification products, but the remaining 14 loci produced weak, null and/or non-specific amplifications (Table 2.2). Of the 25 single-banded loci, 16 produced polymorphic bands and the remaining nine were monomorphic (see Table 2.4 for the description of these 16 polymorphic loci). The allele number ranged from four to 14 with a mean of 8.50 per locus, while observed heterozygosities (Ho) ranged from 0.304 to 0.870 (Table 2.4). Null alleles, observed in both EST-SSRs and PtTX SSRs (Table 2.2), were inferred from markers giving no amplification products and excess homozygotes. The transferability successes and polymorphisms in P. contorta ssp. latifolia based on the sources of SSR are shown in Table 2.5. The success rates of transferability of EST, genomic, low-copy, and undermethylated SSRs were 100, 29, 23, and 30%, respectively, with expected heterozygosities (HE) from the respective sources of 0.54, 0.64, 0.74, and 0.85. However, there were no significant differences in average expected heterozygosity (HE) between the sources of SSR marker (one-way ANOVA, F value = 1.84, P value = 0.17; Table 2.5).

2.3.3 Sequence comparison at microsatellite loci
Sequencing of PCR products confirmed the presence of microsatellite repeats at all loci. The repeat structures of seven P. taeda EST-SSRs and 16 PtTX microsatellites were highly conserved in P. contorta ssp. latifolia, with the exception of LOP1, PtTX3025, PtTX3011, and PtTX3030. Single base-pair substitutions caused interrupted and compound repeats in P. contorta ssp. latifolia at locus LOP1 {(TA)3CA(TA)3} and locus PtTX3025 {(CAA)5(CAAAAA)3}, respectively. A single base-pair substitution in the repeat of the SSR at locus PtTX3011 also altered the length of poly(A)n from (A)6 to (A)3 [(GAA)6(A)3(GAA)3...(GAT)13]. A 54-bp insertion was observed within the compound microsatellite repeat of PtTX3030 in P. contorta ssp. latifolia (Fig. 2.2).
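The per-locus diversity statistics reported in Section 2.3.2 (number of alleles, observed heterozygosity Ho, and expected heterozygosity HE) can be computed from codominant genotypes as sketched below. The sketch assumes Nei's (1978) unbiased estimator of expected heterozygosity, HE = [2n/(2n - 1)](1 - Σ p_i²); the chapter does not state which estimator was used, and the function name and genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summarise one codominant locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid
    individual, with alleles given as fragment sizes in bp.
    Returns (number_of_alleles, Ho, He).
    """
    n = len(genotypes)
    # Observed heterozygosity: proportion of heterozygous individuals
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies from the 2n gene copies sampled
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    sum_p_squared = sum((c / (2 * n)) ** 2 for c in counts.values())
    # Nei's (1978) small-sample correction, assumed here
    he = (2 * n / (2 * n - 1)) * (1 - sum_p_squared)
    return len(counts), ho, he


# Hypothetical genotypes at one locus for four individuals
sample = [(100, 100), (100, 102), (102, 104), (100, 104)]
alleles, ho, he = locus_diversity(sample)
print(alleles, round(ho, 3), round(he, 3))
```

A marked deficit of Ho relative to HE at a locus is one of the patterns (excess homozygotes) from which null alleles were inferred above.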
Insertions/deletions (indels) in the flanking regions of P. contorta ssp. latifolia (compared to the P. taeda sequences) were less frequent for EST-SSRs (43%) than for PtTX microsatellite markers (56%) (Table 2.4). In addition, EST-SSRs appeared to have fewer base pairs inserted or deleted than PtTX microsatellite markers. The average base substitution per locus was 1.02 and 2.40 per 100 bp in EST-SSRs and PtTX microsatellite markers, respectively (Table 2.4). A t-test indicated no significant difference in average base substitution between the two sources of microsatellite markers (two-tailed t-test, t value = -1.49, P value = 0.15). The lengths of the flanking sequences of EST-SSRs and PtTX microsatellites are shown in Table 2.4, indicating that the frequencies of indels and base substitutions do not depend on flanking sequence length. PCR products at seven EST-SSR loci from P. ponderosa and P. sylvestris were also sequenced. Table 2.6 presents the repeat structure and the presence of indels (compared to the P. taeda sequences). Table 2.7 presents pairwise comparisons of the percentage of base substitution in the flanking regions between the pine species tested. The average base substitution across loci for each species was 0.98, 1.02, and 1.76 per 100 bp for P. ponderosa, P. contorta ssp. latifolia, and P. sylvestris, respectively (compared to the P. taeda sequences). The lowest average base substitution across loci was observed between P. ponderosa and P. contorta ssp. latifolia (0.20 per 100 bp). No changes in the flanking sequences were observed at locus LOP9. Size homoplasy was observed at loci LOP3 and LOP8. At locus LOP3, P. ponderosa, P. contorta ssp. latifolia, and P. sylvestris shared the same allele size of 209 bp; however, a single base-pair substitution in the flanking region (T→G) was observed in P. contorta ssp. latifolia. Similarly, an identical allele size of 370 bp was observed at locus LOP8 in P.
ponderosa and P. contorta ssp. latifolia, but sequencing revealed a single base pair substitution in the flanking region (C→T) in P. ponderosa (Fig. 2.2).

2.3.4 Mendelian inheritance analyses

A total of 194 heterozygous maternal genotypes were inferred from segregation at 19 microsatellite loci and used for Mendelian inheritance analyses. To ensure adequate statistical power, Table 2.8 shows analyses only for the 108 genotypes that occurred in at least two parents (except at PtTX4046). The complete list of analyses is given in Appendix III. Neither the pooled nor the heterogeneity G values showed significant deviations in the segregation analyses, confirming that all 19 microsatellite loci segregated as expected under Mendelian inheritance (Table 2.8; Fig. 2.3). However, maternal genotypes were heterozygous for null alleles in some families at PtTX2128, PtTX3034, PtTX3107, PtTX4046, and PtTX4139, as evidenced by the observation of more than two homozygous offspring genotypes within a progeny array.

2.4 Discussion

2.4.1 Prevalence of microsatellites in ESTs

Among dimeric repeats, the motifs AT (80.6%) and AG (17.7%) were the most common in the databases, whereas TG motifs were rarely detected (1.6%) (Fig. 2.1). Similarly, the AT motif was the most prevalent in spruce (Picea) ESTs (Rungis et al. 2004). In contrast, AT motifs were the rarest in other crop species (Temnykh et al. 2000; Thiel et al. 2003). Among trimeric repeats, AAG (33.3%) and GGC (44.4%) were the most abundant (Fig. 2.1). Similarly, the motifs AAG and GGC appeared common in Arabidopsis (Cardle et al. 2000), and GGC was the most abundant in maize (Chin et al. 1996). However, it is difficult to compare the prevalence of a particular motif across plant species because studies differ in their minimum repeat-number criteria.

2.4.2 Transferability of P.
taeda microsatellites

Most primers amplified genomic regions of the expected size in the majority of species tested (Table 2.2). This can be explained by the conservation of the repeat sequences and flanking regions across most pine species. P. taeda EST-SSRs had a high transfer rate (100%) across different subsections of pine species (Fig. 2.4a). In this study, the transferability success rate of EST-SSRs was also higher than that of other types of SSRs in P. contorta (Table 2.5). Similarly, the ability to transfer EST-SSRs among closely related genera has been reported in spruce (Picea) (Rungis et al. 2004) and crop species (Chen et al. 2002; Cordeiro et al. 2001; Decroocq et al. 2003). In contrast, the transferability of genomic and low-copy SSRs observed in this study was lower than that reported from P. taeda to P. elliottii var. elliottii and P. caribaea var. hondurensis (Shepherd et al. 2002). This result can be explained by the fact that transferability success decreases as the evolutionary distance between the source and target species increases. Pinus elliottii var. elliottii and P. caribaea var. hondurensis are more closely related to P. taeda than is P. contorta ssp. latifolia (Little & Critchfield 1969).

EST-SSR markers with trinucleotide repeats were less polymorphic than those with dinucleotide repeats and showed low polymorphism compared with markers from other sources. Similarly, low variability of trimeric EST-SSR loci was reported in Oryza sativa L. (Cho et al. 2000). Dimeric EST-SSR markers with high numbers of repeats appear to be as highly polymorphic as genomic SSRs. A relationship between polymorphism and the number of repeats has been reported for dimeric, trimeric and tetrameric EST-SSRs in barley (Hordeum vulgare L.) (Thiel et al. 2003). However, analysis of a larger set of primer pairs will be required to confirm this result, especially for trimeric and tetrameric EST-SSRs.
On average, the level of genetic polymorphism was lower for EST-SSRs than for other types of microsatellites (Table 2.5, Fig. 2.4). These results are similar to those of previous studies in spruce (Picea), rice (O. sativa L.), and barley (H. vulgare L.). In spruce, EST-SSRs showed significantly less variation than genomic-derived SSRs; HE values were 6.25% lower in white spruce, 15% lower in black spruce, and 9% lower in Sitka spruce (Rungis et al. 2004). In rice, microsatellites derived from ESTs had a lower level of polymorphism than those derived from genomic libraries (54.0 versus 83.8) (Cho et al. 2000). The mean level of polymorphism was also lower for EST-SSRs (0.45) than for genomic SSRs (0.58) in barley (Thiel et al. 2003). Successful cross-amplification of microsatellites has also been reported among pine species that diverged over 100 million years ago (Kutil & Williams 2001). Kutil & Williams (2001) found that microsatellites from a hard pine (P. taeda L.) showed trans-specific amplification in both hard and soft pines. They also suggested that perfect trinucleotide repeat SSRs cross-amplify better than compound SSRs.

2.4.3 Analysis of sequence variation at the microsatellite loci

Sequence analyses confirmed that conservation of the microsatellite motif and flanking regions was higher for EST-SSRs than for other types of microsatellites. Most of the variation in allele length among pine species was due to changes in the number of repeat motifs in the microsatellite region, combined with indels and base substitutions in the flanking regions. The percentage of base substitution in EST-SSRs did not strongly support the phylogeny of pine species reported by Krupkin et al. (1996). In that phylogeny, Pinus taeda and P. ponderosa were close relatives, and P. contorta emerged as a sister group to this pair and more closely related to P. ponderosa. P. sylvestris was highly differentiated as an outgroup.
However, our results showed that the lowest average base substitution across loci was observed between P. ponderosa and P. contorta ssp. latifolia. This result is not in concordance with the previous pine phylogeny because EST-SSRs are short and highly conserved sequences, and thus may not resolve the phylogenetic relationships among closely related pine species. However, base substitution tended to increase as the genetic distance between species increased, as at locus LOP8. Similar results were reported by Karhu et al. (2000), who studied the evolution of microsatellites in pines.

2.4.4 Functional roles of EST microsatellites

There have been few reports on the functional roles of microsatellites located near or within coding regions in plants. However, the characteristics of microsatellites in regulatory genes have been well documented in yeast (Saccharomyces cerevisiae). In the yeast genome, mono-, di-, and tetranucleotide repeats were located primarily in intergenic regions, whereas trinucleotide repeats were often found in open reading frames (ORFs) and were related to cellular regulation (Richard & Dujon 1996; Young et al. 2000). Young et al. (2000) reported that certain types of trinucleotide repeats were overrepresented in ORFs and encoded a biased set of amino acids. They suggested that negative selection might act against certain trinucleotide repeats at different levels. Features of microsatellites within transcribed regions of rice and Arabidopsis were recently reported by Fujimori et al. (2003). In their study, microsatellites in the transcribed regions of rice and Arabidopsis were found more frequently in the 5' UTRs than in coding regions or 3' UTRs, suggesting that they can potentially act as factors regulating gene expression. The waxy gene of rice, which contains GA/CT repeats in the 5' UTR, was found to be associated with amylose content (Ayres et al. 1997).
Microsatellites (CCG)n in the 5' UTRs of some ribosomal protein genes of maize may be related to the regulation of fertilization (Dresselhaus et al. 1999).

2.4.5 Implications for forest genetic research

Published ESTs and SSR markers from related species proved to be valuable resources for SSR marker development in P. contorta. Transferability success decreases as the evolutionary distance between the source and target species increases. Given the phylogenetic relationships in pines (Krupkin et al. 1996), the transferability success of PtTX microsatellites can be expected to be higher in P. ponderosa than in P. sylvestris. Our 16 polymorphic PtTX microsatellites are a good source for testing cross-species transferability, in particular for genetic studies in P. ponderosa. Our results also suggest that the level of polymorphism in Pinus EST-SSRs may depend on the number of repeats, and that EST sequences with dinucleotide repeats of ten or more can be useful for developing informative microsatellites in Pinus spp. Although EST-SSRs have low levels of polymorphism, their ability to amplify across species and genera is high, and they may be associated with genes of known function. The variability of SSR flanking-region sequences can be used to elucidate the evolutionary history of pine species.

In addition, the analysis of these microsatellites will provide an important new tool for addressing questions related to conservation and tree improvement programs. Many parameters, such as genetic diversity in natural and breeding populations, gene flow, pollen and/or seed dispersal, and mating systems, are important for the conservation of genetic resources. In tree improvement programs, microsatellites can be used for QTL mapping, clone identification, estimating pollen flow/contamination, and determining male parentage of seeds produced in seed orchards.

Table 2.1 Primer sets of 14 P. taeda L.
EST-SSR markers

Locus | Forward primer (5'-3') | Reverse primer (5'-3') | Ta^a (°C) | Repeat motif | Accession number
LOP1 | GGCTAATGGCCGGCCAGTGCT | GCGATTACAGGGTTGCAGCCT | 55 | (TA)10 | AI812473
LOP2 | GTCTCCAGCCAGTTCACCTGC | CTTCACCACGTAGGCCCGCTC | 55 | (TA), | AW010960
LOP3 | GTCTCCAGCCAGTTCACCTGC | CAGTGGATCTGTCACCTCCTC | 48 | (TA)9 | AA556662
LOP4 | GCCTCATCATATGAAAAGCAA | CATTGTTCTCACTACGAATGC | 49 | (TA)20 | AW888197
LOP5 | AGCCGTAAAAGCTATCTTGTG | GGCATACTTACATTTTAATAA | 45 | (TA)33 | AW758812
LOP11 | CCAGAAGGCTATAGTACAC | CAACAATACAAGTAGCAATAC | 45 | (TA)2T(TA)12 | AA739689
LOP12 | AGGACAGTCCTTACTGCCCAA | CATGTTTTCCCATGGTTTTCC | 45 | (TA)26 | AW888197
LOP13 | GGCTGGAAAGTGGTCTTTGTT | ACATAAAATGCATAATAAACG | 46 | (TA)21 | BE187296
LOP14 | GGTCCATCTGGTTATATATTG | AGGAATTTCGCCACTTCACTG | 50 | (TA)12 | BF517779
LOP6 | AGTTTTATCCATGCTGCACAG | ACCTAAAGCCCAATATCCACA | 55 | (AAT)7 | AA556221
LOP7 | CGGGGAATTGATAGTGTG | TCATCGTCCTCAGCTGCAAGT | 54 | (AAG) S | AW981642
LOP8 | TATCCACCAGAAGGGCATC | CGGGAGCTTTAATGATCTTGA | 50 | (CCT)6 | AI725303
LOP9 | GGATTCTCGTTGTGGCTGG | TTGCCTTTGCACATAATATCT | 55 | (GGC)6 | AI813163
LOP10 | CTCTCCTCCGGCTATTTGCAG | CGGCGAAGCTCTTCATTCCT | 59 | (AAG)6 | BE520122

a Ta (annealing temperature) derived from the Oligo 6.0 program.

Table 2.2 Transferable microsatellite loci from P. taeda L. to P. contorta ssp. latifolia

Marker | Repeat motif | Source of library^a | Ta (°C) | Allele size (bp), P. taeda^b | Allele size (bp), P. contorta^c | Quality class^d
LOP1 | (TA)10 | EST | 55 | 161 | 153 | 1
LOP5 | (TA)33 | EST | 45 | 209 | 168 | 1
LOP8 | (CTT)6 | EST | 50 | 369 | 370 | 1
LOP9 | (GCC)6 | EST | 55 | 142 | 131 | 1
LOP11 | (TA)2T(AT)12 | EST | 45 | 254 | 243 | 1
LOP12 | (TA)26 | EST | 45 | 191 | 154 | 1(N)
PtTX2146 | (GCT)4GCC(GCT)7GCC(GCT)8 | EST | 55 | 180 | 196 | 1
LOP3 | (TA)9 | EST | 48 | 220 | 209 | 2
LOP4 | (TA)20 | EST | 49 | 207 | 208 | 3(M)
LOP6 | (AAT)7 | EST | 55 | 228 | 259 | 3(M)
LOP14 | (TA)12 | EST | 50 | 300 | 320 | 3(M)
PtTX2123 | (AGC)8 | G | 55 | 202 | 200 | 1
PtTX2128 | (GAC)8 | G | 55 | 245 | 228 | 1
PtTX2183 | (CAA)18 | G | 55 | 205 | 100 | 3(M)
PtTX3011 | (GAA)5(A)6(GAA)3...(GAT)15 | LC | 55 | 186 | 178 | 1
PtTX3025 | (CAA)10 | LC | 59 | 266 | 266 | 1
PtTX3029 | (GCT)5...(GCT)8...(GCT)5 | LC | 61 | 255 | 274 | 1(M)
PtTX3030 | (TA)4...(GGT)10 | LC | 59 | 327 | 320 | 1
PtTX3034 | (GT)10(GA)13 | LC | 55 | 207 | 207 | 1
PtTX3049 | (TG)16 | LC | 55 | 311 | 302 | 1
PtTX3052 | (ATC)8...(ATC)4 | LC | 55 | 242 | 239 | 1
PtTX3107 | (CAT)14 | LC | 55 | 182 | 177 | 1
PtTX3127 | (CAA) 1 ( I | LC | 55 | 183 | 187 | 1
PtTX2003 | (ACC)8 | LC | 61 | 122 | 125 | 2
PtTX2082 | (GT)14(GAGT)7(GA)13 | LC | 61 | 208 | 200 | 2
PtTX3002 | (GAG)6...(GAG)4AA(GAG)4 | LC | 65 | 194 | 191 | 2
PtTX3091 | (GTT) 1 ( ) T, 3 GGT I O CT 5 | LC | 64 | 229 | 172 | 2
PtTX2034 | (TTTG)9 | LC | 55 | 170 | 123 | 3(M)
PtTX2037 | (GTGA)8GT14 | LC | 58 | 177 | 145 | 3(N)
PtTX3020 | A16(CAA), | LC | 61 | 183 | 183 | 3(M)
PtTX3045 | (CA)12 | LC | 55 | 226 | 210 | 3(M)
PtTX3055 | (GAT)5...(GAT)8...(GAT), | LC | 59 | 402 | 375 | 3(N)
PtTX3090 | (CAC)4(CAT)24CAC(CAT),, | LC | 57 | 259 | 285 | 3(W)
PtTX3098 | (GTT)8 | LC | 61 | 187 | 185 | 3(M)
PtTX3105 | (GTT), | LC | 55 | 258 | 170 | 3(N)
PtTX3116 | (TTG)7...(TTG)5 | LC | 55 | 146 | 122 | 3(N)
PtTX4046 | (TA)3(TG)13 | UM | 55 | 363 | 331 | 1
PtTX4054 | (GA)21 | UM | 55 | 179 | 292 | 1
PtTX4056 | (GA)„ | UM | 65 | 436 | 429 | 1
PtTX4058 | (CA)3(GA)2 I ) | UM | 55 | 188 | 158 | 1
PtTX4139 | (CT)21 | UM | 59 | 153 | 113 | 1
PtTX4009 | (CA)3TA(CA)14 | UM | 63 | 280 | 252 | 2
PtTX4050 | (CA)6...(CA)3 | UM | 57 | 188 | 185 | 2
PtTX4090 | (CTT)6...(CTT)8 | UM | 55 | 188 | 183 | 2
PtTX4112 | (AT)6(GT)6 | UM | 57 | 463 | 424 | 2
PtTX4146 | (GAA)s | UM | 59 | 126 | 115 | 2
PtTX4004 | (GT),4...(T)4(GTMT), | UM | 55 | 175 | 168 | 3(M)
PtTX4092 | (GAA)21 | UM | 57 | 162 | 150 | 3(M)
PtTX4100 | (GAA)8 | UM | 61 | 207 | 210 | 3(W)
PtTX4137 | (GAA)21 | UM | 61 | 139 | 145 | 3(M)

a Source of microsatellite markers: EST, expressed sequence tag; G, genomic library; LC, low-copy library; UM, undermethylated library.
b Expected size based on the Conifer Microsatellite Handbook.
c Allele size based on the most common allele.
d Quality class of microsatellites: 1, polymorphic; 2, monomorphic; 3, poor amplification; M, multiple alleles; N, null alleles; W, weak amplification.

Table 2.3 Allele size (in bp) and number of alleles (A) of seven EST-SSR loci in four different pine species

Locus | P. taeda: allele size | A | P. ponderosa: allele size | A | P. contorta ssp. latifolia: allele size | A | P. sylvestris: allele size | A
LOP1 | 153,155,157 | 3 | 155,157 | 2 | 153,155,159 | 3 | 158,162 | 2
LOP3 | 208 | 1 | 209,215 | 2 | 209 | 1 | 209,213,217 | 3
LOP5 | 173,213,217,241 | 4 | 174,176,178,192 | 4 | 168,178,186,216 | 4 | 166-190 | 5
LOP8 | 369,375 | 2 | 370 | 1 | 370,373 | 2 | 367 | 1
LOP9 | 135,138 | 2 | 132 | 1 | 128,131,137 | 3 | 135 | 1
LOP11 | 243-269 | 6 | 243 | 1 | 243,253,272 | 3 | 235 | 1
LOP12 | 154,156 | 2 | 152,154,156 | 3 | 150,154,160,168 | 4 | 156,160 | 2
PtTX2146 | 163,172,175 | 3 | 163,178,187 | 3 | 169,190,193,196 | 4 | 181,193,211,220 | 4

Table 2.4 Polymorphisms, indels (insertions/deletions), and base substitution (BS) in flanking regions (relative to P. taeda sequences) in P. contorta ssp. latifolia at eight EST-SSR loci and 16 polymorphic PtTX microsatellite loci that were successful in cross-species amplification

Marker^a | Allele size range (bp)^b | No. of alleles^b | Ho^b | HE^b | Insertion^c | Deletion^c | BS (%)^c | Flanking sequence length (bp) | Change in SSR structure^c | Accession number
LOP1 | 153-163 | 4 | 0.261 | 0.278 | 0 | 0 | 0.70 | 141 | Yes | AY330148
LOP3 | 209 | 1 | - | - | 0 | 1 | 2.00 | 201 | No | AY330152
LOP5 | 166-252 | 17 | 0.826 | 0.946 | 0 | 1 | 0.65 | 118 | No | AY330155
LOP8 | 370-373 | 2 | 0.217 | 0.198 | 1 | 0 | 0.84 | 358 | No | AY330158
LOP9 | 128-137 | 3 | 0.043 | 0.273 | 0 | 0 | 0.00 | 116 | No | AY330161
LOP11 | 243-271 | 14 | 1.000 | 0.908 | 0 | 0 | 0.44 | 225 | No | AY330164
LOP12 | 150-176 | 10 | 0.333 | 0.899 | NA^d | NA | NA | NA | NA | NA
PtTX2146 | 160-208 | 12 | 0.913 | 0.824 | 0 | 0 | 2.54 | 117 | No | AY330133
Mean | | 7.88 | 0.449 | 0.541 | | | 1.02 | 182 | |
PtTX2123 | 194-203 | 4 | 0.652 | 0.630 | 2 | 1 | 3.61 | 178 | No | AY330131
PtTX2128 | 228-237 | 4 | 0.565 | 0.654 | 1 | 3 | 3.27 | 214 | No | AY330132
PtTX3011 | 151-211 | 14 | 0.696 | 0.892 | 0 | 0 | 0.81 | 103 | Yes | AY330134
PtTX3025 | 254-305 | 7 | 0.636 | 0.653 | 0 | 0 | 0.42 | 236 | Yes | AY330135
PtTX3029 | 253-304 | 11 | 0.739 | 0.885 | 0 | 1 | 0.00 | 166 | No | AY330136
PtTX3030 | 318-325 | 4 | 0.304 | 0.704 | 0 | 8 | 4.72 | 223 | Yes | AY330137
PtTX3034 | 201-221 | 11 | 0.652 | 0.708 | 0 | 2 | 0.63 | 159 | No | AY330138
PtTX3049 | 300-332 | 11 | 0.696 | 0.885 | 0 | 0 | 14.11 | 276 | No | AY330139
PtTX3052 | 239-254 | 5 | 0.478 | 0.446 | 1 | 1 | 2.34 | 199 | No | AY330140
PtTX3107 | 156-177 | 6 | 0.348 | 0.760 | 0 | 0 | 1.43 | 138 | No | AY330141
PtTX3127 | 169-202 | 7 | 0.409 | 0.689 | 0 | 0 | 0.66 | 158 | No | AY330142
PtTX4046 | 324-342 | 6 | 0.500 | 0.766 | 0 | 6 | 2.90 | 314 | No | AY330143
PtTX4054 | 268-302 | 14 | 0.826 | 0.913 | 0 | 0 | 0.00 | 254 | No | AY330144
PtTX4056 | 427-453 | 10 | 0.864 | 0.864 | 1 | 2 | 1.51 | 399 | No | AY330145
PtTX4058 | 128-160 | 13 | 0.870 | 0.901 | 0 | 0 | 1.02 | 100 | No | AY330146
PtTX4139 | 113-147 | 9 | 0.739 | 0.844 | 1 | 0 | 0.93 | 106 | No | AY330147
Mean | | 8.50 | 0.623 | 0.762 | | | 2.40 | 201 | |

a LOP series were derived from ESTs; PtTX series were derived from G (genomic library), LC (low-copy library), and UM (undermethylated library).
b Based on P. taeda microsatellite loci that were successful in amplifying PCR products in P. contorta ssp. latifolia.
c Based on the comparison between P. contorta ssp. latifolia and P. taeda.
d NA indicates no data at this locus.

Table 2.5 Transferability successes and polymorphisms of SSR markers from P. taeda to P. contorta ssp. latifolia based on the sources of library. A one-way ANOVA indicated no significant difference in HE, P value = 0.17

Source of library^a | No. of loci tested | No. of amplified loci | No. of polymorphic loci | HE^b
EST | 8 | 8 | 7 | 0.54±0.39
G | 7 | 2 | 2 | 0.64±0.02
LC | 56 | 13 | 9 | 0.74±0.14
UM | 36 | 10 | 5 | 0.85±0.06

a Source of microsatellite markers: EST, expressed sequence tag; G, genomic library; LC, low-copy library; UM, undermethylated library.
b HE, average expected heterozygosity based on polymorphic markers.

Table 2.6 Repeat structures of SSRs and indels (insertions/deletions) in the flanking regions (relative to P. taeda sequences) between four pine species at seven EST-SSR loci

Locus | Species | Repeat motif | Insertion | Deletion | Accession number
LOP1 | P. taeda | (TA)10 | 0 | 0 |
 | P. ponderosa | (TA)10 | 0 | 0 | AY330149
 | P. contorta | (TA)3CA(TA)3 | 0 | 0 |
 | P. sylvestris | (TA)2T(TA)7 | 0 | 0 | AY330150
LOP3 | P. taeda | (TA)9 | 0 | 0 |
 | P. ponderosa | (TA)4 | 0 | 1 | AY330151
 | P. contorta | (TA)4 | 0 | 1 |
 | P. sylvestris | (TA)4 | 0 | 1 | AY330153
LOP5 | P. taeda | (TA)33 | 0 | 0 |
 | P. ponderosa | (TA)18 | 0 | 1 | AY330154
 | P. contorta | (TA)9 | 0 | 1 |
 | P. sylvestris | (TA)10 | 0 | 1 | AY330156
LOP8 | P. taeda | (CCT)4 | 0 | 0 |
 | P. ponderosa | (CCT)4 | 1 | 0 | AY330157
 | P. contorta | (CCT)4 | 1 | 0 |
 | P. sylvestris | (CCT)3 | 1 | 0 | AY330159
LOP9 | P. taeda | (GGC)6 | 0 | 0 |
 | P. ponderosa | (GGC)5 | 0 | 0 | AY330160
 | P. contorta | (GGC)5 | 0 | 0 |
 | P. sylvestris | (GGC)6 | 0 | 0 | AY330162
LOP11 | P. taeda | (TA)2T(TA)12 | 0 | 0 |
 | P. ponderosa | (TA)2TTATG(TA)4 | 0 | 0 | AY330163
 | P. contorta | (TA)2T(TA)7 | 0 | 0 |
 | P. sylvestris | NA^a | NA | NA | NA
PtTX2146 | P. taeda | (GCT)4GCC(GCT)7GCC(GCT)8 | 0 | 0 |
 | P. ponderosa | (GCT)4GCC(GCT)5GCC(GCT)12 | 0 | 0 | AY489266
 | P. contorta | (GCT)4GCC(GCT)10GCC(GCT)10 | 0 | 0 |
 | P. sylvestris | (GCT)3CCT(GCT)8CCT(GCT)15 | 0 | 0 | AY489267

a NA indicates no sequence data available.

Table 2.7 The percentage of base substitution in the flanking regions between species

Species | Locus | P. ponderosa | P. contorta | P. sylvestris
P. taeda | LOP1 | 0.70^a | 0.70 | 0.70
 | LOP3 | 1.52 | 2.00 | 1.52
 | LOP5 | 0.65 | 0.65 | 1.31
 | LOP8 | 0.56 | 0.84 | 2.80
 | LOP9 | 0.00 | 0.00 | 0.00
 | LOP11 | 0.88 | 0.44 | NA^b
 | PtTX2146 | 2.54 | 2.54 | 4.24
P. ponderosa | LOP1 | | 1.40 | 1.40
 | LOP3 | | 0.51 | 0.00
 | LOP5 | | 0.00 | 0.65
 | LOP8 | | 0.28 | 2.80
 | LOP9 | | 0.00 | 0.00
 | LOP11 | | 0.44 | NA
 | PtTX2146 | | 0.00 | 1.69
P. contorta | LOP1 | | | 1.40
 | LOP3 | | | 0.51
 | LOP5 | | | 0.65
 | LOP8 | | | 2.51
 | LOP9 | | | 0.00
 | LOP11 | | | NA
 | PtTX2146 | | | 1.69

a The percentage of base substitution was calculated as (total number of base substitutions / total number of base pairs in the flanking region) × 100.
b NA indicates no sequence data available.

Table 2.8 Log-likelihood G test on segregation ratios of 19 microsatellite loci in P. contorta ssp. latifolia seeds

Locus | Genotype | No. of trees | Observed ratio | Pooled G^a | P | Heterogeneity G^b | P
PtTX2123 | 200:203 | 11 | 64:61 | 0.81 | 0.37 | 2.22 | 0.99
PtTX2128 | 222:234 | 2 | 17:19 | 0.11 | 0.74 | 1.86 | 0.40
 | 228:234 | 4 | 36:26 | 1.62 | 0.20 | 3.73 | 0.29
PtTX2146 | 157:196 | 2 | 17:16 | 0.03 | 0.86 | 0.03 | 0.87
 | 169:196 | 2 | 15:15 | 0.00 | 1.00 | 0.13 | 0.71
 | 172:196 | 3 | 25:30 | 0.46 | 0.50 | 1.50 | 0.47
 | 190:196 | 2 | 14:29 | 5.34 | 0.02 | 0.07 | 0.80
PtTX3011 | 172:175 | 3 | 25:31 | 0.64 | 0.42 | 0.47 | 0.79
PtTX3025 | 266:275 | 2 | 15:18 | 0.27 | 0.60 | 1.35 | 0.25
 | 266:305 | 2 | 15:13 | 0.14 | 0.71 | 0.14 | 0.70
 | 257:266 | 3 | 20:26 | 0.78 | 0.38 | 0.35 | 0.84
 | 257:275 | 3 | 26:25 | 0.02 | 0.89 | 0.26 | 0.88
PtTX3029 | 271:274 | 2 | 20:12 | 2.02 | 0.16 | 0.14 | 0.71
PtTX3030 | 318:320 | 2 | 10:18 | 2.32 | 0.13 | 0.00 | 1.00
 | 320:326 | 3 | 23:25 | 0.08 | 0.77 | 0.14 | 0.93
 | 323:326 | 2 | 24:15 | 2.10 | 0.15 | 0.21 | 0.65
PtTX3034 | 199:207 | 2 | 14:16 | 0.13 | 0.71 | 0.00 | 1.00
 | 207:213 | 2 | 18:16 | 0.12 | 0.73 | 0.13 | 0.72
 | 207:219 | 2 | 22:13 | 2.34 | 0.13 | 0.08 | 0.78
 | 209:219 | 2 | 29:29 | 0.00 | 1.00 | 0.00 | 1.00
PtTX3049 | 312:322 | 2 | 19:19 | 0.00 | 1.00 | 0.42 | 0.52
PtTX3052 | 239:248 | 3 | 19:17 | 0.11 | 0.74 | 0.66 | 0.72
PtTX3107 | 171:177 | 2 | 18:20 | 0.11 | 0.75 | 5.41 | 0.02
PtTX3127 | 178:187 | 5 | 39:36 | 0.12 | 0.73 | 1.52 | 0.82
PtTX4054 | 280:292 | 2 | 19:14 | 0.76 | 0.38 | 0.07 | 0.80
 | 282:290 | 2 | 17:15 | 0.13 | 0.72 | 0.00 | 0.98
 | 290:296 | 2 | 20:14 | 1.06 | 0.30 | 0.49 | 0.49
 | 292:294 | 2 | 21:17 | 0.42 | 0.52 | 0.38 | 0.54
PtTX4056 | 429:445 | 2 | 16:18 | 0.12 | 0.73 | 0.00 | 1.00
PtTX4058 | 128:142 | 2 | 18:14 | 0.50 | 0.48 | 0.05 | 0.82
 | 142:144 | 2 | 22:16 | 0.95 | 0.33 | 0.15 | 0.70
 | 142:146 | 2 | 17:17 | 0.00 | 1.00 | 1.06 | 0.30
 | 144:148 | 2 | 18:18 | 0.00 | 1.00 | 0.45 | 0.50
PtTX4139 | 113:129 | 2 | 19:19 | 0.00 | 1.00 | 0.00 | 1.00
 | 113:135 | 3 | 23:27 | 0.32 | 0.57 | 0.03 | 0.99
 | 113:141 | 2 | 17:16 | 0.03 | 0.86 | 0.26 | 0.61
 | 127:131 | 2 | 21:15 | 1.00 | 0.32 | 0.11 | 0.74
 | 129:139 | 2 | 24:15 | 2.10 | 0.15 | 1.25 | 0.26
LOP5 | 168:178 | 2 | 25:16 | 1.99 | 0.16 | 0.27 | 0.61
 | 170:178 | 2 | 15:17 | 0.13 | 0.72 | 0.47 | 0.49
LOP11 | 243:249 | 2 | 22:14 | 1.79 | 0.18 | 0.07 | 0.79
 | 243:251 | 2 | 15:15 | 0.00 | 1.00 | 0.54 | 0.46
 | 245:247 | 3 | 30:24 | 0.67 | 0.41 | 0.68 | 0.71
PtTX4046 | 331:341 | 1 | 5:7 | 0.33 | 0.56 | - | -
 | 331:335 | 1 | 12:8 | 0.81 | 0.37 | - | -

a Pooled G values indicate the overall deviation from a 1:1 ratio.
b Heterogeneity G values indicate the amount of heterogeneity in the segregation

Figure 2.1 Distribution of the di- and trinucleotide SSR-ESTs. The frequencies of SSRs observed within each repeat class (%) are shown in parentheses

[Aligned sequences for LOP8 (P. taeda, P. ponderosa, P. contorta, P. sylvestris) and PtTX3030 (P. taeda, P. contorta)]
Figure 2.2 Nucleotide sequence comparison of two P. taeda microsatellite loci (LOP8 and PtTX3030). The dots indicate conserved nucleotides (relative to P. taeda)

[Gel image; size markers at 204, 200, 175 and 145 bp]
Figure 2.3 An example of inheritance at microsatellite locus PtTX3011 from an open-pollinated half-sib family of P. contorta ssp. latifolia. Arrows indicate the segregation of maternal alleles, confirming the mode of Mendelian inheritance

[Gel images; panel a lanes: P. contorta, P. ponderosa, P. sylvestris, P. taeda; panel b loci: PtTX2146, PtTX3034, PtTX3107; size markers at 255, 230, 204, 200, 175 and 145 bp]
Figure 2.4 (a) Transferability of P. taeda EST-SSR at locus LOP1 on different pine species. (b) An example of cross transferability of microsatellite markers from P. taeda to P. contorta at one EST-SSR marker (PtTX2146) and two PtTX markers (PtTX3034 and PtTX3107). Note that all allele sizes include Licor primer tails

2.5 References

Auckland LD, Bui T, Zhou Y, Shepherd M, Williams CG (2002) Conifer Microsatellite Handbook. Corporate Press, Raleigh, North Carolina.
Ayres NM, McClung AM, Larkin PD, Bligh HFJ, Jones CA, Park WD (1997) Microsatellites and a single-nucleotide polymorphism differentiate apparent amylose classes in an extended pedigree of US rice germ plasm. Theoretical and Applied Genetics 94, 773-781.
Cabot EL, Beckenbach AT (1989) Simultaneous Editing of Multiple Nucleic-Acid and Protein Sequences with Esee. Computer Applications in the Biosciences 5, 233-234.
Cardle L, Ramsay L, Milbourne D, Macaulay M, Marshall D, Waugh R (2000) Computational and experimental characterization of physically clustered simple sequence repeats in plants. Genetics 156, 847-854.
Chen X, Cho YG, McCouch SR (2002) Sequence divergence of rice microsatellites in Oryza and other plant species. Molecular Genetics and Genomics 268, 331-343.
Chin ECL, Senior ML, Shu H, Smith JSC (1996) Maize simple repetitive DNA sequences: Abundance and allele variation. Genome 39, 866-873.
Cho YG, Ishii T, Temnykh S, Chen X, Lipovich L, McCouch SR, Park WD, Ayres N, Cartinhour S (2000) Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice (Oryza sativa L.). Theoretical and Applied Genetics 100, 713-722.
Cordeiro GM, Casu R, McIntyre CL, Manners JM, Henry RJ (2001) Microsatellite markers from sugarcane (Saccharum spp.) ESTs cross transferable to erianthus and sorghum. Plant Science 160, 1115-1123.
Decroocq V, Fave MG, Hagen L, Bordenave L, Decroocq S (2003) Development and transferability of apricot and grape microsatellite markers across taxa. Theoretical and Applied Genetics 106, 912-922.
Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh tissue. Phytochemical Bulletin 19, 11-15.
Dresselhaus T, Cordts S, Heuer S, Sauter M, Lorz H, Kranz E (1999) Novel ribosomal genes from maize are differentially expressed in the zygotic and somatic cell cycles. Molecular and General Genetics 261, 416-427.
Elsik CG, Minihan VT, Hall SE, Scarpa AM, Williams CG (2000) Low-copy microsatellite markers for Pinus taeda L. Genome 43, 550-555.
Elsik CG, Williams CG (2001) Low-copy microsatellite recovery from a conifer genome. Theoretical and Applied Genetics 103, 1189-1195.
Fujimori S, Washio T, Higo K, Ohtomo Y, Murakami K, Matsubara K, Kawai J, Carninci P, Hayashizaki Y, Kikuchi S, Tomita M (2003) A novel feature of microsatellites in plants: a distribution gradient along the direction of transcription. FEBS Letters 554, 17-22.
Hackauf B, Wehling P (2002) Identification of microsatellite polymorphisms in an expressed portion of the rye genome. Plant Breeding 121, 17-25.
Hicks M, Adams D, O'Keefe S, Macdonald E, Hodgetts R (1998) The development of RAPD and microsatellite markers in lodgepole pine (Pinus contorta var. latifolia). Genome 41, 797-805.
Karhu A, Dieterich JH, Savolainen O (2000) Rapid expansion of microsatellite sequences in pines. Molecular Biology and Evolution 17, 259-265.
Kinlaw CS, Neale DB (1997) Complex gene families in pine genomes. Trends in Plant Science 2, 356-359.
Koch P (1996) Lodgepole pine in North America. Forest Product Society, Madison WI.
Krupkin AB, Liston A, Strauss SH (1996) Phylogenetic analysis of the hard pines (Pinus subgenus Pinus, Pinaceae) from chloroplast DNA restriction site analysis. American Journal of Botany 83, 489-498.
Kutil BL, Williams CG (2001) Triplet-repeat microsatellite shared among hard and soft pines. Journal of Heredity 92, 327-332.
Little EL, Jr., Critchfield WB (1969) Subdivisions of the Genus Pinus (pines). Miscellaneous Publication 1144.
Oetting WS, Lee HK, Flanders DJ, Wiesner GL, Sellers TA, King RA (1995) Linkage analysis with multiplexed short tandem repeat polymorphisms using infrared fluorescence and M13 tailed primers. Genomics 30, 450-458.
Richard GF, Dujon B (1996) Distribution and variability of trinucleotide repeats in the genome of the yeast Saccharomyces cerevisiae. Gene 174, 165-174.
Rungis D, Berube Y, Zhang J, Ralph S, Ritland CE, Ellis BE, Douglas C, Bohlmann J, Ritland K (2004) Robust simple sequence repeat markers for spruce (Picea spp.) from expressed sequence tags. Theoretical and Applied Genetics 109, 1283-1294.
Scott KD (2001) Microsatellites derived from ESTs, and their comparison with those derived by other methods. In: Plant Genotyping: The DNA Fingerprinting of Plants (ed. Henry RJ), pp. 225-237. CABI Publishing, New York.
Scott KD, Eggler P, Seaton G, Rossetto M, Ablett EM, Lee LS, Henry RJ (2000) Analysis of SSRs derived from grape ESTs. Theoretical and Applied Genetics 100, 723-726.
Scotti I, Magni F, Fink R, Powell W, Binelli G, Hedley PE (2000) Microsatellite repeats are not randomly distributed within Norway spruce (Picea abies K.) expressed sequences. Genome 43, 41-46.
Shepherd M, Cross M, Maguire TL, Dieters MJ, Williams CG, Henry RJ (2002) Transpecific microsatellites for hard pines. Theoretical and Applied Genetics 104, 819-827.
Temnykh S, Park WD, Ayres N, Cartinhour S, Hauck N, Lipovich L, Cho YG, Ishii T, McCouch SR (2000) Mapping and genome organization of microsatellite sequences in rice (Oryza sativa L.). Theoretical and Applied Genetics 100, 697-712.
Thiel T, Michalek W, Varshney RK, Graner A (2003) Exploiting EST databases for the development and characterization of gene-derived SSR-markers in the barley (Hordeum vulgare L.). Theoretical and Applied Genetics 106, 411-422.
Young ET, Sloan JS, Van Riper K (2000) Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics 154, 1053-1068.
Zhou Y, Bui T, Auckland LD, Williams CG (2002) Undermethylated DNA as a source of microsatellites from a conifer genome. Genome 45, 91-99.

CHAPTER 3 GENETIC VARIABILITY OF LODGEPOLE PINE: COMPARISON OF MICROSATELLITE AND AFLP MARKERS

3.1 Introduction

Molecular genetic markers have provided novel approaches to the study of population, conservation and evolutionary genetics of plant populations. Approaches that do not rely on traditional morphological markers include the estimation of genetic diversity and its comparison among species (Hamrick & Godt 1989), progeny array analysis of plant mating systems (Brown & Allard 1970), paternity analysis of natural populations (Ellstrand & Marshall 1986), and the assessment of genetic diversity for conservation core collections (Brown 1989). Isozymes and restriction fragment length polymorphisms (RFLPs) were traditionally used as markers in these applications. However, both techniques offer a relatively limited number of loci as well as low allelic richness.
Newer polymerase chain reaction (PCR)-based markers such as microsatellites (SSRs) and amplified fragment length polymorphisms (AFLPs) are much more informative and are gaining wide use in a multitude of genetic studies. Microsatellites are often regarded as the "marker of choice", as they are co-dominant and highly polymorphic. However, they require a costly development phase, as primers specific to each locus must be designed from prior knowledge of its sequence. This usually limits the number of loci that can be used for a particular species (generally 10 to 20). The second new class of markers, AFLPs, does not require any prior sequence information for primer design and provides an opportunity to study a large number of loci spread randomly throughout the genome. However, AFLPs exhibit dominance: heterozygotes have the same banding phenotype as dominant homozygotes, whereas recessive homozygotes have no band. Because each of these two types of markers has both advantages and disadvantages, two alternative strategies can be used to guide their selection for measuring genetic diversity: (i) sampling a few highly informative markers (microsatellites) or (ii) sampling numerous markers that are randomly distributed within the genome but individually poorly informative (AFLPs) (Mariette et al. 2002a).

Empirically, we can ascertain the best marker to use by examining studies in which both AFLP and SSR markers were used and comparing their results. This gives a qualitative assessment of the outcomes of real genetic and population processes. Congruent results between AFLP and SSR markers have been reported in studies of genetic similarity (e.g., distance) for a number of crop species: soybean (Powell et al. 1996), maize (Pejic et al. 1998), barley (Russell et al. 1997), and coconut (Teulat et al. 2000). However, comparative studies of AFLP and SSR markers at the population level show that the two marker types differ in their inferences about genetic diversity.
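One reason the two marker types can differ in their diversity estimates is the dominance of AFLPs noted above. The Python sketch below contrasts the two cases under stated assumptions. For a dominant locus it uses the common square-root estimator, which takes the frequency of the band-absent allele as the square root of the band-absent phenotype frequency and therefore assumes Hardy-Weinberg equilibrium. It is not necessarily the estimator used in this thesis, it ignores small-sample bias corrections, and the function names and counts are hypothetical.

```python
import math


def dominant_locus_he(n_band_absent, n_total):
    """Expected heterozygosity at one dominant (e.g. AFLP) locus.

    Assumes Hardy-Weinberg equilibrium: band-absent individuals are taken to be
    recessive homozygotes, so q = sqrt(frequency of the band-absent phenotype).
    """
    if n_total <= 0 or not 0 <= n_band_absent <= n_total:
        raise ValueError("counts must satisfy 0 <= n_band_absent <= n_total, n_total > 0")
    q = math.sqrt(n_band_absent / n_total)  # band-absent (null) allele frequency
    p = 1.0 - q                             # band-present allele frequency
    return 2.0 * p * q


def codominant_locus_he(allele_counts):
    """Expected heterozygosity at one co-dominant (e.g. SSR) locus: 1 - sum(p_i^2)."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


# Hypothetical counts: 25 of 100 trees lack the band at an AFLP locus,
# and an SSR locus has four equally frequent alleles among 40 gene copies.
print(dominant_locus_he(25, 100))
print(codominant_locus_he([10, 10, 10, 10]))
```

A biallelic dominant locus cannot exceed an expected heterozygosity of 0.5 under this estimator, whereas a multiallelic co-dominant locus can, which is one way to see why many more AFLP loci than SSR loci may be needed for comparable information.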
Evolutionary factors such as drift and migration could affect genetic diversity either between loci (genomic heterogeneity) or among populations (population heterogeneity), hence producing these discrepancies (Mariette et al. 2002a).

Theoretically, we can use computer simulation to gauge the relative statistical information of these two marker types. Using this approach, Mariette et al. (2002a) showed that, when the number of AFLP loci was at least four times the number of microsatellite loci, AFLP markers were more efficient than microsatellites for estimating genetic diversity when the level of genomic heterogeneity is high, such as under small population size and low gene flow. They also demonstrated that at least 10 times more AFLP loci than SSR loci were required to attain the same efficiency when populations exhibit high gene flow and a low level of genomic heterogeneity.

Lodgepole pine (P. contorta Dougl. ex. Loud. ssp. latifolia Engelm.) is a fast-growing, short-lived and fire-adapted hard pine species. It is the most important commercial pine species in British Columbia and is also an important component of forest ecosystems in western North America (Critchfield 1980). Genetic variability of P. contorta ssp. latifolia has been investigated across geographic ranges using isozymes and RAPDs, revealing that the majority (>90%) of the genetic diversity resides within populations (Dancik & Yeh 1983; Fazekas & Yeh 2001; Wheeler & Guries 1982; Yeh & Layton 1979). Additionally, evidence that population differentiation is related to geographic variation (latitude, longitude, and elevation) has also been reported (Rehfeldt 1988; Yeh et al. 1985; Yeh & Layton 1979).

In this study, 10 natural populations of P. contorta ssp. latifolia were sampled from one of the species' eight seed planning zones (SPZs) in British Columbia (B.C.) (McAuley 1998).
The Prince George SPZ, which represents the species' central distribution in B.C. and has a large regeneration program averaging 24 million seedlings from 1999 to 2003 (Woods 2003), is the focus of the present study. The objectives of this study were to: 1) document levels of genetic variation for SSRs and AFLPs in P. contorta ssp. latifolia in the Prince George SPZ, and 2) test the congruence of SSRs and AFLPs for the estimation of population diversity and differentiation.

3.2 Materials and Methods

3.2.1 Plant materials

Dormant vegetative buds were collected from a random sample of 30 trees per population from 10 natural populations within the Prince George Seed Planning Zone (Table 3.1), kept in a cool container, and shipped to the Faculty of Forestry. The collected data will provide the benchmark for genetic diversity present in natural populations within this SPZ. On arrival at the laboratory, all dormant bud samples were stored at -80°C until DNA extraction.

3.2.2 DNA extraction and procedures of SSR and AFLP

Genomic DNA was isolated from individual vegetative buds using a modification of the CTAB (cetyltrimethyl ammonium bromide) method (Doyle & Doyle 1987). A set of 13 species-transferable microsatellite markers was selected from 1 genomic (G), 1 expressed sequence tag (EST), 6 low-copy (LC), and 5 undermethylated (UM) microsatellites (Liewlaksaneeyanawin et al. 2004): PtTX2123, PtTX2146, PtTX3011, PtTX3034, PtTX3025, PtTX3029, PtTX3052, PtTX3127, PtTX4054, PtTX4046, PtTX4056, PtTX4058, PtTX4139. The SSR procedures and locus nomenclature are described in Liewlaksaneeyanawin et al. (2004).

AFLP analysis, as described by Vos et al. (1995), was carried out according to Remington et al. (1999) with the following modification: 500 ng of genomic DNA was digested with PstI and MseI and ligated with PstI and MseI adaptors (Table 3.2). The restriction-ligation mixture was diluted 1:10 in autoclaved water prior to preamplification.
Preamplification was performed in 10 µl using the selective nucleotides PstI + C and MseI + C (Table 3.2). The reaction was composed of 2.5 µl diluted ligation mixture, 0.6 U Taq polymerase (Roche, Laval, Que.), 15 ng EcoRI primer, 15 ng MseI primer, 1X buffer (10 mM Tris-HCl, 1.5 mM MgCl2, 50 mM KCl, pH 8.3) (Roche, Laval, Que.), and 0.2 mM of each of dATP, dCTP, dGTP, and dTTP. PCR amplifications were carried out using the following profile: an initial cycle of 94°C for 60 sec, then 28 cycles of 94°C for 30 sec, 60°C for 30 sec, and 72°C for 60 sec, followed by a long extension cycle at 72°C for 5 min.

A final amplification was performed using three selective nucleotides of tailed P primers and M primers: PstI + CAG/MseI + CCC, PstI + CAG/MseI + CGG, PstI + CCA/MseI + CCA, PstI + CGA/MseI + CCG (Table 3.2). An M13 tail was added to the selective P primers so that an M13-labelled primer could be used on a LiCor automated sequencer. The reaction was carried out in 10 µl with 2.5 µl of a 1:40 dilution of the preamplification products, 0.6 U Taq polymerase (Roche, Laval, Que.), 2.52 ng of tailed P primer and M primer, 0.3 pmole of M13 Infrared Label (LiCor Inc.), 1X buffer (10 mM Tris-HCl, 1.5 mM MgCl2, 50 mM KCl, pH 8.3) (Roche, Laval, Que.), and 0.2 mM of each of dATP, dCTP, dGTP, and dTTP. Selective amplifications were carried out with the following parameters: an initial cycle of 94°C for 60 sec; three cycles of 94°C for 30 sec, 65°C for 30 sec, and 72°C for 60 sec; 12 cycles with the annealing temperature decreasing 0.7°C per cycle; 22 cycles of 94°C for 30 sec, 56°C for 30 sec, and 72°C for 60 sec; and then a long extension cycle at 72°C for 5 min.

3.2.3 Detection of SSRs and AFLPs

Microsatellite products were detected by M13-tailed primer (Oetting et al. 1995) or infrared dye (IRD)-labeled primer. AFLP products were also detected by M13-tailed primer.
The amplification products were electrophoresed on 5.5% Long Ranger polyacrylamide gels using a LiCor 4200 automated sequencer (LiCor Inc., Lincoln, NE). IRD-labeled molecular weight markers of 50-350 bp and 50-700 bp (LiCor Inc.) were loaded in at least two lanes as standards for microsatellites and AFLPs, respectively. Microsatellite alleles were scored according to their molecular weight, and AFLP fragments were scored as present (+) or absent (-) within the size range of 50-700 base pairs using SAGA™ GT/MX (LiCor Inc.).

3.2.4 Data analyses

3.2.4.1 Genetic diversity analysis

Genetic polymorphisms of microsatellites were estimated using FSTAT 2.9.3.2 (Goudet 2002). Allele frequencies, observed (Ho) and expected (HE) heterozygosities, and the number of alleles per locus (Ao) were estimated for each population. Deviations from Hardy-Weinberg equilibrium (HWE), based on the exact test and the Markov chain method with 1,000 iterations, were analyzed using GENEPOP 3.3 (Raymond & Rousset 1995b).

AFLP variation was analyzed using AFLP-SURV 1.0 (Vekemans et al. 2002). For each population, allele frequencies at AFLP loci were calculated using the Bayesian method with a nonuniform prior distribution of allele frequencies proposed by Zhivotovsky (1999), assuming some deviation from Hardy-Weinberg equilibrium as estimated from SSR data (Fis = 0.09) in all calculations; this is incorporated in the program. The percentage of polymorphic loci at the 5% level (PPL) was calculated, and unbiased expected heterozygosity (Hj, analogous to HE) was estimated following the method described in Lynch & Milligan (1994).

3.2.4.2 Genetic differentiation analysis

Genetic differentiation for SSR and AFLP markers was examined using FSTAT 2.9.3.2 (Goudet 2002) and AFLP-SURV 1.0 (Vekemans et al. 2002), respectively. For both types of markers, population differentiation was measured using F-statistics (FST) (Weir & Cockerham 1984).
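As a rough illustration of the within-population statistics described in Section 3.2.4.1, the sketch below computes observed heterozygosity and a small-sample-corrected expected heterozygosity for one co-dominant (SSR) locus, and the frequency of the band-absent allele at one dominant (AFLP) locus for a given Fis. It is not the FSTAT or AFLP-SURV implementation: the function names are hypothetical, and the AFLP part uses a simple closed-form estimator in place of the Bayesian method of Zhivotovsky (1999).

```python
from collections import Counter
from math import sqrt


def ssr_heterozygosity(genotypes):
    """Return (Ho, He) for one co-dominant locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid tree.
    He = 2n/(2n-1) * (1 - sum(p_i^2)), a small-sample-corrected estimate.
    """
    n = len(genotypes)
    if n < 1:
        raise ValueError("need at least one genotype")
    # Observed heterozygosity: fraction of trees carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies from the 2n sampled gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    return ho, he


def null_allele_frequency(band_absent_fraction, fis=0.0):
    """Frequency q of the recessive (band-absent) allele at a dominant locus.

    Solves x = q^2 + fis * q * (1 - q) for q, where x is the fraction of
    band-absent individuals. With fis = 0 this reduces to q = sqrt(x).
    """
    x = band_absent_fraction
    if not 0.0 <= x <= 1.0:
        raise ValueError("band_absent_fraction must lie in [0, 1]")
    if not 0.0 <= fis < 1.0:
        raise ValueError("fis must lie in [0, 1)")
    a = 1.0 - fis
    return (-fis + sqrt(fis * fis + 4.0 * a * x)) / (2.0 * a)


if __name__ == "__main__":
    # Toy data only; these are not genotypes from the study.
    print(ssr_heterozygosity([(1, 1), (1, 2), (2, 2), (1, 2)]))
    print(null_allele_frequency(0.25, fis=0.09))
```

Because heterozygotes and dominant homozygotes share the same band phenotype, only the band-absent class is informative about q, which is why an external Fis value (here taken from the SSR data, as in the text) has to be supplied.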
The FST (θ) was calculated over all populations, and jackknifing over loci provided standard errors. The significance of the genetic differentiation was obtained by 1,000 random permutations of individuals among populations. The significance of FST for each microsatellite locus was also assessed by exact tests (Raymond & Rousset 1995b). The standardized RST value, under the stepwise model of microsatellite mutation, was also calculated for SSR markers using RST Calc (Goodman 1997). RST was calculated with allele sizes transformed to standard deviations from the global mean of repeat unit numbers, to account for data containing loci with significantly different variances (Goodman 1997). Significance of RST values across all loci was calculated by permutation tests. An analysis of molecular variance (AMOVA; Excoffier et al. 1992) was also performed for the SSRs and AFLPs using ARLEQUIN (Schneider et al. 2000).

To examine the genetic relationship among all natural populations, for microsatellites, Nei's genetic distance (DS; Nei 1978), the (δμ)² estimator (Goldstein et al. 1995b), and the Cavalli-Sforza and Edwards chord distance (DC; Cavalli-Sforza & Edwards 1967) were used to construct phylogenetic trees. DS assumes an infinite allele model, while (δμ)² assumes a stepwise mutation model. DC is based on geometric distances rather than on a mutational model (Cavalli-Sforza & Edwards 1967). According to Takezaki & Nei (1996), DC provides a better estimate of genetic divergence for microsatellite analysis compared to measures based on the stepwise mutation model. DS and (δμ)² were calculated between populations using MICROSAT 1.5d (Minch et al. 1996), and DC was calculated between populations using PHYLIP 3.62 (Felsenstein 2004). For AFLPs, Nei's (1987) genetic distance was calculated between populations using AFLP-SURV 1.0 (Vekemans et al. 2002).
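The three microsatellite distances can be sketched as follows. This is a minimal illustration under stated assumptions, not the MICROSAT or PHYLIP code: the function names are hypothetical, each population is a list of per-locus dictionaries mapping allele to frequency, Nei's distance is given in its standard form without the small-sample correction of Nei (1978), and allele labels for (δμ)² are assumed to be sizes in repeat units.

```python
from math import log, pi, sqrt


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard distance D = -ln(Jxy / sqrt(Jx * Jy)), summed over loci."""
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    return -log(jxy / sqrt(jx * jy))


def chord_distance(pop_x, pop_y):
    """Cavalli-Sforza & Edwards chord distance, averaged over loci."""
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        shared = set(fx) & set(fy)
        cos_theta = sum(sqrt(fx[a] * fy[a]) for a in shared)
        # max() guards against tiny negative values from rounding.
        total += sqrt(2.0 * max(0.0, 1.0 - cos_theta))
    return 2.0 * total / (pi * len(pop_x))


def delta_mu_squared(pop_x, pop_y):
    """(delta mu)^2: squared difference in mean allele size, averaged over loci."""
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        mean_x = sum(size * freq for size, freq in fx.items())
        mean_y = sum(size * freq for size, freq in fy.items())
        total += (mean_x - mean_y) ** 2
    return total / len(pop_x)


if __name__ == "__main__":
    # Toy single-locus frequencies; not data from the study.
    x = [{10: 0.5, 12: 0.5}]
    y = [{10: 1.0}]
    print(nei_standard_distance(x, y), chord_distance(x, y), delta_mu_squared(x, y))
```

Only (δμ)² uses the allele sizes themselves, which is what ties it to the stepwise mutation model; the other two treat alleles as unordered categories.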
Based on the four distance measures, unrooted neighbour-joining trees with 1,000 bootstrap replicates over all loci were constructed using PHYLIP 3.62 (Felsenstein 2004) and visualized using TREEVIEW (Page 1996).

Estimates of population diversity (Ao, Ht, and PPL) generated from SSR and AFLP data were compared using Spearman's rank correlations (SAS version 9). The correlation between SSR and AFLP genetic distance matrices was also investigated using the Mantel test of matrix correspondence (Mantel 1967a).

3.3 Results

3.3.1 Genetic variability within populations

For the SSR loci, Table 3.3 gives locus-specific properties, averaged over populations. A total of 195 alleles were detected from the 13 microsatellite loci, ranging from 6 (PtTX2123) to 24 (PtTX3011) per locus. The expected heterozygosity ranged from 0.472 (PtTX3052) to 0.921 (PtTX3011). Significant deviations from Hardy-Weinberg equilibrium were observed at a number of loci, possibly indicating the effects of selection acting on a gene associated with the studied loci or the presence of null alleles. Of the 13 studied loci, 12 were selected from EST, low-copy, and undermethylated microsatellites and were possibly under strong selection. Null alleles seemed to be a likely cause of the HWE deviation at locus PtTX4046. This locus showed strong homozygote excess and did not amplify SSR products in many samples. Locus PtTX4046 was therefore omitted from all further analyses.

Table 3.4 gives measures of variability within populations for both SSRs and AFLPs. At the population level, the 12 microsatellites showed a high level of polymorphism in all populations, with an average total number of alleles over all loci of 125 (range: 120-134). The mean number of alleles per locus was 10.45, ranging from 10 (Purden) to 11.17 (Clear Lake), with 19 private alleles over the ten populations. The allelic distribution for each of the 12 microsatellite loci is shown in Figure 3.1.
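The Mantel test named in the methods compares two distance matrices by permuting the population labels of one of them. The following is a minimal one-sided sketch, assuming symmetric square matrices given as nested lists; the function names, the number of permutations and the seed are illustrative choices, not settings taken from this study.

```python
import random
from math import sqrt


def _pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def _upper(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, permutations=999, seed=0):
    """Return (r, p) for the correlation between distance matrices a and b.

    p is the one-sided probability of a permuted correlation at least as
    large as the observed one; the observed value counts as one permutation.
    """
    identity = list(range(len(a)))
    a_vals = _upper(a, identity)
    r_obs = _pearson(a_vals, _upper(b, identity))
    rng = random.Random(seed)
    hits = 1
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in matrix b only
        if _pearson(a_vals, _upper(b, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / (permutations + 1)


if __name__ == "__main__":
    # Toy matrices from four points on a line; not data from the study.
    pts = [0.0, 1.0, 3.0, 7.0]
    a = [[abs(p - q) for q in pts] for p in pts]
    b = [[2.0 * abs(p - q) for q in pts] for p in pts]
    print(mantel(a, b, permutations=199, seed=1))
```

Rows and columns are permuted together because the entries of a distance matrix are not independent, so an ordinary significance test on the correlation of the pairwise values would overstate the evidence.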
The average expected heterozygosity over the 12 loci was 0.776, ranging from 0.723 (Foothills) to 0.793 (Clear Lake). The four AFLP primer combinations produced 187 scorable loci that showed considerable variability, with an average percentage of polymorphic loci of 71.33% per population (range: 62 - 77) and an average expected heterozygosity of 0.295 (range: 0.271 - 0.332).

Table 3.5 shows the extent to which these diversity measures are correlated among populations. For SSRs, Ao and HE showed a low correlation (r = 0.345, ns), but for AFLPs, PPL and Hj showed a very high correlation (r = 0.966, P < 0.0001). Comparisons between SSRs and AFLPs produced very low correlations between diversity measures, ranging from -0.037 to 0.067.

3.3.2 Genetic variability among populations

Table 3.6 shows that the genetic differentiation among the 10 populations was small but significant for SSRs: mean FST = 0.005 (± 0.002; P < 0.001) and mean RST = 0.006 (± 0.002; P = 0.02). For AFLPs, mean FST = 0.02 (P < 0.001). Interestingly, for SSRs, for which there are two alternative measures of differentiation, the mean FST was lower than the mean RST, but FST showed much higher statistical significance.

Table 3.7 gives estimates of pairwise FST for both the SSR and AFLP markers (RST for SSRs was omitted because of its greater variance). There were 45 pairs of populations. For SSRs, no significant positive values were found except for four pairs: Foothills - Clear Lake, Foothills - Gregg Creek, Foothills - Bowron, and Bowron - Kenneth Creek. Similarly, for AFLPs a significant value was found between Clear Lake and Kenneth Creek. Interestingly, a Mantel test showed no significant correlation between the SSR and AFLP pairwise distances (r = -0.06, P = 0.36); there was essentially no association.

The results of the analysis of molecular variance, given in Table 3.8, were similar for both SSRs and AFLPs, with 99.4% vs.
97.7% of the genetic variance accounted for by variation within populations.

The genetic relationships among the 10 natural populations constructed by neighbour-joining using SSRs and AFLPs are shown in Figure 3.2. Neighbour-joining trees constructed from SSRs based on Nei's (Fig. 3.2a) and (δμ)² (Fig. 3.2b) distances revealed no concordant patterns among populations. Discrepancies were also found between the two markers. The neighbour-joining tree based on AFLPs was the best supported, with bootstrap values ranging from 38 to 95%.

3.4 Discussion

3.4.1 Level of diversity within populations

P. contorta ssp. latifolia possesses high within-population diversity and low population differentiation. In general, the levels of genetic diversity from SSRs and AFLPs that we found for lodgepole pine are comparable to those of other long-lived perennial, outcrossing, wind-pollinated, and widely distributed species (Nybom 2004).

The average expected heterozygosity (HE) of 0.776 reported in this study is similar to that of a previous study on the same species using microsatellite markers (Thomas et al. 1999). This value is also comparable with those reported for other pine species. Average HE values of 0.610 and 0.734 were reported for P. taeda (Williams et al. 2000) and P. pinaster (Mariette et al. 2001), respectively. These values are quite concordant given the known problems of "ascertainment bias" in molecular markers, where the strategy of marker development in candidate panels affects the levels of polymorphism detected in subsequent population surveys.

We found higher HE values for SSRs than for AFLPs (Table 3.4). However, HE cannot properly be compared between SSRs and AFLPs, as SSRs have much higher mutation rates and have been selectively screened for higher levels of polymorphism. In general, SSRs show 1.7 to 4.6 times higher HE than AFLPs (Nybom 2004).
3.4.2 Genetic differentiation

The population differentiation statistics estimated in this study for both SSRs and AFLPs are lower than previously reported statistics based upon isozymes and RAPDs. This is probably due to the limited sampling range in our study. For isozymes, other researchers found GST values of 0.041 (Yeh & Layton 1979) and 0.060 (Wheeler & Guries 1982). For RAPDs, researchers have found FST values of 0.122 (Fazekas & Yeh 2001) and 0.162 (Ye et al. 2002), but over the entire range (Table 3.9). For the purposes of our study, we limited our sampling to the Prince George breeding zone, which is the pool of natural populations from which the breeding population is derived. As such, estimates of differentiation presented here should be lower than those from a range-wide comparison.

It should be noted that the small among-population genetic differentiation observed in the present study is a general feature of most temperate and boreal coniferous tree species (Yeh 1989). The high level of genetic variation in conifers has been attributed to their ecological amplitude, large effective population size, high outcrossing rates, and potential for long-distance pollen and seed dispersal.

3.4.3 Comparison of results from SSR and AFLP markers

The lack of correlation between measures of diversity and distance among natural populations observed in this study differs from that reported in other studies (Table 3.10). Several plant species showed significant correlations among populations between measures from SSRs vs. AFLPs (Table 3.10). Mariette et al. (2002a) suggested that low correlation of estimates between marker types is more likely due to low differentiation of populations, with the consequent statistical error of estimates obscuring any correlation between the alternative marker types.
Low differentiation is especially likely in temperate forest tree species, which have large population sizes and high gene flow, a population structure previously observed for lodgepole pine (Yang & Yeh 1995). An alternative hypothesis is that random "actual" variation of diversity occurs among populations at this low level of differentiation. Even with both classes of markers (SSRs and AFLPs) being highly informative, the actual variation of diversity obscures the concordance of SSRs with AFLPs. Also, Gaudeul et al. (2004) and Mariette et al. (2001) suggested that at higher levels of comparison, such as between regions or taxa, the correlation of diversity estimates between SSRs and AFLPs should become more pronounced.

In our study, we found estimates of FST to be lower for SSRs than for AFLPs. Other plant studies have found similarly lower levels of FST calculated from SSRs compared to AFLPs (see Table 3.9 for a listing of various studies conducted on lodgepole pine). This is likely due to the high mutation rates and high heterozygosity of SSR loci. The level of differentiation and genetic distance between populations is greatly influenced by the level of within-population heterozygosity, because differentiation indices such as FST are calculated as ratios of among-population to total genetic variance (Weir & Cockerham 1984). High levels of polymorphism and high mutation rates of SSR loci may also lead to significant size homoplasy, thus potentially reducing the statistical power to define genetic relationships between populations. As observed in this study, AFLPs tended to show a better resolution of genetic relationships than microsatellites. As well, other studies have reported more discrete clusters from AFLPs compared to SSRs (Table 3.10).

In summary, we found low levels of genetic variation among populations, and high levels of variation within populations, for lodgepole pine occurring in the Prince George region of British Columbia.
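The influence of within-population heterozygosity on FST, discussed above, can be made explicit with Nei's heterozygosity-based form of the index. This is used here only as an illustrative stand-in for the Weir & Cockerham estimator; H_S and H_T denote the mean within-population and the total expected heterozygosity, symbols that are introduced for this sketch.

```latex
F_{ST} \;=\; \frac{H_T - H_S}{H_T} \;=\; 1 - \frac{H_S}{H_T} \;\le\; 1 - H_S
\qquad \text{(because } H_T \le 1\text{)}
```

Taking the mean values in Table 3.4 as rough stand-ins for H_S, the ceiling is 1 - 0.776 = 0.224 for SSRs and 1 - 0.295 = 0.705 for AFLPs, so the same underlying population structure can yield a smaller FST from the more heterozygous marker.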
These estimates of genetic variation will provide a benchmark for comparing diversity between wild and domesticated populations (material used in the ongoing Ministry of Forests breeding program). The distributions of alleles among populations are also consistent with low population differentiation (Fig. 3.1), as only nineteen of the 187 alleles from 12 SSR loci are rare/private alleles occurring in only one of the 10 populations. These alleles are expected to be the first to be lost along the domestication process. Changes in diversity from natural to domesticated populations, and their implications for improving the management of the tree breeding program in the Prince George seed planning zone, are documented in the next chapter.

The other conclusion of our study is that we detected little congruence between statistics computed from SSR data vs. AFLP data. The discrepancies can be due to differences in: (1) the number of markers and their coverage of the whole genome, (2) the relative mutation rates, and (3) the level of population heterogeneity in diversity levels. Microsatellites are often regarded as the "marker of choice" and are preferable to AFLPs for fine-scale population genetic studies. However, AFLPs are probably more reliable and efficient than SSRs for differentiating between closely related populations, due to their broad coverage of the genome and lower expense. Indeed, in our study they gave more significant estimates for several of the measures of both diversity and differentiation. Although this is a function of the numbers of SSR vs. AFLP loci, in this study roughly comparable effort was put into each type of marker.

Table 3.1 Location of the 10 P. contorta natural populations from Prince George, BC

Location        Latitude    Longitude    Elevation (m)
Bowron          54 00 95    121 48 49    637
Kenneth Creek   53 55 70    121 47 56    722
Purden          53 53 28    121 58 28    726
Buckhorn        53 46 29    122 39 60    660
Domano          53 49 92    122 45 49    602
West Lake       53 49 21    122 50 46    762
Scout Camp      53 43 32    122 52 95    751
Clear Lake      53 38 53    122 56 08    866
Gregg Creek     53 49 01    123 05 34    718
Foothills       53 56 97    122 46 49    658

Table 3.2 Sequences of primers and adaptors used for amplified fragment length polymorphism analysis

Primer name               Sequence 5' -> 3'
Adapters
  PstI-adapter forward    AAC GAC GAC TGC GTA CAT GCA
  PstI-adapter reverse    TGT ACG CAG TCG TC
  MseI-adapter forward    GAC GAT GAG TCC TGA G
  MseI-adapter reverse    TAC TCA GGA CTC AT
Pre-amplification primers
  PstI + C                GAC TGC GTA CAT GCA GC
  MseI + C                GAT GAG TCC TGA GTA AC
AFLP PstI primers
  PstI + CAG              GAC TGC GTA CAT GCA GCA G
  PstI + CCA              GAC TGC GTA CAT GCA GCC A
  PstI + CGA              GAC TGC GTA CAT GCA GCG A
AFLP MseI primers
  MseI + CCA              GAT GAG TCC TGA GTA ACC A
  MseI + CCC              GAT GAG TCC TGA GTA ACC C
  MseI + CGG              GAT GAG TCC TGA GTA ACG G
  MseI + CCG              GAT GAG TCC TGA GTA ACC G

Table 3.3 Estimates of genetic diversity for each microsatellite locus over the 10 P. contorta natural populations

Locus       A     Ho      HE      Fis (a)
PtTX3052    11    0.477   0.472   -0.010
PtTX3127    11    0.639   0.695    0.081
PtTX4054    18    0.882   0.895    0.014
PtTX4139    18    0.652   0.860    0.242**
PtTX3025     9    0.607   0.660    0.081
PtTX2123     6    0.620   0.614   -0.010
PtTX4058    21    0.790   0.890    0.113**
PtTX3034    14    0.710   0.832    0.147**
PtTX2146    21    0.773   0.829    0.067
PtTX4056    16    0.750   0.830    0.097*
PtTX3011    24    0.747   0.921    0.189**
PtTX4046     8    0.401   0.773    0.482**
PtTX3029    18    0.840   0.867    0.031

A = Number of alleles, Ho = Observed heterozygosity, HE = Expected heterozygosity, Fis = Fixation index.
(a) Loci showing significant deviations from Hardy-Weinberg equilibrium. * P<0.05, ** P<0.01.
Table 3.4 Genetic variation at SSR and AFLP loci for the 10 P. contorta populations

                 Microsatellites (SSR)                  AFLP
Population       AT     Ao      AP    Ho      HE        PPL     Hj
Clear Lake       134    11.17   3     0.728   0.793     77.00   0.322
Scout Camp       132    11.00   3     0.696   0.775     65.24   0.271
Gregg Creek      124    10.33   2     0.736   0.788     83.42   0.324
Purden           120    10.00   2     0.689   0.771     75.40   0.309
Bowron           121    10.08   3     0.749   0.790     62.00   0.272
West Lake        125    10.42   2     0.755   0.780     70.58   0.289
Foothills        123    10.25   2     0.645   0.723     75.93   0.312
Kenneth Creek    124    10.33   0     0.692   0.791     68.98   0.284
Buckhorn         124    10.33   1     0.674   0.775     66.30   0.272
Domano           127    10.58   1     0.710   0.775     68.40   0.285
Mean             125    10.45   1.9   0.707   0.776     71.33   0.295

AT = Total number of alleles, Ao = Average number of alleles, AP = Number of private alleles, Ho = Observed heterozygosity, HE = Expected heterozygosity, PPL = Proportion of polymorphic loci at 5% level, Hj = same as HE for AFLP markers.

Table 3.5 Spearman rank correlation coefficients (r) showing the association of different diversity measures among populations

Comparison                      r        P
Within marker comparisons
  SSR (Ao) vs. (HE)             0.354    0.316
  AFLP (PPL) vs. (Hj)           0.966    0.000
Between marker comparisons
  SSR (HE) vs. AFLP (Hj)        0.067    0.853
  SSR (Ao) vs. AFLP (Hj)       -0.037    0.919
  SSR (Ao) vs. AFLP (PPL)       0.031    0.933
  SSR (HE) vs. AFLP (PPL)       0.055    0.879

Table 3.6 Estimates of genetic diversity and population structure for each microsatellite locus over the 10 P. contorta natural populations

Locus       FST         RST
PtTX3052    0.004        0.004
PtTX3127    0.001       -0.001
PtTX4054    0.005*       0.021
PtTX4139    0.002        0.017
PtTX3025    0.007**      0.005
PtTX2123    0.028*       0.003
PtTX4058    0.001        0.007
PtTX3034    0.003        0.010
PtTX2146    0.006**      0.008
PtTX4056    0.004**     -0.005
PtTX3011    0.002        0.010
PtTX3029    0.001       -0.004
All         0.005***     0.006*

*P<0.05, ***P<0.001.

Table 3.7 Pairwise comparison matrix of FST estimates for SSRs (below diagonal) and for AFLPs (above diagonal) from 10 natural populations of P. contorta ssp. latifolia

      CL        SC        GC        P         Bo        WL        F         KC        Bu        D
CL              0.0216    0.0030    0.0135    0.0418    0.0171    0.0267    0.0325    0.0314    0.0261
SC   -0.0002              0.0366    0.0389    0.0326    0.0185    0.0302    0.0334    0.0304    0.0176
GC    0.0045    0.0085              0.0000    0.0423    0.0253    0.0125    0.0173    0.0307    0.0316
P     0.0030    0.0032    0.0028              0.0460    0.0240    0.0262    0.0282    0.0386    0.0316
Bo    0.0079    0.0084   -0.0020    0.0040              0.0081    0.0217    0.0132    0.0179    0.0069
WL    0.0030    0.0002    0.0007   -0.0008    0.0022              0.0150    0.0075    0.0060    0.0000
F     0.0164    0.0093    0.0192    0.0115    0.0216    0.0100              0.0000    0.0274    0.0239
KC    0.0063    0.0043    0.0051    0.0043    0.0044   -0.0031    0.0113              0.0129    0.0105
Bu    0.0009   -0.0018    0.0000    0.0011   -0.0018   -0.0017    0.0053    0.0014              0.0009
D     0.0081    0.0071    0.0045    0.0040    0.0048    0.0032    0.0115    0.0006    0.0010

SSR significance tested by permutations using FSTAT (Goudet, 2002); tests for significance of population differentiation for AFLPs using exact test in TFPGA (Miller, 2000). Significant pairwise difference between populations is indicated in bold letters.

Table 3.8 Analysis of molecular variance results based on the number of different alleles (FST) from the 10 natural populations of P. contorta ssp. latifolia with SSRs and AFLPs

Source of variation      d.f.   Sum of squares   Variance components   Percentage of variation   FST
SSRs
  Among populations        9        56.96             0.029                  0.61                0.0061***
  Within populations     590      2724.37             4.618                 99.39
  Total                  599      2781.33             4.647
AFLPs
  Among populations        9       305.78             0.468                  2.3                 0.023***
  Within populations     290      5775.06            19.914                 97.7
  Total                  299      6080.85            20.382

***P < 0.001

Table 3.9 Genetic variability in P. contorta ssp. latifolia from isozyme and DNA marker analyses

No. of populations   Method    No. of loci   PPL     Ao      HE      FST/GST   References
 9                   Isozyme   25            58.7    1.90    0.160   0.041     Yeh & Layton (1979)
24                   Isozyme   42            69.0    1.86    0.118   0.060     Wheeler & Guries (1982)
 5                   Isozyme   21            51.4    2.50    0.180   0.018     Dancik & Yeh (1983)
35                   Isozyme   21            90.5    NA      0.194   NA        Yang & Yeh (1995)
23                   RAPD      39            52.4    -       0.143   0.162     Ye et al. (2002)
15                   RAPD      52            63.4    -       0.160   0.122     Fazekas & Yeh (2001)
10                   RAPD      10            NA      -       0.430   0.061     Thomas et al. (1999)
 8                   SSR        5            100     21.0    0.730   0.028     Thomas et al. (1999)

PPL = Proportion of polymorphic loci at 5% level, Ao = Average number of alleles, HE = Expected heterozygosity, NA = not available

Table 3.10 Comparative analysis of genetic diversity at population (P), region (R), and taxa (T) level as well as population differentiation in natural plant populations using SSR and AFLP markers (adapted from Woodhead et al. 2005)

                               No. of loci    Correlation of diversity estimates    FST               Correlation of pairwise   Population/taxon
Species                        SSR    AFLP    P       R       T                     SSR      AFLP     genetic distance          differentiation
Pinus pinaster [1]              3     122     No      Yes     NA                    0.111    0.102    NA                        NA
Quercus petraea [2]             6     155     No      NA      Yes                   0.023    0.118    NA                        NA
Quercus robur [2]                             No      NA      Yes                   0.114    0.020    NA                        NA
Avicennia marina [3]            3     918     NA      NA      NA                    0.547    0.628    0.628                     AFLP>SSR
Eryngium alpinum [4]            7      63     No      NA      NA                    0.230    0.420    0.500                     AFLP>SSR
Malus sylvestris [5]           12     126     No      NA      NA                    0.097    0.140    0.872                     AFLP>SSR
Athyrium distentifolium [6]    18     265     Some    Some    NA                    0.349    0.496    0.985                     AFLP>SSR
Pinus contorta [7]             12     187     No      NA      NA                    0.004    0.020    -0.060                    AFLP>SSR

References: [1] Mariette et al. (2001), [2] Mariette et al. (2002b), [3] Maguire et al. (2002), [4] Gaudeul et al. (2004), [5] Coart et al. (2003), [6] Woodhead et al. (2005), and [7] this study. NA = not available

Figure 3.1 Allelic distribution for the 12 microsatellite loci over the studied 10 P. contorta ssp. latifolia populations (horizontal axis: allele size, bp)

Figure 3.2 Dendrograms generated by neighbour-joining showing the genetic relationships among 10 natural populations. Based on (a) SSR data and Nei's genetic distance; (b) SSR data and (δμ)² distance; (c) SSR data and DC distance; (d) AFLP data and Nei's genetic distance.
The numbers on the branches are the percentage support of 1,000 bootstrap replications.

3.5 References

Brown AHD (1989) Core collections - A practical approach to genetic-resources management. Genome 31, 818-824.
Brown AHD, Allard RW (1970) Estimation of mating system in open-pollinated maize populations using isozyme polymorphisms. Genetics 66, 133-145.
Cavalli-Sforza LL, Edwards AWF (1967) Phylogenetic analysis models and estimation process. American Journal of Human Genetics 19, 233-257.
Coart E, Vekemans X, Smulders MJM, Wagner I, Van Huylenbroeck J, Van Bockstaele E, Roldan-Ruiz I (2003) Genetic variation in the endangered wild apple (Malus sylvestris (L.) Mill.) in Belgium as revealed by amplified fragment length polymorphism and microsatellite markers. Molecular Ecology 12, 845-857.
Critchfield WB (1980) Genetics of lodgepole pine. USDA Forest Service Research Paper, WO-37.
Dancik BP, Yeh FC (1983) Allozyme variability and evolution of lodgepole pine (Pinus contorta var. latifolia) and jack pine (P. banksiana) in Alberta. Canadian Journal of Genetics and Cytology 25, 57-64.
Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh tissue. Phytochemical Bulletin 19, 11-15.
Ellstrand NC, Marshall DL (1986) Patterns of multiple paternity in populations of Raphanus sativus. Evolution 40, 837-842.
Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial-DNA restriction data. Genetics 131, 479-491.
Fazekas AJ, Yeh FC (2001) Random amplified polymorphic DNA diversity of marginal and central populations in Pinus contorta subsp. latifolia. Genome 44, 13-22.
Felsenstein J (2004) PHYLIP (Phylogeny Inference Package) version 3.62. Distributed by the author, http://evolution.RS.washington.edu/phylip.html. Department of Genome Sciences, University of Washington, Seattle.
Gaudeul M, Till-Bottraud I, Barjon F, Manel S (2004) Genetic diversity and differentiation in Eryngium alpinum L. (Apiaceae): comparison of AFLP and microsatellite markers. Heredity 92, 508-518.
Goldstein DB, Linares AR, Cavalli-Sforza LL, Feldman MW (1995b) Genetic absolute dating based on microsatellites and the origin of modern humans. Proceedings of the National Academy of Sciences of the United States of America 92, 6723-6727.
Goodman SJ (1997) RST Calc: a collection of computer programs for calculating estimates of genetic differentiation from microsatellite data and determining their significance for microsatellite data. Molecular Ecology 6, 881-885.
Goudet J (2002) FSTAT 2.9.3.2, A program to estimate and test gene diversities and fixation indices. Available from http://www2.unil.ch/izea/softwares/fstat.html.
Hamrick JL, Godt MJ (1989) Allozyme diversity in plant species. In: Plant Population Genetics, Breeding and Germplasm Resources (eds. Brown AHD, Clegg MT, Kahler AL, Weir BS), pp. 43-63. Sinauer Associates, Inc., Sunderland, Massachusetts.
Liewlaksaneeyanawin C, Ritland CE, El-Kassaby YA, Ritland K (2004) Single-copy, species-transferable microsatellite markers developed from loblolly pine ESTs. Theoretical and Applied Genetics 109, 361-369.
Lynch M, Milligan BG (1994) Analysis of population genetic structure with RAPD markers. Molecular Ecology 3, 91-99.
Maguire TL, Peakall R, Saenger P (2002) Comparative analysis of genetic diversity in the mangrove species Avicennia marina (Forsk.) Vierh. (Avicenniaceae) detected by AFLPs and SSRs. Theoretical and Applied Genetics 104, 388-398.
Mantel N (1967) Detection of disease clustering and a generalized regression approach. Cancer Research 27, 209-220.
Mariette S, Chagne D, Lezier C, Pastuszka P, Raffin A, Plomion C, Kremer A (2001) Genetic diversity within and among Pinus pinaster populations: comparison between AFLP and microsatellite markers. Heredity 86, 469-479.
Mariette S, Cottrell J, Csaikl UM, Goikoechea P, Konig A, Lowe AJ, Van Dam BC, Barreneche T, Bodenes C, Streiff R, Burg K, Groppe K, Munro RC, Tabbener H, Kremer A (2002b) Comparison of levels of genetic diversity detected with AFLP and microsatellite markers within and among mixed Q. petraea (MATT.) LIEBL. and Q. robur L. stands. Silvae Genetica 51, 72-79.
Mariette S, Le Corre V, Austerlitz F, Kremer A (2002a) Sampling within the genome for measuring within-population diversity: trade-offs between markers. Molecular Ecology 11, 1145-1156.
McAuley L (1998) Interior SPZ Review Report, p. 39. Tree Improvement Program, Ministry of Forests, BC.
Minch E, Ruiz-Linares A, Goldstein DB, Feldman MW, Cavalli-Sforza LL (1996) MICROSAT Version 1.5. A computer program for calculating various statistics on microsatellite allele data, Stanford University Medical Centre, Stanford, CA.
Nybom H (2004) Comparison of different nuclear DNA markers for estimating intraspecific genetic diversity in plants. Molecular Ecology 13, 1143-1155.
Oetting WS, Lee HK, Flanders DJ, Wiesner GL, Sellers TA, King RA (1995) Linkage analysis with multiplexed short tandem repeat polymorphisms using infrared fluorescence and M13 tailed primers. Genomics 30, 450-458.
Page RDM (1996) TREEVIEW: an application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences 12, 357-358.
Pejic I, Ajmone-Marsan P, Morgante M, Kozumplick V, Castiglioni P, Taramino G, Motto M (1998) Comparative analysis of genetic similarity among maize inbred lines detected by RFLPs, RAPDs, SSRs, and AFLPs. Theoretical and Applied Genetics 97, 1248-1255.
Powell W, Morgante M, Andre C, Hanafey M, Vogel J, Tingey S, Rafalski A (1996) The comparison of RFLP, RAPD, AFLP and SSR (microsatellite) markers for germplasm analysis. Molecular Breeding 2, 225-238.
Raymond M, Rousset F (1995) Genepop (Version-1.2): Population genetics software for exact tests and ecumenicism.
Journal of Heredity 86, 248-249.
Rehfeldt GE (1988) Ecological genetics of Pinus contorta from the Rocky Mountains (USA). Silvae Genetica 37, 131-135.
Remington DL, Whetten RW, Liu BH, O'Malley DM (1999) Construction of an AFLP genetic map with nearly complete genome coverage in Pinus taeda. Theoretical and Applied Genetics 98, 1279-1292.
Russell JR, Fuller JD, Macaulay M, Hatz BG, Jahoor A, Powell W, Waugh R (1997) Direct comparison of levels of genetic variation among barley accessions detected by RFLPs, AFLPs, SSRs and RAPDs. Theoretical and Applied Genetics 95, 714-722.
Schneider S, Roessli D, Excoffier L (2000) Arlequin ver 2.000: A software for population genetic analysis, Genetics and Biometry Laboratory, University of Geneva, Switzerland.
Takezaki N, Nei M (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144, 389-399.
Teulat B, Aldam C, Trehin R, Lebrun P, Barker JHA, Arnold GM, Karp A, Baudouin L, Rognon F (2000) An analysis of genetic diversity in coconut (Cocos nucifera) populations from across the geographic range using sequence-tagged microsatellites (SSRs) and AFLPs. Theoretical and Applied Genetics 100, 764-771.
Thomas B, Macdonald S, Hicks M, Adams D, Hodgetts R (1999) Effects of reforestation methods on genetic diversity of lodgepole pine: an assessment using microsatellite and randomly amplified polymorphic DNA markers. Theoretical and Applied Genetics 98, 793-801.
Vekemans X, Beauwens T, Lemaire M, Roldan-Ruiz I (2002) Data from amplified fragment length polymorphism (AFLP) markers show indication of size homoplasy and of a relationship between degree of homoplasy and fragment size. Molecular Ecology 11, 139-151.
Vos P, Hogers R, Bleeker M, Reijans M, Van De Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, Zabeau M (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research 23, 4407-4414.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38, 1358-1370.
Wheeler NC, Guries RP (1982) Population structure, genic diversity, and morphological variation in Pinus contorta Dougl. Canadian Journal of Forest Research 12, 595-606.
Williams CG, Elsik CG, Barnes RD (2000) Microsatellite analysis of Pinus taeda L. in Zimbabwe. Heredity 84, 261-268.
Woodhead M, Russell J, Squirrell J, Hollingsworth PM, Mackenzie K, Gibby M, Powell W (2005) Comparative analysis of population genetic structure in Athyrium distentifolium (Pteridophyta) using AFLPs and SSRs from anonymous and transcribed gene regions. Molecular Ecology 14, 1681-1695.
Woods JH (2003) Forest Genetics Council of BC business plan 2003 - 2004, p. 24. Forest Genetics Council of British Columbia, BC.
Yang RC, Yeh FC (1995) Patterns of gene flow and geographic structure in Pinus contorta DOUGL. Forest Genetics 2, 65-75.
Ye TZ, Yang RC, Yeh FC (2002) Population structure of a lodgepole pine (Pinus contorta) and jack pine (P. banksiana) complex as revealed by random amplified polymorphic DNA. Genome 45, 530-540.
Yeh FC (1989) Isozyme analysis for revealing population structure for use in breeding strategies. In: Breeding Tropical Trees: Population Structure and Genetic Improvement Strategies in Clonal and Seedling Forestry, Proceedings of the IUFRO Conference, Pattaya, Thailand, 28 Nov. - 3 Dec. 1988, Oxford Forestry Institute, Oxford and Winrock International, Arlington, Va., pp. 119-131.
Yeh FC, Cheliak WM, Dancik WM, Illingworth K, Trust DC, Pryhitka BA (1985) Population differentiation in lodgepole pine, Pinus contorta ssp. latifolia: a discriminant analysis of allozyme variation. Canadian Journal of Genetics and Cytology 21, 210-218.
Yeh FC, Layton C (1979) The organization of genetic variability in central and marginal populations of lodgepole pine Pinus contorta ssp. latifolia. Canadian Journal of Genetics and Cytology 21, 487-503.
Zhivotovsky LA (1999) Estimating population structure in diploids with multilocus dominant DNA markers. Molecular Ecology 8, 907-913.

CHAPTER 4 IMPACT OF DOMESTICATION ON GENETIC VARIABILITY OF LODGEPOLE PINE: COMPARISON OF MICROSATELLITE AND AFLP MARKERS

4.1 Introduction

Knowledge of the patterns of genetic diversity in a forest tree species is important for both short-term and long-term programs of its genetic improvement and gene conservation. Tree improvement programs aim to: (i) achieve rapid genetic gains in target traits; (ii) maintain sufficient genetic diversity in the breeding population for target traits, to allow continued genetic gain and any changes of breeding strategy; and (iii) conserve genetic diversity for non-target traits, especially those involved in adaptation to environmental change (Zobel & Talbert 1984). A narrow genetic base at the early stages of the domestication process, and reductions of genetic diversity at subsequent stages of domestication, can lead to program failure and can substantially reduce the effectiveness of tree improvement programs. Characterization of the pattern of genetic diversity along the domestication process with genetic markers provides information about both existing patterns and changes of patterns through the tree improvement delivery system (El-Kassaby 2000a).

To illustrate the potential consequences of ignoring changes of diversity during domestication, a rapid reduction in genetic diversity during the domestication process was reported in Acacia mangium Willd., a tropical tree species, by Butcher et al. (1996). In this species, reductions were mainly caused by phenotypic selection. Butcher et al. (1996) used RFLP (restriction fragment length polymorphism) markers, and found that only 56% of the total variation was present in the production seed orchard, compared to natural populations.

Lodgepole pine (P. contorta Dougl. ex Loud. ssp. latifolia Engelm.)
is the most important commercial pine species in British Columbia. It ranks first among species harvested in British Columbia (B.C.). Annual volume of all products reached 21,894,000 m³, representing 28 percent of the provincial harvest (Carlson 2001). The demand for lodgepole pine seedlings in B.C. was approximately 93 million, of which approximately 85 percent came from natural stands while the remaining 15 percent came from seed orchards and superior provenances. Demand for improved planting stock from orchard-produced seedlots is expected to increase, both to meet the seed orchard utilization target of 75% by 2007 (Stoehr et al. 2004) and to supply the increased planting that follows salvage harvest after the Mountain Pine Beetle (MPB) infestation (Carroll et al. 2003) and recent widespread natural forest fires. The effects of lodgepole pine domestication through genetic improvement, and its long-term impacts on genetic diversity, must be evaluated to ensure the maintenance of a broad genetic base.

Traditionally, isozyme markers have been used to assess the effect of domestication. In several conifer species, isozymes have not shown any significant reductions of genetic variation between natural and domesticated populations. Examples include Picea abies (Bergmann & Ruetz 1991), Picea sitchensis (Chaisurisri & El-Kassaby 1994), Pseudotsuga menziesii (El-Kassaby & Ritland 1996), and Picea glauca x engelmannii (Stoehr & El-Kassaby 1997). Occasional studies did find a loss of rare or localized alleles, associated with the reduced number of parental trees in seed orchards. Two limitations of isozyme markers for detecting losses of variation during the early stages of domestication are their low level of polymorphism and their limited number of loci. In P. contorta ssp.
latifolia, the allelic diversity of isozymes has been shown to be low, with the mean number of alleles per locus ranging from 1.9 to 2.5 (Dancik & Yeh 1983; Yeh & Layton 1979).

Both microsatellites (SSRs) and amplified fragment length polymorphisms (AFLPs) are recently introduced molecular genetic markers, which provide much more information about patterns of genetic diversity. SSRs are highly polymorphic due to high mutation rates. SSRs have been used to quantify genetic variation within breeding populations (Poltri et al. 2003), estimate outcrossing rates in breeding populations (Butcher et al. 1999), investigate pollen contamination in seed orchards (Lexer et al. 1999), and evaluate the effect of domestication in agroforestry systems (Hollingsworth et al. 2005). Microsatellites are highly sensitive to genetic bottlenecks and selection, both of which are likely to occur during domestication.

The other new molecular genetic technique, AFLP, allows the assay of large numbers of loci, but each locus is diallelic and usually exhibits dominance. Nevertheless, the high number of loci makes it a powerful tool for genetic map construction, genotype identification, and taxonomic studies in many plant species. In the area of forest tree breeding, the AFLP technique has only been used to evaluate genetic relationships among accessions in the breeding population of Eucalyptus dunnii Maiden (Poltri et al. 2003). The utility of AFLPs for documenting changes of variation during domestication, particularly as compared to isozymes and SSRs, has not been previously evaluated.

The first objective of this study is to evaluate the effects of the domestication process on the genetic variability of lodgepole pine with both SSR and AFLP markers.
It is hypothesized that the domestication process may cause a gradual reduction in genetic diversity through several bottlenecks, starting with phenotypic selection, followed by the formation of the breeding and production populations, and finally the production of seed and seedling crops. At each stage, there is a reduction of the number of individuals. The level of genetic diversity from 10 natural populations within British Columbia's Prince George Seed Planning Zone (McAuley 1998) was used as the benchmark for natural diversity.

The second objective of this study is to compare the results from SSR vs. AFLP loci, to ascertain whether the changes in the patterns of genetic diversity differ between the two types of markers. We expect that each marker is appropriate for a certain facet of diversity and may also differ in the power to detect changes of diversity. This comparison should allow future studies to focus on the most appropriate markers to document such changes of diversity.

4.2 Materials and Methods

4.2.1 Plant materials

4.2.1.1 Natural populations

Ten natural populations were sampled within the Prince George Seed Planning Zone (see Chapter 3 for details). Levels of genetic diversity among these populations provide a benchmark for natural genetic diversity.

4.2.1.2 Domesticated populations

Dormant vegetative buds were collected from 92 and 44 trees representing the breeding and production seed orchard populations, respectively. The breeding and production (a subset of the breeding population) populations were sampled from the British Columbia Ministry of Forests' Kalamalka Research Station (Vernon, BC) and the Canadian Forest Products' seed orchard (Prince George, BC), respectively. Sampled trees were from an elevation band between 610 and 1200 m representing the Prince George Seed Planning Zone (Fig. 4.1).
The seed population was represented by a bulk sample of 120 seeds from the 2002 seed crop of the studied seed orchard (seedlot # 61042 of British Columbia's Provincial seed inventory). The seedling population was represented by 120 one-year-old seedlings produced from the studied seedlot. Thus, the studied populations represent a progression starting from natural populations and ending with seedlings, providing a complete sampling of the domestication process. All dormant bud samples were stored at -80°C until DNA extraction. Seeds were stored at 4°C until germination prior to DNA extraction. Twelve SSR loci and 3 AFLP primer combinations were used to compare genetic diversity along the domestication process. Four AFLP primer combinations were also used to investigate genetic relationships among genotypes in the breeding and seed orchard populations.

4.2.2 DNA extraction and procedures of SSR and AFLP

Genomic DNA was isolated from individual vegetative buds or germinants (7-day-old) using a modification of the CTAB (cetyltrimethyl ammonium bromide) method (Doyle & Doyle 1987). A set of 12 species-transferable SSR markers was selected from 1 genomic (G), 1 expressed sequence tag (EST), 6 low-copy (LC), and 4 undermethylated (UM) SSRs (Liewlaksaneeyanawin et al. 2004): PtTX2123, PtTX2146, PtTX3011, PtTX3034, PtTX3025, PtTX3029, PtTX3052, PtTX3127, PtTX4054, PtTX4056, PtTX4058, PtTX4139. The SSR procedures and locus nomenclature are described in Liewlaksaneeyanawin et al. (2004). SSR alleles were scored according to their molecular weight, and AFLP fragments were scored as present (+) or absent (-) within the size range of 50-700 base pairs, using SAGA™ GT/MX (LiCor Inc.).

AFLP analysis was carried out as described in Chapter 3, using a LiCor 4200 automated sequencer (LiCor Inc., Lincoln, NE). Three selective primer pairs (PstI + CAG/MseI + CCC, PstI + CAG/MseI + CGG, PstI + CGA/MseI + CCG) were used to compare genetic diversity along the domestication process.
To investigate genetic similarity in the breeding and orchard populations, the three primer combinations indicated above and one additional primer combination (PstI + CCA/MseI + CCA) were used.

4.2.3 Data analyses

4.2.3.1 Genetic diversity analysis

Genetic polymorphisms of SSRs were estimated using FSTAT 2.9.3 (Goudet 2002). Allele frequencies, expected and observed heterozygosities, and number of alleles per locus were estimated for the natural and the domesticated populations (breeding, seed orchard, seed, and seedling populations). Allelic richness (Petit et al. 1998), a measure of allelic diversity that accounts for sample size differences, was also computed for the combined natural and domesticated populations. Alleles were classified into four frequency classes: high (P > 0.75), intermediate (0.25 < P < 0.75), low (0.01 < P < 0.25), and rare (P < 0.01) (Rajora et al. 2000).

AFLP variation was analyzed using AFLP-SURV 1.0 (Vekemans et al. 2002). For each population, allele frequencies at AFLP loci were calculated using the Bayesian method with a non-uniform prior distribution of allele frequencies as proposed by Zhivotovsky (1999), assuming some deviation from Hardy-Weinberg equilibrium as estimated from SSR data. The percentage of polymorphic loci at the 5% level (PPL) was calculated, and unbiased expected heterozygosity (Hj, analogous to HE) was estimated following the method described in Lynch & Milligan (1994).

4.2.3.2 Genetic differentiation analyses

The genetic differentiation for SSR and AFLP markers was examined using FSTAT 2.9.3.2 (Goudet 2002) and AFLP-SURV 1.0 (Vekemans et al. 2002), respectively. For both markers, population differentiation was calculated using F-statistics (FST; Weir & Cockerham 1984). The FST (θ) was calculated over all populations, and jackknifing over loci provided standard errors. Significance was also determined by 1,000 random permutations of individuals among populations.
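The allele frequency classes and the expected heterozygosity described in Section 4.2.3.1 can be illustrated with a minimal sketch. This is not the FSTAT implementation: the function names and the example genotypes are hypothetical, the class boundaries at exactly 0.01, 0.25, and 0.75 are assigned by assumption (the text gives strict inequalities only), and the small-sample correction shown is Nei's (1978) form, which may differ in detail from the estimator FSTAT reports.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one diploid locus.

    genotypes is a list of (allele_1, allele_2) tuples, one per tree.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def frequency_class(p):
    """Assign a frequency p to one of the four classes used in this chapter.

    Boundary handling (>= at 0.01 and 0.25) is an assumption of this sketch.
    """
    if p > 0.75:
        return "high"
    if p >= 0.25:
        return "intermediate"
    if p >= 0.01:
        return "low"
    return "rare"


def unbiased_expected_heterozygosity(genotypes):
    """Expected heterozygosity with Nei's (1978) small-sample correction:
    HE = 2n / (2n - 1) * (1 - sum of squared allele frequencies),
    where n is the number of diploid individuals."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    homozygosity = sum(p * p for p in freqs.values())
    return (2 * n / (2 * n - 1)) * (1 - homozygosity)


# Hypothetical SSR genotypes (allele sizes in base pairs) for four trees.
example = [(150, 150), (150, 152), (152, 154), (150, 150)]
for allele, p in sorted(allele_frequencies(example).items()):
    print(allele, round(p, 3), frequency_class(p))
print(round(unbiased_expected_heterozygosity(example), 3))
```

With only four trees no allele can fall in the rare class; in the thesis data the natural benchmark comprises 300 trees, so alleles with a copy number of 5 or less fall below P = 0.01.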
FST significance values for each SSR locus were assessed by exact tests (Raymond & Rousset 1995a). The standardized RST value, under the stepwise model of SSR mutation, was also calculated for SSR markers using RST Calc (Goodman 1997). RST was calculated with allele sizes transformed to standard deviations from the global mean of repeat unit numbers, to account for data containing loci with significantly different variances (Goodman 1997). Significance of RST values across all loci was calculated by permutation tests.

To examine the genetic relationship among all natural populations and among natural versus domesticated populations, UPGMA trees with 1,000 iterations over all loci were produced from distance matrices of Nei (1978) using MICROSAT 1.5d (Minch et al. 1996) for SSRs and from distance matrices of Nei (1987) using AFLP-SURV 1.0 (Vekemans et al. 2002) for AFLPs. A consensus tree was built with CONSENSE (PHYLIP 3.62; Felsenstein 2004) and visualized using TREEVIEW (Page 1996).

To compare diversity between markers, estimates of expected heterozygosities from SSRs and AFLPs were compared between populations using Spearman's rank correlations. The correlation between SSR and AFLP genetic distance matrices was tested by the Mantel test of matrix correspondence (Mantel 1967b).

4.2.3.3 Genetic similarity analysis among genotypes in breeding population

To estimate genetic relationships among the 92 trees in the breeding population, a UPGMA tree was constructed from four AFLP primer combinations using NTSYS-pc software, based on the similarity matrix calculated with Jaccard's coefficient (Jaccard 1908):

S = a/(a + b + c)

where a = number of fragments present in both samples; b = number of fragments present in sample A, but not in B; c = number of fragments present in sample B, but not in A. The goodness-of-fit of the tree to the distance matrix data was tested by cophenetic correlation.
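Jaccard's coefficient as defined above can be sketched in a few lines. This is only an illustration of the formula S = a/(a + b + c), not the NTSYS-pc routine; the function names, the tree labels, and the five-band profiles are hypothetical, and returning 0.0 when neither sample shows any band is a choice made for this sketch (the coefficient is undefined in that case).

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard's coefficient S = a / (a + b + c) for two AFLP band profiles.

    Each profile is a sequence of 1 (fragment present) or 0 (fragment
    absent), scored over the same loci in the same order. Shared absences
    do not enter the coefficient.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be scored over the same loci")
    both = sum(1 for x, y in zip(profile_a, profile_b) if x and y)        # a
    only_a = sum(1 for x, y in zip(profile_a, profile_b) if x and not y)  # b
    only_b = sum(1 for x, y in zip(profile_a, profile_b) if y and not x)  # c
    denominator = both + only_a + only_b
    if denominator == 0:
        return 0.0  # assumption of this sketch: no bands in either sample
    return both / denominator


def pairwise_similarities(profiles):
    """Return {(name_1, name_2): S} for every unordered pair of samples."""
    names = sorted(profiles)
    return {
        (first, second): jaccard_similarity(profiles[first], profiles[second])
        for index, first in enumerate(names)
        for second in names[index + 1:]
    }


# Hypothetical band profiles for three trees scored at five AFLP loci.
profiles = {
    "tree_1": [1, 1, 0, 1, 0],
    "tree_2": [1, 0, 0, 1, 1],
    "tree_3": [1, 1, 0, 1, 0],
}
for pair, similarity in pairwise_similarities(profiles).items():
    print(pair, similarity)
```

For tree_1 and tree_2, two fragments are shared and one is unique to each tree, giving S = 2/(2 + 1 + 1) = 0.5. A matrix of such pairwise values is the input to the UPGMA clustering step.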
4.3 Results

4.3.1 Genetic diversity for breeding, orchard populations, seedlot, and seedling versus natural populations

Genetic differences among the 10 natural populations were small and non-significant (Chapter 3), so the populations were pooled into a single group, and their collective data were used as a benchmark for comparing the level of genetic variation at different domestication stages (natural, breeding, production, seed, and seedling populations).

Genetic diversity parameters for SSRs in the five test populations are shown in Table 4.1. In general, the total number of alleles over loci (AT) and the average number of alleles per locus per population (AO) are higher in the composite natural population than in any individual population. Lower allelic diversity, represented by AT and AO, was observed along the domestication process. By contrast, allelic richness (AL) and expected heterozygosity (HE) did not substantially differ along this process (Table 4.1).

For AFLPs, the percentage of polymorphic loci ranged from 82.3% (seed orchard) to 95.4% (natural population) (Table 4.1). Expected heterozygosity estimates were similar along the domestication process. Seedlings gave the highest estimate (0.354), while natural populations gave the lowest estimate (0.316). There was no significant correlation of expected heterozygosities between SSRs and AFLPs (r = -0.162, P = 0.79).

4.3.2 Loss of alleles along domestication process

SSR alleles were classified into four different frequency classes and were compared along the domestication process (Table 4.2). Most alleles occur in the low and rare frequency classes, and it is in these classes that most of the loss of alleles along the domestication process was observed (Table 4.3). Most rare alleles were localized alleles, found in only one or a few populations. Only four out of the 19 private alleles, present in only a single copy, were retained in the domesticated populations.
Figure 4.2 shows the frequency (copy number) distributions of alleles detected in natural populations, but absent in breeding, production, seed, and seedling populations. In this figure, allele frequencies are given in copy number units (referring to the number of copies of an allele in the natural populations), where a copy number of 5 or less is the rare frequency class (P < 0.01). The numbers of private alleles that were lost in the breeding, production, seed, and seedling populations were 10, 11, 13, and 14, respectively (Fig. 4.2). Pollen contamination is the most probable explanation for the presence of 9 and 13 alleles in the seed and seedling populations, respectively, in spite of their absence in the seed orchard (data not shown).

4.3.3 Genetic similarity among individuals in breeding and production populations

No best method can be identified for measuring genetic similarity between diploid organisms with dominant markers. However, Jaccard's coefficient has been preferred in plant breeding and evolution studies with dominant markers, due to its good comparison capacity in analyzing genotypes of the same species, when higher genetic similarity is expected.

In the breeding and production populations, the four AFLP primer combinations gave 187 loci, 155 of which (82.8%) were polymorphic. These 155 polymorphic loci were used to estimate genetic similarity between genotypes within populations. The mean genetic similarity among the 92 genotypes in the breeding population was 0.602 (SD = 0.075), varying from 0.314 (346 and 1616) to 0.809 (1746 and 1821). Mean genetic similarity for the 44 seed orchard parents was 0.580 (SD = 0.084). Cluster analysis indicated no particular grouping among genotypes in the breeding and seed orchard populations (Fig. 4.3). This dendrogram showed a good fit to the genetic similarity data, as reflected by a cophenetic correlation coefficient of 0.90.
4.3.4 Population differentiation and dendrogram

When all five populations involved in the domestication process were analyzed for both types of markers, a low but significant level of genetic differentiation was found for both SSRs (FST = 0.008, P < 0.001; RST = 0.003, P < 0.001) and for AFLPs (FST = 0.065, P < 0.001). Table 4.4 shows the pairwise FST among these five populations as calculated from SSRs and AFLPs (RST was not used for reasons cited in Chapter 3). For SSRs, most pairwise FST values were statistically significant, with the exceptions being between natural and breeding, between natural and production, and between breeding and production populations. The smallest pairwise FST estimates were observed between breeding and production populations for both types of markers, indicating that these two populations are most genetically similar. The largest distance between populations was observed for AFLPs between seed and seedling populations.

A Mantel test that compared the similarity of the SSR and AFLP pairwise FST values revealed significant similarity between these two datasets (r = 0.68, P = 0.008). Consequently, the SSR- and AFLP-derived dendrograms, based on Nei's genetic distance, appeared quite similar (Fig. 4.4). A dendrogram was also constructed using (δμ)² for SSR markers, and it produced the same pattern of clustering as that based on Nei's genetic distance (data not shown). Generally, the breeding and production populations clustered together. These two then joined the natural populations, indicating slight differentiation, with bootstrap support values ranging from 89 to 100%. The seed and seedling populations were both separated from these three populations, for both SSRs and AFLPs.
SSRs grouped the seed and seedling populations at a bootstrap support value of 100%, while the AFLP analysis grouped the seed with the natural, breeding, and production populations with a bootstrap support value of 66%, and also showed separation of the seedling population from the other populations. Despite slight differences between SSR and AFLP analyses, there is a general trend of steady differentiation as the domestication process proceeds.

4.4 Discussion

4.4.1 Genetic diversity in natural vs. domesticated populations

The results from this study demonstrate that lodgepole pine has experienced some reduction of genetic diversity along the domestication process. The data from SSRs suggest that significant numbers of rare and low frequency alleles were lost during domestication. The losses of low frequency alleles are more significant than those of rare alleles because these alleles are widespread and account for approximately 66% of the total allelic component. Most losses occurred in the breeding (15%) and production (23.5%) populations.

In contrast to this loss of alleles, domesticated populations had similar expected heterozygosities relative to their natural counterparts. This is due to the minimal contribution of rare alleles to expected heterozygosity, and to the fact that changes in intermediate allele frequency, while probably significant during domestication, have little effect upon expected heterozygosity. This also implies that AFLPs, which are a poor indicator of allele number, are not nearly as informative about the effects of domestication as SSRs, and therefore that SSRs are the superior marker for studying domestication.

The loss of genetic diversity during domestication is mainly due to the phenotypic selection used to construct these populations, which reduces effective population size by favoring the propagation of a subset of individuals.
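The percentages quoted in this section follow directly from the allele counts reported in Tables 4.1 to 4.3, and the arithmetic can be checked with a short sketch. The counts below are taken from those tables; the variable and function names are illustrative only.

```python
# Counts from Tables 4.1 to 4.3 of this chapter (SSR alleles over 12 loci).
NATURAL_TOTAL_ALLELES = 187       # total alleles in the natural benchmark
NATURAL_LOW_FREQUENCY = 124       # natural alleles in the low frequency class
ALLELES_LOST = {"breeding": 28, "production": 44, "seed": 48, "seedling": 49}


def percent(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Share of natural alleles lost at each domestication stage.
loss_percent = {
    stage: percent(lost, NATURAL_TOTAL_ALLELES)
    for stage, lost in ALLELES_LOST.items()
}

# Alleles retained at each stage (the AT column of Table 4.1).
alleles_retained = {
    stage: NATURAL_TOTAL_ALLELES - lost for stage, lost in ALLELES_LOST.items()
}

# Share of the natural allelic component in the low frequency class.
low_frequency_share = percent(NATURAL_LOW_FREQUENCY, NATURAL_TOTAL_ALLELES)

print(loss_percent)
print(alleles_retained)
print(low_frequency_share)
```

The results reproduce the 15.0% and 23.5% losses for the breeding and production populations, the retained totals of 159, 143, 139, and 138 alleles, and the share of about 66% for low frequency alleles.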
The reduction of allele number from production to seedling populations was also due to finite population size as well as unintended natural selection. However, some alleles were not present in the production population but were observed in both the seed and seedling populations, which indicates pollen migration (contamination) from outside the seed orchard.

4.4.2 Genetic relationships among individuals within breeding and production populations

No major grouping among accessions was observed in the breeding and production populations. However, parents in the breeding population had a moderate mean similarity value (0.602), which is nonetheless high compared with the values reported by Keil & Griffin (1994) for Eucalyptus grandis (0.55), by Poltri et al. (2003) for E. dunnii Maiden (0.38), and by Leite et al. (2002) for E. urophylla S.T. Blake (0.32). The distribution of the similarity index also indicated that selection of a seed orchard based on 44 individuals has not increased genetic similarity (Fig. 4.5). Mean genetic similarity for the production population was 0.580, compared to 0.602 for the breeding population of 92 genotypes. Thus, the seed orchard with its present population structure (i.e., a subset of the breeding population) is expected to have minimal or no impact on genetic variability. Individuals selected for inclusion in breeding and production seed orchard populations should harbour enough genetic variability to ensure an adequate genetic base for the long-term viability of breeding programs (Lindgren et al. 1989).

4.4.3 Comparative analysis of genetic diversity and population differentiation

Across the five populations, there were no significant correlations between SSRs and AFLPs for their estimates of diversity and pairwise genetic distance. This can be explained by the fact that conifers, with their known high genetic diversity, are still in their early stages of domestication; therefore, the levels of differentiation are small among populations.
This can lead to a lack of correlation among diversity estimates from both markers because the observed ranking of populations may only be generated by the random statistical variation of diversity, with no "actual" variance of diversity present (Gaudeul et al. 2004; Mariette et al. 2001). However, we did find a significant correlation between SSRs and AFLPs for their estimates of pairwise FST, which was also supported by a concordance of dendrograms between SSRs and AFLPs. The genetic relationships between natural and domesticated populations tended to show a similar trend, where a steady divergence and separation among the populations was observed by clustering analyses from both markers.

Cluster analyses seemed to reveal subtle changes along the domestication process. Both breeding and production populations were grouped with natural populations, indicating the adequacy of plus-tree selection in capturing the majority of genetic variation present in natural populations. This was also observed for Picea abies (Bergmann & Ruetz 1991), Picea sitchensis (Chaisurisri & El-Kassaby 1994), Pseudotsuga menziesii (Mirb.) (El-Kassaby & Ritland 1996), and Picea glauca x engelmannii (Stoehr & El-Kassaby 1997), where the authors attributed the observed similarity mainly to the breadth of plus-tree sampling. The observed similarity between the breeding and production populations in both analyses (SSR and AFLP) occurs because the seed orchard population is a subset of the breeding population.

The observed separation of both seed and seedling populations is entirely expected, as this would be caused by imbalances of parental gametic contribution (males vs. females), which occur in most seed orchards. With regard to parental imbalance, it is commonly known that the majority of seed orchard cone/seed crops are produced from a smaller than expected set of parents.
The term "80-20" was coined by the North Carolina State Tree Improvement Cooperative (Anonymous 1976) to describe this situation. Parental imbalance in seed orchards was observed for seed- and pollen-cone production for many species (El-Kassaby et al. 1989; El-Kassaby & Askew 1991; Roberds et al. 1991; Schoen & Stewart 1986; 1987).

The genetic distinctness of the seedling population is also expected. It supports earlier results obtained from container seedling production (El-Kassaby 2000b; El-Kassaby & Thomson 1996). In these studies the authors demonstrated that the parental genetic output differs between the seed and seedlings due to unintentional selection against specific genotypes. They identified two possible bottlenecks where this selection takes place. These are (1) thinning of surplus germinants after germination, caused by genetic differences in germination speed and dormancy among seed donors, and (2) culling of "substandard" seedlings at the end of the growing season.

In summary, our results show that breeding practices cause an erosion of genetic variability, particularly for rare alleles, but not significantly for heterozygosity. This indicates that rare alleles are the best indicator of the changes of genetic variability resulting from domestication. It also indicates that the SSR technique, in contrast to the AFLP technique, is the best way to measure genetic changes due to domestication. SSR loci contain many rare alleles, while AFLP loci have two alleles, so the opportunity to detect losses of alleles is much higher with SSRs. Rare alleles may have future adaptive value for pest or disease resistance as well as for future climate change. We note that most of the SSRs used in this study were developed from EST, low-copy, and undermethylated libraries, in which the SSR sequences primarily reside in the single-copy region and might be associated with genes controlling fitness and adaptability. Recently, Gwaze et al.
(2003) reported that a QTL with a large effect on growth rate was detected by linkage with specific SSR markers (PtTX3030-PtTX3127) in P. taeda. The value of low-frequency and rare alleles needs further investigation. For practical breeding purposes, capturing most of the locally common alleles is important for forest trees in adapting to local environments. Despite the observed lack of correlation among the diversity estimates obtained from SSRs and AFLPs, different types of markers (codominant and/or dominant) have been used to scan the genome for genetic diversity and genetic differentiation along chromosomes (Scotti-Saintagne et al. 2004), raising the possibility of linking molecular polymorphisms to adaptively significant phenotypic variation. At present, however, investigations of genetic diversity in domesticated populations have not used molecular markers associated with genes under selection for economically important traits. The discovery of adaptive single nucleotide polymorphisms (SNPs) will add a new dimension to the characterization of forest tree domestication, as it will allow us to trace selection and the spread of economically important alleles.

Table 4.1 Genetic diversity parameters for natural versus domesticated populations

                   Microsatellites                                                       AFLP
Population    n    AT    AO             AL             HE              FIS       PPL    Hj
Natural       300  187   15.58 (0.23)   11.70 (0.18)   0.781 (0.040)   0.094*    95.4   0.316 (0.011)
Breeding       92  159   13.25 (0.18)   11.65 (0.17)   0.784 (0.041)   0.049*    83.8   0.323 (0.013)
Production     44  143   11.92 (0.18)   11.86 (0.18)   0.784 (0.042)   0.058ns   82.3   0.340 (0.012)
Seed          120  139   11.50 (0.17)   10.00 (0.15)   0.757 (0.041)   0.073*    87.6   0.323 (0.013)
Seedling      120  138   11.50 (0.18)    9.80 (0.16)   0.757 (0.042)   0.055*    88.4   0.354 (0.012)

n, number of samples; AT, total number of alleles; AO, average number of alleles; AL, allelic richness (random sample size of 40); HE, expected heterozygosity; PPL, proportion of polymorphic loci at 5% level; Hj, same as HE for AFLP markers. 
ns, not significant; *, significant differentiation after Bonferroni correction (5% level, adjusted α = 0.0008). Standard errors in parentheses.

Table 4.2 Distribution of microsatellite alleles over allele frequency classes along the domestication process (rare: P < 0.01, low: 0.01 < P < 0.25, intermediate: 0.25 < P < 0.75, high: P > 0.75)

Allelic              No. of alleles (%) per population
frequency class      Natural      Breeding     Production   Seed         Seedling
Actual(a)
  Rare               53 (28.3)    22 (13.8)    0 (0.0)      24 (17.3)    24 (17.4)
  Low                124 (66.3)   127 (79.9)   132 (92.3)   104 (74.8)   99 (71.1)
  Intermediate       10 (5.4)     10 (6.3)     11 (7.7)     10 (7.2)     10 (10.2)
  High               0 (0.0)      0 (0.0)      0 (0.0)      1 (0.7)      1 (0.7)
Retained(b)
  Rare                            33 (20.7)    24 (16.8)    20 (14.4)    19 (13.8)
  Low                             116 (73.0)   109 (76.2)   109 (78.4)   109 (79.0)
  Intermediate                    10 (6.3)     10 (7.0)     10 (7.2)     10 (7.2)
  High                            0 (0.0)      0 (0.0)      0 (0.0)      0 (0.0)

(a) Alleles present/absent in the domesticated populations compared to the natural populations.
(b) General representation of alleles over the allele frequency classes in the domesticated populations, but with their observed frequency classes irrespective of their presence or absence from the natural populations.

Table 4.3 Allele losses along the domestication process (compared to natural populations)

Allele losses                                  Breeding    Production   Seed        Seedling
Total number (%)                               28 (15.0)   44 (23.5)    48 (25.7)   49 (26.2)
Alleles lost of total alleles detected (%)
  Rare                                         10.7        15.5         17.7        18.2
  Low                                          4.3         8.0          8.0         8.0
  Intermediate and high                        0.0         0.0          0.0         0.0
Four allele frequency classes* [Number (%)]
  Rare                                         20 (37.7)   29 (54.7)    33 (63.3)   34 (64.2)
  Low                                          8 (6.5)     15 (12.1)    15 (12.1)   15 (12.1)
  Intermediate and high                        0 (0)       0 (0)        0 (0)       0 (0)

*Rare: P < 0.01, Low: 0.01 < P < 0.25, Intermediate: 0.25 < P < 0.75, High: P > 0.75

Table 4.4 Pairwise comparison matrix of FST estimates for SSRs (below diagonal) and for AFLPs (above diagonal) among the five tested populations of P. contorta ssp. latifolia

              Natural    Breeding   Production   Seed      Seedling
Natural                  0.0181     0.0184       0.0971    0.0816
Breeding      0.0001                0.0000       0.0730    0.0868
Production    0.0004     -0.0062                 0.0606    0.0808
Seed          0.0107     0.0094     0.0053                 0.1258
Seedling      0.0143     0.0143     0.0090       0.0024

SSR significance tested by permutations using FSTAT (Goudet 2002); significance of population differentiation for AFLPs tested using the exact test in TFPGA (Miller 2000). A significant pairwise difference between populations is indicated in bold letters.

Figure 4.1 Location of the breeding population parent trees and seed planning zones. Dashed areas represent seed zone overlaps.

[Plot not reproduced. X-axis: copy number in natural population, 0 to 40.]
Figure 4.2 Number of lost alleles in the domesticated populations according to their frequency (copy number) in the overall sample of 300 individuals from the 10 natural populations in P. contorta ssp. latifolia. A copy number of 5 or less corresponds to the rare frequency class (P < 0.01).

[Dendrogram not reproduced.]
Figure 4.3 UPGMA dendrogram of the 92 trees from the Prince George breeding population in P. contorta ssp. latifolia based on AFLP data using Jaccard's genetic similarity matrix. Bold and italic indicate seed orchard parents.

[Dendrograms not reproduced. Panels (a) and (b); leaves: Natural population, Breeding population, Seed Orchard, Seedlot, Seedling.]
Figure 4.4 Dendrograms generated by UPGMA clustering showing the genetic relationships among populations in the domestication process. 
Based on (a) SSR data and Nei's genetic distance and (b) AFLP data and Nei's genetic distance. The numbers on the branches are the percentage support from 1,000 bootstrap replications.

[Histogram not reproduced. X-axis: genetic similarity, 0.0 to 1.0; series: Breeding, Seed Orchard.]
Figure 4.5 Distribution of genetic similarity based on AFLP markers for the breeding population of 92 genotypes (black bars) and the seed orchard of 44 genotypes (open bars).

4.5 References

Anonymous (1976) Twentieth annual report on cooperative tree improvement and hardwood research program. North Carolina State University, Raleigh, North Carolina.
Bergmann F, Ruetz W (1991) Isozyme genetic variation and heterozygosity in random tree samples and selected orchard clones from the same Picea abies populations. Forest Ecology and Management 46, 39-47.
Butcher PA, Glaubitz JC, Moran GF (1999) Applications for microsatellite markers in the domestication and conservation of forest trees. Forest Genetic Resources 27, 34-42.
Butcher PA, Moran GF, Perkins HD (1996) Genetic resources and domestication of Acacia mangium. In: Tree improvement for sustainable tropical forestry, 27 October-1 November 1996, Caloundra, Queensland, Australia, pp. 467-471.
Carlson M (2001) "Select" lodgepole pine seed availability to 2010. In: TICtalk, pp. 25-28. Forest Genetics Council of British Columbia.
Carroll A, Taylor S, Regniere J, Safranyik L (2003) Effects of climate change on range expansion by the mountain pine beetle in British Columbia. In: Mountain Pine Beetle Symposium: Challenges and Solutions (eds. Shore TL, Brooks JE, Stone JE), pp. 223-232. Natural Resources Canada, Pacific Forestry Centre, Victoria.
Chaisurisri K, El-Kassaby YA (1994) Genetic diversity in a seed production population vs. natural populations of Sitka spruce. Biodiversity and Conservation 3, 512-523.
Dancik BP, Yeh FC (1983) Allozyme variability and evolution of lodgepole pine (Pinus contorta var. latifolia) and jack pine (P. 
banksiana) in Alberta. Canadian Journal of Genetics and Cytology 25, 57-64.
Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh tissue. Phytochemical Bulletin 19, 11-15.
El-Kassaby YA (2000a) Effect of forest tree domestication on gene pools. In: Forest Conservation Genetics: Principles and Practice (eds. Young A, Boshier D, Boyle T), pp. 197-213. CSIRO Publishing-CABI Publishing, Canberra, Australia.
El-Kassaby YA (2000b) Representation of Douglas-fir and western hemlock families in seedling crops as affected by seed biology and nursery crop management practices. Forest Genetics 4, 305-315.
El-Kassaby YA, Askew GR (1991) The relation between reproductive phenology and reproductive output in determining the gametic pool profile in a Douglas-fir seed orchard. Forest Science 37, 827-835.
El-Kassaby YA, Fashler AMK, Crown M (1989) Variation in fruitfulness in a Douglas-fir seed orchard and its effect on crop-management decisions. Silvae Genetica 38, 113-121.
El-Kassaby YA, Ritland K (1996) Impact of selection and breeding on the genetic diversity in Douglas-fir. Biodiversity and Conservation 5, 795-813.
El-Kassaby YA, Thomson AJ (1996) Parental rank changes associated with seed biology and nursery practices in Douglas-fir. Forest Science 42, 228-235.
Felsenstein J (2004) PHYLIP (Phylogeny Inference Package) version 3.62. Distributed by the author, http://evolution.gs.washington.edu/phylip.html. Department of Genome Sciences, University of Washington, Seattle.
Gaudeul M, Till-Bottraud I, Barjon F, Manel S (2004) Genetic diversity and differentiation in Eryngium alpinum L. (Apiaceae): comparison of AFLP and microsatellite markers. Heredity 92, 508-518. 
Goodman SJ (1997) RST Calc: a collection of computer programs for calculating estimates of genetic differentiation from microsatellite data and determining their significance. Molecular Ecology 6, 881-885.
Goudet J (2002) FSTAT 2.9.3.2, a program to estimate and test gene diversities and fixation indices. Available from http://www2.unil.ch/izea/softwares/fstat.html.
Gwaze DP, Zhou Y, Reyes-Valdes MH, Al-Rababah MA, Williams CG (2003) Haplotypic QTL mapping in an outbred pedigree. Genetical Research 81, 43-50.
Jaccard P (1908) Nouvelles recherches sur la distribution florale. Bulletin de la Société Vaudoise des Sciences Naturelles 44, 223-270.
Keil M, Griffin AR (1994) Use of random amplified polymorphic DNA (RAPD) markers in the discrimination and verification of genotypes in Eucalyptus. Theoretical and Applied Genetics 89, 442-450.
Leite SMM, Bonine CA, Mori ES, Do Valle CF, Marino CL (2002) Genetic variability in a breeding population of Eucalyptus urophylla S.T. Blake. Silvae Genetica 51, 253-256.
Lexer C, Heinze B, Steinkellner H, Kampfer S, Ziegenhagen B, Glossl J (1999) Microsatellite analysis of maternal half-sib families of Quercus robur, pedunculate oak: detection of seed contaminations and inference of the seed parents from the offspring. Theoretical and Applied Genetics 99, 185-191.
Liewlaksaneeyanawin C, Ritland CE, El-Kassaby YA, Ritland K (2004) Single-copy, species-transferable microsatellite markers developed from loblolly pine ESTs. Theoretical and Applied Genetics 109, 361-369.
Lindgren D, Libby WS, Bondesson FL (1989) Deployment to plantations of numbers and proportions of clones with special emphasis on maximizing gain at a constant diversity. Theoretical and Applied Genetics 77, 825-831.
Lynch M, Milligan BG (1994) Analysis of population genetic structure with RAPD markers. Molecular Ecology 3, 91-99.
Mantel N (1967) Detection of disease clustering and a generalized regression approach. Cancer Research 27, 209-220. 
Mariette S, Chagne D, Lezier C, Pastuszka P, Raffin A, Plomion C, Kremer A (2001) Genetic diversity within and among Pinus pinaster populations: comparison between AFLP and microsatellite markers. Heredity 86, 469-479.
Minch E, Ruiz-Linares A, Goldstein DB, Feldman MW, Cavalli-Sforza LL (1996) MICROSAT Version 1.5. A computer program for calculating various statistics on microsatellite allele data. Stanford University Medical Centre, Stanford, CA.
Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 83, 583-590.
Page RDM (1996) TREEVIEW: an application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences 12, 357-358.
Petit RJ, El Mousadik A, Pons O (1998) Identifying populations for conservation on the basis of genetic markers. Conservation Biology 12, 844-855.
Poltri SNM, Zelener N, Traverso JR, Gelid P, Hopp HE (2003) Selection of a seed orchard of Eucalyptus dunnii based on genetic diversity criteria calculated using molecular markers. Tree Physiology 23, 625-632.
Rajora OP, Rahman MH, Buchert GP, Dancik BP (2000) Microsatellite DNA analysis of genetic effects of harvesting in old-growth eastern white pine (Pinus strobus) in Ontario, Canada. Molecular Ecology 9, 339-348.
Raymond M, Rousset F (1995) Genepop (version 1.2): population genetics software for exact tests and ecumenicism. Journal of Heredity 86, 248-249.
Roberds JH, Friedman ST, El-Kassaby YA (1991) Effective number of pollen parents in clonal seed orchards. Theoretical and Applied Genetics 82, 313-320.
Schmidtling RC, Carroll E, LaFarge T (1999) Allozyme diversity of selected and natural loblolly pine populations. Silvae Genetica 48, 35-45.
Schoen DJ, Stewart SC (1986) Variation in male reproductive investment and male reproductive success in white spruce. Evolution 40, 1109-1120. 
Schoen DJ, Stewart SC (1987) Variation in male fertilities and pairwise mating probabilities in Picea glauca. Genetics 116, 141-152.
Scotti-Saintagne C, Mariette S, Porth I, Goicoechea PG, Barreneche T, Bodenes K, Burg K, Kremer A (2004) Genome scanning for interspecific differentiation between two closely related oak species [Quercus robur L. and Q. petraea (Matt.) Liebl.]. Genetics 168, 1615-1626.
Stoehr M, Webber J, Woods J (2004) Protocol for rating seed orchard seedlots in British Columbia: quantifying genetic gain and diversity. Forestry 77, 297-303.
Stoehr MU, El-Kassaby YA (1997) Levels of genetic diversity at different stages of the domestication cycle of interior spruce in British Columbia. Theoretical and Applied Genetics 94, 83-90.
Vekemans X, Beauwens T, Lemaire M, Roldan-Ruiz I (2002) Data from amplified fragment length polymorphism (AFLP) markers show indication of size homoplasy and of a relationship between degree of homoplasy and fragment size. Molecular Ecology 11, 139-151.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38, 1358-1370.
Yeh FC, Layton C (1979) The organization of genetic variability in central and marginal populations of lodgepole pine Pinus contorta ssp. latifolia. Canadian Journal of Genetics and Cytology 21, 487-503.
Zhivotovsky LA (1999) Estimating population structure in diploids with multilocus dominant DNA markers. Molecular Ecology 8, 907-913.

CHAPTER 5 HIGH RESOLUTION ANALYSIS OF BIPARENTAL INBREEDING AND SIBSHIP STRUCTURE IN PERIPHERAL POPULATIONS OF LODGEPOLE PINE

5.1 Introduction

The mating system determines the transmission of genes from parents to progeny, and over the longer term, influences the distribution of genetic variation within and among populations (Brown 1988). In plants, it is classically characterized by the proportion of self-fertilization vs. 
random outcrossing, as estimated by analysis of isozymes in progeny arrays (Ritland & Jain 1981). More recently, the patterns of paternity among outcrossed progeny have received attention, using methods such as the progeny-pair model, which gives the proportion of full-sibs among all pairwise comparisons (Ritland 1989), and pedigree reconstruction methods, which give the distribution of full-sib groups within an open-pollinated family (Thomas & Hill 2002; Wang 2004). However, inferring patterns of paternity requires genetic markers with more information than isozymes provide. In plant populations, localized seed and pollen dispersal results in "substructuring", where neighboring plants may often be genetic relatives, such that outcrosses may occur between related individuals (biparental inbreeding). However, distinguishing biparental inbreeding from true selfing is difficult and also requires many highly polymorphic markers and adequate statistical analyses. Recent extensions of classic mating system models now provide additional measures of the mating system (Ritland 2002), and these measures are particularly appropriate for more informative markers such as microsatellites and AFLPs (amplified fragment length polymorphisms). The correlation of selfing between pairs of loci within individuals and the comparison of single- vs. multilocus correlated matings provide new measures of biparental inbreeding and population substructure, respectively (Ritland 2002). At the individual level, these highly informative markers allow estimation of outcrossing and correlated matings for individual parents, based upon their array of progeny. The high variability of microsatellite markers has also provided opportunities for applying new approaches for partitioning individuals within progeny arrays into full- and half-sib groups (Thomas & Hill 2002; Wang 2004). Recently, microsatellite markers have been applied to study mating systems in tropical (Collevatti et al. 2001; Nagamitsu et al. 
2001) and temperate (Robledo-Arnuncio et al. 2004; Vogl et al. 2002) tree species. Collevatti et al. (2001) demonstrated that microsatellite markers provide sufficient resolution to discriminate selfing from outcrossing events precisely, even among close relatives. In addition, dominant markers such as RAPDs and AFLPs have been used for mating system studies in a limited number of plant species, such as Datisca glomerata (Fritsch & Rieseberg 1992) and Eucalyptus urophylla (Gaiotto et al. 1997). In a separate study we have evaluated the use of AFLPs for mating system estimation (Chapter 6). Lodgepole pine (Pinus contorta Dougl. ex Loud.) is a coniferous tree species that is widely distributed in western North America (Wheeler & Critchfield 1985). Palynological studies revealed that lodgepole pine migrated from southern refugia to the Yukon Territory in the last 12,000 years (MacDonald & Cwynar 1985). It is monoecious and wind-pollinated, with high levels of outcrossing expected. Yeh & Layton (1979) estimated outcrossing rates for peripheral, intermediate, and central populations of lodgepole pine using Wright's fixation index estimated from isozyme markers, and found no differences in outcrossing rate among these populations (t = 0.918-1.292). However, these estimates suffer from large error and assume inbreeding equilibrium. More recently, isozyme markers assayed from progeny arrays indicated that the rate of outcrossing in lodgepole pine was higher than 90% (Epperson & Allard 1984; Perry & Dancik 1986). Peripheral populations occur near the edge of the geographic range of the species, are often relatively small, and are more likely to be at the limits of adaptation. The consequences of isolation and/or peripheral location for plant mating systems have been reported in an annual plant (Centaurea solstitialis L.) (Sun & Ritland 1998), eastern white pine (P. strobus L.) (Rajora et al. 2002), and Scots pine (P. sylvestris L.) 
(Robledo-Arnuncio et al. 2004). These studies demonstrated higher levels of selfing and of correlated outcrossed paternity in isolated and/or peripheral populations. Yeh & Layton (1979) hypothesized that the level of biparental inbreeding in peripheral P. contorta ssp. latifolia populations was higher than in central populations as a result of founder effects, reduced gene flow, and low density. New approaches to characterizing the mating system, particularly in peripheral populations, can provide insights into micro-evolutionary changes due to founder events and/or small population size, and also into the genetic diversity of seed collections for reforestation. Previous mating system studies of P. contorta ssp. latifolia have mainly focused on estimating the outcrossing rate; no attempts were made to determine the extent of correlated paternity, patterns of biparental inbreeding, individual-tree outcrossing rates, or the fine-scale pedigree structure within individual tree progenies. In this study, the objectives were to (1) obtain fine-scale estimates of mating system parameters, (2) infer sibling relationships within progeny arrays, and (3) evaluate the concordance of these two approaches.

5.2 Materials and methods

5.2.1 Plant material

Two natural populations were sampled from Carbondale (Calgary) and Whitehorse (Yukon), representing the eastern and northern limits of the species distribution, respectively. The Carbondale population (49°26'N, 114°25'W; elevation 1510 m) is densely populated with trees ranging from 30 to 80 years of age, while the Whitehorse population (60°43'N, 135°05'W; elevation 762 m) is an even-aged, low-density stand. Seed-cones were collected from 20-30 trees in each population, with a minimum of 30.0 m between sampled trees. Seeds were extracted and kept separate to maintain the identity of every sampled tree within each population. 
Seeds were germinated in petri dishes on moist filter paper at room temperature for approximately 10 days. For each population, a total of 400 individual germinants (20 germinants/tree and 20 families/population) were used for assay and analyses.

5.2.2 DNA extraction and SSR assay

Genomic DNA was isolated from individual germinants in 1.5 mL microtubes using a modification of the CTAB (cetyltrimethyl ammonium bromide) method (Doyle & Doyle 1987). A set of 11 species-transferable microsatellite markers (Liewlaksaneeyanawin et al. 2004) was selected for the mating system and sibship reconstruction analyses (PtTX2123, PtTX2146, PtTX3011, PtTX3025, PtTX3034, PtTX3052, PtTX3127, PtTX4054, PtTX4056, PtTX4058, PtTX4139). PCR amplification conditions are described in Liewlaksaneeyanawin et al. (2004). Microsatellite products were detected with an M13-tailed primer (Oetting et al. 1995) or an infrared dye (IRD)-labeled primer. The amplification products were electrophoresed on 5.5% Long Ranger polyacrylamide gels using a LiCor 4200 automated sequencer (LiCor Inc., Lincoln, NE). IRD-labeled molecular weight markers of 50-350 bp (LiCor Inc.) were loaded in at least two lanes as standards. Microsatellite alleles were scored according to their molecular weight.

5.2.3 Mating system analysis

The multilocus mixed-mating model of Ritland (1990; 2002) was used to estimate mating system parameters. The assumptions of the mixed-mating model are: 1) maternal genotypes outcross at the same rate to a homogeneous pollen pool; 2) alleles at different loci segregate independently; and 3) no mutation or selection occurs between the time of mating and the progeny assay (Ritland & Jain 1981). 
Mating system parameters, including population and individual-family single-locus (ts) and multilocus (tm) outcrossing rates, single-locus (rp(s)) and multilocus (rp(m)) correlated matings, the correlation of selfing between pairs of loci within individuals (rsi), the correlation of t between individuals within families (rt), and the inbreeding coefficient (F), were estimated using the Expectation-Maximization (EM) procedure of the computer program MLTR 3.1 (Ritland 2002). Maternal genotypes at each locus were inferred from progeny arrays using the most likely parent method (Brown & Allard 1970). Variances for mating system estimates were obtained from 1,000 bootstrap replicates. The differences between the individual family tm and ts were tested with the paired t test.

5.2.4 Sibship reconstruction

Each family was partitioned into sibling (half- and full-sib) groups based on the group-likelihood approach using the computer program Colony 1.0 (Wang 2004), assuming an error rate of 0.05 for each locus and individual. The group-likelihood approach assumes that: 1) individuals are sampled from a single cohort in a large random-mating population; 2) genetic markers are neutral, unlinked between loci, and in linkage equilibrium; and 3) only one sex is allowed to be multiply mated. After partitioning the offspring from each family into full-sib groups, an estimate of the correlation of paternity was obtained for each family as

$$r_p = \frac{\sum_i N_i \, i(i-1)}{T(T-1)},$$

where $N_i$ is the number of full-sib groups of size $i$, and $T = 20$ (the number of progeny in the family).

5.3 Results

5.3.1 Mendelian inheritance and gene diversity

The Mendelian segregation of the 11 microsatellite markers was confirmed in Chapter 2. However, maternal genotypes were heterozygous for null alleles in some families at PtTX3034 and PtTX4139, as evidenced by the observation of more than two homozygous offspring genotypes within a progeny array. These loci were excluded from further analyses for the families in which null alleles were observed. 
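The correlation of paternity derived from full-sib group sizes in Section 5.2.4 can be illustrated with a short calculation. The following Python sketch assumes the estimator is the proportion of progeny pairs within a family that fall in the same full-sib group; the function name and the example partition are hypothetical and are not data from this study.

```python
def correlated_paternity(group_sizes):
    """Proportion of progeny pairs within one family that share a father.

    group_sizes: sizes of the full-sib groups in the family (assumed input).
    """
    total = sum(group_sizes)  # T, the number of progeny in the family
    if total < 2:
        raise ValueError("need at least two progeny")
    # Each group of size i contributes i * (i - 1) ordered same-father pairs.
    same_father_pairs = sum(size * (size - 1) for size in group_sizes)
    return same_father_pairs / (total * (total - 1))


# Hypothetical family of T = 20 progeny partitioned into 10 full-sib groups.
example = [4, 3, 3, 2, 2, 2, 1, 1, 1, 1]
print(round(correlated_paternity(example), 4))
```

A family in which every progeny has a different father gives 0, and a family with a single father gives 1, which are the expected limits for a correlation of paternity.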
The eleven microsatellite loci showed high levels of polymorphism in both populations. The number of maternal alleles per population ranged from 3 at PtTX2123 to 14 at PtTX3011 (Table 5.1). The mean numbers of maternal alleles were 8.45 and 10.09 for the Whitehorse and Carbondale populations, respectively. F values of approximately zero were observed for the Carbondale (0.010) and Whitehorse (0.000) populations (Table 5.2), indicating that null alleles were not present at appreciable frequencies.

5.3.2 Population estimates of the mating system

Both populations showed practically complete multilocus outcrossing, with multilocus outcrossing rates (tm) of 0.990 and 0.992 for the Whitehorse and Carbondale populations, respectively (Table 5.2). The single-locus outcrossing rate estimate (ts) for Whitehorse (0.874) was lower than that for Carbondale (0.936). The difference between the multilocus and single-locus estimates (tm - ts) was likewise higher for Whitehorse (0.116) than for Carbondale (0.056) (Table 5.2). At the single-locus level, the correlation of paternity was moderate, at about 0.043 for both populations (Table 5.2), while the multilocus correlated paternity (rp(m)) did differ significantly between populations, with values of 0.009 and 0.021 for Whitehorse and Carbondale, respectively (Table 5.2). Thus the difference between the single-locus and multilocus correlated paternity was higher in Whitehorse than in Carbondale (0.033 vs. 0.022, Table 5.2), and this difference is statistically significant. The correlation of outcrossing rate among families (rt) was low but significant, indicating low variation of outcrossing rate among maternal parents (due to variation of biparental inbreeding). The correlation of selfing between loci (rsi) was quite low, being 0.096 and 0.146 for Whitehorse and Carbondale, respectively (in populations with no biparental inbreeding, it should be unity). 
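The two difference measures reported above can be recomputed from the quoted population estimates. The Python sketch below is only an arithmetic check: the dictionary layout and function names are illustrative assumptions, and the single-locus correlation of paternity is entered as the rounded value of about 0.043 given for both populations, so the paternity differences may deviate from the reported ones in the third decimal place.

```python
# Population-level estimates quoted in Section 5.3.2.
# rp_s is the rounded value ("about 0.043") reported for both populations.
estimates = {
    "Whitehorse": {"tm": 0.990, "ts": 0.874, "rp_s": 0.043, "rp_m": 0.009},
    "Carbondale": {"tm": 0.992, "ts": 0.936, "rp_s": 0.043, "rp_m": 0.021},
}


def outcrossing_difference(tm, ts):
    """Multilocus minus single-locus outcrossing rate (tm - ts)."""
    return tm - ts


def paternity_difference(rp_s, rp_m):
    """Single-locus minus multilocus correlation of paternity."""
    return rp_s - rp_m


for pop, e in estimates.items():
    d_t = round(outcrossing_difference(e["tm"], e["ts"]), 3)
    d_rp = round(paternity_difference(e["rp_s"], e["rp_m"]), 3)
    print(pop, d_t, d_rp)
```

Both differences come out larger for Whitehorse than for Carbondale, in line with the comparison made in the text.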
5.3.3 Individual family estimates of mating system parameters

Estimates of individual family single- and multilocus outcrossing rates and correlation of paternity are shown in Table 5.3. The individual family estimates of outcrossing rates in both populations show a skewed distribution, with the majority of families having high single- and multilocus outcrossing rates. The estimates of tm were nearly identical for most families, ranging from 0.931 to 0.984 for the Carbondale population and from 0.900 to 0.992 for the Whitehorse population. No significant differences were observed between populations for individual family tm (P > 0.05). The mean individual family ts for the Whitehorse population was significantly lower than that for the Carbondale population. The ts for individual families were significantly lower than their corresponding tm for both the Whitehorse (P < 0.0001) and Carbondale (P < 0.0001) populations. The estimates of multilocus correlated paternity (rp(m)) for individual families ranged from 0.022 to 0.151 for the Carbondale population and from 0.016 to 0.054 for the Whitehorse population. The individual family estimates of rp(m) and rp(s) were not significantly different between populations (P > 0.05). The rp(s) for individual families were significantly higher than their corresponding rp(m) in both populations (P < 0.0001).

5.3.4 Sibship structure of progeny arrays

The number of full-sib groups within each family ranged from 7 to 13 (mean = 10.5) and from 8 to 12 (mean = 10.45) for Carbondale and Whitehorse, respectively (Table 5.4). The distributions of full-sib group size for the Carbondale and Whitehorse populations are shown in Figure 5.1. The estimates of correlated paternity calculated from the distribution of full-sib group size are shown in Table 5.4. The rates of correlated paternity were significantly and positively correlated with the variances of full-sib group size (r2 = 0.7326, P < 0.001) (Fig. 5.2). 
5.4 Discussion

5.4.1 Outcrossing rate

The results of this study show that, even in peripheral populations, lodgepole pine exhibits high outcrossing, with an average tm of 0.991. These estimates are comparable with the tm estimates for lodgepole pine from northeastern Washington (tm = 0.990) (Epperson & Allard 1984) and from the Rocky Mountains of Alberta (tm = 0.948) (Perry & Dancik 1986), and with the outcrossing rates observed in most other pine species (O'Connell 2003). High outcrossing rates in conifers are mainly due to the prevalence of wind pollination and the high inbreeding depression at the seed stage, which filters out any remaining selfed seed (Husband & Schemske 1996). As seedlings are normally used to infer outcrossing rates, selfed embryos that do not survive are not included in the analyses; thus the outcrossing rate observed in this study is biased upward. Actual inbreeding or selfing rates, estimated by combining molecular-based estimates of selfing from filled seeds with estimates of inbreeding from the proportion of empty seeds, are much higher than those detected by molecular markers alone; Rajora et al. (2002) reported an average of 22% selfing in eastern white pine populations. In addition, the high outcrossing rate of P. contorta ssp. latifolia is due to protandry, meaning that peak pollen shed occurs a few days before peak female receptivity on the same tree (Owens et al. 1981), as well as polyembryony, which provides ample opportunity for outcrossed embryos to outcompete their selfed counterparts (Ledig 1998). Low values of rt (correlation of outcrossing rate within progeny arrays) were observed in both populations, indicating that the outcrossing rate varied little among parent trees (i.e., among families). 
Low variation of outcrossing rate among families is expected when multilocus outcrossing rates are high, as any measured selfing is "apparent" selfing caused by biparental inbreeding, which has less opportunity to vary among individuals.

5.4.2 Biparental inbreeding and spatial genetic structure

We estimated biparental inbreeding in three ways, two of which are novel and not yet applied in other studies of mating systems. The classic measure of biparental inbreeding is the reduction of the single-locus outcrossing rate relative to the multilocus outcrossing rate. The single-locus outcrossing rate was indeed lower in both populations, indicating the presence of biparental inbreeding. The first new measure is the difference between the single-locus and the multilocus correlation of paternity (Ritland 2002). We found the single-locus correlation to be higher than the multilocus correlation, which is indicative of population substructure. But unlike the difference in selfing rates, this difference in paternity correlation is due to the correlation of paternally derived alleles between half-sib progeny. The second new measure of population structure is the correlation of selfing between loci. With pure mixed mating (selfing but no biparental inbreeding) this correlation should be equal to one. However, we found quite low values (0.096-0.146). This indicates that most (if not all) of the single-locus selfing was due to biparental inbreeding. A problem with using the difference between single- and multilocus estimates of outcrossing is that it depends upon the number of loci used, with more loci giving a larger difference. The correlation of selfing among loci does not, so we can be more confident about the level of biparental inbreeding. However, the estimate of the correlation of selfing suffers from large statistical error, warranting the use of highly informative microsatellite markers. 
We found that biparental inbreeding was more pronounced in Whitehorse than in Carbondale. This could be due to low stand density, population substructure, and founder effects. The Whitehorse population had much lower stand density than the Carbondale population. Morgante et al. (1991) reported that low density in Picea abies stands significantly increased the level of biparental inbreeding. Similarly, high levels of biparental inbreeding were observed in small, isolated, peripheral populations of Pinus strobus (Rajora et al. 2002).

The presence or absence of spatial genetic structure in conifers has been reported in several studies. Epperson & Allard (1989) studied spatial genetic structure in two continuous lodgepole pine populations and found that the distribution of tree genotypes was almost random; however, Washington & Aitken (2005) revealed strong within-population spatial genetic structure in peripheral, continuous and peripheral, disjunct populations of Picea sitchensis (Bong.) Carr. Similarly, spatial and temporal patterns of seedling establishment following disturbance strongly affected the spatial genetic structure within populations of Pinus clausa var. clausa (Parker et al. 2001). The regeneration of lodgepole pine along its northern distribution limit in the Yukon is commonly associated with fire (Johnstone & Chapin 2003). Notably, the serotinous cones of lodgepole pine store and release a large number of related seeds following fire, allowing spatial genetic structure to develop in the regenerated stands (Lotan et al. 1985). The strong dependence of lodgepole pine regeneration on fire disturbance may limit long-distance seed dispersal, thus creating a nonrandom spatial distribution of tree genotypes.
For wind-pollinated and wind-dispersed species, spatial structure may result from genetic drift associated with limited gene flow over greater distances (Heywood 1991; Wright 1943). The Whitehorse population has experienced more founder effects during expansion to the northern limit of the species' range, leading to a reduction of allelic diversity (Fazekas & Yeh 2001; Yeh & Layton 1979). Loss of allelic diversity may have decreased the number of compatible mates in the neighborhood. Although the Carbondale population represents the eastern distribution limit, it may receive sufficient gene flow to maintain its genetic diversity, contributing to its lower biparental inbreeding.

5.4.3 Correlated paternity and sibship structure within seed progeny

Low values of rp(m) (correlation of outcross paternity within progeny arrays) were observed in both populations, indicating that the probability of full-sibship within open-pollinated families was low. The correlation of outcross paternity is inversely related to the effective number of pollen donors (n) by rp = 1/n. Following this relationship, an effective 47.6 and 111 pollen donors sired the progeny of trees from the Carbondale and Whitehorse populations, respectively. Similarly high values of the effective number of pollen donors (ranging from 83 to 125) have been reported in natural populations of P. sylvestris (Robledo-Arnuncio et al. 2004). In general, coniferous trees, which are wind-pollinated, have lower levels of correlated paternity than insect- or animal-pollinated plants (see Table 5.5). However, increased correlated paternity (rp = 0.196) was reported in small, isolated populations of P. sylvestris (Robledo-Arnuncio et al. 2004). In the present study, low levels of correlated paternity were observed in peripheral populations of P. contorta ssp. latifolia.
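The conversion from correlated paternity to an effective number of pollen donors is a simple reciprocal, and it can be checked against the population estimates of rp(m) in Table 5.2. This is a minimal sketch; the function name is a hypothetical choice for illustration and is not part of any mating system software.

```python
def effective_pollen_donors(rp):
    """Effective number of pollen donors implied by correlated paternity (n = 1 / rp)."""
    if rp <= 0:
        raise ValueError("rp must be positive to imply a finite number of donors")
    return 1.0 / rp


if __name__ == "__main__":
    # Multilocus correlated paternity, rp(m), from Table 5.2.
    for population, rp in (("Carbondale", 0.021), ("Whitehorse", 0.009)):
        donors = effective_pollen_donors(rp)
        print(f"{population}: rp(m) = {rp:.3f}, effective pollen donors = {donors:.1f}")
```

Because the relationship is a reciprocal, small absolute changes in a low rp translate into large changes in the implied number of donors, so these effective numbers should be read together with the standard errors of rp(m). They nevertheless support the low correlated paternity seen in both peripheral populations.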
One explanation for the low correlated paternity is that these peripheral populations are not geographically isolated. The multilocus correlated paternity (rp(m)) at the population level was slightly higher in Carbondale than in Whitehorse. As stated earlier, the Whitehorse population had much lower stand density than the Carbondale population. Estimates of correlated paternity were found to be significant in high-density populations, but not in low-density populations, of Larix occidentalis, possibly because high tree density limits pollen movement (El-Kassaby & Jaquish 1996). It is likely that stand density has much less effect on the level of correlated paternity in the present study. If stand density were one of the factors affecting correlated paternity, the individual estimates of correlated paternity should be similar across families. However, only two families (#4 and #17) exhibited high correlated paternity estimates (0.151 and 0.064) within the Carbondale population. Sibship analyses also supported the high level of correlated paternity for families 4 and 17, which showed full-sib group sizes of 9 and 7 (Table 5.4).

In summary, sibling groups can be used to infer the minimum number of pollen donors contributing to a family, i.e., multiple paternity. Knowing the level of multiple paternity in different geographic regions will help to improve sampling strategies for genetic conservation and tree improvement programs. Outcrossing events promote recombination; however, outcrossing between close relatives may reduce recombination rates. Information on the mating system will provide insight into the correlation between recombination rates and gene diversity.

Table 5.1 Allelic diversity at 11 microsatellite loci in two natural populations of P. contorta ssp. latifolia

            No. of maternal alleles     No. of paternal alleles
Locus       Whitehorse   Carbondale     Whitehorse   Carbondale
PtTX2123    3            4              7            8
PtTX2146    7            15             15           20
PtTX3011    14           14             20           24
PtTX3025    6            6              12           12
PtTX3034    11           10             12           14
PtTX3052    4            5              8            9
PtTX3127    6            4              10           8
PtTX4054    13           14             16           17
PtTX4056    10           13             15           15
PtTX4058    10           12             17           19
PtTX4139    9            14             16           20
Mean        8.45         10.09          13.45        15.09

Table 5.2 Population estimates of multilocus (tm) and single-locus (ts) outcrossing rate, parental inbreeding coefficient (F), multilocus (rp(m)) and single-locus (rp(s)) correlation of paternity, correlation of selfing between loci (rs), and correlation of t between individuals within families (rt) for P. contorta ssp. latifolia from the Carbondale and Whitehorse populations, as estimated using 11 microsatellite loci. Standard errors are given in parentheses.

Parameter        Carbondale       Whitehorse
F                0.005 (0.010)    0.000 (0.000)
tm               0.992 (0.006)    0.990 (0.006)
ts               0.936 (0.012)    0.874 (0.012)
tm - ts          0.056 (0.009)    0.116 (0.010)
rp(m)            0.021 (0.008)    0.009 (0.003)
rp(s)            0.043 (0.006)    0.042 (0.004)
rp(s) - rp(m)    0.022 (0.005)    0.033 (0.003)
rs               0.146 (0.111)    0.096 (0.034)
rt               0.099 (0.005)    0.091 (0.007)

Table 5.3 Estimates of multilocus (tm) and single-locus (ts) outcrossing rate, and multilocus (rp(m)) and single-locus (rp(s)) correlation of paternity at the individual family level for P. contorta ssp. latifolia

Family   tm               ts               tm - ts          rp(m)            rp(s)            rp(s) - rp(m)
Carbondale
CAL01    0.984 ± 0.007    0.902 ± 0.004    0.082 ± 0.005    0.030 ± 0.106    0.083 ± 0.130    0.053 ± 0.024
CAL02    0.984 ± 0.007    0.917 ± 0.007    0.067 ± 0.004    0.032 ± 0.106    0.086 ± 0.131    0.054 ± 0.025
CAL03    0.984 ± 0.007    0.932 ± 0.010    0.052 ± 0.006    0.045 ± 0.112    0.085 ± 0.130    0.040 ± 0.018
CAL04    0.984 ± 0.007    0.928 ± 0.009    0.057 ± 0.005    0.151 ± 0.163    0.121 ± 0.151    0.030 ± 0.016
CAL05    0.984 ± 0.007    0.915 ± 0.006    0.070 ± 0.004    0.033 ± 0.107    0.090 ± 0.133    0.056 ± 0.026
CAL06    0.984 ± 0.007    0.922 ± 0.008    0.062 ± 0.005    0.031 ± 0.106    0.082 ± 0.129    0.051 ± 0.024
CAL07    0.984 ± 0.007    0.902 ± 0.006    0.083 ± 0.006    0.028 ± 0.105    0.085 ± 0.130    0.057 ± 0.026
CAL08    0.900 ± 0.000    0.862 ± 0.010    0.038 ± 0.010    0.040 ± 0.110    0.083 ± 0.130    0.043 ± 0.020
CAL09    0.984 ± 0.007    0.912 ± 0.006    0.072 ± 0.004    0.022 ± 0.102    0.081 ± 0.129    0.059 ± 0.027
CAL10    0.984 ± 0.007    0.936 ± 0.011    0.048 ± 0.005    0.027 ± 0.104    0.076 ± 0.126    0.049 ± 0.023
CAL11    0.984 ± 0.007    0.908 ± 0.006    0.076 ± 0.005    0.042 ± 0.112    0.088 ± 0.132    0.045 ± 0.021
CAL12    0.931 ± 0.002    0.891 ± 0.005    0.040 ± 0.006    0.047 ± 0.113    0.091 ± 0.133    0.044 ± 0.020
CAL13    0.984 ± 0.007    0.916 ± 0.008    0.068 ± 0.005    0.026 ± 0.104    0.081 ± 0.129    0.055 ± 0.025
CAL14    0.984 ± 0.007    0.913 ± 0.008    0.071 ± 0.005    0.034 ± 0.107    0.084 ± 0.130    0.050 ± 0.023
CAL15    0.984 ± 0.007    0.944 ± 0.011    0.040 ± 0.006    0.055 ± 0.117    0.093 ± 0.134    0.037 ± 0.017
CAL16    0.984 ± 0.007    0.864 ± 0.010    0.120 ± 0.016    0.037 ± 0.109    0.086 ± 0.131    0.049 ± 0.022
CAL17    0.984 ± 0.006    0.904 ± 0.004    0.081 ± 0.005    0.064 ± 0.121    0.098 ± 0.137    0.034 ± 0.016
CAL18    0.984 ± 0.007    0.921 ± 0.008    0.063 ± 0.005    0.033 ± 0.106    0.087 ± 0.131    0.055 ± 0.025
CAL19    0.984 ± 0.007    0.923 ± 0.009    0.062 ± 0.005    0.030 ± 0.105    0.085 ± 0.130    0.055 ± 0.025
CAL20    0.984 ± 0.007    0.902 ± 0.004    0.083 ± 0.005    0.026 ± 0.104    0.081 ± 0.128    0.055 ± 0.025
Whitehorse
YK01     0.992 ± 0.003    0.914 ± 0.008    0.078 ± 0.006    0.020 ± 0.101    0.085 ± 0.130    0.065 ± 0.030
YK02     0.992 ± 0.003    0.905 ± 0.007    0.087 ± 0.006    0.026 ± 0.104    0.084 ± 0.130    0.058 ± 0.026
YK03     0.992 ± 0.003    0.883 ± 0.006    0.109 ± 0.008    0.047 ± 0.113    0.101 ± 0.139    0.053 ± 0.026
YK04     0.992 ± 0.003    0.811 ± 0.026    0.182 ± 0.029    0.076 ± 0.127    0.094 ± 0.135    0.018 ± 0.010
YK05     0.992 ± 0.003    0.893 ± 0.005    0.100 ± 0.007    0.024 ± 0.103    0.082 ± 0.129    0.059 ± 0.027
YK06     0.992 ± 0.003    0.909 ± 0.004    0.083 ± 0.004    0.036 ± 0.109    0.086 ± 0.131    0.050 ± 0.023
YK07     0.992 ± 0.003    0.926 ± 0.009    0.066 ± 0.006    0.020 ± 0.101    0.083 ± 0.130    0.063 ± 0.029
YK08     0.992 ± 0.003    0.931 ± 0.010    0.062 ± 0.007    0.016 ± 0.099    0.075 ± 0.126    0.059 ± 0.027
YK09     0.992 ± 0.003    0.903 ± 0.005    0.089 ± 0.004    0.035 ± 0.108    0.083 ± 0.130    0.048 ± 0.022
YK10     0.946 ± 0.002    0.869 ± 0.008    0.077 ± 0.008    0.016 ± 0.099    0.083 ± 0.129    0.067 ± 0.031
YK11     0.992 ± 0.003    0.913 ± 0.006    0.079 ± 0.005    0.017 ± 0.100    0.081 ± 0.129    0.064 ± 0.029
YK12     0.946 ± 0.000    0.870 ± 0.008    0.076 ± 0.010    0.017 ± 0.100    0.085 ± 0.130    0.067 ± 0.031
YK13     0.900 ± 0.000    0.849 ± 0.012    0.051 ± 0.012    0.054 ± 0.117    0.088 ± 0.132    0.033 ± 0.015
YK14     0.992 ± 0.003    0.889 ± 0.006    0.103 ± 0.007    0.028 ± 0.105    0.083 ± 0.129    0.054 ± 0.025
YK15     0.992 ± 0.003    0.892 ± 0.004    0.100 ± 0.006    0.046 ± 0.113    0.087 ± 0.131    0.040 ± 0.019
YK16     0.992 ± 0.003    0.863 ± 0.012    0.129 ± 0.015    0.020 ± 0.101    0.085 ± 0.131    0.066 ± 0.030
YK17     0.992 ± 0.003    0.892 ± 0.006    0.100 ± 0.007    0.033 ± 0.107    0.087 ± 0.132    0.054 ± 0.025
YK18     0.992 ± 0.003    0.907 ± 0.006    0.086 ± 0.005    0.019 ± 0.101    0.079 ± 0.127    0.060 ± 0.028
YK19     0.992 ± 0.003    0.846 ± 0.014    0.146 ± 0.016    0.021 ± 0.101    0.080 ± 0.128    0.059 ± 0.027
YK20     0.992 ± 0.003    0.892 ± 0.004    0.100 ± 0.005    0.019 ± 0.101    0.081 ± 0.128    0.061 ± 0.028

Table 5.4 Numbers of full-sib groups and correlated paternity (rp) for each family in P. contorta ssp. latifolia

Family   No. of full-sib groups   rp
Carbondale
CAL01    11                       0.058
CAL02    12                       0.053
CAL03    10                       0.079
CAL04    6                        0.242
CAL05    11                       0.068
CAL06    9                        0.084
CAL07    11                       0.053
CAL08    13                       0.058
CAL09    11                       0.058
CAL10    13                       0.042
CAL11    9                        0.074
CAL12    10                       0.079
CAL13    13                       0.042
CAL14    12                       0.047
CAL15    10                       0.074
CAL16    11                       0.063
CAL17    7                        0.158
CAL18    9                        0.095
CAL19    12                       0.068
CAL20    10                       0.063
Whitehorse
YK01     11                       0.037
YK02     11                       0.058
YK03     8                        0.100
YK04     10                       0.084
YK05     11                       0.068
YK06     10                       0.068
YK07     11                       0.058
YK08     12                       0.047
YK09     12                       0.053
YK10     9                        0.100
YK11     11                       0.068
YK12     12                       0.047
YK13     11                       0.074
YK14     10                       0.079
YK15     10                       0.074
YK16     9                        0.084
YK17     9                        0.105
YK18     9                        0.074
YK19     11                       0.058
YK20     12                       0.063

Table 5.5 Levels of correlated paternity in natural populations of tree species (adapted from Hardy et al. 2004)

Species                   Pollination   rp             Ne          Reference
Conifers
Larix occidentalis        Wind          0.060-0.100    9.6-16.1    El-Kassaby & Jaquish (1996)
Pinus sylvestris          Wind          0.012-0.008    83-125      Robledo-Arnuncio et al. (2004)
Angiosperms
Quercus lobata            Wind          0.270          3.7         Sork et al. (2002a)
Acacia melanoxylon        Insect        0.030          33          Muona et al. (1991)
Albizia julibrissin       Insect        0.340          2.9         Irwin et al. (2003)
Dinizia excelsa           Insect        0.210          4.9         Dick et al. (2003)
Dryobalanops aromatica    Insect        0.080          12.5        Lee (2000)
Caryocar brasiliense      Bat           0.051-0.205    4.9-11.6    Collevatti et al. (2001)
Ceiba pentandra           Bat           0.354-0.600    1.7-2.8     Lobo et al. (2005)

Figure 5.1 The distribution of full-sib group size for each family for Carbondale (a) and Whitehorse (b) P. contorta ssp. latifolia populations

Figure 5.2 Relationship between correlated paternity and variances of full-sib group size

5.5 References

Brown AHD (1988) Genetic characterization of plant mating system. In: Plant Population Genetics, Breeding, and Genetic Resources (eds. Brown AHD, Clegg MT, Kahler AL, Weir BS), pp. 145-162. Sinauer Associates, Inc., Sunderland, Massachusetts.
Brown AHD, Allard RW (1970) Estimation of mating system in open-pollinated maize populations using isozyme polymorphisms. Genetics 66, 133-145.
Collevatti RG, Grattapaglia D, Hay JD (2001) High resolution microsatellite based analysis of the mating system allows the detection of significant biparental inbreeding in Caryocar brasiliense, an endangered tropical tree species. Heredity 86, 60-67.
Dick CW, Etchelecu G, Austerlitz F (2003) Pollen dispersal of tropical trees (Dinizia excelsa: Fabaceae) by native insects and African honeybees in pristine and fragmented Amazonian rainforest. Molecular Ecology 12, 753-764.
Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh tissue. Phytochemical Bulletin 19, 11-15.
El-Kassaby YA, Jaquish B (1996) Population density and mating pattern in western larch. Journal of Heredity 87, 438-443.
Epperson B, Allard R (1984) Allozyme analysis of the mating system in lodgepole pine populations. Journal of Heredity 75, 212-214.
Epperson BK, Allard RW (1989) Spatial auto-correlation analysis of the distribution of genotypes within populations of lodgepole pine. Genetics 121, 369-377.
Fazekas AJ, Yeh FC (2001) Random amplified polymorphic DNA diversity of marginal and central populations in Pinus contorta subsp. latifolia. Genome 44, 13-22.
Fritsch P, Rieseberg LH (1992) High outcrossing rates maintain male and hermaphrodite individuals in populations of the flowering plant Datisca glomerata. Nature 359, 633-636.
Gaiotto FA, Bramucci M, Grattapaglia D (1997) Estimation of outcrossing rate in a breeding population of Eucalyptus urophylla with dominant RAPD and AFLP markers. Theoretical and Applied Genetics 95, 842-849.
Hardy OJ, Gonzalez-Martinez SC, Colas B, Freville H, Mignot A, Olivieri I (2004) Fine-scale genetic structure and gene dispersal in Centaurea corymbosa (Asteraceae). II. Correlated paternity within and among sibships. Genetics 168, 1601-1614.
Heywood JS (1991) Spatial analysis of genetic variation in plant populations. Annual Review of Ecology and Systematics 22, 335-355.
Husband BC, Schemske DW (1996) Evolution of the magnitude and timing of inbreeding depression in plants. Evolution 50, 54-70.
Irwin AJ, Hamrick JL, Godt MJW, Smouse PE (2003) A multiyear estimate of the effective pollen donor pool for Albizia julibrissin. Heredity 90, 187-194.
Johnstone JF, Chapin FS (2003) Non-equilibrium succession dynamics indicate continued northern migration of lodgepole pine. Global Change Biology 9, 1401-1409.
Ledig FT (1998) Genetic variation in Pinus. In: Ecology and biogeography of Pinus (ed. Richardson DM), pp. 251-280. Cambridge University Press, Cambridge, United Kingdom.
Lee SL (2000) Mating system parameters of Dryobalanops aromatica Gaertn. f. (Dipterocarpaceae) in three different forest types and a seed orchard. Heredity 85, 338-345.
Liewlaksaneeyanawin C, Ritland CE, El-Kassaby YA, Ritland K (2004) Single-copy, species-transferable microsatellite markers developed from loblolly pine ESTs. Theoretical and Applied Genetics 109, 361-369.
Lobo JA, Quesada M, Stoner KE (2005) Effects of pollination by bats on the mating system of Ceiba pentandra (Bombacaceae) populations in two tropical life zones in Costa Rica. American Journal of Botany 92, 370-376.
Lotan JE, Brown JK, Neuenschwander LF (1985) Role of fire in lodgepole pine forests. In: Lodgepole pine: The species and its management (eds. Baumgartner DM, Krebill RG, Arnott JT, Weetman GF), pp. 133-152. Washington State University, USA.
MacDonald GM, Cwynar LC (1985) A fossil pollen based reconstruction of the late Quaternary history of lodgepole pine (Pinus contorta ssp. latifolia) in the western interior of Canada. Canadian Journal of Forest Research 15, 1039-1044.
Morgante M, Vendramin GG, Rossi P (1991) Effects of stand density on outcrossing rate in two Norway spruce (Picea abies) populations. Canadian Journal of Botany 69, 2704-2708.
Muona O, Moran GF, Bell JC (1991) Hierarchical patterns of correlated mating in Acacia melanoxylon. Genetics 127, 619-626.
Nagamitsu T, Ichikawa Se, Ozawa M, Shimamura R, Kachi N, Tsumura Y, Muhammad N (2001) Microsatellite analysis of the breeding system and seed dispersal in Shorea leprosula (Dipterocarpaceae). International Journal of Plant Sciences 162, 155-159.
O'Connell LM (2003) The evolution of inbreeding in western redcedar (Thuja plicata: Cupressaceae). PhD thesis, University of British Columbia.
Oetting WS, Lee HK, Flanders DJ, Wiesner GL, Sellers TA, King RA (1995) Linkage analysis with multiplexed short tandem repeat polymorphisms using infrared fluorescence and M13 tailed primers. Genomics 30, 450-458.
Owens JN, Simpson SJ, Molder M (1981) Sexual reproduction of Pinus contorta. 1. Pollen development, the pollination mechanism, and early ovule development. Canadian Journal of Botany 59, 1828-1843.
Parker KC, Hamrick JL, Parker AJ, Nason JD (2001) Fine-scale genetic structure in Pinus clausa (Pinaceae) populations: effects of disturbance history. Heredity 87, 99-113.
Perry DJ, Dancik BP (1986) Mating system dynamics of lodgepole pine in Alberta, Canada. Silvae Genetica 35, 190-195.
Rajora OP, Mosseler A, Major JE (2002) Mating system and reproductive fitness traits of eastern white pine (Pinus strobus) in large, central versus small, isolated, marginal populations. Canadian Journal of Botany 80, 1173-1184.
Ritland K (1989) Correlated matings in the partial selfer Mimulus guttatus. Evolution 43, 848-859.
Ritland K (1990) A series of FORTRAN computer programs for estimating plant mating systems. Journal of Heredity 81, 235-237.
Ritland K (2002) Extensions of models for the estimation of mating systems using n independent loci. Heredity 88, 221-228.
Ritland K, Jain S (1981) A model for the estimation of outcrossing rate and gene frequencies using n independent loci. Heredity 47, 35-52.
Robledo-Arnuncio JJ, Smouse PE, Gil L, Alia R (2004) Pollen movement under alternative silvicultural practices in native populations of Scots pine (Pinus sylvestris L.) in central Spain. Forest Ecology and Management 197, 245-255.
Robledo-Arnuncio JJ, Alia R, Gil L (2004) Increased selfing and correlated paternity in a small population of a predominantly outcrossing conifer, Pinus sylvestris. Molecular Ecology 13, 2567-2577.
Sork VL, Davis FW, Smouse PE, Apsit VJ, Dyer RJ, Fernandez JF, Kuhn B (2002) Pollen movement in declining populations of California Valley oak, Quercus lobata: where have all the fathers gone? Molecular Ecology 11, 1657-1668.
Sun M, Ritland K (1998) Mating system of yellow starthistle (Centaurea solstitialis), a successful colonizer in North America. Heredity 80, 225-232.
Thomas SC, Hill WG (2002) Sibship reconstruction in hierarchical population structures using Markov chain Monte Carlo techniques. Genetical Research 79, 227-234.
Vogl C, Karhu A, Moran G, Savolainen O (2002) High resolution analysis of mating systems: Inbreeding in natural populations of Pinus radiata. Journal of Evolutionary Biology 15, 433-439.
Wang JL (2004) Sibship reconstruction from genetic data with typing errors. Genetics 166, 1963-1979.
Washington JG, Aitken SN (2005) Strong spatial genetic structure in peripheral but not core populations of Sitka spruce [Picea sitchensis (Bong.) Carr.]. Molecular Ecology 14, 2659-2667.
Wheeler NC, Critchfield WB (1985) The distribution and botanical characteristics of lodgepole pine: Biogeographical and management implications. In: Lodgepole pine: The species and its management (eds. Baumgartner DM, Krebill RG, Arnott JT, Weetman GF), pp. 1-13. Washington State University, USA.
Wright S (1943) Isolation by distance. Genetics 28, 114-138.
Yeh FC, Layton C (1979) The organization of genetic variability in central and marginal populations of lodgepole pine Pinus contorta ssp. latifolia. Canadian Journal of Genetics and Cytology 21, 487-503.

CHAPTER 6 THE UTILITY OF AMPLIFIED FRAGMENT LENGTH POLYMORPHISMS (AFLPS) FOR MATING SYSTEM ESTIMATION

6.1 Introduction

The mating system, defined as the genetic relationship between mates, governs the pattern of gene transmission between generations (Brown 1988), and ultimately affects the patterning of genetic variation within and among populations. It is usually described in terms of the fraction of progeny produced by self-fertilization or biparental inbreeding, but can also include the patterns of paternity in open-pollinated families. The estimation of plant mating systems is primarily achieved through the assay of genetic markers in progeny arrays. Isozymes have traditionally been the marker of choice for plant mating system analysis (Brown & Allard 1970; Cruzan 1998), but they show relatively low levels of polymorphism, and only a limited number of loci are available for assay.
More recently, microsatellite (SSR) markers have been used; they show high allelic diversity and codominant expression, and their greater informative value allows more powerful and novel inferences about mating systems (O'Connell et al. 2004; Ritland & Leblanc 2004). Mating system analyses using microsatellites have been published for many plant species, such as Mimulus spp. (Ritland & Leblanc 2004), Eucalyptus spp. (Butcher et al. 2005; Chaix et al. 2003; Jones et al. 2005), Quercus spp. (Fernandez & Sork 2005; Sork et al. 2002), and Pinus spp. (Lian et al. 2001; Liewlaksaneeyanawin et al. in prep.; Robledo-Arnuncio et al. 2004; Robledo-Arnuncio & Gil 2005).

With the advent of the amplified fragment length polymorphism technique (AFLP; Vos et al. 1995), dominant markers have regained popularity in research on plants, fungi, and bacteria. As with randomly amplified polymorphic DNA (RAPD; Williams et al. 1990), a large number of DNA-level markers can be obtained with no prior knowledge of DNA sequence. RAPDs have been used in a few mating system studies, such as those of Datisca glomerata (Fritsch & Rieseberg 1992), Iris (Cruzan & Arnold 1994), and Eucalyptus urophylla (Gaiotto et al. 1997). However, the RAPD technique is particularly sensitive to initial deoxyribonucleic acid (DNA) content, the concentrations of Mg and Taq polymerase, and the thermal cycler used when optimizing PCR conditions (reviewed in Jones et al. 1997), and has earned a reputation for being unreliable. The AFLP technique is more robust, and a growing number of molecular ecology studies employ this marker for estimating genetic diversity and population structure, assigning individuals, characterizing hybridization and hybrid zones, mapping genes, and reconstructing species phylogenies (reviewed in Bensch & Akesson 2005).
Sufficient numbers of AFLP markers can outperform a smaller number of highly polymorphic microsatellites for some applications, such as assignment tests and population discrimination (Woodhead et al. 2005), and also parentage analysis. Relatively few studies have compared AFLPs and SSRs for their ability to infer population genetic structure (see Woodhead et al. 2005, for a review) or parentage (Gerber et al. 2000). To date, no studies have compared AFLPs with SSRs for mating system analysis.

The aim of this study was to compare the utility of dominant AFLPs with that of codominant SSRs for mating system analysis, using two peripheral populations of lodgepole pine (Pinus contorta Dougl. ex Loud.) sampled from the Yukon and Alberta, Canada. High outcrossing rates in this wind-pollinated species have previously been reported in seed orchards (Stoehr & Newton 2002) and natural populations (Epperson & Allard 1984; Perry & Dancik 1986), but other facets of the mating system, including the correlation of paternity and the correlation of outcrossing within families, have not been reported. In this chapter, we examine the relationship between the number of AFLP loci and estimates of multilocus outcrossing rates and correlated paternity, and the number of AFLP loci needed to obtain the same precision as SSRs. Statistical bias associated with specific AFLP primer combinations, and with the method of estimating the mating system with dominant markers, is examined in detail.

6.2 Material and Methods

6.2.1 Plant materials and DNA isolation

Seed cones were collected from 20-30 trees, with a minimum of 30 m between sampled trees, from two natural populations: Carbondale (near Calgary, Alberta) and Whitehorse (in the Yukon Territory). These represent the eastern and northern limits of the species distribution, respectively.
The Carbondale population (49°26'N, 114°25'W; elevation 1510 m) is characterized by dense trees between 30 and 80 years of age, while the Whitehorse population (60°43'N, 135°05'W; elevation 762 m) is even-aged with lower density. Seeds were extracted and kept separate by parent to maintain the identity of every sampled tree within each population. Twenty trees from each population were randomly selected for analysis. Seeds were germinated in petri dishes on moist filter paper at room temperature for approximately 10 days. Genomic DNA was isolated from a total of 400 progeny per population (20 seeds per family) in 1.5 mL microtubes using a modification of the CTAB (cetyltrimethyl ammonium bromide) method (Doyle & Doyle 1987).

6.2.2 SSR and AFLP analyses

Four hundred individuals from each population were genotyped at eleven microsatellite loci (PtTX2123, PtTX2146, PtTX3011, PtTX3025, PtTX3034, PtTX3052, PtTX3127, PtTX4054, PtTX4056, PtTX4058, PtTX4139). The same 20 open-pollinated families of the Whitehorse and Carbondale populations were genotyped with EcoRI/MseI and PstI/MseI AFLP primer combinations, respectively. A total of 12 EcoRI/MseI AFLP primer combinations were used for the Whitehorse (Yukon) population, with three selective nucleotides on the tailed E primers and four or five selective nucleotides on the M primers: EcoRI + ACC/MseI + CCCC, EcoRI + ACC/MseI + CCCG, EcoRI + ACC/MseI + CCGA, EcoRI + ACC/MseI + CCGG, EcoRI + ACC/MseI + CCGT, EcoRI + ACC/MseI + CCAGA, EcoRI + ACC/MseI + CCAGT, EcoRI + ACG/MseI + CCGT, EcoRI + ACG/MseI + CCTC, EcoRI + ACT/MseI + CCGG, EcoRI + ACT/MseI + CCGT, EcoRI + ACT/MseI + CCAGA (Table 6.1). Preamplifications for the E/M primer combinations were performed using the selective nucleotides EcoRI + AC and MseI + CC.
For the Carbondale population, a total of 4 PstI/MseI AFLP primer combinations were used, with three selective nucleotides on the tailed P primers and three selective nucleotides on the M primers: PstI + CAG/MseI + CCC, PstI + CAG/MseI + CGG, PstI + CCA/MseI + CCA, PstI + CGA/MseI + CCG. Preamplifications for the P/M primer combinations were performed using the selective nucleotides PstI + C and MseI + C. The AFLP procedures were modified from Paglia & Morgante (1998) and Remington et al. (1999) at the UBC Genetic Data Centre (www.forestry.ubc.ca/gdc). Sequences of the primers and adaptors used for AFLP analyses are shown in Table 6.1. AFLP fragments were scored as present (+) or absent (-) within the size range of 50-700 base pairs using SAGA™ GT/MX (LiCor Inc.).

6.2.3 Mating system analysis

The multilocus mixed-mating model of Ritland (1990; 2002) was used to estimate the mating system parameters, following the assumptions stated in Ritland and Jain (1981). These assumptions were that (1) maternal genotypes outcross at the same rate to a homogeneous pollen pool; (2) alleles at different loci segregate independently; and (3) there is no mutation or selection between the time of mating and the progeny assay. The mating system parameters estimated were the single-locus (ts) and multilocus (tm) outcrossing rates, the single-locus (rp(s)) and multilocus (rp(m)) correlated paternity, and the correlation of t between individuals within families (rt). These were estimated using the Expectation-Maximization (EM) procedure of the computer program MLTR 3.1 (Ritland 2002). Variances for the mating system estimates were obtained from 1,000 bootstrap replicates. Maternal genotypes at each locus were inferred from progeny arrays by choosing the most likely parent, given the progeny array (Brown & Allard 1970). Differences in mating system parameters among AFLP primer combinations were tested with a chi-square test.

6.3 Results

6.3.1 Polymorphism of AFLP markers

The AFLP marker assay in P.
contorta ssp. latifolia provided a large number of polymorphic loci. The numbers of polymorphic loci scored for each primer combination are shown in Table 6.2. Loci that were monomorphic in one open-pollinated family were often polymorphic in another (Fig. 6.1). The mean number of polymorphic bands was 36 for E/M primers [maximum: 48 (E + ACG/M + CCGT); minimum: 26 (E + ACC/M + CCAGT)] and 41 for P/M primers [maximum: 51 (P + CAG/M + CGG); minimum: (P + CCA/M + CCA)] (Table 6.2).

6.3.2 Multilocus (tm) and single-locus (ts) outcrossing rates

For microsatellites, both populations showed predominant outcrossing, with population multilocus outcrossing rate estimates of 0.990 and 0.992 for the Whitehorse and Carbondale populations, respectively (Table 6.2). The population single-locus (ts) estimate for Whitehorse was 0.874, which was lower than the 0.936 found for Carbondale (Table 6.2). For AFLPs, across all primer combinations, the average population multilocus and single-locus outcrossing rates for Whitehorse were 0.995 and 0.925, respectively (Table 6.2). For the four AFLP primer pairs used for Carbondale, these averages were 0.998 and 0.910, respectively (Table 6.2). Among AFLP primer combinations, no significant difference in multilocus outcrossing rate was observed in either population (Table 6.2), as shown by the chi-square test for Whitehorse (χ² = 16.6, d.f. = 11, P = 0.121) and Carbondale (χ² = 7.68, d.f. = 3, P = 0.053). A significant difference in the single-locus outcrossing rate estimate among AFLP primer combinations was found for the Whitehorse population (χ² = 22.4, d.f. = 11, P = 0.021), but not for the Carbondale population (χ² = 5.4, d.f. = 3, P = 0.142).

6.3.3 Biparental inbreeding

For SSRs, the difference between the population multilocus and single-locus outcrossing rate estimates (tm - ts) was higher for Whitehorse (0.116) than for Carbondale (0.056) (Table 6.2).
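The probabilities quoted with the chi-square statistics in section 6.3.2 can be checked from the statistic and its degrees of freedom alone. The sketch below is an illustration, not the procedure used in the thesis: how the statistics themselves were computed is not described here, and the function name is a hypothetical choice. It uses the closed-form upper-tail probability for integer degrees of freedom, so it needs only the Python standard library.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    df = int(df)
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k=0}^{df/2 - 1} h^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{(df-1)/2} h^(k - 1/2) / Gamma(k + 1/2)
    term = math.sqrt(h) / math.gamma(1.5)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (k + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


if __name__ == "__main__":
    # (label, chi-square statistic, degrees of freedom) as reported in section 6.3.2.
    reported = [
        ("Whitehorse, tm", 16.6, 11),
        ("Carbondale, tm", 7.68, 3),
        ("Whitehorse, ts", 22.4, 11),
        ("Carbondale, ts", 5.4, 3),
    ]
    for label, statistic, df in reported:
        p = chi2_upper_tail(statistic, df)
        print(f"{label}: chi2 = {statistic}, d.f. = {df}, P = {p:.3f}")
```

The degrees of freedom equal the number of primer combinations minus one (12 - 1 = 11 for Whitehorse, 4 - 1 = 3 for Carbondale). The same function applies to the chi-square results reported in the remaining Results subsections.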
For all AFLP primer combinations, the difference between multilocus and single-locus outcrossing rates (tm - ts) was significant for both populations (Table 6.2). For AFLPs, this difference was 0.070 and 0.088 for Whitehorse and Carbondale, respectively (Table 6.2). No significant differences in tm - ts were observed among AFLP primer combinations, as shown by nonsignificant χ²-tests for Whitehorse (χ² = 12.0, d.f. = 11, P = 0.363) and Carbondale (χ² = 3.1, d.f. = 3, P = 0.372).

6.3.4 Correlated paternity

Correlated paternity, i.e., the probability that siblings share the same male parent, was low for both populations. For SSRs, the multilocus correlated paternity estimates were 0.009 and 0.021 for Whitehorse and Carbondale, respectively (Table 6.2). For AFLPs, the estimates of multilocus correlated paternity (rp(m)) for Whitehorse ranged from 0.005 (E + ACC/M + CCAGA) to 0.036 (E + ACC/M + CCCC and E + ACC/M + CCAGT), with a mean of 0.022. A significant difference in correlated paternity among AFLP primer combinations was observed in the Whitehorse population, as shown by a highly significant χ²-test (χ² = 28.9, d.f. = 11, P = 0.002), but not in the Carbondale population (χ² = 4.7, d.f. = 3, P = 0.190). The single-locus correlation of paternity was higher than the multilocus correlation of paternity for SSRs and AFLPs in both populations, which is indicative of population substructure (Ritland 2002). No significant differences in rp(s) - rp(m) were observed among AFLP primer combinations, as shown by nonsignificant χ²-tests for Whitehorse (χ² = 16.4, d.f. = 11, P = 0.125) and Carbondale (χ² = 4.6, d.f. = 3, P = 0.199).

6.3.5 Correlation of outcrossing rate within families

The correlation of outcrossing rate (rt) within families (which is caused by variation of outcrossing rate among families) was low for both SSRs and AFLPs in both populations (Table 6.2). No significant difference in the correlation of outcrossing rate among families
No significant differences in the correlation of outcrossing rate among families were observed among AFLP primer combinations, as shown by nonsignificant χ² tests in both the Whitehorse (χ² = 4.9, d.f. = 11, P = 0.935) and Carbondale (χ² = 7.2, d.f. = 3, P = 0.066) populations.

6.4 Discussion

6.4.1 Mating system in peripheral populations

Outcrossing is predominant in P. contorta ssp. latifolia, as suggested by the high outcrossing rates shown by both SSR and AFLP markers in both populations. These estimates are comparable with the tm estimates for P. contorta ssp. latifolia based upon isozyme markers reported by Epperson & Allard (1984) (tm = 0.990) and Perry & Dancik (1986) (tm = 0.948). Both marker types also revealed some biparental inbreeding, as well as low levels of correlated paternity, in these peripheral populations of lodgepole pine.

6.4.2 Comparative study of mating system using SSRs and AFLPs

Microsatellite markers are gaining prominence in mating system studies. They have been used to determine the genetic composition at early life stages (i.e., aborted seeds, immature and mature seeds, and seedlings) in Shorea leprosula (Nagamitsu et al. 2001) and Platypodium elegans J. Vogel (Hufford & Hamrick 2003). Lian et al. (2001) used microsatellites to estimate outcrossing rates from individual cones on different branches of Pinus densiflora Sieb. & Zucc. More recently, Lobo et al. (2005) used microsatellites to determine the relationship between pollinators and mating systems in Ceiba pentandra, and Garcia et al. (2005) studied the effects of gender type and density level on variation in mating patterns within a highly isolated population of Prunus mahaleb. Some studies, however, have suggested that AFLPs would be useful for mating system studies in forest trees (Beland et al. 2005; Freitas et al. 2004; Gaiotto et al. 1997).
Using 44 polymorphic AFLP loci to analyze the mating system, Beland et al. (2005) reported high outcrossing, significant biparental inbreeding, and a low level of correlated paternity in Arbutus menziesii Pursh. Freitas et al. (2004) studied the mating system in a natural population of Myracrodruon urundeuva F.F. & M.F. Allemao using AFLP markers and reported high levels of outcrossing and correlated paternity, but a low level of biparental inbreeding. This study revealed that multilocus outcrossing rate analysis could be performed using either AFLP markers or SSRs with similar results (Table 6.2). The nonsignificant differences in multilocus outcrossing rate among AFLP primer combinations suggest that multilocus outcrossing rate estimates do not change with increasing numbers of markers. Gaiotto et al. (1997) also suggested that 18-20 polymorphic AFLP loci are adequate for estimating the outcrossing rate in a breeding population of Eucalyptus urophylla. However, AFLPs are less efficient than microsatellites for estimating biparental inbreeding and correlation of paternity. The levels of biparental inbreeding (tm - ts) were lower for AFLPs (0.070) than for SSRs (0.116) in Whitehorse, but higher for AFLPs (0.088) than for SSRs (0.056) in Carbondale (Table 6.2). SSRs detected lower levels of single- and multilocus correlated paternity than AFLPs in both populations (Table 6.2). The differences rp(s) - rp(m) were higher for AFLPs than for SSRs (Table 6.2), because the single-locus correlations estimated from microsatellites were lower than those estimated from AFLPs.

A bias in estimating biparental inbreeding using AFLP markers might be due to violations of some of the mating system assumptions. It is likely that linkage disequilibrium between loci occurs when many polymorphic AFLP loci are used to estimate selfing rate.
The influence of disequilibrium between loci on multilocus estimates of the outcrossing rate can lead to an underestimation of biparental inbreeding (Hedrick & Ritland 1990). In addition, the high polymorphism of microsatellites is likely to increase the power to differentiate selfing from outcrossing events between close relatives and to estimate single- and multilocus correlation of paternity. However, robust estimates of multilocus correlated paternity with AFLPs can be achieved by increasing the number of polymorphic loci. The results of this study suggest that ca. 50 polymorphic AFLP loci are adequate for estimating multilocus correlated paternity. As stated earlier, the aim of this study was to compare the utility of dominant AFLPs with codominant SSRs for mating system analysis, not to compare the results between the two populations. Along these lines, although EcoRI and PstI are likely to target different genomic regions, both E/M and P/M primer combinations gave similar outcrossing rate estimates, which were also comparable to those given by SSRs. In addition, the number of polymorphic loci seemed to be more important for the estimation of mating system parameters than the type of AFLP primer combination.

6.4.3 Future uses of microsatellites and AFLPs for mating system analysis in forest trees

The high variability of microsatellites has allowed a novel approach for estimating the mating system based upon bulking DNA samples from several individuals (O'Connell et al. 2004), and another novel approach that uses only pairs of progeny for estimating correlated matings (Ritland & Leblanc 2004). These approaches can reduce the number of genetic assays per progeny array, thus reducing the cost and time of genotyping.
For example, the bulking of several individual progeny into one sample was used to estimate outcrossing rates in Thuja plicata at the level of the individual tree and within individual trees (i.e., at different crown heights within a tree) (O'Connell et al. 2004). By applying the theory of the four-gene coefficient of relationship, Ritland & Leblanc (2004) used 20 progeny pairs of monkeyflower (Mimulus) species to estimate the mating system with nine microsatellite markers. In general, microsatellites are ideal genetic markers for mating system analysis, providing greater statistical power than other markers. However, their cost of development limits their use. Fortunately, for conifers such as Pinus spp., microsatellites have been developed (Echt & May-Marquardt 1996; Elsik et al. 2000; Fisher et al. 1998) and these markers can be transferred among species of the genus (Echt et al. 1999; Gonzalez-Martinez et al. 2004; Liewlaksaneeyanawin et al. 2004; Shepherd et al. 2002). Tropical trees, however, have a much higher diversity of taxa. Changes of the mating system after forest fragmentation are of great current interest in these species, but because of the higher among-species diversity, cross-amplification of microsatellites among tropical trees is not very successful. In this case, AFLPs could be the marker of choice for estimating mating systems because of their lower development costs and the large number of loci that this technique yields, in spite of their dominant nature.
Table 6.1 Sequences of primers and adaptors used for amplified fragment length polymorphism analysis

Primer name                 Sequence (5' → 3')

Adapters
EcoRI-adapter forward       AAT TGG TAC GCA GTC GTC
EcoRI-adapter reverse       AAC GAC GAC TGC GTA CC
PstI-adapter forward        AAC GAC GAC TGC GTA CAT GCA
PstI-adapter reverse        TGT ACG CAG TCG TC
MseI-adapter forward        GAC GAT GAG TCC TGA G
MseI-adapter reverse        TAC TCA GGA CTC AT

Pre-amplification primers
EcoRI + AC                  GAC TGC GTA CCA ATT C AC
PstI + C                    GAC TGC GTA CAT GCA G C
MseI + C                    GAT GAG TCC TGA GTA A C
MseI + CC                   GAT GAG TCC TGA GTA A CC

AFLP EcoRI primers
EcoRI + ACC                 GAC TGC GTA CCA ATT C ACC
EcoRI + ACG                 GAC TGC GTA CCA ATT C ACG
EcoRI + ACT                 GAC TGC GTA CCA ATT C ACT

AFLP PstI primers
PstI + CAG                  GAC TGC GTA CAT GCA G CAG
PstI + CCA                  GAC TGC GTA CAT GCA G CCA
PstI + CGA                  GAC TGC GTA CAT GCA G CGA

AFLP MseI primers
MseI + CCA                  GAT GAG TCC TGA GTA A CCA
MseI + CCC                  GAT GAG TCC TGA GTA A CCC
MseI + CGG                  GAT GAG TCC TGA GTA A CGG
MseI + CCG                  GAT GAG TCC TGA GTA A CCG
MseI + CCCC                 GAT GAG TCC TGA GTA A CCCC
MseI + CCCG                 GAT GAG TCC TGA GTA A CCCG
MseI + CCGA                 GAT GAG TCC TGA GTA A CCGA
MseI + CCGG                 GAT GAG TCC TGA GTA A CCGG
MseI + CCGT                 GAT GAG TCC TGA GTA A CCGT
MseI + CCTC                 GAT GAG TCC TGA GTA A CCTC
MseI + CCAGA                GAT GAG TCC TGA GTA A CCAGA
MseI + CCAGT                GAT GAG TCC TGA GTA A CCAGT

Table 6.2 Population estimates of multilocus (tm) and single-locus (ts) outcrossing rate, multilocus (rp(m)) and single-locus (rp(s)) correlation of paternity, correlation of selfing between loci (rs), and correlation of t estimates (rt) for P. contorta ssp. latifolia from Whitehorse and Carbondale populations estimated using SSR and AFLP primer combinations

Primer combination    #PL  HE     tm             ts             tm - ts        rp(m)          rp(s)          rp(s) - rp(m)  rt

Whitehorse
PC1: E+ACC/M+CCAGT    26   0.403  0.979 (0.010)  0.909 (0.009)  0.070 (0.014)  0.036 (0.008)  0.092 (0.009)  0.056 (0.010)  0.091 (0.017)
PC2: E+ACG/M+CCTC     27   0.409  0.996 (0.005)  0.930 (0.014)  0.066 (0.015)  0.027 (0.007)  0.090 (0.010)  0.062 (0.008)  0.100 (0.012)
PC3: E+ACC/M+CCAGA    30   0.388  0.999 (0.007)  0.939 (0.015)  0.061 (0.017)  0.005 (0.003)  0.081 (0.013)  0.076 (0.012)  0.103 (0.011)
PC4: E+ACC/M+CCCC     35   0.390  0.998 (0.009)  0.936 (0.010)  0.062 (0.012)  0.036 (0.010)  0.082 (0.005)  0.046 (0.009)  0.107 (0.013)
PC5: E+ACT/M+CCGG     35   0.377  0.997 (0.004)  0.923 (0.011)  0.074 (0.012)  0.017 (0.006)  0.088 (0.009)  0.071 (0.009)  0.103 (0.010)
PC6: E+ACC/M+CCGA     36   0.397  1.000 (0.000)  0.913 (0.007)  0.087 (0.007)  0.024 (0.005)  0.094 (0.004)  0.069 (0.004)  0.108 (0.000)
PC7: E+ACC/M+CCCG     37   0.413  0.997 (0.007)  0.915 (0.011)  0.082 (0.014)  0.019 (0.005)  0.092 (0.019)  0.073 (0.006)  0.108 (0.005)
PC8: E+ACT/M+CCGT     39   0.369  0.987 (0.007)  0.912 (0.012)  0.075 (0.013)  0.028 (0.008)  0.094 (0.008)  0.066 (0.011)  0.096 (0.018)
PC9: E+ACT/M+CCAGA    39   0.435  0.998 (0.003)  0.915 (0.009)  0.083 (0.010)  0.019 (0.004)  0.093 (0.006)  0.074 (0.006)  0.105 (0.007)
PC10: E+ACC/M+CCGG    40   0.366  0.991 (0.005)  0.918 (0.006)  0.073 (0.007)  0.021 (0.005)  0.092 (0.003)  0.071 (0.004)  0.100 (0.007)
PC11: E+ACC/M+CCGT    42   0.372  1.000 (0.000)  0.948 (0.007)  0.052 (0.007)  0.017 (0.004)  0.074 (0.006)  0.057 (0.006)  0.109 (0.000)
PC12: E+ACG/M+CCGT    48   0.373  0.995 (0.004)  0.945 (0.005)  0.051 (0.006)  0.010 (0.004)  0.070 (0.003)  0.061 (0.003)  0.092 (0.011)
PC: E/M (MEAN)        36   0.391  0.995 (0.000)  0.925 (0.000)  0.070 (0.000)  0.022 (0.000)  0.087 (0.000)  0.065 (0.000)  0.102 (0.000)
SSRs                  11   -      0.990 (0.006)  0.874 (0.012)  0.116 (0.010)  0.009 (0.003)  0.042 (0.004)  0.033 (0.003)  0.091 (0.007)

Carbondale
PC1: P+CCA/M+CCA      31   0.361  0.992 (0.009)  0.908 (0.003)  0.084 (0.008)  0.043 (0.007)  0.093 (0.002)  0.051 (0.006)  0.102 (0.003)
PC2: P+CAG/M+CCC      35   0.347  1.000 (0.000)  0.909 (0.002)  0.091 (0.002)  0.028 (0.005)  0.093 (0.001)  0.065 (0.004)  0.107 (0.000)
PC3: P+CGA/M+CCG      48   0.335  1.000 (0.000)  0.908 (0.002)  0.092 (0.003)  0.040 (0.007)  0.095 (0.001)  0.055 (0.006)  0.107 (0.002)
PC4: P+CAG/M+CGG      51   0.300  1.000 (0.000)  0.915 (0.003)  0.085 (0.003)  0.027 (0.007)  0.091 (0.002)  0.064 (0.006)  0.108 (0.002)
PC: P/M (MEAN)        41   0.336  0.998 (0.000)  0.910 (0.000)  0.088 (0.000)  0.035 (0.000)  0.093 (0.000)  0.059 (0.000)  0.106 (0.000)
SSRs                  11   -      0.992 (0.006)  0.936 (0.012)  0.056 (0.009)  0.021 (0.008)  0.043 (0.006)  0.023 (0.005)  0.099 (0.005)

#PL = number of polymorphic loci at 5% level, HE = expected heterozygosity

Figure 6.1 AFLP profiles of 4 open-pollinated families with 10 individuals each using the AFLP primer combination PstI + CAG/MseI + CGG in P. contorta ssp. latifolia

6.5 References

Beland JB, Krakowski J, Ritland CE, Ritland K, El-Kassaby YA (2005) Genetic structure and mating system of northern Arbutus menziesii populations. Canadian Journal of Forest Research 83, 1581-1589.
Bensch S, Akesson M (2005) Ten years of AFLP in ecology and evolution: why so few animals? Molecular Ecology 14, 2899-2914.
Brown AHD, Allard RW (1970) Estimation of mating system in open-pollinated maize populations using isozyme polymorphisms. Genetics 66, 133-145.
Butcher PA, Skinner AK, Gardiner CA (2005) Increased inbreeding and inter-species gene flow in remnant populations of the rare Eucalyptus benthamii. Conservation Genetics 6, 213-226.
Chaix G, Gerber S, Razafimaharo V, Vigneron P, Verhaegen D, Hamon S (2003) Gene flow estimation with microsatellites in a Malagasy seed orchard of Eucalyptus grandis. Theoretical and Applied Genetics 107, 705-712.
Cruzan MB (1998) Genetic markers in plant evolutionary ecology. Ecology 79, 400-412.
Cruzan MB, Arnold ML (1994) Assortative mating and natural selection in an Iris hybrid zone. Evolution 48, 1946-1958.
Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh tissue. Phytochemical Bulletin 19, 11-15.
Echt CS, May-Marquardt P (1996) Characterization of microsatellite markers in eastern white pine. Genome 39, 1102-1108.
Echt CS, Vendramin GG, Nelson CD, Marquardt P (1999) Microsatellite DNA as shared genetic markers among conifer species. Canadian Journal of Forest Research 29, 365-371.
Elsik CG, Minihan VT, Hall SE, Scarpa AM, Williams CG (2000) Low-copy microsatellite markers for Pinus taeda L. Genome 43, 550-555.
Epperson B, Allard R (1984) Allozyme analysis of the mating system in lodgepole pine populations. Journal of Heredity 75, 212-214.
Fernandez MJ, Sork VL (2005) Mating patterns of a subdivided population of the Andean oak (Quercus humboldtii Bonpl., Fagaceae). Journal of Heredity 96, 635-643.
Fisher PJ, Richardson TE, Gardner RC (1998) Characteristics of single- and multi-copy microsatellites from Pinus radiata. Theoretical and Applied Genetics 96, 969-979.
Freitas MLM, Sebbenn AM, Moraes MLT, Lemos EGM (2004) Mating system of a population of Myracrodruon urundeuva F.F. & M.F. Allemao using fAFLP molecular marker. Genetics and Molecular Biology 27, 425-431.
Fritsch P, Rieseberg LH (1992) High outcrossing rates maintain male and hermaphrodite individuals in populations of the flowering plant Datisca glomerata. Nature 359, 633-636.
Gaiotto FA, Bramucci M, Grattapaglia D (1997) Estimation of outcrossing rate in a breeding population of Eucalyptus urophylla with dominant RAPD and AFLP markers. Theoretical and Applied Genetics 95, 842-849.
Garcia C, Arroyo JM, Godoy JA, Jordano P (2005) Mating patterns, pollen dispersal, and the ecological maternal neighbourhood in a Prunus mahaleb L. population.
Molecular Ecology 14, 1821-1830.
Gerber S, Mariette S, Streiff R, Bodenes C, Kremer A (2000) Comparison of microsatellites and amplified fragment length polymorphism markers for parentage analysis. Molecular Ecology 9, 1037-1048.
Gonzalez-Martinez SC, Robledo-Arnuncio JJ, Collada C, Diaz A, Williams CG, Alia R, Cervera MT (2004) Cross-amplification and sequence variation of microsatellite loci in Eurasian hard pines. Theoretical and Applied Genetics 109, 103-111.
Hedrick PW, Ritland K (1990) Gametic disequilibrium and multilocus estimation of selfing rates. Heredity 65, 343-347.
Hufford KM, Hamrick JL (2003) Viability selection at three early life stages of the tropical tree, Platypodium elegans (Fabaceae, Papilionoideae). Evolution 57, 518-526.
Jones CJ, Edwards KJ, Castaglione S, Winfield MO, Sala F, van de Wiel C, Bredemeijer G, Vosman B, Matthes M, Daly A, Brettschneider R, Bettini P, Buiatti M, Maestri E, Malcevschi A, Marmiroli N, Aert R, Volckaert G, Rueda J, Linacero R, Vazquez A, Karp A (1997) Reproducibility testing of RAPD, AFLP and SSR markers in plants by a network of European laboratories. Molecular Breeding 3, 381-390.
Jones RC, McKinnon GE, Potts BM, Vaillancourt RE (2005) Genetic diversity and mating system of an endangered tree Eucalyptus morrisbyi. Australian Journal of Botany 53, 367-377.
Lian C, Miwa M, Hogetsu T (2001) Outcrossing and paternity analysis of Pinus densiflora (Japanese red pine) by microsatellite polymorphism. Heredity 87, 88-98.
Liewlaksaneeyanawin C, Ritland CE, El-Kassaby YA, Ritland K (2004) Single-copy, species-transferable microsatellite markers developed from loblolly pine ESTs. Theoretical and Applied Genetics 109, 361-369.
Lobo JA, Quesada M, Stoner KE (2005) Effects of pollination by bats on the mating system of Ceiba pentandra (Bombacaceae) populations in two tropical life zones in Costa Rica. American Journal of Botany 92, 370-376.
Nagamitsu T, Ichikawa Se, Ozawa M, Shimamura R, Kachi N, Tsumura Y, Muhammad N (2001) Microsatellite analysis of the breeding system and seed dispersal in Shorea leprosula (Dipterocarpaceae). International Journal of Plant Sciences 162, 155-159.
O'Connell LM, Russell J, Ritland K (2004) Fine-scale estimation of outcrossing in western redcedar with microsatellite assay of bulked DNA. Heredity 93, 443-449.
Paglia G, Morgante M (1998) PCR-based multiplex DNA fingerprinting techniques for the analysis of conifer genomes. Molecular Breeding 4, 173-177.
Perry DJ, Dancik BP (1986) Mating system dynamics of lodgepole pine in Alberta, Canada. Silvae Genetica 35, 190-195.
Remington DL, Whetten RW, Liu BH, O'Malley DM (1999) Construction of an AFLP genetic map with nearly complete genome coverage in Pinus taeda. Theoretical and Applied Genetics 98, 1279-1292.
Ritland K (1990) A series of FORTRAN computer programs for estimating plant mating systems. Journal of Heredity 81, 235-237.
Ritland K (2002) Extensions of models for the estimation of mating systems using n independent loci. Heredity 88, 221-228.
Ritland K, Jain S (1981) A model for the estimation of outcrossing rate and gene frequencies using N independent loci. Heredity 47, 35-52.
Ritland K, Leblanc M (2004) Mating system of four inbreeding monkeyflower (Mimulus) species revealed using progeny-pair analysis of highly informative microsatellite markers. Plant Species Biology 19, 149-157.
Robledo-Arnuncio JJ, Smouse PE, Gil L, Alia R (2004) Pollen movement under alternative silvicultural practices in native populations of Scots pine (Pinus sylvestris L.) in central Spain. Forest Ecology and Management 197, 245-255.
Robledo-Arnuncio JJ, Gil L (2005) Patterns of pollen dispersal in a small population of Pinus sylvestris L. revealed by total-exclusion paternity analysis. Heredity 94, 13-22.
Shepherd M, Cross M, Maguire TL, Dieters MJ, Williams CG, Henry RJ (2002) Transpecific microsatellites for hard pines. Theoretical and Applied Genetics 104, 819-827.
Sork VL, Davis FW, Smouse PE, Apsit VJ, Dyer RJ, Fernandez MJ, Kuhn B (2002) Pollen movement in declining populations of California Valley oak, Quercus lobata: where have all the fathers gone? Molecular Ecology 11, 1657-1668.
Stoehr MU, Newton CH (2002) Evaluation of mating dynamics in a lodgepole pine seed orchard using chloroplast DNA markers. Canadian Journal of Forest Research 32, 469-476.
Vos P, Hogers R, Bleeker M, Reijans M, Van De Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, Zabeau M (1995) AFLP: A new technique for DNA fingerprinting. Nucleic Acids Research 23, 4407-4414.
Williams JG, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Research 18, 6531-6535.
Woodhead M, Russell J, Squirrell J, Hollingsworth PM, Mackenzie K, Gibby M, Powell W (2005) Comparative analysis of population genetic structure in Athyrium distentifolium (Pteridophyta) using AFLPs and SSRs from anonymous and transcribed gene regions. Molecular Ecology 14, 1681-1695.

CHAPTER 7 THE CORRELATION OF GENE DIVERSITY AT LINKED LOCI IN PERIPHERAL VS. CENTRAL LODGEPOLE PINE POPULATIONS

7.1 Introduction

The fundamental evolutionary factors of recombination, mutation, selection, and genetic drift are vital in shaping the patterns of genetic diversity in extant populations. In addition to shaping the average level of diversity of a genome, these factors can shape the pattern of variation of gene diversity along chromosomes. Firstly, levels of recombination may vary among chromosomes and among regions within chromosomes (Nachman 2002).
Secondly, the interplay between recombination and selection may reduce genetic diversity at neutral loci surrounding a selected locus ("hitchhiking"; Maynard Smith & Haigh 1974). Finally, in bottlenecked populations, purely random processes can create linkage disequilibrium and localized reductions of diversity (the "Hill-Robertson" effect; Hill & Robertson 1966). In general, genetic drift causes a stochastic reduction of gene diversity, especially in small or recently bottlenecked populations or at the migration fronts of expanding populations (Galtier et al. 2000). All of these processes, in isolation or together, can generate variation of diversity along a chromosome, manifested as a correlation of diversity between linked loci, a new dimension of population structure (Ritland 2004). Positive correlations between genetic diversity and physical map distance have been reported in mammals, namely humans (Homo sapiens) (Przeworski 2002) and mouse (Mus domesticus) (Nachman 1997), and in Drosophila (Drosophila melanogaster) (Begun & Aquadro 1992). In these species, complete physical maps have been laboriously or expensively constructed by assembling bacterial artificial chromosome (BAC) sequence contigs or by complete genome shotgun sequencing. For most wild species, physical maps are not available, and for conifers they would be practically impossible to construct, as conifers have enormous, repetitive genomes (ca. 40 billion base pairs, roughly 10 times the size of the human genome). Alternatively, the correlation of diversity between linked loci can be estimated by linkage analysis in segregating pedigrees, where recombination rates can be estimated between loci and a genetic map constructed. The genetic map distance, as measured in centimorgans, is proportional to physical map distance. Most genetic map construction is based on experimental populations derived from crosses between homozygous parents or inbred lines (Paterson 1996).
Such an approach has been implemented in agricultural crop species, for which inbred lines are available or easily formed. In many plant species, including forest trees, such pedigrees cannot be constructed, as these species have long life cycles and high genetic load. More importantly for our purposes, a genetic map by itself has little if any information about patterns of diversity along chromosomes, unless a separate assay of diversity is done for the mapped loci, as was done recently in oak (Scotti-Saintagne et al. 2004). As a means to directly assay the correlation of diversity between linked loci in natural populations, a novel method for the joint estimation of recombination and gene diversity using multiple progeny arrays was developed by Hu et al. (2004). The use of several progeny arrays is equivalent to combining information from multiple experimental populations generated by inbred line crosses. Because a given marker locus segregates in only a fraction of pedigrees, this method also allows the integration of more markers into a marker map. Of interest here is that multiple progeny arrays allow direct estimation of gene diversity in the population of interest, without bulk assays of population variation for which linkage relationships must already be known. Our organism of interest is Pinus contorta (lodgepole pine), which is widely distributed in western North America (Wheeler & Critchfield 1985). Palynological studies revealed that lodgepole pine migrated from southern refugia to the Yukon Territory in the last 12,000 years (MacDonald & Cwynar 1985). Central populations likely contain more genetic diversity than marginal populations. The reduction of genetic diversity in marginal populations may be due to random drift, increased inbreeding, and/or selection (Fazekas & Yeh 2001; Yang & Yeh 1995).
Cwynar & MacDonald (1987) found that the time since population founding was positively correlated with the mean number of alleles and with wing loading (calculated as seed mass divided by the surface area of the wing, i.e., a measure of seed mobility). Here, with the joint estimation of recombination and gene diversity using progeny arrays, we seek to determine the correlation of gene diversity at linked loci in populations from different geographic regions, which differ in population history. The aims of this study were (1) to estimate recombination frequency based on multiple half-sib families using AFLP markers, (2) to ascertain whether there is a measurable correlation of diversity between loci separated on the order of a few map units, and (3) to compare the correlation of gene diversity between peripheral and central populations. The extent of the correlation of diversity between linked loci is another measure of population structure, one based upon genome structure, which may allow new insights into the evolutionary history of populations.

7.2 Material and Methods

7.2.1 Plant materials

Two peripheral populations were assayed for progeny arrays using two different types of AFLP primer combinations (Chapter 6). Seed cones were collected from 20-30 trees, with a minimum of 30 m between sampled trees, from Carbondale (Calgary) and Whitehorse (Yukon), representing the eastern and northern limits of the species distribution, respectively. Seeds were extracted and kept separate by tree to maintain the identity of every sampled tree within each population. Twenty trees from each population were randomly selected for analysis. In summary, 400 samples (20 seeds per family) were used for the joint estimation of recombination and gene diversity for each population.
The ten natural populations within the Prince George seed zone, representing the center of the species distribution (Chapter 3), were used to estimate genetic diversity and differentiation of the central population of lodgepole pine, based upon four PstI-MseI AFLP primer combinations. Dormant vegetative buds were collected from a random sample of 30 trees per population.

7.2.2 DNA isolation and AFLP analysis

Genomic DNA was isolated from individual germinants or vegetative buds using a modification of the CTAB (cetyltrimethyl ammonium bromide) method (Doyle & Doyle 1987). Details of the protocol for AFLP analysis are presented in previous chapters (Chapters 3 and 6). The 12 AFLP primer combinations used for the Whitehorse population had three selective nucleotides on the tailed E primers and four or five selective nucleotides on the M primers: EcoRI + ACC/MseI + CCCC, EcoRI + ACC/MseI + CCCG, EcoRI + ACC/MseI + CCGA, EcoRI + ACC/MseI + CCGG, EcoRI + ACC/MseI + CCGT, EcoRI + ACC/MseI + CCAGA, EcoRI + ACC/MseI + CCAGT, EcoRI + ACG/MseI + CCGT, EcoRI + ACG/MseI + CCTC, EcoRI + ACT/MseI + CCGG, EcoRI + ACT/MseI + CCGT, EcoRI + ACT/MseI + CCAGA. For Carbondale and the 10 Prince George populations, the 4 AFLP primer combinations used had three selective nucleotides on the tailed P primers and three selective nucleotides on the M primers: PstI + CAG/MseI + CCC, PstI + CAG/MseI + CGG, PstI + CCA/MseI + CCA, PstI + CGA/MseI + CCG.

7.2.3 Data analyses

Under the assumption of Hardy-Weinberg equilibrium, allele frequencies and recombination fractions of markers were estimated simultaneously using the expectation-maximization (EM) algorithm combined with Newton-Raphson iteration. A computer program was written by K. Ritland to estimate heterozygosities and recombination fractions for the Whitehorse and Carbondale populations.
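To illustrate why the Hardy-Weinberg assumption matters for dominant markers such as AFLPs, the sketch below uses the simple single-locus square-root estimator: band-absent individuals are taken to be recessive homozygotes, so the null-allele frequency is the square root of their proportion. This is not the joint EM/Newton-Raphson procedure used in the thesis, and it assumes a random sample of unrelated diploid individuals rather than half-sib progeny arrays; the function name and the example counts are hypothetical.

```python
import math


def dominant_marker_heterozygosity(n_band_absent, n_total):
    """Estimate allele frequencies and expected heterozygosity at one
    dominant locus under Hardy-Weinberg equilibrium.

    Band-absent individuals are assumed to be recessive homozygotes, so
    q = sqrt(proportion band-absent), p = 1 - q, and HE = 2pq.
    """
    if n_total <= 0 or not 0 <= n_band_absent <= n_total:
        raise ValueError("counts must satisfy 0 <= n_band_absent <= n_total, n_total > 0")
    q = math.sqrt(n_band_absent / n_total)  # frequency of the null (band-absent) allele
    p = 1.0 - q                             # frequency of the band-present allele
    return p, q, 2.0 * p * q


# Hypothetical counts: 36 of 100 individuals lack the band.
p, q, expected_h = dominant_marker_heterozygosity(36, 100)
print(f"p = {p:.2f}, q = {q:.2f}, HE = {expected_h:.2f}")
```

With a single biallelic dominant locus, HE cannot exceed 0.5 (reached at p = q = 0.5), which gives a rough scale for the per-locus heterozygosities reported for AFLPs.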
For each population, correlations of heterozygosity between pairs of loci were determined with Pearson correlation coefficients (SAS version 9). Estimates of the average heterozygosity of locus pairs were also obtained for each population. For the 10 Prince George populations, the pairwise recombination rates calculated using PstI + MseI primers from the Carbondale population were used as a basis for investigating the correlation of gene diversity at linked loci.

7.3 Results

7.3.1 AFLP polymorphisms

For the Whitehorse population, AFLP analysis using 12 EcoRI/MseI primer combinations identified a total of 518 AFLP fragments, of which 434 (83.8%) were polymorphic (Table 7.1). For the Carbondale population, AFLP analysis using 4 PstI/MseI primer combinations identified a total of 221 AFLP fragments, of which 165 (74.7%) were polymorphic (Table 7.1). The polymorphic markers of each population were then used to estimate pairwise recombination rates (r) and gene diversity. For the 10 Prince George populations, the polymorphisms of the 4 PstI/MseI AFLP primer combinations were as reported in Chapter 3.

7.3.2 Correlation of gene diversity at linked loci

A total of 55,686 and 6,235 pairwise recombination rates between loci were estimated for Whitehorse and Carbondale, respectively. The distributions of the pairwise recombination rates between loci are shown in Figure 7.1 for (a) the Whitehorse population and (b) the Carbondale population. Twenty-four and 37 percent of the pairs of loci were closely linked (r < 0.05) for Whitehorse and Carbondale, respectively. There was an excess of tightly linked loci (r = 0.01) in both populations. This is because half-sibs have less power to infer recombination rate, which results in a small-sample bias towards a near-zero estimate of recombination rate for some pairs of loci, even though their true values are greater (simulations performed by K. Ritland have confirmed this).
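The correlation analysis described in Section 7.2.3 can be pictured as grouping locus pairs by their estimated recombination rate and computing a Pearson correlation of heterozygosity within each group. The sketch below is a hedged illustration in Python rather than the SAS procedure used in the thesis; the bin width of 0.05, the minimum of three pairs per bin, the function names, and the example values are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient; returns None if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_by_recombination_bin(locus_pairs, bin_width=0.05):
    """Group (r, H1, H2) tuples into recombination-rate bins and return
    {bin_index: correlation of H1 with H2} for bins holding at least 3 pairs.
    Bin k covers k * bin_width <= r < (k + 1) * bin_width."""
    bins = defaultdict(lambda: ([], []))
    for r, h1, h2 in locus_pairs:
        index = math.floor(r / bin_width)
        bins[index][0].append(h1)
        bins[index][1].append(h2)
    return {
        index: pearson(*bins[index])
        for index in sorted(bins)
        if len(bins[index][0]) >= 3
    }


# Hypothetical locus pairs: (recombination rate, HE of locus 1, HE of locus 2).
example_pairs = [
    (0.01, 0.10, 0.20), (0.02, 0.20, 0.40), (0.03, 0.30, 0.60),
    (0.06, 0.10, 0.30), (0.07, 0.20, 0.20), (0.08, 0.30, 0.10),
]
correlations = correlation_by_recombination_bin(example_pairs)
print(correlations)
```

Plotting the per-bin coefficient against the bin midpoint would give a curve of the kind summarized in Figure 7.2; significance testing of each coefficient is omitted here.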
Figure 7.2 shows the relationship between the correlation of gene diversity and linkage distance. For the Whitehorse population, a correlation of diversity between loci was observed when recombination rates were less than ca. 0.12 (Fig. 7.2a). A highly significant correlation of diversity (P < 0.0001) was observed for more closely linked loci; the correlation declined markedly with increasing recombination rate, reaching a nonsignificant level at r > 0.08 (Fig. 7.2a). For the Carbondale population, a significant correlation of heterozygosity between loci was observed at ca. r < 0.08 (Fig. 7.2b). For the 10 Prince George populations, no correlation of diversity between linked loci was apparent (Fig. 7.2c). Paradoxically, a significant correlation of diversity (P < 0.05) was observed for very loosely linked loci (0.46 < r < 0.48) (Fig. 7.2c). The average heterozygosity of locus pairs plotted against estimates of pairwise recombination rates is shown in Figure 7.3 (a), (b) and (c) for Whitehorse, Carbondale, and the 10 natural populations, respectively. Small but significant differences in the average heterozygosity of locus pairs were observed at different recombination rates between loci (P < 0.0001) in all studied populations. Average heterozygosity (HE) of locus pairs ranged from 0.368 (r = 0.45) to 0.430 (r = 0.12) for Whitehorse, from 0.298 (r = 0.43) to 0.389 (r = 0.49) for Carbondale, and from 0.197 (r = 0.08) to 0.398 (r = 0.17) for the 10 Prince George populations.

7.4 Discussion

7.4.1 Correlation of gene diversity at linked loci in lodgepole pine populations

This study has demonstrated that the correlation of gene diversity at linked loci can be significant, at least in peripheral populations of lodgepole pine, with significant values extending out to 10 map units. By contrast, linkage disequilibrium is rarely significant beyond 100 kb or so, which would be a small fraction of a map unit in a conifer.
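To relate the recombination-rate thresholds in the results (r of roughly 0.08 to 0.12) to the "10 map units" quoted above, a mapping function is needed. The thesis does not state which one was used, so the sketch below assumes Haldane's function, d = -50 ln(1 - 2r) centimorgans, which presumes no crossover interference; the function names are illustrative.

```python
import math


def haldane_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to map distance in
    centimorgans using Haldane's mapping function (assumes no interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse of haldane_cm: map distance in centimorgans to recombination fraction."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))


for r in (0.08, 0.12):
    print(f"r = {r:.2f} -> {haldane_cm(r):.2f} cM")
```

Under this assumption, the thresholds correspond to roughly 9 to 14 cM, the same order as the 10 map units cited; a different mapping function (for example, Kosambi's) would give somewhat different values.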
Further theoretical and empirical comparisons of the extent of the correlation of gene diversity with that of linkage disequilibrium are needed. This study has also demonstrated that the correlation of gene diversity at linked loci is much higher for peripheral populations of lodgepole pine than for central populations. We purposely chose to focus on peripheral populations, wherein the likelihood of significant correlations is greatest. The increased correlation in peripheral populations is probably due to the greater number of bottlenecks experienced by these populations during the post-Pleistocene expansion of lodgepole pine. This is the first demonstration of differences in the correlation of diversity between linked loci within a species, and also the first use for this purpose of the novel method for the joint estimation of recombination and gene diversity using progeny arrays (Hu et al. 2004). Differences between populations in the strength of the correlation of gene diversity can result from differences in rates of genetic drift or in bottleneck events during population expansion, as such associations between linked loci are proportional to the rate of genetic drift (Barton 2000). From fossil pollen studies, MacDonald & Cwynar (1985) hypothesized that P. contorta ssp. latifolia migrated northwards along the Rocky Mountains from refugia located south of the continental glacial limits and extended to its present northern range limit in the Yukon in the last thousand years. Assuming a generation time for lodgepole pine of 80-100 years, the Whitehorse population may have occupied its present site for 10 or more generations. The effects of genetic drift are more pronounced in small, isolated or expanding populations (Galtier et al. 2000); therefore, the Whitehorse population has probably experienced stronger genetic drift than the Carbondale population, as suggested by Figure 7.2.
No significant correlation of gene diversity among linked loci was observed in the 10 Prince George populations, which represent a central population. We note that between quite loosely linked loci, there was some correlation of gene diversity in both Carbondale and the 10 Prince George populations. This is likely a statistical artifact due, again, to the problems with using dominant genetic markers and half-sib progeny arrays, although an association between the gene diversity of loci can occur in inbreeding populations even if the loci are not linked (Charlesworth 1991). Such an inbreeding effect seems unlikely for lodgepole pine, which has outcrossing rates exceeding 95%.

7.4.2 Decline in patterns of gene diversity between linked loci

A high correlation of diversity was observed at more closely linked loci, and this correlation declined with increasing recombination rate in the Whitehorse population (Fig. 7.4, r = 0.89). As well, the mean genetic diversity between linked loci declined slightly with recombination rate (Fig. 7.3a). This mean diversity between linked loci is a distinct statistic from the correlation of diversity between linked loci. The latter pattern, a decline of mean diversity between linked loci, is not expected from population genetic theory and, although slight, merits further attention as to its actual or artifactual causes. The results presented here are similar to those of many previous studies on linkage disequilibrium (LD). The decline of LD with increasing molecular map distance between loci has been reported for Drosophila melanogaster (Aguade et al. 1989; Miyashita & Langley 1988) and Plasmodium falciparum (Conway et al. 1999). However, it is expected that levels of gene diversity might be more closely associated between linked loci than levels of LD. Linkage disequilibrium generated by selection or genetic drift declines rapidly with time. A rapid decline in the correlation of diversity appears to be the result of recombination breaking down linkage.
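The statement that linkage disequilibrium declines rapidly with time can be made concrete with the standard deterministic result for a large random-mating population, in which the LD coefficient D shrinks by a factor (1 - r) each generation. The Python sketch below is only an illustration under those textbook assumptions (no selection, no drift, no admixture); it is not an analysis performed in this thesis, and the function names are hypothetical.

```python
import math


def ld_after(d0, r, generations):
    """Expected LD coefficient after t generations: D_t = D_0 * (1 - r) ** t."""
    return d0 * (1.0 - r) ** generations


def generations_to_fraction(r, fraction):
    """Generations needed for LD to fall to `fraction` of its initial value."""
    if not 0.0 < r <= 0.5 or not 0.0 < fraction < 1.0:
        raise ValueError("need 0 < r <= 0.5 and 0 < fraction < 1")
    return math.log(fraction) / math.log(1.0 - r)


# Decay over 10 generations, roughly the time the Whitehorse population
# may have occupied its site according to the discussion above.
for r in (0.01, 0.05, 0.12):
    remaining = ld_after(1.0, r, 10)
    print(f"r = {r:.2f}: fraction of initial LD remaining = {remaining:.3f}")
```

Under these assumptions, associations between loosely linked loci erode within a few generations, whereas associations between tightly linked loci can persist over the short residence times suggested for the peripheral populations.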
7.4.3 Reduction in gene diversity at linked loci

No reductions in gene diversity between closely linked loci (r < 0.05) were observed in this study, as shown by the scatter plots of heterozygosity between pairs of loci (Fig. 7.5). Decreased diversity at linked loci could be explained by the interplay between recombination and selection based on two theoretical models: (1) genetic hitchhiking and (2) background selection. The genetic hitchhiking model assumes that the loss of variation at a neutral locus is due to a selective sweep at a linked locus (Maynard Smith & Haigh 1974). A strongly advantageous mutation (allele) rapidly goes to fixation, and neutral variants linked to the selected allele are hitchhiked to fixation, thus reducing genetic diversity at linked loci. In contrast, the background selection model proposes that deleterious mutations are selected against and eliminated from a population, thus decreasing linked neutral genetic variation (Charlesworth 1994; Charlesworth et al. 1993). For both models, the reduction in genetic variation at linked loci is stronger in genomic regions with low recombination rates.

The results of this study are not in agreement with previous studies on the correlation between levels of gene diversity and recombination rate per physical unit for some plant species. Decreased diversity in genomic regions with low recombination rates has been reported in tomato (Lycopersicon esculentum) (Stephan & Langley 1998), sea beet (Beta vulgaris) (Kraft et al. 1998), and goatgrass (Aegilops L.) (Dvorak et al. 1998). However, these comparisons involve different populations, and not genomic regions within populations. Within a population, the dynamics of diversity will differ from those between reproductively isolated populations. As well, physical distance between gene loci may not show a high correspondence with recombination distance.
It is important to note that our estimates of recombination fractions and heterozygosities are based upon polymorphic loci, which excludes homozygous loci, as these loci cannot be mapped. This exclusion can also contribute to the absence of a reduction in gene diversity at linked loci observed in this study. Similar to the present study, however, there is evidence that the level of gene diversity is not reduced in genes located in low-recombination regions of plant species such as Lycopersicon spp. (Baudry et al. 2001) and Zea mays ssp. mays L. (Tenaillon et al. 2002).

In lodgepole pine, increased seed dispersal in peripheral populations may result from directional selection associated with long-distance dispersal events (Cwynar & MacDonald 1987). If so, one might expect a greater reduction of gene diversity between closely linked loci in the most northerly population. However, there is no convincing evidence that directional selection contributes to the positive correlation between recombination and diversity in the present study. There are two possible explanations for this, as follows.

The first possibility is based upon the patterns of genetic diversity and population structure described by Epperson & Allard (1986). They suggested that optimal initial conditions for hitchhiking occur rarely in lodgepole pine, as this species shows low population differentiation and little population structure. Studies on the genetic variability of P. contorta ssp. latifolia across geographic ranges using isozymes and RAPDs have revealed that the majority (>90%) of the genetic diversity resides within populations (Dancik & Yeh 1983; Fazekas & Yeh 2001; Wheeler & Guries 1982; Yeh & Layton 1979). In addition, estimates of the mating systems of these populations (Chapter 5) indicate little population substructure within these populations.
These patterns suggest that during the past 12,000 years, new selectively beneficial alleles have been infrequently introduced into the most northern population of lodgepole pine. This raises the question of how the most northern population of lodgepole pine can adapt to a severe environment or to habitat expansion. Hermisson and Pennings (2005) suggested that such a population might adapt from standing genetic variation.

The second possibility is that the signature of selection could be obscured or driven by demographic factors (Baudry et al. 2001; Tenaillon et al. 2002). Baudry et al. (2001) investigated the influence of recombination on genetic diversity by comparing the level of DNA polymorphism at chromosome regions with high and low recombination rates in five Lycopersicon (tomato) species, two of which inbreed at high to intermediate levels. All five species revealed the presence of a positive correlation between recombination and diversity; however, the correlation was found to be stronger in the self-compatible species than in the outbreeding species. They suggested that the type of mating system, including demographic factors associated with the mating system, has had a stronger influence on genetic variation than has recombination. Also, Tenaillon et al. (2002) investigated the patterns of diversity and recombination along chromosome 1 of maize by comparing the level of diversity in SNPs and microsatellites to two measures of recombination, one estimated from DNA sequence data (C; population-recombination estimate) and one from a quantitative cytogenetic map (R; physical-recombination estimate). They reported that SNP diversity was positively correlated with C, but not with R. In contrast, microsatellite diversity was positively correlated with R, but not with C. R and C measure different quantities.
R is not affected by population history, selection, and demography, but C is affected by selection and by demographic factors such as population subdivision and population bottlenecks. Thus, the authors demonstrated that the correlation between SNP diversity and C may be driven by demography.

In conclusion, this study suggests that demographic factors such as bottlenecks and migrational events have had a strong effect in shaping genomic patterns in lodgepole pine. However, the extent to which natural selection has affected genomic patterns of diversity, particularly by the hitchhiking of adaptive variants, is still a major question to be addressed in the Yukon population, the most northern lodgepole pine population of this study, as well as possibly in other peripheral populations of lodgepole pine.

Table 7.1 Number of polymorphic AFLP loci observed from AFLP primer combinations

Primer combination    Total number    Number of           Percent of
                      of loci         polymorphic loci    polymorphic loci
Whitehorse
E+ACC/M+CCCC          42              35                  83.3
E+ACC/M+CCCG          46              37                  80.4
E+ACC/M+CCGA          44              36                  81.8
E+ACC/M+CCGG          46              40                  87.0
E+ACC/M+CCGT          47              42                  89.4
E+ACC/M+CCAGA         41              30                  73.2
E+ACC/M+CCAGT         31              26                  83.9
E+ACG/M+CCGT          54              48                  88.9
E+ACG/M+CCTC          34              27                  79.4
E+ACT/M+CCGG          41              35                  85.4
E+ACT/M+CCGT          44              39                  88.6
E+ACT/M+CCAGA         48              39                  81.3
Total                 518             434                 83.8
Carbondale
P+CAG/M+CCC           58              35                  60.3
P+CAG/M+CGG           59              51                  86.4
P+CCA/M+CCA           50              31                  62.0
P+CGA/M+CCG           54              48                  88.9
Total                 221             165                 74.7

Figure 7.1 The distribution of pairwise recombination rates between loci in Whitehorse (a) and Carbondale (b) populations [x-axis: recombination rate between loci, 0.00 to 0.50]

Figure 7.2 The correlation of gene diversity between pairs of markers, plotted against estimates of pairwise recombination rates in Whitehorse (a), Carbondale (b), and the 10 Prince George (c) populations (NS, not significant; *, significant at P < 0.05; **, significant at P < 0.01; ***, significant at P < 0.001) [x-axis: recombination rate (distance) between loci]

Figure 7.3 The average gene diversity between pairs of markers, plotted against estimates of pairwise recombination rates in Whitehorse (a), Carbondale (b) and the 10 Prince George (c) populations [x-axis: recombination rate (distance) between loci]

Figure 7.4 Decline in correlation of gene diversity with increasing recombination rates evaluated by polynomial regression analysis

Figure 7.5 Scatter plots showing no reductions in heterozygosities between pairs of closely linked loci (r < 0.05)

7.5 References

Aguade M, Miyashita N, Langley CH (1989) Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics 122, 607-615.
Barton NH (2000) Genetic hitchhiking. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences 355, 1553-1562.
Baudry E, Kerdelhue C, Innan H, Stephan W (2001) Species and recombination effects on DNA variability in the tomato genus. Genetics 158, 1725-1735.
Begun DJ, Aquadro CF (1992) Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature 356, 519-520.
Charlesworth B (1994) The effect of background selection against deleterious mutations on weakly selected, linked variants. Genetical Research 63, 213-227.
Charlesworth B, Morgan MT, Charlesworth D (1993) The effect of deleterious mutations on neutral molecular variation.
Genetics 134, 1289-1303.
Charlesworth D (1991) The apparent selection on neutral marker loci in partially inbreeding populations. Genetical Research 57, 159-175.
Conway DJ, Roper C, Oduola AMJ, Arnot DE, Kremsner PG, Grobusch MP, Curtis CF, Greenwood BM (1999) High recombination rate in natural populations of Plasmodium falciparum. Proceedings of the National Academy of Sciences of the United States of America 96, 4506-4511.
Cwynar LC, MacDonald GM (1987) Geographic variation of lodgepole pine in relation to population history. American Naturalist 129, 463-469.
Dancik BP, Yeh FC (1983) Allozyme variability and evolution of lodgepole pine (Pinus contorta var. latifolia) and jack pine (P. banksiana) in Alberta. Canadian Journal of Genetics and Cytology 25, 57-64.
Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh tissue. Phytochemical Bulletin 19, 11-15.
Dvorak J, Luo MC, Yang ZL (1998) Restriction fragment length polymorphism and divergence in the genomic regions of high and low recombination in self-fertilizing and cross-fertilizing Aegilops species. Genetics 148, 423-434.
Epperson BK, Allard RW (1986) Linkage disequilibrium between allozymes in natural populations of lodgepole pine. Genetics 115, 341-352.
Fazekas AJ, Yeh FC (2001) Random amplified polymorphic DNA diversity of marginal and central populations in Pinus contorta subsp. latifolia. Genome 44, 13-22.
Galtier N, Depaulis F, Barton NH (2000) Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics 155, 981-987.
Hill WG, Robertson A (1966) Effect of linkage on limits to artificial selection. Genetical Research 8, 269-294.
Hu XS, Goodwillie C, Ritland KM (2004) Joining genetic linkage maps using a joint likelihood function. Theoretical and Applied Genetics 109, 996-1004.
Kraft T, Säll T, Magnusson-Rading I, Nilsson NO, Halldén C (1998) Positive correlation between recombination rates and levels of genetic variation in natural populations of sea beet (Beta vulgaris subsp. maritima). Genetics 150, 1239-1244.
MacDonald GM, Cwynar LC (1985) A fossil pollen based reconstruction of the late Quaternary history of lodgepole pine (Pinus contorta ssp. latifolia) in the western interior of Canada. Canadian Journal of Forest Research 15, 1039-1044.
Maynard Smith J, Haigh J (1974) The hitch-hiking effect of a favourable gene. Genetical Research 23, 23-35.
Miyashita N, Langley CH (1988) Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120, 199-212.
Nachman MW (1997) Patterns of DNA variability at X-linked loci in Mus domesticus. Genetics 147, 1303-1316.
Nachman MW (2002) Variation in recombination rate across the genome: evidence and implications. Current Opinion in Genetics & Development 12, 657-663.
Paterson AH (1996) Making genetic maps. In: Genome mapping in plants (ed. Paterson AH), pp. 23-39. R.G. Landes Company, Georgetown, Texas.
Przeworski M (2002) The signature of positive selection at randomly chosen loci. Genetics 160, 1179-1189.
Ritland K (2004) Pathways to plant population genomics. In: Plant Adaptation: Molecular Genetics and Ecology, Proceedings of an International Workshop sponsored by the UBC Botanical Garden and Centre for Plant Research, December 11-13, 2002, Vancouver, British Columbia, Canada, pp. 11-17. NRC Research Press, Ottawa, Ontario.
Scotti-Saintagne C, Mariette S, Porth I, Goicoechea PG, Barreneche T, Bodenes K, Burg K, Kremer A (2004) Genome scanning for interspecific differentiation between two closely related oak species [Quercus robur L. and Q. petraea (Matt.) Liebl.]. Genetics 168, 1615-1626.
Stephan W, Langley CH (1998) DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics 150, 1585-1593.
Tenaillon MI, Sawkins MC, Anderson LK, Stack SM, Doebley J, Gaut BS (2002) Patterns of diversity and recombination along chromosome 1 of maize (Zea mays ssp. mays L.). Genetics 162, 1401-1413.
Wheeler NC, Critchfield WB (1985) The distribution and botanical characteristics of lodgepole pine: Biogeographical and management implications. In: Lodgepole pine: The species and its management (eds. Baumgartner DM, Krebill RG, Arnott JT, Weetman GF), pp. 1-13. Washington State University, USA.
Wheeler NC, Guries RP (1982) Population structure, genic diversity, and morphological variation in Pinus contorta Dougl. Canadian Journal of Forest Research 12, 595-606.
Yang RC, Yeh FC (1995) Patterns of gene flow and geographic structure in Pinus contorta Dougl. Forest Genetics 2, 65-75.
Yeh FC, Layton C (1979) The organization of genetic variability in central and marginal populations of lodgepole pine Pinus contorta ssp. latifolia. Canadian Journal of Genetics and Cytology 21, 487-503.

CHAPTER 8 CONCLUSIONS

My thesis evaluated the genetics of natural and domesticated populations of lodgepole pine (Pinus contorta ssp. latifolia) with new microsatellite and AFLP markers, and with novel analyses of data. Lodgepole pine is a conifer species that offers opportunities to (1) evaluate the efficacy of marker development in a conifer, as it, like other conifers, has a huge genome with many consequent effects upon molecular protocols for marker development, (2) study the effects of domestication upon genetic variability, as it is a major species in the British Columbia Ministry of Forests tree improvement program, and (3) explore the genetic effects of extensive range expansion, as it has undergone dramatic expansion of its range since the Pleistocene glaciation, with consequent effects upon its pattern of genomic genetic variation.
This rather unique biology allows opportunities to study the mating system and sibship structure of peripheral populations, and the genomic structure of a species that has undergone bottlenecks due to repeated episodes of migration.

8.1 Marker development in a conifer

Microsatellites are the "marker of choice" in population genetics research because of their high variability. The development cost of these markers is usually high, and microsatellite primers for one species often do not cross-amplify in related species. In this thesis, we targeted microsatellites found in ESTs (expressed sequence tags), on the assumption that they would cross-amplify better in related species because they reside in, or near, conserved coding DNA, where primer sites are more likely to be conserved. We identified 14 Pinus taeda (loblolly pine) EST-SSRs from public EST databases and tested their cross-species transferability to P. contorta ssp. latifolia, P. ponderosa and P. sylvestris. As part of our development of P. contorta microsatellites, we also compared their transferability to that of 99 traditional microsatellite markers developed in P. taeda and tested on P. contorta ssp. latifolia. Compared to traditional microsatellites, EST-SSRs had higher transfer rates across pine species; however, the level of polymorphism of EST-derived microsatellites was lower. Sequence analyses revealed that the frequencies of insertions/deletions and base substitutions were lower in EST-SSRs than in other types of microsatellites, confirming that EST-SSRs are more conserved. The transferability of SSRs increases as the evolutionary distance between the source and target species decreases. According to the taxonomy of the genus Pinus, P. contorta is in the same subsection, Contortae, as P. banksiana and P. virginiana (Little & Critchfield 1969) and is closely related to pine species in the subsection Ponderosae (P. ponderosa and P. jeffreyi) (Krupkin et al.
1996). The 16 polymorphic PtTX microsatellites can be used as a starting set of primers for testing cross-species transferability to other pines. It would be interesting to study the evolution of microsatellites, particularly at PtTX3030, because a 54-bp insertion was observed within its compound microsatellite repeat in P. contorta ssp. latifolia (compared to P. taeda). Locus PtTX3030 has been reported to cross-amplify successfully in other pine species (Kutil & Williams 2001); however, there are no reports of sequence data for other loci. Comparing the sequences of orthologous loci in different pine species can provide information on the birth and death of microsatellites (Sokol & Williams 2005).

8.2 Effects of domestication

The genetic diversity along the domestication process (natural -> breeding -> production -> seed -> seedling populations) of lodgepole pine was investigated using microsatellite (SSR) and amplified fragment length polymorphism (AFLP) DNA markers. The genetic variability of 10 natural populations, each with 30 individuals, from British Columbia's Prince George breeding zone provided the benchmark for comparison. Little genetic differentiation was observed among the studied natural populations, which allowed their collective genetic diversity to be used for comparison. Most of the alleles lost along the domestication process were rare, and expected heterozygosity (HE) did not change substantially among populations. This was expected, given the minuscule contribution of rare alleles to overall diversity. Expanding breeding populations to maintain rare alleles is neither financially nor practically feasible. However, rare alleles can be maintained in gene resource populations. The current inventory of low-elevation lodgepole pine in the Prince George Seed Planning Unit (SPU) indicated that at least 5,000 trees are present in 26 natural reserves (Hamann et al. 2004).
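The observation that rare alleles can be lost with little change in HE follows from how the two quantities are defined. The Python sketch below is a minimal illustration under simplifying assumptions (a single locus, and independent random sampling of gene copies from diploid trees); the allele frequencies are hypothetical and the functions are not part of the analyses in this thesis.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H_E = 1 - sum(p_i ** 2) for one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def prob_allele_missed(p, n_trees):
    """Chance that an allele at frequency p is absent from n diploid trees,
    assuming the 2 * n gene copies are sampled independently."""
    return (1.0 - p) ** (2 * n_trees)


# Hypothetical locus: two common alleles plus one rare allele at 1%.
with_rare = expected_heterozygosity([0.60, 0.39, 0.01])
without_rare = expected_heterozygosity([0.60, 0.40])
print(f"H_E with rare allele:    {with_rare:.4f}")
print(f"H_E without rare allele: {without_rare:.4f}")

# Chance of missing the 1% allele in samples of different sizes.
for n in (30, 300, 5000):
    print(f"n = {n}: P(allele absent) = {prob_allele_missed(0.01, n):.3g}")
```

The comparison shows why a sample of a few dozen trees can easily miss a 1% allele while barely shifting HE, and why large in situ reserves are better suited than breeding populations to holding such alleles.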
In situ populations with a census number of 5,000 (Ne = 1,000) are adequate for conserving rare alleles (Yanchuk 2001). The early domestication process of lodgepole pine has already caused some reduction of genetic diversity; therefore, the impact of more intensive breeding must be monitored to prevent further erosion of genetic variability, especially the loss of rare and low-frequency alleles.

8.3 Mating system and sibship structure

The mating system analyses of lodgepole pine revealed high outcrossing rates in both peripheral populations, in agreement with previous studies using isozyme markers. However, outcrossing rates estimated using molecular markers are likely biased upward, because progeny that have died as a result of inbreeding depression due to selfing cannot be genetically assayed. The level of biparental inbreeding is more pronounced in the northern than in the eastern population, probably due to low stand density, population substructure, and founder effects. Low levels of correlated paternity were observed in both peripheral populations, possibly because these peripheral populations are not geographically isolated. Outcrossing promotes recombination, but outcrossing between close relatives may reduce effective recombination rates. Further studies of mating systems will help in understanding the observed correlation between recombination rates and gene diversity.

Sibship analyses within family seed arrays of lodgepole pine revealed the minimum number of pollen donors contributing to each family (multiple paternity) and also indicated a low level of correlated mating. Increasing the sample size of each family array will increase the accuracy of the analysis. The group-likelihood approach needs further development, extending the model to the case of polygamy in both sexes (Wang 2004).
In seed orchards, where all parents can be genotyped, partial and/or full pedigree reconstruction can be conducted, and the number of male and female parents contributing to bulk seed lots, as well as pollen migration (contamination), can be accurately inferred using molecular markers. Extending this method of pedigree reconstruction to molecular breeding could lead to substantial savings in time and resources and could effectively change the way conventional breeding is conducted (El-Kassaby, personal communication).

8.4 Correlation of gene diversity at linked loci

A significant correlation of gene diversity was observed in both peripheral populations of lodgepole pine, with the most northern population exhibiting the greater correlation. Mean levels of diversity did not depend on linkage distance. The rapid decline in the correlation of diversity appears to be the result of recombination breaking down linkage, as facilitated by the high outcrossing rate of lodgepole pine. No correlation of gene diversity between linked loci was observed in the central population, suggesting that selection had no or only a weak effect in shaping genetic diversity in the peripheral populations; however, it is possible that the signature of selection could be obscured by demographic factors (population bottlenecks and population founding events) associated with the mating system.

The advantage of applying the novel method of joint estimation of recombination and gene diversity using progeny arrays is that it does not require controlled crosses (i.e., a mapping population). Based on simulations, Ritland and his colleagues indicated that the resolution of this approach depends on the number of families, the number of individuals per family, and the types of molecular markers (i.e., codominant and/or dominant markers) used.
Sampling more families provides better estimates of gene diversity, while sampling more individuals per family provides better estimates of recombination rate; a balance between the two is required. Codominant markers, or a mixture of dominant and codominant markers, provide better estimates of recombination rate than dominant markers alone. The statistical approach for joining half-sib linkage maps, namely "gene diversity mapping", is under development (K. Ritland, personal communication). With the gene diversity mapping approach, particularly in species where many microsatellite markers are available, the pattern of gene diversity along chromosomes could be evaluated, and more insights into the evolutionary history of populations and into gene function could be gained.

8.5 Marker choice

The constant emergence of new molecular markers, coupled with the application of advanced statistical analyses, has allowed biologists to study various facets of population genetics and gene conservation. However, marker performance varies, so the most suitable marker must be selected carefully. The present study revealed slight differences between AFLPs and SSRs in the apportionment of genetic variability within and among populations. This discrepancy is due to differences in (1) the number of markers and hence the extent of genome coverage, (2) the relative mutation rates, and (3) the level of population heterogeneity in diversity levels. SSRs are preferred to AFLPs for fine-scale population genetic studies such as autocorrelation, mating system, and paternity analyses, while AFLPs are more efficient in differentiating between closely related populations due to their broad coverage of the genome. Both types of markers can further increase the density of linkage maps. SSRs provide anchor points for specific regions of the genome, while AFLPs fill gaps between SSR markers.
Single nucleotide polymorphisms (SNPs) are rapidly becoming the marker of choice in human genetic studies, as they can target genes of specific interest, but there are currently limited numbers of SNPs in plants, especially in forest tree species. This will undoubtedly change in the near future. Combined with their amenability to high-throughput assays, SNPs will allow us to answer many questions in population ecology, evolution, and conservation (Morin et al. 2004).

8.6 References

Hamann A, Aitken SN, Yanchuk AD (2004) Cataloguing in situ protection of genetic resources for major commercial forest trees in British Columbia. Forest Ecology and Management 197, 295-305.
Krupkin AB, Liston A, Strauss SH (1996) Phylogenetic analysis of the hard pines (Pinus subgenus Pinus, Pinaceae) from chloroplast DNA restriction site analysis. American Journal of Botany 83, 489-498.
Kutil BL, Williams CG (2001) Triplet-repeat microsatellite shared among hard and soft pines. Journal of Heredity 92, 327-332.
Little EL Jr, Critchfield WB (1969) Subdivisions of the Genus Pinus (pines). Miscellaneous Publication 1144.
Morin PA, Luikart G, Wayne RK (2004) SNPs in ecology, evolution and conservation. Trends in Ecology and Evolution 19, 208-216.
Sokol KA, Williams CG (2005) Evolution of a triplet repeat in a conifer. Genome 48, 417-426.
Wang JL (2004) Sibship reconstruction from genetic data with typing errors. Genetics 166, 1963-1979.
Yanchuk AD (2001) A quantitative framework for breeding and conservation of forest tree genetic resources in British Columbia. Canadian Journal of Forest Research 31, 566-576.

APPENDIX I List of PtTX microsatellite series

No.
Marker | Forward primer (F) | Reverse primer (R) | Motif | Library type | Allele size | Ta (°C)
1 | PtTX2001 | AGCTTTTTGAGCATAGGAATAA | TTTGTGAAGGTCGGTTGAA | (TGG)n | LC | 337 | 65
2 | PtTX2002 | CACATTTTCTAACATTTTTT | GCTCCTTCATTTATTTATTT | (TGG),o(TG)3TC(TG)4 | LC | 261 | 65
3 | PtTX2003 | CCTCCACAATATACACCTT | CCAGATCATCACTTCCTA | (ACC)g | LC | 122 | 65
4 | PtTX2033 | CATTCCTACAAAACTTCTAAATTAA | CCATATTTGATGCGTTGATT | (CACT)„ATCAA(TCA)5...(TCA)4TCGC(TCA)6 | LC | 215 | 55
5 | PtTX2034 | TCTGAGGAGGAACATGTCATTTACT | GCATGTCTGAATTATTGTGTTCTAT | (TTTG), | LC | 217 | 55
6 | PtTX2037 | GCCTTTAGATGAATGAACCCA | TAAGCGGGATATTATAGAGTTT | (GTGA)„GT,4 | LC | 177 | 61
7 | PtTX2082 | AAATGTTTAATATGAAGTTGAG | GATGGATCTATGTTGGTTT | (GT),4(GAGT)7(GA)13 | LC | 253 | 61
8 | PtTX2085 | TACGCAAACGTTACGTACAC | TATATCCCCCGTTAGTAGA | (CAA), | LC | 191 | 55
9 | PtTX2090 | CCCGCCTATTCCACCTA | CTACACATTTCACCCATAAGTCC | (CGT)2T(CGT), | LC | 338 | 59
10 | PtTX2095 | TCCGGTATTGTCTCTGTTC | TTCGCAAACGTATCCTAAC | (TTTG)iiTTTTG(TTTG)1(TTTG)4...(GT)12 | LC | 410 | 63
11 | PtTX2121 | TCGGTGTCGGAGACCAAACTGC | ACGGTCGTCCCCGGATGTGAAT | (GCT)5T(GCT)6 (TGC),...(TTG)sCCGTTG | G | 182 | 55
12 | PtTX2122 | ACTCAGACTGCAACGTTAGC | AGTGGGATTATTTCACAGAT | T(TGC)5...(TTG) f >A(TGC)4 | G | 123 | 55
13 | PtTX2123 | GAAGAACCCACAAACACAAG | GGGCAAGAATTCAATGATAA | (AGC)8 | G | 202 | 55
14 | PtTX2128 | TGGATAATCCTTTCAGTC | TCTCGGATTCTCTTACAG | (GAC)8 | G | 245 | 55
15 | PtTX2146 | CCTGGGGATTTGGATTGGGTATTTG | ATATTTTCCTTGCCCCTTCCAGACA | (GAG)5...(CAG)SCGG(CAG)7CGG(CAG)4 (CGT)5...(CCG)4...(CGT)7 | G (EST) | 180 | 55
16 | PtTX2164 | TCAAATATTAAGAAGGTAACAATAC | GAAAATGAAAATCTTAAAAAAATTC | (CAT)3(CGT)4CAT(CGT)5 TGTCGTCAT(CGT)5(CAT)„, | G | 252 | 55
17 | PtTX2183 | TTAGTTGCAAAGAATATTTAAGGT | CCTGCACTAGCTTTATATTTCATA | (CAA)18 | G | 205 | 55
18 | PtTX2189 | ATGAGCCTTTATTTATTGTTTTTG | ATAGGATTTAAGTAGTTTTTCATT | A,5(AACC)6 | G | 289 | 63
19 | PtTX3001 | ATAAAGGCAGAGGATGAACA | CCCAATTGTTATTTCTGATT | (CAA)3...(CAA),CAG(CAA)4 | LC | 313 | 61
20 | PtTX3002 | TTGTTGTGCTCATAATTACTAGTGT | CTCCTAAGCTTGCTCATGTG | (GAG)6...(GAG)4AA(GAG)4 (AGG)c,...(AGG)3AGAAG | LC | 194 | 65
21 | PtTX3003 | CAAATCATTTAGTATCTCATATC | AAAGTTCAGTCTCAGTGGAC | (AGG)4...(AGG)4CAA(AGG)3 | LC | 219 | 59
22 | PtTX3005 | TGTTGATGATGAGGATGACGA | CATTAATTTAGTGTGGCTTTTT | (GAC)5AAC(GAC)„ | LC | 80 | 61
23 | PtTX3011 | AATTTGGGTGTATTTTTCTTAGA | AAAAGTTGAAGGAGTTGGTGATC | (GAA)5...(GAT)15 | LC | 186 | 61
24 | PtTX3013 | GCTTCTCCATTAACTAATTCTA | TCAAAATTGTTCGTAAAACCTC | (GTT)KI | LC | 134 | 63
25 | PtTX3019 | AAGAATATCAAGCACTCC | CAAAGGCATAAAGAAACT | (CAA),„ | LC | 223 | 55
26 | PtTX3020 | GTCGGGGAAGTGAAAGTA | CTAGGTGCAAGAAAAGAGTAT | A,6(CAA), | LC | 211 | 61
27 | PtTX3023 | CATCTAGTTACCAAAGTTAT | ATTTATGAAAATGGTAAGT | (CAA)4...(CAA)4 | LC | 168 | 55
28 | PtTX3025 | CACGCTGTATAATAACAATCTA | TTCTATATTCGCTTTTAGTTTC | (CAA),„ | LC | 266 | 59
29 | PtTX3026 | AATACTTGGGAGGGATAC | AATAGCCAGTTTTGTTTG | (ATC)„...(ATC)5...(ACC)4(ATC)f, | LC | 344 | 59
30 | PtTX3027 | TCCATTTGAGAACTTTTT | AGGAGCCACAACATAATA | (CAT),,, | LC | 280 | 55
31 | PtTX3029 | CTTGTTGCTGCTTCTGC | AACAAAATAATATAAATGCTCTGC | (GCT)5...(GCT)8...(GCT)5 | LC | 255 | 61
32 | PtTX3030 | AATGAAAGGCAAGTGTCG | GAGATGCAAGATAAAGGAAGTT | (TA)5...(GGT),0 | LC | 287 | 59
33 | PtTX3032 | CTGCCACACTACCAACC | AACATTAAGATCTCATTTCAA | (GAT)35(GAC)3GAT(GAC)„AACGAGAAG(GAC)6AAT(GAT)6 | LC | 335 | 59
34 | PtTX3034 | TCAAAATGCAAAAGACG | ATTAGGACTGGGGATGAT | (GT),„(GA),3 | LC | 207 | 55
35 | PtTX3037 | CGTTTGGAGCACTACTT | AAGTCACTTAATGCAATATGTA | (GA),A,3(CAA),5 | LC | 144 | 59
36 | PtTX3044 | ACCCTTTTGCCCTCACC | TAGCATAATCCACCAGAATAACTC | (ACT)42ATT(ACT),„ | LC | 233 | 65
37 | PtTX3045 | CATCGCATATCGCAATCAGG | ATCGGAGTCAAAACACAAAAGAAA | (CA)„ | LC | 226 | 55
38 | PtTX3047 | TTGGAATACTTGCACGATGAC | ATTTAGATAGGAGATGGTTGTTTA | (TACA)3(TA)2(CA)2„ | LC | 354 | 55
39 | PtTX3049 | GAAGTGATAATGGCATAGCAAAAT | CAGACCCGTGAAAGTAATAAACAT | (TGV, | LC | 311 | 55
40 | PtTX3052 | CCTCACTAGGAGGCTACGGAAGAG | AAAGACTCCTTGATGTTGTGAACA | (ATC)8...(ATC)4...(ATC)4...(ATC)4...(ATC)6 | LC | 242 | 63
41 | PtTX3053 | AGGCGACTGATGAGAAGATGTAAT | CTAGTAACGGCCGCCAGTGTGCT | (CAT),,, | LC | 333 | 61
42 | PtTX3055 | AGCAGACTTGAAGGGAAAAA | ATCATCTATATTACCAGGGAGTT | (GAT)5...(GAT)8...(GAT)r. | LC | 402 | 59
43 | PtTX3058 | ACTCTACTACTTGTTCTACCTCA | ATTATATTTCGGCATTGT | (CAA)g...(CAA)4...(CAA)3CAG(CAA)8CAG(CAA)5 | LC | 423 | 59
44 | PtTX3063 | CAATCAGAATCAGCGGCAAACAAA | TTCAACAACATTCATCACACTA | (CAA)7(CAT)2(CAA)6CAG(CAA),5 | LC | 268 | 65
45 | PtTX3067 | GTTCAATTGCCCTTTACATC | TCGCTGCCTTGAATAGAG | (GTT),4T,7 | LC | 349 | 57
46 | PtTX3084 | CATGTCAGGCTAGGAGGTCA | GATGTCATGGATGTGTATTTATTG | (CAA)32 | LC | 332 | 65
47 | PtTX3087 | TTGAAAGTCTTGTCCCTATGTAAT | AAGAAAACCCCCAAACTCG | (ATT)4(GTT),,TTT(GTT)5GCT(GTT)3(ATT), | LC | 292 | 65
48 | PtTX3090 | GCTAGCCTCTTACAATGTCAAAAT | AAAAAGTGTAATTCATTCTATTC | (CAC)4(CAT)24CAC(CAT).i | LC | 258 | 57
49 | PtTX3091 | GTGGCCACCTGCTTATT | AACCCTTCCTATGACTATGG | (GTT),„T,3GGT,„CT5 | LC | 229 | 64
50 | PtTX3096 | TAATTGGTTATCATTTGTCTTT | CATTGACTTAAAATCCATACAT | (GAT)6(GAA)3GAT(GAA)7 | LC | 260 | 57
51 | PtTX3098 | TTTGCACTATGGCATAAGTCCT | CCCTGTTTCTACCCTTGATGA | (GTT)8 | LC | 187 | 63
52 | PtTX3101 | ATGTATTGCAGTATTTTAGTATCA | TATTTTGTCTTGGTTATCAT | (TAA),„(CAA)2(, | LC | 263 | 65
53 | PtTX3102 | ATTTAGTTATGATCTGGTTTTT | AAGCTATTATAATCATTTCTCACA | (GAT)i(GAA)3(GAT)(GAA), | LC | 130 | 65
54 | PtTX3104 | TGTCGGTGGAGTTGGCAGTAGACT | AGGGCCCAGCGTTTCCTG | (CAA),8 | LC | 226 | 55
55 | PtTX3105 | TGTCGGTGGAGTTGGCAGTAGACT | AGGGCCCAGCGTTTCCTG | (GTT), | LC | 258 | 55
56 | PtTX3107 | AAACAAGCCCACATCGTCAATC | TCCCCTGGATCTGAGGA | (CAT)I4 | LC | 182 | 55
57 | PtTX3110 | CTCCTAGGACTTTCTTTGTTG | GGGGTGGAGGAGGAATCATA | (GTT)8...(CTT)8 | LC | 305 | 61
58 | PtTX3112 | AAAAGGGCCTCAAAGAAAAAT | ATAGGGAGATAAGTTGAAAATA | A,,(CAA)I2 | LC | 161 | 65
59 | PtTX3115 | ACACAAGATAGTTATACTACC | AGGTGGCTACATTTTCT | (CAT)8CGT(CAT)„ | LC | 253 | 61
60 | PtTX3116 | CCTCCCAAAGCCTAAAGAAT | CATACAAGGCCTTATCTTACAGAA | (TTG),...(TTG)5 | LC | 146 | 55
61 | PtTX3118 | CACGGCCCTTAGCTTTACCTT | TTCTGATGGGGCAACTG | (CAT)3CGT(CAT)4CAC(CAT),, | LC | 212 | 65
62 | PtTX3120 | CCCACAAACAAGGAGGTC | TAGCAGTCGAGTTAGAAGATTAGA | (CAA)7CAT(CAA)25 | LC | 343 | 55
63 | PtTX3123 | TTTGGCAAAAGAACATTGAGAT | ATATTGGTATTAGTTGAAGTT | (GAT)„(GAC)3GAT(GAC)8...(GAC>,...(GAT)f, | LC | 300 | 57
64 | PtTX3127 | ACCCTTACTTTCAGAAGAGGATA | AATTGGGGTTCAACTATTCTATTA | (CAA),„ | LC | 183 | 55
65 | PtTX4001 | CTATTTGAGTTAAGAAGGGAGTC | CTGTGGGTAGCATCATC | (CA),5 | UM | 224 | 65
66 | PtTX4004 | AAAATAAGGGGAAAAGAGAAAACC | GAACAGGCCCGTGAACCAGT | (GT)U...(T)4(GT)2(T), | UM | 175 | 55
67 | PtTX4005 | ACACAAACAAGAGATTTTCTATCA | ATATTTCCCTTTCTTCTTCTTGT | (TG)28(GA)4...(TG)„ | UM | 222 | 61
68 | PtTX4009 | ACCTTGACCTTGTAGAGC | CTGTGTCCCTTTAGAGATAG | (CA)3TA(CA),4 | UM | 280 | 63
69 | PtTX4011 | GGTAACATTGGGAAAACACTCA | TTAACCATCTATGCCAATCACTT | (CA)2(, | UM | 305 | 65
70 | PtTX4018 | GGCTGGATTTGCGGCTATACCC | TGCGGGAGAGGCAGAGTCC | (GAC)15 | UM | 164 | 59
71 | PtTX4020 | CGTCGCGAGCAAATGGTC | CAGATCCGCCCGCAACACAAT | (TCG)6 | UM | 151 | 65
72 | PtTX4024 | AGGCCCCGTGCTGGTCTA | TCCCGTCCTGCAAGTGAAAAG | (GCA),, | UM | 261 | 61
73 | PtTX4027 | GCCCTTGGACACTGCTCTAT | TAATGCCGCCGCGCTTGGTCA | (GCT)4 | UM | 301 | 65
74 | PtTX4028 | ACCGGCGTTACACATTTTATCTTG | GCCGGCGCTTCTTATTAGTGTAG | (GTC)5 | UM | 134 | 59
75 | PtTX4030 | TTGGGAGGATGACTCCATTATAT | ACTATGGTTGGTCAAGTTAA | (GT)2,(GA),3 | UM | 161 | 59
76 | PtTX4033 | ACCCATTTCCTTTTTCAAC | GGTGGCGAGGCATTATTC | (CA),5 | UM | 156 | 59
77 | PtTX4034 | TGATGGGAAGGAAAAGAATAAAC | AATGCCCCCACAACTAAAAC | (CA>, | UM | 190 | 65
78 | PtTX4041 | AAATATAAGGGGAGATGTGTAGGT | TCTCTGTTCTTTTATTCTCTTCTT | (TG)27(CGTG),8(TG>, | UM | 115 | 59
79 | PtTX4046 | AATGTATATTGGCAACCCTATCA | ACTATGGAACATTGGGAAACC | (CA)2ll | UM | 363 | 57
80 | PtTX4049 | TGACCGCTCATGTAGAAG | CTCTCCTTTGGATTGTATCT | (CA),5 | UM | 158 | 59
81 | PtTX4050 | ACAGGCGTCACCACAGATACA | AGGTGGGCTACGTGGGGATTC | (CA),5 | UM | 188 | 57
82 | PtTX4054 | TGCATTCACCTTGGAGTT | TAGGAGATAATATAAAATGTT | (GA)2, | UM | 179 | 59
83 | PtTX4055 | GTAAATGTGGGAGGAGGTGTTAA | ACAACAACATCAATAAGATC | (TG)3CG(TG)3CGTA(TG),2...(GA),8...(GA)IB | UM | 450 | 55
84 | PtTX4056 | TTAAGGCCAGTTCCAATACAAAAT | GAGCCCAACAACTAAAACAATGAG | (GA)„ | UM | 436 | 65
85 | PtTX4058 | AAGTGTTGGGAGAAAAATGTAAT | CTCCTTCTGTCCCTATCCTCT | (CA)3(GA)2„ | UM | 188 | 60
86 | PtTX4061 | CCAGGCGGCGCAGTCTG | ACAGCGCGTAGCCAGTGTGG | (GTC)3...(GTC)6...(GTT)4...(GTT)6 | UM | 164 | 57
87 | PtTX4062 | TCTAGGCAATCTTTTTACCAAC | ATCATAGCCTCATCCAATACA | T8(GT)„ | UM | 176 | 60
88 | PtTX4071 | AGCCAGTAAAATAAGAAAAATAGT | TCGACCCAGTTGAGATAA | (TC),...(CA)„ | UM | 255 | 55
89 | PtTX4076 | AAAGGTGGGGAAATGAAAT | GTTTTTGGGTCTTTATGGTTCT | (CA)2„ | UM | 247 | 63
90 | PtTX4084 | ACTGGCGAAGGCGAGCCGACAC | ACCGCTGTGGGAACCCTCCTCGTC | (GCT)8 | UM | 145 | 65
91 | PtTX4090 | ACTTTCAAGATTCACTAATG | AGTCCAGCACTCCAAGAAA | (CTT)6...(CTT), | UM | 188 | 57
92 | PtTX4092 | GGATGATACTTTCCATGAGTTAGG | TCTAGTCCAGATCTTGGTCCAC | (GAA)21 | UM | 162 | 57
93 | PtTX4093 | TTGCTTTGCTAATGTTGACCTG | CTAGAGTATGCCTTGAGC | (CTT)16(CTT)C(CTT).,...(CTT)4 | UM | 337 | 62
94 | PtTX4098 | GTGGGACCCCAAGCACT | ATTGCCTCCTCTTTAGTCATCTCA | (GAA)7...(GAA)3TA(GAA)3 | UM | 174 | 61
95 | PtTX4100 | ATCTCCCTATAGGTTCACTCA | ACCCACTCTTCATACTTTTG | (GAA)8 | UM | 207 | 63
96 | PtTX4112 | CCCTCTGTTAGCCGATGTA | AATGTTAGCCCTAGATGTTTGATG | (CA)4AA(CA)I3 | UM | 463 | 57
97 | PtTX4115 | ACATCATAACTCGGTAGTAATTC | GATACCAGATCGATGATGAC | (TCA),2 | UM | 118 | 62
98 | PtTX4137 | CATTGTATTAGTCCTAGCCTCTGT | GGTGCACCCAACAATGTG | (CA)27 | UM | 139 | 61
99 | PtTX4139 | TGGCATGCTAGGAAGAAGA | TTGTATGTTGCCTGTGGAGA | (CT)21 | UM | 153 | 59
100 | PtTX4146 | TAGAACCGATGGATGTTGATG | TGACTCTGCCTTGTCCTCATA | (GAT)ii | UM | 126 | 59

APPENDIX II
Total Genomic DNA Isolation from Lodgepole pine

Recipes

0.1 M CTAB
For 100 ml, use 3.645 g CTAB and bring the volume to 100 ml with sterile water.

Concentrate Buffer Base: 0.2 M Tris, 0.04 M EDTA, and 2.8 M NaCl
For 100 ml of concentrate buffer base:
20 ml of 1 M Tris
8 ml of 0.5 M EDTA
16.36 g of NaCl
Bring the volume up to 100 ml with sterile water and adjust the pH to 8.3.

Preparation of buffer and samples

Buffer (800 µl of buffer per sample)
For 20 ml of buffer (24 samples), use 10.0 ml of 0.1 M CTAB and 10.0 ml of concentrate buffer base. Add 40 µl of β-mercaptoethanol (0.2% of final volume).
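The recipe quantities above follow from simple molarity arithmetic, which the short Python sketch below reproduces. It is an illustration only, not part of the original protocol: the molar masses of CTAB and NaCl are standard values assumed here (the appendix does not state them), and the function names are hypothetical.

```python
# Assumed molar masses (g/mol); standard values, not stated in the appendix.
CTAB_G_PER_MOL = 364.45
NACL_G_PER_MOL = 58.44


def grams_needed(molarity, volume_ml, molar_mass):
    """Mass of solid (g) needed for a solution of the given molarity and volume."""
    return molarity * (volume_ml / 1000.0) * molar_mass


def stock_volume_ml(stock_molarity, final_molarity, final_volume_ml):
    """Volume of stock (ml) to dilute so that C1 * V1 = C2 * V2."""
    return final_molarity * final_volume_ml / stock_molarity


def extraction_buffer(total_ml, bme_fraction=0.002):
    """Split the working buffer 1:1 between 0.1 M CTAB and concentrate buffer
    base, and add beta-mercaptoethanol at the given fraction of final volume."""
    return {
        "ctab_ml": total_ml / 2.0,
        "buffer_base_ml": total_ml / 2.0,
        "mercaptoethanol_ul": total_ml * 1000.0 * bme_fraction,
    }


if __name__ == "__main__":
    print("CTAB (g per 100 ml):", round(grams_needed(0.1, 100, CTAB_G_PER_MOL), 3))
    print("NaCl (g per 100 ml):", round(grams_needed(2.8, 100, NACL_G_PER_MOL), 2))
    print("1 M Tris (ml):", stock_volume_ml(1.0, 0.2, 100))
    print("0.5 M EDTA (ml):", stock_volume_ml(0.5, 0.04, 100))
    print(extraction_buffer(20.0))
```

At 800 µl per sample, 24 samples need 19.2 ml, so the 20 ml batch in the protocol leaves a small margin for pipetting losses.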
Sample
Grind tissues (buds or young needles) rapidly in liquid nitrogen using a pestle. Add 800 µl of buffer to each tube. Vortex briefly.

Protocol
1. Incubate at 65 °C for an hour. Shake every 10 minutes to release trapped DNA.
2. Spin tubes at high speed for 1-2 minutes.
3. Pour off the supernatant into new 1.5 ml tubes. Add 20 µl of 500 µg/ml RNase (1 µl of 10 mg/ml) to each tube. Incubate at 37 °C for 30-45 minutes.
4. Add 700 µl of chloroform:isoamyl alcohol (24:1). Rotamix for 30 minutes. Spin for 10 minutes at high speed.
5. Transfer the top layer to new 1.5 ml tubes. Add 2/3 volume (400 µl) of ice-cold isopropanol to each tube. Mix gently. Place tubes at -20 °C for at least 30 minutes, preferably overnight.
6. Spin tubes for 30 minutes at high speed at 4 °C.
7. Wash the pellet with 200 µl of ice-cold 70% ethanol. Centrifuge for 10 minutes at high speed, then carefully pour off the ethanol. Repeat step 7.
8. Dry the DNA in the fume hood (approximately 15-20 minutes) or in a SpeedVac for 5 minutes.
9. Resuspend the DNA in 50 or 100 µl of sterile water at 65 °C for an hour. Vortex and quick spin.
10. Prepare 10X diluted samples for DNA quantification.

APPENDIX III
Log-likelihood G test on segregation ratios of 19 microsatellite loci in P.
contorta spp latifolia seeds Locus Genotype Observed ratio Pooled G P Heterogeneity G P df PtTX2123 200:203 64:61 0.81 0.37 2.22 0.99 10 PtTX2128 222:234 17:19 0.11 0.74 1.86 0.40 1 228:231 12:8 0.81 0.37 228:234 36:26 1.62 0.20 3.73 0.29 3 228:237 9:6 0.60 0.44 234:237 9:10 0.05 0.82 PtTX2146 157:196 17:16 0.03 0.86 0.03 0.87 1 157:199 9:9 0.00 1.00 157:202 13:7 1.83 0.18 160:196 10:8 0.22 0.64 169:178 9:9 0.00 1.00 169:196 15:15 0.00 1.00 0.13 0.71 1 169:199 8:10 0.22 0.64 172:196 25:30 0.46 0.50 1.50 0.47 2 178:196 8:8 0.00 1.00 178:205 8:6 0.29 0.59 187:196 9:8 0.06 0.81 190:193 10:9 0.05 0.82 190:196 14:29 5.34 0.02 0.07 0.80 1 PtTX3011 151:169 9:8 0.06 0.81 151:175 8:10 0.22 0.64 157:163 7:13 1.83 0.18 157:172 6:8 0.29 0.59 157:187 7:11 0.90 0.34 160:166 9:9 0.00 1.00 160:178 10:10 0.00 1.00 166:178 10:7 0.53 0.47 166:187 8:11 0.48 0.49 166:193 9:10 0.05 0.82 169:175 8:10 0.22 0.64 169:178 7:9 0.25 0.62 169:181 8:11 0.48 0.49 169:190 11:9 0.20 0.65 172:175 25:31 0.64 0.42 0.47 0.79 2 172:178 10:7 0.53 0.47 172:193 8:8 0.00 1.00 175:208 8:6 0.29 0.59 175:211 11:6 1.49 0.22 178:187 8:10 0.22 0.64 178:190 13:7 1.83 0.18 187:190 8:12 0.81 0.37 187:205 11:9 0.20 0.65 187:220 11:8 0.48 0.49 PtTX3025 266:275 15:18 0.27 0.60 1.35 0.25 1 266:305 15:13 0.14 0.71 0.14 0.70 1 257:266 20:26 0.78 0.38 0.35 0.84 2 257:275 26:25 0.02 0.89 0.26 0.88 2 PtTX3029 265:274 8:7 0.07 0.80 265:277 7:9 0.25 0.62 268:271 6:13 2.64 0.10 271:274 20:12 2.02 0.16 0.14 0.71 1 271:277 10:10 0.00 1.00 271:286 10:9 0.05 0.82 274:277 10:6 1.01 0.31 277:283 12:8 0.81 0.37 PtTX3030 300:320 9:11 0.20 0.65 318:320 10:18 2.32 0.13 0.00 1.00 1 320:324 8:8 0.00 1.00 320:326 23:25 0.08 0.77 0.14 0.93 2 323:326 24:15 2.10 0.15 0.21 0.65 1 Locus Genotype Observed ratio Pooled G P Heterogeneity G P df PtTX3034 197:213 8:10 0.22 0.64 199:207 14:16 0.13 0.71 0.00 1.00 1 201:209 12:8 0.81 0.37 201:221 9:10 0.05 0.82 203:207 8:9 0.06 0.81 203:211 8:11 0.48 0.49 205:219 10:8 0.22 0.64 207:209 8:10 0.22 0.64 
207:213 18:16 0.12 0.73 0.13 0.72 1 207:219 22:13 2.34 0.13 0.08 0.78 1 209:217 11:8 0.48 0.49 209:219 29:29 0.00 1.00 0.00 1.00 1 213:215 12:8 0.81 0.37 215:219 9:8 0.06 0.81 PtTX3049 302:304 9:11 0.20 0.65 302:310 9:11 0.20 0.65 302:322 6:5 0.09 0.76 302:326 8:7 0.07 0.80 304:312 10:7 0.53 0.47 304:318 7:9 0.25 0.62 304:322 11:8 0.48 0.49 310:316 7:6 0.08 0.78 312:322 19:19 0.00 1.00 0.42 0.52 1 316:326 9:9 0.00 1.00 316:328 12:8 0.81 0.37 318:322 11:9 0.20 0.65 322:334 9:7 0.25 0.62 PtTX3052 239:248 19:17 0.11 0.74 0.66 0.72 2 248:254 8:11 0.48 0.49 PtTX3107 156:177 7:7 0.00 1.00 156:180 8:11 0.48 0.49 171:177 18:20 0.11 0.75 5.41 0.02 1 174:177 8:10 0.22 0.64 177:183 9:7 0.25 0.62 PtTX3127 178:187 39:36 0.12 0.73 1.52 0.82 4 178:190 12:8 0.81 0.37 178:202 11:8 0.48 0.49 181:187 6:9 0.60 0.44 187:190 11:9 0.20 0.65 187:199 9:6 0.60 0.44 187:202 10:8 0.22 0.64 PtTX4046 335:341 6:6 0.00 1.00 327:331 6:12 2.04 0.15 331:341 5:7 0.33 0.56 331:335 12:8 0.81 0.37 PtTX4054 270:274 13:7 1.83 0.18 272:284 7:10 0.53 0.47 274:290 11:9 0.20 0.65 280:290 10:8 0.22 0.64 280:292 19:14 0.76 0.38 0.07 0.80 1 280:298 8:12 0.81 0.37 282:284 11:9 0.20 0.65 282:290 17:15 0.13 0.72 0.00 0.98 1 282:292 7:10 0.53 0.47 284:286 8:12 0.81 0.37 284:292 9:6 0.60 0.44 284:296 11:8 0.48 0.49 286:288 7:10 0.53 0.47 286:292 7:9 0.25 0.62 286:298 9:9 0.00 1.00 290:296 20:14 1.06 0.30 0.49 0.49 1 290:298 7:9 0.25 0.62 292:294 21:17 0.42 0.52 0.38 0.54 1 292:300 9:9 0.00 1.00 292:302 11:8 0.48 0.49 294:296 13:7 1.83 0.18 294:296 12:8 0.81 0.37 Locus PtTX4056 PtTX4058 PtTX4139 LOP5 LOP 11 Genotype Observed ratio Pooled G P Heterogeneity G P df 411:417 10:9 0.05 0.82 411:431 8:9 0.06 0.81 427:429 6:6 0.00 1.00 427:449 10:7 0.53 0.47 429:431 9:9 0.00 1.00 429:441 8:8 0.00 1.00 429:445 16:18 0.12 0.73 0.00 1.00 1 429:447 9:9 0.00 1.00 429:461 6:8 0.29 0.59 439:447 9:11 0.20 0.65 128:142 18:14 0.50 0.48 0.05 0.82 1 128:148 9:8 0.06 0.81 134:148 8:11 0.48 0.49 140:148 9:11 0.20 0.65 140:152 12:7 1.33 
0.25 140:158 10:9 0.05 0.82 140:160 7:7 0.00 1.00 142:144 22:16 0.95 0.33 0.15 0.70 1 142:146 17:17 0.00 1.00 1.06 0.30 1 144:148 18:18 0.00 1.00 0.45 0.50 1 144:150 10:7 0.53 0.47 144:152 7:9 0.25 0.62 144:154 9:7 0.25 0.62 144:156 9:10 0.05 0.82 146:148 7:13 1.83 0.18 146:158 6:9 0.60 0.44 148:152 12:7 1.33 0.25 154:160 13:7 1.83 0.18 113:129 19:19 0.00 1.00 0.00 1.00 1 113:135 23:27 0.32 0.57 0.03 0.99 2 113:137 11:6 1.49 0.22 113:141 17:16 0.03 0.86 0.26 0.61 1 113:149 10:9 0.05 0.82 127:131 21:15 1.00 0.32 0.11 0.74 1 127:139 10:9 0.05 0.82 129:139 24:15 2.10 0.15 1.25 0.26 1 131:135 10:7 0.53 0.47 131:139 10:7 0.53 0.47 131:157 8:10 0.22 0.64 133:141 9:7 0.25 0.62 135:137 9:8 0.06 0.81 135:151 9:6 0.60 0.44 137:147 12:8 0.81 0.37 168:170 8:8 0.00 1.00 168:178 25:16 1.99 0.16 0.27 0.61 1 170:172 10:8 0.22 0.64 170:174 9:7 0.25 0.62 170:176 10:7 0.53 0.47 170:178 15:17 0.13 0.72 0.47 0.49 1 174:178 10:8 0.22 0.64 174:186 10:10 0.00 1.00 176:184 8:12 0.81 0.37 176:186 7:10 0.53 0.47 178:192 6:13 2.64 0.10 182:188 9:7 0.25 0.62 184:210 8:12 0.81 0.37 241:247 8:12 0.81 0.37 243:245 9:8 0.06 0.81 243:247 7:10 0.53 0.47 243:249 22:14 1.79 0.18 0.07 0.79 1 243:251 15:15 0.00 1.00 0.54 0.46 1 245:247 30:24 0.67 0.41 0.68 0.71 2 247:255 12:8 0.81 0.37 249:251 12:8 0.81 0.37 249:261 9:10 0.05 0.82 249:265 7:11 0.90 0.34 251:253 11:9 0.20 0.65 
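
The pooled G and P columns above can be reproduced with a short calculation. The sketch below is a minimal illustration, assuming a 1:1 expected segregation ratio, one degree of freedom, and no continuity or Williams correction. These assumptions are not stated in the table itself, but they reproduce entries such as 14:29 (G = 5.34, P = 0.02) and 12:8 (G = 0.81, P = 0.37). The function name is hypothetical, and the heterogeneity G across families is not covered.

```python
import math


def g_test_1to1(count_a, count_b):
    """Log-likelihood ratio (G) test of an observed a:b ratio against 1:1.

    Returns (G, P), where P is the upper tail of a chi-square distribution
    with one degree of freedom.
    """
    expected = (count_a + count_b) / 2.0
    g = 0.0
    for observed in (count_a, count_b):
        if observed > 0:  # a zero count contributes nothing to the sum
            g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    # For 1 df, the chi-square survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


if __name__ == "__main__":
    for a, b in [(14, 29), (12, 8), (9, 9)]:
        g, p = g_test_1to1(a, b)
        print(f"{a}:{b}  G = {g:.2f}  P = {p:.2f}")
```

Using `math.erfc` keeps the sketch free of external dependencies; for other degrees of freedom, a general chi-square survival function would be needed.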
