"Science, Faculty of"@en . "Botany, Department of"@en . "DSpace"@en . "UBCV"@en . "Rai, Hardeep Singh"@en . "2009-01-26T14:47:53Z"@en . "2008"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "To investigate vascular-plant phylogeny at deep levels of relationship, I collected and analyzed a large set of plastid-DNA data comprising multiple protein-coding genes and associated noncoding regions. I addressed questions relating to overall tracheophyte phylogeny, including relationships among the five living lineages of seed plants, and within two of the largest living gymnosperm clades (conifers and cycads). I also examined relationships within and among the major lineages of monilophytes (ferns and relatives), including their relationship to the remaining vascular plants. Overall, I recovered three well-supported lineages of vascular plants: lycophytes, monilophytes, and seed plants. I inferred strong support for most of the phylogenetic backbones of cycads and conifers. My results suggest that the cycad family Stangeriaceae (Stangeria and Bowenia) is not monophyletic, and that Stangeria is instead more closely related to Zamia and Ceratozamia. Within the conifers, I found Pinaceae to be the sister-group of all other conifers, and I argue that two conifer genera, Cephalotaxus and Phyllocladus (often treated as monogeneric families) should be recognized under Taxaceae and Podocarpaceae, respectively. Systematic error likely affects inference of the placement of Gnetales within seed-plant phylogeny. As a result, the question of the relationships among the five living seed-plant groups still remains largely unresolved, even though removal of the most rapidly evolving characters appears to reduce systematic error. 
Phylogenetic analyses that included these rapidly evolving characters often led to the misinference of the “Gnetales-sister” hypothesis (Gnetales as the sister-group of all other seed plants), especially when maximum parsimony was the inference method. Filtering of rapidly evolving characters had little effect on inference of higher-order relationships within conifers and monilophytes, and generally resulted in reduced support for backbone relationships. Within the monilophytes, I found strong support for the majority of relationships along the backbone. These were generally congruent with other recent studies. Equisetaceae and Marattiaceae may be, respectively, the sister-groups of the remaining monilophytes and of the leptosporangiate ferns, but relationships among the major monilophyte lineages are sensitive to the outgroups used, and to long branches in lycophytes."@en . "https://circle.library.ubc.ca/rest/handle/2429/3889?expand=metadata"@en . "21953827 bytes"@en . "application/pdf"@en . " MOLECULAR PHYLOGENETIC STUDIES OF THE VASCULAR PLANTS by Hardeep Singh Rai A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Botany) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) December 2008 © Hardeep Singh Rai, 2008
ABSTRACT
To investigate vascular-plant phylogeny at deep levels of relationship, I collected and analyzed a large set of plastid-DNA data comprising multiple protein-coding genes and associated noncoding regions. I addressed questions relating to overall tracheophyte phylogeny, including relationships among the five living lineages of seed plants, and within two of the largest living gymnosperm clades (conifers and cycads). I also examined relationships within and among the major lineages of monilophytes (ferns and relatives), including their relationship to the remaining vascular plants.
Overall, I recovered three well-supported lineages of vascular plants: lycophytes, monilophytes, and seed plants. I inferred strong support for most of the phylogenetic backbones of cycads and conifers. My results suggest that the cycad family Stangeriaceae (Stangeria and Bowenia) is not monophyletic, and that Stangeria is instead more closely related to Zamia and Ceratozamia. Within the conifers, I found Pinaceae to be the sister-group of all other conifers, and I argue that two conifer genera, Cephalotaxus and Phyllocladus (often treated as monogeneric families) should be recognized under Taxaceae and Podocarpaceae, respectively. Systematic error likely affects inference of the placement of Gnetales within seed-plant phylogeny. As a result, the question of the relationships among the five living seed-plant groups still remains largely unresolved, even though removal of the most rapidly evolving characters appears to reduce systematic error. Phylogenetic analyses that included these rapidly evolving characters often led to the misinference of the “Gnetales-sister” hypothesis (Gnetales as the sister-group of all other seed plants), especially when maximum parsimony was the inference method. Filtering of rapidly evolving characters had little effect on inference of higher-order relationships within conifers and monilophytes, and generally resulted in reduced support for backbone relationships. Within the monilophytes, I found strong support for the majority of relationships along the backbone. These were generally congruent with other recent studies. Equisetaceae and Marattiaceae may be, respectively, the sister-groups of the remaining monilophytes and of the leptosporangiate ferns, but relationships among the major monilophyte lineages are sensitive to the outgroups used, and to long branches in lycophytes.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
DEDICATION
CO-AUTHORSHIP STATEMENT
CHAPTER 1. INTRODUCTION
  1.1 Overview of Vascular Plant Systematics
  1.2 Objectives of the Thesis
  1.3 References
CHAPTER 2. INFERENCE OF HIGHER-ORDER RELATIONSHIPS IN THE CYCADS FROM A LARGE CHLOROPLAST DATA SET
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Taxonomic and Genomic Sampling
    2.2.2 DNA Extraction, Amplification, and Sequencing
    2.2.3 Data Assembly
    2.2.4 Phylogenetic Analysis
  2.3 Results
    2.3.1 26-Taxon Data Set
    2.3.2 Cycads and Ginkgo
    2.3.3 Molecular Evolution in Cycads and Relatives
  2.4 Discussion
    2.4.1 Cycad Molecular Evolution
    2.4.2 Basal Cycads and the Sister Group of Cycadales
  2.5 References
CHAPTER 3. INFERENCE OF HIGHER-ORDER CONIFER RELATIONSHIPS FROM A MULTI-LOCUS PLASTID DATA SET
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Plant Material and Genomic Sampling
    3.2.2 Recovery of Plastid Sequences, DNA Alignment and Characterization of an Indel Hotspot
    3.2.3 Phylogenetic Analyses
  3.3 Results
  3.4 Discussion
    3.4.1 Rapidly Evolving Plastid DNA Sites and the Inference of Higher-Order Conifer Relationships
    3.4.2 The Ovulate Cone and Conifer Systematics
    3.4.3 The Case for Recognizing Cephalotaxus as a Member of Taxaceae
    3.4.4 Relationships within Cupressaceae
    3.4.5 The Higher-Order Position of Phyllocladus in Conifer Phylogeny
    3.4.6 The Position of Wollemia within Araucariaceae
    3.4.7 Significance of an Expansion Hotspot in the Plastid Ribosomal Protein Gene rps7
    3.4.8 Seed-Plant Phylogeny and the Position of Gnetales
    3.4.9 The Inference of Conifer Phylogeny from Plastid Data
  3.5 References
CHAPTER 4. INFERENCE AND MISINFERENCE OF HIGHER-ORDER SEED-PLANT RELATIONSHIPS FROM PLASTID DATA
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Taxonomic and Genomic Sampling
    4.2.2 DNA Extraction, Amplification, Sequencing and Data Assembly
    4.2.3 Phylogenetic Analyses
    4.2.4 Inference of Nucleotide Rate Classes
    4.2.5 Systematic Error
  4.3 Results
    4.3.1 Phylogenetic Analysis of the Real Data
    4.3.2 Inference of Systematic Error Using Monte Carlo Simulations
    4.3.3 Mis-inference of the Gnetales-Sister Hypothesis when there is No Evidence for It
  4.4 Discussion
  4.5 References
CHAPTER 5. INFERENCE OF DEEP VASCULAR-PLANT PHYLOGENY, WITH A FOCUS ON BACKBONE RELATIONSHIPS IN MONILOPHYTA
  5.1 Introduction
  5.2 Materials and Methods
    5.2.1 Taxonomic and Genomic Sampling
    5.2.2 DNA Extraction, Amplification and Sequencing
    5.2.3 Phylogenetic Analyses
    5.2.4 Inference of Nucleotide Rate Classes and Exploration of the Effect of Long Branches
  5.3 Results
    5.3.1 Phylogenetic Analyses
    5.3.2 Rate Class Analyses
  5.4 Discussion
  5.5 References
CHAPTER 6. CONCLUSION
  6.1 Overall Conclusions
    6.1.1 Reconstruction of Higher-Order Relationships in Cycad Phylogeny
    6.1.2 Reconstruction of Higher-Order Relationships in Conifer Phylogeny
    6.1.3 Seed-Plant Phylogeny: Inference and Misinference of Higher-Order Relationships
    6.1.4 Monilophytes and Deep Vascular-Plant Phylogeny
  6.2 Summary and Future Directions
  6.3 References
LIST OF TABLES
  2.1 GenBank accession numbers and vouchers for exemplar taxa with one or more previously unpublished DNA sequences
  2.2 Lengths and variation in length of the noncoding regions used in this study
  2.3 Likelihood ratio test (LRT) for different substitution models
  2.4 Estimated likelihood parameters (HKY + Γ + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2
  2.5 Estimated likelihood parameters (GTR + Γ + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2
  3.1 Source information and GenBank numbers
  4.1 New primers designed for this study
  4.2 Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum parsimony (MP) as a search criterion
  4.3 Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum likelihood (ML) as a search criterion
  4.4 Major seed-plant hypotheses inferred from simulations of various partitions of the real data constrained to the anthophyte hypothesis (Gnetales united with angiosperms). Both maximum parsimony (MP) and maximum likelihood results are shown
  5.1 GenBank accession numbers and vouchers for exemplar pteridophyte (and outgroup) taxa
  5.2 New primers designed for this study
LIST OF FIGURES
  1.1 Generalized phylogenetic relationships of the land plants
  2.1 Summary of Stevenson’s (1992) classification of the cycads
  2.2 Chloroplast based phylogeny of the cycads and relatives
  2.3 Chloroplast based phylogeny of the cycads using different optimality criteria
  3.1 Plastid-based phylogeny of the conifers and relatives inferred from MP
  3.2 Plastid-based phylogeny of the conifers and relatives inferred from ML
  3.3 Summary of bootstrap support after removal of sites classified as the two of nine fastest rate classes
  3.4 Dot-plot showing the pairwise similarity of complete translated sequences of the plastid rps7 locus from selected conifers
  4.1 Various seed-plant topologies proposed in the literature with regard to the position of Gnetales
  4.2 Plastid-based phylogeny of the conifers and relatives
  4.3 Maximum likelihood tree found using coding regions from 17 plastid genes and including all 9 rate classes
  4.4A Proportion of the total nucleotides in each of two codon-position plastid data partitions that belong to different rate classes
  4.4B Proportion of the total characters in each of nine rate classes that belong to the two codon position data partitions
  4.5 Maximum likelihood tree found using codon positions 1 and 2 for multiple plastid genes
  4.6 Depiction of the zero-length branch when maximum likelihood is used as the criterion for viewing the anthophyte hypothesis
  Sup.1 Relationships within the conifer clades presented in Figs. 4.2 and 4.3
  5.1 The consensus tree presented in Smith et al. (2006) based on recent and ongoing phylogenetic studies
  5.2 Plastid-based phylogeny of the vascular plants
  5.3 Maximum likelihood tree found using 17 plastid genes and associated noncoding regions
  5.4 Maximum likelihood tree found using 17 plastid genes and associated noncoding regions, excluding the two fastest rate classes and including or excluding Selaginella
  5.5 Placement of Equisetum from various taxon-exclusion analyses for the plastid data considered here
ACKNOWLEDGEMENTS
I am indebted to my Ph.D. supervisor, Dr. Sean W. Graham, for his guidance over the course of my graduate career, and for allowing me to pursue my research interests in his laboratory. I also thank the members of my supervisory committees at the University of Alberta (Drs. Ruth Stockey and Felix Sperling) and the University of British Columbia (Drs. Jeannette Whitton, Quentin Cronk, Wayne Maddison, and Patrick Keeling) for their help and guidance throughout my graduate career. My laboratory, herbarium, and field work has been supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants to Sean Graham, the University of British Columbia, and the Department of Botany (University of British Columbia). Personal financial support was provided by a Postgraduate Scholarship (PGS-D), a University Graduate Fellowship (University of British Columbia), and an NSERC Discovery Grant to Sean W. Graham.
DEDICATION
For my family Michelle, Symrin and Darshan Rai
CO-AUTHORSHIP STATEMENT
Chapter 2 is based on a published manuscript: Rai, H. S., H. E. O’Brien, P. A. Reeves, R. G. Olmstead, and S. W. Graham. 2003.
Inference of higher-order relationships in the cycads from a large chloroplast data set. Molecular Phylogenetics and Evolution 29: 350-359. The project was suggested by S. W. Graham. I conducted all laboratory work and data analyses, and wrote the manuscript. S. W. Graham provided insights into data analyses and contributed to the writing.
Chapter 3 is based on a published manuscript: Rai, H. S., P. A. Reeves, R. Peakall, R. G. Olmstead, and S. W. Graham. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 86: 658-669. Sean W. Graham and I designed the project. I conducted all laboratory work and data analyses, and wrote the manuscript. S. W. Graham provided insights into taxonomic sampling and data analyses, and contributed to the writing.
Chapter 4 is a draft manuscript that will be submitted for publication: Rai, H. S., and S. W. Graham. Inference and misinference of higher-order seed-plant relationships from plastid data. Sean W. Graham and I designed the project. I carried out the laboratory work, designed 27 new seed-plant-specific primers, carried out data analyses and wrote the manuscript. Sean W. Graham provided valuable guidance with respect to data analyses and writing.
Chapter 5 is a draft manuscript that will be submitted for publication: Rai, H. S., and S. W. Graham. Deep vascular plant phylogeny with a focus on the backbone of Monilophyta. Sean W. Graham and I designed the project. I carried out the laboratory work, designed 16 new monilophyte-specific primers, carried out data analyses and wrote the manuscript. Many plant and DNA samples were kindly provided by Drs. P. G. Wolf (Utah State University) and K. M. Pryer (Duke University). Sean W. Graham provided valuable guidance with respect to taxonomic sampling, data analyses and writing.
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW OF VASCULAR PLANT SYSTEMATICS
The vascular plants (tracheophytes) include all extant plants with branched sporophytes.
They usually have true roots, stems and leaves and possess a system of vascular tissue that transports water and nutrients between different parts of the plant. With a known fossil record that stretches back at least 410 Myr (Lang, 1937), tracheophytes include true rhyniophytes (such as Rhynia gwynne-vaughanii; Kenrick and Crane, 1997), small plants with simple bifurcating stems that are most likely the sister-group of all other vascular plants (Judd et al., 2008). Several extinct lineages were once placed within the rhyniophytes, most notably Cooksonia cambrensis and Aglaophyton (Rhynia) major. Cooksonia has recently been found to be a non-monophyletic assemblage of extinct species, with Cooksonia cambrensis more closely related to lycophytes (Fig. 1.1) than to other described Cooksonia species that are more closely related to the euphyllophytes (Kenrick and Crane, 1997; Crane et al., 2004). Aglaophyton major (originally described as Rhynia major) has been shown to lack true secondary thickening in its xylem and thus is no longer classified as a vascular plant (although it is likely to be the sister-group of tracheophytes; Edwards, 1986; Crane et al., 2004).
Broadly, extant tracheophytes can be viewed as three distinct lineages: spermatophytes (seed plants), monilophytes (including all living ferns), and lycophytes. Immature sporophytes (embryos) of spermatophytes are enveloped in one or more integument layers within the seed, together with nutritive tissue that is often derived from the megasporangium or megagametophyte. Seed plants, and especially flowering plants, make up much of the world’s current plant diversity. The lycophytes are the most species-poor of the three major vascular-plant clades, and are now recognized as the sister-group of all other extant vascular plants (Fig. 1.1; Raubeson and Jansen, 1992; Kenrick and Crane, 1997; Pryer et al., 2001, 2004; Rydin et al., 2002).
Within the extant euphyllophytes (spermatophytes and monilophytes), recent studies have revealed two major clades: monilophytes (including whisk ferns, horsetails, and eusporangiate and leptosporangiate ferns), and spermatophytes (Pryer et al., 2001, 2004). Despite considerable progress regarding relationships within many of their constituent subclades (e.g., the angiosperms, APG II, 2003; conifers, Quinn et al., 2002; leptosporangiate ferns, Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007) and a large number of studies using both morphological and molecular evidence (Raubeson and Jansen, 1992; Rothwell and Serbet, 1994; Kranz and Huss, 1996; Kenrick and Crane, 1997; Doyle, 1998; Duff and Nickrent, 1999; Rothwell, 1999; Soltis et al., 1999; Nickrent et al., 2000; Pryer et al., 2001, 2004; Rydin et al., 2002; Doyle, 2006; Rothwell and Nixon, 2006), deep relationships among these vascular-plant groups remain largely unsettled.
The monilophytes (Monilophyta; Cantino et al., 2007) comprise the eusporangiate ferns, psilotophytes (whisk ferns), equisetophytes (horsetails) and leptosporangiate ferns. Previous studies of the monilophytes, based on morphology and single-gene molecular data, have left the relationships among major taxa only partly resolved (Duff and Nickrent, 1999; Rothwell, 1999; Kenrick and Crane, 1997; Kranz and Huss, 1996; Pryer et al., 1995; Smith, 1995; Manhart, 1994; Pichi Sermolli, 1974). A four-gene study (Pryer et al., 2001, 2004) of vascular plants and a subsequent five-gene follow-up (Schuettpelz et al., 2006) have provided strong bootstrap support for some of the deepest relationships, mostly within leptosporangiate ferns.
The reconstruction of seed-plant relationships is recognized as one of the most difficult problems in plant systematics (e.g., Donoghue and Doyle, 2000).
A wide range of studies using evidence from morphology and molecules have given many different and often strongly conflicting results (e.g., Hill and Crane, 1982; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Rothwell and Serbet, 1994; Boivin et al., 1996; Doyle, 1996; Goremykin et al., 1996; Chaw et al., 1997, 2000; Bowe et al., 2000; Sanderson, 2000; Rydin et al., 2002; Burleigh and Mathews, 2004, 2007a, b). One problem that has plagued researchers working with molecular evidence is that gymnosperms (particularly the conifers among living taxa) have a diverse fossil record but relatively few extant representatives (e.g., Ginkgo with only a single living species as a remnant of a more diverse Mesozoic group; Thomas and Spicer, 1986), and so all extant lineages are subtended by very long interior branches, which may lead to long-branch attraction (Felsenstein, 1978; Penny and Hendy, 1985; Hendy and Penny, 1989). Approaches for dealing with these problematic long branches include denser taxonomic sampling within extant clades, consideration of conservatively evolving data (e.g., the plastid genome of plants, which displays a slower rate of evolution than its nuclear counterpart; Graham and Olmstead, 2000), adding additional molecular data, and examining the data collected for evidence of systematic bias (e.g., Sanderson et al., 2000). In this thesis I attempt to combine these approaches, using what currently represents the largest amount of data sampled per taxon with this level of taxonomic sampling.
1.2 OBJECTIVES OF THE THESIS
I present data from multiple plastid genes for a relatively dense taxonomic sampling of seed plants [seven cycads representing all tribes recognized in Stevenson, 1992, Chapter 2; 22 conifers with multiple representatives for each of the seven families that are usually recognized, Chapter 3; and nine additional exemplar representatives of the remaining seed plant diversity (three Gnetales, five angiosperms, and Ginkgo); Chapter 4] to examine relationships both within and among these seed plant groups. In addition to basic phylogenetic reconstructions, I examine the effect of fast-evolving data (e.g., third codon positions, and the fastest evolving rate classes according to a maximum likelihood classification; Chapters 3, 4, 5) on phylogenetic inference, and also address the potential for misinference due to systematic error using Monte Carlo simulations (e.g., Sanderson et al., 2000; Chapter 4). In Chapter 5 I broadly explore monilophyte relationships using 64 representative taxa (34 of these are monilophyte exemplars).
An overarching goal of this thesis is to solidify our basic knowledge of deep vascular-plant relationships, with a focus on the deepest evolutionary relationships within these groups: cycads (Chapter 2), conifers (Chapter 3), seed plants as a whole (Chapter 4), and monilophytes, especially leptosporangiate ferns (Chapter 5). This thesis is based on a large sampling of the coding and noncoding regions of the plastid genome. These regions span 17 genes that form the backbone of all data collected for this work and include atpB, rbcL, ten Photosystem II (psb) genes, three ribosomal protein genes, and two NADH dehydrogenase subunit (ndh) genes.
Three chapters (Chapters 2, 3, 5) also include 10-11 associated noncoding regions: three introns and eight intergenic spacer regions [leptosporangiate ferns possess a large inversion with a breakpoint within one of the intergenic spacers (Wolf et al., 2003), and so one fewer region was included for them]. The regions surveyed are generally extremely slowly evolving (Graham and Olmstead, 2000), and should prove useful in addressing the impact of systematic bias in vascular-plant phylogenetic inference. Beyond inferring the deep portions of the vascular-plant “Tree of Life,” the larger significance of my thesis work is that it will provide more resolved and better supported phylogenies at each of these deep levels of relationship, for use by evolutionary and genomic biologists (for example, for studying the molecular evolution of the plastid genome) and by systematists (for constructing more natural classification schemes). It also uses various approaches to gauge the degree to which tree inference can be trusted at the deepest and most difficult-to-infer points of plant phylogeny.
Figure 1.1. Generalized phylogenetic relationships of the land plants, modified from Judd et al. (2008; Fig. 7.8). Red bars indicate some synapomorphies that define several major clades (tracheophytes, lycophytes, and euphyllophytes). † indicates extinct taxa.
1.3 REFERENCES
ANGIOSPERM PHYLOGENY GROUP (APG II). 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141: 399-436.
BOIVIN, R., M. RICHARD, D. BEAUSEIGLE, J. BOUSQUET, AND G. BELLEMARE. 1996. Phylogenetic inferences from chloroplast chlB gene sequences of Nephrolepis exaltata (Filicopsida), Ephedra altissima (Gnetopsida), and diverse land plants. Mol. Phylogenet. Evol. 6: 19-29.
BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000.
Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proc. Natl. Acad. Sci. 97: 4092-4097.
BURLEIGH, J. G. AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Am. J. Bot. 91: 1599-1613.
BURLEIGH, J. G. AND S. MATHEWS. 2007a. Assessing among-locus variation in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 111-124.
BURLEIGH, J. G. AND S. MATHEWS. 2007b. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 125-135.
CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822-846.
CHAW, S-M., A. ZHARKIKH, H-M. SUNG, T-C. LAU, AND W-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.
CHAW, S., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Natl. Acad. Sci. 97: 4086-4091.
CRANE, P. R., P. HERENDEEN, AND E. M. FRIIS. 2004. Fossils and plant phylogeny. Am. J. Bot. 91: 1683-1699.
DONOGHUE, M. J. AND J. A. DOYLE. 2000. Demise of the anthophyte hypothesis? Curr. Biol. 10: R106-R109.
DOYLE, J. A. 1996. Seed plant phylogeny and the relationships of Gnetales. Int. J. Plant Sci. 157: S3-S39.
DOYLE, J. A. 1998. Phylogeny of vascular plants. Ann. Rev. Ecol. Syst. 29: 567-599.
DOYLE, J. A. 2006. Seed ferns and the origin of angiosperms. J. Torrey Bot. Soc. 133: 169-209.
DOYLE, J. A., AND M. J. DONOGHUE. 1986. Seed plant phylogeny and the origin of angiosperms: An experimental cladistic approach. Bot. Rev. 52: 321-431.
DUFF, R. J. AND D. L. NICKRENT. 1999.
Phylogenetic relationships of land plants using mitochondrial small-subunit rDNA sequences. Am. J. Bot. 86: 372-386.
EDWARDS, D. S. 1986. Aglaophyton major, a non-vascular land-plant from the Devonian Rhynie chert. Bot. J. Linn. Soc. 93: 173-204.
FELSENSTEIN, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27: 401-410.
GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A. ANTONOV, AND W. MARTIN. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Mol. Biol. Evol. 13: 383-396.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.
HENDY, M. D. AND D. PENNY. 1989. Framework for the quantitative study of evolutionary trees. Syst. Zool. 38: 297-309.
HILL, C. R., AND P. R. CRANE. 1982. Evolutionary cladistics and the origin of the angiosperms. Pages 269-361 in Problems of phylogenetic reconstruction (K. A. Joysey and E. A. Friday, eds.). Academic Press, London.
JUDD, W., C. S. CAMPBELL, E. A. KELLOGG, P. F. STEVENS, AND M. J. DONOGHUE. 2008. Plant systematics: A phylogenetic approach. 3rd ed. Sinauer Associates, Sunderland, Massachusetts.
KENRICK, P. AND P. R. CRANE. 1997. The origin and early diversification of land plants: a cladistic study. Smithsonian Press, Washington, D.C., USA.
KRANZ, H. D., AND V. A. R. HUSS. 1996. Molecular evolution of pteridophytes and their relationship to seed plants: evidence from complete 18S rRNA gene sequences. Plant Syst. Evol. 202: 1-11.
LANG, W. H. 1937. On the plant remains from the Downtonian of England and Wales. Phil. Trans. R. Soc. 227B: 245-291.
LOCONTE, H. AND D. W. STEVENSON. 1990. Cladistics of the Spermatophyta. Brittonia 42: 197-211.
MANHART, J. R. 1994. Phylogenetic analysis of green plant rbcL sequences. Mol. Phylogenet. Evol. 3: 114-127.
NICKRENT, D. L., C. L.
PARKINSON, J. D. PALMER, AND R. J. DUFF. 2000. Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol. Biol. Evol. 17: 1885-1895.
PENNY, D., AND M. D. HENDY. 1985. Testing methods of evolutionary tree construction. Cladistics 1: 266-272.
PICHI SERMOLLI, R. E. G. 1977. Tentamen pteridophytorum genera in taxonomicum ordinem redigendi. Webbia 31: 313-512.
PRYER, K. M., H. SCHNEIDER, A. R. SMITH, R. CRANFILL, P. G. WOLF, J. S. HUNT, AND S. D. SIPES. 2001. Horsetail and ferns are a monophyletic group and the closest living relatives to seed plants. Nature 409: 618-622.
PRYER, K. M., E. SCHUETTPELZ, P. G. WOLF, H. SCHNEIDER, A. R. SMITH, AND R. CRANFILL. 2004. Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am. J. Bot. 91: 1582-1598.
QUINN, C. J., R. A. PRICE, AND P. A. GADEK. 2002. Familial concepts and relationships in the conifers based on rbcL and matK sequence comparisons. Kew Bull. 57: 513-531.
RAUBESON, L. A., AND R. K. JANSEN. 1992. Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255: 1697-1699.
ROTHWELL, G. W. 1999. Fossils and ferns in the resolution of land plant phylogeny. Bot. Rev. 65: 188-218.
ROTHWELL, G. W., AND R. SERBET. 1994. Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482.
ROTHWELL, G. W., AND K. C. NIXON. 2006. How does the inclusion of fossil data change our conclusions about the phylogenetic history of euphyllophytes? Int. J. Plant Sci. 167: 737-749.
RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000.
Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797.
SCHUETTPELZ, E., P. KORALL, AND K. M. PRYER. 2006. Plastid atpA data provide improved support for deep relationships among ferns. Taxon 55: 897-906.
SCHUETTPELZ, E. AND K. M. PRYER. 2007. Fern phylogeny inferred from 400 leptosporangiate species and three plastid genes. Taxon 56: 1037-1050.
SMITH, A. R. 1995. Non-molecular phylogenetic hypotheses for ferns. Am. Fern J. 85: 104-122.
SOLTIS, P. S., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple chloroplast genes as a tool for comparative biology. Nature 402: 402-404.
STEVENSON, D. W. 1992. A formal classification of the extant cycads. Brittonia 44: 220-223.
THOMAS, B. A. AND R. A. SPICER. 1986. The evolution and palaeobiology of land plants. Dioscorides Press, Portland, OR.
WOLF, P. G., C. A. ROWE, R. B. SINCLAIR, AND M. HASEBE. 2003. Complete nucleotide sequence of the chloroplast genome from a leptosporangiate fern, Adiantum capillus-veneris L. DNA Res. 10: 59-65.

CHAPTER 2¹ INFERENCE OF HIGHER-ORDER RELATIONSHIPS IN THE CYCADS FROM A LARGE CHLOROPLAST DATA SET

2.1 INTRODUCTION

The cycads (order Cycadales) are one of five living groups of seed-bearing plants. They are long-lived trees and shrubs, with a highly distinctive vegetative and reproductive morphology (e.g., Chamberlain, 1965; Stevenson, 1981, 1990; Norstog, 1990; Jones, 1993; Norstog and Nicholls, 1997). They have a substantial and ancient (ca. 270 million years ago; Mamay, 1969) fossil record, and are currently recognized as comprising three or four families (Cycadaceae, Stangeriaceae and Zamiaceae; Boweniaceae was erected by Stevenson in 1981 but reduced by him to a subfamily of Stangeriaceae in 1992). In all phylogenetic studies to date, the basal placement of the monogeneric family Cycadaceae within the order has not been disputed.
Cycas is one of the most distinctive entities, morphologically, in the cycads, and it is frequently used as an outgroup for studies of the remaining taxa. However, there is still substantial variation in published inferences of higher-order phylogenetic relationships among the remaining taxa, and most studies have not provided robust support (as measured by bootstrap or jackknife analysis, for example) for the broad backbone of cycad phylogeny (Crane, 1988; Stevenson, 1990; Caputo et al., 1993; Schutzman and Dehgan, 1993; Rydin et al., 2002; Bogler and Francisco-Ortega, 2004; see Caputo et al., 1991, for an exception). There is also a considerable difference of opinion about the relationship of the cycads to the other extant and extinct seed plant groups (see Doyle, 1998, for a recent summary of the diversity of findings regarding seed plant phylogeny).

¹ A version of this chapter has been published: RAI, H. S., H. E. O’BRIEN, P. A. REEVES, R. G. OLMSTEAD, AND S. W. GRAHAM. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359.

To better sort out higher-order relationships in the cycads I sequenced multiple coding and noncoding regions of the plastid genome (ca. one tenth of the entire genome) using primers designed by Graham and Olmstead (2000a). The regions examined comprise atpB, rbcL, ten Photosystem II genes, three ribosomal protein genes, two NADH dehydrogenase subunit genes, three introns and eight intergenic spacer regions (see below). These regions have proven useful for reconstructing phylogenetic relationships among deep branches of the angiosperms (Graham and Olmstead, 2000a,b; Graham et al., 2000) and within the monocots (Saarela et al., 2008). I examined exemplar species that represent all of the tribes recognized by Stevenson (1992; summarized in Fig. 2.1), in addition to a number of outgroup taxa (Table 2.1).
The question of cycad placement in the seed plants, and of seed plant relationships in general, remains a thorny issue (Hill and Crane, 1982; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Rothwell and Serbet, 1994; Boivin et al., 1996; Doyle, 1996, 1998; Goremykin et al., 1996; Chaw et al., 1997, 2000; Bowe et al., 2000; Sanderson et al., 2000; Rydin et al., 2002). I address the latter question briefly with a parsimony analysis using exemplars from all of the living seed plant groups and several free-sporing plants; a more detailed examination of seed plant relationships is presented in Chapter 4. While the focus of the current chapter is on phylogenetic relationships within the cycads, I also examine several features of the molecular evolution of the genes examined, and assess how these may have an impact upon inference of phylogeny within the order.

2.2 MATERIALS AND METHODS

2.2.1 Taxonomic and Genomic Sampling

A total of 17 chloroplast genes and associated noncoding regions were used for this study, representing about a tenth of the entire genome. The various coding and noncoding regions examined are enumerated in Tables 2.1 and 2.2. The seven exemplar species of cycads examined here represent all families, subfamilies and tribes recognized by Stevenson (1992; Fig. 2.1). Most of the sequences for six of the seven cycads (and three of the five conifer exemplars) are completely new (Table 2.1). These data were added to sequences that were generated previously for studies of basal angiosperm phylogeny (Graham and Olmstead, 2000a,b; Graham et al., 2000; see the former two references for source and voucher information for previously published seed plant sequences). A few additional sequences were added for taxa that were examined previously for fewer regions (Graham and Olmstead, 2000a,b), including ndhF for Zamia furfuracea and part of rps7, the rps7-ndhB intergenic spacer region, and ndhB for Sciadopitys verticillata.
The final matrix included eight species that represent a broad sampling of the diversity of basal angiosperms (see Mathews and Donoghue, 1999; Parkinson et al., 1999; Soltis et al., 1999; Graham and Olmstead, 2000a,b; Graham et al., 2000; Qiu et al., 2000; Savolainen et al., 2000), five conifers (Pinus thunbergii was obtained from GenBank; accession number D17510), seven exemplar cycads, three exemplar Gnetales (representing all extant families), and two outgroup species obtained from GenBank (Marchantia polymorpha and Psilotum nudum; GenBank accession numbers NC_00319 and NC_003386). GenBank numbers for Gnetum gnemon and six of the eight angiosperms are provided in Graham and Olmstead (2000a,b); previously unpublished numbers for the other angiosperms and Gnetales are provided here (Table 2.1; see also Graham et al., 2000).

2.2.2 DNA Extraction, Amplification and Sequencing

DNA was extracted from fresh and silica-dried specimens using the protocol of Doyle and Doyle (1987), except that I added 10% PVP 40 (polyvinylpyrrolidone) to the extraction buffer. DNA amplification and sequencing methods are as described in Graham and Olmstead (2000a), with the exception that a ca. 1.0 kb PCR fragment from Ceratozamia miqueliana (spanning four genes: psbB, psbT, psbN, and psbH) was cloned using the TOPO TA cloning kit (Invitrogen Corporation; Carlsbad, CA). This taxon appears to have at least one additional version of this region that includes several pseudogenes (H. S. Rai and S. W. Graham, unpublished data). All regions were sequenced at least twice for each taxon, and with a few minor exceptions were completely sequenced in both forward and reverse directions. Regions that I was unable to amplify or sequence, or that have been confirmed as lost from the plastid genome (ndhB and ndhF for Pinus thunbergii; Wakasugi et al., 1994), were coded as missing data in the final matrix.
The estimated percentage of length “missing” by taxon (relative to Nicotiana tabacum) is: Welwitschia mirabilis (35.8%); Ephedra nevadensis (35.5%); Gnetum gnemon (33.0%); Cedrus deodara and Pinus thunbergii (24.1%). The major genes that are missing or were not obtained for Cedrus and Pinus are ndhB and ndhF; these two genes and rpl2 were also not obtained for the Gnetales exemplars (Table 2.1). All of the noncoding regions (Table 2.2) were also excluded for Marchantia and Psilotum. A maximum of 5.5% of data (for Stangeria eriopus) was coded as missing for the remaining taxa.

2.2.3 Data Assembly

Contiguous sequences were compiled and base calling performed using Sequencher 4.1 (Gene Codes Corporation; Ann Arbor, MI). The consensus sequences for each taxon were exported into a previously generated alignment (Graham et al., 2000) that was adjusted manually in Se-Al version 1.0 (Rambaut, 1998) using criteria provided in Graham et al. (2000). The alignments were imported into PAUP* version 4.0b10 (Swofford, 2002) for compilation and analysis. Tobacco and Ginkgo sequences were used to determine gene and exon boundaries.

The intergenic spacer (IGS) regions of the Photosystem II genes (Table 2.2), and short portions of the rps7-ndhB IGS and the rpl2 intron (239 bp and 111 bp in the alignment, respectively) were difficult to align in the conifers. These characters were omitted in seed-plant-wide analyses. The regions considered for analysis comprise a total of 15,257 aligned nucleotide characters (corresponding to 13,320 nucleotides unaligned; reference taxon = Bowenia serrulata). Of these characters, 8,471 nucleotides are constant, 2,101 nucleotides are variable but uninformative, and 4,685 nucleotides are parsimony informative.
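The three character classes reported here (constant, variable but uninformative, and parsimony informative) can be illustrated with a minimal Python sketch. The toy alignment and the function name classify_columns are hypothetical, and treating anything other than A, C, G or T as missing data is a simplifying assumption, not a description of the PAUP* settings used for this study. A column is counted as parsimony informative when at least two character states each occur in at least two taxa.

```python
from collections import Counter


def classify_columns(alignment):
    # Tally constant, variable-uninformative and parsimony-informative columns.
    counts = {
        'constant': 0,
        'variable_uninformative': 0,
        'parsimony_informative': 0,
    }
    for column in zip(*alignment):
        # Assumption: gaps and ambiguity codes are ignored as missing data.
        tally = Counter(base for base in column if base in 'ACGT')
        # Number of states that are shared by at least two taxa.
        shared_states = sum(1 for n in tally.values() if n >= 2)
        if len(tally) <= 1:
            counts['constant'] += 1
        elif shared_states >= 2:
            counts['parsimony_informative'] += 1
        else:
            counts['variable_uninformative'] += 1
    return counts


# Hypothetical four-taxon alignment (not thesis data).
toy_alignment = ['ACGTA', 'ACGAA', 'ATGAC', 'ATGTG']
toy_counts = classify_columns(toy_alignment)
print(toy_counts)

# Arithmetic check on the class totals reported in the text for the
# seed-plant-wide matrix; they should sum to the aligned length.
reported = {
    'constant': 8471,
    'variable_uninformative': 2101,
    'parsimony_informative': 4685,
}
print(sum(reported.values()))
```

The final check simply confirms that the three reported classes account for all 15,257 aligned characters.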
In a second set of analyses that considered only the cycads and Ginkgo biloba, I included all of the noncoding regions, because homology assessment for these regions was straightforward at this level of comparison. This alignment spans a total of 16,181 aligned nucleotides (corresponding to 13,784 nucleotides unaligned; reference taxon = Bowenia serrulata), of which 14,412 nucleotides are constant, 1,393 nucleotides are variable but parsimony uninformative, and 376 nucleotide positions are parsimony informative.

2.2.4 Phylogenetic Analysis

Two sets of analyses were performed. The first set employed the alignment of the 24 seed plant taxa and two outgroups, and was performed using maximum parsimony (MP). Heuristic searches were performed in PAUP* with all characters and character state changes equally weighted, using TBR (tree-bisection-reconnection) branch swapping. The “MulTrees” option was turned on, and 100 random addition replicates were performed for each search.

A second set of more intensive searches focussed on the alignment that included only the seven cycads and Ginkgo biloba. Both MP and maximum likelihood (ML) criteria were used for these analyses. A model was chosen for the ML searches using the likelihood ratio test (LRT; see Swofford et al., 1996; Huelsenbeck and Crandall, 1997); model parameters were estimated from the data in each case. The hierarchy of models tested is a modification of that shown in Fig. 4 of Huelsenbeck and Crandall (1997; see Graham et al., 2002 for details). The LRT was repeated using both of the shortest trees obtained from the maximum parsimony analysis (Table 2.3, and see Results). Parsimony-based bootstrap analysis (Felsenstein, 1985) was performed for the 26-taxon and eight-taxon data sets using the search criteria described above, except that only one random addition replicate was used for each of 100 bootstrap replicates.
The bootstrap analysis was repeated for the eight-taxon data set using ML, with the optimal model indicated by the LRT.

2.3 RESULTS

2.3.1 26-Taxon Data Set

Heuristic searches of the 26-taxon matrix produced two most parsimonious trees (Fig. 2.2). The relationships within the cycads are similar to recent results obtained by Bogler and Francisco-Ortega (2004) using chloroplast trnL intron and nuclear ITS2 sequences. The parsimony analysis shows high bootstrap support for the cycads as a whole (Fig. 2.2). Bowenia is isolated from Stangeria; the latter appears as the sister group of Zamia. In the parsimony analysis Ginkgo is found to be the sister group of the cycads. This relationship is only weakly supported by parsimony analysis (54% of bootstrap replicates; Fig. 2.2). The conifers are a well-supported clade that is the sister group of (Ginkgo + cycads). The Gnetales are also well supported as a monophyletic group. They form the sister group of the remaining seed plants, corresponding to the “Gnetales basal” hypothesis of Rydin et al. (2002; see also Sanderson et al., 2000).

2.3.2 Cycads and Ginkgo

Major relationships within the cycads are nearly identical in parsimony and likelihood analyses (Fig. 2.3). The GTR + Γ + I model was chosen with the likelihood ratio test [Table 2.3; general-time-reversible (GTR) rate matrix with proportion of invariable sites (I) considered, and among-site rate variation accounted for using the gamma (Γ) distribution as described by the shape parameter alpha (α)]. An exhaustive search was performed using this model, with all parameter values estimated from the data. The single best ML tree is illustrated in Fig. 2.3; this topology is also one of two trees found in the parsimony analysis. Cycas is part of the basal split in the cycads, and Dioon is the sister group of the remaining taxa. In all analyses, Stangeria and Zamia are sister taxa, and Ceratozamia is the sister group of these two (Fig.
2.3). None of the analyses performed indicates a sister group relationship between Bowenia and Stangeria. The only difference between the parsimony and likelihood analyses is in the precise placement of Bowenia and Encephalartos relative to the clade of Ceratozamia, Stangeria and Zamia (Fig. 2.3). In the likelihood analysis, Encephalartos is the sister group of the latter clade and Bowenia is the sister group of this entire clade. The MP tree that is not congruent with the ML topology instead depicts Bowenia as the sister group of (Ceratozamia, Stangeria, Zamia), with Encephalartos as the sister group of that clade (Fig. 2.3).

2.3.3 Molecular Evolution in Cycads and Relatives

There is significant heterogeneity in the rate of molecular evolution across the 26 taxa considered here (ln likelihood scores for the parsimony-based tree depicted in Fig. 2.2: GTR + Γ + I model = -97609.78; GTR + Γ + I + molecular clock = -98196.06; -2 ln Λ = 1172.56; P < 0.05). Pairwise comparisons also suggest that the transition:transversion ratio (ti/tv) is substantially higher in the cycads and Ginkgo than in the rest of the seed plants (data not shown). I confirmed this by estimating the ti/tv ratio from the HKY + Γ + I likelihood model for pruned subtrees in the seed plants derived from the MP tree shown in Fig. 2.2. The ti/tv ratio is ca. 2.5 in most seed plants, but almost twice as high in the cycads alone or cycads and Ginkgo together (Table 2.4). Examination of substitution rate parameters in the GTR + Γ + I model on the same set of subtrees suggests that most of the elevated ti/tv bias in cycads and Ginkgo comes from higher A:G transitions and lower A:T and C:G transversions than the rest of the seed plants (Table 2.5). Base frequencies and the proportion of invariable characters were fairly consistent across the different subtrees considered here, while the Γ-shape parameter, α, was elevated in the conifers and perhaps the Gnetales (Table 2.5).
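The molecular-clock comparison reported in Section 2.3.3 can be reproduced arithmetically with a short Python sketch. The two ln likelihood scores are taken from the text; the function names are hypothetical. The degrees of freedom are an assumption: the text does not state them, and the sketch uses the conventional n - 2 for a clock test on n = 26 taxa. The closed-form chi-square tail used here is valid only for even degrees of freedom.

```python
import math


def lrt_statistic(lnl_constrained, lnl_unconstrained):
    # Likelihood ratio test statistic, -2 ln(Lambda), for two nested models.
    return 2.0 * (lnl_unconstrained - lnl_constrained)


def chi2_upper_tail_even_df(x, df):
    # Upper-tail chi-square probability using the closed form for even df:
    # exp(-x/2) * sum over i in [0, df/2) of (x/2)^i / i!
    if df <= 0 or df % 2 != 0:
        raise ValueError('closed form requires a positive, even df')
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# ln likelihood scores reported in the text (26-taxon MP tree).
lnl_no_clock = -97609.78  # GTR + gamma + I
lnl_clock = -98196.06     # GTR + gamma + I + molecular clock

n_taxa = 26
df = n_taxa - 2  # assumption: conventional df for a molecular-clock LRT

statistic = lrt_statistic(lnl_clock, lnl_no_clock)
p_value = chi2_upper_tail_even_df(statistic, df)

print(round(statistic, 2))
print(p_value < 0.05)
```

Under these assumptions the statistic matches the reported value of 1172.56, and the clock model is rejected at the 0.05 level.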
2.4 DISCUSSION

2.4.1 Cycad Molecular Evolution

The chloroplast regions examined here for the cycads and Ginkgo are highly conserved, both with regard to length variation in noncoding regions (Table 2.2) and the amount of nucleotide substitution. The comparison of likelihood models with and without a molecular clock indicates substantial rate heterogeneity across the 26-taxon tree of seed plants and relatives (see Results). Ginkgo and the crown-group cycads have a relatively shallow depth on the seed plant tree compared to the other seed plant groups (Fig. 2.2), despite a deep fossil record for both lineages. Fossil cycads are known from ca. 270 million years ago (Mamay, 1969) and the fossil record for Ginkgoales is of comparable age, reaching back into at least the Triassic (Stewart and Rothwell, 1993). Thus, substantially more chloroplast evolution has occurred in each of the other major seed-plant clades than in the cycads or Ginkgo, for comparable numbers of basal taxa, across comparable or shorter time frames [Fig. 2.2; conifer relatives are known from ca. 310 million years ago, the oldest probable crown-group angiosperm fossils are ca. 130 million years old, and crown-group Gnetales may have diversified at around the same point in the early Cretaceous (although putative stem-lineage relatives of the Gnetales existed in the Triassic); see Taylor and Taylor (1993) and Sanderson and Doyle (2001). Note that branch lengths in Gnetales and some conifers are artificially short here because of incomplete molecular data, see Table 2.1].

Substitution model parameters estimated using current ML algorithms have a consistent value across the entire tree. It would therefore be valuable to know whether the observed variation in substitution rate matrix (R-matrix) values across the seed plants affects phylogenetic inference within the cycads. I assessed this by performing heuristic likelihood searches (GTR + Γ
+ I model) for the eight-taxon data set using R-matrix estimates from the different seed-plant subtrees shown in Table 2.5; the other parameters of the model were left free to vary. Only one tree topology (that of Fig. 2.3; results not shown) was recovered in all cases, suggesting that variation of this magnitude in the ti/tv ratio across the seed plants does not affect inference of relationships within the cycads, and is presumably unlikely to bias phylogenetic inference within the cycads or other seed plant clades when all taxa are considered together.

2.4.2 Basal Cycads and the Sister Group of Cycadales

The large numbers of characters in this study provide substantial new evidence for estimating deep relationships within and among the cycads and their seed plant relatives. Although only 376 parsimony informative characters were recovered across ca. 110 kb total DNA sequence data for the seven exemplar cycads and Ginkgo (less than a tenth of the total number of informative characters observed for 26 taxa, with slightly fewer nucleotides examined per taxon in the latter case), my data permit robust inference of most aspects of phylogenetic relationship among the basal cycads (Fig. 2.3). The very slow molecular evolution observed for the cycads and Ginkgo may additionally minimize any possibility of there being biased phylogenetic inference for these taxa (“long branch attraction”) due to terminal branches that are long relative to internal branches (see Felsenstein, 1983).

While the seed plant relationships inferred here are well resolved and generally very well supported by parsimony-based bootstrap analysis, my data provide only weak support for a sister group relationship between Ginkgo and the cycads. However, this relationship is congruent with some of the most-parsimonious morphology-based trees of Rothwell and Serbet (1994), chloroplast genome structural evidence (L. A. Raubeson, pers. comm.)
and several recent molecular studies (Boivin et al., 1996; Goremykin et al., 1996; Chaw et al., 1997, 2000). The elevated ti/tv ratio and slow rate of molecular evolution observed here for cycads and Ginkgo may constitute additional molecular synapomorphies for these two seed plant groups.

The arrangement of the exemplar cycad taxa that I examined parallels and reinforces several previous morphological and molecular studies of basal cycad lineages. For example, Bogler and Francisco-Ortega (2004) found essentially the same arrangement with respect to the exemplar taxa that I examined, but with poorer bootstrap support across the backbone of cycad phylogeny. My results strongly support Cycas as the sister group of the remaining cycads (Figs. 2.2, 2.3), consistent with all previous phylogenetic studies and most cycad classification schemes. I find robust support for Dioon being part of the next deepest split in cycad phylogeny (Fig. 2.3), and for Stangeria being closely related to Zamia and Ceratozamia, with moderate support for Stangeria being the sister taxon of Zamia (of those taxa included here). My results also parallel an early morphology-based cladistic study of the cycads by Petriella and Crisci (1977; cited in Crane, 1988), who observed the same basal arrangement of Cycas and then Dioon, and found tribe Encephalarteae to be the sister group of a clade containing Bowenia, Ceratozamia, Stangeria and Zamia, one of the two general arrangements that I observed in the MP analysis (Fig. 2.3). The same basic arrangement (regarding the relative positions of Cycas, Dioon, Ceratozamia and Zamia) was also seen in a phylogenetic study of chloroplast DNA restriction-site data that was robustly supported by bootstrap analysis (Caputo et al., 1991; these authors also included Microcycas and Chigua as exemplars, but did not include Bowenia and Stangeria).
The phylogenetic analyses of morphological evidence by Petriella and Crisci (1977), Crane (1988) and Stevenson (1990) are largely congruent with my results with respect to the exemplar taxa that our studies share in common, except that these authors found Bowenia and Stangeria to form a clade near the root of the cycads. In contrast, my results show neither genus to be near the root. While some uncertainty remains over the precise placement of Bowenia, my data strongly suggest that these two taxa are not each other’s closest living relatives, in line with Bogler and Francisco-Ortega’s (2004) results. Bowenia and Stangeria are well separated on my chloroplast-based tree by several branches with moderate to strong bootstrap support (Fig. 2.3).

Future studies should focus on clarifying the position of the remaining cycad genera and species. Although I focussed on an exemplar-based sampling for the current study of broad phylogenetic relationships, my results (Figs. 2.2, 2.3) point to two main possibilities for a cycad classification that accommodates the multiple basal cycad lineages observed: recognition of one or two large families, or of multiple small families. Current schemes with three or four families are unlikely to be satisfactory without substantial modification. At the very least, a recircumscription of Zamiaceae, Stangeriaceae and subfamily Encephalartoideae will be necessary, since none of these taxa are monophyletic (cf. Figs. 2.1-2.3).

Table 2.1. GenBank accession numbers and vouchers for exemplar taxa with one or more previously unpublished DNA sequences¹.
Taxon (voucher, herbarium) | atpB | ndhF | psbB, T, N, & psbH | psbD & C | psbE, F, L & psbJ | rbcL | rpl2 | 3'-rps12, rps7 & ndhB
ANGIOSPERMS
Austrobaileya scandens (Olmstead s.n., WTU) | (AF092107)² | AF238052 | AY007460 | AF239777 | AY007475 | (L12632)² | AY007489 | AF238065
Hydrastis canadensis (Olmstead s.n., WTU) | (AF093382)² | AF238055 | AY007464 | AF239782 | AY007479 | (L75849.2)² | AY007492 | AF238069
CONIFERS
Cedrus deodara (SWG XI-98-1, ALTA) | AF469655 | n/a | AF469704 | AF462401 | AF469714 | (X63662)² | AF469723 | AF469739
Metasequoia glyptostroboides (Rai 1007, ALTA) | AF469660 | AF469698 | AF469710 | AF462406 | AF469719 | (AJ235805)² | AF469728 | AF469736
Podocarpus chinensis (Graham & Denton VII-98-8, ALTA) | AF469661 | AF469699 | AF469711 | AF462407 | AF469720 | AF462414 | AF469729 | AF469737
Sciadopitys verticillata (Graham & Denton VII-98-1, WTU) | AF239792² | AF469700 | AY116650, AY116651 | AF239793² | AY007486² | (L25753)² | AY007499² | AF238076³
CYCADS
Bowenia serrulata (Bogler 1202, FTG) | AF469654 | AF469693 | AF469703 | AF462400 | AF469713 | AF462409 | AF469722 | AF469731
Ceratozamia miqueliana (Hubbuch et al. 106, FTG) | AF469656 | AF469694 | AF469705/AF469706 | AF462402 | AF469715 | AF462410 | AF469724 | AF469732
Cycas revoluta (O'Brien 1000, ALTA) | AF469657 | AF469695 | AF469707 | AF462403 | AF469716 | AF462411 | AF469725 | AF469733
Dioon purpusii (O'Brien 1001, ALTA) | AF469658 | AF469696 | AF469708 | AF462404 | AF469717 | AF462412 | AF469726 | AF469734
Encephalartos barteri (O'Brien 1002, ALTA) | AF469659 | AF469697 | AF469709 | AF462405 | AF469718 | AF462413 | AF469727 | AF469735
Stangeria eriopus (Beck 1117, FTG) | AF469662 | AF469701 | AF469712 | AF462408 | AF469721 | AF462415 | AF469730 | AF469738
Zamia furfuracea (Graham VIII-98-1, WTU) | AF188845² | AF469702 | AF188846² | AF188848² | AF188847² | AF202959² | AF188849² | AF188850²
GNETALES⁴
Ephedra nevadensis (Olmstead s.n., WTU) | AF239779 | n/a | AY007462 | AF239780 | AY007477 | (D10732)² | n/a | AF238067
Welwitschia mirabilis (Graham + Denton VII-98-6, WTU) | AF239795 | n/a | AY007472 | AF239796 | AY116660 | (D10735)² | n/a | AF238078

¹ Partly sequenced genes by taxon, relative to sequences considered in Graham and Olmstead (2000a): (1) psbD: Dioon and Welwitschia, 311 bp missing at 5'-end; (2) rbcL: Encephalartos and Stangeria, 69 bp and 93 bp missing at 5'-end, respectively; (3) atpB: Ephedra, 364 bp, and Stangeria, 209 bp missing at 5'-end; (4) ndhF: Ephedra, Gnetum, and Welwitschia missing entire ndhF gene; Metasequoia and Zamia, both missing 520 bp at 5'-end and 84 bp at 3'-end, the other cycad and conifer ndhF sequences lacking 271 bp at the 5'-end; (5) psbB: Ceratozamia, 85 bp missing at the 5'-end; (6) psbJ: Metasequoia and Welwitschia, lacking all of the examined region (ca. 90 bp); (7) rpl2: Ephedra, Gnetum, and Welwitschia missing entire rpl2 region; Dioon and Encephalartos missing 112 bp at 5'-end, and Cedrus, Metasequoia, Bowenia, Ceratozamia, and Dioon missing 90 bp at 3'-end; (8) 3'-rps12: Cycas and Stangeria, missing 81 bp and 214 bp at 5'-end, respectively; (9) ndhB: Podocarpus and Cycas missing 400 bp and 389 bp at 3'-end, respectively; Cedrus, Ephedra, Gnetum, and Welwitschia missing entire ndhB region.
² Previously published sequences. Accessions in brackets were produced by other workers; see Graham and Olmstead (2000a,b) and Graham et al. (2000) for a complete list of taxa and accession numbers for other sequences employed in phylogenetic analyses here.
³ Sequence is updated for this publication (see text).
⁴ One sequence of Gnetum gnemon (Gnetaceae) was used in a previous study (Graham and Olmstead, 2000a) but its GenBank number was omitted there (3'-rps12-rps7; AY116648).

Table 2.2. Lengths and variation in length of the noncoding regions used in this study.

Region | Cycads: mean length ᵃ | Cycads: range | Seed plants (including cycads): mean length ᵃ | Seed plants (including cycads): range
3'rps12, rps7, ndhB
3'rps12 intron | 549 | 540-557 | 532 | 476-575
ndhB intron | 726 | 720-727 | 705 | 669-736
3'rps12-rps7 IGS | 52 | 52 | 53 | 48-64
rps7-ndhB IGS | 361 | 353-379 | 318 | 215-379
rpl2 intron | 692 | 681-711 | 674 | 649-717
psbE, psbF, psbL, psbJ
psbE-psbF IGS | 9 | 9 | 9 | 9-14
psbF-psbL IGS | 22 | 22 | 26 | 11-38
psbL-psbJ IGS | 114 | 108-118 | 123 ᵇ | 113-161 ᵇ
psbB, psbT, psbN, psbH
psbB-psbT IGS | 169 | 161-185 | 152 ᶜ | 69-193 ᶜ
psbT-psbN IGS | 76 | 74-81 | 71 ᵈ | 57-91 ᵈ
psbN-psbH IGS | 83 | 83 | 97 ᵉ | 78-120 ᵉ

ᵃ Rounded to nearest bp.
ᵇ-ᵉ The following were excluded from length calculations due to incomplete data or aberrantly long IGS sequences (Graham et al., unpubl. data): ᵇ Metasequoia glyptostroboides: psbL-psbJ IGS; ᶜ Welwitschia mirabilis: psbB-psbT IGS; ᵈ Sciadopitys verticillata: psbT-psbN IGS; ᵉ Austrobaileya scandens: psbN-psbH IGS.

Table 2.3. Likelihood ratio test (LRT) for different substitution models based on the two most parsimonious tree topologies for the seven cycads and Ginkgo (tree 1 is the main topology depicted in Fig. 2.3).

Substitution model | -ln likelihood | Comparison¹ | -2 ln Λ² | P³
MP tree 1
JC69 | 32411.01 | - | - | -
F81 | 32152.95 | JC69 vs. F81 | 516.12 | < 0.05
HKY85 | 31166.37 | F81 vs. HKY85 | 1973.13 | < 0.05
GTR | 31055.03 | HKY85 vs. GTR | 222.71 | < 0.05
GTR + Γ | 30852.41 | GTR vs. (GTR + Γ) | 405.24 | < 0.05
GTR + Γ + I | 30848.06 | (GTR + Γ) vs. (GTR + Γ + I) | 8.70 | < 0.05
MP tree 2
JC69 | 32415.52 | - | - | -
F81 | 32159.56 | JC69 vs. F81 | 511.92 | < 0.05
HKY85 | 31174.33 | F81 vs. HKY85 | 1970.47 | < 0.05
GTR | 31061.90 | HKY85 vs. GTR | 224.86 | < 0.05
GTR + Γ | 30857.18 | GTR vs. (GTR + Γ) | 409.44 | < 0.05
GTR + Γ + I | 30852.68 | (GTR + Γ) vs. (GTR + Γ + I) | 8.98 | < 0.05

¹ Abbreviations: JC69 = Jukes Cantor (1969); F81 = Felsenstein (1981); HKY85 = Hasegawa et al. (1985); GTR = General Time-Reversible (Lanave et al., 1984; Tavaré, 1986; Barry and Hartigan, 1987; Rodríguez et al., 1990). Γ = Gamma; I = Proportion of invariable sites.
² Likelihood ratio test statistic.
³ The α-level was adjusted using the Bonferroni correction for ten tests.

Table 2.4. Estimated likelihood parameters (HKY + Γ + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2.2.
Subtree Parameters Cycads Cycads + Ginkgo Conifers Gnetales Angiosperms Seed plants (excl.
cycads + Ginkgo) ti/tv 4.64329 4.78367 2.52218 2.84368 2.42617 2.62560 Base Frequencies A 0.27742 0.27871 0.28924 0.27822 0.27446 0.28967 C 0.18952 0.19041 0.18242 0.18649 0.199253 0.19380 G 0.22543 0.22304 0.21173 0.21782 0.221099 0.20421 T 0.30764 0.30784 0.21173 0.21782 0.305187 0.31232 I 0.59578 0.50307 0.61724 0.69313 2 0.57354 0.43993 \" 1 0.87172 0.85585 ! ! 2 0.71664 1.43375 1 The gamma (!) shape parameter. 2 Values when genes not obtained for Gnetales (ndhB, ndhF, rpl2) are excluded during ML estimation: I = 0; \" = 0.166. 32 Table 2.5. Estimated likelihood parameters (GTR + ! + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2. Subtree Parameters Cycads Cycads + Ginkgo Conifers Gnetales Angiosperms Seed plants (excl. cycads + Ginkgo) Rate Matrix AC 1.85110 1.74707 1.81512 1.72428 1.61416 1.77199 AG 7.34954 6.47428 4.90730 4.44244 3.78213 4.83661 AT 0.14563 0.11631 0.63952 0.85701 0.27506 0.48881 CG 0.44685 0.38870 0.86065 0.20063 0.83620 1.00865 CT 7.84315 7.63734 5.92336 7.18822 4.87788 6.28927 GT 1.0 1.0 1.0 1.0 1.0 1.0 33 Subtree Parameters Cycads Cycads + Ginkgo Conifers Gnetales Angiosperms Seed plants (excl. cycads + Ginkgo) Base Frequencies A 0.27783 0.28191 0.29019 0.27887 0.27856 0.30873 C 0.18737 0.18615 0.17725 0.18249 0.19087 0.16549 G 0.22577 0.22127 0.21432 0.22376 0.22228 0.19241 T 0.30903 0.31067 0.31824 0.31488 0.30829 0.33336 I 0.59283 0.48078 0.60248 0.66040 2 0.57041 0.43879 ! 1 0.85281 0.84493 26.86624 7.15538 2 0.72979 1.42102 1 The gamma (\") shape parameter. 2 Values when genes not obtained for Gnetales (ndhB, ndhF, rpl2) are excluded during ML estimation: I = 0.61; ! = 2.37 34 Figure 2.1. Summary of Stevenson\u00E2\u0080\u0099s (1992) classification of the cycads, reproduced with permission from Ken Hill (http://plantnet.rbgsyd.gov.au/PlantNet/cycad/ident.html). 35 Figure 2.2. Chloroplast based phylogeny of the cycads and relatives. 
The tree is one of two most parsimonious trees (14229 steps, CI=0.593, RI=0.669) found using 17 chloroplast genes and associated noncoding regions (three introns and two intergenic spacer regions). Bootstrap values are indicated beside branches. The arrow points to a branch not found in the strict consensus of the two most-parsimonious trees.

Figure 2.3. Chloroplast based phylogeny of the cycads using different optimality criteria (rooted according to Fig. 2). The tree shown is one of two MP trees (2088 steps, CI=0.884, RI=0.5) and the best ML tree (-lnL=30848.062; see text) found using 17 chloroplast genes and associated noncoding regions (three introns and eight intergenic spacer regions). The double-headed arrow indicates an alternative relationship seen in the other MP tree. Numbers above branches are parsimony-based branch lengths (ACCTRAN optimization), and numbers below are support values from MP (first values) and ML bootstrap analyses (second values).

2.5 REFERENCES

BARRY, D., AND J. A. HARTIGAN. 1987. Asynchronous distance between homologous DNA sequences. Biometrics 43: 261-276.
BOGLER, D. J., AND J. FRANCISCO-ORTEGA. 2004. Molecular systematic studies in cycads: evidence from trnL intron and ITS2 rDNA sequences. Bot. Rev. 70: 260-273.
BOIVIN, R., M. RICHARD, D. BEAUSEIGLE, J. BOUSQUET, AND G. BELLEMARE. 1996. Phylogenetic inferences from chloroplast chlB gene sequences of Nephrolepis exaltata (Filicopsida), Ephedra altissima (Gnetopsida), and diverse land plants. Mol. Phylogenet. Evol. 6: 19-29.
BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proc. Nat. Acad. Sci. 97: 4092-4097.
CAPUTO, P., D. W. STEVENSON, AND E. T. WURTZEL. 1991. A phylogenetic analysis of American Zamiaceae (Cycadales) using chloroplast DNA restriction fragment length polymorphisms. Brittonia 43: 135-145.
CAPUTO, P., C. MARQUIS, T. WURTZEL, D. W. STEVENSON, AND E. T. WURTZEL. 1993. Molecular biology in cycad systematics. Pp. 213-219. In Proceedings of CYCAD 90, the Second International Conference of Cycad Biology. Edited by D. W. Stevenson, and K. J. Norstog. Palm and Cycad Societies of Australia Ltd., Queensland.
CHAMBERLAIN, C. J. 1965. The living cycads. Hafner Publishing Company, New York, NY.
CHAW, S., A. ZHARKIKH, H. SUNG, T. LAU, AND W. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.
CHAW, S., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Nat. Acad. Sci. 97: 4086-4091.
CRANE, P. R. 1988. Major clades and relationships in the “higher” gymnosperms. Pp. 218-272. In Origin and evolution of gymnosperms. Edited by C. B. Beck. Columbia University Press, New York, NY.
DOYLE, J. A. 1996. Seed plant phylogeny and the relationships of Gnetales. Int. J. Plant Sci. 157: S3-S39.
DOYLE, J. A. 1998. Phylogeny of vascular plants. Ann. Rev. Ecol. Syst. 29: 567-599.
DOYLE, J. A., AND M. J. DONOGHUE. 1986. Seed plant phylogeny and the origin of angiosperms: An experimental cladistic approach. Bot. Rev. 52: 321-431.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
FELSENSTEIN, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17: 368-376.
FELSENSTEIN, J. 1983. Parsimony in systematics: biological and statistical issues. Ann. Rev. Ecol. Syst. 14: 313-333.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783-791.
GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A. ANTONOV, AND W. MARTIN. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Mol. Biol. Evol. 13: 383-396.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Amer. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GRAHAM, S. W., R. G. OLMSTEAD, AND S. C. H. BARRETT. 2002. Rooting phylogenetic trees with distant outgroups: a case study from the commelinoid monocots. Mol. Biol. Evol. 19: 1769-1781.
HASEGAWA, M., H. KISHINO, AND T. YANO. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 21: 160-174.
HILL, C. R., AND P. R. CRANE. 1982. Evolutionary cladistics and the origin of the angiosperms. Pp. 269-361. In Problems of phylogenetic reconstruction. Edited by K. A. Joysey, and E. A. Friday. Academic Press, London.
HUELSENBECK, J. P., AND K. A. CRANDALL. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Ann. Rev. Ecol. Syst. 28: 437-466.
JONES, D. L. 1993. Cycads of the world. Smithsonian Institution Press, Washington, D.C.
JUKES, T. H., AND C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-32. In Mammalian Protein Metabolism. Edited by H. N. Munro. Academic Press, New York, NY.
LANAVE, C., G. PREPARATA, C. SACCONE, AND G. SERIO. 1984. A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20: 86-93.
LOCONTE, H., AND D. W. STEVENSON. 1990. Cladistics of the Spermatophyta. Brittonia 42: 197-211.
MAMAY, S. H. 1969. Cycads: fossil evidence of late paleozoic origin. Science 164: 295-296.
MATHEWS, S. M., AND M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286: 947-950.
NORSTOG, K. J. 1990. Studies of cycad reproduction at Fairchild Tropical Garden. Mem. New York Bot. Gard. 57: 63-81.
NORSTOG, K. J., AND T. J. NICHOLLS. 1997. The biology of the cycads. Cornell University Press, Ithaca, NY.
PARKINSON, C. L., K. L. ADAMS, AND J. D. PALMER. 1999. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9: 1485-1488.
QIU, Y., J. LEE, F. BERNASCONI-QUADRONI, D. E. SOLTIS, P. S. SOLTIS, M. ZANIS, E. A. ZIMMER, Z. CHEN, V. SAVOLAINEN, AND M. W. CHASE. 2000. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int. J. Plant Sci. 161: S3-S27.
RAMBAUT, A. 1998. Se-Al (Sequence Alignment Editor). Version 1.0, Computer program and documentation. Department of Zoology, University of Oxford, UK.
RODRÍGUEZ, F., J. L. OLIVER, A. MARÍN, AND J. R. MEDINA. 1990. The general stochastic model of nucleotide substitution. J. Theor. Biol. 142: 485-501.
ROTHWELL, G. W., AND R. SERBET. 1994. Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482.
RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214.
SAARELA, J. M., P. J. PRENTIS, H. S. RAI, AND S. W. GRAHAM. 2008. Phylogenetic relationships in the monocot order Commelinales, with a focus on Philydraceae. Botany 86: 719-731.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797.
SANDERSON, M. J., AND J. A. DOYLE. 2001. Sources of error and confidence intervals in estimating the age of the angiosperms from rbcL and 18S rDNA data. Amer. J. Bot. 88: 1499-1516.
SAVOLAINEN, V., M. W. CHASE, S. B. HOOT, C. M. MORTON, D. E. SOLTIS, C. BAYER, M. W. FAY, A. Y. DE BRUIJN, S. SULLIVAN, AND Y. QIU. 2000. Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49: 306-362.
SCHUTZMAN, B., AND B. DEHGAN. 1993. Computer-assisted systematics in the Cycadales. Pp. 281-289. In Proceedings of CYCAD 90, the Second International Conference of Cycad Biology. Edited by D. W. Stevenson, and K. J. Norstog. Palm and Cycad Societies of Australia Ltd., Queensland.
SOLTIS, P. S., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple chloroplast genes as a tool for comparative biology. Nature 402: 402-404.
STEVENSON, D. W. 1981. Observations on ptyxis, phenology, and trichomes in the Cycadales and their systematic implications. Amer. J. Bot. 68: 1104-1114.
STEVENSON, D. W. 1990. Morphology and systematics of the Cycadales. Mem. New York Bot. Gard. 57: 8-55.
STEVENSON, D. W. 1992. A formal classification of the extant cycads. Brittonia 44: 220-223.
STEWART, W. N., AND G. W. ROTHWELL. 1993. Paleobotany and the evolution of plants. Second edition. Cambridge University Press, New York, NY.
SWOFFORD, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4, Computer program and documentation. Sinauer Associates, Sunderland, MA.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, AND D. M. HILLIS. 1996. Phylogenetic Inference. Pp. 407-543. In Molecular Systematics. Second edition. Edited by D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, MA.
TAVARÉ, S. 1986. Some probabilistic and statistical problems on the analysis of DNA sequences. Lec. Math. Life Sci. 17: 57-86.
TAYLOR, T. N., AND E. L. TAYLOR. 1993.
The biology and evolution of fossil plants. Prentice Hall, Englewood Cliffs, NJ.
WAKASUGI, T., J. TSUDZUKI, S. ITO, K. NAKASHIMA, T. TSUDZUKI, AND M. SUGIURA. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Nat. Acad. Sci. 91: 9794-9798.

CHAPTER 3¹ INFERENCE OF HIGHER-ORDER CONIFER RELATIONSHIPS FROM A MULTI-LOCUS PLASTID DATA SET

¹ A version of this chapter has been published: RAI, H.S., P. A. REEVES, R. PEAKALL, R. G. OLMSTEAD, AND S. W. GRAHAM. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 86: 658-669.

3.1 INTRODUCTION

Conifers have a rich and deep fossil record, with taxa assignable to extant families dating back to the Triassic (Yao et al., 1997; Stockey et al., 2005). Although angiosperms are now the dominant group of seed plants in most terrestrial ecosystems, conifers are still ecologically significant in all continental floras (Enright and Hill, 1995), and they dominate the northern boreal forests. Approximately 670 extant species are recognized in some 70 genera. Most conifer systematists recognize seven families to accommodate their diversity: Araucariaceae, Cephalotaxaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae and Taxaceae (Phyllocladaceae are sometimes separated from Podocarpaceae, while Taxodiaceae are now usually included in Cupressaceae).

Considerable progress in our understanding of conifer classification and phylogenetics has been made using multiple lines of evidence (e.g., Quinn et al., 2002). For example, Taxaceae have sometimes been treated as an order distinct from other conifers (e.g., Florin, 1951), but morphological and molecular data clearly support a nested phylogenetic position for the family among the rest of the conifers (e.g., Hart, 1987; Raubeson and Jansen, 1992; Quinn et al., 2002). However, there are still points of weakness in our understanding of conifer higher-order phylogenetic relationships. For example, it is still not clear whether extant conifers are monophyletic (e.g., Burleigh and Mathews, 2004, 2007a, b). Some studies find the enigmatic Gnetales (Ephedra, Gnetum and Welwitschia) to be the sister group of the pines and relatives (Pinaceae) with moderate to strong support (the ‘gnepine’ hypothesis, see Bowe et al., 2000; Chaw et al., 2000; Gugerli et al., 2001), whereas others support the monophyly of extant conifers, with Gnetales being placed elsewhere in seed-plant phylogeny (e.g., Chaw et al., 1997; Rydin et al., 2002; Rai et al., 2003). A placement of Gnetales as sister to Pinaceae is difficult to justify on morphological grounds (e.g., Donoghue and Doyle, 2000), and may be a strong analytical artifact, perhaps due to long-branch attraction (e.g., Burleigh and Mathews, 2004). On the other hand, a relationship between Gnetales and conifers among extant seed plants, a result seen in a subset of molecular studies (e.g., Chaw et al., 1997), is less problematic from a morphological perspective (e.g., Mundry and Stützel, 2004; Doyle, 2005).

Although there is considerable disparity among molecular studies concerning the placement of Gnetales, there is broad agreement on a sister-group relationship between Pinaceae (or gnepines) and the remaining conifer families. Phylogenetic studies have also clarified the circumscription and interrelationships of the other conifer families. For example, they have led to the recognition of a sister-group relationship between the two predominantly southern hemisphere taxa, Araucariaceae and Podocarpaceae, and have provided support for Araucariaceae-Podocarpaceae as the sister group of a clade consisting of Cephalotaxaceae, Cupressaceae, Sciadopityaceae and Taxaceae (Chaw et al., 1997; Stefanovic et al., 1998; Gugerli et al., 2001; Quinn et al., 2002; Rydin et al., 2002).
The resulting large clade – comprising all extant conifers except Pinaceae – has been referred to informally as ‘conifers II’ (e.g., Rydin et al., 2002), and more recently as Cupressophyta by Cantino et al. (2007). Within Cupressophyta, various molecular and morphological phylogenetic studies support the existence of a clade consisting of members of Cephalotaxaceae and Taxaceae (e.g., Hart, 1987; Cheng et al., 2000; Quinn et al., 2002), although there is some uncertainty about the limits and monophyly of Taxaceae (e.g., Page, 1990d). For example, Quinn et al. (2002) proposed that Taxaceae should be circumscribed to include Cephalotaxus, although this recommendation is not yet generally followed. Most taxa formerly included in Taxodiaceae are now recognized under a more broadly defined Cupressaceae, a circumscription proposed by Eckenwalder (1976) on morphological grounds that has since been supported by numerous phylogenetic studies (e.g., Hart, 1987; Brunsfeld et al., 1994; Stefanovic et al., 1998; Gadek et al., 2000; Kusumi et al., 2000; Quinn et al., 2002; Rydin et al., 2002). The distinctiveness of Sciadopitys verticillata, traditionally considered to belong to Cupressaceae, supports its recognition as a separate family, Sciadopityaceae (e.g., Page, 1990c). Morphological and molecular phylogenetic data confirm this view (e.g., Hart, 1987; Brunsfeld et al., 1994). Both families are now recognized as part of the larger clade that includes Cephalotaxaceae and Taxaceae. Finally, the arrangement of Cephalotaxaceae, Cupressaceae, Taxaceae and Sciadopityaceae relative to each other is incompletely understood (e.g., Stefanovic et al., 1998; Quinn et al., 2002; Rydin et al., 2002).
The main goal of this study is to obtain well-supported relationships for the deep branches of conifer phylogeny by surveying a large multigene plastid data set (15-17 plastid genes and associated noncoding regions) for a broad range of exemplar conifers. Increasing the amount of nucleotide data sampled per taxon has been shown to be an effective way to clarify our understanding of deep phylogenetic relationships in various groups of plants, and to generally increase support for phylogenetic inferences (for empirical examples using the current gene set see Graham and Olmstead, 2000a; Rai et al., 2003; Graham et al., 2006; Saarela et al., 2007; Zgurski et al., 2008). This chapter focuses on relationships among the families, but I also sampled Araucariaceae and Cupressaceae at sufficient taxonomic depth to address basic features of their internal phylogenetic structure. Rapidly evolving characters can have a substantial impact on the inference of overall seed-plant relationships (Burleigh and Mathews, 2004, 2007b; H.S. Rai and S.W. Graham, unpublished data), and so I assess whether this affects phylogenetic inference within the conifers by including or excluding the most rapidly evolving characters from consideration. I also characterize a curious structural mutation in one of the plastid ribosomal protein genes from two families of conifers, Araucariaceae and Podocarpaceae.

3.2 MATERIALS AND METHODS

3.2.1 Plant Material and Genomic Sampling

I surveyed 17 genes, which together with their associated noncoding regions represent between one-ninth and one-eighth of the entire plastid genome (~120 kb in Pinus; Wakasugi et al., 1994). The coding regions include photosynthetic genes (atpB, rbcL and ten photosystem II, psb, genes), translation apparatus genes (the plastid ribosomal protein genes rpl2, rps7 and 3’-rps12), and two chlororespiratory genes (ndhB and ndhF, which code for two of the subunits of plastid NADH dehydrogenase).
The noncoding regions consist of three introns (in rpl2, 3’-rps12 and ndhB) and eight intergenic spacer regions (Table 3.1). I used exemplar-based taxon sampling to represent the major branches of conifer phylogeny; in choosing representatives for each non-monotypic family I attempted to represent their internal systematic diversity as broadly as possible, at least as understood from prior studies. In total I included 22 exemplar conifer species and multiple outgroups (11 other seed plants, two monilophytes and three bryophytes). Source and GenBank information is provided in Table 3.1.

3.2.2 Recovery of Plastid Sequences, DNA Alignment and Characterization of an Indel Hotspot

I extracted DNA from fresh and silica-dried specimens following Doyle and Doyle (1987) and Rai et al. (2003). DNA samples of Wollemia, Agathis robusta and Araucaria cunninghamii were extracted as described in Peakall et al. (2003). DNA amplification and sequencing methods follow Graham and Olmstead (2000a). I sequenced all regions at least twice for each taxon, and with a few exceptions completely sequenced all regions in both directions. Regions confirmed as lost from the plastid genome, or that I could not amplify, were coded as missing data in the final matrix. Two genes that are missing (or not retrievable) for Pinaceae are ndhB and ndhF (see Wakasugi et al., 1994); these two genes and rpl2 were also not retrievable from the Gnetales exemplars examined here. I was unable to recover atpB from Widdringtonia cedarbergensis and rpl2 from Thuja plicata. I excluded noncoding regions for three of the outgroup taxa (Anthoceros, Marchantia and Physcomitrella), because these were difficult to align across land plants.
I compiled contiguous sequences, performed base-calling using Sequencher 4.1 (Gene Codes Corporation; Ann Arbor, MI), and added new sequences to an alignment (Graham et al., 2006) that includes sequences generated for previous studies of seed-plant phylogeny (Graham and Olmstead, 2000a, b; Graham et al., 2000; Rai et al., 2003). I adjusted alignments manually for each contiguous region using Se-Al version 1.0 (Rambaut, 1998), following alignment criteria in Graham et al. (2000), and used tobacco (Nicotiana tabacum), Ginkgo and Pinus sequences to define gene and exon boundaries, following Graham and Olmstead (2000a). I offset several regions that were too difficult to align in the noncoding regions [the intergenic spacers (IGS) of two of the photosystem II clusters (psbE-psbF-psbL-psbJ and psbB-psbT-psbN-psbH), the IGS between rps7 and ndhB, and the introns], following Graham et al. (2006). The resulting staggered regions were frequently limited to single taxa, which are effectively ignored for parsimony-based tree searches and scores (Graham et al., 2006) and should have only minimal effect for model-based methods (e.g., on estimation of base frequency parameter values). Subsets of the offset regions include aligned blocks involving two or more taxa. The final alignment is 25 687 bp in length, derived from ~14 kb of unaligned data per taxon (e.g., 14.1 kb in Agathis australis). Of the total, 5 384 aligned sites are potentially parsimony informative, and 2 575 are variable but parsimony uninformative.

I also characterized a structural mutation in the ribosomal protein gene rps7 of Araucariaceae and Podocarpaceae, using the Dotlet browser-based application (v. 1.5; Junier and Pagni, 1999) to make pairwise amino-acid comparisons under the PAM-30 matrix of amino-acid substitution.
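The distinction drawn above between parsimony-informative and variable-but-uninformative sites follows a simple per-column rule: a site is informative only if at least two character states each occur in at least two taxa. The sketch below is illustrative only and is not the software used in this study; the function names `classify_site` and `summarize` and the four-sequence toy alignment are hypothetical, and gaps and missing data (`-`, `?`, `N`) are assumed to be ignored, with other ambiguity codes treated naively as distinct states.

```python
from collections import Counter

# Symbols treated as missing data in this sketch (an assumption, not the thesis's coding scheme).
MISSING = set("-?N")


def classify_site(column):
    """Classify one alignment column as 'constant', 'uninformative' or 'informative'."""
    counts = Counter(base.upper() for base in column if base.upper() not in MISSING)
    if len(counts) <= 1:
        return "constant"
    # Parsimony informative: at least two states, each present in at least two taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def summarize(alignment):
    """Count site classes across an alignment given as equal-length strings."""
    return Counter(classify_site(column) for column in zip(*alignment))


# Hypothetical toy alignment: four taxa, six aligned sites.
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTTG", "ATCTAG"]
print(summarize(toy_alignment))
```

Sites that are variable but uninformative (for example, a substitution unique to one taxon) change tree length equally on every topology, which is why they do not help parsimony discriminate among trees.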
3.2.3 Phylogenetic Analyses

I performed heuristic maximum parsimony (MP) and maximum likelihood (ML) searches using PAUP* (version 4.0b10; Swofford, 2002) and PhyML (version 2.4.4; Guindon and Gascuel, 2003). For the MP analysis (using PAUP*), I treated all characters and character-state changes as equally weighted, and used TBR (tree-bisection-reconnection) branch swapping with 100 random addition replicates. PAUP* defaults were used for all other settings. For the ML search (using PhyML), I first chose a model of DNA sequence evolution with the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC), using Modeltest (version 3.7; Posada and Crandall, 1998), estimating model parameters from the data in each case. Both assessment methods recovered the same optimal DNA substitution model, GTR + Γ + I [i.e., the general-time-reversible (GTR) model, with among-site rate variation accounted for by considering the proportion of invariable sites (I), and the gamma (Γ) distribution, with four substitution-rate categories for the shape parameter alpha (α)]. I estimated substitution model parameters (base frequencies, the proportion of invariable sites, and the gamma distribution parameter) during the ML search. I assessed branch support using the nonparametric bootstrap (Felsenstein, 1985) with 100 bootstrap replicates (in the MP search using one random addition replicate per bootstrap replicate). I use ‘weak,’ ‘moderate,’ and ‘strong’ in reference to clades that have bootstrap support values < 70%, 70-89%, and ≥ 90%, respectively (e.g., Graham et al., 1998).

I reanalyzed the main matrix after removal of the most rapidly evolving characters, in order to assess whether they distort the inference of conifer higher-order relationships.
I used HyPhy (Kosakovsky Pond et al., 2005; and see Burleigh and Mathews, 2004) to classify each alignment site into one of nine rate change classes (referred to as RC0-RC8, with RC0 being the zero-rate category and RC8 the fastest). The single most parsimonious tree (see below) was used as a reference tree for estimating GTR model parameters and the site rate classifications in HyPhy. I re-ran the MP and ML analyses after excluding the two fastest rate categories from consideration (i.e., RC7 and RC8, see Burleigh and Mathews, 2004).

3.3 RESULTS

The relationships inferred among the major groups of seed plants differ in the MP and ML analyses of the full plastid data set, with Ginkgo (MP) or a clade consisting of Ginkgo, cycads and angiosperms (ML) inferred to be the sister group of conifers, in both cases with moderately strong support (Figs. 3.1, 3.2). Neither method supports a placement of Gnetales in or near the conifers, and both support conifer monophyly (100% from MP, 85% bootstrap support from ML). Both methods infer identical relationships within conifers (Figs. 3.1, 3.2), with four of the five non-monogeneric conifer families strongly supported as monophyletic, at least at the current level of taxon sampling.

The two fastest rate classes (RC7 and RC8, hereafter RC78) comprise 4 252 characters, a number equal to a substantial fraction of all parsimony informative characters (~79% of 5 384 sites). Of these RC78 sites, only 3 501 are parsimony informative, and so deleting these sites (corresponding to the RC0-6 analyses) leaves ~35% of all parsimony informative sites, not ~21% (the fraction expected if RC78 sites were all parsimony informative). The parsimony uninformative RC78 sites are all variable. I examined a subset of these and found them to be sites in alignment blocks that include only a few taxa. Presumably these are predicted to be rapidly evolving sites in the ML classification because variation is seen in them across a small taxon sampling.
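The percentages quoted above can be re-derived directly from the three site counts given in the text. The short check below is a worked-arithmetic sketch; the variable names are illustrative and are not taken from the original analysis.

```python
# Site counts as reported in the text.
TOTAL_INFORMATIVE = 5384   # parsimony-informative sites in the full matrix
RC78_SITES = 4252          # all sites assigned to rate classes RC7 and RC8
RC78_INFORMATIVE = 3501    # RC78 sites that are also parsimony informative


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# RC78 character count expressed relative to all informative sites (~79%).
rc78_relative = percent(RC78_SITES, TOTAL_INFORMATIVE)

# Informative sites retained in the RC0-6 analyses (~35%).
retained = percent(TOTAL_INFORMATIVE - RC78_INFORMATIVE, TOTAL_INFORMATIVE)

# Fraction that would remain if every RC78 site were informative (~21%).
retained_if_all_informative = percent(TOTAL_INFORMATIVE - RC78_SITES, TOTAL_INFORMATIVE)

# RC78 sites that are variable but parsimony uninformative.
rc78_uninformative = RC78_SITES - RC78_INFORMATIVE

print(f"RC78 relative to informative sites: {rc78_relative:.1f}%")
print(f"Informative sites retained (RC0-6): {retained:.1f}%")
print(f"Retained if all RC78 informative:   {retained_if_all_informative:.1f}%")
print(f"Uninformative RC78 sites:           {rc78_uninformative}")
```

The difference between the ~35% and ~21% figures is accounted for by the 751 RC78 sites that are variable but not parsimony informative.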
When all RC78 characters are excluded from consideration, support for relationships among the five major groups of seed plants falls substantially, providing only poor to moderate support for the relevant branches (data not shown). However, relationships inferred within conifers are essentially unchanged after deletion of the fastest characters, with mostly very minor shifts in bootstrap support (cf. Figs. 3.1-3.3). Bootstrap support for conifer monophyly (moderate to strong support from ML and MP analysis, respectively) is also largely unchanged. However, the best MP and ML trees for RC0-6 do not depict Taxaceae as monophyletic (e.g., Fig. 3.3), and the clade consisting of these three taxa is then only moderately supported (70-80%). The remaining results focus on the analyses that consider all of the data.

With all data included, the Cupressophyta clade is well supported as monophyletic, with 100% support from MP and ML bootstrap analysis (Figs. 3.1-3.2). Within this clade, Araucariaceae and Podocarpaceae are strongly supported as sister taxa (100% for MP and ML), and this two-family clade is in turn strongly supported as the sister-group of a clade consisting of Cupressaceae, Cephalotaxaceae, Sciadopityaceae and Taxaceae (i.e., both Cupressophyta and the latter four-family clade are strongly supported). The interfamilial relationships within the latter clade are also well supported, with 96% to 100% bootstrap support from MP and ML analysis. More specifically, Sciadopitys (Sciadopityaceae) is strongly supported as the sister group of the remaining three families in the latter clade; Cephalotaxaceae and Taxaceae are sister taxa, and the Cephalotaxaceae-Taxaceae clade is then sister to Cupressaceae.

Of the four families sampled for more than two exemplar taxa (Figs.
3.1, 3.2), Pinaceae and Podocarpaceae have only weakly to moderately supported intrafamilial backbones (58-88% bootstrap support), although the same deep splits are inferred by the two phylogenetic methods (i.e., in Pinaceae, Abies is sister to Cedrus and Pinus sister to Pseudotsuga; in Podocarpaceae, Podocarpus is sister to Saxegothaea). In contrast, all relationships inferred within Araucariaceae and Cupressaceae have 100% MP and ML bootstrap support. In particular, the three ‘core Cupressaceae’ taxa sampled here (Juniperus, Thuja and Widdringtonia) are deeply nested among other members of Cupressaceae, with a basal split in the family seen between Cunninghamia and other taxa. Within Araucariaceae, Wollemia is strongly supported as the sister group of Agathis.

A structural feature in the 5' end of one of the ribosomal protein genes considered here, rps7, is worth commenting on, as it seems to represent an otherwise quiescent region that has experienced a ‘recent’ burst of microstructural mutations (insertions and deletions) in Araucariaceae and Podocarpaceae, including at least one tandem repeat expansion shared by these two families (Fig. 3.4). I refer to this hotspot of structural mutations as an ‘expansion region,’ as the gene is longer in all taxa examined in Araucariaceae and Podocarpaceae than in other land plants, due to (predicted) insertions in this region. However, it should be noted that the region is likely to have undergone both expansions and contractions (data not shown). The total expansion region is quite complex and includes multiple repeated motifs, a subset of which is shared among taxa in Araucariaceae and Podocarpaceae. For example, a ~15 amino-acid indel is present as six copies in Podocarpus, three copies in Agathis, Araucaria, Saxegothaea and Wollemia, and two copies in Phyllocladus (e.g., Fig. 3.4). 
The tandem repeat (and broadly speaking the hotspot region itself) provides a microstructural synapomorphy for the clade consisting of these two families. The expansion region has a mean length of 149.6 bp (mean length of rps7=614.6 bp, SD=69.7 bp) across the eight taxa included in Araucariaceae and Podocarpaceae, relative to Pinus (length of rps7 excluding stop codon=465 bp). To provide some perspective, the mean total length of rps7 for the other taxa considered in this study is 466.4 bp, with a standard deviation of 3.37 bp. Although I have no experimental evidence that the expansion is part of the translated sequence in these taxa, this seems probable based on comparative evidence. The rps7 gene in the two families is variable in both length and sequence (particularly so in Podocarpaceae), and is consistently in-frame in all taxa examined, including multiple taxa in both families that were not included here (data not shown). The portion of the gene containing the expansion region does not appear to be especially prone to indel events elsewhere in the land plants. A more comprehensive survey of the entire expansion region that includes all genera from these two families will be presented elsewhere.

3.4 DISCUSSION

3.4.1 Rapidly Evolving Plastid DNA Sites and the Inference of Higher-Order Conifer Relationships

Classifying characters into different rate classes and then removing the fastest ones is a useful alternative approach to dealing with so-called saturated sites, alignment positions that may be mis-informative for phylogenetic inference due to ‘unseen’ multiple hits. In principle we might expect that ML analysis should be unaffected by removal of these rate classes, as the method should properly correct for multiple hits if the DNA substitution model is adequate (e.g., Sullivan and Swofford, 2001). 
However, this adjustment might be expected to improve the accuracy of MP results if the amount of saturation is substantial enough to affect phylogenetic inference (e.g., Burleigh and Mathews, 2004). Removing all third-codon positions from protein-coding genes is arguably less desirable than excluding the most rapid ML rate classes, as the former is an overly coarse way of correcting for multiple hits (Olmstead et al., 1998; Yang, 1998; Sanderson et al., 2000), and the latter is applicable to both coding and non-coding data. I would also argue that an exclusion method based on site rates is preferable to the use of parsimony-based successive weighting methods, as in the study of conifer higher-order relationships by Quinn et al. (2002). Successive weighting has been criticized because it may lead to heuristic searches becoming trapped on local optima that depend on starting trees (Swofford et al., 1996). While the rate classification method used here may also partly depend on the starting tree, it has the potential advantage that the substitution rates used to cull the data are explicitly model-based estimates (see Olmstead et al., 1998 for a parsimony-based approach for excluding highly variable characters). However, as I find that deleting the most rapidly evolving characters has little to no effect on my major findings within the conifers (Figs. 3.1-3.3), and only a small effect on branch support, the debate could be considered moot for conifer phylogeny inference. The slight to modest reduction in bootstrap support observed within conifers after the removal of the two fastest rate classes is consistent with an expectation of increased sampling error due to fewer characters. In summary, I find no evidence here that the most rapidly evolving sites distort the inference of higher-order conifer relationships. 
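The site-exclusion step discussed in this section can be illustrated with a minimal sketch. It assumes that a rate class (0 to 8, for RC0 to RC8) has already been assigned to every alignment column and exported as a plain list of integers; the function name, the dictionary-based alignment and the toy data are all hypothetical, and nothing here reproduces HyPhy's own interface or the actual data set.

```python
# Hypothetical helper, not part of HyPhy or of the analyses reported here.
FAST_CLASSES = frozenset({7, 8})  # RC7 and RC8, the two fastest rate classes


def drop_fast_sites(alignment, rate_classes, excluded=FAST_CLASSES):
    # alignment: dict mapping taxon name -> aligned sequence (equal lengths)
    # rate_classes: one integer per alignment column (0 = RC0 ... 8 = RC8)
    n_sites = len(rate_classes)
    for taxon, seq in alignment.items():
        if len(seq) != n_sites:
            raise ValueError('sequence length mismatch for ' + taxon)
    # Indices of the columns retained for the reduced (RC0-6) analysis.
    keep = [i for i, rc in enumerate(rate_classes) if rc not in excluded]
    return {taxon: ''.join(seq[i] for i in keep)
            for taxon, seq in alignment.items()}


# Toy data (assumed, for illustration only): six columns, two of them fast.
toy_alignment = {'Pinus': 'ACGTAC', 'Taxus': 'GCGTTC', 'Ginkgo': 'ACCTAC'}
toy_classes = [0, 3, 8, 1, 7, 2]
reduced = drop_fast_sites(toy_alignment, toy_classes)
print(reduced)
```

Filtering by column index keeps the reduced matrix aligned across taxa, so the same MP and ML searches can be re-run on it unchanged.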
3.4.2 The Ovulate Cone and Conifer Systematics

The ovulate (seed) cone has been considered to be particularly significant in conifer systematics (e.g., Pilger, 1926; Florin, 1951; Miller, 1999). For example, the absence of the ‘typical’ compound ovulate cone of conifers in members of Taxaceae was used to justify their recognition as Taxales, an order distinct from the remaining conifers (e.g., Florin, 1951). My data confirm the widely accepted view that Taxaceae have a nested position within the Cupressophyta clade of conifers, sister to Cephalotaxaceae (Figs. 3.1, 3.2). If a compound ovulate cone of the sort found in Pinaceae was ancestral in extant conifers, as is usually assumed, this would require that the ovule-bearing arrangement in Taxaceae (an apparently ‘coneless conifer’ from the perspective of its ovules) was derived by reduction from the more complex form (e.g., Chamberlain, 1935; Takhtajan, 1953; Hart, 1987; Doyle, 1998; Quinn et al., 2002). However, Tomlinson and Takaso (2002) discuss general difficulties in applying Florin’s model in most families of conifers, due to the extreme modification or apparent absence of the ovuliferous scale in these taxa (the ovuliferous scale is a condensed ovule-bearing secondary shoot axis whose underlying structure seems clearest in the ovulate cones of Pinaceae and Sciadopityaceae; Tomlinson and Takaso, 2002). Hart (1987) suggested that too much weight has been placed on the compound ovulate cone in higher-order conifer systematics.

3.4.3 The Case for Recognizing Cephalotaxus as a Member of Taxaceae

I infer Cephalotaxus to be the sister group of the two genera of Taxaceae sampled here, Taxus and Torreya; Taxaceae s.s. are monophyletic at this taxon sampling (Figs. 3.1, 3.2). 
A sister-group relationship between Cephalotaxaceae and Taxaceae was first recovered by Hart (1987) using morphological data, and subsequently recovered in a morphological analysis by Doyle (1998). A matK-based analysis of the two families that surveyed all five genera of Taxaceae (Cheng et al., 2000) found strong support for the monophyly of Taxaceae (96%, from MP analysis), but included only a handful of outgroups. In the more broadly based study of two plastid loci (matK and rbcL) by Quinn et al. (2002), parsimony analysis found strong support for a clade comprising Cephalotaxus and Taxaceae. Their analysis did not strongly support the monophyly of Taxaceae s.s. unless the data were re-weighted using successive weighting, a method that may yield artifactual results (see above). However, it should be noted that Quinn et al. (2002) consistently recovered two strongly supported clades of Taxaceae in equally and unequally weighted analyses, one of which includes Taxus and the other Torreya. These are the two genera that I included as exemplars for the family, and which I found to comprise a clade. When I removed the fastest evolving characters from phylogenetic analysis (RC7 and RC8), relationships among Cephalotaxus, Taxus and Torreya were no longer strongly supported (Fig. 3.3). My main analyses support the view of Quinn et al. (2002) that it is no longer useful to recognize Cephalotaxaceae (Figs. 3.1, 3.2); Cephalotaxus should be returned to its original home in Taxaceae. The rationale for a circumscription of Taxaceae that includes Cephalotaxus is straightforward should the latter prove to be nested in the former, as the more broadly defined family would then be monophyletic. However, if Cephalotaxus is instead shown to be the sister group of the five genera usually assumed to be in Taxaceae s.s. (i.e., Austrotaxus, Amentotaxus, Pseudotaxus, Taxus and Torreya), a straightforward case can also be made for reducing Cephalotaxaceae to synonymy. 
Backlund and Bremer (1998) have argued that higher-order classifications that recognize two families in this situation (where a small monogeneric family is the sister group of a larger one) do not optimize phylogenetic information, and ought to be considered redundant. Furthermore, the morphological distinction between Cephalotaxus and Taxaceae is clearly not so great that their combination would create a morphologically unrecognizable taxon (see APG II, 2003). For example, Cephalotaxus also has a relatively simple ovule-bearing arrangement (a pair of ovules in the axil of a bract, the two separated by a narrow flange of tissue of uncertain origin; Tomlinson and Takaso, 2002). A morphological connection between Cephalotaxus and Taxaceae s.s. is uncontroversial (Doyle, 1998; Stützel and Röwekamp, 1999). However, the relationship among the six genera of Taxaceae in its broadened circumscription ought to be addressed by including more taxa in phylogenetic analysis for a sampling of plastid data at least as large as that examined here.

3.4.4 Relationships within Cupressaceae

Relationships for three ‘core Cupressaceae’ sampled here are in line with other studies and are well supported: Widdringtonia (representing the ‘callitroid’ clade of Gadek et al., 2000) is the sister group of Juniperus + Thuja (two exemplars that represent the ‘cupressoid’ clade of Gadek et al., 2000). Basal relationships in the family have generally not been inferred with strong support (e.g., Brunsfeld et al., 1994; Stefanovic et al., 1998; Gadek et al., 2000; Kusumi et al., 2000, although see Quinn et al., 2002). The limited taxon sampling here is congruent with these earlier studies (Figs. 3.1-3.3). 
As with other recent studies (Hart, 1987; Stefanovic et al., 1998; Gadek et al., 2000; Kusumi et al., 2000; Quinn et al., 2002), I find that “Taxodiaceae” (represented here by Cunninghamia, Metasequoia and Taxodium) comprise a grade of taxa near the base of Cupressaceae. Metasequoia and Taxodium represent the ‘sequoioid’ and ‘taxodioid’ clades of Gadek et al. (2000), respectively; the latter is the sister group of the core Cupressaceae (Figs. 3.1-3.3). Cunninghamia is inferred to be the sister group of the remainder of the family, as in other recent studies with a broad taxon sampling (Gadek et al., 2000; Quinn et al., 2002). All of these relationships are well supported here.

3.4.5 The Higher-Order Position of Phyllocladus in Conifer Phylogeny

Phyllocladus has sometimes been recognized as a distinct family, Phyllocladaceae, because of its highly distinctive morphology, including details of its pollen morphology, wood anatomy and its unique (for conifers) broad leaf-like cladodes (Keng, 1973, 1978; Page, 1990a). However, it shares a relatively simple fleshy ovulate cone with the other podocarps (note that the fleshy part of the cone is not homologous across Podocarpaceae; Kelch, 1997; Tomlinson and Takaso, 2002), and the need to recognize it at the family level has been contested based on evidence from embryogeny and other morphological characters (Quinn, 1986, 1987). I consistently find Phyllocladus to be the sister group of the two other taxa that I surveyed for Podocarpaceae (Podocarpus and Saxegothaea) with weak to strong support, and I consistently find strong support for the clade consisting of all three genera (Figs. 3.1-3.3). 
Other phylogenetic studies find Phyllocladus to be nested among podocarps (Hart, 1987; Kelch, 1997, 1998; Conran et al., 2000), sister to the rest of the family (Kelch, 1998; Quinn et al., 2002; Sinclair et al., 2002), or even a grade at the base of the family (Kelch, 1998), generally with poor support. However, straightforward arguments paralleling those used above to justify the recognition of Cephalotaxus under Taxaceae can be used to support a circumscription of the podocarps (Podocarpaceae) that includes Phyllocladus. Improved taxon sampling using the current plastid data set should help clarify the backbone of relationships within this large and diverse family.

3.4.6 The Position of Wollemia within Araucariaceae

The recently discovered conifer Wollemia nobilis (Jones et al., 1995) has attracted much attention (e.g., Hogbin et al., 2000; Peakall et al., 2003) because of its status as a ‘living fossil’ (in other words, a previously unknown living taxon that bears considerable similarity to fossil taxa). The monotypic Wollemia has been placed unequivocally in Araucariaceae (Jones et al., 1995), although morphological evidence on where it fits in relation to the other genera is inconclusive, since it approaches members of Agathis and Araucaria in contrasting leaf and ovulate cone characters (Chambers et al., 1998). With the exception of Setoguchi et al. (1998), who found moderate support for Wollemia as the sister-group of Agathis and Araucaria using rbcL, other studies have recovered Wollemia as the sister-group of Agathis with weak to moderate support (Gilmore and Hill, 1997; Stefanovic et al., 1998; Conran et al., 2000). The placement was also strongly supported in a combined analysis of matK and rbcL data by Quinn et al. (2002). I confirm this result here: in all analyses Wollemia nobilis is strongly supported as the sister group of Agathis (Figs. 3.1-3.3). 
3.4.7 Significance of an Expansion Hotspot in the Plastid Ribosomal Protein Gene rps7

The morphological evidence on where the podocarps place in higher-order conifer phylogeny is not clear (e.g., Page, 1990b), and a close relationship between the predominantly southern hemisphere families Araucariaceae and Podocarpaceae was not firmly supported until relatively recently. Several molecular studies have demonstrated that these two families are sister taxa (Chaw et al., 1997; Stefanovic et al., 1998; Gugerli et al., 2001; Quinn et al., 2002; Rydin et al., 2002) and the phylogenies inferred here provide further support for this relationship (Figs. 3.1-3.3). The rps7 expansion hotspot (and the associated tandemly repeated amino-acid motif; Fig. 3.4) provides a microstructural synapomorphy supporting this two-family clade. Although the functional significance of this expansion region is unknown, comparable large expansions have been found in another plastid ribosomal gene, rps4, in Araucariaceae and Podocarpaceae (D. Kelch, pers. comm., 2007), which suggests that a survey of other plastid ribosomal protein genes might uncover additional protein structural shifts.

3.4.8 Seed-Plant Phylogeny and the Position of Gnetales

I observe moderately to strongly conflicting sets of relationships here among the major seed-plant groups (cf. MP, ML analyses of the complete data set; Figs. 3.1, 3.2). There has been extensive debate on the potential for incorrectly inferring relationships among the major groups of extant seed plants from molecular data due to systematic bias, including the question of conifer monophyly relative to a possible relationship between Pinaceae and Gnetales (e.g., Sanderson et al., 2000; Burleigh and Mathews, 2004, 2007a, b). [The broader issue of the monophyly of extant and extinct conifers is unsettled; for example, it is not clear whether voltzialean conifers such as Emporia are closely related to extant conifers (Rothwell and Serbet, 1994; Doyle, 2005).] 
Extinct taxa are essentially inaccessible to molecular systematists, but Burleigh and Mathews (2004) suggested that better taxon sampling of extant taxa might help reduce the observed conflict among studies regarding broad seed-plant relationships. My substantially improved taxon sampling within conifers (compared to Rai et al., 2003) does not yield a clearer answer for seed-plant relationships as a whole (see Chapter 4), although it is possible that this picture will change with additional conifer sampling. Removing the faster characters (RC78), which might be expected to reduce systematic bias, results in generally poorer support for relationships among the major non-conifer seed plant clades in MP and ML analyses (data not shown). For example, in the analyses of the full data set the bootstrap support for the clade that is the sister group of Gnetales (which consists of angiosperms, conifers, cycads and Ginkgo; Figs. 3.1, 3.2) is 100% from MP analysis and 86% from ML analysis. These values fall to 79% and 44%, respectively, with the fastest characters removed. I re-examined my bootstrap profiles to determine the levels of support for alternative relationships involving Gnetales for the full and reduced data sets. I found that bootstrap support for the hypothetical gnepine clade from my MP and ML analyses is <1% and 9% (respectively) with all data included, versus 7% and 12% with the RC78 sites removed. Bootstrap support for even a loose version of the ‘gnetifer’ hypothesis (i.e., with Gnetales and conifers in a clade, without regard to the monophyly of either) is weak at best. There is <1% and 14% bootstrap support for this clade (from MP and ML analyses, respectively) when all data are included. These support values show modest improvement when RC78 sites are removed, with 22% and 50% bootstrap support for this weak version of the gnetifer hypothesis from my MP and ML analyses, respectively. 
I address the broader issue of seed-plant relationships more fully in Chapter 4.

3.4.9 The Inference of Conifer Phylogeny from Plastid Data

Considering either the entire data set or the reduced subset of it, my MP and ML analyses support Pinaceae as the sister group of the rest of the conifers, with or without the most rapidly evolving characters included (Figs. 3.1-3.3). I therefore find moderate to strong support for conifer monophyly (setting aside for now the possibility that the gnepine hypothesis is correct but not recovered here due to strong systematic bias; see above). I recover strong bootstrap support for the broad backbone of conifer phylogeny, comparable to or better than that found in other studies with a broad sampling of conifers (e.g., Stefanovic et al., 1998; Quinn et al., 2002). This is in line with theoretical expectations that increasing the amount of data per taxon should reduce the effect of sampling error on phylogenetic inference. Further increasing the taxon sampling within the major clades of conifers for the plastid gene set examined here, or others of comparable size, may help address the subset of relationships that I did not infer with strong support (i.e., relationships within Pinaceae, Podocarpaceae and Taxaceae s.l.; Figs. 3.1-3.3); see Hillis (1998) for a rationale. Adding genes may also help address some of the hard-to-resolve branches within conifers (e.g., within Pinaceae) and some of the tougher questions involving the major groups of seed plants (e.g., concerning Gnetales placement). It is becoming reasonably straightforward, for example, to obtain plastid data sets of the order of size of the whole plastid genome (e.g., Leebens-Mack et al., 2005). However, the current gene sample is clearly sufficient to recover strong bootstrap support for most of the higher-order relationships that I address in conifers (Figs. 3.1, 3.2). 
Indeed, even the relatively small set of characters that I infer to be among the most slowly evolving (i.e., RC0-6) provides excellent support for almost the entire broad backbone of conifer phylogeny (Fig. 3.3).

TABLE 3.1. Source information and GenBank numbers.

Each entry gives the taxon with its authority and (voucher, herbarium), followed by GenBank numbers for the eight genes or regions in this order: atpB; ndhF; psbB, T, N & psbH; psbD & C; psbE, F, L & psbJ; rbcL; rpl2; 3'rps12, rps7 & ndhB.

Araucariaceae
Agathis australis (D. Don) Loudon (H.S. Rai 1002, ALTA)
    AY664829  AY902169  AF528892  AF528919  AF528865  AF362993*  AY664864  AY164586
Agathis robusta (C. Moore ex F. Muell.) F.M. Bailey (037944-037947, GAU)
    EF490502  EF494250  EF490512  EF490506  EF490515  EF490509  EF490521  EF490518
Araucaria bidwillii Hook. (H.S. Rai 1006, ALTA)
    AY664830  AY902170  AY664852  AY664840  AY664846  U96472*  AY664865  AY664816
Araucaria cunninghamia Aiton ex D. Don (037942 & 037943, GAU)
    EF490503  EF494251  EF490513  EF490507  EF490516  EF490510  EF490522  EF490519
Wollemia nobilis W.G. Jones, K.D. Hill & J.M. Allen (no voucher†)
    EF490504  EF494249  EF490511  EF490505  EF490514  EF490508  EF490520  EF490517

Cephalotaxaceae
Cephalotaxus harringtonii (Knight ex J. Forbes) K. Koch (R.G. Olmstead 2000-55, WTU)
    AY664831  AY902171  AF528896  AF528923  AF528869  AF227461*  AY664866  AY664817

Cupressaceae s.l.
Cunninghamia lanceolata (Lamb.) Hook. (P.A. Reeves & J. Metropulos 18, WTU)
    AY664833  AY902174  AF528898  AF528925  AF528871  L25757*  AY664869  AY664820
Juniperus communis L. (H.S. Rai 1011, ALTA)
    AY664834  AY902175  AY664854  AY664842  AY664848  AY664859  AY664870  AY664821
Taxodium distichum (L.) Rich. (K. Ikegama 2002-1, WTU)
    AY664835  AY902176  AF528915  AF525949  AF528888  AF119185*  AY664871  AY664822
Thuja plicata Donn ex. D. Don (P.A. Reeves & J. Metropulos 19, WTU)
    AY664836  AY902177  AF528917  AF528942  AF528890  AF127428*  n/a  AY664823
Widdringtonia cedarbergensis J.A. Marsh (H.S. Rai 1001, ALTA)
    n/a  AY902178  AF528918  AF528943  AF528891  AY140261  AY664872  AY664824

Pinaceae
Abies lasiocarpa (Hook.) Nutt. (R.G. Olmstead 2001-82, WTU)
    AY664825  n/a  AY664849  AY664837  AY664843  AY664855  AY664860  AY664813
Pseudotsuga menziesii (Mirb.) Franco (H.S. Rai 1022, ALTA)
    AY664826  n/a  AY664850  AY664838  AY664844  AY664856  AY664861  AY664814

Podocarpaceae
Phyllocladus alpinus Hook. f. (R.G. Olmstead 2000-54, WTU)
    AY664827  AY902167  AF528905  AF528933  AF528879  AF249650*  AY664862  AY237142
Saxegothaea conspicua Lindl. (D.M. Cherniawsky ZB-VI-VII, ALTA)
    AY664828  AY902168  AY664851  AY664839  AY664845  AY664857  AY664863  AY664815

Taxaceae
Taxus brevifolia Nutt. (A. Colwell 2000-32, WTU)
    AF528864  AY902172  AF528916  AF525948  AF528889  AF249666*  AY664867  AY664818
Torreya californica Torr. (H.S. Rai 1008, ALTA)
    AY664832  AY902173  AY664853  AY664841  AY664847  AY664858  AY664868  AY664819

Notes.
* Previously published sequences; see Graham and Olmstead (2000a,b) and Chapter 2 for a complete list of taxa and accession numbers for other taxa considered here, including the following conifers: Cedrus deodora and Pinus thunbergii (Pinaceae), Metasequoia glyptostroboides (Cupressaceae), Podocarpus chinensis (Podocarpaceae) and Sciadopitys verticillata (Sciadopityaceae). 
† This voucherless sample is from the same population as vouchered specimens described in Jones et al. (1995).

Figure 3.1. Plastid-based phylogeny of the conifers and relatives inferred from MP for 15-17 chloroplast genes and associated noncoding regions (three introns and eight intergenic spacer regions). This single most-parsimonious tree (21,532 steps, CI=0.541, RI=0.659) is depicted as a phylogram, with ACCTRAN optimization of branch lengths. MP bootstrap values are indicated beside branches.

Figure 3.2. Plastid-based phylogeny of the conifers and relatives inferred from ML for 15-17 chloroplast genes and associated noncoding regions (three introns and eight intergenic spacer regions) using the GTR + Γ + I model of sequence evolution. The ML tree (-lnL=128,297.454) is depicted as a phylogram. ML bootstrap values are indicated beside branches.

Figure 3.3. Summary of bootstrap support after removal of sites classified in the two fastest of nine rate classes for 15-17 chloroplast genes and associated noncoding regions. Bootstrap values are indicated beside branches (left- and right-hand values are for MP and ML analysis, respectively; ‘-’ refers to < 50% support). The topology shown is that of the best MP tree when the fastest sites are removed, with outgroups pruned for clarity.

Figure 3.4. Dot-plot showing the pairwise similarity of complete translated sequences of the plastid rps7 locus from selected conifers (Pinus, Pinaceae; Podocarpus, Podocarpaceae; Wollemia, Araucariaceae; Sciadopitys, Sciadopityaceae) using a PAM-30 amino-acid substitution model, an 11-residue sliding window and a gray scale of 58-77%.

3.5 REFERENCES

ANGIOSPERM PHYLOGENY GROUP (APG II). 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141: 399-436.
BACKLUND, A., AND K. BREMER. 1998. 
To be or not to be – principles of classification and monotypic plant families. Taxon 47: 391-400.
BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proc. Nat. Acad. Sci. 97: 4092-4097.
BRUNSFELD, S. J., P. E. SOLTIS, D. E. SOLTIS, P. A. GADEK, C. J. QUINN, D. D. STRENGE, AND T. A. RANKER. 1994. Phylogenetic relationships among the genera of Taxodiaceae and Cupressaceae: evidence from rbcL sequences. Syst. Bot. 19: 253-262.
BURLEIGH, J. G., AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Amer. J. Bot. 91: 1599-1613.
BURLEIGH, J. G., AND S. MATHEWS. 2007a. Assessing among-locus variation in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 111-124.
BURLEIGH, J. G., AND S. MATHEWS. 2007b. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 125-135.
CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822-846.
CHAMBERLAIN, C. J. 1935. Gymnosperms, structure and evolution. Chicago University Press, Chicago, Illinois.
CHAMBERS, T. C., A. N. DRINNAN, AND S. MCLOUGHLIN. 1998. Some morphological features of Wollemi Pine (Wollemia nobilis: Araucariaceae) and their comparison to Cretaceous plant fossils. Int. J. Plant Sci. 159: 160-171.
CHAW, S-M., A. ZHARKIKH, H-M. SUNG, T-C. LAU, AND W-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.
CHAW, S-M., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. 
Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Nat. Acad. Sci. 97: 4086-4091.
CHENG, Y., R. G. NICOLSON, K. TRIPP, AND S-M. CHAW. 2000. Phylogeny of Taxaceae and Cephalotaxaceae genera inferred from chloroplast matK gene and nuclear rDNA ITS region. Mol. Phylogenet. Evol. 14: 353-365.
CONRAN, J. G., G. M. WOOD, P. G. MARTIN, J. M. DOWD, C. J. QUINN, P. A. GADEK, AND R. A. PRICE. 2000. Generic relationships within and between the gymnosperm families Podocarpaceae and Phyllocladaceae based on an analysis of the chloroplast gene rbcL. Aust. J. Bot. 48: 715-724.
DONOGHUE, M. J., AND J. A. DOYLE. 2000. Seed plant phylogeny: demise of the anthophyte hypothesis? Curr. Biol. 10: R106-R109.
DOYLE, J. A. 1998. Phylogeny of vascular plants. Annu. Rev. Ecol. Syst. 29: 567-599.
DOYLE, J. A. 2005. Seed ferns and the origin of angiosperms. J. Torrey. Bot. Soc. 133: 169-209.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
ECKENWALDER, J. E. 1976. Re-evaluation of Cupressaceae and Taxodiaceae: A proposed merger. Madroño 23: 237-256.
ENRIGHT, N. J., AND R. S. HILL. 1995. Ecology of the southern conifers. Smithsonian Institution Press, Washington, D.C.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783–791.
FLORIN, R. 1951. Evolution in cordaites and conifers. Acta Horti Berg. 15: 285-388.
GADEK, P. A., D. L. ALPERS, M. M. HESLEWOOD, AND C. J. QUINN. 2000. Relationships within Cupressaceae sensu lato: A combined morphological and molecular approach. Amer. J. Bot. 87: 1044-1057.
GILMORE, S., AND K. D. HILL. 1997. Relationships of the Wollemi Pine (Wollemia nobilis) and a molecular phylogeny of the Araucariaceae. Telopea 7: 275-291.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. 
Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Amer. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., J. R. KOHN, B. R. MORTON, J. E. ECKENWALDER, AND S. C. H. BARRETT. 1998. Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Syst. Biol. 47: 545–567.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GRAHAM, S. W., J. M. ZGURSKI, M. A. MCPHERSON, D. M. CHERNIAWSKI, J. M. SAARELA, E. F. C. HORNE, S. Y. SMITH, W. A. WONG, H. E. O’BRIEN, V. L. BIRON, J. C. PIRES, R. G. OLMSTEAD, M. W. CHASE, AND H. S. RAI. 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Pp. 3-21. In Monocots: comparative biology and evolution (excluding Poales). Edited by J. T. Columbus, E. A. Friar, J. M. Porter, L. M. Prince, and M. G. Simpson. Rancho Santa Ana Botanic Garden, Claremont, CA.
GUGERLI, F., C. SPERISEN, U. BÜCHLER, I. BRUNNER, S. BRODBECK, J. D. PALMER, AND Y-L. QIU. 2001. The evolutionary split of Pinaceae from other conifers: evidence from an intron loss and a multigene phylogeny. Mol. Phylogenet. Evol. 21: 167-175.
GUINDON, S., AND O. GASCUEL. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
HART, J. A. 1987. A cladistic analysis of conifers: Preliminary results. J. Arnold Arb. 68: 269-307.
HILLIS, D. 1998. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 42: 182-192.
HOGBIN, P. M., R. PEAKALL, AND M. A. SYDES. 2000. 
Achieving practical outcomes from genetic studies of rare plants. Australian Journal of Botany 48: 375-382. JONES, W. G., K. D. HILL, AND J. M. ALLEN. 1995. Wollemia nobilis, a new living Australian genus and species in the Araucariaceae. Telopea 6: 173-176. JUNIER, T., AND M. PAGNI. 1999. Dotlet: diagonal plots in a web browser. Bioinformatics 16: 178-179. KELCH, D. G. 1997. The phylogeny of the Podocarpaceae based on morphological evidence. Syst. Bot. 22: 113-131. KELCH, D. G. 1998. Phylogeny of Podocarpaceae: comparison of evidence from morphology and 18S rDNA. Amer. J. Bot. 85: 986-996. KENG, H. 1973. On the family Phyllocladaceae. Taiwania 18: 142-145. KENG, H. 1978. The genus Phyllocladus (Phyllocladaceae). J. Arnold Arb. 59: 249-273. KOSAKOVSKY POND, S. L., S. D. W. FROST, AND S. V. MUSE. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21: 676-679. KUSUMI, J., Y. TSUMURA, H. YOSHIMARU, AND H. TACHIDA. 2000. Phylogenetic relationships in Taxodiaceae and Cupressaceae sensu stricto based on matK gene, chlL gene, trnL-trnF IGS region, and trnL intron sequences. Amer. J. Bot. 87: 1480-1488. LEEBENS-MACK, J., L. A. RAUBESON, L. CUI, J. V. KUEHL, M. H. FOURCADE, T. W. CHUMLEY, J. L. BOORE, R. K. JANSEN, AND C. W. DEPAMPHILIS. 2005. Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol. Biol. Evol. 22: 1948-1963. MILLER, N. M. JR. 1999. Implications of fossil conifers for the phylogenetic relationships of living families. Bot. Rev. 65: 239-277. MUNDRY, M., AND T. STÜTZEL. 2004. Morphogenesis of the reproductive shoots of Welwitschia mirabilis and Ephedra distachya (Gnetales), and its evolutionary implications. Organisms, Diversity and Evolution 4: 91-108. OLMSTEAD, R. G., P. A. REEVES, AND A. C. YEN. 1998. Patterns of sequence evolution and implications for parsimony analysis of chloroplast DNA. Pp. 164-187. In Molecular systematics of plants II: DNA sequencing. 
Edited by P. S. Soltis, D. E. Soltis, and J. J. Doyle. Kluwer, Boston, MA. PAGE, C. N. 1990a. Phyllocladaceae. Pp. 317-319. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin. PAGE, C. N. 1990b. Podocarpaceae. Pp. 332-346. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin. PAGE, C. N. 1990c. Sciadopityaceae. Pp. 346-348. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin. PAGE, C. N. 1990d. Taxaceae. Pp. 346-353. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin. PEAKALL, R., D. EBERT, L. SCOTT, P. MEAGHER, AND C. OFFORD. 2003. Comparative genetic study confirms exceptionally low genetic variation in the ancient and endangered relictual conifer, Wollemia nobilis (Araucariaceae). Mol. Ecol. 12: 2331-2343. PILGER, R. 1926. Coniferae. Pp. 121-407. In Die Natürlichen Pflanzenfamilien 2nd edition. Edited by A. Engler and K. Prantl. W. Engelmann, Leipzig. POSADA, D., AND K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817-818. QUINN, C. J. 1986. Embryogeny in Phyllocladus. New Zealand J. Bot. 24: 575-580. QUINN, C. J. 1987. The Phyllocladaceae Keng – a critique. Taxon 36: 559-565. QUINN, C. J., R. A. PRICE, AND P. A. GADEK. 2002. Familial concepts and relationships in the conifers based on rbcL and matK sequence comparisons. Kew Bull. 57: 513-531. RAI, H. S., H. E. O’BRIEN, P. A. REEVES, AND S. W. GRAHAM. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359. RAMBAUT, A. 1998. “Se-Al (Sequence Alignment Editor Version 1.0),” Computer program and documentation. Department of Zoology, University of Oxford, UK. 
RAUBESON, L. A., AND R. K. JANSEN. 1992. A rare chloroplast-DNA structural mutation is shared by all conifers. Biochem. Syst. Ecol. 20: 17-24. RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214. ROTHWELL, G. W., AND R. SERBET. 1994. Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482. SAARELA, J. M., H. S. RAI, J. A. DOYLE, P. K. ENDRESS, S. MATHEWS, A. D. MARCHANT, B. G. BRIGGS, AND S. W. GRAHAM. 2007. Hydatellaceae identified as a new branch near the base of the angiosperm phylogenetic tree. Nature 446: 312-315. SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797. SETOGUCHI, H., T. A. OSAWA, J-C. PINTAUD, T. JAFFRÉ, AND J-M. VEILLON. 1998. Phylogenetic relationships within Araucariaceae based on rbcL gene sequences. Amer. J. Bot. 85: 1507-1516. SINCLAIR, W. T., R. R. MILL, M. F. GARDNER, P. WOLTZ, T. JAFFRÉ, J. PRESTON, M. L. HOLLINGSWORTH, A. PONGE, AND M. MÖLLER. 2002. Evolutionary relationships of the New Caledonian heterotrophic conifer, Parasitaxus usta (Podocarpaceae), inferred from chloroplast trnL-F intron/spacer and nuclear rDNA ITS2 sequences. Plant Syst. Evol. 233: 79-104. STEFANOVIC, S., M. JAGER, J. DEUTSCH, J. BROUTIN, AND M. MASSELOT. 1998. Phylogenetic relationships of conifers inferred from partial 28S rRNA gene sequences. Amer. J. Bot. 85: 688-697. STOCKEY, R. A., J. KVACEK, R. S. HILL, G. W. ROTHWELL, AND K. KVACEK. 2005. The fossil record of Cupressaceae s. lat. Pp. 64-68. In A monograph of Cupressaceae and Sciadopitys. Edited by A. Farjon. Royal Botanic Gardens, Kew, UK. 
STÜTZEL, T., AND I. RÖWEKAMP. 1999. Female reproductive structures in Taxales. Flora 194: 145-157. SULLIVAN, J., AND D. L. SWOFFORD. 2001. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution patterns are violated? Syst. Biol. 50: 723-729. SWOFFORD, D. L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4, Computer program and documentation. Sinauer Associates, Sunderland, MA. SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, AND D. M. HILLIS. 1996. Phylogenetic inference. Pp. 407-514. In Molecular Systematics, 2nd edition. USA. Edited by D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, MA. TAKHTAJAN, A. L. 1953. Phylogenetic principles of the system of higher plants. Bot. Rev. 19: 1-45. TOMLINSON, P. B., AND T. TAKASO. 2002. Seed cone structure in conifers in relation to development and pollination: a biological approach. Can. J. Bot. 80: 1250-1273. WAKASUGI, T., J. TSUDZUKI, S. ITO, K. NAKASHIMA, T. TSUDZUKI, AND M. SUGIURA. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Nat. Acad. Sci. 91: 9794-9798. YANG, Z. 1998. On the best evolutionary rate for phylogenetic analysis. Syst. Biol. 47: 125-133. YAO, X., T. N. TAYLOR, AND E. L. TAYLOR. 1997. A taxodiaceous seed cone from the Triassic of Antarctica. Amer. J. Bot. 84: 343-354. ZGURSKI, J. M., H. S. RAI, Q. M. FAI, D. J. BOGLER, J. FRANCISCO-ORTEGA, AND S. W. GRAHAM. 2008. How well do we understand the overall backbone of cycad phylogeny? New insights from a large, multigene plastid data set. Mol. Phylogenet. Evol. 47: 1232-1237. 
CHAPTER 4 INFERENCE AND MISINFERENCE OF HIGHER-ORDER SEED-PLANT RELATIONSHIPS FROM PLASTID DATA (A version of this chapter will be submitted for publication: RAI, H. S., AND S. W. GRAHAM. Inference and misinference of higher-order seed-plant relationships from plastid data.) 4.1 INTRODUCTION The reconstruction of extant seed-plant phylogenetic relationships is recognized as one of the most difficult unresolved problems in plant systematics (e.g., Donoghue and Doyle, 2000; Burleigh and Mathews, 2004). Despite substantial clarification from molecular data concerning relationships within each major clade of seed plants (e.g., see Chapters 2, 3 for cycads and conifers), considerable uncertainty persists regarding the overall pattern of seed-plant phylogeny. Indeed, although individual studies sometimes find modest to strong support for particular relationships (e.g., a possible sister-group relationship between Ginkgo and cycads, Chapter 2), there is no clear consensus on what the sister group is for any of the five major, extant seed-plant clades. A wide range of studies of morphological and molecular evidence have given many different and often strongly conflicting results (e.g., Hill and Crane, 1982; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Rothwell and Serbet, 1994; Boivin et al., 1996; Doyle, 1996, 2006; Goremykin et al., 1996; Chaw et al., 1997, 2000; Winter et al., 1999; Bowe et al., 2000; Sanderson et al., 2000; Rydin et al., 2002; Soltis et al., 2002; Friis et al., 2007; Chapters 2, 3). Much of the discussion of higher-order seed-plant relationships has focussed on the relative placement of Gnetales along the backbone of seed-plant phylogeny, particularly their placement relative to the conifers. 
In some analyses, conifers are found not to be monophyletic, and Gnetales are then usually found to be the sister group of Pinaceae, the so-called “gnepine” hypothesis (Bowe et al., 2000; Chaw et al., 2000; Hajibabaei et al., 2006; Fig. 4.1). Other studies support a “Gnetales-sister” hypothesis in which Gnetales are the sister group of all other seed plants, with the conifers then depicted as monophyletic (e.g., Rydin et al., 2002; Chapters 2, 3; Fig. 4.1). In contrast, various lines of morphological evidence from living and extinct taxa seem to support the idea that Gnetales are instead more closely related to the angiosperms among extant taxa, the so-called “anthophyte” hypothesis (e.g., Doyle, 1996, 2006; Hilton and Bateman, 2006; Fig. 4.1). The vast majority of molecular phylogenetic analyses have recovered trees that are inconsistent with the anthophyte hypothesis (but see Stefanovic et al., 1998). Features possessed by some or all members of Gnetales, such as aspects of their seed architecture (see Friis et al., 2007), their net-veined leaves and vessels, lend morphological support to the idea of a close association between Gnetales and angiosperms (Nixon et al., 1994; Rothwell and Serbet, 1994; Doyle, 1996, 2006). However, in his recent morphological re-analysis, Doyle (2006) found trees only one step longer than the shortest MP trees that depict Gnetales as more closely related to conifers than to angiosperms. A significant problem that has plagued plant molecular systematists is that gymnosperms have a diverse and ancient evolutionary history. 
Few major extant clades have persisted to the present day, relative to the total diversity of gymnosperm clades that have left traces in the fossil record (e.g., Rothwell, 1982; Rothwell and Serbet, 1994; Crane et al., 2004), with a net effect that the major extant crown groups of gymnosperms are subtended by long interior (non-terminal) branches. There has been considerable speculation (e.g., Sanderson et al., 2000; Rothwell and Stockey, 2002; Burleigh and Mathews, 2004) that long-branch attraction (LBA) or other sources of systematic bias may give rise to misleading results in this situation. If this is correct, then strong bootstrap support values (and Bayesian posterior probabilities) for seed-plant relationships may be strongly misleading (see Felsenstein, 1978; Hendy and Penny, 1989; Bergsten, 2005) in at least a subset of studies, and perhaps all of them. However, we should expect model-based methods to be considerably less prone to this problem than maximum parsimony, so long as the analytical model and associated parameters used for analysis provide adequate estimates of the real (unknown) pattern of DNA substitution (see Huelsenbeck, 1995; Chang, 1996; Swofford et al., 2001). Burleigh and Mathews (2004) showed that different site-rate classes estimated for a 13-locus data set comprising plastid, mitochondrial and nuclear data tend to favour different hypotheses of seed-plant relationship in MP analysis. For example, they found that sites evolving at intermediate rates support the gnepine hypothesis, whereas faster ones favour the Gnetales-sister hypothesis. By removing the fastest classes they inferred well-supported gnepine trees, whereas the full data set yielded well-supported Gnetales-sister trees. 
They also found that it is difficult to differentiate between trees that support the gnepine hypothesis and trees that support a “gnetifer” hypothesis (i.e., with Gnetales as the sister group of Pinaceae vs. of a clade comprising all conifers), as both hypotheses are supported by many sites in the fastest rate categories, and relatively fewer sites in intermediate rate categories. In an analysis of stratigraphic data, they also found that the fossil record is more consistent with the anthophyte and gnepine hypotheses than with Gnetales-sister reconstructions. Several studies have addressed the potential effect of systematic error in phylogenetic inference using Monte Carlo simulations (e.g., Huelsenbeck et al., 1998; Maddison et al., 1999; Sanderson et al., 2000). This method has been shown to be useful for quantifying type I and II error rates in tree reconstruction by generating simulated data that are based upon parameters (including tree structure) determined from the original data set (Sanderson et al., 2000). In their study, Sanderson et al. (2000) found biases favouring the Gnetales-sister topology and against recovering the anthophyte hypothesis when MP was the phylogenetic criterion, for several plastid genes. More recently, Burleigh and Mathews (2007b) used this technique on a 12-locus data set that included sequence information from all three genomic compartments, and found similar biases when MP was used. However, they also found that when maximum likelihood (ML) is used instead, the bias appears to be limited mostly to one against recovering the anthophyte hypothesis. I present sequence data from 17 slowly evolving plastid genes for a broad sampling of the major seed-plant clades to address deep seed-plant phylogeny. 
The conservative nature of these characters is predicted to be particularly useful in the reconstruction of deep phylogenetic relationships, as they should be less prone to long-branch attraction than faster genes (Felsenstein, 1983; Graham and Olmstead, 2000a), and they have proven their usefulness for other deep and difficult phylogenetic questions in vascular-plant phylogeny (e.g., Graham and Olmstead, 2000a; Graham et al., 2006; Saarela et al., 2007; Zgurski et al., 2008; Chapters 2, 3, 5). All three families of Gnetales and a sampling of cycads (Chapter 2) are included here, and I include representatives of all major branches of conifer phylogeny (Chapter 3), in addition to Ginkgo and a representative sampling from the basal splits of angiosperm phylogeny. This study is the largest to date with this level of taxon sampling (in terms of the amount of data per taxon), and it focuses exclusively on the plastid genome (cf. Burleigh and Mathews, 2004). I use the methodology outlined in Sanderson et al. (2000) to explore the possibility that systematic error may badly distort some aspects of seed-plant phylogenetic inference from plastid data, and assess the potential for this problem using two different phylogenetic criteria (MP and ML). I repeat these analyses for different sub-partitions of the data that might be expected to favour different phylogenetic hypotheses due to their different rates (rapid vs. slow), specifically for two different codon position partitions (after Sanderson et al., 2000), and for re-partitionings of the protein-coding data based on sites that are classified as slowly vs. rapidly evolving (after Burleigh and Mathews, 2004). 4.2 MATERIALS AND METHODS 4.2.1 Taxonomic and Genomic Sampling I surveyed 17 genes that collectively represent approximately one-eighth to one-ninth of the gymnosperm plastid genome. 
The coding regions include atpB, rbcL, ten photosystem II (psb) genes, three ribosomal protein genes, and two NADH dehydrogenase subunit genes (see Chapters 2, 3). The final matrix includes five species selected from a broad sample of the diversity of basal angiosperms (see Mathews and Donoghue, 1999; Parkinson et al., 1999; Soltis et al., 1999; Graham and Olmstead, 2000a, b; Graham et al., 2000; Qiu et al., 2000; Savolainen et al., 2000; APG II, 2003), two cycads (Cycas revoluta and Dioon purpusii), three Gnetales (representing the three extant families), 19 conifers (Pinus thunbergii was obtained from GenBank; accession number NC_001631), and five outgroup species. The outgroups are two pteridophytes (Adiantum capillus-veneris and Psilotum nudum; GenBank accession numbers NC_004766 and NC_003386) and three bryophytes (Anthoceros formosae, Marchantia polymorpha and Physcomitrella patens; GenBank accession numbers NC_004543, NC_00319 and NC_005087, respectively). The 19 conifer species examined include at least one representative from each of the eight recognized families in Farjon (2001), the largest sampling to date of this diverse clade for this amount of plastid data. These sequences are a subset of those found in Chapter 3. GenBank numbers for the cycads, Gnetales and the angiosperms are provided in Table 2.1 and Graham and Olmstead (2000a, b). GenBank numbers for the 19 conifers are provided in Table 3.1. The final alignment used here comprises 35 taxa (30 seed-plant taxa and five outgroups; but note that I excluded three conifer taxa here from the larger taxon set considered in Chapter 3). 4.2.2 DNA Extraction, Amplification, Sequencing and Data Assembly I extracted DNA from fresh and silica-dried specimens using the protocol of Doyle and Doyle (1987), as modified in Chapter 2. DNA amplification and sequencing methods are outlined in Graham and Olmstead (2000a). 
I sequenced all regions at least twice for each taxon and, with a few minor exceptions, completely sequenced all products in both forward and reverse directions. I designed a set of 27 new seed-plant specific primers to facilitate amplification and sequencing (Table 4.1). I compiled sequence contigs and performed base calling using Sequencher 4.1 (Gene Codes Corporation; Ann Arbor, MI). I added these data to a previously generated alignment (Graham et al., 2006) and adjusted this manually in Se-Al version 1.0 (Rambaut, 1998) using alignment criteria outlined in Graham et al. (2000). I used tobacco, Ginkgo and Pinus sequences to determine gene and exon boundaries, and codon positions for each nucleotide. I decided to focus exclusively on coding regions for the analyses in this chapter; it is not yet possible to realistically simulate indels in non-coding regions, which can be large and often have complex patterns of overlap. I obtained most of the regions for most taxa, but several regions have either been lost from the plastid genome (e.g., ndhB and ndhF for all representatives of Pinaceae examined to date; Raubeson and Jansen, 1992; Wakasugi et al., 1994; ndh genes for Welwitschia, McCoy et al., 2008) or I could not amplify or sequence them (see Table 3.1 for further details; rpl2 is now reported as present but hard-to-align in Welwitschia; McCoy et al., 2008). I coded these regions as missing data in the final matrix. The aligned coding regions considered for analysis comprise 12,635 bp per taxon (corresponding to 7,441 bp unaligned in Welwitschia, which has the most genes missing in the matrix, and 11,396 bp unaligned in Ginkgo, a more typical size for most taxa). Of these characters, 1,659 bp are variable but uninformative across land plants, and 4,220 bp are parsimony informative. 
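The site categories reported above (constant, variable but uninformative, and parsimony informative) can be illustrated with a minimal sketch. It is not the procedure used in this chapter, where such counts would normally be reported by the phylogenetic software; the function names and the toy alignment are hypothetical, and the sketch assumes that gaps, missing data and ambiguity codes are simply ignored when a column is scored.

```python
from collections import Counter

def classify_site(column):
    '''Classify one alignment column as constant, uninformative or informative.'''
    # Assumption of this sketch: only unambiguous bases are scored;
    # gaps, missing data (?) and ambiguity codes are ignored.
    states = Counter(base for base in column.upper() if base in 'ACGT')
    if len(states) <= 1:
        return 'constant'
    # Parsimony informative: at least two states, each found in two or more taxa.
    shared = sum(1 for count in states.values() if count >= 2)
    return 'informative' if shared >= 2 else 'uninformative'

def summarize(alignment):
    '''Count site categories for a dict mapping taxon name to aligned sequence.'''
    columns = zip(*alignment.values())
    return Counter(classify_site(''.join(col)) for col in columns)

# Hypothetical toy alignment (four taxa, five aligned sites).
toy = {
    'taxon1': 'ACGTA',
    'taxon2': 'ACGTC',
    'taxon3': 'ACTAC',
    'taxon4': 'AGTAA',
}
counts = summarize(toy)
print(counts['constant'], counts['uninformative'], counts['informative'])
```

In the toy alignment the first column is constant, the second has a state found in only one taxon (variable but uninformative), and the last three each have two states shared by two taxa (parsimony informative).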
4.2.3 Phylogenetic Analyses I performed a heuristic MP search using PAUP* (Swofford, 2002), with all characters and character state changes equally weighted, and using TBR (tree-bisection-reconnection) branch swapping, with 100 random addition replicates, and otherwise using default settings. I also performed an ML heuristic search using PhyML (v.2.4.4; Guindon and Gascuel, 2003), with a BIONJ starting tree, NNI (nearest-neighbour-interchange) branch swapping and model parameters estimated from the data in each case. I chose a DNA substitution model for ML analysis using the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC) in Modeltest v. 3.7 (Posada and Crandall, 1998). The optimal model in both cases was GTR + Γ + I [general-time-reversible (GTR) rate matrix with the proportion of invariable sites (I) considered, and among-site rate variation accounted for using the gamma (Γ) distribution as described by the shape parameter alpha (α)]. I performed non-parametric bootstrap analysis (Felsenstein, 1985) using the same search criteria, with 100 bootstrap replicates (and a single random addition replicate per parsimony bootstrap replicate). I use ‘weak,’ ‘moderate,’ and ‘strong’ in reference to clades that have bootstrap support values < 70%, 70-89%, and ≥ 90%, respectively (e.g., Graham et al., 1998). 4.2.4 Inference of Nucleotide Rate Classes I partitioned the data into different rate classes using HyPhy (Kosakovsky Pond et al., 2005). HyPhy allows for partitioning of data by site rate, based on an estimate of the most likely rate class for each nucleotide given a specified model (in this case the GTR model with eight discrete rate classes), for a given user-supplied tree (here I used the best MP topology). 
HyPhy uses the tree to estimate all likelihood parameters, and assigns each site to its most likely individual rate category (the total number of rate classes is user specified). I partitioned the data set into nine rate classes (RC0 representing sites with no change, and RC8 the fastest sites). 4.2.5 Systematic Error I used Monte Carlo simulation studies (Sanderson et al., 2000) to explore the possibility that long-branch attraction (LBA) or other types of systematic bias may badly distort major aspects of seed-plant phylogenetic inference using MP or ML analysis. To generate model trees for the simulations I performed constrained ML searches of the full data set in PAUP*, in which the topological constraints reflect particular hypotheses of monophyly. I investigated constraints corresponding to four major hypotheses of seed-plant relationship (see Fig. 4.1; the constrained branch or branches yielding a hypothesis are indicated in each case with an asterisk): A – Gnetales-sister (Gnetales as the sister group of all other seed plants); B – Gnepine (Gnetales as the sister group of Pinaceae); C – Gnetifer (Gnetales as the sister group of monophyletic conifers); D – Anthophyte (Gnetales as the sister group of angiosperms). I simulated 1000 new data sets from each resulting ML tree with Seq-Gen (v.1.3.2; Rambaut and Grassly, 1997; only the first 100 of these were considered for subsequent ML searches in each case, due to time constraints), using the same model that I used for the constrained ML searches (i.e., GTR + Γ + I) together with the estimated ML branch lengths and model parameters in each case. These simulated matrices were all set to be the same size as the original data partitions (12,635 bp) that they were based on. 
I then performed unconstrained MP and ML searches on the simulated data sets using the same search settings that I used for the real data (e.g., allowing model parameters to be estimated from each simulated data set, for the ML searches). These were run using the batch modes of PAUP* and PhyML, respectively. Trees resulting from these searches were imported into PAUP* and scored for specific hypotheses (i.e., topologies corresponding to each set of asterisked branches in Fig. 4.1), by filtering for topologies that do not satisfy each constraint (or constraint set; Fig. 4.1A, C), in turn. I scored trees that did not fall into the four constraint categories explored here as “other topologies recovered.” The results are summarized in a matrix in which the diagonal elements of a row give the probability of obtaining the relationship specified by the particular hypothesis used to simulate that row (i.e., one minus an estimate of the type I error), and the off-diagonal elements of a column give the probability of reconstructing an incorrect relationship given the null hypothesis specified in that column (i.e., estimates of type II error). I repeated these simulation analyses on four different partitions of the data. The data partitions considered are: (1) Combined data for the first two codon positions from the plastid genes (ignoring a small overlap in psbC and psbD, in which each site falls into two different codon position classes); (2) Third codon position sites from all genes combined; (3) The unchanged sites together with the six slowest rate classes defined by HyPhy, RC0-RC6 (where RC0 comprises sites with no change, RC1 is the slowest variable class, and RC6 the fastest of these); (4) The two fastest rate classes, RC7 + RC8 (where RC8 is the fastest class of all). 
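The scoring step described above can be sketched as a simple tabulation. This is only an illustrative outline, assuming each tree recovered from a simulated data set has already been labelled with the hypothesis it satisfies (in this chapter that filtering was done in PAUP*); the function name, variable names and toy scores are hypothetical and are not results from this study.

```python
from collections import Counter

# Hypothesis labels follow the text; everything else here is illustrative.
HYPOTHESES = ['Gnetales-sister', 'gnepine', 'gnetifer']

def recovery_matrix(scored_trees):
    '''For each model hypothesis (row), give the percentage of simulated
    data sets whose inferred tree matched each hypothesis (column).'''
    matrix = {}
    for model, recovered in scored_trees.items():
        tally = Counter(recovered)
        total = len(recovered)
        row = {h: 100.0 * tally[h] / total for h in HYPOTHESES}
        # Trees satisfying none of the named hypotheses are pooled as 'other'.
        others = sum(n for h, n in tally.items() if h not in HYPOTHESES)
        row['other'] = 100.0 * others / total
        matrix[model] = row
    return matrix

# Invented toy scores (10 simulated data sets per model tree), not thesis data.
toy_scores = {
    'Gnetales-sister': ['Gnetales-sister'] * 10,
    'gnepine': ['gnepine'] * 9 + ['Gnetales-sister'],
    'gnetifer': ['gnetifer'] * 6 + ['gnepine'] * 3 + ['anthophyte'],
}
matrix = recovery_matrix(toy_scores)
for model in HYPOTHESES:
    print(model, matrix[model])
```

Reading along a row, the diagonal entry is the recovery rate for the model hypothesis used in the simulation, and the remaining entries show which incorrect hypotheses were inferred instead.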
4.3 RESULTS 4.3.1 Phylogenetic Analysis of the Real Data Maximum parsimony and likelihood (MP and ML) analyses of the full data set recover the same phylogenetic relationships (subtree topologies) within angiosperms and conifers, the two major seed-plant clades where I examined more than two taxa per clade (Figs. 4.2, 4.3, Supplementary Figure). They strongly support the monophyly of seed plants as a whole, of each of the four non-monotypic clades (angiosperms, conifers, cycads, Gnetales) and of Pinaceae (94-100% support; Figs. 4.2, 4.3). However, there are strong conflicts between MP and ML concerning the relative arrangement of most of the major seed-plant clades. Maximum parsimony analysis strongly supports a clade consisting of cycads, Ginkgo and conifers, with the latter two groups then moderately supported as sister groups, and with the angiosperms strongly supported as sister to all seed plants except Gnetales (Fig. 4.2). In contrast, ML analysis strongly supports a sister-group relationship between conifers and a clade consisting of Ginkgo, cycads and angiosperms, with the latter two groups then strongly supported as sister taxa (Fig. 4.3). A major common feature of higher-order seed-plant relationship in MP and ML analyses of the full data set is that Gnetales are depicted as the sister group of all other seed plants, with strong support (Figs. 4.2, 4.3). I found that many characters that we might assume to be rapidly evolving because they belong to codon position 3 are actually distributed throughout the rate classes determined by HyPhy (Fig. 4.4 A, B). For example, ~41% of all sites in the moderately evolving rate classes (RC3-6; a total of 2820 sites) belong to codon position 3 (1148 sites); conversely, ~25% of the total characters in the fastest rate classes (RC78, a total of 3059 sites) belong to the “conservative” codon position classes, 1 and 2 (i.e., 774 sites) (see Fig. 4.4A). 
Nonetheless, as might be expected, codon positions 1 and 2 contributed more characters than codon position 3 for each of the middle three rate classes (RC3-5; Fig. 4.4B). Analyses of the codon position and rate-class data partitions reveal that the Gnetales-sister relationship is strongly supported in MP and ML analyses of the two most rapidly evolving data partitions of the real data (i.e., codon position 3 and the RC78 data; headers in Tables 4.2, 4.3). In contrast, the slower data partitions typically provide moderate support for one of the two relationships considered between Gnetales and conifers: the gnepine hypothesis was moderately to strongly supported by MP and ML analysis of codon positions 1 and 2 (Fig. 4.5; headers in Tables 4.2 and 4.3), while the gnetifer hypothesis was weakly supported by ML analysis of RC0-6 (Fig. 4.5; header in Table 4.3). However, MP analysis of RC0-6 yields moderate support for the Gnetales-sister hypothesis (Table 4.2). 4.3.2 Inference of Systematic Error Using Monte Carlo Simulations For hypothesis D (the anthophyte hypothesis) maximum likelihood was unable to assign any length to the branch subtending this clade in the tree used for simulations (i.e., based on the topology inferred using the constraint that consists of angiosperms plus Gnetales; Fig. 4.1D; see arrow pointing to trichotomy in Fig. 4.6). I specifically used the ML branch lengths to generate all my simulated data; I therefore discuss the simulation results based on this tree separately (Table 4.4) from the three other hypotheses (A-C). I describe the results of analyses of the simulated data by referring first to MP and then ML results for the two most slowly evolving data partitions (RC0-6, and codon positions 1 and 2), the two fastest data partitions (RC78, and codon position 3), and finally the full data set. 
Recall that if there were no error in tree inference from the simulated data, the model tree (whatever that is) should always be recovered for a particular data partition. In other words, in Tables 4.2 and 4.3 ideally I should always infer high values along the diagonals, and zero (or low) values in the off-diagonals (i.e., low type I and type II error, respectively) for a given 3 x 3 matrix that summarizes the results for model trees considered for that data partition (Gnetales-sister, gnepine and gnetifer, respectively). For MP analysis, the data partitions that showed the lowest error rates are RC0-6 (close to 100% on the matrix diagonals) and codon positions 1 and 2 (91% recovery for Gnetales-sister, 100% for gnepine, 58% for gnetifer; in the final case the gnepine result was the most common mis-inference, 36%). The most poorly performing data partitions are RC78 and codon position 3 (Gnetales-sister inferred in nearly all instances; Table 4.2). For the full data set, MP inferred the correct tree when either the Gnetales-sister or gnepine hypotheses were used as the correct tree, but did rather poorly when the gnetifer hypothesis was considered (I only recovered this model hypothesis 30% of the time; Gnetales-sister was then recovered 59% of the time). For ML, the simulated RC0-6 data again permitted very high recovery (100% in each case) of the model hypotheses, A-C (Table 4.3). For the codon position 1 and 2 partition, when the Gnetales-sister hypothesis was the model tree, this hypothesis was recovered slightly less often than for the corresponding MP case (81% for ML vs. 91% for MP; Tables 4.2 and 4.3). In contrast, when the gnetifer hypothesis was the model tree for this data partition, it was recovered more frequently (91% for ML vs. 58% for MP). The most rapidly evolving data partitions also performed less well for ML than the slower partitions. 
When RC78 was considered, ML was less error-prone than MP for two of the model trees: when gnepine and gnetifer were used as model trees, they were typically recovered for this data partition with ML (100% and 76%; Table 4.3), and rarely (or never) recovered with MP (6% and 0%, respectively; Table 4.2). The most poorly performing data partition for ML, by far, was codon position 3, for which I inferred the Gnetales-sister hypothesis for all three model trees (A-C) 100% of the time. However, for the full data set, ML recovered all three model trees (A-C) with no inferred error (100% of cases; Table 4.3), in spite of the fact that the third codon position data constitute 58% of all 5879 variable sites. 4.3.3 Mis-inference of the Gnetales-Sister Hypothesis when there is No Evidence for It I noticed that the anthophyte topology had no support from the real data in an ML framework (Fig. 4.6; see zero-length branch in left-hand phylogram). ML analysis of all five simulated data partitions considered here led to the inference of three different hypotheses when the anthophyte topology was used as the model tree, with a roughly even split among them (Table 4.4). For example, for RC0-6, Gnetales-sister is inferred 31% of the time, the anthophyte hypothesis 39% of the time and an additional hypothesis (with Gnetales sister to the rest of the gymnosperms) 30% of the time. This indecision makes sense, as these three relationships are the only possible resolutions of the underlying trichotomy given the zero-length ML branch. In contrast, the MP analyses of the same simulated data (with this zero-length ML branch for anthophytes; Fig. 4.6, left-hand side) led to recovery of Gnetales-sister 100% of the time for three of five data partitions (i.e., for the full data, codon position 3 and RC78; Table 4.4). The only partition that came close to behaving as well as ML inference for the anthophyte hypothesis was RC0-6. 
MP analysis of the RC0-6 partition yielded a roughly even split among the same three hypotheses found in the corresponding ML analyses (Table 4.4; ~41.5% for Gnetales-sister, ~31.5% for the anthophyte case and 27% for ‘Gnetales-sister to other gymnosperms’).
4.4 DISCUSSION
There has been a long history of ambiguous and conflicting inferences of seed-plant phylogeny from a variety of data sources. It is worth remembering that these inferences involve very deep branches of land-plant phylogeny (seed plants reach back at least 350 Myr; e.g., Elkinsia polymorpha; Serbet and Rothwell, 1992) and a very sparse sampling of the total diversity of major seed-plant clades (molecular data can only consider the five living groups of seed plants). Potential approaches to dealing with mis-inference of seed-plant phylogeny due to problematic long branches include considering conservative data (e.g., Graham and Olmstead, 2000a), removing data that are inferred (or presumed) to be rapidly evolving (e.g., Burleigh and Mathews, 2004), including a reasonable density of taxa within major seed-plant clades (e.g., Rydin et al., 2002), using model-based approaches such as ML in preference to MP (e.g., Burleigh and Mathews, 2004, 2007b), and using simulated data to examine problematic data partitions (Sanderson et al., 2000; Burleigh and Mathews, 2004, 2007a, 2007b). These methods can also be combined (e.g., Burleigh and Mathews, 2007b), and this is the approach I took here, focussing exclusively on the performance of data from the plastid genome.
Most molecular studies of seed-plant phylogeny have surveyed a few genes for many taxa (e.g., four genes for 88 gymnosperms in Rydin et al., 2002), or multiple genes for a few taxa (e.g., 12 genes for 10 gymnosperms in Burleigh and Mathews, 2007b). I considered 17 genes from 25 gymnosperms, with a particularly heavy sampling in the conifers (19 total, representing all major clades). 
I focussed exclusively on conservative protein-coding regions in the plastid genome in an attempt to refine our understanding of what the pitfalls may be in using this genome in inference of deep branches of land-plant phylogeny. I considered slower and faster data partitions, including first and second vs. third codon positions (changes in these largely reflect non-synonymous and synonymous changes, respectively; Sanderson et al., 2000), and data filtered for the fastest evolving ML rate classes. I also performed simulations based on these real data for a range of seed-plant hypotheses proposed in the literature concerning the local placement of Gnetales in seed-plant phylogeny.
Simulations cannot tell us what the correct answer is concerning Gnetales placement in seed-plant phylogeny. However, they can indicate which relationships may be hard to infer, and the conditions under which mis-inferences may occur. I generally find ML to be a less error-prone method than MP, except for third-codon position data, where I consistently found a strong bias towards the Gnetales-sister hypothesis using both methods. The Gnetales-sister result has been seen in previous studies of seed-plant phylogeny (Rydin et al., 2002), including earlier iterations of this genomic set (Chapters 2, 3; Zgurski et al., 2008). More than half of the variable characters in the full data set belong to codon position 3 (and also RC78); despite this, ML simulations of the full data set do not exhibit the same bias towards the Gnetales-sister hypothesis (100% recovery of each hypothesis considered; Table 4.3). The simulations indicate that ML can ‘correct’ for poor characters (those that tend to lead to tree misinference), particularly when they do not form the entire data set. The data partitions that are more slowly evolving also tend to be less problematic for MP and ML analyses. 
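The comparison between MP and ML made above rests on reading the diagonal (recovery of the model tree) and the largest off-diagonal entry (most common mis-inference) from each simulation matrix. A minimal Python sketch of that reading is given below for the codon position 1 and 2 partition, with values copied from Tables 4.2 and 4.3. It is an illustration rather than a script used in this study: the names COLUMNS, MP_CODON12, ML_CODON12 and summarize are hypothetical, and blank cells in the final table column are assumed to be zero.

```python
# Columns of Tables 4.2 and 4.3, in order (hypothesis inferred from simulated data).
COLUMNS = ['Gnetales-sister', 'gnepine', 'gnetifer', 'anthophyte', 'other']

# Rows are keyed by the model tree used for simulation (codon positions 1 and 2).
# Assumption: a blank 'other' cell in the published tables is treated as 0.
MP_CODON12 = {
    'Gnetales-sister': [0.908, 0.000, 0.000, 0.035, 0.057],
    'gnepine':         [0.000, 1.000, 0.000, 0.000, 0.000],
    'gnetifer':        [0.000, 0.364, 0.582, 0.000, 0.054],
}
ML_CODON12 = {
    'Gnetales-sister': [0.81, 0.00, 0.00, 0.09, 0.10],
    'gnepine':         [0.00, 1.00, 0.00, 0.00, 0.00],
    'gnetifer':        [0.00, 0.05, 0.91, 0.00, 0.04],
}

def summarize(matrix):
    '''Map each model tree to (recovery, most common wrong hypothesis, its fraction).'''
    summary = {}
    for model, row in matrix.items():
        recovery = row[COLUMNS.index(model)]
        wrong = [(frac, name) for name, frac in zip(COLUMNS, row) if name != model]
        top_frac, top_name = max(wrong, key=lambda pair: pair[0])
        if top_frac == 0:
            top_name = None  # the model tree was always recovered
        summary[model] = (recovery, top_name, top_frac)
    return summary

mp_summary = summarize(MP_CODON12)
ml_summary = summarize(ML_CODON12)
for model in MP_CODON12:
    print(model, 'MP:', mp_summary[model], 'ML:', ml_summary[model])
```

For the gnetifer model tree, for example, this gives 58% recovery under MP (with gnepine as the most common mis-inference, 36%) against 91% recovery under ML, matching the values quoted in the Results.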
It is common practice in studies that utilize protein-coding regions to partition data by codon position and subsequently exclude so-called ‘saturated’ third codon position data. I show that filtering out the two fastest ML rate classes is a more effective strategy in reducing systematic error than simply considering codon positions 1 and 2. The first two codon positions include some sites that are very rapidly evolving (~9% of the 8391 sites belonging to codon positions 1 and 2 are in RC78; Fig. 4.4A), consistent with the diverse range of functional constraints found even in highly conserved proteins like rbcL (Kellogg and Juliano, 1997). Conversely, a substantial fraction of moderately evolving characters may be thrown out when all codon position 3 sites are excluded from analysis; ~33% of all 3433 variable third position sites lie outside the fastest two rate classes (Fig. 4.4A). Oddly, however, despite containing a larger fraction of conservative sites than the least conservative data (i.e., RC78, which solely includes these most rapid sites), the third codon position is clearly the most error-prone data partition considered here for ML analysis (Table 4.3). This strongly suggests that the very strong tendency I saw for the codon position 3 data partition to lead to tree mis-inference (Tables 4.2, 4.3) is not solely a function of the high fraction of high-rate sites that it contains.
One of the major hypotheses of seed-plant relationships, the anthophyte hypothesis, is only rarely recovered from molecular data (e.g., Stefanovic et al., 1998), and never with strong support. When I forced the real data to fit this hypothesis, I observed a startling contrast in how it is perceived by ML vs. MP. According to the former phylogenetic method, there are no characters that support this hypothesis (Fig. 4.6). In contrast, maximum parsimony infers that there are numerous characters that support this hypothesis in the real data. 
The source of the conflict between MP and ML branch lengths in this situation is unclear. Nonetheless, it is of interest to see how ML and MP perform for a model hypothesis for which there is no supporting data (I used the ML estimates of branch lengths for generating simulation data). One might expect the relative positions of the three clades in the trichotomy resulting from this zero-length anthophyte branch (i.e., angiosperms, Gnetales and the remaining gymnosperms) to be resolved at random in this situation. This is indeed what I see for ML. Disturbingly, however, most MP analyses of simulated data sets for the anthophyte hypothesis (Table 4.4) strongly infer a result (the Gnetales-sister hypothesis) that is completely unsupported by these simulated data, as indicated by the trichotomy inferred when the real data are constrained to the anthophyte hypothesis (i.e., the model tree used for simulation; Fig. 4.6). This underlines the need for considerable caution in all inferences of seed-plant phylogeny from molecular data. It also suggests that it may be desirable in simulation studies of the error of tree inference to consider a range of branch length hypotheses for a given model tree topology, particularly if we suspect that ML inference of branch length is itself subject to strong bias (e.g., if evolution is heterotachous and the ML model is not).
My results also bring to light a more subtle problem. The simulation results suggest that ML is generally much less prone to tree misinference than MP. For example, for the three major hypotheses shown in Table 4.3, and considering the slowest rate classes (RC0-6), the first two codon positions, or the full data set, ML typically infers whatever I pose as the model tree (81-100% inference along the diagonals for the 3 x 3 matrix). The simulations therefore suggest that ML analysis of my plastid data for these three partitions of the data should be trustworthy. 
And yet, I infer different hypotheses using ML inference of the real data for these partitions (Figs. 4.3, 4.5): for RC0-6 I infer the gnetifer tree (with weak support, although I also find moderate support, 84%, for the monophyly of conifers); for the first two codon positions I infer the gnepine tree (with moderate support, 88%); and for the full data set I infer the Gnetales-sister hypothesis (with strong support). This continuing conflict among partitions of the data, in situations where simulations predict them to be effective, indicates that the simulations and tree inference methods are not capturing one or more critical aspects of the underlying (real) model. Future studies should therefore consider more complex ML models, including ones that may take account of heterotachy or other aspects of model heterogeneity across a tree (e.g., Chapter 2, and see Tuffley and Steel, 1998; Kolaczkowski and Thornton, 2004, 2008; Zhou et al., 2007).
TABLE 4.1. New primers designed for this study. 
Primer name/sequence (5’-3’)    Gene/region
B2F: CGTTCTAGTGCGTTGTAKATTC    3’-rps12
B3R: GATTGGAAATCRTGTATTTTC    3’-rps12
B4F: GTATGTACGGTTTGGAGGGAG    3’-rps12
B4R: GCATGAGTGTGAAAAAGGTTCC    3’-rps12 – rps7 IGS
B5F: CGTATTCTTAAACACGGAAAAAAATC    rps7
C5F: ACTTGCYATTCGTTGGTTATTAG    rps7
B6F: CAARAAGGAAGAGAYTCATAAAATG    rps7
B6R: CATTTTATGARTCTCTTCCTTYTTG    rps7
B7F: GGTTCTATTTCATCTCTTYAACAAG    ndhB
B7R: ATYAGRAGAAGAAATAGGCC    ndhB
C7R: GTTRAAGAGATGAAATAGAACCAAG    ndhB
8F: ACTYTATGTATTCCTCTATCCG    ndhB
B9F: TCTGGATATACCAARAGAGATGTAC    ndhB
B9R: GTACATCTCTYTTGGTATATCCAG    ndhB
C10F: TGGTCTTATMAATACACAAATG    ndhB
D10F: TTTRCAAGTTMGTTATTACGGGTAG    ndhB
D10R: CGAATCRCACTCCTTCATATAC    ndhB
B11F: GAGAATCAAACGATTATGCTCATTTTTTTATC    ndhB
B12F: GGAGCCGTGCGAGAWGAAAG    ndhB
B13R: ATRCAAGCAAAAGTTCCTAAATTC    ndhB
B14R: CACCAGAATAGATAAAGTTTTCC    ndhB
B40F: GGGTTGGTCCGGTCTATTGCTYTTTC    psbD
C40F: CTATTGCTYTTTCCTTGYGCTTATTTTGC    psbD
B83R: AAATCAAGTCCACCRCGTAGACATTC    rbcL
C91F: TTGTGAGGTACARCAATTATTAGG    atpB
B92R: TCCACYACTTTAATTCCTGTTTC    atpB
B93F: GGAAATGATCTTTAYATGGAAATG    atpB
TABLE 4.2. Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum parsimony (MP) as a search criterion. The first row in each case notes the best supported hypothesis as inferred from the real data (marked ‘X’), and its bootstrap support; the remaining rows indicate the constraint (major hypothesis) used to infer model trees used for Monte Carlo simulations, and columns indicate the fraction of trees in 1000 simulations that are inferred for the corresponding major hypothesis. 
Data partition    Tree A (Gnetales-sister)    Tree B (Gnepine)    Tree C (Gnetifer)    Tree D (Anthophyte)    Other topologies recovered
All codon positions
  Real data    X (100%)
  Tree A (Gnetales-sister)    1.000    0.000    0.000    0.000
  Tree B (Gnepine)    0.008    0.992    0.000    0.000
  Tree C (Gnetifer)    0.589    0.053    0.301    0.000    0.057
Codon positions 1 and 2
  Real data    X (96%)
  Tree A (Gnetales-sister)    0.908    0.000    0.000    0.035    0.057
  Tree B (Gnepine)    0.000    1.000    0.000    0.000
  Tree C (Gnetifer)    0.000    0.364    0.582    0.000    0.054
Codon position 3
  Real data    X (100%)
  Tree A (Gnetales-sister)    1.000    0.000    0.000    0.000
  Tree B (Gnepine)    1.000    0.000    0.000    0.000
  Tree C (Gnetifer)    1.000    0.000    0.000    0.000
Rate Classes (RC) 0-6
  Real data    X (80%)
  Tree A (Gnetales-sister)    0.993    0.000    0.000    0.007
  Tree B (Gnepine)    0.000    0.993    0.004    0.000    0.003
  Tree C (Gnetifer)    0.000    0.004    0.996    0.000
RC78
  Real data    X (100%)
  Tree A (Gnetales-sister)    1.000    0.000    0.000    0.000
  Tree B (Gnepine)    0.939    0.061    0.000    0.000
  Tree C (Gnetifer)    1.000    0.000    0.000    0.000
TABLE 4.3. Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum likelihood (ML) as a search criterion. The first row in each case notes the best supported hypothesis as inferred from the real data (marked ‘X’), and its bootstrap support; the remaining rows indicate the constraint (major hypothesis) used to infer model trees used for Monte Carlo simulations, and columns indicate the fraction of trees in 100 simulations that are inferred for the corresponding major hypothesis. 
Data partition    Tree A (Gnetales-sister)    Tree B (Gnepine)    Tree C (Gnetifer)    Tree D (Anthophyte)    Other topologies recovered
All codon positions
  Real data    X (100%)
  Tree A (Gnetales-sister)    1.00    0.00    0.00    0.00
  Tree B (Gnepine)    0.00    1.00    0.00    0.00
  Tree C (Gnetifer)    0.00    0.00    1.00    0.00
Codon positions 1 and 2
  Real data    X (88%)
  Tree A (Gnetales-sister)    0.81    0.00    0.00    0.09    0.10
  Tree B (Gnepine)    0.00    1.00    0.00    0.00
  Tree C (Gnetifer)    0.00    0.05    0.91    0.00    0.04
Codon position 3
  Real data    X (100%)
  Tree A (Gnetales-sister)    1.00    0.00    0.00    0.00
  Tree B (Gnepine)    1.00    0.00    0.00    0.00
  Tree C (Gnetifer)    1.00    0.00    0.00    0.00
Rate Classes (RC) 0-6
  Real data    X (65%)
  Tree A (Gnetales-sister)    1.00    0.00    0.00    0.00
  Tree B (Gnepine)    0.00    1.00    0.00    0.00
  Tree C (Gnetifer)    0.00    0.00    1.00    0.00
RC78
  Real data    X (100%)
  Tree A (Gnetales-sister)    1.00    0.00    0.00    0.00
  Tree B (Gnepine)    0.00    1.00    0.00    0.00
  Tree C (Gnetifer)    0.00    0.11    0.76    0.00    0.13
TABLE 4.4. Major seed-plant hypotheses inferred from simulations of various partitions of the real data constrained to the anthophyte hypothesis (Gnetales united with angiosperms). Both maximum parsimony (MP) and maximum likelihood (ML) results are shown. The rows indicate the data partition used to simulate data given the constraint tree (Tree D, anthophyte), and columns indicate the fraction of trees (MP=1000 simulations; ML=100 simulations) that are inferred for the corresponding major hypothesis. 
Data partition    Tree A (Gnetales-sister)    Tree B (Gnepine)    Tree C (Gnetifer)    Tree D (Anthophyte)    Other topologies recovered*
Maximum parsimony
  All codon positions    1.000    0.000    0.000    0.000
  Codon positions 1 and 2    0.745    0.000    0.000    0.077    0.178
  Codon position 3    1.000    0.000    0.000    0.000
  Rate Classes (RC) 0-6    0.415    0.000    0.000    0.316    0.269
  RC78    1.000    0.000    0.000    0.000
Maximum likelihood
  All codon positions    0.41    0.00    0.00    0.23    0.36
  Codon positions 1 and 2    0.29    0.00    0.00    0.30    0.41
  Codon position 3    0.30    0.00    0.00    0.31    0.39
  Rate Classes (RC) 0-6    0.31    0.00    0.00    0.39    0.30
  RC78    0.41    0.00    0.00    0.33    0.26
*The other topology recovered in every case depicts Gnetales as the sister group of all other gymnosperms.
Figure 4.1. Various seed-plant topologies proposed in the literature with regard to the position of Gnetales. [ANG = angiosperms; CON = conifers (Pinaceae + Cupressophyta); CUP = non-Pinaceae conifers (or Cupressophyta; Cantino et al., 2007); GNE = Gnetales; GYM = other gymnosperms; PIN = Pinaceae]. Asterisks denote branches that were constrained in ML searches used to generate the model trees for simulations; the other branches depict the overall arrangements for the other major seed-plant clades found in the respective constrained ML searches of the full data set.
Figure 4.2. Plastid-based phylogeny of the conifers and relatives. The tree is the one most parsimonious tree recovered (17,584 steps, CI=0.506, RI=0.616), found using coding regions from 17 plastid genes. Bootstrap values are indicated above branches.
Figure 4.3. Maximum likelihood tree (-lnL=95,771.648) found using coding regions from 17 plastid genes and including all 9 rate classes (RC0-RC8). The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches.
Figure 4.4. A. Proportion of the total nucleotides in each of two codon-position plastid data partitions (codon positions 1 and 2 vs. 
codon position 3) that belong to different rate classes; B. Proportion of the total characters in each of nine rate classes that belong to the two codon position data partitions. In each graph, RC0 represents sites that do not change, and RC8 the fastest sites. These values were estimated using the best MP tree (Fig. 4.2) and the GTR + Γ model (with model parameters determined from the data in each case). Note that two thirds of all 12590 sites considered belong to the data partition comprising codon positions 1 and 2 (8391 characters).
Figure 4.5. Maximum likelihood tree (-lnL=38,508.097) found using codon positions 1 and 2 for multiple plastid genes. The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches; values for codon positions 1 and 2 are before the slash, and numbers after the slash are maximum likelihood bootstrap values for the partitioned data that excludes the two fastest rate classes (RC0-6). The double-headed arrow (and associated bootstrap value) represents an alternative placement of Pinaceae (gnetifer hypothesis) found using the RC0-6 data.
Figure 4.6. Depiction of the zero-length branch (left side tree; large arrow) when maximum likelihood is used as the criterion for viewing the anthophyte hypothesis for coding regions from 17 plastid genes. The same constraint under the maximum parsimony criterion is displayed on the right-hand side. Each phylogram is a depiction of the same tree, which was obtained from a heuristic ML search constrained according to the anthophyte hypothesis (Fig. 4.1D), shown here with branch lengths optimized using ML (GTR + Γ + I model, with all parameters estimated using ML), and MP (ACCTRAN optimization), respectively.
Supplementary Figure. Relationships within the conifer clades presented in Figs. 4.2 and 4.3. 
The tree shown here is a portion of the tree recovered using maximum parsimony for coding regions from 17 plastid genes; the likelihood tree is identical in topology. Parsimony and likelihood bootstrap values are indicated before and after the slash, respectively.
[Figure: cladogram of the conifer genera Juniperus, Thuja, Widdringtonia, Taxodium, Metasequoia, Cunninghamia, Taxus, Torreya, Cephalotaxus, Sciadopitys, Podocarpus, Saxegothaea, Phyllocladus, Agathis, Araucaria, Abies, Cedrus, Pinus and Pseudotsuga, with family labels Pinaceae, Araucariaceae, Podocarpaceae, Taxaceae, Cupressaceae s.l., Cephalotaxaceae and Sciadopityaceae; the branch-specific bootstrap values are not recoverable from the extracted text.]
4.5 REFERENCES
ANGIOSPERM PHYLOGENY GROUP (APG II). 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141: 399-436.
BERGSTEN, J. 2005. A review of long-branch attraction. Cladistics 21: 163-193.
BOIVIN, R., M. RICHARD, D. BEAUSEIGLE, J. BOUSQUET, AND G. BELLEMARE. 1996. Phylogenetic inferences from chloroplast chlB gene sequences of Nephrolepis exaltata (Filicopsida), Ephedra altissima (Gnetopsida), and diverse land plants. Mol. Phylogenet. Evol. 6: 19-29.
BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proc. Natl. Acad. Sci. USA 97: 4092-4097.
BURLEIGH, J. G., AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Am. J. Bot. 91: 1599-1613.
BURLEIGH, J. G., AND S. MATHEWS. 2007a. Assessing among-locus variation in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 111-124.
BURLEIGH, J. G., AND S. MATHEWS. 2007b. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 125-135.
CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. 
S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822-846.
CHANG, J. T. 1996. Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Math. Biosci. 134: 189-215.
CHAW, S-M., A. ZHARKIKH, H-M. SUNG, T-C. LAU, AND W-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.
CHAW, S., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Natl. Acad. Sci. USA 97: 4086-4091.
CRANE, P. R., P. HERENDEEN, AND E. M. FRIIS. 2004. Fossils and plant phylogeny. Am. J. Bot. 91: 1683-1699.
DONOGHUE, M. J., AND J. A. DOYLE. 2000. Demise of the anthophyte hypothesis? Curr. Biol. 10: R106-R109.
DOYLE, J. A. 1996. Seed plant phylogeny and the relationships of Gnetales. Int. J. Plant Sci. 157: S3-S39.
DOYLE, J. A. 2006. Seed ferns and the origin of angiosperms. J. Torrey Bot. Soc. 133: 169-209.
DOYLE, J. A., AND M. J. DONOGHUE. 1986. Seed plant phylogeny and the origin of angiosperms: An experimental cladistic approach. Bot. Rev. 52: 321-431.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
FARJON, A. 2001. World checklist and bibliography of conifers, 2nd edition. The Bath Press, Bath, England.
FELSENSTEIN, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27: 401-410.
FELSENSTEIN, J. 1983. Parsimony in systematics: biological and statistical issues. Ann. Rev. Ecol. Syst. 14: 313-333.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783-791.
FRIIS, E. M., P. R. CRANE, K. R. 
PEDERSEN, S. BENGTSON, P. C. J. DONOGHUE, G. W. GRIMM, AND M. STAMPANONI. 2007. Phase-contrast X-ray microtomography links Cretaceous seeds with Gnetales and Bennettitales. Nature 450: 549-553.
GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A. ANTONOV, AND W. MARTIN. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Mol. Biol. Evol. 13: 383-396.
GRAHAM, S. W., J. R. KOHN, B. R. MORTON, J. E. ECKENWALDER, AND S. C. H. BARRETT. 1998. Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Syst. Biol. 47: 545-567.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GRAHAM, S. W., J. M. ZGURSKI, M. A. MCPHERSON, D. M. CHERNIAWSKI, J. M. SAARELA, V. L. BIRON, J. C. PIRES, R. G. OLMSTEAD, M. W. CHASE, AND H. S. RAI. 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Pp. 3-21. In: Monocots: comparative biology and evolution (excluding Poales) (J. T. Columbus, E. A. Friar, J. M. Porter, L. M. Prince, and M. G. Simpson, eds.). Rancho Santa Ana Botanic Garden, Claremont, CA.
GUINDON, S., AND O. GASCUEL. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
HAJIBABAEI, M., J. XIA, AND G. DROUIN. 2006. 
Seed plant phylogeny: Gnetophytes are derived conifers and a sister group to Pinaceae. Mol. Phylogenet. Evol. 40: 208-217.
HENDY, M. D., AND D. PENNY. 1989. Framework for the quantitative study of evolutionary trees. Syst. Zool. 38: 297-309.
HILL, C. R., AND P. R. CRANE. 1982. Evolutionary cladistics and the origin of the angiosperms. Pp. 269-361. In: Problems of phylogenetic reconstruction (K. A. Joysey and E. A. Friday, eds.). Academic Press, London.
HILTON, J., AND R. M. BATEMAN. 2006. Pteridosperms are the backbone of seed-plant phylogeny. J. Torrey Bot. Soc. 133: 119-168.
HUELSENBECK, J. P. 1995. Performance of phylogenetic methods in simulation. Syst. Biol. 44: 17-48.
HUELSENBECK, J. P. 1998. Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol. 47: 519-537.
KELLOGG, E. A., AND N. D. JULIANO. 1997. The structure and function of RuBisCo and their implications for systematic studies. Am. J. Bot. 84: 413-428.
KOLACZKOWSKI, B., AND J. W. THORNTON. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431: 980-984.
KOLACZKOWSKI, B., AND J. W. THORNTON. 2008. A mixed branch length model of heterotachy improves phylogenetic accuracy. Mol. Biol. Evol. 25: 1054-1066.
KOSAKOVSKY POND, S. L., S. D. W. FROST, AND S. V. MUSE. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21: 676-679.
LOCONTE, H., AND D. W. STEVENSON. 1990. Cladistics of the Spermatophyta. Brittonia 42: 197-211.
MADDISON, D. R., M. D. BAKER, AND K. A. OBER. 1999. Phylogeny of carabid beetles inferred from 18S ribosomal DNA (Coleoptera: Carabidae). Syst. Entomol. 24: 103-138.
MATHEWS, S., AND M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286: 947-950.
MCCOY, S. R., J. V. KUEHL, J. L. BOORE, AND L. A. RAUBESON. 2008. The complete plastid genome sequence of Welwitschia mirabilis: an unusually compact plastome with accelerated divergence rates. 
BMC Evol. Biol. 8: 130.
NIXON, K. C., W. L. CREPET, D. STEVENSON, AND E. M. FRIIS. 1994. A reevaluation of seed plant phylogeny. Ann. Missouri Bot. Gard. 81: 484-533.
PARKINSON, C. L., K. L. ADAMS, AND J. D. PALMER. 1999. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9: 1485-1488.
POSADA, D., AND K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817-818.
QIU, Y., J. LEE, F. BERNASCONI-QUADRONI, D. E. SOLTIS, P. S. SOLTIS, M. ZANIS, E. A. ZIMMER, Z. CHEN, V. SAVOLAINEN, AND M. W. CHASE. 2000. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int. J. Plant Sci. 161: S3-S27.
RAI, H. S., H. E. O’BRIEN, P. A. REEVES, AND S. W. GRAHAM. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359.
RAI, H. S., P. A. REEVES, R. PEAKALL, R. G. OLMSTEAD, AND S. W. GRAHAM. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 86: 659-669.
RAMBAUT, A., AND N. C. GRASSLY. 1997. SEQ-GEN: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13: 235-238.
RAMBAUT, A. 1998. Se-Al (Sequence Alignment Editor Version 1.0). Computer program and documentation. Department of Zoology, University of Oxford, UK.
RAUBESON, L. A., AND R. K. JANSEN. 1992. A rare chloroplast-DNA structural mutation is shared by all conifers. Biochem. Syst. Ecol. 20: 17-24.
ROTHWELL, G. W. 1982. New interpretation of the earliest conifers. Rev. Palaeobot. Palynol. 37: 7-28.
ROTHWELL, G. W., AND R. SERBET. 1994. Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482.
ROTHWELL, G. W., AND R. A. STOCKEY. 2002. 
Anatomically preserved Cycadeoidea (Cycadeoidaceae) with a reevaluation of systematic characters for the seed cones of Bennettitales. Am. J. Bot. 89: 1447-1452.
RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214.
SAARELA, J. M., H. S. RAI, J. A. DOYLE, P. K. ENDRESS, S. MATHEWS, A. D. MARCHANT, B. G. BRIGGS, AND S. W. GRAHAM. 2007. Hydatellaceae identified as a new branch near the base of the angiosperm phylogenetic tree. Nature 446: 312-315.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797.
SAVOLAINEN, V., M. W. CHASE, S. B. HOOT, C. M. MORTON, D. E. SOLTIS, C. BAYER, M. W. FAY, A. Y. DE BRUIJN, S. SULLIVAN, AND Y. QIU. 2000. Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49: 306-362.
SERBET, R., AND G. W. ROTHWELL. 1992. Characterizing the most primitive seed ferns. A reconstruction of Elkinsia polymorpha. Int. J. Plant Sci. 153: 602-621.
SOLTIS, P. S., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple chloroplast genes as a tool from comparative biology. Nature 402: 402-404.
SOLTIS, D. E., P. S. SOLTIS, AND M. ZANIS. 2002. Phylogeny of seed plants based on evidence from eight genes. Am. J. Bot. 89: 1670-1681.
STEFANOVIC, S., M. JAGER, J. DEUTSCH, J. BROUTIN, AND M. MASSELOT. 1998. Phylogenetic relationships of conifers inferred from partial 28S rRNA gene sequences. Am. J. Bot. 85: 688-697.
SWOFFORD, D. L., P. J. WADDELL, J. P. HUELSENBECK, P. G. FOSTER, P. O. LEWIS, AND J. S. ROGERS. 2001. 
Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 50: 525-539.
SWOFFORD, D. L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Computer program and documentation. Sinauer Associates, Sunderland, MA.
TUFFLEY, C., AND M. STEEL. 1998. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147: 63-91.
WAKASUGI, T., J. TSUDZUKI, S. ITO, K. NAKASHIMA, T. TSUDZUKI, AND M. SUGIURA. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91: 9794-9798.
WINTER, K. U., A. BECKER, T. MUNSTER, J. T. KIM, H. SAEDLER, AND G. THEISSEN. 1999. MADS-box genes reveal that gnetophytes are more closely related to conifers than to flowering plants. Proc. Natl. Acad. Sci. USA 96: 7342-7347.
ZGURSKI, J. M., H. S. RAI, Q. M. FAI, D. J. BOGLER, J. FRANCISCO-ORTEGA, AND S. W. GRAHAM. 2008. How well do we understand the overall backbone of cycad phylogeny? New insights from a large, multigene plastid data set. Mol. Phylogenet. Evol. 47: 1232-1237.
ZHOU, Y., N. RODRIGUE, N. LARTILLOT, AND H. PHILIPPE. 2007. Evaluation of the models handling heterotachy in phylogenetic inference. BMC Evol. Biol. 7: 206.
CHAPTER 5¹
INFERENCE OF DEEP VASCULAR-PLANT PHYLOGENY, WITH A FOCUS ON BACKBONE RELATIONSHIPS IN MONILOPHYTA
5.1 INTRODUCTION
The fossil record of the vascular plants (tracheophytes; Tracheophyta of Cantino et al., 2007) reaches back at least 410 Myr (Lang, 1937; Cooksonia, the earliest evidence of tracheophytes in the fossil record), representing some of the first fossil traces of land plants. Vascular plants have a dominant diploid sporophytic phase, in contrast to the other land plants (the bryophytes, with a dominant haploid phase). 
The gametophyte (haploid phase) of vascular plants is reduced but is nutritionally independent of the sporophyte plant in most major clades (the seed plants, Spermatophyta [Chapters 2-4], are one major exception). The sporophytes of extant vascular plants usually have true roots and branched stems bearing leaves and multiple sporangia. Although roots and leaves characterize almost all extant vascular plants, these clearly arose several times in parallel (Doyle, 1998), and so neither organ type constitutes a synapomorphy for the clade as a whole. A branched sporophyte is a synapomorphy for the vascular plants, or more precisely for Polysporangiophyta (a slightly larger clade that includes all the plants with branched sporophytes; Kenrick and Crane, 1997). In contrast to other land plants, the vascular plants have a continuous system of true vascular tissues (xylem and phloem) that transports nutrients and water around the plant body.
¹ A version of this chapter will be submitted for publication: RAI, H. S., AND S. W. GRAHAM. Deep vascular-plant phylogeny, with a focus on backbone relationships in Monilophyta.
Tracheophytes were traditionally divided into the seed plants and the seedless vascular plants, the latter also known as the “seed free” or “free-sporing” vascular plants, the “ferns and fern allies”, or the “pteridophytes” (“Pteridophyta”). However, it is now well-recognized that the deepest extant phylogenetic split in vascular plants is between the lycophytes and euphyllophytes (see Fig. 1.1; Raubeson and Jansen, 1992; Kranz and Huss, 1996; Kenrick and Crane, 1997; and Pryer et al., 2001, 2004; referred to, respectively, as Lycopodiophyta and Euphyllophyta in Cantino et al., 2007). 
Lycophytes have microphylls (simple, often small leaves that usually bear a single unbranched vascular trace), whereas most euphyllophytes have megaphyllous leaves (typically large leaves with a complex structure, often with multiple, branched vascular traces). These two major leaf types are not homologous. The euphyllophytes are in turn divided into two major clades (see Fig. 1.1): the seed plants (Chapters 2-4) and monilophytes (Monilophyta; Cantino et al., 2007). Both ‘monilophytes’ and ‘euphyllophytes’ unfortunately derive their names from structural concepts that either apply in part to unrelated taxa (i.e., “moniliform” steles of extinct fern clades that are not closely related to the monilophytes; Rothwell and Stockey, 2008) or to organs that are not synapomorphic for the clade (i.e., the “true leaves” or megaphylls of euphyllophytes, see below). The monilophytes, which are all seedless, comprise the whisk ferns, horsetails and many taxa traditionally thought of as “true” ferns (i.e., the two extant eusporangiate fern families, Ophioglossaceae and Marattiaceae, and the various families of leptosporangiate ferns [Polypodiopsida of Smith et al., 2006; Leptosporangiatae of Cantino et al., 2007]). Various groups of extinct and extant plants have been referred to as ferns. Classically, ferns were construed as the vascular plants that have megaphyllous leaves but which lack seeds (e.g., “Filicophyta” in Gifford and Foster, 1989). Although seedless, not all monilophytes have obvious megaphylls, perhaps by reduction (i.e., the whisk ferns, Psilotaceae, apparently lack megaphylls; extant horsetails have leaves that resemble microphylls). Most recently, however, Pryer et al. (2004) referred to the entire monilophyte clade as ferns.
This expanded phylogenetic usage of “fern” is arguably terminologically confusing, since it includes taxa that would not traditionally have been recognized as ferns (horsetails, whisk ferns). However, the classical usage is itself problematic, as the megaphyllous leaf likely arose multiple times in euphyllophyte phylogeny from simple or compound overtopped branches (e.g., Doyle, 1998; Boyce and Knoll, 2002), giving rise to apparently independently derived extant and extinct “fern” clades, some likely outside the monilophyte clade inferred from molecular data (e.g., Rothwell, 1999; Rothwell and Nixon, 2006; Rothwell and Stockey, 2008). Moreover, the seed plants are nested in a larger clade (lignophytes) that includes Archaeopteris and other megaphyllous but seedless plants (which might be considered ferns according to the morphological definition, although they are not traditionally recognized as such). Adding to the terminological confusion, many branches of early seed-plant phylogeny are informally referred to as “seed ferns” (pteridosperms; see Doyle, 2006 for a discussion of seed ferns and their relationship to angiosperms). Here I avoid use of the word fern, unless it is part of an informal name.
Considerable progress has been made at various levels of vascular-plant phylogeny. Within seed plants, for example, we now have a remarkably clear picture of most aspects of conifer phylogeny (Chapter 3). Within extant lycophytes, the two heterosporous lycophyte families (Isoëtaceae and Selaginellaceae) are recognized as a clade that is sister to the homosporous lycophytes (Lycopodiaceae; e.g. Kenrick and Crane, 1997; Wikström and Kenrick, 1997), and the phylogenetic structure of several lycophyte families is becoming clear (e.g.
Lycopodiaceae; Wikström and Kenrick, 1997, 2000; Selaginellaceae; Korall and Kenrick, 2004; Isoëtaceae; Rydin and Wikström, 2002). Within monilophytes, we now have reasonably well-resolved pictures of the major relationships within Ophioglossaceae (Wagner, 1990; Hauk et al., 2003), Marattiaceae (Murdock, 2008), Equisetaceae (Des Marais et al., 2003; Guillon, 2004, 2007) and for much of leptosporangiate fern phylogeny. Many phylogenetic studies of monilophytes have focused primarily on the leptosporangiate ferns, by far the largest branch of monilophyte phylogeny (33 families; Smith et al., 2006). These studies are largely congruent with one another (e.g., Hasebe et al., 1995; Pryer et al., 1995, 2001, 2004; Schneider et al., 2004; Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007). For example, they find well-supported relationships within the leptosporangiate ferns, including the monophyly of heterosporous ferns, tree ferns, and polypod ferns [the relative arrangement of these ‘core leptosporangiates’ has only recently been established with strong support; Schuettpelz et al. (2006)], in addition to strong support for the placement of Osmundaceae as the sister group of all other leptosporangiate ferns. Deeper aspects of the structure of vascular-plant phylogeny remain controversial (cf. Doyle, 1998; Pryer et al., 2001; Wikström and Pryer, 2005; Rothwell and Nixon, 2006), including the relationships among the five major groups of seed plants (Chapters 2-4), and among the five major lines of monilophytes (Equisetaceae, Ophioglossaceae, Psilotaceae, Marattiaceae and the leptosporangiate ferns).
Within monilophytes, there is strong molecular evidence for a sister-group relationship between only two of the five subclades (between the whisk ferns, Psilotaceae, and a family of eusporangiate ferns, Ophioglossaceae), and several aspects of the backbone of leptosporangiate fern phylogeny remain incompletely understood (Pryer et al., 2001, 2004; Schneider et al., 2004; Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007).
In this study I broadly explore monilophyte relationships and their placement within the vascular plants using 17 plastid genes; these markers and their associated noncoding regions have proven useful for various deep phylogenetic questions (Graham and Olmstead, 2000a; Rai et al., 2003, 2008; Chapters 3, 4). My study represents one of the largest molecular datasets (in terms of nucleotides sequenced per taxon) compiled for vascular plants to date, and samples multiple representatives of all living vascular plant groups, including a majority of the fern families recognized by Smith et al. (2006; Fig. 5.1). The main purpose of this study is to determine whether this expanded gene set permits robust resolution of vascular-plant phylogeny, particularly for the relationships among the major clades of monilophytes and vascular plants. I use filtering methods employed in two previous chapters (Chapters 3, 4) to assess whether removing the fastest evolving characters has a major influence on these phylogenetic reconstructions. I also investigate the potential influence of one very long branch in the inferred phylogeny (Selaginella; Selaginellaceae) on phylogenetic estimation.

5.2 MATERIALS AND METHODS
5.2.1 Taxonomic and Genomic Sampling
The final matrix considered here includes 64 representatives of the major vascular plant lineages (see Table 5.1 for a list of newly generated sequences and associated GenBank accession numbers).
Of these, four represent the major lineages of lycophytes (Selaginella uncinata and Huperzia lucidula sequences were obtained from GenBank; accession numbers AB197035 and NC_006861, respectively). I included six angiosperms to represent the basal structure of angiosperm phylogeny (see Chapter 4; and see Saarela et al., 2007), and 16 gymnosperms that are broadly representative of relationships inferred in Chapters 3 and 4. I also included four bryophyte outgroups (GenBank accession numbers for Anthoceros formosae, Marchantia polymorpha and Physcomitrella patens are NC_004543, NC_00319 and NC_005087, respectively; see Table 5.1 for Sphagnum sp.). The remaining 34 taxa represent each of the major lineages of monilophytes (Fig. 5.1, see also Fig. 1.1), including eight eusporangiate taxa (Angiopteris evecta and Psilotum nudum sequences were obtained from GenBank, accession numbers NC_008829 and NC_003386, respectively; see Table 5.1 for the remainder) and 26 representative leptosporangiate ferns (Polypodiopsida) (Adiantum capillus-veneris, GenBank accession number NC_004766; see Table 5.1 for the remainder). The latter represent each of the 7 orders and 21 of the 33 families of leptosporangiate ferns recognized by Smith et al. (2006; see Fig. 5.1 here). The leptosporangiate fern families that I have not sampled for this study are mainly limited to small families (each with one to five genera) in Polypodiales and Cyatheales (Fig. 5.1). I surveyed 17 genes and associated non-coding regions that represent approximately 10% of the monilophyte plastid genome (reference taxon = Adiantum capillus-veneris).
The regions I retrieved here are the same coding regions as in Chapters 2, 3, and 4, and the same non-coding regions as in Chapters 2 and 3, with one exception: the intergenic spacer (IGS) between rps7 and ndhB, which is present in all seed plants surveyed for this study, is apparently not present as a contiguous region in most leptosporangiate ferns because of a large inversion that involves a large portion of the inverted repeat (Raubeson and Stein, 1995; Wolf et al., 2003), precluding its recovery.
5.2.2 DNA Extraction, Amplification and Sequencing
Genomic DNA was extracted using the protocol of Doyle and Doyle (1987), as modified in Chapter 2, from fresh, silica-dried, and herbarium specimens. Amplification and sequencing follow Chapters 2, 3, and 4 (and as originally described in Graham and Olmstead, 2000a) and, with a few minor exceptions, I completely sequenced all products in both forward and reverse directions. I designed a set of 16 new fern-specific primers to facilitate amplification and sequencing (Table 5.2). The new data were added to a previously generated alignment (Chapter 3) using alignment criteria outlined in Graham et al. (2000) and Chapters 2, 3, 4. Regions in some taxa that I could not amplify or sequence (see Table 5.1) were coded as missing data in the final matrix. The aligned regions considered for analysis include all of the coding regions and unambiguously aligned non-coding regions from two introns (rpl2 and ndhB) and seven intergenic spacer regions (3'-rps12-rps7, and three each in the psbE-psbF-psbL-psbJ and psbB-psbT-psbN-psbH clusters). The final aligned matrix includes 36,139 bp per taxon (corresponding to 11,726 bp unaligned in Vandenboschia davallioides, for example). Of these aligned characters, 2573 bp are variable but uninformative across vascular plants, and 7479 bp are parsimony informative.
5.2.3 Phylogenetic Analyses
All phylogenetic analyses were conducted using the full 64-taxon matrix.
I performed an heuristic maximum parsimony (MP) search using PAUP* (ver. 4.0b10; Swofford, 2003), with all characters and character state changes equally weighted, and using TBR (tree-bisection-reconnection) branch swapping, with 100 random addition replicates, and otherwise using default settings. I also performed a maximum likelihood (ML) heuristic search using PhyML (ver. 2.4.4; Guindon and Gascuel, 2003), with a BIONJ starting tree, NNI (nearest-neighbour-interchange) branch swapping and model parameters estimated from the data in each case. I chose a DNA substitution model for ML analysis using the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC) in Modeltest ver. 3.7 (Posada and Crandall, 1998). The optimal model chosen in each case was GTR + Γ + I [general-time-reversible (GTR) rate matrix with proportion of invariable sites (I) considered, and among-site rate variation accounted for using the gamma (Γ) distribution as described by the shape parameter alpha (α)]. I performed non-parametric bootstrap analysis (Felsenstein, 1985) using the same search criteria, with 100 bootstrap replicates (and a single random addition replicate per parsimony bootstrap replicate). I use ‘weak,’ ‘moderate,’ and ‘strong’ in reference to clades that have bootstrap support values < 70%, 70-89%, and ≥ 90%, respectively (e.g., Chapters 2, 3, 4 and Graham et al., 1998).
5.2.4 Inference of Nucleotide Rate Classes and Exploration of the Effect of Long Branches
I partitioned the data into nine rate classes using HyPhy (Kosakovsky Pond et al., 2005; and see Chapter 4). I used the best MP tree topology (above) to assign each of the 36,139 aligned nucleotide sites to its most likely individual rate category (using the GTR model and eight discrete rate classes).
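Two conventions from the methods above lend themselves to a short illustration: the bootstrap support categories (weak, moderate, strong) and the selection of alignment sites by rate class. The following is a minimal Python sketch only; it is not part of the analyses reported here, which were run in PAUP*, PhyML and HyPhy. The function names and the toy rate-class list are hypothetical, and integer bootstrap percentages are assumed (as obtained from 100 replicates).

```python
from typing import List, Sequence


def support_category(bootstrap_percent: float) -> str:
    """Map a bootstrap percentage to the categories used in this chapter.

    weak: < 70%, moderate: 70-89%, strong: >= 90%.
    Values between 89 and 90 are treated as moderate (an assumption; with
    100 replicates the percentages are whole numbers anyway).
    """
    if not 0 <= bootstrap_percent <= 100:
        raise ValueError("bootstrap support must lie between 0 and 100")
    if bootstrap_percent >= 90:
        return "strong"
    if bootstrap_percent >= 70:
        return "moderate"
    return "weak"


def retained_sites(rate_classes: Sequence[int],
                   excluded: Sequence[int] = (7, 8)) -> List[int]:
    """Return 0-based indices of sites whose rate class is not excluded.

    rate_classes[i] is the rate class (0 = unchanged ... 8 = fastest)
    assigned to alignment column i. The default excludes the two fastest
    classes, RC7 and RC8.
    """
    drop = set(excluded)
    return [i for i, rc in enumerate(rate_classes) if rc not in drop]


if __name__ == "__main__":
    # Toy data (hypothetical): six alignment columns with assigned classes.
    toy_classes = [0, 8, 3, 7, 6, 0]
    print(retained_sites(toy_classes))
    print(support_category(82))
```

Keeping the class labels separate from the alignment means the same assignment can be reused to build any filtered subset (for example RC0-6) without re-estimating rates.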
26,087 of the sites were assigned to RC0, representing the total number of sites with no change across this data set. The fastest rate class (RC8) included 2075 sites. After excluding the two fastest rate classes (RC7 and RC8), I performed a maximum-likelihood heuristic search using the search criteria outlined above. It is apparent that the sequence for Selaginella is quite divergent when compared to the rest of the vascular plants (see Results; this was also evident from a visual examination of the DNA sequence alignment). I recalculated the rate class partitioning using a new MP tree (found using PAUP*), after the removal of Selaginella. I then performed an ML search (using the GTR + Γ + I model and search criteria outlined above, with parameters estimated from the data) of the seven slowest rate class partitions (RC0-6). I also performed MP and ML analyses of the full (unfiltered) data set after excluding various subsets (and in one case all) of the four bryophyte outgroup taxa to examine their possible effect on ingroup (vascular plant) relationships.

5.3 RESULTS
5.3.1 Phylogenetic Analyses
Maximum parsimony (MP) and maximum likelihood (ML) analyses of the full data set (i.e., all sites, including noncoding regions) produce generally similar trees, with a majority of backbone branches strongly supported by bootstrap analysis. This includes strong bootstrap support (94% MP and 100% ML) for the monophyly of monilophytes (Figs. 5.2 and 5.3). I also performed MP and ML analyses of the coding regions only (e.g., Chapter 4) and found that trees were topologically very similar (not shown here; exceptions include branches in the analyses of the full data set with <50% bootstrap support; these tended to vary in their relative arrangement of constituent clades in MP vs. ML analysis of the full data set). Because the results of the analyses involving the full data set vs. the coding regions are broadly comparable, I discuss only the latter in detail here.
There are several substantial conflicts between MP and ML analysis. The MP analyses provide strong support for lycophytes as the sister group of seed plants (94% support; Fig. 5.2), whereas the ML analysis provides moderate support for them as the sister group of all other vascular plants (81% support; Fig. 5.3). Within the seed plants, MP trees recover Gnetales as the sister group of all other seed plants, while Gnetales are associated with conifers in ML analysis (with 100% and 73% bootstrap support, respectively; see Chapter 4 for a more thorough examination of seed-plant relationships). Within the monilophytes, the leptosporangiate ferns are strongly supported as monophyletic (100% MP and ML support; Figs. 5.2, 5.3). Both MP and ML strongly unite the eusporangiate fern lineage Ophioglossaceae with whisk ferns (Psilotaceae), and each family is strongly supported as monophyletic (Figs. 5.2, 5.3). The other eusporangiate fern lineage, Marattiaceae, is also strongly supported as monophyletic (99% and 100% bootstrap support; Figs. 5.2, 5.3), although MP is unable to robustly resolve the exact relationship of these two clades relative to the leptosporangiate ferns (Fig. 5.2). Maximum likelihood provides moderate support for Marattiaceae as the sister group of the leptosporangiate ferns (82% support; Fig. 5.3). ML and MP analyses both resolve Equisetum as the sister group of all other monilophytes, with moderate to poor support (74% and 58% support from MP and ML analysis, respectively). In the leptosporangiate ferns, Leptopteris (Osmundaceae) is strongly supported as the sister group of the remaining taxa, followed by moderately well supported splits between Hymenophyllaceae (=Hymenophyllales) and the rest, and then between gleichenioid ferns, Gleicheniales (a strongly supported clade that includes Dipteridaceae, Matoniaceae, and Gleicheniaceae; 92% support from both MP and ML; Figs. 5.2, 5.3) and the remainder.
The schizaeoid ferns, Schizaeales (represented here by Lygodium and Schizaea; the monophyly of which is well supported) are strongly supported as the sister group of a clade that comprises the heterosporous water ferns, Salviniales (the monophyly of which is also strongly supported; 100% and 99% support from MP and ML analysis, respectively), the tree ferns, Cyatheales (monophyly strongly supported; 100% support from both MP and ML analysis) and the polypod ferns, Polypodiales (monophyly strongly supported; 100%). Within this large clade, a sister group relationship between Cyatheales and Polypodiales is also moderately well supported (76% vs. 89% from MP and ML analysis, respectively); consequently, Salviniales are moderately well supported as the sister group of Cyatheales-Polypodiales. At the current taxon sampling, several families with more than one representative are strongly supported as monophyletic. Hymenophyllaceae (=Hymenophyllales, represented here by Hymenophyllum and Vandenboschia) have 100% bootstrap support from both MP and ML. The monophyly of both Dipteridaceae (represented here by Dipteris and Cheiropleuria) and Lindsaeaceae (represented here by Lindsaea and Lonchitis) is strongly supported by MP bootstrap (100% and 97% support respectively), although ML bootstrap analysis only weakly recovers Lindsaeaceae (67%; Fig. 5.3). Within Pteridaceae, both maximum parsimony and maximum likelihood recover strong support for a sister-group relationship between Adiantum and Vittaria (100% and 97% bootstrap support from MP and ML respectively), with Ceratopteris strongly supported as the sister group of both, since the monophyly of the family as a whole is also well supported (Figs. 5.2, 5.3). As noted above, the monophyly of each of the orders defined by Smith et al.
(2006) is well supported here, at the current taxon sampling: Hymenophyllales, Gleicheniales, Schizaeales, Salviniales, Cyatheales and Polypodiales (only one member of Osmundales, Leptopteris, was sampled here). Within Gleicheniales, Dicranopteris (Gleicheniaceae) is strongly supported as the sister group of a clade consisting of Matoniaceae and Dipteridaceae (97% and 100% support from MP and ML, respectively; Figs. 5.2, 5.3). Within Cyatheales, MP and ML analyses resolve Plagiogyria as the sister group of a strongly supported clade that includes Cyathea and Dicksonia (99% and 96% bootstrap support from MP and ML, respectively). MP and ML analyses do not robustly resolve all of the basal relationships within the polypod ferns; it is not clear whether Saccolomataceae or Lindsaeaceae (or both) are the sister group of the clade that includes most of the living diversity of the leptosporangiate ferns (represented here by exemplar taxa from Pteridaceae, Dennstaedtiaceae, Polypodiaceae, Dryopteridaceae, Aspleniaceae, Thelypteridaceae and Blechnaceae; Table 5.1, Fig. 5.1). Maximum parsimony bootstrap analysis weakly suggests that Pteridaceae is the sister group of Dennstaedtiaceae and the remaining polypod ferns (67%; Fig. 5.2), but this arrangement is not recovered in the best ML tree (Fig. 5.3) and is only very weakly supported by ML bootstrap analysis (54%; value recovered from the bootstrap majority-rule consensus tree). Three other clades comprising multiple families of polypods are moderately to strongly supported by MP and ML bootstrap analysis: a clade consisting of Dryopteridaceae and Polypodiaceae; a clade consisting of Aspleniaceae, Blechnaceae and Thelypteridaceae; and a clade consisting of Blechnaceae and Thelypteridaceae.
5.3.2 Rate Class Analyses
When I analyzed the seven moderately evolving rate classes (RC0-6; i.e., excluding the fastest two rate classes) using ML, the ML bootstrap support for several nodes was substantially reduced when compared to analyses that include all of the rate classes (e.g. several major backbone nodes in Fig. 5.3, cf. left-hand bootstrap values in Fig. 5.4). Relationships that are now poorly supported by bootstrap analysis of the RC0-6 data include basal splits within the monilophytes (e.g., the sister group of leptosporangiate ferns) and basal splits within leptosporangiate ferns (e.g., whether Hymenophyllaceae is the sister group of all leptosporangiate ferns except Osmundaceae; whether Gleicheniales are monophyletic). The RC0-6 ML tree (Fig. 5.4) is compatible with the ML tree from the full data set (Fig. 5.3) concerning overall vascular-plant relationships (e.g., lycophytes as the sister group of all others, and Gnetales associated with conifers), but relationships are often weaker (e.g., support for Marattiaceae as sister to leptosporangiate ferns decreases from 82% to <50%). With respect to monilophyte relationships, the tree recovered after removal of the fastest sites (Fig. 5.4) is generally topologically congruent with both ML and MP trees found using the full data set, but with weak to moderate disagreement concerning the relative arrangements of Aspleniaceae, Blechnaceae and Thelypteridaceae (cf. Figs. 5.2-5.4). Other differences between MP and filtered and unfiltered ML analyses concern relationships outside of the monilophytes, particularly the placement of Gnetales in the seed plants (seed-plant relationships are discussed in detail in Chapter 4).
The reduced ML bootstrap values observed in the RC0-6 analyses may not simply be a function of a reduced amount of data, since when I excluded Selaginella prior to assignment of rate classes and then re-analysed the data without this taxon, I found improved ML bootstrap support for several major branches (cf. the left-hand and right-hand RC0-6 ML bootstrap values in Fig. 5.4). The major clades with improved support with Selaginella deleted in this way are the euphyllophytes as a whole (80% vs. 52%), the monilophytes as a whole (100% vs. 68%), a possible sister-group relationship between Equisetum and other monilophytes (71% vs. <50%) and leptosporangiate ferns as a whole (100% vs. 71%). The inclusion (or exclusion) of the various bryophyte outgroups for the full data set resulted in several weakly to moderately supported placements of Equisetum within the monilophytes (Fig. 5.5). For example, when all bryophyte taxa were removed prior to ML bootstrap analysis, Equisetum was found to be weakly supported as the sister group of the leptosporangiate ferns (59%; Fig. 5.5), whereas using only a single bryophyte outgroup (Anthoceros) places Equisetum weakly as the sister group of Marattiaceae and leptosporangiate ferns (65%; Fig. 5.5).

5.4 DISCUSSION
Overall, the results of this large multigene plastid survey are largely congruent with several recent studies regarding the resolution of higher-order relationships in leptosporangiate ferns and overall vascular-plant phylogeny (Pryer et al., 2001, 2004; Schuettpelz et al., 2006). As in these previous studies, I find strong support for the placement of lycophytes as the sister-group of all other living vascular plants (except, curiously, in MP analysis, where they are strongly supported as the sister group of seed plants) and for a clade that includes all other extant seedless vascular plants (the monilophytes).
In addition to this general congruence, I observe improved support for several clades that were previously only weakly to moderately supported by maximum likelihood or maximum parsimony bootstrap in recent one- to few-gene studies (e.g., 89% ML support for the clade consisting of Cyatheales (tree ferns) and Polypodiales; moderate to strong ML bootstrap support along the rest of the main backbone of leptosporangiate ferns; Fig. 5.3). Some recent studies (Pryer et al., 2004; Schuettpelz et al., 2006; Smith et al., 2006) have suggested that Psilotopsida (Psilotaceae and Ophioglossaceae) is the sister group of all other monilophytes. In contrast, I generally find Equisetum to be the sister group of all other monilophytes, with weak to moderate support from ML analysis (Figs. 5.3, 5.4), although this is sensitive to reductions in outgroup sampling (Fig. 5.5). This result is somewhat less at odds with a placement of Equisetum outside the monilophyte clade, as seen in a morphological study (Rothwell and Nixon, 2006), and is less problematic from a morphological perspective than a placement of horsetails as nested more deeply within this clade (Gar Rothwell, Ohio University, pers. comm.). I also find moderate support from ML analysis of the full data set for Marattiaceae as the sister-group of leptosporangiate ferns (82% ML bootstrap support; Fig. 5.3), an arrangement consistent with Schuettpelz et al. (2006). Along the main backbone of leptosporangiate ferns, the exact nature of the relationship between Hymenophyllaceae and the gleichenioid ferns (Gleicheniales) was equivocal or poorly supported in the most recent classification of monilophytes (Smith et al., 2006; Fig. 5.1) and the earlier phylogenetic study of Pryer et al. (2004). My results indicate that Osmundaceae, Hymenophyllaceae, and the gleichenioid ferns comprise a basal grade of leptosporangiate ferns, with relative arrangements that are consistent with the five-gene data set of Schuettpelz et al.
(2006) and the taxonomically dense three plastid-gene data set of Schuettpelz and Pryer (2007). Unlike several recent studies (Burleigh and Mathews, 2004, 2007; Chapters 3, 4), I found that removing the most rapidly evolving characters prior to analysis generally had little effect on relationships among major vascular-plant groups and also within the monilophytes, except to reduce ML bootstrap support values for some key nodes. In contrast, removing these fast rate classes and additionally deleting Selaginella from analyses resulted in a dramatic increase of bootstrap values, especially of basal monilophyte nodes (Fig. 5.4). This suggests that this bootstrap reduction is not just a function of a more limited character sampling, but that long branches in lycophytes can have a negative impact on phylogenetic inference within euphyllophytes.
The current data set provides further resolution and support for inferences of deep vascular-plant relationship. In particular it corroborates other recent studies of fern phylogeny, with a taxon and gene sampling that is very different from these studies. It also offers new insights into relationships along the backbone of leptosporangiate ferns, including increased support for the earliest evolutionary splits within this incredibly diverse group of vascular plants. This clarified scaffold of monilophyte phylogeny should be of value to researchers investigating fern plastid genomes and their rearrangements, morphological character evolution, and to systematists focusing on individual lineages within the monilophytes.
Although my results are generally highly congruent with several recent studies regarding overall monilophyte relationships, I would urge some caution regarding the inference of basal monilophyte relationships. I have highlighted a problematic vascular plant lineage (Selaginella) with a relatively long branch.
Long branches have posed rather severe problems for the phylogenetic inference various plant groups (e.g., seed plants; Chapter 4) and my work here demonstrates that the potential for similar problems when investigating monilophyte relationship exists as well (Fig. 5.4, 5.5). Although I have sampled extensively across the backbone of the vascular plant tree, greater taxonomic density, especially within \u00E2\u0080\u009Cbasal\u00E2\u0080\u009D lineages of the major groups may usefully help to break-up at least some of the long branches present in the current data set (e.g. Hippochaete and additional Equisetum species). Concerning euphyllophyte relationships as a whole, the addition of new data, and especially the incorporation of morphological data from the numerous known crown monilophyte fossil taxa and various euphyllophyte stem taxa with data from extant lineages, may hold the key to further improving our knowledge of euphyllophyte and ultimately vascular-plant relationships as a whole. 140 TABLE 5.1. GenBank accession numbers and vouchers for exemplar pteridophyte (and outgroup) taxa. Gene or region Taxon atpB ndhF psbB, T, psbD & C psbE, F, L rbcL rpl2 3'-rps12, ndhB authority N, & psbH & psbJ rps7 (voucher, herbarium) BRYOPHYTES Sphagnum sp. EU352260 EU349580 EU552803 EU328217 EU558386 EU352288 EU558420 EU558470 EU558449 L. (C. Lafarge Cedar Swamp 28-07-02 s.n.) LYCOPHYTES Isoetaceae Iso\u00C3\u00ABtes sp. EU352261 EU349581 EU552804 EU328218 EU558387 EU352289 EU558421 EU558471 EU558450 L. (H. Rai 1005, ALTA) Lycopodiaceae Lycopodium annotinum EU352262 EU349582 EU552805 EU328219 EU558388 EU352290 EU558422 EU558472 EU558451 L. (H.S. Rai and J.M. Zgurski 14-09-02-13, ALTA) EQUISETOPSIDA Equisetaceae Equisetum x ferrissii EU352264 EU349584 EU552807 EU328221 EU558390 EU352292 EU558424 EU558474 EU558452 Clute (P. 
Hammond s.n., UC) 141 Gene or region Taxon atpB ndhF psbB, T, psbD & C psbE, F, L rbcL rpl2 3'-rps12, ndhB authority N, & psbH & psbJ rps7 (voucher, herbarium) PSILOTOPSIDA Ophioglossaceae Helminthostachys zeylanica EU352265 EU349585 EU552808 EU328222 EU558391 EU352293 EU558425 EU558475 EU558453 (L.) Hook. (NYBG 233/84) Ophioglossum reticulatum (U93825) a n/a EU552810 EU328224 EU558393 (AF313582)a EU558427 EU558477 EU558455 L. (R. Moran 5644, MO) Psilotaceae Tmesipteris elongata EU352266 EU349587 EU552811 EU328225 EU558394 EU352294 EU558428 EU558478 EU558456 P. A. Dang. (A. R. Smith 2607, UC) MARATTIOPSIDA Marattiaceae Danaea elliptica EU352263 EU349583 EU552806 EU328220 EU558389 EU352291 EU558423 EU558473 n/a Sm. (J. Sharpe s.n., UC) Marattia attenuata (AF313546) a EU349586 EU552809 EU328223 EU558392 (AF313581) a EU558426 EU558476 EU558454 Labill. (R. Schmid s.n., UC) 142 Gene or region Taxon atpB ndhF psbB, T, psbD & C psbE, F, L rbcL rpl2 3'-rps12, ndhB authority N, & psbH & psbJ rps7 (voucher, herbarium) POLYPODIOPSIDA Aspleniaceae Asplenium viride EU352267 EU349588 EU552812 EU328226 EU558395 EU352295 n/a EU558479 EU558457 Huds. (H.S. Rai and J.M. Zgurski 14-09-02-12, ALTA) Blechnaceae Blechnum occidentale EU352268 EU349589 EU552813 EU328227 EU558396 EU352296 EU558429 EU558480 EU558458 L. (Wolf 289, UTC) Cyatheaceae Cyathea klossii EU352271 n/a EU552816 EU328230 EU558399 EU352299 EU558432 EU558483 n/a Ridl. (Johns 9728, KEW) Dennstaedtiaceae Dennstaedtia punctilobula (U93836) a EU349592 EU552817 EU328231 EU558400 EU352300 EU558433 EU558484 n/a (Michx.) T. Moore (H.H. Schmidt, M.W.R Eddy & E.C. Rempala 1533, MO) Dicksoniaceae Dicksonia Antarctica (U93829) a n/a EU552818 EU328232 EU558401 EU352301 EU558434 EU558485 n/a Labill. (H.S. 
Rai 1015, ALTA) 143 Gene or region Taxon atpB ndhF psbB, T, psbD & C psbE, F, L rbcL rpl2 3'-rps12, ndhB authority N, & psbH & psbJ rps7 (voucher, herbarium) Dipteridaceae Cheiropleuria integrifolia EU352270 EU349591 EU552815 EU328229 EU558398 EU352298 EU558431 EU558482 n/a (D.C. Eaton ex Hook.) M. Kato, Y. Yatabe, Sahashi & N. Murak. (Yokoyama 27619, TI) Dipteris conjugata (AF612696) a n/a EU552820 EU328234 EU558403 EU352303 EU558436 EU558487 n/a Reinw. (J. Game 98/106, UC) Dryopteridaceae Dryopteris filix-mas EU352273 EU349594 EU552821 EU328235 EU558404 (AY268845) a EU558437 EU558488 EU558461/ (L.) Schott EU558462 (H.S. Rai and J.M. Zgurski 14-09-02-8, ALTA) Gleicheniaceae Dicranopteris linearis EU352272 EU349593 EU552819 EU328233 EU558402 EU352302 EU558435 EU558486 EU558460 (Burm f.) Underw. (J. Game 98/105A, UC) Hymenophyllaceae Hymenophyllum hirsutum EU352274 EU349595 EU552822 EU328236 EU558405 (AF275645) a EU558438 EU558489 n/a (L.) Sw. (M. Kessler 11596, UC) Vandenboschia davallioides (U93828) a EU349606 EU552835 EU328249 EU558418 EU352314 EU558447 EU558502 EU558469 Copel. (Wolf 248, UTC) 144 Gene or region Taxon atpB ndhF psbB, T, psbD & C psbE, F, L rbcL rpl2 3'-rps12, ndhB authority N, & psbH & psbJ rps7 (voucher, herbarium) Lindsaeaceae Lindsaea rufa EU352276 EU349597 EU552824 EU328238 EU558407 EU352304 EU558439 EU558491 EU558464 K.U. Kramer (G. McPherson & J. Munzinger 18124, MO) Lonchitis hirsuta EU352277 EU349598 EU552825 EU328239 EU558408 EU352305 EU558440 EU558492 n/a L. (F. Axelrod 9601, UTC) Lygodiaceae Lygodium japonicum EU352278 EU349599 EU552826 EU328240 EU558409 (L13479) a EU558441 EU558493 EU558465 (Thunb.) Sw. (H. S. Rai 1013, ALTA) Marsiliaceae Marsilea drummondii EU352279 EU349600 EU552827 EU328241 EU558410 EU352306 EU558442 EU558494 n/a A. Braun (J. Zgurski 78, ALTA) Matoniaceae Matonia pectinata EU352280 EU349601 EU552828 EU328242 EU558411 EU352307 n/a EU558495 EU558466 R. Br. (E. 
Schuettpelz 752, DUKE)
Osmundaceae
Leptopteris wilkesiana | EU352275 | EU349596 | EU552823 | EU328237 | EU558406 | (AY612678) a | n/a | EU558490 | EU558463 | H. Christ (J. Game 95/035, no voucher)
Plagiogyriaceae
Plagiogyria japonica | EU352281 | EU349602 | EU552829 | EU328243 | EU558412 | EU352308 | EU558443 | EU558496 | n/a | Nakai (M. Hasebe 27614, TI)
Polypodiaceae
Polypodium hesperium | EU352282 | EU349603 | EU552830 | EU328244 | EU558413 | EU352309 | EU558444 | EU558497 | EU558467 | Maxon (H.S. Rai and J.M. Zgurski 14-09-02-2, ALTA)
Pteridaceae
Ceratopteris richardii | EU352269 | EU349590 | EU552814 | EU328228 | EU558397 | EU352297 | EU558430 | EU558481 | EU558459 | Brongn. (P. Killip 44595, GH)
Vittaria volkensii | EU352287 | n/a | EU552836 | EU328250 | EU558419 | EU352315 | EU558448 | EU558503 | n/a | Hieron. (E.T. Africa, Cherangani Tweedia 2708, KEW)
Saccolomataceae
Saccoloma inaequale | EU352283 | EU349604 | EU552831 | EU328245 | EU558414 | EU352310 | n/a | EU558498 | n/a | (Kunze) Mett. (372076, DUKE)
Salviniaceae
Salvinia sp. | EU352284 | n/a | EU552832 | EU328246 | EU558415 | EU352311 | EU558445 | EU558499 | EU558468 | Ség. (H.S. Rai 1023, UBC)
Schizaeaceae
Schizaea dichotoma | EU352285 | EU349605 | EU552833 | EU328247 | EU558416 | EU352312 | n/a | EU558500 | n/a | (L.) J. Sm. (S.W. Graham 02-03-36B s.n.)
Thelypteridaceae
Thelypteris reticulata | EU352286 | n/a | EU552834 | EU328248 | EU558417 | EU352313 | EU558446 | EU558501 | n/a | (L.) Proctor (J.S. Miller & M. C. Merello 8864, MO)
a Previously published sequences. Accessions in brackets were produced by other workers; see Rai et al. (2003), Graham and Olmstead (2000a, b) and Graham et al. (2000) for a complete list of taxa and accession numbers for other sequences employed in phylogenetic analyses here.
Table 5.2. New primers designed for this study. 
Primer name | Sequence (5'-3') | Gene/region
F1F [1] | CCATAATTTRCARGAACATTC | 3'-rps12
L1F [2] | GAGRTAACRGCTTACATAC | 3'-rps12
L2F | AAACAACTTGGTGTCYAAGG | 3'-rps12
L2R | CTTAGACACCAAGTTGTTTC | 3'-rps12
L4F | TGGAAAGCTGTATTCGATG | 3'-rps12-rps7 IGS
L4R | TCATCGAATACAGCTTTCC | 3'-rps12
L5F | GATCCAATTTATCGTAATCG | rps7
L5R | GATTACGATAAATTGGATC | 3'-rps12-rps7 IGS
F9F | TTATGGGTGGARCAAGTTC | ndhB
F9R | TAGAAGAACTTGYTCCACC | ndhB
F13F | GAAACGTATGCTTGCATATTC | ndhB
F13R | GAATATGCAAGCATACGTTTC | ndhB
F20F | ATATCGTSAAATWGATTTTCG | rpl2
F24R | ATCTCTTCCCRAACTGTAC | rpl2
F41F | GGTCCTGARGCACARGG | psbD
F45R | CATTAAAGAGCGTTTCCAC | psbD
[1] The prefix ‘F’ indicates a primer designed to work across all ferns.
[2] The prefix ‘L’ indicates a primer designed to work specifically in leptosporangiate ferns.
Figure 5.1. The consensus tree presented in Smith et al. (2006) based on recent and ongoing phylogenetic studies, redrawn to highlight the taxonomic sampling presented in this study (clades in red are represented by at least one taxon in the current data set). All resolved nodes have ≥ 70% bootstrap support (unless otherwise noted above branches) from at least one of the phylogenetic studies used to create the consensus tree. The classification proposed by Smith et al. (2006) is presented to the right.
Figure 5.2. Plastid-based phylogeny of the vascular plants. The tree is the most parsimonious tree recovered (47,342 steps, CI = 0.338, RI = 0.58) using 17 chloroplast genes and associated noncoding regions. Bootstrap values are indicated above branches. Labels within open circles denote leptosporangiate fern family names, as classified by Smith et al. 
(2006): Os – Osmundaceae; Hy – Hymenophyllaceae; Gl – Gleicheniaceae; Mt – Matoniaceae; Di – Dipteridaceae; Ly – Lygodiaceae; Sc – Schizaeaceae; Ma – Marsileaceae; Sl – Salviniaceae; Pl – Plagiogyriaceae; Cy – Cyatheaceae; Di – Dicksoniaceae; Li – Lindsaeaceae; Sa – Saccolomataceae; Pt – Pteridaceae; De – Dennstaedtiaceae; Dr – Dryopteridaceae; Po – Polypodiaceae; As – Aspleniaceae; Bl – Blechnaceae; Th – Thelypteridaceae.
Figure 5.3. Maximum likelihood tree (-lnL = 235,448.293) found using 17 plastid genes and associated noncoding regions. This analysis includes all nine rate classes (RC0-RC8). The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches.
Figure 5.4. Maximum likelihood tree (-lnL = 112,651.990) found using 17 plastid genes and associated noncoding regions, excluding the two fastest rate classes (RC7 and RC8) and including or excluding Selaginella. The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches; ‘*’ indicates 100% bootstrap support. The values before the slash represent an ML analysis of RC0-6 that includes Selaginella, and the numbers after the slash denote an ML analysis of the same data, with Selaginella removed (prior to a new assignment of rate classes by HyPhy). A ‘-’ indicates an inapplicable node (because of the removal of Selaginella).
Figure 5.5. 
Placement of Equisetum from various taxon-exclusion analyses for the plastid data considered here. Various clades have been collapsed for clarity. Relevant maximum likelihood bootstrap values for each alternative are above branches (above and below the letter indicating each alternative on the vertical branch). A. Placement of Equisetum when no bryophyte representatives are included. B. Placement of Equisetum when a single bryophyte (Anthoceros) is used to root the tree. C. Placement of Equisetum when Selaginella is removed prior to analysis of RC0-6 (rate classes were recalculated after the removal of Selaginella; see Fig. 5.4; Equisetum is also found here in MP and ML analyses of the full data set; Figs. 5.2, 5.3).
5.5 REFERENCES
BOYCE, K. C. AND A. H. KNOLL. 2002. Evolution of developmental potential and the multiple independent origins of leaves in Paleozoic vascular plants. Paleobiol. 28: 70-100.
BURLEIGH, J. G. AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Am. J. Bot. 91: 1599-1613.
BURLEIGH, J. G. AND S. MATHEWS. 2007. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 125-135.
CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822-846.
DES MARAIS, D. L., A. R. SMITH, D. M. BRITTON, AND K. M. PRYER. 2003. Phylogenetic relationships and evolution of extant horsetails, Equisetum, based on chloroplast DNA sequence data (rbcL and trnL-F). Int. J. Plant Sci. 164: 737-751.
DOYLE, J. A. 1998. Phylogeny of vascular plants. Ann. Rev. Ecol. Syst. 29: 567-599.
DOYLE, J. A. 2006. Seed ferns and the origin of angiosperms. J. Torrey Bot. Soc. 133: 169-209.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15. 
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783-791.
GIFFORD, E. M. AND A. S. FOSTER. 1989. Morphology and evolution of vascular plants. 3rd ed. W.H. Freeman, New York, NY.
GRAHAM, S. W., J. R. KOHN, B. R. MORTON, J. E. ECKENWALDER, AND S. C. H. BARRETT. 1998. Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Syst. Biol. 47: 545-567.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GUILLON, J. M. 2004. Phylogeny of horsetails (Equisetum) based on the chloroplast rps4 gene and adjacent noncoding sequences. Syst. Bot. 29: 251-259.
GUILLON, J. M. 2007. Molecular phylogeny of horsetails (Equisetum) including chloroplast atpB sequences. J. Plant Res. 120: 569-574.
GUINDON, S. AND O. GASCUEL. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
HASEBE, M., P. G. WOLF, K. M. PRYER, K. UEDA, M. ITO, R. SANO, G. J. GASTONY, J. YOKOYAMA, J. R. MANHART, N. MURAKAMI, E. H. CRANE, C. H. HAUFLER, AND W. D. HAUK. 1995. Fern phylogeny based on rbcL nucleotide sequences. Am. Fern J. 85: 134-181.
HAUK, W. D., C. R. PARKS, AND M. W. CHASE. 2003. Phylogenetic studies of Ophioglossaceae: evidence from rbcL and trnL-F plastid DNA sequences and morphology. Mol. Phylogenet. Evol. 28: 131-151.
KENRICK, P. AND P. R. CRANE. 1997. 
The origin and early diversification of land plants: a cladistic study. Smithsonian Press, Washington, D.C., USA.
KORALL, P. AND P. KENRICK. 2004. The phylogenetic history of Selaginellaceae based on DNA sequences from the plastid and nucleus: extreme substitution rates and rate heterogeneity. Mol. Phylogenet. Evol. 31: 852-864.
KOSAKOVSKY POND, S. L., S. D. W. FROST, AND S. V. MUSE. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21: 676-679.
KRANZ, H. D., AND V. A. R. HUSS. 1996. Molecular evolution of pteridophytes and their relationship to seed plants: evidence from complete 18S rRNA gene sequences. Plant Syst. Evol. 202: 1-11.
LANG, W. H. 1937. On the plant remains from the Downtonian of England and Wales. Phil. Trans. R. Soc. 227B: 245-291.
MURDOCK, A. G. 2008. Phylogeny of marattioid ferns (Marattiaceae): Inferring a root in the absence of a closely related outgroup. Am. J. Bot. 95: 626-641.
NICKRENT, D. L., C. L. PARKINSON, J. D. PALMER, AND R. J. DUFF. 2000. Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol. Biol. Evol. 17: 1885-1895.
POSADA, D., AND K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817-818.
PRYER, K. M., A. R. SMITH, AND J. E. SKOG. 1995. Phylogenetic relationships of extant ferns based on evidence from morphology and rbcL sequences. Am. Fern J. 85: 205-282.
PRYER, K. M., H. SCHNEIDER, A. R. SMITH, R. CRANFILL, P. G. WOLF, J. S. HUNT, AND S. D. SIPES. 2001. Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. Nature 409: 618-622.
PRYER, K. M., E. SCHUETTPELZ, P. G. WOLF, H. SCHNEIDER, A. R. SMITH, AND R. CRANFILL. 2004. Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am. J. Bot. 91: 1582-1598.
RAI, H. S., H. E. O’BRIEN, P. A. REEVES, AND S. W. GRAHAM. 2003. 
Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359.
RAI, H. S., P. A. REEVES, R. PEAKALL, R. G. OLMSTEAD, AND S. W. GRAHAM. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 86: 659-669.
RAUBESON, L. A., AND R. K. JANSEN. 1992. Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255: 1697-1699.
RAUBESON, L. A., AND D. B. STEIN. 1995. Insights into fern evolution from mapping chloroplast genomes. Am. Fern J. 85: 193-204.
ROTHWELL, G. W. 1999. Fossils and ferns in the resolution of land plant phylogeny. Bot. Rev. 65: 188-218.
ROTHWELL, G. W., AND K. C. NIXON. 2006. How does the inclusion of fossil data change our conclusions about the phylogenetic history of euphyllophytes? Int. J. Plant Sci. 167: 737-749.
ROTHWELL, G. W., AND R. A. STOCKEY. 2008. Phylogeny and evolution of ferns: a paleontological perspective. Pp. 332-366. In The biology and evolution of ferns and lycophytes. Edited by T. A. Ranker and C. H. Haufler. Cambridge University Press, NY.
RYDIN, C. AND N. WIKSTRÖM. 2002. Phylogeny of Isoetes (Lycopsida): resolving basal relationships using rbcL sequences. Taxon 51: 83-89.
SAARELA, J. M., H. S. RAI, J. A. DOYLE, P. K. ENDRESS, S. MATHEWS, A. D. MARCHANT, B. G. BRIGGS, AND S. W. GRAHAM. 2007. Hydatellaceae identified as a new branch near the base of the angiosperm phylogenetic tree. Nature 446: 312-315.
SCHNEIDER, H., E. SCHUETTPELZ, K. M. PRYER, R. CRANFILL, S. MAGALLÓN, AND R. LUPIA. 2004. Ferns diversified in the shadow of angiosperms. Nature 428: 553-557.
SCHUETTPELZ, E., P. KORALL, AND K. M. PRYER. 2006. Plastid atpA data provide improved support for deep relationships among ferns. Taxon 55: 897-906.
SCHUETTPELZ, E. AND K. M. PRYER. 2007. Fern phylogeny inferred from 400 leptosporangiate species and three plastid genes. Taxon 56: 1037-1050.
SMITH, A. R., K. M. 
PRYER, E. SCHUETTPELZ, P. KORALL, H. SCHNEIDER, AND P. G. WOLF. 2006. A classification for extant ferns. Taxon 55: 705-731.
SOLTIS, P. S., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple chloroplast genes as a tool for comparative biology. Nature 402: 402-404.
SWOFFORD, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4, Computer program and documentation. Sinauer Associates, Sunderland, MA.
WAGNER, W. H. 1990. Ophioglossaceae. Pp. 193-197. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin.
WIKSTRÖM, N. AND P. KENRICK. 1997. Phylogeny of Lycopodiaceae (Lycopsida) and the relationships of Phylloglossum drummondii Kunze based on rbcL sequences. Int. J. Plant Sci. 158: 862-871.
WIKSTRÖM, N. AND P. KENRICK. 2000. Relationships of Lycopodium and Lycopodiella based on combined plastid rbcL gene and trnL intron sequence data. Syst. Bot. 25: 495-510.
WIKSTRÖM, N. AND K. M. PRYER. 2005. Incongruence between primary sequence data and the distribution of a mitochondrial atp1 group II intron among ferns and horsetails. Mol. Phylogenet. Evol. 36: 484-493.
WOLF, P. G., C. A. ROWE, R. B. SINCLAIR, AND M. HASEBE. 2003. Complete nucleotide sequence of the chloroplast genome from a leptosporangiate fern, Adiantum capillus-veneris L. DNA Res. 10: 59-65.
CHAPTER 6 CONCLUSION
6.1 OVERALL CONCLUSIONS
I collected and analyzed a large amount of plastid DNA sequence data to address several questions relating to the deep branches of the vascular plants. In Chapters 2 and 3, I focused on broad-scale relationships within the two largest groups of gymnosperms, the cycads and conifers. In Chapter 4, with the addition of previously published data from the largest seed-plant group, the angiosperms, I addressed the vexing issue of seed-plant relationships in general. 
Finally, in Chapter 5, I reconstructed relationships among the major lineages of the ferns and relatives (monilophytes), with a particular focus on the backbone of leptosporangiate fern phylogeny. All four studies surveyed a large set of exemplar taxa for a more or less consistent set of plastid regions, the largest that has been attempted to date for this number of taxa. I encountered unusual or noteworthy molecular evolutionary phenomena (Chapters 2, 3) and evidence of systematic bias in the inference of deep seed-plant relationships (Chapter 4), and I used a likelihood-based filtering method for removing potentially problematic rapidly evolving characters (Chapters 3-5).
6.1.1 Reconstruction of Higher-Order Relationships in Cycad Phylogeny
I reconstructed phylogenetic relationships in cycads using 17 plastid genes and two different optimality criteria (MP and ML). Higher-order cycad relationships have proven difficult to infer, apparently because of a very slow rate of molecular evolution, perhaps coupled with some relatively rapid radiations. I found substantial support for most of the inferred backbone of cycad phylogeny, and weak evidence that the sister-group of the cycads among living seed plants is Ginkgo biloba. Cycas (representing Cycadaceae) is the sister-group of the remaining cycads; Dioon is part of the next most basal split. I found two of the three major families of cycads (Zamiaceae and Stangeriaceae) not to be monophyletic; Stangeria (Stangeriaceae) is embedded within Zamiaceae, close to Zamia and Ceratozamia, and is not closely allied to the other genus of Stangeriaceae, Bowenia. These findings are congruent with a recently expanded taxonomic sampling of my data set (Zgurski et al., 2008), in which we obtained a complete genus-level sample for the cycads. In contrast to the other seed plants, cycad chloroplast genomes share two features with Ginkgo: a reduced rate of evolution and an elevated transition:transversion ratio. 
The latter aspect of their molecular evolution is unlikely to have affected inference of cycad relationships in the context of seed-plant-wide analyses, as I demonstrated that large variations in the transition:transversion ratios across the seed plants seem to have no effect on tree topology when all seed-plant taxa are included.
6.1.2 Reconstruction of Higher-Order Relationships in Conifer Phylogeny
In Chapter 3, I reconstructed the broad backbone of conifer phylogeny using 22 exemplar conifer species. Parsimony and likelihood analyses recovered the same higher-order relationships, and I found strong support for most of the deep splits in conifer phylogeny, including those within the two families that I sampled most heavily, Araucariaceae and Cupressaceae. My findings are broadly congruent with other recent studies (e.g., Gadek et al., 2000; Quinn et al., 2002), and are inferred with comparable or improved bootstrap support. The deepest phylogenetic split in conifers is inferred to be between Pinaceae and all other conifers (Cupressophyta). Within the Cupressophyta clade I recovered well-supported relationships among Cephalotaxaceae, Cupressaceae, Sciadopityaceae, and Taxaceae. My data are consistent with recent moves to recognize Cephalotaxus under Taxaceae, and I found strong support for a sister-group relationship between the two predominantly southern hemisphere conifer families, Araucariaceae and Podocarpaceae. I argue that Phyllocladus should be recognized under Podocarpaceae, despite residual uncertainty about its relationships to other podocarps. I also identified an unusual local hotspot of indel evolution shared by the latter two conifer families in the coding portion of a plastid ribosomal protein gene, rps7, which has become greatly expanded in a subset of conifers (Araucariaceae and Podocarpaceae). 
I found that the removal of the most rapidly evolving plastid characters, as defined using a likelihood-based classification of substitution rates for the taxa considered in this thesis, has little to no effect on inferences of higher-order conifer relationships.
6.1.3 Seed-Plant Phylogeny: Inference and Misinference of Higher-Order Relationships
The gene sampling used throughout this thesis appears to be sufficient for recovering strong and congruent support within the major clades of vascular plants (angiosperms, conifers, cycads; e.g., Chapters 2 and 3; Graham and Olmstead, 2000a; Graham et al., 2006), but the placement of Gnetales in the seed-plant phylogeny remains unresolved. The Gnetales clade is one of the few extant lineages of gymnosperms, together with conifers, cycads and Ginkgo. I examined the possibility that systematic error contributes to conflicting suggestions on Gnetales placement, and it appears to do so: I used a simulation approach to show that the “Gnetales-sister” hypothesis (in which Gnetales are the sister-group of all other extant seed plants) found in several recent molecular studies may be a long-branch artifact, especially when using maximum parsimony as the reconstruction method. I showed that the use of model-based methods, in particular maximum likelihood, appears to be less prone to systematic error, especially when combined with the removal of rapidly evolving sites; however, different partitions of the data can still produce strongly conflicting results for ML (e.g., different rate class partitions), even when the simulations suggest that there is little systematic error within each of them. 
Disturbingly, despite zero character support in the real data for the anthophyte relationship according to ML (reflected in the constrained trees used to simulate the anthophyte hypothesis), MP inference of these simulated data usually recovered an alternative relationship (the Gnetales-sister tree) that also had no character support from the anthophyte model tree. Finally, I showed that tree misinference using ML is unlikely to be purely a function of rapidly evolving characters, since third-codon position data, which include both rapidly and slowly evolving characters, are inferred to be much more error-prone using this inference method than are plastid data filtered to remove invariant and slowly to moderately evolving nucleotides.
6.1.4 Monilophytes and Deep Vascular-Plant Phylogeny
In Chapter 5, I produced the largest survey of monilophytes to date using 34 exemplar taxa and the same plastid regions used throughout this thesis. I also addressed the broader issue of deep vascular-plant relationships with the addition of the third major line of vascular plants, the lycophytes. I was able to recover most of the backbone relationships with a high level of bootstrap support. The results of this study are generally congruent with several recently published studies that also address the same question (Pryer et al., 2004; Schuettpelz et al., 2006), although I found improved support in several key areas, for example, along the backbone of leptosporangiate ferns, and for a sister-group relationship between the tree ferns (Cyatheales) and Polypodiales. In contrast to these recent studies, I found that the sister-group of all other monilophytes is Equisetum, although its placement is only moderately supported by the current data. ML analyses of the full data set suggest that Marattiaceae may be the sister-group of leptosporangiate ferns. 
When I used likelihood-based filtering of rapidly evolving characters, I found that the net effect was to generally reduce support for backbone relationships of the vascular plants, but that this did not affect the overall topology recovered when compared to analyses of the full data set. Taxon-exclusion analyses using Selaginella and bryophyte outgroups affect relationships in the monilophytes, and suggest that Selaginella, at least, is potentially problematic when included in analyses of vascular-plant phylogeny at current taxon densities. Equisetum and other monilophyte taxa with relatively long branches may pose a more significant obstacle to accurate phylogenetic inference among the major monilophyte clades, and so additional taxonomic sampling in these lineages may be useful for more clearly resolving the basal nodes of monilophyte phylogeny.
6.2 SUMMARY AND FUTURE DIRECTIONS
An overall goal of my research was to solidify our basic knowledge of deep vascular-plant relationships, particularly the evolutionary relationships within the cycads, conifers, and monilophytes, and to provide well-supported phylogenies at each of these deep levels of relationship. The data I collected permit robust phylogenetic inference of much of the vascular-plant backbone, and strong support for the monophyly of, and relationships within, the largest clades examined here (i.e., conifers, cycads and leptosporangiate ferns). In addition to this solid framework, I have shown that maximum likelihood is substantially less prone to systematic error than maximum parsimony when reconstructing phylogenetic relationships, and that filtering plastid data according to ML-based rate classifications can be useful when systematic error is evident or suspected. However, such filtering does not resolve the source of the conflict among different data partitions, at least concerning the inference of overall seed-plant relationships. 
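The rate-class filtering summarized above can be illustrated with a minimal sketch. It assumes that every aligned site has already been assigned to one of the nine rate classes (RC0-RC8) by a separate likelihood analysis, which is not reproduced here. The function name filter_sites, the dictionary-based alignment, and the toy sequences are hypothetical illustrations and are not the scripts used in this thesis.

```python
from typing import Dict, Iterable, List


def filter_sites(alignment: Dict[str, str],
                 rate_classes: List[int],
                 excluded: Iterable[int]) -> Dict[str, str]:
    # Keep only the alignment columns whose rate class is NOT excluded.
    # alignment maps a taxon name to its aligned sequence (hypothetical layout).
    # rate_classes[i] is the rate class of column i, assumed precomputed.
    excluded_set = set(excluded)
    keep = [i for i, rc in enumerate(rate_classes) if rc not in excluded_set]

    filtered = {}
    for taxon, seq in alignment.items():
        if len(seq) != len(rate_classes):
            raise ValueError('sequence length does not match rate classes: ' + taxon)
        filtered[taxon] = ''.join(seq[i] for i in keep)
    return filtered


# Toy example: drop the two fastest classes (RC7 and RC8), as in the RC0-6 analyses.
toy_alignment = {'Equisetum': 'ACGTAC', 'Psilotum': 'ACGAAC'}
toy_classes = [0, 7, 3, 8, 6, 1]
reduced = filter_sites(toy_alignment, toy_classes, {7, 8})
```

Filtering by column index keeps the sequences aligned with one another after sites are removed, so the reduced matrix could in principle be passed to any MP or ML program without further adjustment.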
Several questions remain, including the ever-lingering question of Gnetales placement within the seed plants, and relationships among the major clades of monilophytes. Current simulated data sets of seed-plant phylogeny as a whole, and the models used to simulate them, may not be complex enough to truly reflect fundamental but incompletely characterized processes of molecular evolution in these taxa, and as a result, current analyses of higher-order relationships may still be considerably biased in seed plants, at least. Since monilophyte relationships involve even deeper and longer branches, we should consider the possibility that some of these relationships may also be prone to strong misinference. It may be that the key to resolving the seed-plant mystery lies in collaborative efforts that incorporate more nuanced analyses of molecular data for each plant genome and its constituent genes, together with morphological data sets that include the many missing extinct lineages.
6.3 REFERENCES
GADEK, P. A., D. L. ALPERS, M. M. HESLEWOOD, AND C. J. QUINN. 2000. Relationships within Cupressaceae sensu lato: A combined morphological and molecular approach. Am. J. Bot. 87: 1044-1057.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.
GRAHAM, S. W., J. M. ZGURSKI, M. A. MCPHERSON, D. M. CHERNIAWSKI, J. M. SAARELA, V. L. BIRON, J. C. PIRES, R. G. OLMSTEAD, M. W. CHASE, AND H. S. RAI. 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Pp. 3-21. In Monocots: comparative biology and evolution (excluding Poales). Edited by J. T. Columbus, E. A. Friar, J. M. Porter, L. M. Prince, and M. G. Simpson. Rancho Santa Ana Botanic Garden, Claremont, CA.
PRYER, K. M., E. SCHUETTPELZ, P. G. WOLF, H. SCHNEIDER, A. R. SMITH, AND R. CRANFILL. 2004. 
Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am. J. Bot. 91: 1582-1598.
QUINN, C. J., R. A. PRICE, AND P. A. GADEK. 2002. Familial concepts and relationships in the conifers based on rbcL and matK sequence comparisons. Kew Bull. 57: 513-531.
SCHUETTPELZ, E., P. KORALL, AND K. M. PRYER. 2006. Plastid atpA data provide improved support for deep relationships among ferns. Taxon 55: 897-906.
ZGURSKI, J. M., H. S. RAI, Q. M. FAI, D. J. BOGLER, J. FRANCISCO-ORTEGA, AND S. W. GRAHAM. 2008. How well do we understand the overall backbone of cycad phylogeny? New insights from a large, multigene plastid data set. Mol. Phylogenet. Evol. 47: 1232-1237. "@en . "Thesis/Dissertation"@en . "2009-05"@en . "10.14288/1.0066904"@en . "eng"@en . "Botany"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "Attribution-NonCommercial-NoDerivatives 4.0 International"@en . "http://creativecommons.org/licenses/by-nc-nd/4.0/"@en . "Graduate"@en . "Molecular phylogenetic studies of the vascular plants"@en . "Text"@en . "http://hdl.handle.net/2429/3889"@en .