UBC Theses and Dissertations

Molecular phylogenetic studies of the vascular plants. Rai, Hardeep Singh. 2008.

MOLECULAR PHYLOGENETIC STUDIES OF THE VASCULAR PLANTS

by

Hardeep Singh Rai

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

(Botany)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

December 2008

© Hardeep Singh Rai, 2008

ABSTRACT

To investigate vascular-plant phylogeny at deep levels of relationship, I collected and analyzed a large set of plastid-DNA data comprising multiple protein-coding genes and associated noncoding regions. I addressed questions relating to overall tracheophyte phylogeny, including relationships among the five living lineages of seed plants and within two of the largest living gymnosperm clades (conifers and cycads). I also examined relationships within and among the major lineages of monilophytes (ferns and relatives), including their relationship to the remaining vascular plants. Overall, I recovered three well-supported lineages of vascular plants: lycophytes, monilophytes, and seed plants. I found strong support for most backbone relationships within cycads and conifers. My results suggest that the cycad family Stangeriaceae (Stangeria and Bowenia) is not monophyletic, and that Stangeria is instead more closely related to Zamia and Ceratozamia. Within the conifers, I found Pinaceae to be the sister-group of all other conifers, and I argue that two conifer genera often treated as monogeneric families, Cephalotaxus and Phyllocladus, should be recognized under Taxaceae and Podocarpaceae, respectively. Systematic error likely affects inference of the placement of Gnetales within seed-plant phylogeny. As a result, relationships among the five living seed-plant groups remain largely unresolved, even though removing the most rapidly evolving characters appears to reduce systematic error.
Phylogenetic analyses that included these rapidly evolving characters often led to misinference of the “Gnetales-sister” hypothesis (Gnetales as the sister-group of all other seed plants), especially when maximum parsimony was the inference method. Filtering of rapidly evolving characters had little effect on inference of higher-order relationships within conifers and monilophytes, and generally resulted in reduced support for backbone relationships. Within the monilophytes, I found strong support for the majority of relationships along the backbone, generally congruent with other recent studies. Equisetaceae and Marattiaceae may be, respectively, the sister-groups of the remaining monilophytes and of the leptosporangiate ferns, but relationships among the major monilophyte lineages are sensitive to the choice of outgroups and to long branches in lycophytes.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
DEDICATION
CO-AUTHORSHIP STATEMENT
CHAPTER 1. INTRODUCTION
  1.1 Overview of Vascular Plant Systematics
  1.2 Objectives of the Thesis
  1.3 References
CHAPTER 2. INFERENCE OF HIGHER-ORDER RELATIONSHIPS IN THE CYCADS FROM A LARGE CHLOROPLAST DATA SET
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Taxonomic and Genomic Sampling
    2.2.2 DNA Extraction, Amplification, and Sequencing
    2.2.3 Data Assembly
    2.2.4 Phylogenetic Analysis
  2.3 Results
    2.3.1 26-Taxon Data Set
    2.3.2 Cycads and Ginkgo
    2.3.3 Molecular Evolution in Cycads and Relatives
  2.4 Discussion
    2.4.1 Cycad Molecular Evolution
    2.4.2 Basal Cycads and the Sister Group of Cycadales
  2.5 References
CHAPTER 3. INFERENCE OF HIGHER-ORDER CONIFER RELATIONSHIPS FROM A MULTI-LOCUS PLASTID DATA SET
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Plant Material and Genomic Sampling
    3.2.2 Recovery of Plastid Sequences, DNA Alignment and Characterization of an Indel Hotspot
    3.2.3 Phylogenetic Analyses
  3.3 Results
  3.4 Discussion
    3.4.1 Rapidly Evolving Plastid DNA Sites and the Inference of Higher-Order Conifer Relationships
    3.4.2 The Ovulate Cone and Conifer Systematics
    3.4.3 The Case for Recognizing Cephalotaxus as a Member of Taxaceae
    3.4.4 Relationships within Cupressaceae
    3.4.5 The Higher-Order Position of Phyllocladus in Conifer Phylogeny
    3.4.6 The Position of Wollemia within Araucariaceae
    3.4.7 Significance of an Expansion Hotspot in the Plastid Ribosomal Protein Gene rps7
    3.4.8 Seed-Plant Phylogeny and the Position of Gnetales
    3.4.9 The Inference of Conifer Phylogeny from Plastid Data
  3.5 References
CHAPTER 4. INFERENCE AND MISINFERENCE OF HIGHER-ORDER SEED-PLANT RELATIONSHIPS FROM PLASTID DATA
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Taxonomic and Genomic Sampling
    4.2.2 DNA Extraction, Amplification, Sequencing and Data Assembly
    4.2.3 Phylogenetic Analyses
    4.2.4 Inference of Nucleotide Rate Classes
    4.2.5 Systematic Error
  4.3 Results
    4.3.1 Phylogenetic Analysis of the Real Data
    4.3.2 Inference of Systematic Error Using Monte Carlo Simulations
    4.3.3 Misinference of the Gnetales-Sister Hypothesis when there is No Evidence for It
  4.4 Discussion
  4.5 References
CHAPTER 5. INFERENCE OF DEEP VASCULAR-PLANT PHYLOGENY, WITH A FOCUS ON BACKBONE RELATIONSHIPS IN MONILOPHYTA
  5.1 Introduction
  5.2 Materials and Methods
    5.2.1 Taxonomic and Genomic Sampling
    5.2.2 DNA Extraction, Amplification and Sequencing
    5.2.3 Phylogenetic Analyses
    5.2.4 Inference of Nucleotide Rate Classes and Exploration of the Effect of Long Branches
  5.3 Results
    5.3.1 Phylogenetic Analyses
    5.3.2 Rate Class Analyses
  5.4 Discussion
  5.5 References
CHAPTER 6. CONCLUSION
  6.1 Overall Conclusions
    6.1.1 Reconstruction of Higher-Order Relationships in Cycad Phylogeny
    6.1.2 Reconstruction of Higher-Order Relationships in Conifer Phylogeny
    6.1.3 Seed-Plant Phylogeny: Inference and Misinference of Higher-Order Relationships
    6.1.4 Monilophytes and Deep Vascular-Plant Phylogeny
  6.2 Summary and Future Directions
  6.3 References

LIST OF TABLES

2.1 GenBank accession numbers and vouchers for exemplar taxa with one or more previously unpublished DNA sequences
2.2 Lengths and variation in length of the noncoding regions used in this study
2.3 Likelihood ratio test (LRT) for different substitution models
2.4 Estimated likelihood parameters (HKY + Γ + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2
2.5 Estimated likelihood parameters (GTR + Γ + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2
3.1 Source information and GenBank numbers
4.1 New primers designed for this study
4.2 Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum parsimony (MP) as a search criterion
4.3 Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum likelihood (ML) as a search criterion
4.4 Major seed-plant hypotheses inferred from simulations of various partitions of the real data constrained to the anthophyte hypothesis (Gnetales united with angiosperms). Both maximum parsimony (MP) and maximum likelihood (ML) results are shown
5.1 GenBank accession numbers and vouchers for exemplar pteridophyte (and outgroup) taxa
5.2 New primers designed for this study

LIST OF FIGURES

1.1 Generalized phylogenetic relationships of the land plants
2.1 Summary of Stevenson’s (1992) classification of the cycads
2.2 Chloroplast-based phylogeny of the cycads and relatives
2.3 Chloroplast-based phylogeny of the cycads using different optimality criteria
3.1 Plastid-based phylogeny of the conifers and relatives inferred from MP
3.2 Plastid-based phylogeny of the conifers and relatives inferred from ML
3.3 Summary of bootstrap support after removal of sites classified in the two fastest of nine rate classes
3.4 Dot-plot showing the pairwise similarity of complete translated sequences of the plastid rps7 locus from selected conifers
4.1 Various seed-plant topologies proposed in the literature with regard to the position of Gnetales
4.2 Plastid-based phylogeny of the conifers and relatives
4.3 Maximum likelihood tree found using coding regions from 17 plastid genes and including all nine rate classes
4.4A Proportion of the total nucleotides in each of two codon-position plastid data partitions that belong to different rate classes
4.4B Proportion of the total characters in each of nine rate classes that belong to the two codon-position data partitions
4.5 Maximum likelihood tree found using codon positions 1 and 2 for multiple plastid genes
4.6 Depiction of the zero-length branch when maximum likelihood is used as the criterion for viewing the anthophyte hypothesis
Sup.1 Relationships within the conifer clades presented in Figs. 4.2 and 4.3
5.1 The consensus tree presented in Smith et al. (2006) based on recent and ongoing phylogenetic studies
5.2 Plastid-based phylogeny of the vascular plants
5.3 Maximum likelihood tree found using 17 plastid genes and associated noncoding regions
5.4 Maximum likelihood tree found using 17 plastid genes and associated noncoding regions, excluding the two fastest rate classes and including or excluding Selaginella
5.5 Placement of Equisetum from various taxon-exclusion analyses for the plastid data considered here

ACKNOWLEDGEMENTS

I am indebted to my Ph.D. supervisor, Dr. Sean W. Graham, for his guidance over the course of my graduate career, and for allowing me to pursue my research interests in his laboratory. I also thank the members of my supervisory committees at the University of Alberta (Drs. Ruth Stockey and Felix Sperling) and the University of British Columbia (Drs. Jeannette Whitton, Quentin Cronk, Wayne Maddison, and Patrick Keeling) for their help and guidance throughout my graduate career.
My laboratory, herbarium, and field work have been supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants to Sean Graham, the University of British Columbia, and the Department of Botany (University of British Columbia). Personal financial support was provided by a Postgraduate Scholarship (PGS-D), a University Graduate Fellowship (University of British Columbia), and an NSERC Discovery Grant to Sean W. Graham.

DEDICATION

For my family: Michelle, Symrin and Darshan Rai

CO-AUTHORSHIP STATEMENT

Chapter 2 is based on a published manuscript: Rai, H. S., H. E. O’Brien, P. A. Reeves, R. G. Olmstead, and S. W. Graham. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Molecular Phylogenetics and Evolution 29: 350-359. The project was suggested by S. W. Graham. I conducted all laboratory work and data analyses, and wrote the manuscript. S. W. Graham provided insights into data analyses and contributed to the writing.

Chapter 3 is based on a published manuscript: Rai, H. S., P. A. Reeves, R. Peakall, R. G. Olmstead, and S. W. Graham. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 29: 350-359. Sean W. Graham and I designed the project. I conducted all laboratory work and data analyses, and wrote the manuscript. S. W. Graham provided insights into taxonomic sampling and data analyses, and contributed to the writing.

Chapter 4 is a draft manuscript that will be submitted for publication: Rai, H. S., and S. W. Graham. Inference and misinference of higher-order seed-plant relationships from plastid data. Sean W. Graham and I designed the project. I carried out the laboratory work, designed 27 new seed-plant-specific primers, carried out data analyses, and wrote the manuscript. Sean W. Graham provided valuable guidance with respect to data analyses and writing.
Chapter 5 is a draft manuscript that will be submitted for publication: Rai, H. S., and S. W. Graham. Deep vascular plant phylogeny with a focus on the backbone of Monilophyta. Sean W. Graham and I designed the project. I carried out the laboratory work, designed 16 new monilophyte-specific primers, carried out data analyses, and wrote the manuscript. Many plant and DNA samples were kindly provided by Drs. P. G. Wolf (Utah State University) and K. M. Pryer (Duke University). Sean W. Graham provided valuable guidance with respect to taxonomic sampling, data analyses and writing.

CHAPTER 1
INTRODUCTION

1.1 OVERVIEW OF VASCULAR PLANT SYSTEMATICS

The vascular plants (tracheophytes) include all extant plants with branched sporophytes. They usually have true roots, stems and leaves, and possess a system of vascular tissue that transports water and nutrients between different parts of the plant. With a known fossil record that stretches back at least 410 Myr (Lang, 1937), tracheophytes include the true rhyniophytes (such as Rhynia gwynne-vaughanii; Kenrick and Crane, 1997), small plants with simple bifurcating stems that are most likely the sister-group of all other vascular plants (Judd et al., 2008). Several extinct lineages were once placed within the rhyniophytes, most notably Cooksonia cambrensis and Aglaophyton (Rhynia) major. Cooksonia has recently been found to be a non-monophyletic assemblage of extinct species, with Cooksonia cambrensis more closely related to lycophytes (Fig. 1.1) than to other described Cooksonia species, which are more closely related to the euphyllophytes (Kenrick and Crane, 1997; Crane et al., 2004). Aglaophyton major (originally described as Rhynia major) has been shown to lack true secondary thickening in its xylem and thus is no longer classified as a vascular plant, although it is likely to be the sister-group of tracheophytes (Edwards, 1986; Crane et al., 2004).
Broadly, extant tracheophytes can be viewed as three distinct lineages: spermatophytes (seed plants), monilophytes (including all living ferns), and lycophytes. Immature sporophytes (embryos) of spermatophytes are enveloped in one or more integument layers within the seed, together with nutritive tissue that is often derived from the megasporangium or megagametophyte. Seed plants, especially flowering plants, make up much of the world’s current plant diversity. The lycophytes are the most species-poor of the three major vascular-plant clades, and are now recognized as the sister-group of all other extant vascular plants (Fig. 1.1; Raubeson and Jansen, 1992; Kenrick and Crane, 1997; Pryer et al., 2001, 2004; Rydin et al., 2002). Recent studies have shown that the extant euphyllophytes comprise two major clades: monilophytes (including whisk ferns, horsetails, and eusporangiate and leptosporangiate ferns) and spermatophytes (Pryer et al., 2001, 2004). Despite considerable progress on relationships within many of their constituent subclades (e.g., angiosperms, APG II, 2003; conifers, Quinn et al., 2002; leptosporangiate ferns, Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007) and a large number of studies using both morphological and molecular evidence (Raubeson and Jansen, 1992; Rothwell and Serbet, 1994; Kranz and Huss, 1996; Kenrick and Crane, 1997; Doyle, 1998; Duff and Nickrent, 1999; Rothwell, 1999; Soltis et al., 1999; Nickrent et al., 2000; Pryer et al., 2001, 2004; Rydin et al., 2002; Doyle, 2006; Rothwell and Nixon, 2006), deep relationships among these vascular-plant groups remain largely unsettled.

The monilophytes (Monilophyta; Cantino et al., 2007) comprise the eusporangiate ferns, psilotophytes (whisk ferns), equisetophytes (horsetails) and leptosporangiate ferns.
Previous studies of the monilophytes, based on morphology and single-gene molecular data, have left relationships among the major taxa partly unresolved (Duff and Nickrent, 1999; Rothwell, 1999; Kenrick and Crane, 1997; Kranz and Huss, 1996; Pryer et al., 1995; Smith, 1995; Manhart, 1994; Pichi Sermolli, 1977). A four-gene study of vascular plants (Pryer et al., 2001, 2004) and a subsequent five-gene follow-up (Schuettpelz et al., 2006) have provided strong bootstrap support for some of the deepest relationships, mostly within leptosporangiate ferns.

The reconstruction of seed-plant relationships is recognized as one of the most difficult problems in plant systematics (e.g., Donoghue and Doyle, 2000). A wide range of studies using evidence from morphology and molecules have given many different and often strongly conflicting results (e.g., Hill and Crane, 1982; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Rothwell and Serbet, 1994; Boivin et al., 1996; Doyle, 1996; Goremykin et al., 1996; Chaw et al., 1997, 2000; Bowe et al., 2000; Sanderson et al., 2000; Rydin et al., 2002; Burleigh and Mathews, 2004, 2007a, b). One problem that has plagued researchers working with molecular evidence is that gymnosperms (particularly the conifers among living taxa) have a diverse fossil record but relatively few extant representatives (e.g., Ginkgo, whose single living species is a remnant of a more diverse Mesozoic group; Thomas and Spicer, 1986). As a result, each extant lineage is subtended by a very long interior branch, which may lead to long-branch attraction (Felsenstein, 1978; Penny and Hendy, 1985; Hendy and Penny, 1989).
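Why long branches are a problem can be made concrete with a small calculation. The sketch below is an illustration, not an analysis from this thesis: it assumes the Jukes-Cantor substitution model and a hypothetical four-taxon tree ((A,B),(C,D)) in which the branches to A and C are long (1.0 expected substitutions per site) and all other branches are short (0.05). It computes the expected frequency of the parsimony-informative site patterns that favour each of the three possible splits. For four taxa, maximum parsimony picks the split backed by the most such sites, so if the pattern uniting the two long branches (AC|BD) is more frequent than the pattern reflecting the true tree (AB|CD), parsimony converges on the wrong tree as more data are added (Felsenstein, 1978).

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor probability that state i becomes state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def pattern_prob(states, branches) -> float:
    """Probability of the tip states (A, B, C, D) on the unrooted tree ((A,B),(C,D)).

    branches = (tA, tB, tC, tD, t_internal). Internal node u joins A and B, and
    node v joins C and D. The tree is rooted at u with equal base frequencies,
    which is valid because the Jukes-Cantor model is time-reversible.
    """
    a, b, c, d = states
    t_a, t_b, t_c, t_d, t_int = branches
    total = 0.0
    for u, v in product(BASES, repeat=2):  # sum over the unobserved internal states
        total += (0.25 * jc_prob(u, a, t_a) * jc_prob(u, b, t_b)
                  * jc_prob(u, v, t_int)
                  * jc_prob(v, c, t_c) * jc_prob(v, d, t_d))
    return total


def informative_site_frequencies(branches) -> dict:
    """Expected fraction of sites that are parsimony-informative for each split."""
    freq = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for x, y in product(BASES, repeat=2):
        if x == y:
            continue
        freq["AB|CD"] += pattern_prob((x, x, y, y), branches)
        freq["AC|BD"] += pattern_prob((x, y, x, y), branches)
        freq["AD|BC"] += pattern_prob((x, y, y, x), branches)
    return freq


if __name__ == "__main__":
    # Hypothetical branch lengths: A and C long, everything else short.
    long_branches = (1.0, 0.05, 1.0, 0.05, 0.05)
    for split, f in informative_site_frequencies(long_branches).items():
        print(f"{split}: {f:.4f}")
```

With these assumed branch lengths, the incorrect split AC|BD is expected to be supported by roughly five times as many sites as the true split AB|CD. With equal branch lengths the true split is favoured instead.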
Approaches for dealing with these problematic long branches include denser taxonomic sampling within extant clades, use of conservatively evolving data (e.g., the plastid genome of plants, which evolves more slowly than the nuclear genome; Graham and Olmstead, 2000), adding more molecular data, and examining the data for evidence of systematic bias (e.g., Sanderson et al., 2000). In this thesis I combine these approaches, using what currently represents the largest amount of data sampled per taxon at this level of taxonomic sampling.

1.2 OBJECTIVES OF THE THESIS

I present data from multiple plastid genes for a relatively dense taxonomic sampling of seed plants [seven cycads representing all tribes recognized by Stevenson (1992), Chapter 2; 22 conifers, with multiple representatives of each of the seven families that are usually recognized, Chapter 3; and nine additional exemplars of the remaining seed-plant diversity (three Gnetales, five angiosperms, and Ginkgo), Chapter 4] to examine relationships both within and among these seed-plant groups. In addition to basic phylogenetic reconstructions, I examine the effect of fast-evolving data (e.g., third codon positions, and the fastest-evolving rate classes according to a maximum likelihood classification; Chapters 3, 4, 5) on phylogenetic inference, and I address the potential for misinference due to systematic error using Monte Carlo simulations (e.g., Sanderson et al., 2000; Chapter 4). In Chapter 5 I broadly explore monilophyte relationships using 64 representative taxa, 34 of them monilophyte exemplars.
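Both kinds of character filtering described above (excluding third codon positions, or excluding sites assigned to the fastest rate classes) amount to dropping alignment columns. The following is a minimal sketch, not the procedure or software used in this thesis. It assumes the alignment is held as a dictionary of equal-length aligned sequences, that a per-site rate class has already been estimated elsewhere (how the classes are inferred is not shown, and coding them as integers 1 to 9 with 9 the fastest is an assumption), and that coding alignments are in frame from the first column. All function names are hypothetical.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence (all the same length)


def keep_columns(aln: Alignment, keep: List[bool]) -> Alignment:
    """Return a new alignment containing only the columns flagged True in `keep`."""
    if {len(seq) for seq in aln.values()} != {len(keep)}:
        raise ValueError("mask length must equal alignment length")
    return {taxon: "".join(c for c, k in zip(seq, keep) if k)
            for taxon, seq in aln.items()}


def drop_fastest_rate_classes(aln: Alignment, rate_class: List[int],
                              n_classes: int = 9, n_drop: int = 2) -> Alignment:
    """Exclude sites assigned to the n_drop fastest of n_classes rate classes.

    Assumes rate_class[i] is an integer in 1..n_classes, with n_classes the fastest.
    """
    cutoff = n_classes - n_drop  # classes above the cutoff are excluded
    return keep_columns(aln, [rc <= cutoff for rc in rate_class])


def drop_third_codon_positions(aln: Alignment) -> Alignment:
    """Exclude every third column, assuming the alignment is in frame from column 1."""
    length = len(next(iter(aln.values())))
    return keep_columns(aln, [i % 3 != 2 for i in range(length)])


if __name__ == "__main__":
    # Toy sequences and rate classes, invented for illustration only.
    toy = {"Ginkgo": "ATGGCTAAA", "Pinus": "ATGGCAAAG"}
    classes = [1, 1, 9, 2, 3, 8, 1, 2, 7]
    print(drop_fastest_rate_classes(toy, classes))
    print(drop_third_codon_positions(toy))
```

Expressing each filter as a boolean column mask keeps the two operations independent, so the same helper can build any combination of partitions (for example, codon positions 1 and 2 only, or all positions minus the two fastest rate classes).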
An overarching goal of this thesis is to solidify our basic knowledge of deep vascular-plant relationships, with a focus on the deepest evolutionary relationships within these groups: cycads (Chapter 2), conifers (Chapter 3), seed plants as a whole (Chapter 4), and monilophytes, especially leptosporangiate ferns (Chapter 5). This thesis is based on a large sampling of coding and noncoding regions of the plastid genome. These regions span 17 genes that form the backbone of all data collected for this work: atpB, rbcL, ten photosystem II (psb) genes, three ribosomal protein genes, and two NADH dehydrogenase subunit (ndh) genes. Three chapters (Chapters 2, 3, 5) also include 10-11 associated noncoding regions: three introns and eight intergenic spacer regions [leptosporangiate ferns possess a large inversion with a breakpoint within one of the intergenic spacers (Wolf et al., 2003), so one fewer region was included for them]. The regions surveyed generally evolve very slowly (Graham and Olmstead, 2000), and should prove useful in addressing the impact of systematic bias on vascular-plant phylogenetic inference. Beyond inferring the deep portions of the vascular-plant “Tree of Life,” the larger significance of this work is that it will provide more resolved and better supported phylogenies at each of these deep levels of relationship, for use by evolutionary and genomic biologists (for example, for studying the molecular evolution of the plastid genome) and by systematists (for constructing more natural classification schemes). It also uses several approaches to gauge the degree to which tree inference can be trusted at the deepest and most difficult-to-infer points of plant phylogeny.

Figure 1.1. Generalized phylogenetic relationships of the land plants, modified from Judd et al. (2008; Fig. 7.8).
Red bars indicate some synapomorphies that define several major clades (tracheophytes, lycophytes, and euphyllophytes).  †  indicates extinct taxa.  6  1.3 REFERENCES ANGIOSPERM PHYLOGENY GROUP (APG II). 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141: 399-436. BOIVIN, R., M. RICHARD, D. BEAUSEIGLE, J. BOUSQUET, AND G. BELLEMARE. 1996. Phylogenetic inferences from chloroplast chlB gene sequences of Nephrolepis exaltata (Filicopsida), Ephedra altissima (Gnetopsida), and diverse land plants. Mol. Phyl. Evol. 6: 19-29. BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proc. Acad. Nat. Sci. 97: 4092-4097. BURLEIGH, J. G. AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Am. J. Bot. 91: 1599-1613. BURLEIGH, J. G. AND S. MATHEWS. 2007a. Assessing among-locus variation in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 111-124. BURLEIGH, J. G. AND S. MATHEWS. 2007b. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 125-135. CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822-846. CHAW, S-M., A. ZHARKIKH, H-M. SUNG, T-C. LAU, AND W-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.  7  CHAW, S., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Acad. Nat. Sci. 97: 4086-4091. CRANE, P. R., P. 
HERENDEEN, AND E. M. FRIIS. 2004. Fossils and plant phylogeny. Amer. J. Bot. 91: 1683-1699. DONOGHUE, M. J. AND J. A. DOYLE. 2000. Demise of the anthophyte hypothesis? Curr. Biol. 10: R106-R109. DOYLE, J. A. 1996. Seed plant phylogeny and the relationships of Gnetales. Int. J. Plant Sci. 157: S3-S39. DOYLE, J. A. 1998. Phylogeny of vascular plants. Ann. Rev. Ecol. Syst. 29: 567-599. DOYLE, J. A. 2006. Seed ferns and the origin of angiosperms. J. Torreya Bot. Soc. 133: 169209. DOYLE, J. A., AND M. J. DONOGHUE. 1986. Seed plant phylogeny and the origin of angiosperms: An experimental cladistic approach. Bot. Rev. 52:321-431. DUFF, R. J. AND D. L. NICKRENT. 1999. Phylogenetic relationships of land plants using mitochondrial small-subunit rDNA sequences. Am. J. Bot. 86: 372-386. EDWARDS, D. S. 1986. Aglaophyton major, a non-vascular land-plant from the Devonian Rhynie chert. Bot. J. Linn. Soc. 93: 173-204. FELSENSTEIN, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27: 401-410. GRAHAM, S. W., AND R. G. OLMSTEAD. 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.  8  GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A. ANTONOV, AND W. MARTIN. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Mol. Biol. Evol. 13: 383-396. HENDY, M. D. AND D. PENNY. 1989. Framework for the quantitative study of evolutionary trees. Syst. Zool. 38: 297-309. HILL, C. R., AND P. R. CRANE. 1982. Evolutionary cladistics and the origin of the angiosperms. Pages 269-361 In: Problems of phylogenetic reconstruction (K.A. Joysey and E.A. Friday, eds.). Academic Press, London. JUDD, W., C. S. CAMPBELL, E. A. KELLOGG, P. F. STEVENS, AND M. J. DONOGHUE. 2008. Plant systematics: A phylogenetic approach. 3rd ed. Sinauer Associates, Sunderland, Massachusetts. KENRICK, P. AND P. 
R. CRANE. 1997. The origin and early diversification of land plants: a cladistic study. Smithsonian Press, Washington, D.C., USA.
KRANZ, H. D., AND V. A. R. HUSS. 1996. Molecular evolution of pteridophytes and their relationship to seed plants: evidence from complete 18S rRNA gene sequences. Plant Syst. Evol. 202: 1-11.
LANG, W. H. 1937. On the plant remains from the Downtonian of England and Wales. Phil. Trans. R. Soc. 227B: 245-291.
LOCONTE, H., AND D. W. STEVENSON. 1990. Cladistics of the Spermatophyta. Brittonia 42: 197-211.
MANHART, J. R. 1994. Phylogenetic analysis of green plant rbcL sequences. Mol. Phylogenet. Evol. 3: 114-127.
NICKRENT, D. L., C. L. PARKINSON, J. D. PALMER, AND R. J. DUFF. 2000. Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol. Biol. Evol. 17: 1885-1895.
PENNY, D., AND M. D. HENDY. 1985. Testing methods of evolutionary tree construction. Cladistics 1: 266-272.
PICHI SERMOLLI, R. E. G. 1977. Tentamen pteridophytorum genera in taxonomicum ordinem redigendi. Webbia 31: 313-512.
PRYER, K. M., H. SCHNEIDER, A. R. SMITH, R. CRANFILL, P. G. WOLF, J. S. HUNT, AND S. D. SIPES. 2001. Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. Nature 409: 618-622.
PRYER, K. M., E. SCHUETTPELZ, P. G. WOLF, H. SCHNEIDER, A. R. SMITH, AND R. CRANFILL. 2004. Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am. J. Bot. 91: 1582-1598.
QUINN, C. J., R. A. PRICE, AND P. A. GADEK. 2002. Familial concepts and relationships in the conifers based on rbcL and matK sequence comparisons. Kew Bull. 57: 513-531.
RAUBESON, L. A., AND R. K. JANSEN. 1992. Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255: 1697-1699.
ROTHWELL, G. W. 1999. Fossils and ferns in the resolution of land plant phylogeny. Bot. Rev. 65: 188-218.
ROTHWELL, G. W., AND R. SERBET. 1994.
Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482.
ROTHWELL, G. W., AND K. C. NIXON. 2006. How does the inclusion of fossil data change our conclusions about the phylogenetic history of euphyllophytes? Int. J. Plant Sci. 167: 737-749.
RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797.
SCHUETTPELZ, E., P. KORALL, AND K. M. PRYER. 2006. Plastid atpA data provide improved support for deep relationships among ferns. Taxon 55: 897-906.
SCHUETTPELZ, E., AND K. M. PRYER. 2007. Fern phylogeny inferred from 400 leptosporangiate species and three plastid genes. Taxon 56: 1037-1050.
SMITH, A. R. 1995. Non-molecular phylogenetic hypotheses for ferns. Am. Fern J. 85: 104-122.
SOLTIS, P. S., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402: 402-404.
STEVENSON, D. W. 1992. A formal classification of the extant cycads. Brittonia 44: 220-223.
THOMAS, B. A., AND R. A. SPICER. 1986. The evolution and palaeobiology of land plants. Dioscorides Press, Portland, OR.
WOLF, P. G., C. A. ROWE, R. B. SINCLAIR, AND M. HASEBE. 2003. Complete nucleotide sequence of the chloroplast genome from a leptosporangiate fern, Adiantum capillus-veneris L. DNA Res. 10: 59-65.

CHAPTER 2¹
INFERENCE OF HIGHER-ORDER RELATIONSHIPS IN THE CYCADS FROM A LARGE CHLOROPLAST DATA SET

2.1. INTRODUCTION
The cycads (order Cycadales) are one of five living groups of seed-bearing plants.
They are long-lived trees and shrubs with a highly distinctive vegetative and reproductive morphology (e.g., Chamberlain, 1965; Stevenson, 1981, 1990; Norstog, 1990; Jones, 1993; Norstog and Nicholls, 1997). They have a substantial and ancient fossil record (dating to ca. 270 million years ago; Mamay, 1969), and are currently recognized as comprising three or four families (Cycadaceae, Stangeriaceae and Zamiaceae, plus Boweniaceae, which Stevenson erected in 1981 but in 1992 reduced to a subfamily of Stangeriaceae). In all phylogenetic studies to date, the basal placement of the monogeneric family Cycadaceae within the order has not been disputed. Cycas is morphologically one of the most distinctive cycads, and it is frequently used as an outgroup in studies of the remaining taxa. However, published inferences of higher-order relationships among the remaining taxa still vary substantially, and most studies have not provided robust support (as measured by bootstrap or jackknife analysis, for example) for the broad backbone of cycad phylogeny (Crane, 1988; Stevenson, 1990; Caputo et al., 1993; Schutzman and Dehgan, 1993; Rydin et al., 2002; Bogler and Francisco-Ortega, 2004; see Caputo et al., 1991, for an exception). There is also considerable difference of opinion about the relationship of the cycads to the other extant and extinct seed-plant groups (see Doyle, 1998, for a recent summary of the diversity of findings regarding seed-plant phylogeny). To better resolve higher-order relationships in the cycads, I sequenced multiple coding and noncoding regions of the plastid genome (ca. one tenth of the entire genome) using primers designed by Graham and Olmstead (2000a).

¹ A version of this chapter has been published: Rai, H. S., H. E. O’Brien, P. A. Reeves, R. G. Olmstead, and S. W. Graham. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359.
The regions examined comprise atpB, rbcL, ten Photosystem II genes, three ribosomal protein genes, two NADH dehydrogenase subunit genes, three introns and eight intergenic spacer regions (see below). These regions have proven useful for reconstructing phylogenetic relationships among deep branches of the angiosperms (Graham and Olmstead, 2000a,b; Graham et al., 2000) and within the monocots (Saarela et al., 2008). I examined exemplar species representing all of the tribes recognized by Stevenson (1992; summarized in Fig. 2.1), in addition to a number of outgroup taxa (Table 2.1). The placement of the cycads within the seed plants, and seed-plant relationships in general, remain thorny issues (Hill and Crane, 1982; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Rothwell and Serbet, 1994; Boivin et al., 1996; Doyle, 1996, 1998; Goremykin et al., 1996; Chaw et al., 1997, 2000; Bowe et al., 2000; Sanderson et al., 2000; Rydin et al., 2002). I address the latter question briefly with a parsimony analysis using exemplars from all of the living seed-plant groups and two free-sporing plants; a more detailed examination of seed-plant relationships is presented in Chapter 4. While the focus of this chapter is on phylogenetic relationships within the cycads, I also examine several features of the molecular evolution of the genes examined, and assess how these may affect inference of phylogeny within the order.

2.2 MATERIALS AND METHODS
2.2.1 Taxonomic and Genomic Sampling
A total of 17 chloroplast genes and associated noncoding regions were used for this study, representing about a tenth of the entire genome. The coding and noncoding regions examined are enumerated in Tables 2.1 and 2.2. The seven exemplar cycad species examined here represent all families, subfamilies and tribes recognized by Stevenson (1992; Fig. 2.1).
Most of the sequences for six of the seven cycads (and three of the five conifer exemplars) are new (Table 2.1). These data were added to sequences generated previously for studies of basal angiosperm phylogeny (Graham and Olmstead, 2000a,b; Graham et al., 2000; see the former two references for source and voucher information for previously published seed-plant sequences). A few additional sequences were added for taxa examined previously for fewer regions (Graham and Olmstead, 2000a,b), including ndhF for Zamia furfuracea and, for Sciadopitys verticillata, part of rps7, the rps7-ndhB intergenic spacer region, and ndhB. The final matrix included eight species representing a broad sampling of the diversity of basal angiosperms (see Mathews and Donoghue, 1999; Parkinson et al., 1999; Soltis et al., 1999; Graham and Olmstead, 2000a,b; Graham et al., 2000; Qiu et al., 2000; Savolainen et al., 2000), five conifers (Pinus thunbergii was obtained from GenBank; accession number D17510), Ginkgo biloba, seven exemplar cycads, three exemplar Gnetales (representing all extant families), and two outgroup species obtained from GenBank (Marchantia polymorpha and Psilotum nudum; GenBank accession numbers NC_00319 and NC_003386). GenBank numbers for Gnetum gnemon and six of the eight angiosperms are provided in Graham and Olmstead (2000a,b); previously unpublished numbers for the other angiosperms and Gnetales are provided here (Table 2.1; see also Graham et al., 2000).

2.2.2 DNA Extraction, Amplification and Sequencing
DNA was extracted from fresh and silica-dried specimens using the protocol of Doyle and Doyle (1987), except that I added 10% PVP 40 (polyvinylpyrrolidone) to the extraction buffer. DNA amplification and sequencing methods are as described in Graham and Olmstead (2000a), with the exception that a ca.
1.0 kb PCR fragment from Ceratozamia miqueliana (spanning four genes: psbB, psbT, psbN, and psbH) was cloned using the TOPO TA cloning kit (Invitrogen Corporation; Carlsbad, CA). This taxon appears to have at least one additional version of this region that includes several pseudogenes (H. S. Rai and S. W. Graham, unpublished data). All regions were sequenced at least twice for each taxon and, with a few minor exceptions, were completely sequenced in both forward and reverse directions. Regions that I was unable to amplify or sequence, or that have been confirmed as lost from the plastid genome (ndhB and ndhF for Pinus thunbergii; Wakasugi et al., 1994), were coded as missing data in the final matrix. The estimated percentage of length “missing” by taxon (relative to Nicotiana tabacum) is: Welwitschia mirabilis (35.8%); Ephedra nevadensis (35.5%); Gnetum gnemon (33.0%); Cedrus deodara and Pinus thunbergii (24.1% each). The major genes that are missing or were not obtained for Cedrus and Pinus are ndhB and ndhF; these two genes and rpl2 were also not obtained for the Gnetales exemplars (Table 2.1). All of the noncoding regions (Table 2.2) were also excluded for Marchantia and Psilotum. For the remaining taxa, at most 5.5% of the data (for Stangeria eriopus) was coded as missing.

2.2.3 Data Assembly
Contiguous sequences were compiled and base calling was performed using Sequencher 4.1 (Gene Codes Corporation; Ann Arbor, MI). The consensus sequences for each taxon were exported into a previously generated alignment (Graham et al., 2000) that was adjusted manually in Se-Al version 1.0 (Rambaut, 1998) using criteria provided in Graham et al. (2000). The alignments were imported into PAUP* version 4.0b10 (Swofford, 2002) for compilation and analysis. Tobacco and Ginkgo sequences were used to determine gene and exon boundaries.
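The missing-data percentages reported in Section 2.2.2 are expressed relative to the length of a reference sequence (Nicotiana tabacum). A minimal sketch of one way to tally such a figure from an aligned matrix is given below. It assumes that unobtained sequence is coded as '?' and alignment gaps as '-', and the function name is hypothetical; the chapter does not state exactly how its percentages were computed, so this is illustrative only.

```python
def percent_missing(taxon_seq: str, reference_seq: str, missing: str = "?") -> float:
    """Percent of the reference's nucleotide positions coded as missing in a taxon.

    Assumptions (illustrative, not from the thesis): both strings come from the
    same alignment, '?' marks sequence that was not obtained, and '-' marks an
    alignment gap. Columns where the reference itself is a gap or missing are
    skipped, so the denominator is the reference's ungapped length.
    """
    if len(taxon_seq) != len(reference_seq):
        raise ValueError("sequences must come from the same alignment")
    ref_positions = [i for i, base in enumerate(reference_seq) if base not in "-?"]
    if not ref_positions:
        raise ValueError("reference sequence contains no nucleotides")
    n_missing = sum(1 for i in ref_positions if taxon_seq[i] == missing)
    return 100.0 * n_missing / len(ref_positions)


# Toy example (not data from this study): 3 of 10 reference positions are '?'.
print(percent_missing("??GCATGCA?", "ATGCATGCAT"))
```

Gaps ('-') in the taxon are deliberately not counted as missing here, since an indel reflects length variation rather than unobtained sequence; counting them would inflate the figure for taxa with large deletions.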
The intergenic spacer (IGS) regions of the Photosystem II genes (Table 2.2), and short portions of the rps7-ndhB IGS and the rpl2 intron (239 bp and 111 bp in the alignment, respectively), were difficult to align in the conifers. These characters were omitted from seed-plant-wide analyses. The regions considered for analysis comprise a total of 15,257 aligned nucleotide characters (corresponding to 13,320 unaligned nucleotides; reference taxon = Bowenia serrulata). Of these characters, 8,471 are constant, 2,101 are variable but parsimony uninformative, and 4,685 are parsimony informative. In a second set of analyses that considered only the cycads and Ginkgo biloba, I included all of the noncoding regions, because homology assessment for these regions was straightforward at this level of comparison. This alignment spans a total of 16,181 aligned nucleotides (corresponding to 13,784 unaligned nucleotides; reference taxon = Bowenia serrulata), of which 14,412 are constant, 1,393 are variable but parsimony uninformative, and 376 are parsimony informative.

2.2.4 Phylogenetic Analysis
Two sets of analyses were performed. The first set employed the alignment of the 24 seed-plant taxa and two outgroups, and was performed using maximum parsimony (MP). Heuristic searches were performed in PAUP* with all characters and character-state changes equally weighted, using TBR (tree-bisection-reconnection) branch swapping. The “MulTrees” option was turned on, and 100 random-addition replicates were performed for each search. A second set of more intensive searches focussed on the alignment that included only the seven cycads and Ginkgo biloba. Both MP and maximum likelihood (ML) criteria were used for these analyses. A model was chosen for the ML searches using the likelihood ratio test (LRT; see Swofford et al., 1996; Huelsenbeck and Crandall, 1997); model parameters were estimated from the data in each case.
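The likelihood ratio test used to choose among nested substitution models can be sketched as follows. The statistic is -2 ln Λ = 2(ln L_complex - ln L_simple), referred to a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The degrees of freedom below (3 for JC69 to F81, 1 for F81 to HKY85, 4 for HKY85 to GTR, and 1 each for adding Γ and I) are standard parameter counts assumed for illustration rather than values stated in this chapter, and the helper names are hypothetical. The example reuses the -ln likelihood scores for MP tree 1 from Table 2.3 and a Bonferroni-adjusted α of 0.05/10.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from df = 1 (erfc) or df = 2 (exp), then climb two df at a time with the
    # regularized-gamma recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lrt(neg_lnl_simple: float, neg_lnl_complex: float, df: int):
    """Return (-2 ln Lambda, P) for two nested models given their -ln likelihoods."""
    stat = 2.0 * (neg_lnl_simple - neg_lnl_complex)
    return stat, chi2_sf(stat, df)


# -ln likelihood scores for MP tree 1 (Table 2.3), in order of increasing complexity.
TREE1 = [("JC69", 32411.01), ("F81", 32152.95), ("HKY85", 31166.37),
         ("GTR", 31055.03), ("GTR+G", 30852.41), ("GTR+G+I", 30848.06)]
# Assumed number of extra free parameters at each step (standard counts).
EXTRA_DF = [3, 1, 4, 1, 1]
ALPHA = 0.05 / 10  # Bonferroni adjustment for ten tests

for (m0, l0), (m1, l1), df in zip(TREE1, TREE1[1:], EXTRA_DF):
    stat, p = lrt(l0, l1, df)
    print(f"{m0} vs. {m1}: stat = {stat:.2f}, df = {df}, P = {p:.2g}, "
          f"significant = {p < ALPHA}")
```

Small differences from the tabulated statistics (for example 1973.16 here versus 1973.13 in Table 2.3) reflect rounding of the reported likelihoods. Because the Γ and I parameters are tested at the boundary of their parameter space, the plain chi-square reference is somewhat conservative for those two steps. The same function applies to the molecular-clock test reported in the Results, where the degrees of freedom for n taxa are usually taken as n - 2.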
The hierarchy of models tested is a modification of that shown in Fig. 4 of Huelsenbeck and Crandall (1997; see Graham et al., 2002, for details). The LRT was repeated using both shortest trees obtained from the maximum parsimony analysis (Table 2.3; see Results). Parsimony-based bootstrap analysis (Felsenstein, 1985) was performed for the 26-taxon and eight-taxon data sets using the search criteria described above, except that only one random-addition replicate was used for each of 100 bootstrap replicates. The bootstrap analysis was repeated for the eight-taxon data set using ML, with the optimal model indicated by the LRT.

2.3 RESULTS
2.3.1 26-Taxon Data Set
Heuristic searches of the 26-taxon matrix produced two most-parsimonious trees (Fig. 2.2). The relationships within the cycads are similar to recent results obtained by Bogler and Francisco-Ortega (2004) using chloroplast trnL intron and nuclear ITS2 sequences. The parsimony analysis shows high bootstrap support for the cycads as a whole (Fig. 2.2). Bowenia is isolated from Stangeria; the latter appears as the sister group of Zamia. In the parsimony analysis, Ginkgo is the sister group of the cycads, but this relationship is only weakly supported (54% of bootstrap replicates; Fig. 2.2). The conifers are a well-supported clade that is the sister group of (Ginkgo + cycads). The Gnetales are also well supported as a monophyletic group. They form the sister group of the remaining seed plants, corresponding to the “Gnetales basal” hypothesis of Rydin et al. (2002; see also Sanderson et al., 2000).

2.3.2 Cycads and Ginkgo
Major relationships within the cycads are nearly identical in parsimony and likelihood analyses (Fig. 2.3). The GTR + Γ + I model was chosen with the likelihood ratio test [Table 2.3; general-time-reversible (GTR) rate matrix with proportion of invariable sites (I) considered, and among-site rate variation accounted for using the gamma (Γ)
distribution as described by the shape parameter alpha (α)]. An exhaustive search was performed using this model, with all parameter values estimated from the data. The single shortest ML tree is illustrated in Fig. 2.3; this topology is also one of the two trees found in the parsimony analysis. Cycas is part of the basal split in the cycads, and Dioon is the sister group of the remaining taxa. In all analyses, Stangeria and Zamia are sister taxa, and Ceratozamia is the sister group of these two (Fig. 2.3). None of the analyses performed indicates a sister-group relationship between Bowenia and Stangeria. The only difference between the parsimony and likelihood analyses is in the precise placement of Bowenia and Encephalartos relative to the clade of Ceratozamia, Stangeria and Zamia (Fig. 2.3). In the likelihood analysis, Encephalartos is the sister group of the latter clade and Bowenia is the sister group of this entire clade. The MP tree that is not congruent with the ML topology instead depicts Bowenia as the sister group of (Ceratozamia, Stangeria, Zamia), with Encephalartos as the sister group of that clade (Fig. 2.3).

2.3.3 Molecular Evolution in Cycads and Relatives
There is significant heterogeneity in the rate of molecular evolution across the 26 taxa considered here (ln likelihood scores for the parsimony-based tree depicted in Fig. 2.2: GTR + Γ + I model = -97609.78; GTR + Γ + I + molecular clock = -98196.06; -2 ln Λ = 1172.56; P < 0.05). Pairwise comparisons also suggest that the transition:transversion ratio (ti/tv) is substantially higher in the cycads and Ginkgo than in the rest of the seed plants (data not shown). I confirmed this by estimating the ti/tv ratio under the HKY + Γ + I likelihood model for pruned subtrees of the seed plants derived from the MP tree shown in Fig. 2.2. The ti/tv ratio is ca. 2.5 in most seed plants, but almost twice as high in the cycads alone or in the cycads and Ginkgo together (Table 2.4).
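To see how the rate-matrix differences in Table 2.5 translate into a ti/tv ratio, the expected transition:transversion ratio under a time-reversible model can be computed by weighting each exchangeability r_ij by the product of base frequencies π_i π_j, summing over transition pairs (A:G, C:T) and over transversion pairs, and taking the ratio. This is a standard identity offered as a sketch; it is not necessarily how PAUP* derived the HKY-based ti/tv values in Table 2.4, and the variable and function names are hypothetical. With the cycad and angiosperm parameters from Table 2.5 it gives ratios of about 4.6 and 2.4, close to the HKY-based estimates in Table 2.4.

```python
# GTR exchangeabilities and base frequencies copied from Table 2.5.
CYCADS = {
    "rates": {"AC": 1.85110, "AG": 7.34954, "AT": 0.14563,
              "CG": 0.44685, "CT": 7.84315, "GT": 1.0},
    "freqs": {"A": 0.27783, "C": 0.18737, "G": 0.22577, "T": 0.30903},
}
ANGIOSPERMS = {
    "rates": {"AC": 1.61416, "AG": 3.78213, "AT": 0.27506,
              "CG": 0.83620, "CT": 4.87788, "GT": 1.0},
    "freqs": {"A": 0.27856, "C": 0.19087, "G": 0.22228, "T": 0.30829},
}
TRANSITIONS = ("AG", "CT")
TRANSVERSIONS = ("AC", "AT", "CG", "GT")


def expected_ti_tv(rates: dict, freqs: dict) -> float:
    """Expected ti/tv ratio implied by a reversible rate matrix at equilibrium.

    The equilibrium substitution flux between bases i and j is proportional to
    pi_i * pi_j * r_ij; the factor of 2 from counting both directions cancels.
    """
    def flux(pairs):
        return sum(freqs[p[0]] * freqs[p[1]] * rates[p] for p in pairs)
    return flux(TRANSITIONS) / flux(TRANSVERSIONS)


for name, params in (("cycads", CYCADS), ("angiosperms", ANGIOSPERMS)):
    print(f"{name}: expected ti/tv = {expected_ti_tv(**params):.2f}")
```

Written this way, the elevated cycad ratio visibly comes from both a larger numerator (high A:G) and a smaller denominator (low A:T and C:G).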
Examination of substitution-rate parameters in the GTR + Γ + I model on the same set of subtrees suggests that most of the elevated ti/tv bias in cycads and Ginkgo comes from higher A:G transition rates and lower A:T and C:G transversion rates than in the rest of the seed plants (Table 2.5). Base frequencies and the proportion of invariable characters were fairly consistent across the different subtrees considered here, while the Γ shape parameter, α, was elevated in the conifers and perhaps the Gnetales (Table 2.5).

2.4 DISCUSSION
2.4.1 Cycad Molecular Evolution
The chloroplast regions examined here for the cycads and Ginkgo are highly conserved, both with regard to length variation in noncoding regions (Table 2.2) and the amount of nucleotide substitution. The comparison of likelihood models with and without a molecular clock indicates substantial rate heterogeneity across the 26-taxon tree of seed plants and relatives (see Results). Ginkgo and the crown-group cycads have a relatively shallow depth on the seed-plant tree compared with the other seed-plant groups (Fig. 2.2), despite a deep fossil record for both lineages. Fossil cycads are known from ca. 270 million years ago (Mamay, 1969), and the fossil record for Ginkgoales is of comparable age, reaching back into at least the Triassic (Stewart and Rothwell, 1993). Thus, substantially more chloroplast evolution has occurred in each of the other major seed-plant clades than in the cycads or Ginkgo, for comparable numbers of basal taxa, across comparable or shorter time frames [Fig. 2.2; conifer relatives are known from ca. 310 million years ago, the oldest probable crown-group angiosperm fossils are ca. 130 million years old, and crown-group Gnetales may have diversified at around the same point in the early Cretaceous (although putative stem-lineage relatives of the Gnetales existed in the Triassic); see Taylor and Taylor (1993) and Sanderson and Doyle (2001).
Note that branch lengths in Gnetales and some conifers are artificially short here because of incomplete molecular data; see Table 2.1].

Current ML implementations estimate a single value for each substitution-model parameter across the entire tree. It would therefore be valuable to know whether the observed variation in substitution-rate-matrix (R-matrix) values across the seed plants affects phylogenetic inference within the cycads. I assessed this by performing heuristic likelihood searches (GTR + Γ + I model) for the eight-taxon data set using R-matrix estimates from the different seed-plant subtrees shown in Table 2.5; the other parameters of the model were left free to vary. Only one tree topology (that of Fig. 2.3; results not shown) was recovered in all cases, suggesting that variation of this magnitude in the ti/tv ratio across the seed plants does not affect inference of relationships within the cycads, and is presumably unlikely to bias phylogenetic inference within the cycads or other seed-plant clades when all taxa are considered together.

2.4.2 Basal Cycads and the Sister Group of Cycadales
The large number of characters in this study provides substantial new evidence for estimating deep relationships within and among the cycads and their seed-plant relatives. Although only 376 parsimony-informative characters were recovered across ca. 110 kb of total DNA sequence data for the seven exemplar cycads and Ginkgo (less than a tenth of the number of informative characters observed for 26 taxa, with slightly fewer nucleotides examined per taxon in the latter case), my data permit robust inference of most aspects of phylogenetic relationship among the basal cycads (Fig. 2.3).
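The character counts reported in this chapter (for example, the 376 parsimony-informative sites just mentioned, and the constant, uninformative and informative totals given under Data Assembly) rest on a simple rule: a site is parsimony informative only if at least two different states each occur in at least two taxa; a variable site that fails this test cannot discriminate among tree topologies under parsimony. The sketch below applies that rule column by column. It assumes that only A, C, G and T count as states and that gaps, '?', N and other ambiguity codes are treated as missing data; these are simplifications for illustration, not a description of how PAUP* was configured here, and the function names are hypothetical.

```python
from collections import Counter
from typing import List

# Assumption: only unambiguous bases are states; everything else is treated as missing.
BASES = set("ACGT")


def classify_site(column: str) -> str:
    """Return 'constant', 'uninformative' or 'informative' for one alignment column."""
    counts = Counter(c for c in column.upper() if c in BASES)
    if len(counts) <= 1:
        return "constant"  # zero or one observed state
    # Parsimony informative: at least two states, each present in two or more taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def summarize(alignment: List[str]) -> Counter:
    """Tally the three site classes across all columns of an aligned matrix."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must all have the same aligned length")
    columns = ("".join(col) for col in zip(*alignment))
    return Counter(classify_site(col) for col in columns)


# Toy four-taxon alignment (not data from this study).
toy = ["AAGT", "AAGC", "ACTC", "ACTA"]
print(summarize(toy))
```

In the toy matrix the first column is constant, the second and third split the taxa two against two (informative), and the last has only one shared state, so it is variable but uninformative.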
The very slow molecular evolution observed for the cycads and Ginkgo may additionally minimize the possibility of biased phylogenetic inference for these taxa (“long-branch attraction”) caused by terminal branches that are long relative to internal branches (see Felsenstein, 1983).

While the seed-plant relationships inferred here are well resolved and generally very well supported by parsimony-based bootstrap analysis, my data provide only weak support for a sister-group relationship between Ginkgo and the cycads. However, this relationship is congruent with some of the most-parsimonious morphology-based trees of Rothwell and Serbet (1994), chloroplast genome structural evidence (L. A. Raubeson, pers. comm.) and several recent molecular studies (Boivin et al., 1996; Goremykin et al., 1996; Chaw et al., 1997, 2000). The elevated ti/tv ratio and slow rate of molecular evolution observed here for cycads and Ginkgo may constitute additional molecular synapomorphies for these two seed-plant groups.

The arrangement of the exemplar cycad taxa that I examined parallels and reinforces several previous morphological and molecular studies of basal cycad lineages. For example, Bogler and Francisco-Ortega (2004) found essentially the same arrangement with respect to the exemplar taxa that I examined, but with poorer bootstrap support across the backbone of cycad phylogeny. My results strongly support Cycas as the sister group of the remaining cycads (Figs. 2.2, 2.3), consistent with all previous phylogenetic studies and most cycad classification schemes. I find robust support for Dioon being part of the next-deepest split in cycad phylogeny (Fig. 2.3), and for Stangeria being closely related to Zamia and Ceratozamia, with moderate support for Stangeria being the sister taxon of Zamia (among the taxa included here).
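The bootstrap percentages discussed above come from Felsenstein's (1985) procedure: alignment columns are resampled with replacement to build pseudo-replicate matrices of the original size, a tree is inferred from each, and a clade's support is the percentage of replicate trees that contain it. A minimal sketch of the resampling and tallying steps follows. The tree search on each replicate (performed in this study with the search settings described under Phylogenetic Analysis) is not reproduced, the function names are hypothetical, and the clade sets in the usage example are placeholders rather than results from this study.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """Build one pseudo-replicate by resampling alignment columns with replacement."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_support(replicate_clades: List[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of replicate trees that contain each clade.

    Each element of replicate_clades is the set of clades (frozensets of taxon
    names) found on the tree inferred from one pseudo-replicate.
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    n = len(replicate_clades)
    return {clade: 100.0 * k / n for clade, k in counts.items()}


rng = random.Random(1)
toy = ["ACGTAC", "ACGTTC", "AGGTAC"]
print(bootstrap_replicate(toy, rng))

# Hypothetical clade sets from four replicate trees (placeholders, not results).
ginkgo_cycads = frozenset({"Ginkgo", "Cycas", "Zamia"})
stangeria_zamia = frozenset({"Stangeria", "Zamia"})
support = bootstrap_support([
    {ginkgo_cycads, stangeria_zamia},
    {stangeria_zamia},
    {stangeria_zamia},
    {ginkgo_cycads, stangeria_zamia},
])
print(support[ginkgo_cycads], support[stangeria_zamia])
```

Passing an explicit random.Random instance keeps replicates reproducible for a given seed, which is convenient when rerunning an analysis.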
My results also parallel an early morphology-based cladistic study of the cycads by Petriella and Crisci (1977; cited in Crane, 1988), who observed the same basal arrangement of Cycas and then Dioon, and found tribe Encephalarteae to be the sister group of a clade containing Bowenia, Ceratozamia, Stangeria and Zamia, one of the two general arrangements that I observed in the MP analysis (Fig. 2.3). The same basic arrangement (regarding the relative positions of Cycas, Dioon, Ceratozamia and Zamia) was also recovered, with robust bootstrap support, in a phylogenetic study of chloroplast DNA restriction-site data (Caputo et al., 1991; these authors also included Microcycas and Chigua as exemplars, but did not include Bowenia and Stangeria). The phylogenetic analyses of morphological evidence by Petriella and Crisci (1977), Crane (1988) and Stevenson (1990) are largely congruent with my results with respect to the exemplar taxa that our studies share, except that these authors found Bowenia and Stangeria to form a clade near the root of the cycads. In contrast, my results show neither genus to be near the root. While some uncertainty remains over the precise placement of Bowenia, my data strongly suggest that these two taxa are not each other’s closest living relatives, in line with Bogler and Francisco-Ortega’s (2004) results. Bowenia and Stangeria are well separated on my chloroplast-based tree by several branches with moderate to strong bootstrap support (Fig. 2.3). Future studies should focus on clarifying the positions of the remaining cycad genera and species. Although I focussed on exemplar-based sampling for the current study of broad phylogenetic relationships, my results (Figs. 2.2, 2.3) point to two main options for a cycad classification that accommodates the multiple basal cycad lineages observed: recognition of either one or two large families, or of multiple small families.
Current schemes with three or four families are unlikely to be satisfactory without substantial modification. At the very least, a recircumscription of Zamiaceae, Stangeriaceae and subfamily Encephalartoideae will be necessary, since none of these taxa is monophyletic (cf. Figs. 2.1-2.3).

Table 2.1. GenBank accession numbers and vouchers for exemplar taxa with one or more previously unpublished DNA sequences.¹

Gene or region atpB ndhF psbB, T, N, & psbH psbD & C psbE, F, L & psbJ rbcL rpl2 AY007460 AF239777 AY007475 (L12632) AF238055 AY007464 AF239782 AY007479 (L75849.2) n/a AF469704 AF462401 AF469714 (X63662) AF469698 AF469710 AF462406 AF469719 (AJ235805) 3'-rps12, rps7 & ndhB Taxon (voucher, herbarium) ANGIOSPERMS 2 Austrobaileya (AF092107) AF238052 scandens (Olmstead s.n., WTU) Hydrastis (AF093382) canadensis (Olmstead s.n., WTU) CONIFERS Cedrus AF469655 deodara (SWG XI-98-1, ALTA) Metasequoia AF469660 glyptostroboides (Rai 1007, ALTA) 2 2 AY007489 AF238065 2 2 AY007492 AF238069 AF469723 AF469739 2 AF469728 AF469736 AF469711 AF462407 AF469720 AF462414 AF469729 AF469737 AY116650, AY116651 AF239793 AF469693 AF469703 AF462400 AF469694 AF469705/ AF469706 AF462402 Podocarpus AF469661 AF469699 chinensis (Graham & Denton VII-98-8, ALTA) 2 Sciadopitys AF239792 AF469700 verticillata (Graham & Denton VII-98-1, WTU) CYCADS Bowenia AF469654 serrulata (Bogler 1202, FTG) Ceratozamia AF469656 miqueliana (Hubbuch et al.
106, FTG) 2 AY007486 2 2 2 (L25753) AY007499 AF238076 AF469713 AF462409 AF469722 AF469731 AF469715 AF462410 AF469724 AF469732 3 Cycas AF469657 revoluta (O'Brien 1000, ALTA) AF469695 AF469707 AF462403 AF469716 AF462411 AF469725 AF469733 Dioon AF469658 purpusii (O'Brien 1001, ALTA) AF469696 AF469708 AF462404 AF469717 AF462412 AF469726 AF469734 Encephalartos AF469659 barteri (O'Brien 1002, ALTA) AF469697 AF469709 AF462405 AF469718 AF462413 AF469727 AF469735 Stangeria AF469662 eriopus (Beck 1117, FTG) AF469701 AF469712 AF462408 AF469721 AF462415 AF469730 AF469738 AF469702 AF188846 Zamia AF188845 furfuracea (Graham VIII-98-1, WTU) 2 2 AF188848 2 AF188847 2 AF202959 2 2 AF188849 AF188850 2 GNETALES 4 Ephedra AF239779 nevadensis (Olmstead s.n., WTU) n/a AY007462 AF239780 AY007477 (D10732) Welwitschia AF239795 n/a mirabilis (Graham + Denton VII-98-6, WTU) AY007472 AF239796 AY116660 (D10735) 2 2 n/a AF238067 n/a AF238078

¹ Partly sequenced genes by taxon, relative to sequences considered in Graham and Olmstead (2000a): (1) psbD: Dioon and Welwitschia, 311 bp missing at 5'-end; (2) rbcL: Encephalartos and Stangeria, 69 bp and 93 bp missing at 5'-end, respectively; (3) atpB: Ephedra, 364 bp, and Stangeria, 209 bp, missing at 5'-end; (4) ndhF: Ephedra, Gnetum, and Welwitschia missing entire ndhF gene; Metasequoia and Zamia both missing 520 bp at 5'-end and 84 bp at 3'-end, the other cycad and conifer ndhF sequences lacking 271 bp at the 5'-end; (5)
psbB: Ceratozamia, 85 bp missing at the 5'-end; (6) psbJ: Metasequoia and Welwitschia, lacking all of the examined region (ca. 90 bp); (7) rpl2: Ephedra, Gnetum, and Welwitschia missing entire rpl2 region; Dioon and Encephalartos missing 112 bp at 5'-end, and Cedrus, Metasequoia, Bowenia, Ceratozamia, and Dioon missing 90 bp at 3'-end; (8) 3'-rps12: Cycas and Stangeria missing 81 bp and 214 bp at 5'-end, respectively; (9) ndhB: Podocarpus and Cycas missing 400 bp and 389 bp at 3'-end, respectively; Cedrus, Ephedra, Gnetum, and Welwitschia missing entire ndhB region.
² Previously published sequences. Accessions in parentheses were produced by other workers; see Graham and Olmstead (2000a,b) and Graham et al. (2000) for a complete list of taxa and accession numbers for other sequences employed in phylogenetic analyses here.
³ Sequence is updated for this publication (see text).
⁴ One sequence of Gnetum gnemon (Gnetaceae) was used in a previous study (Graham and Olmstead, 2000a), but its GenBank number was omitted there (3'-rps12-rps7; AY116648).

Table 2.2. Lengths and variation in length of the noncoding regions used in this study.
Region | Cycads: mean length (bp)ᵃ | Cycads: range (bp) | Seed plants (including cycads): mean length (bp)ᵃ | Seed plants: range (bp)
3'-rps12, rps7, ndhB
3'-rps12 intron | 549 | 540-557 | 532 | 476-575
ndhB intron | 726 | 720-727 | 705 | 669-736
3'-rps12-rps7 IGS | 52 | 52 | 53 | 48-64
rps7-ndhB IGS | 361 | 353-379 | 318 | 215-379
rpl2 intron | 692 | 681-711 | 674 | 649-717
psbE, psbF, psbL, psbJ
psbE-psbF IGS | 9 | 9 | 9 | 9-14
psbF-psbL IGS | 22 | 22 | 26 | 11-38
psbL-psbJ IGS | 114 | 108-118 | 123ᵇ | 113-161ᵇ
psbB, psbT, psbN, psbH
psbB-psbT IGS | 169 | 161-185 | 152ᶜ | 69-193ᶜ
psbT-psbN IGS | 76 | 74-81 | 71ᵈ | 57-91ᵈ
psbN-psbH IGS | 83 | 83 | 97ᵉ | 78-120ᵉ

ᵃ Rounded to nearest bp.
ᵇ⁻ᵉ The following were excluded from length calculations because of incomplete data or aberrantly long IGS sequences (Graham et al., unpubl. data): ᵇ Metasequoia glyptostroboides, psbL-psbJ IGS; ᶜ Welwitschia mirabilis, psbB-psbT IGS; ᵈ Sciadopitys verticillata, psbT-psbN IGS; ᵉ Austrobaileya scandens, psbN-psbH IGS.

Table 2.3. Likelihood ratio test (LRT) for different substitution models based on the two most-parsimonious tree topologies for the seven cycads and Ginkgo (tree 1 is the main topology depicted in Fig. 2.3).

Substitution model | -ln likelihood | Comparison¹ | -2 ln Λ² | P³
MP tree 1
JC69 | 32411.01 | ----- | ----- | -----
F81 | 32152.95 | JC69 vs. F81 | 516.12 | < 0.05
HKY85 | 31166.37 | F81 vs. HKY85 | 1973.13 | < 0.05
GTR | 31055.03 | HKY85 vs. GTR | 222.71 | < 0.05
GTR + Γ | 30852.41 | GTR vs. (GTR + Γ) | 405.24 | < 0.05
GTR + Γ + I | 30848.06 | (GTR + Γ) vs. (GTR + Γ + I) | 8.70 | < 0.05
MP tree 2
JC69 | 32415.52 | ----- | ----- | -----
F81 | 32159.56 | JC69 vs. F81 | 511.92 | < 0.05
HKY85 | 31174.33 | F81 vs. HKY85 | 1970.47 | < 0.05
GTR | 31061.90 | HKY85 vs. GTR | 224.86 | < 0.05
GTR + Γ | 30857.18 | GTR vs. (GTR + Γ) | 409.44 | < 0.05
GTR + Γ + I | 30852.68 | (GTR + Γ) vs. (GTR + Γ + I) | 8.98 | < 0.05

¹ Abbreviations: JC69 = Jukes and Cantor (1969); F81 = Felsenstein (1981); HKY85 = Hasegawa et al. (1985); GTR = general time-reversible (Lanave et al., 1984; Tavaré, 1986; Barry and Hartigan, 1987; Rodríguez et al., 1990); Γ = gamma; I = proportion of invariable sites.
² Likelihood ratio test statistic.
³ The α-level was adjusted using the Bonferroni correction for ten tests.

Table 2.4. Estimated likelihood parameters (HKY + Γ + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2.2.

Parameter | Cycads | Cycads + Ginkgo | Conifers | Gnetales | Angiosperms | Seed plants (excl. cycads + Ginkgo)
ti/tv | 4.64329 | 4.78367 | 2.52218 | 2.84368 | 2.42617 | 2.62560
Base frequencies
A | 0.27742 | 0.27871 | 0.28924 | 0.27822 | 0.27446 | 0.28967
C | 0.18952 | 0.19041 | 0.18242 | 0.18649 | 0.199253 | 0.19380
G | 0.22543 | 0.22304 | 0.21173 | 0.21782 | 0.221099 | 0.20421
T | 0.30764 | 0.30784 | 0.21173 | 0.21782 | 0.305187 | 0.31232
I | 0.59578 | 0.50307 | 0.61724 | 0.69313² | 0.57354 | 0.43993
α¹ | 0.87172 | 0.85585 | [illegible] | [illegible]² | 0.71664 | 1.43375

¹ The gamma (Γ) shape parameter.
² Values when genes not obtained for Gnetales (ndhB, ndhF, rpl2) are excluded during ML estimation: I = 0; α = 0.166.

Table 2.5. Estimated likelihood parameters (GTR + Γ + I model) for various seed-plant subtrees derived from the 26-taxon MP tree depicted in Fig. 2.2.

Parameter | Cycads | Cycads + Ginkgo | Conifers | Gnetales | Angiosperms | Seed plants (excl. cycads + Ginkgo)
Rate matrix
AC | 1.85110 | 1.74707 | 1.81512 | 1.72428 | 1.61416 | 1.77199
AG | 7.34954 | 6.47428 | 4.90730 | 4.44244 | 3.78213 | 4.83661
AT | 0.14563 | 0.11631 | 0.63952 | 0.85701 | 0.27506 | 0.48881
CG | 0.44685 | 0.38870 | 0.86065 | 0.20063 | 0.83620 | 1.00865
CT | 7.84315 | 7.63734 | 5.92336 | 7.18822 | 4.87788 | 6.28927
GT | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0
Base frequencies
A | 0.27783 | 0.28191 | 0.29019 | 0.27887 | 0.27856 | 0.30873
C | 0.18737 | 0.18615 | 0.17725 | 0.18249 | 0.19087 | 0.16549
G | 0.22577 | 0.22127 | 0.21432 | 0.22376 | 0.22228 | 0.19241
T | 0.30903 | 0.31067 | 0.31824 | 0.31488 | 0.30829 | 0.33336
I | 0.59283 | 0.48078 | 0.60248 | 0.66040² | 0.57041 | 0.43879
α¹ | 0.85281 | 0.84493 | 26.86624 | 7.15538² | 0.72979 | 1.42102

¹ The gamma (Γ) shape parameter.
² Values when genes not obtained for Gnetales (ndhB, ndhF, rpl2) are excluded during ML estimation: I = 0.61; α = 2.37.

Figure 2.1. Summary of Stevenson’s (1992) classification of the cycads, reproduced with permission from Ken Hill (http://plantnet.rbgsyd.gov.au/PlantNet/cycad/ident.html).

Figure 2.2. Chloroplast-based phylogeny of the cycads and relatives. The tree is one of two most-parsimonious trees (14,229 steps, CI = 0.593, RI = 0.669) found using 17 chloroplast genes and associated noncoding regions (three introns and two intergenic spacer regions). Bootstrap values are indicated beside branches. The arrow points to a branch not found in the strict consensus of the two most-parsimonious trees.

Figure 2.3. Chloroplast-based phylogeny of the cycads using different optimality criteria (rooted according to Fig. 2.2). The tree shown is one of two MP trees (2,088 steps, CI = 0.884, RI = 0.5) and the best ML tree (-lnL = 30848.062; see text) found using 17 chloroplast genes and associated noncoding regions (three introns and eight intergenic spacer regions). The double-headed arrow indicates an alternative relationship seen in the other MP tree. Numbers above branches are parsimony-based branch lengths (ACCTRAN optimization), and numbers below are support values from MP (first values) and ML bootstrap analyses (second values).

2.5 REFERENCES
BARRY, D., AND J. A. HARTIGAN. 1987. Asynchronous distance between homologous DNA sequences. Biometrics 43: 261-276.
BOGLER, D. J., AND J. FRANCISCO-ORTEGA. 2004.
Molecular systematic studies in cycads: evidence from trnL intron and ITS2 rDNA sequences. Bot. Rev. 70: 260-273.
BOIVIN, R., M. RICHARD, D. BEAUSEIGLE, J. BOUSQUET, AND G. BELLEMARE. 1996. Phylogenetic inferences from chloroplast chlB gene sequences of Nephrolepis exaltata (Filicopsida), Ephedra altissima (Gnetopsida), and diverse land plants. Mol. Phylogenet. Evol. 6: 19-29.
BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales' closest relatives are conifers. Proc. Nat. Acad. Sci. 97: 4092-4097.
CAPUTO, P., D. W. STEVENSON, AND E. T. WURTZEL. 1991. A phylogenetic analysis of American Zamiaceae (Cycadales) using chloroplast DNA restriction fragment length polymorphisms. Brittonia 43: 135-145.
CAPUTO, P., C. MARQUIS, T. WURTZEL, D. W. STEVENSON, AND E. T. WURTZEL. 1993. Molecular biology in cycad systematics. Pp. 213-219. In Proceedings of CYCAD 90, the Second International Conference of Cycad Biology. Edited by D. W. Stevenson, and K. J. Norstog. Palm and Cycad Societies of Australia Ltd., Queensland.
CHAMBERLAIN, C. J. 1965. The living cycads. Hafner Publishing Company, New York, NY.
CHAW, S., A. ZHARKIKH, H. SUNG, T. LAU, AND W. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.
CHAW, S., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Nat. Acad. Sci. 97: 4086-4091.
CRANE, P. R. 1988. Major clades and relationships in the "higher" gymnosperms. Pp. 218-272. In Origin and evolution of gymnosperms. Edited by C. B. Beck. Columbia University Press, New York, NY.
DOYLE, J. A. 1996. Seed plant phylogeny and the relationships of Gnetales. Int. J. Plant Sci. 157: S3-S39.
DOYLE, J. A. 1998. Phylogeny of vascular plants. Ann. Rev. Ecol. Syst. 29: 567-599.
DOYLE, J. A., AND M. J. DONOGHUE. 1986. Seed plant phylogeny and the origin of angiosperms: An experimental cladistic approach. Bot. Rev. 52: 321-431.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
FELSENSTEIN, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17: 368-376.
FELSENSTEIN, J. 1983. Parsimony in systematics: biological and statistical issues. Ann. Rev. Ecol. Syst. 14: 313-333.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783-791.
GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A. ANTONOV, AND W. MARTIN. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Mol. Biol. Evol. 13: 383-396.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Amer. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GRAHAM, S. W., R. G. OLMSTEAD, AND S. C. H. BARRETT. 2002. Rooting phylogenetic trees with distant outgroups: a case study from the commelinoid monocots. Mol. Biol. Evol. 19: 1769-1781.
HASEGAWA, M., H. KISHINO, AND T. YANO. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 21: 160-174.
HILL, C. R., AND P. R. CRANE. 1982. Evolutionary cladistics and the origin of the angiosperms. Pp. 269-361. In Problems of phylogenetic reconstruction. Edited by K. A. Joysey, and E. A. Friday. Academic Press, London.
HUELSENBECK, J. P., AND K. A. CRANDALL. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Ann. Rev. Ecol. Syst. 28: 437-466.
JONES, D. L. 1993. Cycads of the world. Smithsonian Institution Press, Washington, D.C.
JUKES, T. H., AND C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-32. In Mammalian Protein Metabolism. Edited by H. N. Munro. Academic Press, New York, NY.
LANAVE, C., G. PREPARATA, C. SACCONE, AND G. SERIO. 1984. A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20: 86-93.
LOCONTE, H., AND D. W. STEVENSON. 1990. Cladistics of the Spermatophyta. Brittonia 42: 197-211.
MAMAY, S. H. 1969. Cycads: fossil evidence of late paleozoic origin. Science 164: 295-296.
MATHEWS, S. M., AND M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286: 947-950.
NORSTOG, K. J. 1990. Studies of cycad reproduction at Fairchild Tropical Garden. Mem. New York Bot. Gard. 57: 63-81.
NORSTOG, K. J., AND T. J. NICHOLLS. 1997. The biology of the cycads. Cornell University Press, Ithaca, NY.
PARKINSON, C. L., K. L. ADAMS, AND J. D. PALMER. 1999. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9: 1485-1488.
QIU, Y., J. LEE, F. BERNASCONI-QUADRONI, D. E. SOLTIS, P. S. SOLTIS, M. ZANIS, E. A. ZIMMER, Z. CHEN, V. SAVOLAINEN, AND M. W. CHASE. 2000. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int. J. Plant Sci. 161: S3-S27.
RAMBAUT, A. 1998. Se-Al (Sequence Alignment Editor). Version 1.0, Computer program and documentation. Department of Zoology, University of Oxford, UK.
RODRÍGUEZ, F., J. L. OLIVER, A. MARÍN, AND J. R. MEDINA. 1990. The general stochastic model of nucleotide substitution. J. Theor. Biol. 142: 485-501.
ROTHWELL, G. W., AND R. SERBET. 1994. Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482.
RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214.
SAARELA, J. M., P. J. PRENTIS, H. S. RAI, AND S. W. GRAHAM. 2008. Phylogenetic relationships in the monocot order Commelinales, with a focus on Philydraceae. Botany 86: 719-731.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797.
SANDERSON, M. J., AND J. A. DOYLE. 2001. Sources of error and confidence intervals in estimating the age of the angiosperms from rbcL and 18S rDNA data. Amer. J. Bot. 88: 1499-1516.
SAVOLAINEN, V., M. W. CHASE, S. B. HOOT, C. M. MORTON, D. E. SOLTIS, C. BAYER, M. W. FAY, A. Y. DE BRUIJN, S. SULLIVAN, AND Y. QIU. 2000. Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49: 306-362.
SCHUTZMAN, B., AND B. DEHGAN. 1993. Computer-assisted systematics in the Cycadales. Pp. 281-289. In Proceedings of CYCAD 90, the Second International Conference of Cycad Biology. Edited by D. W. Stevenson, and K. J. Norstog. Palm and Cycad Societies of Australia Ltd., Queensland.
SOLTIS, P. E., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple chloroplast genes as a tool for comparative biology. Nature 402: 402-404.
STEVENSON, D. W. 1981. Observations on ptyxis, phenology, and trichomes in the Cycadales and their systematic implications. Amer. J. Bot. 68: 1104-1114.
STEVENSON, D. W. 1990. Morphology and systematics of the Cycadales. Mem. New York Bot. Gard. 57: 8-55.
STEVENSON, D. W. 1992. A formal classification of the extant cycads.
Brittonia 44: 220-223.
STEWART, W. N., AND G. W. ROTHWELL. 1993. Paleobotany and the evolution of plants. Second edition. Cambridge University Press, New York, NY.
SWOFFORD, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4, Computer program and documentation. Sinauer Associates, Sunderland, MA.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, AND D. M. HILLIS. 1996. Phylogenetic Inference. Pp. 407-543. In Molecular Systematics. Second edition. Edited by D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, MA.
TAVARÉ, S. 1986. Some probabilistic and statistical problems on the analysis of DNA sequences. Lec. Math. Life Sci. 17: 57-86.
TAYLOR, T. N., AND E. L. TAYLOR. 1993. The biology and evolution of fossil plants. Prentice Hall, Englewood Cliffs, NJ.
WAKASUGI, T., J. TSUDZUKI, S. ITO, K. NAKASHIMA, T. TSUDZUKI, AND M. SUGIURA. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Nat. Acad. Sci. 91: 9794-9798.

CHAPTER 3¹
INFERENCE OF HIGHER-ORDER CONIFER RELATIONSHIPS FROM A MULTI-LOCUS PLASTID DATA SET

3.1 INTRODUCTION
Conifers have a rich and deep fossil record, with taxa assignable to extant families dating back to the Triassic (Yao et al., 1997; Stockey et al., 2005). Although angiosperms are now the dominant group of seed plants in most terrestrial ecosystems, conifers are still ecologically significant in all continental floras (Enright and Hill, 1995), and they dominate the northern boreal forests. Approximately 670 extant species are recognized in some 70 genera. Most conifer systematists recognize seven families to accommodate this diversity: Araucariaceae, Cephalotaxaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae and Taxaceae (Phyllocladaceae are sometimes separated from Podocarpaceae, while Taxodiaceae are now usually included in Cupressaceae).
Considerable progress in our understanding of conifer classification and phylogenetics has been made using multiple lines of evidence (e.g., Quinn et al., 2002). For example, Taxaceae have sometimes been treated as an order distinct from other conifers (e.g., Florin, 1951), but morphological and molecular data clearly support a nested phylogenetic position for the family among the rest of the conifers (e.g., Hart, 1987; Raubeson and Jansen, 1992; Quinn et al., 2002). However, there are still points of weakness in our understanding of higher-order conifer relationships. For example, it is still not clear whether extant conifers are monophyletic (e.g., Burleigh and Mathews, 2004, 2007a, b). Some studies find the enigmatic Gnetales (Ephedra, Gnetum and Welwitschia) to be the sister group of the pines and relatives (Pinaceae) with moderate to strong support (the 'gnepine' hypothesis; see Bowe et al., 2000; Chaw et al., 2000; Gugerli et al., 2001), whereas others support the monophyly of extant conifers, with Gnetales placed elsewhere in seed-plant phylogeny (e.g., Chaw et al., 1997; Rydin et al., 2002; Rai et al., 2003). A placement of Gnetales as sister to Pinaceae is difficult to justify on morphological grounds (e.g., Donoghue and Doyle, 2000), and may be a strongly supported analytical artifact, perhaps due to long-branch attraction (e.g., Burleigh and Mathews, 2004). On the other hand, a relationship between Gnetales and conifers among extant seed plants, a result seen in a subset of molecular studies (e.g., Chaw et al., 1997), is less problematic from a morphological perspective (e.g., Mundry and Stützel, 2004; Doyle, 2005).

¹ A version of this chapter has been published: RAI, H. S., P. A. REEVES, R. PEAKALL, R. G. OLMSTEAD, AND S. W. GRAHAM. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 86: 658-669.
Although there is considerable disparity among molecular studies concerning the placement of Gnetales, there is broad agreement on a sister-group relationship between Pinaceae (or gnepines) and the remaining conifer families. Phylogenetic studies have also clarified the circumscription and interrelationships of the other conifer families. For example, they have led to the recognition of a sister-group relationship between the two predominantly Southern Hemisphere families, Araucariaceae and Podocarpaceae, and have provided support for Araucariaceae-Podocarpaceae as the sister group of a clade consisting of Cephalotaxaceae, Cupressaceae, Sciadopityaceae and Taxaceae (Chaw et al., 1997; Stefanovic et al., 1998; Gugerli et al., 2001; Quinn et al., 2002; Rydin et al., 2002). The resulting large clade, comprising all extant conifers except Pinaceae, has been referred to informally as 'conifers II' (e.g., Rydin et al., 2002), and more recently as Cupressophyta by Cantino et al. (2007). Within Cupressophyta, various molecular and morphological phylogenetic studies support the existence of a clade consisting of members of Cephalotaxaceae and Taxaceae (e.g., Hart, 1987; Cheng et al., 2000; Quinn et al., 2002), although there is some uncertainty about the limits and monophyly of Taxaceae (e.g., Page, 1990d). For example, Quinn et al. (2002) proposed that Taxaceae should be circumscribed to include Cephalotaxus, although this recommendation is not yet generally followed. Most taxa formerly included in Taxodiaceae are now recognized under a more broadly defined Cupressaceae, a circumscription proposed by Eckenwalder (1976) on morphological grounds that has since been supported by numerous phylogenetic studies (e.g., Hart, 1987; Brunsfeld et al., 1994; Stefanovic et al., 1998; Gadek et al., 2000; Kusumi et al., 2000; Quinn et al., 2002; Rydin et al., 2002).
The distinctiveness of Sciadopitys verticillata, traditionally considered to belong to Cupressaceae, supports its recognition as a separate family, Sciadopityaceae (e.g., Page, 1990c), a view confirmed by morphological and molecular phylogenetic data (e.g., Hart, 1987; Brunsfeld et al., 1994). Both Cupressaceae and Sciadopityaceae are now recognized as part of the larger clade that includes Cephalotaxaceae and Taxaceae. Finally, the arrangement of Cephalotaxaceae, Cupressaceae, Sciadopityaceae and Taxaceae relative to one another is incompletely understood (e.g., Stefanovic et al., 1998; Quinn et al., 2002; Rydin et al., 2002). The main goal of this study is to obtain well-supported relationships for the deep branches of conifer phylogeny by surveying a large multigene plastid data set (15-17 plastid genes and associated noncoding regions) for a broad range of exemplar conifers. Increasing the amount of nucleotide data sampled per taxon has been shown to be an effective way to clarify deep phylogenetic relationships in various groups of plants, and to generally increase support for phylogenetic inferences (for empirical examples using the current gene set, see Graham and Olmstead, 2000a; Rai et al., 2003; Graham et al., 2006; Saarela et al., 2007; Zgurski et al., 2008). This chapter focuses on relationships among the families, but I also sampled Araucariaceae and Cupressaceae at sufficient taxonomic depth to address basic features of their internal phylogenetic structure. Rapidly evolving characters can have a substantial impact on the inference of overall seed-plant relationships (Burleigh and Mathews, 2004, 2007b; H.S. Rai and S.W. Graham, unpublished data), and so I assess whether they also affect phylogenetic inference within the conifers by including or excluding the most rapidly evolving characters from consideration.
I also characterize a curious structural mutation in one of the plastid ribosomal protein genes from two families of conifers, Araucariaceae and Podocarpaceae.

3.2 MATERIALS AND METHODS

3.2.1 Plant Material and Genomic Sampling
I surveyed 17 genes, which together with their associated noncoding regions represent between one-eighth and one-ninth of the entire plastid genome (~120 kb in Pinus; Wakasugi et al., 1994). The coding regions include photosynthetic genes (atpB, rbcL and ten photosystem II, psb, genes), translation apparatus genes (the plastid ribosomal protein genes rpl2, rps7 and 3'-rps12), and two chlororespiratory genes (ndhB and ndhF, which code for two of the subunits of plastid NADH dehydrogenase). The noncoding regions consist of three introns (in rpl2, 3'-rps12 and ndhB) and eight intergenic spacer regions (Table 3.1). I used exemplar-based taxon sampling to represent the major branches of conifer phylogeny; in choosing representatives for each non-monotypic family, I attempted to represent their internal systematic diversity as broadly as possible, at least as understood from prior studies. In total, I included 22 exemplar conifer species and multiple outgroups (11 other seed plants, two monilophytes and three bryophytes). Source and GenBank information is provided in Table 3.1.

3.2.2 Recovery of Plastid Sequences, DNA Alignment and Characterization of an Indel Hotspot
I extracted DNA from fresh and silica-dried specimens following Doyle and Doyle (1987) and Rai et al. (2003). DNA samples of Wollemia, Agathis robusta and Araucaria cunninghamii were extracted as described in Peakall et al. (2003). DNA amplification and sequencing methods follow Graham and Olmstead (2000a). I sequenced all regions at least twice for each taxon and, with a few exceptions, completely sequenced all regions in both directions. Regions confirmed as lost from the plastid genome, or that I could not amplify, were coded as missing data in the final matrix.
Two genes, ndhB and ndhF, are missing (or not retrievable) in Pinaceae (see Wakasugi et al., 1994); these two genes and rpl2 were also not retrievable from the Gnetales exemplars examined here. I was unable to recover atpB from Widdringtonia cedarbergensis and rpl2 from Thuja plicata. I excluded noncoding regions for three of the outgroup taxa (Anthoceros, Marchantia and Physcomitrella), because these were difficult to align across land plants. I compiled contiguous sequences, performed base-calling using Sequencher 4.1 (Gene Codes Corporation; Ann Arbor, MI), and added new sequences to an alignment (Graham et al., 2006) that includes sequences generated for previous studies of seed-plant phylogeny (Graham and Olmstead, 2000a, b; Graham et al., 2000; Rai et al., 2003). I adjusted alignments manually for each contiguous region using Se-Al version 1.0 (Rambaut, 1998), following the alignment criteria in Graham et al. (2000), and used tobacco (Nicotiana tabacum), Ginkgo and Pinus sequences to define gene and exon boundaries, following Graham and Olmstead (2000a). Following Graham et al. (2006), I offset several portions of the noncoding regions that were too difficult to align [the intergenic spacers (IGS) of two of the photosystem II clusters (psbE-psbF-psbL-psbJ and psbB-psbT-psbN-psbH), the IGS between rps7 and ndhB, and the introns]. The resulting staggered regions were frequently limited to single taxa; such blocks are effectively ignored in parsimony-based tree searches and scores (Graham et al., 2006) and should have only a minimal effect on model-based methods (e.g., on estimation of base frequency parameter values). Subsets of the offset regions include aligned blocks involving two or more taxa. The final alignment is 25 687 bp in length, derived from ~14 kb of unaligned data per taxon (e.g., 14.1 kb in Agathis australis). Of the total, 5 384 aligned sites are potentially parsimony informative, and 2 575 are variable but parsimony uninformative.
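The site counts above rest on the usual definition of a parsimony-informative site: at least two character states, each present in at least two taxa. The Python sketch below illustrates that classification under stated assumptions; it treats gaps ('-'), '?' and 'N' as missing data and ignores other IUPAC ambiguity codes, which programs such as PAUP* handle more carefully. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

# Symbols treated as missing data in this sketch (an assumption; other IUPAC
# ambiguity codes are simply counted as distinct states).
MISSING = set("-?N")


def classify_site(column: str) -> str:
    """Label one alignment column 'constant', 'uninformative' or 'informative'."""
    counts = Counter(c for c in column.upper() if c not in MISSING)
    if len(counts) <= 1:
        return "constant"  # invariant, or data for a single state only
    # Parsimony informative: at least two states, each seen in at least two taxa.
    states_in_two_or_more = sum(1 for n in counts.values() if n >= 2)
    return "informative" if states_in_two_or_more >= 2 else "uninformative"


def summarize(alignment: list[str]) -> Counter:
    """Count site categories across all columns of equal-length aligned sequences."""
    return Counter(classify_site("".join(col)) for col in zip(*alignment))


if __name__ == "__main__":
    toy = ["ACGTA", "ACGTT", "AAGCT", "ATGCA"]  # hypothetical four-taxon alignment
    print(summarize(toy))
```

By subtraction, the other 17 728 of the 25 687 aligned columns in the full matrix fall in the constant category under this definition.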
I also characterized a structural mutation in the ribosomal protein gene rps7 of Araucariaceae and Podocarpaceae, using the Dotlet browser-based application (v. 1.5; Junier and Pagni, 1999) to make pairwise amino-acid comparisons under the PAM-30 amino-acid substitution matrix.

3.2.3 Phylogenetic Analyses
I performed heuristic maximum parsimony (MP) and maximum likelihood (ML) searches using PAUP* (version 4.0b10; Swofford, 2002) and PhyML (version 2.4.4; Guindon and Gascuel, 2003), respectively. For the MP analysis, I treated all characters and character-state changes as equally weighted, and used TBR (tree-bisection-reconnection) branch swapping with 100 random-addition replicates; PAUP* defaults were used for all other settings. For the ML search, I first chose a model of DNA sequence evolution with both the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC), using Modeltest (version 3.7; Posada and Crandall, 1998) and estimating model parameters from the data in each case. Both criteria selected the same DNA substitution model, GTR + Γ + I [i.e., the general-time-reversible (GTR) model, with among-site rate variation accommodated by a proportion of invariable sites (I) and a gamma (Γ) distribution of rates with shape parameter alpha (α), approximated using four rate categories]. I estimated substitution model parameters (base frequencies, the proportion of invariable sites, and the gamma shape parameter) during the ML search. I assessed branch support using the nonparametric bootstrap (Felsenstein, 1985) with 100 bootstrap replicates (in the MP analysis, using one random-addition replicate per bootstrap replicate). I use 'weak,' 'moderate,' and 'strong' in reference to clades that have bootstrap support values < 70%, 70-89%, and ≥ 90%, respectively (e.g., Graham et al., 1998).
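Hierarchical likelihood ratio tests compare nested models with the statistic -2 ln Λ = 2(ln L(richer) - ln L(simpler)), referred to a χ² distribution whose degrees of freedom equal the number of extra free parameters in the richer model. As a minimal sketch, the Python code below recomputes the MP tree 1 comparisons reported in Table 2.3 with the Bonferroni-adjusted α-level for ten tests. It is an illustration, not the Modeltest implementation: the χ² tail probability is computed in closed form for integer degrees of freedom, and the tests adding Γ and I are treated as ordinary χ² tests even though those parameters lie on a boundary under the null model. The function and constant names are hypothetical.

```python
import math

# -ln L values for MP tree 1, copied from Table 2.3.
NEG_LNL = {
    "JC69": 32411.01,
    "F81": 32152.95,
    "HKY85": 31166.37,
    "GTR": 31055.03,
    "GTR+G": 30852.41,
    "GTR+G+I": 30848.06,
}

# (simpler model, richer model, extra free parameters in the richer model)
COMPARISONS = [
    ("JC69", "F81", 3),       # unequal base frequencies
    ("F81", "HKY85", 1),      # transition/transversion ratio
    ("HKY85", "GTR", 4),      # five free exchangeabilities instead of one
    ("GTR", "GTR+G", 1),      # gamma shape parameter
    ("GTR+G", "GTR+G+I", 1),  # proportion of invariable sites
]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def lrt(simple: str, rich: str, df: int, n_tests: int = 10, alpha: float = 0.05):
    """Return (-2 ln Lambda, P, significant at the Bonferroni-adjusted level)."""
    stat = 2.0 * (NEG_LNL[simple] - NEG_LNL[rich])
    p = chi2_sf(stat, df)
    return stat, p, p < alpha / n_tests


if __name__ == "__main__":
    for simple, rich, df in COMPARISONS:
        stat, p, sig = lrt(simple, rich, df)
        print(f"{simple} vs. {rich}: -2 ln Lambda = {stat:.2f}, P = {p:.3g}, significant: {sig}")
```

With α = 0.05/10 = 0.005, all five comparisons remain significant; the smallest statistic (8.70, for adding I) has P ≈ 0.003. Differences in the second decimal place relative to Table 2.3 (e.g., 1973.16 versus 1973.13) arise because the tabulated -ln L values are rounded.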
I reanalyzed the main matrix after removing the most rapidly evolving characters, in order to assess whether they distort the inference of higher-order conifer relationships. I used HyPhy (Kosakovsky Pond et al., 2005; and see Burleigh and Mathews, 2004) to classify each alignment site into one of nine rate classes (RC0-RC8, with RC0 being the zero-rate category and RC8 the fastest). The single most-parsimonious tree (see below) was used as a reference tree for estimating GTR model parameters and the site rate classifications in HyPhy. I re-ran the MP and ML analyses after excluding the two fastest rate categories (i.e., RC7 and RC8; see Burleigh and Mathews, 2004).

3.3 RESULTS
The relationships inferred among the major groups of seed plants differ between the MP and ML analyses of the full plastid data set, with Ginkgo (MP) or a clade consisting of Ginkgo, cycads and angiosperms (ML) inferred to be the sister group of conifers, in both cases with moderately strong support (Figs. 3.1, 3.2). Neither method supports a placement of Gnetales in or near the conifers, and both support conifer monophyly (100% and 85% bootstrap support from MP and ML, respectively). Both methods infer identical relationships within conifers (Figs. 3.1, 3.2), with four of the five non-monogeneric conifer families strongly supported as monophyletic, at least at the current level of taxon sampling. The two fastest rate classes (RC7 and RC8) comprise 4 252 characters, a number equivalent to ~79% of the 5 384 parsimony-informative sites. However, only 3 501 of these RC78 sites are parsimony informative, and so deleting them (i.e., the RC0-6 analyses) leaves ~35% of all parsimony-informative sites, rather than the ~21% expected if all RC78 sites were parsimony informative. The remaining 751 RC78 sites are all variable but parsimony uninformative.
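The percentages in the preceding paragraph follow from three reported counts. The short Python sketch below simply restates that arithmetic so the ~79%, ~35% and ~21% figures can be checked; it does not re-run the HyPhy rate classification, and the variable names are illustrative.

```python
# Counts reported in the Results for the full plastid matrix.
TOTAL_PI = 5384    # parsimony-informative sites in the full matrix
RC78_SITES = 4252  # sites assigned to the two fastest rate classes (RC7, RC8)
RC78_PI = 3501     # RC78 sites that are parsimony informative

rc78_vs_pi = RC78_SITES / TOTAL_PI              # ratio quoted in the text (~79%)
pi_retained = (TOTAL_PI - RC78_PI) / TOTAL_PI   # informative sites left in RC0-6 (~35%)
naive_retained = 1 - rc78_vs_pi                 # if every RC78 site were informative (~21%)
rc78_uninformative = RC78_SITES - RC78_PI       # variable but uninformative RC78 sites

print(f"RC78 sites relative to informative sites: {rc78_vs_pi:.1%}")
print(f"Informative sites retained in RC0-6: {pi_retained:.1%}")
print(f"Naive expectation: {naive_retained:.1%}")
print(f"Uninformative RC78 sites: {rc78_uninformative}")
```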
I examined a subset of these sites and found them to lie in alignment blocks that include only a few taxa. Presumably they are classified as rapidly evolving because they vary across only a small sample of taxa. When all RC78 characters are excluded from consideration, support for relationships among the five major groups of seed plants falls substantially, with only weak to moderate support for the relevant branches (data not shown). However, relationships inferred within conifers are essentially unchanged after deletion of the fastest characters, with mostly very minor shifts in bootstrap support (cf. Figs. 3.1-3.3). Bootstrap support for conifer monophyly (moderate and strong support from ML and MP analysis, respectively) is also largely unchanged. However, the best MP and ML trees for RC0-6 do not depict Taxaceae s.s. (Taxus and Torreya) as monophyletic (e.g., Fig. 3.3), and the clade consisting of Cephalotaxus, Taxus and Torreya is then only moderately supported (70-80%). The remaining results focus on the analyses that consider all of the data. With all data included, the Cupressophyta clade is well supported as monophyletic, with 100% support from MP and ML bootstrap analysis (Figs. 3.1, 3.2). Within this clade, Araucariaceae and Podocarpaceae are strongly supported as sister taxa (100% for MP and ML), and this two-family clade is in turn strongly supported as the sister group of a clade consisting of Cupressaceae, Cephalotaxaceae, Sciadopityaceae and Taxaceae (i.e., both Cupressophyta and the latter four-family clade are strongly supported). The interfamilial relationships within the latter clade are also well supported, with 96% to 100% bootstrap support from MP and ML analysis. More specifically, Sciadopitys (Sciadopityaceae) is strongly supported as the sister group of the remaining three families in this clade; Cephalotaxaceae and Taxaceae are sister taxa, and the Cephalotaxaceae-Taxaceae clade is in turn sister to Cupressaceae.
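The support categories used above rest on the nonparametric bootstrap (Felsenstein, 1985): alignment columns are resampled with replacement, a tree is inferred from each pseudoreplicate, and a clade's support is the percentage of replicate trees that contain it. The Python sketch below shows only the resampling and scoring steps, with hypothetical clade sets; the tree searches themselves (done here in PAUP* and PhyML) are not shown, and all function names are illustrative.

```python
import random


def pseudoreplicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


def support_label(percent: float) -> str:
    """Map bootstrap support onto the weak / moderate / strong categories used in the text."""
    if percent >= 90:
        return "strong"
    if percent >= 70:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    rng = random.Random(1)
    print(pseudoreplicate(["ACGTA", "ACGTT", "AAGCT"], rng))
    taxus_torreya = frozenset({"Taxus", "Torreya"})
    # Hypothetical clade sets from three replicate trees.
    reps = [{taxus_torreya}, {taxus_torreya}, {frozenset({"Taxus", "Cephalotaxus"})}]
    pct = clade_support(reps, taxus_torreya)
    print(pct, support_label(pct))
```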
Of the four families sampled for more than two exemplar taxa (Figs. 3.1, 3.2), Pinaceae and Podocarpaceae have only weakly to moderately supported intrafamilial backbones (58-88% bootstrap support), although the same deep splits are inferred by the two phylogenetic methods (i.e., in Pinaceae, Abies is sister to Cedrus and Pinus is sister to Pseudotsuga; in Podocarpaceae, Podocarpus is sister to Saxegothaea). In contrast, all relationships inferred within Araucariaceae and Cupressaceae have 100% MP and ML bootstrap support. In particular, the three 'core Cupressaceae' taxa sampled here (Juniperus, Thuja and Widdringtonia) are deeply nested among other members of Cupressaceae, with the basal split in the family falling between Cunninghamia and the other taxa. Within Araucariaceae, Wollemia is strongly supported as the sister group of Agathis. A structural feature at the 5' end of one of the ribosomal protein genes considered here, rps7, is worth commenting on, as it seems to represent an otherwise quiescent region that has experienced a 'recent' burst of microstructural mutations (insertions and deletions) in Araucariaceae and Podocarpaceae, including at least one tandem repeat expansion shared by these two families (Fig. 3.4). I refer to this hotspot of structural mutations as an 'expansion region,' as rps7 in all taxa examined in Araucariaceae and Podocarpaceae is longer than in other land plants, due to (predicted) insertions in this region. However, the region has likely undergone both expansions and contractions (data not shown). The total expansion region is quite complex and includes multiple repeated motifs, a subset of which is shared among taxa in Araucariaceae and Podocarpaceae. For example, a ~15 amino-acid indel is present as six copies in Podocarpus, three copies in Agathis, Araucaria, Saxegothaea and Wollemia, and two copies in Phyllocladus (e.g., Fig. 3.4).
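Tandem repeats such as the ~15 amino-acid unit described above appear in a self-comparison dot plot (the kind of plot Dotlet draws) as off-diagonal lines displaced by the repeat length. The Python sketch below scores each off-diagonal by simple identity and returns the best-scoring shift. It is a deliberate simplification of what Dotlet does (no sliding window, no PAM-30 scoring), and the test protein is an invented placeholder, not an actual rps7 sequence; all names are hypothetical.

```python
from typing import Optional


def diagonal_identity(seq: str, shift: int) -> float:
    """Fraction of positions i with seq[i] == seq[i + shift] (one off-diagonal of a self dot plot)."""
    n = len(seq) - shift
    if n <= 0:
        return 0.0
    return sum(seq[i] == seq[i + shift] for i in range(n)) / n


def best_repeat_period(seq: str, min_shift: int = 5, max_shift: Optional[int] = None) -> int:
    """Shift (in residues) whose off-diagonal has the highest identity."""
    if max_shift is None:
        max_shift = len(seq) // 2
    if max_shift < min_shift:
        raise ValueError("sequence too short for the requested shift range")
    return max(range(min_shift, max_shift + 1), key=lambda k: diagonal_identity(seq, k))


def keeps_reading_frame(indel_length_nt: int) -> bool:
    """An indel leaves the downstream reading frame intact if its length is a multiple of 3."""
    return indel_length_nt % 3 == 0


if __name__ == "__main__":
    motif = "MKRSLPEGTAKVDQN"             # hypothetical 15-residue repeat unit
    protein = "MSE" + motif * 3 + "LLVR"  # hypothetical flanks around three tandem copies
    period = best_repeat_period(protein)
    print(period, keeps_reading_frame(3 * period))
```

A 15-residue unit corresponds to 45 nucleotides, a multiple of three, so gaining or losing whole copies is compatible with rps7 remaining in frame in these taxa.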
The tandem repeat (and, broadly speaking, the hotspot region itself) provides a microstructural synapomorphy for the clade consisting of these two families. Across the eight taxa sampled in Araucariaceae and Podocarpaceae, the expansion region has a mean length of 149.6 bp, i.e., the difference between their mean rps7 length (614.6 bp, SD = 69.7 bp) and the length of rps7 in Pinus (465 bp, excluding the stop codon). For comparison, the mean total length of rps7 in the other taxa considered in this study is 466.4 bp (SD = 3.37 bp). Although I have no experimental evidence that the expansion is part of the translated sequence in these taxa, this seems probable on comparative grounds: rps7 in the two families is variable in both length and sequence (particularly in Podocarpaceae), yet it is consistently in frame in all taxa examined, including multiple taxa in both families that were not included here (data not shown). The portion of the gene containing the expansion region does not appear to be especially prone to indel events elsewhere in the land plants. A more comprehensive survey of the entire expansion region that includes all genera from these two families will be presented elsewhere.

3.4 DISCUSSION

3.4.1 Rapidly Evolving Plastid DNA Sites and the Inference of Higher-Order Conifer Relationships
Classifying characters into different rate classes and then removing the fastest ones is a useful alternative approach for dealing with so-called saturated sites, alignment positions that may be misleading for phylogenetic inference due to 'unseen' multiple hits. In principle, we might expect ML analysis to be unaffected by removal of these rate classes, as the method should properly correct for multiple hits if the DNA substitution model is adequate (e.g., Sullivan and Swofford, 2001).
However, this adjustment might be expected to improve the accuracy of MP results if saturation is substantial enough to affect phylogenetic inference (e.g., Burleigh and Mathews, 2004). Removing all third codon positions from protein-coding genes is arguably less desirable than excluding the most rapid ML rate classes, as the former is an overly coarse correction for multiple hits (Olmstead et al., 1998; Yang, 1998; Sanderson et al., 2000), whereas the latter is applicable to both coding and noncoding data. I would also argue that an exclusion method based on site rates is preferable to parsimony-based successive weighting, as used in the study of higher-order conifer relationships by Quinn et al. (2002). Successive weighting has been criticized because heuristic searches may become trapped on local optima that depend on starting trees (Swofford et al., 1996). While the rate classification used here may also depend partly on the reference tree, it has the potential advantage that the substitution rates used to cull the data are explicitly model-based estimates (see Olmstead et al., 1998 for a parsimony-based approach to excluding highly variable characters). However, as I find that deleting the most rapidly evolving characters has little to no effect on my major findings within the conifers (Figs. 3.1-3.3), and only a small effect on branch support, the debate could be considered moot for inferring conifer phylogeny. The slight to modest reduction in bootstrap support observed within conifers after removal of the two fastest rate classes is consistent with the increased sampling error expected from analyzing fewer characters. In summary, I find no evidence here that the most rapidly evolving sites distort the inference of higher-order conifer relationships.
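The problem of 'unseen' multiple hits can be made concrete with the simplest substitution model in Table 2.3, JC69. Under this model the expected proportion of differing sites after d substitutions per site is p = (3/4)(1 - exp(-4d/3)), and the corrected distance is d = -(3/4) ln(1 - 4p/3). The Python sketch below is illustrative only; the analyses in this chapter used GTR + Γ + I, not JC69, and the function names are hypothetical.

```python
import math


def jc69_expected_p(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """JC69-corrected distance from an observed proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for d in (0.05, 0.5, 1.0, 2.0, 5.0):
        p = jc69_expected_p(d)
        print(f"d = {d:4.2f}  observed p = {p:.3f}  recovered d = {jc69_distance(p):.3f}")
```

Observed differences approach the 0.75 ceiling as d grows, so quite different true distances yield nearly the same p; small errors in p then become large errors in the corrected distance, which is why the fastest sites are the usual candidates for saturation.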
3.4.2 The Ovulate Cone and Conifer Systematics

The ovulate (seed) cone has been considered particularly significant in conifer systematics (e.g., Pilger, 1926; Florin, 1951; Miller, 1999). For example, the absence in Taxaceae of the ‘typical’ compound ovulate cone of conifers was used to justify recognizing them as Taxales, an order distinct from the remaining conifers (e.g., Florin, 1951). My data confirm the widely accepted view that Taxaceae are nested within the Cupressophyta clade of conifers, sister to Cephalotaxaceae (Figs. 3.1, 3.2). Suppose a compound ovulate cone of the sort found in Pinaceae was ancestral in extant conifers, as is usually assumed. In that case, the ovule-bearing arrangement in Taxaceae (an apparently ‘coneless conifer’ from the perspective of its ovules) must have been derived by reduction from the more complex form (e.g., Chamberlain, 1935; Takhtajan, 1953; Hart, 1987; Doyle, 1998; Quinn et al., 2002). However, Tomlinson and Takaso (2002) discuss general difficulties in applying Florin’s model to most conifer families, because the ovuliferous scale in these taxa is either extremely modified or apparently absent. The ovuliferous scale is a condensed, ovule-bearing secondary shoot axis; its underlying structure seems clearest in the ovulate cones of Pinaceae and Sciadopityaceae (Tomlinson and Takaso, 2002). Hart (1987) suggested that too much weight has been placed on the compound ovulate cone in higher-order conifer systematics.

3.4.3 The Case for Recognizing Cephalotaxus as a Member of Taxaceae

I infer Cephalotaxus to be the sister group of the two genera of Taxaceae sampled here, Taxus and Torreya; Taxaceae s.s. are monophyletic with this taxon sampling (Figs. 3.1, 3.2). A sister-group relationship between Cephalotaxaceae and Taxaceae was first recovered by Hart (1987) using morphological data, and subsequently in a morphological analysis by Doyle (1998).
A matK-based analysis of the two families that surveyed all five genera of Taxaceae (Cheng et al., 2000) found strong support for the monophyly of Taxaceae (96%, from MP analysis), but included only a handful of outgroups. In the more broadly based study of two plastid loci (matK and rbcL) by Quinn et al. (2002), parsimony analysis found strong support for a clade comprising Cephalotaxus and Taxaceae. Their analysis did not strongly support the monophyly of Taxaceae s.s. unless the data were re-weighted using successive weighting, a method that may yield artifactual results (see above). However, it should be noted that Quinn et al. (2002) consistently recovered two strongly supported clades of Taxaceae in both equally and unequally weighted analyses, one of which includes Taxus and the other Torreya. These are the two genera that we included as exemplars for the family, and I found them to comprise a clade. When I removed the fastest-evolving characters (RC7 and RC8) from the phylogenetic analysis, relationships among Cephalotaxus, Taxus and Torreya were no longer strongly supported (Fig. 3.3). My main analyses support the view of Quinn et al. (2002) that it is no longer useful to recognize Cephalotaxaceae (Figs. 3.1, 3.2); Cephalotaxus should be returned to its original home in Taxaceae. The rationale for a circumscription of Taxaceae that includes Cephalotaxus is straightforward should Cephalotaxus prove to be nested within Taxaceae, as the more broadly defined family would then be monophyletic. However, a straightforward case can also be made for reducing Cephalotaxaceae to synonymy if Cephalotaxus is instead shown to be the sister group of the five genera usually assumed to be in Taxaceae s.s. (i.e., Austrotaxus, Amentotaxus, Pseudotaxus, Taxus and Torreya).
Backlund and Bremer (1998) have argued that higher-order classifications that recognize two families in this situation (where a small monogeneric family is the sister group of a larger one) do not optimize phylogenetic information, and ought to be considered redundant. Furthermore, the morphological distinction between Cephalotaxus and Taxaceae is clearly not so great that combining them would create a morphologically unrecognizable taxon (see APG II, 2003). For example, Cephalotaxus also has a relatively simple ovule-bearing arrangement: a pair of ovules in the axil of a bract, separated by a narrow flange of tissue of uncertain origin (Tomlinson and Takaso, 2002). A morphological connection between Cephalotaxus and Taxaceae s.s. is uncontroversial (Doyle, 1998; Stützel and Röwekamp, 1999). However, relationships among the six genera of Taxaceae in its broadened circumscription should be examined by including more taxa in phylogenetic analysis, with a sampling of plastid data at least as large as that examined here.

3.4.4 Relationships within Cupressaceae

Relationships among the three ‘core Cupressaceae’ genera sampled here are in line with other studies and are well supported. Widdringtonia (representing the ‘callitroid’ clade of Gadek et al., 2000) is the sister group of Juniperus + Thuja (two exemplars that represent the ‘cupressoid’ clade of Gadek et al., 2000). Basal relationships in the family have generally not been inferred with strong support (e.g., Brunsfeld et al., 1994; Stefanovic et al., 1998; Gadek et al., 2000; Kusumi et al., 2000; but see Quinn et al., 2002). Our results, although based on limited taxon sampling, are congruent with these earlier studies (Figs. 3.1-3.3). As in other recent studies (Hart, 1987; Stefanovic et al., 1998; Gadek et al., 2000; Kusumi et al., 2000; Quinn et al., 2002), I find that “Taxodiaceae” (represented here by Cunninghamia, Metasequoia and Taxodium) comprise a grade of taxa near the base of Cupressaceae.
Metasequoia and Taxodium represent the ‘sequoioid’ and ‘taxodioid’ clades of Gadek et al. (2000), respectively; the latter is the sister group of the core Cupressaceae (Figs. 3.1-3.3). Cunninghamia is inferred to be the sister group of the remainder of the family, as in other recent studies with broad taxon sampling (Gadek et al., 2000; Quinn et al., 2002). All of these relationships are well supported here.

3.4.5 The Higher-Order Position of Phyllocladus in Conifer Phylogeny

Phyllocladus has sometimes been recognized as a distinct family, Phyllocladaceae, because of its highly distinctive morphology. This includes details of its pollen morphology and wood anatomy, and its broad leaf-like cladodes, which are unique among conifers (Keng, 1973, 1978; Page, 1990a). However, it shares a relatively simple fleshy ovulate cone with the other podocarps (note that the fleshy part of the cone is not homologous across Podocarpaceae; Kelch, 1997; Tomlinson and Takaso, 2002). The need to recognize it at the family level has also been contested based on evidence from embryogeny and other morphological characters (Quinn, 1986, 1987). We consistently find Phyllocladus to be the sister group of the two other taxa that we surveyed for Podocarpaceae (Podocarpus and Saxegothaea), with weak to strong support. We also consistently find strong support for the clade consisting of all three genera (Figs. 3.1-3.3). Other phylogenetic studies find Phyllocladus to be nested among podocarps (Hart, 1987; Kelch, 1997, 1998; Conran et al., 2000), sister to the rest of the family (Kelch, 1998; Quinn et al., 2002; Sinclair et al., 2002), or even a grade at the base of the family (Kelch, 1998), generally with poor support. Whatever its exact position within the family, arguments paralleling those used above to justify placing Cephalotaxus in Taxaceae can be used to support a circumscription of the podocarps (Podocarpaceae) that includes Phyllocladus.
Improved taxon sampling using the current plastid data set should help clarify the backbone of relationships within this large and diverse family.

3.4.6 The Position of Wollemia within Araucariaceae

The recently discovered conifer Wollemia nobilis (Jones et al., 1995) has attracted much attention (e.g., Hogbin et al., 2000; Peakall et al., 2003) because of its status as a ‘living fossil’, that is, a previously unknown living taxon that bears considerable similarity to fossil taxa. The monotypic Wollemia has been placed unequivocally in Araucariaceae (Jones et al., 1995). However, morphological evidence on where it fits in relation to the other genera is inconclusive, since it resembles Agathis in some leaf and ovulate cone characters and Araucaria in others (Chambers et al., 1998). Setoguchi et al. (1998) found moderate support for Wollemia as the sister group of Agathis and Araucaria using rbcL. With that exception, other studies have recovered Wollemia as the sister group of Agathis with weak to moderate support (Gilmore and Hill, 1997; Stefanovic et al., 1998; Conran et al., 2000). This placement was also strongly supported in a combined analysis of matK and rbcL data by Quinn et al. (2002). I confirm this result here: in all analyses Wollemia nobilis is strongly supported as the sister group of Agathis (Figs. 3.1-3.3).

3.4.7 Significance of an Expansion Hotspot in the Plastid Ribosomal Protein Gene rps7

The morphological evidence on where the podocarps are placed in higher-order conifer phylogeny is unclear (e.g., Page, 1990b). A close relationship between the predominantly southern hemisphere families Araucariaceae and Podocarpaceae was not firmly supported until relatively recently.
Several molecular studies have demonstrated that these two families are sister taxa (Chaw et al., 1997; Stefanovic et al., 1998; Gugerli et al., 2001; Quinn et al., 2002; Rydin et al., 2002), and the phylogenies inferred here provide further support for this relationship (Figs. 3.1-3.3). The rps7 expansion hotspot (and the associated tandemly repeated amino-acid motif; Fig. 3.4) provides a microstructural synapomorphy supporting this two-family clade. The functional significance of this expansion region is unknown. However, comparable large expansions have been found in another plastid ribosomal protein gene, rps4, in Araucariaceae and Podocarpaceae (D. Kelch, pers. comm., 2007). This suggests that a survey of other plastid ribosomal protein genes might uncover additional protein structural shifts.

3.4.8 Seed-Plant Phylogeny and the Position of Gnetales

I observe moderately to strongly conflicting sets of relationships here among the major seed-plant groups (cf. MP and ML analyses of the complete data set; Figs. 3.1, 3.2). There has been extensive debate on the potential for systematic bias to cause incorrect inference of relationships among the major groups of extant seed plants from molecular data. This debate includes the question of conifer monophyly versus a possible relationship between Pinaceae and Gnetales (e.g., Sanderson et al., 2000; Burleigh and Mathews, 2004, 2007a, b). [The broader issue of the monophyly of extant and extinct conifers is unsettled; for example, it is not clear whether voltzialean conifers such as Emporia are closely related to extant conifers (Rothwell and Serbet, 1994; Doyle, 2005).] Extinct taxa are essentially inaccessible to molecular systematists. However, Burleigh and Mathews (2004) suggested that better taxon sampling of extant taxa might help reduce the observed conflict among studies regarding broad seed-plant relationships.
My substantially improved taxon sampling within conifers (compared to Rai et al., 2003) does not yield a clearer answer for seed-plant relationships as a whole (see Chapter 4), although this picture may change with additional conifer sampling. Removing the fastest characters (RC78, i.e., rate classes RC7 and RC8) might be expected to reduce systematic bias. In practice, it results in generally poorer support for relationships among the major non-conifer seed-plant clades in MP and ML analyses (data not shown). For example, one clade is the sister group of Gnetales and consists of angiosperms, conifers, cycads and Ginkgo (Figs. 3.1, 3.2). In analyses of the full data set, bootstrap support for this clade is 100% from MP analysis and 86% from ML analysis. These values fall to 79% and 44%, respectively, with the fastest characters removed. I re-examined our bootstrap profiles to determine the levels of support for alternative relationships involving Gnetales for the full and reduced data sets. Bootstrap support for the hypothetical gnepine clade from our MP and ML analyses is <1% and 9% (respectively) with all data included, versus 7% and 12% with the RC78 sites removed. Bootstrap support for even a loose version of the ‘gnetifer’ hypothesis (i.e., with Gnetales and conifers in a clade, without regard to the monophyly of either) is weak at best. When all data are included, it is <1% and 14% (from MP and ML analyses, respectively). These values show modest improvement when RC78 sites are removed, rising to 22% and 50% from our MP and ML analyses, respectively. I address the broader issue of seed-plant relationships more fully in Chapter 4.
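The bootstrap percentages quoted above are frequencies: the share of bootstrap trees in which a particular set of taxa forms a clade. The sketch below illustrates that bookkeeping, including the ‘loose’ gnetifer test. That test asks only whether Gnetales plus conifers form a clade, regardless of whether either group is itself monophyletic within it. This is a minimal illustration under stated assumptions, not the way the bootstrap analyses reported here were run. It assumes that the replicate trees have already been rooted on an outgroup and are written as nested tuples. The function names clades and bootstrap_support and the four toy trees are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# Hypothetical representation of a rooted tree: a leaf is a taxon name (str),
# and an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the set of leaf names below every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def bootstrap_support(replicates: Iterable[Tree], taxa: Iterable[str]) -> float:
    """Percentage of replicate trees containing a clade made up of exactly `taxa`."""
    target = frozenset(taxa)
    trees = list(replicates)
    hits = sum(1 for t in trees if target in clades(t))
    return 100.0 * hits / len(trees)


# Hypothetical toy replicates (not results from this study), rooted on "Fern".
GNETALES = {"Gnetum", "Welwitschia"}
PINACEAE = {"Pinus", "Abies"}
CONIFERS = PINACEAE | {"Taxus"}
REST = (("Cycas", "Ginkgo"), (("Pinus", "Abies"), "Taxus"))
REPLICATES = [
    ("Fern", (("Gnetum", "Welwitschia"), ("Amborella", REST))),  # Gnetales sister to other seed plants
    ("Fern", ("Amborella", (("Cycas", "Ginkgo"), ((("Gnetum", "Welwitschia"), ("Pinus", "Abies")), "Taxus")))),  # gnepine
    ("Fern", ("Amborella", (("Cycas", "Ginkgo"), (("Gnetum", "Welwitschia"), (("Pinus", "Abies"), "Taxus"))))),  # gnetifer
    ("Fern", (("Gnetum", "Welwitschia"), ("Amborella", REST))),
]

print(bootstrap_support(REPLICATES, GNETALES | PINACEAE))  # gnepine: 25.0
print(bootstrap_support(REPLICATES, GNETALES | CONIFERS))  # loose gnetifer: 50.0
print(bootstrap_support(REPLICATES, CONIFERS))             # conifer monophyly: 75.0
```

The loose gnetifer test works because it checks only the exact membership of the combined clade. In the second toy tree, conifers are not monophyletic, yet Gnetales plus conifers still form a clade, so that replicate counts toward the loose hypothesis but not toward conifer monophyly.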
3.4.9 The Inference of Conifer Phylogeny from Plastid Data

My MP and ML analyses support Pinaceae as the sister group of the rest of the conifers, whether the most rapidly evolving characters are included or excluded (Figs. 3.1-3.3). I therefore find moderate to strong support for conifer monophyly. This sets aside for now the possibility that the gnepine hypothesis is correct but is not recovered here because of strong systematic bias (see above). I recover strong bootstrap support for the broad backbone of conifer phylogeny, comparable to or better than that found in other studies with a broad sampling of conifers (e.g., Stefanovic et al., 1998; Quinn et al., 2002). This is in line with theoretical expectations that increasing the amount of data per taxon should reduce the effect of sampling error on phylogenetic inference. Some relationships were not inferred with strong support here, namely relationships within Pinaceae, Podocarpaceae and Taxaceae s.l. (Figs. 3.1-3.3). Further increasing taxon sampling within the major clades of conifers, using the plastid gene set examined here or others of comparable size, may help resolve them; see Hillis (1998) for a rationale. Adding genes may also help address some of the hard-to-resolve branches within conifers (e.g., within Pinaceae), as well as some of the tougher questions involving the major groups of seed plants (e.g., the placement of Gnetales). For example, it is becoming reasonably straightforward to obtain plastid data sets on the scale of the whole plastid genome (e.g., Leebens-Mack et al., 2005). However, the current gene sample is clearly sufficient to recover strong bootstrap support for most of the higher-order relationships that I address in conifers (Figs. 3.1, 3.2).
Indeed, even the relatively small set of characters that I infer to be among the most slowly evolving (i.e., RC0-6) provides excellent support for almost the entire broad backbone of conifer phylogeny (Fig. 3.3).

TABLE 3.1. Source information and GenBank numbers.

Taxon (Voucher, herbarium): accessions listed by gene or region in the order atpB; ndhF; psbB, T, N & psbH; psbD & C; psbE, F, L & psbJ; rbcL; rpl2; 3'rps12, rps7 & ndhB

Araucariaceae
Agathis australis (D. Don) Loudon (H.S. Rai 1002, ALTA): AY664829; AY902169; AF528892; AF528919; AF528865; AF362993*; AY664864; AY164586
Agathis robusta (C. Moore ex F. Muell.) F.M. Bailey (037944-037947, GAU): EF490502; EF494250; EF490512; EF490506; EF490515; EF490509; EF490521; EF490518
Araucaria bidwillii Hook. (H.S. Rai 1006, ALTA): AY664830; AY902170; AY664852; AY664840; AY664846; U96472*; AY664865; AY664816
Araucaria cunninghamii Aiton ex D. Don (037942 & 037943, GAU): EF490503; EF494251; EF490513; EF490507; EF490516; EF490510; EF490522; EF490519
Wollemia nobilis W.G. Jones, K.D. Hill & J.M. Allen (no voucher†): EF490504; EF494249; EF490511; EF490505; EF490514; EF490508; EF490520; EF490517

Cephalotaxaceae
Cephalotaxus harringtonii (Knight ex J. Forbes) K. Koch (R.G. Olmstead 2000-55, WTU): AY664831; AY902171; AF528896; AF528923; AF528869; AF227461*; AY664866; AY664817

Cupressaceae s.l.
Cunninghamia lanceolata (Lamb.) Hook. (P.A. Reeves & J. Metropulos 18, WTU): AY664833; AY902174; AF528898; AF528925; AF528871; L25757*; AY664869; AY664820
Juniperus communis L. (H.S. Rai 1011, ALTA): AY664834; AY902175; AY664854; AY664842; AY664848; AY664859; AY664870; AY664821
Taxodium distichum (L.) Rich. (K. Ikegama 2002-1, WTU): AY664835; AY902176; AF528915; AF525949; AF528888; AF119185*; AY664871; AY664822
Thuja plicata Donn ex D. Don (P.A. Reeves & J. Metropulos 19, WTU): AY664836; AY902177; AF528917; AF528942; AF528890; AF127428*; n/a; AY664823
Widdringtonia cedarbergensis J.A. Marsh (H.S. Rai 1001, ALTA): n/a; AY902178; AF528918; AF528943; AF528891; AY140261; AY664872; AY664824

Pinaceae
Abies lasiocarpa (Hook.) Nutt. (R.G. Olmstead 2001-82, WTU): AY664825; n/a; AY664849; AY664837; AY664843; AY664855; AY664860; AY664813
Pseudotsuga menziesii (Mirb.) Franco (H.S. Rai 1022, ALTA): AY664826; n/a; AY664850; AY664838; AY664844; AY664856; AY664861; AY664814

Podocarpaceae
Phyllocladus alpinus Hook. f. (R.G. Olmstead 2000-54, WTU): AY664827; AY902167; AF528905; AF528933; AF528879; AF249650*; AY664862; AY237142
Saxegothaea conspicua Lindl. (D.M. Cherniawsky ZB-VI-VII, ALTA): AY664828; AY902168; AY664851; AY664839; AY664845; AY664857; AY664863; AY664815

Taxaceae
Taxus brevifolia Nutt. (A. Colwell 2000-32, WTU): AF528864; AY902172; AF528916; AF525948; AF528889; AF249666*; AY664867; AY664818
Torreya californica Torr. (H.S. Rai 1008, ALTA): AY664832; AY902173; AY664853; AY664841; AY664847; AY664858; AY664868; AY664819

Notes. * Previously published sequences; see Graham and Olmstead (2000a,b) and Chapter 2 for a complete list of taxa and accession numbers for other taxa considered here, including the following conifers: Cedrus deodara and Pinus thunbergii (Pinaceae), Metasequoia glyptostroboides (Cupressaceae), Podocarpus chinensis (Podocarpaceae) and Sciadopitys verticillata (Sciadopityaceae). † This voucherless sample is from the same population as vouchered specimens described in Jones et al. (1995).

Figure 3.1. Plastid-based phylogeny of the conifers and relatives inferred from MP for 15-17 chloroplast genes and associated noncoding regions (three introns and eight intergenic spacer regions). This single most-parsimonious tree (21,532 steps, CI=0.541, RI=0.659) is depicted as a phylogram, with ACCTRAN optimization of branch lengths. MP bootstrap values are indicated beside branches.

Figure 3.2. Plastid-based phylogeny of the conifers and relatives inferred from ML for 15-17 chloroplast genes and associated noncoding regions (three introns and eight intergenic spacer regions) using the GTR + Γ + I model of sequence evolution. The ML tree (-lnL=128,297.454) is depicted as a phylogram. ML bootstrap values are indicated beside branches.

Figure 3.3. Summary of bootstrap support after removal of sites assigned to the two fastest of nine rate classes, for 15-17 chloroplast genes and associated noncoding regions. Bootstrap values are indicated beside branches (left- and right-hand values are for MP and ML analysis, respectively; ‘-’ refers to < 50% support). The topology shown is that of the best MP tree when the fastest sites are removed, with outgroups pruned for clarity.

Figure 3.4.
Dot-plot showing the pairwise similarity of complete translated sequences of the plastid rps7 locus from selected conifers (Pinus, Pinaceae; Podocarpus, Podocarpaceae; Wollemia, Araucariaceae; Sciadopitys, Sciadopityaceae) using a PAM-30 amino-acid substitution model, an 11-residue sliding window and a gray scale of 58-77%.

3.5 REFERENCES

ANGIOSPERM PHYLOGENY GROUP (APG II). 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141: 399-436.
BACKLUND, A., AND K. BREMER. 1998. To be or not to be – principles of classification and monotypic plant families. Taxon 47: 391-400.
BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proc. Nat. Acad. Sci. 97: 4092-4097.
BRUNSFELD, S. J., P. E. SOLTIS, D. E. SOLTIS, P. A. GADEK, C. J. QUINN, D. D. STRENGE, AND T. A. RANKER. 1994. Phylogenetic relationships among the genera of Taxodiaceae and Cupressaceae: evidence from rbcL sequences. Syst. Bot. 19: 253-262.
BURLEIGH, J. G., AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Amer. J. Bot. 91: 1599-1613.
BURLEIGH, J. G., AND S. MATHEWS. 2007a. Assessing among-locus variation in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 111-124.
BURLEIGH, J. G., AND S. MATHEWS. 2007b. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 125-135.
CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822-846.
CHAMBERLAIN, C. J. 1935. Gymnosperms, structure and evolution. Chicago University Press, Chicago, Illinois.
CHAMBERS, T. C., A. N. DRINNAN, AND S. MCLOUGHLIN. 1998. Some morphological features of Wollemi Pine (Wollemia nobilis: Araucariaceae) and their comparison to Cretaceous plant fossils. Int. J. Plant Sci. 159: 160-171.
CHAW, S-M., A. ZHARKIKH, H-M. SUNG, T-C. LAU, AND W-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.
CHAW, S-M., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Nat. Acad. Sci. 97: 4086-4091.
CHENG, Y., R. G. NICOLSON, K. TRIPP, AND S-M. CHAW. 2000. Phylogeny of Taxaceae and Cephalotaxaceae genera inferred from chloroplast matK gene and nuclear rDNA ITS region. Mol. Phylogenet. Evol. 14: 353-365.
CONRAN, J. G., G. M. WOOD, P. G. MARTIN, J. M. DOWD, C. J. QUINN, P. A. GADEK, AND R. A. PRICE. 2000. Generic relationships within and between the gymnosperm families Podocarpaceae and Phyllocladaceae based on an analysis of the chloroplast gene rbcL. Aust. J. Bot. 48: 715-724.
DONOGHUE, M. J., AND J. A. DOYLE. 2000. Seed plant phylogeny: demise of the anthophyte hypothesis? Curr. Biol. 10: R106-R109.
DOYLE, J. A. 1998. Phylogeny of vascular plants. Annu. Rev. Ecol. Syst. 29: 567-599.
DOYLE, J. A. 2005. Seed ferns and the origin of angiosperms. J. Torrey Bot. Soc. 133: 169-209.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
ECKENWALDER, J. E. 1976. Re-evaluation of Cupressaceae and Taxodiaceae: A proposed merger. Madroño 23: 237-256.
ENRIGHT, N. J., AND R. S. HILL. 1995. Ecology of the southern conifers. Smithsonian Institution Press, Washington, D.C.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783-791.
FLORIN, R. 1951. Evolution in cordaites and conifers. Acta Horti Berg. 15: 285-388.
GADEK, P. A., D. L. ALPERS, M. M. HESLEWOOD, AND C. J. QUINN. 2000. Relationships within Cupressaceae sensu lato: A combined morphological and molecular approach. Amer. J. Bot. 87: 1044-1057.
GILMORE, S., AND K. D. HILL. 1997. Relationships of the Wollemi Pine (Wollemia nobilis) and a molecular phylogeny of the Araucariaceae. Telopea 7: 275-291.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Amer. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., J. R. KOHN, B. R. MORTON, J. E. ECKENWALDER, AND S. C. H. BARRETT. 1998. Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Syst. Biol. 47: 545-567.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GRAHAM, S. W., J. M. ZGURSKI, M. A. MCPHERSON, D. M. CHERNIAWSKI, J. M. SAARELA, E. F. C. HORNE, S. Y. SMITH, W. A. WONG, H. E. O’BRIEN, V. L. BIRON, J. C. PIRES, R. G. OLMSTEAD, M. W. CHASE, AND H. S. RAI. 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Pp. 3-21. In Monocots: comparative biology and evolution (excluding Poales). Edited by J. T. Columbus, E. A. Friar, J. M. Porter, L. M. Prince, and M. G. Simpson. Rancho Santa Ana Botanic Garden, Claremont, CA.
GUGERLI, F., C. SPERISEN, U. BÜCHLER, I. BRUNNER, S. BRODBECK, J. D. PALMER, AND Y-L. QIU. 2001. The evolutionary split of Pinaceae from other conifers: evidence from an intron loss and a multigene phylogeny. Mol. Phylogenet. Evol. 21: 167-175.
GUINDON, S., AND O. GASCUEL. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
HART, J. A. 1987. A cladistic analysis of conifers: Preliminary results. J. Arnold Arb. 68: 269-307.
HILLIS, D. M. 1998. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 47: 3-8.
HOGBIN, P. M., R. PEAKALL, AND M. A. SYDES. 2000. Achieving practical outcomes from genetic studies of rare plants. Aust. J. Bot. 48: 375-382.
JONES, W. G., K. D. HILL, AND J. M. ALLEN. 1995. Wollemia nobilis, a new living Australian genus and species in the Araucariaceae. Telopea 6: 173-176.
JUNIER, T., AND M. PAGNI. 1999. Dotlet: diagonal plots in a web browser. Bioinformatics 16: 178-179.
KELCH, D. G. 1997. The phylogeny of the Podocarpaceae based on morphological evidence. Syst. Bot. 22: 113-131.
KELCH, D. G. 1998. Phylogeny of Podocarpaceae: comparison of evidence from morphology and 18S rDNA. Amer. J. Bot. 85: 986-996.
KENG, H. 1973. On the family Phyllocladaceae. Taiwania 18: 142-145.
KENG, H. 1978. The genus Phyllocladus (Phyllocladaceae). J. Arnold Arb. 59: 249-273.
KOSAKOVSKY POND, S. L., S. D. W. FROST, AND S. V. MUSE. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21: 676-679.
KUSUMI, J., Y. TSUMURA, H. YOSHIMARU, AND H. TACHIDA. 2000. Phylogenetic relationships in Taxodiaceae and Cupressaceae sensu stricto based on matK gene, chlL gene, trnL-trnF IGS region, and trnL intron sequences. Amer. J. Bot. 87: 1480-1488.
LEEBENS-MACK, J., L. A. RAUBESON, L. CUI, J. V. KUEHL, M. H. FOURCADE, T. W. CHUMLEY, J. L. BOORE, R. K. JANSEN, AND C. W. DEPAMPHILIS. 2005. Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol. Biol. Evol. 22: 1948-1963.
MILLER, N. M. JR. 1999. Implications of fossil conifers for the phylogenetic relationships of living families. Bot. Rev. 65: 239-277.
MUNDRY, M., AND T. STÜTZEL. 2004. Morphogenesis of the reproductive shoots of Welwitschia mirabilis and Ephedra distachya (Gnetales), and its evolutionary implications. Organisms, Diversity and Evolution 4: 91-108.
OLMSTEAD, R. G., P. A. REEVES, AND A. C. YEN. 1998. Patterns of sequence evolution and implications for parsimony analysis of chloroplast DNA. Pp. 164-187. In Molecular systematics of plants II: DNA sequencing. Edited by P. S. Soltis, D. E. Soltis, and J. J. Doyle. Kluwer, Boston, MA.
PAGE, C. N. 1990a. Phyllocladaceae. Pp. 317-319. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin.
PAGE, C. N. 1990b. Podocarpaceae. Pp. 332-346. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin.
PAGE, C. N. 1990c. Sciadopityaceae. Pp. 346-348. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin.
PAGE, C. N. 1990d. Taxaceae. Pp. 346-353. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin.
PEAKALL, R., D. EBERT, L. SCOTT, P. MEAGHER, AND C. OFFORD. 2003. Comparative genetic study confirms exceptionally low genetic variation in the ancient and endangered relictual conifer, Wollemia nobilis (Araucariaceae). Mol. Ecol. 12: 2331-2343.
PILGER, R. 1926. Coniferae. Pp. 121-407. In Die Natürlichen Pflanzenfamilien 2nd edition. Edited by A. Engler and K. Prantl. W. Engelmann, Leipzig.
POSADA, D., AND K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817-818.
QUINN, C. J. 1986. Embryogeny in Phyllocladus. New Zealand J. Bot. 24: 575-580.
QUINN, C. J. 1987. The Phyllocladaceae Keng – a critique. Taxon 36: 559-565.
QUINN, C. J., R. A. PRICE, AND P. A. GADEK. 2002. Familial concepts and relationships in the conifers based on rbcL and matK sequence comparisons. Kew Bull. 57: 513-531.
RAI, H. S., H. E. O’BRIEN, P. A. REEVES, AND S. W. GRAHAM. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359.
RAMBAUT, A. 1998. “Se-Al (Sequence Alignment Editor Version 1.0),” Computer program and documentation. Department of Zoology, University of Oxford, UK.
RAUBESON, L. A., AND R. K. JANSEN. 1992. A rare chloroplast-DNA structural mutation is shared by all conifers. Biochem. Syst. Ecol. 20: 17-24.
ROTHWELL, G. W., AND R. SERBET. 1994. Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482.
RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214.
SAARELA, J. M., H. S. RAI, J. A. DOYLE, P. K. ENDRESS, S. MATHEWS, A. D. MARCHANT, B. G. BRIGGS, AND S. W. GRAHAM. 2007. Hydatellaceae identified as a new branch near the base of the angiosperm phylogenetic tree. Nature 446: 312-315.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797.
SETOGUCHI, H., T. A. OSAWA, J-C. PINTAUD, T. JAFFRÉ, AND J-M. VEILLON. 1998. Phylogenetic relationships within Araucariaceae based on rbcL gene sequences. Amer. J. Bot. 85: 1507-1516.
SINCLAIR, W. T., R. R. MILL, M. F. GARDNER, P. WOLTZ, T. JAFFRÉ, J. PRESTON, M. L. HOLLINGSWORTH, A. PONGE, AND M. MÖLLER. 2002. Evolutionary relationships of the New Caledonian heterotrophic conifer, Parasitaxus usta (Podocarpaceae), inferred from chloroplast trnL-F intron/spacer and nuclear rDNA ITS2 sequences. Plant Syst. Evol. 233: 79-104.
STEFANOVIC, S., M. JAGER, J. DEUTSCH, J. BROUTIN, AND M. MASSELOT. 1998. Phylogenetic relationships of conifers inferred from partial 28S rRNA gene sequences. Amer. J. Bot. 85: 688-697.
STOCKEY, R. A., J. KVACEK, R. S. HILL, G. W. ROTHWELL, AND K. KVACEK. 2005. The fossil record of Cupressaceae s. lat. Pp. 64-68. In A monograph of Cupressaceae and Sciadopitys. Edited by A. Farjon. Royal Botanic Gardens, Kew, UK.
STÜTZEL, T., AND I. RÖWEKAMP. 1999. Female reproductive structures in Taxales. Flora 194: 145-157.
SULLIVAN, J., AND D. L. SWOFFORD. 2001. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution patterns are violated? Syst. Biol. 50: 723-729.
SWOFFORD, D. L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4, Computer program and documentation. Sinauer Associates, Sunderland, MA.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, AND D. M. HILLIS. 1996. Phylogenetic inference. Pp. 407-514. In Molecular Systematics, 2nd edition. Edited by D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, MA.
TAKHTAJAN, A. L. 1953. Phylogenetic principles of the system of higher plants. Bot. Rev. 19: 1-45.
TOMLINSON, P. B., AND T. TAKASO. 2002. Seed cone structure in conifers in relation to development and pollination: a biological approach. Can. J. Bot. 80: 1250-1273.
WAKASUGI, T., J. TSUDZUKI, S. ITO, K. NAKASHIMA, T. TSUDZUKI, AND M. SUGIURA. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Nat. Acad. Sci. 91: 9794-9798.
YANG, Z. 1998. On the best evolutionary rate for phylogenetic analysis. Syst. Biol. 47: 125-133.
YAO, X., T. N. TAYLOR, AND E. L. TAYLOR. 1997. A taxodiaceous seed cone from the Triassic of Antarctica. Amer. J. Bot. 84: 343-354.
ZGURSKI, J. M., H. S. RAI, Q. M. FAI, D. J. BOGLER, J. FRANCISCO-ORTEGA, AND S. W. GRAHAM. 2008. How well do we understand the overall backbone of cycad phylogeny? New insights from a large, multigene plastid data set. Mol. Phylogenet. Evol. 47: 1232-1237.

CHAPTER 4¹

INFERENCE AND MISINFERENCE OF HIGHER-ORDER SEED-PLANT RELATIONSHIPS FROM PLASTID DATA

¹ A version of this chapter will be submitted for publication: RAI, H. S., AND S. W. GRAHAM. Inference and misinference of higher-order seed-plant relationships from plastid data.

4.1 INTRODUCTION

The reconstruction of extant seed-plant phylogenetic relationships is recognized as one of the most difficult unresolved problems in plant systematics (e.g., Donoghue and Doyle, 2000; Burleigh and Mathews, 2004). Molecular data have substantially clarified relationships within each major clade of seed plants (e.g., see Chapters 2, 3 for cycads and conifers). Nevertheless, considerable uncertainty persists regarding the overall pattern of seed-plant phylogeny. Individual studies sometimes find modest to strong support for particular relationships (e.g., a possible sister-group relationship between Ginkgo and cycads, Chapter 2). Yet there is no clear consensus on the sister group of any of the five major extant seed-plant clades. A wide range of studies of morphological and molecular evidence have given many different and often strongly conflicting results (e.g., Hill and Crane, 1982; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Rothwell and Serbet, 1994; Boivin et al., 1996; Doyle, 1996, 2006; Goremykin et al., 1996; Chaw et al., 1997, 2000; Winter et al., 1999; Bowe et al., 2000; Sanderson, 2000; Rydin et al., 2002; Soltis et al., 2002; Friis et al., 2007; Chapters 2, 3). Much of the discussion of higher-order seed-plant relationships has focussed on the relative placement of Gnetales along the backbone of seed-plant phylogeny, particularly their placement relative to the conifers.
In some analyses, conifers are found not to be monophyletic, and Gnetales are then usually found to be the sister group of Pinaceae, the so-called “gnepine” hypothesis (Bowe et al., 2000; Chaw et al., 2000; Hajibabaei et al., 2006; Fig. 4.1). Other studies support a “Gnetales-sister” hypothesis in which Gnetales are the sister group of all other seed plants, with the conifers then depicted as monophyletic (e.g., Rydin et al., 2002; Chapters 2, 3; Fig. 4.1). In contrast, various lines of morphological evidence from living and extinct taxa seem to support the idea that Gnetales are instead the closest living relatives of the angiosperms, the so-called “anthophyte” hypothesis (e.g., Doyle, 1996, 2006; Hilton and Bateman, 2006; Fig. 4.1). The vast majority of molecular phylogenetic analyses have recovered trees that are inconsistent with the anthophyte hypothesis (but see Stefanovic et al., 1998). Features possessed by some or all members of Gnetales, such as aspects of their seed architecture (see Friis et al., 2007), net-veined leaves and vessels, lend morphological support to the idea of a close association between Gnetales and angiosperms (Nixon et al., 1994; Rothwell and Serbet, 1994; Doyle, 1996, 2006). However, in his recent morphological re-analysis, Doyle (2006) found that trees depicting Gnetales as more closely related to conifers than to angiosperms were only one step longer than the shortest maximum-parsimony (MP) trees.

A significant problem that has plagued plant molecular systematists is that gymnosperms have a diverse and ancient evolutionary history. Relatively few major clades have persisted to the present day, compared with the total diversity of gymnosperm clades that have left traces in the fossil record (e.g., Rothwell, 1982; Rothwell and Serbet, 1994; Crane et al., 2004), with the net effect that the major extant crown groups of gymnosperms are subtended by long interior (non-terminal) branches.
There has been considerable speculation (e.g., Sanderson et al., 2000; Rothwell and Stockey, 2002; Burleigh and Mathews, 2004) that long-branch attraction (LBA) or other sources of systematic bias may give rise to strongly misleading results in this situation. If so, strong bootstrap support values (and Bayesian posterior probabilities) for seed-plant relationships may be misleading (see Felsenstein, 1978; Hendy and Penny, 1989; Bergsten, 2005) in at least a subset of studies, and perhaps in all of them. However, we should expect model-based methods to be considerably less prone to this problem than maximum parsimony, so long as the analytical model and associated parameters provide adequate estimates of the real (unknown) pattern of DNA substitution (see Huelsenbeck, 1995; Chang, 1996; Swofford et al., 2001). Burleigh and Mathews (2004) showed that different site-rate classes estimated for a 13-locus data set comprising plastid, mitochondrial and nuclear data tend to favour different hypotheses of seed-plant relationship in MP analysis. For example, they found that sites evolving at intermediate rates support the gnepine hypothesis, whereas faster ones favour the Gnetales-sister hypothesis. By removing the fastest classes, they inferred well-supported gnepine trees, whereas the full data set yielded well-supported Gnetales-sister trees. They also found it difficult to differentiate between trees that support the gnepine hypothesis and trees that support a “gnetifer” hypothesis (i.e., with Gnetales as the sister group of Pinaceae vs. of a clade comprising all conifers), as both hypotheses are supported by many sites in the fastest rate categories and by comparatively few sites in intermediate rate categories. In an analysis of stratigraphic data, they also found that the fossil record is more consistent with the anthophyte and gnepine hypotheses than with Gnetales-sister reconstructions.
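The long-branch attraction problem described above can be illustrated with a toy Monte Carlo experiment. The sketch below is an illustrative assumption, not the simulation pipeline used in this chapter (which relied on Seq-Gen under GTR + Γ + I on 35-taxon trees): it simulates nucleotides on the four-taxon tree ((A,B),(C,D)) under the simpler Jukes-Cantor (JC69) model, then picks the split supported by the most parsimony-informative site patterns, which is how MP resolves a quartet. The branch lengths are hypothetical values chosen to place the tree in the "Felsenstein zone" (two long, non-sister branches).

```python
import math
import random

BASES = "ACGT"


def jc69_change_prob(t):
    """Probability that a site ends in a different state after branch length t (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def evolve(state, t, rng):
    """Evolve one nucleotide along a branch of length t under JC69."""
    if rng.random() < jc69_change_prob(t):
        # Under JC69 the three alternative states are equally likely.
        return rng.choice([b for b in BASES if b != state])
    return state


def simulate_quartet(n_sites, lengths, seed=1):
    """Simulate n_sites on the unrooted quartet ((A,B),(C,D)).

    lengths maps 'A', 'B', 'C', 'D' (terminal branches) and 'int' (internal
    branch) to branch lengths in expected substitutions per site.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        x = rng.choice(BASES)               # node joining A and B
        y = evolve(x, lengths["int"], rng)  # node joining C and D
        sites.append((
            evolve(x, lengths["A"], rng),
            evolve(x, lengths["B"], rng),
            evolve(y, lengths["C"], rng),
            evolve(y, lengths["D"], rng),
        ))
    return sites


def parsimony_quartet(sites):
    """Return the split favoured by MP and the number of informative patterns per split."""
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return max(counts, key=counts.get), counts


if __name__ == "__main__":
    felsenstein_zone = {"A": 2.0, "B": 0.05, "C": 2.0, "D": 0.05, "int": 0.02}
    best, counts = parsimony_quartet(simulate_quartet(2000, felsenstein_zone))
    print(best, counts)  # the two long branches (A and C) tend to be grouped
```

With long branches leading to A and C, MP typically favours the incorrect AC|BD split even though the data were generated on AB|CD, whereas with short terminal branches and a longer internal branch the true split is recovered. Swapping the JC69 step for a GTR + Γ + I simulator and the quartet count for a full tree search would move the sketch closer to the simulations described in Section 4.2.5.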
Several studies have addressed the potential effect of systematic error in phylogenetic inference using Monte Carlo simulations (e.g., Huelsenbeck et al., 1998; Maddison et al., 1999; Sanderson et al., 2000). This approach has proven useful for quantifying type I and type II error rates in tree reconstruction, by generating simulated data based on parameters (including tree structure) estimated from the original data set (Sanderson et al., 2000). Sanderson et al. (2000) found biases favouring the Gnetales-sister topology, and against recovering the anthophyte hypothesis, when MP was the optimality criterion, for several plastid genes. More recently, Burleigh and Mathews (2007b) used this technique on a 12-locus data set that included sequence information from all three genomic compartments, and found similar biases when MP was used. However, they also found that when maximum likelihood (ML) is used instead, the apparent bias is mostly limited to one against recovering the anthophyte hypothesis.

I present sequence data from 17 slowly evolving plastid genes for a broad sampling of the major seed-plant clades to address deep seed-plant phylogeny. The conservative nature of these characters is predicted to be particularly useful in the reconstruction of deep phylogenetic relationships, as they should be less prone to long-branch attraction than faster genes (Felsenstein, 1983; Graham and Olmstead, 2000a), and they have proven useful for other deep and difficult questions in vascular-plant phylogeny (e.g., Graham and Olmstead, 2000a; Graham et al., 2006; Saarela et al., 2007; Zgurski et al., 2008; Chapters 2, 3, 5). All three families of Gnetales and a sampling of cycads (Chapter 2) are included here, along with representatives of all major branches of conifer phylogeny (Chapter 3), Ginkgo, and a representative sampling from the basal splits of angiosperm phylogeny.
This study is the largest to date at this level of taxon sampling (in terms of the amount of data per taxon), and it focuses exclusively on the plastid genome (cf. Burleigh and Mathews, 2004). I use the methodology outlined in Sanderson et al. (2000) to explore the possibility that systematic error may badly distort some aspects of seed-plant phylogenetic inference from plastid data, and assess the potential for this problem using two different phylogenetic criteria (MP and ML). I repeat these analyses for different sub-partitions of the data that might be expected to favour different phylogenetic hypotheses because of their different rates (rapid vs. slow): two codon-position partitions (after Sanderson et al., 2000), and re-partitionings of the protein-coding data into sites classified as slowly vs. rapidly evolving (after Burleigh and Mathews, 2004).

4.2 MATERIALS AND METHODS
4.2.1 Taxonomic and Genomic Sampling
I surveyed 17 genes that collectively represent approximately one-eighth to one-ninth of the gymnosperm plastid genome. The coding regions include atpB, rbcL, ten photosystem II (psb) genes, three ribosomal protein genes, and two NADH dehydrogenase subunit genes (see Chapters 2, 3). The final matrix includes five species selected from a broad sample of the diversity of basal angiosperms (see Mathews and Donoghue, 1999; Parkinson et al., 1999; Soltis et al., 1999; Graham and Olmstead, 2000a, b; Graham et al., 2000; Qiu et al., 2000; Savolainen et al., 2000; APG II, 2003), Ginkgo biloba, two cycads (Cycas revoluta and Dioon purpusii), three Gnetales (representing the three extant families), 19 conifers (the Pinus thunbergii sequence was obtained from GenBank; accession number NC_001631), and five outgroup species.
The outgroups are two pteridophytes (Adiantum capillus-veneris and Psilotum nudum; GenBank accession numbers NC_004766 and NC_003386) and three bryophytes (Anthoceros formosae, Marchantia polymorpha and Physcomitrella patens; GenBank accession numbers NC_004543, NC_00319 and NC_005087, respectively). The 19 conifer species examined include at least one representative from each of the eight families recognized by Farjon (2001), the largest sampling to date of this diverse clade for this amount of plastid data. These sequences are a subset of those reported in Chapter 3. GenBank numbers for the cycads, Gnetales and angiosperms are provided in Table 2.1 and Graham and Olmstead (2000a, b). GenBank numbers for the 19 conifers are provided in Table 3.1. The final alignment used here comprises 35 taxa (30 seed-plant taxa and five outgroups; note that three conifer taxa from the larger taxon set considered in Chapter 3 are excluded here).

4.2.2 DNA Extraction, Amplification, Sequencing and Data Assembly
I extracted DNA from fresh and silica-dried specimens using the protocol of Doyle and Doyle (1987), as modified in Chapter 2. DNA amplification and sequencing methods are outlined in Graham and Olmstead (2000a). I sequenced all regions at least twice for each taxon and, with a few minor exceptions, completely sequenced all products in both forward and reverse directions. I designed a set of 27 new seed-plant-specific primers to facilitate amplification and sequencing (Table 4.1). I compiled sequence contigs and performed base calling using Sequencher 4.1 (Gene Codes Corporation, Ann Arbor, MI). I added these data to a previously generated alignment (Graham et al., 2006) and adjusted it manually in Se-Al version 1.0 (Rambaut, 1998), using alignment criteria outlined in Graham et al. (2000). I used tobacco, Ginkgo and Pinus sequences to determine gene and exon boundaries, and the codon position of each nucleotide.
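Once gene and exon boundaries are fixed on a reference sequence, assigning a codon position to each coding nucleotide is simple bookkeeping. The helper below is a hypothetical sketch, assuming exon coordinates are 1-based, inclusive, listed in reading order on the forward strand of an ungapped reference; it does not handle alignment gaps, reverse-strand genes, or overlapping reading frames such as the psbC/psbD overlap.

```python
def codon_positions(n_sites, exons):
    """Assign codon position 1, 2 or 3 to each 0-based site, or None if non-coding.

    exons: (start, end) pairs, 1-based and inclusive, in reading order on the
    forward strand; the reading frame continues across exon boundaries, so
    intron sites are skipped without resetting the frame.
    """
    positions = [None] * n_sites
    coding_index = 0
    for start, end in exons:
        for site in range(start - 1, end):
            positions[site] = coding_index % 3 + 1
            coding_index += 1
    return positions


# Hypothetical gene: exon 1 = sites 1-5, intron = sites 6-9, exon 2 = sites 10-13.
print(codon_positions(13, [(1, 5), (10, 13)]))
# [1, 2, 3, 1, 2, None, None, None, None, 3, 1, 2, 3]
```

Carrying the frame across the intron is what keeps the second exon's first site at position 3 here, rather than restarting at position 1.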
I decided to focus exclusively on coding regions for the analyses in this chapter, because it is not yet possible to simulate indels in non-coding regions realistically; these can be large and often have complex patterns of overlap. I obtained most of the regions for most taxa, but several regions have either been lost from the plastid genome (e.g., ndhB and ndhF for all representatives of Pinaceae examined to date; Raubeson and Jansen, 1992; Wakasugi et al., 1994; ndh genes for Welwitschia, McCoy et al., 2008) or could not be amplified or sequenced (see Table 3.1 for further details; rpl2 has since been reported as present but hard to align in Welwitschia; McCoy et al., 2008). I coded these regions as missing data in the final matrix. The aligned coding regions considered for analysis comprise 12,635 bp per taxon (corresponding to 7,441 bp unaligned in Welwitschia, which has the most genes missing in the matrix, and 11,396 bp unaligned in Ginkgo, a more typical size for most taxa). Of these characters, 1,659 bp are variable but parsimony uninformative across land plants, and 4,220 bp are parsimony informative.

4.2.3 Phylogenetic Analyses
I performed a heuristic MP search using PAUP* (Swofford, 2002), with all characters and character-state changes equally weighted, using TBR (tree-bisection-reconnection) branch swapping and 100 random-addition replicates, and otherwise default settings. I also performed an ML heuristic search using PhyML (v.2.4.4; Guindon and Gascuel, 2003), with a BIONJ starting tree, NNI (nearest-neighbour-interchange) branch swapping, and model parameters estimated from the data in each case. I chose a DNA substitution model for ML analysis using the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC) in Modeltest v. 3.7 (Posada and Crandall, 1998). The optimal model in both cases was GTR + Γ
+ I [a general-time-reversible (GTR) rate matrix, with a proportion of invariable sites (I), and among-site rate variation modelled using a gamma (Γ) distribution described by the shape parameter alpha (α)]. I performed non-parametric bootstrap analysis (Felsenstein, 1985) using the same search criteria, with 100 bootstrap replicates (and a single random-addition replicate per parsimony bootstrap replicate). I use ‘weak,’ ‘moderate,’ and ‘strong’ in reference to clades that have bootstrap support values < 70%, 70-89%, and ≥ 90%, respectively (e.g., Graham et al., 1998).

4.2.4 Inference of Nucleotide Rate Classes
I partitioned the data into different rate classes using HyPhy (Kosakovsky Pond et al., 2005). HyPhy allows the data to be partitioned by site rate, based on an estimate of the most likely rate class for each nucleotide given a specified model (in this case the GTR model with eight discrete rate classes) and a user-supplied tree (here the best MP topology). HyPhy uses the tree to estimate all likelihood parameters, and assigns each site to its most likely rate category (the total number of rate classes is user specified). I partitioned the data set into nine rate classes (RC0 representing sites with no change, and RC8 the fastest sites).

4.2.5 Systematic Error
I used Monte Carlo simulations (Sanderson et al., 2000) to explore the possibility that long-branch attraction (LBA) or other types of systematic bias may badly distort major aspects of seed-plant phylogenetic inference using MP or ML analysis. To generate model trees for the simulations, I performed constrained ML searches of the full data set in PAUP*, in which the topological constraints reflect particular hypotheses of monophyly. I investigated constraints corresponding to four major hypotheses of seed-plant relationship (see Fig.
4.1; the constrained branch or branches defining each hypothesis are marked with an asterisk): A – Gnetales-sister (Gnetales as the sister group of all other seed plants); B – Gnepine (Gnetales as the sister group of Pinaceae); C – Gnetifer (Gnetales as the sister group of monophyletic conifers); D – Anthophyte (Gnetales as the sister group of angiosperms). Using each resulting ML tree, I simulated 1000 new data sets with Seq-Gen (v.1.3.2; Rambaut and Grassly, 1997; because of time constraints, only the first 100 of these were considered in subsequent ML searches in each case), under the same model used for the constrained ML searches (i.e., GTR + Γ + I), with the estimated ML branch lengths and model parameters in each case. These simulated matrices were all set to the same size as the original data partitions that they were based on (12,635 bp for the full data set). I then performed unconstrained MP and ML searches on the simulated data sets using the same search settings as for the real data (e.g., allowing model parameters to be estimated from each simulated data set in the ML searches). These were run using the batch modes of PAUP* and PhyML, respectively. Trees resulting from these searches were imported into PAUP* and scored for specific hypotheses (i.e., topologies corresponding to each set of asterisked branches in Fig. 4.1), using tree filtering to separate the topologies that satisfy each constraint (or constraint set; Fig. 4.1A, C) from those that do not, in turn.
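Scoring a simulated tree against the four hypotheses amounts to checking whether it contains the clade, or set of clades, that each constraint specifies. The sketch below assumes rooted trees written as nested Python tuples, collapses each major lineage into a single hypothetical terminal (e.g., "Gnetales", "Pinaceae", "OtherConifers"), and uses simplified, assumed constraint sets; the actual constraints are those marked in Fig. 4.1, and the scoring in this chapter was done with tree filtering in PAUP*. Trees matching none of the hypotheses are placed in the "other topologies recovered" category.

```python
def clades(tree):
    """Return every clade (as a frozenset of leaf names) in a rooted tree of nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


# Hypothetical, simplified constraint sets: each hypothesis lists the clade(s)
# that must all be present in a tree for it to be scored as that hypothesis.
HYPOTHESES = {
    "A Gnetales-sister": [{"Angiosperms", "Ginkgo", "Cycads", "Pinaceae", "OtherConifers"}],
    "B Gnepine": [{"Gnetales", "Pinaceae"}],
    "C Gnetifer": [{"Pinaceae", "OtherConifers"}, {"Gnetales", "Pinaceae", "OtherConifers"}],
    "D Anthophyte": [{"Angiosperms", "Gnetales"}],
}


def score_tree(tree, hypotheses=HYPOTHESES):
    """Name the hypothesis whose constrained clades all occur in `tree`, else 'other'."""
    present = clades(tree)
    for name, constraint in hypotheses.items():
        if all(frozenset(clade) in present for clade in constraint):
            return name
    return "other topologies recovered"


gnepine_tree = ("Outgroup", ("Angiosperms", (("Ginkgo", "Cycads"),
                (("Gnetales", "Pinaceae"), "OtherConifers"))))
print(score_tree(gnepine_tree))  # B Gnepine
```

Because the four constraint sets are mutually incompatible on a fully resolved tree, at most one hypothesis can match, so the order in which they are checked does not affect the result.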
I scored trees that did not fall into the four constraint categories explored here as “other topologies recovered.” The results are summarized in a matrix in which the diagonal element of each row gives the probability of recovering the relationship specified by the hypothesis used to simulate that row (i.e., one minus an estimate of the type I error), and the off-diagonal elements of each column give the probability of reconstructing an incorrect relationship given the null hypothesis specified in that column (i.e., estimates of type II error). I repeated these simulation analyses on four different partitions of the data:
(1) combined data for the first two codon positions of the plastid genes (ignoring a small overlap in psbC and psbD, in which each site falls into two different codon-position classes);
(2) third codon-position sites from all genes combined;
(3) the invariant class and the six slowest variable rate classes defined by HyPhy, RC0-RC6 (where RC1 is the slowest variable class, and RC6 the fastest of these);
(4) the two fastest rate classes, RC7 + RC8 (where RC8 is the fastest class of all).

4.3 RESULTS
4.3.1 Phylogenetic Analysis of the Real Data
Maximum parsimony and likelihood (MP and ML) analyses of the full data set recover the same phylogenetic relationships (subtree topologies) within angiosperms and within conifers, the two major seed-plant clades for which I examined more than two taxa per clade (Figs. 4.2, 4.3, Supplementary Figure). They strongly support the monophyly of seed plants as a whole, of each of the four non-monotypic clades (angiosperms, conifers, cycads, Gnetales), and of Pinaceae (94-100% support; Figs. 4.2, 4.3). However, MP and ML strongly conflict concerning the relative arrangement of most of the major seed-plant clades.
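The four data partitions listed at the end of Section 4.2.5 can be assembled from two per-site annotations: a codon position and a rate class for each aligned site. The sketch below is a hypothetical illustration rather than the workflow used here (HyPhy produced the rate classes): it assumes per-site log-likelihoods for each of the nine rate classes have already been computed on a fixed tree and model, assigns each site to its most likely class, and then collects site indices for each partition; non-coding sites (codon position None) are skipped.

```python
def most_likely_rate_class(site_loglik):
    """Index of the highest log-likelihood rate class for each site.

    site_loglik[i][k] is the log-likelihood of site i under rate class k,
    with classes ordered from slowest (0, no change) to fastest.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in site_loglik]


def build_partitions(codon_pos, rate_class, fast_from=7):
    """Site indices for the partitions codon 1+2, codon 3, RC0-6 and RC7+RC8."""
    coding = [i for i, p in enumerate(codon_pos) if p is not None]
    return {
        "codon12": [i for i in coding if codon_pos[i] in (1, 2)],
        "codon3": [i for i in coding if codon_pos[i] == 3],
        "RC0-6": [i for i in coding if rate_class[i] < fast_from],
        "RC78": [i for i in coding if rate_class[i] >= fast_from],
    }


def subset(alignment, sites):
    """Extract the given columns from a list of equal-length sequences."""
    return ["".join(seq[i] for i in sites) for seq in alignment]


# Toy example with six coding sites and assumed rate classes.
parts = build_partitions([1, 2, 3, 1, 2, 3], [0, 7, 8, 3, 1, 6])
print(parts["RC78"], subset(["ACGTAC", "ACGTTT"], parts["codon3"]))  # [1, 2] ['GC', 'GT']
```

Keeping the partitions as index lists, rather than copying sequences early, makes it easy to cross-tabulate codon position against rate class, which is the comparison summarized in Fig. 4.4.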
Maximum parsimony analysis strongly supports a clade consisting of cycads, Ginkgo and conifers, with the latter two groups moderately supported as sister groups, and with the angiosperms strongly supported as the sister group of all seed plants except Gnetales (Fig. 4.2). In contrast, ML analysis strongly supports a sister-group relationship between conifers and a clade consisting of Ginkgo, cycads and angiosperms, with the latter two groups strongly supported as sister taxa (Fig. 4.3). A major feature common to the MP and ML analyses of the full data set is that Gnetales are depicted as the sister group of all other seed plants, with strong support (Figs. 4.2, 4.3).

Many characters that might be assumed to be rapidly evolving because they are at codon position 3 are actually distributed throughout the rate classes determined by HyPhy (Fig. 4.4A, B). For example, ~41% of all sites in the moderately evolving rate classes (RC3-6; 2820 sites in total) belong to codon position 3 (1148 sites); conversely, ~25% of the characters in the fastest rate classes (RC78; 3059 sites in total) belong to the “conservative” codon positions 1 and 2 (774 sites; Fig. 4.4A). Nonetheless, as might be expected, codon positions 1 and 2 contributed more characters than codon position 3 to each of the middle three rate classes (RC3-5; Fig. 4.4B). Analyses of the codon-position and rate-class data partitions reveal that the Gnetales-sister relationship is strongly supported in MP and ML analyses of the two most rapidly evolving partitions of the real data (i.e., codon position 3 and RC78; headers in Tables 4.2, 4.3). In contrast, the slower data partitions typically provide moderate support for one of the two relationships considered between Gnetales and conifers: the gnepine hypothesis was moderately to strongly supported by MP and ML analysis of codon positions 1 and 2 (Fig.
4.5; headers in Tables 4.2 and 4.3), while the gnetifer hypothesis was weakly supported by ML analysis of RC0-6 (Fig. 4.5; header in Table 4.3). MP analysis of RC0-6, by contrast, yields moderate support for the Gnetales-sister hypothesis (Table 4.2).

4.3.2 Inference of Systematic Error Using Monte Carlo Simulations
For hypothesis D (the anthophyte hypothesis), maximum likelihood assigned zero length to the branch subtending this clade in the tree used for simulations (i.e., the tree inferred under the constraint that angiosperms plus Gnetales form a clade; Fig. 4.1D; see the arrow pointing to the trichotomy in Fig. 4.6). Because I used the ML branch lengths to generate all simulated data, I discuss the simulation results based on this tree (Table 4.4) separately from those for the other three hypotheses (A-C). I describe the results of analyses of the simulated data by referring first to MP and then to ML results for the two most slowly evolving data partitions (RC0-6, and codon positions 1 and 2), then the two fastest data partitions (RC78, and codon position 3), and finally the full data set. Recall that if there were no error in tree inference from the simulated data, the model tree (whatever it is) would always be recovered for a particular data partition. In other words, in Tables 4.2 and 4.3 I should ideally always infer high values along the diagonals and zero (or low) values in the off-diagonals (i.e., low type I and type II error, respectively) of the 3 x 3 matrix that summarizes the results for the model trees considered for that data partition (Gnetales-sister, gnepine and gnetifer, respectively). For MP analysis, the data partitions that showed the lowest error rates are RC0-6 (close to 100% on the matrix diagonals) and codon positions 1 and 2 (91% recovery for Gnetales-sister, 100% for gnepine, and 58% for gnetifer; in the last case, gnepine was the most common mis-inference, at 36%).
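The 3 x 3 summaries described above are row-normalized tallies of which hypothesis was inferred from each simulated data set. A minimal sketch of that bookkeeping follows, assuming each simulation outcome is recorded as a (model hypothesis, inferred hypothesis) pair; the function name and the toy outcomes are hypothetical and are not the values in Tables 4.2 or 4.3.

```python
from collections import Counter


def error_matrix(results, hypotheses):
    """Summarize simulation outcomes as row-normalized proportions.

    results: iterable of (model_hypothesis, inferred_hypothesis) pairs, one per
    simulated data set. Rows are the hypotheses used to simulate the data and
    columns the hypotheses inferred; inferred trees outside `hypotheses` are
    pooled in an "other" column. Diagonal entries estimate 1 - type I error;
    off-diagonal entries show how often a hypothesis is inferred when another
    hypothesis generated the data.
    """
    results = list(results)
    columns = list(hypotheses) + ["other"]
    matrix = {}
    for model in hypotheses:
        tally = Counter(
            inferred if inferred in hypotheses else "other"
            for simulated, inferred in results
            if simulated == model
        )
        n = sum(tally.values())
        matrix[model] = {col: tally[col] / n if n else 0.0 for col in columns}
    return matrix


# Toy outcomes: 10 data sets simulated under each of two hypotheses.
toy = [("A", "A")] * 9 + [("A", "B")] + [("B", "B")] * 5 + [("B", "X")] * 5
for row, cells in error_matrix(toy, ["A", "B"]).items():
    print(row, cells)
```

Pooling unmatched trees in an explicit "other" column keeps each row summing to one, so rows with many unexpected topologies are not mistaken for rows with low error.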
The most poorly performing data partitions are RC78 and codon position 3 (Gnetales-sister inferred in nearly all instances; Table 4.2). For the full data set, MP inferred the correct tree when either the Gnetales-sister or the gnepine hypothesis was used as the model tree, but did rather poorly when the gnetifer hypothesis was the model (this hypothesis was recovered only 30% of the time, and Gnetales-sister 59% of the time).

For ML, the simulated RC0-6 data again permitted very high recovery (100% in each case) of the model hypotheses A-C (Table 4.3). For the codon position 1 and 2 partition, when the Gnetales-sister hypothesis was the model tree, it was recovered slightly less often than in the corresponding MP case (81% for ML vs. 91% for MP; Tables 4.2 and 4.3). In contrast, when the gnetifer hypothesis was the model tree for this data partition, it was recovered more frequently (91% for ML vs. 58% for MP). The most rapidly evolving data partitions also performed less well than the slower partitions under ML. For RC78, ML is less error-prone than MP for two of the model trees: when gnepine and gnetifer were used as model trees, they were typically recovered from this data partition with ML (100% and 76%, respectively; Table 4.3), but rarely or never with MP (6% and 0%, respectively; Table 4.2). The most poorly performing data partition for ML, by far, was codon position 3, for which I inferred the Gnetales-sister hypothesis 100% of the time for all three model trees (A-C). However, for the full data set, ML recovered all three model trees (A-C) with no inferred error (100% of cases; Table 4.3), even though third-codon-position sites constitute 58% of all 5879 variable sites.

4.3.3 Mis-inference of the Gnetales-Sister Hypothesis when there is No Evidence for It
I noticed that the anthophyte topology had no support from the real data in an ML framework (Fig.
4.6; see the zero-length branch in the left-hand phylogram). ML analysis of all five simulated data partitions considered here led to the inference of three different hypotheses when the anthophyte topology was used as the model tree, with a roughly even split among them (Table 4.4). For example, for RC0-6, Gnetales-sister is inferred 31% of the time, the anthophyte hypothesis 39% of the time, and an additional hypothesis (with Gnetales as the sister group of the rest of the gymnosperms) 30% of the time. This indecision makes sense, as these three relationships are the only possible resolutions of the underlying trichotomy given the zero-length ML branch. In contrast, MP analyses of the same simulated data (with this zero-length ML branch for anthophytes; Fig. 4.6, left-hand side) recovered Gnetales-sister 100% of the time for three of the five data partitions (i.e., the full data, codon position 3 and RC78; Table 4.4). The only partition that came close to behaving as well as ML for the anthophyte hypothesis was RC0-6: MP analysis of this data partition yielded a roughly even split among the same three hypotheses found in the corresponding ML analyses (Table 4.4; ~41.5% for Gnetales-sister, ~31.5% for the anthophyte case and 27% for ‘Gnetales sister to other gymnosperms’).

4.4 DISCUSSION
There has been a long history of ambiguous and conflicting inferences of seed-plant phylogeny from a variety of data sources. It is worth remembering that these inferences involve very deep branches of land-plant phylogeny (seed plants reach back at least 350 Myr; e.g., Elkinsia polymorpha; Serbet and Rothwell, 1992) and a very sparse sampling of the total diversity of major seed-plant clades (molecular data can only consider the five living groups of seed plants).
Potential approaches to dealing with mis-inference of seed-plant phylogeny due to problematic long branches include considering conservative data (e.g., Graham and Olmstead, 2000a), removing data that are inferred (or presumed) to be rapidly evolving (e.g., Burleigh and Mathews, 2004), including a reasonable density of taxa within major seed-plant clades (e.g., Rydin et al., 2002), using model-based approaches such as ML in preference to MP (e.g., Burleigh and Mathews, 2004, 2007b), and using simulated data to examine problematic data partitions (Sanderson et al., 2000; Burleigh and Mathews, 2004, 2007a, 2007b). These methods can also be combined (e.g., Burleigh and Mathews, 2007b), which is the approach I took here, focussing exclusively on the performance of data from the plastid genome. Most molecular studies of seed-plant phylogeny have surveyed a few genes for many taxa (e.g., four genes for 88 gymnosperms in Rydin et al., 2002), or multiple genes for a few taxa (e.g., 12 genes for 10 gymnosperms in Burleigh and Mathews, 2007b). I considered 17 genes from 25 gymnosperms, with particularly heavy sampling of the conifers (19 in total, representing all major clades). I focussed exclusively on conservative protein-coding regions in the plastid genome in an attempt to refine our understanding of the pitfalls of using this genome to infer deep branches of land-plant phylogeny. I considered slower and faster data partitions, including first and second vs. third codon positions (changes at which largely reflect non-synonymous and synonymous changes, respectively; Sanderson et al., 2000), and data filtered to remove the fastest-evolving ML rate classes. I also performed simulations based on these real data for a range of seed-plant hypotheses proposed in the literature concerning the local placement of Gnetales in seed-plant phylogeny. Simulations cannot tell us the correct placement of Gnetales in seed-plant phylogeny.
However, they can indicate which relationships may be hard to infer, and the conditions under which mis-inferences may occur. I generally find ML to be less error-prone than MP, except for third-codon-position data, for which I consistently found a strong bias towards the Gnetales-sister hypothesis using both methods. The Gnetales-sister result has been seen in previous studies of seed-plant phylogeny (Rydin et al., 2002), including earlier iterations of this genomic data set (Chapters 2, 3; Zgurski et al., 2008). Although more than half of the variable characters in the full data set belong to codon position 3 (and also to RC78), ML simulations of the full data set do not exhibit the same bias towards the Gnetales-sister hypothesis (100% recovery of each hypothesis considered; Table 4.3). The simulations indicate that ML can ‘correct’ for poor characters (those that tend to lead to tree misinference), particularly when they do not form the entire data set. The more slowly evolving data partitions also tend to be less problematic for both MP and ML analyses.

It is common practice in studies that use protein-coding regions to partition data by codon position and subsequently exclude so-called ‘saturated’ third-codon-position data. I show that filtering out the two fastest ML rate classes is a more effective strategy for reducing systematic error than simply considering codon positions 1 and 2. The first two codon positions include some very rapidly evolving sites (~9% of the 8391 sites at codon positions 1 and 2 are in RC78; Fig. 4.4A), consistent with the diverse range of functional constraints found even in highly conserved proteins such as rbcL (Kellogg and Juliano, 1997). Conversely, a substantial fraction of moderately evolving characters may be thrown out when all codon position 3 sites are excluded from analysis; ~33% of all 3433 variable third-position sites lie outside the fastest two rate classes (Fig. 4.4A).
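The composition figures quoted in this chapter can be rechecked from the raw site counts it reports. The short calculation below uses only those counts; its one added assumption, flagged in a comment, is that every site in RC7 or RC8 is variable, which lets the number of variable third-position sites outside RC78 be derived rather than read from Fig. 4.4. Under that assumption the derived count equals the 1148 third-position sites reported for RC3-6.

```python
# Site counts reported in the text (Sections 4.2.2, 4.3.1 and 4.4).
total_sites = 12635    # aligned coding sites
variable_sites = 5879  # 1,659 variable-uninformative + 4,220 informative
codon12_sites = 8391   # sites at codon positions 1 and 2
variable_pos3 = 3433   # variable third-position sites
rc78_sites = 3059      # sites in the two fastest rate classes
rc78_pos12 = 774       # RC78 sites at codon positions 1 and 2
rc3_6_sites = 2820     # sites in the moderately evolving classes RC3-6
rc3_6_pos3 = 1148      # RC3-6 sites at codon position 3

# Assumption: every RC7/RC8 site is variable, so all RC78 third positions are variable.
rc78_pos3 = rc78_sites - rc78_pos12
pos3_outside_rc78 = variable_pos3 - rc78_pos3

shares = {
    "pos3 share of variable sites": variable_pos3 / variable_sites,
    "pos3 share of RC3-6": rc3_6_pos3 / rc3_6_sites,
    "pos1+2 share of RC78": rc78_pos12 / rc78_sites,
    "RC78 share of pos1+2 sites": rc78_pos12 / codon12_sites,
    "variable pos3 outside RC78": pos3_outside_rc78 / variable_pos3,
}
sites_kept = {
    "drop codon position 3": codon12_sites,
    "drop RC7 + RC8": total_sites - rc78_sites,
}

for label, value in shares.items():
    print(f"{label}: {value:.1%}")
print(sites_kept)
```

Printed to one decimal place, the shares match the rounded values in the text (58%, ~41%, ~25%, ~9% and ~33%), and the rate-class filter retains more sites than the codon-position filter while discarding the fast first- and second-position sites that the latter keeps.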
Oddly, however, despite containing a larger fraction of conservative sites than the least conservative data (i.e., RC78, which includes only these most rapid sites), the third codon position is clearly the most error-prone data partition considered here for ML analysis (Table 4.3). This strongly suggests that the very strong tendency I saw for the codon position 3 data partition to lead to tree mis-inference (Tables 4.2, 4.3) is not solely a function of the high fraction of high-rate sites that it contains. One of the major hypotheses of seed-plant relationships, the anthophyte hypothesis, is only rarely recovered from molecular data (e.g., Stefanovic et al., 1998), and never with strong support. When I forced the real data to fit this hypothesis, I observed a startling contrast in how it is perceived by ML vs. MP. According to ML, no characters support this hypothesis (Fig. 4.6); in contrast, MP infers numerous characters supporting it in the real data. The source of the conflict between MP and ML branch lengths in this situation is unclear. Nonetheless, it is of interest to see how ML and MP perform for a model hypothesis for which there are no supporting data (I used the ML estimates of branch lengths to generate the simulated data). In this situation, one might expect the relative positions of the three clades in the trichotomy resulting from this zero-length anthophyte branch (i.e., angiosperms, Gnetales and the remaining gymnosperms) to be resolved at random. This is indeed what I see for ML. Disturbingly, however, most MP analyses of data sets simulated under the anthophyte hypothesis (Table 4.4) strongly infer a result (the Gnetales-sister hypothesis) that is completely unsupported by these simulated data, as indicated by the trichotomy inferred when the real data are constrained to the anthophyte hypothesis (i.e., the model tree used for simulation; Fig. 4.6).
This underlines the need for considerable caution in all inferences of seed-plant phylogeny from molecular data. It also suggests that simulation studies of the error of tree inference may need to consider a range of branch-length hypotheses for a given model tree topology, particularly if we suspect that ML inference of branch length is itself subject to strong bias (e.g., if evolution is heterotachous and the ML model is not). My results also bring to light a more subtle problem. The simulation results suggest that ML is generally much less prone to tree mis-inference than MP. For example, for the three major hypotheses shown in Table 4.3, and considering the slowest rate classes (RC0-6), the first two codon positions, or the full data set, ML typically infers whichever tree I pose as the model tree (81-100% recovery along the diagonals of the 3 × 3 matrix). The simulations therefore suggest that ML analysis of my plastid data for these three partitions should be trustworthy. And yet, I infer different hypotheses using ML analysis of the real data for these partitions (Figs. 4.3, 4.5): for RC0-6 I infer the gnetifer tree (with weak support, although I also find moderate support, 84%, for the monophyly of conifers); for the first two codon positions I infer the gnepine tree (with moderate support, 88%); and for the full data set I infer the Gnetales-sister hypothesis (with strong support). This continuing conflict among partitions of the data, in situations where simulations predict the analyses to be effective, demonstrates that the simulations and tree-inference methods must not be capturing one or more critical aspects of the underlying (real) model. Future studies should therefore consider more complex ML models, including ones that may take account of heterotachy or other aspects of model heterogeneity across a tree (e.g., Chapter 2, and see Tuffley and Steel, 1998; Kolaczkowski and Thornton, 2004, 2008; Zhou et al., 2007).

TABLE 4.1.
New primers designed for this study.

Primer  Sequence (5’-3’)                  Gene/region
B2F     CGTTCTAGTGCGTTGTAKATTC            3’-rps12
B3R     GATTGGAAATCRTGTATTTTC             3’-rps12
B4F     GTATGTACGGTTTGGAGGGAG             3’-rps12
B4R     GCATGAGTGTGAAAAAGGTTCC            3’-rps12 – rps7 IGS
B5F     CGTATTCTTAAACACGGAAAAAAATC        rps7
C5F     ACTTGCYATTCGTTGGTTATTAG           rps7
B6F     CAARAAGGAAGAGAYTCATAAAATG         rps7
B6R     CATTTTATGARTCTCTTCCTTYTTG         rps7
B7F     GGTTCTATTTCATCTCTTYAACAAG         ndhB
B7R     ATYAGRAGAAGAAATAGGCC              ndhB
C7R     GTTRAAGAGATGAAATAGAACCAAG         ndhB
8F      ACTYTATGTATTCCTCTATCCG            ndhB
B9F     TCTGGATATACCAARAGAGATGTAC         ndhB
B9R     GTACATCTCTYTTGGTATATCCAG          ndhB
C10F    TGGTCTTATMAATACACAAATG            ndhB
D10F    TTTRCAAGTTMGTTATTACGGGTAG         ndhB
D10R    CGAATCRCACTCCTTCATATAC            ndhB
B11F    GAGAATCAAACGATTATGCTCATTTTTTTATC  ndhB
B12F    GGAGCCGTGCGAGAWGAAAG              ndhB
B13R    ATRCAAGCAAAAGTTCCTAAATTC          ndhB
B14R    CACCAGAATAGATAAAGTTTTCC           ndhB
B40F    GGGTTGGTCCGGTCTATTGCTYTTTC        psbD
C40F    CTATTGCTYTTTCCTTGYGCTTATTTTGC     psbD
B83R    AAATCAAGTCCACCRCGTAGACATTC        rbcL
C91F    TTGTGAGGTACARCAATTATTAGG          atpB
B92R    TCCACYACTTTAATTCCTGTTTC           atpB
B93F    GGAAATGATCTTTAYATGGAAATG          atpB

TABLE 4.2. Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum parsimony (MP) as a search criterion. The first row in each case notes the best-supported hypothesis as inferred from the real data (marked ‘X’), and its bootstrap support; the remaining rows indicate the constraint (major hypothesis) used to infer the model trees used for Monte Carlo simulations, and columns indicate the fraction of trees in 1000 simulations that are inferred for the corresponding major hypothesis.
Columns: A = Tree A (Gnetales-sister); B = Tree B (Gnepine); C = Tree C (Gnetifer); D = Tree D (Anthophyte); Other = other topologies recovered.

Data partition / row          A        B        C        D        Other
All codon positions
  Real data                   X (100%), Tree A
  Tree A (Gnetales-sister)    1.000    0.000    0.000    0.000
  Tree B (Gnepine)            0.008    0.992    0.000    0.000
  Tree C (Gnetifer)           0.589    0.053    0.301    0.000    0.057
Codon positions 1 and 2
  Real data                   X (96%), Tree B
  Tree A (Gnetales-sister)    0.908    0.000    0.000    0.035    0.057
  Tree B (Gnepine)            0.000    1.000    0.000    0.000
  Tree C (Gnetifer)           0.000    0.364    0.582    0.000    0.054
Codon position 3
  Real data                   X (100%)
  Tree A (Gnetales-sister)    1.000    0.000    0.000    0.000
  Tree B (Gnepine)            1.000    0.000    0.000    0.000
  Tree C (Gnetifer)           1.000    0.000    0.000    0.000
Rate classes (RC) 0-6
  Real data                   X (80%)
  Tree A (Gnetales-sister)    0.993    0.000    0.000    0.007
  Tree B (Gnepine)            0.000    0.993    0.004    0.000    0.003
  Tree C (Gnetifer)           0.000    0.004    0.996    0.000
RC78
  Real data                   X (100%)
  Tree A (Gnetales-sister)    1.000    0.000    0.000    0.000
  Tree B (Gnepine)            0.939    0.061    0.000    0.000
  Tree C (Gnetifer)           1.000    0.000    0.000    0.000

TABLE 4.3. Major seed-plant hypotheses inferred from various partitions of real and simulated data using maximum likelihood (ML) as a search criterion. The first row in each case notes the best-supported hypothesis as inferred from the real data (marked ‘X’), and its bootstrap support; the remaining rows indicate the constraint (major hypothesis) used to infer the model trees used for Monte Carlo simulations, and columns indicate the fraction of trees in 100 simulations that are inferred for the corresponding major hypothesis.
Columns: A = Tree A (Gnetales-sister); B = Tree B (Gnepine); C = Tree C (Gnetifer); D = Tree D (Anthophyte); Other = other topologies recovered.

Data partition / row          A        B        C        D        Other
All codon positions
  Real data                   X (100%), Tree A
  Tree A (Gnetales-sister)    1.00     0.00     0.00     0.00
  Tree B (Gnepine)            0.00     1.00     0.00     0.00
  Tree C (Gnetifer)           0.00     0.00     1.00     0.00
Codon positions 1 and 2
  Real data                   X (88%), Tree B
  Tree A (Gnetales-sister)    0.81     0.00     0.00     0.09     0.10
  Tree B (Gnepine)            0.00     1.00     0.00     0.00
  Tree C (Gnetifer)           0.00     0.05     0.91     0.00     0.04
Codon position 3
  Real data                   X (100%)
  Tree A (Gnetales-sister)    1.00     0.00     0.00     0.00
  Tree B (Gnepine)            1.00     0.00     0.00     0.00
  Tree C (Gnetifer)           1.00     0.00     0.00     0.00
Rate classes (RC) 0-6
  Real data                   X (65%), Tree C
  Tree A (Gnetales-sister)    1.00     0.00     0.00     0.00
  Tree B (Gnepine)            0.00     1.00     0.00     0.00
  Tree C (Gnetifer)           0.00     0.00     1.00     0.00
RC78
  Real data                   X (100%)
  Tree A (Gnetales-sister)    1.00     0.00     0.00     0.00
  Tree B (Gnepine)            0.00     1.00     0.00     0.00
  Tree C (Gnetifer)           0.00     0.11     0.76     0.00     0.13

TABLE 4.4. Major seed-plant hypotheses inferred from simulations of various partitions of the real data constrained to the anthophyte hypothesis (Gnetales united with angiosperms). Both maximum parsimony (MP) and maximum likelihood (ML) results are shown. The rows indicate the data partition used to simulate data given the constraint tree (Tree D, anthophyte), and columns indicate the fraction of trees (MP = 1000 simulations; ML = 100 simulations) that are inferred for the corresponding major hypothesis.
Columns: A = Tree A (Gnetales-sister); B = Tree B (Gnepine); C = Tree C (Gnetifer); D = Tree D (Anthophyte); Other* = other topologies recovered.

Data partition                A        B        C        D        Other*
Maximum parsimony
  All codon positions         1.000    0.000    0.000    0.000
  Codon positions 1 and 2     0.745    0.000    0.000    0.077    0.178
  Codon position 3            1.000    0.000    0.000    0.000
  Rate classes (RC) 0-6       0.415    0.000    0.000    0.316    0.269
  RC78                        1.000    0.000    0.000    0.000
Maximum likelihood
  All codon positions         0.41     0.00     0.00     0.23     0.36
  Codon positions 1 and 2     0.29     0.00     0.00     0.30     0.41
  Codon position 3            0.30     0.00     0.00     0.31     0.39
  Rate classes (RC) 0-6       0.31     0.00     0.00     0.39     0.30
  RC78                        0.41     0.00     0.00     0.33     0.26

*The other topology recovered in every case depicts Gnetales as the sister group of all other gymnosperms.

Figure 4.1. Various seed-plant topologies proposed in the literature with regard to the position of Gnetales. [ANG = angiosperms, CON = conifers (Pinaceae + Cupressophyta), CUP = non-Pinaceae conifers (or Cupressophyta; Cantino et al., 2007), GNE = Gnetales, GYM = other gymnosperms, PIN = Pinaceae]. Asterisks denote branches that were constrained in ML searches used to generate the model trees for simulations; the other branches depict the overall arrangements of the other major seed-plant clades found in the respective constrained ML searches of the full data set.

Figure 4.2. Plastid-based phylogeny of the conifers and relatives. The tree shown is the most parsimonious tree recovered (17,584 steps, CI = 0.506, RI = 0.616) using coding regions from 17 plastid genes. Bootstrap values are indicated above branches.

Figure 4.3. Maximum likelihood tree (-lnL = 95,771.648) found using coding regions from 17 plastid genes and including all nine rate classes (RC0-RC8). The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches.

Figure 4.4. A.
Proportion of the total nucleotides in each of two codon-position plastid data partitions (codon positions 1 and 2 vs. codon position 3) that belong to different rate classes; B. Proportion of the total characters in each of nine rate classes that belong to the two codon-position data partitions. In each graph, RC0 represents sites that do not change, and RC8 the fastest sites. These values were estimated using the best MP tree (Fig. 4.2) and the GTR + Γ model (with model parameters determined from the data in each case). Note that two thirds of all 12590 sites considered belong to the data partition comprising codon positions 1 and 2 (8391 characters).

Figure 4.5. Maximum likelihood tree (-lnL = 38,508.097) found using codon positions 1 and 2 for multiple plastid genes. The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches; values for codon positions 1 and 2 are before the slash, and numbers after the slash are maximum likelihood bootstrap values for the data partition that excludes the two fastest rate classes (RC0-6). The double-headed arrow (and associated bootstrap value) represents an alternative placement of Pinaceae (gnetifer hypothesis) found using the RC0-6 data.

Figure 4.6. Depiction of the zero-length branch (left-hand tree; large arrow) when maximum likelihood is used as the criterion for viewing the anthophyte hypothesis for coding regions from 17 plastid genes. The same constraint under the maximum parsimony criterion is displayed on the right-hand side. Each phylogram depicts the same tree, which was obtained from a heuristic ML search constrained according to the anthophyte hypothesis (Fig. 4.1D), shown here with branch lengths optimized using ML (GTR + Γ + I model, with all parameters estimated using ML) and MP (ACCTRAN optimization), respectively.

Supplementary Figure.
Relationships within the conifer clades presented in Figs. 4.2 and 4.3. The tree shown here is a portion of the tree recovered using maximum parsimony for coding regions from 17 plastid genes; the likelihood tree is identical in topology. Parsimony and likelihood bootstrap values are indicated before and after the slash, respectively.

[Tree figure: Juniperus, Thuja, Widdringtonia, Taxodium, Metasequoia and Cunninghamia (Cupressaceae s.l.); Taxus and Torreya (Taxaceae); Cephalotaxus (Cephalotaxaceae); Sciadopitys (Sciadopityaceae); Podocarpus, Saxegothaea and Phyllocladus (Podocarpaceae); Agathis and Araucaria (Araucariaceae); Abies, Cedrus, Pinus and Pseudotsuga (Pinaceae).]

4.5 REFERENCES

ANGIOSPERM PHYLOGENY GROUP (APG II). 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141: 399-436.
BERGSTEN, J. 2005. A review of long-branch attraction. Cladistics 21: 163-193.
BOIVIN, R., M. RICHARD, D. BEAUSEIGLE, J. BOUSQUET, AND G. BELLEMARE. 1996. Phylogenetic inferences from chloroplast chlB gene sequences of Nephrolepis exaltata (Filicopsida), Ephedra altissima (Gnetopsida), and diverse land plants. Mol. Phylogenet. Evol. 6: 19-29.
BOWE, L. M., G. COAT, AND C. W. DEPAMPHILIS. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proc. Natl. Acad. Sci. USA 97: 4092-4097.
BURLEIGH, J. G. AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Am. J. Bot. 91: 1599-1613.
BURLEIGH, J. G. AND S. MATHEWS. 2007a. Assessing among-locus variation in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 111-124.
BURLEIGH, J. G. AND S. MATHEWS. 2007b. Assessing systematic error in the inference of seed plant phylogeny. Int. J.
Plant Sci. 168: 125-135.
CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822–846.
CHANG, J. T. 1996. Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Math. Biosci. 134: 189-215.
CHAW, S-M., A. ZHARKIKH, H-M. SUNG, T-C. LAU, AND W-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14: 56-68.
CHAW, S., C. L. PARKINSON, Y. CHENG, T. M. VINCENT, AND J. D. PALMER. 2000. Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Natl. Acad. Sci. USA 97: 4086-4091.
CRANE, P. R., P. HERENDEEN, AND E. M. FRIIS. 2004. Fossils and plant phylogeny. Am. J. Bot. 91: 1683-1699.
DONOGHUE, M. J. AND J. A. DOYLE. 2000. Demise of the anthophyte hypothesis? Curr. Biol. 10: R106-R109.
DOYLE, J. A. 1996. Seed plant phylogeny and the relationships of Gnetales. Int. J. Plant Sci. 157: S3-S39.
DOYLE, J. A. 2006. Seed ferns and the origin of angiosperms. J. Torrey Bot. Soc. 133: 169-209.
DOYLE, J. A., AND M. J. DONOGHUE. 1986. Seed plant phylogeny and the origin of angiosperms: An experimental cladistic approach. Bot. Rev. 52: 321-431.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
FARJON, A. 2001. World checklist and bibliography of conifers, 2nd edition. The Bath Press, Bath, England.
FELSENSTEIN, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27: 401-410.
FELSENSTEIN, J. 1983. Parsimony in systematics: biological and statistical issues. Ann. Rev. Ecol. Syst. 14: 313-333.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap.
Evolution 39: 783-791.
FRIIS, E. M., P. R. CRANE, K. R. PEDERSEN, S. BENGTSON, P. C. J. DONOGHUE, G. W. GRIMM, AND M. STAMPANONI. 2007. Phase-contrast X-ray microtomography links Cretaceous seeds with Gnetales and Bennettitales. Nature 450: 549-553.
GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A. ANTONOV, AND W. MARTIN. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Mol. Biol. Evol. 13: 383-396.
GRAHAM, S. W., J. R. KOHN, B. R. MORTON, J. E. ECKENWALDER, AND S. C. H. BARRETT. 1998. Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Syst. Biol. 47: 545-567.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GRAHAM, S. W., J. M. ZGURSKI, M. A. MCPHERSON, D. M. CHERNIAWSKI, J. M. SAARELA, V. L. BIRON, J. C. PIRES, R. G. OLMSTEAD, M. W. CHASE, AND H. S. RAI. 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Pp. 3-21. In Monocots: comparative biology and evolution (excluding Poales). Edited by J. T. Columbus, E. A. Friar, J. M. Porter, L. M. Prince, and M. G. Simpson. Rancho Santa Ana Botanic Garden, Claremont, CA.
GUINDON, S. AND O. GASCUEL. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
HAJIBABAEI, M., J. XIA, AND G. DROUIN. 2006.
Seed plant phylogeny: Gnetophytes are derived conifers and a sister group to Pinaceae. Mol. Phylogenet. Evol. 40: 208-217.
HENDY, M. D. AND D. PENNY. 1989. Framework for the quantitative study of evolutionary trees. Syst. Zool. 38: 297-309.
HILL, C. R., AND P. R. CRANE. 1982. Evolutionary cladistics and the origin of the angiosperms. Pp. 269-361. In: Problems of phylogenetic reconstruction (K. A. Joysey and E. A. Friday, eds.). Academic Press, London.
HILTON, J., AND R. M. BATEMAN. 2006. Pteridosperms are the backbone of seed-plant phylogeny. J. Torrey Bot. Soc. 133: 119-168.
HUELSENBECK, J. P. 1995. Performance of phylogenetic methods in simulation. Syst. Biol. 44: 17-48.
HUELSENBECK, J. P. 1998. Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol. 47: 519-537.
KELLOGG, E. A. AND N. D. JULIANO. 1997. The structure and function of RuBisCo and their implications for systematic studies. Am. J. Bot. 84: 413-428.
KOLACZKOWSKI, B. AND J. W. THORNTON. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431: 980-984.
KOLACZKOWSKI, B. AND J. W. THORNTON. 2008. A mixed branch length model of heterotachy improves phylogenetic accuracy. Mol. Biol. Evol. 25: 1054-1066.
KOSAKOVSKY POND, S. L., S. D. W. FROST, AND S. V. MUSE. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21: 676-679.
LOCONTE, H. AND D. W. STEVENSON. 1990. Cladistics of the spermatophyta. Brittonia 42: 197-211.
MADDISON, D. R., M. D. BAKER, AND K. A. OBER. 1999. Phylogeny of carabid beetles inferred from 18S ribosomal DNA (Coleoptera: Carabidae). Syst. Entomol. 24: 103-138.
MATHEWS, S., AND M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286: 947-950.
MCCOY, S. R., J. V. KUEHL, J. L. BOORE, AND L. A. RAUBESON. 2008. The complete plastid genome sequence of Welwitschia mirabilis: an unusually compact plastome with accelerated divergence rates.
BMC Evol. Biol. 8: 130.
NIXON, K. C., W. L. CREPET, D. STEVENSON, AND E. M. FRIIS. 1994. A reevaluation of seed plant phylogeny. Ann. Missouri Bot. Gard. 81: 484-533.
PARKINSON, C. L., K. L. ADAMS, AND J. D. PALMER. 1999. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9: 1485-1488.
POSADA, D., AND K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817-818.
QIU, Y., J. LEE, F. BERNASCONI-QUADRONI, D. E. SOLTIS, P. S. SOLTIS, M. ZANIS, E. A. ZIMMER, Z. CHEN, V. SAVOLAINEN, AND M. W. CHASE. 2000. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int. J. Plant Sci. 161: S3-S27.
RAI, H. S., H. E. O’BRIEN, P. A. REEVES, AND S. W. GRAHAM. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359.
RAI, H. S., P. A. REEVES, R. PEAKALL, R. G. OLMSTEAD, AND S. W. GRAHAM. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 86: 659-669.
RAMBAUT, A., AND N. C. GRASSLY. 1997. SEQ-GEN: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13: 235-238.
RAMBAUT, A. 1998. “Se-Al (Sequence Alignment Editor Version 1.0),” Computer program and documentation. Department of Zoology, University of Oxford, UK.
RAUBESON, L. A., AND R. K. JANSEN. 1992. A rare chloroplast-DNA structural mutation is shared by all conifers. Biochem. Syst. Ecol. 20: 17-24.
ROTHWELL, G. W. 1982. New interpretation of the earliest conifers. Rev. Palaeobot. Palynol. 37: 7-28.
ROTHWELL, G. W., AND R. SERBET. 1994. Lignophyte phylogeny and the evolution of spermatophytes: a numerical cladistic analysis. Syst. Bot. 19: 443-482.
ROTHWELL, G. W., AND R. A. STOCKEY. 2002. Anatomically preserved Cycadeoidea (Cycadeoidaceae) with a reevaluation of systematic characters for the seed cones of Bennettitales. Am. J. Bot.
89: 1447-1452.
RYDIN, C., M. KÄLLERSJÖ, AND E. M. FRIIS. 2002. Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int. J. Plant Sci. 163: 197-214.
SAARELA, J. M., H. S. RAI, J. A. DOYLE, P. K. ENDRESS, S. MATHEWS, A. D. MARCHANT, B. G. BRIGGS, AND S. W. GRAHAM. 2007. Hydatellaceae identified as a new branch near the base of the angiosperm phylogenetic tree. Nature 446: 312-315.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. SHER KHAN, AND S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17: 782-797.
SAVOLAINEN, V., M. W. CHASE, S. B. HOOT, C. M. MORTON, D. E. SOLTIS, C. BAYER, M. W. FAY, A. Y. DE BRUIJN, S. SULLIVAN, AND Y. QIU. 2000. Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49: 306-362.
SERBET, R. AND G. W. ROTHWELL. 1992. Characterizing the most primitive seed ferns. A reconstruction of Elkinsia polymorpha. Int. J. Plant Sci. 153: 602-621.
SOLTIS, P. S., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple chloroplast genes as a tool for comparative biology. Nature 402: 402-404.
SOLTIS, D. E., P. S. SOLTIS, AND M. ZANIS. 2002. Phylogeny of seed plants based on evidence from eight genes. Am. J. Bot. 89: 1670-1681.
STEFANOVIC, S., M. JAGER, J. DEUTSCH, J. BROUTIN, AND M. MASSELOT. 1998. Phylogenetic relationships of conifers inferred from partial 28S rRNA gene sequences. Am. J. Bot. 85: 688-697.
SWOFFORD, D. L., P. J. WADDELL, J. P. HUELSENBECK, P. G. FOSTER, P. O. LEWIS, AND J. S. ROGERS. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 50: 525-539.
SWOFFORD, D. L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods).
Version 4, Computer program and documentation. Sinauer Associates, Sunderland, MA.
TUFFLEY, C. AND M. STEEL. 1998. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147: 63-91.
WAKASUGI, T., J. TSUDZUKI, S. ITO, K. NAKASHIMA, T. TSUDZUKI, AND M. SUGIURA. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91: 9794-9798.
WINTER, K. U., A. BECKER, T. MUNSTER, J. T. KIM, H. SAEDLER, AND G. THEISSEN. 1999. MADS-box genes reveal that gnetophytes are more closely related to conifers than to flowering plants. Proc. Natl. Acad. Sci. USA 96: 7342-7347.
ZGURSKI, J. M., H. S. RAI, Q. M. FAI, D. J. BOGLER, J. FRANCISCO-ORTEGA, AND S. W. GRAHAM. 2008. How well do we understand the overall backbone of cycad phylogeny? New insights from a large, multigene plastid data set. Mol. Phylogenet. Evol. 47: 1232-1237.
ZHOU, Y., N. RODRIGUE, N. LARTILLOT, AND H. PHILIPPE. 2007. Evaluation of the models handling heterotachy in phylogenetic inference. BMC Evol. Biol. 7: 206.

CHAPTER 5¹
INFERENCE OF DEEP VASCULAR-PLANT PHYLOGENY, WITH A FOCUS ON BACKBONE RELATIONSHIPS IN MONILOPHYTA

5.1 INTRODUCTION

The fossil record of the vascular plants (tracheophytes; Tracheophyta of Cantino et al., 2007) reaches back at least 410 Myr (Cooksonia, the earliest evidence of tracheophytes in the fossil record; Lang, 1937), representing some of the first fossil traces of land plants. Vascular plants have a dominant diploid sporophytic phase, in contrast to the other land plants (the bryophytes, which have a dominant haploid phase). The gametophyte (haploid phase) of vascular plants is reduced but is nutritionally independent of the sporophyte in most major clades (the seed plants, Spermatophyta [Chapters 2-4], are one major exception). The sporophytes of extant vascular plants usually have true roots and branched stems bearing leaves and multiple sporangia.
Although roots and leaves characterize almost all extant vascular plants, they clearly arose several times in parallel (Doyle, 1998), so neither organ type constitutes a synapomorphy for the clade as a whole. A branched sporophyte is a synapomorphy for the vascular plants, or more precisely for Polysporangiophyta (a slightly larger clade that includes all plants with branched sporophytes; Kenrick and Crane, 1997). In contrast to other land plants, the vascular plants have a continuous system of true vascular tissues (xylem and phloem) that transports nutrients and water around the plant body.

¹ A version of this chapter will be submitted for publication: RAI, H. S., AND S. W. GRAHAM. Deep vascular-plant phylogeny, with a focus on backbone relationships in Monilophyta.

Tracheophytes were traditionally divided into the seed plants and the seedless vascular plants, the latter also known as the “seed free” or “free-sporing” vascular plants, the “ferns and fern allies”, or the “pteridophytes” (“Pteridophyta”). However, it is now well recognized that the deepest extant phylogenetic split in vascular plants is between the lycophytes and euphyllophytes (see Fig. 1.1; Raubeson and Jansen, 1992; Kranz and Huss, 1996; Kenrick and Crane, 1997; Pryer et al., 2001, 2004; referred to, respectively, as Lycopodiophyta and Euphyllophyta in Cantino et al., 2007). Lycophytes have microphylls (simple, often small leaves that usually bear a single unbranched vascular trace), whereas most euphyllophytes have megaphyllous leaves (typically large leaves with a complex structure, often with multiple, branched vascular traces). These two major leaf types are not homologous. The euphyllophytes are in turn divided into two major clades (see Fig. 1.1): the seed plants (Chapters 2-4) and monilophytes (Monilophyta; Cantino et al., 2007).
Both ‘monilophytes’ and ‘euphyllophytes’ unfortunately derive their names from structural concepts that either apply in part to unrelated taxa (i.e., the “moniliform” steles of extinct fern clades that are not closely related to the monilophytes; Rothwell and Stockey, 2008) or refer to organs that are not synapomorphic for the clade (i.e., the “true leaves” or megaphylls of euphyllophytes, see below). The monilophytes, which are all seedless, comprise the whisk ferns, horsetails and many taxa traditionally thought of as “true” ferns (i.e., the two extant eusporangiate fern families, Ophioglossaceae and Marattiaceae, and the various families of leptosporangiate ferns [Polypodiopsida of Smith et al., 2006; Leptosporangiatae of Cantino et al., 2007]). Various groups of extinct and extant plants have been referred to as ferns. Classically, ferns were construed as the vascular plants that have megaphyllous leaves but lack seeds (e.g., “Filicophyta” in Gifford and Foster, 1989). However, not all monilophytes have obvious megaphylls, perhaps as a result of reduction (e.g., the whisk ferns, Psilotaceae, apparently lack megaphylls, and extant horsetails have leaves that resemble microphylls). Most recently, Pryer et al. (2004) referred to the entire monilophyte clade as ferns. This expanded phylogenetic usage of “fern” is arguably terminologically confusing, since it includes taxa that would not traditionally have been recognized as ferns (horsetails, whisk ferns). However, the classical usage is itself problematic, as the megaphyllous leaf likely arose multiple times in euphyllophyte phylogeny from simple or compound overtopped branches (e.g., Doyle, 1998; Boyce and Knoll, 2002), giving rise to apparently independently derived extant and extinct “fern” clades, some likely outside the monilophyte clade inferred from molecular data (e.g., Rothwell, 1999; Rothwell and Nixon, 2006; Rothwell and Stockey, 2008).
Moreover, the seed plants are nested in a larger clade (lignophytes) that includes Archaeopteris and other megaphyllous but seedless plants (which might be considered ferns under the morphological definition, although they are not traditionally recognized as such). Adding to the terminological confusion, many branches of early seed-plant phylogeny are informally referred to as “seed ferns” (pteridosperms; see Doyle, 2006 for a discussion of seed ferns and their relationship to angiosperms). Here I avoid use of the word fern, unless it is part of an informal name. Considerable progress has been made at various levels of vascular-plant phylogeny. Within seed plants, for example, we now have a remarkably clear picture of most aspects of conifer phylogeny (Chapter 3). Within extant lycophytes, the two heterosporous lycophyte families (Isoëtaceae and Selaginellaceae) are recognized as a clade that is sister to the homosporous lycophytes (Lycopodiaceae; e.g., Kenrick and Crane, 1997; Wikström and Kenrick, 1997), and the phylogenetic structure of several lycophyte families is becoming clear (e.g., Lycopodiaceae: Wikström and Kenrick, 1997, 2000; Selaginellaceae: Korall and Kenrick, 2004; Isoëtaceae: Rydin and Wikström, 2002). Within monilophytes, we now have reasonably well-resolved pictures of the major relationships within Ophioglossaceae (Wagner, 1990; Hauk et al., 2003), Marattiaceae (Murdock, 2008) and Equisetaceae (Des Marais et al., 2003; Guillon, 2004, 2007), and of much of leptosporangiate fern phylogeny. Many phylogenetic studies of monilophytes have focused primarily on the leptosporangiate ferns, by far the largest branch of monilophyte phylogeny (33 families; Smith et al., 2006). These studies are largely congruent with one another (e.g., Hasebe et al., 1995; Pryer et al., 1995, 2001, 2004; Schneider et al., 2004; Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007).
For example, they find well-supported relationships within the leptosporangiate ferns, including the monophyly of the heterosporous ferns, tree ferns, and polypod ferns (the relative arrangement of these ‘core leptosporangiates’ has only recently been established with strong support; Schuettpelz et al., 2006), in addition to strong support for the placement of Osmundaceae as the sister group of all other leptosporangiate ferns. Deeper aspects of the structure of vascular-plant phylogeny remain controversial (cf. Doyle, 1998; Pryer et al., 2001; Wikström and Pryer, 2005; Rothwell and Nixon, 2006), including the relationships among the five major groups of seed plants (Chapters 2-4) and among the five major lines of monilophytes (Equisetaceae, Ophioglossaceae, Psilotaceae, Marattiaceae and the leptosporangiate ferns). Within monilophytes, strong molecular evidence exists for only one sister-group relationship among these five subclades (between the whisk ferns, Psilotaceae, and a family of eusporangiate ferns, Ophioglossaceae), and several aspects of the backbone of leptosporangiate fern phylogeny remain incompletely understood (Pryer et al., 2001, 2004; Schneider et al., 2004; Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007). In this study I broadly explore monilophyte relationships, and their placement within the vascular plants, using 17 plastid genes; these markers and their associated noncoding regions have proven useful for various deep phylogenetic questions (Graham and Olmstead, 2000a; Rai et al., 2003, 2008; Chapters 3, 4). My study represents one of the largest molecular datasets (in terms of nucleotides sequenced per taxon) compiled for vascular plants to date, and samples multiple representatives of all living vascular plant groups, including a majority of the fern families recognized by Smith et al. (2006; Fig. 5.1).
The main purpose of this study is to determine whether this expanded gene set permits robust resolution of vascular-plant phylogeny, particularly of the relationships among the major clades of monilophytes and vascular plants. I use filtering methods employed in two previous chapters (Chapters 3, 4) to assess whether removing the fastest-evolving characters has a major influence on these phylogenetic reconstructions. I also investigate the potential influence of one very long branch in the inferred phylogeny (Selaginella; Selaginellaceae) on phylogenetic estimation.

5.2 MATERIALS AND METHODS

5.2.1 Taxonomic and Genomic Sampling

The final matrix considered here includes 64 representatives of the major vascular plant lineages (see Table 5.1 for a list of newly generated sequences and associated GenBank accession numbers). Of these, four represent the major lineages of lycophytes (Selaginella uncinata and Huperzia lucidula sequences were obtained from GenBank; accession numbers AB197035 and NC_006861, respectively). I included six angiosperms to represent the basal structure of angiosperm phylogeny (see Chapter 4; and see Saarela et al., 2007), and 16 gymnosperms that are broadly representative of the relationships inferred in Chapters 3 and 4. I also included four bryophyte outgroups (GenBank accession numbers for Anthoceros formosae, Marchantia polymorpha and Physcomitrella patens are NC_004543, NC_00319 and NC_005087, respectively; see Table 5.1 for Sphagnum sp.). The remaining 34 taxa represent each of the major lineages of monilophytes (Fig. 5.1; see also Fig. 1.1), including eight eusporangiate taxa (Angiopteris evecta and Psilotum nudum sequences were obtained from GenBank, accession numbers NC_008829 and NC_003386, respectively; see Table 5.1 for the remainder) and 26 representative leptosporangiate ferns (Polypodiopsida; Adiantum capillus-veneris, GenBank accession number NC_004766; see Table 5.1 for the remainder).
The latter represent each of the seven orders and 21 of the 33 families of leptosporangiate ferns recognized by Smith et al. (2006; see Fig. 5.1 here). The leptosporangiate fern families not sampled for this study are mainly small families (each with one to five genera) in Polypodiales and Cyatheales (Fig. 5.1). I surveyed 17 genes and associated non-coding regions that represent approximately 10% of the monilophyte plastid genome (reference taxon = Adiantum capillus-veneris). The coding regions retrieved here are the same as those in Chapters 2, 3 and 4, and the non-coding regions are the same as those in Chapters 2 and 3, with one exception: the intergenic spacer (IGS) between rps7 and ndhB, which is present in all seed plants surveyed for this study, is apparently not present as a contiguous region in most leptosporangiate ferns because of a large inversion involving a large portion of the inverted repeat (Raubeson and Stein, 1995; Wolf et al., 2003), precluding its recovery.

5.2.2 DNA Extraction, Amplification and Sequencing

Genomic DNA was extracted from fresh, silica-dried, and herbarium specimens using the protocol of Doyle and Doyle (1987), as modified in Chapter 2. Amplification and sequencing follow Chapters 2, 3, and 4 (as originally described in Graham and Olmstead, 2000a), and, with a few minor exceptions, I completely sequenced all products in both forward and reverse directions. I designed a set of 16 new fern-specific primers to facilitate amplification and sequencing (Table 5.2). The new data were added to a previously generated alignment (Chapter 3) using the alignment criteria outlined in Graham et al. (2000) and Chapters 2, 3, and 4. Regions that I could not amplify or sequence in some taxa (see Table 5.1) were coded as missing data in the final matrix.
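Several primers in Table 5.2 appear to be designed as forward/reverse partners at the same site: F13R, for example, is the exact reverse complement of F13F, and L5R is contained within the reverse complement of L5F (it lacks one terminal base). The short Python sketch below, which is not part of the original workflow, shows an IUPAC-aware reverse complement that could be used to check such pairs; the helper name `reverse_complement` and the `PRIMERS` mapping are illustrative, and the sequences are copied from Table 5.2.

```python
# IUPAC nucleotide complements; ambiguity codes map to their complementary codes
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
COMPLEMENT = str.maketrans("ACGTRYKMBVDHSWN", "TGCAYRMKVBHDSWN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'-3' DNA sequence written in IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences (5'-3') copied from Table 5.2.
PRIMERS = {
    "F13F": "GAAACGTATGCTTGCATATTC",
    "F13R": "GAATATGCAAGCATACGTTTC",
    "L5F": "GATCCAATTTATCGTAATCG",
    "L5R": "GATTACGATAAATTGGATC",
}

# F13R is the exact reverse complement of F13F.
f13_exact = reverse_complement(PRIMERS["F13F"]) == PRIMERS["F13R"]
# L5R lies entirely within the reverse complement of L5F (one base shorter).
l5_nested = PRIMERS["L5R"] in reverse_complement(PRIMERS["L5F"])
print(f13_exact, l5_nested)  # True True
```

Because ambiguity codes are complemented rather than expanded, degenerate primers such as F9F and F9R compare correctly only when their degenerate positions line up (an R in one primer opposite a Y in the other).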
The aligned regions considered for analysis include all of the coding regions and unambiguously aligned non-coding regions from two introns (rpl2 and ndhB) and seven intergenic spacer regions (3’-rps12-rps7, and three each in the psbE-psbF-psbL-psbJ and psbB-psbT-psbN-psbH clusters). The final aligned matrix includes 36,139 bp per taxon (corresponding to 11,726 bp unaligned in Vandenboschia davallioides, for example). Of these aligned characters, 2573 bp are variable but parsimony-uninformative across vascular plants, and 7479 bp are parsimony-informative.

5.2.3 Phylogenetic Analyses

All phylogenetic analyses were conducted using the full 64-taxon matrix. I performed a heuristic maximum parsimony (MP) search using PAUP* (ver. 4.0b10; Swofford, 2003), with all characters and character-state changes equally weighted, using TBR (tree-bisection-reconnection) branch swapping and 100 random-addition replicates, and otherwise using default settings. I also performed a maximum likelihood (ML) heuristic search using PhyML (ver. 2.4.4; Guindon and Gascuel, 2003), with a BIONJ starting tree, NNI (nearest-neighbour-interchange) branch swapping, and model parameters estimated from the data in each case. I chose a DNA substitution model for ML analysis using the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC) in Modeltest ver. 3.7 (Posada and Crandall, 1998). The optimal model chosen in each case was GTR + Γ + I [a general-time-reversible (GTR) rate matrix with a proportion of invariable sites (I), and among-site rate variation accounted for using a gamma (Γ) distribution described by the shape parameter alpha (α)]. I performed non-parametric bootstrap analysis (Felsenstein, 1985) using the same search criteria, with 100 bootstrap replicates (and a single random-addition replicate per parsimony bootstrap replicate).
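The distinction drawn above between variable-but-uninformative and parsimony-informative characters follows the usual parsimony definition: a site is informative only if at least two character states each occur in at least two taxa. A minimal Python sketch of that classification is shown below on a hypothetical four-taxon alignment; it counts only unambiguous A/C/G/T states and ignores gaps and ambiguity codes, which is a simplifying assumption rather than a description of how PAUP* handles them. The final lines also show that the reported counts imply 36,139 − (2,573 + 7,479) = 26,087 invariant sites, the same number later assigned to the slowest rate class (RC0).

```python
from collections import Counter


def classify_site(column: str) -> str:
    """Classify one alignment column (one character per taxon) for parsimony.

    Only unambiguous nucleotides (A, C, G, T) are counted; gaps, missing data
    and IUPAC ambiguity codes are ignored (a simplifying assumption).
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    if len(counts) <= 1:
        return "constant"
    # Informative: at least two character states, each shared by >= 2 taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


# Hypothetical four-taxon toy alignment (rows = taxa, columns = sites).
toy_alignment = ["AAGTC", "AAGTT", "ACGCT", "ACTCT"]
columns = ["".join(row[i] for row in toy_alignment) for i in range(len(toy_alignment[0]))]
tally = Counter(classify_site(col) for col in columns)
print(dict(tally))  # {'constant': 1, 'informative': 2, 'uninformative': 2}

# Invariant sites implied by the counts reported for the 36,139-site matrix.
n_constant = 36_139 - (2_573 + 7_479)
print(n_constant)  # 26087
```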
I use ‘weak,’ ‘moderate,’ and ‘strong’ in reference to clades that have bootstrap support values of < 70%, 70-89%, and ≥ 90%, respectively (e.g., Chapters 2, 3, 4 and Graham et al., 1998).

5.2.4 Inference of Nucleotide Rate Classes and Exploration of the Effect of Long Branches

I partitioned the data into nine rate classes using HyPhy (Kosakovsky Pond et al., 2005; and see Chapter 4). I used the best MP tree topology (above) to assign each of the 36,139 aligned nucleotide sites to its most likely individual rate category (using the GTR model and eight discrete rate classes). In total, 26,087 sites were assigned to RC0, which represents all sites with no change across this data set. The fastest rate class (RC8) included 2075 sites. After excluding the two fastest rate classes (RC7 and RC8), I performed a maximum-likelihood heuristic search using the search criteria outlined above. The Selaginella sequence is clearly quite divergent compared with those of the rest of the vascular plants (see Results; this was also evident from visual inspection of the DNA sequence alignment). I therefore recalculated the rate-class partitioning using a new MP tree (found using PAUP*) after removing Selaginella. I then performed an ML search (using the GTR + Γ + I model and the search criteria outlined above, with parameters estimated from the data) of the seven slowest rate-class partitions (RC0-6). I also performed MP and ML analyses of the full (unfiltered) data set after excluding various subsets (and in one case all) of the four bryophyte outgroup taxa, to examine their possible effect on ingroup (vascular plant) relationships.

5.3 RESULTS

5.3.1 Phylogenetic Analyses

Maximum parsimony (MP) and maximum likelihood (ML) analyses of the full data set (i.e., all sites, including noncoding regions) produce generally similar trees, with a majority of backbone branches strongly supported by bootstrap analysis.
This includes strong bootstrap support (94% MP and 100% ML) for the monophyly of monilophytes (Figs. 5.2 and 5.3). I also performed MP and ML analyses of the coding regions only (e.g., Chapter 4) and found that the trees were topologically very similar (not shown here). The exceptions were branches with <50% bootstrap support in the analyses of the full data set, which tended to vary in the relative arrangement of their constituent clades between MP and ML analysis of the full data set. Because the results of the analyses of the full data set and of the coding regions only are broadly comparable, I discuss only the analyses of the full data set in detail here. There are several substantial conflicts between the MP and ML analyses. The MP analyses provide strong support for lycophytes as the sister group of seed plants (94% support; Fig. 5.2), whereas the ML analysis provides moderate support for them as the sister group of all other vascular plants (81% support; Fig. 5.3). Within the seed plants, MP trees recover Gnetales as the sister group of all other seed plants, whereas Gnetales are associated with conifers in the ML analysis (with 100% and 73% bootstrap support, respectively; see Chapter 4 for a more thorough examination of seed-plant relationships). Within the monilophytes, the leptosporangiate ferns are strongly supported as monophyletic (100% MP and ML support; Figs. 5.2, 5.3). Both MP and ML strongly unite the eusporangiate fern lineage Ophioglossaceae with the whisk ferns (Psilotaceae), and each family is strongly supported as monophyletic (Figs. 5.2, 5.3). The other eusporangiate fern lineage, Marattiaceae, is also strongly supported as monophyletic (99% and 100% bootstrap support; Figs. 5.2, 5.3), although MP is unable to robustly resolve the exact relationship of these two clades relative to the leptosporangiate ferns (Fig. 5.2). Maximum likelihood provides moderate support for Marattiaceae as the sister group of the leptosporangiate ferns (82% support; Fig. 5.3).
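The weak/moderate/strong labels used throughout these results follow the bootstrap thresholds defined in Section 5.2.3. As a small illustrative sketch (the function name is hypothetical), the mapping can be written as below and applied to the values quoted in the preceding paragraph; integer percentages are assumed, so a value such as 89.5% would fall in the moderate class.

```python
def support_category(bootstrap_percent: float) -> str:
    """Label a bootstrap percentage: weak (< 70), moderate (70-89), strong (>= 90)."""
    if bootstrap_percent >= 90:
        return "strong"
    if bootstrap_percent >= 70:
        return "moderate"
    return "weak"


# (clade, method, bootstrap %) as quoted in the text above.
reported = [
    ("monilophytes", "MP", 94),
    ("monilophytes", "ML", 100),
    ("lycophytes + seed plants", "MP", 94),
    ("lycophytes sister to other vascular plants", "ML", 81),
    ("Gnetales sister to other seed plants", "MP", 100),
    ("Gnetales + conifers", "ML", 73),
    ("Marattiaceae + leptosporangiate ferns", "ML", 82),
]
for clade, method, bs in reported:
    print(f"{clade} ({method}, {bs}%): {support_category(bs)}")
```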
ML and MP analyses both resolve Equisetum as the sister group of all other monilophytes, with moderate to weak support (74% and 58% support from MP and ML analysis, respectively). In the leptosporangiate ferns, Leptopteris (Osmundaceae) is strongly supported as the sister group of the remaining taxa, followed by moderately well supported splits between Hymenophyllaceae (=Hymenophyllales) and the rest, and then between the gleichenioid ferns, Gleicheniales (a strongly supported clade that includes Dipteridaceae, Matoniaceae, and Gleicheniaceae; 92% support from both MP and ML; Figs. 5.2, 5.3), and the remainder. The schizaeoid ferns, Schizaeales (represented here by Lygodium and Schizaea, the monophyly of which is well supported), are strongly supported as the sister group of a clade that comprises the heterosporous water ferns, Salviniales (the monophyly of which is also strongly supported; 100% and 99% support from MP and ML analysis, respectively), the tree ferns, Cyatheales (monophyly strongly supported; 100% support from both MP and ML analysis), and the polypod ferns, Polypodiales (monophyly strongly supported; 100%). Within this large clade, a sister-group relationship between Cyatheales and Polypodiales is also moderately well supported (76% and 89% from MP and ML analysis, respectively); consequently, Salviniales are moderately well supported as the sister group of Cyatheales + Polypodiales. At the current taxon sampling, several families with more than one representative are strongly supported as monophyletic. Hymenophyllaceae (=Hymenophyllales, represented here by Hymenophyllum and Vandenboschia) have 100% bootstrap support from both MP and ML. The monophyly of both Dipteridaceae (represented here by Dipteris and Cheiropleuria) and Lindsaeaceae (represented here by Lindsaea and Lonchitis) is strongly supported by MP bootstrap analysis (100% and 97% support, respectively), although ML bootstrap analysis only weakly recovers Lindsaeaceae (67%; Fig. 5.3).
Within Pteridaceae, both maximum parsimony and maximum likelihood recover strong support for a sister-group relationship between Adiantum and Vittaria (100% and 97% bootstrap support from MP and ML, respectively), with Ceratopteris strongly supported as the sister group of both (i.e., the monophyly of the family as a whole is also well supported; Figs. 5.2, 5.3). As noted above, the monophyly of each of the orders defined by Smith et al. (2006) is well supported here at the current taxon sampling: Hymenophyllales, Gleicheniales, Schizaeales, Salviniales, Cyatheales and Polypodiales (only one member of Osmundales, Leptopteris, was sampled here). Within Gleicheniales, Dicranopteris (Gleicheniaceae) is strongly supported as the sister group of a clade consisting of Matoniaceae and Dipteridaceae (97% and 100% support from MP and ML, respectively; Figs. 5.2, 5.3). Within Cyatheales, MP and ML analyses resolve Plagiogyria as the sister group of a strongly supported clade that includes Cyathea and Dicksonia (99% and 96% bootstrap support from MP and ML, respectively). MP and ML analyses do not robustly resolve all of the basal relationships within the polypod ferns; it is not clear whether Saccolomataceae or Lindsaeaceae (or both) are the sister group of the clade that includes most of the living diversity of leptosporangiate ferns (represented here by exemplar taxa from Pteridaceae, Dennstaedtiaceae, Polypodiaceae, Dryopteridaceae, Aspleniaceae, Thelypteridaceae and Blechnaceae; Table 5.1, Fig. 5.1). Maximum parsimony bootstrap analysis weakly suggests that Pteridaceae is the sister group of Dennstaedtiaceae and the remaining polypod ferns (67%; Fig. 5.2), but this arrangement is not recovered in the best ML tree (Fig. 5.3) and is only very weakly supported by ML bootstrap analysis (54%; value recovered from the bootstrap majority-rule consensus tree).
Three other clades comprising multiple families of polypods are moderately to strongly supported by MP and ML bootstrap analysis: a clade consisting of Dryopteridaceae and Polypodiaceae; a clade consisting of Aspleniaceae, Blechnaceae and Thelypteridaceae; and, within the latter, a clade consisting of Blechnaceae and Thelypteridaceae.

5.3.2 Rate Class Analyses

When I analyzed the seven slowest rate classes (RC0-6; i.e., excluding the two fastest rate classes) using ML, the ML bootstrap support for several nodes was substantially reduced compared with analyses that include all of the rate classes (e.g., several major backbone nodes in Fig. 5.3; cf. left-hand bootstrap values in Fig. 5.4). Relationships that are poorly supported by bootstrap analysis of the RC0-6 data include basal splits within the monilophytes (e.g., the sister group of the leptosporangiate ferns) and basal splits within the leptosporangiate ferns (e.g., whether Hymenophyllaceae is the sister group of all leptosporangiate ferns except Osmundaceae, and whether Gleicheniales are monophyletic). The RC0-6 ML tree (Fig. 5.4) is compatible with the ML tree from the full data set (Fig. 5.3) concerning overall vascular-plant relationships (e.g., lycophytes as the sister group of all other vascular plants, and Gnetales associated with conifers), but support is often weaker (e.g., support for Marattiaceae as sister to the leptosporangiate ferns decreases from 82% to <50%). With respect to monilophyte relationships, the tree recovered after removal of the fastest sites (Fig. 5.4) is generally topologically congruent with both the ML and MP trees found using the full data set, but with weak to moderate disagreement concerning the relative arrangement of Aspleniaceae, Blechnaceae and Thelypteridaceae (cf. Figs. 5.2-5.4).
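The RC0-6 analyses amount to deleting the alignment columns assigned to RC7 and RC8 before re-running the ML search. A minimal Python sketch of that column-masking step is given below, assuming the per-site rate classes have already been exported from HyPhy as a list of integers (0 = slowest, 8 = fastest); the function name and toy data are hypothetical, and the HyPhy rate-class inference itself is not reproduced here.

```python
def filter_rate_classes(alignment, site_classes, keep=frozenset(range(7))):
    """Keep only alignment columns whose rate class is in `keep`.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    site_classes: one integer rate class per alignment column.
    The default keeps RC0-RC6, i.e. it drops the two fastest classes (RC7, RC8).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(site_classes)}:
        raise ValueError("every sequence must have one rate class per column")
    kept = [i for i, rc in enumerate(site_classes) if rc in keep]
    return {taxon: "".join(seq[i] for i in kept) for taxon, seq in alignment.items()}


# Hypothetical toy data; real per-site classes would come from the HyPhy analysis.
toy = {"taxon_A": "ACGTAC", "taxon_B": "AGGAAC", "taxon_C": "ATGCAT"}
toy_classes = [0, 8, 3, 7, 0, 6]
print(filter_rate_classes(toy, toy_classes))
# {'taxon_A': 'AGAC', 'taxon_B': 'AGAC', 'taxon_C': 'AGAT'}
```

For the Selaginella-excluded analysis described in Section 5.2.4, the classes passed to such a filter would be those recalculated on the reduced taxon set, not the original assignment with that taxon removed afterwards.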
Other differences between the MP analysis and the filtered and unfiltered ML analyses concern relationships outside the monilophytes, particularly the placement of Gnetales within the seed plants (seed-plant relationships are discussed in detail in Chapter 4). The reduced ML bootstrap values observed in the RC0-6 analyses may not simply reflect a reduced amount of data: when I excluded Selaginella prior to assignment of rate classes and then reanalyzed the data without this taxon, I found improved ML bootstrap support for several major branches (cf. the left-hand and right-hand RC0-6 ML bootstrap values in Fig. 5.4). The major clades with improved support after deleting Selaginella in this way are the euphyllophytes as a whole (80% vs. 52%), the monilophytes as a whole (100% vs. 68%), a possible sister-group relationship between Equisetum and the other monilophytes (71% vs. <50%), and the leptosporangiate ferns as a whole (100% vs. 71%). Including or excluding the various bryophyte outgroups in analyses of the full data set resulted in several weakly to moderately supported placements of Equisetum within the monilophytes (Fig. 5.5). For example, when all bryophyte taxa were removed prior to ML bootstrap analysis, Equisetum was weakly supported as the sister group of the leptosporangiate ferns (59%; Fig. 5.5), whereas using only a single bryophyte outgroup (Anthoceros) weakly places Equisetum as the sister group of Marattiaceae and the leptosporangiate ferns (65%; Fig. 5.5).

5.4 DISCUSSION

Overall, the results of this large multigene plastid survey are largely congruent with several recent studies regarding the resolution of higher-order relationships in leptosporangiate ferns and overall vascular-plant phylogeny (Pryer et al., 2001, 2004; Schuettpelz et al., 2006).
As in these previous studies, I find strong support for the placement of lycophytes as the sister group of all other living vascular plants (except, curiously, in the MP analysis, where they are strongly supported as the sister group of seed plants) and for a clade that includes all other extant seedless vascular plants (the monilophytes). In addition to this general congruence, I observe improved support for several clades that were previously only weakly to moderately supported by maximum likelihood or maximum parsimony bootstrap analysis in recent one- to few-gene studies (e.g., 89% ML support for the clade consisting of Cyatheales (tree ferns) and Polypodiales; moderate to strong ML bootstrap support along the rest of the main backbone of leptosporangiate ferns; Fig. 5.3). Some recent studies (Pryer et al., 2004; Schuettpelz et al., 2006; Smith et al., 2006) have suggested that Psilotopsida (Psilotaceae and Ophioglossaceae) is the sister group of all other monilophytes. In contrast, I generally find Equisetum to be the sister group of all other monilophytes, with weak to moderate support from ML analysis (Figs. 5.3, 5.4), although this result is sensitive to reductions in outgroup sampling (Fig. 5.5). This result is somewhat less at odds with the placement of Equisetum outside the monilophyte clade seen in a morphological study (Rothwell and Nixon, 2006), and is less problematic from a morphological perspective than a placement of horsetails nested more deeply within this clade (Gar Rothwell, Ohio University, pers. comm.). I also find moderate support from ML analysis of the full data set for Marattiaceae as the sister group of the leptosporangiate ferns (82% ML bootstrap support; Fig. 5.3), an arrangement consistent with Schuettpelz et al. (2006).
Along the main backbone of leptosporangiate ferns, the exact relationship between Hymenophyllaceae and the gleichenioid ferns (Gleicheniales) was equivocal or poorly supported in the most recent classification of monilophytes (Smith et al., 2006; Fig. 5.1) and in the earlier phylogenetic study of Pryer et al. (2004). My results indicate that Osmundaceae, Hymenophyllaceae, and the gleichenioid ferns comprise a basal grade of leptosporangiate ferns, with relative arrangements that are consistent with the five-gene data set of Schuettpelz et al. (2006) and the taxonomically dense three-plastid-gene data set of Schuettpelz and Pryer (2007). In contrast to several recent studies (Burleigh and Mathews, 2004, 2007; Chapters 3, 4), I found that removing the most rapidly evolving characters prior to analysis generally had little effect on relationships among the major vascular-plant groups or within the monilophytes, other than reducing ML bootstrap support for some key nodes. However, removing these fast rate classes and additionally deleting Selaginella from the analyses resulted in a dramatic increase in bootstrap values, especially at basal monilophyte nodes (Fig. 5.4). This suggests that the reduction in bootstrap support is not simply a function of more limited character sampling, but that long branches in lycophytes can have a negative impact on phylogenetic inference within euphyllophytes. The current data set provides further resolution of, and support for, inferences of deep vascular-plant relationships. In particular, it corroborates other recent studies of fern phylogeny, with a taxon and gene sampling that is very different from those studies. It also offers new insights into relationships along the backbone of leptosporangiate ferns, including increased support for the earliest evolutionary splits within this remarkably diverse group of vascular plants.
This clarified scaffold of monilophyte phylogeny should be of value to researchers investigating fern plastid genomes and their rearrangements or morphological character evolution, and to systematists focusing on individual lineages within the monilophytes. Although my results are generally highly congruent with several recent studies regarding overall monilophyte relationships, I would urge some caution regarding the inference of basal monilophyte relationships. I have highlighted a problematic vascular-plant lineage (Selaginella) with a relatively long branch. Long branches have posed rather severe problems for phylogenetic inference in various plant groups (e.g., seed plants; Chapter 4), and my work here demonstrates that the potential for similar problems exists when investigating monilophyte relationships (Figs. 5.4, 5.5). Although I have sampled extensively across the backbone of the vascular-plant tree, greater taxonomic density, especially within the “basal” lineages of the major groups, may help to break up at least some of the long branches present in the current data set (e.g., Hippochaete and additional Equisetum species). Concerning euphyllophyte relationships as a whole, the addition of new data, and especially the combination of morphological data from the numerous known crown-monilophyte fossil taxa and various euphyllophyte stem taxa with data from extant lineages, may hold the key to further improving our knowledge of euphyllophyte and, ultimately, vascular-plant relationships as a whole.

TABLE 5.1. GenBank accession numbers and vouchers for exemplar pteridophyte (and outgroup) taxa. Each entry gives the taxon, authority, and voucher (herbarium), followed by accession numbers for the nine genes or regions in this order: atpB | ndhF | psbB, T, N & psbH | psbD & C | psbE, F, L & psbJ | rbcL | rpl2 | 3'-rps12, rps7 | ndhB.

BRYOPHYTES
Sphagnum sp. L. (C. Lafarge Cedar Swamp 28-07-02 s.n.)
  EU352260 | EU349580 | EU552803 | EU328217 | EU558386 | EU352288 | EU558420 | EU558470 | EU558449

LYCOPHYTES
Isoetaceae
Isoëtes sp. L. (H. Rai 1005, ALTA)
  EU352261 | EU349581 | EU552804 | EU328218 | EU558387 | EU352289 | EU558421 | EU558471 | EU558450
Lycopodiaceae
Lycopodium annotinum L. (H.S. Rai and J.M. Zgurski 14-09-02-13, ALTA)
  EU352262 | EU349582 | EU552805 | EU328219 | EU558388 | EU352290 | EU558422 | EU558472 | EU558451

EQUISETOPSIDA
Equisetaceae
Equisetum x ferrissii Clute (P. Hammond s.n., UC)
  EU352264 | EU349584 | EU552807 | EU328221 | EU558390 | EU352292 | EU558424 | EU558474 | EU558452

PSILOTOPSIDA
Ophioglossaceae
Helminthostachys zeylanica (L.) Hook. (NYBG 233/84)
  EU352265 | EU349585 | EU552808 | EU328222 | EU558391 | EU352293 | EU558425 | EU558475 | EU558453
Ophioglossum reticulatum L. (R. Moran 5644, MO)
  (U93825)a | n/a | EU552810 | EU328224 | EU558393 | (AF313582)a | EU558427 | EU558477 | EU558455
Psilotaceae
Tmesipteris elongata P. A. Dang. (A. R. Smith 2607, UC)
  EU352266 | EU349587 | EU552811 | EU328225 | EU558394 | EU352294 | EU558428 | EU558478 | EU558456

MARATTIOPSIDA
Marattiaceae
Danaea elliptica Sm. (J. Sharpe s.n., UC)
  EU352263 | EU349583 | EU552806 | EU328220 | EU558389 | EU352291 | EU558423 | EU558473 | n/a
Marattia attenuata Labill. (R. Schmid s.n., UC)
  (AF313546)a | EU349586 | EU552809 | EU328223 | EU558392 | (AF313581)a | EU558426 | EU558476 | EU558454

POLYPODIOPSIDA
Aspleniaceae
Asplenium viride Huds. (H.S. Rai and J.M. Zgurski 14-09-02-12, ALTA)
  EU352267 | EU349588 | EU552812 | EU328226 | EU558395 | EU352295 | n/a | EU558479 | EU558457
Blechnaceae
Blechnum occidentale L. (Wolf 289, UTC)
  EU352268 | EU349589 | EU552813 | EU328227 | EU558396 | EU352296 | EU558429 | EU558480 | EU558458
Cyatheaceae
Cyathea klossii Ridl. (Johns 9728, KEW)
  EU352271 | n/a | EU552816 | EU328230 | EU558399 | EU352299 | EU558432 | EU558483 | n/a
Dennstaedtiaceae
Dennstaedtia punctilobula (Michx.) T. Moore (H.H. Schmidt, M.W.R Eddy & E.C. Rempala 1533, MO)
  (U93836)a | EU349592 | EU552817 | EU328231 | EU558400 | EU352300 | EU558433 | EU558484 | n/a
Dicksoniaceae
Dicksonia antarctica Labill. (H.S. Rai 1015, ALTA)
  (U93829)a | n/a | EU552818 | EU328232 | EU558401 | EU352301 | EU558434 | EU558485 | n/a
Dipteridaceae
Cheiropleuria integrifolia (D.C. Eaton ex Hook.) M. Kato, Y. Yatabe, Sahashi & N. Murak. (Yokoyama 27619, TI)
  EU352270 | EU349591 | EU552815 | EU328229 | EU558398 | EU352298 | EU558431 | EU558482 | n/a
Dipteris conjugata Reinw. (J. Game 98/106, UC)
  (AF612696)a | n/a | EU552820 | EU328234 | EU558403 | EU352303 | EU558436 | EU558487 | n/a
Dryopteridaceae
Dryopteris filix-mas (L.) Schott (H.S. Rai and J.M. Zgurski 14-09-02-8, ALTA)
  EU352273 | EU349594 | EU552821 | EU328235 | EU558404 | (AY268845)a | EU558437 | EU558488 | EU558461/EU558462
Gleicheniaceae
Dicranopteris linearis (Burm f.) Underw. (J. Game 98/105A, UC)
  EU352272 | EU349593 | EU552819 | EU328233 | EU558402 | EU352302 | EU558435 | EU558486 | EU558460
Hymenophyllaceae
Hymenophyllum hirsutum (L.) Sw. (M. Kessler 11596, UC)
  EU352274 | EU349595 | EU552822 | EU328236 | EU558405 | (AF275645)a | EU558438 | EU558489 | n/a
Vandenboschia davallioides Copel. (Wolf 248, UTC)
  (U93828)a | EU349606 | EU552835 | EU328249 | EU558418 | EU352314 | EU558502 | EU558469 | EU558447
Lindsaeaceae
Lindsaea rufa K.U. Kramer (G. McPherson & J. Munzinger 18124, MO)
  EU352276 | EU349597 | EU552824 | EU328238 | EU558407 | EU352304 | EU558439 | EU558491 | EU558464
Lonchitis hirsuta L. (F. Axelrod 9601, UTC)
  EU352277 | EU349598 | EU552825 | EU328239 | EU558408 | EU352305 | EU558440 | EU558492 | n/a
Lygodiaceae
Lygodium japonicum (Thunb.) Sw. (H. S. Rai 1013, ALTA)
  EU352278 | EU349599 | EU552826 | EU328240 | EU558409 | (L13479)a | EU558441 | EU558493 | EU558465
Marsileaceae
Marsilea drummondii A. Braun (J. Zgurski 78, ALTA)
  EU352279 | EU349600 | EU552827 | EU328241 | EU558410 | EU352306 | EU558442 | EU558494 | n/a
Matoniaceae
Matonia pectinata R. Br. (E. Schuettpelz 752, DUKE)
  EU352280 | EU349601 | EU552828 | EU328242 | EU558411 | EU352307 | n/a | EU558495 | EU558466
Osmundaceae
Leptopteris wilkesiana H. Christ (J. Game 95/035, no voucher)
  EU352275 | EU349596 | EU552823 | EU328237 | EU558406 | (AY612678)a | n/a | EU558490 | EU558463
Plagiogyriaceae
Plagiogyria japonica Nakai (M. Hasebe 27614, TI)
  EU352281 | EU349602 | EU552829 | EU328243 | EU558412 | EU352308 | EU558443 | EU558496 | n/a
Polypodiaceae
Polypodium hesperium Maxon (H.S. Rai and J.M. Zgurski 14-09-02-2, ALTA)
  EU352282 | EU349603 | EU552830 | EU328244 | EU558413 | EU352309 | EU558444 | EU558497 | EU558467
Pteridaceae
Ceratopteris richardii Brongn. (P. Killip 44595, GH)
  EU352269 | EU349590 | EU552814 | EU328228 | EU558397 | EU352297 | EU558430 | EU558481 | EU558459
Vittaria volkensii Hieron. (E.T. Africa, Cherangani Tweedia 2708, KEW)
  EU352287 | n/a | EU552836 | EU328250 | EU558419 | EU352315 | EU558448 | EU558503 | n/a
Saccolomataceae
Saccoloma inaequale (Kunze) Mett. (372076, DUKE)
  EU352283 | EU349604 | EU552831 | EU328245 | EU558414 | EU352310 | n/a | EU558498 | n/a
Salviniaceae
Salvinia sp. Ség. (H.S. Rai 1023, UBC)
  EU352284 | n/a | EU552832 | EU328246 | EU558415 | EU352311 | EU558445 | EU558499 | EU558468
Schizaeaceae
Schizaea dichotoma (L.) J. Sm. (S.W. Graham 02-03-36B s.n.)
  EU352285 | EU349605 | EU552833 | EU328247 | EU558416 | EU352312 | n/a | EU558500 | n/a
Thelypteridaceae
Thelypteris reticulata (L.) Proctor (J.S. Miller & M. C. Merello 8864, MO)
  EU352286 | n/a | EU552834 | EU328248 | EU558417 | EU352313 | EU558446 | EU558501 | n/a

a Previously published sequences. Accessions in brackets were produced by other workers; see Rai et al. (2003), Graham and Olmstead (2000a, b) and Graham et al. (2000) for a complete list of taxa and accession numbers for other sequences employed in phylogenetic analyses here.

Table 5.2. New primers designed for this study.
Primer name: Sequence (5’-3’)    Gene/region
F1F¹: CCATAATTTRCARGAACATTC    3’-rps12
L1F²: GAGRTAACRGCTTACATAC    3’-rps12
L2F: AAACAACTTGGTGTCYAAGG    3’-rps12
L2R: CTTAGACACCAAGTTGTTTC    3’-rps12
L4F: TGGAAAGCTGTATTCGATG    3’-rps12-rps7 IGS
L4R: TCATCGAATACAGCTTTCC    3’-rps12
L5F: GATCCAATTTATCGTAATCG    rps7
L5R: GATTACGATAAATTGGATC    3’-rps12-rps7 IGS
F9F: TTATGGGTGGARCAAGTTC    ndhB
F9R: TAGAAGAACTTGYTCCACC    ndhB
F13F: GAAACGTATGCTTGCATATTC    ndhB
F13R: GAATATGCAAGCATACGTTTC    ndhB
F20F: ATATCGTSAAATWGATTTTCG    rpl2
F24R: ATCTCTTCCCRAACTGTAC    rpl2
F41F: GGTCCTGARGCACARGG    psbD
F45R: CATTAAAGAGCGTTTCCAC    psbD
¹ The prefix ‘F’ indicates a primer designed to work across all ferns.
² The prefix ‘L’ indicates a primer designed to work specifically in leptosporangiate ferns.

Figure 5.1. The consensus tree presented in Smith et al. (2006), based on recent and ongoing phylogenetic studies, redrawn to highlight the taxonomic sampling of this study (clades in red are represented by at least one taxon in the current data set). All resolved nodes have ≥ 70% bootstrap support (unless otherwise noted above branches) from at least one of the phylogenetic studies used to construct the consensus tree. The classification proposed by Smith et al. (2006) is presented to the right.
Figure 5.2. Plastid-based phylogeny of the vascular plants. The tree shown is the most parsimonious tree (47,342 steps, CI = 0.338, RI = 0.58) recovered from 17 chloroplast genes and associated noncoding regions. Bootstrap values are indicated above branches. Labels within open circles denote leptosporangiate fern family names, as classified by Smith et al. (2006): Os – Osmundaceae; Hy – Hymenophyllaceae; Gl – Gleicheniaceae; Mt – Matoniaceae; Di – Dipteridaceae; Ly – Lygodiaceae; Sc – Schizaeaceae; Ma – Marsileaceae; Sl – Salviniaceae; Pl – Plagiogyriaceae; Cy – Cyatheaceae; Di – Dicksoniaceae; Li – Lindsaeaceae; Sa – Saccolomataceae; Pt – Pteridaceae; De – Dennstaedtiaceae; Dr – Dryopteridaceae; Po – Polypodiaceae; As – Aspleniaceae; Bl – Blechnaceae; Th – Thelypteridaceae.

Figure 5.3. Maximum likelihood tree (-lnL = 235,448.293) found using 17 plastid genes and associated noncoding regions. This analysis includes all nine rate classes (RC0-RC8). The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches.

Figure 5.4. Maximum likelihood tree (-lnL = 112,651.990) found using 17 plastid genes and associated noncoding regions, excluding the two fastest rate classes (RC7 and RC8), with and without Selaginella. The GTR + Γ + I model of sequence evolution was chosen by a hierarchical likelihood ratio test. Maximum likelihood bootstrap values are indicated above branches; '*' indicates 100% bootstrap support. Values before the slash are from an ML analysis of RC0-6 that includes Selaginella; values after the slash are from an ML analysis of the same data with Selaginella removed before rate classes were reassigned by HyPhy. A '-' indicates an inapplicable node (because Selaginella was removed).

Figure 5.5. Placement of Equisetum in various taxon-exclusion analyses of the plastid data considered here. Various clades have been collapsed for clarity. Relevant maximum likelihood bootstrap values for each alternative are shown above branches (above and below the letter indicating each alternative on the vertical branch). A. Placement of Equisetum when no bryophyte representatives are included. B. Placement of Equisetum when a single bryophyte (Anthoceros) is used to root the tree. C. Placement of Equisetum when Selaginella is removed prior to analysis of RC0-6 (rate classes were recalculated after the removal of Selaginella; see Fig. 5.4); Equisetum is also found here in MP and ML analyses of the full data set (Figs. 5.3, 5.4).

5.5 REFERENCES

BOYCE, K. C. AND A. H. KNOLL. 2002. Evolution of developmental potential and the multiple independent origins of leaves in Paleozoic vascular plants. Paleobiol. 28: 70-100.
BURLEIGH, J. G. AND S. MATHEWS. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Am. J. Bot. 91: 1599-1613.
BURLEIGH, J. G. AND S. MATHEWS. 2007. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci. 168: 125-135.
CANTINO, P. D., J. A. DOYLE, S. W. GRAHAM, W. S. JUDD, R. G. OLMSTEAD, D. E. SOLTIS, P. S. SOLTIS, AND M. J. DONOGHUE. 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56: 822-846.
DES MARAIS, D. L., A. R. SMITH, D. M. BRITTON, AND K. M. PRYER. 2003. Phylogenetic relationships and evolution of extant horsetails, Equisetum, based on chloroplast DNA sequence data (rbcL and trnL-F). Int. J. Plant Sci. 164: 737-751.
DOYLE, J. A. 1998. Phylogeny of vascular plants. Ann. Rev. Ecol. Syst. 29: 567-599.
DOYLE, J. A. 2006. Seed ferns and the origin of angiosperms. J. Torrey Bot. Soc. 133: 169-209.
DOYLE, J. J., AND J. L. DOYLE. 1987. A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783-791.
GIFFORD, E. M. AND A. S. FOSTER. 1989. Morphology and evolution of vascular plants. 3rd ed. W.H. Freeman, New York, NY.
GRAHAM, S. W., J. R. KOHN, B. R. MORTON, J. E. ECKENWALDER, AND S. C. H. BARRETT. 1998. Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Syst. Biol. 47: 545-567.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000b. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37: 183-188.
GRAHAM, S. W., P. A. REEVES, A. C. E. BURNS, AND R. G. OLMSTEAD. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161: S83-S96.
GUILLON, J. M. 2004. Phylogeny of horsetails (Equisetum) based on the chloroplast rps4 gene and adjacent noncoding sequences. Syst. Bot. 29: 251-259.
GUILLON, J. M. 2007. Molecular phylogeny of horsetails (Equisetum) including chloroplast atpB sequences. J. Plant Res. 120: 569-574.
GUINDON, S. AND O. GASCUEL. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
HASEBE, M., P. G. WOLF, K. M. PRYER, K. UEDA, M. ITO, R. SANO, G. J. GASTONY, J. YOKOYAMA, J. R. MANHART, N. MURAKAMI, E. H. CRANE, C. H. HAUFLER, AND W. D. HAUK. 1995. Fern phylogeny based on rbcL nucleotide sequences. Am. Fern J. 85: 134-181.
HAUK, W. D., C. R. PARKS, AND M. W. CHASE. 2003. Phylogenetic studies of Ophioglossaceae: evidence from rbcL and trnL-F plastid DNA sequences and morphology. Mol. Phylogenet. Evol. 28: 131-151.
KENRICK, P. AND P. R. CRANE. 1997. The origin and early diversification of land plants: a cladistic study. Smithsonian Press, Washington, D.C., USA.
KORALL, P. AND P. KENRICK. 2004. The phylogenetic history of Selaginellaceae based on DNA sequences from the plastid and nucleus: extreme substitution rates and rate heterogeneity. Mol. Phylogenet. Evol. 31: 852-864.
KOSAKOVSKY POND, S. L., S. D. W. FROST, AND S. V. MUSE. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21: 676-679.
KRANZ, H. D., AND V. A. R. HUSS. 1996. Molecular evolution of pteridophytes and their relationship to seed plants: evidence from complete 18S rRNA gene sequences. Plant Syst. Evol. 202: 1-11.
LANG, W. H. 1937. On the plant remains from the Downtonian of England and Wales. Phil. Trans. R. Soc. 227B: 245-291.
MURDOCK, A. G. 2008. Phylogeny of marattioid ferns (Marattiaceae): inferring a root in the absence of a closely related outgroup. Am. J. Bot. 95: 626-641.
NICKRENT, D. L., C. L. PARKINSON, J. D. PALMER, AND R. J. DUFF. 2000. Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol. Biol. Evol. 17: 1885-1895.
POSADA, D., AND K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817-818.
PRYER, K. M., A. R. SMITH, AND J. E. SKOG. 1995. Phylogenetic relationships of extant ferns based on evidence from morphology and rbcL sequences. Am. Fern J. 85: 205-282.
PRYER, K. M., H. SCHNEIDER, A. R. SMITH, R. CRANFILL, P. G. WOLF, J. S. HUNT, AND S. D. SIPES. 2001. Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. Nature 409: 618-622.
PRYER, K. M., E. SCHUETTPELZ, P. G. WOLF, H. SCHNEIDER, A. R. SMITH, AND R. CRANFILL. 2004. Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am. J. Bot. 91: 1582-1598.
RAI, H. S., H. E. O'BRIEN, P. A. REEVES, AND S. W. GRAHAM. 2003. Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol. Phylogenet. Evol. 29: 350-359.
RAI, H. S., P. A. REEVES, R. PEAKALL, R. G. OLMSTEAD, AND S. W. GRAHAM. 2008. Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany 86: 659-669.
RAUBESON, L. A., AND R. K. JANSEN. 1992. Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255: 1697-1699.
RAUBESON, L. A., AND D. B. STEIN. 1995. Insights into fern evolution from mapping chloroplast genomes. Am. Fern J. 85: 193-204.
ROTHWELL, G. W. 1999. Fossils and ferns in the resolution of land plant phylogeny. Bot. Rev. 65: 188-218.
ROTHWELL, G. W., AND K. C. NIXON. 2006. How does the inclusion of fossil data change our conclusions about the phylogenetic history of euphyllophytes? Int. J. Plant Sci. 167: 737-749.
ROTHWELL, G. W., AND R. A. STOCKEY. 2008. Phylogeny and evolution of ferns: a paleontological perspective. Pp. 332-366. In The biology and evolution of ferns and lycophytes. Edited by T. A. Ranker and C. H. Haufler. Cambridge University Press, NY.
RYDIN, C. AND N. WIKSTRÖM. 2002. Phylogeny of Isoetes (Lycopsida): resolving basal relationships using rbcL sequences. Taxon 51: 83-89.
SAARELA, J. M., H. S. RAI, J. A. DOYLE, P. K. ENDRESS, S. MATHEWS, A. D. MARCHANT, B. G. BRIGGS, AND S. W. GRAHAM. 2007. Hydatellaceae identified as a new branch near the base of the angiosperm phylogenetic tree. Nature 446: 312-315.
SCHNEIDER, H., E. SCHUETTPELZ, K. M. PRYER, R. CRANFILL, S. MAGALLÓN, AND R. LUPIA. 2004. Ferns diversified in the shadow of angiosperms. Nature 428: 553-557.
SCHUETTPELZ, E., P. KORALL, AND K. M. PRYER. 2006. Plastid atpA data provide improved supports for deep relationships among ferns. Taxon 55: 897-906.
SCHUETTPELZ, E. AND K. M. PRYER. 2007. Fern phylogeny inferred from 400 leptosporangiate species and three plastid genes. Taxon 56: 1037-1050.
SMITH, A. R., K. M. PRYER, E. SCHUETTPELZ, P. KORALL, H. SCHNEIDER, AND P. G. WOLF. 2006. A classification for extant ferns. Taxon 55: 705-731.
SOLTIS, P. S., D. E. SOLTIS, AND M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple chloroplast genes as a tool for comparative biology. Nature 402: 402-404.
SWOFFORD, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Computer program and documentation. Sinauer Associates, Sunderland, MA.
WAGNER, W. H. 1990. Ophioglossaceae. Pp. 193-197. In The families and genera of vascular plants Vol. 1. Edited by K. U. Kramer and P. S. Green. Springer-Verlag, Berlin.
WIKSTRÖM, N. AND P. KENRICK. 1997. Phylogeny of Lycopodiaceae (Lycopsida) and the relationships of Phylloglossum drummondii Kunze based on rbcL sequences. Int. J. Plant Sci. 158: 862-871.
WIKSTRÖM, N. AND P. KENRICK. 2000. Relationships of Lycopodium and Lycopodiella based on combined plastid rbcL gene and trnL intron sequence data. Syst. Bot. 25: 495-510.
WIKSTRÖM, N. AND K. M. PRYER. 2005. Incongruence between primary sequence data and the distribution of a mitochondrial atp1 group II intron among ferns and horsetails. Mol. Phylogenet. Evol. 36: 484-493.
WOLF, P. G., C. A. ROWE, R. B. SINCLAIR, AND M. HASEBE. 2003. Complete nucleotide sequence of the chloroplast genome from a leptosporangiate fern, Adiantum capillus-veneris L. DNA Res. 10: 59-65.

CHAPTER 6 CONCLUSION

6.1 OVERALL CONCLUSIONS

I collected and analyzed a large amount of plastid DNA sequence data to address several questions relating to the deep branches of the vascular plants. In Chapters 2 and 3, I focused on broad-scale relationships within the two largest groups of gymnosperms, the cycads and conifers. In Chapter 4, with the addition of previously published data from the largest seed-plant group, the angiosperms, I addressed the vexing issue of seed-plant relationships in general. Finally, in Chapter 5, I reconstructed relationships among the major lineages of the ferns and relatives (monilophytes), with a particular focus on the backbone of leptosporangiate fern phylogeny.
All four studies surveyed a large set of exemplar taxa for a more or less consistent set of plastid regions, the largest such sampling attempted to date for this number of taxa. Along the way, I encountered unusual or noteworthy molecular evolutionary phenomena (Chapters 2 and 3) and evidence of systematic bias in the inference of deep seed-plant relationships (Chapter 4), and I used a likelihood-based filtering method to remove potentially problematic, rapidly evolving characters (Chapters 3-5).

6.1.1 Reconstruction of Higher-Order Relationships in Cycad Phylogeny

I reconstructed phylogenetic relationships in cycads using 17 plastid genes and two optimality criteria (MP and ML). Higher-order cycad relationships have proven difficult to infer, apparently because of a very slow rate of molecular evolution, perhaps coupled with some relatively rapid radiations. I found substantial support for most of the inferred backbone of cycad phylogeny, and weak evidence that Ginkgo biloba is the sister group of the cycads among living seed plants. Cycas (representing Cycadaceae) is the sister group of the remaining cycads, and Dioon is part of the next most basal split. Two of the three major cycad families (Zamiaceae and Stangeriaceae) are not monophyletic: Stangeria (Stangeriaceae) is embedded within Zamiaceae, close to Zamia and Ceratozamia, and is not closely allied to the other genus of Stangeriaceae, Bowenia. These findings are congruent with a recent study that expanded my data set to a complete genus-level sample of the cycads (Zgurski et al., 2008). Unlike the other seed plants, cycad chloroplast genomes share two features with Ginkgo: a reduced rate of evolution and an elevated transition:transversion ratio.
The latter feature is unlikely to have affected inference of cycad relationships in seed-plant-wide analyses, as I demonstrated that large variation in transition:transversion ratios across the seed plants seems to have no effect on tree topology when all seed-plant taxa are included.

6.1.2 Reconstruction of Higher-Order Relationships in Conifer Phylogeny

In Chapter 3, I reconstructed the broad backbone of conifer phylogeny using 22 exemplar conifer species. Parsimony and likelihood analyses recovered the same higher-order relationships, with strong support for most of the deep splits in conifer phylogeny, including those within the two families that I sampled most heavily, Araucariaceae and Cupressaceae. My findings are broadly congruent with other recent studies (e.g., Gadek et al., 2000; Quinn et al., 2002), and are inferred with comparable or improved bootstrap support. The deepest phylogenetic split in conifers is inferred to be between Pinaceae and all other conifers (Cupressophyta). Within the Cupressophyta clade I recovered well-supported relationships among Cephalotaxaceae, Cupressaceae, Sciadopityaceae, and Taxaceae. My data are consistent with recent moves to recognize Cephalotaxus under Taxaceae, and strongly support a sister-group relationship between the two predominantly southern-hemisphere conifer families, Araucariaceae and Podocarpaceae. I argue that Phyllocladus should be recognized under Podocarpaceae, despite residual uncertainty about its relationships to other podocarps. I also identified an unusual local hotspot of indel evolution, shared by Araucariaceae and Podocarpaceae, in the coding portion of the plastid ribosomal protein gene rps7, which has become greatly expanded in these two families.
I found that removing the most rapidly evolving plastid characters, as defined by a likelihood-based classification of substitution rates for the taxa considered in this thesis, has little or no effect on inferences of higher-order conifer relationships.

6.1.3 Seed-Plant Phylogeny: Inference and Misinference of Higher-Order Relationships

The gene sampling used throughout this thesis appears sufficient to recover strong and congruent support within the major clades of vascular plants (angiosperms, conifers, cycads; e.g., Chapters 2 and 3; Graham and Olmstead, 2000; Graham et al., 2006), but the placement of Gnetales in seed-plant phylogeny remains unresolved. Gnetales is one of the four extant lineages of gymnosperms, together with conifers, cycads and Ginkgo. I examined whether systematic error contributes to conflicting inferences of Gnetales placement, and it appears to: using a simulation approach, I showed that the "Gnetales-sister" hypothesis (in which Gnetales are the sister group of all other extant seed plants), found in several recent molecular studies, may be a long-branch artifact, especially when maximum parsimony is the reconstruction method. I also showed that model-based methods, in particular maximum likelihood, appear to be less prone to systematic error, especially when combined with the removal of rapidly evolving sites. However, different partitions of the data (e.g., different rate-class partitions) can still produce strongly conflicting results under ML, even when the simulations suggest that there is little systematic error within each of them.
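The long-branch effect described above can be illustrated with a toy four-taxon simulation. The Python sketch below is a generic textbook demonstration, not the simulation protocol of Chapter 4 (which simulated seed-plant data on constrained trees). It assumes a Jukes-Cantor-style process parameterized directly by per-branch change probabilities, simulates sites on the tree ((A,B),(C,D)) with long branches leading to A and C, and scores the three possible unrooted trees with Fitch parsimony. All names and parameter values are hypothetical.

```python
import random

BASES = "ACGT"


def evolve(state, p_change, rng):
    """Copy a base along one branch: with probability p_change it mutates to one
    of the three other bases, chosen uniformly (a Jukes-Cantor-style step)."""
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != state])
    return state


def simulate_site(p_long, p_short, rng):
    """Simulate one site on the unrooted tree ((A,B),(C,D)).

    Branches to A and C have change probability p_long; the branches to B and D
    and the internal branch have change probability p_short.
    """
    left = rng.choice(BASES)              # internal node joining A and B
    right = evolve(left, p_short, rng)    # internal node joining C and D
    return {
        "A": evolve(left, p_long, rng),
        "B": evolve(left, p_short, rng),
        "C": evolve(right, p_long, rng),
        "D": evolve(right, p_short, rng),
    }


def fitch_length(tree, site):
    """Minimum number of changes for one site (Fitch's algorithm).

    `tree` is a taxon name or a pair of subtrees; `site` maps taxon -> base.
    """
    def visit(node):
        if isinstance(node, str):
            return {site[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# The three unrooted four-taxon trees; root placement does not affect Fitch length.
QUARTETS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def parsimony_lengths(n_sites, p_long, p_short, seed=1):
    """Simulate n_sites and return the total parsimony length of each quartet."""
    rng = random.Random(seed)
    sites = [simulate_site(p_long, p_short, rng) for _ in range(n_sites)]
    return {name: sum(fitch_length(tree, s) for s in sites)
            for name, tree in QUARTETS.items()}


def mp_choice(lengths):
    """The most parsimonious (shortest) quartet."""
    return min(lengths, key=lengths.get)


if __name__ == "__main__":
    for p_long in (0.1, 0.5):
        lengths = parsimony_lengths(5000, p_long=p_long, p_short=0.05)
        print(f"p_long={p_long}: MP prefers {mp_choice(lengths)} {lengths}")
```

With short terminal branches, parsimony should recover the generating split AB|CD; when the branches to A and C are long relative to the internal branch, the shortest tree instead tends to group the two long branches (AC|BD), the classic long-branch-attraction pattern.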
Disturbingly, despite zero character support in the real data for the anthophyte relationship according to ML (reflected in the constrained trees used to simulate the anthophyte hypothesis), MP inference of these simulated data usually recovered an alternative relationship (the Gnetales-sister tree) that also had no character support in the anthophyte model tree. Finally, I showed that misinference under ML is unlikely to be purely a function of rapidly evolving characters: third-codon-position data, which include both rapidly and slowly evolving characters, are inferred to be much more error-prone under ML than plastid data filtered to remove invariant and slowly to moderately evolving nucleotides.

6.1.4 Monilophytes and Deep Vascular-Plant Phylogeny

In Chapter 5, I produced the largest survey of monilophytes to date in terms of plastid gene sampling, using 34 exemplar taxa and the same plastid regions used throughout this thesis. I also addressed the broader issue of deep vascular-plant relationships by adding the third major line of vascular plants, the lycophytes. I recovered most of the backbone relationships with high bootstrap support. These results are generally congruent with several recently published studies that address the same question (Pryer et al., 2004; Schuettpelz et al., 2006), although I found improved support in several key areas, for example along the backbone of leptosporangiate ferns, and for a sister-group relationship between the tree ferns (Cyatheales) and Polypodiales. In contrast to these recent studies, I found Equisetum to be the sister group of all other monilophytes, although this placement is only moderately supported by the current data. ML analyses of the full data set suggest that Marattiaceae may be the sister group of leptosporangiate ferns.
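Bootstrap percentages are obtained by resampling alignment columns with replacement, reanalyzing each pseudo-replicate, and recording how often each clade is recovered. The Python sketch below illustrates that procedure for a four-taxon parsimony analysis only. The toy alignment is hypothetical (not data from this thesis), and ties between splits are simply counted as unresolved, which is one of several possible conventions.

```python
import random
from collections import Counter

SPLITS = ("AB|CD", "AC|BD", "AD|BC")


def split_supported(column):
    """Quartet split favoured by one alignment column under parsimony, or None.

    With four taxa, only 'xxyy'-type columns (two states, each shared by a pair
    of taxa) differ in length among the three unrooted trees.
    """
    a, b, c, d = column
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None


def mp_split(columns):
    """Split with the most supporting columns, or None if the best is tied."""
    counts = Counter(split_supported(col) for col in columns)
    ranked = sorted(SPLITS, key=lambda s: counts[s], reverse=True)
    if counts[ranked[0]] == counts[ranked[1]]:
        return None
    return ranked[0]


def bootstrap_support(alignment, n_reps=1000, seed=0):
    """Percentage of replicates in which each split is the unique MP choice."""
    columns = list(zip(*(alignment[t] for t in ("A", "B", "C", "D"))))
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_reps):
        # Draw as many columns as the original alignment has, with replacement.
        replicate = [rng.choice(columns) for _ in range(len(columns))]
        wins[mp_split(replicate)] += 1
    return {s: 100.0 * wins[s] / n_reps for s in SPLITS}


# Hypothetical toy alignment: 10 columns favour AB|CD, 1 favours AC|BD, 9 are constant.
toy = {
    "A": "A" * 10 + "C" + "T" * 9,
    "B": "A" * 10 + "T" + "T" * 9,
    "C": "G" * 10 + "C" + "T" * 9,
    "D": "G" * 10 + "T" + "T" * 9,
}
print(bootstrap_support(toy))
```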
When I used likelihood-based filtering of rapidly evolving characters, the net effect was generally to reduce support for backbone relationships of the vascular plants, without changing the overall topology recovered from analyses of the full data set. Taxon-exclusion analyses involving Selaginella and the bryophyte outgroups affect relationships in the monilophytes, and suggest that Selaginella, at least, is potentially problematic when included in analyses of vascular-plant phylogeny at current taxon densities. Equisetum and other monilophyte taxa with relatively long branches may pose a more significant obstacle to accurate phylogenetic inference among the major monilophyte clades, so additional taxonomic sampling in these lineages may help to resolve the basal nodes of monilophyte phylogeny more clearly.

6.2 SUMMARY AND FUTURE DIRECTIONS

An overall goal of my research was to solidify our basic knowledge of deep vascular-plant relationships, particularly the evolutionary relationships within the cycads, conifers, and monilophytes, and to provide well-supported phylogenies at each of these deep levels of relationship. The data I collected permit robust phylogenetic inference of much of the vascular-plant backbone, and provide strong support for the monophyly of, and relationships within, the largest clades examined here (i.e., conifers, cycads and leptosporangiate ferns). In addition to this solid framework, I have shown that maximum likelihood is substantially less prone than maximum parsimony to systematic error when reconstructing phylogenetic relationships, and that filtering plastid data according to ML-based rate classifications can be useful when systematic error is evident or suspected. However, filtering does not remove the source of conflict among different data partitions, at least for the inference of overall seed-plant relationships.
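Once every alignment column has been assigned to a rate class, excluding the fastest classes (as in the RC0-6 analyses of Chapter 5, which drop RC7 and RC8) amounts to masking those columns before reanalysis. The Python sketch below assumes the class assignments already exist; it does not reproduce the likelihood-based classification itself, which in this thesis was obtained with HyPhy. The alignment, class values and function name are hypothetical.

```python
def filter_by_rate_class(alignment, site_classes, exclude=(7, 8)):
    """Remove alignment columns assigned to any rate class in `exclude`.

    alignment    -- dict: taxon name -> aligned sequence (all the same length)
    site_classes -- one integer class per column (0 = slowest, larger = faster),
                    assumed to have been estimated beforehand
    exclude      -- classes to drop; (7, 8) mirrors an RC0-6 analysis
    """
    n_cols = len(site_classes)
    for taxon, seq in alignment.items():
        if len(seq) != n_cols:
            raise ValueError(f"{taxon}: expected {n_cols} columns, got {len(seq)}")
    keep = [i for i, rc in enumerate(site_classes) if rc not in exclude]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


# Hypothetical example: two taxa, ten columns, made-up class assignments.
aln = {"taxon1": "ACGTACGTAC", "taxon2": "ACTTTCGAAG"}
classes = [0, 0, 1, 8, 7, 2, 6, 8, 3, 0]

filtered = filter_by_rate_class(aln, classes)
print(filtered)  # {'taxon1': 'ACGCGAC', 'taxon2': 'ACTCGAG'}
```

Keeping the filter separate from the rate estimation makes it easy to compare several exclusion sets (for example, dropping only RC8, or RC7 and RC8) on the same class assignments.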
Several questions remain, including the ever-lingering question of Gnetales placement within the seed plants and the relationships among the major clades of monilophytes. Current simulated data sets of seed-plant phylogeny as a whole, and the models used to simulate them, may not be complex enough to reflect fundamental but incompletely characterized processes of molecular evolution in these taxa; as a result, current analyses of higher-order relationships may still be considerably biased, at least in seed plants. Since monilophyte relationships involve even deeper and longer branches, some of these relationships may also be prone to strong misinference. The key to resolving the seed-plant mystery may lie in collaborative efforts that incorporate more nuanced analyses of molecular data from each plant genome and its constituent genes, and in morphological data sets that include the many missing extinct lineages.

6.3 REFERENCES

GADEK, P. A., D. L. ALPERS, M. M. HESLEWOOD, AND C. J. QUINN. 2000. Relationships within Cupressaceae sensu lato: a combined morphological and molecular approach. Am. J. Bot. 87: 1044-1057.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000a. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: 1712-1730.
GRAHAM, S. W., J. M. ZGURSKI, M. A. MCPHERSON, D. M. CHERNIAWSKI, J. M. SAARELA, V. L. BIRON, J. C. PIRES, R. G. OLMSTEAD, M. W. CHASE, AND H. S. RAI. 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Pp. 3-21. In Monocots: comparative biology and evolution (excluding Poales). Edited by J. T. Columbus, E. A. Friar, J. M. Porter, L. M. Prince, and M. G. Simpson. Rancho Santa Ana Botanic Garden, Claremont, CA.
PRYER, K. M., E. SCHUETTPELZ, P. G. WOLF, H. SCHNEIDER, A. R. SMITH, AND R. CRANFILL. 2004. Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am. J. Bot. 91: 1582-1598.
QUINN, C. J., R. A. PRICE, AND P. A. GADEK. 2002. Familial concepts and relationships in the conifers based on rbcL and matK sequence comparisons. Kew Bull. 57: 513-531.
SCHUETTPELZ, E., P. KORALL, AND K. M. PRYER. 2006. Plastid atpA data provide improved supports for deep relationships among ferns. Taxon 55: 897-906.
ZGURSKI, J. M., H. S. RAI, Q. M. FAI, D. J. BOGLER, J. FRANCISCO-ORTEGA, AND S. W. GRAHAM. 2008. How well do we understand the overall backbone of cycad phylogeny? New insights from a large, multigene plastid data set. Mol. Phylogenet. Evol. 47: 1232-1237.
