UBC Theses and Dissertations

Evolution of free-living relatives of apicomplexan parasites Janouškovec, Jan 2013

EVOLUTION OF FREE-LIVING RELATIVES OF APICOMPLEXAN PARASITES

by Jan Janouškovec

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Botany)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2013

© Jan Janouškovec, 2013

Abstract

As obligate parasites of animals and humans, apicomplexan parasites have many unique characteristics that are critical to their lifestyle, but bear little resemblance to other eukaryotes. Several free-living relatives of apicomplexans hold great potential for understanding early apicomplexan evolution. The photosynthetic Chromera velia holds particular promise for addressing the long-contentious origin of the apicomplexan plastid. The data presented here provides evidence that the photosynthetic plastids in Chromera velia and another novel alga, CCMP3155 (later named Vitrella brassicaformis), are closely and specifically related to the apicomplexan plastid, and that they together are related to plastids in dinoflagellates. The ancestral plastid went through an unusual reduction in gene content and acquired unique features such as Rubisco II and transcript oligouridylylation. The plastid genome in C. velia is interesting on its own. Two proteins of Photosystem I and ATP synthase have each been split into two fragments, which are independently expressed. The genome also appears to exist predominantly as a linear monomer. These, and additional unprecedented features, redefine our understanding of plastid genome architecture and point to intra-chromosomal recombination as a putative driving force. Assessing the environmental distribution of the newly discovered Chromera and Vitrella led to the discovery of six additional apicomplexan-related lineages (ARLs) comprising 1,316 sequences primarily from coral reef environments. The most abundant lineage, ARL-V, is novel and exclusively associated with coral tissue and surface samples.
ARL-V is present in at least 20 species of symbiotic corals across time and space, which suggests that its relationship with corals is of potential significance to the reef ecosystem. Successful culturing of five Colponema isolates provides the first molecular data for another apicomplexan relative. The genus represents two independent lineages, one of which is the closest sister to apicomplexans and dinoflagellates. Mitochondrial genome data from both lineages reveals a gene-rich content and suggests that a linear monomeric structure with telomeres was ancestral to all alveolates. Altogether, this data illustrates the significance of Chromera, Vitrella, Colponema and several uncultured lineages in illuminating early evolution in apicomplexans and alveolates.

Preface

Chapter II is based on the published manuscript: Janouškovec, J., Horák, A., Oborník, M., Lukeš, J., and Keeling, P.J. (2010). A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proceedings of the National Academy of Sciences of the United States of America 107, 10949–10954. All authors designed the project. JJ conducted all experimental work. JJ and AH analyzed the data: JJ conducted comparative genome analyses, Rubisco and transcript U-tailing work, and AH built multigene alignments and phylogenies. JJ prepared the figures and wrote the manuscript and PJK revised it. JJ holds author rights to reproduce the full content of the publication.

Chapter III is based on a manuscript in review: Janouškovec, J.*, Sobotka, R.*, Lai, D.-H.*, Flegontov, P., Koník, P., Komenda, J., Ali, S., Prášil, O., Pain, A., Oborník, M., Lukeš, J., and Keeling, P.J. (in review). Split photosystem protein, linear topology, and growth of structural complexity in the recombination-driven plastid genome of Chromera velia. (in review). * Authors contributed equally to the publication.
JJ initiated the project and, together with RS, D-HL, JL, and PJK, designed the experimental approach. JJ, RS, D-HL, PK, JK, SA, AP, and MO generated the data: RT-PCR and PCR (JJ), PFGE (JJ and MO), Northern blots (D-HL), protein data (RS, PK, and JK), genomic DNA reads (D-HL), and transcriptomic reads (SA and AP). JJ conducted the data analysis with the following contributions from other authors: primary transcript coverage values (PF), plastid gene copy number (D-HL), and interpreting protein gels and mass-spec analysis (RS). OP contributed advice. JJ prepared all figures except Figures 3.3 (RS) and 3.13 (D-HL). JJ and PJK jointly wrote the manuscript, with a contribution from RS (all protein results, discussion and methods).

Chapter IV is based on two published manuscripts: Janouškovec, J., Horák, A., Barott, K.L., Rohwer, F.L., and Keeling, P.J. (2012). Global analysis of plastid diversity reveals apicomplexan-related lineages in coral reefs. Current Biology 22, R518–9. Janouškovec, J., Horák, A., Barott, K.L., Rohwer, F.L., and Keeling, P.J. (2013). Environmental distribution of coral-associated relatives of apicomplexan parasites. The ISME Journal 7, 444–447. This research was initiated by a discovery by JJ. JJ, AH, and PJK designed the experimental approach. JJ and AH jointly conducted the analysis: JJ identified all novel organisms and analyzed their phylogeny, diversity and distribution across all cited studies, and AH conducted the global 16S rRNA gene analysis. KLB and FLR provided primary sequence data. JJ prepared the figures and manuscripts and PJK revised them. JJ holds author rights to reproduce the full content of The ISME Journal publication and obtained explicit permission from Cell Press/Elsevier to reproduce the full content of the Current Biology publication (available on request; email: janjan.cz@gmail.com).
Chapter V is based on a manuscript in preparation: Janouškovec, J., Tikhonenkov, D.V., Mikhailov, K.V., Mylnikov, A.P., Aleoshin, V.V., and Keeling, P.J. (in preparation). A deep branching alveolate of a key evolutionary significance. (in preparation). All authors designed the project. DVT and APM identified, isolated and cultured the studied organisms. JJ, DVT, and KVM generated the sequences: Colponema sp. Peru and C. sp. Vietnam nuclear and mitochondrial data (JJ and DVT), C. edaphicum and C. sp. Peru nuclear data (KVM under the supervision of VVA). JJ conducted the data analysis. JJ prepared all figures and wrote the manuscript and PJK revised it.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
  The significance of apicomplexan parasites
  Defining features of apicomplexans
  The better known apicomplexan relatives - dinoflagellates and ciliates
  The poorly known apicomplexan relatives - colpodellids and Colponema
  The newly discovered apicomplexan relative - Chromera velia
  Evolutionary questions
  Research objectives
Chapter 2: A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids
  Introduction
  Results and discussion
    Plastid genomes from two photosynthetic relatives of Apicomplexa
    Ancient horizontal transfer of form II Rubisco to alveolates
    Ancient origin of mRNA polyuridylylation
    Plastid phylogeny supports a common origin of alveolate and heterokont plastids
    Simple hypothesis for plastid evolution
    Role of loss in plastid evolution
    Implications for plastid genome evolution in apicomplexans and dinoflagellates
  Conclusions
  Materials and methods
    Culturing, DNA sequencing and assembly, PFGE and Southern Blotting
    RNA extraction, RACE and cRT-PCR
    Comparative genomic and phylogenetic analyses
    Phylogenetic datasets
Chapter 3: Split photosystem protein, linear-mapping topology and growth of structural complexity in the plastid genome of Chromera velia
  Introduction
  Results and discussion
    Fragmented genes encode fragmented proteins expressed from polycistronic mRNAs
    Significance of split proteins for photosynthesis and ATP generation
    Genome-wide transcription profiles
    Many plastid ORFs contain unusual features
    A linear-mapping plastid genome in C. velia
    Searching for forces driving structural complexity
  Conclusions
  Materials and methods
    DNA extraction, sequencing, annotation and fragmented gene analysis
    RNA extraction, RACE, and Northern blot hybridization
    Preparations of cell membranes and two dimensional electrophoresis
    Protein identification by LC-MS/MS analysis
    Transcriptome analysis
    Pulse-field gel electrophoresis (PFGE) and Southern blot hybridization
    Determining the replication origin and plastid gene copy number
Chapter 4: Global analysis of plastid diversity reveals new apicomplexan-related lineages in coral reefs
  Introduction
  Results and discussion
    Analysis of plastid diversity identifies new apicomplexan-related lineages
    ARLs are widespread and abundant on coral reefs
    Distribution of ARL-V in the coral reef ecosystem
  Conclusions
  Materials and methods
    Identifying new plastid lineages and their diversity
    Datasets and phylogenies
    Coral-macroalgal transects (Figure 4.5A, Table 4.3)
    Seasonal abundance of ARL-V and associations with different coral species
Chapter 5: Colponema, a deep-branching alveolate of a key evolutionary significance
  Introduction
  Results and discussion
  Conclusions
  Materials and methods
    Sampling and culturing
    DNA extraction, sequencing and data assembly
    Annotations and comparative genomic analyses
    Phylogenetic analyses
Chapter 6: Conclusion
  Summary
  Future directions
References

List of Tables

Table 3.1: Summary of total RNA transcript mapping to the C. velia plastid genome
Table 3.2: Total RNA transcript mapping to all C. velia plastid genes
Table 4.1: Environmental sequence evidence for ARLs (excluding ARL-V)
Table 4.2: Environmental sequence evidence for ARL-V
Table 4.3: Occurrence of ARLs and other plastids in coral-macroalgal transects

List of Figures

Figure 1.1: Basic characteristics of apicomplexans (Toxoplasma gondii)
Figure 1.2: Protofilament structure of apicomplexan microtubules in cross section
Figure 1.3: Relatives of apicomplexan relatives
Figure 1.4: Chromera velia
Figure 2.1: The nuclear phylogeny shows C. velia and CCMP3155
Figure 2.2: Plastid genome maps of Chromera velia and CCMP3155
Figure 2.3: Size estimate of the C. velia plastid genome
Figure 2.4: Venn diagram of plastid genome contents in various red plastid lineages
Figure 2.5: The plastid ribosomal superoperon red algal origin of alveolate plastids
Figure 2.6: Conserved gene order in alveolate plastid genomes
Figure 2.7: Form II Rubisco in plastids of C. velia, CCMP3155, and dinoflagellates
Figure 2.8: Polyuridylylation of plastid transcripts in C. velia
Figure 2.9: Multiprotein phylogenies support the common origin of alveolate plastids
Figure 2.10: Summary of plastid evolution in alveolates
Figure 2.11: Plastid phylogeny of 68 proteins found in CCMP3155 plastid
Figure 2.12: Plastid phylogeny of 34 conserved plastid proteins
Figure 3.1: An expression model for the split PsaA
Figure 3.2: Expression model for the split AtpB
Figure 3.3: The 2D electrophoresis of membrane proteins and their identification
Figure 3.4: Plastid total RNA and polyA RNA transcriptomic profiles
Figure 3.5: Genes with an unexpected extension at the C-terminus
Figure 3.6: Genes with an unexpected extension at the N-terminus
Figure 3.7: Genes containing long insertions
Figure 3.8: Genes present in paralogs: (A) clpC and (B) atpH
Figure 3.9: Schematic of plastid chromosome ends
Figure 3.10: Amplification within and between plastid chromosome ends
Figure 3.11: Pulse-field gels and Southern hybridization
Figure 3.12: Prediction of the origin of replication
Figure 3.13: Relative plastid gene copy number in C. velia
Figure 3.14: Large repeats in the C. velia plastid genome
Figure 3.15: Intrachromosomal recombination hotspot near the TIR boundary
Figure 4.1: Backbone trees used in the analysis of plastid diversity
Figure 4.2: Phylogeny of novel ARL plastids
Figure 4.3: Phylogenetic testing of the Figure 4.2 tree robustness
Figure 4.4: Geographical distribution of ARL plastids
Figure 4.5: The fine-scale distribution of ARL-V on coral reefs
Figure 5.1: Colponema represents two lineages of deep-branching alveolates
Figure 5.2: Mitochondrial genome map of Colponema sp. Vietnam
Figure 5.3: Venn diagram of mitochondrial genome contents
Figure 5.4: Comparison of evolutionary rates in alveolate mitochondrial proteins
Figure 5.5: Distribution of selected characteristics among alveolates
Figure 5.6: Environmental diversity of Colponema-related lineages (CRLs)
Figure 6.1: Alveolate phylogeny using the nuclear ribosomal RNA operon sequence

Acknowledgements

I thank my supervisor Patrick Keeling for guidance, help and funding. I particularly appreciate that he let me follow my own ideas and interests and make important decisions on my own. I thank the members of my committee, Naomi Fast, Brian Leander, James Berger and Beverley Green for guidance, advice, and occasional discussions (B. Green). I thank Juan Saldarriaga for serving as my TA supervisor, UBC for granting me the Four Year Doctoral Scholarship, and Botany staff, faculty and students for general help.
I thank all my collaborators, and particularly Aleš Horák, De-Hua Lai, Roman Sobotka, Denis Tikhonenkov, and Kirill Mikhailov for their excellent and significant contributions to the publications included in this thesis. I thank all lab mates and friends for creating a great atmosphere at work, teaching me new methods and skills, inspiration, and a good chat. I thank Evelyn and my family for always being there for me.

Chapter 1: Introduction

The research presented here focuses on the evolution and significance of free-living relatives of apicomplexan parasites. Apicomplexans are microbial pathogens responsible for a number of medically and economically important diseases in humans and other animals. Their closest free-living relatives are little studied but provide an exciting opportunity to illuminate the apicomplexan origin and evolution. In this introductory section I briefly summarize the significance and selected characteristics of apicomplexans and their relatives, and identify several research questions focused on better understanding of early apicomplexan evolution.

The significance of apicomplexan parasites

Evidence suggests that some apicomplexan parasites affected early civilizations (Lalremruata et al., 2013), but it is likely that they have accompanied humans since their divergence from other primates (Prugnolle et al., 2011). Plasmodium, the causative agent of malaria, has been the leading cause of human death by eukaryotic parasites, with current estimates of 660,000 deaths per year (http://www.who.int/features/factfiles/malaria/en/). Plasmodium is common in other primates, and can sometimes be transmitted between them by a common vector, the mosquito. Two species, P. falciparum and P. vivax, cause the most common and severe human malarias, with three additional species causing milder infections.
Upon injection of the invasive stage (the sporozoites) into the bloodstream, Plasmodium evades human immunity by hiding inside hepatocytes and blood cells and is released in periodic cycles linked to malaria fevers. Plasmodium gametocytes enter the mosquito gut upon feeding and later fuse to form motile ookinetes that migrate into mosquito salivary glands and release the next generation of sporozoites. The mosquito is the definitive host of Plasmodium, essential for transmission and completion of the sexual part of the parasite's life cycle. Several other genera that infect erythrocytes of vertebrates are related to Plasmodium and altogether these are called haemosporidians. The sister group of haemosporidians are the piroplasmids, which also invade erythrocytes. Piroplasmids are transmitted by ticks and include parasites of domestic animals and humans (Hunfeld et al., 2008). Coccidians are another major lineage of apicomplexans that infect many vertebrates, e.g., poultry (Eimeria spp.) and humans (Toxoplasma gondii). Toxoplasmosis is a food-borne disease that is widespread in humans. About one third of the world population is estimated to have experienced toxoplasmosis or may carry it chronically, with acute infections posing risks to pregnant women and immunocompromised patients (Pappas et al., 2009). Toxoplasma is also an important laboratory model for studying apicomplexan biology. Several other less-studied apicomplexan genera exist, some of which have been classified as coccidians, that represent distinct lineages of apicomplexans based on molecular evidence (adeleids, Rhytidocystis, Nephromyces), or morphological differences (Gemmocystis, Ixorheis). Cryptosporidium is a ubiquitous apicomplexan pathogen of humans and other mammals that is well known to infect HIV-positive patients. The last major group of apicomplexans are gregarines, which are parasites of invertebrates and include a great variety of marine and terrestrial species.
Gregarines are widely regarded as the basal lineage of apicomplexans, but this position and their monophyly are uncertain due to limited molecular evidence, fast-evolving ribosomal RNA genes, and character convergence issues. Novel clades of marine gregarines are being discovered on a surprisingly regular basis (Rueckert and Leander, 2010; Wakeman and Leander, 2013). The attempts to uncover gregarine diversity well illustrate our limited understanding of apicomplexans as a whole: molecular data have invalidated traditional classification schemes and revealed the existence of previously unrecognized groups. Combined with the astounding diversity of apicomplexans at the species level, it is possible that tens of thousands of additional species still await description (Adl et al., 2007). Because all known apicomplexans are obligate parasites of metazoans, research on apicomplexan biology and evolution provides the basis for understanding their impact on animal life on this planet.

Defining features of apicomplexans

The apparent 'success' of apicomplexans, measured by their diversity and significance to humans and other animals, is well rooted in their morphology. The key defining characteristic of the group is the apical complex: a set of subcellular structures positioned at the apical end of the cell, usually associated with the dispersive/invasive stages (Figure 1.1). The apical complex serves several purposes connected to excretion, cell invasion and maintenance of the parasitophorous vacuole - a protective vesicle formed around the parasite during cell invasion. The central part of the apical complex is the conoid, a closed cone of microtubules forming a spiral, which is polymerized in a unique fashion.
The conoid microtubules are not closed in cross-section (Figure 1.2), unlike typical microtubule singlets made up of a circle of 13 protofilaments or microtubule doublets and triplets, where incomplete circles of protofilaments are consecutively attached to the first microtubule and each other in a “piggyback” fashion. Instead, they form a side-opened sigma-shaped structure comprising 9 protofilaments (Figure 1.2; Hu et al., 2002). This unique tubulin polymer may indeed be one of the strongest defining synapomorphies of apicomplexans if common across the group, as is suggested by the conserved ultrastructure of apicomplexan conoids. However, with data derived largely from a single species, Toxoplasma gondii, more evidence is needed to test this, particularly from the early-branching gregarines. In T. gondii, the conoid is further accompanied by two polar rings, two centrally-positioned microtubule singlets, and a protein component. Purification of the apical complex-enriched cellular fraction and immuno-localization studies have enabled identification of many conoid-associated proteins (Hu et al., 2006; Skariah et al., 2012). In haemosporidians, the conoid is present in the ookinete stage only (i.e., when crossing the mosquito gut wall in Plasmodium), and is reduced to polar rings in blood stages; the latter situation also applies to all piroplasmids, where the conoid has been lost altogether. Excretory organelles (the rhoptries, micronemes and dense granules) are frequently, though not exclusively, associated with the conoid (Figure 1.1). Many of their secretory functions are essential during cell invasion. The infection process also provides a rationale for their differentiation: each of the three organelles contains a distinct content and these contents are discharged sequentially during host cell entry. Micronemes are cigar-shaped and secrete their content first.
Many microneme proteins are adhesive and mediate host cell binding during gliding and a firm fixation during early invasion (Lal et al., 2009). Rhoptries are club-shaped and secrete proteins, lipids and membrane whorls during the invasion process. These are essential in forming the parasitophorous vacuole - a protective layer that originates by invagination of the host cell membrane during parasite entry (Kats et al., 2006). Dense granules are secreted last and contain components inserted into the parasitophorous vacuole, which mediate trafficking of metabolites and proteins with the cytosol and plasma membrane of the host (Mercier et al., 2005). The micronemes, rhoptries and dense granules originate by budding from the Golgi de novo during each invasion cycle of the parasite, and their protein content and the process of their maturation have become increasingly well understood (Bradley et al., 2005; Lal et al., 2009; Mercier et al., 2005; Sam-Yellowe et al., 1998). Apicomplexan movement, pellicle and division have many characteristic features. Flagellar movement is uncommon and is limited to the microgamete stage in most species. Instead, many apicomplexans move by gliding, which evolved independently from other eukaryotes and possibly multiple times within the group; the latter is uncertain due to the complex distribution of movement strategies in gregarines (Leander, 2008a; Wakeman and Leander, 2012). Gliding on the host cell surface usually precedes infection by both Toxoplasma and Plasmodium. Gliding is mediated by a unique class of myosin motors located between the plasma membrane and the inner membrane complex (IMC) (Heintzelman and Schwartzman, 1997; Meissner, 2002). The IMC itself is indispensable for gliding, because it provides an anchor for movement (Bosch et al., 2012). Ultrastructurally, the IMC is a flat membranous sac (Figure 1.1). The IMC is located under the plasma membrane, together making up a typical 'three-layer' apicomplexan pellicle.
The IMC is homologous to alveoli in apicomplexan relatives such as dinoflagellates and ciliates (below) and may contain one or more openings called micropores, which are points of active endocytosis. Apicomplexan division occurs through palintomy (termed merogony, schizogony, sporogony or gametogony in different life stages) - a simultaneous division of one polyploid cell into many. In Plasmodium, the synchronized division and release of the parasite into the bloodstream lead to the characteristic fevers diagnostic of malaria. The well-known binary division in Toxoplasma (endodyogeny; Figure 1.1) is superficially different from division in Plasmodium, but relies on similar principles: two complete daughter cells are assembled inside the mother cell prior to membrane division (Nishi et al., 2008; Shaw et al., 2000). This suggests that endodyogeny is a modified palintomy. Typical palintomy can indeed be found in less-studied stages of the Toxoplasma life cycle (Dzierszinski et al., 2004). Altogether, the evidence indicates that palintomy (also found in other apicomplexans including gregarines) plays a prominent role in apicomplexan biology.

Apicomplexan mitochondria contain tubular cristae and a small linear genome; the only exception to this is Cryptosporidium, whose anaerobic mitochondrion lacks both features (Abrahamsen et al., 2004). The apicomplexan mitochondrial genome is highly reduced compared to other eukaryotes and is either linear concatemeric or linear monomeric with short terminal inverted repeats (Feagin, 1994; Hikosaka et al., 2010, 2011). Three protein-coding genes and fragments of small and large ribosomal RNA are conserved among all species. Transcription is polycistronic, and oligoA tailing of mature transcripts is present in Plasmodium mitochondria (Rehkopf et al., 2000).

The remarkable discovery of a cryptic plastid in the mid-1990s was a critical step in understanding apicomplexan biology (Gardner et al., 1991; McFadden et al., 1996; Wilson et al., 1996).
Several metabolic pathways localize to the plastid (heme, fatty acid and isoprenoid biosynthesis), and these have been proposed as potential targets for drugs to treat apicomplexan diseases (Fichera and Roos, 1997; Ralph et al., 2004). Isoprenoid biosynthesis is the key essential pathway in the malaria blood stage (Yeh and DeRisi, 2011). The malaria plastid genome was completely sequenced in 1996 and localized to a multimembrane organelle the same year (McFadden et al., 1996; Wilson et al., 1996). Conflicting evidence about the number of plastid membranes was subsequently settled at four (Figure 1.1; Tomova et al., 2006), which confirms the secondary origin of apicomplexan plastids (primary plastids, e.g., in land plants, have two membranes). However, it remained unclear where the apicomplexan plastid came from and when in the evolutionary history of apicomplexans it was acquired (see Evolutionary questions below).

The better known apicomplexan relatives - dinoflagellates and ciliates

For decades, the divergent characteristics of apicomplexans made it impossible to pinpoint their phylogenetic placement in the context of eukaryotic diversity. Molecular studies using the 18S ribosomal RNA gene provided the first solid evidence for their relationship to other protists, specifically to dinoflagellates and ciliates (Figure 1.3; Gajadhar et al., 1991; Wolters, 1991). This reinforced the suspected homology between 'multilayered' pellicles in all three lineages, an appearance resulting from the presence of one or more membranous sacs under their plasma membranes - the alveoli. The monophyly of alveolates (apicomplexans, dinoflagellates and ciliates) was strongly supported by multiprotein phylogenies and the shared presence of alveoli-specific proteins (Fast et al., 2002; Gould et al., 2008). It was also shown that alveolates are more broadly related to stramenopiles (heterokonts) and rhizarians (Burki et al., 2007; Hackett et al., 2007).
Alveolates became one of the largest groups of unicellular eukaryotes, uniting thousands of described species and numerous organisms of broad general significance.

Among alveolates, apicomplexans are specifically related to dinoflagellates (Figure 1.3), and together they form a clade commonly referred to as myzozoans (Cavalier-Smith and Chao, 2004). Dinoflagellates are defined by a characteristic flagellar organization in one of their life stages: the anterior (transverse) flagellum is undulating and associated with a horizontal groove or depression dividing the cell into two parts (the epicone and hypocone). The core dinoflagellates, the so-called dinokaryotes (Figure 1.3), are also characterized by permanently condensed chromosomes. Dinokaryotes include all textbook dinoflagellates (the two terms are sometimes used as synonyms) and a number of significant representatives: the coral endosymbiont Symbiodinium, the bioluminescent Noctiluca and Pyrocystis, and toxic species associated with harmful algal blooms and shellfish poisoning, e.g., Gambierdiscus, Pfiesteria, and Alexandrium. About half of dinoflagellate diversity consists of photosynthetic primary producers, although these usually feed mixotrophically (Schnepf and Elbrachter, 1999). They can be both pelagic and benthic, and some are covered in a protective theca made of cellulose. Altogether, dinoflagellates include an astounding diversity of life forms and strategies, supported by extensive sequence-based evidence for their environmental presence and a rich fossil record.

The molecular biology of dinoflagellates is one of the most interesting in all eukaryotes. Their DNA is not organized on histones, but on a unique type of DNA-binding protein (Gornik et al., 2012). Many of their genes are transcribed polycistronically and all appear to be trans-spliced to a distinct RNA species, leading to attachment of a conserved 22-nucleotide 'spliced leader' to their 5' termini (Zhang et al., 2007).
The significance of spliced leader trans-splicing is not well understood, although it could relate to producing monocistronic transcripts or regulation of gene expression (Lukes et al., 2009). The genomes of dinoflagellates are some of the largest among all eukaryotes (LaJeunesse et al., 2005). The reason for this expansion is unknown, although the available evidence suggests it is probably not linked to an expansion in gene number (McEwan et al., 2008).

The organellar genomes in dinoflagellates are extremely unusual. Their mitochondrial genomes typically consist of numerous short linear fragments carrying cox1, cox3 and cob genes (cox3 is split and trans-spliced), pseudogenes, and fragments of ribosomal RNAs (Kamikawa et al., 2007; Nash et al., 2007). Start and stop codons are usually missing and substitutional editing has been discovered in several species. The identity of the plastid genome was long enigmatic. Dinoflagellate plastid genes long resisted PCR-based amplification attempts due to their substantial divergence from homologs in other plastids (Takishita and Uchida, 1999). The plastid Rubisco was discovered in 1995 (Whitney et al., 1995); however, this protein is unusual (form II) and is not plastid-encoded, as Rubisco is in other photosynthetic eukaryotes. Altogether, these features limited closer insights into the dinoflagellate plastid genomic structure using traditional approaches, such as Southern hybridization and restriction mapping. This obstacle was eventually overcome by successful hybridization of a highly conserved plastid gene from spinach (psbA) to the satellite band of cesium chloride-fractionated genomic DNA in Heterocapsa (Zhang et al., 1999). Cloning this AT-rich satellite DNA and evidence from restriction experiments provided support for the existence of small circular plastid chromosomes termed the minicircles.
Minicircles encode a small subset of the genes known from canonical plastid genomes, and this subset is substantially conserved across several dinokaryote species (Barbrook and Howe, 2000; Barbrook et al., 2001; Hiller, 2001; Zhang et al., 1999, 2002). The set comprises a total of 12 protein-coding genes, 2 rRNA and 3 tRNA genes (Barbrook et al., 2006), and although additional genes could have remained undetected, some evidence already indicates saturation in sampling (Nelson et al., 2007; Wang and Morse, 2006) (evidence for two additional plastid genes on nuclear-encoded minicircles in Ceratium is somewhat contentious; Laatsch et al., 2004). Some minicircles are empty or contain gene fragments, pointing to active recombination processes (Hiller, 2001; Nisbet et al., 2004; Zhang et al., 2001). A species-conserved non-coding region of minicircles is often associated with direct or inverted repeats and has been implicated in DNA replication (Moore, 2003; Zhang et al., 1999, 2002). Evidence for rolling circle-like replication intermediates in minicircles has been provided (Leung and Wong, 2009). Minicircle transcription can be polycistronic with rolling circle progression, and transcript processing includes substitutional editing and polyU-tailing (Dang and Green, 2009, 2010; Wang and Morse, 2006; Zauner et al., 2004). Altogether, many of these characteristics are rare or unique in plastids, raising a number of questions about their origin and evolution.

Early-branching dinoflagellates outside dinokaryotes (Figure 1.3) are comparatively less known. Three lineages as defined by 18S rRNA phylogenies include parasites of invertebrates and protists: amoebophryids, duboscquellids, and the Syndinium + Hematodinium clade. Little molecular data is available from these species, but their extensive environmental sequence record indicates they are widespread in the environment (Guillou et al., 2008).
Ellobiopsids are another group of parasites affiliated with dinoflagellates whose fast-evolving 18S rRNA sequences and lack of ultrastructural information obscure their evolutionary placement (Gómez et al., 2009; Silberman et al., 2004). Two more genera of free-living protists are commonly referred to as early-branching dinoflagellates: Oxyrrhis and Psammosa. Few morphological features link them to other dinoflagellates (i.e., the undulating - though not transverse - flagellum in Oxyrrhis), and molecular phylogenies support their basal position, although the branching pattern around their divergence has not been resolved (Okamoto et al., 2012).

The closest sister to dinoflagellates as a whole are the perkinsids, including the familiar oyster pathogen Perkinsus and related species. The uniting characteristic of perkinsids and dinoflagellates is the use of spliced leader trans-splicing, and molecular phylogenies support their monophyly (Saldarriaga, 2003; Slamovits et al., 2007; Zhang et al., 2007). The mitochondrial genomes in perkinsids and basal dinoflagellates encode the same three protein-coding genes as dinoflagellates and apicomplexans, suggesting this reduced state was ancestral to all myzozoans, as was ribosomal RNA fractionation and loss of canonical start codons (Jackson et al., 2011; Slamovits et al., 2007). Unusual characteristics, such as translational slippage (Masuda et al., 2010) and gene fusion at the level of a processed transcript (Slamovits et al., 2007), were noted in some species. None of these lineages is photosynthetic, but solid evidence now suggests that Perkinsus marinus contains plastid-localized isoprenoid biosynthesis (Grauvogel et al., 2007; Matsuzaki et al., 2008) and possibly other plastid functions (Stelter et al., 2007). It is plausible that basal dinoflagellates have also retained cryptic plastids (Slamovits and Keeling, 2008a).
These findings are important in interpreting early myzozoan evolution and are broadly consistent with their photosynthetic ancestry (Cavalier-Smith, 1999). However, reduced function and absence of genomes in these plastids give limited evidence about their identity: none of the putative plastid-localized proteins identified in Perkinsus or Oxyrrhis provides unambiguous support for their relationship to either the apicomplexan or dinoflagellate plastid.

Ciliates are the most distant alveolate relatives of apicomplexans (Figure 1.3) and will be discussed here only briefly. Although familiarly known for their cilia (resulting from multiplication of the flagellar kinetid), the group is more specifically defined by nuclear dualism - the simultaneous presence of two heterogeneous nuclei. The group is home to an astounding number of life forms and strategies and includes many ecologically ubiquitous heterotrophs and predators, parasites of fish (Ichthyophthirius), models in symbiosis research (Paramecium bursaria, Myrionecta rubra), and the eukaryotic model in studying cellular and biochemical functions, Tetrahymena (Collins and Gorovsky, 2005). At the molecular level, the major novelties include DNA processing between the transcriptionally silent micronucleus and the vegetative macronucleus (Prescott, 2000), evolution of gene scrambling (Chang et al., 2005), and multiple independent acquisitions of non-canonical genetic codes (Lekomtsev, 2007; Sánchez-Silva et al., 2003). It is widely accepted that ciliates lack plastids (Eisen et al., 2006) and it is not known whether their ancestors contained them (contentious evidence for the presence of several alga-derived genes has been published; Reyes-Prieto et al., 2008). The mitochondrial genomes in ciliates are linear and many contain telomeres carrying 20-50 bp repeats (Burger et al., 2000; Goldbach et al., 1977).
The mitochondrial genome content is somewhat unusual: several genes have been split (Burger et al., 2000; Swart et al., 2011) and are fast-evolving in sequence, and the cox3 subunit of cytochrome oxidase is absent altogether from the genome and mitochondrial proteome (Smith et al., 2007).

The poorly known apicomplexan relatives - colpodellids and Colponema

Several additional organisms have been associated with apicomplexans. For some of them, no molecular data is available and their morphology cannot unquestionably resolve their evolutionary placement: Rastrimonas (Brugerolle, 2002a, 2003) is most likely a perkinsid, and Acrocoelus (Fernández et al., 1999) is perhaps derived from within the apicomplexans, although its apical end is unusual and lacks a conoid, and its life cycle is unknown. The best candidates for apicomplexan sisters are the colpodellids (Figure 1.3). Colpodellids are a broad and most likely paraphyletic assemblage uniting free-living predatory alveolates that feed by myzocytosis on small flagellates, e.g. bodonids, algae, Spumella and ciliates (Mylnikov, 2009). Electron microscopy of their apical 'sucking' structure revealed several features reminiscent of the apical complex: a side-opened tubulin conoid, and associated organelles reminiscent of rhoptries and micronemes (Brugerolle, 2002b; Brugerolle and Mignot, 1979; Mylnikov and Mylnikova, 2008; Simpson and Patterson, 1996). Like other alveolates, colpodellids have submembrane alveoli (flat or inflated), micropores, and tubular mitochondrial cristae. Their division is characterized by simultaneous division of a cyst into four daughter cells (Mylnikov, 2009). Culturing colpodellids is somewhat difficult, because they require the presence of a prey organism, which itself grazes on bacteria or other protists. Maintaining this system can be tedious over long periods of time, although some species are known to produce resting cysts.
The systematics of colpodellids are extremely complicated due to a number of taxonomic issues. Most recently, Cavalier-Smith and Chao (2004) divided the genus Colpodella sensu Simpson and Patterson (1996) into several genera: Colpodella, Alphamonas, Voromonas and others. Molecular data is limited to 18S rRNA gene sequences from these three genera, and supports their separation (Cavalier-Smith and Chao, 2004). The monophyly of colpodellids in phylogenies is neither supported nor rejected, and although they branch sister to apicomplexans, this branching is never significantly supported to the exclusion of other taxa (Cavalier-Smith and Chao, 2004; Kuvardina et al., 2002; Leander et al., 2003).

Our knowledge of Colponema is even poorer than that of colpodellids. No molecular data have been available and ultrastructural information is limited to three studies, only one of which has been translated to English (Myl’nikova and Myl’nikov, 2010). Most Colponema species contain typical alveolate features: alveoli, a micropore and tubular cristae. They do not feed by myzocytosis, but capture whole prey cells using the longitudinal groove and digest them in a posterior phagocytotic vacuole. The genus is largely defined by characteristics plesiomorphic to all alveolates, which cannot ascertain the monophyly of all its members. However, the lack of myzocytosis suggests that Colponema species diverged prior to the origin of myzozoans, and this deep-branching position is key in understanding character evolution in other alveolates at both the ultrastructural and molecular levels (Leander, 2003).

The newly discovered apicomplexan relative - Chromera velia

Chromera velia is a recently described alga that was originally isolated from the stony coral Plesiastrea (Figure 1.4; Moore et al., 2008). C. velia can be grown photoautotrophically in a simple seawater medium independently of its host, suggesting it occurs freely in nature.
It contains features typical of other alveolates: alveoli, tubular mitochondrial cristae and the micropore, and oscillates between coccoid vegetative and colpodellid-like flagellate stages (Moore et al., 2008; Oborník et al., 2011). C. velia possesses a photosynthetic plastid, like some dinoflagellates, but lacks all typical dinoflagellate features, suggesting that its morphology is unique among alveolates (Figure 1.4). Single gene phylogenies based on 18S rRNA and 28S rRNA genes congruently place C. velia as a sister lineage to apicomplexans with medium support, along with colpodellids (18S rRNA) (Figure 1.3; Moore et al., 2008). Phylogenies using the plastid 16S rRNA revealed a relationship between C. velia and apicomplexan plastids with strong support (dinoflagellates were long-branching and excluded, however). Plastid psbA phylogenies showed a relationship between C. velia and dinoflagellate plastids with weak support (apicomplexans were absent) (Moore et al., 2008). Altogether, the molecular data reinforced the morphological evidence, supported the relationship of C. velia, apicomplexans and dinoflagellates (with stronger affiliation to apicomplexans) and suggested that their plastids descended from a common photosynthetic ancestor. Data from plastid tufA and plastid-targeted GAPDH protein sequences supported this conclusion (Oborník et al., 2009). However, because evidence has been limited to single genes, support values are incomplete and branch lengths in plastid phylogenies are highly variable, a more comprehensive data set from both nuclear and plastid genomes is required to test these conclusions.

Evolutionary questions

One of the possible reasons why alveolates (and free-living heterotrophs in particular) have received more attention than similar protists in other groups may be their intriguing looks and ultrastructure, both linked to a complex character evolution (Leander, 2003).
On its own, ultrastructure cannot resolve deep alveolate phylogeny and historically, these attempts led to theories now considered obsolete (such as the origin of ciliates from within the dinoflagellates, discussed in Lee and Kugrens, 1992). This is partly due to difficulties with interpreting character loss (e.g. pseudoconoids are present in colpodellids and perkinsids to the exclusion of dinoflagellates although the latter two are related), gain of distinctive features (dinoflagellates putatively placed as an early offshoot in the eukaryotic tree due to their unusual nuclear organization; Taylor, 1976), and convergent morphologies among related alveolates and in other protists (Leander, 2008b). Examples of the latter include submembrane alveoli-like vesicles in glaucophytes, raphidophytes and other stramenopiles, Telonema and, perhaps, katablepharids and other protists; analogs of condensed chromosomes and extrusomes in euglenids (Lukes et al., 2009); and apically positioned tubulin-based sucking structures reminiscent of the apical complex in Katablepharis and suctorian ciliates (Lee et al., 1991). Accordingly, ultrastructural data are better interpreted in the context of molecular evidence (e.g., Wakeman and Leander, 2012), which provides a reliable and straightforward way of resolving alveolate relationships, particularly at a deeper level (Fast et al., 2002).

Despite their limited usefulness in inferring relationships, the richness of alveolate characteristics raises a number of evolutionary questions connected to function and significance. Alveoli provide a good example in this respect: it is commonly assumed that the main function of alveoli is to provide structural support for the cell. Indeed, many alveolates are difficult to break during cell fractionation experiments, or have cells that are comparatively large among protists and would require firm cytoskeletal support.
Nonetheless, the original function of alveoli is not known and other functions are plausible. For example, it has been shown that alveoli in Paramecium serve as a calcium storage compartment, which suggests association with signaling-related functions, such as ciliary movement, cellular division and extrusome discharge (many alveolates are predators and use different types of extrusomes for catching prey) (Stelly et al., 1991). Identifying the pellicular proteome provides a first important step in understanding alveolar function (Gould et al., 2008, 2011). However, many basic uncertainties remain about how alveoli compare among different species in their inner content, associated proteome, and cellular functions (cellular support, storage, signaling and movement/gliding).

The origin and evolution of the apical complex is closely intertwined with apicomplexan relatives. The side-opened conoid and rhoptry- or microneme-like organelles in colpodellids, perkinsids and, more recently, Chromera and Psammosa have been homologized with apical complex structures (Leander, 2003; Oborník et al., 2011; Okamoto et al., 2012). Although the distribution of the associated features is mosaic, and more direct evidence for homology is still missing, these similarities suggest that the apical complex may have originated from an apparatus for myzocytotic feeding - an intriguing evolutionary hypothesis tightly linked to the origin of apicomplexan parasitism. Indeed, evidence for shared presence of apical complex-associated proteins would provide much more direct evidence for homology, but this analysis is limited by the unavailability of molecular data from many apicomplexan relatives, particularly Chromera and colpodellids. However, understanding apical complex evolution is also, and somewhat surprisingly, limited by our knowledge of apicomplexans.
A great deal of what we know about the apical complex ultrastructure is extrapolated from studies on Toxoplasma, and both ultrastructural and molecular data from early-branching apicomplexans such as gregarines are rare, a situation further complicated by the absence or modification of the apical complex across species and life cycles.

As already mentioned, palintomy plays a central role in apicomplexan division and is often understood as one of the keys to their success: e.g., the simultaneous release of thousands of malaria merozoites in the bloodstream relies on overwhelming the host immunity temporarily. However, both colpodellids and Chromera also use palintomy in division (usually producing 3-4 daughter cells) (Oborník et al., 2011). In colpodellids, a 4-way division in a cyst is the only division type recorded (Mylnikov, 2009). Palintomy is also known in perkinsids and most basal dinoflagellates, although it has not been observed in Oxyrrhis and Psammosa. Indeed, palintomy is present in many other protists, but its prevalence in myzozoans is intriguing and the trait is most likely ancestral to the group. Although providing evidence for homology may be difficult in this case, it is important to consider that the division in apicomplexans - though commonly associated with adaptation to parasitism - is well rooted in the life style of their free-living relatives.

Apicomplexans and dinoflagellates contain the smallest organellar genomes and their reduction, accompanied by acquisition of unusual features at both genomic and transcriptomic levels, has led to a number of questions. The myzozoan mitochondria represent the largest genome reduction among aerobic mitochondria, but details of this process have been unclear. Mitochondrial genomes and proteomes in Chromera and colpodellids are unlikely to resolve this, because they most likely resemble those in apicomplexans and dinoflagellates.
However, information from Colponema, if sister to myzozoans, could be useful in understanding the reduction process and acquisition of many divergent characteristics in both myzozoan and ciliate mitochondria (gene splits, reduction of the membrane complexes, alternative oxidases, nuclear tRNA import).

The story of alveolate plastids is even more complex. The discovery of the apicomplexan plastid led to several hypotheses about its origin. A green algal origin was inferred based on the analysis of plastid TufA, plastid rpo proteins and nine plastid proteins (Cai et al., 2003; Köhler et al., 1997; Lau et al., 2009), and the split in mitochondrial cox2 protein (Funes et al., 2002) - a problematic character with little connection to the plastid, the significance of which was rejected by further analysis (Waller and Keeling, 2006; Waller et al., 2003). The alternative red algal origin was suggested by the presence of the red lineage-specific ribosomal protein supercluster, sufB, and gene order (Blanchard and Hicks, 1999; McFadden and Waller, 1997; Williamson et al., 1994), and plastid ribosomal RNA gene phylogenies (Zhang et al., 2000), and involved several different scenarios: the apicoplast could be related to other red algal plastids in dinoflagellates or heterokonts, or be derived independently from red algae (Bodył et al., 2009; Cavalier-Smith, 1999; Williamson et al., 1994). However, neither phylogenies nor gene organization provided unambiguous support for one of these hypotheses, leaving the origin of the apicoplast unresolved. The dinoflagellate plastid origin could not be resolved either, because virtually all of their plastid genes are very fast-evolving, relate to photosynthesis and hence were lost in apicomplexans. This led to a stalemate in which plastids in apicomplexans and dinoflagellates could not be directly compared to each other, and their relationships to other plastids were not well resolved (Shalchian-Tabrizi et al., 2006).
Because many related organisms lack apparent plastids (ciliates, colpodellids, perkinsids, many dinoflagellates), the distribution of this organelle among alveolates became a central problem in understanding plastid endosymbiosis as a whole: how often can plastids be lost and how often can they be transferred between different eukaryotes? The discovery of a putative cryptic plastid in Perkinsus (Matsuzaki et al., 2008) contributed to the already heated debate, but could not resolve this issue on its own. The recent description of Chromera velia (Moore et al., 2008), however, has finally identified a photosynthetic alveolate whose plastid could directly address the problematic apicoplast origin.

Research objectives

Among the evolutionary questions listed, the origin of the apicomplexan plastid has been a long-standing issue of broad significance. Predatory relatives of apicomplexans, such as colpodellids, may harbor a cryptic plastid related to the apicoplast; however, its presence is a priori unclear and these species are not readily available in culture. The discovery of Chromera solves both of these issues at once, because it offers an easy-to-culture species with a functional, photosynthetic plastid. The research interest behind this work is therefore focused on the evolutionary history of plastids in Chromera and apicomplexans and the broader significance of Chromera and other free-living organisms in understanding the apicomplexan origin, with the following primary objectives:

1. Resolve the phylogenetic position of C. velia and sequence its plastid genome in order to assess the origin and relationships of the remnant plastid in apicomplexan parasites.
2. Examine the presence of unique features in the C. velia plastid genome (stems from work in research objective 1).
3. Address the diversity and distribution of C. velia in the environment, and probe for the existence of other lineages of plastid-bearing relatives of apicomplexans using environmental sequence data.
4. Resolve the phylogenetic position of Colponema, a free-living alveolate of uncertain position, and survey its organellar genomes in order to better understand the evolution of alveolate organelles.

Figure 1.1: Basic characteristics of apicomplexans (Toxoplasma gondii). (A) Schematic of the longitudinally sectioned tachyzoite stage in T. gondii showing the conoid (C), rhoptries (R), micronemes (Mi), dense granules (D), inner membrane complex (I), micropore (Mp), nucleus (N), Golgi (G), endoplasmic reticulum (E), mitochondrion (Mt-) and plastid (P). (B) Early binary division (endodyogeny) in T. gondii. Note that subcellular compartments are progressively divided between two daughter cells (middle). (C) Late endodyogeny leading to two independent cells as in (A). Modified from Nishi et al. (2008), with permission.

Figure 1.2: Protofilament structure of apicomplexan microtubules in cross section. (A) Single microtubule with 13 protofilaments. (B) Microtubule doublet. (C) Sigma-shaped open-sided microtubule - a unique organization specific to the apicomplexan conoid.

Figure 1.3: Relatives of apicomplexans. Relationships between apicomplexan parasites (red) and their relatives are shown based on a consensus of molecular phylogenies discussed in the text. Branches uniting all alveolates (A) and myzozoans (M) are indicated. Question mark indicates that support for the clade uniting apicomplexans, colpodellids and Chromera is uncertain. Circles at branch tips indicate presence of plastids, green fill indicates photosynthesis.

Figure 1.4: Chromera velia. Light microscopy (DIC) of the C. velia CCMP2878 culture.
Chapter 2: A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids

Introduction

The primary obstacle in determining the evolutionary origin of the cryptic plastid in apicomplexan parasites, the apicoplast (McFadden et al., 1996), has been its divergent nature. The apicoplast genome is both highly derived (it is compact, with fast-evolving genes and a very high AT-bias), and reduced (it has lost all genes related to photosynthesis), making clear comparisons with other plastids difficult (Wilson et al., 1996). Moreover, the closest algal relatives to the apicomplexans are dinoflagellates, and dinoflagellate plastids are equally derived but in different ways. Characterized dinoflagellate plastid genomes encode only 12–14 genes, which are extremely fast-evolving and are localized on minicircles with one or a few genes (Zhang et al., 1999). Most importantly, however, nearly all of the genes retained in the dinoflagellate plastid encode photosystem proteins, so the two genomes are virtually incomparable in their gene content. For these reasons, hypotheses for the apicoplast origin rest on analyses of its divergent genes in the absence of dinoflagellate homologs (Cai et al., 2003; Köhler et al., 1997; Lau et al., 2009), plastid-derived genes encoded in the nucleus (Fast et al., 2001), or genes with no connection to the plastid whatsoever (Funes et al., 2002; Waller et al., 2003). Not surprisingly, these data have led to completely inconsistent conclusions either for a green or red algal origin of the apicoplast. The hypothesis that the apicoplast is derived from a red alga is also tied to the broader “chromalveolate” hypothesis (Cavalier-Smith, 1999), which posits that the endosymbiosis that gave rise to the apicoplast also gave rise to plastids in dinoflagellates, heterokonts, and hacrobians (cryptomonads and haptophytes).
Although this notion minimizes the number of endosymbiotic events required to explain plastid diversity, it also leads to complexity in other ways because each of the chromalveolate lineages contains early-branching members or sister groups where no plastid is known. Minimizing endosymbiotic events therefore increases the number of times photosynthesis or plastids must have been lost. Alternatively, each of these lineages could have obtained its plastid from an independent red algal endosymbiosis (Falkowski et al., 2004) or from another eukaryote already containing a red algal plastid through serial tertiary endosymbioses (Bodył et al., 2009; Sanchez-Puerta and Delwiche, 2008). The apicomplexans and dinoflagellates illustrate this discrepancy well, because recognizable plastids appear to be absent in basal subgroups of both lineages, and the presence of photosynthesis in their common ancestor would require between five and nine independent losses of photosynthesis (and in some cases plastids) just among the early-branching lineages, and probably another dozen losses within dinoflagellates as a whole. Distinguishing between an early and a late origin of these plastids has proven extraordinarily difficult and has led to a passionate debate over the likelihood of plastid gain versus loss and over how to interpret genomic data from various nonphotosynthetic groups related to red plastid-containing lineages, like ciliates, oomycetes, and rhizarians (Bodył et al., 2009; Lane and Archibald, 2008). Comparisons between complete plastid genomes would constitute the most direct way to test these hypotheses, but this approach has historically excluded both apicomplexans and dinoflagellates because their genomes are so reduced and divergent that their relationship to one another and to other plastid lineages remains obscure.

A potential breakthrough in this stalemate was the recent description of a photosynthetic relative of the apicomplexans, Chromera velia (Moore et al., 2008).
Although related to apicomplexans, C. velia is photosynthetic, so if the C. velia plastid genome retains characteristics ancestral to both apicomplexan and dinoflagellate plastids, it has the potential to settle many debates conclusively. Unfortunately, to date only three of its genes have been characterized (Moore et al., 2008; Oborník et al., 2009), and nothing is known of its gene content, organization, or structure. Here, we describe the complete plastid genome sequences and other plastid-associated data from C. velia, and also from a second independent lineage of photosynthetic alveolate, represented by the undescribed species CCMP3155. These data provide several lines of evidence (e.g., shared gene content, genome structure, processing pathways, lateral gene transfers, as well as gene phylogenies) that the common ancestor of apicomplexans and dinoflagellates contained a plastid and that the extant plastids in these lineages descended from that ancestral organelle. Phylogenetic reconstruction also provides direct evidence supporting the common origin of this plastid with those of heterokont algae.

Results and discussion

Plastid genomes from two photosynthetic relatives of Apicomplexa

C. velia has been shown previously to be a photosynthetic alga that is found associated with corals and is related to apicomplexans (Moore et al., 2008). CCMP3155 is another photosynthetic alveolate originally isolated from bodies of reef corals, and we have investigated it as a possible second such lineage. CCMP3155 oscillates between a coccoid stage and a flagellate stage closely reminiscent of colpodellids. The coccoid cell contains a single plastid surrounded by four membranes, like the plastid of C. velia and apicomplexans, and has thylakoids stacked in triplets, similar to C. velia and dinoflagellates (Moore et al., 2008; Oborník et al., 2011). The relationship of CCMP3155 to other alveolates is unknown and the position of C.
velia has been inferred only from single gene analyses (Moore et al., 2008; Oborník et al., 2009), so we first sought to clarify the phylogeny. Their position relative to apicomplexans and dinoflagellates is critical to any interpretation of plastid characters, especially if either proved to be specifically related to photosynthetic dinoflagellates rather than apicomplexans. Total genomic DNA from both organisms was sequenced by 454-pyrosequencing, and selected nuclear genes were extended by PCR, RT-PCR, and 3′RACE into a concatenated dataset of eight nuclear genes consisting of 7,137 characters. All phylogenetic analyses consistently showed with strong support that C. velia and CCMP3155 are closely related to apicomplexans and, strikingly, form two distinct lineages, with CCMP3155 more closely related to apicomplexans (Figure 2.1). Analyses of plastid genes (see Plastid Phylogeny below) support their separation, but the positions of C. velia and CCMP3155 are interchanged, so all evidence indicates that they represent two distinct photosynthetic lineages that are closely related to apicomplexans.

The plastid genomes of both organisms were assembled into single contigs from 454 sequence data. The CCMP3155 plastid DNA maps as a circle (Figure 2.2), whereas in C. velia a single gap between the two copies of psbA could not be filled. The sequenced genome size (119.8 kb) nevertheless corresponds closely to the size estimated from pulsed-field gel electrophoresis and Southern hybridization (121.2 kb) (Figure 2.3), suggesting a small gap with few or no genes. It is also possible that the majority of the C. velia plastid genome exists as a linear molecule. The C. velia plastid genome is larger than that of CCMP3155 (121.2 kb vs. 85.5 kb), with a lower gene density and stronger strand polarity (Figure 2.2). The C. velia plastid uses a noncanonical genetic code (UGA encodes tryptophan) (Moore et al., 2008), whereas CCMP3155 uses the universal genetic code.
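The practical consequence of the UGA reassignment can be illustrated with a minimal Python sketch. It is not part of the analyses reported here. The table named CHROMERA_PLASTID_CODE is a hypothetical construct that assumes the C. velia plastid code differs from the standard code only at UGA, and the test sequence is made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Assumption: the C. velia plastid code equals the standard code except that
# UGA (TGA in DNA) encodes tryptophan instead of signalling a stop.
CHROMERA_PLASTID_CODE = dict(STANDARD_CODE, TGA="W")


def translate(seq, code):
    """Translate a DNA or RNA sequence in frame 1; '*' marks a stop codon."""
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    orf = "ATGTGGTGATAA"  # made-up reading frame with an internal TGA
    print(translate(orf, STANDARD_CODE))          # internal TGA read as stop
    print(translate(orf, CHROMERA_PLASTID_CODE))  # internal TGA read as Trp
```

Under the standard code the internal TGA would truncate the predicted protein, which is why gene prediction in the C. velia plastid has to use the reassigned table.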
At 47.74% GC, the CCMP3155 plastid is one of the least AT-biased plastid genomes known, in contrast to the extremely AT-rich apicoplast (86.86% AT in Plasmodium falciparum). The ribosomal RNA operon of CCMP3155 is also of interest as it is interrupted by a gene for phosphonopyruvate decarboxylase, which appears to be a rare case of lateral gene transfer to the plastid.

Comparing gene content among alveolate plastids (Figure 2.4, center rings) reveals the nearly mutually exclusive gene sets of apicomplexans and dinoflagellates (they only share rRNAs and a handful of tRNAs). Plastid genomes of C. velia and CCMP3155 have a relatively modest gene complement, but nevertheless they collectively contain all genes found in either apicomplexans or dinoflagellates, plus numerous other genes. This is consistent with alveolate plastid genomes originating by reduction from a common ancestor. Similarly, the complete set of alveolate plastid genes is also retained in heterokont, hacrobian, and red algal plastids (Figure 2.4, outer rings). Significantly, 18 genes found in the alveolate collective gene set are never found in green algal or plant plastids (Figure 2.4, shaded boxes), but all are present in plastids of heterokonts, hacrobians, and red algae, consistent with the red algal origin of all alveolate plastids.

Another characteristic uniting red algal plastids and their descendants is the ribosomal superoperon, which originated by the fusion of the str and S10+spc+alpha operon clusters (Blanchard and Hicks, 1999; McFadden and Waller, 1997). The ribosomal superoperon is also present in apicomplexans and CCMP3155 (with several internal rearrangements and gene losses) (Figure 2.5), consistent with a red algal origin of these plastids (in C. velia the superoperon has broken up, but the region surrounding the original fusion has been retained in rearranged form on one fragment).
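The gene-content comparison summarized in Figure 2.4 amounts to simple set operations. The following Python sketch illustrates the logic using the protein-coding gene lists of data sets b, c and f from Materials and methods as stand-ins. These lists are an assumption of the example: they cover only the genes used for phylogeny and are not the complete inventories shown in the figure. The function and variable names are hypothetical.

```python
# Gene lists copied from data sets b, c and f in Materials and methods.
# They serve as proxies only and omit RNA genes and lineage-specific ORFs.
DINOFLAGELLATE_SET = frozenset(
    "atpA atpB petB petD psaA psaB psbA psbB psbC psbD psbE".split()
)
APICOPLAST_SET = frozenset(
    "clpC rpl2 rpl4 rpl6 rpl11 rpl14 rpl16 rpl23 rpoB rpoC1 rpoC2 rps2 rps3 "
    "rps4 rps5 rps7 rps8 rps11 rps12 rps17 rps19 sufB tufA".split()
)
CHROMERID_SET = frozenset(
    "acsF atpA atpB atpH atpI ccs1 ccsA clpC petA petB petD petG petN psaA "
    "psaB psaC psaD psbA psbB psbC psbD psbE psbF psbH psbI psbJ psbK psbN "
    "psbT psbV rpl2 rpl3 rpl4 rpl5 rpl6 rpl11 rpl14 rpl16 rpl19 rpl20 rpl23 "
    "rpl27 rpl31 rpoA rpoB rpoC1 rpoC2 rps2 rps3 rps4 rps5 rps7 rps8 rps11 "
    "rps12 rps13 rps14 rps16 rps17 rps18 rps19 secA secY sufB tatC tufA ycf3 "
    "ycf4".split()
)


def compare_gene_sets(dino, api, chromerid):
    """Return the overlaps that matter for the reduction-from-ancestor argument."""
    return {
        "shared_by_dino_and_api": sorted(dino & api),
        "missing_from_chromerids": sorted((dino | api) - chromerid),
        "chromerid_only": sorted(chromerid - dino - api),
    }


if __name__ == "__main__":
    result = compare_gene_sets(DINOFLAGELLATE_SET, APICOPLAST_SET, CHROMERID_SET)
    for label, genes in result.items():
        print(label, len(genes), " ".join(genes))
```

With these lists the dinoflagellate and apicoplast sets do not overlap, and both are contained in the combined C. velia and CCMP3155 set, which is the pattern expected if each lineage retained a different subset of one ancestral genome.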
The alveolate superoperons are not as well conserved as those of other algae, but they do share several gene losses in common, including three genes (rpl22, rps9, rps10) that are present in all other red algal plastids. These may have been lost independently, but, given their apparent relationship, a common ancestral loss is perhaps more likely. In addition, the unusual transposition of rpl31 within the superoperon is unique to CCMP3155 and C. velia plastids, suggesting it may also be ancestral to apicoplasts, which have lost the gene (Figure 2.5).

A detailed analysis of gene order throughout alveolate plastids revealed little conservation in C. velia but significant co-linearity between CCMP3155 and some apicomplexans. Comparison using a tool for analyzing rearrangements between pairs of genomes (GRIMM; Materials and methods) resulted in a minimum of four inversions (three in the large single copy region and one in the rRNA operon) to explain differences between CCMP3155 and Plasmodium protein- and rRNA-encoding genes. The majority of tRNA genes were also found in orthologous gene blocks in both species, although they are generally more prone to rearrangements, as is the case in other plastid genomes (Figure 2.6). In six cases, the order of genes in blocks ranging from two to four genes is unique to CCMP3155 and apicoplasts among all known plastid genomes (Figure 2.6). This level of conservation lends further support to the common origin of plastids in apicomplexans and CCMP3155, but more importantly sheds some light on how the apicoplast genome was reduced to its current state. This is because the blocks of conserved gene order are generally not identical due to the presence of many genes in CCMP3155 that are absent in the apicoplast, nearly all of which are related to photosynthesis (Figure 2.6). This suggests that the transition from a CCMP3155-like genome to an apicoplast involved surprisingly little reorganization and primarily involved gene loss.
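The idea of a minimum number of inversions between two gene orders can be made concrete with a toy Python sketch. It is a brute-force breadth-first search over signed gene orders, assuming linear chromosomes, and it is only feasible for a handful of genes. It is not the algorithm implemented in GRIMM and was not used for the analyses reported here. The function names and the example gene orders are hypothetical.

```python
from collections import deque


def signed_reversals(order):
    """Yield every gene order reachable by one inversion of a contiguous block.

    Genes are signed integers: the sign encodes the coding strand, and an
    inversion reverses the block and flips the strand of each gene in it.
    """
    n = len(order)
    for i in range(n):
        for j in range(i, n):
            block = tuple(-g for g in reversed(order[i:j + 1]))
            yield order[:i] + block + order[j + 1:]


def inversion_distance(source, target):
    """Minimum number of inversions turning source into target (toy sizes only)."""
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("both orders must contain the same genes")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        order, steps = queue.popleft()
        if order == target:
            return steps
        for nxt in signed_reversals(order):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    raise RuntimeError("target not reachable")  # cannot happen for valid input


if __name__ == "__main__":
    ancestral = (1, 2, 3, 4, 5)    # hypothetical gene order
    derived = (1, -3, -2, 4, 5)    # block 2..3 inverted
    print(inversion_distance(ancestral, derived))
```

Because breadth-first search explores all orders at distance k before any at distance k + 1, the first time the target is reached gives the minimum inversion count. Real genome comparisons need the polynomial-time methods used by dedicated tools.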
Indeed, except for genes associated with photosynthesis, the C. velia and CCMP3155 plastids respectively contain only 4 and 11 additional genes that are absent from the collective apicoplast gene set, providing an interesting glimpse into how the apicoplast reduced.

Ancient horizontal transfer of form II Rubisco to alveolates

Neither C. velia nor CCMP3155 plastid genomes encode a form I Rubisco, at least one subunit of which is present in all photosynthetic plastid genomes with the single exception of dinoflagellates. In dinoflagellates, the form I Rubisco has been replaced by a nucleus-encoded, single subunit form II Rubisco acquired by horizontal gene transfer from a proteobacterium (Morse et al., 1995; Whitney et al., 1995). We identified a form II Rubisco gene in the nonplastid 454 data from both C. velia and CCMP3155. To confirm that the C. velia Rubisco gene is nucleus-encoded, we used 3′ and 5′ RACE to show the transcript is polyadenylated and encodes an N-terminal extension with characteristics required for plastid-targeting. Both features were confirmed, and the N-terminal extension was found to encode a readily identifiable signal peptide (P = 0.998 in SignalP-HMM) followed by a positively charged region, features consistent with the bipartite leader required for plastid-targeting in apicomplexans and dinoflagellates (Nassoury et al., 2003; Waller et al., 2000). Unrooted phylogenetic analyses demonstrate both C. velia and CCMP3155 Rubisco genes are closely related to homologs from dinoflagellates (Figure 2.7). Rooting this tree using distantly related form II Rubisco genes (Tabita et al., 2008) did not affect this: the root fell in various positions among proteobacteria but never within alveolates. Altogether, these data show that the horizontal gene transfer that gave rise to the nuclear-encoded form II Rubisco in dinoflagellates actually took place in the common ancestor of dinoflagellates, apicomplexans, C.
velia, and CCMP3155, once again supporting the common origin of their plastids.

Ancient origin of mRNA polyuridylylation

Another feature thought to be unique to dinoflagellate plastids is the 3′ polyuridylylation of transcripts (Wang and Morse, 2006). To see if this too may be ancestral to dinoflagellates and apicomplexans, we carried out RT-PCR on circularized mRNAs from three C. velia photosystem genes (psbB, psbC, and psaA). Multiple mRNAs from all three genes were found to be polyuridylylated (Figure 2.8). This result was confirmed and extended using 3′RACE with a polyU-complementary primer on transcripts from eight other functionally diverse plastid genes. For all eight genes, polyU-specific products were characterized, suggesting polyuridylylation is common to all plastid transcripts in C. velia. This form of processing is otherwise known only in dinoflagellates (Wang and Morse, 2006), so once again the presence of this feature in C. velia suggests it was present in a common ancestor of apicomplexan and dinoflagellate plastids (Figure 2.8). This would in turn suggest that the character is either also present in apicomplexans and CCMP3155, or that it was lost in one or both of those lineages, but to date we are aware of no data from either lineage to distinguish between these possibilities.

Plastid phylogeny supports a common origin of alveolate and heterokont plastids

Reduced gene content severely restricts any direct comparisons between apicomplexan and dinoflagellate plastids, and plastids in other algal lineages. The C. velia and CCMP3155 plastid genes are also divergent and phylogenies based on them need to be interpreted carefully; nevertheless, they provide the means to test alveolate relationships in another way. The relationship between apicomplexan and dinoflagellate plastids abundantly supported by the gene content, gene order, and rare genomic characters described above was tested by evaluating the relationship of each group individually to C.
velia and CCMP3155 using the gene set common to each. In both cases the monophyly of alveolate plastids and their relationship to red algae are supported under all analytic models (Figure 2.9), reinforcing the conclusion that the ancestor of apicomplexans and dinoflagellates possessed a red algal plastid, that their extant plastids are direct descendants of that organelle, and that each retains different subsets of its ancestral characteristics (Figure 2.10).

The plastid genomes of C. velia and CCMP3155 also provide an opportunity to examine the deeper history of this endosymbiosis. The reduction of apicomplexan and dinoflagellate plastids not only made their direct comparison difficult but also challenged any comparisons with other plastid lineages. In contrast, the C. velia and particularly CCMP3155 genomes are the most slowly evolving, gene-rich alveolate plastid genomes known and are therefore more readily comparable to other plastid genomes. In phylogenetic analyses of whole-plastid genomes, CCMP3155 consistently groups as a sister lineage to heterokonts with strong support (Figure 2.9, Figure 2.11 and Figure 2.12). Alveolates and stramenopiles are also related in nuclear gene trees (Burki et al., 2008, 2009), so their affiliation in multi-protein plastid phylogenies provides evidence that their plastids also date back to their common ancestor. The common ancestry of hacrobian plastids (cryptophytes and haptophytes) also received strong support in all analyses (Figure 2.9, Figure 2.11 and Figure 2.12) and is consistent with the horizontal replacement of rpl36 in their plastid genomes (Rice and Palmer, 2006) and analyses of nuclear genes (Hackett et al., 2007; Patron et al., 2007). Many analyses recovered a monophyletic lineage including all red algal-derived plastids (the chromalveolates), but this is not as strongly supported as the alveolate/heterokont or hacrobian groupings.
Trees including all plastid genes recovered chromalveolates with weak support (Figure 2.11), whereas trees restricted to the slowest evolving 34 and 11 genes recovered chromalveolates with modest and strong support, respectively (Figure 2.12 and Figure 2.9A). These genes mostly encode photosystem components, which have been shown to be less likely to lead to spurious results than the housekeeping genes (Hagopian et al., 2004; Khan et al., 2007). Overall we conclude the plastid genomes support the monophyly of two major groups, the alveolate/heterokont group and the hacrobian group; whether they form a single chromalveolate group is not yet certain.

Simple hypothesis for plastid evolution

The plastid genomes of C. velia and CCMP3155 provide multiple lines of evidence for a common origin of red algal plastids in apicomplexans, dinoflagellates, and heterokonts. This, together with parallel evidence for a relationship between the host lineages (Burki et al., 2008, 2009), supports a rather simple picture of plastid evolution by direct descent in these lineages. Recently, a number of more complex theories involving serial tertiary endosymbiosis have been proposed and expanded, in particular some that suggest that dinoflagellate and apicomplexan plastids were acquired recently from different sources (Bodył, 2005; Bodył et al., 2009). Our data are explicitly inconsistent with this notion, because extant plastids of dinoflagellates and apicomplexans can be linked through C. velia and CCMP3155. Although serial transfers of plastids could formally explain the relationship of alveolate and heterokont plastids, congruent relationships inferred from plastid (this study) and nuclear genes (Burki et al., 2008, 2009) suggest that simple descent from a common ancestor is a more likely explanation.
Indeed, serial acquisition of eukaryotes with complex (four membrane) plastids remains a theoretical model that has never been observed outside a few lineages of dinoflagellates, and we show here that all other dinoflagellate plastids were inherited by descent from a common ancestor with apicomplexans. A similar situation is found in hacrobians. Plastid and nuclear gene phylogenies combined with the shared presence of horizontally transferred plastid rpl36 argue strongly for a common plastid acquisition. Overall, available data provide support for at most two and perhaps only one secondary endosymbiosis of a red alga (one in hacrobians and the other in the ancestor of heterokonts and alveolates), and there is as yet no direct evidence for any major algal group acquiring a plastid by tertiary endosymbiosis.

Another potential twist in plastid evolution is the notion that modern plastids might have supplanted older plastids in some algal lineages, and that the history of this original organelle is now only recognizable through the presence of relict genes. Such a case was recently made for chromalveolates using molecular data from some members of the heterokonts (Moustafa et al., 2009). The present results neither confirm nor undermine these hypotheses because our conclusions derive from the plastid genome itself and relate specifically to extant plastids.

Role of loss in plastid evolution

A common origin of alveolate plastids impacts how we view the importance of photosynthesis and plastid loss in evolution. Many alveolate lineages are non-photosynthetic, but if the ancestors of alveolates and heterokonts had a plastid, then all nonphotosynthetic members of these groups had photosynthetic ancestors.
Whether they lost plastids or just photosynthesis remains unknown in most cases: the recent discovery of a cryptic plastid in Perkinsus marinus (Matsuzaki et al., 2008), and of several plastid-targeted genes in Oxyrrhis marina (Slamovits and Keeling, 2008a) (both deep-branching members of the dinoflagellate lineage), highlight this distinction and the need for direct evidence of plastid ancestry in such lineages. In other lineages the abundance of data allows more solid conclusions. In particular, the data now supporting the photosynthetic ancestor of apicomplexans and dinoflagellates lead us to infer that the ancestor of the apicomplexan Cryptosporidium had a plastid despite the absence of plastid ultrastructure or genes for plastid-targeted proteins. A similar case can be made for ciliates and various non-photosynthetic heterokonts (in particular oomycetes) where whole genomes again confirm the absence of a plastid (Eisen et al., 2006; Tyler et al., 2006), although claims of relict plastid endosymbiont genes have been made (Reyes-Prieto et al., 2008; Tyler et al., 2006). Overall, the apicomplexans, dinoflagellates, and their close relatives are a hotspot for loss of plastids and photosynthesis, and further research on this group will likely give us important clues about plastid and photosynthesis loss in other algal lineages.

Implications for plastid genome evolution in apicomplexans and dinoflagellates

Apicomplexan plastid genome reduction is commonly linked to the loss of photosynthesis, although in dinoflagellates the transformation of the plastid genome into single gene minicircles (Zhang et al., 1999) could be interpreted as having allowed the massive transfer of genes to the nucleus (Bachvaroff et al., 2004; Hackett et al., 2004). However, comparing gene content across all red plastids (Figure 2.4) reveals that many genes missing from the plastids of apicomplexans and/or dinoflagellates are also absent from those of C.
velia and CCMP3155, and probably were already missing in their common ancestor. This indicates that a massive loss or migration to the nucleus of at least 81 genes took place before either apicomplexans or dinoflagellates evolved. Searching the C. velia and CCMP3155 nonplastid sequence revealed fragments of several of these “missing” genes, and 3′RACE showed that transcripts of at least three such genes (chlI, rps9, and rpl21) are polyadenylated, confirming their nuclear localization. Therefore, a major plastid genome reduction by migration to the nucleus took place early in alveolate evolution, and although it continued in both dinoflagellates and apicomplexans (likely for different reasons), the process was not necessarily triggered by specific changes in these lineages. Indeed, it is even possible that this ancient wave of transfers might have precipitated some of the lineage-specific transformations that we observe in their plastids today. These questions and others may be difficult to answer now, but this might change if additional photosynthetic lineages are found to fall in this region of the tree: there are suspiciously few deep-branching photosynthetic alveolates, and the genomes of any additional lineages found might lead to further revisions of plastid history.

Conclusions

The data presented here provide a good example of how novel organisms can help to resolve previously puzzling issues. Evidence from nuclear phylogenies clearly demonstrates that the photosynthetic Chromera and CCMP3155 (later named Vitrella brassicaformis; Oborník et al., 2012) are specifically related to apicomplexan parasites, and unique features of plastid organisation and phylogeny testify to the shared origin of their plastids. The presence of a unique form II Rubisco acquired by horizontal transfer from bacteria, plastid phylogenies, and, possibly, U tailing in plastid transcripts provide evidence for a photosynthetic ancestor of apicomplexans and dinoflagellates.
These data contradict several widely cited hypotheses on the apicoplast origin, point to ancestral plastid presence in Cryptosporidium, and suggest that loss of photosynthesis and perhaps even plastids has been widespread in the alveolates. The plastid in CCMP3155 has the largest and most slowly evolving gene complement in all alveolates, which provides solid support for the common origin of alveolate and heterokont plastids, and indicates a significant genome reduction in the alveolate plastid ancestor. These results resolve a significant part of the plastid evolutionary history in alveolates and other eukaryotes and suggest that further information from C. velia and CCMP3155 has great potential for shedding light on the evolutionary acquisition of apicoplast metabolism and the origin of parasitism in apicomplexans.

Materials and methods

Culturing, DNA sequencing and assembly, PFGE and Southern Blotting

C. velia (CCMP2878) and CCMP3155 were cultivated in L1 seawater medium at 22°C in a 16/8 light/dark cycle. Transmission electron microscopy of CCMP3155 was performed as described previously (Moore et al., 2008). Genomic DNA was extracted as described previously (Moore et al., 2008), and sequenced by 454-pyrosequencing. The assembly of plastid genomes resulted in 13.46x and 92.65x coverage in C. velia and CCMP3155, respectively. The assembly in C. velia was verified by direct paired-end sequencing of specific PCR products spanning the entire genome, either connecting conserved plastid genes or directly overlapping. The nuclear dataset was constructed by cloning and sequencing PCR, RT-PCR, and 3′RACE products with specific primers designed to extend fragments of nuclear genes found in the 454 data (see the list of genes below). For the size estimate of the C.
velia plastid genome, cells were embedded in low-melting agarose plugs, treated with 2% N-laurylsarcosine detergent and 2 mg/mL proteinase K for 30 h at 56 °C and run on pulsed-field gel electrophoresis (PFGE) at U = 6 V/cm (pulses 0.5–25 s) for 20 h in 0.5× TBE. Separated DNA was then blotted onto a membrane, which was hybridized with a radioactively labeled psbA probe, and the final size of the plastid genome was calculated as an average of five lane measurements from two independent PFGE runs.

RNA extraction, RACE and cRT-PCR

Total RNA was isolated using TRIzol, treated with DNase I for 10 min, and purified using the RNeasy MinElute Cleanup Kit (Qiagen). 3′RACE and 5′RACE were performed using the FirstChoice RLM-RACE kit (Ambion). 3′ regions of chlI, rps9, and rpl21 transcripts were cloned using a standard protocol, and three clones from each were sequenced and assembled into single contigs. For determining polyU tails in plastid transcripts, purified total RNA was treated with RNA ligase and reverse transcribed into first-strand cDNA, and PCR products were amplified using outward-facing, plastid gene-specific primers. All PCR products amplified from cDNA were cloned and one to three clones sequenced.

Comparative genomic and phylogenetic analyses

Analysis of rearrangements between plastid genomes of CCMP3155 and Plasmodium falciparum was done using GRIMM. Homologous genes from the large single copy region and genes from the inverted repeats were analysed altogether (circular topology) or separately (linear topology), leading to the same total of 4 inversions. Ribosomal RNA sequences were aligned using arb-aligner (http://www.arb-silva.de/aligner/). Amino acid sequences were aligned using MAFFT v6.240 (Katoh et al., 2005). Alignments were edited using Bioedit v7.0.9 (Hall, 1999) and Gblocks v0.91b (Castresana, 2000).
The subset of 34 conserved plastid genes (Figure 2.12) was selected based on their maximum likelihood distances as inferred with TREE-PUZZLE 5.2 (Schmidt et al., 2002) (cutoff value was set to 0.82). The concatenated nuclear dataset (a; see below) was analysed using RAxML 7.1 (Stamatakis, 2006; LG+Gamma+F model for protein- and GTR+Gamma for rRNA-genes, 1000 bootstrap replicates) and MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001; WAG+Gamma+F and GTR+Gamma models, two Markov chains run under default priors for 2×10^6 generations, of which the first 5×10^4 were excluded from consensus topology reconstruction as burn-in). Plastid datasets (b, c, e, f) were analysed under the CpREV+Gamma+F empirical model in RAxML (500 bootstrap replicates) and MrBayes (two Markov chains, default priors, 5×10^5 generations, burn-in 5×10^4), and under the CAT mixture model in PhyloBayes 3.2 (Lartillot et al., 2009) and PhyML 3.0-CAT (Guindon and Gascuel, 2003; Quang et al., 2008; C50 model for best tree, C20 for 100 replicate bootstrap analysis). The form II Rubisco data set (d) was analysed using RAxML (LG+Gamma+F, 1000 bootstrap replicates), MrBayes and PhyloBayes (settings same as b, c, e, f). Some bioinformatic analyses were carried out on the freely available Bioportal (www.bioportal.uio.no).

Phylogenetic datasets

Phylogenetic analyses were conducted on the following data sets:

a) Data set of nuclear genes (6 protein + 2 rRNA genes; 7137 positions); Figure 2.1. Genes: hsp90, hsp70, alpha-tubulin, beta-tubulin, biP, eF2, SSU rRNA, LSU rRNA.

b) Data set of plastid genes limited to genes retained in dinoflagellate plastid genomes (11 genes, 4212 amino acid positions); Figure 2.9A. Genes: atpA, atpB, petB, petD, psaA, psaB, psbA, psbB, psbC, psbD, psbE.

c) Data set of plastid genes limited to the content of apicomplexan plastid genomes (23 genes, 4438 amino acid positions); Figure 2.9B.
Genes: clpC, rpl2, rpl4, rpl6, rpl11, rpl14, rpl16, rpl23, rpoB, rpoC1, rpoC2, rps2, rps3, rps4, rps5, rps7, rps8, rps11, rps12, rps17, rps19, sufB, tufA.

d) Data set of form II Rubisco (455 amino acid positions); Figure 2.7.

e) Data set of 34 conserved plastid genes (7599 amino acid positions); Figure 2.12. Genes: acsF, atpA, atpB, atpH, atpI, clpC, petB, petD, petG, petN, psaA, psaB, psaC, psaD, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbN, psbT, rpl11, rpl14, rpl16, rps12, rps19, rps5, sufB, tufA.

f) Data set of all plastid genes present in CCMP3155 or C. velia (68 genes, 15736 amino acid positions); Figure 2.11. Genes: acsF, atpA, atpB, atpH, atpI, ccs1, ccsA, clpC, petA, petB, petD, petG, petN, psaA, psaB, psaC, psaD, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbN, psbT, psbV, rpl2, rpl3, rpl4, rpl5, rpl6, rpl11, rpl14, rpl16, rpl19, rpl20, rpl23, rpl27, rpl31, rpoA, rpoB, rpoC1, rpoC2, rps2, rps3, rps4, rps5, rps7, rps8, rps11, rps12, rps13, rps14, rps16, rps17, rps18, rps19, secA, secY, sufB, tatC, tufA, ycf3, ycf4.

Figure 2.1: The nuclear phylogeny shows C. velia and CCMP3155. C. velia and CCMP3155 are two independent photosynthetic lineages closely related to apicomplexan parasites (arrow). The RAxML tree is derived from concatenation of eight nuclear encoded genes (7,137 characters). RAxML/MrBayes supports are shown above branches; solid circles indicate 100/1 supports.

Figure 2.2: Plastid genome maps of Chromera velia and CCMP3155. Genes on the outside are transcribed counter-clockwise. All genes are coloured according to the functional categories (bottom right). Asterisk next to the gene for tRNA-Leu (UAA) indicates an intron in the anticodon triplet. Crosses in C. velia genes label pseudogenes. The plastid genome of C. velia has not been proven to map as a circle (dotted line). The relative sizes of both plastid genomes are proportional.

Figure 2.3: Size estimate of the C.
velia plastid genome. (A) Pulsed-field gel electrophoresis (PFGE) of genomic DNA revealed a faint band at the size of ~120 kb (arrow). (B) Southern hybridization of a different PFGE run. A radioactively labeled psbA plastid gene probe showed a hybridization signal at the corresponding size. We assume that the lower smudge belongs to sheared plastid DNA. The experiment was repeated two times with the same result.

Figure 2.4: Venn diagram of plastid genome contents in various red plastid lineages. Overlap between the four lineages of alveolates represented by the center rings reveals that plastids of C. velia and CCMP3155 collectively encode all genes found in both apicoplasts and dinoflagellate plastids. Gray boxes highlight 18 genes that are absent in plastids of plants and green algae but all found in alveolate and red algal-derived plastids ('Red algal plastids' at the top). Genes that originated through horizontal gene transfer are marked with an asterisk. The diagram does not include genes for tRNAs, other small RNAs (5S rRNA, ffs, tmRNAs, rnpB) and the ppd gene horizontally transferred to the CCMP3155 plastid genome.

Figure 2.5: The plastid ribosomal superoperon red algal origin of alveolate plastids. The superoperon originated by fusion of the S10+spc+alpha operon cluster and the str operon (top). Genes in the superoperon are transcribed in the left-right order and solid horizontal lines connect neighbouring genes (L=rpl and S=rps ribosomal protein genes). Diagonal lines show the transposition of rpl31 in CCMP3155 and C. velia (solid) and two additional possible transpositions in the ancestor of alveolates (dotted). The white type of the cryptophyte and haptophyte rpl36 gene indicates it was acquired by horizontal gene replacement from a non-cyanobacterial donor. The asterisk denotes further modifications of the superoperon in heterokont algae: the presence of ycf88 between rps19 and rpl22 in diatoms O. sinensis, P. tricornutum and T.
pseudonana and loss of rpl4, rpl29 and rpl18 in pelagophytes A. lagunensis and A. anophagefferens. Red algae: Porphyra purpurea, Porphyra yezoensis, Gracilaria tenuistipitata, Cyanidioschyzon merolae, Cyanidium caldarium; Cryptophytes: Guillardia theta, Rhodomonas salina; Haptophyte: Emiliania huxleyi; Heterokonts: Vaucheria litorea, Heterosigma akashiwo, Thalassiosira pseudonana, Phaeodactylum tricornutum, Odontella sinensis, Aureoumbra lagunensis, Aureococcus anophagefferens, Fucus vesiculosus and Ectocarpus siliculosus.  45  Figure 2.6: Conserved gene order in alveolate plastid genomes. The plastid genomes of CCMP3155 and apicomplexans share several uniquely organized gene clusters (see the key) not found in other plastid genomes. Genes lost in apicomplexans are mostly connected to photosynthetic function in the plastid of CCMP3155. Open reading frames in apicomplexans have no homology to other plastid genes.  46  Figure 2.7: Form II Rubisco in plastids of C. velia, CCMP3155, and dinoflagellates. RaxML phylogenetic tree of form II Rubisco proteins shows the C. velia and CCMP3155 form II Rubisco is closely related to homologs in dinoflagellates.  Figure 2.8: Polyuridylylation of plastid transcripts in C. velia. Alignment of genomic DNA sequences of three plastid genes, psbC, psbB and psaA with corresponding cDNA sequences. All cDNA clones are terminated with thymidine stretches (in bold) that are absent in genomic DNA suggesting presence of transcript polyuridylylation in the plastid. Underlined thymidines may correspond to the 5’UTRs of the circularized transcripts. Presence of polyU tails in psbB and psbC transcripts was validated by sequencing 3 clones from 3’RACE products obtained with oligo-dA and gene specific primers.  47  Figure 2.9: Multiprotein phylogenies support the common origin of alveolate plastids. 
(A) Analyses of 11 plastid genes retained in dinoflagellate plastids supports the monophyly of alveolate sequences, their relationship to heterokonts and the monophyly of all chromalveolate plastids. (B) Analyses of 23 genes retained in apicoplasts supports plastids of CCMP3155 and C. velia as their closest relatives. The ML trees were constructed using CAT model (A, B) and display PHYML-CAT/RAxML/MrBayes/Phylobayes branch supports (≥60/≥50/≥0.98/≥0.98 are shown as significant); solid circles indicate 100/100/1/1 supports.  Figure 2.10: Summary of plastid evolution in alveolates. The plastid genomes of C. velia and CCMP3155 provide a direct link between the plastids of apicomplexans and dinoflagellates because they retain ancestral features that were previously thought to be exclusive to one or the other of these lineages (boxed at the right). Relationships between the lineages based on nuclear data are shown at the left. An asterisk indicates that several regions of conserved gene order are found between the plastid genomes of apicomplexans and CCMP3155, and CCMP3155 and C. velia. Double asterisk indicates that the presence of polyUs in CCMP3155 plastid transcripts has not yet been determined.  48  Figure 2.11: Plastid phylogeny of 68 proteins found in CCMP3155 plastid. The tree displays complete support for grouping alveolate (represented by CCMP3155) and heterokont plastids (A + H, arrow) and red algal plastids (RP, arrow). Dotted branch indicates the placement of C. velia sequence, which received complete support in all analyses. The Maximum likelihood tree constructed using CpREV+Gamma+F model is displaying PhyML-CAT(aLRT)/ RAxML/ MrBayes/ PhyloBayes supports over branches; solid circles indicate 1/100/1/1 supports. Only ≥0.98/≥60/≥0.98/≥0.98 supports are shown as significant.  49  Figure 2.12: Plastid phylogeny of 34 conserved plastid proteins. The dotted branch indicates the placement of C. 
velia sequence, which received complete support in all analyses, but was excluded because of its high rate of substitution. The RAxML tree shows RAxML/PhyML-CAT/MrBayes/PhyloBayes supports over branches; solid circles indicate 100/100/1/1 support. Only ≥60/≥50/≥0.98/≥0.98 branch supports are shown as significant. The phylogenies confirm that CCMP3155 contains a red algal plastid and support its relationship to plastids of heterokonts.

Chapter 3: Split photosystem protein, linear-mapping topology and growth of structural complexity in the plastid genome of Chromera velia

Introduction

Chromera velia is an autotrophic alveolate that was discovered during a survey of zooxanthellae in Australian coral reefs. The dominant reef-symbionts are dinoflagellates from the genus Symbiodinium, but C. velia was found to be related to the sister group of dinoflagellates, the apicomplexan parasites (Moore et al., 2008; Oborník et al., 2009). Because apicomplexans include a number of medically and economically important pathogens (e.g., the malaria parasite Plasmodium, as well as Toxoplasma, Cryptosporidium, and Babesia), and because of the interest in the cryptic, non-photosynthetic plastid that has now been found in many of these apicomplexans, the nature of the plastid in C. velia was of immediate interest. Accordingly, the complete C. velia plastid genome was characterized and has proven critical in elucidating the origin of the apicomplexan plastid and its relationship to that of dinoflagellates (Janouškovec et al., 2010), and other plastid-related metabolic pathways have also already been compared to those of apicomplexans (Botté et al., 2011; Kořený et al., 2011). While these questions have certainly directed much attention to C. velia and its plastid in particular, they have also overshadowed the intrinsic interest in this organism. Aside from being a key to understanding apicomplexan plastids, C.
velia is itself potentially interesting and important both ecologically and evolutionarily. This is because C. velia is one of the few known photosynthetic lineages in the alveolates, the others being Vitrella brassicaformis (formerly known as CCMP3155; Oborník et al., 2012) and dinoflagellates, so its plastid represents new breadth in the study of plastid diversity never before accessible for comparison with other plastids. Indeed, the initial description of the C. velia plastid genome noted several unusual features with implications for plastid evolution and function (Janouškovec et al., 2010), but since they were typically not shared with either apicomplexans or dinoflagellates (and therefore not comparable to them), they have not been characterized further. Here we describe three intriguing features of the C. velia plastid genome: the presence of split genes encoding split proteins, divergent characteristics of gene organization and expression, and the physical structure of the chromosome. We show that two functionally important proteins involved in photosynthesis, PsaA representing a core subunit of photosystem I (PSI), and AtpB representing the β subunit of the ATP synthase, are expressed in two discrete fragments at both the RNA and protein levels, which has interesting implications for the structure and function of these otherwise highly conserved proteins. We also show that the plastid chromosome is highly divergent in structure with a pronounced strand polarity, altered gene order, and large extensions and insertions in many coding sequences. The intergenic spacers are riddled with traces of intrachromosomal recombination pointing to a possible driving force for genome remodeling.
Lastly, we significantly expand the depth of sequence coverage at the DNA level, and show that the coverage pattern at the chromosome ends, PCR experiments, genome migration on pulsed-field gels, and presence of two long terminal inverted repeats all suggest that the genome is linear in structure. This would represent the first documented case of a linear-mapping plastid genome. These unique characteristics substantially expand our current understanding of plastid genome diversity, much of which we hypothesize is due to high levels of recombination in this lineage.

Results and discussion

Fragmented genes encode fragmented proteins expressed from polycistronic mRNAs

The complete C. velia plastid genome contained a number of small fragments of genes, including psaA, atpB, psaB, psbB, rpl3, and tufA. In most of these cases, intact homologues were also present, but in psaA and atpB only two fragments were found that were widely separated in the genome and that together would account for the entire gene (psaA-1, psaA-2, atpB-1, and atpB-2). Both gene products are critical for photosynthesis and ATP generation, suggesting three possible explanations, all of which are unusual: the plastid fragments are pseudogenes and intact proteins are imported from the cytosol, the plastid gene products are trans-spliced at the RNA or protein levels, or the proteins function as unique two-subunit forms. Expression of psaA-1 and psaA-2 fragments was examined by transcript mapping by circularizing mRNAs and Northern analysis (Materials and methods). RT-PCR on circularized mRNA produced a product from each of the fragments comprising the coding region, flanking sequence and a short 6-12 nucleotide-long oligouridylylated (oligoU) tail (Figure 3.1). No full length products were observed in any transcript circularization, and the consistent failure to connect the fragments by RT-PCR using multiple primer sets suggested they are not spliced at the RNA level.
Both psaA-1 and psaA-2 are surrounded by several genes on the same strand, so their co-ordinated expression was analyzed by hybridization to probes corresponding to psaA-1, psaA-2, the upstream genes, and the downstream non-coding region. All probes hybridized with a fragment of the expected size of a stand-alone oligoU mRNA corresponding to the fragment in question: no evidence for a spliced RNA form was found. However, probes did hybridize to larger fragments (Figure 3.1 and data not shown) corresponding in size and hybridization pattern to a single dicistronic (psaA-1 + psbB) or multiple large polycistronic mRNAs (psaA-2 + psbE + psaB, and surrounding genes). Linkages between psaA-1 and psbB, and psaA-2 and psaB were confirmed by RT-PCR. Hybridization patterns around psaA-2 suggested processing of large mRNAs into the dicistronic psaA-2 + psbE and subsequently to single gene transcripts (Figure 3.1). As with psaA, all attempts to link the two atpB fragments by RT-PCR yielded no products. The Northern analysis of atpB-1 and atpB-2 revealed single bands corresponding in size to each expected fragment and no evidence was found for a spliced mRNA, again suggesting independent expression (Figure 3.2A). The transcriptional profiles of both atpB fragments were supportive of this conclusion (Figure 3.2B). All four psaA and atpB fragments were structurally conserved (Figures 3.1 and 3.2) and highly expressed: all were among the top 22 most abundantly transcribed plastid genes (Table 3.1, and see below). Evidence at the genomic and transcriptomic levels therefore consistently suggested that all psaA and atpB fragments are independently transcribed, translated and code for functional products.
To obtain convincing evidence that psaA and atpB gene fragments are expressed as separate polypeptides, and that no intact version of the proteins is present, we analyzed membrane protein complexes by a combination of two-dimensional (2D) electrophoresis and mass spectrometry. The membrane fraction isolated from C. velia cells was solubilized by dodecyl-β-maltoside and protein complexes were separated on a Clear-Native gel in the first dimension and on an SDS gel in the second dimension (Figure 3.3). The most abundant spots were subjected to mass spectrometric analysis of their tryptic peptides, which were correlated with predicted peptides of plastid-encoded genes and available EST sequences. We identified the most abundant plastid-encoded proteins and the nuclear-encoded PetC protein (Figure 3.3), which allowed us to distinguish the PSI supercomplex with bound antenna, PSII and PSII supercomplexes and cytochrome b6f (Figure 3.3). Fragments corresponding to PsaA-1, PsaA-2, AtpB-1, and AtpB-2 were all identified, but no spot with the expected mass/charge properties of intact PsaA and AtpB was found. Identification of spots was consistent with chlorophyll fluorescence detected in the native gel. All three PSII complexes exhibited strong fluorescence, which contrasted with the minimal fluorescence of chlorophyll bound within the PSI supercomplex. The minimal fluorescence emission from the PSI complex indicated it was well-preserved before it was separated into subunits in the second dimension (Figure 3.3). This, together with the lack of evidence for intact genes or transcripts for either gene, suggested strongly that the two separate subunits of both PsaA and AtpB proteins are assembled into functional PSI and ATP synthase, respectively. Modeling the two-subunit forms shows that the point at which both genes were split corresponds to a loop spanning structural domains (Figures 3.1 and 3.2).
In the case of PsaA the position of this breakpoint corresponds to the largest loop in the protein, which separates the first two pairs of peripheral helices from the rest of the protein including the photochemistry-performing core (Figure 3.1; Green, 2003; Jordan et al., 2001). The split has driven a considerable change in divergence rate between the two peptides: PsaA-1 at the photosynthetic antenna periphery is significantly less conserved than PsaA-2 at the antenna core. By calculating Maximum likelihood-corrected genetic distances (Materials and methods), we estimated that PsaA-1 is about 4.6 times more divergent than PsaA-2 relative to other plastid homologs, suggesting it is, not surprisingly, under weaker functional constraint. Maximum likelihood phylogenies using a wide sampling of plastid and bacterial homologues (Materials and methods) have also excluded a secondary bacterial origin for PsaA-1 as suggested previously using a neighbour-joining distance analysis (Mazor et al., 2012): although fast-evolving, both C. velia PsaA-1 and PsaA-2 were most closely related to the sister taxon Vitrella and other plastids. When compared to PsaA, the AtpB split is suggestive of a different, but equally interesting process. The region surrounding the split is conserved in plastid, mitochondrial and bacterial homologs, but interrupted by a 24 amino acid insertion in Vitrella, C. velia's closest photosynthetic relative (Figure 3.2C). A detailed sequence alignment of this region shows that the split in C. velia AtpB occurred within this shared insertion, suggesting that the ancestral acquisition of this insertion might have provided an evolutionary opportunity for the split. A 31 amino acid long insertion at the same site is also present in the AtpB of the related dinoflagellate Amphidinium. Although unrelated in sequence, the notorious divergence of dinoflagellate plastid proteins and their greater distance to C.
velia and Vitrella could easily account for the insertion divergence, pointing to a possibly even deeper origin.

Significance of split proteins for photosynthesis and ATP generation

Neither PsaA nor AtpB has previously been observed to be fragmented as in C. velia. Although the psaA gene is split in Chlamydomonas reinhardtii, RNA trans-splicing produces a full-length transcript, which is translated to a full-length protein (Merendino et al., 2006). The structure of the PSI core is highly conserved in all known oxygenic phototrophs (Busch and Hippler, 2011; Nelson and Yocum, 2006) and it is formed by a heterodimer of structurally similar PsaA and PsaB proteins, which bind a remarkably high number of cofactors: about 100 chlorophylls, 14 carotenoids and two phylloquinones (Amunts et al., 2010). The algal PSI is expected to be structurally similar to the plant PSI including several Lhca antenna proteins bound in a half-circle around the PSI. PsaA in C. velia is split between domains 4 and 5 (Figure 3.1) in the peripheral pigment-binding domain. The split was therefore not a reversal of the proposed ancient formation of psaA by the gene fusion of a psbB/psbC-like antenna component and a psbA/psbD-like core of the reaction centre (domains 1-6, and 7-11, respectively in Figure 3.1; Schubert et al., 1998). However, the position of the split and the faster evolutionary rate of PsaA-1 (peripheral fragment) can be reconciled with the structure of the PSI. Indeed, the presence of a supercomplex of PSI and antenna proteins in C. velia is obvious from the 2D electrophoresis (Figure 3.3). Apart from the structure of PSI itself, the process of the PSI assembly seems to be highly conserved through evolution from cyanobacteria to higher plant chloroplasts including assistance of the same auxiliary factors such as Ycf3 and Ycf4 (Ozawa et al., 2009). Although many individual steps in the PSI biogenesis remain unknown, this process had to be remodeled in C.
velia to assemble three proteins instead of two into the functional PSI core complex. Particularly intriguing is the binding of chlorophyll cofactors into the split PsaA. Both in vivo and in vitro studies suggest that chlorophyll has to be inserted into core subunits of PSI co-translationally, probably as a prerequisite for correct protein folding (Eichacker et al., 1996; Kim et al., 1994). The putative interface between PsaA-1 and PsaA-2 is rich in chlorophyll molecules and, according to available crystal structures, these chlorophylls are coordinated by both parts of split PsaA. It is not clear how C. velia inserts these chlorophylls into PSI and how stability and correct folding of the nascent PsaA-1/2 is achieved. Perhaps, both PsaA fragments are assembled together co-translationally, and this is synchronized with or even assisted by chlorophyll loading (which could intriguingly help to co-ordinate the orientation of the two fragments). In any case, the PSI core biogenesis in C. velia may involve additional synchronizing assembly step(s) and the recruitment of ‘new’, nuclear-encoded auxiliary factors that assist this process. Interestingly, the second protein of the PSI antenna core, PsaB, mirrors some of the PsaA structure at functionally analogous positions: although not split, PsaB also contains a variable loop between the fourth and fifth trans-membrane helix and a highly divergent N-terminus. Whether this plays a compensatory role directly related to the splitting of PsaA is not clear, but it is an interesting possibility that would require direct biochemical testing. Similarly to the PsaA protein, any split of the AtpB protein would seem improbable due to its critical function and conserved structure. Even though the proposed AtpB-1 part of the β-subunit appears to be relatively far from the catalytic site, it is known that tilting of the β-subunits including the top β-sheet ‘crown’ is critical for the catalytic cycle.
The movement of the upper part of the β-subunit is subtle, but if it is restricted by the inhibitor tentoxin, then the cyclic interconversion of nucleotide binding sites is blocked (Groth, 2002). Tentoxin binds close to the place where the AtpB protein is split (Groth, 2002). Correct re-assembly of AtpB-1 and AtpB-2 proteins is therefore likely to be highly constrained to avoid any restrictions of AtpB-2 movement. On the other hand, it is also possible that the fragmented AtpB-2 is more flexible and perhaps less sensitive to some natural inhibitors. The ATP synthase was also noteworthy in that it migrated at the top of the native protein gel. This has never been observed in cyanobacteria (Herranen et al., 2004) or higher plants (Järvi et al., 2011), and indicates that ATP synthase might be integrated into a very stable megadalton supercomplex in the plastid of C. velia.

Genome-wide transcription profiles

We analyzed the transcription of plastid genes using total RNA and polyA fraction sequences that were available to us. As expected, transcripts for all predicted open reading frames (ORFs) were predominantly found in the total RNA fraction (Figure 3.4). All plastid transcripts were also represented in the polyA fraction at consistently lower abundance (two to three orders of magnitude), probably due to a carryover of non-polyA transcripts. The psbA gene stood out in two ways, however. First, it is highly expressed, being represented at almost an order of magnitude higher levels than other protein-coding genes (and similar levels as the rRNA operon, Figure 3.4 and Table 3.1). Second, the representation of psbA in the total RNA and polyA fractions is about equal, which would be the expected result for a polyadenylated transcript. Two other genes, rps19, and orf157 (psbB pseudogene) were also relatively over-represented in the polyA fraction.
Whether all these genes are uniquely polyadenylated or a carryover into the polyA fraction occurred due to another reason is currently not clear and awaits successful cloning of their 3' transcript ends. Transcript coverage from the total RNA fraction enabled us to estimate and compare relative expression of all plastid genes separately, and in their functional groups (Tables 3.1 and 3.2). As expected, the rRNA genes were most highly expressed followed by genes of the four membrane complexes (PSI, PSII, cytochrome b6f and ATP synthase), and these also comprised all of the 22 most expressed protein-coding genes with the single exception of tufA. The least expressed were genes related to transcription and translation: RNA polymerase, tRNA genes and ribosomal proteins (Table 3.1). However, unknown ORFs and most intergenic regions were also expressed, including relatively highly expressed orf389, orf201, and intergenic regions downstream of psbJ and atpB-1 (Figure 3.4). The overall transcriptional profile suggests that extensive polycistronic transcription is present, and that polycistronic units often comprise functionally similar transcripts (e.g., related to photosynthesis or expression). Lastly, we used the transcriptome to address the presence of RNA editing. We compared base polymorphism to the genome at phred quality score 30 and higher (high quality) and found only four bases with significant (>10%) polymorphism at RNA level (sites 44346 (51.5%), 46071 (23.66%), 46074 (39.35%), and 46574 (23.79%)). All the sites are located within the rRNA operon and are 99.8-100% identical at the DNA level. Based on their location, we conclude this is more likely due to reverse transcription errors of modified rRNA nucleosides, and not RNA editing.

Many plastid ORFs contain unusual features

Another interesting characteristic of the C.
velia plastid genome is the presence of unusually large ORFs, including several genes that are expected to be present in plastid genomes, but which are unusually large due to the presence of long extensions that bear no recognizable similarity to known genes. In three genes (rps7, 8, and 17), the coding region has been extended by 126 to 273 amino acids towards the C-terminus, effectively tripling them in size (Figure 3.5). Four other genes (rps3, 4, 8, and 11) are found at the end of unusually large ORFs (Figure 3.6). The most obvious example of this is rps4, which is encoded within a 1998 amino acid long ORF, compared to the 201 amino acid long rps4 in Porphyra purpurea. In these cases, start codons may be located near the beginning of the homologous region, but it is still puzzling why the ‘gene’ would be preceded by such a long stretch of sequence uninterrupted by stop codons. We used transcript read mapping from the total RNA fraction to see whether read coverage differs between the homologous ORF regions and their non-homologous extensions. Although this approach cannot accurately determine ORF boundaries (many of these genes are expressed relatively little and intergenic regions are often transcribed), it shows whether transcriptome data is consistent with the possible existence of extensions in mature proteins. All ORFs were found to be covered by transcriptome reads consistently in their entirety (Figures 3.5 and 3.6), suggesting that these genes may really have expanded in size at the protein level. Additionally, rpoC1 and rpoC2 genes contain long insertions (507 and 971 amino acids) in their variable regions. Once again, much of these insertions is consistently covered by transcriptome reads, suggesting they may represent genuine insertions in mature proteins; however, a sheer drop in transcript coverage is also found within each insertion (Figure 3.7).
This raises the possibility that these proteins may be split much like PsaA and AtpB, but in this case both are in frame and low-coverage, so it is impossible to distinguish between the two alternatives. A number of smaller changes to the canonical reading frames are found throughout the C. velia plastid genome. Accounting for extensions and indels longer than 20 amino acids, five other ORFs have a truncated N-terminus (atpA, ccsA, petA, ycf3, psaB), one ORF has a significantly extended N-terminus (rpl4), two ORFs have a truncated C-terminus (ccsA, rpoA), five ORFs have an extended C-terminus (atpA, atpB, atpH2, petA, rps2), and three ORFs contain intervening insertions (rpl6, rpoB, psaB). ORF modifications are therefore found in various functionally unrelated proteins including the conserved core. In addition, all unusual ORFs are apparently expressed in the C. velia plastid, but absent in nuclear transcriptomic data, so we conclude that they are unlikely to be pseudogenes. Apart from changes to gene length and structure, we also noted unusual changes to gene presence: two genes are present in two (atpH) and three (clpC) full-length paralogs, even though a single type is found in other plastid genomes. All three clpC paralogs are highly divergent from each other and other plastid clpCs. Although they overlap partially, the largest and most complete of them (clpC3) is an order of magnitude less expressed than the other two paralogs (Figure 3.8). In contrast, the two atpH paralogs are quite different: atpH1 is both highly expressed and conserved, but the expression of atpH2 is nearly 100-fold lower, despite being present in two copies (on the TIRs; Figure 3.8). The atpH2 gene also contains an unusual C-terminal extension, which doubles it in size. A closer look at the surrounding sequence reveals that atpH2 originated by a recent recombination event involving the duplication of the atpH1 region (and including trnK; see below).
This, together with low expression, suggests that atpH2 is most likely a non-functional pseudogene. More generally, however, this example illustrates how duplication by intragenomic recombination can drive gene paralogy, ORF expansion, and altered gene order; all features observed throughout the C. velia plastid genome. Altogether, PsaA and AtpB splits stand out as unprecedented cases but represent only a fraction of C. velia plastid peculiarities. A number of highly modified ORFs are found in the genome, but organized without any obvious pattern. Extensions in plastid proteins have been reported previously (mostly in green algae, e.g. de Cambiaire et al., 2006), but never to the extent or in the number reported here. Notably, many proteins with unusually extended reading frames function in the small subunit of the ribosome. If the extensions are really present in the final functional proteins, this has important implications for plastid ribosome structure. Alternatively, their transcripts could be modified so that only the homologous region is translated. This process may be related to transcript oligouridylylation, but as yet we have no evidence to support this. OligoU tailing is an unusual modification of interest on its own (Wang and Morse, 2006). Its significance and distribution in C. velia plastid transcripts are not known, but so far oligoU tails have been found in mature mRNAs that are derived from polycistronic primary transcripts (Figures 3.1 and 3.4). This suggests a possible role in transcript processing, perhaps similar to that observed in dinoflagellate plastids (Barbrook et al., 2012; Dang and Green, 2009; Nelson et al., 2007). If confirmed, the ubiquitous presence of polycistronic transcription would imply that oligoU tailing is widespread in the C. velia plastid and tightly intertwined with expression of many core plastid genes.

A linear-mapping plastid genome in C. velia

One of the most significant features of the previously published plastid genome from C. velia was that it did not map as a circle, as do all other plastid genomes completed to date. The ‘gap’ in the sequence falls between the two copies of the terminal inverted repeat (TIR) containing psbA (Janouškovec et al., 2010). We attempted to close this gap using multiple approaches. We started by massively increasing the depth of sequence coverage. The genome was originally sequenced by 454 to a 13-fold depth of coverage, so we used an independent approach and assembled the genome from paired-end Illumina sequence reads to an average depth of almost 600-fold. This approach extended the two ends by 300 bp on each side, but did not link them. We verified much of this extension by PCR, linking the TIR ends to the unique large single copy (LSC) region of the genome. At the very ends of the chromosome, each TIR diverged into a 15 bp AT-rich sequence (mutually oriented as a direct repeat) followed by a short sequence extension into the complementary TIR at a very low depth of coverage (Figure 3.9). These sequence extensions created an overlap between the ends; however, several lines of evidence suggested that this is not due to a circular chromosome, but rather a linear topology with structured ends. First, mapping the depth of coverage revealed a pattern expected for a linear molecule. The depth of coverage was generally consistent across the LSC and most of the TIRs, but reduced linearly beginning at about 600 bp from the end, followed by an exponential reduction and partial recovery between 300 and 50 bp from the end, followed by a terminal drop (Figure 3.9A). The size and shape of the exponential reduction/recovery would be predicted by the mean DNA fragment size in the sequencing library (327 bp) and linear topology, as a consequence of fragmentation bias near the chromosome end.
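The expectation that read coverage falls near the ends of a linear molecule, but stays flat on a circular one, can be illustrated with a deliberately simplified sketch. It assumes a fixed fragment length equal to the reported library mean (327 bp), a hypothetical read length of 100 bp, a toy 5,000 bp genome, and one fragment starting at every possible position; none of these choices is taken from the thesis methods, and the sketch does not attempt to reproduce the exact shape seen in Figure 3.9A.

```python
def paired_end_coverage(genome_len, frag_len, read_len, circular):
    """Per-base coverage from idealized paired-end reads.

    Assumption (illustrative only): one fragment of fixed length starts at
    every position where it fits; each fragment yields one read from each end.
    """
    cov = [0] * genome_len
    # On a linear molecule a fragment cannot extend past the chromosome end.
    n_starts = genome_len if circular else genome_len - frag_len + 1
    for start in range(n_starts):
        forward = range(start, start + read_len)
        reverse = range(start + frag_len - read_len, start + frag_len)
        for pos in list(forward) + list(reverse):
            cov[pos % genome_len] += 1  # modulo only matters when circular
    return cov


linear = paired_end_coverage(5000, 327, 100, circular=False)
circle = paired_end_coverage(5000, 327, 100, circular=True)
print("interior depth:", linear[1000], "| depth at first base:", linear[0])
print("circular depth is uniform:", len(set(circle)) == 1)
```

In this toy model the depleted zone at each end is roughly one fragment length wide, which is the qualitative link between library fragment size and the terminal coverage pattern described above.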
Therefore, both the overall decrease in coverage depth and the exponential decrease/recovery close to the ends support a linear chromosome conformation (if it were circular, the whole region should not differ from the rest of the genome). Second, if the genome was circular but the gap was hard to sequence, we would expect some paired ends to span the gap. However, when the over 650,000 paired ends are mapped to the genome, not a single paired end spanning the gap was identified (Materials and methods). Instead, both ends of the chromosome consisted exclusively of reverse-mapping reads (Figure 3.9A and B). Third, PCR experiments using six primers (all shown to successfully amplify products from one side of the gap in different combinations) and all their nested combinations failed to fill the gap between the two TIRs (Figure 3.10). Fourth, the size of the plastid genome estimated directly by pulsed-field gel electrophoresis (PFGE), and by psbA probe hybridization, was consistent with abundant presence of linear monomers, 120 kb in size (Figure 3.11 and Janouškovec et al., 2010). Subgenomic-sized material and unresolved DNA in the well were also abundantly represented, while putative linear dimers and plastid DNA at the compression zone were rare. The PFGE experiment was reproduced by using probes to three plastid genes encoded in the LSC region, tufA, petA, and atpA (Materials and methods) and consistently led to the same result, suggesting linear monomers are an important constituent of plastid DNA in C. velia. Subgenomic-sized molecules could represent plastid DNA fragmented during PFGE preparation (note that intact C. velia cells were digested in agarose plugs, however), nucleus-encoded plastid DNA fragments, plastid DNA in the process of replication, or other forms of genuine plastid DNA (Ellis and Day, 1986; Oldenburg and Bendich, 2004; Scharff and Koop, 2006). Incompletely digested cells of C.
velia in the agarose plugs (Materials and methods) might be responsible for much of the signal in the well, although high molecular-weight branched DNA forms are also likely to be present and cannot be distinguished using the current approach (Bendich, 1991, 2004). Altogether, because the plastid genome was not closed by deep sequencing using either 454 or Illumina, no paired-end linkage could be established, no linkage could be established by PCR, and the size of the plastid genome estimated by PFGE and Southern blot hybridizations was consistent with the existence of linear monomers, we conclude that the genome is principally linear in structure and that circular molecules and linear concatemers are much rarer or absent. No linear-mapping plastid genome has been documented to date, although there is a longstanding debate about whether circular-mapping plastid genomes might represent physically linear molecules (Bendich, 2004). At a sufficient depth of coverage the two ends of the C. velia plastid genome can be (barely) overlapped in sequence, but multiple lines of evidence suggest the genome is actually linear. The PFGE data support the existence of linear monomers and other topologies, but are incongruent with a significant presence of linear concatemers. This leaves no direct comparisons possible with other plastid DNAs, including those in related apicomplexans, which comprise various mixtures of circular molecules, linear concatemers, and high molecular weight material (Bendich, 2004; Day and Madesis, 2007; Lilly et al., 2001; Williamson et al., 2002). The presence of terminal inverted repeats (TIRs) in the C. velia plastid genome is another common characteristic of linear genomes (e.g., in apicomplexan mitochondria; Hikosaka et al., 2010; Kairo et al., 1994). The physical structure of the chromosome ends remains unknown, but given the sequence one can speculate based on structures observed in other linear genomes. 
These include single-stranded loops (possibly followed by a nick) or single-stranded overhangs in the 15 bp regions (reviewed in Nosek et al., 2004), or diverse associations with end-specific proteins (e.g., Rekosh et al., 1977; Tomáska et al., 1997). The gradually decreasing depth of coverage near the chromosome end may also suggest that the whole terminal 600 bp region, not just its very end, is protected, possibly by a t-loop (Tomaska et al., 2002). No repeated sequence can be found close to the ends, however, so determining which of these structures, if any, most closely represents the physical state of the C. velia plastid chromosome will require direct testing. Altogether these data provide evidence for the first linear-mapping plastid genome in C. velia, and a similar topology could be present in other plastid genomes, particularly those that do not presently map as a circle (Gabrielsen et al., 2011). The change in the plastid chromosome topology has many important implications for the process of DNA replication and, particularly, for overcoming the end-replication problem. In the circular/concatemeric DNA in apicomplexans (Williamson et al., 2001, 2002) and other plastids, several types of replication origins have been identified and linked to the D-loop, rolling circle, and recombination-mediated replication strategies. The most common type of replication origin is associated with bidirectional D-loop replication and is located inside the duplicated rRNA inverted repeat (Day and Madesis, 2007; Kolodner and Tewari, 1975; Krishnan and Rao, 2009; Williamson et al., 2002). To provide a first glimpse into replication in C. velia, we identified a putative replication origin of the plastid genome in silico using cumulative GC-skew analysis. This region, far away from the single rRNA unit (68-69 kb from one end of the linear contig; Figure 3.12A), is characterized by a minimum in the cumulative GC-skew and loosely conserved tandem repeats (positions 68404 to 69029). 
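The idea behind a cumulative GC-skew scan can be sketched in a few lines of Python. This is a minimal illustration rather than the utility actually used for Figure 3.12: the function names, the non-overlapping 100 bp windows, and the toy sequence are all assumptions made for the example. Skew is computed per window as (G - C)/(G + C) and summed along the sequence; the global minimum is then a candidate origin and the global maximum a candidate terminus.

```python
def cumulative_gc_skew(seq, window=100):
    """Return (window_start, cumulative_skew) pairs along seq.

    Skew per window is (G - C) / (G + C); windows do not overlap.
    The window size is an illustrative assumption.
    """
    seq = seq.upper()
    total = 0.0
    points = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        points.append((start, total))
    return points


def skew_extrema(points):
    """Return the (start, value) pairs at the global minimum and maximum."""
    minimum = min(points, key=lambda p: p[1])
    maximum = max(points, key=lambda p: p[1])
    return minimum, maximum


# Toy sequence (hypothetical): a C-rich half followed by a G-rich half,
# so the cumulative skew falls, turns around at the switch, and rises again.
toy = "ACCT" * 75 + "AGGT" * 75
origin_candidate, terminus_candidate = skew_extrema(cumulative_gc_skew(toy))
print(origin_candidate, terminus_candidate)
```

On real data the curve is noisier, so the window size and any smoothing would need to be chosen with the genome size and base composition in mind.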
In contrast, both ends of the chromosome are at a cumulative GC-skew maximum, suggesting they most likely correspond to replication termini. Another feature commonly associated with origins of replication is a major shift in gene cluster polarity, which is located approximately 18 kb away from the predicted replication origin in C. velia (Figure 3.12B). Interestingly, the broader genomic region surrounding both features is relatively over-represented in the Illumina data, whereas both chromosome ends are relatively under-represented (this holds after correction for base composition; Figure 3.12C). This is consistent with the existence of replicating molecules in the Illumina library, supporting the view that replication starts at about two thirds of the chromosome and proceeds towards the ends. Lastly, we determined the copy number for two C. velia plastid genes (tufA, atpH) using dot blot hybridization of total genomic DNA to a synthetic construct. The inferred copy number of both plastid genes was 9 times higher compared to a single-copy nuclear gene, topoisomerase II, and confirmed that genes encoded in the LSC and TIR are equally represented (atpH is present in three closely related paralogs, all of which hybridized to the construct) (Figure 3.13). This evidence suggests that plastid gene copy number in C. velia is unusually low, and closer to the copy number of non-photosynthetic plastids in apicomplexans than to other photosynthetic plastids (Day and Madesis, 2007; Matsuzaki et al., 2001).

Searching for forces driving structural complexity

The C. velia plastid genome is unique in a number of ways. Here we have examined several unusual characteristics of its organization and expression, all of which raise the question: why is this genome home to so many oddities? One property that might explain several of these characteristics concurrently is a high level of recombination. 
A functional DNA recombination machinery has been documented in plastids of several land plants and algae (Boynton et al., 1988; Haberle et al., 2008; Palmer, 1983), but despite its presence, most plastids have retained a conserved genomic architecture. This suggests other forces are in effect to prevent plastid genome reorganization. Unlike most plastids, however, the C. velia plastid genome has been extensively reshuffled. Importantly, this reshuffling affected canonical operons, gene clusters, and translationally-coupled units (Gatenby et al., 1989; Westhoff et al., 1983), which have been either fragmented (e.g., atpB/E, atpI/H/G/F/D/A, psaA/B, petB/D, ribosomal protein super-cluster) or internally reorganized (ribosomal RNA operon). In order to understand the underlying factors, we searched for short repeats, palindromes, and longer duplicates across the C. velia plastid genome sequence. Short repeats and palindromes were comparatively rare; however, the searches did identify a surprisingly large number of pairwise hits between longer genomic sequences: at least 46 matches longer than 50 bp at 65-100% similarity (Figure 3.14). Most of these duplicates are low copy-number (at 2-3 sites) and include the TIRs, many intergenic regions, duplicated atpH and clpC genes, and small duplicated fragments of otherwise complete genes (psaA, psaB, psbB, rpl3, and tufA). The most recombination-active region is found around the TIR boundary (Figure 3.15). Multiple parts of this region, including several gene fragments, three atpH paralogs, and six trnK paralogs comprising both complete and fragmentary variants, are scattered in several places in the genome (Figure 3.15). These data show that duplicative recombination processes are relatively common in the C. velia plastid genome compared to other plastids. 
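The thresholds quoted for the pairwise search (matches longer than 50 bp at 65-100% similarity, e-value below 0.01) can be applied to a self-versus-self BLASTn report along the following lines. This is a hedged sketch, not the pipeline used for Figure 3.14: it assumes the standard 12-column tabular BLAST output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and the function name and example rows are hypothetical.

```python
def filter_repeat_hits(lines, min_len=50, max_evalue=0.01, min_ident=65.0):
    """Keep non-trivial intragenomic matches from tabular BLASTn output.

    Drops the hit of a region against itself, hits failing the length,
    e-value, or identity thresholds, and the reciprocal copy of each pair.
    """
    hits = []
    seen = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        evalue = float(fields[10])
        if (qstart, qend) == (sstart, send):
            continue  # a region matching itself
        if length <= min_len or evalue >= max_evalue or pident < min_ident:
            continue
        # Normalize the subject interval so A->B and B->A share one key,
        # including matches in inverted orientation (sstart > send).
        subject = (min(sstart, send), max(sstart, send))
        key = frozenset([(qstart, qend), subject])
        if key in seen:
            continue
        seen.add(key)
        hits.append((qstart, qend, sstart, send, pident, length))
    return hits


# Hypothetical example rows (sequence name "cp" is a placeholder).
example = [
    "cp\tcp\t100.00\t120426\t0\t0\t1\t120426\t1\t120426\t0.0\t2.2e5",
    "cp\tcp\t92.50\t80\t6\t0\t1000\t1079\t5000\t5079\t1e-20\t100",
    "cp\tcp\t92.50\t80\t6\t0\t5000\t5079\t1000\t1079\t1e-20\t100",
    "cp\tcp\t100.00\t30\t0\t0\t200\t229\t900\t929\t1e-5\t50",
    "cp\tcp\t60.00\t90\t36\t0\t300\t389\t700\t789\t1e-3\t40",
]
for hit in filter_repeat_hits(example):
    print(hit)
```

Collapsing reciprocal hits matters when counting duplicates, because a self-comparison otherwise reports each pair twice.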
Intragenomic recombination associated with duplication is a straightforward way to explain the occurrence of paralogs, small gene fragments, extensive gene re-shuffling, and re-structuring of operon units through movement of promoters and intergenic elements. Similar processes could also explain the addition of extensions and insertions to genes. The splitting of psaA and atpB could likewise be seen as a recombinational process possibly involving duplication or partial duplication. It is also possible that the PsaA split proceeded through an intermediate, in which the canonical and split PsaAs co-existed before the canonical PSI was completely lost. In this scenario, each of the two complexes could have been regulated independently and could even have had different functions, much like the cyanobacterial monomeric and trimeric PSI (Majeed et al., 2012). Lastly, the apparent linearization of the chromosome flanked by TIRs and 15 bp repeated regions could likewise suggest active recombination processes at work. Interestingly, while sequence shuffling has been extensive in the C. velia plastid genome, it has not been completely random. Most genes have been re-organized in large clusters with a strongly pronounced strand polarity (Figure 3.12). More importantly, functionally related genes often cluster together and are apparently transcribed polycistronically (Figure 3.4), so even with high levels of recombination a functionally re-organized genomic architecture can result from selection. Recombination may be the cause of many peculiarities in the C. velia plastid genome, but not all of them. Other factors, such as increased mutation rate, small population size and a short replication cycle could have contributed to the occurrence of some divergent characteristics, most typically transcript oligoU tailing and non-canonical genetic code (Janouškovec et al., 2010; Moore et al., 2008; Oborník et al., 2009). 
Evolutionary timing of trait acquisition is another important factor that could have shaped unusual plastid features through the long-term combined effect of the above forces. For example, all alveolate plastids (C. velia, apicomplexans, Vitrella and dinoflagellates) have a reduced gene complement, somewhat modified gene order and comparatively fast rate of protein evolution (Janouškovec et al., 2010). Oligouridylylation of plastid transcripts is also found in dinoflagellates and may significantly predate C. velia, although it seems to be missing in apicomplexans (Janouškovec et al., 2010, and unpublished data). The AtpB insertion present in dinoflagellates and Vitrella at a homologous position to the split in C. velia could also be viewed as an ancestral acquisition that may have 'individualized' the two parts of the protein, preparing the ground for the split. Similarly, recombination has been very active in the dinoflagellate plastids and possibly even led to a massive fragmentation of their genome into small mini-circles (Zhang et al., 1999, 2001), so it might be tempting to speculate that the core cause of many of these conditions traces back to an ancient ancestor. Distinguishing between ancient and independent gains is nevertheless difficult. The plastid genomes of apicomplexans and Vitrella are generally not as unusual in structure as those of C. velia and dinoflagellates, so it is more accurate to propose that all alveolate plastids were ancestrally somewhat divergent, but evolved in very different directions. The fact that two conservative photosynthesis proteins are split in a single organism is astonishing given the high efficiency of C. velia photosynthesis (Quigg et al., 2012). The mechanism behind a particular split can be relatively simple, but it is more difficult to see how each of the PsaA and AtpB fragments became integrated into a functional multiprotein complex. 
For example, both PsaA fragments had to co-evolve with mechanisms allowing co-translational insertion of chlorophylls, and simultaneously acquire/remodel interactions with numerous cofactors that mediate their assembly into a fully functional PSI complex. Many other structures in the C. velia plastid appear similarly complicated. Both the ribosome and the RNA polymerase complex comprise a number of highly modified components, including proteins with extensions and insertions hundreds of amino acids long. Explaining the appearance of highly modified structures is more difficult than explaining their components individually, because many parts of these systems are mutually constrained. Likewise, molecular processes such as oligoU tailing are difficult to justify in adaptive terms. Considering the complexity of similar 'unnecessary' processes in other organelles, such as RNA editing or intron splicing (Gray et al., 2010), it is attractive to hypothesize that complicated machineries operate in the C. velia plastid that serve no general advantage.

Conclusions

Unlike their mitochondrial counterparts, the organization and structure of plastid genomes are highly conserved and unified. The canonical photosynthetic plastid genomes consist of a single circular-mapping chromosome that encodes a highly conserved protein core, involved in photosynthesis and ATP generation. Here we demonstrated that the plastid genome of the photosynthetic relative of apicomplexans, Chromera velia, departs from this view in several unique ways. Core photosynthesis proteins PsaA and AtpB have been broken into two fragments, which we show are independently transcribed, oligoU-tailed, translated, and assembled into functional photosystem I and ATP synthase complexes. 
Genome-wide transcription profiles reveal extensive polycistronic transcription and processing, and evidence for expression of many highly modified proteins, including several that contain insertions amounting to hundreds of amino acids in length. Canonical gene clusters and operons have been fragmented and reshuffled into novel transcriptional units. Massive genomic coverage by paired-end reads, coupled with pulsed-field gel electrophoresis and PCR, consistently indicates that the plastid genome is primarily linear. Abundant duplicative recombination could have led to protein splits, extensions, and linearization of the genome, and is perhaps the driving force behind the many features that defy the conventional ways of plastid genome architecture and function.

Materials and methods

DNA extraction, sequencing, annotation and fragmented gene analysis

Pelleted cells of C. velia were ground in liquid nitrogen using mortar and pestle and the resulting slurry was incubated in CTAB buffer (2% w/v cetyltrimethyl ammonium bromide; 1.42 M NaCl; 20 mM EDTA; 100 mM Tris HCl, pH 8.0; 2% w/v polyvinylpyrrolidone; 0.5% β-mercaptoethanol; 1 mg/ml RNase A) at 65 ºC for 20 min. After two extractions with phenol/chloroform and one with chloroform only, DNA was precipitated with isopropanol and washed with ethanol. The pellet was dried at room temperature and resuspended in TE buffer. Extracted DNA was separated by CsCl-Hoechst gradient ultracentrifugation and AT-enriched fractions were tested for the presence of plastid DNA using the psbA probe. Plastid-enriched DNA fractions were sequenced using Illumina 54 bp paired-end read technology and deposited in the GenBank Sequence Read Archive. De-novo sequence assembly using MIRA3 extended the previous plastid contig (NC_014340.1) at both ends to the final length of 120 426 nucleotides at 572x average depth of coverage. Two errors at homopolymeric regions were corrected, which led to the merging of rps7 with orf142, and rpl4 with orf128. 
The gene for 5S ribosomal RNA was identified upstream of 23S rRNA, and its fold verified using Mfold (http://mfold.rna.albany.edu/?q=mfold). Proteins were re-annotated by comparison to plastid homologs from an NTG start codon with the exception of 10 proteins that were more consistent with an AT[A,C,T] start codon. Six of the proteins had no alternative NTG start (atpA, rpl31, rps7, rps12, rps14, ycf3). Unknown ORFs and genes with long N-terminal extensions were annotated from the first NTG start codon. All sequence and annotation changes were deposited under an updated C. velia plastid genome entry (NC_014340.2). PsaA trans-membrane domains were predicted using TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and by comparison to PsaA of Synechococcus elongatus. The corrected genetic distances for PsaA-1 and PsaA-2 were calculated in Tree-Puzzle 5.2 (Schmidt et al., 2002) using a dataset of 40 representative plastid PsaA homologs and the following parameters: Likelihood mapping, Slow (exact) parameter estimates, WAG model with 4 gamma categories, estimated alpha parameter and estimated amino acid frequencies. PsaA-1/2 phylogenies were computed using a broader dataset in RAxML using -m PROTGAMMALGF -f a -# 100. PsaA-1/2 and AtpB datasets were aligned using the --localpair option in MAFFT (Katoh et al., 2005).

RNA extraction, RACE, and Northern blot hybridization

Total RNA was isolated from ~0.2 g of C. velia cells ruptured by repeated freezing and thawing followed by grinding with a pestle in liquid nitrogen; 5 ml of TRI reagent was then added and the manufacturer’s instructions (Sigma) were followed. Total RNA was ligated with a 5’ adapter (5’ RACE) or self-ligated with T4 RNA ligase (circular RACE), and RT-PCR was carried out with either random or specific primers, or with polyT primers (3’ RACE). Sequences of circularized transcripts can be found in the GenBank data libraries under accession numbers KC734564-KC734569. 
For Northern blot analysis, ~10 μg per lane of total RNA was separated in a 1% formaldehyde agarose gel in 1xMOPS buffer, blotted and UV cross-linked as described elsewhere (Vondrušková et al., 2005). PCR-amplified DNA fragments of selected genes were labeled by random priming with [α32P] dATP and used as probes for hybridization at 55ºC. After washing, the results were visualized on a Typhoon Imaging System (GE Healthcare).

Preparations of cell membranes and two dimensional electrophoresis

100 mL of cells (optical density at 750 nm = ~0.5) was washed and resuspended in 1 ml of 25 mM MES/NaOH, pH 6.5, 5 mM CaCl2, 10 mM MgCl2, and 20% glycerol. The concentrated cell suspension was mixed with 0.5 ml of glass beads (0.1 mm diameter) in a 2 ml Eppendorf tube and broken in a Mini-BeadBeater (BioSpec). Cells were broken 8 times for 10 s with 2-min interruptions for cooling on ice, and the cell extract was separated into soluble and membrane protein fractions by centrifugation (40,000 x g, 20 min). The isolated membranes were resuspended in the same buffer and solubilized by gentle shaking with 1% dodecyl-β-maltoside for 1 hour at 10°C. Insoluble contaminants were removed by centrifugation (65,000 x g, 20 min). Analysis of membrane complexes under native conditions was performed by Clear-Native electrophoresis as described in Wittig and Schägger, 2008. Individual proteins in membrane complexes were resolved in the second dimension by SDS-PAGE in a 12–20% linear gradient polyacrylamide gel containing 7 M urea (Sobotka et al., 2008).

Protein identification by LC-MS/MS analysis

Gel slices were placed in 200 µL of 40% acetonitrile, 200 mM ammonium bicarbonate and incubated at 37ºC for 30 min. The solution was discarded, the procedure was repeated and the gel was finally dried in a vacuum centrifuge. 20 µL of 40 mM ammonium bicarbonate containing 0.4 μg trypsin (proteomics grade, Sigma) was added to the tube, incubated at 4ºC for 45 min and then dried in a vacuum centrifuge. 
To digest proteins, 20 µL of 9% acetonitrile in 40 mM ammonium bicarbonate was added to the gel and incubated at 37ºC overnight. Peptides were purified using ZipTip C18 pipette tips (Millipore Corporation) according to the manufacturer’s protocol. MS analysis was performed on a NanoAcquity UPLC (Waters) on-line coupled to an ESI Q-ToF Premier mass spectrometer (Waters). 1 µL of the sample was diluted in 3% acetonitrile/0.1% formic acid and tryptic peptides were separated using the Symmetry C18 Trapping column (180 µm i.d. X 20 mm length, particle size 5 µm, reverse phase, Waters) with a flow rate of 15 µL/min for 1 min. This was followed by reverse-phase UHPLC using the BEH300 C18 analytical column (75 µm i.d. X 150 mm length, particle size 1.7 µm, reverse-phase, Waters). Linear gradient elution was from 97% solvent A (0.1% formic acid) to 40% solvent B (0.1% formic acid in acetonitrile) at a flow rate of 0.4 µL/min. Eluted peptides flowed directly into the ESI source. Raw data were acquired in data independent MS^e identity mode (Waters). Precursor ion spectra were acquired with collision energy 5V and fragment ion spectra with a collision energy 20-35V ramp in alternating 1 sec scans. For a second analysis, data dependent analysis mode was used; peptide spectra were acquired with collision energy 5V and peptides with charge states of +2, +3 and +4 were selected for MS/MS analysis. Fragment spectra were collected with a collision energy 20-40V ramp. In both modes, acquired spectra were submitted for database searching using the PLGS2.3 software (Waters) against the predicted proteins coded by the plastid genome of C. velia and against the available EST sequences (www.ncbi.nlm.nih.gov). Acetyl N-terminal, deamidation N and Q, carbamidomethyl C and oxidation M were set as variable modifications. Identification of 3 consecutive y- or b-ions was required for a positive peptide match and a minimum of 3 peptide matches were required for a positive protein identification.  
Transcriptome analysis

Total RNA and the polyA RNA fraction were sequenced using Illumina paired-end read technology. Read coverage depth was averaged across the two TIRs. Coverage values for all sites were then exported and plotted in a spreadsheet editor. Incorrect read mapping to duplicated genes (atpH, clpC) and repeated regions was ruled out based on match length or sequence divergence (all matches were 96.6% similar or lower). Presence of RNA editing was assessed by comparing the relative representation of high quality nucleotides (phred score 30 or higher) at each site between transcriptomic and genomic reads mapped on the plastid contig.

DNA read mapping, chromosome ends analysis and repeat analysis

Illumina paired-end DNA reads were separately mapped on the linear and artificially circularized plastid genome sequence using Consed 22 (Gordon et al., 1998) and Bowtie 2.0 (Langmead and Salzberg, 2012) under default settings. Read coverage depth was averaged across the two TIRs and incorrect read mapping to repeats was ruled out as above. Coverage depth by forward-oriented (i.e., oriented outside of the chromosome) and reverse-oriented reads was calculated using Bedtools 2.17. No forward-oriented reads were found within the last 130 bp of the chromosome ends (all reads in this region were reverse-oriented). Only 23 forward-oriented reads were found within the last 250 bp of the chromosome ends, all of which had a reverse-oriented pair mapping close to the chromosome end except 6 reads whose pairs did not map to the plastid genome sequence or each other. In order to estimate the mean DNA fragment size, full-length read pairs (54 bp each) were mapped on the plastid contig using Bowtie 2.0, and sorted and analyzed using the Picard Tools SortSam.jar and CollectInsertSizeMetrics.jar utilities. 
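The reasoning behind the strand analysis can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reconstruction of the Bedtools analysis: the genome length, fragment size range, number of fragments, and function names are all hypothetical, and only the right-hand end of a linear molecule is considered. Because every fragment must lie within the molecule, a read pointing towards the end cannot start closer to it than the shortest fragment allows, so the terminal window is covered only by inward-pointing reads.

```python
import random


def simulate_paired_ends(genome_len, n_fragments, frag_min, frag_max,
                         read_len, seed=0):
    """Randomly fragment a linear molecule and read both fragment ends.

    Returns (start, strand) tuples: '+' reads point towards the right-hand
    chromosome end, '-' reads point away from it (inwards).
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_fragments):
        frag_len = rng.randint(frag_min, frag_max)
        start = rng.randint(0, genome_len - frag_len)  # fragment fits inside
        end = start + frag_len
        reads.append((start, "+"))            # covers [start, start + read_len)
        reads.append((end - read_len, "-"))   # covers [end - read_len, end)
    return reads


def count_by_strand_near_right_end(reads, genome_len, read_len, window):
    """Count reads of each strand overlapping the last `window` bp."""
    counts = {"+": 0, "-": 0}
    for start, strand in reads:
        if start + read_len > genome_len - window:
            counts[strand] += 1
    return counts


# Hypothetical parameters; 54 bp reads and a 130 bp window echo the text.
reads = simulate_paired_ends(genome_len=5000, n_fragments=20000,
                             frag_min=250, frag_max=400, read_len=54)
print(count_by_strand_near_right_end(reads, 5000, 54, window=130))
```

On a circular molecule, fragments could span the junction, so outward-pointing reads would be expected right up to the position where the linear contig was opened.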
Discrepant bases in Figure 3.9 were visualized using WebLogo 3 (http://weblogo.threeplusone.com/create.cgi) using 'Base probabilities' as units and the 'No adjustment for base composition' setting. PCR on genomic DNA was done under the following conditions: annealing at 55ºC, 30 s to 4 min elongation times, and 35 cycles. All six primers used in bridging the 'gap' between the two contig ends were confirmed to give functional products with a different primer pair under the same conditions (Figure 3.12C and D). Duplicated and repeated regions were searched using EMBOSS 6.3.1 repeat identification tools at the Pasteur Mobyle website (http://mobyle.pasteur.fr/), Pipmaker (http://pipmaker.bx.psu.edu/pipmaker/), and pairwise BLASTn searches (length >50 nt, evalue <0.01, and homology 65-100%), all at default settings. Figure 3.14 was plotted using Circos (Krzywinski et al., 2009).

Pulsed-field gel electrophoresis (PFGE) and Southern blot hybridization

C. velia cells (10^7 to 10^8) were slowly pelleted, embedded in low-melting agarose plugs and treated with 2% N-laurylsarcosine and 2 mg/mL proteinase K for 28 hrs at 56°C. Thick cell walls in C. velia prevented efficient penetration of cell membranes and DNA release, leaving between 10 and 30% of all cells undigested. The plugs were inserted into a 1% agarose gel and DNA was resolved on a CHEF-DR III PFGE system (Bio-Rad) in 0.5x TBE at 14ºC using two different settings: 1) U = 6 V/cm with 0.5-25 s pulses and 120º angle for 20 hours, and 2) U = 6 V/cm with 0.1-2 s pulses and 120º angle for 14 hours. After treatment with 0.25M HCl for 20 min, the gels were denatured, neutralized and blotted to a nylon membrane and UV cross-linked following standard protocols. DNA probes were labeled as described above. 
Southern blot analysis with psbA, tufA, petA, and atpA probes was performed in NaPi solution (0.5M Na2HPO4, pH 7.2, 1mM EDTA, 7% SDS, 1% BSA) at 65ºC overnight, and the membranes were washed for 20 min in 2x SSC, 0.1% SDS at RT and another 20 min in 0.2x SSC, 0.1% SDS at 65ºC and visualized on a Typhoon Imaging System (GE Healthcare).

Determining the replication origin and plastid gene copy number

The cumulative GC skew plot was drawn using the utility at http://gcat.davidson.edu/DGPB/gc_skew/gc_skew.html. Coverage depth by Illumina genomic reads was determined as above. The relative abundance of selected plastid and nuclear genes in C. velia was estimated as follows. A 519 bp-long fragment of the nuclear topoisomerase II gene (a typical single-copy nuclear gene), and 337 bp-long and 327 bp-long fragments of the chloroplast tufA and atpH2 genes, respectively, were cloned in tandem into a single plasmid, using unique restriction sites in the primers. The resulting construct was labeled p55+13. Separate serial dilutions of total DNA of C. velia digested with DraI and Sph1103I and the EcoRV-linearized p55+13 were spotted on a Biodyne membrane (Pall), cross-linked and blotted as described above. Each gene fragment (Topo II, tufA, atpH) was labeled by [α32P]ATP, hybridized at 65ºC to one of three identical blots, and visualized as above. The atpH2 probe was tested to hybridize efficiently to all three atpH paralogs (two of which are identical to the probe and the third is 93% identical). For this, genomic DNA digested with DraI and Sph1103I was resolved in a 0.75% agarose gel, blotted and hybridized as above, and signals from atpH1 and atpH2-specific fragments were compared.

Figure 3.1: An expression model for the split PsaA. PsaA fragments are separated at the genomic, transcriptomic, and protein level (top to bottom). The top graph shows the transcriptional profile (total RNA in red and polyA RNA in blue) of the genomic region below. 
In the genomic region, colored boxes represent genes and grey arrows show the coding DNA strand (see Figure 3.4 legend for gene color code). Consistent with the transcriptional profile, Northern hybridization blots (boxes below) reveal that both psaA fragments are transcribed within larger polycistronic transcripts (P) and further processed into dicistronic (D) and monocistronic (M) units. The monocistronic psaA transcripts contain oligoU tails and translate into independent peptides, both of which participate in PSI (numbers of thylakoid trans-membrane domains are indicated in vertical boxes).

Figure 3.2: Expression model for the split AtpB. (A) Northern analysis revealed a monocistronic transcript for each of the two atpB fragments. (B) Transcriptome coverage of the two genomic regions is shown for total RNA (red) and polyA RNA (blue). Genes are shown by boxes color-coded in agreement with the Figure 3.4 legend. The grey arrows indicate coding strands. Black arrows below indicate predicted transcripts based on the combination of transcript mapping and Northern analysis. (C) Alignment of plastid and yeast mitochondrial AtpBs reveals that the split of C. velia AtpB occurred within a 24 amino acid insertion that is also present in the sister taxon Vitrella (the most probable positions of the split are boxed). Non-homologous sequences in C. velia AtpB-1 and AtpB-2 are in grey.

Figure 3.3: The 2D electrophoresis of membrane proteins and their identification. Membrane proteins were solubilized by dodecyl-β-maltoside and separated in the first dimension by Clear-Native electrophoresis (Clear-Native-PAGE). After the separation, the gel was scanned by a LAS 4000 imager (Fuji, Japan) in chlorophyll fluorescence mode (Chl fl.) after excitation by blue LED light to distinguish the PSII and light harvesting complexes. 
The protein complexes resolved in the first dimension were further separated in the second dimension by denaturing gel (SDS-PAGE) and stained by Coomassie blue. Protein spots were cut from the gel and analyzed by LC-MS/MS as described in Methods. Identified spots are highlighted and positions of protein complexes separated by Clear-Native electrophoresis are marked as follows: PSII – Photosystem II, SC-PSII – Supercomplexes of photosystem II with antenna, SC-PSI – Supercomplexes of photosystem I with antenna.

Figure 3.4: Plastid total RNA and polyA RNA transcriptomic profiles. Coverage by polyA (blue) and total RNA (red) is shown for the whole plastid genome. Gene families are color-coded according to the legend. The highly expressed psbA gene is highlighted in bold. The split psaA and atpB genes are in red. The boundaries of the terminal inverted repeats are separated by dashed vertical lines. Note that the depth of coverage is plotted on Log scale.

Figure 3.5: Genes with an unexpected extension at the C-terminus. Blue line shows a Log-scaled depth of mapped transcripts from the total RNA fraction. Red portion of the line corresponds to the region of homology between C. velia and Porphyra purpurea. Gene lengths (thin horizontal lines) and regions of homology (red and orange blocks) in the two species are compared below (aa = amino acids).

Figure 3.6: Genes with an unexpected extension at the N-terminus. Blue line shows a Log-scaled depth of mapped transcripts from the total RNA fraction. Red portion of the line corresponds to the region of homology between C. velia and Porphyra purpurea. Gene lengths (thin horizontal lines) and regions of homology (red and orange blocks) in the two species are compared below (aa = amino acids).

Figure 3.7: Genes containing long insertions. Blue line shows a Log-scaled depth of mapped transcripts from the total RNA fraction. Red portion of the line corresponds to the region of homology between C. 
velia and Porphyra purpurea. Gene lengths (thin horizontal lines) and regions of homology (red and orange blocks) in the two species are compared below. Arrows show regions of significantly lower coverage in the total RNA fraction (aa = amino acids).

Figure 3.8: Genes present in paralogs: (A) clpC and (B) atpH. Blue line shows a Log-scaled depth of mapped transcripts from the total RNA fraction. Red portion of the line corresponds to the region of homology between C. velia and Porphyra purpurea. Gene lengths (thin horizontal lines) and regions of homology (red and orange blocks) in the two species are compared below.

Figure 3.9: Schematic of plastid chromosome ends. (A) Coverage depth (y-axis) by forward-mapping (yellow), reverse-mapping (purple) and total (orange) genomic reads is projected onto the terminal inverted repeat (TIR). Position of genes is shown by horizontal boxes at the bottom (K=trnK, P=trnP). Total coverage depth starts dropping gradually at about 800 bp from the chromosome end, and goes through a U-shaped minimum at about 150 bp from the end. The calculated mean size of the end-sequenced DNA fragments (horizontal black bar, red bits correspond to sequenced end pairs) and coverage by forward/reverse read pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end. A steep drop in total coverage depth occurs at the very end of the chromosome (B). Here, the TIRs on each of the chromosome ends (purple and green, respectively) diverge into a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 3' and 5'). The total depth of coverage by DNA reads (y-axis) steeply decreases in the 15 bp region. The 15 bp region is also enriched in high quality nucleotide discrepancies (sequence logos in bold). 
The last 130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired end spanning this region can be identified, suggesting that the genome cannot be genuinely circularized.

Figure 3.10: Amplification within and between plastid chromosome ends. (A) Overall primer layout on terminal inverted repeats (TIR) and at the neighboring part of the large single copy region (LSC). The dashed vertical line indicates the TIR boundary; numbers indicate positions. The use of primers in PCR experiments and positive versus negative amplification of the expected products are shown below. (B) The LSC region can be successfully connected to the ends of both TIRs. A strong amplification product was generated using primers B1-R1, which corresponds to the previously published contig of the C. velia plastid genome (indicated by asterisk). Nested PCR is required for successful amplification between the LSC and primers E2 and E3 at the very ends of the TIRs. (C) PCR amplification within TIR ends generates the expected products; however, the same primers and their nested combinations consistently fail to amplify the region in between the TIRs, suggesting that the TIRs are not directly connected to each other.

Figure 3.11: Pulsed-field gels and Southern hybridization. Pelleted cells of C. velia were digested in low-melting agarose plugs and subjected to pulsed-field gel electrophoresis for 20 hours (A, left side) and 14 hours (B, left side; see Methods for conditions). Separated DNA was transferred onto a membrane and hybridized to the psbA probe (A, B, right side). Linear DNA lengths in kb are indicated on the left. Monomeric linear (M) and dimeric (D) plastid genome forms are visible in the gel front, followed by the signal from the compression zone (C) and the well (W) with embedded agarose plugs. The monomeric linear form is sometimes observable directly in the pulsed-field gel (B).
Some content of this figure was modified from the Supplemental material in Janouškovec et al., 2010 (see main text for full reference).

Figure 3.12: Prediction of the origin of replication. (A) The global minimum in GC skew is located around position 68000 (first pair of vertical lines). (B) The orientation of gene clusters (red boxes) on the two coding strands (top and bottom of the thick horizontal line). Clusters are transcribed as indicated by black arrows. A major shift in coding polarity is found around position 86000 (second pair of vertical lines). (C) Coverage depth by Illumina DNA reads and GC content deviation peak in the region between the two pairs of vertical lines. A similar profile is retained even after correction for GC content. The data are consistent with the presence of one or more replication origins around or between the vertical lines and bi-directional replication from this region towards the chromosome ends.

Figure 3.13: Relative plastid gene copy number in C. velia. Dot blot hybridization of the genomic DNA digest (CHRDNA) and a synthetic construct (p55+13) with a nuclear (TopoII) and two plastid probes (TufA, AtpH). AtpH is present in three closely related paralogs, all of which hybridized efficiently to the probe (Materials and methods). The number of total gene copies relative to TopoII is indicated.

Figure 3.14: Large repeats in the C. velia plastid genome. The linear genome is drawn as an incomplete circle with genes in green, intergenic regions in grey, and two TIRs at chromosome ends in red (15 bp regions at the termini are not visible here). Links connect homologous regions of 50 bp or more identified by BLAST (blue) and additional matches identified by Pipmaker (orange). Homology between the TIRs is omitted for clarity. Names of genes most affected by duplication events are indicated.

Figure 3.15: Intrachromosomal recombination hotspot near the TIR boundary.
The plastid genome (thick black line at the bottom) is shown with terminal inverted repeats (TIR-A and TIR-B) folded onto themselves (left) and the large single copy region (LSC) to the right. Other parts of the genome with homology to this boundary region are shown by thick red lines; numbers indicate positions. Genes and pseudogenes are distinguished according to the legend. Shades of grey show the level of shared homology (legend).

Table 3.1: Summary of total RNA transcript mapping to the C. velia plastid genome.

Region                                          Features  Length [kb]  Coverage per bp  Relative coverage vs. gene median (a)
Intergenic regions (b)                          105       22.4         3444
Genes (b)                                       107       94.1         33990
  Ribosomal RNA                                 3         4.8          348584           475.3
  Transfer RNA                                  29        2.4          699              1.0
  Protein coding genes                          75        86.9         12033            16.4
    Photosystem II (psb)                        11        6.6          52244            71.2
    Photosystem I (psa)                         4         5.7          22740            31.0
    Cytochrome b6/f (pet)                       4         2.1          18301            25.0
    ATPase (atp)                                6         5.1          14762            20.1
    Ribosomal proteins LSU (rpl)                10        5.4          1141             1.6
    Ribosomal proteins SSU (rps)                12        16.8         1328             1.8
    Thylakoid import (sec, tat)                 3         4.5          817              1.1
    RNA polymerase (rpo)                        4         15.4         293              0.4
    Other function (acsF, ccsA, clpC, tufA, ycf3)  7      11.9         3617             4.9
    Unknown function (orf)                      14        13.5         1339             1.8
Highly expressed PC genes (c)
    psbA                                                  1.0          383384           522.8
    psaA-2                                                1.2          48594            66.3
    psbE                                                  0.3          41694            56.9
    atpH1                                                 0.2          41540            56.6
    psbD                                                  1.0          37785            51.5
    psbC                                                  1.4          32123            43.8
    petD                                                  0.5          27882            38.0
    psbB                                                  1.6          23102            31.5
    psbV                                                  0.5          20686            28.2
    psaC                                                  0.2          19422            26.5
    atpA                                                  1.7          17239            23.5
    petA                                                  0.9          16957            23.1
    atpB-1                                                0.5          15992            21.8
    psbH                                                  0.3          15653            21.3
    petB                                                  0.6          15341            20.9
    psaB                                                  3.4          14407            19.6
    tufA                                                  1.3          13235            18.0
    petG                                                  0.1          13026            17.8
    atpB-2                                                1.5          10950            14.9
    psbK                                                  0.1          9339             12.7
    psaA-1                                                0.8          8538             11.6
    psbJ                                                  0.1          8027             10.9

a Relative coverage compared to median gene coverage per bp (=733.4)
b Gene/intergenic regions in the terminal inverted repeats were counted once
c Protein coding

Table 3.2: Total RNA transcript mapping to all C. velia plastid genes.

Gene          Start position  Stop position  Length [bp]  Coverage per bp  Relative coverage vs. mean (a)  Relative coverage vs. median (b)
rrs           43760           45344          1585         570571           31.01                           778.01
rrl           45602           48659          3058         473653           25.75                           645.86
psbA          1846            2877           1032         383384           20.84                           522.77
psaA-2        17657           18844          1188         48594.2          2.64                            66.26
psbE          18959           19213          255          41694.2          2.27                            56.85
atpH(1)       63354           63596          243          41539.5          2.26                            56.64
psbD          30484           31515          1032         37785            2.05                            51.52
psbC          28736           30157          1422         32123.2          1.75                            43.80
petD          8834            9313           480          27881.6          1.52                            38.02
psbB          12173           13819          1647         23102            1.26                            31.50
psbV          39977           40459          483          20686.1          1.12                            28.21
psaC          110988          111230         243          19421.5          1.06                            26.48
atpA          114793          116490         1698         17239.2          0.94                            23.51
petA          7708            8625           918          16956.6          0.92                            23.12
atpB-1        60226           60654          429          15991.8          0.87                            21.81
psbH          114473          114727         255          15653.4          0.85                            21.34
petB          113664          114302         639          15340.9          0.83                            20.92
psaB          19421           22840          3420         14407.4          0.78                            19.65
tufA          51389           52678          1290         13235.4          0.72                            18.05
petG          112956          113066         111          13025.9          0.71                            17.76
atpB-2        4281            5732           1452         10949.6          0.60                            14.93
trnV(UAC)     40690           40762          73           10738.4          0.58                            14.64
psbK          40508           40633          126          9339.42          0.51                            12.73
psaA-1        11240           12052          813          8537.91          0.46                            11.64
psbJ          50149           50268          120          8026.92          0.44                            10.95
rps14         80283           80582          300          6548.63          0.36                            8.93
trnL(UAA)     84923           85159          237          4725.93          0.26                            6.44
orf389        111385          112554         1170         4713.61          0.26                            6.43
ycf3          41986           42516          531          3980.41          0.22                            5.43
orf201        24251           24856          606          2975.93          0.16                            4.06
rpl3          50703           51284          582          2831.71          0.15                            3.86
atpI          79493           80215          723          2657.54          0.14                            3.62
acsF          16176           17273          1098         2543.46          0.14                            3.47
psbT          110786          110905         120          2538.89          0.14                            3.46
orf391        14992           16167          1176         2514.07          0.14                            3.43
rps13         53192           53539          348          2476.84          0.13                            3.38
clpC(2)       25327           28599          3273         2335.43          0.13                            3.18
rpl4          38702           39856          1155         2314.72          0.13                            3.16
rps12         65612           65989          378          2206.41          0.12                            3.01
rpl11         78949           79365          417          1805.22          0.10                            2.46
trnP(UGG)     41851           41924          74           1674.45          0.09                            2.28
orf135        52754           53161          408          1632.26          0.09                            2.23
orf137        59424           59837          414          1605.29          0.09                            2.19
ccsA          41255           41704          450          1577.26          0.09                            2.15
tatC          6659            7471           813          1564             0.09                            2.13
orf157        14465           14938          474          1558.15          0.08                            2.12
rrf           48717           48844          128          1528.7           0.08                            2.08
clpC(1)       23496           24179          684          1528.27          0.08                            2.08
rpl31         67157           67375          219          1238.36          0.07                            1.69
orf115        66054           66401          348          991.839          0.05                            1.35
rpl6          105301          105954         654          902.631          0.05                            1.23
rps18         69518           70288          771          891.035          0.05                            1.21
rps19         34747           35037          291          826.694          0.04                            1.13
rpl5          103469          104017         549          733.372          0.04                            1.00
secY          105987          107159         1173         731.13           0.04                            1.00
orf1173       35112           38633          3522         727.761          0.04                            0.99
rpl2          33758           34630          873          683.905          0.04                            0.93
rpoA          66407           67123          717          645.601          0.04                            0.88
rps7          64221           65510          1290         625.077          0.03                            0.85
orf634        31775           33679          1905         624.782          0.03                            0.85
orf325        108705          109682         978          617.262          0.03                            0.84
rps8          104077          105225         1149         575.618          0.03                            0.78
rps4          86041           92037          5997         562.399          0.03                            0.77
orf147        109844          110287         444          438.73           0.02                            0.60
rpl16         101726          102175         450          423.053          0.02                            0.58
trnI(GAU)     41132           41204          73           421.808          0.02                            0.58
rpl14         102967          103335         369          420.561          0.02                            0.57
rps3          98551           101625         3075         359.273          0.02                            0.49
psbN          60818           60943          126          356.103          0.02                            0.49
rps11         67467           69158          1692         354.331          0.02                            0.48
trnR(ACG)     110607          110680         74           320.122          0.02                            0.44
rpoB          74641           78906          4266         309.865          0.02                            0.42
rps17         102264          102887         624          289.325          0.02                            0.39
trnG(UCC)     10528           10598          71           275.056          0.01                            0.38
orf175        13937           14464          528          267.165          0.01                            0.36
trnA(UGC)     84747           84819          73           230.384          0.01                            0.31
rps2          61381           62256          876          218.469          0.01                            0.30
trnN(GUU)     62845           62917          73           213.808          0.01                            0.29
trnK(UUU)(2)  63164           63237          74           198.162          0.01                            0.27
atpH(2)       3162            3671           510          192.727          0.01                            0.26
trnP(AGG)     1737            1810           74           191.838          0.01                            0.26
trnK(UUU)(1)  3770            3843           74           171.52           0.01                            0.23
trnK(UUU)(3)  63675           63748          74           158.446          0.01                            0.22
secA          81078           83585          2508         155.703          0.01                            0.21
clpC(3)       54824           59410          4587         120.648          0.01                            0.16
rpoC1         70458           74471          4014         117.601          0.01                            0.16
trnS(UGA)     62754           62839          86           104.779          0.01                            0.14
trnQ(UUG)     107325          107397         73           104.644          0.01                            0.14
trnL(UAG)     42813           42898          86           100.663          0.01                            0.14
rpoC2         92125           98541          6417         97.4871          0.01                            0.13
trnT(UGU)     41015           41087          73           86.0274          0.00                            0.12
trnC(GCA)     107405          107475         71           79.1831          0.00                            0.11
trnF(GAA)     107220          107292         73           71.863           0.00                            0.10
trnM(CAU)     107970          108052         83           69               0.00                            0.09
rpl36         50473           50586          114          60.0702          0.00                            0.08
orf230        5822            6514           693          59.1717          0.00                            0.08
trnI(CAU)     107611          107684         74           53.3108          0.00                            0.07
trnE(UUC)     42905           42976          72           48.3472          0.00                            0.07
trnfM(CAU)    45465           45539          75           47.5067          0.00                            0.06
trnS(GCU)     107685          107769         85           46.8118          0.00                            0.06
trnW(UCA)     84823           84896          74           40.2973          0.00                            0.05
trnH(GUG)     107508          107580         73           30.7534          0.00                            0.04
trnR(UCU)     83967           84039          73           26.5342          0.00                            0.04
trnY(GUA)     40925           41007          83           23.7952          0.00                            0.03
trnL(CAA)     107859          107941         83           20.3253          0.00                            0.03
orf264        889             1683           795          14.5698          0.00                            0.02
trnD(GUC)     107785          107857         73           9.73973          0.00                            0.01

a Relative coverage compared to mean gene coverage per bp (=18397.1)
b Relative coverage compared to median gene coverage per bp (=733.4)

Chapter 4: Global analysis of plastid diversity reveals new apicomplexan-related lineages in coral reefs

Introduction

The presence of relict non-photosynthetic plastids in obligate intracellular apicomplexan parasites (e.g. Plasmodium; McFadden et al., 1996) has proved puzzling in many ways. The apicomplexan plastid genome is highly reduced and encodes a fast-evolving gene complement that prevents an accurate understanding of its origin, relationship to other plastids, and time of acquisition (Wilson et al., 1996). Despite the lack of evidence, a number of mutually contradictory hypotheses have been proposed that relate the apicomplexan plastid to that of green algae, red algae or dinoflagellates (Blanchard and Hicks, 1999; Cavalier-Smith, 1999; Fast et al., 2001; Funes et al., 2002; Köhler et al., 1997; Waller et al., 2003; Williamson et al., 1994).
The recent discovery of a photosynthetic relative of apicomplexans, Chromera velia, has put an end to these speculations (Moore et al., 2008) and provided strong evidence for direct descent of the apicomplexan and dinoflagellate plastids from a common photosynthetic ancestor (Janouškovec et al., 2010). Further evidence from C. velia helped to shed light on the evolutionary acquisition of heme and galactolipid biosynthesis in apicomplexans (Botté et al., 2011; Kořený et al., 2011) and suggested significant potential for addressing the origins of apicomplexan parasitism. The intense interest that C. velia has generated, and the concomitant description of its distant photosynthetic relative, Vitrella brassicaformis (formerly known as CCMP3155; Oborník et al., 2012), readily demonstrate how surprisingly little we know about plastid-containing relatives of apicomplexans as a whole.

A simple way to address the existence of free-living relatives of apicomplexans is through searching environmental sequence data. Databases such as the SILVA ribosomal RNA database (http://www.arb-silva.de/) contain hundreds of thousands of sequence entries from diverse ecosystems and sampling locations. The nuclear 18S ribosomal RNA (rRNA) gene has become the molecule of choice for conducting environmental sequence surveys in eukaryotes. However, the 18S rRNA gene does not provide direct evidence for plastid presence, which can only be estimated indirectly from tree topologies, and hence is not useful for alveolates, where the distribution of plastids appears to be patchy (Janouškovec et al., 2010). Moreover, the environmental 18S rRNA gene evidence for Chromera and Vitrella is severely limited (J. Janouškovec, unpublished data, and see below). An interesting alternative to the traditional 18S rRNA gene inquiry in algae is searching for environmental evidence of plastid genes.
With rare exceptions, all known plastids encode a genome that significantly exceeds the nuclear DNA in copy number (Day and Madesis, 2007), suggesting that plastid DNA may be relatively abundant in nature. Accordingly, plastid DNA is readily detectable in metagenomic studies and can be employed in understanding the distribution of poorly known algal groups, such as the phytoplankton of the open ocean (Worden et al., 2012). Although the significance of metagenomes for studying biodiversity is still limited (largely due to short read length, high sample complexity, and the absence of reference genomes), the plastid 16S rRNA gene can well substitute for this function: it is substantially long, variable, and contains sites for universal primer design. Unfortunately, very few 16S rRNA gene surveys of plastids have been published (Lepère et al., 2009; Shi et al., 2011), and although they pointed to the existence of new lineages among prasinophyte green algae, the limited sequence data they produced cannot encompass the diversity of algae as a whole. Despite the low number of studies specifically focusing on plastids, it has been noted that plastid 16S rRNA genes infiltrate environmental studies on cyanobacteria and other bacteria. The number of plastid contaminants can be so pronounced that it has led researchers to develop specific tools for separating them from other bacterial sequences (e.g., the RDP pipeline at http://pyro.cme.msu.edu). However, the extent of this contamination is generally unknown and difficult to determine retroactively because plastid sequences are generally discarded, mistaken for bacterial sequences, or left unanalyzed (with notable exceptions; Barott et al., 2011). This overlooked pool of sequences provides an unprecedented opportunity to examine the existence of new plastid lineages and general plastid diversity.

Here we investigate global plastid diversity and distribution by comprehensively searching existing prokaryotic sequence surveys for eukaryotic plastids.
From more than 1.6 million bacterial sequences, we identified 9,799 plastid-derived sequences, most of which were previously mis-labeled as ‘novel bacteria’ sequences. Of these plastid-derived sequences, 98.8% could be assigned to well-defined algal lineages, most often green algae, diatoms, and haptophytes. The only abundant exceptions were sequences related to apicomplexan parasites, nearly all of which were exclusive to coral reef environments. Close relatives of C. velia were rare and, like the much more abundant Vitrella, were not directly associated with coral tissue or surface, contrary to common assumption. However, the most abundant and newly discovered plastid lineage, ARL-V, was found in a tight and stable co-occurrence with corals comprising at least 20 symbiont-hosting species distributed worldwide.

Results and discussion

Analysis of plastid diversity identifies new apicomplexan-related lineages

Comparing the recently completed plastid genomes of two deep-branching photosynthetic relatives of apicomplexans (Janouškovec et al., 2010) to public databases revealed that these lineages were present in microbial surveys, but were mistakenly identified as novel bacteria (Sørensen et al., 2007). To examine simultaneously the extent to which plastids infiltrate bacterial surveys and the diversity and ecological distribution of plastid-bearing apicomplexan relatives, we developed a protocol for characterizing plastid sequences from environmental data (Materials and methods). Over 1.6 million environmental sequences were pre-filtered, resulting in 42,989 candidate plastid sequences, which were individually aligned to a backbone dataset comprising representative plastids and cyanobacteria (Figure 4.1A). Alignments were analysed using maximum likelihood (ML) FastTree phylogenies and automated tree sorting, which identified approximately 8,500 plastids, 30,000 bacterial contaminants, and 2,446 ambiguously placed sequences.
The latter sequences were analyzed in a second round of phylogenies using an updated dataset and a more sensitive algorithm as implemented in RAxML, followed by automated tree sorting and comprehensive individual analysis of the remaining sequences (Materials and methods). A total of 9,799 sequences were identified that unambiguously originated from plastids, with 98.8% branching within well-defined algal lineages, predominantly within green algae, diatoms, and haptophytes (Figure 4.2). Most plastid sequences were mis-identified as novel bacterial lineages, but some ‘novel plastid lineages’ were also mis-identified; for example, the prasinophyte clade 16S-IX (Shi et al., 2011) branched with stramenopiles, and not with green algae (data not shown). Many other sequences added to the diversity of existing uncultured clades, for example the prasinophyte clade 16S-VIII (Lepère et al., 2009), deep-branching haptophytes, and rappemonads (Kim et al., 2011). A representative of a novel plastid lineage, clone S25_1200, was also identified, which branched within haptophytes in our analysis (Figure 4.2). Overall, sequences derived from numerous independent studies using different amplification primers successfully captured representatives of all known major plastid lineages, except glaucophytes, which are not abundant even in the best-sampled environments, and dinoflagellates, which are notoriously divergent in sequence (Zhang et al., 2000) and do not amplify with universal primers. We therefore conclude that the systematic exclusion of new algal lineages is possible, but that this exclusion is unlikely to be widespread, and that the recovered environmental sequences generally represent existing algal diversity. The only exceptions that were not readily identifiable were 121 plastid sequences that did not fall within a major algal lineage, and all of these sequences branched as sisters to apicomplexans.
Inclusion of short reads that were not retrieved in automated searches later extended this number to 1,316 sequences comprising eight distinct apicomplexan-related lineages (ARLs; Figure 4.2), six of which were composed entirely of unidentified sequences. The relationship between ARLs and apicomplexans was further confirmed by testing the robustness of the Figure 4.2 tree: increasing plastid and bacterial sampling, excluding short and quickly evolving sequences, and adding divergent dinoflagellate plastid sequences, including those of Symbiodinium (Figure 4.3). None of these dataset modifications changed the overall tree topology of Figure 4.2, confirming that none of the ARLs, despite being common in coral reef environments, represent dinoflagellates or the well-known zooxanthellae coral symbionts. Only three ARLs were phylogenetically diverse, and interestingly, two of these lineages include the only known photosynthetic plastids: Chromera (Moore et al., 2008) falls in ARL-III (35 sequences) and the recently described Vitrella (Oborník et al., 2012) falls in ARL-I (148 sequences). However, the most abundant clade of all was ARL-V, which is composed entirely of unidentified sequences that were, like Vitrella plastids, previously mis-identified as novel bacteria related to cyanobacteria (Klaus et al., 2011; Sørensen et al., 2007).

ARLs are widespread and abundant on coral reefs

Tracking the environmental origin of ARL sequences revealed a worldwide presence and a strong association with off-shore tropical and warm subtropical waters closely corresponding to coral reef habitats (Figure 4.4; Tables 4.1 and 4.2). Six of the eight ARL clades (ARL-I, II, III, IV, V and VIII) were identified in microbial sequence data directly corresponding to coral surface, reef sediment and macroalgal surface (Table 4.1).
Sequences that represented Chromera velia were mostly derived from reef macroalgae and were comparatively rare (35 sequences), but highlighted the unexplored diversity within this lineage (Figure 4.2). Vitrella-related ARL-I sequences were significantly more abundant (148 sequences) and were not strictly associated with coral reefs, but were also found in calcifying thrombolite mats (Myshrall et al., 2010). Searching eukaryotic environmental data for sequences related to Vitrella nuclear rRNA confirmed its association with coral, reef sediment, and thrombolites, but also extended its range to a stromatolite mat and a coastal calcium carbonate sink (Table 4.1). Intriguingly, all these environments are united by active calcium carbonate precipitation, suggesting that the specific distribution of Vitrella might relate to this process.

The largest clade, ARL-V, comprised 1,124 environmental sequences, all of which were derived from the coral reef environment. A more detailed probe into the primary sequence data and associated studies revealed that ARL-V is strictly and exclusively associated with tissue and surfaces of corals, including at least 20 coral species (Table 4.2). The specific distribution pattern points to a possible symbiotic relationship between ARL-V and corals, raising the question of whether ARL-V is a phototroph or a parasite. ARL-V clearly does not correspond to the Symbiodinium plastid (Figure 4.3C), and although it might represent the plastid of Genotype N, a putative apicomplexan parasite of coral (Toller et al., 2002), the branching pattern in phylogenies is more consistent with ARL-VIII corresponding to Genotype N. This suggests that ARL-V may represent a previously undetected eukaryote. More so than its evolutionary novelty, however, the exclusive association between ARL-V and the keystone species of the reef ecosystem points to an unprecedented potential, calling for more evidence about its fine-scale distribution in the coral environment.
Distribution of ARL-V in the coral reef ecosystem

The primary outstanding question regarding ARL-V is whether it is specifically associated with corals themselves, or more generally with reef environments. To address this, we re-analysed plastid sequences from a previous fine-scale 16S ribosomal RNA survey in Curacao that compared two key microbial habitats on the reef and their transition zone: coral surface and macroalgal surface (Barott et al., 2011, 2012). Four five-zone transects were sampled (each in five pooled replicates) spanning tissue and surface areas of the reef-building coral, Montastraea annularis (Zone 1), and one of the four reef-dominant macroalgae tightly associated with the coral (Zone 5), through their common interface (contact zone, Zone 3; see Supplementary Material and methods for details). Relative sequence occurrences for ARL-I, ARL-III, ARL-V and five microalgal groups (dinoflagellates were absent) were calculated for all zones (Table 4.3), and their median values across all transects were plotted for each zone to display the general trends in plastid abundance between corals and dominant reef macroalgae as a whole (Figure 4.5A). In general, plastid sequences were scarce or altogether absent in coral tissue and surface material, and significantly more abundant on the macroalgae associated with the coral. Interestingly, this distribution was true of all the common microalgal groups investigated (for example, diatoms, haptophytes and pelagophytes), but was also found for ARL-I and ARL-III (Vitrella and Chromera). This suggests, contrary to common belief, that Vitrella and Chromera may not be obligate coral symbionts, and possibly interact only indirectly with coral. In contrast, ARL-V was confined to coral tissue and surface, and was completely absent from the macroalgae, a distribution entirely opposite to that of all representative algae (Figure 4.5A).
A reference sample of Montastraea tissue/surface taken from a remote site on Curacao confirmed the Zone 1 pattern: ARL-V was the only plastid abundantly recovered (Table 4.3). The apparent association between ARL-V and Montastraea prompted us to ask whether a similar pattern can be found in other microbial surveys. In three studies that concurrently sampled corals and surrounding water (Chen et al., 2011; Reis et al., 2009; Sunagawa et al., 2010), ARL-V continued to be strictly associated with coral tissue and surface: ARL-V sequences were absent in reef water, but they were found in all corals sampled. Similarly, three surveys of coral reef sediment that identified sequences from other plastids, including ARL-I (Vitrella), did not contain ARL-V (Gao et al., 2011; Garren et al., 2009; Sørensen et al., 2007). Lastly, in at least one coral species examined (Isopora palifera in Taiwan) we detected ARL-V sequences on 6 out of 7 occasions during 1 year (February and April to August; not detected in November; Chen et al., 2011), suggesting a prolonged association. Interestingly, the 20 coral species associated with ARL-V do not form a monophyletic group, but comprise three distinct clades: the scleractinian ‘robust’ and ‘complex’ clades, and octocorals (Table 4.2). To investigate whether ARL-V abundance varies among coral species, we identified all ARL-V sequences from a comparison of seven Caribbean coral species (Sunagawa et al., 2010) and found that ARL-V was ubiquitously present, but 1–2 orders of magnitude more abundant in the octocoral Gorgonia ventalina than in scleractinians (Figure 4.5B). This differential abundance was consistent in 454 and Sanger surveys, suggesting some host preference. Intriguingly, all ARL-V sequences are derived from coral species able to form a symbiotic relationship with Symbiodinium (Table 4.2).
We found no evidence for ARL-V presence in Symbiodinium-lacking corals based on the GenBank Nucleotide database or five additional microbial surveys where we were able to access the primary data. The ability to form algal symbiosis is widespread but species-specific among cnidarians, and may involve several types of algae (Venn et al., 2008), suggesting that the repeated co-occurrence of ARL-V and symbiotic corals is significant. These observations all lead to questions about the functional interactions between ARL-V and corals, the most obvious being whether ARL-V is photosynthetic or non-photosynthetic, and mutualistic or parasitic. These hypotheses are difficult to distinguish without direct observations of ARL-V biology; however, two additional distribution patterns of ARL-V are noteworthy. Firstly, among several sequence surveys comparing healthy and diseased coral tissues (Barneah et al., 2007; Reis et al., 2009; Sunagawa et al., 2010), we recovered ARL-V almost exclusively from healthy tissue or surface, a pattern unexpected for a coral parasite (but also different from Symbiodinium; Correa et al., 2009). The relative occurrence of ARL-V sequences in healthy, interface and black band disease-affected zones of Favites sp. from the Red Sea (Barneah et al., 2007) was 65% (20/31), 3% (1/35) and 0% (0/65) of the total sequences, respectively. We hypothesize that the widespread and exclusive presence of ARL-V in healthy corals is more consistent with symbiosis based on mutualism, commensalism or opportunism than with pathogenicity. Secondly, we found that the distribution of ARL-V was strongly depth-dependent in sequence data previously generated from Montastraea annularis in the Caribbean (Klaus et al., 2007). ARL-V was consistently dominant in corals at 5 m depth (notably comprising more than half of all sequences), less abundant at 10 m, and absent at 20 m (Figure 4.5C).
While the reasons for such a pronounced stratification within the photic zone remain unclear, we hypothesize it may be indicative of a photosynthetic lifestyle.

Conclusions

Altogether, the phylogenetic and distribution analyses presented here lead to several conclusions. On a practical level, plastid sequences significantly infiltrate environmental sequence surveys of bacteria, and recognizing this should prevent further mis-identification of eukaryotic organelles as novel bacterial phyla. In estimating plastid diversity, our analysis suggests that most biologically abundant algal lineages in commonly sampled environments have likely been discovered and that some of the most understudied hot-spots correspond to relatives of haptophytes and prasinophytes, including a novel plastid lineage, S25_1200. The largest pool of unexplored algal diversity, however, corresponds to relatives of apicomplexan parasites, some of which are widespread, abundant and diversified. The majority of the new apicomplexan-related plastid lineages identified here predominantly associate with coral reef environments. The two known photosynthetic groups, represented by Chromera and Vitrella, are both more diverse and abundant than previously thought, but the often-overlooked Vitrella emerges as the more abundant, widespread, and diverse. The newly recognized ARL-V clade is the most abundant of all apicomplexan-related plastid clades, and its intermediate position between parasitic apicomplexans and the related photosynthetic species suggests that, more so than the cultured phototrophs, it may be critical for elucidating the origin and evolution of parasitism in apicomplexans. A considerable body of evidence now also suggests that ARL-V is closely and exclusively associated with multiple species of corals through time and space.
Whether ARL-V is an algal symbiont or a parasite remains unknown, but we hypothesize that it is photosynthetic based on its fine-scale distribution, suggesting that it has simply been overlooked, much as Chromera and Vitrella were; testing this will require direct observation. In either event, the interactions between ARL-V and the coral holobiont are potentially highly significant to reef ecology, and demand further investigation. The next step should be to identify and describe this organism from its natural environment, and the environmental sequence distribution data provided here should focus this effort on the most abundant sources of ARL-V in nature.

Materials and methods

Protocol for automatic determination and sorting of plastid 16S rRNA sequences

1) Sequence retrieval and pre-filtering: 16S ribosomal RNA sequences were retrieved from the latest release of the SILVA rRNA database (release 108, September 2011, http://www.arb-silva.de/documentation/background/release-108/). The original set contained 2,492,653 chimaera pre-screened entries. To increase the reliability of the later phylogenetic estimates, we first filtered out all sequences shorter than 600 bp, resulting in more than 1.6 million (1,614,666) sequences used further.

2) Clustering and plastid sequence selection: We next used a hierarchical clustering algorithm as implemented in Usearch 5.1 (Edgar, 2010) to sort sequences into clusters using a 0.99-0.95-0.90-0.85 similarity scale. The resulting clusters were screened for the presence of plastid sequences using 4-15 queries from each of the seven well-defined plastid lineages: glaucophytes, rhodophytes, viridiplantae, haptophytes, cryptomonads, stramenopiles, and alveolates (excluding dinoflagellates, see below), and all sequences from plastid-containing clusters were included in the final set.
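As an illustration only, the length filter of step 1 and the cluster selection of step 2 can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used in this study: the function names, the dictionary-based cluster representation, and the toy sequences are hypothetical, and the clustering itself (performed here with Usearch) and the query searches are taken as given inputs.

```python
from typing import Dict, List, Set

MIN_LENGTH = 600  # bp; sequences shorter than this are removed in step 1


def filter_by_length(seqs: Dict[str, str], min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Keep only sequences that are at least `min_length` nucleotides long."""
    return {seq_id: seq for seq_id, seq in seqs.items() if len(seq) >= min_length}


def select_plastid_clusters(clusters: Dict[str, List[str]], plastid_hits: Set[str]) -> List[str]:
    """Return all members of every cluster that contains at least one plastid hit.

    `clusters` maps a (hypothetical) cluster ID to its member sequence IDs;
    `plastid_hits` holds IDs of sequences matched by the plastid queries.
    """
    selected: List[str] = []
    for members in clusters.values():
        if plastid_hits.intersection(members):
            selected.extend(members)
    return selected


# Toy data, invented for the example.
toy_seqs = {"s1": "A" * 650, "s2": "C" * 599, "s3": "G" * 600, "s4": "T" * 900}
kept = filter_by_length(toy_seqs)              # "s2" is dropped (599 bp)
toy_clusters = {"c1": ["s1", "s3"], "c2": ["s4"]}
selected = select_plastid_clusters(toy_clusters, plastid_hits={"s3"})
print(sorted(kept), selected)
```

Keeping whole clusters, rather than only the sequences matched by a query, is what allows divergent members of a plastid-containing cluster to pass into the final set.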
To avoid a potential loss of sequences from novel plastid lineages during this selection process, the clustering scale criteria (above) were optimized to be loose enough to recover the most divergent plastid sequences known (i.e., alveolates) during cluster selection. Given the nature of this approach, we cannot exclude that a new group of extremely fast-evolving plastid sequences was overlooked at this step; however, this is rather unlikely considering that the branches of alveolate plastids are much longer than those of all other plastids (see Figure 4.1). The only obvious exception are the even faster-evolving plastid sequences of dinoflagellates. However, we did not detect any environmental evidence for dinoflagellate plastids using similarity searches in the NCBI sequence database, and their extremely biased sequences were therefore not included in our protocol. Altogether, we consider this methodological step the best approximation to analysing all 1.6 million sequences using individual phylogenies, an approach far beyond the computing capacity available to us. In summary, this cluster selection procedure helped us remove a significant amount of contaminants from distant bacterial phyla from the final set, while retaining 42989 plastid, cyanobacterial, and bacterial sequences, which were further sorted individually using maximum likelihood (ML) phylogenies.

3) ML tree reconstruction: For phylogenetic sorting we first created a 'FastTree backbone' dataset (Figure 4.1A) containing representatives of all major plastid lineages listed above, cyanobacteria and other bacteria. We then used MUSCLE (Edgar, 2004) to create 42989 alignments by adding each of the filtered sequences to the backbone dataset individually, and removed gaps and ambiguous regions from these alignments using Gblocks (Castresana, 2000).
We performed ML phylogenetic analysis of each alignment using FastTree 2.1 (Price et al., 2010) under the gamma-corrected GTR model of evolution.

4) Tree sorting: The resulting trees were automatically sorted using Phylosort 1.3 (Moustafa and Bhattacharya, 2008), and fell into three general categories. The first category contained approximately 30000 sequences branching within bacteria or cyanobacteria, which were excluded from further analysis. The second category consisted of approximately 8500 sequences branching inside one of the seven plastid clades, which we sorted and counted. The third category contained the remaining 2448 sequences, which could not be reliably assigned to any existing group using this approach.

5) RAxML analysis and sorting of ambiguously placed (third category) sequences: As an alternative approach to FastTree we analysed 8000 of the 42989 candidate sequences using a more accurate but more computationally demanding ML algorithm as implemented in RAxML 7.2.8, using 20 random starts per tree (Stamatakis, 2006). Both approaches yielded highly consistent results in the first two categories, but RAxML topologies were found to be more accurate than FastTree topologies in the third category (FastTree tended to place long-branching sequences at the base of all plastids). Therefore, we re-analysed all sequences from the third category with the RAxML algorithm. To facilitate this, we slightly expanded the 'FastTree backbone' dataset by adding representatives of cyanidiales and the new ARL-V clade (the identity of which we had noticed by then), creating the 'RAxML backbone' dataset (Figure 4.1B) with a slightly more accurate backbone topology (compare Figures 4.1A and 4.1B). This enabled us to classify all but 266 sequences in the third category into existing bacterial and plastid bins (including ARL-V and Vitrella bins) and produce total sequence counts for each group.
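The length pre-filter in step 1 of the protocol above can be illustrated with a minimal Python sketch. It is not the script used in this study: the function names (read_fasta, filter_by_length) are hypothetical, FASTA input is assumed, and counting length after removing alignment gap characters is an assumption made here for sequences exported in aligned form.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 600  # bp threshold stated in step 1 of the protocol


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_length: int = MIN_LENGTH) -> List[Tuple[str, str]]:
    """Keep records with at least min_length residues.

    Assumption: gap characters ('-' and '.') do not count towards length.
    """
    kept = []
    for header, seq in records:
        ungapped = seq.replace("-", "").replace(".", "")
        if len(ungapped) >= min_length:
            kept.append((header, seq))
    return kept
```

A call such as filter_by_length(read_fasta(fasta_text)) returns only the records that would pass on to the clustering step; sequences of exactly 600 bp are retained, since only shorter sequences were filtered out.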
The identity of sorted trees was verified in all bins by visual inspection of hundreds of final tree topologies (approximately 1/10 of all trees were viewed).

Identifying new plastid lineages and their diversity

Our protocol produced 266 sequences whose identity could not be determined automatically; these were evaluated manually using extended similarity searches and phylogenies. Most of these sequences appeared highly divergent, and included unreliable sequence reads, putative chimeras, candidate fast-evolving bacteria, and candidate fast-evolving green algae, many of which were loosely affiliated with ulvophytes, for which only sparse 16S rRNA gene sampling is currently available. After excluding these sequences, we found a single strong candidate for a distinctive plastid lineage: the clone S25_1200. This sequence was distantly related to haptophytes and rappemonads and did not appear to be a chimeric sequence. Our protocol also separated 121 sequences related to apicomplexans, most of which belonged to a new apicomplexan-related lineage, ARL-V. This number was further extended by similarity searches and phylogenies, with emphasis on shorter entries that would not be detected by our automatic protocol (see below), to a total of 1316 ARL sequences. We identified a total of 9 chimeric sequences among these ARL sequences; in all cases, the contaminant sequence parts, including all ambiguous nucleotides, were trimmed off and the remaining part was confirmed to be apicomplexan-related. Details about sequence generation (sampling locations and conditions, associated organisms, and study) are presented in Tables 4.1 and 4.2.

Datasets and phylogenies

The ‘standard’ dataset used in Figure 4.2 and Figure 4.3A was created by adding a representative sampling of all apicomplexan-related plastid lineages and the S25_1200 sequence related to haptophytes and rappemonads to the RAxML backbone dataset.
The sequences were aligned using the local-pair algorithm in MAFFT 6.857b (Katoh and Toh, 2008), and the alignment was trimmed with Gblocks 0.91b using b1=50%+1, b2=50%+1, b3=12, b4=4, b5=h parameters. Phylogenetic inferences were done in RAxML 7.2.8 using -m GTRGAMMA -f a -# 1000 parameters, PhyML 3.0.1 (Guindon et al., 2010) using -m GTR -f e -v e -c 8 -a e -b -4 -s BEST --n_rand_starts 10 parameters, and MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001) using lset nst=6 rates=invgamma ngammacat=4, and mcmcp ngen=10000000 samplefreq=500 and temp=0.10 parameters (the 50% consensus tree was constructed from 13050 trees in the posterior probability distribution, with chain split frequencies of 0.05-0.017). The final tree was drawn with the help of FigTree. The robustness of the Figure 4.2 tree topology was then tested through several independent modifications of its dataset. To evaluate the branching topology under extended sequence sampling we included an additional 56 plastid and bacterial sequences (Figure 4.3B). To demonstrate that none of the new apicomplexan-related plastid sequences belongs to a dinoflagellate we included the extremely fast-evolving sequences of dinoflagellate plastids (Figure 4.3C). To test the relationships of apicomplexan-related plastids we individually excluded the ‘coral surface’ sequence (360 bp) branching as a sister to the Vitrella clade (Figure 4.3D), the long-branching sequence clade of hematozoans (Figure 4.3E), and all sequences shorter than 1200 bp together with hematozoans (Figure 4.3F).
Sequence presence in all datasets, detailed sequence descriptions and all sequence accessions are available in the supplementary material in Janouškovec et al., 2012 with the following exceptions: V6 tags of ARL-V from a survey of seven coral species in the Caribbean (Sunagawa et al., 2010) are available at http://vamps.mbl.edu; V6 tags of ARL-V from Isopora palifera and the ARL-V sequence from Stylophora pistillata in Taiwan (Chen et al., 2011) are available at

Coral-macroalgal transects (Figure 4.5A, Table 4.3)

Microbial surveys by Barott et al., 2011 and Barott et al., 2012 generated 16S ribosomal RNA sequence data using general bacterial primers, and analysed the majority of bacterial and non-ARL plastid sequences. Here we focused on the previously overlooked presence of ARLs in these data and compared their distribution to that of the other, already identified plastid lineages. The two studies sampled a coral reef on the southern coast of Curacao, Caribbean. Four transects were designed to span between multiple specimens of Montastraea annularis and the four following algae: Halimeda opuntia, Dictyota bartayresiana, crustose coralline algae, and turf algae. Within each transect, five different zones were sampled: coral tissue/surface from the centre of the colony, 5-10 cm away from the coral-algal interface, CAI (Zone 1); coral tissue/surface immediately adjacent to the CAI (Zone 2); the CAI itself, the contact zone between coral and algae (Zone 3); algal tissue/surface immediately adjacent to the CAI (Zone 4); and algal tissue/surface 5-10 cm away from the CAI (Zone 5). Zones 2 and 4 were principally similar to Zones 1 and 5, respectively; however, their proximity to the point of coral-algal contact (Zone 3) suggested they might differ in their microbial composition. All 20 samples (4 transects x 5 zones) were obtained in 5 replicates. All replicates were independently PCR-amplified and pooled before sequencing.
Plastid sequences in the resulting sequence pools (containing mostly bacteria) were identified using BLASTn searches and phylogenies, as described in Barott et al. (2011). In this study, we concentrated on five groups of unicellular microalgae (diatoms, pelagophytes, haptophytes, unicellular red and green algae), in addition to the newly identified ARLs (ARL-I, ARL-III, ARL-V). We excluded the plastid sequences of the following macroalgae, representatives of which were an inherent part of the transects: phaeophytes (Dictyota), florideophytes (crustose coralline algae), and ulvophytes (Halimeda, turf algae). Absolute and relative occurrences of sequences from the selected plastid lineages in all zones are summarized in Table 4.3. To display general trends in plastid abundance between corals and dominant reef macroalgae as a whole, median relative occurrences across all four transects were plotted for all five zones and eight microalgal groups (Figure 4.5A). Medians rather than means were used to plot these trends in order to mitigate the effect of several outliers in the data (none of the outliers belonged to ARL-V). However, the alternative use of means provided a picture consistent with the use of medians. A reference sample of M. annularis tissue/surface equivalent to Zone 1, marked ‘WP (ref.)’ in Table 4.3, was taken from the north-western tip of Curacao (approx. 37 km aerial distance from the first site) using the same sampling method (Barott et al., 2011).

Seasonal abundance of ARL-V and associations with different coral species

Chen et al. (2011) sampled Isopora palifera colonies (three replicates) at the southern tip of Taiwan during February, April, May, June, July, August and November 2008, and obtained microbial 16S ribosomal RNA sequences for all samples. We identified 117 plastid sequences of ARL-V in these data, all of which were derived from February and April-August samples (none were derived from November samples).
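The relative occurrences and medians described for the coral-macroalgal transects can be recomputed from the absolute counts with a short Python sketch. The counts below are the Zone 1 values for ARL-V and for all sequences as listed in Table 4.3; the function and variable names are illustrative only, and the WP reference sample is left out because the plotted medians are taken over the four transects.

```python
from statistics import mean, median

# Zone 1 counts per transect, taken from Table 4.3
arlv_zone1 = {"Hal": 7, "Dict": 10, "CCA": 45, "Turf": 5}
total_zone1 = {"Hal": 57160, "Dict": 33329, "CCA": 95542, "Turf": 63341}


def relative_occurrence(count: int, total: int, scale: float = 1e5) -> float:
    """Return count/total expressed per `scale` sequences (Table 4.3 uses 10E-5 units)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total * scale


# Relative occurrence of ARL-V in Zone 1 of each transect
rel = {name: relative_occurrence(arlv_zone1[name], total_zone1[name])
       for name in arlv_zone1}

# Median across the four transects (the statistic plotted per zone)
# and the mean, for comparison
zone1_median = median(rel.values())
zone1_mean = mean(rel.values())
```

With these inputs the per-transect values agree with the relative abundance column of Table 4.3, and the median lies below the mean because the high CCA value pulls the mean upwards, which illustrates why medians are less sensitive to a single large transect value.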
Data on ARL-V abundance within 7 coral species and reef water (Figure 4.5B) were derived from a study conducted near Bocas Del Toro, Panama, Caribbean (Sunagawa et al., 2010). Seven coral species were sampled in five replicates using a hammer and chisel, and coral fragments 1–4 cm² in size were crushed using the same method. Five liters of reef water were collected in the vicinity of the corals and filtered using a 0.22 µm filter. DNA extractions were done from equivalent amounts of homogenized coral (~50 mg). For all samples, equivalent amounts of DNA were used for 16S rDNA amplification, and all amplification reactions were run in triplicate before they were pooled and sequenced. This parallel approach to sequence generation made these samples suitable for mutual comparison. We measured the relative sequence occurrence of ARL-V in each sample and plotted the result in Figure 4.5B.

Figure 4.1: Backbone trees used in the analysis of plastid diversity. FastTree (A) and RAxML (B) backbone trees that were used in individual phylogenies, which enabled us to detect plastid environmental sequences and count them (see Figure 4.2; detailed protocol described in Supplemental Experimental Procedures). Alv=Alveolates, Hap=Haptophytes, Rap=Rappemonads, Cry=Cryptomonads, Str=Stramenopiles, Rho=Rhodophytes=red algae, Vir=Viridiplantae=green algae and plants, Gla=Glaucophytes, Cya=Cyanobacteria, Bac=other bacteria.

Figure 4.2: Phylogeny of novel ARL plastids. Rooted phylogeny of plastid 16S rRNA showing the positions of all new apicomplexan-related lineages (ARL, white type on black). For each plastid lineage, the number of environmental sequences detected is shown in the grey box (for alveolates this includes all sequences identified, for other groups only sequences detected in the automated search). Rappemonads and unidentified clone S25_1200 fell within the haptophytes in these analyses, and red algae were paraphyletic.
The tree was constructed with RAxML and numbers at nodes correspond to support from rapid bootstrap/PhyML SH-like aLRT/MrBayes posterior probability when greater than 50/0.90/0.95 (supports inside ARL-I and ARL-V clades were omitted for clarity). Black dots indicate complete support.

Figure 4.3: Phylogenetic testing of the Figure 4.2 tree robustness. The standard dataset (A) corresponding to Figure 4.2 was modified by inclusion and exclusion of sequences as indicated above each tree (B-F). RAxML trees with RAxML rapid bootstrap and PhyML SH-like aLRT supports > 50/0.95 are shown. Black circles indicate 100/1.0 supports. Numbers of sequences in compacted clades are indicated in brackets, unless the same as in (A). Grey arrows mark a significant increase in branch support (D-F). Supports for branches uniting all plastids (Pl), and cyanobacteria+plastids (Cy+Pl) are shown below each tree. He and Co=hematozoan and coccidian apicomplexans, ARL=Apicomplexan-related lineage, Ch=Chromera clade, Vi=Vitrella clade, Di=dinoflagellates, Pl+Ba=other plastids+bacteria.

Figure 4.4: Geographical distribution of ARL plastids. Global distribution of ARL-I (Vitrella), ARL-III (Chromera) and ARL-V from environmental 16S and 18S rRNA sequences corresponds to off-shore tropical and subtropical environments.

Figure 4.5: The fine-scale distribution of ARL-V on coral reefs. Distribution of ARL-V on coral surface (A), between coral species (B) and across depth gradients (C). (A) Relative occurrences of plastid sequences in a five-zone transect between Montastraea annularis and associated macroalgae. Colored lines connect median values from four transects in each zone (ARLs highlighted by thicker lines). Detailed zone descriptions are in the text. T/S: tissue/surface area. (B) Relative occurrence of ARL-V among seven coral species and surrounding reef water. Species compared are Montastraea faveolata (Mfa), M. franksi (Mfr), Diploria strigosa (Dst), Acropora palmata (Apa), A.
cervicornis (Ace), Porites astreoides (Pas), Gorgonia ventalina (Gve). Bars indicate proportion of ARL-V sequences (as %) compared with all other sequences characterized. Numbers next to bars indicate absolute sequence counts of V6-tags, or Sanger reads (in brackets if available). Colors and colored boxes on the left indicate three distinct coral groups (S: scleractinian robust and complex clade; O: gorgonian octocoral). (C) Depth distribution of ARL-V in Montastraea annularis from 5 m, 10 m and 20 m depths. Graphs show relative occurrence of ARL-V (as %) in clone libraries compared with all other sequences characterized. A similar pattern was observed in the terminal restriction fragment length polymorphism analysis (Klaus et al., 2007).

Table 4.1: Environmental sequence evidence for ARLs (excluding ARL-V)

Columns: Study that generated sequences | Total seqs | Uniq seqs a | Sample collection site | Sample

16S rRNA (plastid)

Lineage: ARL-I clade (Vitrella)
Barott et al., 2011. Environ. Microbiol. 13(5) | 119 | 93 | Netherl. Antilles: Curacao | coral (Montastraea annularis), macroalgae
Myshrall et al., 2010. Geobiology 8(4) | 10 | 10 | Bahamas: Highborne Cay | intertidal thrombolites
Garren et al., 2009. PloS ONE 4(10) | 6 | 6 | Philippines: Bolinao | coral reef sediment
Sorensen et al., 2007. Mar. Ecol. Prog. Ser. 346 | 4 | 4 | Hawaii: Oahu, Kaneohe Bay | coral reef sediment
Gao et al., 2011. Chin. J. Oceanol. Limnol. 29(6) | 4 | 4 | Hawaii: Oahu, Kilo Nalu | coral reef sediment
Meron et al., 2011. ISME J. 5 | 2 | 2 | Israel: Red Sea, Eilat | tissue/mucus (Acropora eurystoma)
Sekar et al., 2009. Appl. Environ. Microbiol. 75(8) | 1 | 1 | Virgin Islands: St. Croix | coral surface (Siderastrea siderea)
Schottner et al., 2011. Environ. Microbiol. 13(7) | 1 | 1 | Israel: Red Sea, Eilat | coral reef sediment
unpublished: Mendell et al. (GenBank) | 1 | 1 | unknown | coral reef fish gut (Naso tonganus)
9 independent studies (8 published) | 148 | 122 | reefs, thrombolites | corals, reef sediment, reef fish gut, thrombolites

Lineage: ARL-III clade (Chromera)
Barott et al., 2011. Environ. Microbiol. 13(5) | 35 | 35 | Netherl. Antilles: Curacao | coral (Montastraea annularis), macroalgae

Lineages: ARL-IV, ARL-II
Barott et al., 2011. Environ. Microbiol. 13(5) | 2 | 2 | Netherl. Antilles: Curacao | coral (Montastraea annularis), macroalgae
Murdock et al., 2010. Cahiers de biol. marine 51 | 1 | 1 | Tonga: South. Tonga Arc | white microbial film
Sekar et al., 2009. Appl. Environ. Microbiol. 75(8) | 1 | 1 | Virgin Islands: St. Croix | coral surface (Siderastrea siderea)

Lineage: ARL-VI
unpublished: Li et al. (GenBank) | 2 | 2 | unknown | anchovy gut (Coilia mystus)

Lineage: ARL-VII
Tarlera et al., 2008. FEMS Microbiol. Ecol. 64(1) | 1 | 1 | USA: Georgia | soil sediment

Lineage: Apicomplexans (incl. ARL-VIII)
Barott et al., 2011. Environ. Microbiol. 13(5) | 1 | 1 | Netherl. Antilles: Curacao | turf algae
Roeselers et al., 2011. ISME J. 5(10) | 1 | 1 | India: West Bengal | wild zebrafish gut (Danio rerio)

ALTOGETHER
13 independent studies (11 published) | 192 | 166 | mostly coral reefs | 18 coral species, reef sediment, algae, fish gut

18S rRNA (nuclear)

Lineage: ARL-I clade (Vitrella)
Myshrall et al., 2010. Geobiology 8(4) | 23 | 23 | Bahamas: Highborne Cay | intertidal thrombolites
Garman et al., 2011. Hydrobiologia 677 | 8 | 7 | Florida: Gulf of Mexico | oxic sediment
Jebaraj et al., 2010. FEMS Microbiol. Ecol. 71 | 3 | 3 | India: Lakshadweep reef | coral reef sand
Baumgartner et al., 2009. Environ. Microb. 11 | 1 | 1 | Bahamas: Highborne Cay | subtidal stromatolites

ALTOGETHER
4 independent studies | 35 | 34 | calcifying systems | reef sediment, thrombolites, stromatolites

a duplicate sequences were removed unless they differed in length or were derived from different samples.

Table 4.2: Environmental sequence evidence for ARL-V

Columns: Coral species | Long reads (NCBI database) | Sunagawa et al., 2010 | Barott et al., 2011, 2012 | Chen et al., 2011 | Total sequences | Zooxanthellate coral (hosts Symbiodinium)?

Coral group: Hexacorals, Scleractinians, Robust clade
Diploria strigosa | ---a | 2 | --- | --- | 2 | YES
Favites sp. | 21 | --- | --- | --- | 21 | YES
Fungia scutaria | 2 | --- | --- | --- | 2 | YES
Montastraea annularis | 67 | --- | 156 | --- | 223 | YES
Montastraea faveolata | 1 | 17 | --- | --- | 18 | YES
Montastraea franksi | 5 | 118 | --- | --- | 123 | YES
Mussismilia braziliensis | 2 | --- | --- | --- | 2 | YES
Pocillopora meandrina | 1 | --- | --- | --- | 1 | YES
Stylophora pistillata | 1 | --- | --- | --- | 1 | YES

Coral group: Hexacorals, Scleractinians, Complex clade
Acropora cervicornis | --- | 10 | --- | --- | 10 | YES
Acropora palmata | --- | 21 | --- | --- | 21 | YES
Galaxea fascicularis | 3 | --- | --- | --- | 3 | YES
Isopora palifera | --- | --- | --- | 117 | 117 | YES
Pavona cactus | 1 | --- | --- | --- | 1 | YES
Porites astreoides | 5 | 11 | --- | --- | 16 | YES
Porites compressa | 1 | --- | --- | --- | 1 | YES
Porites cylindrica | 1 | --- | --- | --- | 1 | YES
Porites lobata | 1 | --- | --- | --- | 1 | YES
Porites lutea | 3 | --- | --- | --- | 3 | YES

Coral group: Octocorals, Gorgonians
Gorgonia ventalina | 21 | 536 | --- | --- | 557 | YES

3 distinct coral lineages
20 coral species | 136 | 715 | 156 | 117 | 1124 | Only found in zooxanthellate corals

a absence of sampling

Table 4.3: Occurrence of ARLs and other plastids in coral-macroalgal transects.

Columns: Association (see Coral-macroalgal transects in Materials and methods) | absolute abundance Z1 | Z2 | Z3 | Z4 | Z5 | Total | relative abundance (10E-5) Z1 | Z2 | Z3 | Z4 | Z5 | Total

Group: ARL-V
WP | 30 | n.a. | n.a. | n.a. | n.a. | n.a. | 67.8656 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 7 | 23 | 4 | 5 | 0 | 39 | 12.2463 | 51.321 | 7.84314 | 7.52627 | 0 | 14.184
Dict | 10 | 19 | 0 | 0 | 0 | 29 | 30.0039 | 22.1818 | 0 | 0 | 0 | 8.0908
CCA | 45 | 0 | 1 | 0 | 0 | 46 | 47.0997 | 0 | 1.41225 | 0 | 0 | 13.2832
Turf | 5 | 7 | 0 | 0 | 0 | 12 | 7.89378 | 6.70395 | 0 | 0 | 0 | 3.69574
Total | 97 | 49 | 5 | 5 | 0 | 156 | 33.0407 | 15.3447 | 2.16273 | 2.23142 | 0 | 11.5676

Group: ARL-I Vitrella clade
WP | 0 | n.a. | n.a. | n.a. | n.a. | n.a. | 0 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 0 | 0 | 0 | 31 | 43 | 74 | 0 | 0 | 0 | 46.6629 | 77.4105 | 26.9132
Dict | 0 | 0 | 7 | 0 | 0 | 7 | 0 | 0 | 11.3638 | 0 | 0 | 1.95295
CCA | 0 | 1 | 1 | 1 | 0 | 3 | 0 | 1.18426 | 1.41225 | 2.08238 | 0 | 0.8663
Turf | 0 | 0 | 16 | 3 | 16 | 35 | 0 | 0 | 33.4861 | 7.55934 | 23.0302 | 10.7792
Total | 0 | 1 | 24 | 35 | 59 | 119 | 0 | 0.31316 | 10.3811 | 15.6199 | 21.0394 | 8.824

Group: ARL-III Chromera clade
WP | 0 | n.a. | n.a. | n.a. | n.a. | n.a. | 0 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 0 | 0 | 0 | 1 | 3 | 4 | 0 | 0 | 0 | 1.50525 | 5.40073 | 1.45477
Dict | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1.6234 | 0 | 0 | 0.27899
CCA | 0 | 0 | 0 | 8 | 4 | 12 | 0 | 0 | 0 | 16.659 | 8.42336 | 3.46519
Turf | 0 | 0 | 0 | 0 | 18 | 18 | 0 | 0 | 0 | 0 | 25.909 | 5.54361
Total | 0 | 0 | 1 | 9 | 25 | 35 | 0 | 0 | 0.43255 | 4.01655 | 8.91501 | 2.5953

Group: Diatoms
WP | 2 | n.a. | n.a. | n.a. | n.a. | n.a. | 4.52438 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 0 | 2 | 6 | 8467 | 3383 | 11858 | 0 | 4.46269 | 11.7647 | 12745 | 6090.23 | 4312.66
Dict | 10 | 3 | 145 | 867 | 1986 | 3011 | 30.0039 | 3.50238 | 235.393 | 1239.79 | 1840.3 | 840.048
CCA | 5 | 4 | 5 | 47 | 245 | 306 | 5.2333 | 4.73704 | 7.06125 | 97.8718 | 515.931 | 88.3624
Turf | 1 | 1 | 39 | 140 | 237 | 418 | 1.57876 | 0.95771 | 81.6224 | 352.769 | 341.135 | 128.735
Total | 18 | 10 | 195 | 9521 | 5851 | 15595 | 6.13127 | 3.13157 | 84.3466 | 4249.06 | 2086.47 | 1156.39

Group: Haptophytes
WP | 0 | n.a. | n.a. | n.a. | n.a. | n.a. | 0 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 0 | 1 | 0 | 12 | 27 | 40 | 0 | 2.23135 | 0 | 18.063 | 48.6066 | 14.5477
Dict | 0 | 0 | 1 | 2 | 13 | 16 | 0 | 0 | 1.6234 | 2.85996 | 12.0463 | 4.46389
CCA | 0 | 0 | 0 | 3 | 0 | 3 | 0 | 0 | 0 | 6.24714 | 0 | 0.8663
Turf | 0 | 0 | 1 | 2 | 13 | 16 | 0 | 0 | 2.09288 | 5.03956 | 18.712 | 4.92766
Total | 0 | 1 | 2 | 19 | 53 | 75 | 0 | 0.31316 | 0.86509 | 8.47938 | 18.8998 | 5.56135

Group: Pelagophytes
WP | 1 | n.a. | n.a. | n.a. | n.a. | n.a. | 2.26219 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 0 | 0 | 0 | 102 | 184 | 286 | 0 | 0 | 0 | 153.536 | 331.245 | 104.016
Dict | 0 | 0 | 13 | 80 | 21 | 114 | 0 | 0 | 21.1042 | 114.398 | 19.4594 | 31.8052
CCA | 0 | 7 | 133 | 190 | 359 | 689 | 0 | 8.28981 | 187.829 | 395.652 | 755.996 | 198.96
Turf | 0 | 0 | 30 | 275 | 1149 | 1454 | 0 | 0 | 62.7865 | 692.94 | 1653.86 | 447.801
Total | 1 | 7 | 176 | 647 | 1713 | 2544 | 0.34063 | 2.1921 | 76.1282 | 288.745 | 610.856 | 188.641

Group: Unicellular red algae
WP | 0 | n.a. | n.a. | n.a. | n.a. | n.a. | 0 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 0 | 0 | 0 | 10 | 0 | 10 | 0 | 0 | 0 | 15.0525 | 0 | 3.63692
Dict | 0 | 0 | 7 | 4 | 0 | 11 | 0 | 0 | 11.3638 | 5.71992 | 0 | 3.06892
CCA | 0 | 0 | 0 | 2 | 12 | 14 | 0 | 0 | 0 | 4.16476 | 25.2701 | 4.04273
Turf | 1 | 0 | 5 | 369 | 119 | 494 | 1.57876 | 0 | 10.4644 | 929.799 | 171.287 | 152.141
Total | 1 | 0 | 12 | 385 | 131 | 529 | 0.34063 | 0 | 5.19056 | 171.819 | 46.7146 | 39.226

Group: Unicellular green algae
WP | 0 | n.a. | n.a. | n.a. | n.a. | n.a. | 0 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 13 | 0 | 3 | 40 | 45 | 101 | 22.7432 | 0 | 5.88235 | 60.2101 | 81.011 | 36.7329
Dict | 0 | 0 | 12 | 0 | 2 | 14 | 0 | 0 | 19.4808 | 0 | 1.85328 | 3.9059
CCA | 0 | 0 | 3 | 0 | 10 | 13 | 0 | 0 | 4.23675 | 0 | 21.0584 | 3.75396
Turf | 0 | 0 | 0 | 34 | 19 | 53 | 0 | 0 | 0 | 85.6725 | 27.3484 | 16.3229
Total | 13 | 0 | 18 | 74 | 76 | 181 | 4.42814 | 0 | 7.78584 | 33.025 | 27.1016 | 13.4214

Group: All sequence (relative abundance: n.a.)
WP | 44205 | n.a. | n.a. | n.a. | n.a. | n.a.
Hal | 57160 | 44816 | 51000 | 66434 | 55548 | 274958
Dict | 33329 | 85656 | 61599 | 69931 | 107917 | 358432
CCA | 95542 | 84441 | 70809 | 48022 | 47487 | 346301
Turf | 63341 | 104416 | 47781 | 39686 | 69474 | 324698
Total | 293577 | 319329 | 231189 | 224073 | 280426 | 1348594

Chapter 5: Colponema, a deep-branching alveolate of key evolutionary significance

Introduction

Alveolates are one of the largest groups of unicellular eukaryotes with more than ten thousand described species and extensive evidence for environmental diversity and significance (Adl et al., 2007; Guillou et al., 2008; Janouškovec et al., 2012; López-García et al., 2001). The group also includes several well-known organisms that represent three major alveolate groups: the causative agent of malaria in apicomplexans (Plasmodium), endosymbionts of corals in dinoflagellates (Symbiodinium), and model eukaryotes in ciliates (Tetrahymena and Paramecium). The importance of these and related species has driven much research effort into understanding their biology. Many of their characteristics have provided insights into eukaryote-wide concepts (e.g., Greider and Blackburn, 1985), or are tightly intertwined with their own significance: the apical complex and cell invasion (Hu et al., 2006), permanently condensed chromosomes (Gornik et al., 2012), nuclear dualism and DNA rearrangements (Prescott, 2000), transcript processing and recycling (Slamovits and Keeling, 2008b; Zhang et al., 2007), and organellar biology (McFadden et al., 1996).
Alveolate plastids and mitochondria are particularly interesting, because they perform essential functions (Fichera and Roos, 1997), contain highly reduced genomes with unique properties (Waller and Jackson, 2009; Zhang et al., 1999), and play a central role in understanding organellar endosymbiosis (Janouškovec et al., 2010). Despite their number and significance, most of these characteristics are group-specific among alveolates and their origins and evolution are unclear. The substantial divergence between apicomplexans, dinoflagellates and ciliates indicates that great changes took place deep in alveolate evolutionary history, which determined much of their current lifestyles. Of the few alveolate species that fall outside the three groups, all are significantly understudied, because they represent uncultured predators that are relatively rare in natural samples (Mylnikov and Tikhonenkov, 2007; Simpson and Patterson, 1996). However, some of these species bear characteristics important for shedding light on the early evolution of the major alveolate lineages. Here we succeeded in isolating and temporarily culturing Colponema, an enigmatic alveolate of unknown placement (Myl’nikova and Myl’nikov, 2010). We show that this genus represents two paraphyletic lineages and an additional diversity of environmental sequences that together hold promise for understanding deep events in alveolate evolution. We conducted genomic surveys of both lineages and show that their mitochondrial genomes contain a number of putatively ancestral features and help us understand the evolution of mitochondrial genome structure and content in alveolates.

Results and discussion

We established a temporary culturing system for Colponema using the prey cultures of Parabodo caudatus, Spumella sp., and Procryptobia sorokini grown on a Pseudomonas fluorescens suspension.
Five Colponema isolates were identified from distinct habitats and locations (saline lake, soil and freshwater samples; see below) and clonal cultures were established from single cells. Light microscopy indicated that the five isolates represent at least three species: Colponema edaphicum, Colponema sp. Vietnam, and Colponema sp. Peru, the latter two of which are novel. No molecular data from Colponema were previously available, so we first sequenced the 18S ribosomal RNA gene (18S rRNA) to clarify the phylogeny. The branching topology of all five isolates supported the existence of three species, and suggested that these form two deep-branching lineages among alveolates (data not shown). In order to confirm this result, we sequenced the 28S rRNA and four protein-coding (hsp90, actin, alpha and beta tubulin) genes from all three species. Single gene trees from each of the five genes were congruent with the 18S rRNA tree topology, although some were poorly resolved (tubulins and actin). We therefore created a combined alignment of all genes and analyzed it in RAxML. The resulting tree (Figure 5.1) was well resolved and strongly supported the monophyly of dinoflagellates, myzozoans (apicomplexans + dinoflagellates), ciliates and all alveolates. Colponema species appeared as two deep and independent branches among alveolates: C. sp. Peru was the closest sister to myzozoans with strong support, and C. edaphicum and C. sp. Vietnam had a deeper position that was unresolved with respect to the ciliates and the rest of alveolates. These results clearly show that Colponema represents two previously unrecognized alveolate groups at the level of 'phyla'. The molecular results are readily reconciled with the morphology of Colponema species, which contain typical alveolate features (cortical alveoli and micropores), but feed by phagocytosis of whole cells, not by myzocytosis (feeding by sucking) as basal myzozoans do.
Moreover, it has been apparent that the genus Colponema is most likely defined by characteristics ancestral to all alveolates, so the finding of two independent Colponema lineages is consistent with the morphological evidence. The main significance of the two Colponema lineages does not rely on their novelty, but rather on what they can tell us about the thousands of other alveolate species, and the origin of apicomplexans, dinoflagellates and ciliates in particular. In order to provide insights into this, we conducted a genomic survey of C. sp. Peru and C. sp. Vietnam. The mitochondrial (mt) genome of C. sp. Peru was identified on two overlapping contigs in the data, which could be readily assembled together: one contig represents a central unique region and the other forms a 17.6 kbp terminal inverted repeat (TIR) on both ends (Figure 5.2). The outward-oriented ends of the TIRs diverge into a conserved 38 bp tandem telomeric repeat. The presence of both TIRs and telomeres suggests that the mt DNA of C. sp. Peru is linear in structure (Figure 5.2). Telomeres in linear mt genomes are comparatively rare, and are best known from fungi and ciliates. Ciliate mt DNAs are particularly similar to that of C. sp. Peru, because they contain both TIRs (or near-terminal inverted repeats) and telomeric repeats of a similar length (31-64 bp). Telomeres are found in distantly related ciliates (Swart et al., 2011) and may have been ancestral to the group as a whole, although the sampling is still sparse and they are absent in Paramecium (Pritchard et al., 1990). The simultaneous presence in C. sp. Peru and ciliates of linear mt genomes with both TIRs and telomeres strongly suggests that this mt DNA organization was ancestral not only to ciliates, but to all alveolates. Apicomplexan and dinoflagellate mt DNAs are also linear, and although they lack telomeres some do contain TIRs (piroplasmid apicomplexans; Hikosaka et al., 2010).
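The defining property of a terminal inverted repeat, namely that one end of a linear molecule is the reverse complement of the other, can be illustrated with a minimal Python sketch. This is not the procedure used to assemble the C. sp. Peru mt genome: the function names are hypothetical, only exact matches are considered, and real assemblies would require tolerance for mismatches and for telomeric repeats at the outward ends.

```python
# Translation table for complementing unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str) -> int:
    """Length of the longest exact terminal inverted repeat of a linear sequence.

    In a perfect TIR the left end read forward equals the right end read on
    the opposite strand, so seq[:k] == reverse_complement(seq)[:k].
    The search stops at half the sequence length so the two copies cannot overlap.
    """
    rc = reverse_complement(seq)
    limit = len(seq) // 2
    k = 0
    while k < limit and seq[k] == rc[k]:
        k += 1
    return k
```

On a toy molecule built as a left arm, a unique central region, and the reverse complement of the left arm, the function recovers the arm length as long as the central region does not happen to extend the match.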
Because concatemeric mt DNA organization is seemingly prevalent in apicomplexans, the piroplasmid mt monomers with TIRs were thought to have arisen secondarily (Hikosaka et al., 2013). The mt DNA organization in C. sp. Peru changes this perspective and suggests that piroplasmid mt DNA may represent the original mt DNA structure in apicomplexans. Indeed, it is easier to imagine that both the linear mt concatemers in Plasmodium and Eimeria and the linear mt fragments in dinoflagellates originated from a monomeric linear mt DNA with TIRs rather than vice versa. The mitochondrial genome of C. sp. Vietnam has been assembled into four contigs totaling 23.4 kbp (Materials and methods). Its overall structure remains unknown, but it offers intriguing potential for testing the above hypotheses about the evolution of alveolate mt DNA organization. Comparing the mt genome content of the two Colponema species to other eukaryotes revealed that they have altogether retained the largest set of genes among alveolates, including 14 genes that are missing in other alveolate mitochondria (Figure 5.3). Of these, rpl32 has only been identified in jakobid mt DNAs, and sdh3 represents the first mt-encoded subunit of the membrane complex II in alveolates and their relatives (the SAR clade). C. sp. Peru is the sister group of myzozoans, and comparing their mt genomes indicates that at least 42 genes were lost preceding the origin of myzozoans, which represents the largest known reduction in aerobic mitochondria. In myzozoans, the cytochrome oxidase subunit 2 (Cox2) is split and encoded in the nucleus, and the evolutionary significance of this split has been a matter of heated debate (Funes et al., 2002; Waller and Keeling, 2006; Waller et al., 2003). In both Colponema species, an intact, mt-encoded Cox2 is found that is related to that of myzozoans, supporting the conclusion that the split in myzozoans occurred independently of other eukaryotes.
Cytochrome oxidase subunits in ciliates are even more unusual: both Cox1 and Cox2 contain long insertions, and Cox3 is absent altogether (Figure 5.3; Smith et al., 2007). In contrast, all cytochrome oxidase proteins are present in Colponema mt genomes in their canonical forms. Unusual splits have been identified in ciliate ribosomal RNA genes (rns and rnl; Burger et al., 2000), and in the nad1 gene (subunit 1 of complex I, NADH dehydrogenase), which encodes two individualized mRNAs in Tetrahymena (Edqvist et al., 2000). A number of additional modifications appear in ciliate mt proteins, including the highly divergent C-terminus of Cob (complex III) and multiple small insertions in nad subunits of complex I (nad2, 4, 6, 9). In all cases, an uninterrupted version of the corresponding gene is found in both Colponema mt genomes. Interestingly, one novel gene split is also present in Colponema: nad5 in both C. sp. Peru and C. sp. Vietnam is encoded in two widely separated genomic fragments and split at different positions. Evolution of mitochondrial tRNA genes in alveolates is also linked to interesting questions related to tRNA import. Mt genomes in ciliates encode only a handful of tRNA genes (3-9), and those in apicomplexans and dinoflagellates have lost them altogether and import aminoacylated tRNAs from the cytosol (Esseiva et al., 2004; Pino et al., 2010). In contrast, C. sp. Peru has retained an unusually large set of 21 mt-encoded tRNA genes (Figure 5.3), which requires import of only 3 tRNA species, trnA, trnG and trnR(ncg), in addition to trnT, which is imported in most eukaryotes. This indicates that the ancestral alveolate mt genome was tRNA gene-rich and that reduction occurred independently in ciliates and myzozoans. An unusual trnI(uau) gene with a conserved predicted fold is present in C. sp. Peru, which is presumably a modified trnI(cau).
Altogether, the mt genomes in the two Colponema lineages have retained the largest gene complement in alveolates, and represent the closest ancestral state for the dramatic mt reduction in myzozoans and for a number of splits and modifications in mitochondrial proteins of other alveolates. Alveolate mt-encoded proteins are notoriously fast-evolving in phylogenies (Burger et al., 2000). To test whether this also applies to the two Colponema lineages, a combined matrix of 17 mt-encoded proteins was analyzed using PhyloBayes CAT inference, which provides a more realistic estimate of species branch length (Lartillot and Philippe, 2004). The resulting phylogeny (Figure 5.4) shows that mt proteins in both ciliates and myzozoans are orders of magnitude faster-evolving than those in other eukaryotes and Colponema. C. sp. Peru mt proteins are evolving faster than those in C. sp. Vietnam (Figure 5.4), but together the two Colponema species represent the alveolates of choice in phylogenetic reconstructions of mt genomes. The mitochondrial data presented here illustrate that the phylogenetic placement of Colponema species provides true potential in understanding alveolate evolution. No plastid genome sequences were identified in these surveys; however, cryptic plastids are difficult to find and may lack their own genomes (Matsuzaki et al., 2008). Alternatively, Colponema species may lack plastid organelles altogether (Figure 5.5). Additional evolutionary questions relate to the significance and function of alveoli, the palintomic division, and the origin of myzocytosis and apical complex-like structures (Figure 5.5). Some of these questions could be addressed by a combination of genomic, transcriptomic and ultrastructural data from different Colponema lineages. In order to probe for the existence of additional Colponema-related organisms, we searched environmental sequence databases using the 18S rRNA sequences recovered here.
Several matches were identified, but only one was closely related to the five isolates sequenced here (Figure 5.6). The remaining sequences formed four Colponema-related lineages (CRLs) that branched either independently or, with no support, close to the divergence of the C. sp. Vietnam clade (Figure 5.6), suggesting they all represent novel organisms. Finding multiple CRLs indicates that a number of deep-branching alveolates remain to be discovered that may help us learn more about character evolution in alveolates and the origin of apicomplexans, dinoflagellates and ciliates.

Conclusions

The evidence presented here leads to several conclusions. Predatory protists are understudied, but some represent important evolutionary lineages and their successful culturing may be important in addressing previously puzzling problems. The first molecular data from the mysterious Colponema show that the genus represents two lineages of deep-branching alveolates, which is consistent with its morphological features. This evolutionary position is key with respect to apicomplexans, dinoflagellates and ciliates, and may help us understand the origins of their numerous unusual features or provide the closest ancestral state to those features. Genomic surveys in two Colponema lineages show that this promise is indeed real, and reveal the probable ancestral mitochondrial organization and gene content in all alveolates. It is therefore likely that genomic and transcriptomic data from Colponema will be important in further understanding alveolate biology, including plastid evolution and the presence of cryptic plastids, alveoli function and the associated proteome, and the origins of myzocytosis and apical complex-like structures.

Materials and methods

Sampling and culturing

The sample containing Colponema sp. Vietnam C7 was obtained from the sediment of the freshwater Dau Tron Lake, Vietnam (107°20'50'' E, 11°28'47'' N) on November 24, 2010. The sample was collected at 40 cm depth (Temp.
28.8°C, pH 5.66, Conductivity 12 µS/cm) and contained organic detritus, plant debris and filamentous algae. The sample containing C. sp. Vietnam C7a was obtained at the same locality in May 2012. Clonal cultures of C. sp. Vietnam C7 and C7a were each isolated from a single cell and cultivated using Spumella sp. OF-40 and Parabodo caudatus BAS-1 as prey. The sample containing Colponema sp. Peru was obtained from the sediments of the saline lake Supay (76°14'44.38'' W, 14°0'5.18'' N), Pisco Province, Ica Department, Peru. The sample was collected at 20 cm depth (Salinity 35‰, Temp. about 25°C) and contained mainly organic detritus. A culture of C. sp. Peru was isolated from a single cell and cultured on the marine bodonid Procryptobia sorokini. Colponema edaphicum was isolated as described previously (Mylnikov and Tikhonenkov, 2007). A suspension of Pseudomonas fluorescens ICISC19 was used for maintaining the Colponema prey.

DNA extraction, sequencing and data assembly

Colponemid cells were harvested at the peak of their abundance and collected by centrifugation. Genomic DNA was extracted from fresh cells or cells preserved in 96% ethanol (strain C7). Ribosomal RNA and protein-coding genes were amplified using universal primers. Total genomic DNA of C. sp. Peru was sequenced using Illumina 100 bp paired-end technology. Ten single cells of C7a were hand-picked using a glass micropipette, their genomic DNA was amplified using Genomiphi v2 (GE Healthcare) and sequenced using Illumina 150 bp paired-end technology. DNA sequence reads of C. sp. Peru were assembled in Ray 2.0 using kmer=21. The mitochondrial (mt) genome of C. sp. Peru was identified using Blastp searches on two overlapping contigs, the larger of which could be connected to each end of the smaller, forming a terminal inverted repeat (TIR). A 38 bp conserved tandem telomeric repeat (CCTCTGAGTGAGATTATCTTAATATTCAAAACAAACCC) was found on the outward-oriented side of the TIR.
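The TIR and telomere observations described above can be illustrated with a minimal Python sketch. It is not the procedure used in this work: it assumes exact matches on an idealized toy molecule, whereas real assemblies contain mismatches and would need an alignment-based comparison. The function names, the toy TIR and single-copy sequences, and the number and orientation of telomeric repeat copies are hypothetical; only the 38 bp repeat unit is taken from the text.

```python
# Toy model of a linear genome laid out as:
# telomere repeats + TIR + single-copy region + inverted TIR + inverted telomere repeats.
# Everything except TELOMERE_UNIT is invented for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 38 bp telomeric repeat unit quoted in the text
TELOMERE_UNIT = "CCTCTGAGTGAGATTATCTTAATATTCAAAACAAACCC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str) -> int:
    """Length of the longest prefix whose reverse complement is the
    matching suffix of seq (exact matches only)."""
    rc = reverse_complement(seq)
    length = 0
    while length < len(seq) // 2 and seq[length] == rc[length]:
        length += 1
    return length


def count_leading_tandem_units(seq: str, unit: str) -> int:
    """Count back-to-back copies of unit at the start of seq."""
    if not unit:
        raise ValueError("unit must be non-empty")
    copies = 0
    while seq.startswith(unit, copies * len(unit)):
        copies += 1
    return copies


toy_tir = "ATGGCGTACGTTAGC"    # hypothetical TIR
toy_unique = "ACGGTTCAGGCTA"   # hypothetical single-copy region
left_arm = TELOMERE_UNIT * 3 + toy_tir  # three telomere copies is an assumption
toy_genome = left_arm + toy_unique + reverse_complement(left_arm)

print("repeat unit length:", len(TELOMERE_UNIT))
print("leading telomere copies:", count_leading_tandem_units(toy_genome, TELOMERE_UNIT))
print("inverted terminal arm length:", terminal_inverted_repeat_length(toy_genome))
```

The same layout accounts for the total genome length reported for C. sp. Peru: two 17584 bp TIR copies plus one 15225 bp single-copy region give 50393 bp, excluding the telomeres.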
No additional contigs with mt genes or the telomeric repeat were identified, congruently suggesting the genome is complete. The total size of the mt genome is 50393 bp (a 15225 bp single-copy region and two 17584 bp TIRs) flanked by telomeres of unknown length. Paired-end genomic reads of C. sp. Vietnam C7a were merged in FLASH (Magoc and Salzberg, 2011) and assembled in SGA. Twelve mt contigs were identified in the assembly using Blastp searches and AT content sorting, and were extended and merged using a combination of PCR and successive rounds of read mapping to contig ends in Consed 23 (Gordon et al., 1998). This resulted in four contigs totaling 23.4 kbp in length and representing a partial mt genome.

Annotations and comparative genomic analyses

Genes were annotated according to Blastp homology and tRNA genes were predicted using the tRNAscan-SE organellar search (Schattner et al., 2005). The mitochondrial genome map in Figure 5.2 was drawn with the aid of GView (Petkau et al., 2010). In drawing the Venn diagram in Figure 5.3, the identity of all tRNA genes in alveolate and stramenopile mt genomes was verified in tRNAscan-SE. The trnC(gca) gene in ciliates is shown as present, but has a very low Cove score (21.05) and is only found in Sterkiella (Oxytricha). The identity of the ciliate rps7 was not confirmed and the gene is shown as absent in the group.

Phylogenetic analyses

The nuclear ribosomal RNA operon sequences were searched for in complete eukaryotic genomes (where available) or downloaded from the GenBank nucleotide database. The four protein-coding genes used in the nuclear phylogeny were first searched for in complete genomes and transcriptomes (where available). In order to distinguish gene paralogs, all significant hits (<1e-25) were retrieved from genomes and transcriptomes, and single-gene phylogenies in FastTree 2.1.4 (Price et al., 2010) were used to select a homologous, slow-evolving copy for each gene.
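The hit-retrieval step described above (keeping every hit below the 1e-25 threshold before building single-gene trees) could be sketched as follows. This is an illustration under stated assumptions, not the pipeline used here: the text does not say which BLAST output format was used, so the sketch assumes the standard 12-column tabular layout, in which the subject identifier is the second column and the E-value the eleventh. The function name and the three toy hit lines are hypothetical.

```python
# Filter BLAST hits by E-value, assuming standard 12-column tabular output:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

E_VALUE_CUTOFF = 1e-25  # significance threshold quoted in the text


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, evalue) pairs with evalue below cutoff, best hit first."""
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue < cutoff:
            hits.append((subject_id, evalue))
    return sorted(hits, key=lambda hit: hit[1])


# Invented example hits; identifiers and scores are not real data.
toy_hits = [
    "query_hsp90\tcontig_A\t78.2\t650\t140\t2\t1\t650\t10\t659\t0.0\t1050",
    "query_hsp90\tcontig_B\t55.1\t600\t260\t9\t20\t620\t5\t598\t3e-180\t520",
    "query_hsp90\tcontig_C\t31.0\t120\t80\t3\t300\t419\t40\t158\t2e-08\t55.1",
]

candidates = significant_hits(toy_hits)
for subject_id, evalue in candidates:
    print(subject_id, evalue)
```

All candidates passing the filter, rather than only the top hit, would then go into a single-gene tree so that paralogs can be told apart by their placement.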
Sequences from other species of important evolutionary placement that have no genome or transcriptome available were searched for in GenBank and evaluated in the next round of FastTree phylogenies. Finally, the more accurate ML algorithm in RAxML 7.28 (Stamatakis, 2006) was used to evaluate all single-gene trees for congruence with the rRNA phylogeny. In several cases, genes from closely related species or genera were combined into one operational taxonomic unit for phylogenetic inferences. The dataset for the mitochondrial phylogeny (Figure 5.4) was assembled from the following 17 predominantly mitochondrion-encoded proteins: atp9, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, rpl2, rpl14, rpl16. In several cases the nucleus-encoded mitochondrion-targeted protein homolog was determined using FastTree phylogenies. The 18S rRNA dataset (Figure 5.6) was built using 172 sequences from alveolates and other eukaryotes, 6 Colponema sequences and 9 environmental clones retrieved from the GenBank Environmental sequence database. Single genes were aligned independently in MAFFT 7.031b (Katoh and Standley, 2013) and variable sites were removed using trimAl 1.2rev59 (Capella-Gutierrez et al., 2009) with the strict algorithm (proteins) or Gblocks 0.91b (Castresana, 2000) with b1=70%, b2=75%, b3=12, b4=4, b5=h parameters (rRNA operon). Final phylogenetic inferences were done in RAxML 7.28 (Stamatakis, 2006) using -m GTRGAMMA (or PROTGAMMALGF for proteins) -f a -# 500 parameters, PhyML 3.0.1 (Guindon et al., 2010) using -m GTR -f e -v e -c 8 -a e -b -4 -s BEST --n_rand_starts 10 parameters, and Phylobayes 3.2e (Lartillot et al., 2009) using the -cat -poi algorithm run as 2 chains, each run for 98000 generations (20000 burnin; maxdiff: 0.0802305; meandiff: 0.00281839).

Figure 5.1: Colponema represents two lineages of deep-branching alveolates.
RAxML maximum likelihood phylogeny of six nuclear loci: two ribosomal RNA genes (18S rRNA and 28S rRNA) and four proteins (hsp90, actin, alpha tubulin and beta tubulin). The tree was rooted with a representative sampling of all other eukaryotes. Black dots denote complete support (100).

Figure 5.2: Mitochondrial genome map of Colponema sp. Peru. Mitochondrial genome map with genes represented by boxes colored according to the legend. Genes on top are transcribed to the right, genes on bottom are transcribed to the left. Note the two terminal inverted repeats (TIRs) and telomeres at their ends (length of the telomeres is not known).

Figure 5.3: Venn diagram of mitochondrial genome contents. The sum of all mitochondrial genes found in several eukaryotic groups and in all known mitochondria is shown inside colored circles. Genes acquired secondarily, and transfer RNA genes of rare occurrence including trnI(aat) in Colponema sp. Peru, were excluded. tRNA-Met species with 'cau' anticodons could not be unambiguously distinguished in the alveolates (asterisk).

Figure 5.4: Comparison of evolutionary rates in alveolate mitochondrial proteins. PhyloBayes tree inferred from a concatenation of 17 mitochondrial proteins using the CAT model. Evolutionary rates of mitochondrial protein sequences from two Colponema species are more similar to other eukaryotes than those in ciliates, apicomplexans and dinoflagellates.

Figure 5.5: Distribution of selected characteristics among alveolates. Presence or absence of each character is recorded based on the putative ancestral state in the lineage.

Figure 5.6: Environmental diversity of Colponema-related lineages (CRLs). Maximum likelihood (PhyML) 18S rRNA phylogeny shows environmental sequence evidence for novel lineages of deep-branching alveolates. The tree has aLRT branch supports (>0.5 are shown) and thickened branches denote ≥0.99 support.
Chapter 6: Conclusion

Summary

The results presented here show that much remains to be learned about apicomplexan relatives and their significance to well-studied species. Chromera velia has fulfilled its potential and, together with Vitrella brassicaformis (CCMP3155), provides strong evidence about the origin of the apicomplexan plastid and a simple path for understanding the apicoplast genome content and order. Unexpectedly, the presence of form II Rubisco and additional characteristics simultaneously addresses the origin of the plastid in dinoflagellates, which is inferred to descend from the same ancestor as the apicoplast. The evidence suggests that these plastids are together related to those in stramenopile algae, and that multiple losses of photosynthesis and plastid organelles occurred in apicomplexan and dinoflagellate relatives. Altogether, these data help to resolve a key part of the puzzling history of plastid endosymbiosis, address the frequency of plastid loss and gain, and draw attention to the intriguing genome reduction in the apicoplast and its relatives. The evolutionary significance of the C. velia plastid put it in the center of attention, but overshadowed its unusual genomic structure. Two genes, psaA encoding a key protein of the photosynthesis reaction center and atpB encoding a core subunit of the membrane ATP synthase, were found split into two fragments. Comprehensive evidence at the transcriptomic and protein level indicates that the two fragments are not spliced together, but expressed independently. This represents an unprecedented case in plastid functional biology and poses important questions for the function of membrane complexes and their assembly. Canonical gene clusters have been reorganized into novel putative transcriptional units, and an in-depth transcriptional profile shows that many unusual ORFs bearing long insertions and extensions are transcribed in their entirety.
Characteristics of the plastid genome ends and pulsed-field gel electrophoresis congruently suggest that the genome is most likely linear. A detailed inspection of the plastid genomic landscape indicates numerous traces of intrachromosomal recombination, which could provide a rationale for gene splits, gene duplicates, ORF modifications, highly modified gene order and, perhaps, the origin of the linear organization. The discovery of Chromera and Vitrella sparked interest in their diversity and environmental distribution: deep algal diversity is commonly considered well understood and, accordingly, findings of novel algal lineages have been rare. We tested this assumption by searching for plastid contamination in publicly available bacterial 16S rRNA surveys. Ten thousand plastid sequences were identified and virtually all were assignable to known algal groups or previously described environmental clades (in prasinophytes, haptophytes and rappemonads). The only unassigned plastid sequences were all related to apicomplexan parasites in phylogenies. Extending this search to short reads identified 1316 sequences comprising 8 distinct apicomplexan-related lineages (ARLs), two of which corresponded to Chromera and Vitrella, and six of which were novel. Surprisingly, almost all of the sequences were derived from coral reef ecosystems. In a gradient between corals and the surrounding environment, Vitrella was most abundant outside of corals, contrary to expectations (the same was true of Chromera, but few data were available). In contrast, the most abundant and newly discovered clade, called ARL-V, was tightly and exclusively associated with coral tissue and surface. Analysis of additional short read datasets reveals that ARL-V is present in at least 20 species of healthy zooxanthellate corals (representing three distinct clades) over extended periods of time and is enriched in shallow reef depths.
Although it remains unknown whether ARL-V is photosynthetic or a parasite, its association with reef corals is novel and potentially significant to coral reef ecology and the origin of apicomplexan parasitism. Many apicomplexan relatives include uncultured predatory protists, whose morphologies suggest a potential in understanding the evolution of the well-known taxa. Five isolates of Colponema were temporarily cultured here and the first molecular data from this genus were obtained, showing it represents at least two lineages of deep-branching alveolates. A phylogeny based on six loci supported Colponema sp. Peru (a novel species) as the closest known sister group of apicomplexans and dinoflagellates, and placed another lineage of Colponema in a more basal position with an unresolved relationship to ciliates and the rest of alveolates. Genome sequence surveys in the two lineages revealed a complete and a partial mitochondrial genome, which together retain many canonical characteristics of non-alveolate mitochondria. The Colponema mitochondrial genomes are rich in protein-coding and tRNA genes, and their ORFs provide the ancestral state to a number of modifications in their ciliate, apicomplexan, and dinoflagellate homologs. Additionally, Colponema mitochondrial proteins are slow-evolving compared to those of other alveolates, suggesting they should be preferentially used in mitochondrial genome phylogenies. The presence of terminal inverted repeats and telomeric repeats in C. sp. Peru suggests that the genome is genuinely linear and that telomeres were probably ancestral to all alveolates. These data may also lead to rethinking the evolution of mitochondrial genome organization in apicomplexans and relatives. Environmental sequence data support the existence of four additional Colponema-related lineages.
This indicates that further genomic and transcriptomic data from Colponema and related lineages have significant potential in resolving early events in alveolate evolution and the rise of apicomplexans, dinoflagellates and ciliates.

Future directions

Chromera, Vitrella, ARL-V and Colponema represent only a part of the diversity among apicomplexan relatives. Although much is left to be learned about them, other organisms represent promising targets in understanding the apicomplexan origin. Much like Colponema, the colpodellids are uncultured predators about which little is known. However, colpodellids are also known to permanently carry a primitive apical complex-like structure, which they use for attachment and feeding, much like gregarine apicomplexans. Chromera flagellates possess a similar structure, but its significance is unknown and it is absent from their vegetative cells, which are easy to culture. Vitrella, possibly the closest sister group to apicomplexans (Figure 6.1), may lack this structure altogether. The distribution of apical complex-like structures clearly illustrates that apicomplexan relatives are very different from one another and that this variability may limit our comprehension of apicomplexan origins if only one species is examined in detail. We established temporary cultures from three species of colpodellids (Alphamonas, Voromonas, and Colpodella) and Colponema sp. Vietnam in addition to permanent cultures of Chromera and Vitrella, and obtained their transcriptomic data (Figure 6.1). These data are currently being analyzed and hold great promise not only for resolving their relationships with one another, but also for providing broader context for our understanding of the origin and evolution of apicomplexan parasites.

Figure 6.1: Alveolate phylogeny using the nuclear ribosomal RNA operon sequence.
RAxML maximum likelihood tree of the nuclear ribosomal RNA operon unit with emphasis on free-living apicomplexan relatives is rooted with a representative selection of diverse eukaryotes. Rapid bootstrap supports (%) are shown; full black circles denote complete support. Genomic and transcriptomic data generated by us is coded according to the legend. Note that the mitochondrial genome of Colponema sp. Vietnam is partial only and that data for Alphamonas mitochondrial genome has not yet been confirmed (question mark). Grey arrow points to the branch uniting all alveolates.

References

Abrahamsen, M.S., Templeton, T.J., Enomoto, S., Abrahante, J.E., Zhu, G., Lancto, C.A., Deng, M., Liu, C., Widmer, G., Tzipori, S., et al. (2004). Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science 304, 441–445. Adl, S.M., Leander, B.S., Simpson, A.G.B., Archibald, J.M., Anderson, O.R., Bass, D., Bowser, S.S., Brugerolle, G., Farmer, M.A., Karpov, S., et al. (2007). Diversity, nomenclature, and taxonomy of protists. Syst. Biol. 56, 684–689. Amunts, A., Toporik, H., Borovikova, A., and Nelson, N. (2010). Structure determination and improved model of plant photosystem I. J. Biol. Chem. 285, 3478–3486. Bachvaroff, T.R., Concepcion, G.T., Rogers, C.R., Herman, E.M., and Delwiche, C.F. (2004). Dinoflagellate expressed sequence tag data indicate massive transfer of chloroplast genes to the nuclear genome. Protist 155, 65–78. Barbrook, A.C., and Howe, C.J. (2000). Minicircular plastid DNA in the dinoflagellate Amphidinium operculatum. Mol. Gen. Genet. 263, 152–158. Barbrook, A.C., Symington, H., Nisbet, R.E.R., Larkum, A., and Howe, C.J. (2001). Organisation and expression of the plastid genome of the dinoflagellate Amphidinium operculatum. Mol. Genet. Genomics 266, 632–638. Barbrook, A.C., Santucci, N., Plenderleith, L.J., Hiller, R.G., and Howe, C.J. (2006). Comparative analysis of dinoflagellate chloroplast genomes reveals rRNA and tRNA genes.
BMC Genomics 7, 297. Barbrook, A.C., Dorrell, R.G., Burrows, J., Plenderleith, L.J., Nisbet, R.E.R., and Howe, C.J. (2012). Polyuridylylation and processing of transcripts from multiple gene minicircles in chloroplasts of the dinoflagellate Amphidinium carterae. Plant Mol. Biol. 79, 347–357. Barneah, O., Ben-Dov, E., Kramarsky-Winter, E., and Kushmaro, A. (2007). Characterization of black band disease in Red Sea stony corals. Environ. Microbiol. 9, 1995–2006. Barott, K.L., Rodriguez-Brito, B., Janouškovec, J., Marhaver, K., Smith, J.E., Keeling, P., and Rohwer, F.L. (2011). Microbial diversity associated with four functional groups of benthic reef algae and the reef-building coral Montastraea annularis. Environ. Microbiol. 13, 1192–1204. Barott, K.L., Rodriguez-Mueller, B., Youle, M., Marhaver, K.L., Vermeij, M.J.A., Smith, J.E., and Rohwer, F.L. (2012). Microbial to reef scale interactions between the reef-building coral Montastraea annularis and benthic algae. Proc. R. Soc. B Biol. Sci. 279, 1655–1664. Bendich, A.J. (1991). Moving pictures of DNA released upon lysis from bacteria, chloroplasts, and mitochondria. Protoplasma 160, 121–130. Bendich, A.J. (2004). Circular chloroplast chromosomes: the grand illusion. Plant Cell 16, 1661–1666. Blanchard, J.L., and Hicks, J.S. (1999). The non-photosynthetic plastid in malarial parasites and other apicomplexans is derived from outside the green plastid lineage. J. Eukaryot. Microbiol. 46, 367–375. Bodył, A. (2005). Do Plastid-related characters support the Chromalveolate hypothesis? J. Phycol. 41, 712–719. Bodył, A., Stiller, J.W., and Mackiewicz, P. (2009). Chromalveolate plastids: direct descent or multiple endosymbioses? Trends Ecol. Evol. 24, 119–21; author reply 121–2. Bosch, J., Paige, M.H., Vaidya, A.B., Bergman, L.W., and Hol, W.G.J. (2012). Crystal structure of GAP50, the anchor of the invasion machinery in the inner membrane complex of Plasmodium falciparum. J. Struct. Biol. 178, 61–73.
Botté, C.Y., Yamaryo-Botté, Y., Janouškovec, J., Rupasinghe, T., Keeling, P.J., Crellin, P., Coppel, R.L., Maréchal, E., McConville, M.J., and McFadden, G.I. (2011). Identification of plant-like galactolipids in Chromera velia, a photosynthetic relative of malaria parasites. J. Biol. Chem. 286, 29893–29903. Boynton, J.E., Gillham, N.W., Harris, E.H., Hosler, J.P., Johnson, A.M., Jones, A.R., Randolph-Anderson, B.L., Robertson, D., Klein, T.M., and Shark, K.B. (1988). Chloroplast transformation in Chlamydomonas with high velocity microprojectiles. Science 240, 1534–1538. Bradley, P.J., Ward, C., Cheng, S.J., Alexander, D.L., Coller, S., Coombs, G.H., Dunn, J.D., Ferguson, D.J., Sanderson, S.J., Wastling, J.M., et al. (2005). Proteomic analysis of rhoptry organelles reveals many novel constituents for host-parasite interactions in Toxoplasma gondii. J. Biol. Chem. 280, 34245–34258. Brugerolle, G. (2002a). Cryptophagus subtilis: a new parasite of cryptophytes affiliated with the Perkinsozoa lineage. Eur. J. Protistol. 37, 379–390. Brugerolle, G. (2002b). Colpodella vorax: ultrastructure, predation, life-cycle, mitosis, and phylogenetic relationships. Eur. J. Protistol. 38, 113–125. Brugerolle, G. (2003). Apicomplexan parasite Cryptophagus renamed Rastrimonas gen. nov. Eur. J. Protistol. 39, 101. Brugerolle, G., and Mignot, J. (1979). Observations sur le cycle l’ultrastructure et la position systematique de Spiromonas perforans (Bodo perforans Hollande 1938), flagelle parasite de Chilomonas paramecium: Ses relations avec les dinoflagelles et sporozoaires. Protistologica 15, 183–196. Burger, G., Zhu, Y., Littlejohn, T.G., Greenwood, S.J., Schnare, M.N., Lang, B.F., and Gray, M.W. (2000). Complete sequence of the mitochondrial genome of Tetrahymena pyriformis and comparison with Paramecium aurelia mitochondrial DNA. J. Mol. Biol. 297, 365–380. Burki, F., Shalchian-Tabrizi, K., Minge, M., Skjaeveland, A., Nikolaev, S.I., Jakobsen, K.S., and Pawlowski, J. (2007).
Phylogenomics reshuffles the eukaryotic supergroups. Plos One 2, e790–e790. Burki, F., Shalchian-Tabrizi, K., and Pawlowski, J. (2008). Phylogenomics reveals a new “megagroup” including most photosynthetic eukaryotes. Biol. Lett. 4, 366–369. Burki, F., Inagaki, Y., Bråte, J., Archibald, J.M., Keeling, P.J., Cavalier-Smith, T., Sakaguchi, M., Hashimoto, T., Horak, A., Kumar, S., et al. (2009). Large-scale phylogenomic analyses reveal that two enigmatic protist lineages, telonemia and centroheliozoa, are related to photosynthetic chromalveolates. Genome Biol. Evol. 2009, 231–238. Busch, A., and Hippler, M. (2011). The structure and function of eukaryotic photosystem I. Biochim. Biophys. Acta 1807, 864–877. Cai, X., Fuller, A.L., McDougald, L.R., and Zhu, G. (2003). Apicoplast genome of the coccidian Eimeria tenella. Gene 321, 39–46. De Cambiaire, J.-C., Otis, C., Lemieux, C., and Turmel, M. (2006). The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands. Bmc Evol. Biol. 6, 37. Capella-Gutierrez, S., Silla-Martinez, J.M., and Gabaldon, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. Castresana, J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552. Cavalier-Smith, T. (1999). Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol. 46, 347–366. Cavalier-Smith, T., and Chao, E.E. (2004). Protalveolate phylogeny and systematics and the origins of Sporozoa and dinoflagellates (phylum Myzozoa nom. nov.). Eur. J. Protistol. 40, 185–212. Chang, W.-J., Bryson, P.D., Liang, H., Shin, M.K., and Landweber, L.F. (2005). 
The evolutionary origin of a complex scrambled gene. Proc. Natl. Acad. Sci. U. S. A. 102, 15149–15154. Chen, C. a C.-P., Tseng, C.-H., and Tang, S.-L. (2011). The dynamics of microbial partnerships in the coral Isopora palifera. ISME J. 5, 728–740. Collins, K., and Gorovsky, M.A. (2005). Tetrahymena thermophila. Curr. Biol. 15, R317–318. Correa, A.M.S., Brandt, M.E., Smith, T.B., Thornhill, D.J., and Baker, A.C. (2009). Symbiodinium associations with diseased and healthy scleractinian corals. Coral Reefs 28, 437–448. Dang, Y., and Green, B.R. (2009). Substitutional editing of Heterocapsa triquetra chloroplast transcripts and a folding model for its divergent chloroplast 16S rRNA. Gene 442, 73–80. Dang, Y., and Green, B.R. (2010). Long transcripts from dinoflagellate chloroplast minicircles suggest “rolling circle” transcription. J. Biol. Chem. 285, 5196–5203. Day, A., and Madesis, P. (2007). DNA replication, recombination, and repair in plastids. In Cell and Molecular Biology of Plastids, pp. 65–119. Dzierszinski, F., Nishi, M., Ouko, L., and Roos, D.S. (2004). Dynamics of Toxoplasma gondii Differentiation. Eukaryot. Cell 3, 992–1003. Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. Edgar, R.C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461. Edqvist, J., Burger, G., and Gray, M.W. (2000). Expression of mitochondrial protein-coding genes in Tetrahymena pyriformis. J. Mol. Biol. 297, 381–393. Eichacker, L.A., Müller, B., and Helfrich, M. (1996). Stabilization of the chlorophyll binding apoproteins, P700, CP47, CP43, D2, and D1, by synthesis of Zn-pheophytin a in intact etioplasts from barley. FEBS Lett. 395, 251–256. Eisen, J.A., Coyne, R.S., Wu, M., Wu, D., Thiagarajan, M., Wortman, J.R., Badger, J.H., Ren, Q., Amedeo, P., Jones, K.M., et al. (2006).
Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol. 4, e286. Ellis, T.H., and Day, A. (1986). A hairpin plastid genome in barley. EMBO J. 5, 2769–2774. Esseiva, A.C., Naguleswaran, A., Hemphill, A., and Schneider, A. (2004). Mitochondrial tRNA import in Toxoplasma gondii. J. Biol. Chem. 279, 42363–42368. Falkowski, P.G., Katz, M.E., Knoll, A.H., Quigg, A., Raven, J.A., Schofield, O., and Taylor, F.J.R. (2004). The evolution of modern eukaryotic phytoplankton. Science 305, 354–360. Fast, N.M., Kissinger, J.C., Roos, D.S., and Keeling, P.J. (2001). Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol. Biol. Evol. 18, 418–426. Fast, N.M., Xue, L., Bingham, S., and Keeling, P.J. (2002). Re-examining alveolate evolution using multiple protein molecular phylogenies. J. Eukaryot. Microbiol. 49, 30–37. Feagin, J.E. (1994). The extrachromosomal DNAs of apicomplexan parasites. Annu. Rev. Microbiol. 48, 81–104. Fernández, I., Pardos, F., Benito, J., and Arroyo, N.L. (1999). Acrocoelus glossobalani gen. nov. et sp. nov., a protistan flagellate from the gut of the enteropneust Glossabalanus minutus. Eur. J. Protistol. 35, 55–65. Fichera, M.E., and Roos, D.S. (1997). A plastid organelle as a drug target in apicomplexan parasites. Nature 390, 407–409. Funes, S., Davidson, E., Reyes-Prieto, A., Magallón, S., Herion, P., King, M.P., and González-Halphen, D. (2002). A green algal apicoplast ancestor. Science 298, 2155. Gabrielsen, T.M., Minge, M.A., Espelund, M., Tooming-Klunderud, A., Patil, V., Nederbragt, A.J., Otis, C., Turmel, M., Shalchian-Tabrizi, K., Lemieux, C., et al. (2011). Genome evolution of a tertiary dinoflagellate plastid. PLoS One 6, e19132. Gajadhar, A.A., Marquardt, W.C., Hall, R., Gunderson, J., Ariztia-Carmona, E.V., and Sogin, M.L. (1991).
Ribosomal RNA sequences of Sarcocystis muris, Theileria annulata and Crypthecodinium cohnii reveal evolutionary relationships among apicomplexans, dinoflagellates, and ciliates. Mol. Biochem. Parasitol. 45, 147–154.
Gao, Z., Wang, X., Hannides, A.K., Sansone, F.J., and Wang, G. (2011). Impact of redox-stratification on the diversity and distribution of bacterial communities in sandy reef sediments in a microcosm. Chin. J. Ocean. Limnol. 29, 1209–1223.
Gardner, M.J., Feagin, J.E., Moore, D.J., Spencer, D.F., Gray, M.W., Williamson, D.H., and Wilson, R.J.M. (1991). Organisation and expression of small subunit ribosomal RNA genes encoded by a 35-kilobase circular DNA in Plasmodium falciparum. Mol. Biochem. Parasitol. 48, 77–88.
Garren, M., Raymundo, L., Guest, J., Harvell, C.D., and Azam, F. (2009). Resilience of coral-associated bacterial communities exposed to fish farm effluent. PLoS One 4, e7319.
Gatenby, A., Rothstein, S., and Nomura, M. (1989). Translational coupling of the maize chloroplast atpB and atpE genes. Proc. Natl. Acad. Sci. 86, 4066.
Goldbach, R.W., Arnberg, A.C., van Bruggen, E.F., Defize, J., and Borst, P. (1977). The structure of Tetrahymena pyriformis mitochondrial DNA. I. Strain differences and occurrence of inverted repetitions. Biochim. Biophys. Acta 477, 37–50.
Gómez, F., López-García, P., Nowaczyk, A., and Moreira, D. (2009). The crustacean parasites Ellobiopsis Caullery, 1910 and Thalassomyces Niezabitowski, 1913 form a monophyletic divergent clade within the Alveolata. Syst. Parasitol. 74, 65–74.
Gordon, D., Abajian, C., and Green, P. (1998). Consed: a graphical tool for sequence finishing. Genome Res. 8, 195–202.
Gornik, S.G., Ford, K.L., Mulhern, T.D., Bacic, A., McFadden, G.I., and Waller, R.F. (2012). Loss of Nucleosomal DNA Condensation Coincides with Appearance of a Novel Nuclear Protein in Dinoflagellates. Curr. Biol. 1–10.
Gould, S.B., Tham, W.-H., Cowman, A.F., McFadden, G.I., and Waller, R.F. (2008).
Alveolins, a new family of cortical proteins that define the protist infrakingdom Alveolata. Mol. Biol. Evol. 25, 1219–1230.
Gould, S.B., Kraft, L.G.K., van Dooren, G.G., Goodman, C.D., Ford, K.L., Cassin, A.M., Bacic, A., McFadden, G.I., and Waller, R.F. (2011). Ciliate pellicular proteome identifies novel protein families with characteristic repeat motifs that are common to alveolates. Mol. Biol. Evol. 28, 1319–1331.
Grauvogel, C., Reece, K.S., Brinkmann, H., and Petersen, J. (2007). Plastid Isoprenoid Metabolism in the Oyster Parasite Perkinsus marinus Connects Dinoflagellates and Malaria Pathogens—New Impetus for Studying Alveolates. J. Mol. Evol. 65, 725–729.
Gray, M.W., Lukes, J., Archibald, J.M., Keeling, P.J., and Doolittle, W.F. (2010). Irremediable complexity? Science 330, 920–921.
Green, B.R. (2003). The Evolution of Light-Harvesting Antennas. In Light-Harvesting Antennas in Photosynthesis, (Netherlands: Kluwer Academic Publishers), pp. 129–168.
Greider, C.W., and Blackburn, E.H. (1985). Identification of a specific telomere terminal transferase activity in Tetrahymena extracts. Cell 43, 405–413.
Groth, G. (2002). Structure of spinach chloroplast F1-ATPase complexed with the phytopathogenic inhibitor tentoxin. Proc. Natl. Acad. Sci. 99, 3464–3468.
Guillou, L., Viprey, M., Chambouvet, A., Welsh, R.M., Kirkham, A.R., Massana, R., Scanlan, D.J., and Worden, A.Z. (2008). Widespread occurrence and genetic diversity of marine parasitoids belonging to Syndiniales (Alveolata). Environ. Microbiol. 10, 3349–3365.
Guindon, S., and Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321.
Haberle, R.C., Fourcade, H.M., Boore, J.L., and Jansen, R.K. (2008).
Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes. J. Mol. Evol. 66, 350–361.
Hackett, J.D., Yoon, H.S., Soares, M.B., Bonaldo, M.F., Casavant, T.L., Scheetz, T.E., Nosenko, T., and Bhattacharya, D. (2004). Migration of the plastid genome to the nucleus in a peridinin dinoflagellate. Curr. Biol. 14, 213–218.
Hackett, J.D., Yoon, H.S., Li, S., Reyes-Prieto, A., Rümmele, S.E., and Bhattacharya, D. (2007). Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of rhizaria with chromalveolates. Mol. Biol. Evol. 24, 1702–1713.
Hagopian, J.C., Reis, M., Kitajima, J.P., Bhattacharya, D., and de Oliveira, M.C. (2004). Comparative analysis of the complete plastid genome sequence of the red alga Gracilaria tenuistipitata var. liui provides insights into the evolution of rhodoplasts and their relationship to other plastids. J. Mol. Evol. 59, 464–477.
Hall, T.A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98.
Heintzelman, M.B., and Schwartzman, J.D. (1997). A novel class of unconventional myosins from Toxoplasma gondii. J. Mol. Biol. 271, 139–146.
Herranen, M., Battchikova, N., Zhang, P., Graf, A., Sirpiö, S., Paakkarinen, V., and Aro, E.-M. (2004). Towards functional proteomics of membrane protein complexes in Synechocystis sp. PCC 6803. Plant Physiol. 134, 470–481.
Hikosaka, K., Watanabe, Y.-I., Tsuji, N., Kita, K., Kishine, H., Arisue, N., Palacpac, N.M.Q., Kawazu, S.-I., Sawai, H., Horii, T., et al. (2010). Divergence of the mitochondrial genome structure in the apicomplexan parasites, Babesia and Theileria. Mol. Biol. Evol. 27, 1107–1116.
Hikosaka, K., Nakai, Y., Watanabe, Y., Tachibana, S.-I., Arisue, N., Palacpac, N.M.Q., Toyama, T., Honma, H., Horii, T., Kita, K., et al. (2011).
Concatenated mitochondrial DNA of the coccidian parasite Eimeria tenella. Mitochondrion 11, 273–278.
Hikosaka, K., Kita, K., and Tanabe, K. (2013). Diversity of mitochondrial genome structure in the phylum Apicomplexa. Mol. Biochem. Parasitol. 188, 26–33.
Hiller, R.G. (2001). “Empty” minicircles and petB/atpA and psbD/psbE (cytb559 alpha) genes in tandem in Amphidinium carterae plastid DNA. FEBS Lett. 505, 449–452.
Hu, K., Roos, D.S., and Murray, J.M. (2002). A novel polymer of tubulin forms the conoid of Toxoplasma gondii. J. Cell Biol. 156, 1039–1050.
Hu, K., Johnson, J., Florens, L., Fraunholz, M., Suravajjala, S., DiLullo, C., Yates, J., Roos, D.S., and Murray, J.M. (2006). Cytoskeletal components of an invasion machine--the apical complex of Toxoplasma gondii. PLoS Pathog. 2, e13.
Huelsenbeck, J.P., and Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755.
Hunfeld, K., Hildebrandt, A., and Gray, J. (2008). Babesiosis: Recent insights into an ancient disease. Int. J. Parasitol. 38, 1219–1237.
Jackson, C.J., Gornik, S.G., and Waller, R.F. (2011). The Mitochondrial Genome and Transcriptome of the Basal Dinoflagellate Hematodinium sp.: Character Evolution within the Highly Derived Mitochondrial Genomes of Dinoflagellates. Genome Biol. Evol. 4, 59–72.
Janouškovec, J., Horák, A., Oborník, M., Lukeš, J., and Keeling, P.J. (2010). A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc. Natl. Acad. Sci. U. S. A. 107, 10949–10954.
Janouškovec, J., Horák, A., Barott, K.L., Rohwer, F.L., and Keeling, P.J. (2012). Global analysis of plastid diversity reveals apicomplexan-related lineages in coral reefs. Curr. Biol. 22, R518–9.
Järvi, S., Suorsa, M., Paakkarinen, V., and Aro, E. (2011). Optimized native gel systems for separation of thylakoid protein complexes: novel super- and mega-complexes. Biochem. J. 439, 207–214.
Jordan, P., Fromme, P., Witt, H.T., Klukas, O., Saenger, W., and Krauss, N. (2001). Three-dimensional structure of cyanobacterial photosystem I at 2.5 Å resolution. Nature 411, 909–917.
Kairo, A., Fairlamb, A.H., Gobright, E., and Nene, V. (1994). A 7.1 kb linear DNA molecule of Theileria parva has scrambled rDNA sequences and open reading frames for mitochondrially encoded proteins. EMBO J. 13, 898–905.
Kamikawa, R., Inagaki, Y., and Sako, Y. (2007). Fragmentation of mitochondrial large subunit rRNA in the dinoflagellate Alexandrium catenella and the evolution of rRNA structure in alveolate mitochondria. Protist 158, 239–245.
Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol.
Katoh, K., and Toh, H. (2008). Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9, 286–298.
Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005). MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518.
Kats, L.M., Black, C.G., Proellocks, N.I., and Coppel, R.L. (2006). Plasmodium rhoptries: how things went pear-shaped. Trends Parasitol. 22, 269–276.
Khan, H., Parks, N., Kozera, C., Curtis, B.A., Parsons, B.J., Bowman, S., and Archibald, J.M. (2007). Plastid genome sequence of the cryptophyte alga Rhodomonas salina CCMP1319: lateral transfer of putative DNA replication machinery and a test of chromist plastid phylogeny. Mol. Biol. Evol. 24, 1832–1842.
Kim, E., Harrison, J.W., Sudek, S., Jones, M.D.M., Wilcox, H.M., Richards, T.A., Worden, A.Z., and Archibald, J.M. (2011). Newly identified and diverse plastid-bearing branch on the eukaryotic tree of life. Proc. Natl. Acad. Sci. U. S. A. 108, 1496–1500.
Kim, J., Eichacker, L.A., Rudiger, W., and Mullet, J.E. (1994). Chlorophyll regulates accumulation of the plastid-encoded chlorophyll proteins P700 and D1 by increasing apoprotein stability.
Plant Physiol. 104, 907–916.
Klaus, J.S., Janse, I., Heikoop, J.M., Sanford, R.A., and Fouke, B.W. (2007). Coral microbial communities, zooxanthellae and mucus along gradients of seawater depth and coastal pollution. Environ. Microbiol. 9, 1291–1305.
Klaus, J.S., Janse, I., and Fouke, B.W. (2011). Coral Black Band Disease Microbial Communities and Genotypic Variability of the Dominant Cyanobacteria (CD1C11). Bull. Mar. Sci. 87, 795–821.
Köhler, S., Delwiche, C.F., Denny, P.W., Tilney, L.G., Webster, P., Wilson, R.J., Palmer, J.D., and Roos, D.S. (1997). A plastid of probable green algal origin in Apicomplexan parasites. Science 275, 1485–1489.
Kolodner, R.D., and Tewari, K.K. (1975). Chloroplast DNA from higher plants replicates by both the Cairns and the rolling circle mechanism. Nature 256, 708–711.
Kořený, L., Sobotka, R., Janouškovec, J., Keeling, P.J., and Oborník, M. (2011). Tetrapyrrole synthesis of photosynthetic chromerids is likely homologous to the unusual pathway of apicomplexan parasites. Plant Cell 23, 3454–3462.
Krishnan, N.M., and Rao, B.J. (2009). A comparative approach to elucidate chloroplast genome replication. BMC Genomics 10, 237.
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J., and Marra, M.A. (2009). Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645.
Kuvardina, O.N., Leander, B.S., Aleshin, V.V., Myl’nikov, A.P., Keeling, P.J., and Simdyanov, T.G. (2002). The phylogeny of colpodellids (Alveolata) using small subunit rRNA gene sequences suggests they are the free-living sister group to apicomplexans. J. Eukaryot. Microbiol. 49, 498–504.
Laatsch, T., Zauner, S., Stoebe-Maier, B., Kowallik, K.V., and Maier, U.-G. (2004). Plastid-derived single gene minicircles of the dinoflagellate Ceratium horridum are localized in the nucleus. Mol. Biol. Evol. 21, 1318–1322.
LaJeunesse, T.C., Lambert, G., Andersen, R.A., Coffroth, M.A., and Galbraith, D.W. (2005).
Symbiodinium (Pyrrhophyta) genome sizes (DNA content) are smallest among dinoflagellates. J. Phycol. 41, 880–886.
Lal, K., Prieto, J.H., Bromley, E., Sanderson, S.J., Yates, J.R., Wastling, J.M., Tomley, F.M., and Sinden, R.E. (2009). Characterisation of Plasmodium invasive organelles; an ookinete microneme proteome. Proteomics 9, 1142–1151.
Lalremruata, A., Ball, M., Bianucci, R., Welte, B., Nerlich, A.G., Kun, J.F.J., and Pusch, C.M. (2013). Molecular Identification of Falciparum Malaria and Human Tuberculosis Co-Infections in Mummies from the Fayum Depression (Lower Egypt). PLoS One 8, e60307.
Lane, C.E., and Archibald, J.M. (2008). The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol. Evol. 23, 268–275.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
Lartillot, N., and Philippe, H. (2004). A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109.
Lartillot, N., Lepage, T., and Blanquart, S. (2009). PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288.
Lau, A.O.T., McElwain, T.F., Brayton, K.A., Knowles, D.P., and Roalson, E.H. (2009). Babesia bovis: a comprehensive phylogenetic analysis of plastid-encoded genes supports green algal origin of apicoplasts. Exp. Parasitol. 123, 236–243.
Leander, B.S. (2008a). Marine gregarines: evolutionary prelude to the apicomplexan radiation? Trends Parasitol. 24, 60–67.
Leander, B.S. (2008b). A hierarchical view of convergent evolution in microbial eukaryotes. J. Eukaryot. Microbiol. 55, 59–68.
Leander, B.S., and Keeling, P.J. (2003). Morphostasis in alveolate evolution. Trends Ecol. Evol. 18, 395–402.
Leander, B.S., Kuvardina, O.N., Aleshin, V.V., Mylnikov, A.P., and Keeling, P.J. (2003).
Molecular phylogeny and surface morphology of Colpodella edax (Alveolata): insights into the phagotrophic ancestry of apicomplexans. J. Eukaryot. Microbiol. 50, 334–340.
Lee, R.E., and Kugrens, P. (1992). Relationship between the flagellates and the ciliates. Microbiol. Rev. 56, 529–542.
Lee, R.E., Kugrens, P., and Mylnikov, A.P. (1991). Feeding apparatus of the colorless flagellate Katablepharis (Cryptophyceae). J. Phycol. 27, 725–733.
Lekomtsev, S.A. (2007). Nonstandard genetic codes and translation termination. Mol. Biol. 41, 878–885.
Lepère, C., Vaulot, D., and Scanlan, D.J. (2009). Photosynthetic picoeukaryote community structure in the South East Pacific Ocean encompassing the most oligotrophic waters on Earth. Environ. Microbiol. 11, 3105–3117.
Leung, S.K., and Wong, J.T.Y. (2009). The replication of plastid minicircles involves rolling circle intermediates. Nucleic Acids Res. 37, 1991–2002.
Lilly, J.W., Havey, M.J., Jackson, S.A., and Jiang, J. (2001). Cytogenomic analyses reveal the structural plasticity of the chloroplast genome in higher plants. Plant Cell 13, 245–254.
López-García, P., Rodríguez-Valera, F., Pedrós-Alió, C., and Moreira, D. (2001). Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton. Nature 409, 603–607.
Lukes, J., Leander, B.S., and Keeling, P.J. (2009). Cascades of convergent evolution: the corresponding evolutionary histories of euglenozoans and dinoflagellates. Proc. Natl. Acad. Sci. U. S. A. 106 Suppl, 9963–9970.
Magoc, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963.
Majeed, W., Zhang, Y., Xue, Y., Ranade, S., Blue, R.N., Wang, Q., and He, Q. (2012). RpaA Regulates the Accumulation of Monomeric Photosystem I and PsbA under High Light Conditions in Synechocystis sp. PCC 6803. PLoS One 7, e45139.
Masuda, I., Matsuzaki, M., and Kita, K. (2010).
Extensive frameshift at all AGG and CCC codons in the mitochondrial cytochrome c oxidase subunit 1 gene of Perkinsus marinus (Alveolata; Dinoflagellata). Nucleic Acids Res. 38, 6186–6194.
Matsuzaki, M., Kikuchi, T., Kita, K., Kojima, S., and Kuroiwa, T. (2001). Large amounts of apicoplast nucleoid DNA and its segregation in Toxoplasma gondii. Protoplasma 218, 180–191.
Matsuzaki, M., Kuroiwa, H., Kuroiwa, T., Kita, K., and Nozaki, H. (2008). A cryptic algal group unveiled: a plastid biosynthesis pathway in the oyster parasite Perkinsus marinus. Mol. Biol. Evol. 25, 1167–1179.
Mazor, Y., Greenberg, I., Toporik, H., Beja, O., and Nelson, N. (2012). The evolution of photosystem I in light of phage-encoded reaction centres. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 367, 3400–3405.
McEwan, M.L., Humayun, R., Slamovits, C.H., and Keeling, P.J. (2008). Nuclear Genome Sequence Survey of the Dinoflagellate Heterocapsa triquetra. J. Eukaryot. Microbiol. 55, 530–535.
McFadden, G.I., and Waller, R.F. (1997). Plastids in parasites of humans. BioEssays 19, 1033–1040.
McFadden, G.I., Reith, M.E., Munholland, J., and Lang-Unnasch, N. (1996). Plastid in human parasites. Nature 381, 482.
Meissner, M. (2002). Role of Toxoplasma gondii Myosin A in Powering Parasite Gliding and Host Cell Invasion. Science 298, 837–840.
Mercier, C., Adjogble, K.D.Z., Däubener, W., and Delauw, M.-F.-C. (2005). Dense granules: Are they key organelles to help understand the parasitophorous vacuole of all apicomplexa parasites? Int. J. Parasitol. 35, 829–849.
Merendino, L., Perron, K., Rahire, M., Howald, I., Rochaix, J.-D., and Goldschmidt-Clermont, M. (2006). A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas. Nucleic Acids Res. 34, 262–274.
Moore, R.B. (2003). Highly organized structure in the non-coding region of the psbA minicircle from clade C Symbiodinium. Int. J. Syst. Evol. Microbiol. 53, 1725–1734.
Moore, R.B., Oborník, M., Janouškovec, J., Chrudimský, T., Vancová, M., Green, D.H., Wright, S.W., Davies, N.W., Bolch, C.J.S., Heimann, K., et al. (2008). A photosynthetic alveolate closely related to apicomplexan parasites. Nature 451, 959–963.
Morse, D., Salois, P., Markovic, P., and Hastings, J.W. (1995). A nuclear-encoded form II RuBisCO in dinoflagellates. Science 268, 1622–1624.
Moustafa, A., and Bhattacharya, D. (2008). PhyloSort: a user-friendly phylogenetic sorting tool and its application to estimating the cyanobacterial contribution to the nuclear genome of Chlamydomonas. BMC Evol. Biol. 8, 6.
Moustafa, A., Beszteri, B., Maier, U.G., Bowler, C., Valentin, K., and Bhattacharya, D. (2009). Genomic footprints of a cryptic plastid endosymbiosis in diatoms. Science 324, 1724–1726.
Mylnikov, A.P. (2009). Ultrastructure and phylogeny of colpodellids (Colpodellida, Alveolata). Biol. Bull. 36, 582–590.
Mylnikov, A., and Mylnikova, Z. (2008). Feeding spectra and pseudoconoid structure in predatory alveolate flagellates. Inland Water Biol. 1, 210–216.
Mylnikov, A.P., and Tikhonenkov, D.V. (2007). A new species of soil predatory flagellate, Colponema edaphicum sp. n., from Vorontsovskaya Cave, North Caucasus (Protista, Alveolata: Colponemidae). Zoosystematica Ross. 16, 1–4.
Myl’nikova, Z.M., and Myl’nikov, A.P. (2010). Biology and morphology of freshwater rapacious flagellate Colponema aff. loxodes Stein (Colponema, Alveolata). Inland Water Biol. 3, 21–26.
Myshrall, K.L., Mobberley, J.M., Green, S.J., Visscher, P.T., Havemann, S.A., Reid, R.P., and Foster, J.S. (2010). Biogeochemical cycling and microbial diversity in the thrombolitic microbialites of Highborne Cay, Bahamas. Geobiology 8, 337–354.
Nash, E.A., Barbrook, A.C., Edwards-Stuart, R.K., Bernhardt, K., Howe, C.J., and Nisbet, R.E.R. (2007). Organization of the mitochondrial genome in the dinoflagellate Amphidinium carterae. Mol. Biol. Evol. 24, 1528–1536.
Nassoury, N., Cappadocia, M., and Morse, D. (2003). Plastid ultrastructure defines the protein import pathway in dinoflagellates. J. Cell Sci. 116, 2867–2874.
Nelson, N., and Yocum, C.F. (2006). Structure and function of photosystems I and II. Annu. Rev. Plant Biol. 57, 521–565.
Nelson, M.J., Dang, Y., Filek, E., Zhang, Z., Yu, V.W.C., Ishida, K., and Green, B.R. (2007). Identification and transcription of transfer RNA genes in dinoflagellate plastid minicircles. Gene 392, 291–298.
Nisbet, R.E.R., Koumandou, L.V., Barbrook, A.C., and Howe, C.J. (2004). Novel plastid gene minicircles in the dinoflagellate Amphidinium operculatum. Gene 331, 141–147.
Nishi, M., Hu, K., Murray, J.M., and Roos, D.S. (2008). Organellar dynamics during the cell cycle of Toxoplasma gondii. J. Cell Sci. 121, 1559–1568.
Nosek, J., Tomáška, L., and Kucejová, B. (2004). The chromosome end replication: lessons from mitochondrial genetics. J. Appl. Biomedecine 2, 71–79.
Oborník, M., Janouškovec, J., Chrudimský, T., and Lukeš, J. (2009). Evolution of the apicoplast and its hosts: from heterotrophy to autotrophy and back again. Int. J. Parasitol. 39, 1–12.
Oborník, M., Vancová, M., Lai, D.-H., Janouškovec, J., Keeling, P.J., and Lukeš, J. (2011). Morphology and ultrastructure of multiple life cycle stages of the photosynthetic relative of apicomplexa, Chromera velia. Protist 162, 115–130.
Oborník, M., Modrý, D., Lukeš, M., Cernotíková-Stříbrná, E., Cihlář, J., Tesařová, M., Kotabová, E., Vancová, M., Prášil, O., Lukeš, J., et al. (2012). Morphology, ultrastructure and life cycle of Vitrella brassicaformis n. sp., n. gen., a novel chromerid from the Great Barrier Reef. Protist 163, 306–323.
Okamoto, N., Horák, A., and Keeling, P.J. (2012). Description of Two Species of Early Branching Dinoflagellates, Psammosa pacifica n. g., n. sp. and P. atlantica n. sp. PLoS One 7, e34900.
Oldenburg, D.J., and Bendich, A.J. (2004).
Most chloroplast DNA of maize seedlings in linear molecules with defined ends and branched forms. J. Mol. Biol. 335, 953–970.
Ozawa, S.-I., Nield, J., Terao, A., Stauber, E.J., Hippler, M., Koike, H., Rochaix, J.-D., and Takahashi, Y. (2009). Biochemical and Structural Studies of the Large Ycf4-Photosystem I Assembly Complex of the Green Alga Chlamydomonas reinhardtii. Plant Cell 21, 2424–2442.
Palmer, J. (1983). Chloroplast DNA exists in two orientations. Nature 301, 92–93.
Pappas, G., Roussos, N., and Falagas, M.E. (2009). Toxoplasmosis snapshots: Global status of Toxoplasma gondii seroprevalence and implications for pregnancy and congenital toxoplasmosis. Int. J. Parasitol. 39, 1385–1394.
Patron, N.J., Inagaki, Y., and Keeling, P.J. (2007). Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr. Biol. 17, 887–891.
Petkau, A., Stuart-Edwards, M., Stothard, P., and Van Domselaar, G. (2010). Interactive microbial genome visualization with GView. Bioinformatics 26, 3125–3126.
Pino, P., Aeby, E., Foth, B.J., Sheiner, L., Soldati, T., Schneider, A., and Soldati-Favre, D. (2010). Mitochondrial translation in absence of local tRNA aminoacylation and methionyl tRNA formylation in Apicomplexa. Mol. Microbiol. 76, 706–718.
Prescott, D.M. (2000). Genome gymnastics: unique modes of DNA evolution and processing in ciliates. Nat. Rev. Genet. 1, 191–198.
Price, M.N., Dehal, P.S., and Arkin, A.P. (2010). FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490.
Pritchard, A.E., Seilhamer, J.J., Mahalingam, R., Sable, C.L., Venuti, S.E., and Cummings, D.J. (1990). Nucleotide sequence of the mitochondrial genome of Paramecium. Nucleic Acids Res. 18, 173–180.
Prugnolle, F., Durand, P., Ollomo, B., Duval, L., Ariey, F., Arnathau, C., Gonzalez, J.-P., Leroy, E., and Renaud, F. (2011). A fresh look at the origin of Plasmodium falciparum, the most malignant malaria agent. PLoS Pathog.
7, e1001283.
Quang, L.S., Gascuel, O., and Lartillot, N. (2008). Empirical profile mixture models for phylogenetic reconstruction. Bioinformatics 24, 2317–2323.
Quigg, A., Kotabová, E., Jarešová, J., Kaňa, R., Setlík, J., Sedivá, B., Komárek, O., and Prášil, O. (2012). Photosynthesis in Chromera velia Represents a Simple System with High Efficiency. PLoS One 7, e47036.
Ralph, S.A., van Dooren, G.G., Waller, R.F., Crawford, M.J., Fraunholz, M.J., Foth, B.J., Tonkin, C.J., Roos, D.S., and McFadden, G.I. (2004). Tropical infectious diseases: metabolic maps and functions of the Plasmodium falciparum apicoplast. Nat. Rev. Microbiol. 2, 203–216.
Rehkopf, D.H., Gillespie, D.E., Harrell, M.I., and Feagin, J.E. (2000). Transcriptional mapping and RNA processing of the Plasmodium falciparum mitochondrial mRNAs. Mol. Biochem. Parasitol. 105, 91–103.
Reis, A.M.M., Araújo, S.D., Moura, R.L., Francini-Filho, R.B., Pappas, G., Coelho, A.M.A., Krüger, R.H., and Thompson, F.L. (2009). Bacterial diversity associated with the Brazilian endemic reef coral Mussismilia braziliensis. J. Appl. Microbiol. 106, 1378–1387.
Rekosh, D.M., Russell, W.C., Bellet, A.J., and Robinson, A.J. (1977). Identification of a protein linked to the ends of adenovirus DNA. Cell 11, 283–295.
Reyes-Prieto, A., Moustafa, A., and Bhattacharya, D. (2008). Multiple genes of apparent algal origin suggest ciliates may once have been photosynthetic. Curr. Biol. 18, 956–962.
Rice, D.W., and Palmer, J.D. (2006). An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol. 4, 31.
Rueckert, S., and Leander, B.S. (2010). Description of Trichotokara nothriae n. gen. et sp. (Apicomplexa, Lecudinidae) – An intestinal gregarine of Nothria conchylega (Polychaeta, Onuphidae). J. Invertebr. Pathol. 104, 172–179.
Saldarriaga, J.F. (2003).
Multiple protein phylogenies show that Oxyrrhis marina and Perkinsus marinus are early branches of the dinoflagellate lineage. Int. J. Syst. Evol. Microbiol. 53, 355–365.
Sam-Yellowe, T.Y., Del Rio, R.A., Fujioka, H., Aikawa, M., Yang, J.C., and Yakubu, Z. (1998). Isolation of merozoite rhoptries, identification of novel rhoptry-associated proteins from Plasmodium yoelii, P. chabaudi, P. berghei, and conserved interspecies reactivity of organelles and proteins with P. falciparum rhoptry-specific antibodies. Exp. Parasitol. 89, 271–284.
Sanchez-Puerta, M.V., and Delwiche, C.F. (2008). A hypothesis for plastid evolution in chromalveolates. J. Phycol. 44, 1097–1107.
Sánchez-Silva, R., Villalobo, E., Morin, L., and Torres, A. (2003). A new noncanonical nuclear genetic code: translation of UAA into glutamate. Curr. Biol. 13, 442–447.
Scharff, L.B., and Koop, H.-U. (2006). Linear molecules of tobacco ptDNA end at known replication origins and additional loci. Plant Mol. Biol. 62, 611–621.
Schattner, P., Brooks, A.N., and Lowe, T.M. (2005). The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, W686–9.
Schmidt, H.A., Strimmer, K., Vingron, M., and von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.
Schnepf, E., and Elbrachter, M. (1999). Dinophyte chloroplasts and phylogeny-A review. Grana 37–41.
Schubert, W.D., Klukas, O., Saenger, W., Witt, H.T., Fromme, P., and Krauss, N. (1998). A common ancestor for oxygenic and anoxygenic photosynthetic systems: a comparison based on the structural model of photosystem I. J. Mol. Biol. 280, 297–314.
Shalchian-Tabrizi, K., Skånseng, M., Ronquist, F., Klaveness, D., Bachvaroff, T.R., Delwiche, C.F., Botnen, A., Tengs, T., and Jakobsen, K.S. (2006).
Heterotachy processes in rhodophyte-derived secondhand plastid genes: Implications for addressing the origin and evolution of dinoflagellate plastids. Mol. Biol. Evol. 23, 1504–1515.
Shaw, M.K., Compton, H.L., Roos, D.S., and Tilney, L.G. (2000). Microtubules, but not actin filaments, drive daughter cell budding and cell division in Toxoplasma gondii. J. Cell Sci. 113, 1241–1254.
Shi, X.L., Lepère, C., Scanlan, D.J., and Vaulot, D. (2011). Plastid 16S rRNA Gene Diversity among Eukaryotic Picophytoplankton Sorted by Flow Cytometry from the South Pacific Ocean. PLoS One 6, e18979.
Silberman, J.D., Collins, A.G., Gershwin, L.-A., Johnson, P.J., and Roger, A.J. (2004). Ellobiopsids of the genus Thalassomyces are alveolates. J. Eukaryot. Microbiol. 51, 246–252.
Simpson, A.G.B., and Patterson, D.J. (1996). Ultrastructure and identification of the predatory flagellate Colpodella pugnax Cienkowski (Apicomplexa) with a description of Colpodella turpis n. sp. and a review of the genus. Syst. Parasitol. 33, 187–198.
Skariah, S., Bednarczyk, R.B., McIntyre, M.K., Taylor, G.A., and Mordue, D.G. (2012). Discovery of a Novel Toxoplasma gondii Conoid-Associated Protein Important for Parasite Resistance to Reactive Nitrogen Intermediates. J. Immunol. 188, 3404–3415.
Slamovits, C.H., and Keeling, P.J. (2008a). Plastid-derived genes in the nonphotosynthetic alveolate Oxyrrhis marina. Mol. Biol. Evol. 25, 1297–1306.
Slamovits, C.H., and Keeling, P.J. (2008b). Widespread recycling of processed cDNAs in dinoflagellates. Curr. Biol. 18, R550–2.
Slamovits, C.H., Saldarriaga, J.F., Larocque, A., and Keeling, P.J. (2007). The highly reduced and fragmented mitochondrial genome of the early-branching dinoflagellate Oxyrrhis marina shares characteristics with both apicomplexan and dinoflagellate mitochondrial genomes. J. Mol. Biol. 372, 356–368.
Smith, D.G.S., Gawryluk, R.M.R., Spencer, D.F., Pearlman, R.E., Siu, K.W.M., and Gray, M.W. (2007).
Exploring the mitochondrial proteome of the ciliate protozoon Tetrahymena thermophila: direct analysis by tandem mass spectrometry. J. Mol. Biol. 374, 837–863.
Sobotka, R., Duhring, U., Komenda, J., Peter, E., Gardian, Z., Tichy, M., Grimm, B., and Wilde, A. (2008). Importance of the Cyanobacterial Gun4 Protein for Chlorophyll Metabolism and Assembly of Photosynthetic Complexes. J. Biol. Chem. 283, 25794–25802.
Sørensen, K., Glazer, B., Hannides, A., and Gaidos, E. (2007). Spatial structure of the microbial community in sandy carbonate sediment. Mar. Ecol. Prog. Ser. 346, 61–74.
Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.
Stelly, N., Mauger, J.-P., Claret, M., and Adoutte, A. (1991). Cortical alveoli of Paramecium: a vast submembranous calcium storage compartment. J. Cell Biol. 113, 103–112.
Stelter, K., El-Sayed, N.M., and Seeber, F. (2007). The expression of a plant-type ferredoxin redox system provides molecular evidence for a plastid in the early dinoflagellate Perkinsus marinus. Protist 158, 119–130.
Sunagawa, S., Woodley, C.M., and Medina, M. (2010). Threatened corals provide underexplored microbial habitats. PLoS One 5, e9554.
Swart, E.C., Nowacki, M., Shum, J., Stiles, H., Higgins, B.P., Doak, T.G., Schotanus, K., Magrini, V.J., Minx, P., Mardis, E.R., et al. (2011). The Oxytricha trifallax Mitochondrial Genome. Genome Biol. Evol. 4, 136–154.
Tabita, F.R., Hanson, T.E., Satagopan, S., Witte, B.H., and Kreel, N.E. (2008). Phylogenetic and evolutionary relationships of RubisCO and the RubisCO-like proteins and the functional lessons provided by diverse molecular forms. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 363, 2629–2640.
Takishita, K., and Uchida, A. (1999). Molecular cloning and nucleotide sequence analysis of psbA from the dinoflagellates: Origin of the dinoflagellate plastid. Phycol. Res. 47, 207–216.
Taylor, F.J.R. (1976).
Flagellate Phylogeny: A Study in Conflicts. J. Eukaryot. Microbiol. 23, 28–40.
Toller, W., Rowan, R., and Knowlton, N. (2002). Genetic evidence for a protozoan (phylum Apicomplexa) associated with corals of the Montastraea annularis species complex. Coral Reefs 21, 143–146.
Tomaska, L., Makhov, A.M., Griffith, J.D., and Nosek, J. (2002). t-Loops in yeast mitochondria. Mitochondrion 1, 455–459.
Tomáska, L., Nosek, J., and Fukuhara, H. (1997). Identification of a putative mitochondrial telomere-binding protein of the yeast Candida parapsilosis. J. Biol. Chem. 272, 3049–3056.
Tomova, C., Geerts, W.J.C., Müller-Reichert, T., Entzeroth, R., and Humbel, B.M. (2006). New comprehension of the apicoplast of Sarcocystis by transmission electron tomography. Biol. Cell 98, 535–545.
Tyler, B.M., Tripathy, S., Zhang, X., Dehal, P., Jiang, R.H.Y., Aerts, A., Arredondo, F.D., Baxter, L., Bensasson, D., Beynon, J.L., et al. (2006). Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science 313, 1261–1266.
Venn, A.A., Loram, J.E., and Douglas, A.E. (2008). Photosynthetic symbioses in animals. J. Exp. Bot. 59, 1069–1080.
Vondrušková, E., van den Burg, J., Zíková, A., Ernst, N.L., Stuart, K., Benne, R., and Lukeš, J. (2005). RNA interference analyses suggest a transcript-specific regulatory role for mitochondrial RNA-binding proteins MRP1 and MRP2 in RNA editing and other RNA processing in Trypanosoma brucei. J. Biol. Chem. 280, 2429–2438.
Wakeman, K.C., and Leander, B.S. (2012). Molecular phylogeny of Pacific Archigregarines (Apicomplexa), including descriptions of Veloxidium leptosynaptae n. gen., n. sp., from the sea cucumber Leptosynapta clarki (Echinodermata), and two new species of Selenidium. J. Eukaryot. Microbiol. 59, 232–245.
Wakeman, K.C., and Leander, B.S. (2013). Identity of environmental DNA sequences using descriptions of four novel marine gregarine parasites, Polyplicarium n. gen.
(Apicomplexa), from capitellid polychaetes. Mar. Biodivers. 43, 133–147. Waller, R.F., and Jackson, C.J. (2009). Dinoflagellate mitochondrial genomes: stretching the rules of molecular biology. BioEssays 31, 237–245. Waller, R.F., and Keeling, P.J. (2006). Alveolate and chlorophycean mitochondrial cox2 genes split twice independently. Gene 383, 33–37. Waller, R.F., Reed, M.B., Cowman, A.F., and Mcfadden, G.I. (2000). Protein trafficking to the plastid of Plasmodium falciparum is via the secretory pathway. Embo J. 19, 1794–1802. Waller, R.F., Keeling, P.J., van Dooren, G.G., and McFadden, G.I. (2003). Comment on “A green algal apicoplast ancestor”. Science 301, 49; author reply 49–49; author reply 49. 160  Wang, Y., and Morse, D. (2006). Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium. Nucleic Acids Res. 34, 613–619. Westhoff, P., Alt, J., Nelson, N., Bottomley, W., Bünemann, H., and Herrmann, R.G. (1983). Genes and transcripts for the P700 chlorophylla apoprotein and subunit 2 of the photosystem I reaction center complex from spinach thylakoid membranes. Plant Mol. Biol. 2, 95–107. Whitney, S.M., Shaw, D.C., and Yellowlees, D. (1995). Evidence that some dinoflagellates contain a ribulose-1,5-bisphosphate carboxylase/oxygenase related to that of the alpha-proteobacteria. Proc. R. Soc. B Biol. Sci. 259, 271–275. Williamson, D.H., Gardner, M.J., Preiser, P., Moore, D.J., Rangachari, K., and Wilson, R.J. (1994). The evolutionary origin of the 35 kb circular DNA of Plasmodium falciparum: new evidence supports a possible rhodophyte ancestry. Mol. Genomics Genet. 243, 249–252. Williamson, D.H., Denny, P.W., Moore, P.W., Sato, S., McCready, S., and Wilson, R.J. (2001). The in vivo conformation of the plastid DNA of Toxoplasma gondii: implications for replication. J. Mol. Biol. 306, 159–168. Williamson, D.H., Preiser, P.R., Moore, P.W., McCready, S., Strath, M., and Wilson, R.J.M. (2002). 
The plastid DNA of the malaria parasite Plasmodium falciparum is replicated by two mechanisms. Mol. Microbiol. 45, 533–542. Wilson, R.J., Denny, P.W., Preiser, P.R., Rangachari, K., Roberts, K., Roy, A., Whyte, A., Strath, M., Moore, D.J., Moore, P.W., et al. (1996). Complete gene map of the plastid-like DNA of the malaria parasite Plasmodium falciparum. J. Mol. Biol. 261, 155–172. Wittig, I., and Schägger, H. (2008). Features and applications of blue-native and clear-native electrophoresis. PROTEOMICS 8, 3974–3990. Wolters, J. (1991). The troublesome parasites—molecular and morphological evidence that Apicomplexa belong to the dinoflagellate-ciliate clade. Biosystems 25, 75–83. Worden, A.Z., Janouskovec, J., McRose, D., Engman, A., Welsh, R.M., Malfatti, S., Tringe, S.G., and Keeling, P.J. (2012). Global distribution of a wild alga revealed by targeted metagenomics. Curr. Biol. 22, R675–R677. Yeh, E., and DeRisi, J.L. (2011). Chemical Rescue of Malaria Parasites Lacking an Apicoplast Defines Organelle Function in Blood-Stage Plasmodium falciparum. Plos Biol. 9, e1001138–e1001138. Zauner, S., Greilinger, D., Laatsch, T., Kowallik, K.V., and Maier, U.-G. (2004). Substitutional editing of transcripts from genes of cyanobacterial origin in the dinoflagellate Ceratium horridum. Febs Lett. 577, 535–538.  161  Zhang, H., Hou, Y., Miranda, L., Campbell, D.A., Sturm, N.R., Gaasterland, T., and Lin, S. (2007). Spliced leader RNA trans-splicing in dinoflagellates. Proc. Natl. Acad. Sci. U. S. A. 104, 4618–4623. Zhang, Z., Green, B.R., and Cavalier-Smith, T. (1999). Single gene circles in dinoflagellate chloroplast genomes. Nature 400, 155–159. Zhang, Z., Green, B.R., and Cavalier-Smith, T. (2000). Phylogeny of ultra-rapidly evolving dinoflagellate chloroplast genes: a possible common origin for sporozoan and dinoflagellate plastids. J. Mol. Evol. 51, 26–40. Zhang, Z., Cavalier-Smith, T., and Green, B.R. (2001). 
A family of selfish minicircular chromosomes with jumbled chloroplast gene fragments from a dinoflagellate. Mol. Biol. Evol. 18, 1558–1565. Zhang, Z., Cavalier-Smith, T., and Green, B.R. (2002). Evolution of dinoflagellate unigenic minicircles and the partially concerted divergence of their putative replicon origins. Mol. Biol. Evol. 19, 489–500.  162  

