UBC Theses and Dissertations

Studies on the dinoflagellate genome McEwan, Michelle Louise 2006

Full Text

STUDIES ON THE DINOFLAGELLATE GENOME

by

MICHELLE LOUISE McEWAN

B.Sc., The University of British Columbia, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA

April 2006

© Michelle Louise McEwan, 2006

ABSTRACT

Dinoflagellates are unusual eukaryotes in many ways, but one of the most interesting features of this cell, its enormous genome, is not well studied because its sheer size is an obstacle to sequencing. Genome expansion can be the result of polyploidy, intron gain, mobile genetic elements, or large intergenic regions. I have studied organellar genome reduction in Kryptoperidinium foliaceum, intron composition in Heterocapsa triquetra and Karlodinium micrum, and surveyed genomic DNA from H. triquetra in order to get a better grasp on mechanisms of genome expansion in dinoflagellates.

K. foliaceum has replaced its ancestral red algal plastid with a diatom plastid via tertiary endosymbiosis. Gene transfer from endosymbiont to host nucleus has likely occurred, but this endosymbiont is much less reduced than well-studied secondary endosymbiotic intermediates, the cryptophytes and chlorarachniophytes, where relict nuclear genomes (nucleomorphs) are retained. I sequenced the first protein-coding genes from the K. foliaceum endosymbiont and host nuclear genomes. I have characterised genes for nucleus-encoded cytosolic proteins, actin, alpha-tubulin, beta-tubulin, and HSP90, from both host and symbiont nuclei of K. foliaceum. Phylogenies show that the actin is diatom-derived and the beta-tubulin dinoflagellate-derived, while both diatom- and dinoflagellate-derived alpha-tubulin and HSP90 genes were found.
The presence of these genes implies they are still functional and that the endosymbiont is at an earlier stage of genetic reduction than those of cryptophytes or chlorarachniophytes.

Thirteen of 16 known dinoflagellate introns are non-canonical. I amplified and screened 63 K. micrum and H. triquetra genes, but found no introns. I report that introns are not abundant in dinoflagellate genomes and have not played a major role in dinoflagellate genome expansion.

I built a genomic DNA library for the dinoflagellate H. triquetra and sequenced 214 fragments (231,164 bp). Main features of this library include imperfect complex repeats, retrotransposon domains, 53% GC content, few open reading frames (ORFs), and a lack of identifiable protein-coding regions. These results support mobile elements and repeats as major sources of DNA in expanded dinoflagellate genomes. The best explanation for the huge amounts of non-coding DNA remains the idea that it functions as a structural scaffold and contributes to chromosomal organization.

Key words: Dinoflagellate, Kryptoperidinium foliaceum, Heterocapsa triquetra, Karlodinium micrum, Nitzschia thermalis, Phaeodactylum tricornutum, endosymbiosis, plastids, tertiary, secondary, gene transfer, genome reduction, genome expansion, introns, non-canonical, non-coding DNA, methylation, diatom, gene loss, plastid replacement.
TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Acknowledgements
Dedication

CHAPTER I  Introduction
1.1 Morphology, Ecology, and Lifestyle of Dinoflagellates
1.2 Biology of Dinoflagellates
1.3 Organelle Genomes in Dinoflagellates
1.4 Dinoflagellate Genomes
1.5 Expansion and Reduction of Eukaryotic Genomes
1.6 Investigating the Dinoflagellate Genome
Literature Cited

CHAPTER II  Organelle Integration in Kryptoperidinium foliaceum
2.1 Introduction
2.2 Materials and Methods
2.3 Results and Discussion
Literature Cited

CHAPTER III  Intron Distribution in Dinoflagellates
3.1 Introduction
3.2 Materials and Methods
3.3 Results and Discussion
Literature Cited

CHAPTER IV  A Genome Survey of Heterocapsa triquetra
4.1 Introduction
4.2 Materials and Methods
4.3 Results and Discussion
Literature Cited

CHAPTER V  Conclusion
Literature Cited

List of Tables

1.1 Dinoflagellates and toxins associated with Harmful Algal Blooms (HAB)
1.2 Genome sizes relative to the 3,000 Mb human genome
2.1 AT content of dinoflagellate and diatom protein coding genes
3.1 Eukaryotic non-canonical introns
4.1 Eukaryotic genome sizes

List of Figures

1.1 Morphological Diversity in Benthic Dinoflagellates
1.2 Plastid Evolution and Endosymbiosis
2.1 Alpha-tubulin Maximum Likelihood (ML) phylogeny
2.2 Beta-tubulin ML phylogeny
2.3 Actin ML phylogeny
2.4 HSP90 ML phylogeny
4.1 Size Distribution of Heterocapsa triquetra Genomic DNA Fragments
4.2 GC Content of Heterocapsa triquetra Genomic DNA Fragments

Acknowledgements

Many thanks are due to the friends, colleagues, and peers who have been such a pleasure to work and learn with. My supervisor, Patrick Keeling, has been encouraging and supportive from the very beginning.
Patrick, you have been a source of scientific energy and inspiration since the very first protistology lesson. I have learned so much these past few years, and I thank you for the opportunities, support and challenges you have given me.

My committee members Naomi Fast, Max Taylor, and Sally Otto have been excellent sources of information and guidance. I have appreciated your honest answers, helpful hints and happy laughter at committee meetings! You have all taught me a lot about science and what it means to be a scientist. I also thank the members of my examining committee, Jim Berger and Naomi Fast, for their time and effort in preparation for my defense.

Hugh Brock, the Director of the Genetics Graduate Program, has been the consummate go-to guy. Hugh, I always felt like you were willing to lend an ear no matter how much you had going on. I won't forget the times you went to bat for me over various roadblocks and bottlenecks. Your realistic perspective and calm demeanor have been refreshing and helpful. Thank you.

I thank my lab mates and fellow grad students in the Botany department for giving me a home, sharing beers, coffee, stories, coffee, techniques, coffee, and late nights at the lab. I've enjoyed many excellent discussions on the finer points of PCR voodoo and several skillful versions of the elusive PCR dance that I may never forget. My students and mentees are reminders of the excitement of learning new things. They have helped me keep perspective on where I've been and how far I've come.

And last but never least, my friends and cohorts, Jamie Pighin and Robin Young. You always have interesting advice to give and good stories to share. You make being inside all day pretty fun.

To my parents and sisters
For your love and support from the very beginning.

To my partner Tim
For sharing discussions, laughs, adventures and challenges.
Thank you for helping me maintain balance and for reminding me of the things that are most important in life.

Co-Authorship Statement

All work contained in this thesis was done by me except where noted. The contents of Chapter 2 have been published in a manuscript which was prepared with assistance from Patrick Keeling: McEwan, M.L. & Keeling, P.J. 2004. "HSP90, Tubulin and Actin are Retained in the Tertiary Endosymbiont Genome of Kryptoperidinium foliaceum." The Journal of Eukaryotic Microbiology, (51) 6, 651-659. The content of Chapter 3 is all my own work, but work by Raheel Humayun on the same library will be included in the final manuscript submission.

CHAPTER I: INTRODUCTION

1.1 Morphology, Ecology and Lifestyle of Dinoflagellates

Dinoflagellates (Dinozoa) are one of three protist phyla in the eukaryotic group Alveolata, whose members are characterized by the presence of alveolae beneath the cell membrane and tubular mitochondrial cristae. They are (mostly) unicellular organisms with cortical alveolae, two asymmetrical flagella and a unique nuclear arrangement (Taylor 1987). The other two alveolate phyla (Apicomplexa and Ciliophora) are composed of parasitic and heterotrophic members, respectively. Dinoflagellates are globally widespread and are ecologically and morphologically diverse (Figure 1.1). Their habitats also span wide ranges in temperature, salinity, light, and nutrients. Members can be planktonic or benthic, fresh water or marine, parasitic (e.g., Haplozoon), or symbiotic (e.g., Symbiodinium). Photosynthesis and heterotrophy are equally represented among species, and many 'mixotrophic' species exist. Many species produce toxins that, through concentration in food chains, cause major yearly fish kills (Pfiesteria piscicida, Gambierdiscus) and render shellfish highly poisonous or lethal (Table 1.1) (Van Dolah 2000).
Dinoflagellates are major photosynthetic producers in the world's oceans, and the mutually beneficial symbiosis between corals and the dinoflagellate genus Symbiodinium accounts for most of the photosynthetic production in the world's coral reefs (Knowlton and Rohwer 2003).

The classic external dinoflagellate morphology consists of an epicone (epitheca), a hypocone (hypotheca), and two flagella: one (transverse) that emerges ventrally and spirals counterclockwise along an equatorial groove called the girdle, and one (longitudinal) that extends posteriorly from the same origin, along a longitudinal groove called the sulcus. The transverse flagellum supplies forward propulsion and the sulcal flagellum acts as a rudder. The resulting motion is a forward twirling trajectory; the prefix dino- means "whirling" in Greek. I have used H. triquetra, K. micrum and K. foliaceum as subjects in this research, and they all have many of the classic features of armored dinoflagellates (Dodge 1985; Taylor 1987).

Figure 1.1: Morphological Diversity in Benthic Dinoflagellates. A range of marine, sand-dwelling dinoflagellates. These are some of the less elaborate forms, yet the diversity is still astonishingly broad. Chromosomes are visible in most of these images as small clear 'bubbly' forms. (Image courtesy of Mona Hoppenrath)

Table 1.1: Dinoflagellates and toxins associated with Harmful Algal Blooms (HAB) (information in this table from (Van Dolah 2000))

Toxin                         Condition                             Abbreviation  Associated dinoflagellate(s)
azaspiracids                  azaspiracid shellfish poisoning       AZP           Protoperidinium crassipes
ciguatoxin                    ciguatera fish poisoning              CFP           Gambierdiscus toxicus
diarrhetic shellfish toxins   diarrhetic shellfish poisoning        DSP           Dinophysis, Prorocentrum
brevetoxin                    neurotoxic shellfish poisoning        NSP           Karenia brevis
saxitoxin                     paralytic shellfish poisoning         PSP           Gymnodinium catenatum, Pyrodinium bahamense, Alexandrium
?
                              possible estuary associated syndrome  ?

1.2 Biology of Dinoflagellates

Dinoflagellates make an impact on our lives because of their beautiful displays of bioluminescence, their key role in the building and maintenance of coral reefs, and stunning (visually and literally) seasonal harmful algal blooms. Incredibly, these characteristics are just the beginning of the many exceptional aspects of dinoflagellate biology. When we look closely at the biology of dinoflagellates, a unique picture of eukaryotic innovation appears.

Table 1.2: Genome sizes relative to the 3,000 Mb human genome

Organism                                      Group            Relative Size
E. coli                                       Bacteria         ~650x smaller
Encephalitozoon intestinalis                  Microsporidia    ~1300x smaller
Saccharomyces cerevisiae                      Yeast            ~250x smaller
Heterocapsa pygmaea, Katodinium rotundatum    Dinoflagellates  about the same
Heterocapsa triquetra                         Dinoflagellates  ~7x larger
Gonyaulax                                     Dinoflagellates  ~24x larger
Alexandrium tamarense                         Dinoflagellates  ~30x larger
Prorocentrum micans                           Dinoflagellates  ~65x larger

Table 1.2: Genome sizes from various sources. Dinoflagellate genome size was calculated from picograms DNA (LaJeunesse, Lambert et al. 2005), assuming the average base pair = 660 Da = 1.022 × 10^-9 pg. Other sizes from (Keeling, Fast et al. 2005).

Dinoflagellates are exceptional among eukaryotes in several ways (Rizzo 2003). First, they have massive amounts of DNA in their nuclei (as much as 60x the human genome; Table 1.2), which is easily visible with light microscopy as large condensed chromosomes. Second, they manage to organize this DNA without the benefit of histones or nucleosomes of any recognizable sort, keeping it condensed throughout the entire cell cycle (Herzog and Soyer 1981). The chromosomes are arranged in what is most commonly described as a dense fibrillar or brush-like organization, with loops of DNA that unwind into the nucleoplasm during transcription (Moreno Diaz de la Espina, Alverca et al. 2005).
Basic 'histone-like' proteins have been found in dinoflagellates but they likely play a role in transcription rather than a structural role (Sala-Rovira, Geraud et al. 1991). Third, dinoflagellates undergo 'closed' mitosis where the nuclear envelope remains intact and duplicated chromatids are separated by a microtubular spindle external to the envelope that never directly contacts the chromosomes (Figure 1.2) (Graham and Wilcox 2000; Spector and Triemer 1981). Fourth, dinoflagellates show an unparalleled propensity for plastid switching and, as a result, contain a wide variety of plastids with different pigment and chlorophyll complements (Saldarriaga, Taylor et al. 2001). Fifth, the plastid genomes themselves are quite bizarre. Many plastid genes are encoded on single-gene minicircles (Zhang, Green et al. 1999) while plastid mRNAs are modified with poly-uridine tails (Wang and Morse 2006). Sixth, dinoflagellate mitochondria are also doing things differently; mitochondrial transcripts undergo post-transcriptional base modification at multiple sites, including very rare types of base modification. Finally, spliceosomal intron boundary sequences in eukaryotes are so conserved that they are referred to as 'canonical', but in dinoflagellates, of the few introns found so far, all but one have non-canonical intron boundaries (Okamoto, Liu et al. 2001; Rowan, Whitney et al. 1996; Yoshikawa, Uchida et al. 1996). Of the seven molecular features mentioned here, I have focused on organellar genome reduction in K. foliaceum, intron composition in H. triquetra and K. micrum, and general theories of genome expansion in H. triquetra.

1.3 Organelle Genomes in Dinoflagellates

Plastids

Photosynthesis occurs in plastids; in algae these plastids can be split into three groups by their endosymbiotic origins: primary, secondary, and tertiary. Primary plastids are the result of eukaryotic engulfment of phototrophic cyanobacteria and have a double membrane.
Red and green algae, plants and glaucophytes obtained their plastids this way. Secondary plastids are the result of the engulfment of an algal cell containing a primary plastid. Dinoflagellates, Apicomplexans, Heterokonts, Haptophytes, Euglenids, Cryptophytes, and Chlorarachniophytes all contain a secondary plastid at different levels of integration. These plastids usually have four membranes (Figure 1.2).

Figure 1.2: Plastid Evolution and Endosymbiosis, from (Keeling 2004). Tertiary endosymbiosis gives rise to the dinoflagellates Dinophysis, Karenia, Kryptoperidinium, and Lepidodinium. The chromalveolates acquired their secondary plastid from a red alga.

Tertiary plastids are formed by endosymbiotic uptake of a cell that contains a secondary plastid. For example, K. foliaceum obtained a tertiary plastid by engulfing a pennate diatom (a heterokont), and K. micrum contains a haptophyte-derived tertiary plastid. Tertiary plastids can have three, four or five membranes.

Between the engulfment of an algal cell and the stage at which it becomes an integrated organelle, some genes are lost from the endosymbiont nucleus and some are transferred to the host nucleus. Usually the endosymbiont nucleus is reduced this way until it is lost completely, but in Chlorarachniophytes and Cryptophytes, a tiny remnant of the secondary endosymbiont's nucleus (the nucleomorph) remains. In some dinoflagellates (K. foliaceum, Durinskia baltica, Galeidinium rugatum) the tertiary endosymbiont (now functioning as a plastid) still has a large nucleus with much genetic material (Jeffrey and Vesk 1976; Tamura, Shimada et al. 2005; Tomas and Cox 1973).

Alveolates and Chromists are derived from a single lineage that acquired a secondary plastid of red algal origin, but dinoflagellates are the only alveolates that have widely retained this algal plastid for photosynthetic purposes.
Photosynthetic dinoflagellates exhibit wide plastid diversity, due to the loss of the ancestral peridinin-containing plastid from several lineages and secondary and tertiary acquisitions of other plastid forms (Saldarriaga, Taylor et al. 2001). Saldarriaga mapped plastid characteristics to a phylogenetic tree of dinoflagellate nuclear SSU rRNAs to propose a history of plastid loss and gain. Other evidence for plastid replacement is derived from evolutionary analysis of plastid-targeted genes. In K. micrum, not only have plastid-targeted genes from the tertiary endosymbiont nucleus been transferred to the dinoflagellate nucleus, but they co-exist there with plastid-targeted genes from the ancestral secondary plastid (Patron, Waller et al. 2006). In K. foliaceum, there is physical evidence of the ancestral plastid in the form of the eyespot: a triple membrane surrounding carotene droplets (Dodge and Crawford 1969).

Fickle plastid switching is not the only strange characteristic of dinoflagellate plastids. In five genera of peridinin-containing dinoflagellates (Amphidinium, Ceratium, Protoceratium, Symbiodinium, Heterocapsa), at least 18 plastid genes are encoded on plasmid-like single-gene 'minicircles' (Barbrook and Howe 2000; Laatsch, Zauner et al. 2004; Zhang, Green et al. 1999). The minicircle mRNA transcripts are modified with a poly-U tail. Twelve poly-U mRNAs have been found to date and evidence suggests many more exist (Wang and Morse 2006). Finally, dinoflagellate plastids use type II rubisco. All other photosynthetic eukaryotes use type I (Morse, Salois et al. 1995).

Mitochondria

Some dinoflagellate mitochondrial pre-mRNAs undergo cytidine-to-uridine (C-U) editing, which is seen at low frequency in many plant and animal mitochondrial transcripts. However, dinoflagellate mitochondrial transcripts (mtRNAs) also undergo, at higher frequency, an even rarer U-C type of editing.
A third type, adenosine-inosine (A-I) editing, previously seen only in a mammalian nuclear gene and cytoplasmic tRNAs, also occurs in dinoflagellate mtRNAs. Remarkably, A-I editing occurs in close to 50% of edited sites in these mtRNAs (Lin, Zhang et al. 2002).

1.4 Dinoflagellate Genomes

In the past, dinoflagellate nuclei (dinokaryons) were considered so abnormal that they were thought to represent an intermediate life form between prokaryotic and eukaryotic cells, called the "mesokaryote" (Hamkalo and Rattner 1977; Spector and Triemer 1981). Histone-like proteins in dinoflagellates show more similarity to bacterial DNA-binding proteins than to eukaryotic histones (Wong, New et al. 2003) and dinoflagellates use a nuclear-encoded proteobacterial form II Rubisco instead of the cyanobacterial form I (Morse, Salois et al. 1995; Palmer 1995; Rowan, Whitney et al. 1996). With molecular phylogeny as a tool, molecular features that appear 'bacterial-like' say more about potential lateral gene transfer than direct ancestry, so despite the many unusual characteristics of dinoflagellates, they are undeniably eukaryotes (Costas and Goyanes 2005).

Mitosis

During mitosis in most eukaryotes, the nuclear envelope disintegrates before spindle formation and segregation of chromosomes; in dinoflagellates, an external spindle forms outside of the nuclear envelope, which remains intact throughout segregation and cytokinesis. The external spindle attaches to chromosomes through tunnels in the fluid nuclear envelope without ever piercing it. It pulls daughter chromatids apart much like a magnet held under a sheet of paper pulls iron filings along its surface. The envelope then pinches off the daughter nuclei (Bhaud, Guillebault et al. 2000; Taylor 1987).

Chromosomes

The dinokaryon is physically very large, and is easily discernible in most cells.
The most certain identifiers are the packed heterochromatin rods that remain condensed and visible at all stages of the cell cycle. Describing dinoflagellate chromosomes has posed a challenge over the years and we still lack a comprehensive model of their architecture, but these features are clear: Dinoflagellate chromatin is arranged in a series of stacked, nested arches arranged around a dense core stabilized by structural RNA and Ca2+ and Mg2+ ions (Herzog and Soyer 1983). Segments of coding DNA loop off of the main chromosome structure during transcription, and tuck back in when their genes are transcriptionally inactive (Sigee 1983). Instead of coding regions being arranged in 'chunks' along the chromosomes, as in many eukaryotic chromosomes, they exist on the periphery of the chromosomes, interspersed with basic 'histone-like' proteins (Sala-Rovira, Geraud et al. 1991) surrounding a core of highly methylated, transcriptionally inactive DNA (Herzog, Soyer et al. 1982). Stained, condensed eukaryotic chromosomes show banding patterns (karyotypes) because GC-rich coding regions stain more darkly than GC-poorer non-coding regions (Craig and Bickmore 1993), but dinoflagellate chromosomes do not show this classic banding pattern, which supports the idea that coding DNA is homogeneously arranged on the surface of its chromosomes.

DNA Content

A large-volume nucleus and large chromosomes can just as easily be attributed to a large supporting cast of proteins as to genetic material, but DAPI staining and flow cytometry confirm that there is indeed an abundance of DNA in most dinoflagellates. Nor is the amount due to polyploidy, as is often suggested; dinoflagellates are haploid (Sparrow, Price et al. 1972).
Dinoflagellate genomes in the 'normal' eukaryotic size range do exist, but most dinoflagellate genomes are between 15,000 and 40,000 Mbp (4-10x the size of the human genome) (Table 1.2), and the full range spans 1,300 Mbp to 196,000 Mbp. There are some indications that dinoflagellate genomes may exhibit large amounts of repeated DNA, high GC content, and up to 70% modified bases in the form of 5-hydroxymethyluracil (Herzog, Soyer et al. 1982). In eukaryotes, smaller amounts of methylation are characteristic of heterochromatin. The large proportion of 'methylated heterochromatin' and the lack of classic histone-based DNA organization in dinoflagellates suggest a structural function for methylated DNA.

1.5 Expansion and Reduction of Eukaryotic Genomes

There are several things we have learned from studying genomes that are reduced as a result of selective pressures (parasitism, for example). With parasites, the ability to derive nutrients from the host allows loss of metabolic genes, while the pressure for faster generation times can select for smaller genomes. Small genomes tend to have fewer genes and/or be more compressed. They have smaller intergenic spaces and smaller genes (at higher gene density), sometimes share regulatory regions or are co-transcribed, often have smaller and fewer introns, fewer repeated elements and fewer transposons, and tend to have high AT content.

Large genomes can show the opposite characteristics: large amounts of repeated DNA, transposons and other mobile elements, and larger regulatory regions allowing more complex transcriptional regulation of a more elaborate proteome. The genes themselves may also be longer because they contain more protein-protein interaction motifs. Larger genomes tend to have more introns, larger introns, and large intergenic spaces, so while their genomes have increased greatly in size from the addition of non-coding sequence, their proteome sizes have remained much the same.
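
The genome sizes quoted in Section 1.4 and Table 1.2 are derived from DNA mass measurements, so the underlying arithmetic can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the table: the function and constant names are hypothetical, the mass per base pair is the figure stated in the note to Table 1.2, and the human genome is assumed to be a round 3,000 Mbp.

```python
# Illustrative sketch only: the function and constant names are hypothetical
# and are not taken from the thesis.

# Mass of one base pair in picograms, as stated in the note to Table 1.2.
PG_PER_BP = 1.022e-9

# Assumed round figure for the human genome, in megabase pairs.
HUMAN_GENOME_MBP = 3000.0


def pg_to_mbp(picograms):
    """Convert a DNA mass in picograms to a genome size in Mbp."""
    base_pairs = picograms / PG_PER_BP
    return base_pairs / 1e6


def fold_vs_human(mbp):
    """Express a genome size as a multiple of the assumed human genome size."""
    return mbp / HUMAN_GENOME_MBP


if __name__ == "__main__":
    # One picogram of DNA corresponds to a little under 1,000 Mbp.
    print(round(pg_to_mbp(1.0), 1))
    # Extremes of the dinoflagellate range quoted in Section 1.4.
    for size_mbp in (1300.0, 196000.0):
        print(size_mbp, round(fold_vs_human(size_mbp), 2))
```

Under these assumptions, the upper end of the quoted range (196,000 Mbp) works out to roughly 65 times the human genome, in line with the largest entry in Table 1.2.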
1.6 Investigating the Dinoflagellate Genome

Despite consistently strong interest in dinoflagellate molecular biology, we remain limited in our knowledge of the molecular characteristics of the dinoflagellate nuclear genome because of the enormous size of the dinokaryon. Dinoflagellate genomes are large enough to make sequencing a representative genome completely unrealistic even with modern technology and resources. Classic knowledge of eukaryotic DNA organization and processing does not apply to dinoflagellates, and basic questions about DNA compaction, transcription, gene regulation, cell division, genome replication and composition are as yet unanswered.

I was interested in learning about how dinoflagellate genomes are organized, and I approached the question in three different ways. First, I studied a dinoflagellate that, in addition to its own genome, contained four other genomes. This allowed me to learn about organization and reduction of organelle genomes. Then I looked specifically for introns, one characteristic of large genomes, in the nuclear genes. Finally, for the big picture, I built a library of genomic DNA fragments from H. triquetra to get a basic understanding of the overall content of dinoflagellate genomes in the context of their size and genetic content.

LITERATURE CITED

Barbrook, A. C. and C. J. Howe (2000). Minicircular plastid DNA in the dinoflagellate Amphidinium operculatum. Mol Gen Genet 263(1): 152-158.

Bhaud, Y., D. Guillebault, et al. (2000). Morphology and behaviour of dinoflagellate chromosomes during the cell cycle and mitosis. J Cell Sci 113 (Pt 7): 1231-1239.

Costas, E. and V. Goyanes (2005). Architecture and evolution of dinoflagellate chromosomes: An enigmatic origin. Cytogenet Genome Res 109(1-3): 268-275.

Craig, J. M. and W. A. Bickmore (1993). Chromosome bands - flavours to savour. Bioessays 15(5): 349-354.

Dodge, J. D. (1985). Atlas of dinoflagellates: A scanning electron microscope survey.
London, Farrand.

Dodge, J. D. and R. M. Crawford (1969). Observations on the fine structure of the eyespot and associated organelles in the dinoflagellate Glenodinium foliaceum. J. Cell. Sci. 5(2): 479-493.

Graham, L. E. and L. W. Wilcox (2000). Algae. Upper Saddle River, NJ, Prentice Hall.

Hamkalo, B. A. and J. B. Rattner (1977). The structure of mesokaryote chromosome. Chromosoma 60(1): 39-47.

Herzog, M. and M. O. Soyer (1981). Distinctive features of dinoflagellate chromatin. Absence of nucleosomes in a primitive species Prorocentrum micans E. Eur J Cell Biol 23(2): 295-302.

Herzog, M. and M. O. Soyer (1983). The native structure of dinoflagellate chromosomes and their stabilization by Ca2+ and Mg2+ cations. Eur J Cell Biol 30(1): 33-41.

Herzog, M., M. O. Soyer, et al. (1982). A high level of thymine replacement by 5-hydroxymethyluracil in nuclear DNA of the primitive dinoflagellate Prorocentrum micans E. Eur J Cell Biol 21(2): 151-155.

Jeffrey, S. W. and M. Vesk (1976). Further evidence for a membrane-bound endosymbiont within the dinoflagellate Peridinium foliaceum. J. Phycol. 12: 450-455.

Keeling, P. J. (2004). Diversity and evolutionary history of plastids and their hosts. American Journal of Botany 91: 1481-1493.

Keeling, P. J., N. M. Fast, et al. (2005). Comparative genomics of microsporidia. Folia Parasitol (Praha) 52(1-2): 8-14.

Knowlton, N. and F. Rohwer (2003). Multispecies microbial mutualisms on coral reefs: The host as a habitat. Am Nat 162(4 Suppl): S51-62.

Laatsch, T., S. Zauner, et al. (2004). Plastid-derived single gene minicircles of the dinoflagellate Ceratium horridum are localized in the nucleus. Mol Biol Evol 21(7): 1318-1322.

LaJeunesse, T. C., G. Lambert, et al. (2005). Symbiodinium (Pyrrhophyta) genome sizes (DNA content) are smallest among dinoflagellates. Journal of Phycology 41: 880-886.

Lin, S., H. Zhang, et al. (2002). Widespread and extensive editing of mitochondrial mRNAs in dinoflagellates.
J Mol Biol 320(4): 727-739.

Moreno Diaz de la Espina, S., E. Alverca, et al. (2005). Organization of the genome and gene expression in a nuclear environment lacking histones and nucleosomes: The amazing dinoflagellates. Eur J Cell Biol 84(2-3): 137-149.

Morse, D., P. Salois, et al. (1995). A nuclear-encoded form II Rubisco in dinoflagellates. Science 268(5217): 1622-1624.

Okamoto, O. K., L. Liu, et al. (2001). Members of a dinoflagellate luciferase gene family differ in synonymous substitution rates. Biochemistry 40(51): 15862-15868.

Palmer, J. D. (1995). Rubisco rules fall; gene transfer triumphs. Bioessays 17(12): 1005-1008.

Patron, N. J., R. F. Waller, et al. (2006). A tertiary plastid uses genes from two endosymbionts. J Mol Biol 357(5): 1373-1382.

Rizzo, P. J. (2003). Those amazing dinoflagellate chromosomes. Cell Res 13(4): 215-217.

Rowan, R., S. M. Whitney, et al. (1996). Rubisco in marine symbiotic dinoflagellates: Form II enzymes in eukaryotic oxygenic phototrophs encoded by a nuclear multigene family. Plant Cell 8(3): 539-553.

Sala-Rovira, M., M. L. Geraud, et al. (1991). Molecular cloning and immunolocalization of two variants of the major basic nuclear protein (HCc) from the histone-less eukaryote Crypthecodinium cohnii (Pyrrhophyta). Chromosoma 100(8): 510-518.

Saldarriaga, J. F., F. J. R. Taylor, et al. (2001). Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple plastid losses and replacements. Journal of Molecular Evolution 53: 204-213.

Sigee, D. C. (1983). Structural DNA and genetically active DNA in dinoflagellate chromosomes. Biosystems 16(3-4): 203-210.

Sparrow, A. H., H. J. Price, et al. (1972). A survey of DNA content per cell and per chromosome of prokaryotic and eukaryotic organisms: Some evolutionary considerations. Brookhaven Symp Biol 23: 451-494.

Spector, D. L. and R. E. Triemer (1981). Chromosome structure and mitosis in the dinoflagellates: An ultrastructural approach to an evolutionary problem.
Biosystems 14(3-4): 289-298.

Tamura, M., S. Shimada, et al. (2005). Galeidinium rugatum gen. et sp. nov. (Dinophyceae), a new coccoid dinoflagellate with a diatom endosymbiont. Journal of Phycology 41: 658-671.

Taylor, F. J. R. (1987). The biology of dinoflagellates. Oxford, Blackwell Scientific.

Tomas, R. and E. R. Cox (1973). Observations on the symbiosis of Peridinium balticum and its intracellular alga I: Ultrastructure. J. Phycol. 9: 304-323.

Van Dolah, F. M. (2000). Marine algal toxins: Origins, health effects, and their increased occurrence. Environ Health Perspect 108 Suppl 1: 133-141.

Wang, Y. and D. Morse (2006). Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium. Nucleic Acids Res 34(2): 613-619.

Wong, J. T., D. C. New, et al. (2003). Histone-like proteins of the dinoflagellate Crypthecodinium cohnii have homologies to bacterial DNA-binding proteins. Eukaryot Cell 2(3): 646-650.

Yoshikawa, T., A. Uchida, et al. (1996). There are 4 introns in the gene coding the DNA-binding protein HCc of Crypthecodinium cohnii (Dinophyceae). Fisheries Sci 62(2): 204-209.

Zhang, Z., B. R. Green, et al. (1999). Single gene circles in dinoflagellate chloroplast genomes. Nature 400(6740): 155-159.

CHAPTER II: ORGANELLE INTEGRATION IN KRYPTOPERIDINIUM FOLIACEUM

2.1 Introduction*

Most photosynthetic dinoflagellates have a peridinin-containing plastid derived from a secondary endosymbiosis with a red alga, but some have replaced this plastid with another type. Lepidodinium has undergone what has been termed a serial secondary endosymbiosis by acquiring a green algal plastid (Watanabe, Suda et al. 1990; Watanabe, Takeda et al. 1987).
A t least three (Schnepf and Elbrachter 1999) groups have acquired tertiary plastids by forming an endosymbiotic partnership with other secondary algae: 1) Dinophysis has a cryptophyte-derived plastid; 2) Karenia and Karlodinium have haptophyte-derived plastids; 3) Kryptoperidinium and Durinskia have a diatom-derived plastid (Chesnick, Kooistra et al. 1997; Chesnick, Morden et al. 1996; Hewes, Mitchell et al. 1998; Schnepf and Elbrachter 1988; Tangen and BJ0rnland 1981; Tengs, Dahlberg et al. 2000). Kryptoperidinium foliaceum (previously placed in the genera Glenodinium and Peridinium) was originally noted for its dual nuclei, and then for containing the typical chrysophyte pigment fucoxanthin (Dodge 1971; Withers and Haxo 1975). Morphological observations and phylogenetic analyses o f the large and small subunits of plastid ribulose-1, 5-bisphosphate carboxylase (rbcL and rbcS) demonstrated the existence of a membrane-bound endosymbiont and suggested it was o f diatom origin (Jeffrey and Vesk 1976) (Kite and Dodge 1985) (Chesnick, Morden et al. 1996). Subsequently, small subunit ribosomal R N A ( S S U r R N A ) phylogenies placed the  A version of this chapter has been published. M c E w a n , M . L . & Keeling, P.J. 2004. "HSP90, Tubulin and Actin are Retained in the Tertiary Endosymbiont Genome of Kryptoperidinium foliaceum," The Journal of Eukaryotic Microbiology, (51) 6, 651-659. 19  endosymbiont ancestrally among the pennate diatoms (Chesnick, Kooistra et al. 1997). A closely related species, Durinskia baltica, has also been shown to contain such a plastid (as Peridinium balticum; (Tomas and C o x 1973). The endosymbiosis appears to be permanent and obligatory, and these two species are now thought to be the product o f a single endosymbiosis (Inagaki, Dacks et al. 2000). Interestingly, K. 
foliaceum has retained a tri-membrane carotene-containing eyespot organelle that has been interpreted as a relict o f the original, tri-membrane peridinin-containing plastid (Dodge 1983; Dodge and Crawford 1969). Unfortunately, there is little molecular data from either the host or symbiont o f K. foliaceum. Yet, it is a remarkable system to study the physical and genetic reduction characteristic o f endosymbionts because it appears to be in the early stages o f this process. While the endosymbiont appears to be obligately intracellular, it is far less reduced than the endosymbionts o f either cryptophytes or chlorarachniophytes (Fine and Loeblich 1976) (Jeffrey and Vesk 1976) (Morrill and Loeblich 1977). The K. foliaceum endosymbiont has lost motility and its distinctive diatom wall, and the endosymbiont nuclear genome o f its sister species, D. baltica, appears to divide amitotically (Tippit and Pickett-Heaps 1976). Nevertheless, the symbiont has retained a comparatively large cytoplasmic space and nucleus. It is unclear whether the nucleus should be referred to as a nucleomorph, so for the present it w i l l be called the endosymbiont nucleus. Mitochondria, which have been lost in all other plastid-endosymbionts, are retained (Kite, Rothschild et al. 1988; Rizzo and C o x 1976; 1977). K. foliaceum is accordingly quite complex at the genome level: it contains two nuclear genomes, two mitochondrial genomes, and at least one plastid genome. Moreover, the diatom endosymbiont  20  apparently resides in the host cytoplasm (since there is only one membrane separating the host and endosymbiont nuclei), in contrast to the cryptophyte and chlorarachniophyte endosymbionts, which reside in the host endomembrane system (Eschbach, Speth et al. 1990; Gilson 2001). These differences and the relative rarity o f such endosymbiotic events make K. foliaceum an interesting point o f comparison with the much better studied cryptophyte and chlorarachniophytes. 

I have sequenced the first protein-coding genes from the nucleus of the endosymbiont, together with homologues from the host genome and from other diatoms. I examined four genes with interesting functional implications and varying presence or absence in cryptophyte and chlorarachniophyte nucleomorph genomes: heat shock protein 90 (HSP90, which is present in both nucleomorphs), alpha- and beta-tubulin (which are present in cryptophytes but apparently not in chlorarachniophytes), and actin (which is present in neither nucleomorph).

2.2 Materials and Methods

DNA isolation, amplification and sequencing: Cultures of Kryptoperidinium foliaceum (Center for Culture of Marine Phytoplankton, CCMP 1326) and Nitzschia thermalis (Canadian Centre for the Culture of Microorganisms, CCCM 608) were maintained in f/2-Si medium on a 13/11 light/dark cycle. K. foliaceum was kept at room temperature (23-25 °C); N. thermalis at 16 °C. Cells were harvested by centrifugation and DNA was purified using the DNeasy Plant DNA isolation kit (Qiagen, Mississauga, ON). DNA from Phaeodactylum tricornutum (CCMP 630) was kindly provided by J. T. Harper. Alpha-tubulin, beta-tubulin, HSP90, and actin genes were amplified using the following primers: 5'-TCCGAATTCARGTNGGNAAYGCNGGYTGGGA-3' and 5'-CGCGCCATNCCYTCNCCNACRTACCA-3' (alpha-tubulin); 5'-GCCTGCAGGNCARTGYGGNAAYCA-3' or 5'-TCCTCGAGTRAAYTCCATYTCRTCCAT-3' and 5'-CAGGTCGGTCARTGYGGNAA-3' (beta-tubulin), plus diatom-specific beta-tubulin primers 5'-ATBGCKGCNGCMGTNTGYGGNCATA-3' and 5'-CCACGTCTCCTGSACRGCVGTGGT-3'; 5'-GTCAAGCAYTTYWSNGTNGARGGNCA-3', 5'-GGAGCCTGATHAAYACNTTYTA-3', and 5'-GTCCCGCAGNGCYTGNGCYTTCATDAT-3' (HSP90); and 5'-GAGAAGATGACNCARATHATGTTYGA-3' and 5'-GGCCTGGAARCAYTTNCGRTGNAC-3' (actin). PCR products were cloned using the pCR2.1 TOPO cloning kit (Invitrogen, Burlington, ON), and both strands were sequenced using BigDye terminator chemistry. New sequences have been deposited in GenBank under accession numbers AY713387-AY713398.

Phylogenetic analysis: Conceptual translations were added to existing amino acid alignments (Keeling and Leander 2003; Saldarriaga, McEwan et al. 2003). Thalassiosira pseudonana (CCMP 1335) alpha- and beta-tubulin, actin, and HSP90 sequences were assembled from genome sequence data produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/). Alignments consisted of 50, 47, 54, and 44 sequences, and 375, 363, 505, and 206 unambiguously aligned sites for alpha-tubulin, beta-tubulin, HSP90, and actin, respectively (available upon request). Phylogenetic trees were inferred for all four individual gene data sets using distance and maximum likelihood methods. TREE-PUZZLE 5.0 (Schmidt, Strimmer et al. 2002) was used to calculate a distance matrix under the Whelan and Goldman (WAG) model of substitution frequencies, and each was corrected for site-to-site rate variation using a discrete gamma distribution with 8 variable rate categories plus one invariable category. The amino acid frequencies, proportion of invariable sites, and shape parameter alpha were estimated from the data with TREE-PUZZLE 5.0. Alpha parameters (a) and proportions of invariable sites (i) for alpha-tubulin, beta-tubulin, HSP90, and actin were a=0.63, 0.44, 0.85, 0.47 and i=0.25, 0.00, 0.09, 0.11, respectively.
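
The primers listed under "DNA isolation, amplification and sequencing" are written with IUPAC ambiguity codes (R, Y, N and so on), so each one stands for a pool of concrete oligonucleotides rather than a single sequence. As a minimal sketch, not part of the original methods, the following Python counts and enumerates the sequences encoded by a degenerate primer. The function names `degeneracy` and `expand` are illustrative only, and the example uses the beta-tubulin primer 5'-CAGGTCGGTCARTGYGGNAA-3' given in the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    for combo in product(*pools):
        yield "".join(combo)


# Beta-tubulin primer from the text, written without the 5'/3' labels.
beta_tubulin = "CAGGTCGGTCARTGYGGNAA"
print(degeneracy(beta_tubulin))
for sequence in expand(beta_tubulin):
    print(sequence)
```

With one R, one Y and one N, this primer encodes 2 x 2 x 4 = 16 sequences. Enumerating the pool is only practical for primers of modest degeneracy; for the longer primers above, counting with `degeneracy` alone is the safer choice.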
Distance trees were inferred from each distance matrix by weighted neighbour-joining (WNJ, Weighbor 1.0.1) (Bruno, Socci et al. 2000) and Fitch-Margoliash (FITCH, Fitch 3.572) (Felsenstein 1997). For all four data sets, 100 (Fitch-Margoliash) or 500 (weighted neighbour-joining) bootstrap replicates were analysed using PUZZLEBOOT (A. Roger, M. Holder, http://www.tree-puzzle.de) with the alpha shape parameter and the proportion of invariable sites from the original data. Protein maximum-likelihood (ML) trees were inferred for all four data sets using ProML 3.6 (Felsenstein 1997) under the Jones, Taylor, and Thornton (JTT) substitution frequency matrix with global rearrangements and two input order jumbles. Site-to-site rate variation was modeled on a gamma distribution with four variable rate categories and one invariable category. Rates and frequencies were estimated using TREE-PUZZLE 5.0. Maximum-likelihood bootstrapping was performed with 100 replicates, only one category of sites, global rearrangements, and one jumble.

2.3 Results and Discussion

Host and endosymbiont nuclear-encoded protein coding genes from Kryptoperidinium foliaceum: Altogether, eight new homologues of alpha-tubulin, beta-tubulin, actin, and HSP90 were characterised from K. foliaceum, as well as alpha-tubulin, actin, and HSP90 from the pennate diatom N. thermalis and actin from P. tricornutum. The phylogenies of these four genes were constructed to distinguish homologues from the host and endosymbiont nuclear genomes of K. foliaceum. The overall characteristics of these phylogenies resembled those seen in previous analyses (Edgcomb, Roger et al. 2001; Keeling and Leander 2003; Saldarriaga, McEwan et al. 2003; Stechmann and Cavalier-Smith 2003), and most importantly the dinoflagellates and diatoms consistently formed strongly supported clades.

Three distinct alpha-tubulin genes were amplified from K. foliaceum, two of which were highly similar at the amino acid level. In the alpha-tubulin phylogeny (Figure 2.1), the two similar K. foliaceum sequences branched strongly within the dinoflagellate clade (97-100%), while the third gene branched with moderate to strong support in the diatom clade (63-96%), specifically with the pennate diatom N. thermalis (97-100%). This pattern is precisely that predicted for a host genome origin for the two similar genes and an endosymbiont genome origin for the third.

The beta-tubulin phylogeny included two similar K. foliaceum genes (Figure 2.2) and showed both sequences branching with strong support in the dinoflagellate clade (90-98%), to the exclusion of Oxyrrhis and all other species. In this case, both of these sequences are predicted to have originated in the host genome. Given the presence of an endosymbiont-derived alpha-tubulin, however, it is likely that an endosymbiont-derived beta-tubulin does exist but is too divergent to amplify, despite much effort and multiple primer sets. This notion is supported by the slightly divergent nature of the pennate diatom alpha-tubulins (since the two often co-evolve), the divergent nature of the centric diatom beta-tubulins (which, together with brown algae, do not branch with oomycete heterokonts), and the failure to amplify the beta-tubulin from N. thermalis or the K. foliaceum endosymbiont with universal primers, heterokont-specific primers, or any combination of the two.

A single K. foliaceum actin sequence was characterised. In actin phylogenies (Figure 2.3) this gene grouped within a moderately supported heterokont clade (70-83%), specifically within the strongly supported diatom subgroup (97-100%). Within the diatoms, the centric and pennate types do not form discrete clades, but the K. foliaceum sequence does branch weakly with that of the pennate N. thermalis, altogether strongly supporting an endosymbiont origin for this gene.

Lastly, two HSP90 genes were characterised from K. foliaceum. In the HSP90 phylogeny (Figure 2.4), these branched with strong support within the dinoflagellate (94-100%) and diatom (94-100%) clades, respectively. Once again, this is indicative of a host and endosymbiont origin for these genes. Overall, the HSP90 phylogeny is the most well supported of the four analysed and the most consistent with well established relationships: dinoflagellates were sisters to apicomplexans, and the alveolate clade including ciliates was resolved with high bootstrap support. Similarly, the second HSP90 copy showed a strong affinity to the diatoms (although to neither centric nor pennate forms), which grouped with other heterokonts.

Figure 2.1: Alpha-tubulin maximum likelihood (ML) phylogeny. Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. Three K. foliaceum genes are shown in black boxes. Two of these group strongly within dinoflagellates, and the third groups strongly with the pennate diatoms.

Figure 2.2: Beta-tubulin ML phylogeny. Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. Both Kryptoperidinium foliaceum sequences appear to be host-derived, grouping strongly within the dinoflagellates.

Figure 2.3: Actin ML phylogeny. Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. Apicomplexans, dinoflagellates, and heterokonts are strongly supported in this analysis, and their positions are in accordance with well-established relationships. Ciliates were excluded from this analysis as they are highly divergent and polyphyletic in actin (Keeling 2001). The Kryptoperidinium foliaceum sequence groups strongly within the heterokonts.

Figure 2.4: HSP90 ML phylogeny. Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. One Kryptoperidinium foliaceum sequence shows affinity to dinoflagellates, the other to diatoms.

In summary, the phylogenies support the conclusion that K. foliaceum possesses alpha-tubulin and HSP90 genes of both host and endosymbiont origin, as well as at least host beta-tubulin and endosymbiont actin. In analyses including both centric (Thalassiosira) and pennate (Nitzschia and Phaeodactylum) diatoms, endosymbiont-derived K. foliaceum genes showed an affinity to the pennate diatoms over the centric forms in alpha-tubulin and actin, and no affinity in HSP90. Overall, this is in agreement with previous molecular studies based on endosymbiont nuclear SSU rRNA, which suggested a pennate diatom ancestry for the K. foliaceum endosymbiont (Chesnick, Kooistra et al. 1997).

Structural features of host- and endosymbiont-derived genes in Kryptoperidinium foliaceum: The phylogenetic history of a gene provides a strong inference for its location in the cell, but it remains possible that endosymbiont-derived genes are encoded in the host nuclear genome, and vice versa. Without physically localizing a gene, the genome in which it is encoded will always be in some doubt, but there are certain characteristics that have been used successfully, together with phylogenetic history, to provide a very accurate prediction.
Cryptophyte and chlorarachniophyte nucleomorph genes exhibit several physical characteristics that distinguish them from host nuclear genes. Chlorarachniophyte nucleomorph genes have minute, 18- to 20-bp introns that readily distinguish them from host nuclear genes, whose introns are an estimated 168 bp on average (Gilson and McFadden 2002). The difference between cryptophyte nuclear (50- to 74-bp) and nucleomorph (42- to 52-bp) introns is not as pronounced, but still somewhat useful (Douglas, Zauner et al. 2001). Chlorarachniophyte and cryptophyte nucleomorph genes also exhibit a strong AT-bias, so that nucleomorph and nuclear genes can differ by as much as 25% in overall AT-content (Douglas, Zauner et al. 2001; Gilson and McFadden 2002). Unfortunately, none of the K. foliaceum genes characterised here contained introns, but the AT-content of each K. foliaceum sequence was calculated to determine if there was a bias between dinoflagellate-derived and diatom-derived genes (Table 2.1). The mean AT content of the five dinoflagellate-identified genes is 42.5%, ranging from 39.2% to 46.6%. The three diatom-identified sequences have a mean AT content of 48.8% (47.0-51.6%). Accordingly, there is an AT-bias of approximately 6% in the diatom-derived genes, and no overlap in the ranges of AT-content between the two sets of genes. This bias supports the conclusion that the two phylogenetic classes of genes in K. foliaceum likely reside in different genomes: the dinoflagellate genes in the host nucleus and the diatom genes in the endosymbiont nucleus. While the 'total-nucleotides' AT bias (approximately 6%) is less than that observed in cryptophyte and chlorarachniophyte genomes, AT bias at 'third-position-only' shows a larger separation between host and endosymbiont genes (18%). The diatom-identified genes from K. foliaceum have a mean third-position AT content of 36.7%, ranging from 35.2% to 38.8%.
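
The two measures compared here, AT content over all nucleotides and AT content at third codon positions only, can be sketched as follows. This is an illustrative Python reconstruction and not the procedure actually used for Table 2.1: the function names are hypothetical, the coding sequence is assumed to start in frame, and the toy sequence is invented for the example rather than taken from K. foliaceum.

```python
def at_content(seq):
    """Percentage of A and T among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence has no unambiguous bases")
    at = sum(1 for base in counted if base in "AT")
    return 100.0 * at / len(counted)


def third_position_at_content(cds):
    """AT percentage at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    thirds = cds[2:usable:3]          # every third base, starting at codon position 3
    return at_content(thirds)


# Toy in-frame sequence (ATG GCC AAG TTT GGC), invented for illustration.
toy = "ATGGCCAAGTTTGGC"
print(round(at_content(toy), 1))
print(round(third_position_at_content(toy), 1))
```

Because synonymous changes are concentrated at third codon positions, the third-position value is less constrained by the encoded protein than the total value, which is one reason it can separate two genomes more sharply.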
In contrast, the mean AT content at third positions of the dinoflagellate-identified genes is 18.9%, ranging from 11.5% to 23.6% (for a difference of approximately 18% between host and endosymbiont genes). A codon bias analysis was also done, but no clear trends were observed. Codon biases and AT biases are complementary and potentially useful tools for sorting sequence data; a 6% or 18% AT bias allows for preliminary predictions about the location of genes encoded in K. foliaceum.

Table 2.1: AT content of dinoflagellate and diatom protein coding genes

                                          AT content (%)
Gene            Inferred source           Total    3rd position
Dinoflagellates
HSP90           Host                      46.62    17.79
Beta-tubulin    Host                      42.93    23.65
Beta-tubulin    Host                      43.10    23.39
Alpha-tubulin   Host                      39.15    11.46
Alpha-tubulin   Host                      40.88    17.19
Alpha-tubulin   Endosymbiont              46.96    38.80
Actin           Endosymbiont              47.81    35.25
HSP90           Endosymbiont              51.63    36.00

Diatoms         Source
Alpha-tubulin   Nitzschia thermalis 1     47.26    38.99
Alpha-tubulin   Nitzschia thermalis 2     46.15    35.44
Actin           Nitzschia thermalis       48.91    40.16
Actin           Phaeodactylum             46.17    32.79

Functional implications of endosymbiont genes: The K. foliaceum plastid endosymbiont presents a unique opportunity to study the early stages of endosymbiotic genome reduction from a functional perspective. Based on the pennate diatom identity of the endosymbiont and its current wall-less, immobile, amitotic state, one would expect that the endosymbiont genome has lost many genes, especially those related to wall structure and deposition, motility, and mitotic nuclear division. However, compared to the nucleomorphs of cryptophytes and chlorarachniophytes, the K. foliaceum endosymbiont nucleus appears to be at a relatively early stage of reduction, which is consistent with our observations that it retains genes that have been lost by nucleomorphs.

The presence of HSP90 is the least surprising of the four genes and does not give much insight into endosymbiont reduction, since both cryptophyte (Douglas, Zauner et al. 2001) and chlorarachniophyte (unpublished data) nucleomorphs also retain HSP90 genes in their genomes. HSP90 plays a wide variety of roles in the eukaryotic cytoplasm, including as a chaperone in protein folding, and because of this it is probably one of the last proteins lost in a degrading endosymbiont. Actin and the tubulins, on the other hand, are not always retained, and their presence in K. foliaceum is of more interest. Alpha- and beta-tubulin are component parts of microtubules, whose activity is most apparent in the flagella, mitotic spindle, and cytoskeleton. Actin is also a prominent protein in the cytoskeleton and it plays a central role in the gliding motility of diatoms. Alpha-, beta-, and gamma-tubulins are present in the nucleomorph of the cryptophyte G. theta (Keeling, Deane et al. 1999), but are apparently absent from chlorarachniophyte endosymbionts (unpublished data). Actin, on the other hand, appears to be absent from both endosymbionts, although an actin gene of possible endosymbiont origin has been found in the host nucleus of the cryptophyte Pyrenomonas helgolandii (Stibitz, Keeling et al. 2000).

In the K. foliaceum endosymbiont, while there is no gliding motility, cell wall or frustule, and presumably no mitosis, both alpha-tubulin and actin remain. The presence of actin is especially interesting because it has been lost in both cryptophyte and chlorarachniophyte nucleomorphs, and because it is a major component of diatom motility (Poulsen, Spector et al. 1999). Its possible function in the K. foliaceum endosymbiont is not clear, but its presence hints that the general cytoskeleton of the K. foliaceum endosymbiont is less reduced than those of either cryptophytes or chlorarachniophytes. The endosymbionts of K. foliaceum and D. baltica and their division have been extensively studied by electron microscopy, and no direct evidence of microtubules has been found during division or growth phases (Tippit and Pickett-Heaps 1976). Microtubules were also notably absent in and around the endosymbiont nuclei during sexual reproduction in D. baltica (Chesnick and Cox 1987). In G. theta, where tubulins are also present despite a lack of observed microtubules, it has been suggested that they may fulfill some alternate biological role that does not require microtubules (Keeling, Deane et al. 1999). Alternatively, both K. foliaceum and G. theta endosymbionts may contain microtubules that are highly specialized and appear for short periods of the life cycle, or in restricted numbers and size, making them very difficult to detect. In D. baltica, for example, chromatin condensation and 'crystalline rod' formation in the endosymbiont nucleus were observed after sexual fusion of both host and symbiont cells (as P. balticum; Chesnick and Cox 1987). The absence of tubulins from the chlorarachniophyte nucleomorph genome leaves open the possibility that microtubules can be discarded long before the complete reduction of the endosymbiont.

Implications of genetic reduction in Kryptoperidinium foliaceum: While the K. foliaceum endosymbiont has undergone some reduction (i.e. loss of cell wall and presumed amitosis), the question remains as to whether it has undergone any genetic reduction at all. In other integrated endosymbiotic systems, genetic reduction has partially or completely occurred through some combination of gene loss and transfer, where the products of transferred genes are targeted back to the appropriate compartment. Transfers of plastid-targeted genes and the mechanism by which their products are targeted to the organelle are well known in both primary and secondary plastids (McFadden 1999).
In contrast, gene transfer and plastid targeting in tertiary plastids are relatively unknown, and K. foliaceum presents a unique case for targeting: instead o f  34  being contained in the endomembrane system o f the host, freeze-fracture microscopy suggests that the single membrane that separates the symbiont from the host cytoplasm is actually derived from the outer membrane o f the symbiont itself (Eschbach, Speth et al. 1990). If this is true, the product o f any gene that is transferred to the host nucleus must first return to the endosymbiont cytoplasm by an entirely unique method of targeting, perhaps analogous to pinocytosis by the endosymbiont. Determining whether any such transfers to the host nucleus have occurred w i l l potentially provide an important comparison with the better-studied secondary plastids, as it could represent an entirely novel solution to the problem o f endosymbiont protein trafficking.  35  LITERATURE CITED  Bruno, W. J., N. D. Socci, et al. (2000). Weighted neighbor joining: A likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol 17(1): 189-197. Chesnick, J. M . and E. R. Cox (1987). Synchronized sexuality of an algal symbiont and its dinoflagellate host, peridinium balticum (levander) lemmermann. Biosystems 21(1): 69-78. Chesnick, J. M . , W. H. Kooistra, et al. (1997). Ribosomal rna analysis indicates a benthic pennate diatom ancestry for the endosymbionts of the dinoflagellates peridinium foliaceum and peridinium balticum (pyrrhophyta). J. Eukaryot: Microbiol. 44(4): 314320. Chesnick, J. M . , C. W. Morden, et al. (1996). Identity of the endosymbiont of peridinium foliaceum (pyrrophyta): Analysis of the rbcls operon. J. Phycol. 32(5): 850857. Dodge, J. D. (1971). A dinoflagellate with both a mesocaryotic and a eucaryotic nucleus. I. Fine structure of the nuclei. Protoplasma 73(2): 145-157. Dodge, J. D. (1983). The functional and phylogenetic significance of dinoflagellate eyespots. 
Biosystems 16(3-4): 259-267. Dodge, J. D. and R. M . Crawford (1969). Observations on the fine structure of the eyespot and associated organelles in the dinoflagellate glenodinium foliaceum. J. Cell. Sci. 5(2): 479-493. Douglas, S., S. Zauner, et al. (2001). The highly reduced genome of an enslaved algal nucleus. Nature 410(6832): 1091-1096. Edgcomb, V. P., A. J. Roger, et al. (2001). Evolutionary relationships among "jakobid" flagellates as indicated by alpha- and beta-tubulin phylogenies. Mol. Biol. Evol. 18(4): 514-522. Eschbach, S., V. Speth, et al. (1990). Freeze-fracture study of the single membrane between host-cell and endocytobiont in the dinoflagellates glenodinium foliaceum and peridinium balticum. J. Phycol. 26(2): 324-328. Felsenstein, J. (1997). A n alternating least squares approach to inferring phylogenies from pairwise distances. Syst Biol 46(1): 101-111.  36  Fine, K. E . and A . R. Loeblich (1976). Endosymbiosis in the marine dinoflagellate kryptoperidinium foliaceum. J Protozool 23(2): A 8 . Gilson, P. R. (2001). Nucleomorph genomes: M u c h ado about practically nothing. Genome Biol. Reviews 2(8): 1022. Gilson, P. R. and G. I. McFadden (2002). Jam packed genomes—a preliminary, comparative analysis of nucleomorphs. Genetica 115(1): 13-28. Hewes, C. D., B. G. Mitchell, et al. (1998). The phycobilin signatures of chloroplasts from three dinoflagellate species: A microanalytical study of dinophysis caudata, d. Fortii, and d. Acuminata (dinophysiales, dinophyceae). J. Phycol. 34(6): 945-951. Inagaki, Y., J. B. Dacks, et al. (2000). Evolutionary relationship between dinoflagellates bearing obligate diatom endosymbionts: Insight into tertiary endosymbiosis. Int. J. Syst. Evol. Microbiol. 50(6): 2075-2081. Jeffrey, S. W. and M . Vesk (1976). Further evidence for a membrane-bound endosymbiont within the dinoflagellate peridinium foliaceum. J. Phycol. 12: 450-455. Keeling, P. J. (2001). 
Foraminifera and Cercozoa are related in actin phylogeny: Two orphans find a home? Mol. Biol. Evol. 18(8): 1551-1557.
Keeling, P. J., J. A. Deane, et al. (1999). The secondary endosymbiont of the cryptomonad Guillardia theta contains alpha-, beta-, and gamma-tubulin genes. Mol. Biol. Evol. 16(9): 1308-1313.
Keeling, P. J. and B. S. Leander (2003). Characterisation of a non-canonical genetic code in the oxymonad Streblomastix strix. J. Mol. Biol. 326(5): 1337-1349.
Kite, G. C. and J. D. Dodge (1985). Structural organization of plastid DNA in two anomalously pigmented dinoflagellates. J. Phycol. 21(1): 50-56.
Kite, G. C., L. J. Rothschild, et al. (1988). Nuclear and plastid DNAs from the binucleate dinoflagellates Glenodinium (Peridinium) foliaceum and Peridinium balticum. Biosystems 21(2): 151-163.
McFadden, G. I. (1999). Plastids and protein targeting. J. Eukaryot. Microbiol. 46(4): 339-346.
Morrill, L. C. and A. R. Loeblich (1977). Studies of photo-heterotrophy in binucleate dinoflagellate Kryptoperidinium foliaceum. J. Phycol. 13: 46.
Poulsen, N. C., I. Spector, et al. (1999). Diatom gliding is the result of an actin-myosin motility system. Cell Motil. Cytoskeleton 44(1): 23-33.
Rizzo, P. J. and E. R. Cox (1976). Isolation and properties of nuclei from binucleate dinoflagellates. J. Phycol. 12: 31.
Rizzo, P. J. and E. R. Cox (1977). Histone occurrence in chromatin from Peridinium balticum, a binucleate dinoflagellate. Science 198(4323): 1258-1260.
Saldarriaga, J. F., M. L. McEwan, et al. (2003). Multiple protein phylogenies show that Oxyrrhis marina and Perkinsus marinus are early branches of the dinoflagellate lineage. Int. J. Syst. Evol. Microbiol. 53(Pt 1): 355-365.
Schmidt, H. A., K. Strimmer, et al. (2002). TREE-PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18(3): 502-504.
Schnepf, E. and M. Elbrachter (1988).
Cryptophycean-like double membrane-bound chloroplast in the dinoflagellate, Dinophysis Ehrenb. - evolutionary, phylogenetic and toxicological implications. Bot. Acta 101(2): 196-203.
Schnepf, E. and M. Elbrachter (1999). Dinophyte chloroplasts and phylogeny - a review. Grana 38(2-3): 81-97.
Stechmann, A. and T. Cavalier-Smith (2003). Phylogenetic analysis of eukaryotes using heat-shock protein Hsp90. J. Mol. Evol. 57(4): 408-419.
Stibitz, T. B., P. J. Keeling, et al. (2000). Symbiotic origin of a novel actin gene in the cryptophyte Pyrenomonas helgolandii. Mol. Biol. Evol. 17(11): 1731-1738.
Tangen, K. and T. Bjørnland (1981). Observations on pigment and morphology of Gyrodinium aureolum Hulbert, a marine dinoflagellate containing 19'-hexanoyloxyfucoxanthin as the main carotenoid. J. Plankton Res. 3: 389-401.
Tengs, T., O. J. Dahlberg, et al. (2000). Phylogenetic analyses indicate that the 19'-hexanoyloxy-fucoxanthin-containing dinoflagellates have tertiary plastids of haptophyte origin. Mol. Biol. Evol. 17(5): 718-729.
Tippit, D. H. and J. D. Pickett-Heaps (1976). Apparent amitosis in the binucleate dinoflagellate Peridinium balticum. J. Cell Sci. 21(2): 273-289.
Tomas, R. and E. R. Cox (1973). Observations on the symbiosis of Peridinium balticum and its intracellular alga I: Ultrastructure. J. Phycol. 9: 304-323.
Watanabe, M. M., S. Suda, et al. (1990). Lepidodinium viride gen. et sp. nov. (Gymnodiniales, Dinophyta), a green dinoflagellate with a chlorophyll a- and b-containing endosymbiont. J. Phycol. 26: 741-751.
Watanabe, M. M., Y. Takeda, et al. (1987). A green dinoflagellate with chlorophylls a and b: Morphology, fine structure of the chloroplast and chlorophyll composition. J. Phycol. 23: 382-389.
Withers, N. and F. T. Haxo (1975). Chlorophyll c1 and c2 and extraplastidic carotenoids in the dinoflagellate, Peridinium foliaceum Stein. Plant Sci. Lett. 5: 7-15.
CHAPTER III: INTRON DISTRIBUTION IN DINOFLAGELLATES

3.1 Introduction

The bulk of eukaryotic genome sequence is made up of non-coding DNA. In large genomes, not only is there more intergenic DNA, but genes themselves can be larger than their small-genome counterparts. The average size of a human gene is 5795 bp, but 4455 bp of this is intron and UTR sequence, leaving a coding sequence that is only 23% of the gene (IHGSConsortium, Lander et al. 2001). Genes (including exons, introns, and 5' and 3' UTRs) can expand in two main ways. The first is by acquiring new coding domains, the benefit of which is to allow more complex protein-protein interactions (alternative splicing also allows this). The second is by expanding their non-coding portions: UTRs and introns. One of the oldest theories about eukaryotic genome expansion is growth by the invasion of introns. Exactly how (and when) introns first infiltrated eukaryotic genes has been hotly debated for decades, and will likely remain so for some time (recent reviews: Lynch and Richardson 2002; Roy and Gilbert 2006). However, some aspects of intron evolution are beyond debate, and the clearest of these is the observation that nearly every spliceosomal intron in any eukaryotic gene has the same boundary sequences: "canonical" GT|AG splice sites that allow efficient splicing by a spliceosome that arose in an ancient eukaryotic ancestor long before the adaptive radiation of its descendants.

Known distribution of non-canonical introns: Among the diverse eukaryotic species that exist today, there are few exceptions to this canonical intron rule. A recent survey of 22489 EST-confirmed mammalian introns in the NCBI database revealed 98.71% with canonical GT|AG boundaries (Burset, Seledtsov et al. 2000). In the Arabidopsis genome, approximately 99.45% of 26961 introns have GT|AG boundaries.
Fifty of the remaining introns were of the minor U12-dependent spliceosomal type (AT|AC), and only two other non-canonical introns, AT|AA and GT|AT, were found (Zhu and Brendel 2003). Apart from these two comprehensive studies, research on the existence of non-canonical introns in other organisms is scarce. Table 3.1 summarizes the known distribution to date.

Introns in Dinoflagellates: Although introns are not abundant in dinoflagellates, sixteen have been found in six dinoflagellate genes (Table 3.1). Of these sixteen, only three are canonical. One of these canonical introns occurs in a type II ribulose-1,5-bisphosphate carboxylase/oxygenase (rubisco) subunit (rbcG) that also contains a second, non-canonical intron. The first six non-canonical dinoflagellate introns were found in the nuclear-encoded type II rubisco large subunit (rbcL). Type II rubisco is of bacterial origin and, among eukaryotes, has only been found in dinoflagellates (Symbiodinium, Gymnodinium polyedra aka. Lingulodinium polyedrum, H. triquetra, Prorocentrum minimum and Amphidinium carterae). All other eukaryotes use the eukaryotic type I rubisco protein (Morse, Salois et al. 1995; Patron, Waller et al. 2005; Rowan, Whitney et al. 1996; Zhang and Lin 2003). Four introns, all non-canonical, were found in the basic histone-like protein (HCc) gene in Crypthecodinium cohnii (Yoshikawa, Uchida et al. 1996). In the Pyrocystis lunula luciferase gene there is one non-canonical intron (Okamoto, Liu et al. 2001). I cloned and sequenced an actin gene from Heterocapsa rotundata which contained a 46 bp GA|TG non-canonical intron. This gene (AF482409) was submitted to GenBank along with a batch of others, including two dinoflagellate genes with one canonical intron each (Saldarriaga, McEwan et al. 2003).
Table 3.1: Non-canonical introns in eukaryotes. Boundary sequences denote 5' then 3' dinucleotide splicing signals. The two human boundary sequences are from a stringently analysed dataset of 126 EST-confirmed non-canonical introns (the rest were ATAC-type U12-dependent non-canonical introns). Total EST-supported Arabidopsis introns (including canonical): 26961. An asterisk (*) denotes a non-EST-confirmed intron.

Group            Species                    Gene          Boundary   # Introns (size)          Reference
Plants           Arabidopsis                -             ATAA       1                         (Zhu and Brendel 2003)
                                                          GTAT       1
-                Human                      -             GTGG       1                         (Burset, Seledtsov et al. 2000)
                                                          TTAG       1
Dinoflagellates  Crypthecodinium cohnii     HCc           AGGC       1 (135)                   (Yoshikawa, Uchida et al. 1996)
                                                          AGGG       1 (188)
                                                          CCGC       1 (150)
                                                          GGGA       1 (152)
                 Symbiodinium sp.           rbcA          GCAG       4 (124, 163, 163, 163)    (Rowan, Whitney et al. 1996)
                                                          GAAG       2 (211, 211)
                                            rbcG          GAGG*      1 (456)
                 Pyrocystis lunula          luciferase    ATTC       1 (403)                   (Okamoto, Liu et al. 2001)
                 Heterocapsa rotundata      actin         GATG*      1 (46)                    (Saldarriaga, McEwan et al. 2003)
Jakobid          Malawimonas jakobiformis   CCTdelta      CTAG       1 (61)                    (Archibald, O'Kelly et al. 2002)
Diplomonad       Giardia intestinalis       ferredoxin    CTAG       1 (35)                    (Nixon, Wang et al. 2002)
Chytrid          Karlingiomyces sp.         beta-tubulin  CTAG       3 (52, 80, 65)            (Keeling 2003)

Finally, Perkinsus marinus, which lies at the base of the dinoflagellate clade, has canonical GT|AG introns in its nuclear-encoded, mitochondrial-targeted superoxide dismutase genes sod1 (4 introns) and sod2 (5 introns). These have been compared to Toxoplasma gondii sod genes, and one intron has a similar location. The P. marinus sod genes seem to be more apicomplexan-like than dinoflagellate-like (Schott, Robledo et al. 2003), which suggests that non-canonical introns appeared in the dinoflagellate lineage sometime after the common ancestor of P.
marinus and higher dinoflagellates, though it is unclear whether by boundary mutation or insertion of new non-canonical introns. The collective dinoflagellate intron data show that while dinoflagellate introns are neither particularly abundant nor particularly rare, the majority of those that do exist are non-canonical.

Research on dinoflagellate introns: Until the recent construction of dinoflagellate EST databases aimed at characterising the dinoflagellate transcriptome (Hackett, Scheetz et al. 2005; Patron, Waller et al. 2005; Patron, Waller et al. 2006; Yoon, Hackett et al. 2005), there had been no large-scale source of gene data for dinoflagellates, and no intron-specific searches had been done in dinoflagellates. This is not very surprising when one considers the type of 'needle in a haystack' search that is required. Screening a dinoflagellate genome library means screening, on average, 15,000 - 40,000 Mbp of sequence, most of it non-coding DNA.

An EST-based approach: One of the common characteristics of large genomes is an increased number of introns, or increased intron size. I was interested in investigating this possibility in dinoflagellates. I was also interested in the implications of a potentially large set of non-canonical introns. By using the H. triquetra and K. micrum EST databases, we are able to eliminate intergenic DNA from the screening process while maintaining a good sample of the total coding DNA. The basic strategy, expanded upon in the methods section below, uses exact sequences of cDNAs (representing spliced mRNA) from the H. triquetra and K. micrum EST libraries to amplify coding sequences from genomic DNA. The presence of introns increases the size of the amplified fragment relative to the known size of the spliced mRNA sequence. This size-based test is the initial screen for potential intron-containing genes.

3.2 Materials and Methods

Library construction and sequencing: H. triquetra and K.
micrum expressed sequence tag (EST) libraries were made by J. Archibald, N. Patron and R. Waller, sequenced and partially annotated (Patron, Waller et al. 2005; Patron, Waller et al. 2006).

Culture Maintenance: K. micrum (CCMP 415) and H. triquetra (CCMP 449) were maintained in f/2-Si medium at 18°C on a 16/8 h light/dark cycle. Cells were subcultured approximately every 10-20 days, checked visually under a light microscope for signs of obvious bacterial contamination, and harvested periodically by centrifugation. They were either immediately used for DNA or RNA isolation, or frozen at -20°C for up to several months.

Primer design: 63 exact-match primer sets were designed to the 5' and 3' ends of cDNA sequences from the H. triquetra and K. micrum EST library projects (35 and 28 sets, respectively). Primers were tested with a variety of online tools (NetPrimer www.premierbiosoft.com, Gene Walker www.cybergene.se) to avoid AT/GC imbalance, primer dimers and hairpins.

RNA and DNA isolation and amplification: RNA was isolated using a standard Trizol protocol, preceded by grinding in liquid nitrogen. DNA was isolated as above (Chapter 2 methods). DNA and RNA were measured for quantity and contamination with a spectrophotometer, and contaminated or low-yield samples were discarded. Samples were stored at -20°C. Primer sets were used to amplify gene fragments from DNA and RNA by PCR and RT-PCR. Fragments were size-separated by electrophoresis on a 0.8% agarose gel. This step selects for amplified fragments that could contain introns. DNA bands larger than the expected 'intron-free' size (based on comparison to parallel RNA amplification or to known cDNA sequence length including primers) were excised, cleaned using the MoBio UltraClean 2.0 kit and cloned as above (Chapter 2 methods).
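The size-based screen described above reduces to comparing each genomic amplicon with the size predicted from its cDNA. The following is a minimal Python sketch of that comparison, not the procedure used in this work (bands were judged on agarose gels). The amplicon names, the sizes, and the `tolerance_bp` threshold are hypothetical values chosen only for illustration.

```python
def flag_intron_candidates(expected_bp, observed_bp, tolerance_bp=50):
    """Return amplicons whose genomic product is larger than the cDNA-predicted size.

    expected_bp:  amplicon name -> size predicted from the cDNA (primers included)
    observed_bp:  amplicon name -> size estimated for the genomic DNA product
    tolerance_bp: hypothetical allowance for gel sizing error
    """
    candidates = []
    for name, expected in expected_bp.items():
        observed = observed_bp.get(name)
        if observed is None:
            # No genomic product was obtained for this primer set.
            continue
        if observed - expected > tolerance_bp:
            candidates.append(name)
    return sorted(candidates)


# Hypothetical example values, not data from this study.
expected = {"ampliconA": 600, "ampliconB": 850, "ampliconC": 400}
observed = {"ampliconA": 610, "ampliconB": 1300}

print(flag_intron_candidates(expected, observed))  # prints ['ampliconB']
```

A larger-than-expected band is only a candidate: as in the screen itself, the presence of an intron has to be confirmed by cloning and sequencing.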
Following cloning, colonies with inserts were screened by colony PCR for insert size, and plasmids were isolated and sequenced as above (Chapter 2 methods). Sequences were trimmed and analysed using Sequencher 4.2.

3.3 Results and Discussion

Sixty-three (35 and 28) attempted DNA amplifications of H. triquetra and K. micrum sequences yielded 18 (13 and 5) bands that appeared larger than the expected size, which were excised and cloned. Screens of 4-12 colonies from each cloning reaction (ligation) yielded 19 (11 and 8) inserts of appropriate size for sequencing. No introns, insertions, or deletions were found in any of these DNA sequences. These findings suggest that introns in dinoflagellates are not as abundant as we had thought.

Intron distribution in dinoflagellates: It is possible that introns are only present in some dinoflagellates, or that they are more common in the species listed in Table 3.1 than they are in H. triquetra and K. micrum. Therefore, for dinoflagellates at least, it is not appropriate to speculate that because introns are present in some species, they are likely to be present in all (or even most) species.

Introns as a contributor to genome growth in dinoflagellates: The evidence also suggests that intron acquisition is not a large contributor to the expansive amounts of DNA in dinoflagellate genomes. Dinoflagellate genome expansion has likely been the result of several other processes of genome evolution (see Chapter 4).

Introns in plastid-targeted genes: Because I found no introns, I am unable to address questions regarding the proportion of canonical and non-canonical boundaries in a given dinoflagellate genome. However, I have made some interesting observations about previously known presences and absences of non-canonical introns. I note that all of the dinoflagellate genes in Table 3.1, except actin, are present in multicopy tandem repeats (Okamoto, Liu et al. 2001; Rowan, Whitney et al. 1996; Yoshikawa, Uchida et al.
1996). Several of the genes I chose to amplify from genomic DNA were plastid-targeted genes, but the amplified fragments appeared no larger than the cDNA sequences and contained no introns. Also, the non-canonical intron-containing H. rotundata actin gene is not plastid-targeted. Therefore, I have found no evidence that introns (canonical or not) are more prevalent in nuclear-encoded, plastid-targeted dinoflagellate genes.

Introns in multicopy, tandemly repeated genes: Regarding a potential association between introns and genes encoded as tandem repeats, luciferase and rubisco genes from L. polyedrum are both present in multicopy tandem repeats, but neither contains introns (see Table 3.1 for those that do). P. minimum also has a multicopy tandemly repeated rubisco with no introns (Zhang and Lin 2003). This suggests that loss (L. polyedrum, P. minimum) or gain (P. lunula, Symbiodinium) of introns is independent of the presence of gene repeats.

Intron loss and gain in dinoflagellate evolution: While it appears that introns were present ancestrally in early dinoflagellates and alveolates (Schott, Robledo et al. 2003), there have likely been many losses since then, as well as possible gains of non-canonical introns, or mutation of ancestral intron splice sites. There are not enough data across phylogenetic groups of dinoflagellates to assess whether intron gain is likely to have occurred.

LITERATURE CITED

Archibald, J. M., C. J. O'Kelly, et al. (2002). The chaperonin genes of jakobid and jakobid-like flagellates: Implications for eukaryotic evolution. Mol Biol Evol 19(4): 422-431.
Burset, M., I. A. Seledtsov, et al. (2000). Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res 28(21): 4364-4375.
Hackett, J. D., T. E. Scheetz, et al. (2005). Insights into a dinoflagellate genome through expressed sequence tag analysis. BMC Genomics 6(1): 80.
IHGSConsortium, I. H. G. S., E. S. Lander, et al. (2001).
Initial sequencing and analysis of the human genome. Nature 409(6822): 860-921.
Keeling, P. J. (2003). Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet Biol 38(3): 298-309.
Lynch, M. and A. O. Richardson (2002). The evolution of spliceosomal introns. Curr Opin Genet Dev 12(6): 701-710.
Morse, D., P. Salois, et al. (1995). A nuclear-encoded form II rubisco in dinoflagellates. Science 268(5217): 1622-1624.
Nixon, J. E., A. Wang, et al. (2002). A spliceosomal intron in Giardia lamblia. Proc Natl Acad Sci USA 99(6): 3701-3705.
Okamoto, O. K., L. Liu, et al. (2001). Members of a dinoflagellate luciferase gene family differ in synonymous substitution rates. Biochemistry 40(51): 15862-15868.
Patron, N. J., R. F. Waller, et al. (2005). Complex protein targeting to dinoflagellate plastids. J Mol Biol 348(4): 1015-1024.
Patron, N. J., R. F. Waller, et al. (2006). A tertiary plastid uses genes from two endosymbionts. J Mol Biol 357(5): 1373-1382.
Rowan, R., S. M. Whitney, et al. (1996). Rubisco in marine symbiotic dinoflagellates: Form II enzymes in eukaryotic oxygenic phototrophs encoded by a nuclear multigene family. Plant Cell 8(3): 539-553.
Roy, S. W. and W. Gilbert (2006). The evolution of spliceosomal introns: Patterns, puzzles and progress. Nat Rev Genet 7(3): 211-221.
Saldarriaga, J. F., M. L. McEwan, et al. (2003). Multiple protein phylogenies show that Oxyrrhis marina and Perkinsus marinus are early branches of the dinoflagellate lineage. Int J Syst Evol Microbiol 53(Pt 1): 355-365.
Schott, E. J., J. A. Robledo, et al. (2003).
Gene organization and homology modeling of two iron superoxide dismutases of the early branching protist Perkinsus marinus. Gene 309(1): 1-9.
Yoon, H. S., J. D. Hackett, et al. (2005). Tertiary endosymbiosis driven genome evolution in dinoflagellate algae. Mol Biol Evol 22(5): 1299-1308.
Yoshikawa, T., A. Uchida, et al. (1996). There are 4 introns in the gene coding the DNA-binding protein HCc of Crypthecodinium cohnii (Dinophyceae). Fisheries Sci 62(2): 204-209.
Zhang, H. and S. Lin (2003). Complex gene structure of the form II rubisco in the dinoflagellate Prorocentrum minimum (Dinophyceae). J Phycol 39(6): 1160-1171.
Zhu, W. and V. Brendel (2003). Identification, characterization and molecular phylogeny of U12-dependent introns in the Arabidopsis thaliana genome. Nucleic Acids Res 31(15): 4561-4572.

CHAPTER 4: A GENOME SURVEY OF HETEROCAPSA TRIQUETRA

4.1 Introduction

We know from studying highly reduced genomes that when a cell is pressured to decrease its genetic content, it employs many methods to do so. Some methods affect coding DNA, some affect non-coding DNA, and some affect both. Microsporidians, for example, have very short intergenic regions, have shrunk their introns or lost them completely, and have even lost many genes (Keeling and Slamovits 2005). By reversing the case of genome reduction, we immediately have expectations for ways genomes can expand: we expect more introns, larger intergenic regions, and more genes. These expectations are backed up by numerous studies (Alexandrov, Troukhan et al. 2006; Andolfatto 2005; Boeva, Regnier et al. 2006; Cavalier-Smith 2005; Lynch 2006). In comparison to prokaryotes, eukaryotic genomes are large, but gene-poor (Table 4.1). In the human genome, exon sequence makes up less than 2% of the total genome, and 50% of the genome is repetitive non-coding sequence (IHGSConsortium, Lander et al. 2001).
At 2.4 Mb, the largest human gene is larger than many prokaryotic genomes and has at least 78 introns, but only about 11 kb of coding sequence (Den Dunnen, Grootscholten et al. 1992; Tennyson, Klamut et al. 1995; Ussery and Hallin 2004). If, as in humans, H. triquetra exons make up only 2% of its genome, what does the rest of the DNA look like, and what is it doing? Surveying a genomic library allows us to analyse the features of non-coding DNA that are most prevalent in a genome. I have constructed a genomic library of the H. triquetra genome for this purpose.

Table 4.1: Eukaryotic genome sizes. This table with references is reproduced from (Keeling and Slamovits 2005).

Organism                        Major Eukaryotic Group            Size (Mb)   Reference
Gonyaulax polyedra              Dinoflagellate                    98,000      (Shuter, Thomas et al. 1983)
Heterocapsa pygmaea             Dinoflagellate                    4,450       (Triplett, Jovine et al. 1993)
Toxoplasma gondii               Apicomplexan                      87          (Blaxter and Ivens 1999)
Plasmodium falciparum           Apicomplexan                      23          (Gardner, Hall et al. 2002)
Cryptosporidium parvum          Apicomplexan                      9           (Spano and Crisanti 2000)
Paramecium caudatum             Ciliate                           8,600       (Shuter, Thomas et al. 1983)
Thalassiosira pseudonana        Diatom                            32          (Armbrust, Berges et al. 2004)
Coscinodiscus asteromphalus     Diatom                            25,000      (Shuter, Thomas et al. 1983)
Amoeba proteus                  Amoeba                            290,000     (Friz 1968)
Amoeba dubia                    Amoeba                            670,000     (Friz 1968)
Dictyostelium discoideum        Slime Mold                        34          (Glockner, Eichinger et al. 2002)
Entamoeba histolytica           Archamoeba                        <20         http://www.sanger.ac.uk/Projects/E-histolytica/
Trichomonas vaginalis           Parabasalian                      60-80       http://www.tigr.org/tdb/e2kl/tvg/
Trypanosoma sp.                 Kinetoplastid                     39          (El-Sayed, Hegde et al. 2000)
Leishmania major                Kinetoplastid                     33          (Myler, Sisk et al. 2000)
Cyanidioschyzon merolae         Red Alga                          16          (Matsuzaki, Misumi et al. 2004)
Guillardia theta (nucleomorph)  Red Alga (cryptophyte)            0.55        (Douglas, Zauner et al. 2001)
Chlamydomonas reinhardtii       Green Alga                        100         (Harris 1993)
Ostreococcus tauri              Green Alga (picoeukaryote)        10          (Courties, Perasso et al. 1998)
Bigelowiella natans             Green Alga (chlorarachniophyte)   0.38        (Gilson and McFadden 2002)
Oryza sativa                    Plant                             430         (Arumuganathan and Earle 1991)
Zea mays                        Plant                             3,000       (Arumuganathan and Earle 1991)
Arabidopsis thaliana            Plant                             125         (Yu, Wright et al. 2000)
Mouse                           Animal                            2,500       (Waterston, Lindblad-Toh et al. 2002)
Human                           Animal                            2,900       (Waterston, Lander et al. 2002)
Fugu rubripes                   Animal                            365         (Aparicio, Chapman et al. 2002)
Drosophila melanogaster         Animal                            137         (Adams, Celniker et al. 2000)
Ciona intestinalis              Animal                            156         (Dehal, Satou et al. 2002)
Saccharomyces cerevisiae        Fungus                            12          (Blandin, Durrens et al. 2000)
Cryptococcus neoformans         Fungus                            20          (Wickes, Moore et al. 1994)
Neurospora crassa               Fungus                            43          (Schulte, Becker et al. 2002)
Encephalitozoon intestinalis    Microsporidian                    2.3         (Peyretaillade, Biderre et al. 1998)
Encephalitozoon cuniculi        Microsporidian                    2.9         (Katinka, Duprat et al. 2001)
Antonospora locustae            Microsporidian                    5.4         (Streett 1994)
Spraguea lophii                 Microsporidian                    6.2         (Biderre, Pages et al. 1994)
Glugea atherinae                Microsporidian                    19.5        (Biderre, Pages et al. 1994)

Genome Growth by Gene Enrichment: Gene-dense chromosomal areas in eukaryotes are usually comparatively GC-rich (Gardiner 1995). Some have claimed that dinoflagellate genomes are high in GC (Herzog, Soyer et al. 1982; Rizzo and Cox 1976), which suggests increased gene density. If H. triquetra's large genome is due to an increase in gene density, we could expect to see a high overall GC content, and one that is similar to the GC content of the EST database. With high gene density we would also expect, by chance, to find some identifiable ORFs and/or multicopy genes.

Genome Growth by Repetition: Repetitive sequence can be the result of several processes.
Recombination can duplicate regions of genes or chromosomes, and mobile genetic elements often replicate portions of themselves and other genes, leaving copies of themselves or conserved insert sequences behind as they move around a genome. Repeated sequence from mobile elements appears as short or long stretches of complex sequence scattered throughout the genome. In contrast, structural DNA often bears the signature of zones of low-complexity repeated sequence. With this library it will be nearly impossible to address recombination because we will not be able to map large sections of the genome, but we can easily recognize low-complexity repeats and the signature of mobile elements in complex, nearly exact repeats.

Transposons and Retrotransposons: Transposons are mobile genetic elements that insert and excise themselves from DNA. They encode a transposase (which may or may not be sequence-specific) that performs a sticky-end insertion. As a result of this process they have short direct repeats on either side of their insertion site, and these can remain behind after the transposon has moved on to another location. In this way, they can cause permanent insertion mutations. There are two types of retrotransposons: Long Terminal Repeat (LTR) and non-LTR retrotransposons. LINEs (long interspersed nuclear elements) and SINEs (short interspersed nuclear elements) are two types of non-LTR retrotransposons. LINEs encode genes with reverse transcriptase and integrase functionality that allow them to reverse-transcribe themselves and other mobile elements (like SINEs) that do not have autoreplicative abilities. LINEs are initially inserted into a genome by reverse-transcribing themselves from RNA into DNA, then inserting into existing DNA with their integrase functionality.
Subsequently, the host genome transcribes the new LINE element, and reverse-transcribed DNA copies of the mRNA can be inserted somewhere else in the genome. In this way, LINEs can rapidly proliferate throughout a genome, replicating themselves and increasing DNA content with each new insertion. LINEs can also reverse-transcribe other mRNAs into DNA and insert them into genomic DNA. LTR retrotransposons also replicate by copying and inserting themselves in genomes. They are recognizable by their long terminal repeat regions, which can be hundreds or thousands of bases long. The proliferation of LTR retrotransposons and LINEs can contribute to the rapid expansion of genomes. LTR replication is tempered by recombination between repeat regions, which can lead to deletion of the element (Biemont and Vieira 2005; Hua-Van, Le Rouzic et al. 2005; Labrador and Corces 1997). Mobile elements do more than just increase total genomic size; depending on where they insert, they can disrupt genes or regulatory regions, introduce regulatory sequences to new locations, or cause permanent mutations. Their effect on genome evolution should not be underestimated.

The first question when vast quantities of DNA don't code for anything is often "What could it possibly be doing?". It has become clear in recent years that an important function of repetitive DNA is not necessarily to "do" something, but to "be" somewhere. This requires a shift in the way we think about DNA, from a code-centric view to the consideration of DNA as physical elements. When we make this shift, we recognise the intrinsic functionality of structural DNA and non-coding DNA. Suddenly, taking up space no longer seems wasteful.

Shotgun Sequencing as a Strategy for Genome Exploration: Based on an average insert size of 1 kb, 200 colonies represent only ~200 kb of sequence, or about 0.0009% of the estimated size of the H. triquetra genome. H.
triquetra has an estimated 21,000 million bases (21,000 Mb), and sequencing it entirely is far beyond the scope of this project. The closest anyone has come to a genome-wide survey of dinoflagellates was screening a genomic Crypthecodinium cohnii DNA library by reverse hybridization to estimate the repetitiveness of the genome (Moreau, Geraud et al. 1998). In fact, there is still no intent by any group to sequence any but the tiniest dinoflagellate genomes (LaJeunesse, Lambert et al. 2005). This genomic library is the only feasible way to get a small glimpse, in pieces, of what the genome as a whole is like. There is not nearly enough coverage to build contigs confidently, but this is nonetheless a worthwhile project, since a sequence-based survey of dinoflagellate genomic DNA has never been undertaken.

4.2 Materials and Methods

Cell culture and DNA extraction: H. triquetra cells were grown as above, scaled up to 4 L quantities over several months and harvested as described above (Chapter 3 Methods). DNA and RNA extractions were performed on the cells immediately following grinding in liquid nitrogen. DNA from several extractions was measured individually with a spectrophotometer, and the best-quality, most highly concentrated samples were pooled for the genomic DNA library.

Library Building: The Invitrogen pCR4Blunt-TOPO Shotgun Cloning kit with nebulizers was used to make the library. 25 µg of buffered total DNA was physically sheared with nebulizers for 40 seconds at 10 psi, and run on a 1.0% agarose gel to determine the size range. Resulting fragments of 1000-4000 bp were precipitated and stored overnight at -20°C, then used in subsequent steps as per the kit instructions. It is important to note that all steps following precipitation of sheared DNA fragments should be done continuously, without storage of intermediates, to avoid degradation of blunt ends and the consequent loss of vector ligation efficiency.
I did 25 µl blunt end repair (BER) reactions (yielding 10 µl of resuspended BER DNA) in order to keep volumes low for downstream ligation and cloning steps. I serially diluted, ligated, cloned, and plated a subset of the BER DNA and stored the remaining 7 µl overnight. I used the full 7 µl the following morning in a single large ligation at 1/3 dilution and 10 separate transformations, resulting in approximately 30 plates each containing 50-100 white (transformed) colonies, for a rough total of 2200 colonies with an insert.

Screening: I screened 550 colonies using the Fidelitaq colony PCR protocol and agarose gel electrophoresis for large (>1000 bp) inserts. Approximately 200 of these colonies contained >1000 bp inserts. Those between 1000 and 1300 bp were sequenced in one direction as above (Chapter 2 Methods), and both strands of the inserts >1300 bp (35) were sequenced. A small number of the inserts screened (8) were between 2000 and 3000 bp. Sequences were vector-trimmed and auto-assembled into contigs using Sequencher. Where bi-directional sequencing did not cover the full length of the insert, exact-match primers for internal sequencing walks were made to cover the gap. (These sequencing walks are currently being done by Raheel Humayun to complete 32 large fragments.)

Sequence analysis: Sequences were trimmed and edited for quality in Sequencher 4.2, and exported using Fetch 5.0.5 (http://fetchsoftworks.com) to the lec 'cowpie' server (http://ernie.botany.ubc.ca/lec/lec.html), which runs blastx and blastn through the NCBI server, and similarity searches against sequences in the local database, returning top hits and links.

4.3 Results and Discussion

Facts and Features of H. triquetra genomic DNA library: Sequence editing produced 208 sequences (annotated A0001 - A0214), with an average length of 1100 bp, ranging from 335 bp to 2027 bp (Figure 4.1).
These sequences total 228 Kb of unambiguous sequence (231 Kb with 1.3% ambiguous bases) and represent 198 genomic DNA (gDNA) fragments (some fragments are made up of two sequences). The largest completely sequenced fragment (A0197) is 2027 bp with 45.8% GC. There are no obvious repeats in A0197 (dot plot not shown), and it has no similarity to any other fragment in the local database or at NCBI. The largest unfinished fragment is estimated at 3200 bp.

Figure 4.1: Size distribution of Heterocapsa triquetra genomic DNA fragments. Histogram showing the distribution of fragments by size (x-axis: fragment size in Kb).

GC Content: The H. triquetra genome contains an estimated 53.56% GC, based on 231 Kb of DNA sequence from 208 fragments that range from 26.42% to 70.14% GC. Most of the fragments have GC contents between 52% and 67%, with a mode of 59.5% (Figure 4.2). Fragment A0003 has the lowest GC content at 26.4% and shows high similarity (3e-14) to a plastid-targeted hydroxyproline-rich glycoprotein (mucin) from plants. The next two most AT-rich fragments, at 27% GC, encode the only full-length genes identified from the set of fragments, the methyltransferases (A0206 and A0020). The GC estimate for H. triquetra coding DNA is 61.75%, based on 1,816,929 bp of coding sequence from the EST library (minus poly-A tails (Wang and Morse 2006)). Nineteen plastid-targeted genes from the EST database have an average GC content of 64% (ranging from 59% to 72%). These data support the common assertion that coding regions are richer in G and C nucleotides than non-coding regions are.
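As a rough illustration of how GC percentages like those above can be computed, the following Python sketch calculates per-fragment GC content and a pooled (length-weighted) value across several fragments. The function names and toy sequences are hypothetical and are not part of the thesis pipeline. Excluding ambiguous bases from the denominator is also an assumption, since the text does not say how ambiguous positions were handled or whether the genome-wide estimate was pooled or averaged per fragment.

```python
from typing import Iterable


def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored entirely.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / (gc + at)


def pooled_gc_percent(seqs: Iterable[str]) -> float:
    """Length-weighted percent G+C over all fragments pooled together."""
    gc = at = 0
    for seq in seqs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in any sequence")
    return 100.0 * gc / (gc + at)


# Toy fragments invented for illustration only (not thesis data).
fragments = {"toy_1": "ATGCGCGCNN", "toy_2": "ATATATGC"}

for name, seq in fragments.items():
    print(name, round(gc_percent(seq), 2))
print("pooled", round(pooled_gc_percent(fragments.values()), 2))
```

A pooled value weights long fragments more heavily than a simple mean of per-fragment percentages would, so the two can differ when fragment lengths vary as widely as they do in this library.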
Gene content: 198 roughly vector-trimmed fragments were analysed to determine similarity to publicly available sequences. 173 of the 198 were acceptable for analysis (free of large amounts of ambiguous sequence). Of these 173, 61 fragments had similarity (e-04 or better) to genes in the NCBI database. Most of these hits were for short stretches of DNA, 20-150 bp. 14 of the 61 hit genes coding for bacterial 16S or 23S rRNA, and 7 hit other hypothetical bacterial proteins or ORFs, including one proteobacterial gene. At least 9 fragments had portions of mobile elements. These include integrase (A0147) and the pol gene (A0135), both from hits to eukaryotic transposons. Finally, two separate fragments (A0020, A0204R) encode full-length cytosine-dependent DNA methyltransferases. 112 fragments (64.7%) did not show significant similarity to any genes in the NCBI database, though some of these (below) did share repeats with other fragments. These data are consistent with the null hypothesis that genes are sparse in dinoflagellates, as in most eukaryotes.

Figure 4.2: Histogram showing the distribution of fragments by GC content (x-axis: percent GC).

Repeats: Several fragments share short regions of nearly perfect sequence similarity with other fragments in the database. These imperfect complex repeats do not appear at the ends of fragments, as would occur if the fragments were adjacent segments of the genome (or if they were contaminated with flanking vector sequence), but internally. For example, a 61 bp sequence is shared between A0010f and A0009f. The sequence is located at 537-597 of A0010f and 605-665 of A0009f, and there are 7 mismatches along its length. A similar 150 bp sequence is shared between 275-425 of A0079r and 687-838 of A0009f. Note that A0009f contains two different repeat regions.
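The degree of similarity in a shared region such as the 61 bp example above can be summarised as percent identity. The short Python sketch below shows that arithmetic. It assumes an ungapped alignment in which every non-matching position counts as one mismatch; the function names are hypothetical, and the thesis does not state how its mismatch counts were obtained.

```python
def percent_identity(aligned_length: int, mismatches: int) -> float:
    """Percent of aligned positions that match (ungapped alignment assumed)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= mismatches <= aligned_length:
        raise ValueError("mismatches must lie between 0 and aligned_length")
    return 100.0 * (aligned_length - mismatches) / aligned_length


def count_mismatches(a: str, b: str) -> int:
    """Count differing positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


# Values taken from the text: a 61 bp shared region with 7 mismatches.
print(round(percent_identity(61, 7), 1))

# Toy sequences invented for illustration only.
print(count_mismatches("ACGTACGT", "ACGAACGT"))
```

Under these assumptions the 61 bp repeat is a little under 90% identical between the two fragments, which is in keeping with the description of these repeats as imperfect.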
A0090f and A0081r match at 215 sites of a 250 bp region in the middle of their fragments. There are at least four other pairs like these, with the shared region ranging from 150 to 400 bp. One sequence has a small internal repeat near the 3' end (a 7 bp tandem repeat), and none is entirely made up of repeated sequence. There could be as many as 25,000 copies of a single repeat in the C. cohnii genome (Moreau, Geraud et al. 1998), and a set of seven pairs of imperfect complex repeats in this sample suggests that they could be a prevalent feature of dinoflagellate genomes.

Transposons: Fragment A0213 showed similarity (8e-17) to an LTR retrotransposon, and A0169 could itself be an LTR retrotransposon. A0169 has two nearly exact-match rubisco fragments (50 and 25 bp). The fragments are 75 bp apart and are separated from two flanking 100 bp direct repeats by 80 bp on one side and 150 bp on the other. This fragment may have resulted from internal recombination between the transposon and a rubisco form II polyprotein. Of the few fragments with recognisable sequence, most carry mobile elements or bacterial ribosomal gene fragments. Finding multiple retrotransposon fragments in such a small sample of the genome suggests that they make up a large proportion of the H. triquetra genome.

LITERATURE CITED

Adams, M. D., S. E. Celniker, et al. (2000). The genome sequence of Drosophila melanogaster. Science 287(5461): 2185-2195.

Alexandrov, N. N., M. E. Troukhan, et al. (2006). Features of Arabidopsis genes and genome discovered using full-length cDNAs. Plant Mol Biol 60(1): 69-85.

Andolfatto, P. (2005). Adaptive evolution of non-coding DNA in Drosophila. Nature 437(7062): 1149-1152.

Aparicio, S., J. Chapman, et al. (2002). Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297(5585): 1301-1310.

Armbrust, E. V., J. A. Berges, et al. (2004). The genome of the diatom Thalassiosira pseudonana: Ecology, evolution, and metabolism. Science 306(5693): 79-86.
Arumuganathan, K. and E. D. Earle (1991). Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9: 208-219.

Biderre, C., M. Pages, et al. (1994). On small genomes in eukaryotic organisms: Molecular karyotypes of two microsporidian species (Protozoa) parasites of vertebrates. C R Acad Sci III 317(5): 399-404.

Biemont, C. and C. Vieira (2005). What transposable elements tell us about genome organization and evolution: The case of Drosophila. Cytogenet Genome Res 110(1-4): 25-34.

Blandin, G., P. Durrens, et al. (2000). Genomic exploration of the hemiascomycetous yeasts: 4. The genome of Saccharomyces cerevisiae revisited. FEBS Lett 487(1): 31-36.

Blaxter, M. and A. Ivens (1999). Reports from the cutting edge of parasitic genome analysis. Parasitol Today 15(11): 430-431.

Boeva, V., M. Regnier, et al. (2006). Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. Bioinformatics 22(6): 676-684.

Cavalier-Smith, T. (2005). Economy, speed and size matter: Evolutionary forces driving nuclear genome miniaturization and expansion. Ann Bot (Lond) 95(1): 147-175.

Courties, C., R. Perasso, et al. (1998). Phylogenetic analysis and genome size of Ostreococcus tauri (Chlorophyta, Prasinophyceae). Journal of Phycology 34: 844-849.

Dehal, P., Y. Satou, et al. (2002). The draft genome of Ciona intestinalis: Insights into chordate and vertebrate origins. Science 298(5601): 2157-2167.

Den Dunnen, J. T., P. M. Grootscholten, et al. (1992). Reconstruction of the 2.4 Mb human DMD-gene by homologous YAC recombination. Hum Mol Genet 1(1): 19-28.

Douglas, S., S. Zauner, et al. (2001). The highly reduced genome of an enslaved algal nucleus. Nature 410(6832): 1091-1096.

El-Sayed, N. M., P. Hegde, et al. (2000). The African trypanosome genome. Int J Parasitol 30(4): 329-345.

Friz, C. T. (1968). The free amino acid levels of Pelomyxa carolinensis, Amoeba dubia and proteus.
J Protozool 15(1): 149-152.

Gardiner, K. (1995). Human genome organization. Curr Opin Genet Dev 5(3): 315-322.

Gardner, M. J., N. Hall, et al. (2002). Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419(6906): 498-511.

Gilson, P. R. and G. I. McFadden (2002). Jam packed genomes - a preliminary, comparative analysis of nucleomorphs. Genetica 115(1): 13-28.

Glockner, G., L. Eichinger, et al. (2002). Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418(6893): 79-85.

Harris, E. H. (1993). Chlamydomonas reinhardtii. Genetic maps: A compilation of linkage and restriction maps of genetically studied organisms. S. J. O'Brien. Cold Spring Harbor, New York, Cold Spring Harbor Laboratory: 2156-2169.

Herzog, M., M. O. Soyer, et al. (1982). A high level of thymine replacement by 5-hydroxymethyluracil in nuclear DNA of the primitive dinoflagellate Prorocentrum micans E. Eur J Cell Biol 21(2): 151-155.

Hua-Van, A., A. Le Rouzic, et al. (2005). Abundance, distribution and dynamics of retrotransposable elements and transposons: Similarities and differences. Cytogenet Genome Res 110(1-4): 426-440.

IHGSConsortium, I. H. G. S., E. S. Lander, et al. (2001). Initial sequencing and analysis of the human genome. Nature 409(6822): 860-921.

Katinka, M. D., S. Duprat, et al. (2001). Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414(6862): 450-453.

Keeling, P. J. and C. H. Slamovits (2005). Causes and effects of nuclear genome reduction. Curr Opin Genet Dev 15(6): 601-608.

Labrador, M. and V. G. Corces (1997). Transposable element-host interactions: Regulation of insertion and excision. Annu Rev Genet 31: 381-404.

LaJeunesse, T. C., G. Lambert, et al. (2005). Symbiodinium (Pyrrhophyta) genome sizes (DNA content) are smallest among dinoflagellates. Journal of Phycology 41: 880-886.

Lynch, M. (2006). The origins of eukaryotic gene structure.
Mol Biol Evol 23(2): 450-468.

Matsuzaki, M., O. Misumi, et al. (2004). Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428(6983): 653-657.

Moreau, H., M. L. Geraud, et al. (1998). Cloning, characterization and chromosomal localization of a repeated sequence in Crypthecodinium cohnii, a marine dinoflagellate. Int Microbiol 1(1): 35-43.

Myler, P. J., E. Sisk, et al. (2000). Genomic organization and gene function in Leishmania. Biochem Soc Trans 28(5): 527-531.

Peyretaillade, E., C. Biderre, et al. (1998). Microsporidian Encephalitozoon cuniculi, a unicellular eukaryote with an unusual chromosomal dispersion of ribosomal genes and a LSU rRNA reduced to the universal core. Nucleic Acids Res 26(15): 3513-3520.

Rizzo, P. J. and E. R. Cox (1976). Isolation and properties of nuclei from binucleate dinoflagellates. J. Phyc. 12: 31.

Schulte, U., I. Becker, et al. (2002). Large scale analysis of sequences from Neurospora crassa. J Biotechnol 94(1): 3-13.

Shuter, B. J., J. E. Thomas, et al. (1983). Phenotypic correlates of genomic DNA content in unicellular eukaryotes and other cells. American Naturalist 122: 26-54.

Spano, F. and A. Crisanti (2000). Cryptosporidium parvum: The many secrets of a small genome. Int J Parasitol 30(4): 553-565.

Streett, D. A. (1994). Analysis of Nosema locustae (Microsporidia) to Antonospora locustae n comb. Based on molecular and ultrastructural data. Journal of Eukaryotic Microbiology 51: 207-213.

Tennyson, C. N., H. J. Klamut, et al. (1995). The human dystrophin gene requires 16 hours to be transcribed and is cotranscriptionally spliced. Nat Genet 9(2): 184-190.

Triplett, E. L., R. V. Jovine, et al. (1993). Characterization of two full-length cDNA sequences encoding for apoproteins of peridinin-chlorophyll a-protein (PCP) complexes. Mol Mar Biol Biotechnol 2(4): 246-254.

Ussery, D. W. and P. F. Hallin (2004).
Genome update: Length distributions of sequenced prokaryotic genomes. Microbiology 150(Pt 3): 513-516.

Wang, Y. and D. Morse (2006). Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium. Nucleic Acids Res 34(2): 613-619.

Waterston, R. H., E. S. Lander, et al. (2002). On the sequencing of the human genome. Proc Natl Acad Sci USA 99(6): 3712-3716.

Waterston, R. H., K. Lindblad-Toh, et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915): 520-562.

Wickes, B. L., T. D. Moore, et al. (1994). Comparison of the electrophoretic karyotypes and chromosomal location of ten genes in the two varieties of Cryptococcus neoformans. Microbiology 140(Pt 3): 543-550.

Yu, Z., S. I. Wright, et al. (2000). Mutator-like elements in Arabidopsis thaliana. Structure, diversity and evolution. Genetics 156(4): 2019-2031.

CHAPTER 5: CONCLUSION

Reduction of Endosymbiont genomes: K. foliaceum is a particularly complex cell. It contains five genomes (host nuclear, endosymbiont nuclear, host mitochondrial, endosymbiont mitochondrial, and endosymbiont plastid) and the memory of a sixth (the peridinin-containing plastid genome of the past) in the form of the eyespot. The K. foliaceum nucleus likely contains transferred genes from its first (red algal) plastid (as K. micrum does (Patron, Waller et al. 2006)) and might encode a small set of genes from its second (diatom) plastid or from the endosymbiont nucleus, which is in the early stages of reduction (McEwan and Keeling 2004). The K. foliaceum example shows the extreme extent to which dinoflagellates use the acquisition of endosymbiotic organelles to increase the diversity and content of their genomes.

Introns: Like many eukaryotes, dinoflagellates contain introns, but the distribution of introns among dinoflagellate species is as yet unclear because of a small and poorly dispersed sample set of only 16 introns in 5 genes.
Though the genomes of some eukaryotes contain large proportions of introns, this does not appear to be the case for dinoflagellates. I found no introns in either species examined, and no evidence to suggest that introns have played a large part in the expansion of dinoflagellate genomes.

Genome survey: This study is the first attempt at characterizing the makeup of dinoflagellate genomes at a genome-wide scale. The estimate of H. triquetra GC content is lower than is usually reported for dinoflagellate genomes, but coding sequences are higher in GC than the overall genomic DNA, which fits the normal eukaryotic trend. Neither GC content nor representation of genes in the database supported high levels of duplicate genes in the H. triquetra genome. The data show that imperfect complex repeats and transposable elements are both widely present in the H. triquetra genome.

LITERATURE CITED

McEwan, M. L. and P. J. Keeling (2004). Hsp90, tubulin and actin are retained in the tertiary endosymbiont genome of Kryptoperidinium foliaceum. J Eukaryot Microbiol 51(6): 651-659.

Patron, N. J., R. F. Waller, et al. (2006). A tertiary plastid uses genes from two endosymbionts. J Mol Biol 357(5): 1373-1382.

