UBC Theses and Dissertations

Studies on the dinoflagellate genome McEwan, Michelle Louise 2006

STUDIES ON THE DINOFLAGELLATE GENOME

by

MICHELLE LOUISE McEWAN

B.Sc., The University of British Columbia, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA

April 2006

© Michelle Louise McEwan, 2006

ABSTRACT

Dinoflagellates are unusual eukaryotes in many ways, but one of the most interesting features of this cell - its enormous genome - is not well studied because its sheer size is an obstacle to sequencing. Genome expansion can be the result of polyploidy, intron gain, mobile genetic elements, or large intergenic regions. I have studied organellar genome reduction in Kryptoperidinium foliaceum, intron composition in Heterocapsa triquetra and Karlodinium micrum, and surveyed genomic DNA from H. triquetra in order to get a better grasp on mechanisms of genome expansion in dinoflagellates.

K. foliaceum has replaced its ancestral red algal plastid with a diatom plastid via tertiary endosymbiosis. Gene transfer from endosymbiont to host nucleus has likely occurred, but this endosymbiont is much less reduced than well-studied secondary endosymbiotic intermediates, the cryptophytes and chlorarachniophytes, where relict nuclear genomes (nucleomorphs) are retained. I sequenced the first protein-coding genes from the K. foliaceum endosymbiont and host nuclear genomes. I have characterised genes for nucleus-encoded cytosolic proteins, actin, alpha-tubulin, beta-tubulin, and HSP90, from both host and symbiont nuclei of K. foliaceum. Phylogenies show that the actin is diatom-derived, the beta-tubulin dinoflagellate-derived, while both diatom- and dinoflagellate-derived alpha-tubulin and HSP90 genes were found.
The presence of these genes implies they are still functional and that the endosymbiont is at an earlier stage of genetic reduction than those of cryptophytes or chlorarachniophytes.

Thirteen of 16 known dinoflagellate introns are non-canonical. I amplified and screened 63 K. micrum and H. triquetra genes, but found no introns. I report that introns are neither abundant in dinoflagellate genomes, nor have they played a major role in dinoflagellate genome expansion.

I built a genomic DNA library for the dinoflagellate, H. triquetra, and sequenced 214 fragments (231,164 bp). Main features of this library include imperfect complex repeats, retrotransposon domains, 53% GC content, few open reading frames (ORFs), and a lack of identifiable protein-coding regions. These results support mobile elements and repeats as major sources of DNA in expanded dinoflagellate genomes. The best explanation for the huge amounts of non-coding DNA remains the idea that it functions as a structural scaffold and contributes to chromosomal organization.

Key words: Dinoflagellate, Kryptoperidinium foliaceum, Heterocapsa triquetra, Karlodinium micrum, Nitzschia thermalis, Phaeodactylum tricornutum, endosymbiosis, plastids, tertiary, secondary, gene transfer, genome reduction, genome expansion, introns, non-canonical, non-coding DNA, methylation, diatom, gene loss, plastid replacement.

TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Acknowledgements
Dedication

CHAPTER I Introduction
1.1 Morphology, Ecology, and Lifestyle of Dinoflagellates
1.2 Biology of Dinoflagellates
1.3 Organelle Genomes in Dinoflagellates
1.4 Dinoflagellate Genomes
1.5 Expansion and Reduction of Eukaryotic Genomes
1.6 Investigating the Dinoflagellate Genome
Literature Cited

CHAPTER II Organelle Integration in Kryptoperidinium foliaceum
2.1 Introduction
2.2 Materials and Methods
2.3 Results and Discussion
Literature Cited

CHAPTER III Intron Distribution in Dinoflagellates
3.1 Introduction
3.2 Materials and Methods
3.3 Results and Discussion
Literature Cited

CHAPTER IV A Genome Survey of Heterocapsa triquetra
4.1 Introduction
4.2 Materials and Methods
4.3 Results and Discussion
Literature Cited

CHAPTER V Conclusion
Literature Cited

List of Tables

1.1 Dinoflagellates and toxins associated with Harmful Algal Blooms (HAB)
1.2 Genome sizes relative to the 3 Gb human genome
2.1 AT content of dinoflagellate and diatom protein coding genes
3.1 Eukaryotic non-canonical introns
4.1 Eukaryotic genome sizes

List of Figures

1.1 Morphological Diversity in Benthic Dinoflagellates
1.2 Plastid Evolution and Endosymbiosis
2.1 Alpha-tubulin Maximum Likelihood (ML) phylogeny
2.2 Beta-tubulin ML phylogeny
2.3 Actin ML phylogeny
2.3 HSP90 ML phylogeny
4.1 Size Distribution of Heterocapsa triquetra Genomic DNA Fragments
4.2 GC Content of Heterocapsa triquetra Genomic DNA Fragments

Acknowledgements

Many thanks are due to the friends, colleagues, and peers who have been such a pleasure to work and learn with. My supervisor, Patrick Keeling, has been encouraging and supportive from the very beginning.
Patrick, you have been a source of scientific energy and inspiration since the very first protistology lesson. I have learned so much these past few years, and I thank you for the opportunities, support and challenges you have given me.

My committee members Naomi Fast, Max Taylor, and Sally Otto have been excellent sources of information and guidance. I have appreciated your honest answers, helpful hints and happy laughter at committee meetings! You have all taught me a lot about science and what it means to be a scientist. I also thank the members of my examining committee, Jim Berger and Naomi Fast, for their time and effort in preparation for my defense.

Hugh Brock, the Director of the Genetics Graduate Program, has been the consummate go-to guy. Hugh, I always felt like you were willing to lend an ear no matter how much you had going on. I won't forget the times you went to bat for me over various roadblocks and bottlenecks. Your realistic perspective and calm demeanor have been refreshing and helpful. Thank you.

I thank my lab mates and fellow grad students in the Botany department for giving me a home, sharing beers, coffee, stories, coffee, techniques, coffee, and late nights at the lab. I've enjoyed many excellent discussions on the finer points of PCR voodoo and several skillful versions of the elusive PCR dance that I may never forget.

My students and mentees are reminders of the excitement of learning new things. They have helped me keep perspective on where I've been and how far I've come.

And last but never least, my friends and cohorts, Jamie Pighin and Robin Young. You always have interesting advice to give and good stories to share. You make being inside all day pretty fun.

To my parents and sisters
For your love and support from the very beginning.

To my partner Tim
For sharing discussions, laughs, adventures and challenges. Thank you for helping me maintain balance and for reminding me of the things that are most important in life.

Co-Authorship Statement

All work contained in this thesis was done by me except where noted. The contents of Chapter 2 have been published in a manuscript which was prepared with assistance from Patrick Keeling: McEwan, M.L. & Keeling, P.J. 2004. "HSP90, Tubulin and Actin are Retained in the Tertiary Endosymbiont Genome of Kryptoperidinium foliaceum." The Journal of Eukaryotic Microbiology, 51(6), 651-659. The content of Chapter 3 is all my own work, but work by Raheel Humayun on the same library will be included in the final manuscript submission.

CHAPTER I: INTRODUCTION

1.1 Morphology, Ecology and Lifestyle of Dinoflagellates

Dinoflagellates (Dinozoa) are one of three protist phyla in the eukaryotic group Alveolata, whose members are characterized by the presence of alveoli beneath the cell membrane and tubular mitochondrial cristae. They are (mostly) unicellular organisms with cortical alveoli, two asymmetrical flagella and a unique nuclear arrangement (Taylor 1987). The other two alveolate phyla (Apicomplexa and Ciliophora) are composed of parasitic and heterotrophic members, respectively.

Dinoflagellates are globally widespread and are ecologically and morphologically diverse (Figure 1.1). Their habitats also span wide ranges in temperature, salinity, light, and nutrients. Members can be planktonic or benthic, fresh water or marine, parasitic (e.g., Haplozoon), or symbiotic (e.g., Symbiodinium). Photosynthesis and heterotrophy are equally represented among species, and many 'mixotrophic' species exist. Many species produce toxins that, through concentration in food chains, cause major yearly fish kills (Pfiesteria piscicida, Gambierdiscus) and render shellfish highly poisonous or lethal (Table 1.1) (Van Dolah 2000).
Dinoflagellates are major photosynthetic producers in the world's oceans, and the mutually beneficial symbiosis between corals and the dinoflagellate genus Symbiodinium accounts for most of the photosynthetic production in the world's coral reefs (Knowlton and Rohwer 2003).

The classic external dinoflagellate morphology consists of an epicone (epitheca), hypocone (hypotheca), and two flagella: one (transverse) that emerges ventrally and spirals counterclockwise along an equatorial groove called the girdle, and one (longitudinal) that extends posteriorly from the same origin, along a longitudinal groove called the sulcus. The transverse flagellum supplies forward propulsion and the sulcal flagellum acts as a rudder. The resulting motion is a forward twirling trajectory; the prefix dino- means "whirling" in Greek. I have used H. triquetra, K. micrum and K. foliaceum as subjects in this research, and they all have many of the classic features of armored dinoflagellates (Dodge 1985; Taylor 1987).

Figure 1.1: Morphological Diversity in Benthic Dinoflagellates. A range of marine, sand-dwelling dinoflagellates. These are some of the less elaborate forms, yet the diversity is still astonishingly broad. Chromosomes are visible in most of these images as small clear 'bubbly' forms. (Image courtesy of Mona Hoppenrath)

Table 1.1: Dinoflagellates and toxins associated with Harmful Algal Blooms (HAB) (information in this table from (Van Dolah 2000))

Toxin | Condition | Abbreviation | Associated dinoflagellate(s)
azaspiracids | azaspiracid shellfish poisoning | AZP | Protoperidinium crassipes
ciguatoxin | ciguatera fish poisoning | CFP | Gambierdiscus toxicus
diarrhetic shellfish toxins | diarrhetic shellfish poisoning | DSP | Dinophysis, Prorocentrum
brevetoxin | neurotoxic shellfish poisoning | NSP | Karenia brevis
saxitoxin | paralytic shellfish poisoning | PSP | Gymnodinium catenatum, Pyrodinium bahamense, Alexandrium
? | possible estuary associated syndrome | | ?

1.2 Biology of Dinoflagellates

Dinoflagellates make an impact on our lives because of their beautiful displays of phosphorescence, their key role in the building and maintenance of coral reefs, and stunning (visually and literally) seasonal harmful algal blooms. Incredibly, these characteristics are just the beginning of the many exceptional aspects of dinoflagellate biology. When we look closely at the biology of dinoflagellates, a unique picture of eukaryotic innovation appears.

Dinoflagellates are exceptional among eukaryotes in several ways (Rizzo 2003). First, they have massive amounts of DNA in their nuclei (as much as 60x the human genome - Table 1.2), which is easily visible with light microscopy as large condensed chromosomes. Second, they manage to organize this DNA without the benefit of histones or nucleosomes of any recognizable sort, keeping it condensed throughout the entire cell cycle (Herzog and Soyer 1981). The chromosomes are arranged in what is most commonly described as a dense fibrillar or brushlike organization, with loops of DNA that unwind into the nucleoplasm during transcription (Moreno Diaz de la Espina, Alverca et al. 2005).

Table 1.2: Genome sizes relative to the 3 Gb human genome

Organism | Group | Relative Size
E. coli | Bacteria | ~650x smaller
Encephalitozoon intestinalis | Microsporidia | ~1300x smaller
Saccharomyces cerevisiae | Yeast | ~250x smaller
Heterocapsa pygmaea, Katodinium rotundatum | Dinoflagellates | about the same
Heterocapsa triquetra | Dinoflagellates | ~7x larger
Gonyaulax | Dinoflagellates | ~24x larger
Alexandrium tamarense | Dinoflagellates | ~30x larger
Prorocentrum micans | Dinoflagellates | ~65x larger

Table 1.2: Genome sizes from various sources. Dinoflagellate genome size was calculated from picograms DNA (LaJeunesse, Lambert et al. 2005), assuming the average base pair = 660 Da = 1.022 x 10^-9 pg. Other sizes from (Keeling, Fast et al. 2005).
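
The conversion behind the dinoflagellate entries in Table 1.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the thesis: it takes the per-base-pair mass quoted in the table note (1.022 x 10^-9 pg) at face value and assumes a human genome of roughly 3,000 Mbp as the reference. The function names and the 21 pg input are hypothetical and are not measurements reported here.

```python
# Illustrative sketch: convert a DNA mass (pg) to a genome size (Mbp).
# PG_PER_BP is the value quoted in the Table 1.2 note; HUMAN_GENOME_MBP is an
# assumed round figure used only to express sizes relative to human.

PG_PER_BP = 1.022e-9        # picograms per base pair, as stated in the table note
HUMAN_GENOME_MBP = 3000.0   # assumption: human genome of about 3,000 Mbp


def pg_to_mbp(picograms: float) -> float:
    """Return the genome size in megabase pairs for a DNA mass in picograms."""
    if picograms < 0:
        raise ValueError("DNA mass cannot be negative")
    base_pairs = picograms / PG_PER_BP
    return base_pairs / 1e6


def relative_to_human(size_mbp: float) -> float:
    """Return the genome size as a multiple of the assumed human genome size."""
    return size_mbp / HUMAN_GENOME_MBP


if __name__ == "__main__":
    example_pg = 21.0  # hypothetical DNA content, not a measured value
    size = pg_to_mbp(example_pg)
    print(f"{example_pg} pg is about {size:,.0f} Mbp, "
          f"{relative_to_human(size):.1f}x the assumed human genome")
```

Keeping the two constants at module level makes it easy to substitute a different mass-per-base-pair or reference genome size if another convention is preferred.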

Basic 'histone-like' proteins have been found in dinoflagellates but they likely play a role in transcription rather than a structural role (Sala-Rovira, Geraud et al. 1991). Third, dinoflagellates undergo 'closed' mitosis where the nuclear envelope remains intact and duplicated chromatids are separated by a microtubular spindle external to the envelope that never directly contacts the chromosomes (Figure 1.2) (Graham and Wilcox 2000; Spector and Triemer 1981). Fourth, dinoflagellates show an unparalleled propensity for plastid switching and, as a result, contain a wide variety of plastids with different pigment and chlorophyll complements (Saldarriaga, Taylor et al. 2001). Fifth, the plastid genomes themselves are quite bizarre. Many plastid genes are encoded on single-gene minicircles (Zhang, Green et al. 1999) while plastid mRNAs are modified with poly-uridine tails (Wang and Morse 2006). Sixth, dinoflagellate mitochondria are also unusual: mitochondrial gene transcripts are modified post-transcriptionally at multiple sites, including by very rare types of base modification. Finally, spliceosomal intron boundary sequences in eukaryotes are so conserved that they are referred to as 'canonical', but in dinoflagellates, of the few introns found so far, all but one have non-canonical intron boundaries (Okamoto, Liu et al. 2001; Rowan, Whitney et al. 1996; Yoshikawa, Uchida et al. 1996).

Of the seven molecular features mentioned here, I have focused on organellar genome reduction in K. foliaceum, intron composition in H. triquetra and K. micrum, and general theories of genome expansion in H. triquetra.

1.3 Organelle Genomes in Dinoflagellates

Plastids

Photosynthesis occurs in plastids; in algae these plastids can be split into three groups by their endosymbiotic origins: primary, secondary, and tertiary. Primary plastids are the result of eukaryotic engulfment of phototrophic cyanobacteria and have a double membrane.
Red and green algae, plants and glaucophytes obtained their plastids this way. Secondary plastids are the result of an engulfment of an algal cell containing a primary plastid. Dinoflagellates, Apicomplexans, Heterokonts, Haptophytes, Euglenids, Cryptophytes, and Chlorarachniophytes all contain a secondary plastid at different levels of integration. These plastids usually have four membranes (Figure 1.2).

Figure 1.2: Plastid Evolution and Endosymbiosis, from (Keeling 2004). Tertiary endosymbiosis gives rise to the dinoflagellates Dinophysis, Karenia, Kryptoperidinium, and Lepidodinium. The chromalveolates acquired their secondary plastid from a red alga.

Tertiary plastids are formed by endosymbiotic uptake of a cell that contains a secondary plastid. For example, K. foliaceum obtained a tertiary plastid by engulfing a pennate diatom (a heterokont), and K. micrum contains a haptophyte-derived tertiary plastid. Tertiary plastids can have three, four or five membranes.

Between the engulfment of an algal cell and the stage at which it becomes an integrated organelle, some genes are lost from the endosymbiont nucleus and some are transferred to the host nucleus. Usually the endosymbiont nucleus reduces this way until it is lost completely, but in Chlorarachniophytes and Cryptophytes, a tiny remnant of the secondary endosymbiont's nucleus (the nucleomorph) remains. In some dinoflagellates (K. foliaceum, Durinskia baltica, Galeidinium rugatum) the tertiary endosymbiont (now functioning as a plastid) still has a large nucleus with much genetic material (Jeffrey and Vesk 1976; Tamura, Shimada et al. 2005; Tomas and Cox 1973).

Alveolates and Chromists are derived from a single lineage that acquired a secondary plastid of red algal origin, but dinoflagellates are the only alveolates that have widely retained this algal plastid for photosynthetic purposes.
Photosynthetic dinoflagellates exhibit wide plastid diversity, due to the loss of the ancestral peridinin-containing plastid from several lineages and secondary and tertiary acquisitions of other plastid forms (Saldarriaga, Taylor et al. 2001). Saldarriaga mapped plastid characteristics to a phylogenetic tree of dinoflagellate nuclear SSU rRNAs to arrive at a proposed history of plastid loss and gain. Other evidence for plastid replacement is derived from evolutionary analysis of plastid-targeted genes. In K. micrum, not only have plastid-targeted genes from the tertiary endosymbiont nucleus been transferred to the dinoflagellate nucleus, but they co-exist there with plastid-targeted genes from the ancestral secondary plastid (Patron, Waller et al. 2006). In K. foliaceum, there is physical evidence of the ancestral plastid in the form of the eyespot: a triple membrane surrounding carotene droplets (Dodge and Crawford 1969).

Fickle plastid switching is not the only strange characteristic of dinoflagellate plastids. In five genera of peridinin-containing dinoflagellates (Amphidinium, Ceratium, Protoceratium, Symbiodinium, Heterocapsa), at least 18 plastid genes are encoded on plasmid-like single-gene 'minicircles' (Barbrook and Howe 2000; Laatsch, Zauner et al. 2004; Zhang, Green et al. 1999). The minicircle mRNA transcripts are modified with a poly-U tail. Twelve poly-U mRNAs have been found to date and evidence suggests many more exist (Wang and Morse 2006). Finally, dinoflagellate plastids use type II rubisco. All other photosynthetic eukaryotes use type I (Morse, Salois et al. 1995).

Mitochondria

Some dinoflagellate mitochondrial pre-mRNAs undergo cytidine-to-uridine (C-U) editing seen at low frequency in many plant and animal mitochondrial transcripts. However, dinoflagellate mitochondrial transcripts (mtRNAs) also undergo, at higher frequency, an even rarer U-C type of editing.
A third type, adenosine-inosine (A-I) editing, previously seen only in a mammalian nuclear gene and cytoplasmic tRNAs, also occurs in dinoflagellate mtRNAs. Remarkably, A-I editing occurs in close to 50% of edited sites in these mtRNAs (Lin, Zhang et al. 2002).

1.4 Dinoflagellate Genomes

In the past, dinoflagellate nuclei (dinokaryons) were considered so abnormal that they were thought to represent an intermediate life form between prokaryotic and eukaryotic cells, called the "mesokaryote" (Hamkalo and Rattner 1977; Spector and Triemer 1981). Histone-like proteins in dinoflagellates show more similarity to bacterial DNA-binding proteins than to eukaryotic histones (Wong, New et al. 2003) and dinoflagellates use a nuclear-encoded proteobacterial form II Rubisco instead of the cyanobacterial form I (Morse, Salois et al. 1995; Palmer 1995; Rowan, Whitney et al. 1996). With molecular phylogeny as a tool, molecular features that appear 'bacterial like' say more about potential lateral gene transfer than direct ancestry, so despite the many unusual characteristics of dinoflagellates, they are undeniably eukaryotes (Costas and Goyanes 2005).

Mitosis

During mitosis in most eukaryotes, the nuclear envelope disintegrates before spindle formation and segregation of chromosomes; in dinoflagellates, an external spindle forms outside of the nuclear envelope, which remains intact throughout segregation and cytokinesis. The external spindle attaches to chromosomes through tunnels in the fluid nuclear envelope without ever piercing it. It pulls daughter chromatids apart much like a magnet held under a sheet of paper pulls iron filings along its surface. The envelope then pinches off the daughter nuclei (Bhaud, Guillebault et al. 2000; Taylor 1987).

Chromosomes

The dinokaryon is physically very large, and is easily discernible in most cells. The most certain identifiers are the packed heterochromatin rods that remain condensed and visible at all stages of the cell cycle. Describing dinoflagellate chromosomes has posed a challenge over the years and we still lack a comprehensive model of their architecture, but these features are clear: Dinoflagellate chromatin is arranged in a series of stacked, nested arches arranged around a dense core stabilized by structural RNA and Ca2+ and Mg2+ ions (Herzog and Soyer 1983). Segments of coding DNA loop off of the main chromosome structure during transcription, and tuck back in when their genes are transcriptionally inactive (Sigee 1983). Instead of coding regions being arranged in 'chunks' along the chromosomes, as in many eukaryotic chromosomes, they exist on the periphery of the chromosomes, interspersed with basic 'histone-like' proteins (Sala-Rovira, Geraud et al. 1991) surrounding a core of highly methylated, transcriptionally inactive DNA (Herzog, Soyer et al. 1982). Stained, condensed eukaryotic chromosomes show banding patterns (karyotypes) because GC-rich coding regions stain more darkly than GC-poorer non-coding regions (Craig and Bickmore 1993), but dinoflagellate chromosomes do not show this classic banding pattern, which supports the idea that coding DNA is homogeneously arranged on the surface of its chromosomes.

DNA Content

A large-volume nucleus and large chromosomes can just as easily be attributed to a large supporting cast of proteins as to genetic material, but DAPI staining and flow cytometry confirm that there is indeed an abundance of DNA in most dinoflagellates. Nor is the amount due to polyploidy, as is often suggested; dinoflagellates are haploid (Sparrow, Price et al. 1972). Dinoflagellate genomes in the 'normal' eukaryotic size range do exist, but most dinoflagellate genomes are between 15,000 and 40,000 Mbp (4-10x the size of the human genome) (Table 1.2) and span a wide range of 1,300 Mbp to 196,000 Mbp.
There are some indications that dinoflagellate genomes may exhibit large amounts of repeated DNA, high GC content, and up to 70% of modified bases in the form of 5-hydroxymethyluracil (Herzog, Soyer et al. 1982). In eukaryotes, smaller amounts of methylation are characteristic of heterochromatin. The large proportion of 'methylated heterochromatin' and the lack of classic histone-based DNA organization in dinoflagellates suggest a structural function for methylated DNA.

1.5 Expansion and Reduction of Eukaryotic Genomes

There are several things we have learned from studying genomes that are reduced as a result of selective pressures (parasitism, for example). With parasites, the ability to derive nutrients from the host allows loss of metabolic genes, while the pressure for faster generation times can select for smaller genomes. Small genomes tend to have fewer genes and/or be more compressed. They have smaller intergenic spaces, smaller genes (at higher gene density), sometimes share regulatory regions or are co-transcribed, often have smaller and fewer introns, fewer repeated elements, fewer transposons, and tend to have high AT content.

Large genomes can show the opposite characteristics: large amounts of repeated DNA, transposons and other mobile elements, and larger regulatory regions allowing more complex transcriptional regulation of a more elaborate proteome. The genes themselves may also be longer because they contain more protein-protein interaction motifs. Larger genomes tend to have more introns, larger introns, and large intergenic spaces, so while their genomes have increased greatly in size from the addition of non-coding sequence, their proteome sizes have remained much the same.
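
Base composition, one of the genome characteristics contrasted above, is straightforward to compute for any sequenced fragment. The sketch below is a hedged illustration rather than a method taken from this thesis: the function name `base_composition` and the example fragment are hypothetical, and ambiguous characters such as N are simply ignored.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Return GC fraction, AT fraction and counted length for a DNA sequence.

    Characters other than A, C, G and T (for example N) are ignored, so
    'length' is the number of unambiguous bases, not the raw string length.
    """
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = (counts["G"] + counts["C"]) / total
    return {"gc": gc, "at": 1.0 - gc, "length": total}


if __name__ == "__main__":
    fragment = "ATGCGCGTTANNCCGGA"  # hypothetical fragment, not thesis data
    result = base_composition(fragment)
    print(f"GC = {result['gc']:.1%}, AT = {result['at']:.1%}, "
          f"bases counted = {result['length']}")
```

Reporting the counted length alongside the fractions makes it clear how many bases each percentage rests on, which matters when comparing short fragments.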

1.6 Investigating the Dinoflagellate Genome

Despite consistently strong interest in dinoflagellate molecular biology, we remain limited in our knowledge of the molecular characteristics of the dinoflagellate nuclear genome because of the enormous size of the dinokaryon. Dinoflagellate genomes are large enough to make sequencing a representative genome completely unrealistic even with modern technology and resources. Classic knowledge of eukaryotic DNA organization and processing does not apply to dinoflagellates, and basic questions about DNA compaction, transcription, gene regulation, cell division, genome replication and composition are as yet unanswered.

I was interested in learning about how dinoflagellate genomes are organized, and I approached the question in three different ways. First, I studied a dinoflagellate that, in addition to its own genome, contained four other genomes. This allowed me to learn about organization and reduction of organelle genomes. Then I looked specifically for introns in the nuclear genes, one characteristic of large genomes. Finally, for the big picture, I built a library of genomic DNA fragments from H. triquetra to get a basic understanding of the overall content of dinoflagellate genomes in the context of their size and genetic content.

LITERATURE CITED

Barbrook, A. C. and C. J. Howe (2000). Minicircular plastid DNA in the dinoflagellate Amphidinium operculatum. Mol Gen Genet 263(1): 152-158.
Bhaud, Y., D. Guillebault, et al. (2000). Morphology and behaviour of dinoflagellate chromosomes during the cell cycle and mitosis. J Cell Sci 113 (Pt 7): 1231-1239.
Costas, E. and V. Goyanes (2005). Architecture and evolution of dinoflagellate chromosomes: An enigmatic origin. Cytogenet Genome Res 109(1-3): 268-275.
Craig, J. M. and W. A. Bickmore (1993). Chromosome bands - flavours to savour. Bioessays 15(5): 349-354.
Dodge, J. D. (1985). Atlas of dinoflagellates: A scanning electron microscope survey. London, Farrand.
Dodge, J. D. and R. M. Crawford (1969). Observations on the fine structure of the eyespot and associated organelles in the dinoflagellate Glenodinium foliaceum. J Cell Sci 5(2): 479-493.
Graham, L. E. and L. W. Wilcox (2000). Algae. Upper Saddle River, NJ, Prentice Hall.
Hamkalo, B. A. and J. B. Rattner (1977). The structure of mesokaryote chromosome. Chromosoma 60(1): 39-47.
Herzog, M. and M. O. Soyer (1981). Distinctive features of dinoflagellate chromatin. Absence of nucleosomes in a primitive species Prorocentrum micans E. Eur J Cell Biol 23(2): 295-302.
Herzog, M. and M. O. Soyer (1983). The native structure of dinoflagellate chromosomes and their stabilization by Ca2+ and Mg2+ cations. Eur J Cell Biol 30(1): 33-41.
Herzog, M., M. O. Soyer, et al. (1982). A high level of thymine replacement by 5-hydroxymethyluracil in nuclear DNA of the primitive dinoflagellate Prorocentrum micans E. Eur J Cell Biol 21(2): 151-155.
Jeffrey, S. W. and M. Vesk (1976). Further evidence for a membrane-bound endosymbiont within the dinoflagellate Peridinium foliaceum. J Phycol 12: 450-455.
Keeling, P. J. (2004). Diversity and evolutionary history of plastids and their hosts. American Journal of Botany 91: 1481-1493.
Keeling, P. J., N. M. Fast, et al. (2005). Comparative genomics of microsporidia. Folia Parasitol (Praha) 52(1-2): 8-14.
Knowlton, N. and F. Rohwer (2003). Multispecies microbial mutualisms on coral reefs: The host as a habitat. Am Nat 162(4 Suppl): S51-62.
Laatsch, T., S. Zauner, et al. (2004). Plastid-derived single gene minicircles of the dinoflagellate Ceratium horridum are localized in the nucleus. Mol Biol Evol 21(7): 1318-1322.
LaJeunesse, T. C., G. Lambert, et al. (2005). Symbiodinium (Pyrrhophyta) genome sizes (DNA content) are smallest among dinoflagellates. Journal of Phycology 41: 880-886.
Lin, S., H. Zhang, et al. (2002). Widespread and extensive editing of mitochondrial mRNAs in dinoflagellates. J Mol Biol 320(4): 727-739.
Moreno Diaz de la Espina, S., E. Alverca, et al. (2005). Organization of the genome and gene expression in a nuclear environment lacking histones and nucleosomes: The amazing dinoflagellates. Eur J Cell Biol 84(2-3): 137-149.
Morse, D., P. Salois, et al. (1995). A nuclear-encoded form II Rubisco in dinoflagellates. Science 268(5217): 1622-1624.
Okamoto, O. K., L. Liu, et al. (2001). Members of a dinoflagellate luciferase gene family differ in synonymous substitution rates. Biochemistry 40(51): 15862-15868.
Palmer, J. D. (1995). Rubisco rules fall; gene transfer triumphs. Bioessays 17(12): 1005-1008.
Patron, N. J., R. F. Waller, et al. (2006). A tertiary plastid uses genes from two endosymbionts. J Mol Biol 357(5): 1373-1382.
Rizzo, P. J. (2003). Those amazing dinoflagellate chromosomes. Cell Res 13(4): 215-217.
Rowan, R., S. M. Whitney, et al. (1996). Rubisco in marine symbiotic dinoflagellates: Form II enzymes in eukaryotic oxygenic phototrophs encoded by a nuclear multigene family. Plant Cell 8(3): 539-553.
Sala-Rovira, M., M. L. Geraud, et al. (1991). Molecular cloning and immunolocalization of two variants of the major basic nuclear protein (HCc) from the histone-less eukaryote Crypthecodinium cohnii (Pyrrhophyta). Chromosoma 100(8): 510-518.
Saldarriaga, J. F., F. J. R. Taylor, et al. (2001). Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple plastid losses and replacements. Journal of Molecular Evolution 53: 204-213.
Sigee, D. C. (1983). Structural DNA and genetically active DNA in dinoflagellate chromosomes. Biosystems 16(3-4): 203-210.
Sparrow, A. H., H. J. Price, et al. (1972). A survey of DNA content per cell and per chromosome of prokaryotic and eukaryotic organisms: Some evolutionary considerations. Brookhaven Symp Biol 23: 451-494.
Spector, D. L. and R. E. Triemer (1981). Chromosome structure and mitosis in the dinoflagellates: An ultrastructural approach to an evolutionary problem. Biosystems 14(3-4): 289-298.
Tamura, M., S. Shimada, et al. (2005). Galeidinium rugatum gen. et sp. nov. (Dinophyceae), a new coccoid dinoflagellate with a diatom endosymbiont. Journal of Phycology 41: 658-671.
Taylor, F. J. R. (1987). The biology of dinoflagellates. Oxford, Blackwell Scientific.
Tomas, R. and E. R. Cox (1973). Observations on the symbiosis of Peridinium balticum and its intracellular alga I: Ultrastructure. J Phycol 9: 304-323.
Van Dolah, F. M. (2000). Marine algal toxins: Origins, health effects, and their increased occurrence. Environ Health Perspect 108 Suppl 1: 133-141.
Wang, Y. and D. Morse (2006). Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium. Nucleic Acids Res 34(2): 613-619.
Wong, J. T., D. C. New, et al. (2003). Histone-like proteins of the dinoflagellate Crypthecodinium cohnii have homologies to bacterial DNA-binding proteins. Eukaryot Cell 2(3): 646-650.
Yoshikawa, T., A. Uchida, et al. (1996). There are 4 introns in the gene coding the DNA-binding protein HCc of Crypthecodinium cohnii (Dinophyceae). Fisheries Sci 62(2): 204-209.
Zhang, Z., B. R. Green, et al. (1999). Single gene circles in dinoflagellate chloroplast genomes. Nature 400(6740): 155-159.

CHAPTER II: ORGANELLE INTEGRATION IN KRYPTOPERIDINIUM FOLIACEUM

2.1 Introduction*

Most photosynthetic dinoflagellates have a peridinin-containing plastid derived from a secondary endosymbiosis with a red alga, but some have replaced this plastid with another type. Lepidodinium has undergone what has been termed a serial secondary endosymbiosis by acquiring a green algal plastid (Watanabe, Suda et al. 1990; Watanabe, Takeda et al. 1987).
At least three groups (Schnepf and Elbrachter 1999) have acquired tertiary plastids by forming an endosymbiotic partnership with other secondary algae: 1) Dinophysis has a cryptophyte-derived plastid; 2) Karenia and Karlodinium have haptophyte-derived plastids; 3) Kryptoperidinium and Durinskia have a diatom-derived plastid (Chesnick, Kooistra et al. 1997; Chesnick, Morden et al. 1996; Hewes, Mitchell et al. 1998; Schnepf and Elbrachter 1988; Tangen and Bjørnland 1981; Tengs, Dahlberg et al. 2000). Kryptoperidinium foliaceum (previously placed in the genera Glenodinium and Peridinium) was originally noted for its dual nuclei, and then for containing the typical chrysophyte pigment fucoxanthin (Dodge 1971; Withers and Haxo 1975). Morphological observations and phylogenetic analyses of the large and small subunits of plastid ribulose-1,5-bisphosphate carboxylase (rbcL and rbcS) demonstrated the existence of a membrane-bound endosymbiont and suggested it was of diatom origin (Jeffrey and Vesk 1976; Kite and Dodge 1985; Chesnick, Morden et al. 1996). Subsequently, small subunit ribosomal RNA (SSU rRNA) phylogenies placed the endosymbiont ancestrally among the pennate diatoms (Chesnick, Kooistra et al. 1997). A closely related species, Durinskia baltica, has also been shown to contain such a plastid (as Peridinium balticum; Tomas and Cox 1973). The endosymbiosis appears to be permanent and obligatory, and these two species are now thought to be the product of a single endosymbiosis (Inagaki, Dacks et al. 2000).

* A version of this chapter has been published. McEwan, M. L. & Keeling, P. J. 2004. "HSP90, Tubulin and Actin are Retained in the Tertiary Endosymbiont Genome of Kryptoperidinium foliaceum," The Journal of Eukaryotic Microbiology, 51(6), 651-659.

Interestingly, K.
foliaceum has retained a tri-membrane, carotene-containing eyespot organelle that has been interpreted as a relict of the original, tri-membrane peridinin-containing plastid (Dodge 1983; Dodge and Crawford 1969). Unfortunately, there are few molecular data from either the host or the symbiont of K. foliaceum. Yet it is a remarkable system in which to study the physical and genetic reduction characteristic of endosymbionts, because it appears to be in the early stages of this process. While the endosymbiont appears to be obligately intracellular, it is far less reduced than the endosymbionts of either cryptophytes or chlorarachniophytes (Fine and Loeblich 1976; Jeffrey and Vesk 1976; Morrill and Loeblich 1977). The K. foliaceum endosymbiont has lost motility and its distinctive diatom wall, and the endosymbiont nuclear genome of its sister species, D. baltica, appears to divide amitotically (Tippit and Pickett-Heaps 1976). Nevertheless, the symbiont has retained a comparatively large cytoplasmic space and nucleus. It is unclear whether the nucleus should be referred to as a nucleomorph, so for the present it will be called the endosymbiont nucleus. Mitochondria, which have been lost in all other plastid-endosymbionts, are retained (Kite, Rothschild et al. 1988; Rizzo and Cox 1976; 1977). K. foliaceum is accordingly quite complex at the genome level: it contains two nuclear genomes, two mitochondrial genomes, and at least one plastid genome. Moreover, the diatom endosymbiont apparently resides in the host cytoplasm (since there is only one membrane separating the host and endosymbiont nuclei), in contrast to the cryptophyte and chlorarachniophyte endosymbionts, which reside in the host endomembrane system (Eschbach, Speth et al. 1990; Gilson 2001). These differences and the relative rarity of such endosymbiotic events make K. foliaceum an interesting point of comparison with the much better studied cryptophytes and chlorarachniophytes.
I have sequenced the first protein-coding genes from the nucleus of the endosymbiont, together with homologues from the host genome and from other diatoms. I examined four genes with interesting functional implications and varying presence or absence in cryptophyte and chlorarachniophyte nucleomorph genomes: heat shock protein 90 (HSP90, which is present in both nucleomorphs), alpha- and beta-tubulin (which are present in cryptophytes but apparently not in chlorarachniophytes), and actin (which is present in neither nucleomorph).

2.2 Materials and Methods

DNA isolation, amplification and sequencing: Cultures of Kryptoperidinium foliaceum (Center for Culture of Marine Phytoplankton, CCMP 1326) and Nitzschia thermalis (Canadian Centre for the Culture of Microorganisms, CCCM 608) were maintained in f/2-Si medium on a 13/11 h light/dark cycle. K. foliaceum was kept at room temperature (23-25 °C); N. thermalis at 16 °C. Cells were harvested by centrifugation and DNA was purified using the DNeasy Plant DNA isolation kit (Qiagen, Mississauga, ON). DNA from Phaeodactylum tricornutum (CCMP 630) was kindly provided by J. T. Harper. Alpha-tubulin, beta-tubulin, HSP90, and actin genes were amplified using the following primers:

Alpha-tubulin: 5'-TCCGAATTCARGTNGGNAAYGCNGGYTGGGA-3' and 5'-CGCGCCATNCCYTCNCCNACRTACCA-3'
Beta-tubulin: 5'-GCCTGCAGGNCARTGYGGNAAYCA-3' or 5'-TCCTCGAGTRAAYTCCATYTCRTCCAT-3' and 5'-CAGGTCGGTCARTGYGGNAA-3'
Beta-tubulin (diatom-specific): 5'-ATBGCKGCNGCMGTNTGYGGNCATA-3' and 5'-CCACGTCTCCTGSACRGCVGTGGT-3'
HSP90: 5'-GTCAAGCAYTTYWSNGTNGARGGNCA-3', 5'-GGAGCCTGATHAAYACNTTYTA-3', and 5'-GTCCCGCAGNGCYTGNGCYTTCATDAT-3'
Actin: 5'-GAGAAGATGACNCARATHATGTTYGA-3' and 5'-GGCCTGGAARCAYTTNCGRTGNAC-3'

PCR products were cloned using the pCR2.1 TOPO cloning kit (Invitrogen, Burlington, ON), and both strands were sequenced using BigDye terminator chemistry. New sequences have been deposited in GenBank under accession numbers AY713387-AY713398.

Phylogenetic analysis: Conceptual translations were added to existing amino acid alignments (Keeling and Leander 2003; Saldarriaga, McEwan et al. 2003). Thalassiosira pseudonana (CCMP 1335) alpha- and beta-tubulin, actin, and HSP90 sequences were assembled from genome sequence data produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/). Alignments consisted of 50, 47, 54, and 44 sequences, and 375, 363, 505, and 206 unambiguously aligned sites for alpha-tubulin, beta-tubulin, HSP90, and actin, respectively (available upon request). Phylogenetic trees were inferred for all four individual gene data sets using distance and maximum likelihood. TREE-PUZZLE 5.0 (Schmidt, Strimmer et al. 2002) was used to calculate a distance matrix under the Whelan and Goldman (WAG) model of substitution frequencies, and each was corrected for site-to-site rate variation using a discrete gamma distribution with eight variable rate categories plus one invariable category. The amino acid frequencies, proportion of invariable sites, and shape parameter alpha were estimated from the data with TREE-PUZZLE 5.0. Alpha parameters (a) and proportion of invariable sites (i) for alpha-tubulin, beta-tubulin, HSP90, and actin were a=0.63, 0.44, 0.85, 0.47 and i=0.25, 0.00, 0.09, 0.11, respectively.
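The rate model described above (invariable sites plus a discrete gamma distribution) can be sketched in one common parameterisation as follows. This is the standard textbook formulation, not a transcription of TREE-PUZZLE's internals; the symbols r, k, r_j, and L are introduced here for illustration only, while \alpha and i stand for the shape parameter (a) and the proportion of invariable sites (i) reported above.

```latex
% Site rate r under a gamma-plus-invariable-sites model (illustrative notation)
r = 0 \quad \text{with probability } i,
\qquad
r \sim \mathrm{Gamma}(\text{shape} = \alpha,\ \text{rate} = \alpha) \quad \text{with probability } 1 - i,

g(r;\alpha) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha - 1} e^{-\alpha r},
\qquad
\mathbb{E}[\, r \mid r > 0 \,] = 1 .

% Discrete approximation: k equal-probability categories with representative rates r_1, ..., r_k
L_{\text{site}} = i \, L(r = 0) + \frac{1 - i}{k} \sum_{j=1}^{k} L(r_j)
```

With k = 8 this corresponds to the eight variable categories plus one invariable category used for the distance matrices. Shape values below 1, such as the 0.44 estimated for beta-tubulin, give an L-shaped distribution in which most variable sites change slowly and a few change quickly.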
Distance trees were inferred from each distance matrix by weighted neighbour-joining (WNJ, Weighbor 1.0.1) (Bruno, Socci et al. 2000) and Fitch-Margoliash (FITCH, Fitch 3.572) (Felsenstein 1997). For all four data sets, 100 (Fitch-Margoliash) or 500 (weighted neighbour-joining) bootstrap replicates were analysed using PUZZLEBOOT (A. Roger, M. Holder, http://www.tree-puzzle.de) with the alpha shape parameter and the proportion of invariable sites from the original data. Protein maximum-likelihood (ML) trees were inferred for all four data sets using ProML 3.6 (Felsenstein 1997) under the Jones, Taylor, and Thornton (JTT) substitution frequency matrix with global rearrangements and two input order jumbles. Site-to-site rate variation was modeled on a gamma distribution with four variable rate categories and one invariable category. Rates and frequencies were estimated using TREE-PUZZLE 5.0. Maximum-likelihood bootstrapping was performed with 100 replicates, only one category of sites, global rearrangements, and one jumble.

2.3 Results and Discussion

Host and endosymbiont nuclear-encoded protein coding genes from Kryptoperidinium foliaceum: Altogether, eight new homologues of alpha-tubulin, beta-tubulin, actin, and HSP90 were characterised from K. foliaceum, as well as alpha-tubulin, actin, and HSP90 from the pennate diatom N. thermalis and actin from P. tricornutum. The phylogenies of these four genes were constructed to distinguish homologues from the host and endosymbiont nuclear genomes of K. foliaceum. The overall characteristics of these phylogenies resembled those seen in previous analyses (Edgcomb, Roger et al. 2001; Keeling and Leander 2003; Saldarriaga, McEwan et al. 2003; Stechmann and Cavalier-Smith 2003), and most importantly the dinoflagellates and diatoms consistently formed strongly supported clades. Three distinct alpha-tubulin genes were amplified from K.
foliaceum, two of which were highly similar at the amino acid level. In the alpha-tubulin phylogeny (Figure 2.1), the two similar K. foliaceum sequences branched strongly within the dinoflagellate clade (97–100%), while the third gene branched with moderate to strong support in the diatom clade (63–96%), specifically with the pennate diatom N. thermalis (97–100%). This pattern is precisely that predicted for a host genome origin for the two similar genes and an endosymbiont genome origin for the third. The beta-tubulin phylogeny included two similar K. foliaceum genes (Figure 2.2) and showed both sequences branching with strong support in the dinoflagellate clade (90–98%), to the exclusion of Oxyrrhis and all other species. In this case, both of these sequences are predicted to have originated in the host genome. Given the presence of an endosymbiont-derived alpha-tubulin, however, it is likely that an endosymbiont-derived beta-tubulin does exist but is too divergent to amplify, despite much effort and multiple primer sets. This notion is supported by the slightly divergent nature of the pennate diatom alpha-tubulins (since the two tubulins often co-evolve), the divergent nature of the centric diatom beta-tubulins (which, together with brown algae, do not branch with oomycete heterokonts), and the failure to amplify beta-tubulin from either N. thermalis or the K. foliaceum endosymbiont with universal primers, heterokont-specific primers, or any combination of the two. A single K. foliaceum actin sequence was characterised. In actin phylogenies (Figure 2.3) this gene grouped within a moderately supported heterokont clade (70–83%), specifically within the strongly supported diatom subgroup (97–100%). Within the diatoms, the centric and pennate types do not form discrete clades, but the K. foliaceum sequence does branch weakly with that of the pennate N. thermalis, altogether strongly supporting an endosymbiont origin for this gene.
Lastly, two HSP90 genes were characterised from K. foliaceum. In the HSP90 phylogeny (Figure 2.4), these branched with strong support within the dinoflagellate (94–100%) and diatom (94–100%) clades, respectively. Once again, this is indicative of a host and endosymbiont origin for these genes. Overall, the HSP90 phylogeny is the most well supported of the four analysed and the most consistent with well established relationships: dinoflagellates were sisters to apicomplexans, and the alveolate clade including ciliates was resolved with high bootstrap support. Similarly, the second HSP90 copy showed a strong affinity to the diatoms (although to neither centric nor pennate forms), which grouped with other heterokonts.

Figure 2.1: Alpha-tubulin maximum likelihood (ML) phylogeny. Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. Three K. foliaceum genes are shown in black boxes. Two of these group strongly within dinoflagellates, and the third groups strongly with the pennate diatoms.

Figure 2.2: Beta-tubulin ML phylogeny. Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. Both Kryptoperidinium foliaceum sequences appear to be host-derived, grouping strongly within the dinoflagellates.

Figure 2.3: Actin ML phylogeny. Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. Apicomplexans, dinoflagellates, and heterokonts are strongly supported in this analysis, and their positions are in accordance with well-established relationships. Ciliates were excluded from this analysis as they are highly divergent and polyphyletic in actin (Keeling 2001). The Kryptoperidinium foliaceum sequence groups strongly within the heterokonts.

Figure 2.4: HSP90 ML phylogeny.
Bootstrap values shown for weighted neighbor-joining (left, top), Fitch-Margoliash (centre), and maximum likelihood (right, bottom). Major lineages are boxed and labeled to the right. One Kryptoperidinium foliaceum sequence shows affinity to dinoflagellates, the other to diatoms.

In summary, the phylogenies support the conclusion that K. foliaceum possesses alpha-tubulin and HSP90 genes of both host and endosymbiont origin, as well as at least host beta-tubulin and endosymbiont actin. In analyses including both centric (Thalassiosira) and pennate (Nitzschia and Phaeodactylum) diatoms, endosymbiont-derived K. foliaceum genes showed an affinity to the pennate diatoms over the centric forms in alpha-tubulin and actin, and no affinity to either in HSP90. Overall, this is in agreement with previous molecular studies based on endosymbiont nuclear SSU rRNA, which suggested a pennate diatom ancestry for the K. foliaceum endosymbiont (Chesnick, Kooistra et al. 1997).

Structural features of host- and endosymbiont-derived genes in Kryptoperidinium foliaceum: The phylogenetic history of a gene provides a strong inference for its location in the cell, but it remains possible that endosymbiont-derived genes are encoded in the host nuclear genome, and vice versa. Without physically localizing a gene, the genome in which it is encoded will always be in some doubt, but certain characteristics have been used successfully, together with phylogenetic history, to provide a very accurate prediction. Cryptophyte and chlorarachniophyte nucleomorph genes exhibit several physical characteristics that distinguish them from host nuclear genes. Chlorarachniophyte nucleomorph genes have minute, 18- to 20-bp introns that readily distinguish them from host nuclear genes (whose introns are an estimated 168 bp on average) (Gilson and McFadden 2002).
The difference between cryptophyte nuclear (50- to 74-bp) and nucleomorph (42- to 52-bp) introns is not as pronounced, but is still somewhat useful (Douglas, Zauner et al. 2001). Chlorarachniophyte and cryptophyte nucleomorph genes also exhibit a strong AT-bias, so that nucleomorph and nuclear genes can be as much as 25% different in overall AT-content (Douglas, Zauner et al. 2001; Gilson and McFadden 2002). Unfortunately, none of the K. foliaceum genes characterised here contained introns, but the AT-content of each K. foliaceum sequence was calculated to determine if there was a bias between dinoflagellate-derived and diatom-derived genes (Table 2.1). The mean AT content of the five dinoflagellate-identified genes is 42.5%, ranging from 39.2–46.6%. The three diatom-identified sequences have a mean AT content of 48.8% (47.0–51.6%). Accordingly, there is an AT-bias of approximately 6% in the diatom-derived genes, and no overlap in the ranges of AT-content between the two sets of genes. This bias supports the conclusion that the two phylogenetic classes of genes in K. foliaceum likely reside in different genomes: the dinoflagellate genes in the host nucleus and the diatom genes in the endosymbiont nucleus. While the 'total-nucleotides' AT bias (6%) may be less than that observed in cryptophyte and chlorarachniophyte genomes, AT bias in 'third-position-only' shows a larger separation between host and endosymbiont genes (18%). The diatom-identified genes from K. foliaceum have a mean third-position AT content of 36.7%, ranging from 35.2–38.8%. In contrast, the mean AT content at third position of the dinoflagellate-identified genes is 18.9%, ranging from 11.5–23.6% (for a difference of approximately 18% between host and endosymbiont genes). A codon bias analysis was also done, but no clear trends were observed.
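The two AT measures compared above can be computed directly from a coding sequence. The following is a minimal Python sketch, assuming an ungapped, in-frame coding sequence supplied as a plain string; the function names and the toy sequence are hypothetical and are not taken from the thesis.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def third_position_at_content(cds: str, frame: int = 0) -> float:
    """AT percentage at third codon positions.

    Assumes cds is ungapped and that the first codon starts at index `frame`.
    Positions are selected before ambiguous bases are filtered, so an
    ambiguity code does not shift the reading frame.
    """
    thirds = cds.upper()[frame + 2::3]
    return at_content(thirds)


if __name__ == "__main__":
    toy = "ATGGCCAAGTTT"  # hypothetical 4-codon sequence: ATG GCC AAG TTT
    print(f"total AT: {at_content(toy):.1f}%")                  # 7 of 12 bases
    print(f"3rd-position AT: {third_position_at_content(toy):.1f}%")  # 1 of 4 bases
```

Because most changes at third codon positions are synonymous, compositional bias tends to be more visible there than in the total AT content, which is consistent with the larger host/endosymbiont separation reported for third positions.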
Codon biases and AT biases are complementary and potentially useful tools for sorting sequence data; a 6% or 18% AT bias allows for preliminary predictions about the location of genes encoded in K. foliaceum.

Table 2.1: AT content of dinoflagellate and diatom protein coding genes

                                            AT content (%)
Gene             Inferred source            Total     3rd position
Dinoflagellates
HSP90            Host                       46.62     17.79
Beta-tubulin     Host                       42.93     23.65
Beta-tubulin     Host                       43.10     23.39
Alpha-tubulin    Host                       39.15     11.46
Alpha-tubulin    Host                       40.88     17.19
Alpha-tubulin    Endosymbiont               46.96     38.80
Actin            Endosymbiont               47.81     35.25
HSP90            Endosymbiont               51.63     36.00

Diatoms          Source
Alpha-tubulin    Nitzschia thermalis 1      47.26     38.99
Alpha-tubulin    Nitzschia thermalis 2      46.15     35.44
Actin            Nitzschia thermalis        48.91     40.16
Actin            Phaeodactylum              46.17     32.79

Functional implications of endosymbiont genes: The K. foliaceum plastid endosymbiont presents a unique opportunity to study the early stages of endosymbiotic genome reduction from a functional perspective. Based on the pennate diatom identity of the endosymbiont and its current wall-less, immobile, amitotic state, one would expect that the endosymbiont genome has lost many genes, especially those related to wall structure and deposition, motility, and mitotic nuclear division. However, compared to the nucleomorphs of cryptophytes and chlorarachniophytes, the K. foliaceum endosymbiont nucleus appears to be at a relatively early stage of reduction, which is consistent with our observations that it retains genes that have been lost by nucleomorphs. The presence of HSP90 is the least surprising of the four genes and does not give much insight into endosymbiont reduction, since both cryptophyte (Douglas, Zauner et al. 2001) and chlorarachniophyte (unpublished data) nucleomorphs also retain HSP90 genes in their genomes. HSP90 plays a wide variety of roles in the eukaryotic cytoplasm, including as a chaperone in protein folding, and because of this, it is probably one of the last proteins lost in a degrading endosymbiont.
Actin and the tubulins, on the other hand, are not always retained, and their presence in K. foliaceum is of more interest. Alpha- and beta-tubulin are component parts of microtubules, whose activity is most apparent in the flagella, mitotic spindle, and cytoskeleton. Actin is also a prominent protein in the cytoskeleton and it plays a central role in the gliding motility of diatoms. Alpha-, beta-, and gamma-tubulins are present in the nucleomorph of the cryptophyte G. theta (Keeling, Deane et al. 1999), but are apparently absent from chlorarachniophyte endosymbionts (unpublished data). Actin, on the other hand, appears to be absent from both endosymbionts (although an actin gene of possible endosymbiont origin has been found in the host nucleus of the cryptophyte Pyrenomonas helgolandii; Stibitz, Keeling et al. 2000). In the K. foliaceum endosymbiont, while there is no gliding motility, cell wall or frustule, and presumably no mitosis, both alpha-tubulin and actin remain. The presence of actin is especially interesting because it has been lost in both cryptophyte and chlorarachniophyte nucleomorphs, and because it is a major component of diatom motility (Poulsen, Spector et al. 1999). Its possible function in the K. foliaceum endosymbiont is not clear, but its presence hints that the general cytoskeleton of the K. foliaceum endosymbiont is less reduced than those of either cryptophytes or chlorarachniophytes. The endosymbionts of K. foliaceum and D. baltica and their division have been extensively studied by electron microscopy, and no direct evidence of microtubules has been found during division or growth phases (Tippit and Pickett-Heaps 1976). Microtubules were also notably absent in and around the endosymbiont nuclei during sexual reproduction in D. baltica (Chesnick and Cox 1987). In G.
theta, where tubulins are also present despite a lack of observed microtubules, it has been suggested that they may fulfill some alternate biological role that does not require microtubules (Keeling, Deane et al. 1999). Alternatively, both K. foliaceum and G. theta endosymbionts may contain microtubules that are highly specialized and appear only for short periods of the life cycle, or in restricted numbers and size, making them very difficult to detect. In D. baltica, for example, chromatin condensation and 'crystalline rod' formation in the endosymbiont nucleus were observed after sexual fusion of both host and symbiont cells (as P. balticum; Chesnick and Cox 1987). The absence of tubulins from the chlorarachniophyte nucleomorph genome leaves open the possibility that microtubules can be discarded long before the complete reduction of the endosymbiont.

Implications of genetic reduction in Kryptoperidinium foliaceum: While the K. foliaceum endosymbiont has undergone some reduction (i.e. loss of cell wall and presumed amitosis), the question remains as to whether it has undergone any genetic reduction at all. In other integrated endosymbiotic systems, genetic reduction has partially or completely occurred through some combination of gene loss and transfer, where the products of transferred genes are targeted back to the appropriate compartment. Transfers of plastid-targeted genes and the mechanism by which their products are targeted to the organelle are well known in both primary and secondary plastids (McFadden 1999). In contrast, gene transfer and plastid targeting in tertiary plastids are relatively unknown, and K. foliaceum presents a unique case for targeting: the endosymbiont is not contained in the endomembrane system of the host; instead, freeze-fracture microscopy suggests that the single membrane that separates the symbiont from the host cytoplasm is actually derived from the outer membrane of the symbiont itself (Eschbach, Speth et al. 1990).
If this is true, the product of any gene that is transferred to the host nucleus must first return to the endosymbiont cytoplasm by an entirely unique method of targeting, perhaps analogous to pinocytosis by the endosymbiont. Determining whether any such transfers to the host nucleus have occurred will potentially provide an important comparison with the better-studied secondary plastids, as it could represent an entirely novel solution to the problem of endosymbiont protein trafficking.

LITERATURE CITED

Bruno, W. J., N. D. Socci, et al. (2000). Weighted neighbor joining: A likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol 17(1): 189-197.
Chesnick, J. M. and E. R. Cox (1987). Synchronized sexuality of an algal symbiont and its dinoflagellate host, peridinium balticum (levander) lemmermann. Biosystems 21(1): 69-78.
Chesnick, J. M., W. H. Kooistra, et al. (1997). Ribosomal rna analysis indicates a benthic pennate diatom ancestry for the endosymbionts of the dinoflagellates peridinium foliaceum and peridinium balticum (pyrrhophyta). J. Eukaryot. Microbiol. 44(4): 314-320.
Chesnick, J. M., C. W. Morden, et al. (1996). Identity of the endosymbiont of peridinium foliaceum (pyrrophyta): Analysis of the rbcls operon. J. Phycol. 32(5): 850-857.
Dodge, J. D. (1971). A dinoflagellate with both a mesocaryotic and a eucaryotic nucleus. I. Fine structure of the nuclei. Protoplasma 73(2): 145-157.
Dodge, J. D. (1983). The functional and phylogenetic significance of dinoflagellate eyespots. Biosystems 16(3-4): 259-267.
Dodge, J. D. and R. M. Crawford (1969). Observations on the fine structure of the eyespot and associated organelles in the dinoflagellate glenodinium foliaceum. J. Cell. Sci. 5(2): 479-493.
Douglas, S., S. Zauner, et al. (2001). The highly reduced genome of an enslaved algal nucleus. Nature 410(6832): 1091-1096.
Edgcomb, V. P., A. J. Roger, et al. (2001).
Evolutionary relationships among "jakobid" flagellates as indicated by alpha- and beta-tubulin phylogenies. Mol. Biol. Evol. 18(4): 514-522. Eschbach, S., V. Speth, et al. (1990). Freeze-fracture study of the single membrane between host-cell and endocytobiont in the dinoflagellates glenodinium foliaceum and peridinium balticum. J. Phycol. 26(2): 324-328. Felsenstein, J. (1997). A n alternating least squares approach to inferring phylogenies from pairwise distances. Syst Biol 46(1): 101-111. 36 Fine, K. E. and A . R. Loeblich (1976). Endosymbiosis in the marine dinoflagellate kryptoperidinium foliaceum. J Protozool 23(2): A 8 . Gilson, P. R. (2001). Nucleomorph genomes: Much ado about practically nothing. Genome Biol. Reviews 2(8): 1022. Gilson, P. R. and G. I. McFadden (2002). Jam packed genomes—a preliminary, comparative analysis of nucleomorphs. Genetica 115(1): 13-28. Hewes, C. D., B. G. Mitchell, et al. (1998). The phycobilin signatures of chloroplasts from three dinoflagellate species: A microanalytical study of dinophysis caudata, d. Fortii, and d. Acuminata (dinophysiales, dinophyceae). J. Phycol. 34(6): 945-951. Inagaki, Y., J. B. Dacks, et al. (2000). Evolutionary relationship between dinoflagellates bearing obligate diatom endosymbionts: Insight into tertiary endosymbiosis. Int. J. Syst. Evol. Microbiol. 50(6): 2075-2081. Jeffrey, S. W. and M. Vesk (1976). Further evidence for a membrane-bound endosymbiont within the dinoflagellate peridinium foliaceum. J. Phycol. 12: 450-455. Keeling, P. J. (2001). Foraminifera and cercozoa are related in actin phylogeny: Two orphans find a home? Mol. Biol. Evol. 18(8): 1551-1557. Keeling, P. J., J. A . Deane, et al. (1999). The secondary endosymbiont of the cryptomonad guillardia theta contains alpha-, beta-, and gamma-tubulin genes. Mol. Biol. Evol. 16(9): 1308-1313. Keeling, P. J. and B. S. Leander (2003). Characterisation of a non-canonical genetic code in the oxymonad streblomastix strix. J. Mol. Biol. 
326(5): 1337-1349.
Kite, G. C. and J. D. Dodge (1985). Structural organization of plastid DNA in two anomalously pigmented dinoflagellates. J. Phycol. 21(1): 50-56.
Kite, G. C., L. J. Rothschild, et al. (1988). Nuclear and plastid DNAs from the binucleate dinoflagellates Glenodinium (Peridinium) foliaceum and Peridinium balticum. Biosystems 21(2): 151-163.
McFadden, G. I. (1999). Plastids and protein targeting. J. Eukaryot. Microbiol. 46(4): 339-346.
Morrill, L. C. and A. R. Loeblich (1977). Studies of photo-heterotrophy in binucleate dinoflagellate Kryptoperidinium foliaceum. J. Phycol. 13: 46.
Poulsen, N. C., I. Spector, et al. (1999). Diatom gliding is the result of an actin-myosin motility system. Cell Motil Cytoskeleton 44(1): 23-33.
Rizzo, P. J. and E. R. Cox (1976). Isolation and properties of nuclei from binucleate dinoflagellates. J. Phycol. 12: 31.
Rizzo, P. J. and E. R. Cox (1977). Histone occurrence in chromatin from Peridinium balticum, a binucleate dinoflagellate. Science 198(4323): 1258-1260.
Saldarriaga, J. F., M. L. McEwan, et al. (2003). Multiple protein phylogenies show that Oxyrrhis marina and Perkinsus marinus are early branches of the dinoflagellate lineage. Int J Syst Evol Microbiol 53(Pt 1): 355-365.
Schmidt, H. A., K. Strimmer, et al. (2002). Tree-puzzle: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18(3): 502-504.
Schnepf, E. and M. Elbrachter (1988). Cryptophycean-like double membrane-bound chloroplast in the dinoflagellate, Dinophysis Ehrenb. - evolutionary, phylogenetic and toxicological implications. Bot. Acta. 101(2): 196-203.
Schnepf, E. and M. Elbrachter (1999). Dinophyte chloroplasts and phylogeny - a review. Grana 38(2-3): 81-97.
Stechmann, A. and T. Cavalier-Smith (2003). Phylogenetic analysis of eukaryotes using heat-shock protein Hsp90. J. Mol. Evol. 57(4): 408-419.
Stibitz, T. B., P. J. Keeling, et al. (2000).
Symbiotic origin of a novel actin gene in the cryptophyte Pyrenomonas helgolandii. Mol. Biol. Evol. 17(11): 1731-1738.
Tangen, K. and T. Bjørnland (1981). Observations on pigment and morphology of Gyrodinium aureolum Hulbert, a marine dinoflagellate containing 19'-hexanoyloxyfucoxanthin as the main carotenoid. J. Plankton Res. 3: 389-401.
Tengs, T., O. J. Dahlberg, et al. (2000). Phylogenetic analyses indicate that the 19'-hexanoyloxy-fucoxanthin-containing dinoflagellates have tertiary plastids of haptophyte origin. Mol. Biol. Evol. 17(5): 718-729.
Tippit, D. H. and J. D. Pickett-Heaps (1976). Apparent amitosis in the binucleate dinoflagellate Peridinium balticum. J. Cell. Sci. 21(2): 273-289.
Tomas, R. and E. R. Cox (1973). Observations on the symbiosis of Peridinium balticum and its intracellular alga. I. Ultrastructure. J. Phycol. 9: 304-323.
Watanabe, M. M., S. Suda, et al. (1990). Lepidodinium viride gen. et sp. nov. (Gymnodiniales, Dinophyta), a green dinoflagellate with a chlorophyll a- and b-containing endosymbiont. J. Phycol. 26: 741-751.
Watanabe, M. M., Y. Takeda, et al. (1987). A green dinoflagellate with chlorophylls a and b: Morphology, fine structure of the chloroplast and chlorophyll composition. J. Phycol. 23: 382-389.
Withers, N. and F. T. Haxo (1975). Chlorophyll c1 and c2 and extraplastidic carotenoids in the dinoflagellate, Peridinium foliaceum Stein. Plant Sci. Lett. 5: 7-15.

CHAPTER III: INTRON DISTRIBUTION IN DINOFLAGELLATES

3.1 Introduction

The bulk of eukaryotic genome sequence is made up of non-coding DNA. In large genomes, not only is there more intergenic DNA, genes themselves can be larger than their small-genome counterparts. The average size of a human gene is 5795bp, but 4455bp of this is intron and UTR sequence, leaving a coding sequence that is only 23% of the gene (IHGSConsortium, Lander et al. 2001). Genes (including exons, introns, 5' and 3' UTRs) can expand in two main ways.
The first is by acquiring new coding domains, the benefit of which is to allow more complex protein-protein interactions (alternative splicing also allows this). Second, genes can expand their non-coding portions: UTRs and introns. One of the oldest theories about eukaryotic genome expansion is growth by the invasion of introns. Exactly how (and when) introns first infiltrated eukaryotic genes has been a hotly debated issue for decades, and will likely remain so for some time (recent reviews: Lynch and Richardson 2002; Roy and Gilbert 2006). However, some aspects of intron evolution are beyond debate, and the clearest of these is the observation that nearly every spliceosomal intron in any eukaryotic gene has the same boundary sequences: "canonical" GT|AG splice sites that allow efficient splicing by a spliceosome that arose in an ancient eukaryotic ancestor long before the adaptive radiation of its descendants.

Known distribution of non-canonical introns: Among the diverse eukaryotic species that exist today, there are few exceptions to this canonical intron rule. A recent survey of 22489 EST-confirmed mammalian introns in the NCBI database revealed 98.71% with canonical GT|AG boundaries (Burset, Seledtsov et al. 2000). In the Arabidopsis genome, approximately 99.45% of 26961 introns have GT|AG boundaries. Fifty of the remaining introns were of the minor U12-dependent spliceosomal type (AT|AC), and only two other non-canonical introns, AT|AA and GT|AT, were found (Zhu and Brendel 2003). Apart from these two comprehensive studies, research on the existence of non-canonical introns in other organisms is scarce. Table 3.1 summarizes the known distribution to date.

Introns in Dinoflagellates: Although introns are not abundant in dinoflagellates, sixteen have been found in six dinoflagellate genes (Table 3.1). Of these sixteen, only three are canonical.
One of these canonical introns occurs in a type II ribulose-1,5-bisphosphate carboxylase/oxygenase (rubisco) subunit (rbcG) that also contains a second, non-canonical intron. The first six non-canonical dinoflagellate introns were found in the nuclear-encoded type II rubisco large subunit (rbcL). Type II rubisco is of bacterial origin and, among eukaryotes, has only been found in dinoflagellates (Symbiodinium, Gonyaulax polyedra, also known as Lingulodinium polyedrum, H. triquetra, Prorocentrum minimum and Amphidinium carterae). All other eukaryotes use the eukaryotic type I rubisco protein (Morse, Salois et al. 1995; Patron, Waller et al. 2005; Rowan, Whitney et al. 1996; Zhang and Lin 2003). Four introns, all non-canonical, were found in the basic histone-like protein (HCc) gene in Crypthecodinium cohnii (Yoshikawa, Uchida et al. 1996). In the Pyrocystis lunula luciferase gene there is one non-canonical intron (Okamoto, Liu et al. 2001). I cloned and sequenced an actin gene from Heterocapsa rotundata which contained a 46bp GA|TG non-canonical intron. This gene (AF482409) was submitted to GenBank along with a batch of others, including two dinoflagellate genes with one canonical intron each (Saldarriaga, McEwan et al. 2003). Finally, Perkinsus marinus, which lies at the base of the dinoflagellate clade, has canonical GT|AG introns in its nuclear-encoded, mitochondrial-targeted superoxide dismutase genes sod1 (4 introns) and sod2 (5 introns). These have been compared to Toxoplasma gondii sod genes and one intron has a similar location. The P. marinus sod genes seem to be more apicomplexan-like than dinoflagellate-like (Schott, Robledo et al. 2003), which suggests that non-canonical introns appeared in the dinoflagellate lineage sometime after the common ancestor of P. marinus and higher dinoflagellates, though it is unclear whether by boundary mutation or insertion of new non-canonical introns. The collective dinoflagellate intron data show that while dinoflagellate introns are neither particularly abundant nor particularly rare, those that do exist are mostly non-canonical.

Table 3.1: Non-canonical introns in eukaryotes. Boundary sequences denote 5' then 3' dinucleotide splicing signals. The two human boundary sequences are from a stringently analysed dataset of 126 EST-confirmed non-canonical introns (the rest were AT|AC-type U12-dependent non-canonical introns). Total EST-supported Arabidopsis introns (including canonical): 26961. An asterisk (*) denotes a non-EST-confirmed intron.

Group            Species                     Gene          Boundary  # Introns (size, bp)     Reference
Plants           Arabidopsis                               AT|AA     1                        (Zhu and Brendel 2003)
                                                           GT|AT     1
Human                                                      GT|GG     1                        (Burset, Seledtsov et al. 2000)
                                                           TT|AG     1
Dinoflagellates  Crypthecodinium cohnii      HCc           AG|GC     1 (135)                  (Yoshikawa, Uchida et al. 1996)
                                                           AG|GG     1 (188)
                                                           CC|GC     1 (150)
                                                           GG|GA     1 (152)
                 Symbiodinium sp.            rbcA          GC|AG     4 (124, 163, 163, 163)   (Rowan, Whitney et al. 1996)
                                                           GA|AG     2 (211, 211)
                                             rbcG          GA|GG*    1 (456)
                 Pyrocystis lunula           luciferase    AT|TC     1 (403)                  (Okamoto, Liu et al. 2001)
                 Heterocapsa rotundata       actin         GA|TG*    1 (46)                   (Saldarriaga, McEwan et al. 2003)
Jakobid          Malawimonas jakobiformis    CCTdelta      CT|AG     1 (61)                   (Archibald, O'Kelly et al. 2002)
Diplomonad       Giardia intestinalis        ferredoxin    CT|AG     1 (35)                   (Nixon, Wang et al. 2002)
Chytrid          Karlingiomyces sp.          beta-tubulin  CT|AG     3 (52, 80, 65)           (Keeling 2003)

Research on dinoflagellate introns: Until the recent dinoflagellate EST databases aimed at characterising the dinoflagellate transcriptome (Hackett, Scheetz et al. 2005; Patron, Waller et al. 2005; Patron, Waller et al. 2006; Yoon, Hackett et al. 2005), there had been no large-scale source of gene data for dinoflagellates, and no intron-specific searches had been done in dinoflagellates. This is not very surprising when one considers the type of 'needle in a haystack' search that is required.
Screening a dinoflagellate genome library means screening, on average, 15,000-40,000 Mbp of sequence, most of it non-coding DNA.

An EST-based approach: One of the common characteristics of large genomes is an increased number of introns, or increased intron size. I was interested in investigating this possibility in dinoflagellates. I was also interested in the implications of a potentially large set of non-canonical introns. By using the H. triquetra and K. micrum EST databases, we are able to eliminate intergenic DNA from the screening process while maintaining a good sample of the total coding DNA. The basic strategy, expanded upon in the methods section below, uses exact sequences of cDNAs (representing spliced mRNA) from the H. triquetra and K. micrum EST libraries to amplify coding sequences from genomic DNA. The presence of introns increases the size of the amplified fragment in relation to the known size of the spliced mRNA sequence. This size-based test is the initial screen for potential intron-containing genes.

3.2 Materials and Methods

Library construction and sequencing: H. triquetra and K. micrum expressed sequence tag (EST) libraries were made by J. Archibald, N. Patron and R. Waller, sequenced and partially annotated (Patron, Waller et al. 2005; Patron, Waller et al. 2006).

Culture Maintenance: K. micrum (CCMP 415) and H. triquetra (CCMP 449) were maintained in f/2-Si medium at 18°C on a 16/8h light/dark cycle. Cells were subcultured approximately every 10-20 days, checked visually under a light microscope for signs of obvious bacterial contamination, and harvested periodically by centrifugation. They were either immediately used for DNA or RNA isolation, or frozen at -20°C for up to several months.

Primer design: 63 exact-match primer sets were designed to 5' and 3' ends of cDNA sequences (35 and 28 sets, respectively) from the H. triquetra and K. micrum EST library projects.
Primers were tested with a variety of online tools (NetPrimer www.premierbiosoft.com, Gene Walker www.cybergene.se) to avoid AT/GC imbalance, primer dimers and hairpins.

RNA and DNA isolation and amplification: RNA was isolated using a standard Trizol protocol, preceded by grinding in liquid nitrogen. DNA was isolated as above (Chapter 2 methods). DNA and RNA were measured for quantity and contamination with a spectrophotometer, and contaminated or low-yield samples were discarded. Samples were stored at -20°C. Primer sets were used to amplify gene fragments from DNA and RNA by PCR and RT-PCR. Fragments were size-separated by electrophoresis on a 0.8% agarose gel. This step selects for amplified fragments that could contain introns. DNA bands larger than the expected 'intron-free' size (based on comparison to parallel RNA amplification or to known cDNA sequence length including primers) were excised, cleaned using the MoBio UltraClean 2.0 kit and cloned as above (Chapter 2 methods). Following cloning, colonies with inserts were screened by colony PCR for insert size, and plasmids were isolated and sequenced as above (Chapter 2 methods). Sequences were trimmed and analysed using Sequencher 4.2.

3.3 Results and Discussion

Sixty-three (35 and 28) attempted DNA amplifications of H. triquetra and K. micrum sequences yielded 18 (13 and 5) bands that appeared larger than expected size, which were excised and cloned. Screens of 4-12 colonies from each cloning reaction (ligation) yielded 19 (11 and 8) inserts of appropriate size for sequencing. No introns, insertions, or deletions were found in any of these DNA sequences. These findings suggest that introns are less abundant in H. triquetra and K. micrum than we had expected.

Intron distribution in dinoflagellates: It is possible that introns are only present in some dinoflagellates, or that they are more common in the species listed in Table 3.1 than they are in H. triquetra and K. micrum.
Therefore, for dinoflagellates at least, it is not appropriate to speculate that because introns are present in some species, they are likely to be present in all (or even most) species.

Introns as a contributor to genome growth in dinoflagellates: The evidence also suggests that intron acquisition is not a large contributor to the expansive amounts of DNA in dinoflagellate genomes. Dinoflagellate genome expansion has likely been the result of several other processes of genome evolution (see Chapter 4).

Introns in plastid-targeted genes: Because I found no introns, I am unable to address questions regarding the proportion of canonical and non-canonical boundaries in a given dinoflagellate genome. However, I have made some interesting observations about previously known presences and absences of non-canonical introns. I note that all of the dinoflagellate genes in Table 3.1, except actin, are present in multicopy tandem repeats (Okamoto, Liu et al. 2001; Rowan, Whitney et al. 1996; Yoshikawa, Uchida et al. 1996). Several of the genes I chose to amplify from genomic DNA were plastid-targeted genes, but the amplified fragments appeared no larger than the cDNA sequences and contained no introns. Also, the non-canonical intron-containing H. rotundata actin gene is not plastid-targeted. Therefore, if introns (canonical or not) are more prevalent in nuclear-encoded, plastid-targeted dinoflagellate genes, I have found no evidence of this.

Introns in multicopy, tandemly repeated genes: Regarding a potential association of introns with genes encoded as tandem repeats, luciferase and rubisco genes from L. polyedrum are both present in multicopy tandem repeats, but neither contains introns (see Table 3.1 for those that do). P. minimum also has a multicopy tandemly repeated rubisco with no introns (Zhang and Lin 2003). This shows that loss (L. polyedrum, P. minimum) or gain (P. lunula, Symbiodinium) of introns is independent of the presence of gene repeats.
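The two checks that this chapter relies on, the size comparison between a genomic amplicon and its cDNA counterpart and the classification of intron boundaries as canonical GT|AG or not, can be sketched in a few lines of Python. This is an illustration only and was not part of the work described here: the function names, the 25bp gel-resolution tolerance, and the example intron sequence are all assumptions made for the sketch.

```python
# Illustrative sketch only: names, tolerance and example sequence are assumptions.
CANONICAL = ("GT", "AG")   # boundaries of major spliceosomal introns
U12_TYPE = ("AT", "AC")    # boundaries of minor U12-dependent introns


def boundaries(intron):
    """Return the 5' and 3' terminal dinucleotides of an intron sequence."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("need at least 4 bases to read both boundaries")
    return seq[:2], seq[-2:]


def classify_intron(intron):
    """Label an intron by its boundary dinucleotides."""
    pair = boundaries(intron)
    if pair == CANONICAL:
        return "canonical"
    if pair == U12_TYPE:
        return "U12-type"
    return "non-canonical"


def larger_than_cdna(genomic_bp, cdna_bp, tolerance_bp=25):
    """Size-based screen: flag a genomic amplicon that exceeds the cDNA
    amplicon by more than a (hypothetical) gel-resolution tolerance."""
    return genomic_bp - cdna_bp > tolerance_bp


if __name__ == "__main__":
    # A made-up 46bp intron with GA|TG boundaries; the internal bases are invented.
    made_up_intron = "GA" + "C" * 42 + "TG"
    print(classify_intron(made_up_intron))   # non-canonical
    print(larger_than_cdna(546, 500))        # True
```

Only bands flagged by a size test like `larger_than_cdna` would go on to cloning and sequencing, and boundary classification only becomes possible once an intron has actually been sequenced, which is why no boundary proportions could be reported for H. triquetra or K. micrum.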
Intron loss and gain in dinoflagellate evolution: While it appears that introns were present ancestrally in early dinoflagellates and alveolates (Schott, Robledo et al. 2003), there have likely been many losses since then, as well as possible gains of non-canonical introns, or mutation of ancestral intron splice sites. There are not enough data across phylogenetic groups of dinoflagellates to assess whether intron gain is likely to have occurred.

LITERATURE CITED

Archibald, J. M., C. J. O'Kelly, et al. (2002). The chaperonin genes of jakobid and jakobid-like flagellates: Implications for eukaryotic evolution. Mol Biol Evol 19(4): 422-431.
Burset, M., I. A. Seledtsov, et al. (2000). Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res 28(21): 4364-4375.
Hackett, J. D., T. E. Scheetz, et al. (2005). Insights into a dinoflagellate genome through expressed sequence tag analysis. BMC Genomics 6(1): 80.
IHGSConsortium, I. H. G. S., E. S. Lander, et al. (2001). Initial sequencing and analysis of the human genome. Nature 409(6822): 860-921.
Keeling, P. J. (2003). Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet Biol 38(3): 298-309.
Lynch, M. and A. O. Richardson (2002). The evolution of spliceosomal introns. Curr Opin Genet Dev 12(6): 701-710.
Morse, D., P. Salois, et al. (1995). A nuclear-encoded form II rubisco in dinoflagellates. Science 268(5217): 1622-1624.
Nixon, J. E., A. Wang, et al. (2002). A spliceosomal intron in Giardia lamblia. Proc Natl Acad Sci USA 99(6): 3701-3705.
Okamoto, O. K., L. Liu, et al. (2001). Members of a dinoflagellate luciferase gene family differ in synonymous substitution rates. Biochemistry 40(51): 15862-15868.
Patron, N. J., R. F. Waller, et al. (2005). Complex protein targeting to dinoflagellate plastids. J Mol Biol 348(4): 1015-1024.
Patron, N. J., R. F. Waller, et al. (2006).
A tertiary plastid uses genes from two endosymbionts. J Mol Biol 357(5): 1373-1382.
Rowan, R., S. M. Whitney, et al. (1996). Rubisco in marine symbiotic dinoflagellates: Form II enzymes in eukaryotic oxygenic phototrophs encoded by a nuclear multigene family. Plant Cell 8(3): 539-553.
Roy, S. W. and W. Gilbert (2006). The evolution of spliceosomal introns: Patterns, puzzles and progress. Nat Rev Genet 7(3): 211-221.
Saldarriaga, J. F., M. L. McEwan, et al. (2003). Multiple protein phylogenies show that Oxyrrhis marina and Perkinsus marinus are early branches of the dinoflagellate lineage. Int J Syst Evol Microbiol 53(Pt 1): 355-365.
Schott, E. J., J. A. Robledo, et al. (2003). Gene organization and homology modeling of two iron superoxide dismutases of the early branching protist Perkinsus marinus. Gene 309(1): 1-9.
Yoon, H. S., J. D. Hackett, et al. (2005). Tertiary endosymbiosis driven genome evolution in dinoflagellate algae. Mol Biol Evol 22(5): 1299-1308.
Yoshikawa, T., A. Uchida, et al. (1996). There are 4 introns in the gene coding the DNA-binding protein HCc of Crypthecodinium cohnii (Dinophyceae). Fisheries Sci 62(2): 204-209.
Zhang, H. and S. Lin (2003). Complex gene structure of the form II rubisco in the dinoflagellate Prorocentrum minimum (Dinophyceae). Journal of Phycology 39(6): 1160-1171.
Zhu, W. and V. Brendel (2003). Identification, characterization and molecular phylogeny of U12-dependent introns in the Arabidopsis thaliana genome. Nucleic Acids Res 31(15): 4561-4572.

CHAPTER 4: A GENOME SURVEY OF HETEROCAPSA TRIQUETRA

4.1 Introduction

We know from studying highly reduced genomes that when a cell is pressured to decrease its genetic content, it employs many methods to do so.
Some methods affect coding DNA, some affect non-coding DNA, and some affect both. Microsporidians, for example, have very short intergenic regions, they have shrunk their introns or lost them completely, and they have even lost many genes (Keeling and Slamovits 2005). By reversing the case of genome reduction, we immediately have expectations for ways genomes can expand: we expect more introns, larger intergenic regions, and more genes. These expectations are backed up by numerous studies (Alexandrov, Troukhan et al. 2006; Andolfatto 2005; Boeva, Regnier et al. 2006; Cavalier-Smith 2005; Lynch 2006). In comparison to prokaryotes, eukaryotic genomes are large, but gene-poor (Table 4.1). In the human genome, exon sequence makes up less than 2% of the total genome, and 50% of the genome is repetitive non-coding sequence (IHGSConsortium, Lander et al. 2001). At 2.4Mb, the largest human gene is larger than many prokaryotic genomes and has at least 78 introns, but only about 11kb of coding sequence (Den Dunnen, Grootscholten et al. 1992; Tennyson, Klamut et al. 1995; Ussery and Hallin 2004). If, as in humans, H. triquetra exons make up only 2% of its genome, what does the rest of the DNA look like, and what is it doing? Surveying a genomic library allows us to analyse features of non-coding DNA that are most prevalent in a genome. I have constructed a genomic library of the H. triquetra genome for this purpose.

Table 4.1: Eukaryotic genome sizes. This table with references is reproduced from (Keeling and Slamovits 2005).

Organism                        Major Eukaryotic Group           Size (Mb)  Reference
Gonyaulax polyedra              Dinoflagellate                   98,000     (Shuter, Thomas et al. 1983)
Heterocapsa pygmaea             Dinoflagellate                   4,450      (Triplett, Jovine et al. 1993)
Toxoplasma gondii               Apicomplexan                     87         (Blaxter and Ivens 1999)
Plasmodium falciparum           Apicomplexan                     23         (Gardner, Hall et al. 2002)
Cryptosporidium parvum          Apicomplexan                     9          (Spano and Crisanti 2000)
Paramecium caudatum             Ciliate                          8,600      (Shuter, Thomas et al. 1983)
Thalassiosira pseudonana        Diatom                           32         (Armbrust, Berges et al. 2004)
Coscinodiscus asteromphalus     Diatom                           25,000     (Shuter, Thomas et al. 1983)
Amoeba proteus                  Amoeba                           290,000    (Friz 1968)
Amoeba dubia                    Amoeba                           670,000    (Friz 1968)
Dictyostelium discoideum        Slime Mold                       34         (Glockner, Eichinger et al. 2002)
Entamoeba histolytica           Archamoeba                       <20        http://www.sanger.ac.uk/Projects/E-histolytica/
Trichomonas vaginalis           Parabasalian                     60-80      http://www.tigr.org/tdb/e2k1/tvg/
Trypanosoma sp.                 Kinetoplastid                    39         (El-Sayed, Hegde et al. 2000)
Leishmania major                Kinetoplastid                    33         (Myler, Sisk et al. 2000)
Cyanidioschyzon merolae         Red Alga                         16         (Matsuzaki, Misumi et al. 2004)
Guillardia theta (nucleomorph)  Red Alga (cryptophyte)           0.55       (Douglas, Zauner et al. 2001)
Chlamydomonas reinhardtii       Green Alga                       100        (Harris 1993)
Ostreococcus tauri              Green Alga (picoeukaryote)       10         (Courties, Perasso et al. 1998)
Bigelowiella natans             Green Alga (chlorarachniophyte)  0.38       (Gilson and McFadden 2002)
Oryza sativa                    Plant                            430        (Arumuganathan and Earle 1991)
Zea mays                        Plant                            3,000      (Arumuganathan and Earle 1991)
Arabidopsis thaliana            Plant                            125        (Yu, Wright et al. 2000)
Mouse                           Animal                           2,500      (Waterston, Lindblad-Toh et al. 2002)
Human                           Animal                           2,900      (Waterston, Lander et al. 2002)
Fugu rubripes                   Animal                           365        (Aparicio, Chapman et al. 2002)
Drosophila melanogaster         Animal                           137        (Adams, Celniker et al. 2000)
Ciona intestinalis              Animal                           156        (Dehal, Satou et al. 2002)
Saccharomyces cerevisiae        Fungus                           12         (Blandin, Durrens et al. 2000)
Cryptococcus neoformans         Fungus                           20         (Wickes, Moore et al. 1994)
Neurospora crassa               Fungus                           43         (Schulte, Becker et al. 2002)
Encephalitozoon intestinalis    Microsporidian                   2.3        (Peyretaillade, Biderre et al. 1998)
Encephalitozoon cuniculi        Microsporidian                   2.9        (Katinka, Duprat et al. 2001)
Antonospora locustae            Microsporidian                   5.4        (Streett 1994)
Spraguea lophii                 Microsporidian                   6.2        (Biderre, Pages et al. 1994)
Glugea atherinae                Microsporidian                   19.5       (Biderre, Pages et al. 1994)

Genome Growth by Gene Enrichment: Gene-dense chromosomal areas in eukaryotes are usually comparatively GC rich (Gardiner 1995). Some have claimed that dinoflagellate genomes are high in GC (Herzog, Soyer et al. 1982; Rizzo and Cox 1976), which suggests increased gene density. If H. triquetra's large genome is due to an increase in gene density, we could expect to see a high overall GC content, and one that is similar to the GC content of the EST database. With high gene density we would also expect, by chance, to find some identifiable ORFs and/or multicopy genes.

Genome Growth by Repetition: Repetitive sequence can be the result of several processes. Recombination can duplicate regions of genes or chromosomes, and mobile genetic elements often replicate portions of themselves and other genes, leaving copies of themselves or conserved insert sequences behind as they move around a genome. Repeated sequence from mobile elements appears as short or long stretches of complex sequence scattered throughout the genome. In contrast, structural DNA often bears the signature of zones of low complexity repeated sequence. With this library it will be nearly impossible to address recombination because we will not be able to map large sections of the genome, but we can easily recognize low complexity repeats and the signature of mobile elements in complex, nearly exact repeats.

Transposons and Retrotransposons: Transposons are mobile genetic elements that insert and excise themselves from DNA. They encode a transposase (which may or may not be sequence-specific) that performs a sticky-end insertion. They have short direct repeats on either side of their insertion site as a result of the process, and these can remain behind after the transposon has moved on to another location. In this way, they can cause permanent insertion-mutations. There are two types of retrotransposons: Long Terminal Repeat (LTR) and non-LTR retrotransposons.
LINEs (long interspersed nuclear elements) and SINEs (short interspersed nuclear elements) are two types of non-LTR retrotransposons. LINEs encode genes with reverse transcriptase and integrase functionality that allow them to reverse-transcribe themselves and other mobile elements (like SINEs) that do not have autoreplicative abilities. LINEs are initially inserted into a genome by reverse-transcribing themselves from RNA into DNA, then inserting into existing DNA with their integrase functionality. Subsequently, the host genome transcribes the new LINE element, and reverse-transcribed DNA copies of mRNA can be inserted somewhere else in the genome. In this way, LINEs can rapidly proliferate throughout a genome, replicating themselves and increasing DNA content with each new insertion. LINEs can also reverse-transcribe other mRNAs into DNA and insert them into genomic DNA. LTR retrotransposons also replicate by copying and inserting themselves in genomes. They are recognizable by their long terminal repeat regions, which can be hundreds or thousands of bases long. The proliferation of LTR retrotransposons and LINEs can contribute to the rapid expansion of genomes. LTR replication is tempered by recombination between repeat regions, which can lead to deletion of the element (Biemont and Vieira 2005; Hua-Van, Le Rouzic et al. 2005; Labrador and Corces 1997). Mobile elements do more than just increase total genomic size; depending on where they insert, they can disrupt genes or regulatory regions, introduce regulatory sequences to new locations, or cause permanent mutations. Their effect on genome evolution should not be underestimated.

The first question when vast quantities of DNA don't code for anything is often "What could it possibly be doing?". It has become clear in recent years that an important function of repetitive DNA is not necessarily to "do" something, but to "be" somewhere.
This requires a shift in the way we think about DNA, from a code-centric view to the consideration of DNA as physical elements. When we make this shift, we recognise the intrinsic functionality of structural DNA and non-coding DNA. Suddenly, taking up space no longer seems wasteful.

Shotgun Sequencing as a Strategy for Genome Exploration: Based on an average insert size of 1Kb, 200 colonies represent only ~200kb of sequence, or 0.0009% of the estimated size of the H. triquetra genome. H. triquetra has an estimated 21,000 million bases, and sequencing it entirely is far beyond the scope of this project. The closest anyone has come to a genome-wide survey of dinoflagellates was screening a genomic Crypthecodinium cohnii DNA library by reverse hybridization to estimate the repetitiveness of the genome (Moreau, Geraud et al. 1998). In fact, there is still no intent by any group to sequence any but the tiniest dinoflagellate genomes (LaJeunesse, Lambert et al. 2005). This genomic library is the only feasible way to get a small glimpse-in-pieces of what the genome as a whole is like. There is not nearly enough coverage to build contigs confidently, but this is nonetheless a worthwhile project, since a sequence-based survey of dinoflagellate genomic DNA has never been undertaken.

4.2 Materials and Methods

Cell culture and DNA extraction: H. triquetra cells were grown as above, scaled up to 4L quantities over several months and harvested as described above (Chapter 3 Methods). DNA and RNA extractions were performed on the cells immediately following grinding in liquid nitrogen. DNA from several extractions was measured individually with a spectrophotometer and the best quality, most highly concentrated samples were pooled for the genomic DNA library.

Library Building: The Invitrogen PCR 4-Blunt TOPO Shotgun Cloning kit with nebulizers was used to make the library.
25µg of buffered total DNA was physically sheared with nebulizers for 40 seconds at 10psi, and run on a 1.0% agarose gel to determine size range. Resulting fragments of 1000-4000bp were precipitated and stored overnight at -20°C, then used in subsequent steps as per kit instructions. It is important to note that all steps following precipitation of sheared DNA fragments should be done continuously without storage of intermediates, to eliminate issues arising from degradation of blunt ends and consequent problems with efficient vector ligation. I did 25µl blunt-end repair (BER) reactions (yielding 10µl resuspended BER DNA) in order to keep volumes low for downstream ligation and cloning steps. I serially diluted, ligated, cloned, and plated a subset of the BER DNA and stored the remaining 7µl overnight. I used the full 7µl the following morning in a single large ligation at 1/3 dilution and 10 separate transformations, resulting in approximately 30 plates each containing 50-100 white (transformed) colonies for a rough total of 2200 colonies with an insert.

Screening: I screened 550 colonies using the Fidelitaq colony PCR protocol and agarose gel electrophoresis for large (>1000bp) inserts. Approximately 200 of these colonies contained >1000bp inserts. Those between 1000 and 1300bp were sequenced in one direction as above (Chapter 2 Methods), and both strands of the 35 inserts >1300bp were sequenced. A small number of the inserts screened (8) were between 2000 and 3000bp. Sequences were vector-trimmed and auto-assembled into contigs using Sequencher. Where bi-directional sequencing did not cover the full length of the insert, exact-match primers for internal sequencing walks were made to cover the gap. (These sequencing walks are currently being done by Raheel Humayun to complete 32 large fragments.)
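The sampling depth of a small shotgun library and the GC content of its fragments both come down to simple arithmetic, which the following Python sketch makes explicit. It is offered as an illustration under stated assumptions rather than as the procedure used for this library: the function names are hypothetical, ambiguous bases are simply excluded from the GC denominator, and the pooled estimate is taken over the concatenated fragments, which may differ from how the summary figures in this chapter were obtained.

```python
# Illustrative sketch only: function names and the handling of ambiguous
# bases are assumptions, not the procedure used for this library.

def sampled_percent(n_clones, mean_insert_bp, genome_bp):
    """Percent of a genome represented by n_clones inserts of a given mean size."""
    return 100.0 * n_clones * mean_insert_bp / genome_bp


def gc_percent(seq):
    """GC percent of one sequence, counting only unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def pooled_gc_percent(fragments):
    """GC percent over all fragments pooled, so longer fragments weigh more."""
    return gc_percent("".join(fragments))


if __name__ == "__main__":
    # 200 colonies, 1Kb mean insert, 21,000 million base genome (figures from the text).
    print(f"{sampled_percent(200, 1_000, 21_000_000_000):.5f}%")   # 0.00095%
    print(gc_percent("GGCCAATTNN"))                                 # 50.0
```

With the figures quoted in the introduction, the sampled fraction works out to a little under one thousandth of one percent, which is why this library can describe the kinds of sequence present but cannot support confident contig assembly.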
Sequence analysis: Sequences were trimmed and edited for quality in Sequencher 4.2 and exported using Fetch 5.0.5 (http://fetchsoftworks.com) to the lec 'cowpie' server (http://ernie.botany.ubc.ca/lec/lec.html), which runs blastx and blastn searches through the NCBI server as well as similarity searches against sequences in the local database, and returns the top hits and links.

4.3 Results and Discussion

Facts and Features of H. triquetra genomic DNA library: Sequence editing produced 208 sequences (annotated A0001 - A0214) with an average length of 1100bp, ranging from 335bp to 2027bp (Figure 4.1). These sequences total 228Kb of unambiguous sequence (231Kb with 1.3% ambiguous bases) and represent 198 genomic DNA (gDNA) fragments (some fragments are made up of two sequences). The largest completely sequenced fragment (A0197) is 2027bp with 45.8% GC. There are no obvious repeats in A0197 (dot plot not shown), and it has no similarity to any other fragment in the local database or at NCBI. The largest unfinished fragment is estimated at 3200bp.

Figure 4.1: Histogram showing the distribution of fragments by size. (Title: Size Distribution of Heterocapsa triquetra Genomic DNA Fragments; x-axis: Fragment Size (Kb).)

GC Content: The H. triquetra genome contains an estimated 53.56% GC, based on 231Kb of DNA sequence from 208 fragments that range from 26.42% to 70.14% GC. Most of the fragments have GC contents between 52% and 67%, with a mode of 59.5% (Figure 4.2). Fragment A0003 has the lowest GC content at 26.4% and shows high similarity (3e-14) to a plastid-targeted hydroxyproline-rich glycoprotein (mucin) from plants. The next two most AT-rich fragments, at 27% GC, encode the only full-length genes identified from the set of fragments, the methyltransferases (A0206 and A0020). The GC estimate for H. triquetra coding DNA is 61.75%, based on 1,816,929bp of coding sequence from the EST library (minus poly-A tails (Wang and Morse 2006)). 19 plastid-targeted genes from the EST database have an average GC content of 64% (ranging from 59-72%). These data support the common assertion that coding regions are richer in G and C nucleotides than non-coding regions are.

Figure 4.2: Histogram showing the distribution of fragments by GC content. (x-axis: Percent GC.)

Gene content: 198 rough vector-trimmed fragments were analysed to determine similarity to publicly available sequences. Of these, 173 were acceptable for analysis (free of large amounts of ambiguous sequence). Of these 173, 61 fragments had similarity (e-04 or better) to genes in the NCBI database. Most of these hits were for short stretches of DNA, 20-150bp. 14 of the 61 hit genes coding for bacterial 16S or 23S rRNA, and 7 hit other hypothetical bacterial proteins or ORFs, including one proteobacterial gene. At least 9 fragments had portions of mobile elements. These include an integrase (A0147) and the pol gene (A0135), both from hits to eukaryotic transposons. Finally, two separate fragments (A0020, A0204R) encode full-length cytosine-dependent DNA methyltransferases. 112 fragments (64.7%) did not show significant similarity to any genes in the NCBI database, though some of these (below) did share repeats with other fragments. These data support the null hypothesis that genes are sparse in dinoflagellates, as in most eukaryotes.

Repeats: Several fragments share short regions of nearly perfect sequence similarity with other fragments in the database. These imperfect complex repeats do not appear at the ends of fragments, as would occur if the fragments were adjacent segments of the genome (or if they were contaminated with flanking vector sequence), but internally.
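The internal shared regions described here can be summarised by two simple quantities: how closely the two copies match, and how far the region sits from the ends of its fragment. The following Python sketch is illustrative only and is not the procedure used in this study. The function names `percent_identity` and `is_internal`, the ungapped position-by-position comparison, and the 20-base `margin` are all assumptions made for the example.

```python
def percent_identity(region_a: str, region_b: str) -> float:
    """Percent of positions that match between two equal-length regions.

    Assumption: the regions are compared position by position with no gaps.
    """
    if len(region_a) != len(region_b):
        raise ValueError("regions must be the same length (ungapped comparison)")
    if not region_a:
        raise ValueError("regions must not be empty")
    matches = sum(x == y for x, y in zip(region_a.upper(), region_b.upper()))
    return 100.0 * matches / len(region_a)


def is_internal(start: int, end: int, fragment_length: int, margin: int = 20) -> bool:
    """True if a 1-based, inclusive region lies at least `margin` bases from both ends.

    Assumption: `margin` is a hypothetical cutoff chosen for illustration.
    """
    return start > margin and end <= fragment_length - margin
```

Under these assumptions, a 61-base region with 7 mismatches has 54 matching positions, or about 88.5% identity.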
For example, a 61bp sequence is shared between A0010f and A0009f. The sequence is located at 537-597 of A0010f and 605-665 of A0009f, and there are 7 mismatches along its length. A similar 150bp sequence is shared between 275-425 of A0079r and 687-838 of A0009f. Note that A0009f contains two different repeat regions. A0090f and A0081r match at 215 sites of a 250bp region in the middle of their fragments. There are at least four other pairs like these, with shared regions ranging from 150-400bp. One sequence has a small internal repeat near the 3' end (7bp tandem repeat), and none are entirely made up of repeated sequence. There could be as many as 25000 copies of a single repeat in the C. cohnii genome (Moreau, Geraud et al. 1998), and a set of seven pairs of imperfect complex repeats in this sample suggests that they could be a prevalent feature of dinoflagellate genomes.

Transposons: Fragment A0213 showed similarity (8e-17) to an LTR retrotransposon, and A0169 could itself be an LTR retrotransposon. A0169 has two nearly exact-match rubisco fragments (50 and 25bp). The fragments are 75bp apart and buffered between two 100bp direct repeats by 80bp on one side and 150bp on the other. This fragment may have resulted from internal recombination between the transposon and a rubisco form II polyprotein. Of the few fragments with recognisable sequence, most carry mobile elements or bacterial ribosomal gene fragments. Finding multiple retrotransposon fragments in such a small sample of the genome suggests that they make up a large part of the H. triquetra genome.

LITERATURE CITED

Adams, M. D., S. E. Celniker, et al. (2000). The genome sequence of Drosophila melanogaster. Science 287(5461): 2185-2195.
Alexandrov, N. N., M. E. Troukhan, et al. (2006). Features of Arabidopsis genes and genome discovered using full-length cDNAs. Plant Mol Biol 60(1): 69-85.
Andolfatto, P. (2005). Adaptive evolution of non-coding DNA in Drosophila. Nature 437(7062): 1149-1152.
Aparicio, S., J. Chapman, et al. (2002). Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297(5585): 1301-1310.
Armbrust, E. V., J. A. Berges, et al. (2004). The genome of the diatom Thalassiosira pseudonana: Ecology, evolution, and metabolism. Science 306(5693): 79-86.
Arumuganathan, K. and E. D. Earle (1991). Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9: 208-219.
Biderre, C., M. Pages, et al. (1994). On small genomes in eukaryotic organisms: Molecular karyotypes of two microsporidian species (Protozoa) parasites of vertebrates. C R Acad Sci III 317(5): 399-404.
Biemont, C. and C. Vieira (2005). What transposable elements tell us about genome organization and evolution: The case of Drosophila. Cytogenet Genome Res 110(1-4): 25-34.
Blandin, G., P. Durrens, et al. (2000). Genomic exploration of the hemiascomycetous yeasts: 4. The genome of Saccharomyces cerevisiae revisited. FEBS Lett 487(1): 31-36.
Blaxter, M. and A. Ivens (1999). Reports from the cutting edge of parasitic genome analysis. Parasitol Today 15(11): 430-431.
Boeva, V., M. Regnier, et al. (2006). Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. Bioinformatics 22(6): 676-684.
Cavalier-Smith, T. (2005). Economy, speed and size matter: Evolutionary forces driving nuclear genome miniaturization and expansion. Ann Bot (Lond) 95(1): 147-175.
Courties, C., R. Perasso, et al. (1998). Phylogenetic analysis and genome size of Ostreococcus tauri (Chlorophyta, Prasinophyceae). Journal of Phycology 34: 844-849.
Dehal, P., Y. Satou, et al. (2002). The draft genome of Ciona intestinalis: Insights into chordate and vertebrate origins. Science 298(5601): 2157-2167.
Den Dunnen, J. T., P. M. Grootscholten, et al. (1992). Reconstruction of the 2.4 Mb human DMD-gene by homologous YAC recombination. Hum Mol Genet 1(1): 19-28.
Douglas, S., S. Zauner, et al. (2001). The highly reduced genome of an enslaved algal nucleus. Nature 410(6832): 1091-1096.
El-Sayed, N. M., P. Hegde, et al. (2000). The African trypanosome genome. Int J Parasitol 30(4): 329-345.
Friz, C. T. (1968). The free amino acid levels of Pelomyxa carolinensis, Amoeba dubia and proteus. J Protozool 15(1): 149-152.
Gardiner, K. (1995). Human genome organization. Curr Opin Genet Dev 5(3): 315-322.
Gardner, M. J., N. Hall, et al. (2002). Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419(6906): 498-511.
Gilson, P. R. and G. I. McFadden (2002). Jam packed genomes - a preliminary, comparative analysis of nucleomorphs. Genetica 115(1): 13-28.
Glockner, G., L. Eichinger, et al. (2002). Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418(6893): 79-85.
Harris, E. H. (1993). Chlamydomonas reinhardtii. Genetic maps: A compilation of linkage and restriction maps of genetically studied organisms. S. J. O'Brien. Cold Spring Harbor, New York, Cold Spring Harbor Laboratory: 2156-2169.
Herzog, M., M. O. Soyer, et al. (1982). A high level of thymine replacement by 5-hydroxymethyluracil in nuclear DNA of the primitive dinoflagellate Prorocentrum micans E. Eur J Cell Biol 21(2): 151-155.
Hua-Van, A., A. Le Rouzic, et al. (2005). Abundance, distribution and dynamics of retrotransposable elements and transposons: Similarities and differences. Cytogenet Genome Res 110(1-4): 426-440.
International Human Genome Sequencing Consortium, E. S. Lander, et al. (2001). Initial sequencing and analysis of the human genome. Nature 409(6822): 860-921.
Katinka, M. D., S. Duprat, et al. (2001). Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414(6862): 450-453.
Keeling, P. J. and C. H. Slamovits (2005). Causes and effects of nuclear genome reduction. Curr Opin Genet Dev 15(6): 601-608.
Labrador, M. and V. G. Corces (1997). Transposable element-host interactions: Regulation of insertion and excision. Annu Rev Genet 31: 381-404.
Lajeunesse, T. C., G. Lambert, et al. (2005). Symbiodinium (Pyrrhophyta) genome sizes (DNA content) are smallest among dinoflagellates. Journal of Phycology 41: 880-886.
Lynch, M. (2006). The origins of eukaryotic gene structure. Mol Biol Evol 23(2): 450-468.
Matsuzaki, M., O. Misumi, et al. (2004). Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428(6983): 653-657.
Moreau, H., M. L. Geraud, et al. (1998). Cloning, characterization and chromosomal localization of a repeated sequence in Crypthecodinium cohnii, a marine dinoflagellate. Int Microbiol 1(1): 35-43.
Myler, P. J., E. Sisk, et al. (2000). Genomic organization and gene function in Leishmania. Biochem Soc Trans 28(5): 527-531.
Peyretaillade, E., C. Biderre, et al. (1998). Microsporidian Encephalitozoon cuniculi, a unicellular eukaryote with an unusual chromosomal dispersion of ribosomal genes and a LSU rRNA reduced to the universal core. Nucleic Acids Res 26(15): 3513-3520.
Rizzo, P. J. and E. R. Cox (1976). Isolation and properties of nuclei from binucleate dinoflagellates. J. Phyc. 12: 31.
Schulte, U., I. Becker, et al. (2002). Large scale analysis of sequences from Neurospora crassa. J Biotechnol 94(1): 3-13.
Shuter, B. J., J. E. Thomas, et al. (1983). Phenotypic correlates of genomic DNA content in unicellular eukaryotes and other cells. American Naturalist 122: 26-54.
Spano, F. and A. Crisanti (2000). Cryptosporidium parvum: The many secrets of a small genome. Int J Parasitol 30(4): 553-565.
Streett, D. A. (1994). Analysis of Nosema locustae (Microsporidia) to Antonospora locustae n. comb. based on molecular and ultrastructural data. Journal of Eukaryotic Microbiology 51: 207-213.
Tennyson, C. N., H. J. Klamut, et al. (1995). The human dystrophin gene requires 16 hours to be transcribed and is cotranscriptionally spliced. Nat Genet 9(2): 184-190.
Triplett, E. L., R. V. Jovine, et al. (1993). Characterization of two full-length cDNA sequences encoding for apoproteins of peridinin-chlorophyll a-protein (PCP) complexes. Mol Mar Biol Biotechnol 2(4): 246-254.
Ussery, D. W. and P. F. Hallin (2004). Genome update: Length distributions of sequenced prokaryotic genomes. Microbiology 150(Pt 3): 513-516.
Wang, Y. and D. Morse (2006). Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium. Nucleic Acids Res 34(2): 613-619.
Waterston, R. H., E. S. Lander, et al. (2002). On the sequencing of the human genome. Proc Natl Acad Sci USA 99(6): 3712-3716.
Waterston, R. H., K. Lindblad-Toh, et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915): 520-562.
Wickes, B. L., T. D. Moore, et al. (1994). Comparison of the electrophoretic karyotypes and chromosomal location of ten genes in the two varieties of Cryptococcus neoformans. Microbiology 140(Pt 3): 543-550.
Yu, Z., S. I. Wright, et al. (2000). Mutator-like elements in Arabidopsis thaliana. Structure, diversity and evolution. Genetics 156(4): 2019-2031.

CHAPTER 5: CONCLUSION

Reduction of Endosymbiont genomes: K. foliaceum is a particularly complex cell. It contains five genomes (host nuclear, endosymbiont nuclear, host mitochondrial, endosymbiont mitochondrial, and endosymbiont plastid) and the memory of a sixth (the peridinin-containing plastid genome of the past) in the form of the eyespot. The K. foliaceum nucleus likely contains transferred genes from its first (red algal) plastid (as K. micrum does (Patron, Waller et al. 2006)) and might encode a small set of genes from its second (diatom) plastid or from the endosymbiont's nucleus, which is in the early stages of reduction (McEwan and Keeling 2004). The K. foliaceum example shows the extreme extent to which dinoflagellates use the acquisition of endosymbiotic organelles to increase the diversity and content of their genomes.
Introns: Like many eukaryotes, dinoflagellates contain introns, but the distribution of introns among dinoflagellate species is as yet unclear because of a small and poorly dispersed sample set of only 16 introns in 5 genes. Though the genomes of some eukaryotes contain large proportions of introns, this is not the case for dinoflagellates. I found no introns in either species examined (H. triquetra and K. micrum), and no evidence to suggest that introns have played a large part in the expansion of dinoflagellate genomes.

Genome survey: This study is the first attempt at characterizing the makeup of dinoflagellate genomes at a genome-wide scale. The estimate of H. triquetra GC content is lower than is usually reported for dinoflagellate genomes, but coding sequences are higher in GC than the overall genomic DNA, which fits the normal eukaryotic trend. Neither GC content nor representation of genes in the database supported high levels of duplicate genes in the H. triquetra genome. The data show that imperfect complex repeats and transposable elements are both widely present in the H. triquetra genome.

LITERATURE CITED

McEwan, M. L. and P. J. Keeling (2004). Hsp90, tubulin and actin are retained in the tertiary endosymbiont genome of Kryptoperidinium foliaceum. J Eukaryot Microbiol 51(6): 651-659.
Patron, N. J., R. F. Waller, et al. (2006). A tertiary plastid uses genes from two endosymbionts. J Mol Biol 357(5): 1373-1382.

