UBC Theses and Dissertations

Molecular evolution of the parasitic green alga, Helicosporidium sp. de Koning, Audrey Patricia 2006

MOLECULAR EVOLUTION OF THE PARASITIC GREEN ALGA, HELICOSPORIDIUM SP.

by

AUDREY PATRICIA DE KONING

B.Sc., The University of Northern British Columbia, 1999
M.Sc., The University of British Columbia, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

(Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA

AUGUST 2006

Audrey Patricia de Koning, 2006

Abstract

Helicosporidia are single-celled obligate endoparasites of invertebrates. They have a unique morphology and infection strategy, which make them unlike any other eukaryote. Molecular data were produced to clarify their phylogenetic relationship and to examine the evolution of their cryptic plastid. Phylogenetic analyses of 69 ribosomal proteins identified from an expressed sequence tag (EST) library showed that Helicosporidia are derived green algae and more specifically, are related to the trebouxiophyte algae. An obligate parasitic lifestyle is rare among plant and algal groups, and because Helicosporidium possesses no pigments and no chloroplast-like structure has been identified, photosynthetic ability has presumably been lost in this organism. I sought to examine the role that a relict plastid might play in Helicosporidium. I identified ESTs of 20 putatively plastid-targeted enzymes that are involved in a wide variety of metabolic pathways. As expected, no components of photosynthesis were found, but components of other metabolic pathways including sulfur metabolism and fatty acid, isoprenoid and heme biosynthesis suggest that Helicosporidium retains its plastid for these functions. The complete plastid genome of this species of Helicosporidium was sequenced and revealed only four protein-coding genes not involved in transcription or translation, with two of these confirming the metabolic functions suggested by the nuclear-encoded, plastid-targeted genes identified from the ESTs.
In addition, the Helicosporidium plastid genome is one of the smallest known (37.5 kb). Its reduced size results from loss of many genes commonly found in plastids of other plants and algae (including all proteins that function in photosynthesis), elimination of duplicated genes and redundant tRNA isoacceptors, and minimization of intergenic spaces. The Helicosporidium plastid genome is also highly structured, with each half of the circular genome containing nearly all genes on one strand. Both the structure and content of the plastid genome and the deduced function of the organelle show parallels with the relict plastid found in the malaria parasite, Plasmodium falciparum. These unrelated organisms each evolved from photosynthetic ancestors, and the convergence in form and function of their relict plastids suggests that common forces shape plastid evolution following the switch from autotrophy to parasitism.

Table of contents

Abstract
Table of contents
List of Tables
List of Figures
Acknowledgements
Co-authorship Statement
Chapter 1 – Introduction
  Literature Review
    Lifecycle
    Ecology and diversity
    Phylogenetics
  Research Objectives
    Origin of Helicosporidium
    The function of a plastid in Helicosporidium
    The Helicosporidium plastid genome
  References
Chapter 2 - Expressed sequence tag (EST) survey of the highly adapted green algal parasite, Helicosporidium
  Introduction
  Methods
    Strains, cultivation, and library construction
    Expressed sequence tag sequencing and analysis
  Results and Discussion
    EST sequencing
    Distribution of functional classes of expressed genes
    Novel genes in Helicosporidium
    Phylogeny of Helicosporidium genes
    Concluding remarks
  References
Chapter 3 - Nucleus-encoded genes for plastid-targeted proteins in Helicosporidium: functional diversity of a cryptic plastid in a parasitic alga
  Introduction
  Methods
    Identification and characterization of putative plastid-targeted proteins
    Phylogenetic analyses
    Presequence characterization
  Results
  Discussion
    Functional implications of plastid-targeted proteins in Helicosporidium
    Fatty acid metabolism
    Tetrapyrrole biosynthesis
    Isoprenoid biosynthesis
    Amino acid biosynthesis
    Reducing potential
    Sulfur assimilation
    Comparative plastid reduction in obligate parasites
  References
Chapter 4 - The complete plastid genome sequence of the parasitic green alga Helicosporidium sp. is highly reduced and structured
  Introduction
  Methods
    Cell culture and genomic DNA isolation
    Genome sequencing
    Annotation and analyses
  Results and discussion
    Genome size and density of coding regions
    Genome structure and organization
    Gene content
    The Helicosporidium plastid encodes a minimal set of tRNAs
    Concluding remarks
  References
Chapter 5 - Conclusions
  Summary
  Future work
  References

List of Tables

Table 2.1 - Functional classification of clusters with more than five ESTs
Table 2.2 - Summary of phylogenetic analyses of SSU ribosomal proteins
Table 2.3 - Summary of phylogenetic analyses of LSU ribosomal proteins
Table 4.1 - Plastid genome features compared between non-photosynthetic plastids and photosynthetic relatives

List of Figures

Figure 1.1 - Lifecycle stages in Helicosporidium
Figure 2.1 - Distribution of clusters by similarity to known genes
Figure 2.2 - Distribution of 163 annotated protein clusters to functional groups by COG categories
Figure 2.3 - Protein maximum likelihood phylogenies of ribosomal proteins
Figure 3.1 - Summary of cDNAs for putative plastid-targeted proteins from Helicosporidium sorted by pathway
Figure 3.2 - Maximum-likelihood phylogenies of four proteins showing different types of support for plastid-targeted proteins in Helicosporidium
Figure 3.3 - Difference in mean amino acid composition between target peptides and mature proteins
Figure 3.4 - The leucine biosynthetic pathway is present in the Helicosporidium plastid
Figure 3.5 - Phylogenetic evidence for the sulfur assimilation pathway
Figure 4.1 - Gene map of Helicosporidium plastid DNA
Figure 4.2 - GC skew diagram for the Helicosporidium plastid genome
Figure 4.3 - Gene order comparisons between trebouxiophyte plastid genomes
Figure 4.4 - Codon frequency and tRNAs in the Helicosporidium plastid

Acknowledgements

I thank A. Tartar and D. Boucias (University of Florida) for providing Helicosporidium cells in culture, and for collaborative efforts in sequencing the EST library presented in chapter 2.
I also thank the research group, PEPdb, for software used in clustering and annotating Helicosporidium EST data.

I acknowledge the Chlamydomonas Genome Consortium for providing unpublished sequence data. These sequence data were produced by the US Department of Energy Joint Genome Institute, http://www.jgi.doe.gov/ and were used in phylogenetic analyses in chapter 3.

This work was completed while I was receiving support through scholarships from the Canadian Institutes for Health Research and the Michael Smith Foundation for Health Research, and I am appreciative of the opportunities this support has provided me.

Finally, I am most grateful for the guidance and training I received from my thesis supervisor, Dr. Patrick Keeling, for the support and suggestions provided by the members of my supervisory committee, Drs. Rosemary Redfield, Brian Leander, and Mary Berbee, and for the advice and friendship I was given by all members of the Keeling Lab.

Co-authorship Statement

The manuscript that is included as Chapter 2 was originally published as: de Koning A.P., Tartar A., Boucias D.G., Keeling P.J.: Expressed Sequence Tag (EST) survey of the highly adapted green algal parasite, Helicosporidium. Protist 2005, 156(2):181-190. A. Tartar constructed the cDNA library under the guidance of D.G. Boucias. Mass excision and EST sequencing were performed jointly by A. Tartar and myself. I performed the data analyses with supervision from P.J. Keeling. P.J. Keeling and I wrote the manuscript jointly.

Chapter 3 is based on a manuscript originally published as: de Koning A.P., Keeling P.J.: Nucleus-Encoded Genes for Plastid-Targeted Proteins in Helicosporidium: Functional Diversity of a Cryptic Plastid in a Parasitic Alga. Eukaryotic Cell 2004, 3(5):1198-1205. P.J. Keeling suggested the project, while I designed the methodology, conducted experiments and data analyses, and wrote the manuscript. P.J. Keeling contributed insights into data analyses and the final manuscript.
The manuscript included as Chapter 4 was originally published as: de Koning A.P., Keeling P.J.: The complete plastid genome sequence of the parasitic green alga, Helicosporidium sp. is highly reduced and structured. BMC Biol 2006, 4(1):12. I designed and performed the study, interpreted the data and wrote the manuscript. P.J. Keeling participated in editing the manuscript.

Chapter 1 – Introduction

Literature Review

Lifecycle

Helicosporidia were first named and described in 1921 by Keilin [1], who found a species he called Helicosporidium parasiticum in fly larvae collected in England. He described vegetative growth of Helicosporidia in the hemocoel of its host, which he called “schizogonic multiplications,” and noted that this was followed by the formation of cysts (which he called “spores”). The cysts contained three ovoid cells surrounded by an elongated filamentous cell. He believed that following death and decay of the host, the filamentous cell was responsible for breaking open the cyst and releasing the ovoid cells, which he called “sporozoites.” The sporozoites would then infect new hosts.

Fifty years later, the lifecycle and development of Helicosporidium was re-investigated with an isolate discovered in a beetle [2] that was used to infect larvae of the navel orangeworm Paramyelois transitella (Lepidoptera). Kellen and Lindegren [3] showed that Helicosporidium cysts were infectious when ingested, and would break open and release the three ovoid and one filamentous cell in the midgut of the host. After 24 hours, helicosporidial cells were observed in the host hemolymph, and vegetative growth was observed after 48 hours. Vegetative growth consists of cell divisions occurring within a pellicle, followed by rupture of the pellicle and release of 4 or 8 daughter cells. High rates of vegetative growth were followed by the appearance of cysts in the hemolymph 6-12 days after initial infection.
A number of more recent studies have examined the biology of Helicosporidium in much detail, and have provided some information on the infection process and lifecycle of this parasite. Infection begins with ingestion of the cyst (Figure 1.1 A-D), which ruptures inside the midgut of the host. The three released ovoid cells lyse, while the filamentous cell uncoils, but resists degradation [4]. The filamentous cell then migrates through the midgut epithelium (Figure 1.1 E) and enters the hemocoel, although it is unknown whether the filamentous cell passes through or around the midgut epithelial cells. Once in the hemolymph, filamentous cells are phagocytosed by host hemocytes (Figure 1.1 F), but survive cellular digestion and begin vegetative reproduction intracellularly [5]. Filamentous cells divide twice before the outer pellicle breaks to release four rod-shaped daughter cells (Figure 1.1 G). These each divide once more to produce eight spherical vegetative cells. Multiplying vegetative cells induce the breakdown of host hemocytes (Figure 1.1 H) and once released, the Helicosporidia cells begin many rounds of vegetative reproduction within the hemolymph. Vegetative cells (Figure 1.1 I-K) reproduce by autosporulation, whereby the formation of 2, 4, or 8 daughter cells with pellicles occurs within the pellicle of the mother cell [6]. Both vegetative growth and cyst differentiation occur within hemocytes and extracellularly in the host hemolymph [5]. Cysts, which are believed to form from vegetative 4-cell stages, accumulate in the host and are released after death. In artificially infected heterologous hosts, Helicosporidium can maintain an infection in an insect from larval to adult stage and vertical transmission from parental host to filial generation has been shown [7]. However, Helicosporidia have not been observed in association with reproductive organs, and vertical transmission rates are quite low, so it is unlikely that vertical transmission plays a large role in the spread of Helicosporidium infection. Helicosporidia probably rely on host death and release of cysts to spread to new environments [6].

Figure 1.1 - Lifecycle stages in Helicosporidium
A) Scanning electron micrograph (SEM) of mature cysts. B) Transmission electron micrograph (TEM) of a cyst showing three centrally located ovoid cells, surrounded by the filamentous cell. C) SEM of ruptured cyst, showing an uncoiled filamentous cell projecting away from the ovoid cell complex. D) TEM of a ruptured cyst, showing release of the filamentous cell. E) SEM of the filamentous cell penetrating the midgut epithelium of the host. F) Differential interference contrast light micrograph (LM) of host hemocytes, with a filamentous cell visible in the upper left hemocyte. G) LM of a quartet of rod-shaped daughter cells within the host hemocyte. H) LM of a host hemocyte releasing vegetative cells. I) SEM of a vegetative cell. J) TEM of a vegetative cell. K) TEM of vegetative cell following autosporulation, showing four daughter cells present within the pellicle of the mother cell. Panels A-C are reprinted from [4] and D is reprinted from [6] with permission from Blackwell Publishing ©, panels E-H are reprinted from [5] with permission from Elsevier ©, panel I is unpublished by Drion Boucias © of the University of Florida and is used with permission, and panels J-K were taken during the course of my thesis studies.

Ecology and diversity

Helicosporidia have been described from a variety of invertebrate hosts including beetles [2, 7-11], flies [1, 4, 9, 12-15], moths [16], springtails [17, 18], mites [1, 17, 19], daphnia [20], oligochaete worms [9], and even trematode sporocysts that are themselves parasites of mussels [21].
Helicosporidia appear to be obligate parasites since free-living vegetative cells have never been described, although one study has isolated Helicosporidia cysts from pond water [22]. Helicosporidia are distributed world-wide, as they have been found in North America [2, 4, 12], South America [16], Asia [11, 13], Africa [10], and Australia [7]. Only one species has been named so far; Helicosporidium parasiticum was named in the first description of Helicosporidia [1]. All other isolates either have been called parasiticum or are unnamed at the species level. There do not seem to be basic differences in ultrastructure or morphology between different isolates. Molecular comparisons of nuclear-encoded 18S rRNA, plastid-encoded 16S rRNA, actin, and β-tubulin genes from strains isolated from two hosts (a fly from Florida, and a beetle from Australia) show some differences, but not enough for the authors of the study to propose the existence of more than one species [23]. Helicosporidia probably have broad host ranges, as cyst isolates from one species have been shown to initiate infections when experimentally fed to a wide range of other host species [e.g., 3, 4, 7, 14, 15, 20, 24].

Phylogenetics

In the first description of Helicosporidium in 1921, Keilin [1] admitted difficulty with taxonomic placement. He believed Helicosporidium to be a protozoan, but it did not neatly fit into the classification scheme of the day, and he concluded that his isolate differed markedly from all other protozoa known at that time. A decade later, Kudo [25] used Keilin’s descriptions to place Helicosporidia in its own order within the Cnidosporidia (a group which, at that time, included Microsporidia).

Forty years would pass before the taxonomic position of Helicosporidium was revisited. In 1970, Weiser [16] described Helicosporidium isolated from a wound in a larval moth.
Weiser thought that the route of infection must be through wounds in the cuticle, which is the mode of infection seen in most insect-infecting fungi. In addition, he argued that the spore stage was very different from anything found among the Protozoa, but that it had some similarities with lower fungi. In particular, he thought that the filamentous cell resembled the needle-shaped ascospores of some fungi, and proposed that Helicosporidia be classified among primitive Ascomycetes. Some years later, Kellen and Lindegren’s studies [2, 3] showed that Helicosporidium is transmitted primarily through ingestion of cysts, but they suggested that Helicosporidia were as likely to be primitive fungi as protists. However, ultrastructure studies in 1976 led Lindegren and Hoffman [26] to suggest that Helicosporidia be re-classified within the Protozoa. For the next 25 years, the only studies involving Helicosporidium were reports documenting its presence in new host species, which generally avoided any discussion of classification [e.g., 13, 17, 20-22].

Recently, interest in Helicosporidium has resurfaced. Boucias et al. [4] described the developmental features of Helicosporidia isolated from a blackfly, and pointed out that aspects of vegetative cell development and morphology were reminiscent of the achlorophyllic Trebouxiophyte green algae of the genus Prototheca. However, none of the described species of Prototheca are known to produce a filamentous cell-containing cyst, which is diagnostic of Helicosporidium. Like Helicosporidia, Protothecans are known to be associated with aquatic environments and are parasites of animals [27], although Helicosporidia only infect invertebrates, while Protothecans are vertebrate parasites. The first molecular study of Helicosporidium confirmed the suggested relationship to green algae (Chlorophyta).
Tartar et al. [28] showed that phylogenies constructed from nucleotide sequences of two protein-coding genes, actin and beta-tubulin, as well as genes encoding nuclear small and large subunit ribosomal RNA (SSU and LSU rRNA), all supported a green algal affiliation for Helicosporidium. Additionally, analysis of SSU rRNA supported a specific relationship of Helicosporidium with the trebouxiophyte group of green algae. Finally, recent phylogenetic studies undertaken to address relationships within the genus Prototheca using LSU and SSU have shown that the genera Helicosporidium, Auxenochlorella and Prototheca form a monophyletic group within the trebouxiophytes [29].

Research Objectives

Helicosporidia are an enigmatic group of parasites with a unique lifecycle, infection strategy, and evolutionary history. Aspects of their biology, such as host range and their effectiveness as insect pathogens, have been studied in some depth because of interest in the possibility of using them as a biocontrol agent for mosquitoes [15, 22, 30, 31]. However, Helicosporidia are poorly characterized at the molecular level and there are unanswered questions surrounding their evolutionary origins from an alga. Helicosporidium and its presumed sister genus Prototheca are the only known green algal parasites of animals, and Helicosporidium is one of only a few eukaryotic parasites that replicate intracellularly. My first research objective was to address the almost complete lack of molecular data from Helicosporidium using genomics approaches. I generated sequence databases for Helicosporidium using three approaches: expressed genes were surveyed using expressed sequence tags (ESTs), the nuclear genome was surveyed using a genome sequence survey, and the plastid genome was completely sequenced.
Each of these is an efficient way to generate a wealth of sequence data for protein-coding genes, and I have used these data to examine several questions relating to the algal origin of Helicosporidium and its adaptation to parasitism.

Origin of Helicosporidium

I first examined a possible phylogenetic relationship between Helicosporidium and trebouxiophyte green algae using EST data. This relationship had been suggested based on analyses of only a few genes, so I sought to determine if it would be supported once a large number of genes were analyzed. From EST data, I identified all members of a large class of highly expressed proteins, ribosomal proteins, and inferred phylogenetic trees for all these genes to determine the frequency and robustness of this relationship from substantially more data than had been used previously. In addition, I examined the functional distribution of known genes among the database of expressed sequences, the distribution of EST abundance, and the prevalence of previously unknown gene sequences.

The function of a plastid in Helicosporidium

Plastids are the organelles of plants and algae that house photosynthesis and a number of other biochemical pathways. Plastids contain a small genome, but most of their proteins are encoded in the nucleus and post-translationally targeted to the organelle [32]. When plants and algae lose photosynthesis, a highly reduced “cryptic” plastid is often retained. Cryptic plastids are known to exist in a number of organisms, although their metabolic functions are seldom understood [33]. The best-studied example of a cryptic plastid is from the intracellular malaria parasite, Plasmodium, which has retained a plastid for the biosynthesis of fatty acids, isoprenoids, and heme, using plastid-targeted enzymes [34].
When I began this study there was no direct evidence for a plastid in Helicosporidium, but based on other algae-turned-parasites there was good reason to suspect it would have been retained (and data confirming this emerged as my work progressed; see below). To study the transformation of a photosynthetic plastid to a cryptic plastid in a system that is completely independent of the well-studied Apicomplexa, I used the EST data to identify a suite of genes for probable plastid-targeted proteins in Helicosporidium, and used these to determine a range of functions that may have been retained by a cryptic plastid. Based on phylogenetic relationships to other plastid homologues and the presence of N-terminal sequences with characteristics expected for the plastid-targeting transit peptides, I have identified 20 putatively plastid-targeted enzymes involved in a wide variety of metabolic pathways. These include some genes involved in housekeeping functions such as transcription and translation, genes that implicate the plastid as a site for biosynthesis of fatty acids, amino acids, tetrapyrroles, and isoprenoids, and genes for proteins involved in sulfur metabolism.

I also compared the inferred metabolic pathways in the Helicosporidium plastid with those associated with cryptic plastids of Apicomplexa. While Apicomplexa and Helicosporidia are both endoparasitic, they are only distantly related and the reduction of their plastids therefore represents independent evolutionary events. I expected that retention and loss of particular plastid functions would have occurred differentially in these groups during their descent from photoautotrophic ancestors, and indeed I identified several plastid pathways in Helicosporidium that are absent in the well-studied Apicomplexa. However, the two parasites also retain several pathways in common and, not surprisingly, some typical plastid functions such as photosynthesis have been lost in both groups.
These represent instances of convergent evolution associated with their shared lifestyle as parasites.

The Helicosporidium plastid genome

Although Helicosporidia are colourless and no plastid-like structure has yet been identified in any member of the group, their phylogenetic affiliation with green algae and the data presented in Chapter 2 of this thesis strongly suggest that a cryptic organelle is present. Indeed, during the course of my work, a small fragment of plastid DNA was identified in Helicosporidium [35]. This implies that the cryptic organelle retained a genome, but how the genome adapted to the transition to a parasitic way of life was not known. Loss of photosynthesis has occurred independently in many plant and algal lineages [e.g., 36, 37], and represents a major metabolic shift with potential consequences for the content and structure of plastid genomes. Based on observations from other non-photosynthetic organisms with cryptic plastids, plastid genome reduction appears to take place both through outright loss of genes and the relocation of organellar genes to the nuclear genome with concomitant targeting of the gene products back to the plastid (as described in Chapter 2). In the Apicomplexa, the genes that remain in the genome are also organized in unusual, and often very compact, ways [38].

To investigate the transformation of the Helicosporidium plastid at the genomic level, I completely sequenced the plastid genome. In most plastid genomes, the great majority of genes encode products involved in either gene expression or photosynthesis. When photosynthesis is lost, so are most or all of the related genes, leading to dramatic changes in plastid genome size, coding capacity, and often also structure. Plastids in non-photosynthetic organisms provide an opportunity to study the effects of massive genomic changes following the loss of a major metabolic function.
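One common way to look at structural change in a small circular genome is GC skew, which Figure 4.2 of this thesis plots for the Helicosporidium plastid genome. The following is only a minimal Python sketch of the general calculation, not the procedure used in the thesis: the function names, the window and step sizes, and the toy sequence are all assumptions made for illustration. Skew is computed per window as (G - C) / (G + C), and the running sum of window skews tends to peak or trough where the compositional bias of the strand switches sign.

```python
def gc_skew(seq, window=1000, step=1000):
    """Return (window_start, skew) pairs, where skew = (G - C) / (G + C).

    `window` and `step` are illustrative defaults, not values from the thesis.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C are given a skew of 0 to avoid dividing by zero.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        skews.append((start, skew))
    return skews


def cumulative_skew(skews):
    """Running sum of window skews; its extremes mark where the bias flips."""
    total = 0.0
    out = []
    for start, skew in skews:
        total += skew
        out.append((start, total))
    return out


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "GGGGA" * 4 + "CCCCA" * 4
windows = gc_skew(toy, window=10, step=10)
running = cumulative_skew(windows)
print(windows)
print(running)
```

In the toy sequence the running sum rises across the G-rich half and falls across the C-rich half, so its maximum marks the point where the bias changes. A real analysis would read the genome from a sequence file and choose window sizes suited to a genome of tens of kilobases.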
In the case of Helicosporidium, I found its plastid genome to be among the smallest known, being only marginally larger than the apicomplexan apicoplasts. Like Apicomplexa, the Helicosporidium genome is also highly compacted in a structured way, so that genes are efficiently organized with a strong strand bias that likely reflects the direction of DNA replication. This contributes significantly to our understanding of the evolution of plastid DNA because it illustrates the highly ordered reduction that can occur following the loss of a major metabolic function, such as photosynthesis.

References

1. Keilin D: On the life history of Helicosporidium parasiticum n. g., n. sp., a new species of protist parasite in the larvae of Dashelaea obscura Winn (Diptera: Ceratopogonidae) and in some arthropods. Parasitology 1921, 13(2):97-113.
2. Kellen WR, Lindegren JE: New host records for Helicosporidium parasiticum. J Invertebr Pathol 1973, 22:296-297.
3. Kellen WR, Lindegren JE: Life cycle of Helicosporidium parasiticum in the navel orangeworm, Paramyelois transitella. J Invertebr Pathol 1974, 23(2):202-208.
4. Boucias DG, Becnel JJ, White SE, Bott M: In vivo and in vitro development of the protist Helicosporidium sp. J Eukaryot Microbiol 2001, 48(4):460-470.
5. Blaske-Lietze VU, Boucias DG: Pathogenesis of Helicosporidium sp. (Chlorophyta: Trebouxiophyceae) in susceptible noctuid larvae. J Invertebr Pathol 2005, 90(3):161-168.
6. Blaske-Lietze VU, Shapiro AM, Denton JS, Botts M, Becnel JJ, Boucias DG: Development of the insect pathogenic alga Helicosporidium. J Eukaryot Microbiol 2006, 53(3):165-176.
7. Blaske VU, Boucias DG: Influence of Helicosporidium spp. (Chlorophyta: Trebouxiophyceae) infection on the development and survival of three noctuid species. Env Entomol 2004, 33(1):54-61.
8. Lindegren JE, Okumura GT: Pathogens from economically important nitidulid beetles. In: USDA ARS. vol. W-9: USDA; 1973: 1-7.
9. Purrini K: Malamoeba scolyti sp. n. (Amoebidae, Rhizopoda, Protozoa) parasitizing the bark beetles, Dryocoetes autographus Ratz., and Hylurgops palliatus Gyll. (Scolytidae, Coleoptera). Arch Protistenkd 1980, 123(3):358-366.
10. Purrini K: On disease agents of insect pests of wild palms and forests in Tanzania. J Appl Entomol 1985, 99(3):237-240.
11. Yaman M, Radek R: Helicosporidium infection of the great European spruce bark beetle, Dendroctonus micans (Coleoptera: Scolytidae). Eur J Protistol 2005, 41(3):203-207.
12. Chapman HC: Biological control of mosquito larvae. Annu Rev Entomol 1974, 19:33-59.
13. Hembree SC: Preliminary report of some mosquito pathogens from Thailand. Mosq News 1979, 39(3):575-581.
14. Fukuda T, Lindegren JE, Chapman HC: Helicosporidium sp. a new parasite of mosquitoes. Mosq News 1976, 36(4):514-517.
15. Seif AI, Rifaat MM: Laboratory evaluation of a Helicosporidium sp. (Protozoa: Helicosporida) as an agent for the microbial control of mosquitoes. J Egypt Soc Parasitol 2001, 31(1):21-35.
16. Weiser J: Helicosporidium parasiticum Keilin infection in the caterpillar of a hepialid moth in Argentina. J Protozool 1970, 17(3):436-440.
17. Purrini K: Light and electron microscope studies on Helicosporidium sp. parasitizing oribatid mites (Oribatei, Acarina) and collembola (Apterygota, Insecta) in forest soils. J Invertebr Pathol 1984, 44(1):18-27.
18. Avery SW, Undeen AH: The isolation of microsporidia and other pathogens from concentrated ditch water. J Am Mosq Control Assoc 1987, 3(1):54-58.
19. Purrini K: Studies on some ameobas (Ameobida) and Helicosporidium parasiticum (Helicosporida) infecting moss mites (Oribatei, Acarina) in forest soil samples. Arch Protistenkd 1981, 124(3):303-311.
20. Sayre RM, Clark TB: Daphnia magna (Cladocera: Chydoroidea) - new host of a Helicosporidium sp. (Protozoa: Helicosporida). J Invertebr Pathol 1978, 31(2):260-261.
21. Pekkarinen M: Bucephalid trematode sporocysts in brackish water Mytilus edulis, new host of a Helicosporidium sp. (Protozoa: Helicosporida). J Invertebr Pathol 1993, 61(2):214-216.
22. Avery SW, Undeen AH: Some characteristics of a new isolate of Helicosporidium and its effect upon mosquitoes. J Invertebr Pathol 1987, 49(3):246-251.
23. Tartar A, Boucias DG, Becnel JJ, Adams BJ: Comparison of plastid 16S rRNA (rrn 16) genes from Helicosporidium spp.: evidence supporting the reclassification of Helicosporidia as green algae (Chlorophyta). Int J Syst Evol Micr 2003, 53:1719-1723.
24. Conklin T, Blaske-Lietze VU, Becnel JJ, Boucias DG: Infectivity of two isolates of Helicosporidium spp. (Chlorophyta: Trebouxiophyceae) in heterologous host insects. Fla Entomol 2005, 88(4):431-440.
25. Kudo RR: Handbook of Protozoology, 6th edn. Springfield, IL.: C.C. Thomas; 1931.
26. Lindegren JE, Hoffmann DF: Ultrastructure of some developmental stages of Helicosporidium sp. in Navel Orangeworm, Paramyelois transitella. J Invertebr Pathol 1976, 27(1):105-113.
27. Pore RS, Barnett EA, Barnes WC, Jr., Walker JD: Prototheca ecology. Mycopathologia 1983, 81(1):49-62.
28. Tartar A, Boucias DG, Adams BJ, Becnel JJ: Phylogenetic analysis identifies the invertebrate pathogen Helicosporidium sp. as a green alga (Chlorophyta). Int J Syst Evol Micr 2002, 52:273-279.
29. Ueno R, Hanagata N, Urano N, Suzuki M: Molecular phylogeny and phenotypic variation in the heterotrophic green algal genus Prototheca (Trebouxiophyceae, Chlorophyta). J Phycol 2005, 41(6):1268-1280.
30. Hembree SC: Evaluation of the microbial control potential of a Helicosporidium sp. (Protozoa, Helicosporida) from Aedes aegypti and Culex quinquefasciatus from Thailand. Mosq News 1981, 41(4):770-783.
31. Kim SS, Avery SW: Effects of Helicosporidium sp. infection on larval mortality, adult longevity, and fecundity of Culex salinarius Coq. Korean J Entomol 1986, 16:153-156.
32. McFadden GI: Primary and secondary endosymbiosis and the origin of plastids. J Phycol 2001, 37(6):951-959.
33. Williams BA, Keeling PJ: Cryptic organelles in parasitic protists and fungi. Adv Parasitol 2003, 54:9-68.
34. Ralph SA, van Dooren GG, Waller RF, Crawford MJ, Fraunholz MJ, Foth BJ, Tonkin CJ, Roos DS, McFadden GI: Metabolic maps and functions of the Plasmodium falciparum apicoplast. Nat Rev Microbiol 2004, 2(3):203-216.
35. Tartar A, Boucias DG: The non-photosynthetic, pathogenic green alga Helicosporidium sp. has retained a modified, functional plastid genome. FEMS Microbiol Lett 2004, 233(1):153-157.
36. Wolfe KH, Morden CW, Palmer JD: Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proc Natl Acad Sci USA 1992, 89(22):10648-10652.
37. Gockel G, Hachtel W: Complete gene map of the plastid genome of the nonphotosynthetic euglenoid flagellate Astasia longa. Protist 2000, 151(4):347-351.
38. Wilson RJ, Rangachari K, Saldanha JW, Rickman L, Buxton RS, Eccleston JF: Parasite plastids: maintenance and functions. Philos Trans R Soc Lond, B, Biol Sci 2003, 358(1429):155-162.

Chapter 2 - Expressed sequence tag (EST) survey of the highly adapted green algal parasite, Helicosporidium.∗

Introduction

The Helicosporidia are an enigmatic group of obligate pathogens that are found in a variety of insect hosts. The defining feature of the group is the infective cyst stage. Cysts consist of a pellicle surrounding three internal and relatively undifferentiated cells, around which is wrapped a long and highly differentiated helical cell with tapered and barbed ends. When the cysts burst in the gut lumen of insects, the helical cell is expelled. It pierces the gut epithelial cells and sticks, apparently aided by the barbed ends. The helical cell then migrates through the epithelium wall and into the host hemolymph where it differentiates into the vegetative stage [1]. The vegetative stage undergoes a 2-4 cell asporogenic division with both cell division and daughter cell wall formation occurring within the mother cell.
By 7-12 days post-infection, vegetative cells completely fill the hemocoel and begin to differentiate into the cyst form. It is likely that the cysts are released into the environment upon the death of the host or possibly transovum transmitted by infected females [2].

The evolutionary origin of Helicosporidia has been unclear since their first discovery. The parasites are highly adapted and do not closely resemble any other group of eukaryotes, and they have therefore been excluded from most large taxonomic schemes of eukaryotes. When they are considered, they have been placed in various phylogenetic positions, including protozoans [3, 4] and fungi [5, 6].

∗ A version of this chapter has been published. de Koning AP, Tartar A, Boucias DG, Keeling PJ: Expressed Sequence Tag (EST) survey of the highly adapted green algal parasite, Helicosporidium. Protist 2005, 156(2):181-190.

Recently, the study of Helicosporidia has been greatly advanced by the ability to grow Helicosporidium sp. axenically in vitro [1]. This has allowed the parasite to be purified in substantial quantities, and led to the first molecular data from Helicosporidia. These data provided a very different and unexpected picture of the evolution of this group: analysis of nuclear small subunit ribosomal RNA (SSU rRNA), actin and beta-tubulin all showed Helicosporidium to be a member of the green algae, and specifically related to the opportunistic parasite Prototheca [7], which is consistent with some features of the vegetative stage of Helicosporidium [1]. The green algal origin of Helicosporidium has recently been supported by phylogenies based on plastid SSU rRNA [8], elongation factor Tu [9], the EFL protein [10], and numerous plastid-targeted proteins [11]. We have conducted an expressed sequence tag (EST) project on the blackfly (Simulium jonesii) isolate of Helicosporidium sp. to accelerate gene discovery in this enigmatic group of pathogens.
Prior to the initiation of this survey, the only molecular data from the group were nuclear and plastid SSU rRNA, actin, beta-tubulin, and a fragment of the plastid genome. We sequenced approximately 1,400 cDNA clones resulting in approximately 700 unique sequences, increasing the available molecular data from the group approximately 100-fold. The survey has already led to descriptions of the Helicosporidium EFL homologue [10] and 20 cDNAs for plastid-targeted proteins [11]. To further demonstrate the potential utility of these data, we reanalyze the phylogenetic position of Helicosporidium here using the ribosomal proteins. We identified 69 nuclear ribosomal proteins and conducted phylogenetic analyses on each protein individually. Phylogenetic evidence for the green algal origin of Helicosporidium and its close relationship to Prototheca is currently based on a small number of genes, each of which provides strong support. The analyses of ribosomal proteins and the ESTs in general complement these data by providing broad and consistent support for this conclusion based on results from a large number of genes.

Methods

Strains, cultivation, and library construction

The Helicosporidium sp. (strain ATCC 50920) isolated from the black fly Simulium jonesii [1] was propagated on artificial media (TC-100 insect medium supplemented with 5% fetal calf serum) containing gentamycin (50 µg/ml) and incubated at 26 °C. Vegetative cells were inspected for purity by light microscopy and collected by low-speed centrifugation, re-suspended in 10 ml of TriReagent (Sigma) plus glass beads (0.45 mm), and broken using a Braun MSK homogenizer. Following cell breakage, total RNA was extracted following the TriReagent manufacturer's protocol. An aliquot of this total RNA was used to isolate polyA mRNA, using the Oligotex mRNA purification kit (Qiagen). The cDNA library was prepared in the Uni-ZAP XR plasmid using the ZAP-cDNA synthesis kit (Stratagene).
Following the manufacturer’s protocol, the cDNAs were ligated directionally into the Uni-ZAP XR vector, and the ligation reaction products were packaged using the Gigapack III Gold packaging extract. The library was titered, amplified, and mass-excised, converting phage into the pBluescript phagemid.

Expressed sequence tag sequencing and analysis

Colonies from the mass-excised library were selected at random and plasmid DNA was isolated from 1,536 clones. All clones were sequenced from the 5’ end using dye-terminator chemistry. Trace files were vector- and quality-trimmed, and sequences greater than 50 bp were clustered using the Protist EST Program database, PEPdb (http://megasun.bch.umontreal.ca/pepdb/pepdb.html). All clusters were used to search GenBank using tBLASTx and internally using BLASTn. Clusters were also subjected to automatic annotation using AutoFACT (http://megasun.bch.umontreal.ca/Software/AutoFACT.htm). All EST sequences with annotation have been deposited in the public database PEPdbPUB (http://amoebidia.bcm.umontreal.ca/public/pepdb/agrm.php), and NCBI dbEST (accessions CX128248-CX129443). Ribosomal proteins were identified based on the automatic annotation results. Prototheca homologues were identified by the same procedure or by similarity to Helicosporidium homologues. All Helicosporidium clusters and Prototheca sequences [12] (available at: http://amoebidia.bcm.umontreal.ca/public/pepdb/welcome.php) corresponding to putative ribosomal proteins were translated and added to amino acid multiple sequence alignments that included a broad diversity of eukaryotic 60S and 40S proteins (generally between 30 and 50 taxa in total). In all cases, alignments included representative animals, fungi and land plants, and whatever protists and algae were available in public databases.
In cases where no green algal sequence existed in GenBank, the Chlamydomonas reinhardtii sequence was assembled from EST or genomic data and added to the alignment so that all alignments included representatives of both land plants and green algae. Phylogenetic trees (with support estimated using 100 bootstrap replicates) were inferred from all alignments using PhyML 2.3 [13] with the JTT substitution matrix and site-to-site rate variation modeled on a discrete gamma distribution with four variable rate categories and the shape parameter alpha estimated from the data. For the four genes where the phylogenies are shown, distance trees and bootstraps were also inferred: distances were calculated with TREE-PUZZLE 5.2 [14] using the WAG substitution frequency matrix and a gamma distribution with eight rate categories and invariable sites, with the alpha parameter and proportion of invariable sites estimated from the data, and trees were constructed using WEIGHBOR 1.2 [15]. Distance bootstraps were conducted in the same way using the shell script PUZZLEBOOT (A. Roger and M. Holder; www.tree-puzzle.de).

Results and Discussion

EST sequencing

An axenic culture of vegetative Helicosporidium cells was harvested, and a directionally cloned cDNA library was constructed from poly-A purified mRNA. From a mass excision, 1,536 clones were isolated and sequenced from the 5’ end, resulting in 1,432 readable sequences and 1,188 sequences passing quality checks and vector trimming. These sequences were assembled into clusters of homologous sequences, resulting in 700 clusters. Six clusters were found to correspond to Escherichia coli contaminants (each represented by a single EST), which we interpret to have arisen during the library construction, since this Helicosporidium sp. was cultured in the presence of gentamycin and no contaminant was observed by microscopy prior to RNA isolation. These clusters were removed and are not considered further, leaving 694 unique clusters.
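The read and cluster counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch and was not part of the original analysis; the variable names are illustrative, and the numbers are those quoted in the text.

```python
# Counts quoted in the text for the Helicosporidium EST survey.
clones_sequenced = 1536      # clones isolated and sequenced from the 5' end
readable = 1432              # readable sequences
passed_qc = 1188             # sequences passing quality checks and vector trimming
clusters = 700               # clusters after assembly
contaminant_clusters = 6     # clusters attributed to Escherichia coli

# Clusters retained once the contaminant clusters are set aside.
unique_clusters = clusters - contaminant_clusters

# Fraction of sequenced clones surviving each step.
readable_fraction = readable / clones_sequenced
passed_fraction = passed_qc / clones_sequenced

# Unique clusters recovered per quality-filtered EST.
clusters_per_est = unique_clusters / passed_qc

print(f"unique clusters: {unique_clusters}")
print(f"readable: {readable_fraction:.1%}, passed QC: {passed_fraction:.1%}")
print(f"unique clusters per EST: {clusters_per_est:.2f}")
```

With these numbers, a little over three-quarters of the sequenced clones yield usable ESTs, and the survey recovers somewhat more than one unique cluster for every two ESTs.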
The method used to generate clones for EST sequencing is not strictly quantitative, but the representation of ESTs in samples such as this bears some qualitative relationship to the expression levels of genes. Certain genes can be over- or under-represented, but the overall trends do convey some sense of expression patterns. This should be interpreted with some caution, but the information is useful for gauging the effectiveness of the sampling for gene discovery. From the distribution of the number of ESTs in each cluster (not shown), it is apparent that the majority of ESTs are present in low copy number, while a relatively small number of sequences are over-represented. Almost three-quarters of the clusters were represented by a single EST, while the most highly represented cluster consisted of 49 ESTs, 2.5 times that of the next largest cluster of 20. The large variation in EST number likely corresponds to high expression levels of the most extreme outliers. Overall, the distribution of EST number per cluster shows that the approach was very favourable for gene discovery, since the ultimate rate of new gene discovery was roughly one unique sequence per two ESTs.

Distribution of functional classes of expressed genes

The consensus sequences of the 694 unique clusters were compared with public sequence and motif databases using PEPdb (http://megasun.bch.umontreal.ca/pepdb/pepdb.html) to identify potentially homologous sequences (Figure 2.1). A large proportion of clusters (299 or 43%) were not detectably similar to any known protein, and a further five clusters were only found to contain similarity to known domains. An additional 57 clusters were found to be similar to hypothetical proteins in other organisms: 8 and 16 being hypothetical proteins with and without known domains respectively, and another 33 being similar only to other ESTs from other organisms.
Six clusters corresponded to fragments of the ribosomal RNA operon (individual inspection showed these to be non-overlapping fragments corresponding to the SSU, ITS region, and LSU). The remaining 327 clusters (47%) were predicted to be known genes according to the annotation protocol.

Figure 2.1 - Distribution of clusters by similarity to known genes
For each fraction, the label includes the number of clusters (left), the designation of the cluster (centre) and the percent of clusters in this fraction (right). Designations (clockwise from the top) are defined as follows. “Annotated Protein” clusters are similar to sequences of annotated genes in public databases. “rRNA” clusters are similar to ribosomal RNA. “Hypothetical Protein” clusters are similar to proteins of unknown function in other organisms. “Unannotated EST” clusters are similar to EST sequences of unknown function from other organisms in public databases. “Hypothetical Domain-Containing” clusters are similar to hypothetical proteins, but also contain a known functional domain. “Domain-Containing” clusters contain known functional domains but are otherwise not similar to any known protein. “Unclassified” clusters are those with no significant similarity to any sequence in extant databases.

Figure 2.2 - Distribution of 163 annotated protein clusters to functional groups by COG categories
For each fraction, the labels include the number of clusters (left), the COG category name (centre) and the percent of clusters in this fraction (right).

The functional distribution of the 327 clusters matching annotated proteins was examined by classifying clusters by COG (clusters of orthologous groups [16]) categories (Figure 2.2). Over half of the 163 clusters that may be classified into COG categories are related to translation: many of these encode ribosomal proteins, which represent the single largest class of genes found (see below).
The next largest class of proteins (at 11%) is related to protein modification and turnover, followed by energy production (6%), amino acid metabolism (5%), coenzyme metabolism (4%) and transcription (4%). Many other categories are represented by a single cluster.

Novel genes in Helicosporidium

At 43%, the proportion of Helicosporidium clusters that were not detectably similar to any known gene is not outside the range typically expected when a large sample is acquired from a eukaryotic genome, but there are a few interesting characteristics of this class of genes that deserve note. Helicosporidia are obligate parasites with a highly specialized infection strategy, but they evolved from a free-living, photosynthetic green alga. One aspect of this transition that has been studied to date is the fate of its plastid. There is now evidence for a plastid genome and a variety of proteins targeted to the organelle, suggesting very strongly that it has been retained in a cryptic form for metabolic pathways other than photosynthesis. Other significant adaptations to parasitism in the Helicosporidia likely involve some of the genes of unknown function, so it is interesting to note that many of the most abundant ESTs are unclassified (Table 2.1). In fact, six of the 10 most highly represented clusters are unclassified, including the most highly represented one, with 49 ESTs. Moreover, of the clusters with more than five ESTs, all but three are either unclassified or related to translation. The three exceptions include a member of the HSP20 family (a ubiquitous family of small proteins with a variety of functions), a putative homologue of a plant cell-wall protein, and a putative repressor. Proteins encoded by the unclassified clusters would be interesting candidates for functional investigation as they may represent surface proteins or some other abundant protein of interest related to a parasitic lifestyle.
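The 43% figure follows directly from the cluster counts given under "Distribution of functional classes of expressed genes". As a minimal Python sketch (illustrative only, not part of the original analysis; the dictionary keys follow the designations used in the Figure 2.1 caption), the tally can be reproduced as:

```python
# Cluster counts per similarity class, as quoted in the text.
cluster_counts = {
    "Annotated Protein": 327,
    "Unclassified": 299,
    "Unannotated EST": 33,
    "Hypothetical Protein": 16,            # hypothetical, no known domain
    "Hypothetical Domain-Containing": 8,   # hypothetical, with a known domain
    "rRNA": 6,
    "Domain-Containing": 5,
}

# Total number of clusters across all classes.
total = sum(cluster_counts.values())

# Share of each class, rounded to a whole percent.
percent = {name: round(100 * n / total) for name, n in cluster_counts.items()}

for name, n in cluster_counts.items():
    print(f"{name:32s}{n:5d}{percent[name]:5d}%")
print(f"{'Total':32s}{total:5d}")
```

The seven classes account for all 694 unique clusters, with unclassified and annotated clusters making up 43% and 47% of the total respectively.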
ESTs that are not detectably similar to any known gene sequence are normally interpreted as representing genes that are relatively recent inventions or are evolving sufficiently rapidly to be beyond detection. In either case, if a closely related species were sampled one would expect to find some of these “unknown” genes in that close relative. Indeed, when Helicosporidium ESTs were compared with 3,943 EST sequences from its closest known relative (see below), the trebouxiophyte green alga Prototheca wickerhamii [12], we found 12 clusters with a recognizable match in P. wickerhamii but no significant similarity to a sequence in any other organism. These genes may be interesting cases to study the origin of parasitism in the ancestors of these genera.

Table 2.1 - Functional classification of clusters with more than five ESTs

Rank  Identity of cluster                                Number of ESTs
1     unclassified                                       49
2     unclassified                                       20
3     unclassified                                       18
4     unclassified                                       14
5     HSP20 Family                                       14
6     unclassified                                       11
7     Cellular repressor of E1A-stimulated genes CREG    10
8     40S ribosomal protein S23                          9
9     unclassified                                       8
10    Putative cell wall protein FLO11p                  8
11    40S ribosomal protein S16                          7
12    60S acidic ribosomal protein P1                    7
13    unclassified                                       7
14    40S Ribosomal protein S27                          6
15    40S ribosomal protein S19                          6
16    40S ribosomal protein S24                          6
17    EFL (EF-like protein)                              6

Phylogeny of Helicosporidium genes

The phylogenetic history of Helicosporidia has not been obvious from the initial observations of these parasites. They have, at various times, been allied with the Cnidospora (apicomplexa, microsporidia and other parasites), or lower fungi [5, 6, 17]. The first molecular data from Helicosporidium were something of a surprise, therefore, as phylogenies of these genes showed a relationship to green algae. This conclusion has been supported by nuclear and plastid SSU rRNA genes, as well as actin, beta-tubulin, EFL and plastid elongation factor Tu.
Moreover, most analyses (with the exception of EFL) suggest that the closest relative of Helicosporidium is the opportunistic parasite Prototheca [7, 8, 11]. We therefore examined the EST data to see if there is uniform support from a large number of genes for this conclusion, which has so far been based on strong evidence from a few genes. At the broadest level, the top BLAST hit to virtually every EST cluster is either a green alga or a plant, so we selected a class of genes to examine as a whole.

Ribosomal proteins are an ideal class of proteins to use for phylogeny because they are generally highly conserved, and are highly expressed and therefore abundant in EST samples. We identified all clusters encoding putative ribosomal proteins from Helicosporidium, resulting in a set of 69 proteins. We then identified homologues for 65 of these in the Prototheca EST data [12], and conducted ML phylogenetic and bootstrap analyses on all 69 proteins. Representative trees from both small and large subunits are shown in Figure 2.3, and summaries of the results are shown in Tables 2.2 and 2.3.

Figure 2.3 - Protein maximum likelihood phylogenies of ribosomal proteins
For SSU proteins (A) S23 and (B) S3, and LSU proteins (C) L10 and (D) L31, green algae are indicated by a shaded box while other major groups are named and bracketed to the right. Numbers at nodes correspond to bootstrap support from ML (top or left) and distance (bottom or right). Dashes indicate support less than 50% and numbers are only shown for nodes relevant to Helicosporidium or supporting other major groups.

Table 2.2 - Summary of phylogenetic analyses of SSU ribosomal proteins
Columns: SSU protein; taxa/characters; (1) ML bootstrap % for clade containing Helicosporidium & green algae or Helicosporidium & green algae & plants (a); (2) ML bootstrap % for clade containing Helicosporidium & Prototheca (b).

Protein      Taxa/Characters  (1)    (2)
S2           39/159           36c    84
S3           30/203           64c    97
S3A          40/230           29     58
S5           38/138           36     96
S6           33/136           82     -
S8           35/168           41     70
S9           34/92            29     69
S10          39/90            -      9
S11          37/138           89     91
S13          40/139           -      85
S14          39/118           -      89
S15          45/74            20c    86
S15A         53/123           70     98
S16          34/123           -      99
S19          39/137           38     99
S20          29/101           -      92
S21          37/71            43     54
S23          46/137           67     94
S24          34/122           46c    NA
S25          28/87            53     -
S26          31/115           70     -
S27          37/78            27c    61
S28          28/61            83c    -
S29          28/54            -      6
S30          25/56            47     80
SA           37/103           80c    -

Table 2.3 - Summary of phylogenetic analyses of LSU ribosomal proteins
Columns: LSU protein; taxa/characters; (1) ML bootstrap % for clade containing Helicosporidium & green algae or Helicosporidium & green algae & plants (a); (2) ML bootstrap % for clade containing Helicosporidium & Prototheca (b).

Protein      Taxa/Characters  (1)    (2)
L4B          36/183           77     95
L5           37/210           89     99
L6           39/98            -      -
L7           35/180           55c    70
L7A          33/188           70     54
L8 (L2)      39/176           73     64
L9           40/146           32     39
L10          47/119           58     95
L11          41/163           71     92
L13          27/191           89     NA
L14          33/116           81     -
L15          39/153           -      86
L17          37/134           52c    -
L18          37/164           71     47
L18A         36/151           77     47
L19          37/146           95     62
L21          36/148           26c    NA
L22          35/95            58c    42
L23          43/132           54     95
L23A         37/128           58c    80
L24          40/99            78     99
L26          33/119           21     34
L27          40/113           86     75
L27A         40/102           54     49
L29          27/53            -      32
L30          39/107           31     66
L31          33/84            85     88
L32          35/123           67     79
L34          36/107           57c    91
L35          38/105           -      52
L35A         34/94            -      62
L36          36/76            78c    70
L37          34/81            88     -
L37A         38/83            50     -
L38          30/68            24     41
L39          19/51            51     NA
L40 (CEP52)  42/121           19     47
L44          39/96            91     79
P0           41/86            44     -
P1           38/78            46     88
P2           45/66            -      94

Notes to Tables 2.2 and 2.3:
a Dashes (-) indicate this relationship was not observed in the bootstrap tree.
b NA indicates that no Prototheca data were available for comparison.
c In these cases the green algae were not monophyletic, but the green algae plus plants were monophyletic, so the support for the green algae and plants collectively is reported.

The four genes shown in Figure 2.3 all support the sister relationship between Helicosporidium and Prototheca relatively strongly (bootstrap support ranging from 81 to 97%). Some support the relationship of Helicosporidium and Prototheca to other green algae specifically (e.g., S23 and L31) and others only support a weak relationship of these to green algae and land plants together (e.g., S3). The trees in Figure 2.3 are only intended to serve as examples; the important information comes from the overall view of all 69 phylogenies. Of the 27 small subunit proteins identified (Table 2.2), 19 placed Helicosporidium and Prototheca as sisters (14 with support over 80%). In one case no Prototheca data exist and in five others they were not sisters, but in every one of these six cases, Helicosporidium grouped with the other green algae. Similarly, 42 large subunit proteins were analyzed (Table 2.3) and 33 of these placed Helicosporidium as sister to Prototheca (13 with support over 80%).
Again, of the remaining nine genes, no Prototheca data were available for three, but in only a single case was the Helicosporidium gene not related to green algal homologues (this case is L6, where Helicosporidium was related to red algal homologues). Overall, 80% of the relevant phylogenies (52 out of 65 genes) showed a specific relationship between Helicosporidium and Prototheca, and 98% (68 out of 69) showed a relationship of Helicosporidium to either Prototheca or other green algae.  Concluding remarks In recent years, our understanding of Helicosporidia has been transformed by the application of molecular methods to the group. What were not long ago regarded as enigmatic parasites of unknown origin, are now interpreted as highly modified trebouxiophyte green algae. This is a remarkable evolutionary transformation supported by virtually all data, as exemplified by the large proportion of ribosomal proteins that show such a relationship. Harder questions have yet to be addressed; in particular, what molecular changes are involved with the evolution of a parasite from a free-living, photosynthetic green alga (probably though a constitutively parasitic form like modern Prototheca). The answers to such questions likely lie in the most difficult proteins to study - those with no readily identifiable homologues - and many such candidates have now been identified.  28 References 1. Boucias DG, Becnel JJ, White SE, Bott M: In vivo and in vitro development of the protist Helicosporidium sp. J Eukaryot Microbiol 2001, 48(4):460-470. 2. Blaske VU, Boucias DG: Influence of Helicosporidium spp. (Chlorophyta : Trebouxiophyceae) infection on the development and survival of three noctuid species. Env Entomol 2004, 33(1):54-61. 3. Keilin D: On the life history of Helicosporidium parasiticum n. g., n. sp., a new species of protist parasite in the larvae of Dashelaea obscura Winn (Diptera:Ceratopogonidae) and in some arthropods. Parasitology 1921, 13(2):97-113. 4. 
Lindegren JE, Hoffmann DF: Ultrastructure of some developmental stages of Helicosporidium sp. in navel orangeworm, Paramyelois transitella. J Invertebr Pathol 1976, 27(1):105-113. 5. Weiser J: Helicosporidium parasiticum Keilin infection in the caterpillar of a hepialid moth in Argentina. J Protozool 1970, 17(3):436-440. 6. Kellen WR, Lindegren JE: New host records for Helicosporidium parasiticum. J Invertebr Pathol 1973, 22:296-297. 7. Tartar A, Boucias DG, Adams BJ, Becnel JJ: Phylogenetic analysis identifies the invertebrate pathogen Helicosporidium sp. as a green alga (Chlorophyta). Int J Syst Evol Micr 2002, 52:273-279. 8. Tartar A, Boucias DG, Becnel JJ, Adams BJ: Comparison of plastid 16S rRNA (rrn 16) genes from Helicosporidium spp.: evidence supporting the reclassification of Helicosporidia as green algae (Chlorophyta). Int J Syst Evol Micr 2003, 53:1719-1723. 9. Tartar A, Boucias DG: The non-photosynthetic, pathogenic green alga Helicosporidium sp. has retained a modified, functional plastid genome. FEMS Microbiol Lett 2004, 233(1):153-157. 10. Keeling PJ, Inagaki Y: A class of eukaryotic GTPase with a punctate distribution suggesting multiple functional replacements of translation  29 elongation factor 1 alpha. Proc Natl Acad Sci U S A 2004, 101(43):15380-15385. 11. de Koning AP, Keeling PJ: Nucleus-encoded genes for plastid-targeted proteins in Helicosporidium: Functional diversity of a cryptic plastid in a parasitic alga. Eukaryot Cell 2004, 3(5):1198-1205. 12. Borza T, Popescu CE, Lee RW: Multiple metabolic roles for the nonphotosynthetic plastid of the green alga Prototheca wickerhamii. Eukaryot Cell 2005, 4(2):253-261. 13. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696-704. 14. Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 2002, 18(3):502-504. 15. 
Bruno WJ, Socci ND, Halpern AL: Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol 2000, 17(1):189-197. 16. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN et al: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4:41. 17. Kudo RR: Handbook of Protozoology, 6th edn. Springfield, IL: C.C. Thomas; 1931.
Chapter 3 - Nucleus-encoded genes for plastid-targeted proteins in Helicosporidium: functional diversity of a cryptic plastid in a parasitic alga∗
Introduction
Plastids, or chloroplasts, are photosynthetic organelles of plants and algae that are derived from an endosymbiotic cyanobacterium [1]. All plastids that have been examined have small genomes that encode a handful of proteins involved in housekeeping functions, photosynthesis, and some anabolic functions. These proteins represent only a small fraction of the proteins needed for plastid maintenance and activity: most genes for plastid proteins have been transferred to the nuclear genome, and their products are post-translationally targeted to the plastid. While these genes may be encoded in the nucleus, they are derived from the cyanobacterium, and they can therefore potentially be recognized by their phylogenetic history, and by the presence of an N-terminal transit peptide that directs their targeting to the organelle [2].
While plastids are most often associated with photosynthesis, they fulfill a number of other important metabolic roles in both plants and algae. The best characterized of these are biosynthesis of fatty acids, isopentenyl diphosphate (for isoprenoid biosynthesis), various amino acids, and tetrapyrroles. With such metabolic complexity, it is not surprising that the many plants and algae that have lost photosynthesis have retained cryptic plastids for some other purpose.
∗ A version of this chapter has been published. de Koning AP, Keeling PJ: Nucleus-Encoded Genes for Plastid-Targeted Proteins in Helicosporidium: Functional Diversity of a Cryptic Plastid in a Parasitic Alga. Eukaryotic Cell 2004, 3(5):1198-1205.
Cryptic plastids are highly modified non-photosynthetic organelles that lack thylakoid structures and pigments, which makes them difficult to identify. It is only by investigating their metabolism or molecular biology that these plastids are recognized and the photosynthetic ancestry of the organism that harbours them is exposed. A good example of this is the recently discovered plastid of apicomplexan parasites such as the malaria parasite, Plasmodium. This organelle, called the apicoplast, was first identified by the discovery of plastid-encoded genes [3]. Not surprisingly, however, the apicoplast genome contains few genes, and most are involved in housekeeping activities that revealed little about why this plastid was retained. The function of the organelle was only discovered through sequencing nucleus-encoded genes for plastid-targeted enzymes, which have shown that the apicoplast’s role in the malaria parasite is biosynthesis of fatty acids, isoprenoids and heme [4]. The conversion of a fully functional photosynthetic plastid to the highly derived relict presently seen in Apicomplexa is a remarkable transformation that raises a number of questions about the changing role of the organelle as various metabolic pathways become obsolete. Here, we have examined this process in an unrelated parasite to compare the effects of metabolic reduction.
Helicosporidia are a group of parasites that infect various invertebrates [5, 6].
They were first described in 1921 [7] and have since been assigned to various groups of protists or fungi, but their extremely sophisticated and derived infection mechanism is unlike that of any other eukaryote, and they accordingly remained something of a mystery. Recently, however, the first molecular data from this group were characterized and, surprisingly, showed Helicosporidium to be a trebouxiophyte green alga, closely related to another enigmatic parasite, Prototheca [6]. Recent descriptions of a plastid-like small subunit rRNA [8] and of a fragment containing elongation factor-Tu and ribosomal protein genes [9] in Helicosporidium confirm that, like its green-algal relatives, this non-photosynthetic parasite possesses a plastid genome. The metabolic role of the Helicosporidium plastid is of considerable comparative interest because it is a primary plastid of green algal ancestry, while the cryptic plastid of Plasmodium is a secondary plastid derived from the red algal lineage. In this chapter, I characterize 20 nuclear-encoded genes for plastid-targeted proteins and show that Helicosporidium has retained a considerably greater degree of plastid-derived metabolic diversity than Plasmodium.
Methods
Identification and characterization of putative plastid-targeted proteins
Genes encoding putative plastid-targeted proteins were identified from a pool of cDNA sequences produced from an ongoing EST project on a Helicosporidium sp. (strain ATCC 50920, originally isolated from the black fly, Simulium jonesii). Genes were identified for further characterization based on three criteria: 1) their association with pathways known to occur in the plastid, 2) their phylogenetic relationship to plastid homologues in other organisms, and 3) the presence of a presequence with hallmarks of plastid-targeting transit peptides.
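The three-criterion screen can be pictured as a simple tally per candidate gene. The following is a minimal sketch of that bookkeeping, not the procedure actually used: the `Candidate` record, the `criteria_met` and `shortlist` helpers, and the all-negative placeholder entry are hypothetical names introduced for illustration. The flags for the two named proteins follow what this chapter reports for them (the plastid-type acyl carrier protein has a predicted transit peptide, whereas the stearoyl-ACP desaturase transcript is truncated, so its leader is unknown).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One EST-derived gene and the evidence recorded for it (hypothetical layout)."""
    name: str
    plastid_pathway: bool      # criterion 1: part of a pathway known to occur in plastids
    plastid_phylogeny: bool    # criterion 2: groups with plastid homologues in trees
    transit_like_leader: bool  # criterion 3: presequence with transit-peptide hallmarks


def criteria_met(c: Candidate) -> int:
    """Count how many of the three criteria a candidate satisfies."""
    return sum((c.plastid_pathway, c.plastid_phylogeny, c.transit_like_leader))


def shortlist(candidates, minimum=1):
    """Names of candidates meeting at least `minimum` criteria, best supported first."""
    kept = [c for c in candidates if criteria_met(c) >= minimum]
    return [c.name for c in sorted(kept, key=criteria_met, reverse=True)]


# Illustrative entries only; the last one is a made-up negative control.
examples = [
    Candidate("stearoyl-ACP desaturase", True, True, False),  # leader not recovered
    Candidate("acyl carrier protein (plastid type)", True, True, True),
    Candidate("placeholder cytosolic protein", False, False, False),
]

print(shortlist(examples))
```

Keeping the tally separate from the threshold makes it easy to distinguish genes that pass on a single criterion from those supported by all three, which matters because leaders are missing whenever a cDNA is truncated.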
Clones encoding cDNAs of putative plastid-targeted proteins were bi-directionally sequenced and conceptual amino acid translations were produced from both unique cDNAs and from reconstructed contiguous sequence for genes with multiple cDNA clones. For genes with truncated cDNAs, missing sequence was PCR-amplified. Helicosporidium was cultured in Sabouraud Maltose Yeast medium at 25°C in the dark. Genomic DNA was isolated from harvested cells using the DNeasy Plant Minikit (Qiagen). PCR amplification utilized primer pairs consisting of an “anchor primer” designed from cDNA sequence, and a degenerate primer designed from plant and/or cyanobacterial homologues. PCR products were cloned in Topo 2.1 vector (Invitrogen), and sequenced on both strands. Forty-two new sequences have been deposited in GenBank under Accession Numbers AY596480-AY596521.
Phylogenetic analyses
Inferred amino acid translations of Helicosporidium sequences were used as queries for BLAST searching of the non-redundant (NR) and EST public databases. In cases where no chlorophyte homologues were known, the gene of interest was sought in the Chlamydomonas reinhardtii unpublished genome sequence (http://www.jgi.doe.gov/). Homologues that minimized expect scores while maximizing taxonomic sampling were selected from the BLAST output. Amino acid sequences were aligned using CLUSTALX [10], and manually inspected. Ambiguously aligned regions were not considered in phylogenetic analysis.
Phylogenetic trees were produced from amino acid sequence alignments with maximum likelihood (ML) and ML-distance methods. ML-distances were calculated with TREE-PUZZLE 5.1 [11], using the VT model of amino acid substitution and corrected for among-site rate variation approximated by a discrete gamma distribution with eight rate categories plus an invariable rate category. The shape parameter (α), and the probability of site-invariability parameter (i), were estimated from the data.
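As a reading aid, the rate-heterogeneity correction just described can be written out explicitly. This is the standard textbook form of a discrete-gamma model with an invariable-site class, not a formula taken from the TREE-PUZZLE documentation. The symbols L_s, K and r_k are introduced here for illustration, while α and i are the parameters named in the text.

```latex
% Likelihood of alignment site s under a discrete gamma + invariable-sites model.
% i    : probability that a site is invariable (estimated from the data)
% K    : number of equally probable gamma rate categories (eight in the ML-distance analysis)
% r_k  : representative rate of category k, taken from a gamma distribution
%        with shape parameter \alpha and mean 1
L_s \;=\; i \, L_s(r = 0) \;+\; (1 - i)\,\frac{1}{K}\sum_{k=1}^{K} L_s(r_k)
```

A small α places most variable sites in slow categories with a few fast ones, whereas a large α approaches a single shared rate.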
ML-distance trees were constructed using BIONJ [12] and Weighbor [13]. Support for ML-distance trees was obtained by bootstrapping (100 replicates) with PUZZLEBOOT (A. Roger and M. Holder, www.tree-puzzle.de). ML trees were constructed with PhyML [14], using the Dayhoff model of amino acid substitution. Among-site rate variation was modeled with a discrete gamma distribution for variable sites (with 4 substitution rate categories and the alpha parameter estimated from the data), and a proportion of invariable sites (estimated from the data). Branch significance was measured with 100 bootstrapped replicates for ML trees.
Presequence characterization
Alignments showed that a number of Helicosporidium nuclear proteins contained extensions at the N-terminus of the polypeptide when compared to bacterial and organelle-encoded homologues. Moreover, such extensions were also present in eukaryotic nuclear-encoded homologues that are targeted to organelles. The extensions were analyzed by TargetP [15] and iPSORT [16], to determine if they had characteristics typical of known target peptides. Amino acid compositions of extensions were also examined. Extensions were classified as being potentially recognized by mitochondrion or plastid import machinery based on the phylogenetic relationship of the protein to mitochondrion or plastid-targeted homologues. The average amino acid compositions of nine putative plastid and eight putative mitochondrion targeting peptides were compared to the average amino acid compositions of the remainder of the respective proteins, and Wilcoxon signed-ranks tests were used to detect significant differences.
Results
Sequences can be identified as encoding plastid-targeted proteins using three criteria: phylogenetic relationship to cyanobacterial and other plastid homologues, presence of an N-terminal transit peptide, and inferred function of the protein (as part of a common plastid pathway).
These criteria are likely very effective at identifying genes derived from the plastid endosymbiont and whose products function in other known plastids. An initial survey of 1,360 Helicosporidium EST sequences, representing 341 unique and identifiable genes, revealed 36 potentially plastid-related proteins that fulfilled one or more of these criteria. Preliminary phylogenetic analyses immediately identified 16 of these as mitochondrial, cytosolic or of uncertain type, and these were not analyzed further. Some of the remaining 20 cDNAs were truncated, and additional sequence data for 6 of these were acquired by PCR from genomic DNA. To determine the likelihood that each functions in the plastid, comprehensive phylogenetic analyses of all 20 proteins were carried out, and the results are summarized in Figure 3.1. Many of these phylogenies showed a strongly supported relationship between the Helicosporidium gene and a plastid-targeted homologue of green algae or plants. An example of such evidence can be found in acyl carrier protein (ACP), which is plastid-targeted in all plants and algae, and where two Helicosporidium homologues were sequenced (Figure 3.2 A). One Helicosporidium homologue is closely related to mitochondrial ACP (89-96%), while the other forms a very strong clade with the green algal plastid ACP (98-99%), which is in turn closely related to plant plastid ACP and more distantly to those of cyanobacteria. This strongly supports the plastid origin of this protein and, by extension, its plastid location, which is also supported by the presence of a 47 amino acid N-terminal extension that is predicted to encode a plastid transit-peptide.
Figure 3.1 - Summary of cDNAs for putative plastid-targeted proteins from Helicosporidium sorted by pathway. On the left, filled boxes indicate the presence of an N-terminal extension predicted to be a plastid transit peptide, half-filled boxes indicate a leader of ambiguous function (e.g., the transcript was truncated within the leader, or the leader could not unambiguously be predicted to be a transit peptide), and an open box indicates the transcript was truncated within the mature protein. The second column indicates whether the Helicosporidium protein was phylogenetically related to other plastid-targeted proteins, and the right column indicates the statistical support for this. Asterisks indicate proteins that have evolved by lineage-specific duplication in plants, followed by relocation of one copy to a new cellular compartment, thereby obscuring the ancestral location of the plant proteins. Support was standardized to the node uniting Helicosporidium specifically with green algal and plant plastid homologues, except where indicated by superscripts as follows: (1) support for plastid and cyanobacteria as a whole; (2) support for Helicosporidium and green algae; (3) in the ferredoxin protein family, there is a distinct plastid-type that includes the Helicosporidium sequence, but the position of Helicosporidium within the plastid-type is not resolved.
Figure 3.2 - Protein maximum-likelihood phylogenies of four proteins showing different types of support for plastid-targeted proteins in Helicosporidium. Major groups are indicated to the right, and numbers at nodes correspond to bootstrap support (%) from (left to right) maximum likelihood, weighted neighbor joining, and BioNJ. GenBank GI numbers for sequences are given following the species name; sequences without GI numbers were assembled from the NCBI EST database or the Chlamydomonas reinhardtii genome (http://www.jgi.doe.gov). “P” following the sequence name denotes genes that are encoded in the plastid. (A) ACP phylogeny shows strong support (98-99%) for a relationship between the Helicosporidium protein and the plastid-targeted homologue from the green alga Chlamydomonas. The Helicosporidium mitochondrial ACP is also shown. (B) SufB phylogeny, showing the relationship between the Helicosporidium protein and plastid-targeted proteins in green algae and plants. Other algal SufBs are plastid-encoded, and the protein is not known in eukaryotes outside this association with the plastid. (C) Ferredoxin-thioredoxin reductase subunit A is only known in plants, algae and cyanobacteria, and the Helicosporidium homologue groups strongly (97-100%) within the plant and algal clade. (D) Stearoyl-ACP desaturase is only known from plants and algae, and the Helicosporidium homologue is closely related to the Chlamydomonas homologue (100%).
In other cases, the phylogeny and the known distribution of a protein combine to provide support for a plastid location. For instance, all known eukaryotic homologues of SufB are associated with the plastid (Figure 3.2 B). SufB is plastid-encoded in red algae and organisms containing secondary plastids of red algal origin, while green algae and plants have a plastid-targeted SufB, and no other eukaryotic SufB is known. The Helicosporidium gene forms a moderately well-supported clade with the plastid-targeted proteins of plants and other green algae (75-87%), supporting its plastid origin. Other proteins have even narrower distributions. For example, subunit A of ferredoxin-thioredoxin reductase (Figure 3.2 C) is only known in cyanobacteria and plastids, and the Helicosporidium homologue encodes a predicted transit-peptide and groups with the green algae within the strongly supported plastid clade (97-100%). Stearoyl-ACP desaturase (Figure 3.2 D), on the other hand, is exclusively known from plastids, and the Helicosporidium homologue forms a strongly supported group with the other green algal homologue (100%).
The green algal homologue is not clearly plastid-targeted itself and the Helicosporidium homologue is truncated, so while this gene is plastid-derived, whether it is plastid-targeted is not clear.
Phylogenetic support for the plastid origin of other proteins ranges from very strong (e.g., glutamate-1-semialdehyde aminotransferase, ClpB and dihydrolipoamide dehydrogenase) to modest (e.g., rpL15 and phosphoserine aminotransferase) or weak. The latter category includes proteins for which the phylogeny does not adequately resolve the position of Helicosporidium (e.g., poly-A binding protein) and for which multiple isoforms exist and plastid localization is not certain. These proteins have not been excluded from consideration because some encode transit peptide-like leaders, and all have a functional relationship to known plastid metabolic pathways. Whether these really are plastid-targeted, however, will require direct localization evidence.
Complete leader sequences were available for nine of the 20 putative plastid-targeted genes. While there is little sequence conservation among target peptides, some trends have been observed; mitochondrial targeting peptides often have an excess of arginine, alanine and serine, while the acidic amino acids are rare [15]. Plastid targeting peptides have a deficit of acidic amino acids, but arginine and the hydroxylated amino acids are over-abundant [17]. The amino acid composition of predicted plastid and mitochondrion targeting peptides from Helicosporidium was examined (Figure 3.3), and they have many of the expected features. Helicosporidium plastid-targeting peptides have an average overabundance of arginine when compared to the average amino acid composition of the mature parts of the proteins (p=0.01) and have an average deficit of acidic amino acids (0.05>p>0.025). The predicted mitochondrial targeting peptides are deficient in acidic amino acids (p=0.01), and have a surplus of serine (0.025>p>0.01).
Figure 3.3 - Difference in mean amino acid composition between target peptides and mature proteins. Difference was calculated as leader minus mature protein. Mean composition was calculated on a sample size of 9 for predicted plastid-targeted proteins (dark bars), and 8 for predicted mitochondria-targeted proteins (light bars). Significance was measured by Wilcoxon signed-ranks tests and significant differences (p<0.05) are indicated by bold outline. neg: acidic amino acids (D+E), pos: basic amino acids (R+K).
The only definite proof of plastid-localization would be immunolocalization to the organelle. Nevertheless, the phylogenetic evidence shows that these proteins are plastid-derived, the full-length genes encode leaders as expected for plastid-targeted proteins, and all of these proteins are part of metabolic pathways known to operate in plastids, most being found in other cryptic plastids as well. Moreover, the plastid genomic data show that Helicosporidium does have a plastid. While it remains possible that these proteins function in some other compartment, this would be unprecedented in plastid evolution. Plastid-derived enzymes have only once been shown to take up a function in another compartment, and in this case the original plastid function was retained as well [18]. Even in Plasmodium, plastid-derived proteins have not been recruited to function in other compartments [4], so there is no reason to suspect such an event has taken place in Helicosporidium.
Discussion
Functional implications of plastid-targeted proteins in Helicosporidium
As expected for the non-photosynthetic Helicosporidium, no components of the photosynthetic machinery were found. However, some of the major metabolic roles of plastids were represented by the proteins we characterized.
These putative plastid proteins are parts of pathways that have implications for the metabolism of Helicosporidium and the role of its cryptic plastid.
Fatty acid metabolism
Evidence for fatty acid metabolism is provided by the presence of acyl carrier protein (ACP), which carries the elongating acyl chain through fatty acid biosynthesis [19]. In animals and fungi, ACP is part of a multifunctional, cytosolic, Type I fatty acid synthase. In bacteria and plastids, fatty acids are synthesized by a multisubunit Type II fatty acid synthase, where ACP is a small stand-alone protein [19]. Such a Type II complex also operates in the mitochondria of some eukaryotes [20]. Helicosporidium encodes both plastid and mitochondrial ACP (Figure 3.2 A), indicating that some fatty acid metabolism occurs in each compartment. Whether this activity consists of de novo fatty acid biosynthesis or simply a modification of fatty acids synthesized elsewhere cannot be concluded until additional components of a Type II fatty acid synthase complex are also identified. The primary products of Type II fatty acid biosynthesis are the saturated fatty acids palmitoyl-ACP and stearoyl-ACP. In plant plastids, stearoyl-ACP is further modified to oleoyl-ACP (a major precursor of membrane glycerolipids and polyunsaturated fatty acids) by the enzyme stearoyl-ACP desaturase [19]. To date, stearoyl-ACP desaturase has only been identified in higher plants [21], but we found homologues in Helicosporidium and Chlamydomonas (Figure 3.2 D), suggesting that their plastids play an integral role in fatty acid modification. In plastids, fatty acids are synthesized from acetyl-CoA produced by the pyruvate dehydrogenase (PDH) complex [22, 23]. The plastid PDH complex is made up of four subunits, E1α, E1β, E2 and E3.
Helicosporidium encodes plastid-derived dihydrolipoamide dehydrogenase (E3), implying that the Helicosporidium plastid metabolizes pyruvate and can provide acetyl-CoA for the synthesis of fatty acids.
Tetrapyrrole biosynthesis
Tetrapyrroles include hemes, which serve as prosthetic groups of respiratory enzymes, and chlorophyll, the major light harvesting pigment in photosynthesis. All tetrapyrroles are formed from δ-aminolevulinic acid. In animals, fungi and α-proteobacteria, δ-aminolevulinic acid is produced through the Shemin pathway, while in most eubacteria, archaea, plants and algae, it is produced through the C5 pathway [24]. In plastid-containing eukaryotes, there is evidence for the Shemin pathway in the mitochondrion and the C5 pathway in the plastid [25-28], although the plastid C5 pathway seems to be responsible for most tetrapyrrole synthesis [24]. Helicosporidium encodes a glutamate-1-semialdehyde aminomutase, which catalyzes the third step in the C5 pathway. In contrast, Plasmodium appears to use the mitochondrial Shemin pathway to produce δ-aminolevulinic acid, which is then exported to the apicoplast for further steps in heme biosynthesis [4]. Helicosporidium tetrapyrrole biosynthesis may be more similar to that of higher plants.
Isoprenoid biosynthesis
Isoprenoids, such as terpenes, dolichols, cholesterol, ubiquinone, chlorophyll and carotenoids, are compounds containing isoprene subunits. The isoprene precursors, isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP), are synthesized by either the mevalonate or DOXP pathways. Animals, fungi, some eubacteria and archaea use the mevalonate pathway, while most eubacteria, including the cyanobacteria, use the DOXP pathway [29]. Plants use the mevalonate pathway in the cytosol and the DOXP pathway in plastids [29]. Plasmodium falciparum also uses the DOXP pathway in the apicoplast, but apparently lacks the mevalonate pathway [4].
The final step of the DOXP pathway is catalyzed by IspH [29, 30], and the presence of this enzyme in Helicosporidium suggests that the DOXP pathway functions in its plastid.
Amino acid biosynthesis
The Helicosporidium plastid appears to be involved in the synthesis of several amino acids. Leucine biosynthesis only takes place in bacteria, archaea, fungi and plants [31, 32], and in plants, recent evidence suggests it occurs in the plastid [33, 34]. The first committed step of leucine synthesis is catalyzed by isopropylmalate synthase (IPMS). Helicosporidium encodes a plastid-derived IPMS homologue (Figure 3.4 A) with an N-terminal leader predicted to be a plastid transit peptide. The second step in leucine biosynthesis is catalyzed by isopropylmalate isomerase (IPMI). In prokaryotes and plant plastids, IPMI is composed of a large and a small subunit [31, 35], whereas in fungi, the two subunits are fused and function in the cytoplasm [32]. Helicosporidium encodes the small subunit of IPMI, which forms a well-supported clade with plant and Chlamydomonas plastid-targeted homologues (Figure 3.4 B). While the localization of this protein has not been demonstrated directly in plants or chlorophytes, all these proteins encode N-terminal leaders predicted to be transit peptides, and isolated plant chloroplasts produce leucine when supplied with the substrate for IPMS [34].
Figure 3.4 - The leucine biosynthetic pathway is present in the Helicosporidium plastid. Amino acid maximum likelihood phylogenies for (A) isopropylmalate synthase and (B) isopropylmalate isomerase small subunit show the Helicosporidium homologue is closely related to the green algal homologue, and plastid-targeted proteins as a whole are consistently supported at 100%. Both Helicosporidium proteins are predicted to encode transit peptides, altogether supporting the presence of this pathway in its cryptic plastid. Annotation and numbers at nodes are as in Figure 3.2.
There is also evidence for metabolism of other amino acids. Phosphoserine aminotransferase catalyzes the second step in a pathway from 3-phosphoglycerate to serine. This pathway is cytosolic in fungi and animals, but plastidial in plants [36]. The Helicosporidium phosphoserine aminotransferase encodes an N-terminal extension and is phylogenetically related to plant homologues. Lastly, one Helicosporidium transcript is highly similar to an amino acid aminotransferase. This gene is found in plants, bacteria and archaea, but its function is only known in Arabidopsis and rice, where it has been shown to have aminotransferase activity with a high affinity for lysine [37]. The Helicosporidium sequence is closely related to rice and Arabidopsis as well as other plants and a chlorophyte. The plant and chlorophyte sequences all possess N-terminal extensions predicted to be plastid transit peptides, and the enzyme has been shown to be localized to the plastid of Arabidopsis [37]. In summary, the presence of genes involved in the synthesis of three different amino acids provides strong evidence that the plastid is an integral part of amino acid metabolism in Helicosporidium.
Reducing potential
Ferredoxins are small proteins involved in a variety of redox reactions. The best-known role of ferredoxin is in photosynthesis, where it accepts electrons from the electron transport chain and then reduces NADP or thioredoxin. Ferredoxin can also be reduced by NADPH, and the reduced ferredoxin acts as a cofactor for several plastid enzymes such as those involved in nitrate metabolism [38, 39], sulfite reduction [40], and fatty acid desaturation [41]. The oxidation of NADPH/reduction of ferredoxin is dominant in non-photosynthetic plastids [42].
Helicosporidium encodes a ferredoxin with an N-terminal transit peptide whose closest relatives include plastid-targeted ferredoxins from plants, chlorophytes and Apicomplexa, plastid-encoded ferredoxins from rhodophytes and other algae, and ferredoxins from cyanobacteria. The Helicosporidium plastid ferredoxin likely provides reducing power for ferredoxin-dependent enzymes, such as stearoyl-ACP desaturase, which is one of the enzymes that has been shown to be ferredoxin-dependent in non-photosynthetic plastids of plants [41].
Ferredoxin can also reduce thioredoxin through the action of ferredoxin-thioredoxin reductase, an enzyme that has only been found in photosynthetic eukaryotes and cyanobacteria [43]. Plastid thioredoxin is a key cofactor in photosynthesis and is involved in light-dependent regulation of the redox state of several plastid enzymes. Thioredoxin has been detected in non-photosynthetic plant plastids, where it may activate enzymes in fatty acid biosynthesis, nitrate metabolism, isoprenoid biosynthesis, tetrapyrrole biosynthesis, and sulfur metabolism [43, 44]. The Helicosporidium ferredoxin-thioredoxin reductase subunit A encodes a leader and groups with plant enzymes and an uncharacterized Chlamydomonas gene (Figure 3.2 C), but its role in the cryptic plastid is uncertain because of the number of potential targets for a plastid thioredoxin. Lastly, iron-sulfur clusters are cofactors that bind various enzymes and mediate redox reactions. Two of the putative plastid-targeted proteins in Helicosporidium are Fe-S metalloenzymes: IspH and ferredoxin [4]. In plastids, iron-sulfur clusters are thought to be synthesized by a pathway that is homologous to one found in bacteria [45], involving SufB. Red algae and related plastids encode SufB in the plastid genome, while green algae and plants encode plastid-targeted enzymes, and localization has been demonstrated in Arabidopsis [46].
The Helicosporidium SufB is related to plant and chlorophyte homologues (Figure 3.2 B), as expected of a plastid-targeted enzyme.
Sulfur assimilation
Sulfates that are acquired from the environment are adenylated to form 5’-adenylylsulfate (APS), which has one of two fates. It can be incorporated into a myriad of sulfate-containing cellular products, through sulfation reactions in the cytosol that use the substrate 3’-phospho-5’-adenylylsulfate (PAPS) as the sulfuryl donor. Alternatively, APS can be used for cysteine biosynthesis [47]. The PAPS used in sulfation reactions is formed from APS by the sequential action of two enzymes. The second enzyme is adenylylsulfate kinase, which we have identified in Helicosporidium sp. In Arabidopsis thaliana and presumably all plants, adenylylsulfate kinase is encoded by four genes specifying one cytosolic and three plastid-localized enzymes [48]. Phylogenetic analysis of adenylylsulfate kinase (Figure 3.5 A) reveals that the plastidial and cytosolic adenylylsulfate kinases in plants form distinct clades and appear to have evolved by a duplication in the plant lineage. Chlorophyte adenylylsulfate kinases, including the Helicosporidium protein, form a well-supported clade that is sister to the clade containing plastidial and cytosolic plant proteins. As the chlorophyte proteins used in this analysis are uncharacterized, it is difficult to determine whether the ancestral protein of plants and chlorophytes was plastidial or cytosolic. However, all the plastidial plant enzymes and the chlorophyte enzymes contain an obvious N-terminal leader sequence predicted to be a plastid-targeting peptide. While the phylogenetic evidence for a plastidial clade is not conclusive, the leader sequences suggest that adenylylsulfate kinase probably functions in the plastid in Helicosporidium sp.
A second putative plastid-targeted protein in Helicosporidium is involved in sulfur assimilation.
O-acetylserine(thiol)lyase (OAS-TL) catalyzes the final step in cysteine synthesis. Cysteine plays a pivotal role in sulfur metabolism as it is the central intermediate from which most sulfur compounds are synthesized [47]. OAS-TL is known only from plants, algae and bacteria. In plants, this enzyme is encoded by multiple genes, whose products either function in the cytosol or in the plastid [49]. In addition, Arabidopsis thaliana has a third, mitochondrial isoform of cysteine synthase that probably arose in a recent lineage-specific duplication [50]. OAS-TL from Chlamydomonas reinhardtii (a chlorophyte) has been characterized, and is represented by a single gene whose product is most likely targeted to the plastid [51]. OAS-TL sequences are also available for two rhodophytes. Unfortunately, it is not known if these proteins function in the plastid or if multiple isozymes are present in rhodophytes. In phylogenetic analysis of OAS-TL (Figure 3.5 B), highly supported branches separately unite the plant plastid-targeted enzymes and the plant cytosolic enzymes. These two clades are supported as sister groups, indicating that the isoforms in plants probably arose through a plant-lineage-specific duplication. The rhodophyte, Helicosporidium sp., and C. reinhardtii sequences, together with the plant plastidial/cytosolic clade, comprise a larger, moderately well supported clade. Similar to the case described above for adenylylsulfate kinase, the plant-specific duplication makes it difficult to determine whether the ancestral condition represented cytosolic or plastid-targeted OAS-TL. Unfortunately, no plastid-targeting sequence information is available from Helicosporidium sp., as the cDNA from which it was sequenced was truncated at the 5’ end. Nevertheless, all plant plastid-targeted genes, rhodophyte genes and the C. reinhardtii gene are predicted to encode plastid-targeting peptides at the 5’ end, and thus it is likely that they represent plastid-targeted proteins.
Figure 3.5 - Phylogenetic evidence for the sulfur assimilation pathway. Protein maximum likelihood phylogenies for (A) Adenylylsulfate kinase and (B) O-acetylserine(thiol)lyase, which show Helicosporidium sp. sequences as part of well-supported clades whose other members are predicted to possess plastid-targeting leader sequences (see text). Wavy lines indicate branches whose actual length is twice what is shown. Annotation and numbers at nodes are as in Figure 3.2.
Comparative plastid reduction in obligate parasites
The range of putative plastid-targeted proteins in Helicosporidium implies many biosynthetic pathways could play important roles in the parasite’s plastid. Some of these, such as fatty acid, tetrapyrrole, and isoprenoid biosynthesis, are thought to represent the core functions of the cryptic plastid of the malaria parasite, Plasmodium [4]. Other processes, such as chaperone activity and translation, have housekeeping functions that are needed by any plastid that retains a genome, and are present in both Plasmodium and Helicosporidium. Of the pathways they share, the two cryptic plastids seem to do some things differently. For example, while both parasites synthesize tetrapyrroles in the plastid, Helicosporidium may do so from endogenously-synthesized precursors, while Plasmodium has lost this pathway and imports precursors from the mitochondrion [4].
The cryptic Helicosporidium plastid is probably also metabolically more diverse and retains several pathways that have been lost in Plasmodium, such as sulfur assimilation and amino acid metabolic pathways. Two evolutionary factors may explain this: 1) Selection may be acting to maintain pathways in the Helicosporidium plastid. Helicosporidium can survive in cyst form outside its host, and thus might need greater metabolic autonomy than Plasmodium, which is entirely host-associated throughout its lifecycle.
2) The increased complexity of the Helicosporidium plastid may reflect a more recent autotrophic ancestor. If plastid reduction is a continuous process, then Plasmodium may simply be further along in the process of reducing a fully functioning plastid to the highly specialized relict we see today.

References

1. McFadden GI: Primary and secondary endosymbiosis and the origin of plastids. J Phycol 2001, 37(6):951-959.
2. Cline K, Henry R: Import and routing of nucleus-encoded chloroplast proteins. Annu Rev Cell Dev Biol 1996, 12(1):1-26.
3. Gardner MJ, Williamson DH, Wilson RJM: A circular DNA in malaria parasites encodes an RNA polymerase like that of prokaryotes and chloroplasts. Mol Biochem Parasitol 1991, 44(1):115-123.
4. Ralph SA, van Dooren GG, Waller RF, Crawford MJ, Fraunholz MJ, Foth BJ, Tonkin CJ, Roos DS, McFadden GI: Metabolic maps and functions of the Plasmodium falciparum apicoplast. Nat Rev Microbiol 2004, 2(3):203-216.
5. Boucias DG, Becnel JJ, White SE, Bott M: In vivo and in vitro development of the protist Helicosporidium sp. J Eukaryot Microbiol 2001, 48(4):460-470.
6. Tartar A, Boucias DG, Adams BJ, Becnel JJ: Phylogenetic analysis identifies the invertebrate pathogen Helicosporidium sp. as a green alga (Chlorophyta). Int J Syst Evol Microbiol 2002, 52:273-279.
7. Keilin D: On the life history of Helicosporidium parasiticum n. g., n. sp., a new species of protist parasite in the larvae of Dasyhelea obscura Winn (Diptera:Ceratopogonidae) and in some arthropods. Parasitology 1921, 13:97-113.
8. Tartar A, Boucias DG, Becnel JJ, Adams BJ: Comparison of plastid 16S rRNA (rrn 16) genes from Helicosporidium spp.: evidence supporting the reclassification of Helicosporidia as green algae (Chlorophyta). Int J Syst Evol Microbiol 2003, 53:1719-1723.
9. Tartar A, Boucias DG: The non-photosynthetic, pathogenic green alga Helicosporidium sp. has retained a modified, functional plastid genome. FEMS Microbiol Lett 2004, 233(1):153-157.
10. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25(24):4876-4882.
11. Strimmer K, von Haeseler A: Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol 1996, 13(7):964-969.
12. Gascuel O: BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 1997, 14(7):685-695.
13. Bruno WJ, Socci ND, Halpern AL: Weighted neighbor joining: A likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol 2000, 17(1):189-197.
14. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696-704.
15. Emanuelsson O, Nielsen H, Brunak S, von Heijne G: Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 2000, 300(4):1005-1016.
16. Bannai H, Tamada Y, Maruyama O, Nakai K, Miyano S: Extensive feature detection of N-terminal protein sorting signals. Bioinformatics 2002, 18(2):298-305.
17. Bruce BD: The paradox of plastid transit peptides: conservation of function despite divergence in primary structure. Biochim Biophys Acta 2001, 1541(1-2):2-21.
18. Brinkmann H, Martin W: Higher-plant chloroplast and cytosolic 3-phosphoglycerate kinases: a case of endosymbiotic gene replacement. Plant Mol Biol 1996, 30(1):65-75.
19. Harwood JL: Recent advances in the biosynthesis of plant fatty acids. Biochim Biophys Acta 1996, 1301(1-2):7-56.
20. Wada H, Shintani D, Ohlrogge J: Why do mitochondria synthesize fatty acids? Evidence for involvement in lipoic acid production. Proc Natl Acad Sci USA 1997, 94(4):1591-1596.
21. Tocher DR, Leaver MJ, Hodgson PA: Recent advances in the biochemistry and molecular biology of fatty acyl desaturases. Prog Lipid Res 1998, 37(2-3):73-117.
22. Ke J, Behal RH, Back SL, Nikolau BJ, Wurtele ES, Oliver DJ: The role of pyruvate dehydrogenase and acetyl-coenzyme A synthetase in fatty acid synthesis in developing Arabidopsis seeds. Plant Physiol 2000, 123(2):497-508.
23. Bao XM, Focke M, Pollard M, Ohlrogge J: Understanding in vivo carbon precursor supply for fatty acid synthesis in leaf tissue. Plant J 2000, 22(1):39-50.
24. Papenbrock J, Grimm B: Regulatory network of tetrapyrrole biosynthesis - studies of intracellular signalling involved in metabolic and developmental control of plastids. Planta 2001, 213(5):667-681.
25. Iida K, Mimura I, Kajiwara M: Evaluation of two biosynthetic pathways to delta-aminolevulinic acid in Euglena gracilis. Eur J Biochem 2002, 269(1):291-297.
26. Weinstein JD, Beale SI: Separate physiological roles and subcellular compartments for two tetrapyrrole biosynthetic pathways in Euglena gracilis. J Biol Chem 1983, 258(11):6799-6807.
27. Porra RJ, Klein O, Wright PE: The proof by C-13-NMR spectroscopy of the predominance of the C-5 pathway over the Shemin pathway in chlorophyll biosynthesis in higher plants and of the formation of the methyl ester group of chlorophyll from glycine. Eur J Biochem 1983, 130(3):509-516.
28. Ohhama T, Seto H, Otake N, Miyachi S: C-13-NMR evidence for the pathway of chlorophyll biosynthesis in green algae. Biochem Biophys Res Commun 1982, 105(2):647-652.
29. Cunningham FX, Lafond TP, Gantt E: Evidence of a role for LytB in the nonmevalonate pathway of isoprenoid biosynthesis. J Bacteriol 2000, 182(20):5841-5848.
30. Rohdich F, Hecht S, Gartner K, Adam P, Krieger C, Amslinger S, Arigoni D, Bacher A, Eisenreich W: Studies on the nonmevalonate terpene biosynthetic pathway: metabolic role of IspH (LytB) protein. Proc Natl Acad Sci USA 2002, 99(3):1158-1163.
31. Velasco AM, Leguina JI, Lazcano A: Molecular evolution of the lysine biosynthetic pathways. J Mol Evol 2002, 55(4):445-459.
32. Kohlhaw GB: Leucine biosynthesis in fungi: entering metabolism through the back door. Microbiol Mol Biol Rev 2003, 67(1):1-15.
33. Hagelstein P, Schultz G: Leucine synthesis in spinach chloroplasts: partial characterization of 2-isopropylmalate synthase. Biol Chem Hoppe-Seyler 1993, 374(12):1105-1108.
34. Hagelstein P, Sieve B, Klein M, Jans H, Schultz G: Leucine synthesis in chloroplasts: Leucine/isoleucine aminotransferase and valine aminotransferase are different enzymes in spinach chloroplasts. J Plant Physiol 1997, 150(1-2):23-30.
35. Tamakoshi M, Yamagishi A, Oshima T: The organization of the leuC, leuD and leuB genes of the extreme thermophile Thermus thermophilus. Gene 1998, 222(1):125-132.
36. Ho CL, Noji M, Saito M, Yamazaki M, Saito K: Molecular characterization of plastidial phosphoserine aminotransferase in serine biosynthesis from Arabidopsis. Plant J 1998, 16(4):443-452.
37. Song JT, Lu H, Greenberg JT: Divergent roles in Arabidopsis thaliana development and defense of two homologous genes, ABERRANT GROWTH AND DEATH 2 and AGD2-LIKE DEFENSE RESPONSE PROTEIN 1, encoding novel aminotransferases. Plant Cell 2004, 16(2):353-366.
38. Neuhaus HE, Emes MJ: Nonphotosynthetic metabolism in plastids. Annu Rev Plant Phys 2000, 51(1):111-140.
39. Emes MJ, Neuhaus HE: Metabolism and transport in non-photosynthetic plastids. J Exp Bot 1997, 48(317):1995-2005.
40. Yonekura-Sakakibara K, Onda Y, Ashikari T, Tanaka Y, Kusumi T, Hase T: Analysis of reductant supply systems for ferredoxin-dependent sulfite reductase in photosynthetic and nonphotosynthetic organs of maize. Plant Physiol 2000, 122(3):887-894.
41. Schultz DJ, Suh MC, Ohlrogge JB: Stearoyl acyl carrier protein and unusual acyl-acyl carrier protein desaturase activities are differentially influenced by ferredoxin. Plant Physiol 2000, 124(2):681-692.
42. Onda Y, Matsumura T, Kimata-Ariga Y, Sakakibara H, Sugiyama T, Hase T: Differential interaction of maize root ferredoxin:NADP(+) oxidoreductase with photosynthetic and non-photosynthetic ferredoxin isoproteins. Plant Physiol 2000, 123(3):1037-1045.
43. Baumann U, Juttner J: Plant thioredoxins: the multiplicity conundrum. Cell Mol Life Sci 2002, 59:1042-1057.
44. Balmer Y, Koller A, del Val G, Manieri W, Schurmann P, Buchanan BB: Proteomics gives insight into the regulatory function of chloroplast thioredoxins. Proc Natl Acad Sci USA 2003, 100(1):370-375.
45. Wilson RJ, Rangachari K, Saldanha JW, Rickman L, Buxton RS, Eccleston JF: Parasite plastids: maintenance and functions. Philos Trans R Soc Lond, B, Biol Sci 2003, 358(1429):155-162.
46. Møller SG, Kunkel T, Chua N-H: A plastidial ABC protein involved in intercompartmental communication of light signaling. Genes Dev 2001, 15:90-103.
47. Leustek T, Martin MN, Bick JA, Davies JP: Pathways and regulation of sulfur metabolism revealed through molecular and genetic studies. Annu Rev Plant Phys 2000, 51:141-165.
48. Lillig CH, Schiffmann S, Berndt C, Berken A, Tischka R, Schwenn JD: Molecular and catalytic properties of Arabidopsis thaliana adenylyl sulfate (APS) kinase. Arch Biochem Biophys 2001, 392(2):303-310.
49. Hell R, Jost R, Berkowitz O, Wirtz M: Molecular and biochemical analysis of the enzymes of cysteine biosynthesis in the plant Arabidopsis thaliana. Amino Acids 2002, 22(3):245-257.
50. Jost R, Berkowitz O, Wirtz M, Hopkins L, Hawkesford MJ, Hell R: Genomic and functional characterization of the OAS gene family encoding O-acetylserine (thiol) lyases, enzymes catalyzing the final step in cysteine biosynthesis in Arabidopsis thaliana. Gene 2000, 253(2):237-247.
51. Ravina CG, Chang CI, Tsakraklides GP, McDermott JP, Vega JM, Leustek T, Gotor C, Davies JP: The sac mutants of Chlamydomonas reinhardtii reveal transcriptional and posttranscriptional control of cysteine biosynthesis. Plant Physiol 2002, 130(4):2076-2084.

Chapter 4 - The complete plastid genome sequence of the parasitic green alga Helicosporidium sp. is highly reduced and structured∗

Introduction

Plastids are organelles found in plants and algae. Plastids originated in the endosymbiotic uptake of a cyanobacterium, which was subsequently transformed from a complex free-living bacterium to the highly specialized organelle now integrated with its host. At the genomic level, this integration involved the loss of many genes and the transfer of many more to the host nuclear genome, the protein products of which are targeted back to the organelle [1]. This process is not complete, however, as all known plastids have retained a residual genome that encodes a handful of RNA and protein-coding genes, which typically include many of the key components of photosystems I and II [2]. Our concept of plastids is inextricably tied to photosynthesis, since this is the dominant metabolic process of most plastids. They are, however, metabolically diverse organelles that play a role in the biosynthesis of amino acids, fatty acids, isoprenoids and heme, as well as in other processes related to photosynthesis such as pigment biosynthesis and radical detoxification. Indeed, in several lineages of plants and algae photosynthesis has been lost altogether, but the plastid has been retained for these and other purposes [3]. Well-known examples of this include holoparasitic and mycotrophic plants, many lineages of heterotrophic algae and parasitic apicomplexans (such as the malaria parasite). In most plastid genomes, the majority of genes encode products involved in either gene expression or photosynthesis. When photosynthesis is lost, so are most or all of the related genes, leading to dramatic changes in the plastid genome in size, coding capacity, and often also structure.

These genomes offer an opportunity to study the effects of massive genomic changes following a functional shift.
Unfortunately, the number of fully sequenced non-photosynthetic plastid genomes is small, being limited to Epifagus virginiana (a holoparasitic angiosperm), Euglena longa (a heterotrophic euglenid), and several apicomplexan parasites bearing secondary plastids of red algal origin called apicoplasts (Plasmodium falciparum, Theileria parva, Eimeria tenella and Toxoplasma gondii). The E. virginiana plastid genome is about half the size of typical angiosperm plastid genomes, having lost all its photosynthetic genes, but is otherwise similar to those of its relatives in many ways, including non-coding DNA content, synteny of remaining genes and overall structure [4]. Likewise, E. longa has lost most of the photosynthetic genes found in the plastid of its close relative Euglena gracilis, but they share many features that are unique to euglenids, such as three tandem repeats of the rRNA operon and a multitude of distinctive introns [5]. Apicomplexan plastid genomes, however, are quite different from those of other secondary red algal plastids found in photosynthetic lineages. They have undergone extensive rearrangements, are exceedingly small (~35 kb) and compact, and contain very little non-coding DNA [6-9]. The distinctiveness of apicomplexan plastid genomes may simply be due to time: apicomplexan plastids probably lost photosynthesis long ago in the ancestor of this diverse group, whereas other sequenced non-photosynthetic plastid genomes come from organisms with close relatives that are photosynthetic.

∗ A version of this chapter has been published. de Koning AP, Keeling PJ: The complete plastid genome sequence of the parasitic green alga, Helicosporidium sp. is highly reduced and structured. BMC Biol 2006, 4(1):12.

To further examine the process of genome reduction after the loss of a major metabolic function, we have completely sequenced the genome of the non-photosynthetic plastid of Helicosporidium sp., a parasitic green alga.
Helicosporidia are obligate parasites of invertebrates with a unique morphology and infection strategy [10]. Their evolutionary origin was disputed until recently, when molecular evidence surprisingly showed that they are highly specialized trebouxiophyte green algae, specifically related to the opportunistic vertebrate parasites, Prototheca [11, 12]. This led to the prediction that Helicosporidia contain plastids, and although these have not yet been visually identified, molecular evidence has confirmed their existence [13-15]. The function of this organelle has been investigated by examining nucleus-encoded plastid-targeted proteins, which cumulatively suggest the Helicosporidium plastid is functionally similar in many ways to that of apicomplexan parasites [15]. Here, we show that the Helicosporidium plastid genome, while retaining many features confirming its phylogenetic affiliation, has been radically reduced in a non-random, structured fashion. The result is a genome that is highly ordered with regard to several characteristics such as coding strand and the selective loss of tRNAs. Comparing the Helicosporidium plastid genome to those of other green algae and more distantly related non-photosynthetic plastids raises the interesting possibility that the 'structured-reduction' of Helicosporidium represents a common fate of such genomes.

Methods

Cell culture and genomic DNA isolation

Helicosporidium sp. (ATCC 50920, isolated from the blackfly Simulium jonesii) was cultured axenically in TNM-FH insect medium (Sigma-Aldrich) supplemented with 5% fetal bovine serum and 50 mg/ml of gentamycin at 25 °C in the dark. Cells were harvested by centrifugation and ground under liquid nitrogen. Total genomic DNA was extracted from the ground cells using the Plant DNeasy Mini Kit (Qiagen).

Genome sequencing

The accD and cysT genes were amplified by PCR using the degenerate primer pairs GGCGTGATGGACTTYCANTTYATGG/GCCGTCACCCCNCCNGTNGTNG and GACTACTATGTGGAYYTNCCNTTYGC/GCCCCGAAGTARTCRTAYTGYTC, respectively. In addition, a fragment containing a portion of the rpoC1 and rpoC2 genes was sequenced as part of an ongoing genomic sequence survey, and two sequences (a partial SSU rRNA and rps12/rps7/tufA/rpl2) were characterized previously. These four sequences were used as anchors for long-range PCR reactions containing 1 U Elongase polymerase mix (Invitrogen), 1.5 mM Mg2+ buffer, 100 ng template DNA, 200 µM dNTPs and 0.2 µM each of two primers, resulting in eight overlapping fragments of the plastid genome ranging in size from 867 to 8168 bp. These fragments were TOPO TA cloned (Invitrogen) and sequenced on both strands by primer walking using BigDye terminator chemistry (ABI). Sequences were assembled using Sequencher (Gene Codes Corporation), yielding a circular molecule, with a total of about 2.4 kb of overlap between clones. The plastid genome sequence has been deposited in GenBank with accession number DQ398104.

Annotation and analyses

Protein-coding genes were initially identified by BLASTX [44] searches of NCBI protein databases. In cases of divergent sequence and/or length heterogeneity such that the Helicosporidium ORF could be defined by more than one initiation codon, the longest non-overlapping ORF was selected. Ultimately, 94.9% of the genome consisted of ORFs with detectable homologues in other plastid genomes, giving high confidence that all genes were identified and annotated. Ribosomal RNA (rRNA) genes were identified by BLASTN searches against the plastid genome database at NCBI. Endpoints of rRNA genes were determined by alignment with trebouxiophyte plastid rRNA genes. Transfer RNA (tRNA) genes were identified using tRNAscan-SE [45]. All non-coding regions were re-analyzed with BLASTX and BLASTN searches, revealing no detectable matches.
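
The ORF-selection rule described above (where several initiation codons are possible, take the longest ORF that does not overlap a neighbouring gene) can be illustrated with a short sketch. This is not the procedure used for the annotation, which relied on BLASTX homology; the function name `longest_orfs`, the `min_start` parameter and the restriction to ATG starts on the forward strand are assumptions made purely for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"  # assumption: only ATG is treated as an initiation codon


def longest_orfs(seq, min_start=0):
    """Return (start, end) of the longest ATG-initiated ORF before each stop.

    For every in-frame stop codon, the most upstream ATG since the previous
    stop is used, which gives the longest possible ORF. Starts upstream of
    `min_start` (a hypothetical end coordinate of the preceding gene) are
    ignored, so that the chosen ORF does not overlap its neighbour.
    Coordinates are 0-based, end-exclusive, forward strand only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        first_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START and first_start is None and i >= min_start:
                first_start = i
            elif codon in STOPS:
                if first_start is not None:
                    orfs.append((first_start, i + 3))
                first_start = None
    return orfs


# Toy sequence with two in-frame ATG codons upstream of one stop codon.
toy = "CCATGAAAATGCCCTAAGG"
print(longest_orfs(toy))               # longest ORF uses the first ATG
print(longest_orfs(toy, min_start=5))  # first ATG would overlap a neighbour
```

A real annotation pipeline would also scan the reverse strand and allow alternative initiation codons; those are omitted here to keep the selection rule itself visible.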
Mean intergenic distances were calculated from intergenic spaces between all genes, with overlapping genes given a value of 0.

Results and discussion

Genome size and density of coding regions

The Helicosporidium sp. plastid genome was determined to be a circle 37,454 bp in length with a gene map as shown in Figure 4.1. It has an overall GC content of 26.9%, which is lower than that of most plastid genomes, but not as extreme as the 13.1% observed in the Plasmodium falciparum apicoplast [9]. Non-coding regions are more AT-rich (14.7% GC) than genes, and are small. Gene density is high, with only 5.1% non-coding DNA and an average intergenic space of 36 bp. Four gene pairs (ycf1-V(UAC), rpoC1-rpoC2, rps19-rps3, and rpl20-W(CCA)) overlap by between 4 and 27 bp.

Figure 4.1 - Gene map of Helicosporidium plastid DNA
Genes on the inside of the map are transcribed clockwise, while those on the outside are transcribed counter-clockwise. Transfer RNAs are indicated by the one-letter amino acid code followed by the anticodon in parentheses. rRNA: ribosomal RNA subunits, rpo: RNA polymerase subunits, tufA: elongation factor Tu, tilS: tRNA(Ile)-lysidine synthetase, ftsH: ftsH protease, cysT: sulfate transport protein, ycf1: conserved plastid protein of unknown function, accD: acetyl-CoA carboxylase beta subunit.

Helicosporidium has by far the smallest plastid genome of any known Viridiplantae (plants and green algae), and is the smallest sequenced plastid genome outside those of apicomplexan parasites. One of the most compact plastid genomes reported so far is that of the primitive red alga Cyanidioschyzon merolae, which although extremely gene rich was reported by Ohta et al. [16] to have a median intergenic distance of just 14 bp. Using the same measure of compactness as Ohta et al.
(the median of intergenic spaces between adjacent protein-coding genes, where overlapping genes have a negative intergenic space), Helicosporidium is comparably compact, with a median intergenic distance of 8 bp. A comparison of the genomic features of non-photosynthetic plastids and their photosynthetic relatives is presented in Table 4.1. Compared with the photosynthetic trebouxiophyte Chlorella vulgaris, the Helicosporidium plastid has undergone a 4-fold reduction in genome size through large-scale gene loss (4-fold), compaction of the remaining genes with smaller intergenic regions (7-fold) and an overall lower proportion of non-coding sequence (3.7-fold). The opportunistic parasite Prototheca wickerhamii is a close relative of Helicosporidium [11, 12, 14], and has genome characteristics intermediate to Helicosporidium and C. vulgaris. At an estimated 54 kb [17], the P. wickerhamii plastid is one third the size of the C. vulgaris plastid with less non-coding DNA and more densely packed genes, but is reduced to a much lesser extent than Helicosporidium. Comparing other non-photosynthetic plastid genomes with photosynthetic relatives reveals that the reduction and compaction of Epifagus virginiana and Euglena longa are not as substantial (about a 2-fold reduction in size). Plastids of red algae and their derivatives tend to have more genes than those of green plastid lineages [1, 2], so it is interesting that the smallest and most compact genomes are found among the red plastids of Apicomplexa. A sister group comparison is difficult for this group, since the closest relatives of Apicomplexa are dinoflagellates, the plastid genomes of which are difficult to compare with other plastids because they have been transformed into single gene mini-circles [18]. 
However, the photosynthetic ancestors of Apicomplexa were probably similar to other secondary red plastid-containing organisms, such as Odontella sinensis (Table 4.1), which would indicate a 4-fold reduction in the plastid genome and an even more extreme level of compaction.

Table 4.1 - Plastid genome features compared between non-photosynthetic plastids and photosynthetic relatives

* The Prototheca genome is unfinished and its size is estimated from a restriction map [17], while the percentage non-coding DNA and mean intergenic distance are calculated from available sequences, constituting about half the genome.

Genome structure and organization

Unlike most plastid genomes, the Helicosporidium genome does not contain an inverted repeat (Figure 4.1). Although inverted repeats are probably an ancestral character state for all plastids, they have been independently lost in several lineages. Among the green algal plastids investigated so far, the inverted repeat is absent in charophytes (Staurastrum punctulatum and Zygnema circumcarinatum [19]), ulvophytes (Caulerpa sertularioides [20] and Codium fragile [21]) and the trebouxiophyte Chlorella vulgaris [22], but is present in Chlorella ellipsoidea [23]. More interestingly, Helicosporidium has also lost the ribosomal RNA (rRNA) operon structure, which is a nearly universal feature of genomes, including those of prokaryotes, eukaryotes and organelles. The plastid rRNA operon is normally part of the inverted repeat when it is present, and consists of the small and large subunit (SSU and LSU) rRNA genes separated by a spacer region containing the tRNA-Ile and tRNA-Ala genes. In Helicosporidium, the rRNA genes are separated by 22.6 kb of sequence, but the tRNA-Ile and tRNA-Ala genes remain associated with the SSU and LSU genes, respectively, such that a typical rRNA operon has been broken in half and distributed at opposing ends of the circle (Figure 4.1).
While the vast majority of plastids have the rRNA operon, it has been disrupted in C. ellipsoidea, S. punctulatum and the P. falciparum and coccidian apicoplasts, where the SSU and LSU rRNA genes are adjacent to each other but encoded on opposite strands [6, 7, 9, 19, 23]. It has also been disrupted in the charophytes Z. circumcarinatum [19] and Spirogyra maxima [24], and the ulvophytes C. sertularioides [20] and C. fragile [21], where the two rRNA genes are located on the same strand but far apart, as in Helicosporidium. This genome rearrangement has therefore occurred in at least three independent lineages, and may be an outcome of loss of the inverted repeat.

The most striking feature of the Helicosporidium genome is the symmetry shown in strand bias of coding regions (Figure 4.1). The rRNA genes are nearly diametrically opposed, and all but two proteins and one tRNA on one side of them are on the same strand, while all but one tRNA on the other side are on the opposite strand. Similarly organized coding strand biases are also found in some apicomplexan plastids and in the euglenid plastids, but the bias is not as strong. In P. falciparum, the coding strand switch occurs between the adjacent inverted repeats, each of which encodes LSU and SSU rRNA and nine tRNA genes [9] and contains the origin of replication [25]. In Euglena, the coding strand symmetry is bounded at one end by the origin of replication and at the other end by the replication termination site [26]. In these organisms, the majority of genes are transcribed in each direction away from the bidirectional origin of replication, such that the leading strand of replication is largely the coding strand. Such bias is also widespread among bidirectionally replicating prokaryotic genomes, where it is hypothesized to be the result of selection to minimize collisions between DNA and RNA polymerases moving in opposite directions [27].
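
Where a bidirectional origin of replication is suspected, its position is commonly estimated from cumulative GC skew: the running sum of [G-C]/[G+C] over adjacent windows, whose global minimum and maximum are expected to fall near the origin and terminus of replication. The following is a minimal sketch of that calculation, assuming 37-base non-overlapping windows as stated in the Figure 4.2 caption. The function names and the toy sequence are hypothetical, and a trailing partial window is simply dropped.

```python
def cumulative_gc_skew(seq, window=37):
    """Cumulative (G - C) / (G + C) over non-overlapping adjacent windows.

    Returns a list of (window_start, cumulative_skew) pairs. Windows with
    no G or C contribute 0; a final partial window is ignored.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if (g + c) else 0.0
        curve.append((start, total))
    return curve


def skew_extremes(curve):
    """Window starts of the global minimum and maximum of the curve."""
    lowest = min(curve, key=lambda point: point[1])
    highest = max(curve, key=lambda point: point[1])
    return lowest[0], highest[0]


# Toy sequence (not real data): a C-rich half followed by a G-rich half,
# so the cumulative curve falls, turns at the boundary, then rises.
toy = "CCCA" * 5 + "GGGA" * 5
curve = cumulative_gc_skew(toy, window=4)
print(skew_extremes(curve))
```

Because a plastid genome is circular, the starting coordinate of such a plot is arbitrary; only the positions of the extremes relative to the genes matter.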
Notably, one of only two non-coding regions larger than 100 bp in the Helicosporidium plastid genome is situated between the SSU rRNA and tRNA-Glu genes, at one of the crossover points in strand selection. To investigate whether this could be a replication origin, we calculated a windowed cumulative GC skew ([G-C]/[G+C]). These plots detect changes in compositional bias of guanine over cytosine along a sequence, which presumably occur because of strand-specific mutational biases during replication, and their global minimum and maximum points correspond to the origin and termination of replication for genomes with bi-directional replication origins [28, 29]. The cumulative GC skew of the Helicosporidium genome reveals a global minimum and maximum in the regions on either side of the SSU and LSU genes (Figure 4.2), lending support to the idea that the origin of replication is located as marked in Figure 4.2. If this is so, the Helicosporidium plastid is like those of apicomplexans and euglenids in that almost all genes are encoded on the leading strand of replication, and the observed coding strand symmetry would be an adaptation for co-directional replication and transcription. Interestingly, this bilateral symmetry is not seen in most other plastid genomes, even though the selection pressure should be universal for bi-directionally replicating circular genomes. However, other selective pressures might also increase coding strand bias. A recent examination of plastid gene order in green algae and plants showed that in the Chlamydomonas reinhardtii plastid genome, adjacent genes were more often functionally related and clustered on the same strand than in an inferred ancestral genome, and that this clustering is significantly higher than would be expected from random genome rearrangements [30]. At least in the highly rearranged plastid genome of C. reinhardtii, and perhaps more generally, increased coding strand bias seems to be an outcome of selection for co-transcription of genes of common function.

Figure 4.2 - GC skew diagram for the Helicosporidium plastid genome
The sum of [G-C]/[G+C] calculated in 37 base, non-overlapping adjacent windows is shown superimposed on a linearized scale map of the entire genome, starting between the LSU and rpoC2 genes. The global minimum of this plot corresponds to the proposed origin of replication.

The comparison of gene order with C. vulgaris reveals that, in addition to the large-scale changes in genome structure, smaller rearrangements have also been common (Figure 4.3). The most obvious differences between the genomes are the many large deletions in Helicosporidium. These missing segments encode mostly photosynthetic products, but also clpP protease, the cell division proteins minD and minE, and several tRNAs and ribosomal proteins. In the remaining shared segments, synteny is low between Helicosporidium and C. vulgaris. Some conserved blocks do remain, such as a large string of genes for ribosomal proteins that are co-directionally transcribed, including L2, S19, S3, L16, L14, L5, S8, L36, and S11, and RNA polymerase subunit A (rpoA). This particular block of genes is conserved, with some lineage-specific deletions, in all plastids and is probably co-expressed. The partial plastid genome of P. wickerhamii shows considerable rearrangement of genes when compared to either C. vulgaris or Helicosporidium, suggesting rapid and ongoing rearrangements in these genomes.

Figure 4.3 - Gene order comparisons between trebouxiophyte plastid genomes
Genomes are drawn to scale. Genes are depicted as boxes, with those above the line transcribed left-to-right and those below transcribed right-to-left. Coloured boxes represent identity with genes found in Helicosporidium sp., grey-shaded boxes represent identity between genes found in Chlorella vulgaris and Prototheca wickerhamii, and open boxes represent genes found only in C. vulgaris. Segments of the C. vulgaris genome that have been completely lost in Helicosporidium are shown in call-out boxes above the remainder of the genome. Straight lines join the genomes at the centre points of protein-coding and rRNA genes. For orientation, the rRNA genes are indicated.

Gene content

The Helicosporidium genome encodes 26 proteins, 3 rRNAs and 25 tRNAs. The only intron is a group I intron in the tRNA-Leu (UAA) gene. This particular intron is commonly found in cyanobacteria and plastids and may be an ancestral plastid feature, although lineage-specific losses have occurred among green algae [31]. No unique ORFs of appreciable size were found and most of the protein-coding genes in Helicosporidium are identifiable as housekeeping genes involved in transcription and translation (Figure 4.1). These include 16 ribosomal proteins, an elongation factor and components of an RNA polymerase (rpo). In plants, this polymerase is responsible mainly for transcription of genes associated with photosynthesis. In non-photosynthetic plants, algae and apicomplexans [4, 5, 9], some or all of the rpo subunits have been lost from the plastid genome, and it is thought that a separate nuclear-encoded polymerase is responsible for plastid transcription. In the Helicosporidium plastid, however, all four subunits of the RNA polymerase complex (rpoA, rpoB, rpoC1, and rpoC2) are present. The Helicosporidium plastid also encodes tRNA(Ile)-lysidine synthetase (tilS), which is responsible for modifying the CAU anticodon of a unique tRNA that is cognate for isoleucine. This tRNA is universally found among bacteria and plastids [32]. In plastids, however, tilS is generally encoded in the nuclear genome and targeted to the organelle.
In addition to Helicosporidium, tilS is also plastid-encoded in the Rhodophyta, and in the green algae Nephroselmis olivacea, C. vulgaris, Chaetosphaeridium globosum and Mesostigma viride.

Only four protein-coding genes encode products not involved in transcription or translation: FtsH protease, which degrades membrane-bound proteins [33, 34]; ycf1, a poorly conserved gene of unknown function that has been shown to be essential [35, 36]; acetyl-CoA carboxylase beta subunit (accD), which is involved in fatty acid biosynthesis [37]; and a sulfate transport protein (cysT) [38]. These four genes have a scattered distribution among plastid genomes. FtsH protease is found in red algae, chromists (algae with plastids of secondary red algal origin) and green algae, but is nuclear-encoded in plants. AccD is found in most plants, green algae and red algae. Ycf1 is only found in the plastid genomes of plants and green algae, while cysT is restricted to green algae, a few lower plants and one red alga. Other components of the metabolic pathways in which accD and cysT participate are known to be encoded in the nucleus of Helicosporidium and targeted to the plastid, confirming these as metabolic functions of the organelle [15]. As expected, no genes involved in photosynthesis or bioenergetic processes were found.

As noted earlier, P. wickerhamii probably represents an intermediate form between autotrophic, Chlorella-like trebouxiophytes and the highly reduced Helicosporidium. Over half the P. wickerhamii plastid genome is known, and no photosystem, electron transport or chlorophyll biosynthesis proteins have been found. However, the P. wickerhamii plastid does encode genes for six of the subunits of ATP synthase [17], which are not present in Helicosporidium or apicoplast genomes.
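The gene counts given in this section can be cross-checked with a short tally. The sketch below is only a bookkeeping aid: the category labels are illustrative, and it assumes that "an elongation factor" is a single gene, that the polymerase components are the four named rpo subunits, and that tilS is counted among the translation-related genes.

```python
# Protein-coding gene counts as stated in the text. The grouping and labels
# are illustrative assumptions, not an annotation of the genome itself.
transcription_translation = {
    "ribosomal proteins": 16,
    "elongation factor": 1,  # assumed to be a single gene
    "RNA polymerase subunits (rpoA, rpoB, rpoC1, rpoC2)": 4,
    "tRNA(Ile)-lysidine synthetase (tilS)": 1,
}
other_functions = {"ftsH": 1, "ycf1": 1, "accD": 1, "cysT": 1}

n_housekeeping = sum(transcription_translation.values())
n_other = sum(other_functions.values())
total_proteins = n_housekeeping + n_other

print(f"transcription/translation: {n_housekeeping}")
print(f"other functions:           {n_other}")
print(f"total protein-coding:      {total_proteins}")
```

Under these assumptions the categories account for all 26 protein-coding genes reported for the genome.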
The Helicosporidium plastid encodes a minimal set of tRNAs

The Helicosporidium plastid genome contains just 25 tRNAs, which is among the smallest number of tRNA genes documented to date in a plastid genome (Table 4.1). This is in part due to a reduction in tRNA gene copy number, such that the Helicosporidium plastid encodes only a single copy of each tRNA with a particular anticodon. Multiple tRNA gene copies are otherwise universally found in plastids, sometimes independently (as in C. vulgaris) and sometimes as part of the inverted repeat (e.g., P. falciparum). Moreover, the Helicosporidium plastid genome contains a minimal functional set of tRNAs for a genome using all 61 sense codons and the universal genetic code.

The set of tRNAs in Helicosporidium is a good illustration of the degree of order in the reduction of this genome. There are twenty amino acids and each is represented by a single tRNA except leucine, serine, arginine, methionine and isoleucine (Figure 4.4). Leucine, serine, and arginine are distinguished by having six codons and so require two tRNAs: one for four codons and another for the other two. Methionine has a single codon, but requires an initiator tRNA and a second one for internal methionine codons. Isoleucine, with three codons, requires two tRNAs: one for the pair of codons ending in a pyrimidine (Y) and a second that is the substrate for tRNA(Ile)-lysidine synthetase, which modifies C in the first position of the anticodon to lysidine (L), making it cognate for the codon AUA. Conceptually, this is a minimum complement of tRNAs for plastids: one each for the twelve 2-fold degenerate codon groups, one each for the eight 4-fold degenerate codon groups, one tryptophan, one initiator methionine, one elongation methionine, one for the AUY pair of isoleucine codons, and the modified tRNA-Ile.
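The count of 25 can be reproduced directly from the standard genetic code. The following is a minimal sketch, not the method used in this work: it assumes that a single tRNA can read a full 4-fold group, a pyrimidine-ending pair (XXY) or a purine-ending pair (XXR), and that every leftover single codon needs its own tRNA. The function and variable names are hypothetical. The 13 pairs it finds correspond to the twelve 2-fold degenerate groups plus the AUU/AUC pair of isoleucine, and the three single codons are UGG (Trp), AUG (Met) and AUA (Ile).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons run UUU, UUC, UUA, UUG, UCU, ... with the
# third base cycling fastest. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_groups(code):
    """Classify sense codons by third-position degeneracy within each codon box."""
    counts = {"four_fold": 0, "pair_Y": 0, "pair_R": 0, "single": 0}
    for first, second in product(BASES, repeat=2):
        # Collect, per amino acid, the third bases it uses in this box.
        thirds_by_aa = {}
        for third in BASES:
            aa = code[first + second + third]
            if aa != "*":
                thirds_by_aa.setdefault(aa, set()).add(third)
        for thirds in thirds_by_aa.values():
            if thirds == set(BASES):
                counts["four_fold"] += 1
                continue
            if {"U", "C"} <= thirds:  # pair ending in a pyrimidine
                counts["pair_Y"] += 1
                thirds = thirds - {"U", "C"}
            if {"A", "G"} <= thirds:  # pair ending in a purine
                counts["pair_R"] += 1
                thirds = thirds - {"A", "G"}
            counts["single"] += len(thirds)
    return counts


counts = count_codon_groups(CODE)
# One tRNA per group, plus one extra because AUG needs both an initiator
# and an elongator tRNA-Met.
minimal_trnas = sum(counts.values()) + 1
print(counts)
print("minimal tRNA set:", minimal_trnas)
```

Grouping by codon box rather than by amino acid is what splits the six-codon amino acids (leucine, serine, arginine) into one 4-fold group and one pair each, matching the two tRNAs the text assigns to them.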
In general, plastids use more than the minimal set: about 32 different tRNA species are usually found because more than one isoacceptor is often used to decode the 4-fold degenerate groups of serine, leucine, threonine, arginine and glycine and the 2-fold degenerate lysine (UAR) group. Helicosporidium minimizes the number of isoacceptors used, by complete utilization of 3rd position wobble.

Figure 4.4 - Codon frequency and tRNAs in the Helicosporidium plastid
*Wobble rules indicate the allowed mismatches between the first position of the anticodon and the third position of the codon (other positions, indicated by ‘X’, follow standard Watson-Crick base-pairing rules): 1) GXX anticodons read XXU and XXC codons, 2) CXX anticodons read XXG codons, 3) AXX anticodons (where A is modified to I) read XXN codons, 4) UXX anticodons (where U is modified) read XXA and XXG codons, 5) UXX anticodons read XXN codons, and 6) LAU anticodon (where C is modified to L) reads AUA [32, 39-41].

As Figure 4.4 shows, the complement of tRNAs encoded in the Helicosporidium genome is sufficient to decode all codons found in the mRNA, assuming that some known modifications [32, 39] are used. Furthermore, Helicosporidium has retained only specific isoacceptors. For example, in every one of the eight degenerate codon pairs ending in a pyrimidine, the genome encodes only the tRNA with an anticodon that starts with guanine. Likewise, the genome encodes only the tRNA with an anticodon that starts with uridine for every one of the five codon pairs ending in a purine, and for seven of the eight 4-fold degenerate groups. The single exception to this uniformity is tRNA-Arg (ACG). The first anticodon position A is presumably modified to inosine and reads all four Arg codons, as happens in plant plastids [40, 41].

Once again, the closest comparison for this type of reduction lies in the non-photosynthetic plastids. P. falciparum takes nearly complete advantage of the wobble rules but uses two anticodons for glycine. E. tenella, T. gondii and T. parva have dispensed with the extraneous tRNA-Gly and use the same suite of tRNAs as Helicosporidium. Curiously, however, all the Apicomplexa appear to lack the modified tRNA-Ile (CAT reading ATA). The ATA codon frequently appears in these genomes, so either a unique and unknown modification system must exist [42], or they import a tRNA. In the holoparasitic plant E. virginiana, a number of tRNAs have been lost or exist as pseudogenes. Seven essential anticodons are missing, so it has been suggested that E. virginiana must import tRNAs [43]. Helicosporidium is therefore unique in that it has reduced its tRNA complement to a minimum without inventing new modifications, changing its genetic code or importing tRNAs from the cytoplasm; instead, it has simply done away with all redundant tRNAs to leave the perfect minimal set for the universal code.

Interestingly, there is a strong AT codon bias in Helicosporidium protein-coding genes (the arginine codon CGG is used only once in the entire genome), and this bias is often counter to the tRNA complement (Figure 4.4). In some systems, there is a correlation between codon bias and which tRNA genes are present in the genome, and this is assumed to occur by selection for increased translation efficiency [32]. However, in Helicosporidium, codon bias is clearly a result of an overall high AT bias, while the presence or absence of tRNAs is dictated by wobble rules.

Concluding remarks

When a major metabolic shift occurs, many genes may be lost. If photosynthesis disappears, this loss of genes can represent a large proportion of the plastid genome, so the effect is severe. By itself, however, this loss does not explain the nature of these reduced genomes, because there is no obvious reason that the resulting genome could not be similar in form to its photosynthetic ancestor, but reduced in content.
Indeed, this is what we see in the holoparasitic plant E. virginiana and the heterotrophic euglenid E. longa. However, the genomes of the apicoplast and Helicosporidium are different; these genomes are highly reduced but more ordered than their ancestors’ were. At least some aspects of this ‘structured reduction’ appear to be related to high coding density: the symmetrical coding strand bias probably developed to co-ordinate transcription and replication, and the elegant utilization of wobble rules probably serves to reduce the complement of tRNA genes to a minimal functional set. The Apicomplexa and Helicosporidium are not closely related; indeed, among plastid types they could hardly be more different: the apicoplast is a secondary plastid derived from a red alga whereas the Helicosporidium plastid is a primary green algal plastid. Their genomes have both retained characteristics that betray these origins, but they have also converged on a similar form in many ways. It is possible there are specific selective pressures operating here that are not important to other sequenced non-photosynthetic plastid genomes, or it could be that this is a predictable outcome for the evolution of these genomes, and less ordered examples are simply not there yet. Either way, the overall forms of apicomplexan and Helicosporidium plastid genomes have been shaped in parallel by common evolutionary forces. Comparing them raises interesting questions about whether there are selective pressures that lead genomes to compact, or whether compaction is simply a by-product of reduction that occurs for neutral reasons.

References

1. McFadden GI: Primary and secondary endosymbiosis and the origin of plastids. J Phycol 2001, 37:951-959.
2. Douglas SE: Plastid evolution: origins, diversity, trends. Curr Opin Genet Dev 1998, 8(6):655-661.
3. Williams BAP, Keeling PJ: Cryptic organelles in parasitic protists and fungi. Adv Parasitol 2003, 54:9-67.
4. Wolfe KH, Morden CW, Palmer JD: Function and evolution of a minimal plastid genome from a non-photosynthetic parasitic plant. Proc Natl Acad Sci USA 1992, 89(22):10648-10652.
5. Gockel G, Hachtel W: Complete gene map of the plastid genome of the nonphotosynthetic euglenoid flagellate Astasia longa. Protist 2000, 151(4):347-351.
6. Denny P, Preiser P, Williamson D, Wilson I: Evidence for a single origin of the 35 kb plastid DNA in apicomplexans. Protist 1998, 149(1):51-59.
7. Cai X, Fuller AL, McDougald LR, Zhu G: Apicoplast genome of the coccidian Eimeria tenella. Gene 2003, 321:39-46.
8. Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M et al: Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science 2005, 309(5731):134-137.
9. Wilson RJ, Denny PW, Preiser PR, Rangachari K, Roberts K, Roy A, Whyte A, Strath M, Moore DJ, Moore PW et al: Complete gene map of the plastid-like DNA of the malaria parasite Plasmodium falciparum. J Mol Biol 1996, 261(2):155-172.
10. Boucias DG, Becnel JJ, White SE, Bott M: In vivo and in vitro development of the protist Helicosporidium sp. J Eukaryot Microbiol 2001, 48(4):460-470.
11. Tartar A, Boucias DG, Adams BJ, Becnel JJ: Phylogenetic analysis identifies the invertebrate pathogen Helicosporidium sp. as a green alga (Chlorophyta). Int J Syst Evol Micr 2002, 52:273-279.
12. de Koning AP, Tartar A, Boucias DG, Keeling PJ: Expressed Sequence Tag (EST) survey of the highly adapted green algal parasite, Helicosporidium. Protist 2005, 156(2):181-190.
13. Tartar A, Boucias DG: The non-photosynthetic, pathogenic green alga Helicosporidium sp. has retained a modified, functional plastid genome. FEMS Microbiol Lett 2004, 233(1):153-157.
14. Tartar A, Boucias DG, Becnel JJ, Adams BJ: Comparison of plastid 16S rRNA (rrn 16) genes from Helicosporidium spp.: evidence supporting the reclassification of Helicosporidia as green algae (Chlorophyta). Int J Syst Evol Micr 2003, 53:1719-1723.
15. de Koning AP, Keeling PJ: Nucleus-encoded genes for plastid-targeted proteins in Helicosporidium: functional diversity of a cryptic plastid in a parasitic alga. Eukaryot Cell 2004, 3(5):1198-1205.
16. Ohta N, Matsuzaki M, Misumi O, Miyagishima SY, Nozaki H, Tanaka K, Shin IT, Kohara Y, Kuroiwa T: Complete sequence and analysis of the plastid genome of the unicellular red alga Cyanidioschyzon merolae. DNA Res 2003, 10(2):67-77.
17. Knauf U, Hachtel W: The genes encoding subunits of ATP synthase are conserved in the reduced plastid genome of the heterotrophic alga Prototheca wickerhamii. Mol Genet Genomics 2002, 267(4):492-497.
18. Zhang Z, Green BR, Cavalier-Smith T: Single gene circles in dinoflagellate chloroplast genomes. Nature 1999, 400:155-159.
19. Turmel M, Otis C, Lemieux C: The complete chloroplast DNA sequences of the charophycean green algae Staurastrum and Zygnema reveal that the chloroplast genome underwent extensive changes during the evolution of the Zygnematales. BMC Biol 2005, 3:22.
20. Lehman RL, Manhart JR: A preliminary comparison of restriction fragment patterns in the genus Caulerpa (Chlorophyta) and the unique structure of the chloroplast genome of Caulerpa sertularioides. J Phycol 1997, 33(6):1055-1062.
21. Manhart JR, Kelly K, Dudock BS, Palmer JD: Unusual characteristics of Codium fragile chloroplast DNA revealed by physical and gene mapping. Mol Gen Genet 1989, 216(2-3):417-421.
22. Wakasugi T, Nagai T, Kapoor M, Sugita M, Ito M, Ito S, Tsudzuki J, Nakashima K, Tsudzuki T, Suzuki Y et al: Complete nucleotide sequence of the chloroplast genome from the green alga Chlorella vulgaris: the existence of genes possibly involved in chloroplast division. Proc Natl Acad Sci USA 1997, 94(11):5967-5972.
23. Yamada T: Repetitive sequence-mediated rearrangements in Chlorella ellipsoidea chloroplast DNA: completion of nucleotide sequence of the large inverted repeat. Curr Genet 1991, 19(2):139-147.
24. Manhart JR, Hoshaw RW, Palmer JD: Unique chloroplast genome in Spirogyra maxima (Chlorophyta) revealed by physical and gene mapping. J Phycol 1990, 26(3):490-494.
25. Williamson DH, Denny PW, Moore PW, Sato S, McCready S, Wilson RJ: The in vivo conformation of the plastid DNA of Toxoplasma gondii: implications for replication. J Mol Biol 2001, 306(2):159-168.
26. Hallick RB, Hong L, Drager RG, Favreau MR, Monfort A, Orsat B, Spielmann A, Stutz E: Complete sequence of Euglena gracilis chloroplast DNA. Nucleic Acids Res 1993, 21(15):3537-3544.
27. French S: Consequences of replication fork movement through transcription units in vivo. Science 1992, 258(5086):1362-1365.
28. Grigoriev A: Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res 1998, 26(10):2286-2290.
29. Guy L, Roten CA: Genometric analyses of the organization of circular chromosomes: a universal pressure determines the direction of ribosomal RNA genes transcription relative to chromosome replication. Gene 2004, 340(1):45-52.
30. Cui L, Leebens-Mack J, Wang LS, Tang J, Rymarquis L, Stern DB, Depamphilis CW: Adaptive evolution of chloroplast genome structure inferred using a parametric bootstrap approach. BMC Evol Biol 2006, 6(1):13.
31. Simon D, Fewer D, Friedl T, Bhattacharya D: Phylogeny and self-splicing ability of the plastid tRNA-Leu group I Intron. J Mol Evol 2003, 57(6):710-720.
32. Marck C, Grosjean H: tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features. RNA 2002, 8(10):1189-1232.
33. Chiba S, Akiyama Y, Mori H, Matsuo E, Ito K: Length recognition at the N-terminal tail for the initiation of FtsH-mediated proteolysis. EMBO Rep 2000, 1(1):47-52.
34. Lindahl M, Spetea C, Hundal T, Oppenheim AB, Adam Z, Andersson B: The thylakoid FtsH protease plays a role in the light-induced turnover of the photosystem II D1 protein. Plant Cell 2000, 12(3):419-431.
35. Drescher A, Ruf S, Calsa T, Jr., Carrer H, Bock R: The two largest chloroplast genome-encoded open reading frames of higher plants are essential genes. Plant J 2000, 22(2):97-104.
36. Boudreau E, Turmel M, Goldschmidt-Clermont M, Rochaix JD, Sivan S, Michaels A, Leu S: A large open reading frame (orf1995) in the chloroplast DNA of Chlamydomonas reinhardtii encodes an essential protein. Mol Gen Genet 1997, 253(5):649-653.
37. Sasaki Y, Hakamada K, Suama Y, Nagano Y, Furusawa I, Matsuno R: Chloroplast-encoded protein as a subunit of acetyl-CoA carboxylase in pea plant. J Biol Chem 1993, 268(33):25118-25123.
38. Laudenbach DE, Grossman AR: Characterization and mutagenesis of sulfur-regulated genes in a cyanobacterium: evidence for function in sulfate transport. J Bacteriol 1991, 173(9):2739-2750.
39. Osawa S, Jukes TH, Watanabe K, Muto A: Recent evidence for evolution of the genetic code. Microbiol Rev 1992, 56(1):229-264.
40. Pfitzinger H, Weil JH, Pillay DT, Guillemaut P: Codon recognition mechanisms in plant chloroplasts. Plant Mol Biol 1990, 14(5):805-814.
41. Sugiura C, Sugita M: Plastid transformation reveals that moss tRNA(Arg)-CCG is not essential for plastid function. Plant J 2004, 40(2):314-321.
42. Preiser P, Williamson DH, Wilson RJ: tRNA genes transcribed from the plastid-like DNA of Plasmodium falciparum. Nucleic Acids Res 1995, 23(21):4329-4336.
43. Wolfe KH, Morden CW, Ems SC, Palmer JD: Rapid evolution of the plastid translational apparatus in a nonphotosynthetic plant: loss or accelerated sequence evolution of tRNA and ribosomal protein genes. J Mol Evol 1992, 35(4):304-317.
44. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
45. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997, 25(5):955-964.
Chapter 5 - Conclusions

Summary

Before the work presented in this thesis began, the molecular information from Helicosporidium was limited to partial sequences from only three nuclear genes: rRNA, actin and beta-tubulin [1]. The work presented here has led to a more than 100-fold increase in the publicly available molecular information from Helicosporidium, and Helicosporidium is now one of the better-sampled members of the green algae in terms of molecular data.

Specifically, my work has led to a number of insights into the origin, evolution, metabolism, and genomics of Helicosporidium. The green algal origin of Helicosporidium is now supported by analyses based on a large number of genes, and this origin suggests that it has evolved from a photosynthetic ancestor, and as such, should possess a plastid. Consistent with this, I identified 20 putatively plastid-targeted enzymes involved in a wide variety of metabolic pathways. A comparison of the suggested metabolic complexity of the Helicosporidium plastid with that of an unrelated, cryptic plastid-containing organism, Plasmodium falciparum, showed that the metabolic diversity of the Helicosporidium cryptic plastid exceeds that of Plasmodium, since it includes representatives of most of the pathways known to operate in Plasmodium as well as several others. In particular, several amino acid biosynthetic pathways have been retained, including the leucine biosynthesis pathway only recently recognized in plant plastids. Since this work was completed, a similar study was published on Prototheca wickerhamii [2]. Plastid-targeted proteins in P. wickerhamii reveal an even greater metabolic diversity than those in Helicosporidium, functioning in both carbohydrate metabolism and purine biosynthesis. These three parasites represent different evolutionary trajectories in plastid metabolic adaptation.
The loss of an autotrophic lifestyle likely leads to a gradual reduction in self-sufficiency, and in the current spectrum of such organisms, Helicosporidium is somewhere between relatively self-sufficient opportunistic parasites such as P. wickerhamii and highly reduced obligate parasites like Plasmodium. This is consistent with the fact that Helicosporidium is an obligate parasite with a free infectious cyst stage.

Loss of photosynthesis has occurred independently in many plant and algal lineages, and represents a major metabolic shift with potential consequences for the content and structure of plastid genomes. Chapter 4 investigated such changes by examining the complete plastid genome of Helicosporidium. It is among the smallest known (37.5 kb), and like other plastids from non-photosynthetic organisms it lacks all genes for proteins that function in photosynthesis. Its reduced size results from more than just loss of genes; it is also highly compacted. Indeed, although Helicosporidium is related to trebouxiophyte green algae, the genome is structured and compacted in a manner more reminiscent of the non-photosynthetic plastids of apicomplexan parasites, raising the interesting possibility that there are common forces that shape plastid genomes, subsequent to the loss of photosynthesis in an organism.

Reconstructing the evolution of major transitions like the origin of parasitism is a difficult problem. Since we cannot go back in time to observe the events, we need to compare several examples of the modern organisms that resulted from them. In some cases, such as algae becoming parasites, few such examples exist, so the greatest value of genomic data from Helicosporidium may be as a point of comparison with organisms like Plasmodium to help work out how these organisms got to be the way they are today.

Future work

This work has shown beyond any serious doubt that a plastid exists in Helicosporidium; however, the organelle has yet to be observed.
The Prototheca plastid is morphologically recognizable and easily identified in ultrastructural studies [3], because it is large and contains starch grains. Unfortunately, similar attempts to identify a plastid in Helicosporidium have not been successful [4, 5]. It is likely that the organelle is small and lacks the morphological characteristics that would make it readily identifiable, as is the case with the apicomplexan plastid. Like that of Helicosporidium, the cryptic plastid in Apicomplexa was first recognized through molecular sequence data, and only later was localized by using a fragment of the 16S rRNA gene as a probe for high resolution in situ hybridization [6]. With the plastid genome sequence now known, it would be possible to use this strategy to detect the Helicosporidium plastid.

Once the plastid has been identified, a variety of techniques could be used to further investigate many aspects of the work presented in this thesis. Most importantly, in chapter 3 the function of the plastid was examined through identification of proteins that are predicted to function in the plastid, based on their phylogeny and the presence of targeting peptides. While these are strong indicators of a plastid location, direct evidence for the location of these proteins is needed to confirm where the functions identified in chapter 3 take place. This could be tested by high-resolution immunolocalization (immuno-electron microscopy) of a number of the identified proteins. For example, direct evidence for a plastid location of acyl carrier protein, isopropylmalate synthase, and adenylylsulfate kinase would validate that fatty acid biosynthesis, leucine biosynthesis and sulfur assimilation are indeed occurring within the plastid. Such an approach was used to confirm that fatty acid biosynthesis is one of the main functions of the apicoplast [7].

The inferred function of the Helicosporidium plastid is based on a sample of the plastid proteome identified from EST sequencing.
This identified potential pathways, but no pathway was complete and others likely still exist undetected. A more complete picture could be obtained if the entire nuclear sequence of Helicosporidium were known. For example, a detailed metabolic map of the apicoplast was created based on predicted plastid-targeted genes identified from the complete nuclear genome of Plasmodium falciparum [8]. Because all enzymes of a particular pathway were accounted for, the authors were also able to identify some variations on pathways that occur in most plastids, and these were related to specific adaptations to a parasitic lifestyle.

Helicosporidium may be a good candidate for complete genome sequencing. In other obligate endoparasites, genome reduction is commonly seen [e.g., 9, 10], and this appears to be true for Helicosporidium as well. The nuclear genome comprises nine chromosomes, ranging in size from 700 kb to 2000 kb, with a total estimated size of about 10.5 Mb [11]. This is less than one third the size of the genome of the free-living trebouxiophyte Chlorella vulgaris [12].

Lastly, full genome sequencing would be very useful in comparative studies with green algae as well as parasites. Two green algal genome sequencing projects are currently nearing completion: those of Chlamydomonas reinhardtii and Ostreococcus tauri [13]. O. tauri is particularly interesting: it is a marine planktonic, prasinophyte green alga. It is believed to be the smallest free-living eukaryote, both in terms of physical size (about 1 micron in diameter) and nuclear genome size (9.7 Mb) [14]. The complete Helicosporidium nuclear genome would thus allow comparisons between several categories of reduced genomes, such as unrelated organisms that share a parasitic lifestyle, like apicomplexans [e.g., 15, 16], and related organisms that have different lifestyles, such as O. tauri.

References

1. Tartar A, Boucias DG, Adams BJ, Becnel JJ: Phylogenetic analysis identifies the invertebrate pathogen Helicosporidium sp. as a green alga (Chlorophyta). Int J Syst Evol Micr 2002, 52:273-279.
2. Borza T, Popescu CE, Lee RW: Multiple metabolic roles for the nonphotosynthetic plastid of the green alga Prototheca wickerhamii. Eukaryot Cell 2005, 4(2):253-261.
3. Nadakavukaren MJ, McCracken DA: An ultrastructural survey of genus Prototheca with special reference to plastids. Mycopathologia 1977, 61(2):117-119.
4. Yaman M, Radek R: Helicosporidium infection of the great European spruce bark beetle, Dendroctonus micans (Coleoptera: Scolytidae). Eur J Protistol 2005, 41(3):203-207.
5. Blaske-Lietze VU, Shapiro AM, Denton JS, Botts M, Becnel JJ, Boucias DG: Development of the insect pathogenic alga Helicosporidium. J Eukaryot Microbiol 2006, 53(3):165-176.
6. McFadden GI, Reith ME, Munholland J, Lang-Unnasch N: Plastid in human parasites. Nature 1996, 381(6582):482-482.
7. Waller RF, Keeling PJ, Donald RG, Striepen B, Handman E, Lang-Unnasch N, Cowman AF, Besra GS, Roos DS, McFadden GI: Nuclear-encoded proteins target to the plastid in Toxoplasma gondii and Plasmodium falciparum. Proc Natl Acad Sci USA 1998, 95(21):12352-12357.
8. Ralph SA, van Dooren GG, Waller RF, Crawford MJ, Fraunholz MJ, Foth BJ, Tonkin CJ, Roos DS, McFadden GI: Tropical infectious diseases: metabolic maps and functions of the Plasmodium falciparum apicoplast. Nat Rev Microbiol 2004, 2(3):203-216.
9. Vivares CP, Metenier G: Towards the minimal eukaryotic parasitic genome. Curr Opin Microbiol 2000, 3(5):463-467.
10. Moran NA: Microbial minimalism: genome reduction in bacterial pathogens. Cell 2002, 108(5):583-586.
11. Tartar A: Incertae sedis no more: The phylogenetic affinity of Helicosporidia. PhD Thesis. Gainesville: University of Florida; 2004.
12. Higashiyama T, Yamada T: Electrophoretic karyotyping and chromosomal gene mapping of Chlorella. Nucleic Acids Res 1991, 19(22):6191-6195.
13. Grossman AR: Paths toward algal genomics. Plant Physiol 2005, 137(2):410-427.
14. Derelle E, Ferraz C, Lagoda P, Eychenie S, Cooke R, Regad F, Sabau X, Courties C, Delseny M, Demaille J et al: DNA libraries for sequencing the genome of Ostreococcus tauri (Chlorophyta, Prasinophyceae): The smallest free-living eukaryotic cell. J Phycol 2002, 38(6):1150-1156.
15. Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M et al: Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science 2005, 309(5731):134-137.
16. Abrahamsen MS, Templeton TJ, Enomoto S, Abrahante JE, Zhu G, Lancto CA, Deng M, Liu C, Widmer G, Tzipori S et al: Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science 2004, 304(5669):441-445.
