@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix dc: . @prefix skos: . vivo:departmentOrSchool "Medicine, Faculty of"@en, "Medical Genetics, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Gill, Erin E."@en ; dcterms:issued "2009-04-16T16:02:45Z"@en, "2009"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """Microsporidia are unicellular eukaryotes that are closely related to fungi. They are obligate intracellular parasites of diverse animals. Microsporidia possess some of the smallest eukaryotic primary nuclear genomes in existence. The human pathogen Encephalitozoon cuniculi has a fully sequenced genome that, at 2.9 Mbp, is at the smaller end of the 2.3 to 19.5 Mbp microsporidian range. E. cuniculi’s genome has undergone reduction and is characterized by short intergenic spaces, few introns and shorter genes than their homologues in yeast. A combined eight-gene dataset was used to identify the closest fungal relative of microsporidia. All phylogenetic methods recovered microsporidia as sister to a combined ascomycete and basidiomycete clade, but alternative topologies could not be rejected by approximately unbiased (AU) tests. The effects of genome reduction on E. cuniculi’s DNA repair systems were examined by determining the presence or absence of the components of five evolutionarily conserved single- and double-strand repair pathways. While some individual proteins that participate in the single-strand repair pathways were absent in E. cuniculi, the essential functional machinery of each pathway remained intact. However, integral proteins were absent from both double-strand repair pathways, suggesting that E. cuniculi does not repair its double-stranded DNA lesions by previously characterized mechanisms. E.
cuniculi meront and spore transcripts were compared to examine differences in life cycle regulation of 5’ untranslated region (5’UTR) length and mRNA splicing. Spore transcripts are never spliced and have longer 5’UTRs that frequently overlap with upstream genes, whereas meront transcripts are often spliced and have shorter 5’UTRs that overlap less frequently with upstream genes. Large differences in transcriptional regulation exist between the spore and the meront, and the unusual spore transcripts may be a byproduct of sporulation. Transcription was also examined in Edhazardia aedis, a microsporidian with a larger, non-compact genome. A large degree of similarity was found when the clusters of orthologous groups (COG) categories of unique transcripts from Ed. aedis were compared with those of Antonospora locustae, a distantly related species. In addition, we documented the first known case of transcription of a transposable element in a microsporidian."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/7210?expand=metadata"@en ; dcterms:extent "6638975 bytes"@en ; dc:format "application/pdf"@en ; skos:note "EVOLUTION OF THE MICROSPORIDIAN GENOME AND GENE EXPRESSION
by
ERIN E. GILL
B.Sc., Dalhousie University, 2004
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
(Genetics)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
April 2009
© Erin E. Gill, 2009

Abstract
Microsporidia are unicellular eukaryotes that are closely related to fungi. They are obligate intracellular parasites of diverse animals and are characterized by a unique host invasion apparatus, the polar tube (polar filament). Microsporidia possess some of the smallest eukaryotic primary nuclear genomes in existence. The human pathogen Encephalitozoon cuniculi is the only microsporidian whose genome has been fully sequenced, and at 2.9 Mbp, it is at the smaller end of the 2.3 to 19.5 Mbp microsporidian range. E.
cuniculi’s genome has undergone reduction to reach its diminutive size and is characterized by short intergenic spaces, few introns and shorter genes than their homologues in yeast. Microsporidian genes also evolve very quickly, which has made the details of the microsporidian-fungal relationship difficult to resolve. A combined eight-gene dataset was analyzed with multiple phylogenetic methods to identify the closest fungal relative of microsporidia. All phylogenetic methods recovered microsporidia as sister to a combined ascomycete and basidiomycete clade, but alternative topologies could not be rejected by approximately unbiased (AU) tests. The effects of genome reduction on E. cuniculi’s DNA repair systems were examined by determining the presence or absence of the components of five evolutionarily conserved single- and double-strand repair pathways. While some individual proteins that participate in the single-strand repair pathways were absent in E. cuniculi, the essential machinery required for each pathway to function remained intact. However, integral proteins were absent from both double-strand repair pathways, suggesting that E. cuniculi does not repair its double-stranded DNA lesions by previously characterized mechanisms. The E. cuniculi homologues of three proteins that participate in pre-mRNA splicing were assayed for their ability to complement yeast knockout mutants. None of the E. cuniculi proteins rescued the yeast knockouts, implying that the microsporidian proteins interact with each other using different functional residues and in the absence of proteins that are used in yeast. E. cuniculi meront and spore transcripts were compared to examine differences in life cycle regulation of 5’ untranslated region (5’UTR) length and mRNA splicing.
While transcripts from spores are never spliced and have longer 5’UTRs that frequently overlap with upstream genes, meront transcripts are often spliced and have 5’UTRs that are usually shorter and overlap less frequently with upstream genes. Large differences in transcriptional regulation exist between the spore and the meront, and the longer, unspliced transcripts from the spore may be a byproduct of the cell shutting down during spore formation. Transcription was also examined in Edhazardia aedis, a microsporidian with a larger, non-compact genome. A large degree of similarity was found when the clusters of orthologous groups (COG) categories of unique transcripts from Ed. aedis were compared with those of Antonospora locustae, a distantly related species. In addition, we documented the first known case of transcription of a transposable element in a microsporidian.

Table of Contents
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Co-Authorship Statement
Chapter 1 - Introduction
1.1 Microsporidian Discovery and General Characteristics
1.2 The Microsporidian Life Cycle
1.3 Microsporidian Phylogeny
1.4 The Microsporidian Genome
1.5 Consequences of Microsporidian Genome Reduction
1.6 Transcription in a Non-Compact Microsporidian Genome
1.7 Objectives
1.8 References
Chapter 2 – Assessing the Microsporidia-Fungi Relationship
2.1 Introduction
2.2 Materials and Methods
2.2.1 Gene and Taxon Selection
2.2.2 Phylogenetic Analyses
2.3 Results and Discussion
2.3.1 Tree Topologies and Fungal Phyla
2.3.2 Considering the Phylogenetic Position of Microsporidia
2.4 References
Chapter 3 – Genome Reduction and DNA Repair Systems in E. cuniculi
3.1 Introduction
3.1.1 DNA Repair in Eukaryotes
3.1.2 Genome Reduction in Encephalitozoon cuniculi
3.2 Methods
3.3 Results
3.3.1 Base Excision Repair (BER)
3.3.2 Nucleotide Excision Repair (NER)
3.3.3 Methyltransferase Repair
3.3.4 Mismatch Repair (MMR)
3.3.5 Homologous Recombination Repair (HRR)
3.3.6 Non-Homologous End Joining Repair (NHEJ)
3.3.7 DNA Polymerases
3.4 Discussion
3.4.1 Reduction in Complexity of DNA Repair
3.4.2 DNA Polymerases and Repair
3.4.3 Double Strand Break Repair in E. cuniculi
3.4.4 Potential Consequences for E. cuniculi
3.5 References
Chapter 4 - Genome Reduction and the E. cuniculi Spliceosome
4.1 Introduction
4.1.1 Microsporidia and the Microsporidian Genome
4.1.2 Splicing and the Spliceosome
4.1.3 Dib1
4.1.4 Prp22
4.1.5 Brr2
4.2 Materials and Methods
4.3 Results and Discussion
4.3.1 Dib1
4.3.2 Prp22
4.3.3 Brr2
4.3.4 The E. cuniculi Spliceosome
4.4 References
Chapter 5 – Life Stage Differences in Splicing and Transcription
5.1 Introduction
5.2 Materials and Methods
5.3 Results
5.3.1 Transcription
5.3.2 Splicing
5.4 Discussion
5.4.1 Transcription in E. cuniculi
5.4.2 Splicing in E. cuniculi and A. locustae
5.4.3 Splicing and Transcription are Integrated Processes
5.5 References
Chapter 6 – Gene Expression in a Non-Compact Microsporidian Genome
6.1 Introduction
6.2 Methods
6.3 Results
6.3.1 Overview
6.3.2 Transcript Structure
6.4 Discussion
6.4.1 Comparing Microsporidian Transcriptomes
6.4.2 Hsp70
6.4.3 Transposable Elements
6.4.4 Transcript Structure
6.5 References
Chapter 7 - Conclusion
7.1 Introduction
7.2 Assessing the Microsporidia-Fungi Relationship
7.3 Genome Reduction and DNA Repair Systems in Encephalitozoon cuniculi
7.4 Genome Reduction and the E. cuniculi Spliceosome
7.5 Life Stage Differences in Splicing and Transcription
7.6 Gene Expression in a Non-Compact Microsporidian Genome
7.7 General Conclusions
7.8 References

List of Tables
Table 3.1: S. cerevisiae DNA Polymerases and Proteins that Participate in the Five Primary DNA Repair Pathways.
Table 4.1: Proteins Examined in this Study.
Table 4.2: S. cerevisiae and E. cuniculi Dib1 Residues in Conserved Positions.
Table 4.3: S. cerevisiae and E. cuniculi Prp22 Sequences vs. Seven Conserved Motifs in DEAH Helicases.
Table 4.4: S. cerevisiae and E. cuniculi Brr2 Sequences vs. Conserved Sequence Motifs in DEXH Helicases.
Table 4.5: Plasmids Used in this Study.
Table 4.6: S. cerevisiae Strains Used in this Study.
Table 5.1: Intron splicing patterns and size distribution in E. cuniculi and A. locustae.
Supplementary Table 5.1: E. cuniculi 5’UTR lengths in Spores and Meronts.
Supplementary Table 5.2: Primers Used in this Study to Examine Splicing.
Supplementary Table 5.3: Primers Used in this Study to Examine Transcription.
Table 6.1: Unique Ed. aedis Transcripts that are Homologous to Genes Present in Other Microsporidia.
Table 6.2: Ed. aedis Genes that are Absent from Other Microsporidia.

List of Figures
Figure 1.1: The Microsporidian Life Cycle.
Figure 2.1: A. AU Probabilities for the Three Branching Patterns Not Rejected at a 5% Significance Level. B. PROML Tree Generated from the Complete Concatenated Dataset.
Figure 2.2: PROML Tree Generated from the Short Concatenated Dataset Including Spizellomyces punctatus.
Supplementary Figure 2.1: IQPNNI Trees Generated from Individual Gene Analyses.
Figure 3.1: A Comparison of the Five Major DNA Repair Pathways.
Figure 3.2: The Homologous Recombination Repair Pathway.
Figure 3.3: The Non-Homologous End Joining Repair Pathway.
Figure 4.1: Saccharomyces Dib1 Tetrad Dissections.
Figure 4.2: Saccharomyces Prp22 Transformations.
Figure 4.3: Saccharomyces Brr2 Transformations.
Figure 4.4: Pairwise Distances Between Various Species for Dib1, Prp22 and Brr2.
Figure 5.1: Transcript Lengths and Start Sites in E. cuniculi.
Supplementary Figure 5.1: The Life Cycle of E. cuniculi.
Supplementary Figure 5.2: Microsporidian Phylogeny Depicting the Relationship between A. locustae and E. cuniculi.
Supplementary Figure 5.3: Capillary Electrophoresis Results for L7A and S29 from Meronts.
Figure 6.1: The Phylogenetic Relationships Between Several Microsporidia.
Figure 6.2: Total Ed. aedis Transcripts Represented by COG Category With and Without Hsp70.
Figure 6.3: Unique Ed. aedis Transcripts Represented by COG Category.
Figure 6.4: 5’ RACE Conducted on a Moderately Represented Transcript in Ed. aedis Reveals Multiple Transcription Start Sites.
Acknowledgements
I want to thank the members of the Keeling and Fast labs for helping me slog my way through the past five years. I’m especially grateful to my parents, Rowena and Corey, for listening to me rant and helping me to relax. Thank you for always believing I could finish, even when I didn’t believe it myself.

Co-Authorship Statement
Chapter 2 was originally published as: Gill, E.E.; Fast, N.M. (2006) Assessing the microsporidia-fungi relationship: Combined phylogenetic analysis of eight genes. Gene 375, 103-109. I performed the data mining and phylogenetic analyses and drafted the manuscript. NMF conceived of the project, assisted in the interpretation of the data and helped edit the manuscript.
Chapter 3 was originally published as: Gill, E.E.; Fast, N.M. (2007) Stripped-down DNA repair in a highly reduced parasite. BMC Mol Biol 8, 24. I conceived of the project, performed the analyses and drafted the manuscript. NMF assisted in the interpretation of the data and helped edit the manuscript.
Chapter 4 is unpublished. I assessed complementation candidates with Valerie Limpright. I performed the wet lab work and wrote the chapter. NMF assisted in the interpretation of the data and helped edit the chapter.
Chapter 5 is in preparation for publication as follows: Gill, E.E.; Lee, R.C.; Corradi, N.; Grisdale, C.; Limpright, V.O.; Keeling, P.J.; Fast, N.M. Splicing and transcription in microsporidia: the spore-meront dichotomy. I performed the 5’RACE experiments for the E. cuniculi meronts, grew the locusts, extracted tissue infected with A. locustae and drafted the manuscript. RCL performed the 5’RACE experiments for most of the E. cuniculi spore intron-containing genes. NC performed the 5’RACE experiments for the E. cuniculi spore intron-less genes. CG performed 5’RACE on A. locustae and some of the E. cuniculi genes. VOL performed 5’RACE on some of the E. cuniculi spore intron-containing genes.
PJK and NMF conceived of the project, contributed to the interpretation of the data and assisted in editing the manuscript.
Chapter 6 was originally published as: Gill, E.E.; Becnel, J.J.; Fast, N.M. (2008) ESTs from the microsporidian Edhazardia aedis. BMC Genomics 9, 296. I extracted the RNA from the spores, manually edited the sequences, analyzed the data and drafted the manuscript. JJB infected the mosquito larvae and harvested the spores. NMF conceived of the project, assisted in the interpretation of the data and helped edit the manuscript.

Chapter 1 - Introduction
1.1 Microsporidian Discovery and General Characteristics
Microsporidia are unicellular organisms that are closely related to fungi. They are obligate intracellular parasites that can replicate only inside host tissue, and they spread between cells and hosts via spores. Microsporidia were first identified in France in the nineteenth century as the causative agent of the silkworm disease “pebrine”. Since then, over 1200 species of microsporidia have been described from hundreds of genera (Wittner and Weiss, 1999). They primarily infect animals and have hosts in nearly every animal phylum; familiar hosts include fish, crustaceans and insects. Many aspects of the biology of these organisms are still poorly understood, but microsporidia have attracted considerable interest from the medical community since the advent of HIV/AIDS because some species can infect immunocompromised humans. Therefore, much of our existing data has been gleaned from human parasites belonging to the genera Encephalitozoon and Enterocytozoon (Wittner and Weiss, 1999).
Microsporidian spores are single cells enveloped in a thick spore wall composed of carbohydrate and protein. Their distinguishing feature is the polar filament (polar tube), which coils around the nucleus (or nuclei) several times within the spore. The number of coils is often used to identify different species.
The spore also contains an anchoring disk at the anterior end, to which the polar tube is attached, and a posterior vacuole, which plays a role in the germination process. Although spores are plentiful in the environment, they are unable to grow and multiply without a host (Wittner and Weiss, 1999).

1.2 The Microsporidian Life Cycle
The microsporidian life cycle begins when a spore (Fig. 1.1a) is in close proximity to a host cell and unknown events trigger its germination. Germination entails the eversion of the polar filament, which breaks through the spore wall and forcefully projects itself through the host cell membrane (b). The posterior vacuole then expands, and the spore contents are transported out of the spore, through the polar tube and into the host cell’s cytoplasm (c). Once inside the host cell, the spore contents are called a meront. When the meront has divided many times (d), a spore wall begins to form around each nucleus (or pair of nuclei, depending on the species and host) (e). The host cell then lyses, and the spores are released. Spores can either infect adjacent cells or exit the body of the host and come into contact with new potential hosts. Microsporidian life cycles range from moderately to highly complex and can involve several host individuals from different species. Some species produce only a single spore type, whereas others produce three or four (Wittner and Weiss, 1999). Because the spore is easier to separate from host tissue, is less perishable and is self-contained, nearly all molecular studies conducted on microsporidia to date have focused on this life stage. As a result, we have very little data on the biochemical processes occurring in the meront.

1.3 Microsporidian Phylogeny
Initially, Pasteur identified the organism causing the silkworm disease “pebrine” as a yeast.
In the early 1880s, however, Balbiani realized that these organisms were fundamentally distinct from other extant organisms and coined the term “microsporidia.” Although microsporidia have since been recognized as a cohesive group, their position within the eukaryotic tree of life has long been uncertain. Early ribosomal RNA and elongation factor phylogenies seemed to indicate that microsporidia were an extremely ancient eukaryotic lineage (Kamaishi et al., 1996a; Kamaishi et al., 1996b; Vossbrinck et al., 1987). These phylogenies, together with the absence of typical mitochondria in microsporidia, led investigators to place them in the kingdom “Archezoa”. However, the identification of genes for mitochondrial Hsp70 (Germot et al., 1997; Peyretaillade et al., 1998) and pyruvate dehydrogenase E1 (Fast and Keeling, 2001), and the presence of degenerate mitochondria (mitosomes) in some life stages (Williams et al., 2002), have forced us to reexamine the Archezoa hypothesis. Phylogenetic analyses of protein-coding genes including α- and β-tubulin (Keeling et al., 2000; Keeling, 2003), TATA box binding protein (Fast et al., 1999) and the largest subunit of RNA polymerase II (RPB1) (Hirt et al., 1999) suggest a relationship with fungi. However, the exact nature of the relationship varies depending on the gene used to build the phylogeny, so it was difficult to discern whether microsporidia arose from within the fungi or simply shared a common ancestor with them.

1.4 The Microsporidian Genome
Microsporidian genomes range in size from 2.3 to 19.5 Mbp. Despite this wide range, the best-studied species possess genomes at the smaller end of it. The only microsporidian whose genome has been fully sequenced, the human pathogen Encephalitozoon cuniculi, has a phenomenally small genome (2.9 Mbp) that is extremely compact (~1 gene/kb) and possesses very few introns (15) (Katinka et al., 2001) (see Chapter 5).
Although microsporidia that are human pathogens are becoming increasingly well studied, little is known about genome architecture and composition in most other species. Microsporidia have relatively fast-evolving genes (Thomarat et al., 2004), but their genome arrangements appear to be largely static, and species that are not closely related display a much greater degree of synteny than would be expected (Slamovits et al., 2004). This may be due to a paucity of transposable elements, as these elements seem to flank regions of synteny between species (Xu et al., 2006). Transposable elements are generally major forces shaping eukaryotic genomes and play roles in genome rearrangement, expansion and regulation (Miller and Capy, 2004).

1.5 Consequences of Microsporidian Genome Reduction
At 2.9 Mbp, the genome of E. cuniculi is extremely small by eukaryotic standards and is predicted to contain only about 2000 genes (Katinka et al., 2001). Consequently, E. cuniculi lacks many genes that are ubiquitous among eukaryotes, including many that are necessary for viability in the yeast Saccharomyces cerevisiae. The absence of these genes could be partially explained by E. cuniculi’s parasitic lifestyle, as it may be able to absorb from the host metabolites that it is unable to synthesize for itself (Wittner and Weiss, 1999). However, E. cuniculi is unlikely to rely on host-synthesized proteins for a multitude of its own essential cellular processes. Because E. cuniculi’s gene content has been reduced to such a large extent, it provides a system in which we can study the core machinery of many cellular processes. Surprisingly, studies of spore transcripts from E. cuniculi and the distantly related Antonospora locustae show that a large proportion contain fragments of more than one gene (Corradi et al., 2008; Williams et al., 2005).
This situation is rare among eukaryotes, and it also differs from prokaryotic operons, because usually only one protein can be produced from each transcript. This phenomenon is thought to be a byproduct of genome compaction, resulting from genes being very closely spaced (Corradi et al., 2008). However, it is not known whether transcripts contain fragments of multiple genes because transcriptional regulation is lacking, or because promoters are in fact located within adjacent genes. In addition, only transcripts from spores have been examined; it is not known whether multi-gene transcription also occurs in the proliferative meront stage.

1.6 Transcription in a Non-Compact Microsporidian Genome
Williams et al. (2008) conducted genome sequence surveys to gain a glimpse of two larger microsporidian genomes. The surveys revealed that genome compaction is not a universal feature of microsporidia. The genomes of Brachiola algerae (15–20 Mbp) and Edhazardia aedis (actual genome size unknown) contain large regions of non-coding DNA and several selfish DNA elements, which are entirely absent from E. cuniculi and together may account for most of the extra genome size. In addition, B. algerae has a broader host range and Ed. aedis a more complex life cycle than E. cuniculi, and both species may encode additional genes that increase their overall genome sizes. Multi-gene transcripts are prevalent in the spores of microsporidia with smaller genomes, and this phenomenon is thought to be a byproduct of the close spacing of genes (Corradi et al., 2008; Williams et al., 2005). However, transcription has not been studied in a microsporidian with a larger genome, such as Ed. aedis. Because genes in such genomes are more widely spaced, one would predict that the incidence of multi-gene transcription would be drastically reduced, if not eliminated.
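The spacing argument can be made concrete with a toy calculation. The Python sketch below is hypothetical and purely illustrative: the `Gene` class, the helpers `gene_pair`, `utr5_length`, `overlaps_upstream` and `classify`, and all coordinates (a 100 bp versus a 1,000 bp intergenic spacer, and a 150-nt 5’UTR) are invented for the example and are not taken from E. cuniculi or Ed. aedis data. It assumes two genes on the same strand, 1-based inclusive coordinates, and a transcription start site (such as one mapped by 5’RACE) lying upstream of the start codon. A 5’UTR is counted as overlapping the upstream gene when its start site falls at or before that gene’s last base.

```python
# Hypothetical illustration only: coordinates, spacer lengths and the
# 150-nt 5' UTR are invented, not taken from any microsporidian genome.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gene:
    # A gene on the forward strand, 1-based inclusive coordinates.
    name: str
    start: int  # first base of the start codon
    end: int    # last base of the stop codon


def gene_pair(spacer: int) -> tuple[Gene, Gene]:
    # Two same-strand genes separated by an intergenic spacer of `spacer` bp.
    upstream = Gene('upstream', 1, 1000)
    downstream = Gene('downstream', 1000 + spacer + 1, 2500 + spacer)
    return upstream, downstream


def utr5_length(tss: int, gene: Gene) -> int:
    # Transcribed bases between the transcription start site (TSS)
    # and the start codon of the gene.
    return gene.start - tss


def overlaps_upstream(tss: int, upstream: Gene) -> bool:
    # The transcript's 5' end reaches into the upstream gene if the TSS
    # lies at or before that gene's last base.
    return tss <= upstream.end


def classify(tss: int, upstream: Gene, gene: Gene) -> tuple[int, bool]:
    # Return (5' UTR length, overlaps upstream gene?) for one mapped TSS.
    return utr5_length(tss, gene), overlaps_upstream(tss, upstream)


if __name__ == '__main__':
    for spacer in (100, 1000):  # compact vs. non-compact spacing
        up, down = gene_pair(spacer)
        tss = down.start - 150  # same hypothetical 150-nt 5' UTR in both layouts
        print(spacer, classify(tss, up, down))
```

Run as a script, it prints `100 (150, True)` and then `1000 (150, False)`: the same 5’UTR length reaches into the upstream gene when the spacer is short but not when it is long, which is the intuition behind predicting fewer multi-gene transcripts in a non-compact genome. The sketch ignores neighbouring genes on the opposite strand and genes with multiple start sites.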
1.7 Objectives

The goal of my thesis is to address questions concerning the evolution of the microsporidian genome and gene expression. Until recently, the position of microsporidia among fungi had been unresolved, and I first aimed to bring some clarity to the situation. As discussed above, certain well-studied microsporidian genomes have undergone reduction and compaction. These processes have resulted in some rare phenomena, such as multi-gene transcription, which I elected to study more carefully. I have also endeavored to investigate transcription in a microsporidian whose genome is not compact, to gain a more complete view of gene expression in microsporidia as a group. My thesis has thus been devoted to addressing the following questions:

I. How are microsporidia related to fungi?
II. How has genome reduction affected DNA repair in Encephalitozoon cuniculi?
III. How has genome reduction affected the spliceosome of Encephalitozoon cuniculi?
IV. Do mRNA splicing and transcription occur differently in the two life stages of microsporidia?
V. How does gene expression occur in a microsporidian with a (relatively) large genome?

Figure 1.1: The Microsporidian Life Cycle. The spore (a) germinates when the polar tube everts from the spore (b), and the spore contents are pushed through the tube into the host cell cytoplasm (c). The meront divides (d), then spore walls are laid down around each meront nucleus (e). The host cell lyses, releasing new spores (a). The host cell nucleus is denoted with an “N”.

1.8 References

Corradi, N., Gangaeva, A. and Keeling, P.J. 2008. Comparative profiling of overlapping transcription in the compacted genomes of microsporidia Antonospora locustae and Encephalitozoon cuniculi. Genomics 91: 388-393.
Fast, N.M. and Keeling, P.J. 2001.
Alpha and beta subunits of pyruvate dehydrogenase E1 from the microsporidian Nosema locustae: mitochondrion-derived carbon metabolism in microsporidia. Mol. Biochem. Parasit. 117: 201-209.
Fast, N.M., Logsdon, J.M. and Doolittle, W.F. 1999. Phylogenetic analysis of the TATA box binding protein (TBP) gene from Nosema locustae: evidence for a microsporidia-fungi relationship and spliceosomal intron loss. Mol. Biol. Evol. 16: 1415-1419.
Germot, A., Philippe, H. and Le Guyader, H. 1997. Evidence for loss of mitochondria in microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Mol. Biochem. Parasit. 87: 159-168.
Hirt, R.P., Logsdon, J.M., Healy, B., Dorey, M.W., Doolittle, W.F. and Embley, T.M. 1999. Microsporidia are related to fungi: Evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96: 580-585.
Kamaishi, T., Hashimoto, T., Nakamura, Y., Masuda, Y., Nakamura, F., Okamoto, K., Shimizu, M. and Hasegawa, M. 1996a. Complete nucleotide sequences of the genes encoding translation elongation factors 1 alpha and 2 from a microsporidian parasite, Glugea plecoglossi: implications for the deepest branching of eukaryotes. J. Biochem. (Tokyo) 120: 1095-1103.
Kamaishi, T., Hashimoto, T., Nakamura, Y., Nakamura, F., Murata, S., Okada, N., Okamoto, K., Shimizu, M. and Hasegawa, M. 1996b. Protein phylogeny of translation elongation factor EF-1alpha suggests microsporidians are extremely ancient eukaryotes. J. Mol. Evol. 42: 257-263.
Katinka, M.D., Duprat, S., Cornillot, E., Metenier, G., Thomarat, F., Prensier, G., Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P. et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414: 450-453.
Keeling, P.J. 2003. Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet. Biol. 38: 298-309.
Keeling, P.J., Luker, M.A. and Palmer, J.D. 2000. Evidence from beta-tubulin phylogeny that microsporidia evolved from within the fungi. Mol. Biol. Evol. 17: 23-31.
Miller, W.J. and Capy, P. 2004. Mobile genetic elements as natural tools for genome evolution. Methods Mol. Biol. 260: 1-20.
Peyretaillade, E., Broussolle, V., Peyret, P., Metenier, G., Gouy, M. and Vivares, C.P. 1998. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin. Mol. Biol. Evol. 15: 683-689.
Slamovits, C.H., Fast, N.M., Law, J.S. and Keeling, P.J. 2004. Genome compaction and stability in microsporidian intracellular parasites. Curr. Biol. 14: 891-896.
Thomarat, F., Vivares, C.P. and Gouy, M. 2004. Phylogenetic analysis of the complete genome sequence of Encephalitozoon cuniculi supports the fungal origin of microsporidia and reveals a high frequency of fast-evolving genes. J. Mol. Evol. 59: 780-791.
Vossbrinck, C.R., Maddox, J.V., Friedman, S., Debrunner-Vossbrinck, B.A. and Woese, C.R. 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature 326: 411-414.
Williams, B.A., Lee, R.C., Becnel, J.J., Weiss, L.M., Fast, N.M. and Keeling, P.J. 2008. Genome sequence surveys of Brachiola algerae and Edhazardia aedis reveal microsporidia with low gene densities. BMC Genomics 9: 200.
Williams, B.A., Slamovits, C.H., Patron, N.J., Fast, N.M. and Keeling, P.J. 2005. A high frequency of overlapping gene expression in compacted eukaryotic genomes. Proc. Natl. Acad. Sci. USA 102: 10936-10941.
Williams, B.A.P., Hirt, R.P., Lucocq, J.M. and Embley, T.M. 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418: 865-869.
Wittner, M. and Weiss, L.M. 1999. The microsporidia and microsporidiosis. ASM Press, Washington, D.C.
Xu, J., Pan, G., Fang, L., Li, J., Tian, X., Li, T., Zhou, Z. and Xiang, Z. 2006.
The varying microsporidian genome: existence of long-terminal repeat retrotransposon in domesticated silkworm parasite Nosema bombycis. Int. J. Parasitol. 36: 1049-1056.

Chapter 2 – Assessing the Microsporidia-Fungi Relationship¹

¹ A version of this chapter has been published. Gill, E.E. and Fast, N.M. (2006) Assessing the microsporidia-fungi relationship: Combined phylogenetic analysis of eight genes. Gene 375: 103-109.

2.1 Introduction

Microsporidia are a fascinating group of organisms from both a medical and an evolutionary point of view. These unicellular eukaryotes infect at least 1200 species of animals from every major evolutionary lineage, from crustaceans to mammals, with a large proportion infecting insects (Wittner and Weiss, 1999). Microsporidia first came to human attention when a strange parasite decimated lucrative European silkworm populations in the 19th century, but interest in these organisms increased dramatically with the more recent discovery of microsporidian infections in immunocompromised humans, such as AIDS, organ transplant and cancer patients, in the 1980s and 1990s.

Microsporidia alternate between two life forms, the spore and the meront, but only the spores are viable outside an animal host. Although diverse in size and shape, microsporidian spores have a typical morphology that includes a thick, protective proteinaceous coat containing chitin, and the polar filament, which is specialized for host invasion. This filament, also called the polar tube, is perhaps the most recognized microsporidian feature. It is attached to a plate at one end of the spore, is wound around the spore contents, and ends near the posterior vacuole at the other end of the spore. Both the polar tube and the posterior vacuole play integral roles in spore germination and host infection. For a comprehensive review of the microsporidian life cycle, see Keeling and Fast (2002).

Although microsporidia contain unique infective organelles, they lack several structures that are usually considered hallmarks of eukaryotic life, such as typical mitochondria, peroxisomes and centrioles. They also possess several seemingly “prokaryotic” characteristics, such as 70S ribosomes, tiny genomes and a fused 5.8S and 28S rRNA. For these reasons, microsporidia were long thought to be a primitive or ancestral eukaryotic lineage that diverged from the universal eukaryotic ancestor before the gain of the α-proteobacterial endosymbiont that eventually became the mitochondrion (Cavalier-Smith, 1991). This hypothesis placed microsporidia in the kingdom Archezoa, a group of organisms defined by their primitive lack of mitochondria (Cavalier-Smith, 1991). Initially, molecular data seemed to support the inclusion of microsporidia in the Archezoa. Notably, analyses of ribosomal RNA (Vossbrinck et al., 1987), elongation factor 1-alpha (EF1-α) and elongation factor 2 (EF-2) sequences (Kamaishi et al., 1996a; Kamaishi et al., 1996b) placed microsporidia at the base of the eukaryotic tree. Later, α- and β-tubulin phylogenies (Edlind et al., 1996; Keeling and Doolittle, 1996; Keeling et al., 2000) indicated a close relationship to fungi, in stark contradiction to the data supporting microsporidia as members of the Archezoa. Additional analyses of mitochondrial Hsp70 (Germot et al., 1997; Hirt et al., 1997; Peyretaillade et al., 1998), TATA box binding protein (Fast et al., 1999), the largest subunit of RNA polymerase II (RPB1) (Hirt et al., 1999), and pyruvate dehydrogenase subunits E1α and E1β (Fast and Keeling, 2001) bolstered the proposed microsporidia-fungi relationship. In line with the discovery of mitochondrion-derived genes in microsporidian genomes and a fungal ancestry for microsporidia, Williams et al. identified a cryptic mitochondrion in the microsporidian Trachipleistophora hominis by immunolocalization of Hsp70 (Williams et al., 2002).
In addition, re-analyses of certain molecules, such as EF-2 (Hirt et al., 1999; Van de Peer et al., 2000) and SSU rRNA (Fischer and Palmer, 2005), did not support a basal origin of microsporidia when a larger taxon set was used. As phylogenetic methodology advanced, shortcomings were also found in other earlier analyses that had supported a basal position for microsporidia. For instance, EF1-α is unsuitable for many phylogenetic analyses because of mutational saturation and covarion behaviour, particularly in microsporidian sequences (Hirt et al., 1999; Inagaki et al., 2004).

Microsporidia also share physiological and biochemical characteristics with fungi. Both have a similar meiotic mechanism that uses a closed spindle (Flegel and Pasharawipas, 1995), and they also share a common mRNA capping mechanism (Hausmann et al., 2002). However, these characteristics are not exclusive to fungi and microsporidia. Taken together, the phylogenetic, cytological and biochemical evidence indicates that microsporidia do have some tie to fungi, but the exact nature of this relationship has remained elusive, partly because of inadequate fungal representation in analyses. Two more recent analyses attempted to remedy this situation and came to differing conclusions. Tanabe et al. (2002) conducted analyses of RPB1 and EF1-α. These analyses included a wider sampling of fungal sequences, with representatives from four fungal phyla (ascomycetes, basidiomycetes, zygomycetes and chytrids), and the results did not indicate a strong relationship between microsporidia and fungi. Instead, the microsporidia were placed at the base of a combined fungal/animal clade. In contrast to the Tanabe analysis, a phylogenetic study by Keeling not only strongly supported a relationship between microsporidia and fungi, but also proposed that microsporidia evolved from a zygomycete ancestor (Keeling, 2003).
This is an important issue to be addressed herein: if microsporidia are related to fungi, as the majority of data indicate, are microsporidia the descendants of an actual fungus, or did they share a common ancestor with extant fungi? Unfortunately, the evolutionary history of any lineage is difficult, if not impossible, to ascertain from the analysis of a single gene. In general, support values for branching patterns are low for any dataset of restricted length, and this is also true of phylogenies that include microsporidia. By increasing the length of the dataset and the number of fungal species, our aim is to generate a more robust tree. Here we attempt to resolve the nature of the microsporidia-fungi relationship using a concatenated alignment of eight genes containing 1666 amino acid characters.

2.2 Materials and Methods

2.2.1 Gene and Taxon Selection

To clarify the relationship between microsporidia and fungi, genes were chosen for analysis based on the recent work of Thomarat et al. (2004). Thomarat’s group analyzed the genome of Encephalitozoon cuniculi and conducted four types of phylogenetic analysis on each of several dozen genes. Each gene was annotated as branching at a specific point among fungi, animals and plants. Genes that consistently branched with fungi were selected for this study, and the final dataset includes eight genes: α-tubulin, β-tubulin, the largest subunit of RNA polymerase II (RPB1), the DNA repair helicase RAD25, TATA-box binding protein (TBP), a subunit of the E2 ubiquitin conjugating enzyme (UBC2), and the E1α and E1β subunits of pyruvate dehydrogenase. Each of these genes was compared to publicly available sequence data using the BLAST algorithm (Altschul et al., 1997). The majority of sequences were retrieved from the NCBI databases, with the following exceptions.
Data for the zygomycete Rhizopus oryzae were collected from the Rhizopus sequencing project performed at the Broad Institute of MIT and Harvard (http://www.broad.mit.edu/annotation/fungi/fgi/). Spizellomyces punctatus sequences were downloaded from the Protist EST Project public database (http://amoebidia.bcm.umontreal.ca/public/pepdb/agrm.php). Taxa were identified that possessed sequences for all eight of the protein-coding genes listed above. Two representative species were sought for each major fungal lineage; however, sequence data are relatively scarce for zygomycetes and chytrids. Although this is not ideal given previous hypotheses of a relationship between microsporidia and zygomycetes, the limited available data allowed only a single representative taxon each for zygomycetes and chytrids. Two animal taxa, a plant and an amoebozoan are also included. As Blastocladiella sequences were retrieved from an EST survey, not all of the genes used in the analysis were present at full length; therefore, significant portions of β-tubulin, RAD25, RPB1 and UBC2 were not included in the analysis. This dataset, hereafter referred to as the complete dataset, has 1666 characters and contains 12 species, for which all eight of the above genes are included.

A second dataset containing an additional chytrid representative, Spizellomyces punctatus, was assembled (for explanation, see Section 2.3.1). Owing to sequence availability, not all eight protein-coding genes could be included: UBC2, TBP and RAD25 sequences were omitted, so this dataset has 858 characters, approximately half the number in the complete dataset. For each individual protein-coding gene, up to 40 additional sequences were obtained from eukaryotic and bacterial representatives. These sequences were analyzed along with those of the 12 taxa in the complete dataset in single-gene phylogenetic analyses.
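The assembly of the complete dataset can be sketched in a few lines of Python: keep only the taxa that have all eight genes, then join the per-gene alignments end to end. This is a hedged illustration, not the procedure actually used (the alignments were combined in MacClade). The dictionary-of-strings alignment format, the function name `concatenate`, and the toy taxa and sequences are all hypothetical. The function also records each gene's column range, which is the information a per-gene (partitioned) analysis needs.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory alignment format: taxon name -> aligned sequence.
Alignment = Dict[str, str]


def concatenate(genes: List[Tuple[str, Alignment]]) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Join per-gene alignments end to end, keeping only taxa present in every gene.

    Also returns each gene's 1-based, inclusive column range in the supermatrix,
    so that parameters can later be kept separate (unlinked) for each gene.
    """
    if not genes:
        raise ValueError("need at least one gene alignment")
    # Taxa sampled for every gene, mirroring the "complete dataset" criterion.
    shared = set.intersection(*(set(aln) for _, aln in genes))
    supermatrix: Alignment = {taxon: "" for taxon in sorted(shared)}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for name, aln in genes:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: aligned sequences differ in length")
        width = lengths.pop()
        for taxon in supermatrix:
            supermatrix[taxon] += aln[taxon]
        partitions.append((name, start, start + width - 1))
        start += width
    return supermatrix, partitions


if __name__ == "__main__":
    # Invented toy data; taxon_C and taxon_D are dropped because each lacks one gene.
    toy_genes = [
        ("gene_1", {"taxon_A": "MKV-", "taxon_B": "MRVL", "taxon_C": "MKIL"}),
        ("gene_2", {"taxon_A": "AQ", "taxon_B": "AS", "taxon_D": "GS"}),
    ]
    matrix, parts = concatenate(toy_genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

Dropping every taxon that lacks even one gene is what limits the complete dataset to 12 species. The shorter dataset makes the opposite trade-off: it drops three genes so that Spizellomyces can be included.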
2.2.2 Phylogenetic Analyses

Alignments of each of the eight protein sequences were constructed using CLUSTALW version 1.83 (Thompson et al., 1994) and manually edited using MacClade version 4.06 (Maddison and Maddison, 1989). For the concatenated analysis, all eight alignments were placed sequentially in the same MacClade file. All analyses were conducted using the JTT substitution matrix (Jones et al., 1992) with six gamma (Γ) rate categories plus one invariable-site category. The fraction of invariable sites and the α-parameter were estimated from each dataset using TREE-PUZZLE version 5.2 (Schmidt et al., 2002), with settings adjusted to calculate pairwise distances only and to generate exact parameter estimates. Maximum likelihood (ML) analyses were performed using IQPNNI (Important Quartet Puzzling and Nearest Neighbor Interchange) version 2.6 (Vin le and von Haeseler, 2004) and, for the concatenated datasets, PROML version 3.6b (Felsenstein et al., 2000). Default settings were used for IQPNNI, while PROML settings were altered to perform a slow analysis with global rearrangements and randomized input order, jumbling ten times. Maximum likelihood-distance (ML-distance) analyses were performed using FITCH version 3.6b (Felsenstein et al., 2000), and Bayesian analyses of the concatenated dataset were performed with MRBAYES version 3.0b4 (Huelsenbeck and Ronquist, 2001). FITCH settings were altered to allow global rearrangements and randomized input order, jumbling ten times. MRBAYES was set to run one million generations with four chains, sampling every 1000th generation, and the final consensus tree was constructed with a burn-in of 10,000 generations. Bootstrapping was conducted using PUZZLEBOOT and FITCH for ML-distance, and PHYML (Guindon and Gascuel, 2003) for ML analyses; 100 pseudo-datasets were used for both methods.
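"Six Γ rate categories plus one invariable" describes a mixture model for among-site rate variation. The standard discrete-gamma-plus-invariable-sites likelihood is given below as a hedged sketch. The exact parameterization and rate normalization used internally by TREE-PUZZLE, IQPNNI, PROML and MRBAYES may differ in detail. The symbols are:

- $N$: the number of alignment columns.
- $p_{\mathrm{inv}}$: the fraction of invariable sites.
- $\delta_j$: 1 if column $j$ is constant and 0 otherwise.
- $\pi_{x_j}$: the equilibrium frequency of the residue in a constant column.
- $L_j(r_k)$: the likelihood of column $j$ under JTT with every branch length multiplied by the rate $r_k$ of gamma category $k$.

```latex
\ln L = \sum_{j=1}^{N} \ln\!\left[\, p_{\mathrm{inv}}\,\delta_j\,\pi_{x_j}
      + \frac{1 - p_{\mathrm{inv}}}{K}\sum_{k=1}^{K} L_j(r_k) \right],
\qquad K = 6,
\qquad \frac{1}{K}\sum_{k=1}^{K} r_k = \frac{1}{1 - p_{\mathrm{inv}}}
```

The $r_k$ are representative rates (category means or medians) of $K$ equal-probability slices of a gamma distribution with shape $\alpha$. The last condition is one common convention: it rescales the $r_k$ so that the average rate over all sites is 1. TREE-PUZZLE estimated $\alpha$ at 0.85 to 0.88 and $p_{\mathrm{inv}}$ at 0.2 for these data. With those values, variable sites evolve on average $1/(1-0.2) = 1.25$ times faster than the overall mean. Because $\alpha < 1$, the gamma density is L-shaped: more than half of the variable sites evolve more slowly than average, while a minority evolve much faster.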
The approximately unbiased (AU) test of tree selection (Shimodaira, 2002) was performed using CONSEL version 1.10 (Shimodaira and Hasegawa, 2001) with default settings. Parameters were unlinked between genes. The site likelihoods for this method were calculated with TREE-PUZZLE and reformatted using a script written and provided by J. Leigh (Dalhousie University, Halifax, Nova Scotia, Canada). The α-parameter was estimated by TREE-PUZZLE to be 0.85 for the complete dataset and 0.88 for the shorter dataset. The fraction of invariable sites was 0.2 for both datasets.

2.3 Results and Discussion

2.3.1 Tree Topologies and Fungal Phyla

Until recently, the lack of available sequences from both fungi and microsporidia prevented a large-scale phylogenetic analysis. In general, single-gene phylogenies addressing the microsporidia-fungi relationship were not robust. Although microsporidian sequences consistently branched with fungal sequences in these analyses, limited sampling or poor phylogenetic resolution prevented the exact nature of the microsporidia-fungi relationship from being pinpointed. In 2003, Keeling’s combined analysis of α- and β-tubulin sequences was the first to use data from more than one gene. Although support for placing microsporidia within the fungal radiation was high, the overall branching patterns of the trees were only moderately supported (Keeling, 2003). In the current study, we analyze more than twice as many characters to test the microsporidian phylogenetic position predicted by the tubulin trees. To date, this is the largest dataset employed to examine the relationship between microsporidia and fungi. Before the combined analysis was conducted, each molecule was analyzed separately. Maximum likelihood analyses of α- and β-tubulin, RPB1, RAD25, UBC2 and PDH E1α placed microsporidia within the fungal clade, whereas the TBP and PDH E1β analyses placed them at the base of a combined animal/fungal group.
These results are similar to those obtained in previous analyses of RPB1, α- and β-tubulin, TBP, and PDH E1α and E1β (Edlind et al., 1996; Hirt et al., 1999; Fast et al., 1999; Fast and Keeling, 2001; Thomarat et al., 2004). In general, bootstrap support values were very low. Each of these molecules supported either a microsporidia-fungi relationship or a fungi/animal-microsporidia relationship, so all were included in the concatenated dataset. (See Supplementary Figure 2.1 for individual gene trees.) All of the concatenated analyses (ML, ML-distance and Bayesian) of the complete dataset recovered identical trees that placed microsporidia as a sister to a combined ascomycete+basidiomycete clade, with 81% and 73% support from ML and ML-distance bootstrapping, respectively (Fig. 2.1). Bootstrap values for both methods lie above 70% at all nodes, with the vast majority greater than 90%. Bayesian posterior probabilities are 100% at all nodes.

To test the strength of the position of the microsporidia, AU tests were carried out on topologies constructed by moving the microsporidia (as a pair) to all alternative positions within the original tree. The tests found the ML/ML-distance/Bayesian tree (Fig. 2.1) to be the most likely, with an AU probability of 0.872. However, the next two most likely trees cannot be rejected at a significance level of 5%. These place microsporidia at the base of the fungal clade (AU probability 0.369) and as a sister to the basidiomycetes (AU probability 0.183). Nevertheless, the optimal tree, with microsporidia as the sister to ascomycetes+basidiomycetes, has a much higher AU probability than either alternative. (See closed and open circles in Fig. 2.1.)
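CONSEL's AU test starts from a table of per-site log-likelihoods, which TREE-PUZZLE wrote for each candidate topology. The Python sketch below computes only the simplest quantity derived from such a table: the RELL bootstrap probability (BP). Alignment columns are resampled with replacement, as in ordinary bootstrapping, but each tree's stored site log-likelihoods are simply re-summed instead of re-optimized. The AU test goes further. It repeats this resampling at several replicate sizes (a multiscale bootstrap) and corrects the bias of BP, so the numbers produced here are not AU probabilities. The function name, the in-memory table format and the toy values are hypothetical.

```python
import random
from typing import List, Sequence


def rell_bootstrap_proportions(
    site_lnl: Sequence[Sequence[float]],
    replicates: int = 10000,
    seed: int = 1,
) -> List[float]:
    """Fraction of RELL replicates in which each tree has the highest total log-likelihood.

    site_lnl[t][s] is the log-likelihood of alignment column s under tree t
    (a hypothetical in-memory form of a site-likelihood table).
    """
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    if any(len(row) != n_sites for row in site_lnl):
        raise ValueError("every tree needs one log-likelihood per site")
    rng = random.Random(seed)
    wins = [0] * n_trees
    for _ in range(replicates):
        # Resample columns with replacement, keeping the alignment length unchanged.
        picks = [rng.randrange(n_sites) for _ in range(n_sites)]
        # RELL: re-sum the stored site log-likelihoods instead of re-optimizing each tree.
        totals = [sum(row[s] for s in picks) for row in site_lnl]
        # max() returns the first maximum, so ties go to the lower-numbered tree.
        best = max(range(n_trees), key=lambda t: totals[t])
        wins[best] += 1
    return [w / replicates for w in wins]


if __name__ == "__main__":
    # Toy table with three candidate trees and six sites (values are invented).
    toy = [
        [-10.0, -12.0, -9.5, -11.0, -8.0, -7.5],
        [-10.5, -11.5, -9.0, -11.5, -8.2, -7.4],
        [-11.0, -13.0, -10.0, -12.0, -9.0, -8.0],
    ]
    print(rell_bootstrap_proportions(toy, replicates=2000))
```

In the AU test itself, a topology is rejected at the 5% level only when its p-value falls below 0.05. None of the three reported values (0.872, 0.369 and 0.183) does, which is why all three branching patterns remain in contention.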
Strangely, Rhizopus oryzae and Blastocladiella emersonii, a zygomycete and a chytrid, respectively, branched together with 100% bootstrap support in the recovered tree. This topology was somewhat unexpected, but it is consistent with current understanding of fungal systematics. A recent and comprehensive multi-locus analysis from the “Assembling the Fungal Tree of Life” (AFTOL) project indicates that the zygomycetes and chytrids are probably not monophyletic (Lutzoni et al., 2004). Indeed, the AFTOL analyses placed the blastocladialean chytrid representative, Allomyces, within one of several otherwise zygomycete clades. This result is not unique: an analysis of ribosomal DNA sequences from dozens of chytrids placed the blastocladialean chytrid fungi within a combined zygomycete/blastocladialean clade (James et al., 2000). Ideally, several representatives of both the zygomycetes and the chytrids would have been included in this study, especially given the somewhat tenuous position of the Blastocladiales within the fungal tree. However, sequence sampling in these groups is sparse. A limited amount of sequence data is available from a different chytrid, Spizellomyces punctatus, which is not a member of the Blastocladiales lineage. A second concatenated dataset was therefore constructed including both chytrids, but it is much shorter, containing approximately half the characters of the complete dataset (see Section 2.2.1).

Trees generated from the shorter dataset place Blastocladiella and Spizellomyces together as a sister clade to Rhizopus, again with 100% support (Fig. 2.2). This tree differs from that of the complete dataset in the placement of the microsporidia. Here they occupy a position at the base of the fungi, one of the topologies not rejected by AU tests performed on the complete dataset. However, this position is not supported, as it was not recovered by either bootstrapping method.
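The number of topologies in these AU tests follows from simple counting. This rests on an assumption of ours that the text does not state explicitly: the microsporidian pair was pruned as a unit and regrafted onto every branch of the unrooted, fully bifurcating tree of the remaining taxa. An unrooted binary tree with $m$ leaves has $2m - 3$ branches. For an $n$-taxon tree, pruning the pair leaves $m = n - 2$ taxa, so:

```latex
\#\text{topologies} = 2(n - 2) - 3 = 2n - 7,
\qquad n = 13:\; 2(13) - 7 = 19,
\qquad n = 12:\; 2(12) - 7 = 17
```

Each count includes the original tree. Suppose the shorter dataset consists of the 12 complete-dataset taxa plus Spizellomyces ($n = 13$). The count is then 19, matching the number of topologies tested for that dataset. The same reasoning would give 17 topologies for the complete dataset.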
In general, this tree’s topology is not nearly as well supported as that of the complete dataset, likely because it is based on only about half as many characters. Bootstrap values range from 70 to 100%, with two nodes not recovered by the majority of bootstrap replicates for one or both methods. This lack of support is also evident in AU tests, in which the position of the microsporidian pair was tested as described above for the complete dataset. AU tests of 19 topologies fail to reject seven alternatives at a 5% significance level, including microsporidia as a sister to the animals, to the basidiomycetes, to the ascomycetes, to Dictyostelium and Arabidopsis, and to Cryptococcus, and at the base of the animal/fungal clade (see closed and open circles in Fig. 2.2). It is perhaps notable that AU testing identified the most likely tree as the one placing the microsporidian pair with the ascomycetes+basidiomycetes, the strongly supported topology recovered in the analysis of the complete dataset.

2.3.2 Considering the Phylogenetic Position of Microsporidia

During the past 20 years, microsporidia have been placed at several different nodes on the eukaryotic tree. Initial molecular evidence seemed to indicate an ancient origin, while more recent single-gene phylogenies consistently grouped microsporidia with fungi. However, the exact nature of the relationship between microsporidia and fungi has remained unclear. Tanabe et al. and Keeling attempted to solve this problem by broadening the fungal representation in their analyses (Keeling, 2003; Tanabe et al., 2002). The results of Tanabe et al.’s EF1-α analysis agreed with early work, recovering microsporidia near the base of the eukaryotic tree, whereas their RPB1 analysis placed microsporidia at the base of an animal/fungal clade. Keeling’s analyses placed microsporidia within the zygomycete clade.
Trees generated from the multi-gene analyses presented here recover another possibility: microsporidia as a sister to the ascomycetes and basidiomycetes. This topology is well supported statistically; however, other topologies, including microsporidia branching as a sister to the fungi, were not rejected by AU tests. Many ascomycetes and basidiomycetes infect animals (for example, the basidiomycete Cryptococcus). However, these animal parasites tend to be found as terminal branches of the fungal tree, nested within free-living clades of ascomycetes and basidiomycetes (Berbee, 2001). This suggests that the ancestor of ascomycetes+basidiomycetes was free-living, and not an animal parasite. This seems at odds with microsporidia being a sister to ascomycetes+basidiomycetes (as in Fig. 2.1). There remains, however, the formal possibility that such a parasitic ancestor of ascomycetes+basidiomycetes existed at some point but is now extinct.

It is perhaps easier to reconcile the microsporidian lifestyle with a zygomycete heritage, as suggested by the tubulins (Keeling, 2003). Zygomycetes and chytrids have many members (including putative basal lineages) that are animal parasites, many of them infecting insects. However, tubulins are the only molecules to suggest this history, and one has to consider the weaknesses inherent in using tubulins as phylogenetic markers. It has previously been noted that the loss of flagella and other 9+2 microtubule structures has led to accelerated rates of evolution in tubulin genes (Keeling et al., 2000; Keeling, 2003). This trend is clearly evident in sequences from ascomycetes, basidiomycetes, zygomycetes and microsporidia. Chytrid sequences are much less divergent, as chytrids possess flagella for part of their life cycle. This discrepancy in rates could compromise the usefulness of tubulins for resolving fungal and microsporidian relationships.
In fact, long branch attraction has been raised as a reason to question the microsporidia-fungi relationship based on tubulin phylogenies. This may be a legitimate concern when resolving the position of microsporidia within the fungal radiation in tubulin trees. However, some have suggested that microsporidia and fungi branch together in tubulin trees only because of long branch attraction. This suggestion has been discredited by analyses that recover a microsporidia-fungi relationship when the only fungal representatives in the tree are the short-branched chytrid sequences (Keeling et al., 2000). In the current combined analysis, branch lengths are much less extreme and more consistent across fungi. One could still argue, however, that the microsporidia and the ascomycetes+basidiomycetes have the longer branches in the tree. Nevertheless, the bootstrap support for the sisterhood of microsporidia and ascomycetes+basidiomycetes is fairly strong.

The validity of EF-1α as a phylogenetic marker for assessing microsporidian relationships has been seriously called into question. Even so, Tanabe et al. (2002) propose that indel information in EF-1α could shed light on the relationship between microsporidia and fungi. They identify a two-amino-acid deletion that is present in all examined fungi but absent in microsporidia. This character unites fungi to the exclusion of microsporidia, and implies that microsporidia and fungi could only be related as sisters. There are currently only three microsporidian EF-1α sequences in the NCBI database (from Antonospora locustae, Encephalitozoon cuniculi and Glugea plecoglossi), but they do represent a large fraction of microsporidian diversity. On their own, the indel data might not be very compelling. However, they are consistent with the results of this study, in which the sisterhood of microsporidia and fungi could not be statistically rejected by analysis of either dataset.
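Tanabe et al.'s argument rests on a single alignment character: a two-residue gap shared by every fungus and absent from every microsporidian. The hedged Python sketch below shows one way to locate such a shared deletion in an alignment. It is not the procedure Tanabe et al. used. The function name `shared_deletions`, the dictionary-of-strings alignment format and the toy sequences are hypothetical. In practice, candidate indels found this way still need to be checked by eye, because their position depends on how gaps were placed during alignment.

```python
from typing import Dict, List, Set, Tuple


def shared_deletions(aln: Dict[str, str], group: Set[str], length: int = 2) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive column ranges of gap runs that every taxon in
    `group` shares and that no other taxon has, keeping only maximal runs of
    exactly `length` columns."""
    missing = group - set(aln)
    if missing:
        raise ValueError(f"taxa not in alignment: {sorted(missing)}")
    others = [t for t in aln if t not in group]
    n_cols = len(next(iter(aln.values())))
    # A column is diagnostic if the whole group is gapped and everyone else has a residue.
    diagnostic = [
        all(aln[t][i] == "-" for t in group) and all(aln[t][i] != "-" for t in others)
        for i in range(n_cols)
    ]
    runs: List[Tuple[int, int]] = []
    i = 0
    while i < n_cols:
        if not diagnostic[i]:
            i += 1
            continue
        j = i
        while j + 1 < n_cols and diagnostic[j + 1]:
            j += 1
        if j - i + 1 == length:  # keep only maximal runs of the requested size
            runs.append((i + 1, j + 1))
        i = j + 1
    return runs


if __name__ == "__main__":
    # Invented toy alignment, not real EF-1alpha sequences.
    toy = {
        "fungus_1": "MK--LVG",
        "fungus_2": "MR--LVA",
        "microsporidian_1": "MKQALVG",
        "animal_1": "MKTSLIG",
    }
    print(shared_deletions(toy, {"fungus_1", "fungus_2"}))  # → [(3, 4)]
```

Requiring the gap to be absent from every taxon outside the group is what makes the character informative. A gap that any microsporidian also carried would not separate fungi from microsporidia.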
Although this study significantly increases the amount of data used to address the relationship between microsporidia and fungi, a single strongly supported phylogenetic position is not clearly evident. Microsporidia may be sisters to ascomycetes+basidiomycetes, a provocative possibility that would have serious implications for the development of parasitism within the fungi. If so, we expect that future phylogenetic analyses will recover similar results and bolster this position. However, we do not discount the possibility that microsporidia and fungi are sisters, given the EF-1α indel and our own phylogenetic results, which do not reject this possibility. When more EF-1α sequences become available, particularly from microsporidia considered to be basal, the usefulness of the indel as a phylogenetic marker can be evaluated more effectively. Until then, the exact nature of the relationship between microsporidia and fungi remains unclear.

Figure 2.1: A. AU Probabilities for the Three Branching Patterns Not Rejected at a 5% Significance Level. B. PROML Tree Generated from the Complete Concatenated Dataset. Bootstrap values are indicated at nodes. The first and second numbers represent percentages from ML (PHYML) and ML-distance (FITCH) bootstrap methods, respectively. An open circle represents a position for microsporidia that was not rejected in AU tests at a significance level of 5%. A closed circle represents a position for microsporidia that was rejected in AU tests.

Figure 2.2: PROML Tree Generated from the Short Concatenated Dataset Including Spizellomyces punctatus. Bootstrap values are indicated at nodes. The first and second numbers represent percentages from ML (PHYML) and ML-distance (FITCH) bootstrap methods, respectively. Open and closed circles are as described for Figure 2.1. Asterisks indicate branching patterns that were not recovered by bootstrapping.

Supplementary Figure 2.1: IQPNNI Trees Generated from Individual Gene Analyses.
Panels a through h represent α-tubulin (372 characters), β-tubulin (275 characters), RAD25 (194 characters), pyruvate dehydrogenase E1-α (107 characters), pyruvate dehydrogenase E1-β (190 characters), RPB1 (366 characters), TATA-box binding protein (171 characters) and UBC2 (148 characters), respectively. Bootstrap values greater than 50% are shown at nodes. The first and second numbers represent percentages from ML and ML-distance bootstrap methods, respectively.

2.4 References

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Berbee, M. L., 2001. The phylogeny of plant and animal pathogens in the Ascomycota. Physiol. Mol. Plant Pathol. 59: 165-187.
Cavalier-Smith, T., 1991. Archamoebae: the ancestral eukaryotes? Biosystems 25: 25-38.
Edlind, T. D., Li, J., Visvesvara, G. S., Vodkin, M. H., McLaughlin, G. L., and Katiyar, S. K., 1996. Phylogenetic analysis of beta-tubulin sequences from amitochondrial protozoa. Mol. Phylogenet. Evol. 5: 359-367.
Fast, N. M., Logsdon, J. M., and Doolittle, W. F., 1999. Phylogenetic analysis of the TATA box binding protein (TBP) gene from Nosema locustae: evidence for a microsporidia-fungi relationship and spliceosomal intron loss. Mol. Biol. Evol. 16: 1415-1419.
Fast, N. M., and Keeling, P. J., 2001. Alpha and beta subunits of pyruvate dehydrogenase E1 from the microsporidian Nosema locustae: mitochondrion-derived carbon metabolism in microsporidia. Mol. Biochem. Parasitol. 117: 201-209.
Felsenstein, J., et al., 2000. PHYLIP: phylogeny inference package, http://evolution.genetics.washington.edu/phylip.html.
Fischer, W. M., and Palmer, J. D., 2005. Evidence from small-subunit ribosomal RNA sequences for a fungal origin of microsporidia. Mol. Phylogenet. Evol. 36: 606-622.
Flegel, T. W., and Pasharawipas, T., 1995. A proposal for typical meiosis in microsporidians. Can. J. Microbiol. 41: 1-11.
Germot, A., Philippe, H., and Le Guyader, H., 1997. Evidence for loss of mitochondria in microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Mol. Biochem. Parasitol. 87: 159-168.
Guindon, S., and Gascuel, O., 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
Hausmann, S., Vivares, C. P., and Shuman, S., 2002. Characterization of the mRNA capping apparatus of the microsporidian parasite Encephalitozoon cuniculi. J. Biol. Chem. 277: 96-103.
Hirt, R. P., Healy, B., Vossbrinck, C. R., Canning, E. U., and Embley, T. M., 1997. A mitochondrial Hsp70 orthologue in Vairimorpha necatrix: molecular evidence that microsporidia once contained mitochondria. Curr. Biol. 7: 995-998.
Hirt, R. P., Logsdon, J. M., Healy, B., Dorey, M. W., Doolittle, W. F., and Embley, T. M., 1999. Microsporidia are related to fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96: 580-585.
Huelsenbeck, J. P., and Ronquist, F., 2001. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572-1574.
Inagaki, Y., Susko, E., Fast, N. M., and Roger, A. J., 2004. Covarion shifts cause a long-branch attraction artifact that unites microsporidia and archaebacteria in EF-1 alpha phylogenies. Mol. Biol. Evol. 21: 1340-1349.
James, T. A., Porter, D., Leander, C. A., Vilgalys, R., and Longcore, J. E., 2000. Molecular phylogenetics of the Chytridiomycota supports the utility of ultrastructural data in chytrid systematics. Can. J. Bot. 78: 336-350.
Jones, D. T., Taylor, W. R., and Thornton, J. M., 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8: 275-282.
Kamaishi, T., Hashimoto, T., Nakamura, Y., Nakamura, F., Murata, S., Okada, N., Okamoto, K., Shimizu, M., and Hasegawa, M., 1996a. Protein phylogeny of translation elongation factor EF-1 alpha suggests microsporidians are extremely ancient eukaryotes. J. Mol. Evol. 42: 257-263.
Kamaishi, T., Hashimoto, T., Nakamura, Y., Masuda, Y., Nakamura, F., Okamoto, K., Shimizu, M., and Hasegawa, M., 1996b. Complete nucleotide sequences of the genes encoding translation elongation factors 1 alpha and 2 from a microsporidian parasite, Glugea plecoglossi: implications for the deepest branching of eukaryotes. J. Biochem. (Tokyo) 120: 1095-1103.
Keeling, P. J., and Doolittle, W. F., 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13: 1297-1305.
Keeling, P. J., Luker, M. A., and Palmer, J. D., 2000. Evidence from beta-tubulin phylogeny that microsporidia evolved from within the fungi. Mol. Biol. Evol. 17: 23-31.
Keeling, P. J., and Fast, N. M., 2002. Microsporidia: biology and evolution of highly reduced intracellular parasites. Annu. Rev. Microbiol. 56: 93-116.
Keeling, P. J., 2003. Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet. Biol. 38: 298-309.
Lutzoni, F., et al., 2004. Assembling the fungal tree of life: progress, classification, and evolution of subcellular traits. Am. J. Bot. 91: 1446-1480.
Maddison, W. P., and Maddison, D. R., 1989. Interactive analysis of phylogeny and character evolution using the computer program MacClade. Folia Primatol. 53: 190-202.
Peyretaillade, E., Broussolle, V., Peyret, P., Metenier, G., Gouy, M., and Vivares, C. P., 1998. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin. Mol. Biol. Evol. 15: 683-689.
Schmidt, H. A., Strimmer, K., Vingron, M., and von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18: 502-504.
Shimodaira, H., and Hasegawa, M., 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17: 1246-1247.
Shimodaira, H., 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51: 492-508.
Sogin, M. L., 1991. Early evolution and the origin of eukaryotes. Curr. Opin. Genet. Dev. 1: 457-463.
Tanabe, Y., Watanabe, M. M., and Sugiyama, J., 2002. Are Microsporidia really related to fungi? A reappraisal based on additional gene sequences from basal fungi. Mycol. Res. 106: 1380-1391.
Thomarat, F., Vivares, C. P., and Gouy, M., 2004. Phylogenetic analysis of the complete genome sequence of Encephalitozoon cuniculi supports the fungal origin of microsporidia and reveals a high frequency of fast-evolving genes. J. Mol. Evol. 59: 780-791.
Thompson, J. D., Higgins, D. G., and Gibson, T. J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
Van de Peer, Y., Ben Ali, A., and Meyer, A., 2000. Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene 246: 1-8.
Vinh, L. S., and von Haeseler, A., 2004. IQPNNI: moving fast through tree space and stopping in time. Mol. Biol. Evol. 21: 1565-1571.
Vossbrinck, C. R., Maddox, J. V., Friedman, S., Debrunner-Vossbrinck, B. A., and Woese, C. R., 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature 326: 411-414.
Williams, B. A. P., Hirt, R. P., Lucocq, J. M., and Embley, T. M., 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418: 865-869.
Wittner, M., and Weiss, L. M., 1999. The Microsporidia and Microsporidiosis. ASM Press, Washington, D.C.

Chapter 3 – Genome Reduction and DNA Repair Systems in E. cuniculi²

3.1 Introduction

3.1.1 DNA Repair in Eukaryotes

DNA repair processes are vital to all living organisms.
Without appropriate mechanisms to remove and replace damaged bases and nucleotides, lesions would accumulate, leading to genome degradation and the loss of vital genetic information. DNA lesions take many forms, including single strand and double strand breaks, inter- and intra-strand crosslinks, and modified bases. Several pathways operate in concert to minimize information loss each time a DNA lesion occurs. Many of these fundamental pathways have been conserved throughout eukaryotes, and many eukaryotic enzymes have homologues in prokaryotes (Taylor and Lehmann, 1998).

Eukaryotic DNA repair can be divided into six primary pathways, all of which are conserved (Fleck and Nielsen, 2004). Some types of DNA lesions (such as double strand breaks) can be recognized and repaired by more than one pathway, so the pathways overlap somewhat in function (Fleck and Nielsen, 2004). Mismatch repair (MMR), base excision repair (BER) and nucleotide excision repair (NER) all repair aberrant bases or nucleotides on one strand of the double helix, using the other strand as a template for new DNA synthesis. In contrast, methyltransferase repair does not require the synthesis of new DNA; anomalous methyl groups are removed without causing any breaks in the double helix. Non-homologous end joining repair (NHEJ) and homologous recombination repair (HRR) are double strand break repair pathways (Fleck and Nielsen, 2004). Double strand breaks are among the most detrimental forms of DNA lesion, as they can cause genome fragmentation and apoptosis if they are not properly repaired (Hopfner et al., 2002). (See Figure 3.1 for a comparison of DNA repair processes.)

² A version of this chapter has been published. Gill, E.E.; Fast, N.M. (2007) Stripped-down DNA repair in a highly reduced parasite. BMC Mol Biol 8, 24.
DNA polymerases are key players in DNA repair, as they are required to fill gaps created by repair enzymes or incurred through damage (Hubscher et al., 2000). Eukaryotic cells have a wide range of polymerases with specialized functions: certain polymerases act almost solely in genome replication, others are active only at DNA lesions, and a few have dual roles in repair and replication (Hubscher et al., 2000). For consistency, the genes and proteins involved in these pathways are referred to using Saccharomyces cerevisiae nomenclature.

3.1.2 Genome Reduction in Encephalitozoon cuniculi

Encephalitozoon cuniculi belongs to a group of obligate intracellular parasites known as microsporidia. These organisms infect a variety of animals, including fish, insects and mammals. Over the past few decades, various microsporidia have drawn the attention of the medical community because they infect immunocompromised humans, such as AIDS and chemotherapy patients (Wittner and Weiss, 1999). Microsporidian genomes range in size from 2.3 Mbp to 19.5 Mbp. The only completely sequenced microsporidian genome, that of E. cuniculi, is a mere 2.9 Mbp in size (Katinka et al., 2001). The E. cuniculi genome is smaller than those of other eukaryotes for several reasons. It has fewer and shorter genes, these genes are separated by tiny intergenic spaces, and they are interrupted by only a few short introns (Katinka et al., 2001). Genome reduction of this degree has effects that are evident in most cellular processes, including DNA repair. The precise phylogenetic position of microsporidia is not yet known, but a large body of evidence indicates that they are closely related to the fungi (Katinka et al., 2001; Hirt et al., 1997; Keeling, 2003; Fischer and Palmer, 2005; James et al., 2006 and others). Therefore, E. cuniculi’s DNA repair systems have been compared primarily to those of a fungus, the yeast S. cerevisiae. S.
cerevisiae’s repair pathways have been well studied at the functional level, which makes this organism ideal for comparative purposes. Only genes with homologues in animals were examined. This restriction gives an accurate picture of which genes E. cuniculi has lost during genome reduction, that is, genes that were present in the common ancestor of microsporidia and fungi.

3.2 Methods

Identification of DNA Repair Pathway Components in S. cerevisiae and Data Mining in E. cuniculi

Components of the six major DNA repair pathways were gathered from recent literature and supplemented with data from the Saccharomyces Genome Database (http://www.yeastgenome.org/). Table 3.1 lists the genes involved in each pathway, together with the DNA polymerase subunits and references. Amino acid sequences of DNA repair proteins from S. cerevisiae (Table 3.1) were collected from NCBI GenBank and compared to E. cuniculi’s protein and nucleotide data using BLASTP and TBLASTN (Altschul et al., 1997). Some S. cerevisiae proteins had a Schizosaccharomyces pombe homologue that was more conserved among eukaryotes than the S. cerevisiae protein itself. In those cases, the S. pombe sequence was used in the BLAST searches. (These proteins are marked with asterisks in Table 3.1, which also gives their S. pombe protein ID numbers.) In most instances, BLASTP searches were sufficient to identify putative E. cuniculi homologues. Where the initial BLASTP analysis produced no significant results (significance was defined arbitrarily as an e-value of 10⁻⁵ or less), the PSI-BLAST algorithm was used. Homologues of the S. cerevisiae protein were identified in all available eukaryotic protein data to construct a position-specific scoring matrix (Altschul et al., 2005). Up to six iterations were run in cases where no significant E. cuniculi alignment was found. To rule out similarity by chance, the identities of putative homologues detected in E.
cuniculi were confirmed by comparing them to GenBank’s S. cerevisiae protein database using BLASTP. Homology was inferred when this reciprocal search recovered, as its top hit, the S. cerevisiae protein used in the initial E. cuniculi search. In many instances, BLAST searches in E. cuniculi confirmed the annotations of DNA repair genes and polymerases made by Katinka et al. (2001). The number of interaction partners of each protein in S. cerevisiae was briefly examined using data from the online Database of Interacting Proteins (DIP) (http://dip.doe-mbi.ucla.edu/). Proteins that are absent in E. cuniculi do not have a significantly different number of interaction partners from proteins that are present (data not shown).

3.3 Results

DNA Repair Inventory

BLAST and PSI-BLAST comparison of S. cerevisiae DNA repair proteins against E. cuniculi’s genome and proteome revealed that E. cuniculi appears to contain a reduced set of proteins in all major repair pathways. Of the 56 repair genes sought in E. cuniculi, 16 are absent and another 6 are potentially absent. Six of 14 DNA polymerases or polymerase subunits are absent (See Table 3.1). Although all repair pathways have been reduced, gene loss is not distributed evenly among them, and each process has been affected differently by genome reduction. A detailed discussion of the components of each pathway is presented below.

3.3.1 Base Excision Repair (BER)

BER is one of the least complex DNA repair mechanisms and involves only a small number of proteins. A damaged base is recognized by a DNA glycosylase that is specific for that base and/or for the type of damage (methylation, oxidation, etc.). S. cerevisiae contains four types of glycosylases, although far more have been found in other organisms (animals, bacteria, etc.) (Krokan et al., 1997).
Glycosylases cleave the glycosylic bond between the base and the deoxyribose to remove the damaged base. A non-specific apurinic/apyrimidinic (AP) endonuclease (Apn1 or Apn2) then removes the remaining deoxyribose phosphate to create a gap (Boiteux and Guillet, 2004). In short patch BER, which replaces a single nucleotide, DNA polymerase β fills the gap. Long patch BER replaces two or more nucleotides. Here, either DNA polymerase β or polymerases δ and ε acting with proliferating cell nuclear antigen (PCNA) synthesize several nucleotides, displacing the original DNA strand. Rad27 then removes the displaced DNA. The ligase Cdc9 (or DNA ligase III and Xrcc1 in other eukaryotes) seals the nick (Kelley et al., 2003; Sung and Demple, 2006). The Rad1-Rad10 and Mus81-Mms4 endonucleases are also believed to play minor roles in BER by processing the 3’ ends of the DNA once an incision has been made in the sugar-phosphate backbone (Boiteux and Guillet, 2004).

E. cuniculi’s BER pathway appears to be nearly complete. It lacks three components: DNA polymerase β, the DNA ligase Cdc9 and Mms4, one subunit of a 3’ endonuclease (See Table 3.1). It does, however, possess Xrcc1, the cofactor of the ligase used for BER in some eukaryotes. Deletion of either polymerase β or Mms4 is not lethal in yeast. However, S. cerevisiae cannot survive in the absence of Cdc9 (Saccharomyces Genome Database). E. cuniculi likely uses another ligase for BER. Non-specialized enzymes are commonly shared between pathways in S. cerevisiae (see discussion), and there is no reason to expect otherwise in E. cuniculi.

3.3.2 Nucleotide Excision Repair (NER)

NER is used primarily to remove bulky lesions from DNA, such as inter- and intra-strand crosslinks. NER is a more complex process than BER and uses a large number of proteins that are evolutionarily conserved among eukaryotes.
NER comprises two subpathways, global genome repair (GGR) and transcription-coupled repair (TCR). As their names suggest, they act on different types of DNA. GGR repairs DNA that is not transcribed, including the non-transcribed strands of expressed genes, and TCR repairs actively transcribed DNA. In both GGR and TCR, damage recognition comes first, followed by DNA unwinding. Incisions are then made on either side of the aberrant base(s), and a single-stranded fragment totaling approximately 25-30 nucleotides is removed. DNA polymerase fills the gap and DNA ligase seals it. This process is thought to require the stepwise recruitment of five multi-protein complexes: nucleotide excision repair factors (NEFs) 1 through 4 and the replication protein A (RPA) complex. The RPA complex is composed of Rpa1 and Rpa2 and recognizes damaged DNA. In GGR, the first complex to arrive at the damaged site is NEF4, which recognizes damage and is composed of Rad7 and Rad16. Rad7 binds the NEF2 complex (Rad4/Rad23), recruiting it to the damaged site and increasing DNA binding efficiency. NEF4 is not strictly required for the recruitment of NEF2 to the lesion, but it facilitates the process. These proteins do not act in TCR. Instead, repair is initiated when an RNA polymerase stalls. Two TCR-specific proteins, Rad26 and Rad28, also participate at this stage (van den Boom et al., 2002). In both GGR and TCR, NEF1 and NEF3 are recruited next and are held at the damage site by NEF2. NEF1 is composed of Rad1, Rad10 and Rad14. NEF3 is composed of Rad2 and transcription factor IIH (TFIIH). TFIIH contains the Rad3, Rad25, Ssl1, Tfb1, Tfb2 and Tfb3 proteins and provides the single strand DNA helicases that give repair proteins access to the damaged site.
Rad1 and Rad10 form a heterodimer that acts as a single strand endonuclease at the 5’ end of the stretch of damaged DNA. Rad2 is a single strand endonuclease that cuts at the 3’ end. RPA is thought to be the last player to arrive. Many of these proteins also function in other cellular processes, such as recombination and transcription, so mutants show defects in several pathways. For a comprehensive review of NER, see Prakash and Prakash (2000).

Most proteins participating in NER are present in E. cuniculi, with two exceptions (See Table 3.1). The first is Rad23, one half of the heterodimeric damage sensor complex NEF2 (Rad4/Rad23). The second is the Tfb1 subunit of TFIIH. Rad23 appears to have diverse functions within the cell, ranging from DNA repair to the regulation of a cell-cycle checkpoint and protein degradation. Specifically, Rad23 helps to prevent the degradation of Rad4, and it also acts with the 26S proteasome to regulate the NER pathway (Gillette et al., 2006; van Laar et al., 2002). Deletion of Tfb1 in S. cerevisiae is lethal (Giaver et al., 2002), likely because of lost function in transcription. The presence or absence of Rad7 and Rad16 could not be determined. BLAST and PSI-BLAST searches using S. cerevisiae and S. pombe sequences as queries did not return homologues from most animals, or from any eukaryotes other than fungi.

3.3.3 Methyltransferase Repair

Methyltransferases are present in both eukaryotes and prokaryotes and remove certain methylation lesions from DNA (O⁶-methylguanine, O⁴-methylthymine). These proteins irreversibly transfer methyl groups from DNA to one of their own cysteine residues, and they are therefore suicide enzymes (Sassanfar and Samson, 1990). E. cuniculi does not possess Mgt1, the methyltransferase found in other eukaryotes. Deletion of this gene is not lethal in S. cerevisiae (Saccharomyces Genome Database).
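The presence/absence reasoning applied to BER and NER above can be sketched in a few lines of Python. The rule is that a repair step survives gene loss if at least one way of performing it remains intact. This is a hypothetical illustration, not the procedure used in this study. The step and complex definitions are simplified from the S. cerevisiae descriptions in 3.3.1 and 3.3.2, the DNA glycosylases are omitted because they are not named, and the names `BER_STEPS`, `NER_STEPS`, `step_status` and `report` are invented for the example. Every protein not reported as absent or undetermined in E. cuniculi is assumed to be present.

```python
"""Hypothetical sketch: is each repair step still covered after gene loss?

Each step lists alternative ways of performing it. Each alternative is the
set of proteins (or complex subunits) that must ALL be present. A step is
"covered" if any alternative is fully present and "undetermined" if its only
gaps are proteins whose presence could not be resolved. Otherwise the step
reports the smallest set of missing proteins.
"""
from __future__ import annotations

# BER steps as described for S. cerevisiae. Glycosylases are omitted
# (not named in the text). Polymerase labels are illustrative, not gene IDs.
BER_STEPS: dict[str, list[set[str]]] = {
    "AP site incision": [{"Apn1"}, {"Apn2"}],
    "gap filling": [{"Pol beta"}, {"Pol delta", "Pol epsilon", "PCNA"}],
    "flap removal": [{"Rad27"}],
    "ligation": [{"Cdc9"}],
    "3' end processing": [{"Rad1", "Rad10"}, {"Mus81", "Mms4"}],
}

# NER complexes; TFIIH subunits are listed within NEF3.
NER_STEPS: dict[str, list[set[str]]] = {
    "NEF4": [{"Rad7", "Rad16"}],
    "NEF2": [{"Rad4", "Rad23"}],
    "NEF1": [{"Rad1", "Rad10", "Rad14"}],
    "NEF3": [{"Rad2", "Rad3", "Rad25", "Ssl1", "Tfb1", "Tfb2", "Tfb3"}],
    "RPA": [{"Rpa1", "Rpa2"}],
}

# E. cuniculi findings reported in the text. Treating every other protein
# named above as present is a simplifying assumption of this sketch.
ABSENT = {"Pol beta", "Cdc9", "Mms4", "Rad23", "Tfb1"}
UNDETERMINED = {"Rad7", "Rad16"}


def step_status(alternatives: list[set[str]], absent: set[str],
                undetermined: set[str]) -> str:
    """Classify one step as covered, undetermined, or missing proteins."""
    if not alternatives:
        raise ValueError("a step needs at least one alternative")
    fewest_missing: set[str] | None = None
    only_unknown = False
    for subunits in alternatives:
        missing = subunits & absent
        unknown = subunits & undetermined
        if not missing and not unknown:
            return "covered"  # one complete alternative is enough
        if not missing:
            only_unknown = True  # gaps are unresolved, not confirmed losses
        elif fewest_missing is None or len(missing) < len(fewest_missing):
            fewest_missing = missing
    if only_unknown:
        return "undetermined"
    return "missing " + ", ".join(sorted(fewest_missing))


def report(steps: dict[str, list[set[str]]], absent: set[str],
           undetermined: set[str]) -> dict[str, str]:
    """Apply step_status to every step of a pathway."""
    return {name: step_status(alts, absent, undetermined)
            for name, alts in steps.items()}


if __name__ == "__main__":
    for label, steps in (("BER", BER_STEPS), ("NER", NER_STEPS)):
        for step, status in report(steps, ABSENT, UNDETERMINED).items():
            print(f"{label:4} {step:20} {status}")
```

Under these assumptions, the sketch reports BER gap filling as covered through the long patch alternative, even though short patch synthesis by polymerase β is lost. It leaves ligation uncovered, which is consistent with the suggestion that another ligase substitutes for Cdc9. It marks NEF2 and NEF3 as incomplete and leaves NEF4 undetermined.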
3.3.4 Mismatch Repair (MMR)

In MMR, mismatches are recognized by the heterodimers MutSα (Msh2/Msh6) and MutSβ (Msh2/Msh3). MutSα recognizes single base mismatches, and MutSβ recognizes insertion/deletion loops (IDLs) shorter than about 9 nucleotides (Marti et al., 2002). Both MutSα and MutSβ can recognize a single unpaired nucleotide. PCNA is also involved in MMR, perhaps assisting in these initial recognition steps. MutLα (Mlh1/Pms1) binds MutSα and MutSβ and allows them to bind efficiently to IDLs and mismatches. The exonuclease Exo1 then excises the mismatched base(s), and a DNA polymerase and DNA ligase fill and seal the gap. The proteins required for MMR differ among eukaryotes. For instance, Drosophila and Caenorhabditis lack Msh3 homologues and therefore do not require them to remove IDLs (Marti et al., 2002). Schizosaccharomyces does possess an Msh3 homologue, but it appears to play a different role within the cell, participating instead in recombination (Tornier et al., 2001).

Most S. cerevisiae MMR proteins are present in E. cuniculi. The sole missing protein is Msh3, the subunit of the MutSβ heterodimer that recognizes small IDLs (See Table 3.1). Deletion of this gene in S. cerevisiae is not lethal (Giaver et al., 2002; see discussion).

3.3.5 Homologous Recombination Repair (HRR)

HRR is the major form of double strand break repair used in yeast. Damage recognition proteins recognize a double strand break, and single-stranded overhangs are generated on both sides of the break. A region of the genome that is homologous to these overhangs is then located. Strand invasion follows, and the homologous (undamaged) DNA serves as a template for synthesis on the broken strand. HRR is completed by re-annealing of the broken DNA strand and ligation. See Figure 3.2 for an overview of this process.
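A toy string model in Python can illustrate why template-directed synthesis in HRR preserves sequence information. The sketch contrasts it with rejoining the broken ends directly. It is a conceptual illustration under loudly stated simplifications, not a model of any real enzyme activity. The two DNA strands are collapsed into one string. Resection is treated as simple loss of bases on both sides of the break. An exact match of a fixed length (the hypothetical `anchor` parameter) stands in for the genome-wide homology search. The sequence and the function names are invented for the example.

```python
"""Toy model (hypothetical): repairing a double strand break by copying from a
homologous template, contrasted with rejoining the ends without a template.
Strands are collapsed into one string, and resection is modelled as loss of
bases on both sides of the break."""
from __future__ import annotations


def make_break(seq: str, pos: int, resect: int) -> tuple[str, str]:
    """Break seq at index pos and lose `resect` bases on each side."""
    if not 0 <= pos <= len(seq) or resect < 0:
        raise ValueError("invalid break position or resection length")
    return seq[:max(pos - resect, 0)], seq[pos + resect:]


def homology_repair(left: str, right: str, donor: str,
                    anchor: int = 8) -> str | None:
    """Fill the gap from a donor that shares the sequences flanking the break.

    The last `anchor` bases of the left end and the first `anchor` bases of
    the right end stand in for the homology search. The donor sequence lying
    between them is copied in, mimicking template-directed synthesis.
    Returns None if the donor lacks either flank.
    """
    if anchor < 1:
        raise ValueError("anchor must be at least 1")
    left_flank, right_flank = left[-anchor:], right[:anchor]
    start = donor.find(left_flank)
    if start == -1:
        return None
    gap_start = start + len(left_flank)
    gap_end = donor.find(right_flank, gap_start)
    if gap_end == -1:
        return None
    return left + donor[gap_start:gap_end] + right


def end_join(left: str, right: str) -> str:
    """Ligate the two ends with no template: resected bases are simply lost."""
    return left + right


if __name__ == "__main__":
    chromosome = "ATGCGTACCTTAGGACGGATCCTATCAGTTGCAACGTGGA"  # invented sequence
    sister = chromosome  # an identical homologous copy used as the donor
    left, right = make_break(chromosome, pos=20, resect=4)
    print(homology_repair(left, right, sister) == chromosome)  # True
    print(len(chromosome) - len(end_join(left, right)))  # 8
```

In this sketch, copying from the undamaged donor restores the original sequence exactly. Rejoining without a template loses every resected base. Template-based repair fails (returns `None`) when no homologous donor is available.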
The Rad51, Rad52, Rad54, Rad55 and Rad57 proteins perform most steps of the HRR process. Rad51 is a homologue of the bacterial enzyme RecA and is well conserved within eukaryotes. When a double strand break forms, the MRX complex is involved in damage recognition. This complex is composed of Mre11, Rad50 and Xrs2, and it also acts in NHEJ. An unknown nuclease then resects the DNA ends on either side of the break in the 5’ to 3’ direction. Rad24, which is also a checkpoint protein, participates in this end processing. The result is short 3’ overhangs on either side of the break. RPA (which also acts in NER, as described above) then coats the overhangs. Rad51 later replaces RPA, aided by Rad52, Rad55/Rad57 and very likely Rad54, as a genome-wide search for homologous sequences takes place. Strand invasion then occurs while the helicase Hpr5 removes Rad51 from the DNA. An undetermined polymerase synthesizes DNA using the donor strands as templates, and the new DNA is then ligated. The mechanism is not clear, but the Rad55/Rad57 complex is evidently involved in this last step. The Sgs1 helicase plays a specific role in repairing double strand breaks generated by a stalled replication fork. For a review of HRR, see Aylon and Kupiec (2004).

Three other signaling and damage sensor proteins act in the HRR pathway, as well as in the BER and NHEJ pathways. The Rad17/Mec3/Ddc1 (9-1-1) complex triggers DNA damage checkpoints (Lisby and Rothstein, 2005). It also stimulates repair pathways (Brandt et al., 2006) and various individual repair proteins, including DNA polymerase β (Toueille et al., 2004), Rad51 (HRR; Pandita et al., 2006), Rad27 (BER, NHEJ; Helt et al., 2005) and Cdc9 (BER; Wang et al., 2006).

E. cuniculi lacks more than half of the proteins involved in the HRR pathway. These losses affect almost all steps of the process (see discussion).
Missing proteins include the Hpr5 helicase, Rad54 and Rdh54 (See Figure 3.2, Table 3.1). From the signaling pathways, Rad24 and all three components of the 9-1-1 complex are absent. S. cerevisiae single mutants lacking these proteins are viable (Giaver et al., 2002), likely because yeast can use either double strand break repair pathway (HRR or NHEJ) to fix damaged DNA. The presence or absence of Rad55 and Rad57 was not determined. Rad55 and Rad57 are paralogues of Rad51. PSI-BLAST searches using the S. cerevisiae Rad55 and Rad57 proteins retrieve Rad51 in other fungi. This makes it difficult to tell whether these proteins are present in E. cuniculi, which is related to fungi.

3.3.6 Non-Homologous End Joining Repair (NHEJ)

NHEJ is the second form of double strand break repair. It is a separate, though not completely independent, pathway from HRR. In S. cerevisiae, NHEJ plays a minor role compared to HRR. When a double strand break forms, the damage is recognized and several proteins bring both ends of the lesion together. A minimal amount of DNA synthesis follows, and then ligation. DNA on either side of the break may be degraded before the break is repaired, so the potential for information loss is substantial (Aylon and Kupiec, 2004).

The NHEJ process begins when the Ku complex (Ku70/Ku80) binds either end of the double strand break (See Fig 3.3). Ku70 and Ku80 form the DNA-binding component of the DNA-dependent protein kinase and also have a role in telomere maintenance. Once bound to the damaged site, the Ku complex recruits the MRX complex for the next stage of repair. The MRX complex is composed of three proteins: Rad50 (an ATP-binding protein), Mre11 (a 3’-5’ exonuclease) and Xrs2, which aligns the MRX complex with the break site (Trujillo et al., 2003). Xrs2 and the Ku complex tether the DNA ligase complex Dnl4/Lif1 to the break site.
The DNA polymerase Pol4 and the structure-specific nuclease Rad27 are the last players to arrive, completing the repair complex. All of the yeast NHEJ proteins are present in most eukaryotes. In addition, the core of Ku70 and Ku80 is homologous to a smaller bacterial protein that performs the same function, indicating a large degree of conservation. For a review of this process, see Hefferin and Tomkinson (2005).

E. cuniculi is missing nearly all NHEJ proteins. Absent proteins include Ku70, Ku80, Dnl4 and Pol4 (See Fig 3.3, Table 3.1). As with the HRR genes, S. cerevisiae single mutants lacking most of these proteins are viable (Giaver et al., 2002), because yeast can rely on the other (HRR) double strand break repair pathway. Lif1 and Xrs2 have animal homologues (Xrcc4 and Nbs1, respectively). However, BLASTP and PSI-BLAST searches using the yeast proteins did not retrieve homologues in any organisms other than fungi. The presence or absence of these two proteins is therefore not known.

3.3.7 DNA Polymerases

DNA polymerases are essential for both genome replication and repair. Eukaryotic cells contain several polymerases, each serving particular functions. Polymerases α, δ and ε act in genome replication, but they also play roles in certain repair processes, notably NER and HRR. Polymerase γ acts solely within mitochondria, while all other polymerases are nuclear. Polymerase β is a specialized repair polymerase involved in BER and NHEJ. Polymerases ζ, η and Rev1 can synthesize DNA through lesions where polymerases α, δ and ε stall and dissociate from the replication fork. This ability helps prevent double strand breaks from forming during replication (Hubscher et al., 2000). S. cerevisiae has 8 polymerases with human counterparts, which confirms that they are not fungus- or ascomycete-specific. Of these, E.
cuniculi possesses 3: α, δ and ε (See Table 3.1). All three are necessary for viability in S. cerevisiae. Apart from the mitochondrial polymerase γ, the polymerases absent in E. cuniculi are used solely for repair or lesion bypass. None of them is essential for viability in yeast, likely because other polymerases can take over their functions (Giaver et al., 2002).

3.4 Discussion

In general, the consequences of genome reduction for DNA repair in E. cuniculi are most evident in the double strand break pathways. The single strand repair pathways have been less affected, although some operate with reduced complexity compared to S. cerevisiae.

3.4.1 Reduction in Complexity of DNA Repair

E. cuniculi’s BER pathway lacks the DNA ligase Cdc9, DNA polymerase β and Mms4. Deletion of Cdc9 is lethal in S. cerevisiae (Giaver et al., 2002), but another ligase likely fills the role of this protein in E. cuniculi. Such redundancy is not unusual in S. cerevisiae, where several enzymes are sometimes able to act on the same substrate. For example, the polymerase and nuclease of the HRR pathway have not yet been defined. This is likely because different combinations of polymerases and nucleases can perform the required functions (Aylon and Kupiec, 2004). The absence of DNA polymerase β could indicate that most BER in E. cuniculi follows the long patch pathway, in which polymerases δ and ε synthesize the DNA. Eukaryotes commonly favor one BER subpathway. Studies indicate that yeast preferentially carry out long patch BER rather than short patch BER, whereas in humans the reverse is true (Kelley et al., 2003). The absence of Mms4 is unlikely to have serious consequences for BER in E. cuniculi. The Mus81-Mms4 endonuclease processes the 3’ ends of nicked DNA to prepare for DNA synthesis.
However, its role is predicted to be minor and overlaps partly with that of the Rad1-Rad10 endonuclease, which is present (Boiteux and Guillet, 2004).

The NER pathway is missing a core TFIIH component and the Rad23 subunit of the Rad4/Rad23 damage recognition complex. TFIIH is built around a ring of the three Tfb proteins (Tfb1, Tfb2 and Tfb3). This ring tethers the functional parts of the complex, the helicases Rad3 and Rad25 (Chang and Kornberg, 2000). Transcription must occur in E. cuniculi, yet deletion of Tfb1 is lethal in S. cerevisiae, so the effect of this protein’s absence is difficult to predict. Complete absence of Tfb1 is hard to reconcile with the essential functions of the Tfb ring, and with the presence of the other two ring components (See below). Nevertheless, the absence of Tfb1 would reasonably be expected to reduce the efficiency of this repair process, particularly as Rad23 also appears to be absent.

E. cuniculi also lacks Msh3, which interacts with Msh2 to form MutSβ, the heterodimer that recognizes insertion or deletion loops (IDLs) in the MMR pathway. In S. cerevisiae, deletion of Msh3 is not lethal, but mutants are slightly more prone to frameshift mutations (Saccharomyces Genome Database). MutSβ is present in S. cerevisiae, Schizosaccharomyces, humans and Arabidopsis, but it is not ubiquitous among eukaryotes. Drosophila and Caenorhabditis lack Msh3, and in these organisms MutSα appears to recognize both mismatches and IDLs (Marti et al., 2002). These two organisms perform MMR effectively without Msh3, which is the only protein missing from this pathway in E. cuniculi. It is therefore very likely that MMR operates in E. cuniculi much as it does in these organisms, whose MMR systems are fully functional.

The absence of the DNA methyltransferase Mgt1 suggests that E.
cuniculi is able to use other methods to remove O⁶-methylguanine from its DNA. In the bacterium Escherichia coli, both NER and the methyltransferase mechanism can remove O⁶-methylguanine (Samson et al., 1988). It is therefore likely that E. cuniculi has simply dispensed with one of two parallel pathways.

3.4.2 DNA Polymerases and Repair

Eukaryotes and prokaryotes possess many specialized DNA polymerases to accomplish specific tasks within the cell. Some of these polymerases are involved in genome replication, while others act solely in repair processes. E. cuniculi possesses only three (α, δ and ε) of the 8 DNA polymerases present in S. cerevisiae. All three are involved in standard genome replication. Polymerase δ also plays a role in BER, NER and MMR and in bypassing DNA lesions (Hubscher et al., 2000). Polymerase ε is required for BER and probably NER. Polymerases α, δ and ε are all likely used in HRR (Hubscher et al., 2000). E. cuniculi lacks polymerase β, which is used in a variety of repair pathways. It also lacks polymerases ζ and η, which carry out error-prone and error-free DNA synthesis across lesions, respectively (Hubscher et al., 2000). E. cuniculi also lacks the mitochondrial DNA polymerase γ. S. cerevisiae mutants lacking polymerase β display a high frequency of recombination and sensitivity to methyl methanesulfonate (MMS). Rev1 mutants display decreased revertibility, while polymerase η mutants are more sensitive to UV radiation. Conversely, polymerase ζ deletion mutants resist UV mutagenesis. Cells lacking polymerase γ lose their mitochondrial DNA (Saccharomyces Genome Database). However, microsporidian mitochondria (mitosomes) are highly reduced, and they are unlikely to possess their own DNA (Williams et al., 2002). The phenotype of an S.
cerevisiae cell lacking several polymerases is not known, but such cells might display a higher frequency of double stranded DNA breaks generated during replication because they lack translesion polymerases.

3.4.3 Double Strand Break Repair in E. cuniculi

The apparent absence of most NHEJ repair proteins in E. cuniculi is perhaps not surprising, as this method of double strand break repair appears to be a back-up method in yeast (Aylon and Kupiec, 2004). (This preference is not strictly maintained throughout eukaryotic life; humans, for example, use NHEJ as the primary pathway (Taylor and Lehmann, 1998).) E. cuniculi’s genome is highly reduced compared to that of S. cerevisiae, and genes encoding proteins that act in back-up pathways would plausibly be among the first to be lost from a genome undergoing reduction. Of key interest is the lack of Ku proteins (Ku70 and Ku80) in E. cuniculi. These proteins play a pivotal role in NHEJ: they recognize double strand break sites and recruit other repair factors to the break. Their function is central, and they are also widespread, occurring in archaebacteria, bacteria and eukaryotes; the core of the Ku proteins is largely conserved from prokaryotes to eukaryotes (Hefferin and Tomkinson, 2005). However, the absence of these proteins in E. cuniculi is not unique: we were also unable to identify Ku proteins in the genome of the human parasite Plasmodium, nor have they been recognized in Trichomonas (Carlton et al., 2007).

Dispensing with a backup double strand break repair pathway during genome reduction would stand to reason if the primary repair pathway were retained; however, retention of the primary pathway is also highly questionable. E. cuniculi also lacks over half of the HRR proteins that are present in yeast (See Table 3.1).
The DNA helicase Hpr5, Rad54, Rdh54 and the checkpoint and DNA end-processing protein Rad24 are among the proteins that appear to be absent from the HRR pathway. Hpr5 plays a cryptic role in HRR: S. cerevisiae deletion mutants have hyperrecombination phenotypes (Rong et al., 1991), so the protein was assumed to be a negative regulator of the process. However, recent work by Aylon et al. (2003) has shown that Hpr5 is intimately involved in commitment to gene conversion, which must take place before recombination can occur. Rdh54 is a Rad54 homolog that participates in interhomologue gene conversion and meiosis (Klein, 1997), while Rad54 is a chromatin remodeling protein that has been implicated in strand invasion and in the removal of repair proteins from DNA after HRR has taken place (Miyazaki et al., 2004). In addition to functioning as a checkpoint protein, Rad24 also plays a role in resection and recombination (Aylon and Kupiec, 2003). The functions of Rad55 and Rad57 (which are potentially absent) may be carried out by Rad51, as all three proteins are homologues of the bacterial protein RecA. This is plausible, as Rad55 and Rad57 appear to act in concert with Rad51 during HRR (Aylon and Kupiec, 2004). Although Rad51, Rad52 and Sgs1 are present in E. cuniculi, it is not known whether HRR can take place in the absence of all other HRR components. It is difficult to imagine this process occurring without DNA resection (Rad24), strand invasion (Rad54) and gene conversion (Hpr5 and Rdh54). Therefore, E. cuniculi appears to have drastically reduced both mechanisms for double strand break repair. Although E. cuniculi’s genome contains very few duplicate genes (regions of homologous sequence) to use as templates for DNA synthesis in HRR, both S.
cerevisiae (Kadyk and Hartwell, 1992) and mammals (Johnson and Jasin, 2000) prefer to use sister chromatids rather than homologous sequences (on the same or different chromosomes) for this process. As E. cuniculi is likely diploid (Brugere et al., 2000), as are yeast and mammals, it is reasonable to assume that it shares this preference.

Given that so many genes involved in both double strand break repair pathways are absent, it is curious that some have been retained. On closer inspection, each of the retained genes plays a role in other critical biological processes. Mre11 and Rad50, both members of the MRX complex (found in both double strand break repair pathways), are also involved in telomere maintenance and the generation of meiotic double strand breaks (Symington, 2002; Krogh and Symington, 2004). Rad27 is a nuclease implicated in the processing of Okazaki fragments during replication (Ayyagari et al., 2003). All of the HRR proteins present in E. cuniculi are also involved in meiosis (Symington, 2002). Although sexual reproduction has not been observed in E. cuniculi, it contains three of the seven core meiosis-specific genes (Hop2, Mnd1 and Spo11), as discussed in Ramesh et al. (2005), and there is evidence that it may possess a mating type locus (Burglin, 2003). Sexual reproduction has also been observed in numerous other microsporidia (Wittner and Weiss, 1999), so there is little reason to suspect that E. cuniculi is an exception. Because so many HRR proteins are absent in E. cuniculi, the repair functions of the remaining proteins are unknown; they may have been retained because of their role in meiosis.

3.4.4 Potential Consequences for E.
cuniculi

Reductions within the DNA repair pathways have led to two fundamentally different outcomes: reduced complexity through the loss of a few proteins (NER, MMR, BER), and drastic loss of half or more of the proteins involved in a pathway (methyltransferase repair, HRR and NHEJ). Although an organism may be able to tolerate a somewhat sloppy repair system, it is difficult to imagine how it could exist without any means to mend double strand DNA breaks, especially given their frequency during meiosis and mitosis. E. cuniculi must, therefore, use some other form of double strand break repair, or contain such highly divergent copies of most NHEJ and HRR proteins that they could not be identified in this study.

Along with many of the proteins that carry out the work of repair, E. cuniculi has lost several proteins that participate in cell signaling and cell cycle control. The 9-1-1 signaling complex, which has been proposed to play a role in the signaling cascade leading to cell cycle arrest and apoptosis (Parrilla-Castellar et al., 2004), is absent. Both the Ku and MRX complexes are also involved in cell cycle control, although their roles are not well defined (Lee et al., 1998). The absence of these proteins could result in a loss of coordination of cellular activities. In addition to their role in repair, the Ku proteins protect telomeres from degradation and help to control telomerase activity (Lee et al., 1998). As E. cuniculi has eleven chromosomes with telomeres (Brugere et al., 2000) and encodes the catalytic subunit of the telomerase enzyme (Katinka et al., 2001), this organism must have developed an alternate method to maintain its telomeres, or it would suffer extreme telomere attrition. Like the Ku proteins, the DNA ligase Cdc9 performs several functions. It plays a role in recombination and in the ligation of Okazaki fragments during replication (Johnston, 1983); therefore, these processes may be somewhat impaired.
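The split between modest and drastic losses described above can be checked against Table 3.1 with a small tally. The sketch below is illustrative only: the Y/N/? codes are transcribed from Table 3.1, the 0.5 cut-off is an assumption standing in for "half or more", and the names `TABLE_3_1`, `fraction_missing` and `classify` are hypothetical. Unclear entries (?) are counted as not confirmed present by default, which reproduces the grouping in the text; counting only definite absences (N) gives 7 of 17 HRR proteins rather than 10 of 17.

```python
"""Tally Table 3.1: share of S. cerevisiae repair proteins not found in E. cuniculi."""

# Status codes as in Table 3.1: "Y" present, "N" absent, "?" unclear.
TABLE_3_1 = {
    "BER": {
        "Apn1": "Y", "Apn2": "Y", "Mag1": "Y", "Mus81": "Y", "Ntg1": "Y",
        "Ogg1": "Y", "PCNA": "Y", "Rad1": "Y", "Rad10": "Y", "Rad27": "Y",
        "Ung1": "Y", "Xrcc1": "Y", "Cdc9": "N", "Ddc1": "N", "Mec3": "N",
        "Mms4": "N", "Rad17": "N",
    },
    "NER": {
        "Rad1": "Y", "Rad2": "Y", "Rad3": "Y", "Rad4": "Y", "Rad10": "Y",
        "Rad14": "Y", "Rad25": "Y", "Rad26": "Y", "Rpa1": "Y", "Rpa2": "Y",
        "Ssl1": "Y", "Tfb2": "Y", "Tfb3": "Y", "Tfb4": "Y", "Rad7": "?",
        "Rad16": "?", "Rad23": "N", "Tfb1": "N",
    },
    "MMR": {
        "Exo1": "Y", "Mlh1": "Y", "Msh2": "Y", "Msh6": "Y", "PCNA": "Y",
        "Pms1": "Y", "Msh3": "N",
    },
    "Methyltransferase": {"Mgt1": "N"},
    "HRR": {
        "Mre11": "Y", "Rad50": "Y", "Rad51": "Y", "Rad52": "Y", "Rpa1": "Y",
        "Rpa2": "Y", "Sgs1": "Y", "Rad55": "?", "Rad57": "?", "Xrs2": "?",
        "Ddc1": "N", "Hpr5": "N", "Mec3": "N", "Rad17": "N", "Rad24": "N",
        "Rad54": "N", "Rdh54": "N",
    },
    "NHEJ": {
        "Mre11": "Y", "Rad27": "Y", "Rad50": "Y", "Lif1": "?", "Xrs2": "?",
        "Ddc1": "N", "Dnl4": "N", "Mec3": "N", "Ku70": "N", "Ku80": "N",
        "Rad17": "N",
    },
}


def fraction_missing(statuses, count_unclear=True):
    """Share of proteins not confirmed present ("N", plus "?" if count_unclear)."""
    missing = {"N", "?"} if count_unclear else {"N"}
    return sum(s in missing for s in statuses.values()) / len(statuses)


def classify(table, threshold=0.5):
    """Label each pathway 'drastic' (share missing >= threshold) or 'modest'."""
    return {
        pathway: "drastic" if fraction_missing(genes) >= threshold else "modest"
        for pathway, genes in table.items()
    }


if __name__ == "__main__":
    for pathway, genes in TABLE_3_1.items():
        print(f"{pathway:18s} {fraction_missing(genes):.2f}")
    print(classify(TABLE_3_1))
```

Treating "?" as missing is a deliberate, conservative choice: it asks which proteins can be positively identified, which is the question the BLAST and PSI-BLAST searches answer.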
The reduction observed within the DNA repair pathways mirrors that seen throughout E. cuniculi’s genome, as this organism lacks many proteins that participate in diverse biosynthetic pathways. In this respect, the genome of E. cuniculi resembles those of many endosymbiotic and parasitic bacteria. Buchnera aphidicola has also lost many DNA repair genes during genome reduction; indeed, it has been proposed that this loss of DNA repair genes is what allowed Buchnera’s genome to become so small in the first place (Moran and Mira, 2001). We cannot rule out the possibility that our bioinformatic tools failed to locate highly divergent proteins that act in DNA repair. It is also possible that in some cases, non-homologous proteins carry out essential functions in place of absent proteins (e.g., Tfb1 in NER). Such proteins may yet be identified, as roughly half of E. cuniculi’s genome encodes hypothetical proteins (Katinka et al., 2001). Another potential explanation for the lack of biosynthetic machinery is that E. cuniculi imports many of the products of these pathways (e.g., ATP) from the host cytoplasm (Wittner and Weiss, 1999). However, this seems unlikely for DNA repair proteins, as protein uptake has not been documented in microsporidia, and imported proteins would also have to be targeted to the nucleus in order to function. For the moment, double strand break repair in E. cuniculi remains a mystery.

Base Excision Repair (BER)
Gene Name | Present in E. cuniculi | S. cerevisiae Protein ID | E. cuniculi Protein ID
Apn1 | Y | CAA81954 | CAD26065
Apn2 | Y | NP_009534 | NP_585892
Mag1 | Y | NP_011069 | CAD25924/CAD26679
Mus81 | Y | NP_010674 | NP_584664
Ntg1 | Y | NP_009387 | CAD26394
Ogg1 | Y | NP_013651 | CAD26383
PCNA | Y | AAS56041 | NP_597446
Rad1 (XPF) | Y | P06777 | CAD26381
Rad10 (Ercc1) | Y | CAA86642 | CAD25852/NP_586248
Rad27 (Fen1) | Y | CAA81953 | CAD26252
Ung1 | Y | CAA86634 | CAD26772
*Xrcc1 (Cut5) | Y | P32372 | NP_584657
Cdc9 (DNA lig I) | N | CAA48158 | -
Ddc1 (Rad9) | N | NP_015130 | -
Mec3 (Hus1) | N | NP_013391 | -
Mms4 | N | NP_009656 | -
Rad17 (Rad1) | N | CAA99699 | -

Nucleotide Excision Repair (NER)
Gene Name | Present in E. cuniculi | S. cerevisiae Protein ID | E. cuniculi Protein ID
Rad1 (XPF) | Y | P06777 | CAD26381
Rad2 (XPG) | Y | CAA97287 | CAD27314/XP_955715
Rad3 (Ercc2/XPD) | Y | CAA46255 | NP_585776
Rad4 (XPC) | Y | CAA39375 | CAD24914
Rad10 (Ercc1) | Y | CAA86642 | CAD25852/NP_586248
Rad14 (XPA) | Y | P28519 | NP_586232
Rad25 (Ssl2/XPB) | Y | Q00578 | CAD24977
Rad26 (Ercc6/CSB) | Y | CAA57290 | CAD27013/XP_955594
Rpa1 | Y | NP_009404 | CAD25779
Rpa2 | Y | NP_014087 | CAD25396
Ssl1 | Y | CAA97527 | CAD25215
Tfb2 | Y | AAB40628 | CAD24937
Tfb3 | Y | AAB64899 | CAD25932
Tfb4 | Y | NP_015381 | CAD25620
Rad7 | ? | CAA85071 | -
Rad16 | ? | CAA89580 | -
Rad23 (HR23B) | N | AAB28441 | -
Tfb1 | N | AAB64747 | -

Methyltransferase Repair
Gene Name | Present in E. cuniculi | S. cerevisiae Protein ID | E. cuniculi Protein ID
Mgt1 | N | CAA42920 | -

Mismatch Repair (MMR)
Gene Name | Present in E. cuniculi | S. cerevisiae Protein ID | E. cuniculi Protein ID
Exo1 | Y | NP_014676 | CAD25986
Mlh1 | Y | NP_013890 | NP_597370
Msh2 | Y | CAA99102 | CAD26200/NP_597565
Msh6 | Y | NP_010382 | NP_586186
PCNA | Y | AAS56041 | NP_597446
Pms1 | Y | P14242 | NP_586432
Msh3 | N | CAA42247 | -

DNA Polymerases
Polymerase | Gene Name | Present in E. cuniculi | S. cerevisiae Protein ID | E. cuniculi Protein ID
α | Pol1 | Y | AAZ22505 | CAD26619
α | Pol12 | Y | NP_009518 | CAD25827
α | Pri1 | Y | AAT92878 | CAD26368
α | Pri2 | Y | NP_012879 | CAD26641
δ | Pol3 | Y | CAA43922 | CAD27015
δ | Hys2 | Y | NP_012539 | CAD27050
ε | Dpb2 | Y | NP_015501 | NP_597373
ε | Pol2 | Y | NP_014137 | CAD25840
β | Pol4 | N | NP_009940 | -
γ | Mip1 | N | NP_014975 | -
η | Rad30 | N | NP_010707 | -
ζ | Rev3 | N | NP_015158 | -
ζ | Rev7 | N | AAA98667 | -
- | Rev1 | N | CAA99674 | -

Homologous Recombination (HRR)
Gene Name | Present in E. cuniculi | S. cerevisiae Protein ID | E. cuniculi Protein ID
Mre11 (Rad32) | Y | BAA02017 | CAD26648/NP_597471
Rad50 | Y | CAA65494 | CAD25593/NP_585989
Rad51 | Y | CAA45563 | CAD25992/NP_586388
Rad52 | Y | CAA86623 | XP_955647
Rpa1 | Y | NP_009404 | CAD25779
Rpa2 | Y | NP_014087 | CAD25396
Sgs1 | Y | NP_013915 | CAD25646
Rad55 (Rad51C) | ? | BAA01284 | -
Rad57 (Xrcc3) | ? | NP_010287 | -
*Xrs2 (Nbs1) | ? | BAC80248 | -
Ddc1 (Rad9) | N | NP_015130 | -
Hpr5 (Srs2) | N | CAA89385 | -
Mec3 (Hus1) | N | NP_013391 | -
Rad17 (Rad1) | N | CAA99699 | -
Rad24 (Rad17) | N | P32641 | -
Rad54 | N | CAA88534 | -
Rdh54 (Rad54B) | N | CAA85017 | -

Non-Homologous End Joining (NHEJ)
Gene Name | Present in E. cuniculi | S. cerevisiae Protein ID | E. cuniculi Protein ID
Mre11 (Rad32) | Y | BAA02017 | CAD26648/NP_597471
Rad27 (Fen1) | Y | CAA81953 | CAD26252
Rad50 | Y | CAA65494 | CAD25593/NP_585989
Lif1 (Xrcc4) | ? | NP_011425 | -
*Xrs2 (Nbs1) | ? | BAC80248 | -
Ddc1 (Rad9) | N | NP_015130 | -
Dnl4 | N | CAA99193 | -
Mec3 (Hus1) | N | NP_013391 | -
Ku70 | N | NP_014011 | -
Ku80 | N | NP_013824 | -
Rad17 (Rad1) | N | CAA99699 | -

Table 3.1: S. cerevisiae DNA Polymerases and Proteins that Participate in the Five Primary DNA Repair Pathways. Presence (Y) or absence (N) in E. cuniculi is indicated (as defined in Methods), along with the GenBank accession numbers for both S. cerevisiae and E. cuniculi proteins. Proteins whose presence or absence in E. cuniculi was unclear are marked ? (see Results for more information). When S. pombe proteins appeared to be more conserved among eukaryotes than S. cerevisiae homologues (or where S. cerevisiae homologues do not exist), they were used to conduct the BLAST and PSI-BLAST searches; these proteins are marked with asterisks. S. cerevisiae nomenclature is used, with S. pombe or animal homologues given in brackets. Pathway components were largely compiled from the following sources: MMR from Marti et al. (2002), BER from Boiteux and Guillet (2004), NER from Prakash and Prakash (2000), NHEJ from Daley et al.
(2005), HRR from Aylon and Kupiec (2004), DNA polymerases from Burgers (1998) and Hubscher et al. (2000).

Figure 3.1: A Comparison of the Five Major DNA Repair Pathways. (See text for explanation.) Newly synthesized DNA is indicated in grey.

Figure 3.2: The Homologous Recombination Repair Pathway. (See text for explanation.) Blue proteins are present in E. cuniculi; all others are absent. Newly synthesized DNA is indicated in grey. Although the MRX complex (Mre11/Rad50/Xrs2) acts in damage recognition in this pathway, it is not shown. (Modified from Aylon and Kupiec (2004).)

Figure 3.3: The Non-Homologous End Joining Repair Pathway. (See text for explanation.) Blue proteins are present in E. cuniculi; all others are absent. Newly synthesized DNA is indicated in grey. (Modified from Hefferin and Tomkinson (2005).)

3.5 References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Aylon, Y. and Kupiec, M. 2003. The checkpoint protein Rad24 of Saccharomyces cerevisiae is involved in processing double-strand break ends and in recombination partner choice. Mol. Cell. Biol. 23: 6585-6596.
Aylon, Y. and Kupiec, M. 2004. DSB repair: the yeast paradigm. DNA Repair 3: 797-815.
Aylon, Y., Liefshitz, B., Bitan-Banin, G. and Kupiec, M. 2003. Molecular dissection of mitotic recombination in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 23: 1403-1417.
Ayyagari, R., Gomes, X.V., Gordenin, D.A. and Burgers, P.M.J. 2003. Okazaki fragment maturation in yeast. I. Distribution of functions between FEN1 and DNA2. J. Biol. Chem. 278: 1618-1625.
Bertuch, A.A. and Lundblad, V. 2003. Which end: dissecting Ku’s function at telomeres and double-strand breaks. Genes Dev. 17: 2347-2350.
Boiteux, S. and Guillet, M. 2004. Abasic sites in DNA: repair and biological consequences in Saccharomyces cerevisiae.
DNA Repair 3: 1-12.
Brandt, P.D., Helt, C.E., Keng, P.C. and Bambara, R.A. 2006. The Rad9 protein enhances survival and promotes DNA repair following exposure to ionizing radiation. Biochem. Biophys. Res. Commun. 347: 232-237.
Brugere, J.F., Cornillot, E., Metenier, G., Bensimon, A. and Vivares, C. 2000. Encephalitozoon cuniculi (Microsporidia) genome: physical map and evidence for telomere-associated rDNA units on all chromosomes. Nucleic Acids Res. 28: 2026-2033.
Burgers, P.M.J. 1998. Eukaryotic DNA polymerases in DNA replication and DNA repair. Chromosoma 107: 218-227.
Burglin, T.R. 2003. The homeobox genes of Encephalitozoon cuniculi (Microsporidia) reveal a putative mating locus. Dev. Genes Evol. 213: 50-52.
Carlton, J.M., Hirt, R.P., Silva, J.C., Delcher, A.L., Schatz, M., Zhao, Q., Wortman, J.R., Bidwell, S.L., Alsmark, U.C.M., Besteiro, S. et al. 2007. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315: 207-212.
Chang, W.H. and Kornberg, R.D. 2000. Electron crystal structure of the transcription factor and DNA repair complex, core TFIIH. Cell 102: 609-613.
Daley, J.M., Palmbos, P.L., Wu, D. and Wilson, T.E. 2005. Nonhomologous end joining in yeast. Annu. Rev. Genet. 39: 431-451.
Database of Interacting Proteins [http://dip.doe-mbi.ucla.edu/]
Fischer, W.M. and Palmer, J.D. 2005. Evidence from small-subunit ribosomal RNA sequences for a fungal origin of microsporidia. Mol. Phylogenet. Evol. 36: 606-622.
Fleck, O. and Nielsen, O. 2004. DNA repair. J. Cell Sci. 117: 515-517.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Véronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., André, B. et al. 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418: 387-391.
Gillette, T.G., Yu, S., Zhou, Z., Waters, R., Johnston, S.A. and Reed, S.H. 2006. Distinct functions of the ubiquitin-proteasome pathway influence nucleotide excision repair. EMBO J. 25: 2529-2538.
Hefferin, M.L. and Tomkinson, A.E. 2005.
Mechanism of DNA double-strand break repair by non-homologous end joining. DNA Repair 4: 639-648.
Helt, C.E., Wang, W., Keng, P.C. and Bambara, R.A. 2005. Evidence that DNA damage detection machinery participates in DNA repair. Cell Cycle 4: 529-532.
Hirt, R.P., Healey, B., Vossbrinck, C.R., Canning, E.U. and Embley, T.M. 1997. A mitochondrial Hsp70 orthologue in Vairimorpha necatrix: molecular evidence that microsporidia once contained mitochondria. Curr. Biol. 7: 995-998.
Hopfner, K.P., Putnam, C.D. and Tainer, J.A. 2002. DNA double-strand break repair from head to tail. Curr. Opin. Struct. Biol. 12: 115-122.
Hubscher, U., Nasheuer, H.P. and Syvaoja, J.E. 2000. Eukaryotic DNA polymerases, a growing family. Trends Biochem. Sci. 25: 143-147.
James, T.Y., Kauff, F., Schoch, C.L., Matheny, P.B., Hofstetter, V., Cox, C.J., Celio, G., Gueidan, C., Fraker, E., Miadlikowska, J. et al. 2006. Reconstructing the early evolution of fungi using a six-gene phylogeny. Nature 443: 818-822.
Johnson, R.D. and Jasin, M. 2000. Sister chromatid gene conversion is a prominent double-strand break repair pathway in mammalian cells. EMBO J. 19: 3398-3407.
Johnston, L.H. 1983. The Cdc9 ligase joins completed replicons in baker’s yeast. Mol. Gen. Genet. 190: 315-317.
Kadyk, L.C. and Hartwell, L.H. 1992. Sister chromatids are preferred over homologs as substrates for recombinational repair in Saccharomyces cerevisiae. Genetics 132: 387-402.
Katinka, M.D., Duprat, S., Cornillot, E., Metenier, G., Thomarat, F., Prensier, G., Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P. et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414: 450-453.
Keeling, P.J. 2003. Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet. Biol. 38: 298-309.
Kelley, M., Kow, Y.W. and Wilson, D.M., 3rd. 2003.
Disparity between DNA base excision repair in yeast and mammals: translational implications. Cancer Res. 63: 549-554.
Klein, H.L. 1997. RDH54, a RAD54 homologue in Saccharomyces cerevisiae, is required for mitotic diploid-specific recombination and repair and for meiosis. Genetics 147: 1533-1543.
Krogh, B.O. and Symington, L.S. 2004. Recombination proteins in yeast. Annu. Rev. Genet. 38: 233-271.
Krokan, H.E., Standal, R. and Slupphaug, G. 1997. DNA glycosylases in the base excision repair of DNA. Biochem. J. 325: 1-16.
Lee, S.E., Moore, J.K., Holmes, A., Umezu, K., Kolodner, R.D. and Haber, J.E. 1998. Saccharomyces Ku70, Mre11/Rad50 and RPA proteins regulate adaptation to G2/M arrest after DNA damage. Cell 94: 399-409.
Lisby, M. and Rothstein, R. 2005. Localization of checkpoint and repair proteins in eukaryotes. Biochimie 87: 579-589.
Marti, T.M., Kunz, C. and Fleck, O. 2002. DNA mismatch repair and mutation avoidance pathways. J. Cell. Physiol. 191: 28-41.
Miyazaki, T., Bressan, D.A., Shinohara, M., Haber, J.E. and Shinohara, A. 2004. In vivo assembly and disassembly of Rad51 and Rad52 complexes during double-strand break repair. EMBO J. 23: 939-949.
Moran, N.A. and Mira, A. 2001. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2: RESEARCH0054.
Pandita, R.K., Sharma, G.G., Laszlo, A., Hopkins, K.M., Davey, S., Chakhparonian, M., Gupta, A., Wellinger, R.J., Zhang, J., Powell, S.N. et al. 2006. Mammalian Rad9 plays a role in telomere stability, S- and G2-phase-specific cell survival, and homologous recombination repair. Mol. Cell. Biol. 26: 1850-1864.
Parrilla-Castellar, E.R., Arlander, S.J.H. and Karnitz, L. 2004. Dial 9-1-1 for DNA damage: the Rad9-Hus1-Rad1 (9-1-1) clamp complex. DNA Repair 3.
Prakash, S. and Prakash, L. 2000. Nucleotide excision repair in yeast. Mutat. Res. 451: 13-24.
Ramesh, M.A., Malik, S.B. and Logsdon, J.M. 2005.
A phylogenomic inventory of meiotic genes; evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr. Biol. 15: 185-191.
Rong, L., Palladino, F., Aguilera, A. and Klein, H.L. 1991. The hyper-gene conversion hpr5-1 mutation of Saccharomyces cerevisiae is an allele of the SRS2/RADH gene. Genetics 127: 75-85.
Saccharomyces Genome Database [http://www.yeastgenome.org/]
Samson, L., Thomale, J. and Rajewsky, M.F. 1988. Alternative pathways for the in vivo repair of O6-alkylguanine and O4-alkylthymine in Escherichia coli: the adaptive response and nucleotide excision repair. EMBO J. 7: 2261-2267.
Sassanfar, M. and Samson, L. 1990. Identification and preliminary characterization of an O6-methylguanine DNA repair methyltransferase in the yeast Saccharomyces cerevisiae. J. Biol. Chem. 265: 20-25.
Sung, J. and Demple, B. 2006. Roles of base excision repair subpathways in correcting oxidized abasic sites in DNA. FEBS J. 273: 1620-1629.
Symington, L.S. 2002. Role of RAD52 epistasis group genes in homologous recombination and double-strand break repair. Microbiol. Mol. Biol. Rev. 66: 630-670.
Taylor, E.M. and Lehmann, A.R. 1998. Review: conservation of eukaryotic DNA repair mechanisms. Int. J. Radiat. Biol. 74: 277-286.
Tornier, C., Bessone, S., Varlet, I., Rudolph, C., Darmon, M. and Fleck, O. 2001. Requirement for Msh6, but not for Swi4 (Msh3), in Msh2-dependent repair of base-base mismatches and mononucleotide loops in Schizosaccharomyces pombe. Genetics 158: 65-75.
Toueille, M., El-Andaloussi, N., Frouin, I., Freire, R., Funk, D., Shevelev, I., Friedrich-Heineken, E., Villani, G., Hottiger, M.O. and Hubscher, U. 2004. The human Rad9/Rad1/Hus1 damage sensor clamp interacts with DNA polymerase beta and increases its DNA substrate utilization efficiency: implications for DNA repair. Nucleic Acids Res. 32: 3316-3324.
Trujillo, K.M., Roh, D.H., Chen, L., Van Komen, S., Tomkinson, A. and Sung, P. 2003. Yeast Xrs2 binds DNA and helps target Rad50 and Mre11 to DNA ends. J.
Biol. Chem. 278: 48957-48964.
van den Boom, V., Jaspers, N.G.J. and Vermeulen, W. 2002. When machines get stuck - obstructed RNA polymerase II: displacement, degradation or suicide. BioEssays 24: 780-784.
van Laar, T., van der Eb, A.J. and Terleth, C. 2002. A role for Rad23 proteins in 26S proteasome-dependent protein degradation? Mutat. Res. 499: 53-61.
Wang, W., Lindsey-Boltz, L.A., Sancar, A. and Bambara, R.A. 2006. Mechanism of stimulation of human DNA ligase I by the Rad9-Rad1-Hus1 checkpoint complex. J. Biol. Chem. 281: 20865-20872.
Williams, B.A.P., Hirt, R.P., Lucocq, J.M. and Embley, T.M. 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418: 865-869.
Wittner, M. and Weiss, L.M. 1999. The microsporidia and microsporidiosis. ASM Press, Washington, D.C.

Chapter 4 - Genome Reduction and the E. cuniculi Spliceosome³

4.1 Introduction

4.1.1 Microsporidia and the Microsporidian Genome

Microsporidia are unicellular eukaryotes that lead an obligate parasitic lifestyle. They can grow only inside host cells and pass from one host individual to another via unique spores. Although these spores are highly adapted to parasitism, they lack many features that are typically found in eukaryotic cells, such as standard mitochondria and peroxisomes. Concomitant with their reduced complement of typical cellular structures, microsporidia have extremely small genomes for eukaryotes (2.3-19.5 Mbp). Much of what we know about why some microsporidian genomes are so small comes from the genome of Encephalitozoon cuniculi, a human parasite with a 2.9 Mbp genome. E. cuniculi’s genome is much more compact than those of most eukaryotes (nearly 1 gene per kb) and encodes far fewer proteins (~2000). In addition, E. cuniculi’s genes are significantly shorter than their yeast counterparts and contain fewer introns (Katinka et al., 2001). The E. cuniculi genome contains a mere 14 verified introns, all of which are very small (23-49 bp; see Chapter 5). Although E.
cuniculi’s introns are tiny, nearly all would produce frameshift mutations in the resulting proteins if they were not removed. E. cuniculi maintains a reduced set of splicing machinery (Collins and Penny, 2005), and spores contain at least one protein that is the product of spliced transcripts (Brosson et al., 2006). The spliceosome is functional for at least part of the life cycle, as Chapter 5 demonstrates that splicing occurs in meronts.

³ A version of this chapter may be submitted for publication. Gill, E. E.; Limpright, V.O.; Fast, N.M. E. cuniculi spliceosomal proteins fail to complement yeast knockout mutants.

4.1.2 Splicing and the Spliceosome

Among the different types of introns present in eukaryotes, type III or spliceosomal introns are the most pervasive. Unlike self-splicing introns, they require a complex array of RNAs and proteins for their removal. The spliceosome rivals the ribosome in complexity and contains more than 70 proteins in yeast (Kaufer and Potashkin, 2000), as well as five small nuclear (sn) RNAs. The snRNAs and proteins form small nuclear ribonucleoprotein complexes (snRNPs) that interact with introns in a stepwise manner to catalyze their removal. The splicing process is conserved between yeast and humans, as are the steps involved in spliceosome assembly (Schellenberg et al., 2008). The following is a concise description of spliceosomal intron excision, which is described in more detail by Will and Luhrmann (1997) and Collins and Guthrie (2000). Initially, the U1 snRNA (as part of the U1 snRNP) base pairs with the 5’ end of the intron, followed by the binding of the U2 snRNA (as part of the U2 snRNP) to the branchpoint consensus. The U4/U6•U5 snRNP is then recruited to the intron, where a structural rearrangement occurs and the U1 and U4 snRNPs are ejected from the complex.
The remaining structure, consisting of the intron and the U2, U5 and U6 snRNPs, forms the catalytically active spliceosome. Two sequential transesterification reactions excise the intron as a lariat and produce the spliced mRNA. Although several proteins are present in more than one snRNP complex (such as the Sm/LSm proteins), many proteins interact with only one snRNP. The protein composition of the spliceosome also differs between species (Collins and Penny, 2005; Kaufer and Potashkin, 2000). As is the case with many E. cuniculi proteins, its splicing proteins are generally much shorter than their yeast homologues. In addition, the E. cuniculi spliceosome is extremely reduced and lacks several proteins that are essential for viability in Arabidopsis thaliana, Saccharomyces cerevisiae and humans. These missing proteins act at all steps of splicing (Collins and Penny, 2005). Because E. cuniculi contains such a reduced set of splicing machinery, it affords an opportunity to examine the mechanism of splicing at its functional core. To ascertain how splicing operates in E. cuniculi, we examined three E. cuniculi proteins whose homologues are essential for viability in A. thaliana, S. cerevisiae and humans, and tested their ability to functionally complement S. cerevisiae knockouts (see Table 4.1 for a list).

4.1.3 Dib1

Dib1 is a 17 kDa protein that acts in the U4/U6•U5 snRNP, functions in the G2/M transition and in the nuclear export of a variety of mRNAs, and plays a role in regulating the anaphase promoting complex (Berry and Gould, 1997; Carnahan et al., 2005; Zhang et al., 1999). Of these functions, Dib1’s role in the G2/M transition of the cell cycle has been the most thoroughly characterized. The protein is evolutionarily conserved in eukaryotes and belongs to a family of proteins containing a thioredoxin fold. However, Dib1 does not function in the same manner as other proteins of this group, as it lacks the catalytic site.
Although the molecular action of Dib1 has yet to be elucidated, 3D modeling and mutation assays have been used to discern which amino acids are necessary for its function (Reuter et al., 1999). Reuter et al. suggest that there are 20 such residues, which fall into three categories: conserved basic residues, residues in the hydrophobic cleft and residues in the hydrophobic groove. Eight of these twenty broadly conserved amino acids have been replaced by non-conservative substitutions in E. cuniculi Dib1 (EcDib1) (see Table 4.2). Dib1 is a globular protein that contains 3 beta sheets and 5 alpha helices. The crystal structure of the human orthologue has been determined, and structural predictions performed on EcDib1 with SWISS-MODEL (Arnold et al., 2006; Guex and Peitsch, 1997; Schwede et al., 2003) indicate that although it shares only 33% identity with the human protein and 31% identity with the yeast protein, it very likely possesses the same secondary structures. Although Berry and Gould (1997) showed that both mouse and Schizosaccharomyces pombe orthologues can complement S. cerevisiae deletion mutants, EcDib1 is highly divergent (see Figure 4.4).

4.1.4 Prp22

Prp22 is a 130 kDa DEAH-box RNA helicase that is not part of an snRNP, but participates in the second transesterification step and frees spliced mRNA from the spliceosome (Schneider and Schwer, 2001). In S. cerevisiae, Prp22 is 1145 amino acids long and, like other members of the DEAH family, contains seven conserved motifs that are essential for function. E. cuniculi’s Prp22 protein (EcPrp22) is 784 amino acids long and lacks residues homologous to the first 265 amino acids of the S. cerevisiae protein. Intriguingly, these residues do not appear to be necessary for function (Schneider and Schwer, 2001).
In most species, the Prp22 homologue contains an S1 motif in this region. Although the exact function of the motif in Prp22 remains to be ascertained, it binds RNA in other proteins (Meka et al., 2005). EcPrp22 contains the seven motifs shared by other DEAH helicases, but with substitutions in every motif except motif VI, which may play a role in ATP hydrolysis and nucleotide binding (Schneider et al., 2002) (see Table 4.3). The crystal structure of this protein has not been solved; therefore, structural prediction and comparison could not be performed.

4.1.5 Brr2

The 246 kDa Brr2 protein interacts with many of the snRNPs, as it dissociates the U4 and U6 snRNPs from each other so that they are able to base pair with the pre-mRNA. Brr2 is also involved in releasing the components of the spliceosome once the splicing reaction has occurred (Small et al., 2006). Like Prp22, Brr2 is an RNA helicase, but it belongs to the DE(I/V)H family. Brr2 possesses two helicase-like domains, the second of which is responsible for most of its protein-protein interactions (van Nues and Beggs, 2001). E. cuniculi’s Brr2 protein (EcBrr2) is drastically shorter than its yeast homologue (1058 vs. 2163 amino acids), mostly because it lacks the second helicase-like domain. This domain is conserved in multiple diverse lineages and may regulate the action of the protein itself. However, most of the conserved motifs in the first domain that are believed to be important for function (Raghunathan and Guthrie, 1998) are present in EcBrr2 (see Table 4.4). The crystal structure of this protein has not been solved; therefore, structural comparisons could not be performed.

Dib1, Prp22 and Brr2 were selected as candidates for complementation based on E.
cuniculi sequence similarity to yeast homologues (BLAST e-value < 10⁻¹⁰), presence of homologues in yeast, human and Arabidopsis (broad phylogenetic distribution), inviability of yeast knockouts, size (between 100 and 1500 amino acids) and number of protein interaction partners. See Table 4.1 for a comparison of E. cuniculi proteins to their yeast homologues.
4.2 Materials and Methods
The primers used to amplify spliceosomal genes from E. cuniculi genomic DNA were designed based on the GenBank sequences NP_584652 (Dib1), XP_955567 (Prp22) and NP_586011 (Brr2). The forward primer for EcDib1 contains a BamHI site and the reverse primer contains a ClaI site. The forward primers for EcBrr2 and EcPrp22 contain SmaI sites and the reverse primers contain EcoRI sites. Following PCR amplification with Platinum Pfx DNA polymerase (Invitrogen), 3’ A overhangs were added to the products using Taq DNA Polymerase (Invitrogen). The products were then ligated into the TOPO 2.1 cloning vector (Invitrogen). Cloning vectors containing E. cuniculi genes were digested with BamHI and ClaI to release the Dib1 fragment and with SmaI and EcoRI to release the Brr2 and Prp22 fragments. The fragments were then ligated into a similarly digested 2µ yeast expression vector containing the Ura3 gene for Dib1 (p426 (Mumberg et al., 1995)) or the His3 gene for Brr2 and Prp22 (p423 (Mumberg et al., 1995)). The resulting plasmids were sequenced to verify protein-coding regions. A positive control expression vector containing S. cerevisiae Dib1 (GenBank Q06819) was constructed in an identical manner. See Table 4.5 for a list of plasmids used in this study. EcDib1- and S. cerevisiae Dib1-containing plasmids were transformed into a heterozygous diploid Dib1 knockout yeast strain (Open Biosystems) as previously described (Adams et al., 1998) and plated onto selective media lacking uracil. The yeast was grown for two days at 30 °C.
Individual colonies were grown overnight at 30 °C in YPAD medium and then sporulated, and tetrads were dissected as described previously (Adams et al., 1998). EcBrr2-containing and empty (p423) plasmids were transformed into a haploid Brr2 knockout strain carrying S. cerevisiae Brr2 on a Ura3-containing plasmid (PRY118 (Noble and Guthrie, 1996; Raghunathan and Guthrie, 1998), see Table 4.6). Transformed yeast was plated on selective media lacking histidine and containing 5-fluoroorotic acid and grown at 30 °C for two days. The Prp22 transformations were conducted in the same manner, but using a Prp22 knockout strain (prp22Δ (Schneider et al., 2004); see Table 4.6 for a list of strains used in this study). Dib1 protein secondary structure predictions were performed using SWISS-MODEL Workspace software (Arnold et al., 2006; Guex and Peitsch, 1997; Schwede et al., 2003). Predictions could not be performed for Brr2 or Prp22, as the crystal structures of these proteins have not been solved. Protein sequences of diverse eukaryotes were retrieved from GenBank and aligned using CLUSTALW v.1.83 (Thompson et al., 1994). The alignment was manually edited in MacClade 4.06 (Maddison and Maddison, 1989), and pairwise distances were calculated using TREE-PUZZLE 5.2 (Schmidt et al., 2002) set to estimate exact parameters under the WAG substitution model (Goldman and Whelan, 2000) with eight gamma-distributed rate categories plus one invariable-site category. The number of included characters for each dataset was as follows: Dib1, 142; Prp22, 600; Brr2, 728.
4.3 Results and Discussion
All three E. cuniculi splicing proteins (Dib1, Brr2, Prp22) failed to rescue S. cerevisiae knockouts, although positive controls gave the expected results in all cases (See Figures 4.1, 4.2 and 4.3).
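The pairwise distances in Fig. 4.4 are maximum-likelihood estimates from TREE-PUZZLE under WAG with gamma-distributed rates and an invariable-site category. Reproducing that model does not fit in a short example, so the sketch below swaps in a much simpler, textbook model, the gamma-corrected Poisson distance d = α[(1 − p)^(−1/α) − 1], purely to illustrate why corrected distances between highly divergent sequences such as EcDib1 grow much faster than the raw proportion p of differing sites. The shape parameter α is an assumed input here (TREE-PUZZLE estimates it from the data), and the function names are illustrative only.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing residues, ignoring alignment columns that contain a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment (equal length)")
    columns = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in columns) / len(columns)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p), assuming equal rates at all sites."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def gamma_poisson_distance(p: float, alpha: float) -> float:
    """Gamma-corrected Poisson distance d = alpha * ((1 - p) ** (-1 / alpha) - 1).

    A smaller alpha means stronger rate variation among sites and a larger correction.
    As alpha grows very large, the result approaches poisson_distance(p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)
```

For example, with p = 0.5 and an assumed α = 0.5, `poisson_distance` gives about 0.69 and `gamma_poisson_distance` gives 1.5, three times the raw proportion of differences. Ratios between corrected distances, such as the Dib1 comparison in Section 4.3.1, therefore depend on the substitution and rate model used.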
4.3.1 Dib1
Of the three proteins, EcDib1 was judged to be the most likely candidate to complement yeast knockouts because of its similarity in size (a difference of only three amino acids) and its likely highly similar secondary structure based on SWISS-MODEL predictions (Arnold et al., 2006; Guex and Peitsch, 1997; Schwede et al., 2003). However, the molecular mechanism of Dib1 function remains largely unknown in eukaryotic cells. Although the protein belongs to the thioredoxin fold superfamily, it lacks the active site of other members of the family. Reuter et al. (Reuter et al., 1999) determined the crystal structure of the human Dib1 homologue and discussed several conserved residues that form a hydrophobic cleft and a hydrophobic groove, as well as six basic residues. Among the putative processes involving Dib1 are the G2/M transition (by modulating the quantity of anaphase-promoting complex or cyclosome (APC/C) present in the cell) and mRNA splicing (as a member of the U5 snRNP). Carnahan et al. (Carnahan et al., 2005) found that in S. pombe, Dib1 is directly involved in the splicing and nuclear export of transcripts encoding Lid1, a component of the APC/C. The two functions of the protein therefore appear to be related, but it is not known which domain(s) serve which function(s). In related work, Zhang et al. (Zhang et al., 1999) systematically mutated various residues of Dib1 in S. cerevisiae and tested the ability of the mutant proteins to bind known partners. As expected, mutation decreased binding, but the residues whose mutation decreased binding to the greatest extent were in regions that are not evolutionarily conserved, and thus presumably not vital to the conserved functions of the protein. This suggests that interaction partner binding is species-specific. Although Dib1 is remarkably conserved throughout eukaryotes, EcDib1 appears to be extraordinarily divergent. Fig. 4.4 depicts pairwise distances between EcDib1 and the Dib1 proteins of other eukaryotes, including the nucleomorph of Bigelowiella natans. The pairwise distances between EcDib1 and its homologues in other species are very large and exceed the distance between any other pair of species. The pairwise distance between E. cuniculi and Saccharomyces Dib1 is nearly nine times greater than the distance between Mus and Schizosaccharomyces Dib1, between which functional complementation has been reported (Berry and Gould, 1997). Notably, the isoelectric points of S. cerevisiae and E. cuniculi Dib1 are also very different (6.12 vs. 5.2; See Table 4.1). This difference may have contributed to the inability of EcDib1 to rescue the yeast knockout. Eight of the twenty conserved residues that are predicted to be functional (Reuter et al., 1999) have been replaced by non-conservative substitutions in EcDib1, compared to only three in S. cerevisiae (See Table 4.2). Despite these differences, SWISS-MODEL software predicts EcDib1 to contain the same secondary structure elements as human Dib1 (for which the crystal structure has been solved (Reuter et al., 1999)) and S. cerevisiae Dib1 (Arnold et al., 2006; Guex and Peitsch, 1997; Schwede et al., 2003). Nevertheless, EcDib1 did not complement the S. cerevisiae knockout mutant, and there are several possible explanations. As discussed in Zhang et al. (Zhang et al., 2000), the sites that are integral for protein-protein interactions are not well conserved between species. Although EcDib1 may possess the necessary amino acid residues to assume the correct secondary and tertiary structures, it likely lacks the specific residues necessary for interaction with other S. cerevisiae proteins.
4.3.2 Prp22
Prp22 is a DEAH-box RNA helicase that is essential for the second transesterification reaction of mRNA splicing as well as for the liberation of spliced mRNA from the spliceosome.
The protein contains seven motifs that confer its helicase activity, as well as an S1 RNA-binding motif near the N-terminus. Although the S1 motif is evolutionarily conserved, it is not required for the ATPase function of the protein and seems to play a role in repressing the protein’s activity (Schneider and Schwer, 2001). EcPrp22 is only 784 amino acids in length (361 residues shorter than the S. cerevisiae protein) but contains all of the RNA helicase motifs (See Table 4.3). The “missing” amino acids appear to be mostly truncated from the N-terminus, as EcPrp22 begins at approximately residue 275 of the S. cerevisiae protein and lacks large stretches of amino acids before residue 481. Although S. cerevisiae Prp22 retains full activity when residues 1-260 (where the S1 motif is located) are absent, the residues between the S1 motif and helicase motif I, which begins at residue 506, appear to be essential, although their function is not known. The absence of many of the residues corresponding to positions 275-481 of the S. cerevisiae protein from EcPrp22 may therefore be contributing to its inability to complement the S. cerevisiae knockout. Schneider and Schwer (Schneider and Schwer, 2001) concluded that although the N-terminal amino acids of the Prp22 protein are not necessary for helicase function, the C-terminal residues, including those after the last helicase motif (VI), are essential. EcPrp22 contains a complete C-terminus. The pairwise distances between EcPrp22 and the Prp22 proteins of other species are more modest than those for Dib1 (See Fig. 4.4), and the isoelectric points of yeast Prp22 and EcPrp22 are very similar (See Table 4.1). However, EcPrp22 differs from the consensus at conserved positions in nearly all of the helicase motifs (See Table 4.3). Combined with the shortened N-terminus, these differences may make EcPrp22 sufficiently divergent to be incapable of complementing the yeast mutant.
4.3.3 Brr2
Brr2 was the riskiest candidate in the complementation project, as the E.
cuniculi protein lacks one of the helicase domains. This domain is responsible for protein-protein interactions in yeast and may have been eliminated as the E. cuniculi proteome shrank. In addition, the isoelectric points of the yeast and E. cuniculi proteins are modestly different (See Table 4.1). However, nearly all of the functional residues of the first helicase domain are present in EcBrr2 (See Table 4.4), and its pairwise distances to the proteins of other species are modest compared to those of Dib1. In reciprocal BLAST searches, the E. cuniculi protein used for complementation retrieved two S. cerevisiae proteins as top hits, Brr2 and Slh1, both with e-values near 10⁻¹⁰⁰. Slh1 is also an RNA helicase belonging to the DEXH group, and it prevents non-polyadenylated RNAs from being translated. Slh1 is related to Ski2, which acts in concert with Slh1 to arrest the multiplication of viruses and which, like the E. cuniculi protein, contains only one helicase domain. BLAST searches do not recover Ski2 with an e-value comparable to those obtained for Brr2 and Slh1, but its size (1287 amino acids) is much more similar to that of the E. cuniculi protein (1058 amino acids) than is that of either Brr2 (2163 amino acids) or Slh1 (1967 amino acids). The E. cuniculi protein contains nearly all of the conserved residues present in DEXH helicases (See Table 4.4), suggesting that it functions as a helicase, but the cellular process in which it participates is not easily discernible. It is possible that the E. cuniculi protein is a Brr2 homologue, but that the absence of the second helicase domain, which participates in protein-protein interactions, renders it incapable of functioning in yeast. Perhaps the smaller number of proteins in E. cuniculi, and therefore its smaller protein interaction networks, rendered the second domain unnecessary in the microsporidian. Another possibility is that the E.
cuniculi protein acts in the viral repression pathway and is therefore not an orthologue of the Brr2 protein that was absent in the yeast knockouts.
4.3.4 The E. cuniculi Spliceosome
The presence of spliced transcripts in E. cuniculi meronts indicates that the spliceosome is active during a portion of the life cycle. Given the conservation of the splicing process from yeast to humans, it is reasonable to assume that the reaction is the same in E. cuniculi, although it may operate with a reduced number of proteins. It is possible to predict the components of the E. cuniculi spliceosome and its associated factors using bioinformatics methods, in an approach similar to that taken by Collins and Penny (Collins and Penny, 2005). However, it is impossible to know the identities of all of the proteins that participate in splicing without performing large-scale purifications of splicing extract and examining the components individually. As it is extremely difficult to separate meronts from host cells, the exact composition of the E. cuniculi spliceosome may never be known. Yet, by studying the individual components, we can hope to gain a glimpse of how this reduced system operates. All three E. cuniculi proteins failed to complement yeast knockouts, suggesting that the E. cuniculi proteins interact with their partners using different residues. These residues may have diverged in E. cuniculi because of its accelerated rate of evolution (Thomarat et al., 2004). However, as has been shown for Dib1, the regions of a protein that interact with partners are not necessarily the well-conserved ones (Zhang et al., 2000). The general trend towards shortening of E. cuniculi genes compared to yeast homologues is borne out in two of the three proteins tested for complementation: the smaller EcPrp22 and EcBrr2 proteins could lack residues homologous to those used for protein-protein interactions and regulation in yeast. As the number of proteins encoded by E.
cuniculi is predicted to be about one-third of that encoded by S. cerevisiae, the range of interaction partners for each protein is also expected to be smaller. E. cuniculi may lack homologues of one or more of the proteins with which Saccharomyces Dib1, Prp22 or Brr2 interacts, in which case the E. cuniculi proteins would not possess the residues necessary to interact with these partners. Unfortunately, these negative results do not reveal why the E. cuniculi proteins failed to complement the yeast mutants. However, we do know that the E. cuniculi spliceosome is functional, and therefore its proteins must work in concert to excise introns from pre-mRNAs. Although the E. cuniculi proteins are shorter than their yeast homologues and highly divergent in sequence, they have evolved together, thus ensuring their ability to function as a unit.
Protein | GenBank Accession | Species | Size (Amino Acids) | Molecular Weight (Daltons) | Isoelectric Point
Dib1 | NP_015407 | S. cerevisiae | 143 | 16,776 | 6.12
Dib1 | NP_584652 | E. cuniculi | 140 | 15,647 | 5.2
Prp22 | NP_010929 | S. cerevisiae | 1145 | 130,010 | 7.79
Prp22 | XP_955567 | E. cuniculi | 784 | 88,243 | 7.75
Brr2 | NP_011099 | S. cerevisiae | 2163 | 246,183 | 5.28
Brr2 | NP_586011 | E. cuniculi | 1058 | 120,911 | 6.44
Slh1 | NP_011787 | S. cerevisiae | 1967 | 224,828 | 6.52
Ski2 | NP_013502 | S. cerevisiae | 1287 | 146,057 | 6.72
Table 4.1: Proteins Examined in this Study. Accession numbers, sizes, molecular weights and isoelectric points are provided for both S. cerevisiae and E. cuniculi proteins.
Region | Consensus Residue | S. cerevisiae Residue | E. cuniculi Residue
Groove | W13 | W | D
Groove | V15 | V | V
Groove | I19 | I | V
Groove | L20 | V | G
Groove | F31 | F | F
Groove | F70 | F | L
Groove | F85 | F | F
Cleft | M73 | M | P
Cleft | M83 | M | M
Cleft | M92 | M | K
Cleft | I93 | C | I
Cleft | L95 | F | C
Cleft | I103 | L | I
Cleft | W105 | F | F
Basic | R87 | H | N
Basic | K89 | K | R
Basic | R122 | R | K
Basic | R125 | R | V
Basic | K126 | K | K
Basic | R128 | K | K
Table 4.2: S. cerevisiae and E. cuniculi Dib1 Residues in Conserved Positions. The conserved positions form the hydrophobic groove, the hydrophobic cleft and six basic residues. Residue numbers are based on the S. cerevisiae protein.
Residues that match the consensus are colored green, those that are conservative substitutions for the consensus (defined as a score ≥ 1 in the BLOSUM62 matrix) are colored yellow, and non-conservative substitutions are colored pink.
Motif # | Conserved Sequence | S. cerevisiae Sequence | E. cuniculi Sequence
I | GETGSGKT | GETGSGKT | GETGSGKS
Ia | PRRVAA | PRRVAA | PRRAAA
II | MIDEAHERT | MLDEAHERT | ILDEAHERT
III | TSATMN | TSATLN | MSATIE
IV | FLTG | FLTG | FVTG
V | TSLTIDGIRYVI | TSITIDGIYYVV | TSLTIPNIGYVI
VI | QR-GRAGR | QRKGRAGR | QRTGRAGR
Table 4.3: S. cerevisiae and E. cuniculi Prp22 Sequences vs. Seven Conserved Motifs in DEAH Helicases. Amino acid residues that are not conserved within the consensus sequence are indicated with a “-”. Residues differing from the consensus sequence are indicated in bold italic text. (Modified from Noble and Guthrie, 1996)
Conserved Sequence | S. cerevisiae Sequence, Domain 1 | S. cerevisiae Sequence, Domain 2 | E. cuniculi Sequence
A-TG-GKT | APTGSGKT | SGKGTGKT | APTGSGKT
P-KAL | PLKAL | PMRLW | PRMAL
GD | GD | GN | GD
T | T | T | T
DE-H | DEIH | DDAH | DEIH
SAT | SAT | SNC | SAT
G-N | GVN | AFA | GVN
QM-GRAGR | QMLGRAGR | EMVGLASG | QIFGRAGR
Table 4.4: S. cerevisiae and E. cuniculi Brr2 Sequences vs. Conserved Sequence Motifs in DEXH Helicases. Amino acid residues that are not conserved within the consensus are indicated with a “-”. Residues differing from the consensus sequence are indicated in bold italic text. (Modified from Raghunathan and Guthrie, 1998)
Plasmid Name | Selectable Marker | Inserted Gene | Parent Plasmid
p426 | Ura | - | -
p423 | His | - | -
URAEcDib1 | Ura | EcDib1 | p426
URAScDib1 | Ura | ScDib1 | p426
HISPrp22 | His | EcPrp22 | p423
HISBrr2 | His | EcBrr2 | p423
Table 4.5: Plasmids Used in this Study.
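The colour-coding rule in the Table 4.2 caption (a substitution is conservative if its BLOSUM62 score is ≥ 1) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: it hardcodes only the BLOSUM62 entries needed for the residue pairs that occur in Table 4.2 rather than loading the full matrix (a library such as Biopython's `Bio.Align.substitution_matrices` module can supply it), it treats the first letter of each position label as the consensus residue, and the names `BLOSUM62_EXCERPT`, `TABLE_4_2`, `classify` and `tally` are illustrative.

```python
from collections import Counter

# Excerpt of BLOSUM62: only the residue pairs that occur in Table 4.2.
# The matrix is symmetric, so each pair is stored once as an unordered frozenset.
BLOSUM62_EXCERPT = {
    frozenset("WD"): -4, frozenset("IV"): 3, frozenset("LG"): -4,
    frozenset("FL"): 0, frozenset("MP"): -2, frozenset("MK"): -1,
    frozenset("LC"): -1, frozenset("WF"): 1, frozenset("RN"): 0,
    frozenset("KR"): 2, frozenset("RV"): -3, frozenset("LV"): 1,
    frozenset("IC"): -1, frozenset("IL"): 2, frozenset("RH"): 0,
}

# Table 4.2 rows: (position, S. cerevisiae residue, E. cuniculi residue).
# The first letter of each position label is the consensus residue.
TABLE_4_2 = [
    ("W13", "W", "D"), ("V15", "V", "V"), ("I19", "I", "V"), ("L20", "V", "G"),
    ("F31", "F", "F"), ("F70", "F", "L"), ("F85", "F", "F"), ("M73", "M", "P"),
    ("M83", "M", "M"), ("M92", "M", "K"), ("I93", "C", "I"), ("L95", "F", "C"),
    ("I103", "L", "I"), ("W105", "F", "F"), ("R87", "H", "N"), ("K89", "K", "R"),
    ("R122", "R", "K"), ("R125", "R", "V"), ("K126", "K", "K"), ("R128", "K", "K"),
]


def classify(consensus: str, observed: str) -> str:
    """Return 'match', 'conservative' (BLOSUM62 score >= 1) or 'non-conservative'."""
    if observed == consensus:
        return "match"
    score = BLOSUM62_EXCERPT[frozenset((consensus, observed))]
    return "conservative" if score >= 1 else "non-conservative"


def tally(column: int) -> Counter:
    """Count classes for one species: column 1 = S. cerevisiae, column 2 = E. cuniculi."""
    return Counter(classify(row[0][0], row[column]) for row in TABLE_4_2)
```

Applying the rule to every row reproduces the counts given in Section 4.3.1: `tally(2)["non-conservative"]` is 8 for EcDib1 and `tally(1)["non-conservative"]` is 3 for S. cerevisiae.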
Protein | Strain Name | Genotype | Source
Dib1 | Dib1Δ | MATa dib1::kanR his3Δ1 leu2Δ0 met15Δ0 ura3Δ0 / MATα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0 | Open Biosystems
Prp22 | YGS2 | MATa ura3 his3 ade2 tyr1 | Guthrie Lab
Prp22 | Prp22Δ | MATa prp22::leu2 ura3 trp1 his3 leu2 ade2 (Prp22 Ura3 Cen) | Schwer Lab (Schneider et al., 2004)
Brr2 | PRY120 | MATα ade2 lys2 his3 ura3 leu2 (Brr2 Ura3 Cen) | Guthrie Lab (Noble and Guthrie, 1996; Raghunathan and Guthrie, 1998)
Brr2 | PRY118 | MATa brr2::LEU2 ade2 lys2 his3 ura3 leu2 (Brr2 Ura3 Cen) | Guthrie Lab (Noble and Guthrie, 1996; Raghunathan and Guthrie, 1998)
Table 4.6: S. cerevisiae Strains Used in this Study. Plasmids are in parentheses.
Figure 4.1: Saccharomyces Dib1 Tetrad Dissections. Panel A depicts dissected tetrads of the mutant strain transformed with empty expression vector, panel B depicts dissected tetrads of the mutant strain transformed with Saccharomyces Dib1 in an expression vector, and panel C depicts tetrads dissected from the mutant strain transformed with Encephalitozoon Dib1 in an expression vector.
Figure 4.2: Saccharomyces Prp22 Transformations. Upper quadrants contain the Saccharomyces Prp22 mutant strain (Prp22Δ) transformed with empty vector and with expression vector containing Encephalitozoon Prp22. Lower quadrants contain wild-type Saccharomyces (YGS2) transformed with expression vector containing Encephalitozoon Prp22 and with empty vector.
Figure 4.3: Saccharomyces Brr2 Transformations. Upper quadrants contain the Saccharomyces wild-type strain (PRY120) transformed with empty vector and with expression vector containing Encephalitozoon Brr2. Lower quadrants contain the Saccharomyces Brr2 mutant strain (PRY118) transformed with expression vector containing Encephalitozoon Brr2 and with empty vector.
Figure 4.4: Pairwise Distances Between Various Species for Dib1, Prp22 and Brr2. See Materials and Methods for a description of how these were calculated.
4.4 References
Adams, A., Gottschling, D.E., Kaiser, C.A. and Stearns, T. 1998. Methods in yeast genetics: a Cold Spring Harbor laboratory course manual. Cold Spring Harbor Laboratory Press, Plainview, N.Y.
Arnold, K., Bordoli, L., Kopp, J. and Schwede, T. 2006. The SWISS-MODEL Workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22: 195-201.
Berry, L.D. and Gould, K.L. 1997. Fission yeast dim1+ encodes a functionally conserved polypeptide essential for mitosis. J Cell Biol 137: 1337-1354.
Brosson, D., Kuhn, L., Delbac, F., Garin, J., Vivares, C.P. and Texier, C. 2006. Proteomic analysis of the eukaryotic parasite Encephalitozoon cuniculi (microsporidia): a reference map for proteins expressed in late sporogonial stages. Proteomics 6: 3625-3635.
Carnahan, R.H., Feoktistova, A., Ren, L., Niessen, S., Yates, J.R. III and Gould, K.L. 2005. Dim1p is required for efficient splicing and export of mRNA encoding Lid1p, a component of the fission yeast anaphase-promoting complex. Eukaryot Cell 4: 577-587.
Collins, C.A. and Guthrie, C. 2000. The question remains: is the spliceosome a ribozyme? Nat Struct Biol 7: 850-854.
Collins, L. and Penny, D. 2005. Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol 22: 1053-1066.
Goldman, N. and Whelan, S. 2000. Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol Biol Evol 17: 975-978.
Guex, N. and Peitsch, M.C. 1997. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis 18: 2714-2723.
Katinka, M.D., Duprat, S., Cornillot, E., Metenier, G., Thomarat, F., Prensier, G., Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P. et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414: 450-453.
Kaufer, N.F. and Potashkin, J. 2000. Analysis of the splicing machinery in fission yeast: a comparison with budding yeast and mammals. Nucleic Acids Res 28: 3003-3010.
Maddison, W.P. and Maddison, D.R. 1989. Interactive analysis of phylogeny and character evolution using the computer program MacClade. Folia Primatol 53: 190-202.
Meka, H., Werner, F., Cordell, S.C., Onesti, S. and Brick, P. 2005. Crystal structure and RNA binding of the Rpb4/Rpb7 subunits of human RNA polymerase II. Nucleic Acids Res 33: 6435-6444.
Mumberg, D., Muller, R. and Funk, M. 1995. Yeast vectors for the controlled expression of heterologous proteins in different genetic backgrounds. Gene 156: 119-122.
Noble, S.M. and Guthrie, C. 1996. Identification of novel genes required for yeast pre-mRNA splicing by means of cold-sensitive mutations. Genetics 143: 67-80.
Raghunathan, P.L. and Guthrie, C. 1998. RNA unwinding in U4/U6 snRNPs requires ATP hydrolysis and the DEIH-box splicing factor Brr2. Curr Biol 8: 847-855.
Reuter, K., Nottrott, S., Fabrizio, P., Luhrmann, R. and Ficner, R. 1999. Identification, characterization and crystal structure analysis of the human spliceosomal U5 snRNP-specific 15 kD protein. J Mol Biol 294: 515-525.
Schellenberg, M.J., Ritchie, D.B. and MacMillan, A.M. 2008. Pre-mRNA splicing: a complex picture in higher definition. Trends Biochem Sci 33: 243-246.
Schmidt, H.A., Strimmer, K., Vingron, M. and von Haeseler, A. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18: 502-504.
Schneider, S., Campodonico, E. and Schwer, B. 2004. Motifs IV and V in the DEAH box splicing factor Prp22 are important for RNA unwinding, and helicase-defective Prp22 mutants are suppressed by Prp8. J Biol Chem 279: 8617-8626.
Schneider, S., Hotz, H.R. and Schwer, B. 2002. Characterization of dominant-negative mutants of the DEAH-box splicing factors Prp22 and Prp16. J Biol Chem 277: 15452-15458.
Schneider, S. and Schwer, B. 2001. Functional domains of the yeast splicing factor Prp22. J Biol Chem 276: 21184-21191.
Schwede, T., Kopp, J., Guex, N. and Peitsch, M.C. 2003. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res 31: 3381-3385.
Small, E.C., Leggett, S.R., Winans, A.A. and Staley, J.P. 2006. The EF-G-like GTPase Snu114p regulates spliceosome dynamics mediated by Brr2p, a DExD/H box ATPase. Mol Cell 23: 389-399.
Thomarat, F., Vivares, C.P. and Gouy, M. 2004. Phylogenetic analysis of the complete genome sequence of Encephalitozoon cuniculi supports the fungal origin of microsporidia and reveals a high frequency of fast-evolving genes. J Mol Evol 59: 780-791.
Thompson, J.D., Higgins, D.G. and Gibson, T.J. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
van Nues, R.W. and Beggs, J.D. 2001. Functional contacts with a range of splicing proteins suggest a central role for Brr2p in the dynamic control of the order of events in spliceosomes of Saccharomyces cerevisiae. Genetics 157: 1451-1467.
Will, C.L. and Luhrmann, R. 1997. Protein functions in pre-mRNA splicing. Curr Opin Cell Biol 9: 320-328.
Wittner, M. and Weiss, L.M. 1999. The microsporidia and microsporidiosis. ASM Press, Washington, D.C.
Zhang, Y., Lindblom, T., Chang, A., Sudol, M., Sluder, A.E. and Golemis, E.A. 2000. Evidence that Dim1 associates with proteins involved in pre-mRNA splicing, and delineation of residues essential for Dim1 interactions with hnRNP F and Npw38/PQBP-1. Gene 257: 33-43.
Zhang, Y.Z., Gould, K.L., Dunbrack, R.L., Jr., Cheng, H., Roder, H. and Golemis, E.A. 1999. The evolutionarily conserved Dim1 protein defines a novel branch of the thioredoxin fold superfamily. Physiol Genomics 1: 109-118.
Chapter 5 – Life Stage Differences in Splicing and Transcription⁴
5.1 Introduction
Eukaryotic protein-coding genes are frequently interrupted by non-coding elements called spliceosomal introns.
Introns are removed from pre-mRNAs by a protein/RNA complex called the spliceosome. In S. cerevisiae, intron sizes range from 52 to 1002 bp (Spingola et al., 1999), while introns in humans range from 30 to 497,816 bp (Sakharkar et al., 2004). Although introns are exceedingly common, their origins, potential functions and evolutionary significance have long been debated (Rogers, 1990; Collins and Penny, 2005; Koonin, 2006; Rogozin et al., 2005). mRNA transcription and processing are highly integrated processes in eukaryotic cells. RNA polymerase II coordinates the activities of multiple proteins to achieve 5’ capping, polyadenylation and splicing of nascent mRNA molecules (Bentley, 2005; Hagiwara and Nojima, 2007). Microsporidia are unicellular eukaryotes that are closely related to fungi and are obligate intracellular parasites of a wide range of animals. Typical microsporidian life cycles begin with the spore (See Fig. S1a), which is the infective stage of the organism. When a spore germinates (Fig. S1b), the polar tube everts, piercing the spore wall. This tube is capable of perforating the membrane of a host cell and is then used to inject the spore contents into the host cell’s cytoplasm (Fig. S1c). At this point, the organism enters the meront, or proliferative, stage (Fig. S1d). The meront divides, and a spore wall is then deposited around each nucleus (Fig. S1e). Eventually, the host cell lyses, releasing the spores, which may infect adjacent cells or be excreted and come into contact with a new host. There are two distantly related microsporidia (See Fig. S2) for which both genomic and transcriptional data are available for comparison: Antonospora locustae and Encephalitozoon cuniculi. A. locustae is a grasshopper parasite whose genome has been partially sequenced (Slamovits et al., 2004). E. cuniculi is a mammalian parasite whose genome has been completely sequenced and, at 2.9 Mbp, is extremely small for a eukaryote.
⁴ A version of this chapter will be submitted for publication. Gill, E.E.; Lee, R.C.; Corradi, N.; Grisdale, C.; Limpright, V.O.; Keeling, P.J.; Fast, N.M. Splicing and transcription in microsporidia: the spore-meront dichotomy.
The genome possesses the following features: a mere 2000 or so protein-coding genes, proteins that are small compared to their homologues in other eukaryotes, and a reduced amount of non-coding DNA (Katinka et al., 2001). Although the genome of A. locustae is roughly twice the size of E. cuniculi’s, regions examined by Slamovits et al. (Slamovits et al., 2004) in a genome sequence survey (GSS) indicate gene densities similar to that seen in E. cuniculi. Surprisingly, spores of both E. cuniculi and A. locustae have been shown to possess transcripts containing more than one gene (Corradi et al., 2008b; Williams et al., 2005). Multi-gene transcripts are rare among eukaryotes, and these transcripts are also distinctly different from those produced by prokaryotic operons. In microsporidian multi-gene transcripts, genes are not all full length or always encoded on the same DNA strand, making it almost impossible to imagine that multiple functional proteins could be produced from such transcripts. Instead, it has been hypothesized that these multi-gene transcripts may be a byproduct of genome compaction and result from extremely small intergenic spaces; indeed, transcripts have been shown to be more likely to contain fragments of adjacent genes when intergenic spaces are tiny (Corradi et al., 2008b). It is not currently known whether this curious pattern of transcription occurs in E. cuniculi meronts as well as spores. Furthermore, examining a larger sample of genes will help determine whether multi-gene transcripts in spores exist because promoter elements are poorly defined or simply because promoters are located in upstream genes. Only 16 introns in 15 genes were predicted in E.
cuniculi’s genome (Katinka et al., 2001; Vivares et al., 2002). These predicted introns are extremely short (17 to 52 bp) and are almost exclusively found in ribosomal protein-coding genes (RPGs). As A. locustae’s genome has not been completely sequenced, its complete intron complement is not known. However, this organism is predicted to contain at least three introns in RPGs, and the positions of these introns are conserved with those in E. cuniculi (Limpright et al., unpublished data). Although only a small fraction of E. cuniculi genes contain introns, E. cuniculi possesses many spliceosomal proteins and at least two snRNAs (Katinka et al., 2001). Spores are also known to contain proteins encoded by intron-containing genes (Brosson et al., 2006). These proteins could not be translated without intron removal, as the retained introns would introduce frameshifts. Therefore, it is reasonable to assume that mRNA splicing occurs in E. cuniculi. An examination of splicing in microsporidia may reveal why a small number of introns have been retained in such a compact, reduced genome. Studying splicing in such a reduced system may also allow us to glimpse the functional core of the spliceosome, so that we can better understand how this behemoth of eukaryotic biochemical machinery functions. Our study is the first of its kind to systematically compare two integral eukaryotic processes between life stages in microsporidia, and it reveals stark differences in the regulation of both transcription and splicing between the spore and the meront.
5.2 Materials and Methods
E. cuniculi spores and meronts were obtained as a generous gift from the Didier lab in the Faculty of Tropical Medicine at Tulane University (New Orleans, LA). Meronts were harvested from RK-13 (rabbit kidney) cell cultures 48 hours after infection and stored in RNAlater (Ambion). A. locustae spores were obtained from M&R Durango Inc. (http://www.goodbug.com/). To obtain A.
locustae meronts, Semaspore Bait was obtained from Seeds of Change (http://www.seedsofchange.com) and fed to locust (Schistocerca gregaria) nymphs during the 2nd instar stage. The locusts were allowed to mature to adulthood and then sacrificed. Their fat bodies were removed and stored in RNAlater (Ambion). All E. cuniculi and A. locustae spores and meronts were disrupted by grinding in liquid nitrogen. RNA was extracted using Ambion’s RNAqueous kit. To examine transcription, 5’ nested RACE was then carried out using Ambion’s FirstChoice RLM-RACE kit. An annealing temperature of 55 °C was used for all reactions. Minus-TAP controls were employed to ensure that there was no contaminating DNA. The 5’ nested primer had a 5-carboxyfluorescein attached to the 5’ end. A list of the gene-specific primers used is provided in Table S2. All 5’ nested RACE products were analyzed by capillary electrophoresis (CE) to determine their sizes (Corradi et al., 2008a). Transcripts that were long enough to overlap with upstream genes were cloned and sequenced for verification.
To examine splicing, 5’ nested RACE was performed using Ambion’s FirstChoice RLM-RACE kit. An annealing temperature of 50 °C was used for all reactions. A list of primers used is provided in Table S3. Products were resolved on agarose gels, and the resulting bands were excised, gel-purified and cloned using Invitrogen’s TOPO TA kit. At least ten clones of each transcript were sequenced, with the exception of L5, L37, L39, S24 and S29 from E. cuniculi spores, for which RACE products could not be obtained. Only one sequence was obtained for S30 from E. cuniculi spores. All sequences were examined using Sequencher 4.8.1 software (Gene Codes Corp.), and TATA/TTTR motifs were identified using the “find” function.
5.3 Results
5.3.1 Transcription
Previous studies of E. cuniculi spores have demonstrated that a high proportion of loci produce multi-gene transcripts (Corradi et al., 2008b).
We aimed to determine whether this pattern exists in both life stages of E. cuniculi, and whether overlapping transcription is due to the relocation of promoters into upstream genes or to a general “dysregulation” of transcription. To this end, 5’RACE products were obtained from both spores and meronts for 31 genes, and their sizes were determined by CE (see Fig. 1 and Table S1). In both spores and meronts, transcription began at distinct locations and produced products of discrete sizes; no evidence of “sloppiness” was seen. The average number of transcription start sites per gene was 2.48 in spores and 1.68 in meronts. Transcript UTRs had varying lengths in both spores and meronts (see Fig. 1). Transcripts from spores generally had longer 5’UTRs than transcripts from meronts and had a greater tendency to overlap with upstream genes. The shortest RACE product was the same size in spores and meronts for 48% of genes, smaller in meronts for 42% and smaller in spores for 10%, while the longest RACE product was longer in spores for 65% of genes, longer in meronts for 16% and the same size for 19%. The longest 5’RACE products were also compared for each gene to assess overlap with upstream genes. 5’RACE products overlapped with upstream genes exclusively in spores for 39% of genes, in both spores and meronts for 13%, exclusively in meronts for 6%, and not at all for 42%. In 18 of the 31 genes examined, one transcription start site in spores was within 10 bp of a meront start site, and three genes had two shared transcription start sites between spores and meronts (see Table S1). A TATA or TTTR motif (R = A or G) lies 50 bp or less upstream of 78% of the meront transcription start sites and 59% of the spore transcription start sites.
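The motif criterion above (a TATA or TTTR element no more than 50 bp upstream of a transcription start site, where R is A or G) amounts to a windowed string search. The Python sketch below illustrates it. It is not the Sequencher “find” procedure used in this study. The function names, the 0-based start-site coordinates, the sense-strand-only search and the counting of overlapping matches are all assumptions made for illustration.

```python
import re

# IUPAC code R = A or G, so "TTTR" matches TTTA and TTTG.
# The zero-width lookahead lets finditer report overlapping hits
# (e.g. TTTATA contains both TTTA and TATA).
MOTIF = re.compile(r"(?=(TATA|TTT[AG]))")


def upstream_motifs(seq, tss, window=50):
    """Return (motif, distance) pairs for every TATA/TTTR motif lying entirely
    within `window` nt upstream of a transcription start site.

    `seq` is the sense-strand genomic sequence and `tss` is the 0-based index of
    the first transcribed nucleotide. Distance runs from the motif's first base
    to the TSS.
    """
    seq = seq.upper()
    start = max(0, tss - window)
    region = seq[start:tss]  # only bases strictly upstream of the TSS
    hits = []
    for m in MOTIF.finditer(region):
        motif_start = start + m.start()  # convert back to seq coordinates
        hits.append((m.group(1), tss - motif_start))
    return hits


def fraction_with_motif(seq, tss_list, window=50):
    """Fraction of start sites that have at least one upstream TATA/TTTR motif."""
    with_motif = sum(1 for tss in tss_list if upstream_motifs(seq, tss, window))
    return with_motif / len(tss_list)


if __name__ == "__main__":
    # Hypothetical toy sequence: a TATA element 30 nt upstream of position 40.
    toy = "C" * 10 + "TATA" + "C" * 26 + "ATG" + "C" * 60
    print(upstream_motifs(toy, 40))              # [('TATA', 30)]
    print(upstream_motifs(toy, 100))             # []
    print(fraction_with_motif(toy, [40, 100]))   # 0.5
    print(upstream_motifs("GGTTTATAGG", 10))     # [('TTTA', 8), ('TATA', 6)]
```

Counting overlapping hits matters in A/T-rich upstream regions. A run such as TTTATA contains both a TTTR and a TATA element, and a plain non-overlapping search would report only the first of them.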
The genes whose transcripts overlap with upstream genes exclusively in spores belong to diverse functional categories, whereas transcripts overlapping exclusively in meronts encode only hypothetical proteins. It was therefore not possible to compare the regulation of transcription across functional categories.

5.3.2 Splicing

5’RACE products were obtained from E. cuniculi spore transcripts for the majority of intron-containing genes (see Methods). Each product was cloned, and at least ten clones of each product were sequenced. All transcripts examined retained their introns, suggesting that splicing does not occur in spores. Because the presence of multiple splicing factors, and of the protein products of spliced transcripts, in spores indicates that splicing does take place in E. cuniculi, we turned to the other life stage: the meront. Meronts were grown in host cells for 48 h post-infection before RNA extraction. Both spliced and unspliced transcripts were recovered for 14 of the 16 predicted introns. L7A and S29 were the only genes from which no spliced transcripts were found. 5’ nested RACE products from these transcripts were examined by capillary electrophoresis to further assess whether any spliced transcripts were present. Splicing appeared to be absent in both cases (see Fig. S3), which led us to question whether these transcripts contain bona fide introns (see Discussion). Of the 14 spliced introns, six were shorter than predicted, which decreases their mean size from nearly 36 bp (predicted) to 31.5 bp. These results are summarized in Table 1. No spliced transcripts were recovered from A. locustae spores, whereas a mixture of spliced and unspliced transcripts was recovered from the infected locust tissue sample. The A. locustae material harvested from locust tissue was likely a mixture of meronts and spores. Because A. locustae was obtained from living host organisms rather than from cell cultures, the infection could not be synchronized.
Therefore, different parasite cells would have been at different points in the life cycle, and it is impossible to tell whether the unspliced transcripts originated from spores or from meronts.

5.4 Discussion

Our results show that a dichotomy exists between microsporidian spores and meronts. Transcripts from spores are unspliced and have longer 5’UTRs that frequently overlap with upstream genes, whereas meront transcripts are often spliced and have shorter 5’UTRs that less often overlap with upstream genes. Spore transcripts also use more start sites than meront transcripts, and less often start close to a TATA or TTTR motif. Previous investigations of microsporidian transcription have indicated the prevalence of multigene transcripts, a highly unusual phenomenon. However, our results show that these transcripts are observed mainly in spores. The absence of splicing in spores adds to the oddity of this life stage.

5.4.1 Transcription in E. cuniculi

The average gene in both spores and meronts is transcribed from more than one start site. Recent studies of transcription in yeast (Miura et al., 2006; Zhang and Dietrich, 2005), mouse and human (Carninci et al., 2006) indicate that, as in E. cuniculi, the majority of genes are transcribed from two or more different start sites. In yeast, it has been reported that transcription begins at a weak consensus sequence and that regions upstream of protein-coding genes are unusually enriched with consensus sequence tracts (Zhang and Dietrich, 2005). Either TATA or TTTR sequences are present within 50 bp upstream of a large portion of the E. cuniculi transcription start sites in both the spore and the meront. In both yeast and humans, TATA elements are composed of 6 bp motifs, yet only 4 bp are conserved among the E. cuniculi sequences. The 4 bp TATA motifs are consistent with the predicted promoters of polar tube proteins proposed by Delbac et al. (2001), but represent only 1/4 of their length.
In contrast, the TTTR motifs do not seem to be present in the promoters of other eukaryotes. The distances between the E. cuniculi motifs and the transcription start sites vary, which is also the case in yeast (Hampsey, 1998). Overall, the significance of these motifs remains unknown.

Importantly, the capillary electrophoresis technique has allowed us to estimate the average number of transcription start sites per gene in E. cuniculi. Although transcripts frequently overlap with upstream genes (particularly in spores), the average number of transcription start sites is within the normal range for eukaryotes (Carninci et al., 2006; Miura et al., 2006). Capillary electrophoresis probably does not detect transcripts of lower abundance, so some transcription start sites have likely been missed. However, the presence of multiple definitive start sites indicates that transcription begins preferentially from distinct locations. These data suggest that E. cuniculi promoters, although they may act differently in the spore and meront life stages, could behave like typical eukaryotic promoters.

The proportion of genes whose spore transcripts overlap with upstream genes in this study (52%) is much lower than the 80% previously reported (Corradi et al., 2008b). However, the earlier figure includes loci that were analyzed using both 5’ and 3’RACE, whereas only 5’UTRs were analyzed in the present study, because the majority of multigene transcripts in E. cuniculi were found among 5’RACE products. The percentages are therefore not directly comparable. Nevertheless, this methodological difference does not fully explain a 28-percentage-point difference in multigene transcripts. The loci analyzed in this study differ from those examined previously, so innate variability among loci may account for part of the discrepancy.

5.4.2 Splicing in E. cuniculi and A. locustae

The conservation of life-stage-specific splicing between two such distantly related microsporidia (see Fig. S2) suggests that A. locustae and E. cuniculi have similarities in their spore biology. The E. cuniculi data thus garner support from congruent results in a distantly related microsporidian. It is becoming clear that although microsporidian genes evolve quickly, other characteristics, such as genome organization, are conserved among divergent species (Slamovits et al., 2004).

Interestingly, the predicted introns of L7A and S29, for which we found no evidence of splicing, differ from E. cuniculi’s verified introns in two ways. First, they would not cause frameshifts in the resulting protein if they were not removed, whereas all of the other introns would cause frameshifts or premature stop codons. Second, they lack features that are conserved among the verified introns (Lee et al., unpublished data; Irimia and Roy, 2008). It is therefore unlikely that these sequences represent bona fide introns.

Our results show that transcripts are spliced in E. cuniculi and A. locustae meronts, but not in spores, so splicing of these transcripts is evidently tightly correlated with the organism’s life cycle. S. cerevisiae also possesses a group of intron-containing genes whose transcripts are efficiently spliced exclusively during sporulation (Davis et al., 2000; Juneau et al., 2007). Removing introns from these transcripts only during sporulation ensures that the corresponding proteins are produced only in the sexual phase of the life cycle. It remains to be determined whether this is also the case in E. cuniculi.

5.4.3 Splicing and Transcription are Integrated Processes

In E. cuniculi, splicing and transcription appear to be uncoupled in the spore. This is at odds with the typical situation in eukaryotes, where transcription by RNA polymerase II (RNAP II) and RNA processing usually occur at the same time.
The C-terminal domain of RNAP II is able to interact with and coordinate multiple proteins that polyadenylate, 5’ cap and splice mRNA as it is being transcribed, and multiple lines of evidence have shown that transcription by RNAP II and splicing are very closely linked (see, for example, Bentley, 2005; Hagiwara and Nojima, 2007; Kornblihtt et al., 2004). Components of the U1 snRNP interact with RNAP II and also help regulate transcription and transcription initiation (Das et al., 2007; Kwek et al., 2002). In addition, the Saccharomyces cap-binding complex proteins that act in the 5’ capping process also aid in the recruitment of splicing factors to nascent mRNA (Gornemann et al., 2005). Although recent evidence suggests that most yeast introns are not removed until transcription is complete, spliceosome component recruitment commences cotranscriptionally (Moore et al., 2006; Tardiff et al., 2006). A myriad of parallels between animal and yeast transcription and mRNA processing systems seem to indicate that this system is at least partly conserved among opisthokonts, and there is no reason to assume that microsporidia would be any different. Transcription and splicing in the meront appear to operate in a manner similar to that observed in other eukaryotes, which is perhaps not surprising, given that this is the life stage with greater metabolic activity. However, these processes are uncoupled and differently regulated in the spore. Transcription in the spore begins less frequently downstream of TATA or TTTR motifs (59% of start sites vs 78% in meronts), is initiated more often inside upstream genes, and the transcripts of intron-containing genes remain unspliced. Perhaps the spore-meront differences are the result of a general “shutting down” of transcription machinery that occurs during the final stages of sporulation.
When the organism is in the process of becoming a spore, the constraints that dictate the choice of transcription initiation site and couple splicing to transcription could be relaxed. Low levels of both transcription and translation occur in yeast spores (Brengues et al., 2002). Brengues et al. did not investigate whether the introns of these transcripts were spliced, but found that the transcripts physically resemble those of the actively growing cell and are found in polysome complexes. There is currently no information on these processes in microsporidian spores, but the large differences in transcript structure between E. cuniculi spores and meronts suggest that the function of these transcripts may differ between the two stages. There is a provocative possibility that the mRNAs of both intron-containing and intronless genes remain untranslated in spores and serve a structural or functional role rather than an informational one. In yeast, splicing occurs in the nucleus, while the majority of unspliced transcripts are exported from the nucleus to the cytoplasm for degradation (Isken and Maquat, 2007). The location of the intron-containing transcripts could therefore give us an indication of their role. Part of this role could involve tethering ribosomes together to prevent their degradation, as the 60S and 40S subunits are broken down separately in yeast (Kraft et al., 2008). It is also possible that the transcripts serve no function and are simply byproducts of the cell entering the spore stage. Upon germination, spore mRNAs could be degraded and replaced by newly transcribed meront-specific transcripts to be translated. This scenario raises questions about the potential metabolic differences between the two life stages, as well as between microsporidia and better-characterized systems like yeast.
Although splicing and transcription are tightly interwoven in eukaryotes, they clearly operate very differently in the spore and meront life stages of microsporidia. This offers an exciting new opportunity to study two universal eukaryotic processes that operate in a unique way in microsporidia.

Gene | Spore | Meront | Predicted size (bp) | Actual size (bp)
E.L5 | ? | ✓ | 38 | 26
E.L7A | X | X | 29 | N/A
E.L19 | X | ✓ | 31 | 31
E.L27A | X | ✓ | 28 | 28
E.L37 | ? | ✓ | 31 | 31
E.L37A | X | ✓ | 52 | 49
E.L39 | ? | ✓ | 32 | 32
E.S8 | X | ✓ | 31 | 31
E.S17 | X | ✓ | 23 | 23
E.S24 | ? | ✓ | 44 | 29
E.S26 | X | ✓ | 42 | 33
E.S29 | ? | X | 33 | N/A
E.S30 | X | ✓ | 45 | 45
E.Sec61α | X | ✓ | 38 | 33
E.CDP-DAG Transferase | X/? | ✓/✓ | 43/25 | 25/25
A.L37 | X | ✓ | N/A | 23
A.L37A | X | ✓ | N/A | 26
A.L39 | X | ✓ | N/A | 20

Table 5.1: Intron splicing patterns and size distribution in E. cuniculi and A. locustae. The presence of spliced transcripts for each gene is indicated in the spore and the meront (✓, spliced transcripts detected; X, not detected; ?, not determined). E. cuniculi genes are prefixed with ‘E.’ and A. locustae genes with ‘A.’. The predicted size of each intron is listed with the actual size determined by 5’RACE. Actual sizes of introns which are smaller than predicted are indicated in italics. Predicted introns that were not spliced in either life stage are shaded in grey.

Figure 5.1: Transcript Lengths and Start Sites in E. cuniculi. Bar sizes are proportional to transcript lengths. Transcription start sites are indicated by ovals. Portions of bars shaded in grey indicate regions of transcripts that overlap with upstream genes.
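The intron-size figures in Section 5.3.2 can be recomputed from Table 5.1. This Python sketch assumes that the CDP-DAG transferase entry (“43/25” predicted, “25/25” actual) denotes two separate introns. That reading yields the 14 spliced introns, six of them shorter than predicted, reported in the text. The gene labels come from the table, and the variable names are illustrative.

```python
from statistics import mean

# Intron sizes (bp) transcribed from Table 5.1 for E. cuniculi.
# Each gene maps to a list of (predicted, actual) pairs. `actual` is None when
# no spliced transcript was recovered (L7A, S29). CDP-DAG transferase is read
# as carrying two introns (an interpretation of the "43/25" and "25/25" entries).
TABLE_5_1 = {
    "L5": [(38, 26)],
    "L7A": [(29, None)],
    "L19": [(31, 31)],
    "L27A": [(28, 28)],
    "L37": [(31, 31)],
    "L37A": [(52, 49)],
    "L39": [(32, 32)],
    "S8": [(31, 31)],
    "S17": [(23, 23)],
    "S24": [(44, 29)],
    "S26": [(42, 33)],
    "S29": [(33, None)],
    "S30": [(45, 45)],
    "Sec61a": [(38, 33)],
    "CDP-DAG transferase": [(43, 25), (25, 25)],
}

# Flatten to one (predicted, actual) pair per intron.
introns = [pair for pairs in TABLE_5_1.values() for pair in pairs]
spliced = [(pred, act) for pred, act in introns if act is not None]
shorter = [(pred, act) for pred, act in spliced if act < pred]

predicted_mean = mean(pred for pred, _ in spliced)
actual_mean = mean(act for _, act in spliced)

print(len(introns), len(spliced), len(shorter))   # 16 14 6
print(round(predicted_mean, 2), actual_mean)       # 35.93 31.5
```

The “nearly 36 bp” figure therefore refers to the predicted sizes of the 14 spliced introns. Averaging all 16 predicted introns, including L7A and S29, would instead give about 35.3 bp.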
Gene | Locus | Putative protein function | 5' intergenic region (bp) | Spore 5'UTRs (bp) | Meront 5'UTRs (bp)
1* | ECU03_0180 | Chromobox Protein | 93 | 30, 75 | 5, 16, 23
2 | ECU03_0160 | Hypothetical Protein (nucleotide-sugar transporter) | 76 | 27, 57 | 63, 126
3* | ECU03_0305 | Vacuolar ATP Synthase subunit F | 144 | 112, 125, 188, 234 | 3, 28, 97
4 | ECU03_0320 | 60S Ribosomal Protein L13 | 36 | 4 | 3
5 | ECU03_0520 | Heat Shock Related 70kDa Protein | 89 | 4 | 5
6* | ECU03_0530 | Hypothetical Protein | 82 | 17, 91, 159 | 3
7* | ECU11_1460 | Translation Elongation Factor 2 | 173 | 46, 58, 76, 106 | 9
8* | ECU11_1450 | Transport Protein SEC13 | 173 | 9, 51 | 3
9* | ECU11_1390 | Hypothetical Protein | 80 | 19, 34, 83, 95, 123, 134, 234 | 12
10* | ECU11_0660 | Ser/Thr Protein Phosphatase PPI-1 Catalytic SU | 123 | 55 | 3, 8, 15
11* | ECU11_0670 | Hypothetical Protein YG22 | 5 | 218, 280 | 21
12* | ECU02_1090 | ATP-Dependent DNA-binding Helicase | 105 | 47, 98, 123, 218, 291 | 9
13 | ECU07_0120 | Hypothetical Protein | 96 | 13 | 3, 30, 54
14* | ECU07_0200 | Hypothetical Protein | 47 | 7, 146 | 1
15* | ECU07_0210 | Hypothetical Protein | 27 | 68, 126 | 8
16 | ECU07_0260 | similarity with WD-repeat Proteins | 51 | 14, 36, 54 | 5, 44
17 | ECU07_0270 | Hypothetical Protein | 84 | 1, 46, 80, 175 | 2, 176
18* | ECU07_1260 | Guanosine Diphosphatase | 102 | 32, 111 | 13, 15, 74
19* | ECU07_1250 | Hypothetical Protein | 36 | 57, 67, 83 | 2
20* | ECU08_1220 | Hypothetical Protein | 77 | 62 | 3
21* | ECU08_1210 | Hypothetical Protein | 84 | 13, 135 | 11
22* | ECU08_1100 | Coatomer Complex Beta Subunit | Overlapping ORF | 6, 90 | 8
23* | ECU08_1110 | Guanine nucleotide binding Protein Beta SU | 62 | 8, 15, 60 | 9
24 | ECU05_1240 | Gamma Glutamyl Transpeptidase | Overlapping ORF | 2 | 33, 111, 305, 327
25* | ECU05_1250 | CDP-Diacylglycerol Synthase | 265 | 53, 79 | 9, 29, 42
26 | ECU04_1670 | Hypothetical Protein | 423 | 2, 15, 51, 102 | 11, 99
27 | ECU04_1660 | Hypothetical Protein | 316 | 4, 85 | 310
28 | ECU09_0180 | Hypothetical Protein | 105 | 35, 88 | 6, 30, 100, 108, 152
29 | ECU09_0170 | GTP-Binding Protein | 105 | 38 | 47
30* | ECU07_1620 | BOS1-Like Vesicular Transport Protein | 35 | 4, 10, 105 | 9
31* | ECU07_1630 | Putative Protein with MutT domain | 45 | 7, 84, 136 | 2

Supplementary Table 5.1: E. cuniculi 5’UTR lengths in spores and meronts. The locus of each gene is listed along with the length of the 5’ intergenic space and the 5’UTR lengths of all transcripts examined in spores and meronts. When two transcripts were found to begin within 10 bp of each other, they were counted as having a single start site. Genes highlighted in pink had transcripts that overlapped with upstream genes exclusively in the spore, while those in yellow overlap in the spore and the meront, and those in green overlap exclusively in the meront. Genes with an asterisk have the longest transcripts in the spore, while those in bold italics have the longest transcripts in the meront. UTRs highlighted in blue have transcripts in both spores and meronts that begin within 10 bp of each other.
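The overlap percentages reported in the Results can be recovered from Supplementary Table 5.1 if a transcript is counted as overlapping the upstream gene whenever its 5'UTR is longer than the 5' intergenic region. The text does not state this rule explicitly, so the Python sketch below is our reading rather than the authors' procedure. It applies the rule to the longest spore and meront 5'UTR of each locus and encodes “Overlapping ORF” as an intergenic length of 0, which is also an assumption.

```python
from collections import Counter

# locus -> (5' intergenic length, longest spore 5'UTR, longest meront 5'UTR), in bp,
# transcribed from Supplementary Table 5.1. "Overlapping ORF" loci are encoded
# with an intergenic length of 0 (an assumption for this sketch).
UTR_TABLE = {
    "ECU03_0180": (93, 75, 23),
    "ECU03_0160": (76, 57, 126),
    "ECU03_0305": (144, 234, 97),
    "ECU03_0320": (36, 4, 3),
    "ECU03_0520": (89, 4, 5),
    "ECU03_0530": (82, 159, 3),
    "ECU11_1460": (173, 106, 9),
    "ECU11_1450": (173, 51, 3),
    "ECU11_1390": (80, 234, 12),
    "ECU11_0660": (123, 55, 15),
    "ECU11_0670": (5, 280, 21),
    "ECU02_1090": (105, 291, 9),
    "ECU07_0120": (96, 13, 54),
    "ECU07_0200": (47, 146, 1),
    "ECU07_0210": (27, 126, 8),
    "ECU07_0260": (51, 54, 44),
    "ECU07_0270": (84, 175, 176),
    "ECU07_1260": (102, 111, 74),
    "ECU07_1250": (36, 83, 2),
    "ECU08_1220": (77, 62, 3),
    "ECU08_1210": (84, 135, 11),
    "ECU08_1100": (0, 90, 8),
    "ECU08_1110": (62, 60, 9),
    "ECU05_1240": (0, 2, 327),
    "ECU05_1250": (265, 79, 42),
    "ECU04_1670": (423, 102, 99),
    "ECU04_1660": (316, 85, 310),
    "ECU09_0180": (105, 88, 152),
    "ECU09_0170": (105, 38, 47),
    "ECU07_1620": (35, 105, 9),
    "ECU07_1630": (45, 136, 2),
}


def overlap_class(intergenic, spore_max, meront_max):
    """Classify a locus by which life stage yields a 5'UTR longer than the
    upstream intergenic region, i.e. a transcript reaching into the upstream gene."""
    spore = spore_max > intergenic
    meront = meront_max > intergenic
    if spore and meront:
        return "both"
    if spore:
        return "spore only"
    if meront:
        return "meront only"
    return "none"


counts = Counter(overlap_class(*row) for row in UTR_TABLE.values())
n = len(UTR_TABLE)
spore_any = counts["spore only"] + counts["both"]

for label in ("spore only", "both", "meront only", "none"):
    print(f"{label}: {counts[label]}/{n} ({round(100 * counts[label] / n)}%)")
print(f"any spore overlap: {spore_any}/{n} ({round(100 * spore_any / n)}%)")
```

Run as written, the sketch reports 12, 4, 2 and 13 of the 31 genes (39%, 13%, 6% and 42%), and 16 of 31 genes (52%) with overlapping spore transcripts. These values match the figures given in the Results and Discussion, which suggests the simple length rule captures how overlap was scored.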
Gene Outer Primer Inner Primer L5 CTC AGC TGC AGA AAG CTT CGC CT CTC TTA TAG ATT CGC GGA ATC TCC TC L7A CAG GTG AGT CTC GTA GTT ATC G ACA AGC TTC CCA AGA AGC GTC G L19 CAA GAT GAG ATC ACT CTG TTG GAT AG CTT GAT TAT GCA GTC CTT CAT GTG L27A TGT GAA GTA CCT GGC CTT GAC G CTC CCG GAC ATC AAT CAC AGG C L37 CTT AGC CCA GAG AGT CCT CAG TAT C TCA TTC TTC CAG TCC CAA TGG TCC G L37A GCG GTA AAA CCC GTT GCA AAA C GTG TCA GGA GTC TAG GCA TCC AGA C L39 GAT CTT CAA CTT CTT CGA CCT CCA G GTG TTC CTT CAT CAT CCT CTT CCA TGC S8 TCC GAA CTT CCT TGG ACT TCT TG TGT CCA GGT CTG GAG GTG ATG ATT GC S17 GCA TCG ACT CCT TCG GAA TTA C GAC TCC TTC GGA ATT ACG TTT TCC TTT S24 CAT AGC CCT CTT GAG AGT GCC GAA GA CAG TGA GTT TCG CCA CAA TGT GCT S26 TGA GCA CTG GCC ACC CTG TTT G CCT GGG AAG AAC GAA CCC TGA C S29 TGT GTA GGC GAT GCT TCA GGG AAT CTG CTA CAC GGT CCG TGG CAT AAT S30 TGT GGC CGG AAA GAT GAA GCT GAA AGA GAA GCG AGA TGG GAT ACT TCG Sec61α CTG CCA TAG AAC CCT GTG TAC ACC AGG TGA CCA CAG GGC TTG TTC C CDP-Inner CCA TAG TAC GCT CGT TTC TGC AG GGA ATC AGA CAC CTG GAT GCC AG CDP-Outer TCG TGG GCT CAT CCA ACT TCA ACA GAT TCA AGC ACT GCA TGG CTG CAA Supplementary Table 5.2: Primers Used in this Study to Examine Splicing. 
Locus Outer Primer Inner Primer 1 ECU03_0180 TTC GAT CCT TTT CGA GAC TCC TCA T CCA CTT CAC GAG ATA TTG CTT CAC T 2 ECU03_0160 AGC TGG AAT TTG CTC TGG TTC AGT A GTA AGC AGG TCT TCC ACC TGC CTA T 3 ECU03_0305 CTC TTC TTA GAT CGT CTT CAG ACG T GGA TGA GAT TTG GAT TGT CGT GAG T 4 ECU03_0320 TTT CGG AGC TTC TTC TCA GCC AT TTC GTC TCT GGA TCG TGG TGG AT 5 ECU03_0520 GTC CTC TCA CCA TCC TGG TTC GTA CCT TTC CGC TTA TGT ATC CAG CAA 6 ECU03_0530 ATA AAC ATA AGC GTT TCC CTC ACT T CCC CTG GTG AAG AAC GAA AAC TG 7 ECU11_1460 GCC TTA ATC ACA AGA CAG TCC GTC A TTT CCG TGA TCG ACA TGT GCA ATC A 8 ECU11_1450 CTC TAG CTC CAA ATT CAG CTC GTA CCT TGG CGA ACA CAC GGA CCA T 9 ECU11_1390 ATG TCG TAG TCA AGG ATG TTC TGT C TCT CCA CAA TCA GCT CCA GGT CA 10 ECU11_0660 GCT GCT TGA ATA CAT CAG TCG ACT TCT GCC TCT GCA AGG TGG ACT A 11 ECU11_0670 CTT CAG GGA GTA GTA GTC ACA CTC A ATC CTC GGC AAA CAC GTC AAA TGA 12 ECU02_1090 CAA GCA CCG CAC ATA TTA TCG ACA TCA GGC AGG ATA TAA GCT TTG TCA T 13 ECU07_0120 GAT TCC CAA CCG TCA TCT GTC CTA T ATT TCG CAT TCC TCA CAA TAG TAC T 14 ECU07_0200 TTA CTC TGG CCT TCA TCT TGT CAG A GAA GTC GTA GGT GCT TTT CGA CGA T 15 ECU07_0210 TTC TTG CCT TGT GCT GAG AAA CTA G GGA TAC TGC CTC AGG CAT GCC TT 16 ECU07_0260 GAA ATT CGA AGC ATC TAC AGA GCT A TGC CTG CTT CAG AGG ATT GAA ATC T 17 ECU07_0270 CAT TCC ATG CAT TTG AGA ACC CTG T TCG ACT CCC TAA CGA CCA GTT TAT C 18 ECU07_1260 TGA TGC CTA ATT TCT CCA TCA AGG A TCC ACT ATT CCG CTG TAC GTC ATC T 19 ECU07_1250 CTT GGC ATA GTC TCT GAG TAG ATC T TTC TTA CGT CAA AGA CTT TCC TGC T 20 ECU08_1220 GAA GTG AAG CAC CCT CTC CTT GT TGA GCT CTG GCC CTT TGG TAG TAT 21 ECU08_1210 GCT CCT GGA AGT CGA CAT TGA ACT T TAA TCT CGC CAA TCT CTC TGC CAA T 22 ECU08_1100 TGT GTA AGG GAG CTG AAA TCC TCT CCT CGA TCT TGT CCT GCT CAC TG 23 ECU08_1110 CAA ACT CAG AGT CGA TCT TGC TGT AGG AGT ACA GAA TTT CCC TTC CAG A 24 ECU05_1240 GTC CAA TGC CGA GAA TTC CAG TCA A TTC TGG AAA AGA TGG AGG CTG AGA T 25 ECU05_1250 CTC
CTG AAG AAG TTG GTC TTT GTG AT TTT TGT GTC GTC TCA TAG CGT ACT T 26 ECU04_1670 CCT ACG AAG ATA AAC GGG AAT GCT A GTT ACA AAG GAG ACA CTG GCA CTC T 27 ECU04_1660 TCT TCT ACC GGA ACA TCT GCC TCT A TCC TCT TCC TTA CTC TCC TCC TTA G 28 ECU09_0180 GAT GGT CTG CGT GAA TAC GAG CT TTT CGA ATG TGG TGT TGT ACA TCG T 29 ECU09_0170 TCC TGG TGC AGA TAC TGT GAC ATC CCG ACA TTA GAA CTT CCC AGG AAC A 30 ECU07_1620 GCC TAA GCT GGA TAC CGA TCT TGA CCC TAT GTT CTC CAG GGT TTC TTC A 31 ECU07_1630 CAA CAT GGT CCA GAA GTT GCT TAC T CCG TAG TTG TCG ATC AGA AAC CAG
Supplementary Table 5.3: Primers Used in this Study to Examine Transcription.

Supplementary Figure 5.1: The Life Cycle of E. cuniculi. The spore (a) germinates when the polar tube everts from the spore (b) and the spore contents are pushed through the tube into the host cell cytoplasm (c). The meront divides (d), and spore walls are then laid down around each meront nucleus (e). The host cell lyses, releasing new spores (a). The host cell nucleus is denoted with an “N”.

Supplementary Figure 5.2: Microsporidian Phylogeny Depicting the Relationship between A. locustae and E. cuniculi. Adapted from Slamovits et al. (2004).

Supplementary Figure 5.3: Capillary Electrophoresis Results for L7A and S29 from Meronts. Blue peaks correspond to the presence of transcripts of a particular size in the sample. Expected sizes for “spliced” and “unspliced” transcripts are indicated.

5.5 References

Bentley, D.L. 2005. Rules of engagement: co-transcriptional recruitment of pre-mRNA processing factors. Curr Opin Cell Biol 17: 251-256.
Brengues, M., Pintard, L. and Lapeyre, B. 2002. mRNA decay is rapidly induced after spore germination of Saccharomyces cerevisiae. J Biol Chem 277: 40505-40512.
Brosson, D., Kuhn, L., Delbac, F., Garin, J., Vivares, C.P. and Texier, C. 2006. Proteomic analysis of the eukaryotic parasite Encephalitozoon cuniculi (microsporidia): a reference map for proteins expressed in late sporogonial stages.
Proteomics 6: 3625-3635.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C.A., Taylor, M.S., Engstrom, P.G., Frith, M.C. et al. 2006. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 38: 626-635.
Collins, L. and Penny, D. 2005. Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol 22: 1053-1066.
Corradi, N., Burri, L. and Keeling, P.J. 2008a. mRNA processing in Antonospora locustae spores. Mol Genet Genomics 280: 565-574.
Corradi, N., Gangaeva, A. and Keeling, P.J. 2008b. Comparative profiling of overlapping transcription in the compacted genomes of microsporidia Antonospora locustae and Encephalitozoon cuniculi. Genomics 91: 388-393.
Das, R., Yu, J., Zhang, Z., Gygi, M.P., Krainer, A.R., Gygi, S.P. and Reed, R. 2007. SR proteins function in coupling RNAPII transcription to pre-mRNA splicing. Mol Cell 26: 867-881.
Davis, C.A., Grate, L., Spingola, M. and Ares, M.J. 2000. Test of intron prediction reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast. Nucleic Acids Res 28: 1700-1706.
Delbac, F., Peuvel, I., Metenier, G., Peyretaillade, E. and Vivares, C.P. 2001. Microsporidian invasion apparatus: identification of a novel polar tube protein and evidence for clustering of ptp1 and ptp2 genes in three Encephalitozoon species. Infect Immun 69: 1016-1024.
Gornemann, J., Kotovic, K.M., Hujer, K. and Neugebauer, K. 2005. Cotranscriptional spliceosome assembly occurs in a stepwise fashion and requires the cap binding complex. Mol Cell 19: 53-63.
Hagiwara, M. and Nojima, T. 2007. Cross-talks between transcription and post-transcriptional events within a 'mRNA factory'. J Biochem 142: 11-15.
Hampsey, M. 1998. Molecular genetics of the RNA polymerase II general transcription machinery. Microbiol Mol Biol Rev 62: 465-503.
Irimia, M. and Roy, S.W. 2008.
Evolutionary convergence on highly-conserved 3' intron structures in intron-poor eukaryotes and insights into the ancestral eukaryotic genome. PLoS Genet 4: e1000148.
Isken, O. and Maquat, L.E. 2007. Quality control of eukaryotic mRNA: safeguarding cells from abnormal mRNA function. Genes Dev 21: 1833-1856.
Juneau, K., Palm, C., Miranda, M. and Davis, R.W. 2007. High-density yeast-tiling array reveals previously undiscovered introns and extensive regulation of meiotic splicing. Proc Natl Acad Sci USA 104: 1522-1527.
Katinka, M.D., Duprat, S., Cornillot, E., Metenier, G., Thomarat, F., Prensier, G., Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P. et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414: 450-453.
Koonin, E.V. 2006. The origin of introns and their role in eukaryogenesis: a compromise to the introns-early versus introns-late debate? Biol Direct 1: 22.
Kornblihtt, A.R., De La Mata, M., Fededa, J.P., Munoz, M.J. and Nogues, G. 2004. Multiple links between transcription and splicing. RNA 10: 1489-1498.
Kraft, C., Deplazes, A., Sohrmann, M. and Peter, M. 2008. Mature ribosomes are selectively degraded upon starvation by an autophagy pathway requiring the Ubp3p/Bre5p ubiquitin protease. Nat Cell Biol 10: 602-610.
Kwek, K.Y., Murphy, S., Furger, A., Thomas, B., O'Gorman, W., Kimura, H., Proudfoot, N.J. and Akoulitchev, A. 2002. U1 snRNA associates with TFIIH and regulates transcriptional initiation. Nat Struct Biol 9: 800-805.
Miura, F., Kawaguchi, N., Sese, J., Toyoda, A., Hattori, M., Morishita, S. and Ito, T. 2006. A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci USA 103: 17846-17851.
Moore, M.J., Schwartzfarb, E.M., Silver, P.A. and Yu, M.C. 2006. Differential recruitment of the splicing machinery during transcription predicts genome-wide patterns of mRNA splicing. Mol Cell 24: 903-915.
Rogers, J.H. 1990.
The role of introns in evolution. FEBS Lett 268: 339-343.
Rogozin, I.B., Sverdlov, A.V., Babenko, V.N. and Koonin, E.V. 2005. Analysis of evolution of exon-intron structure of eukaryotic genes. Brief Bioinform 6: 118-134.
Sakharkar, M.K., Chow, V.T.K. and Kangueane, P. 2004. Distributions of exons and introns in the human genome. In Silico Biol 4: 32.
Slamovits, C.H., Fast, N.M., Law, J.S. and Keeling, P.J. 2004. Genome compaction and stability in microsporidian intracellular parasites. Curr Biol 14: 891-896.
Spingola, M., Grate, L., Haussler, D. and Ares, M. 1999. Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA 5: 221-234.
Tardiff, D.F., Lacadie, S.A. and Rosbash, M. 2006. A genome-wide analysis indicates that yeast pre-mRNA splicing is predominately posttranscriptional. Mol Cell 24: 917-929.
Vivares, C.P., Gouy, M., Thomarat, F. and Metenier, G. 2002. Functional and evolutionary analysis of a eukaryotic parasitic genome. Curr Opin Microbiol 5: 499-505.
Williams, B.A., Slamovits, C.H., Patron, N.J., Fast, N.M. and Keeling, P.J. 2005. A high frequency of overlapping gene expression in compacted eukaryotic genomes. Proc Natl Acad Sci USA 102: 10936-10941.
Zhang, Z. and Dietrich, F.S. 2005. Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE. Nucleic Acids Res 33: 2838-2851.

Chapter 6 – Gene Expression in a Non-Compact Microsporidian Genome⁵

6.1 Introduction

Microsporidia are single-celled eukaryotic intracellular parasites that are related to fungi. Currently, over 1200 species have been identified, infecting animals from nearly every phylum, including commercially important species such as honeybees and fish, as well as humans (Wittner and Weiss, 1999). Inside host cells, microsporidia proliferate as vegetative stages (meronts, schizonts), which eventually produce spores that are released when the host cell lyses.
Spores possess a unique host cell invasion apparatus called the polar filament, which is forcefully everted upon germination to form a tube that can pierce a nearby host cell (Wittner and Weiss, 1999). The tube then acts as a conduit through which the contents of the spore are injected into the host cell’s cytoplasm, where the parasite undergoes vegetative replication. Microsporidia are a diverse group of organisms and vary greatly in the complexity of their life cycles. For instance, Encephalitozoon cuniculi and Antonospora locustae each produce only one type of spore (uninucleate in the former and binucleate in the latter) and complete their entire life cycles inside one host individual, while Amblyospora californica requires two host groups (mosquitoes and microcrustacea) and produces three morphologically and functionally distinct spore types (Wittner and Weiss, 1999). Microsporidia possess some of the smallest primary nuclear genomes known (as small as 2.3 Mbp). The only microsporidian whose genome has been completely sequenced is the human parasite E. cuniculi. At a meager 2.9 Mbp, E. cuniculi’s genome is extremely compact, with only about 2000 genes (Katinka et al., 2001). A small genome sequence survey (GSS) project has been conducted on A. locustae, a grasshopper parasite that has been approved as a biological control agent in the United States (Slamovits et al., 2004). A. locustae’s genome is roughly 5.4 Mbp in size (Streett, 1994), or about twice the size of E. cuniculi’s genome. Despite the genome size difference, both genomes appear to be structured in much the same way: genes are closely packed (nearly one gene per kilobase), are small compared to homologues in animals and fungi, and are intron-poor. There is also a much greater degree of synteny between these two organisms than would be expected given their phylogenetic relationship, which implies that although microsporidian genes are fast-evolving, genomic rearrangements occur only rarely (Slamovits et al., 2004; see Fig. 6.1). However, we have very little information on larger microsporidian genomes.

⁵ A version of this manuscript has been published. Gill, E.E.; Becnel, J.J.; Fast, N.M. (2008) ESTs from the microsporidian Edhazardia aedis. BMC Genomics 9, 296.

Edhazardia aedis is a microsporidian that infects Aedes aegypti, the mosquito vector of the dengue hemorrhagic fever and yellow fever viruses. Ed. aedis has been intensively studied as a viable biological control agent for A. aegypti (Becnel, 1990), and its genome is estimated to be many times larger than that of E. cuniculi. There are several possible explanations for this difference. Ed. aedis may have more genes that control its complex life cycle, and its genes may also be longer, more widely spaced, and contain more introns than those of E. cuniculi (Katinka et al., 2001). Morphological studies of Ed. aedis have revealed at least four types of spores: two uninucleate and two binucleate (Becnel et al., 1989; Johnson et al., 1997). The two uninucleate spore types differ morphologically, although both are pyriform, and they arise from different cell division events. Spores produced via mitosis are roughly 8.5 µm long, whereas spores produced via meiosis (meiospores) are about 7.5 µm long. Small binucleate spores (~6.5 µm long) with short polar filaments are formed first, followed by larger, ovoid binucleate spores (~9 µm long). Meiospore formation is usually abortive and rarely produces normal spores (Becnel et al., 1989). Ed. aedis’ life cycle is moderately complex and involves two generations of the mosquito host. It begins when a mosquito larva ingests a uninucleate spore from the environment. Once in the gut, the spore germinates and the parasite begins to multiply in the host tissue.
Within 48 hours, a small binucleate spore is formed that is responsible for spread to other tissues. Orally infected larvae generally exhibit reduced growth. If the parasite load is high, they may die before reaching maturity, releasing more spores into the environment. However, if the infection load is sufficiently small, the larva will mature into an adult mosquito and survive to reproduce (Becnel et al., 1989). If the adult mosquito is female, large binucleate spores will develop in her ovaries and infect oocytes. This passes the infection on to the next generation, in which most mortality occurs among larvae. Little is known about the factors that modulate the transition from one phase of the life cycle to the next, or about the changes in gene expression that occur during these transitions. It is also possible that the difference in genome size between Ed. aedis and E. cuniculi or A. locustae has less to do with the number of genes and more to do with genome architecture. Ed. aedis genes could be longer, more widely spaced, and contain more introns than those of E. cuniculi (Katinka et al., 2001). In an effort to learn more about Ed. aedis’ genome, a GSS of >200 kbp was conducted (Williams et al., 2008). This study concluded that Ed. aedis’ genome structure is very different from those of E. cuniculi and A. locustae. A large portion of the genome is occupied by non-coding DNA, and genes are not closely packed together, although local areas of compaction could not be ruled out. Previous examinations of microsporidian ESTs have only been conducted on species with small genomes. These transcripts possessed features that are atypical in eukaryotes. Examinations of ESTs from A. locustae (Williams et al., 2005) and transcripts from E. cuniculi revealed numerous multi-gene transcripts.
These transcripts are different from prokaryotic operons. The proteins encoded by a single transcript do not have related functions and are often not encoded on the same DNA strand. Many transcripts encode one gene in its entirety but only a portion of another (Williams et al., 2005; Corradi et al., 2008). The reason for this phenomenon is not known. However, it has been suggested that transcriptional control elements were lost (or moved into adjacent genes) during genome compaction (Corradi et al., 2008). As Ed. aedis’ genome and life cycle are very different from those of E. cuniculi and A. locustae, it is reasonable to expect that its transcript structure and number of genes may differ as well. In this study, we describe the first survey of ESTs from a microsporidian with a much larger genome and a complex life cycle. By sequencing over 1300 transcripts, we have elucidated more of Ed. aedis’ genome content and gained a profile of its transcript structure and composition. Surprisingly, the Ed. aedis uninucleate spore transcriptome is very similar to that of A. locustae.
6.2 Methods
Uninucleate Ed. aedis spores were grown and harvested from A. aegypti larvae as described previously (Becnel et al., 1995). The spores were lysed in Ambion’s plant RNA isolation aid and the lysis/binding solution from an Ambion RNAqueous kit. Lysis was performed with glass beads in a bead beater operating at 2500 rpm for 6 minutes. RNA was extracted from the resulting supernatant using the RNAqueous kit. A microquantity cDNA library was constructed by Marligen using the pExpress-1 vector. A total of 1307 clones with an average insert size of 1.5 kb were uni-directionally sequenced on an automated capillary sequencer. Sequences were manually edited and analyzed using Sequencher 4.2 software. Proteins encoded by the transcripts were identified via BLASTX (Altschul et al., 1997) searches performed on the NCBI website (Genbank).
A transcript was identified as encoding a particular protein when its BLASTX hit to a Genbank protein had an e-value of 10⁻⁴ or lower. Transcripts were scored as “present in other microsporidia” in two cases. The first was when the best BLASTX hit was a gene present in other microsporidia. The second was when the best hit was a gene with a microsporidian homologue, identified by BLASTing the Ed. aedis transcript against available microsporidian data. Putative Ed. aedis-specific genes are transcripts that contain open reading frames encoding 100 amino acids or more and have no BLASTX hits with e-values lower than 10⁻³. To facilitate access to the EST sequences, they were uploaded to and annotated by the dbEST website (O’Brien et al., 2007).
6.3 Results
6.3.1 Overview
Sequences were deposited into the Genbank EST database under accession numbers FG063843 to FG065106. From the 1307 clones sequenced, 133 unique genes were found. Of these, 55 were represented by a single transcript, while the remaining 78 were represented by two or more. The 133 unique genes break down as follows: 97 are present in other microsporidia (See Table 6.1), 10 are present in other (non-microsporidian) organisms (See Table 6.2), 18 are putatively Ed. aedis-specific and 8 have no apparent open reading frames. Coding sequences contained 43% G+C, while 5’ and 3’ untranslated regions contained 27% and 26%, respectively. Approximately a quarter of the transcripts analyzed coded for Hsp70. Almost all of the Hsp70 sequences were most similar to the “heat shock related 70kDa protein” found in E. cuniculi (NP_597563). Single nucleotide variation exists between sequences, usually as third-position synonymous substitutions. Where non-synonymous substitutions exist, they always involve a single nucleotide, and there are no indels between sequences. Mitochondrial-type and DNAK-like Hsp70s were also represented. Genes were assigned to COG categories to allow comparison with A. locustae.
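The scoring criteria given in Section 6.2 amount to a small decision procedure. The sketch below is a hypothetical illustration and not the pipeline used in this study. Only the 10⁻⁴ and 10⁻³ e-value cut-offs and the 100-amino-acid ORF threshold come from the text. The function names, the input layout, and the ORF definition (ATG to stop codon, forward strand only) are assumptions.

```python
# Hypothetical sketch of the transcript-scoring rules in Section 6.2.
# Only the thresholds (1e-4, 1e-3, 100 amino acids) come from the text;
# the function names, inputs and ORF definition are assumptions.

STOP_CODONS = {'TAA', 'TAG', 'TGA'}


def longest_forward_orf_aa(seq):
    # Length in amino acids (stop codon excluded) of the longest ORF that
    # starts with ATG and ends with a stop codon, in the three forward frames.
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == 'ATG':
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def classify_transcript(best_evalue, has_microsporidian_homologue, orf_aa):
    # best_evalue: lowest BLASTX e-value against GenBank, or None for no hit.
    # has_microsporidian_homologue: the best hit is microsporidian, or has a
    # homologue found when searching available microsporidian data.
    if best_evalue is not None and best_evalue <= 1e-4:
        if has_microsporidian_homologue:
            return 'present in other microsporidia'
        return 'present in other organisms'
    if (best_evalue is None or best_evalue >= 1e-3) and orf_aa >= 100:
        return 'putative Ed. aedis-specific'
    # Weak hits (1e-4 < e < 1e-3) and short-ORF transcripts fall through.
    return 'unclassified'


# Toy transcript: ATG + 120 alanine codons + TAA, with no BLASTX hit.
toy = 'ATG' + 'GCT' * 120 + 'TAA'
print(classify_transcript(None, False, longest_forward_orf_aa(toy)))
```

One consequence of the two different cut-offs is visible in the sketch. A transcript whose best hit falls between 10⁻⁴ and 10⁻³ qualifies under neither rule and stays unclassified.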
Figure 6.2 shows the percentage of total Ed. aedis transcripts dedicated to each COG category, with total A. locustae transcripts provided for comparison. Because the randomness of the library is uncertain, some transcripts may be artificially overrepresented. It is therefore more informative to examine unique transcripts (i.e., counting multiple transcripts for the same gene only once) rather than total transcripts. Figure 6.3 displays the percentages of unique Ed. aedis and A. locustae transcripts dedicated to each category. Surprisingly, the values are similar and sometimes identical (the maximum difference between corresponding Ed. aedis and A. locustae categories is 5%). Notable transcripts include a retrotransposon similar to LTR retrotransposons present in Sorghum bicolor (AAD27571) and Nosema bombycis (ABE26655). All three belong to the Ty3/gypsy family of retrotransposons. Ed. aedis also possesses a methionine aminopeptidase 2 (MetAP-2) gene, which is also present in E. cuniculi. Several transcripts appear homologous to proteins found in various eukaryotes but absent from other microsporidia examined to date. These include hypothetical or unknown proteins found in Oryza, Danio and Plasmodium. They also include genes encoding proteins with identified functions, such as an adenosine kinase, a lysine-tRNA ligase and an L-asparaginase (See Table 6.2). In addition, Ed. aedis encodes a putative hydrolase-like protein that is present in A. locustae but absent from E. cuniculi. E. cuniculi and A. locustae both contain a small number of introns in their genomes and, consequently, have retained a minimal set of splicing machinery. These two organisms are not closely related (See Fig. 6.1), but they do share a few conserved introns (Limpright and Fast, personal communication). There is therefore reason to suspect that some of these introns may also be present in Ed. aedis.
Fortunately, seven transcripts of the gene encoding ribosomal protein L5 (which contains an intron in E. cuniculi) were recovered from the Ed. aedis library. These sequences were used to design primers to amplify the L5 gene from genomic DNA. This showed that the Ed. aedis L5 gene does not contain an intron.
6.3.2 Transcript Structure
Because Ed. aedis is an intracellular parasite that cannot be easily cultured, RNA was limited and the library could not be constructed in a 5’ cap-dependent manner. Consequently, nearly all of the inserts encoding the same gene were of different lengths, and most were 5’ truncated. Nevertheless, some of Ed. aedis’ transcripts appear to have very long 5’ untranslated regions (UTRs) of several hundred base pairs. To further assess transcript structure, cap-dependent 5’ RACE (rapid amplification of cDNA ends) was conducted on transcripts from a moderately represented gene, glucosamine fructose-6-phosphate aminotransferase. 5’ RACE confirmed that transcript lengths for this gene do vary, with 5’ UTRs ranging from 255 to 348 bp (See Fig. 6.4). In contrast to their variable start sites, nearly all transcripts appear to have identical end sites. The notable exceptions are the heat shock related 70kDa protein transcripts, which have somewhat variable 3’ polyadenylation sites. There were frequently single nucleotide differences between sequences in contigs. These were usually restricted to silent third-position substitutions, and where they are not silent, they are conservative amino acid substitutions. These differences could represent different copies of the same gene or different alleles within the population. In most cases, UTRs were not available to determine which.
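Figures 6.2 and 6.3 count the same clones in two ways: every clone (total transcripts), or each gene once (unique transcripts). The sketch below is a hypothetical illustration of why the two views can diverge when one gene is overrepresented, as Hsp70 is here. The gene identifiers and category labels are toy values, not data from this library. It also computes the largest per-category difference between two profiles, the quantity summarized above as 5% for the unique Ed. aedis and A. locustae profiles.

```python
from collections import Counter


def cog_percentages(clones, unique=True):
    # clones: hypothetical list of (gene_id, cog_category) pairs, one per
    # sequenced clone. With unique=True each gene is counted once.
    if unique:
        gene_to_cog = {}
        for gene, cog in clones:
            gene_to_cog.setdefault(gene, cog)
        counts = Counter(gene_to_cog.values())
    else:
        counts = Counter(cog for _, cog in clones)
    total = sum(counts.values())
    return {cog: 100.0 * n / total for cog, n in counts.items()}


def max_category_difference(a, b):
    # Largest absolute difference, in percentage points, over all categories
    # that appear in either profile (a missing category counts as 0%).
    categories = set(a) | set(b)
    return max(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in categories)


# Toy library in which one gene is sequenced three times.
clones = [('hsp70', 'protein destination')] * 3 + [
    ('ef1a', 'protein synthesis'),
    ('tubulin', 'cellular organization'),
]
total_view = cog_percentages(clones, unique=False)   # protein destination: 60%
unique_view = cog_percentages(clones, unique=True)   # each category: ~33.3%
print(max_category_difference(total_view, unique_view))
```

Counting by unique gene removes the weight that a single abundant transcript places on its category. That weight is why the unique-transcript comparison is less sensitive to library bias.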
6.4 Discussion
6.4.1 Comparing Microsporidian Transcriptomes
This is the second microsporidian EST project to be conducted and the first from a microsporidian with a large genome, allowing a meaningful comparison of microsporidian spore transcriptomes. Ed. aedis and A. locustae differ greatly in genome size and life cycle complexity, yet their transcriptomes are highly similar in composition. In both species, the proportion of unique transcripts encoding proteins in the “protein destination” COG category is relatively large (19% in Ed. aedis and 16% in A. locustae) (See Fig. 6.3). Proteomic work is consistent with these results. In E. cuniculi, “protein destination” proteins make up a large percentage (~28%) of the proteins with known functions (Brosson et al., 2006). When the unique genes found in Ed. aedis and A. locustae are compared by COG category, the percentages in each category are close to identical (See Fig. 6.3). The largest differences lie in three categories: cellular organization and biogenesis; cellular communication and signal transduction; and cell rescue, defense, cell death and aging. One notable difference between the two spore transcriptomes concerns transposable elements. None were recovered in the A. locustae ESTs, whereas Ed. aedis transcribes a retrotransposon of the Ty3/gypsy family. Transposable elements have previously been reported in the genomes of Nosema bombycis (Xu et al., 2006), Spraguea lophii (Hinkle et al., 1997), Brachiola algerae and Ed. aedis (Williams et al., 2008) (See below). To the best of our knowledge, this is the first documented instance of transposable element transcription in microsporidia, and it could indicate active transposition. Nearly 8% of the unique transcripts from Ed.
aedis encode genes that are present in various eukaryotes but absent from other microsporidia. There are several possible explanations for the existence of these genes. Sequence data from microsporidia are scarce, and the only completely sequenced genome is that of E. cuniculi. It is therefore currently impossible to assert that these genes are absent from any microsporidian other than E. cuniculi. The genes may have been present in the genome of the microsporidian ancestor and lost during genome reduction and compaction in E. cuniculi. Alternatively, they could have arisen from lateral transfer events, or they could have come to resemble genes in other organisms by chance or by convergence. The first explanation is the most parsimonious. If it is correct, these data suggest that the microsporidian ancestor was not as compact as E. cuniculi. The MetAP-2 protein is a target for drug therapy in E. cuniculi (Pandrea et al., 2005). The Ed. aedis copy of the MetAP-2 gene is very similar to that of E. cuniculi. It contains the amino acid residues that bind the drug fumagillin, as well as those believed to coordinate metals. Like E. cuniculi, Ed. aedis lacks the N-terminal polylysine tract of MetAP-2 that is present in animals, other fungi and plants. This tract plays a role in hindering the phosphorylation of eukaryotic initiation factor 2α (eIF2α), and its absence indicates that the microsporidian proteins likely lack this function (Pandrea et al., 2005). Our work indicates that, unlike its E. cuniculi homologue, the Ed. aedis L5 gene does not contain an intron (see Results, above). Nevertheless, there is reason to believe that there are introns elsewhere in the genome. Several transcripts encode proteins that act in pre-mRNA splicing: an arginine/serine rich pre-mRNA splicing factor (NP_597487 in E. cuniculi), a pre-mRNA splicing factor (NP_586183 in E.
cuniculi) and a U5 associated snRNP (NP_586393 in E. cuniculi). These genes comprise 2.2% of the total unique genes found.
6.4.2 Hsp70
Roughly 28% of total Ed. aedis transcripts encoded some form of Hsp70, a heat shock protein that assists in the folding of other proteins. Hsp70 helps prevent proteins from becoming insoluble and also plays a role in various other intracellular processes, such as apoptosis (Mayer and Bukau, 2005). Hsp70 allows mutant proteins to continue functioning by refolding them. Without this, they would be degraded, which would necessitate the costly synthesis of more protein. The proportion of Hsp70 transcripts in the Ed. aedis ESTs is an order of magnitude higher than that found in A. locustae (2%) (O’Brien et al., 2007). We are cautious in this interpretation. We have not quantitatively assessed the transcription level of Hsp70 in Ed. aedis, and transcripts of this protein are likely somewhat overrepresented in the library. No E. cuniculi ESTs have been published, but Brosson et al. (2006) investigated the proteins present in spores. They found that Hsp70 constitutes a moderate proportion of all protein present. Brosson and his colleagues also classified all proteins by COG category and found that “protein destination” proteins together comprise 21% of E. cuniculi’s proteome. Intriguingly, their experiments indicate that, of the four copies of Hsp70 in E. cuniculi, the predominantly expressed copy is homologous to the highly represented transcript in Ed. aedis. In A. locustae, the most highly transcribed copy was also most similar to the abundantly transcribed copy in Ed. aedis (O’Brien et al., 2007). It is therefore likely that microsporidia employ similar primary mechanisms to ensure proper protein folding.
In other parasites and endosymbionts, such as Buchnera aphidicola, Hsp70 is also highly expressed (Wilcox et al., 2003) and may constitute up to 10% of the protein in the cell at any one time. In species with parasitic or endosymbiotic lifestyles, genetic drift and relaxed selection frequently lead to an increased mutation rate. The need for Hsp70 to fold proteins correctly seems to increase with both the size of a protein and the number of mutations it carries (Mayer and Bukau, 2005). Microsporidian genomes appear to have undergone little rearrangement, but the nucleotide mutation rate appears to be high in this group of organisms (Thomarat et al., 2004; Van de Peer et al., 2000). Microsporidia could therefore contain elevated levels of Hsp70 to allow folding of mutant proteins.
6.4.3 Transposable Elements
One of the Ed. aedis ESTs closely matches the integrase domain of the Ty3/gypsy family of retrotransposons. Several of these elements were identified in a GSS of Ed. aedis (Williams et al., 2008) and in a few other microsporidian species. To the best of our knowledge, however, this is the first instance in which transcripts of any microsporidian retrotransposon have been found. Such transcripts could indicate active transposition in Ed. aedis’ genome. Ty3/gypsy retrotransposons are found in many organisms. Examples range from the microsporidia Spraguea lophii (Hinkle et al., 1997), Brachiola algerae (Williams et al., 2008) and Nosema bombycis (Xu et al., 2006) to Saccharomyces, Drosophila and Sorghum. Ty3 elements have been well characterized in budding yeast, where they exist in 1-4 copies per genome. They are transcribed by RNA polymerase II and integrate near the transcription start sites of genes transcribed by RNA polymerase III, such as tRNA genes. Transcription typically occurs only in haploid cells in the presence of mating pheromones (Kinsey and Sandmeyer, 1995). The N.
bombycis genome contains at least 8 different retrotransposons of the Ty3/gypsy family. Unlike those in yeast, they are not exclusively located upstream of tRNA genes (Xu et al., 2006). Nearly all N. bombycis retrotransposons encode a polyprotein containing five domains in a defined order: Gag, protease, reverse transcriptase, RNase H and integrase. Many of the sequences in the Ed. aedis library appear to be 5’ truncated. It is therefore possible that the domains upstream of the integrase in the polyprotein are also present in genomic DNA. Indeed, the GSS project revealed sequences matching the reverse transcriptase domain (Williams et al., 2008). The microsporidian Vittaforma corneae is also known to possess at least one transposable element (Mittleider et al., 2002). However, this element belongs to a different family from those present in Ed. aedis: the L1 family, which is also present in humans. The only completely sequenced microsporidian genome, that of E. cuniculi (Katinka et al., 2001), is completely devoid of transposable elements. Similar transposable elements (of the Ty3/gypsy family) exist in the distantly related S. lophii, N. bombycis, B. algerae and Ed. aedis (See Fig. 6.1). This implies that the element may have been present in the genome of the microsporidian ancestor. If so, the process of genome compaction that gave rise to the E. cuniculi genome likely involved purging transposable elements. It has been suggested that transposable elements may act to reorganize genes within the genome. Selection appears to act to retain gene synteny among microsporidia, even distantly related ones (Slamovits et al., 2004). Xu et al. (2006) compared regions of synteny between N. bombycis and E. cuniculi chromosomes and found that, in N. bombycis, transposable elements flank these syntenic regions. If Ed.
aedis’ large genome is partially a product of transposable element proliferation, one would expect much less synteny between this species and other microsporidia. Future research may elucidate other roles that transposable elements have played in shaping microsporidian genomes. This question is especially interesting because the minute genome of E. cuniculi seems to lack them, while larger genomes contain them. The functions of these transposable elements in a given genome are cryptic at best, but evidence is emerging that they may be more than simply parasitic DNA. Peaston et al. (2004) recently discovered that a class of mouse retrotransposons appears to regulate gene expression in embryos.
6.4.4 Transcript Structure
Transcripts in A. locustae typically contain more than one gene. These transcripts do not necessarily contain complete open reading frames for all genes, and the genes are frequently in opposite orientations (Williams et al., 2005). It is not known how many proteins are made from each transcript or whether this situation is typical for microsporidia. However, recent work by Corradi et al. (2008) suggests that E. cuniculi also possesses multi-gene transcripts. Unlike A. locustae and E. cuniculi, Ed. aedis appears to produce very few multi-gene transcripts, if any. This is not unexpected, given that Ed. aedis genes appear to be separated by large intergenic spaces (Williams et al., 2008). A very small number of transcripts appear to contain overlapping open reading frames (ORFs) in more than one frame. However, neither ORF encodes a protein that is significantly similar to anything in Genbank (See supplementary material). Clearly, Ed. aedis lacks the large proportion of multi-gene transcripts found in A. locustae. The Ed. aedis GSS could not rule out the possibility that local areas of compacted genes might exist (Williams et al., 2008). Given the lack of multi-gene transcripts identified here, this seems increasingly unlikely.
Also in contrast to A. locustae, nearly all of Ed. aedis’ transcripts encode proteins in a positive frame (<1% are in a negative frame, compared to 17% in A. locustae) (Williams et al., 2005). Many organisms (possibly including A. locustae) use antisense transcripts to suppress translation, but this type of regulation appears unlikely to occur in Ed. aedis. Alternatively, the large number of antisense transcripts in A. locustae may be due to a lack of transcriptional regulation resulting from genome compaction. Ed. aedis’ transcripts seem to start at multiple locations upstream of the start codon (5’ UTR length is 180 bp on average). However, they terminate at the same position, with a relatively short 3’ UTR (51 bp on average) (See, for example, Fig. 6.4). This is more in line with transcription in E. cuniculi and contrasts with the situation in A. locustae. There, transcripts start directly upstream of the translation initiation site but often terminate much farther downstream, in the adjacent gene (Corradi et al., 2008). For comparison, the yeast S. cerevisiae has much shorter 5’ UTRs than 3’ UTRs (15-75 bp and ~144 bp, respectively; Zhang and Dietrich, 2005; Graber et al., 1999), a common trend in other fungi, plants and animals. The reason for this reversal in Ed. aedis is unknown. Because 3’ UTRs are widely used as regulators of translation, Ed. aedis likely lacks some of the translational control mechanisms present in other fungi, plants and animals (Mazumder et al., 2003).
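The transcript-structure statistics discussed above are simple to compute once transcript ends and BLASTX results are in hand. These statistics are the 5’ UTR lengths measured from each transcription start site and the share of transcripts whose best hit lies in a negative (antisense) frame. The sketch below is hypothetical. The coordinates and frame lists are toy inputs rather than data from this study. The frame values are assumed to be the query frame BLASTX reports for each hit (for example, the `qframe` column of BLAST+ tabular output).

```python
from statistics import mean

VALID_FRAMES = {1, 2, 3, -1, -2, -3}


def five_prime_utr_lengths(tss_positions, start_codon_pos):
    # 0-based sense-strand coordinates; each 5' UTR runs from a transcription
    # start site up to, but not including, the first base of the start codon.
    lengths = []
    for tss in tss_positions:
        if tss > start_codon_pos:
            raise ValueError('start site lies downstream of the start codon')
        lengths.append(start_codon_pos - tss)
    return lengths


def negative_frame_percentage(best_hit_frames):
    # best_hit_frames: BLASTX query frame of the best hit for each transcript
    # (+1..+3 means sense orientation, -1..-3 means antisense).
    if not best_hit_frames:
        raise ValueError('no frames supplied')
    bad = [f for f in best_hit_frames if f not in VALID_FRAMES]
    if bad:
        raise ValueError('invalid BLASTX frame(s): %r' % (bad,))
    negative = sum(1 for f in best_hit_frames if f < 0)
    return 100.0 * negative / len(best_hit_frames)


# Hypothetical data: three 5' ends upstream of a start codon at position 500,
# and best-hit frames for 100 transcripts, one of them antisense.
utrs = five_prime_utr_lengths([400, 320, 350], 500)
print(utrs, min(utrs), max(utrs), round(mean(utrs), 1))
frames = [1] * 50 + [2] * 30 + [3] * 19 + [-2]
print(negative_frame_percentage(frames))  # 1.0
```

Because the library was not cap-selected, 5’ ends taken directly from ESTs may be truncated. UTR lengths computed this way are only meaningful for ends confirmed by cap-dependent RACE, as in Fig. 6.4.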
Gene name | Species | Genbank accession number
16S rRNA GENE | Brachiola algerae | AM422905
1-ACYL-SN-GLYCEROL-3-PHOSPHATE ACYLTRANSFERASE | Encephalitozoon cuniculi | NP_586146
26S PROTEASOME REGULATORY SUBUNIT 4 | Encephalitozoon cuniculi | NP_586091
26S PROTEASOME REGULATORY SUBUNIT 6 | Encephalitozoon cuniculi | NP_586128
26S PROTEASOME REGULATORY SUBUNIT 8 | Encephalitozoon cuniculi | XP_955738
40S RIBOSOMAL PROTEIN S2 | Leishmania infantum | XP_001466537
40S RIBOSOMAL PROTEIN S3 | Encephalitozoon cuniculi | XP_955676
40S RIBOSOMAL PROTEIN S4 | Mycetophagus quadripustulatus | CAJ17168
40S RIBOSOMAL PROTEIN SA or P40 | Encephalitozoon cuniculi | NP_584728
60S RIBOSOMAL PROTEIN L3 | Encephalitozoon cuniculi | NP_597630
60S RIBOSOMAL PROTEIN L4 | Encephalitozoon cuniculi | NP_597213
60S RIBOSOMAL PROTEIN L5 | Encephalitozoon cuniculi | NP_585846
6-PHOSPHOFRUCTOKINASE | Encephalitozoon cuniculi | NP_597579
ABC TRANSPORTER (MITOCHONDRIAL TYPE) #1 | Encephalitozoon cuniculi | NP_586426
ABC TRANSPORTER (MITOCHONDRIAL TYPE) #2 | Encephalitozoon cuniculi | NP_586426
ACTIN | Blakeslea trispora | AAW32475
ARGININE/SERINE RICH PRE-mRNA SPLICING FACTOR | Encephalitozoon cuniculi | NP_597487
ASSOCIATED WITH RAN (NUCLEAR IMPORT/EXPORT) FUNCTION FAMILY MEMBER | Caenorhabditis elegans | NP_499369
ATP SYNTHASE | Encephalitozoon cuniculi | XP_955732
BELONGS TO THE ABC TRANSPORTER SUPERFAMILY | Encephalitozoon cuniculi | NP_597462
cAMP-DEPENDENT PROTEIN KINASE TYPE 1 REGULATORY CHAIN | Encephalitozoon cuniculi | NP_597223
CASEIN KINASE 1 HOMOLOG (INVOLVED IN DNA REPAIR) | Encephalitozoon cuniculi | NP_597600
CATION-TRANSPORTING ATPase | Encephalitozoon cuniculi | NP_586078
CHOLINE PHOSPHATE CYTIDYLYLTRANSFERASE | Encephalitozoon cuniculi | NP_586276
DNA REPLICATION LICENSING FACTOR MCM2 | Encephalitozoon cuniculi | NP_584768
DNA REPLICATION LICENSING FACTOR OF THE MCM FAMILY MCM6 | Encephalitozoon cuniculi | NP_597420
DNA REPLICATION LICENSING FACTOR OF THE MCM FAMILY MCM7 | Encephalitozoon cuniculi | NP_585977
DNAJ PROTEIN HOMOLOG 2 | Encephalitozoon cuniculi | NP_586004
DNAK-LIKE PROTEIN | Encephalitozoon cuniculi | NP_586489
EUKARYOTIC TRANSLATION INITIATION FACTOR 4A | Encephalitozoon cuniculi | XP_955671
FIBRILLARIN (34kDa NUCLEOLAR PROTEIN) | Encephalitozoon cuniculi | NP_586197
GENERAL TRANSCRIPTION FACTOR | Encephalitozoon cuniculi | NP_597292
GLUCOSAMINE FRUCTOSE-6-PHOSPHATE AMINOTRANSFERASE | Encephalitozoon cuniculi | NP_586057
GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE | Encephalitozoon cuniculi | NP_586008
GUANINE NUCLEOTIDE BINDING PROTEIN BETA SUBUNIT | Encephalitozoon cuniculi | NP_597241
HEAT SHOCK RELATED 70kDa PROTEIN | Encephalitozoon cuniculi | NP_597563
HEAT-SHOCK PROTEIN HSP90 HOMOLOG | Encephalitozoon cuniculi | NP_584635
HISTIDYL tRNA SYNTHETASE | Antonospora locustae | AAT12372
HISTONE ACETYLTRANSFERASE TYPE B SUBUNIT 2 | Encephalitozoon cuniculi | NP_586003
HISTONE DEACETYLASE 1 | Encephalitozoon cuniculi | NP_597645
HISTONE DEACETYLASE | Encephalitozoon cuniculi | XP_955621
HISTONE H3 | Mus musculus | JQ1983
HSP 101 RELATED PROTEIN | Encephalitozoon cuniculi | NP_586448
HYPOTHETICAL PROTEIN ECU02_0840 | Encephalitozoon cuniculi | NP_584609
HYPOTHETICAL PROTEIN ECU02_0950 | Encephalitozoon cuniculi | NP_584620
HYPOTHETICAL PROTEIN ECU06_0450 | Encephalitozoon cuniculi | NP_585801
HYPOTHETICAL PROTEIN ECU06_1280 | Encephalitozoon cuniculi | NP_585884
HYPOTHETICAL PROTEIN ECU07_0530 | Encephalitozoon cuniculi | NP_585981
HYPOTHETICAL PROTEIN ECU08_1500 | Encephalitozoon cuniculi | NP_597278
HYPOTHETICAL PROTEIN ECU09_0740 | Encephalitozoon cuniculi | XP_955628
HYPOTHETICAL PROTEIN ECU09_1700 | Encephalitozoon cuniculi | XP_955723
HYPOTHETICAL PROTEIN ECU11_1720 | Encephalitozoon cuniculi | NP_586478
LIM DOMAIN-CONTAINING PROTEIN | Encephalitozoon cuniculi | NP_586340
LONG CHAIN FATTY ACID CoA LIGASE | Encephalitozoon cuniculi | NP_586206
METHIONINE AMINOPEPTIDASE TYPE 2 | Encephalitozoon cuniculi | NP_586190
METHIONINE PERMEASE | Encephalitozoon cuniculi | NP_585905
NIFS-LIKE PROTEIN (CYSTEINE DESULFURASE) INVOLVED IN IRON-SULFUR CLUSTER SYNTHESIS | Encephalitozoon cuniculi | NP_586483
NUCLEAR SER/THR PROTEIN PHOSPHATASE PP1-1 GAMMA CATALYTIC SUBUNIT | Encephalitozoon cuniculi | NP_597385
P68-LIKE PROTEIN (DEAD BOX FAMILY OF RNA HELICASES) | Encephalitozoon cuniculi | NP_597238
PEPTIDE CHAIN RELEASE FACTOR SUBUNIT 1 | Encephalitozoon cuniculi | NP_597376
PEPTIDE ELONGATION FACTOR 2 | Glugea plecoglossi | BAA11470
PHOSPHATIDYLINOSITOL TRANSFER PROTEIN, ALPHA | Danio rerio | NP_957229
PHOSPHOMANNOMUTASE | Encephalitozoon cuniculi | NP_597365
POLYADENYLATE-BINDING PROTEIN 2 | Encephalitozoon cuniculi | NP_586226
POLYPROTEIN | Sorghum bicolor | AAD27571
PRE-mRNA SPLICING FACTOR | Encephalitozoon cuniculi | NP_586183
PROTEIN KINASE B-LIKE PROTEIN | Plasmodium falciparum | AAT06260
PROTEIN TRANSPORT PROTEIN SEC23 HOMOLOG (COPII COAT) | Encephalitozoon cuniculi | NP_586385
PUTATIVE HYDROLASE-LIKE PROTEIN | Antonospora locustae | AAU11090
PUTATIVE ZINC FINGER PROTEIN | Encephalitozoon cuniculi | NP_597297
SER/THR PROTEIN PHOSPHATASE 2-A | Encephalitozoon cuniculi | NP_584753
SER/THR PROTEIN PHOSPHATASE PP2-A REGULATORY SUBUNIT B | Encephalitozoon cuniculi | NP_597423
SERINE/THREONINE PROTEIN KINASE (REQUIRED FOR ACTIN RING AND SEPTATION) | Encephalitozoon cuniculi | XP_965898
SIMILAR TO DNAJ-LIKE PROTEIN | Nasonia vitripennis | XP_001602403
SIMILARITY TO 14-3-3 PROTEIN 1 | Encephalitozoon cuniculi | NP_597610
SIMILARITY TO ADP/ATP CARRIER PROTEIN | Paranosema grylli | CAI30461
SIMILARITY TO CDC20 (WD-REPEAT PROTEIN) | Encephalitozoon cuniculi | NP_597660
SIMILARITY TO Hsp70-RELATED PROTEIN | Encephalitozoon cuniculi | NP_584537
SIMILARITY TO HYPOTHETICAL INTEGRAL MEMBRANE PROTEIN YQ55_CAEEL | Encephalitozoon cuniculi | NP_597662
SIMILARITY TO HYPOTHETICAL PROTEIN YAAT_BACSU | Encephalitozoon cuniculi | NP_597532
SIMILARITY TO HYPOTHETICAL PROTEIN YB36_METJA | Encephalitozoon cuniculi | NP_597239
SIMILARITY TO PUTATIVE AMINOACID TRANSPORTER YEU9_yeast | Encephalitozoon cuniculi | NP_584803
SIMILARITY TO SKT5 PROTEIN | Encephalitozoon cuniculi | NP_586349
SIMILARITY TO TRANSCRIPTION INITIATION FACTOR TFIIA | Encephalitozoon cuniculi | NP_597616
STE12 TRANSCRIPTION FACTOR | Encephalitozoon cuniculi | NP_586509
STRUCTURE-SPECIFIC RECOGNITION PROTEIN | Encephalitozoon cuniculi | NP_586030
T COMPLEX PROTEIN 1 SUBUNIT BETA | Encephalitozoon cuniculi | XP_955601
THREONYL tRNA SYNTHETASE #1 | Encephalitozoon cuniculi | NP_586084
THREONYL tRNA SYNTHETASE #2 | Encephalitozoon cuniculi | NP_586084
TRANSLATION ELONGATION FACTOR 1 ALPHA | Glugea plecoglossi | BAA12288
TRIOSE PHOSPHATE ISOMERASE | Encephalitozoon cuniculi | NP_586329
TUBULIN BETA CHAIN | Encephalitozoon cuniculi | NP_597591
U5 ASSOCIATED snRNP | Encephalitozoon cuniculi | NP_586393
UNNAMED PROTEIN PRODUCT (Hsp70) | Candida glabrata | XP_445544
VACUOLAR ATP SYNTHASE CATALYTIC SUBUNIT A | Encephalitozoon cuniculi | NP_586434
VACUOLAR ATP SYNTHASE SUBUNIT B | Encephalitozoon cuniculi | NP_586219
ZINC FINGER PROTEIN | Encephalitozoon cuniculi | NP_584833
Table 6.1: Unique Ed. aedis Transcripts that are Homologous to Genes Present in Other Microsporidia. Species names and Genbank accession numbers of top BLASTX hits are indicated. Bold text in the “Gene Name” column indicates instances where two different transcripts both had the same top BLASTX hit. Underlining indicates a copy of Hsp70 that is most similar to a protein that remains unnamed in Genbank.

Gene name | Species | Genbank accession number
60S RIBOSOMAL PROTEIN L2 | Babesia bovis | XP_001612300
ADENOSINE KINASE | Homo sapiens | AAA97893
HYPOTHETICAL PROTEIN | Candida albicans | XP_717148
HYPOTHETICAL PROTEIN PY5484 | Plasmodium yoelii yoelii | XP_725949
L-ASPARAGINASE | Dirofilaria immitis | Q9U518
LYSINE tRNA LIGASE | Saccharomyces cerevisiae | CAA39699
PUTATIVE VESICULAR TRANSPORT FACTOR USO1P | Candida albicans | XP_710120
SEC63 DOMAIN CONTAINING PROTEIN | Trichomonas vaginalis | XP_001580151
PROTEIN PHOSPHATASE 2B | Cryptosporidium hominis | XP_666159
WD-40 REPEAT FAMILY PROTEIN | Arabidopsis thaliana | NP_201533
Table 6.2: Ed. aedis Genes that are Absent from Other Microsporidia.
The species names and Genbank accession numbers of the top BLASTX hit for each gene are listed. 116 Figure 6.1: The Phylogenetic Relationships Between Several Microsporidia. Species that house transposable elements belonging to the Ty3/gypsy family are highlighted in blue, while species containing LTR transposons are highlighted in yellow. Genome sizes are indicated to the right of each species. (Adapted from Slamovits et al., 2004.) 117 Figure 6.2: Total Ed. aedis Transcripts Represented by COG Category With and Without Hsp70. Total A. locustae transcripts are provided for comparison. (A. locustae data adapted from Williams et al., 2005.) 118 Figure 6.3: Unique Ed. aedis Transcripts Represented by COG Category. Unique A. locustae transcripts are provided for comparison. (A. locustae data adapted from Williams et al., 2005.) Figure 6.4: 5’ RACE Conducted on a Moderately Represented Transcript in Ed. aedis Reveals Multiple Transcription Start Sites. ESTs are depicted in green and RACE products in purple. The predicted translational start codon is indicated by the orange arrow. As indicated, the E. cuniculi homologue of this gene contains 128 amino acids at N-terminus that appear to be absent in Ed. aedis. 119 6.5 References Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402. Becnel, J.J., Garcia, J.J. and Johnson, M.A. 1995. Edhazardia aedis (Microspora: Culicosporidae) effects on the reproductive capacity of Aedes aegypti (Diptera: Culicidae). J. Med. Entomol. 32: 549-553. Becnel, J.J., Sprague, V., Fukuda, T. and Hazard, E.I. 1989. Development of Edhazardia aedis (Kudo, 1930 N. G., N. Comb. (Microsporidia: Amblyosporidae) in the mosquito Aedes aegypti (L.) (DIptera: Culicidae). J. Protozool. 36: 119-130. Becnel, J.J. 1990. 
Edhazardia aedis (Microsporidia: Amblyosporidae) as a biological control agent of Aedes aegypti (Diptera: Culicidae). : 56-60. Brosson, D., Kuhn, L., Delbac, F., Garin, J., Vivares, C.P. and Texier, C. 2006. Proteomic analysis of the eukaryotic parasite Encephalitozoon cuniculi (microsporidia): a reference map for proteins expressed in late sporogonial stages. Proteomics 6: 3625-3635. Corradi, N., Gangaeva, A. and Keeling, P.J. 2008. Comparative profiling of overlapping transcription in the compacted genomes of microsporidia Antonospora locustae and Encephalitozoon cuniculi. Genomics 91: 388-393. Graber, J.H., Cantor, C.R., Mohr, S.C. and Smith, T.F. 1999. Genomic detection of new yeast pre-mRNA 3'-end-processing signals. Nucleic Acids Res. 3: 888-894. Hinkle, G., Morrison, H.G. and Sogin, M.L. 1997. Genes coding for reverse transcriptase, DNA-directed RNA polymerase, and chitin synthetase from the microsporidian Spraguea lophii. Biol. Bull. 193: 250-251. Johnson, M.A., Becnel, J.J. and Undeen, A.H. 1997. A new sporulation sequence in Edhazardia aedis (Microsporidia: Culicosporidae), a parasite of the mosquito Aedes aegypti (Diptera: Culicidae). J. Invertebr. Pathol. 70: 69-75. Katinka, M.D., Duprat, S., Cornillot, E., Metenier, G., Thomarat, F., Prensier, G., Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P. et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414: 450-453. Kinsey, P.T. and Sandmeyer, S.B. 1995. Ty3 transposes in mating populations of yeast: a novel transposition assay for Ty3. Genetics 139: 81-94. Mayer, M.P. and Bukau, B. 2005. Hsp70 chaperones: cellular functions and molecular mechanism. Cell. Mol. Life Sci. 62: 670-684. Mazumder, B., Seshadri, V. and Fox, P.L. 2003. Translational control by the 3'-UTR: the ends specify the means. Trends Biochem. Sci. 2: 91-98. Mittleider, D., Green, L.C., Mann, V.H., Michael, S.F., Didier, E.S. and Brindley, P.J. 2002.
Sequence survey of the genome of the opportunistic microsporidian pathogen, Vittaforma corneae. J. Eukaryot. Microbiol. 49: 393-401. O'Brien, E., Koski, L., Zhang, Y., Yang, L., Wang, E., Gray, M.W., Burger, G. and Lang, B.F. 2007. TBestDB: a taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res. 35: D445-451. Pandrea, I., Mittleider, D., Brindley, P.J., Didier, E.S. and Robertson, D.L. 2005. Phylogenetic relationships of methionine aminopeptidase 2 among Encephalitozoon species and genotypes of microsporidia. Mol. Biochem. Parasit. 140: 141-152. Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D. and Knowles, B.B. 2004. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev. Cell 7: 597-606. Slamovits, C.H., Fast, N.M., Law, J.S. and Keeling, P.J. 2004. Genome compaction and stability in microsporidian intracellular parasites. Curr. Biol. 14: 891-896. Streett, D.A. 1994. Analysis of Nosema locustae (Microsporidia: Nosematidae) chromosomal DNA with pulsed-field gel electrophoresis. J. Invertebr. Pathol. 63: 301-303. Thomarat, F., Vivares, C.P. and Gouy, M. 2004. Phylogenetic analysis of the complete genome sequence of Encephalitozoon cuniculi supports the fungal origin of microsporidia and reveals a high frequency of fast-evolving genes. J. Mol. Evol. 59: 780-791. Van de Peer, Y., Ben Ali, A. and Meyer, A. 2000. Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene 246: 1-8. Wilcox, J., Dunbar, H.E., Wolfinger, R.D. and Moran, N.A. 2003. Consequences of reductive evolution for gene expression in an obligate endosymbiont. Mol. Microbiol. 48: 1491-1500. Williams, B.A., Lee, R.C., Becnel, J.J., Weiss, L.M., Fast, N.M. and Keeling, P.J. 2008. Genome sequence surveys of Brachiola algerae and Edhazardia aedis reveal microsporidia with low gene densities. BMC Genomics 9: 200.
Williams, B.A., Slamovits, C.H., Patron, N.J., Fast, N.M. and Keeling, P.J. 2005. A high frequency of overlapping gene expression in compacted eukaryotic genomes. Proc. Natl. Acad. Sci. USA 102: 10936-10941. Wittner, M. and Weiss, L.M. 1999. The microsporidia and microsporidiosis. ASM Press, Washington, D.C. Xu, J., Pan, G., Fang, L., Li, J., Tian, X., Li, T., Zhou, Z. and Xiang, Z. 2006. The varying microsporidian genome: existence of long-terminal repeat retrotransposon in domesticated silkworm parasite Nosema bombycis. Int. J. Parasitol. 36: 1049-1056. Zhang, Z. and Dietrich, F.S. 2005. Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE. Nucleic Acids Res. 33: 2838-2851. Chapter 7 - Conclusion 7.1 Introduction My thesis work has examined genomes and gene expression in microsporidia in an attempt to better understand their origins and the ways in which these intracellular parasites eke out an existence inside host cells. Our knowledge of these organisms has grown significantly over the last two decades, yet we still know relatively little about the biochemical processes that are unique to microsporidia, the factors that modulate their life cycle transitions and the myriad differences between the spore and the meront. 7.2 Assessing the Microsporidia-Fungi Relationship The first research question that I addressed concerned the relationship of microsporidia to fungi. My analysis was the first multi-gene analysis conducted to assess this relationship. It yielded a well-supported tree that placed microsporidia as a sister to a combined ascomycete and basidiomycete clade, a result congruent with the findings of many previous single-gene analyses. This work added to the body of evidence suggesting that microsporidia are true fungi. However, like previous phylogenetic studies, the analysis relied on microsporidian gene sequences, which evolve extremely quickly.
As a result, the microsporidian species had comparatively long branches in the tree. Since the publication of this work in 2006, Lee et al. (2008) have devised an ingenious method to examine the microsporidian-fungal relationship. Although microsporidian genes evolve quickly, their genome structures are remarkably stable (Slamovits et al., 2004). In the genomes of three microsporidia, Lee et al. found a syntenic group of genes that acts as the sex locus in zygomycete fungi. They also discovered several other regions of synteny between the genomes of E. cuniculi and a zygomycete. These new data are compatible with the results of many phylogenetic analyses, particularly tubulin trees, which also place microsporidia as close relatives of the zygomycetes (Keeling, 2003). Although Lee et al.’s results differ from those of my analysis in the specific placement of microsporidia within the fungi, both bodies of work suggest that microsporidia originated from a fungal ancestor rather than being a sister group to fungi, as has been suggested previously (Tanabe et al., 2002). By clarifying microsporidian origins, I aimed to gain a clearer perspective on the evolutionary processes that led to this highly specialized group of parasites. This viewpoint provided a framework from which to approach the other research questions addressed in my thesis. One of the immediately apparent features of some microsporidian genomes is their tiny size; microsporidia have some of the smallest eukaryotic primary nuclear genomes in existence. To reach such a diminutive size, they have undergone genome compaction, which has reduced the number of genes present, the sizes of the remaining genes and the amount of non-coding sequence in the genome. I set out to study the effects of genome reduction on the biological processes that occur in microsporidia. 7.3 Genome Reduction and DNA Repair Systems in Encephalitozoon cuniculi By examining the DNA repair systems encoded by E.
cuniculi’s genome, I found that genome reduction did not affect the five major repair pathways uniformly. The single strand repair pathways each lacked a few genes, but only genes encoding proteins that are not essential for that pathway to function. However, both double strand repair pathways lacked core proteins that are integral to pathway function in yeast. These results suggest that E. cuniculi probably does not use the conventionally described pathways to repair double stranded DNA breaks. This is unexpected, given that these pathways are well conserved among eukaryotes. Examining the DNA repair pathways systematically demonstrated how stripped-down a genome can become while maintaining function. A recent investigation of the complement of protein kinases (the kinome) encoded by E. cuniculi (Miranda-Saavedra et al., 2007) reached similar conclusions: E. cuniculi has the smallest kinome of any eukaryote yet examined, and may serve as a model for determining which genes are indispensable in eukaryotes. 7.4 Genome Reduction and the E. cuniculi Spliceosome The study of a subset of E. cuniculi’s spliceosomal proteins via yeast complementation allowed an indirect examination of the functioning of the microsporidian spliceosome. E. cuniculi’s spliceosome is composed of fewer and smaller proteins than that of yeast. Therefore, the ability of an E. cuniculi protein to complement a yeast mutant could have revealed important functional residues and thereby enhanced our understanding of the splicing mechanism. However, none of the chosen E. cuniculi proteins were able to rescue yeast mutants. These negative results most likely indicate only that E. cuniculi and yeast are divergent organisms whose proteins interact with their partners through different functional residues. The weakness of this set of experiments was that little meaningful information can be derived from negative results.
This shortcoming is characteristic of most complementation experiments, but we felt that it was offset by the possibility of obtaining positive results that would be informative from a functional perspective. Further efforts to characterize the microsporidian spliceosome should not rely exclusively on heterologous systems for experimental evidence. 7.5 Life Stage Differences in Splicing and Transcription The examination of mRNA splicing and transcription in different life stages of microsporidia has shown that large differences exist between the spore and the meront. Meront transcripts are commonly spliced, overlap less frequently with upstream genes, have shorter 5’UTRs and are initiated from fewer start sites, whereas spore transcripts are never spliced, overlap frequently with upstream genes, have longer 5’UTRs and are initiated from more start sites. The situation in meronts appears to be more in keeping with how splicing and transcription occur in other eukaryotes, and we suggest that the transcripts present in the spore are a byproduct of the cell “shutting down” as the spore is formed. We also hypothesize that spore transcripts may not serve an informational role, and may instead be degraded upon germination. This work is the first of its kind to systematically compare different life stages of microsporidia. It will be interesting to test these hypotheses experimentally. Non-germinating E. cuniculi spores could be transiently exposed to radiolabelled uracil and methionine to determine whether low levels of transcription and translation are occurring, as is the case in yeast spores (Brengues et al., 2002). Preliminary data suggest that the transcripts of different intron-containing genes are spliced at differing frequencies in meronts. Quantifying spliced vs.
unspliced transcript levels for each gene via real-time PCR may give us a better idea of the sizes and compositions of introns that promote more efficient splicing. Furthermore, the meronts we examined were harvested 48 hours after infection of host tissue. We also have meront samples from 24- and 72-hour infection time points. Since splicing appears to be temporally regulated, it will be intriguing to observe how the splicing profiles of different genes change as the life cycle progresses. 7.6 Gene Expression in a Non-Compact Microsporidian Genome The final research chapter of this thesis examines gene expression in a microsporidian whose genome is relatively large by microsporidian standards. Because its genome is predicted to be several times larger than that of E. cuniculi, Ed. aedis provided an opportunity to examine a species in which genome reduction was not apparent. We found a surprising congruence in the COG categories of unique transcripts between Ed. aedis and A. locustae, despite the fact that the two species are not closely related and have very different life cycles. We also documented the first case of transcription of a transposable element in a microsporidian. Unlike the genome of E. cuniculi, that of Ed. aedis contains large intergenic spaces. We demonstrated that multi-gene transcription does not occur in this organism. Because this phenomenon has been correlated with small intergenic spaces, it stands to reason that it is absent in Ed. aedis. The proliferation of transposable elements may have contributed to the expansion of these spaces. This body of data has shown that we still have much to learn about microsporidia as a group, especially since the bulk of the research has been conducted on species that have small genomes.
Future projects exploring the genomes and transcriptomes of diverse microsporidia will help to illuminate which traits and genes are broadly conserved among microsporidia as a group and which are unique to individual species or genera. 7.7 General Conclusions Overall, my research has shown that microsporidia share some underlying similarities. Despite their disparate life cycles and genomes that vary greatly in size, distantly related microsporidia have retained similar patterns of transcription and mRNA splicing. My work has also explored issues surrounding genome reduction and provided a comparison of reduced and non-reduced species in the same phylum. A new, more comprehensive picture of microsporidian diversity is emerging. These data will give us a greater understanding of the avenues of parasite and eukaryotic evolution. I am proud to have contributed to this body of knowledge. 7.8 References Brengues, M., Pintard, L. and Lapeyre, B. 2002. mRNA decay is rapidly induced after spore germination of Saccharomyces cerevisiae. J. Biol. Chem. 277: 40505-40512. Keeling, P.J. 2003. Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet. Biol. 38: 298-309. Lee, S.C., Corradi, N., Byrnes, E.J., 3rd, Torres-Martinez, S., Dietrich, F.S., Keeling, P.J. and Heitman, J. 2008. Microsporidia evolved from ancestral sexual fungi. Curr. Biol. 18: 1675-1679. Miranda-Saavedra, D., Stark, M.J., Packer, J.C., Vivares, C.P., Doerig, C. and Barton, G.J. 2007. The complement of protein kinases of the microsporidium Encephalitozoon cuniculi in relation to those of Saccharomyces cerevisiae and Schizosaccharomyces pombe. BMC Genomics 8: 309. Slamovits, C.H., Fast, N.M., Law, J.S. and Keeling, P.J. 2004. Genome compaction and stability in microsporidian intracellular parasites. Curr. Biol. 14: 891-896. Tanabe, Y., Watanabe, M.M. and Sugiyama, J. 2002.
Are Microsporidia really related to fungi?: a reappraisal based on additional gene sequences from basal fungi. Mycol. Res. 106: 1380-1391."@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "2009-05"@en ; edm:isShownAt "10.14288/1.0067107"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Genetics"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "Attribution-NonCommercial-NoDerivatives 4.0 International"@en ; ns0:rightsURI "http://creativecommons.org/licenses/by-nc-nd/4.0/"@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Evolution of the microsporidian genome and gene expression"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/7210"@en .