UBC Faculty Research and Publications

Transcriptome analysis of the parasite Encephalitozoon cuniculi: an in-depth examination of pre-mRNA… Grisdale, Cameron J; Bowers, Lisa C; Didier, Elizabeth S; Fast, Naomi M Mar 28, 2013


RESEARCH ARTICLE Open Access

Transcriptome analysis of the parasite Encephalitozoon cuniculi: an in-depth examination of pre-mRNA splicing in a reduced eukaryote

Cameron J Grisdale1, Lisa C Bowers2,3, Elizabeth S Didier2,3 and Naomi M Fast1*

Abstract

Background: The microsporidian Encephalitozoon cuniculi possesses one of the most reduced and compacted eukaryotic genomes. Reduction in this intracellular parasite has affected major cellular machinery, including the loss of over fifty core spliceosomal components compared to S. cerevisiae. To identify expression changes throughout the parasite’s life cycle and also to assess splicing in the context of this reduced system, we examined the transcriptome of E. cuniculi using Illumina RNA-seq.

Results: We observed that nearly all genes are expressed at three post-infection time-points examined. A large fraction of genes are differentially expressed between the first and second (37.7%) and first and third (43.8%) time-points, while only four genes are differentially expressed between the latter two. Levels of intron splicing are very low, with 81% of junctions spliced at levels below 50%. This is dramatically lower than splicing levels found in two other fungal species examined. We also describe the first case of alternative splicing in a microsporidian, an unexpected complexity given the reduction in spliceosomal components.

Conclusions: Low levels of splicing observed are likely the result of an inefficient spliceosome; however, at least in one case, splicing appears to be playing a functional role. Although several RNA decay genes are encoded in E. cuniculi, the lack of a few key players could be reducing decay levels and therefore increasing the proportion of unspliced transcripts.
Significant proportions of genes are differentially expressed in the first forty-eight hours but not after, indicative of genetic changes that precede the intracellular to infective stage transition.

Background

Microsporidia possess among the smallest, most compact eukaryotic genomes known [1]. All microsporidia are intracellular parasites and alternate between a thick-walled, extracellular stage (spore) and intracellular stages (meronts, sporonts, and sporoblasts). When triggered, a specialized structure called the polar tube shoots out of the spore and, upon contacting a host cell, creates a passageway into the host [2]. If a host cell is infected, meronts will proliferate, then undergo sporogony before being released from the host cell. The mammalian parasite Encephalitozoon cuniculi typically infects humans with compromised immunity due to HIV-infection or immune-suppressive therapy [3,4]. E. cuniculi was the first microsporidian to have its genome completely sequenced, and at 2.9 Mb this highly reduced genome possesses many unusual features. It has a reduced coding capacity, encoding less than two thousand protein-coding genes, most of which are shorter than their homologs in yeast [3]. It lacks genes for several biosynthetic pathways and components of the energy-producing tricarboxylic acid cycle. This stripped down genome provides an opportunity to study cellular processes that generally require large, complex sets of components, yet in microsporidia such complexity is reduced, while retaining function. The spliceosome is a large macromolecular machine that is responsible for removing nuclear spliceosomal introns from pre-mRNA via two transesterification reactions [5,6]. In humans, this complex rivals the size of the bacterial ribosome and contains hundreds of protein components and five small nuclear RNAs (snRNAs). Conversely, E.
cuniculi is only predicted to possess 30 spliceosomal proteins [3]. Such reduced eukaryotes could hold important information about intron and spliceosome evolution as they harbor so few spliceosomal introns (less than 40), and some microsporidia are completely devoid of introns and splicing machinery [7,8].

* Correspondence: nfast@mail.ubc.ca
1 Biodiversity Research Centre and Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
Full list of author information is available at the end of the article

© 2013 Grisdale et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Grisdale et al. BMC Genomics 2013, 14:207
http://www.biomedcentral.com/1471-2164/14/207

In a previous study we assessed differences in E. cuniculi transcription and spliced transcript levels between intracellular and extracellular life stages [9]. We found that transcripts have much longer untranslated regions (UTRs) and more transcription start sites in the spore stage compared to the intracellular stage. Splicing appears to take place exclusively in the intracellular stage leaving long, unspliced transcripts in the spore, that may play a structural rather than an informational role [9]. Although pre-mRNA splicing occurs, we found no evidence for alternative splicing or mis-splicing [10]. We also found that E. cuniculi intron-containing genes have exclusively short 5'UTRs and that, on average, intracellular stage 5'UTR lengths are among the shortest known [11]. Another unusual feature of microsporidian transcription is the presence of overlapping transcripts in the extracellular stage. E. cuniculi and the distantly related microsporidian Antonospora locustae were both found to have overlapping transcripts [12,13].
However, transcripts in the former often initiate in upstream genes, while those in the latter often terminate in downstream genes [12,13]. These peculiarities of microsporidian molecular biology and the differences in transcripts between extracellular and intracellular life stages led us to conduct a comprehensive investigation of the parasite’s transcriptome during intracellular stages.

Using Illumina HiSeq technology we performed deep RNA sequencing of the E. cuniculi transcriptome 24 hr, 48 hr, and 72 hr post-infection. This allowed us to assess spliced transcript and gene expression levels at multiple time points, find novel transcribed regions (NTRs), and improve gene annotations. RNA-seq is an ideal method for examining transcriptomes, as it is relatively unbiased, has greater sensitivity than hybridization methods such as microarrays, and produces high coverage of transcripts [14-17]. We analyzed splicing at all 37 splice junctions to assess the role of these few remaining E. cuniculi introns, determined gene expression levels of all annotated genes, found several novel ORFs and, in general, increased our understanding of the dynamic transcriptomes of these unusual parasites.

Results and Discussion

Genomic analyses of microsporidian species have revealed a number of unusual features that are distinct from other eukaryotes. To date, the microsporidia examined have either done away with introns and splicing machinery entirely, or retain very few of each. How these introns are spliced with greatly reduced machinery, and why so few are retained are questions that pertain both specifically to the evolution of these parasites and, more generally, to intron splicing in eukaryotes. In this study, we present the first transcriptomic analysis of E. cuniculi. Intracellular stage genotype 2 E.
cuniculi was examined at three time-points: 24 hr, 48 hr, and 72 hr after infection of RK13 cells (rabbit kidney fibroblast cell line). A total of 525.9 million reads were produced (Table 1), 40.6 million (7.7%) of which aligned to the E. cuniculi genotype 2 reference genome (GenBank accession AEWQ01000000) [18], and 273.5 million (52.0%) of which aligned to the rabbit host (Oryctolagus cuniculus reference genome: GenBank accession AAGW02000000). We saw no evidence of cross mapping between host and parasite genomes, as expected, owing to the availability of reference genomes for both organisms and the high level of divergence between them (data not shown). The numbers of reads mapping to E. cuniculi at 24 hr, 48 hr, and 72 hr post-infection were 13.9 million, 17.5 million, and 9.3 million, respectively. This was sufficient coverage to assess splicing and examine gene expression levels at each time-point in order to address questions regarding intron function and evolution, as well as the expression of pathogenesis-related and microsporidia-specific genes.

Identification of novel transcribed regions

We annotated eleven previously unidentified, transcribed ORFs, three of which have the potential to play a role in pathogenesis. These eleven ORFs are distributed over eight chromosomes. E. cuniculi chromosomes were annotated using GLIMMER to find putative ORFs with a minimum length cut-off of either 300 or 150 nt [3]. ORFs were used for BLAST searches followed by protein domain identification. This type of annotation leaves open the possibility of ORFs not being annotated due to their small size or lack of known, conserved domains.
In order to find novel ORFs that may have been overlooked by the automated annotation software, we examined each chromosome visually using Integrative Genomics Viewer [19] (see Methods for details).

Table 1 Number of reads mapped to parasite and host genomes

Time-point   Encephalitozoon cuniculi   Oryctolagus cuniculus   Total reads mapped
T1           13895384                   92511829                106407213
T2           17454072                   84792245                102246317
T3           9286586                    96176009                105462595
Total        40636042                   273480083               314116125

The number of reads mapping to Encephalitozoon cuniculi and Oryctolagus cuniculus at three post-infection time-points are shown.

The novel ORF on chromosome 3 (ECU03_0255) is a potential candidate for a pathogenesis-related gene involved in cell entry. Although no clear function for this ORF could be predicted from similarity searches, weak (30%) similarity to a viral entry protein could suggest that the product of this ORF functions in host invasion. We discovered two additional ORFs that are so far unique to microsporidia, and therefore may play a role relating to their parasitic lifestyle. Novel ORF ECU03_0715 has a clear homolog in E. hellem, sharing 72% identity over all 116 amino acids. Although not present in all known microsporidian genomes, this ORF shares similarity with genes of unknown function in Antonospora locustae and Nematocida parisii, two distantly related microsporidia. A second ORF that appears to be microsporidia-specific is ECU06_0735, which shares 41% identity over 133 of its 146 amino acids with homeobox domain-containing transcription factors in other Encephalitozoon species. These ORFs will require functional analysis to ascertain the cellular roles of their microsporidia-specific protein products.

An additional ORF (ECU08_1555) we discovered has no predicted connection to pathogenesis, but may play an important cellular role as it shows similarity to the nucleolar protein NOP10.
NOP10 is associated with snoRNAs in ribonucleoprotein complexes that are involved in 18S rRNA production and rRNA pseudouridylation, and that are components of the telomerase complex [20]. Additional novel ORFs had very weak similarity to known proteins, and were identified based on transcription signal alone (data not shown). Also, several predicted intergenic regions were transcribed with distinct boundaries but no ORF could be assigned on either strand. These may represent important non-coding RNAs or possibly even unknown selfish genetic elements.

All coding genes are transcribed in intracellular E. cuniculi

The expression data revealed that nearly all 1981 genes had detectable levels of expression in all three time-points (see Additional file 1): all 1981 genes were expressed 24 hr post-infection, 1980 genes were expressed 48 hr post-infection, and 1979 genes were expressed 72 hr post-infection. The twenty genes with highest average expression, in descending order, include spore wall protein 1, RNA-binding domain-containing protein (discussed below), translation elongation factor 1 alpha, actin, histones H2B/H3/H2A/H4, heat shock protein 70, and ribosomal protein L9. The remaining ten genes encode hypothetical proteins with unknown functions. As expected, many highly-expressed genes have housekeeping functions; however, the most highly expressed gene, excluding hypotheticals, is a spore wall protein-encoding gene. This highlights the priority of preparing to form the infective stage, even as early as the first 72 hrs following infection. In summary, essentially all E.
cuniculi protein-coding genes are expressed during the first three days post-infection in tissue culture.

High frequency of differentially expressed genes in the first 48 hrs

Although nearly all genes are expressed at all time-points, we found an abundance of genes with considerable differences in expression levels between time-points. There were 746 (37.7%) genes differentially expressed between 24 hr and 48 hr post-infection and 867 (43.8%) genes differentially expressed between 24 hr and 72 hr (Figure 1A,B). However, between 48 hr and 72 hr there were only 4 genes differentially expressed (Figure 1C), all with fairly weak fold changes of less than 0.5. This pattern, where many genes are differentially expressed within the first 48 hrs but not after, has implications for the life-cycle of this parasite, such as the possibility that spore formation begins by 48 hr post-infection.

Evidence from expression data suggests that E. cuniculi meronts undergo a shift towards producing spore-related genes by 48 hr post-infection. The ten genes with largest positive and negative fold change between 24 hr and the two subsequent time-points include mostly housekeeping genes and genes encoding hypothetical proteins. An exception to this is polar tube protein 2, whose gene had some of the strongest positive fold changes, both from 24 hr to 48 hr (2.25) and from 24 hr to 72 hr (2.45). Also, the gene encoding polar tube protein 1 showed a similar pattern, with a fold change of 2.05, while the spore wall protein-encoding gene had a fold change of 1.55. This suggests that expression of spore-related genes increases by 48 hr and spore formation could be taking place; however, we did not see evidence of spore-specific transcripts with extended 5'UTRs [11], even by 72 hr post-infection.
This is in line with previous experiments, which have found a spore related gene to have increased expression between 24 hr and 72 hr post-infection [21], and evidence of spore-containing vacuoles beginning at 120 hr post-infection [22].

Housekeeping genes are down-regulated after 24 hr, providing evidence that proliferation is taking place very soon after spore germination and likely for a very brief time. Among the ten most strongly down-regulated genes are several ribosomal protein genes, ubiquitin, an RNA polymerase, two novel ORFs, and several hypothetical protein encoding genes. Down-regulation of housekeeping genes after the 24 hr time-point likely occurs because their expression is high upon germination. We also found that, while many ribosomal protein genes have relatively weak fold changes, they are all negative, further evidence that housekeeping genes as a whole are being down-regulated after 24 hrs. In summary, it seems that spore-specific proteins are produced early in the intracellular life-stage, although spores are likely not formed until after 72 hr post-infection, and housekeeping genes are being down-regulated after 24 hr, possibly as a result of slowing intracellular stage replication rates.

Analysis of pre-mRNA splicing

E. cuniculi has a reduced spliceosome

Gene annotation in E. cuniculi identified just 30 ORFs with similarity to spliceosomal components [3], predicting one of the smallest functional spliceosomes known. Several components that are required for viability in yeast are absent in E. cuniculi, raising questions about the necessity of these components, the redundancy built into this pathway, and the flexibility of the spliceosome. Also, one of the five RNA components, the U1 snRNA, has not been identified [23].
This suggests that splicing may be occurring without a complete U1 complex, which is involved in the key first step of splicing when the intron is recognized and bound at the 5' splice site [5]. The reduction in E. cuniculi spliceosome machinery is severe and is likely to have an effect on the splicing reaction, potentially reducing splicing efficiency.

Discovery of introns and splice isoforms

The original genome annotation of E. cuniculi predicted 16 introns, almost all of which were in ribosomal protein genes [3]. The number of introns was increased to 34 after a thorough search was performed with a combination of visual and string-search algorithm methods [10]. Many of these new introns were found in non-ribosomal protein-coding genes, which has implications for our understanding of intron retention and evolution in Microsporidia (discussed in [10]). Ranging in size from 22–76 nt, E. cuniculi introns are among the smallest spliceosomal introns found in nature, surpassed only by the miniature introns of Paramecium tetraurelia [24] and the chlorarachniophyte nucleomorph genomes [25]. All E. cuniculi introns have standard GT-AG boundaries, and relatively strict 5' splice site and branch point motifs (see Additional file 2). This is in line with phylogenetically broad genomic analyses, which have shown that strict splicing motifs are common in intron-poor genomes [26,27]. Utilizing the RNA-seq dataset we confirmed that all previously annotated introns are indeed spliced and are bona fide introns. Also, we found one new intron that creates a novel ORF (ECU09_1255), and confirmed splicing of two others that were recently discovered in a comparison of four Encephalitozoon species [28]. These three recently detected introns were each confirmed with more than a hundred spliced transcripts, as well as having motifs that are characteristic of E. cuniculi introns (Additional file 2).

We have found the first evidence of alternative splicing in a microsporidian parasite.
A small proportion of transcripts for three intron-containing genes utilize alternative downstream acceptor sites. Although it is unexpected to find alternative splicing in such a reduced, streamlined system, the alternative transcripts are so rare that they may represent erroneous splice events. In all cases observed, the alternative isoform represents less than 5% of the reads at the corresponding junction. Despite their low abundance, it is possible that the alternative forms could be utilized as another post-transcriptional regulatory mechanism by inducing rapid decay, as has been hypothesized in P. falciparum [29]. If these transcripts were to induce decay this would help explain the rarity at which we observe them. Unfortunately, we lack the tools needed to manipulate decay rates in microsporidia, and therefore cannot test this hypothesis directly.

Figure 1 Differential expression across three post-infection time-points. Plot of log2 fold change versus mean expression level for all E. cuniculi genes. Red dots indicate those genes that are differentially expressed and black dots indicate those that are not. (A) Differential expression between 24 hr and 48 hr, (B) 24 hr and 72 hr, and (C) 48 hr and 72 hr post-infection.

We also see evidence of alternative intron retention, most notably in ECU11_0850 (Figure 2). In this case the upstream intron is spliced at higher levels than the downstream intron, which would result in some transcripts being truncated at the 3' end, but potentially still functional. Since no genes that function in alternative splicing regulation, such as SR protein family genes, have been found in E.
cuniculi, we suggest that variation in intron motif features is responsible for differing levels of intron retention within a gene. It has been shown previously that modification to intron motifs can affect splicing efficiency [30]. Therefore, alternative splicing could be playing a minor role in E. cuniculi gene expression.

Comparative analysis of intron-containing transcripts

We quantified transcript abundance of intron-containing genes to assess levels of intron-retention versus intron removal in order to get a better understanding of the roles of pre-mRNA splicing and RNA decay in E. cuniculi. There are several possible scenarios with regard to levels of pre-mRNA splicing and RNA decay. One scenario would be that decay rates are low, and the levels of intron retention or removal are dictated by splicing levels. Another option would be that decay is efficient, creating high levels of spliced transcripts whether or not splicing is efficient, as well. We found that, on average, levels of spliced transcripts in E. cuniculi were very low (Figure 2). A staggering 30 of 37 introns (81.1%) had less than 50% of transcripts with introns removed, and 22 (59.5%) of these had below 20% spliced transcripts. Levels of intron-lacking, or spliced, transcripts ranged from less than 5% to over 85%, with one particularly interesting outlier at the high end of the range. The gene ECU09_1470, an RNA binding domain-containing protein-coding gene, had previously been noted as unusual for containing the longest E. cuniculi intron. In this study we found further reason to examine this gene closely as it had the highest levels of splicing and it was one of the few introns with significant differences in splicing levels between time-points. On the other hand, since all E. cuniculi introns contain stop codons or cause frameshifts if not properly removed, it is surprising that the majority of them appear to be spliced at such low levels.
For example, over half of the transcripts of thirty of these genes appear to be non-functional because they retain introns. This suggests that decay rates are low (discussed below) and pre-mRNA splicing has a strong influence on levels of transcripts with introns retained or removed.

[Figure 2: bar chart; y-axis: Spliced/Total (0-100%); x-axis: E. cuniculi intron-containing genes]

Figure 2 Splicing levels of all E. cuniculi intron-containing genes. Levels of splicing were determined by measuring the number of spliced and unspliced transcripts and dividing spliced by total transcripts to produce a percentage of splicing. From left to right, splicing levels at 24 hr, 48 hr, and 72 hr are indicated by grey bars. E. cuniculi gene names are on the x-axis. Significant differences in splicing levels between time-points are shown by darkened boxes along the x-axis. From left to right, darkened boxes indicate significant differences between 24 hr and 48 hr, 24 hr and 72 hr, and 48 hr and 72 hr.

To assess whether these transcripts were unique to microsporidia or common to parasites and organisms with compact genomes, we performed a similar examination of splicing levels in a free-living and a parasitic fungus. The transcriptomes of Saccharomyces cerevisiae and Candida albicans encode 306 and 540 introns, respectively [31,32]. The introns of both are similar in size, generally in the 50-1000 nt range [33,34]. Although these fungi possess similarly sized spliceosomes that lack over twenty components found in mammals, they encode more than twice as many components as E. cuniculi, and therefore, E. cuniculi still represents a model of extreme reduction.

Levels of splicing in both S. cerevisiae and C.
albicans were distinctly different from those observed in E. cuniculi, with averages of 80% in S. cerevisiae and 95% in C. albicans (Additional file 3). We found that 32 of 46 (69.6%) S. cerevisiae introns were spliced at levels above 80%, while 39 of 46 (84.8%) were spliced at levels above 50%. Splicing levels in C. albicans were comparable with 39 of 48 (81.3%) spliced at levels above 80%, and 43 of 48 (89.6%) spliced at levels above 50%. Also, a similar analysis of splicing levels has been performed in the relatively reduced parasitic protist P. falciparum, the causative agent of malaria [29]. The authors found that in this unicellular parasite splicing levels were quite high on average, with a median of five times more spliced reads than intron-retained reads observed [29]. They also note that only 5.6% of introns were spliced at levels below 50% [29]. Therefore, spliced transcript levels in E. cuniculi are drastically lower than those in both a fungal and a very distantly related protistan parasite, as well as a free-living fungus. This result, along with the fact that the E. cuniculi spliceosome is much more reduced than P. falciparum and both fungal species, indicates that it may not be the life-style of the organism that is having such an effect on splicing, but the severe reduction of the spliceosomal machinery.
If, over evolutionary time-scales, the loss of spliceosomal components resulted in decreases in splicing levels, the reduction of the spliceosome could not have reached its current point unless the levels of intron-containing gene expression were acceptable for cell viability and decay rates increased to compensate for increased intron-containing transcripts. Therefore, the spliceosomal core is likely much smaller than we expect, since mutations in introns and increases in gene expression levels can compensate for decreased splicing levels.

One possible reason for the abundance of transcripts with introns retained is that they could be playing a functional role in gene regulation. For example, several ribosomal protein-coding genes in yeast are known to perform autoregulatory splicing, in which the product of the splicing reaction inhibits further splicing by specifically binding to newly made transcripts [35-37]. Other yeast genes have their splicing regulated by environmental stress, such as amino acid starvation [38], or in conjunction with the meiotic cycle [39]. We found 11 of 37 junctions with significant differences in splicing levels between time-points, most with relatively modest changes (Figure 2). Interestingly, one of the few genes with two introns had significant changes in splicing in both introns, including the largest change (30%), and high variability in levels between introns (Figure 2). This provides evidence that splicing may be playing a regulatory role. However, even with nearly a third of intron-containing genes showing differences in splicing levels over the course of infection, nearly all changes are too small to warrant strong evidence of regulatory splicing. Also, we failed to find any strong compensatory role of splicing to moderate expression levels of ribosomal genes, in order to balance their relatively high levels of variability.
The low levels of splicing observed do not seem to be the result of regulation at the level of splicing in most cases; however, the splicing patterns of a few genes are indicative of regulation and will require further examination.

Another plausible explanation for the elevated levels of intron-retained transcripts is that RNA decay may not be functioning efficiently in E. cuniculi. Since all E. cuniculi introns either contain stop codons or induce frameshifts that result in downstream premature stop codons, intron retention should induce transcript degradation by an RNA decay pathway. Metabolic pathways are generally reduced in E. cuniculi [3], so complete RNA decay pathways would not be expected. However, E. cuniculi appears to have retained a small number of decay proteins, encoding ORFs with similarity to key players including Upf1, Dcp2, Dis3, Dhh1, Ccr4, and Nmd5 (Additional file 4). It is likely that these few decay proteins have evolved to function in the absence of their canonical reaction partners, similar to the spliceosome and DNA repair system [40], as the cell would presumably not be able to function properly without RNA degradation. However, as we predict with spliceosomal functioning, there may be a significant reduction in decay efficiency that could play a part in increasing the proportion of unspliced transcripts present. Yet, to invoke reduced RNA decay as the sole source of these results, decay would have to be very inefficient indeed - a situation that seems unlikely given that no other obvious abnormalities are observed in the transcriptome. Although a formal possibility, it seems unlikely that decay alone is the cause of the high levels of unspliced transcripts. Therefore, the loss of spliceosome components is likely the cause of reduced splicing activity, and in combination with low decay rates, results in a large proportion of unspliced transcripts.

Conclusions

Assessing the transcriptome of E.
cuniculi allowed us to improve the genome annotation, uncover novel transcribed regions that could play a role in pathogenesis, discover new introns, and assess levels of intron splicing. We found spliced transcript levels to be surprisingly low on average, most likely as a result of spliceosomal reduction, but with the potential for decreased decay rates to be playing a role. Gene expression levels vary over the course of infection; tremendous numbers of genes are differentially expressed in the first 48 hrs post-infection, suggesting a major genetic change that likely precedes a life-stage change. The reduction of spliceosome and RNA decay pathway components appears to be the cause of decreased splicing efficiency and an accumulation of unspliced, non-functional transcripts. This suggests that a balance is maintained between inefficiency resulting from gene loss, and continued pressure of genome reduction.

Methods

RNA preparation

E. cuniculi (Genotype II) was cultured in the rabbit kidney fibroblast cell line (CCL-37, American Type Culture Collection, Manassas, VA USA). Intracellular meront stages of E. cuniculi appear to bind to the parasitophorous vacuole membrane and thus cannot be physically separated from host cells. Total RNA was therefore extracted from two biological replicates of RK13 cells in 25 cm² tissue culture flasks 24 hr, 48 hr, and 72 hr post-infection using the Ambion RNAqueous kit (Ambion, Austin, TX). Extracted RNA was treated with TURBO DNase (Ambion, Austin, TX) to eliminate any contaminating DNA. RNA quality was assessed on an Agilent Bioanalyzer 2100 (Agilent, Santa Clara, CA) and RNA quantity was measured on a Qubit 2.0 fluorometer (Life Technologies Corp., Carlsbad, CA).

RNA-seq library preparation

A total of six Illumina libraries were prepared according to the TruSeq library preparation protocol (Illumina, Hayward, CA).
A total of 4 µg of RNA from each of the six DNase-treated samples was used as starting material. Library quality control and pooling were performed by the Biodiversity Research Centre (BRC) sequencing facility (UBC, Vancouver, BC).

Illumina sequencing and data processing

Paired-end sequencing was performed on an Illumina HiSeq 2000 at the BRC sequencing facility. The six libraries were multiplexed and sequenced in two lanes in order to give two technical replicates for each of the biological replicates, and to help avoid bias associated with a particular flow cell or lane therein [41]. Although paired-end RNA-seq does not account for possible antisense and overlapping transcription, our previous work has indicated that such transcripts are limited to the extracellular spore stage of the parasite [9,11-13].

Raw sequence data was processed and converted to fastq format. Since RNA was obtained from E. cuniculi genotype 2 infected RK13 cells, reads were mapped to the genotype 2 reference genome (GenBank accession AEWQ01000000) [18], as opposed to strain GB-M1 [3]. Reads mapping to E. cuniculi are available at the NCBI Sequence Read Archive under study accession SRP017112. The short-read aligner Bowtie version 0.12.7 [42] was used for read mapping, using default mismatch parameters, and allowing only a single alignment for each read. Biological and technical replicates showed extremely high levels of correlation (see Additional file 1), as has been seen in previous RNA-seq experiments [31,32,43,44]. SAMtools version 0.1.18 [45] was used to process SAM and BAM alignment files. Alignments were visualized using the Integrative Genomics Viewer version 2.0.7 [19]. Expression levels were measured in the standard fragments per kilobase per million mapped reads (FPKM) format [43].
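As a rough illustration of the FPKM measure named above, the following minimal Python sketch applies the standard definition (fragments mapped to a gene, normalized by gene length in kilobases and by millions of mapped fragments). The function name, argument names, and example numbers are hypothetical and are not taken from the study's own scripts.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of gene per million mapped fragments.

    All names here are illustrative assumptions, not the authors' code.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total mapped fragments must be positive")
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments, 1,000 bp long, 10 million mapped fragments.
    print(fpkm(500, 1_000, 10_000_000))
```

Normalizing by both length and library size is what allows expression values to be compared between genes of different lengths and between time-points with different numbers of parasite-derived reads.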
We found 45 genes with fewer than twenty reads of coverage in at least one time-point, suggesting that their expression may be the result of background or antisense transcription, and therefore not of biological significance. However, 42 of these encode tRNAs, 5S rRNA, or U2 snRNA, which were not expected to have read coverage following polyA-selected library preparation.

Assessing splicing efficiency
Attempts were made to use TopHat [46] as a splice junction mapper; however, it was not able to detect introns in E. cuniculi. Therefore, a custom Bowtie reference was made in order to automate splicing level counts. The sequences of all E. cuniculi introns and one hundred flanking nucleotides were obtained from the genome reference [18]. Two reference sequences were created for each intron locus, one containing the intron sequence and one with the intron sequence removed. The flanking sequence was then reduced to 96 nt at each end of the splice junction. Therefore, in order for a read to map to one of the reference sequences, it must overlap the splice junction (without the intron) or the intron itself by a minimum of 5 nt. The data set was mapped to this reference using Bowtie, producing a SAM output file. SAMtools was used to obtain mapping statistics for the reference sequences, producing counts of the number of reads that map to the spliced and unspliced reference sequences. The number of spliced reads was then divided by the total number of reads covering each splice junction, in order to produce a measure of the splicing level. Pairwise comparisons of splicing levels for each intron-containing gene were performed with corrected Pearson's chi-squared tests in R [47]. Pairs of splicing level values were considered to be significantly different if the chi-squared p-value was less than 0.01. As described above, the splicing levels observed are unlikely to result from antisense transcription in this stage of the parasite. Indeed, the presence of significantly different splicing levels across time-points for several introns, different splicing levels for two introns in the same gene, and several genes showing high levels of splicing further supports previous observations that antisense transcription is not widespread in intracellular E. cuniculi.

Custom Bowtie reference sequences were prepared for 80 randomly selected introns from Saccharomyces cerevisiae and Candida albicans, as described above. All RNA-seq reads from the publicly available datasets for S. cerevisiae (SRX000559-SRX000564) [31] and C. albicans (SRP002852) [32] were mapped against the respective custom reference sequences, allowing spliced and unspliced reads to be counted (as above). Forty-six S. cerevisiae and forty-eight C. albicans junctions remained after filtering for those with at least 50X coverage.

Differential gene expression analysis
After mapping with Bowtie, read counts were obtained for all E. cuniculi ORFs using HTSeq (http://www-huber.embl.de/users/anders/HTSeq/). The read counts were then analyzed for differential expression (DE) using DESeq [48], an R/Bioconductor package [47,49]. A p-value cut-off of 0.01 was used for the DE analysis. The custom E. cuniculi gene annotation file used with DESeq was created from the ecotype II genome assembly files [18] using a custom Python [50] script (available upon request).

Search for novel transcribed regions (NTRs)
The extremely gene-dense nature of the E. cuniculi genome made it unreliable to use a custom script to search for NTRs. Therefore, the read alignment files were searched visually for NTRs using IGV.
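The splicing-level procedure described under "Assessing splicing efficiency" can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, intron coordinates are taken to be 0-based and end-exclusive, the read counts are invented, and "corrected" is assumed to mean the Yates continuity correction that R's chisq.test applies by default to 2x2 tables.

```python
import math

FLANK = 96  # nucleotides kept on each side of the junction, as stated in the text


def junction_references(genome, intron_start, intron_end, flank=FLANK):
    """Build the (unspliced, spliced) reference pair for one intron.

    Coordinates are 0-based and end-exclusive (a convention chosen for this sketch).
    """
    left = genome[max(0, intron_start - flank):intron_start]
    intron = genome[intron_start:intron_end]
    right = genome[intron_end:intron_end + flank]
    return left + intron + right, left + right


def splicing_level(spliced, unspliced):
    """Fraction of junction-covering reads that are spliced."""
    total = spliced + unspliced
    if total == 0:
        raise ValueError("no reads cover this junction")
    return spliced / total


def corrected_chi2(spliced_a, unspliced_a, spliced_b, unspliced_b):
    """Yates-corrected chi-squared test on a 2x2 table; returns (statistic, p-value)."""
    observed = [spliced_a, unspliced_a, spliced_b, unspliced_b]
    n = sum(observed)
    rows = [spliced_a + unspliced_a, spliced_b + unspliced_b]
    cols = [spliced_a + spliced_b, unspliced_a + unspliced_b]
    if min(rows + cols) == 0:
        raise ValueError("a row or column total is zero")
    expected = [rows[i] * cols[j] / n for i in range(2) for j in range(2)]
    # Continuity correction: shrink each |O - E| by 0.5, but never below zero.
    stat = sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e
               for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented counts for one intron at two time-points (spliced, unspliced).
stat, p = corrected_chi2(30, 70, 60, 40)
print(splicing_level(30, 70), splicing_level(60, 40), p < 0.01)
```

The p-value is computed with math.erfc so that the sketch needs only the standard library; a statistics package would normally be used instead.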
The search parameters used were: a minimum of 10X coverage, no overlap with previously annotated ORFs, and distinguishable borders with regards to the reads mapping to adjacent ORFs, in order to avoid counting untranslated regions.

Additional files
Additional file 1: Gene expression levels. Expression levels in FPKM are shown for all 1985 E. cuniculi genes at three post-infection time-points.
Additional file 2: Intron motifs. (A) Weblogo of 34 E. cuniculi intron motifs, showing strict 5' splice site, branch point, and 3' AG. (B) Weblogo of three recently discovered introns, with intron motifs that are consistent with currently annotated introns. (C) Combined old and new data for a total of 37 introns, showing very little change from (A).
Additional file 3: Splicing levels in two fungal species. Levels of splicing found for 46 Saccharomyces cerevisiae introns (A) and 48 Candida albicans introns (B). Splicing level was measured by counting the number of spliced and unspliced transcripts and then dividing spliced by total transcripts to give a percentage of splicing.
Additional file 4: RNA decay genes. Table of six key RNA decay pathway genes found in E. cuniculi. Gene names in yeast and are shown, as well as the protein BLAST e-values.

Abbreviations
snRNA: Small nuclear RNA; UTR: Untranslated region; ORF: Open reading frame; NTR: Novel transcribed regions; DE: Differential expression; FPKM: Fragments per kilobase of exon per million fragments mapped.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
CJG extracted RNA, prepared Illumina RNA-seq libraries, performed and interpreted sequence analyses, and drafted the manuscript. LCB and ESD provided parasite material and contributed to interpretation and manuscript preparation, and NMF conceived the study, contributed to the interpretation of the results and drafted the manuscript.
All authors read and approved the final manuscript.

Acknowledgements
This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada to NMF (Discovery Grant-262988), a graduate scholarship to CJG (CGS D3-410243-2011) and funding from the National Institutes of Health, Bethesda, MD USA (OD011104 support to National Primate Research Centers) and the Tulane Research Enhancement Fund to ESD. We thank Jean-François Pombert and Patrick Keeling for providing computational resources, and David Tack for helpful discussions and assistance with programming.

Author details
1Biodiversity Research Centre and Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada. 2Division of Microbiology, Tulane National Primate Research Center, Covington, LA 70433, USA. 3Department of Tropical Medicine, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70112, USA.

Received: 15 November 2012 Accepted: 18 March 2013 Published: 28 March 2013

References
1. Corradi N, Pombert JF, Farinelli L, Didier ES, Keeling PJ: The complete sequence of the smallest known nuclear genome from the microsporidian Encephalitozoon intestinalis. Nat Commun 2010, 21(1):77.
2. Delbac F, Polonais V: The microsporidian polar tube and its role in invasion. Subcell Biochem 2008, 47:208–220.
3. Katinka MD, Duprat S, Cornillot E, Méténier G, Thomarat F, Prensier G, Barbe V, Peyretaillade E, Brottier P, Wincker P, Delbac F, El Alaoui H, Peyret P, Saurin W, Gouy M, Weissenbach J, Vivarès CP: Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 2001, 414(6862):450–453.
4. Wittner M, Weiss LM: The Microsporidia and Microsporidiosis. Washington DC: American Society of Microbiology; 1999.
5. Wahl MC, Will CL, Lührmann R: The spliceosome; design principles of a dynamic RNA machine. Cell 2009, 136(4):701–718.
6. Jurica MS, Moore MJ: Pre-mRNA splicing: awash in a sea of proteins. Mol Cell 2003, 12:5–14.
7. Keeling PJ, Corradi N, Morrison HG, Haag KL, Ebert D, Weiss LM, Akiyoshi DE, Tzipori S: The reduced genome of the parasitic microsporidian Enterocytozoon bieneusi lacks genes for core carbon metabolism. Genome Biol Evol 2010, 12(2):304–309.
8. Cuomo CA, Desjardins CA, Bakowski MA, Goldberg J, Ma AT, Becnel JJ, Didier ES, Fan L, Heiman DI, Levin JZ, Young S, Zeng Q, Troemel ER: Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth. Genome Res 2012. Epub.
9. Gill EE, Lee RC, Corradi N, Grisdale CJ, Limpright VO, Keeling PJ, Fast NM: Splicing and transcription differ between spore and intracellular life stages in the parasitic microsporidia. Mol Biol Evol 2010, 27(7):1579–1584.
10. Lee RC, Gill EE, Roy SW, Fast NM: Constrained intron structures in a microsporidian. Mol Biol Evol 2010, 27(9):1979–1982.
11. Grisdale CJ, Fast NM: Patterns of 5' untranslated region length distribution in Encephalitozoon cuniculi: implications for gene regulation and potential links between transcription and splicing. J Eukaryot Microbiol 2011, 58(1):68–74.
12. Williams BA, Slamovits CH, Patron NJ, Fast NM, Keeling PJ: A high frequency of overlapping gene expression in compacted eukaryotic genomes. Proc Natl Acad Sci USA 2005, 102:10936–10941.
13. Corradi N, Gangaeva A, Keeling PJ: Comparative profiling of overlapping transcription in the compacted genomes of microsporidia Antonospora locustae and Encephalitozoon cuniculi. Genomics 2008, 91:388–393.
14. Wilhelm BT, Marguerat S, Goodhead I, Bähler J: Defining transcribed regions using RNA-seq. Nat Protoc 2010, 5(2):255–266.
15. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009, 10(1):57–63.
16. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008, 18(9):1509–1517.
17. Agarwal A, Koppstein D, Rozowsky J, Sboner A, Habegger L, Hillier LW, Sasidharan R, Reinke V, Waterston RH, Gerstein M: Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays. BMC Genomics 2010, 11:383.
18. Pombert JF, Xu J, Smith DR, Heiman D, Young S, Cuomo CA, Weiss LM, Keeling PJ: Complete genome sequences from three genetically distinct strains reveal a high intra-species genetic diversity in the microsporidian Encephalitozoon cuniculi. Eukaryot Cell 2013. Epub.
19. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP: Integrative genomics viewer. Nat Biotechnol 2011, 29(1):24–26.
20. Henras A, Henry Y, Bousquet-Antonelli C, Noaillac-Depeyre J, Gélugne JP, Caizergues-Ferrer M: Nhp2p and Nop10p are essential for the function of H/ACA snoRNPs. EMBO J 1998, 17(23):7078–7090.
21. Taupin V, Méténier G, Delbac F, Vivarès CP, Prensier G: Expression of two cell wall proteins during the intracellular development of Encephalitozoon cuniculi: an immunocytochemical and in situ hybridization study with ultrathin frozen sections. Parasitology 2006, 132(6):815–825.
22. Fischer J, Tran D, Juneau R, Hale-Donze H: Kinetics of Encephalitozoon spp. infection of human macrophages. J Parasitol 2008, 94(1):169–175.
23. López MD, Rosenblad MA, Samuelsson T: Computational screen for spliceosomal RNA genes aids in defining the phylogenetic distribution of major and minor spliceosomal components. Nucleic Acids Res 2008, 36:3001–3010.
24. Russel CB, Fraga D, Hinrichsen RD: Extremely short 20-33 nucleotide introns are the standard length in Paramecium tetraurelia. Nucleic Acids Res 1994, 22(7):1221–1225.
25. Gilson PR, McFadden GI: The miniaturized nuclear genome of eukaryotic endosymbiont contains genes that overlap, genes that are cotranscribed, and the smallest known spliceosomal introns. Proc Natl Acad Sci USA 1996, 93(15):7737–7742.
26. Irimia M, Roy SW: Evolutionary convergence on highly-conserved 3' intron structures in intron-poor eukaryotes and insights into the ancestral eukaryotic genome. PLoS Genet 2008, 4(8):e1000148.
27. Irimia M, Penny D, Roy SW: Coevolution of genomic intron number and splice sites. Trends Genet 2007, 23(7):321–325.
28. Pombert JF, Selman M, Burki F, Bardell FT, Farinelli L, Solter LF, Whitman DW, Weiss LM, Corradi N, Keeling PJ: Gain and loss of multiple functionally related, horizontally transferred genes in the reduced genomes of two microsporidian parasites. Proc Natl Acad Sci USA 2012, 109(31):12638–12643.
29. Sorber K, Dimon MT, DeRisi JL: RNA-seq analysis of splicing in Plasmodium falciparum uncovers new splice junctions, alternative splicing and splicing of antisense transcripts. Nucleic Acids Res 2011, 39(9):3820–3825.
30. Skelly DA, Ronald J, Connelly CF, Akey JM: Population genomics of intron splicing in 38 Saccharomyces cerevisiae genome sequences. Genome Biol Evol 2009, 1:466–478.
31. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320(5881):1344–1349.
32. Bruno VM, Wang Z, Marjani SL, Euskirchen GM, Martin J, Sherlock G, Snyder M: Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq. Genome Res 2010, 20(10):1451–1458.
33. Spingola M, Grate L, Haussler D, Ares M Jr: Genome-wide bioinformatics and molecular analysis of introns in Saccharomyces cerevisiae. RNA 1999, 5(2):221–234.
34. Mitrovich QM, Tuch BB, Guthrie C, Johnson AD: Computational and experimental approaches double the number of known introns in the pathogenic yeast Candida albicans. Genome Res 2007, 17(4):492–502.
35. Li B, Vilardell J, Warner JR: An RNA structure involved in feedback regulation of splicing and of translation is critical for biological fitness. Proc Natl Acad Sci USA 1996, 93(4):1596–1600.
36. Dabeva MD, Warner JR: Ribosomal protein L32 of Saccharomyces cerevisiae regulates both splicing and translation of its own transcript. J Biol Chem 1993, 268:19669–19674.
37. Fewell SW, Woolford JL Jr: Ribosomal protein S14 of Saccharomyces cerevisiae regulates its expression by binding to RPS14B pre-mRNA and to 18S rRNA. Mol Cell Biol 1999, 19:826–834.
38. Pleiss JA, Whitworth GB, Bergkessel M, Guthrie C: Rapid, transcript-specific changes in splicing in response to environmental stress. Mol Cell 2007, 27(6):928–937.
39. Engebrecht JA, Voelkel-Meiman K, Roeder GS: Meiosis-specific RNA splicing in yeast. Cell 1991, 66(6):1257–1268.
40. Gill EE, Fast NM: Stripped-down DNA repair in a highly reduced parasite. BMC Mol Biol 2007, 20(8):24.
41. Auer PL, Doerge RW: Statistical design and analysis of RNA sequencing data. Genetics 2010, 185(2):405–416.
42. Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009, 10(3):R25.
43. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5:621–628.
44. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bähler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008, 453(7199):1239–1243.
45. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup: The sequence alignment/map (SAM) format and SAMtools. Bioinformatics 2009, 25(16):2078–2079.
46. Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25(9):1105–1111.
47. R Development Core Team: R: A language and environment for statistical computing, reference index version 2.14.1. Vienna, Austria: R Foundation for Statistical Computing; 2011. ISBN 3-900051-07-0. http://www.R-project.org/.
48. Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol 2010, 11:R106.
49. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5:R80.
50. Python 2.6.2. http://www.python.org.

doi:10.1186/1471-2164-14-207
Cite this article as: Grisdale et al.: Transcriptome analysis of the parasite Encephalitozoon cuniculi: an in-depth examination of pre-mRNA splicing in a reduced eukaryote. BMC Genomics 2013 14:207.
