UBC Faculty Research and Publications

Analysis of EST data of the marine protist Oxyrrhis marina, an emerging model for alveolate biology and… Lee, Renny; Lai, Hugo; Malik, Shehre B; Saldarriaga, Juan F; Keeling, Patrick J; Slamovits, Claudio H Feb 11, 2014

Full Text


Lee et al. BMC Genomics 2014, 15:122
http://www.biomedcentral.com/1471-2164/15/122

RESEARCH ARTICLE Open Access

Analysis of EST data of the marine protist Oxyrrhis marina, an emerging model for alveolate biology and evolution

Renny Lee², Hugo Lai², Shehre Banoo Malik¹,², Juan F Saldarriaga³, Patrick J Keeling¹,³ and Claudio H Slamovits¹,²*

Abstract

Background: The alveolates include a large number of important lineages of protists and algae, among which are three major eukaryotic groups: ciliates, apicomplexans and dinoflagellates. Collectively, alveolates are present in virtually every environment and include a vast diversity of cell shapes, molecular and cellular features and feeding modes, including lifestyles such as phototrophy, phagotrophy/predation and intracellular parasitism, in addition to a variety of symbiotic associations. Oxyrrhis marina is a well-known model for heterotrophic protist biology, and is now emerging as a useful organism to explore the many changes that occurred during the origin and diversification of dinoflagellates by virtue of its phylogenetic position at the base of the dinoflagellate tree.

Results: We have generated and analysed expressed sequence tag (EST) sequences from the alveolate Oxyrrhis marina in order to shed light on the evolution of a number of dinoflagellate characteristics, especially regarding the emergence of highly unusual genomic features. We found that O. marina harbours extensive gene redundancy, indicating high rates of gene duplication and transcription from multiple genomic loci. In addition, we observed a correlation between expression level and copy number in several genes, suggesting that copy number may contribute to determining transcript levels for some genes.
Finally, we analyze the genes and predicted products of the recently discovered Dinoflagellate Viral Nuclear Protein, and several cases of horizontally acquired genes.

Conclusion: The dataset presented here has proven very valuable for studying this important group of protists. Our analysis indicates that gene redundancy is a pervasive feature of dinoflagellate genomes; thus, the mechanisms involved in its generation must have arisen early in the evolution of the group.

Keywords: Dinoflagellates, Alveolates, Chromatin, Genome, Oxyrrhis

Background

The dinoflagellate Oxyrrhis marina is emerging as a popular model to study many aspects of heterotrophic protist biology, including ecophysiology, behaviour, distribution and dispersal, swimming and motility, as well as various aspects of cellular and nuclear biology [1]. Crucially, O. marina is well suited to explore the origins and the unusual characteristics of two important groups of protists, dinoflagellates and apicomplexans. In this regard, Oxyrrhis represents an early branch within the dinoflagellate lineage. Its phylogenetic position has now been securely established: it branches close to the separation between apicomplexans and 'crown' dinoflagellates, but after the oyster parasite Perkinsus marinus [2,3]. The status of Oxyrrhis as a dinoflagellate is not unanimously accepted among protistologists [4,5], but the basis for including it in the group, albeit as a divergent early representative, is sound [5]. Regardless of the preferred taxonomic treatment, Oxyrrhis offers a unique perspective for understanding the evolution of these fascinating protists.

Dinoflagellates are known for their highly divergent features, such as expansive genomes, an unusual karyokinetic process and a very atypical chromatin structure, unique among eukaryotes [6-10].
Apicomplexans, on the other hand, exhibit some contrasting features, such as a highly developed specialization for intracellular parasitism. Both groups have unusual organellar genomes, characterized by gene loss or transfer to the nucleus and by unusual genomic architecture.

* Correspondence: cslamo@dal.ca
¹Canadian Institute for Advanced Research, Program in Integrated Microbial Biodiversity, Alberta, Canada
²Department of Biochemistry and Molecular Biology, Dalhousie University, B3H4R2 Halifax, NS, Canada
Full list of author information is available at the end of the article
© 2014 Lee et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

Compared to many heterotrophs, O. marina is a robust organism that is easy to maintain in the laboratory: it grows fast and has flexible nutritional requirements [11,12]. These advantages partly explain why O. marina has become a popular model organism. However, the lack of molecular data has severely limited the scope of questions that can be addressed with this species.

Over the last few years, we have carried out several studies using a dataset of expressed sequence tags (ESTs) from strain CCMP1788 of O. marina. This dataset was used to address several specific questions on plastid evolution, lateral gene transfer, the structure of the mitochondrial genome and others [3,13-15].
Specifically, this dataset has revealed that at least eight genes are likely to have been inherited from a plastid-bearing ancestor. Some of these genes show a strong signal of being related to genes from peridinin-containing dinoflagellates and apicomplexans [13], supporting the idea that the apicoplast and the photosynthetic plastid of dinoflagellates share an origin [13,16-18]. The dataset has also revealed well-supported examples of horizontal gene transfer (HGT) [14,15]. One of these involved the acquisition of rhodopsin proteins, which may have important functional implications [14]. Finally, the EST data allowed a comprehensive characterization of the mitochondrial genome of O. marina, providing valuable insight into the complicated evolution of these organelles in alveolates [3,19,20]. These examples highlight the value of the data generated by the O. marina EST project.

More recently, Lowe et al. published a transcriptomic analysis of O. marina isolate 44-PLY01 (Plymouth Harbour, UK) based on 454 pyrosequencing, which constitutes the first attempt to use massively parallel DNA sequencing on this species [21]. Here we report the analysis of the full EST dataset, which is now available in its entirety in public databases. We also give a general overview of the nature of the genes encoded in the O. marina genome, with particular discussion of the evolution of the nuclear genome and chromatin architecture.

Methods

Strain, cultivation and EST library construction

Oxyrrhis marina strain CCMP 1788 was cultivated in Droop's Ox-7 medium at the Bigelow Laboratory for Ocean Sciences (formerly CCMP). A 20 L culture was harvested in a continuous-flow centrifuge and stored in Trizol reagent (Invitrogen, Carlsbad, CA). Total RNA was prepared in 20 ml batches according to the manufacturer's directions, resulting in 2 μg of total RNA.
A directional cDNA library was constructed from polyadenylated RNA in pBluescript II SK using the EcoRI and XhoI sites (Amplicon Express, Pullman, WA, USA) and was shown to contain 5.3 × 10⁵ cfu. A total of 23,702 clones were picked and sequenced from the 5′ end using Sanger capillary sequencers (National Research Council, Halifax, NS, Canada). Quality control and vector trimming resulted in 18,012 EST sequences, which were deposited in the GenBank EST database with accession numbers EG729650-EG747671. These sequences assembled into 9,876 unique clusters using tbESTdb [22]. The clustering method implemented in tbESTdb is based on the phred/phrap algorithms [23] and provides high discriminatory power to identify closely related paralogues and distinct gene copies [22]. The clusters were further examined manually using Geneious Pro versions 5 and 6 (Biomatters, Auckland, New Zealand) to assess quality. Sequences shorter than 200 bases were discarded because we observed a large proportion of low-quality and low-complexity sequences among them. This resulted in a filtered dataset of 8,141 sequences.

Annotation and functional classification

Sequences in the final dataset were searched against the NCBI non-redundant (nr) nucleotide and protein databases using BLASTN and BLASTX, respectively, to identify and annotate rRNA genes and protein-coding genes. BLASTX was run twice with default parameters. The only exception was the E-value cutoff, which was set to ≤ 1e-10 in the first session and ≤ 1e-5 in the second. For clusters that did not yield a hit in the first search (≤ 1e-10), we examined their hits in the second search (≤ 1e-5) individually to distinguish spurious matches from useful ones. BLAST searches were done using Koriblast 3.0 (Korilog SARL, Questembert, France). The top match for each sequence was kept, and its taxonomic affinity was recorded. High-level taxonomic assignment was done manually. Functional categories and gene ontology terms were assigned to the top BLASTX hits with Blast2go [24].
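The two-pass BLASTX scheme can be sketched as a simple triage: keep the best hit per cluster, accept it below the strict cutoff, and queue it for manual review between the two cutoffs. This is a minimal illustration under stated assumptions, not the Koriblast or Blast2go workflow. It assumes the hits have already been parsed into (cluster, accession, E-value) records. The names `TopHit`, `best_hits` and `triage` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Cutoffs taken from the text; hits between them are queued for manual review.
STRICT_EVALUE = 1e-10
RELAXED_EVALUE = 1e-5


@dataclass(frozen=True)
class TopHit:
    """One parsed BLASTX hit (hypothetical record layout)."""
    cluster_id: str
    accession: str
    evalue: float


def best_hits(hits: list[TopHit]) -> list[TopHit]:
    """Keep only the lowest-E-value hit for each cluster."""
    best: dict[str, TopHit] = {}
    for hit in hits:
        current = best.get(hit.cluster_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.cluster_id] = hit
    return list(best.values())


def triage(hits: list[TopHit]) -> dict[str, list[TopHit]]:
    """Sort best hits into accepted, review and rejected bins by E-value."""
    bins: dict[str, list[TopHit]] = {"accepted": [], "review": [], "rejected": []}
    for hit in hits:
        if hit.evalue <= STRICT_EVALUE:
            bins["accepted"].append(hit)
        elif hit.evalue <= RELAXED_EVALUE:
            bins["review"].append(hit)
        else:
            bins["rejected"].append(hit)
    return bins
```

In this sketch, only the "review" bin corresponds to the individual inspection step described above. Clusters in the "rejected" bin are treated as having no hit.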
Various sequence analyses and manipulations were carried out with Geneious v5.6 and v6. These included sequence alignment, conceptual translation, protein sequence examination and calculations (e.g. molecular weight, isoelectric point). In addition, clusters with no hits to known proteins were searched for Pfam domains with Blast2Go (Additional file 1: Table S1).

Identification of meiotic components

We compiled conserved proteins from lists of DNA repair and recombination proteins from the genome projects of Homo sapiens [25,26], Saccharomyces cerevisiae (http://db.yeastgenome.org), Trypanosoma brucei and T. cruzi [27,28], Trichomonas vaginalis [29] and refs. [30-33]. These proteins were used to search a local database of inferred proteins from the genome sequences of Cryptosporidium parvum and Toxoplasma gondii (the smallest and largest sequenced apicomplexan genomes, respectively) and of the oyster parasite Perkinsus marinus, by batch BLASTP with an E-value cutoff of 1e-1. Similarly, C. parvum, H. sapiens and S. cerevisiae protein homologs were used to query the genome survey sequence of the apicomplexan Ascogregarina taiwanensis [34] by batch tBLASTn with an E-value cutoff of 1e-1. Inferred proteins from C. parvum, T. gondii, P. marinus, H. sapiens or S. cerevisiae were used to search the O. marina ESTs by tBLASTn with an E-value cutoff of 1e-1. The identity of the best sequence hit(s) in O. marina was then verified by BLASTx against GenBank's non-redundant database. Phylogenetic analyses of individual candidate proteins were conducted with PhyML (http://www.atgc-montpellier.fr/phyml/) [35] using the conceptual amino acid translations aligned with MAFFT [36]. The maximum likelihood trees were built using the LG substitution model with invariant sites and 8 γ-distributed substitution rate categories. Node support was assessed with 1,000 bootstrap replicates.

Real-time PCR estimation of relative expression

RNA from O. marina cultures was extracted and purified with the Aurum Total RNA mini kit (Bio-Rad, Hercules, CA), and cDNA was produced using Superscript III reverse transcriptase (Invitrogen, Carlsbad, CA). cDNA was quantified with a Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA). Primer sets for real-time PCR were designed for the O. marina TVP1, actin, tubulin and proteorhodopsin genes using the program PrimerSelect (DNASTAR, Inc., Madison, WI, USA) with its default parameters. Primers were tested for promiscuous binding using Blastn and were ordered from Integrated DNA Technologies Inc. (Coralville, IA, USA). A full list of primer sequences and characteristics can be found in Additional file 1: Table S2. Real-time PCR was performed with a CFX96 instrument (Bio-Rad) and iQ SYBR Green Supermix (Bio-Rad). Expression levels were calculated in relative units by entering the Ct values into standard curves prepared for each gene, as described [37].

Results and discussion

ESTs and assembled clusters

Cloning-based cDNA library construction followed by directional Sanger sequencing has now been superseded by ultra-high-throughput sequencing methods. It nevertheless remains a powerful way to analyse the gene complement of an organism, because high-quality, long reads make it possible to obtain full-length sequences of individual clones. We used this method to conduct a genome-wide survey of expressed genes in O. marina, aiming to shed light on some aspects of alveolate biology and evolution.

A total of 23,702 clones were sequenced from the 5′ end, of which 18,012 remained after quality filtering and vector trimming. The ESTs were assembled using the tbESTdb pipeline [22], resulting in 9,876 unique clusters (unigenes [38]). Visual inspection of the clusters revealed that a large number of the short clusters were low-complexity repeats. To prioritize data quality, we therefore discarded all sequences shorter than 200 bases, resulting in a final set of 8,141 clusters.
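The length filter described above is straightforward to express in code. The sketch below applies the 200-base cutoff from the text. It also adds a Shannon-entropy flag as one possible automated proxy for low-complexity sequence. This flag is an assumption: the authors inspected clusters visually, and dedicated masking tools such as DUST are the usual choice in practice. The 1.5-bit threshold and the function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

MIN_LENGTH = 200  # bases; shorter clusters were discarded in the study
ENTROPY_THRESHOLD = 1.5  # bits per base; hypothetical low-complexity cutoff


def shannon_entropy(seq: str) -> float:
    """Per-base Shannon entropy in bits (2.0 for equal A/C/G/T usage)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def filter_clusters(clusters: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Drop clusters shorter than MIN_LENGTH; flag low-entropy survivors for review."""
    kept = {cid: s for cid, s in clusters.items() if len(s) >= MIN_LENGTH}
    flagged = [cid for cid, s in kept.items() if shannon_entropy(s) < ENTROPY_THRESHOLD]
    return kept, flagged
```

For example, a 300-base "CA" dinucleotide repeat passes the length filter but has an entropy of 1.0 bit per base, so it would be flagged for inspection.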
The size distribution of ESTs is bimodal, with one peak between 650 and 750 bases and another between 50 and 150 bases (not shown). This suggests that substantial degradation took place in the RNA sample.

Taxonomic distribution of blast hits

We used NCBI Blast, with the aid of Koriblast and Blast2go, to assign putative functional identities by similarity. Searching with the Blastn algorithm against NCBI's non-redundant nucleotide database (nr) identified seven clusters matching rRNA genes. Four of these correspond to pieces of the fragmented mitochondrial rRNA genes, which have already been analysed [3]. One corresponds to the nuclear small subunit rRNA (SSU) gene, and two correspond to the nuclear large subunit rRNA (LSU) gene (Table 1). Both SSU and LSU transcripts are very abundant compared to most RNA species in the sample, with 88 and 172 ESTs, respectively.

Table 1 Identity of the O. marina EST clusters encoding mitochondrial transcripts and nuclear ribosomal RNA genes
Gene | Clusters | Total ESTs
cox1 | 1 | 339
cob-cox3 fusion | 1 | 130
LSU-E | 1 | 14
LSU-E | 1 | 53
LSU-rna10 | 1 | 18
LSU-G | 1 | 1
Nuclear SSU | 1 | 88
Nuclear LSU | 2 | 172
For each type, the number of distinct clusters and the number of ESTs per cluster are indicated.

Having excluded rRNA genes and sequences shorter than 200 bases, we conducted Blastx searches against the nr database with a cutoff of E ≤ 1e-5. The search produced 4,515 sequences with matches. On examining alignments with lower Blast scores, we noticed that many were false positives. These arose from spurious similarity between repeats in the clusters and low-complexity protein sequences in the database. We therefore set a stricter cutoff of E ≤ 1e-10, which yielded 4,222 hits. Of these, 633 corresponded to bacteria, 30 to viruses and 14 to members of the Archaea (Figure 1). The remaining 3,545 positive hits were similar to eukaryotic sequences. Figure 1 shows the distribution of the eukaryotic hits by taxonomic group. Alveolates make up the largest fraction (1,385 clusters, about 39%), followed by opisthokonts (metazoans and fungi) with 1,116 and Archaeplastida (plants, green and red algae) with 561. The remaining fraction was composed of stramenopiles (e.g. diatoms, brown algae, oomycetes), excavates (e.g. euglenids, kinetoplastids, parabasalids), haptophytes, cryptophytes and cercozoans. Within alveolates, most top hits were from dinoflagellates (884), followed by apicomplexans (389) and ciliates (112).

Not surprisingly, the largest part of the dinoflagellate hits comes from a single species, the oyster parasite Perkinsus marinus (the genus Perkinsus is often classified in its own phylum, Perkinsozoa [4,39]). P. marinus is the only member of the dinoflagellate lineage with a genome project at an advanced stage. At the time of writing, a presumably complete set of genes has been annotated and deposited in GenBank, but the analysis has not yet been published. (Note added during revision: a draft genome of Symbiodinium sp. was published recently [40].) Dinoflagellates are still comparatively underrepresented in databases, especially in the protein databases, as most sequences produced so far have been deposited as ESTs.

Hits to apicomplexan species are less conspicuous than one would expect. Apicomplexa is the sister taxon to dinoflagellates, and there are about ten complete genomes from apicomplexan parasites in public databases. The shortfall could be a consequence of the high degree of specialization that characterizes the phylum Apicomplexa, which typically exhibits high divergence of protein sequences and heavy gene loss. Sequences with top Blast similarity to animals and fungi collectively (i.e. Opisthokonta) represent a fraction similar to that of alveolates (Figure 1).
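The percentages in this section follow directly from the top-hit counts. The sketch below is a minimal illustration that uses only the counts reported for Figure 1. It assumes one high-level taxon label per cluster with a hit at E ≤ 1e-10; with per-cluster labels, `Counter(labels)` would build the same tallies. The function name `percent_by_group` is hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Top-hit counts reported for the 4,222 clusters with hits (Figure 1).
DOMAINS = Counter({"Eukaryota": 3545, "Bacteria": 633, "Viruses": 30, "Archaea": 14})
EUKARYOTES = Counter({
    "Alveolata": 1385, "Opisthokonta": 1116, "Archaeplastida": 561,
    "Stramenopiles": 274, "Excavata": 192, "Haptophyta": 14,
    "Cryptophyta": 2, "Cercozoa": 1,
})
ALVEOLATES = Counter({"Dinoflagellates": 884, "Apicomplexa": 389, "Ciliophora": 112})


def percent_by_group(counts: Counter) -> dict[str, float]:
    """Share of each group in percent, rounded to one decimal, largest first."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.most_common()}
```

`percent_by_group(EUKARYOTES)` gives 39.1% for Alveolata and 31.5% for Opisthokonta. The nested tallies are also consistent with one another: 884 + 389 + 112 = 1,385, and 3,545 + 633 + 30 + 14 = 4,222.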
The opisthokont set of genes probably represents a core of well-conserved, ubiquitous eukaryotic genes of very deep ancestry. Genes of this class usually show little correlation between Blast similarity and phylogenetic affinity, and the set also likely reflects the large number of animal and fungal genomes that are available. The remaining one-third of Blast hits correspond to plants, green and red algae, stramenopiles and excavates (Figure 1). In part, these assignments are probably due to factors similar to those affecting the animal and fungal hits. However, they could also include sequences with different evolutionary histories, for at least two reasons.

First, all alveolates, or at least the clade formed by apicomplexans and dinoflagellates [17], descend from plastid-harbouring ancestors. As such, their nuclei contain many genes derived from that ancient photosynthetic endosymbiont, which may contribute to the hits to archaeplastids and stramenopiles. A previous paper reported eight genes from this sample that were likely inherited from a plastid [13]. However, if these genes were really derived from the endosymbiont, they might be expected in ciliate genomes as well, and the evidence for this is still controversial [41-43]. Second, O. marina is a voracious predator whose diverse diet consists mainly of green and red algae, along with many stramenopiles and haptophytes. Continuous, repeated exposure to prey could have resulted in a number of genes being transferred and integrated into the nuclear genome [44].

Overall, roughly half of the sequences had no significant matches to known proteins deposited in the nr database. The fraction of sequences with no Blastx matches in GenPept depends on two factors: the abundance of sequences from related organisms in the database, and how atypical the organism is in its gene content and degree of sequence divergence.
As more genomes from a group of related organisms are sequenced and annotated, the first factor becomes less relevant and the fraction of unmatched sequences decreases. Figure 2 shows the distribution of top Blast hits when we searched NCBI's "est-others" database using the O. marina ESTs that returned no significant hits in the previous search. The preponderance of plants and animals simply reflects the strongly biased composition of the database. The next most abundant fraction corresponds to dinoflagellates, despite the comparatively low representation of these protists in the database. The fraction of unmatched sequences in O. marina at the E ≤ 1e-10 threshold is 47%, which is in line with other estimates: 53% for Alexandrium catenella [45], 71% for Karenia brevis [46], 63% for a combined set from Amoebophrya sp. and Karlodinium veneficum [47] and 72% for A. minutum [48]. The variation is due to a mix of factors, including non-standardized criteria for determining cutoffs, potential biases from library construction, sample size, and increasingly better representation of dinoflagellate genes in the databases.

We also compared our EST dataset to the transcriptomic data generated by 454 pyrosequencing by Lowe et al. [21]. Even though the number of clusters (contigs) in both studies is roughly similar (ca. 8,000), the overlap in sequence identity was very low. Only 23% of our EST dataset (sequences 200 bp or longer) had one or more blastn hits (E ≤ 1e-5) among the 7,398 sequences in the 454 dataset. The two samples come from closely related organisms, but divergence at the nucleotide level might still have been too high for blastn. We therefore also used tblastx to compare the datasets at the predicted amino acid level. This time the overlap increased slightly, to 29%.

Figure 1 Taxonomic distribution of the top blastx hits of 4,222 EST clusters (this set excludes clusters with no hits or spurious hits at the ≤ 1e-5 level). Top ("Blastx by Domain of Life"): number of hits by domain of life (Eukaryotes 3,545; Bacteria 633; Virus 30; Archaea 14). Bottom ("Distribution of hits to Eukaryota (3,545)"): distribution of eukaryotic top hits by high-order taxonomy (Alveolata 1,385, comprising Dinoflagellates 884, Apicomplexa 389 and Ciliophora 112; Opisthokonta 1,116; Archaeplastida 561; Stramenopiles 274; Excavata 192; Haptophyta 14; Cryptophyta 2; Cercozoa 1).

Figure 2 EST clusters of O. marina with no hits to NCBI's non-redundant protein database were searched against NCBI's 'est-others' database using tblastx (which compares translated amino acid sequences using nucleotide queries and database). The pie chart shows the taxonomic distribution of the top hits (≤ 1e-5), with the number of hits and percentage for each group: Apicomplexans 27 (3%), Ciliates 2 (0%), Dinoflagellates 138 (16%), Stramenopiles 34 (4%), Haptophytes 37 (4%), Cryptophytes 1 (0%), Plants 270 (30%), Chlorophytes 16 (2%), Rhodophytes 7 (1%), Animals 265 (30%), Choanoflagellates 4 (0%), Fungi 54 (6%), Amoebozoans 14 (2%), Excavates 15 (2%), Cercozoans 1 (0%). The central circle indicates the major eukaryotic groupings (Unikonts, Chromalveolates, Archaeplastids) of the portions in the external circle.

Bacterial sequences

The O. marina ESTs contain 633 sequences whose top Blast hit is bacterial. Many of these probably reflect sampling artifacts that result in misleading top Blast hits. Others might represent genes with an evolutionary origin different from that of the O. marina nucleus (i.e. horizontal and endosymbiotic gene transfer). We also detected a sizeable fraction with very high similarity to alpha-proteobacteria of the order Rhodobacterales, mainly from the genus Oceanicaulis. We examined these sequences closely, and they do not appear to be derived from genuine, polyadenylated mRNA, for three reasons. First, they are extremely similar, often identical, to the annotated genome of the bacterium O. alexandrii. Second, the orientation of the coding sequence appears to be equally distributed between forward and reverse. Third, some individual ESTs even contain portions of genes that are adjacent in the O. alexandrii genome. We conclude that these sequences most likely derive from contamination with bacterial DNA during library construction.

O. alexandrii has been isolated from cultures of the dinoflagellate Alexandrium, and several Rhodobacterales are known to coexist with dinoflagellates, even as symbiotic partners [49-53]. The O. marina culture is likely accompanied by symbiotic bacteria that resist the methods used to generate axenic cultures. We observed that O. marina cells that had been grown with antibiotics still contain tightly associated bacteria (Figure 3). Overall, the O. alexandrii Blast matches accounted for 82 of the genes with top hits to bacteria (13%, not shown). This leaves a sizable number of genes, many of which could still come from symbionts, while others may have resulted from HGT. Unfortunately, no data are available to confirm this possibility, because our ESTs typically do not include the spliced leader sequence characteristic of the 5′ end of dinoflagellate nuclear mRNAs [46,54].

Extensive gene redundancy

In many cases, ESTs apparently encoding the same gene were assembled as separate clusters of highly similar sequence, albeit below the strict threshold for assembly. Close examination of the raw files revealed that this is not due to sequencing errors or low quality. Instead, it reflects the presence of genuinely distinct copies of many genes, in some cases more than 40. Multicopy genes have been described previously in dinoflagellates, often as tandem arrays of as many as 5,000 adjacent copies [55,56]. More recently, evidence of dispersed arrangements affecting many genes has been reported using sequencing [21,57] and fluorescence in situ hybridization [58]. The pyrosequencing-based transcriptomic study by Lowe et al. detected a number of redundant genes, including tandem arrangements [21]. However, we did not find extensive overlap in multicopy genes between the two studies: only three genes in our Table 2 (hsp70, hsp90 and S-adenosyl-methionine synthetase) were also found to be redundant in the Lowe et al. analysis.

Accumulating evidence suggests that multicopy genes are more common in dinoflagellates than in other organisms [57,59-62], but a thorough assessment of the prevalence of gene redundancy in a single organism is lacking. In our sample, among the sequences with positive Blast hits, 422 were represented by two or more distinct clusters and 64 by four or more clusters. Eleven genes were represented by 10 to 42 clusters (Table 2). Because ESTs are clustered only if they are highly similar, different clusters with the same Blast hit probably represent different genomic loci. The converse does not necessarily hold: genes present in multiple identical copies cannot be recognized by this approach, since ESTs from all the copies will cluster into a single contig. Tandem arrangements of identical copies have been described in dinoflagellates, so they may occur in O. marina as well. In fact, Lowe et al. found evidence of tandemly arranged genes encoding beta and alpha tubulin, EF-2, rhodopsin and HSP90, with intergenic spacers of 200 to 400 bp [21].

In contrast to the protein-coding genes, we found that the expressed nuclear rRNA genes are highly homogeneous. All 88 small subunit ESTs cluster into a single contig, whereas the 172 large subunit ESTs cluster into 2 contigs (Table 1).
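Counting redundancy as described in this section amounts to grouping clusters by the accession of their top Blast hit. The sketch below assumes a mapping from cluster ID to top-hit accession. The per-gene cluster counts are taken from Tables 2 and 3, but the cluster IDs are invented for illustration, and the function names are hypothetical. Identical copies collapse into a single cluster, so counts of this kind can only underestimate copy number.

```python
from __future__ import annotations

from collections import Counter


def clusters_per_gene(top_hits: dict[str, str]) -> Counter:
    """Number of distinct clusters sharing each top-hit accession."""
    return Counter(top_hits.values())


def redundancy_summary(per_gene: Counter, thresholds=(2, 4, 10)) -> dict[int, int]:
    """For each threshold t, how many genes are represented by at least t clusters."""
    return {t: sum(1 for n in per_gene.values() if n >= t) for t in thresholds}


# Cluster counts per top-hit accession, from Tables 2 and 3.
TABLE_COUNTS = {
    "AAO14677": 42,      # proteorhodopsin
    "ABV72550": 25,      # DVNP
    "ZP_07751542": 23,   # GPR1/FUN34/yaaH family protein
    "BAE79387": 13,      # actin
    "XP_002765341": 5,   # S-adenosylmethionine synthetase
    "ZP_01726360": 2,    # aldehyde dehydrogenase
    "XP_002773236": 1,   # ribonucleotide reductase small subunit
}

# Expand into hypothetical cluster IDs, one entry per cluster.
top_hits = {f"{acc}_c{i}": acc for acc, n in TABLE_COUNTS.items() for i in range(n)}
summary = redundancy_summary(clusters_per_gene(top_hits))
```

For this seven-gene subset, `summary` is `{2: 6, 4: 5, 10: 4}`. The same two functions applied to the full cluster-to-hit mapping would produce the genome-wide counts quoted above.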
Long tandem arrays of identical or nearly identical rRNA gene units are very common in eukaryotic genomes. The high level of similarity among the copies has been attributed to gene conversion and other mechanisms that result in concerted evolution [63]. The sequence heterogeneity observed in many protein-coding genes may reflect particular genomic conditions that prevent them from achieving or maintaining homogeneity. However, concerted evolution depends on genomic and biological factors such as the spatial distribution of the repeated genes (i.e. whether they are tandemly arranged or scattered) and the frequency of somatic and meiotic recombination. All of these factors remain virtually unknown in dinoflagellates.

Levels of gene expression and gene variants

We observed large variations in the number of ESTs per cluster (Tables 2 and 3). As reported previously, the most abundant transcript was mitochondrial cox1, and the mitochondrial cob-cox3 fusion was also among the top, with 130 ESTs ([3], Table 1). Of the nucleus-encoded genes, proteorhodopsin was the most highly expressed [14]. Apart from housekeeping genes that are typically highly expressed in eukaryotic cells (hsp90, actin, tubulins, EFL, ribosomal proteins), the list of most abundant ESTs is populated almost exclusively by metabolic proteins. This suggests that the cells were in an active metabolic state at the time of harvesting (Additional file 2: Figure S1). Notable among the metabolic genes are alcohol dehydrogenase and the glyoxylate cycle enzyme isocitrate lyase, suggesting active utilization of two-carbon compounds such as ethanol or acetate as carbon sources. Interestingly, another highly expressed gene is Gpr1/Fun34/YaaH. Its product is essential for acetate permease activity in Aspergillus, and its mutation triggers hypersensitivity to acetic acid in yeast [64,65]. When grown with prey, O. marina cells feed voraciously, and one individual can easily ingest three or four prey cells. In the case of the green alga Dunaliella tertiolecta, each prey cell is about half the size of an O. marina cell. Consistent with this, we observed evidence of intense protein degradation activity in the form of high expression levels of cysteine protease 1 (132 ESTs, Table 3).

Figure 3 Oxyrrhis marina cells coexist with bacteria, probably in some type of symbiotic relationship. The figure shows O. marina cells from antibiotic-treated cultures in close association with unidentified bacteria. A, B: Scanning electron micrographs showing the oral region of O. marina with rod-shaped bacteria attached. C: Fluorescence in situ hybridization of an O. marina cell using a fluorescent probe that recognizes a conserved region of bacterial small subunit (16S) ribosomal RNA genes, following the standard protocol described in http://www.arb-silva.de/fish-probes/fish-protocols/. Blue: DAPI-stained nucleus of O. marina; red dots: bacteria.

Three enzymes of S-adenosyl-L-homocysteine metabolism are also highly expressed: adenosyl homocysteine hydrolase, adenosyl methionine synthetase and adenosyl homocysteinase (Table 3). These enzymes participate in several metabolic pathways, mainly the synthesis of adenosine, methionine and cysteine.

At face value, these numbers suggest that some genes are relatively highly expressed. To test whether EST abundance corresponds to mRNA levels in the cell, we conducted real-time quantitative PCR (qPCR) on a sample of genes. Specifically, we quantified mRNA levels of four genes (proteorhodopsin (PR), TVP1, alpha tubulin (Atub) and actin) in an O. marina culture grown under conditions similar to those of the original culture. Consistent with the estimates from EST abundance, PR and TVP1 ranked first and second in the qPCR estimation, with 50,000 and 18,100 relative units, respectively (not shown).
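According to the Methods, relative units like these are obtained by reading Ct values off per-gene standard curves. The sketch below illustrates that calculation. It assumes a ten-fold dilution series whose Ct values fall on a straight line against log10(input). The dilution values are hypothetical and are not data from this study, and the function names are illustrative.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def relative_quantity(ct, slope, intercept):
    """Invert the standard curve: quantity = 10 ** ((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical ten-fold dilution series in arbitrary units.
dilutions = [1e1, 1e2, 1e3, 1e4, 1e5]
ct_values = [31.68, 28.36, 25.04, 21.72, 18.40]

slope, intercept = fit_standard_curve(dilutions, ct_values)
sample_units = relative_quantity(25.04, slope, intercept)  # about 1,000 units
```

A slope near -3.32 corresponds to roughly 100% amplification efficiency. Because each gene has its own curve, values from different genes are comparable only to the extent that their standards are.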
For Atub and actin, 4,200 and 1,000 relative units were estimated, respectively. Although preliminary, this test shows a broad correspondence between the number of ESTs for a given gene and the qPCR estimate of its expression, but conclusions based on this evidence must be taken with caution until a more rigorous experiment is conducted.

There are many factors, both biological and technical, that could explain differences between EST abundance and qPCR estimates [21,66]. In this case, the multicopy structure of the genome may be an additional complication, since qPCR primers amplify only a restricted subset of the mRNAs encoding a particular type of protein, potentially resulting in large sampling errors. Moreover, evidence is emerging that gene expression in dinoflagellates is largely modulated post-transcriptionally [61,62,67-69]. If so, transcriptional regulation may bear little relationship to protein levels, and increasing the number of functional transcriptional units could be an alternative way to maintain high mRNA levels for certain genes. In our data, highly expressed genes tend to exhibit more distinct variants (Tables 2 and 3), raising the intriguing possibility that dinoflagellates modulate baseline expression levels, at least in part, by increasing the number of copies of a gene instead of (or in addition to) adjusting transcription levels.

Table 2 Identity inferred as top BlastX hits of the O. marina genes with the largest numbers of distinct clusters
Hit accession | Clusters | Total ESTs | Definition | Species
AAO14677 | 42 | 240 | Proteorhodopsin | Pyrocystis lunula
ABV72550 | 25 | 87 | DVNP | Heterocapsa triquetra
ZP_07751542 | 23 | 152 | GPR1/FUN34/yaaH family protein | Mucilaginibacter paludis
BAE79387 | 13 | 41 | Actin | Symbiodinium sp. CS-156
ABV22332 | 12 | 132 | Cysteine protease 1 | Noctiluca scintillans
XP_002506839 | 12 | 61 | Acetate-CoA ligase | Micromonas sp. RCC299
XP_002500277 | 11 | 27 | Acetyltransferase-like/FAD linked oxidase | Micromonas sp. RCC299
AAM02973 | 10 | 90 | Heat shock protein 70 | Crypthecodinium cohnii
XP_002775922 | 10 | 27 | Succinate dehydrogenase, putative | Perkinsus marinus
XP_002780969 | 10 | 25 | Heterogeneous nuclear ribonucleoprotein, putative | Perkinsus marinus
ACI12882 | 9 | 122 | NAD-dependent alcohol dehydrogenase | Euglena gracilis
CBJ30560 | 8 | 20 | Glutathione S-transferase | Ectocarpus siliculosus
XP_002786250 | 8 | 8 | 40S ribosomal protein S9, putative | Perkinsus marinus
AAW79379 | 7 | 36 | Fumarate reductase | Heterocapsa triquetra
XP_002787701 | 7 | 35 | 14-3-3 protein, putative | Perkinsus marinus
ZP_06799564 | 7 | 8 | Hypothetical protein | Mycobacterium tuberculosis
ABI14419 | 6 | 39 | Heat shock protein 90 | Karlodinium micrum
AAV71134 | 6 | 23 | Cytosolic class II fructose bisphosphate aldolase | Heterocapsa triquetra
ABF22754 | 6 | 16 | Mitochondrial cytochrome c oxidase subunit 2b | Karlodinium micrum
ACA60905 | 6 | 15 | Gag-pol polyprotein | Thalassiosira pseudonana
ZP_07985860 | 6 | 12 | ATP-dependent DNA helicase | Streptomyces sp.
YP_002500649 | 6 | 6 | Peptidase C14, caspase catalytic subunit p20 | Methylobacterium nodulans
XP_002765341 | 5 | 46 | S-adenosylmethionine synthetase, putative | Perkinsus marinus
ABG56231 | 5 | 38 | Translation elongation factor-like protein | Karlodinium micrum
XP_001763482 | 5 | 23 | Acetyl-CoA synthetase | Physcomitrella patens
XP_002184734 | 5 | 22 | Predicted protein | Phaeodactylum tricornutum
XP_002769616 | 5 | 21 | Conserved hypothetical protein | Perkinsus marinus
NP_001068397 | 5 | 17 | Hypothetical protein | Oryza sativa
AAG01128 | 5 | 17 | Hypothetical protein | Solanum lycopersicum

Table 3 Identity inferred as top BlastX hits of the O. marina genes with the largest numbers of ESTs
Hit accession | Clusters | ESTs | Definition | Species
AAO14677 | 42 | 240 | Proteorhodopsin | Pyrocystis lunula
ZP_07751542 | 23 | 152 | GPR1/FUN34/yaaH family protein | Mucilaginibacter paludis
ABV22332 | 12 | 132 | Cysteine protease 1 | Noctiluca scintillans
ACI12882 | 9 | 122 | NAD-dependent alcohol dehydrogenase | Euglena gracilis
AAM02973 | 10 | 90 | Heat shock protein 70 | Crypthecodinium cohnii
ABV72550 | 25 | 87 | DVNP | Heterocapsa triquetra
ZP_01726360 | 2 | 63 | Aldehyde dehydrogenase | Cyanothece sp.
XP_002950429 | 3 | 62 | S-Adenosyl homocysteine hydrolase | Volvox carteri
XP_002506839 | 12 | 61 | Acetate-CoA ligase | Micromonas sp.
XP_002765341 | 5 | 46 | S-adenosylmethionine synthetase, putative | Perkinsus marinus
XP_002784353 | 3 | 43 | H+-translocating inorganic pyrophosphatase TVP1 | Perkinsus marinus
YP_130418 | 3 | 42 | L-lactate permease | Photobacterium profundum
BAE79387 | 13 | 41 | Actin | Symbiodinium sp.
ABI14419 | 6 | 39 | Heat shock protein 90 | Karlodinium micrum
ABG56231 | 5 | 38 | Translation elongation factor-like protein | Karlodinium micrum
AAW79379 | 7 | 36 | Fumarate reductase | Heterocapsa triquetra
XP_002787701 | 7 | 35 | 14-3-3 protein, putative | Perkinsus marinus
XP_002766754 | 2 | 30 | 40S ribosomal protein S11, putative | Perkinsus marinus
XP_002500277 | 11 | 27 | Acetyltransferase-like/FAD linked oxidase | Micromonas sp. RCC299
XP_002775922 | 10 | 27 | Succinate dehydrogenase, putative | Perkinsus marinus
ABD46571 | 4 | 27 | Alcohol dehydrogenase-like protein | Euglena gracilis
XP_002786429 | 4 | 27 | Osmotic growth protein, putative / Fumarate reductase | Perkinsus marinus
XP_002904993 | 2 | 26 | Isocitrate lyase | Phytophthora infestans
XP_002780969 | 10 | 25 | Heterogeneous nuclear ribonucleoprotein, putative | Perkinsus marinus
ZP_06800645 | 3 | 25 | Heat shock protein | Mycobacterium tuberculosis
XP_002773236 | 1 | 25 | Ribonucleotide reductase small subunit, putative | Perkinsus marinus
ABV22229 | 3 | 24 | ATP/ADP translocator | Karlodinium micrum
YP_638223 | 3 | 24 | Nucleotide-diphosphate-sugar epimerase/NmrA family protein | Mycobacterium sp. MCS
AAV71134 | 6 | 23 | Cytosolic class II fructose bisphosphate aldolase | Heterocapsa triquetra
XP_001763482 | 5 | 23 | Acetyl-CoA synthetase | Physcomitrella patens
XP_001638515 | 3 | 23 | No hits | -
CBX99834 | 2 | 23 | Similar to cytochrome b2 | Leptosphaeria maculans
XP_002786953 | 2 | 23 | Tubulin alpha chain, putative | Perkinsus marinus
XP_002184734 | 5 | 22 | Predicted protein | Phaeodactylum tricornutum
ABF61766 | 3 | 22 | Chloroplast 3-dehydroquinate synthase/O-methyltransferase | Heterocapsa triquetra
XP_002788505 | 3 | 22 | Hypothetical protein | Perkinsus marinus
ABU52986 | 2 | 22 | Beta-tubulin | Karenia brevis
XP_002911883 | 9 | 21 | Hypothetical protein | Coprinopsis cinerea
XP_002769616 | 5 | 21 | Conserved hypothetical protein | Perkinsus marinus
XP_002780466 | 3 | 21 | 2-methylcitrate synthase, putative | Perkinsus marinus
CBJ30560 | 8 | 20 | Glutathione S-transferase | Ectocarpus siliculosus
XP_666127 | 3 | 20 | Ribosomal protein L5A | Cryptosporidium hominis
AAN31463 | 1 | 20 | Glutamine synthetase | Phytophthora infestans
ACV41934 | 4 | 19 | No hits | -

Functional categorization of the O. marina ESTs

DNA repair and meiosis
We identified 27 O. marina transcripts homologous to genes that are conserved in humans, yeast and other protists and are functionally linked to the recognition and repair of damaged DNA in model animals or fungi [25,26] (Additional file 1: Table S3).
These include components of the excision repair machinery, DNA double-strand break repair by homologous recombination (HR), editing and processing nucleases (EPN), post-replication repair (PRR), chromatin structure (CS), the DNA damage checkpoint (DDC), DNA replication licensing (DRL) and the DNA damage response (DDR). Excision repair protein homologs encoded by O. marina include (i) the base excision repair (BER) protein poly(ADP-ribose) polymerase PARP2, which protects single-strand DNA breaks; (ii) the mismatch repair (MMR) protein Mlh1, a MutL homolog also required for meiotic crossovers; and (iii) the nucleotide excision repair (NER) proteins replication factor A (RFA1), which binds to sites of DNA damage, and the XPD/ERCC2 5′-3′ helicase, which helps unwind the pre-incision intermediate. Homologs of the catalytic subunits of DNA polymerases delta and epsilon, and of PCNA, all employed in MMR and NER, were also identified in O. marina. Components of the HR machinery include Mre11 (a homolog of the bacterial SbcD 3′ exonuclease), the RecA recombinase homolog Rad51, Brca1, a sister chromatid cohesin subunit (Smc3), the condensin subunits Smc2 and Smc4, and the meiosis-specific Hop2 and Spo11-2 (Additional file 2: Figure S1 and Additional file 3: Figure S2). Conserved homologs of a flap endonuclease (FEN1), the DNA damage response and checkpoint signaling machinery (Suc1, Rad17, Chk1, Chk2) and the DNA replication licensing complex (Mcm3, Mcm5 and Mcm7) are also encoded by O. marina. Components of the Rad6 post-replication repair pathway (Rad6A, Rad6B) are present, and proteins involved in chromatin structure, such as RecQ helicases including BLM, were also identified.

These findings indicate that O. marina encodes conserved components of several eukaryotic recombination and repair pathways; the only pathway for which no components were detected is non-homologous end joining (NHEJ). Because the proteins encoded by these genes interact with other conserved DNA repair proteins in the eukaryotes where they have been studied, we expect that additional O. marina genomic or transcriptomic data will reveal homologous genes encoding other key DNA repair and recombination proteins, including additional members of the ERCC, XRCC and Rad52 epistasis groups, additional MutL and MutS homologs involved in mismatch repair, and more meiosis-specific homologs.

To date, a single report on sexual reproduction has been published for O. marina [70,71]. Based on observations of small cells presumed to be gametes, the paper claims that O. marina cells engage in sexual reproduction, but no data support the occurrence of meiotic division [72]. Moreover, even the ploidy status and most details of the life cycle of O. marina are poorly known [58,72]. Since O. marina encodes the meiosis-specific genes Spo11-2 and Hop2, we expect that other “core meiotic genes” [73,74] not yet detected might also be present. Because Spo11-2 and Mre11 are present, we expect to find Rad50, since Rad50 and Mre11 act together in other eukaryotes to remove Spo11 from DNA ends in meiosis and also process DNA ends during mitotic HR. Likewise, because Hop2 is present, meiosis-specific Mnd1 and Dmc1 homologs might also be encoded, since in other eukaryotes Hop2 and Mnd1 form a complex that interacts with Dmc1 in interhomolog strand exchange. The presence of these pieces of the conserved meiotic machinery indicates that meiosis is indeed part of the life cycle of O. marina, although probably in a very inconspicuous way. Possibly the conditions under which we generated the RNA (i.e. exponential growth) favour asexual reproduction, hence the paucity of meiosis-related genes in our sample. This also means that the life cycle must include diploid (or polyploid) stages.

Table 3 Identity inferred as top BlastX hits of the O. marina genes with the largest numbers of ESTs (Continued)
Hit accession | Clusters | ESTs | Definition | Species
ACJ13434 | 3 | 19 | Adenosylhomocysteinase | Amphidinium carterae
XP_002766763 | 2 | 19 | Protein TIS11, putative | Perkinsus marinus
XP_001612035 | 2 | 18 | Conserved hypothetical protein | Babesia bovis
NP_001068397 | 5 | 17 | Hypothetical protein | Oryza sativa
AAG01128 | 5 | 17 | Hypothetical protein | Solanum lycopersicum
AAX27763 | 4 | 17 | Hypothetical protein | Toxoplasma gondii
ABI13175 | 2 | 17 | Asparaginyl endopeptidase | Emiliania huxleyi
XP_002776404 | 1 | 17 | Methylenetetrahydrofolate reductase, putative | Perkinsus marinus
ABI14188 | 1 | 17 | ADP-ribosylation factor | Pfiesteria piscicida
ABF22754 | 6 | 16 | Mitochondrial cytochrome c oxidase subunit 2b | Karlodinium micrum
XP_002765511 | 2 | 16 | 40S ribosomal protein S3a, putative | Perkinsus marinus
XP_002772672 | 2 | 16 | Vacuolar ATP synthase subunit b, putative | Perkinsus marinus

Chromatin architecture and remodeling
Dinoflagellates have long been known as ‘rule breakers’ because they present exceptions to many well-established rules of eukaryotic cell biology. For example, numerous lines of evidence suggest that typical nucleosomal organisation of the chromatin is absent in dinoflagellates, and that whatever histones remain do not function in the same capacity as in other eukaryotes [8,9].
Instead, the chromatin appears to be rich in basic proteins, but the way in which nuclear DNA and proteins interact is still unknown [6,7,10,75,76]. This raises the fundamental question of what role chromatin organisation plays in transcriptional regulation in dinoflagellates. Gene regulation through chromatin remodeling is a ubiquitous eukaryotic feature; it varies among lineages, but its essential aspects are presumed to be present in all eukaryotes. Evidence that histone genes are present, and indeed expressed, in dinoflagellates is starting to emerge, suggesting that these proteins probably play some role in chromatin organisation [75,77]. Since chromatin organisation at the molecular level appears to be typical in Perkinsus, the most basal lineage of the dinoflagellate tree for which genomic data are available, data from O. marina could provide valuable hints about the early stages of the transformations that led to the unusual nature of dinoflagellate chromatin. We looked for evidence of histones and chromatin remodeling sequences in O. marina and found no clear histone homologues: neither the typical eukaryotic histones nor the histone-like proteins of bacterial origin reported in Crypthecodinium [78]. We did find one sequence with high similarity (E < 1 × 10^-48) to a histone deacetylase of the AcuC/AphA family, and another with similarity to Sir2, a conserved histone deacetylase of the Sirtuin family involved in epigenetic silencing. Even if histones are present, as recent findings in other species suggest, the failure to find their transcripts in our sample is not surprising: histone transcripts have remained elusive in several other studies and, when found, were among the lowly expressed genes. In animals and plants, replication-dependent histone transcripts are not polyadenylated, and in yeasts, the length of the polyA tail of histone mRNAs varies with the stage of the cell cycle.
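The point that lowly expressed transcripts, such as those of histones, can easily be missed follows from simple sampling arithmetic. The Python sketch below treats each EST as an independent random draw from the mRNA pool, so a transcript making up a fraction p of the pool is sampled at least once in n ESTs with probability 1 - (1 - p)^n. This independence assumption ignores cloning and priming biases, and the transcript frequency and library size used here are hypothetical illustration values, not figures from this study.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Probability that a transcript making up a fraction p of the mRNA pool
    is sampled at least once among n ESTs, i.e. 1 - (1 - p)**n."""
    # log1p/expm1 keep the result accurate when p is very small
    return -math.expm1(n * math.log1p(-p))


def ests_needed(p: float, target: float = 0.95) -> int:
    """Smallest number of ESTs n for which detection_probability(p, n) >= target."""
    return math.ceil(math.log1p(-target) / math.log1p(-p))


# Hypothetical values, for illustration only
p_rare = 1e-5       # a transcript making up 1 in 100,000 mRNAs
n_library = 10_000  # a library of 10,000 ESTs

print(round(detection_probability(p_rare, n_library), 3))  # 0.095
print(ests_needed(p_rare))  # roughly 300,000
```

Under these assumptions, a transcript that makes up 1 in 100,000 mRNAs has less than a 10% chance of appearing in a 10,000-EST sample, and about 300,000 ESTs would be needed to catch it with 95% probability.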
Difficulty in detecting histone transcripts in dinoflagellates may also reflect the existence of similar regulatory mechanisms involving short or absent polyA tails. A recent study in the parasitic dinoflagellate Hematodinium sp. described a novel protein named DVNP (Dinoflagellate Viral Nuclear Protein), which appears to be the main basic protein of the chromatin [75]. In that study, micrococcal nuclease digestion of intact Hematodinium chromatin failed to yield the typical nucleosomal band pattern on an agarose gel, unlike chromatin from P. marinus, which yielded the 180 bp ladder expected from partial digestion of nucleosomal DNA [75]. The concomitant presence of these two features in Hematodinium, and not in Perkinsus, suggests that the loss of nucleosomal organization of the nuclear DNA is somehow related to the replacement of histones by DVNP as the main basic nuclear protein [75]. Interestingly, sequences with high similarity to DVNP were also found among ESTs of different dinoflagellates, including O. marina. Prompted by this, we searched our data exhaustively and found 25 clusters with high similarity to DVNP (Tables 2 and 3). Of these, we analysed the amino acid translations of the 20 sequences that encompassed the complete protein (Figure 4). The proteins ranged between 134 and 142 amino acids in length and had a mean isoelectric point of 12.73, indicating a strongly basic character. The predicted mean molecular weight was 14.8 kDa. The proteins exhibit secondary structure features similar to those found by Gornik et al. [75] in the Hematodinium DVNP: an alpha helix of variable length encompassing the first half of the protein, followed by a ‘helix-turn-helix’ region (Figure 4). The O. marina DVNP sequences are predicted to have nuclear localization signals (NLS), a feature also found in the Hematodinium proteins [75].

As DVNPs appear to be well established in O. marina, they must have taken on their present role before the split between O. marina and the core dinoflagellates, but after the divergence of P. marinus [75]. Very likely, DVNP is the true identity of Np23, the major basic nuclear protein previously detected in nuclear extracts of O. marina cells [79]. Nevertheless, the presence of histone deacetylase genes suggests that histones and other associated factors are still functional in dinoflagellates; therefore, it cannot be ruled out that at least part of the genome is arranged in the canonical nucleosomal organisation. Clearly, more comprehensive genomic sequencing and molecular biology experiments are needed to determine what other conserved elements of chromatin and epigenetic regulation are involved in these protists.

Transcription and RNA processing
While the amount of dinoflagellate sequence data is increasing, the components of gene regulation in the group have largely been left unexplored, despite recent claims that much of the regulation of dinoflagellate genes occurs at the post-transcriptional level [67,68,80,81]. Documenting the presence of the canonical (or better-studied) regulatory pathways is nevertheless needed to determine the complexity of gene regulation. For instance, mRNA splicing and transcription are two broad categories that may be valuable to dissect in dinoflagellates. Splicing is of particular interest because there is extremely little information on introns and their features in the group, despite its large genome sizes. Further, the idea that all dinoflagellate mRNAs are trans-spliced with a leader sequence [46,54] adds another layer of complexity to splicing and gene regulation.

We searched the O. marina clusters for the major mRNA splicing and transcription components using a consolidated list of 256 splicing proteins and 228 transcription proteins. A component was considered present only if the matching O. marina cluster was not also identified by BLAST searches with a different conserved splicing or transcription component. Additional file 1: Table S4 lists the genes involved in splicing or transcription with matches in our O. marina dataset, together with their functions. In addition, potential genes of interest in this category contain Pfam domains for RNA recognition and/or binding, and various types of DNA-binding domains such as zinc fingers and zinc knuckles (Additional file 3: Figure S2). Most of the splicing genes identified are associated with either the U6 or the U2 snRNP, the spliceosome components that recognize the intron in the initial steps of splicing [82]. These include the Sm and Sm-like (Lsm) proteins. Curiously, the U6 snRNP is also the major spliceosome component known to participate in leader trans-splicing [83], so the over-representation of U6 snRNP components may reflect elevated expression due to their involvement in trans-splicing. Prp46 and Cwc2, also identified, are members of the Nineteen Complex (NTC), which acts in the first major step of splicing. The most conserved splicing protein, Prp8, was not identified in this EST survey, although some of its constituent domains were (data not shown). The limited length of the ESTs may also have prevented clear identification of the major transcription factors and of the RNA polymerases involved in transcription; only auxiliary transcriptional components were identified, many of which participate in processes unrelated to mRNA transcription.

Retroelements
Evidence of active retroelements capable of providing reverse transcriptase (RT) activity would help us understand some of the unusual characteristics of dinoflagellate genomes [84,85]. In particular, endogenous RT activity is required by the hypothesis that dinoflagellate mRNAs are frequently reverse-transcribed into dsDNA and integrated into the genome [84].
This process would result in the creation of large numbers of retrogenes that accumulate in the genome, and may partially explain the large number of copies of highly expressed genes. This model is supported by the presence of ‘relic’ spliced leaders (rSL) immediately downstream of the SL sequence that caps every mRNA at the 5′ end. The rSL appear in a sizeable fraction of mRNAs from several species [60,84] and are thought to be remnants of previous rounds of processing and recycling. The most common sources of endogenous RT activity in eukaryotic cells are LTR and non-LTR retrotransposons and telomerases; we therefore searched our EST data for sequences with similarity to known RT proteins, as well as to other features of LTR and non-LTR transposons. Table 4 shows 14 clusters with similarity (E value < 1 × 10^-10) to retrotransposons, all matching different regions of LTR transposons belonging to a single class, known as Ty1/copia. In addition, clusters unidentifiable by BlastX but containing Pfam domains related to transposon (TN) functions are listed in Additional file 1: Table S1. Ty1/copia, one of the two main types of LTR retrotransposons, is ubiquitous in eukaryotes and has been most widely studied in plant genomes. LTR retrotransposons mobilize via a ‘copy-and-paste’ mechanism in which the element is transcribed and DNA copies are made by an RT protein encoded by the transcript itself.

Figure 4 Graphic alignment of twenty full-length variants of the nuclear protein DVNP from O. marina (OML) with 13 DVNP sequences from Hematodinium sp. The alignment shows strong sequence and structural conservation, although not at the level observed in typical eukaryotic histones. A schematic representation of the predicted structural features based on the consensus sequence is shown at the top. The pattern of a long alpha helix followed by a ‘helix-turn-helix’ terminal motif is generally conserved among all the variants.

The replicative activity of retrotransposons is suppressed most of the time, which avoids the deleterious effects of their proliferation. This is achieved epigenetically, by methylation of certain regions of the LTR that otherwise act as promoters, but under certain conditions transcription is unleashed, allowing the elements to replicate and proliferate. Uncontrolled bursts of retrotransposon activity can lead the elements to occupy large proportions of a genome in short periods of time, and such events are thought to have played a vital role in the organisation and evolution of eukaryotic genomes. Outside these episodic and apparently rare events of proliferation, retrotransposon transcripts occur at very low levels, if at all. Since the expression level of retrotransposons may be a strong predictor of active transposition, we wondered whether the transcripts found in our data indicate that transposition is ongoing in O. marina. Unfortunately, it is very difficult to make meaningful comparisons with expression data from other organisms in absolute terms, because every transposon-genome system is different and no comparative analyses have been done. Instead, we can compare the expression of the O. marina elements (as revealed by the number of ESTs) with that of other genes. Ty1/copia clusters in O. marina are expressed at low to moderate levels (1 to 20 ESTs per cluster), but collectively they add up to 97 ESTs (Table 4). If the number of ESTs representing a gene in the sample is taken as even a rough approximation of its relative expression level, Ty1/copia elements are among the most abundant transcripts (compare Table 3), and it would be reasonable to speculate that RT proteins are present.
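The comparison with Table 3 can be made explicit with a few lines of Python. This is an illustrative sketch, not part of the original analysis: it sums the per-cluster EST counts transcribed from Table 4 and asks where that total would fall among the ten highest EST counts listed in Table 3. The helper name `rank_among` is hypothetical.

```python
# EST counts per Ty1/copia-like cluster, transcribed from Table 4
TY1_COPIA_ESTS = {
    "OML00000073": 7, "OML00000330": 20, "OML00002280": 3, "OML00002762": 3,
    "OML00002917": 9, "OML00002925": 5, "OML00004005": 12, "OML00004886": 1,
    "OML00005041": 4, "OML00006583": 8, "OML00009617": 11, "OML00010486": 6,
    "OML00010490": 5, "OML00010499": 3,
}

# EST counts of the ten top-ranked BlastX hits, transcribed from Table 3
TABLE3_TOP_ESTS = [240, 152, 132, 122, 90, 87, 63, 62, 61, 46]


def rank_among(total: int, counts: list) -> int:
    """1-based position that `total` would take in `counts` sorted in descending order."""
    return 1 + sum(1 for c in counts if c > total)


total = sum(TY1_COPIA_ESTS.values())
print(total, rank_among(total, TABLE3_TOP_ESTS))  # 97 5
```

With these numbers, the 14 Ty1/copia clusters together would rank fifth, between the NAD-dependent alcohol dehydrogenase (122 ESTs) and heat shock protein 70 (90 ESTs). Because this sets a family-wide total against per-hit totals, it supports only the rough statement that Ty1/copia transcripts are abundant enough for RT proteins plausibly to be present.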
The presence of RT proteins is an interesting possibility because it would support the hypothesized role of mRNA recycling in dinoflagellate genome evolution, by showing that retrotransposons can be a suitable source of RT activity [84,85].

Table 4 O. marina EST clusters with top BlastX hits corresponding to known transposable elements
Cluster ID | Top hit acc. | Top hit | E-value | ESTs
OML00000073 | XP_002422173 | Hypothetical protein FOXB_16913 [Fusarium oxysporum] | 8.00E-27 | 7
OML00000330 | ABA95820 | Hypothetical protein FOXB_16913 [Fusarium oxysporum] | 3.00E-38 | 20
OML00002280 | ACB59199 | Copia-like protein [Brassica oleracea] | 7.00E-15 | 3
OML00002762 | EFY94000 | Retrotransposon like protein [Metarhizium anisopliae ARSEF 23] | 8.00E-12 | 3
OML00002917 | ABF93649 | Retrotransposon protein Ty1-copia subclass [Oryza sativa Japonica Group] | 5.00E-18 | 9
OML00002925 | CAB46043 | Retrotransposon like protein [Arabidopsis thaliana] | 4.00E-16 | 5
OML00004005 | BAB01972 | Copia-like retrotransposable element [Arabidopsis thaliana] | 2.00E-10 | 12
OML00004886 | AAD32898 | Putative retroelement pol polyprotein [Arabidopsis thaliana] | 1.00E-16 | 1
OML00005041 | AAP46257 | Putative polyprotein [Oryza sativa Japonica Group] | 8.00E-15 | 4
OML00006583 | BAB01972 | Copia-like retrotransposable element [Arabidopsis thaliana] | 6.00E-12 | 8
OML00009617 | XP_003376336 | Retrovirus-related Pol polyprotein from transposon TNT 1-94 [Trichinella spiralis] | 2.00E-13 | 11
OML00010486 | EGU73258 | Hypothetical protein FOXB_16913 [Fusarium oxysporum] | 2.00E-21 | 6
OML00010490 | BAB01972 | Copia-like retrotransposable element [Arabidopsis thaliana] | 3.00E-12 | 5
OML00010499 | ABA95820 | Retrotransposon protein, putative, unclassified [Oryza sativa Japonica Group] | 1.00E-21 | 3
Total | | | | 97

Lateral gene transfer
Lateral gene transfer (LGT) has already been reported in O. marina, including the acquisition of bacterial AroB [15] and the acquisition of proteorhodopsin genes at least twice independently [14]. Detection of LGT in eukaryotic sequence data is not always clear-cut, however, since confounding factors such as sequence divergence, incomplete taxon sampling, a convoluted evolutionary history or the presence of contaminating bacteria in the culture can complicate the interpretation of the phylogenies of suspected LGT cases. When we looked for potential LGT cases in our data, we encountered combinations of all these factors, in particular the presence of about 600 bacterial sequences, presumably originating from contaminant DNA. While most of these sequences can be readily identified as contaminants, they lower the level of certainty for any potential LGT. In spite of this, we identified two additional genes with a conflicting phylogenetic signal suggesting horizontal acquisition from bacteria. Both cases share an intriguing pattern: the O. marina gene is closely associated with genes from unrelated eukaryotes and, beyond that, embedded among bacteria (similar to a growing number of other cases of LGT; see [86]). Additional file 4: Figure S3 shows a phylogenetic analysis of the deduced protein sequence of cluster OML00001921 together with two sequences from diatoms, one from the ichthyosporean Sphaeroforma arctica and 46 eubacterial sequences of L-lactate permease (LctP). The O. marina sequence forms a strongly supported clade with the other three eukaryotic sequences (100% bootstrap), which is in turn connected to various proteobacteria, mainly of the gamma and delta subgroups. The LctP protein catalyses the transport of L-lactate across membranes. It has been studied functionally in only a few bacterial species, most notably E. coli, where this gene is part of an operon involved in L-lactate utilisation [87]. In eukaryotes, members of the monocarboxylate transporter (MCT) family catalyze the proton-linked transport of monocarboxylates such as L-lactate, pyruvate and the ketone bodies across the plasma membrane.
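The LctP topology described here, a well-supported eukaryotic clade whose closest relatives are proteobacteria, amounts to two questions about a tree: do the eukaryotic sequences form a clade on their own, and which sequences form its sister group? The Python sketch below answers both for a rooted tree stored as nested tuples. It is an illustration under stated assumptions, not the method used for Additional file 4: Figure S3: the toy tree and all leaf labels are hypothetical, relationships among the eukaryotes are deliberately left unresolved, and because PhyML trees are unrooted, a root (for example, on an outgroup) would have to be chosen before such a check is meaningful.

```python
from typing import Optional, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of child subtrees


def leaves(tree: Tree) -> frozenset:
    """All leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def is_monophyletic(tree: Tree, taxa) -> bool:
    """True if some node's leaf set is exactly `taxa` (rooted-tree definition)."""
    target = frozenset(taxa)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, target) for child in tree)


def sister_group(tree: Tree, taxa) -> Optional[frozenset]:
    """Leaves of the sibling subtree(s) of the clade whose leaf set is exactly `taxa`."""
    target = frozenset(taxa)
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if leaves(child) == target:
            siblings = (c for j, c in enumerate(tree) if j != i)
            return frozenset().union(*(leaves(c) for c in siblings))
        found = sister_group(child, target)
        if found is not None:
            return found
    return None


# Hypothetical toy tree: unresolved eukaryotes sister to gamma/delta proteobacteria
toy_tree = (
    ("Firmicutes_1", "Actinobacteria_1"),
    (
        ("Gammaproteobacteria_1", "Deltaproteobacteria_1"),
        ("O_marina", "Diatom_1", "Diatom_2", "S_arctica"),
    ),
)
eukaryotes = {"O_marina", "Diatom_1", "Diatom_2", "S_arctica"}

print(is_monophyletic(toy_tree, eukaryotes))       # True
print(sorted(sister_group(toy_tree, eukaryotes)))  # ['Deltaproteobacteria_1', 'Gammaproteobacteria_1']
```

In a real analysis, the bootstrap support for the recovered node (100% for the eukaryotic LctP clade) would be read alongside such a topological check; tree libraries such as Biopython's Bio.Phylo offer comparable monophyly tests.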
Since LctP does not seem to have any relationship with the MCT family, and no other eukaryotic organisms were found to contain LctP-related sequences, an ancient, common origin of the four eukaryotic sequences shown in Additional file 4: Figure S3 is unlikely. At the same time, the fact that the eukaryotic sequences branch together to the exclusion of everything else makes this case intriguing. If the lctP genes were acquired independently by O. marina, the diatoms and the ichthyosporean S. arctica, they must have been transferred from the same or very similar donors. Alternatively, the gene may have been transferred from bacteria to one eukaryotic lineage and then transferred between eukaryotes [86]. The second case is shown in Figure 5. Several clusters were found to be highly similar to alcohol dehydrogenase (ADH) proteins that seem to be absent from other eukaryotes except for one species, Euglena gracilis. Alcohol dehydrogenases belong to a very large superfamily of ancient origin known as MDR (medium-chain dehydrogenase/reductase), which comprises zinc-dependent ADHs, quinone reductases and many more families and subfamilies [88]. The most recent comprehensive study of MDR sequences identified over 500 families that can be ascribed to the MDR superfamily, 8 of which are highly widespread, with ADH being the largest [88]. In addition, the study recognized 9 other families of restricted scope but of special interest for their functions or potential relevance. The ADH sequences from O. marina are most similar to one of these additional “special interest” MDR families, tentatively named the BurkDH family because of its prevalence among Burkholderia species and several other genera of proteobacteria, including Pseudomonas, Brucella, Ralstonia and Rhizobium (Figure 5). Surprisingly, only one eukaryotic protein sequence in GenBank belongs to this family, and it is from E. gracilis, a phototrophic freshwater protist completely unrelated to O. marina. Unfortunately, we are unable to make inferences about the adaptive roles of this acquisition in O. marina and E. gracilis, as no complete genome data are available for either organism, and therefore we do not know whether other MDR paralogs are present. In addition, the function of BurkDH proteins has not yet been investigated [88]. These two cases illustrate puzzling scenarios that are of great interest for understanding the evolutionary dynamics of metabolic adaptation, but they are difficult to interpret with the current data. Clearly, these cases will have to be reanalyzed when more comprehensive data from free-living protists (in these cases, dinoflagellates, euglenids and haptophytes) become available.

Figure 5 Schematic phylogenetic tree of amino acid sequences of zinc-dependent alcohol dehydrogenase proteins, including representatives from the main bacterial lineages as well as E. gracilis and O. marina (highlighted in a black box), which are the only eukaryotic organisms in which homologs have been detected. The numbers at the nodes indicate bootstrap support when higher than 50%.

Conclusions
Our EST dataset from O. marina has so far yielded interesting insights into the evolution, genetics, phylogeny and metabolism of this species and of dinoflagellates at large. Here we drew on this dataset to conduct additional investigations, this time concentrating on genes and molecular characteristics associated with nuclear and genomic biology, an area in which dinoflagellates are particularly unusual. We describe several gene categories and show that O. marina contains many of the widespread components of DNA repair and gene expression, suggesting that, in spite of the seemingly highly divergent nature of their nuclear processes, dinoflagellates still maintain many core eukaryotic mechanisms. Moreover, we find extensive gene redundancy and multiplicity, indicating transcription from multiple genomic loci. For some of the most highly represented transcripts, we estimate multiple genomic copies, suggesting a positive correlation between transcript abundance and genomic copy number, which may be a general dinoflagellate feature. Extending previous findings, we describe two striking examples of lateral gene transfer, reinforcing the idea that the acquisition of foreign genes plays an important role in shaping the O. marina genome, and further supporting the role of this phenomenon in adaptation in eukaryotes, particularly heterotrophic protists.

Additional files
Additional file 1: Table S1. PFAM domains found among the O. marina EST clusters with no hits to known proteins. Table S2. Primer sequences, melting temperature Tm (°C) and insert length (bp) for four O. marina genes. Table S3. O. marina encodes DNA repair and recombination proteins conserved in other eukaryotes. Homologs of components of the machinery for base excision repair (BER), mismatch repair (MMR), nucleotide excision repair (NER), homologous recombination (HR), meiosis-specific homologous recombination (HR-M1), DNA polymerase subunits involved in repair (DNAP), editing and processing nucleases (EPN), post-replication repair (PRR), chromatin structure relevant to repair (CS), the DNA damage checkpoint (DDC), DNA replication licensing (DRL) and DNA damage response (DDR) are present in O. marina. Data identified in the complete genome sequences of humans (H. sapiens), yeast (S. cerevisiae), kinetoplastids (T. brucei), parabasalids (T. vaginalis), apicomplexans (T. gondii, C. parvum, and a genome sequence survey of A. taiwanensis) and a dinoflagellate (P. marinus) are compared with the O. marina ESTs. Table S4. The O. marina EST dataset contains a number of sequences with hits to proteins involved in transcriptional regulation and splicing. Listed are reciprocal hits against a database built with curated proteins from H. sapiens and S. cerevisiae.

Additional file 2: Figure S1. O. marina encodes orthologs of meiosis-specific recombination genes. Aligned amino acid sites were analyzed by PhyML with one invariable and eight γ-distributed substitution rate categories and the LG substitution model. Numbers at the nodes indicate % bootstrap support (≥ 50%) from 1000 replicates. O. marina Spo11 is closely related to apicomplexan Spo11-2. 218 sites, LnL = -10981.3.

Additional file 3: Figure S2. O. marina encodes orthologs of meiosis-specific recombination genes. Aligned amino acid sites were analyzed by PhyML with one invariable and eight γ-distributed substitution rate categories and the LG substitution model. Numbers at the nodes indicate % bootstrap support (≥ 50%) from 1000 replicates. O. marina Hop2 is most closely related to its ortholog in Perkinsus marinus, within the alveolates. 172 sites, LnL = -6417.0.

Additional file 4: Figure S3. Phylogeny of representative L-lactate permease (LctP) proteins indicates that O. marina lctP is most closely related to lctP in diatoms and an ichthyosporean (S. arctica), which are derived from a clade of marine bacterial lctP homologs. 502 amino acid sites were analyzed by PhyML with one invariable and eight γ-distributed substitution rate categories and the LG substitution model. Numbers at the nodes indicate % support (≥ 50%) from 1000 bootstrap replicates. LnL = -23076.2. No other eukaryotic homologs were identified by BLASTp searches of the JGI, Broad Institute or NCBI non-redundant databases, nor by tBLASTn searches of dbEST-others, with an e-value cutoff of 1.
Some of the highly similar Neisseria and Haemophilus orthologous protein sequences were excluded from the phylogeny shown here.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
RL analysed data and drafted the paper, SBM conducted analysis of meiosis genes and lateral gene transfer and drafted sections of the paper, HL conducted real-time PCR experiments, JFS contributed to the data generation and analyses, PJK provided the cultures, contributed to data generation and drafted the paper, CHS conducted analyses, provided supervision and wrote the paper. All authors read and approved the final manuscript.

Acknowledgements
We thank Susana Breglia for the scanning electron micrographs depicted in Figure 3. C.H.S. and P.J.K. are Fellows of the Canadian Institute for Advanced Research, Program on Integrated Microbial Biodiversity. S-B.M. was supported by a CIFAR Global Academy fellowship. EST sequencing was done through the Protist EST Project (PEP) funded by Genome Canada and Genome Atlantic. P.J.K. is supported by NSERC (Discovery grant 227301); C.H.S. is supported by NSERC (Discovery Grant 386345-2010).

Author details
1 Canadian Institute for Advanced Research, Program in Integrated Microbial Biodiversity, Alberta, Canada. 2 Department of Biochemistry and Molecular Biology, Dalhousie University, B3H 4R2 Halifax, NS, Canada. 3 Botany Department, University of British Columbia, V6T 1Z4 Vancouver, BC, Canada.

Received: 19 May 2013 Accepted: 6 February 2014
Published: 11 February 2014

References
1. Montagnes DJS, Lowe CD, Roberts EC, Breckels MN, Boakes DE, Davidson K, Keeling PJ, Slamovits CH, Steinke M, Yang Z, et al: An introduction to the special issue: Oxyrrhis marina, a model organism? J Plankton Res 2011, 33(4):549–554.
2. Saldarriaga JF, McEwan ML, Fast NM, Taylor FJ, Keeling PJ: Multiple protein phylogenies show that Oxyrrhis marina and Perkinsus marinus are early branches of the dinoflagellate lineage. Int J Syst Evol Microbiol 2003, 53(Pt 1):355–365.
3. Slamovits CH, Saldarriaga JF, Larocque A, Keeling PJ: The highly reduced and fragmented mitochondrial genome of the early-branching dinoflagellate Oxyrrhis marina shares characteristics with both apicomplexan and dinoflagellate mitochondrial genomes. J Mol Biol 2007, 372(2):356–368.
4. Adl SM, Simpson AG, Lane CE, Lukes J, Bass D, Bowser SS, Brown MW, Burki F, Dunthorn M, Hampl V, et al: The revised classification of eukaryotes. J Eukaryot Microbiol 2012, 59(5):429–493.
5. Lowe CD, Keeling PJ, Martin LE, Slamovits CH, Watts PC, Montagnes DJS: Who is Oxyrrhis marina? Morphological and phylogenetic studies on an unusual dinoflagellate. J Plankton Res 2011, 33(4):555–567.
6. Moreno Diaz de la Espina S, Alverca E, Cuadrado A, Franca S: Organization of the genome and gene expression in a nuclear environment lacking histones and nucleosomes: the amazing dinoflagellates. Eur J Cell Biol 2005, 84(2-3):137–149.
7. Rizzo PJ: Biochemistry of the dinoflagellate nucleus. In The Biology of Dinoflagellates. Edited by Taylor FJR. Blackwell Botanical Monographs. Oxford, UK: Blackwell Publishing; 1987:143–173.
8. Rizzo PJ: The enigma of the dinoflagellate chromosomes. J Protozool 1991, 38:246–252.
9. Rizzo PJ: Those amazing dinoflagellate chromosomes. Cell Res 2003, 13(4):215–217.
10. Spector DL: Dinoflagellate nuclei. In Dinoflagellates. Edited by Spector DL. New York: Academic Press; 1984:107–147.
11. Droop MR: Nutritional investigations of phagotrophic protozoa under axenic conditions. Helgolander Wiss Meeresunters 1970, 20:272–277.
12. Lowe CD, Martin LE, Watts PC, et al: Isolation and culturing strategies for the maintenance of Oxyrrhis marina. J Plankton Res 2011, 33(4):569–578.
13. Slamovits CH, Keeling PJ: Plastid-derived genes in the nonphotosynthetic alveolate Oxyrrhis marina. Mol Biol Evol 2008, 25(7):1297–1306.
14. Slamovits CH, Okamoto N, Burri L, James ER, Keeling PJ: A bacterial proteorhodopsin proton pump in marine eukaryotes. Nat Commun 2011, 2:183.
15. Waller RF, Slamovits CH, Keeling PJ: Lateral gene transfer of a multigene region from cyanobacteria to dinoflagellates resulting in a novel plastid-targeted fusion protein. Mol Biol Evol 2006, 23(7):1437–1443.
16. Fernandez-Robledo JA, Schott EJ, Vasta GR: Perkinsus marinus superoxide dismutase 2 (PmSOD2) localizes to single-membrane subcellular compartments. Biochem Biophys Res Commun 2008, 375(2):215–219.
17. Janouskovec J, Horak A, Obornik M, Lukes J, Keeling PJ: A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc Natl Acad Sci U S A 2010, 107(24):10949–10954.
18. Sanchez-Puerta MV, Lippmeier JC, Apt KE, Delwiche CF: Plastid genes in a non-photosynthetic dinoflagellate. Protist 2007, 158(1):105–117.
19. Nash EA, Nisbet RE, Barbrook AC, Howe CJ: Dinoflagellates: a mitochondrial genome all at sea. Trends Genet 2008, 24(7):328–335.
20. Waller RF, Jackson CJ: Dinoflagellate mitochondrial genomes: stretching the rules of molecular biology. Bioessays 2009, 31(2):237–245.
21. Lowe CD, Mello LV, Samatar N, Martin LE, Montagnes DJ, Watts PC: The transcriptome of the novel dinoflagellate Oxyrrhis marina (Alveolata: Dinophyceae): response to salinity examined by 454 sequencing. BMC Genomics 2011, 12:519.
22. O'Brien EA, Koski LB, Zhang Y, Yang L, Wang E, Gray MW, Burger G, Lang BF: TBestDB: a taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res 2007, 35(Database issue):D445–D451.
23. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998, 8(3):186–194.
24. Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A: High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 2008, 36(10):3420–3435.
25. Wood RD, Mitchell M, Lindahl T: Human DNA repair genes, 2005. Mutat Res 2005, 577(1-2):275–283.
26. Wood RD, Mitchell M, Sgouros J, Lindahl T: Human DNA repair genes. Science 2001, 291(5507):1284–1289.
27. Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu DC, Lennard NJ, Caler E, Hamlin NE, Haas B, et al: The genome of the African trypanosome Trypanosoma brucei. Science 2005, 309(5733):416–422.
28. El-Sayed NM, Myler PJ, Blandin G, Berriman M, Crabtree J, Aggarwal G, Caler E, Renauld H, Worthey EA, Hertz-Fowler C, et al: Comparative genomics of trypanosomatid parasitic protozoa. Science 2005, 309(5733):404–409.
29. Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q, Wortman JR, Bidwell SL, Alsmark UC, Besteiro S, et al: Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 2007, 315(5809):207–212.
30. Malik SB, Pightling AW, Stefaniak LM, Schurko AM, Logsdon JM Jr: An expanded inventory of conserved meiotic genes provides evidence for sex in Trichomonas vaginalis. PLoS One 2008, 3(8):e2879.
31. Blanton HL, Radford SJ, McMahan S, Kearney HM, Ibrahim JG, Sekelsky J: REC, Drosophila MCM8, drives formation of meiotic crossovers. PLoS Genet 2005, 1(3):e40.
32. Cobbe N, Heck MM: The evolution of SMC proteins: phylogenetic analysis and structural implications. Mol Biol Evol 2004, 21(2):332–347.
33. Melo J, Toczyski D: A unified view of the DNA-damage checkpoint. Curr Opin Cell Biol 2002, 14(2):237–245.
34. Templeton TJ, Enomoto S, Chen WJ, Huang CG, Lancto CA, Abrahamsen MS, Zhu G: A genome-sequence survey for Ascogregarina taiwanensis supports evolutionary affiliation but metabolic diversity between a Gregarine and Cryptosporidium. Mol Biol Evol 2010, 27(2):235–248.
35. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 2010, 59(3):307–321.
36. Katoh K, Toh H: Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics 2010, 26(15):1899–1900.
37. Larionov A, Krause A, Miller W: A standard curve based method for relative real time PCR data processing. BMC Bioinformatics 2005, 6:62.
38. Baxevanis AD, Ouellette BF: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Hoboken, NJ, USA: John Wiley & Sons, Inc; 2005.
39. Mangot JF, Debroas D, Domaizon I: Perkinsozoa, a well-known marine protozoan flagellate parasite group, newly identified in lacustrine systems: a review. Hydrobiologia 2011, 659(1):37–48.
40. Shoguchi E, Shinzato C, Kawashima T, Gyoja F, Mungpakdee S, Koyanagi R, Takeuchi T, Hisata K, Tanaka M, Fujiwara M, et al: Draft assembly of the Symbiodinium minutum nuclear genome reveals dinoflagellate gene structure. Curr Biol 2013, 23(15):1399–1408.
41. Bodyl A, Stiller JW, Mackiewicz P: Chromalveolate plastids: direct descent or multiple endosymbioses? Trends Ecol Evol 2009, 24(3):119–121. author reply 121-112.
42. Burki F, Flegontov P, Obornik M, Cihlar J, Pain A, Lukes J, Keeling PJ: Re-evaluating the green versus red signal in eukaryotes with secondary plastid of red algal origin. Genome Biol Evol 2012, 4(6):626–635.
43. Keeling PJ: The endosymbiotic origin, diversification and fate of plastids. Philos Trans R Soc Lond B Biol Sci 2010, 365(1541):729–748.
44. Doolittle WF: You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet 1998, 14(8):307–311.
45. Toulza E, Shin MS, Blanc G, Audic S, Laabir M, Collos Y, Claverie JM, Grzebyk D: Gene expression in proliferating cells of the dinoflagellate Alexandrium catenella (Dinophyceae). Appl Environ Microbiol 2010, 76(13):4521–4529.
46. Lidie KB, Van Dolah FM: Spliced leader RNA-mediated trans-splicing in a dinoflagellate, Karenia brevis. J Eukaryot Microbiol 2007, 54(5):427–435.
47. Bachvaroff TR, Place AR, Coats DW: Expressed sequence tags from Amoebophrya sp. infecting Karlodinium veneficum: comparing host and parasite sequences. J Eukaryot Microbiol 2009, 56(6):531–541.
48. Yang I, John U, Beszteri S, Glockner G, Krock B, Goesmann A, Cembella AD: Comparative gene expression in toxic versus non-toxic strains of the marine dinoflagellate Alexandrium minutum. BMC Genomics 2010, 11:248.
49. Oh HM, Kang I, Vergin KL, Lee K, Giovannoni SJ, Cho JC: Genome sequence of Oceanicaulis sp. strain HTCC2633, isolated from the Western Sargasso Sea. J Bacteriol 2011, 193(1):317–318.
50. Strompl C, Hold GL, Lunsdorf H, Graham J, Gallacher S, Abraham WR, Moore ER, Timmis KN: Oceanicaulis alexandrii gen. nov., sp. nov., a novel stalked bacterium isolated from a culture of the dinoflagellate Alexandrium tamarense (Lebour) Balech. Int J Syst Evol Microbiol 2003, 53(Pt 6):1901–1906.
51. Alavi M, Miller T, Erlandson K, Schneider R, Belas R: Bacterial community associated with Pfiesteria-like dinoflagellate cultures. Environ Microbiol 2001, 3(6):380–396.
52. Biegala IC, Kennaway G, Alverca E, Lennon JF, Vaulot D, Simon N: Identification of bacteria associated with dinoflagellates (Dinophyceae) Alexandrium spp. using tyramide signal amplification-fluorescent in situ hybridization and confocal microscopy. J Phycol 2002, 38(2):404–411.
53. Jasti S, Sieracki ME, Poulton NJ, Giewat MW, Rooney-Varga JN: Phylogenetic diversity and specificity of bacteria closely associated with Alexandrium spp. and other phytoplankton. Appl Environ Microbiol 2005, 71(7):3483–3494.
54. Zhang H, Hou Y, Miranda L, Campbell DA, Sturm NR, Gaasterland T, Lin S: Spliced leader RNA trans-splicing in dinoflagellates. Proc Natl Acad Sci U S A 2007, 104(11):4618–4623.
55. Bertomeu T, Morse D: Isolation of a dinoflagellate mitotic cyclin by functional complementation in yeast. Biochem Biophys Res Commun 2004, 323(4):1172–1183.
56. Lee DH, Mittag M, Sczekan S, Morse D, Hastings JW: Molecular cloning and genomic organization of a gene for luciferin-binding protein from the dinoflagellate Gonyaulax polyedra. J Biol Chem 1993, 268(12):8842–8850.
57. Bachvaroff TR, Place AR: From stop to start: tandem gene arrangement, copy number and trans-splicing sites in the dinoflagellate Amphidinium carterae. PLoS One 2008, 3(8):e2929.
58. Sano J, Kato KH: Localization and copy number of the protein-coding genes actin, alpha-tubulin, and HSP90 in the nucleus of a primitive dinoflagellate, Oxyrrhis marina. Zoolog Sci 2009, 26(11):745–753.
59. Hou Y, Lin S: Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes. PLoS One 2009, 4(9):e6978.
60. Jaeckisch N, Yang I, Wohlrab S, Glockner G, Kroymann J, Vogel H, Cembella A, John U: Comparative genomic and transcriptomic characterization of the toxigenic marine dinoflagellate Alexandrium ostenfeldii. PLoS One 2011, 6(12):e28012.
61. Morey JS, Monroe EA, Kinney AL, Beal M, Johnson JG, Hitchcock GL, Van Dolah FM: Transcriptomic response of the red tide dinoflagellate, Karenia brevis, to nitrogen and phosphorus depletion and addition. BMC Genomics 2011, 12(1):346.
62. Moustafa A, Evans AN, Kulis DM, Hackett JD, Erdner DL, Anderson DM, Bhattacharya D: Transcriptome profiling of a toxic dinoflagellate reveals a gene-rich protist and a potential impact on gene expression due to bacterial presence. PLoS One 2010, 5(3):e9688.
63. Ganley AR, Kobayashi T: Highly efficient concerted evolution in the ribosomal DNA repeats: total rDNA repeat variation revealed by whole-genome shotgun sequence data. Genome Res 2007, 17(2):184–191.
64. Robellet X, Flipphi M, Pegot S, Maccabe AP, Velot C: AcpA, a member of the GPR1/FUN34/YaaH membrane protein family, is essential for acetate permease activity in the hyphal fungus Aspergillus nidulans. Biochem J 2008, 412(3):485–493.
65. Gentsch M, Kuschel M, Schlegel S, Barth G: Mutations at different sites in members of the Gpr1/Fun34/YaaH protein family cause hypersensitivity to acetic acid in Saccharomyces cerevisiae as well as in Yarrowia lipolytica. FEMS Yeast Res 2007, 7(3):380–390.
66. Duftner N, Larkins-Ford J, Legendre M, Hofmann HA: Efficacy of RNA amplification is dependent on sequence characteristics: implications for gene expression profiling using a cDNA microarray. Genomics 2008, 91(1):108–117.
67. Brunelle SA, Van Dolah FM: Post-transcriptional regulation of S-phase genes in the dinoflagellate, Karenia brevis. J Eukaryot Microbiol 2011, 58(4):373–382.
68. Erdner DL, Anderson DM: Global transcriptional profiling of the toxic dinoflagellate Alexandrium fundyense using massively parallel signature sequencing. BMC Genomics 2006, 7:88.
69. Machabee S, Wall L, Morse D: Expression and genomic organization of a dinoflagellate gene family. Plant Mol Biol 1994, 25(1):23–31.
70. Von Stosch HA: La signification cytologique de la cyclose nucleaire dans le cycle de vie des Dinoflagellates. Soc Bot Fr Memoires 1972:201–212.
71. Von Stosch HA: Observations on vegetative reproduction and sexual life cycles of two freshwater dinoflagellates, Gymnodinium pseudopalustre Schiller and Woloszynskia apiculata sp. nov. Br Phycol J 1973, 8:105–134.
72. Montagnes DJ, Lowe CD, Martin L, Watts PC, Downes-Tettmar N, Yang Z, Roberts EC, Davidson K: Oxyrrhis marina growth, sex and reproduction. J Plankton Res 2011, 33(4):615–627.
73. Ramesh MA, Malik SB, Logsdon JM Jr: A phylogenomic inventory of meiotic genes; evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr Biol 2005, 15(2):185–191.
74. Villeneuve AM, Hillers KJ: Whence meiosis? Cell 2001, 106(6):647–650.
75. Gornik SG, Ford KL, Mulhern TD, Bacic A, McFadden GI, Waller RF: Loss of nucleosomal DNA condensation coincides with appearance of a novel nuclear protein in dinoflagellates. Curr Biol 2012, 22(24):2303–2312.
76. Li JY: Studies on dinoflagellate chromosomal basic protein. Biosystems 1983, 16(3-4):217–225.
77. Roy S, Morse D: A full suite of histone and histone modifying genes are transcribed in the dinoflagellate Lingulodinium. PLoS One 2012, 7(4):e34340.
78. Wong JT, New DC, Wong JC, Hung VK: Histone-like proteins of the dinoflagellate Crypthecodinium cohnii have homologies to bacterial DNA-binding proteins. Eukaryot Cell 2003, 2(3):646–650.
79. Kato KH, Moriyama A, Huitorel P, Cosson J, Cachon M, Sato H: Isolation of the major basic nuclear protein and its localization on chromosomes of the dinoflagellate, Oxyrrhis marina. Biol Cell 1997, 89(1):43–52.
80. Okamoto OK, Hastings JW: Genome-wide analysis of redox-regulated genes in a dinoflagellate. Gene 2003, 321:73–81.
81. Okamoto OK, Hastings JW: Novel dinoflagellate clock-related genes identified through microarray analysis. J Phycol 2003, 39:519–526.
82. Valadkhan S: Role of the snRNAs in spliceosomal active site. RNA Biol 2010, 7(3):345–353.
83. Blumenthal T: Trans-splicing and operons in C. elegans. In WormBook. Edited by The C. elegans Research Community; 2012. doi:10.1895/wormbook.1.5.2, http://www.wormbook.org.
84. Slamovits CH, Keeling PJ: Widespread recycling of processed cDNAs in dinoflagellates. Curr Biol 2008, 18(13):R550–R552.
85. Slamovits CH, Keeling PJ: Contributions of Oxyrrhis marina to molecular biology, genomics and organelle evolution of dinoflagellates. J Plankton Res 2011, 33(4):591–602.
86. Andersson JO, Roger AJ: Evolution of glutamate dehydrogenase genes: evidence for lateral gene transfer within and between prokaryotes and eukaryotes. BMC Evol Biol 2003, 3:14.
87. Dong JM, Taylor JS, Latour DJ, Iuchi S, Lin EC: Three overlapping lct genes involved in L-lactate utilization by Escherichia coli. J Bacteriol 1993, 175(20):6671–6678.
88. Persson B, Hedlund J, Jornvall H: Medium- and short-chain dehydrogenase/reductase gene and protein families: the MDR superfamily. Cell Mol Life Sci 2008, 65(24):3879–3894.

doi:10.1186/1471-2164-15-122
Cite this article as: Lee et al.: Analysis of EST data of the marine protist Oxyrrhis marina, an emerging model for alveolate biology and evolution. BMC Genomics 2014, 15:122.
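The conclusions report genomic copy-number estimates for some highly represented transcripts, and Additional file 1 (Table S2) lists real-time PCR primers, but the quantification procedure itself is not described in this section. One common approach, which the following Python sketch assumes, fits a standard curve of Ct against log10 template copies from a dilution series, converts sample Ct values into copy estimates, and divides the target estimate by that of a presumed single-copy reference gene amplified from the same genomic DNA. The function names and the dilution values are hypothetical and are not data from this study.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_standard_curve(log10_copies: Sequence[float],
                       ct_values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    x_bar = mean(log10_copies)
    y_bar = mean(ct_values)
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 means roughly 100% efficiency."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, curve: Tuple[float, float]) -> float:
    """Invert the standard curve to estimate starting template copies."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def relative_copy_number(ct_target: float, target_curve: Tuple[float, float],
                         ct_reference: float,
                         reference_curve: Tuple[float, float]) -> float:
    """Target copies per copy of a presumed single-copy reference gene,
    with both genes measured on the same genomic DNA sample (hypothetical setup)."""
    target = copies_from_ct(ct_target, target_curve)
    reference = copies_from_ct(ct_reference, reference_curve)
    return target / reference


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series (10^2 to 10^6 copies); not data from the paper.
    dilution = [2.0, 3.0, 4.0, 5.0, 6.0]
    cts = [38.0 - 3.32 * x for x in dilution]
    curve = fit_standard_curve(dilution, cts)
    print("slope, intercept:", curve)
    print("efficiency:", round(amplification_efficiency(curve[0]), 3))
```

If the target and reference assays have markedly different amplification efficiencies, fitting a separate curve for each (as the function signature allows) is preferable to reusing one curve for both.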

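Table S4 in Additional file 1 lists reciprocal hits between O. marina ESTs and a database of curated H. sapiens and S. cerevisiae proteins; the search programs and thresholds are not given in this section. The Python sketch below shows one standard way to call reciprocal best hits, assuming BLAST tabular output (`-outfmt 6`, bit score in column 12) for both search directions. The function names and the identifiers in the test data are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Sequence, Tuple


def best_hits(rows: Iterable[Sequence[str]]) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore subject for each query.

    Assumes BLAST tabular output (-outfmt 6): column 1 is the query ID,
    column 2 the subject ID and column 12 the bit score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(forward: Iterable[Sequence[str]],
                         reverse: Iterable[Sequence[str]]) -> List[Tuple[str, str]]:
    """Return sorted (query, subject) pairs that are each other's best hit.

    forward: ESTs searched against the curated protein database.
    reverse: curated proteins searched back against the ESTs.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted(
        (query, subject)
        for query, (subject, _) in fwd.items()
        if subject in rev and rev[subject][0] == query
    )


def read_blast_table(path: str) -> List[List[str]]:
    """Load a tab-separated BLAST table, skipping blank and comment lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")
                if row and not row[0].startswith("#")]
```

A call such as `reciprocal_best_hits(read_blast_table("est_vs_curated.tsv"), read_blast_table("curated_vs_est.tsv"))` would return the matched pairs; both file names are placeholders. Ranking by bit score rather than e-value avoids ties among very strong hits whose e-values are reported as zero.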

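The captions of Additional files 2 to 4 specify maximum-likelihood protein trees inferred with PhyML under the LG model with an invariant-site category plus 8 γ-distributed rate categories (LG+I+Γ8) and 1000 bootstrap replicates. The Python sketch below builds an equivalent PhyML 3 command line, assuming a PHYLIP-formatted protein alignment; the file name is hypothetical, and settings the captions do not state (tree-search moves, starting tree) are left at PhyML defaults rather than reconstructed.

```python
import shlex
import subprocess
from typing import List


def phyml_command(alignment: str, bootstraps: int = 1000,
                  gamma_categories: int = 8) -> List[str]:
    """Build a PhyML 3 command line for an LG + invariant + gamma protein tree."""
    return [
        "phyml",
        "-i", alignment,              # protein alignment in PHYLIP format
        "-d", "aa",                   # amino acid data
        "-m", "LG",                   # LG substitution model
        "-v", "e",                    # estimate the proportion of invariant sites
        "-c", str(gamma_categories),  # number of gamma rate categories
        "-a", "e",                    # estimate the gamma shape parameter
        "-b", str(bootstraps),        # non-parametric bootstrap replicates
    ]


def run_phyml(alignment: str) -> None:
    """Run PhyML; requires the phyml executable on PATH."""
    subprocess.run(phyml_command(alignment), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; the file name is hypothetical.
    print(shlex.join(phyml_command("lctp_alignment.phy")))
```

Showing only nodes with at least 50% support, as in the figures, is a tree-drawing step applied afterwards and is not part of this command.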
Related Items