UBC Faculty Research and Publications

Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod… Rawlings, Timothy A; MacInnis, Martin J; Bieler, Rüdiger; Boore, Jeffrey L; Collins, Timothy M Jul 19, 2010

RESEARCH ARTICLE Open Access

Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs

Timothy A Rawlings1*, Martin J MacInnis1,2, Rüdiger Bieler3, Jeffrey L Boore4, Timothy M Collins5,6

Abstract

Background: Widespread sampling of vertebrates, which comprise the majority of published animal mitochondrial genomes, has led to the view that mitochondrial gene rearrangements are relatively rare, and that gene orders are typically stable across major taxonomic groups. In contrast, more limited sampling within the Phylum Mollusca has revealed an unusually high number of gene order arrangements. Here we provide evidence that the lability of the molluscan mitochondrial genome extends to the family level by describing extensive gene order changes that have occurred within the Vermetidae, a family of sessile marine gastropods that radiated from a basal caenogastropod stock during the Cenozoic Era.

Results: Major mitochondrial gene rearrangements have occurred within this family at a scale unexpected for such an evolutionarily young group and unprecedented for any caenogastropod examined to date. We determined the complete mitochondrial genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mitochondrial genomes of two others (Vermetus erectus and Thylaeodus sp.). Each of the six vermetid gastropods assayed possessed a unique gene order. In addition to the typical mitochondrial genome complement of 37 genes, additional tRNA genes were evident in D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnL(UUR)). Three pseudogenes and additional tRNAs found within the genome of Thylacodes squamigerus provide evidence of a past duplication event in this taxon. Likewise, high sequence similarities between isoaccepting leucine tRNAs in Thylacodes, Eualetes, and Thylaeodus suggest that tRNA remolding has been rife within this family.
While vermetids exhibit gene arrangements diagnostic of this family, they also share arrangements with littorinimorph caenogastropods, with which they have been linked based on sperm morphology and primary sequence-based phylogenies.

Conclusions: We have uncovered major changes in gene order within a family of caenogastropod molluscs that are indicative of a highly dynamic mitochondrial genome. Studies of mitochondrial genomes at such low taxonomic levels should help to illuminate the dynamics of gene order change, since the telltale vestiges of gene duplication, translocation, and remolding have not yet been erased entirely. Likewise, gene order characters may improve phylogenetic hypotheses at finer taxonomic levels than once anticipated and aid in investigating the conditions under which sequence-based phylogenies lack resolution or prove misleading.

* Correspondence: Timothy_Rawlings@cbu.ca
1 Cape Breton University, 1250 Grand Lake Road, Sydney, NS B1P 6L2, CANADA

Rawlings et al. BMC Genomics 2010, 11:440
http://www.biomedcentral.com/1471-2164/11/440

© 2010 Rawlings et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Animal mitochondrial (mt) genomes typically consist of a circular molecule of DNA encoding 37 genes (2 rRNA genes, 13 protein-encoding genes, and 22 tRNA genes), the arrangement of which is often highly conserved within major taxonomic groups [1]. Consequently, when gene rearrangements occur, they may provide compelling phylogenetic markers that can corroborate or contradict hypotheses based on primary sequence data and provide resolution for deeper nodes that are often weakly supported in sequence-based phylogenies [2-6]. With recent technological and methodological advances (e.g., rolling circle amplification: [7,8]; next generation sequencing technologies: [9]), and associated decreasing costs of DNA sequencing, the amplification and sequencing of whole mt genomes has become routine. As a result, there has been a marked increase in the sequencing of whole animal mt genomes over the past decade as well as the development of computational methods to extract phylogenetic information from these genomes through inferences of past gene dynamics [10-12]. To date, 1868 complete metazoan mt genomes are available in the NCBI Genomes database (http://www.ncbi.nlm.nih.gov/guide/genomes/; January 8, 2010), the majority belonging to arthropods (293) and vertebrates (1292).

Compared to other major metazoan phyla, molluscan mitochondrial genomes are poorly represented at NCBI [13], with only 78 complete mt genomes available as of January, 2010. Despite this, molluscan mt genomes are beginning to challenge the traditional view that mitochondrial gene orders are stable over long periods of evolutionary time [13-16], a view based largely on the heavily sampled and highly conserved mt genomes of vertebrates. Instead, mollusc mt genomes demonstrate substantial heterogeneity in length and “architecture” [16], reflecting differences in gene complement resulting from gene loss or duplication, as well as changes in the position and strand specificity of tRNA genes, protein-encoding genes, and rRNA genes. Changes in gene arrangement within the Mollusca have been so dramatic that representatives of four classes of molluscs (Gastropoda; Bivalvia; Cephalopoda; Scaphopoda) share remarkably few mitochondrial gene boundaries, with gene orders varying extensively even across major lineages of bivalves as well as gastropods [14].
Changes in gene arrangement have also been observed within bivalve and gastropod genera, based on changes in position of: 1) tRNAs and an rRNA gene in the oyster, Crassostrea [17], and 2) protein-encoding and tRNA genes in the vermetid marine gastropod genus, Dendropoma [18]. Differences in gene order are also evident between paternally versus maternally inherited mitochondrial genomes of bivalves exhibiting doubly uniparental inheritance [19], including the unionid freshwater bivalve, Inversidens japanensis [14], and the marine venerid clam, Venerupis (Ruditapes) philippinarum (NCBI, unpublished). Similar intrageneric gene translocations have now been described in 19 of 144 genera in which two or more complete mt genomes have been sequenced [16], including representatives of the Porifera, Platyhelminthes, Nematoda, Mollusca, Arthropoda and Chordata. Thus, growing evidence suggests that mt genomes of many metazoan phyla may be considerably more plastic than originally believed, with the conserved genome architecture of vertebrates reflecting a derived stabilization of the mt genome and not an ancestral feature [16].

The discovery of mt gene order changes at lower taxonomic levels, as found within the Mollusca, is exciting for several reasons. First, gene dynamics involving translocations and inversions of genes offer the promise of new and robust characters that can be used to support phylogenetic hypotheses at the level of families, genera, and species [18]. Given the comparatively low rate of rearrangement and the astronomical number of possible gene arrangements, convergence is likely to be rare compared to four-state nucleotide sequence data [20]. Second, it is becoming increasingly apparent that the application of mitochondrial sequences and gene order data to questions of evolutionary history and phylogenetic relatedness requires a better understanding of the evolutionary dynamics of mt genomes [21].
Basic mechanisms of gene rearrangement associated with slipped-strand mispairing [22], errors in replication origins or end points [23], and intramolecular recombination [24], remain poorly understood. Likewise, tRNA remolding and tRNA recruitment events [25-28], gene rearrangement “hotspots” [29,30], the non-random loss of duplicated genes [31] and gene order homoplasy [32-34], which can act to confound phylogenetic inferences based on mtDNA sequences and gene orders, need to be explored more fully. Comparison of gene arrangements at low taxonomic levels can help to elucidate the process of gene rearrangement. For instance, the signature of specific processes such as tRNA remolding or recruitment can be most easily recognized when such events have occurred recently, since remolded or recruited tRNAs can be identified through high similarity scores and phylogenetic analyses [27,28]. Likewise, those taxonomic groupings with unusually labile genomes offer the opportunity to investigate the mechanics of gene rearrangement: telltale vestiges of gene duplication and translocation, typically erased or overwritten with time, may still be present within these genomes [35] and such intermediate stages can be critical to reconstructing the processes through which such gene rearrangements have occurred. Comparisons of mt genomes at low taxonomic levels, even within families and genera, can thus be extremely helpful in interpreting the evolutionary dynamics of these genomes and exploiting the phylogenetic signal retained within these DNA molecules [16].

Here we present further evidence of highly dynamic molluscan mt genomes by revealing extensive gene order changes within members of one caenogastropod family: the Vermetidae. Vermetids are a group of sessile, irregularly coiled, suspension-feeding gastropods found in warm temperate to tropical oceans around the world that radiated from a basal caenogastropod stock in the early Cenozoic Era.
They are currently classified as members of the Hypsogastropoda [36,37], a large and diverse group with a fossil record extending back to the Permo-Triassic boundary that includes all extant caenogastropods, except for the Architaenioglossa, Cerithioidea and Campaniloidea. While relationships within the Hypsogastropoda are not well resolved, vermetids are typically positioned within the infraorder Littorinimorpha. More specifically, molecular analyses suggest that vermetids are members of a largely asiphonate clade of gastropods including the Littorinidae, Eatoniellidae, Rissoidae, Anabathridae, Hipponicidae, Pterotracheidae, Epitoniidae, Cerithiopsidae, Eulimidae, and Naticidae [37]. This association is also supported by morphological similarities in euspermatozoa shared by many members of this clade [38].

Gene order rearrangements have been recognized previously in this family [18] based on small (<3.5 kb) portions of the mt genome sequenced from several species within the genus Dendropoma. In this paper, we expand upon these earlier results by providing complete mt genomes for two Dendropoma species as well as for representatives of two other vermetid genera, Thylacodes and Eualetes. We also reveal additional gene rearrangements within this family through the partial genomes of the vermetid genera Thylaeodus and Vermetus. The extent of gene rearrangement within the family offers great potential for improving our phylogenetic hypothesis for the enigmatic Vermetidae as well as for understanding more fully the mechanics of gene order change within metazoan mt genomes.

Methods

Amplification and sequencing

Selection of taxa was based on: 1) the discovery of novel gene orders in these species following PCR amplifications spanning gene boundaries [18] and 2) those genomes that successfully amplified using long and accurate PCR (LAPCR).
The collecting locality, tissue source, and Field Museum of Natural History (FMNH) voucher information for each specimen are presented in Table 1, along with GenBank accession numbers, primer sequences, and lengths of amplification products. Other gene arrangements have been identified in additional vermetids based on partial genome sequences (<3 kb), but these results are not presented here (Rawlings et al., in prep).

DNA was extracted from ethanol-preserved tissues using a phenol chloroform extraction protocol as described in [39]. Initially, an 1800 bp region of the mitochondrial genome spanning the rrnS to nad1 region was amplified as part of a phylogenetic analysis of Vermetidae [18]. This sequence was subsequently used to design outwardly facing primers for LAPCR (Table 1: rrnL-F; rrnL-R). Because attempts to amplify the genome in one piece were not successful, we amplified a 650 bp fragment of cox1 using modifications of Folmer’s widely used cox1 primers [40]. This cox1 sequence was then used as a template for designing a second pair of primers (cox1-F; cox1-R). Successful amplifications were associated with the primer combinations: rrnL-F/cox1-R (“A” fragment) and rrnL-R/cox1-F (“B” fragment). LAPCR reactions were undertaken using GeneAmp XL PCR kits (Applied Biosystems; N8080193). Typically, 25 μL reactions were set up in two parts separated by a wax bead following the manufacturer’s recommendations, using a [Mg(OAc)2] of 1.2 mM and an annealing temperature specific to the primer combination. Typical conditions consisted of a 94°C denaturation period lasting 60 s, followed by: 16 cycles at 94°C for 25 s, 60°C for 60 s, and 68°C for 10 min; 18 cycles at 94°C for 25 s, 60°C for 60 s, and 68°C for 12 min; and a final extension period at 72°C for 10 min. Amplifications were run on a Stratagene Gradient Robocycler. PCR products were cleaned by separating high molecular weight products from primers and sequencing reagents using Millipore Ultrafree filter columns [7].
Samples were added to 200 μL of sterile water and then spun in a picofuge for 15 min or until the filter membranes were dry. PCR products were eluted from the membrane in 20 μL of sterile distilled water, and 5 μL of this product was run out on a 0.8% agarose gel to confirm the presence of a band of the appropriate size. Typically, products from several replicate PCR reactions were pooled prior to quantitation. Once 3 ng of PCR product had been obtained, samples were dried down in a vacuum centrifuge and sent to the Joint Genome Institute, Walnut Creek, CA, where they were sequenced using standard shotgun sequencing protocols [7,8].

Table 1 Sampling localities, tissue sources, and PCR primer sequences for the six vermetid species examined in this study

Dendropoma maximum (Sowerby 1825)
  FMNH voucher #: FMNH 318221
  GenBank accession #: HM174253
  Locality: Gulf of Aqaba, Jordan
  Tissue: Foot tissue; buccal mass
  Primer Set A (5’-3’) rrnL-F/cox1-R: CGAATTGAAAGGGGGGCTTGTGACCTCGATGTTGTTTCGATCCGTTAAAAGCATAGTGATAGCTCC
  Fragment size (bp): 6877
  Primer Set B (5’-3’) rrnL-R/cox1-F: ACGCTACCTTCGCACGGTCAAAGTACCGCGGCTTGTTATGCCAATAATGATTGGTGGTTTCGG
  Fragment size (bp): 8809

Dendropoma gregarium Hadfield & Kay 1972
  FMNH voucher #: FMNH 318222
  GenBank accession #: HM174252
  Locality: HI, USA
  Tissue: Foot tissue; buccal mass
  Primer Set A (5’-3’) rrnL-F/cox1-R: CAAATCGAAAAAAGGGTTTGCGACCTCGATGTTGTTACGGTCAGTTAAGAGTATGGTAATAGCACC
  Fragment size (bp): 6490
  Primer Set B (5’-3’) rrnL-R/cox1-F: ATGCTACCTTTGCACGGTCAGGGTACCGCGGCTGGTAATACCCATGATAATTGGAGGTTTTGG
  Fragment size (bp): 9254

Eualetes tulipa (Chenu 1843)
  FMNH voucher #: FMNH 318223
  GenBank accession #: HM174254
  Locality: Peanut Island, Palm Beach Co., FL, USA
  Tissue: Buccal mass
  Primer Set A (5’-3’) rrnL-F/cox1-R: CATATCGAAAGAATAGTTTGCGACCTCGATGTTGTTTCGGTCCGTCAACAATATTGTAATTGCCCC
  Fragment size (bp): 6880
  Primer Set B (5’-3’) rrnL-R/cox1-F: TTCAACGAGAGCGACGGGCGATATGTACAC (rrnS-R) TGGTAATGCCTATAATGATTGGGGGGTTCGG
  Fragment size (bp): 7472

Thylacodes squamigerus(1) (Carpenter, 1857)
  FMNH voucher #: FMNH 318997
  GenBank accession #: HM174255
  Locality: Corona del Mar, CA, USA
  Tissue: Foot tissue
  Primer Set A (5’-3’) rrnL-F/cox1-R: CCCATCGAAAGAAGAGTTTGTGACCTCGATGTTGTTTCGGTCCGTCAACAGCATAGTAATAGCTCC
  Fragment size (bp): 6648
  Primer Set B (5’-3’) rrnL-R/cox1-F: ATGCTACCTTTGCACGGTCAGAGTACCGCGGCTGGTTATACCAATAATAATTGGTGGCTTCGG
  Fragment size (bp): 9034

Thylaeodus sp.
  FMNH voucher #: FMNH 318224
  GenBank accession #: HM174256
  Locality: Kewalo Marine Laboratory, HI, USA
  Tissue: Head, foot & mantle margin
  Primer Set A (5’-3’) rrnL-F/cox1-R: CATATTGAAAAAAAAGTTTGTGACCTCGATGTTGTTTCGATCAGTCAATAACATAGTAATTGCGCC
  Fragment size (bp): 4703
  Primer Set B (5’-3’) rrnL-R/cox1-F: ATGCTACCTTTGCACGGTCAGAGTACCGCGGCTAGTAATACCTATAATAATTGGTGGATTTGG
  Fragment size (bp): N/A

Vermetus erectus Dall, 1888
  FMNH voucher #: FMNH 318225
  GenBank accession #: HM174257
  Locality: Pourtales Terrace, Florida Keys; FL, USA
  Tissue: Foot tissue; buccal mass
  Primer Set A (5’-3’) rrnL-F/cox1-R: CTAATCGAAGAAAAGGCTTGTGACCTCGATGTTGTTTCGGTCAGTCAGTAGTATAGTAATAGCACC
  Fragment size (bp): 3107
  Primer Set B (5’-3’) rrnL-R/cox1-F: ATGCTACCTTAGCACGGTTAAAATACCGCGGCTTGTCATGCCTATAATAATTGGCGGATTTGG
  Fragment size (bp): N/A

(1) Thylacodes has been demonstrated to have priority over Serpulorbis; hence Serpulorbis squamigerus is now correctly referred to as Thylacodes squamigerus (see [62]).

Genome annotation and analysis

Genome annotation

The approximate locations of the rRNA and protein-encoding genes were determined by aligning each unannotated genome with genes from other caenogastropod mt genomes. The precise boundaries of rRNA genes could not be determined due to the lack of sequence similarity at the 5′ and 3′ ends; therefore, the location of each rRNA gene was assumed to extend from the boundary of the upstream flanking gene to the boundary of the downstream flanking gene, as in [41]. A standard initiation codon was located at the beginning of each protein-encoding gene (either ATG, ATA, or GTG), and the derived amino acid sequence was aligned with homologous protein sequences to ensure that this was a suitable initiation codon (based on length). When possible, the first proper stop codon (TAG or TAA) downstream of the initiation codon was chosen to terminate translation; however, to reduce overlap with downstream genes, abbreviated stop codons (T or TA) were selected for some genes.
In these instances, polyadenylation of the mRNA was assumed to restore a full TAA stop codon [7]. Regardless, some genes appeared to overlap based on the conservation of their open reading frame sequences and the lack of a potential abbreviated stop codon. Gene locations and secondary structures of tRNAs were identified using tRNAscan-SE [42] and ARWEN [43]. On the rare occasion that these programs did not find all of the expected tRNAs, the remaining tRNAs were found by eye and folded manually. We produced tRNA drawings manually using Canvas (ACD Systems). Using cox1 as the conventional starting point for the four genomes, linear maps of the circular genomes were created to facilitate comparison of gene orders amongst vermetids, across available caenogastropods, and between caenogastropods and other select molluscs, including the abalone, Haliotis rubra (Class Gastropoda, Superorder Vetigastropoda), the octopus, Octopus vulgaris (Class Cephalopoda), and the chiton, Katharina tunicata (Class Polyplacophora).

Nucleotide composition

The nucleotide composition of each complete and partial genome was described by calculating the overall base composition, %AT content, AT-skew, and GC-skew for the strand encoding cox1, hereafter referred to as the “+” strand. Base composition and %AT content were determined using MacVector (MacVector, Inc.), and strand skews (AT skew = (A-T)/(A+T); GC skew = (G-C)/(G+C)) were calculated using the formulae of [44]. For complete genomes, the %AT content, AT-skew, and GC-skew were also calculated for rRNA genes, protein-encoding genes (for all bases, third codon positions, and third positions of four-fold degenerate (4FD) codons, as in [44]), tRNA genes (separately for those coded for on the “+” and “-” strand), and intergenic (unassigned) regions. Values were compared across categories within each genome, and within categories across genomes.
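The skew formulae quoted above can be illustrated with a short Python sketch. This is not the procedure used in the study, which relied on MacVector; the function name composition_stats and the toy sequence are assumptions introduced purely for illustration, and ambiguity codes such as N are simply ignored here.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """Base counts, %AT, AT-skew and GC-skew for one strand of a sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    Characters other than A, C, G, T are ignored (an assumption of this sketch).
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "A": a, "C": c, "G": g, "T": t,
        "pct_AT": 100.0 * (a + t) / total,
        # Guard against division by zero for degenerate inputs.
        "AT_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "GC_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Made-up 10-base sequence (A=2, C=1, G=3, T=4), not taken from any genome.
stats = composition_stats("ATTTGGGCAT")
# 60% AT, AT skew = -1/3, GC skew = 0.5
```

Applied to the “+” strand of a genome, or to a subset such as third codon positions, the same function would give the per-category values that are compared in the text.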
In addition, we explored the nucleotide composition at third positions of 4FD codons in relation to the position of each protein-encoding gene within the genome [45]. Gene positions were determined by calculating the distance (number of nucleotides) from the midpoint of each gene to a reference point chosen here as the start of nad1.

A base composition plot (%A+C and %G+T along the length of the genome) was created for each complete genome using a sliding window of 100 nucleotides. Typically, the leading strand is G+T rich, a pattern associated with its protracted single-stranded state during replication and transcription; deviations from this pattern can signify switches in the assignments of these strands [41]. Plots were aligned with linear representations of the matching genome to signify base composition trends for protein and tRNA genes encoded on the “+” strand, tRNA genes encoded on the “-” strand, and rRNA genes.

Genetic code

To ensure that all codons were being utilized, codon usage frequencies were analyzed using MacVector (MacVector, Inc.); unused codons can potentially signal a change in the genetic code [46]. Codons whose amino acid identity has changed in the mt genetic code of other metazoans (e.g. flatworms, echinoderms, and hemichordates; [47]) were investigated using a procedure similar to [48]: the derived amino acids of five codons (TGA, ATA, AGA, AGG, AAA) were examined in the alignment of three highly conserved proteins (cox1, cox2, and cox3) from the mt genomes of four vermetids, D. gregarium, D. maximum, E. tulipa and T. squamigerus, and several other caenogastropods. Within the vermetids, if the derived amino acid from any of these five codons occurred in a conserved position (present at that location in >50% of caenogastropod sequences examined), this was scored as a positive result. Amino acids that occurred in a non-conserved position (<50%) were scored as negative. The proportion of appearances in conserved vs. non-conserved locations was calculated for each codon across the vermetids examined. High percentages in conserved locations likely indicate the retention of amino acid identity by the specific codon [48]; low percentages can be suggestive of a change in amino acid identity, although other explanations are also possible.

Unassigned regions

As metazoan mt genomes are typically compact with minimal non-coding DNA [49], unassigned stretches of nucleotides often contain control elements for transcription or replication or remnants of duplicated protein, rRNA or tRNA genes. We examined putative non-coding regions (> 20 bp) for gene remnants, repeat sequences, inverted repeats, palindromes, and secondary structure features, all of which can be associated with signaling elements of the mt genome. Protein, rRNA and tRNA remnants were identified by aligning unassigned regions with annotated genes from caenogastropod mitochondrial genomes. MEME [50] and M-Fold [51] were used to identify potential sequence motifs and structural features, respectively.

tRNA remolding/recruitment

To search for close matches between tRNA genes indicative of gene remolding/recruitment, each tRNA gene sequence was aligned to tRNA genes from other available caenogastropod genomes (including new vermetid tRNA sequences). Similarity scores were based on initial tRNA alignments undertaken in MacVector that were subsequently adjusted by eye according to secondary structure features (stems vs. loops). As in [28], the third base of the anticodon triplet was excluded in the calculation of % similarity between two tRNA genes, but gaps were counted as mismatches.

Results and Discussion

Caenogastropod mitochondrial genomes

To date, comparisons across published caenogastropod mt genomes have suggested a model of gene order conservation unusual for the Gastropoda [15,48,52,53].
Among the 16 complete caenogastropod genomes available at NCBI as of January 8, 2010, gene order rearrangements within this clade appear minor, involving only changes in position of trnL(UUR), trnL(CUN), trnV and trnS2, and an inversion of trnT [53]. Likewise, only two rearrangements, one inversion and one transposition, separate the reconstructed ancestral gastropod gene order from that of most caenogastropods [15,53]. While this conservation in gene order within the Caenogastropoda may be real, it could also reflect a strong sampling bias for members of the Neogastropoda (12 complete genomes), a largely Cenozoic radiation of predatory snails, with only four complete genomes from two genera (Cymatium and Oncomelania) examined from other caenogastropod groups. In contrast, wider sampling of mt gene orders within the Heterobranchia, a sister clade to the Caenogastropoda, has demonstrated highly dynamic mt genomes [15]. Of 13 genomes sampled across a disparate array of taxa, including 5 opisthobranchs (including an unpublished Elysia genome), 7 pulmonates (including a second Biomphalaria genome not included in [15]), and 1 basal heterobranch [15], numerous changes in gene order have been observed, with few mt gene boundaries shared between heterobranchs and the hypothetical ancestral gastropod mt genome inferred by [15].

Here, based on a detailed sampling of mt genomes within one family of caenogastropods outside the Neogastropoda, we provide a new and very different picture of gene order dynamics within the Caenogastropoda. Our results increase the number of caenogastropod mt genomes sequenced to 20, substantially increase the sampling of mt genomes outside the Neogastropoda, and present the first direct evidence of major gene order rearrangements within the Littorinimorpha based on complete genome sequences. Full mt genomes were successfully sequenced for the vermetids, Dendropoma gregarium, D. maximum, Eualetes tulipa, and Thylacodes squamigerus (Figure 1; Tables 2, 3, 4 and 5).
Amplifications of the “B” fragment (cox1-F/rrnL-R) were not successful for Thylaeodus sp. and V. erectus and consequently only partial mt genomes are described for these two taxa (Figure 1; Tables 6 and 7). Nevertheless, extensive gene rearrangements were evident within these species compared to other vermetids and caenogastropods. Characterization of the vermetid mt genomes presented below is based only on the four complete genomes, except where noted otherwise.

Genome organization

The mitochondrial genomes of D. gregarium, D. maximum, E. tulipa, and T. squamigerus varied in size from 15078 bp to 15641 bp, similar in length to other caenogastropods (range: 15182 - 16648 bp, n = 16) but considerably larger than most heterobranchs (range: 13670 - 14745 bp, n = 13). Each genome contained the 37 genes typical of most animal mitochondrial genomes (Figure 1; Tables 2, 3, 4 and 5), with all 13 protein-encoding genes and 2 rRNA genes located on the “+” strand along with either 14 or 15 tRNA genes depending on the species. In each of the four genomes, the tRNA genes, trnY, trnM, trnC, trnW, trnQ, trnG, and trnE, were located on the “-” strand forming a cassette of 7 adjacent tRNAs. The only difference across genomes in the strand specific coding of a gene was for trnT, which was located on the “+” strand of E. tulipa, but on the “-” strand of D. gregarium, D. maximum, and T. squamigerus. The tendency for protein and rRNA genes to be coded for on the same strand has been found in all caenogastropod taxa examined so far, but this is atypical for molluscs (excluding bivalves) and other metazoan mt genomes described to date (137 of 1428 genomes as of Dec 17, 2008; [54]).
The predominance of this single-strand-dependence for protein and rRNA genes among sponges and cnidarians has led to the proposition that this is the plesiomorphic metazoan condition [54].

The tRNA genes identified within each genome and their secondary structures are presented as additional files (see Additional file 1, Figure S1; Additional file 2, Figure S2; Additional file 3, Figure S3; Additional file 4, Figure S4; Additional file 5, Figure S5; Additional file 6, Figure S6). Extra tRNA genes were found in two vermetid genomes. D. gregarium had a second trnK located between genes trnI and nad3 (see Figure 1). This second trnK can be folded into a typical tRNA cloverleaf structure (see Additional file 2, Figure S2), and thus may be functional. The genome of T. squamigerus also contained additional trnV and trnL(UUR) genes (see Figure 1). These genes were associated with two large stretches of unassigned sequence positioned between the two copies of trnV (see below).

Overlapping adjacent genes were common in vermetid mt genomes. Two pairs of protein-encoding genes overlapped in all four species (atp8 and atp6; nad4L and nad4); overlap of nad4L and nad4 was also evident in the partial genome of Thylaeodus. These gene pairs commonly overlap in animal mt genomes. This phenomenon was more variable between tRNA genes (see Tables 2, 3, 4, 5, 6 and 7 for specific examples). In addition, in three instances tRNA genes overlapped with protein-encoding genes (trnH and nad4 in E. tulipa; trnS(AGN) with cox3 and trnR with nad2 in T. squamigerus). Similar comparisons could not be made for the two rRNA genes since their boundaries were only imprecisely defined by boundaries with neighbouring genes.

Figure 1 Gene maps of four complete and two partial vermetid mt genomes. Blue arrows represent protein-encoding genes; red arrows represent rRNA genes; and circles represent tRNA genes. Arrows indicate the direction of transcription (clockwise = “+” strand). Genes are labeled according to their standard abbreviations. For tRNA genes, green circles represent those genes located on the “+” strand and white circles those located on the “-” strand. Regions of unassigned nucleotides (presumed non-coding sequences) are not identified on these maps. As is the standard convention for metazoan mt genomes, cox1 has been designated the start point for the “+” strand. The asterisks (*) in D. maximum and T. squamigerus denote stretches of sequences (presumed pseudogenes) with high similarity to portions of complete genes. For exact locations of each gene, as well as unassigned regions, see Tables 2-7.

Gene initiation and termination

Nearly all protein-encoding genes (58/61) were initiated by the canonical ATG start codon, although ATA (twice) and GTG (once) also acted as start codons (Tables 2, 3, 4, 5, 6 and 7). These start codons are not unusual in molluscan mt genomes, but their usage, as found elsewhere, appears to be much less frequent than ATG [7] (but see [15]).
TAG was the most common termination codon (28/59), but TAA (15/59) and abbreviated stop codons (a single T or TA terminating the open reading frame; 16/59) were used slightly more frequently when considered together. The sequences for the cox1 genes of Thylaeodus and Vermetus were incomplete so the termination codons for these genes are currently unknown.

Table 2 Detailed description of genes and unassigned regions (UR) within the complete mt genome (15578 bp) of Dendropoma maximum
Gene | Start (a) | End | Length | Amino acids | Start codon | End codon (b) | UR (%AT) (c)
cox1 | 1 | 1536 | 1536 | 512 | ATG | TAG | 38 (76.3)
trnD | 1575 | 1642 | 68 | - | - | - | 0
atp8 | 1643 | 1798 | 156 | 52 | ATG | TAG | -7
atp6 | 1792 | 2479 | 688 | 229 | ATG | T-- | 0
trnN | 2480 | 2550 | 71 | - | - | - | 0
nad5 | 2551 | 4251 | 1701 | 567 | ATG | TAA | 14 (78.6)
trnK | 4266 | 4340 | 75 | - | - | - | -2
trnA | 4339 | 4404 | 66 | - | - | - | 1
cox3 | 4406 | 5185 | 780 | 260 | ATG | TAA | 15 (86.7)
trnSAGN | 5201 | 5274 | 74 | - | - | - | 1
nad2 | 5276 | 6295 | 1020 | 340 | ATG | TAG | 0
trnR | 6296 | 6364 | 69 | - | - | - | 11 (54.5)/167 (69.3) (d)
rrnS* (pseudo) | 6376 | 6496 | 121 | - | - | - | 35 (74.3)
trnY* | 6532 | 6598 | 67 | - | - | - | 0
trnM* | 6599 | 6668 | 70 | - | - | - | 2
trnC* | 6671 | 6739 | 69 | - | - | - | 0
trnW* | 6740 | 6807 | 68 | - | - | - | -4
trnQ* | 6804 | 6874 | 71 | - | - | - | 0
trnG* | 6875 | 6943 | 69 | - | - | - | 0
trnE* | 6944 | 7016 | 73 | - | - | - | 0
rrnS | 7017 | 8049 | 1033 | - | - | - | 0
trnV | 8050 | 8116 | 67 | - | - | - | 0
rrnL | 8117 | 9563 | 1447 | - | - | - | 0
trnLUUR | 9564 | 9633 | 70 | - | - | - | -4
trnLCUN | 9630 | 9698 | 69 | - | - | - | 1
nad1 | 9700 | 10632 | 933 | 311 | ATG | TAA | 2
trnP | 10635 | 10707 | 73 | - | - | - | 1
nad6 | 10709 | 11189 | 481 | 160 | ATG | T-- | 0
cob | 11190 | 12329 | 1140 | 380 | ATG | TAG | 2
trnSUCN | 12332 | 12399 | 68 | - | - | - | 33 (54.5)
trnT* | 12433 | 12508 | 76 | - | - | - | 8
nad4L | 12517 | 12810 | 294 | 98 | ATG | TAG | -7
nad4 | 12804 | 14161 | 1358 | 452 | GTG | TA- | 0
trnH | 14162 | 14231 | 70 | - | - | - | 46 (47.8)
trnF | 14278 | 14346 | 69 | - | - | - | 0
trnI | 14347 | 14418 | 72 | - | - | - | 2
nad3 | 14421 | 14774 | 354 | 118 | ATG | TAG | 46 (65.2)
cox2 | 14821 | 15498 | 678 | 226 | ATG | TAG | 80 (63.8)
a Genes are arranged relative to cox1. Those encoded on the “-” strand are indicated with an asterisk (*).
b T-- and TA- refer to instances where incomplete stop codons are inferred.
c Unassigned regions are identified by positive values. Negative values indicate overlap between adjacent genes. Values in brackets refer to %AT of the associated UR (only those UR >10 bp in length were analyzed).
d For the unassigned region between trnR and trnY spanning the pseudogene rrnS.

Table 3 Detailed description of genes and unassigned regions (UR) within the complete mt genome (15641 bp) of Dendropoma gregarium
Gene | Start (a) | End | Length | Amino acids | Start codon | End codon (b) | UR (%AT) (c)
cox1 | 1 | 1533 | 1533 | 511 | ATG | TAG | 3
trnD | 1537 | 1611 | 75 | - | - | - | 2
atp8 | 1614 | 1769 | 156 | 52 | ATG | TAA | -7
atp6 | 1763 | 2452 | 690 | 230 | ATG | TAA | 7
trnN | 2460 | 2535 | 76 | - | - | - | 2
nad5 | 2538 | 4259 | 1722 | 574 | ATA | TAA | 27 (74.1)
trnA | 4287 | 4354 | 68 | - | - | - | 1
cox3 | 4356 | 5134 | 779 | 260 | ATG | TA- | 0
trnSAGN | 5135 | 5201 | 67 | - | - | - | 38 (60.5)
nad2 | 5240 | 6252 | 1013 | 338 | ATG | TA- | 0
trnR | 6253 | 6321 | 69 | - | - | - | 55 (72.7)
trnY* | 6377 | 6443 | 67 | - | - | - | 1
trnM* | 6445 | 6514 | 70 | - | - | - | 10 (30.0)
trnC* | 6525 | 6597 | 73 | - | - | - | 1
trnW* | 6599 | 6667 | 69 | - | - | - | 11 (50.0)
trnQ* | 6679 | 6735 | 57 | - | - | - | -3
trnG* | 6733 | 6798 | 66 | - | - | - | 2
trnE* | 6801 | 6872 | 72 | - | - | - | 0
rrnS | 6873 | 7851 | 979 | - | - | - | 0
trnV | 7852 | 7922 | 71 | - | - | - | 16 (50.0)
trnK1 | 7939 | 8013 | 75 | - | - | - | 55 (65.5)
trnP | 8069 | 8139 | 71 | - | - | - | 1
nad6 | 8141 | 8629 | 489 | 163 | ATG | TAG | 0
rrnL | 8630 | 9978 | 1349 | - | - | - | 0
trnLUUR | 9979 | 10049 | 71 | - | - | - | -2
trnLCUN | 10048 | 10116 | 69 | - | - | - | 2
nad1 | 10119 | 11049 | 931 | 310 | ATG | T-- | 125 (52.8)
cob | 11175 | 12323 | 1149 | 383 | ATG | TAA | 13 (84.6)
trnSUCN | 12337 | 12407 | 71 | - | - | - | 4
trnT* | 12412 | 12479 | 68 | - | - | - | 7
nad4L | 12487 | 12780 | 294 | 98 | ATG | TAA | -7
nad4 | 12774 | 14129 | 1356 | 452 | ATG | TAG | 30 (66.7)
trnH | 14160 | 14227 | 68 | - | - | - | 0
trnF | 14228 | 14296 | 69 | - | - | - | 6
trnI | 14303 | 14375 | 73 | - | - | - | 137 (58.4)
trnK2 | 14513 | 14586 | 74 | - | - | - | 2
nad3 | 14589 | 14939 | 351 | 117 | ATG | TAG | 7
cox2 | 14947 | 15636 | 690 | 230 | ATG | TAG | 5
a Genes are arranged relative to cox1. Those encoded on the “-” strand are indicated with an asterisk (*).
b T-- and TA- refer to instances where incomplete stop codons are inferred.
c Unassigned regions are identified by positive values. Negative values indicate overlap between adjacent genes. Values in brackets refer to %AT of the associated UR (only those UR >10 bp in length were analyzed).

Unassigned regions
The number of nucleotides unassignable to any gene ranged from 289 (1.9% of the genome) in Eualetes tulipa to 625 (4.0% of the genome) in Thylacodes squamigerus (Table 8). No unassigned stretch of sequence was greater than 270 bp in length, although intergenic regions ranging in size from 10 - 99 bp were common (Table 8). There was little consistency in size and position of unassigned regions across genomes, with the position of the largest region found in different areas for each species. Short (≤ 15 bp) AT-rich (>80%) intergenic regions were evident within the genomes of Dendropoma maximum, D. gregarium, and T. squamigerus, but not in E. tulipa (Tables 2, 3, 4 and 5), and again, the position of these sequences varied across taxa.

Stretches of unassigned sequence including inverted repetitive elements are known to reside between trnF and cox3 in the mt genomes of several caenogastropods, likely representing the control region for replication and transcription in these genomes [52,53]. In our comparisons across the four complete vermetid genomes, we were unable to identify any similar regions associated with repetitive sequence motifs or palindromes. As is common in mt genomes, however, secondary structure elements were found by MFOLD within many of these intergenic regions, suggestive of a role in post-transcriptional modification of the polycistronic transcript.

Our analyses did uncover several interesting features within large intergenic regions of Dendropoma maximum and Thylacodes squamigerus (Figure 1).
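The UR column of Tables 2-7 can be recomputed from the start and end coordinates alone. The sketch below is illustrative only: the function name `unassigned_between` is hypothetical, the five rows are the first rows of Table 2, and it assumes 1-based inclusive coordinates, so that the gap between adjacent genes is the downstream start minus the upstream end minus one.

```python
# Gene coordinates (name, start, end) copied from the first rows of Table 2
# (Dendropoma maximum). Positions are assumed to be 1-based and inclusive.
genes = [
    ("cox1", 1, 1536),
    ("trnD", 1575, 1642),
    ("atp8", 1643, 1798),
    ("atp6", 1792, 2479),
    ("trnN", 2480, 2550),
]


def unassigned_between(gene_list):
    """Return (upstream, downstream, gap) for each pair of adjacent genes.

    gap > 0: unassigned nucleotides; gap < 0: overlap; gap == 0: genes abut.
    """
    pairs = zip(gene_list, gene_list[1:])
    return [
        (up_name, down_name, down_start - up_end - 1)
        for (up_name, _, up_end), (down_name, down_start, _) in pairs
    ]


gaps = unassigned_between(genes)
for up_name, down_name, gap in gaps:
    print(f"{up_name}-{down_name}: {gap}")
```

Positive values are unassigned nucleotides, negative values are overlaps (atp8 and atp6 overlap by 7 bp) and zero means the genes abut. The same arithmetic applied to trnR (end 6364) and trnY (start 6532) in Table 2 gives the 167 bp region listed there.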
In D. maximum, we identified a 121 bp stretch within an unassigned region of 167 bp between trnR and trnY that was a perfect match to the reverse complement of a portion of rrnS. Likewise, in Thylacodes squamigerus, in an unassigned region between trnV1 and trnLCUN, we found a 140 bp stretch of sequence that was identical to a corresponding portion of rrnL. In addition, between trnLCUN and trnV2, we discovered a 120 bp remnant of nad1 (82% identical, with a 91 bp stretch differing only in 6 bases) and a 114 bp remnant of rrnS (92% identical). These pseudogene fragments in Thylacodes, as well as the extra tRNAs (trnV and trnLUUR) present in this region, are likely the result of a duplication event spanning rrnS-trnV-rrnL-trnLUUR-trnLCUN-nad1, with subsequent overwriting of some gene duplicates (Rawlings et al., in prep).

Table 4 Detailed description of genes and unassigned regions (UR) within the complete mt genome (15078 bp) of Eualetes tulipa
Gene | Start (a) | End | Length | Amino acids | Start codon | End codon (b) | UR (%AT) (c)
cox1 | 1 | 1531 | 1531 | 510 | ATG | T-- | 0
trnD | 1532 | 1596 | 65 | - | - | - | 0
atp8 | 1597 | 1752 | 156 | 49 | ATG | TAG | -7
atp6 | 1746 | 2430 | 685 | 228 | ATG | T-- | 0
trnN | 2431 | 2497 | 67 | - | - | - | 0
nad5 | 2498 | 4205 | 1708 | 569 | ATG | T-- | 0
trnK | 4206 | 4270 | 65 | - | - | - | 51 (62.1)
cox3 | 4322 | 5104 | 783 | 261 | ATG | TAA | 0
trnSAGN | 5105 | 5168 | 64 | - | - | - | 0
nad2 | 5169 | 6176 | 1008 | 336 | ATG | TAA | 0
trnR | 6177 | 6242 | 66 | - | - | - | 4
trnT | 6247 | 6312 | 66 | - | - | - | 33 (66.7)
trnY* | 6346 | 6411 | 66 | - | - | - | 2
trnM* | 6414 | 6480 | 67 | - | - | - | 0
trnC* | 6481 | 6546 | 66 | - | - | - | 0
trnW* | 6547 | 6609 | 63 | - | - | - | -5
trnQ* | 6605 | 6671 | 67 | - | - | - | 0
trnG* | 6672 | 6735 | 64 | - | - | - | 6
trnE* | 6742 | 6807 | 66 | - | - | - | 0
rrnS | 6808 | 7716 | 909 | - | - | - | 0
trnV | 7717 | 7783 | 67 | - | - | - | 0
rrnL | 7784 | 9087 | 1304 | - | - | - | 0
trnLCUN | 9088 | 9152 | 65 | - | - | - | 9
trnP | 9162 | 9225 | 64 | - | - | - | 73 (71.2)
nad1 | 9299 | 10231 | 933 | 311 | ATA | TAG | 10 (50.0)
trnA | 10242 | 10304 | 63 | - | - | - | 2
trnLUUR | 10307 | 10371 | 65 | - | - | - | 56 (58.9)
nad6 | 10428 | 10934 | 507 | 169 | ATG | TAG | 2
cob | 10937 | 12076 | 1140 | 380 | ATG | TAG | 0
trnSUCN | 12077 | 12143 | 67 | - | - | - | 18 (66.7)
nad4L | 12162 | 12452 | 291 | 97 | ATG | TAG | -7
nad4 | 12446 | 13810 | 1365 | 455 | ATG | TAG | -1
trnH | 13810 | 13875 | 66 | - | - | - | -1
trnF | 13875 | 13940 | 66 | - | - | - | 0
trnI | 13941 | 14012 | 72 | - | - | - | 0
nad3 | 14013 | 14366 | 354 | 118 | ATG | TAG | 3
cox2 | 14370 | 15058 | 689 | 229 | ATG | TA- | 20 (65.0)
a Genes are arranged relative to cox1. Those encoded on the “-” strand are indicated with an asterisk (*).
b T-- and TA- refer to instances where incomplete stop codons are inferred.
c Unassigned regions are identified by positive values. Negative values indicate overlap between adjacent genes. Values in brackets refer to %AT of the associated UR (only those UR >10 bp in length were analyzed).

Table 5 Detailed description of genes and unassigned regions (UR) within the complete mt genome (15544 bp) of Thylacodes squamigerus
Gene | Start (a) | End | Length | Amino acids | Start codon | End codon (b) | UR (%AT) (c)
cox1 | 1 | 1537 | 1537 | 512 | ATG | T-- | 78 (57.5)
trnD | 1616 | 1682 | 67 | - | - | - | 1
atp8 | 1684 | 1839 | 156 | 52 | ATG | TAG | -7
atp6 | 1833 | 2531 | 699 | 233 | ATG | TAA | 12 (83.3)
trnN | 2544 | 2610 | 67 | - | - | - | 1
nad5 | 2612 | 4306 | 1695 | 565 | ATG | TAG | 17 (47.1)
trnK | 4324 | 4389 | 66 | - | - | - | -3
trnA | 4387 | 4453 | 67 | - | - | - | 0
cox3 | 4454 | 5236 | 783 | 261 | ATG | TAA | -2
trnSAGN | 5235 | 5299 | 65 | - | - | - | 0
nad2 | 5300 | 6298 | 999 | 333 | ATG | TAG | -1
trnR | 6298 | 6365 | 68 | - | - | - | 35 (74.3)
trnY* | 6401 | 6467 | 67 | - | - | - | -2
trnM* | 6466 | 6531 | 66 | - | - | - | 6
trnC* | 6538 | 6599 | 62 | - | - | - | 2
trnW* | 6602 | 6669 | 68 | - | - | - | -5
trnQ* | 6665 | 6729 | 65 | - | - | - | 8
trnG* | 6738 | 6802 | 65 | - | - | - | 3
trnE* | 6806 | 6873 | 68 | - | - | - | 0
rrnS | 6874 | 7809 | 936 | - | - | - | 0
trnV1 | 7810 | 7877 | 68 | - | - | - | 0/163 (69.9) (d)
rrnL (pseudo) | 7878 | 8017 | 140 | - | - | - | 23 (82.6)
trnLCUN | 8041 | 8105 | 65 | - | - | - | 0/270 (58.9) (e)
nad1 (pseudo) | 8106 | 8225 | 120 | - | - | - | 36 (61.1)
rrnS (pseudo) | 8262 | 8375 | 114 | - | - | - | 0
trnV2 | 8376 | 8443 | 68 | - | - | - | 0
rrnL | 8444 | 9733 | 1290 | - | - | - | 0
trnLUUR1 | 9734 | 9801 | 68 | - | - | - | -4
trnLUUR2 | 9798 | 9862 | 65 | - | - | - | 0
nad1 | 9863 | 10798 | 936 | 312 | ATG | TAG | 1
trnP | 10800 | 10867 | 68 | - | - | - | 2
nad6 | 10870 | 11350 | 481 | 160 | ATG | T-- | 0
cob | 11351 | 12490 | 1140 | 380 | ATG | TAG | 0
trnSUCN | 12491 | 12561 | 71 | - | - | - | -2
trnT* | 12560 | 12623 | 64 | - | - | - | 5
nad4L | 12629 | 12919 | 291 | 97 | ATG | TAG | -7
nad4 | 12913 | 14277 | 1365 | 455 | ATG | TAA | 5
trnH | 14283 | 14344 | 62 | - | - | - | 2
trnF | 14347 | 14416 | 70 | - | - | - | 6
trnI | 14423 | 14492 | 70 | - | - | - | 2
nad3 | 14495 | 14848 | 354 | 118 | ATG | TAG | 6
cox2 | 14855 | 15544 | 690 | 230 | ATG | TAA | 0
a Genes are arranged relative to cox1. Those encoded on the “-” strand are indicated with an asterisk (*).
b T-- and TA- refer to instances where incomplete stop codons are inferred.
c Unassigned regions are identified by positive values. Negative values indicate overlap between adjacent genes. Values in brackets refer to %AT of the associated UR (only those UR >10 bp in length were analyzed).
d For the unassigned region between trnV1 and trnLCUN spanning the pseudogene rrnL.
e For the unassigned region between trnLCUN and trnV2 spanning the pseudogenes nad1 and rrnS.

Table 6 Detailed description of genes and unassigned regions (UR) within the partial mt genome of Thylaeodus sp
Gene | Start (a) | End | Length | Amino acids | Start codon | End codon (b) | UR (%AT) (c)
rrnS (partial) | 1 | 431 | 431 | - | - | - | 5
trnV | 437 | 504 | 68 | - | - | - | 13 (53.8)
trnE | 518 | 591 | 74 | - | - | - | 0
rrnL | 592 | 1882 | 1291 | - | - | - | 16 (56.3)
trnLCUN | 1899 | 1964 | 66 | - | - | - | 99 (70.7)
nad6 | 2064 | 2562 | 499 | 166 | ATG | T-- | 0
cob | 2563 | 3703 | 1141 | 380 | ATG | T-- | -6
trnSUCN | 3698 | 3766 | 69 | - | - | - | 1
trnT* | 3768 | 3834 | 67 | - | - | - | 11 (100.0)
nad4L | 3846 | 4131 | 286 | 95 | ATG | T-- | -2
nad4 | 4130 | 5491 | 1362 | 454 | ATG | TAA | 6
trnF | 5498 | 5565 | 68 | - | - | - | 9
trnY | 5575 | 5639 | 65 | - | - | - | -1
trnLUUR | 5639 | 5708 | 70 | - | - | - | 11 (63.6)
cox1 (partial) | 5720 | 6415 | 696 | 232 | ATG | N/A | 0
a Genes are not arranged relative to cox1 because only a partial genome is available. Those genes encoded on the “-” strand are indicated with an asterisk (*).
b T-- and TA- refer to instances where incomplete stop codons were inferred.
c Unassigned regions are identified by positive values. Negative values indicate overlap between adjacent genes. Values in brackets refer to %AT of the associated UR (only those UR >10 bp in length were analyzed).

Table 7 Detailed description of genes and unassigned regions (UR) within the partial mt genome of Vermetus erectus
Gene | Start (a) | End | Length | Amino acids | Start codon | End codon (b) | UR (%AT) (c)
rrnS | 1 | 429 | 429 | - | - | - | 0
trnV | 430 | 499 | 70 | - | - | - | 0
rrnL | 500 | 1756 | 1257 | - | - | - | 0
trnLUUR | 1757 | 1819 | 63 | - | - | - | 3
trnLCUN | 1823 | 1887 | 65 | - | - | - | 1
nad1 | 1889 | 2824 | 936 | 312 | ATG | TAG | 18 (66.7)
trnH | 2843 | 2904 | 62 | - | - | - | 10 (60.0)
trnI | 2915 | 2980 | 66 | - | - | - | 0
nad3 | 2981 | 3331 | 351 | 117 | ATG | TAG | 3
cox2 | 3335 | 4022 | 688 | 229 | ATG | T-- | 0
cox1 (partial) | 4023 | 4718 | 696 | 232 | ATG | N/A | 0
a Genes are not arranged relative to cox1 because only a partial genome is available. Those genes encoded on the “-” strand are indicated with an asterisk (*).
b T-- and TA- refer to instances where incomplete stop codons were inferred.
c Unassigned regions are identified by positive values. Negative values indicate overlap between adjacent genes. Values in brackets refer to %AT of the associated UR (only those UR >10 bp in length were analyzed).

Table 8 A comparison of unassigned regions (UR) within the complete mt genomes of four vermetid gastropods
Species | % UR (bp) (a) | # (b) | UR 10-19 bp (c) | UR 20-39 bp | UR 40-99 bp | UR > 99 bp | Longest UR (c), bp | Longest UR, location
Dendropoma maximum (d) | 2.95 (459) | 8 | 2 | 2 | 3 | 1 | 167 | trnR - trnY
Dendropoma gregarium | 3.65 (571) | 11 | 4 | 3 | 2 | 2 | 137 | trnI - trnK2
Eualetes tulipa | 1.92 (289) | 7 | 2 | 2 | 3 | 0 | 73 | trnP - nad1
Thylacodes squamigerus (d) | 4.02 (625) | 6 | 2 | 1 | 1 | 2 | 270 | trnLCUN - trnV2
a Expressed as a percent of the whole genome and the total number of base pairs.
b Includes all unassigned sequences ≥ 10 bp in length.
c The location and size of the UR is defined by upstream and downstream neighbouring genes.
d Data for Dendropoma maximum and Thylacodes squamigerus include pseudogene sequences found within some unassigned regions (see Tables 2 and 5).

Nucleotide composition and skews
All four vermetid genomes were AT-rich, with these two nucleotides accounting for 59 - 62% of the genome (Figure 2; Additional file 7, Table S1); this trend was consistent across regions of each genome associated with protein-encoding genes, rRNA genes, tRNAs, and non-coding regions. Interestingly, however, the AT-biases of vermetid genomes were noticeably lower than those reported for other complete caenogastropod genomes (range: 65.2-70.1%, n = 14; Additional file 7, Table S1). Likewise, AT content was only moderately higher at third-codon positions and 4FD sites of protein-encoding genes, a result unusual in comparison with many other protostomes where %AT can often exceed 80% at 4FD sites [44]. Skew analyses revealed that the “+” strand was strongly biased against A (AT-skew ranging from -0.148 to -0.238) and towards G (GC-skew ranging from +0.065 to +0.251).
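AT- and GC-skew values such as those quoted above are conventionally computed as (A - T)/(A + T) and (G - C)/(G + C) on a single strand. The following sketch assumes those conventional definitions (this section does not restate the formulas used); the function name and the toy sequence are made up for illustration and are not vermetid data.

```python
from collections import Counter


def composition_stats(seq):
    """Return %AT, AT-skew and GC-skew for one strand of a DNA sequence.

    Assumes the conventional definitions:
        AT-skew = (A - T) / (A + T)
        GC-skew = (G - C) / (G + C)
    Ambiguous bases are ignored. The sequence must contain at least one
    A or T and at least one G or C, otherwise a division by zero occurs.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    return {
        "pct_AT": 100.0 * (a + t) / total,
        "AT_skew": (a - t) / (a + t),
        "GC_skew": (g - c) / (g + c),
    }


# Toy sequence (hypothetical): 3 A, 9 T, 6 G, 2 C.
stats = composition_stats("ATTTGGGCTTAGTTTGGCAT")
print(stats)
```

A negative AT-skew indicates fewer A than T on the strand analysed, and a positive GC-skew indicates more G than C, which is the direction of bias reported here for the “+” strand.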
This strand bias against A and towards G was similar to that of other caenogastropods (Additional file 7, Table S1), heterobranchs [15], and the chiton, Katharina tunicata [55], but opposite to that found on the “+” strand of the abalone Haliotis [56] and the cephalopods, Nautilus and Octopus, suggestive of switches in the assignments of leading and lagging strands within the Mollusca [41].

Skew patterns were different between regions of the genome depending on their function. For AT-skews, the most strongly biased areas were found associated with protein-encoding genes, but these were not necessarily most pronounced at third positions or at 4FD sites (except for D. maximum; Figure 2). The two rRNA genes showed a marked difference from protein-encoding genes, with a slight positive AT-skew, on average. Differences in skew patterns between rRNA and protein-encoding genes have been noted elsewhere [41,57], and may reflect base-pairing constraints associated with the secondary structures of these rRNAs [41]. Unassigned regions, assumed to be non-coding, were expected to experience similar selective pressures to those of 4FD third codon positions; this correspondence was not strong, however, with unassigned regions showing variation in AT-bias across taxa from slight positive AT-skews to moderate negative skews. tRNA genes, regardless of whether they were coded for on the “+” or “-” strand, exhibited no consistent skew patterns, with an average close to zero.

GC-skews were moderately positive for most regions of the genome, including protein-encoding genes, rRNA genes, unassigned regions, and tRNAs encoded on the “+” strand. Skews were particularly strong, however, for the third codon positions and 4FD third codon positions in Dendropoma maximum and Eualetes tulipa.
On average, neutral to negative GC-skews occurred in regions of the “+” strand containing tRNA genes encoded on the “-” strand.

As predicted by skew patterns described above, G+T composition varied across the length of the “+” strand in all four complete genomes (Figure 3). Areas of weak or no G+T bias were typically associated with rRNAs and regions with tRNAs encoded on the “-” strand. Dendropoma maximum exhibited a particularly strong bias for G+T (>65%) in the region from 10,000 to 14,000 bp.

Codon frequency
No evidence was found of changes to the standard genetic code employed by other molluscs (Additional file 8, Table S2). Each codon was used multiple times in the protein-encoding genes of each species (Additional file 9, Table S3); however, codon usage frequencies were not equal. Codon usage strongly reflected skew patterns at the third position for synonymous codons (both two-fold and four-fold degenerate codons; see Additional file 9, Table S3): the most abundant base at the third codon position across all four complete genomes was T (36.7-43.1%) and the least abundant base was C (9.4-17.9%). The relative ranking of G (18.5-26.5%) and of A (21.0-29.8%), however, was less consistent across genomes.

Putative origins of replication
Locating the regions of the mt genomes associated with the replication origins (ORs, also known as control regions or A+T rich regions) for both “+” and “-” strands can be challenging based on a knowledge of the nucleotide sequence of mt genomes alone. Recognition elements can be in the form of conserved sequence blocks, regions rich in A+T, stable stem-loop structures containing T-rich loops, or repetitive elements/palindromes [58]. ORs can also be identified by association with rearrangement “hotspots” [23]. Such features are not definitive evidence, however, and can differ in utility across taxonomic groups.
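A G+T profile of the kind plotted in Figure 3 can be approximated with a fixed-width sliding window. This is a minimal sketch, assuming a linear (non-wrapping) window and using the 100-nucleotide width given in the Figure 3 caption; the function name `sliding_gt`, the `step` parameter and the toy sequence are hypothetical.

```python
def sliding_gt(seq, window=100, step=1):
    """Return a list of (start_position, percent_G_plus_T) tuples.

    start_position is 1-based. The window does not wrap around the end of
    the sequence, so a circular genome would need its first `window - 1`
    bases appended before calling this function (not done here).
    """
    seq = seq.upper()
    profile = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        gt = chunk.count("G") + chunk.count("T")
        profile.append((i + 1, 100.0 * gt / window))
    return profile


# Toy sequence (hypothetical): 200 nt of pure G/T followed by 200 nt of A/C.
toy = "GT" * 100 + "AC" * 100
profile = sliding_gt(toy, window=100, step=100)
print(profile)
```

With `step=1` the function returns one value per position, which is the form usually plotted; the larger step is used here only to keep the printed output short.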
Among tentatively identified molluscan control regions, for instance, only 35% have been found to have palindromes, and the average %AT content of these regions is in the range of 65-70%, close to the average genomic %AT content for molluscs (based on 9 molluscan genomes analyzed) [59]. Consequently, looking for palindromes or A+T rich regions may not be helpful in identifying the OR of molluscs. In contrast, 85% of insect control regions have palindromes and control regions are associated with >80% AT [59].

Figure 2 Nucleotide composition of four complete vermetid genomes. For each species, the %AT content, AT-skew, and GC-skew are shown for the “+” strand of the complete genome and its functional components, including: protein-encoding genes (complete protein, 3rd codon positions only, and 3rd positions of 4-fold degenerate (4FD) codons only), rRNA genes, unassigned regions (UR), and tRNA genes (including “+” and “-” strand encoded genes). Error bars refer to standard errors based on a sample size from each genome of: 13 protein encoding genes, 14-15 “+” strand tRNA genes, and 7-8 “-” strand tRNA genes.

Because none of the recognition elements described above were useful in locating putative ORs within the complete genomes of the four vermetids examined here, we attempted to locate ORs by examining the relationship between nucleotide composition and the position of protein-encoding genes within these genomes. Reyes et al.
[45] determined that the AT and GC-skews at 4FD sites of protein-encoding genes of 25 mammalian mt genomes were significantly correlated with the duration of single stranded state of heavy-strand genes during replication, with increased duration reflecting, in part, the proximity of protein encoding genes to the origin of replication of the heavy strand (ORH), as well as their position relative to the OR of the light strand (ORL). Longer durations in the single stranded condition increase the vulnerability of DNAs to hydrolytic and oxidative damage, creating the compositional asymmetries between the heavy and light strands (or leading and lagging strands). In vertebrates, the ORL is typically located two-thirds of the way around the genome from the ORH. In insects, the only protostomes examined in detail, however, the replication origins for both strands (leading and lagging) appear to be in close proximity, near the ends of a conserved block within the A+T rich region, with the lagging strand beginning after 95% of the leading strand has been replicated [60]. Comparative studies are lacking for molluscs.

If the ORs for both strands are located close to one another in molluscs, then examining nucleotide composition in relation to the position of each protein-encoding gene could reveal putative sites for the OR. We tested this by plotting nucleotide composition at 4FD third codon positions, as in [45], versus the midpoint position of each protein-encoding gene (Figure 4). By moving the putative OR between different protein-encoding genes, we found a compelling linear relationship, with a negative slope for %T and %G and a positive slope for %A and %C, when we positioned the OR between nad2 and nad1 (or between nad2 and nad6 for D. gregarium). This region generally encompasses the cassette of 7 tRNA genes encoded on the “-” strand, two rRNA genes, as well as additional tRNAs, albeit with some notable differences amongst taxa (Figure 1).
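The search for a putative OR described above can be sketched as a scan over candidate origins: gene midpoints are re-expressed as distances from each candidate around the circular genome, and a least-squares line is fitted to 4FD composition against distance. Everything in this example is hypothetical, including the function names, the genome length, the midpoints and the %C values, which were constructed so that one candidate gives a perfect fit; it illustrates the idea rather than reproducing the authors' analysis.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


def scan_origins(midpoints, values, candidates, genome_len):
    """Fit composition against distance from each candidate origin.

    Distance is measured in one direction around a circular genome,
    from the candidate origin to each gene midpoint.
    """
    fits = {}
    for origin in candidates:
        distances = [(m - origin) % genome_len for m in midpoints]
        fits[origin] = linear_fit(distances, values)
    return fits


# Hypothetical data: five gene midpoints on a 1000 bp circular genome and
# made-up %C values at 4FD sites that increase linearly from position 400.
midpoints = [100, 300, 500, 700, 900]
pct_c = [24.0, 28.0, 12.0, 16.0, 20.0]

fits = scan_origins(midpoints, pct_c, candidates=[0, 400], genome_len=1000)
best = max(fits, key=lambda origin: fits[origin][2])
print(best, fits[best])
```

In this toy case the candidate at position 400 yields the highest r², mirroring the logic of choosing the origin position that produces the most linear relationship.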
The relationship between 4FD nucleotide composition and gene position is shown based on averaged values across all four complete vermetid genomes (Figure 4: diamonds), and separately for the genome of Dendropoma maximum (Figure 4: squares), which exhibited the strongest pattern across all four taxa. Reyes et al. [45] found an increase in the frequency of A and C at 4FD sites in the light (sense) strand in direct relation to the single-stranded duration of the heavy strand. They speculated that this was the result of spontaneous deamination of adenine into hypoxanthine, which base pairs with C rather than T, and cytosine into uracil, which base pairs with A rather than G, along the heavy strand during its single-stranded state in replication (with associated increases in nucleotides A and C in the light, sense strand). In vermetid genomes, where genes are encoded on the opposite strand to vertebrates (i.e. the heavy strand is the sense or “+” strand for all vermetid protein-encoding genes, see Figure 3), genes with lower frequencies of G and T (or higher frequencies of C and A) at 4FD sites should be those experiencing shorter durations in the single-stranded condition. The marked difference in nucleotide composition of 4FD sites between nad2 (low %G and %T) and nad1 (high %G and %T) is thus suggestive of substantial differences in exposure of these two genes to the single-stranded condition. Consequently, the OR may lie between these two protein-encoding genes. Comparisons of nucleotide composition patterns along the genomes of protostome taxa with well defined ORs are now necessary to confirm or refute the predictive power of such analyses in identifying control regions.

Two other observations support the general location for the OR between nad2 and nad1. First, this region is associated with a major change in base compositional bias reflecting the presence of rRNA genes and a cassette of tRNAs encoded on the “-” strand (Figure 3). Such changes are thought to be associated with ORs [41,58].
Second, this region of the mt genome appears to be involved in a number of gene order rearrangements, possibly reflecting a rearrangement “hotspot” (see Gene order rearrangements, below). Hotspots have been associated with ORs in previous studies [23]. In many other caenogastropods, however, ORs have been tentatively identified in a different region of the genome, despite the presence of a similar cassette of tRNA genes located on the “-” strand [48,52,53]. In these taxa, the OR is thought to be present between trnF and cox3 within an unassigned stretch of sequence of variable length (from 15 - 848 bp) associated with inverted repeats and secondary structure elements. This gene boundary is not present in the Vermetidae. Direct examination of mRNAs is now needed to demonstrate conclusively the presence of ORs within vermetid and other caenogastropod mt genomes.

Gene order rearrangements
Each of the vermetid genomes examined here possessed a unique gene arrangement (Figure 5). The four complete vermetid genomes differed primarily in the position of tRNA genes, the most mobile elements within the mt genome [16]. While the position of many tRNA genes was conserved, the location of trnA, trnK, trnP, trnT, trnLCUN, trnLUUR, and trnV was more variable across the four genomes. Only D. gregarium differed in the order of protein-encoding genes, with nad6 changing position relative to other protein encoding and rRNA genes, as described in [18]. The two partial genomes of Vermetus erectus and Thylaeodus sp. provided additional evidence of the extent of gene rearrangement within this family, however, with novel arrangements of both tRNA and protein-encoding genes.

Figure 3 Plots of A+C and G+T composition along the “+” strand of the mt genomes of D. maximum, D. gregarium, E. tulipa, and T. squamigerus using a sliding window of 100 nucleotides.
Each plot is positioned above a linear representation of its corresponding genome highlighting regions corresponding to the presence of: 1) protein encoding genes and tRNA genes encoded on the “+” strand, 2) rRNA genes encoded on the “+” strand, and 3) tRNA genes encoded on the “-” strand. See [41] for a comparison with Nautilus and Katharina. As is the standard convention for metazoan mt genomes, cox1 has been designated the start point for the “+” strand.

Given the extent of shared gene boundaries with representatives of other molluscan classes, Grande et al. [15] have suggested that the mt genome of the abalone, Haliotis rubra [56], represents the ancestral gene order of gastropods, albeit with two derived changes. The positions of trnD and trnN are likely autapomorphies of Haliotis, since many caenogastropods share a different (and presumed ancestral) position for trnD with Octopus and trnN with Octopus and Katharina (Figure 5). Based on this inferred plesiomorphic gastropod condition, vermetids exhibit a number of derived gene order changes, some of which are shared with other caenogastropods. All published caenogastropod mt genomes, and the four complete vermetid genomes examined here, share an inversion of a block of 23 genes spanning from trnF to trnE in Haliotis, with a reversion (to the original strand) of genes spanning trnM to trnE within this block. As discussed by [48], this rearrangement must have occurred before the divergence of caenogastropods, but after their separation from the Vetigastropoda (Haliotis).

Vermetid mt genomes also shared an interesting gene order change with two other members of the superfamily Littorinimorpha, Littorina saxatilis and Oncomelania hupensis, associated with the position of the two leucine tRNA genes, trnLUUR and trnLCUN (Figure 5).
Within the mt genome of Haliotis, Octopus, and Katharina, these two leucine tRNAs are sandwiched between rrnL and nad1 in the following arrangement: rrnL-trnLCUN-trnLUUR-nad1. This gene order is retained in all caenogastropod mt genomes sampled to date, except for Littorina, Oncomelania, Dendropoma maximum, D. gregarium, and Vermetus erectus, where the relative position of these two genes has switched, such that trnLUUR is located directly upstream from trnLCUN. (Note: this gene order is incorrectly annotated in [48] (pg 40), and this gene order change is not recognized in [15] (page 11) or [53] (page 4), where it is stated that there are no differences in gene order between the partial genome of Littorina and complete genome of neogastropods).

Figure 4 Variation in nucleotide composition at 4FD sites of protein-coding genes in relation to the position of each gene within the genome. Data are presented for: 1) the mean of all four vermetid genomes (diamonds) and 2) Dendropoma maximum alone (squares). The position of each protein encoding gene was calculated as the number of nucleotides from the beginning of nad1 (or nad6 for D. gregarium) to the midpoint of the target gene. This different start point for D. gregarium was based on the derived translocation of nad6 to a position directly upstream from nad1 in this taxon [18]. Regression equations are as follows.
All four genomes: $Y = -2.92 \times 10^{-4} X + 40.20$, $r^2 = 0.21$ [%T]; $Y = 2.59 \times 10^{-4} X + 24.47$, $r^2 = 0.18$ [%A]; $Y = 4.84 \times 10^{-4} X + 10.60$, $r^2 = 0.79$ [%C]; $Y = -4.51 \times 10^{-4} X + 24.73$, $r^2 = 0.56$ [%G].
D. maximum alone: $Y = -2.45 \times 10^{-4} X + 44.22$, $r^2 = 0.07$ [%T]; $Y = 8.66 \times 10^{-4} X + 15.47$, $r^2 = 0.78$ [%A]; $Y = 4.83 \times 10^{-4} X + 6.63$, $r^2 = 0.55$ [%C]; $Y = -1.10 \times 10^{-3} X + 33.67$, $r^2 = 0.65$ [%G].
This difference in the order of these two tRNAs is also present in several other vermetid taxa (data not shown), but is not present in two additional members of the Littorinimorpha: Cymatium parthenopeum and Calyptraea chinensis [53]. While this shared gene rearrangement may be a synapomorphy defining a clade within the Littorinimorpha to which the Littorinidae (Littorina), Pomatiopsidae (Oncomelania) and Vermetidae belong, changes in tRNA positions involving neighbouring leucine tRNA genes should be treated with caution. Gene translocations among neighbouring genes appear to occur with increased frequency in mt genomes and thus may be more likely to arise independently [23]. Such “position switches” between neighbours may therefore be less reliable phylogenetic characters for addressing deeper level phylogenetic questions. In addition, simple changes in position of these two leucine tRNA genes can mask more complicated dynamics that may involve gene duplications and tRNA remolding events [28] (see Evidence for tRNA remolding and recruitment, below). Uncovering the dynamics that have occurred within this region of the genome can be essential to interpreting gene identity correctly and accurately reconstructing past gene rearrangement events. The discovery of two trnLUUR genes side by side between rrnL and nad1 of Thylacodes squamigerus further suggests that such tRNA remolding events may be at play within vermetid mt genomes.

Figure 5 Linear arrangement of protein-encoding, rRNA and tRNA genes within the complete mt genomes of vermetid gastropods, caenogastropods, and select representatives of other molluscan groups. Partial genomes of Thylaeodus sp, Vermetus erectus, Littorina saxatilis and Calyptraea chinensis are also included. As is the standard convention for metazoan mt genomes, cox1 has been designated the start point for the “+” strand for all genomes. The dashed line indicates missing genes in the representation of partial genomes. Genes are transcribed left-to-right as depicted except for those underlined to signify opposite orientation. Gene names are as in Figure 1 except that L1 and L2 refer to trnLCUN and trnLUUR, respectively, and S1 and S2 refer to trnSAGN and trnSUCN, respectively. Pseudogenes identified in the genomes of Dendropoma maximum and Thylacodes squamigerus are not shown. In some instances, gene orders differ from their original publications (e.g., [53]); this is based on revised annotations listed in MitoZoa [63]. Abbreviations are as follows. GAST: Class Gastropoda; POLY: Class Polyplacophora; CEPH: Class Cephalopoda; CA: Caenogastropoda; Li: Littorinimorpha; Ne: Neogastropoda; VE: Vetigastropoda.

Some gene order arrangements were unique to the Vermetidae (Figure 5). All four complete vermetid genomes shared a block of three protein-encoding genes and 3 - 4 tRNA genes: trnN-nad5-trnK-trnA-cox3-trnS1-nad2, with trnA and trnK absent from this block in Eualetes and D. gregarium, respectively. This gene rearrangement was associated with the movement of nad5 from its conserved position between trnF and trnH in Katharina, Octopus, Haliotis, and other caenogastropods, and with the break-up of cox3, trnS1, and nad2 from their conserved association within a block of genes including cox3 - 5 tRNAs - nad3 - trnS1 - nad2 in these same taxa. trnD also shared a unique derived position between cox1 and atp8 in the Vermetidae. While trnD in Haliotis has differing neighbouring genes, the gene order cox1-trnD is shared with the chiton, Katharina, and trnD-atp8 is shared with Octopus, as well as all other caenogastropods. The derived vermetid arrangement thus appears to be associated with a switch in the position of cox2 relative to cox1, trnD, and atp8, along a lineage leading to the Vermetidae.
In addition, the vermetids also shared a derived change in the relative positions of trnY and trnM within the cassette of 7 tRNA genes encoded on the “-” strand. Such distinctions between the mt genome of vermetids and other littorinimorphs are particularly relevant here given that there is both molecular and morphological support for the inclusion of littorines and vermetids in a clade of lower caenogastropods [28,37,38]. Gene rearrangements shared by vermetids to the exclusion of other Littorinimorpha (Littorina, Oncomelania, Calyptraea, and Cymatium) thus can be inferred to have occurred following the divergence of the ancestral vermetid from the common ancestor of these taxa.

Numerous gene order changes have also occurred within the family Vermetidae (Figure 5). The Vermetidae has a fossil record extending to the Late Cretaceous, suggesting that this gene reshuffling has happened within the past 65 million years, and based on molecular dating, perhaps within the past 38 million years [18]. Preliminary evidence suggests that the mt genome of Dendropoma maximum represents the ancestral gene order of the Vermetidae. The translocation of trnK1 and trnP-nad6 found in Dendropoma gregarium has also been shown to be a derived gene order change within this genus [18]. Relative to D. maximum, most gene rearrangements evident in the three other complete genomes have occurred between trnR and trnT. Eualetes differs from D. maximum in the position of four tRNAs: trnA, trnP, trnLUUR and trnT, with gene remolding events between trnLCUN and trnLUUR likely associated with gene order changes involving these two isoaccepting tRNA genes (see Evidence for tRNA remolding and recruitment, below). Differences between the gene arrangement of Dendropoma maximum and Thylacodes appear related to a gene duplication event in Thylacodes spanning rrnS to nad1, as inferred from gene vestiges of rrnS, rrnL, and nad1 and extra tRNA genes (trnV, trnLUUR).
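The statement that Eualetes differs from D. maximum in the position of four tRNAs can be checked against the gene orders in Tables 2 and 4. In this sketch (function name hypothetical), strand is ignored and the rrnS pseudogene of D. maximum is left out, which is a simplification: trnT, for example, also differs in strand between the two tables.

```python
# Gene orders read from Table 2 (D. maximum) and Table 4 (E. tulipa),
# starting at cox1. Strand and pseudogenes are ignored in this comparison.
D_MAXIMUM = """
cox1 trnD atp8 atp6 trnN nad5 trnK trnA cox3 trnSAGN nad2 trnR
trnY trnM trnC trnW trnQ trnG trnE rrnS trnV rrnL trnLUUR trnLCUN
nad1 trnP nad6 cob trnSUCN trnT nad4L nad4 trnH trnF trnI nad3 cox2
""".split()

E_TULIPA = """
cox1 trnD atp8 atp6 trnN nad5 trnK cox3 trnSAGN nad2 trnR trnT
trnY trnM trnC trnW trnQ trnG trnE rrnS trnV rrnL trnLCUN trnP
nad1 trnA trnLUUR nad6 cob trnSUCN nad4L nad4 trnH trnF trnI nad3 cox2
""".split()


def same_order_without(order_a, order_b, moved):
    """True if the two gene orders match once the `moved` genes are removed."""
    kept_a = [gene for gene in order_a if gene not in moved]
    kept_b = [gene for gene in order_b if gene not in moved]
    return kept_a == kept_b


moved = {"trnA", "trnP", "trnLUUR", "trnT"}
print(same_order_without(D_MAXIMUM, E_TULIPA, moved))             # all four removed
print(same_order_without(D_MAXIMUM, E_TULIPA, moved - {"trnT"}))  # trnT kept in
```

Removing the four named tRNAs leaves the remaining 33 genes in the same order in both genomes, whereas keeping any one of them, such as trnT, breaks the match.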
Details of this rearrangement will be examined further in another paper (Rawlings et al., in prep). Many of the conserved gene boundaries described above for the four complete vermetid genomes sequenced, even those shared with other caenogastropods, Haliotis, Katharina, and Octopus, were not evident in the partial genomes of V. erectus and Thylaeodus (Figures 1 & 5).

The scale of these rearrangements is impressive given that these changes have happened within a single family of gastropods, and suggests that further sampling of the Vermetidae should uncover more changes and perhaps intermediate stages that may reveal the dynamics underlying the gene rearrangements shown here. The genome of Thylacodes squamigerus provides hope of this, as vestiges of once-functional genes discovered between annotated genes are helping us to understand the mechanism of gene order change and the appearance of duplicated tRNAs. This expectation has also been borne out by additional data collected as part of a phylogenetic analysis of the Vermetidae: numerous gene order changes have now been uncovered within vermetid taxa based on short sequences (<3.5 kb) extending from rrnS to nad1 (Rawlings et al., in prep; data not shown). With further sampling of the Vermetidae, we anticipate that these gene order rearrangements will provide a suite of robust characters that can be used, in addition to morphological and nucleotide sequence characters, to build a well-supported phylogenetic hypothesis for this family and to firmly place the Vermetidae within the context of caenogastropod evolution.

Evidence for tRNA remolding and recruitment
Implicit in the use of secondary structure characteristics and anticodon triplets to recognize tRNAs is the assumption that tRNA genes cannot change identity by simple nucleotide substitutions in their anticodon.
For the most part this seems to be true within animal mt genomes, likely because of the presence of specific recognition elements that are required by tRNA synthetases to identify and correctly charge their associated tRNAs. However, evidence is accumulating that tRNAs do occasionally change identities. Through a process known as tRNA remolding, identity change does occur between the two isoaccepting leucine tRNA genes [25,26,28] and possibly also between the two isoaccepting serine tRNA genes [61]. Cases of tRNA identity change across non-isoaccepting tRNAs, referred to as tRNA recruitment, are now also coming to light [27].

The strongest evidence of tRNA remolding is the discovery of unexpectedly high levels of sequence similarity between two tRNA leucine genes [28]. Consequently, these dynamics are most easily recognized when they have happened recently, before mutational changes can obscure the common history of the duplicated tRNAs. Although gene remolding and recruitment events are often associated with changes in gene order, the pattern of duplicate loss may result in maintenance of the original gene order. In fact, recognizing gene remolding events can often help to uncover genome dynamics that are hidden at the level of gene order alone [28].

Given that leucine tRNAs seem particularly susceptible to remolding events [26,28], we investigated the sequence similarity between both leucine tRNAs within each vermetid taxon as well as other select caenogastropods, Haliotis, Octopus, and Katharina (Table 9). High sequence similarities (80% or greater) between leucine tRNA genes were found in five taxa (Table 9) suggesting that one tRNA has taken over the role of the other through a process of gene duplication, mutation in the anticodon triplet and the eventual loss of the original gene [26,28]. The T. squamigerus genome represents a particularly interesting case of this.
Within this genome, the two copies of trnLUUR share only 67.6% sequence similarity; in comparison there is 98.4% sequence similarity between trnLCUN and trnLUUR2. Consequently, we can assume that trnLUUR2 is a duplicated copy of trnLCUN that has subsequently undergone a mutation in the third position of the anticodon to assume the identity of trnLUUR. Such events argue for the exclusion of tRNA leucine genes in gene-order based phylogenetic analyses at high taxonomic levels; at low taxonomic levels, such as within the Vermetidae, however, they offer the promise of new phylogenetic characters and the discovery of gene dynamics that may not be evident at the level of gene order alone. Given the importance of identifying remolding events, it is surprising that so little attention is often paid to the dynamics of these two leucine tRNA genes. Failure to distinguish between these two genes correctly can also lead to mistakes in interpretation, where two genomes are considered to have identical gene orders, but differ in the position of these two leucine tRNAs (see comparisons between Littorina saxatilis and other caenogastropods in [15,48,53]).

The presence of a second trnK within the genome of Dendropoma gregarium is suggestive of a past duplication event within this genome, likely the result of slipped-strand mispairing during replication. The sequence similarity between the two lysine tRNAs (trnK) is not high (38.0%). Comparisons between these sequences and the sequences of presumed trnK orthologs from other taxa revealed strong similarities between these and the trnK1 located between trnV and trnP (Figure 6). The second trnK (trnK2), located between trnI and nad3, lacked many of the conserved sequence elements present in other trnKs (Figure 6). We compared trnK2 with other tRNAs within the genome of D. gregarium to determine if this might represent a case of tRNA recruitment [27], but did not uncover any strong matches with any other gene.

Table 9 Percentage similarity in nucleotide sequences between isoaccepting leucine tRNA genes (trnLUUR and trnLCUN) within the mt genomes of six vermetids, sixteen additional caenogastropods, and three other molluscs for comparison

Species                      tRNA comparison (a, b)   Percent similarity (c)
Dendropoma maximum           LUUR-LCUN                60.9%
Dendropoma gregarium         LUUR-LCUN                69.0%
Eualetes tulipa              LCUN-/-LUUR              98.4%
Thylacodes squamigerus (d)   LCUN-/-LUUR1             66.2%
                             LCUN-/-LUUR2             98.4%
                             LUUR1-LUUR2              67.6%
Vermetus erectus             LUUR-LCUN                76.6%
Thylaeodus sp.               LCUN-/-LUUR              87.0%
Oncomelania                  LUUR-LCUN                91.2%
Littorina                    LUUR-LCUN                80.3%
Calyptraea                   LCUN-LUUR                58.0%
Cymatium                     LCUN-LUUR                61.8%
Thais                        LCUN-LUUR                58.0%
Rapana                       LCUN-LUUR                Insufficient data (e)
Conus textile                LCUN-LUUR                60.9%
Conus borgesi                LCUN-LUUR                60.9%
Lophiotoma                   LCUN-LUUR                62.3%
Ilyanassa                    LCUN-LUUR                64.7%
Nassarius                    LCUN-LUUR                64.7%
Bolinus                      LCUN-LUUR                62.3%
Cancellaria                  LCUN-LUUR                52.7%
Cymbium                      LCUN-LUUR                58.2%
Fusiturris                   LCUN-LUUR                66.7%
Terebra                      LCUN-LUUR                59.4%
Haliotis rubra               LCUN-LUUR                64.2%
Katharina tunicata           LCUN-LUUR                48.5%
Octopus vulgaris             LCUN-LUUR                66.7%

a tRNA genes were aligned using Clustal X with subsequent adjustments made according to secondary structure.
b Leucine tRNAs are presented according to relative order and position: one dash indicates that the two tRNAs are adjacent to each other; two dashes separated by a slash indicate that these two genes are separated by at least one other gene.
c Percent similarity reflects the number of nucleotide matches over the total length of the alignment, excluding the third base of the anticodon triplet.
d Because Thylacodes had two copies of trnLUUR, three pairwise comparisons between leucine tRNA genes were made for this species.
e Sequence contained ambiguous nucleotides.
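The percent similarity measure described in footnote c of Table 9 can be illustrated with a short sketch. It assumes the two tRNA genes have already been aligned (equal-length strings, with "-" marking gaps) and that the alignment column holding the third anticodon base is known. The function name, the choice to count gap columns as mismatches, and the toy sequences are illustrative assumptions only; they are not the authors' procedure or data.

```python
def percent_similarity(aligned_a: str, aligned_b: str, skip_column: int) -> float:
    """Percentage of matching columns in a pairwise alignment.

    skip_column is the 0-based alignment column to leave out of the
    comparison (here, the third base of the anticodon triplet).
    Assumption: a column containing a gap is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not 0 <= skip_column < len(aligned_a):
        raise ValueError("skip_column must index a column of the alignment")

    matches = 0
    compared = 0
    for i, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper())):
        if i == skip_column:
            continue  # excluded column: third anticodon base
        compared += 1
        if a == b and a != "-":
            matches += 1
    if compared == 0:
        raise ValueError("no columns left to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, NOT real tRNA sequence.
# Column 6 stands in for the third anticodon base (G versus A).
toy_a = "ACGTTAGGCA"
toy_b = "ACGTTAAGC-"
print(f"{percent_similarity(toy_a, toy_b, skip_column=6):.1f}%")  # 8 of 9 columns match
```

Under these assumptions, a remolded gene pair that differs mainly at the anticodon would score close to 100%, whereas a pair with a long independent history would score much lower.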
Consequently, the origin and function of this putative tRNA are currently unclear.

Conclusions
Here we have presented gene order and sequence characteristics for six gastropods within the family Vermetidae, including the complete mt genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mt genomes of two others (Vermetus erectus and Thylaeodus sp.). The publication of these genomes increases the number of complete caenogastropod mt genomes sequenced from 16 to 20 and presents the first direct evidence of major gene order rearrangements within the Littorinimorpha.

While molluscs have long been recognized as having unusually dynamic mt genomes, preliminary sampling of the Caenogastropoda, represented by genomes from 16 species, has suggested a model of highly conserved mt gene order. Our results reverse this trend and provide further evidence of the lability of the molluscan mt genome by describing extensive gene order changes that have occurred within one family of caenogastropod molluscs, the Vermetidae. Based on these results, we anticipate that the mt genomes of caenogastropods exhibit a variety of gene order arrangements, which, if explored and exploited, could provide a wealth of phylogenetically informative characters to enhance our understanding of the evolutionary radiation of this diverse clade of gastropods.

Figure 6 Structural alignments of lysine tRNA genes sampled from four vermetid gastropods and other select caenogastropods. Two tRNA genes are included from Dendropoma gregarium, one located between trnV and trnP (trnK1), the other located between trnI and nad3 (trnK2). Nucleotides appearing in >50% of taxa at a given position in the gene are shaded grey. Colors below each region of the alignment are used to show the associated position in the colored tRNA secondary structure model.

Despite the extent of mt gene rearrangement within the Vermetidae, their genomes exhibit similar characteristics to those of other caenogastropods. Each complete genome contained the full complement of 37 genes, although additional tRNA genes were evident in the mt genomes of D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnLUUR). Protein-encoding and rRNA genes were all encoded on the same strand (i.e. the “+” strand), not unusual for caenogastropods, whereas tRNA genes, the most mobile mt genomic components, were distributed between the “+” and “-” strands. Nucleotide skew patterns (i.e. AT and GC-skews) of vermetid genomes and their various components were similar to those described for other caenogastropods, although the AT bias was less pronounced than in other caenogastropods described to date. Although no control regions were definitively identified, a compelling trend in the relationship between nucleotide composition and position from nad1 was uncovered, which appears worthy of further investigation.

Our results also demonstrate that focused sampling of mt genomes at low taxonomic levels can be extremely productive, both in terms of uncovering characters useful in phylogeny and in understanding more fully the evolutionary dynamics and mechanics of mt gene rearrangements. Each of the six vermetid mt genomes examined had a unique gene order, with evidence of gene rearrangements involving translocations of tRNA and protein-encoding genes, and one gene inversion, occurring within this family. Additional rearrangements have also been uncovered amongst other representatives of this family, suggesting that further sampling of complete mt genomes within the Vermetidae will also prove useful.
The extent of gene rearrangement within such an evolutionarily young group offers the opportunity to explore gene dynamics such as tRNA remolding, which appears to have been rife within this family, and to investigate its consequences for phylogenetic analyses based on gene orders. The sampling of mt genomes, such as that of Thylacodes squamigerus, in the intermediate stages of a gene rearrangement, may also help to illuminate mechanisms of gene translocation and inversion and to interpret past events that have shaped an organism’s current mt gene order. Continued studies of the mt genome dynamics within the Vermetidae will improve our understanding of this family’s phylogeny, its phylogenetic placement within the Littorinimorpha, and the mechanisms and processes that have acted to shape the mt genomes of this family and other metazoans.

Additional material
Additional file 1: Figure S1. Inferred tRNA secondary structures based on the nucleotide sequences of 22 mitochondrial tRNA genes identified from the complete mt genome of Dendropoma maximum. tRNA genes are labeled according to their amino acid specificity and are arranged alphabetically.
Additional file 2: Figure S2. Inferred tRNA secondary structures based on the nucleotide sequences of 23 mitochondrial tRNA genes identified from the complete mt genome of Dendropoma gregarium. tRNA genes are labeled according to their amino acid specificity and are arranged alphabetically.
Additional file 3: Figure S3. Inferred tRNA secondary structures based on the nucleotide sequences of 22 mitochondrial tRNA genes identified from the complete mt genome of Eualetes tulipa. tRNA genes are labeled according to their amino acid specificity and are arranged alphabetically.
Additional file 4: Figure S4. Inferred tRNA secondary structures based on the nucleotide sequences of 24 mitochondrial tRNA genes identified from the complete mt genome of Thylacodes squamigerus.
tRNA genes are labeled according to the amino acid specificity and are arranged alphabetically.
Additional file 5: Figure S5. Inferred tRNA secondary structures based on the nucleotide sequences of eight mitochondrial tRNA genes identified from the partial mt genome of Thylaeodus sp. tRNA genes are labeled according to their amino acid specificity and are arranged alphabetically.
Additional file 6: Figure S6. Inferred tRNA secondary structures based on the nucleotide sequences of five mitochondrial tRNA genes identified from the partial mt genome of Vermetus erectus. tRNA genes are labeled according to their amino acid specificity and are arranged alphabetically.
Additional file 7: Table S1. Base compositions and nucleotide skews for new vermetid mt genomes, existing caenogastropod mt genomes, and other select molluscs.
Additional file 8: Table S2. The conservation of amino acid identity in select codons from cox1, cox2, and cox3 genes from the mt genomes of Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus.
Additional file 9: Table S3. Summary of codon usage across all protein-encoding genes in the mitochondrial genomes of Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus.

Acknowledgements
We thank Drs. Michael Hadfield, Isabella Kappner, Jon Norenburg, and Richard and Megumi Strathmann for vermetid samples, and Dr. Mónica Medina for assistance with genome sequencing. MJM was funded in part by CBU RP Grant 8341 to TAR. Vermetid research was supported by NSF DEB(REVSYS)-0841760/0841777 to RB, TMC, and TAR.

Author details
1 Cape Breton University, 1250 Grand Lake Road, Sydney, NS B1P 6L2, CANADA.
2 University of British Columbia, 2239 West Mall, Vancouver, BC V6T 1Z4, CANADA.
3 Field Museum of Natural History, 1400 S. Lake Shore Dr, Chicago, IL 60605-2496, USA.
4 Genome Project Solutions, Inc., 1024 Promenade Street, Hercules, CA 94547, USA.
5 Florida International University, 11200 SW 8th Street, University Park, Miami, FL 33199, USA.
6 National Science Foundation, 4201 Wilson Boulevard, Arlington, VA 22230, USA.

Authors’ contributions
RB and TAR obtained the samples. TAR and TMC extracted the DNA and amplified the genomes, which were then sequenced by JLB. TAR and MJM annotated the genome, MJM undertook the descriptive analyses, and TAR and MJM wrote the first draft of the manuscript. All authors participated in subsequent revisions of the manuscript.

Received: 23 February 2010 Accepted: 19 July 2010 Published: 19 July 2010

References
1. Boore JL: Animal mitochondrial genomes. Nucleic Acids Research 1999, 27(8):1767-1780.
2. Smith MJ, Arndt A, Gorski S, Fajber E: The phylogeny of echinoderm classes based on mitochondrial gene arrangements. Journal of Molecular Evolution 1993, 36:545-554.
3. Boore JL, Brown WM: Complete sequence of the mitochondrial DNA of the annelid worm Lumbricus terrestris. Genetics 1995, 141:305-319.
4. Lavrov DV, Brown WM, Boore JL: Phylogenetic position of the Pentastomida and (pan)crustacean relationships. Proceedings of the Royal Society of London Series B-Biological Sciences 2004, 271(1538):537-544.
5. Boore JL, Brown WM: Mitochondrial genomes of Galathealinum, Helobdella, and Platynereis: sequence and gene arrangement comparisons indicate that Pogonophora is not a phylum and Annelida and Arthropoda are not sister taxa. Molecular Biology and Evolution 2000, 17(1):87-106.
6. Lavrov DV, Lang BF: Poriferan mtDNA and animal phylogeny based on mitochondrial gene arrangements. Systematic Biology 2005, 54(4):651-659.
7. Boore JL, Macey JR, Medina M: Sequencing and comparing whole mitochondrial genomes of animals. Methods in Enzymology 2005, 395:311-348.
8. Simison WB, Lindberg DR, Boore JL: Rolling circle amplification of metazoan mitochondrial genomes. Molecular Phylogenetics and Evolution 2006, 39(2):562-567.
9. Hudson ME: Sequencing breakthroughs for genomic ecology and evolutionary biology. Molecular Ecology Resources 2007, 8:3-17.
10. Blanchette M, Kunisawa T, Sankoff D: Gene order breakpoint evidence in animal mitochondrial phylogeny. Journal of Molecular Evolution 1999, 49(2):193-203.
11. Moret BME, Tang JJ, Wang LS, Warnow T: Steps toward accurate reconstructions of phylogenies from gene-order data. Journal of Computer and System Sciences 2002, 65(3):508-525.
12. Larget B, Simon DL, Kadane JB, Sweet D: A Bayesian analysis of metazoan mitochondrial genome arrangements. Molecular Biology and Evolution 2005, 22(3):486-495.
13. Simison WB, Boore JL: Molluscan evolutionary genomics. In Phylogeny and Evolution of the Mollusca. Edited by Ponder W, Lindberg DR. Berkeley, CA: University of California Press; 2008:447-461.
14. Vallès Y, Boore JL: Lophotrochozoan mitochondrial genomes. Integrative and Comparative Biology 2006, 46(4):544-557.
15. Grande C, Templado J, Zardoya R: Evolution of gastropod mitochondrial genome arrangements. BMC Evolutionary Biology 2008, 8.
16. Gissi C, Iannelli F, Pesole G: Evolution of the mitochondrial genome of Metazoa as exemplified by comparison of congeneric species. Heredity 2008, 101(4):301-320.
17. Milbury CA, Gaffney PM: Complete mitochondrial DNA sequence of the eastern oyster Crassostrea virginica. Marine Biotechnology 2005, 7(6):697-712.
18. Rawlings TA, Collins TM, Bieler R: A major mitochondrial gene rearrangement among closely related species. Molecular Biology and Evolution 2001, 18(8):1604-1609.
19. Serb JM, Lydeard C: Complete mtDNA Sequence of the North American Freshwater Mussel, Lampsilis ornata (Unionidae): an examination of the evolution and phylogenetic utility of mitochondrial genome organization in Bivalvia (Mollusca). Molecular Biology and Evolution 2003, 20(11):1854-1866.
20. Dowton M, Castro LR, Austin AD: Mitochondrial gene rearrangements as phylogenetic characters in the invertebrates: the examination of genome ‘morphology’. Invertebrate Systematics 2002, 16(3):345-356.
21. Ballard JWO, Rand DM: The population biology of mitochondrial DNA and its phylogenetic implications. Annual Review of Ecology Evolution and Systematics 2005, 36:621-642.
22. Levinson G, Gutman GA: Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Molecular Biology and Evolution 1987, 4(3):203-221.
23. Boore JL, Brown WM: Big trees from little genomes: mitochondrial gene order as a phylogenetic tool. Current Opinion in Genetics & Development 1998, 8(6):668-674.
24. Lunt DH, Hyman BC: Animal mitochondrial DNA recombination. Nature 1997, 387:247.
25. Cantatore P, Gadaleta MN, Roberti M, Saccone C, Wilson AC: Duplication and remoulding of tRNA genes during the evolutionary rearrangement of mitochondrial genomes. Nature 1987, 329:853-855.
26. Higgs PG, Jameson D, Jow H, Rattray M: The evolution of tRNA-Leu genes in animal mitochondrial genomes. Journal of Molecular Evolution 2003, 57(4):435-445.
27. Lavrov DV, Lang BF: Transfer RNA gene recruitment in mitochondrial DNA. Trends in Genetics 2005, 21(3):129-133.
28. Rawlings TA, Collins TM, Bieler R: Changing identities: tRNA duplication and remolding within animal mitochondrial genomes. Proceedings of the National Academy of Sciences of the United States of America 2003, 100(26):15700-15705.
29. Dowton M, Austin AD: Evolutionary dynamics of a mitochondrial rearrangement “hot spot” in the Hymenoptera. Molecular Biology and Evolution 1999, 16(2):298-309.
30. San Mauro D, Gower DJ, Zardoya R, Wilkinson M: A hotspot of gene order rearrangement by tandem duplication and random loss in the vertebrate mitochondrial genome. Molecular Biology and Evolution 2006, 23(1):227-234.
31. Lavrov DV, Boore JL, Brown WM: Complete mtDNA sequences of two millipedes suggest a new model for mitochondrial gene rearrangements: Duplication and nonrandom loss. Molecular Biology and Evolution 2002, 19(2):163-169.
32. Flook P, Rowell H, Gellissen G: Homoplastic rearrangements of insect mitochondrial transfer-RNA genes. Naturwissenschaften 1995, 82(7):336-337.
33. Macey JR, Papenfuss TJ, Kuehl JV, Fourcade HM, Boore JL: Phylogenetic relationships among amphisbaenian reptiles based on complete mitochondrial genomic sequences. Molecular Phylogenetics and Evolution 2004, 33(1):22-31.
34. Dowton M, Cameron SL, Dowavic JI, Austin AD, Whiting MF: Characterization of 67 mitochondrial tRNA gene rearrangements in the Hymenoptera suggests that mitochondrial tRNA gene position is selectively neutral. Molecular Biology and Evolution 2009, 26(7):1607-1617.
35. Iannelli F, Griggio F, Pesole G, Gissi C: The mitochondrial genome of Phallusia mammillata and Phallusia fumigata (Tunicata, Ascidiacea): high genome plasticity at intra-genus level. BMC Evolutionary Biology 2007, 7.
36. Ponder WF, Lindberg DR: Towards a phylogeny of gastropod molluscs: an analysis using morphological characters. Zoological Journal of the Linnean Society 1997, 119:83-265.
37. Colgan DJ, Ponder WF, Beacham E, Macaranas J: Molecular phylogenetics of Caenogastropoda (Gastropoda: Mollusca). Molecular Phylogenetics and Evolution 2007, 42(3):717-737.
38. Healy JM: Sperm morphology in Serpulorbis and Dendropoma and its relevance to the systematic position of the Vermetidae. Journal of Molluscan Studies 1988, 54:295-308.
39. Collins TM, Frazer KS, Palmer AR, Vermeij GJ, Brown WM: Evolutionary history of Northern Hemisphere Nucella: molecules, morphology, ecology, and fossils. Evolution 1996, 50:2287-2304.
40. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R: DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology 1994, 3(5):294-299.
41. Boore JL: The complete sequence of the mitochondrial genome of Nautilus macromphalus (Mollusca: Cephalopoda). BMC Genomics 2006, 7.
42. Lowe TM, Eddy SR: tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 1997, 25(5):955-964.
43. Laslett D, Canback B: ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences. Bioinformatics 2008, 24(2):172-175.
44. Perna NT, Kocher TD: Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes. Journal of Molecular Evolution 1995, 41(3):353-358.
45. Reyes A, Gissi C, Pesole G, Saccone C: Asymmetrical directional mutation pressure in the mitochondrial genome of mammals. Molecular Biology and Evolution 1998, 15(8):957-966.
46. Santos MAS, Moura G, Massey SE, Tuite MF: Driving change: the evolution of alternative genetic codes. Trends in Genetics 2004, 20(2):95-102.
47. Yokobori S, Suzuki T, Watanabe K: Genetic code variations in mitochondria: tRNA as a major determinant of genetic code plasticity. Journal of Molecular Evolution 2001, 53(4-5):314-326.
48. Bandyopadhyay PK, Stevenson BJ, Cady MT, Olivera BM, Wolstenholme DR: Complete mitochondrial DNA sequence of a Conoidean gastropod, Lophiotoma (Xenuroturris) cerithiformis: gene order and gastropod phylogeny. Toxicon 2006, 48(1):29-43.
49. Boore JL, Daehler LL, Brown WM: Complete sequence, gene arrangement, and genetic code of mitochondrial DNA of the cephalochordate Branchiostoma floridae (Amphioxus). Molecular Biology and Evolution 1999, 16(3):410-418.
50. Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology: 1994; Menlo Park, California. AAAI Press 1994, 28-36.
51. Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research 2003, 31(13):3406-3415.
52. Bandyopadhyay PK, Stevenson BJ, Ownby JP, Cady MT, Watkins M, Olivera BM: The mitochondrial genome of Conus textile, coxI-coxII intergenic sequences and conoidean evolution. Molecular Phylogenetics and Evolution 2008, 46(1):215-223.
53. Cunha RL, Grande C, Zardoya R: Neogastropod phylogenetic relationships based on entire mitochondrial genomes. BMC Evolutionary Biology 2009, 9:210.
54. Jang KH, Hwang UW: Complete mitochondrial genome of Bugula neritina (Bryozoa, Gymnolaemata, Cheilostomata): phylogenetic position of Bryozoa and phylogeny of lophophorates within the Lophotrochozoa. BMC Genomics 2009, 10.
55. Boore JL, Brown WM: Complete DNA-sequence of the mitochondrial genome of the black chiton, Katharina tunicata. Genetics 1994, 138(2):423-443.
56. Maynard BT, Kerr LJ, McKiernan JM, Jansen ES, Hanna PJ: Mitochondrial DNA sequence and gene organization in the Australian Blacklip Abalone Haliotis rubra (Leach). Marine Biotechnology 2005, 7:645-658.
57. Wang X, Lavrov DV: Seventeen new complete mtDNA sequences reveal extensive mitochondrial genome evolution within the Demospongiae. Public Library of Science One 2008, 3(7):e2723.
58. Brugler MR, France SC: The mitochondrial genome of a deep-sea bamboo coral (Cnidaria, Anthozoa, Octocorallia, Isididae): genome structure and putative origins of replication are not conserved among octocorals. Journal of Molecular Evolution 2008, 67(2):125-136.
59. Arunkumar KP, Nagaraju J: Unusually long palindromes are abundant in mitochondrial control regions of insects and nematodes. Public Library of Science One 2006, 1(1):e110.
60. Saito S, Tamura K, Aotsuka T: Replication origin of mitochondrial DNA in insects. Genetics 2005, 171(4):1695-1705.
61. Lavrov DV, Brown WM: Trichinella spiralis mtDNA: a nematode mitochondrial genome that encodes a putative ATP8 and normally structured tRNAs and has a gene arrangement relatable to those of coelomate metazoans. Genetics 2001, 157(2):621-637.
62. Bieler R, Petit RE: Thylacodes - Thylacodus - Tulaxodus: Worm-snail name confusion and the status of Serpulorbis (Gastropoda: Vermetidae). Malacologia 2010, 52(1):183-187.
63. Lupi R, D’Onorio de Meo P, Picardi E, D’Antonio M, Paoletti D, Castrignano T, Pesole G, Gissi C: MitoZoa: A curated mitochondrial genome database of metazoans for comparative genomics studies. Mitochondrion 2010, 10(2):192-199.

doi:10.1186/1471-2164-11-440
Cite this article as: Rawlings et al.: Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs. BMC Genomics 2010 11:440.

