UBC Faculty Research and Publications

Transcriptome analysis for Caenorhabditis elegans based on novel expressed sequence tags
Shin, Heesun; Hirst, Martin; Bainbridge, Matthew N; Magrini, Vincent; Mardis, Elaine; Moerman, Donald G; Marra, Marco A; Baillie, David L; Jones, Steven J
Jul 8, 2008


BioMed Central
BMC Biology
Open Access
Research article

Transcriptome analysis for Caenorhabditis elegans based on novel expressed sequence tags

Heesun Shin*1,2, Martin Hirst2, Matthew N Bainbridge2, Vincent Magrini3, Elaine Mardis3, Donald G Moerman4, Marco A Marra2, David L Baillie1 and Steven JM Jones2

Address: 1Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada, 2Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, British Columbia Cancer Agency, Vancouver, BC, Canada, 3Genome Sequencing Center, Washington University School of Medicine, St Louis, USA and 4Department of Zoology, University of British Columbia, Vancouver, BC, Canada

Email: Heesun Shin* - heesuns@sfu.ca; Martin Hirst - mhirst@bcgsc.ca; Matthew N Bainbridge - matthewb@bcgsc.ca; Vincent Magrini - vmagrini@watson.wustl.edu; Elaine Mardis - emardis@watson.wustl.edu; Donald G Moerman - moerman@zoology.ubc.ca; Marco A Marra - mmarra@bcgsc.ca; David L Baillie - baillie@sfu.ca; Steven JM Jones - sjones@bcgsc.ca

* Corresponding author

Published: 8 July 2008
BMC Biology 2008, 6:30 doi:10.1186/1741-7007-6-30
Received: 13 June 2008
Accepted: 8 July 2008

This article is available from: http://www.biomedcentral.com/1741-7007/6/30

© 2008 Shin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: We have applied a high-throughput pyrosequencing technology for transcriptome profiling of Caenorhabditis elegans in its first larval stage. Using this approach, we have generated a large amount of data for expressed sequence tags, which provides an opportunity for the discovery of putative novel transcripts and alternative splice variants that could be developmentally specific to the first larval stage. This work also demonstrates the successful and efficient application of a next generation sequencing methodology.

Results: We have generated over 30 million bases of novel expressed sequence tags from first larval stage worms utilizing high-throughput sequencing technology. We have shown that approximately 14% of the newly sequenced expressed sequence tags map completely or partially to genomic regions where there are no annotated genes or splice variants and therefore, imply that these are novel genetic structures. Expressed sequence tags, which map to intergenic (around 1000) and intronic regions (around 580), may represent novel transcribed regions, such as unannotated or unrecognized small protein-coding or non-protein-coding genes or splice variants. Expressed sequence tags, which map across intron-exon boundaries (around 300), indicate possible alternative splice sites, while expressed sequence tags, which map near the ends of known transcripts (around 600), suggest extension of the coding or untranslated regions. We have also discovered that intergenic and intronic expressed sequence tags, which are well conserved across different nematode species, are likely to represent non-coding RNAs. Lastly, we have incorporated available serial analysis of gene expression data generated from first larval stage worms, in order to predict novel transcripts that might be specifically or predominantly expressed in the first larval stage.

Conclusion: We have demonstrated the use of a high-throughput sequencing methodology to efficiently produce a snap-shot of transcriptional activities occurring in the first larval stage of C. elegans development. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. This study provides a more complete C. elegans transcriptome profile and, furthermore, gives insight into the evolutionary and biological complexity of this organism.

Background

Computationally based genomic analyses have been able to accomplish interpretation of the genome of Caenorhabditis elegans on a global scale. The aims of some high-throughput genomic projects have focused on the identification of developmental stage, tissue or cell-specific 'transcriptomes', which attempt to describe transcribed regions and their relative abundance [1-3].

Approaches such as microarray, serial analysis of gene expression (SAGE) [4], and expressed sequence tag (EST) analysis have been widely used for the identification of genes that are selectively turned on or off in specific cell or tissue types with regard to development, aging, and disease. These approaches have also provided experimental evidence for the confirmation of predicted gene structures, alternative splice variants [5], and non-coding RNAs (ncRNAs) [6].

At present, there are approximately 340,000 C. elegans ESTs in WormBase which, in addition to available cDNA sequences, provide complete transcriptional evidence for 34.6% of the transcripts. The remaining transcripts are partially confirmed by ESTs or only computationally predicted by comparative genomics or ab-initio gene prediction methods (WS180 release notes).

We have sequenced a large number of ESTs from a C.
elegans cDNA population, synchronized at the first larval (L1) developmental stage, by a high-throughput, sequencing-by-synthesis technology, namely 454 sequencing [7]. This method produces DNA sequences more rapidly and cost-effectively than the traditional Sanger sequencing approach and has been successfully utilized in other studies for various purposes, such as expression profiling and novel gene discovery [8-10]. We have generated more than 300,000 novel C. elegans EST sequences by this highly parallel sequencing method for this study.

We have analyzed the novel sequence data to obtain a more complete C. elegans transcriptome profile, providing not only confirmation of computationally predicted transcripts but also the identification of potential novel transcripts, alternatively spliced variants, and ncRNAs [11]. In addition, the increased depth of this sequencing of the C. elegans L1 cDNA library facilitated a more sensitive search for novel transcribed regions that may be specific for the first larval stage of C. elegans. We have also investigated conservation of potential novel transcribed regions across available nematode species, namely C. elegans, C. briggsae, C. remanei, C. brenneri, Brugia malayi and Pristionchus pacificus.

Results and discussion

454 EST sequencing identifies known transcripts and partially confirms computationally predicted transcripts

Using sequencing-by-synthesis technology we have generated a total of 300,453 reads (30,907,940 bases) from an L1-specific cDNA sample with an average read length of 102 bases. The average 454 read accuracy is measured to be 99.4%, with substantially all of the bases having Phred 20 or better quality [7]. Sequences identified as vector contamination were filtered out using Crossmatch [12], resulting in a data set of 298,838 454 ESTs, which were aligned using the Basic Local Alignment Search Tool (BLAST) [32] to around 22,000 known and predicted C. elegans genes (WormBase release WS160).
From this set, a total of 229,989 454 ESTs (77%) were directly mapped to 6132 known or predicted C. elegans genes by BLAST with high confidence value (P-value less than 9 × 10^-7). Transcripts which have the greatest number of 454 ESTs (250 to 10,000) generally match ribosomal protein coding genes. This is expected as ribosomal protein coding genes are the most abundantly expressed type of genes. These data provide partial experimental evidence for approximately 200 genes, which have previously been predicted only through computational methods (Additional file 1).

Around 22% of the 454 EST data (66,358 reads) had no significant matches to known or predicted C. elegans transcripts at the specified stringency and as such may represent previously unidentified genetic structures, such as novel transcripts, L1 stage-specific transcripts, novel splice variants and ncRNAs. The remaining 1% (2491 reads) of the data ambiguously map to more than one transcript at the high stringency used, although these ambiguous matches are usually simple repeats or sequences of low complexity.

454 EST reads are biased towards 3'-transcript ends

The physical distribution of 454 EST reads, which map across known transcripts from their 5'- to 3'-ends, shows a larger coverage on the 3'-ends (Figure 1A). This is likely due to the presence of partial transcripts in the cDNA library. The lack of splice leader sequences also indicates under-representation of the 5'-ends of the transcripts. On average, six unique EST reads were mapped to each known transcript ranging from a single EST to 847 unique ESTs, with a median of two ESTs.

Most statistically over-represented genes identified by 454 ESTs correlate to developmental, reproductive, and cellular metabolic processes

We performed gene ontology (GO) analysis on 6132 genes identified by 454 ESTs using GOstat [14].
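As a rough illustration of what "statistically over-represented" means for a GO term, the sketch below computes a one-sided hypergeometric P-value. This is an assumption made for illustration only: the text does not describe the statistic GOstat applies, and the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers, not taken from the study:
# 10 genes in total, 4 annotated with the term, 3 genes observed, 2 of them annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
print(f"P(X >= 2) = {p_value:.4f}")
```

A small P-value indicates that the term occurs in the observed gene set more often than random sampling from the annotated genome would suggest; in practice a multiple-testing correction would also be applied across GO terms.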
The most statistically over-represented GO annotations in biological processes (P-value less than 9 × 10^-10) within this group of genes correlate to multicellular organismal development processes (that is, larval development, post-embryonic body morphogenesis, positive regulation of growth, and homeostatic process), reproductive developmental processes in a multicellular organism (that is, sex differentiation, gamete generation, genitalia development, and oviposition), and lastly, cellular metabolic processes (that is, translation, cellular component organization and biogenesis, co-enzyme biosynthetic process, and protein and RNA metabolic processes).

454 ESTs that map to the C. elegans genome identify putative novel transcripts or splice variants

The 22% (66,358 reads) of 454 ESTs, which did not have significant matches to known or predicted C. elegans transcripts, were subsequently compared with the genomic sequence of C. elegans using BLAST. As a result, 31,570 ESTs (14%) map to the genome at a high stringency (that is, P-value less than 9 × 10^-5). A stringent P-value threshold of 9 × 10^-7 was used for mapping 454 ESTs to the transcriptome to ensure that read alignments to the transcriptome were of very high quality and unlikely to occur by chance. Subsequently the less stringent threshold of 9 × 10^-5 was used here for alignments against the genome. Although this increases the chance of incorrect alignment, it increases the total number of aligned reads and may facilitate the discovery of novel transcription events, which can subsequently be validated.

The remaining ESTs (8%), which do not map to either the transcriptome or genome, are composed of contamination, low complexity or poor quality sequences (Figure 1B).
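The two-stage mapping described above, with a stringent threshold against the transcriptome followed by a relaxed threshold against the genome, can be sketched as a simple decision rule. The function and constant names are hypothetical, the inputs are assumed to be the best BLAST P-value per read (or None when there is no hit), and the handling of reads that match several transcripts ambiguously is left out.

```python
TRANSCRIPTOME_P = 9e-7  # stringent threshold used against known or predicted transcripts
GENOME_P = 9e-5         # relaxed threshold used against the genomic sequence


def assign_read(transcript_p=None, genome_p=None):
    """Return 'transcript', 'genome' or 'unmapped' for a single read.

    Each argument is the best P-value of the read against that target,
    or None if the read produced no alignment at all.
    """
    # Stage 1: accept a transcriptome hit only at the stringent threshold.
    if transcript_p is not None and transcript_p < TRANSCRIPTOME_P:
        return "transcript"
    # Stage 2: reads that fail stage 1 are tested against the genome.
    if genome_p is not None and genome_p < GENOME_P:
        return "genome"
    return "unmapped"


# A read that narrowly misses the transcriptome cut-off can still be
# recovered as a genomic hit under the relaxed threshold.
print(assign_read(transcript_p=1e-6, genome_p=1e-6))
```

Keeping the two thresholds as named constants makes the trade-off explicit: relaxing the second threshold recovers more reads at the cost of more chance alignments, which is why such hits are treated as candidates for later validation.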
From this analysis, 530 additional genes (along with 6132 genes found in the previous step) were identified by ESTs mapping completely to their introns or partially to the exons.

Genomic EST hits are categorized according to genomic mapping locations

The 31,570 ESTs that align to the genome have been subdivided into the following categories: ESTs which map to intergenic regions (50%), intronic regions (14%), and transcript ends and/or untranslated regions (UTRs) (19%), EST reads that split into two separate alignment blocks (3%), and those which span exon and intron boundaries suggesting alternative splice junctions (11%). The last 3% mapped to overlapping transcripts (Figure 1C). We have investigated each category to search for genetic structures, such as putative novel genes, splice variants, and ncRNA genes from each genomic region. Some of the genomic regions already have computational predictions or other previously sequenced ESTs supporting the presence of such structures, and some do not have any other information to support these findings. Lack of other experimental and computational evidence may indicate possible splicing errors or unknown genetic features.

Figure 1: 454 ESTs mapping to Caenorhabditis elegans transcripts. (A) Histogram showing the distribution of 454 expressed sequence tags (ESTs) mapping to Caenorhabditis elegans transcripts. Coordinate 0 on the x-axis represents the 5'-end of the transcripts. (B) Summary of 454 EST mapping result to the C. elegans transcriptome and genome. (C) Categorization of genomic 454 EST hits.

Intergenic ESTs

We have classified intergenic ESTs as those ESTs which mapped within intergenic regions of the genome but did not overlap with adjacent genes.
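The location categories above can be illustrated with a simplified classifier for a single contiguous alignment against one gene model. This is a sketch under stated assumptions rather than the procedure used in the study: coordinates are inclusive, only one gene is considered, strand is ignored, and gapped alignments and overlapping transcripts are not handled. The function name and category labels are hypothetical.

```python
def classify_hit(start, end, exons):
    """Classify the alignment [start, end] against one gene model.

    `exons` is a list of (exon_start, exon_end) tuples with inclusive
    coordinates, sorted by position and non-overlapping.
    """
    gene_start, gene_end = exons[0][0], exons[-1][1]
    # No overlap with the gene at all: the hit lies in an intergenic region.
    if end < gene_start or start > gene_end:
        return "intergenic"
    # The hit runs past either end of the annotated transcript.
    if start < gene_start or end > gene_end:
        return "transcript end"
    overlapped = [(s, e) for s, e in exons if start <= e and end >= s]
    # Inside the gene span but touching no exon: fully intronic.
    if not overlapped:
        return "intronic"
    # Fully contained in a single annotated exon.
    if any(s <= start and end <= e for s, e in overlapped):
        return "exonic"
    # Touches an exon but also extends into an intron.
    return "exon-intron boundary"


gene = [(100, 200), (300, 400)]  # hypothetical two-exon gene
for hit in [(10, 50), (90, 120), (220, 280), (120, 180), (190, 230)]:
    print(hit, classify_hit(*hit, gene))
```

Applied genome-wide, each read would first be matched to its nearest or overlapping gene models, and a read overlapping no gene model at all would be labelled intergenic, in line with the definition given above.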
A total of 8449 intergenic ESTs mapped to 1061 intergenic regions ranging from single counts to coverage with over 1000 reads (including identical ESTs); 120 of these intergenic regions have five or more unique ESTs mapping within them (Additional file 2).

Most intergenic regions (around 850) have one unique EST cluster (that is, identical and overlapping ESTs) mapped, around 150 intergenic regions have two clusters, around 35 intergenic regions have three clusters, around 20 have four clusters, and, finally, one intergenic region has 14 EST predicted clusters.

Figure 2 shows the intergenic regions with the most ESTs. There are no protein-coding gene predictions in these intergenic regions, although there are some small open reading frames within them. However, as indicated in Figure 2, the regions where most ESTs map show a high conservation between C. elegans and C. briggsae. In addition, BLAST analysis of these regions (nucleotide to protein via six-frame translation) reveals protein homology against reference protein data sets from the genomes of yeast, fly, worm, and human, and also against SwissProt and TREMBL [15]. These EST loci may represent novel genes that are small or extensions of neighboring genes.

Interestingly, the neighboring gene to the novel EST hits shown in Figure 2B, gsa-1 (R06A10.2), encodes a Gs alpha subunit of heterotrimeric G proteins, which affects L1 stage viability, movement, and egg laying [16]. gsa-1 is confirmed both by previously sequenced cDNAs and ESTs. We postulate that the 454 ESTs may, therefore, indicate a UTR extension of gsa-1 or, alternatively, a splice variant. It is also interesting to note that the number of 454 ESTs mapped near the 3'-end of gsa-1 is much greater than the number of previously sequenced ESTs mapped to gsa-1.
This may indicate that 454 EST sequencing has much deeper coverage of the L1 stage mRNA sample or, alternatively, that the potential novel splice variant shows a relatively higher level of expression at the L1 stage.

Finally, previously sequenced ESTs (Yuji Kohara, unpublished) overlap with 454 ESTs in the intergenic region shown in Figure 2C. These ESTs support a possible 3'-end extension of the neighboring gene, nhr-88 (K08A2.5a). The ESTs generated by Yuji Kohara (yk1039b06, yk1074f06, yk1232e09) were also generated from an L1 stage C. elegans cDNA library. As nhr-88 has been determined to belong to a gene cluster containing genes that are significantly enriched in L1 muscle [17], this example implies that our 454 EST sequencing data has deep coverage of L1-enriched genes.

The size of the intergenic regions to which the 454 ESTs map ranges from 114 base pairs (bp) to 38,046 bp. Although we observe a positive correlation between the physical distribution of EST hits and the intergenic size (Pearson correlation coefficient of 0.46), we found no correlation between the size of the intergenic region and the number of ESTs that mapped to it. The distribution of intergenic EST hits across intergenic regions was observed to be relatively uniform, which is unexpected given that we anticipated a bias towards the ends of the intergenic regions (that is, close to neighboring genes), which would likely represent UTRs or novel terminating or initiating exons of the bordering genes. EST hits in the middle of large intergenic regions distant from neighboring genes represent more likely candidates for novel transcripts, including ncRNA genes [18].

Figure 2: The intergenic region on chromosomes with unique 454 expressed sequence tags. (A) The intergenic region on chromosome III with 49 unique 454 expressed sequence tags (ESTs). (B) The intergenic region on chromosome I with the greatest number of unique 454 ESTs (99) in this analysis. (C) The intergenic region on chromosome II with 62 unique 454 ESTs. The 454 EST clusters in the middle of these intergenic regions with black vertical bars represent deep EST coverage, and conservation of these regions between Caenorhabditis elegans and C. briggsae is shown. These ESTs may represent a novel gene or extension of the neighboring gene. Note that the genomic regions shown are not to the same scale.

Intronic ESTs

We have classified intronic ESTs as those ESTs which mapped completely within introns. Intronic EST matches may represent novel exons (that is, alternative splicing), as well as novel overlapping transcripts on the opposite strand. ncRNAs are also known to be present in intronic regions [11]. A total of 1921 ESTs with over 90% alignment were mapped within introns of 584 C. elegans transcripts using BLAST (P-value less than 9 × 10^-5); see Additional file 3. Of these genes, 262 only had the intronic EST hits without any ESTs completely mapping to their annotated exons. These ESTs may indicate that there are ncRNA genes or novel transcripts on the opposite strand within the introns. The reasoning behind this speculation is that, if the intronic ESTs were derived from a novel exon of a gene, it would be unlikely for that gene to have only intronic ESTs and no ESTs mapping to its annotated exons. In fact, the recent WormBase version (WS180) added four new protein coding genes within some of these introns but on the opposite DNA strand, and the ESTs mapped in the introns match those new genes (Figure 3A and Additional file 4).

5'- and 3'-end ESTs

5'- and 3'-end ESTs are ESTs that partially map to the beginning or end of transcripts (that is, 5'- and 3'-UTRs or terminating/initiating exons). These EST matches are also interesting in that they may contain regulatory elements, such as subcellular localization signals [19], and cis-elements for mRNA stability and translation [20]. We found 131 transcripts with ESTs mapping to their 5'-ends (Additional file 5) and 956 transcripts with ESTs mapped to their 3'-ends. These 3'- and 5'-end 454 ESTs represent UTRs or coding region extensions potentially including alternative start and stop codons.

Gapped ESTs

When one end of an EST read maps to a genomic location and the other end of the EST read maps to a location some distance away (that is, two separate alignment blocks), we have categorized these as 'gapped ESTs'. ESTs that map to the ends of two adjacent exons confirming known introns (approximately 29,000 ESTs) are not included in this data set as they were examined in the initial comparison to the transcriptome. Fifteen such 'gapped EST' hits were found (Figure 3B), confirming novel exon-intron boundaries and providing strong experimental evidence for novel transcripts or alternative splice variants with skipped exons, novel internal or end exons, or novel exon-intron boundaries (Table 1). Six of these EST matches confirm updated gene structures in a recent WormBase release (WS180).

Figure 3: Examples of intronic and gapped expressed sequence tags. (A) An example of intronic expressed sequence tags (ESTs) showing 454 ESTs mapped to the gene, K09E2.3, which is added to a recent WormBase release (WS180) within the intron of K09E2.2. There are also other ESTs recently added that confirm K09E2.3. (B) An example of a gapped-EST suggesting alternative splicing or correction of the current gene model. Note that the genomic regions shown are not to the same scale.

Exon-intron boundary ESTs

ESTs that map across exon and intron boundaries are a possible indication of novel alternative splicing events (that is, alternative 5'- or 3'-end splice sites) or, alternatively, cDNAs that have been partially processed with some introns left intact. These EST hits could also provide experimental confirmation for incorrect splice site predictions in the current gene models, particularly for those that lack experimental validation. Additional file 6 lists 284 transcripts with 454 ESTs that map across their exon and intron boundaries.

Exon-intron boundaries with 454 ESTs mapped show weaker 3'-end splice site conservation

We have analyzed splice sites for the transcripts which have ESTs mapped across exon-intron boundaries, and found the consensus 3'-end splice sites (that is, TTTTCAG) are less well conserved compared with the transcripts that have splice sites confirmed by ESTs or RNAs as shown in Figure 4A. The weaker conservation of the 3'-end splice sites may be a feature of alternative splice sites, or these sites may simply be more prone to erroneous splicing events.

454 ESTs mapped to different genomic regions show average guanine-cytosine contents similar to the genomic averages

Comparisons of guanine-cytosine (GC) contents of the 454 EST read sequences, in different genomic regions, with overall C. elegans genomic sequences are shown in Figure 4B. As expected, EST read sequences that mapped to exons have the highest GC content, and these are close to the average for annotated coding sequences of the whole genome (around 45%).
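GC content, the statistic compared across genomic regions here, is simply the percentage of G and C bases in a sequence. A minimal sketch follows; the function name and the example reads are hypothetical, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content(seq):
    """Return the percentage of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Hypothetical reads, used only to show the calculation.
reads = {"exon-like": "GGCCAATT", "intron-like": "ATTTAAGC"}
for label, read in reads.items():
    print(label, gc_content(read))
```

Averaging this value over all reads assigned to one category (exonic, intronic, intergenic) gives a per-category figure that can be set against the corresponding genome-wide average.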
It might be expected that 454 EST reads, which mapped to intergenic and intronic regions, would have GC content close to that of the coding sequence if they represent novel transcripts or exons. However, the intergenic and intronic EST read sequences have similar percentage GC to the average GC contents of intergenic and intronic sequences for the genome, around 34% and 28%, respectively, suggesting the ESTs that map to intergenic regions or introns represent evidence for non-coding, transcribed sequences rather than protein coding sequences, as ncRNA genes do not show as strong base composition biases as do protein coding sequences [21].

Intergenic regions with 454 ESTs show a higher degree of cross-species conservation

Conservation of intergenic regions across different nematode species is evidence for functional genetic structures. These intergenic regions, with 454 ESTs mapped to them, were aligned with other nematode species, namely C. briggsae, C. remanei, C. brenneri, B. malayi, and P. pacificus, using ClustalW. Approximately 1000 intergenic regions in total were randomly selected for this analysis, providing a 1:1 ratio of intergenic regions with 454 ESTs aligned to them and ones without. These intergenic regions had three to six orthologous sequences for multiple sequence alignments depending on the availability and existence of orthologous sequences. Intergenic regions with mapped 454 ESTs had overall higher average ClustalW alignment scores than the ones without 454 ESTs (Figure 4C). This higher degree of conservation of the intergenic regions with 454 ESTs represents further evidence that supports the presence of putative novel functional transcripts identified by the 454 EST sequences.

Cross-species EST-to-genome comparisons identify highly conserved ESTs and species-specific ESTs

Another important analysis is the cross-species comparison of ESTs that map to the genome. We have compared the well-annotated genomes of C. elegans and C. briggsae, as well as the C. remanei genome that has more recently become available.

Strong and abundant EST matches on well-conserved genomic regions are strong evidence supporting the presence of novel genetic structures. We were interested in comparing EST hits and cross-species conservation of the genomic regions where the ESTs align. Such characterization of EST hits unique to one species and EST hits in conserved regions may offer evolutionary clues to alternative splicing.

We have examined both species-specific and species-conserved splicing events by mapping the intergenic 454 EST sequences to C. briggsae, C. remanei, and C. brenneri by BLAST. A total of 3524 unique C. elegans ESTs, which were aligned to intergenic regions at the high stringency, were mapped to C. briggsae, C. remanei, and C. brenneri. The top 5% of the BLAST hits, with the highest scores, were most common among the three nematodes, but C. remanei had the greatest number of high score BLAST hits (around 15%) with E-values lower than 1 × 10^-4 (Figure 4D). The intergenic regions where these ESTs are aligned are highly conserved across species as expected, and synteny of these genomic regions also seems to be well conserved (data not shown). These highly conserved EST hits across species likely represent novel transcripts. EST alignments with poor scores, such as BLAST hits with an E-value higher than 10, indicate that the ESTs mapped uniquely to C. elegans at high stringency. These EST sequences may be from novel transcripts that are unique to C. elegans, although it is also possible that some or all of these ESTs may be transcriptional noise.

Table 1: Summary of gapped-expressed sequence tag matches and putative novel structures

% Cov | Gap size | Chromosome | First alignment block | Second alignment block | Putative novel structure
98 | 480 | I | 390514 to 390558 | 391038 to 391059 | Skipped exon
96 | 489 | I | 5761023 to 5761058 | 5761547 to 5761592 | Novel end exon confirmed*
98 | 101 | I | 7642453 to 7642487 | 7642588 to 7642631 | Novel end exon
98 | 62 | I | 9724862 to 9724908 | 9724970 to 9725025 | Alternate exon-intron boundary
98 | 157 | I | 11931105 to 11931163 | 11931320 to 11931345 | Novel internal exon
92 | 45 | II | 2782924 to 2782962 | 2783007 to 2783086 | Novel end exon confirmed*
98 | 224 | II | 10828328 to 10828404 | 10828628 to 10828667 | Confirmed intron*
91 | 461 | II | 13635576 to 13635653 | 13636114 to 13636163 | Confirmed intron*
98 | 548 | II | 14747166 to 14747225 | 14747773 to 14747821 | Novel end exon
98 | 306 | III | 8552492 to 8552538 | 8552844 to 8552868 | Alternate exon-intron boundary
92 | 277 | III | 11751557 to 11751606 | 11751883 to 11751921 | Novel end exons confirmed*
92 | 1364 | III | 12639534 to 12639558 | 12640922 to 12640967 | Novel internal exon
98 | 926 | IV | 9350535 to 9350565 | 9351491 to 9351529 | Confirmed intron*
97 | 900 | V | 1981345 to 1981373 | 1982273 to 1982364 | Novel end exons
98 | 50 | X | 11824129 to 11824180 | 11824230 to 11824259 | Novel transcript/novel end exons

* A recent WormBase release (WS180) has since confirmed these exon-intron boundaries; the original mapping analysis was performed using WS160.

Highly conserved ESTs are mapped to ncRNAs

ncRNAs are anticipated to be conserved [13]. The conservation of primary structure for ncRNAs is known to be variable, whereas the secondary structure is expected to be more highly conserved across species [22]. It is also known that expression of ncRNAs varies with developmental stages [23], and therefore, our ESTs may identify ncRNAs highly expressed in the L1 stage.

Figure 4: Comparative analyses of 3' splice sites, GC contents, and cross-species sequence conservation. (A) The conservation of consensus 3'-end splice sites (TTTTCAG) of confirmed transcripts and transcripts with exon-intron boundary 454 expressed sequence tag hits. (B) A comparison of 454 expressed sequence tags and Caenorhabditis elegans whole genome for guanine-cytosine content of different genomic regions. (C) Average ClustalW alignment score comparisons for intergenic regions with or without 454 expressed sequence tags for different numbers of orthologous sequences. (D) Chart showing 3524 unique Caenorhabditis elegans intergenic 454 expressed sequence tags mapped to C. briggsae, C. remanei, and C. brenneri.

We have selected and examined the EST loci that are highly conserved across species (E-values lower than 1 × 10^-4), and have at least five EST reads mapped in the middle of large intergenic regions (more than 10 kb) away from neighboring genes (Table 2). These EST hits are the most probable candidates for novel transcripts and in fact, many of these EST loci are either mapped to ncRNAs that are identified, confirmed and added to more recent WormBase releases or to ncRNA predictions performed by RNAz [24] (Figure 5A and 5B).

The most conserved intronic ESTs map to C. elegans gene K04G7.10 and its C. briggsae ortholog. Consistent with well conserved EST loci in intergenic regions, these ESTs may represent an ncRNA as they overlap with an ncRNA prediction within the intron. Alternatively, this could be a novel exon, as GeneFinder has predicted an exon in that region (Figure 5C).

454 ESTs support computational ncRNA gene predictions

Currently, there are around 1300 annotated functional ncRNAs in WormBase [11], of which 39 are in our data set, including snoRNAs, miRNAs, 21URNAs, rRNAs, and some ncRNAs that could not be assigned to any functional class (Table 3). We suspect that 454 ESTs may represent precursor RNAs, such as pre-miRNA and pre-snoRNAs, which are known to be polyadenylated [25-27], since our RNA preparation was done using a polyA-dependent method.
However, it is also possible that potential ncRNAs that are identified in this study may belong to polyadenylated ncRNA classes, such as mRNA-like ncRNAs (mlncRNAs) [28].

Single-sequence RNA secondary structure predictions, without using comparative genomes, only take into account thermodynamic models and energy minimization, which are not sufficient to achieve the necessary sensitivity and specificity for ncRNA prediction. For that reason, we have compared the 454 EST mapping result with RNAz ncRNA predictions [24], which incorporate homology information of the RNA secondary structure to make predictions for ncRNAs. This approach, using C. elegans and C. briggsae, has proven fruitful in identifying over 2000 putative RNA loci [18].

We compared RNAz ncRNA predictions for C. elegans and 332 unique 454 ESTs mapped to intergenic regions. We found 19 ncRNA predictions in close proximity to 454 ESTs (within 100 bp), with nine of these predicted ncRNAs overlapping with the 454 ESTs in intergenic regions (Table 4). We have also compared the intronic 454 ESTs with the RNAz ncRNA predictions and found that 10 introns contained both 454 ESTs and ncRNA predictions within 100 bp, and five introns had intronic ESTs that overlap with ncRNA predictions (Table 5).
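The comparison just described distinguishes ESTs that overlap an ncRNA prediction from ESTs that lie within 100 bp of one. A minimal sketch of that test on inclusive coordinates is given here; the function names, the labels and the example coordinates are hypothetical, and both intervals are assumed to lie on the same chromosome.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Distance in bp between the nearest ends of two inclusive intervals.

    Returns 0 when the intervals overlap.
    """
    if a_start <= b_end and b_start <= a_end:
        return 0
    if a_end < b_start:
        return b_start - a_end
    return a_start - b_end


def relate(est, prediction, window=100):
    """Label an EST relative to a prediction as 'overlap', 'nearby' or 'distant'."""
    distance = interval_distance(est[0], est[1], prediction[0], prediction[1])
    if distance == 0:
        return "overlap"
    if distance <= window:
        return "nearby"
    return "distant"


est = (100, 200)  # hypothetical EST alignment
for prediction in [(150, 250), (260, 300), (400, 500)]:
    print(prediction, relate(est, prediction))
```

With many ESTs and predictions, sorting both lists by start coordinate and sweeping through them once avoids comparing every pair, but the pairwise test itself stays the same.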
These ESTsthat map to ncRNA predictions may represent novelncRNAs.Table 2: Most highly conserved 454 expressed sequence tags loci in intergenic regionsCount Intergenic coordinates Intergenic distance Distance to nearest gene Expressed sequence tags count1 I:12824792..12825228 436 10 62 I:1841201..1842444 1243 107 133 II:11162247..11171658 9411 4258 54 II:15226621..15230280 3659 763 135 II:5956644..5961678 5034 612 166 II:9773798..9774904 1106 150 97 III:11170439..11171970 1531 377 178 III:12130215..12152775 22560 7576 59 III:1733092..1743238 10146 4249 1010 III:1743309..1752172 8863 1819 611 III:2768655..2770841 2186 687 612 III:3664472..3667602 3130 79 1913 III:4098100..4121436 23336 1088 814 III:8935324..8944721 9397 1136 1815 IV:9934182..9935039 857 229 2516 V:1075619..1081451 5832 939 1817 V:1361139..1373418 12279 403 518 V:1980421..1982526 2105 6 1119 V:5932735..5933473 738 180 520 V:6336709..6357288 20579 1448 621 X:15012843..15026225 13382 4845 822 X:15616440..15620416 3976 928 723 X:16927937..16944791 16854 1211 22Page 9 of 14(page number not for citation purposes)24 X:837757..838677 920 315 6BMC Biology 2008, 6:30 http://www.biomedcentral.com/1741-7007/6/30L1 SAGE and 454 ESTs overlap by 50%We have investigated both the commonalities and differ-ences between the large amount of available SAGE dataanimals and as such are a direct comparison of these twogene expression-measuring techniques.The number of genes that were identified by SAGE andESTs independently from L1 stage animals is 5115 and6132, respectively, but the number of genes that wereidentified by both methods is only 3068. Hence, 2047genes were identified by SAGE only and 3064 genes wereidentified only by ESTs. 
It is worth noting that while the same mRNA sample was used for both SAGE and 454 EST analyses, the inherent differences in the technologies used may have introduced discrepancies in gene identification. For example, approximately half of the genes identified only by SAGE have a single SAGE tag, which may not be sufficient evidence for expression of those genes due to the possibility of erroneous assignment [4], and approximately 12% of the genes identified only by 454 ESTs do not contain the NlaIII restriction enzyme site required for a transcript to be identified by SAGE [4,30].

Spearman correlation of the transcript abundances measured by SAGE and ESTs was calculated using genes that have both SAGE tags and ESTs mapped to them. The correlation coefficient is 0.36, which is not as high as we initially expected considering that both the EST and SAGE libraries were prepared from the same mRNA sample. This, however, raises interesting questions as to how well each data set represents the complete picture of transcriptional activities. It could be that, given the large scale of transcriptional activity, each snapshot represents only a partial picture, or that each experiment contains significant amounts of new information, although it could simply be due to discrepancies between different gene expression profiling methods.

L1 SAGE and 454 ESTs identify putative novel L1 stage-enriched genes

We have compared 454 ESTs and SAGE tags that map specifically to intergenic regions. There are 166 intergenic regions that have both L1 454 ESTs and L1 SAGE tags mapped to them (Additional file 7). When we examined intergenic regions with SAGE tags that are enriched in the L1 stage but lowly expressed in embryo and other developmental stages, we observed a good correlation between the L1-enriched SAGE tags and L1 454 ESTs. In other words, most intergenic regions with SAGE tags that are expressed highly in the L1 stage also had 454 ESTs mapped in close proximity.
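For readers who want to reproduce the kind of rank correlation reported for the SAGE and EST abundances (Spearman coefficient 0.36), the following is a minimal pure-Python sketch. It is not the authors' code: the function names are hypothetical, ties are handled with average ranks (one common convention), and the tag and read counts at the bottom are made-up values for illustration only.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene counts (not data from the paper).
sage_tags = [1, 4, 2, 9, 30, 2]
est_counts = [3, 2, 2, 15, 11, 1]
print(round(spearman(sage_tags, est_counts), 2))
```

Because the statistic depends only on ranks, it is insensitive to the very different count scales of SAGE tags and EST reads, which is one reason a rank correlation suits a comparison of two profiling methods.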
We speculate these loci may represent novel coding or non-coding transcripts that are potentially L1 stage specific but expressed in low abundance. Additionally, we postulate that intergenic SAGE tags enriched in L1 and L1 454 ESTs that map together in regions without any genes in their vicinity may represent putative novel transcripts that may be enriched in the L1 stage of C. elegans. In addition, ESTs and SAGE tags that map near the 3'-end of genes might represent 3'-UTR extensions and as such can provide evidence of expression for those genes at the L1 stage.

Figure 5: Examples of 454 ESTs mapped to known or predicted ncRNAs. (A), (B) Representative 454 expressed sequence tag data, which identify known non-coding RNAs. (C) The most conserved cross-species intronic 454 expressed sequence tag hit mapping to an RNAz non-coding RNA prediction. Note that the genomic regions shown are not to the same scale.

Conclusion

We have successfully demonstrated the use of next-generation sequencing technology (the 454 sequencing-by-synthesis approach) for transcriptome analysis in an extremely efficient manner. We have identified a number of putative novel genetic structures from the transcriptome snapshot obtained from this analysis, including putative novel splice variants and ncRNAs that might be stage specific.

Methods

mRNA and cDNA preparation

Total RNA from a pooled sample was prepared using TRIZOL Reagent (Invitrogen Life Technologies, Carlsbad, CA) following the manufacturer's instructions and was assayed for quality and quantified using an Agilent 2100 Bioanalyzer (Agilent Technologies, Mississauga, ON) and RNA 6000 Nano LabChip kit (Caliper Technologies, Hopkinton, MA).
Contaminating genomic DNA was removed from total RNA by DNase I treatment using DNA-free (Ambion, Austin, TX), following the manufacturer's instructions. First-strand cDNA was synthesized from 2 μg of total RNA using the Powerscript Reverse Transcriptase (cat#639501, Takara Bio Inc., Shiga, Japan). For the first-strand synthesis, custom biotinylated primers containing a recognition sequence for the type IIS restriction enzyme MmeI were used at a final concentration of 1 μM (454-3F-biotin, 5'-/Biotin/-AAG CAG TGG TAA CAA CGC ATC CGA CTT TTT TTT TTT TTT TTT TTT TTV N-3'; 454-3A, AAG CAG TGG TAA CAA CGC AGA GTA CGC GGG). The resulting single-stranded cDNA was amplified using the Advantage 2 polymerase chain reaction (PCR) kit (BD Biosciences Clontech, Mountain View, CA) in a final volume of 1000 μl with the addition of 454-3A at a final concentration of 240 nM. The cycling conditions consisted of an initial denaturation at 95°C for 1 minute followed by 20 cycles of 95°C for 30 seconds, 65°C for 30 seconds and 68°C for 6 minutes. After amplification, the DNA was recovered using a QIAquick PCR Purification kit (Qiagen) according to the manufacturer's instructions. Following column elution, the DNA was bound to pre-washed M280 Streptavidin beads (Dynal Biotech) and subjected to MmeI digestion according to the manufacturer's instructions (New England Biolabs) in the presence of S-adenosylmethionine.

Table 3: Non-coding RNA genes identified by 454 expressed sequence tags (incomplete)

RNA type        Gene name    Chromosome  Status                   Location
miRNA           Y105E8A.31   I           Predicted                3'-UTR of Y105E8A.16
miRNA           F08F3.11     V           Predicted                Intergenic
Non-coding RNA  F54D7.7      I           Predicted                Overlap with 3'-UTR of F54D7.4
21U-RNA         C46G7.7      IV          RNAs                     Intergenic
21U-RNA         C08F11.38    IV          RNAs                     Intron of C08F11.13
21U-RNA         T23G4.18     IV          RNAs                     Intergenic
21U-RNA         T23G4.24     IV          RNAs                     Intergenic
21U-RNA         F55B11.13    IV          RNAs                     Intergenic
21U-RNA         Y105C5A.159  IV          RNAs                     Intergenic
21U-RNA         Y51H4A.106   IV          RNAs                     Intergenic
Non-coding RNA  F09E10.10    X           Expressed sequence tags  Intergenic
Non-coding RNA  C30E1.9      X           Expressed sequence tags  Intergenic
RNA pseudogene  D1005.t1     X           Predicted                Intergenic
RNA pseudogene  ZK380.t2     X           Expressed sequence tags  Intergenic
rRNA            F31C3.11     I           Expressed sequence tags  Intergenic
rRNA            F31C3.9      I           Expressed sequence tags  Intergenic
snoRNA          R12E2.17     I           RNAs                     Intron of R12E2.3
snoRNA          F25H5.9      I           RNAs                     Intron of F25H5.3
snoRNA          T10B9.11     II          RNAs                     Intergenic
snoRNA          M106.6       II          RNAs                     Intron of M106.1
snoRNA          H06I04.9     III         Predicted                Intron of H06I04.4
snoRNA          ZK643.9      III         RNAs                     Intergenic
snoRNA          Y43B11AR.7   IV          Expressed sequence tags  Intron of Y43B11AR.4
snoRNA          F17C11.14    V           RNAs                     Intron of F17C11.9
snoRNA          K09E9.5      X           mRNA                     Intergenic

Table 4: Intergenic 454 expressed sequence tags overlapping with RNAz non-coding RNA predictions

EST               EST coordinate          Predicted ncRNA coordinate  Intergenic coordinate
062385_1158_0389  I:9863279..9863316      I:9863183..9863302          I:9862287..9863587
004771_1566_3251  II:15165861..15165910   II:15165800..15165950       II:15165619..15166104
074784_3939_2069  III:11474420..11474530  III:11474520..11474625      III:11474411..11479792
313519_0405_2992  IV:3168808..3168913     IV:3168772..3168891         IV:3155983..3169942
320686_1504_2153  V:5440646..5440778      V:5440615..5440736          V:5438973..5446748
322033_3470_0718  V:11801636..11801764    V:11801572..11801690        V:11800350..11803726
240717_0095_3913  V:12296851..12296944    V:12296911..12297022        V:12293608..12302290
093179_3480_2283  X:17010136..17010216    X:17010094..17010200        X:17008275..17013376
104650_3438_2367  X:6961371..6961474      X:6961430..6961548          X:6956349..6967668
Following a 2.5 hour incubation at 37°C, the supernatant was removed and subjected to phenol chloroform isoamyl alcohol (pH 8.0, 100 μl; Fisher) extraction in phase-lock gel tubes (heavy) 0.5 ml (Eppendorf), and the 600 μl aqueous phase was precipitated by the addition of 2750 μl of 100% ethanol, 8 μl of mussel glycogen (Invitrogen) and 360 μl of 7.5 M ammonium acetate, followed by incubation at -20°C overnight. The precipitate was recovered by centrifugation at 14,000 rpm for 30 minutes at 4°C in an Eppendorf benchtop refrigerated centrifuge (model 5810R), washed in 70% ethanol and resuspended in 14 μl dH2O. The DNA quality was assessed and quantified using an Agilent DNA 1000 series II assay (Agilent). In preparation for 454 sequencing, 3 μg of the cDNA sample was nebulized to a mean fragment size of 600 ± 50 bp, end-repaired and adapter-ligated according to the standard procedures described previously [7].

454 sequencing and sequence analysis

We adapted the standard procedures for 454 sequencing described previously [7]. We also followed standard post-run bioinformatics processing on the 454 platform to determine reads that passed various quality filters. After high quality sequence reads were obtained, BLAST analysis was performed as in Bainbridge et al. [2]. Sequences were first trimmed of low quality bases using trim2 (-M 10) [31] and mapped to the C. elegans transcriptome (WormBase release WS160) using wuBLAST (version 2.0, 10 May 2005) [32]. BLAST hits with a P-value of 9 × 10^-7 or less (comparable to a BLAST E-value of around 9 × 10^-13), which corresponds approximately to a 60-bp contiguous perfect match in the data set, were considered to be successful hits against the transcriptome. Sequences that did not map to the C. elegans transcriptome were then aligned with wuBLAST to the C. elegans genome (P-value of 9 × 10^-5 or less, comparable to a BLAST E-value of around 9 × 10^-11).
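The two-tier mapping rule above (transcriptome first, then genome for reads that fail the first tier) can be expressed as a small decision function. This is a sketch under stated assumptions, not the mapping algorithm used in the study: it takes the best-hit P-value from each search as input rather than running wuBLAST, and the function name, read identifiers and example P-values are all hypothetical. Only the two thresholds are taken from the text.

```python
from typing import Optional

TRANSCRIPTOME_MAX_P = 9e-7  # threshold quoted in the text
GENOME_MAX_P = 9e-5         # threshold quoted in the text


def classify_read(transcriptome_p: Optional[float],
                  genome_p: Optional[float]) -> str:
    """Assign a read to 'transcriptome', 'genome' or 'unmapped'.

    Each argument is the best-hit P-value from the corresponding search,
    or None if the search returned no hit. The genome result is consulted
    only when the transcriptome hit fails its threshold.
    """
    if transcriptome_p is not None and transcriptome_p <= TRANSCRIPTOME_MAX_P:
        return "transcriptome"
    if genome_p is not None and genome_p <= GENOME_MAX_P:
        return "genome"
    return "unmapped"


# Hypothetical reads and P-values, for illustration only.
reads = {
    "read_a": (1e-20, None),  # strong transcriptome hit
    "read_b": (3e-4, 2e-8),   # weak transcriptome hit, strong genome hit
    "read_c": (None, 1e-2),   # no acceptable hit in either search
}
for read_id, (t_p, g_p) in reads.items():
    print(read_id, classify_read(t_p, g_p))
```

Checking the transcriptome tier first mirrors the order described in the text, so a read with acceptable hits in both searches is counted against the transcriptome.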
The positions of significant hits with respect to exons, introns, intergenic regions, ESTs, SAGE tags and other DNA alignment features were determined using the Perl Ensembl API (version 35) and Ensembl database (WormBase WS160). Also, ClustalW (version 1.74) was used for cross-species multiple sequence alignments.

Table 5: Intronic 454 expressed sequence tags overlapping with RNAz non-coding RNA predictions

EST coordinate        Predicted ncRNA coordinate  Intron coordinate       Gene
III:4689667..4689728  III:4689566..4689715        III:4689399..4690021    T04A8.5
III:7160222..7160329  III:7160256..7160375        III:7159674..7160467    K04G7.10
V:18041446..18041591  V:18041383..18041481        V:18041071..18041867    Y59A8B.6
V:18041446..18041593  V:18041501..18041624        V:18041071..18041867    Y59A8B.6
V:6881222..6881352    V:6881154..6881264          V:6881209..6881399      K11C4.3
X:9091394..9091461    X:9091460..9091597          X:9090452..9092219      H08J11.2

List of abbreviations

BLAST: Basic Local Alignment Search Tool; bp: base pairs; EST: expressed sequence tag; GC: guanine-cytosine; GO: gene ontology; L1: first larval; ncRNA: non-coding RNA; PCR: polymerase chain reaction; SAGE: serial analysis of gene expression; UTR: untranslated region.

Authors' contributions

DLB, MAM and SJMJ conceived of the study. HS performed the analyses and drafted the manuscript. DGM provided mRNAs. MH prepared cDNA libraries and wrote the corresponding methods section. VM and EM carried out DNA sequencing. MNB designed the mapping algorithm.

Additional material

Additional file 1
Computationally predicted genes (WS170) partially confirmed by 454 expressed sequence tags.
[http://www.biomedcentral.com/content/supplementary/1741-7007-6-30-S1.xls]

Additional file 2
Intergenic regions with five or more unique 454 expressed sequence tags.
[http://www.biomedcentral.com/content/supplementary/1741-7007-6-30-S2.xls]

Additional file 3
List of genes with 454 expressed sequence tags mapped within their introns.
[http://www.biomedcentral.com/content/supplementary/1741-7007-6-30-S3.xls]

Additional file 4
Genes with five or more intronic 454 expressed sequence tags only. *A more recent WormBase (WS180) added new genes within the introns where these expressed sequence tags map.
[http://www.biomedcentral.com/content/supplementary/1741-7007-6-30-S4.xls]

Additional file 5
Genes with 5'-end 454 expressed sequence tags with over 90% alignment.
[http://www.biomedcentral.com/content/supplementary/1741-7007-6-30-S5.xls]

Additional file 6
List of genes with exon-intron boundary 454 expressed sequence tags.
[http://www.biomedcentral.com/content/supplementary/1741-7007-6-30-S6.xls]

Additional file 7
166 intergenic regions with both first larval stage 454 expressed sequence tags and first larval stage SAGE.
[http://www.biomedcentral.com/content/supplementary/1741-7007-6-30-S7.xls]

Acknowledgements

This work was funded by the Canadian Institutes of Health Research and Natural Sciences and Engineering Research Council. Genome British Columbia and Genome Canada provided funding for generating the SAGE data used in this study. We thank Robert Johnsen, Monica Sleumer, Rene Warren and Martin Jones for their help with editing the manuscript and useful discussions.

References

1. Kim SK, Lund J, Kiraly M, Duke K, Jiang M, Stuart JM, Eizinger A, Wylie BN, Davidson GS: A gene expression map for Caenorhabditis elegans. Science 2001, 293:2087-2092.
2. Bainbridge MN, Warren RL, Hirst M, Romanuik T, Zeng T, Go A, Delaney A, Griffith M, Hickenbotham M, Magrini V, Mardis ER, Sadar MD, Siddiqui AS, Marra MA, Jones SJM: Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics 2006, 7:246.
3. McKay SJ, Johnsen R, Khattra J, Asano J, Baillie DL, Chan S, Dube N, Fang L, Goszczynski B, Ha E, Halfnight E, Hollebakken R, Huang P, Hung K, Jensen V, Jones SJ, Kai H, Li D, Mah A, Marra M, McGhee J, Newbury R, Pouzyrev A, Riddle DL, Sonnhammer E, Tian H, Tu D, Tyson JR, Vatcher G, Warner A, Wong K, Zhao Z, Moerman DG: Gene expression profiling of cells, tissues, and developmental stages of the nematode C. elegans. Cold Spring Harb Symp Quant Biol 2003, 68:159-169.
4. Pleasance ED, Marra MA, Jones SJ: Assessment of SAGE in transcript identification. Genome Res 2003, 13:1203-1215.
5. Modrek B, Lee C: A genomic view of alternative splicing. Nat Genet 2002, 30:13-19.
6. He H, Wang J, Liu T, Liu XS, Li T, Wang Y, Qian Z, Zheng H, Zhu X, Wu T, Shi B, Geng W, Zhou W, Skogerbø G, Chen R: Mapping the C. elegans noncoding transcriptome with a whole-genome tiling microarray. Genome Res 2007, 17:1471-1477.
7. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437:376-380.
8. Torres TT, Metta M, Ottenwalder B, Schlotterer C: Gene expression profiling by massively parallel sequencing. Genome Res 2008, 18:172-177.
9. Ohtsu K, Smith MB, Emrich SJ, Borsuk LA, Zhou R, Chen T, Zhang X, Timmermans MC, Beck J, Buckner B, Janick-Buckner D, Nettleton D, Scanlon MJ, Schnable PS: Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.). Plant J 2007, 52:391-404.
10. Cheung F, Haas BJ, Goldberg SM, May GD, Xiao Y, Town CD: Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics 2006, 7:272.
11. Stricklin SL, Griffiths-Jones S, Eddy SR: C. elegans noncoding RNA genes. WormBook 2005, 25:1-7.
12. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998, 8:186-194.
13. Eddy SR: Computational analysis of RNAs. Cold Spring Harb Symp Quant Biol 2006, 71:117-128.
14. Beissbarth T, Speed TP: GOstat: find statistically overrepresented gene ontologies within a group of genes. Bioinformatics 2004, 20:1464-1465.
15. Bairoch A, Apweiler R: The SWISS-PROT protein sequence data bank and its new supplement TREMBL. Nucleic Acids Res 1996, 24:21-25.
16. Korswagen HC, Park JH, Ohshima Y, Plasterk RH: An activating mutation in a Caenorhabditis elegans Gs protein induces neural degeneration. Genes Dev 1997, 11:1493-1503.
17. Roy PJ, Stuart JM, Lund J, Kim SK: Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 2002, 418:975-979.
18. Missal K, Zhu X, Rose D, Deng W, Skogerbo G, Chen R, Stadler PF: Prediction of structured non-coding RNAs in the genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. J Exp Zoolog B Mol Dev Evol 2006, 306:379-392.
19. Ueyama T, Lekstrom K, Tsujibe S, Saito N, Leto TL: Subcellular localization and function of alternatively spliced Noxo1 isoforms. Free Radic Biol Med 2007, 42:180-190.
20. Misquitta CM, Iyer VR, Werstiuk ES, Grover AK: The role of 3'-untranslated region (3'-UTR) mediated mRNA stability in cardiovascular pathophysiology. Mol Cell Biochem 2001, 224:53-67.
21. Carter RJ, Dubchak I, Holbrook SR: A computational approach to identify genes for functional RNAs in genomic sequences. Nucleic Acids Res 2001, 29:3928-3938.
22. Rivas E, Eddy SR: Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2001, 2:8.
23. Deng W, Zhu X, Skogerbo G, Zhao Y, Fu Z, Wang Y, He H, Cai L, Sun H, Liu C, Li B, Bai B, Wang J, Jia D, Sun S, He H, Cui Y, Wang Y, Bu D, Chen R: Organization of the Caenorhabditis elegans small non-coding transcriptome: genomic features, biogenesis, and expression. Genome Res 2006, 16:20-29.
24. Washietl S, Hofacker IL, Stadler PF: Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci USA 2005, 102:2454-2459.
25. Cai X, Hagedorn CH, Cullen BR: Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 2004, 10:1957-1966.
26. Allmang C, Kufel J, Chanfreau G, Mitchell P, Petfalski E, Tollervey D: Functions of the exosome in rRNA, snoRNA and snRNA synthesis. EMBO J 1999, 18:5399-5410.
27. Beggs JD, Tollervey D: Crosstalk between RNA metabolic pathways: an RNOMICS approach. Nat Rev Mol Cell Biol 2005, 6:423-429.
28. Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A: Identification of mammalian microRNA host genes and transcription units. Genome Res 2004, 14:1902-1910.
29. Serial Analysis of Gene Expression [http://elegans.bcgsc.ca/home/sage.html]
30. Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K, Lee S, Ally A, Tam A, Sa D, Rogers S, Charest D, Stott J, Zuyderduyn S, Varhol R, Eaves C, Jones S, Holt R, Hirst M, Hoodless PA, Marra MA: Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res 2007, 17:108-116.
31. Huang X, Wang J, Aluru S, Yang SP, Hillier L: PCAP: a whole-genome assembly program. Genome Res 2003, 13:2164-2170.
32. BLAST [http://blast.wustl.edu]
