UBC Faculty Research and Publications

Identifying novel genes in C. elegans using SAGE tags
Nesbitt, Matthew J; Moerman, Donald G; Chen, Nansheng
Dec 10, 2010

Full Text

RESEARCH ARTICLE    Open Access

Identifying novel genes in C. elegans using SAGE tags

Matthew J Nesbitt1, Donald G Moerman2, Nansheng Chen1*

Abstract

Background: Despite extensive efforts devoted to predicting protein-coding genes in genome sequences, many bona fide genes have not been found and many existing gene models are not accurate in all sequenced eukaryote genomes. This situation is partly explained by the fact that gene prediction programs have been developed based on our incomplete understanding of gene feature information such as splicing and promoter characteristics. Additionally, full-length cDNAs of many genes and their isoforms are hard to obtain due to their low level or rare expression. In order to obtain full-length sequences of all protein-coding genes, alternative approaches are required.

Results: In this project, we have developed a method of reconstructing full-length cDNA sequences based on short expressed sequence tags which is called sequence tag-based amplification of cDNA ends (STACE). Expressed tags are used as anchors for retrieving full-length transcripts in two rounds of PCR amplification. We have demonstrated the application of STACE in reconstructing full-length cDNA sequences using expressed tags mined in an array of serial analysis of gene expression (SAGE) of C. elegans cDNA libraries. We have successfully applied STACE to recover sequence information for 12 genes, for two of which we found isoforms.
STACE was used to successfully recover full-length cDNA sequences for seven of these genes.

Conclusions: The STACE method can be used to effectively reconstruct full-length cDNA sequences of genes that are under-represented in cDNA sequencing projects and have been missed by existing gene prediction methods, but their existence has been suggested by short sequence tags such as SAGE tags.

Background

The nematode Caenorhabditis elegans, which is a well-established model organism for biomedical research [1], is the first metazoan whose genome was subject to whole-genome sequencing [2]. Its gene models were first predicted using the gene prediction program Genefinder (P. Green, unpublished). Over the dozen years since the completion of the C. elegans genome sequencing project [2], the C. elegans gene set has been curated by the C. elegans research community and by WormBase curators [1,3-5]. However, the C. elegans gene set is still far from complete for the following reasons: First, because Genefinder, like other gene prediction programs, was developed based on an incomplete understanding of gene structures, it suffers from both false positive and false negative predictions; second, many bona fide genes, especially those of unknown character, have been missed. In WormBase (http://www.wormbase.org), the official database for the biology and genomics of C. elegans, less than 40% of the annotated gene models are fully confirmed. All others are either partially supported or not supported at all. Additional gene models have been revealed in transcriptome sequencing [6,7], suggesting many gene models remain to be discovered. This situation is also true for other species [8].
In the human genome, it has been estimated that the most accurate programs only correctly predict 40% of the annotated genes [9].

* Correspondence: chenn@sfu.ca
1 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
Full list of author information is available at the end of the article
Nesbitt et al. BMC Molecular Biology 2010, 11:96
http://www.biomedcentral.com/1471-2199/11/96
© 2010 Nesbitt et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this project, we explored how to reconstruct full-length gene models for genes that are not correctly represented in the current gene set, using expressed sequence tags obtained in large-scale gene expression projects. In particular, we attempted to reconstruct novel C. elegans gene models using SAGE (serial analysis of gene expression). The SAGE technique was originally developed for profiling gene expression [10,11]. The expression profiles created with SAGE have a wide range of applications that include therapeutic target identification in cancerous tissues [12] and others of biological and medical importance [13]. Recently, SAGE was applied to probe gene expression in C. elegans by the C. elegans Gene Expression Consortium (http://elegans.bcgsc.bc.ca/home/ge_consortium.html). These SAGE libraries have been fundamental for the success of a variety of research projects [14-19]. While SAGE tags that correspond to existing gene models can be used to evaluate the abundance of gene expression, there are a large number of SAGE tags that do not correspond to existing gene models.
These SAGE tags suggest the existence of additional coding exons, splice variants [20], or novel genes.

Results

Tag based reconstruction of full-length cDNA sequence of novel genes

Expressed sequence tags that cannot be aligned to the C. elegans virtual transcriptome (i.e., cDNA sequences of all annotated transcripts) suggest the existence of yet unannotated genes [13,21]. We have established a protocol, termed “sequence tag-based amplification of cDNA ends”, or STACE, based on the RACE protocol [22], to identify potential novel genes. The method can be used to amplify full-length cDNA transcripts that have been reverse-transcribed from the mRNA sequence of novel genes. STACE uses three primer hybridization sites. The first site (the 5’ site) is a sequence located at the extreme 5’ end of the target transcript, the second site (the 3’ site) is downstream of the polyadenylation sequence, and the third site (the gene-specific site) corresponds to the genomic span where the uncharacterized tag maps. The amplicons are then cloned, sequenced and mapped to the genome. As such, STACE not only confirms the existence of a novel gene, but also defines the full-length transcript sequence of the yet undefined gene.

In this project, in order to get a primer hybridization site at the extreme 5’ end of the RNA transcripts, we took advantage of the trans-splice leader 1 (SL1) in C. elegans, and used its sequence as a primer for our 5’ site. It is appropriate to design the 5’ primer based on the SL1 sequence because SL1 is trans-spliced to the extreme 5’ end of nearly 50% of all C. elegans mRNAs [23,24]. For applications in which the sample transcriptome does not undergo trans-splicing of this nature, a common oligo anchoring sequence can be ligated to the 5’ end of each transcript. An oligo sequence was attached to the polyadenylation tracts of mRNA through reverse transcription with a modified oligo d(T) primer that included a 3’ common sequence (5’ - CCAGACACTATGCTCATACGACGCAGT(16)VN - 3’).
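The primer notation in the preceding sentence can be unpacked with a short sketch. It assumes that T(16) denotes a run of sixteen thymines and that V and N are the standard IUPAC degenerate codes (V = A, C or G; N = any base). The names `COMMON`, `OLIGO_DT` and `expand_degenerate` are illustrative and are not part of the published protocol.

```python
from itertools import product

# IUPAC nucleotide codes needed for this primer (standard definitions).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}

# 3' common sequence from the text, written out under the ASSUMPTION
# that "T(16)" means sixteen consecutive T residues.
COMMON = "CCAGACACTATGCTCATACGACGCAG"
OLIGO_DT = COMMON + "T" * 16 + "VN"


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*pools)]


variants = expand_degenerate(OLIGO_DT)
print(len(variants))  # 3 choices for V x 4 choices for N = 12
print(variants[0])
```

In anchored oligo d(T) designs of this kind, the two degenerate 3’ bases are generally intended to position the primer at the junction between the poly(A) tail and the transcript body rather than anywhere within the tail.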
This provided us with a cDNA library containing transcripts that had a usable 3’ site. Finally, we chose gene-specific sites by bioinformatically identifying SAGE tags. When aligned to the C. elegans genome, qualified SAGE tags do not overlap with existing gene models. For each qualified SAGE tag, a primer was designed and used in conjunction with a primer complementary to the SL1 sequence to amplify the upstream amplicon. A second primer was designed and used in conjunction with the primer complementary to the 3’ common sequence (above) to amplify the downstream amplicon. The potential template was amplified, and the amplicon sequences were mapped to the C. elegans genome using BLAT [25], which is available at WormBase (http://www.wormbase.org).

Computational selection of SAGE tags that suggest novel genes

SAGE tags used in this study were selected from 33 SAGE libraries, which were sequenced from different tissues and developmental stages of C. elegans (http://tock.bcgsc.bc.ca/cgi-bin/sage160). There are altogether 220,770 unique SAGE tags in these libraries. Only SAGE tags that did not overlap with annotated protein-coding genes in the WS160 version of the C. elegans genome map were selected for this project.

We obtained four different sets of SAGE tags for testing, one preliminary set and three test sets (Set 1-3) (Table 1). The preliminary set, which was arbitrarily chosen, was used to test the STACE protocol. Set 1 used a longSAGE meta-library as a starting tag set (16,587 SAGE tags). Set 2 was created from the WS160 version of the mixed stage library (14,701 SAGE tags), and Set 3 used SAGE libraries derived from Solexa sequencing of the SWN21 and SWN22 embryonic samples (359,457 SAGE tags). Solexa SAGE produced more initial SAGE tags than the previous SAGE libraries because it has much deeper coverage.
Note that the Illumina Solexa Genome Analyzer produced a SAGE library that is about 20 times more sensitive than a normal SAGE library [26].

SAGE libraries were filtered to select SAGE tags for finding novel genes (Figure 1). The criteria used included the following:
(1) Only SAGE tags that can be aligned to the C. elegans genome were selected;
(2) The SAGE tags must not overlap with any annotated coding exon;
(3) To avoid tags containing sequencing errors, SAGE tags must have a frequency of at least three for every 100,000 reads;
(4) To increase the chances of finding novel genes (rather than novel missing exons), the SAGE tags must not overlap with an intron and have to be at least 500 bp away from an annotated 5’ or 3’ gene boundary;
(5) SAGE tags must have a GC content between 35% and 45%, which is critical for primer design.

Primers based on SAGE tags were designed to ensure a reduced possibility of formation of secondary structures which would inhibit proper annealing of the primers [27]. For many cases, we trimmed sequences from either end of the SAGE tags to ensure primer quality. SAGE tag sequences that could not be used to guide proper primer design were not used. Primer design was done using the Primer3 program [28].

cDNA libraries

Two different cDNA libraries were created; one from a mixed stage population of C. elegans and another one from embryonic animals.
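The tag selection criteria (1) to (5) described under "Computational selection of SAGE tags that suggest novel genes" can be sketched as a single predicate. This is a minimal illustration and not the authors' pipeline: the `TagRecord` fields are hypothetical and assume that genome alignment and annotation overlap have already been computed by other tools, and the example values are made up.

```python
from dataclasses import dataclass


@dataclass
class TagRecord:
    """Hypothetical summary of one SAGE tag after genome alignment."""
    sequence: str
    count: int                   # times the tag was observed
    total_reads: int             # reads sequenced in that library
    maps_to_genome: bool         # criterion 1
    overlaps_coding_exon: bool   # criterion 2
    overlaps_intron: bool        # criterion 4
    bp_to_gene_boundary: int     # criterion 4, distance to nearest 5'/3' boundary


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_candidate(tag: TagRecord) -> bool:
    """Apply the five criteria with the thresholds quoted in the text."""
    per_100k = tag.count * 100_000 / tag.total_reads   # criterion 3
    return (
        tag.maps_to_genome
        and not tag.overlaps_coding_exon
        and per_100k >= 3
        and not tag.overlaps_intron
        and tag.bp_to_gene_boundary >= 500
        and 0.35 <= gc_fraction(tag.sequence) <= 0.45   # criterion 5
    )


# Sample input only: a 20-nt string with 8 G/C bases and invented counts.
example = TagRecord("GAGAGAATTGTTGTGACCAT", 5, 100_000, True, False, False, 1200)
print(is_candidate(example))
```

Keeping each criterion as a separate boolean term makes it straightforward to count how many tags survive each step, in the manner of Table 1.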
In order to maximize the number of successful experiments, candidate SAGE tags were only screened against the developmental library that corresponded with the time in development at which the tags were originally observed.

New transcripts and novel cDNAs

STACE-identified candidates fall into three categories based on their alignments to the C. elegans genome (release WS160): (1) novel gene (six candidates), (2) annotation extension (four candidates), and (3) non-protein-coding gene overlap (two candidates) (Figure 2; Table 2). Candidates in the novel gene category represent entirely new genes discovered using the STACE method. The exons of these six new genes are all bordered by the canonical GT-AG splice signals, as is the case with most exons [29,30]. Our identification of six novel genes was based on the WS160 annotated genome. In the interim, four have been annotated in WS200, while the other two remain completely novel. One of these two new gene models was characterized as a full-length gene model with the STACE method, while the other’s existence was implied by an upstream amplicon sequence. Four tested SAGE tag primers produced results that suggest an extension to the annotated length of the gene models. These annotation extensions align perfectly with annotated exons, and imply either that additional exons are transcribed within the gene, or that the terminal exons are longer than shown by WormBase.

One successful STACE result overlapped with a pseudogene. While this transcript may not be translated, using STACE we have clearly shown that it is processed, with introns removed and a polyadenylation tract added to the 3’ end. We have also found that a STACE result overlapped with an annotated ncRNA gene.
The transcript was also processed, with a previously unknown intron excised and a polyadenylation tract added.

Table 1 SAGE tag numbers for each set through the identification of high value candidate SAGE tags
Set | Total tags | Mappable tags | Non-transcriptome tags | Tags with frequency count >3 | Tags absent from gene boundaries and introns | Tags with appropriate GC content | Tags that can serve as primers | SAGE tag primers tested
Set 1 | 16,587 | 13,743 | 3,052 | 616 | 418 | 128 | 39 | 30
Set 2 | 14,701 | 10,534 | 4,755 | 365 | 41* | 19 | 12 | 12
Set 3 | 359,457 | 32,416 | 13,542 | 8,211 | 469 | 124 | 106 | 96

Figure 1 SAGE tag primer development. SAGE tag pools are put through various filtrations to produce a source of SAGE tag primers that are used in STACE experiments. 1) SAGE tag libraries are downloaded from the MultiSAGE website. 2) All SAGE tags that do not fully map to the genome as one single uninterrupted alignment are discarded. 3) SAGE tags that map to annotated transcribed sequences are discarded. 4) SAGE tags that have a low level of expression and are possible errors in sequencing are discarded. 5) SAGE tags that are found to overlap with intronic sequences or map near a 5’ or 3’ boundary are eliminated. 6) Only tags with GC content between 35% and 45% are retained. 7) SAGE tags are edited into primer form. All SAGE tags that are likely to produce secondary structure (i.e. hairpins, homo-dimers, hetero-dimers) are discarded. 8) A final list of SAGE tag primers is procured.

Altogether, we have reconstructed seven full-length, true positive cDNA sequences, corresponding to seven separate gene models (Table 3). All seven cDNAs contain SL1 signals at the 5’ ends and polyadenylation at the 3’ ends. The remaining seven true positive cDNAs recovered represent the 5’ ends of separate gene models, and these too contain full-length 5’ SL1 signals. Thus, in this study, we have identified 14 SL1-trans-spliced cDNA sequences.
All 14 cDNA sequences have been submitted to GenBank (Table 3).

Discussion

In this project, we have developed an experimental method termed STACE for reconstructing full-length cDNAs of novel genes. The applicability of STACE has been demonstrated by defining novel genes in the well-curated C. elegans genome, using SAGE tags from gene expression studies. We reconstructed seven novel full-length cDNAs and seven partial cDNA sequences that can be merged to existing gene models. Novel genes, annotation extensions, and non-protein-coding gene overlaps are represented by the identified cDNA sequences 3.3 (Figure 3), 3.1 (Figure 4), and P.3 (Figure 5), respectively.

Figure 2 STACE experimental procedure. Template cDNA is used in two separate PCR reactions that utilize primers based on the sequence of a SAGE tag, the SL1 sequence and modified oligo d(T) primer. The PCRs produce an amplicon representative of the sequence upstream of the SAGE tag location (PCR product 1), and another that comes from the downstream sequence (PCR product 2). These products are sequenced, and mapped to the C. elegans genome. Those sequences whose alignments overlap with the SAGE tag used in primer design are considered true positive STACE results.

We compared novel cDNAs with C. elegans gene models predicted using AUGUSTUS [31], mGENE [32], TWINSCAN [33] and FGENESH++ [34], which are available at WormBase. All cDNAs detected using STACE overlap to some extent with predicted gene models when aligned to the C. elegans genome. The novel full-length cDNA 3.3 aligned well with a prediction from TWINSCAN and with a prediction made by FGENESH++. The annotation extension result (full-length cDNA 3.1) was found to overlap with gene predictions from each of the utilized programs. However, a new 3’ UTR exon was shown to be part of this gene model, and this exon did not overlap with the predictions made by any of the described programs. Additionally, the P.3 result overlapped with an existing ncRNA gene model. However, the novel intron suggested by this STACE result was not included in the WormBase gene model, although it overlaps with the AUGUSTUS prediction.

Table 2 Result classifications for all sets of tested SAGE tag primers
Set | Novel Genes | Annotation Extensions | Non-Protein Gene Overlap | Number of Candidate cDNAs/Number of SAGE tag Primers Tested
Preliminary Set | 1 | 0 | 2 | 3/6 (50%)
Set 1 | 3 | 1 | 0 | 4/30 (13%)
Set 2 | 0 | 2 | 0 | 2/12 (17%)
Set 3 | 2 | 1 | 0 | 3/96 (3%)
Total | 6 | 4 | 2 | 12/144 (8%)

Table 3 Identified cDNA sequences from Set 3 STACE experiments
Result | SAGE tag primer | SAGE tag location | Sequence 5’ mapping boundary | Sequence 3’ mapping boundary | GenBank accession number | Status (as of WS200)
Full-length cDNA P.1 | GTTAGGATCGTAGAGGACATG | II:8786920 | II:8786297 | II:8787044 | HQ451870 | Overlaps F07H5.4 (pseudogene): evidence for extension to annotated exon
Partial cDNA P.2 | AGAGGATTAATTCCCCCCATG | II:9375813 | II:9376228 | II:9375792 | HQ451877 | Overlaps with C06C3.10
Full-length cDNA P.3 | GGGGGAAAATCGAAAGACATG | II:10201160 | II:10202155 | II:10201084 | HQ451871 | Overlaps with tts-2 (ncRNA): evidence for new intron
Partial cDNA 1.1 | GAAACGAAGAAGAAAAGCATG | V:19434698 | V:19434352 | V:19434718 | HQ451878 | Evidence of a novel gene
Full-length cDNA 1.2 | TTCGACGGCAGATTGTTCATG | V:19432707 | V:19433037 | V:19432406 | HQ451872 | Overlaps with C25F9.11: evidence for new 5’ UTR
Full-length cDNA 1.3 | TAGCTCAGTCAAAACAACATG | V:5812559 | V:5813070 | V:5812296 | HQ451873 | Overlaps with ZC250.4: evidence for extension to 3’ UTR
Partial cDNA 1.4a | AAAGTTGAGCTTCTGCTCATG | X:2346863 | X:2335678 | X:2346883 | HQ451879 | Overlaps with T01B6.1: evidence for new coding sequence
Partial cDNA 1.4b | AAAGTTGAGCTTCTGCTCATG | X:2346863 | X:2345479 | X:2346883 | HQ451880 | Overlaps with T01B6.1: evidence for new transcriptional start site
Partial cDNA 2.1a | TGGTTGTTAGTAGTGTACATG | II:15229391 | II:15207408 | II:15229412 | HQ451881 | Overlaps with Y46E12BL.4: evidence for new 3’ UTR exon
Partial cDNA 2.1b | TGGTTGTTAGTAGTGTACATG | II:15229391 | II:15216289 | II:15229412 | HQ451882 | Overlaps with Y46E12BL.4: evidence for new initial coding exon
Full-length cDNA 2.2 | CCATCTAAAGGGCTCTACA | IV:4415359 | IV:44085996 | IV:4415616 | HQ451874 | Overlaps with Y24D9A.1: evidence for extension to 3’ UTR
Full-length cDNA 3.1 | CTCATTGAAGGTGAAGCAT | X:14690913 | X:14692920 | X:14690763 | HQ451875 | Overlaps with sox-3: evidence for new 3’ UTR
Partial cDNA 3.2 | TGAAATGTCACAGTACACAT | III:7604002 | III:7601399 | III:7604022 | HQ451883 | Evidence of a novel gene
Full-length cDNA 3.3 | GAGAGAATTGTTGTGACCAT | X:4681136 | X:4682689 | X:4680952 | HQ451876 | Evidence of a novel gene
All data is current as of WS200.

Figure 3 Candidate cDNA 3.3 alignment. Full length gene model reconstructed for candidate cDNA 3.3. This gene model suggests a completely novel gene that is missing from WormBase as of WS200.

Figure 4 Candidate cDNA 3.1 alignment. Full length gene model reconstructed for candidate cDNA 3.1. This gene model suggests a 3’ UTR extension to the current sox-3 gene model.

Conclusions

We have found that the STACE method can be used to recover accurate full-length gene models. This method is useful for reconstructing gene models for genes that have been missed in cDNA sequencing projects and were missed or mispredicted by gene finders.
With the wide application of next-generation sequencing methods in the deep sequencing of transcriptomes, more expressed sequence tags that indicate the presence of novel genes will be uncovered. We expect that these tags will serve as input to the STACE protocol for further novel gene discovery and determination.

Figure 5 Candidate cDNA P.3 alignment. Full length gene model reconstructed for candidate cDNA P.3. This gene model suggests revision of the structure of the tts-2 gene model, and also that this transcript is polyadenylated, a feature that is commonly associated with protein-coding genes.

Methods

cDNA library production

Two samples of C. elegans were produced: a mixed stage population and an embryonic sample. Tissue samples were put through an RNA extraction using TRIzol (Invitrogen, SKU# 10296-028). The cDNA libraries used in this project were created with the Superscript III reverse transcriptase kit (Invitrogen, SKU# 18080-085), and the primer used to initiate reverse transcription was a modified oligo d(T) primer (5’ - CCAGACACTATGCTCATACGACGCAGT(16)VN - 3’) (Invitrogen). The protocol accompanying the kit was followed, and the samples were treated with Ribonuclease H (Invitrogen, SKU# 18021-014).

Amplification of tag ends

The reverse complement of each SAGE tag sequence was used to design the SAGE tag primers. These primers were used in conjunction with a primer based on the SL1 sequence (5’ - GGTTTAATTACCCAAGTTTGAG - 3’) in a PCR. The PCR was initiated with a 94°C melt step for 2 minutes, followed by 32 cycles of a 94°C melt step for 15 seconds, a 60°C annealing step for 45 seconds, and a 72°C extension step for 1 minute. This was followed by a final extension at 72°C for 5 minutes. A Taq polymerase provided by Dr. Harald Hutter was used in all of the PCRs.
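The two computational details in this subsection, taking the reverse complement of a tag and the timing of the cycling program, can be sketched as follows. The tag shown is made up for illustration and does not come from the study; the cycling values are those quoted in the text, and the total ignores ramping between temperatures.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Cycling program as described in the text: (temperature in deg C, seconds).
INITIAL_MELT = (94, 120)
CYCLE = [(94, 15), (60, 45), (72, 60)]
N_CYCLES = 32
FINAL_EXTENSION = (72, 300)


def program_seconds() -> int:
    """Total hold time of the program, excluding temperature ramps."""
    per_cycle = sum(seconds for _, seconds in CYCLE)
    return INITIAL_MELT[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


tag = "CATGAAGCTT"  # hypothetical tag, not from the study
print(reverse_complement(tag))   # AAGCTTCATG
print(program_seconds() / 60)    # 71.0 minutes of hold time
```

Any trimming of the resulting sequence for primer quality, as described in the Results, would still be handled by a primer design tool such as Primer3 and is not modelled here.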
Amplicons produced by the PCRs were visualized by electrophoresis on a 1% gel, and extracted with a QIAquick Gel Extraction kit (Qiagen, ID 28704). These amplicons were then cloned with the InsTAclone kit (Fermentas, #K1214). Cloned amplicons were submitted for sequencing (Macrogen, Seoul, Korea), and returned sequences were mapped back to the C. elegans genome with the BLAT tool [25] on the WormBase website (http://www.wormbase.org/). We opted to use BLAT instead of other alignment tools because this program can take spliced mRNA sequences (i.e. STACE cloned sequences) and align them to the genome in a way that reflects intron-exon boundaries [25,35]. Those amplicons whose sequence alignment indicated a true positive result were then further studied. The returned sequence was used to design an internal primer that would be compatible with the universal primer (5’ - CACTATGCTCATACGACGCAGT - 3’). These primers were then used in a PCR with the same parameters described above to produce the downstream amplicons needed for full-length characterization. Internal primers were designed using the Primer3 program [28].

Acknowledgements

We thank Drs. Harald Hutter, David Baillie and Robert Johnsen for their advice, technical assistance, and reagents. We also thank Dr. Johnsen for proofreading the manuscript. We thank members of the Chen, Hutter, and Baillie laboratories for their technical assistance. Lindsay McGhee helped with generating figures. This project is supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) to NC. NC is also a Michael Smith Foundation for Health Research (MSFHR) Scholar and a Canadian Institutes of Health Research (CIHR) New Investigator.

Author details

1 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada.
2 Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada.

Authors’ contributions

NC and DGM conceived of the study.
MJN conducted the experiments. MJN and NC wrote the manuscript with input from DGM. All authors have read and approved the final manuscript.

Received: 20 April 2010 Accepted: 10 December 2010 Published: 10 December 2010

References

1. Hillier LW, Coulson A, Murray JI, Bao Z, Sulston JE, Waterston RH: Genomics in C. elegans: so many genes, such a little worm. Genome Res 2005, 15:1651-1660.
2. C. elegans Sequencing Consortium: Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 1998, 282:2012-2018.
3. Chen N, Harris TW, Antoshechkin I, Bastiani C, Bieri T, Blasiar D, Bradnam K, Canaran P, Chan J, Chen CK, et al: WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res 2005, 33:D383-389.
4. Waterston R, Martin C, Craxton M, Huynh C, Coulson A, Hillier L, Durbin R, Green P, Shownkeen R, Halloran N, et al: A survey of expressed genes in Caenorhabditis elegans. Nat Genet 1992, 1:114-123.
5. Reboul J, Vaglio P, Rual JF, Lamesch P, Martinez M, Armstrong CM, Li S, Jacotot L, Bertin N, Janky R, et al: C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nat Genet 2003, 34:35-41.
6. Hillier LW, Reinke V, Green P, Hirst M, Marra MA, Waterston RH: Massively parallel sequencing of the polyadenylated transcriptome of C. elegans. Genome Res 2009, 19:657-666.
7. Shin H, Hirst M, Bainbridge MN, Magrini V, Mardis E, Moerman DG, Marra MA, Baillie DL, Jones SJ: Transcriptome analysis for Caenorhabditis elegans based on novel expressed sequence tags. BMC Biol 2008, 6:30.
8. Brent MR: Genome annotation past, present, and future: how to define an ORF at each locus. Genome Res 2005, 15:1777-1786.
9. Guigo R, Flicek P, Abril JF, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic VB, Birney E, et al: EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol 2006, 7(Suppl 1):S2 1-31.
10. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene expression. Science 1995, 270:484-487.
11. Gnatenko DV, Dunn JJ, McCorkle SR, Weissmann D, Perrotta PL, Bahou WF: Transcript profiling of human platelets using microarray and serial analysis of gene expression. Blood 2003, 101:2285-2293.
12. Porter D, Yao J, Polyak K: SAGE and related approaches for cancer target identification. Drug Discov Today 2006, 11:110-118.
13. Wang SM: Understanding SAGE data. Trends Genet 2007, 23:42-50.
14. Pleasance ED, Marra MA, Jones SJ: Assessment of SAGE in transcript identification. Genome Res 2003, 13:1203-1215.
15. Blacque OE, Perens EA, Boroevich KA, Inglis PN, Li C, Warner A, Khattra J, Holt RA, Ou G, Mah AK, et al: Functional genomics of the cilium, a sensory organelle. Curr Biol 2005, 15:935-941.
16. Jones SJ, Riddle DL, Pouzyrev AT, Velculescu VE, Hillier L, Eddy SR, Stricklin SL, Baillie DL, Waterston R, Marra MA: Changes in gene expression associated with developmental arrest and longevity in Caenorhabditis elegans. Genome Res 2001, 11:1346-1352.
17. McGhee JD, Fukushige T, Krause MW, Minnema SE, Goszczynski B, Gaudet J, Kohara Y, Bossinger O, Zhao Y, Khattra J, et al: ELT-2 is the predominant transcription factor controlling differentiation and function of the C. elegans intestine, from embryo to adult. Dev Biol 2009, 327:551-565.
18. McGhee JD, Sleumer MC, Bilenky M, Wong K, McKay SJ, Goszczynski B, Tian H, Krich ND, Khattra J, Holt RA, et al: The ELT-2 GATA-factor and the global regulation of transcription in the C. elegans intestine. Dev Biol 2007, 302:627-645.
19. Wang X, Zhao Y, Wong K, Ehlers P, Kohara Y, Jones SJ, Marra MA, Holt RA, Moerman DG, Hansen D: Identification of genes expressed in the hermaphrodite germ line of C. elegans using SAGE. BMC Genomics 2009, 10:213.
20. Ruzanov P, Jones SJ, Riddle DL: Discovery of novel alternatively spliced C. elegans transcripts by computational analysis of SAGE data. BMC Genomics 2007, 8:447.
21. Chen J, Sun M, Lee S, Zhou G, Rowley JD, Wang SM: Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci USA 2002, 99:12257-12262.
22. Schaefer BC: Revolutions in rapid amplification of cDNA ends: new strategies for polymerase chain reaction cloning of full-length cDNA ends. Anal Biochem 1995, 227:255-273.
23. Zorio DA, Cheng NN, Blumenthal T, Spieth J: Operons as a common form of chromosomal organization in C. elegans. Nature 1994, 372:270-272.
24. Blumenthal T: Trans-splicing and operons. WormBook 2005, 1-9.
25. Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res 2002, 12:656-664.
26. Bonetta L: Gene expression: an expression of interest. Nature 2006, 440:1233-1237.
27. Gamper HB, Cimino GD, Hearst JE: Solution hybridization of crosslinkable DNA oligonucleotides to bacteriophage M13 DNA. Effect of secondary structure on hybridization kinetics and equilibria. J Mol Biol 1987, 197:349-362.
28. Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 2000, 132:365-386.
29. Breathnach R, Benoist C, O’Hare K, Gannon F, Chambon P: Ovalbumin gene: evidence for a leader sequence in mRNA and DNA sequences at the exon-intron boundaries. Proc Natl Acad Sci USA 1978, 75:4853-4857.
30. Breathnach R, Chambon P: Organization and expression of eucaryotic split genes coding for proteins. Annu Rev Biochem 1981, 50:349-383.
31. Stanke M, Waack S: Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 2003, 19(Suppl 2):ii215-225.
32. Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, Philips P, De Bona F, Hartmann L, Bohlen A, et al: mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res 2009, 19:2133-2143.
33. Korf I, Flicek P, Duan D, Brent MR: Integrating genomic homology into gene structure prediction. Bioinformatics 2001, 17(Suppl 1):S140-148.
34. Solovyev V, Kosarev P, Seledsov I, Vorobyev D: Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol 2006, 7(Suppl 1):S10 11-12.
35. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, et al: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res 2006, 34:D590-598.

doi:10.1186/1471-2199-11-96
Cite this article as: Nesbitt et al.: Identifying novel genes in C. elegans using SAGE tags. BMC Molecular Biology 2010 11:96.

