UBC Faculty Research and Publications

A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished… Ralph, Steven G; Jung E Chun, Hye; Kolosova, Natalia; Cooper, Dawn; Oddy, Claire; Ritland, Carol E; Kirkpatrick, Robert; Moore, Richard; Barber, Sarah; Holt, Robert A; Jones, Steven J; Marra, Marco A; Douglas, Carl J; Ritland, Kermit; Bohlmann, Jörg Oct 14, 2008

Full Text

BMC Genomics (BioMed Central)
Open Access
Research article

A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis)

Steven G Ralph (1,5), Hye Jung E Chun (2), Natalia Kolosova (1,3), Dawn Cooper (1), Claire Oddy (1), Carol E Ritland (4), Robert Kirkpatrick (2), Richard Moore (2), Sarah Barber (2), Robert A Holt (2), Steven JM Jones (2), Marco A Marra (2), Carl J Douglas (3), Kermit Ritland (4) and Jörg Bohlmann* (1)

Address:
(1) Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
(2) British Columbia Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, V5Z 4E6, Canada
(3) Department of Botany, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
(4) Department of Forest Sciences, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
(5) Department of Biology, University of North Dakota, Grand Forks, ND, 58202-9019, USA

Email:
Steven G Ralph - steven.ralph@und.nodak.edu
Hye Jung E Chun - echun@bcgsc.ca
Natalia Kolosova - kolosova@interchange.ubc.ca
Dawn Cooper - dmcooper@sfu.ca
Claire Oddy - coddy@interchange.ubc.ca
Carol E Ritland - critland@interchange.ubc.ca
Robert Kirkpatrick - robertk@bcgsc.ca
Richard Moore - rmoore@bcgsc.ca
Sarah Barber - sbarber@bcgsc.ca
Robert A Holt - rholt@bcgsc.ca
Steven JM Jones - sjones@bcgsc.ca
Marco A Marra - mmarra@bcgsc.ca
Carl J Douglas - cdouglas@interchange.ubc.ca
Kermit Ritland - kritland@interchange.ubc.ca
Jörg Bohlmann* - bohlmann@interchange.ubc.ca

* Corresponding author

Abstract

Background: Members of the pine family (Pinaceae), especially species of spruce (Picea spp.) and pine (Pinus spp.), dominate many of the world's temperate and boreal forests. These conifer forests are of critical importance for global ecosystem stability and biodiversity.
They also provide the majority of the world's wood and fiber supply and serve as a renewable resource for other industrial biomaterials. In contrast to angiosperms, functional and comparative genomics research on conifers, or other gymnosperms, is limited by the lack of a relevant reference genome sequence. Sequence-finished full-length (FL)cDNAs and large collections of expressed sequence tags (ESTs) are essential for gene discovery, functional genomics, and for future efforts of conifer genome annotation.

Results: As part of a conifer genomics program to characterize defense against insects and adaptation to local environments, and to discover genes for the production of biomaterials, we developed 20 standard, normalized or full-length enriched cDNA libraries from Sitka spruce (P. sitchensis), white spruce (P. glauca), and interior spruce (P. glauca-engelmannii complex). We sequenced and analyzed 206,875 3'- or 5'-end ESTs from these libraries, and developed a resource of 6,464 high-quality sequence-finished FLcDNAs from Sitka spruce. Clustering and assembly of 147,146 3'-end ESTs resulted in 19,941 contigs and 26,804 singletons, representing 46,745 putative unique transcripts (PUTs). The 6,464 FLcDNAs were all obtained from a single Sitka spruce genotype and represent 5,718 PUTs.

Conclusion: This paper provides detailed annotation and quality assessment of a large EST and FLcDNA resource for spruce. The 6,464 Sitka spruce FLcDNAs represent the third largest sequence-verified FLcDNA resource for any plant species, behind only rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), and the only substantial FLcDNA resource for a gymnosperm. Our emphasis on capturing FLcDNAs and ESTs from cDNA libraries representing herbivore-, wound- or elicitor-treated induced spruce tissues, along with incorporating normalization to capture rare transcripts, resulted in a rich resource for functional genomics and proteomics studies. Sequence comparisons against five plant genomes and the non-redundant GenBank protein database revealed that a substantial number of spruce transcripts have no obvious similarity to known angiosperm gene sequences. Opportunities for future applications of the sequence and clone resources for comparative and functional genomics are discussed.

Published: 14 October 2008
BMC Genomics 2008, 9:484 doi:10.1186/1471-2164-9-484
Received: 10 June 2008
Accepted: 14 October 2008
This article is available from: http://www.biomedcentral.com/1471-2164/9/484
© 2008 Ralph et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Conifers (members of the pine family) have very large genomes (10 to 40 Gb, [1]), and this poses difficulties for both structural and functional genomic studies. In addition, their generation times are long and their habitual out-breeding nature prevents the development of inbred strains useful for genetics research. A further difficulty in conifer genomics is the large evolutionary distance between conifers and angiosperms (i.e., flowering plants), separated by 300 million years of evolution [2], which severely restricts gene comparisons of conifers with angiosperms.
While there are several completely sequenced angiosperm genomes, as well as high-quality sequence-finished full-length (FL)cDNA resources, for Arabidopsis [3,4], rice [5-7], poplar (Populus trichocarpa; [8,9]), grapevine (Vitis vinifera; [10]), and a moss (Physcomitrella patens; [11]), these basic genomics resources have not yet been developed for the conifer phyla or for any other gymnosperm.

In species with large genomes, a critical first step for genome characterization is to survey the expressed genes. A common approach to characterize the expressed genome is to sequence cDNA libraries and to assemble large collections of expressed sequence tags (ESTs) [12]. In the absence of a conifer genome sequence, large and deep EST collections are particularly useful. Sequencing of cDNA libraries constructed from diverse tissues and developmental stages, and from materials subjected to diverse environmental conditions or treatments, enhances the diversity of genes captured in EST populations. In addition, normalization techniques reduce the frequency of highly expressed genes and increase the rate of rare gene discovery [13,14], thus providing more comprehensive coverage of the expressed genome.

In conifers, gene discovery via EST sequencing was first conducted in loblolly pine (Pinus taeda; [15]), the most economically important tree species in the southeastern USA. The early emphasis in loblolly pine was on wood forming tissues [16], but newer projects have involved treatments such as drought stress [17] and embryogenesis [18]. As of May 2008, the loblolly pine EST collection contains more than 328,000 sequences [19]. Recent EST projects with species of spruce have used tissues related to shoot growth and xylem development in white spruce [20,21], wound treatment in interior spruce [21], root development in Sitka spruce [21], and xylem development and bud burst in Norway spruce (P. abies; [22,23]). EST resources have also been developed for a few other gymnosperm species outside of the pine family, such as cycas (Cycas rumphii; [24]), ginkgo (Ginkgo biloba; [25]), Japanese yew (Taxus cuspidata; [26]), Japanese cedar (Cryptomeria japonica; [27,28]) and Hinoki cypress (Chamaecyparis obtusa; [28]).

In addition to deep EST sampling, other important components of a cDNA sequence resource are the quality and length of sequence coverage for a given gene. Ideally, FLcDNA clones that capture the entire mature transcript of a gene should be identified and completely sequenced with high accuracy. FLcDNA sequences should span not only the protein-coding open reading frame (ORF) region but also the non-coding 5' and 3' untranslated regions (UTRs). Most importantly, true FLcDNA sequences should be derived from a single individual FLcDNA clone. Using individual clones prevents the assembly of chimeric FLcDNA sequences consisting of ESTs from multiple cDNA clones representing closely related genes. Furthermore, allelic nucleotide polymorphisms and alternatively spliced variants of a gene are difficult to detect using in silico assembled sequence contigs from multiple clones. To further discriminate among closely related genes, the authenticity of sequences should be verified by re-sequencing of the same clone (sequence verification).

Compared to single-pass ESTs or in silico assembled sequence contigs originating from multiple clones, sequence-verified FLcDNA clones offer several advantages for comparative, structural, and functional genome analyses, in particular for conifers with their great evolutionary distance from angiosperms. First, the complete protein-coding regions of FLcDNAs can be unambiguously identified.
An accurate prediction of full-length protein sequences aids in the correct identification of distant angiosperm homologues. Second, in anticipation of a future conifer genome sequence, FLcDNAs can be used to improve gene prediction from genomic sequences as demonstrated in Arabidopsis [29-31] and poplar [8,9]. Third, FLcDNA clones can be used for functional characterization of conifer genes using biochemical approaches [e.g., [32,33]] or for functional complementation of mutants in heterologous systems. Given the lack of knock-out mutants in conifers and the slow process of generating knock-down mutants in conifers, biochemical approaches and heterologous complementation that rely on FLcDNA clones are essential tools for functional genomics in conifers. Finally, FLcDNAs can be used to accurately identify peptides in large-scale conifer proteome analyses [34,35].

Despite their immense value, sequence-verified FLcDNA clones have not been generated in most plant species subjected to genome analysis. Only a few resources of large and sequence-verified FLcDNA data sets have been generated for angiosperm plant species; namely, for Arabidopsis [4], rice [7], and poplar [9]. In contrast, no substantial FLcDNA resource has been reported for a conifer or any other gymnosperm species. The Conifer Forest Health genomics project "Treenomix" [36] aims to develop genomic resources for spruce, characterize mechanisms of resistance against insect pests and adaptation to local environments, and identify genes for the formation of oleoresin-based terpenoid biomaterials [37-43]. Here, we report on a comprehensive spruce EST and FLcDNA resource and discuss its utility for conifer genomics. A total of 206,875 ESTs were obtained by sequencing 20 standard, normalized or full-length cDNA libraries derived from Sitka spruce, white spruce, and interior spruce. Analysis of ESTs identified 46,745 putative unique transcripts (PUTs).
We describe the advantages offered by the first large set of 6,464 sequence-verified, high-quality FLcDNAs obtained from a single clonally propagated tree of Sitka spruce.

Results

Sequencing and assembly of spruce ESTs

We constructed 20 unidirectional standard, normalized or full-length enriched cDNA libraries from various tissues, developmental stages, and stress treatments of Sitka spruce, white spruce and interior spruce (Table 1). Several libraries were made from trees subjected to insect feeding by white pine weevils (Pissodes strobi) or spruce budworms (Choristoneura occidentalis), or to herbivory-simulation treatments such as mechanical wounding or methyl jasmonate application. From these libraries, we obtained 206,875 EST sequences, consisting of 165,403 3'-end EST sequences and 41,472 5'-end EST sequences (Table 2). We initially focused on 3'-end sequencing. Subsequent sequence reads from 5'-ends were performed as paired end reads, primarily from clones derived from FLcDNA libraries, to support the identification of a non-redundant FLcDNA set for complete insert sequencing. Removing low-quality and vector sequences (see Table 2 for criteria), as well as any obvious contaminant sequences, provided a database containing 147,146 high-quality (hq) 3' ESTs (88.9% success rate) with an average read length of 656 bp (Table 2). When we analyzed the 147,146 hq 3'-end ESTs using the CAP3 program ([44]; assembly criteria: 95% identity, 40 bp window), 120,342 ESTs assembled into 19,941 contigs and the remaining 26,804 ESTs were classified as singletons, suggesting a combined total of 46,745 PUTs across Sitka spruce, white spruce and interior spruce (Table 2). On average, contigs contained six assembled EST sequences. Only 88 contigs consisted of more than 50 ESTs. The five largest contigs contain 618 (aspartyl protease), 229 (ribulose bisphosphate carboxylase small subunit), 222 (metallothionein), 209 (translationally controlled tumor protein) and 172 (no significant match) ESTs. The proportion of EST sequences from organelles was small. Known and putative mitochondrial and chloroplast sequences contribute only 285 (0.19%) and 787 (0.53%) ESTs to the entire data set, respectively. In separate species-specific assemblies using ESTs from only white spruce or Sitka spruce, we identified 23,963 PUTs (72,649 3'-end EST sequences, 10,948 contigs and 13,015 singletons) and 17,988 PUTs (49,198 3'-end EST sequences, 6,918 contigs and 11,070 singletons), respectively.

Gene discovery in normalized and non-normalized cDNA libraries

From each of the 20 cDNA libraries, between 1,536 and 24,959 clones were 3'-end sequenced, with the rate of hq sequences ranging from 77.1% to 94.1% and an average EST length of 532 bp to 756 bp in each library (Additional File 1). The rate of gene discovery for each library was assessed from: (1) the number of unique transcripts sequenced from each library; (2) the average number of EST sequences forming contigs; (3) the percentage of ESTs with no similarity to protein sequences in the non-redundant (NR) database of GenBank using BLASTX; (4) the percentage of singleton ESTs; and (5) the percentage of library-specific transcripts. Based on these criteria, all but two of the normalized libraries (i.e., WS-SE-N-A-18 and WS-SE-N-A-19) showed considerably higher rates of gene discovery, and hence higher complexity, than the corresponding non-normalized libraries (Additional File 1). For example, among the six successfully normalized EST libraries, the percentage of unique transcripts identified within the first 1,000 reads averaged 94.7% (92.7% to 95.9%), whereas among the seven corresponding standard EST libraries made from the same RNA samples, the average was only 78.8% (73.8% to 85.6%).
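The assembly figures reported above for the 3'-end ESTs can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the variable and function names are hypothetical, and the numbers are those quoted in the text and in Table 2 (the authors' own calculation is not described in this much detail).

```python
# Figures quoted in the text and in Table 2 (3'-end ESTs only).
hq_ests = 147_146         # high-quality 3'-end ESTs entering the assembly
assembled_ests = 120_342  # ESTs placed into contigs
contigs = 19_941
singletons = 26_804


def summarize_assembly(assembled, n_contigs, n_singletons):
    """Return (PUT count, mean ESTs per contig).

    PUTs are counted as contigs plus singletons, as defined in Table 2.
    """
    puts = n_contigs + n_singletons
    mean_members = assembled / n_contigs
    return puts, mean_members


puts, mean_members = summarize_assembly(assembled_ests, contigs, singletons)

# Every hq EST is either in a contig or a singleton.
assert assembled_ests + singletons == hq_ests

print(puts)                    # 46745
print(round(mean_members, 2))  # 6.03
```

The same function applies to the species-specific assemblies, for example 10,948 contigs and 13,015 singletons for white spruce.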
The diversity of starting biological materials combined with normalization resulted in low sequence redundancy demonstrated by the presence of only three PUTs (derived from 3'-end ESTs) sequenced in all of the 20 cDNA libraries (Table 3). These three transcripts were identified as translationally controlled tumor protein (209 ESTs), eukaryotic translation initiation factor 5A (115 ESTs) and S-adenosylmethionine synthase (104 ESTs).

Table 1: Libraries, tissue sources and spruce species for sequences described in this study

cDNA Library | Tissue/Developmental stage | Species (genotype)
WS-ES-A-1 (a) | Young shoots harvested from 25-year old trees (d). | P. glauca (PG-29)
WS-PS-A-2 (a) | Flushing buds, young shoots and mature shoots harvested from 25-year old trees (d). | P. glauca (PG-29)
WS-X-A-3 (a) | Early (June 15th), mid (July 10th) and late (August 17th) season outer xylem harvested from 25-year old trees (d). | P. glauca (PG-29)
IS-B-A-4 (a) | Bark tissue (with phloem and cambium) harvested after razor blade wounding and treatment with 0.01% methyl jasmonate. Tissue was collected 0 (untreated), 3, 6 and 12 h post-treatment (e). | P. glauca × P. engelmannii (Fal-1028)
SS-R-A-5 (a) | Young growth (terminal 1–3 cm) and mature growth (distal to terminal 1–3 cm) roots (e). | P. sitchensis (Gb2-229)
WS-PP-A-6 (a) | Early (June 15th), mid (July 10th) and late (August 17th) season phloem harvested from 25-year old trees (d). | P. glauca (PG-29)
IS-B-A-7 (a) | Bark tissue (with phloem and cambium) harvested after razor blade wounding and treatment with 0.01% methyl jasmonate. Tissue was collected 24 h, 2 d, 4 d and 8 d post-treatment (e). | P. glauca × P. engelmannii (Fal-1028)
WS-PS-N-A-8 (b) | Flushing buds, young shoots and mature shoots harvested from 25-year old trees (d). | P. glauca (PG-29)
WS-X-N-A-9 (b) | Early (June 15th), mid (July 10th) and late (August 17th) season outer xylem harvested from 25-year old trees (d). | P. glauca (PG-29)
IS-B-N-A-10 (b) | Bark tissue (with phloem and cambium attached) harvested after razor blade wounding and treatment with 0.01% methyl jasmonate. Tissue was collected 0 h (untreated), 3 h, 6 h, 12 h, 24 h, 2 d, 4 d and 8 d post-treatment (e). | P. glauca × P. engelmannii (Fal-1028)
SS-R-N-A-11 (b) | Young growth (terminal 1–3 cm) and mature growth (distal to terminal 1–3 cm) roots (e). | P. sitchensis (Gb2-229)
WS-PP-N-A-12 (b) | Early (June 15th), mid (July 10th) and late (August 17th) season phloem harvested from 25-year old trees (d). | P. glauca (PG-29)
SS-IB-A-FL-13 (c) | Bark tissue (with phloem and cambium attached) harvested after continuous feeding by Pissodes strobi weevils. Tissue was collected 2, 6 and 48 h post-treatment (e). | P. sitchensis (FB3-425)
SS-IL-A-FL-14 (c) | Green portion of leader tissue harvested after continuous feeding by Choristoneura occidentalis budworms. Tissue was collected 3 h, 6 h, 12 h, 24 h, 52 h, 4 d, 6 d, 8 d and 10 d post-treatment (e). | P. sitchensis (FB3-425)
SS-IB-A-FL-15 (c) | Bark tissue (with phloem and cambium attached) harvested after continuous feeding by P. strobi weevils. Tissue was collected 2, 6 and 48 h post-treatment (e). | P. sitchensis (FB3-425)
WS-SE-A-16 (a) | Somatic embryo tissue harvested at the callus stage, and after 2, 4 and 6 weeks of growth on media supplemented with abscisic acid and indole-3-butyric acid. | P. glauca (I-1026)
WS-MC-A-17 (a) | Cones harvested from 25-year old trees (d). | P. glauca (11)
WS-SE-N-A-18 (b) | Somatic embryo tissue harvested at the callus stage, and after 2, 4 and 6 weeks of growth on media supplemented with abscisic acid and indole-3-butyric acid. | P. glauca (I-1026)
WS-SE-N-A-19 (b) | Somatic embryo tissue harvested at the callus stage, and after 2, 4 and 6 weeks of growth on media supplemented with abscisic acid and indole-3-butyric acid. | P. glauca (I-1026)
WS-MC-N-A-20 (b) | Cones harvested from 25-year old trees (d). | P. glauca (11)

(a) Standard cDNA library; (b) Normalized cDNA library; (c) Full-length cDNA library; (d) Field site located at Kalamalka Research Station in Vernon, British Columbia; (e) One- or two-year old trees grown in potted soil under greenhouse conditions at the University of British Columbia

Quality assessment of FLcDNAs

FLcDNAs are defined as individual cDNA clones that contain the complete ORF coding sequence as well as at least partial 5' and 3' UTRs for a given transcript. We prepared three FLcDNA libraries using the biotinylated cap trapper method [45]. All FLcDNA libraries were made from insect-induced tissues of a single Sitka spruce genotype (Table 1). From these libraries, we identified 8,127 cDNA candidate clones for complete insert sequencing, which resulted in 6,464 hq sequence-verified FLcDNA clones (Additional File 2). Analysis of the 6,464 FLcDNA sequences using the CAP3 program ([44]; assembly criteria: 95% identity, 40 bp window) identified 5,197 FLcDNAs as singletons, with the remaining 1,267 grouping into 521 contigs, suggesting a total of 5,718 PUTs represented with finished FLcDNA sequences. The high rate (88.5%) of unique transcript discovery resulted from a successful strategy for selection of a low-redundancy FLcDNA clone set prior to sequence finishing (Figure 1). All 6,464 sequence-verified FLcDNAs achieved a minimum of Phred30 sequence quality at every base (i.e., no more than one error in 10^3 bases). The majority were of even higher quality with the minimum and average quality values exceeding Phred45 (less than one error in approximately 3 × 10^4 bases) and Phred80 (less than one error in 10^8 bases), respectively (Figure 2). We predicted the complete protein-coding ORFs for all 6,464 FLcDNAs (Additional File 2).
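The Phred thresholds quoted above follow the standard definition Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The short sketch below converts a Phred score into an expected number of bases per error; the function names are illustrative and are not taken from the authors' pipeline.

```python
def phred_to_error_prob(q):
    """Probability of an incorrect base call for Phred quality score q."""
    return 10 ** (-q / 10)


def bases_per_error(q):
    """Expected number of bases per single error at Phred quality score q."""
    return 1 / phred_to_error_prob(q)


for q in (30, 45, 80):
    print(q, round(bases_per_error(q)))
# 30 1000
# 45 31623
# 80 100000000
```

Phred45 therefore corresponds to roughly one error in 3 × 10^4 bases, in line with the value given in the text.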
The average sequenced FLcDNA length (from beginning of the 5' UTR to the end of the polyA tail) was 1,088 ± 404 bp (mean ± SD), and ranged from 401 to 3,003 bp, whereas the average predicted ORF was 616 ± 374 bp and ranged from 30 to 2,583 bp (Figure 3). ORFs could not be detected (i.e., less than 30 bp) for 11 FLcDNAs. The 5' and 3' UTRs averaged 154 ± 164 bp and 301 ± 174 bp, respectively (Figure 3).

To further assess the quality of the FLcDNAs, we performed reciprocal BLAST analysis using 872 known FL sequences from other conifer and gymnosperm species identified in previous entries in the NR database of GenBank. Using a stringent similarity threshold [identity ≥ 50%; BLASTX score value ≥ 95, where alignment scores are calculated based on match, mismatch and gaps in alignments using the default BLAST scoring matrices and parameters] we identified 297 pairs of Sitka spruce and other gymnosperm FLcDNAs. Of these pairs, 244 (82.1%) agreed well with regard to their ORF lengths (Figure 4) and positions of their starting methionine and stop codons (± ten amino acids). For the remaining pairs, the predicted 5' and/or 3' ORF ends did not match, suggesting alternative start or stop codons, splice variants, or the possibility that one of the pair members was truncated or had an incorrectly predicted ORF.
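The ORF rule described in the Figure 3 legend (the longest uninterrupted stretch between an ATG and a stop codon in the 5' to 3' direction) can be sketched in a few lines. This is a simplified illustration, not the EMBOSS getorf program the authors used: the function names are hypothetical, the stop codon is counted as part of the ORF by assumption, and the adaptor and polyA trimming applied when measuring UTRs is not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon (an assumption of this
    sketch). Returns None when no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in the same frame
    return best


def split_transcript(seq, orf):
    """Split a transcript into (5' UTR, ORF, 3' UTR) given ORF coordinates."""
    start, end = orf
    return seq[:start], seq[start:end], seq[end:]


# Toy transcript, not real spruce sequence.
toy = "GGCC" + "ATGAAACCCTAA" + "TTTT"
orf = longest_forward_orf(toy)
print(orf)                         # (4, 16)
print(split_transcript(toy, orf))  # ('GGCC', 'ATGAAACCCTAA', 'TTTT')
```

Restricting the search to the forward strand mirrors the use of unidirectional cDNA libraries, in which the transcript orientation of each clone is known.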
Despite the relatively small number of other gymnosperm FL sequences available for pairwise comparison, the high sequence similarity within this dataset indicates that most of the 6,464 FLcDNAs represent true FL transcripts with complete ORFs and correctly annotated start and stop codons.

Most spruce ESTs have low similarity with angiosperm sequences

Since conifers and other gymnosperms are difficult experimental systems with few functionally characterized proteins, in silico annotation of spruce ESTs was performed against predicted peptides from sequenced genomes of four angiosperms (Arabidopsis, rice, poplar, and grapevine) and the moss Physcomitrella patens, together with all protein sequences in the NR database of GenBank. Among hq 3'-end ESTs > 400 bases in length (N = 133,065), between 60.5% and 68.6% have matches against each of the five plant genomes with a low stringency BLASTX score of > 50 (Figure 5A and Additional File 3). Using a more stringent threshold of score > 200, between 16.1% and 21.4% of spruce 3'-end ESTs match peptides from each of the five plant genomes of this comparison.
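The percentages above are simply the fraction of query sequences whose best hit exceeds a score threshold. A minimal sketch of that tally is shown below; the function name, the dictionary layout and the example scores are all hypothetical and are not taken from the paper's data or scripts.

```python
def fraction_matching(best_scores, threshold):
    """Fraction of queries whose best-hit score is strictly above `threshold`.

    `best_scores` maps a query identifier to its best BLAST score, or to
    None when the query had no hit at all.
    """
    if not best_scores:
        return 0.0
    hits = sum(
        1 for score in best_scores.values()
        if score is not None and score > threshold
    )
    return hits / len(best_scores)


# Hypothetical best-hit scores for four ESTs against one plant genome.
example = {"est1": 312.0, "est2": 74.5, "est3": None, "est4": 50.0}

print(fraction_matching(example, 50))   # 0.5  (low stringency, score > 50)
print(fraction_matching(example, 200))  # 0.25 (high stringency, score > 200)
```

Note that a score of exactly 50 does not count as a match here, consistent with the strict "> 50" threshold used in the text.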
BLASTX matches with hq 3'-end ESTs were slightly higher (72.8% and 24.5% at score > 50 and > 200, respectively) when compared to the more comprehensive collection of proteins in the NR database (Figure 5A and Additional File 3). Similar results were obtained using the assembled contig set of 46,745 spruce PUTs derived from 3'-end ESTs (Figure 5C and Additional File 5).

Table 2: Spruce EST summary

Total sequences | 206,875
Number of 5' sequences | 41,472
Number of 3' sequences | 165,403
Average assembled 3' EST length (bp) (a) | 656.4
Number of high-quality 3' sequences (b) | 147,146
Number of contigs (c) | 19,941
Number of singletons | 26,804
Number of putative unique transcripts (d) | 46,745
Number of assembled 3' ESTs with (e):
  Significant BLASTX match | 96,454
  No significant BLASTX match | 50,692
Average number of contig members | 6.03
Number of contigs containing:
  2 ESTs | 6,050
  3–5 ESTs | 7,449
  6–10 ESTs | 3,841
  11–20 ESTs | 1,941
  21–50 ESTs | 572
  >50 ESTs | 88

(a) High-quality (hq) sequences only.
(b) A sequence is considered of hq if it is not derived from contaminant species and its vector-trimmed and poor-quality-trimmed PHRED 20 length is >100 bases.
(c) A contig (contiguous sequence) contains two or more ESTs; 3' sequences only.
(d) Number of putative unique transcripts (PUTs) among assembled 3' ESTs equals the number of contigs plus the number of singletons.
(e) Threshold for BLASTX significance versus the non-redundant (NR) database of GenBank is a score value > 50.

Table 3: Distribution of ESTs in multiple cDNA libraries

Number of libraries | Number of putative unique transcripts with ESTs in all libraries compared
20 | 3
19 | 2
18 | 2
17 | 12
16 | 16
15 | 22
14 | 36
13 | 41
12 | 66
11 | 101
10 | 175
Among hq 5'-end ESTs > 400 bases in length (N = 36,505), sequence similarity with proteins predicted from the five plant genome sequences was higher compared to 3' ESTs and PUTs, with between 74.3% and 82.6% (low stringency) and 30.7% and 40.2% (high stringency) of 5'-end ESTs matching each of the plant genomes (Figure 5B and Additional File 4). As observed with 3'-end ESTs and PUTs, an even higher proportion of 5'-end ESTs had BLASTX matches against the NR database (85.9% and 43.8% at score > 50 and > 200, respectively). These results illustrate the challenge of in silico annotation of conifer ESTs, even with hq sequences averaging > 650 bases in length.

Figure 1: Clone selection and complete insert sequencing of 6,464 Sitka spruce FLcDNAs. A total of 20,469 candidate FL transcripts were identified in two consecutive rounds of clone selection involving initially 32,980 and then 46,745 putative unique transcripts (PUTs) derived from a total of 147,146 high-quality 3'-end ESTs. See Methods for complete details of candidate clone selection criteria. Among the 8,127 candidates selected for complete insert sequencing, 5,298 were finished by end reads only, and another 1,166 were finished by end reads plus gap closing using primer walking, yielding a total of 6,464 sequence-verified finished FLcDNAs. An additional 1,396 clones (17.1%) from the starting set of 8,127 will be finished in future work. Only 267 clones (3.2%) were aborted, which supports the success of our strategy for FLcDNA clone selection.

Figure 1 flowchart, as text:
Round 1: 113,612 high-quality 3'-end ESTs; CAP3 assembly; 32,980 transcripts. Filter for FL candidate criteria (2,771 missing 5' SSPA; 1,312 short 5'- and/or 3'-end ESTs (Q20 < 400 nt); 920 missing polyA tail; 357 5'-end ESTs > 100 nt shorter than public match; select PUTs with FLcDNA library clones). 10,710 candidate FL transcripts; 5,348 candidates selected.
Round 2: 147,146 high-quality 3'-end ESTs; CAP3 assembly; 46,745 transcripts. Filter for FL candidate criteria (3,673 missing 5' SSPA; 1,514 short 5'- and/or 3'-end ESTs (Q20 < 400 nt); 1,119 missing polyA tail; 450 5'-end ESTs > 100 nt shorter than public match; select PUTs with FLcDNA library clones; discard clones selected in round 1). 9,759 candidate FL transcripts; 2,779 candidates selected.
Finishing: 5,298 clones finished (65.1%) after rearraying into 384-well plates, 2x 3'-end and 2x 5'-end reads per clone, and manual and automated inspection for finished quality; 1,166 clones finished (14.3%) after rearraying into 96-well plates, gap-closing by primer walking, and manual and automated inspection for finished quality; 1,396 clones incomplete (17.1%); 267 clones aborted (3.2%).

We also compared the spruce ESTs and PUTs against ESTs from all gymnosperm species combined (dbEST database of GenBank, excluding ESTs reported in this study) using BLASTN. As expected, sequence similarity between the spruce ESTs and published gymnosperm ESTs was high (Figure 5 and Additional Files 3, 4, 5). Among PUTs (derived from 3'-end ESTs), hq 3'-end and 5'-end ESTs > 400 bases in length, 88.6%, 95.4% and 96.9%, respectively, have matches with scores > 50. At higher BLASTN stringency levels (i.e., scores > 200 and > 1,000), sequence matches for PUTs, 3'-end and 5'-end ESTs remain consistently high.
Among those PUTs, 3'-end and 5'-end ESTs > 400 bases in length and with no obvious similarity to proteins from the five sequenced plant genomes (at score ≤ 50), 60.0%, 79.1%, and 82.0%, respectively, have BLASTN scores > 200 versus published gymnosperm ESTs (Additional Files 3, 4, 5). When the spruce ESTs are compared against published ESTs from white spruce and loblolly pine, the two gymnosperm species with the most substantial EST collections, a higher proportion of PUTs, and 3'-end and 5'-end ESTs show sequence similarity to white spruce compared to loblolly pine, especially at the highest BLASTN threshold (Figure 5 and Additional Files 3, 4, 5).

Figure 2: Validation of sequence quality of FLcDNAs. Sequence accuracy was measured as the percentage of the 6,464 FLcDNAs which, with 100%, 95.0–99.9%, 90.0–94.9% or <90.0% of their sequence length, exceeded Phred30, Phred40, Phred50 or Phred60 sequence quality thresholds. All 6,464 FLcDNAs exceeded the Phred30 quality thresholds (less than 1 error in 10^3 sequenced nucleotides) over 100% of their sequence length. Even at the threshold level of Phred60 (less than 1 error in 10^6 sequenced nucleotides) the majority (74.1%) of the FLcDNA sequences met this very high sequence quality score over > 95.0% of their length.
[Bar chart. X-axis: minimum sequence quality score (Phred30, Phred40, Phred50, Phred60). Y-axis: % of FLcDNAs. Series: proportion of FLcDNA sequence length (100%, 99.9–95.0%, 94.9–90.0%, < 90%).]

Utility of spruce FLcDNAs for comparative sequence annotation

As might be expected, sequence similarity between the 6,464 Sitka spruce FLcDNAs and other gymnosperm ESTs is very high, with 96.5%, 94.6% and 78.7% of FLcDNAs matching published gymnosperm ESTs at low, medium, and high sequence similarity thresholds, respectively (Figure 6A and Additional File 2). As observed with spruce ESTs, sequence similarity was highest between spruce FLcDNAs and white spruce ESTs, with lower similarity observed with loblolly pine ESTs (Figure 6A). Next, the spruce FLcDNAs were compared against predicted proteins from five plant genome sequences and protein sequences in the complete NR database of GenBank. At a low sequence similarity threshold of score > 50, between 76.5% and 84.2% of FLcDNAs matched proteins from each of the plant genomes of this comparison, whereas at a higher threshold of score > 200 the percentages of FLcDNAs with matches in the plant genome sequences ranged from 38.1% to 44.9% (Figure 6A and Additional File 2). Overall, the Sitka spruce FLcDNAs show greater similarity to predicted proteins from sequenced plant genomes compared to the spruce ESTs. The proportion of spruce FLcDNAs with similarity to proteins in the NR database was also higher than spruce ESTs at 87.7% and 47.9% at score > 50 and score > 200, respectively (Figure 6A and Additional File 2).

These results show that FLcDNAs provide a clear advantage over ESTs for large scale in silico annotation of spruce sequences. Nevertheless, when using high stringency criteria relevant for in silico functional annotation (score values > 200), the comparison of spruce FLcDNAs against the five plant genomes, as well as all plant species in the NR database, still identifies a substantial number of sequences that only show significant matches with other gymnosperms, as opposed to angiosperms.
Among the 6,464 spruce FLcDNAs, we found 927 (14.3%) without a reliable match to angiosperm sequences at a low stringency (i.e., BLASTX score ≤ 50), of which 743 (80.1%) match with high sequence similarity (i.e., BLASTN score > 200) to a published gymnosperm EST sequence (Additional File 2). A very small number of spruce FLcDNAs lack sequence similarity to angiosperm or gymnosperm sequences (at score ≤ 50) and display a best match with non-plant species in the NR database of GenBank; 1.0% at score > 50 and 0.3% at score > 200 (Additional File 2). In these cases, the best match is often an insect sequence, suggesting small amounts of contaminants in the cDNA libraries.

Comparing the entire spruce FLcDNA dataset against sequences from all species identified that 71.9% (at score > 50) or 34.2% (at score > 200) have matches in all seven datasets (i.e., five plant genomes, the NR database of GenBank, and gymnosperm ESTs) (Figure 6B and 6C). It is notable that at the higher threshold of score > 200, 47.2% of spruce FLcDNAs match only to a single database, and in the vast majority of cases this is a gymnosperm sequence (Figure 6E). Another 1.0% (at score ≤ 50) or 3.8% (at score ≤ 200) of spruce FLcDNA sequences do not align to any sequences in available databases. These sequences could represent genes from spruce (or genes from other contaminant organisms) that have not been sequenced before in any source.

Discussion

Spruce ESTs and FLcDNAs enhance conifer genomics resources

Genomics research on conifers has been limited by the lack of a relevant gymnosperm reference genome sequence. The very large size of conifer genomes (10 to 40 Gb; [1]), dominated by repetitive DNA, has been a roadblock to a conifer genome sequence project. Furthermore, the phylogenetic distance between conifers and the well-studied angiosperms is more than 300 million years [2], limiting the utility of angiosperm genome information for research in conifers. To overcome these obstacles to conifer genome research, we have developed two new valuable components for the "conifer genomics toolbox".

First, we have assembled a large collection of high-quality, sequence-verified FLcDNA clones from Sitka spruce, along with a corresponding database of in silico annotations (Additional File 2).

Figure 3. Distribution of open reading frame (ORF) and 5' and 3' untranslated region (UTR) sizes among the finished 6,464 FLcDNAs (A), and the mean ORF and UTR length (± standard deviation) (B). Each finished FLcDNA sequence was examined for the presence of ORFs using the EMBOSS getorf program (version 2.5.0; [69,70]). In each case, the longest stretch of uninterrupted sequence between a start (ATG) and stop codon (TGA, TAG, TAA) in the 5' to 3' direction was taken as the predicted ORF. The presence and coordinates of the 5' second strand primer adaptor sequence (SSPA) and polyA tail were also noted. The regions between the 5' SSPA and the predicted ORF start and between the predicted ORF stop and the polyA tail were taken to be the 5' and 3' UTRs, respectively. The 5' SSPA and 3' polyA tail lengths were not included when determining UTR length.
[Panel A: number of clones by size class (bp) for the 5' UTR, ORF and 3' UTR. Panel B: 5' UTR, 154 ± 164 bp; ORF, 616 ± 374 bp; 3' UTR, 301 ± 174 bp.]
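The ORF and UTR rule stated in the Figure 3 legend can be sketched in a few lines of Python. This is a simplified illustration, not a reimplementation of EMBOSS getorf: it scans only the three forward reading frames, takes the first ATG after the previous stop in each frame as the start, and assumes the SSPA end and polyA start coordinates are already known. The function names and coordinate conventions (0-based, end-exclusive, stop codon included in the ORF) are assumptions made for this example.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG since the last stop
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def utr_lengths(orf, sspa_end, polya_start):
    """Lengths of the 5' and 3' UTRs, excluding the SSPA and the polyA tail.

    sspa_end is the first base after the 5' adaptor; polya_start is the
    first base of the polyA tail. Both are assumed to be known already.
    """
    orf_start, orf_end = orf
    return orf_start - sspa_end, polya_start - orf_end


if __name__ == "__main__":
    # Hypothetical toy sequence, not a spruce cDNA.
    toy = "ATGTAACATGAAAAAATGA"
    orf = longest_orf(toy)
    print(orf, utr_lengths(orf, sspa_end=0, polya_start=len(toy)))
```

A production pipeline would also need to handle ambiguous bases and ORFs truncated at the sequence ends, which this sketch ignores.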
These FLcDNAs are of very low redundancy. They represent the third largest sequence-verified FLcDNA resource for any plant species, behind only rice [7] and Arabidopsis [4], and are the only substantial FLcDNA resource for a conifer or any other gymnosperm.

Second, we have added a large number of new EST sequences to the public spruce EST collection in GenBank, along with corresponding databases of in silico annotations (Additional Files 3, 4, 5). This resource, which was developed from Sitka, white and interior spruce (interior spruce has varying degrees of admixture between white and Engelmann spruce), substantially improves the size and quality of the previously described spruce EST collections [20-23]. The spruce EST collection, along with the ESTs from loblolly pine [15-18], is now one of the two largest EST resources for any conifer species. To enhance gene discovery, we strategically employed library normalization, which had previously not been applied to a conifer EST program. Also, we have added sequences from an until now poorly represented class of tissues representing a biologically important component of conifer defense: insect-, wound- or elicitor-induced tissues.

We identified 46,745 PUTs (19,941 contigs, 26,804 singletons; derived from 3'-end ESTs) in the three species groups surveyed here: Sitka spruce, white spruce, and interior spruce. The rates of PUT discovery for all species combined (31.8%), white spruce only (33.0%) and Sitka spruce only (36.6%) are comparable, as are the ratios of singletons to contigs in each collection. Among contigs from the combined analysis of white and Sitka spruce ESTs, 26.7% contained ESTs from both species, suggesting that ESTs derived from different spruce species representing the same spruce gene often cluster together. The PUTs identified here may represent a substantial portion of the expressed gene catalogue for species of spruce, but a complete genome sequence is needed for assessment of true gene numbers in conifers.

Figure 5. Sequence annotation of 3' and 5' ESTs and putative unique transcripts (PUTs) against published databases. Panels A, B and C show the percentage of 3' ESTs, 5' ESTs and PUTs (derived from 3'-end ESTs), respectively, with sequence similarity to entries in nine databases including BLASTX searches against peptides from five sequenced plant genomes (i.e., Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, Vitis vinifera, and Physcomitrella patens), and all peptides in the non-redundant (NR) database of GenBank; as well as BLASTN searches against 1) all gymnosperm ESTs in dbEST database of GenBank, 2) all Picea glauca ESTs in dbEST, and 3) all Pinus taeda ESTs in dbEST. Matches were identified using low (score > 50), medium (score > 200) or high (score > 1,000) BLAST stringency thresholds.
[Plot axes, panels A to C: % of 3' ESTs, 5' ESTs or PUTs with database match, by database and BLAST score threshold.]

[Figure 6, panels A to E. Panel A: % of FLcDNAs with database match at BLAST score > 50, > 200 and > 1,000. Panels B and C: number of the seven databases matched at BLAST score > 50 and > 200. Panels D and E: single-database matches split into gymnosperm ESTs only and GenBank NR only.]

The spruce ESTs described here have already provided the foundation for functional and comparative genomics research on conifer defense against insects, adaptation to the environment, somatic embryogenesis and wood formation, via both transcriptome and proteome analyses [21,34,35,42,46]. They have also allowed development of three types of genetic markers: microsatellites [47,48], single nucleotide polymorphisms (SNP) and conserved orthologous sequences (COS) [41]. The FLcDNA sequences enable rigorous large-scale comparisons of evolutionary patterns at large evolutionary scales (K. Ritland et al., manuscript in preparation).

Utility of spruce FLcDNAs for functional characterization of gene families including nearly identical paralogous genes

Prior to this work, only a few dozen complete spruce protein sequences were available in the SwissProt database, and no substantial FLcDNA resource was available for any gymnosperm. Using FLcDNAs, detailed pathway annotation, gene expression analysis, and biochemical functional characterization of individual genes and gene families are now possible (S.G. Ralph and J. Bohlmann, manuscript in preparation). The Sitka spruce FLcDNAs have already advanced the discovery and the characterization of conifer defense genes [49-53].

Figure 4. Validation of spruce FLcDNAs by comparison of ORF lengths (A) and cDNA lengths (B) of 297 spruce FLcDNAs with matching gymnosperm FLcDNAs in the public domain. The 6,464 FLcDNAs were compared to a collection of 872 gymnosperm sequences from SwissProt using BLASTX ([71]; release 50.1 of June 13th, 2006) annotated as full-length (excluding predicted proteins derived from genomic DNA). This comparison identified 297 homologous pairs. A spruce-gymnosperm FLcDNA pair was considered homologous if (1) the best gymnosperm protein BLASTX match exceeded a stringent threshold (% identity ≥ 50%; score value > 95) and (2) the reciprocal TBLASTN analysis identified the same spruce FLcDNA with a score value equal to or within 10% of the best match. ORF and cDNA lengths for gymnosperm sequences were extracted from the SwissProt records, and spruce ORF lengths were predicted using the EMBOSS getorf program. Strong correlations were observed for both ORF and cDNA lengths between spruce and gymnosperm sequences for the available test set of 297 homologous pairs.
[Plot axes: (A) gymnosperm ORF length (aa) versus spruce ORF length (aa); (B) gymnosperm cDNA length (bp) versus spruce cDNA length (bp).]
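The two-part homology rule in the Figure 4 legend (a stringent forward BLASTX match, then a reciprocal TBLASTN check with a 10% tolerance) can be sketched as below. This is one possible reading of the legend, not the authors' pipeline: the `Hit` record, the function names and the dictionary layout are hypothetical, and "within 10% of the best match" is interpreted here as a reciprocal score of at least 90% of the top reciprocal score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str       # sequence used as the BLAST query
    subject: str     # database sequence that was matched
    score: float     # BLAST score value
    identity: float  # percent identity of the alignment


def best_hit(hits):
    """Return the highest-scoring hit, or None for an empty list."""
    return max(hits, key=lambda h: h.score) if hits else None


def homologous_pair(flcdna_id, blastx_hits, tblastn_hits_by_protein,
                    min_identity=50.0, min_score=95.0, tolerance=0.10):
    """Return (flcdna_id, protein_id) if both criteria hold, else None.

    blastx_hits: hits of this FLcDNA against the gymnosperm proteins.
    tblastn_hits_by_protein: protein id -> hits of that protein against
    the FLcDNA set (the reciprocal search).
    """
    forward = best_hit(blastx_hits)
    if forward is None:
        return None
    # Criterion 1: % identity >= 50% and score value > 95.
    if forward.identity < min_identity or forward.score <= min_score:
        return None
    reciprocal = tblastn_hits_by_protein.get(forward.subject, [])
    top = best_hit(reciprocal)
    if top is None:
        return None
    # Criterion 2: the same FLcDNA is recovered with a score equal to,
    # or within 10% of, the best reciprocal match.
    for hit in reciprocal:
        if hit.subject == flcdna_id and hit.score >= (1 - tolerance) * top.score:
            return (flcdna_id, forward.subject)
    return None


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    forward_hits = [Hit("FL1", "P1", 300.0, 72.0), Hit("FL1", "P2", 120.0, 55.0)]
    reciprocal_hits = {"P1": [Hit("P1", "FL2", 310.0, 70.0),
                              Hit("P1", "FL1", 295.0, 72.0)]}
    print(homologous_pair("FL1", forward_hits, reciprocal_hits))
```

The tolerance lets a pair survive when a near-identical paralog scores marginally higher in the reciprocal search, which matters for gene families with closely related members.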
Importantly, Sitka spruce FLcDNAs allow for accurate analysis of closely related members of gene families such as cytochrome P450-dependent monooxygenases or terpenoid synthases (TPS) involved in defense against insects or pathogens [40,54]. For example, TPSs represent a gene family containing many pairs or groups of nearly identical paralogous genes each with a potentially different biochemical function [32]. Our recent mutational analysis of two closely related paralogous Norway spruce di-TPS illustrated that a single amino acid mutation in a background of more than 800 amino acids completely alters biochemical product profiles [55]. Similarly, in rice, the functional divergence of two distinct TPS of primary and secondary metabolism was due to a single amino acid substitution [56]. These examples illustrate the utility of true FLcDNAs for discovery of nearly identical paralogous genes and for functional assessment of gene evolution that is now possible in Sitka spruce.

Utility of FLcDNAs for conifer proteome and genome characterization

Beyond their importance for functional characterization of individual genes and the analysis of gene families, on an even larger scale, FLcDNAs are also superior to ESTs for overall proteome and genome characterization in a conifer. Because the Sitka spruce FLcDNAs allow for a much more reliable prediction of the complete protein-coding ORF than ESTs, they have been invaluable for proteome predictions and practical proteome analyses [35]. In expectation of future efforts to sequence a conifer genome, FLcDNAs and their ORFs will be essential for the development and training of gene prediction software, as has recently been demonstrated for poplar [8,9].

Spruce FLcDNAs from insect-induced libraries reveal genes not detected in angiosperms

Comparison of Sitka spruce sequences against angiosperm plants suggests that there are likely a substantial number of genes in the collection of 6,464 FLcDNAs that are either absent in other species, or lack significant sequence similarity for unambiguous identification. In earlier work, Kirst et al. [16] suggested that less than 10% of loblolly pine transcripts lack a related gene in Arabidopsis (defined at a BLASTX E value cutoff of 1e-10 or ca. score 60). When we analyzed the spruce FLcDNAs, we found that approximately 14% had no similarity to any angiosperm at a BLASTX stringency of score 50 (slightly lower than that applied by Kirst et al. [16]), based on comparisons to four sequenced angiosperm genomes and all angiosperm sequences in the NR database. This slightly higher rate may be the result of sequencing libraries made from tissues induced by insect attack, which may disproportionately represent genes with specialized functions in conifer defense that are subject to high levels of natural selection due to biotic interaction. By contrast, genes involved in xylem development and wood formation appear to be well conserved in angiosperms and conifers [16,46].

Figure 6. Sequence annotation of 6,464 high-quality spruce FLcDNAs against published databases. Panel A shows the percentage of FLcDNAs with sequence similarity to entries in nine databases including BLASTX searches against peptides from five sequenced plant genomes (i.e., Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, Vitis vinifera, and Physcomitrella patens), and all peptides in the non-redundant (NR) database of GenBank; as well as BLASTN searches against 1) all gymnosperm ESTs in dbEST database of GenBank, 2) all Picea glauca ESTs in dbEST, and 3) all Pinus taeda ESTs in dbEST. Matches were identified using low (score > 50), medium (score > 200) or high (score > 1,000) BLAST stringency thresholds. Panels B and C show the non-overlapping distribution of matches of spruce FLcDNAs against seven databases (peptides from A. thaliana, P. trichocarpa, O. sativa, V. vinifera, P. patens, and the NR database of GenBank; and gymnosperm ESTs) at BLAST score thresholds of > 50 and > 200, respectively. Panels D and E show the database source in cases where spruce FLcDNAs matched only a single database in panels B and C at BLAST score thresholds of > 50 and > 200, respectively.

Conclusion

The 206,875 ESTs and 6,464 FLcDNAs and the corresponding in silico annotated sequence databases provide a new and valuable genomics resource for species of spruce, as well as for gymnosperms in general. Our emphasis on FLcDNAs and ESTs from cDNA libraries constructed from herbivore-, wound- or elicitor-treated induced spruce tissues, along with incorporating normalization to capture rare transcripts, gives a rich conifer EST resource which also apparently contains a substantial number of transcripts with no obvious sequence similarity to known angiosperm sequences. Recent research has begun to fully realize the application of these EST and FLcDNA sequences, and FLcDNA clones.

Methods

cDNA library construction

Details of the isolation of total and poly(A)+ RNA are described in Additional File 6. Standard cDNA libraries were directionally constructed (5' EcoRI and 3' XhoI) using 5 μg poly(A)+ RNA and the pBluescript II XR cDNA Library Kit, following the manufacturer's instructions (Stratagene, La Jolla, USA) with modifications. First-strand synthesis was performed using Superscript II reverse transcriptase (Invitrogen, Carlsbad, USA) and an anchored oligo d(T) primer [5'-(GA)₁₀ACTAGTCTCGAG(T)₁₈VN-3'].
Size fractionation was performed on XhoI-digested cDNA prior to ligation into vector using a 1% NuSieve GTG low melting point agarose gel (BioWhittaker Molecular Applications, Walkersville, USA) and β-agarase (New England Biolabs, Ipswich, USA) to isolate cDNAs from 300 bp to 5 kb. Select cDNA libraries were normalized to Cot = 5 using established protocols [13,14]. Library plasmids were propagated in ElectroMAX DH10B T1 Phage Resistant Cells (Invitrogen). FLcDNA libraries were directionally constructed (5' XhoI and 3' BamHI) according to methods of Carninci and Hayashizaki [57] and Carninci et al. [58], with modifications described in Additional File 6.

DNA sequencing and sequence filtering

Details of bacterial transformation with plasmids, clone handling, DNA purification and evaluation, and DNA sequencing are provided in Additional File 6. Sequences from each cDNA library were closely monitored to assess library complexity and sequence quality. DNA sequence chromatograms were processed using the PHRED software (versions 0.000925.c and 0.020425.c) [59,60]. Sequences were quality-trimmed according to the high-quality (hq) contiguous region determined by PHRED and vector-trimmed using CROSS_MATCH software [61]. Sequences with less than 100 high quality bases (Phred20 or better) after trimming and sequences with polyA tails of ≥ 100 bases were removed from the analysis. Also removed were sequences representing bacterial, yeast or fungal contaminations identified by sequence alignments using BLAST [62,63] to E. coli K12 DNA sequence (GI: 6626251), Saccharomyces cerevisiae (GenBank, http://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/yeast.nt.gz), Aspergillus nidulans (TIGR ANGI.060302), and Agrobacterium tumefaciens (custom database generated using SRS, Lion Biosciences). Sequences were also compared to the NR protein database [67]. Top ranked BLAST matches to species other than plants with score values > 60 were flagged as contaminants and were removed from the EST dataset. EST sequences have been deposited in the dbEST database of GenBank [DR448912 to DR451924; DR463975 to DR595214; CV720218 to CV720219; CO203067 to CO245079; CO250245 to CO252887; CO252989 to CO253183; CO253265 to CO257405; CO257513 to CO258618; CN480886 to CN480910].

Selection of candidate FLcDNA clones and sequencing strategy

All 3'-end ESTs remaining after filtering were clustered and assembled using CAP3 ([44]; assembly criteria: 95% identity, 40 bp window). The resulting contigs and singletons were defined as the putative unique transcript (PUT) set. PUTs with a cDNA clone from a FLcDNA library were selected as candidates for complete insert sequencing. Candidate clones from FLcDNA libraries were single-pass sequenced from both 3'- and 5'-ends, and both sequences were used for subsequent clone selection. Clones were screened for the presence of a polyA tail (3'-end EST) and the second-strand primer adaptor (SSPA; 5'-ACTAGTTTAATTAAATTAATCCCCCCCCCCC-3'; 5'-end EST). Clones lacking either of these features were eliminated. A polyA tail was defined as at least 12 consecutive, or 14 of 15 "A" residues within the first 30 bases of the 3'-end EST (5' to 3'). The presence of the SSPA was detected using the Needleman-Wunsch algorithm limiting the search to the first 30 bases of the 5'-end EST (5' to 3'). The SSPA was defined as eight consecutive "C" residues and a ≥ 80% match to the remaining sequence (5'-ACTAGTTTAATTAAATTAAT-3'). In each case, the algorithms used to detect the 5' and 3' clone features were set to produce maximal sensitivity while maintaining a 0% false positive rate, as determined using test data sets. Candidate clones for which either of the initial 5'-end or 3'-end EST sequences had a Phred20 quality length of < 400 bases were also excluded. Finally, any clone with a 5'-end EST which had a BLASTN match (score value > 300) to a gymnosperm EST in the public domain (excluding ESTs from this collection) and was > 100 bases shorter at the 5' end than the matching EST was flagged as truncated at the 5' end and was excluded. For each PUT represented by multiple candidate clones after filtering, the clone with the longest 5' sequence was selected for complete insert sequencing.

Insert sizing using colony PCR and vector primers was performed on 1,634 cDNA clones with an average insert size of ca. 1,250 bp. Based on this information, a sequencing strategy emphasizing the use of end reads was chosen. Using end reads only, 5,298 clones were complete insert sequenced to a high quality. Among this set, the average sequenced insert size was 1,005 ± 282 bp (average ± SD) with an average of 5.93 ± 0.51 end reads required to finish. Using a combination of end sequencing and primer walking, an additional 1,166 clones were complete insert sequenced, with an average insert size of 1,653 ± 447 bp, and requiring six end reads and 2.62 ± 1.51 internal primer reads per clone.

Sequence finishing of FLcDNA clones

FLcDNA clones selected for complete sequence finishing were rearrayed into 384-well plates, followed by two additional rounds of 5'-end and 3'-end sequencing using vector primers. All sequences from an individual clone were then assembled using PHRAP (version 0990329) [59,60]. To meet our hq criteria, the resulting clone consensus sequence was required to achieve a minimum average score of Phred35, with each base position having a minimum score of Phred30.
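As an illustration of the two consensus-level thresholds just stated (a mean of at least Phred35 and no position below Phred30), a minimal Python check might look like the sketch below. The function names and the list-of-integers input are assumptions made for this example; the sketch does not reproduce the automated pipeline used in the study.

```python
def failing_positions(consensus_quals, min_base=30):
    """0-based positions whose consensus Phred score is below min_base."""
    return [i for i, q in enumerate(consensus_quals) if q < min_base]


def meets_consensus_thresholds(consensus_quals, min_average=35, min_base=30):
    """True if the mean score is >= min_average and every base is >= min_base."""
    if not consensus_quals:
        return False
    average = sum(consensus_quals) / len(consensus_quals)
    return average >= min_average and not failing_positions(consensus_quals, min_base)


if __name__ == "__main__":
    # Hypothetical consensus scores, for illustration only.
    clone = [60, 60, 29, 55]
    print(meets_consensus_thresholds(clone), failing_positions(clone))
```

Reporting the failing positions, rather than only a pass or fail flag, shows where additional reads would be needed.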
Each base position also required at least two sequences, each with a minimum quality of Phred20, that were in agreement with the consensus sequence (i.e., no high-quality discrepancies). Clones that did not meet these finishing criteria or that had gaps after three rounds of end sequencing were then subjected to successive rounds of sequencing using custom primers designed using the Consed graphical tool version 14 [64] until the required quality levels were achieved. Regardless of the finishing strategy, all clones that did not meet the minimum finishing criteria according to an automated pipeline were manually examined. Clones were aborted if they were manually verified to lack the minimum finishing criteria, did not possess the cloning structures, were identified as chimeric, were refractory to sequence finishing due to the presence of a "hard-stop", or if errors were identified in the re-array of glycerol stocks. FLcDNA sequences have been deposited in GenBank [EF081469 to EF087932].

Comparative sequence annotation

The following databases were used to perform BLAST analyses for EST and FLcDNA annotation: 1) Arabidopsis thaliana, The Arabidopsis Information Resource version 7, release date April 25th, 2007, 31,921 peptides [65]; 2) Populus trichocarpa, Joint Genomes Institute (JGI) version 1.1, release date September 16th, 2006, 45,555 peptides [66]; 3) Oryza sativa, National Center for Biotechnology Information (NCBI), download date April 8th, 2008, 177,254 peptides [67]; 4) Vitis vinifera, NCBI, download date April 8th, 2008, 55,851 peptides [67]; 5) Physcomitrella patens, JGI version 1.1, release date January 4th, 2008, 35,938 peptides [68]; 6) NR database of GenBank, NCBI release 162, release date October 15th, 2007, 5,372,238 peptides [67]; 7) gymnosperm ESTs in NCBI (excluding ESTs reported in this study), download date April 8th, 2008, 622,923 ESTs [67]; 8) Picea glauca ESTs in NCBI (excluding ESTs reported in this study), download date April 8th, 2008, 197,042 ESTs [67]; 9) Pinus taeda ESTs in NCBI, download date April 8th, 2008, 328,628 ESTs [67].

Authors' contributions

JB and SGR conceived and directed this study. SGR, NK, DC, and CO developed full-length cDNA and EST libraries. SGR, HJEC, RK and JB analyzed data with assistance from the coauthors. RAH, SJMJ and MM directed sequencing and bioinformatics work at the GSC. JB and SGR wrote the paper. All authors read and approved the final manuscript.

Additional material

Additional File 1. cDNA library summary statistics. Sequencing statistics organized by cDNA library source for spruce expressed sequence tags. [http://www.biomedcentral.com/content/supplementary/1471-2164-9-484-S1.doc]

Additional File 2. Full-length cDNA inventory. Predicted protein-coding features, annotation, and GenBank accession numbers for the Sitka spruce full-length cDNA collection. [http://www.biomedcentral.com/content/supplementary/1471-2164-9-484-S2.xls]

Additional File 3. 3'-end EST inventory. Detailed annotation and GenBank accession numbers for the complete set of spruce 3'-end ESTs. [http://www.biomedcentral.com/content/supplementary/1471-2164-9-484-S3.zip]

Additional File 4. 5'-end EST inventory. Detailed annotation and GenBank accession numbers for the complete set of spruce 5'-end ESTs. [http://www.biomedcentral.com/content/supplementary/1471-2164-9-484-S4.zip]

Additional File 5. PUT inventory. Detailed annotation for the complete set of putative unique transcripts. [http://www.biomedcentral.com/content/supplementary/1471-2164-9-484-S5.zip]

Additional File 6. Supplemental methods. Detailed methods for RNA isolation, full-length cDNA library construction, bacterial transformation with plasmids, clone handling, DNA purification and evaluation, and DNA sequencing. [http://www.biomedcentral.com/content/supplementary/1471-2164-9-484-S6.doc]

Acknowledgements

We thank Barry Jaquish, John King, and Alvin Yanchuk (BC Ministry of Forests and Range, Victoria) and David Ellis (formerly with CellFor Inc., Victoria) for plant material and generous support of this project, and Nancy Liao, Jerry Liu, Diana Palmquist, Brian Wynhoven, Yaron Butterfield, Jeffrey Stott, George Yang and Asim Siddiqui at the Genome Sciences Centre for technical assistance with large-scale DNA sequencing. We also thank Ian Cullis (UBC) for somatic embryo propagation, Sharon Jancsik (UBC) for assistance with clone insert sizing, David Kaplan (UBC) for greenhouse support, and Bob McCron and Rene I. Alfaro from the Canadian Forest Service for access to western spruce budworms and white pine weevils, respectively. This project was supported with funding from Genome Canada and Genome British Columbia (TreenomixII Conifer Forest Health to K.R. and J.B., and TreenomixI to C.J.D., K.R., and J.B.) and the Natural Sciences and Engineering Research Council of Canada (NSERC to J.B.). Salary support for J.B. was provided, in part, by the UBC Distinguished University Scholar Program and an NSERC E.W.R. Steacie Memorial Fellowship.

References

1. Friesen N, Brandes A, Heslop-Harrison JS: Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Mol Biol Evol 2001, 18:1176-1188.
2. Bowe LM, Coat G, dePamphilis CW: Phylogeny of seed plants based on all three genomic compartments: Extant gymnosperms are monophyletic and Gnetales' closest relatives are conifers. Proc Natl Acad Sci USA 2000, 97:4092-4097.
3. Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796-815.
4. Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, Muramatsu M, Hayashizaki Y, Kawai J, Carninci P, Itoh M, Ishii Y, Arakawa T, Shibata K, Shinagawa A, Shinozaki K: Functional annotation of a full-length Arabidopsis cDNA collection. Science 2002, 296:141-145.
5. Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S: A draft sequence of the rice genome (Oryza sativa L. spp. japonica). Science 2002, 296:92-100.
6. Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H: A draft sequence of the rice genome (Oryza sativa L. spp. indica). Science 2002, 296:79-92.
7. Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashidume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y: Collection, mapping and annotation of over 28,000 cDNA clones from japonica rice. Science 2003, 301:376-379.
8. Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, Davis J, Degroeve S, Déjardin A, dePamphilis C, Detter J, Dirks B, Dubchak I, Duplessis S, Ehlting J, Ellis B, Gendler K, Goodstein D, Gribskov M, Grimwood J, Groover A, Gunter L, Hamberger B, Heinze B, Helariutta Y, Henrissat B, Holligan D, Holt R, Huang W, Islam-Faridi N, Jones S, Jones-Rhoades M, Jorgensen R, Joshi C, Kangasjärvi J, Karlsson J, Kelleher C, Kirkpatrick R, Kirst M, Kohler A, Kalluri U, Larimer F, Leebens-Mack J, Leplé JC, Locascio P, Lou Y, Lucas S, Martin F, Montanini B, Napoli C, Nelson DR, Nelson C, Nieminen K, Nilsson O, Pereda V, Peter G, Philippe R, Pilate G, Poliakov A, Razumovskaya J, Richardson P, Rinaldi C, Ritland K, Rouzé P, Ryaboy D, Schmutz J, Schrader J, Segerman B, Shin H, Siddiqui A, Sterky F, Terry A, Tsai CJ, Uberbacher E, Unneberg P, Vahala J, Wall K, Wessler S, Yang G, Yin T, Douglas C, Marra M, Sandberg G, Van de Peer Y, Rokhsar D: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596-1604.
9. Ralph SG, Chun HJE, Cooper D, Kirkpatrick R, Kolosova N, Gunter L, Tuskan GA, Douglas CJ, Holt RA, Jones SJM, Marra MA, Bohlmann J: Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding. BMC Genomics 2008, 9:57.
10. Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, Felice N, Paillard S, Juman I, Moroldo M, Scalabrin S, Canaguier A, Le Clainche I, Malacrida G, Durand E, Pesole G, Laucou V, Chatelet P, Merdinoglu D, Delledonne M, Pezzotti M, Lecharny A, Scarpelli C, Artiguenave F, Pè ME, Valle G, Morgante M, Caboche M, Adam-Blondon AF, Weissenbach J, Quétier F, Wincker P: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 2007, 449:463-467.
11. Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, Tanahashi T, Sakakibara K, Fujita T, Oishi K, Shin-I T, Kuroki Y, Toyoda A, Suzuki Y, Hashimoto S, Yamaguchi K, Sugano S, Kohara Y, Fujiyama A, Anterola A, Aoki S, Ashton N, Barbazuk WB, Barker E, Bennetzen JL, Blankenship R, Cho SH, Dutcher SK, Estelle M, Fawcett JA, Gundlach H, Hanada K, Heyl A, Hicks KA, Hughes J, Lohr M, Mayer K, Melkozernov A, Murata T, Nelson DR, Pils B, Prigge M, Reiss B, Renner T, Rombauts S, Rushton PJ, Sanderfoot A, Schween G, Shiu SH, Stueber K, Theodoulou FL, Tu H, Van de Peer Y, Verrier PJ, Waters E, Wood A, Yang L, Cove D, Cuming AC, Hasebe M, Lucas S, Mishler BD, Reski R, Grigoriev IV, Quatrano RS, Boore JL: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 2008, 319:64-69.
12. Adams MD, Soares MB, Kerlavage AR, Fields C, Venter JC: Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nat Genet 1993, 4:373-380.
13. Soares MB, Bonaldo MF, Jelene P, Su L, Lawton L, Efstratiadis A: Construction and characterization of a normalized cDNA library. Proc Natl Acad Sci USA 1994, 91:9228-9232.
14. Bonaldo MF, Lennon G, Soares MB: Normalization and subtraction: Two approaches to facilitate gene discovery. Genome Res 1996, 6:791-806.
15. Allona I, Quinn M, Shoop E, Swope K, St Cyr S, Carlis J, Riedl J, Retzel E, Campbell MM, Sederoff R, Whetten RW: Analysis of xylem formation in pine by cDNA sequencing. Proc Natl Acad Sci USA 1998, 95:9693-9698.
16. Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci USA 2003, 100:7383-7388.
17. Lorenz WW, Sun F, Liang C, Kolychev D, Wang H, Zhao X, Cordonnier-Pratt MM, Pratt LH, Dean JFD: Water stress-responsive genes in loblolly pine (Pinus taeda) roots identified by analyses of expressed sequence tag libraries. Tree Physiol 2006, 26:1-16.
18. Cairney J, Zheng L, Cowels A, Hsiao J, Zismann V, Liu J, Ouyang S, Thibaud-Nissen F, Hamilton J, Childs K, Pullman GS, Zhang Y, Oh T, Buell CR: Expressed sequence tags from loblolly pine embryos reveal similarities with angiosperm embryogenesis. Plant Mol Biol 2006, 62:485-501.
19. dbEST database summary [http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html]
20. Pavy N, Paule C, Parsons L, Crow JA, Morency MJ, Cooke J, Johnson JE, Noumen E, Guillet-Claude C, Butterfield Y, Barber S, Yang G, Liu J, Stott J, Kirkpatrick R, Siddiqui A, Holt R, Marra M, Seguin A, Retzel E, Bousquet J, MacKay J: Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics 2005, 6:144.
21. Ralph SG, Yueh H, Friedmann M, Aeschliman D, Zeznik JA, Nelson CC, Butterfield YSN, Kirkpatrick R, Liu J, Jones SJM, Marra MA, Douglas CJ, Ritland K, Bohlmann J: Conifer defense against insects: microarray gene expression profiling of Sitka spruce (Picea sitchensis) induced by mechanical wounding or feeding by spruce budworms (Choristoneura occidentalis) or white pine weevils (Pissodes strobi) reveals large-scale changes of the host transcriptome. Plant Cell & Environ 2006, 29:1545-1570.
22. 
Koutaniemi S, Warinowski T, Kärkönen A, Alatalo E, Fossdal CG,Saranpää P, Laakso T, Fagerstedt KV, Simola LK, Paulin L, Rudd S,Teeri TH: Expression profiling of the lignin biosynthetic path-way in Norway spruce using EST sequencing and real-timeRT-PCR.  Plant Mol Biol 2007, 65:311-328.23. Yakovlev IA, Fossdal CG, Johnsen Ø, Junttila O, Skrøppa T: Analysisof gene expression during bud burst initiation in Norwayspruce via ESTs from subtracted cDNA libraries.  Tree Gen &Genomes 2006, 2:39-52.24. Brenner ED, Stevenson DW, McCombie RW, Katari MS, Rudd SA,Mayer KFX, Palenchar PM, Runko SJ, Twigg RW, Dai G, MartienssenRA, Benfey PN, Coruzzi GM: Expressed sequence tag analysis inCycas, the most primitive living seed plant.  Genome Biol 2003,4:R78.25. Brenner ED, Katari MS, Stevenson DW, Rudd SA, Douglas AW, MossWN, Twigg RW, Runko SJ, Stellari GM, McCombie WR, Coruzzi GM:EST analysis in Ginkgo biloba: an assessment of conserveddevelopmental regulators and gymnosperm specific genes.BMC Genomics 2005, 6:143.26. Jennewein S, Wildung MR, Chau M, Walker K, Croteau R: Randomsequencing of an induced Taxus cell cDNA library for identi-fication of clones involved in Taxol biosynthesis.  Proc Natl AcadSci USA 2004, 101:9149-9154.27. Ujino-Ihara T, Yoshimura K, Ugawa Y, Yoshimaru H, Nagasaka K,Tsumura Y: Expression analysis of ESTs derived from theinner bark of Cryptomeria japonica.  Plant Mol Biol 2000,43:451-457.28. Ujino-Ihara T, Kanamori H, Yamane H, Taguchi Y, Namiki N, MukaiY, Yoshimura K, Tsumura Y: Comparative analysis of expressedsequence tags of conifers and angiosperms revealssequences specifically conserved in conifers.  Plant Mol Biol2005, 59:895-907.29. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI,Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O:Improving the Arabidopsis genome annotation using maxi-mal transcript alignment assemblies.  Nucleic Acids Res 2003,31:5654-5666.30. 
Castelli V, Aury JM, Jaillon O, Wincker P, Clepet C, Menard M, Cru-aud C, Quétier F, Scarpelli C, Schächter V, Temple G, Caboche M,approach to evaluate and improve Arabidopsis genome anno-tation.  Genome Res 2004, 14:406-413.31. Alexandrov NN, Troukhan ME, Brover VV, Tatarinova T, Flavell RB,Feldmann KA: Features of Arabidopsis genes and genome dis-covered using full-length cDNAs.  Plant Mol Biol 2006, 60:69-85.32. Martin DM, Fäldt J, Bohlmann J: Functional characterization ofnine Norway spruce TPS genes and evolution of gymno-sperm terpene synthases of the TPS-d subfamily.  Plant Physiol2004, 135:1908-1927.33. Ro DK, Arimura G, Lau SYW, Piers E, Bohlmann J: Loblolly pineabietadienol/abietadienal oxidase PtAO (CYP720B1) is amultifunctional, multisubstrate cytochrome P450 monooxy-genase.  Proc Natl Acad Sci USA 2005, 102:8060-8065.34. Lippert D, Zhuang J, Ralph S, Ellis DE, Gilbert M, Olafson R, Ritland K,Ellis B, Douglas CJ, Bohlmann J: Proteome analysis of earlysomatic embryogenesis in Picea glauca.  Proteomics 2005,5:461-473.35. Lippert D, Chowrira S, Ralph SG, Zhuang J, Aeschliman D, Ritland C,Ritland K, Bohlmann J: Conifer defense against insects: pro-teome analysis of Sitka spruce (Picea sitchensis) bark inducedby mechanical wounding or feeding by white pine weevils(Pissodes strobi).  Proteomics 2007, 7:248-270.36. Treenomix research program   [http://www.treenomix.ca]37. Keeling CI, Bohlmann J: Diterpene resin acids in conifers.  Phyto-chemistry 2006, 67:2415-2423.38. Keeling CI, Bohlmann J: Genes, enzymes and chemicals of ter-penoid diversity in the constitutive and induced defence ofconifers against insects and pathogens.  New Phytol 2006,170:657-675.39. Ritland K, Ralph S, Lippert D, Rungis D, Bohlmann J: New directionsin conifer genomics.  In Landscapes, Genomics and Transgenic ConiferForests Edited by: Williams C. New York: Springer Press; 2006:75-84. 40. Bohlmann J: Insect-induced terpenoid defenses in spruce.  
InInduced Plant Resistance to Herbivory Edited by: Schaller A. Springer Sci-ence; 2008:173-187. 41. Bousquet J, Isabel N, Pelgas B, Cottrell J, Rungis D, Ritland K: Spruce.In Genome Mapping and Molecular Breeding in Plants Volume 7. Editedby: Kole C. Springer-Verlag, Heidelberg; 2007:93-114. 42. Holliday JA, Ralph SG, White R, Bohlmann J, Aitken SN: Globalmonitoring of autumn gene expression within and amongphenotypically divergent populations of Sitka spruce (Piceasitchensis).  New Phytol 2008, 178:103-122.43. Bohlmann J, Keeling CI: Terpenoid biomaterials.  Plant J 2008,54:656-669.44. Huang X, Madan A: CAP3: a DNA sequence assembly program.Genome Res 1999, 9:868-877.45. Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M,Kamiya M, Shibata K, Sasaki N, Izawa M, Muramatsu M, Hayashizaki Y,Schneider C: High-efficiency full-length cDNA cloning by bioti-nylated CAP trapper.  Genomics 1996, 37:327-336.46. Friedmann M, Ralph SG, Aeschliman D, Zhuang J, Ritland K, Ellis BE,Bohlmann J, Douglas CJ: Microarray gene expression profiling ofdevelopmental transitions in Sitka spruce (Picea sitchensis)apical shoots.  J Exp Bot 2007, 58:593-614.47. Bérubé Y, Zhuang J, Rungis D, Ralph S, Bohlmann J, Ritland K: Char-acterization of EST-SSRs in loblolly pine and spruce.  Tree Gen& Genomes 2007, 3:251-259.48. Rungis D, Bérubé Y, Zhang J, Ralph S, Ritland CE, Ellis BE, Douglas C,Bohlmann J, Ritland K: Robust simple sequence repeat markersfor spruce (Picea spp.) from expressed sequence tags.  TheorAppl Genet 2004, 109:1283-1294.49. Miller B, Madilao LL, Ralph S, Bohlmann J: Insect-induced coniferdefense. White pine weevil and methyl jasmonate inducetraumatic resinosis, de novo formed volatile emissions, andaccumulation of terpenoid synthase and putative octadeca-noid pathway transcripts in Sitka spruce.  Plant Physiol 2005,137:369-382.50. 
Hudgins JW, Ralph SG, Franceschi VR, Bohlmann J: Ethylene ininduced conifer defense: cDNA cloning, protein expression,and cellular and subcellular localization of 1-aminocyclopro-pane-1-carboxylate oxidase in resin duct and phenolic paren-chyma cells.  Planta 2006, 224:865-877.51. Ralph SG, Hudgins JW, Jancsik S, Franceschi VR, Bohlmann J: Amino-cyclopropane carboxylic acid synthase is a regulated step inPage 16 of 17(page number not for citation purposes)Weissenbach J, Salanoubat M: Whole genome sequence compar-isons and "full-length" cDNA sequences: a combinedethylene-dependent induced conifer defense. Full-lengthcDNA cloning of a multigene family, differential constitutive,Publish with BioMed Central   and  every scientist can read your work free of charge"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."Sir Paul Nurse, Cancer Research UKYour research papers will be:available free of charge to the entire biomedical communitypeer reviewed and published immediately upon acceptancecited in PubMed and archived on PubMed Central BMC Genomics 2008, 9:484 http://www.biomedcentral.com/1471-2164/9/484and wound- and insect-induced expression, and cellular andsubcellular localization in spruce and Douglas fir.  Plant Physiol2007, 143:410-424.52. Ralph S, Park JY, Bohlmann J, Mansfield SD: Dirigent proteins inconifer defense: gene discovery, phylogeny and differentialwound- and insect-induced expression of a family of DIR andDIR-like genes in spruce (Picea spp.).  Plant Mol Biol 2006,60:21-40.53. Ralph SG, Jancsik S, Bohlmann J: Dirigent proteins in coniferdefense II: Extended gene discovery, phylogeny, and consti-tutive and stress-induced gene expression in spruce (Piceaspp.).  Phytochemistry 2007, 68:1975-1991.54. Hamberger B, Bohlmann J: Cytochrome P450 mono-oxygenasesin conifer genomes: discovery of members of the terpenoidoxygenase superfamily in spruce and pine.  
Biochem Soc Trans2006, 34:1209-1214.55. Keeling CI, Weisshaar S, Lin RPC, Bohlmann J: Functional plasticityof paralogous diterpene synthases involved in coniferdefense.  Proc Natl Acad Sci USA 2008, 105:1085-1090.56. Xu M, Wilderman PR, Peters RJ: Following evolution's lead to asingle residue switch for diterpene synthase product out-come.  Proc Natl Acad Sci USA 2007, 104:7397-7401.57. Carninci P, Hayashizaki Y: High-efficiency full-length cDNAcloning.  Methods Enzymol 1999, 303:19-44.58. Carninci P, Shibata Y, Hayatsu N, Sugahara Y, Shibata K, Itoh M,Konno H, Okazaki Y, Muramatsu M, Hayashizaki Y: Normalizationand subtraction of cap-trapper-selected cDNAs to preparefull-length cDNA libraries for rapid discovery of new genes.Genome Res 2000, 10:1617-1630.59. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automatedsequencer traces using phred. I. Accuracy assessment.Genome Res 1998, 8:175-185.60. Ewing B, Green P: Base-calling of automated sequencer tracesusing phred II. Error probabilities.  Genome Res 1998, 8:186-194.61. Laboratory of Dr. Phil Green: software resources   [http://phrap.org]62. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic localalignment search tool.  J Mol Biol 1990, 215:403-410.63. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lip-man DJ: Gapped BLAST and PSI-BLAST: A new generation ofprotein database search programs.  Nucleic Acids Res 1997,25:3389-3402.64. Gordon D, Abajian C, Green P: Consed: a graphical tool forsequence finishing.  Genome Res 1998, 8:195-202.65. The Arabidopsis Information Resource   [http://www.arabidopsis.org/]66. The Populus trichocarpa genome sequence   [http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html]67. National Center for Biotechnology Information   [http://www.ncbi.nlm.nih.gov/]68. The Physcomitrella patens genome sequence   [http://genome.jgi-psf.org/Phypa1_1/Phypa1_1.home.html]69. Rice P, Longden I, Bleasby A: EMBOSS: the European MolecularBiology Open Software Suite.  
Trends Genet 2000, 16:276-277.70. EMBOSS   [http://emboss.sourceforge.net/]71. SwisProt database   [http://www.ebi.ac.uk/swissprot]yours — you keep the copyrightSubmit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.aspBioMedcentralPage 17 of 17(page number not for citation purposes)
