UBC Faculty Research and Publications

Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones…
Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg
Aug 6, 2009

Full Text

BioMed Central · BMC Plant Biology · Open Access · Research article

Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

Björn Hamberger¹, Dawn Hall¹, Mack Yuen¹, Claire Oddy², Britta Hamberger¹, Christopher I Keeling¹, Carol Ritland², Kermit Ritland² and Jörg Bohlmann*¹˒²

Address: ¹Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, B.C., V6T 1Z4, Canada and ²Department of Forest Sciences, University of British Columbia, Vancouver, B.C., V6T 1Z4, Canada

Email: Björn Hamberger - bjoernh@interchange.ubc.ca; Dawn Hall - dehall74@interchange.ubc.ca; Mack Yuen - mack@bioinformatics.ubc.ca; Claire Oddy - coddy@interchange.ubc.ca; Britta Hamberger - brittah@interchange.ubc.ca; Christopher I Keeling - ckeeling@mac.com; Carol Ritland - critland@interchange.ubc.ca; Kermit Ritland - kritland@interchange.ubc.ca; Jörg Bohlmann* - bohlmann@interchange.ubc.ca

* Corresponding author

Abstract

Background: Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members.
The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence from a conifer for the first time.

Results: We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4), from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4). Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing.

Conclusion: We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.

Published: 6 August 2009
BMC Plant Biology 2009, 9:106 doi:10.1186/1471-2229-9-106
Received: 30 April 2009
Accepted: 6 August 2009
This article is available from: http://www.biomedcentral.com/1471-2229/9/106
© 2009 Hamberger et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Conifers (Coniferales) are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. The conifers include the economically and ecologically important species of spruce (Picea) and pine (Pinus), which dominate many of the world's natural and planted forests [1]. The development of genomic resources for conifers has focused on the discovery and characterization of expressed genes in the form of expressed sequence tags (ESTs) and full-length (FL)cDNAs. The available conifer cDNA sequence resources are extensive (1,158,419 ESTs as of December 3, 2008), representing almost 9% of all ESTs in the plant genome database (http://plantgdb.org/, http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html). The EST and FLcDNA resources developed for white spruce (Picea glauca), Sitka spruce (P. sitchensis), and a hybrid white spruce (P. glauca × P. engelmannii) [2,3] have enabled transcriptome profiling [1,4-6], proteome analysis [7-9], marker development [10-13], and the functional characterization of gene products [14-16]. These functional genomics studies have provided considerable insights into conifer defence against insects and pathogens, adaptation to the environment, and development [1,4].

Beyond the characterization of cDNAs and their encoded proteins, the lack of a gymnosperm reference genome sequence limits our knowledge of the organization, structure and gene space of conifer genomes.
Sequencing a conifer genome has not yet been attempted and will remain a daunting task, given that conifer genomes range in size from 20 to 40 Gbp, which is 200- to 400-fold larger than the genome of Arabidopsis and larger than any other genome sequenced to date. The sequencing of a conifer genome may also be challenging due to a very high content of repetitive DNA [17] and the tendency of conifers to out-cross, which prevents the development of inbred strains. An important step in assessing the feasibility of conifer genome sequencing will be the isolation, in random or targeted fashion, of genomic (g)DNA in the form of BAC clones, followed by the sequencing and assembly of large segments of gDNA. However, to the best of our knowledge, sequencing of a complete BAC clone or any large segment of nuclear gDNA has not yet been reported in the literature for a conifer or any other gymnosperm species. Recently, a loblolly pine (Pinus taeda) gDNA BAC library was used to assess the contribution of a novel pine-specific retrotransposon family (Gymny) to conifer genome size [18].

Unlike angiosperms, conifers are not thought to have undergone recent genome duplication events [17,19]. However, two features of conifer genomes pose untested challenges for the targeted isolation and sequence assembly of BACs containing genes of interest involved in conifer defence. First, many conifer defence genes exist as closely related members of large families. For example, genes encoding the oleoresin-producing terpenoid synthases (TPSs) [14,15], cytochrome P450 monooxygenases (P450s) involved in diterpene resin acid formation (CYP720B) [20,21], TIR-NBS-LRR disease resistance proteins [22], pathogenesis-related (PR)-10 proteins [23], and dirigent proteins [24,25] are members of such multigene families. Against the background of large gene families, it may be difficult to isolate BACs for a specific target gene. Second, the abundance of transposable elements (TEs), specifically those of the Copia and Gypsy classes, which in situ hybridizations have shown to form diverse families of retroelements across conifer chromosomes [26,27], may cause additional problems for genome sequence assembly.

In this paper we report a successful strategy for the targeted identification and isolation of BACs containing TPS and P450 genes using PCR-based screening of a non-arrayed white spruce BAC library of 3X genome coverage, followed by gDNA insert sequencing, sequence assembly, and sequence characterization. When extended to other conifers, our strategy will enable a comparative analysis of synteny of specific target regions of conifer genomes.

Results

Targeted isolation of BAC clones containing TPS (3CAR) and P450 (CYP720B4) genes

Our first objective was to test whether individual BAC clones containing conifer genes of large gene families could be isolated in a gene-specific manner. A white spruce (genotype PG29) gDNA BAC library of approximately 3X genome coverage was constructed, aliquoted into pools in ten 96-well plates, and screened in a hierarchical fashion by PCR as described previously [28]. The primers used to screen pooled BAC clones for a specific TPS gene were based on the functionally characterized Norway spruce (Picea abies) and Sitka spruce 3-carene synthase FLcDNAs (3CAR) [[29], D. Hall, J. Robert, C.I. Keeling, J. Bohlmann, unpublished results]. Primers used to screen for a specific target P450 gene were based on the functionally characterized diterpene oxidase CYP720B4 from Sitka spruce and its white spruce orthologue [B. Hamberger, T. Ohnishi, J. Bohlmann, unpublished results]. The function of the spruce CYP720B4 gene in diterpene resin acid formation is similar to that of loblolly pine CYP720B1 [20,21]. Primers used for gene-specific screening for TPS (3CAR)- or P450 (CYP720B4)-containing BAC clones were assessed in silico against other known members of the large conifer TPS-d family [15] and other members of the conifer-specific CYP720B family [20], respectively, to minimize the chance of isolating non-target members of these gene families.

From a total of 960 BAC pools (ten 96-well plates), which were initially screened as 200 super-pools (20 super-pools per 96-well plate), we identified 23 and 18 pools that yielded PCR products with the 3CAR and CYP720B4 primers, respectively. The 23 independent PCR products obtained with 3CAR primers represented four unique 3CAR-like sequences with at least 95% identity (in the open reading frame) amongst each other and to the Sitka spruce 3CAR FLcDNA Q09 (see Additional file 1). We also sequenced five independent PCR products obtained by screening the BAC pools with CYP720B4 primers. All five sequences were 100% identical to the target CYP720B4 sequence. For each of the two target genes, a single individual BAC clone was isolated and verified by sequencing the PCR product; the gDNA inserts were then excised and their size was estimated from their mobility in pulsed field gel electrophoresis (PFGE). The BAC clone PGB02 (3CAR) contained a gDNA insert of approximately 185 kbp, and BAC clone PGB04 (CYP720B4) contained an insert of approximately 110 kbp. These gDNA inserts were sheared into fragments of 700–2000 bp and shotgun-subcloned into plasmid libraries for sequencing.

Automated sequence assemblies of PGB02 and PGB04

The shotgun plasmid libraries for PGB02 and PGB04 were arrayed in 384-well plates.
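Fold coverage (sequencing depth), as quoted for PGB02 (15.6×) and PGB04 (16.0×), is conventionally the total number of high-quality sequenced bases divided by the length of the target. The minimal Python sketch below shows that arithmetic together with the idealized Lander–Waterman (Poisson) estimate 1 − e^(−c) of the fraction of the target covered by at least one read at depth c. Lander–Waterman is a textbook model swapped in here for illustration only; it is not the method the authors used for their in silico simulation, and it ignores repeats, chimeric clones and uneven read placement. The function names and the example read numbers are hypothetical.

```python
import math


def sequencing_depth(n_reads: int, mean_read_len: float, target_len: int) -> float:
    """Fold coverage: total sequenced bases divided by the target length (bp)."""
    if target_len <= 0:
        raise ValueError("target length must be positive")
    return n_reads * mean_read_len / target_len


def expected_fraction_covered(depth: float) -> float:
    """Idealized Lander-Waterman (Poisson) estimate of the fraction of the
    target covered by at least one read at the given fold coverage."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical input: 3,000 reads of 500 bp over a 100 kbp BAC insert.
    print(f"depth: {sequencing_depth(3000, 500, 100_000):.1f}x")  # 15.0x
    # The PGB04 simulation added roughly 3.2x of coverage per 384-well plate.
    for plates in range(1, 6):
        c = 3.2 * plates
        print(f"{plates} plate(s): {c:.1f}x -> {expected_fraction_covered(c):.4f}")
```

Under this idealized model, 3.2× predicts about 96% of bases covered and 6.4× over 99.8%; the assembled fractions reported for PGB04 (93.0% with one plate, over 98% with two) are lower, which is consistent with the assembly problems the authors describe for these clones.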
Plasmid inserts from ten and five 384-well plates were Sanger-sequenced for PGB02 and PGB04, respectively, resulting in 6,954 and 3,677 paired sequence reads (see Additional file 2). The average plasmid insert length was 1,102 bp for the PGB02 library and 1,056 bp for the PGB04 library. Sequences were scanned and masked for vector sequences and contaminating bacterial sequences, eliminating 21.4% (PGB02) and 27.9% (PGB04) of the total sequences. Using PHRAP, we assembled the sequences into 15 contigs for PGB02 and 14 contigs for PGB04. For PGB02, the two largest contigs assembled in this automated fashion covered a total length of 172,403 bp (91.2% of the sequence reads); the three largest contigs for PGB04 covered over 93,905 bp (94.4% of the sequence reads) (see Additional file 3).

Manual curation of the sequence assemblies of PGB02 and PGB04

To improve the assembly of PGB02 and PGB04, we inspected each contig generated with the PHRAP software. We found that some plasmid insert sequences included chimeric sequences, resulting from the ligation of independent gDNA fragments during the production of the shotgun plasmid libraries; together with low-quality sequences and low-complexity repeats, these prevented automated assembly into continuous sequence. In addition, we manually aligned shorter contigs with low sequence representation to the larger contigs. The left and right arms of the pIndigoBAC-5 vector, which were subcloned together with the gDNA inserts into the plasmid shotgun libraries, provided orientation for the scaffolds of PGB02 and PGB04 (Figure 1).

The final assembly of PGB02 contained two contigs separated by a short gap without sequence coverage (approximately 25–50 bp, based on PCR amplification of the gap region). The gap is flanked by long stretches of low-complexity repeat sequence. It is likely that the sequence gap resulted from physical repeat structures (e.g., hairpins) which interfered with sequencing this region. Manual curation resulted in a single complete contig for the PGB04 gDNA. In PGB04, two high-complexity repeats and several low-complexity repeats extend for over 1 kbp on either side of a region of approximately 200 bp with low sequence coverage (transition) (Figure 1).

In summary, the combined automated and manual sequence assemblies resulted in two contigs for PGB02, with a combined sequence length of 172,056 bp and 15.6× sequence coverage, and in a single contig for PGB04, with a sequence length of 93,592 bp and 16.0× sequence coverage. The sizes of the assembled sequence contigs for PGB02 and PGB04 agree well with the sizes of the BAC inserts as estimated by PFGE (185 kbp and 110 kbp, respectively).

In silico analysis of the effect of sequencing depth on assembly quality

Using the high sequence coverage (16×) and high-quality manually curated sequence assembly (93,592 bp) for PGB04, we analyzed the effect of plasmid shotgun library sequencing depth on the quality of the automated assembly. This assessment can guide cost-effective sequencing of BAC clones for future efforts of conifer genome sequencing. The sequences obtained from the plasmids of five 384-well plates for PGB04 were assembled into independent builds in all permutations of two, three, four or five plates (see Additional files 4 and 5). With sequences obtained from one plate, an average coverage of 3.2× was obtained, and the number of nucleotides assembled into contigs (average contig number of 22.2) was less than 90 kbp (representing 93.0% coverage). By assembling sequences from two plates, the coverage doubled to an average of 6.4×, the number of contigs (average 9.9) was reduced, the assembly included over 95 kbp in contigs, and the full-length scaffold had over 98% coverage relative to the reference PGB04 assembly. When sequences from three, four or five plates were used in the assembly, coverage increased to 9.6×, 12.8× and 16×, respectively, with a further increase in the number of nucleotides assembled. The assembly of sequences from three, four or five plates also resulted in an increase in the number of contigs. Even with five plates, the coverage obtained by automated assembly never reached 100% relative to the PGB04 reference assembly, which involved manual curation.

Gene content of PGB02 and PGB04

Results from the overall sequence analyses of the BAC clones PGB02 and PGB04, visualised using gbrowse, are available online at http://treenomix3.msl.ubc.ca/cgi-bin/gbrowse/PGB02/ and http://treenomix3.msl.ubc.ca/cgi-bin/gbrowse/PGB04/ (username: treenomix; password: conifer). These descriptions include BLAST annotations (against NCBI NR, MIPS coniferales repeats, spruce ESTs), GC content and gene predictions [Genemark Prediction (Eukaryotic HMM), FGENESH Prediction, Genescan Prediction]. PGB02 and PGB04 each contained a single functional gene identified by BLAST searches, matching the target genes 3CAR (PGB02) and CYP720B4 (PGB04) (Figure 1). Relative to the complete gDNA sequence length of PGB02 and PGB04, the gene density of a single gene per 172 kbp and 94 kbp, respectively, is at least 10-fold lower than the overall gene density of the sequenced genomes of Arabidopsis, rice, poplar and grapevine (Table 1). The GC content (37%) of the two white spruce gDNAs was lower than the GC content of the rice genome (43.6%) and higher than those of the Arabidopsis (36%), poplar (33.7%), and grapevine (34.6%) genomes (Table 1) [30-33].

Figure 1. Structure of white spruce genomic DNA of BAC clones PGB02 and PGB04. The position of the target genes 3CAR and CYP720B4 is indicated. Red and yellow bars represent repeated segments and segments with similarity to DNA transposons, respectively. Transposable elements were identified with RepeatMasker using the viridiplantae section of the RepBase Update database. EcIS10, E. coli individual insertion sequence (IS) of the bacterial transposon Tn10; CSRE, conifer specific repeat element; LB/RB, left and right border of pINDIGO; arrows in PGB04 indicate local putative segment duplications. The scale bar represents 10 kbp. (p), pseudogene, based on the accumulation of deleterious mutations and the absence of transcripts with >90% identity.
[Figure 1 image not reproduced: annotated maps of PGB02 and PGB04 showing the LB/RB vector borders, 3CAR and CYP720B4, the PGB02 gap, the PGB04 transition region, EcIS10, CSRE (p), and LTR Copia (p), LTR Gypsy (p) and LINE (p) segments; scale bar 10 kb.]

Table 1: General features of the gDNA sequences of the white spruce BAC clones PGB02 and PGB04 as compared to the genome sequence features of Arabidopsis, rice, poplar and grapevine.

Genome | Size (Mbp) | Predicted genes | Avg gene length (bp) | Gene density (kbp per gene) | % TE | GC content (%)
Arabidopsis thaliana¹ | 115 | 25,498 | 1,992 | 4.5 | 14.0 | 36.0
Oryza sativa² | 389 | 37,544 | 2,699 | 9.9 | 34.8 | 43.6
Populus trichocarpa³ | 485 | 45,555 | 2,392 | 10.6 | 42.0 | 33.7
Vitis vinifera⁴ | 487 | 30,434 | 3,399 | 16.0 | 41.4 | 34.6
PGB02⁵ | 0.172 | 1 | 3,138 | 172 | 36.0 | 38.0
PGB04⁵ | 0.094 | 1 | 3,131 | 93.6 | 41.6 | 37.0

¹⁻⁴ [30-33]
⁵ BAC insert size

Analyses of the gDNA sequences for 3CAR and CYP720B4

The genomic region of the 3CAR gene on PGB02 covers 3,541 bp, including a 198 bp 5'-UTR and a 205 bp 3'-UTR, which are part of the corresponding transcript isolated from cDNA (Figure 2A).
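For 3CAR, subtracting the two UTRs from the 3,541 bp genomic region gives 3,541 − 198 − 205 = 3,138 bp between the start and stop codons, the gene length listed for PGB02 in Table 1. The intron fraction and the exon/intron GC contents reported below follow from exon coordinates; the sketch assumes 0-based, half-open exon intervals spanning the start to the stop codon. The function names and the toy sequence are hypothetical and do not represent the authors' pipeline.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def split_exons_introns(seq: str, exons: list[tuple[int, int]]) -> tuple[str, str]:
    """Concatenate exon and intron sequences. Exons are sorted, non-overlapping,
    0-based half-open intervals; introns are the gaps between consecutive exons."""
    exon_seq = "".join(seq[start:end] for start, end in exons)
    intron_seq = "".join(seq[prev_end:next_start]
                         for (_, prev_end), (next_start, _) in zip(exons, exons[1:]))
    return exon_seq, intron_seq


def gene_summary(seq: str, exons: list[tuple[int, int]]) -> dict[str, float]:
    """Start-to-stop span, intron fraction and exon/intron GC content."""
    exon_seq, intron_seq = split_exons_introns(seq, exons)
    span = exons[-1][1] - exons[0][0]
    return {
        "span_bp": span,
        "intron_fraction": len(intron_seq) / span,
        "gc_exons": gc_fraction(exon_seq),
        "gc_introns": gc_fraction(intron_seq),
    }


if __name__ == "__main__":
    toy = "ATGGCC" + "TTAATT" + "GGCTAA"  # exon 1 + intron + exon 2 (hypothetical)
    print(gene_summary(toy, [(0, 6), (12, 18)]))
```

Applied to real exon coordinates, the same calculation would yield the percentages quoted for 3CAR (35.4% intronic) and CYP720B4 (50% intronic).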
The gene contains ten exons and nine introns, with intronic regions accounting for 35.4% of the gene sequence between the start and stop codons of this TPS gene. The genomic region of the CYP720B4 gene on PGB04 covers 3,131 bp over nine exons (1,452 bp) and eight introns, and includes transcribed 5'- and 3'-UTRs of 38 bp and 134 bp, respectively (Figure 2B). The intronic region covers 50% of the gene sequence between the start and stop codons. The introns of 3CAR and CYP720B4 have much lower GC content than the exons (% GC content exons/introns: 3CAR, 42.3/27.8; CYP720B4, 41.4/25.5).

Analyses of upstream promoter regions of 3CAR and CYP720B4

Our analysis of upstream sequences for cis-regulatory elements covered 3,793 bp upstream of the ATG start codon for 3CAR and 2,500 bp upstream of the ATG start codon for CYP720B4. Putative cis-regulatory elements were identified by a similarity search of the PlantCARE database [34]. The region upstream of the ATG in 3CAR is unique until -3,973 bp, which marks the location of a DNA transposon (Figure 1). In contrast, only the region from -1 bp to -749 bp upstream of the start codon of CYP720B4 is unique, followed by repetitive sequence (Figures 1 and 2). Several promoter elements (TATA and CAAT boxes) were identified in the region immediately upstream of the start codon of the 3CAR and CYP720B4 genes (Figure 2).

Figure 2. Gene structure of white spruce 3CAR (A) and CYP720B4 (B) and comparison of 3CAR with the grand fir (Abies grandis) limonene synthase (LIM) and pinene synthase (PIN) genes (C). Exons of the 3CAR and CYP720B4 genes matching the cDNA sequences are shown with grey arrows separated by introns. The UTRs are shown with grey lines. ATG, start codon. Putative cis-acting elements were identified using the PlantCARE database and positions are highlighted in blue (not to scale): wun-box, wound-responsive element (Brassica oleracea); W-box, fungal elicitor responsive element (Petroselinum crispum); TCRR, TC-rich repeats, cis-acting element involved in defence and stress responsiveness (Nicotiana tabacum); TCA, cis-acting element involved in salicylic acid responsiveness (Brassica oleracea); TGACG, cis-acting regulatory element involved in MeJA responsiveness (Hordeum vulgare). LIM, AF326518; PIN, AF326517; roman numerals in part C indicate conserved exons in 3CAR, LIM and PIN; the scale bar represents 1 kbp.
[Figure 2 image not reproduced: panels A (3CAR, 7.33 kb) and B (CYP720B4, 5.87 kb) mark the ATG and stop codons, TATA- and CAAT-boxes and the cis-elements listed above, and distinguish unique from repeated upstream sequence; panel C aligns exons I to X of 3CAR, LIM and PIN; scale bar 1 kb.]

Since the spruce TPS and CYP720B genes are involved in the biosynthesis of defence-related terpenoids induced by insects, pathogens, wounding or methyl jasmonate (MeJA) [21,35-38], we analysed the upstream genomic regions of 3CAR and CYP720B4 for putative cis-acting elements associated with plant defence responses (Figure 2). In CYP720B4, a conserved W-box motif (TTGACC), which in Arabidopsis interacts with members of the WRKY transcription factor family to mediate responses to wounding or pathogens [39], is located at position -1,129 relative to the ATG of CYP720B4 on PGB04. Another defence-related element (TGACG), involved in MeJA-responsive gene expression in barley (Hordeum vulgare) [40], is detected at -1,266 relative to the start codon of 3CAR and at -79 relative to the start codon of CYP720B4.
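Positions such as -1,129 (W-box) and -79 (TGACG) are offsets from the start codon. The sketch below shows one way to obtain such offsets with a plain exact-match scan of both strands. It is not PlantCARE's similarity search, and its position convention (the leftmost base of the match, with the base immediately upstream of the ATG at -1) is an assumption; the function names and the test sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (uppercase output)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_upstream(seq: str, motif: str, atg_index: int) -> list[tuple[int, str]]:
    """Exact matches of `motif` on either strand in seq[:atg_index].

    Each hit is (offset, strand), where offset is the position of the match's
    leftmost base relative to the A of the ATG (the base just upstream is -1).
    """
    upstream = seq[:atg_index].upper()
    forward = motif.upper()
    patterns = [("+", forward)]
    reverse = reverse_complement(forward)
    if reverse != forward:  # do not double-count palindromic motifs
        patterns.append(("-", reverse))
    hits = []
    for strand, pattern in patterns:
        start = upstream.find(pattern)
        while start != -1:
            hits.append((start - atg_index, strand))
            start = upstream.find(pattern, start + 1)  # overlapping hits allowed
    return sorted(hits)


if __name__ == "__main__":
    promoter = "CCTTGACCAACGTCAGG" + "ATG"  # hypothetical upstream sequence
    print(find_motif_upstream(promoter, "TTGACC", 17))  # W-box
    print(find_motif_upstream(promoter, "TGACG", 17))   # MeJA-responsive element
```

Short motifs such as TGACG are expected to occur by chance every few hundred base pairs, which is why hits of this kind remain putative until tested experimentally.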
The upstream regions of 3CAR and CYP720B4 also contain TCA-elements, at positions -815 and -3,291 in PGB02 and at positions -1,227, -676 and -1,162 (TCAGAAGAGG, GAGAAGAATA and CAGAAAAGGA) in PGB04, respectively. This element was first characterised as a cis-acting element involved in salicylic acid responsiveness and systemic acquired resistance in wild cabbage (Brassica oleracea) [41]. In addition, we identified several TC-rich repeats (ATTTTCTCCA) in the upstream regions of 3CAR (one on PGB02) and CYP720B4 (six on PGB04). These sequences were previously described in tobacco (Nicotiana tabacum) as cis-acting elements involved in defence and stress responsiveness [42].

The upstream regions of the 3CAR and CYP720B4 genes also include a large number of putative transcription factor binding sites (37 for 3CAR; 19 for CYP720B4) implicated in light responsiveness in several other plant species. Interestingly, the promoter sequence including the transcribed 5'-UTR of the 3CAR gene on PGB02 contains a unique and conserved repeated sequence of 44 bp (TCAGGTTCTGCCATTGCCTTTTTAGTTCATTATCTTGAGCTGCC), which occurs four times (with no more than two nucleotide changes) between -21 and -199 bp upstream of the start codon. Seventeen of the 44 bp in this repeated sequence have high levels (94–100%) of sequence identity to plant I-box transcription factor binding sites, which are involved in light responsiveness [43]. The actual role of this sequence in gene regulation is unknown. However, the prevalence of this sequence in the transcribed 5'-UTR of the 3CAR gene on PGB02, as well as in the 5'-UTRs of two white spruce 3CAR-like ESTs (GQ03804.B7_I10 and GQ03313.B7_P23) and one Sitka spruce 3CAR-like EST (WS02910_I02), makes this sequence a relevant target for future transcription factor binding site analysis.
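Finding the four copies of the 44 bp element "with no more than two nucleotide changes" amounts to an approximate string search. The minimal sketch below slides the element along a sequence and reports windows within a Hamming distance of two; it models substitutions only (not insertions or deletions) and is an illustration, not the authors' method. The function names and the test sequence are hypothetical.

```python
REPEAT_44 = "TCAGGTTCTGCCATTGCCTTTTTAGTTCATTATCTTGAGCTGCC"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approximate_occurrences(seq: str, pattern: str, max_mismatches: int = 2) -> list[int]:
    """Start positions of windows that differ from `pattern` by at most
    `max_mismatches` substitutions (indels are not modelled)."""
    seq, pattern = seq.upper(), pattern.upper()
    k = len(pattern)
    return [i for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], pattern) <= max_mismatches]


if __name__ == "__main__":
    variant = list(REPEAT_44)
    variant[0], variant[10] = "A", "G"  # a copy carrying two substitutions
    test_seq = "AAAA" + REPEAT_44 + "CCCC" + "".join(variant) + "GGGG"
    print(approximate_occurrences(test_seq, REPEAT_44))
```

For long sequences, an indexed aligner would be faster, but for a region of a few hundred base pairs upstream of a start codon this brute-force scan is adequate.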
In addition, several cis-acting elements previously identified in other plant species as involved in responses to gibberellin (GARE, TAACAGA; P-box, GCCTTTTGAGT), auxin (ARF, TGTCTC; TGA-element, AACGAC; AUX28, ATTTATATAAAT), ethylene (ERE, AWTTCAAA), and abiotic stresses (HSE, AAAAAATTTC; MBS, TAACTG; LTR, CCGAAA) were found in the upstream regions of 3CAR and CYP720B4.

Identification and distribution of high and low complexity repeats in PGB02 and PGB04

Since repeat regions may pose a particular challenge for genome sequence assembly in conifers, it is important to accurately detect and mask high and low complexity repeats. A comparison of the PGB02 and PGB04 sequences with the genome sequences of Arabidopsis, rice, poplar, and grapevine [30-33] identified 3.7% of PGB02 and 3.0% of PGB04 with similarity (E-value < 10⁻⁵) to repetitive regions found in these angiosperms (http://www.phytozome.net) (Table 2). Using RepeatMasker [44], we found that high complexity repeats contribute 21.9% and 17.6% of the sequence of PGB02 and PGB04, respectively (Table 2). We identified regions with similarity to RNA-based retroelements, predominantly Ty1/Copia and Gypsy/DIRS1 [long terminal repeat (LTR) element class], and a few segments of L1/CIN4 [long interspersed element (LINE) class] (Figure 1). In contrast to the large number of retroelement-based TEs, we found few regions (0.7% of the total sequence of PGB02 and PGB04) with similarity to DNA-based transposons (EnSpm, Helitron, MuDR and hAT).
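Percentages such as 21.9% and 0.7% of a sequence depend on counting each base once, even where repeat annotations overlap (for example, a LINE fragment nested inside an LTR element). The sketch below merges overlapping half-open intervals before dividing by the sequence length. It is a generic illustration, not how RepeatMasker itself reports masked fractions, and the function names and interval values are hypothetical.

```python
def merged_length(intervals: list[tuple[int, int]]) -> int:
    """Total bp covered by possibly overlapping half-open (start, end)
    intervals, counting each base only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_masked(intervals: list[tuple[int, int]], seq_len: int) -> float:
    """Percentage of a sequence of length seq_len covered by the intervals."""
    return 100.0 * merged_length(intervals) / seq_len


if __name__ == "__main__":
    # Hypothetical annotations on a 100 bp sequence; the first two overlap.
    annotations = [(0, 10), (5, 15), (20, 30)]
    print(percent_masked(annotations, 100))
```

Summing the raw annotation lengths instead would count bases 5 to 9 twice and overstate the repeat content.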
Although PGB02 and PGB04 represent only a small fraction of the spruce gDNA, the identification of these DNA-based TEs is important, as this is the first report of these elements in a gymnosperm. While LTR retrotransposons have been reported in spruce with a high copy number, it is not known whether members of the Ty1/Copia or Gypsy/DIRS1 families are active in spruce [27]. The presence of retrotransposons in the transcriptome and their sequence conservation indicate that they are active. A BLAST search of the repetitive regions of PGB02 and PGB04 against EST databases (plant genome database, http://plantgdb.org/) yielded significant hits with ESTs from white spruce, Sitka spruce, interior spruce and Norway spruce, as well as from pine species (Table 2).

Table 2: High complexity repeats in the white spruce gDNA of PGB02 and PGB04.

BAC | Repetitive sequences with similarities in angiosperms¹ | TEs detected with RepeatMasker² | Total repeat content³ | Similarity to ESTs⁴
PGB02 | 3.7% | 21.9% | 36.0% | 14.7%
PGB04 | 3.0% | 17.6% | 41.6% | 17.1%

¹ Portion of the white spruce gDNA sequences of PGB02 and PGB04 with similarity to repeat regions identified in the genomes of Arabidopsis, rice, poplar and grapevine (cut-off E-value < 10⁻⁵); this excludes the coding regions of 3CAR and CYP720B4.
² Percentage of PGB02 and PGB04 sequences consisting of TEs as detected by RepeatMasker using the viridiplantae section of RepBase Update.
³ Percentage of PGB02 and PGB04 sequences consisting of high complexity repeats as detected by pairwise comparisons of the two gDNA sequences.
⁴ Fraction of the PGB02 and PGB04 sequences with similarity (at least 80–90% nucleotide sequence identity) to white spruce ESTs; this excludes the coding regions of 3CAR and CYP720B4; no EST hits were detected outside repeat regions.
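Screens like the EST search summarised in Table 2 are typically run by filtering BLAST hits on E-value and percent identity. The sketch below parses BLAST+ tabular output (`-outfmt 6`, default 12 columns) and keeps hits passing both thresholds. The paper does not state which BLAST program or output format was used; the thresholds here simply reuse the Table 2 footnote values (E-value < 10⁻⁵, identity of at least 80%) for illustration, and the query and subject IDs in the test data are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text: str, max_evalue: float = 1e-5,
                min_pident: float = 80.0) -> list[tuple[str, str, float, float]]:
    """Return (query, subject, %identity, E-value) for hits with
    E-value below max_evalue and identity of at least min_pident."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        rec = dict(zip(FIELDS, row))
        evalue, pident = float(rec["evalue"]), float(rec["pident"])
        if evalue < max_evalue and pident >= min_pident:
            hits.append((rec["qseqid"], rec["sseqid"], pident, evalue))
    return hits


if __name__ == "__main__":
    sample = (
        "PGB02_rep1\tEST_A\t86.50\t300\t40\t1\t1\t300\t5\t304\t1e-50\t250\n"
        "PGB02_rep2\tEST_B\t95.00\t50\t2\t0\t1\t50\t1\t50\t0.01\t40\n"
        "PGB04_rep1\tEST_C\t70.00\t400\t120\t3\t1\t400\t1\t400\t1e-20\t100\n"
    )
    print(filter_hits(sample))
```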
Pairwise comparison of the gDNA sequences of PGB02 and PGB04 revealed substantial sequence conservation within the repeat regions (Table 2). All regions with similarity to TEs reside in large, often continuous sections with high homology (average identity 86% over up to 3,000 bp) on PGB02 and PGB04 (Figure 1).

Screening for homologous regions between and within PGB02 and PGB04 also identified several previously undetected repeated elements, one of which represents a putative conifer specific repeat element (CSRE) that appears to have locally multiplied in PGB04 (Figure 1). A white spruce transcript with 91% identity to this CSRE is also present in the EST database (accession number WS0339.C21_N21). The occurrence of high complexity repeats in the BAC clones is estimated at 36.0% in PGB02 and 41.6% in PGB04, values which are substantially higher than those found in the fully sequenced genomes of Arabidopsis (10%) and poplar (12.6%), and similar to those of the rice (35%) and grapevine (38.8%) genomes [30-33] (Table 2).

Discussion

Sequencing and assembly of BAC clones as a test for conifer genome sequencing

To date there is no sequence report for large segments of conifer gDNA, and researchers have avoided sequencing a conifer genome due to its large size and high content of repetitive elements. Several approaches are currently being considered for future efforts to sequence a conifer genome, including the high-throughput sequencing of BAC libraries. To assess the feasibility of sequencing and assembling long, continuous segments of conifer gDNA, we targeted two white spruce defence genes, 3CAR and CYP720B4, for BAC clone isolation, sequencing and assembly. These genes were chosen because they are known to be members of large gene families with key functions in terpenoid biosynthesis.

Pre-assembled bidirectional reads of the shotgun plasmid libraries for each BAC clone were assembled using PHRAP software, resulting in a large number of contigs (15 for PGB02 and 14 for PGB04). Both BAC clones had areas of reduced-quality reads with low or no sequence coverage, bordered by regions of low complexity sequence repeats, which necessitated manual curation of the sequence assembly and resulted in substantially improved assemblies of two (PGB02) and one (PGB04) contigs. High complexity and simple repeats did not interfere with the automated PHRAP assembly, and manual inspection of the contigs did not reveal falsely matched reads within the repeat regions. The use of pre-assembled paired reads and quality scores produced by PHRED balanced between tolerating discrepancies and complete mis-assembly of the data sets [45]. We found that most problems for automated sequence assembly resulted from chimeric clones in the plasmid libraries, bacterial DNA contamination, low-quality sequences and low-complexity repeats.

Targeted BAC isolation of members of large conifer defence gene families provides insights into gene content of a conifer genome

The two genes targeted for BAC sequencing are members of large defence-related TPS and P450 gene families in spruce [20,46]. In the TPS gene family, members with more than 90% sequence identity can have distinct biochemical functions with non-overlapping product profiles [14,15]. In this study we demonstrate for the first time that it is possible to isolate, in an efficient and targeted fashion, BAC clones for specific members of the large conifer TPS and CYP720 defence gene families, thus providing new opportunities to characterize members of these important defence gene families at the genome level.

The 3CAR gene contains 10 exons and 9 introns, identical to the exon-intron structure of the grand fir (Abies grandis) monoterpene synthase genes (-)-limonene synthase and (-)-α/β-pinene synthase, which were previously cloned by PCR amplification of the gDNA between the start and stop codons identified in the corresponding FLcDNAs (Figure 2C) [47]. The identity of the deduced amino acid sequence to the previously functionally characterised Norway spruce 3CAR [29] is 84%. The CYP720B4 gene has 9 exons and 8 introns, and is the first genomic structure reported for a gymnosperm P450 gene. A comparison of the CYP720B4 gDNA structure with the gDNA structures of Arabidopsis P450s shows highly conserved intron-exon boundaries between CYP720B4 and Arabidopsis CYP88, which is involved in gibberellin biosynthesis as part of primary metabolism. Both families of P450s share a similar reaction mechanism and catalyse consecutive oxidation steps of structurally similar substrates [21]. These findings suggest a common ancestor of CYP88 (primary metabolism) and CYP720B4 (secondary metabolism).

Despite the large size of conifer genomes (an estimated 20 to 40 Gbp; 200- to 400-fold larger than the genome of Arabidopsis), it is not likely that the spruce genome contains a proportionally larger number of protein coding genes than Arabidopsis, as estimated from EST and FLcDNA discovery [3]. In contrast to previously sequenced angiosperm genomes, the spruce gDNA sequences of PGB02 and PGB04 reveal a low gene density, with a single gene per 172 kbp and 94 kbp, respectively, which is at least 10-fold lower than the overall gene density of the genomes of Arabidopsis, rice, poplar and grapevine (Table 1). This observation of low gene density has also been confirmed by additional sequencing of several randomly selected spruce BAC clones (K. Ritland et al., unpublished results).

In angiosperms, several mechanisms contribute to the expansion of gene families, including whole genome and chromosome segmental duplications [48], and tandem duplication of closely related genes [49].
For the gene family members targeted in this work, we did not find evidence for local tandem duplication.

The upstream regions of 3CAR and CYP720B4 contain putative cis-acting elements consistent with the roles of these genes in induced defence

Much previous research on the regulation and coordination of defence responses in spruce has targeted induced metabolite accumulation, enzyme activities, and transcript abundance of genes in the biosynthetic pathways of terpenoid and phenolic defences, at both the anatomical and the molecular level [16,25,36,38,46,50-54]. In particular, 3CAR transcripts were up-regulated by real and simulated insect attack in Sitka spruce [36] and in Norway spruce [29]. In loblolly pine, transcripts of CYP720B1, which is related to CYP720B4, were up-regulated in response to MeJA treatment [21]. In addition, large-scale proteome and gene expression profiling has identified putative transcription factors in spruce that were up-regulated in response to real or simulated insect attack [1,8,9]. This is the first report of the upstream sequences of conifer defence-related genes and of the putative cis-acting elements located in those regions.

The upstream sequences of 3CAR and CYP720B4 each contain more than five elements with sequence identity to cis-acting elements putatively involved in wound, stress and defence responses in angiosperms. The promoter region of the CYP720B4 gene is 95% to 99% identical to the corresponding PCR-amplified regions across several genotypes of Sitka spruce, hybrid interior spruce, and white spruce (data not shown). The conserved W-box motif present upstream of CYP720B4 is recognised and bound by transcription factors of the plant-specific WRKY class, which mediate pathogen defence responses in angiosperms [39]. More than 80 members of the WRKY family have been reported in pine [55,56], and more than ten different sequences with 60% to 80% identity to the Arabidopsis WRKY proteins AtWRKY6, AtWRKY3 and AtWRKY4, which are involved in defence, stress and pathogen responses [57,58], were found in the white spruce EST databases.

These putative promoter regions and cis-acting elements are valuable tools for future studies of the transcriptional regulation of conifer defence genes. Transformation of white spruce for the characterization of promoters has been reported [59,60]. In future work we will use this transformation system, in parallel with transformation of heterologous plant systems, for functional testing of spruce TPS and P450 promoter constructs linked to reporter genes.

A novel 44 bp sequence element that occurs four times in the 5'UTR of the white spruce 3CAR gene on PGB02 was also found 19 times in the 5'UTR of the orthologous Sitka spruce gene, which was isolated as a cDNA. The conservation of this short sequence across spruce species suggests that this element has an important functional role in the regulation of the 3CAR gene.

Genomic regions surrounding the 3CAR and CYP720B4 genes contain DNA and RNA based transposable elements

The genomic regions surrounding the 3CAR and CYP720B4 genes contain retrotransposons, DNA transposons and simple repeat sequences. With the exception of a fully preserved IS10 element in the genomic sequence of PGB04 (likely the result of transposition from the genome of the bacterial host, E. coli), all repetitive sequences appear to have accumulated a large number of mutations, deletions and rearrangements, suggesting that these elements are no longer functional. The repeat regions in the gDNA of PGB02 (15%) and PGB04 (17%) have up to 89% similarity to white spruce TE-related ESTs. Although the copies on PGB02 and PGB04 appear degenerate, the presence of ESTs for these TEs indicates that members of these retrotransposon families may be actively proliferating in conifers, potentially increasing genetic variability.

Remnants of DNA transposons of the cut-and-paste and copy-and-paste classes were found within 4 kbp and 500 bp of 3CAR and CYP720B4, respectively. In maize, the Helitron DNA transposon is associated with the duplication of CYP72A [61], and DNA-based transposons have been implicated in the capture and transduplication of host genes in rice, Lotus japonicus and Arabidopsis [62-64]. The proximity of DNA transposons to the protein-coding 3CAR and CYP720B4 genes is consistent with the possibility that a DNA transposon-mediated translocation mechanism contributes to the diversification of the conifer TPS and P450 gene families.

Conclusion

We report the first sequence assembly and annotation of large segments of gDNA from a conifer. We also demonstrate that genomic BAC clones for specific members of large conifer defence gene families can be isolated in a very efficient and targeted fashion. This work provides important new information about the structure and content of the conifer genome regions associated with the 3CAR and CYP720B4 genes in white spruce. The low gene density, high proportion of repetitive sequence and abundance of TEs identified in this work are likely characteristic of conifer genomes in general.

This work also provides relevant information for future efforts to sequence a conifer genome. Cost-efficiency is a critical factor in genome sequencing and depends on the sequencing chemistry, the complexity of the region being sequenced, and the quality of the assembly. Our simulation of the effect of BAC sequencing depth on assembly coverage showed that increasing the sequencing depth beyond 5- to 7-fold coverage results in only a marginal improvement of the sequence assembly (Additional files 4 and 5).
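The diminishing return reported above was measured empirically by re-assembling subsets of the PGB04 reads (Additional files 4 and 5). As a rough theoretical cross-check, and not the method used in this study, the Lander-Waterman model predicts that a fraction 1 - e^(-c) of a target is covered by at least one read at c-fold coverage. The sketch below assumes uniformly random read placement and ignores repeats, minimum overlaps and cloning bias, all of which make real conifer BAC assemblies worse than this ideal; the function names are hypothetical.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman (Poisson) expectation that a base is covered by >= 1 read."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


def expected_uncovered_bp(target_bp: int, coverage: float) -> float:
    """Expected number of bases with zero reads in a target of length target_bp."""
    return target_bp * math.exp(-coverage)


if __name__ == "__main__":
    target = 93_592  # PGB04 scaffold length (bp), legend of Additional file 4
    for c in (1, 3, 5, 7, 9):
        print(f"{c}x: covered {expected_fraction_covered(c):.4f}, "
              f"uncovered ~{expected_uncovered_bp(target, c):,.0f} bp")
```

Under this model the expected uncovered fraction falls from about 0.7% at 5-fold coverage to below 0.1% at 7-fold, so additional reads mainly close a few short gaps. Gaps in low-complexity repeat regions may not be closed by additional depth alone.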
The future sequencing of a conifer genome will likely combine ultra-high-throughput methods with the sequencing of BAC clones to anchor the high-throughput reads. The bi-directional Sanger sequencing used in this study generated high-quality reads averaging more than 1,000 bp, which were critical for the assembly of full-length BAC clones. Low-quality reads resulting in poor sequence coverage occurred in regions of complex and simple repeats, which may also pose challenges for ultra-high-throughput sequencing.

Methods

White spruce BAC library

Genomic (g)DNA was isolated from 200 g fresh weight of apical shoot tissue collected in April 2006 from a single white spruce (Picea glauca, genotype PG29) tree at the Kalamalka Research Station (British Columbia Ministry of Forests and Range, Vernon, British Columbia, Canada). A BAC library cloned into the HindIII site of pIndigoBAC-5 was made by BioS&T (http://www.biost.com/, Montreal). The non-arrayed library consisted of approximately 1.1 million BAC clones with an average insert size of 140 kbp, representing approximately 3× coverage of the white spruce genome.

BAC library screening and shot-gun subcloning into plasmid libraries

The BAC library was screened by BioS&T for two target genes, a TPS gene encoding 3-carene synthase (3CAR) and a P450 gene encoding a diterpene oxidase (CYP720B4), using the procedures detailed in Isidore et al. [28]. In brief, the entire BAC library was plated (977 plates; approximately 1,200 colonies per plate), and colonies were transferred into ten 96-well plates with approximately 1,000 BAC clones per well (pool). For each of the ten 96-well plates, twenty super-pools of BAC clones were generated by combining the wells of each of the twelve vertical columns and of each of the eight horizontal rows. These super-pools were screened by PCR for the two target genes. We used all available spruce EST and FLcDNA sequence information to design PCR primers that are, to the best of current knowledge, specific for the two target genes while suppressing amplification of other known members of the spruce TPS and P450 gene families. The primers were designed to amplify fragments of approximately 500 bp and were evaluated with white spruce PG29 gDNA. The primer sequences (shown in 5'-3' orientation) are CTTTCAAGCCCAATACCCAAAGGCACTG and GGGAATGGCAATCACTGCATTGGTATAG for CYP720B4, and GGAGAATTAGTGAGTCATGTCGATG and CTCTGTCTGATTGGTGGAACAGGC for 3CAR. PCR products from super-pools were sequenced to confirm the identity of the target DNA. The individual pool (the well containing the target gDNA clone) was identified and confirmed by PCR, and individual BAC clones were isolated as described in Isidore et al. [28].

Isolated BAC clones PGB02 (3CAR) and PGB04 (CYP720B4) were digested with NotI to release the insert, and insert size was determined by pulsed-field gel electrophoresis. The gDNA inserts of PGB02 and PGB04 were isolated by gel purification and sheared using a nebulizer (Invitrogen). After blunt-end repair, gDNA fragments were size-fractionated on SeaPlaque agarose gels (CBM Intellectual Properties, Inc.). Fragments of 700 to 2,000 bp were recovered and ligated into the SmaI site of pUC18. Plasmids were transformed into E. coli DH10B.

Sequencing and automated sequence assembly

Shotgun-subcloned plasmid libraries for PGB02 and PGB04 were arrayed in 384-well plates, and gDNA inserts were Sanger-sequenced from both ends. Sequences were scanned and masked for vector and contaminating bacterial sequences, which eliminated 21.4% (PGB02) and 27.9% (PGB04) of the total sequences. This high level of contaminating DNA resulted from prolonged growth of bacterial cultures prior to BAC isolation. We have subsequently found that the use of Plasmid-Safe ATP-dependent DNase (Epicentre) reduces the amount of contaminating bacterial DNA.

Sequences were processed using PHRED software (version 0.020425.c) [65], quality-trimmed according to the high-quality contiguous region determined by PHRED, and vector-trimmed using CROSS_MATCH software (http://phrap.org/). Vector and bacterial contaminant sequences were identified by megaBLAST alignment against the NCBI UniVec and non-redundant bacterial sequences, respectively, and hits with at least 95% identity were masked with N's. Processed sequences were assembled with PHRAP (http://www.phrap.org/) using the base quality files, with the bi-directional reads of each clone pre-assembled by PHRAP to match paired reads. The two commonly used assembly programs CAP3 and PHRAP were tested for their ability to assemble the BAC sequences. Although CAP3 employs a higher stringency than PHRAP [66], PHRAP assemblies of both BAC clones resulted in fewer but higher-quality contigs that included more of the total sequences (PGB02: CAP3 49 contigs, PHRAP 14 contigs; PGB04: CAP3 19 contigs, PHRAP 14 contigs). The gDNA sequences identified in this work were submitted to NCBI GenBank under accession numbers FJ609174 (PGB02) and FJ609175 (PGB04).

Manual curation of sequence assemblies

The contigs for PGB02 (15 contigs) and PGB04 (14 contigs) obtained by automated sequence assembly were manually curated. Sequences that prevented correct assembly, such as sequences from chimeric DNA, were removed, and the remaining contigs were re-aligned. PGB02 was manually assembled into 2 contigs. Assembly of PGB04 into a single contig required the re-introduction of several sequences that had previously been identified as contaminating E. coli sequence. Examination of this E. coli sequence identified it as the insertion sequence (EcIS10) of the plasmid-associated bacterial transposon Tn10, which was presumably inserted into the BAC during proliferation.
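The contaminant-masking rule described in the sequence-processing step (megaBLAST hits against UniVec or bacterial sequences with at least 95% identity replaced by N's) can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: hits are assumed to be already parsed into hypothetical (start, end, percent identity) tuples in 1-based read coordinates, and the function names and example read are invented.

```python
from typing import Iterable, Tuple

# (start, end, percent identity); start/end are 1-based, inclusive read coordinates
Hit = Tuple[int, int, float]


def mask_hits(read: str, hits: Iterable[Hit], min_identity: float = 95.0) -> str:
    """Replace bases covered by sufficiently similar contaminant hits with 'N'."""
    bases = list(read)
    for start, end, identity in hits:
        if identity < min_identity:
            continue  # weaker hits are left unmasked
        lo = max(start, 1) - 1   # convert to 0-based and clamp to the read
        hi = min(end, len(bases))
        for i in range(lo, hi):
            bases[i] = "N"
    return "".join(bases)


def masked_fraction(read: str) -> float:
    """Fraction of the read that is N (counts any N already present, too)."""
    return read.count("N") / len(read) if read else 0.0


if __name__ == "__main__":
    read = "ACGTACGTACGTACGTACGT"  # hypothetical 20-bp read
    hits = [(1, 5, 99.0), (11, 14, 90.0), (18, 25, 96.5)]
    masked = mask_hits(read, hits)
    print(masked, f"{masked_fraction(masked):.2f}")
```

Masking with N's rather than deleting the bases keeps read coordinates and pair information intact, which is one reason the re-introduction of the IS10 reads described above was possible.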
The left and right arms of the BAC vector (pIndigoBAC-5) were used to orient the remaining contigs, resulting in the final builds of PGB02 and PGB04. Oligonucleotide primers were designed to bridge gaps in the automated and manually curated sequence assemblies of PGB02. PCR using PGB02 BAC DNA and primers placed 1,112 bp and 993 bp on either side of the gap generated a single band of approximately 2.2 kbp. Sequencing of this PCR product verified up to 900 bp of sequence on either side of the gap, but no additional sequence for the gap region was obtained, possibly due to low sequence complexity. For sequence finishing, oligonucleotide primers (shown in 5'-3' orientation) were designed based on the sequence scaffolds of PGB02 (AATTGGTCAATTCCTAAAACACCATG, AAATTATGGGTTTTAAGGGCTAGAGTTC) and PGB04 (AACAAATTTACTCATTTACCCGTGA, CCCATCAAAATCCATGCCCAAG, TTCCAAGTTCTTGTGGGAGGAG, GACTGATTTTCTCTCCACCAAGCAAG).

Sequence analysis

Repetitive DNA was identified with the RepeatMasker software (A.F.A. Smit, R. Hubley and P. Green, unpublished data; version open-3.2.6, RMLib 20080801), using the Viridiplantae section of the RepBase Update [67] as the database. Gene models were predicted using the ab initio gene finders FGENESH (dicot matrix [68]), Genscan and GeneMark.hmm with default parameters. Regions with similarity to DNA transposons were identified with RepeatMasker [44,67] using a score threshold above 200 and a length above 100 bp.

Cloning and sequencing of upstream regions of 3CAR and CYP720B4

The regions upstream of the start codon, including the 5'UTR and promoter regions of 3CAR and CYP720B4, were amplified by PCR using white spruce PG29 gDNA as a template. Gene-specific oligonucleotide primers (shown in 5'-3' orientation) were based on the BAC scaffolds of PGB02 (3CAR) (ACCCATCTTCACAAAATTAC, GTAGTCCATAACGAGCAGAA) and PGB04 (CYP720B4) (TGATATTTGGTCTGCCATGGGCG, CATTTCCCTGCATGTATTCAATGCC, CCACCACATAGTTAGACCGTGATGC).

Authors' contributions

BjH, DH, MY, CIK and JB designed experiments and conducted the data analysis and interpretation of data and results. BjH, DH, CO and BrH carried out experiments. JB and KR conceived the overall study. CR participated in the design and coordination of the study. BjH, DH, MY and JB wrote the manuscript. All authors read and approved the final manuscript.

Additional material

Additional file 1
Figure S1. Alignment of nucleic acid sequences of four closely related 3CAR gDNA fragments from white spruce (Picea glauca, Pg_3CAR1-4) and Sitka spruce (Picea sitchensis) (+)-3-carene synthase (Ps_Q09). The numbering above the alignment corresponds to the nucleotide position in the complete 3CAR gene of PGB02. Underlined sequences correspond to primer binding sites used for sequencing.
http://www.biomedcentral.com/content/supplementary/1471-2229-9-106-S1.pdf

Additional file 2
Table S1. Sequencing summary of plasmid libraries for PGB02 and PGB04.
http://www.biomedcentral.com/content/supplementary/1471-2229-9-106-S2.pdf

Additional file 3
Figure S2. Size and read allocation of the PHRAP-assembled contigs of PGB02 (A) and PGB04 (B). The upper panel in each of A and B shows the number of reads in each contig, with the percentage of total reads given above the bars. The lower panel in each of A and B shows the length (bp) of each contig, with its percentage of the total assembly length given above the bars.
http://www.biomedcentral.com/content/supplementary/1471-2229-9-106-S3.pdf

Additional file 4
Figure S3. Effect of sequencing depth on assembly quality. The sequence reads from five plates were used in all possible combinations to build assemblies corresponding to one, two, three and four combined plates. (A) The number of contigs and the number of nucleotides represented in the contigs. (B) Coverage relative to the manually curated sequence scaffold of PGB04 (93,592 bp). The fold coverage is indicated.
http://www.biomedcentral.com/content/supplementary/1471-2229-9-106-S4.pdf

Additional file 5
Table S2. Impact of sequencing depth on assembly quality of PGB04.
http://www.biomedcentral.com/content/supplementary/1471-2229-9-106-S5.pdf

Acknowledgements

We thank Dr. Alvin Yanchuk, Dr. John King, and Mr. Barry Jaquish from the British Columbia Ministry of Forests and Range for access to white spruce trees and generous support of this project. We thank Ms. Karen Reid (UBC) for excellent support with project and laboratory management, BioS&T (Montreal) for producing the PG29 BAC library and for clone identification, and the Michael Smith Genome Sciences Centre (Vancouver) for sequencing and bioinformatics support. The work described in this paper was supported with funding from Genome British Columbia and Genome Canada for the Treenomix Conifer Forest Health project (http://www.treenomix.ca) (to JB and KR) and from the Natural Sciences and Engineering Research Council of Canada (EWR Steacie Memorial Fellowship and Discovery Grant to JB, and a Postdoctoral Fellowship to DH). JB is supported in part by the University of British Columbia Distinguished University Scholars program.

References

1. Ralph SG, Yueh H, Friedmann M, Aeschliman D, Zeznik JA, Nelson CC, Butterfield YS, Kirkpatrick R, Liu J, Jones SJ, et al.: Conifer defence against insects: microarray gene expression profiling of Sitka spruce (Picea sitchensis) induced by mechanical wounding or feeding by spruce budworms (Choristoneura occidentalis) or white pine weevils (Pissodes strobi) reveals large-scale changes of the host transcriptome. Plant Cell Environ 2006, 29:1545-1570.
2. Pavy N, Paule C, Parsons L, Crow JA, Morency MJ, Cooke J, Johnson JE, Noumen E, Guillet-Claude C, Butterfield Y, et al.: Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics 2005, 6:144.
3. Ralph SG, Chun HJ, Kolosova N, Cooper D, Oddy C, Ritland CE, Kirkpatrick R, Moore R, Barber S, Holt RA, et al.: A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics 2008, 9:484.
4. Friedmann M, Ralph SG, Aeschliman D, Zhuang J, Ritland K, Ellis BE, Bohlmann J, Douglas CJ: Microarray gene expression profiling of developmental transitions in Sitka spruce (Picea sitchensis) apical shoots. J Exp Bot 2007, 58:593-614.
5. Holliday JA, Ralph SG, White R, Bohlmann J, Aitken SN: Global monitoring of autumn gene expression within and among phenotypically divergent populations of Sitka spruce (Picea sitchensis). New Phytol 2008, 178:103-122.
6. Pavy N, Boyle B, Nelson C, Paule C, Giguere I, Caron S, Parsons LS, Dallaire N, Bedon F, Berube H, et al.: Identification of conserved core xylem gene sets: conifer cDNA microarray development, transcript profiling and computational analyses. New Phytol 2008, 180:766-786.
7. Lippert D, Chowrira S, Ralph SG, Zhuang J, Aeschliman D, Ritland C, Ritland K, Bohlmann J: Conifer defense against insects: proteome analysis of Sitka spruce (Picea sitchensis) bark induced by mechanical wounding or feeding by white pine weevils (Pissodes strobi). Proteomics 2007, 7:248-270.
8. Lippert D, Zhuang J, Ralph S, Ellis DE, Gilbert M, Olafson R, Ritland K, Ellis B, Douglas CJ, Bohlmann J: Proteome analysis of early somatic embryogenesis in Picea glauca. Proteomics 2005, 5:461-473.
9. Lippert DN, Ralph SG, Phillips M, White R, Smith D, Hardie D, Gershenzon J, Ritland K, Borchers CH, Bohlmann J: Quantitative iTRAQ proteome and comparative transcriptome analysis of elicitor-induced Norway spruce (Picea abies) cells reveals elements of calcium signaling in the early conifer defense response. Proteomics 2009, 9:350-367.
10. Bérubé Y, Zhuang J, Ralph S, Rungis D, Bohlmann J, Ritland K: Characterization of EST-SSRs in loblolly pine and spruce. Tree Genet Genomes 2007, 3:251-259.
11. Namroud MC, Beaulieu J, Juge N, Laroche J, Bousquet J: Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Mol Ecol 2008, 17:3599-3613.
12. Pelgas B, Beauseigle S, Achere V, Jeandroz S, Bousquet J, Isabel N: Comparative genome mapping among Picea glauca, P. mariana × P. rubens and P. abies, and correspondence with other Pinaceae. Theor Appl Genet 2006, 113:1371-1393.
13. Rungis D, Bérubé Y, Zhang J, Ralph S, Ritland CE, Ellis BE, Douglas C, Bohlmann J, Ritland K: Robust simple sequence repeat markers for spruce (Picea spp.) from expressed sequence tags. Theor Appl Genet 2004, 109:1283-1294.
14. Keeling CI, Weisshaar S, Lin RP, Bohlmann J: Functional plasticity of paralogous diterpene synthases involved in conifer defense. Proc Natl Acad Sci USA 2008, 105:1085-1090.
15. Martin DM, Fäldt J, Bohlmann J: Functional characterization of nine Norway spruce TPS genes and evolution of gymnosperm terpene synthases of the TPS-d subfamily. Plant Physiol 2004, 135:1908-1927.
16. Phillips MA, Walter MH, Ralph SG, Dabrowska P, Luck K, Uros EM, Boland W, Strack D, Rodriguez-Concepcion M, Bohlmann J, Gershenzon J: Functional identification and differential expression of 1-deoxy-D-xylulose 5-phosphate synthase in induced terpenoid resin formation of Norway spruce (Picea abies). Plant Mol Biol 2007, 65:243-257.
17. Ahuja MR, Neale DB: Evolution of genome size in conifers. Silvae Genet 2005, 54:126-137.
18. Morse AM, Peterson DG, Islam-Faridi MN, Smith KE, Magbanua Z, Garcia SA, Kubisiak TL, Amerson HV, Carlson JE, Nelson CD, Davis JM: Evolution of genome size and complexity in Pinus. PLoS ONE 2009, 4:e4332.
19. Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, et al.: Widespread genome duplications throughout the history of flowering plants. Genome Res 2006, 16:738-749.
20. Hamberger B, Bohlmann J: Cytochrome P450 mono-oxygenases in conifer genomes: discovery of members of the terpenoid oxygenase superfamily in spruce and pine. Biochem Soc Trans 2006, 34:1209-1214.
21. Ro DK, Arimura G, Lau SY, Piers E, Bohlmann J: Loblolly pine abietadienol/abietadienal oxidase PtAO (CYP720B1) is a multifunctional, multisubstrate cytochrome P450 monooxygenase. Proc Natl Acad Sci USA 2005, 102:8060-8065.
22. Liu JJ, Ekramoddoullah AK: Isolation, genetic variation and expression of TIR-NBS-LRR resistance gene analogs from western white pine (Pinus monticola Dougl. ex. D. Don.). Mol Genet Genomics 2003, 270:432-441.
23. Liu JJ, Ekramoddoullah AK: Characterization, expression and evolution of two novel subfamilies of Pinus monticola cDNAs encoding pathogenesis-related (PR)-10 proteins. Tree Physiol 2004, 24:1377-1385.
24. Ralph S, Park JY, Bohlmann J, Mansfield SD: Dirigent proteins in conifer defense: gene discovery, phylogeny, and differential wound- and insect-induced expression of a family of DIR and DIR-like genes in spruce (Picea spp.). Plant Mol Biol 2006, 60:21-40.
25. Ralph SG, Jancsik S, Bohlmann J: Dirigent proteins in conifer defense II: extended gene discovery, phylogeny, and constitutive and stress-induced gene expression in spruce (Picea spp.). Phytochemistry 2007, 68:1975-1991.
26. Friesen N, Brandes A, Heslop-Harrison JS: Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Mol Biol Evol 2001, 18:1176-1188.
27. L'Homme Y, Seguin A, Tremblay FM: Different classes of retrotransposons in coniferous spruce species. Genome 2000, 43:1084-1089.
28. Isidore E, Scherrer B, Bellec A, Budin K, Faivre-Rampant P, Waugh R, Keller B, Caboche M, Feuillet C, Chalhoub B: Direct targeting and rapid isolation of BAC clones spanning a defined chromosome region. Funct Integr Genomics 2005, 5:97-103.
29. Fäldt J, Martin D, Miller B, Rawat S, Bohlmann J: Traumatic resin defense in Norway spruce (Picea abies): methyl jasmonate-induced terpene synthase gene expression, and cDNA cloning and functional characterization of (+)-3-carene synthase. Plant Mol Biol 2003, 51:119-133.
30. AGI: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796-815.
31. IRGSP: The map-based sequence of the rice genome. Nature 2005, 436:793-800.
32. Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, et al.: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 2007, 449:463-467.
33. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al.: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596-1604.
34. Lescot M, Dehais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, Rouze P, Rombauts S: PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res 2002, 30:325-327.
35. Byun-McKay A, Godard KA, Toudefallah M, Martin DM, Alfaro R, King J, Bohlmann J, Plant AL: Wound-induced terpene synthase gene expression in Sitka spruce that exhibit resistance or susceptibility to attack by the white pine weevil. Plant Physiol 2006, 140:1009-1021.
36. Miller B, Madilao LL, Ralph S, Bohlmann J: Insect-induced conifer defense. White pine weevil and methyl jasmonate induce traumatic resinosis, de novo formed volatile emissions, and accumulation of terpenoid synthase and putative octadecanoid pathway transcripts in Sitka spruce. Plant Physiol 2005, 137:369-382.
37. Phillips MA, Croteau RB: Resin-based defenses in conifers. Trends Plant Sci 1999, 4:184-190.
38. Bohlmann J: Insect-induced terpenoid defenses in spruce. In Induced Plant Resistance to Herbivory. Edited by Schaller A. Springer Science; 2008:173-187.
39. Eulgem T, Rushton PJ, Robatzek S, Somssich IE: The WRKY superfamily of plant transcription factors. Trends Plant Sci 2000, 5:199-206.
40. Rouster J, Leah R, Mundy J, Cameron-Mills V: Identification of a methyl jasmonate-responsive region in the promoter of a lipoxygenase 1 gene expressed in barley grain. Plant J 1997, 11:513-523.
41. Goldsbrough AP, Albrecht H, Stratford R: Salicylic acid-inducible binding of a tobacco nuclear protein to a 10 bp sequence which is highly conserved amongst stress-inducible genes. Plant J 1993, 3:563-571.
42. Klotz KL, Lagrimini LM: Phytohormone control of the tobacco anionic peroxidase promoter. Plant Mol Biol 1996, 31:565-573.
43. Giuliano G, Pichersky E, Malik VS, Timko MP, Scolnik PA, Cashmore AR: An evolutionarily conserved protein binding sequence upstream of a plant light-regulated gene. Proc Natl Acad Sci USA 1988, 85:7089-7093.
44. Chen N: Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics 2004, Chapter 4:Unit 4.10.
45. de la Bastide M, McCombie WR: Assembling genomic DNA sequences with PHRAP. Curr Protoc Bioinformatics 2007, Chapter 11:Unit 11.4.
46. Keeling CI, Bohlmann J: Genes, enzymes and chemicals of terpenoid diversity in the constitutive and induced defence of conifers against insects and pathogens. New Phytol 2006, 170:657-675.
47. Trapp SC, Croteau RB: Genomic organization of plant terpene synthases and molecular evolutionary implications. Genetics 2001, 158:811-832.
48. De Bodt S, Maere S, Van de Peer Y: Genome duplication and the origin of angiosperms. Trends Ecol Evol 2005, 20:591-597.
49. Rizzon C, Ponger L, Gaut BS: Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLoS Comput Biol 2006, 2:e115.
50. Hudgins JW, Ralph SG, Franceschi VR, Bohlmann J: Ethylene in induced conifer defense: cDNA cloning, protein expression, and cellular and subcellular localization of 1-aminocyclopropane-1-carboxylate oxidase in resin duct and phenolic parenchyma cells. Planta 2006, 224:865-877.
51. Martin D, Tholl D, Gershenzon J, Bohlmann J: Methyl jasmonate induces traumatic resin ducts, terpenoid resin biosynthesis, and terpenoid accumulation in developing xylem of Norway spruce stems. Plant Physiol 2002, 129:1003-1018.
52. Martin DM, Gershenzon J, Bohlmann J: Induction of volatile terpene biosynthesis and diurnal emission by methyl jasmonate in foliage of Norway spruce. Plant Physiol 2003, 132:1586-1599.
53. McKay SA, Hunter WL, Godard KA, Wang SX, Martin DM, Bohlmann J, Plant AL: Insect attack and wounding induce traumatic resin duct development and gene expression of (-)-pinene synthase in Sitka spruce. Plant Physiol 2003, 133:368-378.
54. Ralph SG, Hudgins JW, Jancsik S, Franceschi VR, Bohlmann J: Aminocyclopropane carboxylic acid synthase is a regulated step in ethylene-dependent induced conifer defense. Full-length cDNA cloning of a multigene family, differential constitutive, and wound- and insect-induced expression, and cellular and subcellular localization in spruce and Douglas fir. Plant Physiol 2007, 143:410-424.
55. Liu JJ, Ekramoddoullah AK: Identification and characterization of the WRKY transcription factor family in Pinus monticola. Genome 2009, 52:77-88.
56. Zhang Y, Wang L: The WRKY transcription factor superfamily: its origin in eukaryotes and expansion in plants. BMC Evol Biol 2005, 5:1.
57. Lai Z, Vinod K, Zheng Z, Fan B, Chen Z: Roles of Arabidopsis WRKY3 and WRKY4 transcription factors in plant responses to pathogens. BMC Plant Biol 2008, 8:68.
58. Robatzek S, Somssich IE: A new member of the Arabidopsis WRKY transcription factor family, AtWRKY6, is associated with both senescence- and defence-related processes. Plant J 2001, 28:123-133.
59. Godard KA, Byun-McKay A, Levasseur C, Plant A, Séguin A, Bohlmann J: Testing of a heterologous, wound- and insect-inducible promoter for functional genomics studies in conifer defense. Plant Cell Rep 2007, 26:2083-2090.
60. Bedon F, Levasseur C, Grima-Pettenati J, Seguin A, MacKay J: Sequence analysis and functional characterization of the promoter of the Picea glauca cinnamyl alcohol dehydrogenase gene in transgenic white spruce plants. Plant Cell Rep 2009, 28:787-800.
61. Jameson N, Georgelis N, Fouladbash E, Martens S, Hannah LC, Lal S: Helitron mediated amplification of cytochrome P450 monooxygenase gene in maize. Plant Mol Biol 2008, 67:295-304.
62. Hoen DR, Park KC, Elrouby N, Yu Z, Mohabir N, Cowan RK, Bureau TE: Transposon-mediated expansion and diversification of a family of ULP-like genes. Mol Biol Evol 2006, 23:1254-1268.
63. Holligan D, Zhang X, Jiang N, Pritham EJ, Wessler SR: The transposable element landscape of the model legume Lotus japonicus. Genetics 2006, 174:2215-2228.
64. Juretic N, Hoen DR, Huynh ML, Harrison PM, Bureau TE: The evolutionary fate of MULE-mediated duplications of host gene fragments in rice. Genome Res 2005, 15:1292-1297.
65. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998, 8:186-194.
66. Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res 1999, 9:868-877.
67. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 2005, 110:462-467.
68. Salamov AA, Solovyev VV: Ab initio gene finding in Drosophila genomic DNA. Genome Res 2000, 10:516-522.

